Files
deploy-felhom-compose/TASK.md
T

26 KiB
Raw Blame History

TASK.md — Backup Architecture Overhaul (v0.12.7)

Prompt (copy-paste this into Claude Code)

Read TASK.md for the full plan. Apply all code changes described, then build and deploy.
After all fixes are done:
1. Run `go build ./...` and `go vet ./...` from the controller/ directory — fix any errors
2. Update CHANGELOG.md with a new entry at the top (session 43, v0.12.7)
3. Update controller/README.md backup section with the new architecture
4. Commit, build, and deploy following the workflow in CLAUDE.md

Context and Goals

The backup system has accumulated an unnecessary Enabled flag that gates user data inclusion in the nightly restic backup. This was a design mistake — user data backup must be mandatory for all apps that have HDD data. Without it, the service isn't viable.

This task removes the Enabled gate, makes HDD data backup automatic, and fixes the restore feature to show all apps.

Backup architecture (3-2-1 rule):

  1. Nightly restic (mandatory, same drive) — DB dumps + config + ALL user data (HDD). Every app with data gets backed up automatically. This protects against accidental deletion and enables point-in-time restore. The backup lives on the same drive as the app, so it does NOT protect against drive failure.

  2. Cross-drive backup (opt-in, different device) — rsync or restic to a secondary physical drive. This is the second copy that protects against drive failure. When not configured for an app with HDD data, the UI warns that data is only on one drive.

  3. Remote backup (future) — offsite copy for disaster recovery. Not implemented yet.

Current problems:

  • AppBackupPrefs.Enabled flag gates HDD inclusion in nightly restic — should be automatic
  • The flag gets out of sync with cross-drive config
  • Restore dropdown only shows apps with HasHDDData && BackupEnabled — excludes most apps
  • "Docker kötetek: Auto ✓" in UI implies volumes are separately backed up — they're not
  • Cross-drive backup doesn't trigger a DB dump first

Fix 1: Make HDD data backup mandatory

Remove the Enabled flag as a gate. All apps with HDD data are always included in the nightly restic backup. No toggle, no opt-in.

1a: Rewrite resolveAppBackupPaths() — include ALL HDD paths

File: internal/backup/backup.go, function resolveAppBackupPaths (line 429)

Current:

func (m *Manager) resolveAppBackupPaths() []string {
    if m.stackProvider == nil || m.settings == nil {
        return nil
    }
    appBackupMap := m.settings.GetAppBackupMap()
    if len(appBackupMap) == 0 {
        return nil
    }

    var paths []string
    seen := make(map[string]bool)

    for stackName, enabled := range appBackupMap {
        if !enabled {
            continue
        }
        hddMounts := m.stackProvider.GetStackHDDMounts(stackName)
        for _, mount := range hddMounts {
            if seen[mount] {
                continue
            }
            if _, err := os.Stat(mount); err == nil {
                paths = append(paths, mount)
                seen[mount] = true
                m.logger.Printf("[DEBUG] Including app data: %s (from %s)", mount, stackName)
            }
        }
    }
    return paths
}

Replace with:

// resolveAppBackupPaths returns HDD paths for ALL deployed apps.
// User data backup is mandatory — every app with HDD mounts is included.
func (m *Manager) resolveAppBackupPaths() []string {
    if m.stackProvider == nil {
        return nil
    }

    var paths []string
    seen := make(map[string]bool)

    for _, stack := range m.stackProvider.ListDeployedStacks() {
        hddMounts := m.stackProvider.GetStackHDDMounts(stack.Name)
        for _, mount := range hddMounts {
            if seen[mount] {
                continue
            }
            if _, err := os.Stat(mount); err == nil {
                paths = append(paths, mount)
                seen[mount] = true
                m.logger.Printf("[DEBUG] Including app data: %s (from %s)", mount, stack.Name)
            }
        }
    }
    return paths
}

Key change: no longer reads GetAppBackupMap(), no longer checks Enabled. Iterates all deployed stacks via ListDeployedStacks() and includes every HDD mount. The m.settings dependency is also removed from this method.

1b: Remove backupPrefs parameter from DiscoverAppData()

File: internal/backup/appdata.go, function DiscoverAppData (line 61)

Current:

func DiscoverAppData(provider StackDataProvider, backupPrefs map[string]bool, discoveredDBs []DiscoveredDB) []AppBackupInfo {

and at line 100:

        info.BackupEnabled = backupPrefs[stack.Name]

Replace function signature with:

func DiscoverAppData(provider StackDataProvider, discoveredDBs []DiscoveredDB) []AppBackupInfo {

Replace line 100 with:

        // All apps with HDD data are backed up automatically (mandatory)
        info.BackupEnabled = info.HasHDDData

1c: Update RefreshCache() caller

File: internal/backup/backup.go, in RefreshCache (around line 547)

Current:

    if m.stackProvider != nil {
        backupPrefs := m.settings.GetAppBackupMap()
        status.AppDataInfo = DiscoverAppData(m.stackProvider, backupPrefs, status.DiscoveredDBs)

Replace with:

    if m.stackProvider != nil {
        status.AppDataInfo = DiscoverAppData(m.stackProvider, status.DiscoveredDBs)

1d: Delete dead settings methods

File: internal/settings/settings.go

Delete these methods entirely — they are no longer called anywhere:

  1. IsAppBackupEnabled() (lines 235243) — was used by restore.go, replaced in Fix 3f
  2. SetAppBackup() (lines 245257) — was never called outside settings.go
  3. GetAppBackupMap() (lines 259271) — was used by resolveAppBackupPaths + RefreshCache, replaced in 1a + 1c
  4. SetAppBackupBulk() (lines 273287) — was never called outside settings.go
  5. GetAppBackupPrefs() (lines 289298) — was never called outside settings.go

Keep the AppBackupPrefs struct and AppBackup field in Settings — the JSON field "app_backup" still holds CrossDrive configs. The Enabled field stays in the struct for backward compat (existing settings.json won't break on load) but nothing reads it.


Fix 2: DB dump before cross-drive backup

When cross-drive backup runs (scheduled or manual), trigger a fresh DB dump for that app's databases first. This ensures DB state matches the user data being rsynced.

2a: Define DBDumper interface

File: internal/backup/crossdrive.go, add near top (after imports):

// DBDumper can run a database dump for a specific stack.
type DBDumper interface {
    DumpStackDB(ctx context.Context, stackName string) error
}

2b: Add field to CrossDriveRunner

File: internal/backup/crossdrive.go

Add dbDumper field to the struct (do NOT change the constructor signature):

type CrossDriveRunner struct {
    sett          *settings.Settings
    stackProvider StackDataProvider
    dbDumper      DBDumper
    logger        *log.Logger
    mu            sync.Mutex
    running       map[string]bool
}

Add setter method after NewCrossDriveRunner:

// SetDBDumper sets the DB dumper for pre-backup database dumps.
// Called after backup manager is initialized (avoids circular init dependency).
func (r *CrossDriveRunner) SetDBDumper(d DBDumper) {
    r.dbDumper = d
}

2c: Implement DumpStackDB on backup Manager

File: internal/backup/backup.go, add new method:

// DumpStackDB runs a database dump for containers belonging to a specific stack.
// Used by cross-drive backup to ensure DB state matches user data.
func (m *Manager) DumpStackDB(ctx context.Context, stackName string) error {
    dbs, err := DiscoverDatabases(ctx, m.logger)
    if err != nil {
        return fmt.Errorf("database discovery failed: %w", err)
    }

    var stackDBs []DiscoveredDB
    for _, db := range dbs {
        if db.StackName == stackName {
            stackDBs = append(stackDBs, db)
        }
    }
    if len(stackDBs) == 0 {
        m.logger.Printf("[DEBUG] No databases found for stack %s — skipping pre-backup dump", stackName)
        return nil
    }

    m.logger.Printf("[INFO] Running pre-backup DB dump for %s (%d database(s))", stackName, len(stackDBs))
    results := DumpAll(ctx, stackDBs, m.cfg.Paths.DBDumpDir, m.logger)

    for _, r := range results {
        if r.Error != nil {
            return fmt.Errorf("DB dump failed for %s: %w", r.DB.ContainerName, r.Error)
        }
        m.logger.Printf("[INFO] Pre-backup DB dump OK: %s (%s)", r.DB.ContainerName, humanizeBytes(r.Size))

        // Persist validation to settings
        if m.settings != nil && r.FilePath != "" {
            filename := filepath.Base(r.FilePath)
            cache := settings.DBValidationCache{
                ValidatedAt: time.Now().Format(time.RFC3339),
                TableCount:  r.Validation.TableCount,
                HasHeader:   r.Validation.Valid,
            }
            if !r.Validation.Valid {
                cache.Error = r.Validation.Error
            }
            _ = m.settings.SetDBValidation(filename, cache)
        }
    }
    return nil
}

2d: Call DB dump in RunAppBackup

File: internal/backup/crossdrive.go, in RunAppBackup, add BEFORE the ValidateDestination call (around line 67, after the "Mark as running" block):

    // Trigger fresh DB dump for this app before cross-drive backup
    if r.dbDumper != nil {
        if err := r.dbDumper.DumpStackDB(ctx, stackName); err != nil {
            r.logger.Printf("[WARN] Pre-backup DB dump failed for %s: %v — proceeding with user data backup", stackName, err)
            // Non-fatal: user data backup is still valuable without fresh dump
        }
    }

2e: Wire up in main.go

File: cmd/controller/main.go, after both crossDriveRunner and backupMgr are created (after line 135):

    // Wire cross-drive → backup manager for pre-backup DB dumps
    if backupMgr != nil {
        crossDriveRunner.SetDBDumper(backupMgr)
    }

Fix 3: Restore dropdown shows ALL deployed apps

The restore section should show every deployed app. All apps have restic snapshots (stacks dir + DB dumps are always backed up). Apps with HDD data get full restore.

3a: Update template filter

File: internal/web/templates/backups.html, lines 450456

Current:

{{range .Backup.AppDataInfo}}
{{if and .HasHDDData .BackupEnabled}}
<option value="{{.StackName}}">{{.DisplayName}}</option>
{{end}}
{{end}}

Replace with:

{{range .Backup.AppDataInfo}}
<option value="{{.StackName}}" data-has-hdd="{{.HasHDDData}}" data-has-db="{{.HasDBDump}}">{{.DisplayName}}</option>
{{end}}

Note: no data-backup-enabled attribute — user data is always backed up when present.

3b: Show restore type info when app is selected

File: internal/web/templates/backups.html

Add a new info div after the snapshot selector (after line 464, before the warning div):

        <div id="restore-type-info" class="restore-info" style="display:none;margin-bottom:0.5rem">
        </div>

3c: Update onRestoreAppChange() JavaScript

File: internal/web/templates/backups.html, replace the onRestoreAppChange function (lines 604637) with:

function onRestoreAppChange() {
    var sel = document.getElementById('restore-app');
    var appName = sel.value;
    var snapSel = document.getElementById('restore-snapshot');
    var noSnaps = document.getElementById('restore-no-snapshots');
    var typeInfo = document.getElementById('restore-type-info');

    document.getElementById('restore-confirm-cb').checked = false;
    document.getElementById('restore-btn').disabled = true;
    noSnaps.style.display = 'none';
    typeInfo.style.display = 'none';

    if (!appName) {
        snapSel.innerHTML = '<option value="">— Válasszon alkalmazást —</option>';
        return;
    }

    // Determine restore type from data attributes
    var opt = sel.options[sel.selectedIndex];
    var hasHDD = opt.getAttribute('data-has-hdd') === 'true';
    var hasDB = opt.getAttribute('data-has-db') === 'true';

    if (hasHDD) {
        typeInfo.innerHTML = '🔄 Teljes visszaállítás: adatbázis + konfiguráció + felhasználói adatok a kiválasztott pillanatképből.';
        typeInfo.className = 'restore-info';
    } else if (hasDB) {
        typeInfo.innerHTML = ' Adatbázis és konfiguráció visszaállítása — az alkalmazásnak nincs külön felhasználói adata.';
        typeInfo.className = 'restore-info restore-info-partial';
    } else {
        typeInfo.innerHTML = ' Csak konfiguráció visszaállítása (compose fájlok, beállítások).';
        typeInfo.className = 'restore-info restore-info-partial';
    }
    typeInfo.style.display = 'block';

    snapSel.innerHTML = '<option value="">— Betöltés... —</option>';

    fetch('/api/backup/snapshots?stack=' + encodeURIComponent(appName))
        .then(function(r) { return r.json(); })
        .then(function(data) {
            snapSel.innerHTML = '<option value="">— Válasszon —</option>';
            if (data.ok && data.data && data.data.length > 0) {
                data.data.forEach(function(s) {
                    var o = document.createElement('option');
                    o.value = s.short_id;
                    o.textContent = formatSnapshot(s);
                    snapSel.appendChild(o);
                });
            } else {
                snapSel.innerHTML = '<option value="">— Nincs elérhető mentés —</option>';
                noSnaps.style.display = 'block';
            }
        });
}

Note: the JS is simpler than before — no backupEnabled check. If hasHDD is true, the data IS backed up (it's mandatory). So: hasHDD → full restore, hasDB → config+DB, else → config only.

3d: Add CSS for restore info

File: internal/web/templates/style.css, add:

.restore-info {
    padding: 0.5rem 0.75rem;
    border-radius: 6px;
    font-size: 0.85rem;
    background: rgba(59, 130, 246, 0.1);
    border: 1px solid rgba(59, 130, 246, 0.3);
    color: #93c5fd;
}
.restore-info-partial {
    background: rgba(251, 191, 36, 0.1);
    border-color: rgba(251, 191, 36, 0.3);
    color: #fcd34d;
}

3e: Update snapshot filtering API for non-HDD apps

File: internal/api/router.go, function backupSnapshots (line 448)

Current code (lines 460466):

    if stackName := req.URL.Query().Get("stack"); stackName != "" {
        mounts := r.backupMgr.GetStackHDDMounts(stackName)
        if len(mounts) > 0 {
            snapshots = filterSnapshotsByPaths(snapshots, mounts)
        }
    }

Replace with:

    if stackName := req.URL.Query().Get("stack"); stackName != "" {
        mounts := r.backupMgr.GetStackHDDMounts(stackName)
        if len(mounts) > 0 {
            // App has HDD data — filter to snapshots containing those paths
            snapshots = filterSnapshotsByPaths(snapshots, mounts)
        }
        // Apps without HDD mounts: return all snapshots (they all contain
        // the stacks dir + DB dumps which cover this app's config and database)
    }

This is effectively a no-op change (just adding a comment), since if len(mounts) > 0 already skips filtering for non-HDD apps. The comment clarifies intent.

3f: Update RestoreApp to handle all apps

File: internal/backup/restore.go

Replace the entire file with:

package backup

import (
    "fmt"
    "path/filepath"
    "regexp"
)

// snapshotIDRe validates restic snapshot IDs: 8-64 lowercase hex characters.
var snapshotIDRe = regexp.MustCompile(`^[0-9a-f]{8,64}$`)

// RestoreApp restores an app from a restic snapshot.
// All apps get config + DB dump restored. Apps with HDD data also get user data restored.
func (m *Manager) RestoreApp(stackName, snapshotID string) error {
    if m.stackProvider == nil {
        return fmt.Errorf("stack provider not configured")
    }

    // Validate snapshot ID format
    if !snapshotIDRe.MatchString(snapshotID) {
        return fmt.Errorf("invalid snapshot ID: must be 8-64 lowercase hex characters")
    }

    // Prevent concurrent operations
    m.mu.Lock()
    if m.running {
        m.mu.Unlock()
        return fmt.Errorf("backup or restore already in progress")
    }
    m.running = true
    m.mu.Unlock()
    defer func() {
        m.mu.Lock()
        m.running = false
        m.mu.Unlock()
    }()

    // Determine what to restore
    hddMounts := m.stackProvider.GetStackHDDMounts(stackName)
    hasHDD := len(hddMounts) > 0

    // Build list of paths to restore from the snapshot
    var restorePaths []string

    // Always restore the stack's config dir (compose + app.yaml + .felhom.yml)
    composePath, ok := m.stackProvider.GetStackComposePath(stackName)
    if ok {
        stackDir := filepath.Dir(composePath)
        restorePaths = append(restorePaths, stackDir)
    }

    // Restore DB dump files for this stack
    if m.cfg.Paths.DBDumpDir != "" {
        restorePaths = append(restorePaths, m.cfg.Paths.DBDumpDir)
    }

    // Restore HDD data (always included for apps that have it — backup is mandatory)
    if hasHDD {
        restorePaths = append(restorePaths, hddMounts...)
    }

    if len(restorePaths) == 0 {
        return fmt.Errorf("no restorable paths found for %s", stackName)
    }

    m.logger.Printf("[WARN] RESTORE starting: stack=%s, snapshot=%s, paths=%v, hasHDD=%v",
        stackName, snapshotID, restorePaths, hasHDD)

    // Stop the app before restore
    if err := m.stackProvider.StopStack(stackName); err != nil {
        m.logger.Printf("[WARN] RESTORE could not stop %s: %v (proceeding anyway)", stackName, err)
    }

    // Execute restore via restic
    if err := m.restic.RestoreAppData(snapshotID, restorePaths); err != nil {
        m.logger.Printf("[ERROR] RESTORE failed for %s: %v", stackName, err)
        if startErr := m.stackProvider.StartStack(stackName); startErr != nil {
            m.logger.Printf("[WARN] RESTORE could not restart %s after failure: %v", stackName, startErr)
        }
        return err
    }

    // Restart the app
    if err := m.stackProvider.StartStack(stackName); err != nil {
        m.logger.Printf("[WARN] RESTORE could not restart %s after restore: %v", stackName, err)
    }

    restoreType := "config+DB"
    if hasHDD {
        restoreType = "full (config+DB+userdata)"
    }
    m.logger.Printf("[INFO] RESTORE completed: stack=%s, snapshot=%s, type=%s", stackName, snapshotID, restoreType)
    return nil
}

Key difference from previous version: no IsAppBackupEnabled() check — HDD data is always backed up and always restorable.


Fix 4: Honest Docker volume UI label

4a: Change label in template

File: internal/web/templates/backups.html, lines 283292

Current:

                <div class="backup-layer-row">
                    <span class="layer-label">Docker kötetek</span>
                    <span class="layer-badge">Auto</span>

Replace Docker kötetek with Konfiguráció:

                <div class="backup-layer-row">
                    <span class="layer-label">Konfiguráció</span>
                    <span class="layer-badge">Auto</span>

Why: Docker named volumes (immich_postgres_data) live at /var/lib/docker/volumes/ which is NOT in the restic backup paths. DB data is protected via pg_dump (separate row). What's actually backed up is compose files + app.yaml + .felhom.yml = configuration.


Summary of all changes

Fix What File(s)
1a resolveAppBackupPaths() includes ALL deployed stacks' HDD backup.go
1b Remove backupPrefs parameter from DiscoverAppData() appdata.go
1c Update RefreshCache() caller to match new signature backup.go
1d Delete 5 dead settings methods settings.go
2a DBDumper interface crossdrive.go
2b dbDumper field + SetDBDumper on CrossDriveRunner crossdrive.go
2c DumpStackDB method on backup Manager backup.go
2d Call DB dump before cross-drive backup crossdrive.go
2e Wire dbDumper in main.go main.go
3a Remove filter from restore dropdown backups.html
3b Add restore type info div backups.html
3c Update JS to show restore type per app backups.html
3d CSS for restore info banners style.css
3e Comment clarifying snapshot API for non-HDD apps router.go
3f RestoreApp handles config+DB+HDD restore restore.go
4a Rename "Docker kötetek" → "Konfiguráció" backups.html

Files to modify (8)

  1. internal/backup/backup.go — Fix 1a + 1c + 2c
  2. internal/backup/appdata.go — Fix 1b
  3. internal/settings/settings.go — Fix 1d (delete 5 methods)
  4. internal/backup/crossdrive.go — Fix 2a + 2b + 2d
  5. internal/backup/restore.go — Fix 3f (full rewrite)
  6. internal/web/templates/backups.html — Fix 3a + 3b + 3c + 4a
  7. internal/web/templates/style.css — Fix 3d
  8. internal/api/router.go — Fix 3e
  9. cmd/controller/main.go — Fix 2e

Architecture after fix

Backup layers per app (3-2-1 rule):
┌──────────────────────────────────────────────────────┐
│ RULE 1 — Nightly restic (MANDATORY, same drive)      │
│                                                      │
│  1. Adatbázis mentés (Auto)                          │
│     - pg_dump / mysqldump → /srv/backups/db-dumps/   │
│     - Runs nightly 02:30 (all discovered DBs)        │
│     - Also runs before cross-drive backup (per-app)  │
│                                                      │
│  2. Konfiguráció (Auto)                              │
│     - compose.yml, app.yaml, .felhom.yml             │
│     - In nightly restic snapshot (always)             │
│                                                      │
│  3. Felhasználói adatok (Auto — if app has HDD data) │
│     - HDD bind mounts (photos, documents, etc.)      │
│     - Always included in nightly restic               │
│     - No toggle — backup is mandatory                 │
│                                                      │
├──────────────────────────────────────────────────────┤
│ RULE 2 — Cross-drive backup (OPT-IN, second device)  │
│                                                      │
│  - rsync or restic to a secondary physical drive      │
│  - Protects against primary drive failure             │
│  - When NOT configured: UI warns "nincs 2. másolat"  │
│  - Triggers fresh DB dump before backup               │
│                                                      │
├──────────────────────────────────────────────────────┤
│ RULE 3 — Remote backup (FUTURE)                      │
│                                                      │
│  - Offsite copy for disaster recovery                │
│  - Not implemented yet                               │
└──────────────────────────────────────────────────────┘

Restore capabilities:
┌─────────────────┬────────────┬──────────┬────────────┐
│ App type        │ DB restore │ Config   │ User data  │
├─────────────────┼────────────┼──────────┼────────────┤
│ Has HDD data    │ ✓          │ ✓        │ ✓ (always) │
│ DB only, no HDD │ ✓          │ ✓        │ n/a        │
│ No DB, no HDD   │ —          │ ✓        │ n/a        │
└─────────────────┴────────────┴──────────┴────────────┘

Note: the "DB + HDD, no backup" row from the previous version is GONE. All apps with HDD data are always backed up. The restore table is simpler.

Post-fix checklist

  • go build ./... passes
  • go vet ./... passes
  • Verify no remaining references to deleted methods (IsAppBackupEnabled, GetAppBackupMap, SetAppBackup, SetAppBackupBulk, GetAppBackupPrefs) — use grep -r to confirm
  • Update CHANGELOG.md — session 43, version v0.12.7:
    • HDD data backup is now mandatory for all apps (no opt-in toggle)
    • Removed dead Enabled flag gating from settings
    • Cross-drive: triggers pre-backup DB dump for consistency
    • Restore: all deployed apps now appear in dropdown
    • Restore: shows restore type info (full / config+DB / config only)
    • Restore: supports config+DB restore for apps without user data
    • UI: "Docker kötetek" renamed to "Konfiguráció" (accuracy)
  • Update controller/README.md — backup architecture section:
    • 3-2-1 backup rule (nightly mandatory, cross-drive opt-in, remote future)
    • Restore capabilities matrix
    • Cross-drive pre-backup DB dump flow
  • Commit, build on 192.168.0.180, deploy on 192.168.0.162
  • Verify with docker ps and docker logs
  • After deploy, verify:
    • Immich shows in restore dropdown (HDD data → full restore)
    • Gokapi/Mealie also show (DB+config restore)
    • Selecting Immich shows "Teljes visszaállítás" banner
    • Selecting Gokapi shows "Adatbázis és konfiguráció" banner
    • Manual cross-drive backup triggers DB dump first (check logs)
    • Backup status page shows "Konfiguráció" instead of "Docker kötetek"
    • Nightly restic backup includes Immich HDD paths without any toggle