# TASK.md — Bug Fixes from v0.12.4–v0.13.1 Code Review **Version:** v0.14.1 → v0.14.2 **Type:** Bug fixes identified by thorough code review of today's changes (sessions 40–48) **Scope:** Controller Go code — `internal/backup/`, `internal/web/` --- ## Bug Summary | # | Severity | File | Description | |---|----------|------|-------------| | 1 | **HIGH** | `crossdrive.go` | rsync `--delete` destroys `_db/` and `_config/` directories on every single-mount run | | 2 | **MEDIUM** | `backup.go` | Scheduled backups don't set `m.running` flag — restore/manual backup can overlap | | 3 | **MEDIUM** | `crossdrive.go` | `ValidateDestination` silently succeeds when disk usage can't be read | | 4 | **MEDIUM** | `backup.go` | Empty `systemDataPath` silently produces relative dump paths | --- ## Bug 1 (HIGH): rsync `--delete` destroys `_db/` and `_config/` directories ### File `internal/backup/crossdrive.go`, function `runRsyncBackup`, lines 242–276 ### Root Cause When an app has exactly one HDD mount (`len(mounts) == 1`), rsync copies directly into `destDir/` with the `--delete` flag. The `_db/` and `_config/` subdirectories — created by the DB dump copy (lines 278–286) and config rsync (lines 288–303) from the **previous** backup run — don't exist in the source mount, so `--delete` removes them from the destination **before** the current run re-creates them. ### Impact 1. **Brief window of incomplete backup** on every run — between rsync completion and the subsequent `_db`/`_config` copy steps, the backup destination is missing DB dumps and config 2. **Data loss if interrupted** — if the process is killed (OOM, power loss, context cancellation) between the rsync step and the copy steps, `_db/` and `_config/` are gone until the next successful complete run 3. **Wasted I/O** — every run deletes and re-creates these directories unnecessarily ### Reproduction 1. Deploy Immich with a single HDD mount to `/mnt/hdd_1/storage/immich` 2. Configure cross-drive rsync backup to `/mnt/sys_drive` 3. Run backup twice 4. After first run: `ls /mnt/sys_drive/backups/secondary/immich/rsync/` shows `_db/`, `_config/`, and app data 5. During second run's rsync phase: `_db/` and `_config/` disappear from dest 6. After second run completes: `_db/` and `_config/` reappear (re-created) ### Fix Instructions In `runRsyncBackup()`, add an `--exclude` flag with pattern `_*` to the rsync command. This prevents `--delete` from touching any controller-managed directories (which all use underscore prefix: `_db`, `_config`, and any future ones) while still cleaning up stale user data. **Change the rsync command construction (around line 267) from:** ```go cmd := exec.CommandContext(ctx, "rsync", "-a", "--delete", "--exclude", "backups/*.sql.gz", "--exclude", "backups/*.sql", "--exclude", "backups/*.dump", src, dst) ``` **To:** ```go cmd := exec.CommandContext(ctx, "rsync", "-a", "--delete", "--exclude", "_*", "--exclude", "backups/*.sql.gz", "--exclude", "backups/*.sql", "--exclude", "backups/*.dump", src, dst) ``` **Why `_*` instead of listing `_db` and `_config` individually:** - Future-proof — if we ever add another controller-managed subdirectory (e.g., `_metadata`), it's automatically protected - The underscore prefix is a convention for controller-managed dirs; no real app data directory starts with `_` - Simpler to maintain (one pattern vs growing list) **Why this is safe:** The `_db` and `_config` directories are managed exclusively by the controller (lines 278–303), never by the user's app. No user data directory starts with `_`. The underscore prefix convention was specifically chosen to avoid collision with app data paths. **Note on `_infra`:** The `_infra/` directory lives at `/backups/secondary/_infra/` (a sibling of app directories, NOT inside any app's rsync target directory). So it's not affected by `--delete` regardless. The `_*` exclude is still good practice in case the directory structure changes later. **For the multi-mount case** (`len(mounts) > 1`): rsync goes into per-leaf subdirectories, not the root `destDir`, so `_db/` and `_config/` are siblings (not children) of the rsync targets. The `--delete` flag only affects the target subtree, so they're safe in the multi-mount case. Adding the exclude universally doesn't hurt. --- ## Bug 2 (MEDIUM): Scheduled backups don't set `m.running` flag ### File `internal/backup/backup.go`, functions `RunBackup()` (line 272) and `RunDBDumps()` (line 181) ### Root Cause Only `RunFullBackup()` (line 454) sets `m.running = true` before running. The scheduler in `main.go` calls `RunBackup()` and `RunDBDumps()` directly (not via `RunFullBackup()`), so `m.running` is never set during nightly scheduled runs. Code that checks `m.running`: - `RestoreApp()` in `restore.go:25` — blocks if running - `RunFullBackup()` in `backup.go:456` — blocks if running - `IsRunning()` in `backup.go:496` — used by API and UI to show status ### Impact 1. **UI shows "not running" during nightly backups** — the dashboard backup card and backup page don't indicate that a backup is in progress 2. **Restore can overlap with nightly backup** — a user triggering "Restore" at 03:05 while the nightly restic backup is running won't see a "backup in progress" error. Both operations compete for the restic repo lock; one will fail with an opaque error 3. **Manual "Backup now" can overlap with scheduled backup** — `RunFullBackup()` checks `m.running`, but since the scheduled `RunBackup()` doesn't set it, the manual trigger proceeds and both run concurrently ### Fix Instructions Add `m.running` guard to both `RunBackup()` and `RunDBDumps()`. The pattern is the same as in `RunFullBackup()`: **In `RunBackup()` (line 272), add at the beginning of the function:** ```go func (m *Manager) RunBackup(ctx context.Context) error { m.mu.Lock() if m.running { m.mu.Unlock() return fmt.Errorf("backup already in progress") } m.running = true m.mu.Unlock() defer func() { m.mu.Lock() m.running = false m.mu.Unlock() }() start := time.Now() // ... rest of existing function ``` **In `RunDBDumps()` (line 181), add the same guard:** ```go func (m *Manager) RunDBDumps(ctx context.Context) error { m.mu.Lock() if m.running { m.mu.Unlock() return fmt.Errorf("backup already in progress") } m.running = true m.mu.Unlock() defer func() { m.mu.Lock() m.running = false m.mu.Unlock() }() start := time.Now() // ... rest of existing function ``` **IMPORTANT:** `RunFullBackup()` calls `RunDBDumps()` then `RunBackup()` internally. After adding the guards above, `RunFullBackup()` would deadlock on itself (it sets `m.running=true`, then calls `RunDBDumps()` which also tries to set it). Fix by either: **Option A (recommended):** Extract the logic into internal methods that DON'T check the flag: ```go // Public methods — set the running guard func (m *Manager) RunDBDumps(ctx context.Context) error { if err := m.acquireRunning(); err != nil { return err } defer m.releaseRunning() return m.runDBDumpsInternal(ctx) } func (m *Manager) RunBackup(ctx context.Context) error { if err := m.acquireRunning(); err != nil { return err } defer m.releaseRunning() return m.runBackupInternal(ctx) } func (m *Manager) RunFullBackup(ctx context.Context) error { if err := m.acquireRunning(); err != nil { return err } defer m.releaseRunning() if err := m.runDBDumpsInternal(ctx); err != nil { m.logger.Printf("[WARN] DB dump had errors, continuing with backup anyway") } return m.runBackupInternal(ctx) } // Helper methods func (m *Manager) acquireRunning() error { m.mu.Lock() defer m.mu.Unlock() if m.running { return fmt.Errorf("backup already in progress") } m.running = true return nil } func (m *Manager) releaseRunning() { m.mu.Lock() m.running = false m.mu.Unlock() } // Internal methods — no guard, caller must hold running flag func (m *Manager) runDBDumpsInternal(ctx context.Context) error { // ... current RunDBDumps body (without the running guard) } func (m *Manager) runBackupInternal(ctx context.Context) error { // ... current RunBackup body (without the running guard) } ``` **Option B (simpler):** Only add the guard to `RunBackup()` and `RunDBDumps()`, but make `RunFullBackup()` NOT call them — instead inline the logic. This duplicates code but avoids the refactor. **Option A is recommended** — it's cleaner and ensures all entry points are guarded. --- ## Bug 3 (MEDIUM): `ValidateDestination` silently succeeds when disk usage can't be read ### File `internal/backup/crossdrive.go`, function `ValidateDestination`, line 215 ### Root Cause ```go if di := system.GetDiskUsage(path); di != nil { // Space checks only execute if di != nil ... } return nil // ← Success even when di was nil (space unknown) ``` If `system.GetDiskUsage(path)` returns `nil` (e.g., unsupported filesystem like FUSE/NFS, permission issue, or a path on a virtual filesystem), all space checks are skipped and validation passes without any space verification. ### Impact A destination with zero free space could pass validation, leading to a backup failure mid-operation. The rsync or restic process would fail with "no space left on device", but only after partial work, leaving an incomplete backup. ### Fix Instructions After the `GetDiskUsage` check, add a warning log and optionally block: ```go di := system.GetDiskUsage(path) if di == nil { r.logger.Printf("[WARN] Cannot determine disk usage for %s — proceeding without space verification", path) return nil } ``` This preserves backward compatibility (doesn't block) but makes the situation visible in logs. If you want to be stricter, return an error instead: ```go if di == nil { return fmt.Errorf("destination %s: cannot determine disk usage", path) } ``` **Recommended:** Use the warning-only approach for now. A destination that can't report disk usage is unusual and worth logging, but blocking it could break setups with exotic filesystems (CIFS mounts, etc.). --- ## Bug 4 (MEDIUM): Empty `systemDataPath` produces relative dump paths ### File `internal/backup/backup.go`, function `DumpStackDB`, line 570 ### Root Cause ```go drivePath := m.GetAppDrivePath(stackName) // Returns "" if stackProvider is nil AND systemDataPath is "" dumpDir := AppDBDumpPath(drivePath, stackName) // AppDBDumpPath("", "mealie") = "backups/primary/mealie/db-dumps" (RELATIVE!) ``` If `systemDataPath` is empty (misconfiguration or missing `system_data_path` in controller.yaml) and the stack has no HDD_PATH, `GetAppDrivePath()` returns an empty string. `AppDBDumpPath("", stackName)` then produces a **relative path** instead of an absolute one. The `DumpOne()` function creates directories relative to the process's working directory (typically `/opt/docker/felhom-controller/`), orphaning the dumps where nothing else looks for them. The same issue exists in `RunDBDumps()` (line 212) and `RunBackup()` (line 308, 321). ### Impact - DB dumps written to unexpected location - Restic backup doesn't find them (looks in the expected absolute path) - Backup appears to succeed but DB dumps are orphaned - Restore would not have fresh DB dumps ### Fix Instructions Add a validation check in `GetAppDrivePath()`: ```go func (m *Manager) GetAppDrivePath(stackName string) string { if m.stackProvider != nil { if hddPath := m.stackProvider.GetStackHDDPath(stackName); hddPath != "" { return hddPath } } if m.systemDataPath == "" { m.logger.Printf("[ERROR] systemDataPath is empty — cannot determine drive for %s", stackName) } return m.systemDataPath } ``` Also add a startup validation in `NewManager()`: ```go func NewManager(cfg *config.Config, pinger *monitor.Pinger, sett *settings.Settings, logger *log.Logger) *Manager { if cfg.Paths.SystemDataPath == "" { logger.Printf("[WARN] SystemDataPath is empty in config — SSD-only apps will not have correct backup paths") } return &Manager{ // ... existing fields } } ``` And in `DumpStackDB()`, add a guard before using the path: ```go drivePath := m.GetAppDrivePath(stackName) if drivePath == "" || !filepath.IsAbs(drivePath) { return fmt.Errorf("cannot determine absolute drive path for %s (systemDataPath not configured?)", stackName) } dumpDir := AppDBDumpPath(drivePath, stackName) ``` --- ## Testing Checklist After fixing all bugs, verify: - [ ] **Bug 1:** Run cross-drive rsync backup twice for a single-mount app → `_db/` and `_config/` persist between runs (not deleted by `--delete`) - [ ] **Bug 1:** Run cross-drive rsync backup for a multi-mount app → behavior unchanged - [ ] **Bug 2:** During nightly scheduled backup, UI shows "Mentés folyamatban" on dashboard - [ ] **Bug 2:** During nightly scheduled backup, "Visszaállítás" button shows "already in progress" error - [ ] **Bug 2:** `RunFullBackup()` still works correctly (calls internal methods, doesn't deadlock) - [ ] **Bug 2:** Scheduled `db-dump` and `backup` jobs don't deadlock with each other (they run at different times, but if one overruns, the next should get "already in progress" error) - [ ] **Bug 3:** Cross-drive backup to a path where `GetDiskUsage` returns nil → logs a warning (not a silent pass) - [ ] **Bug 4:** With `system_data_path: ""` in config → startup log warning + `DumpStackDB()` returns error instead of using relative path - [ ] Build succeeds: `go build ./...` --- ## Implementation Notes - **Do NOT change** any template files, CSS, or UI text — only Go backend files - **Do NOT change** any function signatures that are part of the public API (other packages import them) - **Do NOT change** the scheduler wiring in `main.go` — the fix should be in the backup package - **Run `go vet ./...` and `go build ./...`** after all changes to verify no compilation errors - Keep all log messages in English (UI text is Hungarian, but log messages are English) --- ## Files to Modify | File | Bug(s) | Changes | |------|--------|---------| | `internal/backup/crossdrive.go` | 1, 3 | Add rsync `--exclude` flags; add nil check for GetDiskUsage | | `internal/backup/backup.go` | 2, 4 | Extract `acquireRunning`/`releaseRunning` + internal methods; add systemDataPath validation |