14 KiB
TASK.md — Bug Fixes from v0.12.4–v0.13.1 Code Review
Version: v0.14.1 → v0.14.2
Type: Bug fixes identified by thorough code review of today's changes (sessions 40–48)
Scope: Controller Go code — internal/backup/, internal/web/
Bug Summary
| # | Severity | File | Description |
|---|---|---|---|
| 1 | HIGH | crossdrive.go |
rsync --delete destroys _db/ and _config/ directories on every single-mount run |
| 2 | MEDIUM | backup.go |
Scheduled backups don't set m.running flag — restore/manual backup can overlap |
| 3 | MEDIUM | crossdrive.go |
ValidateDestination silently succeeds when disk usage can't be read |
| 4 | MEDIUM | backup.go |
Empty systemDataPath silently produces relative dump paths |
Bug 1 (HIGH): rsync --delete destroys _db/ and _config/ directories
File
internal/backup/crossdrive.go, function runRsyncBackup, lines 242–276
Root Cause
When an app has exactly one HDD mount (len(mounts) == 1), rsync copies directly into destDir/ with the --delete flag. The _db/ and _config/ subdirectories — created by the DB dump copy (lines 278–286) and config rsync (lines 288–303) from the previous backup run — don't exist in the source mount, so --delete removes them from the destination before the current run re-creates them.
Impact
- Brief window of incomplete backup on every run — between rsync completion and the subsequent
_db/_configcopy steps, the backup destination is missing DB dumps and config - Data loss if interrupted — if the process is killed (OOM, power loss, context cancellation) between the rsync step and the copy steps,
_db/and_config/are gone until the next successful complete run - Wasted I/O — every run deletes and re-creates these directories unnecessarily
Reproduction
- Deploy Immich with a single HDD mount to
/mnt/hdd_1/storage/immich - Configure cross-drive rsync backup to
/mnt/sys_drive - Run backup twice
- After first run:
ls /mnt/sys_drive/backups/secondary/immich/rsync/shows_db/,_config/, and app data - During second run's rsync phase:
_db/and_config/disappear from dest - After second run completes:
_db/and_config/reappear (re-created)
Fix Instructions
In runRsyncBackup(), add an --exclude flag with pattern _* to the rsync command. This prevents --delete from touching any controller-managed directories (which all use underscore prefix: _db, _config, and any future ones) while still cleaning up stale user data.
Change the rsync command construction (around line 267) from:
cmd := exec.CommandContext(ctx, "rsync", "-a", "--delete",
"--exclude", "backups/*.sql.gz",
"--exclude", "backups/*.sql",
"--exclude", "backups/*.dump",
src, dst)
To:
cmd := exec.CommandContext(ctx, "rsync", "-a", "--delete",
"--exclude", "_*",
"--exclude", "backups/*.sql.gz",
"--exclude", "backups/*.sql",
"--exclude", "backups/*.dump",
src, dst)
Why _* instead of listing _db and _config individually:
- Future-proof — if we ever add another controller-managed subdirectory (e.g.,
_metadata), it's automatically protected - The underscore prefix is a convention for controller-managed dirs; no real app data directory starts with
_ - Simpler to maintain (one pattern vs growing list)
Why this is safe: The _db and _config directories are managed exclusively by the controller (lines 278–303), never by the user's app. No user data directory starts with _. The underscore prefix convention was specifically chosen to avoid collision with app data paths.
Note on _infra: The _infra/ directory lives at <dest>/backups/secondary/_infra/ (a sibling of app directories, NOT inside any app's rsync target directory). So it's not affected by --delete regardless. The _* exclude is still good practice in case the directory structure changes later.
For the multi-mount case (len(mounts) > 1): rsync goes into per-leaf subdirectories, not the root destDir, so _db/ and _config/ are siblings (not children) of the rsync targets. The --delete flag only affects the target subtree, so they're safe in the multi-mount case. Adding the exclude universally doesn't hurt.
Bug 2 (MEDIUM): Scheduled backups don't set m.running flag
File
internal/backup/backup.go, functions RunBackup() (line 272) and RunDBDumps() (line 181)
Root Cause
Only RunFullBackup() (line 454) sets m.running = true before running. The scheduler in main.go calls RunBackup() and RunDBDumps() directly (not via RunFullBackup()), so m.running is never set during nightly scheduled runs.
Code that checks m.running:
RestoreApp()inrestore.go:25— blocks if runningRunFullBackup()inbackup.go:456— blocks if runningIsRunning()inbackup.go:496— used by API and UI to show status
Impact
- UI shows "not running" during nightly backups — the dashboard backup card and backup page don't indicate that a backup is in progress
- Restore can overlap with nightly backup — a user triggering "Restore" at 03:05 while the nightly restic backup is running won't see a "backup in progress" error. Both operations compete for the restic repo lock; one will fail with an opaque error
- Manual "Backup now" can overlap with scheduled backup —
RunFullBackup()checksm.running, but since the scheduledRunBackup()doesn't set it, the manual trigger proceeds and both run concurrently
Fix Instructions
Add m.running guard to both RunBackup() and RunDBDumps(). The pattern is the same as in RunFullBackup():
In RunBackup() (line 272), add at the beginning of the function:
func (m *Manager) RunBackup(ctx context.Context) error {
m.mu.Lock()
if m.running {
m.mu.Unlock()
return fmt.Errorf("backup already in progress")
}
m.running = true
m.mu.Unlock()
defer func() {
m.mu.Lock()
m.running = false
m.mu.Unlock()
}()
start := time.Now()
// ... rest of existing function
In RunDBDumps() (line 181), add the same guard:
func (m *Manager) RunDBDumps(ctx context.Context) error {
m.mu.Lock()
if m.running {
m.mu.Unlock()
return fmt.Errorf("backup already in progress")
}
m.running = true
m.mu.Unlock()
defer func() {
m.mu.Lock()
m.running = false
m.mu.Unlock()
}()
start := time.Now()
// ... rest of existing function
IMPORTANT: RunFullBackup() calls RunDBDumps() then RunBackup() internally. After adding the guards above, RunFullBackup() would deadlock on itself (it sets m.running=true, then calls RunDBDumps() which also tries to set it). Fix by either:
Option A (recommended): Extract the logic into internal methods that DON'T check the flag:
// Public methods — set the running guard
func (m *Manager) RunDBDumps(ctx context.Context) error {
if err := m.acquireRunning(); err != nil {
return err
}
defer m.releaseRunning()
return m.runDBDumpsInternal(ctx)
}
func (m *Manager) RunBackup(ctx context.Context) error {
if err := m.acquireRunning(); err != nil {
return err
}
defer m.releaseRunning()
return m.runBackupInternal(ctx)
}
func (m *Manager) RunFullBackup(ctx context.Context) error {
if err := m.acquireRunning(); err != nil {
return err
}
defer m.releaseRunning()
if err := m.runDBDumpsInternal(ctx); err != nil {
m.logger.Printf("[WARN] DB dump had errors, continuing with backup anyway")
}
return m.runBackupInternal(ctx)
}
// Helper methods
func (m *Manager) acquireRunning() error {
m.mu.Lock()
defer m.mu.Unlock()
if m.running {
return fmt.Errorf("backup already in progress")
}
m.running = true
return nil
}
func (m *Manager) releaseRunning() {
m.mu.Lock()
m.running = false
m.mu.Unlock()
}
// Internal methods — no guard, caller must hold running flag
func (m *Manager) runDBDumpsInternal(ctx context.Context) error {
// ... current RunDBDumps body (without the running guard)
}
func (m *Manager) runBackupInternal(ctx context.Context) error {
// ... current RunBackup body (without the running guard)
}
Option B (simpler): Only add the guard to RunBackup() and RunDBDumps(), but make RunFullBackup() NOT call them — instead inline the logic. This duplicates code but avoids the refactor.
Option A is recommended — it's cleaner and ensures all entry points are guarded.
Bug 3 (MEDIUM): ValidateDestination silently succeeds when disk usage can't be read
File
internal/backup/crossdrive.go, function ValidateDestination, line 215
Root Cause
if di := system.GetDiskUsage(path); di != nil {
// Space checks only execute if di != nil
...
}
return nil // ← Success even when di was nil (space unknown)
If system.GetDiskUsage(path) returns nil (e.g., unsupported filesystem like FUSE/NFS, permission issue, or a path on a virtual filesystem), all space checks are skipped and validation passes without any space verification.
Impact
A destination with zero free space could pass validation, leading to a backup failure mid-operation. The rsync or restic process would fail with "no space left on device", but only after partial work, leaving an incomplete backup.
Fix Instructions
After the GetDiskUsage check, add a warning log and optionally block:
di := system.GetDiskUsage(path)
if di == nil {
r.logger.Printf("[WARN] Cannot determine disk usage for %s — proceeding without space verification", path)
return nil
}
This preserves backward compatibility (doesn't block) but makes the situation visible in logs. If you want to be stricter, return an error instead:
if di == nil {
return fmt.Errorf("destination %s: cannot determine disk usage", path)
}
Recommended: Use the warning-only approach for now. A destination that can't report disk usage is unusual and worth logging, but blocking it could break setups with exotic filesystems (CIFS mounts, etc.).
Bug 4 (MEDIUM): Empty systemDataPath produces relative dump paths
File
internal/backup/backup.go, function DumpStackDB, line 570
Root Cause
drivePath := m.GetAppDrivePath(stackName) // Returns "" if stackProvider is nil AND systemDataPath is ""
dumpDir := AppDBDumpPath(drivePath, stackName) // AppDBDumpPath("", "mealie") = "backups/primary/mealie/db-dumps" (RELATIVE!)
If systemDataPath is empty (misconfiguration or missing system_data_path in controller.yaml) and the stack has no HDD_PATH, GetAppDrivePath() returns an empty string. AppDBDumpPath("", stackName) then produces a relative path instead of an absolute one. The DumpOne() function creates directories relative to the process's working directory (typically /opt/docker/felhom-controller/), orphaning the dumps where nothing else looks for them.
The same issue exists in RunDBDumps() (line 212) and RunBackup() (line 308, 321).
Impact
- DB dumps written to unexpected location
- Restic backup doesn't find them (looks in the expected absolute path)
- Backup appears to succeed but DB dumps are orphaned
- Restore would not have fresh DB dumps
Fix Instructions
Add a validation check in GetAppDrivePath():
func (m *Manager) GetAppDrivePath(stackName string) string {
if m.stackProvider != nil {
if hddPath := m.stackProvider.GetStackHDDPath(stackName); hddPath != "" {
return hddPath
}
}
if m.systemDataPath == "" {
m.logger.Printf("[ERROR] systemDataPath is empty — cannot determine drive for %s", stackName)
}
return m.systemDataPath
}
Also add a startup validation in NewManager():
func NewManager(cfg *config.Config, pinger *monitor.Pinger, sett *settings.Settings, logger *log.Logger) *Manager {
if cfg.Paths.SystemDataPath == "" {
logger.Printf("[WARN] SystemDataPath is empty in config — SSD-only apps will not have correct backup paths")
}
return &Manager{
// ... existing fields
}
}
And in DumpStackDB(), add a guard before using the path:
drivePath := m.GetAppDrivePath(stackName)
if drivePath == "" || !filepath.IsAbs(drivePath) {
return fmt.Errorf("cannot determine absolute drive path for %s (systemDataPath not configured?)", stackName)
}
dumpDir := AppDBDumpPath(drivePath, stackName)
Testing Checklist
After fixing all bugs, verify:
- Bug 1: Run cross-drive rsync backup twice for a single-mount app →
_db/and_config/persist between runs (not deleted by--delete) - Bug 1: Run cross-drive rsync backup for a multi-mount app → behavior unchanged
- Bug 2: During nightly scheduled backup, UI shows "Mentés folyamatban" on dashboard
- Bug 2: During nightly scheduled backup, "Visszaállítás" button shows "already in progress" error
- Bug 2:
RunFullBackup()still works correctly (calls internal methods, doesn't deadlock) - Bug 2: Scheduled
db-dumpandbackupjobs don't deadlock with each other (they run at different times, but if one overruns, the next should get "already in progress" error) - Bug 3: Cross-drive backup to a path where
GetDiskUsagereturns nil → logs a warning (not a silent pass) - Bug 4: With
system_data_path: ""in config → startup log warning +DumpStackDB()returns error instead of using relative path - Build succeeds:
go build ./...
Implementation Notes
- Do NOT change any template files, CSS, or UI text — only Go backend files
- Do NOT change any function signatures that are part of the public API (other packages import them)
- Do NOT change the scheduler wiring in
main.go— the fix should be in the backup package - Run
go vet ./...andgo build ./...after all changes to verify no compilation errors - Keep all log messages in English (UI text is Hungarian, but log messages are English)
Files to Modify
| File | Bug(s) | Changes |
|---|---|---|
internal/backup/crossdrive.go |
1, 3 | Add rsync --exclude flags; add nil check for GetDiskUsage |
internal/backup/backup.go |
2, 4 | Extract acquireRunning/releaseRunning + internal methods; add systemDataPath validation |