Bug Fixes from v0.12.4–v0.13.1 Code Review

This commit is contained in:
2026-02-18 19:40:50 +01:00
parent fcd20eb524
commit 54dc74a525
+260 -425
View File
@@ -1,504 +1,339 @@
# TASK.md — v0.14.0 Storage & Backup Architecture Overhaul # TASK.md — Bug Fixes from v0.12.4v0.13.1 Code Review
**Version:** v0.14.0 **Version:** v0.14.1 → v0.14.2
**Type:** Architecture overhaul — storage paths, backup structure, multi-drive support **Type:** Bug fixes identified by thorough code review of today's changes (sessions 4048)
**Scope:** Controller Go code + app catalog compose files + setup scripts **Scope:** Controller Go code `internal/backup/`, `internal/web/`
**Note:** Demo node will be reinstalled from scratch — no migration needed
--- ---
## Design Overview ## Bug Summary
### New directory structure (per drive) | # | Severity | File | Description |
|---|----------|------|-------------|
Every drive mount (`/mnt/sys_drive`, `/mnt/hdd_1`, `/mnt/hdd_2`, ...) uses the same layout: | 1 | **HIGH** | `crossdrive.go` | rsync `--delete` destroys `_db/` and `_config/` directories on every single-mount run |
| 2 | **MEDIUM** | `backup.go` | Scheduled backups don't set `m.running` flag — restore/manual backup can overlap |
``` | 3 | **MEDIUM** | `crossdrive.go` | `ValidateDestination` silently succeeds when disk usage can't be read |
/mnt/<drive>/ | 4 | **MEDIUM** | `backup.go` | Empty `systemDataPath` silently produces relative dump paths |
appdata/<app>/ ← live app data (renamed from "storage")
backups/
primary/
<app>/db-dumps/ ← raw DB dumps per app (accessible for testing)
restic/ ← per-drive restic repo (all apps on this drive)
secondary/
<app>/rsync/ ← rsync copies from apps on OTHER drives
restic/ ← restic repo for secondary copies
Dokumentumok/
media/
Download/
movies/
series/
music/
audiobooks/
```
### Key rules
1. **An app's "home drive"** = the drive from its `HDD_PATH` env var, or `cfg.Paths.SystemDataPath` if no HDD_PATH
2. **Primary backup** lives on the SAME drive as the app — protects against accidental deletion, app bugs
3. **Secondary backup** lives on a DIFFERENT drive — protects against drive failure
4. **One restic repo per drive** (in both primary and secondary) — same password for all repos
5. **DB dumps** are raw SQL files per-app, always on the app's home drive, also included in restic
6. **Compose configs + controller.yaml** go into EVERY primary restic repo (small, ensures self-contained restore)
7. **`storage/``appdata/`** rename across all compose templates
8. **Filebrowser** mounts per-drive subdirectories: `media/`, `Dokumentumok/`, `backups/secondary/` (for file recovery)
--- ---
## Phase 1: Config & path helpers ## Bug 1 (HIGH): rsync `--delete` destroys `_db/` and `_config/` directories
### 1a. `internal/config/config.go` ### File
`internal/backup/crossdrive.go`, function `runRsyncBackup`, lines 242276
**Add:** ### Root Cause
- `SystemDataPath string \`yaml:"system_data_path"\`` to `PathsConfig` — default `/mnt/sys_drive` When an app has exactly one HDD mount (`len(mounts) == 1`), rsync copies directly into `destDir/` with the `--delete` flag. The `_db/` and `_config/` subdirectories — created by the DB dump copy (lines 278286) and config rsync (lines 288303) from the **previous** backup run — don't exist in the source mount, so `--delete` removes them from the destination **before** the current run re-creates them.
**Remove from struct:** ### Impact
- `BackupDir string` from PathsConfig 1. **Brief window of incomplete backup** on every run — between rsync completion and the subsequent `_db`/`_config` copy steps, the backup destination is missing DB dumps and config
- `DBDumpDir string` from PathsConfig 2. **Data loss if interrupted** — if the process is killed (OOM, power loss, context cancellation) between the rsync step and the copy steps, `_db/` and `_config/` are gone until the next successful complete run
- `ResticRepo string` from BackupConfig 3. **Wasted I/O** — every run deletes and re-creates these directories unnecessarily
**Keep:** ### Reproduction
- `ResticPasswordFile string` in BackupConfig (shared across all repos) 1. Deploy Immich with a single HDD mount to `/mnt/hdd_1/storage/immich`
- `HDDPath string` in PathsConfig (legacy, still used as default storage) 2. Configure cross-drive rsync backup to `/mnt/sys_drive`
3. Run backup twice
4. After first run: `ls /mnt/sys_drive/backups/secondary/immich/rsync/` shows `_db/`, `_config/`, and app data
5. During second run's rsync phase: `_db/` and `_config/` disappear from dest
6. After second run completes: `_db/` and `_config/` reappear (re-created)
**Update `applyDefaults()`:** ### Fix Instructions
- Remove: `d(&cfg.Paths.BackupDir, "/srv/backups")` In `runRsyncBackup()`, add an `--exclude` flag with pattern `_*` to the rsync command. This prevents `--delete` from touching any controller-managed directories (which all use underscore prefix: `_db`, `_config`, and any future ones) while still cleaning up stale user data.
- Remove: `d(&cfg.Paths.DBDumpDir, "/srv/backups/db-dumps")`
- Remove: `d(&cfg.Backup.ResticRepo, "/srv/backups/restic-repo")`
- Add: `d(&cfg.Paths.SystemDataPath, "/mnt/sys_drive")`
**Gotcha:** All code referencing `cfg.Paths.BackupDir`, `cfg.Paths.DBDumpDir`, `cfg.Backup.ResticRepo` will break. Grep for all references and update.
### 1b. New file: `internal/backup/paths.go`
Path computation helpers (pure functions, no state):
**Change the rsync command construction (around line 267) from:**
```go ```go
package backup cmd := exec.CommandContext(ctx, "rsync", "-a", "--delete",
"--exclude", "backups/*.sql.gz",
import "path/filepath" "--exclude", "backups/*.sql",
"--exclude", "backups/*.dump",
func PrimaryBackupPath(drivePath string) string { src, dst)
return filepath.Join(drivePath, "backups", "primary")
}
func PrimaryResticRepoPath(drivePath string) string {
return filepath.Join(drivePath, "backups", "primary", "restic")
}
func AppDBDumpPath(drivePath, stackName string) string {
return filepath.Join(drivePath, "backups", "primary", stackName, "db-dumps")
}
func SecondaryBackupPath(drivePath string) string {
return filepath.Join(drivePath, "backups", "secondary")
}
func AppSecondaryRsyncPath(drivePath, stackName string) string {
return filepath.Join(drivePath, "backups", "secondary", stackName, "rsync")
}
func SecondaryResticRepoPath(drivePath string) string {
return filepath.Join(drivePath, "backups", "secondary", "restic")
}
func AppDataPath(drivePath, stackName string) string {
return filepath.Join(drivePath, "appdata", stackName)
}
``` ```
### 1c. App drive resolution **To:**
Need a method to determine which drive an app lives on. Add to the backup Manager or StackDataProvider:
```go ```go
// GetAppDrivePath returns the drive path for an app. cmd := exec.CommandContext(ctx, "rsync", "-a", "--delete",
// Uses HDD_PATH from app.yaml if set, otherwise falls back to system data path. "--exclude", "_*",
func (m *Manager) GetAppDrivePath(stackName string) string { "--exclude", "backups/*.sql.gz",
if mounts := m.stackProvider.GetStackHDDMounts(stackName); len(mounts) > 0 { "--exclude", "backups/*.sql",
// The HDD_PATH is the mount point — extract the drive from the first mount "--exclude", "backups/*.dump",
// e.g., /mnt/hdd_1/appdata/immich → /mnt/hdd_1 src, dst)
// Actually, we need the HDD_PATH itself, not the mounts
}
return m.systemDataPath
}
``` ```
**Gotcha:** `GetStackHDDMounts` returns resolved mount paths (e.g., `/mnt/hdd_1/appdata/immich`), not the raw `HDD_PATH` value. Need a way to get the raw HDD_PATH for a stack. Options: **Why `_*` instead of listing `_db` and `_config` individually:**
- Add `GetStackHDDPath(name string) string` to `StackDataProvider` interface - Future-proof — if we ever add another controller-managed subdirectory (e.g., `_metadata`), it's automatically protected
- Or: derive the drive from mount paths by finding the common `/mnt/<drive>` prefix - The underscore prefix is a convention for controller-managed dirs; no real app data directory starts with `_`
- Best: add to StackDataProvider — clean, explicit - Simpler to maintain (one pattern vs growing list)
### 1d. StackDataProvider interface update **Why this is safe:** The `_db` and `_config` directories are managed exclusively by the controller (lines 278303), never by the user's app. No user data directory starts with `_`. The underscore prefix convention was specifically chosen to avoid collision with app data paths.
In `internal/backup/appdata.go`, add: **Note on `_infra`:** The `_infra/` directory lives at `<dest>/backups/secondary/_infra/` (a sibling of app directories, NOT inside any app's rsync target directory). So it's not affected by `--delete` regardless. The `_*` exclude is still good practice in case the directory structure changes later.
```go
type StackDataProvider interface {
// ... existing methods ...
GetStackHDDPath(name string) string // NEW: raw HDD_PATH from app.yaml
}
```
And implement in the `stackAdapter` in `main.go`. **For the multi-mount case** (`len(mounts) > 1`): rsync goes into per-leaf subdirectories, not the root `destDir`, so `_db/` and `_config/` are siblings (not children) of the rsync targets. The `--delete` flag only affects the target subtree, so they're safe in the multi-mount case. Adding the exclude universally doesn't hurt.
--- ---
## Phase 2: DB dump refactor ## Bug 2 (MEDIUM): Scheduled backups don't set `m.running` flag
### 2a. `internal/backup/backup.go` — DumpAll() ### File
`internal/backup/backup.go`, functions `RunBackup()` (line 272) and `RunDBDumps()` (line 181)
Currently dumps all DBs to one global directory (`m.cfg.Paths.DBDumpDir`). ### Root Cause
Only `RunFullBackup()` (line 454) sets `m.running = true` before running. The scheduler in `main.go` calls `RunBackup()` and `RunDBDumps()` directly (not via `RunFullBackup()`), so `m.running` is never set during nightly scheduled runs.
**Change to:** For each discovered DB, determine the app's drive, dump to `<drive>/backups/primary/<stack>/db-dumps/`. Code that checks `m.running`:
- `RestoreApp()` in `restore.go:25` — blocks if running
- `RunFullBackup()` in `backup.go:456` — blocks if running
- `IsRunning()` in `backup.go:496` — used by API and UI to show status
Key changes: ### Impact
- Remove references to `m.cfg.Paths.DBDumpDir` 1. **UI shows "not running" during nightly backups** — the dashboard backup card and backup page don't indicate that a backup is in progress
- Compute dump path per stack: `AppDBDumpPath(m.GetAppDrivePath(stack), stack)` 2. **Restore can overlap with nightly backup** — a user triggering "Restore" at 03:05 while the nightly restic backup is running won't see a "backup in progress" error. Both operations compete for the restic repo lock; one will fail with an opaque error
- Create dir if not exists before dumping 3. **Manual "Backup now" can overlap with scheduled backup**`RunFullBackup()` checks `m.running`, but since the scheduled `RunBackup()` doesn't set it, the manual trigger proceeds and both run concurrently
- Update `DumpResult` to include per-stack dump paths
### 2b. `internal/backup/backup.go` — DumpStackDB() ### Fix Instructions
Add `m.running` guard to both `RunBackup()` and `RunDBDumps()`. The pattern is the same as in `RunFullBackup()`:
Same refactor for single-stack dump (called by cross-drive before running Tier 2 backup). **In `RunBackup()` (line 272), add at the beginning of the function:**
### 2c. Status/validation
Currently `RefreshCache()` lists all dump files from one directory. Need to scan per-drive dump directories instead.
- Scan all registered drives (from settings or deployed stacks)
- For each drive, glob `<drive>/backups/primary/*/db-dumps/*.sql`
- Aggregate results
---
## Phase 3: Restic backup refactor
### 3a. `internal/backup/restic.go` — ResticManager
Currently `ResticManager` has a single `repoPath`. Need to support multiple repos.
**Option A:** Make ResticManager stateless — pass repoPath per operation.
**Option B:** Create multiple ResticManager instances.
**Recommend Option A** — cleaner for per-drive operations. Refactor all ResticManager methods to accept `repoPath` as parameter instead of using `r.repoPath`:
- `EnsureInitialized(repoPath string) error`
- `RunBackup(ctx, repoPath string, paths []string, tags []string) (*SnapshotResult, error)`
- `ListSnapshots(repoPath string) ([]SnapshotInfo, error)`
- `GetRepoStats(repoPath string) (*RepoStats, error)`
- `RunCheck(repoPath string) error`
- `RunPrune(repoPath string) error`
- etc.
Keep `r.passwordFile`, `r.cacheDir`, `r.logger` as instance fields.
### 3b. `internal/backup/backup.go` — RunBackup()
Currently:
```go
paths := []string{stacksDir, dbDumpDir, controllerYaml}
paths = append(paths, appPaths...)
// one restic backup
```
**Change to:**
```go ```go
func (m *Manager) RunBackup(ctx context.Context) error { func (m *Manager) RunBackup(ctx context.Context) error {
// Group deployed stacks by drive m.mu.Lock()
driveStacks := m.groupStacksByDrive() if m.running {
m.mu.Unlock()
return fmt.Errorf("backup already in progress")
}
m.running = true
m.mu.Unlock()
defer func() {
m.mu.Lock()
m.running = false
m.mu.Unlock()
}()
infraPaths := []string{ start := time.Now()
m.cfg.Paths.StacksDir, // ... rest of existing function
"/opt/docker/felhom-controller/controller.yaml",
}
for drivePath, stacks := range driveStacks {
repoPath := PrimaryResticRepoPath(drivePath)
m.restic.EnsureInitialized(repoPath)
var paths []string
// Always include infra (compose configs + controller.yaml) in every repo
paths = append(paths, infraPaths...)
for _, stack := range stacks {
// App data (appdata/<stack>/)
appData := AppDataPath(drivePath, stack.Name)
if _, err := os.Stat(appData); err == nil {
paths = append(paths, appData)
}
// DB dumps for this stack
dumpDir := AppDBDumpPath(drivePath, stack.Name)
if _, err := os.Stat(dumpDir); err == nil {
paths = append(paths, dumpDir)
}
}
// Tag with drive name for easy filtering
tags := []string{filepath.Base(drivePath)}
m.restic.RunBackup(ctx, repoPath, paths, tags)
}
}
``` ```
### 3c. Prune, check, forget — per drive **In `RunDBDumps()` (line 181), add the same guard:**
Currently scheduled as single jobs. Need to loop over all active drive repos:
- `RunPrune()` → for each drive, prune that drive's primary restic repo
- `RunCheck()` → same
- `RunForget()` → same
### 3d. Snapshot listing & stats — aggregate
For the backup page UI:
- `ListSnapshots()` → list from all primary repos, merge and sort by time
- `GetRepoStats()` → aggregate total size and snapshot count across repos
- Tag snapshots with drive name so UI can optionally group them
### 3e. Monitoring pings
After ALL drive backups complete (not per-drive), send the backup ping. If ANY drive fails, the ping is not sent (or sent as failure).
---
## Phase 4: Cross-drive (secondary) backup refactor
### 4a. `internal/backup/crossdrive.go` — runRsyncBackup()
**Current:** `destDir = filepath.Join(destBase, "backups", "rsync", stackName)`
**New:** `destDir = AppSecondaryRsyncPath(destBase, stackName)``<dest>/backups/secondary/<stack>/rsync/`
Update all path computations:
- `destDir` construction
- `_db/` subdirectory (now under rsync/ too)
- `_config/` subdirectory
- Size calculation path
### 4b. `internal/backup/crossdrive.go` — runResticBackup()
**Current:** `repoPath = filepath.Join(destBase, "backups", "restic")`
**New:** `repoPath = SecondaryResticRepoPath(destBase)``<dest>/backups/secondary/restic/`
### 4c. DB dump source path
Currently: `r.dbDumpDir` (global directory)
Now: per-app dump dir: `AppDBDumpPath(appDrivePath, stackName)`
The cross-drive runner needs to know the app's home drive to find its DB dumps.
- Add `GetAppDrivePath` method to CrossDriveRunner (or pass via StackDataProvider)
---
## Phase 5: Protected paths & delete safety
### 5a. `internal/stacks/delete.go` — ProtectedHDDPaths()
**Current:**
```go ```go
return map[string]bool{ func (m *Manager) RunDBDumps(ctx context.Context) error {
hddPath: true, m.mu.Lock()
filepath.Join(hddPath, "media"): true, if m.running {
filepath.Join(hddPath, "storage"): true, m.mu.Unlock()
filepath.Join(hddPath, "Dokumentumok"): true, return fmt.Errorf("backup already in progress")
filepath.Join(hddPath, "appdata"): true, }
} m.running = true
m.mu.Unlock()
defer func() {
m.mu.Lock()
m.running = false
m.mu.Unlock()
}()
start := time.Now()
// ... rest of existing function
``` ```
**Change to:** **IMPORTANT:** `RunFullBackup()` calls `RunDBDumps()` then `RunBackup()` internally. After adding the guards above, `RunFullBackup()` would deadlock on itself (it sets `m.running=true`, then calls `RunDBDumps()` which also tries to set it). Fix by either:
**Option A (recommended):** Extract the logic into internal methods that DON'T check the flag:
```go ```go
return map[string]bool{ // Public methods — set the running guard
hddPath: true, func (m *Manager) RunDBDumps(ctx context.Context) error {
filepath.Join(hddPath, "appdata"): true, if err := m.acquireRunning(); err != nil {
filepath.Join(hddPath, "backups"): true, return err
filepath.Join(hddPath, "media"): true, }
filepath.Join(hddPath, "Dokumentumok"): true, defer m.releaseRunning()
return m.runDBDumpsInternal(ctx)
}
func (m *Manager) RunBackup(ctx context.Context) error {
if err := m.acquireRunning(); err != nil {
return err
}
defer m.releaseRunning()
return m.runBackupInternal(ctx)
}
func (m *Manager) RunFullBackup(ctx context.Context) error {
if err := m.acquireRunning(); err != nil {
return err
}
defer m.releaseRunning()
if err := m.runDBDumpsInternal(ctx); err != nil {
m.logger.Printf("[WARN] DB dump had errors, continuing with backup anyway")
}
return m.runBackupInternal(ctx)
}
// Helper methods
func (m *Manager) acquireRunning() error {
m.mu.Lock()
defer m.mu.Unlock()
if m.running {
return fmt.Errorf("backup already in progress")
}
m.running = true
return nil
}
func (m *Manager) releaseRunning() {
m.mu.Lock()
m.running = false
m.mu.Unlock()
}
// Internal methods — no guard, caller must hold running flag
func (m *Manager) runDBDumpsInternal(ctx context.Context) error {
// ... current RunDBDumps body (without the running guard)
}
func (m *Manager) runBackupInternal(ctx context.Context) error {
// ... current RunBackup body (without the running guard)
} }
``` ```
Remove `storage` (gone), add `backups`. **Option B (simpler):** Only add the guard to `RunBackup()` and `RunDBDumps()`, but make `RunFullBackup()` NOT call them — instead inline the logic. This duplicates code but avoids the refactor.
**Option A is recommended** — it's cleaner and ensures all entry points are guarded.
--- ---
## Phase 6: Filebrowser mount sync ## Bug 3 (MEDIUM): `ValidateDestination` silently succeeds when disk usage can't be read
### 6a. `internal/web/handlers.go` — syncFileBrowserMounts() ### File
`internal/backup/crossdrive.go`, function `ValidateDestination`, line 215
**Current:** Mounts each registered path as one volume: `<path>:/srv/<basename>` ### Root Cause
This exposes EVERYTHING on the drive, including encrypted restic repos and raw appdata.
**Change to:** Mount specific subdirectories per drive:
```go ```go
for _, sp := range paths { if di := system.GetDiskUsage(path); di != nil {
driveName := filepath.Base(sp.Path) // "hdd_1", "sys_drive" // Space checks only execute if di != nil
...
}
return nil // ← Success even when di was nil (space unknown)
```
// User media If `system.GetDiskUsage(path)` returns `nil` (e.g., unsupported filesystem like FUSE/NFS, permission issue, or a path on a virtual filesystem), all space checks are skipped and validation passes without any space verification.
mediaPath := filepath.Join(sp.Path, "media")
if dirExists(mediaPath) {
storageMounts = append(storageMounts,
fmt.Sprintf(" - %s:/srv/%s/media", mediaPath, driveName))
}
// User documents ### Impact
docsPath := filepath.Join(sp.Path, "Dokumentumok") A destination with zero free space could pass validation, leading to a backup failure mid-operation. The rsync or restic process would fail with "no space left on device", but only after partial work, leaving an incomplete backup.
if dirExists(docsPath) {
storageMounts = append(storageMounts,
fmt.Sprintf(" - %s:/srv/%s/Dokumentumok", docsPath, driveName))
}
// Secondary backup copies (rsync — browseable for file recovery) ### Fix Instructions
secPath := filepath.Join(sp.Path, "backups", "secondary") After the `GetDiskUsage` check, add a warning log and optionally block:
if dirExists(secPath) {
storageMounts = append(storageMounts, ```go
fmt.Sprintf(" - %s:/srv/%s/backups:ro", secPath, driveName)) di := system.GetDiskUsage(path)
} if di == nil {
r.logger.Printf("[WARN] Cannot determine disk usage for %s — proceeding without space verification", path)
return nil
} }
``` ```
This gives Filebrowser users access to: This preserves backward compatibility (doesn't block) but makes the situation visible in logs. If you want to be stricter, return an error instead:
- Their media files (movies, music, etc.)
- Their documents
- Secondary backup copies (rsync) for file recovery
- NOT raw appdata (dangerous), NOT restic repos (useless)
--- ```go
if di == nil {
## Phase 7: App catalog changes return fmt.Errorf("destination %s: cannot determine disk usage", path)
}
### 7a. Compose file updates (`app-catalog-felhom.eu`)
All 11+ apps with `needs_hdd: true`: rename `${HDD_PATH}/storage/``${HDD_PATH}/appdata/` in volume mounts.
**Apps to update** (grep for `storage/` in compose files):
- immich, paperless-ngx, audiobookshelf, calibre-web, emby, jellyfin, komga, navidrome, nextcloud, plex, radarr, romm, sonarr
Each compose file's volumes section changes, e.g.:
```yaml
# Before:
- ${HDD_PATH}/storage/immich:/usr/src/app/upload
# After:
- ${HDD_PATH}/appdata/immich:/usr/src/app/upload
``` ```
### 7b. Media-centric apps (Jellyfin, Plex, Emby, Radarr, Sonarr) **Recommended:** Use the warning-only approach for now. A destination that can't report disk usage is unusual and worth logging, but blocking it could break setups with exotic filesystems (CIFS mounts, etc.).
These apps also mount media directories. Check if they reference `${HDD_PATH}/media/` — if so, that's correct (no rename needed for media/).
### 7c. `.felhom.yml` files
The `HDD_PATH` field metadata doesn't reference `storage/` — it just declares the env var. Description says "külső merevlemez elérési útja" which is fine. No changes needed.
--- ---
## Phase 8: Setup script updates ## Bug 4 (MEDIUM): Empty `systemDataPath` produces relative dump paths
### 8a. `scripts/docker-setup.sh` ### File
`internal/backup/backup.go`, function `DumpStackDB`, line 570
- Update `install_filebrowser()` volume mounts to use new per-subdirectory pattern ### Root Cause
- Or: remove Filebrowser initial mounts entirely (controller will sync them on startup) ```go
drivePath := m.GetAppDrivePath(stackName) // Returns "" if stackProvider is nil AND systemDataPath is ""
### 8b. `scripts/hdd-setup.sh` dumpDir := AppDBDumpPath(drivePath, stackName) // AppDBDumpPath("", "mealie") = "backups/primary/mealie/db-dumps" (RELATIVE!)
- Update `STORAGE_DIRS` to remove `storage/` entries
- Update to use `appdata/` naming
- Or: mark as deprecated (controller handles disk init now)
---
## Phase 9: controller.yaml update
New controller.yaml for demo node (after OS reinstall with SSD partition):
```yaml
paths:
stacks_dir: "/opt/docker/stacks"
system_data_path: "/mnt/sys_drive"
backup:
enabled: true
restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
db_dump_schedule: "02:30"
restic_schedule: "03:00"
retention:
keep_daily: 7
keep_weekly: 4
keep_monthly: 6
prune_schedule: "sunday"
``` ```
No more `restic_repo`, `db_dump_dir`, `backup_dir`. If `systemDataPath` is empty (misconfiguration or missing `system_data_path` in controller.yaml) and the stack has no HDD_PATH, `GetAppDrivePath()` returns an empty string. `AppDBDumpPath("", stackName)` then produces a **relative path** instead of an absolute one. The `DumpOne()` function creates directories relative to the process's working directory (typically `/opt/docker/felhom-controller/`), orphaning the dumps where nothing else looks for them.
The same issue exists in `RunDBDumps()` (line 212) and `RunBackup()` (line 308, 321).
### Impact
- DB dumps written to unexpected location
- Restic backup doesn't find them (looks in the expected absolute path)
- Backup appears to succeed but DB dumps are orphaned
- Restore would not have fresh DB dumps
### Fix Instructions
Add a validation check in `GetAppDrivePath()`:
```go
func (m *Manager) GetAppDrivePath(stackName string) string {
if m.stackProvider != nil {
if hddPath := m.stackProvider.GetStackHDDPath(stackName); hddPath != "" {
return hddPath
}
}
if m.systemDataPath == "" {
m.logger.Printf("[ERROR] systemDataPath is empty — cannot determine drive for %s", stackName)
}
return m.systemDataPath
}
```
Also add a startup validation in `NewManager()`:
```go
func NewManager(cfg *config.Config, pinger *monitor.Pinger, sett *settings.Settings, logger *log.Logger) *Manager {
if cfg.Paths.SystemDataPath == "" {
logger.Printf("[WARN] SystemDataPath is empty in config — SSD-only apps will not have correct backup paths")
}
return &Manager{
// ... existing fields
}
}
```
And in `DumpStackDB()`, add a guard before using the path:
```go
drivePath := m.GetAppDrivePath(stackName)
if drivePath == "" || !filepath.IsAbs(drivePath) {
return fmt.Errorf("cannot determine absolute drive path for %s (systemDataPath not configured?)", stackName)
}
dumpDir := AppDBDumpPath(drivePath, stackName)
```
--- ---
## Phase 10: UI — Tároló section (simple update) ## Testing Checklist
The Tároló section on the backup page needs to work with the new multi-drive, multi-repo architecture. Since we already agreed to show combined stats (not paths): After fixing all bugs, verify:
- **Tier 1 summary:** Aggregate snapshot count + total size across all primary repos - [ ] **Bug 1:** Run cross-drive rsync backup twice for a single-mount app → `_db/` and `_config/` persist between runs (not deleted by `--delete`)
- **Tier 2 summary:** How many apps configured, total size - [ ] **Bug 1:** Run cross-drive rsync backup for a multi-mount app → behavior unchanged
- **Keep:** Encryption key display (same password for all repos) - [ ] **Bug 2:** During nightly scheduled backup, UI shows "Mentés folyamatban" on dashboard
- **Remove:** Path displays, DB dump section (unnecessary detail) - [ ] **Bug 2:** During nightly scheduled backup, "Visszaállítás" button shows "already in progress" error
- [ ] **Bug 2:** `RunFullBackup()` still works correctly (calls internal methods, doesn't deadlock)
- [ ] **Bug 2:** Scheduled `db-dump` and `backup` jobs don't deadlock with each other (they run at different times, but if one overruns, the next should get "already in progress" error)
- [ ] **Bug 3:** Cross-drive backup to a path where `GetDiskUsage` returns nil → logs a warning (not a silent pass)
- [ ] **Bug 4:** With `system_data_path: ""` in config → startup log warning + `DumpStackDB()` returns error instead of using relative path
- [ ] Build succeeds: `go build ./...`
--- ---
## Gotchas & risks ## Implementation Notes
1. **Grep for ALL references** to removed config fields: `BackupDir`, `DBDumpDir`, `ResticRepo`, `cfg.Backup.ResticRepo`, `cfg.Paths.DBDumpDir`, `cfg.Paths.BackupDir` - **Do NOT change** any template files, CSS, or UI text — only Go backend files
2. **ResticManager refactor** changes all call sites — grep for `m.restic.` in backup.go - **Do NOT change** any function signatures that are part of the public API (other packages import them)
3. **DB dump path in crossdrive.go** — currently `r.dbDumpDir` (global). Needs per-app resolution. - **Do NOT change** the scheduler wiring in `main.go` — the fix should be in the backup package
4. **Snapshot aggregation** — merging snapshots from multiple repos for the UI. Need to handle different repo sizes, dedup by timestamp. - **Run `go vet ./...` and `go build ./...`** after all changes to verify no compilation errors
5. **New restic repo initialization** — when a new drive is registered and first backup runs, `restic init` must succeed before `restic backup`. The `EnsureInitialized` pattern already exists. - Keep all log messages in English (UI text is Hungarian, but log messages are English)
6. **Empty drives** — a drive with no apps deployed yet should NOT get a restic backup (empty paths). Skip drives with zero apps.
7. **The `systemDataPath` as fallback** — SSD-only apps (Mealie, Gokapi) have no HDD_PATH. Their drive is `cfg.Paths.SystemDataPath`. Make sure this path exists and is registered as a storage path.
8. **Compose config files in multiple repos**`/opt/docker/stacks/` is included in every drive's primary repo. This means the same files are in multiple repos. That's intentional (each repo is self-contained) but uses slightly more storage.
9. **The `ParseComposeHDDMounts` function** references `${HDD_PATH}` with `storage/` subdirs. After rename to `appdata/`, the compose files change, so the parsed mounts change too. The function itself is generic (parses any `${HDD_PATH}` prefix) so it doesn't need code changes — only the compose templates change.
10. **docker-compose.yml volumes in felhom-controller** — currently `- /srv/backups:/srv/backups`. This mount becomes unnecessary since all backups are under `/mnt/`. The `/mnt:/mnt:rshared` mount already provides access. Can remove the `/srv/backups` volume mount from the controller's compose file.
--- ---
## Implementation order ## Files to Modify
1. **Phase 1** — Config + path helpers + StackDataProvider update (foundation, everything depends on this) | File | Bug(s) | Changes |
2. **Phase 7** — App catalog compose files (independent, can do in parallel) |------|--------|---------|
3. **Phase 5** — Protected paths (quick, independent) | `internal/backup/crossdrive.go` | 1, 3 | Add rsync `--exclude` flags; add nil check for GetDiskUsage |
4. **Phase 2** — DB dump refactor | `internal/backup/backup.go` | 2, 4 | Extract `acquireRunning`/`releaseRunning` + internal methods; add systemDataPath validation |
5. **Phase 3** — Restic backup refactor (depends on Phase 1 + 2)
6. **Phase 4** — Cross-drive backup refactor (depends on Phase 1)
7. **Phase 6** — Filebrowser mount sync
8. **Phase 10** — UI Tároló section
9. **Phase 8** — Setup scripts
10. **Phase 9** — controller.yaml
Build, deploy to reinstalled demo node, verify.
---
## Files to modify
### Controller (deploy-felhom-compose/controller/)
| File | Phase | Changes |
|------|-------|---------|
| `internal/config/config.go` | 1a | Add SystemDataPath, remove BackupDir/DBDumpDir/ResticRepo |
| `internal/backup/paths.go` | 1b | **NEW FILE** — path computation helpers |
| `internal/backup/appdata.go` | 1d | Add GetStackHDDPath to StackDataProvider |
| `cmd/controller/main.go` | 1d | Implement GetStackHDDPath in stackAdapter |
| `internal/backup/backup.go` | 2+3 | DumpAll, DumpStackDB, RunBackup, RefreshCache, GetFullStatus, RunPrune, RunCheck |
| `internal/backup/restic.go` | 3a | Make repoPath a parameter, not instance field |
| `internal/backup/crossdrive.go` | 4 | Update destination paths, DB dump source paths |
| `internal/stacks/delete.go` | 5 | Update ProtectedHDDPaths |
| `internal/web/handlers.go` | 6+10 | syncFileBrowserMounts, backupsHandler |
| `internal/web/templates/backups.html` | 10 | Tároló section |
### App catalog (app-catalog-felhom.eu/)
| File | Phase | Changes |
|------|-------|---------|
| `templates/*/docker-compose.yml` (11+ files) | 7 | `storage/``appdata/` in volume mounts |
### Scripts
| File | Phase | Changes |
|------|-------|---------|
| `scripts/docker-setup.sh` | 8 | Filebrowser mounts, path references |
| `scripts/hdd-setup.sh` | 8 | Directory structure arrays |
### Config
| File | Phase | Changes |
|------|-------|---------|
| Demo node `controller.yaml` | 9 | New paths config |
| Demo node `docker-compose.yml` | 10 | Remove `/srv/backups` mount |