Bug Fixes from v0.12.4–v0.13.1 Code Review
This commit is contained in:
@@ -1,504 +1,339 @@
|
||||
# TASK.md — v0.14.0 Storage & Backup Architecture Overhaul
|
||||
# TASK.md — Bug Fixes from v0.12.4–v0.13.1 Code Review
|
||||
|
||||
**Version:** v0.14.0
|
||||
**Type:** Architecture overhaul — storage paths, backup structure, multi-drive support
|
||||
**Scope:** Controller Go code + app catalog compose files + setup scripts
|
||||
**Note:** Demo node will be reinstalled from scratch — no migration needed
|
||||
**Version:** v0.14.1 → v0.14.2
|
||||
**Type:** Bug fixes identified by thorough code review of today's changes (sessions 40–48)
|
||||
**Scope:** Controller Go code — `internal/backup/`, `internal/web/`
|
||||
|
||||
---
|
||||
|
||||
## Design Overview
|
||||
## Bug Summary
|
||||
|
||||
### New directory structure (per drive)
|
||||
|
||||
Every drive mount (`/mnt/sys_drive`, `/mnt/hdd_1`, `/mnt/hdd_2`, ...) uses the same layout:
|
||||
|
||||
```
|
||||
/mnt/<drive>/
|
||||
appdata/<app>/ ← live app data (renamed from "storage")
|
||||
backups/
|
||||
primary/
|
||||
<app>/db-dumps/ ← raw DB dumps per app (accessible for testing)
|
||||
restic/ ← per-drive restic repo (all apps on this drive)
|
||||
secondary/
|
||||
<app>/rsync/ ← rsync copies from apps on OTHER drives
|
||||
restic/ ← restic repo for secondary copies
|
||||
Dokumentumok/
|
||||
media/
|
||||
Download/
|
||||
movies/
|
||||
series/
|
||||
music/
|
||||
audiobooks/
|
||||
```
|
||||
|
||||
### Key rules
|
||||
|
||||
1. **An app's "home drive"** = the drive from its `HDD_PATH` env var, or `cfg.Paths.SystemDataPath` if no HDD_PATH
|
||||
2. **Primary backup** lives on the SAME drive as the app — protects against accidental deletion, app bugs
|
||||
3. **Secondary backup** lives on a DIFFERENT drive — protects against drive failure
|
||||
4. **One restic repo per drive** (in both primary and secondary) — same password for all repos
|
||||
5. **DB dumps** are raw SQL files per-app, always on the app's home drive, also included in restic
|
||||
6. **Compose configs + controller.yaml** go into EVERY primary restic repo (small, ensures self-contained restore)
|
||||
7. **`storage/` → `appdata/`** rename across all compose templates
|
||||
8. **Filebrowser** mounts per-drive subdirectories: `media/`, `Dokumentumok/`, `backups/secondary/` (for file recovery)
|
||||
| # | Severity | File | Description |
|
||||
|---|----------|------|-------------|
|
||||
| 1 | **HIGH** | `crossdrive.go` | rsync `--delete` destroys `_db/` and `_config/` directories on every single-mount run |
|
||||
| 2 | **MEDIUM** | `backup.go` | Scheduled backups don't set `m.running` flag — restore/manual backup can overlap |
|
||||
| 3 | **MEDIUM** | `crossdrive.go` | `ValidateDestination` silently succeeds when disk usage can't be read |
|
||||
| 4 | **MEDIUM** | `backup.go` | Empty `systemDataPath` silently produces relative dump paths |
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Config & path helpers
|
||||
## Bug 1 (HIGH): rsync `--delete` destroys `_db/` and `_config/` directories
|
||||
|
||||
### 1a. `internal/config/config.go`
|
||||
### File
|
||||
`internal/backup/crossdrive.go`, function `runRsyncBackup`, lines 242–276
|
||||
|
||||
**Add:**
|
||||
- `SystemDataPath string \`yaml:"system_data_path"\`` to `PathsConfig` — default `/mnt/sys_drive`
|
||||
### Root Cause
|
||||
When an app has exactly one HDD mount (`len(mounts) == 1`), rsync copies directly into `destDir/` with the `--delete` flag. The `_db/` and `_config/` subdirectories — created by the DB dump copy (lines 278–286) and config rsync (lines 288–303) from the **previous** backup run — don't exist in the source mount, so `--delete` removes them from the destination **before** the current run re-creates them.
|
||||
|
||||
**Remove from struct:**
|
||||
- `BackupDir string` from PathsConfig
|
||||
- `DBDumpDir string` from PathsConfig
|
||||
- `ResticRepo string` from BackupConfig
|
||||
### Impact
|
||||
1. **Brief window of incomplete backup** on every run — between rsync completion and the subsequent `_db`/`_config` copy steps, the backup destination is missing DB dumps and config
|
||||
2. **Data loss if interrupted** — if the process is killed (OOM, power loss, context cancellation) between the rsync step and the copy steps, `_db/` and `_config/` are gone until the next successful complete run
|
||||
3. **Wasted I/O** — every run deletes and re-creates these directories unnecessarily
|
||||
|
||||
**Keep:**
|
||||
- `ResticPasswordFile string` in BackupConfig (shared across all repos)
|
||||
- `HDDPath string` in PathsConfig (legacy, still used as default storage)
|
||||
### Reproduction
|
||||
1. Deploy Immich with a single HDD mount to `/mnt/hdd_1/storage/immich`
|
||||
2. Configure cross-drive rsync backup to `/mnt/sys_drive`
|
||||
3. Run backup twice
|
||||
4. After first run: `ls /mnt/sys_drive/backups/secondary/immich/rsync/` shows `_db/`, `_config/`, and app data
|
||||
5. During second run's rsync phase: `_db/` and `_config/` disappear from dest
|
||||
6. After second run completes: `_db/` and `_config/` reappear (re-created)
|
||||
|
||||
**Update `applyDefaults()`:**
|
||||
- Remove: `d(&cfg.Paths.BackupDir, "/srv/backups")`
|
||||
- Remove: `d(&cfg.Paths.DBDumpDir, "/srv/backups/db-dumps")`
|
||||
- Remove: `d(&cfg.Backup.ResticRepo, "/srv/backups/restic-repo")`
|
||||
- Add: `d(&cfg.Paths.SystemDataPath, "/mnt/sys_drive")`
|
||||
|
||||
**Gotcha:** All code referencing `cfg.Paths.BackupDir`, `cfg.Paths.DBDumpDir`, `cfg.Backup.ResticRepo` will break. Grep for all references and update.
|
||||
|
||||
### 1b. New file: `internal/backup/paths.go`
|
||||
|
||||
Path computation helpers (pure functions, no state):
|
||||
### Fix Instructions
|
||||
In `runRsyncBackup()`, add an `--exclude` flag with pattern `_*` to the rsync command. This prevents `--delete` from touching any controller-managed directories (which all use underscore prefix: `_db`, `_config`, and any future ones) while still cleaning up stale user data.
|
||||
|
||||
**Change the rsync command construction (around line 267) from:**
|
||||
```go
|
||||
package backup
|
||||
cmd := exec.CommandContext(ctx, "rsync", "-a", "--delete",
|
||||
"--exclude", "backups/*.sql.gz",
|
||||
"--exclude", "backups/*.sql",
|
||||
"--exclude", "backups/*.dump",
|
||||
src, dst)
|
||||
```
|
||||
|
||||
import "path/filepath"
|
||||
**To:**
|
||||
```go
|
||||
cmd := exec.CommandContext(ctx, "rsync", "-a", "--delete",
|
||||
"--exclude", "_*",
|
||||
"--exclude", "backups/*.sql.gz",
|
||||
"--exclude", "backups/*.sql",
|
||||
"--exclude", "backups/*.dump",
|
||||
src, dst)
|
||||
```
|
||||
|
||||
func PrimaryBackupPath(drivePath string) string {
|
||||
return filepath.Join(drivePath, "backups", "primary")
|
||||
**Why `_*` instead of listing `_db` and `_config` individually:**
|
||||
- Future-proof — if we ever add another controller-managed subdirectory (e.g., `_metadata`), it's automatically protected
|
||||
- The underscore prefix is a convention for controller-managed dirs; no real app data directory starts with `_`
|
||||
- Simpler to maintain (one pattern vs growing list)
|
||||
|
||||
**Why this is safe:** The `_db` and `_config` directories are managed exclusively by the controller (lines 278–303), never by the user's app. No user data directory starts with `_`. The underscore prefix convention was specifically chosen to avoid collision with app data paths.
|
||||
|
||||
**Note on `_infra`:** The `_infra/` directory lives at `<dest>/backups/secondary/_infra/` (a sibling of app directories, NOT inside any app's rsync target directory). So it's not affected by `--delete` regardless. The `_*` exclude is still good practice in case the directory structure changes later.
|
||||
|
||||
**For the multi-mount case** (`len(mounts) > 1`): rsync goes into per-leaf subdirectories, not the root `destDir`, so `_db/` and `_config/` are siblings (not children) of the rsync targets. The `--delete` flag only affects the target subtree, so they're safe in the multi-mount case. Adding the exclude universally doesn't hurt.
|
||||
|
||||
---
|
||||
|
||||
## Bug 2 (MEDIUM): Scheduled backups don't set `m.running` flag
|
||||
|
||||
### File
|
||||
`internal/backup/backup.go`, functions `RunBackup()` (line 272) and `RunDBDumps()` (line 181)
|
||||
|
||||
### Root Cause
|
||||
Only `RunFullBackup()` (line 454) sets `m.running = true` before running. The scheduler in `main.go` calls `RunBackup()` and `RunDBDumps()` directly (not via `RunFullBackup()`), so `m.running` is never set during nightly scheduled runs.
|
||||
|
||||
Code that checks `m.running`:
|
||||
- `RestoreApp()` in `restore.go:25` — blocks if running
|
||||
- `RunFullBackup()` in `backup.go:456` — blocks if running
|
||||
- `IsRunning()` in `backup.go:496` — used by API and UI to show status
|
||||
|
||||
### Impact
|
||||
1. **UI shows "not running" during nightly backups** — the dashboard backup card and backup page don't indicate that a backup is in progress
|
||||
2. **Restore can overlap with nightly backup** — a user triggering "Restore" at 03:05 while the nightly restic backup is running won't see a "backup in progress" error. Both operations compete for the restic repo lock; one will fail with an opaque error
|
||||
3. **Manual "Backup now" can overlap with scheduled backup** — `RunFullBackup()` checks `m.running`, but since the scheduled `RunBackup()` doesn't set it, the manual trigger proceeds and both run concurrently
|
||||
|
||||
### Fix Instructions
|
||||
Add `m.running` guard to both `RunBackup()` and `RunDBDumps()`. The pattern is the same as in `RunFullBackup()`:
|
||||
|
||||
**In `RunBackup()` (line 272), add at the beginning of the function:**
|
||||
```go
|
||||
func (m *Manager) RunBackup(ctx context.Context) error {
|
||||
m.mu.Lock()
|
||||
if m.running {
|
||||
m.mu.Unlock()
|
||||
return fmt.Errorf("backup already in progress")
|
||||
}
|
||||
m.running = true
|
||||
m.mu.Unlock()
|
||||
defer func() {
|
||||
m.mu.Lock()
|
||||
m.running = false
|
||||
m.mu.Unlock()
|
||||
}()
|
||||
|
||||
start := time.Now()
|
||||
// ... rest of existing function
|
||||
```
|
||||
|
||||
**In `RunDBDumps()` (line 181), add the same guard:**
|
||||
```go
|
||||
func (m *Manager) RunDBDumps(ctx context.Context) error {
|
||||
m.mu.Lock()
|
||||
if m.running {
|
||||
m.mu.Unlock()
|
||||
return fmt.Errorf("backup already in progress")
|
||||
}
|
||||
m.running = true
|
||||
m.mu.Unlock()
|
||||
defer func() {
|
||||
m.mu.Lock()
|
||||
m.running = false
|
||||
m.mu.Unlock()
|
||||
}()
|
||||
|
||||
start := time.Now()
|
||||
// ... rest of existing function
|
||||
```
|
||||
|
||||
**IMPORTANT:** `RunFullBackup()` calls `RunDBDumps()` then `RunBackup()` internally. After adding the guards above, `RunFullBackup()` would deadlock on itself (it sets `m.running=true`, then calls `RunDBDumps()` which also tries to set it). Fix by either:
|
||||
|
||||
**Option A (recommended):** Extract the logic into internal methods that DON'T check the flag:
|
||||
```go
|
||||
// Public methods — set the running guard
|
||||
func (m *Manager) RunDBDumps(ctx context.Context) error {
|
||||
if err := m.acquireRunning(); err != nil {
|
||||
return err
|
||||
}
|
||||
defer m.releaseRunning()
|
||||
return m.runDBDumpsInternal(ctx)
|
||||
}
|
||||
|
||||
func PrimaryResticRepoPath(drivePath string) string {
|
||||
return filepath.Join(drivePath, "backups", "primary", "restic")
|
||||
func (m *Manager) RunBackup(ctx context.Context) error {
|
||||
if err := m.acquireRunning(); err != nil {
|
||||
return err
|
||||
}
|
||||
defer m.releaseRunning()
|
||||
return m.runBackupInternal(ctx)
|
||||
}
|
||||
|
||||
func AppDBDumpPath(drivePath, stackName string) string {
|
||||
return filepath.Join(drivePath, "backups", "primary", stackName, "db-dumps")
|
||||
func (m *Manager) RunFullBackup(ctx context.Context) error {
|
||||
if err := m.acquireRunning(); err != nil {
|
||||
return err
|
||||
}
|
||||
defer m.releaseRunning()
|
||||
|
||||
if err := m.runDBDumpsInternal(ctx); err != nil {
|
||||
m.logger.Printf("[WARN] DB dump had errors, continuing with backup anyway")
|
||||
}
|
||||
return m.runBackupInternal(ctx)
|
||||
}
|
||||
|
||||
func SecondaryBackupPath(drivePath string) string {
|
||||
return filepath.Join(drivePath, "backups", "secondary")
|
||||
// Helper methods
|
||||
func (m *Manager) acquireRunning() error {
|
||||
m.mu.Lock()
|
||||
defer m.mu.Unlock()
|
||||
if m.running {
|
||||
return fmt.Errorf("backup already in progress")
|
||||
}
|
||||
m.running = true
|
||||
return nil
|
||||
}
|
||||
|
||||
func AppSecondaryRsyncPath(drivePath, stackName string) string {
|
||||
return filepath.Join(drivePath, "backups", "secondary", stackName, "rsync")
|
||||
func (m *Manager) releaseRunning() {
|
||||
m.mu.Lock()
|
||||
m.running = false
|
||||
m.mu.Unlock()
|
||||
}
|
||||
|
||||
func SecondaryResticRepoPath(drivePath string) string {
|
||||
return filepath.Join(drivePath, "backups", "secondary", "restic")
|
||||
// Internal methods — no guard, caller must hold running flag
|
||||
func (m *Manager) runDBDumpsInternal(ctx context.Context) error {
|
||||
// ... current RunDBDumps body (without the running guard)
|
||||
}
|
||||
|
||||
func AppDataPath(drivePath, stackName string) string {
|
||||
return filepath.Join(drivePath, "appdata", stackName)
|
||||
func (m *Manager) runBackupInternal(ctx context.Context) error {
|
||||
// ... current RunBackup body (without the running guard)
|
||||
}
|
||||
```
|
||||
|
||||
### 1c. App drive resolution
|
||||
**Option B (simpler):** Only add the guard to `RunBackup()` and `RunDBDumps()`, but make `RunFullBackup()` NOT call them — instead inline the logic. This duplicates code but avoids the refactor.
|
||||
|
||||
Need a method to determine which drive an app lives on. Add to the backup Manager or StackDataProvider:
|
||||
**Option A is recommended** — it's cleaner and ensures all entry points are guarded.
|
||||
|
||||
---
|
||||
|
||||
## Bug 3 (MEDIUM): `ValidateDestination` silently succeeds when disk usage can't be read
|
||||
|
||||
### File
|
||||
`internal/backup/crossdrive.go`, function `ValidateDestination`, line 215
|
||||
|
||||
### Root Cause
|
||||
```go
|
||||
if di := system.GetDiskUsage(path); di != nil {
|
||||
// Space checks only execute if di != nil
|
||||
...
|
||||
}
|
||||
return nil // ← Success even when di was nil (space unknown)
|
||||
```
|
||||
|
||||
If `system.GetDiskUsage(path)` returns `nil` (e.g., unsupported filesystem like FUSE/NFS, permission issue, or a path on a virtual filesystem), all space checks are skipped and validation passes without any space verification.
|
||||
|
||||
### Impact
|
||||
A destination with zero free space could pass validation, leading to a backup failure mid-operation. The rsync or restic process would fail with "no space left on device", but only after partial work, leaving an incomplete backup.
|
||||
|
||||
### Fix Instructions
|
||||
After the `GetDiskUsage` check, add a warning log and optionally block:
|
||||
|
||||
```go
|
||||
di := system.GetDiskUsage(path)
|
||||
if di == nil {
|
||||
r.logger.Printf("[WARN] Cannot determine disk usage for %s — proceeding without space verification", path)
|
||||
return nil
|
||||
}
|
||||
```
|
||||
|
||||
This preserves backward compatibility (doesn't block) but makes the situation visible in logs. If you want to be stricter, return an error instead:
|
||||
|
||||
```go
|
||||
if di == nil {
|
||||
return fmt.Errorf("destination %s: cannot determine disk usage", path)
|
||||
}
|
||||
```
|
||||
|
||||
**Recommended:** Use the warning-only approach for now. A destination that can't report disk usage is unusual and worth logging, but blocking it could break setups with exotic filesystems (CIFS mounts, etc.).
|
||||
|
||||
---
|
||||
|
||||
## Bug 4 (MEDIUM): Empty `systemDataPath` produces relative dump paths
|
||||
|
||||
### File
|
||||
`internal/backup/backup.go`, function `DumpStackDB`, line 570
|
||||
|
||||
### Root Cause
|
||||
```go
|
||||
drivePath := m.GetAppDrivePath(stackName) // Returns "" if stackProvider is nil AND systemDataPath is ""
|
||||
dumpDir := AppDBDumpPath(drivePath, stackName) // AppDBDumpPath("", "mealie") = "backups/primary/mealie/db-dumps" (RELATIVE!)
|
||||
```
|
||||
|
||||
If `systemDataPath` is empty (misconfiguration or missing `system_data_path` in controller.yaml) and the stack has no HDD_PATH, `GetAppDrivePath()` returns an empty string. `AppDBDumpPath("", stackName)` then produces a **relative path** instead of an absolute one. The `DumpOne()` function creates directories relative to the process's working directory (typically `/opt/docker/felhom-controller/`), orphaning the dumps where nothing else looks for them.
|
||||
|
||||
The same issue exists in `RunDBDumps()` (line 212) and `RunBackup()` (line 308, 321).
|
||||
|
||||
### Impact
|
||||
- DB dumps written to unexpected location
|
||||
- Restic backup doesn't find them (looks in the expected absolute path)
|
||||
- Backup appears to succeed but DB dumps are orphaned
|
||||
- Restore would not have fresh DB dumps
|
||||
|
||||
### Fix Instructions
|
||||
Add a validation check in `GetAppDrivePath()`:
|
||||
|
||||
```go
|
||||
// GetAppDrivePath returns the drive path for an app.
|
||||
// Uses HDD_PATH from app.yaml if set, otherwise falls back to system data path.
|
||||
func (m *Manager) GetAppDrivePath(stackName string) string {
|
||||
if mounts := m.stackProvider.GetStackHDDMounts(stackName); len(mounts) > 0 {
|
||||
// The HDD_PATH is the mount point — extract the drive from the first mount
|
||||
// e.g., /mnt/hdd_1/appdata/immich → /mnt/hdd_1
|
||||
// Actually, we need the HDD_PATH itself, not the mounts
|
||||
if m.stackProvider != nil {
|
||||
if hddPath := m.stackProvider.GetStackHDDPath(stackName); hddPath != "" {
|
||||
return hddPath
|
||||
}
|
||||
}
|
||||
if m.systemDataPath == "" {
|
||||
m.logger.Printf("[ERROR] systemDataPath is empty — cannot determine drive for %s", stackName)
|
||||
}
|
||||
return m.systemDataPath
|
||||
}
|
||||
```
|
||||
|
||||
**Gotcha:** `GetStackHDDMounts` returns resolved mount paths (e.g., `/mnt/hdd_1/appdata/immich`), not the raw `HDD_PATH` value. Need a way to get the raw HDD_PATH for a stack. Options:
|
||||
- Add `GetStackHDDPath(name string) string` to `StackDataProvider` interface
|
||||
- Or: derive the drive from mount paths by finding the common `/mnt/<drive>` prefix
|
||||
- Best: add to StackDataProvider — clean, explicit
|
||||
Also add a startup validation in `NewManager()`:
|
||||
|
||||
### 1d. StackDataProvider interface update
|
||||
|
||||
In `internal/backup/appdata.go`, add:
|
||||
```go
|
||||
type StackDataProvider interface {
|
||||
// ... existing methods ...
|
||||
GetStackHDDPath(name string) string // NEW: raw HDD_PATH from app.yaml
|
||||
func NewManager(cfg *config.Config, pinger *monitor.Pinger, sett *settings.Settings, logger *log.Logger) *Manager {
|
||||
if cfg.Paths.SystemDataPath == "" {
|
||||
logger.Printf("[WARN] SystemDataPath is empty in config — SSD-only apps will not have correct backup paths")
|
||||
}
|
||||
return &Manager{
|
||||
// ... existing fields
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And implement in the `stackAdapter` in `main.go`.
|
||||
And in `DumpStackDB()`, add a guard before using the path:
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: DB dump refactor
|
||||
|
||||
### 2a. `internal/backup/backup.go` — DumpAll()
|
||||
|
||||
Currently dumps all DBs to one global directory (`m.cfg.Paths.DBDumpDir`).
|
||||
|
||||
**Change to:** For each discovered DB, determine the app's drive, dump to `<drive>/backups/primary/<stack>/db-dumps/`.
|
||||
|
||||
Key changes:
|
||||
- Remove references to `m.cfg.Paths.DBDumpDir`
|
||||
- Compute dump path per stack: `AppDBDumpPath(m.GetAppDrivePath(stack), stack)`
|
||||
- Create dir if not exists before dumping
|
||||
- Update `DumpResult` to include per-stack dump paths
|
||||
|
||||
### 2b. `internal/backup/backup.go` — DumpStackDB()
|
||||
|
||||
Same refactor for single-stack dump (called by cross-drive before running Tier 2 backup).
|
||||
|
||||
### 2c. Status/validation
|
||||
|
||||
Currently `RefreshCache()` lists all dump files from one directory. Need to scan per-drive dump directories instead.
|
||||
|
||||
- Scan all registered drives (from settings or deployed stacks)
|
||||
- For each drive, glob `<drive>/backups/primary/*/db-dumps/*.sql`
|
||||
- Aggregate results
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Restic backup refactor
|
||||
|
||||
### 3a. `internal/backup/restic.go` — ResticManager
|
||||
|
||||
Currently `ResticManager` has a single `repoPath`. Need to support multiple repos.
|
||||
|
||||
**Option A:** Make ResticManager stateless — pass repoPath per operation.
|
||||
**Option B:** Create multiple ResticManager instances.
|
||||
|
||||
**Recommend Option A** — cleaner for per-drive operations. Refactor all ResticManager methods to accept `repoPath` as parameter instead of using `r.repoPath`:
|
||||
- `EnsureInitialized(repoPath string) error`
|
||||
- `RunBackup(ctx, repoPath string, paths []string, tags []string) (*SnapshotResult, error)`
|
||||
- `ListSnapshots(repoPath string) ([]SnapshotInfo, error)`
|
||||
- `GetRepoStats(repoPath string) (*RepoStats, error)`
|
||||
- `RunCheck(repoPath string) error`
|
||||
- `RunPrune(repoPath string) error`
|
||||
- etc.
|
||||
|
||||
Keep `r.passwordFile`, `r.cacheDir`, `r.logger` as instance fields.
|
||||
|
||||
### 3b. `internal/backup/backup.go` — RunBackup()
|
||||
|
||||
Currently:
|
||||
```go
|
||||
paths := []string{stacksDir, dbDumpDir, controllerYaml}
|
||||
paths = append(paths, appPaths...)
|
||||
// one restic backup
|
||||
drivePath := m.GetAppDrivePath(stackName)
|
||||
if drivePath == "" || !filepath.IsAbs(drivePath) {
|
||||
return fmt.Errorf("cannot determine absolute drive path for %s (systemDataPath not configured?)", stackName)
|
||||
}
|
||||
dumpDir := AppDBDumpPath(drivePath, stackName)
|
||||
```
|
||||
|
||||
**Change to:**
|
||||
```go
|
||||
func (m *Manager) RunBackup(ctx context.Context) error {
|
||||
// Group deployed stacks by drive
|
||||
driveStacks := m.groupStacksByDrive()
|
||||
---
|
||||
|
||||
infraPaths := []string{
|
||||
m.cfg.Paths.StacksDir,
|
||||
"/opt/docker/felhom-controller/controller.yaml",
|
||||
}
|
||||
## Testing Checklist
|
||||
|
||||
for drivePath, stacks := range driveStacks {
|
||||
repoPath := PrimaryResticRepoPath(drivePath)
|
||||
m.restic.EnsureInitialized(repoPath)
|
||||
After fixing all bugs, verify:
|
||||
|
||||
var paths []string
|
||||
// Always include infra (compose configs + controller.yaml) in every repo
|
||||
paths = append(paths, infraPaths...)
|
||||
|
||||
for _, stack := range stacks {
|
||||
// App data (appdata/<stack>/)
|
||||
appData := AppDataPath(drivePath, stack.Name)
|
||||
if _, err := os.Stat(appData); err == nil {
|
||||
paths = append(paths, appData)
|
||||
}
|
||||
// DB dumps for this stack
|
||||
dumpDir := AppDBDumpPath(drivePath, stack.Name)
|
||||
if _, err := os.Stat(dumpDir); err == nil {
|
||||
paths = append(paths, dumpDir)
|
||||
}
|
||||
}
|
||||
|
||||
// Tag with drive name for easy filtering
|
||||
tags := []string{filepath.Base(drivePath)}
|
||||
m.restic.RunBackup(ctx, repoPath, paths, tags)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3c. Prune, check, forget — per drive
|
||||
|
||||
Currently scheduled as single jobs. Need to loop over all active drive repos:
|
||||
- `RunPrune()` → for each drive, prune that drive's primary restic repo
|
||||
- `RunCheck()` → same
|
||||
- `RunForget()` → same
|
||||
|
||||
### 3d. Snapshot listing & stats — aggregate
|
||||
|
||||
For the backup page UI:
|
||||
- `ListSnapshots()` → list from all primary repos, merge and sort by time
|
||||
- `GetRepoStats()` → aggregate total size and snapshot count across repos
|
||||
- Tag snapshots with drive name so UI can optionally group them
|
||||
|
||||
### 3e. Monitoring pings
|
||||
|
||||
After ALL drive backups complete (not per-drive), send the backup ping. If ANY drive fails, the ping is not sent (or sent as failure).
|
||||
- [ ] **Bug 1:** Run cross-drive rsync backup twice for a single-mount app → `_db/` and `_config/` persist between runs (not deleted by `--delete`)
|
||||
- [ ] **Bug 1:** Run cross-drive rsync backup for a multi-mount app → behavior unchanged
|
||||
- [ ] **Bug 2:** During nightly scheduled backup, UI shows "Mentés folyamatban" on dashboard
|
||||
- [ ] **Bug 2:** During nightly scheduled backup, "Visszaállítás" button shows "already in progress" error
|
||||
- [ ] **Bug 2:** `RunFullBackup()` still works correctly (calls internal methods, doesn't deadlock)
|
||||
- [ ] **Bug 2:** Scheduled `db-dump` and `backup` jobs don't deadlock with each other (they run at different times, but if one overruns, the next should get "already in progress" error)
|
||||
- [ ] **Bug 3:** Cross-drive backup to a path where `GetDiskUsage` returns nil → logs a warning (not a silent pass)
|
||||
- [ ] **Bug 4:** With `system_data_path: ""` in config → startup log warning + `DumpStackDB()` returns error instead of using relative path
|
||||
- [ ] Build succeeds: `go build ./...`
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Cross-drive (secondary) backup refactor
|
||||
## Implementation Notes
|
||||
|
||||
### 4a. `internal/backup/crossdrive.go` — runRsyncBackup()
|
||||
|
||||
**Current:** `destDir = filepath.Join(destBase, "backups", "rsync", stackName)`
|
||||
**New:** `destDir = AppSecondaryRsyncPath(destBase, stackName)` → `<dest>/backups/secondary/<stack>/rsync/`
|
||||
|
||||
Update all path computations:
|
||||
- `destDir` construction
|
||||
- `_db/` subdirectory (now under rsync/ too)
|
||||
- `_config/` subdirectory
|
||||
- Size calculation path
|
||||
|
||||
### 4b. `internal/backup/crossdrive.go` — runResticBackup()
|
||||
|
||||
**Current:** `repoPath = filepath.Join(destBase, "backups", "restic")`
|
||||
**New:** `repoPath = SecondaryResticRepoPath(destBase)` → `<dest>/backups/secondary/restic/`
|
||||
|
||||
### 4c. DB dump source path
|
||||
|
||||
Currently: `r.dbDumpDir` (global directory)
|
||||
Now: per-app dump dir: `AppDBDumpPath(appDrivePath, stackName)`
|
||||
|
||||
The cross-drive runner needs to know the app's home drive to find its DB dumps.
|
||||
- Add `GetAppDrivePath` method to CrossDriveRunner (or pass via StackDataProvider)
|
||||
- **Do NOT change** any template files, CSS, or UI text — only Go backend files
|
||||
- **Do NOT change** any function signatures that are part of the public API (other packages import them)
|
||||
- **Do NOT change** the scheduler wiring in `main.go` — the fix should be in the backup package
|
||||
- **Run `go vet ./...` and `go build ./...`** after all changes to verify no compilation errors
|
||||
- Keep all log messages in English (UI text is Hungarian, but log messages are English)
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: Protected paths & delete safety
|
||||
## Files to Modify
|
||||
|
||||
### 5a. `internal/stacks/delete.go` — ProtectedHDDPaths()
|
||||
|
||||
**Current:**
|
||||
```go
|
||||
return map[string]bool{
|
||||
hddPath: true,
|
||||
filepath.Join(hddPath, "media"): true,
|
||||
filepath.Join(hddPath, "storage"): true,
|
||||
filepath.Join(hddPath, "Dokumentumok"): true,
|
||||
filepath.Join(hddPath, "appdata"): true,
|
||||
}
|
||||
```
|
||||
|
||||
**Change to:**
|
||||
```go
|
||||
return map[string]bool{
|
||||
hddPath: true,
|
||||
filepath.Join(hddPath, "appdata"): true,
|
||||
filepath.Join(hddPath, "backups"): true,
|
||||
filepath.Join(hddPath, "media"): true,
|
||||
filepath.Join(hddPath, "Dokumentumok"): true,
|
||||
}
|
||||
```
|
||||
|
||||
Remove `storage` (gone), add `backups`.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: Filebrowser mount sync
|
||||
|
||||
### 6a. `internal/web/handlers.go` — syncFileBrowserMounts()
|
||||
|
||||
**Current:** Mounts each registered path as one volume: `<path>:/srv/<basename>`
|
||||
This exposes EVERYTHING on the drive, including encrypted restic repos and raw appdata.
|
||||
|
||||
**Change to:** Mount specific subdirectories per drive:
|
||||
```go
|
||||
for _, sp := range paths {
|
||||
driveName := filepath.Base(sp.Path) // "hdd_1", "sys_drive"
|
||||
|
||||
// User media
|
||||
mediaPath := filepath.Join(sp.Path, "media")
|
||||
if dirExists(mediaPath) {
|
||||
storageMounts = append(storageMounts,
|
||||
fmt.Sprintf(" - %s:/srv/%s/media", mediaPath, driveName))
|
||||
}
|
||||
|
||||
// User documents
|
||||
docsPath := filepath.Join(sp.Path, "Dokumentumok")
|
||||
if dirExists(docsPath) {
|
||||
storageMounts = append(storageMounts,
|
||||
fmt.Sprintf(" - %s:/srv/%s/Dokumentumok", docsPath, driveName))
|
||||
}
|
||||
|
||||
// Secondary backup copies (rsync — browseable for file recovery)
|
||||
secPath := filepath.Join(sp.Path, "backups", "secondary")
|
||||
if dirExists(secPath) {
|
||||
storageMounts = append(storageMounts,
|
||||
fmt.Sprintf(" - %s:/srv/%s/backups:ro", secPath, driveName))
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This gives Filebrowser users access to:
|
||||
- Their media files (movies, music, etc.)
|
||||
- Their documents
|
||||
- Secondary backup copies (rsync) for file recovery
|
||||
- NOT raw appdata (dangerous), NOT restic repos (useless)
|
||||
|
||||
---
|
||||
|
||||
## Phase 7: App catalog changes
|
||||
|
||||
### 7a. Compose file updates (`app-catalog-felhom.eu`)
|
||||
|
||||
All 11+ apps with `needs_hdd: true`: rename `${HDD_PATH}/storage/` → `${HDD_PATH}/appdata/` in volume mounts.
|
||||
|
||||
**Apps to update** (grep for `storage/` in compose files):
|
||||
- immich, paperless-ngx, audiobookshelf, calibre-web, emby, jellyfin, komga, navidrome, nextcloud, plex, radarr, romm, sonarr
|
||||
|
||||
Each compose file's volumes section changes, e.g.:
|
||||
```yaml
|
||||
# Before:
|
||||
- ${HDD_PATH}/storage/immich:/usr/src/app/upload
|
||||
# After:
|
||||
- ${HDD_PATH}/appdata/immich:/usr/src/app/upload
|
||||
```
|
||||
|
||||
### 7b. Media-centric apps (Jellyfin, Plex, Emby, Radarr, Sonarr)
|
||||
|
||||
These apps also mount media directories. Check if they reference `${HDD_PATH}/media/` — if so, that's correct (no rename needed for media/).
|
||||
|
||||
### 7c. `.felhom.yml` files
|
||||
|
||||
The `HDD_PATH` field metadata doesn't reference `storage/` — it just declares the env var. Description says "külső merevlemez elérési útja" which is fine. No changes needed.
|
||||
|
||||
---
|
||||
|
||||
## Phase 8: Setup script updates
|
||||
|
||||
### 8a. `scripts/docker-setup.sh`
|
||||
|
||||
- Update `install_filebrowser()` volume mounts to use new per-subdirectory pattern
|
||||
- Or: remove Filebrowser initial mounts entirely (controller will sync them on startup)
|
||||
|
||||
### 8b. `scripts/hdd-setup.sh`
|
||||
|
||||
- Update `STORAGE_DIRS` to remove `storage/` entries
|
||||
- Update to use `appdata/` naming
|
||||
- Or: mark as deprecated (controller handles disk init now)
|
||||
|
||||
---
|
||||
|
||||
## Phase 9: controller.yaml update
|
||||
|
||||
New controller.yaml for demo node (after OS reinstall with SSD partition):
|
||||
|
||||
```yaml
|
||||
paths:
|
||||
stacks_dir: "/opt/docker/stacks"
|
||||
system_data_path: "/mnt/sys_drive"
|
||||
|
||||
backup:
|
||||
enabled: true
|
||||
restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
|
||||
db_dump_schedule: "02:30"
|
||||
restic_schedule: "03:00"
|
||||
retention:
|
||||
keep_daily: 7
|
||||
keep_weekly: 4
|
||||
keep_monthly: 6
|
||||
prune_schedule: "sunday"
|
||||
```
|
||||
|
||||
No more `restic_repo`, `db_dump_dir`, `backup_dir`.
|
||||
|
||||
---
|
||||
|
||||
## Phase 10: UI — Tároló section (simple update)
|
||||
|
||||
The Tároló section on the backup page needs to work with the new multi-drive, multi-repo architecture. Since we already agreed to show combined stats (not paths):
|
||||
|
||||
- **Tier 1 summary:** Aggregate snapshot count + total size across all primary repos
|
||||
- **Tier 2 summary:** How many apps configured, total size
|
||||
- **Keep:** Encryption key display (same password for all repos)
|
||||
- **Remove:** Path displays, DB dump section (unnecessary detail)
|
||||
|
||||
---
|
||||
|
||||
## Gotchas & risks
|
||||
|
||||
1. **Grep for ALL references** to removed config fields: `BackupDir`, `DBDumpDir`, `ResticRepo`, `cfg.Backup.ResticRepo`, `cfg.Paths.DBDumpDir`, `cfg.Paths.BackupDir`
|
||||
2. **ResticManager refactor** changes all call sites — grep for `m.restic.` in backup.go
|
||||
3. **DB dump path in crossdrive.go** — currently `r.dbDumpDir` (global). Needs per-app resolution.
|
||||
4. **Snapshot aggregation** — merging snapshots from multiple repos for the UI. Need to handle different repo sizes, dedup by timestamp.
|
||||
5. **New restic repo initialization** — when a new drive is registered and first backup runs, `restic init` must succeed before `restic backup`. The `EnsureInitialized` pattern already exists.
|
||||
6. **Empty drives** — a drive with no apps deployed yet should NOT get a restic backup (empty paths). Skip drives with zero apps.
|
||||
7. **The `systemDataPath` as fallback** — SSD-only apps (Mealie, Gokapi) have no HDD_PATH. Their drive is `cfg.Paths.SystemDataPath`. Make sure this path exists and is registered as a storage path.
|
||||
8. **Compose config files in multiple repos** — `/opt/docker/stacks/` is included in every drive's primary repo. This means the same files are in multiple repos. That's intentional (each repo is self-contained) but uses slightly more storage.
|
||||
9. **The `ParseComposeHDDMounts` function** references `${HDD_PATH}` with `storage/` subdirs. After rename to `appdata/`, the compose files change, so the parsed mounts change too. The function itself is generic (parses any `${HDD_PATH}` prefix) so it doesn't need code changes — only the compose templates change.
|
||||
10. **docker-compose.yml volumes in felhom-controller** — currently `- /srv/backups:/srv/backups`. This mount becomes unnecessary since all backups are under `/mnt/`. The `/mnt:/mnt:rshared` mount already provides access. Can remove the `/srv/backups` volume mount from the controller's compose file.
|
||||
|
||||
---
|
||||
|
||||
## Implementation order
|
||||
|
||||
1. **Phase 1** — Config + path helpers + StackDataProvider update (foundation, everything depends on this)
|
||||
2. **Phase 7** — App catalog compose files (independent, can do in parallel)
|
||||
3. **Phase 5** — Protected paths (quick, independent)
|
||||
4. **Phase 2** — DB dump refactor
|
||||
5. **Phase 3** — Restic backup refactor (depends on Phase 1 + 2)
|
||||
6. **Phase 4** — Cross-drive backup refactor (depends on Phase 1)
|
||||
7. **Phase 6** — Filebrowser mount sync
|
||||
8. **Phase 10** — UI Tároló section
|
||||
9. **Phase 8** — Setup scripts
|
||||
10. **Phase 9** — controller.yaml
|
||||
|
||||
Build, deploy to reinstalled demo node, verify.
|
||||
|
||||
---
|
||||
|
||||
## Files to modify
|
||||
|
||||
### Controller (deploy-felhom-compose/controller/)
|
||||
| File | Phase | Changes |
|
||||
|------|-------|---------|
|
||||
| `internal/config/config.go` | 1a | Add SystemDataPath, remove BackupDir/DBDumpDir/ResticRepo |
|
||||
| `internal/backup/paths.go` | 1b | **NEW FILE** — path computation helpers |
|
||||
| `internal/backup/appdata.go` | 1d | Add GetStackHDDPath to StackDataProvider |
|
||||
| `cmd/controller/main.go` | 1d | Implement GetStackHDDPath in stackAdapter |
|
||||
| `internal/backup/backup.go` | 2+3 | DumpAll, DumpStackDB, RunBackup, RefreshCache, GetFullStatus, RunPrune, RunCheck |
|
||||
| `internal/backup/restic.go` | 3a | Make repoPath a parameter, not instance field |
|
||||
| `internal/backup/crossdrive.go` | 4 | Update destination paths, DB dump source paths |
|
||||
| `internal/stacks/delete.go` | 5 | Update ProtectedHDDPaths |
|
||||
| `internal/web/handlers.go` | 6+10 | syncFileBrowserMounts, backupsHandler |
|
||||
| `internal/web/templates/backups.html` | 10 | Tároló section |
|
||||
|
||||
### App catalog (app-catalog-felhom.eu/)
|
||||
| File | Phase | Changes |
|
||||
|------|-------|---------|
|
||||
| `templates/*/docker-compose.yml` (11+ files) | 7 | `storage/` → `appdata/` in volume mounts |
|
||||
|
||||
### Scripts
|
||||
| File | Phase | Changes |
|
||||
|------|-------|---------|
|
||||
| `scripts/docker-setup.sh` | 8 | Filebrowser mounts, path references |
|
||||
| `scripts/hdd-setup.sh` | 8 | Directory structure arrays |
|
||||
|
||||
### Config
|
||||
| File | Phase | Changes |
|
||||
|------|-------|---------|
|
||||
| Demo node `controller.yaml` | 9 | New paths config |
|
||||
| Demo node `docker-compose.yml` | 10 | Remove `/srv/backups` mount |
|
||||
| File | Bug(s) | Changes |
|
||||
|------|--------|---------|
|
||||
| `internal/backup/crossdrive.go` | 1, 3 | Add rsync `--exclude` flags; add nil check for GetDiskUsage |
|
||||
| `internal/backup/backup.go` | 2, 4 | Extract `acquireRunning`/`releaseRunning` + internal methods; add systemDataPath validation |
|
||||
|
||||
Reference in New Issue
Block a user