feat: backup safety — stop-before-dump, streaming restore, health check, per-app restic, infra configs (v0.34.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:56:48 +01:00
parent 783830a9d4
commit fb11c3b75a
8 changed files with 147 additions and 33 deletions
+9 -7
@@ -336,6 +336,7 @@ Path computation is centralized in `backup/paths.go` via the `FelhomDataDir = "f
**Phase 1b — Docker Volume Dumps** (`internal/backup/backup.go`, runs after DB dumps)
- Iterates all deployed stacks that have Docker named volumes (`GetDockerVolumes()`)
- **v0.34.0:** Each stack is stopped before the dump and restarted afterwards (`DumpAppVolumesSafe()`), which prevents inconsistent tars of live databases. Protected stacks (traefik, etc.) that reject `StopStack` are skipped with a warning.
- For each volume: `docker run --rm -v <vol>:/vol:ro -v <dumpDir>:/out alpine tar cf /out/<vol>.tar -C /vol .`
- 10-minute timeout per volume; warnings on failure (non-fatal)
- Stale tars cleaned up (volumes that no longer exist)
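
The stop-before-dump flow can be sketched roughly as follows. This is a minimal illustration, not the code in `internal/backup/backup.go`: the `StackController` interface, `dumpVolumesSafe`, `fakeController`, and `errProtected` are hypothetical names, and the sketch assumes that a rejected stop skips the whole stack and that the restart is attempted even when the dump fails.

```go
package main

import (
	"errors"
	"fmt"
)

// StackController is a hypothetical interface for this sketch; the
// controller's real stop/start API may look different.
type StackController interface {
	StopStack(name string) error
	StartStack(name string) error
}

var errProtected = errors.New("stack is protected and cannot be stopped")

// dumpVolumesSafe stops the stack, runs dump, and always attempts a restart,
// even when dump fails. A rejected stop is reported as skipped, not as an error.
func dumpVolumesSafe(c StackController, stack string, dump func() error) (skipped bool, err error) {
	if stopErr := c.StopStack(stack); stopErr != nil {
		fmt.Printf("warning: skipping volume dump for %s: %v\n", stack, stopErr)
		return true, nil
	}
	// The deferred restart runs after dump() returns, whatever its result.
	defer func() {
		if startErr := c.StartStack(stack); startErr != nil {
			err = errors.Join(err, fmt.Errorf("restart %s: %w", stack, startErr))
		}
	}()
	return false, dump()
}

// fakeController records calls so the ordering can be inspected.
type fakeController struct {
	protected map[string]bool
	events    []string
}

func (f *fakeController) StopStack(name string) error {
	if f.protected[name] {
		return errProtected
	}
	f.events = append(f.events, "stop "+name)
	return nil
}

func (f *fakeController) StartStack(name string) error {
	f.events = append(f.events, "start "+name)
	return nil
}

func main() {
	c := &fakeController{protected: map[string]bool{"traefik": true}}
	for _, stack := range []string{"traefik", "nextcloud"} {
		skipped, err := dumpVolumesSafe(c, stack, func() error {
			c.events = append(c.events, "dump "+stack)
			return nil
		})
		fmt.Println(stack, "skipped:", skipped, "err:", err)
	}
	fmt.Println(c.events)
}
```

Putting the restart in a `defer` is one way to make sure a failed tar never leaves an app stopped; the real function may handle this differently.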
@@ -347,12 +348,12 @@ Path computation is centralized in `backup/paths.go` via the `FelhomDataDir = "f
- Apps are **grouped by drive** via `groupStacksByDrive()` — each drive's apps are backed up to that drive's restic repo
- App drive resolution: `GetStackHDDPath()` (from `StackDataProvider`) → falls back to `SystemDataPath`
- Auto-generated repository password (32 random bytes, base64url), shared across all repos, synced to hub
- **Paths included in every per-drive snapshot:**
- **Paths included in each per-drive snapshot (v0.34.0: per-app scoped):**
- Per-app DB dump dirs on that drive
- Per-app Docker volume dump dirs (`volume-dumps/*.tar`)
- Per-app HDD mount paths (user data)
- Stacks dir (compose.yml + app.yaml + .felhom.yml for all apps)
- `controller.yaml` (controller config)
- Per-app stack config dir (`<StacksDir>/<stackName>/` — only for stacks on this drive)
- `controller.yaml` — only on the system drive (not duplicated across all drives)
- Auto-detects and unlocks stale locks (restic repo lock)
- Weekly prune on Sundays with configurable retention (keep-daily, keep-weekly, keep-monthly)
- Weekly integrity check (`restic check`) on Sunday 04:00 — checks **all** primary repos
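
As an illustration of the per-app scoping, the sketch below groups stacks by drive and builds the config paths each drive's snapshot would carry. It is a simplified assumption, not the actual `groupStacksByDrive()`: the `Stack` type, `snapshotPaths`, and every path in `main` are hypothetical, DB and volume dump dirs are left out, and the sketch assumes the system drive always gets an entry because it holds `controller.yaml`.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// Stack is a simplified, hypothetical view of a deployed app.
type Stack struct {
	Name  string
	Drive string // the app's home drive; empty means the app has no HDD path
}

// snapshotPaths returns, per drive, the config paths that drive's snapshot
// would include: one stack config dir per app on that drive, plus
// controller.yaml on the system drive only.
func snapshotPaths(stacks []Stack, systemDrive, stacksDir, controllerYAML string) map[string][]string {
	out := map[string][]string{systemDrive: {controllerYAML}}
	for _, s := range stacks {
		drive := s.Drive
		if drive == "" {
			drive = systemDrive // fallback, analogous to SystemDataPath
		}
		out[drive] = append(out[drive], path.Join(stacksDir, s.Name))
	}
	for _, paths := range out {
		sort.Strings(paths)
	}
	return out
}

func main() {
	stacks := []Stack{
		{Name: "nextcloud", Drive: "/mnt/hdd1"},
		{Name: "vaultwarden"}, // no HDD path, lands on the system drive
	}
	byDrive := snapshotPaths(stacks, "/mnt/system", "/opt/stacks", "/etc/felhom/controller.yaml")

	drives := make([]string, 0, len(byDrive))
	for d := range byDrive {
		drives = append(drives, d)
	}
	sort.Strings(drives)
	for _, d := range drives {
		fmt.Println(d, byDrive[d])
	}
}
```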
@@ -377,7 +378,7 @@ data back up config + DB + user data + Docker volumes; apps without HDD back up
- **restic** — Versioned, deduplicated, encrypted (shared repo across apps, not browsable)
- Per-app configuration in settings.json: destination path, method, schedule (daily/weekly/manual)
- **Pre-backup DB dump:** `DumpStackDB()` runs fresh pg_dump/mariadb-dump before each cross-drive backup; non-fatal on failure (wired via `DBDumper` interface to avoid circular imports)
- **Pre-backup volume dump (v0.33.0):** `DumpAppVolumes()` exports Docker named volumes to tar before each cross-drive backup (wired via `VolumeDumper` interface)
- **Pre-backup volume dump (v0.33.0, safe stop/start v0.34.0):** `DumpAppVolumesSafe()` stops the stack, exports Docker named volumes to tar, then restarts it (wired via `VolumeDumper` interface)
- **Empty mounts allowed:** `RunAppBackup` accepts apps with no HDD mounts — the rsync
mount loop simply doesn't execute, but DB + config copy still runs
- **Drive-type-aware validation** (`ValidateDestination`):
@@ -440,16 +441,17 @@ appear in the restore dropdown with per-app snapshot filtering.
- Config only: "Csak konfiguracio visszaallitasa" (Hungarian UI label, "Restore configuration only")
**Tier 1 restore** (`RestoreApp`):
- Stop app → resolve app's home drive → `restic restore <id> --target / --include <path>...` → populate Docker volumes from restored tars → restart app
- Stop app → resolve app's home drive → `restic restore <id> --target / --include <path>...` → populate Docker volumes from restored tars → restart app → health check
- Restore paths: config dir, DB dump dir, volume dump dir, HDD mounts
- Docker volumes restored via `restoreDockerVolumes()`: `docker volume rm -f` → `docker volume create` → `docker run alpine tar xf`
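
The three-step volume restore can be pictured as the command sequence below. This is only a sketch of what `restoreDockerVolumes()` might issue: the `/vol` and `/in` mount points, the `:ro` flag, and the `-C /vol` extraction target are assumptions chosen to mirror the Phase 1b dump command, and `restoreVolumeCommands` and `commandLine` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// restoreVolumeCommands returns the docker invocations, in order, that would
// recreate volume vol from <dumpDir>/<vol>.tar. Mount points and flags are
// assumptions for illustration, not taken from the real implementation.
func restoreVolumeCommands(vol, dumpDir string) [][]string {
	return [][]string{
		{"docker", "volume", "rm", "-f", vol}, // drop the old volume if present
		{"docker", "volume", "create", vol},   // start from an empty volume
		{"docker", "run", "--rm",
			"-v", vol + ":/vol",
			"-v", dumpDir + ":/in:ro",
			"alpine", "tar", "xf", "/in/" + vol + ".tar", "-C", "/vol"},
	}
}

// commandLine renders an argv slice for display.
func commandLine(args []string) string {
	return strings.Join(args, " ")
}

func main() {
	for _, cmd := range restoreVolumeCommands("nextcloud_data", "/data/volume-dumps") {
		fmt.Println(commandLine(cmd))
	}
}
```

Building the argv as a slice keeps volume names out of a shell, so it can be passed straight to `exec.Command` without quoting.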
**Tier 2 restore** (`RestoreAppFromTier2`):
- Stop app → rsync config from `_config/` → rsync HDD data (single/multi-mount) → copy DB dumps from `_db/` → restore Docker volumes from `_volumes/` tars → restart app
- Stop app → rsync config from `_config/` → rsync HDD data (single/multi-mount) → copy DB dumps from `_db/` (streaming `copyFile`) → restore Docker volumes from `_volumes/` tars → restart app → health check
- Uses rsync `--delete` for config and HDD data to ensure exact mirror state
- Single-mount apps: data directly in rsync dir (excluding `_*`); multi-mount: per-leaf subdirectories
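
A streaming copy for the `_db/` dumps might look like the sketch below. It is not the project's `copyFile`; `copyFileStreaming` and `copyRoundTrip` are hypothetical names. The point is that `io.Copy` moves the data in chunks, so a multi-gigabyte dump is never held in memory as it would be with `os.ReadFile` followed by `os.WriteFile`.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// copyFileStreaming copies src to dst in chunks via io.Copy and preserves the
// source permission bits. A Close error on dst is reported if nothing failed earlier.
func copyFileStreaming(src, dst string) (err error) {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	info, err := in.Stat()
	if err != nil {
		return err
	}

	out, err := os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, info.Mode().Perm())
	if err != nil {
		return err
	}
	defer func() {
		if cerr := out.Close(); err == nil {
			err = cerr
		}
	}()

	if _, err = io.Copy(out, in); err != nil {
		return err
	}
	return out.Sync() // flush to disk before reporting success
}

// copyRoundTrip writes data to a temp file, copies it, and reads the copy back.
func copyRoundTrip(data []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "copydemo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "dump.sql")
	dst := filepath.Join(dir, "dump.copy.sql")
	if err := os.WriteFile(src, data, 0o600); err != nil {
		return nil, err
	}
	if err := copyFileStreaming(src, dst); err != nil {
		return nil, err
	}
	return os.ReadFile(dst)
}

func main() {
	got, err := copyRoundTrip([]byte("-- example dump contents\n"))
	fmt.Printf("copied %d bytes, err: %v\n", len(got), err)
}
```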
**Common:**
- **v0.34.0:** Post-restore health check (`waitForHealthy`) refreshes container state via `docker ps` every 5s for up to 90s. If the app does not reach the running state, a warning is logged but the restore still returns success, because the data has already been restored.
- Running flag prevents concurrent backup/restore operations
- Snapshot ID validated (8-64 lowercase hex, or special `tier2-rsync`)
- Import from `.fab` bundle link shown in restore section for cross-system migration
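
The post-restore health check amounts to polling with a deadline. The sketch below shows one way to write such a loop; it is not the actual `waitForHealthy`. The `isRunning` callback stands in for the `docker ps` refresh, `waitForRunning`, `becomesRunningAfter`, and `demoWait` are hypothetical helpers, and the demo uses millisecond durations in place of the real 5s and 90s.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Demo values; the documented check uses 5s and 90s.
const (
	demoInterval = 5 * time.Millisecond
	demoTimeout  = 90 * time.Millisecond
)

// waitForRunning polls isRunning immediately and then once per interval,
// returning true as soon as it succeeds or false once the timeout elapses.
func waitForRunning(ctx context.Context, isRunning func() bool, interval, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		if isRunning() {
			return true
		}
		select {
		case <-ctx.Done():
			return false
		case <-ticker.C:
		}
	}
}

// becomesRunningAfter simulates a container that reports running on the n-th poll.
func becomesRunningAfter(n int) func() bool {
	calls := 0
	return func() bool {
		calls++
		return calls >= n
	}
}

// demoWait runs the poll loop with the demo timings.
func demoWait(pollsUntilRunning int) bool {
	return waitForRunning(context.Background(), becomesRunningAfter(pollsUntilRunning), demoInterval, demoTimeout)
}

func main() {
	if !demoWait(3) {
		fmt.Println("warning: app did not reach running state")
	}
	if !demoWait(1000) {
		// Mirrors the documented behaviour: warn, but still treat the restore as done.
		fmt.Println("warning: app did not reach running state; data is restored regardless")
	}
	fmt.Println("restore finished")
}
```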
@@ -970,7 +972,7 @@ After each backup cycle (including manual Tier 2 triggers via `OnCrossDriveCompl
- `controller.yaml` (base64-encoded, full config including secrets)
- `settings.json` (base64-encoded, backup prefs, storage paths, cross-drive configs)
- Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
- Deployed stacks manifest (app names, HDD paths)
- Deployed stacks manifest (app names, HDD paths) with actual config files: `docker-compose.yml`, `app.yaml`, `.felhom.yml` (base64-encoded per stack, v0.34.0)
- Restic passwords (primary + cross-drive, base64-encoded)
This enables fully automated recovery when the system drive is replaced — the new controller pulls the snapshot from the Hub, auto-mounts surviving drives by UUID, and restores all applications.