v0.15.5: Disaster recovery — Hub-based infra backup, auto-mount, restore UI

Complete DR implementation (TASK2.md Phases 1-4):
- Hub infra-backup push/pull endpoints (controller.yaml, disk layout, stacks)
- Fresh-deployment detection pulls config from Hub, auto-mounts drives by UUID
- Full-page restore UI with drive status, app table, sequential restore
- docker-setup.sh shows DR instructions when customer_id is configured

New files: disk_layout.go, restore_scan.go, restore_app_linux.go,
restore_drives_linux.go, infra_backup.go, infra_pull.go,
handler_restore.go, restore.html

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:16:46 +01:00
parent 5d993b66a2
commit 6713df2186
21 changed files with 3324 additions and 9 deletions
+46 -5
@@ -4,7 +4,7 @@
A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.
**Current version: v0.15.4**
**Current version: v0.15.5**
---
@@ -593,14 +593,49 @@ Periodic JSON push (default every 15 min) to the central felhom-hub service:
Bearer token authentication, 3-attempt retry with 5-second backoff.
#### Infrastructure Backup to Hub (`internal/report/infra_backup.go`)
After each backup cycle, the controller pushes a full infrastructure snapshot to the Hub for disaster recovery. This snapshot includes:
- `controller.yaml` (base64-encoded, full config including secrets)
- `settings.json` (base64-encoded, backup prefs, storage paths, cross-drive configs)
- Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
- Deployed stacks manifest (app names, HDD paths)
- Restic passwords (primary + cross-drive, base64-encoded)
This enables fully automated recovery when the system drive is replaced — the new controller pulls the snapshot from the Hub, auto-mounts surviving drives by UUID, and restores all applications.
#### Hub Dashboard
The hub service (separate Go app in the `felhom.eu` repo) provides:
- Multi-customer overview table with status indicators
- Customer detail page with system/storage/containers/backup/health sections
- Infra backup status per customer (last sync, stack count, disk count)
- Color coding: green (<30min), yellow (30-60min), red (>60min since last report)
- 90-day report retention with daily prune
### 9. Disaster Recovery
When a system drive fails and is replaced, the controller can automatically restore the full deployment:
```
1. docker-setup.sh deploys fresh controller (Hub enabled, customer_id configured)
2. Controller detects empty data dir → fresh deployment
3. Controller pulls infra backup from Hub → gets disk layout, passwords, configs
4. Controller scans block devices for UUIDs matching stored disk layout
5. Controller mounts surviving drives (e.g., HDD with backups)
6. Controller scans mounted drives for local backup data (_infra/ + rsync copies)
7. Controller auto-restores stack configs → apps appear in dashboard
8. User opens dashboard → "Visszaállítás" (Restore) wizard
9. User confirms → sequential restore: rsync first, restic fallback, DB import
10. Apps restored and running
```
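Steps 4 and 5 amount to matching the UUIDs stored in the Hub snapshot against the UUIDs visible on the new host. The sketch below shows only that matching logic as a pure function; the types `StoredDisk` and `Mount` and the function `planMounts` are hypothetical, and how the controller actually enumerates block devices (for example via `/dev/disk/by-uuid` on Linux) is not shown.

```go
package main

import "fmt"

// StoredDisk is one entry of the disk layout pulled from the Hub
// (hypothetical shape for this example).
type StoredDisk struct {
	UUID       string
	MountPoint string
}

// Mount pairs a detected device with the mount point recorded for its UUID.
type Mount struct {
	Device     string
	MountPoint string
}

// planMounts compares the stored layout with the UUIDs present on the host
// (UUID -> device path). It returns the mounts to perform and the UUIDs of
// drives that were not found, such as the replaced system drive.
func planMounts(layout []StoredDisk, present map[string]string) (mounts []Mount, missing []string) {
	for _, d := range layout {
		if dev, ok := present[d.UUID]; ok {
			mounts = append(mounts, Mount{Device: dev, MountPoint: d.MountPoint})
		} else {
			missing = append(missing, d.UUID)
		}
	}
	return mounts, missing
}

func main() {
	layout := []StoredDisk{
		{UUID: "aaaa-1111", MountPoint: "/mnt/hdd1"},
		{UUID: "bbbb-2222", MountPoint: "/mnt/old-system"},
	}
	present := map[string]string{"aaaa-1111": "/dev/sdb1"}

	mounts, missing := planMounts(layout, present)
	fmt.Println("mount:", mounts)
	fmt.Println("missing:", missing)
}
```

Matching by UUID rather than by device name matters here because device names such as `/dev/sdb1` can change when a drive is replaced.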
**Backup sources (priority order):**
1. **Rsync copies** (cross-drive, plain files, no password needed) — fastest, most reliable
2. **Restic snapshots** (encrypted, needs password from Hub) — comprehensive but slower
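The priority order can be expressed as a short selection function. This is a sketch under assumptions: the `AppBackups` type and `chooseSource` function are hypothetical, and the real restore code may track availability differently.

```go
package main

import (
	"errors"
	"fmt"
)

// AppBackups records which backup sources were found for one app
// (hypothetical shape for this example).
type AppBackups struct {
	HasRsyncCopy      bool
	HasResticSnapshot bool
	HasResticPassword bool
}

// chooseSource prefers the plain rsync copy and falls back to restic,
// which is only usable when its password is available.
func chooseSource(b AppBackups) (string, error) {
	switch {
	case b.HasRsyncCopy:
		return "rsync", nil
	case b.HasResticSnapshot && b.HasResticPassword:
		return "restic", nil
	case b.HasResticSnapshot:
		return "", errors.New("restic snapshot found but no password available")
	default:
		return "", errors.New("no backup source found")
	}
}

func main() {
	src, err := chooseSource(AppBackups{HasResticSnapshot: true, HasResticPassword: true})
	fmt.Println(src, err)
}
```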
**Fallback:** If the Hub is unreachable, the controller can still detect backups on drives that are already mounted, whether they were mounted manually or through pre-existing fstab entries.
---
## Repository Layout
@@ -631,7 +666,10 @@ controller/
│ │ ├── restic.go # Restic operations (init, snapshot, prune, check) — repoPath as param
│ │ ├── appdata.go # StackDataProvider interface, app data discovery
│ │ ├── crossdrive.go # Per-app backup to secondary storage (rsync/restic)
│ │ └── restore.go # Per-app restore from per-drive repo
│ │ ├── restore.go # Per-app restore from per-drive repo
│ │ ├── restore_scan.go # DR: scan drives for backup data, build restore plan
│ │ ├── restore_app_linux.go # DR: per-app restore (rsync config/data + docker compose up)
│ │ └── restore_drives_linux.go # DR: auto-mount drives by UUID from Hub infra backup
│ ├── api/router.go # REST API endpoints (~30 routes)
│ ├── scheduler/scheduler.go # Central job scheduler (Every, Daily)
│ ├── system/
@@ -648,16 +686,18 @@ controller/
│ ├── notify/notifier.go # Email relay to hub, preference sync, cooldowns
│ ├── report/
│ │ ├── builder.go # Hub report builder (all subsystems → JSON)
│ │ └── pusher.go # HTTP POST to hub (retry, Bearer auth)
│ │ ├── pusher.go # HTTP POST to hub (retry, Bearer auth)
│ │ └── infra_pull.go # DR: pull infra backup from Hub for fresh deployment
│ └── web/
│ ├── server.go # HTTP server, routing, static files
│ ├── auth.go # Session auth, login/logout, session cleanup
│ ├── handlers.go # Page handlers (dashboard, stacks, deploy, backups, etc.)
│ ├── handler_restore.go # DR: restore page handler + APIs (scan, restore all, skip)
│ ├── storage_handlers.go # Storage API handlers (scan, format, attach, migrate, cleanup)
│ ├── alerts.go # State-based alert generation
│ ├── funcmap.go # Template functions (state colors, Hungarian formatting)
│ ├── embed.go # go:embed for templates + Chart.js
│ └── templates/ # 12 HTML files + style.css (Hungarian UI)
│ └── templates/ # 13 HTML files + style.css (Hungarian UI)
├── configs/
│ ├── controller.yaml.example # Full config reference
│ └── example-felhom-metadata.yml # .felhom.yml format reference
@@ -869,6 +909,7 @@ See `docker-compose.yml` for the full volume configuration.
- [x] Cross-drive restic pruning (v0.14.0)
- [x] Auto Tier 2 for small apps (v0.14.1) — auto-enable daily rsync for non-HDD apps when ≥2 drives
- [x] Infrastructure config in cross-drive backup (v0.14.1) — stacks dir + controller.yaml in `_infra/` + restic
- [x] Disaster recovery (v0.15.5) — Hub-based infra backup, auto-mount by UUID, restore UI with full-page takeover
### In Progress / Planned
@@ -885,7 +926,7 @@ See `docker-compose.yml` for the full volume configuration.
| Node | Hardware | Domain | Status |
|------|----------|--------|--------|
| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | Controller v0.15.0 |
| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | Controller v0.15.5 |
| pi-customer-1 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | Not yet tested |
## Related Repositories