v0.22.0: First-run setup wizard, local infra backup, hub verification

New controller features:
- Web-based setup wizard replaces docker-setup.sh interactive config
  - Dual listener: :8080 (Traefik) + :8081 (direct HTTP for LAN)
  - Drive scanner finds .felhom-infra-backup/ on all block devices
  - Hub recovery pull (GET /api/v1/recovery/{id}) with retrieval password
  - Fresh install: Hub config download or manual wizard
  - CSRF protection, state persistence, Hungarian UI
- Local infra backup written to all connected drives after each backup cycle
  - .felhom-infra-backup/backup.json + metadata.json with SHA256 checksum
- Hub verification: parse customer_blocked from report push response
  - Limited mode after 7 days without verification
- Recovery info page on Settings + recovery-info.txt file generation
- Pending events queue: DR events sent to Hub on next report push
- docker-setup.sh v6.0.0: removed interactive wizard, minimal controller.yaml only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 12:33:17 +01:00
parent e217c3a445
commit 6eb75204b6
28 changed files with 2970 additions and 505 deletions
+103 -21
@@ -4,7 +4,7 @@
A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.
**Current version: v0.21.0**
**Current version: v0.22.0**
---
@@ -20,6 +20,8 @@ A single, lightweight Go container that replaces Portainer + scattered systemd s
- [Update Management](#6-update-management)
- [Authentication & Settings](#7-authentication--settings)
- [Central Hub](#8-central-hub-reporting)
- [Setup Wizard](#9-first-run-setup-wizard)
- [Disaster Recovery](#10-disaster-recovery)
- [Repository Layout](#repository-layout)
- [Configuration](#configuration)
- [REST API](#rest-api)
@@ -812,28 +814,95 @@ The hub service (separate Go app in the `felhom.eu` repo) provides:
- Color coding: green (<30min), yellow (30-60min), red (>60min since last report)
- 90-day report + event retention with daily prune at 04:30 Budapest time
### 9. Disaster Recovery
### 9. First-Run Setup Wizard
When a system drive fails and is replaced, the controller can automatically restore the full deployment:
When the controller starts with no valid customer configuration (`customer.id` empty or `"demo-felhom"`), it enters **setup mode** — a web-based wizard that handles all initial configuration. This replaces the old interactive shell wizard in `docker-setup.sh`.
#### Setup Mode Detection (`internal/setup/setup.go`)
`NeedsSetup(cfg)` returns true when `customer.id` is empty or `"demo-felhom"`. In setup mode, the controller skips normal startup (no scheduler, no backup, no stacks) and serves only the wizard UI on two listeners:
- `:8080` — behind Traefik (accessible via domain, e.g. `https://felhom.example.com`)
- `:8081` — direct HTTP (accessible via LAN IP, e.g. `http://192.168.0.100:8081`)
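The detection rule above can be sketched in a few lines of Go. This is a minimal illustration, not the code in `internal/setup/setup.go`: the `Config` and `Customer` types are hypothetical stand-ins for the real config structs, and treating a nil config as unconfigured is an assumption.

```go
package main

import "fmt"

// Customer and Config are hypothetical stand-ins for the controller's
// config types; only customer.id matters for this sketch.
type Customer struct {
	ID string
}

type Config struct {
	Customer Customer
}

// demoCustomerID is the placeholder ID that counts as "not configured".
const demoCustomerID = "demo-felhom"

// NeedsSetup reports whether the controller should start in setup mode.
// Treating a nil config as unconfigured is an assumption of this sketch.
func NeedsSetup(cfg *Config) bool {
	if cfg == nil {
		return true
	}
	return cfg.Customer.ID == "" || cfg.Customer.ID == demoCustomerID
}

func main() {
	for _, id := range []string{"", "demo-felhom", "acme-01"} {
		cfg := &Config{Customer: Customer{ID: id}}
		fmt.Printf("customer.id=%q needsSetup=%v\n", id, NeedsSetup(cfg))
	}
}
```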
#### Wizard Flow
```
1. docker-setup.sh deploys fresh controller (Hub enabled, customer_id configured)
2. Controller detects empty data dir → fresh deployment
3. Controller pulls infra backup from Hub → gets disk layout, passwords, configs
4. Controller scans block devices for UUIDs matching stored disk layout
5. Controller mounts surviving drives (e.g., HDD with backups)
6. Controller scans mounted drives for local backup data (_infra/ + rsync copies)
7. Controller auto-restores stack configs → apps appear in dashboard
8. User opens dashboard → "Visszaállítás" (Restore) wizard
9. User confirms → sequential restore: rsync first, restic fallback, DB import
10. Apps restored and running
┌──────────────────────────────────┐
│ 1. Welcome                       │
│ Choose: Restore / Fresh install  │
└─────────┬───────────┬────────────┘
          │           │
    ┌─────▼─────┐  ┌──▼───────────────┐
    │ 2a. Scan  │  │ 2b. Hub download │
    │ drives for│  │ (customer ID +   │
    │ local     │  │ password)        │
    │ backups   │  │                  │
    └─────┬─────┘  └──────┬───────────┘
          │               │
    ┌─────▼─────┐         │
    │ 2a.2 Hub  │         │
    │ recovery  │         │
    │ (fallback)│         │
    └─────┬─────┘         │
          │               │
    ┌─────▼─────┐  ┌──────▼───────────┐
    │ Execute   │  │ Execute fresh    │
    │ restore   │  │ install          │
    └─────┬─────┘  └──────┬───────────┘
          │               │
          └───────┬───────┘
    os.Exit(0) → Docker restarts
            → normal mode
```
#### Key Components
| File | Purpose |
|------|---------|
| `setup/setup.go` | `NeedsSetup()` detection, `SetupState` persistence to `setup-state.json` |
| `setup/handlers.go` | HTTP handlers for each wizard step (welcome, scan, hub-restore, fresh, manual) |
| `setup/scanner.go` | Scans all block devices for `.felhom-infra-backup/` directories via `lsblk` + temp mounts |
| `setup/hub.go` | Hub recovery pull (`GET /api/v1/recovery/{id}`) and config download |
| `setup/csrf.go` | Lightweight CSRF protection (cookie + hidden field, `SameSite=Strict`) |
| `setup/network.go` | Detects local IPs for LAN access URL display |
| `setup/templates/` | 7 embedded HTML templates (Hungarian, dark theme matching main UI) |
#### Local Infra Backup (`internal/backup/local_infra.go`)
The controller writes infrastructure snapshots to **every connected drive** after each backup cycle and on startup. Location: `<drive>/.felhom-infra-backup/`. Files:
- `backup.json` — full infra backup (config, settings, disk layout, passwords, stacks)
- `metadata.json` — schema version, timestamp, customer ID, controller version, SHA256 checksum
During setup wizard drive scan, these backups are discovered, integrity-verified, and offered for one-click restore.
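The integrity check during the drive scan could work roughly as follows. This is a sketch under two assumptions: that the checksum in `metadata.json` is the hex-encoded SHA256 of the raw `backup.json` bytes, and that it is stored under a JSON key named `sha256`. Neither detail is confirmed by the text above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// Metadata models only the checksum part of metadata.json.
// The JSON key "sha256" is an assumption of this sketch.
type Metadata struct {
	SHA256 string `json:"sha256"`
}

// Checksum returns the hex-encoded SHA256 digest of data.
func Checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// VerifyBackup checks the bytes of backup.json against the checksum
// recorded in metadata.json and returns an error on any mismatch.
func VerifyBackup(backupJSON, metadataJSON []byte) error {
	var meta Metadata
	if err := json.Unmarshal(metadataJSON, &meta); err != nil {
		return fmt.Errorf("parse metadata: %w", err)
	}
	if meta.SHA256 == "" {
		return fmt.Errorf("metadata has no checksum")
	}
	if got := Checksum(backupJSON); !strings.EqualFold(got, meta.SHA256) {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, meta.SHA256)
	}
	return nil
}

func main() {
	backup := []byte(`{"customer":{"id":"acme-01"}}`)
	meta, _ := json.Marshal(Metadata{SHA256: Checksum(backup)})
	fmt.Println("intact:", VerifyBackup(backup, meta))
	fmt.Println("tampered:", VerifyBackup(append(backup, ' '), meta))
}
```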
#### Recovery Info (`internal/recovery/info.go`)
Generates `recovery-info.txt` on the system data partition with customer ID, Hub URL, retrieval password, and recovery instructions in Hungarian. Updated on startup and after config changes. Also displayed on the Settings page in a "Vészhelyzeti információk" section.
### 10. Disaster Recovery
When a system drive fails and is replaced, the recovery flow uses the setup wizard:
```
1. docker-setup.sh deploys fresh controller with minimal config (domain + paths only)
2. Controller detects empty customer.id → enters setup mode
3. User opens wizard at http://<LAN-IP>:8081
4. Wizard scans all drives for .felhom-infra-backup/ directories
5. If found: one-click restore (config, settings, passwords, disk layout)
6. If not found: Hub recovery via customer ID + retrieval password
7. Controller restarts into normal mode with full config
8. Controller auto-mounts surviving drives by UUID from disk layout
9. Dashboard shows "Visszaállítás" (Restore) page for app-level recovery
10. User confirms → sequential restore: rsync first, restic fallback, DB import
```
**Backup sources (priority order):**
1. **Rsync copies** (cross-drive, plain files, no password needed) — fastest, most reliable
2. **Restic snapshots** (encrypted, needs password from Hub) — comprehensive but slower
1. **Local infra backup** (`.felhom-infra-backup/` on surviving drives) — fastest, no network needed
2. **Hub recovery endpoint** (`GET /api/v1/recovery/{id}`) — requires retrieval password
3. **Manual config** (wizard form) — enter all details manually as last resort
**Fallback:** If the Hub is unreachable, the controller can still detect backups on already-mounted drives (manual mount or pre-existing fstab entries).
**Hub verification:** After setup, the controller periodically verifies customer standing via the `customer_blocked` field in the Hub report push response. If the customer is blocked, or the Hub has been unreachable for more than 7 days, the controller enters limited mode (no new deployments).
---
@@ -841,7 +910,7 @@ When a system drive fails and is replaced, the controller can automatically rest
```
controller/
├── cmd/controller/main.go # Entry point, wires all 14 modules
├── cmd/controller/main.go # Entry point, wires all 15 modules (setup mode branch + normal startup)
├── internal/
│ ├── config/config.go # YAML loader, validation, env overrides
│ ├── settings/settings.go # Runtime settings (JSON, atomic writes, RWMutex)
@@ -860,7 +929,8 @@ controller/
│ │ └── *_other.go # Non-Linux stubs for cross-compilation
│ ├── backup/
│ │ ├── backup.go # Orchestrator (per-drive dumps + restic + cross-drive chain)
│ │ ├── paths.go # Per-drive path helpers (PrimaryResticRepoPath, AppDBDumpPath, etc.)
│ │ ├── paths.go # Per-drive path helpers (PrimaryResticRepoPath, InfraBackupDir, etc.)
│ │ ├── local_infra.go # Local infra backup to all drives (.felhom-infra-backup/)
│ │ ├── dbdump.go # DB auto-discovery + dump (pg_dump, mariadb-dump)
│ │ ├── restic.go # Restic operations (init, snapshot, prune, check) — repoPath as param
│ │ ├── appdata.go # StackDataProvider interface, app data discovery
@@ -890,8 +960,16 @@ controller/
│ ├── notify/notifier.go # Email relay to hub, preference sync, cooldowns
│ ├── report/
│ │ ├── builder.go # Hub report builder (all subsystems → JSON)
│ │ ├── pusher.go # HTTP POST to hub (retry, Bearer auth)
│ │ └── infra_pull.go # DR: pull infra backup from Hub for fresh deployment
│ │ ├── pusher.go # HTTP POST to hub (retry, Bearer auth, parses customer_blocked)
│ │ └── infra_pull.go # DR: pull recovery/config from Hub (retrieval password auth)
│ ├── setup/ # First-run setup wizard (web-based, replaces docker-setup.sh wizard)
│ │ ├── setup.go # NeedsSetup() detection, state persistence
│ │ ├── handlers.go # HTTP handlers for all wizard steps
│ │ ├── scanner.go # Drive scanner for local infra backups
│ │ ├── csrf.go # Lightweight CSRF (cookie + hidden field)
│ │ ├── network.go # Local IP detection for LAN access URLs
│ │ └── templates/ # 7 wizard HTML templates (Hungarian)
│ ├── recovery/info.go # Recovery info file generator (recovery-info.txt)
│ └── web/
│ ├── server.go # HTTP server, routing, static files
│ ├── auth.go # Session auth, login/logout, session cleanup
@@ -953,6 +1031,10 @@ monitoring:
backup: "uuid-here"
backup_integrity: "uuid-here"
web:
listen: ":8080"
setup_listen: ":8081" # Plain HTTP for setup wizard LAN access
hub:
enabled: true
url: "https://hub.felhom.eu"
@@ -966,7 +1048,7 @@ Environment variable overrides: `FELHOM_LOGGING_LEVEL=debug`, `FELHOM_HUB_ENABLE
### Runtime settings (`settings.json`)
Auto-managed by the controller. Contains password hash overrides, notification preferences, per-app backup configs, storage path registry, DB validation cache. All writes are atomic.
Auto-managed by the controller. Contains password hash overrides, notification preferences, per-app backup configs, storage path registry, DB validation cache, Hub verification state (`hub_verified`, `hub_verified_at`), retrieval password for disaster recovery, and pending event queue. All writes are atomic (write `.tmp`, rename).
### Per-app config (`app.yaml`)