# REPORT: felhom-controller v0.53.1 (Phase 2: per-app recovery unit, capture side)

Each app's on-drive backup is now a self-contained, recreatable **recovery unit**, and it is **secret-free by design**. Built, unit-tested, shipped to `main`, and validated live on guest 9201. (Phase 1, the deploy-side double-nest GATE, shipped earlier as v0.52.0; see git history.)

## The design decision that shaped Phase 2 (secret handling)

The recovery unit carries **no secrets, no data-keys, and not the Docker image**. This was decided after reading the *actual* hub code (the controller README that implied the hub stores app.yaml is stale pre-strip):

- The hub is deliberately **zero-knowledge**: it holds a per-host recovery-code-wrapped PBS key it cannot decrypt + non-secret config; **no per-app secrets**. Escrowing app secrets there would regress that posture, so it was rejected.
- `app.yaml` (encrypted) + the encryption key live on the **guest rootfs** (`local-lvm:vm-9201-disk-0`, confirmed via `pct config`) → already inside the **PBS whole-guest snapshot**; the external drive (`mp0` bind) is not. So the secret↔data split maps onto the tiers: **secrets ride PBS; bulk userdata rides the drive + (Phase 3) Tier 2.**
- Therefore: secret-free unit; restore recovers the original secrets from the guest's own app.yaml (live, else PBS); **regenerate nothing**. `data_key` is a fail-closed annotation, not a preserve/regenerate decision.

## What shipped (v0.53.0 + v0.53.1)

- **Unit layout** (rooted at the existing `backups/primary//`, a deliberate low-churn choice with no risky dump-dir migration): `compose/` (docker-compose.yml + .felhom.yml + a **secret-stripped** app.yaml) + the existing `db-dumps/` + `volume-dumps/` + `manifest.json`. New helpers `RecoveryUnit{Path,ComposePath,ManifestPath}` (`internal/appbackup/paths.go`).
- **Secret-free manifest** (`internal/backup/recovery_unit.go`): app id, display name, controller version, timestamp, drive, namespace root, **image pins** (image NOT stored; it is re-pulled on restore), the **NAMES** of secret env vars (values never stored), `data_key` env-var names, an explicit `secret_source` note, captured config-file list, enumerated dumps, sha256 checksums.
- **Capture needs no secret access:** non-secret env is plaintext in app.yaml, so the capture excludes secret-named keys (plus a defensive `crypto.IsEncrypted` guard) and reads no secret value. New `StackDataProvider.GetStackRecoveryInfo` + `RecoveryInfo`, implemented by the main.go `stackAdapter`; `ParseComposeImages` extracts pins.
- **`data_key`**: `DeployField.DataKey` + `Metadata.DataKeyEnvVars()`; catalog `adventurelog/.felhom.yml` `SECRET_KEY` ("Titkosítási kulcs", Hungarian for "Encryption key") marked `data_key: true`.
- **Refresh cadence (v0.53.1):** capture runs from the daily DB dump AND the periodic `RefreshCache` (startup + every 5m). It is **idempotent**: content is built in memory and writes are skipped when the unit is already current (checksum + dump-set + version), so a spinning USB drive is not thrashed.
- **Tests:** capture is secret-free (a secret in the source app.yaml never appears in the unit) + manifest structure + idempotency (unchanged → skip; config change → rewrite). `go build ./...` clean.

## Deploy mechanism (resolved this session)

The controller in guest 9201 is **golden/bootstrap-managed**: `felhom-controller-bootstrap.service` runs `/usr/local/sbin/felhom-controller-bootstrap.sh`, which `docker run`s the tag from `/etc/felhom-controller-image` (gitea anon-pull, no login). Deploy = build+push tag → anon-pull → update that tag file → `systemctl restart felhom-controller-bootstrap.service`. Data volume + encryption key persist. (This is what "self-update handles version drift" refers to.)

## Live validation (guest 9201, demo-felhom)

- Deployed v0.53.1; on startup `RefreshCache` captured units: **romm** (`images=3, secrets-referenced=3, data_keys=0`) and **actualbudget** (`images=1`, system-fallback path `…/sys_drive/felhom-data/…`).
- RomM unit on disk: `compose/{app.yaml,docker-compose.yml,.felhom.yml}` + `db-dumps/romm-mariadb.sql` + `manifest.json`. Manifest is secret-free (image pins + secret NAMES + `secret_source`); captured app.yaml holds only DOMAIN/HDD_PATH/SUBDOMAIN with the three secret names listed as stripped.
- **Secret-leak grep against the three actual RomM secret values → `NO_LEAK`.** Idempotency confirmed (single capture log line; the 5m refresh skips).

## Not done: Phase 2b (the immediate next increment)

Three items remain:

- The restore-from-unit **recreate** (write compose/config back → re-pull image from pins → recover secrets from the guest's app.yaml, live or via PBS → restore DB+volumes+userdata → boot).
- The **fail-closed `data_key` gate** (refuse + warn if an encrypted app's key is unrecoverable).
- The live **AdventureLog readable-data** validation (deploy with an encryption key → back up → recreate → confirm data decrypts).

The existing `RestoreApp` still does the live-guest volume-tar restore. The README backup-paths section still describes the stale restic/secondary layout; it will be rewritten when Tier 2 (Phase 3) lands.