v0.53.0: Phase 2 capture side — per-app secret-free recovery unit

Each app's on-drive backup becomes a self-contained, recreatable recovery unit:
compose/ (docker-compose.yml + .felhom.yml + secret-stripped app.yaml) alongside
the existing db-dumps/ + volume-dumps/, plus a secret-free manifest.json (image
pins, secret env-var NAMES, data_key names, checksums). The unit stores no secret
value, no data-key, and not the image — secrets are recovered at restore from the
guest's own app.yaml (live/PBS), never regenerated.

- appbackup: RecoveryUnit* path helpers, RecoveryInfo + GetStackRecoveryInfo,
  ParseComposeImages; AppDBDump/Volume refactored onto RecoveryUnitPath.
- backup: recovery_unit.go (manifest + CaptureRecoveryUnit), wired into RunDBDumps;
  capture test proves the unit is secret-free.
- stacks: DeployField.DataKey + Metadata.DataKeyEnvVars(); main.go stackAdapter
  implements GetStackRecoveryInfo (excludes secret-named + encrypted values).
- Restore-from-unit recreate + fail-closed gate + live AdventureLog validation: next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:20:37 +02:00
parent 5eb25c3861
commit 70eb521cd0
9 changed files with 586 additions and 3 deletions
@@ -1,5 +1,40 @@
## Changelog
### v0.53.0 — Phase 2: per-app self-contained recovery unit (capture side, SECRET-FREE) (2026-06-13)
Each app's on-drive backup becomes a complete, recreatable **recovery unit** — not just DB dumps +
volume tars, but the app's *definition* too, so it can be recreated. The unit is **secret-free by
design** (decided after reading the actual hub code: the hub is deliberately zero-knowledge and holds
no app secrets; app.yaml + the encryption key live on the guest rootfs → already inside the PBS
whole-guest snapshot). Secrets/data-keys are recovered at restore from the guest's own app.yaml (live,
or via PBS) — **never stored in the unit, never regenerated**.
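The secret-free rule can be illustrated with a minimal Go sketch. It is not the project's capture code: `stripSecrets`, the `encPrefix` marker, and the sample env are hypothetical, and the prefix check merely stands in for the real `crypto.IsEncrypted` guard, whose logic is not shown in this changelog. The sketch assumes the secret env-var names are already known from the app's metadata.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// encPrefix is a hypothetical marker for an already-encrypted value. The
// project uses crypto.IsEncrypted for this; its real check is not shown here.
const encPrefix = "enc:"

// stripSecrets returns a copy of env that omits every secret-named key and
// every value that looks encrypted. It also returns the sorted NAMES of the
// secret keys it found, so a manifest can list names without any values.
func stripSecrets(env map[string]string, secretNames []string) (map[string]string, []string) {
	secret := make(map[string]bool, len(secretNames))
	for _, n := range secretNames {
		secret[n] = true
	}

	safe := make(map[string]string)
	var names []string
	for k, v := range env {
		if secret[k] {
			names = append(names, k) // keep the name, drop the value
			continue
		}
		if strings.HasPrefix(v, encPrefix) {
			continue // defensive: never copy an encrypted-looking value
		}
		safe[k] = v
	}
	sort.Strings(names)
	return safe, names
}

func main() {
	env := map[string]string{
		"SECRET_KEY":   "s3cr3t",
		"SITE_URL":     "https://example.invalid",
		"LEGACY_TOKEN": "enc:abcdef",
	}
	safe, names := stripSecrets(env, []string{"SECRET_KEY"})
	fmt.Println("stored env:", safe)
	fmt.Println("secret names:", names)
}
```

Only the stripped map would be written into the unit's `compose/` copy of app.yaml; the secret values stay on the guest.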
- **Unit layout** (rooted at the existing `backups/primary/<app>/` — no risky dump-dir migration):
`compose/` (docker-compose.yml + .felhom.yml + a **secret-stripped** app.yaml) + the existing
`db-dumps/` + `volume-dumps/` + `manifest.json`. New path helpers `RecoveryUnitPath` /
`RecoveryUnitComposePath` / `RecoveryUnitManifestPath` in `internal/appbackup/paths.go`
(`AppDBDumpPath`/`AppVolumeDumpPath` refactored onto `RecoveryUnitPath` — identical resolved paths).
- **Secret-free manifest** (`internal/backup/recovery_unit.go`): app id, display name, controller
version, timestamp, drive, namespace root, pinned **image tags** (image NOT stored — re-pulled on
restore), the **NAMES** of secret env vars (values never stored), the `data_key` env-var names, the
explicit `secret_source` note ("guest app.yaml (live) or PBS — never stored in this unit"), captured
config-file list, enumerated dumps, and sha256 checksums of the captured config.
- **Capture has no secret access:** non-secret env is plaintext in app.yaml; the capture simply excludes
the secret-named keys (plus a defensive `crypto.IsEncrypted` guard), so it reads no secret value. New
`StackDataProvider.GetStackRecoveryInfo` + `RecoveryInfo` (in `appbackup`), implemented by the main.go
`stackAdapter`; `ParseComposeImages` extracts the image pins.
- **`data_key` annotation** (`DeployField.DataKey`, `Metadata.DataKeyEnvVars()`): marks a
data-encrypting key (e.g. AdventureLog's "Titkosítási kulcs", `SECRET_KEY`) — a **fail-closed** safety
annotation for restore (refuse + warn rather than regenerate-and-corrupt), NOT a per-secret
preserve/regenerate decision. Catalog: `adventurelog/.felhom.yml` `SECRET_KEY` marked `data_key: true`.
- **Wired into the dump flow:** `RunDBDumps` refreshes every deployed app's recovery unit after the DB
dumps (best-effort per app; skips disconnected/decommissioned drives). Capture test
(`recovery_unit_test.go`) proves the unit is secret-free (a secret in the source app.yaml never
appears in the unit) and verifies the manifest structure.
- **NOT in this increment (next):** the restore-from-unit *recreate* (re-pull + compose-up + secret
recovery from guest/PBS) and its fail-closed `data_key` gate, with live AdventureLog readable-data
validation. The README backup-paths section (stale restic/secondary) will be rewritten when Tier 2 lands.
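As a rough illustration of the secret-free manifest described in the list above, the following Go sketch builds a manifest with image pins, secret env-var names, `data_key` names, and sha256 checksums of the captured config files. The struct, its JSON field names, `buildManifest`, and the sample image tag are assumptions made for this example; the actual schema lives in `internal/backup/recovery_unit.go` and may differ. The `secret_source` text paraphrases the note quoted above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Manifest is a hypothetical, reduced version of the unit's manifest.json.
// It has no field that could carry a secret value, only names.
type Manifest struct {
	AppID           string            `json:"app_id"`
	Images          []string          `json:"images"`             // pinned tags; the image itself is not stored
	SecretEnvNames  []string          `json:"secret_env_names"`   // names only
	DataKeyEnvNames []string          `json:"data_key_env_names"` // data-encrypting keys, names only
	SecretSource    string            `json:"secret_source"`
	Checksums       map[string]string `json:"checksums"` // file name -> hex sha256
}

// buildManifest hashes each captured config file and assembles the manifest.
func buildManifest(appID string, images, secretNames, dataKeys []string, files map[string][]byte) Manifest {
	sums := make(map[string]string, len(files))
	for name, content := range files {
		sum := sha256.Sum256(content)
		sums[name] = hex.EncodeToString(sum[:])
	}
	return Manifest{
		AppID:           appID,
		Images:          images,
		SecretEnvNames:  secretNames,
		DataKeyEnvNames: dataKeys,
		SecretSource:    "guest app.yaml (live) or PBS; never stored in this unit",
		Checksums:       sums,
	}
}

func main() {
	m := buildManifest(
		"adventurelog",
		[]string{"example/adventurelog:1.0.0"}, // placeholder image pin
		[]string{"SECRET_KEY"},
		[]string{"SECRET_KEY"},
		map[string][]byte{"docker-compose.yml": []byte("services: {}\n")},
	)
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the type free of any value-bearing secret field makes the secret-free property structural rather than a matter of discipline at each call site.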
### v0.52.0 — Phase 1 GATE: deploy-side double-nest fix + path-agreement lock (2026-06-13)
Completes the Model-A double-nest reconciliation deferred in v0.48.0. v0.51.0 fixed the **backup