v0.54.0: Phase 2b — restore-from-recovery-unit + fail-closed data-key gate
Restore recreates an app from its on-drive unit plus the guest's own secrets, regenerating nothing.
reconcileRestoreSecrets (pure, unit-tested) merges the unit's non-secret env with secrets recovered from
the live app.yaml and FAILS CLOSED if a data-encrypting key is unrecoverable: it refuses (a PBS
whole-guest restore is needed) rather than regenerating the key and corrupting data. Resettable secrets
missing → warn + proceed.

- backup: RestoreFromRecoveryUnit (manifest -> recover secrets -> gate -> restore volumes -> recreate
  definition + redeploy w/ re-pull); falls back to volume-only.
- seams: RecoverStackSecrets/RecreateStackFromUnit (adapter +encKey), stacks.RedeployFromEnv. Wired into
  /backup/restore.
- tests: gate (refuse/proceed/verbatim) + data_key parsing.

Gate + reconcile + data_key parsing unit-tested; capture live-validated (v0.53.1). Full readable-data
e2e vs AdventureLog needs the auth-gated dashboard restore, which is pending.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -1,5 +1,33 @@
## Changelog
### v0.54.0 — Phase 2b: restore-from-recovery-unit + fail-closed data-key gate (2026-06-13)
Restore now recreates an app from its on-drive recovery unit **plus the guest's own secrets**. The unit
itself stores no secrets, and restore **regenerates nothing**.

- **Fail-closed data-key gate** (`reconcileRestoreSecrets`, `internal/backup/restore_unit.go`; a pure,
  exhaustively unit-tested function): merges the unit's non-secret env with the secret values recovered
  from the guest's live app.yaml. A missing or empty **data-encrypting key** (`data_key`) **aborts the
  restore** with a clear message (a PBS whole-guest restore is required), because regenerating it would
  render stored data unreadable. A missing *resettable* secret (DB/admin password) is non-fatal (warn +
  proceed; the app may need a credential reset). Secrets are recovered, never regenerated.
- **`RestoreFromRecoveryUnit`**: reads the unit manifest → recovers secrets from the guest
  (`RecoverStackSecrets`) → applies the gate → restores named-volume data from the unit's tars →
  recovers the app definition from the unit and redeploys with the reconstructed env (re-pulling the
  pinned image). Falls back to the legacy volume-only `RestoreApp` if no unit exists. Wired into the
  `/backup/restore` web handler.
- **New seams:** `StackDataProvider.RecoverStackSecrets` / `RecreateStackFromUnit` (main.go
  `stackAdapter`, with the controller `encKey` for decrypting the live app.yaml);
  `stacks.Manager.RedeployFromEnv` (writes app.yaml from the full env incl. locked secrets, then
  `compose up -d`).
- **Tests:** the gate (all recovered / data-key missing → refuse / empty data-key → refuse / resettable
  missing → proceed+warn, recovered values used verbatim) and `data_key` parsing from `.felhom.yml`
  (`Metadata.DataKeyEnvVars()`).
- **Validation status:** the gate, reconciliation, and `data_key` parsing are unit-tested (authoritative
  for the refuse/proceed/regenerate-nothing behaviour); the capture side is live-validated (v0.53.1,
  RomM). The full live **readable-data e2e** against AdventureLog (deploy → back up → restore → confirm
  the data decrypts) requires triggering the **auth-gated** `/backup/restore` from the dashboard, which
  is pending an operator run on the demo.
### v0.53.1 — Phase 2: recovery units refresh on the periodic cache cycle (idempotent) (2026-06-13)
The recovery-unit capture now also runs from `RefreshCache` (controller startup + every 5m), not only