docs: Phase 2b fail-closed gate LIVE-validated on AdventureLog

The demo has no dashboard password, so the API is open (auth and CSRF are both skipped in
that mode); the test was driven via the public URL. AdventureLog's unit manifest carries
data_key_env_vars=[SECRET_KEY] (catalog->manifest flow confirmed live); with SECRET_KEY
unrecoverable, POST /backup/restore was REFUSED with the exact fail-closed message before
any compose-up. The full deploy-with-data e2e is blocked by the 8G guest rootfs
(AdventureLog's images are too big: the Phase 3 concern, now observed live).
CHANGELOG/REPORT/CONTEXT updated; demo left clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 12:35:08 +02:00
parent 1ed20c7069
commit d8fe8f5ead
3 changed files with 29 additions and 14 deletions
+11 -5
@@ -22,11 +22,17 @@ from secrets stored in the unit (there are none), and **regenerating nothing**.
- **Tests:** the gate (all recovered / data-key missing → refuse / empty data-key → refuse / resettable
missing → proceed+warn, recovered values used verbatim) and `data_key` parsing from `.felhom.yml`
(`Metadata.DataKeyEnvVars()`).
- **Validation status:** the gate + reconciliation + data_key parsing are unit-tested (authoritative for
the refuse/proceed/regenerate-nothing behaviour); the capture side is live-validated (v0.53.1, RomM).
The full live **readable-data e2e** against AdventureLog (deploy → back up → restore → confirm the
data decrypts) requires triggering the **auth-gated** `/backup/restore` from the dashboard — pending an
operator-run on the demo.
- **Live-validated on guest 9201 (AdventureLog, a real data_key app):** its recovery-unit manifest
correctly carries `data_key_env_vars: [SECRET_KEY]` (catalog→metadata→manifest flow proven live); and
with `SECRET_KEY` made unrecoverable, `POST /backup/restore` **refused** with the exact fail-closed
message ("…[SECRET_KEY] could not be recovered … a PBS whole-guest restore is required first…"),
**before any compose-up** (no side effects). The demo has no dashboard password, so the API is open
(auth + CSRF are both skipped in that mode) — this was driven via the public URL. Gate + reconciliation
+ orchestration + data_key parsing are also unit-tested.
- **One e2e not run (environment limit, not a code gap):** the full "deploy with data → restore →
confirm data decrypts" — AdventureLog's images don't fit the **8 GB guest rootfs** (the deploy hit "no
space left on device"). This is exactly the Phase 3 rootfs-headroom concern, now observed live.
Key-preservation/regenerate-nothing is covered by the gate's verbatim-recovery unit test.
### v0.53.1 — Phase 2: recovery units refresh on the periodic cache cycle (idempotent) (2026-06-13)