docs: Phase 2b fail-closed gate LIVE-validated on AdventureLog
Demo has no dashboard password (API open: auth and CSRF checks are both skipped in that mode), so it was driven via the public URL. AdventureLog's unit manifest carries data_key_env_vars=[SECRET_KEY] (catalog->manifest, live); with SECRET_KEY unrecoverable, POST /backup/restore was REFUSED with the exact fail-closed message before any compose-up. The full deploy-with-data e2e is blocked by the 8G guest rootfs (AdventureLog images are too big: the Phase 3 concern, observed live). CHANGELOG/REPORT/CONTEXT updated; demo left clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
+11 -5
@@ -22,11 +22,17 @@ from secrets stored in the unit (there are none), and **regenerating nothing**.
 - **Tests:** the gate (all recovered / data-key missing → refuse / empty data-key → refuse / resettable
 missing → proceed+warn, recovered values used verbatim) and `data_key` parsing from `.felhom.yml`
 (`Metadata.DataKeyEnvVars()`).
-- **Validation status:** the gate + reconciliation + data_key parsing are unit-tested (authoritative for
-the refuse/proceed/regenerate-nothing behaviour); the capture side is live-validated (v0.53.1, RomM).
-The full live **readable-data e2e** against AdventureLog (deploy → back up → restore → confirm the
-data decrypts) requires triggering the **auth-gated** `/backup/restore` from the dashboard — pending an
-operator-run on the demo.
+- **Live-validated on guest 9201 (AdventureLog, a real data_key app):** its recovery-unit manifest
+correctly carries `data_key_env_vars: [SECRET_KEY]` (catalog→metadata→manifest flow proven live); and
+with `SECRET_KEY` made unrecoverable, `POST /backup/restore` **refused** with the exact fail-closed
+message ("…[SECRET_KEY] could not be recovered … a PBS whole-guest restore is required first…"),
+**before any compose-up** (no side effects). The demo has no dashboard password, so the API is open
+(auth + CSRF are both skipped in that mode) — this was driven via the public URL. Gate + reconciliation
++ orchestration + data_key parsing are also unit-tested.
+- **One e2e not run (environment limit, not a code gap):** the full "deploy with data → restore →
+confirm data decrypts" — AdventureLog's images don't fit the **8 GB guest rootfs** (the deploy hit "no
+space left on device"). This is exactly the Phase 3 rootfs-headroom concern, now observed live.
+Key-preservation/regenerate-nothing is covered by the gate's verbatim-recovery unit test.
 
 ### v0.53.1 — Phase 2: recovery units refresh on the periodic cache cycle (idempotent) (2026-06-13)
 
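The gate rules listed under **Tests** (missing or empty data-key → refuse; missing resettable secret → proceed with a warning; recovered values used verbatim) fit a small pure function. The Go sketch below is a hypothetical re-creation for illustration only, not the project's `reconcileRestoreSecrets`: the `Decision` type, the `reconcileSketch` name, the split into `dataKeys` and `resettable` name lists, the `EMAIL_HOST_PASSWORD` example variable and the refusal wording are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Decision is the outcome of this hypothetical restore gate.
type Decision struct {
	Proceed  bool              // false means the restore is refused
	Env      map[string]string // env to redeploy with: recovered values, copied verbatim
	Warnings []string          // resettable secrets that could not be recovered
	Reason   string            // refusal reason when Proceed is false
}

// reconcileSketch (hypothetical) applies the fail-closed rules:
//   - every data-key var must be recovered AND non-empty, otherwise refuse;
//   - a missing resettable secret only produces a warning;
//   - nothing is regenerated: recovered values pass through unchanged.
func reconcileSketch(recovered map[string]string, dataKeys, resettable []string) Decision {
	var missing []string
	for _, k := range dataKeys {
		if v, ok := recovered[k]; !ok || v == "" {
			missing = append(missing, k)
		}
	}
	if len(missing) > 0 {
		sort.Strings(missing)
		// Illustrative wording only; %v renders the slice as [SECRET_KEY].
		return Decision{Reason: fmt.Sprintf("data-key env vars %v could not be recovered", missing)}
	}
	d := Decision{Proceed: true, Env: make(map[string]string, len(recovered))}
	for k, v := range recovered {
		d.Env[k] = v // verbatim copy, no regeneration
	}
	for _, k := range resettable {
		if _, ok := recovered[k]; !ok {
			d.Warnings = append(d.Warnings, k)
		}
	}
	return d
}

func main() {
	// Empty data-key value: refused.
	d := reconcileSketch(map[string]string{"SECRET_KEY": ""}, []string{"SECRET_KEY"}, nil)
	fmt.Println(d.Proceed, d.Reason) // false data-key env vars [SECRET_KEY] could not be recovered

	// Data key recovered, resettable secret missing: proceed with a warning.
	d = reconcileSketch(map[string]string{"SECRET_KEY": "k"}, []string{"SECRET_KEY"}, []string{"EMAIL_HOST_PASSWORD"})
	fmt.Println(d.Proceed, d.Warnings) // true [EMAIL_HOST_PASSWORD]
}
```

Keeping the gate free of I/O is what makes the refuse/proceed table cheap to unit-test exhaustively.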
+7 -2
@@ -34,8 +34,13 @@ Last updated: 2026-06-12 (storage UX polish)
 > `stacks.RedeployFromEnv`), regenerating nothing. `reconcileRestoreSecrets` (pure, unit-tested) is the
 > fail-closed gate: missing/empty data-key → REFUSE (needs PBS whole-guest restore); missing resettable
 > secret → warn+proceed. Wired into `/backup/restore`. Gate + orchestration + data_key parsing
-> unit/integration-tested; deployed v0.54.0 healthy. **PENDING:** live readable-data e2e vs AdventureLog
-> needs the auth-gated dashboard restore (no web cred in bootstrap.json) — operator-run.
+> unit/integration-tested; deployed v0.54.0 healthy.
+> - **LIVE-validated (9201, AdventureLog):** unit manifest `data_key_env_vars:[SECRET_KEY]`
+> (catalog→manifest live); with SECRET_KEY made unrecoverable, `POST /backup/restore` REFUSED with the
+> exact fail-closed message BEFORE any compose-up. Demo has NO dashboard password → API open (auth+CSRF
+> skipped), driven via public URL. NOTE: full deploy-with-data→restore e2e blocked because AdventureLog
+> images don't fit the 8G guest rootfs ("no space left") — that's the Phase 3 rootfs-headroom concern
+> seen live. Demo left clean (AdventureLog reverted to not-deployed).
 > - Next: Phase 3 (Tier 2 auto off-drive, rootfs-headroom guard), Phase 4 (FileBrowser + UI).
 >
 > **2026-06-13 — v0.52.0 Phase 1 GATE: deploy-side double-nest fix (catalog) + path-agreement test:**
@@ -71,15 +71,19 @@ persist. (This is what "self-update handles version drift" refers to.)
 →proceed, values used verbatim), the full orchestration (success→recreate-with-merged-env;
 data-key-missing→refused, recreate never called), and `data_key` parsing from `.felhom.yml`.
 
-## Validation status (honest)
+## Validation status
 - **Unit/integration-tested (authoritative):** the fail-closed gate, the restore orchestration, secret
 reconciliation (regenerate-nothing), and the catalog→metadata `data_key` flow.
-- **Live-validated:** the capture side (v0.53.1, RomM — secret-free unit, NO_LEAK grep); v0.54.0 deployed
-+ healthy + capture regression clean.
-- **PENDING (auth-gated):** the full live **readable-data e2e** vs AdventureLog (deploy with an
-encryption key → back up → restore → confirm data decrypts) needs triggering the session-authed
-`/backup/restore` from the dashboard. `bootstrap.json` carries no web credential and the password is a
-bcrypt hash, so this needs an operator-run (or the demo dashboard password).
+- **Live-validated (guest 9201):** the capture side (v0.53.1, RomM — secret-free, NO_LEAK). For Phase 2b
+on **AdventureLog** (a real data_key app): its unit manifest carries `data_key_env_vars: [SECRET_KEY]`
+(catalog→manifest flow live); and with `SECRET_KEY` made unrecoverable, `POST /backup/restore`
+**refused** with the exact fail-closed message **before any compose-up** (no side effects). The demo
+has no dashboard password → the API is open (auth + CSRF skipped), driven via the public URL.
+- **One e2e not run — environment limit, not a code gap:** the full "deploy with data → restore →
+confirm decrypts" — AdventureLog's images do not fit the **8 GB guest rootfs** (deploy hit "no space
+left on device"). That is precisely the Phase 3 rootfs-headroom concern, now observed live.
+Key-preservation is covered by the gate's verbatim-recovery unit test. Demo left clean (AdventureLog
+reverted to not-deployed, no leftovers).
 
 ## Still ahead
 Phase 3 (auto off-drive Tier 2 with rootfs-headroom guard) and Phase 4 (FileBrowser scoping + deploy-UI
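The catalog→metadata `data_key` flow ends in a manifest entry such as `data_key_env_vars: [SECRET_KEY]`. A minimal sketch of deriving that list, assuming `.felhom.yml` has already been decoded into a struct with a per-variable `DataKey` flag: the `appMetadata` and `EnvSpec` types and the `ADMIN_PASSWORD` example are hypothetical, and neither the real `Metadata.DataKeyEnvVars()` nor the file's schema appears in this commit.

```go
package main

import (
	"fmt"
	"sort"
)

// EnvSpec is a hypothetical decoded entry for one env var in .felhom.yml.
type EnvSpec struct {
	Secret  bool // generated/stored secret
	DataKey bool // encrypts persistent data: losing it makes the data unreadable
}

// appMetadata is a hypothetical decoded .felhom.yml.
type appMetadata struct {
	Env map[string]EnvSpec
}

// DataKeyEnvVars returns the env vars flagged as data keys, sorted so the
// manifest output is deterministic (Go map iteration order is randomized).
func (m appMetadata) DataKeyEnvVars() []string {
	var out []string
	for name, spec := range m.Env {
		if spec.DataKey {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	m := appMetadata{Env: map[string]EnvSpec{
		"SECRET_KEY":     {Secret: true, DataKey: true},
		"ADMIN_PASSWORD": {Secret: true}, // resettable: not a data key
	}}
	fmt.Println(m.DataKeyEnvVars()) // [SECRET_KEY]
}
```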