hub: opaque PBS recovery-code escrow storage (v0.8.0) + doc 03 §8a posture model

Slice-7 close-out (hub half). PUT /api/v1/hosts/{host_id}/escrow (per-host key)
stores the agent's OPAQUE R-wrapped blob verbatim against the host; the hub never
decrypts it (no recovery code, no decrypt path). host_escrow table + Save/GetHostEscrow.
Tests: verbatim store, rotation last-write-wins, 401/403/400 auth+body, wire contract.

doc 03 §8a rewritten into the key-custody posture model: separation principle,
topology matrix, default + anti-lockout ladder, SSH-vs-key, breach/legal, integrity
caveat. Corrected: hub opaque storage is slice 7 (this task); serving is slice 10.
Slice table + §13 updated.

No secrets committed (R/K never appear; spike findings + docs use placeholders).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 07:46:33 +02:00
parent fe7d0850a5
commit 7eb3772000
6 changed files with 372 additions and 72 deletions
+36 -34
@@ -4,49 +4,51 @@
---
# REPORT — Hub: restore-test "passed with warnings" visibility (v0.7.5) (2026-06-09)
# REPORT — Slice 7 close-out: PBS escrow — hub opaque storage + doc 03 §8a (v0.8.0) (2026-06-10)
## Outcome
**Phase B (hub half) of `TASK — Restore-test must not false-fail on benign start warnings`.**
The agent (v0.7.0, already deployed + live-validated) now treats a benign guest-start advisory
(e.g. `WARN: Systemd 257 detected. You may need to enable nesting.`) as a PASS — verdict is
liveness, not the start-task exitstatus — and carries the warning text on the wire. This is the
hub half: ingest those fields and make a passed-with-warnings restore-test visible to the
operator instead of indistinguishable from a clean pass.
The `felhom.eu` half of `TASK — Slice 7 close-out: PBS recovery-code escrow`. The agent
(felhom-agent v0.9.0) creates an **opaque** `R`-wrapped copy of the PBS key in the zero-knowledge
default; this slice adds the **hub opaque storage** for that blob and rewrites **doc 03 §8a** into a
full key-custody posture model. The wrap→recover→restore round-trip was proven on a throwaway first
(`documentation/tests/slice7-escrow-spike-findings.md`).
## What landed (`hub/internal/api/handler.go`, golden, `host_test.go`)
## What landed (hub v0.8.0)
- **Wire mirror:** `hostRestoreTest` gains `warnings []string` + `warnings_recognized bool`
(`omitempty`), matching the agent's `hub.RestoreTest` field-for-field. An absent
`warnings_recognized` decodes as `false` ⇒ the **louder** unrecognized path, so a missing flag can only
over-notice, never hide a real warning.
- **Ingest behaviour:** a passed restore-test that carried warnings now logs
`[INFO] restore-test passed WITH WARNINGS (recognized)` when every warning is the known-benign
anchor, escalated to `[WARN] … UNRECOGNIZED WARNINGS` otherwise (as loud as a failed PBS
verify). A FAILED restore-test still logs the existing `[WARN] … FAILED`.
- **Contract:** `restore_tests[0]` in the host-report golden gains the two keys; the golden stays
**byte-identical** with felhom-agent's copy (sha256 `e6999d77…`), and the bidirectional
key-set contract test round-trips the new keys through `hostRestoreTest`. `go test ./...` green.
- **`PUT /api/v1/hosts/{host_id}/escrow`** (`internal/api/handler.go`) — per-host-key authed (a host
writes only its own escrow; global operator key also accepted). Decodes the base64 blob and stores
the **opaque bytes verbatim** against the host. The hub **never decrypts** — there is no decrypt
path; it has no recovery code. Rotation is last-write-wins.
- **`host_escrow`** table + `SaveHostEscrow`/`GetHostEscrow` (`internal/store`). Blob is ciphertext.
- **Contract:** `escrowUploadRequest` mirrors the agent's emit struct (`blob_b64`, `key_fingerprint`,
`posture`, `created_at`); a key-set test in each repo guards drift.
- **Tests:** stores the blob byte-identical; rotation last-write-wins; 401 (absent/wrong key), 403
(host writing another host's escrow), 400 (bad base64); contract key-set. `go test ./...` green.
## Scope note — no dashboard widget this slice
## Documentation (doc 03 §8a)
The task asked to "surface in the dashboard distinctly from a clean pass." The hub web layer
currently renders **only controller-report data** — there is no host-domain dashboard surface
yet (guests/storage/restore_tests/pbs_snapshots are log+persist only; the failed-PBS-verify
signal is likewise log-only). Building one is out of scope here; distinct dashboard treatment
should land with the host-domain dashboard (slice 10). The operator signal this slice is the log
line, consistent with the established failed-PBS-verify precedent.
Rewrote §8a into the **key-custody posture model**: the **separation principle** (reading data needs
both chunks *and* a key; zero-knowledge holds while Felhom never holds both), the **topology matrix**
(data location × key custody → who can read; the one dangerous cell flagged), the **default**
(Felhom storage + customer-only key; `R` printed durably), the **anti-lockout ladder** ((b) wrapped
offline copy → (a) raw paperkey → Felhom-holds-a-key), **SSH-for-support is a separate grant** (not
coupled to key custody), **why zero-knowledge stays default** (breach + legal compellability), and
the **integrity caveat** for self-hosted-data postures. Corrected the storage-slice note: hub opaque
storage is **slice 7** (this task); only restore-mode **serving** is slice 10. §9 slice table + §13
updated.
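The separation principle above reduces to a two-input rule, sketched here under assumptions: the `posture` type and `felhomCanRead` are hypothetical names, and the two-by-two grid is a simplification of the topology matrix in doc 03 §8a, which may carry more rows or columns.

```go
package main

import "fmt"

// posture records what Felhom holds in a given deployment (hypothetical type).
type posture struct {
	felhomHoldsChunks bool // data stored on Felhom storage rather than self-hosted
	felhomHoldsKey    bool // Felhom holds a usable key rather than customer-only custody
}

// felhomCanRead applies the separation principle: reading needs chunks AND a key.
func felhomCanRead(p posture) bool {
	return p.felhomHoldsChunks && p.felhomHoldsKey
}

func main() {
	for _, chunks := range []bool{true, false} {
		for _, key := range []bool{false, true} {
			p := posture{felhomHoldsChunks: chunks, felhomHoldsKey: key}
			fmt.Printf("felhom storage=%-5v felhom key=%-5v -> felhom can read=%v\n",
				chunks, key, felhomCanRead(p))
		}
	}
}
```

Under this rule the default posture (Felhom storage, customer-only key) stays zero-knowledge, and the only readable combination is the one where Felhom holds both.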
## Backward compatibility
## Live validation
An agent that omits/empties `warnings`/`warnings_recognized` is accepted unchanged (the deployed
v0.7.4 hub already ignores them). The legacy controller report path is untouched.
After the v0.8.0 deploy, the demo agent's `--selftest=escrow-create -upload` PUT the opaque blob and
the hub stored it against the host; the stored bytes are **ciphertext** (not the key). The recovery
code `R` is never sent to or stored by the hub. *(No `R`/`K` value appears in any committed file.)*
## Deferred / security
Restore-mode serving + consumption → slice 10. The hub holds ciphertext only — possessing the blob
does not let Felhom read customer data (separation principle). No secrets committed.
## Deploy (GitOps)
Build+push `gitea.dooplex.hu/admin/felhom-hub:v0.7.5` → bump the `image:` tag in
`manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app (auto-sync off). Live-validated
after sync: the demo host's restore-test (agent v0.7.0, which passes-with-recognized-warnings on
the Debian-13 guest 9999) reflects on the hub as `passed WITH WARNINGS (recognized)` — not a
plain pass and not a FAILED.
Build+push `felhom-hub:v0.8.0` → bump `manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app.