From 7eb3772000c10d2c6b4b3478ec169721a16ecb2b Mon Sep 17 00:00:00 2001 From: kisfenyo Date: Wed, 10 Jun 2026 07:46:33 +0200 Subject: [PATCH] =?UTF-8?q?hub:=20opaque=20PBS=20recovery-code=20escrow=20?= =?UTF-8?q?storage=20(v0.8.0)=20+=20doc=2003=20=C2=A78a=20posture=20model?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Slice-7 close-out (hub half). PUT /api/v1/hosts/{host_id}/escrow (per-host key) stores the agent's OPAQUE R-wrapped blob verbatim against the host; the hub never decrypts it (no recovery code, no decrypt path). host_escrow table + Save/GetHostEscrow. Tests: verbatim store, rotation last-write-wins, 401/403/400 auth+body, wire contract. doc 03 §8a rewritten into the key-custody posture model: separation principle, topology matrix, default + anti-lockout ladder, SSH-vs-key, breach/legal, integrity caveat. Corrected: hub opaque storage is slice 7 (this task); serving is slice 10. Slice table + §13 updated. No secrets committed (R/K never appear; spike findings + docs use placeholders). Co-Authored-By: Claude Opus 4.8 (1M context) --- REPORT.md | 70 ++++++------ documentation/architecture/03-host-agent.md | 111 ++++++++++++++------ hub/CHANGELOG.md | 24 +++++ hub/internal/api/escrow_test.go | 111 ++++++++++++++++++++ hub/internal/api/handler.go | 70 +++++++++++- hub/internal/store/store.go | 58 ++++++++++ 6 files changed, 372 insertions(+), 72 deletions(-) create mode 100644 hub/internal/api/escrow_test.go diff --git a/REPORT.md b/REPORT.md index 28d6548..a0176db 100644 --- a/REPORT.md +++ b/REPORT.md @@ -4,49 +4,51 @@ --- -# REPORT — Hub: restore-test "passed with warnings" visibility (v0.7.5) (2026-06-09) +# REPORT — Slice 7 close-out: PBS escrow — hub opaque storage + doc 03 §8a (v0.8.0) (2026-06-10) ## Outcome -**Phase B (hub half) of `TASK — Restore-test must not false-fail on benign start warnings`.** -The agent (v0.7.0, already deployed + live-validated) now treats a benign guest-start advisory -(e.g. `WARN: Systemd 257 detected. You may need to enable nesting.`) as a PASS — verdict is -liveness, not the start-task exitstatus — and carries the warning text on the wire. This is the -hub half: ingest those fields and make a passed-with-warnings restore-test visible to the -operator instead of indistinguishable from a clean pass. +The `felhom.eu` half of `TASK — Slice 7 close-out: PBS recovery-code escrow`. The agent +(felhom-agent v0.9.0) creates an **opaque** `R`-wrapped copy of the PBS key in the zero-knowledge +default; this slice adds the **hub opaque storage** for that blob and rewrites **doc 03 §8a** into a +full key-custody posture model. The wrap→recover→restore round-trip was proven on a throwaway first +(`documentation/tests/slice7-escrow-spike-findings.md`). -## What landed (`hub/internal/api/handler.go`, golden, `host_test.go`) +## What landed (hub v0.8.0) -- **Wire mirror:** `hostRestoreTest` gains `warnings []string` + `warnings_recognized bool` - (`omitempty`), matching the agent's `hub.RestoreTest` field-for-field. An absent - `warnings_recognized` ⇒ `false` ⇒ the **louder** unrecognized path, so a missing flag can only - over-notice, never hide a real warning. -- **Ingest behaviour:** a passed restore-test that carried warnings now logs - `[INFO] restore-test passed WITH WARNINGS (recognized)` when every warning is the known-benign - anchor, escalated to `[WARN] … UNRECOGNIZED WARNINGS` otherwise (as loud as a failed PBS - verify). A FAILED restore-test still logs the existing `[WARN] … FAILED`. -- **Contract:** `restore_tests[0]` in the host-report golden gains the two keys; the golden stays - **byte-identical** with felhom-agent's copy (sha256 `e6999d77…`), and the bidirectional - key-set contract test round-trips the new keys through `hostRestoreTest`. `go test ./...` green. +- **`PUT /api/v1/hosts/{host_id}/escrow`** (`internal/api/handler.go`) — per-host-key authed (a host + writes only its own escrow; global operator key also accepted). Decodes the base64 blob and stores + the **opaque bytes verbatim** against the host. The hub **never decrypts** — there is no decrypt + path; it has no recovery code. Rotation is last-write-wins. +- **`host_escrow`** table + `SaveHostEscrow`/`GetHostEscrow` (`internal/store`). Blob is ciphertext. +- **Contract:** `escrowUploadRequest` mirrors the agent's emit struct (`blob_b64`, `key_fingerprint`, + `posture`, `created_at`); a key-set test in each repo guards drift. +- **Tests:** stores the blob byte-identical; rotation last-write-wins; 401 (absent/wrong key), 403 + (host writing another host's escrow), 400 (bad base64); contract key-set. `go test ./...` green. -## Scope note — no dashboard widget this slice +## Documentation (doc 03 §8a) -The task asked to "surface in the dashboard distinctly from a clean pass." The hub web layer -currently renders **only controller-report data** — there is no host-domain dashboard surface -yet (guests/storage/restore_tests/pbs_snapshots are log+persist only; the failed-PBS-verify -signal is likewise log-only). Building one is out of scope here; distinct dashboard treatment -should land with the host-domain dashboard (slice 10). The operator signal this slice is the log -line, consistent with the established failed-PBS-verify precedent. +Rewrote §8a into the **key-custody posture model**: the **separation principle** (reading data needs +both chunks *and* a key; zero-knowledge holds while Felhom never holds both), the **topology matrix** +(data location × key custody → who can read; the one dangerous cell flagged), the **default** +(Felhom storage + customer-only key; `R` printed durably), the **anti-lockout ladder** ((b) wrapped +offline copy → (a) raw paperkey → Felhom-holds-a-key), **SSH-for-support is a separate grant** (not +coupled to key custody), **why zero-knowledge stays default** (breach + legal compellability), and +the **integrity caveat** for self-hosted-data postures. Corrected the storage-slice note: hub opaque +storage is **slice 7** (this task); only restore-mode **serving** is slice 10. §9 slice table + §13 +updated. -## Backward compatibility +## Live validation -An agent that omits/empties `warnings`/`warnings_recognized` is accepted unchanged (the deployed -v0.7.4 hub already ignores them). The legacy controller report path is untouched. +After the v0.8.0 deploy, the demo agent's `--selftest=escrow-create -upload` PUT the opaque blob and +the hub stored it against the host; the stored bytes are **ciphertext** (not the key). The recovery +code `R` is never sent to or stored by the hub. *(No `R`/`K` value appears in any committed file.)* + +## Deferred / security + +Restore-mode serving + consumption → slice 10. The hub holds ciphertext only — possessing the blob +does not let Felhom read customer data (separation principle). No secrets committed. ## Deploy (GitOps) -Build+push `gitea.dooplex.hu/admin/felhom-hub:v0.7.5` → bump the `image:` tag in -`manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app (auto-sync off). Live-validated -after sync: the demo host's restore-test (agent v0.7.0, which passes-with-recognized-warnings on -the Debian-13 guest 9999) reflects on the hub as `passed WITH WARNINGS (recognized)` — not a -plain pass and not a FAILED. +Build+push `felhom-hub:v0.8.0` → bump `manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app. diff --git a/documentation/architecture/03-host-agent.md b/documentation/architecture/03-host-agent.md index 92cbc80..18cfe7e 100644 --- a/documentation/architecture/03-host-agent.md +++ b/documentation/architecture/03-host-agent.md @@ -192,44 +192,84 @@ per Part 1: **snapshot** (LVM-thin, transient, whole-guest rollback — not a ba necessity, not just convenience. Integrity-verify (cheap, ciphertext-level) runs more often as the lighter check. -### 8a. PBS recovery-code escrow (zero-knowledge offsite-key recovery) +### 8a. PBS recovery-code escrow + the key-custody posture model (zero-knowledge offsite-key recovery) -The DR substrate is the PBS offsite tier, and it is client-side encrypted (zero-knowledge): if the -box dies, restoring the offsite backups requires the **PBS client encryption key `K`**, which died -with the box. The escrow is how `K` comes back **without** Felhom ever being able to read customer -data. Design (decisions, with the rationale that pins them): +The DR substrate is the PBS offsite tier, client-side encrypted (zero-knowledge): if the box dies, +restoring the offsite backups requires the **PBS client encryption key `K`**, which died with the +box. The escrow is how `K` comes back **without** Felhom ever being able to read customer data. +**Status: implemented** — escrow *creation* (agent v0.9.0, `internal/escrow`) + hub *opaque storage* +(hub v0.8.0, `PUT /api/v1/hosts/{host_id}/escrow`). Validated end-to-end on a throwaway in +`documentation/tests/slice7-escrow-spike-findings.md`. Restore-mode *serving/consumption* is slice 10. +#### The separation principle (the rule that governs every posture) +Reading customer data needs **BOTH** the encrypted chunks **AND** a usable key. **Zero-knowledge +holds for exactly as long as Felhom never holds both at once.** Every posture below is just a +choice about where the data and the key live; the principle decides who can read. + +#### Topology matrix (data location × key custody → who can read) +| Data location | Key custody | Who can read | Notes | +|---|---|---|---| +| **Felhom storage** | customer-only key | **only the customer** | **the DEFAULT** — genuine zero-knowledge | +| **Felhom storage** | Felhom also holds a key | **Felhom can read** | the one dangerous cell — explicit, informed opt-in only; never default, never silent | +| Customer's own offsite | customer key | only the customer | self-hosted data; key XOR data | +| Customer's own offsite | Felhom holds a key | only the customer | safe by separation (key and data never co-located at Felhom) | + +#### The escrow mechanism (decisions + the rationale that pins them) - **Live key unencrypted on the box** (`0600`, root): the agent backs up *and* runs restore-tests - unattended — no passphrase prompt on the management path. The privilege concentration this - implies is the whole argument for §3 root-minimization + a small auditable agent. -- **Wrap mechanism — PBS-native, not custom crypto.** At enrollment the agent generates a - high-entropy **recovery code `R`** and produces a **passphrase-protected copy of `K` under `R`** - using PBS's own key passphrase KDF (`proxmox-backup-client key` family). *Decision: lean on PBS's - documented, battle-tested key+passphrase path; do not roll a bespoke AEAD wrap.* Host/customer - binding is provided at the hub-storage layer (blob keyed by host-id), not by custom crypto. + unattended — no passphrase prompt on the management path. The privilege concentration this implies + is the whole argument for §3 root-minimization + a small auditable agent. +- **Wrap — PBS-native, not custom crypto.** At enrollment the agent generates a high-entropy + **recovery code `R`** and produces a **passphrase-protected copy of `K` under `R`** via PBS's own + key passphrase KDF (`proxmox-backup-client key change-passphrase --kdf scrypt`; no bespoke AEAD). + The spike pinned two implementation constraints: that command is **TTY-only** (drive it over a + pty), and the pty **echoes the passphrase** (discard the pty output so `R` can't leak) — F-A1/F-A2. - **Agent-side generation.** `R` is generated **on the box** (it already holds `K` and does the - wrapping), so `R` never touches the hub even in transit — zero-knowledge by construction. -- **Escrow = the `R`-wrapped blob → hub.** The hub stores opaque ciphertext bound to the - host/customer. Without `R` it is undecryptable; the operator cannot read customer data. (Hub-side - storage schema for the blob is a slice-10 / doc-05 item.) -- **Recovery code custody.** `R` is shown to the customer **once** at enrollment (printed/displayed) - and **never stored by Felhom in recoverable form**. Format: a grouped/word-list code (≥128-bit - entropy) — it is transcribed off paper by a non-technical household, so raw base32 invites typos. -- **Consumption (slice 10, host-loss).** New box re-enrolls in restore mode → hub ships the escrow - blob → customer enters `R` → box unwraps `K` → PBS restores proceed. -- **Optional belt-and-suspenders (product decision, default OFF).** A PBS **paperkey** (the raw key, - for a safe) gives the customer a recovery path that survives *both* box loss *and* recovery-code - loss, at the cost of a higher-value secret (raw key on paper, no second factor). Default is - hub-escrow + `R` only; offer the paperkey as an opt-in "advanced" path. + wrapping), so `R` never touches the hub even in transit — zero-knowledge by construction. `R` is + ≥128 bits, **word-list form** (EFF large wordlist, 10 words ≈ 129 bits) for off-paper transcription. +- **Self-verify before shipping.** Creation unwraps a copy of the blob with `R` and checks the key + fingerprint matches — "an escrow you haven't recovered isn't an escrow." +- **Escrow = the `R`-wrapped blob → hub (opaque storage, slice 7).** The hub stores the ciphertext + bytes against the host record and **never decrypts them** (it has no `R`; there is no decrypt + path). Per-host-key authed; rotation is last-write-wins. **Restore-mode serving is slice 10.** +- **Recovery code custody.** `R` is surfaced to the customer **exactly once** at enrollment + (printed/displayed) and **never stored by Felhom in any recoverable form**. -**Properties stated for honesty (these go to the customer at enrollment):** +#### Default posture + the anti-lockout ladder (opt-in, increasing trust) +**Default:** *Felhom storage + customer-only key*, and **`R` is delivered durably (printed) always** +— note this is distinct from a raw-key paperkey: `R` is a safe two-factor *passphrase* (useless +without the hub's blob); the raw key is the footgun. The ladder trades resilience for trust: +- **(b) `R`-wrapped offline copy** — the same two-factor blob, for the customer to print/store. **No + extra trust**; resilience if the hub ever vanishes (still needs `R`). *Implemented (opt-in).* +- **(a) raw paperkey** — `proxmox-backup-client key paperkey` of the unwrapped key, for a safe. + Covers **losing `R`**, but it is **single-factor and unrevocable**. *Implemented (opt-in, loud + caveat).* +- **Felhom-holds-a-key** — maximum convenience, but **gives up zero-knowledge** (the dangerous + matrix cell). **Not implemented** — it needs a separate Felhom-side secure key store + explicit + opt-in UX, built only when a customer asks. + +#### SSH-for-support is a SEPARATE grant — deliberately not coupled to key custody +Support access (active / consented / observable — customer-toggleable, commands shown) is **not** +the same as a standing / passive / invisible decryption capability. The transparency features prove +*controlled* support access **without Felhom holding a key**. Conflating the two is exactly the +mistake the separation principle prevents. + +#### Why zero-knowledge stays the default (breach + legal) +Holding data **and** a key makes a single hub breach an **all-customer data leak**, and makes Felhom +**compellable** — a court can order what Felhom *can* produce. Genuine zero-knowledge means *"we +can't be forced to hand over what we can't read."* This is core to the sovereignty pitch, not a +nicety. + +#### Honesty properties (stated to the customer at enrollment) - **Irreducible residual:** losing `R` *and* the box (and, if not opted in, having no paperkey) = - the offsite backups are **unrecoverable, by anyone, including Felhom.** This is the cost of - genuine zero-knowledge and must be communicated, not buried. -- **Rotation ≠ key rotation:** rotating `R` re-wraps the escrow blob (and re-shows the customer a - new code) but does **not** re-encrypt existing PBS data — that data stays keyed by `K`. Changing - `K` itself is a separate, heavier operation (new key → new backups; old backups still need old - `K`) and is out of scope for routine recovery-code rotation. + the offsite backups are **unrecoverable, by anyone, including Felhom.** The cost of genuine + zero-knowledge — communicated, not buried. +- **Rotation ≠ key rotation:** rotating `R` re-wraps the escrow blob (and re-shows a new code) but + does **not** re-encrypt existing PBS data — that stays keyed by `K`. Changing `K` itself is a + separate, heavier op (new key → new backups; old backups still need old `K`), out of scope for + routine recovery-code rotation. +- **Integrity caveat (self-hosted-data postures):** moving data to the customer's own offsite + **loses Felhom's backup guarantees** — no PBS verify / monitoring on storage we can't reach. An + honest signup-time tradeoff, not a hidden one. ## 9. Provisioning & DR flows @@ -295,7 +335,7 @@ this path — bring up + reattach external storage and it is whole. This is full | Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit); golden archived at enrollment | | Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | **implemented** (agent v0.8.0, `internal/reconcile/bringup.go`) | | **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | **implemented** (v0.8.0, `dr_guest_loss` mode — continuity identity preserved) | -| PBS recovery-code escrow **creation** (§8a) | **7** | designed (§8a); implement | +| PBS recovery-code escrow **creation** + **hub opaque storage** (§8a) | **7** | **implemented** (agent v0.9.0 `internal/escrow`; hub v0.8.0 `PUT /hosts/{id}/escrow`) | | Provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8** | deferred — needs the controller-deploy path + agent↔controller local API (§6) | | **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / PBS namespace / tunnel token / storage manifest / restore directive | **10** | deferred — needs hub desired-state serving; hub store today holds only `{host_id, customer_id, api_key}` (slice 3) | | PBS escrow **consumption** (recover `K` on a new box) | **10** | deferred — exercised by host-loss DR | @@ -359,8 +399,9 @@ Still open: - **Golden base image** refresh cadence + fleet versioning — operational, non-blocking (§9). - **Identity-reset set** (live, link-up) — pinned empirically by the slice-7 bring-up spike; the scenario-specific policy is settled in §9, the exact field list is the spike's deliverable. -- **Hub-side escrow storage + restore-mode serving** — the blob's hub schema and the restore-mode - desired-state handover are slice-10 / doc-05 (§8a, §9 host-loss). +- **Escrow restore-mode serving / consumption** — handing the opaque blob back to a re-enrolling + box and unwrapping `K` with `R` is slice-10 / doc-05 (§8a, §9 host-loss). *Escrow creation + hub + opaque storage are done (slice 7).* This doc hands the implementation three contracts it was waiting on: diff --git a/hub/CHANGELOG.md b/hub/CHANGELOG.md index 9215e4b..14e19ea 100644 --- a/hub/CHANGELOG.md +++ b/hub/CHANGELOG.md @@ -1,5 +1,29 @@ # Felhom Hub — Changelog +## v0.8.0 — opaque PBS recovery-code escrow storage (slice 7, doc 03 §8a) (2026-06-10) + +Hub half of slice-7 close-out: store the agent's **opaque** `R`-wrapped PBS-key escrow blob. The +default posture is zero-knowledge — the hub holds ciphertext it **cannot open** (it has no recovery +code; there is no decrypt path). Pairs with felhom-agent v0.9.0 (escrow creation). Consumption / +restore-mode serving is slice 10. + +### Added +- **`PUT /api/v1/hosts/{host_id}/escrow`** — authed with the **per-host key** (a host may only write + its own escrow; the global operator key is also accepted). Body mirrors the agent's emit struct + (`blob_b64`, `key_fingerprint`, `posture`, `created_at`). Stores the decoded **opaque bytes + verbatim**; rotation is last-write-wins. No serving this slice. +- **`host_escrow`** table (`host_id` PK, `blob` BLOB, fingerprint/posture/created_at). Store methods + `SaveHostEscrow` / `GetHostEscrow` (`HostEscrow`). The hub never transforms or decrypts the blob. + +### Tests +- Stores the opaque blob **verbatim** (round-trips byte-identical); rotation last-write-wins; + rejects an absent/wrong key (401) and a host writing another host's escrow (403); bad/empty + base64 → 400; the wire-contract key-set matches the agent's emit struct. + +### Security note +The hub stores ciphertext only — holding the blob does NOT let Felhom read customer data +(separation principle, doc 03 §8a). The per-host-key gate scopes writes to the owning host. + ## v0.7.5 — restore-test "passed with warnings" visibility (2026-06-09) Hub half of `TASK — Restore-test must not false-fail on benign start warnings` (Phase B). The diff --git a/hub/internal/api/escrow_test.go b/hub/internal/api/escrow_test.go new file mode 100644 index 0000000..6aa6fec --- /dev/null +++ b/hub/internal/api/escrow_test.go @@ -0,0 +1,111 @@ +package api + +import ( + "encoding/base64" + "encoding/json" + "net/http" + "reflect" + "sort" + "testing" + + "gitea.dooplex.hu/admin/felhom-hub/internal/store" +) + +// a stand-in for the opaque R-wrapped blob — the hub treats it as ciphertext it cannot read. +var opaqueBlob = []byte("\x00\x01OPAQUE-pbs-scrypt-keyfile-bytes\xff\xfe") + +func escrowBody(blob []byte) string { + b, _ := json.Marshal(map[string]string{ + "blob_b64": base64.StdEncoding.EncodeToString(blob), + "key_fingerprint": "ab:cd:ef", + "posture": "zero_knowledge", + "created_at": "2026-06-10T05:00:00Z", + }) + return string(b) +} + +func TestHandleHostEscrow_StoresOpaqueBlobVerbatim(t *testing.T) { + h, st, _ := newTestHandler(t) + st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"}) + + rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", escrowBody(opaqueBlob)) + if rr.Code != http.StatusOK { + t.Fatalf("PUT escrow = %d, want 200 (%s)", rr.Code, rr.Body.String()) + } + got, err := st.GetHostEscrow("h1") + if err != nil || got == nil { + t.Fatalf("GetHostEscrow: %v / %v", got, err) + } + // the hub stored the OPAQUE bytes verbatim (it never decrypts / transforms them). + if !reflect.DeepEqual(got.Blob, opaqueBlob) { + t.Fatalf("stored blob != uploaded blob (hub must keep ciphertext verbatim)") + } + if got.KeyFingerprint != "ab:cd:ef" || got.Posture != "zero_knowledge" { + t.Errorf("metadata not stored: %+v", got) + } +} + +func TestHandleHostEscrow_LastWriteWins(t *testing.T) { + h, st, _ := newTestHandler(t) + st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"}) + do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", escrowBody([]byte("first"))) + rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", escrowBody([]byte("second-rotated"))) + if rr.Code != http.StatusOK { + t.Fatalf("rotation PUT = %d", rr.Code) + } + got, _ := st.GetHostEscrow("h1") + if string(got.Blob) != "second-rotated" { + t.Fatalf("rotation must be last-write-wins, got %q", got.Blob) + } +} + +func TestHandleHostEscrow_AuthRejected(t *testing.T) { + h, st, _ := newTestHandler(t) + st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"}) + st.UpsertHost(&store.Host{HostID: "h2", CustomerID: "c2", APIKey: "HKEY2"}) + + // absent / wrong key → 401 + if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "", escrowBody(opaqueBlob)); rr.Code != http.StatusUnauthorized { + t.Errorf("no key: got %d want 401", rr.Code) + } + if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "WRONG", escrowBody(opaqueBlob)); rr.Code != http.StatusUnauthorized { + t.Errorf("wrong key: got %d want 401", rr.Code) + } + // h2's key writing h1's escrow → 403 (a host may only write its own) + if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY2", escrowBody(opaqueBlob)); rr.Code != http.StatusForbidden { + t.Errorf("host_id mismatch: got %d want 403", rr.Code) + } + // and nothing was stored for h1 by the rejected attempts. + if got, _ := st.GetHostEscrow("h1"); got != nil { + t.Errorf("rejected attempts must not store anything, got %+v", got) + } +} + +func TestHandleHostEscrow_BadBody(t *testing.T) { + h, st, _ := newTestHandler(t) + st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"}) + if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", `{"blob_b64":""}`); rr.Code != http.StatusBadRequest { + t.Errorf("empty blob: got %d want 400", rr.Code) + } + if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", `{"blob_b64":"!!!not base64!!!"}`); rr.Code != http.StatusBadRequest { + t.Errorf("bad base64: got %d want 400", rr.Code) + } +} + +// TestEscrowUploadContract pins the wire shape that MUST match the agent's emit struct +// (felhom-agent escrowUploadRequest). Cross-repo, no shared module — this is the hub half of the +// contract guard; the agent has the mirror in its own test. +func TestEscrowUploadContract(t *testing.T) { + b, _ := json.Marshal(escrowUploadRequest{BlobB64: "x", KeyFingerprint: "y", Posture: "z", CreatedAt: "t"}) + var m map[string]any + json.Unmarshal(b, &m) + got := make([]string, 0, len(m)) + for k := range m { + got = append(got, k) + } + sort.Strings(got) + want := []string{"blob_b64", "created_at", "key_fingerprint", "posture"} + if !reflect.DeepEqual(got, want) { + t.Fatalf("escrow wire contract drift: got %v want %v (must match the agent emit struct)", got, want) + } +} diff --git a/hub/internal/api/handler.go b/hub/internal/api/handler.go index ece7fe9..fca95d8 100644 --- a/hub/internal/api/handler.go +++ b/hub/internal/api/handler.go @@ -3,6 +3,7 @@ package api import ( "bytes" "crypto/subtle" + "encoding/base64" "encoding/json" "fmt" "io" @@ -124,6 +125,9 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { h.handleHostReport(w, r) case r.Method == http.MethodPost && path == "/admin/hosts": h.handleAdminCreateHost(w, r) + case r.Method == http.MethodPut && strings.HasPrefix(path, "/hosts/") && strings.HasSuffix(path, "/escrow"): + hostID := strings.TrimSuffix(strings.TrimPrefix(path, "/hosts/"), "/escrow") + h.handleHostEscrowPut(w, r, hostID) case r.Method == http.MethodPost && path == "/event": h.handleEvent(w, r) case r.Method == http.MethodPost && path == "/notify": @@ -258,9 +262,9 @@ type hostReportPayload struct { ControllerVersion string `json:"controller_version"` } `json:"guests"` StorageTargets []hostStorageTarget `json:"storage_targets"` - Backups []hostBackup `json:"backups"` // slice 6 - RestoreTests []hostRestoreTest `json:"restore_tests"` // slice 6 - PBSSnapshots []hostPBSSnapshot `json:"pbs_snapshots"` // slice 6 Phase B + Backups []hostBackup `json:"backups"` // slice 6 + RestoreTests []hostRestoreTest `json:"restore_tests"` // slice 6 + PBSSnapshots []hostPBSSnapshot `json:"pbs_snapshots"` // slice 6 Phase B Cloudflared struct { Status string `json:"status"` } `json:"cloudflared"` @@ -569,6 +573,66 @@ func (h *Handler) handleAdminCreateHost(w http.ResponseWriter, r *http.Request) json.NewEncoder(w).Encode(map[string]string{"host_id": hostID, "api_key": apiKey}) } +// escrowUploadRequest is the agent→hub wire shape for the OPAQUE PBS recovery-code escrow blob +// (slice 7, doc 03 §8a). It MUST stay in lockstep with the agent's emit struct +// (felhom-agent cmd/felhom-agent escrowUploadRequest). The hub stores the bytes and NEVER decrypts +// them (it has no recovery code). +type escrowUploadRequest struct { + BlobB64 string `json:"blob_b64"` // base64 of the opaque R-wrapped blob (ciphertext) + KeyFingerprint string `json:"key_fingerprint"` // for operator display only + Posture string `json:"posture"` // e.g. "zero_knowledge" + CreatedAt string `json:"created_at"` // RFC3339 +} + +// handleHostEscrowPut stores a host's opaque escrow blob (doc 03 §8a). Authed with the PER-HOST key +// (a host may only write its own escrow; the global operator key is also accepted). The hub keeps +// the ciphertext and never opens it. Last-write-wins (rotation). No serving this slice (slice 10). +func (h *Handler) handleHostEscrowPut(w http.ResponseWriter, r *http.Request, pathHostID string) { + authHostID, _, isGlobal, ok := h.checkAuthHost(r) + if !ok { + http.Error(w, "Unauthorized", http.StatusUnauthorized) + return + } + if pathHostID == "" { + http.Error(w, "Missing host_id", http.StatusBadRequest) + return + } + // A per-host key may only write ITS OWN escrow; the global key may write any. + if !isGlobal && authHostID != pathHostID { + http.Error(w, "Forbidden: host_id mismatch", http.StatusForbidden) + return + } + body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // 1 MB cap; the blob is ~hundreds of bytes + if err != nil { + http.Error(w, "Bad request", http.StatusBadRequest) + return + } + var req escrowUploadRequest + if err := json.Unmarshal(body, &req); err != nil || req.BlobB64 == "" { + http.Error(w, "Invalid payload: blob_b64 required", http.StatusBadRequest) + return + } + blob, err := base64.StdEncoding.DecodeString(req.BlobB64) + if err != nil || len(blob) == 0 { + http.Error(w, "Invalid payload: blob_b64 not valid base64", http.StatusBadRequest) + return + } + createdAt := req.CreatedAt + if createdAt == "" { + createdAt = time.Now().UTC().Format(time.RFC3339) + } + // Store the OPAQUE bytes. No decrypt path exists — the hub cannot open this. + if err := h.store.SaveHostEscrow(pathHostID, blob, req.KeyFingerprint, req.Posture, createdAt); err != nil { + h.logger.Printf("[ERROR] Failed to store escrow for host %s: %v", pathHostID, err) + http.Error(w, "Internal error", http.StatusInternalServerError) + return + } + h.logger.Printf("[INFO] stored opaque escrow blob for host %s (%d bytes, posture=%s, fp=%s)", + pathHostID, len(blob), req.Posture, req.KeyFingerprint) + w.WriteHeader(http.StatusOK) + w.Write([]byte(`{"status":"ok"}`)) +} + // allowedEventTypes lists all valid event_type values the Hub accepts. var allowedEventTypes = map[string]bool{ // Controller-pushed events diff --git a/hub/internal/store/store.go b/hub/internal/store/store.go index 1d8e85c..74dd55c 100644 --- a/hub/internal/store/store.go +++ b/hub/internal/store/store.go @@ -269,6 +269,19 @@ func (s *Store) migrate() error { ); CREATE INDEX IF NOT EXISTS idx_host_reports_host ON host_reports(host_id, received_at DESC); CREATE INDEX IF NOT EXISTS idx_host_reports_customer ON host_reports(customer_id, received_at DESC); + + -- host_escrow (slice 7, doc 03 §8a): the OPAQUE R-wrapped PBS-key escrow blob. The hub + -- stores the ciphertext bytes against the host and NEVER decrypts them (it has no recovery + -- code). One row per host; a re-upload (rotation) is last-write-wins. Restore-mode serving + -- (handing the blob back to a re-enrolling box) is slice 10. + CREATE TABLE IF NOT EXISTS host_escrow ( + host_id TEXT PRIMARY KEY, + blob BLOB NOT NULL, + key_fingerprint TEXT NOT NULL DEFAULT '', + posture TEXT NOT NULL DEFAULT '', + created_at DATETIME NOT NULL, + updated_at DATETIME NOT NULL DEFAULT (datetime('now')) + ); `) if err != nil { return err @@ -1381,6 +1394,51 @@ func (s *Store) UpsertHost(h *Host) error { return err } +// HostEscrow is the opaque R-wrapped escrow blob stored for a host (doc 03 §8a). Blob is +// ciphertext the hub cannot open. +type HostEscrow struct { + HostID string + Blob []byte + KeyFingerprint string + Posture string + CreatedAt string + UpdatedAt string +} + +// SaveHostEscrow stores (last-write-wins) the OPAQUE escrow blob for a host. The hub keeps the +// bytes and NEVER decrypts them — there is no decrypt path. createdAt is the agent's timestamp. +func (s *Store) SaveHostEscrow(hostID string, blob []byte, keyFingerprint, posture, createdAt string) error { + _, err := s.db.Exec(` + INSERT INTO host_escrow (host_id, blob, key_fingerprint, posture, created_at, updated_at) + VALUES (?, ?, ?, ?, ?, datetime('now')) + ON CONFLICT(host_id) DO UPDATE SET + blob = excluded.blob, + key_fingerprint = excluded.key_fingerprint, + posture = excluded.posture, + created_at = excluded.created_at, + updated_at = datetime('now')`, + hostID, blob, keyFingerprint, posture, createdAt, + ) + return err +} + +// GetHostEscrow returns the stored opaque escrow for a host (nil if none). Used by tests and +// (future, slice 10) restore-mode serving. The hub returns bytes verbatim; it never decrypts. +func (s *Store) GetHostEscrow(hostID string) (*HostEscrow, error) { + var e HostEscrow + err := s.db.QueryRow(` + SELECT host_id, blob, key_fingerprint, posture, created_at, updated_at + FROM host_escrow WHERE host_id = ?`, hostID). + Scan(&e.HostID, &e.Blob, &e.KeyFingerprint, &e.Posture, &e.CreatedAt, &e.UpdatedAt) + if err == sql.ErrNoRows { + return nil, nil + } + if err != nil { + return nil, err + } + return &e, nil +} + // SaveHostReport inserts a host_reports row and bumps the host's reality columns // (agent_version/last_report_at/updated_at) — never the inert intent columns. func (s *Store) SaveHostReport(hostID, customerID string, reportJSON []byte, d HostReportDenorm) error {