hub: opaque PBS recovery-code escrow storage (v0.8.0) + doc 03 §8a posture model

Slice-7 close-out (hub half). PUT /api/v1/hosts/{host_id}/escrow (per-host key)
stores the agent's OPAQUE R-wrapped blob verbatim against the host; the hub never
decrypts it (no recovery code, no decrypt path). host_escrow table + Save/GetHostEscrow.
Tests: verbatim store, rotation last-write-wins, 401/403/400 auth+body, wire contract.

doc 03 §8a rewritten into the key-custody posture model: separation principle,
topology matrix, default + anti-lockout ladder, SSH-vs-key, breach/legal, integrity
caveat. Corrected: hub opaque storage is slice 7 (this task); serving is slice 10.
Slice table + §13 updated.

No secrets committed (R/K never appear; spike findings + docs use placeholders).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 07:46:33 +02:00
parent fe7d0850a5
commit 7eb3772000
6 changed files with 372 additions and 72 deletions
+36 -34
@@ -4,49 +4,51 @@
---
# REPORT — Hub: restore-test "passed with warnings" visibility (v0.7.5) (2026-06-09)
# REPORT — Slice 7 close-out: PBS escrow — hub opaque storage + doc 03 §8a (v0.8.0) (2026-06-10)
## Outcome
**Phase B (hub half) of `TASK — Restore-test must not false-fail on benign start warnings`.**
The agent (v0.7.0, already deployed + live-validated) now treats a benign guest-start advisory
(e.g. `WARN: Systemd 257 detected. You may need to enable nesting.`) as a PASS — verdict is
liveness, not the start-task exitstatus — and carries the warning text on the wire. This is the
hub half: ingest those fields and make a passed-with-warnings restore-test visible to the
operator instead of indistinguishable from a clean pass.
The `felhom.eu` half of `TASK — Slice 7 close-out: PBS recovery-code escrow`. The agent
(felhom-agent v0.9.0) creates an **opaque** `R`-wrapped copy of the PBS key in the zero-knowledge
default; this slice adds the **hub opaque storage** for that blob and rewrites **doc 03 §8a** into a
full key-custody posture model. The wrap→recover→restore round-trip was proven on a throwaway first
(`documentation/tests/slice7-escrow-spike-findings.md`).
## What landed (`hub/internal/api/handler.go`, golden, `host_test.go`)
## What landed (hub v0.8.0)
- **Wire mirror:** `hostRestoreTest` gains `warnings []string` + `warnings_recognized bool`
(`omitempty`), matching the agent's `hub.RestoreTest` field-for-field. An absent
`warnings_recognized` decodes as `false` ⇒ the **louder** unrecognized path, so a missing flag can only
over-notice, never hide a real warning.
- **Ingest behaviour:** a passed restore-test that carried warnings now logs
`[INFO] restore-test passed WITH WARNINGS (recognized)` when every warning is the known-benign
anchor, escalated to `[WARN] … UNRECOGNIZED WARNINGS` otherwise (as loud as a failed PBS
verify). A FAILED restore-test still logs the existing `[WARN] … FAILED`.
- **Contract:** `restore_tests[0]` in the host-report golden gains the two keys; the golden stays
**byte-identical** with felhom-agent's copy (sha256 `e6999d77…`), and the bidirectional
key-set contract test round-trips the new keys through `hostRestoreTest`. `go test ./...` green.
- **`PUT /api/v1/hosts/{host_id}/escrow`** (`internal/api/handler.go`) — per-host-key authed (a host
writes only its own escrow; global operator key also accepted). Decodes the base64 blob and stores
the **opaque bytes verbatim** against the host. The hub **never decrypts** — there is no decrypt
path; it has no recovery code. Rotation is last-write-wins.
- **`host_escrow`** table + `SaveHostEscrow`/`GetHostEscrow` (`internal/store`). Blob is ciphertext.
- **Contract:** `escrowUploadRequest` mirrors the agent's emit struct (`blob_b64`, `key_fingerprint`,
`posture`, `created_at`); a key-set test in each repo guards drift.
- **Tests:** stores the blob byte-identical; rotation last-write-wins; 401 (absent/wrong key), 403
(host writing another host's escrow), 400 (bad base64); contract key-set. `go test ./...` green.
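
As a rough illustration of the upload round trip described in the list above, here is a minimal sketch under stated assumptions. The names `escrowUpload`, `encodeUpload` and `decodeBlob` are hypothetical and are not the hub's or the agent's actual code; only the four JSON keys come from this report.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// escrowUpload carries the four wire keys named in the report.
// The Go type name is an assumption made for this sketch.
type escrowUpload struct {
	BlobB64        string `json:"blob_b64"`
	KeyFingerprint string `json:"key_fingerprint"`
	Posture        string `json:"posture"`
	CreatedAt      string `json:"created_at"`
}

// encodeUpload builds the JSON body a sender would PUT: the opaque
// bytes are base64-encoded, never interpreted.
func encodeUpload(blob []byte, fp, posture, createdAt string) ([]byte, error) {
	return json.Marshal(escrowUpload{
		BlobB64:        base64.StdEncoding.EncodeToString(blob),
		KeyFingerprint: fp,
		Posture:        posture,
		CreatedAt:      createdAt,
	})
}

// decodeBlob is the receiving half: parse the body, reject an empty or
// malformed blob_b64, and return the decoded bytes untouched.
func decodeBlob(body []byte) ([]byte, error) {
	var req escrowUpload
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	if req.BlobB64 == "" {
		return nil, errors.New("blob_b64 required")
	}
	blob, err := base64.StdEncoding.DecodeString(req.BlobB64)
	if err != nil {
		return nil, err
	}
	if len(blob) == 0 {
		return nil, errors.New("blob_b64 decodes to nothing")
	}
	return blob, nil
}

func main() {
	blob := []byte{0x00, 0x01, 0xff, 0xfe} // stand-in for ciphertext
	body, err := encodeUpload(blob, "ab:cd:ef", "zero_knowledge", "2026-06-10T05:00:00Z")
	if err != nil {
		panic(err)
	}
	got, err := decodeBlob(body)
	fmt.Println(bytes.Equal(got, blob), err)
}
```

The point of the sketch is that nothing between encode and decode needs to understand the bytes, which is what "opaque" means here.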
## Scope note — no dashboard widget this slice
## Documentation (doc 03 §8a)
The task asked to "surface in the dashboard distinctly from a clean pass." The hub web layer
currently renders **only controller-report data** — there is no host-domain dashboard surface
yet (guests/storage/restore_tests/pbs_snapshots are log+persist only; the failed-PBS-verify
signal is likewise log-only). Building one is out of scope here; distinct dashboard treatment
should land with the host-domain dashboard (slice 10). The operator signal this slice is the log
line, consistent with the established failed-PBS-verify precedent.
Rewrote §8a into the **key-custody posture model**: the **separation principle** (reading data needs
both chunks *and* a key; zero-knowledge holds while Felhom never holds both), the **topology matrix**
(data location × key custody → who can read; the one dangerous cell flagged), the **default**
(Felhom storage + customer-only key; `R` printed durably), the **anti-lockout ladder** ((b) wrapped
offline copy → (a) raw paperkey → Felhom-holds-a-key), **SSH-for-support is a separate grant** (not
coupled to key custody), **why zero-knowledge stays default** (breach + legal compellability), and
the **integrity caveat** for self-hosted-data postures. Corrected the storage-slice note: hub opaque
storage is **slice 7** (this task); only restore-mode **serving** is slice 10. §9 slice table + §13
updated.
## Backward compatibility
## Live validation
An agent that omits/empties `warnings`/`warnings_recognized` is accepted unchanged (the deployed
v0.7.4 hub already ignores them). The legacy controller report path is untouched.
After the v0.8.0 deploy, the demo agent's `--selftest=escrow-create -upload` PUT the opaque blob and
the hub stored it against the host; the stored bytes are **ciphertext** (not the key). The recovery
code `R` is never sent to or stored by the hub. *(No `R`/`K` value appears in any committed file.)*
## Deferred / security
Restore-mode serving + consumption → slice 10. The hub holds ciphertext only — possessing the blob
does not let Felhom read customer data (separation principle). No secrets committed.
## Deploy (GitOps)
Build+push `gitea.dooplex.hu/admin/felhom-hub:v0.7.5` → bump the `image:` tag in
`manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app (auto-sync off). Live-validated
after sync: the demo host's restore-test (agent v0.7.0, which passes-with-recognized-warnings on
the Debian-13 guest 9999) reflects on the hub as `passed WITH WARNINGS (recognized)` — not a
plain pass and not a FAILED.
Build+push `felhom-hub:v0.8.0` → bump `manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app.
+76 -35
@@ -192,44 +192,84 @@ per Part 1: **snapshot** (LVM-thin, transient, whole-guest rollback — not a ba
necessity, not just convenience. Integrity-verify (cheap, ciphertext-level) runs more often
as the lighter check.
### 8a. PBS recovery-code escrow (zero-knowledge offsite-key recovery)
### 8a. PBS recovery-code escrow + the key-custody posture model (zero-knowledge offsite-key recovery)
The DR substrate is the PBS offsite tier, and it is client-side encrypted (zero-knowledge): if the
box dies, restoring the offsite backups requires the **PBS client encryption key `K`**, which died
with the box. The escrow is how `K` comes back **without** Felhom ever being able to read customer
data. Design (decisions, with the rationale that pins them):
The DR substrate is the PBS offsite tier, client-side encrypted (zero-knowledge): if the box dies,
restoring the offsite backups requires the **PBS client encryption key `K`**, which died with the
box. The escrow is how `K` comes back **without** Felhom ever being able to read customer data.
**Status: implemented** — escrow *creation* (agent v0.9.0, `internal/escrow`) + hub *opaque storage*
(hub v0.8.0, `PUT /api/v1/hosts/{host_id}/escrow`). Validated end-to-end on a throwaway in
`documentation/tests/slice7-escrow-spike-findings.md`. Restore-mode *serving/consumption* is slice 10.
#### The separation principle (the rule that governs every posture)
Reading customer data needs **BOTH** the encrypted chunks **AND** a usable key. **Zero-knowledge
holds for exactly as long as Felhom never holds both at once.** Every posture below is just a
choice about where the data and the key live; the principle decides who can read.
#### Topology matrix (data location × key custody → who can read)
| Data location | Key custody | Who can read | Notes |
|---|---|---|---|
| **Felhom storage** | customer-only key | **only the customer** | **the DEFAULT** — genuine zero-knowledge |
| **Felhom storage** | Felhom also holds a key | **Felhom can read** | the one dangerous cell — explicit, informed opt-in only; never default, never silent |
| Customer's own offsite | customer key | only the customer | self-hosted data; key XOR data |
| Customer's own offsite | Felhom holds a key | only the customer | safe by separation (key and data never co-located at Felhom) |
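
The matrix above reduces to a single boolean rule. The following is a minimal sketch of that rule only, not code from either repo; the function name `felhomCanRead` and its two parameters are hypothetical.

```go
package main

import "fmt"

// felhomCanRead encodes the separation principle: reading needs BOTH
// the encrypted chunks AND a usable key, so Felhom can read only when
// it holds the data and a key at the same time.
func felhomCanRead(dataAtFelhom, felhomHoldsKey bool) bool {
	return dataAtFelhom && felhomHoldsKey
}

func main() {
	// Walk the four cells of the topology matrix.
	for _, dataAtFelhom := range []bool{true, false} {
		for _, felhomHoldsKey := range []bool{false, true} {
			fmt.Printf("dataAtFelhom=%v felhomHoldsKey=%v -> felhomCanRead=%v\n",
				dataAtFelhom, felhomHoldsKey, felhomCanRead(dataAtFelhom, felhomHoldsKey))
		}
	}
}
```

Only one of the four combinations returns true, which corresponds to the single cell the table flags as dangerous.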
#### The escrow mechanism (decisions + the rationale that pins them)
- **Live key unencrypted on the box** (`0600`, root): the agent backs up *and* runs restore-tests
unattended — no passphrase prompt on the management path. The privilege concentration this
implies is the whole argument for §3 root-minimization + a small auditable agent.
- **Wrap mechanism — PBS-native, not custom crypto.** At enrollment the agent generates a
high-entropy **recovery code `R`** and produces a **passphrase-protected copy of `K` under `R`**
using PBS's own key passphrase KDF (`proxmox-backup-client key` family). *Decision: lean on PBS's
documented, battle-tested key+passphrase path; do not roll a bespoke AEAD wrap.* Host/customer
binding is provided at the hub-storage layer (blob keyed by host-id), not by custom crypto.
unattended — no passphrase prompt on the management path. The privilege concentration this implies
is the whole argument for §3 root-minimization + a small auditable agent.
- **Wrap — PBS-native, not custom crypto.** At enrollment the agent generates a high-entropy
**recovery code `R`** and produces a **passphrase-protected copy of `K` under `R`** via PBS's own
key passphrase KDF (`proxmox-backup-client key change-passphrase --kdf scrypt`; no bespoke AEAD).
The spike pinned two implementation constraints: that command is **TTY-only** (drive it over a
pty), and the pty **echoes the passphrase** (discard the pty output so `R` can't leak) — F-A1/F-A2.
- **Agent-side generation.** `R` is generated **on the box** (it already holds `K` and does the
wrapping), so `R` never touches the hub even in transit — zero-knowledge by construction.
- **Escrow = the `R`-wrapped blob → hub.** The hub stores opaque ciphertext bound to the
host/customer. Without `R` it is undecryptable; the operator cannot read customer data. (Hub-side
storage schema for the blob is a slice-10 / doc-05 item.)
- **Recovery code custody.** `R` is shown to the customer **once** at enrollment (printed/displayed)
and **never stored by Felhom in recoverable form**. Format: a grouped/word-list code (≥128-bit
entropy) — it is transcribed off paper by a non-technical household, so raw base32 invites typos.
- **Consumption (slice 10, host-loss).** New box re-enrolls in restore mode → hub ships the escrow
blob → customer enters `R` → box unwraps `K` → PBS restores proceed.
- **Optional belt-and-suspenders (product decision, default OFF).** A PBS **paperkey** (the raw key,
for a safe) gives the customer a recovery path that survives *both* box loss *and* recovery-code
loss, at the cost of a higher-value secret (raw key on paper, no second factor). Default is
hub-escrow + `R` only; offer the paperkey as an opt-in "advanced" path.
wrapping), so `R` never touches the hub even in transit — zero-knowledge by construction. `R` is
≥128 bits, **word-list form** (EFF large wordlist, 10 words ≈ 129 bits) for off-paper transcription.
- **Self-verify before shipping.** Creation unwraps a copy of the blob with `R` and checks the key
fingerprint matches — "an escrow you haven't recovered isn't an escrow."
- **Escrow = the `R`-wrapped blob → hub (opaque storage, slice 7).** The hub stores the ciphertext
bytes against the host record and **never decrypts them** (it has no `R`; there is no decrypt
path). Per-host-key authed; rotation is last-write-wins. **Restore-mode serving is slice 10.**
- **Recovery code custody.** `R` is surfaced to the customer **exactly once** at enrollment
(printed/displayed) and **never stored by Felhom in any recoverable form**.
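
The "10 words ≈ 129 bits" figure in the list above follows from the EFF large wordlist having 7776 (6^5) entries: 10 × log2(7776) ≈ 129.2 bits. The sketch below checks that arithmetic and shows one way to draw uniform word indices from the OS CSPRNG. It is an assumption about how such a code could be generated, not the agent's actual `internal/escrow` implementation; `entropyBits` and `drawIndices` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math"
	"math/big"
)

// wordlistSize is the entry count of the EFF large wordlist (6^5).
const wordlistSize = 7776

// entropyBits returns the entropy of a code made of `words`
// independent uniform picks from a list of `listSize` entries.
func entropyBits(words, listSize int) float64 {
	return float64(words) * math.Log2(float64(listSize))
}

// drawIndices returns n uniform indices in [0, listSize) using
// crypto/rand, which avoids modulo bias.
func drawIndices(n, listSize int) ([]int, error) {
	bound := big.NewInt(int64(listSize))
	out := make([]int, n)
	for i := range out {
		v, err := rand.Int(rand.Reader, bound)
		if err != nil {
			return nil, err
		}
		out[i] = int(v.Int64())
	}
	return out, nil
}

func main() {
	fmt.Printf("entropy of 10 words: %.1f bits\n", entropyBits(10, wordlistSize))
	idx, err := drawIndices(10, wordlistSize)
	if err != nil {
		panic(err)
	}
	// A real generator would map each index to a word and would never
	// log the result; only the count is printed here.
	fmt.Println("indices drawn:", len(idx))
}
```

Nine words would give roughly 116 bits, which is why ten is the smallest count that clears the 128-bit floor with this list.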
**Properties stated for honesty (these go to the customer at enrollment):**
#### Default posture + the anti-lockout ladder (opt-in, increasing trust)
**Default:** *Felhom storage + customer-only key*, and **`R` is delivered durably (printed) always**
— note this is distinct from a raw-key paperkey: `R` is a safe two-factor *passphrase* (useless
without the hub's blob); the raw key is the footgun. The ladder trades resilience for trust:
- **(b) `R`-wrapped offline copy** — the same two-factor blob, for the customer to print/store. **No
extra trust**; resilience if the hub ever vanishes (still needs `R`). *Implemented (opt-in).*
- **(a) raw paperkey** — `proxmox-backup-client key paperkey` of the unwrapped key, for a safe.
Covers **losing `R`**, but it is **single-factor and unrevocable**. *Implemented (opt-in, loud
caveat).*
- **Felhom-holds-a-key** — maximum convenience, but **gives up zero-knowledge** (the dangerous
matrix cell). **Not implemented** — it needs a separate Felhom-side secure key store + explicit
opt-in UX, built only when a customer asks.
#### SSH-for-support is a SEPARATE grant — deliberately not coupled to key custody
Support access (active / consented / observable — customer-toggleable, commands shown) is **not**
the same as a standing / passive / invisible decryption capability. The transparency features prove
*controlled* support access **without Felhom holding a key**. Conflating the two is exactly the
mistake the separation principle prevents.
#### Why zero-knowledge stays the default (breach + legal)
Holding data **and** a key makes a single hub breach an **all-customer data leak**, and makes Felhom
**compellable** — a court can order what Felhom *can* produce. Genuine zero-knowledge means *"we
can't be forced to hand over what we can't read."* This is core to the sovereignty pitch, not a
nicety.
#### Honesty properties (stated to the customer at enrollment)
- **Irreducible residual:** losing `R` *and* the box (and, if not opted in, having no paperkey) =
the offsite backups are **unrecoverable, by anyone, including Felhom.** This is the cost of
genuine zero-knowledge and must be communicated, not buried.
- **Rotation ≠ key rotation:** rotating `R` re-wraps the escrow blob (and re-shows the customer a
new code) but does **not** re-encrypt existing PBS data — that data stays keyed by `K`. Changing
`K` itself is a separate, heavier operation (new key → new backups; old backups still need old
`K`) and is out of scope for routine recovery-code rotation.
the offsite backups are **unrecoverable, by anyone, including Felhom.** This is the cost of genuine
zero-knowledge; it is communicated, not buried.
- **Rotation ≠ key rotation:** rotating `R` re-wraps the escrow blob (and re-shows a new code) but
does **not** re-encrypt existing PBS data — that stays keyed by `K`. Changing `K` itself is a
separate, heavier op (new key → new backups; old backups still need old `K`), out of scope for
routine recovery-code rotation.
- **Integrity caveat (self-hosted-data postures):** moving data to the customer's own offsite
**loses Felhom's backup guarantees** — no PBS verify / monitoring on storage we can't reach. An
honest signup-time tradeoff, not a hidden one.
## 9. Provisioning & DR flows
@@ -295,7 +335,7 @@ this path — bring up + reattach external storage and it is whole. This is full
| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit); golden archived at enrollment |
| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | **implemented** (agent v0.8.0, `internal/reconcile/bringup.go`) |
| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | **implemented** (v0.8.0, `dr_guest_loss` mode — continuity identity preserved) |
| PBS recovery-code escrow **creation** (§8a) | **7** | designed (§8a); implement |
| PBS recovery-code escrow **creation** + **hub opaque storage** (§8a) | **7** | **implemented** (agent v0.9.0 `internal/escrow`; hub v0.8.0 `PUT /hosts/{id}/escrow`) |
| Provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8** | deferred — needs the controller-deploy path + agent↔controller local API (§6) |
| **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / PBS namespace / tunnel token / storage manifest / restore directive | **10** | deferred — needs hub desired-state serving; hub store today holds only `{host_id, customer_id, api_key}` (slice 3) |
| PBS escrow **consumption** (recover `K` on a new box) | **10** | deferred — exercised by host-loss DR |
@@ -359,8 +399,9 @@ Still open:
- **Golden base image** refresh cadence + fleet versioning — operational, non-blocking (§9).
- **Identity-reset set** (live, link-up) — pinned empirically by the slice-7 bring-up spike; the
scenario-specific policy is settled in §9, the exact field list is the spike's deliverable.
- **Hub-side escrow storage + restore-mode serving** — the blob's hub schema and the restore-mode
desired-state handover are slice-10 / doc-05 (§8a, §9 host-loss).
- **Escrow restore-mode serving / consumption** — handing the opaque blob back to a re-enrolling
box and unwrapping `K` with `R` is slice-10 / doc-05 (§8a, §9 host-loss). *Escrow creation + hub
opaque storage are done (slice 7).*
This doc hands the implementation three contracts it was waiting on:
+24
@@ -1,5 +1,29 @@
# Felhom Hub — Changelog
## v0.8.0 — opaque PBS recovery-code escrow storage (slice 7, doc 03 §8a) (2026-06-10)
Hub half of slice-7 close-out: store the agent's **opaque** `R`-wrapped PBS-key escrow blob. The
default posture is zero-knowledge — the hub holds ciphertext it **cannot open** (it has no recovery
code; there is no decrypt path). Pairs with felhom-agent v0.9.0 (escrow creation). Consumption /
restore-mode serving is slice 10.
### Added
- **`PUT /api/v1/hosts/{host_id}/escrow`** — authed with the **per-host key** (a host may only write
its own escrow; the global operator key is also accepted). Body mirrors the agent's emit struct
(`blob_b64`, `key_fingerprint`, `posture`, `created_at`). Stores the decoded **opaque bytes
verbatim**; rotation is last-write-wins. No serving this slice.
- **`host_escrow`** table (`host_id` PK, `blob` BLOB, fingerprint/posture/created_at). Store methods
`SaveHostEscrow` / `GetHostEscrow` (`HostEscrow`). The hub never transforms or decrypts the blob.
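
The one-row-per-host, last-write-wins behaviour described above can be illustrated with an in-memory stand-in. This is only a sketch of the semantics: `memEscrowStore` and its methods are hypothetical and deliberately do not reproduce the real store's signatures or its SQL.

```go
package main

import (
	"bytes"
	"fmt"
)

// memEscrowStore keeps at most one opaque blob per host ID.
type memEscrowStore struct {
	rows map[string][]byte
}

func newMemEscrowStore() *memEscrowStore {
	return &memEscrowStore{rows: make(map[string][]byte)}
}

// Save overwrites any previous blob for the host (last-write-wins).
// The bytes are copied so later caller mutations cannot alter what
// was stored; they are never inspected or transformed.
func (s *memEscrowStore) Save(hostID string, blob []byte) {
	s.rows[hostID] = append([]byte(nil), blob...)
}

// Get returns the stored bytes verbatim, and false if the host has none.
func (s *memEscrowStore) Get(hostID string) ([]byte, bool) {
	b, ok := s.rows[hostID]
	return b, ok
}

func main() {
	st := newMemEscrowStore()
	st.Save("h1", []byte("first"))
	st.Save("h1", []byte("second-rotated")) // rotation replaces the row
	got, ok := st.Get("h1")
	fmt.Println(ok, bytes.Equal(got, []byte("second-rotated")))
}
```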
### Tests
- Stores the opaque blob **verbatim** (round-trips byte-identical); rotation last-write-wins;
rejects an absent/wrong key (401) and a host writing another host's escrow (403); bad/empty
base64 → 400; the wire-contract key-set matches the agent's emit struct.
### Security note
The hub stores ciphertext only — holding the blob does NOT let Felhom read customer data
(separation principle, doc 03 §8a). The per-host-key gate scopes writes to the owning host.
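
The per-host-key gate can be summarised as a small decision function. This is a sketch of the status codes listed under Tests, with the checks in the order the handler in this commit applies them; `escrowWriteStatus` and its parameters are hypothetical names, and the real handler obtains the auth result from its own helper.

```go
package main

import (
	"fmt"
	"net/http"
)

// escrowWriteStatus returns the HTTP status for an escrow write attempt.
// authenticated: the presented key matched some host or the global key.
// isGlobal: the presented key is the global operator key.
// authHostID: the host the key belongs to (empty for the global key).
// pathHostID: the {host_id} taken from the request path.
func escrowWriteStatus(authenticated, isGlobal bool, authHostID, pathHostID string) int {
	switch {
	case !authenticated:
		return http.StatusUnauthorized // absent or wrong key
	case pathHostID == "":
		return http.StatusBadRequest // no host in the path
	case !isGlobal && authHostID != pathHostID:
		return http.StatusForbidden // a host may only write its own escrow
	default:
		return http.StatusOK
	}
}

func main() {
	fmt.Println(escrowWriteStatus(false, false, "", "h1"))
	fmt.Println(escrowWriteStatus(true, false, "h2", "h1"))
	fmt.Println(escrowWriteStatus(true, false, "h1", "h1"))
	fmt.Println(escrowWriteStatus(true, true, "", "h1"))
}
```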
## v0.7.5 — restore-test "passed with warnings" visibility (2026-06-09)
Hub half of `TASK — Restore-test must not false-fail on benign start warnings` (Phase B). The
+111
@@ -0,0 +1,111 @@
package api
import (
"encoding/base64"
"encoding/json"
"net/http"
"reflect"
"sort"
"testing"
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
)
// a stand-in for the opaque R-wrapped blob — the hub treats it as ciphertext it cannot read.
var opaqueBlob = []byte("\x00\x01OPAQUE-pbs-scrypt-keyfile-bytes\xff\xfe")
func escrowBody(blob []byte) string {
b, _ := json.Marshal(map[string]string{
"blob_b64": base64.StdEncoding.EncodeToString(blob),
"key_fingerprint": "ab:cd:ef",
"posture": "zero_knowledge",
"created_at": "2026-06-10T05:00:00Z",
})
return string(b)
}
func TestHandleHostEscrow_StoresOpaqueBlobVerbatim(t *testing.T) {
h, st, _ := newTestHandler(t)
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", escrowBody(opaqueBlob))
if rr.Code != http.StatusOK {
t.Fatalf("PUT escrow = %d, want 200 (%s)", rr.Code, rr.Body.String())
}
got, err := st.GetHostEscrow("h1")
if err != nil || got == nil {
t.Fatalf("GetHostEscrow: %v / %v", got, err)
}
// the hub stored the OPAQUE bytes verbatim (it never decrypts / transforms them).
if !reflect.DeepEqual(got.Blob, opaqueBlob) {
t.Fatalf("stored blob != uploaded blob (hub must keep ciphertext verbatim)")
}
if got.KeyFingerprint != "ab:cd:ef" || got.Posture != "zero_knowledge" {
t.Errorf("metadata not stored: %+v", got)
}
}
func TestHandleHostEscrow_LastWriteWins(t *testing.T) {
h, st, _ := newTestHandler(t)
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", escrowBody([]byte("first")))
rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", escrowBody([]byte("second-rotated")))
if rr.Code != http.StatusOK {
t.Fatalf("rotation PUT = %d", rr.Code)
}
got, _ := st.GetHostEscrow("h1")
if string(got.Blob) != "second-rotated" {
t.Fatalf("rotation must be last-write-wins, got %q", got.Blob)
}
}
func TestHandleHostEscrow_AuthRejected(t *testing.T) {
h, st, _ := newTestHandler(t)
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
st.UpsertHost(&store.Host{HostID: "h2", CustomerID: "c2", APIKey: "HKEY2"})
// absent / wrong key → 401
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "", escrowBody(opaqueBlob)); rr.Code != http.StatusUnauthorized {
t.Errorf("no key: got %d want 401", rr.Code)
}
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "WRONG", escrowBody(opaqueBlob)); rr.Code != http.StatusUnauthorized {
t.Errorf("wrong key: got %d want 401", rr.Code)
}
// h2's key writing h1's escrow → 403 (a host may only write its own)
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY2", escrowBody(opaqueBlob)); rr.Code != http.StatusForbidden {
t.Errorf("host_id mismatch: got %d want 403", rr.Code)
}
// and nothing was stored for h1 by the rejected attempts.
if got, _ := st.GetHostEscrow("h1"); got != nil {
t.Errorf("rejected attempts must not store anything, got %+v", got)
}
}
func TestHandleHostEscrow_BadBody(t *testing.T) {
h, st, _ := newTestHandler(t)
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", `{"blob_b64":""}`); rr.Code != http.StatusBadRequest {
t.Errorf("empty blob: got %d want 400", rr.Code)
}
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", `{"blob_b64":"!!!not base64!!!"}`); rr.Code != http.StatusBadRequest {
t.Errorf("bad base64: got %d want 400", rr.Code)
}
}
// TestEscrowUploadContract pins the wire shape that MUST match the agent's emit struct
// (felhom-agent escrowUploadRequest). Cross-repo, no shared module — this is the hub half of the
// contract guard; the agent has the mirror in its own test.
func TestEscrowUploadContract(t *testing.T) {
b, _ := json.Marshal(escrowUploadRequest{BlobB64: "x", KeyFingerprint: "y", Posture: "z", CreatedAt: "t"})
var m map[string]any
json.Unmarshal(b, &m)
got := make([]string, 0, len(m))
for k := range m {
got = append(got, k)
}
sort.Strings(got)
want := []string{"blob_b64", "created_at", "key_fingerprint", "posture"}
if !reflect.DeepEqual(got, want) {
t.Fatalf("escrow wire contract drift: got %v want %v (must match the agent emit struct)", got, want)
}
}
+64
@@ -3,6 +3,7 @@ package api
import (
"bytes"
"crypto/subtle"
"encoding/base64"
"encoding/json"
"fmt"
"io"
@@ -124,6 +125,9 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
h.handleHostReport(w, r)
case r.Method == http.MethodPost && path == "/admin/hosts":
h.handleAdminCreateHost(w, r)
case r.Method == http.MethodPut && strings.HasPrefix(path, "/hosts/") && strings.HasSuffix(path, "/escrow"):
hostID := strings.TrimSuffix(strings.TrimPrefix(path, "/hosts/"), "/escrow")
h.handleHostEscrowPut(w, r, hostID)
case r.Method == http.MethodPost && path == "/event":
h.handleEvent(w, r)
case r.Method == http.MethodPost && path == "/notify":
@@ -569,6 +573,66 @@ func (h *Handler) handleAdminCreateHost(w http.ResponseWriter, r *http.Request)
json.NewEncoder(w).Encode(map[string]string{"host_id": hostID, "api_key": apiKey})
}
// escrowUploadRequest is the agent→hub wire shape for the OPAQUE PBS recovery-code escrow blob
// (slice 7, doc 03 §8a). It MUST stay in lockstep with the agent's emit struct
// (felhom-agent cmd/felhom-agent escrowUploadRequest). The hub stores the bytes and NEVER decrypts
// them (it has no recovery code).
type escrowUploadRequest struct {
BlobB64 string `json:"blob_b64"` // base64 of the opaque R-wrapped blob (ciphertext)
KeyFingerprint string `json:"key_fingerprint"` // for operator display only
Posture string `json:"posture"` // e.g. "zero_knowledge"
CreatedAt string `json:"created_at"` // RFC3339
}
// handleHostEscrowPut stores a host's opaque escrow blob (doc 03 §8a). Authed with the PER-HOST key
// (a host may only write its own escrow; the global operator key is also accepted). The hub keeps
// the ciphertext and never opens it. Last-write-wins (rotation). No serving this slice (slice 10).
func (h *Handler) handleHostEscrowPut(w http.ResponseWriter, r *http.Request, pathHostID string) {
authHostID, _, isGlobal, ok := h.checkAuthHost(r)
if !ok {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
if pathHostID == "" {
http.Error(w, "Missing host_id", http.StatusBadRequest)
return
}
// A per-host key may only write ITS OWN escrow; the global key may write any.
if !isGlobal && authHostID != pathHostID {
http.Error(w, "Forbidden: host_id mismatch", http.StatusForbidden)
return
}
body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // 1 MB cap; the blob is ~hundreds of bytes
if err != nil {
http.Error(w, "Bad request", http.StatusBadRequest)
return
}
var req escrowUploadRequest
if err := json.Unmarshal(body, &req); err != nil || req.BlobB64 == "" {
http.Error(w, "Invalid payload: blob_b64 required", http.StatusBadRequest)
return
}
blob, err := base64.StdEncoding.DecodeString(req.BlobB64)
if err != nil || len(blob) == 0 {
http.Error(w, "Invalid payload: blob_b64 not valid base64", http.StatusBadRequest)
return
}
createdAt := req.CreatedAt
if createdAt == "" {
createdAt = time.Now().UTC().Format(time.RFC3339)
}
// Store the OPAQUE bytes. No decrypt path exists — the hub cannot open this.
if err := h.store.SaveHostEscrow(pathHostID, blob, req.KeyFingerprint, req.Posture, createdAt); err != nil {
h.logger.Printf("[ERROR] Failed to store escrow for host %s: %v", pathHostID, err)
http.Error(w, "Internal error", http.StatusInternalServerError)
return
}
h.logger.Printf("[INFO] stored opaque escrow blob for host %s (%d bytes, posture=%s, fp=%s)",
pathHostID, len(blob), req.Posture, req.KeyFingerprint)
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"status":"ok"}`))
}
// allowedEventTypes lists all valid event_type values the Hub accepts.
var allowedEventTypes = map[string]bool{
// Controller-pushed events
+58
@@ -269,6 +269,19 @@ func (s *Store) migrate() error {
);
CREATE INDEX IF NOT EXISTS idx_host_reports_host ON host_reports(host_id, received_at DESC);
CREATE INDEX IF NOT EXISTS idx_host_reports_customer ON host_reports(customer_id, received_at DESC);
-- host_escrow (slice 7, doc 03 §8a): the OPAQUE R-wrapped PBS-key escrow blob. The hub
-- stores the ciphertext bytes against the host and NEVER decrypts them (it has no recovery
-- code). One row per host; a re-upload (rotation) is last-write-wins. Restore-mode serving
-- (handing the blob back to a re-enrolling box) is slice 10.
CREATE TABLE IF NOT EXISTS host_escrow (
host_id TEXT PRIMARY KEY,
blob BLOB NOT NULL,
key_fingerprint TEXT NOT NULL DEFAULT '',
posture TEXT NOT NULL DEFAULT '',
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL DEFAULT (datetime('now'))
);
`)
if err != nil {
return err
@@ -1381,6 +1394,51 @@ func (s *Store) UpsertHost(h *Host) error {
return err
}
// HostEscrow is the opaque R-wrapped escrow blob stored for a host (doc 03 §8a). Blob is
// ciphertext the hub cannot open.
type HostEscrow struct {
HostID string
Blob []byte
KeyFingerprint string
Posture string
CreatedAt string
UpdatedAt string
}
// SaveHostEscrow stores (last-write-wins) the OPAQUE escrow blob for a host. The hub keeps the
// bytes and NEVER decrypts them — there is no decrypt path. createdAt is the agent's timestamp.
func (s *Store) SaveHostEscrow(hostID string, blob []byte, keyFingerprint, posture, createdAt string) error {
_, err := s.db.Exec(`
INSERT INTO host_escrow (host_id, blob, key_fingerprint, posture, created_at, updated_at)
VALUES (?, ?, ?, ?, ?, datetime('now'))
ON CONFLICT(host_id) DO UPDATE SET
blob = excluded.blob,
key_fingerprint = excluded.key_fingerprint,
posture = excluded.posture,
created_at = excluded.created_at,
updated_at = datetime('now')`,
hostID, blob, keyFingerprint, posture, createdAt,
)
return err
}
// GetHostEscrow returns the stored opaque escrow for a host (nil if none). Used by tests and
// (future, slice 10) restore-mode serving. The hub returns bytes verbatim; it never decrypts.
func (s *Store) GetHostEscrow(hostID string) (*HostEscrow, error) {
var e HostEscrow
err := s.db.QueryRow(`
SELECT host_id, blob, key_fingerprint, posture, created_at, updated_at
FROM host_escrow WHERE host_id = ?`, hostID).
Scan(&e.HostID, &e.Blob, &e.KeyFingerprint, &e.Posture, &e.CreatedAt, &e.UpdatedAt)
if err == sql.ErrNoRows {
return nil, nil
}
if err != nil {
return nil, err
}
return &e, nil
}
// SaveHostReport inserts a host_reports row and bumps the host's reality columns
// (agent_version/last_report_at/updated_at) — never the inert intent columns.
func (s *Store) SaveHostReport(hostID, customerID string, reportJSON []byte, d HostReportDenorm) error {