hub: opaque PBS recovery-code escrow storage (v0.8.0) + doc 03 §8a posture model

Slice-7 close-out (hub half). PUT /api/v1/hosts/{host_id}/escrow (per-host key)
stores the agent's OPAQUE R-wrapped blob verbatim against the host; the hub never
decrypts it (no recovery code, no decrypt path). host_escrow table + Save/GetHostEscrow.
Tests: verbatim store, rotation last-write-wins, 401/403/400 auth+body, wire contract.

doc 03 §8a rewritten into the key-custody posture model: separation principle,
topology matrix, default + anti-lockout ladder, SSH-vs-key, breach/legal, integrity
caveat. Corrected: hub opaque storage is slice 7 (this task); serving is slice 10.
Slice table + §13 updated.

No secrets committed (R/K never appear; spike findings + docs use placeholders).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -4,49 +4,51 @@
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
# REPORT — Hub: restore-test "passed with warnings" visibility (v0.7.5) (2026-06-09)
|
# REPORT — Slice 7 close-out: PBS escrow — hub opaque storage + doc 03 §8a (v0.8.0) (2026-06-10)
|
||||||
|
|
||||||
## Outcome
|
## Outcome
|
||||||
|
|
||||||
**Phase B (hub half) of `TASK — Restore-test must not false-fail on benign start warnings`.**
|
The `felhom.eu` half of `TASK — Slice 7 close-out: PBS recovery-code escrow`. The agent
|
||||||
The agent (v0.7.0, already deployed + live-validated) now treats a benign guest-start advisory
|
(felhom-agent v0.9.0) creates an **opaque** `R`-wrapped copy of the PBS key in the zero-knowledge
|
||||||
(e.g. `WARN: Systemd 257 detected. You may need to enable nesting.`) as a PASS — verdict is
|
default; this slice adds the **hub opaque storage** for that blob and rewrites **doc 03 §8a** into a
|
||||||
liveness, not the start-task exitstatus — and carries the warning text on the wire. This is the
|
full key-custody posture model. The wrap→recover→restore round-trip was proven on a throwaway first
|
||||||
hub half: ingest those fields and make a passed-with-warnings restore-test visible to the
|
(`documentation/tests/slice7-escrow-spike-findings.md`).
|
||||||
operator instead of indistinguishable from a clean pass.
|
|
||||||
|
|
||||||
## What landed (`hub/internal/api/handler.go`, golden, `host_test.go`)
|
## What landed (hub v0.8.0)
|
||||||
|
|
||||||
- **Wire mirror:** `hostRestoreTest` gains `warnings []string` + `warnings_recognized bool`
|
- **`PUT /api/v1/hosts/{host_id}/escrow`** (`internal/api/handler.go`) — per-host-key authed (a host
|
||||||
(`omitempty`), matching the agent's `hub.RestoreTest` field-for-field. An absent
|
writes only its own escrow; global operator key also accepted). Decodes the base64 blob and stores
|
||||||
`warnings_recognized` ⇒ `false` ⇒ the **louder** unrecognized path, so a missing flag can only
|
the **opaque bytes verbatim** against the host. The hub **never decrypts** — there is no decrypt
|
||||||
over-notice, never hide a real warning.
|
path; it has no recovery code. Rotation is last-write-wins.
|
||||||
- **Ingest behaviour:** a passed restore-test that carried warnings now logs
|
- **`host_escrow`** table + `SaveHostEscrow`/`GetHostEscrow` (`internal/store`). Blob is ciphertext.
|
||||||
`[INFO] restore-test passed WITH WARNINGS (recognized)` when every warning is the known-benign
|
- **Contract:** `escrowUploadRequest` mirrors the agent's emit struct (`blob_b64`, `key_fingerprint`,
|
||||||
anchor, escalated to `[WARN] … UNRECOGNIZED WARNINGS` otherwise (as loud as a failed PBS
|
`posture`, `created_at`); a key-set test in each repo guards drift.
|
||||||
verify). A FAILED restore-test still logs the existing `[WARN] … FAILED`.
|
- **Tests:** stores the blob byte-identical; rotation last-write-wins; 401 (absent/wrong key), 403
|
||||||
- **Contract:** `restore_tests[0]` in the host-report golden gains the two keys; the golden stays
|
(host writing another host's escrow), 400 (bad base64); contract key-set. `go test ./...` green.
|
||||||
**byte-identical** with felhom-agent's copy (sha256 `e6999d77…`), and the bidirectional
|
|
||||||
key-set contract test round-trips the new keys through `hostRestoreTest`. `go test ./...` green.
|
|
||||||
|
|
||||||
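The storage behaviour described under "What landed" (decode the base64, keep the bytes verbatim, let a later upload replace an earlier one) can be illustrated with a minimal in-memory sketch. This is not the hub's `internal/store` code: `memEscrowStore`, `saveFromWire` and `get` are hypothetical names invented for illustration, and a map stands in for the `host_escrow` table.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// memEscrowStore is a hypothetical in-memory stand-in for the host_escrow table.
type memEscrowStore struct {
	blobs map[string][]byte // host_id -> opaque ciphertext
}

func newMemEscrowStore() *memEscrowStore {
	return &memEscrowStore{blobs: make(map[string][]byte)}
}

// saveFromWire decodes blob_b64 and stores the resulting bytes unchanged.
// A later call for the same host replaces the earlier blob (last-write-wins).
// Nothing here inspects or decrypts the bytes.
func (s *memEscrowStore) saveFromWire(hostID, blobB64 string) error {
	blob, err := base64.StdEncoding.DecodeString(blobB64)
	if err != nil || len(blob) == 0 {
		return errors.New("blob_b64 must be non-empty, valid base64")
	}
	s.blobs[hostID] = blob
	return nil
}

// get returns the stored ciphertext for a host, if any.
func (s *memEscrowStore) get(hostID string) ([]byte, bool) {
	b, ok := s.blobs[hostID]
	return b, ok
}

func main() {
	s := newMemEscrowStore()
	first := base64.StdEncoding.EncodeToString([]byte("first"))
	rotated := base64.StdEncoding.EncodeToString([]byte("second-rotated"))
	_ = s.saveFromWire("h1", first)
	_ = s.saveFromWire("h1", rotated) // rotation overwrites the first blob
	got, _ := s.get("h1")
	fmt.Printf("%q\n", got)
}
```

The sketch rejects empty and malformed base64 before touching the map, so a bad upload leaves the previously stored blob in place.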
## Scope note — no dashboard widget this slice
|
## Documentation (doc 03 §8a)
|
||||||
|
|
||||||
The task asked to "surface in the dashboard distinctly from a clean pass." The hub web layer
|
Rewrote §8a into the **key-custody posture model**: the **separation principle** (reading data needs
|
||||||
currently renders **only controller-report data** — there is no host-domain dashboard surface
|
both chunks *and* a key; zero-knowledge holds while Felhom never holds both), the **topology matrix**
|
||||||
yet (guests/storage/restore_tests/pbs_snapshots are log+persist only; the failed-PBS-verify
|
(data location × key custody → who can read; the one dangerous cell flagged), the **default**
|
||||||
signal is likewise log-only). Building one is out of scope here; distinct dashboard treatment
|
(Felhom storage + customer-only key; `R` printed durably), the **anti-lockout ladder** ((b) wrapped
|
||||||
should land with the host-domain dashboard (slice 10). The operator signal this slice is the log
|
offline copy → (a) raw paperkey → Felhom-holds-a-key), **SSH-for-support is a separate grant** (not
|
||||||
line, consistent with the established failed-PBS-verify precedent.
|
coupled to key custody), **why zero-knowledge stays default** (breach + legal compellability), and
|
||||||
|
the **integrity caveat** for self-hosted-data postures. Corrected the storage-slice note: hub opaque
|
||||||
|
storage is **slice 7** (this task); only restore-mode **serving** is slice 10. §9 slice table + §13
|
||||||
|
updated.
|
||||||
|
|
||||||
## Backward compatibility
|
## Live validation
|
||||||
|
|
||||||
An agent that omits/empties `warnings`/`warnings_recognized` is accepted unchanged (the deployed
|
After the v0.8.0 deploy, the demo agent's `--selftest=escrow-create -upload` PUT the opaque blob and
|
||||||
v0.7.4 hub already ignores them). The legacy controller report path is untouched.
|
the hub stored it against the host; the stored bytes are **ciphertext** (not the key). The recovery
|
||||||
|
code `R` is never sent to or stored by the hub. *(No `R`/`K` value appears in any committed file.)*
|
||||||
|
|
||||||
|
## Deferred / security
|
||||||
|
|
||||||
|
Restore-mode serving + consumption → slice 10. The hub holds ciphertext only — possessing the blob
|
||||||
|
does not let Felhom read customer data (separation principle). No secrets committed.
|
||||||
|
|
||||||
## Deploy (GitOps)
|
## Deploy (GitOps)
|
||||||
|
|
||||||
Build+push `gitea.dooplex.hu/admin/felhom-hub:v0.7.5` → bump the `image:` tag in
|
Build+push `felhom-hub:v0.8.0` → bump `manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app.
|
||||||
`manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app (auto-sync off). Live-validated
|
|
||||||
after sync: the demo host's restore-test (agent v0.7.0, which passes-with-recognized-warnings on
|
|
||||||
the Debian-13 guest 9999) reflects on the hub as `passed WITH WARNINGS (recognized)` — not a
|
|
||||||
plain pass and not a FAILED.
|
|
||||||
|
|||||||
@@ -192,44 +192,84 @@ per Part 1: **snapshot** (LVM-thin, transient, whole-guest rollback — not a ba
|
|||||||
necessity, not just convenience. Integrity-verify (cheap, ciphertext-level) runs more often
|
necessity, not just convenience. Integrity-verify (cheap, ciphertext-level) runs more often
|
||||||
as the lighter check.
|
as the lighter check.
|
||||||
|
|
||||||
### 8a. PBS recovery-code escrow (zero-knowledge offsite-key recovery)
|
### 8a. PBS recovery-code escrow + the key-custody posture model (zero-knowledge offsite-key recovery)
|
||||||
|
|
||||||
The DR substrate is the PBS offsite tier, and it is client-side encrypted (zero-knowledge): if the
|
The DR substrate is the PBS offsite tier, client-side encrypted (zero-knowledge): if the box dies,
|
||||||
box dies, restoring the offsite backups requires the **PBS client encryption key `K`**, which died
|
restoring the offsite backups requires the **PBS client encryption key `K`**, which died with the
|
||||||
with the box. The escrow is how `K` comes back **without** Felhom ever being able to read customer
|
box. The escrow is how `K` comes back **without** Felhom ever being able to read customer data.
|
||||||
data. Design (decisions, with the rationale that pins them):
|
**Status: implemented** — escrow *creation* (agent v0.9.0, `internal/escrow`) + hub *opaque storage*
|
||||||
|
(hub v0.8.0, `PUT /api/v1/hosts/{host_id}/escrow`). Validated end-to-end on a throwaway in
|
||||||
|
`documentation/tests/slice7-escrow-spike-findings.md`. Restore-mode *serving/consumption* is slice 10.
|
||||||
|
|
||||||
|
#### The separation principle (the rule that governs every posture)
|
||||||
|
Reading customer data needs **BOTH** the encrypted chunks **AND** a usable key. **Zero-knowledge
|
||||||
|
holds for exactly as long as Felhom never holds both at once.** Every posture below is just a
|
||||||
|
choice about where the data and the key live; the principle decides who can read.
|
||||||
|
|
||||||
|
#### Topology matrix (data location × key custody → who can read)
|
||||||
|
| Data location | Key custody | Who can read | Notes |
|
||||||
|
|---|---|---|---|
|
||||||
|
| **Felhom storage** | customer-only key | **only the customer** | **the DEFAULT** — genuine zero-knowledge |
|
||||||
|
| **Felhom storage** | Felhom also holds a key | **Felhom can read** | the one dangerous cell — explicit, informed opt-in only; never default, never silent |
|
||||||
|
| Customer's own offsite | customer key | only the customer | self-hosted data; key XOR data |
|
||||||
|
| Customer's own offsite | Felhom holds a key | only the customer | safe by separation (key and data never co-located at Felhom) |
|
||||||
|
|
||||||
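The topology matrix reduces to a single predicate, which the following sketch spells out. It is an illustration of the separation principle only, not code from either repository; the type and constant names (`dataLocation`, `keyCustody`, `felhomCanRead`) are assumptions made for the example.

```go
package main

import "fmt"

// dataLocation and keyCustody are hypothetical enums mirroring the two axes of the matrix.
type dataLocation int
type keyCustody int

const (
	felhomStorage dataLocation = iota
	customerOffsite
)

const (
	customerOnlyKey keyCustody = iota
	felhomAlsoHoldsKey
)

// felhomCanRead applies the separation principle: Felhom can read customer data
// only when it holds BOTH the encrypted chunks AND a usable key.
func felhomCanRead(d dataLocation, k keyCustody) bool {
	return d == felhomStorage && k == felhomAlsoHoldsKey
}

func main() {
	for _, d := range []dataLocation{felhomStorage, customerOffsite} {
		for _, k := range []keyCustody{customerOnlyKey, felhomAlsoHoldsKey} {
			fmt.Printf("data=%d key=%d felhomCanRead=%v\n", d, k, felhomCanRead(d, k))
		}
	}
}
```

Only one of the four combinations returns `true`, which is the "one dangerous cell" the matrix flags.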
|
#### The escrow mechanism (decisions + the rationale that pins them)
|
||||||
- **Live key unencrypted on the box** (`0600`, root): the agent backs up *and* runs restore-tests
|
- **Live key unencrypted on the box** (`0600`, root): the agent backs up *and* runs restore-tests
|
||||||
unattended — no passphrase prompt on the management path. The privilege concentration this
|
unattended — no passphrase prompt on the management path. The privilege concentration this implies
|
||||||
implies is the whole argument for §3 root-minimization + a small auditable agent.
|
is the whole argument for §3 root-minimization + a small auditable agent.
|
||||||
- **Wrap mechanism — PBS-native, not custom crypto.** At enrollment the agent generates a
|
- **Wrap — PBS-native, not custom crypto.** At enrollment the agent generates a high-entropy
|
||||||
high-entropy **recovery code `R`** and produces a **passphrase-protected copy of `K` under `R`**
|
**recovery code `R`** and produces a **passphrase-protected copy of `K` under `R`** via PBS's own
|
||||||
using PBS's own key passphrase KDF (`proxmox-backup-client key` family). *Decision: lean on PBS's
|
key passphrase KDF (`proxmox-backup-client key change-passphrase --kdf scrypt`; no bespoke AEAD).
|
||||||
documented, battle-tested key+passphrase path; do not roll a bespoke AEAD wrap.* Host/customer
|
The spike pinned two implementation constraints: that command is **TTY-only** (drive it over a
|
||||||
binding is provided at the hub-storage layer (blob keyed by host-id), not by custom crypto.
|
pty), and the pty **echoes the passphrase** (discard the pty output so `R` can't leak) — F-A1/F-A2.
|
||||||
- **Agent-side generation.** `R` is generated **on the box** (it already holds `K` and does the
|
- **Agent-side generation.** `R` is generated **on the box** (it already holds `K` and does the
|
||||||
wrapping), so `R` never touches the hub even in transit — zero-knowledge by construction.
|
wrapping), so `R` never touches the hub even in transit — zero-knowledge by construction. `R` is
|
||||||
- **Escrow = the `R`-wrapped blob → hub.** The hub stores opaque ciphertext bound to the
|
≥128 bits, **word-list form** (EFF large wordlist, 10 words ≈ 129 bits) for off-paper transcription.
|
||||||
host/customer. Without `R` it is undecryptable; the operator cannot read customer data. (Hub-side
|
- **Self-verify before shipping.** Creation unwraps a copy of the blob with `R` and checks the key
|
||||||
storage schema for the blob is a slice-10 / doc-05 item.)
|
fingerprint matches — "an escrow you haven't recovered isn't an escrow."
|
||||||
- **Recovery code custody.** `R` is shown to the customer **once** at enrollment (printed/displayed)
|
- **Escrow = the `R`-wrapped blob → hub (opaque storage, slice 7).** The hub stores the ciphertext
|
||||||
and **never stored by Felhom in recoverable form**. Format: a grouped/word-list code (≥128-bit
|
bytes against the host record and **never decrypts them** (it has no `R`; there is no decrypt
|
||||||
entropy) — it is transcribed off paper by a non-technical household, so raw base32 invites typos.
|
path). Per-host-key authed; rotation is last-write-wins. **Restore-mode serving is slice 10.**
|
||||||
- **Consumption (slice 10, host-loss).** New box re-enrolls in restore mode → hub ships the escrow
|
- **Recovery code custody.** `R` is surfaced to the customer **exactly once** at enrollment
|
||||||
blob → customer enters `R` → box unwraps `K` → PBS restores proceed.
|
(printed/displayed) and **never stored by Felhom in any recoverable form**.
|
||||||
- **Optional belt-and-suspenders (product decision, default OFF).** A PBS **paperkey** (the raw key,
|
|
||||||
for a safe) gives the customer a recovery path that survives *both* box loss *and* recovery-code
|
|
||||||
loss, at the cost of a higher-value secret (raw key on paper, no second factor). Default is
|
|
||||||
hub-escrow + `R` only; offer the paperkey as an opt-in "advanced" path.
|
|
||||||
|
|
||||||
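The "10 words ≈ 129 bits" figure follows from the size of the EFF large wordlist, which has 7776 (6^5) entries: each uniformly chosen word contributes log2(7776) ≈ 12.92 bits. The sketch below checks that arithmetic and shows one way a code could be drawn with `crypto/rand`. It is not the agent's `internal/escrow` implementation; `entropyBits`, `pickWords` and the toy word list are hypothetical, and a real generator would load the full wordlist.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math"
	"math/big"
	"strings"
)

// effLargeWordlistSize is the number of entries in the EFF large wordlist (6^5).
const effLargeWordlistSize = 7776

// entropyBits returns the entropy of n words drawn uniformly and independently
// from a list of listSize words.
func entropyBits(n, listSize int) float64 {
	return float64(n) * math.Log2(float64(listSize))
}

// pickWords draws n words uniformly at random from words using crypto/rand.
func pickWords(words []string, n int) ([]string, error) {
	if len(words) == 0 {
		return nil, fmt.Errorf("word list is empty")
	}
	bound := big.NewInt(int64(len(words)))
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		idx, err := rand.Int(rand.Reader, bound) // uniform in [0, len(words))
		if err != nil {
			return nil, err
		}
		out = append(out, words[idx.Int64()])
	}
	return out, nil
}

func main() {
	fmt.Printf("9 words:  %.2f bits\n", entropyBits(9, effLargeWordlistSize))
	fmt.Printf("10 words: %.2f bits\n", entropyBits(10, effLargeWordlistSize))

	// Toy list for illustration only; it is far too small for a real recovery code.
	toy := []string{"alpha", "bravo", "charlie", "delta"}
	code, err := pickWords(toy, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(code, "-"))
}
```

Nine words fall short of the 128-bit floor, which is why ten is the smallest count that satisfies it with this list.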
**Properties stated for honesty (these go to the customer at enrollment):**
|
#### Default posture + the anti-lockout ladder (opt-in, increasing trust)
|
||||||
|
**Default:** *Felhom storage + customer-only key*, and **`R` is delivered durably (printed) always**
|
||||||
|
— note this is distinct from a raw-key paperkey: `R` is a safe two-factor *passphrase* (useless
|
||||||
|
without the hub's blob); the raw key is the footgun. The ladder trades resilience for trust:
|
||||||
|
- **(b) `R`-wrapped offline copy** — the same two-factor blob, for the customer to print/store. **No
|
||||||
|
extra trust**; resilience if the hub ever vanishes (still needs `R`). *Implemented (opt-in).*
|
||||||
|
- **(a) raw paperkey** — `proxmox-backup-client key paperkey` of the unwrapped key, for a safe.
|
||||||
|
Covers **losing `R`**, but it is **single-factor and unrevocable**. *Implemented (opt-in, loud
|
||||||
|
caveat).*
|
||||||
|
- **Felhom-holds-a-key** — maximum convenience, but **gives up zero-knowledge** (the dangerous
|
||||||
|
matrix cell). **Not implemented** — it needs a separate Felhom-side secure key store + explicit
|
||||||
|
opt-in UX, built only when a customer asks.
|
||||||
|
|
||||||
|
#### SSH-for-support is a SEPARATE grant — deliberately not coupled to key custody
|
||||||
|
Support access (active / consented / observable — customer-toggleable, commands shown) is **not**
|
||||||
|
the same as a standing / passive / invisible decryption capability. The transparency features prove
|
||||||
|
*controlled* support access **without Felhom holding a key**. Conflating the two is exactly the
|
||||||
|
mistake the separation principle prevents.
|
||||||
|
|
||||||
|
#### Why zero-knowledge stays the default (breach + legal)
|
||||||
|
Holding data **and** a key makes a single hub breach an **all-customer data leak**, and makes Felhom
|
||||||
|
**compellable** — a court can order what Felhom *can* produce. Genuine zero-knowledge means *"we
|
||||||
|
can't be forced to hand over what we can't read."* This is core to the sovereignty pitch, not a
|
||||||
|
nicety.
|
||||||
|
|
||||||
|
#### Honesty properties (stated to the customer at enrollment)
|
||||||
- **Irreducible residual:** losing `R` *and* the box (and, if not opted in, having no paperkey) =
|
- **Irreducible residual:** losing `R` *and* the box (and, if not opted in, having no paperkey) =
|
||||||
the offsite backups are **unrecoverable, by anyone, including Felhom.** This is the cost of
|
the offsite backups are **unrecoverable, by anyone, including Felhom.** The cost of genuine
|
||||||
genuine zero-knowledge and must be communicated, not buried.
|
zero-knowledge — communicated, not buried.
|
||||||
- **Rotation ≠ key rotation:** rotating `R` re-wraps the escrow blob (and re-shows the customer a
|
- **Rotation ≠ key rotation:** rotating `R` re-wraps the escrow blob (and re-shows a new code) but
|
||||||
new code) but does **not** re-encrypt existing PBS data — that data stays keyed by `K`. Changing
|
does **not** re-encrypt existing PBS data — that stays keyed by `K`. Changing `K` itself is a
|
||||||
`K` itself is a separate, heavier operation (new key → new backups; old backups still need old
|
separate, heavier op (new key → new backups; old backups still need old `K`), out of scope for
|
||||||
`K`) and is out of scope for routine recovery-code rotation.
|
routine recovery-code rotation.
|
||||||
|
- **Integrity caveat (self-hosted-data postures):** moving data to the customer's own offsite
|
||||||
|
**loses Felhom's backup guarantees** — no PBS verify / monitoring on storage we can't reach. An
|
||||||
|
honest signup-time tradeoff, not a hidden one.
|
||||||
|
|
||||||
## 9. Provisioning & DR flows
|
## 9. Provisioning & DR flows
|
||||||
|
|
||||||
@@ -295,7 +335,7 @@ this path — bring up + reattach external storage and it is whole. This is full
|
|||||||
| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit); golden archived at enrollment |
|
| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit); golden archived at enrollment |
|
||||||
| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | **implemented** (agent v0.8.0, `internal/reconcile/bringup.go`) |
|
| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | **implemented** (agent v0.8.0, `internal/reconcile/bringup.go`) |
|
||||||
| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | **implemented** (v0.8.0, `dr_guest_loss` mode — continuity identity preserved) |
|
| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | **implemented** (v0.8.0, `dr_guest_loss` mode — continuity identity preserved) |
|
||||||
| PBS recovery-code escrow **creation** (§8a) | **7** | designed (§8a); implement |
|
| PBS recovery-code escrow **creation** + **hub opaque storage** (§8a) | **7** | **implemented** (agent v0.9.0 `internal/escrow`; hub v0.8.0 `PUT /hosts/{id}/escrow`) |
|
||||||
| Provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8** | deferred — needs the controller-deploy path + agent↔controller local API (§6) |
|
| Provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8** | deferred — needs the controller-deploy path + agent↔controller local API (§6) |
|
||||||
| **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / PBS namespace / tunnel token / storage manifest / restore directive | **10** | deferred — needs hub desired-state serving; hub store today holds only `{host_id, customer_id, api_key}` (slice 3) |
|
| **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / PBS namespace / tunnel token / storage manifest / restore directive | **10** | deferred — needs hub desired-state serving; hub store today holds only `{host_id, customer_id, api_key}` (slice 3) |
|
||||||
| PBS escrow **consumption** (recover `K` on a new box) | **10** | deferred — exercised by host-loss DR |
|
| PBS escrow **consumption** (recover `K` on a new box) | **10** | deferred — exercised by host-loss DR |
|
||||||
@@ -359,8 +399,9 @@ Still open:
|
|||||||
- **Golden base image** refresh cadence + fleet versioning — operational, non-blocking (§9).
|
- **Golden base image** refresh cadence + fleet versioning — operational, non-blocking (§9).
|
||||||
- **Identity-reset set** (live, link-up) — pinned empirically by the slice-7 bring-up spike; the
|
- **Identity-reset set** (live, link-up) — pinned empirically by the slice-7 bring-up spike; the
|
||||||
scenario-specific policy is settled in §9, the exact field list is the spike's deliverable.
|
scenario-specific policy is settled in §9, the exact field list is the spike's deliverable.
|
||||||
- **Hub-side escrow storage + restore-mode serving** — the blob's hub schema and the restore-mode
|
- **Escrow restore-mode serving / consumption** — handing the opaque blob back to a re-enrolling
|
||||||
desired-state handover are slice-10 / doc-05 (§8a, §9 host-loss).
|
box and unwrapping `K` with `R` is slice-10 / doc-05 (§8a, §9 host-loss). *Escrow creation + hub
|
||||||
|
opaque storage are done (slice 7).*
|
||||||
|
|
||||||
This doc hands the implementation three contracts it was waiting on:
|
This doc hands the implementation three contracts it was waiting on:
|
||||||
|
|
||||||
|
|||||||
@@ -1,5 +1,29 @@
|
|||||||
# Felhom Hub — Changelog
|
# Felhom Hub — Changelog
|
||||||
|
|
||||||
|
## v0.8.0 — opaque PBS recovery-code escrow storage (slice 7, doc 03 §8a) (2026-06-10)
|
||||||
|
|
||||||
|
Hub half of slice-7 close-out: store the agent's **opaque** `R`-wrapped PBS-key escrow blob. The
|
||||||
|
default posture is zero-knowledge — the hub holds ciphertext it **cannot open** (it has no recovery
|
||||||
|
code; there is no decrypt path). Pairs with felhom-agent v0.9.0 (escrow creation). Consumption /
|
||||||
|
restore-mode serving is slice 10.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- **`PUT /api/v1/hosts/{host_id}/escrow`** — authed with the **per-host key** (a host may only write
|
||||||
|
its own escrow; the global operator key is also accepted). Body mirrors the agent's emit struct
|
||||||
|
(`blob_b64`, `key_fingerprint`, `posture`, `created_at`). Stores the decoded **opaque bytes
|
||||||
|
verbatim**; rotation is last-write-wins. No serving this slice.
|
||||||
|
- **`host_escrow`** table (`host_id` PK, `blob` BLOB, fingerprint/posture/created_at). Store methods
|
||||||
|
`SaveHostEscrow` / `GetHostEscrow` (`HostEscrow`). The hub never transforms or decrypts the blob.
|
||||||
|
|
||||||
|
### Tests
|
||||||
|
- Stores the opaque blob **verbatim** (round-trips byte-identical); rotation last-write-wins;
|
||||||
|
rejects an absent/wrong key (401) and a host writing another host's escrow (403); bad/empty
|
||||||
|
base64 → 400; the wire-contract key-set matches the agent's emit struct.
|
||||||
|
|
||||||
|
### Security note
|
||||||
|
The hub stores ciphertext only — holding the blob does NOT let Felhom read customer data
|
||||||
|
(separation principle, doc 03 §8a). The per-host-key gate scopes writes to the owning host.
|
||||||
|
|
||||||
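The per-host-key gate described in the security note can be read as a small decision function. The sketch below follows the order of checks visible in `handleHostEscrowPut` in this commit (auth, then a non-empty path host ID, then ownership), but it is a standalone illustration: `escrowWriteStatus` is a hypothetical helper, and the real handler continues with body and base64 validation after this gate passes.

```go
package main

import (
	"fmt"
	"net/http"
)

// escrowWriteStatus returns the HTTP status the auth gate would produce.
// authOK, isGlobal and authHostID stand for the results of the hub's auth check;
// pathHostID is the host ID taken from the request path.
func escrowWriteStatus(authOK, isGlobal bool, authHostID, pathHostID string) int {
	if !authOK {
		return http.StatusUnauthorized // absent or wrong key
	}
	if pathHostID == "" {
		return http.StatusBadRequest // no host ID in the path
	}
	// A per-host key may only write its own escrow; the global key may write any.
	if !isGlobal && authHostID != pathHostID {
		return http.StatusForbidden
	}
	return http.StatusOK // gate passed; body validation would follow
}

func main() {
	fmt.Println(escrowWriteStatus(false, false, "", "h1"))  // no valid key
	fmt.Println(escrowWriteStatus(true, false, "h2", "h1")) // another host's key
	fmt.Println(escrowWriteStatus(true, false, "h1", "h1")) // owning host
	fmt.Println(escrowWriteStatus(true, true, "", "h1"))    // global operator key
}
```

Keeping the gate as a pure function of the auth result makes the 401/403 distinction easy to table-test without an HTTP round trip.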
## v0.7.5 — restore-test "passed with warnings" visibility (2026-06-09)
|
## v0.7.5 — restore-test "passed with warnings" visibility (2026-06-09)
|
||||||
|
|
||||||
Hub half of `TASK — Restore-test must not false-fail on benign start warnings` (Phase B). The
|
Hub half of `TASK — Restore-test must not false-fail on benign start warnings` (Phase B). The
|
||||||
|
|||||||
@@ -0,0 +1,111 @@
|
|||||||
|
package api
|
||||||
|
|
||||||
|
import (
|
||||||
|
"encoding/base64"
|
||||||
|
"encoding/json"
|
||||||
|
"net/http"
|
||||||
|
"reflect"
|
||||||
|
"sort"
|
||||||
|
"testing"
|
||||||
|
|
||||||
|
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
|
||||||
|
)
|
||||||
|
|
||||||
|
// a stand-in for the opaque R-wrapped blob — the hub treats it as ciphertext it cannot read.
|
||||||
|
var opaqueBlob = []byte("\x00\x01OPAQUE-pbs-scrypt-keyfile-bytes\xff\xfe")
|
||||||
|
|
||||||
|
func escrowBody(blob []byte) string {
|
||||||
|
b, _ := json.Marshal(map[string]string{
|
||||||
|
"blob_b64": base64.StdEncoding.EncodeToString(blob),
|
||||||
|
"key_fingerprint": "ab:cd:ef",
|
||||||
|
"posture": "zero_knowledge",
|
||||||
|
"created_at": "2026-06-10T05:00:00Z",
|
||||||
|
})
|
||||||
|
return string(b)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestHandleHostEscrow_StoresOpaqueBlobVerbatim(t *testing.T) {
|
||||||
|
h, st, _ := newTestHandler(t)
|
||||||
|
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
|
||||||
|
|
||||||
|
rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", escrowBody(opaqueBlob))
|
||||||
|
if rr.Code != http.StatusOK {
|
||||||
|
t.Fatalf("PUT escrow = %d, want 200 (%s)", rr.Code, rr.Body.String())
|
||||||
|
}
|
||||||
|
got, err := st.GetHostEscrow("h1")
|
||||||
|
if err != nil || got == nil {
|
||||||
|
t.Fatalf("GetHostEscrow: %v / %v", got, err)
|
||||||
|
}
|
||||||
|
// the hub stored the OPAQUE bytes verbatim (it never decrypts / transforms them).
|
||||||
|
if !reflect.DeepEqual(got.Blob, opaqueBlob) {
|
||||||
|
t.Fatalf("stored blob != uploaded blob (hub must keep ciphertext verbatim)")
|
||||||
|
}
|
||||||
|
if got.KeyFingerprint != "ab:cd:ef" || got.Posture != "zero_knowledge" {
|
||||||
|
t.Errorf("metadata not stored: %+v", got)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestHandleHostEscrow_LastWriteWins(t *testing.T) {
|
||||||
|
h, st, _ := newTestHandler(t)
|
||||||
|
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
|
||||||
|
do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", escrowBody([]byte("first")))
|
||||||
|
rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", escrowBody([]byte("second-rotated")))
|
||||||
|
if rr.Code != http.StatusOK {
|
||||||
|
t.Fatalf("rotation PUT = %d", rr.Code)
|
||||||
|
}
|
||||||
|
got, _ := st.GetHostEscrow("h1")
|
||||||
|
if string(got.Blob) != "second-rotated" {
|
||||||
|
t.Fatalf("rotation must be last-write-wins, got %q", got.Blob)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestHandleHostEscrow_AuthRejected(t *testing.T) {
|
||||||
|
h, st, _ := newTestHandler(t)
|
||||||
|
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
|
||||||
|
st.UpsertHost(&store.Host{HostID: "h2", CustomerID: "c2", APIKey: "HKEY2"})
|
||||||
|
|
||||||
|
// absent / wrong key → 401
|
||||||
|
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "", escrowBody(opaqueBlob)); rr.Code != http.StatusUnauthorized {
|
||||||
|
t.Errorf("no key: got %d want 401", rr.Code)
|
||||||
|
}
|
||||||
|
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "WRONG", escrowBody(opaqueBlob)); rr.Code != http.StatusUnauthorized {
|
||||||
|
t.Errorf("wrong key: got %d want 401", rr.Code)
|
||||||
|
}
|
||||||
|
// h2's key writing h1's escrow → 403 (a host may only write its own)
|
||||||
|
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY2", escrowBody(opaqueBlob)); rr.Code != http.StatusForbidden {
|
||||||
|
t.Errorf("host_id mismatch: got %d want 403", rr.Code)
|
||||||
|
}
|
||||||
|
// and nothing was stored for h1 by the rejected attempts.
|
||||||
|
if got, _ := st.GetHostEscrow("h1"); got != nil {
|
||||||
|
t.Errorf("rejected attempts must not store anything, got %+v", got)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestHandleHostEscrow_BadBody(t *testing.T) {
|
||||||
|
h, st, _ := newTestHandler(t)
|
||||||
|
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
|
||||||
|
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", `{"blob_b64":""}`); rr.Code != http.StatusBadRequest {
|
||||||
|
t.Errorf("empty blob: got %d want 400", rr.Code)
|
||||||
|
}
|
||||||
|
if rr := do(h, http.MethodPut, "/hosts/h1/escrow", "HKEY", `{"blob_b64":"!!!not base64!!!"}`); rr.Code != http.StatusBadRequest {
|
||||||
|
t.Errorf("bad base64: got %d want 400", rr.Code)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestEscrowUploadContract pins the wire shape that MUST match the agent's emit struct
|
||||||
|
// (felhom-agent escrowUploadRequest). Cross-repo, no shared module — this is the hub half of the
|
||||||
|
// contract guard; the agent has the mirror in its own test.
|
||||||
|
func TestEscrowUploadContract(t *testing.T) {
|
||||||
|
b, _ := json.Marshal(escrowUploadRequest{BlobB64: "x", KeyFingerprint: "y", Posture: "z", CreatedAt: "t"})
|
||||||
|
var m map[string]any
|
||||||
|
json.Unmarshal(b, &m)
|
||||||
|
got := make([]string, 0, len(m))
|
||||||
|
for k := range m {
|
||||||
|
got = append(got, k)
|
||||||
|
}
|
||||||
|
sort.Strings(got)
|
||||||
|
want := []string{"blob_b64", "created_at", "key_fingerprint", "posture"}
|
||||||
|
if !reflect.DeepEqual(got, want) {
|
||||||
|
t.Fatalf("escrow wire contract drift: got %v want %v (must match the agent emit struct)", got, want)
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -3,6 +3,7 @@ package api
|
|||||||
import (
|
import (
|
||||||
"bytes"
|
"bytes"
|
||||||
"crypto/subtle"
|
"crypto/subtle"
|
||||||
|
"encoding/base64"
|
||||||
"encoding/json"
|
"encoding/json"
|
||||||
"fmt"
|
"fmt"
|
||||||
"io"
|
"io"
|
||||||
@@ -124,6 +125,9 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
|
|||||||
h.handleHostReport(w, r)
|
h.handleHostReport(w, r)
|
||||||
case r.Method == http.MethodPost && path == "/admin/hosts":
|
case r.Method == http.MethodPost && path == "/admin/hosts":
|
||||||
h.handleAdminCreateHost(w, r)
|
h.handleAdminCreateHost(w, r)
|
||||||
|
case r.Method == http.MethodPut && strings.HasPrefix(path, "/hosts/") && strings.HasSuffix(path, "/escrow"):
|
||||||
|
hostID := strings.TrimSuffix(strings.TrimPrefix(path, "/hosts/"), "/escrow")
|
||||||
|
h.handleHostEscrowPut(w, r, hostID)
|
||||||
case r.Method == http.MethodPost && path == "/event":
|
case r.Method == http.MethodPost && path == "/event":
|
||||||
h.handleEvent(w, r)
|
h.handleEvent(w, r)
|
||||||
case r.Method == http.MethodPost && path == "/notify":
|
case r.Method == http.MethodPost && path == "/notify":
|
||||||
@@ -258,9 +262,9 @@ type hostReportPayload struct {
|
|||||||
ControllerVersion string `json:"controller_version"`
|
ControllerVersion string `json:"controller_version"`
|
||||||
} `json:"guests"`
|
} `json:"guests"`
|
||||||
StorageTargets []hostStorageTarget `json:"storage_targets"`
|
StorageTargets []hostStorageTarget `json:"storage_targets"`
|
||||||
Backups []hostBackup `json:"backups"` // slice 6
|
Backups []hostBackup `json:"backups"` // slice 6
|
||||||
RestoreTests []hostRestoreTest `json:"restore_tests"` // slice 6
|
RestoreTests []hostRestoreTest `json:"restore_tests"` // slice 6
|
||||||
PBSSnapshots []hostPBSSnapshot `json:"pbs_snapshots"` // slice 6 Phase B
|
PBSSnapshots []hostPBSSnapshot `json:"pbs_snapshots"` // slice 6 Phase B
|
||||||
Cloudflared struct {
|
Cloudflared struct {
|
||||||
Status string `json:"status"`
|
Status string `json:"status"`
|
||||||
} `json:"cloudflared"`
|
} `json:"cloudflared"`
|
||||||
@@ -569,6 +573,66 @@ func (h *Handler) handleAdminCreateHost(w http.ResponseWriter, r *http.Request)
	json.NewEncoder(w).Encode(map[string]string{"host_id": hostID, "api_key": apiKey})
}

// escrowUploadRequest is the agent→hub wire shape for the OPAQUE PBS recovery-code escrow blob
// (slice 7, doc 03 §8a). It MUST stay in lockstep with the agent's emit struct
// (felhom-agent cmd/felhom-agent escrowUploadRequest). The hub stores the bytes and NEVER decrypts
// them (it has no recovery code).
type escrowUploadRequest struct {
	BlobB64        string `json:"blob_b64"`        // base64 of the opaque R-wrapped blob (ciphertext)
	KeyFingerprint string `json:"key_fingerprint"` // for operator display only
	Posture        string `json:"posture"`         // e.g. "zero_knowledge"
	CreatedAt      string `json:"created_at"`      // RFC3339
}

// handleHostEscrowPut stores a host's opaque escrow blob (doc 03 §8a). Authed with the PER-HOST key
// (a host may only write its own escrow; the global operator key is also accepted). The hub keeps
// the ciphertext and never opens it. Last-write-wins (rotation). No serving this slice (slice 10).
func (h *Handler) handleHostEscrowPut(w http.ResponseWriter, r *http.Request, pathHostID string) {
	authHostID, _, isGlobal, ok := h.checkAuthHost(r)
	if !ok {
		http.Error(w, "Unauthorized", http.StatusUnauthorized)
		return
	}
	if pathHostID == "" {
		http.Error(w, "Missing host_id", http.StatusBadRequest)
		return
	}
	// A per-host key may only write ITS OWN escrow; the global key may write any.
	if !isGlobal && authHostID != pathHostID {
		http.Error(w, "Forbidden: host_id mismatch", http.StatusForbidden)
		return
	}
	body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // 1 MB cap; the blob is ~hundreds of bytes
	if err != nil {
		http.Error(w, "Bad request", http.StatusBadRequest)
		return
	}
	var req escrowUploadRequest
	if err := json.Unmarshal(body, &req); err != nil || req.BlobB64 == "" {
		http.Error(w, "Invalid payload: blob_b64 required", http.StatusBadRequest)
		return
	}
	blob, err := base64.StdEncoding.DecodeString(req.BlobB64)
	if err != nil || len(blob) == 0 {
		http.Error(w, "Invalid payload: blob_b64 not valid base64", http.StatusBadRequest)
		return
	}
	createdAt := req.CreatedAt
	if createdAt == "" {
		createdAt = time.Now().UTC().Format(time.RFC3339)
	}
	// Store the OPAQUE bytes. No decrypt path exists — the hub cannot open this.
	if err := h.store.SaveHostEscrow(pathHostID, blob, req.KeyFingerprint, req.Posture, createdAt); err != nil {
		h.logger.Printf("[ERROR] Failed to store escrow for host %s: %v", pathHostID, err)
		http.Error(w, "Internal error", http.StatusInternalServerError)
		return
	}
	h.logger.Printf("[INFO] stored opaque escrow blob for host %s (%d bytes, posture=%s, fp=%s)",
		pathHostID, len(blob), req.Posture, req.KeyFingerprint)
	w.WriteHeader(http.StatusOK)
	w.Write([]byte(`{"status":"ok"}`))
}

// allowedEventTypes lists all valid event_type values the Hub accepts.
var allowedEventTypes = map[string]bool{
	// Controller-pushed events
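The wire contract in this hunk can be exercised without a server. The sketch below is an illustration, not the agent's or the hub's actual code: `encodeEscrow` and `decodeEscrow` are hypothetical helper names, the blob bytes and fingerprint are placeholders, and HTTP transport and authentication are left out because `checkAuthHost` is not shown in this diff. It only demonstrates that the bytes survive the base64/JSON round trip unchanged and that the two 400 cases trigger.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// escrowUploadRequest copies the wire shape from the hunk above.
type escrowUploadRequest struct {
	BlobB64        string `json:"blob_b64"`
	KeyFingerprint string `json:"key_fingerprint"`
	Posture        string `json:"posture"`
	CreatedAt      string `json:"created_at"`
}

// encodeEscrow is a hypothetical agent-side helper: it base64-encodes the
// opaque blob and marshals the JSON body for the PUT.
func encodeEscrow(blob []byte, fingerprint, posture, createdAt string) ([]byte, error) {
	return json.Marshal(escrowUploadRequest{
		BlobB64:        base64.StdEncoding.EncodeToString(blob),
		KeyFingerprint: fingerprint,
		Posture:        posture,
		CreatedAt:      createdAt,
	})
}

// decodeEscrow mirrors the body validation in handleHostEscrowPut: the JSON
// must parse, blob_b64 must be present, and it must decode to a non-empty
// byte slice. Nothing here inspects or decrypts the bytes.
func decodeEscrow(body []byte) ([]byte, escrowUploadRequest, error) {
	var req escrowUploadRequest
	if err := json.Unmarshal(body, &req); err != nil || req.BlobB64 == "" {
		return nil, req, errors.New("invalid payload: blob_b64 required")
	}
	blob, err := base64.StdEncoding.DecodeString(req.BlobB64)
	if err != nil || len(blob) == 0 {
		return nil, req, errors.New("invalid payload: blob_b64 not valid base64")
	}
	return blob, req, nil
}

func main() {
	// Placeholder bytes standing in for ciphertext; not a real wrapped key.
	in := []byte{0x00, 0xff, 0x10, 0x80, 0x7f}
	body, err := encodeEscrow(in, "fp-placeholder", "zero_knowledge", "2026-06-10T00:00:00Z")
	if err != nil {
		panic(err)
	}
	out, req, err := decodeEscrow(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("verbatim:", bytes.Equal(in, out), "posture:", req.Posture)
}
```

Keeping the struct tags identical on both sides is what the "lockstep" comment asks for; a renamed tag would surface here as the `blob_b64 required` error rather than as a decode failure.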
@@ -269,6 +269,19 @@ func (s *Store) migrate() error {
	);
	CREATE INDEX IF NOT EXISTS idx_host_reports_host ON host_reports(host_id, received_at DESC);
	CREATE INDEX IF NOT EXISTS idx_host_reports_customer ON host_reports(customer_id, received_at DESC);

	-- host_escrow (slice 7, doc 03 §8a): the OPAQUE R-wrapped PBS-key escrow blob. The hub
	-- stores the ciphertext bytes against the host and NEVER decrypts them (it has no recovery
	-- code). One row per host; a re-upload (rotation) is last-write-wins. Restore-mode serving
	-- (handing the blob back to a re-enrolling box) is slice 10.
	CREATE TABLE IF NOT EXISTS host_escrow (
		host_id         TEXT PRIMARY KEY,
		blob            BLOB NOT NULL,
		key_fingerprint TEXT NOT NULL DEFAULT '',
		posture         TEXT NOT NULL DEFAULT '',
		created_at      DATETIME NOT NULL,
		updated_at      DATETIME NOT NULL DEFAULT (datetime('now'))
	);
	`)
	if err != nil {
		return err
@@ -1381,6 +1394,51 @@ func (s *Store) UpsertHost(h *Host) error {
	return err
}

// HostEscrow is the opaque R-wrapped escrow blob stored for a host (doc 03 §8a). Blob is
// ciphertext the hub cannot open.
type HostEscrow struct {
	HostID         string
	Blob           []byte
	KeyFingerprint string
	Posture        string
	CreatedAt      string
	UpdatedAt      string
}

// SaveHostEscrow stores (last-write-wins) the OPAQUE escrow blob for a host. The hub keeps the
// bytes and NEVER decrypts them — there is no decrypt path. createdAt is the agent's timestamp.
func (s *Store) SaveHostEscrow(hostID string, blob []byte, keyFingerprint, posture, createdAt string) error {
	_, err := s.db.Exec(`
		INSERT INTO host_escrow (host_id, blob, key_fingerprint, posture, created_at, updated_at)
		VALUES (?, ?, ?, ?, ?, datetime('now'))
		ON CONFLICT(host_id) DO UPDATE SET
			blob = excluded.blob,
			key_fingerprint = excluded.key_fingerprint,
			posture = excluded.posture,
			created_at = excluded.created_at,
			updated_at = datetime('now')`,
		hostID, blob, keyFingerprint, posture, createdAt,
	)
	return err
}

// GetHostEscrow returns the stored opaque escrow for a host (nil if none). Used by tests and
// (future, slice 10) restore-mode serving. The hub returns bytes verbatim; it never decrypts.
func (s *Store) GetHostEscrow(hostID string) (*HostEscrow, error) {
	var e HostEscrow
	err := s.db.QueryRow(`
		SELECT host_id, blob, key_fingerprint, posture, created_at, updated_at
		FROM host_escrow WHERE host_id = ?`, hostID).
		Scan(&e.HostID, &e.Blob, &e.KeyFingerprint, &e.Posture, &e.CreatedAt, &e.UpdatedAt)
	if err == sql.ErrNoRows {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	return &e, nil
}

// SaveHostReport inserts a host_reports row and bumps the host's reality columns
// (agent_version/last_report_at/updated_at) — never the inert intent columns.
func (s *Store) SaveHostReport(hostID, customerID string, reportJSON []byte, d HostReportDenorm) error {
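The last-write-wins behaviour of `SaveHostEscrow` and the `(nil, nil)` result of `GetHostEscrow` can be modelled without a database. The sketch below is an in-memory stand-in, not the SQL path from this commit: `memStore`, `escrowRow`, `save` and `get` are hypothetical names, and the row is trimmed to three fields. It assumes only what the upsert states, namely that a second write for the same `host_id` replaces the blob and its `created_at`.

```go
package main

import "fmt"

// escrowRow is a trimmed, hypothetical stand-in for a host_escrow row.
type escrowRow struct {
	Blob      []byte
	Posture   string
	CreatedAt string
}

// memStore is an in-memory stand-in for the table, keyed by host_id.
type memStore map[string]escrowRow

// save overwrites any existing row for hostID, like the ON CONFLICT upsert.
func (m memStore) save(hostID string, blob []byte, posture, createdAt string) {
	// Copy the blob so later changes to the caller's slice do not leak in.
	m[hostID] = escrowRow{
		Blob:      append([]byte(nil), blob...),
		Posture:   posture,
		CreatedAt: createdAt,
	}
}

// get returns nil when no row exists, matching GetHostEscrow's (nil, nil).
func (m memStore) get(hostID string) *escrowRow {
	row, ok := m[hostID]
	if !ok {
		return nil
	}
	return &row
}

func main() {
	s := memStore{}
	s.save("host-a", []byte("old"), "zero_knowledge", "2026-06-01T00:00:00Z")
	s.save("host-a", []byte("new"), "zero_knowledge", "2026-06-10T00:00:00Z") // rotation
	fmt.Println(len(s), string(s.get("host-a").Blob), s.get("host-b") == nil)
}
```

Callers of the real `GetHostEscrow` need the same nil check as `get` here, since a missing row is reported as a nil pointer with a nil error rather than as `sql.ErrNoRows`.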