docs: slice 10C escrow consumption productionized (doc 03 §8a/§9)

Agent-only implementation (felhom-agent v0.17.0 escrow.Consume); no hub code
change. 10C done; 10D is the last piece of slice 10.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 22:18:10 +02:00
parent 680b1592c5
commit a98210ae00
2 changed files with 32 additions and 32 deletions
+18 -31
@@ -4,41 +4,28 @@
 ---
-# REPORT — Slice 10B (hub half): signed-op job completion (hub v0.10.0) (2026-06-10)
+# REPORT — Slice 10C (docs only): escrow consumption productionized (2026-06-10)
 ## Type
-TASK (CC-implemented). The hub half of slice 10B. Pairs with `felhom-agent` v0.16.0 (the signing CLI
-+ verify-and-execute machinery + the storage-wipe consumer).
-## What changed (hub)
-Small by design — the hub stores + serves the operator-signed blobs **opaquely** (it holds no signing
-key, can neither forge nor open them; the agent verifies + executes). 10B adds the **completion** path.
-### Store + API
-- **`DELETE /api/v1/hosts/{host_id}/jobs/{job_id}`** (per-host key, **self-scoped**; global key may
-clear any) — the agent calls it after executing OR terminally rejecting a job. Idempotent. Store:
-`DeleteSignedJob`.
-- Reused unchanged from 10A: `POST /admin/hosts/{id}/jobs` (operator enqueue), `GET /hosts/{id}/jobs`
-(agent fetch), `has_signed_ops` envelope flag. The signed blob stays opaque on the wire (a base64
-`{op_blob_b64, sig_armored}` envelope) — **no jobs-wire golden change**.
-## Tests (green)
-- `DELETE …/jobs/{id}` self-scoped (host A cannot clear host B's job → 403) + idempotent.
-## Docs
-- Doc 03 §4 (the operator-signed path is LIVE: gate → pending op → offline signature → verify
-(pinned key / nonce-burn / expiry / host + durable-id anti-retarget) → execute; key floor: not in
-the hub, not in the agent), §6 (the 8C data-bearing wipe now completes via 10B), §9 slice table
-(**10B done**; 10C escrow-consumption spike-validated, 10D DR capstone pending).
-## Security framing (why the hub stays minimal)
-The hub is deliberately a dumb queue here: it cannot forge a signed op (no key) and the agent never
-trusts a queued blob until the pinned-key verify passes. A **compromised hub queuing a forged blob is
-rejected** by the agent (tested in felhom-agent). That is the whole point of the offline-key design.
+Documentation update for **slice 10C** (implementation is **agent-only**: `felhom-agent` v0.17.0
+`escrow.Consume`). **No hub code change** — 10C reads a restore directive it is given; 10D wires the
+hub side (serving the blob + expected fingerprint + PBS connection, prompting for R).
+## What changed (doc 03 — host-agent)
+- **§8a**: escrow **consumption** is now a real, tested path (`escrow.Consume` = **Unwrap →
+fingerprint-gate → install**), replacing the throwaway spike harness. The spike findings are baked
+in: F-C2 (install the raw key where the restore reads it), **F-C3** (wrong R fails closed), **F-C4**
+(fingerprint-gate *before* any multi-GB restore), **F-C6** (blob read-only/retryable, `K` never
+mutated). **Zero-knowledge holds end-to-end**: the hub serves the blob + expected fingerprint + PBS
+connection; **R comes from the customer by hand, never the hub** — a hub compromise alone cannot
+decrypt.
+- **§9 slice table**: **10C done**. **10D** (DR capstone — re-enroll in restore mode, serve the
+directive, consume, restore guests + identity, reuse the 10B gate for restore-overwrite, the
+re-enrollment-auth fork) is the last piece of slice 10.
 ## Pending
-- Build + deploy hub v0.10.0 (+ agent v0.16.0) and live-validate the full loop on the demo: a
-data-bearing wipe → `pending_signature` → offline-signed → queued → agent verifies + wipes the
-device; replay + non-pinned-key rejected.
+- Live validation runs against the demo (agent v0.17.0): create escrow → `Consume` → restore real
+data with the consumed key; wrong R → clean failure, nothing installed; live `K` byte-unchanged.
+14 -1
@@ -422,7 +422,7 @@ this path — bring up + reattach external storage and it is whole. This is full
 | **Host metrics to the controller** (`GET /host/metrics` — the customer host-health view) | **9** | **implemented** (agent v0.14.0: `GET /host/metrics` reuses the slice-4 collector + a new CPU/chassis-temp collector `internal/hub/cputemp.go`, graceful-null; the shared `HostMetrics` gains `cpu_temp_c` so the hub report carries it too — cross-repo golden updated; controller v0.39.0: agentapi `HostMetrics()` + a thin `/api/host-metrics` proxy + the monitoring page's host-health card). **Host-wide, token-authed, fresh** (not the 15-min hub snapshot). **Assumption: one customer per host** (the home-server model) — host-wide CPU/mem would leak cross-customer load on a multi-customer host; revisit then. Out of scope: multi-tenant metric filtering; historical/time-series storage (this is a live snapshot). |
 | **Hub desired-state serving** (the "Down" channel) — store + serve per-host desired-state, bump `desired_generation`, signed-jobs queue + `has_signed_ops`; agent activates the envelope + a hub-backed provider (benign reconciled, destructive gated pending) | **10A** | **implemented** (hub v0.9.0: `PUT /admin/hosts/{id}/desired-state` bumps the generation, `GET /hosts/{id}/desired-state` + `/jobs` self-scoped, `signed_jobs` queue; agent v0.15.0: `ControlEnvelope` fields live, `Client.FetchDesiredState`, `internal/desired` Syncer + `reconcile.CachingProvider` feeding the engine — an explicit guest `decommission` is the destructive delta, gated `pending_signature`). Serves to already-authenticated hosts only; desired-state stored opaquely (agent owns the schema). Cross-repo golden (envelope + desired-state) byte-identical. |
 | **Signed-op execution** (verify + run the gated destructive op) | **10B** | **implemented** (agent v0.16.0: `cmd/felhom-opsign` offline signing CLI + `internal/signedjobs` runner/WipeExecutor + `internal/storage` durable-device resolution; hub v0.10.0: `DELETE /hosts/{id}/jobs/{job_id}` completion). Verify → durable nonce-burn → execute → clear; pinned-key (multi-key rotation, trusted path), host + **durable-id** anti-retarget, 8C re-inspect. Closes the 8C data-bearing-wipe gap. Other destructive executors (guest_destroy, decommission, restore-overwrite → 10D) reuse the same gate+runner machinery. |
-| **PBS escrow consumption** (recover `K` on a new box) | **10C** | **spike validated** (2026-06-10, `documentation/tests/slice10-escrow-consumption-spike-findings.md` — recover-from-`(blob,R)` on a key-less box + real-data restore proven, GO). Productionizing the consumption path is 10C; exercised by host-loss DR (10D). |
+| **PBS escrow consumption** (recover `K` on a new box) | **10C** | **implemented** (agent v0.17.0: `escrow.Consume` = Unwrap → fingerprint-gate → atomic install; spike-proven crypto + real-data restore productionized; `--selftest=escrow-consume`). Zero-knowledge holds (hub serves all but R). Spike findings: `documentation/tests/slice10-escrow-consumption-spike-findings.md`. The four inputs are sourced from the hub directive in 10D. |
 | **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / PBS namespace / tunnel token / storage manifest / restore directive (the `restore_directive` field exists in 10A's desired-state, consumed here) | **10D** | deferred — the DR capstone; consumes 10A serving + 10C escrow consumption + re-enrollment authorization |
 | Golden base refresh cadence + fleet versioning | post-launch | operational, non-blocking (§13) |
@@ -501,6 +501,19 @@ This doc hands the implementation three contracts it was waiting on:
 ## Changelog — design-review + Phase-3 fold-in (2026-06-08)
+### Slice-10C implemented — escrow consumption (productionized) (2026-06-10)
+- §8a: escrow **consumption** is now a real, tested path (`escrow.Consume`): **Unwrap → fingerprint-
+gate → install**. The throwaway 10C spike harness is gone; the spike's findings are baked in (F-C2
+install = place the raw `kdf=none` key where the restore reads it; **F-C3** wrong-R fails closed;
+**F-C4** fingerprint-gate BEFORE any multi-GB restore; **F-C6** the blob is read-only/retryable, `K`
+never mutated). **Zero-knowledge holds end-to-end through consumption**: the hub serves the blob +
+expected fingerprint + PBS connection; **R comes from the customer by hand, never the hub** — a hub
+compromise alone still cannot decrypt. The four inputs are PARAMETERS (standalone-testable);
+`--selftest=escrow-consume` invokes the real path live (R via env, never a flag). Status:
+implemented (agent v0.17.0; **no hub change** — 10D wires the hub directive that supplies inputs
+1/3/4 and prompts for R).
+- §9 slice table: **10C done**. **10D** (DR capstone) is the last piece of slice 10.
 ### Slice-10B implemented — operator-signed destructive completion (offline key + signing CLI) (2026-06-10)
 - §4: the **operator-signed path is LIVE**. gate → pending op (the agent surfaces the bound intent:
 op + target + params on **durable** ids) → the operator **signs OFFLINE** (`cmd/felhom-opsign`,