slice 8A spike: agent<->controller channel + controller deploy plumbing findings
Doc-only spike (no hub code change). Validated on demo-felhom (guest 8200, torn down): (1) guest->host HTTPS over vmbr0 with fingerprint-pin + bearer + self-scoping (200/401/403, wrong-pin TLS fail, no firewall rule needed); (2) config-mount + golden-baked bootstrap unit deploys+runs the controller (docker login/pull/run v0.34.0) with no pct exec. Verdict: GO to 8A spec. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -4,51 +4,49 @@
---

# REPORT — Slice 7 close-out: PBS escrow — hub opaque storage + doc 03 §8a (v0.8.0) (2026-06-10)
# REPORT — Slice 8 Phase A spike: agent↔controller channel + controller deploy plumbing (2026-06-10)

## Outcome
## Type

The `felhom.eu` half of `TASK — Slice 7 close-out: PBS recovery-code escrow`. The agent
(felhom-agent v0.9.0) creates an **opaque** `R`-wrapped copy of the PBS key in the zero-knowledge
default; this slice adds the **hub opaque storage** for that blob and rewrites **doc 03 §8a** into a
full key-custody posture model. The wrap→recover→restore round-trip was proven on a throwaway first
(`documentation/tests/slice7-escrow-spike-findings.md`).
**SPIKE** (CC-executed on the demo). Doc-only deliverable — **no hub/code change, no version bump,
no deploy**. Probes the two unvalidated foundations of slice 8 *before* speccing the local API
(doc §6) and the provisioning back-half. Findings:
[documentation/tests/slice8a-channel-deploy-spike-findings.md](documentation/tests/slice8a-channel-deploy-spike-findings.md).

## What landed (hub v0.8.0)
## What was proven on `demo-felhom`

- **`PUT /api/v1/hosts/{host_id}/escrow`** (`internal/api/handler.go`) — per-host-key authed (a host
  writes only its own escrow; global operator key also accepted). Decodes the base64 blob and stores
  the **opaque bytes verbatim** against the host. The hub **never decrypts** — there is no decrypt
  path; it has no recovery code. Rotation is last-write-wins.
- **`host_escrow`** table + `SaveHostEscrow`/`GetHostEscrow` (`internal/store`). Blob is ciphertext.
- **Contract:** `escrowUploadRequest` mirrors the agent's emit struct (`blob_b64`, `key_fingerprint`,
  `posture`, `created_at`); a key-set test in each repo guards drift.
- **Tests:** stores the blob byte-identical; rotation last-write-wins; 401 (absent/wrong key), 403
  (host writing another host's escrow), 400 (bad base64); contract key-set. `go test ./...` green.
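
The status matrix in the **Tests** bullet can be sketched as a single decision function. This is a minimal illustration, not the code in `internal/api/handler.go`: the `escrowStatus` name, the in-memory `hostKeys` map, the `Bearer` header form, standard (padded) base64, and `200` as the success status are all assumptions made for the example.

```go
package main

import (
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"net/http"
	"strings"
)

// keyEqual compares two secrets in constant time; an empty token never matches.
func keyEqual(token, key string) bool {
	return token != "" && subtle.ConstantTimeCompare([]byte(token), []byte(key)) == 1
}

// escrowStatus returns the HTTP status for a PUT to hostID's escrow.
// hostKeys (host ID -> per-host key) is a hypothetical stand-in for the store.
func escrowStatus(hostKeys map[string]string, operatorKey, authHeader, hostID, blobB64 string) int {
	const prefix = "Bearer " // assumed header form
	if !strings.HasPrefix(authHeader, prefix) {
		return http.StatusUnauthorized // absent key
	}
	token := strings.TrimPrefix(authHeader, prefix)
	if !keyEqual(token, operatorKey) {
		owner, found := "", false
		for id, key := range hostKeys {
			if keyEqual(token, key) {
				owner, found = id, true
			}
		}
		if !found {
			return http.StatusUnauthorized // wrong key
		}
		if owner != hostID {
			return http.StatusForbidden // valid key, but it belongs to another host
		}
	}
	// The blob is only decoded, never decrypted: the bytes stay opaque.
	if _, err := base64.StdEncoding.DecodeString(blobB64); err != nil {
		return http.StatusBadRequest // bad base64
	}
	return http.StatusOK // assumed success status
}

func main() {
	keys := map[string]string{"host-a": "key-a", "host-b": "key-b"}
	fmt.Println(escrowStatus(keys, "op-key", "Bearer key-a", "host-a", "aGVsbG8="))
	fmt.Println(escrowStatus(keys, "op-key", "Bearer key-b", "host-a", "aGVsbG8="))
}
```

Authorization is checked before the body is decoded, so a caller without a valid key learns nothing about whether its payload was well-formed.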
Spike guest **8200** was produced by the **real slice-7 bring-up job** (`felhom-agent v0.9.0`,
`-mode provision`) from the golden archive — a golden, link-up, Docker-29.5.3 guest in 8s, fresh MAC.
Torn down at the end; demo left as found (only pre-existing 9001/9999 remain; golden archive intact).

## Documentation (doc 03 §8a)
### 1. The channel (guest → host HTTPS over `vmbr0`, fingerprint-pinned) — **PASS**
A throwaway self-signed HTTPS stub on `192.168.0.162:8443`, hit from **inside guest 8200**:
- correct pin + guest-8200 token → **200**; no token → **401**; **other-guest** token → **403**
  (self-scoping holds); **wrong pin → hard TLS failure** (curl exit 90 — the pin gates the handshake).
- **No firewall rule needed** (PVE firewall off; guest and host share the `vmbr0` /24, direct route).
- Security note: the local-API binds the host **LAN IP** → reachable by anything on the LAN; **auth
  is the only gate** (it held). Both pin forms captured (SPKI + leaf-cert SHA-256) for the 8A choice.

Rewrote §8a into the **key-custody posture model**: the **separation principle** (reading data needs
both chunks *and* a key; zero-knowledge holds while Felhom never holds both), the **topology matrix**
(data location × key custody → who can read; the one dangerous cell flagged), the **default**
(Felhom storage + customer-only key; `R` printed durably), the **anti-lockout ladder** ((b) wrapped
offline copy → (a) raw paperkey → Felhom-holds-a-key), **SSH-for-support is a separate grant** (not
coupled to key custody), **why zero-knowledge stays default** (breach + legal compellability), and
the **integrity caveat** for self-hosted-data postures. Corrected the storage-slice note: hub opaque
storage is **slice 7** (this task); only restore-mode **serving** is slice 10. §9 slice table + §13
updated.
### 2. The deploy plumbing (no `pct exec` — config mount + golden-baked unit) — **PASS**
The F3 principle end-to-end: agent stays host-side, populates a **read-only config mount**
(`/etc/felhom-bootstrap`, bind-mount hotplugged live); a **golden-baked oneshot** reads it →
`docker login` (token via `--password-stdin`) → `docker pull …/felhom-controller:v0.34.0` →
`docker run`. The controller came up **Up (healthy)**; an in-guest process read the bootstrap token
from the mount and reached the host `/storage` → **200**. **No `pct exec` used.**

## Live validation
## Gotchas carried into 8A / the back-half
1. **Unprivileged-LXC uid mapping** — the agent must `chown 100000:100000` files it writes into the
   mount (else the guest reads them as `nobody`; the secret config is inaccessible).
2. **Registry-cred scope** — the bootstrap currently carries the shared `admin` pull token; production
   wants a narrow, read-only, ideally per-guest/short-lived registry token (mount is the right channel).
3. **Controller config contract** — `bootstrap.json` ≠ the controller's `controller.yaml`; the
   controller boots to *setup mode* until 8A emits the real config format/path (or the unit translates).
4. **Pin form** (SPKI vs leaf-cert SHA-256) and **LAN exposure** narrowing — 8A/back-half decisions.
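
Gotcha 1 comes down to simple arithmetic: in an unprivileged container, guest ID `n` appears on the host as `base + n`. The sketch below assumes the common default mapping (base `100000`, range `65536`), which is what makes guest root equal host `100000`; `hostID` and `writeForGuest` are hypothetical helper names, and a host with a custom idmap would need different constants.

```go
package main

import (
	"fmt"
	"os"
)

const (
	idmapBase  = 100000 // assumed first host ID of the container's mapped range
	idmapRange = 65536  // assumed number of mapped IDs
)

// hostID translates a guest uid/gid into the host-side ID the agent must chown to.
func hostID(guestID int) (int, error) {
	if guestID < 0 || guestID >= idmapRange {
		return 0, fmt.Errorf("guest id %d is outside the mapped range", guestID)
	}
	return idmapBase + guestID, nil
}

// writeForGuest writes a secret file and hands ownership to the mapped guest
// user, so the guest sees it as that user's file instead of nobody's.
// The chown needs sufficient privileges on the host.
func writeForGuest(path string, data []byte, guestUID, guestGID int) error {
	uid, err := hostID(guestUID)
	if err != nil {
		return err
	}
	gid, err := hostID(guestGID)
	if err != nil {
		return err
	}
	if err := os.WriteFile(path, data, 0o600); err != nil {
		return err
	}
	return os.Chown(path, uid, gid)
}

func main() {
	id, err := hostID(0) // guest root
	fmt.Println(id, err)
	_ = writeForGuest // not called here: the chown requires host privileges
}
```

Without the chown, the file stays owned by a host ID that has no mapping inside the guest, which is why it shows up there as `nobody` and a `0600` secret becomes unreadable.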

After the v0.8.0 deploy, the demo agent's `--selftest=escrow-create -upload` PUT the opaque blob and
the hub stored it against the host; the stored bytes are **ciphertext** (not the key). The recovery
code `R` is never sent to or stored by the hub. *(No `R`/`K` value appears in any committed file.)*
## Verdict — **GO** to spec 8A (local-API server + the 7 §6 endpoints) and the provisioning back-half.

## Deferred / security

Restore-mode serving + consumption → slice 10. The hub holds ciphertext only — possessing the blob
does not let Felhom read customer data (separation principle). No secrets committed.

## Deploy (GitOps)

Build+push `felhom-hub:v0.8.0` → bump `manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app.
## Secret handling (held)
Test local-API tokens + the registry pull cred kept in `0600` host files, referenced by location,
never logged/committed; the stub never logged the `Authorization` header; `docker login` via
`--password-stdin`. No real per-guest token or registry cred in git. All scratch shredded on teardown.
No throwaway registry token was minted (the existing `gitea-creds` admin cred was used by reference).