doc 03: slice 8A implemented — §6a local-API impl, §9 back-half row, §13 (2026-06-10)

§6a (new): the local-API implementation — stable leaf-SHA-256 pin, token->guest
self-scoping (cross-guest 403), bootstrap.json contract + controller ingestion
(c), baked-controller deploy (no registry cred in guest), firewall narrowing.
§9 slice table: back-half = slice 8A implemented (8B quiesce / 8C de-priv split
out); build-golden.sh bakes the controller. §13 + doc changelog.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 10:02:11 +02:00
parent 4a81a96678
commit e436b61368
+53 -7
@@ -124,6 +124,34 @@ A controller can only `POST /rollback` (or snapshot/backup) **its own** guest
token → guest and authorizes per guest, so a compromised controller's blast radius is
**self-scoped and bounded** to its own guest.
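A minimal Go sketch of that per-guest authorization. The `tokenStore` type, the function names, and the convention that a claimed VMID of `0` means "none supplied" are illustrative assumptions, not the agent's actual code. The guest is always resolved from the token; a caller-supplied id can only trigger a refusal, never widen access.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
)

// tokenStore maps hex(SHA-256(token)) to the VMID the token was minted for.
// Hypothetical type: only the hash is stored, never the plaintext token.
type tokenStore map[string]int

func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// authorize resolves the guest from the token alone. claimedVMID is an
// optional caller-supplied id (0 means "not supplied"); a value that
// disagrees with the token's guest is refused.
func (s tokenStore) authorize(token string, claimedVMID int) (vmid, status int) {
	owner, ok := s[hashToken(token)]
	if token == "" || !ok {
		return 0, http.StatusUnauthorized // absent or unknown token
	}
	if claimedVMID != 0 && claimedVMID != owner {
		return 0, http.StatusForbidden // cross-guest attempt: no op is issued
	}
	return owner, http.StatusOK
}

func main() {
	store := tokenStore{hashToken("token-for-101"): 101}
	for _, claimed := range []int{0, 101, 102} {
		vmid, status := store.authorize("token-for-101", claimed)
		fmt.Printf("claimed=%d -> vmid=%d status=%d\n", claimed, vmid, status)
	}
}
```

Because the handler only ever acts on the VMID returned by `authorize`, a compromised controller holding a valid token for its own guest has no input that names another guest.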
### 6a. Implementation (slice 8A — implemented)
**Status: implemented** (agent v0.10.0 `internal/localapi`; controller v0.35.0 `internal/bootstrap` +
`internal/agentapi`). Grounded by `documentation/tests/slice8a-channel-deploy-spike-findings.md`
(commit `4a81a96`). The 7 endpoints above are live; `GET /backup/due` is **thin** in 8A (the
quiesce-on-due consumer is 8B), the rest wrap the existing slice-5/6/7 machinery.
- **Transport / pin.** The agent serves a **persisted self-signed leaf** bound to the host bridge IP
on a fixed port (default `:8443`). The controller pins the **leaf-cert SHA-256**, carried in its
bootstrap (chosen for consistency with the agent's Proxmox/PBS cert pinning). The leaf is
generated **once and persisted**, so its fingerprint is stable across agent restarts (a fresh cert
each boot would invalidate every already-issued bootstrap pin). Defense-in-depth: the listener
binds the **bridge IP** (not `0.0.0.0`) and a host firewall rule narrows the port to the guest
bridge subnet (`configs/felhom-localapi-firewall.example`) — the **per-guest token stays the gate**.
- **Token custody.** The per-guest token is minted by the back-half (§9), persisted as a **SHA-256
hash** only (the plaintext exists transiently at mint→write-to-mount, then is discarded), in a
durable last-write-wins map. **Self-scoping** is enforced by the token→guest map alone: the VMID is
resolved from the token, never from a caller-supplied id; an explicit `vmid` that disagrees is
refused (**403**) and the Proxmox op is never issued for the other guest. Absent/unknown token → 401.
- **The bootstrap contract `(c)`.** The agent emits a stable `bootstrap.json`
(`schema: felhom.bootstrap/v1`: customer identity, hub, and the local-API `{endpoint, fingerprint,
token}`) into a read-only config mount; the controller **ingests it on first run and seeds its own
`controller.yaml`, skipping setup mode** (idempotent: it never clobbers an existing config; fail-safe:
with a malformed or absent bootstrap the controller stays in setup mode). The agent emits the contract; the controller owns
the translation — they stay decoupled (no shared config schema). **No registry credential ever
enters a guest**: the controller image is **baked into the golden** (§9), so deploy does no
`docker login`/`pull`.
## 7. Storage manifest & reconciliation
The manifest is the load-bearing contract. It absorbs the **persisted** disk-state fields that
@@ -307,7 +335,7 @@ identity" is shorthand for two different operations:
NOT auto-regenerate host keys after a restore, so the golden carries the regeneration, keeping
the agent host-side-only). It then receives a **fresh** controller identity (host-id, local
token, hub channel), **fresh restic repo identity**, and a fresh tunnel association — all minted
in the back half (slice 8).
in the back half (slice 8A — implemented).
- **Guest-loss DR (customer backup) → preserve continuity identity, reset only what would
collide.** The restored guest must *continue* the customer's world: **keep** the restic repo
identity (resetting it orphans the existing backup chain — a silent data-continuity bug), the
@@ -332,11 +360,13 @@ this path — bring up + reattach external storage and it is whole. This is full
| Capability | Slice | Status |
|---|---|---|
| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit); golden archived at enrollment |
| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit; **now also bakes the controller image + a controller-bootstrap unit**, slice 8A); golden archived at enrollment |
| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | **implemented** (agent v0.8.0, `internal/reconcile/bringup.go`) |
| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | **implemented** (v0.8.0, `dr_guest_loss` mode — continuity identity preserved) |
| PBS recovery-code escrow **creation** + **hub opaque storage** (§8a) | **7** | **implemented** (agent v0.9.0 `internal/escrow`; hub v0.8.0 `PUT /hosts/{id}/escrow`) |
| Provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8** | deferred — needs the controller-deploy path + agent↔controller local API (§6) |
| **Local API** server (§6) + provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8A** | **implemented** (agent v0.10.0 `internal/localapi` + `internal/provision`; controller v0.35.0 `internal/bootstrap` + `internal/agentapi`). The controller image is **baked into the golden** (no registry cred in any guest); the back-half mints the token, writes a 0600 `bootstrap.json` to a `chown 100000:100000` config mount, and `pct set`-attaches it read-only; the golden's baked unit deploys the controller, which ingests the bootstrap, comes up configured, and reaches the agent over the bridge (leaf-pin + token). Validated live end-to-end on the demo. |
| **Quiesced app-consistent backup** (`/backup/due`-driven stack-stop) | **8B** | deferred — `/backup/due` is thin in 8A; the controller quiesce-then-`POST /backup` loop is 8B |
| **Controller de-privileging** (retire the disk-execution subsystem; new customer disk endpoints behind the slice-4 data-bearing classifier) | **8C** | deferred |
| **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / PBS namespace / tunnel token / storage manifest / restore directive | **10** | deferred — needs hub desired-state serving; hub store today holds only `{host_id, customer_id, api_key}` (slice 3) |
| PBS escrow **consumption** (recover `K` on a new box) | **10** | deferred — exercised by host-loss DR |
| Golden base refresh cadence + fleet versioning | post-launch | operational, non-blocking (§13) |
@@ -386,10 +416,13 @@ argument for §3's root-minimization and a small, auditable agent.
Resolved here: tunnel placement (host, agent-managed, own systemd service), the
reconcile-vs-jobs fork (hybrid, gated by reversibility), agent process model, self-update
ownership, the local-API surface, the storage-manifest schema, **provision-by-restore**, the
**provision/DR slice boundary** (7 front-half + guest-loss DR + escrow creation; 8 provisioning
back-half; 10 host-loss DR + escrow consumption — §9 table), the **PBS recovery-code escrow
design** (§8a), and the **root-vs-API boundary** (Phase 3, B3).
ownership, the local-API surface (**implemented, slice 8A — §6a**), the storage-manifest schema,
**provision-by-restore**, the **provision/DR slice boundary** (7 front-half + guest-loss DR +
escrow creation; **8A provisioning back-half + local API — implemented**; 8B quiesced backup; 8C
controller de-privileging; 10 host-loss DR + escrow consumption — §9 table), the **PBS
recovery-code escrow design** (§8a), and the **root-vs-API boundary** (Phase 3, B3 — the slice-8A
back-half's host-side `chown`/`pct set` bind-mount is a deliberate, narrow addition OUTSIDE the
API token; it lives in `internal/provision`, not in the 3-exception `proxmox.Privileged` fence).
Still open:
@@ -413,6 +446,19 @@ This doc hands the implementation three contracts it was waiting on:
## Changelog — design-review + Phase-3 fold-in (2026-06-08)
### Slice-8A implemented: local API + provisioning back-half (2026-06-10)
- NEW §6a: the **local-API implementation** (agent v0.10.0 `internal/localapi`; controller v0.35.0
`internal/bootstrap` + `internal/agentapi`) — persisted self-signed leaf with a **stable
leaf-SHA-256 pin**, the **token→guest self-scoping** (explicit cross-guest id → 403, op never
issued), the stable **`bootstrap.json` contract + controller ingestion `(c)`** (seed
`controller.yaml`, skip setup; idempotent + fail-safe), and the **baked-controller deploy** (no
registry credential in any guest). Firewall narrowing = defense-in-depth; the token stays the gate.
- §9: the provisioning **back half** row is now **slice 8A — implemented** (split from the old "8");
`build-golden.sh` now **bakes the controller + a bootstrap unit**; quiesced backup → 8B, controller
de-privileging → 8C. The host-side `chown`/`pct set` bind-mount is a deliberate narrow surface in
`internal/provision` (NOT the 3-exception `proxmox.Privileged` fence). Validated live end-to-end.
- §13 updated accordingly.
### Slice-7 scope + escrow design (2026-06-09)
- §9 rewritten: the bring-up primitive is a **shared front half only** — identity-reset policy is
**scenario-specific** (provision = fresh everything; guest-loss DR = preserve restic/tunnel/hub