doc 03: slice 8A implemented — §6a local-API impl, §9 back-half row, §13 (2026-06-10)
§6a (new): the local-API implementation — stable leaf-SHA-256 pin, token->guest self-scoping (cross-guest 403), bootstrap.json contract + controller ingestion (c), baked-controller deploy (no registry cred in guest), firewall narrowing. §9 slice table: back-half = slice 8A implemented (8B quiesce / 8C de-priv split out); build-golden.sh bakes the controller. §13 + doc changelog.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -124,6 +124,34 @@ A controller can only `POST /rollback` (or snapshot/backup) **its own** guest
token → guest and authorizes per guest, so a compromised controller's blast radius is
**self-scoped and bounded** to its own guest.

### 6a. Implementation (slice 8A — implemented)

**Status: implemented** (agent v0.10.0 `internal/localapi`; controller v0.35.0 `internal/bootstrap` +
`internal/agentapi`). Grounded by `documentation/tests/slice8a-channel-deploy-spike-findings.md`
(commit `4a81a96`). The 7 endpoints above are live; `GET /backup/due` is **thin** in 8A (the
quiesce-on-due consumer is 8B), the rest wrap the existing slice-5/6/7 machinery.

- **Transport / pin.** The agent serves a **persisted self-signed leaf** bound to the host bridge IP
on a fixed port (default `:8443`). The controller pins the **leaf-cert SHA-256** (decision:
consistency with the agent's Proxmox/PBS cert pinning), carried in its bootstrap. The leaf is
generated **once and persisted**, so its fingerprint is stable across agent restarts (a fresh cert
each boot would invalidate every already-issued bootstrap pin). Defense-in-depth: the listener
binds the **bridge IP** (not `0.0.0.0`) and a host firewall rule narrows the port to the guest
bridge subnet (`configs/felhom-localapi-firewall.example`) — the **per-guest token stays the gate**.
- **Token custody.** The per-guest token is minted by the back-half (§9), persisted as a **SHA-256
hash** only (the plaintext exists transiently at mint→write-to-mount, then is discarded), in a
durable last-write-wins map. **Self-scoping** is enforced by the token→guest map alone: the VMID is
resolved from the token, never from a caller-supplied id; an explicit `vmid` that disagrees is
refused (**403**) and the Proxmox op is never issued for the other guest. Absent/unknown token → 401.
- **The bootstrap contract `(c)`.** The agent emits a stable `bootstrap.json`
(`schema: felhom.bootstrap/v1`: customer identity, hub, and the local-API `{endpoint, fingerprint,
token}`) into a read-only config mount; the controller **ingests it on first run and seeds its own
`controller.yaml`, skipping setup mode** (idempotent — never clobbers an existing config; fail-safe
— a malformed/absent bootstrap stays in setup). The agent emits the contract; the controller owns
the translation — they stay decoupled (no shared config schema). **No registry credential ever
enters a guest**: the controller image is **baked into the golden** (§9), so deploy does no
`docker login`/`pull`.
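
The self-scoping rule in the Token custody bullet reduces to a small decision function. The sketch below assumes an in-memory map and uses `0` to mean "no explicit `vmid` supplied"; both are simplifications for illustration, and `tokenMap`, `mint`, and `authorize` are hypothetical names rather than the `internal/localapi` API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"net/http"
)

// tokenMap maps SHA-256(token) to the owning guest's VMID. Only the hash is
// stored; the plaintext token is never kept.
type tokenMap map[[sha256.Size]byte]int

// mint records a token for a guest. Writing the same key again overwrites the
// entry, so the last write wins.
func (m tokenMap) mint(plaintext string, vmid int) {
	m[sha256.Sum256([]byte(plaintext))] = vmid
}

// authorize resolves the target VMID from the token alone. explicitVMID is the
// caller-supplied id (0 = none, an assumption of this sketch) and is only ever
// compared against the resolved owner, never used as the target.
func (m tokenMap) authorize(token string, explicitVMID int) (vmid, status int) {
	if token == "" {
		return 0, http.StatusUnauthorized // absent token
	}
	owner, ok := m[sha256.Sum256([]byte(token))]
	if !ok {
		return 0, http.StatusUnauthorized // unknown token
	}
	if explicitVMID != 0 && explicitVMID != owner {
		return 0, http.StatusForbidden // cross-guest request: refuse before any Proxmox call
	}
	return owner, http.StatusOK
}

func main() {
	m := tokenMap{}
	m.mint("token-for-101", 101)
	m.mint("token-for-102", 102)

	fmt.Println(m.authorize("token-for-101", 0))   // 101 200
	fmt.Println(m.authorize("token-for-101", 102)) // 0 403
	fmt.Println(m.authorize("unknown", 0))         // 0 401
}
```

Since the returned VMID always comes from the map, a handler that only ever acts on that value cannot be steered to another guest by request parameters.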
## 7. Storage manifest & reconciliation

The manifest is the load-bearing contract. It absorbs the **persisted** disk-state fields that
@@ -307,7 +335,7 @@ identity" is shorthand for two different operations:
NOT auto-regenerate host keys after a restore, so the golden carries the regeneration, keeping
the agent host-side-only). It then receives a **fresh** controller identity (host-id, local
token, hub channel), **fresh restic repo identity**, and a fresh tunnel association — all minted
in the back half (slice 8).
in the back half (slice 8A — implemented).
- **Guest-loss DR (customer backup) → preserve continuity identity, reset only what would
collide.** The restored guest must *continue* the customer's world: **keep** the restic repo
identity (resetting it orphans the existing backup chain — a silent data-continuity bug), the
@@ -332,11 +360,13 @@ this path — bring up + reattach external storage and it is whole. This is full

| Capability | Slice | Status |
|---|---|---|
| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit); golden archived at enrollment |
| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit; **now also bakes the controller image + a controller-bootstrap unit**, slice 8A); golden archived at enrollment |
| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | **implemented** (agent v0.8.0, `internal/reconcile/bringup.go`) |
| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | **implemented** (v0.8.0, `dr_guest_loss` mode — continuity identity preserved) |
| PBS recovery-code escrow **creation** + **hub opaque storage** (§8a) | **7** | **implemented** (agent v0.9.0 `internal/escrow`; hub v0.8.0 `PUT /hosts/{id}/escrow`) |
| Provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8** | deferred — needs the controller-deploy path + agent↔controller local API (§6) |
| **Local API** server (§6) + provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8A** | **implemented** (agent v0.10.0 `internal/localapi` + `internal/provision`; controller v0.35.0 `internal/bootstrap` + `internal/agentapi`). The controller image is **baked into the golden** (no registry cred in any guest); the back-half mints the token, writes a 0600 `bootstrap.json` to a `chown 100000:100000` config mount, and `pct set`-attaches it read-only; the golden's baked unit deploys the controller, which ingests the bootstrap, comes up configured, and reaches the agent over the bridge (leaf-pin + token). Validated live end-to-end on the demo. |
| **Quiesced app-consistent backup** (`/backup/due`-driven stack-stop) | **8B** | deferred — `/backup/due` is thin in 8A; the controller quiesce-then-`POST /backup` loop is 8B |
| **Controller de-privileging** (retire the disk-execution subsystem; new customer disk endpoints behind the slice-4 data-bearing classifier) | **8C** | deferred |
| **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / PBS namespace / tunnel token / storage manifest / restore directive | **10** | deferred — needs hub desired-state serving; hub store today holds only `{host_id, customer_id, api_key}` (slice 3) |
| PBS escrow **consumption** (recover `K` on a new box) | **10** | deferred — exercised by host-loss DR |
| Golden base refresh cadence + fleet versioning | post-launch | operational, non-blocking (§13) |
@@ -386,10 +416,13 @@ argument for §3's root-minimization and a small, auditable agent.

Resolved here: tunnel placement (host, agent-managed, own systemd service), the
reconcile-vs-jobs fork (hybrid, gated by reversibility), agent process model, self-update
ownership, the local-API surface, the storage-manifest schema, **provision-by-restore**, the
**provision/DR slice boundary** (7 front-half + guest-loss DR + escrow creation; 8 provisioning
back-half; 10 host-loss DR + escrow consumption — §9 table), the **PBS recovery-code escrow
design** (§8a), and the **root-vs-API boundary** (Phase 3, B3).
ownership, the local-API surface (**implemented, slice 8A — §6a**), the storage-manifest schema,
**provision-by-restore**, the **provision/DR slice boundary** (7 front-half + guest-loss DR +
escrow creation; **8A provisioning back-half + local API — implemented**; 8B quiesced backup; 8C
controller de-privileging; 10 host-loss DR + escrow consumption — §9 table), the **PBS
recovery-code escrow design** (§8a), and the **root-vs-API boundary** (Phase 3, B3 — the slice-8A
back-half's host-side `chown`/`pct set` bind-mount is a deliberate, narrow addition OUTSIDE the
API token, in `internal/provision`, not the 3-exception `proxmox.Privileged` fence).

Still open:
@@ -413,6 +446,19 @@ This doc hands the implementation three contracts it was waiting on:

## Changelog — design-review + Phase-3 fold-in (2026-06-08)

### Slice-8A implemented: local API + provisioning back-half (2026-06-10)
- NEW §6a: the **local-API implementation** (agent v0.10.0 `internal/localapi`; controller v0.35.0
`internal/bootstrap` + `internal/agentapi`) — persisted self-signed leaf with a **stable
leaf-SHA-256 pin**, the **token→guest self-scoping** (explicit cross-guest id → 403, op never
issued), the stable **`bootstrap.json` contract + controller ingestion `(c)`** (seed
`controller.yaml`, skip setup; idempotent + fail-safe), and the **baked-controller deploy** (no
registry credential in any guest). Firewall narrowing = defense-in-depth; the token stays the gate.
- §9: the provisioning **back half** row is now **slice 8A — implemented** (split from the old "8");
`build-golden.sh` now **bakes the controller + a bootstrap unit**; quiesced backup → 8B, controller
de-privileging → 8C. The host-side `chown`/`pct set` bind-mount is a deliberate narrow surface in
`internal/provision` (NOT the 3-exception `proxmox.Privileged` fence). Validated live end-to-end.
- §13 updated accordingly.

### Slice-7 scope + escrow design (2026-06-09)
- §9 rewritten: the bring-up primitive is a **shared front half only** — identity-reset policy is
**scenario-specific** (provision = fresh everything; guest-loss DR = preserve restic/tunnel/hub