diff --git a/documentation/architecture/03-host-agent.md b/documentation/architecture/03-host-agent.md
index a6ef56b..92cbc80 100644
--- a/documentation/architecture/03-host-agent.md
+++ b/documentation/architecture/03-host-agent.md
@@ -259,10 +259,15 @@ the *archive* and the *back half*, **and in identity policy** (below).
 
 **Identity reset is scenario-specific — this is a correctness boundary, not a detail.** "Reset
 identity" is shorthand for two different operations:
-- **Provision (golden base) → fresh identity, everything.** A provisioned guest is new: regenerate
-  MAC, hostname, **`/etc/machine-id`** (a duplicate breaks journald/DHCP/systemd), **SSH host
-  keys**, and it receives a **fresh** controller identity (host-id, local token, hub channel),
-  **fresh restic repo identity**, and a fresh tunnel association — all minted in the back half.
+- **Provision (golden base) → fresh identity, everything.** A provisioned guest is new: reset
+  MAC + hostname **host-side via the token config** (the agent does NOT touch guest internals),
+  while **`/etc/machine-id`** (a duplicate breaks journald/DHCP/systemd) and **SSH host keys**
+  regenerate **guest-side on first boot** — machine-id by systemd for free, host keys by a baked,
+  Condition-gated `felhom-regen-hostkeys.service` unit in the golden (the F3 decision: Debian does
+  NOT auto-regenerate host keys after a restore, so the golden carries the regeneration, keeping
+  the agent host-side-only). It then receives a **fresh** controller identity (host-id, local
+  token, hub channel), **fresh restic repo identity**, and a fresh tunnel association — all minted
+  in the back half (slice 8).
 - **Guest-loss DR (customer backup) → preserve continuity identity, reset only what would
   collide.** The restored guest must *continue* the customer's world: **keep** the restic repo
   identity (resetting it orphans the existing backup chain — a silent data-continuity bug), the
@@ -272,9 +277,11 @@ identity" is shorthand for two different operations:
   exactly this reason); in a true total guest-loss the original is gone, so the MAC can be kept to
   preserve DHCP reservations. The agent decides MAC handling from the scenario, not a fixed rule.
 
-The exact reset set is being pinned empirically by the slice-7 bring-up spike (live, link-up,
-which the slice-6 restore-test never did — it boots link-down precisely because identity reset is
-slice 7).
+The exact reset set was pinned empirically by the slice-7 bring-up spike (live, link-up —
+`documentation/tests/slice7-bringup-spike-findings.md`, commit `3342993`) and **implemented in the
+unified bring-up reconcile job** (agent v0.8.0, `internal/reconcile/bringup.go`): F1 — a restore
+preserves the archived MAC, so provision reset is unconditional (`PUT net0` with `hwaddr` omitted);
+F3 — host keys via the baked golden unit, not an agent guest-internal op.
 
 **Guest loss (slice 7).** Agent restores G from the fastest surviving tier (snapshot → local →
 PBS) and applies the **DR identity policy** above so the restored guest rejoins cleanly. The
@@ -285,9 +292,9 @@ this path — bring up + reattach external storage and it is whole. This is full
 
 | Capability | Slice | Status |
 |---|---|---|
-| Golden base image build (root@pam, at enrollment) | **7** | spike → build |
-| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | spike → spec → implement |
-| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | in scope |
+| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit); golden archived at enrollment |
+| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | **implemented** (agent v0.8.0, `internal/reconcile/bringup.go`) |
+| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | **implemented** (v0.8.0, `dr_guest_loss` mode — continuity identity preserved) |
 | PBS recovery-code escrow **creation** (§8a) | **7** | designed (§8a); implement |
 | Provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8** | deferred — needs the controller-deploy path + agent↔controller local API (§6) |
 | **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / PBS namespace / tunnel token / storage manifest / restore directive | **10** | deferred — needs hub desired-state serving; hub store today holds only `{host_id, customer_id, api_key}` (slice 3) |
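
Reviewer note: the scenario-specific identity policy in the hunks above can be read as a small decision function. The sketch below is illustrative only and is not the code in `internal/reconcile/bringup.go`. The names `Scenario`, `IdentityPlan`, `plan_identity`, and `original_still_present` are hypothetical. It models only the two decisions the visible text states (MAC and restic repo identity); the rule that a DR restore resets the MAC only when the original guest still exists is an assumption drawn from "reset only what would collide".

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    # Hypothetical labels; "dr_guest_loss" mirrors the mode name in the status table.
    PROVISION = "provision"
    DR_GUEST_LOSS = "dr_guest_loss"


@dataclass(frozen=True)
class IdentityPlan:
    reset_mac: bool                  # regenerate the guest NIC's MAC host-side
    fresh_restic_repo_identity: bool # mint a new repo identity (breaks the chain)


def plan_identity(scenario: Scenario, original_still_present: bool = False) -> IdentityPlan:
    """Decide identity handling from the scenario rather than a fixed rule."""
    if scenario is Scenario.PROVISION:
        # F1: a restore preserves the archived MAC, so a provisioned guest
        # always gets a new one, plus a fresh restic repo identity.
        return IdentityPlan(reset_mac=True, fresh_restic_repo_identity=True)

    # Guest-loss DR: keep the restic repo identity so the existing backup
    # chain is not orphaned. Assumption: the MAC is reset only if the
    # original guest is still around and would collide with the restore.
    return IdentityPlan(
        reset_mac=original_still_present,
        fresh_restic_repo_identity=False,
    )


if __name__ == "__main__":
    print(plan_identity(Scenario.PROVISION))
    print(plan_identity(Scenario.DR_GUEST_LOSS))
    print(plan_identity(Scenario.DR_GUEST_LOSS, original_still_present=True))
```

Keeping the decision in one pure function makes the correctness boundary testable without a live guest: a provision plan that keeps the archived MAC, or a DR plan that resets the restic repo identity, would show up as a failing assertion rather than a silent continuity bug.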