doc(03-host-agent): slice-7 bring-up front half + golden host-key unit implemented

§9: the provision front half, guest-loss DR front half, and golden recipe are now
implemented (agent v0.8.0, internal/reconcile/bringup.go; configs/build-golden.sh).
Identity reset settled + implemented: provision resets MAC (unconditional, F1) +
hostname host-side; machine-id + SSH host keys regenerate guest-side (systemd + the
baked first-boot felhom-regen-hostkeys unit, F3) — agent stays host-side-only. Slice
mapping table statuses updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 21:37:54 +02:00
parent 33429933af
commit 15c4728e2c
+17 -10
@@ -259,10 +259,15 @@ the *archive* and the *back half*, **and in identity policy** (below).
**Identity reset is scenario-specific — this is a correctness boundary, not a detail.** "Reset
identity" is shorthand for two different operations:
-- **Provision (golden base) → fresh identity, everything.** A provisioned guest is new: regenerate
-MAC, hostname, **`/etc/machine-id`** (a duplicate breaks journald/DHCP/systemd), **SSH host
-keys**, and it receives a **fresh** controller identity (host-id, local token, hub channel),
-**fresh restic repo identity**, and a fresh tunnel association — all minted in the back half.
+- **Provision (golden base) → fresh identity, everything.** A provisioned guest is new: reset
+MAC + hostname **host-side via the token config** (the agent does NOT touch guest internals),
+while **`/etc/machine-id`** (a duplicate breaks journald/DHCP/systemd) and **SSH host keys**
+regenerate **guest-side on first boot** — machine-id by systemd for free, host keys by a baked,
+Condition-gated `felhom-regen-hostkeys.service` unit in the golden (the F3 decision: Debian does
+NOT auto-regenerate host keys after a restore, so the golden carries the regeneration, keeping
+the agent host-side-only). It then receives a **fresh** controller identity (host-id, local
+token, hub channel), **fresh restic repo identity**, and a fresh tunnel association — all minted
+in the back half (slice 8).
- **Guest-loss DR (customer backup) → preserve continuity identity, reset only what would
collide.** The restored guest must *continue* the customer's world: **keep** the restic repo
identity (resetting it orphans the existing backup chain — a silent data-continuity bug), the
@@ -272,9 +277,11 @@ identity" is shorthand for two different operations:
exactly this reason); in a true total guest-loss the original is gone, so the MAC can be kept to
preserve DHCP reservations. The agent decides MAC handling from the scenario, not a fixed rule.
-The exact reset set is being pinned empirically by the slice-7 bring-up spike (live, link-up,
-which the slice-6 restore-test never did — it boots link-down precisely because identity reset is
-slice 7).
+The exact reset set was pinned empirically by the slice-7 bring-up spike (live, link-up; see
+`documentation/tests/slice7-bringup-spike-findings.md`, commit `3342993`) and **implemented in the
+unified bring-up reconcile job** (agent v0.8.0, `internal/reconcile/bringup.go`): F1 — a restore
+preserves the archived MAC, so provision reset is unconditional (`PUT net0` with `hwaddr` omitted);
+F3 — host keys via the baked golden unit, not an agent guest-internal op.
**Guest loss (slice 7).** Agent restores G from the fastest surviving tier (snapshot → local →
PBS) and applies the **DR identity policy** above so the restored guest rejoins cleanly. The
@@ -285,9 +292,9 @@ this path — bring up + reattach external storage and it is whole. This is full
| Capability | Slice | Status |
|---|---|---|
-| Golden base image build (root@pam, at enrollment) | **7** | spike → build |
-| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | spike → spec → implement |
-| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | in scope |
+| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit); golden archived at enrollment |
+| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | **implemented** (agent v0.8.0, `internal/reconcile/bringup.go`) |
+| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | **implemented** (v0.8.0, `dr_guest_loss` mode — continuity identity preserved) |
| PBS recovery-code escrow **creation** (§8a) | **7** | designed (§8a); implement |
| Provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8** | deferred — needs the controller-deploy path + agent↔controller local API (§6) |
| **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / PBS namespace / tunnel token / storage manifest / restore directive | **10** | deferred — needs hub desired-state serving; hub store today holds only `{host_id, customer_id, api_key}` (slice 3) |
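The scenario-specific identity policy in the first two hunks boils down to a decision made from the bring-up mode rather than a fixed rule. The following is a minimal Go sketch of that decision, not the agent's actual `bringup.go` code. `dr_guest_loss` is the mode name from the table; `provision`, the `IdentityPlan` fields, and the `originalMayBeLive` input are hypothetical. DR hostname handling falls in the elided part of the hunk, so leaving it unchanged for DR is an assumption. machine-id and SSH host keys are intentionally absent because they regenerate guest-side.

```go
package main

import "fmt"

// Mode is the bring-up scenario. "dr_guest_loss" matches the slice table;
// "provision" is an assumed name for the golden-base path.
type Mode string

const (
	ModeProvision   Mode = "provision"
	ModeDRGuestLoss Mode = "dr_guest_loss"
)

// IdentityPlan lists host-side identity actions only. machine-id and SSH host
// keys are absent on purpose: they regenerate guest-side on first boot.
type IdentityPlan struct {
	ResetMAC      bool // drop the archived MAC so a new one is generated
	ResetHostname bool // set a fresh hostname host-side
	KeepResticID  bool // preserve the customer's restic repo identity
}

// planIdentity derives the plan from the scenario. originalMayBeLive is a
// hypothetical input: true when the source guest could still be on the
// network, so reusing its MAC would collide.
func planIdentity(mode Mode, originalMayBeLive bool) (IdentityPlan, error) {
	switch mode {
	case ModeProvision:
		// F1: a restore preserves the archived MAC, so provision always resets it.
		// A fresh restic identity is minted later, in the back half.
		return IdentityPlan{ResetMAC: true, ResetHostname: true}, nil
	case ModeDRGuestLoss:
		// Continuity first: keep the restic identity (resetting it orphans the
		// backup chain); reset the MAC only if it could collide. Hostname is
		// left unchanged here as an assumption.
		return IdentityPlan{ResetMAC: originalMayBeLive, KeepResticID: true}, nil
	default:
		return IdentityPlan{}, fmt.Errorf("unknown bring-up mode %q", mode)
	}
}

func main() {
	p, _ := planIdentity(ModeProvision, false)
	fmt.Println(p.ResetMAC, p.ResetHostname, p.KeepResticID)
	d, _ := planIdentity(ModeDRGuestLoss, false)
	fmt.Println(d.ResetMAC, d.ResetHostname, d.KeepResticID)
}
```

In a true total guest-loss (`originalMayBeLive == false`) the DR plan keeps the MAC, which preserves DHCP reservations, as the hunk describes.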
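F1 is described as a `PUT net0` with `hwaddr` omitted, which lets the hypervisor assign a fresh MAC. A sketch of only the string side of that change follows. It assumes `net0` is the comma-separated `key=value` list Proxmox uses for container network devices. The helper name `withoutHWAddr` is hypothetical, and the actual API request is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// withoutHWAddr returns the net0 value with any hwaddr=... entry removed,
// keeping the remaining comma-separated key=value pairs in their original order.
func withoutHWAddr(net0 string) string {
	parts := strings.Split(net0, ",")
	kept := parts[:0] // filter in place; writes never overtake reads
	for _, p := range parts {
		key, _, _ := strings.Cut(p, "=")
		if strings.EqualFold(strings.TrimSpace(key), "hwaddr") {
			continue // drop the archived MAC so a new one is generated
		}
		kept = append(kept, p)
	}
	return strings.Join(kept, ",")
}

func main() {
	in := "name=eth0,bridge=vmbr0,hwaddr=BC:24:11:00:00:01,ip=dhcp"
	fmt.Println(withoutHWAddr(in)) // name=eth0,bridge=vmbr0,ip=dhcp
}
```

Rewriting the value instead of building it from scratch keeps bridge, IP, and any other settings carried over from the archive.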