# Slice 7 Phase 0 — Golden base build + live bring-up (front half): Findings

**Host:** `demo-felhom` (192.168.0.162) — Proxmox VE 9.2.2, Debian 13 (Trixie). Bridge `vmbr0`, LAN DHCP (router at 192.168.0.1).

**Date:** 2026-06-09.

**Driver:** SPIKE-RUNBOOK (root@pam CLI for the golden build; the `FelhomAgent` API token for the per-customer front-half ops — restore/config/resize/start).

**VMIDs:** golden-build `9100`, restored-test `9101` (both torn down; golden archive kept).

> This document presents **data, observations, and the resulting design deliverables** (the
> identity-reset field list). It feeds the spec of the unified bring-up reconcile job (Phase 7.1).

---

## 1. Provenance / setup

| Component | Value |
|---|---|
| Template | `local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst` |
| Restore storage | `local-lvm` (lvmthin) |
| Archive storage | `local` (dir, `/var/lib/vz/dump`) |
| Token | `felhom@pve!agent` (the `FelhomAgent` 16-priv role; by reference) |
| Golden archive (KEPT) | `local:backup/vzdump-lxc-9100-2026_06_09-20_41_10.tar.zst` (298 MB) |
| openssh-server (in guest) | `1:10.0p1-7` |
| Docker storage driver | **`overlayfs`** (not `overlay2`/`vfs`) — consistent with phase0 |

Token API smoke (S0): `GET /version` → 200, `GET /nodes/demo-felhom/lxc` → 200. Token holds `VM.Allocate`, `Datastore.Allocate`/`AllocateSpace`, `VM.Config.{Disk,Network,Options,…}`, `VM.PowerMgmt`, `VM.Backup`, etc. (full set confirmed via `/access/permissions`).

## 2. Golden recipe (validated — build the real golden from this)

1. **Create (root@pam — the one root step; `keyctl=1` is root-only, phase3 #1):**
   ```
   pct create 9100 local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst \
     --hostname felhom-golden --unprivileged 1 --features nesting=1,keyctl=1 \
     --rootfs local-lvm:8 --cores 2 --memory 2048 \
     --net0 name=eth0,bridge=vmbr0,ip=dhcp --onboot 0
   ```
   (`pct create` auto-generates SSH host keys — these get wiped in step 3.)
2. **Docker (official apt repo, `trixie` channel):** `ca-certificates curl` → keyring → `docker-ce docker-ce-cli containerd.io`. Confirmed working in the build guest: `docker run --rm hello-world` → "Hello from Docker!", **storage driver `overlayfs`**.
3. **Identity-clean + minimize (guest-internal, run during build):**
   ```
   systemctl stop docker containerd
   apt-get clean; rm -rf /var/lib/apt/lists/*
   rm -f /etc/ssh/ssh_host_*  # SSH host keys
   truncate -s 0 /etc/machine-id  # systemd regenerates on first boot
   rm -f /var/lib/dbus/machine-id; ln -sf /etc/machine-id /var/lib/dbus/machine-id
   rm -rf /var/log/*; : > /root/.bash_history
   rm -f /etc/hostname  # set per-guest at provision
   ```
4. **Stop + archive (root vzdump is fine for the build):** `pct stop 9100; vzdump 9100 --storage local --mode stop --compress zstd`.
5. **Archive carries keyctl (verified, phase3 method — embedded `./etc/vzdump/pct.conf`):** `features: nesting=1,keyctl=1` · `unprivileged: 1`. **It also carries the build guest's baked MAC** `BC:24:11:63:43:F4` and `hostname: felhom-golden` — see §4.

## 3. Result matrix

| Property | As-restored (9101, stopped, pre-reset) | Front-half reset (token) | After link-up boot |
|---|---|---|---|
| keyctl / nesting / unpriv | **preserved** `nesting=1,keyctl=1,unprivileged:1` | — | **Docker runs** (`hello-world` OK) — keyctl *functional*, not just flag-present |
| **MAC** | **KEPT golden's** `BC:24:11:63:43:F4` | reset → fresh `BC:24:11:A6:C0:DE` (PUT net0, **omit hwaddr** → PVE regenerates) | DHCP lease `192.168.0.109`; MAC unique; no LAN collision |
| **hostname** | **KEPT golden's** `felhom-golden` (config field; `/etc/hostname` file absent) | reset → `felhom-spike-9101` (PUT hostname) | **propagated** inside (`hostname` = `felhom-spike-9101`) |
| **machine-id** | **empty** (baked `truncate`) | — | **auto-regenerated by systemd** → `faeffb0bc1b8403089cdd0b981cff109` (unique) |
| **SSH host keys** | **absent** (baked `rm`) | — | **NOT regenerated; `ssh.service` FAILED** — see Finding F3 |
| rootfs | 8 G | **resize → 10 G** (`PUT /resize disk=rootfs size=+2G`) | — |
| mp0 mount | n/a | attached `local-lvm:1,mp=/mnt/spike-test` (transient 500 → retry 200, F4) | present + **writable** (ext4) |

Token ops all ran as `felhom-agent@pve!agent` (restore `vzrestore` OK, start `vzstart` OK) — the per-customer front half is **fully token-covered**.

## 4. Findings

- **F1 — MAC reset is UNCONDITIONAL.** A token `vzrestore` **preserves the archived MAC** (9101 came up with the golden's `BC:24:11:63:43:F4`). Every guest restored from the golden would therefore share one MAC → guaranteed L2 collision. The reconcile job **must** reset MAC on every provision (host-side: `PUT net0` with `hwaddr` omitted → PVE generates a fresh `BC:24:11:xx:xx:xx`). This settles the §9 "MAC handling" question for the *provision* path: always reset. (DR-restore of a *customer* backup is the separate continuity case — §9.)
- **F2 — hostname is carried in the config and must be reset host-side.** The archive's `hostname:` field restored verbatim (`felhom-golden`); `PUT hostname=` resets it and it **propagates into the guest** on boot. Host-side, token-covered — no guest-internal step.
- **F3 — machine-id regenerates for free; SSH host keys do NOT (design consequence).**
  - `machine-id`: bake `truncate -s 0` → **systemd regenerates it on first boot** (confirmed non-empty + unique). No agent action needed. ✓ free.
  - SSH host keys: bake `rm` → on Debian 13 they are **not** regenerated at boot (the keygen is a `pct create` hook + a package-install action; **`pct restore` runs neither**). Result: `openssh-server` is installed and `ssh.service` is **enabled but FAILED** on first boot (no host keys). `ssh-keygen -A` regenerates them cleanly (unique fingerprint `SHA256:MAX191…ED25519`, `root@felhom-spike-9101`).
    → **The bring-up reconcile job must regenerate SSH host keys guest-side** (`ssh-keygen -A`, or `dpkg-reconfigure openssh-server`). **This widens the agent's guest-internal surface** beyond pure host-side config — the one real design consequence this spike surfaced.
    *Alternative to consider in the spec:* bake a one-shot first-boot unit into the golden that runs `ssh-keygen -A` (keeps regeneration guest-internal-but-baked, so the agent stays host-side-only). Either way it must be decided; it is **not** free like machine-id.
- **F4 — transient config-lock 500 on back-to-back PUTs.** A `mp0` attach issued immediately after a `resize` returned **HTTP 500**, then succeeded (200) on retry seconds later — a config-lock contention, **not** a permission issue (token holds `VM.Config.Disk` + `Datastore.AllocateSpace`). The reconcile job's existing **per-guest serialization** avoids this; add a **retry on transient 500** for safety.
- **F5 — keyctl-through-restore is *functional*, not just flag-present.** Docker started and ran `hello-world` in the *restored* guest — re-confirms phase3 #8 on the golden specifically.
- **F6 — Docker storage driver is `overlayfs`** (not `overlay2`), matching phase0's LXC result. No extra config beyond `nesting=1,keyctl=1` was needed.
- **F7 — live link-up surfaced no DHCP/ARP problem.** Fresh MAC → fresh lease `192.168.0.109`; the golden's old MAC only lingered as a STALE IPv6-neighbour cache entry from the (stopped) build guest. No active collision.

## 5. Identity-reset deliverable (the §9 / doc-13 open item — settled for the *provision* path)

| Field | Restore leaves it as | Who resets it | Where | Cost |
|---|---|---|---|---|
| MAC | golden's archived MAC | reconcile job (unconditional) | **host-side** token `PUT net0` (omit hwaddr) | cheap |
| hostname | golden's archived hostname | reconcile job | **host-side** token `PUT hostname` | cheap |
| machine-id | empty (baked) | **systemd, first boot** | guest first-boot regen (golden bake) | **free** |
| SSH host keys | absent (baked) | reconcile job | **guest-side** `ssh-keygen -A` (or baked first-boot unit) | **surface-widening — flag** |

**Reconcile-job front-half reset set (provision):** host-side `{MAC, hostname}` via token config; guest-side `{SSH host keys}` via `ssh-keygen -A` (or a baked first-boot unit); `{machine-id}` is handled for free by the bake-clean golden. Restic / tunnel / hub identity are **out of scope** here (back half, slice 8 / DR policy §9).

## 6. Verdict

**READY to spec the unified bring-up reconcile job (Phase 7.1).** The golden recipe is validated end-to-end and the token-covered front half (restore → reset MAC+hostname → resize → attach mount → start link-up) works with Docker functional in the restored guest. **One design change the findings force:** the front half is **not** purely host-side — SSH-host-key regeneration is a guest-internal step (F3).
The spec must choose between an agent-run `ssh-keygen -A` (widening the guest-internal surface) and a baked first-boot unit in the golden (keeping the agent host-side). machine-id needs no such step. MAC reset is unconditional (F1).

## 7. Out of scope (not done here — note for the implementation)

- Controller deploy / bootstrap / per-guest local-token mint — **slice 8** (back half).
- Restic / tunnel / hub identity handling — DR identity policy (§9) + slice 8/10.
- Reconcile-job journaling + compensating rollback — the **implementation** (Phase 7.1), specced from these findings; this spike restored/destroyed manually without the journal.
- PBS escrow (§8a) — separate slice-7 thread.
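
For the F4 recommendation (retry on a transient 500), the following is a minimal sketch in Python, not the reconcile job's actual code. The name `call_with_retry`, the `(status, body)` return shape, and the default attempt count and delay are all assumptions made for illustration. The spike only observed that a single retry a few seconds later succeeded. The API call itself is passed in as a callable, so the sketch makes no claim about how the job talks to Proxmox.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status, response body); hypothetical shape


def call_with_retry(
    op: Callable[[], Response],
    attempts: int = 3,       # assumed default, not taken from the spike
    delay_s: float = 2.0,    # assumed default, not taken from the spike
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Run op(); retry only on HTTP 500, at most `attempts` tries in total."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    status, body = op()
    for _ in range(attempts - 1):
        if status != 500:
            break  # success, or a non-500 error such as 403: do not retry
        sleep(delay_s)  # give the config lock time to clear
        status, body = op()
    return status, body


# Stand-in for the mp0 attach seen in F4: first reply 500, second reply 200.
replies = iter([(500, "config locked"), (200, "ok")])
waits = []  # records the delays instead of actually sleeping
result = call_with_retry(lambda: next(replies), sleep=waits.append)
print(result, waits)  # (200, 'ok') [2.0]
```

The attempt count is bounded because a 500 can also signal a failure that will not clear on its own. In that case the last response is returned unchanged, so the caller can journal it or roll back.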