Also overwrite REPORT.md with the live --selftest=task validation on demo-felhom (snapshot/rollback/delete on guest 9999, exitstatus=OK under the felhom-agent@pve privsep token; slice-1 mutating-ops gap closed, slice 4 unblocked). No version bump. Token secret stored out-of-band, not committed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
REPORT — live --selftest=task validation on the demo host (2026-06-08)
Overwrite-latest report (most recent significant run only). Cumulative history lives in CHANGELOG.md.
Outcome
`--selftest=task` PASSED live against the demo Proxmox host. This closes the slice-1 gap: the slice-1 mutating ops and `WaitTask` had been unit-tested only and never run against a live host. The shared async `WaitTask` foundation (UPID poll → assert `exitstatus == "OK"`) is now validated live, and slice 4 (reconcile) is unblocked.
What ran
Executed the live-validation runbook end-to-end on node demo-felhom (https://192.168.0.162:8006, PVE 9.2.x). Provisioning ran as root@pam via the SSH alias felhom-pve; the agent ran from the build server 192.168.0.180 on go1.26.
- Provisioned the operator-tier token (Part A). Created the `FelhomAgent` role with the full 16 privileges, the `felhom-agent@pve` user, and a `--privsep 1` token `felhom-agent@pve!agent`. Granted the role on both the user and the token (the privsep intersection gotcha: a privsep token's effective privileges are the intersection of the user's and the token's), then verified two ACL rows at `/`.
- Scratch guest (Part B). Created stopped LXC 9999 (`felhom-selftest-scratch`) with its rootfs on `local-lvm` (lvmthin, so snapshot-capable). Kept it stopped for deterministic rollback. No stale `felhom-selftest` snapshot was present.
- Config + TLS (Part C). Confirmed that the demo host's current leaf-cert SHA-256 fingerprint still matches the pinned value (`BA:7C:99:7D:45:D0…`). Built the agent (v0.3.1) on the build server.
- Read-only gate (Part D). `--selftest=read` ran clean with the new token: PVE 9.2.2, node online, guest 9999 visible, storages listed.
- Live mutating run (Part E). `--selftest=task -vmid 9999`: snapshot → rollback → delete-snapshot, each returning a real UPID that `WaitTask` polled to `exitstatus=OK`.
Evidence
- `exitstatus=OK` on all three ops. A `200` on the POST is explicitly not treated as success, because PVE returns it as soon as the task worker starts. The `exitstatus` assertion is the point of the run.
- The task UPIDs name the token actor (`…:vzsnapshot:9999:felhom-agent@pve!agent:`, likewise `vzrollback`/`vzdelsnapshot`). This confirms that the privsep token path was genuinely exercised, with no privilege drift.
- Role: all 16 privileges present (`VM.Snapshot`, `VM.Snapshot.Rollback`, `VM.Backup`, the `VM.Config.*` set, `VM.PowerMgmt`, `VM.Allocate`/`VM.Audit`, `Datastore.*`, `Sys.Audit`, `SDN.Use`).
- ACLs: both `-user felhom-agent@pve` and `-token felhom-agent@pve!agent` carry `FelhomAgent` at `/`.
- Post-state (Part F): the `felhom-selftest` snapshot was created and then cleaned up (only `current` remains). The guest was left stopped, as it started.
Scope / not covered (by design)
- Not validated live: `Start`/`Stop`/`SetConfig` (reversible and low-risk; `SetConfig` is used by reconcile, and an optional selftest extension could add them), `Vzdump` (already confirmed live in the phase1-2 spike), and `RestoreLXC` / provision-by-restore (deferred until the golden base image exists, ~slice 7).
- The run deliberately used a stopped guest to keep rollback deterministic (LXC snapshots carry no running-memory state; rollback of a running CT may error or stop the guest). Characterizing running-guest rollback is optional follow-up intel, not a slice-4 blocker.
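The selftest could refuse to run against a guest that breaks these assumptions. A pre-flight guard for that might look like the following. This is a hypothetical sketch: the report does not say the agent enforces such a check, and `GuestState`/`preflight` are illustrative names. It encodes the two Part B conditions: the guest is stopped, and no stale `felhom-selftest` snapshot exists. PVE's snapshot listing also includes the `current` pseudo-entry, which passes.

```go
package main

import "fmt"

// GuestState is a hypothetical view of what the selftest checks before mutating.
type GuestState struct {
	Status    string   // e.g. "stopped" or "running"
	Snapshots []string // snapshot names as listed by PVE, including "current"
}

const selftestSnap = "felhom-selftest"

// preflight refuses the mutating selftest unless rollback will be deterministic.
func preflight(g GuestState) error {
	if g.Status != "stopped" {
		return fmt.Errorf("guest is %q: selftest needs a stopped guest for deterministic rollback", g.Status)
	}
	for _, name := range g.Snapshots {
		if name == selftestSnap {
			return fmt.Errorf("stale %q snapshot present: remove it before rerunning", selftestSnap)
		}
	}
	return nil
}

func main() {
	fmt.Println(preflight(GuestState{Status: "stopped", Snapshots: []string{"current"}}))
	fmt.Println(preflight(GuestState{Status: "running", Snapshots: []string{"current"}}))
}
```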
Credentials
The standing FelhomAgent operator token (`felhom-agent@pve!agent`) provisioned here is the one that slice 4+ consumes, so it was not deleted. Its secret is stored out-of-band and supplied to the agent via `FELHOM_AGENT_PROXMOX_TOKEN`. It is not persisted to the repo; the on-disk config holds only a placeholder. Scratch guest 9999 is retained (stopped) as the standing selftest target.