237452c8c6
Also overwrite REPORT.md with the live --selftest=task validation on demo-felhom (snapshot/rollback/delete on guest 9999, exitstatus=OK under the felhom-agent@pve privsep token; slice-1 mutating-ops gap closed, slice 4 unblocked). No version bump. Token secret stored out-of-band, not committed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
35 lines
3.6 KiB
Markdown
35 lines
3.6 KiB
Markdown
# REPORT — live `--selftest=task` validation on the demo host (2026-06-08)
|
|
|
|
> Overwrite-latest report (most recent significant run only). Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).
|
|
|
|
## Outcome
|
|
|
|
**`--selftest=task` PASSED live against the demo Proxmox host.** The slice-1 gap — slice-1 mutating ops + `WaitTask` were unit-tested only, never run against a live host — is **closed**. The shared async `WaitTask` foundation (UPID poll → assert `exitstatus == "OK"`) is now validated live. **Slice 4 (reconcile) is unblocked.**
|
|
|
|
## What ran
|
|
|
|
Executed the live-validation runbook end-to-end on node `demo-felhom` (`https://192.168.0.162:8006`, PVE 9.2.x; `root@pam` via SSH alias `felhom-pve` for provisioning, agent run from the build server `192.168.0.180` on go1.26).
|
|
|
|
1. **Provisioned the operator-tier token (Part A).** Created the `FelhomAgent` role with the full **16 privileges**, the `felhom-agent@pve` user, and a `--privsep 1` token `felhom-agent@pve!agent`. Granted the role on **both** the user **and** the token (the privsep intersection gotcha) — verified two ACL rows at `/`.
|
|
2. **Scratch guest (Part B).** Created stopped LXC **9999** (`felhom-selftest-scratch`), rootfs on `local-lvm` (lvmthin → snapshot-capable). Kept stopped for deterministic rollback. No stale `felhom-selftest` snapshot present.
|
|
3. **Config + TLS (Part C).** Confirmed the demo host's current leaf-cert SHA-256 fingerprint still matches the pinned value (`BA:7C:99:7D:45:D0…`). Built the agent (v0.3.1) on the build server.
|
|
4. **Read-only gate (Part D).** `--selftest=read` clean with the new token: PVE 9.2.2, node online, guest 9999 visible, storages listed.
|
|
5. **Live mutating run (Part E).** `--selftest=task -vmid 9999` — snapshot → rollback → delete-snapshot, each returning a real UPID that `WaitTask` polled to `exitstatus=OK`.
|
|
|
|
## Evidence
|
|
|
|
- **`exitstatus=OK`** on all three ops (a `200` on the POST is explicitly *not* treated as success — the `exitstatus` assertion is the point of the run).
|
|
- The task **UPIDs name the token actor** (`…:vzsnapshot:9999:felhom-agent@pve!agent:`, likewise `vzrollback` / `vzdelsnapshot`) — confirming the privsep token path was genuinely exercised, no privilege drift.
|
|
- **Role:** all 16 privileges present (`VM.Snapshot`, `VM.Snapshot.Rollback`, `VM.Backup`, the `VM.Config.*` set, `VM.PowerMgmt`, `VM.Allocate/Audit`, `Datastore.*`, `Sys.Audit`, `SDN.Use`).
|
|
- **ACLs:** both `-user felhom-agent@pve` and `-token felhom-agent@pve!agent` carry `FelhomAgent` at `/`.
|
|
- **Post-state (Part F):** `felhom-selftest` snapshot created then cleaned (only `current` remains); guest left **stopped**, as started.
|
|
|
|
## Scope / not covered (by design)
|
|
|
|
- **Not validated live:** `Start`/`Stop`/`SetConfig` (reversible, low-risk; `SetConfig` is used by reconcile — an optional selftest extension could add them), `Vzdump` (already confirmed live in the phase1-2 spike), and `RestoreLXC` / provision-by-restore (deferred until the golden base image exists, ~slice 7).
|
|
- The run used a **stopped** guest deliberately, to keep rollback deterministic (LXC snapshots carry no running-memory state; rollback of a running CT may error or stop the guest). Characterizing running-guest rollback is optional follow-up intel, not a slice-4 blocker.
|
|
|
|
## Credentials
|
|
|
|
The standing `FelhomAgent` operator token (`felhom-agent@pve!agent`) provisioned here is the one slice 4+ consumes — **not deleted**. Its secret is **stored out-of-band**, supplied to the agent via `FELHOM_AGENT_PROXMOX_TOKEN`; it is **not persisted to the repo** (the on-disk config holds only a placeholder). Scratch guest 9999 is retained (stopped) as the standing selftest target.
|