admin 605ce25f58 v0.3.2: reversible SetConfig step in --selftest=task (slice-4 pre-check)
Append a reversible SetConfig write+revert to runSelftestTask: read
GuestConfig, write a `description` marker, verify it landed, restore the
original (or delete if absent), verify the restore. Handles PVE's dual-mode
SetConfig return (empty UPID = synchronous; UPID = WaitTask+assert OK).

Live self-gate PASSED on demo-felhom / guest 9999. Findings:
- LXC `description` write is synchronous (empty UPID) — dual-mode modeling
  confirmed; empty string is success, not an error.
- PVE appends a trailing newline to `description` on read; slice-4 reconcile
  must normalize description comparisons (hence normDesc helper).

First live exercise of the VM.Config.* privilege cluster. Standing operator
token rotated during the run; new secret stored out-of-band, not in the repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 21:13:04 +02:00


REPORT — SetConfig selftest extension, live self-gate (2026-06-08)

Overwrite-latest report (most recent significant run only). Cumulative history lives in CHANGELOG.md.

Outcome

SetConfig PASSED live under the scoped operator token. The slice-4 pre-check is satisfied — --selftest=task -vmid 9999 now exercises a reversible SetConfig write+revert end-to-end and reached === selftest=task OK === (exit 0). Reconcile (slice 4) can be built on SetConfig with confidence.

What was implemented

A reversible SetConfig step was appended to the existing runSelftestTask flow (cmd/felhom-agent/main.go, selftestSetConfig), keeping the prior snapshot → rollback → delete-snapshot steps intact. Against guest 9999, the new step runs:

  1. GuestConfig — capture the original description (was absent).
  2. SetConfig description="felhom-selftest <RFC3339>" — dual-mode return handled per the mutate.go contract (empty UPID = synchronous; UPID = WaitTask+assert OK).
  3. GuestConfig again — confirm the marker landed.
  4. Restore — original was absent, so SetConfig delete=description; confirm cleared.
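The four steps above can be sketched as follows. This is a minimal illustration and not the agent's actual code: the configClient interface, the fakeGuest type, and reversibleSetDescription are hypothetical names, and the fake guest only imitates the trailing newline that PVE adds on read.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// configClient is a hypothetical stand-in for the agent's Proxmox client.
type configClient interface {
	GetDescription() (desc string, present bool, err error)
	SetDescription(desc string) error
	DeleteDescription() error
}

// fakeGuest is an in-memory guest that appends "\n" on read, as PVE was observed to do.
type fakeGuest struct {
	desc    string
	present bool
}

func (g *fakeGuest) GetDescription() (string, bool, error) {
	if !g.present {
		return "", false, nil
	}
	return g.desc + "\n", true, nil
}
func (g *fakeGuest) SetDescription(d string) error { g.desc, g.present = d, true; return nil }
func (g *fakeGuest) DeleteDescription() error      { g.desc, g.present = "", false; return nil }

// norm strips trailing newlines so comparisons ignore PVE's read-side "\n".
func norm(s string) string { return strings.TrimRight(s, "\n") }

// reversibleSetDescription captures the original, writes the marker,
// verifies it, restores the original (or deletes the key), and verifies again.
func reversibleSetDescription(c configClient, marker string) error {
	orig, had, err := c.GetDescription() // step 1: capture
	if err != nil {
		return err
	}
	if err := c.SetDescription(marker); err != nil { // step 2: write
		return err
	}
	got, _, err := c.GetDescription() // step 3: verify write
	if err != nil {
		return err
	}
	if norm(got) != norm(marker) {
		return fmt.Errorf("verify-write: got %q want %q", got, marker)
	}
	if had { // step 4: restore, or delete if the key was absent
		err = c.SetDescription(norm(orig))
	} else {
		err = c.DeleteDescription()
	}
	if err != nil {
		return err
	}
	after, present, err := c.GetDescription()
	if err != nil {
		return err
	}
	if present != had || norm(after) != norm(orig) {
		return errors.New("verify-revert: description not restored to original")
	}
	return nil
}

func main() {
	g := &fakeGuest{} // description absent, like guest 9999 in this run
	marker := "felhom-selftest " + time.Now().UTC().Format(time.RFC3339)
	if err := reversibleSetDescription(g, marker); err != nil {
		fmt.Println("selftest failed:", err)
		return
	}
	fmt.Println("description present after revert:", g.present)
}
```

The sketch returns early on a failed write verification without attempting a revert; a real self-test would likely want to try the restore on that path as well.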

Output matches the existing format:

  [ ok ] setconfig        synchronous  exitstatus=OK
  [ ok ] verify-write     description verified == marker
  [ ok ] setconfig-revert synchronous  exitstatus=OK
  [ ok ] verify-revert    description restored to original

Key finding — synchronous, not async

The LXC description write came back synchronous (empty UPID). PVE applied it inline with no task object; the agent printed synchronous exitstatus=OK on the empty-string path. This confirms the agent's dual-mode SetConfig modeling matches Proxmox reality: for description, the empty-UPID branch is the live path, and treating "" as success (not an error) is correct. This was the first live exercise of the VM.Config.* privilege cluster (previously only the snapshot/rollback/backup privileges had been run live).
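The dual-mode handling described here could look roughly like the sketch below. It assumes the SetConfig call has already returned a UPID string; waitFunc and finishSetConfig are hypothetical names standing in for the agent's WaitTask and its caller, and the UPID value in main is a placeholder, not a real task ID.

```go
package main

import "fmt"

// waitFunc is a hypothetical stand-in for the agent's WaitTask:
// it blocks until the task ends and returns its exit status.
type waitFunc func(upid string) (exitStatus string, err error)

// finishSetConfig interprets a SetConfig return value.
// An empty UPID means PVE applied the change inline (success, not an error);
// a non-empty UPID means a task was started and must finish with "OK".
func finishSetConfig(upid string, wait waitFunc) (mode string, err error) {
	if upid == "" {
		return "synchronous", nil
	}
	status, err := wait(upid)
	if err != nil {
		return "async", fmt.Errorf("wait for task %s: %w", upid, err)
	}
	if status != "OK" {
		return "async", fmt.Errorf("task %s ended with exitstatus=%q", upid, status)
	}
	return "async", nil
}

func main() {
	okWait := func(string) (string, error) { return "OK", nil }

	mode, err := finishSetConfig("", okWait) // the path seen live for description
	fmt.Println(mode, err)

	mode, err = finishSetConfig("UPID:placeholder", okWait) // placeholder UPID
	fmt.Println(mode, err)
}
```

Keeping the empty-string check first means the wait function is never called on the synchronous path, which is the branch this run exercised.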

Second finding — description trailing-newline normalization

PVE appends a trailing \n to description on read (stored URL-encoded as %0A...). The first live run surfaced this as a (false) verify mismatch: got="...Z\n" vs want="...Z". The write had genuinely landed; only my exact-match check was too strict. Fixed with normDesc (strip trailing newline) at every comparison point, and the run went green. This is load-bearing intel for slice 4: a reconcile that compares desired vs actual description verbatim will see a difference on every pass and report perpetual drift, so it must normalize the trailing newline before comparing.
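A normalization helper of this kind can be very small. The body below is a guess at what normDesc might look like, not the agent's actual implementation; descEqual is a hypothetical wrapper showing how a slice-4 comparison could use it. This version strips every trailing newline on both sides, which is an assumption about the desired leniency.

```go
package main

import (
	"fmt"
	"strings"
)

// normDesc removes trailing newlines so that the "\n" PVE adds on read
// does not register as a difference. Hypothetical body.
func normDesc(s string) string {
	return strings.TrimRight(s, "\n")
}

// descEqual compares a desired description with the value read back from PVE.
func descEqual(desired, actual string) bool {
	return normDesc(desired) == normDesc(actual)
}

func main() {
	want := "felhom-selftest Z"
	got := "felhom-selftest Z\n" // shape of the value PVE returned on read

	fmt.Println(want == got)          // false: verbatim comparison reports drift
	fmt.Println(descEqual(want, got)) // true: normalized comparison does not
}
```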

Live run environment

  • Built v0.3.2 on the build server (192.168.0.180, go1.26), pointed at demo-felhom (https://192.168.0.162:8006, PVE 9.2.2).
  • Pinned leaf-cert SHA-256 fingerprint re-verified — still BA:7C:99:7D:45:D0… (matches the agent's pin).
  • --selftest=read clean first (PVE 9.2.2, node online, guests 9001+9999 visible, storages listed), then the gated --selftest=task -vmid 9999.
  • Task UPIDs name the token actor (…:vzsnapshot:9999:felhom-agent@pve!agent: etc.) — privsep token path genuinely exercised, no privilege drift.

Post-state

Guest 9999 was left pristine: stopped, description absent, and only current remaining in the snapshot list (no leftover felhom-selftest snapshot).

Credentials

The standing operator token (felhom-agent@pve!agent, privsep) was rotated during this run. The prior secret was not retrievable (PVE reveals a token secret only once at creation), so a fresh secret was minted via root@felhom-pve and the FelhomAgent role re-confirmed on both the user and the token ACL at / (privsep intersection gotcha: a privilege-separated token's effective rights are the intersection of the user's and the token's own permissions). The new secret was supplied to the agent through FELHOM_AGENT_PROXMOX_TOKEN and not persisted to the repo; the on-disk demo config carries only a placeholder, and the secret itself is stored out-of-band.