felhom-agent/REPORT.md
admin 605ce25f58 v0.3.2: reversible SetConfig step in --selftest=task (slice-4 pre-check)
Append a reversible SetConfig write+revert to runSelftestTask: read
GuestConfig, write a `description` marker, verify it landed, restore the
original (or delete if absent), verify the restore. Handles PVE's dual-mode
SetConfig return (empty UPID = synchronous; UPID = WaitTask+assert OK).

Live self-gate PASSED on demo-felhom / guest 9999. Findings:
- LXC `description` write is synchronous (empty UPID) — dual-mode modeling
  confirmed; empty string is success, not an error.
- PVE appends a trailing newline to `description` on read; slice-4 reconcile
  must normalize description comparisons (hence normDesc helper).

First live exercise of the VM.Config.* privilege cluster. Standing operator
token rotated during the run; new secret stored out-of-band, not in the repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 21:13:04 +02:00


# REPORT — `SetConfig` selftest extension, live self-gate (2026-06-08)
> Overwrite-latest report (most recent significant run only). Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).
## Outcome
**`SetConfig` PASSED live under the scoped operator token.** The slice-4 pre-check is
satisfied — `--selftest=task -vmid 9999` now exercises a reversible `SetConfig`
write+revert end-to-end and reached `=== selftest=task OK ===` (exit 0). Reconcile
(slice 4) can be built on `SetConfig` with confidence.
## What was implemented
A reversible `SetConfig` step appended to the existing `runSelftestTask` flow
(`cmd/felhom-agent/main.go`, `selftestSetConfig`), keeping the prior
snapshot → rollback → delete-snapshot steps intact. Against guest 9999:
1. `GuestConfig` — capture the original `description` (was **absent**).
2. `SetConfig description="felhom-selftest <RFC3339>"` — dual-mode return handled per
the `mutate.go` contract (empty UPID = synchronous; UPID = `WaitTask`+assert OK).
3. `GuestConfig` again — confirm the marker landed.
4. **Restore** — original was absent, so `SetConfig delete=description`; confirm cleared.
Output matches the existing format:
```
[ ok ] setconfig synchronous exitstatus=OK
[ ok ] verify-write description verified == marker
[ ok ] setconfig-revert synchronous exitstatus=OK
[ ok ] verify-revert description restored to original
```
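The four steps above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual `selftestSetConfig`: the `configClient` interface, its method names, the in-memory `fakePVE`, and the `trimNL` helper are all hypothetical, introduced only so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// configClient is a hypothetical stand-in for the agent's Proxmox client.
type configClient interface {
	Description(vmid int) (value string, present bool, err error)
	SetDescription(vmid int, value string) error
	DeleteDescription(vmid int) error
}

// fakePVE keeps descriptions in memory and, like PVE, appends "\n" on read.
type fakePVE struct{ desc map[int]string }

func (f *fakePVE) Description(vmid int) (string, bool, error) {
	v, ok := f.desc[vmid]
	if !ok {
		return "", false, nil
	}
	return v + "\n", true, nil
}
func (f *fakePVE) SetDescription(vmid int, v string) error { f.desc[vmid] = v; return nil }
func (f *fakePVE) DeleteDescription(vmid int) error        { delete(f.desc, vmid); return nil }

// trimNL drops one trailing newline so reads compare equal to writes.
func trimNL(s string) string { return strings.TrimSuffix(s, "\n") }

// reversibleDescription writes marker, verifies it, then restores the
// original state (value or absence) and verifies the restore.
func reversibleDescription(c configClient, vmid int, marker string) error {
	orig, had, err := c.Description(vmid) // step 1: capture original
	if err != nil {
		return fmt.Errorf("read original: %w", err)
	}
	if err := c.SetDescription(vmid, marker); err != nil { // step 2: write
		return fmt.Errorf("write marker: %w", err)
	}
	var verifyErr error
	if got, _, err := c.Description(vmid); err != nil { // step 3: verify
		verifyErr = fmt.Errorf("verify write: %w", err)
	} else if trimNL(got) != marker {
		verifyErr = fmt.Errorf("verify write: got %q, want %q", got, marker)
	}
	// Step 4: restore even if verification failed, so the guest is not left dirty.
	if had {
		err = c.SetDescription(vmid, trimNL(orig))
	} else {
		err = c.DeleteDescription(vmid)
	}
	if err != nil {
		return errors.Join(verifyErr, fmt.Errorf("restore: %w", err))
	}
	now, present, err := c.Description(vmid)
	if err != nil {
		return errors.Join(verifyErr, fmt.Errorf("verify restore: %w", err))
	}
	if present != had || trimNL(now) != trimNL(orig) {
		return errors.Join(verifyErr, errors.New("verify restore: state differs from original"))
	}
	return verifyErr
}

func main() {
	pve := &fakePVE{desc: map[int]string{}}
	err := reversibleDescription(pve, 9999, "felhom-selftest example-marker")
	_, present, _ := pve.Description(9999)
	fmt.Println("err:", err, "description present after run:", present)
}
```

The sketch attempts the restore even when the write verification fails, which is one way to keep a test guest pristine after a bad run; whether the real step does the same is not stated in this report.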
## Key finding — synchronous, not async
**The LXC `description` write came back synchronous (empty UPID).** PVE applied it
inline with no task object; the agent printed `synchronous exitstatus=OK` on the
empty-string path. This confirms the agent's **dual-mode `SetConfig` modeling matches
Proxmox reality**: for `description`, the empty-UPID branch is the live path, and
treating `""` as success (not an error) is correct. This was the **first live exercise
of the `VM.Config.*` privilege cluster** (previously only the snapshot/rollback/backup
privileges had been run live).
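A minimal sketch of the dual-mode handling described above, assuming a waiter with a `WaitTask(upid)` method that returns the task's exit status. The `taskWaiter` interface, `finishSetConfig`, and `stubWaiter` are hypothetical names; the real `mutate.go` contract is not reproduced in this report.

```go
package main

import "fmt"

// taskWaiter is a hypothetical interface; the agent's real WaitTask
// signature may differ.
type taskWaiter interface {
	WaitTask(upid string) (exitStatus string, err error)
}

// finishSetConfig interprets the value returned by a config write.
// An empty UPID means the change was applied inline and counts as success;
// a non-empty UPID means a task was started and must finish with "OK".
func finishSetConfig(w taskWaiter, upid string) (mode string, err error) {
	if upid == "" {
		return "synchronous", nil
	}
	status, err := w.WaitTask(upid)
	if err != nil {
		return "async", fmt.Errorf("wait for %s: %w", upid, err)
	}
	if status != "OK" {
		return "async", fmt.Errorf("task %s ended with exitstatus=%q", upid, status)
	}
	return "async", nil
}

// stubWaiter returns a fixed exit status without contacting any server.
type stubWaiter struct{ status string }

func (s stubWaiter) WaitTask(string) (string, error) { return s.status, nil }

func main() {
	mode, err := finishSetConfig(stubWaiter{}, "")
	fmt.Println(mode, err)
	mode, err = finishSetConfig(stubWaiter{status: "OK"}, "UPID:example")
	fmt.Println(mode, err)
}
```

Returning the mode alongside the error lets the caller print which branch was taken, as the `synchronous` label in the selftest output does.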
## Second finding — `description` trailing-newline normalization
PVE **appends a trailing `\n` to `description` on read** (stored URL-encoded as
`%0A...`). The first live run surfaced this as a (false) verify mismatch:
`got="...Z\n"` vs `want="...Z"`. The write had genuinely landed — only my exact-match
check was too strict. Fixed with `normDesc` (strip trailing newline) at every
comparison point, and the run went green. **This is load-bearing intel for slice 4:**
a reconcile that compares desired vs actual `description` verbatim will detect
perpetual drift; it must normalize the trailing newline.
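As an illustration of the normalization, here is a minimal sketch. The body of `normDesc` below is an assumption based on the description "strip trailing newline" (it strips every trailing `\n`, which may be broader than the real helper), and `descDrifted` is a hypothetical name for the comparison a reconcile loop might perform.

```go
package main

import (
	"fmt"
	"strings"
)

// normDesc is a guess at the helper named in the report: it removes
// trailing newlines so a value read back from PVE compares equal to
// the value that was written.
func normDesc(s string) string {
	return strings.TrimRight(s, "\n")
}

// descDrifted reports whether desired and actual differ after normalization.
func descDrifted(desired, actual string) bool {
	return normDesc(desired) != normDesc(actual)
}

func main() {
	desired := "managed by felhom"
	actual := "managed by felhom\n" // what a read returns after PVE appends "\n"

	fmt.Println("verbatim mismatch:", desired != actual)
	fmt.Println("normalized drift: ", descDrifted(desired, actual))
}
```

Normalizing both sides, rather than only the value read back, keeps the comparison symmetric if a desired value ever carries its own trailing newline.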
## Live run environment
- Built **v0.3.2** on the build server (192.168.0.180, go1.26), pointed at
`demo-felhom` (`https://192.168.0.162:8006`, PVE 9.2.2).
- Pinned leaf-cert SHA-256 fingerprint re-verified — still
`BA:7C:99:7D:45:D0…` (matches the agent's pin).
- `--selftest=read` clean first (PVE 9.2.2, node online, guests 9001+9999 visible,
storages listed), then the gated `--selftest=task -vmid 9999`.
- Task UPIDs name the token actor (`…:vzsnapshot:9999:felhom-agent@pve!agent:` etc.) —
privsep token path genuinely exercised, no privilege drift.
## Post-state
Guest **9999** was left pristine: **stopped**, `description` **absent**, and only the
`current` entry remains in the snapshot list (no leftover `felhom-selftest` snapshot).
## Credentials
The standing operator token (`felhom-agent@pve!agent`, privsep) was **rotated** during
this run — the prior secret was not retrievable (PVE reveals a token secret only once
at creation), so a fresh secret was minted via `root@felhom-pve` and the `FelhomAgent`
role re-confirmed on **both** the user and the token ACL at `/` (privsep intersection
gotcha). The new secret was supplied to the agent through
`FELHOM_AGENT_PROXMOX_TOKEN` and **not persisted to the repo**; the on-disk demo config
carries only a placeholder. It is **stored out-of-band**.