# REPORT: `SetConfig` selftest extension, live self-gate (2026-06-08)

> Overwrite-latest report (most recent significant run only). Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).

## Outcome

**`SetConfig` PASSED live under the scoped operator token.** The slice-4 pre-check is
satisfied: `--selftest=task -vmid 9999` now exercises a reversible `SetConfig`
write+revert end-to-end and reached `=== selftest=task OK ===` (exit 0). Reconcile
(slice 4) can be built on `SetConfig` with confidence.

## What was implemented

A reversible `SetConfig` step was appended to the existing `runSelftestTask` flow
(`cmd/felhom-agent/main.go`, `selftestSetConfig`), keeping the prior
snapshot → rollback → delete-snapshot steps intact. Against guest 9999:

1. `GuestConfig`: capture the original `description` (was **absent**).
2. `SetConfig description="felhom-selftest <RFC3339>"`: dual-mode return handled per
   the `mutate.go` contract (empty UPID = synchronous; UPID = `WaitTask`+assert OK).
3. `GuestConfig` again: confirm the marker landed.
4. **Restore**: original was absent, so `SetConfig delete=description`; confirm cleared.

Output matches the existing format:

```
[ ok ] setconfig synchronous exitstatus=OK
[ ok ] verify-write description verified == marker
[ ok ] setconfig-revert synchronous exitstatus=OK
[ ok ] verify-revert description restored to original
```

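The four steps can be sketched in Go as a capture, write, verify, restore, verify sequence. This is a minimal sketch under stated assumptions, not the agent's `selftestSetConfig`: the `configClient` interface, its method names, `trimNL`, and the in-memory `fakeGuest` are hypothetical stand-ins, and the dual-mode return is assumed to be handled inside `SetDescription`. The timestamp in `main` is illustrative only.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// configClient is a hypothetical stand-in for the agent's PVE client.
// The method names are illustrative, not the real felhom-agent API.
type configClient interface {
	Description() (value string, present bool, err error)
	SetDescription(value string) error
	DeleteDescription() error
}

// fakeGuest simulates a guest in memory, including the trailing newline
// that PVE adds to description on read.
type fakeGuest struct {
	desc    string
	present bool
}

func (g *fakeGuest) Description() (string, bool, error) {
	if !g.present {
		return "", false, nil
	}
	return g.desc + "\n", true, nil
}

func (g *fakeGuest) SetDescription(v string) error {
	g.desc, g.present = v, true
	return nil
}

func (g *fakeGuest) DeleteDescription() error {
	g.desc, g.present = "", false
	return nil
}

// trimNL strips trailing newlines so read-back values compare cleanly.
func trimNL(s string) string { return strings.TrimRight(s, "\n") }

// reversibleWrite captures the original description, writes the marker,
// verifies it, restores the original (or deletes the key if it was absent),
// and verifies the restore. The restore runs even when the marker check
// fails, so the guest is not left modified.
func reversibleWrite(c configClient, marker string) error {
	orig, had, err := c.Description()
	if err != nil {
		return fmt.Errorf("capture original: %w", err)
	}
	if err := c.SetDescription(marker); err != nil {
		return fmt.Errorf("write marker: %w", err)
	}
	got, present, err := c.Description()
	if err != nil {
		return fmt.Errorf("read back marker: %w", err)
	}
	landed := present && trimNL(got) == marker

	if had {
		err = c.SetDescription(trimNL(orig))
	} else {
		err = c.DeleteDescription()
	}
	if err != nil {
		return fmt.Errorf("restore original: %w", err)
	}
	after, stillPresent, err := c.Description()
	if err != nil {
		return fmt.Errorf("read back restore: %w", err)
	}
	if stillPresent != had || trimNL(after) != trimNL(orig) {
		return errors.New("verify-revert: description not restored to original")
	}
	if !landed {
		return errors.New("verify-write: marker did not land")
	}
	return nil
}

func main() {
	guest := &fakeGuest{}
	err := reversibleWrite(guest, "felhom-selftest 2026-06-08T00:00:00Z")
	fmt.Println("error:", err, "description present afterwards:", guest.present)
}
```

Tracking `present` separately from the value is what lets the restore choose between writing the old value back and deleting the key, which is the branch the live run took.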
## Key finding: synchronous, not async

**The LXC `description` write came back synchronous (empty UPID).** PVE applied it
inline with no task object; the agent printed `synchronous exitstatus=OK` on the
empty-string path. This confirms the agent's **dual-mode `SetConfig` modeling matches
Proxmox reality**: for `description`, the empty-UPID branch is the live path, and
treating `""` as success (not an error) is correct. This was the **first live exercise
of the `VM.Config.*` privilege cluster** (previously only the snapshot/rollback/backup
privileges had been run live).

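The dual-mode contract described above can be illustrated with a small Go function. This is a hedged sketch, not the code in `mutate.go`: `settle`, `waitFunc`, and the returned mode strings are hypothetical names, and the UPID passed in `main` is a placeholder rather than a real task ID.

```go
package main

import (
	"errors"
	"fmt"
)

// waitFunc is a hypothetical stand-in for the agent's WaitTask: it blocks
// until the task finishes and returns the task's exit status.
type waitFunc func(upid string) (exitStatus string, err error)

// settle interprets the value returned by a SetConfig call.
// An empty UPID means PVE applied the change inline, which is success.
// A non-empty UPID names a task that must finish with exit status "OK".
func settle(upid string, wait waitFunc) (string, error) {
	if upid == "" {
		return "synchronous", nil
	}
	status, err := wait(upid)
	if err != nil {
		return "async", fmt.Errorf("wait for %s: %w", upid, err)
	}
	if status != "OK" {
		return "async", fmt.Errorf("task %s ended with exitstatus=%q", upid, status)
	}
	return "async", nil
}

func main() {
	// On the synchronous path the waiter must never be invoked.
	never := func(string) (string, error) {
		return "", errors.New("WaitTask called on the synchronous path")
	}
	mode, err := settle("", never)
	fmt.Println(mode, err)

	// "UPID:placeholder" is not a real task ID; it only selects the async branch.
	ok := func(string) (string, error) { return "OK", nil }
	mode, err = settle("UPID:placeholder", ok)
	fmt.Println(mode, err)
}
```

Keeping the empty-string check ahead of any task lookup is the point: a caller that treated `""` as a missing UPID would have failed the live `description` write even though PVE applied it.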
## Second finding: `description` trailing-newline normalization

PVE **appends a trailing `\n` to `description` on read** (stored URL-encoded as
`%0A...`). The first live run surfaced this as a (false) verify mismatch:
`got="...Z\n"` vs `want="...Z"`. The write had genuinely landed; only my exact-match
check was too strict. Fixed with `normDesc` (strip trailing newline) at every
comparison point, and the run went green. **This is load-bearing intel for slice 4:**
a reconcile that compares desired vs actual `description` verbatim will see
perpetual false drift; it must normalize the trailing newline.

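A normalizing comparison for reconcile might look like the following. The report names `normDesc` but does not show its body, so this implementation is a guess; `descDrifted` is a hypothetical helper, and the marker timestamp is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// normDesc strips trailing newlines from a description value.
// Assumption: the real helper may strip exactly one newline instead of all.
func normDesc(s string) string {
	return strings.TrimRight(s, "\n")
}

// descDrifted reports whether desired and actual differ after normalization.
func descDrifted(desired, actual string) bool {
	return normDesc(desired) != normDesc(actual)
}

func main() {
	want := "felhom-selftest 2026-06-08T00:00:00Z"
	got := want + "\n" // what PVE returns on read

	fmt.Println("verbatim mismatch:", want != got)             // true
	fmt.Println("normalized drift:", descDrifted(want, got)) // false
}
```

One consequence of trimming every trailing newline is that a description intentionally ending in blank lines compares equal to one that does not; whether that matters depends on how slice 4 defines desired state.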
## Live run environment

- Built **v0.3.2** on the build server (192.168.0.180, go1.26), pointed at
  `demo-felhom` (`https://192.168.0.162:8006`, PVE 9.2.2).
- Pinned leaf-cert SHA-256 fingerprint re-verified: still
  `BA:7C:99:7D:45:D0…` (matches the agent's pin).
- `--selftest=read` clean first (PVE 9.2.2, node online, guests 9001+9999 visible,
  storages listed), then the gated `--selftest=task -vmid 9999`.
- Task UPIDs name the token actor (`…:vzsnapshot:9999:felhom-agent@pve!agent:` etc.):
  privsep token path genuinely exercised, no privilege drift.

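For the pinned leaf-certificate check in the list above, one common way to do it in Go is to hash the DER bytes of the leaf and compare against the pin inside `tls.Config.VerifyPeerCertificate`. The sketch below assumes that approach; how the agent actually enforces its pin is not stated in this report, and `fingerprint`, `matchesPin`, and `pinnedTLSConfig` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"crypto/x509"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// fingerprint returns the SHA-256 digest of a DER-encoded certificate as
// upper-case, colon-separated hex (the AA:BB:CC style used for the pin).
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	h := strings.ToUpper(hex.EncodeToString(sum[:]))
	parts := make([]string, 0, len(h)/2)
	for i := 0; i < len(h); i += 2 {
		parts = append(parts, h[i:i+2])
	}
	return strings.Join(parts, ":")
}

// matchesPin compares a certificate against a pin, ignoring case and
// surrounding whitespace in the pin.
func matchesPin(der []byte, pin string) bool {
	return fingerprint(der) == strings.ToUpper(strings.TrimSpace(pin))
}

// pinnedTLSConfig replaces chain verification with a leaf fingerprint check.
// Assumption: this suits a self-signed PVE certificate; it is not
// necessarily how felhom-agent builds its TLS config.
func pinnedTLSConfig(pin string) *tls.Config {
	return &tls.Config{
		InsecureSkipVerify: true, // the callback below is the only check
		VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
			if len(rawCerts) == 0 || !matchesPin(rawCerts[0], pin) {
				return errors.New("leaf certificate does not match pinned fingerprint")
			}
			return nil
		},
	}
}

func main() {
	der := []byte("not a real certificate") // placeholder bytes for the demo
	pin := fingerprint(der)
	fmt.Println(pin)
	fmt.Println("pin matches:", matchesPin(der, pin))
	_ = pinnedTLSConfig(pin)
}
```

Re-verifying the pin before a gated run, as done here, guards against the server certificate having been regenerated since the pin was recorded.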
## Post-state

Guest **9999** left pristine: **stopped**, `description` **absent**, only `current`
remains in the snapshot list (no leftover `felhom-selftest` snapshot).

## Credentials

The standing operator token (`felhom-agent@pve!agent`, privsep) was **rotated** during
this run. The prior secret was not retrievable (PVE reveals a token secret only once,
at creation), so a fresh secret was minted via `root@felhom-pve` and the `FelhomAgent`
role re-confirmed on **both** the user and the token ACL at `/` (privsep intersection
gotcha). The agent consumed the new secret **through `FELHOM_AGENT_PROXMOX_TOKEN`; it
was not persisted to the repo**, and the on-disk demo config carries only a
placeholder. The new secret is **stored out-of-band**.

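The placeholder-on-disk, secret-at-runtime arrangement can be sketched as below. Several things here are assumptions not confirmed by this report: that `FELHOM_AGENT_PROXMOX_TOKEN` is an environment variable holding only the secret part, that the config placeholder is the literal `CHANGE-ME`, and the names `resolveSecret` and `authHeader`. The `PVEAPIToken=USER@REALM!TOKENID=SECRET` header value follows the Proxmox VE API token format.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

const (
	tokenEnv    = "FELHOM_AGENT_PROXMOX_TOKEN"
	placeholder = "CHANGE-ME" // hypothetical placeholder value in the demo config
)

// resolveSecret prefers the runtime value over the config file and refuses
// to fall back to an empty or placeholder config value.
func resolveSecret(fromConfig string, getenv func(string) string) (string, error) {
	if v := strings.TrimSpace(getenv(tokenEnv)); v != "" {
		return v, nil
	}
	if fromConfig == "" || fromConfig == placeholder {
		return "", errors.New(tokenEnv + " is not set and the config holds no usable secret")
	}
	return fromConfig, nil
}

// authHeader builds the Authorization header value for a PVE API token.
func authHeader(tokenID, secret string) string {
	return "PVEAPIToken=" + tokenID + "=" + secret
}

func main() {
	secret, err := resolveSecret(placeholder, os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	_ = authHeader("felhom-agent@pve!agent", secret) // never print the secret
	fmt.Println("token secret resolved at runtime")
}
```

Passing `getenv` as a parameter keeps the lookup testable without touching the real process environment.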