Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
felhom-agent — latest task report
This file holds the report for the most recent change, fully overwritten each task. Cumulative history lives in CHANGELOG.md.
Task: Agent scaffold + proxmox interaction package (slice 1) — v0.1.0
Stood up the host-agent project and its foundation — the typed proxmox interaction
layer every other agent module will call — with a runnable read-only --selftest.
Pushed to main (main-only repo). Build/vet/test green; verified live against the demo host.
Public surface
proxmox.Client (API backend):
- Read: `Version`, `Nodes`, `NodeStatus`, `ListLXC`, `GuestStatus`, `GuestConfig`, `ListStorage`, `NodeStorage`, `StorageContent`
- Async mutating (return a UPID): `RestoreLXC` (primary create path), `Vzdump`, `Snapshot`, `Rollback`, `DeleteSnapshot`, `SetConfig`, `Start`, `Stop`
- Tasks: `WaitTask(ctx, upid, WaitOptions)`, `TaskStatusOnce`, `TaskLogTail`
- Errors: `*APIError` (parses the offending privilege from a 403), `*TaskError` (parses it from a failed task `exitstatus` + log tail)
- Types: `Version`, `Node`, `NodeStatus`, `Guest`, `GuestConfig` (+`Extra`/`MountPoints`/`Nets`), `Storage`, `StorageContent`, `TaskStatus`, `UPID`
proxmox.Privileged (fenced root-CLI; Runner iface, ExecRunner direct/sudo -n): CreateGoldenLXC (keyctl), MountUSBByUUID, SMART, Sensors — each documents why it can't be the API.
API-vs-root routing table
| Backend | Ops | Why |
|---|---|---|
| API | node status, list/status/config guests, storage list+content, task status/log, restore, vzdump, snapshot/rollback/delete-snap, set-config, start/stop | FelhomAgent 16-priv token |
| root-CLI (fenced) | golden pct create (keyctl=1), USB mount-by-UUID/fstab, SMART/sensors | keyctl is root@pam-only; host mounts + SMART aren't API ops |
Fence is structural (Client has no runner, Privileged has no HTTP client) and asserted in routing_test.go.
OPEN-item choices
- Config: JSON file + `FELHOM_AGENT_*` env overrides (stdlib, zero-dep; swappable to `yaml.v3` if YAML house-style is preferred). Token never logged (`Redacted()`).
- Privileged runner / uid: `Runner` iface; `ExecRunner{Mode: sudo|direct}`, default `sudo -n`. Proposed (not finalized): non-root service user + narrow sudoers allowlist for the 3 fenced commands.
- Polling: first poll immediate, then 1s → exponential backoff capped 5s, default total timeout 10m; honors ctx cancellation. Tunable via `WaitOptions`.
- `--selftest=task`: included (gated behind the flag + `-vmid`). Unit-tested via mocks; not run live (the live token was read-only).
- Versioning: `version` var in `main.go` (default `0.1.0`, `-ldflags -X main.version=`), `--version` flag.
What the live host revealed (recorded, not guessed)
- Node name is `demo-felhom`; `felhom-pve` is only the SSH alias.
- `/nodes/{node}/status`: `cpu` is a 0..1 fraction, `loadavg` is an array of strings; `memory`/`rootfs`/`swap` nested.
- `vmid` is an integer in list/status; `status/current` carries no `vmid` (set from the path arg).
- Task: `status` ∈ {running, stopped}, `exitstatus` only once stopped; task log is `[{"n":N,"t":"…"}]`. UPID = `UPID:node:pid(hex):pstart(hex):starttime(hex):worker:id:user:`.
- `pveum user token add … --output-format json` returns `{"value":"…"}`.
- No spike fact failed in practice: 16-priv role, async/UPID model, keyctl boundary, dual-grant privsep all held. Teardown logged `ignore invalid acl token …`, confirming ACL auto-invalidation (phase1-2 §5).
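The UPID layout recorded above can be decoded with a plain split on `:`. The sketch below is illustrative only: `ParsedUPID` and `ParseUPID` are hypothetical names (the package's own `UPID` type may look different), and the sample UPID in `main` is made up rather than taken from the live host.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// ParsedUPID is a hypothetical decoded form of
// UPID:node:pid(hex):pstart(hex):starttime(hex):worker:id:user:
type ParsedUPID struct {
	Node      string
	PID       uint64
	PStart    uint64
	StartTime time.Time
	Worker    string
	ID        string
	User      string
}

// ParseUPID splits the colon-separated fields and decodes the three hex fields.
func ParseUPID(s string) (ParsedUPID, error) {
	parts := strings.Split(s, ":")
	// 8 fields plus the empty string after the trailing colon.
	if len(parts) != 9 || parts[0] != "UPID" || parts[8] != "" {
		return ParsedUPID{}, fmt.Errorf("malformed UPID %q", s)
	}
	pid, err := strconv.ParseUint(parts[2], 16, 64)
	if err != nil {
		return ParsedUPID{}, fmt.Errorf("pid: %w", err)
	}
	pstart, err := strconv.ParseUint(parts[3], 16, 64)
	if err != nil {
		return ParsedUPID{}, fmt.Errorf("pstart: %w", err)
	}
	start, err := strconv.ParseInt(parts[4], 16, 64)
	if err != nil {
		return ParsedUPID{}, fmt.Errorf("starttime: %w", err)
	}
	return ParsedUPID{
		Node:      parts[1],
		PID:       pid,
		PStart:    pstart,
		StartTime: time.Unix(start, 0).UTC(),
		Worker:    parts[5],
		ID:        parts[6],
		User:      parts[7],
	}, nil
}

func main() {
	// Made-up example value; only the field layout follows the report.
	u, err := ParseUPID("UPID:demo-felhom:0003D9E5:0A1B2C3D:65F0A1B2:vzdump:101:root@pam:")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u.Node, u.Worker, u.ID, u.User, u.PID)
}
```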
Verification
- `go build/vet/test` green twice: locally (Go 1.26) and on the build server (Go 1.24.4).
- Live read-only `--selftest` (built on 192.168.0.180, against `https://192.168.0.162:8006`, TLS fingerprint-pinned, no insecure mode): version, nodes, node status, guests, storage all `[ ok ]`. slog confirmed the token rendered as `…=********`. Throwaway token created + torn down.
- Mutating ops + live `WaitTask` are unit-tested only (live run used a read-only token); `--selftest=task` is ready to exercise them against a real `FelhomAgent` token.
Repo state
- Branch: `main` only (feature branch merged + deleted, local & remote). Latest: `chore(agent): add CHANGELOG, version the agent at 0.1.0`.