Files
felhom-agent/REPORT.md
T
2026-06-08 14:47:38 +02:00

54 lines
4.2 KiB
Markdown

# felhom-agent — latest task report
> This file holds the report for the **most recent** change, fully overwritten each task.
> Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).
## Task: Agent scaffold + `proxmox` interaction package (slice 1) — v0.1.0
Stood up the host-agent project and its foundation — the typed `proxmox` interaction
layer every other agent module will call — with a runnable read-only `--selftest`.
Pushed to `main` (main-only repo). Build/vet/test green; verified live against the demo host.
### Public surface
**`proxmox.Client`** (API backend):
- Read: `Version`, `Nodes`, `NodeStatus`, `ListLXC`, `GuestStatus`, `GuestConfig`, `ListStorage`, `NodeStorage`, `StorageContent`
- Async mutating (return a UPID): `RestoreLXC` (primary create path), `Vzdump`, `Snapshot`, `Rollback`, `DeleteSnapshot`, `SetConfig`, `Start`, `Stop`
- Tasks: `WaitTask(ctx, upid, WaitOptions)`, `TaskStatusOnce`, `TaskLogTail`
- Errors: `*APIError` (parses the offending privilege from a 403), `*TaskError` (parses it from a failed task `exitstatus` + log tail)
- Types: `Version, Node, NodeStatus, Guest, GuestConfig (+Extra/MountPoints/Nets), Storage, StorageContent, TaskStatus, UPID`
**`proxmox.Privileged`** (fenced root-CLI; `Runner` iface, `ExecRunner` direct/`sudo -n`): `CreateGoldenLXC` (keyctl), `MountUSBByUUID`, `SMART`, `Sensors` — each documents *why it can't be the API*.
### API-vs-root routing table
| Backend | Ops | Why |
|---|---|---|
| **API** | node status, list/status/config guests, storage list+content, task status/log, **restore**, vzdump, snapshot/rollback/delete-snap, set-config, start/stop | FelhomAgent 16-priv token |
| **root-CLI (fenced)** | golden `pct create` (keyctl=1), USB mount-by-UUID/fstab, SMART/sensors | keyctl is `root@pam`-only; host mounts + SMART aren't API ops |
Fence is **structural** (`Client` has no runner, `Privileged` has no HTTP client) and asserted in `routing_test.go`.
### OPEN-item choices
- **Config:** JSON file + `FELHOM_AGENT_*` env overrides (stdlib, zero-dep; swappable to `yaml.v3` if YAML house-style is preferred). Token never logged (`Redacted()`).
- **Privileged runner / uid:** `Runner` iface; `ExecRunner{Mode: sudo|direct}`, default `sudo -n`. Proposed (not finalized): non-root service user + narrow sudoers allowlist for the 3 fenced commands.
- **Polling:** first poll immediate, then 1s → exponential backoff capped 5s, default total timeout 10m; honors ctx cancellation. Tunable via `WaitOptions`.
- **`--selftest=task`:** included (gated behind the flag + `-vmid`). Unit-tested via mocks; not run live (the live token was read-only).
- **Versioning:** `version` var in `main.go` (default `0.1.0`, `-ldflags -X main.version=`), `--version` flag.
### What the live host revealed (recorded, not guessed)
- Node name is **`demo-felhom`**; `felhom-pve` is only the SSH alias.
- `/nodes/{node}/status`: `cpu` is a 0..1 fraction, **`loadavg` is an array of strings**; `memory`/`rootfs`/`swap` nested.
- `vmid` is an **integer** in list/status; `status/current` carries no `vmid` (set from the path arg).
- Task: `status` ∈ {running, stopped}, `exitstatus` only once stopped; task log is `[{"n":N,"t":"…"}]`. UPID = `UPID:node:pid(hex):pstart(hex):starttime(hex):worker:id:user:`.
- `pveum user token add … --output-format json` returns `{"value":"…"}`.
- **No spike fact failed in practice** — 16-priv role, async/UPID model, keyctl boundary, dual-grant privsep all held. Teardown logged `ignore invalid acl token …`, confirming ACL auto-invalidation (phase1-2 §5).
### Verification
- `go build/vet/test` green twice: locally (Go 1.26) and on the build server (Go 1.24.4).
- **Live read-only `--selftest`** (built on 192.168.0.180, against `https://192.168.0.162:8006`, **TLS fingerprint-pinned** — no insecure mode): version, nodes, node status, guests, storage all `[ ok ]`. slog confirmed the token rendered as `…=********`. Throwaway token created + torn down.
- Mutating ops + live `WaitTask` are unit-tested only (live run used a read-only token); `--selftest=task` is ready to exercise them against a real `FelhomAgent` token.
### Repo state
- Branch: `main` only (feature branch merged + deleted, local & remote). Latest: `chore(agent): add CHANGELOG, version the agent at 0.1.0`.