docs: reflow CLAUDE.md; unify REPORT/CHANGELOG convention; add no-secrets rule

Also overwrite REPORT.md with the live --selftest=task validation on demo-felhom
(snapshot/rollback/delete on guest 9999, exitstatus=OK under the felhom-agent@pve
privsep token; slice-1 mutating-ops gap closed, slice 4 unblocked). No version bump.
Token secret stored out-of-band, not committed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:54:18 +02:00
parent 11c91a0dde
commit 237452c8c6
3 changed files with 54 additions and 75 deletions
+21 -38
@@ -1,29 +1,21 @@
# CLAUDE.md — `felhom-agent`
> Place at the repo root (`felhom-agent/CLAUDE.md`). Loads when Claude Code touches this repo.
> Keep under ~200 lines. The cross-repo orientation lives in the workspace-root `e:\git\CLAUDE.md`;
> this file is `felhom-agent`-specific.
> Place at the repo root (`felhom-agent/CLAUDE.md`). Loads when Claude Code touches this repo. Keep under ~200 lines. The cross-repo orientation lives in the workspace-root `e:\git\CLAUDE.md`; this file is `felhom-agent`-specific.
## What this repo is
`felhom-agent` is the operator-tier **host agent** that runs on each Proxmox host and owns **all**
Proxmox interaction: provision/restore guests, host storage, backup/restore orchestration, the hub
control loop, and a narrow per-guest local API. It is the **most privilege-sensitive** component.
`felhom-agent` is the operator-tier **host agent** that runs on each Proxmox host and owns **all** Proxmox interaction: provision/restore guests, host storage, backup/restore orchestration, the hub control loop, and a narrow per-guest local API. It is the **most privilege-sensitive** component.
- It is the renamed former `proxmox-controller` repo.
- **Distinct from `felhom-controller`** — that is the *in-guest* controller (Docker-only, no Proxmox
creds). Do not confuse them.
- **Distinct from `felhom-controller`** — that is the *in-guest* controller (Docker-only, no Proxmox creds). Do not confuse them.
- Control plane, not data plane: if the agent dies, apps keep serving; only management degrades.
## Build / run
- Module `gitea.dooplex.hu/admin/felhom-agent`; binary `felhom-agent` (`cmd/felhom-agent/`).
- **Pure Go stdlib + `golang.org/x/crypto` only** — no web frameworks.
- `go.mod` directive **go 1.25.0**; dep `golang.org/x/crypto v0.52.0` (declares go 1.25, will NOT
build on Go 1.24). The **build server (192.168.0.180) runs go1.26.0** (upstream Go on PATH,
backward-compatible). Build/run the agent there for live tests (same LAN as the demo host).
- Version: `version` var in `cmd/felhom-agent/main.go`, overridable via `-ldflags "-X main.version=<v>"`;
`--version` flag. **Current: v0.3.1.** Bump on meaningful changes + add a CHANGELOG entry.
- `go.mod` directive **go 1.25.0**; dep `golang.org/x/crypto v0.52.0` (declares go 1.25, will NOT build on Go 1.24). The **build server (192.168.0.180) runs go1.26.0** (upstream Go on PATH, backward-compatible). Build/run the agent there for live tests (same LAN as the demo host).
- Version: `version` var in `cmd/felhom-agent/main.go`, overridable via `-ldflags "-X main.version=<v>"`; `--version` flag. **Current: v0.3.1.** Bump on meaningful changes + add a CHANGELOG entry.
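A minimal sketch of the version wiring described above, assuming a package-level `version` string in package `main`. The `"dev"` default and the `parseArgs` helper are illustrative assumptions, not the repo's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// version is the compiled-in default ("dev" is an assumed placeholder).
// A release build can override it at link time, for example:
//   go build -ldflags "-X main.version=v0.3.1" ./cmd/felhom-agent/
var version = "dev"

// parseArgs is a hypothetical helper: it reports whether --version was requested.
// Go's flag package accepts both -version and --version.
func parseArgs(args []string) (showVersion bool, err error) {
	fs := flag.NewFlagSet("felhom-agent", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage noise out of this sketch
	fs.BoolVar(&showVersion, "version", false, "print version and exit")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return showVersion, nil
}

func main() {
	show, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if show {
		fmt.Println(version)
	}
}
```

The `-X` linker flag only works on package-level string variables, which is why `version` is a `var` and not a `const`.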
## Layout
@@ -38,15 +30,10 @@ internal/hub/ daemon: HostReport collector + Bearer client + resilient Lo
## Proxmox model (the load-bearing rules)
- **API-first** via a scoped `FelhomAgent` token (16 privileges). Raw root-CLI is **fenced to
exactly 3 exceptions**: keyctl `pct create` (golden image), USB mount/fstab, SMART/sensors.
`Client` never shells out; `Privileged` never makes HTTP calls (asserted by tests). Keep that fence.
- **Every mutating op is async** → returns a UPID → `WaitTask` asserts `exitstatus == "OK"`.
A 200 on the POST is **not** success; authorization can fail at task execution, not the POST.
- **API-first** via a scoped `FelhomAgent` token (16 privileges). Raw root-CLI is **fenced to exactly 3 exceptions**: keyctl `pct create` (golden image), USB mount/fstab, SMART/sensors. `Client` never shells out; `Privileged` never makes HTTP calls (asserted by tests). Keep that fence.
- **Every mutating op is async** → returns a UPID → `WaitTask` asserts `exitstatus == "OK"`. A 200 on the POST is **not** success; authorization can fail at task execution, not the POST.
- **TLS:** SHA-256 leaf-cert pinning (the host serves a self-signed cert). No insecure default.
- **Privsep token gotcha:** a `--privsep 1` token's rights = intersection of the backing user's
perms AND the token's ACLs — so the role must be granted on **both** user and token, or every
call 403s. (Token provisioning is out-of-band / human-run; the agent only consumes the token.)
- **Privsep token gotcha:** a `--privsep 1` token's rights = intersection of the backing user's perms AND the token's ACLs — so the role must be granted on **both** user and token, or every call 403s. (Token provisioning is out-of-band / human-run; the agent only consumes the token.)
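The async-task rule above can be sketched roughly as follows. This is not the repo's `WaitTask`; the `taskStatus` type, the `statusFunc` indirection, and the `runFake` helper are assumptions made so the sketch runs without a Proxmox host. In a real client the fetch would be a GET on the node's task-status endpoint for the returned UPID.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// taskStatus holds the two task-status fields that matter here.
type taskStatus struct {
	Status     string // "running" or "stopped"
	ExitStatus string // populated once stopped; "OK" means success
}

// statusFunc abstracts the HTTP call that fetches a task's status (assumed shape).
type statusFunc func(ctx context.Context, upid string) (taskStatus, error)

// waitTask polls until the task stops, then requires exitstatus == "OK".
// An accepted POST proves nothing: failure can surface only here.
func waitTask(ctx context.Context, fetch statusFunc, upid string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		st, err := fetch(ctx, upid)
		if err != nil {
			return fmt.Errorf("poll %s: %w", upid, err)
		}
		if st.Status == "stopped" {
			if st.ExitStatus != "OK" {
				return fmt.Errorf("task %s failed: exitstatus=%q", upid, st.ExitStatus)
			}
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// runFake drives waitTask with a fake task that runs for two polls and then
// stops with the given exit status. The UPID string is a placeholder.
func runFake(exit string) error {
	polls := 0
	fetch := func(ctx context.Context, upid string) (taskStatus, error) {
		polls++
		if polls < 3 {
			return taskStatus{Status: "running"}, nil
		}
		return taskStatus{Status: "stopped", ExitStatus: exit}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return waitTask(ctx, fetch, "UPID:example", time.Millisecond)
}

func main() {
	fmt.Println("OK task error:", runFake("OK"))
	fmt.Println("denied task error:", runFake("permission denied"))
}
```

Keeping the fetch behind a function type lets a test exercise the success and failure paths without touching a live host.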
## Design + platform facts (read before designing)
@@ -58,19 +45,13 @@ internal/hub/ daemon: HostReport collector + Bearer client + resilient Lo
Built in slices, all on `main`:
- **v0.1.0** slice 1 — scaffold + `internal/proxmox` + `internal/config`/`log` + `--selftest`.
- **v0.2.0** slice 2 — `internal/authz` signed-op verifier.
- **v0.3.0** slice 3 — `internal/hub`: the first **daemon loop** (no-`--selftest` mode) posting a
read-only `HostReport` to the hub (= the heartbeat). Report's storage/backup/restore/pbs/audit
fields are **defined-but-empty** (slices 5/6); the envelope's desired-state/signed-ops fields are
**parsed-but-ignored** (slice 4).
- **v0.3.0** slice 3 — `internal/hub`: the first **daemon loop** (no-`--selftest` mode) posting a read-only `HostReport` to the hub (= the heartbeat). Report's storage/backup/restore/pbs/audit fields are **defined-but-empty** (slices 5/6); the envelope's desired-state/signed-ops fields are **parsed-but-ignored** (slice 4).
- **v0.3.1** — slice-3 validation follow-ups.
- **Next: slice 4 (reconcile + benign/destructive gate)** — the first slice that issues real Proxmox
mutations. **Gated** on passing the live `--selftest=task` runbook first.
- **Next: slice 4 (reconcile + benign/destructive gate)** — the first slice that issues real Proxmox mutations. **Gated** on passing the live `--selftest=task` runbook first.
## Demo host (for live tests)
Node **`demo-felhom`**, API `https://192.168.0.162:8006`, PVE 9.2.2; leaf-cert SHA-256 fingerprint
starts `BA:7C:99:7D:45:D0…` (verify it still matches before a live run — the agent pins it).
`pveum`/`pct` ops need `root@pam` on the PVE (SSH alias `felhom-pve`), which is available to Claude Code.
Node **`demo-felhom`**, API `https://192.168.0.162:8006`, PVE 9.2.2; leaf-cert SHA-256 fingerprint starts `BA:7C:99:7D:45:D0…` (verify it still matches before a live run, because the agent pins it). `pveum`/`pct` ops need `root@pam` on the PVE (SSH alias `felhom-pve`), which is available to Claude Code.
Selftest modes (run from the build server, pointed at the demo API):
- `--selftest` / `--selftest=read` — read-only health checks.
@@ -81,14 +62,16 @@ Selftest modes (run from the build server, pointed at the demo API):
## Conventions
- Push to `main` directly; no feature branches.
- `CHANGELOG.md` (repo root), newest on top, on every pushed change.
- `REPORT.md` (repo root) is **fully overwritten** each task with that task's report (cumulative
history lives in CHANGELOG).
- Code quality: verify generated code for bugs/edge cases; add debug logging; **ask rather than
guess** when you'd otherwise invent input/output.
> **In every repository where you make a change, update both files in that repo:**
> - **`CHANGELOG.md`** — a cumulative log of **all** changes; newest entry on top.
> - **`REPORT.md`** — **overwrite** with a summary of the **most recent** implementation (or significant validation/operational run) only; not cumulative.
>
> **Never write secrets** — tokens, passwords, private keys, API keys — into `CHANGELOG.md`, `REPORT.md`, or any committed file. Reference them as "stored out-of-band" instead.
- Code quality: verify generated code for bugs/edge cases; add debug logging; **ask rather than guess** when you'd otherwise invent input/output.
## Workflow & artifacts
- Implement **`TASK.md` / `TASK-*.md`** specs (when placed as `TASK.md` or told to implement one),
then push + CHANGELOG + REPORT.md.
- **`RUNBOOK-*.md`** — an operational procedure. CC executes the steps it has access and capability for, including live validation on the demo nodes and the demo Proxmox host (CC has root@felhom-pve SSH + the felhom-agent token). A step is human-only only when it genuinely needs physical presence, a real-world decision, or credentials CC truly lacks — mark those steps HUMAN. Do not decline a whole procedure because it touches a live host or a privileged token. (Judgment still applies: confirm before irreversible ops on real customer data — but demo scratch guests are fair game.)
- Implement **`TASK.md` / `TASK-*.md`** specs (when placed as `TASK.md` or told to implement one), then push + CHANGELOG + REPORT.md.
- **`RUNBOOK-*.md`** — an operational procedure. CC executes the steps it has access and capability for, including live validation on the demo nodes and the demo Proxmox host (CC has root@felhom-pve SSH + the felhom-agent token). A step is human-only only when it genuinely needs physical presence, a real-world decision, or credentials CC truly lacks — mark those steps HUMAN. Do not decline a whole procedure because it touches a live host or a privileged token. (Judgment still applies: confirm before irreversible ops on real customer data — but demo scratch guests are fair game.)