From 11c91a0ddedea5dfa86a9d73acacfc48ec266bf6 Mon Sep 17 00:00:00 2001
From: kisfenyo
Date: Mon, 8 Jun 2026 20:07:52 +0200
Subject: [PATCH] upodate

---
 CLAUDE.md | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100644 CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..0d8f59b
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,94 @@
+# CLAUDE.md — `felhom-agent`
+
+> Place at the repo root (`felhom-agent/CLAUDE.md`). Loads when Claude Code touches this repo.
+> Keep under ~200 lines. The cross-repo orientation lives in the workspace-root `e:\git\CLAUDE.md`;
+> this file is `felhom-agent`-specific.
+
+## What this repo is
+
+`felhom-agent` is the operator-tier **host agent** that runs on each Proxmox host and owns **all**
+Proxmox interaction: provision/restore guests, host storage, backup/restore orchestration, the hub
+control loop, and a narrow per-guest local API. It is the **most privilege-sensitive** component.
+
+- It is the renamed former `proxmox-controller` repo.
+- **Distinct from `felhom-controller`** — that is the *in-guest* controller (Docker-only, no Proxmox
+  creds). Do not confuse them.
+- Control plane, not data plane: if the agent dies, apps keep serving; only management degrades.
+
+## Build / run
+
+- Module `gitea.dooplex.hu/admin/felhom-agent`; binary `felhom-agent` (`cmd/felhom-agent/`).
+- **Pure Go stdlib + `golang.org/x/crypto` only** — no web frameworks.
+- `go.mod` directive **go 1.25.0**; dep `golang.org/x/crypto v0.52.0` (declares go 1.25, will NOT
+  build on Go 1.24). The **build server (192.168.0.180) runs go1.26.0** (upstream Go on PATH,
+  backward-compatible). Build/run the agent there for live tests (same LAN as the demo host).
+- Version: `version` var in `cmd/felhom-agent/main.go`, overridable via `-ldflags "-X main.version="`;
+  `--version` flag. **Current: v0.3.1.** Bump on meaningful changes + add a CHANGELOG entry.
+
+## Layout
+
+```
+cmd/felhom-agent/   main + flag handling + --selftest modes + the daemon entry
+internal/config/    JSON config + FELHOM_AGENT_* env overlay; secrets redacted (Redacted())
+internal/log/       slog setup
+internal/proxmox/   API-first Client + fenced root-CLI Privileged + UPID WaitTask
+internal/authz/     operator signed-op verifier (SSHSIG); durable FileNonceStore
+internal/hub/       daemon: HostReport collector + Bearer client + resilient Loop
+```
+
+## Proxmox model (the load-bearing rules)
+
+- **API-first** via a scoped `FelhomAgent` token (16 privileges). Raw root-CLI is **fenced to
+  exactly 3 exceptions**: keyctl `pct create` (golden image), USB mount/fstab, SMART/sensors.
+  `Client` never shells out; `Privileged` never makes HTTP calls (asserted by tests). Keep that fence.
+- **Every mutating op is async** → returns a UPID → `WaitTask` asserts `exitstatus == "OK"`.
+  A 200 on the POST is **not** success; authorization can fail at task execution, not the POST.
+- **TLS:** SHA-256 leaf-cert pinning (the host serves a self-signed cert). No insecure default.
+- **Privsep token gotcha:** a `--privsep 1` token's rights = intersection of the backing user's
+  perms AND the token's ACLs — so the role must be granted on **both** user and token, or every
+  call 403s. (Token provisioning is out-of-band / human-run; the agent only consumes the token.)
+
+## Design + platform facts (read before designing)
+
+- Design doc: `felhom.eu/documentation/architecture/03-host-agent.md` (locked).
+- Platform facts: `felhom.eu/documentation/proxmox-platform.md` + `tests/phase{0,1-2,3,4}-findings.md`.
+
+## Current state
+
+Built in slices, all on `main`:
+- **v0.1.0** slice 1 — scaffold + `internal/proxmox` + `internal/config`/`log` + `--selftest`.
+- **v0.2.0** slice 2 — `internal/authz` signed-op verifier.
+- **v0.3.0** slice 3 — `internal/hub`: the first **daemon loop** (no-`--selftest` mode) posting a
+  read-only `HostReport` to the hub (= the heartbeat). Report's storage/backup/restore/pbs/audit
+  fields are **defined-but-empty** (slices 5/6); the envelope's desired-state/signed-ops fields are
+  **parsed-but-ignored** (slice 4).
+- **v0.3.1** — slice-3 validation follow-ups.
+- **Next: slice 4 (reconcile + benign/destructive gate)** — the first slice that issues real Proxmox
+  mutations. **Gated** on passing the live `--selftest=task` runbook first.
+
+## Demo host (for live tests)
+
+Node **`demo-felhom`**, API `https://192.168.0.162:8006`, PVE 9.2.2; leaf-cert SHA-256 fingerprint
+starts `BA:7C:99:7D:45:D0…` (verify it still matches before a live run — the agent pins it).
+`pveum`/`pct` ops need `root@pam` on the PVE (SSH alias `felhom-pve`) - available to Claude Code
+
+Selftest modes (run from the build server, pointed at the demo API):
+- `--selftest` / `--selftest=read` — read-only health checks.
+- `--selftest=task -vmid N` — reversible snapshot→rollback→delete on guest N (gated; never under bare `--selftest`).
+- `--selftest=hub` — one collect + report round-trip to the hub.
+- No flag → the **daemon** (poll loop); requires `hub` config.
+
+## Conventions
+
+- Push to `main` directly; no feature branches.
+- `CHANGELOG.md` (repo root), newest on top, on every pushed change.
+- `REPORT.md` (repo root) is **fully overwritten** each task with that task's report (cumulative
+  history lives in CHANGELOG).
+- Code quality: verify generated code for bugs/edge cases; add debug logging; **ask rather than
+  guess** when you'd otherwise invent input/output.
+
+## Workflow & artifacts
+
+- Implement **`TASK.md` / `TASK-*.md`** specs (when placed as `TASK.md` or told to implement one),
+  then push + CHANGELOG + REPORT.md.
+- **`RUNBOOK-*.md`** — an operational procedure. CC executes the steps it has access and capability for, including live validation on the demo nodes and the demo Proxmox host (CC has root@felhom-pve SSH + the felhom-agent token). A step is human-only only when it genuinely needs physical presence, a real-world decision, or credentials CC truly lacks — mark those steps HUMAN. Do not decline a whole procedure because it touches a live host or a privileged token. (Judgment still applies: confirm before irreversible ops on real customer data — but demo scratch guests are fair game.)
\ No newline at end of file