feat(hub): host-report client + collector + first daemon loop (slice 3, v0.3.0)

internal/hub: the agent's first daemon — a periodic read-only host-report POSTed to
the hub (the heartbeat; no separate ping).

- HostReport wire contract (shared field-for-field with the hub ingest): host
  metrics, guests (vmid + spec), cloudflared status; storage/backups/restore-tests/
  pbs/audit collections DEFINED but emitted empty (slices 5/6 fill).
- Collector over a read-only proxmoxReader (adapted to the real proxmox surface;
  no proxmox changes) + a CloudflaredProber. Partial-failure policy: a NodeStatus
  failure is a hard error (skip the POST); a per-guest GuestConfig failure sets that
  guest's status to "unknown" and the report is still sent.
- Client: Bearer-auth POST, standard TLS (system roots / optional ca_file), typed
  TransportError/HTTPError, token never in errors.
- Loop: immediate first report, adopt hub poll_interval (clamp [60,3600]), resilient
  to collect/report errors, clean ctx-cancel shutdown.
- ControlEnvelope: only poll_interval_seconds acted on; blocked/desired_generation/
  has_signed_ops parsed-but-ignored (slice 4).
- config: HubConfig + FELHOM_AGENT_HUB_* overlay + mode-aware HubConfig.Validate +
  WithDefaults + hub-key redaction; example config updated.
- main: no-selftest mode is now the daemon; added --selftest=hub. Version -> 0.3.0.

Tests: report serialization, client (incl. token-redaction), collector
partial-failure, loop continuation+interval adoption, config. internal/proxmox +
internal/authz untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:20:09 +02:00
parent f0fee7e193
commit ab77fa3544
16 changed files with 1352 additions and 91 deletions
@@ -3,6 +3,62 @@
All notable changes to **felhom-agent** are recorded here. Update on every code
change that gets pushed.
## v0.3.0 — hub client + host-report + first daemon loop (slice 3) (2026-06-08)
The agent's first daemon: a periodic read-only host-report POSTed to the hub (the
heartbeat). No Proxmox mutations, no desired-state/signed-op consumption, no
storage/backup collection yet — those are slices 4/5/6.
### Added
- **`internal/hub`** package:
  - **`HostReport`** wire contract (`report.go`) shared field-for-field with the hub
    ingest: host metrics, guests (`vmid` + spec), `cloudflared` status, and the
    `storage_targets`/`backups`/`restore_tests`/`pbs_snapshots`/`audit_tail`
    collections **defined but emitted empty** (typed `[]`, slices 5/6 fill them).
  - **`Collector`** (`collect.go`) builds the report from a read-only `proxmoxReader`
    (adapted to the real `internal/proxmox` surface: node held by the client, value
    returns, `proxmox.Guest`) + a `CloudflaredProber`. Partial-failure policy: a
    failed `NodeStatus` is a hard error (skip the POST); a failed per-guest
    `GuestConfig` degrades that guest to `status="unknown"` (spec omitted) but still
    sends; a cloudflared probe failure → `"unknown"`, never fatal.
  - **`CloudflaredProber`** + `SystemctlProber` (`systemctl is-active cloudflared`;
    read-only, NOT a Privileged/root op; tunnel management is a later slice).
  - **`Client`** (`client.go`): `POST /api/v1/host-report` with
    `Authorization: Bearer <key>`, standard TLS (system roots or optional `ca_file`;
    verification always on). Typed `*TransportError` / `*HTTPError`; the bearer token
    never appears in any error.
  - **`Loop`** (`loop.go`): the daemon: immediate first report, then tick; adopts the
    hub's `poll_interval_seconds` clamped to [60,3600]; resilient (a collect/report
    error is logged and the loop continues); clean shutdown on context cancel.
  - **`ControlEnvelope`**: only `poll_interval_seconds` is acted on; `blocked` /
    `desired_generation` / `has_signed_ops` are parsed-but-ignored (logged at most)
    pending reconcile (slice 4).
- **Config**: `HubConfig` (url/host_id/api_key/poll_seconds/timeout_seconds/ca_file),
`FELHOM_AGENT_HUB_*` env overlay, `HubConfig.Validate()` (mode-aware — proxmox-only
`--selftest=read|task` still runs without hub config), `WithDefaults()`, and
`Redacted()` now also blanks the hub key. `configs/agent.example.json` gains `hub`
(and `authz`) blocks.
- **`cmd/felhom-agent`**: the no-`--selftest` mode is now the **daemon** (poll loop);
added **`--selftest=hub`** (one collect+report, prints the report + envelope).
Version 0.2.0 → 0.3.0.
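
The collector's partial-failure policy described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in `collect.go`: the `source`, `guest`, `guestSpec`, `guestReport`, and `hostReport` types, the `collect` and `demoSource` helpers, and the `errDemo` value are all hypothetical stand-ins for the real `proxmoxReader`, `CloudflaredProber`, and `HostReport`.

```go
package main

import (
	"errors"
	"fmt"
)

// All types below are hypothetical simplifications for illustration.
type guest struct {
	VMID   int
	Status string
}

type guestSpec struct{ Cores int }

type guestReport struct {
	VMID   int
	Status string
	Spec   *guestSpec // nil when the per-guest config could not be read
}

type hostReport struct {
	CPU         float64
	Guests      []guestReport
	Cloudflared string
}

// source bundles the read-only inputs the collector needs (assumed shape).
type source struct {
	nodeStatus  func() (float64, error)
	guests      []guest
	guestConfig func(vmid int) (guestSpec, error)
	cloudflared func() (string, error)
}

var errDemo = errors.New("demo failure")

// collect applies the policy: a node-status failure is a hard error, a
// per-guest config failure degrades only that guest, and a cloudflared
// probe failure is reported as "unknown" without failing the report.
func collect(s source) (*hostReport, error) {
	cpu, err := s.nodeStatus()
	if err != nil {
		return nil, fmt.Errorf("node status: %w", err) // caller skips the POST
	}
	r := &hostReport{CPU: cpu, Guests: []guestReport{}}
	for _, g := range s.guests {
		gr := guestReport{VMID: g.VMID, Status: g.Status}
		spec, err := s.guestConfig(g.VMID)
		if err != nil {
			gr.Status = "unknown" // spec stays nil, guest is still reported
		} else {
			gr.Spec = &spec
		}
		r.Guests = append(r.Guests, gr)
	}
	state, err := s.cloudflared()
	if err != nil {
		state = "unknown"
	}
	r.Cloudflared = state
	return r, nil
}

// demoSource fails the config read for guest 101 and the cloudflared probe.
func demoSource() source {
	return source{
		nodeStatus: func() (float64, error) { return 0.25, nil },
		guests:     []guest{{VMID: 100, Status: "running"}, {VMID: 101, Status: "running"}},
		guestConfig: func(vmid int) (guestSpec, error) {
			if vmid == 101 {
				return guestSpec{}, errDemo
			}
			return guestSpec{Cores: 2}, nil
		},
		cloudflared: func() (string, error) { return "", errDemo },
	}
}

func main() {
	r, err := collect(demoSource())
	if err != nil {
		fmt.Println("skip POST:", err)
		return
	}
	for _, g := range r.Guests {
		fmt.Println(g.VMID, g.Status, "has spec:", g.Spec != nil)
	}
	fmt.Println("cloudflared:", r.Cloudflared)
}
```

In this sketch only the node-status path returns an error to the caller, which is what would let a loop log the failure and wait for the next tick instead of posting a partial report.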
### Tests
- Report serialization (field names; empty collections are `[]` not `null`; spec
omitted when unknown); client (Bearer header, non-2xx→`*HTTPError`,
transport→`*TransportError`, **token never in error**); collector (host mapping,
guest spec, per-guest failure degrades-but-still-reports, NodeStatus hard error,
cloudflared error→unknown); loop (immediate first report, continuation after an
injected error, interval adoption + clamp); config (hub validate/redact/env).
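
The serialization property tested above (empty collections encode as `[]`, not `null`, and the spec is omitted when unknown) follows from how `encoding/json` treats nil versus empty slices and `omitempty` pointers. A minimal sketch, assuming a cut-down hypothetical `report` type rather than the real `HostReport` (the element type `any` and the helper names are placeholders):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, reduced shapes; field names follow the changelog entry.
type spec struct {
	Cores int `json:"cores"`
}

type guest struct {
	VMID   int    `json:"vmid"`
	Status string `json:"status"`
	Spec   *spec  `json:"spec,omitempty"` // nil pointer is dropped from the output
}

type report struct {
	Guests         []guest `json:"guests"`
	StorageTargets []any   `json:"storage_targets"`
	Backups        []any   `json:"backups"`
}

// newReport initializes every collection to a non-nil empty slice so it
// marshals as [] instead of null.
func newReport() report {
	return report{
		Guests:         []guest{},
		StorageTargets: []any{},
		Backups:        []any{},
	}
}

// encode marshals v and returns the JSON text (panics on error, which
// cannot happen for these plain struct types).
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(report{}))    // nil slices
	fmt.Println(encode(newReport())) // empty, non-nil slices
	fmt.Println(encode(guest{VMID: 100, Status: "unknown"}))
}
```

The zero-value `report{}` is included only to show the `null` output that the constructor avoids.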
### Notes
- `internal/proxmox` and `internal/authz` were **not touched** — no new proxmox
surface was needed (`ListLXC` already exposes status/maxmem/maxdisk; `GuestConfig`
exposes cores). The task's `proxmoxReader` sketch (node-arg/pointer/`LXC`) was
adapted to the real exports as instructed.
- **Defined-but-empty** this slice: `storage_targets`, `backups`, `restore_tests`,
`pbs_snapshots`, `audit_tail` (slices 5/6). **Parsed-but-ignored**: the envelope's
`blocked`/`desired_generation`/`has_signed_ops` (slice 4).
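
As an illustration of the parsed-but-ignored envelope handling, the sketch below decodes all four fields but acts only on `poll_interval_seconds`, clamped to [60,3600]. The Go field types, the `nextPollSeconds` helper, and the choice to keep the current interval when the value is missing, non-positive, or the body is unparseable are assumptions of this sketch, not behavior confirmed by the changelog.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	minPollSeconds = 60
	maxPollSeconds = 3600
)

// controlEnvelope mirrors the field names from the changelog; the Go
// types are assumed for illustration.
type controlEnvelope struct {
	PollIntervalSeconds int   `json:"poll_interval_seconds"`
	Blocked             bool  `json:"blocked"`            // parsed, not acted on
	DesiredGeneration   int64 `json:"desired_generation"` // parsed, not acted on
	HasSignedOps        bool  `json:"has_signed_ops"`     // parsed, not acted on
}

// nextPollSeconds returns the interval to use after a hub response.
// Assumption: an unparseable body or a non-positive value keeps the
// current interval; anything else is clamped to [60,3600].
func nextPollSeconds(current int, body []byte) int {
	var env controlEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return current
	}
	if env.PollIntervalSeconds <= 0 {
		return current
	}
	if env.PollIntervalSeconds < minPollSeconds {
		return minPollSeconds
	}
	if env.PollIntervalSeconds > maxPollSeconds {
		return maxPollSeconds
	}
	return env.PollIntervalSeconds
}

func main() {
	current := 300
	for _, body := range []string{
		`{"poll_interval_seconds":5}`,
		`{"poll_interval_seconds":120,"blocked":true,"has_signed_ops":true}`,
		`{"poll_interval_seconds":86400}`,
		`{}`,
	} {
		fmt.Printf("%s -> %d\n", body, nextPollSeconds(current, []byte(body)))
	}
}
```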
## v0.2.0 — `authz` signed-op verifier (slice 2) (2026-06-08)
Production form of the Phase-4 signing primitive: a key-type-agnostic SSHSIG