felhom-agent/REPORT.md
admin ab77fa3544 feat(hub): host-report client + collector + first daemon loop (slice 3, v0.3.0)
internal/hub: the agent's first daemon — a periodic read-only host-report POSTed to
the hub (the heartbeat; no separate ping).

- HostReport wire contract (shared field-for-field with the hub ingest): host
  metrics, guests (vmid + spec), cloudflared status; storage/backups/restore-tests/
  pbs/audit collections DEFINED but emitted empty (slices 5/6 fill).
- Collector over a read-only proxmoxReader (adapted to the real proxmox surface;
  no proxmox changes) + a CloudflaredProber. Partial-failure: NodeStatus fail = hard
  (skip POST); per-guest GuestConfig fail = status "unknown", still report.
- Client: Bearer-auth POST, standard TLS (system roots / optional ca_file), typed
  TransportError/HTTPError, token never in errors.
- Loop: immediate first report, adopt hub poll_interval (clamp [60,3600]), resilient
  to collect/report errors, clean ctx-cancel shutdown.
- ControlEnvelope: only poll_interval_seconds acted on; blocked/desired_generation/
  has_signed_ops parsed-but-ignored (slice 4).
- config: HubConfig + FELHOM_AGENT_HUB_* overlay + mode-aware HubConfig.Validate +
  WithDefaults + hub-key redaction; example config updated.
- main: no-selftest mode is now the daemon; added --selftest=hub. Version -> 0.3.0.

Tests: report serialization, client (incl. token-redaction), collector partial-
failure, loop continuation+interval adoption, config. internal/proxmox + internal/
authz untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:20:09 +02:00


felhom-agent — latest task report

This file holds the report for the most recent change, fully overwritten each task. Cumulative history lives in CHANGELOG.md.

Task: hub client + host-report + first daemon loop (slice 3) — v0.3.0

The agent's first daemon: a periodic, read-only host-report POSTed to the hub — which is the heartbeat (its server-side received_at is the dead-man's-switch signal). New internal/hub package + config additions + main.go daemon wiring. Pushed to main; build/vet/test green locally (go1.26) and on the build server.

internal/hub public surface

  • HostReport + sub-types (HostMetrics, Guest, GuestSpec, Cloudflared, ControlEnvelope) — the JSON wire contract shared field-for-field with the hub ingest.
  • Collector: NewCollector(px proxmoxReader, cf CloudflaredProber, hostID, agentVersion, logger); Collect(ctx) (*HostReport, error).
  • CloudflaredProber interface + SystemctlProber (systemctl is-active).
  • Client: NewClient(cfg config.HubConfig, logger) (*Client, error); Report(ctx, *HostReport) (*ControlEnvelope, error); typed *TransportError / *HTTPError.
  • Loop: NewLoop(collector, client, interval, logger); Run(ctx) error. Constants MinPollSeconds=60 / MaxPollSeconds=3600.
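
A minimal sketch of how a caller might tell the two typed client errors apart. Only the type names TransportError and HTTPError come from this report; the fields (Status, Err), the Unwrap method, and the classify helper are assumptions made for illustration and may not match internal/hub.

```go
package main

import (
	"errors"
	"fmt"
)

// HTTPError and TransportError reuse the type names from the report.
// Their fields are assumptions made for this sketch.
type HTTPError struct{ Status int }

func (e *HTTPError) Error() string { return fmt.Sprintf("hub returned HTTP %d", e.Status) }

type TransportError struct{ Err error }

func (e *TransportError) Error() string { return "hub transport: " + e.Err.Error() }
func (e *TransportError) Unwrap() error { return e.Err }

// classify is a hypothetical helper: it maps an error from a report call
// to a short label, looking through any wrapping with errors.As.
func classify(err error) string {
	var he *HTTPError
	var te *TransportError
	switch {
	case err == nil:
		return "ok"
	case errors.As(err, &he):
		return fmt.Sprintf("http-%d", he.Status)
	case errors.As(err, &te):
		return "transport"
	default:
		return "other"
	}
}

func main() {
	wrapped := fmt.Errorf("report: %w", &TransportError{Err: errors.New("connection refused")})
	fmt.Println(classify(nil))
	fmt.Println(classify(&HTTPError{Status: 503}))
	fmt.Println(classify(wrapped))
}
```

Keeping the status code on the error value lets a loop log "hub 5xx" separately from a network failure without parsing error strings.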

Config additions (internal/config)

  • HubConfig{URL, HostID, APIKey, PollSeconds, TimeoutSeconds, CAFile} on Config.Hub.
  • FELHOM_AGENT_HUB_{URL,HOST_ID,API_KEY,POLL_SECONDS,TIMEOUT_SECONDS,CA_FILE} overlay (int parse errors warn to stderr + keep file value, never crash).
  • HubConfig.Validate() (mode-aware — proxmox-only selftests unaffected; https required except loopback for tests), HubConfig.WithDefaults() (900s/30s), Redacted() blanks the key.
  • configs/agent.example.json gains hub (and authz) blocks.
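
The overlay, defaults, and redaction rules above can be sketched roughly as follows. The field names and the 900s/30s defaults come from this report; the field types, the overlayInt helper, treating zero as "unset", and placing Redacted on HubConfig are assumptions of the sketch, not the actual internal/config code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// HubConfig uses the field names from the report; the field types are assumed.
type HubConfig struct {
	URL, HostID, APIKey         string
	PollSeconds, TimeoutSeconds int
	CAFile                      string
}

// overlayInt (hypothetical) returns the integer value of the named variable,
// or fileValue when the variable is unset, empty, or not an integer.
// A parse failure only warns on stderr; it never aborts.
func overlayInt(name string, fileValue int, lookup func(string) (string, bool)) int {
	raw, ok := lookup(name)
	if !ok || raw == "" {
		return fileValue
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		fmt.Fprintf(os.Stderr, "warning: %s=%q is not an integer; keeping %d\n", name, raw, fileValue)
		return fileValue
	}
	return n
}

// WithDefaults fills zero intervals with 900s poll / 30s timeout.
// Treating zero as "unset" is an assumption of this sketch.
func (c HubConfig) WithDefaults() HubConfig {
	if c.PollSeconds == 0 {
		c.PollSeconds = 900
	}
	if c.TimeoutSeconds == 0 {
		c.TimeoutSeconds = 30
	}
	return c
}

// Redacted returns a copy that is safe to log: the API key is blanked.
func (c HubConfig) Redacted() HubConfig {
	c.APIKey = ""
	return c
}

func main() {
	cfg := HubConfig{URL: "https://hub.example", HostID: "host-1", APIKey: "secret"}
	cfg.PollSeconds = overlayInt("FELHOM_AGENT_HUB_POLL_SECONDS", cfg.PollSeconds, os.LookupEnv)
	cfg = cfg.WithDefaults()
	fmt.Printf("%+v\n", cfg.Redacted())
}
```

Passing the lookup function in (os.LookupEnv in main) keeps the overlay testable without touching the process environment.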

Daemon-loop behaviour (main.go)

  • No --selftest flag → daemon: validate proxmox + hub config → build read-path proxmox client, collector, hub client, loop → signal.NotifyContext(SIGINT, SIGTERM) → loop.Run.
  • Immediate first report, then tick at the interval; adopt the hub's poll_interval_seconds (clamped [60,3600], reset the ticker on change).
  • Resilient: any collect/report error is logged and the loop continues (survives hub 5xx and transient proxmox read errors). Clean nil return on context cancel.
  • --selftest=hub: one collect + report; prints the report it would send + the envelope.
  • Startup line logs host_id/url/interval with the key redacted; no secret ever logged.
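
The loop behaviour described above (immediate first report, interval adoption with clamping, continue on error, nil on cancel) can be sketched as below. MinPollSeconds and MaxPollSeconds are from this report; clampPoll, cycleFunc, run, and demo are hypothetical names, and the real Loop.Run may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"
)

const (
	MinPollSeconds = 60
	MaxPollSeconds = 3600
)

// clampPoll forces a hub-suggested interval into [60, 3600] seconds.
func clampPoll(seconds int) int {
	if seconds < MinPollSeconds {
		return MinPollSeconds
	}
	if seconds > MaxPollSeconds {
		return MaxPollSeconds
	}
	return seconds
}

// cycleFunc stands in for one collect + report. It returns the hub's
// suggested poll interval in seconds (0 means "no suggestion").
type cycleFunc func(ctx context.Context) (int, error)

// run reports immediately, then once per tick. Errors are logged and the
// loop continues; context cancellation is a clean shutdown (nil).
func run(ctx context.Context, interval time.Duration, cycle cycleFunc) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if suggested, err := cycle(ctx); err != nil {
			log.Printf("cycle failed, continuing: %v", err)
		} else if suggested > 0 {
			next := time.Duration(clampPoll(suggested)) * time.Second
			if next != interval {
				interval = next
				ticker.Reset(interval) // adopt the new interval
			}
		}
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
		}
	}
}

// demo runs a few fast cycles, injects one error, then cancels.
func demo() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	calls := 0
	err := run(ctx, 5*time.Millisecond, func(context.Context) (int, error) {
		calls++
		if calls == 2 {
			return 0, errors.New("injected hub 503")
		}
		if calls >= 3 {
			cancel()
		}
		return 0, nil
	})
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls >= 3, err)
}
```

Running the cycle before the first select is what makes the first report immediate; the ticker only governs the waits between later cycles.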

Explicitly deferred (defined now, not active)

  • Defined-but-EMPTY this slice (slices 5/6 fill): storage_targets, backups, restore_tests, pbs_snapshots, audit_tail — emitted as typed empty [].
  • Parsed-but-IGNORED (slice 4 / reconcile consumes): the envelope's blocked, desired_generation, has_signed_ops — logged at most, never acted on.
  • No per-guest work queue (zero Proxmox mutations this slice); no canonical JSON (nothing signs the report); no controller_version (slice 8) — emitted "".
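
A small sketch of why the deferred collections need explicit initialisation to serialize as [] rather than null. The JSON keys are the ones named above; the deferredSections struct, its element type (json.RawMessage as a placeholder), and the helper names are assumptions, since the real element types are not defined in this report.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deferredSections is a hypothetical cut-down view of the report's
// not-yet-filled fields. json.RawMessage is only a placeholder element type.
type deferredSections struct {
	StorageTargets    []json.RawMessage `json:"storage_targets"`
	Backups           []json.RawMessage `json:"backups"`
	RestoreTests      []json.RawMessage `json:"restore_tests"`
	PBSSnapshots      []json.RawMessage `json:"pbs_snapshots"`
	AuditTail         []json.RawMessage `json:"audit_tail"`
	ControllerVersion string            `json:"controller_version"` // no omitempty: emitted as ""
}

// newDeferredSections initialises every slice to empty-but-non-nil.
// encoding/json writes a nil slice as null and an empty slice as [].
func newDeferredSections() deferredSections {
	return deferredSections{
		StorageTargets: []json.RawMessage{},
		Backups:        []json.RawMessage{},
		RestoreTests:   []json.RawMessage{},
		PBSSnapshots:   []json.RawMessage{},
		AuditTail:      []json.RawMessage{},
	}
}

// encode marshals v and returns the JSON text (or the error text).
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(newDeferredSections())) // every collection is []
	fmt.Println(encode(deferredSections{}))    // zero value: every collection is null
}
```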

proxmox surface

No changes to internal/proxmox or internal/authz. No new proxmox surface was needed: ListLXC already returns status/maxmem/maxdisk and GuestConfig returns cores. The task's proxmoxReader sketch (node-arg / pointer returns / LXC type) was adapted to the real exports (Node() on the client, value returns, proxmox.Guest), per its instruction.

Test matrix (all green)

  • report: field names match §4; empty collections serialize as [] not null; spec omitted when unknown.
  • client: sets Bearer; non-2xx → *HTTPError (status preserved); transport → *TransportError; asserts the bearer token never appears in any error string.
  • collector: NodeStatus → host block; ListLXC + GuestConfig → guest spec; a failing GuestConfig → status="unknown" + omitted spec + still returns a report; a failing NodeStatus → hard error; cloudflared probe error → "unknown".
  • loop: immediate first report; continues after an injected report error (≥3 cycles); adopts + clamps the envelope interval (cycle-level) and applies a slower interval in Run.
  • config: hub validate cases, key redaction, env overlay + defaults.
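
The collector's partial-failure rules exercised by these tests can be illustrated with a fake reader. Everything here is a simplified stand-in: the reader interface, its method signatures, and the guest/report types are hypothetical and do not reproduce the real proxmoxReader or proxmox.Guest. Treating a ListLXC failure as hard is also an assumption; the report does not state it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type guest struct {
	VMID   int
	Status string
}

type guestSpec struct{ Cores int }

type reportedGuest struct {
	VMID   int
	Status string
	Spec   *guestSpec // nil means the spec is omitted
}

type report struct {
	CPULoad float64
	Guests  []reportedGuest
}

// reader is a hypothetical read-only view of the Proxmox client.
type reader interface {
	NodeStatus(ctx context.Context) (cpuLoad float64, err error)
	ListLXC(ctx context.Context) ([]guest, error)
	GuestConfig(ctx context.Context, vmid int) (guestSpec, error)
}

func collect(ctx context.Context, r reader) (*report, error) {
	load, err := r.NodeStatus(ctx)
	if err != nil {
		return nil, fmt.Errorf("node status: %w", err) // hard failure: no report
	}
	guests, err := r.ListLXC(ctx)
	if err != nil {
		return nil, fmt.Errorf("list lxc: %w", err) // assumed hard in this sketch
	}
	rep := &report{CPULoad: load, Guests: make([]reportedGuest, 0, len(guests))}
	for _, g := range guests {
		out := reportedGuest{VMID: g.VMID, Status: g.Status}
		if spec, err := r.GuestConfig(ctx, g.VMID); err != nil {
			out.Status = "unknown" // soft failure: keep the guest, drop the spec
		} else {
			out.Spec = &spec
		}
		rep.Guests = append(rep.Guests, out)
	}
	return rep, nil
}

// fakeReader injects failures without a live Proxmox node.
type fakeReader struct {
	nodeFails bool
	badConfig map[int]bool
}

func (f fakeReader) NodeStatus(context.Context) (float64, error) {
	if f.nodeFails {
		return 0, errors.New("node down")
	}
	return 0.25, nil
}

func (f fakeReader) ListLXC(context.Context) ([]guest, error) {
	return []guest{{VMID: 101, Status: "running"}, {VMID: 102, Status: "running"}}, nil
}

func (f fakeReader) GuestConfig(_ context.Context, vmid int) (guestSpec, error) {
	if f.badConfig[vmid] {
		return guestSpec{}, errors.New("config read failed")
	}
	return guestSpec{Cores: 2}, nil
}

// demo runs one collection against the given fake.
func demo(f fakeReader) (*report, error) { return collect(context.Background(), f) }

func main() {
	rep, err := demo(fakeReader{badConfig: map[int]bool{102: true}})
	fmt.Println(err, rep.Guests[1].Status, rep.Guests[1].Spec == nil)
	_, err = demo(fakeReader{nodeFails: true})
	fmt.Println(err != nil)
}
```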

Verification

  • go build/vet/test green locally (go1.26.0) and on the build server (go1.26.0). No live hub or systemctl in unit tests (mock transport + fake prober/collector/reporter).

Repo state

  • Branch: main only. Version 0.3.0. Dep unchanged (golang.org/x/crypto v0.52.0).