internal/hub: the agent's first daemon, a periodic read-only host-report POSTed to the hub (the heartbeat; no separate ping).

- HostReport wire contract (shared field-for-field with the hub ingest): host metrics, guests (vmid + spec), cloudflared status; storage/backups/restore-tests/pbs/audit collections DEFINED but emitted empty (slices 5/6 fill).
- Collector over a read-only proxmoxReader (adapted to the real proxmox surface; no proxmox changes) + a CloudflaredProber. Partial failure: NodeStatus fail = hard (skip POST); per-guest GuestConfig fail = status "unknown", still report.
- Client: Bearer-auth POST, standard TLS (system roots / optional ca_file), typed TransportError/HTTPError, token never in errors.
- Loop: immediate first report, adopt hub poll_interval (clamp [60,3600]), resilient to collect/report errors, clean ctx-cancel shutdown.
- ControlEnvelope: only poll_interval_seconds acted on; blocked/desired_generation/has_signed_ops parsed but ignored (slice 4).
- config: HubConfig + FELHOM_AGENT_HUB_* overlay + mode-aware HubConfig.Validate + WithDefaults + hub-key redaction; example config updated.
- main: no-selftest mode is now the daemon; added --selftest=hub. Version -> 0.3.0.

Tests: report serialization, client (incl. token redaction), collector partial failure, loop continuation + interval adoption, config. internal/proxmox + internal/authz untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
felhom-agent — latest task report
This file holds the report for the most recent change, fully overwritten each task. Cumulative history lives in CHANGELOG.md.
Task: hub client + host-report + first daemon loop (slice 3) — v0.3.0
The agent's first daemon: a periodic, read-only host-report POSTed to the hub — which
is the heartbeat (its server-side received_at is the dead-man's-switch signal). New
internal/hub package + config additions + main.go daemon wiring. Pushed to main;
build/vet/test green locally (go1.26) and on the build server.
internal/hub public surface
- HostReport + sub-types (HostMetrics, Guest, GuestSpec, Cloudflared, ControlEnvelope): the JSON wire contract, shared field-for-field with the hub ingest.
- Collector: NewCollector(px proxmoxReader, cf CloudflaredProber, hostID, agentVersion, logger); Collect(ctx) (*HostReport, error).
- CloudflaredProber interface + SystemctlProber (systemctl is-active).
- Client: NewClient(cfg config.HubConfig, logger) (*Client, error); Report(ctx, *HostReport) (*ControlEnvelope, error); typed *TransportError / *HTTPError.
- Loop: NewLoop(collector, client, interval, logger); Run(ctx) error. Constants MinPollSeconds=60 / MaxPollSeconds=3600.
Config additions (internal/config)
- HubConfig{URL, HostID, APIKey, PollSeconds, TimeoutSeconds, CAFile} on Config.Hub.
- FELHOM_AGENT_HUB_{URL,HOST_ID,API_KEY,POLL_SECONDS,TIMEOUT_SECONDS,CA_FILE} overlay (int parse errors warn to stderr and keep the file value; never crash).
- HubConfig.Validate() (mode-aware, so proxmox-only selftests are unaffected; https is required, except for loopback URLs used by tests), HubConfig.WithDefaults() (poll 900s / timeout 30s), Redacted() blanks the key.
- configs/agent.example.json gains hub (and authz) blocks.
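A minimal sketch of the env overlay, defaults, redaction and the https-or-loopback rule, assuming the field list above. The JSON tags, the injected getenv (used so the sketch is testable without touching the real environment), the helper names overlayEnv and validateHubURL, and treating the hostname localhost as loopback are all assumptions. The sketch only covers the URL check, not the mode-awareness of the real Validate(), and the report does not say whether Redacted() lives on HubConfig or on the top-level Config.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"os"
	"strconv"
)

// HubConfig mirrors the fields listed above; JSON tags are assumptions.
type HubConfig struct {
	URL            string `json:"url"`
	HostID         string `json:"host_id"`
	APIKey         string `json:"api_key"`
	PollSeconds    int    `json:"poll_seconds"`
	TimeoutSeconds int    `json:"timeout_seconds"`
	CAFile         string `json:"ca_file"`
}

// overlayEnv applies FELHOM_AGENT_HUB_* variables on top of the file values.
// A malformed integer is reported on stderr and the file value is kept.
func overlayEnv(c *HubConfig, getenv func(string) string) {
	str := func(name string, dst *string) {
		if v := getenv("FELHOM_AGENT_HUB_" + name); v != "" {
			*dst = v
		}
	}
	num := func(name string, dst *int) {
		v := getenv("FELHOM_AGENT_HUB_" + name)
		if v == "" {
			return
		}
		n, err := strconv.Atoi(v)
		if err != nil {
			fmt.Fprintf(os.Stderr, "warning: FELHOM_AGENT_HUB_%s=%q is not an integer; keeping %d\n", name, v, *dst)
			return
		}
		*dst = n
	}
	str("URL", &c.URL)
	str("HOST_ID", &c.HostID)
	str("API_KEY", &c.APIKey)
	num("POLL_SECONDS", &c.PollSeconds)
	num("TIMEOUT_SECONDS", &c.TimeoutSeconds)
	str("CA_FILE", &c.CAFile)
}

// WithDefaults fills unset values: 900s poll interval, 30s request timeout.
func (c HubConfig) WithDefaults() HubConfig {
	if c.PollSeconds <= 0 {
		c.PollSeconds = 900
	}
	if c.TimeoutSeconds <= 0 {
		c.TimeoutSeconds = 30
	}
	return c
}

// Redacted returns a copy with the API key blanked, safe to log.
func (c HubConfig) Redacted() HubConfig {
	c.APIKey = ""
	return c
}

// validateHubURL requires https, allowing plain http only for loopback hosts.
func validateHubURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return fmt.Errorf("hub url %q is not an absolute URL", raw)
	}
	if u.Scheme == "https" {
		return nil
	}
	host := u.Hostname()
	if ip := net.ParseIP(host); u.Scheme == "http" && (host == "localhost" || (ip != nil && ip.IsLoopback())) {
		return nil
	}
	return fmt.Errorf("hub url must use https (http is accepted only for loopback)")
}

func main() {
	env := map[string]string{
		"FELHOM_AGENT_HUB_URL":          "https://hub.example.internal", // hypothetical hub
		"FELHOM_AGENT_HUB_POLL_SECONDS": "fifteen",                      // malformed: warn, keep file value
	}
	cfg := HubConfig{HostID: "pve-01", APIKey: "k-123", PollSeconds: 300} // as if read from the file
	overlayEnv(&cfg, func(k string) string { return env[k] })
	cfg = cfg.WithDefaults()
	fmt.Printf("%+v valid=%v\n", cfg.Redacted(), validateHubURL(cfg.URL) == nil)
}
```

Here the malformed FELHOM_AGENT_HUB_POLL_SECONDS produces a stderr warning and the file's 300 survives, while the unset timeout falls back to 30. Because Redacted() works on a value copy, the live config keeps its key.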
Daemon-loop behaviour (main.go)
- No --selftest flag → daemon: validate proxmox + hub config → build read-path proxmox client, collector, hub client, loop → signal.NotifyContext(SIGINT, SIGTERM) → loop.Run.
- Immediate first report, then tick at the interval; adopt the hub's poll_interval_seconds (clamped to [60,3600]; the ticker is reset on change).
- Resilient: any collect/report error is logged and the loop continues (survives hub 5xx and transient proxmox read errors). Clean nil return on context cancel.
- --selftest=hub: one collect + report; prints the report it would send and the returned envelope.
- The startup line logs host_id/url/interval with the key redacted; no secret is ever logged.
Explicitly deferred (defined now, not active)
- Defined but EMPTY this slice (slices 5/6 fill): storage_targets, backups, restore_tests, pbs_snapshots, audit_tail, all emitted as typed empty [].
- Parsed but IGNORED (slice 4 / reconcile consumes): the envelope's blocked, desired_generation and has_signed_ops are logged at most, never acted on.
- No per-guest work queue (zero Proxmox mutations this slice); no canonical JSON (nothing signs the report); no controller_version (slice 8), which is emitted as "".
proxmox surface
No changes to internal/proxmox or internal/authz, and no new proxmox surface was needed:
ListLXC already returns status/maxmem/maxdisk, and GuestConfig returns cores. The task's
proxmoxReader sketch (node argument, pointer returns, LXC type) was adapted to the real
exports (Node() on the client, value returns, proxmox.Guest), as the task instructed.
Test matrix (all green)
- report: field names match §4; empty collections serialize as [] not null; spec omitted when unknown.
- client: sets Bearer; non-2xx → *HTTPError (status preserved); transport failure → *TransportError; asserts the bearer token never appears in any error string.
- collector: NodeStatus → host block; ListLXC + GuestConfig → guest spec; a failing GuestConfig → status="unknown" + omitted spec, and a report is still returned; a failing NodeStatus → hard error; cloudflared probe error → "unknown".
- loop: immediate first report; continues after an injected report error (≥3 cycles); adopts and clamps the envelope interval (cycle-level) and applies a slower interval in Run.
- config: hub validate cases, key redaction, env overlay + defaults.
Verification
go build/vet/test green locally (go1.26.0) and on the build server (go1.26.0). No live hub or systemctl in unit tests (mock transport + fake prober/collector/reporter).
Repo state
- Branch: main only. Version 0.3.0. Dependency unchanged (golang.org/x/crypto v0.52.0).