fix(agent): slice-3 follow-ups — keep run-status on config fail, selftest usage, contract golden (v0.3.1)
- collect: a per-guest GuestConfig failure preserves the ListLXC run-status (only spec dropped); empty status normalized to "unknown". Test asserts preserved "running" + nil spec.
- main: --selftest usage error now reads (want read|task|hub).
- contract: testdata/host-report.golden.json + TestHostReport_ContractMatchesGolden (field-name key-set check vs golden; byte-identical with the hub copy).
- version 0.3.0 -> 0.3.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -3,71 +3,45 @@
> This file holds the report for the **most recent** change, fully overwritten each task.
> Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).
## Task: hub client + host-report + first daemon loop (slice 3) — v0.3.0
## Task: slice-3 validation follow-ups — v0.3.1
The agent's **first daemon**: a periodic, read-only host-report POSTed to the hub — which
**is** the heartbeat (its server-side `received_at` is the dead-man's-switch signal). New
`internal/hub` package + config additions + `main.go` daemon wiring. Pushed to `main`;
build/vet/test green locally (go1.26) and on the build server.
Small fixes surfaced during slice-3 validation (agent half). Pushed to `main`; build/vet/test
green locally (go1.26) and on the build server.
### `internal/hub` public surface
- **`HostReport`** + sub-types (`HostMetrics`, `Guest`, `GuestSpec`, `Cloudflared`,
`ControlEnvelope`) — the JSON wire contract shared field-for-field with the hub ingest.
- **`Collector`** — `NewCollector(px proxmoxReader, cf CloudflaredProber, hostID, agentVersion, logger)`;
`Collect(ctx) (*HostReport, error)`.
- **`CloudflaredProber`** interface + **`SystemctlProber`** (`systemctl is-active`).
- **`Client`** — `NewClient(cfg config.HubConfig, logger) (*Client, error)`;
`Report(ctx, *HostReport) (*ControlEnvelope, error)`; typed `*TransportError` / `*HTTPError`.
- **`Loop`** — `NewLoop(collector, client, interval, logger)`; `Run(ctx) error`. Constants
`MinPollSeconds=60` / `MaxPollSeconds=3600`.
### §1 — `--selftest` usage string
`selftestFlag.Set`'s error now reads `(want read|task|hub)` (was missing `hub`, which became a
valid mode in slice 3). Cosmetic.
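
A minimal sketch of how a `flag.Value` could produce that usage error. The type shape and the message prefix are assumptions for illustration; only the `(want read|task|hub)` suffix comes from this report.

```go
package main

import (
	"flag"
	"fmt"
)

// selftestFlag is a hypothetical flag.Value; the real type in main.go may differ.
type selftestFlag string

func (f *selftestFlag) String() string { return string(*f) }

// Set accepts only the three selftest modes named in the report.
func (f *selftestFlag) Set(v string) error {
	switch v {
	case "read", "task", "hub":
		*f = selftestFlag(v)
		return nil
	}
	return fmt.Errorf("invalid selftest mode %q (want read|task|hub)", v)
}

func main() {
	var mode selftestFlag
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	fs.Var(&mode, "selftest", "run one self-test and exit")
	if err := fs.Parse([]string{"--selftest=hub"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("mode:", mode.String())
}
```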
### Config additions (`internal/config`)
- `HubConfig{URL, HostID, APIKey, PollSeconds, TimeoutSeconds, CAFile}` on `Config.Hub`.
- `FELHOM_AGENT_HUB_{URL,HOST_ID,API_KEY,POLL_SECONDS,TIMEOUT_SECONDS,CA_FILE}` overlay
(int parse errors warn to stderr + keep file value, never crash).
- `HubConfig.Validate()` (mode-aware — proxmox-only selftests unaffected; https required
except loopback for tests), `HubConfig.WithDefaults()` (900s/30s), `Redacted()` blanks the key.
- `configs/agent.example.json` gains `hub` (and `authz`) blocks.
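
The "warn and keep the file value" rule for integer overrides can be sketched as below. `overlayInt` is a hypothetical helper, not the package's actual function, and treating an empty variable as unset is an assumption; the variable name and the 900s default are taken from the list above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// overlayInt returns the integer override found under name, or fileValue when
// the variable is unset or empty. On a parse failure it returns fileValue plus
// an error, so the caller can warn and carry on instead of crashing.
func overlayInt(lookup func(string) (string, bool), name string, fileValue int) (int, error) {
	raw, ok := lookup(name)
	if !ok || raw == "" {
		return fileValue, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return fileValue, fmt.Errorf("%s=%q is not an integer, keeping %d", name, raw, fileValue)
	}
	return n, nil
}

func main() {
	poll, err := overlayInt(os.LookupEnv, "FELHOM_AGENT_HUB_POLL_SECONDS", 900)
	if err != nil {
		fmt.Fprintln(os.Stderr, "warning:", err)
	}
	fmt.Println("poll_seconds:", poll)
}
```

Passing the lookup function in keeps the helper testable without touching the process environment.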
### §2 — collector keeps run-status on a `GuestConfig` failure
`internal/hub/collect.go` `collectGuests`: a per-guest `GuestConfig` error no longer forces
`status="unknown"`. The run-status from `ListLXC` is **preserved** (only `spec` is dropped — that's
the only thing actually unknown). An *empty* status is still normalized to `unknown`, so the wire
value is always `running|stopped|unknown` (matches the hub handler's empty→unknown defaulting).
Test renamed `TestCollect_GuestConfigFailureKeepsStatusOmitsSpec`, now asserting the preserved
`running` status **and** nil spec (not a hollow `!= "unknown"` check).
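
The behaviour described here can be sketched roughly as follows. All types and the `fetchSpec` callback are hypothetical stand-ins, not the real `internal/hub` or `internal/proxmox` exports; the sketch only shows the rule "keep the listed status, drop the spec, map empty to `unknown`".

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the proxmox listing entry and the wire types.
type lxc struct {
	VMID   int
	Status string
}

type guestSpec struct{ Cores int }

type guest struct {
	VMID   int
	Status string
	Spec   *guestSpec // nil when the guest config could not be read
}

var errNoConfig = errors.New("config unavailable")

// normalizeStatus maps an empty run-status to "unknown" and leaves others as-is.
func normalizeStatus(s string) string {
	if s == "" {
		return "unknown"
	}
	return s
}

// collectGuests keeps the listed status even when fetchSpec fails for a guest;
// only the spec is left out in that case.
func collectGuests(listed []lxc, fetchSpec func(vmid int) (*guestSpec, error)) []guest {
	out := make([]guest, 0, len(listed))
	for _, l := range listed {
		g := guest{VMID: l.VMID, Status: normalizeStatus(l.Status)}
		if spec, err := fetchSpec(l.VMID); err == nil {
			g.Spec = spec
		}
		out = append(out, g)
	}
	return out
}

func main() {
	listed := []lxc{{VMID: 101, Status: "running"}, {VMID: 102, Status: ""}}
	failing := func(int) (*guestSpec, error) { return nil, errNoConfig }
	for _, g := range collectGuests(listed, failing) {
		fmt.Printf("%d status=%s hasSpec=%t\n", g.VMID, g.Status, g.Spec != nil)
	}
}
```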
### Daemon-loop behaviour (`main.go`)
- No `--selftest` flag → **daemon**: validate proxmox + hub config → build read-path proxmox
client, collector, hub client, loop → `signal.NotifyContext(SIGINT, SIGTERM)` → `loop.Run`.
- **Immediate first report**, then tick at the interval; adopt the hub's
`poll_interval_seconds` (clamped [60,3600], reset the ticker on change).
- **Resilient**: any collect/report error is logged and the loop continues (survives hub 5xx
and transient proxmox read errors). Clean `nil` return on context cancel.
- **`--selftest=hub`**: one collect + report; prints the report it would send + the envelope.
- Startup line logs host_id/url/interval with the **key redacted**; no secret ever logged.
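
The interval adoption described above (clamp to [60,3600], reset the ticker only on change) might look like the sketch below. `clampPoll` and `nextInterval` are hypothetical helper names; the two constants are the ones this report lists for `internal/hub`.

```go
package main

import (
	"fmt"
	"time"
)

const (
	MinPollSeconds = 60
	MaxPollSeconds = 3600
)

// clampPoll bounds a hub-suggested interval to [MinPollSeconds, MaxPollSeconds].
func clampPoll(seconds int) time.Duration {
	if seconds < MinPollSeconds {
		seconds = MinPollSeconds
	}
	if seconds > MaxPollSeconds {
		seconds = MaxPollSeconds
	}
	return time.Duration(seconds) * time.Second
}

// nextInterval returns the interval to use next and whether it differs from
// the current one, which is when the ticker needs a reset.
func nextInterval(current time.Duration, suggested int) (time.Duration, bool) {
	next := clampPoll(suggested)
	return next, next != current
}

func main() {
	current := 15 * time.Minute
	ticker := time.NewTicker(current)
	defer ticker.Stop()

	// Pretend the hub's envelope asked for a 30s interval: it is clamped up.
	if next, changed := nextInterval(current, 30); changed {
		ticker.Reset(next)
		current = next
	}
	fmt.Println("interval:", current)
}
```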
### §4 — cross-repo contract golden fixture (agent half)
The host-report shape lives in two repos with nothing failing on drift (the hub ignores unknown
fields). Locked it with a golden sample:
- `internal/hub/testdata/host-report.golden.json` — a populated report (host block, two guests:
one `running` with `spec`, one `stopped`; `cloudflared`; the four empty collections + `audit_tail`
as `[]`).
- `TestHostReport_ContractMatchesGolden` — marshals a constructed `HostReport`, unmarshals the
golden, and compares **field-name key sets** at top level + `host` + `guests[0]`. A renamed/added/
removed json tag fails it.
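
A key-set comparison of this kind can be sketched as below. The `report` struct is a trimmed, hypothetical stand-in for `HostReport` (the `host_id` field is invented for the example), and the golden is inlined rather than read from `testdata`; the real test also descends into `host` and `guests[0]`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// report is a hypothetical, trimmed stand-in for the real wire type.
type report struct {
	HostID  string   `json:"host_id"`
	Guests  []string `json:"guests"`
	Backups []string `json:"backups"`
}

// keySet returns the sorted top-level field names of a JSON object.
func keySet(raw []byte) ([]string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(raw, &obj); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

// sameKeys reports whether two JSON objects expose exactly the same field
// names, ignoring the values.
func sameKeys(a, b []byte) (bool, error) {
	ka, err := keySet(a)
	if err != nil {
		return false, err
	}
	kb, err := keySet(b)
	if err != nil {
		return false, err
	}
	if len(ka) != len(kb) {
		return false, nil
	}
	for i := range ka {
		if ka[i] != kb[i] {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	golden := []byte(`{"host_id":"h1","guests":[],"backups":[]}`)
	built, err := json.Marshal(report{HostID: "h2", Guests: []string{}, Backups: []string{}})
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	same, err := sameKeys(built, golden)
	fmt.Println("keys match:", same, "err:", err)
}
```

Comparing names only means sample values can differ between the constructed report and the golden, while a renamed, added or removed json tag still fails the check.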
### Explicitly deferred (defined now, not active)
- **Defined-but-EMPTY** this slice (slices 5/6 fill): `storage_targets`, `backups`,
`restore_tests`, `pbs_snapshots`, `audit_tail` — emitted as typed empty `[]`.
- **Parsed-but-IGNORED** (slice 4 / reconcile consumes): the envelope's `blocked`,
`desired_generation`, `has_signed_ops` — logged at most, never acted on.
- No per-guest work queue (zero Proxmox mutations this slice); no canonical JSON (nothing
signs the report); no controller_version (slice 8) — emitted `""`.
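
Why "typed empty `[]`" needs care in Go: `encoding/json` encodes a nil slice as `null` and only a non-nil empty slice as `[]`. The sketch below uses a hypothetical two-field struct; the first line printed is `{"backups":null,"audit_tail":null}` and the second is `{"backups":[],"audit_tail":[]}`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// collections is a hypothetical subset of the report's list fields.
type collections struct {
	Backups   []string `json:"backups"`
	AuditTail []string `json:"audit_tail"`
}

// encode marshals c and returns the JSON text (or the error text on failure).
func encode(c collections) string {
	b, err := json.Marshal(c)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Zero value: both slices are nil and encode as null.
	fmt.Println(encode(collections{}))
	// Explicit empty slices encode as [].
	fmt.Println(encode(collections{Backups: []string{}, AuditTail: []string{}}))
}
```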
**Caveat (called out):** this is a *duplicated* contract — the file must stay **byte-identical**
with `felhom-hub`'s `hub/internal/api/testdata/host-report.golden.json`. JSON can't carry a comment,
so the mandatory "keep byte-identical" note lives in the test file's doc comment in both repos
instead of a JSON header. When slices 5/6 add real `storage_targets`/`backups` fields, revisit
promoting this to a shared Go types module (the proper fix).
### proxmox surface
**No changes to `internal/proxmox` or `internal/authz`.** No new proxmox surface was needed:
`ListLXC` already returns status/maxmem/maxdisk and `GuestConfig` returns cores. The task's
`proxmoxReader` sketch (node-arg / pointer returns / `LXC` type) was **adapted to the real
exports** — `Node()` on the client, value returns, `proxmox.Guest` — per its instruction.
### Test matrix (all green)
- **report**: field names match §4; empty collections serialize as `[]` not `null`; spec
omitted when unknown.
- **client**: sets `Bearer`; non-2xx → `*HTTPError` (status preserved); transport → `*TransportError`;
**asserts the bearer token never appears in any error string**.
- **collector**: `NodeStatus`→host block; `ListLXC`+`GuestConfig`→guest spec; a failing
`GuestConfig` → `status="unknown"` + omitted spec + **still returns a report**; a failing
`NodeStatus` → hard error; cloudflared probe error → `"unknown"`.
- **loop**: immediate first report; continues after an injected report error (≥3 cycles);
adopts + clamps the envelope interval (cycle-level) and applies a slower interval in `Run`.
- **config**: hub validate cases, key redaction, env overlay + defaults.
### Not touched (confirmed)
The daemon's proxmox client timeout is already bounded: `proxmox.NewClient` defaults `HTTPTimeout`
to 30s when zero, and `newProxmoxClient` leaves it zero. No change (was a "confirm" item).
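
The zero-means-default pattern referred to here, as a rough sketch. `clientConfig` and `withDefaults` are hypothetical names; only the 30s fallback for a zero `HTTPTimeout` comes from this report. The program prints `30s` and then `5s`.

```go
package main

import (
	"fmt"
	"time"
)

// clientConfig is a hypothetical stand-in for the proxmox client settings.
type clientConfig struct {
	HTTPTimeout time.Duration
}

// withDefaults fills in a 30s timeout when none was configured, so a caller
// that leaves the field zero still gets a bounded client.
func (c clientConfig) withDefaults() clientConfig {
	if c.HTTPTimeout == 0 {
		c.HTTPTimeout = 30 * time.Second
	}
	return c
}

func main() {
	unset := clientConfig{}
	explicit := clientConfig{HTTPTimeout: 5 * time.Second}
	fmt.Println(unset.withDefaults().HTTPTimeout)
	fmt.Println(explicit.withDefaults().HTTPTimeout)
}
```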
### Verification
- `go build/vet/test` green locally (go1.26.0) and on the build server (go1.26.0). No live hub
or `systemctl` in unit tests (mock transport + fake prober/collector/reporter).
`go build/vet/test ./...` green locally (go1.26) and on the build server (go1.26). Version 0.3.0 → 0.3.1.
### Repo state
- Branch: `main` only. Version 0.3.0. Dep unchanged (`golang.org/x/crypto v0.52.0`).
Branch: `main` only. Dep unchanged (`golang.org/x/crypto v0.52.0`).