feat(hub): host-report client + collector + first daemon loop (slice 3, v0.3.0)

internal/hub: the agent's first daemon, a periodic read-only host-report POSTed to
the hub. The report is the heartbeat; there is no separate ping.

- HostReport wire contract (shared field-for-field with the hub ingest): host
  metrics, guests (vmid + spec), cloudflared status; storage/backups/restore-tests/
  pbs/audit collections DEFINED but emitted empty (slices 5/6 and later fill them).
- Collector over a read-only proxmoxReader (adapted to the real proxmox surface;
  no proxmox changes) + a CloudflaredProber. Partial-failure: NodeStatus fail = hard
  (skip POST); per-guest GuestConfig fail = status "unknown", still report.
- Client: Bearer-auth POST, standard TLS (system roots / optional ca_file), typed
  TransportError/HTTPError, token never in errors.
- Loop: immediate first report, adopts the hub's poll_interval (clamped to
  [60, 3600] s), resilient to collect/report errors, clean ctx-cancel shutdown.
- ControlEnvelope: only poll_interval_seconds acted on; blocked/desired_generation/
  has_signed_ops parsed-but-ignored (slice 4).
- config: HubConfig + FELHOM_AGENT_HUB_* overlay + mode-aware HubConfig.Validate +
  WithDefaults + hub-key redaction; example config updated.
- main: no-selftest mode is now the daemon; added --selftest=hub. Version -> 0.3.0.

Tests: report serialization, client (incl. token-redaction), collector partial-
failure, loop continuation+interval adoption, config. internal/proxmox + internal/
authz untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:20:09 +02:00
parent f0fee7e193
commit ab77fa3544
16 changed files with 1352 additions and 91 deletions
@@ -0,0 +1,85 @@
package hub

// HostReport is the wire contract shared with the hub's ingest
// (felhom.eu TASK-slice3-hub-ingest). Field NAMES must match the hub
// field-for-field. Encoding is ordinary encoding/json (no canonicalization:
// nothing signs this report; canonical JSON is a slice-10 signing concern).
//
// The report IS the heartbeat: one periodic POST /api/v1/host-report, whose
// server-side received_at is the hub's dead-man's-switch liveness signal. There is
// no separate heartbeat endpoint.
type HostReport struct {
	HostID       string      `json:"host_id"`     // echoes config.Hub.HostID
	ReportedAt   string      `json:"reported_at"` // RFC3339, agent clock
	AgentVersion string      `json:"agent_version"`
	Host         HostMetrics `json:"host"`
	Guests       []Guest     `json:"guests"`
	// Defined now as the stable contract; emitted EMPTY (non-nil) this slice.
	StorageTargets []StorageTarget `json:"storage_targets"` // slice 5 (storage manifest)
	Backups        []Backup        `json:"backups"`         // slice 6
	RestoreTests   []RestoreTest   `json:"restore_tests"`   // slice 6
	PBSSnapshots   []PBSSnapshot   `json:"pbs_snapshots"`   // slice 6
	Cloudflared    Cloudflared     `json:"cloudflared"`
	AuditTail      []AuditEntry    `json:"audit_tail"` // populated by a later slice
}

// HostMetrics is the host block, sourced from proxmox NodeStatus.
type HostMetrics struct {
	Node             string   `json:"node"`
	CPUPercent       float64  `json:"cpu_percent"` // 0-100
	MemoryTotalBytes int64    `json:"memory_total_bytes"`
	MemoryUsedBytes  int64    `json:"memory_used_bytes"`
	MemoryPercent    float64  `json:"memory_percent"`
	DiskTotalBytes   int64    `json:"disk_total_bytes"` // host root fs
	DiskUsedBytes    int64    `json:"disk_used_bytes"`
	DiskPercent      float64  `json:"disk_percent"`
	LoadAvg          []string `json:"loadavg"` // array of STRINGS (PVE shape)
	UptimeSeconds    int64    `json:"uptime_seconds"`
}

// Guest is one LXC. The agent reports vmid; the hub derives the guest PK
// "<host_id>/<vmid>" (keeping the id scheme hub-side: locked decision 4).
type Guest struct {
	VMID              int        `json:"vmid"`
	Name              string     `json:"name"`
	Status            string     `json:"status"`             // running | stopped | unknown
	ControllerVersion string     `json:"controller_version"` // "" this slice (slice 8 fills)
	Spec              *GuestSpec `json:"spec,omitempty"`     // omitted when status unknown
}

// GuestSpec is the provisioned guest sizing.
type GuestSpec struct {
	Cores       int   `json:"cores"`
	MemoryBytes int64 `json:"memory_bytes"`
	DiskBytes   int64 `json:"disk_bytes"`
}

// Cloudflared is the tunnel service health (read-only probe this slice).
type Cloudflared struct {
	Status string `json:"status"` // active | inactive | failed | unknown
}

// The following element types are declared now so the empty collections above are
// typed and later slices only fill them. No wire fields are committed yet.
type StorageTarget struct{} // slice 5: storage manifest target fields TBD
type Backup struct{}        // slice 6: per-target backup status fields TBD
type RestoreTest struct{}   // slice 6: self-restore-test result fields TBD
type PBSSnapshot struct{}   // slice 6: PBS snapshot inventory fields TBD
type AuditEntry struct{}    // audit-log tail entry fields TBD

// ControlEnvelope is the hub's 200 response to a host-report. In this slice the
// agent adopts ONLY PollIntervalSeconds; the rest are reserved/forward-compat fields
// it logs at most and never acts on (reconcile, slice 4, consumes them).
type ControlEnvelope struct {
	Status string `json:"status"`
	// PollIntervalSeconds is a pointer so a missing field (keep current interval) is
	// distinguishable from an explicit 0.
	PollIntervalSeconds *int  `json:"poll_interval_seconds"`
	Blocked             bool  `json:"blocked"`            // reserved; ignored (slice 4)
	DesiredGeneration   int64 `json:"desired_generation"` // reserved; ignored (slice 4)
	HasSignedOps        bool  `json:"has_signed_ops"`     // reserved; ignored (slice 4)
}
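
The "emitted EMPTY (non-nil)" comment on HostReport matters because encoding/json writes a nil slice as `null` but an empty slice as `[]`, and `omitempty` drops a nil `Spec` pointer. A trimmed sketch of that behaviour, assuming a hypothetical `newReport` constructor and a cut-down `Report` type rather than the full HostReport:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed copies of the report types (hypothetical subset of package hub).
type GuestSpec struct {
	Cores       int   `json:"cores"`
	MemoryBytes int64 `json:"memory_bytes"`
	DiskBytes   int64 `json:"disk_bytes"`
}

type Guest struct {
	VMID   int        `json:"vmid"`
	Status string     `json:"status"`
	Spec   *GuestSpec `json:"spec,omitempty"` // nil pointer: key omitted entirely
}

type Backup struct{} // fields TBD, as in the contract

type Report struct {
	Guests  []Guest  `json:"guests"`
	Backups []Backup `json:"backups"`
}

// newReport initialises every collection to a non-nil empty slice, so
// encoding/json emits [] rather than null for collections with no entries.
func newReport() Report {
	return Report{Guests: []Guest{}, Backups: []Backup{}}
}

// encode marshals the report with plain encoding/json (no canonicalization).
func encode(r Report) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", fmt.Errorf("marshal report: %w", err)
	}
	return string(b), nil
}
```

For example, `encode(newReport())` yields `{"guests":[],"backups":[]}`, while a zero-value `Report` yields `{"guests":null,"backups":null}`; a guest with status "unknown" and no spec serializes without a `spec` key.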