Files
admin ab77fa3544 feat(hub): host-report client + collector + first daemon loop (slice 3, v0.3.0)
internal/hub: the agent's first daemon — a periodic read-only host-report POSTed to
the hub (the heartbeat; no separate ping).

- HostReport wire contract (shared field-for-field with the hub ingest): host
  metrics, guests (vmid + spec), cloudflared status; storage/backups/restore-tests/
  pbs/audit collections DEFINED but emitted empty (slices 5/6 fill).
- Collector over a read-only proxmoxReader (adapted to the real proxmox surface;
  no proxmox changes) + a CloudflaredProber. Partial-failure: NodeStatus fail = hard
  (skip POST); per-guest GuestConfig fail = status "unknown", still report.
- Client: Bearer-auth POST, standard TLS (system roots / optional ca_file), typed
  TransportError/HTTPError, token never in errors.
- Loop: immediate first report, adopt hub poll_interval (clamp [60,3600]), resilient
  to collect/report errors, clean ctx-cancel shutdown.
- ControlEnvelope: only poll_interval_seconds acted on; blocked/desired_generation/
  has_signed_ops parsed-but-ignored (slice 4).
- config: HubConfig + FELHOM_AGENT_HUB_* overlay + mode-aware HubConfig.Validate +
  WithDefaults + hub-key redaction; example config updated.
- main: no-selftest mode is now the daemon; added --selftest=hub. Version -> 0.3.0.

Tests: report serialization, client (incl. token-redaction), collector partial-
failure, loop continuation+interval adoption, config. internal/proxmox + internal/
authz untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:20:09 +02:00

47 lines
1.5 KiB
Go

package hub
import (
"context"
"os/exec"
"strings"
)
// CloudflaredProber reports the cloudflared tunnel service health. It is a
// READ-ONLY probe: the agent does NOT manage or restart cloudflared in this slice
// (that is the tunnel-management slice — this is the seam for it). Injectable so
// tests use a fake and never exec.
type CloudflaredProber interface {
// Status returns one of: "active" | "inactive" | "failed" | "unknown".
Status(ctx context.Context) (string, error)
}
// SystemctlProber runs `systemctl is-active cloudflared`. This is NOT a Privileged
// (root-CLI) op — `is-active` is non-root readable and is not one of the three
// proven root exceptions, so it does not go through internal/proxmox.Privileged.
type SystemctlProber struct {
Unit string // defaults to "cloudflared"
}
// Status maps `systemctl is-active` output to the report vocabulary. systemctl
// exits non-zero for inactive/failed, so the output string is authoritative over
// the exit code; any exec error (binary missing, etc.) maps to "unknown".
func (p SystemctlProber) Status(ctx context.Context) (string, error) {
unit := p.Unit
if unit == "" {
unit = "cloudflared"
}
out, _ := exec.CommandContext(ctx, "systemctl", "is-active", unit).Output()
switch strings.TrimSpace(string(out)) {
case "active":
return "active", nil
case "failed":
return "failed", nil
case "inactive", "deactivating", "activating":
return "inactive", nil
case "":
return "unknown", nil // no output → systemctl/exec problem
default:
return "unknown", nil
}
}