ab77fa3544
internal/hub: the agent's first daemon, a periodic read-only host-report POSTed to the hub (the heartbeat; no separate ping).

- HostReport wire contract (shared field-for-field with the hub ingest): host metrics, guests (vmid + spec), cloudflared status; storage/backups/restore-tests/pbs/audit collections DEFINED but emitted empty (slices 5/6 fill).
- Collector over a read-only proxmoxReader (adapted to the real proxmox surface; no proxmox changes) + a CloudflaredProber. Partial-failure: NodeStatus fail = hard (skip POST); per-guest GuestConfig fail = status "unknown", still report.
- Client: Bearer-auth POST, standard TLS (system roots / optional ca_file), typed TransportError/HTTPError, token never in errors.
- Loop: immediate first report, adopt hub poll_interval (clamp [60,3600]), resilient to collect/report errors, clean ctx-cancel shutdown.
- ControlEnvelope: only poll_interval_seconds acted on; blocked/desired_generation/has_signed_ops parsed-but-ignored (slice 4).
- config: HubConfig + FELHOM_AGENT_HUB_* overlay + mode-aware HubConfig.Validate + WithDefaults + hub-key redaction; example config updated.
- main: no-selftest mode is now the daemon; added --selftest=hub. Version -> 0.3.0.

Tests: report serialization, client (incl. token-redaction), collector partial-failure, loop continuation+interval adoption, config. internal/proxmox + internal/authz untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
47 lines
1.5 KiB
Go
package hub

import (
	"context"
	"os/exec"
	"strings"
)

// CloudflaredProber reports the cloudflared tunnel service health. It is a
// READ-ONLY probe: the agent does NOT manage or restart cloudflared in this slice
// (that is the tunnel-management slice; this is the seam for it). Injectable so
// tests use a fake and never exec.
type CloudflaredProber interface {
	// Status returns one of: "active" | "inactive" | "failed" | "unknown".
	Status(ctx context.Context) (string, error)
}

// SystemctlProber runs `systemctl is-active cloudflared`. This is NOT a Privileged
// (root-CLI) op: `is-active` is non-root readable and is not one of the three
// proven root exceptions, so it does not go through internal/proxmox.Privileged.
type SystemctlProber struct {
	Unit string // defaults to "cloudflared"
}

// Status maps `systemctl is-active` output to the report vocabulary. systemctl
// exits non-zero for inactive/failed, so the output string is authoritative over
// the exit code; any exec error (binary missing, etc.) maps to "unknown".
func (p SystemctlProber) Status(ctx context.Context) (string, error) {
	unit := p.Unit
	if unit == "" {
		unit = "cloudflared"
	}
	out, _ := exec.CommandContext(ctx, "systemctl", "is-active", unit).Output()
	switch strings.TrimSpace(string(out)) {
	case "active":
		return "active", nil
	case "failed":
		return "failed", nil
	case "inactive", "deactivating", "activating":
		return "inactive", nil
	case "":
		return "unknown", nil // no output → systemctl/exec problem
	default:
		return "unknown", nil
	}
}