hub v0.7.2: ingest agent storage_targets (slice 5 Phase A)

Accept + persist the now-populated host-report storage_targets. Minimal — the
authoritative storage manifest is hub-owned (slice 10); this mirrors what the agent
observes.

- hostReportPayload.StorageTargets: full mirror of the agent's hub.StorageTarget
  wire contract; persisted verbatim in report_json (no schema change); count +
  WARN on disconnected targets.
- shared host-report golden updated with two populated targets; byte-identical with
  felhom-agent's copy.
- TestHostStorageTarget_GoldenContract: hub half of the bidirectional key-set test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 09:59:27 +02:00
parent 2f8658981d
commit aaff268fff
5 changed files with 249 additions and 125 deletions
+28 -119
@@ -1,132 +1,41 @@
# felhom.eu — task reports
> **Overwrite** this file with a summary of the most recent task only (uniform with the other repos; not cumulative). The cumulative hub history lives in [hub/CHANGELOG.md](hub/CHANGELOG.md). Sections below predate this convention change and are retained as history.
> **Overwrite** this file with a summary of the most recent task only (uniform with the other repos; not cumulative). The cumulative hub history lives in [hub/CHANGELOG.md](hub/CHANGELOG.md).
---
## Hub slice 3 — host-domain ingest (v0.7.0) 2026-06-08
# REPORT — Hub: ingest agent `storage_targets` (v0.7.2) (2026-06-09)
Purely **additive** host-domain ingest in `hub/`: new tables, the agent's
`/host-report` heartbeat endpoint, per-host Bearer auth, a provisional host mint, and a
host-domain dead-man's-switch. The existing controller path is **untouched**; the schema/
auth cutover remains **slice 10**. Pushed to `main`; build/vet/test green locally and on
the build server.
## Outcome
### New tables (`store.go migrate()`, idempotent — `// v0.7.0: host-domain`)
- **`hosts`** — one per customer agent. Reality columns (`agent_version`, `last_report_at`)
+ operator-intent columns **INERT until slice 10** (`desired_json`, `desired_generation`,
`dr_record_json`).
- **`guests`** — one per controller LXC, PK `guest_id = "<host_id>/<vmid>"` (hub-derived).
Reality columns (`display_name`, `status`, `controller_version`, `vmid`, `last_seen_at`)
+ **INERT** `api_key`, `desired_spec_json`.
- **`host_reports`** — the report stream + denormalized columns (cpu/mem/disk %, guest
counts, cloudflared status); pruned by `Prune(maxDays)` alongside `reports`.
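The guest key is derived by the hub, not sent by the agent. Below is a minimal sketch of that derivation. It assumes `GuestID` takes the host ID and an integer VMID; the real store signature is not shown in this diff.

```go
package main

import "fmt"

// GuestID builds the hub-derived primary key for a guest row: "<host_id>/<vmid>".
// Sketch only: the (string, int) signature is an assumption.
func GuestID(hostID string, vmid int) string {
	return fmt.Sprintf("%s/%d", hostID, vmid)
}
```

A Proxmox VMID is only unique within one host or cluster. Prefixing it with the host ID keeps guest keys unique across every customer the hub tracks.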
**Pushed as hub `v0.7.2`.** The felhom-agent slice-5 Phase A work populates the host-report's
`storage_targets` (previously a defined-but-empty stub). This change is the hub half: accept
and persist them. Deliberately minimal — the authoritative storage manifest (desired
class/role/policy/creds) is hub-owned and arrives at slice 10; this slice only mirrors what the
agent observes.
> Inert columns exist **now** so slice 10 needs no `ALTER`; nothing reads/writes them this
> slice. Migration is additive-only (no `DROP`, no edits to `reports`/`customer_configs`)
> and idempotent.
## What landed (`hub/internal/api/handler.go`, `host_test.go`, golden)
### New store methods
`GetHostByAPIKey`, `GetHost`, `ListHosts`, `UpsertHost` (updates only identity + `updated_at`
on conflict), `SaveHostReport` (inserts a report row + bumps reality columns only),
`UpsertGuestFromReport` (updates reality columns only — **preserves** `api_key`/
`desired_spec_json`), `GetHostStaleness` (skips never-reported hosts), `GuestID`.
Structs: `Host`, `Guest`, `HostReportDenorm`, `HostStaleRow`.
- `hostReportPayload` now parses `storage_targets` via a `hostStorageTarget` mirror struct that
matches the agent's `hub.StorageTarget` wire contract field-for-field (name/type/durable_id/
state/reachable/usage/content/mount/class_hint/role/`thin_pool`/`smart`).
- Persistence: the targets are stored verbatim in the existing `report_json` column (no schema
change / no migration). The handler counts them and logs a `[WARN]` with the number of
disconnected targets (the storage analog of host-down visibility).
- The shared `testdata/host-report.golden.json` now carries two populated targets (an lvmthin
with `thin_pool`, a usb) and is **byte-identical** with felhom-agent's copy.
- Tests: `TestHostStorageTarget_GoldenContract` is the hub half of the bidirectional key-set
test (round-trips the golden through the mirror, asserts exact key match);
`TestHostReport_GoldenContract` also asserts the targets persist + parse back. `go test
./internal/api/ ./internal/store/` is green.
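The key-set test works by round-tripping through the mirror struct. A golden element is decoded into the struct and re-encoded, then its keys are compared with the original's. The sketch below reports the two directions of drift separately. `keyDrift`, the deliberately incomplete `target` mirror and `roundTrip` are hypothetical names. The sketch also only compares top-level keys, whereas the real test additionally checks `smart` and `thin_pool`.

```go
package main

import (
	"encoding/json"
	"sort"
)

// target is a deliberately incomplete mirror (it has no "role" field), used to show drift.
type target struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// roundTrip decodes a golden element into the mirror and re-encodes it.
// Fields the mirror lacks are silently dropped by encoding/json.
func roundTrip(golden []byte) ([]byte, error) {
	var t target
	if err := json.Unmarshal(golden, &t); err != nil {
		return nil, err
	}
	return json.Marshal(t)
}

// keyDrift compares top-level keys: missing = in the golden but not the mirror,
// extra = in the mirror but not the golden. Both empty means the contracts agree.
func keyDrift(golden, mirror []byte) (missing, extra []string, err error) {
	var g, m map[string]json.RawMessage
	if err = json.Unmarshal(golden, &g); err != nil {
		return nil, nil, err
	}
	if err = json.Unmarshal(mirror, &m); err != nil {
		return nil, nil, err
	}
	for k := range g {
		if _, ok := m[k]; !ok {
			missing = append(missing, k)
		}
	}
	for k := range m {
		if _, ok := g[k]; !ok {
			extra = append(extra, k)
		}
	}
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra, nil
}
```

Comparing only in one direction would let a hub-invented field (the `extra` case) slip through unnoticed, so the check has to be bidirectional.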
### Auth (added; existing path unchanged)
`checkAuthHost(r)` → `(hostID, customerID, isGlobal, ok)`: global key → trust `body.host_id`;
per-host key → bound identity; failure → not-ok. `checkAuthCustomer` is byte-for-byte unchanged.
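The three auth outcomes can be sketched as a pure function. It assumes the Bearer token has already been extracted from the request and that the key lookups are simple in-memory values. `authConfig` and `hostKey` are hypothetical stand-ins for the hub's store queries such as `GetHostByAPIKey`.

```go
package main

import "crypto/subtle"

// hostKey is the identity a per-host API key is bound to (hypothetical shape).
type hostKey struct {
	HostID     string
	CustomerID string
}

// authConfig stands in for the hub's key lookups (assumed, not the real store).
type authConfig struct {
	GlobalKey string
	PerHost   map[string]hostKey // api_key -> bound identity
}

// checkAuthHost: the global key trusts the body's host_id (the caller then resolves
// the customer from the existing host row), a per-host key returns its bound
// identity, and anything else is not ok.
func (a authConfig) checkAuthHost(bearer, bodyHostID string) (hostID, customerID string, isGlobal, ok bool) {
	if bearer == "" {
		return "", "", false, false
	}
	// Constant-time compare for the shared secret.
	if a.GlobalKey != "" && subtle.ConstantTimeCompare([]byte(bearer), []byte(a.GlobalKey)) == 1 {
		return bodyHostID, "", true, true
	}
	if k, found := a.PerHost[bearer]; found {
		return k.HostID, k.CustomerID, false, true
	}
	return "", "", false, false
}
```

The per-host branch ignores the body's `host_id` on purpose. Per the endpoint rules, the handler then requires `body.host_id == hostID` and answers 403 on a mismatch.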
## Backward compatibility
### Endpoints
- **`POST /api/v1/host-report`** (the heartbeat): per-host auth; 4 MiB body; computes denorm
(`guest_running` counts only `status=="running"`); `SaveHostReport` + per-guest
`UpsertGuestFromReport` (a guest upsert failure is logged, not fatal — liveness); returns the
control envelope `{status:"ok", poll_interval_seconds:900, blocked, desired_generation:0,
has_signed_ops:false}`. `blocked` reflects `customer_configs.status`; the other two are
reserved placeholders (slice 4). Global-key bootstrap requires the host to already exist
(else 400); per-host key requires `body.host_id == hostID` (else 403).
- **`POST /api/v1/admin/hosts`** — **PROVISIONAL**, global-key only. Mints `host_id` (legible
`<customer>-<hex>`) + a random `api_key` (`configgen.RandomHex(32)`); 201 `{host_id, api_key}`.
Flagged in code as the slice-3 bootstrap to be removed/locked at enrollment (slices 7–8).
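The provisional mint pairs a legible host ID with a random key. Below is a sketch using `crypto/rand`. `randomHex` is a hypothetical stand-in for `configgen.RandomHex`; it assumes the argument counts random bytes, so the result has 2n hex characters. The 3-byte host suffix is also an assumption.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// randomHex returns n cryptographically random bytes, hex-encoded (2n chars).
// Hypothetical stand-in for configgen.RandomHex.
func randomHex(n int) (string, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// mintHost sketches the provisional mint: "<customer>-<hex>" host_id plus api_key.
func mintHost(customer string) (hostID, apiKey string, err error) {
	suffix, err := randomHex(3) // assumed suffix length
	if err != nil {
		return "", "", err
	}
	apiKey, err = randomHex(32)
	if err != nil {
		return "", "", err
	}
	return customer + "-" + suffix, apiKey, nil
}
```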
An older agent that sends `storage_targets: []` (or omits the field) is accepted unchanged.
The legacy controller report path is untouched (frozen until the slice-10 cutover).
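Backward compatibility falls out of `encoding/json` semantics. An absent or `null` field leaves the slice nil, and `[]` yields an empty slice, so both have length 0. Unknown fields are ignored unless `DisallowUnknownFields` is set. Here is a sketch with a trimmed, hypothetical `payload` type in place of `hostReportPayload`:

```go
package main

import "encoding/json"

// payload is a trimmed, hypothetical stand-in for hostReportPayload.
type payload struct {
	HostID         string            `json:"host_id"`
	StorageTargets []json.RawMessage `json:"storage_targets"`
}

// targetCount decodes a report and returns how many storage targets it carried.
func targetCount(body []byte) (int, error) {
	var p payload
	if err := json.Unmarshal(body, &p); err != nil {
		return 0, err
	}
	return len(p.StorageTargets), nil
}
```

Because the handler only ranges over and counts the slice, the nil and empty cases need no special branch.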
### Host dead-man's-switch
`monitor.HostStalenessChecker` (`host_staleness.go`) — a **sibling** of the controller
`StalenessChecker`, keyed on host↔`host_reports`, emitting `host_stale`/`host_down`/
`host_recovered` (30m / 60m), attributed to the host's customer (so the existing per-customer
notification UX picks them up). Registered in `allowedEventTypes`; wired in `main.go` on the
existing 60s ticker. The controller staleness/deadline checkers are untouched and keep running.
## Deploy
### Contract
The `/host-report` JSON matches the agent spec §4 field-for-field (host_id, reported_at,
agent_version, host{…}, guests[{vmid,name,status,controller_version,spec}], cloudflared{status},
and the empty storage_targets/backups/restore_tests/pbs_snapshots/audit_tail — accepted
empty/absent). The envelope matches agent spec §5.
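The control envelope and the `guest_running` denorm can be sketched as below. The Go types and helper names are hypothetical, and `int64` for `desired_generation` is an assumption. The JSON keys and the placeholder values (`poll_interval_seconds:900`, `desired_generation:0`, `has_signed_ops:false`) follow the endpoint description above.

```go
package main

import "encoding/json"

// controlEnvelope mirrors the fields of the /host-report response.
type controlEnvelope struct {
	Status              string `json:"status"`
	PollIntervalSeconds int    `json:"poll_interval_seconds"`
	Blocked             bool   `json:"blocked"`
	DesiredGeneration   int64  `json:"desired_generation"`
	HasSignedOps        bool   `json:"has_signed_ops"`
}

type guest struct {
	Status string `json:"status"`
}

// runningGuests counts only guests whose status is exactly "running".
func runningGuests(gs []guest) int {
	n := 0
	for _, g := range gs {
		if g.Status == "running" {
			n++
		}
	}
	return n
}

// envelope encodes the response; only "blocked" varies in this slice.
func envelope(blocked bool) ([]byte, error) {
	return json.Marshal(controlEnvelope{Status: "ok", PollIntervalSeconds: 900, Blocked: blocked})
}
```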
### Test matrix (new, hermetic — temp SQLite, no live data)
- **store**: upsert/lookup; a report-path update **preserves** `desired_json`/`desired_generation`;
guest upsert **preserves** `api_key`/`desired_spec_json` while updating reality; `GuestID`;
staleness skips never-reported.
- **auth**: `checkAuthHost` global / per-host / unknown.
- **ingest**: valid → 200 + envelope + denorm (`guest_running` = 1 of 2); host_id mismatch → 403;
unknown host under global key → 400; blocked customer → `blocked:true`; oversize body → 400.
- **admin mint**: non-global → 403; unknown customer → 400; success → 201 + minted key
round-trips through `/host-report`.
- **host staleness**: seed emits no events; ok→stale→down→recovered transitions.
### Untouched / deferred (explicit)
- **Controller path unchanged**: `/api/v1/report`, `reports`, `customer_configs`,
`checkAuthCustomer`, existing staleness + deadline checkers — additions only, all still green.
- **Not built** (per scope): desired-state serving, `signed_ops`, geo→hub, DR-record migration,
dashboard re-design. The cutover (drop `reports` → `guest_reports`, merge checkers, tighten the
provisional admin/global-key auth) remains **slice 10**.
### Versioning / deploy
Hub version is the `main.Version` ldflags var (`build.sh <VER>`), default `"dev"`; recorded
**v0.7.0** in `hub/CHANGELOG.md`. The image build + ArgoCD deploy are **not** part of this task
(no deploy performed).
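An ldflags version var relies on the Go linker's `-X importpath.name=value` flag, which can only set package-level string variables. Below is a sketch that assumes `build.sh` forwards `<VER>` through `-ldflags "-X main.Version=<VER>"` (the script itself is not shown). `versionBanner` is a hypothetical helper.

```go
package main

import "fmt"

// Version is overridden at link time, for example:
//
//	go build -ldflags "-X main.Version=v0.7.2" .
//
// A plain `go build` leaves the "dev" default in place.
var Version = "dev"

// versionBanner is a hypothetical helper that formats the running version.
func versionBanner() string {
	return fmt.Sprintf("felhom hub %s", Version)
}
```

This is why there is no in-repo constant to bump: the version string only exists in the built binary.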
### Repo state
Branch: `main`. Verified `go build/vet/test ./...` green in `hub/` locally (go1.26) and on the
build server (go1.26).
---
## Hub slice-3 follow-ups (v0.7.1) — 2026-06-08
Validation follow-ups (hub half). Pushed to `main`; build/vet/test green locally (go1.26) and on
the build server.
### §3 — `/host-report` rejects oversize with 413 (not silent truncation)
`handleHostReport` now reads `maxHostReportBytes+1` (const `4 << 20`, defined near
`defaultHostPollSeconds`) and returns **`413 Payload too large`** when exceeded, instead of relying
on `LimitReader` truncation (which could accept a truncated-but-valid JSON as a partial report,
dropping guests from the mirror). **Scope-frozen:** the controller `handleReport` 1 MiB read is
**unchanged** (diff touches only the host path); the small divergence is acceptable until cutover.
`TestHandleHostReport_OversizeRejected` now asserts 413.
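The `+1` is what turns truncation into detection. Reading exactly `max` bytes cannot tell a body of exactly `max` bytes from a longer one, while reading `max+1` can. Below is a sketch with a 16-byte cap standing in for `maxHostReportBytes` (4 MiB) so the demo stays small. `readCapped`, `handleCapped` and `simulatePost` are hypothetical names.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

const maxBody = 16 // demo cap; the hub uses 4 << 20

// readCapped reads at most max+1 bytes so an oversize body is detected
// instead of being silently truncated to a possibly-valid prefix.
func readCapped(r io.Reader, max int64) (body []byte, tooLarge bool, err error) {
	body, err = io.ReadAll(io.LimitReader(r, max+1))
	if err != nil {
		return nil, false, err
	}
	if int64(len(body)) > max {
		return nil, true, nil
	}
	return body, false, nil
}

// handleCapped shows the 413 path of a host-report-style handler.
func handleCapped(w http.ResponseWriter, r *http.Request) {
	_, tooLarge, err := readCapped(r.Body, maxBody)
	if err != nil {
		http.Error(w, "read error", http.StatusBadRequest)
		return
	}
	if tooLarge {
		http.Error(w, "Payload too large", http.StatusRequestEntityTooLarge)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// simulatePost drives handleCapped with an in-memory request and returns the status code.
func simulatePost(payload string) int {
	req := httptest.NewRequest(http.MethodPost, "/api/v1/host-report", strings.NewReader(payload))
	rec := httptest.NewRecorder()
	handleCapped(rec, req)
	return rec.Code
}
```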
### §4 — cross-repo contract golden fixture (hub half)
- `hub/internal/api/testdata/host-report.golden.json` — a **byte-identical copy** of felhom-agent's
golden (verified by md5).
- `TestHostReport_GoldenContract` — mints a host, POSTs the golden through the **real**
`handleHostReport`, asserts 200 + denorm (`guest_total=2`, `guest_running=1`,
`cloudflared_status="active"`) + both guests upserted. Proves `hostReportPayload` still extracts
the contract from the real wire shape.
**Caveat (called out):** the two golden files are a *duplicated* contract with no shared source of
truth. JSON can't hold a comment, so the mandatory "keep byte-identical" marker lives in each test
file's doc comment. When slices 5/6 add real `storage_targets`/`backups` fields, promote this to a
shared Go types module (the proper fix); this fixture is the bridge.
### Versioning / scope
Recorded **v0.7.1** in `hub/CHANGELOG.md`. The hub version is the `main.Version` ldflags var
(`build.sh <VER>`, default `"dev"`) — there is no in-repo version constant to bump (the task's
pointer to `web/version.go` is the controller-image `VersionChecker`, unrelated); the image tag is
applied at build/deploy (ArgoCD), not in this task. No deploy performed.
### Untouched (confirmed)
Controller path (`handleReport`/`reports`/`customer_configs`/`checkAuthCustomer`/existing checkers)
unchanged. The agent's proxmox client timeout was a "confirm" item — already bounded (30s default),
no change.
### Repo state
Branch: `main`. Verified `go build/vet/test ./...` green in `hub/` locally (go1.26) and on the build server (go1.26).
Standard hub flow (build server 192.168.0.180): `./build.sh v0.7.2 --push` then deploy. If the
hub deployment is ArgoCD-managed, update the image tag via the managed path rather than a bare
`kubectl set image` (drift-correction would revert it).
+24
@@ -1,5 +1,29 @@
# Felhom Hub — Changelog
## v0.7.2 — ingest agent storage_targets (slice 5 Phase A) (2026-06-09)
The agent's slice-5 work populates the host-report's `storage_targets` (previously empty).
This is the hub half: accept + persist them. Minimal by design — the rich, authoritative
storage manifest (desired class/role/policy/creds) is hub-owned and lands at slice 10; this
slice only mirrors what the agent observes.
### Added
- **`hostReportPayload.StorageTargets`** (`internal/api/handler.go`) — a full mirror of the
agent's `hub.StorageTarget` wire contract (name/type/durable_id/state/reachable/usage/
content/mount/class_hint/role/`thin_pool`/`smart`). The targets are persisted verbatim in
the existing `report_json` row (no schema change); the handler counts them and logs a
`[WARN]` when any are `disconnected` (the storage analog of host-down visibility).
- **`testdata/host-report.golden.json`** — updated to carry two populated `storage_targets`
(an lvmthin with `thin_pool`, a usb), kept **byte-identical** with felhom-agent's copy.
- **`TestHostStorageTarget_GoldenContract`** — the hub half of the bidirectional key-set test:
round-trips the golden's `storage_targets[0]` through the mirror struct and asserts the key
set matches exactly (no missing/extra fields vs the agent). `TestHostReport_GoldenContract`
also now asserts the targets are persisted + parse back.
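The handler counts disconnected targets inline. Pulled out as a pure helper that returns the log line, the WARN format becomes easy to unit-test. `disconnectedWarning` and the trimmed `storageTarget` type are hypothetical; the format string is copied from the handler change.

```go
package main

import "fmt"

// storageTarget is a trimmed, hypothetical stand-in for hostStorageTarget.
type storageTarget struct {
	Name  string `json:"name"`
	State string `json:"state"`
}

// disconnectedWarning returns the WARN line the handler would log,
// or "" when no target is in the "disconnected" state.
func disconnectedWarning(hostID string, targets []storageTarget) string {
	n := 0
	for _, t := range targets {
		if t.State == "disconnected" {
			n++
		}
	}
	if n == 0 {
		return ""
	}
	return fmt.Sprintf("[WARN] host %s reports %d disconnected storage target(s) of %d", hostID, n, len(targets))
}
```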
### Notes
- Backward-compatible: an older agent that sends `storage_targets: []` (or omits it) is
accepted unchanged. The legacy controller report path is untouched (frozen until slice 10).
## Repo docs — no hub version change (2026-06-08)
### Changed
+63 -4
@@ -234,9 +234,15 @@ const defaultHostPollSeconds = 900
const maxHostReportBytes = 4 << 20 // 4 MiB
// hostReportPayload is the subset of the agent host-report (slice-3 contract,
// §3 / agent spec §4) the hub needs for denorm + guest reality. Unknown fields
// (storage_targets/backups/restore_tests/pbs_snapshots/audit_tail) are ignored,
// so an empty or absent collection is accepted without error.
// §3 / agent spec §4) the hub needs for denorm + guest reality. The remaining fields
// (backups/restore_tests/pbs_snapshots/audit_tail) are ignored, so an empty or absent
// collection is accepted without error.
//
// storage_targets (slice 5) is now parsed: the agent populates it, and the hub accepts
// + persists it. Persistence is the full report_json row (which carries the targets
// verbatim); the handler additionally counts them for logging. The RICH manifest schema
// (desired class/role/policy/creds) is hub-owned and lands in slice 10; this slice only
// mirrors what the agent observes.
type hostReportPayload struct {
HostID string `json:"host_id"`
AgentVersion string `json:"agent_version"`
@@ -251,11 +257,49 @@ type hostReportPayload struct {
Status string `json:"status"`
ControllerVersion string `json:"controller_version"`
} `json:"guests"`
StorageTargets []hostStorageTarget `json:"storage_targets"`
Cloudflared struct {
Status string `json:"status"`
} `json:"cloudflared"`
}
// hostStorageTarget mirrors the agent's hub.StorageTarget wire contract field-for-field.
// It is a DUPLICATED contract (no shared types module yet); testdata/host-report.golden.json
// must stay byte-identical with felhom-agent's copy and the key-set test guards drift.
// The hub does not act on these yet beyond persisting + counting them (slice 10 adds the
// authoritative manifest), but mirroring the full shape keeps the cross-repo contract honest.
type hostStorageTarget struct {
Name string `json:"name"`
Type string `json:"type"`
DurableID string `json:"durable_id"`
State string `json:"state"`
Reachable bool `json:"reachable"`
TotalBytes int64 `json:"total_bytes"`
UsedBytes int64 `json:"used_bytes"`
AvailBytes int64 `json:"avail_bytes"`
UsedFraction float64 `json:"used_fraction"`
Content string `json:"content"`
MountPath string `json:"mount_path"`
BackingDevice string `json:"backing_device"`
ClassHint string `json:"class_hint"`
Role string `json:"role"`
ThinPool *struct {
DataUsedFraction float64 `json:"data_used_fraction"`
MetadataUsedFraction *float64 `json:"metadata_used_fraction"`
} `json:"thin_pool,omitempty"`
Smart struct {
Health string `json:"health"`
TemperatureC *int `json:"temperature_c"`
PowerOnHours *int `json:"power_on_hours"`
ReallocatedSectors *int `json:"reallocated_sectors"`
PendingSectors *int `json:"pending_sectors"`
OfflineUncorrectable *int `json:"offline_uncorrectable"`
CriticalWarning *int `json:"critical_warning"`
MediaErrors *int `json:"media_errors"`
PercentageUsed *int `json:"percentage_used"`
} `json:"smart"`
}
// handleHostReport ingests the agent's host-report (the heartbeat) and returns the
// control envelope (agent spec §5).
func (h *Handler) handleHostReport(w http.ResponseWriter, r *http.Request) {
@@ -340,7 +384,22 @@ func (h *Handler) handleHostReport(w http.ResponseWriter, r *http.Request) {
}
}
h.logger.Printf("[INFO] host-report from %s (%d guests, %d bytes)", hostID, len(rep.Guests), len(body))
// storage_targets (slice 5): persisted as part of report_json above. Count + surface
// disconnected ones in the log (the slice-10 manifest will reconcile them; for now the
// signal is the visibility — a disconnected target is the storage analog of host-down).
disconnected := 0
for _, st := range rep.StorageTargets {
if st.State == "disconnected" {
disconnected++
}
}
if disconnected > 0 {
h.logger.Printf("[WARN] host %s reports %d disconnected storage target(s) of %d",
hostID, disconnected, len(rep.StorageTargets))
}
h.logger.Printf("[INFO] host-report from %s (%d guests, %d storage targets, %d bytes)",
hostID, len(rep.Guests), len(rep.StorageTargets), len(body))
blocked := false
if cc, err := h.store.GetCustomerConfig(custID); err == nil && cc != nil && cc.Status == "blocked" {
+76
@@ -9,6 +9,8 @@ import (
"net/http/httptest"
"os"
"path/filepath"
"reflect"
"sort"
"strings"
"testing"
@@ -229,4 +231,78 @@ func TestHostReport_GoldenContract(t *testing.T) {
if guestCount != 2 {
t.Errorf("guests upserted = %d, want 2", guestCount)
}
// storage_targets (slice 5) must be persisted verbatim in report_json.
var reportJSON string
if err := db.QueryRow(`SELECT report_json FROM host_reports WHERE host_id='demo-host-01' ORDER BY id DESC LIMIT 1`).
Scan(&reportJSON); err != nil {
t.Fatal(err)
}
var parsed hostReportPayload
if err := json.Unmarshal([]byte(reportJSON), &parsed); err != nil {
t.Fatalf("persisted report_json does not parse: %v", err)
}
if len(parsed.StorageTargets) != 2 {
t.Fatalf("persisted storage_targets = %d, want 2", len(parsed.StorageTargets))
}
if parsed.StorageTargets[0].Type != "lvmthin" || parsed.StorageTargets[0].ThinPool == nil {
t.Errorf("storage_targets[0] = %+v, want lvmthin with thin_pool", parsed.StorageTargets[0])
}
}
// TestHostStorageTarget_GoldenContract asserts the hub's hostStorageTarget mirror covers
// the golden's storage_targets[0] field-for-field (the bidirectional key-set test, the hub
// half of the cross-repo contract). It round-trips the golden element through the mirror
// struct and requires the re-marshaled key set to match exactly — neither missing a field
// the agent sends nor inventing one it doesn't.
func TestHostStorageTarget_GoldenContract(t *testing.T) {
raw, err := os.ReadFile("testdata/host-report.golden.json")
if err != nil {
t.Fatal(err)
}
var golden struct {
StorageTargets []json.RawMessage `json:"storage_targets"`
}
if err := json.Unmarshal(raw, &golden); err != nil {
t.Fatal(err)
}
if len(golden.StorageTargets) == 0 {
t.Fatal("golden has no storage_targets to check")
}
var goldenKeys map[string]any
if err := json.Unmarshal(golden.StorageTargets[0], &goldenKeys); err != nil {
t.Fatal(err)
}
var mirror hostStorageTarget
if err := json.Unmarshal(golden.StorageTargets[0], &mirror); err != nil {
t.Fatalf("golden storage target does not parse into the mirror: %v", err)
}
b, err := json.Marshal(mirror)
if err != nil {
t.Fatal(err)
}
var mirrorKeys map[string]any
if err := json.Unmarshal(b, &mirrorKeys); err != nil {
t.Fatal(err)
}
assertSameStorageKeys(t, "storage_targets[0]", goldenKeys, mirrorKeys)
assertSameStorageKeys(t, "storage_targets[0].smart", goldenKeys["smart"], mirrorKeys["smart"])
assertSameStorageKeys(t, "storage_targets[0].thin_pool", goldenKeys["thin_pool"], mirrorKeys["thin_pool"])
}
func assertSameStorageKeys(t *testing.T, where string, a, b any) {
t.Helper()
ka, kb := sortedKeys(a), sortedKeys(b)
if !reflect.DeepEqual(ka, kb) {
t.Errorf("contract drift at %s:\n golden keys = %v\n mirror keys = %v", where, ka, kb)
}
}
func sortedKeys(v any) []string {
m, ok := v.(map[string]any)
if !ok {
return nil
}
keys := make([]string, 0, len(m))
for k := range m {
keys = append(keys, k)
}
sort.Strings(keys)
return keys
}
+57 -1
@@ -29,7 +29,63 @@
"controller_version": ""
}
],
"storage_targets": [],
"storage_targets": [
{
"name": "local-lvm",
"type": "lvmthin",
"durable_id": "pve/data",
"state": "attached",
"reachable": true,
"total_bytes": 100000000000,
"used_bytes": 42000000000,
"avail_bytes": 58000000000,
"used_fraction": 0.42,
"content": "rootdir,images",
"mount_path": "",
"backing_device": "",
"class_hint": "fast",
"role": "",
"thin_pool": { "data_used_fraction": 0.42, "metadata_used_fraction": null },
"smart": {
"health": "UNKNOWN",
"temperature_c": null,
"power_on_hours": null,
"reallocated_sectors": null,
"pending_sectors": null,
"offline_uncorrectable": null,
"critical_warning": null,
"media_errors": null,
"percentage_used": null
}
},
{
"name": "usb-backup",
"type": "usb",
"durable_id": "uuid:0fc63daf-8483-4772-8e79-3d69d8477de4",
"state": "attached",
"reachable": true,
"total_bytes": 2000000000000,
"used_bytes": 500000000000,
"avail_bytes": 1500000000000,
"used_fraction": 0.25,
"content": "backup",
"mount_path": "/mnt/usb-backup",
"backing_device": "/dev/sdb1",
"class_hint": "slow",
"role": "",
"smart": {
"health": "UNKNOWN",
"temperature_c": null,
"power_on_hours": null,
"reallocated_sectors": null,
"pending_sectors": null,
"offline_uncorrectable": null,
"critical_warning": null,
"media_errors": null,
"percentage_used": null
}
}
],
"backups": [],
"restore_tests": [],
"pbs_snapshots": [],