diff --git a/REPORT.md b/REPORT.md
index 21e7d42..b17e7f4 100644
--- a/REPORT.md
+++ b/REPORT.md
@@ -1,132 +1,41 @@
 # felhom.eu — task reports
 
-> **Overwrite** this file with a summary of the most recent task only (uniform with the other repos; not cumulative). The cumulative hub history lives in [hub/CHANGELOG.md](hub/CHANGELOG.md). Sections below predate this convention change and are retained as history.
+> **Overwrite** this file with a summary of the most recent task only (uniform with the other repos; not cumulative). The cumulative hub history lives in [hub/CHANGELOG.md](hub/CHANGELOG.md).
 
 ---
 
-## Hub slice 3 — host-domain ingest (v0.7.0) — 2026-06-08
+# REPORT — Hub: ingest agent `storage_targets` (v0.7.2) (2026-06-09)
 
-Purely **additive** host-domain ingest in `hub/`: new tables, the agent's
-`/host-report` heartbeat endpoint, per-host Bearer auth, a provisional host mint, and a
-host-domain dead-man's-switch. The existing controller path is **untouched**; the schema/
-auth cutover remains **slice 10**. Pushed to `main`; build/vet/test green locally and on
-the build server.
+## Outcome
 
-### New tables (`store.go migrate()`, idempotent — `// v0.7.0: host-domain`)
-- **`hosts`** — one per customer agent. Reality columns (`agent_version`, `last_report_at`)
-  + operator-intent columns **INERT until slice 10** (`desired_json`, `desired_generation`,
-  `dr_record_json`).
-- **`guests`** — one per controller LXC, PK `guest_id = "/"` (hub-derived).
-  Reality columns (`display_name`, `status`, `controller_version`, `vmid`, `last_seen_at`)
-  + **INERT** `api_key`, `desired_spec_json`.
-- **`host_reports`** — the report stream + denormalized columns (cpu/mem/disk %, guest
-  counts, cloudflared status); pruned by `Prune(maxDays)` alongside `reports`.
+**Pushed as hub `v0.7.2`.** The felhom-agent slice-5 Phase A work populates the host-report's
+`storage_targets` (previously a defined-but-empty stub). This change is the hub half: accept
+and persist them. Deliberately minimal — the authoritative storage manifest (desired
+class/role/policy/creds) is hub-owned and arrives at slice 10; this slice only mirrors what the
+agent observes.
 
-> Inert columns exist **now** so slice 10 needs no `ALTER`; nothing reads/writes them this
-> slice. Migration is additive-only (no `DROP`, no edits to `reports`/`customer_configs`)
-> and idempotent.
+## What landed (`hub/internal/api/handler.go`, `host_test.go`, golden)
 
-### New store methods
-`GetHostByAPIKey`, `GetHost`, `ListHosts`, `UpsertHost` (updates only identity + `updated_at`
-on conflict), `SaveHostReport` (inserts a report row + bumps reality columns only),
-`UpsertGuestFromReport` (updates reality columns only — **preserves** `api_key`/
-`desired_spec_json`), `GetHostStaleness` (skips never-reported hosts), `GuestID`.
-Structs: `Host`, `Guest`, `HostReportDenorm`, `HostStaleRow`.
+- `hostReportPayload` now parses `storage_targets` via a `hostStorageTarget` mirror struct that
+  matches the agent's `hub.StorageTarget` wire contract field-for-field (name/type/durable_id/
+  state/reachable/usage/content/mount/class_hint/role/`thin_pool`/`smart`).
+- Persistence: the targets are stored verbatim in the existing `report_json` column (no schema
+  change / no migration). The handler counts them and logs a `[WARN]` listing disconnected
+  targets — the storage analog of host-down visibility.
+- The shared `testdata/host-report.golden.json` now carries two populated targets (an lvmthin
+  with `thin_pool`, a usb) and is **byte-identical** with felhom-agent's copy.
+- Tests: `TestHostStorageTarget_GoldenContract` is the hub half of the bidirectional key-set
+  test (round-trips the golden through the mirror, asserts exact key match);
+  `TestHostReport_GoldenContract` also asserts the targets persist + parse back. `go test
+  ./internal/api/ ./internal/store/` is green.
 
-### Auth (added; existing path unchanged)
-`checkAuthHost(r)` → `(hostID, customerID, isGlobal, ok)`: global key → trust `body.host_id`;
-per-host key → bound identity; failure → not-ok. `checkAuthCustomer` is byte-for-byte unchanged.
+## Backward compatibility
 
-### Endpoints
-- **`POST /api/v1/host-report`** (the heartbeat): per-host auth; 4 MiB body; computes denorm
-  (`guest_running` counts only `status=="running"`); `SaveHostReport` + per-guest
-  `UpsertGuestFromReport` (a guest upsert failure is logged, not fatal — liveness); returns the
-  control envelope `{status:"ok", poll_interval_seconds:900, blocked, desired_generation:0,
-  has_signed_ops:false}`. `blocked` reflects `customer_configs.status`; the other two are
-  reserved placeholders (slice 4). Global-key bootstrap requires the host to already exist
-  (else 400); per-host key requires `body.host_id == hostID` (else 403).
-- **`POST /api/v1/admin/hosts`** — **PROVISIONAL**, global-key only. Mints `host_id` (legible
-  `-`) + a random `api_key` (`configgen.RandomHex(32)`); 201 `{host_id, api_key}`.
-  Flagged in code as the slice-3 bootstrap to be removed/locked at enrollment (slices 7–8).
+An older agent that sends `storage_targets: []` (or omits the field) is accepted unchanged.
+The legacy controller report path is untouched (frozen until the slice-10 cutover).
 
-### Host dead-man's-switch
-`monitor.HostStalenessChecker` (`host_staleness.go`) — a **sibling** of the controller
-`StalenessChecker`, keyed on host↔`host_reports`, emitting `host_stale`/`host_down`/
-`host_recovered` (30m / 60m), attributed to the host's customer (so the existing per-customer
-notification UX picks them up). Registered in `allowedEventTypes`; wired in `main.go` on the
-existing 60s ticker. The controller staleness/deadline checkers are untouched and keep running.
+## Deploy
 
-### Contract
-The `/host-report` JSON matches the agent spec §4 field-for-field (host_id, reported_at,
-agent_version, host{…}, guests[{vmid,name,status,controller_version,spec}], cloudflared{status},
-and the empty storage_targets/backups/restore_tests/pbs_snapshots/audit_tail — accepted
-empty/absent). The envelope matches agent spec §5.
-
-### Test matrix (new, hermetic — temp SQLite, no live data)
-- **store**: upsert/lookup; a report-path update **preserves** `desired_json`/`desired_generation`;
-  guest upsert **preserves** `api_key`/`desired_spec_json` while updating reality; `GuestID`;
-  staleness skips never-reported.
-- **auth**: `checkAuthHost` global / per-host / unknown.
-- **ingest**: valid → 200 + envelope + denorm (`guest_running` = 1 of 2); host_id mismatch → 403;
-  unknown host under global key → 400; blocked customer → `blocked:true`; oversize body → 400.
-- **admin mint**: non-global → 403; unknown customer → 400; success → 201 + minted key
-  round-trips through `/host-report`.
-- **host staleness**: seed emits no events; ok→stale→down→recovered transitions.
-
-### Untouched / deferred (explicit)
-- **Controller path unchanged**: `/api/v1/report`, `reports`, `customer_configs`,
-  `checkAuthCustomer`, existing staleness + deadline checkers — additions only, all still green.
-- **Not built** (per scope): desired-state serving, `signed_ops`, geo→hub, DR-record migration,
-  dashboard re-design. The cutover (drop `reports`→`guest_reports`, merge checkers, tighten the
-  provisional admin/global-key auth) remains **slice 10**.
-
-### Versioning / deploy
-Hub version is the `main.Version` ldflags var (`build.sh `), default `"dev"`; recorded
-**v0.7.0** in `hub/CHANGELOG.md`. The image build + ArgoCD deploy are **not** part of this task
-(no deploy performed).
-
-### Repo state
-Branch: `main`. Verified `go build/vet/test ./...` green in `hub/` locally (go1.26) and on the
-build server (go1.26).
-
----
-
-## Hub slice-3 follow-ups (v0.7.1) — 2026-06-08
-
-Validation follow-ups (hub half). Pushed to `main`; build/vet/test green locally (go1.26) and on
-the build server.
-
-### §3 — `/host-report` rejects oversize with 413 (not silent truncation)
-`handleHostReport` now reads `maxHostReportBytes+1` (const `4 << 20`, defined near
-`defaultHostPollSeconds`) and returns **`413 Payload too large`** when exceeded, instead of relying
-on `LimitReader` truncation (which could accept a truncated-but-valid JSON as a partial report,
-dropping guests from the mirror). **Scope-frozen:** the controller `handleReport` 1 MiB read is
-**unchanged** (diff touches only the host path); the small divergence is acceptable until cutover.
-`TestHandleHostReport_OversizeRejected` now asserts 413.
-
-### §4 — cross-repo contract golden fixture (hub half)
-- `hub/internal/api/testdata/host-report.golden.json` — a **byte-identical copy** of felhom-agent's
-  golden (verified by md5).
-- `TestHostReport_GoldenContract` — mints a host, POSTs the golden through the **real**
-  `handleHostReport`, asserts 200 + denorm (`guest_total=2`, `guest_running=1`,
-  `cloudflared_status="active"`) + both guests upserted. Proves `hostReportPayload` still extracts
-  the contract from the real wire shape.
-
-**Caveat (called out):** the two golden files are a *duplicated* contract with no shared source of
-truth. JSON can't hold a comment, so the mandatory "keep byte-identical" marker lives in each test
-file's doc comment. When slices 5/6 add real `storage_targets`/`backups` fields, promote this to a
-shared Go types module (the proper fix); this fixture is the bridge.
-
-### Versioning / scope
-Recorded **v0.7.1** in `hub/CHANGELOG.md`. The hub version is the `main.Version` ldflags var
-(`build.sh `, default `"dev"`) — there is no in-repo version constant to bump (the task's
-pointer to `web/version.go` is the controller-image `VersionChecker`, unrelated); the image tag is
-applied at build/deploy (ArgoCD), not in this task. No deploy performed.
-
-### Untouched (confirmed)
-Controller path (`handleReport`/`reports`/`customer_configs`/`checkAuthCustomer`/existing checkers)
-unchanged. The agent's proxmox client timeout was a "confirm" item — already bounded (30s default),
-no change.
-
-### Repo state
-Branch: `main`. Verified `go build/vet/test ./...` green in `hub/` locally (go1.26) and on the build server (go1.26).
+Standard hub flow (build server 192.168.0.180): `./build.sh v0.7.2 --push` then deploy. If the
+hub deployment is ArgoCD-managed, update the image tag via the managed path rather than a bare
+`kubectl set image` (drift-correction would revert it).
diff --git a/hub/CHANGELOG.md b/hub/CHANGELOG.md
index 327977e..d813330 100644
--- a/hub/CHANGELOG.md
+++ b/hub/CHANGELOG.md
@@ -1,5 +1,29 @@
 # Felhom Hub — Changelog
 
+## v0.7.2 — ingest agent storage_targets (slice 5 Phase A) (2026-06-09)
+
+The agent's slice-5 work populates the host-report's `storage_targets` (previously empty).
+This is the hub half: accept + persist them. Minimal by design — the rich, authoritative
+storage manifest (desired class/role/policy/creds) is hub-owned and lands at slice 10; this
+slice only mirrors what the agent observes.
+
+### Added
+- **`hostReportPayload.StorageTargets`** (`internal/api/handler.go`) — a full mirror of the
+  agent's `hub.StorageTarget` wire contract (name/type/durable_id/state/reachable/usage/
+  content/mount/class_hint/role/`thin_pool`/`smart`). The targets are persisted verbatim in
+  the existing `report_json` row (no schema change); the handler counts them and logs a
+  `[WARN]` when any are `disconnected` (the storage analog of host-down visibility).
+- **`testdata/host-report.golden.json`** — updated to carry two populated `storage_targets`
+  (an lvmthin with `thin_pool`, a usb), kept **byte-identical** with felhom-agent's copy.
+- **`TestHostStorageTarget_GoldenContract`** — the hub half of the bidirectional key-set test:
+  round-trips the golden's `storage_targets[0]` through the mirror struct and asserts the key
+  set matches exactly (no missing/extra fields vs the agent). `TestHostReport_GoldenContract`
+  also now asserts the targets are persisted + parse back.
+
+### Notes
+- Backward-compatible: an older agent that sends `storage_targets: []` (or omits it) is
+  accepted unchanged. The legacy controller report path is untouched (frozen until slice 10).
+
 ## Repo docs — no hub version change (2026-06-08)
 
 ### Changed
diff --git a/hub/internal/api/handler.go b/hub/internal/api/handler.go
index ef9cbb8..13832d9 100644
--- a/hub/internal/api/handler.go
+++ b/hub/internal/api/handler.go
@@ -234,9 +234,15 @@ const defaultHostPollSeconds = 900
 const maxHostReportBytes = 4 << 20 // 4 MiB
 
 // hostReportPayload is the subset of the agent host-report (slice-3 contract,
-// §3 / agent spec §4) the hub needs for denorm + guest reality. Unknown fields
-// (storage_targets/backups/restore_tests/pbs_snapshots/audit_tail) are ignored,
-// so an empty or absent collection is accepted without error.
+// §3 / agent spec §4) the hub needs for denorm + guest reality. The remaining fields
+// (backups/restore_tests/pbs_snapshots/audit_tail) are ignored, so an empty or absent
+// collection is accepted without error.
+//
+// storage_targets (slice 5) is now parsed: the agent populates it, and the hub accepts
+// + persists it. Persistence is the full report_json row (which carries the targets
+// verbatim) plus the denorm counts below — the RICH manifest schema (desired class/role/
+// policy/creds) is hub-owned and lands in slice 10; this slice only mirrors what the agent
+// observes.
 type hostReportPayload struct {
 	HostID       string `json:"host_id"`
 	AgentVersion string `json:"agent_version"`
@@ -251,11 +257,49 @@ type hostReportPayload struct {
 		Status            string `json:"status"`
 		ControllerVersion string `json:"controller_version"`
 	} `json:"guests"`
-	Cloudflared struct {
+	StorageTargets []hostStorageTarget `json:"storage_targets"`
+	Cloudflared    struct {
 		Status string `json:"status"`
 	} `json:"cloudflared"`
 }
 
+// hostStorageTarget mirrors the agent's hub.StorageTarget wire contract field-for-field.
+// It is a DUPLICATED contract (no shared types module yet); testdata/host-report.golden.json
+// must stay byte-identical with felhom-agent's copy and the key-set test guards drift.
+// The hub does not act on these yet beyond persisting + counting them (slice 10 adds the
+// authoritative manifest), but mirroring the full shape keeps the cross-repo contract honest.
+type hostStorageTarget struct {
+	Name          string  `json:"name"`
+	Type          string  `json:"type"`
+	DurableID     string  `json:"durable_id"`
+	State         string  `json:"state"`
+	Reachable     bool    `json:"reachable"`
+	TotalBytes    int64   `json:"total_bytes"`
+	UsedBytes     int64   `json:"used_bytes"`
+	AvailBytes    int64   `json:"avail_bytes"`
+	UsedFraction  float64 `json:"used_fraction"`
+	Content       string  `json:"content"`
+	MountPath     string  `json:"mount_path"`
+	BackingDevice string  `json:"backing_device"`
+	ClassHint     string  `json:"class_hint"`
+	Role          string  `json:"role"`
+	ThinPool      *struct {
+		DataUsedFraction     float64  `json:"data_used_fraction"`
+		MetadataUsedFraction *float64 `json:"metadata_used_fraction"`
+	} `json:"thin_pool,omitempty"`
+	Smart struct {
+		Health               string `json:"health"`
+		TemperatureC         *int   `json:"temperature_c"`
+		PowerOnHours         *int   `json:"power_on_hours"`
+		ReallocatedSectors   *int   `json:"reallocated_sectors"`
+		PendingSectors       *int   `json:"pending_sectors"`
+		OfflineUncorrectable *int   `json:"offline_uncorrectable"`
+		CriticalWarning      *int   `json:"critical_warning"`
+		MediaErrors          *int   `json:"media_errors"`
+		PercentageUsed       *int   `json:"percentage_used"`
+	} `json:"smart"`
+}
+
 // handleHostReport ingests the agent's host-report (the heartbeat) and returns the
 // control envelope (agent spec §5).
 func (h *Handler) handleHostReport(w http.ResponseWriter, r *http.Request) {
@@ -340,7 +384,22 @@ func (h *Handler) handleHostReport(w http.ResponseWriter, r *http.Request) {
 		}
 	}
 
-	h.logger.Printf("[INFO] host-report from %s (%d guests, %d bytes)", hostID, len(rep.Guests), len(body))
+	// storage_targets (slice 5): persisted as part of report_json above. Count + surface
+	// disconnected ones in the log (the slice-10 manifest will reconcile them; for now the
+	// signal is the visibility — a disconnected target is the storage analog of host-down).
+	disconnected := 0
+	for _, st := range rep.StorageTargets {
+		if st.State == "disconnected" {
+			disconnected++
+		}
+	}
+	if disconnected > 0 {
+		h.logger.Printf("[WARN] host %s reports %d disconnected storage target(s) of %d",
+			hostID, disconnected, len(rep.StorageTargets))
+	}
+
+	h.logger.Printf("[INFO] host-report from %s (%d guests, %d storage targets, %d bytes)",
+		hostID, len(rep.Guests), len(rep.StorageTargets), len(body))
 
 	blocked := false
 	if cc, err := h.store.GetCustomerConfig(custID); err == nil && cc != nil && cc.Status == "blocked" {
diff --git a/hub/internal/api/host_test.go b/hub/internal/api/host_test.go
index 6a09bd2..c6aa30d 100644
--- a/hub/internal/api/host_test.go
+++ b/hub/internal/api/host_test.go
@@ -9,6 +9,8 @@ import (
 	"net/http/httptest"
 	"os"
 	"path/filepath"
+	"reflect"
+	"sort"
 	"strings"
 	"testing"
 
@@ -229,4 +231,78 @@ func TestHostReport_GoldenContract(t *testing.T) {
 	if guestCount != 2 {
 		t.Errorf("guests upserted = %d, want 2", guestCount)
 	}
+
+	// storage_targets (slice 5) must be persisted verbatim in report_json.
+	var reportJSON string
+	if err := db.QueryRow(`SELECT report_json FROM host_reports WHERE host_id='demo-host-01' ORDER BY id DESC LIMIT 1`).
+		Scan(&reportJSON); err != nil {
+		t.Fatal(err)
+	}
+	var parsed hostReportPayload
+	if err := json.Unmarshal([]byte(reportJSON), &parsed); err != nil {
+		t.Fatalf("persisted report_json does not parse: %v", err)
+	}
+	if len(parsed.StorageTargets) != 2 {
+		t.Errorf("persisted storage_targets = %d, want 2", len(parsed.StorageTargets))
+	}
+	if parsed.StorageTargets[0].Type != "lvmthin" || parsed.StorageTargets[0].ThinPool == nil {
+		t.Errorf("storage_targets[0] = %+v, want lvmthin with thin_pool", parsed.StorageTargets[0])
+	}
+}
+
+// TestHostStorageTarget_GoldenContract asserts the hub's hostStorageTarget mirror covers
+// the golden's storage_targets[0] field-for-field (the bidirectional key-set test, the hub
+// half of the cross-repo contract). It round-trips the golden element through the mirror
+// struct and requires the re-marshaled key set to match exactly — neither missing a field
+// the agent sends nor inventing one it doesn't.
+func TestHostStorageTarget_GoldenContract(t *testing.T) {
+	raw, err := os.ReadFile("testdata/host-report.golden.json")
+	if err != nil {
+		t.Fatal(err)
+	}
+	var golden struct {
+		StorageTargets []json.RawMessage `json:"storage_targets"`
+	}
+	if err := json.Unmarshal(raw, &golden); err != nil {
+		t.Fatal(err)
+	}
+	if len(golden.StorageTargets) == 0 {
+		t.Fatal("golden has no storage_targets to check")
+	}
+
+	var goldenKeys map[string]any
+	json.Unmarshal(golden.StorageTargets[0], &goldenKeys)
+
+	var mirror hostStorageTarget
+	if err := json.Unmarshal(golden.StorageTargets[0], &mirror); err != nil {
+		t.Fatalf("golden storage target does not parse into the mirror: %v", err)
+	}
+	b, _ := json.Marshal(mirror)
+	var mirrorKeys map[string]any
+	json.Unmarshal(b, &mirrorKeys)
+
+	assertSameStorageKeys(t, "storage_targets[0]", goldenKeys, mirrorKeys)
+	assertSameStorageKeys(t, "storage_targets[0].smart", goldenKeys["smart"], mirrorKeys["smart"])
+	assertSameStorageKeys(t, "storage_targets[0].thin_pool", goldenKeys["thin_pool"], mirrorKeys["thin_pool"])
+}
+
+func assertSameStorageKeys(t *testing.T, where string, a, b any) {
+	t.Helper()
+	ka, kb := sortedKeys(a), sortedKeys(b)
+	if !reflect.DeepEqual(ka, kb) {
+		t.Errorf("contract drift at %s:\n golden keys = %v\n mirror keys = %v", where, ka, kb)
+	}
+}
+
+func sortedKeys(v any) []string {
+	m, ok := v.(map[string]any)
+	if !ok {
+		return nil
+	}
+	keys := make([]string, 0, len(m))
+	for k := range m {
+		keys = append(keys, k)
+	}
+	sort.Strings(keys)
+	return keys
 }
diff --git a/hub/internal/api/testdata/host-report.golden.json b/hub/internal/api/testdata/host-report.golden.json
index e2a3066..73ff263 100644
--- a/hub/internal/api/testdata/host-report.golden.json
+++ b/hub/internal/api/testdata/host-report.golden.json
@@ -29,7 +29,63 @@
       "controller_version": ""
     }
   ],
-  "storage_targets": [],
+  "storage_targets": [
+    {
+      "name": "local-lvm",
+      "type": "lvmthin",
+      "durable_id": "pve/data",
+      "state": "attached",
+      "reachable": true,
+      "total_bytes": 100000000000,
+      "used_bytes": 42000000000,
+      "avail_bytes": 58000000000,
+      "used_fraction": 0.42,
+      "content": "rootdir,images",
+      "mount_path": "",
+      "backing_device": "",
+      "class_hint": "fast",
+      "role": "",
+      "thin_pool": { "data_used_fraction": 0.42, "metadata_used_fraction": null },
+      "smart": {
+        "health": "UNKNOWN",
+        "temperature_c": null,
+        "power_on_hours": null,
+        "reallocated_sectors": null,
+        "pending_sectors": null,
+        "offline_uncorrectable": null,
+        "critical_warning": null,
+        "media_errors": null,
+        "percentage_used": null
+      }
+    },
+    {
+      "name": "usb-backup",
+      "type": "usb",
+      "durable_id": "uuid:0fc63daf-8483-4772-8e79-3d69d8477de4",
+      "state": "attached",
+      "reachable": true,
+      "total_bytes": 2000000000000,
+      "used_bytes": 500000000000,
+      "avail_bytes": 1500000000000,
+      "used_fraction": 0.25,
+      "content": "backup",
+      "mount_path": "/mnt/usb-backup",
+      "backing_device": "/dev/sdb1",
+      "class_hint": "slow",
+      "role": "",
+      "smart": {
+        "health": "UNKNOWN",
+        "temperature_c": null,
+        "power_on_hours": null,
+        "reallocated_sectors": null,
+        "pending_sectors": null,
+        "offline_uncorrectable": null,
+        "critical_warning": null,
+        "media_errors": null,
+        "percentage_used": null
+      }
+    }
+  ],
   "backups": [],
   "restore_tests": [],
   "pbs_snapshots": [],