hub v0.7.2: ingest agent storage_targets (slice 5 Phase A)

Accept + persist the now-populated host-report storage_targets. Minimal — the authoritative storage manifest is hub-owned (slice 10); this mirrors what the agent observes.

- hostReportPayload.StorageTargets: full mirror of the agent's hub.StorageTarget wire contract; persisted verbatim in report_json (no schema change); count + WARN on disconnected targets.
- shared host-report golden updated with two populated targets; byte-identical with felhom-agent's copy.
- TestHostStorageTarget_GoldenContract: hub half of the bidirectional key-set test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -1,132 +1,41 @@
# felhom.eu — task reports

> **Overwrite** this file with a summary of the most recent task only (uniform with the other repos; not cumulative). The cumulative hub history lives in [hub/CHANGELOG.md](hub/CHANGELOG.md). Sections below predate this convention change and are retained as history.
> **Overwrite** this file with a summary of the most recent task only (uniform with the other repos; not cumulative). The cumulative hub history lives in [hub/CHANGELOG.md](hub/CHANGELOG.md).

---

## Hub slice 3 — host-domain ingest (v0.7.0) — 2026-06-08
# REPORT — Hub: ingest agent `storage_targets` (v0.7.2) (2026-06-09)

Purely **additive** host-domain ingest in `hub/`: new tables, the agent's
`/host-report` heartbeat endpoint, per-host Bearer auth, a provisional host mint, and a
host-domain dead-man's-switch. The existing controller path is **untouched**; the schema/
auth cutover remains **slice 10**. Pushed to `main`; build/vet/test green locally and on
the build server.
## Outcome

### New tables (`store.go migrate()`, idempotent — `// v0.7.0: host-domain`)
- **`hosts`** — one per customer agent. Reality columns (`agent_version`, `last_report_at`)
+ operator-intent columns **INERT until slice 10** (`desired_json`, `desired_generation`,
`dr_record_json`).
- **`guests`** — one per controller LXC, PK `guest_id = "<host_id>/<vmid>"` (hub-derived).
Reality columns (`display_name`, `status`, `controller_version`, `vmid`, `last_seen_at`)
+ **INERT** `api_key`, `desired_spec_json`.
- **`host_reports`** — the report stream + denormalized columns (cpu/mem/disk %, guest
counts, cloudflared status); pruned by `Prune(maxDays)` alongside `reports`.
**Pushed as hub `v0.7.2`.** The felhom-agent slice-5 Phase A work populates the host-report's
`storage_targets` (previously a defined-but-empty stub). This change is the hub half: accept
and persist them. Deliberately minimal — the authoritative storage manifest (desired
class/role/policy/creds) is hub-owned and arrives at slice 10; this slice only mirrors what the
agent observes.

> Inert columns exist **now** so slice 10 needs no `ALTER`; nothing reads/writes them this
> slice. Migration is additive-only (no `DROP`, no edits to `reports`/`customer_configs`)
> and idempotent.
## What landed (`hub/internal/api/handler.go`, `host_test.go`, golden)

### New store methods
`GetHostByAPIKey`, `GetHost`, `ListHosts`, `UpsertHost` (updates only identity + `updated_at`
on conflict), `SaveHostReport` (inserts a report row + bumps reality columns only),
`UpsertGuestFromReport` (updates reality columns only — **preserves** `api_key`/
`desired_spec_json`), `GetHostStaleness` (skips never-reported hosts), `GuestID`.
Structs: `Host`, `Guest`, `HostReportDenorm`, `HostStaleRow`.
- `hostReportPayload` now parses `storage_targets` via a `hostStorageTarget` mirror struct that
matches the agent's `hub.StorageTarget` wire contract field-for-field (name/type/durable_id/
state/reachable/usage/content/mount/class_hint/role/`thin_pool`/`smart`).
- Persistence: the targets are stored verbatim in the existing `report_json` column (no schema
change / no migration). The handler counts them and logs a `[WARN]` listing disconnected
targets — the storage analog of host-down visibility.
- The shared `testdata/host-report.golden.json` now carries two populated targets (an lvmthin
with `thin_pool`, a usb) and is **byte-identical** with felhom-agent's copy.
- Tests: `TestHostStorageTarget_GoldenContract` is the hub half of the bidirectional key-set
test (round-trips the golden through the mirror, asserts exact key match);
`TestHostReport_GoldenContract` also asserts the targets persist + parse back. `go test
./internal/api/ ./internal/store/` is green.
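
The parse, count, and warn behaviour described in the list above can be sketched roughly as follows. This is an illustrative sketch, not the hub's actual code: the struct is trimmed to a few of the listed keys, the Go field types are assumed, `disconnectedTargets` is a hypothetical helper, the sample host and target names are made up, and "disconnected" is assumed to mean `reachable == false`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// hostStorageTarget is a trimmed, hypothetical mirror of the agent's wire
// contract. Only some of the documented keys are shown; types are assumed.
type hostStorageTarget struct {
	Name      string `json:"name"`
	Type      string `json:"type"`
	DurableID string `json:"durable_id"`
	State     string `json:"state"`
	Reachable bool   `json:"reachable"`
}

// hostReportPayload is reduced to the two fields this sketch needs.
type hostReportPayload struct {
	HostID         string              `json:"host_id"`
	StorageTargets []hostStorageTarget `json:"storage_targets"`
}

// disconnectedTargets returns the names of targets treated as disconnected.
// Assumption: a target is disconnected when reachable is false.
func disconnectedTargets(targets []hostStorageTarget) []string {
	var names []string
	for _, t := range targets {
		if !t.Reachable {
			names = append(names, t.Name)
		}
	}
	return names
}

func main() {
	// Made-up sample report; the state strings are placeholders.
	raw := []byte(`{"host_id":"acme-1a2b","storage_targets":[
		{"name":"local-lvm","type":"lvmthin","state":"active","reachable":true},
		{"name":"usb-backup","type":"usb","state":"disconnected","reachable":false}]}`)

	var p hostReportPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		log.Fatal(err)
	}

	down := disconnectedTargets(p.StorageTargets)
	fmt.Printf("storage_targets=%d disconnected=%d\n", len(p.StorageTargets), len(down))
	if len(down) > 0 {
		log.Printf("[WARN] host %s: disconnected storage targets: %s",
			p.HostID, strings.Join(down, ", "))
	}
}
```

Because the targets are persisted verbatim in `report_json`, a helper like this only needs to drive the count and the log line; it does not shape what is stored.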

### Auth (added; existing path unchanged)
`checkAuthHost(r)` → `(hostID, customerID, isGlobal, ok)`: global key → trust `body.host_id`;
per-host key → bound identity; failure → not-ok. `checkAuthCustomer` is byte-for-byte unchanged.
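
The three outcomes described for `checkAuthHost` can be illustrated with a small in-memory sketch. Everything here is hypothetical: `authenticator`, `identity`, and `resolveHost` are stand-ins, the real function takes the request, and the real lookup goes through the `hosts` table rather than a map. The customer ID is left empty for the global key because the source does not say how it is resolved in that case.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// identity is the host/customer pair bound to a per-host key (hypothetical).
type identity struct{ hostID, customerID string }

// authenticator is an in-memory stand-in for the hub's key lookup.
type authenticator struct {
	globalKey string
	perHost   map[string]identity
}

// resolveHost mirrors the described return shape (hostID, customerID, isGlobal, ok).
func (a authenticator) resolveHost(bearer, bodyHostID string) (hostID, customerID string, isGlobal, ok bool) {
	if bearer == "" {
		return "", "", false, false
	}
	// Global key: trust the host_id carried in the request body.
	if subtle.ConstantTimeCompare([]byte(bearer), []byte(a.globalKey)) == 1 {
		return bodyHostID, "", true, true
	}
	// Per-host key: the identity is bound to the key, not taken from the body.
	if id, found := a.perHost[bearer]; found {
		return id.hostID, id.customerID, false, true
	}
	return "", "", false, false
}

func main() {
	a := authenticator{
		globalKey: "global-secret",
		perHost:   map[string]identity{"host-secret": {hostID: "acme-1a2b", customerID: "acme"}},
	}
	fmt.Println(a.resolveHost("global-secret", "acme-9f9f"))
	fmt.Println(a.resolveHost("host-secret", "ignored"))
	fmt.Println(a.resolveHost("wrong", "acme-1a2b"))
}
```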
## Backward compatibility

### Endpoints
- **`POST /api/v1/host-report`** (the heartbeat): per-host auth; 4 MiB body; computes denorm
(`guest_running` counts only `status=="running"`); `SaveHostReport` + per-guest
`UpsertGuestFromReport` (a guest upsert failure is logged, not fatal — liveness); returns the
control envelope `{status:"ok", poll_interval_seconds:900, blocked, desired_generation:0,
has_signed_ops:false}`. `blocked` reflects `customer_configs.status`; the other two are
reserved placeholders (slice 4). Global-key bootstrap requires the host to already exist
(else 400); per-host key requires `body.host_id == hostID` (else 403).
- **`POST /api/v1/admin/hosts`** — **PROVISIONAL**, global-key only. Mints `host_id` (legible
`<customer>-<hex>`) + a random `api_key` (`configgen.RandomHex(32)`); 201 `{host_id, api_key}`.
Flagged in code as the slice-3 bootstrap to be removed/locked at enrollment (slices 7–8).
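
For reference, the control envelope returned by `/host-report` could be modelled as below. This is a sketch under assumptions: the struct and `newEnvelope` are hypothetical names, the integer types are guessed, and the `"blocked"` status string stands in for whatever value `customer_configs.status` actually uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// controlEnvelope follows the JSON keys listed for the heartbeat response.
type controlEnvelope struct {
	Status              string `json:"status"`
	PollIntervalSeconds int    `json:"poll_interval_seconds"`
	Blocked             bool   `json:"blocked"`
	DesiredGeneration   int    `json:"desired_generation"` // reserved placeholder
	HasSignedOps        bool   `json:"has_signed_ops"`     // reserved placeholder
}

// newEnvelope builds the response for a customer with the given status.
// Assumption: the literal "blocked" marks a blocked customer.
func newEnvelope(customerStatus string) controlEnvelope {
	return controlEnvelope{
		Status:              "ok",
		PollIntervalSeconds: 900,
		Blocked:             customerStatus == "blocked",
	}
}

func main() {
	out, err := json.Marshal(newEnvelope("active"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Leaving the two placeholder fields at their zero values keeps the wire shape stable until those fields carry real data.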
An older agent that sends `storage_targets: []` (or omits the field) is accepted unchanged.
The legacy controller report path is untouched (frozen until the slice-10 cutover).
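
The compatibility claim follows from how `encoding/json` treats absent and empty arrays when decoding into a slice. A minimal sketch, with a hypothetical `report` struct and `countTargets` helper standing in for the real payload type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// report is a reduced stand-in for the host-report payload.
type report struct {
	HostID         string            `json:"host_id"`
	StorageTargets []json.RawMessage `json:"storage_targets"`
}

// countTargets decodes a body and returns how many targets it carries.
func countTargets(body string) (int, error) {
	var r report
	if err := json.Unmarshal([]byte(body), &r); err != nil {
		return 0, err
	}
	return len(r.StorageTargets), nil
}

func main() {
	bodies := []string{
		`{"host_id":"h1"}`,                       // field omitted (older agent)
		`{"host_id":"h1","storage_targets":[]}`,  // empty array
		`{"host_id":"h1","storage_targets":[{"name":"local-lvm"}]}`,
	}
	for _, body := range bodies {
		n, err := countTargets(body)
		fmt.Println(n, err)
	}
}
```

The omitted and empty cases both decode without error to a zero-length slice, so the handler can treat them identically.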

### Host dead-man's-switch
`monitor.HostStalenessChecker` (`host_staleness.go`) — a **sibling** of the controller
`StalenessChecker`, keyed on host↔`host_reports`, emitting `host_stale`/`host_down`/
`host_recovered` (30m / 60m), attributed to the host's customer (so the existing per-customer
notification UX picks them up). Registered in `allowedEventTypes`; wired in `main.go` on the
existing 60s ticker. The controller staleness/deadline checkers are untouched and keep running.
## Deploy

### Contract
The `/host-report` JSON matches the agent spec §4 field-for-field (host_id, reported_at,
agent_version, host{…}, guests[{vmid,name,status,controller_version,spec}], cloudflared{status},
and the empty storage_targets/backups/restore_tests/pbs_snapshots/audit_tail — accepted
empty/absent). The envelope matches agent spec §5.

### Test matrix (new, hermetic — temp SQLite, no live data)
- **store**: upsert/lookup; a report-path update **preserves** `desired_json`/`desired_generation`;
guest upsert **preserves** `api_key`/`desired_spec_json` while updating reality; `GuestID`;
staleness skips never-reported.
- **auth**: `checkAuthHost` global / per-host / unknown.
- **ingest**: valid → 200 + envelope + denorm (`guest_running` = 1 of 2); host_id mismatch → 403;
unknown host under global key → 400; blocked customer → `blocked:true`; oversize body → 400.
- **admin mint**: non-global → 403; unknown customer → 400; success → 201 + minted key
round-trips through `/host-report`.
- **host staleness**: seed emits no events; ok→stale→down→recovered transitions.

### Untouched / deferred (explicit)
- **Controller path unchanged**: `/api/v1/report`, `reports`, `customer_configs`,
`checkAuthCustomer`, existing staleness + deadline checkers — additions only, all still green.
- **Not built** (per scope): desired-state serving, `signed_ops`, geo→hub, DR-record migration,
dashboard re-design. The cutover (drop `reports`→`guest_reports`, merge checkers, tighten the
provisional admin/global-key auth) remains **slice 10**.

### Versioning / deploy
Hub version is the `main.Version` ldflags var (`build.sh <VER>`), default `"dev"`; recorded
**v0.7.0** in `hub/CHANGELOG.md`. The image build + ArgoCD deploy are **not** part of this task
(no deploy performed).

### Repo state
Branch: `main`. Verified `go build/vet/test ./...` green in `hub/` locally (go1.26) and on the
build server (go1.26).

---

## Hub slice-3 follow-ups (v0.7.1) — 2026-06-08

Validation follow-ups (hub half). Pushed to `main`; build/vet/test green locally (go1.26) and on
the build server.

### §3 — `/host-report` rejects oversize with 413 (not silent truncation)
`handleHostReport` now reads `maxHostReportBytes+1` (const `4 << 20`, defined near
`defaultHostPollSeconds`) and returns **`413 Payload too large`** when exceeded, instead of relying
on `LimitReader` truncation (which could accept a truncated-but-valid JSON as a partial report,
dropping guests from the mirror). **Scope-frozen:** the controller `handleReport` 1 MiB read is
**unchanged** (diff touches only the host path); the small divergence is acceptable until cutover.
`TestHandleHostReport_OversizeRejected` now asserts 413.
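
The read-one-extra-byte technique described in §3 can be sketched as follows. `readBounded`, `handle`, and `post` are hypothetical names for illustration; only the `maxHostReportBytes` constant and the 413 outcome come from the text above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

const maxHostReportBytes = 4 << 20 // 4 MiB, as stated in the report

// readBounded reads at most max+1 bytes. Seeing the extra byte proves the
// body exceeded max, so the caller can reject instead of parsing a prefix.
func readBounded(r io.Reader, max int64) (body []byte, tooLarge bool, err error) {
	body, err = io.ReadAll(io.LimitReader(r, max+1))
	if err != nil {
		return nil, false, err
	}
	if int64(len(body)) > max {
		return nil, true, nil
	}
	return body, false, nil
}

// handle is a stand-in handler that only performs the size check.
func handle(w http.ResponseWriter, r *http.Request) {
	_, tooLarge, err := readBounded(r.Body, maxHostReportBytes)
	switch {
	case err != nil:
		http.Error(w, "read error", http.StatusBadRequest)
	case tooLarge:
		http.Error(w, "Payload too large", http.StatusRequestEntityTooLarge)
	default:
		w.WriteHeader(http.StatusOK)
	}
}

// post sends a body of the given size through the handler and returns the status.
func post(size int) int {
	req := httptest.NewRequest(http.MethodPost, "/api/v1/host-report",
		strings.NewReader(strings.Repeat("x", size)))
	rec := httptest.NewRecorder()
	handle(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(post(maxHostReportBytes), post(maxHostReportBytes+1)) // 200 413
}
```

A plain `io.LimitReader(r, max)` cannot distinguish a body of exactly `max` bytes from a longer one, which is why the extra byte is needed.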

### §4 — cross-repo contract golden fixture (hub half)
- `hub/internal/api/testdata/host-report.golden.json` — a **byte-identical copy** of felhom-agent's
golden (verified by md5).
- `TestHostReport_GoldenContract` — mints a host, POSTs the golden through the **real**
`handleHostReport`, asserts 200 + denorm (`guest_total=2`, `guest_running=1`,
`cloudflared_status="active"`) + both guests upserted. Proves `hostReportPayload` still extracts
the contract from the real wire shape.

**Caveat (called out):** the two golden files are a *duplicated* contract with no shared source of
truth. JSON can't hold a comment, so the mandatory "keep byte-identical" marker lives in each test
file's doc comment. When slices 5/6 add real `storage_targets`/`backups` fields, promote this to a
shared Go types module (the proper fix); this fixture is the bridge.
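
Until such a shared module exists, a key-set check is one way to catch drift between a golden object and its mirror struct. The sketch below shows the general idea only: `target`, `keys`, and `sameKeys` are hypothetical, the struct is cut down to three fields, and it relies on the mirror having no `omitempty` tags so that every field is always re-emitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// target is a cut-down, hypothetical mirror struct (no omitempty on purpose).
type target struct {
	Name      string `json:"name"`
	Type      string `json:"type"`
	Reachable bool   `json:"reachable"`
}

// keys returns the sorted top-level keys of a JSON object.
func keys(raw []byte) ([]string, error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	sort.Strings(out)
	return out, nil
}

// sameKeys round-trips golden through the mirror and compares key sets.
// A key the mirror lacks is dropped on re-encoding; a key the golden lacks
// is added on re-encoding. Either way the sets differ.
func sameKeys(golden []byte) (bool, error) {
	var t target
	if err := json.Unmarshal(golden, &t); err != nil {
		return false, err
	}
	back, err := json.Marshal(t)
	if err != nil {
		return false, err
	}
	want, err := keys(golden)
	if err != nil {
		return false, err
	}
	got, err := keys(back)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(want, got), nil
}

func main() {
	ok, err := sameKeys([]byte(`{"name":"usb-backup","type":"usb","reachable":false}`))
	fmt.Println(ok, err)
	ok, err = sameKeys([]byte(`{"name":"usb-backup","type":"usb","reachable":false,"mount":"/mnt"}`))
	fmt.Println(ok, err)
}
```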

### Versioning / scope
Recorded **v0.7.1** in `hub/CHANGELOG.md`. The hub version is the `main.Version` ldflags var
(`build.sh <VER>`, default `"dev"`) — there is no in-repo version constant to bump (the task's
pointer to `web/version.go` is the controller-image `VersionChecker`, unrelated); the image tag is
applied at build/deploy (ArgoCD), not in this task. No deploy performed.

### Untouched (confirmed)
Controller path (`handleReport`/`reports`/`customer_configs`/`checkAuthCustomer`/existing checkers)
unchanged. The agent's proxmox client timeout was a "confirm" item — already bounded (30s default),
no change.

### Repo state
Branch: `main`. Verified `go build/vet/test ./...` green in `hub/` locally (go1.26) and on the build server (go1.26).
Standard hub flow (build server 192.168.0.180): `./build.sh v0.7.2 --push` then deploy. If the
hub deployment is ArgoCD-managed, update the image tag via the managed path rather than a bare
`kubectl set image` (drift-correction would revert it).