Files
felhom.eu/REPORT.md
T
admin 7c0c75457f feat(hub): host-domain ingest — tables + /host-report + per-host auth + host dead-man's-switch (v0.7.0, slice 3)
Purely additive; the controller path (reports/customer_configs/checkAuthCustomer/
existing checkers) is untouched. Cutover remains slice 10.

- store: new hosts/guests/host_reports tables (full schema incl. columns INERT
  until slice 10, so no later ALTER); GetHostByAPIKey/GetHost/ListHosts/UpsertHost/
  SaveHostReport/UpsertGuestFromReport (preserves inert cols)/GetHostStaleness/
  GuestID; Prune also prunes host_reports.
- api: checkAuthHost (sibling of checkAuthCustomer); POST /host-report (per-host
  Bearer, 4MiB, denorm + guest upsert, control envelope); POST /admin/hosts
  (PROVISIONAL global-key host mint); host_* event types registered.
- monitor: HostStalenessChecker sibling over host_reports (host_stale/down/
  recovered), wired on the existing 60s ticker; controller checkers unchanged.
- tests (hermetic): store intent/inert-column preservation, auth, ingest
  (envelope+denorm, mismatch/unknown/blocked/oversize), admin mint round-trip,
  host staleness transitions.

CHANGELOG v0.7.0. Contract matches the agent host-report spec field-for-field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:36:16 +02:00

92 lines
5.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# felhom.eu — task reports
> One section per task, **appended** (newest last) — not overwritten. Cumulative
> hub history lives in [hub/CHANGELOG.md](hub/CHANGELOG.md).
---
## Hub slice 3 — host-domain ingest (v0.7.0) — 2026-06-08
Purely **additive** host-domain ingest in `hub/`: new tables, the agent's
`/host-report` heartbeat endpoint, per-host Bearer auth, a provisional host mint, and a
host-domain dead-man's-switch. The existing controller path is **untouched**; the schema/
auth cutover remains **slice 10**. Pushed to `main`; build/vet/test green locally and on
the build server.
### New tables (`store.go migrate()`, idempotent — `// v0.7.0: host-domain`)
- **`hosts`** — one per customer agent. Reality columns (`agent_version`, `last_report_at`)
+ operator-intent columns **INERT until slice 10** (`desired_json`, `desired_generation`,
`dr_record_json`).
- **`guests`** — one per controller LXC, PK `guest_id = "<host_id>/<vmid>"` (hub-derived).
Reality columns (`display_name`, `status`, `controller_version`, `vmid`, `last_seen_at`)
+ **INERT** `api_key`, `desired_spec_json`.
- **`host_reports`** — the report stream + denormalized columns (cpu/mem/disk %, guest
counts, cloudflared status); pruned by `Prune(maxDays)` alongside `reports`.
> Inert columns exist **now** so slice 10 needs no `ALTER`; nothing reads/writes them this
> slice. Migration is additive-only (no `DROP`, no edits to `reports`/`customer_configs`)
> and idempotent.
### New store methods
`GetHostByAPIKey`, `GetHost`, `ListHosts`, `UpsertHost` (updates only identity + `updated_at`
on conflict), `SaveHostReport` (inserts a report row + bumps reality columns only),
`UpsertGuestFromReport` (updates reality columns only — **preserves** `api_key`/
`desired_spec_json`), `GetHostStaleness` (skips never-reported hosts), `GuestID`.
Structs: `Host`, `Guest`, `HostReportDenorm`, `HostStaleRow`.
### Auth (added; existing path unchanged)
`checkAuthHost(r)``(hostID, customerID, isGlobal, ok)`: global key → trust `body.host_id`;
per-host key → bound identity; failure → not-ok. `checkAuthCustomer` is byte-for-byte unchanged.
### Endpoints
- **`POST /api/v1/host-report`** (the heartbeat): per-host auth; 4 MiB body; computes denorm
(`guest_running` counts only `status=="running"`); `SaveHostReport` + per-guest
`UpsertGuestFromReport` (a guest upsert failure is logged, not fatal — liveness); returns the
control envelope `{status:"ok", poll_interval_seconds:900, blocked, desired_generation:0,
has_signed_ops:false}`. `blocked` reflects `customer_configs.status`; the other two are
reserved placeholders (slice 4). Global-key bootstrap requires the host to already exist
(else 400); per-host key requires `body.host_id == hostID` (else 403).
- **`POST /api/v1/admin/hosts`** — **PROVISIONAL**, global-key only. Mints `host_id` (legible
`<customer>-<hex>`) + a random `api_key` (`configgen.RandomHex(32)`); 201 `{host_id, api_key}`.
Flagged in code as the slice-3 bootstrap to be removed/locked at enrollment (slices 78).
### Host dead-man's-switch
`monitor.HostStalenessChecker` (`host_staleness.go`) — a **sibling** of the controller
`StalenessChecker`, keyed on host↔`host_reports`, emitting `host_stale`/`host_down`/
`host_recovered` (30m / 60m), attributed to the host's customer (so the existing per-customer
notification UX picks them up). Registered in `allowedEventTypes`; wired in `main.go` on the
existing 60s ticker. The controller staleness/deadline checkers are untouched and keep running.
### Contract
The `/host-report` JSON matches the agent spec §4 field-for-field (host_id, reported_at,
agent_version, host{…}, guests[{vmid,name,status,controller_version,spec}], cloudflared{status},
and the empty storage_targets/backups/restore_tests/pbs_snapshots/audit_tail — accepted
empty/absent). The envelope matches agent spec §5.
### Test matrix (new, hermetic — temp SQLite, no live data)
- **store**: upsert/lookup; a report-path update **preserves** `desired_json`/`desired_generation`;
guest upsert **preserves** `api_key`/`desired_spec_json` while updating reality; `GuestID`;
staleness skips never-reported.
- **auth**: `checkAuthHost` global / per-host / unknown.
- **ingest**: valid → 200 + envelope + denorm (`guest_running` = 1 of 2); host_id mismatch → 403;
unknown host under global key → 400; blocked customer → `blocked:true`; oversize body → 400.
- **admin mint**: non-global → 403; unknown customer → 400; success → 201 + minted key
round-trips through `/host-report`.
- **host staleness**: seed emits no events; ok→stale→down→recovered transitions.
### Untouched / deferred (explicit)
- **Controller path unchanged**: `/api/v1/report`, `reports`, `customer_configs`,
`checkAuthCustomer`, existing staleness + deadline checkers — additions only, all still green.
- **Not built** (per scope): desired-state serving, `signed_ops`, geo→hub, DR-record migration,
dashboard re-design. The cutover (drop `reports``guest_reports`, merge checkers, tighten the
provisional admin/global-key auth) remains **slice 10**.
### Versioning / deploy
Hub version is the `main.Version` ldflags var (`build.sh <VER>`), default `"dev"`; recorded
**v0.7.0** in `hub/CHANGELOG.md`. The image build + ArgoCD deploy are **not** part of this task
(no deploy performed).
### Repo state
Branch: `main`. Verified `go build/vet/test ./...` green in `hub/` locally (go1.26) and on the
build server (go1.26).