hub: restore-test "passed with warnings" visibility (v0.7.5)

Phase B (hub half) of the restore-test warning fix. The agent v0.7.0 now passes a
restore-test that emitted a benign start advisory (systemd-nesting) and carries the
warning text on the wire.

- hostRestoreTest gains warnings + warnings_recognized mirror fields (omitempty;
  absent recognized => false => louder unrecognized path)
- ingest logs [INFO] passed WITH WARNINGS (recognized), [WARN] for unrecognized;
  FAILED still [WARN]
- golden restore_tests[0] gains the keys, byte-identical with felhom-agent (sha256
  e6999d77...); bidirectional key-set contract test round-trips them
- no dashboard widget: no host-domain dashboard surface exists yet (log+persist only,
  as with pbs_snapshots) -- deferred to slice 10

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-09 19:41:21 +02:00
parent 5268411014
commit 4bd0909f2b
4 changed files with 93 additions and 23 deletions
+36 -20
View File
@@ -4,33 +4,49 @@
---
# REPORT — Hub: ingest agent pbs_snapshots (v0.7.4) (2026-06-09)
# REPORT — Hub: restore-test "passed with warnings" visibility (v0.7.5) (2026-06-09)
## Outcome
**Code committed + pushed (changelogged as `v0.7.4`); image build/deploy deferred to an
operator decision.** The felhom-agent slice-6 Phase B work populates the host-report's
`pbs_snapshots` (PBS offsite inventory + per-snapshot verify-state). This is the hub half:
accept + persist them. Minimal — the authoritative offsite policy is hub-owned (slice 10).
**Phase B (hub half) of `TASK — Restore-test must not false-fail on benign start warnings`.**
The agent (v0.7.0, already deployed + live-validated) now treats a benign guest-start advisory
(e.g. `WARN: Systemd 257 detected. You may need to enable nesting.`) as a PASS — verdict is
liveness, not the start-task exitstatus — and carries the warning text on the wire. This is the
hub half: ingest those fields and make a passed-with-warnings restore-test visible to the
operator instead of indistinguishable from a clean pass.
## What landed (`hub/internal/api/handler.go`, `host_test.go`, golden)
## What landed (`hub/internal/api/handler.go`, golden, `host_test.go`)
- `hostReportPayload` gains a `hostPBSSnapshot` mirror struct matching the agent's
`hub.PBSSnapshot` field-for-field, persisted via the existing `report_json` column.
- The handler logs a **FAILED PBS verify prominently** (`[WARN]` the loudest offsite-DR
signal); the host-report info line now counts pbs-snapshots too.
- The shared `testdata/host-report.golden.json` carries a populated `pbs_snapshots[0]`,
**byte-identical** with felhom-agent's copy; `TestHostPBSSnapshot_GoldenContract` is the
hub half of the bidirectional key-set test. `go test ./internal/api/` is green.
- **Wire mirror:** `hostRestoreTest` gains `warnings []string` + `warnings_recognized bool`
(`omitempty`), matching the agent's `hub.RestoreTest` field-for-field. An absent
`warnings_recognized``false` the **louder** unrecognized path, so a missing flag can only
over-notice, never hide a real warning.
- **Ingest behaviour:** a passed restore-test that carried warnings now logs
`[INFO] restore-test passed WITH WARNINGS (recognized)` when every warning is the known-benign
anchor, escalated to `[WARN] … UNRECOGNIZED WARNINGS` otherwise (as loud as a failed PBS
verify). A FAILED restore-test still logs the existing `[WARN] … FAILED`.
- **Contract:** `restore_tests[0]` in the host-report golden gains the two keys; the golden stays
**byte-identical** with felhom-agent's copy (sha256 `e6999d77…`), and the bidirectional
key-set contract test round-trips the new keys through `hostRestoreTest`. `go test ./...` green.
## Scope note — no dashboard widget this slice
The task asked to "surface in the dashboard distinctly from a clean pass." The hub web layer
currently renders **only controller-report data** — there is no host-domain dashboard surface
yet (guests/storage/restore_tests/pbs_snapshots are log+persist only; the failed-PBS-verify
signal is likewise log-only). Building one is out of scope here; distinct dashboard treatment
should land with the host-domain dashboard (slice 10). The operator signal this slice is the log
line, consistent with the established failed-PBS-verify precedent.
## Backward compatibility
An agent that omits/empties `pbs_snapshots` is accepted unchanged. The legacy controller
report path is untouched (frozen until the slice-10 cutover).
An agent that omits/empties `warnings`/`warnings_recognized` is accepted unchanged (the deployed
v0.7.4 hub already ignores them). The legacy controller report path is untouched.
## Deploy
## Deploy (GitOps)
> Per the GitOps flow (`CLAUDE.md`): build+push `gitea.dooplex.hu/admin/felhom-hub:v0.7.4`,
> bump `manifests/hub.yaml`, commit, then sync the `felhom` ArgoCD app. **Deferred** at this
> checkpoint — the change is backward-compatible, so the live hub (v0.7.3) keeps ingesting
> host-reports fine until then.
Build+push `gitea.dooplex.hu/admin/felhom-hub:v0.7.5` → bump the `image:` tag in
`manifests/hub.yaml` commit sync the `felhom` ArgoCD app (auto-sync off). Live-validated
after sync: the demo host's restore-test (agent v0.7.0, which passes-with-recognized-warnings on
the Debian-13 guest 9999) reflects on the hub as `passed WITH WARNINGS (recognized)` — not a
plain pass and not a FAILED.