hub: restore-test "passed with warnings" visibility (v0.7.5)

Phase B (hub half) of the restore-test warning fix. The agent v0.7.0 now passes a restore-test that emitted a benign start advisory (systemd-nesting) and carries the warning text on the wire.

- hostRestoreTest gains warnings + warnings_recognized mirror fields (omitempty; absent recognized => false => louder unrecognized path)
- ingest logs [INFO] passed WITH WARNINGS (recognized), [WARN] for unrecognized; FAILED still [WARN]
- golden restore_tests[0] gains the keys, byte-identical with felhom-agent (sha256 e6999d77...); bidirectional key-set contract test round-trips them
- no dashboard widget: no host-domain dashboard surface exists yet (log+persist only, as with pbs_snapshots) -- deferred to slice 10

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -4,33 +4,49 @@
 
 ---
 
-# REPORT — Hub: ingest agent pbs_snapshots (v0.7.4) (2026-06-09)
+# REPORT — Hub: restore-test "passed with warnings" visibility (v0.7.5) (2026-06-09)
 
 ## Outcome
 
-**Code committed + pushed (changelogged as `v0.7.4`); image build/deploy deferred to an
-operator decision.** The felhom-agent slice-6 Phase B work populates the host-report's
-`pbs_snapshots` (PBS offsite inventory + per-snapshot verify-state). This is the hub half:
-accept + persist them. Minimal — the authoritative offsite policy is hub-owned (slice 10).
+**Phase B (hub half) of `TASK — Restore-test must not false-fail on benign start warnings`.**
+The agent (v0.7.0, already deployed + live-validated) now treats a benign guest-start advisory
+(e.g. `WARN: Systemd 257 detected. You may need to enable nesting.`) as a PASS — verdict is
+liveness, not the start-task exitstatus — and carries the warning text on the wire. This is the
+hub half: ingest those fields and make a passed-with-warnings restore-test visible to the
+operator instead of indistinguishable from a clean pass.
 
-## What landed (`hub/internal/api/handler.go`, `host_test.go`, golden)
+## What landed (`hub/internal/api/handler.go`, golden, `host_test.go`)
 
-- `hostReportPayload` gains a `hostPBSSnapshot` mirror struct matching the agent's
-`hub.PBSSnapshot` field-for-field, persisted via the existing `report_json` column.
-- The handler logs a **FAILED PBS verify prominently** (`[WARN]` — the loudest offsite-DR
-signal); the host-report info line now counts pbs-snapshots too.
-- The shared `testdata/host-report.golden.json` carries a populated `pbs_snapshots[0]`,
-**byte-identical** with felhom-agent's copy; `TestHostPBSSnapshot_GoldenContract` is the
-hub half of the bidirectional key-set test. `go test ./internal/api/` is green.
+- **Wire mirror:** `hostRestoreTest` gains `warnings []string` + `warnings_recognized bool`
+(`omitempty`), matching the agent's `hub.RestoreTest` field-for-field. An absent
+`warnings_recognized` ⇒ `false` ⇒ the **louder** unrecognized path, so a missing flag can only
+over-notice, never hide a real warning.
+- **Ingest behaviour:** a passed restore-test that carried warnings now logs
+`[INFO] restore-test passed WITH WARNINGS (recognized)` when every warning is the known-benign
+anchor, escalated to `[WARN] … UNRECOGNIZED WARNINGS` otherwise (as loud as a failed PBS
+verify). A FAILED restore-test still logs the existing `[WARN] … FAILED`.
+- **Contract:** `restore_tests[0]` in the host-report golden gains the two keys; the golden stays
+**byte-identical** with felhom-agent's copy (sha256 `e6999d77…`), and the bidirectional
+key-set contract test round-trips the new keys through `hostRestoreTest`. `go test ./...` green.
 
+## Scope note — no dashboard widget this slice
+
+The task asked to "surface in the dashboard distinctly from a clean pass." The hub web layer
+currently renders **only controller-report data** — there is no host-domain dashboard surface
+yet (guests/storage/restore_tests/pbs_snapshots are log+persist only; the failed-PBS-verify
+signal is likewise log-only). Building one is out of scope here; distinct dashboard treatment
+should land with the host-domain dashboard (slice 10). The operator signal this slice is the log
+line, consistent with the established failed-PBS-verify precedent.
+
 ## Backward compatibility
 
-An agent that omits/empties `pbs_snapshots` is accepted unchanged. The legacy controller
-report path is untouched (frozen until the slice-10 cutover).
+An agent that omits/empties `warnings`/`warnings_recognized` is accepted unchanged (the deployed
+v0.7.4 hub already ignores them). The legacy controller report path is untouched.
 
-## Deploy
+## Deploy (GitOps)
 
-> Per the GitOps flow (`CLAUDE.md`): build+push `gitea.dooplex.hu/admin/felhom-hub:v0.7.4`,
-> bump `manifests/hub.yaml`, commit, then sync the `felhom` ArgoCD app. **Deferred** at this
-> checkpoint — the change is backward-compatible, so the live hub (v0.7.3) keeps ingesting
-> host-reports fine until then.
+Build+push `gitea.dooplex.hu/admin/felhom-hub:v0.7.5` → bump the `image:` tag in
+`manifests/hub.yaml` → commit → sync the `felhom` ArgoCD app (auto-sync off). Live-validated
+after sync: the demo host's restore-test (agent v0.7.0, which passes-with-recognized-warnings on
+the Debian-13 guest 9999) reflects on the hub as `passed WITH WARNINGS (recognized)` — not a
+plain pass and not a FAILED.
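The report says the golden stays byte-identical with felhom-agent's copy and quotes a truncated sha256 (`e6999d77…`), but the commit does not show how that digest is produced. Below is a minimal sketch. It assumes the check is a plain SHA-256 over the raw file bytes. `goldenDigest` and `sameGolden` are hypothetical helpers, not code from either repository.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// goldenDigest returns the lowercase hex SHA-256 of a golden file's raw bytes
// (hypothetical helper; byte-identical copies always yield the same digest).
func goldenDigest(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// sameGolden reports whether two copies are byte-identical. With both copies at hand a
// direct byte comparison suffices; the digest is what gets compared across repositories.
func sameGolden(hubCopy, agentCopy []byte) bool {
	return bytes.Equal(hubCopy, agentCopy)
}

func main() {
	hubCopy := []byte("{\n  \"pass\": true\n}\n")
	agentCopy := []byte("{\n  \"pass\": true\n}") // same JSON, no trailing newline

	fmt.Println(sameGolden(hubCopy, agentCopy))                   // false
	fmt.Println(goldenDigest(hubCopy) == goldenDigest(agentCopy)) // false
	fmt.Println(goldenDigest(hubCopy)[:8])                        // 8-hex-char short form, as quoted in the report
}
```

A single differing byte, such as a missing trailing newline, changes the digest even when both copies decode to the same JSON. That is why a byte-level check is stricter than the key-set contract test.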
@@ -1,5 +1,35 @@
 # Felhom Hub — Changelog
 
+## v0.7.5 — restore-test "passed with warnings" visibility (2026-06-09)
+
+Hub half of `TASK — Restore-test must not false-fail on benign start warnings` (Phase B). The
+agent (v0.7.0) now treats a guest-start advisory like the systemd-nesting warning as a PASS
+(verdict is liveness, not the start exitstatus) and carries the warning text on the wire. This
+makes that visible to the operator instead of indistinguishable from a clean pass.
+
+### Added
+- `hostRestoreTest.warnings` (`[]string`) + `warnings_recognized` (`bool`) mirror fields, matching
+the agent's `hub.RestoreTest` wire contract (`omitempty`; an absent `warnings_recognized` ⇒
+`false` ⇒ treated as the louder unrecognized case — a missing flag can only over-notice).
+
+### Changed
+- Host-report ingest now surfaces a **passed** restore-test that carried warnings:
+`[INFO] restore-test passed WITH WARNINGS (recognized)` when every warning is the known-benign
+anchor, escalated to `[WARN] … UNRECOGNIZED WARNINGS` otherwise — as loud as a failed PBS
+verify, so a real restore warning can't hide behind a green pass. A FAILED restore-test still
+logs the existing `[WARN] … FAILED`.
+
+### Tests / contract
+- `restore_tests[0]` in the host-report golden gains `warnings` + `warnings_recognized`; the golden
+stays **byte-identical** with felhom-agent's copy (sha256-verified) and the bidirectional
+key-set contract test now round-trips the new keys through `hostRestoreTest`.
+
+### Not in this slice
+- No dashboard widget: the hub web layer renders only controller-report data — there is no
+host-domain dashboard surface yet (guests/storage/restore_tests/pbs_snapshots are log+persist
+only, same as the failed-PBS-verify signal). Distinct dashboard treatment lands when the
+host-domain dashboard does (slice 10). The operator signal this slice is the log line.
+
 ## v0.7.4 — ingest agent pbs_snapshots (slice 6 Phase B) (2026-06-09)
 
 The agent's slice-6 Phase B work populates the host-report's `pbs_snapshots` (the PBS offsite
@@ -310,6 +310,13 @@ type hostRestoreTest struct {
 	Error string `json:"error,omitempty"`
 	TestedAt string `json:"tested_at"`
 	DurationSeconds float64 `json:"duration_seconds"`
+	// Warnings are the guest-start task's warning line(s) on a PASS (e.g. the systemd-nesting
+	// advisory). The verdict is liveness-only, so a passed restore-test can carry warnings.
+	Warnings []string `json:"warnings,omitempty"`
+	// WarningsRecognized is true iff every warning is the known-benign anchor. Absent ⇒ false,
+	// which is the SAFE default: the hub then treats it as an unrecognized warning (the louder
+	// path), so a missing flag can only over-notice, never hide a real warning.
+	WarningsRecognized bool `json:"warnings_recognized,omitempty"`
 }
 
 // hostStorageTarget mirrors the agent's hub.StorageTarget wire contract field-for-field.
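The comment on `WarningsRecognized` relies on Go's zero value. When the key is absent from the JSON, `encoding/json` leaves the field at `false`, and that routes a warning-carrying pass to the louder unrecognized path. Below is a minimal sketch of that behaviour. It uses `restoreTestSketch`, a hypothetical trimmed copy of `hostRestoreTest` that keeps only fields whose JSON tags appear in this commit. `decodeRestoreTest` is also hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// restoreTestSketch is a hypothetical, trimmed copy of hostRestoreTest. Only fields whose
// JSON tags appear in this commit are reproduced; the real struct has more.
type restoreTestSketch struct {
	Pass               bool     `json:"pass"`
	Error              string   `json:"error,omitempty"`
	TestedAt           string   `json:"tested_at"`
	DurationSeconds    float64  `json:"duration_seconds"`
	Warnings           []string `json:"warnings,omitempty"`
	WarningsRecognized bool     `json:"warnings_recognized,omitempty"`
}

// decodeRestoreTest decodes one restore_tests entry. Keys missing from raw keep their
// zero values, so an absent warnings_recognized decodes as false.
func decodeRestoreTest(raw string) (restoreTestSketch, error) {
	var rt restoreTestSketch
	err := json.Unmarshal([]byte(raw), &rt)
	return rt, err
}

func main() {
	inputs := []string{
		// Older agent: neither new key is sent (the backward-compatible path).
		`{"pass": true, "tested_at": "2026-06-09T11:05:00Z", "duration_seconds": 38.2}`,
		// Hypothetical agent bug: warnings present but the flag dropped.
		`{"pass": true, "tested_at": "2026-06-09T11:05:00Z", "duration_seconds": 38.2,
		  "warnings": ["WARN: unexpected"]}`,
	}
	for _, raw := range inputs {
		rt, err := decodeRestoreTest(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("warnings=%d recognized=%v\n", len(rt.Warnings), rt.WarningsRecognized)
	}
	// Output:
	// warnings=0 recognized=false
	// warnings=1 recognized=false
}
```

Under the handler's `switch`, the first entry is a clean pass. The second falls through to `default`, so a dropped flag can only make the log louder.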
@@ -448,11 +455,24 @@ func (h *Handler) handleHostReport(w http.ResponseWriter, r *http.Request) {
 	}
 
 	// restore_tests (slice 6): a FAILED self-restore-test is the loudest DR signal there is
-	// — surface it prominently. A backup whose vzdump failed is also worth a warning.
+	// — surface it prominently. A PASS that carried start warnings (e.g. the systemd-nesting
+	// advisory) is surfaced too: INFO when every warning is recognized-benign, escalated to
+	// WARN when an UNRECOGNIZED warning stood out (as loud as a failed PBS verify is for
+	// backups), so a real restore warning can't hide behind a green pass. A backup whose
+	// vzdump failed is also worth a warning.
 	for _, rt := range rep.RestoreTests {
-		if !rt.Pass {
+		switch {
+		case !rt.Pass:
 			h.logger.Printf("[WARN] host %s restore-test FAILED: archive=%s tier=%s scratch=%d err=%q",
 				hostID, rt.SourceArchive, rt.SourceTier, rt.ScratchVMID, rt.Error)
+		case len(rt.Warnings) == 0:
+			// clean pass — nothing to surface here (counted in the summary line below).
+		case rt.WarningsRecognized:
+			h.logger.Printf("[INFO] host %s restore-test passed WITH WARNINGS (recognized): archive=%s tier=%s warnings=%v",
+				hostID, rt.SourceArchive, rt.SourceTier, rt.Warnings)
+		default:
+			h.logger.Printf("[WARN] host %s restore-test passed WITH UNRECOGNIZED WARNINGS: archive=%s tier=%s warnings=%v",
+				hostID, rt.SourceArchive, rt.SourceTier, rt.Warnings)
 		}
 	}
 	for _, bk := range rep.Backups {
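The order of the `switch` cases carries the policy. Failure is checked first. An empty warning list is a clean pass regardless of the flag. Only after that does `WarningsRecognized` choose between INFO and WARN. Below is a minimal sketch that pulls this decision out into a pure function so it can be table-tested. `classifyRestoreTest` and the `logLevel` labels are hypothetical names, not part of the hub.

```go
package main

import "fmt"

// logLevel is a hypothetical label for which log line the ingest loop would emit.
type logLevel string

const (
	levelNone         logLevel = "none"              // clean pass: no per-test line
	levelFailed       logLevel = "WARN-failed"       // [WARN] ... restore-test FAILED
	levelRecognized   logLevel = "INFO-recognized"   // [INFO] ... passed WITH WARNINGS (recognized)
	levelUnrecognized logLevel = "WARN-unrecognized" // [WARN] ... passed WITH UNRECOGNIZED WARNINGS
)

// classifyRestoreTest mirrors the case order of the switch in handleHostReport:
// a failed test wins even if it also carries warnings, and an empty warning list
// is a clean pass whatever the recognized flag says.
func classifyRestoreTest(pass bool, warnings []string, recognized bool) logLevel {
	switch {
	case !pass:
		return levelFailed
	case len(warnings) == 0:
		return levelNone
	case recognized:
		return levelRecognized
	default:
		return levelUnrecognized
	}
}

func main() {
	nesting := []string{"WARN: Systemd 257 detected. You may need to enable nesting."}
	fmt.Println(classifyRestoreTest(true, nil, false))     // none
	fmt.Println(classifyRestoreTest(true, nesting, true))  // INFO-recognized
	fmt.Println(classifyRestoreTest(true, nesting, false)) // WARN-unrecognized
	fmt.Println(classifyRestoreTest(false, nesting, true)) // WARN-failed
}
```

Extracting the decision like this would let a test cover the "warnings without a flag" and "failed with warnings" combinations without capturing log output.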
@@ -108,7 +108,11 @@
       "pass": true,
       "verified": "boot+running",
       "tested_at": "2026-06-09T11:05:00Z",
-      "duration_seconds": 38.2
+      "duration_seconds": 38.2,
+      "warnings": [
+        "WARN: Systemd 257 detected. You may need to enable nesting."
+      ],
+      "warnings_recognized": true
     }
   ],
   "pbs_snapshots": [