felhom-agent/REPORT.md
admin 05c450147c v0.4.0-rc1: slice 4 Phase A — reconcile engine (structural, runs live unfed)
New internal/reconcile package: the agent-side control core's structural half.

- Per-guest serializer Queue (doc 03 §10): the single choke point all mutation
  sources funnel through; same-vmid serial in submit order, different vmids
  parallel (cond-var FIFO lanes).
- Desired-state model + DesiredProvider seam; EmptyProvider is the only live
  source at slice 4 (no hub serving until slice 10) so the live engine computes
  an empty action set and performs zero mutations.
- Normalization layer (FieldNormalizers): normalized desired-vs-actual so
  Proxmox round-trip quirks don't read as drift. normDesc promoted out of
  main.go to reconcile.NormDescription; selftest uses the shared helper.
- Plan (pure diff): minimal benign action set (Start/Stop/SetConfig) for guests
  in both desired and actual; provision/destroy out of scope here.
- Engine: dispatches onto the shared queue; honors the dual-mode SetConfig
  contract (UPID -> WaitTask; empty UPID -> synchronous success).
- Durable op journal + idempotency store (mirrors authz.FileNonceStore):
  in-flight task ids for crash detection + AlreadyApplied dedupe across restart.
- Wired into runDaemon alongside the hub loop, sharing the queue; runs cleanly
  with no desired state and no signers.

Full module race-clean and vet-clean on the Linux build server.

CHECKPOINT: Phase A only. Awaiting validation before Phase B (the reversibility
gate + signed-op consuming layer, landing v0.4.0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 23:21:55 +02:00


REPORT — Slice 4 Phase A: reconcile engine (structural) (2026-06-08)

Overwrite-latest report (most recent significant work only). Cumulative history lives in CHANGELOG.md.

Outcome

Phase A of slice 4 is implemented, tested, and pushed as the checkpoint marker v0.4.0-rc1. This is the structural half of the agent-side control core: the reconcile engine, the per-guest serializer (doc 03 §10), the desired-state model + its provider seam, the field-normalization layer, the plan/diff engine, and the durable operation journal + idempotency store — all adversarially fixture-tested.

Per the task, I have STOPPED at the checkpoint and am awaiting the validation pass before starting Phase B (the benign/destructive classifier, the reversibility gate, and the signed-op consuming layer over internal/authz). Phase B is the security core and earns isolated review.

What runs (and what deliberately doesn't)

The engine runs live but unfed. At slice 4 there is no desired-state source (hub serving is slice 10; provisioning is slice 7), so the only production DesiredProvider is EmptyProvider: on every tick the live engine reads state, computes an empty action set, and performs zero mutations. That is the correct, expected slice-4 behavior; the first live convergence arrives when slice 10 serves desired state into the seam.

The wired action set is benign-on-existing-guest only: Start, Stop, SetConfig. Provisioning and the destructive set are out of scope for Phase A (the destructive set is classified and gated in Phase B but not wired to live execution — nothing serves destructive deltas yet).

Package internal/reconcile

  • Queue (per-guest serializer, doc 03 §10) — the single choke point all mutation sources funnel through. Same-vmid jobs run strictly one-at-a-time in submit order; independent vmids run in parallel. Each vmid is an unbounded cond-var FIFO lane (non-blocking, order-preserving submission); Close drains pending jobs gracefully.
  • Desired-state model + DesiredProvider: DesiredGuest makes each field individually optional (run-state / *hub.GuestSpec / *description) so a source pins only what it manages. Two providers exist: EmptyProvider (live, slice 4) and StaticProvider (fixtures).
  • Normalization layer (FieldNormalizers) — reconcile compares normalized desired-vs-actual. description's trailing newline (the slice-4-proven quirk) is the first registered normalizer; the registry takes more as discovered. normDesc was promoted out of main.go to reconcile.NormDescription, and the --selftest=task round-trip now uses that shared helper — one source of truth.
  • Plan (pure diff engine) — minimal benign action set for guests in both desired and actual: normalized comparison, deterministic vmid order, config-before-run-state. Skips provision (slice 7) and destroy (gated, slice 10); never writes a config it couldn't first read; disk grow deferred.
  • Engine — reads desired+actual, plans, dispatches onto the shared queue. Honors the mutate.go dual-mode contract: non-empty UPID → WaitTask+assert; empty UPID → clean synchronous success. Per-action failures counted, never fatal.
  • Journal — durable fsync'd JSONL (mirrors authz.FileNonceStore): op lifecycle with the Proxmox task id (crash mid-op detected + re-checkable via InFlight()), plus an idempotency-key store so a one-shot op never double-runs across retries/restarts. Reconcile actions carry no idempotency key (convergent — must re-run on real drift).
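The per-guest serializer described in the Queue bullet can be sketched as one condition-variable FIFO lane per vmid. This is a simplified illustration under assumptions, not the package's code: the Job type, the Submit/Close signatures, and the one-goroutine-per-lane design are all hypothetical.

```go
package main

import "sync"

// Job is a unit of mutation work (hypothetical type for this sketch).
type Job func()

// lane is an unbounded FIFO drained by exactly one worker goroutine.
type lane struct {
	mu     sync.Mutex
	cond   *sync.Cond
	jobs   []Job
	closed bool
}

// Queue serializes jobs per vmid and runs distinct vmids in parallel.
type Queue struct {
	mu     sync.Mutex
	lanes  map[int]*lane
	closed bool
	wg     sync.WaitGroup
}

func NewQueue() *Queue { return &Queue{lanes: make(map[int]*lane)} }

// Submit appends job to the vmid's lane without waiting for it to run.
// It returns false once Close has been called.
func (q *Queue) Submit(vmid int, job Job) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		return false
	}
	l, ok := q.lanes[vmid]
	if !ok {
		l = &lane{}
		l.cond = sync.NewCond(&l.mu)
		q.lanes[vmid] = l
		q.wg.Add(1)
		go q.run(l)
	}
	l.mu.Lock()
	l.jobs = append(l.jobs, job)
	l.cond.Signal()
	l.mu.Unlock()
	return true
}

// run executes one lane's jobs strictly in submit order.
func (q *Queue) run(l *lane) {
	defer q.wg.Done()
	for {
		l.mu.Lock()
		for len(l.jobs) == 0 && !l.closed {
			l.cond.Wait()
		}
		if len(l.jobs) == 0 { // closed and fully drained
			l.mu.Unlock()
			return
		}
		job := l.jobs[0]
		l.jobs = l.jobs[1:]
		l.mu.Unlock()
		job() // run outside the lock so Submit never waits on a job
	}
}

// Close rejects new work, lets pending jobs finish, then returns.
// It must not be called from inside a job.
func (q *Queue) Close() {
	q.mu.Lock()
	q.closed = true
	for _, l := range q.lanes {
		l.mu.Lock()
		l.closed = true
		l.cond.Broadcast()
		l.mu.Unlock()
	}
	q.mu.Unlock()
	q.wg.Wait()
}
```

Because every mutation source submits through the same Queue value, two jobs for the same guest can never overlap, while a slow job on one vmid does not delay another. The sketch never reclaims idle lanes; a production version would likely need to.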

Daemon wiring

runDaemon now runs reconcile alongside the hub loop on the poll cadence, sharing the per-guest queue. The journal is stored in journal.log, next to the nonce store. The daemon runs cleanly with no desired state and no signers: reconcile is a logged no-op, and a journal-open failure degrades to journal-less operation rather than crashing.

Verification

  • Full module race-clean (go test -race -count=1 ./...) and go vet clean on the Linux build server (go1.26); all unit tests green locally and there.
  • Adversarial fixture coverage: serializer concurrency/ordering, normalization + extensibility seam, the full plan matrix (drift / no-false-drift / unmanaged / spec-unknown / scope skips / ordering / empty-desired), engine sync-vs-async + failure counting, and journal persistence + idempotency dedupe across a simulated restart.
  • No live Proxmox needed (the engine is unfed); the live exercise is deferred — there is nothing to converge until a desired-state source exists.
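The no-false-drift case in the plan matrix can be illustrated with a small normalizer registry. This is a sketch under assumptions: it supposes the description quirk is a trailing newline that one side of the round trip adds, and the names Normalizer, fieldNormalizers, normDescription and Drifted are hypothetical stand-ins for FieldNormalizers and reconcile.NormDescription.

```go
package main

import "strings"

// Normalizer maps a raw field value to its canonical form.
type Normalizer func(string) string

// fieldNormalizers is keyed by field name; fields with no entry are
// compared as-is. More entries can be registered as quirks are found.
var fieldNormalizers = map[string]Normalizer{
	"description": normDescription,
}

// normDescription strips trailing newlines (assumed quirk, see lead-in).
func normDescription(s string) string {
	return strings.TrimRight(s, "\n")
}

// Drifted reports whether desired and actual differ after normalization.
func Drifted(field, desired, actual string) bool {
	if n, ok := fieldNormalizers[field]; ok {
		desired, actual = n(desired), n(actual)
	}
	return desired != actual
}
```

Under these assumptions, a desired description "web frontend" and an actual "web frontend\n" compare equal and plan no SetConfig, while a genuinely different description still registers as drift.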

Next (after validation)

Phase B: the classifier (benign vs destructive by provenance + data-bearing-ness, not by verb), the reversibility gate in front of the queue's executor, and the signed-op consuming layer over internal/authz with role-scoping + op-to-action binding + the adversarial rejection matrix — landing v0.4.0. I will not start it until the Phase-A validation passes.