New internal/reconcile package: the agent-side control core's structural half.

- Per-guest serializer Queue (doc 03 §10): the single choke point all mutation sources funnel through; same-vmid serial in submit order, different vmids parallel (cond-var FIFO lanes).
- Desired-state model + DesiredProvider seam; EmptyProvider is the only live source at slice 4 (no hub serving until slice 10) so the live engine computes an empty action set and performs zero mutations.
- Normalization layer (FieldNormalizers): normalized desired-vs-actual so Proxmox round-trip quirks don't read as drift. normDesc promoted out of main.go to reconcile.NormDescription; selftest uses the shared helper.
- Plan (pure diff): minimal benign action set (Start/Stop/SetConfig) for guests in both desired and actual; provision/destroy out of scope here.
- Engine: dispatches onto the shared queue; honors the dual-mode SetConfig contract (UPID -> WaitTask; empty UPID -> synchronous success).
- Durable op journal + idempotency store (mirrors authz.FileNonceStore): in-flight task ids for crash detection + AlreadyApplied dedupe across restart.
- Wired into runDaemon alongside the hub loop, sharing the queue; runs cleanly with no desired state and no signers.

Full module race-clean and vet-clean on the Linux build server.

CHECKPOINT: Phase A only. Awaiting validation before Phase B (the reversibility gate + signed-op consuming layer, landing v0.4.0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
REPORT — Slice 4 Phase A: reconcile engine (structural) (2026-06-08)
Overwrite-latest report (most recent significant work only). Cumulative history lives in CHANGELOG.md.
Outcome
Phase A of slice 4 is implemented, tested, and pushed as the checkpoint marker
v0.4.0-rc1. This is the structural half of the agent-side control core: the
reconcile engine, the per-guest serializer (doc 03 §10), the desired-state model + its
provider seam, the field-normalization layer, the plan/diff engine, and the durable
operation journal + idempotency store — all adversarially fixture-tested.
Per the task, I have STOPPED at the checkpoint and am awaiting the validation pass
before starting Phase B (the benign/destructive classifier, the reversibility gate,
and the signed-op consuming layer over internal/authz). Phase B is the security core
and earns isolated review.
What runs (and what deliberately doesn't)
The engine runs live but unfed. At slice 4 there is no desired-state source (hub
serving is slice 10; provisioning is slice 7), so the only production DesiredProvider
is EmptyProvider → the live engine reads state, computes an empty action set, and
performs zero mutations every tick. That is the correct, expected slice-4 behavior;
the first live convergence arrives when slice 10 serves desired state into the seam.
The wired action set is benign-on-existing-guest only: Start, Stop, SetConfig.
Provisioning and the destructive set are out of scope for Phase A (the destructive set
is classified and gated in Phase B but not wired to live execution — nothing serves
destructive deltas yet).
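To make the "live but unfed" behavior concrete, here is a minimal sketch of a provider seam feeding a pure diff. It is not the code in internal/reconcile: the struct shapes, the Action type, and the trimmed-newline normalizer are assumptions made for illustration, and the real DesiredGuest also carries a *hub.GuestSpec that is left out here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RunState is a guest's power state (hypothetical representation).
type RunState string

const (
	Running RunState = "running"
	Stopped RunState = "stopped"
)

// DesiredGuest pins only what a source manages; a nil field is unmanaged.
type DesiredGuest struct {
	Run         *RunState
	Description *string
}

// ActualGuest is the state read back from the hypervisor (assumed shape).
type ActualGuest struct {
	Run         RunState
	Description string
}

// DesiredProvider is the seam a desired-state source plugs into.
type DesiredProvider interface {
	Desired() map[int]DesiredGuest
}

// EmptyProvider serves nothing, so every plan built from it is empty.
type EmptyProvider struct{}

func (EmptyProvider) Desired() map[int]DesiredGuest { return nil }

// StaticProvider serves a fixed map, which is convenient for fixtures.
type StaticProvider map[int]DesiredGuest

func (s StaticProvider) Desired() map[int]DesiredGuest { return s }

// Action is one benign step: "SetConfig", "Start", or "Stop".
type Action struct {
	VMID int
	Kind string
}

// normDescription drops trailing newlines so a round-trip quirk is not drift.
// Assumption: the real normalizer may do more or less than this.
func normDescription(s string) string { return strings.TrimRight(s, "\n") }

// Plan diffs desired against actual for guests present in both, in ascending
// vmid order, emitting config changes before run-state changes.
func Plan(desired map[int]DesiredGuest, actual map[int]ActualGuest) []Action {
	vmids := make([]int, 0, len(desired))
	for id := range desired {
		if _, ok := actual[id]; ok { // absent guests would need provisioning
			vmids = append(vmids, id)
		}
	}
	sort.Ints(vmids)

	var out []Action
	for _, id := range vmids {
		d, a := desired[id], actual[id]
		if d.Description != nil &&
			normDescription(*d.Description) != normDescription(a.Description) {
			out = append(out, Action{VMID: id, Kind: "SetConfig"})
		}
		if d.Run != nil && *d.Run != a.Run {
			kind := "Start"
			if *d.Run == Stopped {
				kind = "Stop"
			}
			out = append(out, Action{VMID: id, Kind: kind})
		}
	}
	return out
}

func main() {
	actual := map[int]ActualGuest{
		100: {Run: Stopped, Description: "web\n"},
		101: {Run: Running, Description: "db"},
	}
	// Unfed engine: no desired state, so nothing to do.
	fmt.Println(len(Plan(EmptyProvider{}.Desired(), actual))) // 0

	run, desc := Running, "web"
	fixture := StaticProvider{
		100: {Run: &run, Description: &desc}, // description differs only by "\n"
		999: {Run: &run},                     // not in actual: skipped
	}
	fmt.Println(Plan(fixture.Desired(), actual)) // [{100 Start}]
}
```

In this sketch, vmid 101 is untouched because no provider pins it, and vmid 999 is skipped because it exists only on the desired side.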
Package internal/reconcile
- Queue (per-guest serializer, doc 03 §10): the single choke point all mutation sources funnel through. Same-vmid jobs run strictly one-at-a-time in submit order; independent vmids run in parallel. Each vmid is an unbounded cond-var FIFO lane (non-blocking, order-preserving submission); Close drains pending jobs gracefully.
- Desired-state model + DesiredProvider: DesiredGuest makes each field individually optional (run-state / *hub.GuestSpec / *description) so a source pins only what it manages. EmptyProvider (live, slice 4) and StaticProvider (fixtures).
- Normalization layer (FieldNormalizers): reconcile compares normalized desired-vs-actual. description's trailing newline (the slice-4-proven quirk) is the first registered normalizer; the registry takes more as discovered. normDesc was promoted out of main.go to reconcile.NormDescription, and the --selftest=task round-trip now uses that shared helper, so there is one source of truth.
- Plan (pure diff engine): minimal benign action set for guests in both desired and actual: normalized comparison, deterministic vmid order, config-before-run-state. Skips provision (slice 7) and destroy (gated, slice 10); never writes a config it couldn't first read; disk grow deferred.
- Engine: reads desired+actual, plans, dispatches onto the shared queue. Honors the mutate.go dual-mode contract: non-empty UPID → WaitTask+assert; empty UPID → clean synchronous success. Per-action failures counted, never fatal.
- Journal: durable fsync'd JSONL (mirrors authz.FileNonceStore): op lifecycle with the Proxmox task id (crash mid-op detected + re-checkable via InFlight()), plus an idempotency-key store so a one-shot op never double-runs across retries/restarts. Reconcile actions carry no idempotency key (they are convergent and must re-run on real drift).
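The serializer's contract (same-vmid jobs in submit order, different vmids in parallel, non-blocking submission, draining close) can be sketched roughly as below. This illustrates the cond-var lane idea under my own assumptions, not the package's code: NewQueue, Submit's boolean return, and the one-goroutine-per-lane layout are hypothetical choices.

```go
package main

import (
	"fmt"
	"sync"
)

// lane is one vmid's unbounded FIFO, drained by a single worker goroutine.
type lane struct {
	mu     sync.Mutex
	cond   *sync.Cond
	jobs   []func()
	closed bool
}

// Queue serializes jobs per vmid and lets different vmids run in parallel.
type Queue struct {
	mu     sync.Mutex
	lanes  map[int]*lane
	closed bool
	wg     sync.WaitGroup
}

func NewQueue() *Queue { return &Queue{lanes: make(map[int]*lane)} }

// Submit appends job to the vmid's lane and returns without waiting for it
// to run. It reports false if the queue has already been closed.
func (q *Queue) Submit(vmid int, job func()) bool {
	q.mu.Lock()
	defer q.mu.Unlock() // held throughout so Close cannot interleave
	if q.closed {
		return false
	}
	l, ok := q.lanes[vmid]
	if !ok {
		l = &lane{}
		l.cond = sync.NewCond(&l.mu)
		q.lanes[vmid] = l
		q.wg.Add(1)
		go q.run(l)
	}
	l.mu.Lock()
	l.jobs = append(l.jobs, job)
	l.mu.Unlock()
	l.cond.Signal()
	return true
}

// run executes one lane's jobs strictly in FIFO order.
func (q *Queue) run(l *lane) {
	defer q.wg.Done()
	for {
		l.mu.Lock()
		for len(l.jobs) == 0 && !l.closed {
			l.cond.Wait()
		}
		if len(l.jobs) == 0 { // closed and fully drained
			l.mu.Unlock()
			return
		}
		job := l.jobs[0]
		l.jobs = l.jobs[1:]
		l.mu.Unlock()
		job() // run outside the lock so Submit never waits on a running job
	}
}

// Close rejects new submissions, lets every lane drain, then returns.
func (q *Queue) Close() {
	q.mu.Lock()
	q.closed = true
	for _, l := range q.lanes {
		l.mu.Lock()
		l.closed = true
		l.mu.Unlock()
		l.cond.Signal()
	}
	q.mu.Unlock()
	q.wg.Wait()
}

func main() {
	q := NewQueue()
	var a, b []int // each slice is touched by exactly one lane
	for i := 0; i < 5; i++ {
		i := i
		q.Submit(100, func() { a = append(a, i) })
		q.Submit(101, func() { b = append(b, i) })
	}
	q.Close()
	fmt.Println(a, b) // [0 1 2 3 4] [0 1 2 3 4]
}
```

Holding the queue lock for the whole of Submit is a simplification: it keeps a late submission from slipping in behind a concurrent Close, at the cost of briefly serializing submitters across lanes.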
Daemon wiring
runDaemon now runs reconcile alongside the hub loop on the poll cadence, sharing the
per-guest queue. The journal lives at a journal.log sibling of the nonce store. The
daemon runs cleanly with no desired state and no signers — reconcile is a logged
no-op; a journal-open failure degrades to journal-less rather than crashing.
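As a rough picture of what the journal buys across a restart, the sketch below appends synced JSONL records and rebuilds an in-flight index and an idempotency index by replaying the file on open. The record layout, the Record method, the op and key names, and the placeholder task ids are all assumptions of mine; the real file format is not shown in this report. Recovery from a torn final line is deliberately left out.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// entry is one JSONL record; the field names are illustrative only.
type entry struct {
	Op    string `json:"op"`
	Event string `json:"event"`          // "begin" or "end"
	UPID  string `json:"upid,omitempty"` // task id; empty for synchronous ops
	Key   string `json:"key,omitempty"`  // idempotency key, one-shot ops only
}

// Journal replays its file on open and appends one synced line per event.
type Journal struct {
	f        *os.File
	inflight map[string]string // op id -> task id, begun but not ended
	applied  map[string]bool   // idempotency keys of completed one-shot ops
}

// Open replays the existing records, then leaves the file ready for appends.
func Open(path string) (*Journal, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR|os.O_APPEND, 0o600)
	if err != nil {
		return nil, err
	}
	j := &Journal{f: f, inflight: map[string]string{}, applied: map[string]bool{}}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var e entry
		if json.Unmarshal(sc.Bytes(), &e) == nil { // unparsable lines are skipped
			j.apply(e)
		}
	}
	if err := sc.Err(); err != nil {
		f.Close()
		return nil, err
	}
	return j, nil
}

// apply folds one record into the in-memory indexes.
func (j *Journal) apply(e entry) {
	switch e.Event {
	case "begin":
		j.inflight[e.Op] = e.UPID
	case "end":
		delete(j.inflight, e.Op)
		if e.Key != "" {
			j.applied[e.Key] = true
		}
	}
}

// Record appends the entry as one JSON line and syncs before acknowledging.
func (j *Journal) Record(e entry) error {
	if err := json.NewEncoder(j.f).Encode(e); err != nil { // Encode adds "\n"
		return err
	}
	if err := j.f.Sync(); err != nil {
		return err
	}
	j.apply(e)
	return nil
}

// simulateRestart journals a finished op and an unfinished op, reopens the
// file as a restarted process would, and returns what the replay recovers.
func simulateRestart() (bool, map[string]string, error) {
	dir, err := os.MkdirTemp("", "journal-sketch")
	if err != nil {
		return false, nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "journal.log")

	j, err := Open(path)
	if err != nil {
		return false, nil, err
	}
	steps := []entry{
		{Op: "op-1", Event: "begin", UPID: "UPID:placeholder-1"},
		{Op: "op-1", Event: "end", Key: "key-1"},
		{Op: "op-2", Event: "begin", UPID: "UPID:placeholder-2"}, // never ends
	}
	for _, e := range steps {
		if err := j.Record(e); err != nil {
			j.f.Close()
			return false, nil, err
		}
	}
	j.f.Close() // the "crash": op-2 has begun but never ended

	j, err = Open(path) // the restart: replay rebuilds both indexes
	if err != nil {
		return false, nil, err
	}
	defer j.f.Close()
	return j.applied["key-1"], j.inflight, nil
}

func main() {
	applied, inflight, err := simulateRestart()
	if err != nil {
		fmt.Println("journal unavailable, continuing without it:", err)
		return
	}
	fmt.Println(applied, inflight) // true map[op-2:UPID:placeholder-2]
}
```

The main function mirrors the degrade-rather-than-crash choice described above: an open failure is reported and the caller carries on without a journal.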
Verification
- Full module race-clean (go test -race -count=1 ./...) and go vet clean on the Linux build server (go1.26); all unit tests green locally and there.
- Adversarial fixture coverage: serializer concurrency/ordering, normalization + extensibility seam, the full plan matrix (drift / no-false-drift / unmanaged / spec-unknown / scope skips / ordering / empty-desired), engine sync-vs-async + failure counting, and journal persistence + idempotency dedupe across a simulated restart.
- No live Proxmox needed (the engine is unfed); the live exercise is deferred — there is nothing to converge until a desired-state source exists.
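For readers who want a feel for the sync-vs-async and failure-counting fixtures, the sketch below drives a fake client through the dual-mode contract. The Mutator interface, the Result type, and the fake are hypothetical stand-ins, not the signatures in mutate.go, and the post-wait assertion is folded into the WaitTask error for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// Mutator is a hypothetical stand-in for the client surface the engine drives.
type Mutator interface {
	// SetConfig returns a task id (UPID) when the change runs as a task,
	// or an empty string when it was applied synchronously.
	SetConfig(vmid int, params map[string]string) (upid string, err error)
	// WaitTask blocks until the task finishes and reports its outcome.
	WaitTask(upid string) error
}

// Result tallies per-action outcomes; a failure never aborts the pass.
type Result struct{ Applied, Failed int }

// applyAll runs SetConfig for each vmid, waiting only when a task id is
// returned, and counts failures instead of stopping at the first one.
func applyAll(m Mutator, vmids []int, params map[string]string) Result {
	var r Result
	for _, id := range vmids {
		upid, err := m.SetConfig(id, params)
		if err == nil && upid != "" {
			err = m.WaitTask(upid) // asynchronous path
		}
		if err != nil {
			r.Failed++
			continue
		}
		r.Applied++ // covers the empty-UPID synchronous success too
	}
	return r
}

// fake scripts one outcome per vmid so every branch is exercised.
type fake struct{}

func (fake) SetConfig(vmid int, _ map[string]string) (string, error) {
	switch vmid {
	case 100:
		return "", nil // synchronous success
	case 101:
		return "UPID:placeholder-ok", nil // task that succeeds
	case 102:
		return "UPID:placeholder-bad", nil // task that fails
	}
	return "", errors.New("no such guest") // immediate error
}

func (fake) WaitTask(upid string) error {
	if upid == "UPID:placeholder-bad" {
		return errors.New("task failed")
	}
	return nil
}

func main() {
	params := map[string]string{"description": "web"}
	r := applyAll(fake{}, []int{100, 101, 102, 103}, params)
	fmt.Printf("%+v\n", r) // {Applied:2 Failed:2}
}
```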
Next (after validation)
Phase B: the classifier (benign vs destructive by provenance + data-bearing-ness, not by
verb), the reversibility gate in front of the queue's executor, and the signed-op
consuming layer over internal/authz with role-scoping + op-to-action binding + the
adversarial rejection matrix — landing v0.4.0. I will not start it until the Phase-A
validation passes.