05c450147c
New internal/reconcile package: the agent-side control core's structural half. - Per-guest serializer Queue (doc 03 §10): the single choke point all mutation sources funnel through; same-vmid serial in submit order, different vmids parallel (cond-var FIFO lanes). - Desired-state model + DesiredProvider seam; EmptyProvider is the only live source at slice 4 (no hub serving until slice 10) so the live engine computes an empty action set and performs zero mutations. - Normalization layer (FieldNormalizers): normalized desired-vs-actual so Proxmox round-trip quirks don't read as drift. normDesc promoted out of main.go to reconcile.NormDescription; selftest uses the shared helper. - Plan (pure diff): minimal benign action set (Start/Stop/SetConfig) for guests in both desired and actual; provision/destroy out of scope here. - Engine: dispatches onto the shared queue; honors the dual-mode SetConfig contract (UPID -> WaitTask; empty UPID -> synchronous success). - Durable op journal + idempotency store (mirrors authz.FileNonceStore): in-flight task ids for crash detection + AlreadyApplied dedupe across restart. - Wired into runDaemon alongside the hub loop, sharing the queue; runs cleanly with no desired state and no signers. Full module race-clean and vet-clean on the Linux build server. CHECKPOINT: Phase A only. Awaiting validation before Phase B (the reversibility gate + signed-op consuming layer, landing v0.4.0). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
83 lines
4.9 KiB
Markdown
# REPORT — Slice 4 Phase A: reconcile engine (structural) (2026-06-08)

> Overwrite-latest report (most recent significant work only). Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).

## Outcome

**Phase A of slice 4 is implemented, tested, and pushed as the checkpoint marker
`v0.4.0-rc1`.** This is the structural half of the agent-side control core: the
reconcile engine, the per-guest serializer (doc 03 §10), the desired-state model + its
provider seam, the field-normalization layer, the plan/diff engine, and the durable
operation journal + idempotency store — all adversarially fixture-tested.

**Per the task, I have STOPPED at the checkpoint and am awaiting the validation pass
before starting Phase B** (the benign/destructive classifier, the reversibility gate,
and the signed-op consuming layer over `internal/authz`). Phase B is the security core
and earns isolated review.

## What runs (and what deliberately doesn't)

The engine **runs live but unfed**. At slice 4 there is no desired-state source (hub
serving is slice 10; provisioning is slice 7), so the only production `DesiredProvider`
is `EmptyProvider`. The live engine therefore reads state, computes an **empty action
set**, and performs **zero mutations** every tick. That is the correct, expected slice-4
behavior; the first live convergence arrives when slice 10 serves desired state into the
seam.

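As a rough illustration of why an unfed engine performs no mutations, here is a minimal sketch of the provider seam feeding a pure plan function. It is not the package's code: the `Desired()` method, the `RunState` type, the `Action` shape, and the `Plan` signature are all assumptions made for the example, and only run-state is diffed.

```go
package main

import (
	"fmt"
	"sort"
)

// RunState is a hypothetical stand-in for the report's run-state field.
type RunState string

const (
	Running RunState = "running"
	Stopped RunState = "stopped"
)

// DesiredGuest pins only what a source manages; a nil field means "unmanaged".
type DesiredGuest struct {
	RunState *RunState
}

// DesiredProvider is the seam a desired-state source plugs into (assumed shape).
type DesiredProvider interface {
	Desired() map[int]DesiredGuest // keyed by vmid
}

// EmptyProvider serves no desired state at all.
type EmptyProvider struct{}

func (EmptyProvider) Desired() map[int]DesiredGuest { return nil }

// StaticProvider serves a fixed map, as a fixture would.
type StaticProvider map[int]DesiredGuest

func (p StaticProvider) Desired() map[int]DesiredGuest { return p }

// Action is one benign step against an existing guest.
type Action struct {
	VMID int
	Kind string // "Start" or "Stop"
}

// Plan diffs desired against actual for guests present in both,
// visiting vmids in ascending order so the output is deterministic.
func Plan(desired map[int]DesiredGuest, actual map[int]RunState) []Action {
	vmids := make([]int, 0, len(desired))
	for id := range desired {
		vmids = append(vmids, id)
	}
	sort.Ints(vmids)

	var out []Action
	for _, id := range vmids {
		want := desired[id].RunState
		have, exists := actual[id]
		if !exists || want == nil || *want == have {
			continue // not provisioned here, unmanaged, or already converged
		}
		kind := "Stop"
		if *want == Running {
			kind = "Start"
		}
		out = append(out, Action{VMID: id, Kind: kind})
	}
	return out
}

func main() {
	actual := map[int]RunState{100: Running, 101: Stopped}

	// Unfed: no desired state, so the action set is empty.
	fmt.Println(len(Plan(EmptyProvider{}.Desired(), actual))) // 0

	// Fed by a fixture: guest 101 should be running.
	want := Running
	fixture := StaticProvider{101: {RunState: &want}}
	fmt.Println(Plan(fixture.Desired(), actual)) // [{101 Start}]
}
```

With `EmptyProvider` the loop body never executes, so zero actions is a property of the data flow, not a special case in the planner.
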
The wired action set is **benign-on-existing-guest only**: `Start`, `Stop`, `SetConfig`.
Provisioning and the destructive set are out of scope for Phase A. Phase B will classify
and gate the destructive set but will not wire it to live execution, because nothing
serves destructive deltas yet.

## Package `internal/reconcile`

- **`Queue` (per-guest serializer, doc 03 §10)** — the single choke point all mutation
  sources funnel through. Same-vmid jobs run strictly one-at-a-time in submit order;
  independent vmids run in parallel. Each vmid is an unbounded cond-var FIFO lane
  (non-blocking, order-preserving submission); `Close` drains pending jobs gracefully.
- **Desired-state model + `DesiredProvider`** — `DesiredGuest` makes each field
  individually optional (run-state / `*hub.GuestSpec` / `*description`) so a source pins
  only what it manages. `EmptyProvider` (live, slice 4) and `StaticProvider` (fixtures).
- **Normalization layer (`FieldNormalizers`)** — reconcile compares *normalized*
  desired-vs-actual. `description`'s trailing newline (the slice-4-proven quirk) is the
  first registered normalizer; the registry takes more as discovered. `normDesc` was
  **promoted** out of `main.go` to `reconcile.NormDescription`, and the `--selftest=task`
  round-trip now uses that shared helper — one source of truth.
- **`Plan` (pure diff engine)** — minimal benign action set for guests in both desired
  and actual: normalized comparison, deterministic vmid order, config-before-run-state.
  Skips provision (slice 7) and destroy (gated, slice 10); never writes a config it
  couldn't first read; disk grow deferred.
- **`Engine`** — reads desired+actual, plans, dispatches onto the shared queue. Honors
  the mutate.go dual-mode contract: non-empty UPID → `WaitTask`+assert; empty UPID →
  clean synchronous success. Per-action failures counted, never fatal.
- **`Journal`** — durable fsync'd JSONL (mirrors `authz.FileNonceStore`): op lifecycle
  with the Proxmox task id (crash mid-op detected + re-checkable via `InFlight()`), plus
  an idempotency-key store so a one-shot op never double-runs across retries/restarts.
  Reconcile actions carry no idempotency key (convergent — must re-run on real drift).

## Daemon wiring

`runDaemon` now runs reconcile alongside the hub loop on the poll cadence, sharing the
per-guest queue. The journal lives at `journal.log`, a sibling of the nonce store. The
daemon runs cleanly with **no desired state and no signers**: reconcile is a logged
no-op, and a journal-open failure degrades to journal-less rather than crashing.

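To show what sharing the per-guest queue buys, here is a minimal sketch of a serializer with one cond-var FIFO lane per vmid. It is written from the behavior described in this report, not copied from the package: the `lane` type, `NewQueue`, the `Submit` signature and its boolean result, and `runDemo` are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// lane is one vmid's unbounded FIFO plus the cond var its worker sleeps on.
type lane struct {
	mu     sync.Mutex
	cond   *sync.Cond
	jobs   []func()
	closed bool
}

// run executes this lane's jobs one at a time, in submit order.
func (l *lane) run(wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		l.mu.Lock()
		for len(l.jobs) == 0 && !l.closed {
			l.cond.Wait()
		}
		if len(l.jobs) == 0 { // closed and fully drained
			l.mu.Unlock()
			return
		}
		job := l.jobs[0]
		l.jobs = l.jobs[1:]
		l.mu.Unlock()
		job() // run outside the lock so Submit never waits on a job
	}
}

// Queue serializes jobs per vmid and runs different vmids in parallel.
type Queue struct {
	mu     sync.Mutex
	lanes  map[int]*lane
	closed bool
	wg     sync.WaitGroup
}

func NewQueue() *Queue { return &Queue{lanes: make(map[int]*lane)} }

// Submit appends job to the vmid's lane without waiting for it to run.
// It reports false once the queue has been closed.
func (q *Queue) Submit(vmid int, job func()) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		return false
	}
	l, ok := q.lanes[vmid]
	if !ok {
		l = &lane{}
		l.cond = sync.NewCond(&l.mu)
		q.lanes[vmid] = l
		q.wg.Add(1)
		go l.run(&q.wg)
	}
	l.mu.Lock()
	l.jobs = append(l.jobs, job)
	l.mu.Unlock()
	l.cond.Signal()
	return true
}

// Close stops intake, lets every lane drain its pending jobs, then returns.
func (q *Queue) Close() {
	q.mu.Lock()
	q.closed = true
	for _, l := range q.lanes {
		l.mu.Lock()
		l.closed = true
		l.mu.Unlock()
		l.cond.Signal()
	}
	q.mu.Unlock()
	q.wg.Wait()
}

// runDemo submits five jobs to each of two vmids and records per-vmid order.
func runDemo() map[int][]int {
	q := NewQueue()
	var mu sync.Mutex
	order := make(map[int][]int)
	for i := 0; i < 5; i++ {
		for _, vmid := range []int{100, 101} {
			i, vmid := i, vmid // capture per-iteration copies
			q.Submit(vmid, func() {
				mu.Lock()
				order[vmid] = append(order[vmid], i)
				mu.Unlock()
			})
		}
	}
	q.Close()
	return order
}

func main() {
	order := runDemo()
	fmt.Println(order[100], order[101]) // [0 1 2 3 4] [0 1 2 3 4]
}
```

Because every caller goes through `Submit`, two loops that target the same guest cannot interleave their mutations, while work for different guests proceeds independently.
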
## Verification

- Full module **race-clean** (`go test -race -count=1 ./...`) and `go vet` clean on the
  Linux build server (go1.26); all unit tests pass both locally and on the build server.
- Adversarial fixture coverage: serializer concurrency/ordering, normalization +
  extensibility seam, the full plan matrix (drift / no-false-drift / unmanaged /
  spec-unknown / scope skips / ordering / empty-desired), engine sync-vs-async +
  failure counting, and journal persistence + idempotency dedupe **across a simulated
  restart**.
- No live Proxmox needed (the engine is unfed); the live exercise is deferred — there is
  nothing to converge until a desired-state source exists.

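The engine's sync-vs-async handling and failure counting, which the fixtures above exercise, might look roughly like the sketch below. The `Mutator` interface, the `Result` counters, `Apply`, and the fake task ids are hypothetical stand-ins; the real `mutate.go` surface is not shown in this report, and the "assert" step is modeled here simply as `WaitTask` returning an error.

```go
package main

import (
	"errors"
	"fmt"
)

// Mutator is an assumed stand-in for the mutation surface the engine calls.
type Mutator interface {
	// SetConfig returns a task id when the change runs asynchronously,
	// or an empty string when it completed synchronously.
	SetConfig(vmid int, cfg map[string]string) (upid string, err error)
	// WaitTask blocks until the task ends and reports a failed task as an error.
	WaitTask(upid string) error
}

// Result counts outcomes; a failed action never aborts the pass.
type Result struct {
	Applied int
	Failed  int
}

// Apply pushes cfg to each vmid, honoring both completion modes.
func Apply(m Mutator, vmids []int, cfg map[string]string) Result {
	var r Result
	for _, id := range vmids {
		upid, err := m.SetConfig(id, cfg)
		if err == nil && upid != "" {
			err = m.WaitTask(upid) // async mode: wait, then check the outcome
		}
		if err != nil {
			r.Failed++ // counted, never fatal
			continue
		}
		r.Applied++ // empty UPID with nil error is a clean synchronous success
	}
	return r
}

// fake is a fixture: some guests answer asynchronously, some tasks fail.
type fake struct {
	async    map[int]bool
	failTask map[string]bool
}

func (f fake) SetConfig(vmid int, _ map[string]string) (string, error) {
	if f.async[vmid] {
		return fmt.Sprintf("UPID:fake:%d", vmid), nil
	}
	return "", nil
}

func (f fake) WaitTask(upid string) error {
	if f.failTask[upid] {
		return errors.New("task failed")
	}
	return nil
}

func main() {
	f := fake{
		async:    map[int]bool{101: true, 102: true},
		failTask: map[string]bool{"UPID:fake:102": true},
	}
	cfg := map[string]string{"description": "managed"}
	fmt.Printf("%+v\n", Apply(f, []int{100, 101, 102}, cfg)) // {Applied:2 Failed:1}
}
```

Guest 100 succeeds synchronously, 101 succeeds after its task is awaited, and 102 is counted as a failure without stopping the loop.
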
## Next (after validation)

Phase B: the classifier (benign vs destructive by provenance + data-bearing-ness, not by
verb), the reversibility gate in front of the queue's executor, and the signed-op
consuming layer over `internal/authz` with role-scoping + op-to-action binding + the
adversarial rejection matrix — landing **v0.4.0**. I will not start it until the Phase-A
validation passes.