v0.4.0-rc1: slice 4 Phase A — reconcile engine (structural, runs live unfed)
New internal/reconcile package: the agent-side control core's structural half.

- Per-guest serializer Queue (doc 03 §10): the single choke point all mutation
  sources funnel through; same-vmid serial in submit order, different vmids
  parallel (cond-var FIFO lanes).
- Desired-state model + DesiredProvider seam; EmptyProvider is the only live
  source at slice 4 (no hub serving until slice 10) so the live engine computes
  an empty action set and performs zero mutations.
- Normalization layer (FieldNormalizers): normalized desired-vs-actual so Proxmox
  round-trip quirks don't read as drift. normDesc promoted out of main.go to
  reconcile.NormDescription; selftest uses the shared helper.
- Plan (pure diff): minimal benign action set (Start/Stop/SetConfig) for guests
  in both desired and actual; provision/destroy out of scope here.
- Engine: dispatches onto the shared queue; honors the dual-mode SetConfig
  contract (UPID -> WaitTask; empty UPID -> synchronous success).
- Durable op journal + idempotency store (mirrors authz.FileNonceStore):
  in-flight task ids for crash detection + AlreadyApplied dedupe across restart.
- Wired into runDaemon alongside the hub loop, sharing the queue; runs cleanly
  with no desired state and no signers.

Full module race-clean and vet-clean on the Linux build server.

CHECKPOINT: Phase A only. Awaiting validation before Phase B (the reversibility
gate + signed-op consuming layer, landing v0.4.0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -3,6 +3,75 @@
All notable changes to **felhom-agent** are recorded here. Update on every code
change that gets pushed.

## v0.4.0-rc1 — slice 4 Phase A: reconcile engine (structural; runs live, unfed) (2026-06-08)

The agent-side control core's structural half. **Checkpoint marker** — `-rc1` is the
Phase-A push; awaiting validation before Phase B (the reversibility gate + signed-op
consuming layer) lands the final **v0.4.0**. Runs LIVE but UNFED: with no desired-state
provider until slice 10, the live engine computes an empty action set and performs
**zero mutations**.

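
Illustrative sketch (not the shipped code) of why an unfed engine is a no-op: a pure plan over an empty desired state has nothing to diff, so it yields no actions. Every type and function name below is a hypothetical stand-in for the `internal/reconcile` API, and run-state is the only field modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// RunState, DesiredGuest, DesiredState, DesiredProvider, EmptyProvider, Action
// and Plan are hypothetical stand-ins, not the real internal/reconcile API.
type RunState string

const (
	Running RunState = "running"
	Stopped RunState = "stopped"
)

// DesiredGuest models a single optional field; nil means "not managed".
type DesiredGuest struct {
	RunState *RunState
}

// DesiredState is keyed by vmid.
type DesiredState map[int]DesiredGuest

// DesiredProvider is the seam a later desired-state source would implement.
type DesiredProvider interface {
	Desired() (DesiredState, error)
}

// EmptyProvider always reports that nothing is desired.
type EmptyProvider struct{}

func (EmptyProvider) Desired() (DesiredState, error) { return DesiredState{}, nil }

// Action is one benign step for one guest.
type Action struct {
	VMID int
	Kind string // "Start" or "Stop"
}

// Plan is a pure diff: it only considers guests present in BOTH desired and
// actual, and emits actions in ascending vmid order.
func Plan(desired DesiredState, actual map[int]RunState) []Action {
	vmids := make([]int, 0, len(desired))
	for id := range desired {
		vmids = append(vmids, id)
	}
	sort.Ints(vmids)

	var actions []Action
	for _, id := range vmids {
		have, known := actual[id]
		want := desired[id].RunState
		if !known || want == nil || *want == have {
			continue // not present, not managed, or already converged
		}
		kind := "Stop"
		if *want == Running {
			kind = "Start"
		}
		actions = append(actions, Action{VMID: id, Kind: kind})
	}
	return actions
}

func main() {
	actual := map[int]RunState{100: Running, 101: Stopped}
	var p DesiredProvider = EmptyProvider{}
	desired, err := p.Desired()
	if err != nil {
		panic(err)
	}
	fmt.Println("actions:", len(Plan(desired, actual))) // prints "actions: 0"
}
```
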
### Added

- **`internal/reconcile`** package — the engine, the per-guest serializer, the
  desired-state model, the normalization layer, and the durable op journal:
  - **Per-guest serializer (`Queue`, doc 03 §10)** — the single choke point ALL
    mutation sources funnel through. Same-vmid jobs run strictly one-at-a-time in
    submit order; independent vmids run in parallel. Each vmid is a cond-var FIFO lane
    (unbounded, non-blocking, order-preserving); graceful drain on `Close`.
  - **Desired-state model + `DesiredProvider` seam** — `DesiredGuest` (per-field
    optional: run-state / `*hub.GuestSpec` / `*description`), `DesiredState`. The only
    live provider is **`EmptyProvider`** (slice 4 has no source); `StaticProvider`
    feeds fixtures. The seam is where slice 10's hub-serving plugs in — no hub/local
    source invented here.
  - **Normalization layer (`FieldNormalizers`)** — reconcile compares *normalized*
    desired-vs-actual so Proxmox round-trip quirks don't read as drift. `description`'s
    trailing newline is the first registered case; the registry takes more (boolean
    coercion, list ordering) as discovered. `normDesc` **promoted** out of
    `cmd/felhom-agent/main.go` to **`reconcile.NormDescription`**; the `--selftest=task`
    description round-trip now uses that shared helper (one source of truth for the quirk).
  - **Plan engine (`Plan`, pure function)** — computes the minimal **benign** action set
    (`Start`/`Stop`/`SetConfig`) for guests present in both desired and actual, with
    normalized comparison, deterministic vmid ordering, config-before-run-state. Skips
    provision (desired-absent-in-actual, slice 7) and destroy (actual-absent-in-desired,
    gated, slice 10); never writes a config it couldn't first read (`SpecKnown`). Disk
    (rootfs grow) intentionally not reconciled here.
  - **Reconcile engine (`Engine`)** — reads desired+actual, plans, dispatches each action
    onto the shared queue. Every Proxmox op handled per the mutate.go contract: non-empty
    UPID → `WaitTask` + assert `exitstatus`; empty UPID → clean **synchronous** success
    (slice-4 proven). Per-action failures are counted, not fatal (other guests still
    converge).
  - **Operation journal (`Journal`)** — durable fsync'd append-only JSONL mirroring
    `authz.FileNonceStore`: records each op's lifecycle (started → task_running →
    succeeded/failed) with its Proxmox task id (crash mid-op is detected and re-checkable
    on restart via `InFlight()`), plus an **idempotency-key store** (`AlreadyApplied`) so
    a one-shot op never re-runs across retries/restarts. Reconcile actions carry no
    idempotency key (convergent — must re-run on real drift).
- **Daemon wiring (`runDaemon`)** — reconcile runs alongside the hub loop on the poll
  cadence, **sharing the per-guest queue**. Journal path is a `journal.log` sibling of the
  nonce store. The daemon runs cleanly with **no desired state and no signers** (reconcile
  is a logged live no-op; a journal-open failure degrades to journal-less, never crashes).

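
Illustrative sketch (not the shipped code) of the per-guest serializer idea described in the list above: one FIFO lane per vmid, a condition variable to wake the lane's worker, non-blocking submits, and a draining `Close`. The `Queue`, `Submit`, `Close` and `ErrClosed` names and signatures here are assumptions made for the example; the real `internal/reconcile` API may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrClosed is a hypothetical error for submits that arrive after Close.
var ErrClosed = errors.New("queue closed")

type job struct {
	fn   func() error
	done chan error // buffered so the worker never blocks on delivery
}

// lane is one vmid's unbounded FIFO, drained by exactly one worker goroutine.
type lane struct {
	cond    *sync.Cond
	pending []job
}

// Queue serializes jobs per vmid and runs different vmids in parallel.
type Queue struct {
	mu     sync.Mutex
	lanes  map[int]*lane
	closed bool
	wg     sync.WaitGroup
}

func NewQueue() *Queue { return &Queue{lanes: make(map[int]*lane)} }

// Submit enqueues fn on the vmid's lane without blocking and returns a channel
// that receives fn's result once it has run.
func (q *Queue) Submit(vmid int, fn func() error) (<-chan error, error) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		return nil, ErrClosed
	}
	l, ok := q.lanes[vmid]
	if !ok {
		l = &lane{cond: sync.NewCond(&q.mu)}
		q.lanes[vmid] = l
		q.wg.Add(1)
		go q.run(l)
	}
	done := make(chan error, 1)
	l.pending = append(l.pending, job{fn: fn, done: done})
	l.cond.Signal()
	return done, nil
}

// run executes one lane's jobs strictly in submit order.
func (q *Queue) run(l *lane) {
	defer q.wg.Done()
	q.mu.Lock()
	for {
		for len(l.pending) == 0 && !q.closed {
			l.cond.Wait()
		}
		if len(l.pending) == 0 { // closed and fully drained
			q.mu.Unlock()
			return
		}
		j := l.pending[0]
		l.pending = l.pending[1:]
		q.mu.Unlock()
		j.done <- j.fn() // run outside the lock so other lanes keep moving
		q.mu.Lock()
	}
}

// Close rejects new submits, lets pending jobs finish, then returns.
func (q *Queue) Close() {
	q.mu.Lock()
	q.closed = true
	for _, l := range q.lanes {
		l.cond.Broadcast()
	}
	q.mu.Unlock()
	q.wg.Wait()
}

func main() {
	q := NewQueue()
	var order []int // touched only by vmid 100's single worker
	for i := 1; i <= 3; i++ {
		i := i
		if _, err := q.Submit(100, func() error { order = append(order, i); return nil }); err != nil {
			panic(err)
		}
	}
	if _, err := q.Submit(101, func() error { return nil }); err != nil { // separate lane
		panic(err)
	}
	q.Close()
	fmt.Println(order) // prints [1 2 3]
}
```

Running each job outside the mutex is what keeps lanes independent: a slow job on one vmid holds only its own worker, never the lock other lanes need.
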
### Tests

- Serializer: same-guest serialized (max-concurrency 1, submit order preserved) and
  different-guests parallel (cross-waiting jobs both complete — would deadlock if not);
  error propagation; drain-pending-on-close; submit-after-close.
- Normalization: description round-trip; unknown-field identity; extensibility seam
  (synthetic boolean-coercion + list-ordering normalizers).
- Plan: run-state start/stop, spec drift (cores/memory), disk-not-reconciled,
  description-newline-not-drift, unmanaged fields, spec-unknown skips config keeps
  run-state, desired-absent skipped, combined ordering, empty-desired no-op, deterministic
  vmid order.
- Engine: empty-provider zero mutations; async start (WaitTask); synchronous SetConfig
  (no WaitTask); WaitTask failure + POST error counted failed; list error = pass failure.
- Journal: lifecycle latest-wins; in-flight survives restart; idempotency dedupe across
  restart; failed key not applied; torn-trailing-line skipped.
- Full module **race-clean** (`go test -race`) on the Linux build server; vet clean.

### Not in this phase (Phase B)

- The benign/destructive classifier, the reversibility gate, and the signed-op consuming
  layer over `internal/authz` (doc 03 §4 / doc 04) — added next, in front of the queue's
  executor, landing **v0.4.0**.

## v0.3.2 — SetConfig selftest extension (slice-4 pre-check) (2026-06-08)

The gate before slice 4: prove `SetConfig` works live under the scoped token before