v0.4.0-rc1: slice 4 Phase A — reconcile engine (structural, runs live unfed)
New internal/reconcile package: the agent-side control core's structural half. - Per-guest serializer Queue (doc 03 §10): the single choke point all mutation sources funnel through; same-vmid serial in submit order, different vmids parallel (cond-var FIFO lanes). - Desired-state model + DesiredProvider seam; EmptyProvider is the only live source at slice 4 (no hub serving until slice 10) so the live engine computes an empty action set and performs zero mutations. - Normalization layer (FieldNormalizers): normalized desired-vs-actual so Proxmox round-trip quirks don't read as drift. normDesc promoted out of main.go to reconcile.NormDescription; selftest uses the shared helper. - Plan (pure diff): minimal benign action set (Start/Stop/SetConfig) for guests in both desired and actual; provision/destroy out of scope here. - Engine: dispatches onto the shared queue; honors the dual-mode SetConfig contract (UPID -> WaitTask; empty UPID -> synchronous success). - Durable op journal + idempotency store (mirrors authz.FileNonceStore): in-flight task ids for crash detection + AlreadyApplied dedupe across restart. - Wired into runDaemon alongside the hub loop, sharing the queue; runs cleanly with no desired state and no signers. Full module race-clean and vet-clean on the Linux build server. CHECKPOINT: Phase A only. Awaiting validation before Phase B (the reversibility gate + signed-op consuming layer, landing v0.4.0). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -3,6 +3,75 @@
|
|||||||
All notable changes to **felhom-agent** are recorded here. Update on every code
|
All notable changes to **felhom-agent** are recorded here. Update on every code
|
||||||
change that gets pushed.
|
change that gets pushed.
|
||||||
|
|
||||||
|
## v0.4.0-rc1 — slice 4 Phase A: reconcile engine (structural; runs live, unfed) (2026-06-08)
|
||||||
|
|
||||||
|
The agent-side control core's structural half. **Checkpoint marker** — `-rc1` is the
|
||||||
|
Phase-A push; awaiting validation before Phase B (the reversibility gate + signed-op
|
||||||
|
consuming layer) lands the final **v0.4.0**. Runs LIVE but UNFED: with no desired-state
|
||||||
|
provider until slice 10, the live engine computes an empty action set and performs
|
||||||
|
**zero mutations**.
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- **`internal/reconcile`** package — the engine, the per-guest serializer, the
|
||||||
|
desired-state model, the normalization layer, and the durable op journal:
|
||||||
|
- **Per-guest serializer (`Queue`, doc 03 §10)** — the single choke point ALL
|
||||||
|
mutation sources funnel through. Same-vmid jobs run strictly one-at-a-time in
|
||||||
|
submit order; independent vmids run in parallel. Each vmid is a cond-var FIFO lane
|
||||||
|
(unbounded, non-blocking, order-preserving); graceful drain on `Close`.
|
||||||
|
- **Desired-state model + `DesiredProvider` seam** — `DesiredGuest` (per-field
|
||||||
|
optional: run-state / `*hub.GuestSpec` / `*description`), `DesiredState`. The only
|
||||||
|
live provider is **`EmptyProvider`** (slice 4 has no source); `StaticProvider`
|
||||||
|
feeds fixtures. The seam is where slice 10's hub-serving plugs in — no hub/local
|
||||||
|
source invented here.
|
||||||
|
- **Normalization layer (`FieldNormalizers`)** — reconcile compares *normalized*
|
||||||
|
desired-vs-actual so Proxmox round-trip quirks don't read as drift. `description`'s
|
||||||
|
trailing newline is the first registered case; the registry takes more (boolean
|
||||||
|
coercion, list ordering) as discovered. `normDesc` **promoted** out of
|
||||||
|
`cmd/felhom-agent/main.go` to **`reconcile.NormDescription`**; the `--selftest=task`
|
||||||
|
description round-trip now uses that shared helper (one source of truth for the quirk).
|
||||||
|
- **Plan engine (`Plan`, pure function)** — computes the minimal **benign** action set
|
||||||
|
(`Start`/`Stop`/`SetConfig`) for guests present in both desired and actual, with
|
||||||
|
normalized comparison, deterministic vmid ordering, config-before-run-state. Skips
|
||||||
|
provision (desired-absent-in-actual, slice 7) and destroy (actual-absent-in-desired,
|
||||||
|
gated, slice 10); never writes a config it couldn't first read (`SpecKnown`). Disk
|
||||||
|
(rootfs grow) intentionally not reconciled here.
|
||||||
|
- **Reconcile engine (`Engine`)** — reads desired+actual, plans, dispatches each action
|
||||||
|
onto the shared queue. Every Proxmox op handled per the mutate.go contract: non-empty
|
||||||
|
UPID → `WaitTask` + assert `exitstatus`; empty UPID → clean **synchronous** success
|
||||||
|
(slice-4 proven). Per-action failures are counted, not fatal (other guests still
|
||||||
|
converge).
|
||||||
|
- **Operation journal (`Journal`)** — durable fsync'd append-only JSONL mirroring
|
||||||
|
`authz.FileNonceStore`: records each op's lifecycle (started → task_running →
|
||||||
|
succeeded/failed) with its Proxmox task id (crash mid-op is detected and re-checkable
|
||||||
|
on restart via `InFlight()`), plus an **idempotency-key store** (`AlreadyApplied`) so
|
||||||
|
a one-shot op never re-runs across retries/restarts. Reconcile actions carry no
|
||||||
|
idempotency key (convergent — must re-run on real drift).
|
||||||
|
- **Daemon wiring (`runDaemon`)** — reconcile runs alongside the hub loop on the poll
|
||||||
|
cadence, **sharing the per-guest queue**. Journal path is a `journal.log` sibling of the
|
||||||
|
nonce store. The daemon runs cleanly with **no desired state and no signers** (reconcile
|
||||||
|
is a logged live no-op; a journal-open failure degrades to journal-less, never crashes).
|
||||||
|
|
||||||
|
### Tests
|
||||||
|
- Serializer: same-guest serialized (max-concurrency 1, submit order preserved) and
|
||||||
|
different-guests parallel (cross-waiting jobs both complete — would deadlock if not);
|
||||||
|
error propagation; drain-pending-on-close; submit-after-close.
|
||||||
|
- Normalization: description round-trip; unknown-field identity; extensibility seam
|
||||||
|
(synthetic boolean-coercion + list-ordering normalizers).
|
||||||
|
- Plan: run-state start/stop, spec drift (cores/memory), disk-not-reconciled,
|
||||||
|
description-newline-not-drift, unmanaged fields, spec-unknown skips config keeps
|
||||||
|
run-state, desired-absent skipped, combined ordering, empty-desired no-op, deterministic
|
||||||
|
vmid order.
|
||||||
|
- Engine: empty-provider zero mutations; async start (WaitTask); synchronous SetConfig
|
||||||
|
(no WaitTask); WaitTask failure + POST error counted failed; list error = pass failure.
|
||||||
|
- Journal: lifecycle latest-wins; in-flight survives restart; idempotency dedupe across
|
||||||
|
restart; failed key not applied; torn-trailing-line skipped.
|
||||||
|
- Full module **race-clean** (`go test -race`) on the Linux build server; vet clean.
|
||||||
|
|
||||||
|
### Not in this phase (Phase B)
|
||||||
|
- The benign/destructive classifier, the reversibility gate, and the signed-op consuming
|
||||||
|
layer over `internal/authz` (doc 03 §4 / doc 04) — added next, in front of the queue's
|
||||||
|
executor, landing **v0.4.0**.
|
||||||
|
|
||||||
## v0.3.2 — SetConfig selftest extension (slice-4 pre-check) (2026-06-08)
|
## v0.3.2 — SetConfig selftest extension (slice-4 pre-check) (2026-06-08)
|
||||||
|
|
||||||
The gate before slice 4: prove `SetConfig` works live under the scoped token before
|
The gate before slice 4: prove `SetConfig` works live under the scoped token before
|
||||||
|
|||||||
@@ -15,7 +15,7 @@
|
|||||||
- Module `gitea.dooplex.hu/admin/felhom-agent`; binary `felhom-agent` (`cmd/felhom-agent/`).
|
- Module `gitea.dooplex.hu/admin/felhom-agent`; binary `felhom-agent` (`cmd/felhom-agent/`).
|
||||||
- **Pure Go stdlib + `golang.org/x/crypto` only** — no web frameworks.
|
- **Pure Go stdlib + `golang.org/x/crypto` only** — no web frameworks.
|
||||||
- `go.mod` directive **go 1.25.0**; dep `golang.org/x/crypto v0.52.0` (declares go 1.25, will NOT build on Go 1.24). The **build server (192.168.0.180) runs go1.26.0** (upstream Go on PATH, backward-compatible). Build/run the agent there for live tests (same LAN as the demo host).
|
- `go.mod` directive **go 1.25.0**; dep `golang.org/x/crypto v0.52.0` (declares go 1.25, will NOT build on Go 1.24). The **build server (192.168.0.180) runs go1.26.0** (upstream Go on PATH, backward-compatible). Build/run the agent there for live tests (same LAN as the demo host).
|
||||||
- Version: `version` var in `cmd/felhom-agent/main.go`, overridable via `-ldflags "-X main.version=<v>"`; `--version` flag. **Current: v0.3.2.** Bump on meaningful changes + add a CHANGELOG entry.
|
- Version: `version` var in `cmd/felhom-agent/main.go`, overridable via `-ldflags "-X main.version=<v>"`; `--version` flag. **Current: v0.4.0-rc1** (slice-4 Phase A checkpoint; v0.4.0 lands with Phase B). Bump on meaningful changes + add a CHANGELOG entry.
|
||||||
|
|
||||||
## Layout
|
## Layout
|
||||||
|
|
||||||
@@ -48,7 +48,8 @@ Built in slices, all on `main`:
|
|||||||
- **v0.3.0** slice 3 — `internal/hub`: the first **daemon loop** (no-`--selftest` mode) posting a read-only `HostReport` to the hub (= the heartbeat). Report's storage/backup/restore/pbs/audit fields are **defined-but-empty** (slices 5/6); the envelope's desired-state/signed-ops fields are **parsed-but-ignored** (slice 4).
|
- **v0.3.0** slice 3 — `internal/hub`: the first **daemon loop** (no-`--selftest` mode) posting a read-only `HostReport` to the hub (= the heartbeat). Report's storage/backup/restore/pbs/audit fields are **defined-but-empty** (slices 5/6); the envelope's desired-state/signed-ops fields are **parsed-but-ignored** (slice 4).
|
||||||
- **v0.3.1** — slice-3 validation follow-ups.
|
- **v0.3.1** — slice-3 validation follow-ups.
|
||||||
- **v0.3.2** — slice-4 pre-check: reversible `SetConfig` step added to `--selftest=task`; passed live on guest 9999. Findings: LXC `description` write is **synchronous** (empty UPID — dual-mode modeling confirmed); PVE appends a trailing `\n` to `description` on read (reconcile must normalize). First live `VM.Config.*` exercise.
|
- **v0.3.2** — slice-4 pre-check: reversible `SetConfig` step added to `--selftest=task`; passed live on guest 9999. Findings: LXC `description` write is **synchronous** (empty UPID — dual-mode modeling confirmed); PVE appends a trailing `\n` to `description` on read (reconcile must normalize). First live `VM.Config.*` exercise.
|
||||||
- **Next: slice 4 (reconcile + benign/destructive gate)** — the first slice that issues real Proxmox mutations. The live `--selftest=task` gate (snapshot/rollback/delete + `SetConfig`) is now **passed**.
|
- **v0.4.0-rc1** — slice-4 **Phase A** (structural): `internal/reconcile` — engine, per-guest serializer (§10), desired-state model + `DesiredProvider` seam, normalization layer (`NormDescription` promoted out of main.go), plan/diff engine (benign Start/Stop/SetConfig set), durable op journal + idempotency store. Wired into `runDaemon` sharing the queue. Runs **live but unfed** (EmptyProvider → zero mutations until slice 10). Checkpoint: awaiting validation before Phase B.
|
||||||
|
- **Next: slice 4 Phase B** — the benign/destructive classifier, the reversibility gate, and the signed-op consuming layer over `internal/authz` (doc 03 §4 / doc 04), in front of the queue's executor → lands **v0.4.0**.
|
||||||
|
|
||||||
## Demo host (for live tests)
|
## Demo host (for live tests)
|
||||||
|
|
||||||
|
|||||||
@@ -1,76 +1,82 @@
|
|||||||
# REPORT — `SetConfig` selftest extension, live self-gate (2026-06-08)
|
# REPORT — Slice 4 Phase A: reconcile engine (structural) (2026-06-08)
|
||||||
|
|
||||||
> Overwrite-latest report (most recent significant run only). Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).
|
> Overwrite-latest report (most recent significant work only). Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).
|
||||||
|
|
||||||
## Outcome
|
## Outcome
|
||||||
|
|
||||||
**`SetConfig` PASSED live under the scoped operator token.** The slice-4 pre-check is
|
**Phase A of slice 4 is implemented, tested, and pushed as the checkpoint marker
|
||||||
satisfied — `--selftest=task -vmid 9999` now exercises a reversible `SetConfig`
|
`v0.4.0-rc1`.** This is the structural half of the agent-side control core: the
|
||||||
write+revert end-to-end and reached `=== selftest=task OK ===` (exit 0). Reconcile
|
reconcile engine, the per-guest serializer (doc 03 §10), the desired-state model + its
|
||||||
(slice 4) can be built on `SetConfig` with confidence.
|
provider seam, the field-normalization layer, the plan/diff engine, and the durable
|
||||||
|
operation journal + idempotency store — all adversarially fixture-tested.
|
||||||
|
|
||||||
## What was implemented
|
**Per the task, I have STOPPED at the checkpoint and am awaiting the validation pass
|
||||||
|
before starting Phase B** (the benign/destructive classifier, the reversibility gate,
|
||||||
|
and the signed-op consuming layer over `internal/authz`). Phase B is the security core
|
||||||
|
and earns isolated review.
|
||||||
|
|
||||||
A reversible `SetConfig` step appended to the existing `runSelftestTask` flow
|
## What runs (and what deliberately doesn't)
|
||||||
(`cmd/felhom-agent/main.go`, `selftestSetConfig`), keeping the prior
|
|
||||||
snapshot → rollback → delete-snapshot steps intact. Against guest 9999:
|
|
||||||
|
|
||||||
1. `GuestConfig` — capture the original `description` (was **absent**).
|
The engine **runs live but unfed**. At slice 4 there is no desired-state source (hub
|
||||||
2. `SetConfig description="felhom-selftest <RFC3339>"` — dual-mode return handled per
|
serving is slice 10; provisioning is slice 7), so the only production `DesiredProvider`
|
||||||
the `mutate.go` contract (empty UPID = synchronous; UPID = `WaitTask`+assert OK).
|
is `EmptyProvider` → the live engine reads state, computes an **empty action set**, and
|
||||||
3. `GuestConfig` again — confirm the marker landed.
|
performs **zero mutations** every tick. That is the correct, expected slice-4 behavior;
|
||||||
4. **Restore** — original was absent, so `SetConfig delete=description`; confirm cleared.
|
the first live convergence arrives when slice 10 serves desired state into the seam.
|
||||||
|
|
||||||
Output matches the existing format:
|
The wired action set is **benign-on-existing-guest only**: `Start`, `Stop`, `SetConfig`.
|
||||||
```
|
Provisioning and the destructive set are out of scope for Phase A (the destructive set
|
||||||
[ ok ] setconfig synchronous exitstatus=OK
|
is classified and gated in Phase B but not wired to live execution — nothing serves
|
||||||
[ ok ] verify-write description verified == marker
|
destructive deltas yet).
|
||||||
[ ok ] setconfig-revert synchronous exitstatus=OK
|
|
||||||
[ ok ] verify-revert description restored to original
|
|
||||||
```
|
|
||||||
|
|
||||||
## Key finding — synchronous, not async
|
## Package `internal/reconcile`
|
||||||
|
|
||||||
**The LXC `description` write came back synchronous (empty UPID).** PVE applied it
|
- **`Queue` (per-guest serializer, doc 03 §10)** — the single choke point all mutation
|
||||||
inline with no task object; the agent printed `synchronous exitstatus=OK` on the
|
sources funnel through. Same-vmid jobs run strictly one-at-a-time in submit order;
|
||||||
empty-string path. This confirms the agent's **dual-mode `SetConfig` modeling matches
|
independent vmids run in parallel. Each vmid is an unbounded cond-var FIFO lane
|
||||||
Proxmox reality**: for `description`, the empty-UPID branch is the live path, and
|
(non-blocking, order-preserving submission); `Close` drains pending jobs gracefully.
|
||||||
treating `""` as success (not an error) is correct. This was the **first live exercise
|
- **Desired-state model + `DesiredProvider`** — `DesiredGuest` makes each field
|
||||||
of the `VM.Config.*` privilege cluster** (previously only the snapshot/rollback/backup
|
individually optional (run-state / `*hub.GuestSpec` / `*description`) so a source pins
|
||||||
privileges had been run live).
|
only what it manages. `EmptyProvider` (live, slice 4) and `StaticProvider` (fixtures).
|
||||||
|
- **Normalization layer (`FieldNormalizers`)** — reconcile compares *normalized*
|
||||||
|
desired-vs-actual. `description`'s trailing newline (the slice-4-proven quirk) is the
|
||||||
|
first registered normalizer; the registry takes more as discovered. `normDesc` was
|
||||||
|
**promoted** out of `main.go` to `reconcile.NormDescription`, and the `--selftest=task`
|
||||||
|
round-trip now uses that shared helper — one source of truth.
|
||||||
|
- **`Plan` (pure diff engine)** — minimal benign action set for guests in both desired
|
||||||
|
and actual: normalized comparison, deterministic vmid order, config-before-run-state.
|
||||||
|
Skips provision (slice 7) and destroy (gated, slice 10); never writes a config it
|
||||||
|
couldn't first read; disk grow deferred.
|
||||||
|
- **`Engine`** — reads desired+actual, plans, dispatches onto the shared queue. Honors
|
||||||
|
the mutate.go dual-mode contract: non-empty UPID → `WaitTask`+assert; empty UPID →
|
||||||
|
clean synchronous success. Per-action failures counted, never fatal.
|
||||||
|
- **`Journal`** — durable fsync'd JSONL (mirrors `authz.FileNonceStore`): op lifecycle
|
||||||
|
with the Proxmox task id (crash mid-op detected + re-checkable via `InFlight()`), plus
|
||||||
|
an idempotency-key store so a one-shot op never double-runs across retries/restarts.
|
||||||
|
Reconcile actions carry no idempotency key (convergent — must re-run on real drift).
|
||||||
|
|
||||||
## Second finding — `description` trailing-newline normalization
|
## Daemon wiring
|
||||||
|
|
||||||
PVE **appends a trailing `\n` to `description` on read** (stored URL-encoded as
|
`runDaemon` now runs reconcile alongside the hub loop on the poll cadence, sharing the
|
||||||
`%0A...`). The first live run surfaced this as a (false) verify mismatch:
|
per-guest queue. The journal lives at a `journal.log` sibling of the nonce store. The
|
||||||
`got="...Z\n"` vs `want="...Z"`. The write had genuinely landed — only my exact-match
|
daemon runs cleanly with **no desired state and no signers** — reconcile is a logged
|
||||||
check was too strict. Fixed with `normDesc` (strip trailing newline) at every
|
no-op; a journal-open failure degrades to journal-less rather than crashing.
|
||||||
comparison point, and the run went green. **This is load-bearing intel for slice 4:**
|
|
||||||
a reconcile that compares desired vs actual `description` verbatim will detect
|
|
||||||
perpetual drift; it must normalize the trailing newline.
|
|
||||||
|
|
||||||
## Live run environment
|
## Verification
|
||||||
|
|
||||||
- Built **v0.3.2** on the build server (192.168.0.180, go1.26), pointed at
|
- Full module **race-clean** (`go test -race -count=1 ./...`) and `go vet` clean on the
|
||||||
`demo-felhom` (`https://192.168.0.162:8006`, PVE 9.2.2).
|
Linux build server (go1.26); all unit tests green locally and there.
|
||||||
- Pinned leaf-cert SHA-256 fingerprint re-verified — still
|
- Adversarial fixture coverage: serializer concurrency/ordering, normalization +
|
||||||
`BA:7C:99:7D:45:D0…` (matches the agent's pin).
|
extensibility seam, the full plan matrix (drift / no-false-drift / unmanaged /
|
||||||
- `--selftest=read` clean first (PVE 9.2.2, node online, guests 9001+9999 visible,
|
spec-unknown / scope skips / ordering / empty-desired), engine sync-vs-async +
|
||||||
storages listed), then the gated `--selftest=task -vmid 9999`.
|
failure counting, and journal persistence + idempotency dedupe **across a simulated
|
||||||
- Task UPIDs name the token actor (`…:vzsnapshot:9999:felhom-agent@pve!agent:` etc.) —
|
restart**.
|
||||||
privsep token path genuinely exercised, no privilege drift.
|
- No live Proxmox needed (the engine is unfed); the live exercise is deferred — there is
|
||||||
|
nothing to converge until a desired-state source exists.
|
||||||
|
|
||||||
## Post-state
|
## Next (after validation)
|
||||||
|
|
||||||
Guest **9999** left pristine: **stopped**, `description` **absent**, only `current`
|
Phase B: the classifier (benign vs destructive by provenance + data-bearing-ness, not by
|
||||||
remains (no leftover `felhom-selftest` snapshot).
|
verb), the reversibility gate in front of the queue's executor, and the signed-op
|
||||||
|
consuming layer over `internal/authz` with role-scoping + op-to-action binding + the
|
||||||
## Credentials
|
adversarial rejection matrix — landing **v0.4.0**. I will not start it until the Phase-A
|
||||||
|
validation passes.
|
||||||
The standing operator token (`felhom-agent@pve!agent`, privsep) was **rotated** during
|
|
||||||
this run — the prior secret was not retrievable (PVE reveals a token secret only once
|
|
||||||
at creation), so a fresh secret was minted via `root@felhom-pve` and the `FelhomAgent`
|
|
||||||
role re-confirmed on **both** the user and the token ACL at `/` (privsep intersection
|
|
||||||
gotcha). The token was consumed via the **standing operator token through
|
|
||||||
`FELHOM_AGENT_PROXMOX_TOKEN`, not persisted to the repo** — the on-disk demo config
|
|
||||||
carries only a placeholder. The new secret is **stored out-of-band**.
|
|
||||||
|
|||||||
+60
-16
@@ -15,7 +15,7 @@ import (
|
|||||||
"log/slog"
|
"log/slog"
|
||||||
"os"
|
"os"
|
||||||
"os/signal"
|
"os/signal"
|
||||||
"strings"
|
"path/filepath"
|
||||||
"syscall"
|
"syscall"
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
@@ -23,11 +23,12 @@ import (
|
|||||||
"gitea.dooplex.hu/admin/felhom-agent/internal/hub"
|
"gitea.dooplex.hu/admin/felhom-agent/internal/hub"
|
||||||
applog "gitea.dooplex.hu/admin/felhom-agent/internal/log"
|
applog "gitea.dooplex.hu/admin/felhom-agent/internal/log"
|
||||||
"gitea.dooplex.hu/admin/felhom-agent/internal/proxmox"
|
"gitea.dooplex.hu/admin/felhom-agent/internal/proxmox"
|
||||||
|
"gitea.dooplex.hu/admin/felhom-agent/internal/reconcile"
|
||||||
)
|
)
|
||||||
|
|
||||||
// version is the agent version. Overridable at build time with
|
// version is the agent version. Overridable at build time with
|
||||||
// -ldflags "-X main.version=<v>"; defaults to the in-repo CHANGELOG version.
|
// -ldflags "-X main.version=<v>"; defaults to the in-repo CHANGELOG version.
|
||||||
var version = "0.3.2"
|
var version = "0.4.0-rc1"
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
var (
|
var (
|
||||||
@@ -109,19 +110,66 @@ func runDaemon(cfg config.Config, logger *slog.Logger) int {
|
|||||||
hcfg := cfg.Hub.WithDefaults()
|
hcfg := cfg.Hub.WithDefaults()
|
||||||
collector := hub.NewCollector(px, hub.SystemctlProber{}, cfg.Hub.HostID, version, logger)
|
collector := hub.NewCollector(px, hub.SystemctlProber{}, cfg.Hub.HostID, version, logger)
|
||||||
loop := hub.NewLoop(collector, client, time.Duration(hcfg.PollSeconds)*time.Second, logger)
|
loop := hub.NewLoop(collector, client, time.Duration(hcfg.PollSeconds)*time.Second, logger)
|
||||||
|
interval := time.Duration(hcfg.PollSeconds) * time.Second
|
||||||
|
|
||||||
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
|
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
|
||||||
defer stop()
|
defer stop()
|
||||||
logger.Info("felhom-agent daemon starting",
|
logger.Info("felhom-agent daemon starting",
|
||||||
"version", version, "host_id", cfg.Hub.HostID, "hub_url", cfg.Hub.URL,
|
"version", version, "host_id", cfg.Hub.HostID, "hub_url", cfg.Hub.URL,
|
||||||
"interval_s", hcfg.PollSeconds) // hub key intentionally not logged
|
"interval_s", hcfg.PollSeconds) // hub key intentionally not logged
|
||||||
if err := loop.Run(ctx); err != nil {
|
|
||||||
logger.Error("daemon: loop exited with error", "err", err)
|
// Reconcile (slice 4) runs alongside the hub loop, sharing the per-guest queue
|
||||||
|
// (doc 03 §10). At slice 4 the desired-state provider is empty (no hub serving
|
||||||
|
// until slice 10), so reconcile is a live no-op: it reads state and computes an
|
||||||
|
// empty action set each tick, mutating nothing. The daemon must run cleanly with
|
||||||
|
// no desired state and no signers configured — so a journal-open failure is logged
|
||||||
|
// and reconcile proceeds journal-less (it has nothing destructive to journal yet).
|
||||||
|
queue := reconcile.NewQueue()
|
||||||
|
defer queue.Close()
|
||||||
|
var journal *reconcile.Journal
|
||||||
|
if jp := reconcileJournalPath(cfg); jp != "" {
|
||||||
|
if err := os.MkdirAll(filepath.Dir(jp), 0o700); err != nil {
|
||||||
|
logger.Warn("daemon: cannot ensure journal dir; reconcile runs without a journal", "path", jp, "err", err)
|
||||||
|
} else if j, err := reconcile.OpenJournal(jp); err != nil {
|
||||||
|
logger.Warn("daemon: cannot open op journal; reconcile runs without a journal", "path", jp, "err", err)
|
||||||
|
} else {
|
||||||
|
journal = j
|
||||||
|
defer journal.Close()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
engine := reconcile.NewEngine(reconcile.EngineOptions{
|
||||||
|
API: px,
|
||||||
|
Queue: queue,
|
||||||
|
Journal: journal,
|
||||||
|
Provider: reconcile.EmptyProvider{}, // slice 4: no live desired-state source
|
||||||
|
Logger: logger,
|
||||||
|
})
|
||||||
|
|
||||||
|
// Run reconcile and the hub loop concurrently; either returning ends the daemon.
|
||||||
|
errc := make(chan error, 2)
|
||||||
|
go func() { errc <- engine.Run(ctx, interval) }()
|
||||||
|
go func() { errc <- loop.Run(ctx) }()
|
||||||
|
|
||||||
|
err = <-errc
|
||||||
|
stop() // tear down the sibling on the first exit
|
||||||
|
<-errc // wait for it
|
||||||
|
if err != nil && err != context.Canceled {
|
||||||
|
logger.Error("daemon: exited with error", "err", err)
|
||||||
return 1
|
return 1
|
||||||
}
|
}
|
||||||
return 0
|
return 0
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// reconcileJournalPath chooses the op-journal path: a `journal.log` sibling of the
|
||||||
|
// configured nonce store (both are durable agent state), falling back to the standard
|
||||||
|
// host state dir when the nonce store is unset.
|
||||||
|
func reconcileJournalPath(cfg config.Config) string {
|
||||||
|
if p := cfg.Authz.NonceStorePath; p != "" {
|
||||||
|
return filepath.Join(filepath.Dir(p), "journal.log")
|
||||||
|
}
|
||||||
|
return "/var/lib/felhom-agent/journal.log"
|
||||||
|
}
|
||||||
|
|
||||||
// runSelftestHub validates hub config, does ONE collect + report, and prints the
|
// runSelftestHub validates hub config, does ONE collect + report, and prints the
|
||||||
// report it would send plus the envelope it got back.
|
// report it would send plus the envelope it got back.
|
||||||
func runSelftestHub(ctx context.Context, cfg config.Config, logger *slog.Logger) int {
|
func runSelftestHub(ctx context.Context, cfg config.Config, logger *slog.Logger) int {
|
||||||
@@ -317,10 +365,11 @@ func selftestSetConfig(ctx context.Context, client *proxmox.Client, vmid int) in
|
|||||||
return 1
|
return 1
|
||||||
}
|
}
|
||||||
// PVE normalizes `description` by appending a trailing newline on read, so all
|
// PVE normalizes `description` by appending a trailing newline on read, so all
|
||||||
// comparisons here use normDesc (strip trailing newlines) and restores write the
|
// comparisons here use the shared reconcile.NormDescription (strip trailing
|
||||||
// normalized original — otherwise an exact-match check sees false drift. This is
|
// newlines) and restores write the normalized original — otherwise an exact-match
|
||||||
// load-bearing intel for slice-4 reconcile (compare descriptions normalized).
|
// check sees false drift. Same helper the slice-4 reconciler uses to normalize
|
||||||
origDesc = normDesc(origDesc)
|
// description, so the quirk has one source of truth.
|
||||||
|
origDesc = reconcile.NormDescription(origDesc)
|
||||||
|
|
||||||
// 2. Write the marker.
|
// 2. Write the marker.
|
||||||
marker := "felhom-selftest " + time.Now().UTC().Format(time.RFC3339)
|
marker := "felhom-selftest " + time.Now().UTC().Format(time.RFC3339)
|
||||||
@@ -339,7 +388,7 @@ func selftestSetConfig(ctx context.Context, client *proxmox.Client, vmid int) in
|
|||||||
fmt.Printf(" [FAIL] %-16s decode description (verify): %v\n", "setconfig", err)
|
fmt.Printf(" [FAIL] %-16s decode description (verify): %v\n", "setconfig", err)
|
||||||
return 1
|
return 1
|
||||||
}
|
}
|
||||||
if !present || normDesc(got) != marker {
|
if !present || reconcile.NormDescription(got) != marker {
|
||||||
fmt.Printf(" [FAIL] %-16s write did not land: present=%v got=%q want=%q\n", "setconfig", present, got, marker)
|
fmt.Printf(" [FAIL] %-16s write did not land: present=%v got=%q want=%q\n", "setconfig", present, got, marker)
|
||||||
return 1
|
return 1
|
||||||
}
|
}
|
||||||
@@ -368,8 +417,8 @@ func selftestSetConfig(ctx context.Context, client *proxmox.Client, vmid int) in
|
|||||||
return 1
|
return 1
|
||||||
}
|
}
|
||||||
if origPresent {
|
if origPresent {
|
||||||
if !present || normDesc(got) != origDesc {
|
if !present || reconcile.NormDescription(got) != origDesc {
|
||||||
fmt.Printf(" [FAIL] %-16s revert did not restore: present=%v got=%q want=%q\n", "setconfig-revert", present, normDesc(got), origDesc)
|
fmt.Printf(" [FAIL] %-16s revert did not restore: present=%v got=%q want=%q\n", "setconfig-revert", present, reconcile.NormDescription(got), origDesc)
|
||||||
return 1
|
return 1
|
||||||
}
|
}
|
||||||
} else if present {
|
} else if present {
|
||||||
@@ -407,11 +456,6 @@ func applySetConfig(ctx context.Context, client *proxmox.Client, vmid int, step
|
|||||||
return 0
|
return 0
|
||||||
}
|
}
|
||||||
|
|
||||||
// normDesc strips trailing newlines that PVE appends to the `description` field
|
|
||||||
// on read, so a written value round-trips equal. (PVE stores `description` with a
|
|
||||||
// trailing "\n"; comparing raw would always mismatch.)
|
|
||||||
func normDesc(s string) string { return strings.TrimRight(s, "\n") }
|
|
||||||
|
|
||||||
// extraString reads a string-valued key from GuestConfig.Extra (raw JSON). It
|
// extraString reads a string-valued key from GuestConfig.Extra (raw JSON). It
|
||||||
// returns ("", false, nil) when the key is absent, and decodes the JSON string
|
// returns ("", false, nil) when the key is absent, and decodes the JSON string
|
||||||
// otherwise.
|
// otherwise.
|
||||||
|
|||||||
@@ -0,0 +1,24 @@
|
|||||||
|
// Package reconcile is the agent-side control core (slice 4): the engine that
|
||||||
|
// converges actual Proxmox state toward a desired state, the per-guest serializer
|
||||||
|
// that all mutation sources funnel through (doc 03 §10), the field-normalization
|
||||||
|
// layer that keeps Proxmox round-trip quirks from reading as drift, and the durable
|
||||||
|
// operation journal + idempotency store.
|
||||||
|
//
|
||||||
|
// Phase A (this file set) is structural and runs LIVE but UNFED: at slice 4 there is
|
||||||
|
// no desired-state provider (hub serving is slice 10, provisioning is slice 7), so
|
||||||
|
// the live engine computes an empty action set and performs ZERO mutations. The
|
||||||
|
// engine, serializer, normalizer, journal, and planner are all exercised against
|
||||||
|
// synthetic fixtures. The first live convergence arrives when slice 10 serves desired
|
||||||
|
// state into the DesiredProvider seam.
|
||||||
|
//
|
||||||
|
// Phase B (added later) layers the benign/destructive classifier and the
|
||||||
|
// reversibility gate (doc 03 §4) plus the signed-op consuming layer over
|
||||||
|
// internal/authz (doc 04) in front of the queue's executor — so every mutation passes
|
||||||
|
// the gate. Phase A deliberately wires only the benign-on-existing-guest action set:
|
||||||
|
// Start / Stop / SetConfig.
|
||||||
|
//
|
||||||
|
// Action set scope (slice 4): Start, Stop, SetConfig on EXISTING guests only.
|
||||||
|
// Provisioning (restore-to-new-guest) and destroy/overwrite are out of scope here —
|
||||||
|
// the destructive set is classified and gated in Phase B but not wired to live
|
||||||
|
// execution, because nothing serves destructive deltas until slice 10.
|
||||||
|
package reconcile
|
||||||
@@ -0,0 +1,249 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"fmt"
|
||||||
|
"log/slog"
|
||||||
|
"strconv"
|
||||||
|
"sync/atomic"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"gitea.dooplex.hu/admin/felhom-agent/internal/proxmox"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Engine converges actual Proxmox state toward the desired state. One Reconcile pass:
|
||||||
|
// read desired (from the provider), read actual (from Proxmox), Plan the minimal
|
||||||
|
// benign action set, and dispatch each action onto the per-guest Queue — journaling
|
||||||
|
// each op for crash-safety. At slice 4 the provider is EmptyProvider, so the action
|
||||||
|
// set is empty and the pass performs zero mutations (correct and expected).
|
||||||
|
//
|
||||||
|
// Concurrency: actions for different guests run in parallel (separate Queue lanes);
|
||||||
|
// actions for the same guest run serially in plan order. Every Proxmox mutation is
|
||||||
|
// async-or-sync per the mutate.go contract: a non-empty UPID is WaitTask'd and its
|
||||||
|
// exitstatus asserted; an empty UPID is a clean synchronous success.
|
||||||
|
type Engine struct {
|
||||||
|
api GuestAPI
|
||||||
|
queue *Queue
|
||||||
|
journal *Journal
|
||||||
|
provider DesiredProvider
|
||||||
|
norm FieldNormalizers
|
||||||
|
logger *slog.Logger
|
||||||
|
|
||||||
|
opSeq uint64 // atomic; makes each op id unique per attempt
|
||||||
|
}
|
||||||
|
|
||||||
|
// EngineOptions configures a new Engine. Norm defaults to DefaultNormalizers, Logger
|
||||||
|
// to a discard logger.
|
||||||
|
type EngineOptions struct {
|
||||||
|
API GuestAPI
|
||||||
|
Queue *Queue
|
||||||
|
Journal *Journal
|
||||||
|
Provider DesiredProvider
|
||||||
|
Norm FieldNormalizers
|
||||||
|
Logger *slog.Logger
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewEngine builds an Engine. The Queue is shared (the single §10 choke point); the
|
||||||
|
// caller owns its lifecycle (Close on shutdown).
|
||||||
|
func NewEngine(opts EngineOptions) *Engine {
|
||||||
|
norm := opts.Norm
|
||||||
|
if norm == nil {
|
||||||
|
norm = DefaultNormalizers()
|
||||||
|
}
|
||||||
|
logger := opts.Logger
|
||||||
|
if logger == nil {
|
||||||
|
logger = slog.New(slog.NewTextHandler(discard{}, nil))
|
||||||
|
}
|
||||||
|
provider := opts.Provider
|
||||||
|
if provider == nil {
|
||||||
|
provider = EmptyProvider{}
|
||||||
|
}
|
||||||
|
return &Engine{
|
||||||
|
api: opts.API,
|
||||||
|
queue: opts.Queue,
|
||||||
|
journal: opts.Journal,
|
||||||
|
provider: provider,
|
||||||
|
norm: norm,
|
||||||
|
logger: logger,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Result summarizes one Reconcile pass.
|
||||||
|
type Result struct {
|
||||||
|
Planned int
|
||||||
|
Executed int // succeeded
|
||||||
|
Failed int // errored
|
||||||
|
Errors []error // one per failed action
|
||||||
|
}
|
||||||
|
|
||||||
|
// Reconcile runs one convergence pass. It returns an error only on a pass-level
|
||||||
|
// failure (can't read desired/actual); per-action failures are counted in Result and
|
||||||
|
// do not abort the pass (other guests still converge).
|
||||||
|
func (e *Engine) Reconcile(ctx context.Context) (Result, error) {
|
||||||
|
desired, err := e.provider.Desired(ctx)
|
||||||
|
if err != nil {
|
||||||
|
return Result{}, fmt.Errorf("reconcile: desired state: %w", err)
|
||||||
|
}
|
||||||
|
actual, err := e.readActual(ctx)
|
||||||
|
if err != nil {
|
||||||
|
return Result{}, fmt.Errorf("reconcile: actual state: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
actions := Plan(desired, actual, e.norm)
|
||||||
|
res := Result{Planned: len(actions)}
|
||||||
|
if len(actions) == 0 {
|
||||||
|
e.logger.Debug("reconcile: no drift, no actions",
|
||||||
|
"desired_guests", len(desired.Guests), "actual_guests", len(actual.Guests))
|
||||||
|
return res, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Dispatch all actions onto the shared per-guest queue, then await each. Same-vmid
|
||||||
|
// actions serialize in submit order; different vmids run concurrently.
|
||||||
|
chans := make([]<-chan error, len(actions))
|
||||||
|
for i := range actions {
|
||||||
|
act := actions[i]
|
||||||
|
chans[i] = e.queue.Submit(act.VMID, func() error { return e.execute(ctx, act) })
|
||||||
|
}
|
||||||
|
for i, ch := range chans {
|
||||||
|
if err := <-ch; err != nil {
|
||||||
|
res.Failed++
|
||||||
|
res.Errors = append(res.Errors, err)
|
||||||
|
e.logger.Error("reconcile: action failed",
|
||||||
|
"vmid", actions[i].VMID, "kind", actions[i].Kind, "err", err)
|
||||||
|
} else {
|
||||||
|
res.Executed++
|
||||||
|
e.logger.Info("reconcile: action applied",
|
||||||
|
"vmid", actions[i].VMID, "kind", actions[i].Kind, "reason", actions[i].Reason)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return res, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// execute dispatches one benign action against Proxmox and journals its lifecycle.
|
||||||
|
// Reconcile actions carry NO idempotency key (convergent — safe to re-run on drift);
|
||||||
|
// crash-safety comes from the in-flight journal records, not idempotency suppression.
|
||||||
|
func (e *Engine) execute(ctx context.Context, act Action) error {
|
||||||
|
opID := e.nextOpID(act)
|
||||||
|
e.append(JournalEntry{OpID: opID, VMID: act.VMID, Kind: string(act.Kind),
|
||||||
|
Params: act.Params, State: OpStarted, At: time.Now().UTC()})
|
||||||
|
|
||||||
|
var upid string
|
||||||
|
var err error
|
||||||
|
switch act.Kind {
|
||||||
|
case ActionStart:
|
||||||
|
upid, err = e.api.Start(ctx, act.VMID)
|
||||||
|
case ActionStop:
|
||||||
|
upid, err = e.api.Stop(ctx, act.VMID)
|
||||||
|
case ActionSetConfig:
|
||||||
|
upid, err = e.api.SetConfig(ctx, act.VMID, act.Params)
|
||||||
|
default:
|
||||||
|
err = fmt.Errorf("reconcile: unknown action kind %q", act.Kind)
|
||||||
|
}
|
||||||
|
if err != nil {
|
||||||
|
e.append(JournalEntry{OpID: opID, VMID: act.VMID, Kind: string(act.Kind),
|
||||||
|
State: OpFailed, At: time.Now().UTC()})
|
||||||
|
return fmt.Errorf("reconcile: %s vmid %d: %w", act.Kind, act.VMID, err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Record the task id (if any) before awaiting it, so a crash mid-wait is
|
||||||
|
// detectable on restart and the task status can be re-checked.
|
||||||
|
e.append(JournalEntry{OpID: opID, VMID: act.VMID, Kind: string(act.Kind),
|
||||||
|
UPID: upid, State: OpTaskRunning, At: time.Now().UTC()})
|
||||||
|
|
||||||
|
if upid != "" {
|
||||||
|
st, err := e.api.WaitTask(ctx, upid, proxmox.WaitOptions{})
|
||||||
|
if err != nil { // WaitTask already errors on a non-OK exitstatus
|
||||||
|
e.append(JournalEntry{OpID: opID, VMID: act.VMID, Kind: string(act.Kind),
|
||||||
|
UPID: upid, State: OpFailed, At: time.Now().UTC()})
|
||||||
|
return fmt.Errorf("reconcile: %s vmid %d: %w", act.Kind, act.VMID, err)
|
||||||
|
}
|
||||||
|
if st.ExitStatus != "OK" { // defensive — WaitTask should have errored
|
||||||
|
e.append(JournalEntry{OpID: opID, VMID: act.VMID, Kind: string(act.Kind),
|
||||||
|
UPID: upid, State: OpFailed, At: time.Now().UTC()})
|
||||||
|
return fmt.Errorf("reconcile: %s vmid %d: exitstatus=%s", act.Kind, act.VMID, st.ExitStatus)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// upid == "" is the synchronous path (slice-4 proven for SetConfig description).
|
||||||
|
|
||||||
|
e.append(JournalEntry{OpID: opID, VMID: act.VMID, Kind: string(act.Kind),
|
||||||
|
UPID: upid, State: OpSucceeded, At: time.Now().UTC()})
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// readActual reads observed state from Proxmox: run-state from the list, sizing +
|
||||||
|
// description from per-guest config. A GuestConfig read failure keeps the run-state
|
||||||
|
// (SpecKnown=false) rather than dropping the guest — matching the collector.
|
||||||
|
func (e *Engine) readActual(ctx context.Context) (ActualState, error) {
|
||||||
|
lxc, err := e.api.ListLXC(ctx)
|
||||||
|
if err != nil {
|
||||||
|
return ActualState{}, err
|
||||||
|
}
|
||||||
|
guests := make(map[int]ActualGuest, len(lxc))
|
||||||
|
for _, g := range lxc {
|
||||||
|
a := ActualGuest{VMID: g.VMID, Run: normRun(g.Status)}
|
||||||
|
cfg, err := e.api.GuestConfig(ctx, g.VMID)
|
||||||
|
if err != nil {
|
||||||
|
e.logger.Warn("reconcile: GuestConfig failed; spec unknown (run-state kept)",
|
||||||
|
"vmid", g.VMID, "err", err)
|
||||||
|
} else {
|
||||||
|
a.SpecKnown = true
|
||||||
|
a.Cores = cfg.Cores
|
||||||
|
a.MemoryMiB = cfg.Memory
|
||||||
|
a.Description = guestDescription(cfg)
|
||||||
|
}
|
||||||
|
guests[g.VMID] = a
|
||||||
|
}
|
||||||
|
return ActualState{Guests: guests}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Run reconciles once immediately, then on every interval tick until ctx is done. A
|
||||||
|
// per-pass failure is logged and the loop continues (drift is corrected next tick).
|
||||||
|
// At slice 4 (EmptyProvider) every pass is a logged no-op.
|
||||||
|
func (e *Engine) Run(ctx context.Context, interval time.Duration) error {
|
||||||
|
e.reconcileOnce(ctx)
|
||||||
|
t := time.NewTicker(interval)
|
||||||
|
defer t.Stop()
|
||||||
|
for {
|
||||||
|
select {
|
||||||
|
case <-ctx.Done():
|
||||||
|
return ctx.Err()
|
||||||
|
case <-t.C:
|
||||||
|
e.reconcileOnce(ctx)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (e *Engine) reconcileOnce(ctx context.Context) {
|
||||||
|
res, err := e.Reconcile(ctx)
|
||||||
|
if err != nil {
|
||||||
|
e.logger.Error("reconcile: pass failed", "err", err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if res.Planned > 0 {
|
||||||
|
e.logger.Info("reconcile: pass complete",
|
||||||
|
"planned", res.Planned, "executed", res.Executed, "failed", res.Failed)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// nextOpID builds a per-attempt unique op id (kind-vmid-seq) for journal correlation.
|
||||||
|
func (e *Engine) nextOpID(act Action) string {
|
||||||
|
n := atomic.AddUint64(&e.opSeq, 1)
|
||||||
|
return string(act.Kind) + "-" + strconv.Itoa(act.VMID) + "-" + strconv.FormatUint(n, 10)
|
||||||
|
}
|
||||||
|
|
||||||
|
// append journals a lifecycle record, logging (never failing the op on) a journal I/O
|
||||||
|
// error — the Proxmox op already happened; a missing journal line is a crash-recovery
|
||||||
|
// degradation, not a reason to abort.
|
||||||
|
func (e *Engine) append(rec JournalEntry) {
|
||||||
|
if e.journal == nil {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if err := e.journal.Append(rec); err != nil {
|
||||||
|
e.logger.Error("reconcile: journal append failed", "op_id", rec.OpID, "state", rec.State, "err", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// discard is an io.Writer sink for the default no-op logger.
|
||||||
|
type discard struct{}
|
||||||
|
|
||||||
|
func (discard) Write(p []byte) (int, error) { return len(p), nil }
|
||||||
@@ -0,0 +1,212 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"errors"
|
||||||
|
"path/filepath"
|
||||||
|
"sync"
|
||||||
|
"testing"
|
||||||
|
|
||||||
|
"gitea.dooplex.hu/admin/felhom-agent/internal/hub"
|
||||||
|
"gitea.dooplex.hu/admin/felhom-agent/internal/proxmox"
|
||||||
|
)
|
||||||
|
|
||||||
|
// fakeAPI is a configurable GuestAPI for engine tests: it records mutating calls and
|
||||||
|
// returns canned UPIDs (""=synchronous, non-empty=async) and WaitTask verdicts.
|
||||||
|
type fakeAPI struct {
|
||||||
|
mu sync.Mutex
|
||||||
|
lxc []proxmox.Guest
|
||||||
|
cfg map[int]proxmox.GuestConfig
|
||||||
|
|
||||||
|
startUPID, stopUPID, setUPID string
|
||||||
|
startErr, stopErr, setErr error
|
||||||
|
// waitFunc maps a UPID to a (status, err); default = OK. Mirrors the real client,
|
||||||
|
// which errors on a non-OK exitstatus.
|
||||||
|
waitFunc func(upid string) (proxmox.TaskStatus, error)
|
||||||
|
|
||||||
|
starts []int
|
||||||
|
stops []int
|
||||||
|
sets []setCall
|
||||||
|
waits []string
|
||||||
|
listErr error
|
||||||
|
}
|
||||||
|
|
||||||
|
type setCall struct {
|
||||||
|
vmid int
|
||||||
|
params map[string]string
|
||||||
|
}
|
||||||
|
|
||||||
|
func (f *fakeAPI) ListLXC(context.Context) ([]proxmox.Guest, error) {
|
||||||
|
if f.listErr != nil {
|
||||||
|
return nil, f.listErr
|
||||||
|
}
|
||||||
|
return f.lxc, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (f *fakeAPI) GuestConfig(_ context.Context, vmid int) (proxmox.GuestConfig, error) {
|
||||||
|
c, ok := f.cfg[vmid]
|
||||||
|
if !ok {
|
||||||
|
return proxmox.GuestConfig{}, errors.New("no config")
|
||||||
|
}
|
||||||
|
return c, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (f *fakeAPI) Start(_ context.Context, vmid int) (string, error) {
|
||||||
|
f.mu.Lock()
|
||||||
|
f.starts = append(f.starts, vmid)
|
||||||
|
f.mu.Unlock()
|
||||||
|
return f.startUPID, f.startErr
|
||||||
|
}
|
||||||
|
|
||||||
|
func (f *fakeAPI) Stop(_ context.Context, vmid int) (string, error) {
|
||||||
|
f.mu.Lock()
|
||||||
|
f.stops = append(f.stops, vmid)
|
||||||
|
f.mu.Unlock()
|
||||||
|
return f.stopUPID, f.stopErr
|
||||||
|
}
|
||||||
|
|
||||||
|
func (f *fakeAPI) SetConfig(_ context.Context, vmid int, params map[string]string) (string, error) {
|
||||||
|
f.mu.Lock()
|
||||||
|
f.sets = append(f.sets, setCall{vmid, params})
|
||||||
|
f.mu.Unlock()
|
||||||
|
return f.setUPID, f.setErr
|
||||||
|
}
|
||||||
|
|
||||||
|
func (f *fakeAPI) WaitTask(_ context.Context, upid string, _ proxmox.WaitOptions) (proxmox.TaskStatus, error) {
|
||||||
|
f.mu.Lock()
|
||||||
|
f.waits = append(f.waits, upid)
|
||||||
|
f.mu.Unlock()
|
||||||
|
if f.waitFunc != nil {
|
||||||
|
return f.waitFunc(upid)
|
||||||
|
}
|
||||||
|
return proxmox.TaskStatus{Status: "stopped", ExitStatus: "OK"}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func newEngine(t *testing.T, api GuestAPI, provider DesiredProvider) (*Engine, *Journal, *Queue) {
|
||||||
|
t.Helper()
|
||||||
|
jp := filepath.Join(t.TempDir(), "journal.log")
|
||||||
|
j, err := OpenJournal(jp)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("OpenJournal: %v", err)
|
||||||
|
}
|
||||||
|
t.Cleanup(func() { j.Close() })
|
||||||
|
q := NewQueue()
|
||||||
|
t.Cleanup(q.Close)
|
||||||
|
e := NewEngine(EngineOptions{API: api, Queue: q, Journal: j, Provider: provider})
|
||||||
|
return e, j, q
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestEngine_EmptyProviderNoMutations(t *testing.T) {
|
||||||
|
api := &fakeAPI{
|
||||||
|
lxc: []proxmox.Guest{{VMID: 100, Status: "running"}},
|
||||||
|
cfg: map[int]proxmox.GuestConfig{100: {Cores: 2}},
|
||||||
|
}
|
||||||
|
e, _, _ := newEngine(t, api, EmptyProvider{})
|
||||||
|
res, err := e.Reconcile(context.Background())
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("Reconcile: %v", err)
|
||||||
|
}
|
||||||
|
if res.Planned != 0 || res.Executed != 0 {
|
||||||
|
t.Errorf("EmptyProvider should plan nothing, got %+v", res)
|
||||||
|
}
|
||||||
|
if len(api.starts)+len(api.stops)+len(api.sets) != 0 {
|
||||||
|
t.Errorf("EmptyProvider mutated Proxmox: starts=%v stops=%v sets=%v", api.starts, api.stops, api.sets)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestEngine_AsyncStartWaitsTask(t *testing.T) {
|
||||||
|
api := &fakeAPI{
|
||||||
|
lxc: []proxmox.Guest{{VMID: 100, Status: "stopped"}},
|
||||||
|
cfg: map[int]proxmox.GuestConfig{100: {Cores: 2}},
|
||||||
|
startUPID: "UPID:demo:start:100:",
|
||||||
|
}
|
||||||
|
e, j, _ := newEngine(t, api, StaticProvider{State: desired(DesiredGuest{VMID: 100, Run: RunRunning})})
|
||||||
|
res, err := e.Reconcile(context.Background())
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("Reconcile: %v", err)
|
||||||
|
}
|
||||||
|
if res.Executed != 1 || res.Failed != 0 {
|
||||||
|
t.Fatalf("want 1 executed, got %+v", res)
|
||||||
|
}
|
||||||
|
if len(api.starts) != 1 || api.starts[0] != 100 {
|
||||||
|
t.Errorf("expected Start(100), got %v", api.starts)
|
||||||
|
}
|
||||||
|
if len(api.waits) != 1 {
|
||||||
|
t.Errorf("async op must WaitTask, got waits=%v", api.waits)
|
||||||
|
}
|
||||||
|
if len(j.InFlight()) != 0 {
|
||||||
|
t.Errorf("no ops should be in-flight after success: %+v", j.InFlight())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestEngine_SynchronousSetConfigNoWait(t *testing.T) {
|
||||||
|
// Empty UPID = PVE applied synchronously (slice-4 proven for description). Must be
|
||||||
|
// treated as success WITHOUT a WaitTask call.
|
||||||
|
api := &fakeAPI{
|
||||||
|
lxc: []proxmox.Guest{{VMID: 100, Status: "stopped"}},
|
||||||
|
cfg: map[int]proxmox.GuestConfig{100: {Cores: 2}},
|
||||||
|
setUPID: "", // synchronous
|
||||||
|
}
|
||||||
|
e, _, _ := newEngine(t, api, StaticProvider{State: desired(
|
||||||
|
DesiredGuest{VMID: 100, Spec: &hub.GuestSpec{Cores: 4, MemoryBytes: mib(2048)}})})
|
||||||
|
res, err := e.Reconcile(context.Background())
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("Reconcile: %v", err)
|
||||||
|
}
|
||||||
|
if res.Executed != 1 {
|
||||||
|
t.Fatalf("want 1 executed, got %+v", res)
|
||||||
|
}
|
||||||
|
if len(api.sets) != 1 || api.sets[0].params["cores"] != "4" {
|
||||||
|
t.Errorf("expected SetConfig cores=4, got %v", api.sets)
|
||||||
|
}
|
||||||
|
if len(api.waits) != 0 {
|
||||||
|
t.Errorf("synchronous op must NOT WaitTask, got waits=%v", api.waits)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestEngine_WaitTaskFailureCountsFailed(t *testing.T) {
|
||||||
|
api := &fakeAPI{
|
||||||
|
lxc: []proxmox.Guest{{VMID: 100, Status: "stopped"}},
|
||||||
|
cfg: map[int]proxmox.GuestConfig{100: {Cores: 2}},
|
||||||
|
startUPID: "UPID:demo:start:100:",
|
||||||
|
waitFunc: func(string) (proxmox.TaskStatus, error) {
|
||||||
|
return proxmox.TaskStatus{Status: "stopped", ExitStatus: "got 403"}, errors.New("task failed: got 403")
|
||||||
|
},
|
||||||
|
}
|
||||||
|
e, j, _ := newEngine(t, api, StaticProvider{State: desired(DesiredGuest{VMID: 100, Run: RunRunning})})
|
||||||
|
res, err := e.Reconcile(context.Background())
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("Reconcile (pass): %v", err)
|
||||||
|
}
|
||||||
|
if res.Failed != 1 || res.Executed != 0 {
|
||||||
|
t.Fatalf("want 1 failed, got %+v", res)
|
||||||
|
}
|
||||||
|
// The failed op is journaled terminal (failed), not left in-flight.
|
||||||
|
if len(j.InFlight()) != 0 {
|
||||||
|
t.Errorf("failed op should be terminal, in-flight=%+v", j.InFlight())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestEngine_PostErrorCountsFailed(t *testing.T) {
|
||||||
|
api := &fakeAPI{
|
||||||
|
lxc: []proxmox.Guest{{VMID: 100, Status: "stopped"}},
|
||||||
|
cfg: map[int]proxmox.GuestConfig{100: {Cores: 2}},
|
||||||
|
startErr: errors.New("connection refused"),
|
||||||
|
}
|
||||||
|
e, _, _ := newEngine(t, api, StaticProvider{State: desired(DesiredGuest{VMID: 100, Run: RunRunning})})
|
||||||
|
res, _ := e.Reconcile(context.Background())
|
||||||
|
if res.Failed != 1 {
|
||||||
|
t.Fatalf("want 1 failed on POST error, got %+v", res)
|
||||||
|
}
|
||||||
|
if len(api.waits) != 0 {
|
||||||
|
t.Errorf("POST error must not reach WaitTask, got %v", api.waits)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestEngine_ListErrorIsPassFailure(t *testing.T) {
|
||||||
|
api := &fakeAPI{listErr: errors.New("api down")}
|
||||||
|
e, _, _ := newEngine(t, api, StaticProvider{State: desired(DesiredGuest{VMID: 100, Run: RunRunning})})
|
||||||
|
if _, err := e.Reconcile(context.Background()); err == nil {
|
||||||
|
t.Error("expected a pass-level error when actual state can't be read")
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,189 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"encoding/json"
|
||||||
|
"errors"
|
||||||
|
"io/fs"
|
||||||
|
"os"
|
||||||
|
"path/filepath"
|
||||||
|
"sync"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// OpState is the lifecycle state of a journaled operation.
|
||||||
|
type OpState string
|
||||||
|
|
||||||
|
const (
|
||||||
|
// OpStarted: the op was planned and dispatch began (no Proxmox task yet).
|
||||||
|
OpStarted OpState = "started"
|
||||||
|
// OpTaskRunning: the Proxmox POST returned a UPID we are/were awaiting. Recorded
|
||||||
|
// so a crash mid-op is detected on restart and the task status re-checked.
|
||||||
|
OpTaskRunning OpState = "task_running"
|
||||||
|
// OpSucceeded: the op completed (task exitstatus OK, or a clean synchronous apply).
|
||||||
|
OpSucceeded OpState = "succeeded"
|
||||||
|
// OpFailed: the op errored (POST error, task non-OK, or WaitTask error).
|
||||||
|
OpFailed OpState = "failed"
|
||||||
|
)
|
||||||
|
|
||||||
|
// terminal reports whether a state is final (no further records expected).
|
||||||
|
func (s OpState) terminal() bool { return s == OpSucceeded || s == OpFailed }
|
||||||
|
|
||||||
|
// JournalEntry is one durable record. Multiple records share an OpID across an op's
|
||||||
|
// lifecycle (started → task_running → succeeded/failed); the latest wins in the index.
|
||||||
|
//
|
||||||
|
// IdempKey is the one-shot idempotency key: set on ops that must run AT MOST ONCE
|
||||||
|
// across retries/restarts (signed jobs, slice B). Reconcile actions leave it empty —
|
||||||
|
// reconcile is convergent and SHOULD re-run on real drift, so it is never suppressed
|
||||||
|
// by the idempotency set. A non-empty IdempKey that reaches OpSucceeded marks the key
|
||||||
|
// applied (AlreadyApplied true forever after, surviving restarts).
|
||||||
|
type JournalEntry struct {
|
||||||
|
OpID string `json:"op_id"`
|
||||||
|
VMID int `json:"vmid"`
|
||||||
|
Kind string `json:"kind"`
|
||||||
|
Params map[string]string `json:"params,omitempty"`
|
||||||
|
UPID string `json:"upid,omitempty"`
|
||||||
|
State OpState `json:"state"`
|
||||||
|
IdempKey string `json:"idemp_key,omitempty"`
|
||||||
|
At time.Time `json:"at"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// Journal is the durable operation log + idempotency store. It mirrors
|
||||||
|
// authz.FileNonceStore: an fsync'd append-only JSONL with an in-memory index. A
|
||||||
|
// record is on disk AND fsync'd before Append returns, so a crash never loses a
|
||||||
|
// committed lifecycle transition.
|
||||||
|
//
|
||||||
|
// Phase-A scope: it records single-task ops (Start/Stop/SetConfig) for crash
|
||||||
|
// detection and provides the idempotency-key dedupe. Full multi-step compensating
|
||||||
|
// rollback (provision/restore, slices 6/7) reuses this structure with richer replay.
|
||||||
|
type Journal struct {
|
||||||
|
mu sync.Mutex
|
||||||
|
path string
|
||||||
|
f *os.File
|
||||||
|
latest map[string]JournalEntry // op_id -> latest record
|
||||||
|
applied map[string]struct{} // idemp keys that reached OpSucceeded
|
||||||
|
}
|
||||||
|
|
||||||
|
// OpenJournal opens (or creates) the journal at path, replaying any existing log into
|
||||||
|
// the index. The parent dir must exist (the daemon ensures it, sibling to the nonce
|
||||||
|
// store).
|
||||||
|
func OpenJournal(path string) (*Journal, error) {
|
||||||
|
j := &Journal{
|
||||||
|
path: path,
|
||||||
|
latest: make(map[string]JournalEntry),
|
||||||
|
applied: make(map[string]struct{}),
|
||||||
|
}
|
||||||
|
if err := j.load(); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
j.f = f
|
||||||
|
syncJournalDir(filepath.Dir(path))
|
||||||
|
return j, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (j *Journal) load() error {
|
||||||
|
b, err := os.ReadFile(j.path)
|
||||||
|
if errors.Is(err, fs.ErrNotExist) {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
for _, line := range bytes.Split(b, []byte("\n")) {
|
||||||
|
line = bytes.TrimSpace(line)
|
||||||
|
if len(line) == 0 {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
var e JournalEntry
|
||||||
|
if json.Unmarshal(line, &e) != nil {
|
||||||
|
continue // skip a torn trailing line from a crash mid-append
|
||||||
|
}
|
||||||
|
j.index(e)
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// index folds one entry into the in-memory state (latest-by-op + applied set).
|
||||||
|
func (j *Journal) index(e JournalEntry) {
|
||||||
|
j.latest[e.OpID] = e
|
||||||
|
if e.State == OpSucceeded && e.IdempKey != "" {
|
||||||
|
j.applied[e.IdempKey] = struct{}{}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Append durably writes one lifecycle record (fsync before returning) and updates the
|
||||||
|
// index. Callers build entries via the engine's lifecycle (Begin/RecordTask/Complete
|
||||||
|
// helpers below).
|
||||||
|
func (j *Journal) Append(e JournalEntry) error {
|
||||||
|
j.mu.Lock()
|
||||||
|
defer j.mu.Unlock()
|
||||||
|
rec, err := json.Marshal(e)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
rec = append(rec, '\n')
|
||||||
|
if _, err := j.f.Write(rec); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if err := j.f.Sync(); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
j.index(e)
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Latest returns the most recent record for an op id.
|
||||||
|
func (j *Journal) Latest(opID string) (JournalEntry, bool) {
|
||||||
|
j.mu.Lock()
|
||||||
|
defer j.mu.Unlock()
|
||||||
|
e, ok := j.latest[opID]
|
||||||
|
return e, ok
|
||||||
|
}
|
||||||
|
|
||||||
|
// InFlight returns ops whose latest state is non-terminal — i.e. started or
|
||||||
|
// task_running with no succeeded/failed record. On restart the daemon re-checks each
|
||||||
|
// (resume-or-rollback). Order is unspecified.
|
||||||
|
func (j *Journal) InFlight() []JournalEntry {
|
||||||
|
j.mu.Lock()
|
||||||
|
defer j.mu.Unlock()
|
||||||
|
var out []JournalEntry
|
||||||
|
for _, e := range j.latest {
|
||||||
|
if !e.State.terminal() {
|
||||||
|
out = append(out, e)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return out
|
||||||
|
}
|
||||||
|
|
||||||
|
// AlreadyApplied reports whether a one-shot op with this idempotency key has already
|
||||||
|
// succeeded (survives restarts via the replayed log). Empty key is never "applied".
|
||||||
|
func (j *Journal) AlreadyApplied(idempKey string) bool {
|
||||||
|
if idempKey == "" {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
j.mu.Lock()
|
||||||
|
defer j.mu.Unlock()
|
||||||
|
_, ok := j.applied[idempKey]
|
||||||
|
return ok
|
||||||
|
}
|
||||||
|
|
||||||
|
// Close releases the file handle.
|
||||||
|
func (j *Journal) Close() error {
|
||||||
|
j.mu.Lock()
|
||||||
|
defer j.mu.Unlock()
|
||||||
|
if j.f != nil {
|
||||||
|
return j.f.Close()
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func syncJournalDir(dir string) {
|
||||||
|
if d, err := os.Open(dir); err == nil {
|
||||||
|
_ = d.Sync()
|
||||||
|
_ = d.Close()
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,154 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"os"
|
||||||
|
"path/filepath"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
func appendRaw(t *testing.T, path, line string) {
|
||||||
|
t.Helper()
|
||||||
|
f, err := os.OpenFile(path, os.O_WRONLY|os.O_APPEND, 0o600)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("open for raw append: %v", err)
|
||||||
|
}
|
||||||
|
defer f.Close()
|
||||||
|
if _, err := f.WriteString(line + "\n"); err != nil {
|
||||||
|
t.Fatalf("raw append: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func reopen(t *testing.T, j *Journal, path string) *Journal {
|
||||||
|
t.Helper()
|
||||||
|
if err := j.Close(); err != nil {
|
||||||
|
t.Fatalf("close: %v", err)
|
||||||
|
}
|
||||||
|
nj, err := OpenJournal(path)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("reopen: %v", err)
|
||||||
|
}
|
||||||
|
t.Cleanup(func() { nj.Close() })
|
||||||
|
return nj
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestJournal_LifecycleLatestWins(t *testing.T) {
|
||||||
|
path := filepath.Join(t.TempDir(), "journal.log")
|
||||||
|
j, err := OpenJournal(path)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("open: %v", err)
|
||||||
|
}
|
||||||
|
t.Cleanup(func() { j.Close() })
|
||||||
|
|
||||||
|
now := time.Now().UTC()
|
||||||
|
for _, e := range []JournalEntry{
|
||||||
|
{OpID: "op1", VMID: 100, Kind: "start", State: OpStarted, At: now},
|
||||||
|
{OpID: "op1", VMID: 100, Kind: "start", UPID: "UPID:x:", State: OpTaskRunning, At: now},
|
||||||
|
{OpID: "op1", VMID: 100, Kind: "start", UPID: "UPID:x:", State: OpSucceeded, At: now},
|
||||||
|
} {
|
||||||
|
if err := j.Append(e); err != nil {
|
||||||
|
t.Fatalf("append: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
got, ok := j.Latest("op1")
|
||||||
|
if !ok || got.State != OpSucceeded {
|
||||||
|
t.Fatalf("Latest(op1) = %+v ok=%v, want succeeded", got, ok)
|
||||||
|
}
|
||||||
|
if len(j.InFlight()) != 0 {
|
||||||
|
t.Errorf("a succeeded op must not be in-flight: %+v", j.InFlight())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestJournal_InFlightSurvivesRestart(t *testing.T) {
|
||||||
|
path := filepath.Join(t.TempDir(), "journal.log")
|
||||||
|
j, err := OpenJournal(path)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("open: %v", err)
|
||||||
|
}
|
||||||
|
now := time.Now().UTC()
|
||||||
|
// op started + got a task id, but NO terminal record — simulates a crash mid-op.
|
||||||
|
mustAppend(t, j, JournalEntry{OpID: "op9", VMID: 100, Kind: "set_config", UPID: "UPID:crash:", State: OpTaskRunning, At: now})
|
||||||
|
|
||||||
|
j2 := reopen(t, j, path)
|
||||||
|
inflight := j2.InFlight()
|
||||||
|
if len(inflight) != 1 || inflight[0].OpID != "op9" || inflight[0].UPID != "UPID:crash:" {
|
||||||
|
t.Fatalf("crash-mid-op should replay as in-flight with its task id, got %+v", inflight)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestJournal_IdempotencyDedupeAcrossRestart(t *testing.T) {
|
||||||
|
path := filepath.Join(t.TempDir(), "journal.log")
|
||||||
|
j, err := OpenJournal(path)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("open: %v", err)
|
||||||
|
}
|
||||||
|
now := time.Now().UTC()
|
||||||
|
|
||||||
|
const key = "job-abc-123"
|
||||||
|
if j.AlreadyApplied(key) {
|
||||||
|
t.Fatal("key should not be applied before any record")
|
||||||
|
}
|
||||||
|
// A one-shot op succeeds carrying an idempotency key.
|
||||||
|
mustAppend(t, j, JournalEntry{OpID: "op1", VMID: 100, Kind: "restore", IdempKey: key, State: OpStarted, At: now})
|
||||||
|
mustAppend(t, j, JournalEntry{OpID: "op1", VMID: 100, Kind: "restore", IdempKey: key, State: OpSucceeded, At: now})
|
||||||
|
if !j.AlreadyApplied(key) {
|
||||||
|
t.Fatal("key should be applied after success")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Survives a restart (replayed from the log) — a redelivered job must not re-run.
|
||||||
|
j2 := reopen(t, j, path)
|
||||||
|
if !j2.AlreadyApplied(key) {
|
||||||
|
t.Error("idempotency key must survive an agent restart")
|
||||||
|
}
|
||||||
|
// Empty key is never 'applied'.
|
||||||
|
if j2.AlreadyApplied("") {
|
||||||
|
t.Error("empty idempotency key must never be considered applied")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestJournal_FailedKeyNotApplied(t *testing.T) {
|
||||||
|
path := filepath.Join(t.TempDir(), "journal.log")
|
||||||
|
j, err := OpenJournal(path)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("open: %v", err)
|
||||||
|
}
|
||||||
|
t.Cleanup(func() { j.Close() })
|
||||||
|
now := time.Now().UTC()
|
||||||
|
const key = "job-fail"
|
||||||
|
mustAppend(t, j, JournalEntry{OpID: "opF", VMID: 1, Kind: "restore", IdempKey: key, State: OpStarted, At: now})
|
||||||
|
mustAppend(t, j, JournalEntry{OpID: "opF", VMID: 1, Kind: "restore", IdempKey: key, State: OpFailed, At: now})
|
||||||
|
if j.AlreadyApplied(key) {
|
||||||
|
t.Error("a FAILED one-shot op must not mark its key applied (it may be retried)")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestJournal_SkipsTornTrailingLine(t *testing.T) {
|
||||||
|
path := filepath.Join(t.TempDir(), "journal.log")
|
||||||
|
j, err := OpenJournal(path)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("open: %v", err)
|
||||||
|
}
|
||||||
|
mustAppend(t, j, JournalEntry{OpID: "ok", VMID: 1, Kind: "start", State: OpSucceeded, At: time.Now().UTC()})
|
||||||
|
j.Close()
|
||||||
|
// Append a torn (partial) JSON line as a crash would leave.
|
||||||
|
appendRaw(t, path, `{"op_id":"torn","state":`)
|
||||||
|
|
||||||
|
j2, err := OpenJournal(path)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("reopen with torn line: %v", err)
|
||||||
|
}
|
||||||
|
t.Cleanup(func() { j2.Close() })
|
||||||
|
if _, ok := j2.Latest("ok"); !ok {
|
||||||
|
t.Error("the good record before the torn line must still load")
|
||||||
|
}
|
||||||
|
if _, ok := j2.Latest("torn"); ok {
|
||||||
|
t.Error("the torn line must be skipped")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func mustAppend(t *testing.T, j *Journal, e JournalEntry) {
|
||||||
|
t.Helper()
|
||||||
|
if err := j.Append(e); err != nil {
|
||||||
|
t.Fatalf("append: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,47 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import "strings"
|
||||||
|
|
||||||
|
// The normalization layer keeps Proxmox round-trip quirks from reading as drift.
|
||||||
|
// Reconcile compares NORMALIZED desired-vs-actual so a value the agent wrote and then
|
||||||
|
// read back equal does not look like a change. `description`'s trailing newline (the
|
||||||
|
// first proven case, slice-4 pre-check) is the seed; the structure is a per-field
|
||||||
|
// registry so other quirks (omitted-default fields, boolean coercions, list ordering)
|
||||||
|
// slot in as they are discovered — each is a Normalizer mapped to a field name.
|
||||||
|
|
||||||
|
// Normalizer maps a raw field value to its canonical comparison form. It must be
|
||||||
|
// idempotent: Normalizer(Normalizer(x)) == Normalizer(x).
|
||||||
|
type Normalizer func(string) string
|
||||||
|
|
||||||
|
// FieldNormalizers maps a Proxmox config field name to its Normalizer. A field with
|
||||||
|
// no entry compares verbatim (identity).
|
||||||
|
type FieldNormalizers map[string]Normalizer
|
||||||
|
|
||||||
|
// DefaultNormalizers is the production set. Today only `description` needs one (PVE
|
||||||
|
// appends a trailing newline on read). New quirks are added here as they are found.
|
||||||
|
func DefaultNormalizers() FieldNormalizers {
|
||||||
|
return FieldNormalizers{
|
||||||
|
"description": NormDescription,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Norm returns the canonical comparison form of value for field, applying the
|
||||||
|
// field's Normalizer if one is registered, else the value unchanged.
|
||||||
|
func (n FieldNormalizers) Norm(field, value string) string {
|
||||||
|
if f, ok := n[field]; ok && f != nil {
|
||||||
|
return f(value)
|
||||||
|
}
|
||||||
|
return value
|
||||||
|
}
|
||||||
|
|
||||||
|
// Equal reports whether two raw values for field are equal AFTER normalization.
|
||||||
|
func (n FieldNormalizers) Equal(field, a, b string) bool {
|
||||||
|
return n.Norm(field, a) == n.Norm(field, b)
|
||||||
|
}
|
||||||
|
|
||||||
|
// NormDescription strips the trailing newline(s) PVE appends to the LXC `description`
|
||||||
|
// field on read, so a written value round-trips equal. (Proven slice-4 pre-check:
|
||||||
|
// PVE stores `description` with a trailing "\n"; a verbatim compare always mismatches.)
|
||||||
|
// Exported so the --selftest=task description round-trip uses the SAME helper the
|
||||||
|
// reconciler does — one source of truth for the quirk.
|
||||||
|
func NormDescription(s string) string { return strings.TrimRight(s, "\n") }
|
||||||
@@ -0,0 +1,87 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"sort"
|
||||||
|
"strings"
|
||||||
|
"testing"
|
||||||
|
)
|
||||||
|
|
||||||
|
func TestNormDescription_TrimsTrailingNewlines(t *testing.T) {
|
||||||
|
cases := map[string]string{
|
||||||
|
"felhom-selftest 2026": "felhom-selftest 2026",
|
||||||
|
"felhom-selftest 2026\n": "felhom-selftest 2026", // PVE's single trailing \n
|
||||||
|
"x\n\n": "x", // defensive: multiple
|
||||||
|
"": "",
|
||||||
|
"\n": "",
|
||||||
|
"keep trailing spaces ": "keep trailing spaces ", // only newlines stripped
|
||||||
|
}
|
||||||
|
for in, want := range cases {
|
||||||
|
if got := NormDescription(in); got != want {
|
||||||
|
t.Errorf("NormDescription(%q) = %q, want %q", in, got, want)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Idempotent.
|
||||||
|
if NormDescription(NormDescription("a\n")) != NormDescription("a\n") {
|
||||||
|
t.Error("NormDescription not idempotent")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestFieldNormalizers_DescriptionRoundTrip(t *testing.T) {
|
||||||
|
n := DefaultNormalizers()
|
||||||
|
// A value the agent wrote ("...Z") and read back with PVE's newline ("...Z\n")
|
||||||
|
// must compare EQUAL — the whole point of the layer.
|
||||||
|
if !n.Equal("description", "felhom-op", "felhom-op\n") {
|
||||||
|
t.Error("description round-trip should normalize equal")
|
||||||
|
}
|
||||||
|
if n.Equal("description", "felhom-op", "different") {
|
||||||
|
t.Error("genuinely different descriptions must not be equal")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestFieldNormalizers_UnknownFieldIsIdentity(t *testing.T) {
|
||||||
|
n := DefaultNormalizers()
|
||||||
|
if n.Norm("cores", "2\n") != "2\n" {
|
||||||
|
t.Error("a field with no normalizer must compare verbatim")
|
||||||
|
}
|
||||||
|
if n.Equal("cores", "2", "2\n") {
|
||||||
|
t.Error("unknown field must not normalize away differences")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestFieldNormalizers_ExtensibilitySeam proves the structure accepts new quirks
|
||||||
|
// (boolean coercion, list ordering) the way it will as they're discovered — the task's
|
||||||
|
// "structure it so other normalizers slot in." These are synthetic, not production.
|
||||||
|
func TestFieldNormalizers_ExtensibilitySeam(t *testing.T) {
|
||||||
|
booleanCoerce := func(s string) string {
|
||||||
|
switch strings.ToLower(strings.TrimSpace(s)) {
|
||||||
|
case "1", "true", "on", "yes":
|
||||||
|
return "1"
|
||||||
|
default:
|
||||||
|
return "0"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
sortCSV := func(s string) string {
|
||||||
|
parts := strings.Split(s, ",")
|
||||||
|
sort.Strings(parts)
|
||||||
|
return strings.Join(parts, ",")
|
||||||
|
}
|
||||||
|
n := FieldNormalizers{
|
||||||
|
"description": NormDescription,
|
||||||
|
"onboot": booleanCoerce,
|
||||||
|
"tags": sortCSV,
|
||||||
|
}
|
||||||
|
|
||||||
|
if !n.Equal("onboot", "true", "1") || !n.Equal("onboot", "on", "1") {
|
||||||
|
t.Error("boolean coercion normalizer should equate truthy forms")
|
||||||
|
}
|
||||||
|
if n.Equal("onboot", "true", "0") {
|
||||||
|
t.Error("boolean coercion must still distinguish true from false")
|
||||||
|
}
|
||||||
|
if !n.Equal("tags", "b,a,c", "a,b,c") {
|
||||||
|
t.Error("list-ordering normalizer should equate reordered lists")
|
||||||
|
}
|
||||||
|
// The built-in still works alongside the synthetic ones.
|
||||||
|
if !n.Equal("description", "d", "d\n") {
|
||||||
|
t.Error("description normalizer should coexist with added ones")
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,129 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"fmt"
|
||||||
|
"sort"
|
||||||
|
"strconv"
|
||||||
|
)
|
||||||
|
|
||||||
|
// ActionKind is the benign-on-existing-guest action set wired in slice 4. The
|
||||||
|
// destructive set (guest destroy, storage wipe, restore-overwrite, decommission) is
|
||||||
|
// classified and gated in Phase B but not represented here — nothing serves
|
||||||
|
// destructive deltas until slice 10.
|
||||||
|
type ActionKind string
|
||||||
|
|
||||||
|
const (
|
||||||
|
// ActionStart powers on a stopped guest (proxmox VM.PowerMgmt).
|
||||||
|
ActionStart ActionKind = "start"
|
||||||
|
// ActionStop powers off a running guest (proxmox VM.PowerMgmt).
|
||||||
|
ActionStop ActionKind = "stop"
|
||||||
|
// ActionSetConfig applies benign config changes (cores/memory/description) in one
|
||||||
|
// PUT (proxmox VM.Config.*). May return synchronously (empty UPID) — slice-4 proven.
|
||||||
|
ActionSetConfig ActionKind = "set_config"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Action is one minimal mutation the engine will dispatch onto the per-guest queue.
|
||||||
|
// In Phase A every Action is benign by construction (only the benign kinds exist).
|
||||||
|
// Phase B's classifier/gate sits in front of the executor and may tag an action
|
||||||
|
// destructive (requiring a signature) without changing this shape.
|
||||||
|
type Action struct {
|
||||||
|
VMID int
|
||||||
|
Kind ActionKind
|
||||||
|
Params map[string]string // non-nil only for ActionSetConfig
|
||||||
|
Reason string // human/debug: why this action was planned
|
||||||
|
}
|
||||||
|
|
||||||
|
// bytesPerMiB converts the desired-spec MemoryBytes (hub.GuestSpec is in bytes) to
|
||||||
|
// the MiB unit Proxmox's LXC `memory` config field uses.
|
||||||
|
const bytesPerMiB = 1024 * 1024
|
||||||
|
|
||||||
|
// Plan computes the minimal benign action set converging actual → desired. It is a
|
||||||
|
// pure function (deterministic, side-effect-free) so it is exhaustively fixture-test
|
||||||
|
// -able. Actions are returned sorted by vmid, then config-before-runstate per guest.
|
||||||
|
//
|
||||||
|
// Scope rules (slice 4):
|
||||||
|
// - Only guests present in BOTH desired and actual are reconciled. A guest desired
|
||||||
|
// but absent from actual would be PROVISIONING (restore-to-new-guest, slice 7) —
|
||||||
|
// skipped here. A guest actual but not desired would be a DESTROY (destructive,
|
||||||
|
// gated, slice 10) — skipped here.
|
||||||
|
// - Unmanaged desired fields (RunUnspecified / nil Spec / nil Description) produce
|
||||||
|
// no action.
|
||||||
|
// - If actual spec is unknown (GuestConfig read failed), spec/description are not
|
||||||
|
// compared (run-state still is) — we never write a config we couldn't read first.
|
||||||
|
// - Comparisons are NORMALIZED (description trailing-newline, etc.) so a faithful
|
||||||
|
// round-trip is not mistaken for drift.
|
||||||
|
func Plan(desired DesiredState, actual ActualState, norm FieldNormalizers) []Action {
|
||||||
|
if norm == nil {
|
||||||
|
norm = DefaultNormalizers()
|
||||||
|
}
|
||||||
|
vmids := make([]int, 0, len(desired.Guests))
|
||||||
|
for vmid := range desired.Guests {
|
||||||
|
vmids = append(vmids, vmid)
|
||||||
|
}
|
||||||
|
sort.Ints(vmids)
|
||||||
|
|
||||||
|
var actions []Action
|
||||||
|
for _, vmid := range vmids {
|
||||||
|
d := desired.Guests[vmid]
|
||||||
|
a, ok := actual.Guests[vmid]
|
||||||
|
if !ok {
|
||||||
|
// Desired but not present: provisioning (slice 7), not a slice-4 action.
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Benign spec/description changes → a single SetConfig, only when we could
|
||||||
|
// read the current config (else we'd write blind).
|
||||||
|
if a.SpecKnown {
|
||||||
|
params := map[string]string{}
|
||||||
|
var reasons []string
|
||||||
|
if d.Spec != nil {
|
||||||
|
if d.Spec.Cores != a.Cores {
|
||||||
|
params["cores"] = strconv.Itoa(d.Spec.Cores)
|
||||||
|
reasons = append(reasons, fmt.Sprintf("cores %d->%d", a.Cores, d.Spec.Cores))
|
||||||
|
}
|
||||||
|
if want := d.Spec.MemoryBytes / bytesPerMiB; want != a.MemoryMiB {
|
||||||
|
params["memory"] = strconv.FormatInt(want, 10)
|
||||||
|
reasons = append(reasons, fmt.Sprintf("memory %dMiB->%dMiB", a.MemoryMiB, want))
|
||||||
|
}
|
||||||
|
// DiskBytes is intentionally NOT reconciled here (rootfs grow is
|
||||||
|
// `pct resize`, grow-only and separate — a later slice).
|
||||||
|
}
|
||||||
|
if d.Description != nil && !norm.Equal("description", *d.Description, a.Description) {
|
||||||
|
params["description"] = *d.Description
|
||||||
|
reasons = append(reasons, "description")
|
||||||
|
}
|
||||||
|
if len(params) > 0 {
|
||||||
|
actions = append(actions, Action{
|
||||||
|
VMID: vmid,
|
||||||
|
Kind: ActionSetConfig,
|
||||||
|
Params: params,
|
||||||
|
Reason: "spec drift: " + join(reasons),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Run-state (Start/Stop) — always comparable from the list status.
|
||||||
|
if d.Run != RunUnspecified && d.Run != a.Run {
|
||||||
|
switch d.Run {
|
||||||
|
case RunRunning:
|
||||||
|
actions = append(actions, Action{VMID: vmid, Kind: ActionStart,
|
||||||
|
Reason: fmt.Sprintf("run %q->running", a.Run)})
|
||||||
|
case RunStopped:
|
||||||
|
actions = append(actions, Action{VMID: vmid, Kind: ActionStop,
|
||||||
|
Reason: fmt.Sprintf("run %q->stopped", a.Run)})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return actions
|
||||||
|
}
|
||||||
|
|
||||||
|
func join(parts []string) string {
|
||||||
|
out := ""
|
||||||
|
for i, p := range parts {
|
||||||
|
if i > 0 {
|
||||||
|
out += ", "
|
||||||
|
}
|
||||||
|
out += p
|
||||||
|
}
|
||||||
|
return out
|
||||||
|
}
|
||||||
@@ -0,0 +1,211 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"testing"
|
||||||
|
|
||||||
|
"gitea.dooplex.hu/admin/felhom-agent/internal/hub"
|
||||||
|
)
|
||||||
|
|
||||||
|
func sp(s string) *string { return &s }
|
||||||
|
|
||||||
|
func mib(n int64) int64 { return n * bytesPerMiB }
|
||||||
|
|
||||||
|
// desired/actual builders keep the table compact.
|
||||||
|
func desired(gs ...DesiredGuest) DesiredState {
|
||||||
|
m := map[int]DesiredGuest{}
|
||||||
|
for _, g := range gs {
|
||||||
|
m[g.VMID] = g
|
||||||
|
}
|
||||||
|
return DesiredState{Guests: m}
|
||||||
|
}
|
||||||
|
|
||||||
|
func actual(gs ...ActualGuest) ActualState {
|
||||||
|
m := map[int]ActualGuest{}
|
||||||
|
for _, g := range gs {
|
||||||
|
m[g.VMID] = g
|
||||||
|
}
|
||||||
|
return ActualState{Guests: m}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_RunStateStartAndStop(t *testing.T) {
|
||||||
|
// stopped -> running
|
||||||
|
got := Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Run: RunRunning}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got, Action{VMID: 100, Kind: ActionStart})
|
||||||
|
|
||||||
|
// running -> stopped
|
||||||
|
got = Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Run: RunStopped}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunRunning, SpecKnown: true}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got, Action{VMID: 100, Kind: ActionStop})
|
||||||
|
|
||||||
|
// already matches -> nothing
|
||||||
|
got = Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Run: RunRunning}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunRunning, SpecKnown: true}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_SpecDrift(t *testing.T) {
|
||||||
|
// cores change
|
||||||
|
got := Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Spec: &hub.GuestSpec{Cores: 4, MemoryBytes: mib(2048)}}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true, Cores: 2, MemoryMiB: 2048}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got, Action{VMID: 100, Kind: ActionSetConfig, Params: map[string]string{"cores": "4"}})
|
||||||
|
|
||||||
|
// memory change (bytes desired -> MiB param)
|
||||||
|
got = Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Spec: &hub.GuestSpec{Cores: 2, MemoryBytes: mib(4096)}}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true, Cores: 2, MemoryMiB: 2048}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got, Action{VMID: 100, Kind: ActionSetConfig, Params: map[string]string{"memory": "4096"}})
|
||||||
|
|
||||||
|
// no spec drift -> nothing
|
||||||
|
got = Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Spec: &hub.GuestSpec{Cores: 2, MemoryBytes: mib(2048)}}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true, Cores: 2, MemoryMiB: 2048}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_DiskNotReconciled(t *testing.T) {
|
||||||
|
// DiskBytes differs but is intentionally not reconciled (pct resize, later slice).
|
||||||
|
got := Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Spec: &hub.GuestSpec{Cores: 2, MemoryBytes: mib(2048), DiskBytes: 1 << 40}}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true, Cores: 2, MemoryMiB: 2048}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_DescriptionNormalizedNoFalseDrift(t *testing.T) {
|
||||||
|
// PVE returns the description with a trailing newline; desired has none. Must NOT
|
||||||
|
// be planned as drift.
|
||||||
|
got := Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Description: sp("felhom-managed")}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true, Description: "felhom-managed\n"}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got)
|
||||||
|
|
||||||
|
// A genuine description change IS planned.
|
||||||
|
got = Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Description: sp("new-desc")}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true, Description: "old-desc\n"}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got, Action{VMID: 100, Kind: ActionSetConfig, Params: map[string]string{"description": "new-desc"}})
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_UnmanagedFieldsProduceNothing(t *testing.T) {
|
||||||
|
// Run unspecified, Spec nil, Description nil -> the reconciler leaves it alone even
|
||||||
|
// though actual differs from "defaults".
|
||||||
|
got := Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunRunning, SpecKnown: true, Cores: 8, MemoryMiB: 9999, Description: "whatever"}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_SpecUnknownSkipsConfigButKeepsRunState(t *testing.T) {
|
||||||
|
// GuestConfig read failed (SpecKnown=false): never write a config we couldn't read,
|
||||||
|
// but run-state is still comparable from the list.
|
||||||
|
got := Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Run: RunRunning, Spec: &hub.GuestSpec{Cores: 4, MemoryBytes: mib(4096)}, Description: sp("x")}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: false}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got, Action{VMID: 100, Kind: ActionStart})
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_DesiredAbsentInActualSkipped(t *testing.T) {
|
||||||
|
// A guest desired but not present would be provisioning (slice 7) — not a slice-4
|
||||||
|
// action. And a guest present but not desired would be a destroy (gated, slice 10).
|
||||||
|
got := Plan(
|
||||||
|
desired(DesiredGuest{VMID: 200, Run: RunRunning}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_CombinedConfigBeforeRunState(t *testing.T) {
|
||||||
|
// cores + memory + description + run change: one SetConfig (all params) THEN the
|
||||||
|
// run-state action, both on the same vmid (the queue serializes them).
|
||||||
|
got := Plan(
|
||||||
|
desired(DesiredGuest{VMID: 100, Run: RunRunning,
|
||||||
|
Spec: &hub.GuestSpec{Cores: 4, MemoryBytes: mib(4096)}, Description: sp("new")}),
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true, Cores: 2, MemoryMiB: 2048, Description: "old\n"}),
|
||||||
|
nil)
|
||||||
|
if len(got) != 2 {
|
||||||
|
t.Fatalf("want 2 actions, got %d: %+v", len(got), got)
|
||||||
|
}
|
||||||
|
if got[0].Kind != ActionSetConfig {
|
||||||
|
t.Errorf("first action should be SetConfig, got %s", got[0].Kind)
|
||||||
|
}
|
||||||
|
for _, k := range []string{"cores", "memory", "description"} {
|
||||||
|
if _, ok := got[0].Params[k]; !ok {
|
||||||
|
t.Errorf("SetConfig params missing %q: %v", k, got[0].Params)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if got[1].Kind != ActionStart {
|
||||||
|
t.Errorf("second action should be Start, got %s", got[1].Kind)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_EmptyDesiredNoActions(t *testing.T) {
|
||||||
|
// The slice-4 production case: empty desired -> zero actions regardless of actual.
|
||||||
|
got := Plan(
|
||||||
|
DesiredState{Guests: map[int]DesiredGuest{}},
|
||||||
|
actual(ActualGuest{VMID: 100, Run: RunRunning, SpecKnown: true}),
|
||||||
|
nil)
|
||||||
|
mustActions(t, got)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPlan_DeterministicVMIDOrder(t *testing.T) {
|
||||||
|
got := Plan(
|
||||||
|
desired(
|
||||||
|
DesiredGuest{VMID: 300, Run: RunRunning},
|
||||||
|
DesiredGuest{VMID: 100, Run: RunRunning},
|
||||||
|
DesiredGuest{VMID: 200, Run: RunRunning},
|
||||||
|
),
|
||||||
|
actual(
|
||||||
|
ActualGuest{VMID: 100, Run: RunStopped, SpecKnown: true},
|
||||||
|
ActualGuest{VMID: 200, Run: RunStopped, SpecKnown: true},
|
||||||
|
ActualGuest{VMID: 300, Run: RunStopped, SpecKnown: true},
|
||||||
|
),
|
||||||
|
nil)
|
||||||
|
if len(got) != 3 || got[0].VMID != 100 || got[1].VMID != 200 || got[2].VMID != 300 {
|
||||||
|
t.Fatalf("actions not sorted by vmid: %+v", got)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// mustActions asserts the planned actions equal want (vmid+kind+params), ignoring
|
||||||
|
// Reason (debug-only).
|
||||||
|
func mustActions(t *testing.T, got []Action, want ...Action) {
|
||||||
|
t.Helper()
|
||||||
|
if len(got) != len(want) {
|
||||||
|
t.Fatalf("got %d actions, want %d: %+v", len(got), len(want), got)
|
||||||
|
}
|
||||||
|
for i := range want {
|
||||||
|
if got[i].VMID != want[i].VMID || got[i].Kind != want[i].Kind {
|
||||||
|
t.Errorf("action[%d] = {vmid:%d kind:%s}, want {vmid:%d kind:%s}",
|
||||||
|
i, got[i].VMID, got[i].Kind, want[i].VMID, want[i].Kind)
|
||||||
|
}
|
||||||
|
if !sameParams(got[i].Params, want[i].Params) {
|
||||||
|
t.Errorf("action[%d] params = %v, want %v", i, got[i].Params, want[i].Params)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func sameParams(a, b map[string]string) bool {
|
||||||
|
if len(a) != len(b) {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
for k, v := range b {
|
||||||
|
if a[k] != v {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return true
|
||||||
|
}
|
||||||
@@ -0,0 +1,137 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"errors"
|
||||||
|
"sync"
|
||||||
|
)
|
||||||
|
|
||||||
|
// ErrQueueClosed is returned for a job submitted after Close.
|
||||||
|
var ErrQueueClosed = errors.New("reconcile: queue closed")
|
||||||
|
|
||||||
|
// Queue is the per-guest serializer (doc 03 §10): the single choke point ALL
|
||||||
|
// mutation sources funnel through (reconcile now; one-shot jobs and the local API
|
||||||
|
// later). Jobs for the SAME vmid run strictly one-at-a-time in submit order;
|
||||||
|
// independent vmids proceed in parallel. This is what keeps Proxmox from ever seeing
|
||||||
|
// concurrent conflicting ops on one guest while letting a multi-guest host converge
|
||||||
|
// concurrently.
|
||||||
|
//
|
||||||
|
// Each vmid gets a "lane": a goroutine draining an unbounded FIFO of jobs. Submit
|
||||||
|
// never blocks the caller (the FIFO is unbounded) and returns a channel that delivers
|
||||||
|
// that job's error when it runs — so a caller can await a specific job while others
|
||||||
|
// proceed.
|
||||||
|
type Queue struct {
|
||||||
|
mu sync.Mutex
|
||||||
|
lanes map[int]*lane
|
||||||
|
closed bool
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewQueue builds an empty queue. Lanes are created lazily on first Submit per vmid
|
||||||
|
// and live for the queue's lifetime (a host has a bounded guest set); Close stops them.
|
||||||
|
func NewQueue() *Queue {
|
||||||
|
return &Queue{lanes: make(map[int]*lane)}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Submit enqueues fn on vmid's lane and returns a buffered channel that receives
|
||||||
|
// fn's return value (or ErrQueueClosed) exactly once. Submission preserves per-lane
|
||||||
|
// FIFO order; different lanes run concurrently. The result channel is buffered, so the
|
||||||
|
// lane never blocks delivering a result even if the caller abandons it.
|
||||||
|
func (q *Queue) Submit(vmid int, fn func() error) <-chan error {
|
||||||
|
res := make(chan error, 1)
|
||||||
|
|
||||||
|
q.mu.Lock()
|
||||||
|
if q.closed {
|
||||||
|
q.mu.Unlock()
|
||||||
|
res <- ErrQueueClosed
|
||||||
|
return res
|
||||||
|
}
|
||||||
|
l := q.lanes[vmid]
|
||||||
|
if l == nil {
|
||||||
|
l = newLane()
|
||||||
|
q.lanes[vmid] = l
|
||||||
|
go l.run()
|
||||||
|
}
|
||||||
|
q.mu.Unlock()
|
||||||
|
|
||||||
|
l.enqueue(&laneTask{fn: fn, res: res})
|
||||||
|
return res
|
||||||
|
}
|
||||||
|
|
||||||
|
// Close stops accepting new jobs and signals every lane to drain its pending jobs and
|
||||||
|
// exit. Already-queued jobs still run (graceful drain). Idempotent.
|
||||||
|
func (q *Queue) Close() {
|
||||||
|
q.mu.Lock()
|
||||||
|
if q.closed {
|
||||||
|
q.mu.Unlock()
|
||||||
|
return
|
||||||
|
}
|
||||||
|
q.closed = true
|
||||||
|
lanes := make([]*lane, 0, len(q.lanes))
|
||||||
|
for _, l := range q.lanes {
|
||||||
|
lanes = append(lanes, l)
|
||||||
|
}
|
||||||
|
q.mu.Unlock()
|
||||||
|
|
||||||
|
for _, l := range lanes {
|
||||||
|
l.close()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
type laneTask struct {
|
||||||
|
fn func() error
|
||||||
|
res chan error
|
||||||
|
}
|
||||||
|
|
||||||
|
// lane is one vmid's serial worker: a goroutine draining an unbounded FIFO guarded by
|
||||||
|
// a cond var. cond-var (not a buffered channel) gives unbounded, non-blocking,
|
||||||
|
// order-preserving submission without an arbitrary capacity.
|
||||||
|
type lane struct {
|
||||||
|
mu sync.Mutex
|
||||||
|
cond *sync.Cond
|
||||||
|
pending []*laneTask
|
||||||
|
closed bool
|
||||||
|
}
|
||||||
|
|
||||||
|
func newLane() *lane {
|
||||||
|
l := &lane{}
|
||||||
|
l.cond = sync.NewCond(&l.mu)
|
||||||
|
return l
|
||||||
|
}
|
||||||
|
|
||||||
|
func (l *lane) enqueue(t *laneTask) {
|
||||||
|
l.mu.Lock()
|
||||||
|
if l.closed {
|
||||||
|
l.mu.Unlock()
|
||||||
|
t.res <- ErrQueueClosed
|
||||||
|
return
|
||||||
|
}
|
||||||
|
l.pending = append(l.pending, t)
|
||||||
|
l.cond.Signal()
|
||||||
|
l.mu.Unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
func (l *lane) close() {
|
||||||
|
l.mu.Lock()
|
||||||
|
l.closed = true
|
||||||
|
l.cond.Broadcast()
|
||||||
|
l.mu.Unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
// run drains the FIFO one job at a time. On close it finishes any already-queued jobs
|
||||||
|
// (so nothing submitted-before-close is silently dropped) and then exits.
|
||||||
|
func (l *lane) run() {
|
||||||
|
for {
|
||||||
|
l.mu.Lock()
|
||||||
|
for len(l.pending) == 0 && !l.closed {
|
||||||
|
l.cond.Wait()
|
||||||
|
}
|
||||||
|
if len(l.pending) == 0 { // closed and drained
|
||||||
|
l.mu.Unlock()
|
||||||
|
return
|
||||||
|
}
|
||||||
|
t := l.pending[0]
|
||||||
|
l.pending = l.pending[1:]
|
||||||
|
l.mu.Unlock()
|
||||||
|
|
||||||
|
t.res <- t.fn()
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,137 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"errors"
|
||||||
|
"sync"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// TestQueue_SameGuestSerialized asserts that jobs for one vmid run strictly
|
||||||
|
// one-at-a-time in submit order — the core §10 guarantee that keeps Proxmox from
|
||||||
|
// seeing concurrent conflicting ops on a guest.
|
||||||
|
func TestQueue_SameGuestSerialized(t *testing.T) {
|
||||||
|
q := NewQueue()
|
||||||
|
defer q.Close()
|
||||||
|
|
||||||
|
const n = 50
|
||||||
|
var mu sync.Mutex
|
||||||
|
var order []int
|
||||||
|
inside := 0
|
||||||
|
maxConcurrent := 0
|
||||||
|
|
||||||
|
chans := make([]<-chan error, n)
|
||||||
|
for i := 0; i < n; i++ {
|
||||||
|
i := i
|
||||||
|
chans[i] = q.Submit(100, func() error {
|
||||||
|
mu.Lock()
|
||||||
|
inside++
|
||||||
|
if inside > maxConcurrent {
|
||||||
|
maxConcurrent = inside
|
||||||
|
}
|
||||||
|
order = append(order, i)
|
||||||
|
mu.Unlock()
|
||||||
|
|
||||||
|
time.Sleep(time.Millisecond) // widen any overlap window
|
||||||
|
|
||||||
|
mu.Lock()
|
||||||
|
inside--
|
||||||
|
mu.Unlock()
|
||||||
|
return nil
|
||||||
|
})
|
||||||
|
}
|
||||||
|
for _, ch := range chans {
|
||||||
|
if err := <-ch; err != nil {
|
||||||
|
t.Fatalf("unexpected job error: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if maxConcurrent != 1 {
|
||||||
|
t.Errorf("same-guest jobs overlapped: maxConcurrent=%d, want 1", maxConcurrent)
|
||||||
|
}
|
||||||
|
for i := 0; i < n; i++ {
|
||||||
|
if order[i] != i {
|
||||||
|
t.Fatalf("same-guest jobs ran out of submit order: got %v", order)
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestQueue_DifferentGuestsParallel asserts independent vmids proceed concurrently:
|
||||||
|
// two jobs on different lanes that each wait for the other before finishing must BOTH
|
||||||
|
// complete (they'd deadlock under a global lock / single worker).
|
||||||
|
func TestQueue_DifferentGuestsParallel(t *testing.T) {
|
||||||
|
q := NewQueue()
|
||||||
|
defer q.Close()
|
||||||
|
|
||||||
|
aReady := make(chan struct{})
|
||||||
|
bReady := make(chan struct{})
|
||||||
|
|
||||||
|
chA := q.Submit(1, func() error {
|
||||||
|
close(aReady)
|
||||||
|
select {
|
||||||
|
case <-bReady:
|
||||||
|
return nil
|
||||||
|
case <-time.After(2 * time.Second):
|
||||||
|
return errors.New("guest 1 timed out waiting for guest 2 (not parallel)")
|
||||||
|
}
|
||||||
|
})
|
||||||
|
chB := q.Submit(2, func() error {
|
||||||
|
close(bReady)
|
||||||
|
select {
|
||||||
|
case <-aReady:
|
||||||
|
return nil
|
||||||
|
case <-time.After(2 * time.Second):
|
||||||
|
return errors.New("guest 2 timed out waiting for guest 1 (not parallel)")
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
if err := <-chA; err != nil {
|
||||||
|
t.Error(err)
|
||||||
|
}
|
||||||
|
if err := <-chB; err != nil {
|
||||||
|
t.Error(err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestQueue_PropagatesJobError confirms a job's error reaches its result channel.
|
||||||
|
func TestQueue_PropagatesJobError(t *testing.T) {
|
||||||
|
q := NewQueue()
|
||||||
|
defer q.Close()
|
||||||
|
want := errors.New("boom")
|
||||||
|
if got := <-q.Submit(7, func() error { return want }); got != want {
|
||||||
|
t.Errorf("Submit result = %v, want %v", got, want)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestQueue_DrainsPendingOnClose confirms jobs already queued before Close still run.
|
||||||
|
func TestQueue_DrainsPendingOnClose(t *testing.T) {
|
||||||
|
q := NewQueue()
|
||||||
|
release := make(chan struct{})
|
||||||
|
var ran sync.WaitGroup
|
||||||
|
ran.Add(2)
|
||||||
|
|
||||||
|
// First job blocks until released, pinning the lane so the second sits pending.
|
||||||
|
ch1 := q.Submit(5, func() error { <-release; ran.Done(); return nil })
|
||||||
|
ch2 := q.Submit(5, func() error { ran.Done(); return nil })
|
||||||
|
|
||||||
|
q.Close() // close while job1 is queued/running and job2 is pending
|
||||||
|
close(release)
|
||||||
|
|
||||||
|
if err := <-ch1; err != nil {
|
||||||
|
t.Errorf("job1 err: %v", err)
|
||||||
|
}
|
||||||
|
if err := <-ch2; err != nil {
|
||||||
|
t.Errorf("pending job2 should still run after Close, got: %v", err)
|
||||||
|
}
|
||||||
|
ran.Wait()
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestQueue_SubmitAfterClose returns ErrQueueClosed.
|
||||||
|
func TestQueue_SubmitAfterClose(t *testing.T) {
|
||||||
|
q := NewQueue()
|
||||||
|
q.Close()
|
||||||
|
if got := <-q.Submit(1, func() error { return nil }); got != ErrQueueClosed {
|
||||||
|
t.Errorf("Submit after Close = %v, want ErrQueueClosed", got)
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,130 @@
|
|||||||
|
package reconcile
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"encoding/json"
|
||||||
|
|
||||||
|
"gitea.dooplex.hu/admin/felhom-agent/internal/hub"
|
||||||
|
"gitea.dooplex.hu/admin/felhom-agent/internal/proxmox"
|
||||||
|
)
|
||||||
|
|
||||||
|
// RunState is a guest's desired/actual power state. The empty value means
|
||||||
|
// "unmanaged" on the desired side (the reconciler leaves run-state alone).
|
||||||
|
type RunState string
|
||||||
|
|
||||||
|
const (
|
||||||
|
// RunUnspecified (the zero value) — on a DesiredGuest it means run-state is not
|
||||||
|
// managed; the reconciler never starts/stops the guest for run-state reasons.
|
||||||
|
RunUnspecified RunState = ""
|
||||||
|
// RunRunning maps to proxmox status "running".
|
||||||
|
RunRunning RunState = "running"
|
||||||
|
// RunStopped maps to proxmox status "stopped".
|
||||||
|
RunStopped RunState = "stopped"
|
||||||
|
)
|
||||||
|
|
||||||
|
// normRun maps a raw proxmox status string to a RunState, collapsing anything
|
||||||
|
// unrecognized (e.g. "") to RunUnspecified so actual-state comparison is well-defined.
|
||||||
|
func normRun(status string) RunState {
|
||||||
|
switch status {
|
||||||
|
case "running":
|
||||||
|
return RunRunning
|
||||||
|
case "stopped":
|
||||||
|
return RunStopped
|
||||||
|
default:
|
||||||
|
return RunUnspecified
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// DesiredGuest is the target state for one existing guest. Every field is
|
||||||
|
// individually optional ("unmanaged") so a desired-state source can pin only what it
|
||||||
|
// cares about — slice 4's planner only acts on the fields that are set.
|
||||||
|
type DesiredGuest struct {
|
||||||
|
VMID int
|
||||||
|
// Run is the target power state; RunUnspecified leaves it alone.
|
||||||
|
Run RunState
|
||||||
|
// Spec, when non-nil, manages sizing. Reuses hub.GuestSpec (cores/memory/disk).
|
||||||
|
// Phase A reconciles Cores and Memory via SetConfig; DiskBytes is reported but
|
||||||
|
// NOT reconciled here (a rootfs grow is `pct resize`, grow-only and separate —
|
||||||
|
// deferred to a later slice). Nil = sizing unmanaged.
|
||||||
|
Spec *hub.GuestSpec
|
||||||
|
// Description, when non-nil, manages the cosmetic `description` field (the first
|
||||||
|
// proven SetConfig round-trip, slice-4 pre-check). Nil = unmanaged.
|
||||||
|
Description *string
|
||||||
|
}
|
||||||
|
|
||||||
|
// DesiredState is the vmid-keyed target for this host. At slice 4 the only live
|
||||||
|
// source is the empty provider, so Guests is empty in production; fixtures inject it
|
||||||
|
// in tests. Host-level desired state (storage manifest, etc.) arrives in later slices.
|
||||||
|
type DesiredState struct {
|
||||||
|
Guests map[int]DesiredGuest
|
||||||
|
}
|
||||||
|
|
||||||
|
// ActualGuest is one guest's observed state, read from Proxmox.
|
||||||
|
type ActualGuest struct {
|
||||||
|
VMID int
|
||||||
|
Run RunState
|
||||||
|
// SpecKnown is false when GuestConfig could not be read (the run-state from the
|
||||||
|
// list is still trusted; spec/description comparisons are skipped). Mirrors the
|
||||||
|
// collector's "keep run-status, omit spec" degradation.
|
||||||
|
SpecKnown bool
|
||||||
|
Cores int
|
||||||
|
MemoryMiB int64 // proxmox LXC `memory` is MiB
|
||||||
|
Description string // raw (may carry PVE's trailing newline; compared via normalizers)
|
||||||
|
}
|
||||||
|
|
||||||
|
// ActualState is the vmid-keyed observed state for this host.
|
||||||
|
type ActualState struct {
|
||||||
|
Guests map[int]ActualGuest
|
||||||
|
}
|
||||||
|
|
||||||
|
// DesiredProvider is the seam the desired-state source plugs into. At slice 4 the
|
||||||
|
// only implementation is EmptyProvider (no live source); slice 10's hub-serving path
|
||||||
|
// is the real implementation. Do NOT invent a hub/local-file source here.
|
||||||
|
type DesiredProvider interface {
|
||||||
|
Desired(ctx context.Context) (DesiredState, error)
|
||||||
|
}
|
||||||
|
|
||||||
|
// EmptyProvider is the slice-4 production provider: no desired state, so reconcile is
|
||||||
|
// a live no-op (the engine computes an empty action set).
|
||||||
|
type EmptyProvider struct{}
|
||||||
|
|
||||||
|
// Desired returns an empty desired state.
|
||||||
|
func (EmptyProvider) Desired(context.Context) (DesiredState, error) {
|
||||||
|
return DesiredState{Guests: map[int]DesiredGuest{}}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// StaticProvider serves a fixed DesiredState — used by fixtures (and usable as a
|
||||||
|
// local override later). It never mutates the value it was given.
|
||||||
|
type StaticProvider struct{ State DesiredState }
|
||||||
|
|
||||||
|
// Desired returns the static state.
|
||||||
|
func (p StaticProvider) Desired(context.Context) (DesiredState, error) { return p.State, nil }
|
||||||
|
|
||||||
|
// GuestAPI is the narrow Proxmox surface the engine needs: read actual state and
|
||||||
|
// dispatch the benign-on-existing-guest mutations. *proxmox.Client satisfies it; a
|
||||||
|
// fake satisfies it in tests. Every mutating call returns a UPID (or "" for the
|
||||||
|
// synchronous path) per the proxmox/mutate.go contract — the engine WaitTasks a
|
||||||
|
// non-empty UPID and treats "" as a clean synchronous success.
|
||||||
|
type GuestAPI interface {
|
||||||
|
ListLXC(ctx context.Context) ([]proxmox.Guest, error)
|
||||||
|
GuestConfig(ctx context.Context, vmid int) (proxmox.GuestConfig, error)
|
||||||
|
Start(ctx context.Context, vmid int) (string, error)
|
||||||
|
Stop(ctx context.Context, vmid int) (string, error)
|
||||||
|
SetConfig(ctx context.Context, vmid int, params map[string]string) (string, error)
|
||||||
|
WaitTask(ctx context.Context, upid string, opts proxmox.WaitOptions) (proxmox.TaskStatus, error)
|
||||||
|
}
|
||||||
|
|
||||||
|
// guestDescription decodes the (string-valued) `description` key from a GuestConfig's
|
||||||
|
// raw Extra map, returning "" when absent. The value is returned raw — PVE appends a
|
||||||
|
// trailing newline on read, which the normalization layer strips at comparison time.
|
||||||
|
func guestDescription(cfg proxmox.GuestConfig) string {
|
||||||
|
raw, ok := cfg.Extra["description"]
|
||||||
|
if !ok || len(raw) == 0 {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
var s string
|
||||||
|
if err := json.Unmarshal(raw, &s); err != nil {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
return s
|
||||||
|
}
|
||||||
Reference in New Issue
Block a user