1af21a6cac
The security core of slice 4: hub-supplied intent is no longer trusted for destructive change. The gate fronts the per-guest queue's executor, so every mutation passes it. Reuses internal/authz for all crypto (surface untouched).

- Classifier (doc 03 §4): benign vs destructive by provenance + data-bearing-ness, NOT by verb. Destroy/overwrite of customer data is destructive unless agent-internal provenance (same-journaled-txn create, or agent-tagged scratch) makes it benign, and that provenance is journal-recorded, NEVER hub-sourced. Unknown op class fails safe to destructive.
- Reversibility gate: benign -> allowed unsigned; destructive -> requires a verified, role-scoped, action-bound operator signature, else pending_signature and never executed. Every decision audited (signal, never the guard).
- Signed-op consuming layer over authz.Verifier.Verify (locked pipeline untouched): role-scoping (doc 04 §4: recovery = rotation only, operational = ordinary destructive + planned rotation) + op-to-action binding (op+host+guest+params must match the gated action).
- Signed-job orchestration: idempotency dedupe by nonce + journal-wrapped execution via an injected DestructiveExecutor (nil this slice, so inert).
- Crash recovery (Note 1): Engine.Recover consumes the journal InFlight() set at startup (resume-or-rollback). This covers an op that crashed after the POST and before its terminal record, which idempotency dedupe alone cannot. Added TaskStatusOnce to the GuestAPI seam. Wired into daemon startup.
- Note 2: memory comparison canonicalized to MiB (desiredMemoryMiB) so a non-MiB-aligned MemoryBytes converges in one pass, not perpetual drift.
- Daemon: builds the verifier from config signers (none = nil verifier, the common slice-4 state) and the gate (+SlogAudit), and runs Recover before mutating.

Adversarial matrix proven against the REAL authz.Verifier with in-test-minted SSHSIGs (framing replicated in reconcile's test binary; authz untouched, no signing added to the verify-only package): unsigned job + unsigned desired-state delta -> pending_signature; unknown signer/expired/replay-across-restart/wrong host -> typed authz rejections; wrong guest/op/params -> binding_mismatch; recovery key on ordinary destructive -> role_denied; hub-supplied scratch tag ignored -> refused; valid+role+target+fresh nonce -> accepted, then replay rejected.

Full module race-clean + vet-clean on the Linux build server. Inert this slice: no destructive deltas served until slice 10; the destructive path is classified, gated, and tested but not wired to live execution.

CHECKPOINT: Phase B complete (slice 4 done). Awaiting validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# REPORT — Slice 4: reconcile engine + the reversibility gate (v0.4.0) (2026-06-08)
|
|
|
|
> Overwrite-latest report (most recent significant work only). Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).
|
|
|
|
## Outcome
|
|
|
|
**Slice 4 is complete and pushed as `v0.4.0`.** Both phases landed:
|
|
|
|
- **Phase A** (structural, pushed earlier as `v0.4.0-rc1`): the reconcile engine, the
  per-guest serializer (doc 03 §10), the desired-state model + provider seam, the
  field-normalization layer, the plan/diff engine, and the durable op journal +
  idempotency store. It runs **live but unfed**: `EmptyProvider` → zero mutations until
  slice 10 serves desired state.
- **Phase B** (this push, the security core): the benign/destructive **classifier**,
  the **reversibility gate**, and the **signed-op consuming layer** over `internal/authz`,
  with role-scoping, op-to-action binding, idempotency/journaling, audit, and the
  crash-recovery consumer. The gate sits in front of the per-guest queue's executor, so
  **every mutation passes it**.
|
|
|
|
The whole module is **race-clean and vet-clean** on the Linux build server; 62 reconcile
tests pass (the adversarial matrix runs against the real `authz.Verifier`).
|
|
|
|
## The security model (Phase B)
|
|
|
|
Hub-supplied intent is no longer trusted for destructive change. Operations are classified
**by provenance + data-bearing-ness, not by verb** (doc 03 §4):
|
|
|
|
- **Benign** (unsigned): start/stop/restart/create, and destroying a resource the agent
  created in the **same journaled transaction** (compensating rollback) or **agent-tagged
  scratch**. That scratch/same-txn provenance is **agent-internal, journal-recorded, and
  never accepted from the hub**, so a compromised hub cannot relabel a data-bearing guest
  as scratch to walk the gate.
- **Destructive** (signature required): destroy/overwrite of the only/primary copy of
  customer data, **regardless of whether it arrives as a job or a desired-state delta**.
  An unknown op class also fails safe to destructive. Absent/invalid signature → refused
  as **`pending_signature`**, never executed.
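The Go sketch below makes "by provenance + data-bearing-ness, not by verb" concrete. The `Effect` and `Provenance` types are assumptions. The sketch supposes each op is already described by what it does to stored data, and that same-transaction and agent-scratch facts come from the journal. Hub-supplied tags ride along on the action but are never consulted. The zero-value `EffectUnknown` falls through to destructive.

```go
package main

import "fmt"

// Class is the gate's input: benign runs unsigned, destructive needs a signature.
type Class int

const (
	Benign Class = iota
	Destructive
)

func (c Class) String() string {
	if c == Benign {
		return "benign"
	}
	return "destructive"
}

// Effect is a hypothetical description of what an op does to stored data.
// The zero value is EffectUnknown, so an unclassified op defaults to it.
type Effect int

const (
	EffectUnknown      Effect = iota
	EffectNoDataLoss          // e.g. start/stop/restart/create
	EffectDestroysData        // destroy or overwrite of stored data
)

// Action is a hypothetical planned change. HubTags carries whatever labels the
// hub sent; Classify deliberately never reads it.
type Action struct {
	Effect   Effect
	Resource string
	HubTags  []string
}

// Provenance is a hypothetical view of agent-internal, journal-recorded facts.
type Provenance struct {
	CreatedInTxn map[string]bool // created in the same journaled transaction
	AgentScratch map[string]bool // tagged scratch by the agent itself
}

// Classify decides by data-bearing-ness and provenance, not by the verb.
func Classify(a Action, p Provenance) Class {
	switch a.Effect {
	case EffectNoDataLoss:
		return Benign
	case EffectDestroysData:
		if p.CreatedInTxn[a.Resource] || p.AgentScratch[a.Resource] {
			return Benign // compensating rollback or agent-owned scratch
		}
		return Destructive
	default:
		return Destructive // unknown op class fails safe
	}
}

func main() {
	p := Provenance{CreatedInTxn: map[string]bool{"disk-tmp": true}}
	fmt.Println(Classify(Action{Effect: EffectNoDataLoss, Resource: "vm-101"}, p))
	fmt.Println(Classify(Action{Effect: EffectDestroysData, Resource: "disk-tmp"}, p))
	fmt.Println(Classify(Action{Effect: EffectDestroysData, Resource: "vm-101",
		HubTags: []string{"scratch"}}, p))
	fmt.Println(Classify(Action{Resource: "vm-101"}, p))
}
```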
|
|
|
|
The signed-op consuming layer calls `authz.Verifier.Verify` (the locked
namespace→allow-list→crypto→target→time→nonce pipeline, untouched) and then enforces
the slice-4 policy on the `VerifiedOp`: **role-scoping** (recovery key = key-rotation
only; operational key = ordinary destructive + planned rotation, doc 04 §4) and
**op-to-action binding** (the verified op + host + guest + params must name the exact
gated action). Idempotency keys the journal by the op nonce; every decision is audited
(a signal, never the guard).
|
|
|
|
## Inert by design (slice-4 scope)
|
|
|
|
There is **no live destructive execution** this slice: nothing serves destructive deltas
until slice 10, and the guest-destroy/storage-wipe/restore-overwrite executors land in
slices 6/7. The destructive path is therefore fully **classified, gated, and adversarially
tested**, but `RunSignedJob`'s executor is nil in production: an authorized destructive op
is journaled as authorized-but-not-executed. Reconcile itself only produces the benign
Start/Stop/SetConfig set, all allowed through the gate unsigned.
|
|
|
|
## Adversarial proof (each case independently rejected)
|
|
|
|
Run against the **real** `authz.Verifier` with in-test-minted SSHSIGs. The ~40-line
framing is replicated in reconcile's test binary, so production `authz` is untouched and
gains no signing capability. Live minting is required because the verifier's clock is
not cross-package injectable. Each case and its outcome:
|
|
|
|
- unsigned destructive **job** → pending_signature
- unsigned destructive **desired-state delta** → pending_signature (distrusts hub desired
  state, not just jobs)
- forged / unknown signer → `ErrUnknownSigner`
- expired → `ErrExpired`
- **replayed nonce across an agent restart** (durable `FileNonceStore`) → `ErrReplay`
- wrong host → `ErrTarget`
- wrong guest / wrong op / wrong params → binding_mismatch
- **recovery key on ordinary destructive** → role_denied
- **hub-supplied "scratch" tag** on a data-bearing guest → ignored, still destructive →
  refused
- **valid + correct role + correct target + fresh nonce → accepted**, and a second
  presentation → `ErrReplay`
|
|
|
|
## The two forward-looking notes
|
|
|
|
- **Note 1 (carried in)**: the `InFlight()` **resume-or-rollback** startup consumer
  (`Engine.Recover`) landed **together with** the signed-op executor, as required. An op
  that crashed after the Proxmox POST but before its terminal record (`OpTaskRunning`,
  nonce already consumed) is not covered by idempotency dedupe; only this consumer
  resolves it (re-read the task via the new `TaskStatusOnce` and record the real outcome;
  an op with no task ID is abandoned fail-safe). Wired into daemon startup and tested.
- **Note 2 (addressed)**: the memory comparison is canonicalized via `desiredMemoryMiB`.
  Desired and actual are compared in the same MiB unit that is then written, so a
  non-MiB-aligned `MemoryBytes` converges in one pass rather than re-issuing SetConfig
  every cycle. A test proves convergence. The recommendation stands that slice 10 serve
  MiB-aligned specs at the source.
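A little arithmetic shows the Note 2 fix at work. In this Go sketch, `desiredMemoryMiB` rounds up to whole MiB. The rounding direction is an assumption; the report does not state it. For contrast, the sketch also shows the naive comparison of raw bytes against the MiB value read back. For 2 GiB + 1 byte, the canonical value is 2049 MiB. Once that value is written, the canonical comparison is satisfied, while the byte comparison never is.

```go
package main

import "fmt"

const mib = 1 << 20

// desiredMemoryMiB canonicalizes a byte count to whole MiB (rounding up, by assumption).
func desiredMemoryMiB(memBytes uint64) uint64 {
	return (memBytes + mib - 1) / mib
}

// needsSetConfigNaive compares raw bytes with the MiB value read back; for
// non-aligned input it is never satisfied, so SetConfig would repeat forever.
func needsSetConfigNaive(desiredBytes, actualMiB uint64) bool {
	return desiredBytes != actualMiB*mib
}

// needsSetConfig compares in the same unit that is written.
func needsSetConfig(desiredBytes, actualMiB uint64) bool {
	return desiredMemoryMiB(desiredBytes) != actualMiB
}

func main() {
	desired := uint64(2*1024*mib + 1) // 2 GiB + 1 byte: not MiB-aligned
	actual := uint64(1024)            // current config, in MiB
	for cycle := 1; cycle <= 3; cycle++ {
		if needsSetConfig(desired, actual) {
			actual = desiredMemoryMiB(desired) // "write" the canonical value
			fmt.Println("cycle", cycle, "SetConfig ->", actual, "MiB")
		} else {
			fmt.Println("cycle", cycle, "converged")
		}
	}
	fmt.Println("naive still drifting:", needsSetConfigNaive(desired, actual))
}
```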
|
|
|
|
## Verification
|
|
|
|
- `go test -race -count=1 ./...` and `go vet ./...` are clean on the Linux build server
  (go1.26); all tests pass both locally and on the build server.
- No live Proxmox is needed: Phase A is unfed and Phase B's destructive path is inert this
  slice. The gate's crypto path is proven end-to-end against the real verifier.
|
|
|
|
## Conventions
|
|
|
|
Version → **v0.4.0**. CHANGELOG has a per-phase entry (newest on top). No secrets in any
committed file. Pushed to `main`. Per the task, I stop at this checkpoint and await the
validation pass.
|