felhom-agent/REPORT.md
admin 1af21a6cac v0.4.0: slice 4 Phase B — reversibility gate + signed-op consuming layer
The security core of slice 4: hub-supplied intent is no longer trusted for
destructive change. The gate fronts the per-guest queue's executor, so every
mutation passes it. Reuses internal/authz for all crypto (surface untouched).

- Classifier (doc 03 §4): benign vs destructive by provenance + data-bearing-
  ness, NOT by verb. Destroy/overwrite of customer data is destructive unless
  agent-internal provenance (same-journaled-txn create, or agent-tagged scratch)
  makes it benign — and that provenance is journal-recorded, NEVER hub-sourced.
  Unknown op class fails safe to destructive.
- Reversibility gate: benign -> allowed unsigned; destructive -> requires a
  verified, role-scoped, action-bound operator signature, else pending_signature
  and never executed. Every decision audited (signal, never the guard).
- Signed-op consuming layer over authz.Verifier.Verify (locked pipeline
  untouched): role-scoping (doc 04 §4 — recovery=rotation only, operational=
  ordinary destructive + planned rotation) + op-to-action binding (op+host+
  guest+params must match the gated action).
- Signed-job orchestration: idempotency dedupe by nonce + journal-wrapped
  execution via an injected DestructiveExecutor (nil this slice — inert).
- Crash recovery (Note 1): Engine.Recover consumes the journal InFlight() set at
  startup (resume-or-rollback) — covers an op that crashed after the POST and
  before its terminal record, which idempotency dedupe alone cannot. Added
  TaskStatusOnce to the GuestAPI seam. Wired into daemon startup.
- Note 2: memory comparison canonicalized to MiB (desiredMemoryMiB) so a
  non-MiB-aligned MemoryBytes converges in one pass, not perpetual drift.
- Daemon: builds the verifier from config signers (none = nil verifier, the
  common slice-4 state), the gate (+SlogAudit), runs Recover before mutating.

Adversarial matrix proven against the REAL authz.Verifier with in-test-minted
SSHSIGs (framing replicated in reconcile's test binary; authz untouched, no
signing added to the verify-only package): unsigned job + unsigned desired-state
delta -> pending_signature; unknown signer/expired/replay-across-restart/wrong
host -> typed authz rejections; wrong guest/op/params -> binding_mismatch;
recovery key on ordinary destructive -> role_denied; hub-supplied scratch tag
ignored -> refused; valid+role+target+fresh nonce -> accepted then replay
rejected. Full module race-clean + vet-clean on the Linux build server.

Inert this slice: no destructive deltas served until slice 10; the destructive
path is classified, gated, and tested but not wired to live execution.

CHECKPOINT: Phase B complete (slice 4 done). Awaiting validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 23:56:20 +02:00


REPORT — Slice 4: reconcile engine + the reversibility gate (v0.4.0) (2026-06-08)

Overwrite-latest report (most recent significant work only). Cumulative history lives in CHANGELOG.md.

Outcome

Slice 4 is complete and pushed as v0.4.0. Both phases landed:

  • Phase A (structural, pushed earlier as v0.4.0-rc1): the reconcile engine, the per-guest serializer (doc 03 §10), the desired-state model + provider seam, the field-normalization layer, the plan/diff engine, and the durable op journal + idempotency store. Runs live but unfed: EmptyProvider → zero mutations until slice 10 serves desired state.
  • Phase B (this push, the security core): the benign/destructive classifier, the reversibility gate, and the signed-op consuming layer over internal/authz — with role-scoping, op-to-action binding, idempotency/journaling, audit, and the crash-recovery consumer. The gate sits in front of the per-guest queue's executor, so every mutation passes it.

The whole module is race-clean and vet-clean on the Linux build server; 62 reconcile tests pass (the adversarial matrix runs against the real authz.Verifier).

The security model (Phase B)

Hub-supplied intent is no longer trusted for destructive change. Operations are classified by provenance + data-bearing-ness, not by verb (doc 03 §4):

  • Benign (unsigned): start/stop/restart/create, and destroying a resource the agent created in the same journaled transaction (compensating rollback) or tagged scratch. That scratch/same-txn provenance is agent-internal, journal-recorded, and never accepted from the hub — a compromised hub cannot relabel a data-bearing guest as scratch to walk the gate.
  • Destructive (signature required): destroy/overwrite of the only/primary copy of customer data — regardless of whether it arrives as a job or a desired-state delta. Absent/invalid signature → refused pending_signature, never executed.
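
The classification and gate rules above can be sketched roughly as follows. This is a minimal illustration, not the agent's code: the `Op`, `Provenance`, `Classify`, and `Gate` names and shapes are hypothetical, and the op kinds are plain strings chosen for the example. The point it shows is that a hub-supplied scratch tag is never consulted, and that an unknown op class fails safe to destructive.

```go
package main

import "fmt"

// Class is the classifier outcome.
type Class int

const (
	Benign Class = iota
	Destructive
)

func (c Class) String() string {
	if c == Benign {
		return "benign"
	}
	return "destructive"
}

// Op is a hypothetical, simplified operation as it arrives from the hub.
type Op struct {
	Kind          string // "start", "stop", "restart", "create", "destroy", "overwrite"
	HubScratchTag bool   // hub-supplied claim; deliberately ignored by Classify
}

// Provenance is read from the agent's own journal, never from the hub.
type Provenance struct {
	SameTxnCreate bool // created in the same journaled transaction
	AgentScratch  bool // tagged scratch by the agent itself
}

// Classify decides benign vs destructive. For destroy/overwrite only
// agent-internal provenance can make the op benign.
func Classify(op Op, p Provenance) Class {
	switch op.Kind {
	case "start", "stop", "restart", "create":
		return Benign
	case "destroy", "overwrite":
		if p.SameTxnCreate || p.AgentScratch {
			return Benign
		}
		return Destructive
	default:
		return Destructive // unknown op class fails safe
	}
}

// Decision is the gate outcome.
type Decision string

const (
	Allowed          Decision = "allowed"
	PendingSignature Decision = "pending_signature"
)

// Gate allows benign ops unsigned; destructive ops need a verified signature.
func Gate(c Class, signatureVerified bool) Decision {
	if c == Benign || signatureVerified {
		return Allowed
	}
	return PendingSignature
}

func main() {
	// A compromised hub labels a data-bearing guest as scratch.
	relabeled := Classify(Op{Kind: "destroy", HubScratchTag: true}, Provenance{})
	fmt.Println(relabeled, Gate(relabeled, false))

	// A compensating rollback of something the agent created in this txn.
	rollback := Classify(Op{Kind: "destroy"}, Provenance{SameTxnCreate: true})
	fmt.Println(rollback, Gate(rollback, false))
}
```

In this sketch the hub tag is carried on the op only to make visible that the classifier never reads it.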

The signed-op consuming layer calls authz.Verifier.Verify (the locked namespace→allow-list→crypto→target→time→nonce pipeline, untouched) and then enforces the slice-4 policy on the VerifiedOp: role-scoping (recovery key = key-rotation only; operational key = ordinary destructive + planned rotation, doc 04 §4) and op-to-action binding (the verified op + host + guest + params must name the exact gated action). Idempotency keys the journal by the op nonce; every decision is audited (a signal, never the guard).
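
As a rough sketch of the policy step that runs after verification, the following assumes a simplified `VerifiedOp` and `Action` (both hypothetical stand-ins; the real `authz` types are not shown in this report) and a hypothetical `CheckPolicy` function. It only illustrates the two checks described above: role-scoping first, then exact op-to-action binding.

```go
package main

import (
	"errors"
	"fmt"
	"maps"
)

type Role string

const (
	RoleRecovery    Role = "recovery"
	RoleOperational Role = "operational"
)

// VerifiedOp is a hypothetical stand-in for what the verifier hands back.
type VerifiedOp struct {
	Role   Role
	Op     string
	Host   string
	Guest  string
	Params map[string]string
}

// Action is the concrete action sitting at the gate (hypothetical shape).
type Action struct {
	Op          string
	Host        string
	Guest       string
	Params      map[string]string
	KeyRotation bool // true when the action is a key rotation
}

var (
	ErrRoleDenied      = errors.New("role_denied")
	ErrBindingMismatch = errors.New("binding_mismatch")
)

// CheckPolicy applies role-scoping, then op-to-action binding.
func CheckPolicy(v VerifiedOp, a Action) error {
	switch v.Role {
	case RoleRecovery:
		if !a.KeyRotation {
			return ErrRoleDenied // recovery key = key rotation only
		}
	case RoleOperational:
		// ordinary destructive ops and planned rotation are both in scope
	default:
		return ErrRoleDenied // unknown role fails closed (an assumption here)
	}
	if v.Op != a.Op || v.Host != a.Host || v.Guest != a.Guest ||
		!maps.Equal(v.Params, a.Params) {
		return ErrBindingMismatch
	}
	return nil
}

func main() {
	params := map[string]string{"purge": "true"}
	action := Action{Op: "guest.destroy", Host: "pve1", Guest: "101", Params: params}
	signed := VerifiedOp{Role: RoleOperational, Op: "guest.destroy", Host: "pve1", Guest: "101", Params: params}

	fmt.Println(CheckPolicy(signed, action)) // matching op, operational key

	signed.Guest = "102"
	fmt.Println(CheckPolicy(signed, action)) // signature names another guest

	signed.Guest, signed.Role = "101", RoleRecovery
	fmt.Println(CheckPolicy(signed, action)) // recovery key on ordinary destructive
}
```

The op, host, and guest strings are made up for the example; only the ordering of the checks and the two rejection reasons come from the text above.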

Inert by design (slice-4 scope)

There is no live destructive execution this slice: nothing serves destructive deltas until slice 10, and the guest-destroy/storage-wipe/restore-overwrite executors land in slices 6/7. So the destructive path is fully classified, gated, and adversarially tested, but RunSignedJob's executor is nil in production, so an authorized destructive op is journaled as authorized-but-not-executed. Reconcile itself only produces the benign Start/Stop/SetConfig set, all allowed through the gate unsigned.
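
A minimal sketch of what "inert" means here, assuming a hypothetical in-memory `Journal` and a guessed signature for `RunSignedJob` (the report names the function and the `DestructiveExecutor` seam but not their shapes; the status strings are also invented for the example): with a nil executor an authorized op is recorded and nothing is run, and a second presentation of the same nonce is deduplicated.

```go
package main

import "fmt"

// DestructiveExecutor is the injected seam; its method set is an assumption.
type DestructiveExecutor interface {
	Execute(action string) error
}

// Journal is a hypothetical in-memory stand-in for the durable op journal,
// keyed by the op nonce.
type Journal struct {
	status map[string]string
}

func NewJournal() *Journal { return &Journal{status: make(map[string]string)} }

// RunSignedJob records an already-authorized op. It returns the journaled
// status and whether this nonce had been seen before.
func RunSignedJob(j *Journal, nonce, action string, exec DestructiveExecutor) (status string, replayed bool) {
	if prev, seen := j.status[nonce]; seen {
		return prev, true // idempotency dedupe by nonce
	}
	if exec == nil {
		// Inert slice: authorized, journaled, never executed.
		j.status[nonce] = "authorized_not_executed"
		return j.status[nonce], false
	}
	j.status[nonce] = "in_flight" // journal-wrapped execution
	if err := exec.Execute(action); err != nil {
		j.status[nonce] = "failed"
	} else {
		j.status[nonce] = "done"
	}
	return j.status[nonce], false
}

func main() {
	j := NewJournal()
	fmt.Println(RunSignedJob(j, "nonce-1", "destroy guest 101", nil))
	fmt.Println(RunSignedJob(j, "nonce-1", "destroy guest 101", nil))
}
```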

Adversarial proof (each case independently rejected)

Run against the real authz.Verifier with in-test-minted SSHSIGs (the ~40-line framing is replicated in reconcile's test binary — production authz is untouched and gains no signing capability; live minting is required because the verifier's clock is not cross-package injectable):

  • unsigned destructive job → pending_signature
  • unsigned destructive desired-state delta → pending_signature (distrusts hub desired state, not just jobs)
  • forged / unknown signer → ErrUnknownSigner
  • expired → ErrExpired
  • replayed nonce across an agent restart (durable FileNonceStore) → ErrReplay
  • wrong host → ErrTarget
  • wrong guest / wrong op / wrong params → binding_mismatch
  • recovery key on ordinary destructive → role_denied
  • hub-supplied "scratch" tag on a data-bearing guest → ignored, still destructive → refused
  • valid + correct role + correct target + fresh nonce → accepted, and a second presentation → ErrReplay

The two forward-looking notes

  • Note 1 (carried in) — the InFlight() resume-or-rollback startup consumer (Engine.Recover) landed together with the signed-op executor, as required. An op that crashed after the Proxmox POST but before its terminal record (OpTaskRunning, nonce already consumed) is not covered by idempotency dedupe — only this consumer resolves it (re-read the task via the new TaskStatusOnce, record the real outcome; a no-task-id op is abandoned fail-safe). Wired into daemon startup and tested.
  • Note 2 (addressed) — the memory comparison is canonicalized (desiredMemoryMiB): desired and actual compare in the same MiB unit that is then written, so a non-MiB-aligned MemoryBytes converges in one pass rather than re-issuing SetConfig every cycle. A test proves convergence. Recommendation stands that slice 10 serve MiB-aligned specs at the source.
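
Both notes can be illustrated with a small sketch. Everything here is an assumption made for the example: the report names `desiredMemoryMiB` and `TaskStatusOnce` but not their signatures, the rounding direction (truncation) is a guess, and `needsSetConfig`, `InFlightOp`, and `resolve` are hypothetical helpers.

```go
package main

import "fmt"

const bytesPerMiB = 1 << 20

// desiredMemoryMiB canonicalizes a byte count to whole MiB.
// Truncation is an assumption; the real rounding rule is not stated.
func desiredMemoryMiB(memoryBytes int64) int64 {
	return memoryBytes / bytesPerMiB
}

// needsSetConfig compares desired and actual in the same MiB unit that
// would be written, so a non-aligned byte count cannot drift forever.
func needsSetConfig(desiredBytes, actualMiB int64) bool {
	return desiredMemoryMiB(desiredBytes) != actualMiB
}

// InFlightOp is a hypothetical journal entry without a terminal record.
type InFlightOp struct {
	Nonce  string
	TaskID string // empty if the crash happened before a task id was recorded
}

// resolve decides the recorded outcome for an in-flight op at startup.
// taskStatusOnce stands in for the GuestAPI call of the same name.
func resolve(op InFlightOp, taskStatusOnce func(taskID string) string) string {
	if op.TaskID == "" {
		return "abandoned" // no task id: abandon fail-safe
	}
	return taskStatusOnce(op.TaskID) // record the real outcome
}

func main() {
	desired := int64(1<<30 + 1) // 1 GiB plus one byte, not MiB-aligned
	written := desiredMemoryMiB(desired)
	fmt.Println(needsSetConfig(desired, 512))     // before the write
	fmt.Println(needsSetConfig(desired, written)) // after one pass

	status := func(string) string { return "ok" }
	fmt.Println(resolve(InFlightOp{Nonce: "n1", TaskID: "task-1"}, status))
	fmt.Println(resolve(InFlightOp{Nonce: "n2"}, status))
}
```

A byte-for-byte comparison would keep seeing a one-byte difference in the first case and re-issue SetConfig every cycle; comparing in MiB removes that.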

Verification

  • go test -race -count=1 ./... and go vet ./... clean on the Linux build server (go1.26); all tests green locally and there.
  • No live Proxmox needed — Phase A is unfed and Phase B's destructive path is inert this slice. The gate's crypto path is proven end-to-end against the real verifier.

Conventions

Version → v0.4.0. CHANGELOG has a per-phase entry (newest on top). No secrets in any committed file. Pushed to main. Per the task, I stop at this checkpoint and await the validation pass.