v0.4.0: slice 4 Phase B — reversibility gate + signed-op consuming layer
The security core of slice 4: hub-supplied intent is no longer trusted for
destructive change. The gate fronts the per-guest queue's executor, so every
mutation passes it. Reuses internal/authz for all crypto (surface untouched).

- Classifier (doc 03 §4): benign vs destructive by provenance +
  data-bearing-ness, NOT by verb. Destroy/overwrite of customer data is
  destructive unless agent-internal provenance (same-journaled-txn create, or
  agent-tagged scratch) makes it benign — and that provenance is
  journal-recorded, NEVER hub-sourced. Unknown op class fails safe to
  destructive.
- Reversibility gate: benign -> allowed unsigned; destructive -> requires a
  verified, role-scoped, action-bound operator signature, else
  pending_signature and never executed. Every decision audited (signal, never
  the guard).
- Signed-op consuming layer over authz.Verifier.Verify (locked pipeline
  untouched): role-scoping (doc 04 §4 — recovery=rotation only,
  operational=ordinary destructive + planned rotation) + op-to-action binding
  (op+host+guest+params must match the gated action).
- Signed-job orchestration: idempotency dedupe by nonce + journal-wrapped
  execution via an injected DestructiveExecutor (nil this slice — inert).
- Crash recovery (Note 1): Engine.Recover consumes the journal InFlight() set
  at startup (resume-or-rollback) — covers an op that crashed after the POST
  and before its terminal record, which idempotency dedupe alone cannot. Added
  TaskStatusOnce to the GuestAPI seam. Wired into daemon startup.
- Note 2: memory comparison canonicalized to MiB (desiredMemoryMiB) so a
  non-MiB-aligned MemoryBytes converges in one pass, not perpetual drift.
- Daemon: builds the verifier from config signers (none = nil verifier, the
  common slice-4 state), the gate (+SlogAudit), runs Recover before mutating.

Adversarial matrix proven against the REAL authz.Verifier with in-test-minted
SSHSIGs (framing replicated in reconcile's test binary; authz untouched, no
signing added to the verify-only package):
- unsigned job + unsigned desired-state delta -> pending_signature
- unknown signer/expired/replay-across-restart/wrong host -> typed authz
  rejections
- wrong guest/op/params -> binding_mismatch
- recovery key on ordinary destructive -> role_denied
- hub-supplied scratch tag ignored -> refused
- valid+role+target+fresh nonce -> accepted then replay rejected

Full module race-clean + vet-clean on the Linux build server.

Inert this slice: no destructive deltas served until slice 10; the destructive
path is classified, gated, and tested but not wired to live execution.

CHECKPOINT: Phase B complete (slice 4 done). Awaiting validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -3,6 +3,75 @@
All notable changes to **felhom-agent** are recorded here. Update on every code
change that gets pushed.

## v0.4.0 — slice 4 Phase B: reversibility gate + signed-op consuming layer (2026-06-08)

The security core of slice 4: hub-supplied intent stops being trusted for destructive
change. Layered in front of the per-guest queue's executor — **every** mutation now
passes the gate. Reuses `internal/authz` for all crypto (untouched surface). Inert
this slice: no destructive deltas are served until slice 10, so the destructive path is
classified, gated, and adversarially tested but not wired to live execution.

### Added
- **Classifier (`classify.go`, doc 03 §4)** — benign vs destructive by **provenance +
  data-bearing-ness, NOT by verb**. The `OpClass` vocabulary (seeded by the committed
  slice-2 `op_blob.json`: `guest_destroy`) is the agent-side contract slice 10 matches.
  Destroy/overwrite of customer data is destructive UNLESS **agent-internal**
  provenance (same-journaled-transaction create → compensating rollback, or
  agent-tagged scratch) makes it benign. `Provenance` is journal-recorded and **never
  populated from the hub** (its zero value is the only thing an external intent may
  carry). Unknown op class fails safe → destructive.
- **Reversibility gate (`gate.go`)** — `Gate.Authorize(intent, signed)`: benign →
  allowed unsigned; destructive → requires a verified, role-authorized, action-bound
  operator signature, else refused **`pending_signature`**, never executed. Every
  decision is written to an `AuditSink` (audit is a signal, never the guard).
- **Signed-op consuming layer over `authz`** — verifies via `authz.Verifier.Verify`
  (the locked pipeline, untouched), then enforces on the `VerifiedOp`:
  - **Role-scoping (doc 04 §4)** — recovery key authorizes key-rotation re-pins ONLY;
    operational key authorizes ordinary destructive ops + planned rotation.
  - **Op-to-action binding** — verified `op` + host + guest + `params` must match the
    gated action (a signature for guest X / op A can't authorize guest Y / op B);
    params compared semantically (key-order/whitespace independent).
- **Signed-job orchestration (`job.go`)** — `RunSignedJob`: idempotency dedupe (the
  op nonce as the journal key — a redelivered completed op is skipped, not re-run),
  gate authorization, then journal-wrapped execution via an injected
  `DestructiveExecutor` (nil this slice — authorized destructive ops are inert, no
  executor wired until 6/7).
- **Crash-recovery consumer (`recover.go`, Note 1 / doc 03 §10)** — `Engine.Recover`
  consumes the journal's `InFlight()` at startup: an op that crashed AFTER the Proxmox
  POST and BEFORE its terminal record (`OpTaskRunning`, nonce already consumed) is NOT
  covered by idempotency dedupe — only this resume-or-rollback resolves it (re-read the
  task via the new `TaskStatusOnce`, record the real outcome; a no-task-id op is
  abandoned fail-safe). Landed together with the signed-op executor, as Note 1 required.
- **Daemon wiring** — `runDaemon` builds the verifier from `config.Authz.Signers` (a
  bad key / missing nonce-store path is a fatal misconfig; **no signers = nil verifier**,
  the common slice-4 state), constructs the gate (+ `SlogAudit`), runs `Recover` before
  issuing any mutation, and routes every reconcile action through the gate.
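
The classifier and gate entries above can be read as one decision function. The following is a minimal sketch under stated assumptions, not the code in `classify.go` or `gate.go`: only the names `OpClass`, `Provenance`, `guest_destroy`, and `pending_signature` come from this changelog. The struct fields, the `knownOps` table, the `DataBearing` flag, and the `signatureOK` boolean (standing in for the whole verify, role-scoping, and binding chain) are hypothetical.

```go
package main

import "fmt"

// OpClass and Provenance are named in the changelog; their real shapes are
// not shown there, so everything below is an illustrative assumption.
type OpClass string

const OpGuestDestroy OpClass = "guest_destroy" // the one op class the changelog names

// Provenance is recorded by the agent's journal only. An intent arriving
// from outside carries the zero value.
type Provenance struct {
	SameTxnCreate bool // created earlier in the same journaled transaction
	AgentScratch  bool // tagged as scratch by the agent itself
}

type Intent struct {
	Op          OpClass
	DataBearing bool       // assumed to be derived agent-side, never hub-supplied
	Provenance  Provenance // zero for anything external
}

type Decision string

const (
	Allowed          Decision = "allowed"
	PendingSignature Decision = "pending_signature"
)

// knownOps is a hypothetical stand-in for the OpClass vocabulary.
var knownOps = map[OpClass]bool{OpGuestDestroy: true}

// isDestructive classifies by provenance and data-bearing-ness, not by verb.
func isDestructive(in Intent) bool {
	if !knownOps[in.Op] {
		return true // unknown op class fails safe to destructive
	}
	if !in.DataBearing {
		return false
	}
	// Data-bearing targets are benign only with agent-internal provenance.
	return !(in.Provenance.SameTxnCreate || in.Provenance.AgentScratch)
}

// authorize lets benign intents through unsigned and holds destructive ones
// unless the signature chain succeeded.
func authorize(in Intent, signatureOK bool) Decision {
	if !isDestructive(in) || signatureOK {
		return Allowed
	}
	return PendingSignature
}

func main() {
	destroy := Intent{Op: OpGuestDestroy, DataBearing: true}
	rollback := destroy
	rollback.Provenance.SameTxnCreate = true

	fmt.Println(authorize(destroy, false))               // pending_signature
	fmt.Println(authorize(rollback, false))              // allowed
	fmt.Println(authorize(Intent{Op: "mystery"}, false)) // pending_signature
	fmt.Println(authorize(destroy, true))                // allowed
}
```

Because the same verb (`guest_destroy`) lands on both sides of the gate depending on provenance, a hub cannot downgrade a destructive op by renaming it or by attaching a tag of its own.
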

### Changed
- **Memory comparison canonicalized (Note 2)** — `desiredMemoryMiB` makes the
  desired↔actual memory compare in the same MiB unit that is then written, so a
  non-MiB-aligned `MemoryBytes` converges in one pass instead of re-issuing SetConfig
  forever (the numeric cousin of the description-newline normalization). Test proves
  convergence. Slice 10 should still serve MiB-aligned specs at the source.
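
To see why comparing in the written unit matters, here is a small sketch of the drift and its fix. Only the name `desiredMemoryMiB` comes from this changelog. Its signature, the truncating division, and the two comparison helpers are assumptions made for illustration.

```go
package main

import "fmt"

const mib = 1 << 20

// desiredMemoryMiB converts a byte count to whole MiB.
// Truncation is an assumption; the changelog does not state the rounding rule.
func desiredMemoryMiB(memoryBytes int64) int64 {
	return memoryBytes / mib
}

// needsSetConfig compares desired and actual in MiB, the unit that is written.
func needsSetConfig(desiredBytes, actualMiB int64) bool {
	return desiredMemoryMiB(desiredBytes) != actualMiB
}

// naiveNeedsSetConfig compares in bytes. For a non-MiB-aligned desired value
// it reports a difference on every pass, because the written value can never
// equal the desired byte count.
func naiveNeedsSetConfig(desiredBytes, actualMiB int64) bool {
	return desiredBytes != actualMiB*mib
}

func main() {
	desired := int64(2048*mib + 512)    // 2 GiB plus 512 stray bytes
	actual := desiredMemoryMiB(desired) // what the first pass writes: 2048

	fmt.Println(needsSetConfig(desired, actual))      // false: converged
	fmt.Println(naiveNeedsSetConfig(desired, actual)) // true: perpetual drift
}
```
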

### Tests (the security proof — each independently rejected)
- **Adversarial matrix** via the REAL `authz.Verifier` with in-test-minted SSHSIGs
  (framing replicated in reconcile's test binary; production authz untouched, no signing
  added to the verify-only package): unsigned destructive **job** → pending_signature;
  unsigned destructive **desired-state delta** → pending_signature (distrusts hub
  desired state, not just jobs); forged/unknown signer → `ErrUnknownSigner`; expired →
  `ErrExpired`; **replayed nonce across an agent restart** (durable `FileNonceStore`) →
  `ErrReplay`; wrong host → `ErrTarget`; wrong guest / wrong op / wrong params →
  binding_mismatch; **recovery key on ordinary destructive** → role_denied;
  **hub-supplied "scratch" tag ignored** → still destructive → refused; **valid + role +
  target + fresh nonce → accepted**, and a second presentation → `ErrReplay` (nonce
  consumed).
- Classifier (benign/destructive/provenance/key-rotation/fail-safe), role-scoping,
  params binding, crash-recovery (resume OK / fail / still-running / no-task rollback /
  unreadable / one-shot key applied on resume), signed-job idempotency (execute once,
  dedupe redelivery, refused-not-executed, no-executor-inert, executor-error).
- Full module **race-clean** (`go test -race`) + vet clean on the Linux build server.

## v0.4.0-rc1 — slice 4 Phase A: reconcile engine (structural; runs live, unfed) (2026-06-08)

The agent-side control core's structural half. **Checkpoint marker** — `-rc1` is the