felhom-agent/internal/reconcile/job.go
admin 1af21a6cac v0.4.0: slice 4 Phase B — reversibility gate + signed-op consuming layer
The security core of slice 4: hub-supplied intent is no longer trusted for
destructive change. The gate fronts the per-guest queue's executor, so every
mutation passes it. Reuses internal/authz for all crypto (surface untouched).

- Classifier (doc 03 §4): benign vs destructive by provenance and by whether
  the target is data-bearing, NOT by verb. Destroy/overwrite of customer data
  is destructive unless agent-internal provenance (same-journaled-txn create,
  or agent-tagged scratch) makes it benign, and that provenance is
  journal-recorded, NEVER hub-sourced. Unknown op class fails safe to
  destructive.
- Reversibility gate: benign -> allowed unsigned; destructive -> requires a
  verified, role-scoped, action-bound operator signature, else it is marked
  pending_signature and never executed. Every decision is audited (the audit
  is a signal, never the guard).
- Signed-op consuming layer over authz.Verifier.Verify (locked pipeline
  untouched): role-scoping (doc 04 §4 — recovery=rotation only, operational=
  ordinary destructive + planned rotation) + op-to-action binding (op+host+
  guest+params must match the gated action).
- Signed-job orchestration: idempotency dedupe by nonce + journal-wrapped
  execution via an injected DestructiveExecutor (nil this slice — inert).
- Crash recovery (Note 1): Engine.Recover consumes the journal InFlight() set at
  startup (resume-or-rollback) — covers an op that crashed after the POST and
  before its terminal record, which idempotency dedupe alone cannot. Added
  TaskStatusOnce to the GuestAPI seam. Wired into daemon startup.
- Note 2: memory comparison canonicalized to MiB (desiredMemoryMiB) so a
  non-MiB-aligned MemoryBytes converges in one pass, not perpetual drift.
- Daemon: builds the verifier from config signers (none = nil verifier, the
  common slice-4 state), the gate (+SlogAudit), runs Recover before mutating.
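
The classifier and gate rules in the list above can be sketched in isolation. This is a minimal illustration, not the reconcile package's code: `Op`, `classify`, `authorize`, every field name, and the result strings `benign` and `signed` are hypothetical (only `pending_signature` comes from the text), and the whole signature check is reduced to one boolean standing in for a verified, role-scoped, action-bound signature.

```go
package main

import "fmt"

// Class is a hypothetical stand-in for the classifier's result.
type Class int

const (
	Benign Class = iota
	Destructive
)

// Op is a hypothetical view of a mutation. The Journal* fields are assumed to
// come from the agent's own journal; HubScratchTag models a hub-supplied claim.
type Op struct {
	Known          bool // op class is recognized
	DestroysData   bool // destroys or overwrites its target
	DataBearing    bool // target may hold customer data
	JournalCreated bool // journal: target created in the same journaled transaction
	JournalScratch bool // journal: target tagged as agent scratch
	HubScratchTag  bool // hub claims "scratch"; never consulted below
}

// classify decides by provenance and data-bearing-ness, not by verb.
func classify(op Op) Class {
	if !op.Known {
		return Destructive // unknown op class fails safe
	}
	if !op.DestroysData || !op.DataBearing {
		return Benign
	}
	if op.JournalCreated || op.JournalScratch {
		return Benign // only agent-internal provenance can downgrade
	}
	return Destructive
}

// authorize lets benign ops through unsigned and parks unsigned destructive ops.
func authorize(op Op, signatureOK bool) string {
	switch {
	case classify(op) == Benign:
		return "benign"
	case signatureOK:
		return "signed"
	default:
		return "pending_signature"
	}
}

func main() {
	wipe := Op{Known: true, DestroysData: true, DataBearing: true, HubScratchTag: true}
	fmt.Println(authorize(wipe, false)) // pending_signature: the hub tag is ignored
	fmt.Println(authorize(wipe, true))  // signed

	scratch := Op{Known: true, DestroysData: true, DataBearing: true, JournalScratch: true}
	fmt.Println(authorize(scratch, false)) // benign

	fmt.Println(authorize(Op{}, false)) // pending_signature: unknown class
}
```

The point of the sketch is that `HubScratchTag` appears in the input but in no branch: a hub cannot talk an op into the benign class.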

Adversarial matrix proven against the REAL authz.Verifier with in-test-minted
SSHSIGs (framing replicated in reconcile's test binary; authz untouched, no
signing added to the verify-only package): unsigned job + unsigned desired-state
delta -> pending_signature; unknown signer/expired/replay-across-restart/wrong
host -> typed authz rejections; wrong guest/op/params -> binding_mismatch;
recovery key on ordinary destructive -> role_denied; hub-supplied scratch tag
ignored -> refused; valid+role+target+fresh nonce -> accepted then replay
rejected. Full module race-clean + vet-clean on the Linux build server.
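
Two rows of the matrix above (role scoping and op-to-action binding) can be expressed as a small table-driven check. This sketch assumes the signature has already been verified; the `Action` struct, the op names `rotate-key` and `destroy-guest`, the host and guest values, and `checkSigned` are all hypothetical. The result strings `role_denied` and `binding_mismatch` are taken from the text. A wrong host, which the text says is rejected with a typed authz error, is not modeled here.

```go
package main

import "fmt"

// Role mirrors the two signer roles named in the text.
type Role string

const (
	RoleRecovery    Role = "recovery"
	RoleOperational Role = "operational"
)

// Action is a hypothetical op+host+guest+params tuple.
type Action struct {
	Op, Host, Guest, Params string
}

// checkSigned applies role scoping first, then requires the signed action to
// equal the gated action field for field.
func checkSigned(role Role, signed, gated Action) string {
	switch role {
	case RoleRecovery:
		if signed.Op != "rotate-key" { // recovery = rotation only
			return "role_denied"
		}
	case RoleOperational:
		// ordinary destructive ops and planned rotation are both in scope
	default:
		return "role_denied"
	}
	if signed != gated {
		return "binding_mismatch"
	}
	return "accepted"
}

func main() {
	gated := Action{Op: "destroy-guest", Host: "pve1", Guest: "101", Params: "purge=1"}
	wrongGuest := gated
	wrongGuest.Guest = "102"

	cases := []struct {
		name   string
		role   Role
		signed Action
	}{
		{"valid operational", RoleOperational, gated},
		{"wrong guest", RoleOperational, wrongGuest},
		{"recovery key on ordinary destructive", RoleRecovery, gated},
	}
	for _, c := range cases {
		fmt.Printf("%s -> %s\n", c.name, checkSigned(c.role, c.signed, gated))
	}
}
```

Comparing the whole struct keeps the binding check exhaustive: a field added to `Action` later is covered without touching `checkSigned`.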

Inert this slice: no destructive deltas served until slice 10; the destructive
path is classified, gated, and tested but not wired to live execution.

CHECKPOINT: Phase B complete (slice 4 done). Awaiting validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 23:56:20 +02:00

package reconcile

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"gitea.dooplex.hu/admin/felhom-agent/internal/authz"
	"gitea.dooplex.hu/admin/felhom-agent/internal/proxmox"
)

// DestructiveExecutor performs an authorized destructive op against the host. At slice
// 4 there is NO live implementation (guest-destroy / storage-wipe / restore-overwrite
// executors land in slices 6/7) — the consuming layer is wired and tested with fixture
// executors but never executes a real destructive op, because nothing serves
// destructive deltas until slice 10. It returns a Proxmox UPID (or "" for a synchronous
// op) so the journal/Recover path is identical to benign execution.
type DestructiveExecutor func(ctx context.Context, intent Intent, vop *authz.VerifiedOp) (upid string, err error)

// JobResult is the outcome of RunSignedJob.
type JobResult struct {
	Decision       Decision
	AlreadyApplied bool  // the op's idempotency key was already applied (deduped, not re-run)
	Executed       bool  // the executor ran and succeeded
	Err            error // execution error (after a successful authorization)
}

// RunSignedJob is the signed one-shot consuming layer (doc 03 §4(b) / doc 04). It adds
// idempotency dedupe + journaling around the gate:
//
//  1. Dedupe: if the op's idempotency key (its nonce) is already applied, skip. A
//     redelivered, already-completed op must not re-run (returns AlreadyApplied).
//  2. Gate: classify + verify + role-scope + op-to-action bind. A refusal returns the
//     Decision and executes nothing.
//  3. Journal + execute: record started → run the executor → record the task id →
//     record the terminal state under the idempotency key (so success marks the key
//     applied; a crash mid-execute is resolved by Recover, never by idempotency alone).
//
// exec may be nil. In that case an AUTHORIZED destructive op is journaled (started, then
// closed as OpFailed without an idempotency key) but not executed, and the result carries
// an error. This is the slice-4 inert state: the gate works, the executor doesn't exist
// yet. A REFUSED op never reaches exec.
func (e *Engine) RunSignedJob(ctx context.Context, intent Intent, signed *SignedOp, exec DestructiveExecutor) JobResult {
	idemKey := jobIdempotencyKey(signed)

	// 1. Idempotency dedupe (redelivery after a prior success).
	if idemKey != "" && e.journal != nil && e.journal.AlreadyApplied(idemKey) {
		e.logger.Info("job: idempotency key already applied; skipping", "key", auditNonce(idemKey))
		return JobResult{AlreadyApplied: true, Decision: Decision{Allowed: true, Reason: ReasonSigned}}
	}

	// 2. Gate (classification + the full signed-op consuming policy).
	dec := e.gate.Authorize(intent, signed)
	if !dec.Allowed {
		return JobResult{Decision: dec}
	}

	// 3. Journal + execute. Benign authorized ops (no signature path) also flow here if
	// routed as jobs; they carry no idempotency key and are simply executed.
	opID := e.nextJobOpID(intent)
	e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
		State: OpStarted, IdempKey: idemKey, At: time.Now().UTC()})

	if exec == nil {
		// Slice-4 inert: authorized, but no destructive executor wired. Close the
		// journal entry as OpFailed with an empty idempotency key, so the key is
		// NOT marked applied (nothing actually ran).
		e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
			State: OpFailed, IdempKey: "", At: time.Now().UTC()})
		e.logger.Warn("job: authorized but no executor wired (slice-4 inert)", "class", intent.Class)
		return JobResult{Decision: dec, Err: fmt.Errorf("reconcile: no executor for %s (not wired this slice)", intent.Class)}
	}

	upid, err := exec(ctx, intent, dec.Verified)
	if err != nil {
		e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
			State: OpFailed, IdempKey: idemKey, At: time.Now().UTC()})
		return JobResult{Decision: dec, Err: err}
	}
	e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
		UPID: upid, State: OpTaskRunning, IdempKey: idemKey, At: time.Now().UTC()})

	if upid != "" {
		// The returned task status is discarded; only WaitTask's error is treated
		// as a failure here.
		if _, err := e.api.WaitTask(ctx, upid, proxmox.WaitOptions{}); err != nil {
			e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
				UPID: upid, State: OpFailed, IdempKey: idemKey, At: time.Now().UTC()})
			return JobResult{Decision: dec, Err: err}
		}
	}

	// Terminal success: marks the idempotency key applied (survives restart).
	e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
		UPID: upid, State: OpSucceeded, IdempKey: idemKey, At: time.Now().UTC()})
	return JobResult{Decision: dec, Executed: true}
}

// jobIdempotencyKey derives the idempotency key from the signed op's nonce, unique
// per op (≥128-bit, doc 04 §2.1) and already the anti-replay token, so reusing it as
// the journal dedupe key is exact. Parsed from the UNVERIFIED blob: it is only a map
// key here (the gate's verifier is the trust boundary), and a forged blob is refused at
// the gate regardless.
func jobIdempotencyKey(signed *SignedOp) string {
	if signed == nil || len(signed.Blob) == 0 {
		return ""
	}
	var b struct {
		Nonce string `json:"nonce"`
	}
	if json.Unmarshal(signed.Blob, &b) != nil {
		return ""
	}
	return b.Nonce
}

// nextJobOpID builds a per-attempt op id for a signed job (distinct namespace from
// reconcile op ids).
func (e *Engine) nextJobOpID(intent Intent) string {
	return "job-" + string(intent.Class) + "-" + intent.GuestID + "-" + nextSeq(&e.opSeq)
}
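
The nonce-keyed dedupe used by RunSignedJob and jobIdempotencyKey can be tried out in isolation. The sketch below is a standalone approximation under stated assumptions: `nonceOf` follows the same parsing approach as jobIdempotencyKey, while `applied` and `runOnce` are hypothetical in-memory stand-ins for the journal's AlreadyApplied lookup. It has no gate, no journal, and no crash recovery, and it assumes a failed run leaves the key unapplied.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// nonceOf reads only the "nonce" field from an unverified JSON blob and
// returns "" when the blob is empty or cannot be parsed.
func nonceOf(blob []byte) string {
	if len(blob) == 0 {
		return ""
	}
	var b struct {
		Nonce string `json:"nonce"`
	}
	if json.Unmarshal(blob, &b) != nil {
		return ""
	}
	return b.Nonce
}

// applied is a hypothetical in-memory set of keys whose op already succeeded.
type applied map[string]bool

// runOnce skips run when the blob's nonce is already applied, and marks the
// nonce applied only after run succeeds. Blobs without a nonce always run.
func (a applied) runOnce(blob []byte, run func() error) (executed bool, err error) {
	key := nonceOf(blob)
	if key != "" && a[key] {
		return false, nil // redelivery after a prior success
	}
	if err := run(); err != nil {
		return false, err // key stays unapplied
	}
	if key != "" {
		a[key] = true
	}
	return true, nil
}

func main() {
	seen := applied{}
	runs := 0
	ok := func() error { runs++; return nil }

	blob := []byte(`{"nonce":"n-1","op":"destroy-guest"}`)
	first, _ := seen.runOnce(blob, ok)
	second, _ := seen.runOnce(blob, ok)
	fmt.Println(first, second, runs) // true false 1

	fail := func() error { return errors.New("boom") }
	_, err := seen.runOnce([]byte(`{"nonce":"n-2"}`), fail)
	fmt.Println(err, seen["n-2"]) // boom false
}
```

As the RunSignedJob comment notes, dedupe of this kind only covers redelivery after a recorded success; an op that crashed mid-execute still needs the journal-driven Recover path.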