1af21a6cac
The security core of slice 4: hub-supplied intent is no longer trusted for
destructive change. The gate fronts the per-guest queue's executor, so every
mutation passes it. Reuses internal/authz for all crypto (surface untouched).

- Classifier (doc 03 §4): benign vs destructive by provenance +
  data-bearing-ness, NOT by verb. Destroy/overwrite of customer data is
  destructive unless agent-internal provenance (same-journaled-txn create, or
  agent-tagged scratch) makes it benign, and that provenance is
  journal-recorded, NEVER hub-sourced. Unknown op class fails safe to
  destructive.
- Reversibility gate: benign -> allowed unsigned; destructive -> requires a
  verified, role-scoped, action-bound operator signature, else
  pending_signature and never executed. Every decision audited (signal, never
  the guard).
- Signed-op consuming layer over authz.Verifier.Verify (locked pipeline
  untouched): role-scoping (doc 04 §4: recovery=rotation only,
  operational=ordinary destructive + planned rotation) + op-to-action binding
  (op+host+guest+params must match the gated action).
- Signed-job orchestration: idempotency dedupe by nonce + journal-wrapped
  execution via an injected DestructiveExecutor (nil this slice, so inert).
- Crash recovery (Note 1): Engine.Recover consumes the journal InFlight() set
  at startup (resume-or-rollback). This covers an op that crashed after the
  POST and before its terminal record, which idempotency dedupe alone cannot.
  Added TaskStatusOnce to the GuestAPI seam. Wired into daemon startup.
- Note 2: memory comparison canonicalized to MiB (desiredMemoryMiB) so a
  non-MiB-aligned MemoryBytes converges in one pass, not perpetual drift.
- Daemon: builds the verifier from config signers (none = nil verifier, the
  common slice-4 state), the gate (+SlogAudit), runs Recover before mutating.

Adversarial matrix proven against the REAL authz.Verifier with in-test-minted
SSHSIGs (framing replicated in reconcile's test binary; authz untouched, no
signing added to the verify-only package):

- unsigned job + unsigned desired-state delta -> pending_signature
- unknown signer/expired/replay-across-restart/wrong host -> typed authz
  rejections
- wrong guest/op/params -> binding_mismatch
- recovery key on ordinary destructive -> role_denied
- hub-supplied scratch tag ignored -> refused
- valid+role+target+fresh nonce -> accepted then replay rejected

Full module race-clean + vet-clean on the Linux build server.

Inert this slice: no destructive deltas served until slice 10; the destructive
path is classified, gated, and tested but not wired to live execution.

CHECKPOINT: Phase B complete (slice 4 done). Awaiting validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
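The classifier and gate rules described in the commit message can be sketched in isolation. This is a minimal illustration under stated assumptions, not the package's actual code: the `Op` struct, its fields, and the `classify` and `gate` helpers are hypothetical names, and signature verification is reduced to a boolean that a real verifier would supply.

```go
package main

import "fmt"

// Class is a hypothetical stand-in for the classifier's result type.
type Class string

const (
	Benign      Class = "benign"
	Destructive Class = "destructive"
)

// Op is a hypothetical, simplified description of a mutation.
type Op struct {
	KnownClass    bool // the op class is one the classifier recognizes
	DataBearing   bool // the target holds customer data
	AgentInternal bool // provenance recorded in the agent's own journal, never hub-supplied
}

// classify decides by provenance and data-bearing-ness, not by verb.
func classify(op Op) Class {
	if !op.KnownClass {
		return Destructive // unknown op class fails safe
	}
	if op.DataBearing && !op.AgentInternal {
		return Destructive
	}
	return Benign
}

// gate allows benign ops unsigned and holds destructive ops until a
// verified signature is present. sigVerified is an assumption standing in
// for the full verify + role-scope + binding pipeline.
func gate(c Class, sigVerified bool) (allowed bool, reason string) {
	switch {
	case c == Benign:
		return true, "benign"
	case sigVerified:
		return true, "signed"
	default:
		return false, "pending_signature"
	}
}

func main() {
	ops := []Op{
		{KnownClass: true},                                         // no customer data
		{KnownClass: true, DataBearing: true},                      // customer data
		{KnownClass: true, DataBearing: true, AgentInternal: true}, // agent scratch
		{KnownClass: false},                                        // unknown class
	}
	for _, op := range ops {
		c := classify(op)
		allowed, reason := gate(c, false)
		fmt.Printf("%+v -> %s allowed=%v reason=%s\n", op, c, allowed, reason)
	}
}
```

The point of the sketch is the ordering: classification never consults hub-supplied hints, and an unsigned destructive op is parked rather than rejected outright.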
121 lines
5.3 KiB
Go
package reconcile

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"gitea.dooplex.hu/admin/felhom-agent/internal/authz"
	"gitea.dooplex.hu/admin/felhom-agent/internal/proxmox"
)

// DestructiveExecutor performs an authorized destructive op against the host. At slice
// 4 there is NO live implementation (guest-destroy / storage-wipe / restore-overwrite
// executors land in slices 6/7): the consuming layer is wired and tested with fixture
// executors but never executes a real destructive op, because nothing serves
// destructive deltas until slice 10. It returns a Proxmox UPID (or "" for a synchronous
// op) so the journal/Recover path is identical to benign execution.
type DestructiveExecutor func(ctx context.Context, intent Intent, vop *authz.VerifiedOp) (upid string, err error)

// JobResult is the outcome of RunSignedJob.
type JobResult struct {
	Decision       Decision
	AlreadyApplied bool  // the op's idempotency key was already applied (deduped, not re-run)
	Executed       bool  // the executor ran and succeeded
	Err            error // execution error (after a successful authorization)
}

// RunSignedJob is the signed one-shot consuming layer (doc 03 §4(b) / doc 04). It adds
// idempotency dedupe + journaling around the gate:
//
//  1. Dedupe: if the op's idempotency key (its nonce) is already applied, skip. A
//     redelivered, already-completed op must not re-run (returns AlreadyApplied).
//  2. Gate: classify + verify + role-scope + op-to-action bind. A refusal returns the
//     Decision and executes nothing.
//  3. Journal + execute: record started → run the executor → record the task id →
//     record the terminal state under the idempotency key (so success marks the key
//     applied; a crash mid-execute is resolved by Recover, never by idempotency alone).
//
// exec may be nil. In that case an AUTHORIZED destructive op is journaled (OpStarted,
// then a terminal OpFailed with an empty idempotency key, so the key is never marked
// applied), nothing is executed, and the result carries a "no executor" error. This is
// the slice-4 inert state: the gate works, the executor doesn't exist yet. A REFUSED
// op never reaches exec.
func (e *Engine) RunSignedJob(ctx context.Context, intent Intent, signed *SignedOp, exec DestructiveExecutor) JobResult {
	idemKey := jobIdempotencyKey(signed)

	// 1. Idempotency dedupe (redelivery after a prior success).
	if idemKey != "" && e.journal != nil && e.journal.AlreadyApplied(idemKey) {
		e.logger.Info("job: idempotency key already applied; skipping", "key", auditNonce(idemKey))
		return JobResult{AlreadyApplied: true, Decision: Decision{Allowed: true, Reason: ReasonSigned}}
	}

	// 2. Gate (classification + the full signed-op consuming policy).
	dec := e.gate.Authorize(intent, signed)
	if !dec.Allowed {
		return JobResult{Decision: dec}
	}

	// 3. Journal + execute. Benign authorized ops (no signature path) also flow here if
	// routed as jobs; they carry no idempotency key and are simply executed.
	opID := e.nextJobOpID(intent)
	e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
		State: OpStarted, IdempKey: idemKey, At: time.Now().UTC()})

	if exec == nil {
		// Slice-4 inert: authorized, but no destructive executor wired. Record the
		// authorization terminally (do NOT mark applied: nothing actually ran).
		e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
			State: OpFailed, IdempKey: "", At: time.Now().UTC()})
		e.logger.Warn("job: authorized but no executor wired (slice-4 inert)", "class", intent.Class)
		return JobResult{Decision: dec, Err: fmt.Errorf("reconcile: no executor for %s (not wired this slice)", intent.Class)}
	}

	upid, err := exec(ctx, intent, dec.Verified)
	if err != nil {
		e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
			State: OpFailed, IdempKey: idemKey, At: time.Now().UTC()})
		return JobResult{Decision: dec, Err: err}
	}
	e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
		UPID: upid, State: OpTaskRunning, IdempKey: idemKey, At: time.Now().UTC()})

	if upid != "" {
		// Only the wait error is inspected; the returned task status is discarded.
		if _, err := e.api.WaitTask(ctx, upid, proxmox.WaitOptions{}); err != nil {
			e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
				UPID: upid, State: OpFailed, IdempKey: idemKey, At: time.Now().UTC()})
			return JobResult{Decision: dec, Err: err}
		}
	}

	// Terminal success: marks the idempotency key applied (survives restart).
	e.append(JournalEntry{OpID: opID, VMID: intent.VMID, Kind: string(intent.Class),
		UPID: upid, State: OpSucceeded, IdempKey: idemKey, At: time.Now().UTC()})
	return JobResult{Decision: dec, Executed: true}
}

// jobIdempotencyKey derives the idempotency key from the signed op's nonce, which is
// unique per op (≥128-bit, doc 04 §2.1) and already the anti-replay token, so reusing
// it as the journal dedupe key is exact. Parsed from the UNVERIFIED blob: it is only a
// map key here (the gate's verifier is the trust boundary), and a forged blob is
// refused at the gate regardless.
func jobIdempotencyKey(signed *SignedOp) string {
	if signed == nil || len(signed.Blob) == 0 {
		return ""
	}
	var b struct {
		Nonce string `json:"nonce"`
	}
	if json.Unmarshal(signed.Blob, &b) != nil {
		return ""
	}
	return b.Nonce
}

// nextJobOpID builds a per-attempt op id for a signed job (distinct namespace from
// reconcile op ids).
func (e *Engine) nextJobOpID(intent Intent) string {
	return "job-" + string(intent.Class) + "-" + intent.GuestID + "-" + nextSeq(&e.opSeq)
}