felhom-agent/internal/reconcile/gate.go
admin 1af21a6cac v0.4.0: slice 4 Phase B — reversibility gate + signed-op consuming layer
The security core of slice 4: hub-supplied intent is no longer trusted for
destructive change. The gate fronts the per-guest queue's executor, so every
mutation passes it. Reuses internal/authz for all crypto (surface untouched).

- Classifier (doc 03 §4): benign vs destructive by provenance + data-bearing-
  ness, NOT by verb. Destroy/overwrite of customer data is destructive unless
  agent-internal provenance (same-journaled-txn create, or agent-tagged scratch)
  makes it benign — and that provenance is journal-recorded, NEVER hub-sourced.
  Unknown op class fails safe to destructive.
- Reversibility gate: benign -> allowed unsigned; destructive -> requires a
  verified, role-scoped, action-bound operator signature, else pending_signature
  and never executed. Every decision audited (signal, never the guard).
- Signed-op consuming layer over authz.Verifier.Verify (locked pipeline
  untouched): role-scoping (doc 04 §4 — recovery=rotation only, operational=
  ordinary destructive + planned rotation) + op-to-action binding (op+host+
  guest+params must match the gated action).
- Signed-job orchestration: idempotency dedupe by nonce + journal-wrapped
  execution via an injected DestructiveExecutor (nil this slice — inert).
- Crash recovery (Note 1): Engine.Recover consumes the journal InFlight() set at
  startup (resume-or-rollback) — covers an op that crashed after the POST and
  before its terminal record, which idempotency dedupe alone cannot. Added
  TaskStatusOnce to the GuestAPI seam. Wired into daemon startup.
- Note 2: memory comparison canonicalized to MiB (desiredMemoryMiB) so a
  non-MiB-aligned MemoryBytes converges in one pass, not perpetual drift.
- Daemon: builds the verifier from config signers (none = nil verifier, the
  common slice-4 state), the gate (+SlogAudit), runs Recover before mutating.
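
The role-scoping rule in the list above (recovery = rotation only, operational = ordinary destructive + planned rotation) can be sketched as a small truth table. This is a minimal standalone sketch, not the package's code: `KeyRole`, `OpClass`, the string values, and `ClassGuestDestroy` are local stand-ins assumed for illustration.

```go
package main

import "fmt"

// KeyRole and OpClass are local stand-ins for authz.KeyRole and reconcile.OpClass.
// The string values below are assumptions made for this sketch.
type KeyRole string
type OpClass string

const (
	RoleOperational KeyRole = "operational"
	RoleRecovery    KeyRole = "recovery"

	ClassKeyRotation  OpClass = "key_rotation"
	ClassGuestDestroy OpClass = "guest_destroy" // hypothetical ordinary destructive class
)

// roleAuthorizes mirrors the two-key rule: the recovery key may only rotate keys;
// the operational key may rotate keys and run ordinary destructive ops.
// Any other role authorizes nothing.
func roleAuthorizes(role KeyRole, class OpClass) bool {
	if class == ClassKeyRotation {
		return role == RoleOperational || role == RoleRecovery
	}
	return role == RoleOperational
}

func main() {
	roles := []KeyRole{RoleOperational, RoleRecovery, "unknown"}
	classes := []OpClass{ClassKeyRotation, ClassGuestDestroy}
	for _, role := range roles {
		for _, class := range classes {
			fmt.Printf("%-12s %-14s -> %v\n", role, class, roleAuthorizes(role, class))
		}
	}
}
```

Writing the rule as an allow-list per class means an unrecognized role falls through to a refusal, which matches the fail-safe stance of the classifier.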

Adversarial matrix proven against the REAL authz.Verifier with in-test-minted
SSHSIGs (framing replicated in reconcile's test binary; authz untouched, no
signing added to the verify-only package): unsigned job + unsigned desired-state
delta -> pending_signature; unknown signer/expired/replay-across-restart/wrong
host -> typed authz rejections; wrong guest/op/params -> binding_mismatch;
recovery key on ordinary destructive -> role_denied; hub-supplied scratch tag
ignored -> refused; valid+role+target+fresh nonce -> accepted then replay
rejected. Full module race-clean + vet-clean on the Linux build server.
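
The wrong-params case above depends on params being compared as decoded JSON rather than as raw bytes. A minimal sketch of that comparison follows, assuming a simplified helper named `semanticEqual` (hypothetical; unlike the gate's `paramsEqual` it does not treat empty input as `{}`) and made-up param keys. It also shows a caveat of decoding into `any`: JSON numbers become `float64`, so two integers above 2^53 can compare equal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// semanticEqual reports whether two JSON documents decode to the same value.
// Key order and whitespace are ignored; malformed input never matches.
func semanticEqual(a, b []byte) bool {
	var av, bv any
	if json.Unmarshal(a, &av) != nil || json.Unmarshal(b, &bv) != nil {
		return false
	}
	return reflect.DeepEqual(av, bv)
}

func main() {
	// The keys "snapshot" and "force" are illustrative only.
	signed := []byte(`{"snapshot":"s1","force":true}`)
	reordered := []byte(`{ "force": true, "snapshot": "s1" }`)
	tampered := []byte(`{"snapshot":"s2","force":true}`)

	fmt.Println(semanticEqual(signed, reordered)) // true: same content, different layout
	fmt.Println(semanticEqual(signed, tampered))  // false: a value differs

	// Caveat: both integers decode to the same float64 (2^53).
	big1 := []byte(`{"n":9007199254740992}`)
	big2 := []byte(`{"n":9007199254740993}`)
	fmt.Println(semanticEqual(big1, big2)) // true
}
```

If params ever carry large integer identifiers, decoding through a `json.Decoder` with `UseNumber` keeps the digits intact, at the cost of treating `1` and `1.0` as different values.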

Inert this slice: no destructive deltas served until slice 10; the destructive
path is classified, gated, and tested but not wired to live execution.

CHECKPOINT: Phase B complete (slice 4 done). Awaiting validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 23:56:20 +02:00


package reconcile

import (
	"encoding/json"
	"log/slog"
	"reflect"
	"strconv"
	"time"

	"gitea.dooplex.hu/admin/felhom-agent/internal/authz"
)

// SourceKind records where an intent came from; it is for audit/debug ONLY.
// Classification does NOT depend on it: a destructive desired-state delta and a
// destructive one-shot job are gated identically (the agent distrusts hub desired
// state for destructive change, not just jobs; see doc 03 §4).
type SourceKind string

const (
	SourceDesiredDelta SourceKind = "desired_delta"
	SourceOneShotJob   SourceKind = "one_shot_job"
)

// Intent is an intended mutation presented to the gate. For benign reconcile actions
// the engine builds one per planned Action; destructive intents (jobs / deltas) carry
// their op class + canonical params for binding.
type Intent struct {
	Class   OpClass
	HostID  string
	GuestID string // blob-style guest id ("" = host-scoped); matches OpBlob.target.guest_id
	VMID    int    // numeric, for queue routing (0 = host-scoped)
	// ParamsJSON is the canonical params (matching the signed blob's `params`) used for
	// op-to-action binding on destructive ops. Nil for benign actions (not bound).
	ParamsJSON json.RawMessage
	// Provenance is AGENT-INTERNAL only (never hub-sourced); see classify.go.
	Provenance Provenance
	Source     SourceKind
}

// SignedOp is the opaque operator-signed blob+signature pair the hub queues (doc 04
// §5). The agent never trusts it until authz.Verifier.Verify passes.
type SignedOp struct {
	Blob []byte // the canonical OpBlob JSON bytes (verified over RAW bytes)
	Sig  []byte // the armored SSHSIG
}

// RefuseReason is a stable, machine-readable reason for a gate decision. Despite the
// name it also covers the two allow outcomes (benign, signed).
type RefuseReason string

const (
	ReasonBenign           RefuseReason = "benign"            // allowed, no signature needed
	ReasonSigned           RefuseReason = "signed"            // allowed by a verified op
	ReasonPendingSignature RefuseReason = "pending_signature" // destructive; signature missing or no verifier pinned
	ReasonRejected         RefuseReason = "rejected"          // signature failed authz verification
	ReasonRoleDenied       RefuseReason = "role_denied"       // signer role not authorized for this op class
	ReasonBindingMismatch  RefuseReason = "binding_mismatch"  // signature is for a different action
)

// Decision is the gate verdict.
type Decision struct {
	Allowed     bool
	Disposition Disposition
	Reason      RefuseReason
	// Verified is the authenticated op whenever authz verification passed. It is set
	// on an allowed signed decision and also on role_denied / binding_mismatch
	// refusals, so the audit record can name the signer.
	Verified *authz.VerifiedOp
	// Err is the underlying authz rejection (errors.Is-friendly: ErrUnknownSigner,
	// ErrExpired, ErrReplay, …) when Reason == ReasonRejected.
	Err error
}

// OpVerifier is the crypto verifier seam: *authz.Verifier in production, a fake in
// gate unit tests. The gate never re-implements any crypto; it only consumes the
// verdict and enforces the policy layer on top (role-scoping + op-to-action binding).
type OpVerifier interface {
	Verify(blob, sigArmored []byte) (*authz.VerifiedOp, error)
}

// AuditSink records every gate decision to the customer-visible audit log. Audit is a
// SIGNAL, never the guard (doc 03 §4 / doc 04 §5): a compromised hub could suppress a
// notice, which is exactly why the signature, not the audit, is the control.
type AuditSink interface {
	Record(rec AuditRecord)
}

// AuditRecord is one audited gate decision.
type AuditRecord struct {
	Time        time.Time
	Class       OpClass
	HostID      string
	GuestID     string
	Source      SourceKind
	Disposition Disposition
	Allowed     bool
	Reason      RefuseReason
	KeyID       string // matched signer's key id, when signed
	Nonce       string // the op nonce, when signed
}

// Gate is the reversibility gate: it sits in front of the per-guest queue's executor
// so EVERY mutation passes it. Benign intents are allowed unsigned; destructive
// intents require a verified, role-authorized, action-bound operator signature, else
// they are refused with pending_signature and never executed.
type Gate struct {
	verifier OpVerifier // may be nil (no signers pinned) → destructive is always pending_signature
	hostID   string
	audit    AuditSink
	logger   *slog.Logger
}

// NewGate builds a gate. verifier may be nil when no signers are configured (the
// common slice-4 state); in that case there is nothing destructive to authorize and
// any destructive intent is refused with pending_signature. audit/logger default to
// no-ops.
func NewGate(verifier OpVerifier, hostID string, audit AuditSink, logger *slog.Logger) *Gate {
	if audit == nil {
		audit = noopAudit{}
	}
	if logger == nil {
		logger = slog.New(slog.NewTextHandler(discard{}, nil))
	}
	return &Gate{verifier: verifier, hostID: hostID, audit: audit, logger: logger}
}

// Authorize classifies the intent and, for destructive intents, runs the full
// consuming-layer policy over the verifier verdict. It writes the decision to the
// audit log and returns it. It NEVER executes anything; the caller dispatches an
// Allowed decision onto the queue.
func (g *Gate) Authorize(intent Intent, signed *SignedOp) Decision {
	disp := Classify(intent.Class, intent.Provenance)

	// Benign: allowed without a signature.
	if disp == Benign {
		d := Decision{Allowed: true, Disposition: Benign, Reason: ReasonBenign}
		g.record(intent, d)
		return d
	}

	// Destructive from here: a verified, role-authorized, action-bound signature is
	// mandatory. Missing signature OR no pinned verifier → pending_signature (refuse).
	if signed == nil || g.verifier == nil {
		d := Decision{Allowed: false, Disposition: Destructive, Reason: ReasonPendingSignature}
		g.record(intent, d)
		return d
	}

	// Crypto + namespace + allow-list + target + time + nonce: the LOCKED authz
	// pipeline. The nonce is consumed (recorded) only if this passes.
	vop, err := g.verifier.Verify(signed.Blob, signed.Sig)
	if err != nil {
		d := Decision{Allowed: false, Disposition: Destructive, Reason: ReasonRejected, Err: err}
		g.record(intent, d)
		return d
	}

	// Role-scoping (the slice-4 job per verifier.go): the signer's pinned role must be
	// authorized for THIS op class.
	if !roleAuthorizes(vop.Signer.Role, intent.Class) {
		d := Decision{Allowed: false, Disposition: Destructive, Reason: ReasonRoleDenied, Verified: vop}
		g.record(intent, d)
		return d
	}

	// Op-to-action binding: the verified op must name THIS exact action (op + target +
	// params). A signature for "restore guest X" cannot authorize destroying guest Y.
	if !g.bindsToAction(vop, intent) {
		d := Decision{Allowed: false, Disposition: Destructive, Reason: ReasonBindingMismatch, Verified: vop}
		g.record(intent, d)
		return d
	}

	d := Decision{Allowed: true, Disposition: Destructive, Reason: ReasonSigned, Verified: vop}
	g.record(intent, d)
	return d
}

// roleAuthorizes enforces the doc 04 §4 two-key role model: the cold recovery key
// authorizes ONLY key-rotation re-pins; the operational key authorizes ordinary
// destructive ops AND planned key-rotation.
func roleAuthorizes(role authz.KeyRole, class OpClass) bool {
	if class == ClassKeyRotation {
		return role == authz.RoleOperational || role == authz.RoleRecovery
	}
	return role == authz.RoleOperational
}

// bindsToAction checks the verified op names this exact action: host (already checked
// by the verifier, re-asserted here), guest, op class, and params. This is the binding
// BEYOND the verifier's target check (doc 04 §2.3 binds host; this binds the full
// action).
func (g *Gate) bindsToAction(vop *authz.VerifiedOp, intent Intent) bool {
	if vop.HostID != g.hostID || vop.HostID != intent.HostID {
		return false
	}
	if vop.GuestID != intent.GuestID {
		return false
	}
	if vop.Op != string(intent.Class) {
		return false
	}
	return paramsEqual(vop.Params, intent.ParamsJSON)
}
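
// paramsEqual compares two JSON param objects semantically (key order / whitespace
// independent). Absent params on both sides ({} or empty) compare equal. Malformed
// JSON on either side never matches. Note that numbers decode as float64, so integer
// params beyond 2^53 lose precision in this comparison.
func paramsEqual(a, b json.RawMessage) bool {
	ax, aok := decodeParams(a)
	bx, bok := decodeParams(b)
	if !aok || !bok {
		return false
	}
	return reflect.DeepEqual(ax, bx)
}

// decodeParams decodes params into a generic value, normalizing absent or null
// params to an empty object. The bool is false when the input is not valid JSON.
func decodeParams(p json.RawMessage) (any, bool) {
	if len(p) == 0 {
		return map[string]any{}, true // absent == empty object
	}
	var v any
	if err := json.Unmarshal(p, &v); err != nil {
		return nil, false
	}
	if v == nil {
		return map[string]any{}, true // explicit null == empty
	}
	return v, true
}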

// record writes one decision to the audit sink and to the debug logger. Signer key
// id and nonce are included only when a verified op is attached to the decision.
func (g *Gate) record(intent Intent, d Decision) {
	rec := AuditRecord{
		Time:        time.Now().UTC(),
		Class:       intent.Class,
		HostID:      intent.HostID,
		GuestID:     intent.GuestID,
		Source:      intent.Source,
		Disposition: d.Disposition,
		Allowed:     d.Allowed,
		Reason:      d.Reason,
	}
	if d.Verified != nil {
		rec.KeyID = d.Verified.Signer.KeyID
		rec.Nonce = d.Verified.Nonce
	}
	g.audit.Record(rec)
	g.logger.Info("gate decision",
		"class", intent.Class, "guest", intent.GuestID, "source", intent.Source,
		"disposition", d.Disposition, "allowed", d.Allowed, "reason", d.Reason)
}

// intentForAction builds the gate Intent for a benign reconcile action. The provenance
// is the zero value (no agent-internal destroy evidence) and the source is the
// desired-state delta; reconcile never fabricates scratch/same-txn provenance.
func intentForAction(hostID string, act Action) Intent {
	return Intent{
		Class:      classOfAction(act.Kind),
		HostID:     hostID,
		GuestID:    strconv.Itoa(act.VMID),
		VMID:       act.VMID,
		Provenance: Provenance{}, // benign actions need none; never hub-sourced
		Source:     SourceDesiredDelta,
	}
}

// noopAudit drops audit records (used when no sink is configured).
type noopAudit struct{}

func (noopAudit) Record(AuditRecord) {}

// SlogAudit is a minimal AuditSink that emits records to a logger. The durable,
// customer-visible audit log + its inclusion in the host-report (HostReport.AuditTail)
// is a later-slice concern; this keeps the signal flowing now without inventing that
// wire schema.
type SlogAudit struct{ Logger *slog.Logger }

// Record logs the audit entry at info level.
func (s SlogAudit) Record(rec AuditRecord) {
	if s.Logger == nil {
		return
	}
	s.Logger.Info("audit: gate decision",
		"class", rec.Class, "host", rec.HostID, "guest", rec.GuestID, "source", rec.Source,
		"disposition", rec.Disposition, "allowed", rec.Allowed, "reason", rec.Reason,
		"key_id", rec.KeyID, "nonce", auditNonce(rec.Nonce))
}

// auditNonce shortens a nonce for the log (full nonce is high-cardinality; a prefix is
// enough to correlate without bloating logs).
func auditNonce(n string) string {
	if len(n) <= 8 {
		return n
	}
	return n[:8] + "…"
}
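
NewGate treats a nil verifier as "no signers pinned", and Authorize tests `g.verifier == nil` on an interface-typed field. One Go detail is worth keeping in mind when wiring this up: an interface holding a nil pointer is not itself nil. The sketch below illustrates that language behavior with local stand-in types; `Verifier`, `hasVerifier`, and `buildVerifier` are hypothetical names, and nothing in the source indicates the daemon actually passes a typed nil.

```go
package main

import "fmt"

// Verifier is a stand-in for a concrete verifier type (hypothetical).
type Verifier struct{}

// Verify is a placeholder; the real verification pipeline is not modeled here.
func (v *Verifier) Verify(blob, sig []byte) error { return nil }

// OpVerifier is a local stand-in for the gate's verifier seam.
type OpVerifier interface {
	Verify(blob, sig []byte) error
}

// hasVerifier mirrors a plain nil check on an interface value.
func hasVerifier(v OpVerifier) bool { return v != nil }

// buildVerifier returns an untyped nil interface when no signers are configured,
// so the caller's nil check behaves as expected.
func buildVerifier(signers int) OpVerifier {
	if signers == 0 {
		return nil
	}
	return &Verifier{}
}

func main() {
	var typedNil *Verifier // a nil pointer of concrete type

	fmt.Println(hasVerifier(nil))              // false
	fmt.Println(hasVerifier(typedNil))         // true: the interface wraps (*Verifier)(nil)
	fmt.Println(hasVerifier(buildVerifier(0))) // false
}
```

Returning the interface type from the constructor, as `buildVerifier` does, keeps the "no signers" case an untyped nil, so a destructive intent still lands on pending_signature instead of calling into a nil pointer.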