1af21a6cac
The security core of slice 4: hub-supplied intent is no longer trusted for destructive change. The gate fronts the per-guest queue's executor, so every mutation passes it. Reuses internal/authz for all crypto (surface untouched). - Classifier (doc 03 §4): benign vs destructive by provenance + data-bearing- ness, NOT by verb. Destroy/overwrite of customer data is destructive unless agent-internal provenance (same-journaled-txn create, or agent-tagged scratch) makes it benign — and that provenance is journal-recorded, NEVER hub-sourced. Unknown op class fails safe to destructive. - Reversibility gate: benign -> allowed unsigned; destructive -> requires a verified, role-scoped, action-bound operator signature, else pending_signature and never executed. Every decision audited (signal, never the guard). - Signed-op consuming layer over authz.Verifier.Verify (locked pipeline untouched): role-scoping (doc 04 §4 — recovery=rotation only, operational= ordinary destructive + planned rotation) + op-to-action binding (op+host+ guest+params must match the gated action). - Signed-job orchestration: idempotency dedupe by nonce + journal-wrapped execution via an injected DestructiveExecutor (nil this slice — inert). - Crash recovery (Note 1): Engine.Recover consumes the journal InFlight() set at startup (resume-or-rollback) — covers an op that crashed after the POST and before its terminal record, which idempotency dedupe alone cannot. Added TaskStatusOnce to the GuestAPI seam. Wired into daemon startup. - Note 2: memory comparison canonicalized to MiB (desiredMemoryMiB) so a non-MiB-aligned MemoryBytes converges in one pass, not perpetual drift. - Daemon: builds the verifier from config signers (none = nil verifier, the common slice-4 state), the gate (+SlogAudit), runs Recover before mutating. Adversarial matrix proven against the REAL authz.Verifier with in-test-minted SSHSIGs (framing replicated in reconcile's test binary; authz untouched, no signing added to the verify-only package): unsigned job + unsigned desired-state delta -> pending_signature; unknown signer/expired/replay-across-restart/wrong host -> typed authz rejections; wrong guest/op/params -> binding_mismatch; recovery key on ordinary destructive -> role_denied; hub-supplied scratch tag ignored -> refused; valid+role+target+fresh nonce -> accepted then replay rejected. Full module race-clean + vet-clean on the Linux build server. Inert this slice: no destructive deltas served until slice 10; the destructive path is classified, gated, and tested but not wired to live execution. CHECKPOINT: Phase B complete (slice 4 done). Awaiting validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
292 lines
10 KiB
Go
292 lines
10 KiB
Go
package reconcile
|
|
|
|
import (
|
|
"encoding/json"
|
|
"log/slog"
|
|
"reflect"
|
|
"strconv"
|
|
"time"
|
|
|
|
"gitea.dooplex.hu/admin/felhom-agent/internal/authz"
|
|
)
|
|
|
|
// SourceKind records where an intent came from — audit/debug ONLY. Classification
|
|
// does NOT depend on it: a destructive desired-state delta and a destructive one-shot
|
|
// job are gated identically (the agent distrusts hub desired state for destructive
|
|
// change, not just jobs — doc 03 §4).
|
|
type SourceKind string
|
|
|
|
const (
|
|
SourceDesiredDelta SourceKind = "desired_delta"
|
|
SourceOneShotJob SourceKind = "one_shot_job"
|
|
)
|
|
|
|
// Intent is an intended mutation presented to the gate. For benign reconcile actions
|
|
// the engine builds one per planned Action; destructive intents (jobs / deltas) carry
|
|
// their op class + canonical params for binding.
|
|
type Intent struct {
|
|
Class OpClass
|
|
HostID string
|
|
GuestID string // blob-style guest id ("" = host-scoped); matches OpBlob.target.guest_id
|
|
VMID int // numeric, for queue routing (0 = host-scoped)
|
|
// ParamsJSON is the canonical params (matching the signed blob's `params`) used for
|
|
// op-to-action binding on destructive ops. Nil for benign actions (not bound).
|
|
ParamsJSON json.RawMessage
|
|
// Provenance is AGENT-INTERNAL only (never hub-sourced) — see classify.go.
|
|
Provenance Provenance
|
|
Source SourceKind
|
|
}
|
|
|
|
// SignedOp is the opaque operator-signed blob+signature pair the hub queues (doc 04
|
|
// §5). The agent never trusts it until authz.Verifier.Verify passes.
|
|
type SignedOp struct {
|
|
Blob []byte // the canonical OpBlob JSON bytes (verified over RAW bytes)
|
|
Sig []byte // the armored SSHSIG
|
|
}
|
|
|
|
// RefuseReason is a stable, machine-readable gate refusal reason.
|
|
type RefuseReason string
|
|
|
|
const (
|
|
ReasonBenign RefuseReason = "benign" // allowed, no signature needed
|
|
ReasonSigned RefuseReason = "signed" // allowed by a verified op
|
|
ReasonPendingSignature RefuseReason = "pending_signature" // destructive, no/again-needed signature
|
|
ReasonRejected RefuseReason = "rejected" // signature failed authz verification
|
|
ReasonRoleDenied RefuseReason = "role_denied" // signer role not authorized for this op class
|
|
ReasonBindingMismatch RefuseReason = "binding_mismatch" // signature is for a different action
|
|
)
|
|
|
|
// Decision is the gate verdict.
|
|
type Decision struct {
|
|
Allowed bool
|
|
Disposition Disposition
|
|
Reason RefuseReason
|
|
// Verified is the authenticated op when a signature authorized the action.
|
|
Verified *authz.VerifiedOp
|
|
// Err is the underlying authz rejection (errors.Is-friendly: ErrUnknownSigner,
|
|
// ErrExpired, ErrReplay, …) when Reason == ReasonRejected.
|
|
Err error
|
|
}
|
|
|
|
// OpVerifier is the crypto verifier seam — *authz.Verifier in production; a fake in
|
|
// gate unit tests. The gate never re-implements any crypto; it only consumes the
|
|
// verdict and enforces the policy layer on top (role-scoping + op-to-action binding).
|
|
type OpVerifier interface {
|
|
Verify(blob, sigArmored []byte) (*authz.VerifiedOp, error)
|
|
}
|
|
|
|
// AuditSink records every gate decision to the customer-visible audit log. Audit is a
|
|
// SIGNAL, never the guard (doc 03 §4 / doc 04 §5): a compromised hub could suppress a
|
|
// notice, which is exactly why the signature — not the audit — is the control.
|
|
type AuditSink interface {
|
|
Record(rec AuditRecord)
|
|
}
|
|
|
|
// AuditRecord is one audited gate decision.
|
|
type AuditRecord struct {
|
|
Time time.Time
|
|
Class OpClass
|
|
HostID string
|
|
GuestID string
|
|
Source SourceKind
|
|
Disposition Disposition
|
|
Allowed bool
|
|
Reason RefuseReason
|
|
KeyID string // matched signer's key id, when signed
|
|
Nonce string // the op nonce, when signed
|
|
}
|
|
|
|
// Gate is the reversibility gate: it sits in front of the per-guest queue's executor
|
|
// so EVERY mutation passes it. Benign intents are allowed unsigned; destructive
|
|
// intents require a verified, role-authorized, action-bound operator signature, else
|
|
// they are refused with pending_signature and never executed.
|
|
type Gate struct {
|
|
verifier OpVerifier // may be nil (no signers pinned) → destructive is always pending_signature
|
|
hostID string
|
|
audit AuditSink
|
|
logger *slog.Logger
|
|
}
|
|
|
|
// NewGate builds a gate. verifier may be nil when no signers are configured (the
|
|
// common slice-4 state) — then there is nothing destructive to authorize and any
|
|
// destructive intent is refused pending_signature. audit/logger default to no-ops.
|
|
func NewGate(verifier OpVerifier, hostID string, audit AuditSink, logger *slog.Logger) *Gate {
|
|
if audit == nil {
|
|
audit = noopAudit{}
|
|
}
|
|
if logger == nil {
|
|
logger = slog.New(slog.NewTextHandler(discard{}, nil))
|
|
}
|
|
return &Gate{verifier: verifier, hostID: hostID, audit: audit, logger: logger}
|
|
}
|
|
|
|
// Authorize classifies the intent and, for destructive intents, runs the full
|
|
// consuming-layer policy over the verifier verdict. It writes the decision to the
|
|
// audit log and returns it. It NEVER executes anything — the caller dispatches an
|
|
// Allowed decision onto the queue.
|
|
func (g *Gate) Authorize(intent Intent, signed *SignedOp) Decision {
|
|
disp := Classify(intent.Class, intent.Provenance)
|
|
|
|
// Benign: allowed without a signature.
|
|
if disp == Benign {
|
|
d := Decision{Allowed: true, Disposition: Benign, Reason: ReasonBenign}
|
|
g.record(intent, d)
|
|
return d
|
|
}
|
|
|
|
// Destructive from here: a verified, role-authorized, action-bound signature is
|
|
// mandatory. Missing signature OR no pinned verifier → pending_signature (refuse).
|
|
if signed == nil || g.verifier == nil {
|
|
d := Decision{Allowed: false, Disposition: Destructive, Reason: ReasonPendingSignature}
|
|
g.record(intent, d)
|
|
return d
|
|
}
|
|
|
|
// Crypto + namespace + allow-list + target + time + nonce — the LOCKED authz
|
|
// pipeline. The nonce is consumed (recorded) only if this passes.
|
|
vop, err := g.verifier.Verify(signed.Blob, signed.Sig)
|
|
if err != nil {
|
|
d := Decision{Allowed: false, Disposition: Destructive, Reason: ReasonRejected, Err: err}
|
|
g.record(intent, d)
|
|
return d
|
|
}
|
|
|
|
// Role-scoping (the slice-4 job per verifier.go): the signer's pinned role must be
|
|
// authorized for THIS op class.
|
|
if !roleAuthorizes(vop.Signer.Role, intent.Class) {
|
|
d := Decision{Allowed: false, Disposition: Destructive, Reason: ReasonRoleDenied, Verified: vop}
|
|
g.record(intent, d)
|
|
return d
|
|
}
|
|
|
|
// Op-to-action binding: the verified op must name THIS exact action (op + target +
|
|
// params) — a signature for "restore guest X" cannot authorize destroying guest Y.
|
|
if !g.bindsToAction(vop, intent) {
|
|
d := Decision{Allowed: false, Disposition: Destructive, Reason: ReasonBindingMismatch, Verified: vop}
|
|
g.record(intent, d)
|
|
return d
|
|
}
|
|
|
|
d := Decision{Allowed: true, Disposition: Destructive, Reason: ReasonSigned, Verified: vop}
|
|
g.record(intent, d)
|
|
return d
|
|
}
|
|
|
|
// roleAuthorizes enforces the doc 04 §4 two-key role model: the cold recovery key
|
|
// authorizes ONLY key-rotation re-pins; the operational key authorizes ordinary
|
|
// destructive ops AND planned key-rotation.
|
|
func roleAuthorizes(role authz.KeyRole, class OpClass) bool {
|
|
if class == ClassKeyRotation {
|
|
return role == authz.RoleOperational || role == authz.RoleRecovery
|
|
}
|
|
return role == authz.RoleOperational
|
|
}
|
|
|
|
// bindsToAction checks the verified op names this exact action: host (already checked
|
|
// by the verifier, re-asserted here), guest, op class, and params. This is the binding
|
|
// BEYOND the verifier's target check (doc 04 §2.3 binds host; this binds the full
|
|
// action).
|
|
func (g *Gate) bindsToAction(vop *authz.VerifiedOp, intent Intent) bool {
|
|
if vop.HostID != g.hostID || vop.HostID != intent.HostID {
|
|
return false
|
|
}
|
|
if vop.GuestID != intent.GuestID {
|
|
return false
|
|
}
|
|
if vop.Op != string(intent.Class) {
|
|
return false
|
|
}
|
|
return paramsEqual(vop.Params, intent.ParamsJSON)
|
|
}
|
|
|
|
// paramsEqual compares two JSON param objects semantically (key order / whitespace
|
|
// independent). Absent params on both sides ({} or empty) compare equal.
|
|
func paramsEqual(a, b json.RawMessage) bool {
|
|
ax, aok := decodeParams(a)
|
|
bx, bok := decodeParams(b)
|
|
if !aok || !bok {
|
|
return false
|
|
}
|
|
return reflect.DeepEqual(ax, bx)
|
|
}
|
|
|
|
func decodeParams(p json.RawMessage) (any, bool) {
|
|
if len(p) == 0 {
|
|
return map[string]any{}, true // absent == empty object
|
|
}
|
|
var v any
|
|
if err := json.Unmarshal(p, &v); err != nil {
|
|
return nil, false
|
|
}
|
|
if v == nil {
|
|
return map[string]any{}, true // explicit null == empty
|
|
}
|
|
return v, true
|
|
}
|
|
|
|
func (g *Gate) record(intent Intent, d Decision) {
|
|
rec := AuditRecord{
|
|
Time: time.Now().UTC(),
|
|
Class: intent.Class,
|
|
HostID: intent.HostID,
|
|
GuestID: intent.GuestID,
|
|
Source: intent.Source,
|
|
Disposition: d.Disposition,
|
|
Allowed: d.Allowed,
|
|
Reason: d.Reason,
|
|
}
|
|
if d.Verified != nil {
|
|
rec.KeyID = d.Verified.Signer.KeyID
|
|
rec.Nonce = d.Verified.Nonce
|
|
}
|
|
g.audit.Record(rec)
|
|
g.logger.Info("gate decision",
|
|
"class", intent.Class, "guest", intent.GuestID, "source", intent.Source,
|
|
"disposition", d.Disposition, "allowed", d.Allowed, "reason", d.Reason)
|
|
}
|
|
|
|
// intentForAction builds the gate Intent for a benign reconcile action. The provenance
|
|
// is the zero value (no agent-internal destroy evidence) and the source is the
|
|
// desired-state delta — reconcile never fabricates scratch/same-txn provenance.
|
|
func intentForAction(hostID string, act Action) Intent {
|
|
return Intent{
|
|
Class: classOfAction(act.Kind),
|
|
HostID: hostID,
|
|
GuestID: strconv.Itoa(act.VMID),
|
|
VMID: act.VMID,
|
|
Provenance: Provenance{}, // benign actions need none; never hub-sourced
|
|
Source: SourceDesiredDelta,
|
|
}
|
|
}
|
|
|
|
// noopAudit drops audit records (used when no sink is configured).
|
|
type noopAudit struct{}
|
|
|
|
func (noopAudit) Record(AuditRecord) {}
|
|
|
|
// SlogAudit is a minimal AuditSink that emits records to a logger. The durable,
|
|
// customer-visible audit log + its inclusion in the host-report (HostReport.AuditTail)
|
|
// is a later-slice concern; this keeps the signal flowing now without inventing that
|
|
// wire schema.
|
|
type SlogAudit struct{ Logger *slog.Logger }
|
|
|
|
// Record logs the audit entry at info level.
|
|
func (s SlogAudit) Record(rec AuditRecord) {
|
|
if s.Logger == nil {
|
|
return
|
|
}
|
|
s.Logger.Info("audit: gate decision",
|
|
"class", rec.Class, "host", rec.HostID, "guest", rec.GuestID, "source", rec.Source,
|
|
"disposition", rec.Disposition, "allowed", rec.Allowed, "reason", rec.Reason,
|
|
"key_id", rec.KeyID, "nonce", auditNonce(rec.Nonce))
|
|
}
|
|
|
|
// auditNonce shortens a nonce for the log (full nonce is high-cardinality; a prefix is
|
|
// enough to correlate without bloating logs).
|
|
func auditNonce(n string) string {
|
|
if len(n) <= 8 {
|
|
return n
|
|
}
|
|
return n[:8] + "…"
|
|
}
|