felhom-agent/internal/reconcile/state.go
admin 1af21a6cac v0.4.0: slice 4 Phase B — reversibility gate + signed-op consuming layer
The security core of slice 4: hub-supplied intent is no longer trusted for
destructive change. The gate fronts the per-guest queue's executor, so every
mutation passes it. Reuses internal/authz for all crypto (surface untouched).

- Classifier (doc 03 §4): benign vs destructive by provenance + data-bearing-
  ness, NOT by verb. Destroy/overwrite of customer data is destructive unless
  agent-internal provenance (same-journaled-txn create, or agent-tagged scratch)
  makes it benign — and that provenance is journal-recorded, NEVER hub-sourced.
  Unknown op class fails safe to destructive.
- Reversibility gate: benign -> allowed unsigned; destructive -> requires a
  verified, role-scoped, action-bound operator signature, else pending_signature
  and never executed. Every decision audited (signal, never the guard).
- Signed-op consuming layer over authz.Verifier.Verify (locked pipeline
  untouched): role-scoping (doc 04 §4 — recovery=rotation only, operational=
  ordinary destructive + planned rotation) + op-to-action binding (op+host+
  guest+params must match the gated action).
- Signed-job orchestration: idempotency dedupe by nonce + journal-wrapped
  execution via an injected DestructiveExecutor (nil this slice — inert).
- Crash recovery (Note 1): Engine.Recover consumes the journal InFlight() set at
  startup (resume-or-rollback) — covers an op that crashed after the POST and
  before its terminal record, which idempotency dedupe alone cannot. Added
  TaskStatusOnce to the GuestAPI seam. Wired into daemon startup.
- Note 2: memory comparison canonicalized to MiB (desiredMemoryMiB) so a
  non-MiB-aligned MemoryBytes converges in one pass, not perpetual drift.
- Daemon: builds the verifier from config signers (none = nil verifier, the
  common slice-4 state), the gate (+SlogAudit), runs Recover before mutating.
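
The classifier and gate rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual code: the `Provenance` and `Op` types, their fields, and the `destructive` and `decide` functions are hypothetical names, and `sigOK` stands in for a signature that has already passed verification, role-scoping, and action binding.

```go
package main

import "fmt"

// Provenance holds agent-internal facts read from the journal.
// Hypothetical shape; nothing here is ever filled from hub input.
type Provenance struct {
	SameTxnCreate bool // target was created in the same journaled transaction
	AgentScratch  bool // target carries an agent-applied scratch tag
}

// Op is a hypothetical, simplified view of one gated mutation.
type Op struct {
	KnownClass   bool // false when the op class is not recognized
	DestroysData bool // destroys or overwrites customer data
	Prov         Provenance
}

// destructive reports whether op needs an operator signature.
func destructive(op Op) bool {
	if !op.KnownClass {
		return true // unknown op class fails safe to destructive
	}
	if !op.DestroysData {
		return false
	}
	// Only journal-recorded provenance can downgrade a destroy/overwrite.
	return !(op.Prov.SameTxnCreate || op.Prov.AgentScratch)
}

// decide applies the gate: benign ops pass unsigned, destructive ops
// pass only with a fully checked signature (sigOK, an assumption here).
func decide(op Op, sigOK bool) string {
	if !destructive(op) || sigOK {
		return "allowed"
	}
	return "pending_signature"
}

func main() {
	resize := Op{KnownClass: true}
	wipe := Op{KnownClass: true, DestroysData: true}
	fmt.Println(decide(resize, false))
	fmt.Println(decide(wipe, false))
	fmt.Println(decide(wipe, true))
	fmt.Println(decide(Op{}, false))
}
```

The zero-value `Op{}` is treated as destructive, so forgetting to classify an op can only make the gate stricter.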

Adversarial matrix proven against the REAL authz.Verifier with in-test-minted
SSHSIGs (framing replicated in reconcile's test binary; authz untouched, no
signing added to the verify-only package): unsigned job + unsigned desired-state
delta -> pending_signature; unknown signer/expired/replay-across-restart/wrong
host -> typed authz rejections; wrong guest/op/params -> binding_mismatch;
recovery key on ordinary destructive -> role_denied; hub-supplied scratch tag
ignored -> refused; valid+role+target+fresh nonce -> accepted then replay
rejected. Full module race-clean + vet-clean on the Linux build server.
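
The binding_mismatch cases in the matrix above come from op-to-action binding: the signed op, host, guest, and params must equal the action being gated. A hedged sketch of that comparison follows; the `Action` type, its fields, and `bindingMatches` are hypothetical names chosen for illustration, and the real layer sits on top of `authz.Verifier.Verify`, which this sketch does not model.

```go
package main

import "fmt"

// Action is a hypothetical description of one mutation, used both for
// what the operator signed and for what the gate is about to execute.
type Action struct {
	Op     string
	Host   string
	Guest  int
	Params map[string]string
}

// sameParams reports whether two parameter maps hold identical pairs.
func sameParams(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// bindingMatches is true only when every bound field agrees, so a
// signature for one guest or parameter set cannot be replayed on another.
func bindingMatches(signed, gated Action) bool {
	return signed.Op == gated.Op &&
		signed.Host == gated.Host &&
		signed.Guest == gated.Guest &&
		sameParams(signed.Params, gated.Params)
}

func main() {
	gated := Action{Op: "destroy", Host: "node-a", Guest: 101,
		Params: map[string]string{"purge": "1"}}
	signed := gated
	fmt.Println(bindingMatches(signed, gated))

	signed.Guest = 102 // signature was issued for a different guest
	fmt.Println(bindingMatches(signed, gated))
}
```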

Inert this slice: no destructive deltas served until slice 10; the destructive
path is classified, gated, and tested but not wired to live execution.

CHECKPOINT: Phase B complete (slice 4 done). Awaiting validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 23:56:20 +02:00


package reconcile

import (
	"context"
	"encoding/json"

	"gitea.dooplex.hu/admin/felhom-agent/internal/hub"
	"gitea.dooplex.hu/admin/felhom-agent/internal/proxmox"
)

// RunState is a guest's desired/actual power state. The empty value means
// "unmanaged" on the desired side (the reconciler leaves run-state alone).
type RunState string

const (
	// RunUnspecified (the zero value) — on a DesiredGuest it means run-state is not
	// managed; the reconciler never starts/stops the guest for run-state reasons.
	RunUnspecified RunState = ""
	// RunRunning maps to proxmox status "running".
	RunRunning RunState = "running"
	// RunStopped maps to proxmox status "stopped".
	RunStopped RunState = "stopped"
)

// normRun maps a raw proxmox status string to a RunState, collapsing anything
// unrecognized (e.g. "") to RunUnspecified so actual-state comparison is well-defined.
func normRun(status string) RunState {
	switch status {
	case "running":
		return RunRunning
	case "stopped":
		return RunStopped
	default:
		return RunUnspecified
	}
}

// DesiredGuest is the target state for one existing guest. Every field is
// individually optional ("unmanaged") so a desired-state source can pin only what it
// cares about — slice 4's planner only acts on the fields that are set.
type DesiredGuest struct {
	VMID int
	// Run is the target power state; RunUnspecified leaves it alone.
	Run RunState
	// Spec, when non-nil, manages sizing. Reuses hub.GuestSpec (cores/memory/disk).
	// Phase A reconciles Cores and Memory via SetConfig; DiskBytes is reported but
	// NOT reconciled here (a rootfs grow is `pct resize`, grow-only and separate —
	// deferred to a later slice). Nil = sizing unmanaged.
	Spec *hub.GuestSpec
	// Description, when non-nil, manages the cosmetic `description` field (the first
	// proven SetConfig round-trip, slice-4 pre-check). Nil = unmanaged.
	Description *string
}

// DesiredState is the vmid-keyed target for this host. At slice 4 the only live
// source is the empty provider, so Guests is empty in production; fixtures inject it
// in tests. Host-level desired state (storage manifest, etc.) arrives in later slices.
type DesiredState struct {
	Guests map[int]DesiredGuest
}

// ActualGuest is one guest's observed state, read from Proxmox.
type ActualGuest struct {
	VMID int
	Run  RunState
	// SpecKnown is false when GuestConfig could not be read (the run-state from the
	// list is still trusted; spec/description comparisons are skipped). Mirrors the
	// collector's "keep run-status, omit spec" degradation.
	SpecKnown   bool
	Cores       int
	MemoryMiB   int64  // proxmox LXC `memory` is MiB
	Description string // raw (may carry PVE's trailing newline; compared via normalizers)
}

// ActualState is the vmid-keyed observed state for this host.
type ActualState struct {
	Guests map[int]ActualGuest
}

// DesiredProvider is the seam the desired-state source plugs into. At slice 4 the
// only implementation is EmptyProvider (no live source); slice 10's hub-serving path
// is the real implementation. Do NOT invent a hub/local-file source here.
type DesiredProvider interface {
	Desired(ctx context.Context) (DesiredState, error)
}

// EmptyProvider is the slice-4 production provider: no desired state, so reconcile is
// a live no-op (the engine computes an empty action set).
type EmptyProvider struct{}

// Desired returns an empty desired state.
func (EmptyProvider) Desired(context.Context) (DesiredState, error) {
	return DesiredState{Guests: map[int]DesiredGuest{}}, nil
}

// StaticProvider serves a fixed DesiredState — used by fixtures (and usable as a
// local override later). It never mutates the value it was given.
type StaticProvider struct{ State DesiredState }

// Desired returns the static state.
func (p StaticProvider) Desired(context.Context) (DesiredState, error) { return p.State, nil }

// GuestAPI is the narrow Proxmox surface the engine needs: read actual state and
// dispatch the benign-on-existing-guest mutations. *proxmox.Client satisfies it; a
// fake satisfies it in tests. Every mutating call returns a UPID (or "" for the
// synchronous path) per the proxmox/mutate.go contract — the engine WaitTasks a
// non-empty UPID and treats "" as a clean synchronous success.
type GuestAPI interface {
	ListLXC(ctx context.Context) ([]proxmox.Guest, error)
	GuestConfig(ctx context.Context, vmid int) (proxmox.GuestConfig, error)
	Start(ctx context.Context, vmid int) (string, error)
	Stop(ctx context.Context, vmid int) (string, error)
	SetConfig(ctx context.Context, vmid int, params map[string]string) (string, error)
	WaitTask(ctx context.Context, upid string, opts proxmox.WaitOptions) (proxmox.TaskStatus, error)
	// TaskStatusOnce is a single non-blocking task-status read — used by crash
	// recovery to learn the outcome of an op that was in flight when the agent died.
	TaskStatusOnce(ctx context.Context, upid string) (proxmox.TaskStatus, error)
}

// guestDescription decodes the (string-valued) `description` key from a GuestConfig's
// raw Extra map, returning "" when absent. The value is returned raw — PVE appends a
// trailing newline on read, which the normalization layer strips at comparison time.
func guestDescription(cfg proxmox.GuestConfig) string {
	raw, ok := cfg.Extra["description"]
	if !ok || len(raw) == 0 {
		return ""
	}
	var s string
	if err := json.Unmarshal(raw, &s); err != nil {
		return ""
	}
	return s
}