diff --git a/CHANGELOG.md b/CHANGELOG.md
index 35e7d21..81173d9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,62 @@
 All notable changes to **felhom-agent** are recorded here. Update on every code
 change that gets pushed.
 
+## v0.2.0 — `authz` signed-op verifier (slice 2) (2026-06-08)
+
+Production form of the Phase-4 signing primitive: a key-type-agnostic SSHSIG
+verifier for operator-signed destructive ops, with the full anti-replay/
+authorization pipeline and a durable, crash-safe nonce store. What slice 4
+(reconcile) will call to gate destructive desired-state deltas. No hub, no signing
+CLI, no reconcile loop.
+
+### Added
+- **`internal/authz` — `Verifier`**: `New(signers, store, hostID)` + `Verify(blob,
+  sigArmored) (*VerifiedOp, error)`. Runs the LOCKED pipeline (order is
+  load-bearing): parse armor → namespace → parse pubkey → allow-list (by key
+  **material**, `pub.Marshal()` equality, not key_id) → crypto verify (over the
+  **raw received bytes**, never re-canonicalized) → parse blob → target → time
+  window → **nonce recorded LAST**. Each post-crypto stage rejects even with a
+  valid signature.
+- **SSHSIG framing** (`sshsig.go`) via `golang.org/x/crypto/ssh` — `pem.Decode` →
+  strip 6-byte magic → `ssh.Unmarshal` → `ssh.ParsePublicKey` → recompute signed
+  data with the named hash → `pub.Verify` (dispatches on key algorithm). No
+  hand-rolled crypto. Key-type-agnostic: ed25519 / **sk-ssh-ed25519 (FIDO2)** /
+  rsa / ecdsa via the one path.
+- **Fixed namespace** `felhom-op-v1` (package constant, never caller-supplied).
+- **`OpBlob`** (corrected `host_id`/`guest_id` json tags) + **`VerifiedOp`** (op,
+  host/guest, params, key_id, matched signer). key_id is advisory/audit only —
+  never an authz input.
+- **Typed errors**: `ErrMalformed, ErrNamespace, ErrUnknownSigner, ErrBadSignature,
+  ErrTarget, ErrExpired, ErrNotYetValid, ErrReplay` (errors.Is-friendly).
+- **`NonceStore`** + two impls: `MemoryNonceStore` (tests) and **`FileNonceStore`**
+  — durable, crash-safe (fsync'd append log, replayed into an index on open,
+  periodic compaction, expiry-only pruning). A nonce is fsync'd to disk before
+  `SeenOrRecord` returns false; replay protection survives restart; I/O failure
+  fails safe (reports seen=true). Target generalization: host_id matched strictly,
+  guest_id surfaced for the caller to route.
+- **Config**: `AuthzConfig` (nonce-store path + pinned operator `signers` tagged
+  `operational`/`recovery` with a key_id, as authorized_keys lines).
+- **Version 0.2.0.**
+
+### Tests
+- Real OpenSSH interop via a committed `ssh-keygen -Y sign` vector (hermetic CI);
+  per-stage rejection (each with an otherwise-valid sig); the headline
+  **invalid-sig-does-not-burn-the-nonce** invariant; replay; **persistence across
+  restart**; synthetic **sk-ssh-ed25519** through the unchanged path; byte-exactness
+  (a re-serialized blob fails crypto — not re-canonicalized).
+
+### Notes / corrections to the Phase-4 reference
+- §7's `Target` lacked json tags (`host_id`/`guest_id`) — fixed.
+- The doc paired "Go 1.24.4 / x/crypto v0.52.0", but v0.52.0 declares `go 1.25.0`
+  and does **not** build on Go 1.24. Resolved by upgrading the build server to
+  go1.26.0 (backward-compatible; felhom-controller/hub unaffected); the module is
+  `go 1.25.0` on x/crypto v0.52.0.
+- Free function → constructed `Verifier`; returns the full `VerifiedOp`; typed
+  errors; clock-skew tolerance added; durable nonce store is the net-new work.
+- **Shared-contract dependency flagged** (not built): the hub and the `felhom-sign`
+  CLI must emit byte-identical canonical JSON or signatures won't verify; a shared
+  canonicalizer both import would be the right home.
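
The shared-contract note can be illustrated with a minimal Go sketch. It is not part of this change, and `Canonicalize` is a hypothetical name. The sketch assumes that a round trip through `encoding/json` (which sorts map keys when marshaling) with HTML escaping disabled is an acceptable approximation of the canonical form described in Phase 4 §2 (sorted keys at every level, no insignificant whitespace, no trailing newline). The project's real canonicalizer may differ, for example in how it escapes strings.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Canonicalize is a HYPOTHETICAL helper (not in the patch): it re-encodes a JSON
// document with keys sorted at every level, no insignificant whitespace, and no
// trailing newline. Both producers (hub and signing CLI) would call the same
// function so the bytes they sign are identical.
func Canonicalize(raw []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber() // keep numbers as written instead of passing through float64
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // do not rewrite <, >, & as \u00XX escapes
	if err := enc.Encode(v); err != nil { // map keys are emitted in sorted order
		return nil, err
	}
	// Encoder.Encode appends a newline; the canonical form has none.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	// Two producers serialize the same op with different key order and spacing.
	a := []byte(`{"op":"guest_destroy", "target":{"host_id":"h1","guest_id":"101"}}`)
	b := []byte(`{"target":{"guest_id":"101","host_id":"h1"},"op":"guest_destroy"}`)
	ca, errA := Canonicalize(a)
	cb, errB := Canonicalize(b)
	if errA != nil || errB != nil {
		panic("canonicalize failed")
	}
	fmt.Println(string(ca))
	fmt.Println(bytes.Equal(ca, cb))
}
```

Because the verifier checks the signature over the raw received bytes, any difference between the two producers' output (key order, spacing, a trailing newline) would surface as a signature failure, which is why a single shared function is attractive.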
+
 ## v0.1.0 — Scaffold + `proxmox` interaction layer (slice 1) (2026-06-08)
 
 First slice: stand up the host-agent project and its foundation — the typed
diff --git a/REPORT.md b/REPORT.md
index 3288481..9d57921 100644
--- a/REPORT.md
+++ b/REPORT.md
@@ -3,51 +3,74 @@
 > This file holds the report for the **most recent** change, fully overwritten each task.
 > Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).
 
-## Task: Agent scaffold + `proxmox` interaction package (slice 1) — v0.1.0
+## Task: `authz` signed-op verifier (slice 2) — v0.2.0
 
-Stood up the host-agent project and its foundation — the typed `proxmox` interaction
-layer every other agent module will call — with a runnable read-only `--selftest`.
-Pushed to `main` (main-only repo). Build/vet/test green; verified live against the demo host.
+Turned the Phase-4 reference `VerifySignedOp` into a production package
+(`internal/authz`): a key-type-agnostic SSHSIG verifier for operator-signed destructive
+ops, the full anti-replay/authorization pipeline, and a durable, crash-safe nonce store.
+This is what slice 4 (reconcile) calls to gate destructive desired-state deltas. Pushed to
+`main`. Build/vet/test green locally (Go 1.26) and on the build server.
 
-### Public surface
+### Public surface (`internal/authz`)
+- **`Verifier`** — `New(signers []AllowedSigner, store NonceStore, hostID string) *Verifier`;
+  `Verify(blob, sigArmored []byte) (*VerifiedOp, error)`. Optional `ClockSkew` (default 2m,
+  not-yet-valid only) and `Logger` (advisory key_id-mismatch warning).
+- **`OpBlob`** — canonical signed object; `Target{HostID,GuestID}` with corrected
+  `host_id`/`guest_id` json tags; `Params json.RawMessage`, `Nonce`, `IssuedAt`, `ExpiresAt`, `KeyID`.
+- **`VerifiedOp`** — `Op, HostID, GuestID, Params, Nonce, IssuedAt, ExpiresAt, KeyID (advisory),
+  Signer (matched), KeyIDMatchesSigner`.
+- **`AllowedSigner`** + `NewAllowedSigner(keyID, role, authorizedKeyLine)`; roles
+  `RoleOperational` / `RoleRecovery` (doc 04 two-key model; role-scoping enforced by the caller).
+- **`NonceStore`** interface + `MemoryNonceStore` (tests) and **`FileNonceStore`** (durable).
+- **Typed errors**: `ErrMalformed, ErrNamespace, ErrUnknownSigner, ErrBadSignature, ErrTarget,
+  ErrExpired, ErrNotYetValid, ErrReplay` (errors.Is-friendly).
+- **Config**: `config.AuthzConfig` (nonce-store path + pinned `Signers`).
 
-**`proxmox.Client`** (API backend):
-- Read: `Version`, `Nodes`, `NodeStatus`, `ListLXC`, `GuestStatus`, `GuestConfig`, `ListStorage`, `NodeStorage`, `StorageContent`
-- Async mutating (return a UPID): `RestoreLXC` (primary create path), `Vzdump`, `Snapshot`, `Rollback`, `DeleteSnapshot`, `SetConfig`, `Start`, `Stop`
-- Tasks: `WaitTask(ctx, upid, WaitOptions)`, `TaskStatusOnce`, `TaskLogTail`
-- Errors: `*APIError` (parses the offending privilege from a 403), `*TaskError` (parses it from a failed task `exitstatus` + log tail)
-- Types: `Version, Node, NodeStatus, Guest, GuestConfig (+Extra/MountPoints/Nets), Storage, StorageContent, TaskStatus, UPID`
+### Locked pipeline (order load-bearing)
+`parse armor → namespace (fixed felhom-op-v1) → parse pubkey → allow-list by key MATERIAL (not
+key_id) → crypto verify over RAW received bytes → parse blob → target (host strict, guest
+surfaced) → time window → nonce recorded LAST`. Each post-crypto stage rejects even with a
+valid signature; an invalid signature can never consume a nonce.
 
-**`proxmox.Privileged`** (fenced root-CLI; `Runner` iface, `ExecRunner` direct/`sudo -n`): `CreateGoldenLXC` (keyctl), `MountUSBByUUID`, `SMART`, `Sensors` — each documents *why it can't be the API*.
+### Durable nonce store — mechanism & guarantee
+fsync'd append-only JSONL log + in-memory index (replayed on open) + periodic compaction.
+- **Crash-safe**: a nonce is written and `fsync`'d before `SeenOrRecord` returns `false`, so the
+  caller acts only *after* the durable record. A crash between verify and execute drops the op
+  (fail-safe) and never enables a replay. I/O failure → returns seen=true (op not executed).
+- **Survives restart**: the log is replayed into the index on `OpenFileNonceStore`.
+- **Pruning**: expired nonces dropped only at compaction (never before exp) — and an expired op
+  is rejected by the time check before the nonce check, so pruning is housekeeping, not a hole.
+- **Concurrency-safe**: single mutex over file handle + index.
 
-### API-vs-root routing table
+### OPEN choices
+- **Clock skew**: 2-minute tolerance on *not-yet-valid* only; expiry not extended (window stays an
+  honest bound).
+- **Durable mechanism**: fsync'd append log + compaction (simple, honest, no embedded-KV dep).
+- **Fixtures**: committed real `ssh-keygen -Y sign` vector (hermetic + proves OpenSSH interop) +
+  in-Go minting for rejection cases; the sk case is synthetic (spec-faithful, no hardware).
+- **Package name**: `authz` (control-plane-authorization layer, matches doc 04).
 
-| Backend | Ops | Why |
-|---|---|---|
-| **API** | node status, list/status/config guests, storage list+content, task status/log, **restore**, vzdump, snapshot/rollback/delete-snap, set-config, start/stop | FelhomAgent 16-priv token |
-| **root-CLI (fenced)** | golden `pct create` (keyctl=1), USB mount-by-UUID/fstab, SMART/sensors | keyctl is `root@pam`-only; host mounts + SMART aren't API ops |
+### Test matrix (all pass — 14 tests)
+Real ssh-keygen fixture · happy path · per-stage rejection {namespace, unknown-signer, tampered,
+retargeted-host, expired, not-yet-valid, replay} · **invalid-sig-does-NOT-burn-nonce** (then the
+valid op with that nonce still succeeds) · replay-rejected-across-restart (durable store) ·
+key-type-agnostic synthetic **sk-ssh-ed25519** · byte-exactness (re-serialized blob fails crypto).
 
-Fence is **structural** (`Client` has no runner, `Privileged` has no HTTP client) and asserted in `routing_test.go`.
-
-### OPEN-item choices
-- **Config:** JSON file + `FELHOM_AGENT_*` env overrides (stdlib, zero-dep; swappable to `yaml.v3` if YAML house-style is preferred). Token never logged (`Redacted()`).
-- **Privileged runner / uid:** `Runner` iface; `ExecRunner{Mode: sudo|direct}`, default `sudo -n`. Proposed (not finalized): non-root service user + narrow sudoers allowlist for the 3 fenced commands.
-- **Polling:** first poll immediate, then 1s → exponential backoff capped 5s, default total timeout 10m; honors ctx cancellation. Tunable via `WaitOptions`.
-- **`--selftest=task`:** included (gated behind the flag + `-vmid`). Unit-tested via mocks; not run live (the live token was read-only).
-- **Versioning:** `version` var in `main.go` (default `0.1.0`, `-ldflags -X main.version=`), `--version` flag.
-
-### What the live host revealed (recorded, not guessed)
-- Node name is **`demo-felhom`**; `felhom-pve` is only the SSH alias.
-- `/nodes/{node}/status`: `cpu` is a 0..1 fraction, **`loadavg` is an array of strings**; `memory`/`rootfs`/`swap` nested.
-- `vmid` is an **integer** in list/status; `status/current` carries no `vmid` (set from the path arg).
-- Task: `status` ∈ {running, stopped}, `exitstatus` only once stopped; task log is `[{"n":N,"t":"…"}]`. UPID = `UPID:node:pid(hex):pstart(hex):starttime(hex):worker:id:user:`.
-- `pveum user token add … --output-format json` returns `{"value":"…"}`.
-- **No spike fact failed in practice** — 16-priv role, async/UPID model, keyctl boundary, dual-grant privsep all held. Teardown logged `ignore invalid acl token …`, confirming ACL auto-invalidation (phase1-2 §5).
+### Corrections to the Phase-4 §7 reference (for production)
+- `Target` needed `host_id`/`guest_id` json tags — fixed.
+- **The doc's "Go 1.24.4 / x/crypto v0.52.0" does not hold**: x/crypto v0.52.0 declares
+  `go 1.25.0` and won't build on Go 1.24. Resolved by upgrading the build server to **go1.26.0**
+  (backward-compatible — felhom-controller/hub build unchanged; distro Go package left intact,
+  upstream Go fronted on PATH).
+- Free function → constructed `Verifier`; returns full `VerifiedOp`; typed errors; clock-skew;
+  durable nonce store (the net-new engineering).
+- **Shared-contract flag (not built)**: the hub and `felhom-sign` CLI must produce byte-identical
+  canonical JSON or signatures won't verify; a shared canonicalizer both import is the right home.
 
 ### Verification
-- `go build/vet/test` green twice: locally (Go 1.26) and on the build server (Go 1.24.4).
-- **Live read-only `--selftest`** (built on 192.168.0.180, against `https://192.168.0.162:8006`, **TLS fingerprint-pinned** — no insecure mode): version, nodes, node status, guests, storage all `[ ok ]`. slog confirmed the token rendered as `…=********`. Throwaway token created + torn down.
-- Mutating ops + live `WaitTask` are unit-tested only (live run used a read-only token); `--selftest=task` is ready to exercise them against a real `FelhomAgent` token.
+- `go build/vet/test` green locally (go1.26.0) and on the build server (upgraded to go1.26.0).
+- Real OpenSSH `ssh-keygen` (OpenSSH 10.0p2) minted the committed fixture and self-verified it
+  before commit.
 
 ### Repo state
-- Branch: `main` only (feature branch merged + deleted, local & remote). Latest: `chore(agent): add CHANGELOG, version the agent at 0.1.0`.
+- Branch: `main` only. Dep: `golang.org/x/crypto v0.52.0` (+ `x/sys` indirect); `go 1.25.0`.
diff --git a/cmd/felhom-agent/main.go b/cmd/felhom-agent/main.go
index bca5603..189e361 100644
--- a/cmd/felhom-agent/main.go
+++ b/cmd/felhom-agent/main.go
@@ -24,7 +24,7 @@ import (
 
 // version is the agent version. Overridable at build time with
 // -ldflags "-X main.version="; defaults to the in-repo CHANGELOG version.
-var version = "0.1.0"
+var version = "0.2.0"
 
 func main() {
 	var (
diff --git a/go.mod b/go.mod
index ad3f32f..976577c 100644
--- a/go.mod
+++ b/go.mod
@@ -1,3 +1,7 @@
 module gitea.dooplex.hu/admin/felhom-agent
 
-go 1.24
+go 1.25.0
+
+require golang.org/x/crypto v0.52.0
+
+require golang.org/x/sys v0.45.0 // indirect
diff --git a/go.sum b/go.sum
new file mode 100644
index 0000000..3f17939
--- /dev/null
+++ b/go.sum
@@ -0,0 +1,6 @@
+golang.org/x/crypto v0.52.0 h1:RMs7fP2rXdep0CftQlK8Uf+kibLm7qkCcradZWYz988=
+golang.org/x/crypto v0.52.0/go.mod h1:1QgfPxDqh0T2M/elOJtp9RvuR95kVjir0e6/BvEmGbc=
+golang.org/x/sys v0.45.0 h1:dO4czNzziLiiXplLQgBCEpCvXQ3dnkn0SdaZSYdQ+FY=
+golang.org/x/sys v0.45.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
+golang.org/x/term v0.43.0 h1:S4RLU2sB31O/NCl+zFN9Aru9A/Cq2aqKpTZJ6B+DwT4=
+golang.org/x/term v0.43.0/go.mod h1:lrhlHNdQJHO+1qVYiHfFKVuVioJIheAc3fBSMFYEIsk=
diff --git a/internal/authz/blob.go b/internal/authz/blob.go
new file mode 100644
index 0000000..e89a0fd
--- /dev/null
+++ b/internal/authz/blob.go
@@ -0,0 +1,51 @@
+package authz
+
+import (
+	"encoding/json"
+	"time"
+)
+
+// Target binds an op to a specific box (and optionally a guest) — the anti-retarget
+// field. The §7 reference omitted the json tags; production needs them so the
+// signed canonical bytes decode correctly.
+type Target struct {
+	HostID  string `json:"host_id"`
+	GuestID string `json:"guest_id"`
+}
+
+// OpBlob is the canonical signed object (phase4 §2). The signature covers the
+// EXACT bytes of this object's canonical JSON (keys sorted at every level, no
+// insignificant whitespace, no trailing newline, UTF-8) — produced by the
+// operator CLI / hub, verified here over the raw received bytes.
+type OpBlob struct {
+	Op        string          `json:"op"`
+	Target    Target          `json:"target"`
+	Params    json.RawMessage `json:"params"`
+	Nonce     string          `json:"nonce"`
+	IssuedAt  time.Time       `json:"issued_at"`
+	ExpiresAt time.Time       `json:"expires_at"`
+	KeyID     string          `json:"key_id"`
+}
+
+// VerifiedOp is the authenticated, parsed op returned on success — everything the
+// reconcile layer (slice 4) needs to route and execute, not just the op string.
+type VerifiedOp struct {
+	Op     string // the operation, e.g. "guest_destroy"
+	HostID string // target host (== this agent's host)
+	// GuestID is non-empty for a guest-scoped op; the caller routes by it. "" =
+	// host-scoped op. The verifier does NOT need to know all guest ids.
+	GuestID   string
+	Params    json.RawMessage
+	Nonce     string
+	IssuedAt  time.Time
+	ExpiresAt time.Time
+
+	// KeyID is the blob's self-declared key id — ADVISORY / audit only, never an
+	// authz input. Authz is the key-material allow-list match (Signer below).
+	KeyID string
+	// Signer is the allow-listed key whose material matched the signature.
+	Signer AllowedSigner
+	// KeyIDMatchesSigner is false when the blob's advisory KeyID disagrees with
+	// the matched signer's id (a benign audit signal, not a rejection).
+	KeyIDMatchesSigner bool
+}
diff --git a/internal/authz/doc.go b/internal/authz/doc.go
new file mode 100644
index 0000000..2c877ed
--- /dev/null
+++ b/internal/authz/doc.go
@@ -0,0 +1,37 @@
+// Package authz is the control-plane-authorization layer: it verifies
+// operator-signed destructive ops before the agent executes them. It is what the
+// reconcile loop (slice 4) calls to gate destructive desired-state deltas and
+// signed one-shot jobs (03 §4, 04). The signing mechanism is proven (Phase 4,
+// 14/14) — this package is its production form: a key-type-agnostic SSHSIG
+// verifier, the full anti-replay/authorization pipeline, and a durable,
+// crash-safe nonce store.
+//
+// # Mechanism (LOCKED — do not redesign)
+//
+//   - SSHSIG via golang.org/x/crypto/ssh; no hand-rolled crypto, no raw-Ed25519
+//     fallback. pub.Verify dispatches on the key's own algorithm, so the same path
+//     accepts ed25519 / sk-ssh-ed25519 (FIDO2) / rsa / ecdsa — a hardware operator
+//     key later is a box no-op (Phase 4 §5/§6, doc 04 §7).
+//   - Fixed namespace felhom-op-v1 (package constant, never caller-supplied).
+//   - The verifier verifies over the RAW received blob bytes and never
+//     canonicalizes — the canonical form (sorted-key, whitespace-free JSON) is the
+//     signer's contract, shared by the hub and the felhom-sign CLI.
+//
+// # Pipeline order (load-bearing — Verify)
+//
+//	parse armor → namespace → parse pubkey → allow-list (by key MATERIAL, not
+//	key_id) → crypto verify → parse blob → target → time window → nonce LAST
+//
+// Each post-crypto stage rejects even with an otherwise-valid signature. The nonce
+// is recorded last, so an invalid signature can never consume a nonce. key_id is
+// advisory/audit only — authz is the key-material allow-list match.
+//
+// # Shared-contract dependency (flag for later, not built here)
+//
+// Signatures only verify if the op-generator (hub) and the felhom-sign CLI produce
+// BYTE-IDENTICAL canonical JSON (keys sorted at every level, no insignificant
+// whitespace, no trailing newline, UTF-8 — Phase 4 §2). The verifier deliberately
+// does NOT re-canonicalize, so a divergence between those two producers surfaces as
+// a crypto failure here. A shared canonicalizer that both import would be the right
+// home for that contract; it is out of scope for this slice.
+package authz
diff --git a/internal/authz/errors.go b/internal/authz/errors.go
new file mode 100644
index 0000000..9e727d5
--- /dev/null
+++ b/internal/authz/errors.go
@@ -0,0 +1,27 @@
+package authz
+
+import "errors"
+
+// Typed rejection sentinels — one per pipeline stage so the reconcile layer can
+// distinguish "rejected" (a real signed op that failed a check) from "malformed"
+// from a future "not yet signed". All are errors.Is-friendly: Verify wraps them
+// with %w plus context.
+var (
+	// ErrMalformed: the armor/SSHSIG/blob could not be parsed (not a rejection of
+	// a well-formed op — bad input).
+	ErrMalformed = errors.New("authz: malformed signature or blob")
+	// ErrNamespace: SSHSIG namespace != the fixed felhom-op-v1 domain separator.
+	ErrNamespace = errors.New("authz: namespace mismatch")
+	// ErrUnknownSigner: the signing key's material is not in the pinned allow-list.
+	ErrUnknownSigner = errors.New("authz: signer not in allowed set")
+	// ErrBadSignature: cryptographic verification failed (tamper / wrong key).
+	ErrBadSignature = errors.New("authz: signature did not verify")
+	// ErrTarget: target.host_id is not this box.
+	ErrTarget = errors.New("authz: target mismatch")
+	// ErrExpired: now > expires_at.
+	ErrExpired = errors.New("authz: op expired")
+	// ErrNotYetValid: now < issued_at (minus clock-skew tolerance).
+	ErrNotYetValid = errors.New("authz: op not yet valid")
+	// ErrReplay: the nonce was already recorded in the window.
+	ErrReplay = errors.New("authz: replay (nonce already seen)")
+)
diff --git a/internal/authz/mint_test.go b/internal/authz/mint_test.go
new file mode 100644
index 0000000..78fa8f8
--- /dev/null
+++ b/internal/authz/mint_test.go
@@ -0,0 +1,107 @@
+package authz
+
+import (
+	"crypto/ed25519"
+	"crypto/rand"
+	"crypto/sha256"
+	"encoding/binary"
+	"encoding/pem"
+	"fmt"
+	"testing"
+	"time"
+
+	"golang.org/x/crypto/ssh"
+)
+
+// Test helpers that MINT armored SSHSIGs in-Go (hermetic) — the inverse of the
+// production framing. They reuse the production signedData()/sshsigBlob so a test
+// can never drift from the verifier's notion of the signed bytes.
+
+// canonicalBlob builds an op blob in the §2 canonical field order. (Self-consistent
+// for the in-Go path: we sign exactly these bytes and verify the same bytes. The
+// committed ssh-keygen fixture exercises real OpenSSH canonical interop.)
+func canonicalBlob(op, hostID, guestID, keyID, nonce, paramsJSON string, issued, expires time.Time) []byte {
+	if paramsJSON == "" {
+		paramsJSON = "{}"
+	}
+	return []byte(fmt.Sprintf(
+		`{"expires_at":%q,"issued_at":%q,"key_id":%q,"nonce":%q,"op":%q,"params":%s,"target":{"guest_id":%q,"host_id":%q}}`,
+		expires.UTC().Format(time.RFC3339), issued.UTC().Format(time.RFC3339),
+		keyID, nonce, op, paramsJSON, guestID, hostID))
+}
+
+// mintArmor builds an armored SSHSIG over message, using sign to produce the inner
+// ssh.Signature over the recomputed SSHSIG signed-data.
+func mintArmor(t *testing.T, pubMarshaled []byte, namespace, hashName string, message []byte, sign func([]byte) ssh.Signature) []byte {
+	t.Helper()
+	sb := &sshsigBlob{Version: 1, PublicKey: string(pubMarshaled), Namespace: namespace, Reserved: "", HashAlgo: hashName}
+	signed, err := signedData(sb, message)
+	if err != nil {
+		t.Fatalf("signedData: %v", err)
+	}
+	sig := sign(signed)
+	sb.Signature = string(ssh.Marshal(&sig))
+	raw := append([]byte(sshsigMagic), ssh.Marshal(sb)...)
+	return pem.EncodeToMemory(&pem.Block{Type: "SSH SIGNATURE", Bytes: raw})
+}
+
+// newEd25519Signer returns an ssh.PublicKey + a sign closure for a fresh ed25519 key.
+func newEd25519Signer(t *testing.T) (ssh.PublicKey, func([]byte) ssh.Signature) {
+	t.Helper()
+	pub, priv, err := ed25519.GenerateKey(rand.Reader)
+	if err != nil {
+		t.Fatal(err)
+	}
+	sshPub, err := ssh.NewPublicKey(pub)
+	if err != nil {
+		t.Fatal(err)
+	}
+	sign := func(signed []byte) ssh.Signature {
+		return ssh.Signature{Format: ssh.KeyAlgoED25519, Blob: ed25519.Sign(priv, signed)}
+	}
+	return sshPub, sign
+}
+
+// newSyntheticSKSigner emulates a FIDO2 sk-ssh-ed25519@openssh.com key with NO
+// hardware (Phase 4 §5). It builds a spec-faithful sk public key and an sk-format
+// signature: ed25519 over sha256(application)‖flags‖counter‖sha256(signed_data),
+// sig.Blob = the raw ed25519 signature, sig.Rest = flags‖counter. It must verify
+// through the UNCHANGED Verify path.
+func newSyntheticSKSigner(t *testing.T) (ssh.PublicKey, func([]byte) ssh.Signature) {
+	t.Helper()
+	edPub, edPriv, err := ed25519.GenerateKey(rand.Reader)
+	if err != nil {
+		t.Fatal(err)
+	}
+	const application = "ssh:"
+	skBlob := ssh.Marshal(struct {
+		Name        string
+		KeyBytes    []byte
+		Application string
+	}{"sk-ssh-ed25519@openssh.com", []byte(edPub), application})
+	skPub, err := ssh.ParsePublicKey(skBlob)
+	if err != nil {
+		t.Fatalf("parse synthetic sk pubkey: %v", err)
+	}
+	if skPub.Type() != "sk-ssh-ed25519@openssh.com" {
+		t.Fatalf("sk pubkey type = %q", skPub.Type())
+	}
+
+	sign := func(signed []byte) ssh.Signature {
+		const flagUserPresence = byte(0x01) // required, else Verify rejects
+		const counter = uint32(1)
+		appDigest := sha256.Sum256([]byte(application))
+		dataDigest := sha256.Sum256(signed)
+		// original = appDigest ‖ flags ‖ counter(BE) ‖ dataDigest (x/crypto layout)
+		var original []byte
+		original = append(original, appDigest[:]...)
+		original = append(original, flagUserPresence)
+		original = binary.BigEndian.AppendUint32(original, counter)
+		original = append(original, dataDigest[:]...)
+		edSig := ed25519.Sign(edPriv, original)
+		// sig.Rest = skFields{Flags, Counter} = flags ‖ counter(BE)
+		rest := append([]byte{flagUserPresence}, binary.BigEndian.AppendUint32(nil, counter)...)
+		return ssh.Signature{Format: "sk-ssh-ed25519@openssh.com", Blob: edSig, Rest: rest}
+	}
+	return skPub, sign
+}
diff --git a/internal/authz/noncestore.go b/internal/authz/noncestore.go
new file mode 100644
index 0000000..7eaa8c7
--- /dev/null
+++ b/internal/authz/noncestore.go
@@ -0,0 +1,195 @@
+package authz
+
+import (
+	"bytes"
+	"encoding/json"
+	"errors"
+	"io/fs"
+	"os"
+	"path/filepath"
+	"sync"
+	"time"
+)
+
+// MemoryNonceStore is a non-durable NonceStore for tests. Replay protection does
+// NOT survive process restart — never use it on a real host.
+type MemoryNonceStore struct {
+	mu   sync.Mutex
+	seen map[string]time.Time
+}
+
+// NewMemoryNonceStore builds an empty in-memory store.
+func NewMemoryNonceStore() *MemoryNonceStore {
+	return &MemoryNonceStore{seen: make(map[string]time.Time)}
+}
+
+// SeenOrRecord reports whether nonce was already recorded, recording it if not.
+func (m *MemoryNonceStore) SeenOrRecord(nonce string, exp time.Time) bool {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+	if _, ok := m.seen[nonce]; ok {
+		return true
+	}
+	m.seen[nonce] = exp
+	return false
+}
+
+// FileNonceStore is the durable, crash-safe NonceStore for the host. Mechanism:
+// an fsync'd append-only JSONL log with an in-memory index, periodic compaction,
+// and expiry-only pruning.
+//
+// Durability guarantee: a nonce is on disk AND fsync'd before SeenOrRecord returns
+// false, so the caller acting on a verified op always does so AFTER the durable
+// record. A crash between verify and execute therefore drops the op (fail-safe
+// direction) and never enables a replay. Replay protection survives restarts: the
+// log is replayed into the index on Open.
+//
+// Pruning: a nonce is dropped only after its exp (compaction), never before —
+// pruning before expiry would reopen the replay window. (An expired nonce can't be
+// replayed anyway: the time-window check rejects an expired op before the nonce
+// check, so pruning is housekeeping, not an authz hole.)
+//
+// Concurrency: a single mutex guards the file handle and index (single-process; the
+// agent is concurrent — 03 §10).
+type FileNonceStore struct {
+	mu           sync.Mutex
+	path         string
+	f            *os.File
+	idx          map[string]time.Time
+	sinceCompact int
+	now          func() time.Time
+
+	// CompactEvery is the append count that triggers a compaction (default 1000).
+	CompactEvery int
+}
+
+type nonceRecord struct {
+	Nonce string    `json:"n"`
+	Exp   time.Time `json:"e"`
+}
+
+// OpenFileNonceStore opens (or creates) the durable store at path, replaying any
+// existing log into the index.
+func OpenFileNonceStore(path string) (*FileNonceStore, error) {
+	s := &FileNonceStore{
+		path:         path,
+		idx:          make(map[string]time.Time),
+		now:          func() time.Time { return time.Now().UTC() },
+		CompactEvery: 1000,
+	}
+	if err := s.load(); err != nil {
+		return nil, err
+	}
+	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
+	if err != nil {
+		return nil, err
+	}
+	s.f = f
+	syncDir(filepath.Dir(path)) // make a freshly-created file's dir entry durable
+	return s, nil
+}
+
+func (s *FileNonceStore) load() error {
+	b, err := os.ReadFile(s.path)
+	if errors.Is(err, fs.ErrNotExist) {
+		return nil
+	}
+	if err != nil {
+		return err
+	}
+	for _, line := range bytes.Split(b, []byte("\n")) {
+		line = bytes.TrimSpace(line)
+		if len(line) == 0 {
+			continue
+		}
+		var r nonceRecord
+		if json.Unmarshal(line, &r) != nil {
+			continue // skip a torn trailing line from a crash mid-append
+		}
+		s.idx[r.Nonce] = r.Exp
+	}
+	return nil
+}
+
+// SeenOrRecord durably records an unseen nonce before returning false. On any I/O
+// failure it returns true (fail-safe: the op is NOT executed rather than risk an
+// unrecorded nonce enabling a later replay).
+func (s *FileNonceStore) SeenOrRecord(nonce string, exp time.Time) bool {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if _, ok := s.idx[nonce]; ok {
+		return true
+	}
+	rec, _ := json.Marshal(nonceRecord{Nonce: nonce, Exp: exp})
+	rec = append(rec, '\n')
+	if _, err := s.f.Write(rec); err != nil {
+		return true
+	}
+	if err := s.f.Sync(); err != nil {
+		return true
+	}
+	s.idx[nonce] = exp
+	s.sinceCompact++
+	s.maybeCompact()
+	return false
+}
+
+// Close releases the file handle.
+func (s *FileNonceStore) Close() error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.f != nil {
+		return s.f.Close()
+	}
+	return nil
+}
+
+// maybeCompact rewrites the log keeping only non-expired entries once enough
+// appends have accrued. Caller holds the mutex. Compaction is housekeeping: the
+// recorded nonce is already durable, so a compaction failure never fails the op.
+func (s *FileNonceStore) maybeCompact() {
+	if s.CompactEvery <= 0 || s.sinceCompact < s.CompactEvery {
+		return
+	}
+	s.sinceCompact = 0
+	now := s.now()
+
+	live := make(map[string]time.Time, len(s.idx))
+	var buf bytes.Buffer
+	for n, e := range s.idx {
+		if e.Before(now) {
+			continue // prune AFTER expiry only — safe
+		}
+		live[n] = e
+		rec, _ := json.Marshal(nonceRecord{Nonce: n, Exp: e})
+		buf.Write(rec)
+		buf.WriteByte('\n')
+	}
+
+	tmp := s.path + ".tmp"
+	if err := os.WriteFile(tmp, buf.Bytes(), 0o600); err != nil {
+		return // keep using the existing handle; nonce already durable
+	}
+	if tf, err := os.OpenFile(tmp, os.O_WRONLY, 0o600); err == nil {
+		_ = tf.Sync()
+		_ = tf.Close()
+	}
+	if s.f != nil {
+		_ = s.f.Close()
+	}
+	if err := os.Rename(tmp, s.path); err != nil {
+		s.f, _ = os.OpenFile(s.path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
+		return
+	}
+	syncDir(filepath.Dir(s.path))
+	s.f, _ = os.OpenFile(s.path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
+	s.idx = live
+}
+
+// syncDir best-effort fsyncs a directory so a create/rename is durable.
+func syncDir(dir string) {
+	if d, err := os.Open(dir); err == nil {
+		_ = d.Sync()
+		_ = d.Close()
+	}
+}
diff --git a/internal/authz/noncestore_test.go b/internal/authz/noncestore_test.go
new file mode 100644
index 0000000..754c927
--- /dev/null
+++ b/internal/authz/noncestore_test.go
@@ -0,0 +1,95 @@
+package authz
+
+import (
+	"os"
+	"path/filepath"
+	"testing"
+	"time"
+)
+
+func TestMemoryNonceStore(t *testing.T) {
+	m := NewMemoryNonceStore()
+	exp := time.Now().Add(time.Hour)
+	if m.SeenOrRecord("a", exp) {
+		t.Fatal("first record should be unseen")
+	}
+	if !m.SeenOrRecord("a", exp) {
+		t.Fatal("second record should be seen")
+	}
+	if m.SeenOrRecord("b", exp) {
+		t.Fatal("distinct nonce should be unseen")
+	}
+}
+
+func TestFileNonceStore_RecordAndReload(t *testing.T) {
+	path := filepath.Join(t.TempDir(), "nonces.log")
+	exp := refNow.Add(time.Hour)
+
+	s1, err := OpenFileNonceStore(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+	if s1.SeenOrRecord("dead", exp) {
+		t.Fatal("first record should be unseen")
+	}
+	if err := s1.Close(); err != nil {
+		t.Fatal(err)
+	}
+
+	// Reopen: the recorded nonce must still be seen (durable across restart).
+	s2, err := OpenFileNonceStore(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer s2.Close()
+	if !s2.SeenOrRecord("dead", exp) {
+		t.Fatal("nonce not durable across reopen")
+	}
+}
+
+func TestFileNonceStore_CompactionPrunesExpiredOnly(t *testing.T) {
+	path := filepath.Join(t.TempDir(), "nonces.log")
+	s, err := OpenFileNonceStore(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+	s.now = func() time.Time { return refNow }
+	s.CompactEvery = 2 // force a compaction after two appends
+
+	s.SeenOrRecord("expired", refNow.Add(-time.Hour)) // exp in the past
+	s.SeenOrRecord("live", refNow.Add(time.Hour))     // triggers compaction
+	if err := s.Close(); err != nil {
+		t.Fatal(err)
+	}
+
+	// Reopen: the live nonce survived, the expired one was pruned (housekeeping;
+	// an expired op is rejected by the time check before the nonce check anyway).
+	s2, err := OpenFileNonceStore(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer s2.Close()
+	if !s2.SeenOrRecord("live", refNow.Add(time.Hour)) {
+		t.Error("live nonce should have survived compaction")
+	}
+	if s2.SeenOrRecord("expired", refNow.Add(-time.Hour)) {
+		t.Error("expired nonce should have been pruned (was still present)")
+	}
+}
+
+func TestFileNonceStore_SkipsTornLine(t *testing.T) {
+	path := filepath.Join(t.TempDir(), "nonces.log")
+	// a valid record line + a torn/garbage trailing line from a hypothetical crash
+	content := `{"n":"good","e":"` + refNow.Add(time.Hour).Format(time.RFC3339Nano) + `"}` + "\n" + `{"n":"tor`
+	if err := os.WriteFile(path, []byte(content), 0o600); err != nil {
+		t.Fatal(err)
+	}
+	s, err := OpenFileNonceStore(path)
+	if err != nil {
+		t.Fatalf("open with torn line should not fail: %v", err)
+	}
+	defer s.Close()
+	if !s.SeenOrRecord("good", refNow.Add(time.Hour)) {
+		t.Error("valid record before the torn line should have loaded")
+	}
+}
diff --git a/internal/authz/signer_test.go b/internal/authz/signer_test.go
new file mode 100644
index 0000000..3ad854f
--- /dev/null
+++ b/internal/authz/signer_test.go
@@ -0,0 +1,36 @@
+package authz
+
+import (
+	"os"
+	"testing"
+)
+
+func TestNewAllowedSigner(t *testing.T) {
+	line, err := os.ReadFile("testdata/operator.pub")
+	if err != nil {
+		t.Fatal(err)
+	}
+	s, err := NewAllowedSigner("felhom-op-1", RoleOperational, string(line))
+	if err != nil {
+		t.Fatalf("NewAllowedSigner: %v", err)
+	}
+	if s.KeyID != "felhom-op-1" || s.Role != RoleOperational || s.PublicKey == nil {
+		t.Errorf("signer = %+v", s)
+	}
+	if s.PublicKey.Type() != "ssh-ed25519" {
+		t.Errorf("key type = %q", s.PublicKey.Type())
+	}
+}
+
+func TestNewAllowedSigner_BadRole(t *testing.T) {
+	line, _ := os.ReadFile("testdata/operator.pub")
+	if _, err := NewAllowedSigner("k", "bogus", string(line)); err == nil {
+		t.Fatal("invalid role should error")
+	}
+}
+
+func TestNewAllowedSigner_BadLine(t *testing.T) {
+	if _, err := NewAllowedSigner("k", RoleOperational, "not a key"); err == nil {
+		t.Fatal("malformed key line should error")
+	}
+}
diff --git a/internal/authz/sshsig.go b/internal/authz/sshsig.go
new file mode 100644
index 0000000..6a2daaa
--- /dev/null
+++ b/internal/authz/sshsig.go
@@ -0,0 +1,81 @@
+package authz
+
+import (
+	"crypto/sha256"
+	"crypto/sha512"
+	"encoding/pem"
+	"fmt"
+	"hash"
+
+	"golang.org/x/crypto/ssh"
+)
+
+// SSHSIG framing — ported verbatim-in-spirit from phase4-signing-findings.md §7.
+// The only manual work is SSHSIG *framing*; all crypto and key-type dispatch is
+// x/crypto/ssh's (pub.Verify dispatches on the key's own algorithm, which is what
+// makes the verifier key-type-agnostic — ed25519 / sk-ssh-ed25519 / rsa / ecdsa).
+// No hand-rolled crypto.
+
+const sshsigMagic = "SSHSIG"
+
+// sshsigBlob is the binary SSHSIG body (after the 6-byte magic). Field order is
+// the SSH wire order — do not reorder.
+type sshsigBlob struct { + Version uint32 + PublicKey string + Namespace string + Reserved string + HashAlgo string + Signature string +} + +func hashByName(n string) (hash.Hash, error) { + switch n { + case "sha256": + return sha256.New(), nil + case "sha512": + return sha512.New(), nil + } + return nil, fmt.Errorf("%w: unsupported SSHSIG hash %q", ErrMalformed, n) +} + +// parseArmoredSSHSIG decodes the `-----BEGIN SSH SIGNATURE-----` armor into the +// SSHSIG body: pem.Decode → strip the literal 6-byte magic (not length-prefixed) +// → ssh.Unmarshal. +func parseArmoredSSHSIG(armored []byte) (*sshsigBlob, error) { + block, _ := pem.Decode(armored) + if block == nil || block.Type != "SSH SIGNATURE" { + return nil, fmt.Errorf("%w: not an SSH SIGNATURE armor", ErrMalformed) + } + if len(block.Bytes) < len(sshsigMagic) || string(block.Bytes[:len(sshsigMagic)]) != sshsigMagic { + return nil, fmt.Errorf("%w: missing SSHSIG magic", ErrMalformed) + } + var sb sshsigBlob + if err := ssh.Unmarshal(block.Bytes[len(sshsigMagic):], &sb); err != nil { + return nil, fmt.Errorf("%w: %v", ErrMalformed, err) + } + if sb.Version != 1 { + return nil, fmt.Errorf("%w: bad SSHSIG version %d", ErrMalformed, sb.Version) + } + return &sb, nil +} + +// signedData recomputes the bytes the signature actually covers, per the SSHSIG +// spec: "SSHSIG" || ssh.Marshal(namespace, reserved, hash_algorithm, H(message)), +// where H is the named hash. The message is the RAW received blob bytes — the +// verifier never canonicalizes (the canonical form is the signer's contract). 
+func signedData(sb *sshsigBlob, msg []byte) ([]byte, error) { + h, err := hashByName(sb.HashAlgo) + if err != nil { + return nil, err + } + h.Write(msg) + md := h.Sum(nil) + body := ssh.Marshal(struct { + Namespace string + Reserved string + HashAlgo string + Hash []byte + }{sb.Namespace, sb.Reserved, sb.HashAlgo, md}) + return append([]byte(sshsigMagic), body...), nil +} diff --git a/internal/authz/testdata/op_blob.json b/internal/authz/testdata/op_blob.json new file mode 100644 index 0000000..eaa0ddf --- /dev/null +++ b/internal/authz/testdata/op_blob.json @@ -0,0 +1 @@ +{"expires_at":"2026-06-09T00:00:00Z","issued_at":"2026-06-08T00:00:00Z","key_id":"felhom-op-1","nonce":"a1b2c3d4e5f60718293a4b5c6d7e8f90","op":"guest_destroy","params":{"purge":true},"target":{"guest_id":"9001","host_id":"demo-felhom"}} \ No newline at end of file diff --git a/internal/authz/testdata/op_blob.sig b/internal/authz/testdata/op_blob.sig new file mode 100644 index 0000000..661685d --- /dev/null +++ b/internal/authz/testdata/op_blob.sig @@ -0,0 +1,6 @@ +-----BEGIN SSH SIGNATURE----- +U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgNXOOuMvD3Fh9MJYspBRWLXyQAd +WVeBICspeB9eL1xfIAAAAMZmVsaG9tLW9wLXYxAAAAAAAAAAZzaGE1MTIAAABTAAAAC3Nz +aC1lZDI1NTE5AAAAQG+bj+GNodNw7cfGYg3HWTDyJiu3g/5Aez1xlZQ540JGUIG9FV7vv8 +wrgN0r+rNh+ytEAM6UTOyI7g3LOjuVJgY= +-----END SSH SIGNATURE----- diff --git a/internal/authz/testdata/operator.pub b/internal/authz/testdata/operator.pub new file mode 100644 index 0000000..345a0b5 --- /dev/null +++ b/internal/authz/testdata/operator.pub @@ -0,0 +1 @@ +ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDVzjrjLw9xYfTCWLKQUVi18kAHVlXgSArKXgfXi9cXy felhom-op-1 diff --git a/internal/authz/verifier.go b/internal/authz/verifier.go new file mode 100644 index 0000000..b9773c4 --- /dev/null +++ b/internal/authz/verifier.go @@ -0,0 +1,191 @@ +package authz + +import ( + "bytes" + "encoding/json" + "fmt" + "log/slog" + "time" + + "golang.org/x/crypto/ssh" +) + +// Namespace is the FIXED SSHSIG domain 
separator. It is a package constant, never +// caller-supplied (phase4 §2.2): a signature minted for any other namespace must +// not verify. +const Namespace = "felhom-op-v1" + +// DefaultClockSkew tolerates operator/host clock drift on the not-yet-valid check +// only (issued_at may be up to this far in the future). Expiry is NOT extended — +// the validity window stays an honest upper bound. +const DefaultClockSkew = 2 * time.Minute + +// KeyRole tags a pinned operator key (doc 04 §3 two-key model). +type KeyRole string + +const ( + // RoleOperational signs ordinary destructive ops (the "master stamp"). + RoleOperational KeyRole = "operational" + // RoleRecovery is the cold key; authorizes ONLY key-rotation/break-glass ops. + // Role-scoping is enforced by the consuming layer (slice 4), not here. + RoleRecovery KeyRole = "recovery" +) + +// AllowedSigner is one pinned operator public key. +type AllowedSigner struct { + KeyID string + Role KeyRole + PublicKey ssh.PublicKey // parsed; allow-list match is by PublicKey.Marshal() + Comment string // from the authorized_keys line, if any +} + +// NewAllowedSigner parses a standard authorized_keys line ("ssh-ed25519 AAAA… [comment]" +// or "sk-ssh-ed25519@openssh.com AAAA… …") into an AllowedSigner with the given id+role. +func NewAllowedSigner(keyID string, role KeyRole, authorizedKeyLine string) (AllowedSigner, error) { + pub, comment, _, _, err := ssh.ParseAuthorizedKey([]byte(authorizedKeyLine)) + if err != nil { + return AllowedSigner{}, fmt.Errorf("authz: parsing pinned key %q: %w", keyID, err) + } + if role != RoleOperational && role != RoleRecovery { + return AllowedSigner{}, fmt.Errorf("authz: pinned key %q has invalid role %q", keyID, role) + } + return AllowedSigner{KeyID: keyID, Role: role, PublicKey: pub, Comment: comment}, nil +} + +// NonceStore records seen nonces for anti-replay. 
SeenOrRecord reports whether the +// nonce was already recorded; if not, it records it (durably, in the host impl) +// before returning false. See noncestore.go. +type NonceStore interface { + SeenOrRecord(nonce string, exp time.Time) (seen bool) +} + +// Verifier authenticates operator-signed destructive ops. Construct with New. +type Verifier struct { + signers []AllowedSigner + store NonceStore + hostID string + + // ClockSkew tolerance for the not-yet-valid check (default DefaultClockSkew). + ClockSkew time.Duration + // Logger, if set, emits a warning when a blob's advisory key_id disagrees with + // the matched signer. Never affects the verdict. + Logger *slog.Logger + + now func() time.Time // injectable for tests +} + +// New builds a Verifier over the pinned signer set, a nonce store, and this box's +// host id. signers is a set (single signer today; quorum is just sizing). +func New(signers []AllowedSigner, store NonceStore, hostID string) *Verifier { + return &Verifier{ + signers: signers, + store: store, + hostID: hostID, + ClockSkew: DefaultClockSkew, + now: func() time.Time { return time.Now().UTC() }, + } +} + +// Verify runs the LOCKED pipeline (phase4 §4 / doc 04 §2.3) and returns the +// authenticated op. Order is load-bearing and each post-crypto stage rejects even +// with an otherwise-valid signature: +// +// parse armor → namespace → parse pubkey → allow-list (by key MATERIAL, not +// key_id) → crypto verify (over the RAW received blob bytes) → parse blob → +// target → time window → nonce SeenOrRecord (LAST) +// +// The nonce is recorded last, so an invalid signature can never consume a nonce +// (DoS / replay-priming safe). Errors wrap the typed sentinels in errors.go. +func (v *Verifier) Verify(blob, sigArmored []byte) (*VerifiedOp, error) { + // 1. parse armor + sb, err := parseArmoredSSHSIG(sigArmored) + if err != nil { + return nil, err + } + + // 2.
namespace (fixed domain separator) + if sb.Namespace != Namespace { + return nil, fmt.Errorf("%w: got %q want %q", ErrNamespace, sb.Namespace, Namespace) + } + + // 3. parse the embedded public key + pub, err := ssh.ParsePublicKey([]byte(sb.PublicKey)) + if err != nil { + return nil, fmt.Errorf("%w: parsing signature public key: %v", ErrMalformed, err) + } + + // 4. allow-list match by KEY MATERIAL (pub.Marshal equality) — NOT by key_id + matched, ok := v.matchSigner(pub) + if !ok { + return nil, ErrUnknownSigner + } + + // 5. crypto verify over the RAW received bytes (never re-serialized) + signed, err := signedData(sb, blob) + if err != nil { + return nil, err + } + var inner ssh.Signature + if err := ssh.Unmarshal([]byte(sb.Signature), &inner); err != nil { + return nil, fmt.Errorf("%w: %v", ErrMalformed, err) + } + if err := pub.Verify(signed, &inner); err != nil { // dispatches on the key's algorithm + return nil, fmt.Errorf("%w: %v", ErrBadSignature, err) + } + + // 6. parse the (now authenticated) blob bytes + var op OpBlob + if err := json.Unmarshal(blob, &op); err != nil { + return nil, fmt.Errorf("%w: decoding op blob: %v", ErrMalformed, err) + } + + // 7. target binding — host must be this box. guest_id is surfaced, not matched + // here (the verifier doesn't enumerate guests; the caller routes by it). + if op.Target.HostID != v.hostID { + return nil, fmt.Errorf("%w: blob host_id=%q this=%q", ErrTarget, op.Target.HostID, v.hostID) + } + + // 8. time window (clock-skew tolerance on not-yet-valid only) + now := v.now() + if now.Before(op.IssuedAt.Add(-v.ClockSkew)) { + return nil, fmt.Errorf("%w: issued_at=%s now=%s", ErrNotYetValid, op.IssuedAt, now) + } + if now.After(op.ExpiresAt) { + return nil, fmt.Errorf("%w: expires_at=%s now=%s", ErrExpired, op.ExpiresAt, now) + } + + // 9. nonce LAST — only now is it durably recorded. 
+ if v.store.SeenOrRecord(op.Nonce, op.ExpiresAt) { + return nil, fmt.Errorf("%w: nonce %s", ErrReplay, op.Nonce) + } + + // advisory key_id audit (never a verdict input) + keyIDMatches := op.KeyID == matched.KeyID + if !keyIDMatches && v.Logger != nil { + v.Logger.Warn("authz: blob key_id does not match the matched signer (advisory)", + "blob_key_id", op.KeyID, "matched_signer", matched.KeyID) + } + + return &VerifiedOp{ + Op: op.Op, + HostID: op.Target.HostID, + GuestID: op.Target.GuestID, + Params: op.Params, + Nonce: op.Nonce, + IssuedAt: op.IssuedAt, + ExpiresAt: op.ExpiresAt, + KeyID: op.KeyID, + Signer: matched, + KeyIDMatchesSigner: keyIDMatches, + }, nil +} + +func (v *Verifier) matchSigner(pub ssh.PublicKey) (AllowedSigner, bool) { + pm := pub.Marshal() + for _, s := range v.signers { + if s.PublicKey != nil && bytes.Equal(s.PublicKey.Marshal(), pm) { + return s, true + } + } + return AllowedSigner{}, false +} diff --git a/internal/authz/verify_test.go b/internal/authz/verify_test.go new file mode 100644 index 0000000..5f4422a --- /dev/null +++ b/internal/authz/verify_test.go @@ -0,0 +1,248 @@ +package authz + +import ( + "errors" + "os" + "path/filepath" + "testing" + "time" + + "golang.org/x/crypto/ssh" +) + +// fixed reference instant used across in-Go tests (deterministic time window). +var refNow = time.Date(2026, 6, 8, 12, 0, 0, 0, time.UTC) + +func atRefNow(v *Verifier) *Verifier { v.now = func() time.Time { return refNow }; return v } + +// rejects asserts a Verify error matches the expected sentinel. +func rejects(t *testing.T, err, want error) { + t.Helper() + if !errors.Is(err, want) { + t.Fatalf("want %v, got %v", want, err) + } +} + +// signerSet builds a one-key operational allow-list around an ssh.PublicKey. +func signerSet(pub ssh.PublicKey, keyID string) []AllowedSigner { + return []AllowedSigner{{KeyID: keyID, Role: RoleOperational, PublicKey: pub}} +} + +// validBlob is an op blob valid at refNow. 
+func validBlob(host, guest, keyID, nonce string) []byte { + return canonicalBlob("guest_destroy", host, guest, keyID, nonce, `{"purge":true}`, + refNow.Add(-time.Hour), refNow.Add(time.Hour)) +} + +// --- Real OpenSSH interop: committed ssh-keygen fixture --- + +func TestVerify_RealSSHKeygenFixture(t *testing.T) { + blob := readFile(t, "testdata/op_blob.json") + sig := readFile(t, "testdata/op_blob.sig") + pubLine := readFile(t, "testdata/operator.pub") + + signer, err := NewAllowedSigner("felhom-op-1", RoleOperational, string(pubLine)) + if err != nil { + t.Fatalf("NewAllowedSigner: %v", err) + } + v := New([]AllowedSigner{signer}, NewMemoryNonceStore(), "demo-felhom") + v.now = func() time.Time { return time.Date(2026, 6, 8, 12, 0, 0, 0, time.UTC) } // inside fixture window + + op, err := v.Verify(blob, sig) + if err != nil { + t.Fatalf("real fixture did not verify: %v", err) + } + if op.Op != "guest_destroy" || op.HostID != "demo-felhom" || op.GuestID != "9001" { + t.Errorf("unexpected op: %+v", op) + } + if op.KeyID != "felhom-op-1" || !op.KeyIDMatchesSigner { + t.Errorf("key_id audit wrong: %q matches=%v", op.KeyID, op.KeyIDMatchesSigner) + } +} + +// --- Happy path (in-Go ed25519) --- + +func TestVerify_HappyPath(t *testing.T) { + pub, sign := newEd25519Signer(t) + blob := validBlob("demo-felhom", "9001", "op", "n-happy-0001") + sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign) + + v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom")) + op, err := v.Verify(blob, sig) + if err != nil { + t.Fatalf("Verify: %v", err) + } + if op.Op != "guest_destroy" || op.Signer.KeyID != "op" { + t.Errorf("op = %+v", op) + } +} + +// --- Per-stage rejection, each with an otherwise-valid signature --- + +func TestVerify_RejectsPerStage(t *testing.T) { + pub, sign := newEd25519Signer(t) + other, _ := newEd25519Signer(t) + + t.Run("wrong namespace", func(t *testing.T) { + blob := validBlob("demo-felhom", "9001", "op", "n-ns-1") + sig := 
mintArmor(t, pub.Marshal(), "felhom-op-wrong", "sha512", blob, sign) + v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom")) + _, err := v.Verify(blob, sig) + rejects(t, err, ErrNamespace) + }) + + t.Run("signer not in set", func(t *testing.T) { + blob := validBlob("demo-felhom", "9001", "op", "n-unk-1") + sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign) + v := atRefNow(New(signerSet(other, "other"), NewMemoryNonceStore(), "demo-felhom")) + _, err := v.Verify(blob, sig) + rejects(t, err, ErrUnknownSigner) + }) + + t.Run("tampered blob (crypto)", func(t *testing.T) { + blob := validBlob("demo-felhom", "9001", "op", "n-tamper-1") + sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign) + tampered := append([]byte{}, blob...) + tampered[len(tampered)-2] = '!' // mutate inside the JSON + v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom")) + _, err := v.Verify(tampered, sig) + rejects(t, err, ErrBadSignature) + }) + + t.Run("retargeted host", func(t *testing.T) { + blob := validBlob("other-host", "9001", "op", "n-target-1") + sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign) + v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom")) + _, err := v.Verify(blob, sig) + rejects(t, err, ErrTarget) + }) + + t.Run("expired", func(t *testing.T) { + blob := canonicalBlob("guest_destroy", "demo-felhom", "9001", "op", "n-exp-1", "{}", + refNow.Add(-2*time.Hour), refNow.Add(-time.Hour)) + sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign) + v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom")) + _, err := v.Verify(blob, sig) + rejects(t, err, ErrExpired) + }) + + t.Run("not yet valid", func(t *testing.T) { + blob := canonicalBlob("guest_destroy", "demo-felhom", "9001", "op", "n-nyv-1", "{}", + refNow.Add(time.Hour), refNow.Add(2*time.Hour)) + sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign) + v := 
atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom")) + _, err := v.Verify(blob, sig) + rejects(t, err, ErrNotYetValid) + }) + + t.Run("replay", func(t *testing.T) { + blob := validBlob("demo-felhom", "9001", "op", "n-replay-1") + sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign) + v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom")) + if _, err := v.Verify(blob, sig); err != nil { + t.Fatalf("first use: %v", err) + } + _, err := v.Verify(blob, sig) + rejects(t, err, ErrReplay) + }) +} + +// --- THE anti-replay invariant: an invalid-sig attempt must NOT burn the nonce --- + +func TestVerify_InvalidSigDoesNotBurnNonce(t *testing.T) { + pub, sign := newEd25519Signer(t) + store := NewMemoryNonceStore() + const nonce = "n-not-burned-cafe" + + blobV := validBlob("demo-felhom", "9001", "op", nonce) + validSig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blobV, sign) + + // Attacker reuses the SAME nonce but a signature that fails crypto (valid key, + // signed over different bytes) — passes namespace + allow-list, fails at the + // crypto stage, which is BEFORE the nonce stage. + badSig := mintArmor(t, pub.Marshal(), Namespace, "sha512", []byte(`{"different":"bytes"}`), sign) + + v := atRefNow(New(signerSet(pub, "op"), store, "demo-felhom")) + if _, err := v.Verify(blobV, badSig); !errors.Is(err, ErrBadSignature) { + t.Fatalf("invalid attempt: want ErrBadSignature, got %v", err) + } + // The genuine valid op with the same nonce must still succeed — proving the + // failed attempt did NOT burn the nonce (nonce-recorded-last). 
+ if _, err := v.Verify(blobV, validSig); err != nil { + t.Fatalf("valid op after invalid attempt should succeed, got %v", err) + } +} + +// --- Persistence across restart (durable nonce store) --- + +func TestVerify_ReplayRejectedAcrossRestart(t *testing.T) { + pub, sign := newEd25519Signer(t) + blob := validBlob("demo-felhom", "9001", "op", "n-persist-1") + sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign) + path := filepath.Join(t.TempDir(), "nonces.log") + + store1, err := OpenFileNonceStore(path) + if err != nil { + t.Fatal(err) + } + v1 := atRefNow(New(signerSet(pub, "op"), store1, "demo-felhom")) + if _, err := v1.Verify(blob, sig); err != nil { + t.Fatalf("first use: %v", err) + } + if err := store1.Close(); err != nil { + t.Fatal(err) + } + + // Fresh store + verifier over the SAME path — simulates an agent restart. + store2, err := OpenFileNonceStore(path) + if err != nil { + t.Fatal(err) + } + defer store2.Close() + v2 := atRefNow(New(signerSet(pub, "op"), store2, "demo-felhom")) + _, err = v2.Verify(blob, sig) + rejects(t, err, ErrReplay) +} + +// --- Key-type-agnostic: synthetic FIDO2 sk-ssh-ed25519 through the unchanged path --- + +func TestVerify_KeyTypeAgnostic_SK(t *testing.T) { + skPub, skSign := newSyntheticSKSigner(t) + blob := validBlob("demo-felhom", "9001", "op", "n-sk-1") + sig := mintArmor(t, skPub.Marshal(), Namespace, "sha512", blob, skSign) + + v := atRefNow(New(signerSet(skPub, "op"), NewMemoryNonceStore(), "demo-felhom")) + op, err := v.Verify(blob, sig) + if err != nil { + t.Fatalf("sk verify through unchanged path failed: %v", err) + } + if op.Op != "guest_destroy" { + t.Errorf("op = %q", op.Op) + } +} + +// --- Byte-exactness: a re-serialized blob is NOT re-canonicalized (fails crypto) --- + +func TestVerify_ByteExactNoRecanonicalization(t *testing.T) { + pub, sign := newEd25519Signer(t) + blob := validBlob("demo-felhom", "9001", "op", "n-bytes-1") + sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, 
sign) + + // Same fields, different whitespace + key order — what a non-identical producer + // canonicalizer would emit. The verifier verifies raw bytes, so this fails crypto. + reserialized := []byte(`{ "op":"guest_destroy", "target":{"host_id":"demo-felhom","guest_id":"9001"}, "params":{"purge":true}, "nonce":"n-bytes-1", "issued_at":"` + + refNow.Add(-time.Hour).Format(time.RFC3339) + `", "expires_at":"` + refNow.Add(time.Hour).Format(time.RFC3339) + `", "key_id":"op" }`) + + v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom")) + _, err := v.Verify(reserialized, sig) + rejects(t, err, ErrBadSignature) +} + +func readFile(t *testing.T, path string) []byte { + t.Helper() + b, err := os.ReadFile(path) + if err != nil { + t.Fatal(err) + } + return b +} diff --git a/internal/config/config.go b/internal/config/config.go index 723c0a1..54a3309 100644 --- a/internal/config/config.go +++ b/internal/config/config.go @@ -17,14 +17,38 @@ import ( "strings" ) -// Config is the agent configuration. Only the fields the proxmox interaction -// layer needs are present in this slice. +// Config is the agent configuration. type Config struct { Proxmox ProxmoxConfig `json:"proxmox"` Privileged PrivilegedConfig `json:"privileged"` + Authz AuthzConfig `json:"authz"` LogLevel string `json:"log_level"` // debug|info|warn|error (default info) } +// AuthzConfig configures operator-signed-op verification (internal/authz). The +// pinned operator public keys are kept here as raw authorized_keys-style lines +// (this package stays dependency-free); the authz package parses them into its +// AllowedSigner set. Role-scoping (recovery keys authorize only key-rotation) is +// enforced by the consuming layer, not loaded here. +type AuthzConfig struct { + // NonceStorePath is the durable, crash-safe nonce log (anti-replay). Must be on + // persistent host storage so replay protection survives agent restarts. 
+ NonceStorePath string `json:"nonce_store_path"` + // Signers are the pinned operator public keys (doc 04 §3 two-key model). + Signers []SignerKey `json:"signers"` +} + +// SignerKey is one pinned operator public key. +type SignerKey struct { + KeyID string `json:"key_id"` + // Role is "operational" (signs destructive ops) or "recovery" (cold key; + // authorizes only key-rotation/break-glass). + Role string `json:"role"` + // PublicKey is a standard authorized_keys line, e.g. + // "ssh-ed25519 AAAA… felhom-op-1" or "sk-ssh-ed25519@openssh.com AAAA… …". + PublicKey string `json:"public_key"` +} + // ProxmoxConfig configures the API client. type ProxmoxConfig struct { // Endpoint defaults to https://127.0.0.1:8006 (agent runs on the host). @@ -62,6 +86,7 @@ func Default() Config { return Config{ Proxmox: ProxmoxConfig{Endpoint: "https://127.0.0.1:8006"}, Privileged: PrivilegedConfig{Mode: "sudo"}, + Authz: AuthzConfig{NonceStorePath: "/var/lib/felhom-agent/nonces.log"}, LogLevel: "info", } }
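The `AuthzConfig` block added to `internal/config` can be exercised on its own. What follows is a minimal sketch, not code from this change: the two structs mirror the diff, while `validateAuthz`, `parseAuthz`, and the `sample` document are hypothetical names introduced only for illustration. The role check repeats the rule `NewAllowedSigner` enforces; parsing the key line itself is left to the authz package, so the sketch stays stdlib-only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SignerKey and AuthzConfig mirror the structs added to internal/config.
type SignerKey struct {
	KeyID     string `json:"key_id"`
	Role      string `json:"role"`
	PublicKey string `json:"public_key"`
}

type AuthzConfig struct {
	NonceStorePath string      `json:"nonce_store_path"`
	Signers        []SignerKey `json:"signers"`
}

// validateAuthz is a hypothetical helper (an assumption, not part of the diff).
// It rejects an empty nonce-store path and any role other than the two the
// diff documents: "operational" and "recovery".
func validateAuthz(c AuthzConfig) error {
	if c.NonceStorePath == "" {
		return fmt.Errorf("authz: nonce_store_path is empty")
	}
	for _, s := range c.Signers {
		if s.Role != "operational" && s.Role != "recovery" {
			return fmt.Errorf("authz: signer %q has invalid role %q", s.KeyID, s.Role)
		}
	}
	return nil
}

// parseAuthz (also hypothetical) decodes the JSON fragment and validates it.
func parseAuthz(raw []byte) (AuthzConfig, error) {
	var c AuthzConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, err
	}
	return c, validateAuthz(c)
}

// sample reuses the default path from Default() and the key line from
// testdata/operator.pub.
const sample = `{
  "nonce_store_path": "/var/lib/felhom-agent/nonces.log",
  "signers": [
    {
      "key_id": "felhom-op-1",
      "role": "operational",
      "public_key": "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDVzjrjLw9xYfTCWLKQUVi18kAHVlXgSArKXgfXi9cXy felhom-op-1"
    }
  ]
}`

func main() {
	c, err := parseAuthz([]byte(sample))
	if err != nil {
		fmt.Println("invalid authz config:", err)
		return
	}
	fmt.Printf("%d signer(s), nonce log at %s\n", len(c.Signers), c.NonceStorePath)
}
```

In the agent itself, each validated `SignerKey` would presumably be passed to `authz.NewAllowedSigner(keyID, role, publicKey)` and the resulting slice handed to `authz.New` together with a `FileNonceStore` opened at `NonceStorePath`; that wiring is not shown in this diff.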