feat(authz): operator signed-op verifier + durable nonce store (slice 2, v0.2.0)

internal/authz: production form of the Phase-4 SSHSIG signing primitive.

- Verifier.New/Verify with the LOCKED pipeline (namespace → allow-list by key
  material → crypto over RAW bytes → target → time → nonce LAST); each post-crypto
  stage rejects even with a valid sig; an invalid sig never burns a nonce.
- SSHSIG framing via x/crypto/ssh (no hand-rolled crypto); key-type-agnostic
  (ed25519 / sk-ssh-ed25519 / rsa / ecdsa via pub.Verify). Fixed namespace
  felhom-op-v1. Typed errors. OpBlob (fixed host_id/guest_id tags) + VerifiedOp.
- NonceStore: MemoryNonceStore + durable crash-safe FileNonceStore (fsync'd append
  log, replay-on-open, compaction, expiry-only pruning; survives restart).
- config.AuthzConfig (nonce path + pinned operational/recovery signer keys).
- Tests (14): real ssh-keygen fixture, per-stage rejection, nonce-not-burned,
  replay, persistence-across-restart, synthetic sk, byte-exactness.

Dep: golang.org/x/crypto v0.52.0 (declares go 1.25 — the Phase-4 doc's "Go 1.24.4 /
x/crypto v0.52.0" pairing doesn't build; build server upgraded to go1.26.0,
backward-compatible). Version 0.1.0 -> 0.2.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 15:23:02 +02:00
parent 43b7e96905
commit f0fee7e193
19 changed files with 1231 additions and 41 deletions
+56
@@ -3,6 +3,62 @@
All notable changes to **felhom-agent** are recorded here. Update on every code
change that gets pushed.
## v0.2.0 — `authz` signed-op verifier (slice 2) (2026-06-08)
Production form of the Phase-4 signing primitive: a key-type-agnostic SSHSIG
verifier for operator-signed destructive ops, with the full anti-replay/
authorization pipeline and a durable, crash-safe nonce store. What slice 4
(reconcile) will call to gate destructive desired-state deltas. No hub, no signing
CLI, no reconcile loop.
### Added
- **`internal/authz` `Verifier`**: `New(signers, store, hostID)` + `Verify(blob,
sigArmored) (*VerifiedOp, error)`. Runs the LOCKED pipeline (order is
load-bearing): parse armor → namespace → parse pubkey → allow-list (by key
**material**, `pub.Marshal()` equality, not key_id) → crypto verify (over the
**raw received bytes**, never re-canonicalized) → parse blob → target → time
window → **nonce recorded LAST**. Each post-crypto stage rejects even with a
valid signature.
- **SSHSIG framing** (`sshsig.go`) via `golang.org/x/crypto/ssh` — `pem.Decode` →
strip 6-byte magic → `ssh.Unmarshal` → `ssh.ParsePublicKey` → recompute signed
data with the named hash → `pub.Verify` (dispatches on key algorithm). No
hand-rolled crypto. Key-type-agnostic: ed25519 / **sk-ssh-ed25519 (FIDO2)** /
rsa / ecdsa via the one path.
- **Fixed namespace** `felhom-op-v1` (package constant, never caller-supplied).
- **`OpBlob`** (corrected `host_id`/`guest_id` json tags) + **`VerifiedOp`** (op,
host/guest, params, key_id, matched signer). key_id is advisory/audit only —
never an authz input.
- **Typed errors**: `ErrMalformed, ErrNamespace, ErrUnknownSigner, ErrBadSignature,
ErrTarget, ErrExpired, ErrNotYetValid, ErrReplay` (errors.Is-friendly).
- **`NonceStore`** + two impls: `MemoryNonceStore` (tests) and **`FileNonceStore`**
— durable, crash-safe (fsync'd append log, replayed into an index on open,
periodic compaction, expiry-only pruning). A nonce is fsync'd to disk before
`SeenOrRecord` returns false; replay protection survives restart; I/O failure
fails safe (reports seen=true). Target generalization: host_id matched strictly,
guest_id surfaced for the caller to route.
- **Config**: `AuthzConfig` (nonce-store path + pinned operator `signers` tagged
`operational`/`recovery` with a key_id, as authorized_keys lines).
- **Version 0.2.0.**
### Tests
- Real OpenSSH interop via a committed `ssh-keygen -Y sign` vector (hermetic CI);
per-stage rejection (each with an otherwise-valid sig); the headline
**invalid-sig-does-not-burn-the-nonce** invariant; replay; **persistence across
restart**; synthetic **sk-ssh-ed25519** through the unchanged path; byte-exactness
(a re-serialized blob fails crypto — not re-canonicalized).
### Notes / corrections to the Phase-4 reference
- §7's `Target` lacked json tags (`host_id`/`guest_id`) — fixed.
- The doc paired "Go 1.24.4 / x/crypto v0.52.0", but v0.52.0 declares `go 1.25.0`
and does **not** build on Go 1.24. Resolved by upgrading the build server to
go1.26.0 (backward-compatible; felhom-controller/hub unaffected); the module is
`go 1.25.0` on x/crypto v0.52.0.
- Free function → constructed `Verifier`; returns the full `VerifiedOp`; typed
errors; clock-skew tolerance added; durable nonce store is the net-new work.
- **Shared-contract dependency flagged** (not built): the hub and the `felhom-sign`
CLI must emit byte-identical canonical JSON or signatures won't verify; a shared
canonicalizer both import would be the right home.
## v0.1.0 — Scaffold + `proxmox` interaction layer (slice 1) (2026-06-08)
First slice: stand up the host-agent project and its foundation — the typed
+60 -37
@@ -3,51 +3,74 @@
> This file holds the report for the **most recent** change, fully overwritten each task.
> Cumulative history lives in [CHANGELOG.md](CHANGELOG.md).
-## Task: Agent scaffold + `proxmox` interaction package (slice 1) — v0.1.0
-Stood up the host-agent project and its foundation — the typed `proxmox` interaction
-layer every other agent module will call — with a runnable read-only `--selftest`.
-Pushed to `main` (main-only repo). Build/vet/test green; verified live against the demo host.
-### Public surface
-**`proxmox.Client`** (API backend):
-- Read: `Version`, `Nodes`, `NodeStatus`, `ListLXC`, `GuestStatus`, `GuestConfig`, `ListStorage`, `NodeStorage`, `StorageContent`
-- Async mutating (return a UPID): `RestoreLXC` (primary create path), `Vzdump`, `Snapshot`, `Rollback`, `DeleteSnapshot`, `SetConfig`, `Start`, `Stop`
-- Tasks: `WaitTask(ctx, upid, WaitOptions)`, `TaskStatusOnce`, `TaskLogTail`
-- Errors: `*APIError` (parses the offending privilege from a 403), `*TaskError` (parses it from a failed task `exitstatus` + log tail)
-- Types: `Version, Node, NodeStatus, Guest, GuestConfig (+Extra/MountPoints/Nets), Storage, StorageContent, TaskStatus, UPID`
-**`proxmox.Privileged`** (fenced root-CLI; `Runner` iface, `ExecRunner` direct/`sudo -n`): `CreateGoldenLXC` (keyctl), `MountUSBByUUID`, `SMART`, `Sensors` — each documents *why it can't be the API*.
-### API-vs-root routing table
-| Backend | Ops | Why |
-|---|---|---|
-| **API** | node status, list/status/config guests, storage list+content, task status/log, **restore**, vzdump, snapshot/rollback/delete-snap, set-config, start/stop | FelhomAgent 16-priv token |
-| **root-CLI (fenced)** | golden `pct create` (keyctl=1), USB mount-by-UUID/fstab, SMART/sensors | keyctl is `root@pam`-only; host mounts + SMART aren't API ops |
-Fence is **structural** (`Client` has no runner, `Privileged` has no HTTP client) and asserted in `routing_test.go`.
-### OPEN-item choices
-- **Config:** JSON file + `FELHOM_AGENT_*` env overrides (stdlib, zero-dep; swappable to `yaml.v3` if YAML house-style is preferred). Token never logged (`Redacted()`).
-- **Privileged runner / uid:** `Runner` iface; `ExecRunner{Mode: sudo|direct}`, default `sudo -n`. Proposed (not finalized): non-root service user + narrow sudoers allowlist for the 3 fenced commands.
-- **Polling:** first poll immediate, then 1s → exponential backoff capped 5s, default total timeout 10m; honors ctx cancellation. Tunable via `WaitOptions`.
-- **`--selftest=task`:** included (gated behind the flag + `-vmid`). Unit-tested via mocks; not run live (the live token was read-only).
-- **Versioning:** `version` var in `main.go` (default `0.1.0`, `-ldflags -X main.version=`), `--version` flag.
-### What the live host revealed (recorded, not guessed)
-- Node name is **`demo-felhom`**; `felhom-pve` is only the SSH alias.
-- `/nodes/{node}/status`: `cpu` is a 0..1 fraction, **`loadavg` is an array of strings**; `memory`/`rootfs`/`swap` nested.
-- `vmid` is an **integer** in list/status; `status/current` carries no `vmid` (set from the path arg).
-- Task: `status` ∈ {running, stopped}, `exitstatus` only once stopped; task log is `[{"n":N,"t":"…"}]`. UPID = `UPID:node:pid(hex):pstart(hex):starttime(hex):worker:id:user:`.
-- `pveum user token add … --output-format json` returns `{"value":"…"}`.
-- **No spike fact failed in practice** — 16-priv role, async/UPID model, keyctl boundary, dual-grant privsep all held. Teardown logged `ignore invalid acl token …`, confirming ACL auto-invalidation (phase1-2 §5).
+## Task: `authz` signed-op verifier (slice 2) — v0.2.0
+Turned the Phase-4 reference `VerifySignedOp` into a production package
+(`internal/authz`): a key-type-agnostic SSHSIG verifier for operator-signed destructive
+ops, the full anti-replay/authorization pipeline, and a durable, crash-safe nonce store.
+This is what slice 4 (reconcile) calls to gate destructive desired-state deltas. Pushed to
+`main`. Build/vet/test green locally (Go 1.26) and on the build server.
+### Public surface (`internal/authz`)
+- **`Verifier`** — `New(signers []AllowedSigner, store NonceStore, hostID string) *Verifier`;
+`Verify(blob, sigArmored []byte) (*VerifiedOp, error)`. Optional `ClockSkew` (default 2m,
+not-yet-valid only) and `Logger` (advisory key_id-mismatch warning).
+- **`OpBlob`** — canonical signed object; `Target{HostID,GuestID}` with corrected
+`host_id`/`guest_id` json tags; `Params json.RawMessage`, `Nonce`, `IssuedAt`, `ExpiresAt`, `KeyID`.
+- **`VerifiedOp`** — `Op, HostID, GuestID, Params, Nonce, IssuedAt, ExpiresAt, KeyID (advisory),
+Signer (matched), KeyIDMatchesSigner`.
+- **`AllowedSigner`** + `NewAllowedSigner(keyID, role, authorizedKeyLine)`; roles
+`RoleOperational` / `RoleRecovery` (doc 04 two-key model; role-scoping enforced by the caller).
+- **`NonceStore`** interface + `MemoryNonceStore` (tests) and **`FileNonceStore`** (durable).
+- **Typed errors**: `ErrMalformed, ErrNamespace, ErrUnknownSigner, ErrBadSignature, ErrTarget,
+ErrExpired, ErrNotYetValid, ErrReplay` (errors.Is-friendly).
+- **Config**: `config.AuthzConfig` (nonce-store path + pinned `Signers`).
+### Locked pipeline (order load-bearing)
+`parse armor → namespace (fixed felhom-op-v1) → parse pubkey → allow-list by key MATERIAL (not
+key_id) → crypto verify over RAW received bytes → parse blob → target (host strict, guest
+surfaced) → time window → nonce recorded LAST`. Each post-crypto stage rejects even with a
+valid signature; an invalid signature can never consume a nonce.
+### Durable nonce store — mechanism & guarantee
+fsync'd append-only JSONL log + in-memory index (replayed on open) + periodic compaction.
+- **Crash-safe**: a nonce is written and `fsync`'d before `SeenOrRecord` returns `false`, so the
+caller acts only *after* the durable record. A crash between verify and execute drops the op
+(fail-safe) and never enables a replay. I/O failure → returns seen=true (op not executed).
+- **Survives restart**: the log is replayed into the index on `OpenFileNonceStore`.
+- **Pruning**: expired nonces dropped only at compaction (never before exp) — and an expired op
+is rejected by the time check before the nonce check, so pruning is housekeeping, not a hole.
+- **Concurrency-safe**: single mutex over file handle + index.
+### OPEN choices
+- **Clock skew**: 2-minute tolerance on *not-yet-valid* only; expiry not extended (window stays an
+honest bound).
+- **Durable mechanism**: fsync'd append log + compaction (simple, honest, no embedded-KV dep).
+- **Fixtures**: committed real `ssh-keygen -Y sign` vector (hermetic + proves OpenSSH interop) +
+in-Go minting for rejection cases; the sk case is synthetic (spec-faithful, no hardware).
+- **Package name**: `authz` (control-plane-authorization layer, matches doc 04).
+### Test matrix (all pass — 14 tests)
+Real ssh-keygen fixture · happy path · per-stage rejection {namespace, unknown-signer, tampered,
+retargeted-host, expired, not-yet-valid, replay} · **invalid-sig-does-NOT-burn-nonce** (then the
+valid op with that nonce still succeeds) · replay-rejected-across-restart (durable store) ·
+key-type-agnostic synthetic **sk-ssh-ed25519** · byte-exactness (re-serialized blob fails crypto).
+### Corrections to the Phase-4 §7 reference (for production)
+- `Target` needed `host_id`/`guest_id` json tags — fixed.
+- **The doc's "Go 1.24.4 / x/crypto v0.52.0" does not hold**: x/crypto v0.52.0 declares
+`go 1.25.0` and won't build on Go 1.24. Resolved by upgrading the build server to **go1.26.0**
+(backward-compatible — felhom-controller/hub build unchanged; distro Go package left intact,
+upstream Go fronted on PATH).
+- Free function → constructed `Verifier`; returns full `VerifiedOp`; typed errors; clock-skew;
+durable nonce store (the net-new engineering).
+- **Shared-contract flag (not built)**: the hub and `felhom-sign` CLI must produce byte-identical
+canonical JSON or signatures won't verify; a shared canonicalizer both import is the right home.
### Verification
-- `go build/vet/test` green twice: locally (Go 1.26) and on the build server (Go 1.24.4).
-- **Live read-only `--selftest`** (built on 192.168.0.180, against `https://192.168.0.162:8006`, **TLS fingerprint-pinned** — no insecure mode): version, nodes, node status, guests, storage all `[ ok ]`. slog confirmed the token rendered as `…=********`. Throwaway token created + torn down.
-- Mutating ops + live `WaitTask` are unit-tested only (live run used a read-only token); `--selftest=task` is ready to exercise them against a real `FelhomAgent` token.
+- `go build/vet/test` green locally (go1.26.0) and on the build server (upgraded to go1.26.0).
+- Real OpenSSH `ssh-keygen` (OpenSSH 10.0p2) minted the committed fixture and self-verified it
+before commit.
### Repo state
-- Branch: `main` only (feature branch merged + deleted, local & remote). Latest: `chore(agent): add CHANGELOG, version the agent at 0.1.0`.
+- Branch: `main` only. Dep: `golang.org/x/crypto v0.52.0` (+ `x/sys` indirect); `go 1.25.0`.
+1 -1
@@ -24,7 +24,7 @@ import (
// version is the agent version. Overridable at build time with
// -ldflags "-X main.version=<v>"; defaults to the in-repo CHANGELOG version.
-var version = "0.1.0"
+var version = "0.2.0"
func main() {
var (
+5 -1
@@ -1,3 +1,7 @@
module gitea.dooplex.hu/admin/felhom-agent
-go 1.24
+go 1.25.0
+require golang.org/x/crypto v0.52.0
+require golang.org/x/sys v0.45.0 // indirect
+6
@@ -0,0 +1,6 @@
golang.org/x/crypto v0.52.0 h1:RMs7fP2rXdep0CftQlK8Uf+kibLm7qkCcradZWYz988=
golang.org/x/crypto v0.52.0/go.mod h1:1QgfPxDqh0T2M/elOJtp9RvuR95kVjir0e6/BvEmGbc=
golang.org/x/sys v0.45.0 h1:dO4czNzziLiiXplLQgBCEpCvXQ3dnkn0SdaZSYdQ+FY=
golang.org/x/sys v0.45.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/term v0.43.0 h1:S4RLU2sB31O/NCl+zFN9Aru9A/Cq2aqKpTZJ6B+DwT4=
golang.org/x/term v0.43.0/go.mod h1:lrhlHNdQJHO+1qVYiHfFKVuVioJIheAc3fBSMFYEIsk=
+51
@@ -0,0 +1,51 @@
package authz
import (
"encoding/json"
"time"
)
// Target binds an op to a specific box (and optionally a guest) — the anti-retarget
// field. The §7 reference omitted the json tags; production needs them so the
// signed canonical bytes decode correctly.
type Target struct {
HostID string `json:"host_id"`
GuestID string `json:"guest_id"`
}
// OpBlob is the canonical signed object (phase4 §2). The signature covers the
// EXACT bytes of this object's canonical JSON (keys sorted at every level, no
// insignificant whitespace, no trailing newline, UTF-8) — produced by the
// operator CLI / hub, verified here over the raw received bytes.
type OpBlob struct {
Op string `json:"op"`
Target Target `json:"target"`
Params json.RawMessage `json:"params"`
Nonce string `json:"nonce"`
IssuedAt time.Time `json:"issued_at"`
ExpiresAt time.Time `json:"expires_at"`
KeyID string `json:"key_id"`
}
// VerifiedOp is the authenticated, parsed op returned on success — everything the
// reconcile layer (slice 4) needs to route and execute, not just the op string.
type VerifiedOp struct {
Op string // the operation, e.g. "guest_destroy"
HostID string // target host (== this agent's host)
// GuestID is non-empty for a guest-scoped op; the caller routes by it. "" =
// host-scoped op. The verifier does NOT need to know all guest ids.
GuestID string
Params json.RawMessage
Nonce string
IssuedAt time.Time
ExpiresAt time.Time
// KeyID is the blob's self-declared key id — ADVISORY / audit only, never an
// authz input. Authz is the key-material allow-list match (Signer below).
KeyID string
// Signer is the allow-listed key whose material matched the signature.
Signer AllowedSigner
// KeyIDMatchesSigner is false when the blob's advisory KeyID disagrees with
// the matched signer's id (a benign audit signal, not a rejection).
KeyIDMatchesSigner bool
}
+37
@@ -0,0 +1,37 @@
// Package authz is the control-plane-authorization layer: it verifies
// operator-signed destructive ops before the agent executes them. It is what the
// reconcile loop (slice 4) calls to gate destructive desired-state deltas and
// signed one-shot jobs (03 §4, 04). The signing mechanism is proven (Phase 4,
// 14/14) — this package is its production form: a key-type-agnostic SSHSIG
// verifier, the full anti-replay/authorization pipeline, and a durable,
// crash-safe nonce store.
//
// # Mechanism (LOCKED — do not redesign)
//
// - SSHSIG via golang.org/x/crypto/ssh; no hand-rolled crypto, no raw-Ed25519
// fallback. pub.Verify dispatches on the key's own algorithm, so the same path
// accepts ed25519 / sk-ssh-ed25519 (FIDO2) / rsa / ecdsa — a hardware operator
// key later is a box no-op (Phase 4 §5/§6, doc 04 §7).
// - Fixed namespace felhom-op-v1 (package constant, never caller-supplied).
// - The verifier verifies over the RAW received blob bytes and never
// canonicalizes — the canonical form (sorted-key, whitespace-free JSON) is the
// signer's contract, shared by the hub and the felhom-sign CLI.
//
// # Pipeline order (load-bearing — Verify)
//
// parse armor → namespace → parse pubkey → allow-list (by key MATERIAL, not
// key_id) → crypto verify → parse blob → target → time window → nonce LAST
//
// Each post-crypto stage rejects even with an otherwise-valid signature. The nonce
// is recorded last, so an invalid signature can never consume a nonce. key_id is
// advisory/audit only — authz is the key-material allow-list match.
//
// # Shared-contract dependency (flag for later, not built here)
//
// Signatures only verify if the op-generator (hub) and the felhom-sign CLI produce
// BYTE-IDENTICAL canonical JSON (keys sorted at every level, no insignificant
// whitespace, no trailing newline, UTF-8 — Phase 4 §2). The verifier deliberately
// does NOT re-canonicalize, so a divergence between those two producers surfaces as
// a crypto failure here. A shared canonicalizer that both import would be the right
// home for that contract; it is out of scope for this slice.
package authz
+27
@@ -0,0 +1,27 @@
package authz
import "errors"
// Typed rejection sentinels — one per pipeline stage so the reconcile layer can
// distinguish "rejected" (a real signed op that failed a check) from "malformed"
// from a future "not yet signed". All are errors.Is-friendly: Verify wraps them
// with %w plus context.
var (
// ErrMalformed: the armor/SSHSIG/blob could not be parsed (not a rejection of
// a well-formed op — bad input).
ErrMalformed = errors.New("authz: malformed signature or blob")
// ErrNamespace: SSHSIG namespace != the fixed felhom-op-v1 domain separator.
ErrNamespace = errors.New("authz: namespace mismatch")
// ErrUnknownSigner: the signing key's material is not in the pinned allow-list.
ErrUnknownSigner = errors.New("authz: signer not in allowed set")
// ErrBadSignature: cryptographic verification failed (tamper / wrong key).
ErrBadSignature = errors.New("authz: signature did not verify")
// ErrTarget: target.host_id is not this box.
ErrTarget = errors.New("authz: target mismatch")
// ErrExpired: now > expires_at.
ErrExpired = errors.New("authz: op expired")
// ErrNotYetValid: now < issued_at (minus clock-skew tolerance).
ErrNotYetValid = errors.New("authz: op not yet valid")
// ErrReplay: the nonce was already recorded in the window.
ErrReplay = errors.New("authz: replay (nonce already seen)")
)
+107
@@ -0,0 +1,107 @@
package authz
import (
"crypto/ed25519"
"crypto/rand"
"crypto/sha256"
"encoding/binary"
"encoding/pem"
"fmt"
"testing"
"time"
"golang.org/x/crypto/ssh"
)
// Test helpers that MINT armored SSHSIGs in-Go (hermetic) — the inverse of the
// production framing. They reuse the production signedData()/sshsigBlob so a test
// can never drift from the verifier's notion of the signed bytes.
// canonicalBlob builds an op blob in the §2 canonical field order. (Self-consistent
// for the in-Go path: we sign exactly these bytes and verify the same bytes. The
// committed ssh-keygen fixture exercises real OpenSSH canonical interop.)
func canonicalBlob(op, hostID, guestID, keyID, nonce, paramsJSON string, issued, expires time.Time) []byte {
if paramsJSON == "" {
paramsJSON = "{}"
}
return []byte(fmt.Sprintf(
`{"expires_at":%q,"issued_at":%q,"key_id":%q,"nonce":%q,"op":%q,"params":%s,"target":{"guest_id":%q,"host_id":%q}}`,
expires.UTC().Format(time.RFC3339), issued.UTC().Format(time.RFC3339),
keyID, nonce, op, paramsJSON, guestID, hostID))
}
// mintArmor builds an armored SSHSIG over message, using sign to produce the inner
// ssh.Signature over the recomputed SSHSIG signed-data.
func mintArmor(t *testing.T, pubMarshaled []byte, namespace, hashName string, message []byte, sign func([]byte) ssh.Signature) []byte {
t.Helper()
sb := &sshsigBlob{Version: 1, PublicKey: string(pubMarshaled), Namespace: namespace, Reserved: "", HashAlgo: hashName}
signed, err := signedData(sb, message)
if err != nil {
t.Fatalf("signedData: %v", err)
}
sig := sign(signed)
sb.Signature = string(ssh.Marshal(&sig))
raw := append([]byte(sshsigMagic), ssh.Marshal(sb)...)
return pem.EncodeToMemory(&pem.Block{Type: "SSH SIGNATURE", Bytes: raw})
}
// newEd25519Signer returns an ssh.PublicKey + a sign closure for a fresh ed25519 key.
func newEd25519Signer(t *testing.T) (ssh.PublicKey, func([]byte) ssh.Signature) {
t.Helper()
pub, priv, err := ed25519.GenerateKey(rand.Reader)
if err != nil {
t.Fatal(err)
}
sshPub, err := ssh.NewPublicKey(pub)
if err != nil {
t.Fatal(err)
}
sign := func(signed []byte) ssh.Signature {
return ssh.Signature{Format: ssh.KeyAlgoED25519, Blob: ed25519.Sign(priv, signed)}
}
return sshPub, sign
}
// newSyntheticSKSigner emulates a FIDO2 sk-ssh-ed25519@openssh.com key with NO
// hardware (Phase 4 §5). It builds a spec-faithful sk public key and an sk-format
// signature: ed25519 over sha256(application)‖flags‖counter‖sha256(signed_data),
// sig.Blob = the raw ed25519 signature, sig.Rest = flags‖counter. It must verify
// through the UNCHANGED Verify path.
func newSyntheticSKSigner(t *testing.T) (ssh.PublicKey, func([]byte) ssh.Signature) {
t.Helper()
edPub, edPriv, err := ed25519.GenerateKey(rand.Reader)
if err != nil {
t.Fatal(err)
}
const application = "ssh:"
skBlob := ssh.Marshal(struct {
Name string
KeyBytes []byte
Application string
}{"sk-ssh-ed25519@openssh.com", []byte(edPub), application})
skPub, err := ssh.ParsePublicKey(skBlob)
if err != nil {
t.Fatalf("parse synthetic sk pubkey: %v", err)
}
if skPub.Type() != "sk-ssh-ed25519@openssh.com" {
t.Fatalf("sk pubkey type = %q", skPub.Type())
}
sign := func(signed []byte) ssh.Signature {
const flagUserPresence = byte(0x01) // required, else Verify rejects
const counter = uint32(1)
appDigest := sha256.Sum256([]byte(application))
dataDigest := sha256.Sum256(signed)
// original = appDigest ‖ flags ‖ counter(BE) ‖ dataDigest (x/crypto layout)
var original []byte
original = append(original, appDigest[:]...)
original = append(original, flagUserPresence)
original = binary.BigEndian.AppendUint32(original, counter)
original = append(original, dataDigest[:]...)
edSig := ed25519.Sign(edPriv, original)
// sig.Rest = skFields{Flags, Counter} = flags ‖ counter(BE)
rest := append([]byte{flagUserPresence}, binary.BigEndian.AppendUint32(nil, counter)...)
return ssh.Signature{Format: "sk-ssh-ed25519@openssh.com", Blob: edSig, Rest: rest}
}
return skPub, sign
}
+195
@@ -0,0 +1,195 @@
package authz
import (
"bytes"
"encoding/json"
"errors"
"io/fs"
"os"
"path/filepath"
"sync"
"time"
)
// MemoryNonceStore is a non-durable NonceStore for tests. Replay protection does
// NOT survive process restart — never use it on a real host.
type MemoryNonceStore struct {
mu sync.Mutex
seen map[string]time.Time
}
// NewMemoryNonceStore builds an empty in-memory store.
func NewMemoryNonceStore() *MemoryNonceStore {
return &MemoryNonceStore{seen: make(map[string]time.Time)}
}
// SeenOrRecord reports whether nonce was already recorded, recording it if not.
func (m *MemoryNonceStore) SeenOrRecord(nonce string, exp time.Time) bool {
m.mu.Lock()
defer m.mu.Unlock()
if _, ok := m.seen[nonce]; ok {
return true
}
m.seen[nonce] = exp
return false
}
// FileNonceStore is the durable, crash-safe NonceStore for the host. Mechanism:
// an fsync'd append-only JSONL log with an in-memory index, periodic compaction,
// and expiry-only pruning.
//
// Durability guarantee: a nonce is on disk AND fsync'd before SeenOrRecord returns
// false, so the caller acting on a verified op always does so AFTER the durable
// record. A crash between verify and execute therefore drops the op (fail-safe
// direction) and never enables a replay. Replay protection survives restarts: the
// log is replayed into the index on Open.
//
// Pruning: a nonce is dropped only after its exp (compaction), never before —
// pruning before expiry would reopen the replay window. (An expired nonce can't be
// replayed anyway: the time-window check rejects an expired op before the nonce
// check, so pruning is housekeeping, not an authz hole.)
//
// Concurrency: a single mutex guards the file handle and index (single-process; the
// agent is concurrent — 03 §10).
type FileNonceStore struct {
mu sync.Mutex
path string
f *os.File
idx map[string]time.Time
sinceCompact int
now func() time.Time
// CompactEvery is the append count that triggers a compaction (default 1000).
CompactEvery int
}
type nonceRecord struct {
Nonce string `json:"n"`
Exp time.Time `json:"e"`
}
// OpenFileNonceStore opens (or creates) the durable store at path, replaying any
// existing log into the index.
func OpenFileNonceStore(path string) (*FileNonceStore, error) {
s := &FileNonceStore{
path: path,
idx: make(map[string]time.Time),
now: func() time.Time { return time.Now().UTC() },
CompactEvery: 1000,
}
if err := s.load(); err != nil {
return nil, err
}
f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
if err != nil {
return nil, err
}
s.f = f
syncDir(filepath.Dir(path)) // make a freshly-created file's dir entry durable
return s, nil
}
func (s *FileNonceStore) load() error {
b, err := os.ReadFile(s.path)
if errors.Is(err, fs.ErrNotExist) {
return nil
}
if err != nil {
return err
}
for _, line := range bytes.Split(b, []byte("\n")) {
line = bytes.TrimSpace(line)
if len(line) == 0 {
continue
}
var r nonceRecord
if json.Unmarshal(line, &r) != nil {
continue // skip a torn trailing line from a crash mid-append
}
s.idx[r.Nonce] = r.Exp
}
return nil
}
// SeenOrRecord durably records an unseen nonce before returning false. On any I/O
// failure it returns true (fail-safe: the op is NOT executed rather than risk an
// unrecorded nonce enabling a later replay).
func (s *FileNonceStore) SeenOrRecord(nonce string, exp time.Time) bool {
s.mu.Lock()
defer s.mu.Unlock()
if _, ok := s.idx[nonce]; ok {
return true
}
rec, _ := json.Marshal(nonceRecord{Nonce: nonce, Exp: exp})
rec = append(rec, '\n')
if _, err := s.f.Write(rec); err != nil {
return true
}
if err := s.f.Sync(); err != nil {
return true
}
s.idx[nonce] = exp
s.sinceCompact++
s.maybeCompact()
return false
}
// Close releases the file handle.
func (s *FileNonceStore) Close() error {
s.mu.Lock()
defer s.mu.Unlock()
if s.f != nil {
return s.f.Close()
}
return nil
}
// maybeCompact rewrites the log keeping only non-expired entries once enough
// appends have accrued. Caller holds the mutex. Compaction is housekeeping: the
// recorded nonce is already durable, so a compaction failure never fails the op.
func (s *FileNonceStore) maybeCompact() {
if s.CompactEvery <= 0 || s.sinceCompact < s.CompactEvery {
return
}
s.sinceCompact = 0
now := s.now()
live := make(map[string]time.Time, len(s.idx))
var buf bytes.Buffer
for n, e := range s.idx {
if e.Before(now) {
continue // prune AFTER expiry only — safe
}
live[n] = e
rec, _ := json.Marshal(nonceRecord{Nonce: n, Exp: e})
buf.Write(rec)
buf.WriteByte('\n')
}
tmp := s.path + ".tmp"
if err := os.WriteFile(tmp, buf.Bytes(), 0o600); err != nil {
return // keep using the existing handle; nonce already durable
}
if tf, err := os.OpenFile(tmp, os.O_WRONLY, 0o600); err == nil {
_ = tf.Sync()
_ = tf.Close()
}
if s.f != nil {
_ = s.f.Close()
}
if err := os.Rename(tmp, s.path); err != nil {
s.f, _ = os.OpenFile(s.path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
return
}
syncDir(filepath.Dir(s.path))
s.f, _ = os.OpenFile(s.path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
s.idx = live
}
// syncDir best-effort fsyncs a directory so a create/rename is durable.
func syncDir(dir string) {
if d, err := os.Open(dir); err == nil {
_ = d.Sync()
_ = d.Close()
}
}
+95
@@ -0,0 +1,95 @@
package authz
import (
"os"
"path/filepath"
"testing"
"time"
)
func TestMemoryNonceStore(t *testing.T) {
m := NewMemoryNonceStore()
exp := time.Now().Add(time.Hour)
if m.SeenOrRecord("a", exp) {
t.Fatal("first record should be unseen")
}
if !m.SeenOrRecord("a", exp) {
t.Fatal("second record should be seen")
}
if m.SeenOrRecord("b", exp) {
t.Fatal("distinct nonce should be unseen")
}
}
func TestFileNonceStore_RecordAndReload(t *testing.T) {
path := filepath.Join(t.TempDir(), "nonces.log")
exp := refNow.Add(time.Hour)
s1, err := OpenFileNonceStore(path)
if err != nil {
t.Fatal(err)
}
if s1.SeenOrRecord("dead", exp) {
t.Fatal("first record should be unseen")
}
if err := s1.Close(); err != nil {
t.Fatal(err)
}
// Reopen: the recorded nonce must still be seen (durable across restart).
s2, err := OpenFileNonceStore(path)
if err != nil {
t.Fatal(err)
}
defer s2.Close()
if !s2.SeenOrRecord("dead", exp) {
t.Fatal("nonce not durable across reopen")
}
}
func TestFileNonceStore_CompactionPrunesExpiredOnly(t *testing.T) {
path := filepath.Join(t.TempDir(), "nonces.log")
s, err := OpenFileNonceStore(path)
if err != nil {
t.Fatal(err)
}
s.now = func() time.Time { return refNow }
s.CompactEvery = 2 // force a compaction after two appends
s.SeenOrRecord("expired", refNow.Add(-time.Hour)) // exp in the past
s.SeenOrRecord("live", refNow.Add(time.Hour)) // triggers compaction
if err := s.Close(); err != nil {
t.Fatal(err)
}
// Reopen: the live nonce survived, the expired one was pruned (housekeeping;
// an expired op is rejected by the time check before the nonce check anyway).
s2, err := OpenFileNonceStore(path)
if err != nil {
t.Fatal(err)
}
defer s2.Close()
if !s2.SeenOrRecord("live", refNow.Add(time.Hour)) {
t.Error("live nonce should have survived compaction")
}
if s2.SeenOrRecord("expired", refNow.Add(-time.Hour)) {
t.Error("expired nonce should have been pruned (was still present)")
}
}
func TestFileNonceStore_SkipsTornLine(t *testing.T) {
path := filepath.Join(t.TempDir(), "nonces.log")
// a valid record line + a torn/garbage trailing line from a hypothetical crash
content := `{"n":"good","e":"` + refNow.Add(time.Hour).Format(time.RFC3339Nano) + `"}` + "\n" + `{"n":"tor`
if err := os.WriteFile(path, []byte(content), 0o600); err != nil {
t.Fatal(err)
}
s, err := OpenFileNonceStore(path)
if err != nil {
t.Fatalf("open with torn line should not fail: %v", err)
}
defer s.Close()
if !s.SeenOrRecord("good", refNow.Add(time.Hour)) {
t.Error("valid record before the torn line should have loaded")
}
}
+36
@@ -0,0 +1,36 @@
package authz
import (
"os"
"testing"
)
func TestNewAllowedSigner(t *testing.T) {
line, err := os.ReadFile("testdata/operator.pub")
if err != nil {
t.Fatal(err)
}
s, err := NewAllowedSigner("felhom-op-1", RoleOperational, string(line))
if err != nil {
t.Fatalf("NewAllowedSigner: %v", err)
}
if s.KeyID != "felhom-op-1" || s.Role != RoleOperational || s.PublicKey == nil {
t.Errorf("signer = %+v", s)
}
if s.PublicKey.Type() != "ssh-ed25519" {
t.Errorf("key type = %q", s.PublicKey.Type())
}
}
func TestNewAllowedSigner_BadRole(t *testing.T) {
line, err := os.ReadFile("testdata/operator.pub")
if err != nil {
t.Fatal(err)
}
if _, err := NewAllowedSigner("k", "bogus", string(line)); err == nil {
t.Fatal("invalid role should error")
}
}
func TestNewAllowedSigner_BadLine(t *testing.T) {
if _, err := NewAllowedSigner("k", RoleOperational, "not a key"); err == nil {
t.Fatal("malformed key line should error")
}
}
+81
@@ -0,0 +1,81 @@
package authz
import (
"crypto/sha256"
"crypto/sha512"
"encoding/pem"
"fmt"
"hash"
"golang.org/x/crypto/ssh"
)
// SSHSIG framing — ported verbatim-in-spirit from phase4-signing-findings.md §7.
// The only manual work is SSHSIG *framing*; all crypto and key-type dispatch is
// x/crypto/ssh's (pub.Verify dispatches on the key's own algorithm, which is what
// makes the verifier key-type-agnostic — ed25519 / sk-ssh-ed25519 / rsa / ecdsa).
// No hand-rolled crypto.
const sshsigMagic = "SSHSIG"
// sshsigBlob is the binary SSHSIG body (after the 6-byte magic). Field order is
// the SSH wire order — do not reorder.
type sshsigBlob struct {
Version uint32
PublicKey string
Namespace string
Reserved string
HashAlgo string
Signature string
}
func hashByName(n string) (hash.Hash, error) {
switch n {
case "sha256":
return sha256.New(), nil
case "sha512":
return sha512.New(), nil
}
return nil, fmt.Errorf("%w: unsupported SSHSIG hash %q", ErrMalformed, n)
}
// parseArmoredSSHSIG decodes the `-----BEGIN SSH SIGNATURE-----` armor into the
// SSHSIG body: pem.Decode → strip the literal 6-byte magic (not length-prefixed)
// → ssh.Unmarshal.
func parseArmoredSSHSIG(armored []byte) (*sshsigBlob, error) {
block, _ := pem.Decode(armored)
if block == nil || block.Type != "SSH SIGNATURE" {
return nil, fmt.Errorf("%w: not an SSH SIGNATURE armor", ErrMalformed)
}
if len(block.Bytes) < len(sshsigMagic) || string(block.Bytes[:len(sshsigMagic)]) != sshsigMagic {
return nil, fmt.Errorf("%w: missing SSHSIG magic", ErrMalformed)
}
var sb sshsigBlob
if err := ssh.Unmarshal(block.Bytes[len(sshsigMagic):], &sb); err != nil {
return nil, fmt.Errorf("%w: %v", ErrMalformed, err)
}
if sb.Version != 1 {
return nil, fmt.Errorf("%w: bad SSHSIG version %d", ErrMalformed, sb.Version)
}
return &sb, nil
}
// signedData recomputes the bytes the signature actually covers, per the SSHSIG
// spec: "SSHSIG" || ssh.Marshal(namespace, reserved, hash_algorithm, H(message)),
// where H is the named hash. The message is the RAW received blob bytes — the
// verifier never canonicalizes (the canonical form is the signer's contract).
func signedData(sb *sshsigBlob, msg []byte) ([]byte, error) {
h, err := hashByName(sb.HashAlgo)
if err != nil {
return nil, err
}
h.Write(msg)
md := h.Sum(nil)
body := ssh.Marshal(struct {
Namespace string
Reserved string
HashAlgo string
Hash []byte
}{sb.Namespace, sb.Reserved, sb.HashAlgo, md})
return append([]byte(sshsigMagic), body...), nil
}
+1
@@ -0,0 +1 @@
{"expires_at":"2026-06-09T00:00:00Z","issued_at":"2026-06-08T00:00:00Z","key_id":"felhom-op-1","nonce":"a1b2c3d4e5f60718293a4b5c6d7e8f90","op":"guest_destroy","params":{"purge":true},"target":{"guest_id":"9001","host_id":"demo-felhom"}}
+6
@@ -0,0 +1,6 @@
-----BEGIN SSH SIGNATURE-----
U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgNXOOuMvD3Fh9MJYspBRWLXyQAd
WVeBICspeB9eL1xfIAAAAMZmVsaG9tLW9wLXYxAAAAAAAAAAZzaGE1MTIAAABTAAAAC3Nz
aC1lZDI1NTE5AAAAQG+bj+GNodNw7cfGYg3HWTDyJiu3g/5Aez1xlZQ540JGUIG9FV7vv8
wrgN0r+rNh+ytEAM6UTOyI7g3LOjuVJgY=
-----END SSH SIGNATURE-----
+1
@@ -0,0 +1 @@
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDVzjrjLw9xYfTCWLKQUVi18kAHVlXgSArKXgfXi9cXy felhom-op-1
+191
@@ -0,0 +1,191 @@
package authz
import (
"bytes"
"encoding/json"
"fmt"
"log/slog"
"time"
"golang.org/x/crypto/ssh"
)
// Namespace is the FIXED SSHSIG domain separator. It is a package constant, never
// caller-supplied (phase4 §2.2): a signature minted for any other namespace must
// not verify.
const Namespace = "felhom-op-v1"
// DefaultClockSkew tolerates operator/host clock drift on the not-yet-valid check
// only (issued_at may be up to this far in the future). Expiry is NOT extended —
// the validity window stays an honest upper bound.
const DefaultClockSkew = 2 * time.Minute
// KeyRole tags a pinned operator key (doc 04 §3 two-key model).
type KeyRole string
const (
// RoleOperational signs ordinary destructive ops (the "master stamp").
RoleOperational KeyRole = "operational"
// RoleRecovery is the cold key; authorizes ONLY key-rotation/break-glass ops.
// Role-scoping is enforced by the consuming layer (slice 4), not here.
RoleRecovery KeyRole = "recovery"
)
// AllowedSigner is one pinned operator public key.
type AllowedSigner struct {
KeyID string
Role KeyRole
PublicKey ssh.PublicKey // parsed; allow-list match is by PublicKey.Marshal()
Comment string // from the authorized_keys line, if any
}
// NewAllowedSigner parses a standard authorized_keys line ("ssh-ed25519 AAAA… [comment]"
// or "sk-ssh-ed25519@openssh.com AAAA… …") into an AllowedSigner with the given id+role.
func NewAllowedSigner(keyID string, role KeyRole, authorizedKeyLine string) (AllowedSigner, error) {
pub, comment, _, _, err := ssh.ParseAuthorizedKey([]byte(authorizedKeyLine))
if err != nil {
return AllowedSigner{}, fmt.Errorf("authz: parsing pinned key %q: %w", keyID, err)
}
if role != RoleOperational && role != RoleRecovery {
return AllowedSigner{}, fmt.Errorf("authz: pinned key %q has invalid role %q", keyID, role)
}
return AllowedSigner{KeyID: keyID, Role: role, PublicKey: pub, Comment: comment}, nil
}
// NonceStore records seen nonces for anti-replay. SeenOrRecord reports whether the
// nonce was already recorded; if not, it records it (durably, in the host impl)
// before returning false. See noncestore.go.
type NonceStore interface {
SeenOrRecord(nonce string, exp time.Time) (seen bool)
}
// Verifier authenticates operator-signed destructive ops. Construct with New.
type Verifier struct {
signers []AllowedSigner
store NonceStore
hostID string
// ClockSkew tolerance for the not-yet-valid check (default DefaultClockSkew).
ClockSkew time.Duration
// Logger, if set, emits a warning when a blob's advisory key_id disagrees with
// the matched signer. Never affects the verdict.
Logger *slog.Logger
now func() time.Time // injectable for tests
}
// New builds a Verifier over the pinned signer set, a nonce store, and this box's
// host id. signers is a set (single signer today; quorum is just sizing).
func New(signers []AllowedSigner, store NonceStore, hostID string) *Verifier {
return &Verifier{
signers: signers,
store: store,
hostID: hostID,
ClockSkew: DefaultClockSkew,
now: func() time.Time { return time.Now().UTC() },
}
}
// Verify runs the LOCKED pipeline (phase4 §4 / doc 04 §2.3) and returns the
// authenticated op. Order is load-bearing and each post-crypto stage rejects even
// with an otherwise-valid signature:
//
// parse armor → namespace → parse pubkey → allow-list (by key MATERIAL, not
// key_id) → crypto verify (over the RAW received blob bytes) → parse blob →
// target → time window → nonce SeenOrRecord (LAST)
//
// The nonce is recorded last, so an invalid signature can never consume a nonce
// (DoS / replay-priming safe). Errors wrap the typed sentinels in errors.go.
func (v *Verifier) Verify(blob, sigArmored []byte) (*VerifiedOp, error) {
// 1. parse armor
sb, err := parseArmoredSSHSIG(sigArmored)
if err != nil {
return nil, err
}
// 2. namespace (fixed domain separator)
if sb.Namespace != Namespace {
return nil, fmt.Errorf("%w: got %q want %q", ErrNamespace, sb.Namespace, Namespace)
}
// 3. parse the embedded public key
pub, err := ssh.ParsePublicKey([]byte(sb.PublicKey))
if err != nil {
return nil, fmt.Errorf("%w: parsing signature public key: %v", ErrMalformed, err)
}
// 4. allow-list match by KEY MATERIAL (pub.Marshal equality) — NOT by key_id
matched, ok := v.matchSigner(pub)
if !ok {
return nil, ErrUnknownSigner
}
// 5. crypto verify over the RAW received bytes (never re-serialized)
signed, err := signedData(sb, blob)
if err != nil {
return nil, err
}
var inner ssh.Signature
if err := ssh.Unmarshal([]byte(sb.Signature), &inner); err != nil {
return nil, fmt.Errorf("%w: %v", ErrMalformed, err)
}
if err := pub.Verify(signed, &inner); err != nil { // dispatches on the key's algorithm
return nil, fmt.Errorf("%w: %v", ErrBadSignature, err)
}
// 6. parse the (now authenticated) blob bytes
var op OpBlob
if err := json.Unmarshal(blob, &op); err != nil {
return nil, fmt.Errorf("%w: decoding op blob: %v", ErrMalformed, err)
}
// 7. target binding — host must be this box. guest_id is surfaced, not matched
// here (the verifier doesn't enumerate guests; the caller routes by it).
if op.Target.HostID != v.hostID {
return nil, fmt.Errorf("%w: blob host_id=%q this=%q", ErrTarget, op.Target.HostID, v.hostID)
}
// 8. time window (clock-skew tolerance on not-yet-valid only)
now := v.now()
if now.Before(op.IssuedAt.Add(-v.ClockSkew)) {
return nil, fmt.Errorf("%w: issued_at=%s now=%s", ErrNotYetValid, op.IssuedAt, now)
}
if now.After(op.ExpiresAt) {
return nil, fmt.Errorf("%w: expires_at=%s now=%s", ErrExpired, op.ExpiresAt, now)
}
// 9. nonce LAST — only now is it durably recorded.
if v.store.SeenOrRecord(op.Nonce, op.ExpiresAt) {
return nil, fmt.Errorf("%w: nonce %s", ErrReplay, op.Nonce)
}
// advisory key_id audit (never a verdict input)
keyIDMatches := op.KeyID == matched.KeyID
if !keyIDMatches && v.Logger != nil {
v.Logger.Warn("authz: blob key_id does not match the matched signer (advisory)",
"blob_key_id", op.KeyID, "matched_signer", matched.KeyID)
}
return &VerifiedOp{
Op: op.Op,
HostID: op.Target.HostID,
GuestID: op.Target.GuestID,
Params: op.Params,
Nonce: op.Nonce,
IssuedAt: op.IssuedAt,
ExpiresAt: op.ExpiresAt,
KeyID: op.KeyID,
Signer: matched,
KeyIDMatchesSigner: keyIDMatches,
}, nil
}
func (v *Verifier) matchSigner(pub ssh.PublicKey) (AllowedSigner, bool) {
pm := pub.Marshal()
for _, s := range v.signers {
if s.PublicKey != nil && bytes.Equal(s.PublicKey.Marshal(), pm) {
return s, true
}
}
return AllowedSigner{}, false
}
+248
@@ -0,0 +1,248 @@
package authz
import (
"errors"
"os"
"path/filepath"
"testing"
"time"
"golang.org/x/crypto/ssh"
)
// fixed reference instant used across in-Go tests (deterministic time window).
var refNow = time.Date(2026, 6, 8, 12, 0, 0, 0, time.UTC)
func atRefNow(v *Verifier) *Verifier { v.now = func() time.Time { return refNow }; return v }
// rejects asserts a Verify error matches the expected sentinel.
func rejects(t *testing.T, err, want error) {
t.Helper()
if !errors.Is(err, want) {
t.Fatalf("want %v, got %v", want, err)
}
}
// signerSet builds a one-key operational allow-list around an ssh.PublicKey.
func signerSet(pub ssh.PublicKey, keyID string) []AllowedSigner {
return []AllowedSigner{{KeyID: keyID, Role: RoleOperational, PublicKey: pub}}
}
// validBlob is an op blob valid at refNow.
func validBlob(host, guest, keyID, nonce string) []byte {
return canonicalBlob("guest_destroy", host, guest, keyID, nonce, `{"purge":true}`,
refNow.Add(-time.Hour), refNow.Add(time.Hour))
}
// --- Real OpenSSH interop: committed ssh-keygen fixture ---
func TestVerify_RealSSHKeygenFixture(t *testing.T) {
blob := readFile(t, "testdata/op_blob.json")
sig := readFile(t, "testdata/op_blob.sig")
pubLine := readFile(t, "testdata/operator.pub")
signer, err := NewAllowedSigner("felhom-op-1", RoleOperational, string(pubLine))
if err != nil {
t.Fatalf("NewAllowedSigner: %v", err)
}
v := New([]AllowedSigner{signer}, NewMemoryNonceStore(), "demo-felhom")
v.now = func() time.Time { return time.Date(2026, 6, 8, 12, 0, 0, 0, time.UTC) } // inside fixture window
op, err := v.Verify(blob, sig)
if err != nil {
t.Fatalf("real fixture did not verify: %v", err)
}
if op.Op != "guest_destroy" || op.HostID != "demo-felhom" || op.GuestID != "9001" {
t.Errorf("unexpected op: %+v", op)
}
if op.KeyID != "felhom-op-1" || !op.KeyIDMatchesSigner {
t.Errorf("key_id audit wrong: %q matches=%v", op.KeyID, op.KeyIDMatchesSigner)
}
}
// --- Happy path (in-Go ed25519) ---
func TestVerify_HappyPath(t *testing.T) {
pub, sign := newEd25519Signer(t)
blob := validBlob("demo-felhom", "9001", "op", "n-happy-0001")
sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign)
v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom"))
op, err := v.Verify(blob, sig)
if err != nil {
t.Fatalf("Verify: %v", err)
}
if op.Op != "guest_destroy" || op.Signer.KeyID != "op" {
t.Errorf("op = %+v", op)
}
}
// --- Per-stage rejection, each with an otherwise-valid signature ---
func TestVerify_RejectsPerStage(t *testing.T) {
pub, sign := newEd25519Signer(t)
other, _ := newEd25519Signer(t)
t.Run("wrong namespace", func(t *testing.T) {
blob := validBlob("demo-felhom", "9001", "op", "n-ns-1")
sig := mintArmor(t, pub.Marshal(), "felhom-op-wrong", "sha512", blob, sign)
v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom"))
_, err := v.Verify(blob, sig)
rejects(t, err, ErrNamespace)
})
t.Run("signer not in set", func(t *testing.T) {
blob := validBlob("demo-felhom", "9001", "op", "n-unk-1")
sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign)
v := atRefNow(New(signerSet(other, "other"), NewMemoryNonceStore(), "demo-felhom"))
_, err := v.Verify(blob, sig)
rejects(t, err, ErrUnknownSigner)
})
t.Run("tampered blob (crypto)", func(t *testing.T) {
blob := validBlob("demo-felhom", "9001", "op", "n-tamper-1")
sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign)
tampered := append([]byte{}, blob...)
tampered[len(tampered)-2] = '!' // mutate inside the JSON
v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom"))
_, err := v.Verify(tampered, sig)
rejects(t, err, ErrBadSignature)
})
t.Run("retargeted host", func(t *testing.T) {
blob := validBlob("other-host", "9001", "op", "n-target-1")
sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign)
v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom"))
_, err := v.Verify(blob, sig)
rejects(t, err, ErrTarget)
})
t.Run("expired", func(t *testing.T) {
blob := canonicalBlob("guest_destroy", "demo-felhom", "9001", "op", "n-exp-1", "{}",
refNow.Add(-2*time.Hour), refNow.Add(-time.Hour))
sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign)
v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom"))
_, err := v.Verify(blob, sig)
rejects(t, err, ErrExpired)
})
t.Run("not yet valid", func(t *testing.T) {
blob := canonicalBlob("guest_destroy", "demo-felhom", "9001", "op", "n-nyv-1", "{}",
refNow.Add(time.Hour), refNow.Add(2*time.Hour))
sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign)
v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom"))
_, err := v.Verify(blob, sig)
rejects(t, err, ErrNotYetValid)
})
t.Run("replay", func(t *testing.T) {
blob := validBlob("demo-felhom", "9001", "op", "n-replay-1")
sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign)
v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom"))
if _, err := v.Verify(blob, sig); err != nil {
t.Fatalf("first use: %v", err)
}
_, err := v.Verify(blob, sig)
rejects(t, err, ErrReplay)
})
}
// --- THE anti-replay invariant: an invalid-sig attempt must NOT burn the nonce ---
func TestVerify_InvalidSigDoesNotBurnNonce(t *testing.T) {
pub, sign := newEd25519Signer(t)
store := NewMemoryNonceStore()
const nonce = "n-not-burned-cafe"
blobV := validBlob("demo-felhom", "9001", "op", nonce)
validSig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blobV, sign)
// Attacker reuses the SAME nonce but a signature that fails crypto (valid key,
// signed over different bytes) — passes namespace + allow-list, fails at the
// crypto stage, which is BEFORE the nonce stage.
badSig := mintArmor(t, pub.Marshal(), Namespace, "sha512", []byte(`{"different":"bytes"}`), sign)
v := atRefNow(New(signerSet(pub, "op"), store, "demo-felhom"))
if _, err := v.Verify(blobV, badSig); !errors.Is(err, ErrBadSignature) {
t.Fatalf("invalid attempt: want ErrBadSignature, got %v", err)
}
// The genuine valid op with the same nonce must still succeed — proving the
// failed attempt did NOT burn the nonce (nonce-recorded-last).
if _, err := v.Verify(blobV, validSig); err != nil {
t.Fatalf("valid op after invalid attempt should succeed, got %v", err)
}
}
// --- Persistence across restart (durable nonce store) ---
func TestVerify_ReplayRejectedAcrossRestart(t *testing.T) {
pub, sign := newEd25519Signer(t)
blob := validBlob("demo-felhom", "9001", "op", "n-persist-1")
sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign)
path := filepath.Join(t.TempDir(), "nonces.log")
store1, err := OpenFileNonceStore(path)
if err != nil {
t.Fatal(err)
}
v1 := atRefNow(New(signerSet(pub, "op"), store1, "demo-felhom"))
if _, err := v1.Verify(blob, sig); err != nil {
t.Fatalf("first use: %v", err)
}
if err := store1.Close(); err != nil {
t.Fatal(err)
}
// Fresh store + verifier over the SAME path — simulates an agent restart.
store2, err := OpenFileNonceStore(path)
if err != nil {
t.Fatal(err)
}
defer store2.Close()
v2 := atRefNow(New(signerSet(pub, "op"), store2, "demo-felhom"))
_, err = v2.Verify(blob, sig)
rejects(t, err, ErrReplay)
}
// --- Key-type-agnostic: synthetic FIDO2 sk-ssh-ed25519 through the unchanged path ---
func TestVerify_KeyTypeAgnostic_SK(t *testing.T) {
skPub, skSign := newSyntheticSKSigner(t)
blob := validBlob("demo-felhom", "9001", "op", "n-sk-1")
sig := mintArmor(t, skPub.Marshal(), Namespace, "sha512", blob, skSign)
v := atRefNow(New(signerSet(skPub, "op"), NewMemoryNonceStore(), "demo-felhom"))
op, err := v.Verify(blob, sig)
if err != nil {
t.Fatalf("sk verify through unchanged path failed: %v", err)
}
if op.Op != "guest_destroy" {
t.Errorf("op = %q", op.Op)
}
}
// --- Byte-exactness: a re-serialized blob is NOT re-canonicalized (fails crypto) ---
func TestVerify_ByteExactNoRecanonicalization(t *testing.T) {
pub, sign := newEd25519Signer(t)
blob := validBlob("demo-felhom", "9001", "op", "n-bytes-1")
sig := mintArmor(t, pub.Marshal(), Namespace, "sha512", blob, sign)
// Same fields, different whitespace + key order — what a non-identical producer
// canonicalizer would emit. The verifier verifies raw bytes, so this fails crypto.
reserialized := []byte(`{ "op":"guest_destroy", "target":{"host_id":"demo-felhom","guest_id":"9001"}, "params":{"purge":true}, "nonce":"n-bytes-1", "issued_at":"` +
refNow.Add(-time.Hour).Format(time.RFC3339) + `", "expires_at":"` + refNow.Add(time.Hour).Format(time.RFC3339) + `", "key_id":"op" }`)
v := atRefNow(New(signerSet(pub, "op"), NewMemoryNonceStore(), "demo-felhom"))
_, err := v.Verify(reserialized, sig)
rejects(t, err, ErrBadSignature)
}
func readFile(t *testing.T, path string) []byte {
t.Helper()
b, err := os.ReadFile(path)
if err != nil {
t.Fatal(err)
}
return b
}
+27 -2
@@ -17,14 +17,38 @@ import (
"strings"
)
-// Config is the agent configuration. Only the fields the proxmox interaction
-// layer needs are present in this slice.
+// Config is the agent configuration.
type Config struct {
Proxmox ProxmoxConfig `json:"proxmox"`
Privileged PrivilegedConfig `json:"privileged"`
Authz AuthzConfig `json:"authz"`
LogLevel string `json:"log_level"` // debug|info|warn|error (default info)
}
// AuthzConfig configures operator-signed-op verification (internal/authz). The
// pinned operator public keys are kept here as raw authorized_keys-style lines
// (this package stays dependency-free); the authz package parses them into its
// AllowedSigner set. Role-scoping (recovery keys authorize only key-rotation) is
// enforced by the consuming layer, not loaded here.
type AuthzConfig struct {
// NonceStorePath is the durable, crash-safe nonce log (anti-replay). Must be on
// persistent host storage so replay protection survives agent restarts.
NonceStorePath string `json:"nonce_store_path"`
// Signers are the pinned operator public keys (doc 04 §3 two-key model).
Signers []SignerKey `json:"signers"`
}
// SignerKey is one pinned operator public key.
type SignerKey struct {
KeyID string `json:"key_id"`
// Role is "operational" (signs destructive ops) or "recovery" (cold key;
// authorizes only key-rotation/break-glass).
Role string `json:"role"`
// PublicKey is a standard authorized_keys line, e.g.
// "ssh-ed25519 AAAA… felhom-op-1" or "sk-ssh-ed25519@openssh.com AAAA… …".
PublicKey string `json:"public_key"`
}
// ProxmoxConfig configures the API client.
type ProxmoxConfig struct {
// Endpoint defaults to https://127.0.0.1:8006 (agent runs on the host).
@@ -62,6 +86,7 @@ func Default() Config {
return Config{
Proxmox: ProxmoxConfig{Endpoint: "https://127.0.0.1:8006"},
Privileged: PrivilegedConfig{Mode: "sudo"},
Authz: AuthzConfig{NonceStorePath: "/var/lib/felhom-agent/nonces.log"},
LogLevel: "info",
}
}
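Reviewer sketch, not part of this commit: the new `authz` config block can be exercised on its own by decoding a sample document into mirror types. The struct fields and JSON tags are copied from the diff, the default nonce path comes from `Default()`, and the public key line is the committed `testdata/operator.pub` fixture. `loadAuthz` and its role check are hypothetical; in the commit, role validation happens in `authz.NewAllowedSigner`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SignerKey and AuthzConfig mirror the types added to the config package.
type SignerKey struct {
	KeyID     string `json:"key_id"`
	Role      string `json:"role"`
	PublicKey string `json:"public_key"`
}

type AuthzConfig struct {
	NonceStorePath string      `json:"nonce_store_path"`
	Signers        []SignerKey `json:"signers"`
}

// sample is an illustrative "authz" section of the agent config.
const sample = `{
  "nonce_store_path": "/var/lib/felhom-agent/nonces.log",
  "signers": [
    {
      "key_id": "felhom-op-1",
      "role": "operational",
      "public_key": "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDVzjrjLw9xYfTCWLKQUVi18kAHVlXgSArKXgfXi9cXy felhom-op-1"
    }
  ]
}`

// loadAuthz is a HYPOTHETICAL loader: decode, then reject unknown roles early.
func loadAuthz(raw []byte) (AuthzConfig, error) {
	var c AuthzConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return AuthzConfig{}, err
	}
	for _, s := range c.Signers {
		if s.Role != "operational" && s.Role != "recovery" {
			return AuthzConfig{}, fmt.Errorf("signer %q has invalid role %q", s.KeyID, s.Role)
		}
	}
	return c, nil
}

func main() {
	cfg, err := loadAuthz([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.NonceStorePath, len(cfg.Signers), cfg.Signers[0].KeyID)
}
```

Keeping the key as a raw authorized_keys line matches the diff's note that the config package stays dependency-free and leaves key parsing to `internal/authz`.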