Phase 4 — Control-plane signing primitive (SSHSIG + Go verify): Findings
Where run: build server 192.168.0.180 (Debian 13, Go 1.24.4, OpenSSH 10.0p2),
no Proxmox. Date: 2026-06-08. Throwaway key generated, used, and deleted — no private
key, passphrase, or .sig committed.
De-risks the signing primitive before it is written into 04-control-plane-authorization.md or the agent's verify code.
Verdict up front: the approach works cleanly and is key-type-agnostic, so no fallback is needed. Go verifies the armored SSHSIG format, every tamper/replay/authorization case is rejected, and a synthetic FIDO2 sk-ssh-ed25519 signature verifies through the unchanged code path (true hardware drop-in).
0. Result at a glance — 14/14 checks pass
== Step 2: SSHSIG signature verification (key-type-agnostic path) ==
PASS correct verified, op="guest_destroy"
PASS wrong key rejected: signer not in allowed set
PASS tampered blob rejected: signature invalid: ssh: signature did not verify
PASS wrong namespace rejected: namespace mismatch: got "felhom-op-wrong" want "felhom-op-v1"
== Step 3: anti-replay / authorization (valid signature, still rejected) ==
PASS first use verified, op="guest_destroy"
PASS replay (same nonce) rejected: replay: nonce a1b2c3d4...8f90 already seen
PASS expired rejected: expired (expires_at=2020-01-02 ..., now=2026-06-08 ...)
PASS not-yet-valid rejected: not yet valid (issued_at=2030-01-01 ...)
PASS retargeted host rejected: target mismatch: blob=demo-felhom/9001 this=other-host/9001
PASS retargeted guest rejected: target mismatch: blob=demo-felhom/9001 this=demo-felhom/8888
== Step 4: key-type-agnosticism — FIDO2 sk-ssh-ed25519 (synthetic, no device) ==
PASS parses sk pubkey type="sk-ssh-ed25519@openssh.com"
PASS authorized_keys form sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5...
PASS sk end-to-end verify verified, op="guest_destroy"
1. Software round-trip (baseline, CLI)
- Key: ssh-keygen -t ed25519 -f felhom-op -N '<passphrase>' -C felhom-operator. (Signing non-interactively used an SSH_ASKPASS helper + setsid -w; in production the operator key lives behind an agent or a FIDO2 device, so the sign-time passphrase prompt is a non-issue. The passphrase mechanics are not what this spike de-risks.)
- Sign with a domain-separated namespace: ssh-keygen -Y sign -f felhom-op -n felhom-op-v1 blob.json → blob.json.sig (armored -----BEGIN SSH SIGNATURE-----).
- Baseline verify (CLI sanity) with an allow-list:

      allowed_signers: felhom-operator namespaces="felhom-op-v1" ssh-ed25519 AAAAC3...
      $ ssh-keygen -Y verify -f allowed_signers -I felhom-operator -n felhom-op-v1 \
          -s blob.json.sig < blob.json
      Good "felhom-op-v1" signature for felhom-operator with ED25519 key SHA256:y0Lj8dIYTM6...
2. Canonical op blob spec (documented)
The signature covers these exact bytes; the operator CLI (also Go) must reproduce them byte-for-byte. Canonical form: JSON, keys sorted lexicographically at every level, no insignificant whitespace, no trailing newline, UTF-8.
{"expires_at":"<RFC3339 UTC>","issued_at":"<RFC3339 UTC>","key_id":"<id>","nonce":"<128-bit hex>","op":"<op>","params":{...},"target":{"guest_id":"<vmid>","host_id":"<node>"}}
| field | meaning |
|---|---|
| op | the operation, e.g. guest_destroy, storage_detach, restore_overwrite |
| target.host_id / target.guest_id | the box + guest the op is bound to (anti-retarget) |
| params | op-specific arguments (themselves canonical-sorted) |
| nonce | unique per op (anti-replay); ≥128-bit random |
| issued_at / expires_at | validity window (short: minutes) |
| key_id | which operator key (for rotation / audit) |
Exact test blob (236 bytes): {"expires_at":"2026-06-09T00:00:00Z","issued_at":"2026-06-08T00:00:00Z","key_id":"felhom-op-1","nonce":"a1b2c3d4e5f60718293a4b5c6d7e8f90","op":"guest_destroy","params":{"purge":true},"target":{"guest_id":"9001","host_id":"demo-felhom"}}
Note: the SSHSIG namespace (felhom-op-v1) is the cryptographic domain separator and is a fixed constant in the verifier, never caller-supplied. A signature minted for any other namespace must not verify (proven: "wrong namespace" rejected).
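The canonical form above can be sketched in Go with the standard library alone. This is a minimal sketch, not the operator CLI's actual encoder: the function name canonicalize is hypothetical, and it assumes that Go's map-key ordering (sorted by byte value) is the "lexicographic" order the spec intends. It also turns off HTML escaping, since encoding/json would otherwise rewrite characters such as < and & and change the signed bytes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// canonicalize (hypothetical helper) re-encodes a JSON document with object
// keys sorted at every level, no insignificant whitespace, and no trailing
// newline. Decoding into `any` yields map[string]any for objects, and
// encoding/json emits map keys in sorted order.
func canonicalize(in []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(in))
	dec.UseNumber() // keep numeric literals as written instead of float64
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // do not rewrite <, >, & as \u003c etc.
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encoder.Encode appends one newline; the canonical form has none.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	loose := []byte(`{
	  "op": "guest_destroy",
	  "target": {"host_id": "demo-felhom", "guest_id": "9001"},
	  "params": {"purge": true},
	  "nonce": "a1b2c3d4e5f60718293a4b5c6d7e8f90",
	  "issued_at": "2026-06-08T00:00:00Z",
	  "expires_at": "2026-06-09T00:00:00Z",
	  "key_id": "felhom-op-1"
	}`)
	out, err := canonicalize(loose)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: %s\n", len(out), out)
}
```

Fed the loosely formatted input in main, this should reproduce the 236-byte test blob. Signing the re-encoded bytes, rather than whatever the caller typed, is what keeps the CLI and the agent in agreement on the exact bytes.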
3. Go SSHSIG verify — approach + implementation cost
It is not a one-call verify, but it is clean — no hand-rolled crypto. The only manual work is SSHSIG framing; all crypto and key-type dispatch is the library's. Steps:
- pem.Decode the armor → block.Type == "SSH SIGNATURE", block.Bytes is the binary SSHSIG. (Go's encoding/pem parses the armor directly; no manual base64/line handling.)
- Strip the literal 6-byte SSHSIG magic preamble (it is not length-prefixed).
- ssh.Unmarshal the rest into a struct {Version uint32; PublicKey, Namespace, Reserved, HashAlgo, Signature string}. The library does the SSH wire parsing.
- ssh.ParsePublicKey([]byte(PublicKey)) → an ssh.PublicKey.
- Recompute the signed data per spec: "SSHSIG" || string(namespace) || string(reserved) || string(hash_algorithm) || string(H(message)), where H is the named hash (sha256/sha512), built with one ssh.Marshal.
- ssh.Unmarshal([]byte(Signature)) into ssh.Signature, then pub.Verify(signed, &sig), which dispatches on the key's own algorithm (this is what makes it key-agnostic).
Cost verdict: ~40 lines of framing in one file, zero crypto implemented by us. Well within the agent's budget; no reason to fall back to a different primitive.
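To make the framing concrete, the signed-data layout from the steps above can be written out byte by byte with the standard library. This is an illustrative sketch only: the helper names sshString and sshsigSignedData are hypothetical, the hash is fixed to sha256 for brevity (a real verifier must use whichever algorithm the signature names), and the reference verifier gets the same bytes from ssh.Marshal.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// sshString encodes b as an SSH wire "string": a uint32 big-endian length
// followed by the raw bytes.
func sshString(b []byte) []byte {
	out := make([]byte, 4, 4+len(b))
	binary.BigEndian.PutUint32(out, uint32(len(b)))
	return append(out, b...)
}

// sshsigSignedData builds the bytes that the inner signature covers, with
// hash_algorithm fixed to "sha256" and an empty reserved field (assumptions
// made to keep the sketch short).
func sshsigSignedData(namespace string, msg []byte) []byte {
	digest := sha256.Sum256(msg)
	out := []byte("SSHSIG") // literal magic, not length-prefixed
	out = append(out, sshString([]byte(namespace))...)
	out = append(out, sshString(nil)...) // reserved
	out = append(out, sshString([]byte("sha256"))...)
	out = append(out, sshString(digest[:])...)
	return out
}

func main() {
	sd := sshsigSignedData("felhom-op-v1", []byte(`{"op":"guest_destroy"}`))
	fmt.Printf("%d bytes, starts with %q\n", len(sd), sd[:6])
}
```

The message itself never appears in the signed data, only its digest, so the layout has a fixed size for a given namespace and hash: 6 + (4+12) + 4 + (4+6) + (4+32) = 72 bytes here.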
4. Anti-replay / authorization layer (on top of signature validity)
Enforced in VerifySignedOp after the signature check, each proven to reject even with a
valid signature (Step 3 output above):
- replay: nonce already recorded in the window → reject;
- expired / not-yet-valid: now ∉ [issued_at, expires_at] → reject (both sides shown);
- retargeted: target.host_id/guest_id ≠ this box/guest → reject (both shown).
(Order matters: parse → namespace → allow-list → crypto verify → target → time → nonce. The nonce is recorded last, so a replayed but otherwise valid op is still caught, while an op that fails any earlier check, including an invalid signature, never consumes a nonce.)
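The nonce check relies on a store with a single SeenOrRecord method. A minimal in-memory sketch of such a store follows; the type memNonceStore and its constructor are hypothetical, and keeping the state only in memory is an assumption made for brevity. A real agent would most likely need to persist recorded nonces, because a restart inside the validity window would otherwise forget them.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// memNonceStore is a hypothetical in-memory nonce store. It remembers each
// nonce until the expires_at of the op that carried it; after that point the
// time-window check rejects the op anyway, so the entry can be dropped.
type memNonceStore struct {
	mu   sync.Mutex
	seen map[string]time.Time // nonce -> expires_at of the op that used it
	now  func() time.Time     // clock, kept as a field so it can be swapped
}

func newMemNonceStore() *memNonceStore {
	return &memNonceStore{seen: make(map[string]time.Time), now: time.Now}
}

// SeenOrRecord reports whether nonce was already recorded. If it was not,
// the nonce is recorded until exp and false is returned.
func (s *memNonceStore) SeenOrRecord(nonce string, exp time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	now := s.now()
	for n, e := range s.seen { // prune entries whose window has closed
		if now.After(e) {
			delete(s.seen, n)
		}
	}
	if _, dup := s.seen[nonce]; dup {
		return true
	}
	s.seen[nonce] = exp
	return false
}

func main() {
	store := newMemNonceStore()
	exp := time.Now().Add(5 * time.Minute)
	fmt.Println(store.SeenOrRecord("a1b2c3d4e5f60718293a4b5c6d7e8f90", exp)) // first use
	fmt.Println(store.SeenOrRecord("a1b2c3d4e5f60718293a4b5c6d7e8f90", exp)) // replay
}
```

Checking and recording under one lock keeps two concurrent submissions of the same op from both passing. The linear prune on every call is a simplification that only suits small windows.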
5. Key-type-agnosticism — TRUE DROP-IN (no box change for FIDO2 later)
No FIDO2 device was used (by choice). Instead the spike emulated the authenticator exactly:
- Synthesized a well-formed sk-ssh-ed25519@openssh.com public key; ssh.ParsePublicKey parses it and ssh.MarshalAuthorizedKey round-trips it.
- Constructed a real SSHSIG whose inner signature follows the sk scheme (per OpenSSH PROTOCOL.u2f): ed25519 over sha256(application) || flags || counter || sha256(signed_data), with the blob string(format) string(ed25519_sig) byte(flags) uint32(counter), i.e. exactly what a FIDO2 key emits.
- Ran it through the unchanged VerifySignedOp → verified (op="guest_destroy").
Verdict: true drop-in. pub.Verify for sk-ssh-ed25519 is implemented in
golang.org/x/crypto/ssh v0.52.0 (it reconstructs appDigest‖flags‖counter‖dataDigest and
checks the signature over it with ed25519.Verify). Introducing a hardware operator key later is
a no-op on the boxes: the agent's verify code is identical; only the operator's signer key (and
the allowed-signers set entry) changes. No sk-specific handler is needed.
Because verification dispatches on the key type embedded in the signature, the same path also accepts ssh-ed25519, rsa-sha2-*, ecdsa-sha2-*, etc. Algorithm choice is the operator's, not the agent's.
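The sk signing construction described in this section can be emulated with a plain software Ed25519 key, which is roughly what "synthetic, no device" means here. The sketch below is not the spike's harness: all helper names are hypothetical, the application string "ssh:" and the flags value 0x01 (user present) are assumptions, and the verification is done by hand rather than through x/crypto/ssh.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// skSignedMessage builds the bytes an sk-ssh-ed25519 authenticator signs:
// sha256(application) || flags || counter || sha256(data).
func skSignedMessage(application string, flags byte, counter uint32, data []byte) []byte {
	app := sha256.Sum256([]byte(application))
	msg := sha256.Sum256(data)
	out := make([]byte, 0, 32+1+4+32)
	out = append(out, app[:]...)
	out = append(out, flags)
	out = binary.BigEndian.AppendUint32(out, counter)
	return append(out, msg[:]...)
}

// newThrowawayKey returns a fresh software key standing in for the device.
func newThrowawayKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// signSK plays the authenticator: it signs the reconstructed message.
func signSK(priv ed25519.PrivateKey, application string, flags byte, counter uint32, data []byte) []byte {
	return ed25519.Sign(priv, skSignedMessage(application, flags, counter, data))
}

// verifySK plays the verifier: it rebuilds the same message from the flags
// and counter carried next to the signature, then checks the signature.
func verifySK(pub ed25519.PublicKey, application string, flags byte, counter uint32, data, sig []byte) bool {
	return ed25519.Verify(pub, skSignedMessage(application, flags, counter, data), sig)
}

func main() {
	pub, priv := newThrowawayKey()
	data := []byte("stand-in for the SSHSIG signed data")
	sig := signSK(priv, "ssh:", 0x01, 7, data)
	fmt.Println(verifySK(pub, "ssh:", 0x01, 7, data, sig)) // same fields
	fmt.Println(verifySK(pub, "ssh:", 0x01, 8, data, sig)) // counter altered
}
```

Because flags and counter are part of the signed bytes, a verifier that calls ed25519.Verify directly on the data cannot accept such a signature.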
6. Fallback (not taken) and its cost
A fallback would be a raw Ed25519 detached signature (or minisign): trivially one
ed25519.Verify call, no SSHSIG framing. Rejected because it loses the clean FIDO2 path —
a raw-Ed25519 verifier cannot consume an sk-ssh-ed25519 signature (which carries flags+counter
and a different signed-data construction), so the future hardware swap would require changing
the verifier on every box. SSHSIG buys exactly the key-type-agnosticism (§5) that a raw scheme
forfeits, at a one-file framing cost (§3). No fallback is warranted.
7. Reference verifier (seed of the agent's verify code)
Verified working on Go 1.24.4 / x/crypto v0.52.0. (Test harness omitted; this is the verify
core + SSHSIG framing + anti-replay/authz.)
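// Imports required by the code below.
import (
	"bytes"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/json"
	"encoding/pem"
	"errors"
	"fmt"
	"hash"
	"time"

	"golang.org/x/crypto/ssh"
)

const Namespace = "felhom-op-v1" // FIXED domain separator, never caller-supplied
const sshsigMagic = "SSHSIG"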
// Target carries explicit json tags: without them encoding/json cannot map
// host_id/guest_id onto HostID/GuestID and the target check would always fail.
type Target struct {
	HostID  string `json:"host_id"`
	GuestID string `json:"guest_id"`
}

type OpBlob struct {
	Op        string          `json:"op"`
	Target    Target          `json:"target"`
	Params    json.RawMessage `json:"params"`
	Nonce     string          `json:"nonce"`
	IssuedAt  time.Time       `json:"issued_at"`
	ExpiresAt time.Time       `json:"expires_at"`
	KeyID     string          `json:"key_id"`
}
type NonceStore interface {
	SeenOrRecord(nonce string, exp time.Time) bool
}

// sshsigBlob mirrors the SSHSIG wire layout that follows the magic preamble.
type sshsigBlob struct {
	Version                                              uint32
	PublicKey, Namespace, Reserved, HashAlgo, Signature string
}

func hashByName(n string) (hash.Hash, error) {
	switch n {
	case "sha256":
		return sha256.New(), nil
	case "sha512":
		return sha512.New(), nil
	}
	return nil, fmt.Errorf("unsupported SSHSIG hash %q", n)
}

func parseArmoredSSHSIG(armored []byte) (*sshsigBlob, error) {
	block, _ := pem.Decode(armored)
	if block == nil || block.Type != "SSH SIGNATURE" {
		return nil, errors.New("not an SSH SIGNATURE armor")
	}
	if len(block.Bytes) < 6 || string(block.Bytes[:6]) != sshsigMagic {
		return nil, errors.New("missing SSHSIG magic")
	}
	var sb sshsigBlob
	if err := ssh.Unmarshal(block.Bytes[6:], &sb); err != nil {
		return nil, err
	}
	if sb.Version != 1 {
		return nil, fmt.Errorf("bad version %d", sb.Version)
	}
	return &sb, nil
}

// signedData rebuilds the bytes the inner signature covers:
// magic || namespace || reserved || hash_algorithm || H(msg).
func signedData(sb *sshsigBlob, msg []byte) ([]byte, error) {
	h, err := hashByName(sb.HashAlgo)
	if err != nil {
		return nil, err
	}
	h.Write(msg)
	md := h.Sum(nil)
	body := ssh.Marshal(struct {
		Namespace, Reserved, HashAlgo string
		Hash                          []byte
	}{sb.Namespace, sb.Reserved, sb.HashAlgo, md})
	return append([]byte(sshsigMagic), body...), nil
}
// VerifySignedOp: key-type-agnostic signature verify + anti-replay/authorization.
// allowedSigners is the trusted operator set (one key now; a quorum set later).
func VerifySignedOp(blob, sigArmored []byte, allowedSigners []ssh.PublicKey,
	thisHostID, thisGuestID string, seenNonces NonceStore) (string, error) {
	sb, err := parseArmoredSSHSIG(sigArmored)
	if err != nil {
		return "", err
	}
	if sb.Namespace != Namespace {
		return "", fmt.Errorf("namespace mismatch: got %q want %q", sb.Namespace, Namespace)
	}
	pub, err := ssh.ParsePublicKey([]byte(sb.PublicKey))
	if err != nil {
		return "", err
	}
	allowed := false
	for _, a := range allowedSigners {
		if bytes.Equal(a.Marshal(), pub.Marshal()) {
			allowed = true
			break
		}
	}
	if !allowed {
		return "", errors.New("signer not in allowed set")
	}
	signed, err := signedData(sb, blob)
	if err != nil {
		return "", err
	}
	var inner ssh.Signature
	if err := ssh.Unmarshal([]byte(sb.Signature), &inner); err != nil {
		return "", err
	}
	if err := pub.Verify(signed, &inner); err != nil { // dispatches on key algorithm
		return "", fmt.Errorf("signature invalid: %w", err)
	}
	// Only a blob with a valid signature is parsed and authorized below.
	var op OpBlob
	if err := json.Unmarshal(blob, &op); err != nil {
		return "", err
	}
	if op.Target.HostID != thisHostID || op.Target.GuestID != thisGuestID {
		return "", fmt.Errorf("target mismatch: blob=%s/%s this=%s/%s",
			op.Target.HostID, op.Target.GuestID, thisHostID, thisGuestID)
	}
	now := time.Now().UTC()
	if now.Before(op.IssuedAt) {
		return "", fmt.Errorf("not yet valid (issued_at=%s)", op.IssuedAt)
	}
	if now.After(op.ExpiresAt) {
		return "", fmt.Errorf("expired (expires_at=%s, now=%s)", op.ExpiresAt, now)
	}
	// The nonce is recorded last so that a rejected op never consumes it.
	if seenNonces.SeenOrRecord(op.Nonce, op.ExpiresAt) {
		return "", fmt.Errorf("replay: nonce %s already seen", op.Nonce)
	}
	return op.Op, nil
}
8. Inputs to the design doc (04-control-plane-authorization.md)
- Primitive confirmed: SSHSIG (ssh-keygen -Y sign / armored BEGIN SSH SIGNATURE), verified in Go via pem.Decode + ssh.Unmarshal + ssh.ParsePublicKey + pub.Verify. Low implementation cost; no crypto hand-rolled.
- Hub cannot forge: the operator private key never touches the hub; the hub only queues the opaque armored blob (matches 03 §4).
- Key-type-agnostic / hardware-ready: software ed25519 now, FIDO2 sk-ssh-ed25519 later is a box no-op (proven end-to-end). The verifier hardcodes neither key type nor algorithm.
- allowedSigners is a set: single signer today; threshold/quorum is just set sizing plus an N-of-M policy on top (out of scope here).
- Anti-replay/authz are mandatory and cheap: namespace (fixed), allow-list, then crypto, then target-binding, time-window, nonce. All are enforced and tested.
- Canonical blob (§2) is the shared contract between the operator CLI and the agent verifier.