# Phase 4 — Control-plane signing primitive (SSHSIG + Go verify): Findings

**Where run:** build server `192.168.0.180` (Debian 13, **Go 1.24.4**, **OpenSSH 10.0p2**), no Proxmox.
**Date:** 2026-06-08. Throwaway key generated, used, and **deleted** — no private key,
passphrase, or `.sig` committed.

> De-risks the signing primitive *before* it is written into `04-control-plane-authorization.md`
> or the agent's verify code. **Verdict up front: the approach works cleanly and is
> key-type-agnostic — no fallback needed.** Go verifies the armored `SSHSIG` format, every
> tamper/replay/authorization case is rejected, and a synthetic FIDO2 `sk-ssh-ed25519` signature
> verifies through the **unchanged** code path (true hardware drop-in).

---

## 0. Result at a glance — 14/14 checks pass

```
== Step 2: SSHSIG signature verification (key-type-agnostic path) ==
PASS  correct               verified, op="guest_destroy"
PASS  wrong key             rejected: signer not in allowed set
PASS  tampered blob         rejected: signature invalid: ssh: signature did not verify
PASS  wrong namespace       rejected: namespace mismatch: got "felhom-op-wrong" want "felhom-op-v1"
== Step 3: anti-replay / authorization (valid signature, still rejected) ==
PASS  first use             verified, op="guest_destroy"
PASS  replay (same nonce)   rejected: replay: nonce a1b2c3d4...8f90 already seen
PASS  expired               rejected: expired (expires_at=2020-01-02 ..., now=2026-06-08 ...)
PASS  not-yet-valid         rejected: not yet valid (issued_at=2030-01-01 ...)
PASS  retargeted host       rejected: target mismatch: blob=demo-felhom/9001 this=other-host/9001
PASS  retargeted guest      rejected: target mismatch: blob=demo-felhom/9001 this=demo-felhom/8888
== Step 4: key-type-agnosticism — FIDO2 sk-ssh-ed25519 (synthetic, no device) ==
PASS  parses sk pubkey      type="sk-ssh-ed25519@openssh.com"
PASS  authorized_keys form  sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5...
PASS  sk end-to-end verify  verified, op="guest_destroy"
```

---

## 1. Software round-trip (baseline, CLI)

- Key: `ssh-keygen -t ed25519 -f felhom-op -N '' -C felhom-operator`.
  (Signing non-interactively used an `SSH_ASKPASS` helper + `setsid -w`; in production the
  operator key lives behind an agent or a FIDO2 device, so the sign-time passphrase prompt is a
  non-issue. The passphrase mechanics are **not** what this spike de-risks.)
- Sign with a **domain-separated namespace**:
  `ssh-keygen -Y sign -f felhom-op -n felhom-op-v1 blob.json` → `blob.json.sig`
  (armored `-----BEGIN SSH SIGNATURE-----`).
- Baseline verify (CLI sanity) with an allow-list:
  ```
  allowed_signers:  felhom-operator namespaces="felhom-op-v1" ssh-ed25519 AAAAC3...
  $ ssh-keygen -Y verify -f allowed_signers -I felhom-operator -n felhom-op-v1 \
        -s blob.json.sig < blob.json
  Good "felhom-op-v1" signature for felhom-operator with ED25519 key SHA256:y0Lj8dIYTM6...
  ```

## 2. Canonical op blob spec (documented)

The signature covers **these exact bytes**; the operator CLI (also Go) must reproduce them
byte-for-byte. **Canonical form: JSON, keys sorted lexicographically at every level, no
insignificant whitespace, no trailing newline, UTF-8.**

```json
{"expires_at":"","issued_at":"","key_id":"","nonce":"<128-bit hex>","op":"","params":{...},"target":{"guest_id":"","host_id":""}}
```

| field | meaning |
|---|---|
| `op` | the operation, e.g. `guest_destroy`, `storage_detach`, `restore_overwrite` |
| `target.host_id` / `target.guest_id` | the box + guest the op is bound to (anti-retarget) |
| `params` | op-specific arguments (themselves canonical-sorted) |
| `nonce` | unique per op (anti-replay); ≥128-bit random |
| `issued_at` / `expires_at` | validity window (short — minutes) |
| `key_id` | which operator key (for rotation / audit) |

Exact test blob (236 bytes):
`{"expires_at":"2026-06-09T00:00:00Z","issued_at":"2026-06-08T00:00:00Z","key_id":"felhom-op-1","nonce":"a1b2c3d4e5f60718293a4b5c6d7e8f90","op":"guest_destroy","params":{"purge":true},"target":{"guest_id":"9001","host_id":"demo-felhom"}}`

> Note: the SSHSIG **namespace** (`felhom-op-v1`) is the cryptographic domain separator and is
> a **fixed constant in the verifier**, never caller-supplied — a signature minted for any
> other namespace must not verify (proven: "wrong namespace" rejected).

## 3. Go SSHSIG verify — approach + implementation cost

**It is not a one-call verify, but it is clean — no hand-rolled crypto.** The only manual work
is SSHSIG *framing*; all crypto and key-type dispatch is the library's. Steps:

1. `pem.Decode` the armor → `block.Type == "SSH SIGNATURE"`, `block.Bytes` is the binary SSHSIG.
   *(Go's `encoding/pem` parses the armor directly — no manual base64/line handling.)*
2. Strip the literal 6-byte `SSHSIG` magic preamble (it is **not** length-prefixed).
3. `ssh.Unmarshal` the rest into a struct
   `{Version uint32; PublicKey, Namespace, Reserved, HashAlgo, Signature string}` — library
   does the SSH wire parsing.
4. `ssh.ParsePublicKey([]byte(PublicKey))` → an `ssh.PublicKey`.
5. Recompute the signed data per spec:
   `"SSHSIG" || string(namespace) || string(reserved) || string(hash_algorithm) || string(H(message))`,
   where `H` is the **named** hash (`sha256`/`sha512`) — built with one `ssh.Marshal`.
6. `ssh.Unmarshal([]byte(Signature))` into `ssh.Signature`, then **`pub.Verify(signed, &sig)`** —
   which **dispatches on the key's own algorithm** (this is what makes it key-agnostic).

**Cost verdict:** ~40 lines of framing in one file, zero crypto implemented by us. Well within
the agent's budget; **no reason to fall back** to a different primitive.

## 4. Anti-replay / authorization layer (on top of signature validity)

Enforced in `VerifySignedOp` *after* the signature check, each proven to reject **even with a
valid signature** (Step 3 output above):

- **replay** — nonce already recorded in the window → reject;
- **expired / not-yet-valid** — `now ∉ [issued_at, expires_at]` → reject (both sides shown);
- **retargeted** — `target.host_id`/`guest_id` ≠ this box/guest → reject (both shown).

(Order matters: signature parse → namespace → allow-list → crypto verify → target → time → nonce,
so a replayed *but otherwise valid* op is still caught, and an invalid sig never consumes a nonce.)

## 5. Key-type-agnosticism — **TRUE DROP-IN** (no box change for FIDO2 later)

No FIDO2 device was used (by choice). Instead the spike **emulated the authenticator exactly**:

- Synthesized a well-formed `sk-ssh-ed25519@openssh.com` public key; `ssh.ParsePublicKey` parses
  it and `ssh.MarshalAuthorizedKey` round-trips it.
- Constructed a real `SSHSIG` whose inner signature follows the sk scheme (per OpenSSH
  `PROTOCOL.u2f`): `ed25519` over `sha256(application) || flags || counter || sha256(signed_data)`,
  with the blob `string(format) string(ed25519_sig) byte(flags) uint32(counter)` — i.e. exactly
  what a FIDO2 key emits.
- Ran it through the **unchanged `VerifySignedOp`** → **verified** (`op="guest_destroy"`).

**Verdict: true drop-in.** `pub.Verify` for `sk-ssh-ed25519` is implemented in
`golang.org/x/crypto/ssh` **v0.52.0** (it reconstructs `appDigest‖flags‖counter‖dataDigest` and
`ed25519.Verify`s it).
Introducing a hardware operator key later is a **no-op on the boxes** — the agent's
verify code is identical; only the operator's signer key (and the allowed-signers set entry)
changes. No sk-specific handler is needed.

> Because verification dispatches on the key type embedded in the signature, the same path also
> accepts `ssh-ed25519`, `rsa-sha2-*`, `ecdsa-sha2-*`, etc. — algorithm choice is the operator's,
> not the agent's.

## 6. Fallback (not taken) and its cost

A fallback would be a **raw Ed25519 detached signature** (or `minisign`): trivially one
`ed25519.Verify` call, no SSHSIG framing. **Rejected** because it **loses the clean FIDO2 path** —
a raw-Ed25519 verifier cannot consume an `sk-ssh-ed25519` signature (which carries flags+counter
and a different signed-data construction), so the future hardware swap would require **changing
the verifier on every box**. SSHSIG buys exactly the key-type-agnosticism (§5) that a raw scheme
forfeits, at a one-file framing cost (§3). **No fallback is warranted.**

## 7. Reference verifier (seed of the agent's verify code)

Verified working on Go 1.24.4 / `x/crypto` v0.52.0. (Test harness omitted; this is the verify
core + SSHSIG framing + anti-replay/authz.)

```go
package opverify // placeholder package name; the test harness is omitted

import (
	"bytes"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/json"
	"encoding/pem"
	"errors"
	"fmt"
	"hash"
	"time"

	"golang.org/x/crypto/ssh"
)

const Namespace = "felhom-op-v1" // FIXED domain separator, never caller-supplied
const sshsigMagic = "SSHSIG"

// Target binds an op to one box and one guest (anti-retarget).
type Target struct {
	HostID  string `json:"host_id"`
	GuestID string `json:"guest_id"`
}

type OpBlob struct {
	Op        string          `json:"op"`
	Target    Target          `json:"target"`
	Params    json.RawMessage `json:"params"`
	Nonce     string          `json:"nonce"`
	IssuedAt  time.Time       `json:"issued_at"`
	ExpiresAt time.Time       `json:"expires_at"`
	KeyID     string          `json:"key_id"`
}

// NonceStore returns true if the nonce was already seen; otherwise it records it.
type NonceStore interface {
	SeenOrRecord(nonce string, exp time.Time) bool
}

// sshsigBlob is the binary SSHSIG body that follows the 6-byte magic.
type sshsigBlob struct {
	Version                                             uint32
	PublicKey, Namespace, Reserved, HashAlgo, Signature string
}

func hashByName(n string) (hash.Hash, error) {
	switch n {
	case "sha256":
		return sha256.New(), nil
	case "sha512":
		return sha512.New(), nil
	}
	return nil, fmt.Errorf("unsupported SSHSIG hash %q", n)
}

func parseArmoredSSHSIG(armored []byte) (*sshsigBlob, error) {
	block, _ := pem.Decode(armored)
	if block == nil || block.Type != "SSH SIGNATURE" {
		return nil, errors.New("not an SSH SIGNATURE armor")
	}
	if len(block.Bytes) < 6 || string(block.Bytes[:6]) != sshsigMagic {
		return nil, errors.New("missing SSHSIG magic")
	}
	var sb sshsigBlob
	if err := ssh.Unmarshal(block.Bytes[6:], &sb); err != nil {
		return nil, err
	}
	if sb.Version != 1 {
		return nil, fmt.Errorf("bad version %d", sb.Version)
	}
	return &sb, nil
}

// signedData rebuilds the bytes the signer actually signed:
// magic || namespace || reserved || hash_algorithm || H(message).
func signedData(sb *sshsigBlob, msg []byte) ([]byte, error) {
	h, err := hashByName(sb.HashAlgo)
	if err != nil {
		return nil, err
	}
	h.Write(msg)
	md := h.Sum(nil)
	body := ssh.Marshal(struct {
		Namespace, Reserved, HashAlgo string
		Hash                          []byte
	}{sb.Namespace, sb.Reserved, sb.HashAlgo, md})
	return append([]byte(sshsigMagic), body...), nil
}

// VerifySignedOp: key-type-agnostic signature verify + anti-replay/authorization.
// allowedSigners is the trusted operator set (one key now; a quorum set later).
func VerifySignedOp(blob, sigArmored []byte, allowedSigners []ssh.PublicKey,
	thisHostID, thisGuestID string, seenNonces NonceStore) (string, error) {

	sb, err := parseArmoredSSHSIG(sigArmored)
	if err != nil {
		return "", err
	}
	if sb.Namespace != Namespace {
		return "", fmt.Errorf("namespace mismatch: got %q want %q", sb.Namespace, Namespace)
	}
	pub, err := ssh.ParsePublicKey([]byte(sb.PublicKey))
	if err != nil {
		return "", err
	}
	allowed := false
	for _, a := range allowedSigners {
		if bytes.Equal(a.Marshal(), pub.Marshal()) {
			allowed = true
			break
		}
	}
	if !allowed {
		return "", errors.New("signer not in allowed set")
	}
	signed, err := signedData(sb, blob)
	if err != nil {
		return "", err
	}
	var inner ssh.Signature
	if err := ssh.Unmarshal([]byte(sb.Signature), &inner); err != nil {
		return "", err
	}
	if err := pub.Verify(signed, &inner); err != nil { // dispatches on key algorithm
		return "", fmt.Errorf("signature invalid: %w", err)
	}

	// The signature is valid from here on; the remaining checks are authorization.
	var op OpBlob
	if err := json.Unmarshal(blob, &op); err != nil {
		return "", err
	}
	if op.Target.HostID != thisHostID || op.Target.GuestID != thisGuestID {
		return "", fmt.Errorf("target mismatch: blob=%s/%s this=%s/%s",
			op.Target.HostID, op.Target.GuestID, thisHostID, thisGuestID)
	}
	now := time.Now().UTC()
	if now.Before(op.IssuedAt) {
		return "", errors.New("not yet valid")
	}
	if now.After(op.ExpiresAt) {
		return "", errors.New("expired")
	}
	// The nonce is consumed last, so a rejected op never burns one.
	if seenNonces.SeenOrRecord(op.Nonce, op.ExpiresAt) {
		return "", fmt.Errorf("replay: nonce %s already seen", op.Nonce)
	}
	return op.Op, nil
}
```

## 8. Inputs to the design doc (`04-control-plane-authorization.md`)

- **Primitive confirmed:** SSHSIG (`ssh-keygen -Y sign` / armored `BEGIN SSH SIGNATURE`),
  verified in Go via `pem.Decode` + `ssh.Unmarshal` + `ssh.ParsePublicKey` + `pub.Verify`.
  Low implementation cost; no crypto hand-rolled.
- **Hub cannot forge:** the operator private key never touches the hub; the hub only queues the
  opaque armored blob (matches `03` §4).
- **Key-type-agnostic / hardware-ready:** software `ed25519` now, FIDO2 `sk-ssh-ed25519` later
  is a **box no-op** (proven end-to-end).
  The verifier hardcodes neither key type nor algorithm.
- **`allowedSigners` is a set:** single signer today; **threshold/quorum is just set sizing**
  plus an N-of-M policy on top (out of scope here).
- **Anti-replay/authz are mandatory and cheap:** namespace (fixed), allow-list, then crypto,
  then target-binding, time-window, nonce — all enforced and tested.
- **Canonical blob (§2)** is the shared contract between the operator CLI and the agent verifier.
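
One way the operator CLI might produce the canonical form from §2 is to decode the blob into untyped values and re-encode it, relying on `encoding/json` emitting map keys in sorted order. This is a minimal sketch, not the spike's code: `Canonicalize` is a hypothetical helper, and it assumes that `encoding/json`'s string escaping (with HTML escaping switched off) is acceptable as the canonical escaping wherever the bytes are reproduced.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// Canonicalize is a hypothetical helper (not part of the spike). It re-encodes
// one JSON value with object keys sorted at every level, no insignificant
// whitespace, and no trailing newline.
func Canonicalize(in []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(in))
	dec.UseNumber() // keep numeric literals as written instead of converting to float64
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	// Anything other than whitespace after the first value is rejected.
	if _, err := dec.Token(); err != io.EOF {
		return nil, errors.New("trailing data after JSON value")
	}

	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // do not rewrite <, > and & as \u00XX escapes
	// Objects were decoded into map[string]any, which encoding/json writes in
	// sorted key order, so nested objects are sorted as well.
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encoder.Encode appends a newline; the canonical form has none.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	// The §2 test blob, with keys shuffled and whitespace added.
	loose := []byte(`{
	  "op": "guest_destroy",
	  "target": {"host_id": "demo-felhom", "guest_id": "9001"},
	  "params": {"purge": true},
	  "nonce": "a1b2c3d4e5f60718293a4b5c6d7e8f90",
	  "key_id": "felhom-op-1",
	  "issued_at": "2026-06-08T00:00:00Z",
	  "expires_at": "2026-06-09T00:00:00Z"
	}`)
	out, err := Canonicalize(loose)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes\n%s\n", len(out), out)
}
```

In the reference verifier the signature is checked over the bytes exactly as received, so the agent would not need to re-canonicalize. A helper like this matters mainly on the signing side and for audit tooling that must reproduce the signed bytes deterministically.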