From 333c65cbc48719e8eae6f94d6a05e3361dd084da Mon Sep 17 00:00:00 2001
From: kisfenyo
Date: Mon, 8 Jun 2026 10:03:49 +0200
Subject: [PATCH] phase4-signing-findings

---
 docs/tests/phase4-signing-findings.md | 257 ++++++++++++++++++++++++++
 1 file changed, 257 insertions(+)
 create mode 100644 docs/tests/phase4-signing-findings.md

diff --git a/docs/tests/phase4-signing-findings.md b/docs/tests/phase4-signing-findings.md
new file mode 100644
index 0000000..9d0b0ca
--- /dev/null
+++ b/docs/tests/phase4-signing-findings.md
@@ -0,0 +1,257 @@
+# Phase 4 — Control-plane signing primitive (SSHSIG + Go verify): Findings
+
+**Where run:** build server `192.168.0.180` (Debian 13, **Go 1.24.4**, **OpenSSH 10.0p2**),
+no Proxmox. **Date:** 2026-06-08. Throwaway key generated, used, and **deleted** — no private
+key, passphrase, or `.sig` committed.
+
+> De-risks the signing primitive *before* it is written into `04-control-plane-authorization.md`
+> or the agent's verify code. **Verdict up front: the approach works cleanly and is key-type-
+> agnostic — no fallback needed.** Go verifies the armored `SSHSIG` format, every tamper/replay/
+> authorization case is rejected, and a synthetic FIDO2 `sk-ssh-ed25519` signature verifies
+> through the **unchanged** code path (true hardware drop-in).
+
+---
+
+## 0. Result at a glance — 14/14 checks pass
+
+```
+== Step 2: SSHSIG signature verification (key-type-agnostic path) ==
+  PASS  correct               verified, op="guest_destroy"
+  PASS  wrong key             rejected: signer not in allowed set
+  PASS  tampered blob         rejected: signature invalid: ssh: signature did not verify
+  PASS  wrong namespace       rejected: namespace mismatch: got "felhom-op-wrong" want "felhom-op-v1"
+
+== Step 3: anti-replay / authorization (valid signature, still rejected) ==
+  PASS  first use             verified, op="guest_destroy"
+  PASS  replay (same nonce)   rejected: replay: nonce a1b2c3d4...8f90 already seen
+  PASS  expired               rejected: expired (expires_at=2020-01-02 ..., now=2026-06-08 ...)
+  PASS  not-yet-valid         rejected: not yet valid (issued_at=2030-01-01 ...)
+  PASS  retargeted host       rejected: target mismatch: blob=demo-felhom/9001 this=other-host/9001
+  PASS  retargeted guest      rejected: target mismatch: blob=demo-felhom/9001 this=demo-felhom/8888
+
+== Step 4: key-type-agnosticism — FIDO2 sk-ssh-ed25519 (synthetic, no device) ==
+  PASS  parses sk pubkey      type="sk-ssh-ed25519@openssh.com"
+  PASS  authorized_keys form  sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5...
+  PASS  sk end-to-end verify  verified, op="guest_destroy"
+```
+
+---
+
+## 1. Software round-trip (baseline, CLI)
+
+- Key: `ssh-keygen -t ed25519 -f felhom-op -N '' -C felhom-operator`.
+  (Signing non-interactively used an `SSH_ASKPASS` helper + `setsid -w`; in production the
+  operator key lives behind an agent or a FIDO2 device, so the at-sign passphrase prompt is a
+  non-issue. The passphrase mechanics are **not** what this spike de-risks.)
+- Sign with a **domain-separated namespace**:
+  `ssh-keygen -Y sign -f felhom-op -n felhom-op-v1 blob.json` → `blob.json.sig`
+  (armored `-----BEGIN SSH SIGNATURE-----`).
+- Baseline verify (CLI sanity) with an allow-list:
+  ```
+  allowed_signers: felhom-operator namespaces="felhom-op-v1" ssh-ed25519 AAAAC3...
+  $ ssh-keygen -Y verify -f allowed_signers -I felhom-operator -n felhom-op-v1 \
+      -s blob.json.sig < blob.json
+  Good "felhom-op-v1" signature for felhom-operator with ED25519 key SHA256:y0Lj8dIYTM6...
+  ```
+
+## 2. Canonical op blob spec (documented)
+
+The signature covers **these exact bytes**; the operator CLI (also Go) must reproduce them
+byte-for-byte. **Canonical form: JSON, keys sorted lexicographically at every level, no
+insignificant whitespace, no trailing newline, UTF-8.**
+
+```json
+{"expires_at":"","issued_at":"","key_id":"","nonce":"<128-bit hex>","op":"","params":{...},"target":{"guest_id":"","host_id":""}}
+```
+
+| field | meaning |
+|---|---|
+| `op` | the operation, e.g. `guest_destroy`, `storage_detach`, `restore_overwrite` |
+| `target.host_id` / `target.guest_id` | the box + guest the op is bound to (anti-retarget) |
+| `params` | op-specific arguments (themselves canonical-sorted) |
+| `nonce` | unique per op (anti-replay); ≥128-bit random |
+| `issued_at` / `expires_at` | validity window (short — minutes) |
+| `key_id` | which operator key (for rotation / audit) |
+
+Exact test blob (236 bytes): `{"expires_at":"2026-06-09T00:00:00Z","issued_at":"2026-06-08T00:00:00Z","key_id":"felhom-op-1","nonce":"a1b2c3d4e5f60718293a4b5c6d7e8f90","op":"guest_destroy","params":{"purge":true},"target":{"guest_id":"9001","host_id":"demo-felhom"}}`
+
+> Note: the SSHSIG **namespace** (`felhom-op-v1`) is the cryptographic domain separator and is
+> a **fixed constant in the verifier**, never caller-supplied — a signature minted for any
+> other namespace must not verify (proven: "wrong namespace" rejected).
+
+## 3. Go SSHSIG verify — approach + implementation cost
+
+**It is not a one-call verify, but it is clean — no hand-rolled crypto.** The only manual work
+is SSHSIG *framing*; all crypto and key-type dispatch is the library's. Steps:
+
+1. `pem.Decode` the armor → `block.Type == "SSH SIGNATURE"`, `block.Bytes` is the binary SSHSIG.
+   *(Go's `encoding/pem` parses the armor directly — no manual base64/line handling.)*
+2. Strip the literal 6-byte `SSHSIG` magic preamble (it is **not** length-prefixed).
+3. `ssh.Unmarshal` the rest into a struct `{Version uint32; PublicKey, Namespace, Reserved,
+   HashAlgo, Signature string}` — library does the SSH wire parsing.
+4. `ssh.ParsePublicKey([]byte(PublicKey))` → an `ssh.PublicKey`.
+5. Recompute the signed data per spec: `"SSHSIG" || string(namespace) || string(reserved) ||
+   string(hash_algorithm) || string(H(message))`, where `H` is the **named** hash
+   (`sha256`/`sha512`) — built with one `ssh.Marshal`.
+6. `ssh.Unmarshal([]byte(Signature))` into `ssh.Signature`, then **`pub.Verify(signed, &sig)`** —
+   which **dispatches on the key's own algorithm** (this is what makes it key-agnostic).
+
+**Cost verdict:** ~40 lines of framing in one file, zero crypto implemented by us. Well within
+the agent's budget; **no reason to fall back** to a different primitive.
+
+## 4. Anti-replay / authorization layer (on top of signature validity)
+
+Enforced in `VerifySignedOp` *after* the signature check, each proven to reject **even with a
+valid signature** (Step 3 output above):
+
+- **replay** — nonce already recorded in the window → reject;
+- **expired / not-yet-valid** — `now ∉ [issued_at, expires_at]` → reject (both sides shown);
+- **retargeted** — `target.host_id`/`guest_id` ≠ this box/guest → reject (both shown).
+
+(Order matters: signature → namespace → allow-list → crypto verify → target → time → nonce, so
+a replayed *but otherwise valid* op is still caught, and an invalid sig never consumes a nonce.)
+
+## 5. Key-type-agnosticism — **TRUE DROP-IN** (no box change for FIDO2 later)
+
+No FIDO2 device was used (by choice). Instead the spike **emulated the authenticator exactly**:
+
+- Synthesized a well-formed `sk-ssh-ed25519@openssh.com` public key; `ssh.ParsePublicKey` parses
+  it and `ssh.MarshalAuthorizedKey` round-trips it.
+- Constructed a real `SSHSIG` whose inner signature follows the sk scheme (per OpenSSH
+  `PROTOCOL.u2f`): `ed25519` over `sha256(application) || flags || counter || sha256(signed_data)`,
+  with the blob `string(format) string(ed25519_sig) byte(flags) uint32(counter)` — i.e. exactly
+  what a FIDO2 key emits.
+- Ran it through the **unchanged `VerifySignedOp`** → **verified** (`op="guest_destroy"`).
+
+**Verdict: true drop-in.** `pub.Verify` for `sk-ssh-ed25519` is implemented in
+`golang.org/x/crypto/ssh` **v0.52.0** (it reconstructs `appDigest‖flags‖counter‖dataDigest` and
+`ed25519.Verify`s it). Introducing a hardware operator key later is a **no-op on the boxes** —
+the agent's verify code is identical; only the operator's signer key (and the allowed-signers
+set entry) changes. No sk-specific handler is needed.
+
+> Because verification dispatches on the key type embedded in the signature, the same path also
+> accepts `ssh-ed25519`, `rsa-sha2-*`, `ecdsa-sha2-*`, etc. — algorithm choice is the operator's,
+> not the agent's.
+
+## 6. Fallback (not taken) and its cost
+
+A fallback would be a **raw Ed25519 detached signature** (or `minisign`): trivially one
+`ed25519.Verify` call, no SSHSIG framing. **Rejected** because it **loses the clean FIDO2 path** —
+a raw-Ed25519 verifier cannot consume an `sk-ssh-ed25519` signature (which carries flags+counter
+and a different signed-data construction), so the future hardware swap would require **changing
+the verifier on every box**. SSHSIG buys exactly the key-type-agnosticism (§5) that a raw scheme
+forfeits, at a one-file framing cost (§3). **No fallback is warranted.**
+
+## 7. Reference verifier (seed of the agent's verify code)
+
+Verified working on Go 1.24.4 / `x/crypto` v0.52.0. (Test harness omitted; this is the verify
+core + SSHSIG framing + anti-replay/authz.)
+
+```go
+const Namespace = "felhom-op-v1" // FIXED domain separator, never caller-supplied
+const sshsigMagic = "SSHSIG"
+
+type Target struct{ HostID, GuestID string }
+type OpBlob struct {
+	Op        string          `json:"op"`
+	Target    Target          `json:"target"`
+	Params    json.RawMessage `json:"params"`
+	Nonce     string          `json:"nonce"`
+	IssuedAt  time.Time       `json:"issued_at"`
+	ExpiresAt time.Time       `json:"expires_at"`
+	KeyID     string          `json:"key_id"`
+}
+// (Target needs json tags host_id/guest_id in the real struct.)
+
+type NonceStore interface{ SeenOrRecord(nonce string, exp time.Time) bool }
+
+type sshsigBlob struct {
+	Version uint32
+	PublicKey, Namespace, Reserved, HashAlgo, Signature string
+}
+
+func hashByName(n string) (hash.Hash, error) {
+	switch n {
+	case "sha256": return sha256.New(), nil
+	case "sha512": return sha512.New(), nil
+	}
+	return nil, fmt.Errorf("unsupported SSHSIG hash %q", n)
+}
+
+func parseArmoredSSHSIG(armored []byte) (*sshsigBlob, error) {
+	block, _ := pem.Decode(armored)
+	if block == nil || block.Type != "SSH SIGNATURE" {
+		return nil, errors.New("not an SSH SIGNATURE armor")
+	}
+	if len(block.Bytes) < 6 || string(block.Bytes[:6]) != sshsigMagic {
+		return nil, errors.New("missing SSHSIG magic")
+	}
+	var sb sshsigBlob
+	if err := ssh.Unmarshal(block.Bytes[6:], &sb); err != nil { return nil, err }
+	if sb.Version != 1 { return nil, fmt.Errorf("bad version %d", sb.Version) }
+	return &sb, nil
+}
+
+func signedData(sb *sshsigBlob, msg []byte) ([]byte, error) {
+	h, err := hashByName(sb.HashAlgo); if err != nil { return nil, err }
+	h.Write(msg); md := h.Sum(nil)
+	body := ssh.Marshal(struct{ Namespace, Reserved, HashAlgo string; Hash []byte }{
+		sb.Namespace, sb.Reserved, sb.HashAlgo, md})
+	return append([]byte(sshsigMagic), body...), nil
+}
+
+// VerifySignedOp: key-type-agnostic signature verify + anti-replay/authorization.
+// allowedSigners is the trusted operator set (one key now; a quorum set later).
+func VerifySignedOp(blob, sigArmored []byte, allowedSigners []ssh.PublicKey,
+	thisHostID, thisGuestID string, seenNonces NonceStore) (string, error) {
+
+	sb, err := parseArmoredSSHSIG(sigArmored)
+	if err != nil { return "", err }
+	if sb.Namespace != Namespace {
+		return "", fmt.Errorf("namespace mismatch: got %q want %q", sb.Namespace, Namespace)
+	}
+	pub, err := ssh.ParsePublicKey([]byte(sb.PublicKey))
+	if err != nil { return "", err }
+	allowed := false
+	for _, a := range allowedSigners {
+		if bytes.Equal(a.Marshal(), pub.Marshal()) { allowed = true; break }
+	}
+	if !allowed { return "", errors.New("signer not in allowed set") }
+
+	signed, err := signedData(sb, blob)
+	if err != nil { return "", err }
+	var inner ssh.Signature
+	if err := ssh.Unmarshal([]byte(sb.Signature), &inner); err != nil { return "", err }
+	if err := pub.Verify(signed, &inner); err != nil { // dispatches on key algorithm
+		return "", fmt.Errorf("signature invalid: %w", err)
+	}
+
+	var op OpBlob
+	if err := json.Unmarshal(blob, &op); err != nil { return "", err }
+	if op.Target.HostID != thisHostID || op.Target.GuestID != thisGuestID {
+		return "", fmt.Errorf("target mismatch")
+	}
+	now := time.Now().UTC()
+	if now.Before(op.IssuedAt) { return "", errors.New("not yet valid") }
+	if now.After(op.ExpiresAt) { return "", errors.New("expired") }
+	if seenNonces.SeenOrRecord(op.Nonce, op.ExpiresAt) {
+		return "", fmt.Errorf("replay: nonce %s already seen", op.Nonce)
+	}
+	return op.Op, nil
+}
+```
+
+## 8. Inputs to the design doc (`04-control-plane-authorization.md`)
+
+- **Primitive confirmed:** SSHSIG (`ssh-keygen -Y sign` / armored `BEGIN SSH SIGNATURE`),
+  verified in Go via `pem.Decode` + `ssh.Unmarshal` + `ssh.ParsePublicKey` + `pub.Verify`. Low
+  implementation cost; no crypto hand-rolled.
+- **Hub cannot forge:** the operator private key never touches the hub; the hub only queues the
+  opaque armored blob (matches `03` §4).
+- **Key-type-agnostic / hardware-ready:** software `ed25519` now, FIDO2 `sk-ssh-ed25519` later is
+  a **box no-op** (proven end-to-end). The verifier hardcodes neither key type nor algorithm.
+- **`allowedSigners` is a set:** single signer today; **threshold/quorum is just set sizing** plus
+  an N-of-M policy on top (out of scope here).
+- **Anti-replay/authz are mandatory and cheap:** namespace (fixed), allow-list, then crypto,
+  then target-binding, time-window, nonce — all enforced and tested.
+- **Canonical blob (§2)** is the shared contract between the operator CLI and the agent verifier.
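
Review note on the canonical blob contract (§2): the rule "keys sorted at every level, no insignificant whitespace, no trailing newline" is the part most likely to drift between the operator CLI and the agent, so a small sketch may help. The helper below is hypothetical and was not part of the spike. `CanonicalJSON` is an assumed name, and the sketch relies only on documented `encoding/json` behaviour (map keys are emitted in sorted order). It round-trips the value through generic maps so that struct field order cannot leak into the output, disables HTML escaping, and trims the newline that `Encoder.Encode` appends.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// CanonicalJSON is a hypothetical helper (an assumption, not spike code).
// It returns v as compact JSON with object keys sorted at every level
// and no trailing newline.
func CanonicalJSON(v any) ([]byte, error) {
	// Pass 1: serialise whatever the caller supplied (struct, map, ...).
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}

	// Pass 2: decode into generic values so every JSON object becomes a
	// map[string]any; encoding/json emits map keys in sorted order.
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber() // keep numeric literals exactly as written
	var generic any
	if err := dec.Decode(&generic); err != nil {
		return nil, err
	}

	// Pass 3: re-encode compactly, without the default HTML escaping.
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(generic); err != nil {
		return nil, err
	}

	// Encoder.Encode always appends "\n"; the canonical form forbids it.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	// Field values copied from the §2 test blob, deliberately given in
	// non-sorted order to show that insertion order does not matter.
	op := map[string]any{
		"op":         "guest_destroy",
		"target":     map[string]any{"host_id": "demo-felhom", "guest_id": "9001"},
		"params":     map[string]any{"purge": true},
		"nonce":      "a1b2c3d4e5f60718293a4b5c6d7e8f90",
		"issued_at":  "2026-06-08T00:00:00Z",
		"expires_at": "2026-06-09T00:00:00Z",
		"key_id":     "felhom-op-1",
	}
	blob, err := CanonicalJSON(op)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes\n%s\n", len(blob), blob)
}
```

If the rule is implemented along these lines, the output for these field values should match the 236-byte test blob in §2 byte for byte. The sketch does not normalise number formatting or Unicode escapes. The blob spec uses only strings and booleans, so whether that matters is left as an open assumption for the design doc.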