phase4-signing-findings

2026-06-08 10:03:49 +02:00
parent bb0a9e7205
commit 333c65cbc4
# Phase 4 — Control-plane signing primitive (SSHSIG + Go verify): Findings
**Where run:** build server `192.168.0.180` (Debian 13, **Go 1.24.4**, **OpenSSH 10.0p2**),
no Proxmox. **Date:** 2026-06-08. Throwaway key generated, used, and **deleted** — no private
key, passphrase, or `.sig` committed.
> De-risks the signing primitive *before* it is written into `04-control-plane-authorization.md`
> or the agent's verify code. **Verdict up front: the approach works cleanly and is key-type-
> agnostic — no fallback needed.** Go verifies the armored `SSHSIG` format, every tamper/replay/
> authorization case is rejected, and a synthetic FIDO2 `sk-ssh-ed25519` signature verifies
> through the **unchanged** code path (true hardware drop-in).
---
## 0. Result at a glance — 14/14 checks pass
```
== Step 2: SSHSIG signature verification (key-type-agnostic path) ==
PASS correct verified, op="guest_destroy"
PASS wrong key rejected: signer not in allowed set
PASS tampered blob rejected: signature invalid: ssh: signature did not verify
PASS wrong namespace rejected: namespace mismatch: got "felhom-op-wrong" want "felhom-op-v1"
== Step 3: anti-replay / authorization (valid signature, still rejected) ==
PASS first use verified, op="guest_destroy"
PASS replay (same nonce) rejected: replay: nonce a1b2c3d4...8f90 already seen
PASS expired rejected: expired (expires_at=2020-01-02 ..., now=2026-06-08 ...)
PASS not-yet-valid rejected: not yet valid (issued_at=2030-01-01 ...)
PASS retargeted host rejected: target mismatch: blob=demo-felhom/9001 this=other-host/9001
PASS retargeted guest rejected: target mismatch: blob=demo-felhom/9001 this=demo-felhom/8888
== Step 4: key-type-agnosticism — FIDO2 sk-ssh-ed25519 (synthetic, no device) ==
PASS parses sk pubkey type="sk-ssh-ed25519@openssh.com"
PASS authorized_keys form sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5...
PASS sk end-to-end verify verified, op="guest_destroy"
```
---
## 1. Software round-trip (baseline, CLI)
- Key: `ssh-keygen -t ed25519 -f felhom-op -N '<passphrase>' -C felhom-operator`.
(Signing non-interactively used an `SSH_ASKPASS` helper + `setsid -w`; in production the
operator key lives behind an agent or a FIDO2 device, so the sign-time passphrase prompt is a
non-issue. The passphrase mechanics are **not** what this spike de-risks.)
- Sign with a **domain-separated namespace**:
`ssh-keygen -Y sign -f felhom-op -n felhom-op-v1 blob.json` → `blob.json.sig`
(armored `-----BEGIN SSH SIGNATURE-----`).
- Baseline verify (CLI sanity) with an allow-list:
```
allowed_signers: felhom-operator namespaces="felhom-op-v1" ssh-ed25519 AAAAC3...
$ ssh-keygen -Y verify -f allowed_signers -I felhom-operator -n felhom-op-v1 \
-s blob.json.sig < blob.json
Good "felhom-op-v1" signature for felhom-operator with ED25519 key SHA256:y0Lj8dIYTM6...
```
## 2. Canonical op blob spec (documented)
The signature covers **these exact bytes**; the operator CLI (also Go) must reproduce them
byte-for-byte. **Canonical form: JSON, keys sorted lexicographically at every level, no
insignificant whitespace, no trailing newline, UTF-8.**
```json
{"expires_at":"<RFC3339 UTC>","issued_at":"<RFC3339 UTC>","key_id":"<id>","nonce":"<128-bit hex>","op":"<op>","params":{...},"target":{"guest_id":"<vmid>","host_id":"<node>"}}
```
| field | meaning |
|---|---|
| `op` | the operation, e.g. `guest_destroy`, `storage_detach`, `restore_overwrite` |
| `target.host_id` / `target.guest_id` | the box + guest the op is bound to (anti-retarget) |
| `params` | op-specific arguments (themselves canonical-sorted) |
| `nonce` | unique per op (anti-replay); ≥128-bit random |
| `issued_at` / `expires_at` | validity window (short — minutes) |
| `key_id` | which operator key (for rotation / audit) |
Exact test blob (236 bytes): `{"expires_at":"2026-06-09T00:00:00Z","issued_at":"2026-06-08T00:00:00Z","key_id":"felhom-op-1","nonce":"a1b2c3d4e5f60718293a4b5c6d7e8f90","op":"guest_destroy","params":{"purge":true},"target":{"guest_id":"9001","host_id":"demo-felhom"}}`
> Note: the SSHSIG **namespace** (`felhom-op-v1`) is the cryptographic domain separator and is
> a **fixed constant in the verifier**, never caller-supplied — a signature minted for any
> other namespace must not verify (proven: "wrong namespace" rejected).
## 3. Go SSHSIG verify — approach + implementation cost
**It is not a one-call verify, but it is clean — no hand-rolled crypto.** The only manual work
is SSHSIG *framing*; all crypto and key-type dispatch is the library's. Steps:
1. `pem.Decode` the armor → `block.Type == "SSH SIGNATURE"`, `block.Bytes` is the binary SSHSIG.
*(Go's `encoding/pem` parses the armor directly — no manual base64/line handling.)*
2. Strip the literal 6-byte `SSHSIG` magic preamble (it is **not** length-prefixed).
3. `ssh.Unmarshal` the rest into a struct `{Version uint32; PublicKey, Namespace, Reserved,
HashAlgo, Signature string}` — library does the SSH wire parsing.
4. `ssh.ParsePublicKey([]byte(PublicKey))` → an `ssh.PublicKey`.
5. Recompute the signed data per spec: `"SSHSIG" || string(namespace) || string(reserved) ||
string(hash_algorithm) || string(H(message))`, where `H` is the **named** hash
(`sha256`/`sha512`). The four strings are encoded with one `ssh.Marshal`, and the raw magic is prepended.
6. `ssh.Unmarshal([]byte(Signature))` into `ssh.Signature`, then **`pub.Verify(signed, &sig)`** —
which **dispatches on the key's own algorithm** (this is what makes it key-agnostic).
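
In step 5, `string(...)` is SSH wire encoding: a big-endian `uint32` length followed by the raw bytes. The stdlib-only sketch below spells that layout out byte by byte. It is an illustration, not the spike's code, which uses `ssh.Marshal`. It assumes the empty `reserved` field that `ssh-keygen` currently writes, whereas the reference verifier copies whatever the signature carries. `sshsigSignedData` and `putString` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/binary"
	"fmt"
	"hash"
)

// putString appends an SSH wire "string": uint32 big-endian length, then the bytes.
func putString(b, s []byte) []byte {
	b = binary.BigEndian.AppendUint32(b, uint32(len(s)))
	return append(b, s...)
}

// sshsigSignedData (hypothetical) builds the bytes the inner signature covers:
// "SSHSIG" || string(namespace) || string(reserved) || string(hash_algorithm) || string(H(msg)).
func sshsigSignedData(namespace, hashAlgo string, msg []byte) ([]byte, error) {
	var h hash.Hash
	switch hashAlgo {
	case "sha256":
		h = sha256.New()
	case "sha512":
		h = sha512.New()
	default:
		return nil, fmt.Errorf("unsupported SSHSIG hash %q", hashAlgo)
	}
	h.Write(msg)
	out := []byte("SSHSIG")                 // 6-byte magic, NOT length-prefixed
	out = putString(out, []byte(namespace)) // fixed domain separator
	out = putString(out, nil)               // reserved: assumed empty
	out = putString(out, []byte(hashAlgo))  // the *named* hash
	return putString(out, h.Sum(nil)), nil  // digest of the message, not the message
}
```

For the `felhom-op-v1` namespace with `sha256`, the result is always 72 bytes: 6 (magic) + 16 (namespace) + 4 (reserved) + 10 (hash name) + 36 (digest).
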
**Cost verdict:** ~40 lines of framing in one file, zero crypto implemented by us. Well within
the agent's budget; **no reason to fall back** to a different primitive.
## 4. Anti-replay / authorization layer (on top of signature validity)
Enforced in `VerifySignedOp` *after* the signature check, each proven to reject **even with a
valid signature** (Step 3 output above):
- **replay** — nonce already recorded in the window → reject;
- **expired / not-yet-valid** — `now ∉ [issued_at, expires_at]` → reject (both sides shown);
- **retargeted** — `target.host_id`/`guest_id` ≠ this box/guest → reject (both shown).
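(Order matters: parse armor → namespace → allow-list → crypto verify → target → time → nonce. The
nonce is recorded only after every other check passes, so a replayed *but otherwise valid* op is still
caught, while an op rejected for any other reason (bad signature, wrong target, outside its window)
never consumes a nonce.)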
## 5. Key-type-agnosticism — **TRUE DROP-IN** (no box change for FIDO2 later)
No FIDO2 device was used (by choice). Instead the spike **emulated the authenticator exactly**:
- Synthesized a well-formed `sk-ssh-ed25519@openssh.com` public key; `ssh.ParsePublicKey` parses
it and `ssh.MarshalAuthorizedKey` round-trips it.
- Constructed a real `SSHSIG` whose inner signature follows the sk scheme (per OpenSSH
`PROTOCOL.u2f`): `ed25519` over `sha256(application) || flags || counter || sha256(signed_data)`,
with the blob `string(format) string(ed25519_sig) byte(flags) uint32(counter)` — i.e. exactly
what a FIDO2 key emits.
- Ran it through the **unchanged `VerifySignedOp`** → **verified** (`op="guest_destroy"`).
**Verdict: true drop-in.** `pub.Verify` for `sk-ssh-ed25519` is implemented in
`golang.org/x/crypto/ssh` (exercised here on **v0.52.0**; it reconstructs `appDigest‖flags‖counter‖dataDigest` and
`ed25519.Verify`s it). Introducing a hardware operator key later is a **no-op on the boxes** —
the agent's verify code is identical; only the operator's signer key (and the allowed-signers
set entry) changes. No sk-specific handler is needed.
> Because verification dispatches on the key type embedded in the signature, the same path also
> accepts `ssh-ed25519`, `ssh-rsa` (`rsa-sha2-256`/`rsa-sha2-512` signatures), `ecdsa-sha2-*`, etc. by
> construction (only the Ed25519 variants were exercised here). Algorithm choice is the operator's,
> not the agent's.
## 6. Fallback (not taken) and its cost
A fallback would be a **raw Ed25519 detached signature** (or `minisign`): trivially one
`ed25519.Verify` call, no SSHSIG framing. **Rejected** because it **loses the clean FIDO2 path** —
a raw-Ed25519 verifier cannot consume an `sk-ssh-ed25519` signature (which carries flags+counter
and a different signed-data construction), so the future hardware swap would require **changing
the verifier on every box**. SSHSIG buys exactly the key-type-agnosticism (§5) that a raw scheme
forfeits, at a one-file framing cost (§3). **No fallback is warranted.**
## 7. Reference verifier (seed of the agent's verify code)
Verified working on Go 1.24.4 / `x/crypto` v0.52.0. (Test harness omitted; this is the verify
core + SSHSIG framing + anti-replay/authz.)
```go
package opverify // illustrative package name

import (
	"bytes"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/json"
	"encoding/pem"
	"errors"
	"fmt"
	"hash"
	"time"

	"golang.org/x/crypto/ssh"
)

const Namespace = "felhom-op-v1" // FIXED domain separator, never caller-supplied
const sshsigMagic = "SSHSIG"     // 6-byte preamble, not length-prefixed

// json tags are required: without them "host_id" would not match HostID.
type Target struct {
	HostID  string `json:"host_id"`
	GuestID string `json:"guest_id"`
}

type OpBlob struct {
	Op        string          `json:"op"`
	Target    Target          `json:"target"`
	Params    json.RawMessage `json:"params"`
	Nonce     string          `json:"nonce"`
	IssuedAt  time.Time       `json:"issued_at"`
	ExpiresAt time.Time       `json:"expires_at"`
	KeyID     string          `json:"key_id"`
}

// NonceStore reports whether nonce was already seen; if not, it records it until exp.
type NonceStore interface {
	SeenOrRecord(nonce string, exp time.Time) bool
}

// sshsigBlob is the SSHSIG wire layout that follows the magic preamble.
type sshsigBlob struct {
	Version   uint32
	PublicKey string // wire-encoded public key
	Namespace string
	Reserved  string
	HashAlgo  string // "sha256" or "sha512"
	Signature string // wire-encoded ssh.Signature
}

func hashByName(n string) (hash.Hash, error) {
	switch n {
	case "sha256":
		return sha256.New(), nil
	case "sha512":
		return sha512.New(), nil
	}
	return nil, fmt.Errorf("unsupported SSHSIG hash %q", n)
}

// parseArmoredSSHSIG undoes the armor and the SSHSIG framing (steps 1-3).
func parseArmoredSSHSIG(armored []byte) (*sshsigBlob, error) {
	block, _ := pem.Decode(armored)
	if block == nil || block.Type != "SSH SIGNATURE" {
		return nil, errors.New("not an SSH SIGNATURE armor")
	}
	if len(block.Bytes) < 6 || string(block.Bytes[:6]) != sshsigMagic {
		return nil, errors.New("missing SSHSIG magic")
	}
	var sb sshsigBlob
	if err := ssh.Unmarshal(block.Bytes[6:], &sb); err != nil {
		return nil, err
	}
	if sb.Version != 1 {
		return nil, fmt.Errorf("bad version %d", sb.Version)
	}
	return &sb, nil
}

// signedData rebuilds the bytes the inner signature covers (step 5).
func signedData(sb *sshsigBlob, msg []byte) ([]byte, error) {
	h, err := hashByName(sb.HashAlgo)
	if err != nil {
		return nil, err
	}
	h.Write(msg)
	body := ssh.Marshal(struct {
		Namespace, Reserved, HashAlgo string
		Hash                          []byte
	}{sb.Namespace, sb.Reserved, sb.HashAlgo, h.Sum(nil)})
	return append([]byte(sshsigMagic), body...), nil
}

// VerifySignedOp: key-type-agnostic signature verify + anti-replay/authorization.
// allowedSigners is the trusted operator set (one key now; a quorum set later).
func VerifySignedOp(blob, sigArmored []byte, allowedSigners []ssh.PublicKey,
	thisHostID, thisGuestID string, seenNonces NonceStore) (string, error) {
	sb, err := parseArmoredSSHSIG(sigArmored)
	if err != nil {
		return "", err
	}
	if sb.Namespace != Namespace {
		return "", fmt.Errorf("namespace mismatch: got %q want %q", sb.Namespace, Namespace)
	}
	pub, err := ssh.ParsePublicKey([]byte(sb.PublicKey))
	if err != nil {
		return "", err
	}
	allowed := false
	for _, a := range allowedSigners {
		if bytes.Equal(a.Marshal(), pub.Marshal()) {
			allowed = true
			break
		}
	}
	if !allowed {
		return "", errors.New("signer not in allowed set")
	}
	signed, err := signedData(sb, blob)
	if err != nil {
		return "", err
	}
	var inner ssh.Signature // Format, Blob, and Rest (sk flags+counter land in Rest)
	if err := ssh.Unmarshal([]byte(sb.Signature), &inner); err != nil {
		return "", err
	}
	if err := pub.Verify(signed, &inner); err != nil { // dispatches on key algorithm
		return "", fmt.Errorf("signature invalid: %w", err)
	}
	// Only now is blob trusted: parse it and apply the authorization checks.
	var op OpBlob
	if err := json.Unmarshal(blob, &op); err != nil {
		return "", err
	}
	if op.Target.HostID != thisHostID || op.Target.GuestID != thisGuestID {
		return "", fmt.Errorf("target mismatch: blob=%s/%s this=%s/%s",
			op.Target.HostID, op.Target.GuestID, thisHostID, thisGuestID)
	}
	now := time.Now().UTC()
	if now.Before(op.IssuedAt) {
		return "", fmt.Errorf("not yet valid (issued_at=%s)", op.IssuedAt)
	}
	if now.After(op.ExpiresAt) {
		return "", fmt.Errorf("expired (expires_at=%s, now=%s)", op.ExpiresAt, now)
	}
	if seenNonces.SeenOrRecord(op.Nonce, op.ExpiresAt) { // last: rejected ops never burn a nonce
		return "", fmt.Errorf("replay: nonce %s already seen", op.Nonce)
	}
	return op.Op, nil
}
```
## 8. Inputs to the design doc (`04-control-plane-authorization.md`)
- **Primitive confirmed:** SSHSIG (`ssh-keygen -Y sign` / armored `BEGIN SSH SIGNATURE`),
verified in Go via `pem.Decode` + `ssh.Unmarshal` + `ssh.ParsePublicKey` + `pub.Verify`. Low
implementation cost; no crypto hand-rolled.
- **Hub cannot forge:** the operator private key never touches the hub; the hub only queues the
opaque armored blob (matches `03` §4).
- **Key-type-agnostic / hardware-ready:** software `ed25519` now, FIDO2 `sk-ssh-ed25519` later is
a **box no-op** (proven end-to-end). The verifier hardcodes neither key type nor algorithm.
- **`allowedSigners` is a set:** single signer today; **threshold/quorum is just set sizing** plus
an N-of-M policy on top (out of scope here).
- **Anti-replay/authz are mandatory and cheap:** namespace (fixed), allow-list, then crypto,
then target-binding, time-window, nonce — all enforced and tested.
- **Canonical blob (§2)** is the shared contract between the operator CLI and the agent verifier.