# Phase 4 — Control-plane signing primitive (SSHSIG + Go verify): Findings

**Where run:** build server `192.168.0.180` (Debian 13, **Go 1.24.4**, **OpenSSH 10.0p2**),
no Proxmox. **Date:** 2026-06-08. Throwaway key generated, used, and **deleted**; no private
key, passphrase, or `.sig` committed.

> De-risks the signing primitive *before* it is written into `04-control-plane-authorization.md`
> or the agent's verify code. **Verdict up front: the approach works cleanly and is
> key-type-agnostic; no fallback is needed.** Go verifies the armored `SSHSIG` format, every
> tamper/replay/authorization case is rejected, and a synthetic FIDO2 `sk-ssh-ed25519`
> signature verifies through the **unchanged** code path (a true hardware drop-in).

---

## 0. Result at a glance — 14/14 checks pass

```
== Step 2: SSHSIG signature verification (key-type-agnostic path) ==
PASS  correct               verified, op="guest_destroy"
PASS  wrong key             rejected: signer not in allowed set
PASS  tampered blob         rejected: signature invalid: ssh: signature did not verify
PASS  wrong namespace       rejected: namespace mismatch: got "felhom-op-wrong" want "felhom-op-v1"

== Step 3: anti-replay / authorization (valid signature, still rejected) ==
PASS  first use             verified, op="guest_destroy"
PASS  replay (same nonce)   rejected: replay: nonce a1b2c3d4...8f90 already seen
PASS  expired               rejected: expired (expires_at=2020-01-02 ..., now=2026-06-08 ...)
PASS  not-yet-valid         rejected: not yet valid (issued_at=2030-01-01 ...)
PASS  retargeted host       rejected: target mismatch: blob=demo-felhom/9001 this=other-host/9001
PASS  retargeted guest      rejected: target mismatch: blob=demo-felhom/9001 this=demo-felhom/8888

== Step 4: key-type-agnosticism — FIDO2 sk-ssh-ed25519 (synthetic, no device) ==
PASS  parses sk pubkey      type="sk-ssh-ed25519@openssh.com"
PASS  authorized_keys form  sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5...
PASS  sk end-to-end verify  verified, op="guest_destroy"
```

---

## 1. Software round-trip (baseline, CLI)

- Key: `ssh-keygen -t ed25519 -f felhom-op -N '<passphrase>' -C felhom-operator`.
  (Signing non-interactively used an `SSH_ASKPASS` helper + `setsid -w`; in production the
  operator key lives behind an agent or a FIDO2 device, so the passphrase prompt at signing
  time is a non-issue. The passphrase mechanics are **not** what this spike de-risks.)
- Sign with a **domain-separated namespace**:
  `ssh-keygen -Y sign -f felhom-op -n felhom-op-v1 blob.json` → `blob.json.sig`
  (armored `-----BEGIN SSH SIGNATURE-----`).
- Baseline verify (CLI sanity) with an allow-list:
  ```
  allowed_signers: felhom-operator namespaces="felhom-op-v1" ssh-ed25519 AAAAC3...
  $ ssh-keygen -Y verify -f allowed_signers -I felhom-operator -n felhom-op-v1 \
      -s blob.json.sig < blob.json
  Good "felhom-op-v1" signature for felhom-operator with ED25519 key SHA256:y0Lj8dIYTM6...
  ```

## 2. Canonical op blob spec (documented)

The signature covers **these exact bytes**; the operator CLI (also Go) must reproduce them
byte-for-byte. **Canonical form: JSON, keys sorted lexicographically at every level, no
insignificant whitespace, no trailing newline, UTF-8.**

```json
{"expires_at":"<RFC3339 UTC>","issued_at":"<RFC3339 UTC>","key_id":"<id>","nonce":"<128-bit hex>","op":"<op>","params":{...},"target":{"guest_id":"<vmid>","host_id":"<node>"}}
```

| field | meaning |
|---|---|
| `op` | the operation, e.g. `guest_destroy`, `storage_detach`, `restore_overwrite` |
| `target.host_id` / `target.guest_id` | the box + guest the op is bound to (anti-retarget) |
| `params` | op-specific arguments (themselves canonical-sorted) |
| `nonce` | unique per op (anti-replay); ≥128-bit random |
| `issued_at` / `expires_at` | validity window (short: minutes) |
| `key_id` | which operator key (for rotation / audit) |

Exact test blob (236 bytes): `{"expires_at":"2026-06-09T00:00:00Z","issued_at":"2026-06-08T00:00:00Z","key_id":"felhom-op-1","nonce":"a1b2c3d4e5f60718293a4b5c6d7e8f90","op":"guest_destroy","params":{"purge":true},"target":{"guest_id":"9001","host_id":"demo-felhom"}}`
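
Because `encoding/json` already emits compact output and sorts `map` keys, one way for a Go CLI to reach this canonical form is to decode into generic values and re-encode. A sketch under stated assumptions: `canonicalJSON` is a hypothetical helper; `UseNumber` keeps numeric literals from being rewritten through `float64`; HTML escaping is switched off so `<`, `>` and `&` stay literal (the spec above does not say either way); and the newline `json.Encoder` appends is trimmed. It does not attempt the fuller rules of a scheme such as RFC 8785 (JCS). Fed a pretty-printed, reordered version of the test blob, it prints the 236-byte blob above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// canonicalJSON (hypothetical) re-encodes one JSON document with object keys
// sorted at every level, no insignificant whitespace and no trailing newline.
func canonicalJSON(in []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(in))
	dec.UseNumber() // keep numbers as their literal text instead of float64
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	if dec.More() {
		return nil, errors.New("trailing data after JSON value")
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // assumption: '<', '>', '&' stay literal
	if err := enc.Encode(v); err != nil { // map keys are emitted in sorted order
		return nil, err
	}
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil // Encode appends '\n'
}

func main() {
	messy := []byte(`{
	  "op": "guest_destroy",
	  "target": {"host_id": "demo-felhom", "guest_id": "9001"},
	  "params": {"purge": true},
	  "nonce": "a1b2c3d4e5f60718293a4b5c6d7e8f90",
	  "key_id": "felhom-op-1",
	  "issued_at": "2026-06-08T00:00:00Z",
	  "expires_at": "2026-06-09T00:00:00Z"
	}`)
	out, err := canonicalJSON(messy)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(out)) // 236
	fmt.Println(string(out))
}
```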

> Note: the SSHSIG **namespace** (`felhom-op-v1`) is the cryptographic domain separator and is
> a **fixed constant in the verifier**, never caller-supplied; a signature minted for any
> other namespace must not verify (proven: "wrong namespace" rejected).

## 3. Go SSHSIG verify — approach + implementation cost

**It is not a one-call verify, but it is clean: no hand-rolled crypto.** The only manual work
is SSHSIG *framing*; all crypto and key-type dispatch is the library's. Steps:

1. `pem.Decode` the armor → `block.Type == "SSH SIGNATURE"`, `block.Bytes` is the binary SSHSIG.
   *(Go's `encoding/pem` parses the armor directly; no manual base64/line handling.)*
2. Strip the literal 6-byte `SSHSIG` magic preamble (it is **not** length-prefixed).
3. `ssh.Unmarshal` the rest into a struct `{Version uint32; PublicKey, Namespace, Reserved,
   HashAlgo, Signature string}`; the library does the SSH wire parsing.
4. `ssh.ParsePublicKey([]byte(PublicKey))` → an `ssh.PublicKey`.
5. Recompute the signed data per spec: `"SSHSIG" || string(namespace) || string(reserved) ||
   string(hash_algorithm) || string(H(message))`, where `H` is the hash named in the
   `hash_algorithm` field (`sha256` or `sha512`); the whole thing is built with one `ssh.Marshal`.
6. `ssh.Unmarshal([]byte(Signature))` into `ssh.Signature`, then **`pub.Verify(signed, &sig)`**,
   which **dispatches on the key's own algorithm** (this is what makes it key-agnostic).

**Cost verdict:** ~40 lines of framing in one file, zero crypto implemented by us. Well within
the agent's budget; **no reason to fall back** to a different primitive.

## 4. Anti-replay / authorization layer (on top of signature validity)

Each of these is enforced in `VerifySignedOp` *after* the signature check, and each is proven to
reject **even with a valid signature** (Step 3 output above):

- **replay**: nonce already recorded in the window → reject;
- **expired / not-yet-valid**: `now ∉ [issued_at, expires_at]` → reject (both sides shown);
- **retargeted**: `target.host_id`/`guest_id` ≠ this box/guest → reject (both shown).

(Order matters: parse armor → namespace → allow-list → crypto verify → target → time → nonce.
A replayed *but otherwise valid* op is still caught, an invalid signature never consumes a
nonce, and because the nonce is recorded last, an op rejected for its target or time window
does not burn it either.)

## 5. Key-type-agnosticism — **TRUE DROP-IN** (no box change for FIDO2 later)

No FIDO2 device was used (by choice). Instead, the spike **emulated the authenticator exactly**:

- Synthesized a well-formed `sk-ssh-ed25519@openssh.com` public key; `ssh.ParsePublicKey` parses
  it and `ssh.MarshalAuthorizedKey` round-trips it.
- Constructed a real `SSHSIG` whose inner signature follows the sk scheme (per OpenSSH
  `PROTOCOL.u2f`): `ed25519` over `sha256(application) || flags || counter || sha256(signed_data)`
  (flags: 1 byte; counter: big-endian `uint32`), with the blob
  `string(format) string(ed25519_sig) byte(flags) uint32(counter)`, i.e. exactly what a FIDO2
  key emits.
- Ran it through the **unchanged `VerifySignedOp`** → **verified** (`op="guest_destroy"`).

**Verdict: true drop-in.** `pub.Verify` for `sk-ssh-ed25519` is implemented in
`golang.org/x/crypto/ssh` **v0.52.0** (it reconstructs `appDigest‖flags‖counter‖dataDigest` and
`ed25519.Verify`s it). Introducing a hardware operator key later is a **no-op on the boxes**:
the agent's verify code is identical; only the operator's signer key (and the allowed-signers
set entry) changes. No sk-specific handler is needed.

> Because verification dispatches on the type of the public key embedded in the SSHSIG blob,
> the same path also accepts `ssh-ed25519`, `rsa-sha2-*`, `ecdsa-sha2-*`, etc.; algorithm
> choice is the operator's, not the agent's.

## 6. Fallback (not taken) and its cost

A fallback would be a **raw Ed25519 detached signature** (or `minisign`): a single
`ed25519.Verify` call, no SSHSIG framing. **Rejected** because it **loses the clean FIDO2 path**:
a raw-Ed25519 verifier cannot consume an `sk-ssh-ed25519` signature (which carries flags+counter
and signs a different construction), so the future hardware swap would require **changing the
verifier on every box**. SSHSIG buys exactly the key-type-agnosticism (§5) that a raw scheme
forfeits, at a one-file framing cost (§3). **No fallback is warranted.**

## 7. Reference verifier (seed of the agent's verify code)

Verified working on Go 1.24.4 / `x/crypto` v0.52.0. (Test harness omitted; this is the verify
core + SSHSIG framing + anti-replay/authz.)

```go
package opverify // package name is illustrative

import (
	"bytes"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/json"
	"encoding/pem"
	"errors"
	"fmt"
	"hash"
	"time"

	"golang.org/x/crypto/ssh"
)

const Namespace = "felhom-op-v1" // FIXED domain separator, never caller-supplied
const sshsigMagic = "SSHSIG"     // 6-byte preamble, NOT length-prefixed

type Target struct {
	HostID  string `json:"host_id"`
	GuestID string `json:"guest_id"`
}

type OpBlob struct {
	Op        string          `json:"op"`
	Target    Target          `json:"target"`
	Params    json.RawMessage `json:"params"`
	Nonce     string          `json:"nonce"`
	IssuedAt  time.Time       `json:"issued_at"`
	ExpiresAt time.Time       `json:"expires_at"`
	KeyID     string          `json:"key_id"`
}

// NonceStore reports whether a nonce was already used; if not, it records it until exp.
type NonceStore interface{ SeenOrRecord(nonce string, exp time.Time) bool }

// sshsigBlob is the binary SSHSIG body that follows the magic (OpenSSH PROTOCOL.sshsig).
type sshsigBlob struct {
	Version                                             uint32
	PublicKey, Namespace, Reserved, HashAlgo, Signature string
}

// hashByName maps the SSHSIG hash_algorithm field to a hash; only sha256/sha512 are defined.
func hashByName(n string) (hash.Hash, error) {
	switch n {
	case "sha256":
		return sha256.New(), nil
	case "sha512":
		return sha512.New(), nil
	}
	return nil, fmt.Errorf("unsupported SSHSIG hash %q", n)
}

// parseArmoredSSHSIG decodes the armor and the SSH wire framing (no crypto yet).
func parseArmoredSSHSIG(armored []byte) (*sshsigBlob, error) {
	block, _ := pem.Decode(armored)
	if block == nil || block.Type != "SSH SIGNATURE" {
		return nil, errors.New("not an SSH SIGNATURE armor")
	}
	if !bytes.HasPrefix(block.Bytes, []byte(sshsigMagic)) {
		return nil, errors.New("missing SSHSIG magic")
	}
	var sb sshsigBlob
	if err := ssh.Unmarshal(block.Bytes[len(sshsigMagic):], &sb); err != nil {
		return nil, err
	}
	if sb.Version != 1 {
		return nil, fmt.Errorf("bad version %d", sb.Version)
	}
	return &sb, nil
}

// signedData rebuilds the signed bytes:
// "SSHSIG" || string(namespace) || string(reserved) || string(hash_algorithm) || string(H(msg)).
func signedData(sb *sshsigBlob, msg []byte) ([]byte, error) {
	h, err := hashByName(sb.HashAlgo)
	if err != nil {
		return nil, err
	}
	h.Write(msg)
	body := ssh.Marshal(struct {
		Namespace, Reserved, HashAlgo string
		Hash                          []byte // a []byte field is marshaled as an SSH string
	}{sb.Namespace, sb.Reserved, sb.HashAlgo, h.Sum(nil)})
	return append([]byte(sshsigMagic), body...), nil
}

// VerifySignedOp: key-type-agnostic signature verify + anti-replay/authorization.
// allowedSigners is the trusted operator set (one key now; a quorum set later).
// On success it returns the verified op name.
func VerifySignedOp(blob, sigArmored []byte, allowedSigners []ssh.PublicKey,
	thisHostID, thisGuestID string, seenNonces NonceStore) (string, error) {

	// Framing, then the fixed namespace (domain separation).
	sb, err := parseArmoredSSHSIG(sigArmored)
	if err != nil {
		return "", err
	}
	if sb.Namespace != Namespace {
		return "", fmt.Errorf("namespace mismatch: got %q want %q", sb.Namespace, Namespace)
	}

	// Allow-list: the embedded public key must be a trusted operator key.
	pub, err := ssh.ParsePublicKey([]byte(sb.PublicKey))
	if err != nil {
		return "", err
	}
	allowed := false
	for _, a := range allowedSigners {
		if bytes.Equal(a.Marshal(), pub.Marshal()) {
			allowed = true
			break
		}
	}
	if !allowed {
		return "", errors.New("signer not in allowed set")
	}

	// Crypto verify: pub.Verify dispatches on the key's own algorithm.
	signed, err := signedData(sb, blob)
	if err != nil {
		return "", err
	}
	var inner ssh.Signature
	if err := ssh.Unmarshal([]byte(sb.Signature), &inner); err != nil {
		return "", err
	}
	if inner.Format == ssh.KeyAlgoRSA { // PROTOCOL.sshsig disallows legacy RSA/SHA-1
		return "", errors.New("ssh-rsa (SHA-1) signatures are not accepted")
	}
	if err := pub.Verify(signed, &inner); err != nil {
		return "", fmt.Errorf("signature invalid: %w", err)
	}

	// Authorization on the now-authenticated blob: target, time window, nonce (last).
	var op OpBlob
	if err := json.Unmarshal(blob, &op); err != nil {
		return "", err
	}
	if op.Target.HostID != thisHostID || op.Target.GuestID != thisGuestID {
		return "", fmt.Errorf("target mismatch: blob=%s/%s this=%s/%s",
			op.Target.HostID, op.Target.GuestID, thisHostID, thisGuestID)
	}
	now := time.Now().UTC()
	if now.Before(op.IssuedAt) {
		return "", fmt.Errorf("not yet valid (issued_at=%s)", op.IssuedAt)
	}
	if now.After(op.ExpiresAt) {
		return "", fmt.Errorf("expired (expires_at=%s, now=%s)", op.ExpiresAt, now)
	}
	if seenNonces.SeenOrRecord(op.Nonce, op.ExpiresAt) {
		return "", fmt.Errorf("replay: nonce %s already seen", op.Nonce)
	}
	return op.Op, nil
}
```

## 8. Inputs to the design doc (`04-control-plane-authorization.md`)

- **Primitive confirmed:** SSHSIG (`ssh-keygen -Y sign` / armored `BEGIN SSH SIGNATURE`),
  verified in Go via `pem.Decode` + `ssh.Unmarshal` + `ssh.ParsePublicKey` + `pub.Verify`. Low
  implementation cost; no crypto hand-rolled.
- **Hub cannot forge:** the operator private key never touches the hub; the hub only queues the
  opaque armored blob (matches `03` §4).
- **Key-type-agnostic / hardware-ready:** software `ed25519` now, FIDO2 `sk-ssh-ed25519` later is
  a **box no-op** (proven end-to-end). The verifier hardcodes neither key type nor algorithm.
- **`allowedSigners` is a set:** single signer today; **threshold/quorum is just set sizing** plus
  an N-of-M policy on top (out of scope here).
- **Anti-replay/authz are mandatory and cheap:** namespace (fixed), allow-list, then crypto,
  then target binding, time window, nonce; all enforced and tested.
- **Canonical blob (§2)** is the shared contract between the operator CLI and the agent verifier.