f0fee7e193
internal/authz: production form of the Phase-4 SSHSIG signing primitive. - Verifier.New/Verify with the LOCKED pipeline (namespace → allow-list by key material → crypto over RAW bytes → target → time → nonce LAST); each post-crypto stage rejects even with a valid sig; an invalid sig never burns a nonce. - SSHSIG framing via x/crypto/ssh (no hand-rolled crypto); key-type-agnostic (ed25519 / sk-ssh-ed25519 / rsa / ecdsa via pub.Verify). Fixed namespace felhom-op-v1. Typed errors. OpBlob (fixed host_id/guest_id tags) + VerifiedOp. - NonceStore: MemoryNonceStore + durable crash-safe FileNonceStore (fsync'd append log, replay-on-open, compaction, expiry-only pruning; survives restart). - config.AuthzConfig (nonce path + pinned operational/recovery signer keys). - Tests (14): real ssh-keygen fixture, per-stage rejection, nonce-not-burned, replay, persistence-across-restart, synthetic sk, byte-exactness. Dep: golang.org/x/crypto v0.52.0 (declares go 1.25 — the Phase-4 doc's "Go 1.24.4 / x/crypto v0.52.0" pairing doesn't build; build server upgraded to go1.26.0, backward-compatible). Version 0.1.0 -> 0.2.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
112 lines
6.7 KiB
Markdown
# Changelog

All notable changes to **felhom-agent** are recorded here. Update on every code
change that gets pushed.

## v0.2.0 — `authz` signed-op verifier (slice 2) (2026-06-08)

Production form of the Phase-4 signing primitive: a key-type-agnostic SSHSIG
verifier for operator-signed destructive ops, with the full
anti-replay/authorization pipeline and a durable, crash-safe nonce store. This is
what slice 4 (reconcile) will call to gate destructive desired-state deltas. No
hub, no signing CLI, no reconcile loop.

### Added
- **`internal/authz` — `Verifier`**: `New(signers, store, hostID)` +
  `Verify(blob, sigArmored) (*VerifiedOp, error)`. Runs the LOCKED pipeline (order
  is load-bearing): parse armor → namespace → parse pubkey → allow-list (by key
  **material**, `pub.Marshal()` equality, not key_id) → crypto verify (over the
  **raw received bytes**, never re-canonicalized) → parse blob → target → time
  window → **nonce recorded LAST**. Each post-crypto stage rejects even with a
  valid signature.
- **SSHSIG framing** (`sshsig.go`) via `golang.org/x/crypto/ssh` — `pem.Decode` →
  strip 6-byte magic → `ssh.Unmarshal` → `ssh.ParsePublicKey` → recompute signed
  data with the named hash → `pub.Verify` (dispatches on key algorithm). No
  hand-rolled crypto. Key-type-agnostic: ed25519 / **sk-ssh-ed25519 (FIDO2)** /
  rsa / ecdsa via the one path.
- **Fixed namespace** `felhom-op-v1` (package constant, never caller-supplied).
- **`OpBlob`** (corrected `host_id`/`guest_id` json tags) + **`VerifiedOp`** (op,
  host/guest, params, key_id, matched signer). key_id is advisory/audit only —
  never an authz input.
- **Typed errors**: `ErrMalformed`, `ErrNamespace`, `ErrUnknownSigner`,
  `ErrBadSignature`, `ErrTarget`, `ErrExpired`, `ErrNotYetValid`, `ErrReplay`
  (`errors.Is`-friendly).
- **`NonceStore`** + two impls: `MemoryNonceStore` (tests) and **`FileNonceStore`**
  — durable, crash-safe (fsync'd append log, replayed into an index on open,
  periodic compaction, expiry-only pruning). A nonce is fsync'd to disk before
  `SeenOrRecord` returns false; replay protection survives restart; I/O failure
  fails safe (reports seen=true).
- **Target generalization**: host_id matched strictly, guest_id surfaced for the
  caller to route.
- **Config**: `AuthzConfig` (nonce-store path + pinned operator `signers` tagged
  `operational`/`recovery` with a key_id, as authorized_keys lines).
- **Version 0.2.0.**

### Tests
- Real OpenSSH interop via a committed `ssh-keygen -Y sign` vector (hermetic CI);
  per-stage rejection (each with an otherwise-valid sig); the headline
  **invalid-sig-does-not-burn-the-nonce** invariant; replay; **persistence across
  restart**; synthetic **sk-ssh-ed25519** through the unchanged path; byte-exactness
  (a re-serialized blob fails crypto — not re-canonicalized).

### Notes / corrections to the Phase-4 reference
- §7's `Target` lacked json tags (`host_id`/`guest_id`) — fixed.
- The doc paired "Go 1.24.4 / x/crypto v0.52.0", but v0.52.0 declares `go 1.25.0`
  and does **not** build on Go 1.24. Resolved by upgrading the build server to
  go1.26.0 (backward-compatible; felhom-controller/hub unaffected); the module is
  `go 1.25.0` on x/crypto v0.52.0.
- Free function → constructed `Verifier`; returns the full `VerifiedOp`; typed
  errors; clock-skew tolerance added; durable nonce store is the net-new work.
- **Shared-contract dependency flagged** (not built): the hub and the `felhom-sign`
  CLI must emit byte-identical canonical JSON or signatures won't verify; a shared
  canonicalizer both import would be the right home.

## v0.1.0 — Scaffold + `proxmox` interaction layer (slice 1) (2026-06-08)

First slice: stand up the host-agent project and its foundation — the typed
Proxmox interaction layer every other module will call. No reconcile loop, hub
client, signing, or storage/backup orchestration yet (later slices).

### Added
- **Project scaffold**: module `gitea.dooplex.hu/admin/felhom-agent`, binary
  `felhom-agent` (`cmd/felhom-agent/`), Go 1.24, zero external dependencies
  (pure stdlib). `--version` flag; `version` var overridable via
  `-ldflags "-X main.version=<v>"`.
- **`internal/proxmox` — API backend (`Client`)**: hand-rolled REST client over
  `https://<host>:8006/api2/json` with `PVEAPIToken` auth. Typed read ops
  (`Version`, `Nodes`, `NodeStatus`, `ListLXC`, `GuestStatus`, `GuestConfig`,
  `ListStorage`, `NodeStorage`, `StorageContent`) and async mutating ops
  returning a UPID (`RestoreLXC` — the primary create path, `Vzdump`, `Snapshot`,
  `Rollback`, `DeleteSnapshot`, `SetConfig`, `Start`, `Stop`).
- **`WaitTask`**: polls `GET /nodes/{node}/tasks/{upid}/status` until stopped, then
  asserts `exitstatus == "OK"` (authorization can surface at task execution, not
  the POST — phase1-2 §1.3). Exponential backoff (1s→5s cap), context
  cancellation + timeout. `*APIError` parses the offending privilege from a 403;
  `*TaskError` parses it from a failed task exitstatus + log tail.
- **`internal/proxmox` — fenced root-CLI backend (`Privileged`)**: limited to the
  three proven OS-root exceptions only — `CreateGoldenLXC` (keyctl `pct create`),
  `MountUSBByUUID`, `SMART`, `Sensors`; each cites why it can't be the API. Fence
  is structural (Client never shells out, Privileged never makes an HTTP call) and
  asserted in tests.
- **TLS trust**: SHA-256 leaf-cert pinning (the host serves a self-signed cert) or
  a CA file; an explicitly-named `insecure_skip_verify` that is off by default. No
  blanket verification disable.
- **`internal/config`**: JSON config file + `FELHOM_AGENT_*` env overrides; the
  token secret is never logged (`Redacted()`).
- **`internal/log`**: slog setup (text, stderr, configurable level).
- **`cmd/felhom-agent --selftest`**: read-only health report against a live host
  (version/nodes/status/guests/storage); `--selftest=task --vmid N` exercises
  `WaitTask` on a reversible snapshot→rollback→delete op (gated; default selftest
  mutates nothing).
- **Tests**: unit tests with a mock HTTP transport + mock runner (UPID parse,
  `WaitTask` running→OK / failed-403 / timeout / ctx-cancel, 403→privilege error,
  response decoding against shapes captured live from `demo-felhom`, config
  redaction, and the API-vs-root routing fence).

### Notes
- Types are grounded in the spike findings
  (`felhom.eu/documentation/proxmox-platform.md`, `tests/phase{0,1-2,3}-findings.md`)
  and the exact JSON shapes captured live from `demo-felhom` (PVE 9.2.2).
- Verified: `go build/vet/test` green on Go 1.24.4 (build server) and a live
  read-only `--selftest` against the demo host with TLS fingerprint pinning.
- The 16-privilege `FelhomAgent` role + privsep token (role on **both** user and
  token) is provisioned out-of-band; the agent only consumes the token.