From fe7d0850a5c4c54cc57347709ec26ae4c3f22174 Mon Sep 17 00:00:00 2001
From: kisfenyo
Date: Wed, 10 Jun 2026 07:27:35 +0200
Subject: [PATCH] spike(slice7): PBS recovery-code escrow round-trip findings (redacted)

Validated wrap->lose->unwrap->restore on a fenced throwaway: the R-recovered
key decrypts a real encrypted snapshot. Pins the PBS-native command sequence
(key change-passphrase --kdf scrypt/none), the pty requirement (F-A1: TTY-only,
env var ignored) + the echo caveat (F-A2: discard pty output so R can't leak),
the blob format/size, and the R format (EFF wordlist, >=128-bit). No K/R/token
value recorded.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../tests/slice7-escrow-spike-findings.md | 91 +++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 documentation/tests/slice7-escrow-spike-findings.md

diff --git a/documentation/tests/slice7-escrow-spike-findings.md b/documentation/tests/slice7-escrow-spike-findings.md
new file mode 100644
index 0000000..ef39577
--- /dev/null
+++ b/documentation/tests/slice7-escrow-spike-findings.md
@@ -0,0 +1,91 @@
+# Slice 7 — PBS recovery-code escrow round-trip: Findings
+
+**Host:** `demo-felhom` (192.168.0.162) + PBS on DooPlex (192.168.0.180), PVE 9.2.2 / Debian 13.
+`proxmox-backup-client` 4.x.
+**Date:** 2026-06-10. **Driver:** SPIKE — validate the wrap → lose → unwrap → **restore** flow
+end-to-end on a **fenced throwaway** before specing the agent's escrow creation (slice-6 verifyfail
+discipline). **No real key `K` or datastore was touched.**
+
+> **REDACTED by policy.** No `K` value, no recovery code value, no token secret appears here —
+> command *shapes*, blob *size/format*, fingerprint *matching* (not contents), `R` *entropy/format*
+> (not the value). Throwaway key `Kt` / throwaway passphrase `Rt` only.
+
+---
+
+## 1. Setup (all throwaway, torn down)
+
+- Throwaway datastore `escrowspike` on DooPlex (`/mnt/5_hdd/pbs-escrowspike`), ACLs for
+  `felhom@pbs` + `felhom@pbs!n100`. The real `felhom-spike` datastore was never used.
+- Throwaway PBS client key `Kt` (`key create --kdf none` — mirrors the live `K` posture: stored
+  unencrypted so the agent backs up + restore-tests unattended).
+- The `felhom@pbs!n100` token (real) was used for **auth only** to the throwaway datastore; the
+  encryption key under test (`Kt`) is throwaway. (Same separation as the verifyfail runbook.)
+
+## 2. The validated command sequence (this is the Phase-B contract)
+
+PBS's key+passphrase path is the wrap mechanism (no bespoke crypto). The blob is a PBS key file
+re-keyed from `kdf=none` to `kdf=scrypt` under the recovery code; recovery reverses it.
+
+- **Wrap `K` under `R`** (escrow create) — copy the live key, then re-key the copy:
+  ```
+  cp
+  proxmox-backup-client key change-passphrase --kdf scrypt  # prompts: New + Verify
+  ```
+- **Unwrap** (recover `K` from the blob with `R`):
+  ```
+  proxmox-backup-client key change-passphrase --kdf none  # prompts: Encryption Key Password
+  ```
+
+### F-A1 — `change-passphrase` is TTY-only; `PBS_ENCRYPTION_PASSWORD` is NOT consulted
+Both directions prompt on the controlling terminal and fail `unable to change passphrase - no tty`
+when run non-interactively; the env var does **not** supply the new/old passphrase. **The agent
+must drive it via a pty** (Go: a pty pair; the spike used `pty.fork()`), feeding the passphrase
+once per prompt: **wrap → twice** (New + Verify), **unwrap → once** (Encryption Key Password).
+
+### F-A2 — the pty echoes the passphrase → the driver MUST discard pty output
+The pty's line discipline echoes the fed passphrase back on the master fd. The wrapper must
+**discard the pty's output** (never copy it to stdout/log) and ideally run echo-off, so `R` cannot
+leak through captured output. (The spike's redacted runner returns the child output only to satisfy
+`pty.spawn`'s progress loop and sends the whole invocation's stdout to `/dev/null`.)
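
The two constraints in F-A1 and F-A2 can be sketched as a small pty driver. This is a minimal illustration in Python (the spike's language), not the spike's runner and not the agent's Go implementation. The function name `drive_prompts`, the `prompt_marker` heuristic (feed an answer once the output contains `": "`), and the timeout are assumptions made for the example; the real prompt text should be checked against the installed `proxmox-backup-client`.

```python
import os
import pty
import select
import signal
import termios


def drive_prompts(argv, answers, prompt_marker=b": ", timeout=30.0):
    """Run argv on a fresh pty, answer each prompt once, drop all pty output.

    Returns the child's exit code. prompt_marker is a hypothetical stand-in
    for whatever the real prompts end with.
    """
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: the pty slave is now its controlling terminal.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    pending = [a.encode() + b"\n" for a in answers]
    tail = b""  # short window used only to spot a prompt; never returned or logged
    try:
        # Best effort: turn echo off so a fed passphrase is not reflected back.
        attrs = termios.tcgetattr(master_fd)
        attrs[3] &= ~termios.ECHO
        termios.tcsetattr(master_fd, termios.TCSANOW, attrs)

        while True:
            ready, _, _ = select.select([master_fd], [], [], timeout)
            if not ready:
                os.kill(pid, signal.SIGKILL)
                raise TimeoutError("child produced no output before the timeout")
            try:
                chunk = os.read(master_fd, 1024)
            except OSError:
                break  # Linux reports EIO once the child has closed the pty
            if not chunk:
                break
            # The chunk is deliberately not copied to stdout or any log.
            tail = (tail + chunk)[-64:]
            if pending and prompt_marker in tail:
                os.write(master_fd, pending.pop(0))
                tail = b""
    finally:
        os.close(master_fd)
        _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

With this shape, the wrap direction would pass the recovery code twice in `answers` (New + Verify) and the unwrap direction once. A non-zero return corresponds to the rejected-passphrase case. Waiting for a prompt before each write is a design choice: password prompts commonly flush pending terminal input when they switch modes, so an answer written too early may be dropped.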
+
+### F-A3 — blob format + size
+The blob is the standard PBS key JSON (`kdf: scrypt`, scrypt params, `data`, `fingerprint`,
+`created`). **~383 bytes**, opaque. The **fingerprint is preserved** across wrap→unwrap (it
+identifies the underlying key, not the passphrase) — the spike used it to prove same-key recovery.
+
+## 3. Results (round-trips — the actual tests)
+
+- **Crypto round-trip:** `create Kt (kdf=none)` → wrap to `kdf=scrypt` (383 B) → **remove `Kt`** →
+  unwrap with `Rt` → recovered key `kdf=none` with **fingerprint identical to the original**
+  (`match=True`). A **wrong passphrase is rejected** (`change-passphrase` exits non-zero; blob stays
+  `scrypt`).
+- **Backup → recover → restore (the load-bearing test):** wrote a canary file → **encrypted backup**
+  to `escrowspike` with `Kt` (`--crypt-mode encrypt`) → wrap `Kt` under `Rt` → **remove `Kt`** →
+  unwrap with `Rt` → `proxmox-backup-client restore data.pxar --keyfile ` →
+  **the canary content came back byte-identical** (`canary-match=True`). The R-recovered key
+  **decrypts a real encrypted snapshot** — slice-10 recovery is pre-validated a slice early.
+- Gotcha (test-harness only, not the mechanism): a snapshot must be restored with the *same* key it
+  was made with — selecting the newest snapshot by `backup-time` matters when stale snapshots exist.
+
+## 4. `R` (recovery code) — chosen entropy/format (implemented in Phase B)
+
+- Generated with `crypto/rand`; **≥128 bits**.
+- **Word-list form** for off-paper transcription by a non-technical household: **EFF large wordlist
+  (7776 words, 12.92 bits/word), 10 words → ~129 bits**, space/hyphen separated. (Raw base32 invites
+  typos; the diceware form is the standard for human-entered passphrases.)
+- Surfaced to the customer **exactly once** (selftest stdout on the demo; enrollment UX later).
+  **Never** logged/persisted/committed.
+
+## 5. Teardown
+
+`escrowspike` datastore removed (`--destroy-data true`) + ACLs deleted + dir removed;
+`felhom-spike` (real) **untouched**; all throwaway keys/blobs/scripts on the demo host removed. No
+`R`/`K`/token value was written anywhere.
+
+## 6. Verdict
+
+**READY to implement Phase B** (agent escrow creation) and **Phase C** (hub opaque storage). The
+PBS-native wrap is validated, recovery is proven to restore real encrypted data, and the two
+implementation constraints are pinned: drive `change-passphrase` via a pty (F-A1), and discard the
+pty output so `R` can't leak (F-A2). Escrow **consumption / restore-mode serving** stays slice 10
+(but is now de-risked by this round-trip).
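
The entropy arithmetic behind the section 4 format (7776 words, 10 words, about 129 bits) can be checked with a short sketch. It is written in Python for illustration only; the findings name Go's `crypto/rand` for the real implementation, and `secrets` is used here as the comparable OS-backed source. The function names and the caller-supplied `wordlist` argument are assumptions for the example. The EFF list itself is not embedded.

```python
import math
import secrets


def bits_per_word(wordlist_size):
    """Entropy contributed by one uniformly chosen word."""
    return math.log2(wordlist_size)


def words_needed(wordlist_size, min_bits=128):
    """Smallest word count whose total entropy reaches min_bits."""
    return math.ceil(min_bits / bits_per_word(wordlist_size))


def generate_recovery_code(wordlist, min_bits=128, sep=" "):
    """Pick words independently and uniformly from a caller-supplied list.

    The wordlist must contain no duplicates, otherwise the entropy estimate
    above overstates the real value.
    """
    if len(set(wordlist)) != len(wordlist):
        raise ValueError("wordlist contains duplicates")
    count = words_needed(len(wordlist), min_bits)
    return sep.join(secrets.choice(wordlist) for _ in range(count))
```

For a 7776-word list, `words_needed` gives 10 and `10 * bits_per_word(7776)` is roughly 129.2 bits, matching the figures in section 4. The sketch defaults to a space separator because a hyphen separator becomes ambiguous if any list entry itself contains a hyphen; that property is worth verifying against the actual wordlist file before fixing the format. In line with the "never logged" rule, the caller should hand the returned value to the one-time display path and nothing else.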