spike(slice7): PBS recovery-code escrow round-trip findings (redacted)

Validated wrap->lose->unwrap->restore on a fenced throwaway: the R-recovered key
decrypts a real encrypted snapshot. Pins the PBS-native command sequence (key
change-passphrase --kdf scrypt/none), the pty requirement (F-A1: TTY-only, env var
ignored) + the echo caveat (F-A2: discard pty output so R can't leak), the blob
format/size, and the R format (EFF wordlist, >=128-bit). No K/R/token value recorded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 07:27:35 +02:00
parent 15c4728e2c
commit fe7d0850a5
@@ -0,0 +1,91 @@
# Slice 7 — PBS recovery-code escrow round-trip: Findings
**Host:** `demo-felhom` (192.168.0.162) + PBS on DooPlex (192.168.0.180), PVE 9.2.2 / Debian 13.
`proxmox-backup-client` 4.x.
**Date:** 2026-06-10. **Driver:** SPIKE, to validate the wrap → lose → unwrap → **restore** flow
end-to-end on a **fenced throwaway** before speccing the agent's escrow creation (following the
slice-6 verifyfail discipline). **No real key `K` or datastore was touched.**
> **REDACTED by policy.** No `K` value, no recovery code value, no token secret appears here —
> command *shapes*, blob *size/format*, fingerprint *matching* (not contents), `R` *entropy/format*
> (not the value). Throwaway key `Kt` / throwaway passphrase `Rt` only.
---
## 1. Setup (all throwaway, torn down)
- Throwaway datastore `escrowspike` on DooPlex (`/mnt/5_hdd/pbs-escrowspike`), ACLs for
`felhom@pbs` + `felhom@pbs!n100`. The real `felhom-spike` datastore was never used.
- Throwaway PBS client key `Kt` (`key create --kdf none`), mirroring the live `K` posture: stored
unencrypted so the agent can back up and restore-test unattended.
- The `felhom@pbs!n100` token (real) was used for **auth only** to the throwaway datastore; the
encryption key under test (`Kt`) is throwaway. (Same separation as the verifyfail runbook.)
## 2. The validated command sequence (this is the Phase-B contract)
PBS's key+passphrase path is the wrap mechanism (no bespoke crypto). The blob is a PBS key file
re-keyed from `kdf=none` to `kdf=scrypt` under the recovery code; recovery reverses it.
- **Wrap `K` under `R`** (escrow create) — copy the live key, then re-key the copy:
```
cp <K-keyfile> <blob>
proxmox-backup-client key change-passphrase <blob> --kdf scrypt # prompts: New + Verify
```
- **Unwrap** (recover `K` from the blob with `R`):
```
proxmox-backup-client key change-passphrase <blob> --kdf none # prompts: Encryption Key Password
```
### F-A1 — `change-passphrase` is TTY-only; `PBS_ENCRYPTION_PASSWORD` is NOT consulted
Both directions prompt on the controlling terminal and fail with `unable to change passphrase - no tty`
when run non-interactively; the env var does **not** supply the new or old passphrase. **The agent
must drive it via a pty** (Go: a pty pair; the spike used Python's `pty.fork()`), feeding the passphrase
once per prompt: **wrap → twice** (New + Verify), **unwrap → once** (Encryption Key Password).
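A minimal Python sketch of the pty-driving pattern described above, in the spirit of the spike's `pty.fork()` approach. The helper name `feed_prompts` and the `b": "` prompt marker are assumptions for illustration; the exact prompt text printed by `proxmox-backup-client` was not recorded here, so the marker would need checking against the real prompts.

```python
import os
import pty


def _read_chunk(master_fd):
    """Read from the pty master; return b"" once the child side is closed."""
    try:
        return os.read(master_fd, 1024)
    except OSError:  # Linux reports EIO when the slave end has gone away
        return b""


def feed_prompts(argv, answers, marker=b": "):
    """Run argv on a fresh pty and answer one prompt per entry in answers.

    Child output is read only to detect prompts (marker is an assumed
    prompt suffix) and is never returned, printed, or logged.
    Returns the child's exit code.
    """
    pid, master_fd = pty.fork()
    if pid == 0:  # child: the pty slave is now its controlling terminal
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    try:
        child_alive = True
        for answer in answers:
            seen = b""
            while marker not in seen:  # wait for the next prompt
                chunk = _read_chunk(master_fd)
                if not chunk:
                    child_alive = False
                    break
                seen += chunk
            if not child_alive:
                break
            os.write(master_fd, answer + b"\n")
        while child_alive and _read_chunk(master_fd):
            pass  # drain and drop the remaining output, including any echo
    finally:
        os.close(master_fd)

    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Under these assumptions, a wrap would pass the `change-passphrase <blob> --kdf scrypt` argv with the code supplied twice (`[r, r]`), and an unwrap would pass the `--kdf none` argv with it supplied once (`[r]`). A non-zero return then corresponds to the rejected-passphrase case.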
### F-A2 — the pty echoes the passphrase → the driver MUST discard pty output
The pty's line discipline echoes the fed passphrase back on the master fd. The wrapper must
**discard the pty's output** (never copy it to stdout or a log) and ideally run with echo off, so `R`
cannot leak through captured output. (The spike's redacted runner returns the child output only because
`pty.spawn`'s copy loop requires it, and sends the whole invocation's stdout to `/dev/null`.)
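The echo-off half of this constraint can be illustrated with a bare pty pair and no child process. This is a sketch, not the spike's runner: `open_quiet_pty` and `echoed_bytes` are hypothetical helper names, and the behaviour shown is that of a Linux pty in its default canonical mode.

```python
import os
import pty
import select
import termios


def open_quiet_pty():
    """Open a pty pair with terminal echo disabled on the slave side."""
    master_fd, slave_fd = pty.openpty()
    attrs = termios.tcgetattr(slave_fd)
    # Index 3 is the local-mode flag word; clear both echo variants.
    attrs[3] &= ~(termios.ECHO | termios.ECHONL)
    termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)
    return master_fd, slave_fd


def echoed_bytes(master_fd, timeout=0.2):
    """Return whatever the line discipline echoed back, or b"" if nothing."""
    ready, _, _ = select.select([master_fd], [], [], timeout)
    return os.read(master_fd, 1024) if ready else b""
```

Echo-off is defence in depth only: a child may change the terminal settings itself, so the driver should still drop everything it reads from the master fd.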
### F-A3 — blob format + size
The blob is the standard PBS key JSON (`kdf: scrypt`, scrypt params, `data`, `fingerprint`,
`created`). **~383 bytes**, opaque. The **fingerprint is preserved** across wrap→unwrap (it
identifies the underlying key, not the passphrase) — the spike used it to prove same-key recovery.
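The fingerprint comparison can be sketched as below. The field names `kdf` and `fingerprint` come from the list above; how `kdf` is encoded (null, a plain string, or an object keyed by the KDF name) is an assumption, so the helper accepts all three. The `data` field is never read, so no key material is handled.

```python
import json


def key_summary(path):
    """Return the KDF name and fingerprint of a PBS key file, nothing else."""
    with open(path, "r", encoding="utf-8") as fh:
        doc = json.load(fh)
    kdf = doc.get("kdf")
    if kdf is None:
        kdf_name = "none"
    elif isinstance(kdf, dict):
        # Assumed shape: {"Scrypt": {...params...}}; take the single key.
        kdf_name = next(iter(kdf), "unknown").lower()
    else:
        kdf_name = str(kdf).lower()
    return {"kdf": kdf_name, "fingerprint": doc.get("fingerprint")}


def is_same_key(original, recovered):
    """True when both summaries carry the same, non-empty fingerprint."""
    fingerprint = original["fingerprint"]
    return fingerprint is not None and fingerprint == recovered["fingerprint"]
```

A harness would record `key_summary` of the original key before deleting it, then compare it with the summary of the unwrapped blob. Only the boolean result needs to be logged.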
## 3. Results (round-trips — the actual tests)
- **Crypto round-trip:** `create Kt (kdf=none)` → wrap to `kdf=scrypt` (383 B) → **remove `Kt`** →
unwrap with `Rt` → recovered key `kdf=none` with **fingerprint identical to the original**
(`match=True`). A **wrong passphrase is rejected** (`change-passphrase` exits non-zero; blob stays
`scrypt`).
- **Backup → recover → restore (the load-bearing test):** wrote a canary file → **encrypted backup**
to `escrowspike` with `Kt` (`--crypt-mode encrypt`) → wrap `Kt` under `Rt` → **remove `Kt`** →
unwrap with `Rt` → `proxmox-backup-client restore <snap> data.pxar <out> --keyfile <recovered>` →
**the canary content came back byte-identical** (`canary-match=True`). The R-recovered key
**decrypts a real encrypted snapshot** — slice-10 recovery is pre-validated a slice early.
- Gotcha (test-harness only, not the mechanism): a snapshot can only be restored with the *same* key it
was made with, so when stale snapshots exist the harness must select the newest one by `backup-time`.
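The snapshot-selection gotcha can be sketched as follows. It assumes the harness has already parsed a JSON snapshot listing into dicts with `backup-type`, `backup-id` and an epoch-seconds `backup-time`; that record shape and the helper names are assumptions, and only `backup-time` is named in the finding itself.

```python
from datetime import datetime, timezone


def newest_snapshot(snapshots, backup_type, backup_id):
    """Pick the most recent snapshot of one backup group by backup-time."""
    matching = [
        snap
        for snap in snapshots
        if snap.get("backup-type") == backup_type
        and snap.get("backup-id") == backup_id
    ]
    if not matching:
        raise LookupError("no snapshot found for %s/%s" % (backup_type, backup_id))
    return max(matching, key=lambda snap: snap["backup-time"])


def snapshot_path(snapshot):
    """Format a snapshot as type/id/UTC-timestamp for the restore command."""
    stamp = datetime.fromtimestamp(snapshot["backup-time"], tz=timezone.utc)
    return "{}/{}/{}".format(
        snapshot["backup-type"],
        snapshot["backup-id"],
        stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
    )
```

Selecting explicitly by time, rather than taking the first listed entry, keeps a leftover snapshot made with a different key from being paired with the recovered one.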
## 4. `R` (recovery code) — chosen entropy/format (implemented in Phase B)
- Generated with `crypto/rand`; **≥128 bits**.
- **Word-list form** so a non-technical household can copy it to and from paper: **EFF large wordlist
(7776 words, 12.92 bits/word), 10 words → ~129 bits**, space/hyphen separated. (Raw base32 invites
typos; the diceware form is a well-established convention for human-entered passphrases.)
- Surfaced to the customer **exactly once** (selftest stdout on the demo; enrollment UX later).
**Never** logged/persisted/committed.
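The entropy arithmetic and the generation step can be sketched in Python, with the `secrets` module standing in for Go's `crypto/rand`. The function names and the `min_bits` guard are illustrative assumptions, and the caller is expected to load the EFF large wordlist itself.

```python
import math
import secrets


def entropy_bits(wordlist_size, word_count):
    """Entropy of word_count words drawn uniformly from the wordlist."""
    return word_count * math.log2(wordlist_size)


def generate_recovery_code(wordlist, word_count=10, min_bits=128, separator=" "):
    """Draw a recovery code from a CSPRNG, refusing weak parameters.

    The returned value must be shown once and never logged or persisted.
    """
    if len(set(wordlist)) != len(wordlist):
        raise ValueError("wordlist contains duplicates; entropy would be overstated")
    if entropy_bits(len(wordlist), word_count) < min_bits:
        raise ValueError("too few words for the requested entropy")
    return separator.join(secrets.choice(wordlist) for _ in range(word_count))
```

With 7776 words, ten words give 10 × log2(7776) ≈ 129.25 bits, while nine give about 116.3 and fall below the 128-bit floor. The sketch defaults to a space separator because the EFF large list contains a few hyphenated entries (for example `t-shirt`), which would make a hyphen-joined code ambiguous to split.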
## 5. Teardown
`escrowspike` datastore removed (`--destroy-data true`) + ACLs deleted + dir removed;
`felhom-spike` (real) **untouched**; all throwaway keys/blobs/scripts on the demo host removed. No
`R`/`K`/token value was written anywhere.
## 6. Verdict
**READY to implement Phase B** (agent escrow creation) and **Phase C** (hub opaque storage). The
PBS-native wrap is validated, recovery is proven to restore real encrypted data, and the two
implementation constraints are pinned: drive `change-passphrase` via a pty (F-A1), and discard the
pty output so `R` can't leak (F-A2). Escrow **consumption / restore-mode serving** stays slice 10
(but is now de-risked by this round-trip).