felhom.eu/documentation/tests/slice7-escrow-spike-findings.md
admin fe7d0850a5 spike(slice7): PBS recovery-code escrow round-trip findings (redacted)
Validated wrap->lose->unwrap->restore on a fenced throwaway: the R-recovered key
decrypts a real encrypted snapshot. Pins the PBS-native command sequence (key
change-passphrase --kdf scrypt/none), the pty requirement (F-A1: TTY-only, env var
ignored) + the echo caveat (F-A2: discard pty output so R can't leak), the blob
format/size, and the R format (EFF wordlist, >=128-bit). No K/R/token value recorded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 07:27:35 +02:00


Slice 7 — PBS recovery-code escrow round-trip: Findings

Host: demo-felhom (192.168.0.162) + PBS on DooPlex (192.168.0.180), PVE 9.2.2 / Debian 13. proxmox-backup-client 4.x. Date: 2026-06-10. Driver: SPIKE — validate the wrap → lose → unwrap → restore flow end-to-end on a fenced throwaway before specing the agent's escrow creation (slice-6 verifyfail discipline). No real key K or datastore was touched.

REDACTED by policy. No K value, no recovery code value, no token secret appears here — command shapes, blob size/format, fingerprint matching (not contents), R entropy/format (not the value). Throwaway key Kt / throwaway passphrase Rt only.


1. Setup (all throwaway, torn down)

  • Throwaway datastore escrowspike on DooPlex (/mnt/5_hdd/pbs-escrowspike), ACLs for felhom@pbs + felhom@pbs!n100. The real felhom-spike datastore was never used.
  • Throwaway PBS client key Kt (key create --kdf none — mirrors the live K posture: stored unencrypted so the agent backs up + restore-tests unattended).
  • The felhom@pbs!n100 token (real) was used for auth only to the throwaway datastore; the encryption key under test (Kt) is throwaway. (Same separation as the verifyfail runbook.)

2. The validated command sequence (this is the Phase-B contract)

PBS's key+passphrase path is the wrap mechanism (no bespoke crypto). The blob is a PBS key file re-keyed from kdf=none to kdf=scrypt under the recovery code; recovery reverses it.

  • Wrap K under R (escrow create) — copy the live key, then re-key the copy:
    cp <K-keyfile> <blob>
    proxmox-backup-client key change-passphrase <blob> --kdf scrypt      # prompts: New + Verify
    
  • Unwrap (recover K from the blob with R):
    proxmox-backup-client key change-passphrase <blob> --kdf none        # prompts: Encryption Key Password
    

F-A1 — change-passphrase is TTY-only; PBS_ENCRYPTION_PASSWORD is NOT consulted

Both directions prompt on the controlling terminal and fail with "unable to change passphrase - no tty" when run non-interactively; the env var does not supply the new/old passphrase. The agent must drive it via a pty (Go: a pty pair; the spike used pty.fork()), feeding the passphrase once per prompt: wrap → twice (New + Verify), unwrap → once (Encryption Key Password).

F-A2 — the pty echoes the passphrase → the driver MUST discard pty output

The pty's line discipline echoes the fed passphrase back on the master fd. The wrapper must discard the pty's output (never copy it to stdout/log) and ideally run echo-off, so R cannot leak through captured output. (The spike's redacted runner returns the child output only to satisfy pty.spawn's progress loop and sends the whole invocation's stdout to /dev/null.)
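The two constraints F-A1 and F-A2 can be sketched together as a small pty driver. This is a minimal illustration in Python, not the spike's redacted runner: the function name drive_prompts, the prompt substrings passed in, and the absence of a timeout are all assumptions made for the example. It answers each expected prompt once, switches echo off on the pty, and never returns or logs the child's output.

```python
import os
import pty
import termios


def drive_prompts(argv, secret, prompts):
    """Run argv on a pty, answer each expected prompt with secret, drop all output.

    Hypothetical helper: `prompts` is a list of byte substrings to wait for,
    in order. Returns the child's exit code. No timeout handling (sketch only).
    """
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: the pty slave is now stdin/stdout/stderr and the controlling tty.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)
    try:
        # F-A2: switch echo off so the fed secret is not reflected on the master fd.
        attrs = termios.tcgetattr(master_fd)
        attrs[3] &= ~termios.ECHO
        termios.tcsetattr(master_fd, termios.TCSANOW, attrs)

        pending = list(prompts)
        seen = b""
        while True:
            try:
                chunk = os.read(master_fd, 1024)
            except OSError:
                break  # Linux reports EIO once the child has closed the slave side
            if not chunk:
                break
            if pending:
                seen += chunk  # held in memory only to spot the next prompt
                if pending[0] in seen:
                    pending.pop(0)
                    seen = b""
                    os.write(master_fd, secret + b"\n")  # F-A1: one answer per prompt
            # Everything read here is dropped: never copied to stdout or a log.
    finally:
        os.close(master_fd)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Under these assumptions a wrap would pass two prompt substrings (New, Verify) and an unwrap one (Encryption Key Password); the exact prompt text should be checked against the installed client before relying on the match.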

F-A3 — blob format + size

The blob is the standard PBS key JSON (kdf: scrypt, scrypt params, data, fingerprint, created). ~383 bytes, opaque. The fingerprint is preserved across wrap→unwrap (it identifies the underlying key, not the passphrase) — the spike used it to prove same-key recovery.
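The fingerprint check described above can be sketched as follows. This is a hedged example: it assumes the key file is a JSON object with a top-level "fingerprint" string and a "kdf" field, and the helper names are hypothetical. It compares fingerprints only and never touches the "data" field.

```python
import json


def key_fingerprint(key_json_text):
    """Return the fingerprint recorded in a key-file JSON document.

    Assumption: the fingerprint is a top-level string field named "fingerprint".
    """
    doc = json.loads(key_json_text)
    fingerprint = doc.get("fingerprint")
    if not isinstance(fingerprint, str) or not fingerprint:
        raise ValueError("key file has no fingerprint field")
    return fingerprint


def same_underlying_key(original_text, recovered_text):
    """True when both key files identify the same key, whatever their kdf is."""
    return key_fingerprint(original_text) == key_fingerprint(recovered_text)
```

Because the fingerprint identifies the key and not the passphrase, the comparison holds across the kdf=none and kdf=scrypt forms of the same key.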

3. Results (round-trips — the actual tests)

  • Crypto round-trip: create Kt (kdf=none) → wrap to kdf=scrypt (383 B) → remove Kt → unwrap with Rt → recovered key kdf=none with fingerprint identical to the original (match=True). A wrong passphrase is rejected (change-passphrase exits non-zero; blob stays scrypt).
  • Backup → recover → restore (the load-bearing test): wrote a canary file → encrypted backup to escrowspike with Kt (--crypt-mode encrypt) → wrap Kt under Rt → remove Kt → unwrap with Rt → proxmox-backup-client restore <snap> data.pxar <out> --keyfile <recovered> → the canary content came back byte-identical (canary-match=True). The R-recovered key decrypts a real encrypted snapshot, so slice-10 recovery is pre-validated a slice early.
  • Gotcha (test-harness only, not the mechanism): a snapshot must be restored with the same key it was made with — selecting the newest snapshot by backup-time matters when stale snapshots exist.
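The snapshot-selection gotcha can be illustrated with a short sketch. It assumes the harness has a JSON snapshot listing whose entries carry "backup-type", "backup-id" and an epoch-seconds "backup-time", and that a snapshot is addressed as type/id/UTC-timestamp; both the field names and the helper name newest_snapshot are assumptions for this example.

```python
from datetime import datetime, timezone


def newest_snapshot(entries, backup_type, backup_id):
    """Pick the most recent snapshot of one backup group by backup-time.

    Assumption: each entry is a dict with "backup-type", "backup-id" and an
    integer "backup-time" in epoch seconds.
    """
    matching = [
        e for e in entries
        if e["backup-type"] == backup_type and e["backup-id"] == backup_id
    ]
    if not matching:
        raise LookupError(f"no snapshots for {backup_type}/{backup_id}")
    newest = max(matching, key=lambda e: e["backup-time"])
    stamp = datetime.fromtimestamp(newest["backup-time"], tz=timezone.utc)
    return f"{backup_type}/{backup_id}/{stamp.strftime('%Y-%m-%dT%H:%M:%SZ')}"
```

Selecting by backup-time instead of list order keeps a stale snapshot, made with an earlier throwaway key, from being restored with the wrong key.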

4. R (recovery code) — chosen entropy/format (implemented in Phase B)

  • Generated with crypto/rand; ≥128 bits.
  • Word-list form for off-paper transcription by a non-technical household: EFF large wordlist (7776 words, 12.92 bits/word), 10 words → ~129 bits, space/hyphen separated. (Raw base32 invites typos; the diceware form is the standard for human-entered passphrases.)
  • Surfaced to the customer exactly once (selftest stdout on the demo; enrollment UX later). Never logged/persisted/committed.
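The entropy arithmetic and word selection above can be sketched as follows. This is an illustration only: Phase B specifies crypto/rand, for which Python's secrets module stands in here, and the function name and the way the wordlist is supplied are assumptions. The entropy is words × log2(list size), so 10 words from a 7776-word list give about 129.2 bits.

```python
import math
import secrets


def generate_recovery_code(wordlist, words=10, sep="-", min_bits=128):
    """Draw `words` words uniformly at random and report the resulting entropy.

    Hypothetical helper: `wordlist` is the caller-supplied list of unique
    words (for example the 7776-word EFF large list, loaded elsewhere).
    """
    if len(set(wordlist)) != len(wordlist):
        raise ValueError("wordlist contains duplicates; entropy would be overstated")
    bits = words * math.log2(len(wordlist))
    if bits < min_bits:
        raise ValueError(f"only {bits:.1f} bits; need at least {min_bits}")
    # secrets.choice draws from the OS CSPRNG, the analogue of crypto/rand.
    code = sep.join(secrets.choice(wordlist) for _ in range(words))
    return code, bits
```

The returned code is meant to be shown once and then dropped; a caller following the policy above would not log or persist it.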

5. Teardown

escrowspike datastore removed (--destroy-data true) + ACLs deleted + dir removed; felhom-spike (real) untouched; all throwaway keys/blobs/scripts on the demo host removed. No R/K/token value was written anywhere.

6. Verdict

READY to implement Phase B (agent escrow creation) and Phase C (hub opaque storage). The PBS-native wrap is validated, recovery is proven to restore real encrypted data, and the two implementation constraints are pinned: drive change-passphrase via a pty (F-A1), and discard the pty output so R can't leak (F-A2). Escrow consumption / restore-mode serving stays slice 10 (but is now de-risked by this round-trip).