Validated escrow consumption end-to-end on a genuinely key-less box against the real felhom-spike datastore: recover K from (blob,R) via the real escrow.Unwrap, restore REAL data (spike-lxc rootfs, 2.5G) with the recovered key only, wrong-R fails closed (no plausible-but-wrong key), live K byte-unchanged. Redacted (no R/K/secret). GO to spec 10C + build 10D. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Slice 10 (10C core) — escrow CONSUMPTION: recover K + restore REAL data on a key-less box — Findings
Host: box1 = demo-felhom (192.168.0.162); PBS datastore felhom-spike on DooPlex (192.168.0.180). PVE 9.2.2 / Debian 13, proxmox-backup-client 4.2.0.
Date: 2026-06-10. Driver: SPIKE to validate escrow consumption (recover K from (blob, R) on a fresh, key-less box) and the real-data restore that proves it, before speccing the hub Down-channel (10A) and DR orchestration (10D). This is the half the creation self-verify does not cover: a genuinely key-less box, real-data recovery (not just a fingerprint), and a load-bearing R.
REDACTED by policy. No K value, no recovery code R value, and no token secret appears here: only command shapes, blob size/format, fingerprint match (the truncated PBS-printed prefix, never key bytes), and R structure (10 EFF words / ~129 bits, not the value). sha256(K-file) was recorded out-of-band for the byte-unchanged proof and is not pasted here. The live K was operated on only as a copy and is byte-unchanged. The recovered K was shredded at teardown.
The spike drove the real agent code (internal/escrow Create / Unwrap / KeyFingerprint) via a throwaway harness (cmd/escrow-spike, not committed, removed at teardown) — so what passed here is exactly the code path 10C will wrap, not a re-implementation.
0. Setup (real datastore + real key + real encrypted backup)
- Real K: the live PBS client encryption key for storage felhom-pbs at /etc/pve/priv/storage/felhom-pbs.enc (kdf=none: stored unencrypted so the agent backs up + restore-tests unattended). Fingerprint 01:36:e9:… (the PBS-printed prefix).
- Real encrypted backup: ct/9001/2026-06-09T15:01:37Z in felhom-spike: the spike-lxc container, root.pxar 2.5 GB, crypt-mode=encrypt, manifest key fingerprint 01:36:e9:…, verify state ok. A genuinely encrypted real-guest backup (from the slice-6/8B path).
- box1 has no default client key (/root/.config/proxmox-backup/encryption-key.json absent), so nothing could silently mask the key-less test.
1. The validated consumption sequence (the 10C contract)
Consumption reverses creation (slice 7): the blob is the PBS key file re-keyed kdf=none → scrypt under R; recovery re-keys scrypt → none under R, yielding the raw K.
- Unwrap (escrow.Unwrap(blob, R)): proxmox-backup-client key change-passphrase <blob> --kdf none (one prompt, fed R via a pty). The in-place result is the recovered raw key file.
- Restore with the recovered key only: proxmox-backup-client restore <snap> <archive> <dest> --keyfile <recovered.key> --repository <repo> (PBS auth via PBS_PASSWORD/PBS_FINGERPRINT).
The pty driving (F-A1/F-A2 from slice 7) carried over unchanged and works headless (no controlling TTY) — the harness ran over a non-interactive SSH session exactly as the daemon does.
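The two command shapes above can be sketched as argv builders. This is a minimal illustration, not the agent's code: the type RestoreSpec and the functions UnwrapArgs and RestoreArgs are hypothetical names, and only the subcommands and flags quoted in this section are used. Answering the single passphrase prompt with R through a pty is out of scope here.

```go
package main

import "errors"

// RestoreSpec mirrors the placeholders of the restore command shape.
// Hypothetical type: the field names are illustrative only.
type RestoreSpec struct {
	Snapshot   string // <snap>, e.g. "ct/9001/2026-06-09T15:01:37Z"
	Archive    string // <archive>, e.g. "root.pxar"
	Dest       string // <dest>, the restore target
	KeyFile    string // <recovered.key>, the unwrapped kdf=none key file
	Repository string // <repo>, <user>@<realm>!<token>@<server>:<datastore>
}

// UnwrapArgs builds the argv that re-keys the blob in place from scrypt to none.
// R is never part of argv; it is fed to the prompt separately.
func UnwrapArgs(blobPath string) ([]string, error) {
	if blobPath == "" {
		return nil, errors.New("unwrap: empty blob path")
	}
	return []string{
		"proxmox-backup-client", "key", "change-passphrase", blobPath,
		"--kdf", "none",
	}, nil
}

// RestoreArgs builds the argv for a restore that uses only the recovered key.
// PBS auth is expected in the environment (PBS_PASSWORD / PBS_FINGERPRINT).
func RestoreArgs(s RestoreSpec) ([]string, error) {
	if s.Snapshot == "" || s.Archive == "" || s.Dest == "" ||
		s.KeyFile == "" || s.Repository == "" {
		return nil, errors.New("restore: incomplete spec")
	}
	return []string{
		"proxmox-backup-client", "restore", s.Snapshot, s.Archive, s.Dest,
		"--keyfile", s.KeyFile,
		"--repository", s.Repository,
	}, nil
}
```

Keeping R and the token secret out of argv means neither shows up in a process listing; the builders only ever see paths and non-secret identifiers.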
2. Phase results
S0 — pre-flight (create from the REAL K; K untouched) — PASS
escrow.Create on box1 from the live K: self-verify PASSED (Create unwraps a copy with R and matches the fingerprint), result fingerprint 01:36:e9:… (== live K), blob 383 bytes kdf=scrypt, R = 10 words / ~129 bits. sha256(K-file) identical before and after Create → the live key was operated on only as a copy. R was written only to a 0600 file, never to stdout/log.
S1 — genuinely key-less fresh box; K is absent — PASS
A fenced fresh box = an isolated HOME/XDG_CONFIG_HOME under /root/escrow-spike/freshbox with no encryption key present. A restore of ct/9001 there with no key failed cleanly:
Error: missing key - manifest was created with key 01:36:e9:fe:e1:ee:3d:7a
No output file was produced. This makes S3 meaningful: without K, the real data is unrecoverable — so any later success is attributable to the recovered key, not a pre-existing one. The fresh box was then handed only the blob + R (nothing else from box1).
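Because the key-absent failure names the fingerprint it needs, a caller can pull that value out of the client output. The sketch below assumes the message wording is exactly what this spike observed on proxmox-backup-client 4.2.0; other versions may word it differently. RequiredKeyFingerprint is a hypothetical helper name.

```go
package main

import "regexp"

// missingKeyRe matches the message observed in S1 and captures the
// colon-separated hex fingerprint prefix that follows it.
var missingKeyRe = regexp.MustCompile(
	`missing key - manifest was created with key ((?:[0-9a-f]{2}:)*[0-9a-f]{2})`)

// RequiredKeyFingerprint returns the fingerprint prefix named in a
// "missing key" failure, or false if the output is some other error.
func RequiredKeyFingerprint(clientOutput string) (string, bool) {
	m := missingKeyRe.FindStringSubmatch(clientOutput)
	if m == nil {
		return "", false
	}
	return m[1], true
}
```

A restore step could use the returned prefix to tell the operator which key is required instead of reporting a generic restore failure.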
S2 — consume: recover K from (blob, R) — PASS
escrow.Unwrap(freshbox/blob, freshbox/R) → OK. The blob went kdf=scrypt → kdf=none (raw key), and KeyFingerprint(recovered) = 01:36:e9:… — a bit-for-bit match to the live K fingerprint. K genuinely came from R (the box had none).
S3 — LOAD-BEARING: restore REAL data with the recovered K only — PASS
Using only the recovered key on the key-less box:
- Config blob (pct.conf.blob) decrypted to the real guest config: hostname: spike-lxc, ostype: debian, rootfs: local-lvm:vm-9001-disk-0,size=10G, cores: 2, memory: 2048.
- Full root.pxar (2.5 GB encrypted) restored in ~19 s, exit 0. The recovered rootfs is intact: /etc/hostname = spike-lxc, /etc/os-release = Debian GNU/Linux 13 (trixie), 143 /etc entries, /bin/bash present + executable, 2.5 G on disk.
This, not the fingerprint, is the proof that the recovered K decrypts real customer data end-to-end. It contrasts directly with the S1 key-absent failure: same snapshot, same box, and the only difference is the recovered key.
S4 — negative: R is load-bearing — PASS
escrow.Unwrap(blob, WRONG-R) failed cleanly: unwrap: FAILED: escrow: unwrap: exit status 255 (nonzero exit). The blob was left unchanged (kdf still scrypt) — no plausible-but-wrong raw key was emitted. Using the still-wrapped blob as a restore keyfile failed too (Error: no password input mechanism available, no output). A wrong R yields nothing usable, never silent garbage.
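The "kdf still scrypt" observation can also be checked in code after a failed unwrap. The sketch below rests on an assumption not stated in this report: that the PBS key file is JSON with a top-level kdf field that is null for an unprotected key and a single-key object such as {"Scrypt": {...}} for a passphrase-protected one. KDFName is a hypothetical helper, and it reads only the kdf field, never the key data.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// KDFName reports "none" for an unprotected key file, or the lowercased KDF
// variant name (for example "scrypt") for a protected one.
// ASSUMPTION: the JSON layout described in the lead-in; verify against a real
// key file before relying on this.
func KDFName(keyFile []byte) (string, error) {
	var doc struct {
		KDF json.RawMessage `json:"kdf"`
	}
	if err := json.Unmarshal(keyFile, &doc); err != nil {
		return "", err
	}
	// A missing field or an explicit null both mean no passphrase protection.
	if len(doc.KDF) == 0 || string(doc.KDF) == "null" {
		return "none", nil
	}
	var variant map[string]json.RawMessage
	if err := json.Unmarshal(doc.KDF, &variant); err != nil {
		return "", err
	}
	if len(variant) != 1 {
		return "", errors.New("unexpected kdf shape")
	}
	for name := range variant {
		return strings.ToLower(name), nil
	}
	return "", errors.New("unexpected kdf shape")
}
```

After a wrong-R attempt, a result of "scrypt" would confirm the blob is still wrapped; "none" would mean a raw key was emitted and should be treated as a failure of the fail-closed property.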
3. Findings / gotchas (feeds 10C/10D specs)
- F-C1: the "missing key" failure is explicit and keyed. A key-less restore fails with "missing key - manifest was created with key <fp-prefix>" and produces no partial output. 10D's restore step can detect a missing/wrong key deterministically (no silent empty restore) and surface which key fingerprint is required.
- F-C2: the recovered key is the raw kdf=none key, ready to use as-is. Unwrap leaves a normal PBS key file; --keyfile <recovered> restores immediately. The fresh box needs no key-install ceremony beyond placing the unwrapped file where the restore reads it (--keyfile, or $XDG_CONFIG_HOME/proxmox-backup/encryption-key.json for the default path). This is the only "install" step.
- F-C3: wrong R is fail-closed at the KDF layer. scrypt passphrase failure aborts change-passphrase with a nonzero exit and leaves the blob untouched; there is no code path that emits a wrong-but-structurally-valid key. 10C does not need an extra "did we get the right key?" guard to avoid garbage, but it SHOULD still fingerprint-check (F-C4) to fail fast and loudly.
- F-C4: fingerprint-after-unwrap is the cheap correctness gate. KeyFingerprint(recovered) vs the expected fingerprint (which the hub knows: it's in storage.cfg/the manifest) confirms the right key before a multi-GB restore. 10C should do this immediately after Unwrap.
- F-C5: pty driving is headless-safe. The slice-7 pty mechanism worked over a non-interactive SSH session with no controlling TTY, same as the daemon. No regression; nothing new needed for consumption.
- F-C6: K is never mutated. Create copies before wrapping; sha256(K-file) was identical before, mid-spike, and after. The consumption path only ever reads/writes the fresh-box copy. Safe to run against a live box's escrow without risk to its running key.
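The F-C4 gate is a string comparison done before any restore starts. A minimal sketch, assuming both fingerprints arrive in the same colon-separated hex form (both full, or both the truncated PBS-printed prefix); CheckFingerprint and normalizeFP are hypothetical names, not part of internal/escrow.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// normalizeFP lowercases a fingerprint and trims surrounding whitespace so
// that formatting differences do not cause a false mismatch.
func normalizeFP(fp string) string {
	return strings.ToLower(strings.TrimSpace(fp))
}

// CheckFingerprint returns nil only when the recovered key's fingerprint
// equals the expected one. An empty value on either side is an error, so a
// missing hub-served fingerprint can never pass the gate by accident.
func CheckFingerprint(recovered, expected string) error {
	r, e := normalizeFP(recovered), normalizeFP(expected)
	if r == "" || e == "" {
		return errors.New("fingerprint gate: empty fingerprint")
	}
	if r != e {
		return fmt.Errorf("fingerprint gate: recovered %s, expected %s", r, e)
	}
	return nil
}
```

The sketch deliberately uses exact equality rather than prefix matching; if the hub serves a full fingerprint while the client prints a truncated one, the caller would need to bring both to the same form first.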
4. What a fresh box needs before consumption (input to the 10D / Down-channel design)
Recovery on a re-enrolling box needs exactly four inputs, three of which are not the escrow secret:
- (1) the opaque blob, from the hub Down-channel (10A) (the hub stores it; cannot open it);
- (2) the recovery code R, from the customer, by hand (two-factor; the hub never holds it);
- (3) PBS connection + auth: repo (<user>@<realm>!<token>@<server>:<datastore>), the token secret, and the server fingerprint. These come from the restore directive / identity the hub serves (10D);
- (4) the expected key fingerprint, to gate F-C4, also hub-served (it is in the storage manifest).
The blob + R produce K; (3)+(4) come from the hub. This cleanly separates the two factors: the hub serves everything except R, so a hub compromise alone still cannot decrypt (zero-knowledge holds end-to-end through consumption).
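The four inputs above can be modelled as one value that is checked for completeness before consumption starts. This is an illustrative sketch only: RecoveryInputs and its field names are hypothetical and do not describe the 10C/10D wire format. R is held as a path to a 0600 file, matching how the spike handled it, so the code never carries the recovery code in memory as a string.

```go
package main

import (
	"errors"
	"strings"
)

// RecoveryInputs groups what a fresh box needs before consumption.
// Hypothetical type for illustration; the comments note each input's source.
type RecoveryInputs struct {
	Blob                   []byte // (1) opaque blob, from the hub Down-channel
	RecoveryCodeFile       string // (2) path to a 0600 file holding R, from the customer
	Repository             string // (3) <user>@<realm>!<token>@<server>:<datastore>, hub-served
	TokenSecret            string // (3) PBS token secret, hub-served
	ServerFingerprint      string // (3) PBS server fingerprint, hub-served
	ExpectedKeyFingerprint string // (4) gate for the post-unwrap check, hub-served
}

// Missing lists the names of inputs that are absent, in declaration order.
func (in RecoveryInputs) Missing() []string {
	var missing []string
	if len(in.Blob) == 0 {
		missing = append(missing, "blob")
	}
	for _, f := range []struct{ name, value string }{
		{"recovery code file", in.RecoveryCodeFile},
		{"repository", in.Repository},
		{"token secret", in.TokenSecret},
		{"server fingerprint", in.ServerFingerprint},
		{"expected key fingerprint", in.ExpectedKeyFingerprint},
	} {
		if strings.TrimSpace(f.value) == "" {
			missing = append(missing, f.name)
		}
	}
	return missing
}

// Validate fails with the names of missing inputs, never their values.
func (in RecoveryInputs) Validate() error {
	if m := in.Missing(); len(m) > 0 {
		return errors.New("recovery inputs missing: " + strings.Join(m, ", "))
	}
	return nil
}
```

Only the RecoveryCodeFile field originates outside the hub, which keeps the two-factor split visible in the type itself.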
5. GO / NO-GO
GO to spec 10C (escrow consumption) and to build 10D (DR orchestration) around it. The crypto + real-data consumption is proven end-to-end on a genuinely key-less box with the real datastore: recover-from-(blob,R) works, the recovered key decrypts real customer data, a wrong R fails closed, and the live K is never touched. 10C is a thin wrapper over the proven escrow.Unwrap + a fingerprint gate (F-C4) + the existing PBS restore path; the remaining work is the plumbing (10A Down-channel to deliver the blob; 10D to deliver inputs (3)+(4) and orchestrate identity/namespace/tunnel restore + the operator-signed restore-overwrite gate, 10B), not the crypto.
6. Teardown
- Shredded (shred -u): the recovered key, the fresh-box blob/R copies, the wrong-R test blob + wrong-R file, box1's blob + R.
- Destroyed: the fenced fresh-box dir incl. the 2.5 G restored rootfs (/root/escrow-spike) and the spike harness binary on box1.
- Live K byte-unchanged: sha256(/etc/pve/priv/storage/felhom-pbs.enc) identical to the S0 baseline at teardown. No secret committed to git.
- No secret ever resided on the build server (180): R, K, and the blob were generated/written/shredded only on box1 (162); 180 only compiled the non-secret harness and served datastore ciphertext. The throwaway harness binary/source on 180 (/tmp/escrow-spike*, cmd/escrow-spike/, never committed, build-only checkout that does not feed git) was removed at teardown. (DooPlex, that is 180 + PBS :8007 + gitea.dooplex.hu, had a transient outage right at teardown, minutes after it served the S3 restore; box1/162 is a separate machine and was unaffected. 180 cleanup completed once it returned.)
- felhom-spike left as found: the spike used only read-only datastore ops (snapshot list, restore), with no create/delete/prune/forget, so no test snapshot could be orphaned or removed regardless of the final re-list (which could not run because 180 was unreachable; S1–S4 had already confirmed ct/9001 intact and verified ok).
Out of scope (validated only the crypto + real-data consumption)
- Hub Down-channel serving the blob/restore-directive back to a re-enrolling box → 10A.
- Identity / tunnel / PBS-namespace restore + re-enrollment authorization → 10D.
- Operator-signed restore-overwrite gating → 10B.