Empirical PBS validation before the slice-6 Phase B spec. Records: PBS install on Debian-13 DooPlex (trixie key ships in proxmox-archive-keyring, no standalone .gpg), datastore + cert fingerprint, the PBS privsep gotcha (grant role on user AND token), the encrypted pbs storage + key location (/etc/pve/priv/storage/<id>.enc), the snapshot volid format + native fields (→ PBSSnapshot shape), restore-from-PBS works unchanged, the verify mechanism (server-side; agent drives it remotely via the PBS API, result read from snapshot verification.state), no operator-token privilege gap, and zero-knowledge confirmed (server can't decrypt without the client key). PBS+datastore+storage left up for Phase B; no secrets committed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 5 spike — PBS mechanism validation (DooPlex server ← N100 client)
Status: empirical findings from a live spike (2026-06-09). PBS was never validated in any prior spike (proxmox-platform.md §4.6). This establishes the real mechanisms before the slice-6 Phase B spec is written. No production data; probe → record → teardown. The PBS server + datastore + the N100's `felhom-pbs` storage are left in place for Phase B's live runbook to reuse.
Topology under test
- PBS server: DooPlex `192.168.0.180` (Debian 13 trixie, separate box, backup-only; runs no guests). PBS `proxmox-backup-server 4.2.1-1`.
- Client: the N100 `demo-felhom` `192.168.0.162` (PVE 9.2.2, `proxmox-backup-client 4.2.0`). It backs up to DooPlex and restores from it.
Pre-flight
| Check | Result |
|---|---|
| P1 DooPlex can host PBS | ✅ Debian 13 trixie; `proxmox-backup-server` is not in stock apt, so it needed the Proxmox PBS no-subscription repo (`deb http://download.proxmox.com/debian/pbs trixie pbs-no-subscription`). Surprise: there is no standalone `proxmox-release-trixie.gpg` (404; only bookworm/bullseye are published); the trixie key ships in the `proxmox-archive-keyring` package (key `24B30F06…0BFE778E`). I copied that keyring from the N100 (PVE9/trixie already has it). 126 GB free on `/`. |
| P2 N100 → DooPlex:8007 | ✅ reachable (8007 closed pre-install, open after). |
| P3 N100 PBS client | ✅ `proxmox-backup-client 4.2.0`, PVE `PBSPlugin.pm` present. |
Stand-up (DooPlex)
- S1: `proxmox-backup-manager datastore create felhom-spike /var/lib/pbs-spike` → TASK OK. Services `proxmox-backup-proxy` + `proxmox-backup` active, listening on `:8007`. Server cert fingerprint: `3b:95:5a:fa:9e:0e:4a:54:f3:64:08:e5:a2:a2:6c:66:e9:86:44:64:40:8e:c2:f7:6e:41:d2:c2:1e:86:48:c4`.
- S2: created PBS user `felhom@pbs` + API token `felhom@pbs!n100`, ACL `DatastoreAdmin` on `/datastore/felhom-spike`. ⚠️ PBS privsep gotcha (mirrors PVE): an API token's effective rights = token-ACL ∩ user-ACL. Granting the role only to the token wasn't enough: `pvesm add` failed with "Cannot find datastore" until `DatastoreAdmin` was also granted to the `felhom@pbs` user. Phase B's enrollment must grant the role on both the user and the token.
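
The intersection rule in S2 can be illustrated with plain sets. This is only a sketch of the arithmetic, not of how PBS evaluates ACLs internally; `effective_privs` is a hypothetical helper and the privilege list is an illustrative subset, not the full `DatastoreAdmin` role.

```python
def effective_privs(user_privs: set[str], token_privs: set[str]) -> set[str]:
    """A privilege-separated token gets only what BOTH ACLs allow."""
    return user_privs & token_privs


# Illustrative subset of privileges (assumption: not the complete role).
DATASTORE_ADMIN = {
    "Datastore.Audit",
    "Datastore.Read",
    "Datastore.Backup",
    "Datastore.Verify",
}

# Role granted on the token only: the user has nothing, so the token has nothing.
token_only = effective_privs(set(), DATASTORE_ADMIN)

# Role granted on both the user and the token: the token can act.
both = effective_privs(DATASTORE_ADMIN, DATASTORE_ADMIN)

print(sorted(token_only))
print("Datastore.Verify" in both)
```

The empty first result corresponds to the "Cannot find datastore" failure seen before the user-level grant was added.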
Adding as an encrypted PVE storage (N100)
- A1: `pvesm add pbs felhom-pbs --server 192.168.0.180 --datastore felhom-spike --fingerprint <fp> --username 'felhom@pbs!n100' --password <token-secret> --encryption-key autogen --content backup`. Resulting `/etc/pve/storage.cfg`:

      pbs: felhom-pbs
              datastore felhom-spike
              server 192.168.0.180
              content backup
              encryption-key 01:36:e9:fe:e1:ee:3d:7a:9d:bf:3d:63:d0:68:fd:24:45:b7:5f:bc:b6:82:bc:6d:d2:b4:7a:b0:1a:86:6d:a1
              fingerprint 3b:95:5a:fa:…:48:c4
              username felhom@pbs!n100

  Where the keys live on the box (the "live key on box"):
  - client encryption key → `/etc/pve/priv/storage/felhom-pbs.enc` (root:www-data 0600, 255 B). The `encryption-key` line in storage.cfg is only the key's fingerprint (`01:36:e9:fe…`), not the key.
  - PBS token secret → `/etc/pve/priv/storage/felhom-pbs.pw` (0600, 37 B).
- A2: the slice-5 agent observe (`--selftest=storage`) sees the target with the fingerprint-pinned durable_id exactly as designed: `durable=192.168.0.180:felhom-spike#3b:95:5a:fa:…:48:c4`, `type=pbs`, `state=attached`. No agent change needed for observation.
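
For readers who want to consume the durable_id from A2, here is a minimal parsing sketch. It assumes the format is exactly `<server>:<datastore>#<fingerprint>` as the observed line suggests; `DurableID` and `parse_durable_id` are hypothetical names, not the agent's own code.

```python
from typing import NamedTuple


class DurableID(NamedTuple):
    server: str
    datastore: str
    fingerprint: str


def parse_durable_id(value: str) -> DurableID:
    """Split '<server>:<datastore>#<fingerprint>' into its three parts."""
    # Split off the fingerprint first, because it contains ':' itself.
    target, sep, fingerprint = value.partition("#")
    if not sep or not fingerprint:
        raise ValueError(f"no fingerprint in durable id: {value!r}")
    # Split on the LAST ':' so a server part containing ':' does not break parsing.
    server, sep, datastore = target.rpartition(":")
    if not sep or not server or not datastore:
        raise ValueError(f"expected '<server>:<datastore>' in: {target!r}")
    return DurableID(server, datastore, fingerprint.lower())


fp = ("3b:95:5a:fa:9e:0e:4a:54:f3:64:08:e5:a2:a2:6c:66:"
      "e9:86:44:64:40:8e:c2:f7:6e:41:d2:c2:1e:86:48:c4")
parsed = parse_durable_id(f"192.168.0.180:felhom-spike#{fp}")
print(parsed.server, parsed.datastore)
```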
Probes (B1–B6)
B1 — backup to PBS
`vzdump 9001 --storage felhom-pbs --mode snapshot` →
- PBS snapshot id `ct/9001/2026-06-09T14:18:33Z` (`<type>/<id>/<RFC3339Z>`); the underlying command is `proxmox-backup-client backup … --repository felhom@pbs!n100@192.168.0.180:felhom-spike`.
- Encrypted client-side: `--crypt-mode=encrypt`, "Using encryption key from file descriptor", "Encryption key fingerprint: 01:36:e9:fe:e1:ee:3d:7a" (matches the storage key). Incremental/deduped ("reused 41 MiB"). ~19 s for ~1 GiB.
- Surprise vs Phase A: vzdump chose `stop` mode for the (stopped) guest even though `snapshot` was requested (`INFO: backup mode: stop`). PVE picks the actual mode; the reported `Backup.mode` is what was requested. For a running guest on lvm-thin it would snapshot. (Still crash-consistent only; no fsfreeze, per slice 6.)
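
One way to surface the requested-vs-actual mode gap is to read the mode line back from the task log. The sketch below assumes the log contains a line of the form `INFO: backup mode: <mode>`, which is the only form observed in this spike; `actual_backup_mode` and `mode_report` are hypothetical helpers.

```python
import re
from typing import Optional

# Matches the single log line observed in the spike, e.g. "INFO: backup mode: stop".
_MODE_LINE = re.compile(r"^INFO: backup mode: (\S+)\s*$", re.MULTILINE)


def actual_backup_mode(task_log: str) -> Optional[str]:
    """Return the mode vzdump actually used, or None if the line is missing."""
    match = _MODE_LINE.search(task_log)
    return match.group(1) if match else None


def mode_report(requested: str, task_log: str) -> dict:
    """Pair the requested mode with the mode found in the log."""
    actual = actual_backup_mode(task_log)
    return {
        "requested": requested,
        "actual": actual,
        "differs": actual is not None and actual != requested,
    }


report = mode_report("snapshot", "INFO: backup mode: stop\n")
print(report)
```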
B2 — snapshot inventory → the PBSSnapshot wire shape
- PVE volid (`pvesm list felhom-pbs`): `felhom-pbs:backup/ct/9001/2026-06-09T14:18:33Z`, format `pbs-ct`, type `backup`. This is the exact volid `pct restore` consumes (B3).
- PBS native (`proxmox-backup-client snapshot list --output-format json`) per snapshot: `backup-type` (`ct`|`vm`), `backup-id`, `backup-time` (epoch int), `size`, `owner` (`felhom@pbs!n100`), `protected` (bool), `fingerprint` (the encryption-key fp), and `files[]` each with `filename` + `size` + `crypt-mode` (`encrypt` for data, `sign-only` for `index.json`). `verification` is ABSENT until a verify runs (see B4). Namespace: not shown → the default (root) namespace; a `ns` field appears only for non-root namespaces.
  - → Proposed `PBSSnapshot`: `namespace`, `backup_type`, `backup_id`, `backup_time`, `size_bytes`, `owner`, `protected`, `encrypted` (derive from `files[].crypt-mode`), `verify_state` (`ok`|`failed`|`none`), `verified_at`/`verify_upid`.
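
The mapping from the native JSON to the proposed shape could look roughly like this. It is a sketch under assumptions: the field names follow the B2 list, `verified_at` is left out because no verification timestamp was observed in the native output, and the sample sizes are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class PBSSnapshot:
    namespace: str
    backup_type: str
    backup_id: str
    backup_time: int
    size_bytes: int
    owner: str
    protected: bool
    encrypted: bool
    verify_state: str          # "ok" | "failed" | "none"
    verify_upid: Optional[str]


def snapshot_from_native(raw: dict[str, Any]) -> PBSSnapshot:
    """Map one entry of the native snapshot list to the proposed wire shape."""
    # index.json is sign-only by design, so only the data files decide `encrypted`.
    data_files = [f for f in raw.get("files", []) if f.get("filename") != "index.json"]
    encrypted = bool(data_files) and all(f.get("crypt-mode") == "encrypt" for f in data_files)
    verification = raw.get("verification") or {}  # absent until a verify runs
    return PBSSnapshot(
        namespace=raw.get("ns", ""),  # "" stands for the root namespace here
        backup_type=raw["backup-type"],
        backup_id=str(raw["backup-id"]),
        backup_time=int(raw["backup-time"]),
        size_bytes=int(raw.get("size", 0)),
        owner=raw.get("owner", ""),
        protected=bool(raw.get("protected", False)),
        encrypted=encrypted,
        verify_state=verification.get("state", "none"),
        verify_upid=verification.get("upid"),
    )


def snapshot_id(snap: PBSSnapshot) -> str:
    """Rebuild '<type>/<id>/<RFC3339Z>' from the epoch backup time."""
    stamp = datetime.fromtimestamp(snap.backup_time, tz=timezone.utc)
    return f"{snap.backup_type}/{snap.backup_id}/{stamp.strftime('%Y-%m-%dT%H:%M:%SZ')}"


sample = {
    "backup-type": "ct",
    "backup-id": "9001",
    "backup-time": int(datetime(2026, 6, 9, 14, 18, 33, tzinfo=timezone.utc).timestamp()),
    "size": 1024,  # made-up size
    "owner": "felhom@pbs!n100",
    "protected": False,
    "files": [
        {"filename": "pct.conf.blob", "size": 100, "crypt-mode": "encrypt"},
        {"filename": "index.json", "size": 50, "crypt-mode": "sign-only"},
    ],
}
snap = snapshot_from_native(sample)
print(snapshot_id(snap), snap.encrypted, snap.verify_state)
```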
B3 — restore from PBS
`pct restore 990001 'felhom-pbs:backup/ct/9001/2026-06-09T14:18:33Z' --storage local-lvm` → restored + booted to `running`. The existing restore path works UNCHANGED against a pbs-sourced volid: same volid + `--storage` shape the agent's `RestoreLXC` already uses (`ostemplate=<volid>`, `restore=1`). No agent restore code change needed for PBS. PVE pulls and decrypts using the storage's `.enc` key automatically.
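
To make the "same volid, same shape" point concrete, here is a small sketch that builds the volid and the restore parameters. It is not the agent's `RestoreLXC`; `pbs_volid` and `restore_lxc_params` are hypothetical helpers, and only `ostemplate` and `restore` are taken from the notes above (the `vmid` and `storage` keys are assumed to mirror the `pct restore` arguments).

```python
def pbs_volid(storage: str, snapshot_id: str) -> str:
    """Build the PVE volid for a PBS snapshot: '<storage>:backup/<type>/<id>/<time>'."""
    return f"{storage}:backup/{snapshot_id}"


def restore_lxc_params(vmid: int, volid: str, target_storage: str) -> dict[str, str]:
    """Parameters for a restore call; identical for dir-backed and pbs-backed volids."""
    return {
        "vmid": str(vmid),
        "ostemplate": volid,        # the backup volid goes where a template would
        "restore": "1",             # marks the create call as a restore
        "storage": target_storage,  # where the restored rootfs lands
    }


volid = pbs_volid("felhom-pbs", "ct/9001/2026-06-09T14:18:33Z")
params = restore_lxc_params(990001, volid, "local-lvm")
print(params["ostemplate"])
```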
B4 — verify mechanism (the big unknown — resolved)
- `proxmox-backup-client` has NO `verify` subcommand; verify is server-side.
- Triggers: server CLI `proxmox-backup-manager verify <store> [--ignore-verified] [--outdated-after N]` on DooPlex, OR the PBS API `POST /api2/json/admin/datastore/<ds>/verify` (whole datastore; per-snapshot params available).
- The agent on the N100 CAN drive it remotely via the PBS API + token (no DooPlex shell needed). Proven: `curl -X POST …/admin/datastore/felhom-spike/verify` with header `Authorization: PBSAPIToken=felhom@pbs!n100:<secret>` returned a task UPID `UPID:dooplex:…:verify:felhom\x2dspike:felhom@pbs!n100:`. Needs `Datastore.Verify` (in `DatastoreAdmin`).
- Result read-back: after verify, the snapshot's `verification` field appears: `{"state":"ok","upid":"UPID:dooplex:…"}` (read via `snapshot list`). So the agent triggers via API → polls/re-lists → reads `verification.state` (`ok`/`failed`). (Task-status polling needs the PBS node name: it's `dooplex`, embedded in the UPID; `localhost` returns `exitstatus: unknown`.)
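
The trigger and read-back described above can be sketched without any network access. The code only constructs the request and parses results; sending it and pinning the server certificate are deliberately left out. The function names are hypothetical, and reading the node as the second colon-separated UPID field is an assumption based on the UPID shown above.

```python
import urllib.request
from urllib.parse import quote


def build_verify_request(server: str, datastore: str, token_id: str,
                         secret: str, port: int = 8007) -> urllib.request.Request:
    """Build (but do not send) the POST that starts a datastore verify."""
    url = (f"https://{server}:{port}/api2/json/admin/datastore/"
           f"{quote(datastore, safe='')}/verify")
    headers = {"Authorization": f"PBSAPIToken={token_id}:{secret}"}
    return urllib.request.Request(url, data=b"", method="POST", headers=headers)


def upid_node(upid: str) -> str:
    """Extract the node name needed for task-status polling from a UPID."""
    parts = upid.split(":")
    if len(parts) < 2 or parts[0] != "UPID" or not parts[1]:
        raise ValueError(f"not a UPID: {upid!r}")
    return parts[1]


def verify_state(snapshot: dict) -> str:
    """Return 'ok' or 'failed', or 'none' while the verification field is absent."""
    return (snapshot.get("verification") or {}).get("state", "none")


req = build_verify_request("192.168.0.180", "felhom-spike", "felhom@pbs!n100", "<secret>")
print(req.get_method(), req.full_url)
print(upid_node(r"UPID:dooplex:…:verify:felhom\x2dspike:felhom@pbs!n100:"))
```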
B5 — agent-token (felhom-agent@pve) privileges — no gap
Driven by the agent (operator token, not root@pam):
- Backup to PBS (`--selftest=backup`): ✅ `felhom-pbs:backup/ct/9001/2026-06-09T14:22:30Z`, crash_consistent, success.
- Restore from PBS (`--selftest=restore-test`): ✅ restored into scratch 990000, booted, verified `running`, torn down; pass.
- The FelhomAgent role's existing `Datastore.{Audit,Allocate,AllocateSpace}` + `VM.Backup` suffice for both backup-to-PBS and restore-from-PBS. No role widening needed. (Two auth layers: the PVE operator token authorizes the vzdump/restore API call; the PBS token in storage.cfg authenticates PVE→PBS. The spike exercised both.)
B6 — zero-knowledge confirmed
- All data files are `crypt-mode=encrypt` (B2); `index.json` is `sign-only`.
- Without the key, an authenticated restore fails to decrypt: `proxmox-backup-client restore … pct.conf.blob -` (no `--keyfile`) → `Error: missing key - manifest was created with key 01:36:e9:fe:e1:ee:3d:7a`.
- With `--keyfile /etc/pve/priv/storage/felhom-pbs.enc` → decrypts (returns the guest config). The key is the only gate.
- The PBS server holds no client key: `find /etc/proxmox-backup /var/lib/pbs-spike` for key material returns only the server's own `csrf.key`, never the client encryption key. So DooPlex can store + serve chunks but cannot read guest data. Zero-knowledge holds: the live key on the N100 is the irreducible residual (the operator/hub can't read the data).
Implications for the Phase B spec (flagged surprises vs the dir-storage assumptions)
- Enrollment must grant the PBS role on BOTH the user AND the token (PBS privsep), and add the `pbs` storage with `--encryption-key autogen` → the live key lands at `/etc/pve/priv/storage/<id>.enc` (the "live PBS key on the box", doc 03 §8). The hub holds only the recovery-code-wrapped escrow (out of scope here).
- Backup + restore need NO new code beyond targeting a `pbs` storage: `Vzdump` and `RestoreLXC`/`pct restore` work against pbs volids unchanged. The agent's `LatestBackupVolID` (StorageContent filter) already resolved the pbs volid.
- Verify is a NEW capability to build: a server-side op the agent triggers remotely via the PBS API (`POST …/datastore/<ds>/verify`) using the storage's token, then reads back `verification.state` from the snapshot list. This is the "lighter frequent integrity check" (§8); it does NOT need the encryption key (ciphertext-level), unlike the full self-restore-test. Phase B needs a small PBS-API client (token auth, fingerprint pin) for verify + snapshot-list-with-verify-state; the existing `proxmox.Client` (PVE API) does not cover it.
- `PBSSnapshot` wire shape = the B2 fields; `verify_state` is the load-bearing one and is `none` until a verify runs.
- vzdump mode is PVE's choice (stop for stopped guests): report requested-vs-actual if it matters, or read the actual mode from the task log.
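
Two pieces of the small PBS-API client named above can be sketched in isolation: the fingerprint pin and the re-list loop. This is a hedged outline, not a design. It assumes the pinned value is the colon-separated SHA-256 digest of the DER-encoded server certificate (the 32-byte length is consistent with that), and `list_snapshots` is a hypothetical callable supplied by the caller that returns the native snapshot entries.

```python
import hashlib
import time
from typing import Callable, Iterable, Optional


def cert_fingerprint(der_cert: bytes) -> str:
    """Colon-separated lowercase SHA-256 of a DER certificate."""
    digest = hashlib.sha256(der_cert).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def pin_matches(der_cert: bytes, pinned: str) -> bool:
    """Compare the presented certificate against the pinned fingerprint."""
    return cert_fingerprint(der_cert) == pinned.strip().lower()


def wait_for_verify(list_snapshots: Callable[[], Iterable[dict]],
                    backup_type: str, backup_id: str, backup_time: int,
                    expect_upid: Optional[str] = None,
                    attempts: int = 10, delay_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Re-list until the target snapshot reports 'ok' or 'failed'; else 'none'."""
    key = (backup_type, backup_id, backup_time)
    for attempt in range(attempts):
        for snap in list_snapshots():
            if (snap.get("backup-type"), snap.get("backup-id"), snap.get("backup-time")) != key:
                continue
            verification = snap.get("verification") or {}
            # Skip a result left behind by an older verify task, if a UPID is expected.
            if expect_upid and verification.get("upid") != expect_upid:
                continue
            if verification.get("state") in ("ok", "failed"):
                return verification["state"]
        if attempt < attempts - 1:
            sleep(delay_s)
    return "none"
```

Passing `expect_upid` (the UPID returned by the trigger call) guards against reading a stale `ok` from an earlier run; whether the UPID on the snapshot always equals the trigger's UPID was not probed in this spike, so treat that check as optional.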
Teardown / left-in-place
- Throwaway restore guest 990001 destroyed; agent restore-test scratch self-torn-down; `pct list` → no leftover guests. Agent config reverted (`backup.local_backup_target` → `local`). Token-secret temp files removed from both boxes.
- Left in place for Phase B: the PBS server on DooPlex, the `felhom-spike` datastore (with two test snapshots of 9001), the `felhom@pbs!n100` token + ACLs, and the N100's `felhom-pbs` encrypted storage (+ its `.enc`/`.pw` under `/etc/pve/priv/storage/`).
- No secrets committed: the encryption key, token secret, and PBS password live only in `/etc/pve/priv/storage/` (0600) on the N100; this doc references them by location/fingerprint only.