Phase 1 + 2 — Privilege Model & Backup/Restore Round-Trip: Findings

Host: demo-felhom (192.168.0.162), Proxmox VE 9.2.2; node name confirmed via pvesh get /nodes → demo-felhom. Storage: local (dir, content iso,vztmpl,backup,import), local-lvm (LVM-thin, rootdir,images). Subject: LXC 9001 (spike-lxc, unprivileged, nesting=1,keyctl=1, Docker + postgres/redis/nginx stack). Date: 2026-06-07.

Data and observations only — no recommendation or verdict.

Hypotheses — verdicts at a glance

| # | Hypothesis | Result |
|---|------------|--------|
| H1 | Backup scopes to one VMID; restore/create needs node/pool allocate → denied to narrow token | CONFIRMED (create CT = 403) |
| H2 | An LXC vzdump captures the Docker volumes (they live in the container rootfs) | CONFIRMED (sentinel survived both restores) |
| H3 | Crash-consistent (running) and quiesced (stopped) backups both restore cleanly | CONFIRMED (A via WAL recovery, B clean start) |
| H4 | Running unprivileged LXC snapshots on LVM-thin; restored CT keeps unprivileged+nesting/keyctl | CONFIRMED (live snapshot OK; config survived) |

1. Phase 1 — Privilege model

1.1 Setup (operator side, root)

pveum role add FelhomSelfBackup -privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit"
pveum user add felhom-ctl@pve --comment "spike in-guest controller"
pveum user token add felhom-ctl@pve ctl --privsep 1   # secret: b6547d9d-... (ephemeral, spike-only)
pveum acl modify /vms/9001      -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
pveum acl modify /storage/local -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup

Privilege names were verified against PVEVMAdmin / PVEDatastoreUser via pveum role list first. Note: the reference doc's introspection command pveum role info <role> does not exist in PVE 9 — only pveum role list works.

1.2 ⚠️ Privsep gotcha — the doc's runbook is incomplete

With --privsep 1, a token's effective rights are the intersection of the backing user's permissions AND the token's own ACLs. The reference doc (§3) grants ACLs to the token only. With the user felhom-ctl@pve holding no permissions, the intersection was empty — the first self-audit call returned:

HTTP 403  {"message":"Permission check failed (/vms/9001, VM.Audit)\n"}

Fix applied: also grant the user the role on the same paths (pveum acl modify /vms/9001 -user felhom-ctl@pve -role FelhomSelfBackup, same for /storage/local). After that the self-calls succeeded. A privsep token needs the permission present on both the user and the token (the token ACL is what keeps the token ≤ user / narrowly scoped). This must be reflected in the controller provisioning.
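The provisioning consequence can be sketched as a small dry-run helper, assuming the same role, paths, and principal names used in this spike. The function name print_privsep_grants is hypothetical; it only prints the pveum acl modify commands (one for the user and one for the token, per path) and does not run them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: emit the ACL grants a privsep token needs.
# With --privsep 1 the effective rights are the intersection of the user's
# and the token's permissions, so every path is granted to BOTH principals.
print_privsep_grants() {  # print_privsep_grants <user> <tokenid> <role> <path>...
  user=$1; tokenid=$2; role=$3; shift 3
  for path in "$@"; do
    printf "pveum acl modify %s -user '%s' -role %s\n" "$path" "$user" "$role"
    printf "pveum acl modify %s -token '%s!%s' -role %s\n" "$path" "$user" "$tokenid" "$role"
  done
}
```

Calling print_privsep_grants felhom-ctl@pve ctl FelhomSelfBackup /vms/9001 /storage/local prints the four grants applied in this section, which an operator could review before piping them to a shell.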

1.3 Test matrix (every call run from inside the unprivileged LXC, pct exec 9001)

H=192.168.0.162 N=demo-felhom AUTH="PVEAPIToken=felhom-ctl@pve!ctl=<secret>"

| # | Call | Expected | Actual | Notes |
|---|------|----------|--------|-------|
| 1 | GET /version | 200 | 200 | reachable + auth from inside LXC (no privilege needed) |
| 2 | GET /nodes/$N/lxc/9001/status/current | 200 | 200¹ | self audit (after privsep fix) |
| 3 | POST /nodes/$N/lxc/9001/snapshot snapname=spk1 | 200/UPID→OK | 200, task exitstatus OK | running-LXC self-snapshot (H4) |
| 4 | POST /nodes/$N/vzdump vmid=9001 storage=local mode=snapshot | 200/UPID→OK | 200, task exitstatus OK | self backup, archive produced |
| 5 | GET /nodes/$N/qemu/9000/status/current | 403 | 403 | Permission check failed (/vms/9000, VM.Audit) |
| 6 | POST /nodes/$N/vzdump vmid=9000 storage=local | 403 | 200 POST → task exitstatus 403² | see note |
| 7 | POST /nodes/$N/lxc (create CT) | 403 | 403 | Permission check failed; proves create/allocate is operator-tier (H1) |

¹ Before the privsep fix this was 403; see §1.2.

² Important nuance: the vzdump endpoint accepts the POST and returns a UPID even for an unauthorized vmid; the authorization failure surfaces at task execution, not at the HTTP layer. Polled from root: exitstatus: "403 Permission check failed (/vms/9000, VM.Backup)", and no 9000 archive was created. The boundary holds, but a controller must poll the task exitstatus rather than trust the POST's 200 to know that a cross-guest backup was actually refused.
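A minimal sketch of the polling a controller would need, assuming the H, N and AUTH variables from the matrix header, the default PVE API port 8006 with the /api2/json prefix, and the task-status endpoint GET /nodes/{node}/tasks/{upid}/status. The function names are hypothetical. In this spike the refused cross-guest task was polled from root; whether the token can read the status of a task that failed authorization was not tested, so treat that as an open assumption.

```shell
#!/bin/sh
# Sketch only. H, N and AUTH are the variables from the matrix header;
# port 8006 and the /api2/json prefix are the PVE defaults (assumed here).

# Pull one string field out of the compact JSON PVE returns (no jq needed).
json_field() {  # json_field <json> <key>
  printf '%s' "$1" |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Classify a task-status response: pending, ok, or failed with the reason.
task_verdict() {  # task_verdict <json>
  if [ "$(json_field "$1" status)" != "stopped" ]; then
    echo "pending"; return 2
  fi
  exitstatus=$(json_field "$1" exitstatus)
  if [ "$exitstatus" = "OK" ]; then echo "ok"; return 0; fi
  echo "failed: $exitstatus"; return 1
}

# Poll until the task stops; -k mirrors the spike's self-signed-cert setup.
wait_for_task() {  # wait_for_task <upid>
  while :; do
    body=$(curl -ks -H "Authorization: $AUTH" \
      "https://$H:8006/api2/json/nodes/$N/tasks/$1/status") || return 3
    verdict=$(task_verdict "$body"); rc=$?
    [ "$rc" -eq 2 ] || { echo "$verdict"; return "$rc"; }
    sleep 2
  done
}
```

The point of separating task_verdict from the HTTP call is that the decision "was this backup really done?" depends only on the task's exitstatus, never on the status code of the original POST.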

Pass criteria met: self-ops (1–4) succeed; cross-guest read (5), cross-guest backup (6, at task level), and create/allocate (7) are denied. The controller-as-guest boundary and the two-tier split are validated.

1.4 Final minimal role — VM.PowerMgmt not required

The doc left an open question: "does Tier A need VM.PowerMgmt for stop-mode backups? Likely yes". This was tested and refuted: a stop-mode self-vzdump submitted by the token (vmid=9001 mode=stop) completed with exitstatus: OK using the role without VM.PowerMgmt. vzdump performs the guest shutdown/restart internally under VM.Backup; no separate power privilege is needed.

Final minimal role (FelhomSelfBackup) — satisfies self-audit, self-snapshot, and both snapshot- and stop-mode self-backup: VM.Audit, VM.Snapshot, VM.Backup, Datastore.AllocateSpace, Datastore.Audit (VM.PowerMgmt deliberately omitted — confirmed unnecessary.)

1.5 TLS observation

From inside the LXC, curl without -k:

curl: (60) SSL certificate problem: unable to get local issuer certificate

The host serves the default self-signed PVE cert; all tests used -k. Production trust (pin the PVE CA / issue a proper cert) is a separate design decision, flagged here.

1.6 Running-LXC snapshot (H4)

Call #3 snapshotted the running unprivileged LXC on LVM-thin (exitstatus OK). pct listsnapshot 9001 shows spk1 with pct status 9001 = running. No stop required — the snapshot-before-update rollback flow is viable on a live container.


2. Phase 2 — Backup → real restore round-trip

Sentinel written pre-flight into the pgdata volume: restore_check(42,'phase2-sentinel') → clean read 42|phase2-sentinel.

2.1 Backups (operator/root side)

| Variant | Mode | Stack state | Task time | Wall | Archive | Size (zstd) |
|---------|------|-------------|-----------|------|---------|-------------|
| A (crash-consistent) | snapshot | running | 00:00:24 | 25 s | vzdump-lxc-9001-2026_06_07-20_13_43.tar.zst | 934 MB (979,718,569 B) |
| B (quiesced) | snapshot | stopped (docker compose stop) | 00:00:21 | 22 s | vzdump-lxc-9001-2026_06_07-20_14_40.tar.zst | 934 MB (979,671,582 B) |

Both from a 2.5 GiB source; zstd → ~934 MB, about 2.6:1 by exact byte count (2,585,589,760 B / 979,718,569 B for Variant A). The stack was restarted after Variant B. LXC snapshot-mode vzdump does not fsfreeze (there is no guest agent in an LXC, consistent with the Phase 0 finding), so Variant A is genuinely crash-consistent.
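The compression figures can be re-derived from the byte counts logged by vzdump ("Total bytes written" in the raw log, and the archive sizes in the table). The helper names below are hypothetical; this is only worked arithmetic. Dividing the rounded figures (2.5 GiB by 934 MB) would suggest roughly 2.7:1, while the exact byte counts give a slightly lower value.

```shell
#!/bin/sh
# Worked arithmetic on the byte counts reported for the two backups.
ratio() {  # ratio <source_bytes> <archive_bytes>  -> compression ratio, 2 dp
  awk -v s="$1" -v a="$2" 'BEGIN { printf "%.2f\n", s / a }'
}
mib() {    # mib <bytes>  -> size in MiB, rounded to the nearest integer
  awk -v b="$1" 'BEGIN { printf "%.0f\n", b / 1048576 }'
}

ratio 2585589760 979718569   # Variant A -> 2.64
ratio 2585825280 979671582   # Variant B -> 2.64
mib 979718569                # Variant A archive -> 934
```

The last line also shows that the "934 MB" reported by vzdump matches the byte count when read as MiB.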

2.2 Restore → fresh VMID → boot → verify

| Check | 9002 (Variant A) | 9003 (Variant B) |
|-------|------------------|------------------|
| Restore time (pct restore … --storage local-lvm) | 12 s | 11 s |
| unprivileged: 1 survived | yes | yes |
| features: nesting=1,keyctl=1 survived | yes | yes |
| Containers after boot | exited (no restart policy) → docker compose up -d | same |
| 3 containers healthy | yes | yes |
| curl localhost:8080 | HTTP 200 | HTTP 200 |
| Sentinel (42,'phase2-sentinel') | PRESENT | PRESENT |
| Postgres first-start | WAL crash recovery (see below) | clean start, no recovery |

Restored CTs inherit 9001's fixed hwaddr. To avoid a MAC clash with the still-running 9001 on vmbr0, net0 was reset to auto-generate a fresh MAC before boot. All verification (stack health, curl localhost, sentinel) is guest-internal and needs no external network — and the Docker images are inside the restored rootfs, so no pulls.
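One way a restore flow could perform the MAC reset is to rewrite the restored net0 value without its hwaddr key and pass the result back to pct set, which is what the spike did by hand. The helper name strip_hwaddr is hypothetical, and it assumes the net0 value is a comma-separated key=value list as shown in the raw command log.

```shell
#!/bin/sh
# Hypothetical helper: drop the hwaddr= key from a netN config string so that
# `pct set` lets PVE auto-generate a fresh MAC, as was done for 9002/9003.
strip_hwaddr() {  # strip_hwaddr <netN-value>
  printf '%s\n' "$1" | sed -E 's/(^|,)hwaddr=[^,]*//; s/^,//'
}
```

A possible use, before the first boot of the restored CT and assuming pct config prints a "net0: …" line: old=$(pct config 9002 | sed -n 's/^net0: //p'), then pct set 9002 -net0 "$(strip_hwaddr "$old")". Hostname and IP identity are not touched by this sketch.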

Variant A — Postgres automatic WAL recovery on 9002 (verbatim, post-restore boot):

LOG:  database system was interrupted; last known up at 2026-06-07 18:13:21 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/CB12838
LOG:  invalid record length at 0/CB12870: expected at least 24, got 0   # normal end-of-WAL
LOG:  redo done at 0/CB12838 ...
LOG:  checkpoint starting: end-of-recovery immediate wait
LOG:  database system is ready to accept connections

Variant B — clean start on 9003 (verbatim, post-restore boot):

LOG:  database system was shut down at 2026-06-07 18:14:39 UTC
LOG:  database system is ready to accept connections
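If a controller wanted to record which of the two start-up paths a restored guest took, the two marker lines quoted above are enough to tell them apart. The sketch below is an assumption about how that check might look, not something run in this spike; pg_start_kind is a hypothetical name and it reads the log from stdin.

```shell
#!/bin/sh
# Sketch: classify a Postgres first-start log (read from stdin) using the
# two marker lines quoted above. Anything else is reported as "unknown".
pg_start_kind() {
  log=$(cat)
  case $log in
    *"automatic recovery in progress"*)   echo "crash-recovery" ;;
    *"database system was shut down at"*) echo "clean" ;;
    *)                                    echo "unknown" ;;
  esac
}
```

For example, docker compose logs db | pg_start_kind inside the restored CT, assuming the Postgres service is the one named db in the compose file.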

H2 confirmed: one LXC vzdump captured the whole customer including the Docker named volume — the sentinel data restored in both guests. H3 confirmed: both variants restored to a bootable guest with intact data; the crash-consistent one recovered via WAL with no manual intervention, the quiesced one started clean. H4 confirmed: restored config preserved unprivileged + nesting/keyctl, so Docker ran in the restored CT.


3. Observations & confounds

  1. Privsep token needs perms on user and token (§1.2) — the single most important correction to the reference runbook; without it every scoped call 403s.
  2. vzdump authorization is task-level, not POST-level (§1.3 note ²) — a 200 + UPID does not mean authorized. The controller must poll exitstatus. This is also the general async-task lesson: every backup/snapshot/restore returns a UPID and the real result is in the task status.
  3. pveum role info is gone in PVE 9 — use pveum role list. Minor doc drift.
  4. VM.PowerMgmt not needed for stop-mode backup (§1.4) — narrower role than the doc assumed.
  5. No fsfreeze for LXC — Variant A relied on Postgres's own WAL crash recovery, which worked here for an idle-at-backup DB. Under heavy write load, app-consistency for LXC still rests on the controller quiescing first (or stop-mode), exactly as the reference warned. This single test is not a durability guarantee under load.
  6. Restore MAC collision (§2.2) — pct restore preserves the source hwaddr; restoring while the original runs needs a MAC reset (or the original stopped). The controller's restore flow must handle identity (MAC/hostname/IP) to avoid clashes.
  7. No restart policy on the compose services — restored containers came up exited; docker compose up -d (or a restart policy / systemd unit) is required for the stack to return automatically after a restore or guest reboot.
  8. Restore is fast, backup dominated by I/O: restores were 11–12 s (extract at ~524 MiB/s); backups ~22–25 s (read 2.5 GiB at ~108–119 MiB/s + zstd). Single runs, idle host, ~150 MB DB; not a throughput benchmark.
  9. Sequencing artifact: a Phase-1 stop-mode self-backup ran before Phase 2 and stopped/started 9001; the stack was brought back up and the sentinel re-verified before the Variant A/B backups, so it does not affect the round-trip results.

4. Raw command log (appendix)

4.1 Pre-flight

$ pvesh get /nodes  -> node: demo-felhom
$ cat /etc/pve/storage.cfg
dir: local   ... content iso,vztmpl,backup,import        # 'backup' present
lvmthin: local-lvm ... content rootdir,images            # no backup (expected)
$ pct start 9001 ; docker compose up -d  -> 3 containers Started
$ curl localhost:8080  -> HTTP 200
# sentinel:
CREATE TABLE ; INSERT 0 1 ; SELECT count -> 1 ; SELECT * -> 42 | phase2-sentinel

4.2 Phase 1 — role/user/token/ACL

$ pveum role add FelhomSelfBackup -privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit"  -> role-ok
$ pveum user add felhom-ctl@pve --comment "spike in-guest controller"  -> user-ok
$ pveum user token add felhom-ctl@pve ctl --privsep 1
  {"full-tokenid":"felhom-ctl@pve!ctl","info":{"privsep":"1"},"value":"b6547d9d-08ec-4f22-beb8-a551dc2cd69d"}
$ pveum acl modify /vms/9001 -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup   -> ok
$ pveum acl modify /storage/local -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup -> ok
$ pveum role list | grep FelhomSelfBackup
  FelhomSelfBackup | Datastore.AllocateSpace,Datastore.Audit,VM.Audit,VM.Backup,VM.Snapshot
$ pveum role info FelhomSelfBackup   -> ERROR: unknown command 'pveum role info'   # PVE9 has no 'role info'

4.3 Phase 1 — matrix (from inside LXC)

# TLS without -k:
curl: (60) SSL certificate problem: unable to get local issuer certificate

# BEFORE privsep fix:
#2 GET self status -> HTTP 403 {"message":"Permission check failed (/vms/9001, VM.Audit)\n"}

# privsep fix:
$ pveum acl modify /vms/9001 -user 'felhom-ctl@pve' -role FelhomSelfBackup  -> ok
$ pveum acl modify /storage/local -user 'felhom-ctl@pve' -role FelhomSelfBackup -> ok

# AFTER fix:
#1 GET /version                         -> HTTP 200
#2 GET /nodes/.../lxc/9001/status/current -> HTTP 200 {"data":{...,"status":"running",...}}
#5 GET /nodes/.../qemu/9000/status/current -> HTTP 403 (/vms/9000, VM.Audit)
#6 POST vzdump vmid=9000 -> HTTP 200 {"data":"UPID:...vzdump:9000:felhom-ctl@pve!ctl:"}
   root poll: exitstatus="403 Permission check failed (/vms/9000, VM.Backup)"
   task log: TASK ERROR: 403 Permission check failed (/vms/9000, VM.Backup)
   /var/lib/vz/dump: no 9000 archive created
#7 POST /nodes/.../lxc (create CT vmid=9009) -> HTTP 403 {"message":"Permission check failed\n"}

#3 POST lxc/9001/snapshot snapname=spk1 -> HTTP 200 UPID:...vzsnapshot:9001...
   root: exitstatus "OK" ; pct listsnapshot 9001 -> spk1 ; pct status 9001 -> running
#4 POST vzdump vmid=9001 storage=local mode=snapshot -> HTTP 200 UPID:...vzdump:9001...
   root: exitstatus "OK"
   token can read own task status: HTTP 200 {"...exitstatus":"OK"}   # earlier poll TIMEOUTs were a shell-quoting bug in the helper, not a perms issue

# stop-mode self-backup (VM.PowerMgmt test):
$ token POST vzdump vmid=9001 storage=local mode=stop -> HTTP 200 UPID:...vzdump:9001...
   root poll: exitstatus "OK"     # SUCCEEDED without VM.PowerMgmt in the role

4.4 Phase 2 — backups

# Variant A (running):
$ vzdump 9001 --mode snapshot --storage local --compress zstd
INFO: Total bytes written: 2585589760 (2.5GiB, 108MiB/s)
INFO: archive file size: 934MB
INFO: Finished Backup of VM 9001 (00:00:24)   ; WALL_SECONDS=25
-> vzdump-lxc-9001-2026_06_07-20_13_43.tar.zst  (979718569 B)

# Variant B (stopped):
$ docker compose stop   (cache,db,web Stopped)
$ vzdump 9001 --mode snapshot --storage local --compress zstd
INFO: Total bytes written: 2585825280 (2.5GiB, 119MiB/s)
INFO: Finished Backup of VM 9001 (00:00:21)   ; WALL_SECONDS=22
-> vzdump-lxc-9001-2026_06_07-20_14_40.tar.zst  (979671582 B)
$ docker compose start   (db,cache,web Started)

4.5 Phase 2 — restores + verification

# A -> 9002:
$ pct restore 9002 .../20_13_43.tar.zst --storage local-lvm
  Total bytes read: 2585589760 (2.5GiB, 524MiB/s) ; RESTORE_A_SECONDS=12
$ pct config 9002 -> features: nesting=1,keyctl=1 ; unprivileged: 1
$ pct set 9002 -net0 name=eth0,bridge=vmbr0,ip=dhcp   # fresh MAC BC:24:11:E3:F4:64
$ pct start 9002 ; docker compose up -d -> 3 running ; curl -> HTTP 200
$ psql SELECT * FROM restore_check -> 42 | phase2-sentinel
  db log: "was interrupted ... not properly shut down; automatic recovery in progress
           redo starts/redo done ... database system is ready to accept connections"

# B -> 9003:
$ pct restore 9003 .../20_14_40.tar.zst --storage local-lvm
  Total bytes read: 2585825280 (2.5GiB, 524MiB/s) ; RESTORE_B_SECONDS=11
$ pct config 9003 -> features: nesting=1,keyctl=1 ; unprivileged: 1
$ pct set 9003 -net0 ... (fresh MAC) ; pct start 9003 ; docker compose up -d -> 3 running ; curl 200
$ psql SELECT * FROM restore_check -> 42 | phase2-sentinel
  db log: "database system was shut down at ... ; database system is ready to accept connections"  # clean

5. Teardown (executed)

Restore targets destroyed; Phase 1 objects and spike artifacts removed; 9000/9001 left stopped-but-present. Verified clean: felhom-ctl@pve deleted, no spike ACLs, empty dump/, spk1 removed.

Correction: pveum acl delete requires --roles; giving only a path plus -user/-token fails with 400 "roles: property is missing". In practice the explicit ACL deletes are unnecessary: deleting the token/user/role invalidates the ACLs that reference it (PVE logs "ignore invalid acl token …" and drops them).

pct stop 9002 ; pct stop 9003 ; pct destroy 9002 --purge ; pct destroy 9003 --purge
# correct ACL-delete syntax (needs --roles), or just let user/role deletion clean them:
pveum acl delete /vms/9001      --roles FelhomSelfBackup --users  'felhom-ctl@pve'
pveum acl delete /vms/9001      --roles FelhomSelfBackup --tokens 'felhom-ctl@pve!ctl'
pveum acl delete /storage/local --roles FelhomSelfBackup --users  'felhom-ctl@pve'
pveum acl delete /storage/local --roles FelhomSelfBackup --tokens 'felhom-ctl@pve!ctl'
pveum user token remove felhom-ctl@pve ctl ; pveum user delete felhom-ctl@pve ; pveum role delete FelhomSelfBackup
pct delsnapshot 9001 spk1
rm -f /var/lib/vz/dump/vzdump-lxc-9001-*.tar.zst /var/lib/vz/dump/vzdump-lxc-9001-*.log
pct stop 9001     # back to stopped-but-present

6. To destroy 9000/9001 later (NOT run — left stopped-but-present)

qm destroy 9000 --purge        # VM  (Phase 0 subject)
pct destroy 9001 --purge       # LXC (Phase 0/1/2 subject)
# Debian 13 CT template left in place: local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst