# Phase 1 + 2 — Privilege Model & Backup/Restore Round-Trip: Findings

**Host:** `demo-felhom` (192.168.0.162) — Proxmox VE 9.2.2, node confirmed via `pvesh get /nodes` → `demo-felhom`. Storage: `local` (dir, content `iso,vztmpl,backup,import`), `local-lvm` (LVM-thin, `rootdir,images`).

**Subject:** LXC `9001` (`spike-lxc`, unprivileged, `nesting=1,keyctl=1`, Docker + postgres/redis/nginx stack). **Date:** 2026-06-07.

> Data and observations only — **no recommendation or verdict**.
## Hypotheses — verdicts at a glance

| # | Hypothesis | Result |
|---|---|---|
| **H1** | Backup scopes to one VMID; restore/create needs node/pool allocate → denied to narrow token | **CONFIRMED** (create CT = 403) |
| **H2** | An LXC vzdump captures the Docker volumes (they live in the container rootfs) | **CONFIRMED** (sentinel survived both restores) |
| **H3** | Crash-consistent (stack running) *and* quiesced (stack stopped) backups both restore cleanly | **CONFIRMED** (A via WAL recovery, B clean start) |
| **H4** | A running unprivileged LXC snapshots on LVM-thin; the restored CT keeps unprivileged + nesting/keyctl | **CONFIRMED** (live snapshot OK; config survived) |

---
## 1. Phase 1 — Privilege model

### 1.1 Setup (operator side, root)

```bash
pveum role add FelhomSelfBackup -privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit"
pveum user add felhom-ctl@pve --comment "spike in-guest controller"
pveum user token add felhom-ctl@pve ctl --privsep 1   # secret: b6547d9d-... (ephemeral, spike-only)
pveum acl modify /vms/9001 -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
pveum acl modify /storage/local -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
```

Privilege names were verified against `PVEVMAdmin` / `PVEDatastoreUser` via `pveum role list` first. **Note:** the reference doc's introspection command `pveum role info <role>` **does not exist in PVE 9** — only `pveum role list` works.
### 1.2 ⚠️ Privsep gotcha — the doc's runbook is incomplete

With `--privsep 1`, a token's effective rights are the **intersection of the backing user's permissions AND the token's own ACLs**. The reference doc (§3) grants ACLs to the **token only**. With the user `felhom-ctl@pve` holding **no** permissions, the intersection was **empty** — the first self-audit call returned:

```
HTTP 403 {"message":"Permission check failed (/vms/9001, VM.Audit)\n"}
```
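The 403 follows directly from the intersection rule. A toy model of it, for a single ACL path and ignoring propagation, groups, pools and parent-path inheritance (all simplifications; this is not PVE's permission code), using a hypothetical `effective_privs` helper:

```bash
#!/usr/bin/env bash
# Toy model of privsep token rights on ONE ACL path:
#   effective(token) = privs(user) ∩ privs(token)
# Simplifications (assumptions): no propagation, groups, pools or parent paths.
set -euo pipefail

# effective_privs "<user privs>" "<token privs>"
# Both arguments are space-separated privilege names; prints the sorted
# intersection on one line (empty if nothing overlaps).
effective_privs() {
  comm -12 \
    <(tr ' ' '\n' <<< "$1" | sed '/^$/d' | sort -u) \
    <(tr ' ' '\n' <<< "$2" | sed '/^$/d' | sort -u) |
    paste -s -d ' ' -
}

role="VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit"

# Spike state before the fix: the user held nothing on /vms/9001.
echo "before fix: [$(effective_privs "" "$role")]"
# After granting the same role to the user as well:
echo "after fix:  [$(effective_privs "$role" "$role")]"
```

Before the fix the intersection is empty, which is exactly the state that produced the `VM.Audit` 403; after the fix it contains all five privileges.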
**Fix applied:** also grant the user the role on the same paths (`pveum acl modify /vms/9001 -user felhom-ctl@pve -role FelhomSelfBackup`, same for `/storage/local`). After that the self-calls succeeded. **A privsep token needs the permission present on *both* the user and the token**; the token ACL is what keeps the token's rights a narrowly scoped subset of the user's. This must be reflected in the controller provisioning.
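Provisioning therefore has to write two ACL entries per path. A sketch of that step, assuming it runs as root on the node with `pveum` available; `grant_user_and_token` and `run` are hypothetical helpers, and with `DRY_RUN=1` (the default here) the commands are printed rather than executed so they can be reviewed first:

```bash
#!/usr/bin/env bash
# Sketch: grant one role on every path to BOTH the backing user and its
# privsep token, so the user-side grant cannot be forgotten.
# grant_user_and_token and run are hypothetical helpers; with DRY_RUN=1
# (the default here) the pveum commands are printed instead of executed.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# grant_user_and_token <user@realm> <token-name> <role> <acl-path>...
grant_user_and_token() {
  local user=$1 token=$2 role=$3 path
  shift 3
  for path in "$@"; do
    run pveum acl modify "$path" -user "$user" -role "$role"
    # '!' kept in single quotes so interactive shells do not history-expand it
    run pveum acl modify "$path" -token "${user}"'!'"${token}" -role "$role"
  done
}

# Usage (as root on the node; set DRY_RUN=0 to apply):
#   grant_user_and_token felhom-ctl@pve ctl FelhomSelfBackup /vms/9001 /storage/local
```

Keeping both grants in one loop makes it hard to provision a token ACL without its matching user ACL.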
### 1.3 Test matrix (every call run from **inside** the unprivileged LXC, `pct exec 9001`)

`H=192.168.0.162 N=demo-felhom AUTH="PVEAPIToken=felhom-ctl@pve!ctl=<secret>"`

| # | Call | Expected | **Actual** | Notes |
|---|---|---|---|---|
| 1 | `GET /version` | 200 | **200** | reachable + auth from inside LXC (no privilege needed) |
| 2 | `GET /nodes/$N/lxc/9001/status/current` | 200 | **200**¹ | self audit (after privsep fix) |
| 3 | `POST /nodes/$N/lxc/9001/snapshot snapname=spk1` | 200/UPID→OK | **200, task exitstatus OK** | **running-LXC self-snapshot (H4)** |
| 4 | `POST /nodes/$N/vzdump vmid=9001 storage=local mode=snapshot` | 200/UPID→OK | **200, task exitstatus OK** | self backup, archive produced |
| 5 | `GET /nodes/$N/qemu/9000/status/current` | 403 | **403** | `Permission check failed (/vms/9000, VM.Audit)` |
| 6 | `POST /nodes/$N/vzdump vmid=9000 storage=local` | 403 | **200 POST → task exitstatus 403**² | see note |
| 7 | `POST /nodes/$N/lxc` (create CT) | 403 | **403** | `Permission check failed` — **proves create/allocate is operator-tier (H1)** |

¹ Before the privsep fix this was 403; see §1.2.

² **Important nuance:** the `vzdump` endpoint accepts the POST and returns a UPID even for an unauthorized vmid; the authorization failure surfaces at **task execution**, not at the HTTP layer. Polled from root: `exitstatus: "403 Permission check failed (/vms/9000, VM.Backup)"`, and **no 9000 archive was created**. The boundary holds — but a controller must **poll the task exitstatus**, not trust the POST's 200, to know a cross-guest backup was actually refused.

**Pass criteria met:** self-ops (1–4) succeed; cross-guest read (5), cross-guest backup (6, at task level), and create/allocate (7) are denied. The controller-as-guest boundary and the two-tier split are validated.
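The matrix and note ² can be replayed from inside the guest with a few `curl` helpers. This is a sketch, not the spike's own helper (which is not shown here): it assumes `curl` in the guest, the `H`/`N`/`AUTH` variables defined for the matrix, `-k` as in §1.5, the default API port 8006, and a naive `sed` field extractor instead of a real JSON parser. `check_call` compares HTTP codes; `wait_task` polls `GET /nodes/$N/tasks/<upid>/status` until the task has stopped and succeeds only on `exitstatus` `OK`, which is the check that exposed call #6. All helper names and the 2 s poll interval are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: replay matrix calls from inside the guest AND confirm async results.
# H, N and AUTH as defined for the matrix (AUTH="PVEAPIToken=...").
# api, check_call, json_field, task_result, wait_task: hypothetical helpers.
set -euo pipefail

# api <method> <path> [extra curl args...] -> response body on stdout
api() {
  local method=$1 path=$2
  shift 2
  curl -sk -X "$method" -H "Authorization: $AUTH" "$@" \
    "https://$H:8006/api2/json$path"
}

# check_call <expected-http-code> <method> <path> [curl args...]
check_call() {
  local expected=$1 method=$2 path=$3 code
  shift 3
  code=$(api "$method" "$path" -o /dev/null -w '%{http_code}' "$@")
  if [[ "$code" == "$expected" ]]; then
    echo "PASS $method $path -> $code"
  else
    echo "FAIL $method $path -> $code (expected $expected)"
    return 1
  fi
}

# json_field <key> <json>: naive extractor for a flat string field; assumes
# compact JSON and no escaped quotes in the value (use jq if available).
json_field() {
  sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p" <<< "$2"
}

# task_result <task-status json> -> "running", "ok" or "failed: <exitstatus>"
task_result() {
  local status exitstatus
  status=$(json_field status "$1")
  if [[ "$status" != "stopped" ]]; then
    echo running
    return 0
  fi
  exitstatus=$(json_field exitstatus "$1")
  if [[ "$exitstatus" == "OK" ]]; then
    echo ok
  else
    echo "failed: ${exitstatus:-unknown}"
  fi
}

# wait_task <upid> [timeout-seconds]: returns 0 only on exitstatus "OK"
wait_task() {
  local upid=$1 deadline result
  deadline=$(( SECONDS + ${2:-600} ))
  while (( SECONDS < deadline )); do
    result=$(task_result "$(api GET "/nodes/$N/tasks/$upid/status")")
    case "$result" in
      running) sleep 2 ;;
      ok) return 0 ;;
      *) echo "$result" >&2; return 1 ;;
    esac
  done
  echo "timed out waiting for $upid" >&2
  return 1
}

# Usage (inside the LXC):
#   check_call 403 GET "/nodes/$N/qemu/9000/status/current"
#   upid=$(json_field data "$(api POST "/nodes/$N/vzdump" -d vmid=9000 -d storage=local)")
#   wait_task "$upid" || echo "backup refused or failed"
```

The UPID is sent unencoded in the path; its `:`, `@` and `!` characters are legal in a URL path segment.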
### 1.4 Final minimal role — `VM.PowerMgmt` **not** required

The reference doc left open whether Tier A needs `VM.PowerMgmt` for stop-mode backups ("Likely yes"). **Tested and refuted:** a **stop-mode** self-vzdump submitted by the token (`vmid=9001 mode=stop`) completed with **`exitstatus: OK`** using the role *without* `VM.PowerMgmt`. `vzdump` performed the guest shutdown/restart itself under `VM.Backup`; no separate power privilege was needed.

> **Final minimal role (`FelhomSelfBackup`) — satisfies self-audit, self-snapshot, and both `snapshot`- and `stop`-mode self-backup:**
>
> `VM.Audit, VM.Snapshot, VM.Backup, Datastore.AllocateSpace, Datastore.Audit`
>
> (`VM.PowerMgmt` deliberately omitted — confirmed unnecessary.)
### 1.5 TLS observation

From inside the LXC, `curl` **without** `-k`:

```
curl: (60) SSL certificate problem: unable to get local issuer certificate
```

The host serves the default PVE certificate, which is issued by the node's self-generated PVE CA that the guest does not trust; all tests used `-k`. Production trust (pin the PVE CA / issue a proper cert) is a separate design decision, flagged here.

### 1.6 Running-LXC snapshot (H4)

Call #3 snapshotted the **running** unprivileged LXC on LVM-thin (`exitstatus OK`). `pct listsnapshot 9001` shows `spk1` with `pct status 9001 = running`. **No stop required:** the snapshot step of the snapshot-before-update rollback flow works on a live container (the rollback itself was not exercised in this spike).

---
## 2. Phase 2 — Backup → real restore round-trip

Sentinel written pre-flight into the `pgdata` volume: row `(42,'phase2-sentinel')` in table `restore_check` → clean read `42|phase2-sentinel`.

### 2.1 Backups (operator/root side)

| Variant | Mode | Stack state | Task time | Wall | Archive | Size (zstd) |
|---|---|---|---|---|---|---|
| **A — crash-consistent** | `snapshot` | **running** | 00:00:24 | 25 s | `vzdump-lxc-9001-2026_06_07-20_13_43.tar.zst` | **934 MiB** (979,718,569 B) |
| **B — quiesced** | `snapshot` | **stopped** (`docker compose stop`) | 00:00:21 | 22 s | `vzdump-lxc-9001-2026_06_07-20_14_40.tar.zst` | **934 MiB** (979,671,582 B) |

Both archives come from a ~2.4 GiB tar stream (2,585,589,760 B for A, 2,585,825,280 B for B; tar displays this as "2.5GiB"), which zstd reduces to ~934 MiB, a ratio of ~2.6:1. The stack was restarted after Variant B. **LXC snapshot-mode vzdump does *not* fsfreeze** (no guest agent in an LXC — consistent with the Phase 0 finding) → Variant A is genuinely crash-consistent.
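The size figures can be reproduced from the exact byte counts in the §4.4 log. tar's "2.5GiB" and vzdump's "934MB" are both binary-unit displays, so the exact numbers differ slightly from the rounded ones. A sketch with hypothetical `ratio`/`gib`/`mib` helpers, using `awk` for the floating point:

```bash
#!/usr/bin/env bash
# Arithmetic behind the §2.1 sizes, from the exact byte counts in the logs.
# ratio, gib and mib are hypothetical helpers; awk does the floating point,
# LC_ALL=C keeps '.' as the decimal separator.
set -euo pipefail

ratio() { LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }
gib()   { LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.2f\n", a / 1073741824 }'; }
mib()   { LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.2f\n", a / 1048576 }'; }

ratio 2585589760 979718569   # Variant A: tar stream bytes / archive bytes
ratio 2585825280 979671582   # Variant B
gib 2585589760               # stream that tar displays as "2.5GiB"
mib 979718569                # archive that vzdump displays as "934MB"
```

Run as-is it prints 2.64, 2.64, 2.41 and 934.33, one per line.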
### 2.2 Restore → fresh VMID → boot → verify

| Check | 9002 (Variant A) | 9003 (Variant B) |
|---|---|---|
| Restore time (`pct restore … --storage local-lvm`) | **12 s** | **11 s** |
| `unprivileged: 1` survived | **yes** | **yes** |
| `features: nesting=1,keyctl=1` survived | **yes** | **yes** |
| Containers after boot | `exited` (no restart policy) → `docker compose up -d` | same |
| 3 containers healthy | **yes** | **yes** |
| `curl localhost:8080` | **HTTP 200** | **HTTP 200** |
| **Sentinel `(42,'phase2-sentinel')`** | **PRESENT** | **PRESENT** |
| Postgres first-start | **WAL crash recovery** (see below) | **clean start, no recovery** |

> Restored CTs inherit 9001's fixed `hwaddr`. To avoid a MAC clash with the still-running 9001 on `vmbr0`, `net0` was reset to auto-generate a fresh MAC before boot. All verification (stack health, `curl localhost`, sentinel) is guest-internal and needs no external network — and the Docker images are inside the restored rootfs, so no image pulls were needed.

**Variant A — Postgres automatic WAL recovery on 9002 (verbatim, post-restore boot):**

```
LOG: database system was interrupted; last known up at 2026-06-07 18:13:21 UTC
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/CB12838
LOG: invalid record length at 0/CB12870: expected at least 24, got 0 # normal end-of-WAL
LOG: redo done at 0/CB12838 ...
LOG: checkpoint starting: end-of-recovery immediate wait
LOG: database system is ready to accept connections
```

**Variant B — clean start on 9003 (verbatim, post-restore boot):**

```
LOG: database system was shut down at 2026-06-07 18:14:39 UTC
LOG: database system is ready to accept connections
```
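The two start-up signatures above can be told apart mechanically, e.g. in a post-restore check. A sketch, assuming the log is piped in from something like `docker compose logs db` run in the compose project directory inside the restored CT (the service name `db` is taken from §4.4; the directory is not recorded in this document); `pg_start_kind` is a hypothetical helper that only greps for the three messages quoted above:

```bash
#!/usr/bin/env bash
# Sketch: classify a Postgres first-start log after a restore.
# pg_start_kind is a hypothetical helper; it matches only the messages
# quoted in §2.2, not every possible Postgres start-up path.
set -euo pipefail

# pg_start_kind < log -> "not-ready", "wal-recovery", "clean" or "unknown"
pg_start_kind() {
  local log
  log=$(cat)
  if ! grep -q 'database system is ready to accept connections' <<< "$log"; then
    echo not-ready
  elif grep -q 'automatic recovery in progress' <<< "$log"; then
    echo wal-recovery      # crash-consistent image, replayed WAL (Variant A)
  elif grep -q 'database system was shut down at' <<< "$log"; then
    echo clean             # quiesced image, clean shutdown record (Variant B)
  else
    echo unknown
  fi
}

# Usage (inside the restored CT, in the compose project directory):
#   docker compose logs db | pg_start_kind
```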
**H2 confirmed:** one LXC vzdump captured the whole customer including the Docker named volume — the sentinel data restored in both guests. **H3 confirmed:** both variants restored to a bootable guest with intact data; Postgres in the crash-consistent one recovered via WAL automatically, and the quiesced one started clean. **H4 confirmed:** restored config preserved `unprivileged` + `nesting/keyctl`, so Docker ran in the restored CT.

---
## 3. Observations & confounds

1. **Privsep token needs perms on user *and* token** (§1.2) — the single most important correction to the reference runbook; without it every scoped call 403s.
2. **vzdump authorization is task-level, not POST-level** (§1.3 note ²) — a 200 + UPID does **not** mean authorized. The controller must poll `exitstatus`. This is also the general async-task lesson: every backup/snapshot/restore returns a UPID and the real result is in the task status.
3. **`pveum role info` does not exist in PVE 9** — use `pveum role list`. Minor doc drift.
4. **`VM.PowerMgmt` not needed for stop-mode backup** (§1.4) — a narrower role than the doc assumed.
5. **No fsfreeze for LXC** — Variant A relied on Postgres's own WAL crash recovery, which worked here for an idle-at-backup DB. Under heavy write load, app-consistency for LXC still rests on the controller quiescing first (or stop-mode), exactly as the reference warned. This single test is not a durability guarantee under load.
6. **Restore MAC collision** (§2.2) — `pct restore` preserves the source `hwaddr`; restoring while the original runs needs a MAC reset (or the original stopped). The controller's restore flow must handle identity (MAC/hostname/IP) to avoid clashes.
7. **No restart policy on the compose services** — restored containers came up `exited` and only returned after a manual `docker compose up -d`; for the stack to return automatically after a restore or guest reboot, a restart policy or a systemd unit is required.
8. **Restore is fast, backup dominated by I/O** — restores were 11–12 s (extract at ~524 MiB/s); backups ~22–25 s (read 2.5 GiB at ~108–119 MiB/s + zstd). Single runs, idle host, ~150 MB DB; not a throughput benchmark.
9. **Sequencing artifact:** a Phase-1 stop-mode self-backup ran before Phase 2 and stopped/started 9001; the stack was brought back up and the sentinel re-verified before the Variant A/B backups, so it does not affect the round-trip results.
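Items 6 and 7 both concern what happens right after `pct restore`. A minimal sketch of the operator-side steps from §4.5 as one function, assuming the same target storage, bridge and DHCP settings as the spike; `restore_as_new` and `run` are hypothetical names, and with `DRY_RUN=1` (the default) the commands are only printed. It covers only the MAC part of the identity problem (hostname and IP are not handled), and the in-guest `docker compose up -d` stays manual because the compose project directory is not recorded in this document.

```bash
#!/usr/bin/env bash
# Sketch of the §4.5 restore steps for one archive (root on the PVE node).
# restore_as_new and run are hypothetical helpers; DRY_RUN=1 (default) only
# prints the commands. Hostname/IP identity handling is NOT covered here.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# restore_as_new <archive> <new-vmid> [target-storage]
restore_as_new() {
  local archive=$1 vmid=$2 storage=${3:-local-lvm}
  run pct restore "$vmid" "$archive" --storage "$storage"
  run pct config "$vmid"   # check: unprivileged: 1, features: nesting=1,keyctl=1
  # Re-set net0 WITHOUT hwaddr= so PVE generates a fresh MAC, avoiding a
  # clash with the still-running source CT on vmbr0 (values as in §4.5).
  run pct set "$vmid" -net0 name=eth0,bridge=vmbr0,ip=dhcp
  run pct start "$vmid"
  # The stack inside the guest still needs `docker compose up -d` (item 7).
}

# Usage:
#   restore_as_new /var/lib/vz/dump/vzdump-lxc-9001-2026_06_07-20_13_43.tar.zst 9002
```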
---

## 4. Raw command log (appendix)

### 4.1 Pre-flight

```
$ pvesh get /nodes -> node: demo-felhom
$ cat /etc/pve/storage.cfg
dir: local ... content iso,vztmpl,backup,import # 'backup' present
lvmthin: local-lvm ... content rootdir,images # no backup (expected)
$ pct start 9001 ; docker compose up -d -> 3 containers Started
$ curl localhost:8080 -> HTTP 200
# sentinel:
CREATE TABLE ; INSERT 0 1 ; SELECT count -> 1 ; SELECT * -> 42 | phase2-sentinel
```
### 4.2 Phase 1 — role/user/token/ACL

```
$ pveum role add FelhomSelfBackup -privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit" -> role-ok
$ pveum user add felhom-ctl@pve --comment "spike in-guest controller" -> user-ok
$ pveum user token add felhom-ctl@pve ctl --privsep 1
{"full-tokenid":"felhom-ctl@pve!ctl","info":{"privsep":"1"},"value":"b6547d9d-08ec-4f22-beb8-a551dc2cd69d"}
$ pveum acl modify /vms/9001 -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup -> ok
$ pveum acl modify /storage/local -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup -> ok
$ pveum role list | grep FelhomSelfBackup
FelhomSelfBackup | Datastore.AllocateSpace,Datastore.Audit,VM.Audit,VM.Backup,VM.Snapshot
$ pveum role info FelhomSelfBackup -> ERROR: unknown command 'pveum role info' # PVE9 has no 'role info'
```
### 4.3 Phase 1 — matrix (from inside LXC)

```
# TLS without -k:
curl: (60) SSL certificate problem: unable to get local issuer certificate

# BEFORE privsep fix:
#2 GET self status -> HTTP 403 {"message":"Permission check failed (/vms/9001, VM.Audit)\n"}

# privsep fix:
$ pveum acl modify /vms/9001 -user 'felhom-ctl@pve' -role FelhomSelfBackup -> ok
$ pveum acl modify /storage/local -user 'felhom-ctl@pve' -role FelhomSelfBackup -> ok

# AFTER fix:
#1 GET /version -> HTTP 200
#2 GET /nodes/.../lxc/9001/status/current -> HTTP 200 {"data":{...,"status":"running",...}}
#5 GET /nodes/.../qemu/9000/status/current -> HTTP 403 (/vms/9000, VM.Audit)
#6 POST vzdump vmid=9000 -> HTTP 200 {"data":"UPID:...vzdump:9000:felhom-ctl@pve!ctl:"}
root poll: exitstatus="403 Permission check failed (/vms/9000, VM.Backup)"
task log: TASK ERROR: 403 Permission check failed (/vms/9000, VM.Backup)
/var/lib/vz/dump: no 9000 archive created
#7 POST /nodes/.../lxc (create CT vmid=9009) -> HTTP 403 {"message":"Permission check failed\n"}

#3 POST lxc/9001/snapshot snapname=spk1 -> HTTP 200 UPID:...vzsnapshot:9001...
root: exitstatus "OK" ; pct listsnapshot 9001 -> spk1 ; pct status 9001 -> running
#4 POST vzdump vmid=9001 storage=local mode=snapshot -> HTTP 200 UPID:...vzdump:9001...
root: exitstatus "OK"
token can read own task status: HTTP 200 {"...exitstatus":"OK"} # earlier poll TIMEOUTs were a shell-quoting bug in the helper, not a perms issue

# stop-mode self-backup (VM.PowerMgmt test):
$ token POST vzdump vmid=9001 storage=local mode=stop -> HTTP 200 UPID:...vzdump:9001...
root poll: exitstatus "OK" # SUCCEEDED without VM.PowerMgmt in the role
```
### 4.4 Phase 2 — backups

```
# Variant A (running):
$ vzdump 9001 --mode snapshot --storage local --compress zstd
INFO: Total bytes written: 2585589760 (2.5GiB, 108MiB/s)
INFO: archive file size: 934MB
INFO: Finished Backup of VM 9001 (00:00:24) ; WALL_SECONDS=25
-> vzdump-lxc-9001-2026_06_07-20_13_43.tar.zst (979718569 B)

# Variant B (stopped):
$ docker compose stop (cache,db,web Stopped)
$ vzdump 9001 --mode snapshot --storage local --compress zstd
INFO: Total bytes written: 2585825280 (2.5GiB, 119MiB/s)
INFO: Finished Backup of VM 9001 (00:00:21) ; WALL_SECONDS=22
-> vzdump-lxc-9001-2026_06_07-20_14_40.tar.zst (979671582 B)
$ docker compose start (db,cache,web Started)
```
### 4.5 Phase 2 — restores + verification

```
# A -> 9002:
$ pct restore 9002 .../20_13_43.tar.zst --storage local-lvm
Total bytes read: 2585589760 (2.5GiB, 524MiB/s) ; RESTORE_A_SECONDS=12
$ pct config 9002 -> features: nesting=1,keyctl=1 ; unprivileged: 1
$ pct set 9002 -net0 name=eth0,bridge=vmbr0,ip=dhcp # fresh MAC BC:24:11:E3:F4:64
$ pct start 9002 ; docker compose up -d -> 3 running ; curl -> HTTP 200
$ psql SELECT * FROM restore_check -> 42 | phase2-sentinel
db log: "was interrupted ... not properly shut down; automatic recovery in progress
redo starts/redo done ... database system is ready to accept connections"

# B -> 9003:
$ pct restore 9003 .../20_14_40.tar.zst --storage local-lvm
Total bytes read: 2585825280 (2.5GiB, 524MiB/s) ; RESTORE_B_SECONDS=11
$ pct config 9003 -> features: nesting=1,keyctl=1 ; unprivileged: 1
$ pct set 9003 -net0 ... (fresh MAC) ; pct start 9003 ; docker compose up -d -> 3 running ; curl 200
$ psql SELECT * FROM restore_check -> 42 | phase2-sentinel
db log: "database system was shut down at ... ; database system is ready to accept connections" # clean
```

---
## 5. Teardown (executed)

Restore targets destroyed; Phase 1 objects and spike artifacts removed; `9000`/`9001` left **stopped-but-present**. Verified clean: `felhom-ctl@pve` deleted, no spike ACLs, empty `dump/`, `spk1` removed.

> **Correction:** `pveum acl delete` **requires `--roles`** (a call with only the path plus `-user`/`-token` fails with `400 roles: property is missing`). In practice the explicit ACL deletes are unnecessary — deleting the token/user/role **auto-invalidates** the referencing ACLs (PVE logs `ignore invalid acl token …` and drops them).

```bash
pct stop 9002 ; pct stop 9003 ; pct destroy 9002 --purge ; pct destroy 9003 --purge
# correct ACL-delete syntax (needs --roles), or just let user/role deletion clean them:
pveum acl delete /vms/9001 --roles FelhomSelfBackup --users 'felhom-ctl@pve'
pveum acl delete /vms/9001 --roles FelhomSelfBackup --tokens 'felhom-ctl@pve!ctl'
pveum acl delete /storage/local --roles FelhomSelfBackup --users 'felhom-ctl@pve'
pveum acl delete /storage/local --roles FelhomSelfBackup --tokens 'felhom-ctl@pve!ctl'
pveum user token remove felhom-ctl@pve ctl ; pveum user delete felhom-ctl@pve ; pveum role delete FelhomSelfBackup
pct delsnapshot 9001 spk1
rm -f /var/lib/vz/dump/vzdump-lxc-9001-*.tar.zst /var/lib/vz/dump/vzdump-lxc-9001-*.log
pct stop 9001   # back to stopped-but-present
```
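The "verified clean" claims can be re-checked with a short script. A sketch, assuming root on the node; `verify_teardown` is a hypothetical helper, and grepping the human-readable output of `pveum user list`, `pveum acl list`, `pveum role list` and `pct listsnapshot` is an assumption about formatting rather than a stable interface. A failing `pveum`/`pct` call also reads as "clean", so this is a spot check, not proof.

```bash
#!/usr/bin/env bash
# Sketch: re-check the "verified clean" state of §5 (root on the PVE node).
# verify_teardown is a hypothetical helper that greps human-readable output.
set -euo pipefail

# verify_teardown: prints one line per leftover; returns 1 if any remain
verify_teardown() {
  local leftovers=0
  # grep without -q reads all input, so the producer is never cut off early
  if pveum user list | grep -F 'felhom-ctl@pve' > /dev/null; then
    echo "user felhom-ctl@pve still present"; leftovers=1
  fi
  if pveum acl list | grep -F 'felhom-ctl@pve' > /dev/null; then
    echo "ACL for felhom-ctl@pve still present"; leftovers=1
  fi
  if pveum role list | grep -F 'FelhomSelfBackup' > /dev/null; then
    echo "role FelhomSelfBackup still present"; leftovers=1
  fi
  if pct listsnapshot 9001 | grep -F 'spk1' > /dev/null; then
    echo "snapshot spk1 still present"; leftovers=1
  fi
  if compgen -G '/var/lib/vz/dump/vzdump-lxc-9001-*' > /dev/null; then
    echo "9001 dump files remain"; leftovers=1
  fi
  return "$leftovers"
}

# Usage: verify_teardown && echo "clean"
```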
## 6. To destroy 9000/9001 later (NOT run — left stopped-but-present)

```bash
qm destroy 9000 --purge    # VM (Phase 0 subject)
pct destroy 9001 --purge   # LXC (Phase 0/1/2 subject)
# Debian 13 CT template left in place: local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst
```