# Phase 1 + 2 — Privilege Model & Backup/Restore Round-Trip: Findings
**Host:** `demo-felhom` (192.168.0.162) — Proxmox VE 9.2.2, node confirmed via
`pvesh get /nodes` → `demo-felhom`. Storage: `local` (dir, content
`iso,vztmpl,backup,import`), `local-lvm` (LVM-thin, `rootdir,images`).
**Subject:** LXC `9001` (`spike-lxc`, unprivileged, `nesting=1,keyctl=1`, Docker +
postgres/redis/nginx stack). **Date:** 2026-06-07.
> Data and observations only — **no recommendation or verdict**.
## Hypotheses — verdicts at a glance
| | Hypothesis | Result |
|---|---|---|
| **H1** | Backup scopes to one VMID; restore/create needs node/pool allocate → denied to narrow token | **CONFIRMED** (create CT = 403) |
| **H2** | An LXC vzdump captures the Docker volumes (they live in the container rootfs) | **CONFIRMED** (sentinel survived both restores) |
| **H3** | Crash-consistent (running) *and* quiesced (stopped) backups both restore cleanly | **CONFIRMED** (A via WAL recovery, B clean start) |
| **H4** | Running unprivileged LXC snapshots on LVM-thin; restored CT keeps unprivileged+nesting/keyctl | **CONFIRMED** (live snapshot OK; config survived) |
---
## 1. Phase 1 — Privilege model
### 1.1 Setup (operator side, root)
```
pveum role add FelhomSelfBackup -privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit"
pveum user add felhom-ctl@pve --comment "spike in-guest controller"
pveum user token add felhom-ctl@pve ctl --privsep 1 # secret: b6547d9d-... (ephemeral, spike-only)
pveum acl modify /vms/9001 -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
pveum acl modify /storage/local -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
```
Privilege names were verified against `PVEVMAdmin` / `PVEDatastoreUser` via
`pveum role list` first. **Note:** the reference doc's introspection command
`pveum role info <role>` **does not exist in PVE 9** — only `pveum role list` works.
### 1.2 ⚠️ Privsep gotcha — the doc's runbook is incomplete
With `--privsep 1`, a token's effective rights are the **intersection of the backing
user's permissions AND the token's own ACLs**. The reference doc (§3) grants ACLs to the
**token only**. With the user `felhom-ctl@pve` holding **no** permissions, the
intersection was **empty** — the first self-audit call returned:
```
HTTP 403 {"message":"Permission check failed (/vms/9001, VM.Audit)\n"}
```
**Fix applied:** also grant the user the role on the same paths
(`pveum acl modify /vms/9001 -user felhom-ctl@pve -role FelhomSelfBackup`, same for
`/storage/local`). After that the self-calls succeeded. **A privsep token needs the
permission present on *both* the user and the token**: the user ACL sets the ceiling, and
the token ACL narrows the token to a subset of it. Controller provisioning must reflect this.
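The corrected provisioning step can be sketched as a small helper that always issues the user ACL and the token ACL as a pair. This is an illustrative sketch, not the controller's actual provisioning code: the function names and the `DRY_RUN` switch are assumptions introduced here, while the `pveum acl modify` invocations are the ones used in this spike.

```bash
# DRY_RUN is a hypothetical switch for this sketch: with DRY_RUN=1 the
# commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# usage: grant_user_and_token USER TOKEN_NAME ROLE PATH...
grant_user_and_token() {
  local user=$1 token_name=$2 role=$3 path token_id
  shift 3
  # Full token id has the form user@realm!tokenname.
  token_id=$(printf '%s!%s' "$user" "$token_name")
  for path in "$@"; do
    # User ACL: without it the privsep intersection is empty (403 on every call).
    run pveum acl modify "$path" -user "$user" -role "$role"
    # Token ACL: keeps the token scoped to these paths only.
    run pveum acl modify "$path" -token "$token_id" -role "$role"
  done
}
```

For the spike's objects the call would be `grant_user_and_token 'felhom-ctl@pve' ctl FelhomSelfBackup /vms/9001 /storage/local`, run as root on the host.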
### 1.3 Test matrix (every call run from **inside** the unprivileged LXC, `pct exec 9001`)
`H=192.168.0.162 N=demo-felhom AUTH="PVEAPIToken=felhom-ctl@pve!ctl=<secret>"`
| # | Call | Expected | **Actual** | Notes |
|---|---|---|---|---|
| 1 | `GET /version` | 200 | **200** | reachable + auth from inside LXC (no privilege needed) |
| 2 | `GET /nodes/$N/lxc/9001/status/current` | 200 | **200**¹ | self audit (after privsep fix) |
| 3 | `POST /nodes/$N/lxc/9001/snapshot snapname=spk1` | 200/UPID→OK | **200, task exitstatus OK** | **running-LXC self-snapshot (H4)** |
| 4 | `POST /nodes/$N/vzdump vmid=9001 storage=local mode=snapshot` | 200/UPID→OK | **200, task exitstatus OK** | self backup, archive produced |
| 5 | `GET /nodes/$N/qemu/9000/status/current` | 403 | **403** | `Permission check failed (/vms/9000, VM.Audit)` |
| 6 | `POST /nodes/$N/vzdump vmid=9000 storage=local` | 403 | **200 POST → task exitstatus 403**² | see note |
| 7 | `POST /nodes/$N/lxc` (create CT) | 403 | **403** | `Permission check failed`; **proves create/allocate is operator-tier (H1)** |
¹ before the privsep fix this was 403; see §1.2.
² **Important nuance:** the `vzdump` endpoint accepts the POST and returns a UPID even for
an unauthorized vmid; the authorization failure surfaces at **task execution**, not at the
HTTP layer. Polled from root:
`exitstatus: "403 Permission check failed (/vms/9000, VM.Backup)"`, and **no 9000 archive
was created**. The boundary holds — but a controller must **poll the task exitstatus**, not
trust the POST's 200, to know a cross-guest backup was actually refused.
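A minimal sketch of that polling step, assuming the standard task-status endpoint `GET /nodes/{node}/tasks/{upid}/status` and the `H`, `N`, and `AUTH` values from the matrix header. The function names, the `POLL_INTERVAL` variable, and the `sed`-based field extraction are assumptions made for this sketch; the extraction relies on compact JSON, and a real controller would more likely use a proper JSON parser.

```bash
H=192.168.0.162 N=demo-felhom
AUTH='PVEAPIToken=felhom-ctl@pve!ctl=<secret>'

# Network call, isolated so it can be replaced or stubbed.
fetch_task_status() {
  curl -k -sS -H "Authorization: $AUTH" \
    "https://$H:8006/api2/json/nodes/$N/tasks/$1/status"
}

# Crude extraction of one string field from compact JSON on stdin.
json_field() {
  sed -n 's/.*"'"$1"'":"\([^"]*\)".*/\1/p'
}

# usage: wait_task UPID [MAX_POLLS]
# returns 0 only if the task stopped with exitstatus OK,
# 1 on any other exitstatus, 2 on a failed request, 3 on timeout.
wait_task() {
  local upid=$1 tries=${2:-60} body status exitstatus
  while [ "$tries" -gt 0 ]; do
    body=$(fetch_task_status "$upid") || return 2
    status=$(printf '%s' "$body" | json_field status)
    if [ "$status" = stopped ]; then
      exitstatus=$(printf '%s' "$body" | json_field exitstatus)
      if [ "$exitstatus" = OK ]; then return 0; fi
      echo "task failed: $exitstatus" >&2
      return 1
    fi
    sleep "${POLL_INTERVAL:-2}"
    tries=$((tries - 1))
  done
  return 3
}
```

With this shape, call 6 would be reported as a failure (`task failed: 403 Permission check failed ...`) even though its POST returned 200.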
**Pass criteria met:** self-ops (1–4) succeed; cross-guest read (5), cross-guest backup
(6, at task level), and create/allocate (7) are denied. The controller-as-guest boundary
and the two-tier split are validated.
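The HTTP-level column of the matrix can be re-run with a small expected-versus-actual helper. This is a sketch under assumptions: `http_code` and `expect_code` are hypothetical names, the token secret is a placeholder, and `-k` is used only because the host serves the default certificate (see §1.5).

```bash
H=192.168.0.162 N=demo-felhom
AUTH='PVEAPIToken=felhom-ctl@pve!ctl=<secret>'

# Prints only the HTTP status code of one API call.
# usage: http_code METHOD PATH [extra curl args...]
http_code() {
  local method=$1 path=$2
  shift 2
  curl -k -s -o /dev/null -w '%{http_code}' -X "$method" \
    -H "Authorization: $AUTH" "$@" "https://$H:8006/api2/json$path"
}

# usage: expect_code WANT METHOD PATH [extra curl args...]
expect_code() {
  local want=$1 got
  shift
  got=$(http_code "$@")
  if [ "$got" = "$want" ]; then
    echo "PASS $1 $2 -> $got"
  else
    echo "FAIL $1 $2 -> $got (expected $want)"
    return 1
  fi
}

# Example rows (not executed here):
#   expect_code 200 GET "/nodes/$N/lxc/9001/status/current"    # row 2
#   expect_code 403 GET "/nodes/$N/qemu/9000/status/current"   # row 5
#   expect_code 403 POST "/nodes/$N/lxc"                        # row 7
```

The HTTP code alone is not sufficient for the POST rows that return a UPID (3, 4, 6); as footnote ² notes, their real result is the task exitstatus.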
### 1.4 Final minimal role — `VM.PowerMgmt` **not** required
The doc left an open question: "does Tier A need `VM.PowerMgmt` for stop-mode backups? Likely
yes". **Tested and refuted:** a **stop-mode** self-vzdump submitted by the token
(`vmid=9001 mode=stop`) completed with **`exitstatus: OK`** using the role *without*
`VM.PowerMgmt`. `vzdump` performs the guest shutdown/restart internally under
`VM.Backup`; no separate power privilege is needed.
> **Final minimal role (`FelhomSelfBackup`) — satisfies self-audit, self-snapshot, and
> both `snapshot`- and `stop`-mode self-backup:**
> `VM.Audit, VM.Snapshot, VM.Backup, Datastore.AllocateSpace, Datastore.Audit`
> (`VM.PowerMgmt` deliberately omitted — confirmed unnecessary.)
### 1.5 TLS observation
From inside the LXC, `curl` **without** `-k`:
```
curl: (60) SSL certificate problem: unable to get local issuer certificate
```
The host serves the default self-signed PVE cert; all tests used `-k`. Production trust
(pin the PVE CA / issue a proper cert) is a separate design decision, flagged here.
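If the CA-pinning option were chosen, the in-guest calls could look roughly like the sketch below. It assumes a copy of the host's CA certificate (`/etc/pve/pve-root-ca.pem` on a PVE node) has been placed inside the guest, and that the node certificate's subject alternative names cover the address used. `PVE_CACERT` and `pve_curl` are hypothetical names, and this was not tested in the spike.

```bash
# PVE_CACERT: hypothetical variable holding the path to the CA copy in the guest.
pve_curl() {
  if [ ! -r "${PVE_CACERT:-}" ]; then
    echo "PVE_CACERT unset or unreadable; refusing to fall back to -k" >&2
    return 1
  fi
  # --cacert makes curl verify the server against this CA only.
  curl --cacert "$PVE_CACERT" -sS "$@"
}
```

Failing closed when the CA file is missing avoids a silent return to `-k` behaviour.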
### 1.6 Running-LXC snapshot (H4)
Call #3 snapshotted the **running** unprivileged LXC on LVM-thin (`exitstatus OK`).
`pct listsnapshot 9001` shows `spk1` with `pct status 9001 = running`. **No stop
required** — the snapshot-before-update rollback flow is viable on a live container.
---
## 2. Phase 2 — Backup → real restore round-trip
Sentinel written pre-flight into the `pgdata` volume:
`restore_check(42,'phase2-sentinel')` → clean read `42|phase2-sentinel`.
### 2.1 Backups (operator/root side)
| Variant | Mode | Stack state | Task time | Wall | Archive | Size (zstd) |
|---|---|---|---|---|---|---|
| **A — crash-consistent** | `snapshot` | **running** | 00:00:24 | 25 s | `vzdump-lxc-9001-2026_06_07-20_13_43.tar.zst` | **934 MB** (979,718,569 B) |
| **B — quiesced** | `snapshot` | **stopped** (`docker compose stop`) | 00:00:21 | 22 s | `vzdump-lxc-9001-2026_06_07-20_14_40.tar.zst` | **934 MB** (979,671,582 B) |
Both from a source that vzdump reports as 2.5GiB (about 2.59 GB written); zstd → ~934 MB (~2.6:1 on the raw byte counts in §4.4). The stack was restarted after
Variant B. **LXC snapshot-mode vzdump does *not* fsfreeze** (no guest agent in an LXC —
consistent with the Phase 0 finding) → Variant A is genuinely crash-consistent.
### 2.2 Restore → fresh VMID → boot → verify
| Check | 9002 (Variant A) | 9003 (Variant B) |
|---|---|---|
| Restore time (`pct restore … --storage local-lvm`) | **12 s** | **11 s** |
| `unprivileged: 1` survived | **yes** | **yes** |
| `features: nesting=1,keyctl=1` survived | **yes** | **yes** |
| Containers after boot | `exited` (no restart policy) → `docker compose up -d` | same |
| 3 containers healthy | **yes** | **yes** |
| `curl localhost:8080` | **HTTP 200** | **HTTP 200** |
| **Sentinel `(42,'phase2-sentinel')`** | **PRESENT** | **PRESENT** |
| Postgres first-start | **WAL crash recovery** (see below) | **clean start, no recovery** |
> Restored CTs inherit 9001's fixed `hwaddr`. To avoid a MAC clash with the still-running
> 9001 on `vmbr0`, `net0` was reset to auto-generate a fresh MAC before boot. All
> verification (stack health, `curl localhost`, sentinel) is guest-internal and needs no
> external network — and the Docker images are inside the restored rootfs, so no pulls.
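One way to do the MAC reset generically is to drop the `hwaddr=` key from the restored `net0` string and write it back, so that PVE generates a fresh address. The sketch below is illustrative: `strip_hwaddr` and `reset_mac` are hypothetical helper names, and only `net0` is handled.

```bash
# Remove the hwaddr=... key from a comma-separated net0 option string.
strip_hwaddr() {
  printf '%s\n' "$1" | sed -E 's/(^|,)hwaddr=[^,]*//; s/^,//'
}

# usage: reset_mac VMID   (run as root on the host, before first boot)
reset_mac() {
  local vmid=$1 net0
  net0=$(pct config "$vmid" | sed -n 's/^net0: //p')
  [ -n "$net0" ] || { echo "no net0 on CT $vmid" >&2; return 1; }
  pct set "$vmid" -net0 "$(strip_hwaddr "$net0")"
}
```

For example, `strip_hwaddr 'name=eth0,bridge=vmbr0,hwaddr=BC:24:11:00:00:01,ip=dhcp'` (an illustrative MAC) yields `name=eth0,bridge=vmbr0,ip=dhcp`, which matches the `pct set` form used in §4.5.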
**Variant A — Postgres automatic WAL recovery on 9002 (verbatim, post-restore boot):**
```
LOG: database system was interrupted; last known up at 2026-06-07 18:13:21 UTC
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/CB12838
LOG: invalid record length at 0/CB12870: expected at least 24, got 0 # normal end-of-WAL
LOG: redo done at 0/CB12838 ...
LOG: checkpoint starting: end-of-recovery immediate wait
LOG: database system is ready to accept connections
```
**Variant B — clean start on 9003 (verbatim, post-restore boot):**
```
LOG: database system was shut down at 2026-06-07 18:14:39 UTC
LOG: database system is ready to accept connections
```
**H2 confirmed:** one LXC vzdump captured the whole customer including the Docker named
volume — the sentinel data restored in both guests. **H3 confirmed:** both variants
restored to a bootable guest with intact data; the crash-consistent one recovered via WAL
with no manual intervention, the quiesced one started clean. **H4 confirmed:** restored
config preserved `unprivileged` + `nesting/keyctl`, so Docker ran in the restored CT.
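The config part of the H4 check can be automated by inspecting `pct config` output after each restore. A minimal sketch, assuming the output format seen in §4.5 (`features: nesting=1,keyctl=1` and `unprivileged: 1`); `check_ct_config` is a hypothetical name.

```bash
# Reads `pct config <vmid>` output on stdin; fails if the restored CT lost
# the unprivileged flag or the nesting/keyctl features Docker depends on.
check_ct_config() {
  local cfg features
  cfg=$(cat)
  if ! printf '%s\n' "$cfg" | grep -qx 'unprivileged: 1'; then
    echo "unprivileged flag missing" >&2
    return 1
  fi
  features=$(printf '%s\n' "$cfg" | sed -n 's/^features: //p')
  case ",$features," in
    *,nesting=1,*) ;;
    *) echo "nesting=1 missing" >&2; return 1 ;;
  esac
  case ",$features," in
    *,keyctl=1,*) ;;
    *) echo "keyctl=1 missing" >&2; return 1 ;;
  esac
}
```

Usage on the host would be `pct config 9002 | check_ct_config`, with a non-zero exit status signalling that the restored guest should not be booted as-is.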
---
## 3. Observations & confounds
1. **Privsep token needs perms on user *and* token** (§1.2) — the single most important
correction to the reference runbook; without it every scoped call 403s.
2. **vzdump authorization is task-level, not POST-level** (§1.3 note ²) — a 200 + UPID
does **not** mean authorized. The controller must poll `exitstatus`. This is also the
general async-task lesson: every backup/snapshot/restore returns a UPID and the real
result is in the task status.
3. **`pveum role info` is gone in PVE 9** — use `pveum role list`. Minor doc drift.
4. **`VM.PowerMgmt` not needed for stop-mode backup** (§1.4) — narrower role than the doc
assumed.
5. **No fsfreeze for LXC** — Variant A relied on Postgres's own WAL crash recovery, which
worked here for an idle-at-backup DB. Under heavy write load, app-consistency for LXC
still rests on the controller quiescing first (or stop-mode), exactly as the reference
warned. This single test is not a durability guarantee under load.
6. **Restore MAC collision** (§2.2) — `pct restore` preserves the source `hwaddr`;
restoring while the original runs needs a MAC reset (or the original stopped). The
controller's restore flow must handle identity (MAC/hostname/IP) to avoid clashes.
7. **No restart policy on the compose services** — restored containers came up `exited`;
`docker compose up -d` (or a restart policy / systemd unit) is required for the stack
to return automatically after a restore or guest reboot.
8. **Restore is fast, backup dominated by I/O:** restores were 11–12 s (extract at
~524 MiB/s); backups ~22–25 s (read 2.5 GiB at ~108–119 MiB/s + zstd). Single runs,
idle host, ~150 MB DB; not a throughput benchmark.
9. **Sequencing artifact:** a Phase-1 stop-mode self-backup ran before Phase 2 and
stopped/started 9001; the stack was brought back up and the sentinel re-verified
before the Variant A/B backups, so it does not affect the round-trip results.
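For observation 7, one possible stopgap is to set a restart policy on the existing containers with `docker update`. This is a sketch, not something run in the spike: `set_restart_policy` is a hypothetical name and it must be run from the compose project directory. The default of `always` is an assumption chosen because Docker does not restart explicitly stopped containers under `unless-stopped`, which is the state a quiesced (Variant B) backup captures. A `restart:` key in the compose file would be the more durable fix, since recreated containers lose a policy set this way.

```bash
# usage: set_restart_policy [POLICY]   (run in the compose project directory)
set_restart_policy() {
  local policy=${1:-always} ids
  # All containers of the project, including stopped ones.
  ids=$(docker compose ps -aq) || return 1
  if [ -z "$ids" ]; then
    echo "no compose containers found" >&2
    return 1
  fi
  # Word splitting of the id list is intended here.
  # shellcheck disable=SC2086
  docker update --restart "$policy" $ids
}
```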
---
## 4. Raw command log (appendix)
### 4.1 Pre-flight
```
$ pvesh get /nodes -> node: demo-felhom
$ cat /etc/pve/storage.cfg
dir: local ... content iso,vztmpl,backup,import # 'backup' present
lvmthin: local-lvm ... content rootdir,images # no backup (expected)
$ pct start 9001 ; docker compose up -d -> 3 containers Started
$ curl localhost:8080 -> HTTP 200
# sentinel:
CREATE TABLE ; INSERT 0 1 ; SELECT count -> 1 ; SELECT * -> 42 | phase2-sentinel
```
### 4.2 Phase 1 — role/user/token/ACL
```
$ pveum role add FelhomSelfBackup -privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit" -> role-ok
$ pveum user add felhom-ctl@pve --comment "spike in-guest controller" -> user-ok
$ pveum user token add felhom-ctl@pve ctl --privsep 1
{"full-tokenid":"felhom-ctl@pve!ctl","info":{"privsep":"1"},"value":"b6547d9d-08ec-4f22-beb8-a551dc2cd69d"}
$ pveum acl modify /vms/9001 -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup -> ok
$ pveum acl modify /storage/local -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup -> ok
$ pveum role list | grep FelhomSelfBackup
FelhomSelfBackup | Datastore.AllocateSpace,Datastore.Audit,VM.Audit,VM.Backup,VM.Snapshot
$ pveum role info FelhomSelfBackup -> ERROR: unknown command 'pveum role info' # PVE9 has no 'role info'
```
### 4.3 Phase 1 — matrix (from inside LXC)
```
# TLS without -k:
curl: (60) SSL certificate problem: unable to get local issuer certificate
# BEFORE privsep fix:
#2 GET self status -> HTTP 403 {"message":"Permission check failed (/vms/9001, VM.Audit)\n"}
# privsep fix:
$ pveum acl modify /vms/9001 -user 'felhom-ctl@pve' -role FelhomSelfBackup -> ok
$ pveum acl modify /storage/local -user 'felhom-ctl@pve' -role FelhomSelfBackup -> ok
# AFTER fix:
#1 GET /version -> HTTP 200
#2 GET /nodes/.../lxc/9001/status/current -> HTTP 200 {"data":{...,"status":"running",...}}
#5 GET /nodes/.../qemu/9000/status/current -> HTTP 403 (/vms/9000, VM.Audit)
#6 POST vzdump vmid=9000 -> HTTP 200 {"data":"UPID:...vzdump:9000:felhom-ctl@pve!ctl:"}
root poll: exitstatus="403 Permission check failed (/vms/9000, VM.Backup)"
task log: TASK ERROR: 403 Permission check failed (/vms/9000, VM.Backup)
/var/lib/vz/dump: no 9000 archive created
#7 POST /nodes/.../lxc (create CT vmid=9009) -> HTTP 403 {"message":"Permission check failed\n"}
#3 POST lxc/9001/snapshot snapname=spk1 -> HTTP 200 UPID:...vzsnapshot:9001...
root: exitstatus "OK" ; pct listsnapshot 9001 -> spk1 ; pct status 9001 -> running
#4 POST vzdump vmid=9001 storage=local mode=snapshot -> HTTP 200 UPID:...vzdump:9001...
root: exitstatus "OK"
token can read own task status: HTTP 200 {"...exitstatus":"OK"} # earlier poll TIMEOUTs were a shell-quoting bug in the helper, not a perms issue
# stop-mode self-backup (VM.PowerMgmt test):
$ token POST vzdump vmid=9001 storage=local mode=stop -> HTTP 200 UPID:...vzdump:9001...
root poll: exitstatus "OK" # SUCCEEDED without VM.PowerMgmt in the role
```
### 4.4 Phase 2 — backups
```
# Variant A (running):
$ vzdump 9001 --mode snapshot --storage local --compress zstd
INFO: Total bytes written: 2585589760 (2.5GiB, 108MiB/s)
INFO: archive file size: 934MB
INFO: Finished Backup of VM 9001 (00:00:24) ; WALL_SECONDS=25
-> vzdump-lxc-9001-2026_06_07-20_13_43.tar.zst (979718569 B)
# Variant B (stopped):
$ docker compose stop (cache,db,web Stopped)
$ vzdump 9001 --mode snapshot --storage local --compress zstd
INFO: Total bytes written: 2585825280 (2.5GiB, 119MiB/s)
INFO: Finished Backup of VM 9001 (00:00:21) ; WALL_SECONDS=22
-> vzdump-lxc-9001-2026_06_07-20_14_40.tar.zst (979671582 B)
$ docker compose start (db,cache,web Started)
```
### 4.5 Phase 2 — restores + verification
```
# A -> 9002:
$ pct restore 9002 .../20_13_43.tar.zst --storage local-lvm
Total bytes read: 2585589760 (2.5GiB, 524MiB/s) ; RESTORE_A_SECONDS=12
$ pct config 9002 -> features: nesting=1,keyctl=1 ; unprivileged: 1
$ pct set 9002 -net0 name=eth0,bridge=vmbr0,ip=dhcp # fresh MAC BC:24:11:E3:F4:64
$ pct start 9002 ; docker compose up -d -> 3 running ; curl -> HTTP 200
$ psql SELECT * FROM restore_check -> 42 | phase2-sentinel
db log: "was interrupted ... not properly shut down; automatic recovery in progress
redo starts/redo done ... database system is ready to accept connections"
# B -> 9003:
$ pct restore 9003 .../20_14_40.tar.zst --storage local-lvm
Total bytes read: 2585825280 (2.5GiB, 524MiB/s) ; RESTORE_B_SECONDS=11
$ pct config 9003 -> features: nesting=1,keyctl=1 ; unprivileged: 1
$ pct set 9003 -net0 ... (fresh MAC) ; pct start 9003 ; docker compose up -d -> 3 running ; curl 200
$ psql SELECT * FROM restore_check -> 42 | phase2-sentinel
db log: "database system was shut down at ... ; database system is ready to accept connections" # clean
```
---
## 5. Teardown (executed — see §6 for what was left)
Restore targets destroyed; Phase 1 objects and spike artifacts removed; `9000`/`9001`
left **stopped-but-present**.
```bash
pct destroy 9002 --purge ; pct destroy 9003 --purge
pveum acl delete /vms/9001 -user 'felhom-ctl@pve' ; pveum acl delete /vms/9001 -token 'felhom-ctl@pve!ctl'
pveum acl delete /storage/local -user 'felhom-ctl@pve' ; pveum acl delete /storage/local -token 'felhom-ctl@pve!ctl'
pveum user token remove felhom-ctl@pve ctl ; pveum user delete felhom-ctl@pve ; pveum role delete FelhomSelfBackup
pct delsnapshot 9001 spk1
rm -f /var/lib/vz/dump/vzdump-lxc-9001-*.tar.zst /var/lib/vz/dump/vzdump-lxc-9001-*.log
pct stop 9001 # back to stopped-but-present
```
## 6. To destroy 9000/9001 later (NOT run — left stopped-but-present)
```bash
qm destroy 9000 --purge # VM (Phase 0 subject)
pct destroy 9001 --purge # LXC (Phase 0/1/2 subject)
# Debian 13 CT template left in place: local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst
```