diff --git a/docs/proxmox-platform.md b/docs/proxmox-platform.md
index e7f1cc0..8303844 100644
--- a/docs/proxmox-platform.md
+++ b/docs/proxmox-platform.md
@@ -207,6 +207,39 @@ Poll `GET /nodes//tasks//status` until `status: stopped`, then read
 (The task owner — including a token — can read its own task status: 200.)
 
+### 3.6 Operator-tier agent role & root-vs-API boundary (validated)
+The operator-tier **host agent** (`03-host-agent.md`) needs a far broader role than the
+Phase-1 *guest self-backup* role (which is denied create/allocate — §3.4). The minimal role
+that drives the full guest lifecycle via an API token, validated by paring
+[[phase3 §B3](tests/phase3-findings.md)]:
+
+> **`FelhomAgent` (operator-tier, 16 privileges):**
+> `VM.Allocate, VM.Audit, VM.Config.Disk, VM.Config.CPU, VM.Config.Memory, VM.Config.Network,
+> VM.Config.Options, VM.PowerMgmt, VM.Snapshot, VM.Snapshot.Rollback, VM.Backup,
+> Datastore.Allocate, Datastore.AllocateSpace, Datastore.Audit, Sys.Audit, SDN.Use`
+>
+> Paring proved: `SDN.Use` is **required** (PVE 9 gates bridge use; omitting it → `403
+> (/sdn/zones/localnetwork/vmbr0, SDN.Use)`); `Sys.Audit` required for host metrics
+> (`GET /nodes//status`); `VM.Config.Network`/`VM.Config.Options` required for NIC/onboot
+> config; `Datastore.AllocateTemplate` **not** needed (drop it). NB `VM.Config.CPUMemory` is
+> not a real privilege — it is `VM.Config.CPU` + `VM.Config.Memory`.
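
The paring result above can be checked mechanically. The following is a minimal sketch, not part of the validated tooling: it compares a candidate privilege list against the 16 validated privileges. The names `FELHOM_AGENT_PRIVS` and `diff_role` are hypothetical, and the candidate shown is an illustrative pre-paring list (it carries `Datastore.AllocateTemplate` and lacks `SDN.Use`).

```python
# Validated operator-tier privileges, copied from the FelhomAgent role above.
FELHOM_AGENT_PRIVS = frozenset({
    "VM.Allocate", "VM.Audit", "VM.Config.Disk", "VM.Config.CPU",
    "VM.Config.Memory", "VM.Config.Network", "VM.Config.Options",
    "VM.PowerMgmt", "VM.Snapshot", "VM.Snapshot.Rollback", "VM.Backup",
    "Datastore.Allocate", "Datastore.AllocateSpace", "Datastore.Audit",
    "Sys.Audit", "SDN.Use",
})


def diff_role(candidate_privs):
    """Report privileges missing from, or extra to, the validated set.

    `diff_role` is a hypothetical helper for illustration only.
    """
    candidate = set(candidate_privs)
    return {
        "missing": sorted(FELHOM_AGENT_PRIVS - candidate),
        "extra": sorted(candidate - FELHOM_AGENT_PRIVS),
    }


# Illustrative pre-paring candidate: has AllocateTemplate, lacks SDN.Use.
candidate = (FELHOM_AGENT_PRIVS - {"SDN.Use"}) | {"Datastore.AllocateTemplate"}
report = diff_role(candidate)
print(report)
```

A check like this could run before `pveum role add`, so that a typo such as `VM.Config.CPUMemory` shows up as an "extra" entry rather than as a later 403.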
+
+**Root-vs-API boundary** [[phase3 §B3](tests/phase3-findings.md)] — nearly the entire guest
+lifecycle, **including restore**, is API-token-covered; the genuine OS-root residual is narrow:
+
+| Operation | Coverage |
+|---|---|
+| Create LXC (nesting-only), config, allocate, start/stop, snapshot/rollback, vzdump, **restore**, destroy, add storage definition, host metrics | **scoped API token** (the `FelhomAgent` role) |
+| ⚠️ **Create LXC with `keyctl=1`** (Docker needs it — §2.3) | **OS root `root@pam` only** |
+| USB physical mount-by-UUID / systemd mount unit / fstab; SMART/sensors | OS root / narrow sudoers |
+
+> ⚠️ **`keyctl=1` (and any feature flag except `nesting`) can be set only by an actual
+> `root@pam` session** — `changing feature flags (except nesting) is only allowed for
+> root@pam`. **No API token qualifies**, not even a non-privsep `root@pam` token (same 403).
+> So *fresh provisioning* of a Docker-capable LXC needs `pct create` as OS root (or a narrow
+> sudoers entry). **Restore is exempt:** a token-authorized `vzrestore` **preserves
+> `keyctl=1`** from the archive — the DR path needs no root.
+
 ---
 
 ## 4. Backup & restore (`vzdump` / `pct restore`)
@@ -267,6 +300,29 @@ snapshot-before-change rollback flow.
 findings above are for `vzdump` to a `dir` storage. PBS (dedup, incremental, remote, dirty-
 bitmap) is pending.
 
+### 4.7 vzdump scope by LXC mount type (validated)
+A stop-mode `vzdump` includes/excludes each LXC mount point by **type and the `backup` flag**
+[[phase3 §B2](tests/phase3-findings.md)]. Validated three ways (vzdump log, archive grep,
+restore):
+
+| Location | `backup` flag | In the vzdump? |
+|---|---|---|
+| rootfs (and anything inside it) | — | **included** (always) |
+| **Docker named volume** (default driver) | — | **included** — it lives in the rootfs (`/var/lib/docker/volumes//_data`) |
+| volume mount point (`mpN`) | `backup=1` | included |
+| volume mount point (`mpN`) | `backup=0` | **excluded** (vol recreated empty on restore) |
+| bind mount point (`mpN: /host/path`) | n/a | **excluded** ("not a volume"); data is *not* in the archive |
+
+> ⚠️ **The `backup=` flag is honoured ONLY for *volume* mount points.** A **Docker
+> named volume is in the rootfs and is always captured** — so a "bulk" volume left as a
+> default named volume is silently swept into the whole-guest image. To keep bulk data **out**,
+> realize it as a dedicated `backup=0` volume mount point (proven recipe:
+> `pct set -mpN :,mp=/mnt/bulk,backup=0` then
+> `docker volume create --driver local -o type=none -o o=bind -o device=/mnt/bulk bulkvol`).
+> A **bind mount's** data is excluded from the archive entirely; on same-host restore it
+> reappears only because the bind config re-attaches the same host dir — on a *different* host
+> (true DR) it is gone unless backed up separately.
+
 ---
 
 ## 5. Gotchas & operational notes (quick reference)
@@ -283,6 +339,9 @@ bitmap) is pending.
 | **`pveum role info` gone** | use `pveum role list` in PVE 9 | [phase1-2 §1.1](tests/phase1-2-findings.md) |
 | **`pveum acl delete` needs `--roles`** | bare `-user`/`-token` path errors `400 roles: property is missing` | [phase1-2 §5](tests/phase1-2-findings.md) |
 | **`VM.PowerMgmt` not needed** | stop-mode backup works under `VM.Backup` alone | [phase1-2 §1.4](tests/phase1-2-findings.md) |
+| **`keyctl=1` is root-only** | feature flags except `nesting` need a `root@pam` session; no API token (even root's) can set them; restore preserves them | [phase3 §B3](tests/phase3-findings.md) |
+| **`SDN.Use` gates bridge use** | PVE 9 needs `SDN.Use` to attach a NIC to `vmbr0`; omit it → 403 | [phase3 §B3](tests/phase3-findings.md) |
+| **Docker named vol = always backed up** | named volumes live in rootfs; only *volume mountpoints* honour `backup=0`; bulk must be a dedicated `backup=0` mp | [phase3 §B2](tests/phase3-findings.md) |
 
 ---
 
@@ -301,6 +360,8 @@ bitmap) is pending.
 | Running, unprivileged LXC snapshots on LVM-thin (no stop) | [phase1-2 §1.6](tests/phase1-2-findings.md) |
 | `vzdump` → `pct restore` round-trip; one backup captures Docker volumes; config survives | [phase1-2 §2](tests/phase1-2-findings.md) |
 | Crash-consistent restore recovers via Postgres WAL; quiesced restores clean | [phase1-2 §2.2](tests/phase1-2-findings.md) |
+| LXC vzdump scope by mount type; `backup=0` excludes volume mps; Docker named vols ride rootfs; proven bulk-exclusion recipe | [phase3 §B2](tests/phase3-findings.md) |
+| Operator agent role (16 privs); guest lifecycle incl. restore is API-token-covered; `keyctl` create is `root@pam`-only | [phase3 §B3](tests/phase3-findings.md) |
 
 ### Not yet validated (do not assume)
 | Open item | Why it matters |
diff --git a/docs/tests/phase3-findings.md b/docs/tests/phase3-findings.md
new file mode 100644
index 0000000..4e61b35
--- /dev/null
+++ b/docs/tests/phase3-findings.md
@@ -0,0 +1,234 @@
+# Phase 3 — vzdump exclusion (B2) & agent operator role + root boundary (B3): Findings
+
+**Host:** `demo-felhom` (192.168.0.162) — Proxmox VE 9.2.2, node confirmed via
+`pvesh get /nodes` → `demo-felhom`. **Date:** 2026-06-08. Throwaway resources (VMIDs
+9010-9023, role/user `FelhomAgent`/`felhom-agent@pve`); all torn down (only the pre-existing
+9000/9001 remain, stopped). Every Proxmox op polled to `task exitstatus` (not the POST
+return).
+
+> Validates the two items the design review (`_design-review.md`) flagged as unvalidated:
+> **B2** (what vzdump includes/excludes per LXC mount type + how to keep bulk out) and **B3**
+> (the least-privilege operator role + the root-vs-API boundary). Data only.
+
+---
+
+## B2 — vzdump inclusion/exclusion matrix
+
+**Setup:** one unprivileged LXC `9010` (`nesting=1,keyctl=1`, overlayfs), Docker 29.5.3
+installed, with six sentinels:
+
+| # | location | config |
+|---|---|---|
+| 1 | rootfs file `/SENTINEL_ROOTFS` | rootfs (`local-lvm:8`) |
+| 2 | Docker **named** volume `b2vol` → `SENTINEL_DOCKERVOL` | default driver |
+| 3 | `mp1` volume mount `/mnt/mp1` `SENTINEL_MP1` | `local-lvm:1,backup=1` |
+| 4 | `mp2` volume mount `/mnt/mp2` `SENTINEL_MP2` | `local-lvm:1,backup=0` |
+| 5 | `mp3` **bind** mount `/mnt/mp3` `SENTINEL_MP3` | host `/root/b2-bindsrc` |
+| 6 | bulk Docker vol `bulkvol` bound onto mp2 → `SENTINEL_BULK` | `--driver local -o type=none -o o=bind -o device=/mnt/mp2` |
+
+**The "trap" confirmed at setup:** the Docker named volume's on-disk path is
+`/var/lib/docker/volumes/b2vol/_data` — **inside the LXC rootfs**.
+
+### Result matrix (stop-mode vzdump → `local`, verified 3 ways: vzdump log, archive grep, restore to 9011)
+
+| Sentinel | location | flag | **in archive?** | restored 9011 |
+|---|---|---|---|---|
+| `SENTINEL_ROOTFS` | rootfs | — | **INCLUDED** | present |
+| `SENTINEL_DOCKERVOL` | Docker named vol (in rootfs) | — | **INCLUDED** ⚠️ the trap | present |
+| `SENTINEL_MP1` | volume mp | `backup=1` | **INCLUDED** | present |
+| `SENTINEL_MP2` | volume mp | `backup=0` | **EXCLUDED** | absent (vol recreated empty) |
+| `SENTINEL_MP3` | bind mount | n/a | **EXCLUDED** | reappears via re-bind only¹ |
+| `SENTINEL_BULK` | Docker vol on mp2 | `backup=0` | **EXCLUDED** | absent |
+
+¹ The bind-mount **data is not in the archive** (archive grep shows no mp3 path). It
+reappears in the restored 9011 only because `pct restore` preserves the bind config
+`mp3: /root/b2-bindsrc` and re-attaches the **same host dir**. On a *different* host (true DR)
+the bind data would be gone unless backed up separately — important for DR planning.
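
The result matrix can be read as a small decision rule. The sketch below is an illustration under stated assumptions, not Proxmox's own logic: `vzdump_verdict` is a hypothetical helper, it treats any volume spec starting with `/` as a bind mount (device mount points are not distinguished and were not tested), and it returns `None` when the `backup` flag is absent because that case is not covered by these findings.

```python
def vzdump_verdict(key, value):
    """Classify one LXC mount entry per the result matrix above.

    Returns (included, reason); included is None for untested cases.
    Hypothetical helper: it mirrors the findings, not vzdump internals.
    """
    if key == "rootfs":
        return True, "rootfs is always included"
    # A mount spec looks like "<volume>,opt=val,..."; split the volume off.
    volume, _, rest = value.partition(",")
    options = dict(item.split("=", 1) for item in rest.split(",") if "=" in item)
    if volume.startswith("/"):
        # Assumption: a leading "/" means a host path, i.e. a bind mount.
        return False, "bind mount point (not a volume)"
    flag = options.get("backup")
    if flag == "1":
        return True, "volume mount point with backup=1"
    if flag == "0":
        return False, "volume mount point with backup=0 (disabled)"
    return None, "backup flag absent: not covered by these findings"


# Mount entries from the B2 setup (container 9010).
setup = {
    "rootfs": "local-lvm:8",
    "mp1": "local-lvm:1,mp=/mnt/mp1,backup=1",
    "mp2": "local-lvm:1,mp=/mnt/mp2,backup=0",
    "mp3": "/root/b2-bindsrc,mp=/mnt/mp3",
}
for key, value in setup.items():
    print(key, vzdump_verdict(key, value))
```

Note that a Docker named volume never appears as its own entry here: it lives inside the rootfs, so it inherits the rootfs verdict.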
+
+**vzdump log (verbatim) — the authoritative per-mount decision:**
+```
+INFO: including mount point rootfs ('/') in backup
+INFO: including mount point mp1 ('/mnt/mp1') in backup
+INFO: excluding volume mount point mp2 ('/mnt/mp2') from backup (disabled)
+INFO: excluding bind mount point mp3 ('/mnt/mp3') from backup (not a volume)
+```
+**Archive contents (verbatim) — `tar --zstd -tf … | grep SENTINEL`:**
+```
+./var/lib/docker/volumes/b2vol/_data/SENTINEL_DOCKERVOL
+./SENTINEL_ROOTFS
+./mnt/mp1/SENTINEL_MP1
+```
+**Restore verification (verbatim) — sentinels in restored 9011:**
+```
+PRESENT : /SENTINEL_ROOTFS
+PRESENT : /var/lib/docker/volumes/b2vol/_data/SENTINEL_DOCKERVOL
+PRESENT : /mnt/mp1/SENTINEL_MP1
+ABSENT : /mnt/mp2/SENTINEL_MP2
+ABSENT : /mnt/mp2/SENTINEL_BULK
+PRESENT : /mnt/mp3/SENTINEL_MP3 # via re-bind to same host dir, NOT from archive
+```
+
+### Proven bulk-exclusion recipe
+A "bulk" Docker volume is kept out of the guest vzdump by binding it onto a **volume
+mountpoint with `backup=0`**:
+1. Attach a Proxmox volume mountpoint with the flag:
+   `pct set -mpN :,mp=/mnt/bulk,backup=0`
+2. Realize the Docker volume on that path:
+   `docker volume create --driver local -o type=none -o o=bind -o device=/mnt/bulk bulkvol`
+   (or a compose bind to `/mnt/bulk`).
+3. Data written through `bulkvol` lands on the `backup=0` mountpoint → **excluded** from
+   vzdump, while rootfs/hot sentinels are **included**. Verified: `SENTINEL_BULK` absent from
+   archive and restore; `SENTINEL_ROOTFS` present.
+
+### The trap, stated for the placement component
+`backup=` is **only honoured for volume mount points** (confirmed: pct manpage +
+vzdump log "excluding volume mount point … (disabled)"). A Docker **named volume uses the
+default driver and lands in the rootfs**, which is **always backed up** — so a "bulk" volume
+left as an ordinary named volume is **silently swept into the whole-guest image**. The
+per-volume placement component **must** realize every `bulk` volume as a dedicated `backup=0`
+mountpoint (or external bind mount), never a default named volume.
+
+---
+
+## B3 — agent operator role + root-vs-API boundary
+
+**Caveat applied (Phase 1):** privsep token needs the role on **both** user and token. Setup:
+user `felhom-agent@pve` + privsep token `agent`, role `FelhomAgent`, dual-granted at `/`.
+All ops driven **as the token** via the REST API; task `exitstatus` polled.
+
+> ⚠️ **Terminology:** the Phase-1 `FelhomSelfBackup` role is the discarded **guest-side
+> self-backup** role (scoped to one guest, *denied* create/allocate). `FelhomAgent` here is
+> its **operator-tier replacement** — a different, broader role. Do not conflate.
+
+### Op matrix (as the scoped token)
+
+| # | Operation | API call | Result |
+|---|---|---|---|
+| read | host status | `GET /nodes/$N/status` | **200** (needs `Sys.Audit`) |
+| read | storage list | `GET /storage` | **200** (`Datastore.Audit`) |
+| 1 | **create LXC, `nesting=1,keyctl=1`** | `POST /nodes/$N/lxc` | **403** — `changing feature flags (except nesting) is only allowed for root@pam` |
+| 1′ | create LXC, **nesting-only** | `POST /nodes/$N/lxc` | **200 / OK** |
+| 2 | set config (mem/cpu/options + mountpoint w/ `backup` flag) | `PUT /nodes/$N/lxc//config` | **200** |
+| 3 | allocate volume | `POST /nodes/$N/storage/local-lvm/content` | **200** (`Datastore.AllocateSpace`) |
+| 4 | start | `POST …/status/start` | **OK** (`VM.PowerMgmt`) |
+| 5 | stop | `POST …/status/stop` | **OK** |
+| 6a | snapshot | `POST …/snapshot` | **OK** (`VM.Snapshot`) |
+| 6b | rollback | `POST …/snapshot/s1/rollback` | **OK** (`VM.Snapshot.Rollback`) |
+| 7 | stop-mode backup | `POST /nodes/$N/vzdump mode=stop` | **OK** (`VM.Backup`) |
+| 8 | restore → fresh vmid | `POST /nodes/$N/lxc restore=1` | **OK** — and **restored CT kept `features: nesting=1,keyctl=1`** |
+| 9 | destroy CT | `DELETE /nodes/$N/lxc/?purge=1` | **OK** (`VM.Allocate`) |
+| 9b | add storage definition (dir) | `POST /storage` | **200** (`Datastore.Allocate`, **no root**) |
+
+**The two headline results:**
+1. **`keyctl=1` on create is `root@pam`-only.** Verbatim:
+   `Permission check failed (changing feature flags (except nesting) is only allowed for root@pam)`.
+   Confirmed this is **not** token-fixable: a **non-privsep `root@pam` token** got the **same
+   403**. Only an actual `root@pam` session (OS root / `pct create` as root) can set it.
+   `nesting` alone is allowed for a scoped token.
+2. **Restore preserves `keyctl`.** A token-authorized `vzrestore` of a keyctl archive produced
+   `9021` with `features: nesting=1,keyctl=1, unprivileged: 1`. So the **DR/restore path is
+   fully token-covered**; only *fresh provisioning* needs root for the keyctl flag.
+
+### Paring (each drop shown to still pass, or proven needed)
+
+| Privilege | Verdict | Evidence |
+|---|---|---|
+| `Datastore.AllocateTemplate` | **DROP** (unnecessary) | create-from-template succeeded without it (200/OK) |
+| `Sys.Audit` | **KEEP** | `GET /nodes/$N/status` → **403** without it (host metrics, `03` §5) |
+| `VM.Config.Network` | **KEEP** | create with `net0` → **403 (/vms/…, VM.Config.Network)** without it |
+| `VM.Config.Options` | **KEEP** | config `onboot=1` → **403 (/vms/…, VM.Config.Options)** without it |
+| `SDN.Use` | **KEEP (added vs review sketch)** | create → **403 (/sdn/zones/localnetwork/vmbr0, SDN.Use)** without it |
+
+> Corrections to the review's candidate sketch: `VM.Config.CPUMemory` is **not a real
+> privilege** — split into `VM.Config.CPU` + `VM.Config.Memory`. `SDN.Use` was **missing** and
+> is **required** (PVE 9 gates bridge use behind it). `Datastore.AllocateTemplate` is **not
+> needed**.
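
The 403 bodies quoted in the paring table name the ACL path and the missing privilege, which is what made paring tractable. As a minimal sketch, assuming the single-privilege message shape seen in these findings (`Permission check failed (<path>, <privilege>)`), an agent could extract both fields like this. `parse_denial` is a hypothetical helper, and other message shapes simply return `None`.

```python
import re

# Matches only the "(<acl path>, <single privilege>)" form quoted above.
_PRIV_CHECK = re.compile(
    r"Permission check failed \((?P<path>/[^,()]*), (?P<priv>[\w.]+)\)"
)


def parse_denial(message):
    """Return (acl_path, privilege) from a 403 message, or None.

    Hypothetical helper; assumes the message shape seen in these findings.
    """
    match = _PRIV_CHECK.search(message)
    if match is None:
        return None
    return match.group("path"), match.group("priv")


sdn = "Permission check failed (/sdn/zones/localnetwork/vmbr0, SDN.Use)"
keyctl = ("Permission check failed (changing feature flags (except nesting) "
          "is only allowed for root@pam)")
print(parse_denial(sdn))
print(parse_denial(keyctl))
```

The `keyctl` denial deliberately does not parse: it names no privilege, which matches the finding that no role change can fix it.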
+
+### Final minimal `FelhomAgent` role (proven sufficient for ops 1′–9b)
+```
+VM.Allocate VM.Audit VM.Config.Disk VM.Config.CPU VM.Config.Memory
+VM.Config.Network VM.Config.Options VM.PowerMgmt VM.Snapshot VM.Snapshot.Rollback
+VM.Backup Datastore.Allocate Datastore.AllocateSpace Datastore.Audit Sys.Audit SDN.Use
+```
+(16 privileges. `Datastore.Allocate` is for the storage-definition add; drop it if the agent
+never creates Proxmox storage entries via the API. `VM.PowerMgmt` is for start/stop lifecycle
+— not for the backup itself, consistent with `proxmox-platform.md` §3.4.)
+
+### Root-vs-API boundary table (answers `03` §3)
+
+| Agent host operation | Coverage | Notes |
+|---|---|---|
+| Create unprivileged LXC, **nesting-only** | **API token** | `VM.Allocate`+`VM.Config.*`+`Datastore.AllocateSpace`+`SDN.Use` |
+| **Create with `keyctl=1` (Docker needs it — Phase 0)** | **OS root `root@pam`** (`pct create` as root / sudoers) | no API token works, incl. a root@pam token |
+| Set config (mem/cpu/net/options/mountpoint + `backup` flag) | API token | |
+| Allocate guest volume | API token | `Datastore.AllocateSpace` |
+| Start / stop / snapshot / rollback | API token | `VM.PowerMgmt` / `VM.Snapshot(.Rollback)` |
+| vzdump backup (stop/snapshot mode) | API token | `VM.Backup` |
+| **Restore from vzdump (preserves keyctl)** | **API token** | DR path needs no root |
+| Destroy guest (scratch + compensating rollback, B1) | API token | `VM.Allocate` |
+| Add Proxmox **storage definition** (dir/nfs/cifs/pbs) | API token | `Datastore.Allocate`; the *definition* only |
+| Host status / metrics report | API token | `Sys.Audit` |
+| **USB physical mount-by-UUID / systemd mount unit / fstab** | **OS root / narrow sudoers** | not a Proxmox API op (host-level mount; not tested here) |
+| **SMART / hardware sensors** | OS root | not API-exposed |
+
+**Boundary summary:** nearly the entire guest lifecycle — including **restore** — is covered
+by the scoped token. The genuine OS-root residual is narrow: **(1) fresh creation of a
+Docker-capable LXC (the `keyctl` flag), (2) physical USB mount-by-UUID / systemd mount units /
+fstab, (3) hardware/SMART.** This supports `03` §3's "non-root service + scoped token + narrow
+sudoers" model — with the **specific** sudoers/root entries being: `pct create` (or just the
+keyctl-setting step) and the host mount operations.
+
+---
+
+## Raw command log (appendix)
+
+### B2
+```
+pct create 9010 ... --features nesting=1,keyctl=1 --unprivileged 1 # rootfs local-lvm:8
+pct set 9010 -mp1 local-lvm:1,mp=/mnt/mp1,backup=1
+pct set 9010 -mp2 local-lvm:1,mp=/mnt/mp2,backup=0
+pct set 9010 -mp3 /root/b2-bindsrc,mp=/mnt/mp3
+# docker named vol: docker volume inspect b2vol -> /var/lib/docker/volumes/b2vol/_data
+# bulk: docker volume create --driver local -o type=none -o o=bind -o device=/mnt/mp2 bulkvol
+vzdump 9010 --mode stop --storage local --compress zstd
+# INFO: including mount point rootfs ('/') in backup
+# INFO: including mount point mp1 ('/mnt/mp1') in backup
+# INFO: excluding volume mount point mp2 ('/mnt/mp2') from backup (disabled)
+# INFO: excluding bind mount point mp3 ('/mnt/mp3') from backup (not a volume)
+tar --zstd -tf | grep SENTINEL # -> rootfs, dockervol, mp1 only
+pct restore 9011 --storage local-lvm # -> mp2/bulk absent, mp3 via re-bind
+```
+
+### B3
+```
+pveum role add FelhomAgent -privs "VM.Allocate VM.Audit VM.Config.Disk VM.Config.CPU VM.Config.Memory VM.Config.Network VM.Config.Options VM.PowerMgmt VM.Snapshot VM.Snapshot.Rollback VM.Backup Datastore.Allocate Datastore.AllocateSpace Datastore.AllocateTemplate Datastore.Audit Sys.Audit" # candidate (pre-SDN)
+pveum user add felhom-agent@pve ; pveum user token add felhom-agent@pve agent --privsep 1
+pveum acl modify / -user 'felhom-agent@pve' -role FelhomAgent
+pveum acl modify / -token 'felhom-agent@pve!agent' -role FelhomAgent
+
+# token create with keyctl:
+POST /nodes/demo-felhom/lxc ... features=nesting=1,keyctl=1
+  -> 403 "changing feature flags (except nesting) is only allowed for root@pam"
+# + SDN.Use missing initially:
+  -> 403 "Permission check failed (/sdn/zones/localnetwork/vmbr0, SDN.Use)"
+# root@pam non-privsep token, keyctl create:
+  -> 403 (same "only allowed for root@pam") # tokens never qualify
+
+# token nesting-only create / config(PUT) / start / stop / snapshot / rollback /
+# vzdump(stop) / restore->9021 (kept keyctl) / destroy / POST /storage -> all 200/OK
+
+# paring:
+GET /nodes/$N/status without Sys.Audit -> 403 (KEEP)
+create net0 without VM.Config.Network -> 403 (KEEP)
+config onboot=1 without VM.Config.Options -> 403 (KEEP)
+create from template without Datastore.AllocateTemplate -> OK (DROP)
+```
+
+### Teardown
+```
+pct destroy 9010 9011 9021 --purge # 9020/9022/9023 already destroyed during tests
+pveum user token remove felhom-agent@pve agent ; pveum user delete felhom-agent@pve
+pveum role delete FelhomAgent # ACLs at / auto-invalidated
+rm -f /var/lib/vz/dump/vzdump-lxc-9010-* /var/lib/vz/dump/vzdump-lxc-9020-*
+# verified: only 9000/9001 remain (stopped-but-present); no felhom-agent user/role; dump dir empty
+```
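
The root-vs-API boundary recorded in the B3 findings reduces to one routing decision for the agent. The sketch below is illustrative only: `needs_root_session` is a hypothetical helper, and it conservatively assumes that any feature key other than `nesting` in the `features` string requires a `root@pam` session on create, whatever its value (only `keyctl=1` was actually tested). Restore is treated as exempt, per the finding that a token-authorized restore kept `keyctl=1`.

```python
def needs_root_session(features, restore=False):
    """Decide whether an LXC create must go through OS root (pct create).

    Hypothetical helper reflecting the B3 findings:
    - restore from a vzdump archive is token-covered, flags are preserved;
    - on fresh create, only the `nesting` flag is allowed for an API token.
    Assumption: any other feature key needs root, regardless of its value.
    """
    if restore:
        return False
    names = [item.split("=", 1)[0].strip()
             for item in features.split(",") if item.strip()]
    return any(name != "nesting" for name in names)


for spec, restore in [("nesting=1,keyctl=1", False),
                      ("nesting=1", False),
                      ("nesting=1,keyctl=1", True)]:
    route = "root@pam session" if needs_root_session(spec, restore) else "API token"
    print(f"features={spec!r} restore={restore} -> {route}")
```

Keeping this decision in one place would let the agent fall back to the narrow sudoers path only for Docker-capable fresh provisioning, and use the scoped token everywhere else.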