# Proxmox Spike: API & Access-Control Reference

Reference for the **controller-as-guest** architecture, synthesized from current Proxmox VE 9.x documentation (June 2026). Items marked **[confirm on box]** should be verified once PVE is installed; treat them as Phase 0/1 verification steps, not gospel. Every Proxmox CLI tool is a thin wrapper over the same REST API, so anything below is reachable from Go.

---

## 1. API fundamentals

- **Base URL:** `https://192.168.0.162:8006/api2/json`
- **Auth (API token):** HTTP header `Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET`
  The secret is shown **once** at creation. Capture it immediately; it can't be retrieved again.
- **Response shape:** `{ "data": ... }`; errors come back via HTTP status + body.
- **Discovery (do this live on the box instead of trusting any doc):**
  - `pvesh get /version`
  - `pvesh ls /nodes//qemu/`
  - Full schema browser: `https://pve.proxmox.com/pve-docs/api-viewer/`
  - "What call does the GUI make?" → perform the action in the web UI with browser DevTools → Network open and read the request. Fastest way to find the exact endpoint + params for anything.
- **Async tasks:** long operations (backup, restore, clone) return a **UPID** (task id), not a result. Poll `GET /nodes//tasks//status` until `status: stopped`, then check `exitstatus`. The controller must poll, not block. **[confirm on box]** the exact polling/response shape.

---

## 2. RBAC model: (path, principal, role)

An ACL entry is a triple of **(path, user/group/token, role)**. A role is a bundle of privileges, assigned at the most specific path possible.

- **Paths:** `/`, `/vms/`, `/nodes/`, `/storage/`, `/pool/`, `/access/...`
- **Predefined roles include:** `PVEAuditor` (read-only), `PVEVMUser`, `PVEVMAdmin`, `PVEDatastoreUser`, `PVEAdmin`, `PVEUserAdmin`.
- **API tokens with privilege separation (`--privsep 1`):** the token's effective permissions are the **intersection** of (a) the backing user's permissions and (b) the token's own ACLs. A privsep token can therefore never exceed its user, and you grant it a separate, minimal ACL. This is exactly the property the in-guest controller needs.

Introspection:

```bash
pveum role list
pveum role info PVEVMAdmin
pveum user permissions --path /vms/
```

---

## 3. Two-tier privilege model (our architecture decision)

**Tier A: in-guest controller (customer-facing, NARROW).** Runs inside the customer's guest. Token scoped to *that guest's own VMID only*: read its own status/config, snapshot itself, back itself up, write the backup to the datastore. Cannot see or touch other guests. The LXC/VM's own privilege level is irrelevant here; reaching `host:8006` is just an HTTPS call + token.

**Tier B: operator (provisioning, BROAD).** Creates/destroys guests, builds the golden template, attaches storage, wires PBS. Lives operator-side (hub / tooling), never on the customer box.

### Phase 1 runbook: minimal self-backup role + scoped token

```bash
# 1. Custom least-privilege role: "back up / snapshot myself"
#    [confirm on box: exact privilege names via `pveum role list` / api-viewer]
pveum role add FelhomSelfBackup \
  -privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit"

# 2. Dedicated API-only user in the PVE realm (no login password)
pveum user add felhom-ctl@pve --comment "In-guest controller (self-backup)"

# 3. Privsep token for that user (SECRET shown once)
pveum user token add felhom-ctl@pve ctl --privsep 1

# 4. Scope the TOKEN to one guest + the backup datastore only
pveum acl modify /vms/ -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
pveum acl modify /storage/ -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
#    Privsep rights are (user ∩ token), so the backing user needs the same ACLs,
#    otherwise the token ends up with no rights at all. [confirm on box]
pveum acl modify /vms/ -user felhom-ctl@pve -role FelhomSelfBackup
pveum acl modify /storage/ -user felhom-ctl@pve -role FelhomSelfBackup

# 5. Test FROM INSIDE the guest
#    (single quotes stop interactive bash from history-expanding the "!")
curl -k https://:8006/api2/json/version \
  -H 'Authorization: PVEAPIToken=felhom-ctl@pve!ctl='
curl -k -X POST https://:8006/api2/json/nodes//vzdump \
  -H 'Authorization: PVEAPIToken=felhom-ctl@pve!ctl=' \
  -d "vmid=&storage=&mode=snapshot"
```

**Pass criteria:** the token backs up its OWN vmid and gets **403** on any other vmid. That single result validates the whole controller-as-guest design.

**Open question to settle here:** does Tier A also need `VM.PowerMgmt` so it can stop/start its own guest for `stop`-mode backups? Likely yes; add it and re-test.

---

## 4. Backup / restore (vzdump)

**Modes:**

- **`stop`**: orderly guest shutdown → live backup → resume. Highest consistency, short defined downtime.
- **`snapshot`**: lowest downtime; copies blocks while running. *Small inconsistency risk* unless the guest cooperates (see below).
- **`suspend`**: legacy/compat, longer downtime, not recommended.

**App-consistency (the concrete version of the earlier warning):**

- **VM:** install `qemu-guest-agent` in the guest and set `agent: 1`. `snapshot`-mode vzdump then calls `guest-fsfreeze-freeze` / `-thaw` around the copy → near-free filesystem consistency. **This is a real point in the VM's favour over LXC.**
- **LXC:** no guest agent → no fsfreeze. App-consistency becomes the *controller's* job: quiesce in-guest first (stop stacks / flush DBs) **then** vzdump, or use `stop` mode. Same lesson as the restic work, moved to the guest layer.

**CLI / API:**

```bash
vzdump --mode snapshot --storage        # CLI
# API (async → UPID):
#   POST /api2/json/nodes//vzdump   params: vmid, storage, mode, ...
```

**Restore is NOT a single "restore" call.** You recreate the guest from the archive:

- **VM:** `qmrestore ` / `POST /nodes//qemu` with `archive=...`
- **LXC:** `pct restore ` / `POST /nodes//lxc` with the archive as source

Phase 2's real-restore test = restore to a **fresh vmid** and boot it.
Do not declare the backup "working" until a restored guest actually runs.

---

## 5. Key REST endpoints (qemu shown; lxc is parallel under `/lxc`)

```
GET  /nodes
GET  /nodes//qemu                                         list VMs
GET  /nodes//qemu//status/current                         live status
GET  /nodes//qemu//config                                 config
POST /nodes//qemu//status/{start,stop,shutdown,reboot}
POST /nodes//qemu//snapshot                               (snapname, description)
GET  /nodes//qemu//snapshot                               list snapshots
POST /nodes//qemu//snapshot//rollback
POST /nodes//vzdump                                       backup (async, UPID)
GET  /nodes//tasks//status                                poll async task
```

LXC: replace `/qemu/` with `/lxc/`. For **Docker-in-LXC** the container needs `features nesting=1,keyctl=1` (`pct set -features nesting=1,keyctl=1`, or the `features` property on `POST /nodes//lxc`). **[confirm on box]**

---

## 6. Phase 0 confirm-on-box checklist

- [ ] PVE 9.2 installed; storage = LVM-thin (leave free space to also test dir/qcow2)
- [ ] Exact privilege set for `FelhomSelfBackup` (`pveum role info`)
- [ ] UPID task-polling response shape
- [ ] Docker official apt repo has a `trixie` channel
- [ ] LXC `features nesting=1,keyctl=1` syntax + Docker actually runs inside an LXC
- [ ] Baseline idle + under-load RAM/CPU: one Debian VM vs one Debian LXC, identical resources
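
The UPID polling item in the checklist above (and the async-task note in section 1) can be sketched in Go. This is a minimal sketch under stated assumptions, not a verified client: the `status` / `exitstatus` field names come from the description in section 1, the `"OK"` success value is an assumption, and both remain **[confirm on box]**. The helper names (`authHeader`, `taskStatusURL`, `parseTaskStatus`, `pollTask`) are hypothetical, and the caller is assumed to pass an `http.Client` that already trusts the PVE certificate.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// taskStatus holds only the fields this sketch assumes in the response.
type taskStatus struct {
	Data struct {
		Status     string `json:"status"`     // assumed: "running" or "stopped"
		ExitStatus string `json:"exitstatus"` // assumed: "OK" on success
	} `json:"data"`
}

// authHeader builds the Authorization value for an API token.
func authHeader(user, tokenID, secret string) string {
	return fmt.Sprintf("PVEAPIToken=%s!%s=%s", user, tokenID, secret)
}

// taskStatusURL builds the task-status URL under base (.../api2/json).
func taskStatusURL(base, node, upid string) string {
	return fmt.Sprintf("%s/nodes/%s/tasks/%s/status",
		base, url.PathEscape(node), url.PathEscape(upid))
}

// parseTaskStatus reports whether the task has stopped and, if so,
// whether it finished with the assumed success value.
func parseTaskStatus(body []byte) (bool, bool, error) {
	var ts taskStatus
	if err := json.Unmarshal(body, &ts); err != nil {
		return false, false, err
	}
	done := ts.Data.Status == "stopped"
	return done, done && ts.Data.ExitStatus == "OK", nil
}

// pollTask polls until the task stops or ctx is cancelled.
// It returns true only if the task stopped successfully.
func pollTask(ctx context.Context, c *http.Client, statusURL, auth string, every time.Duration) (bool, error) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, statusURL, nil)
		if err != nil {
			return false, err
		}
		req.Header.Set("Authorization", auth)
		resp, err := c.Do(req)
		if err != nil {
			return false, err
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return false, err
		}
		if resp.StatusCode != http.StatusOK {
			return false, fmt.Errorf("task status: HTTP %d", resp.StatusCode)
		}
		done, ok, err := parseTaskStatus(body)
		if err != nil {
			return false, err
		}
		if done {
			return ok, nil
		}
		select {
		case <-ctx.Done():
			return false, ctx.Err()
		case <-ticker.C: // wait one interval, then poll again
		}
	}
}

func main() {
	// Offline demonstration with hand-written sample bodies (not captured output).
	fmt.Println(authHeader("felhom-ctl@pve", "ctl", "SECRET"))
	for _, body := range []string{
		`{"data":{"status":"running"}}`,
		`{"data":{"status":"stopped","exitstatus":"OK"}}`,
	} {
		done, ok, err := parseTaskStatus([]byte(body))
		fmt.Println(done, ok, err)
	}
}
```

Keeping the parsing in `parseTaskStatus` means the response shape can be corrected in one place once it has been confirmed on the box, and the context passed to `pollTask` gives the controller a hard upper bound on how long it waits for a backup.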