Proxmox Spike — API & Access-Control Reference
Reference for the controller-as-guest architecture, synthesized from current Proxmox VE 9.x documentation (June 2026).
Items marked [confirm on box] should be verified once PVE is installed — treat them as Phase 0/1 verification steps, not gospel. Every Proxmox CLI tool is a thin wrapper over the same REST API, so anything below is reachable from Go.
1. API fundamentals
- Base URL: https://192.168.0.162:8006/api2/json
- Auth (API token): HTTP header
    Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET
  The secret is shown once at creation. Capture it immediately; it can't be retrieved again.
- Response shape: { "data": ... }; errors come back via HTTP status + body.
- Discovery (do this live on the box instead of trusting any doc):
    pvesh get /version
    pvesh ls /nodes/<node>/qemu/<vmid>
- Full schema browser: https://pve.proxmox.com/pve-docs/api-viewer/
- "What call does the GUI make?" → perform the action in the web UI with browser DevTools → Network open and read the request. Fastest way to find the exact endpoint + params for anything.
- Async tasks: long operations (backup, restore, clone) return a UPID (task id), not a result. Poll GET /nodes/<node>/tasks/<upid>/status until status: stopped, then check exitstatus. The controller must poll, not block. [confirm on box] the exact polling/response shape.
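The header format and the poll-until-stopped rule above can be sketched in Go roughly as follows. This is a minimal sketch under assumptions, not the controller's actual code: the helper names (authHeader, taskDone, fetchTask, waitTask) are hypothetical, only the status and exitstatus fields are modelled, and treating any exitstatus other than "OK" as a failure is an assumption that still needs [confirm on box]. A self-signed host certificate would also need a custom TLS config on the http.Client, which is left out here.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// taskStatus holds only the two fields this document relies on.
type taskStatus struct {
	Status     string `json:"status"`     // "running" or "stopped"
	ExitStatus string `json:"exitstatus"` // meaningful once stopped
}

// authHeader builds the PVEAPIToken header value described above.
func authHeader(user, realm, tokenID, secret string) string {
	return fmt.Sprintf("PVEAPIToken=%s@%s!%s=%s", user, realm, tokenID, secret)
}

// taskDone reports whether the task has stopped, and an error if it
// stopped with an exit status other than "OK" (assumption).
func taskDone(st taskStatus) (bool, error) {
	if st.Status != "stopped" {
		return false, nil
	}
	if st.ExitStatus != "OK" {
		return true, errors.New("task failed: " + st.ExitStatus)
	}
	return true, nil
}

// fetchTask performs one GET /nodes/<node>/tasks/<upid>/status call.
func fetchTask(ctx context.Context, c *http.Client, base, node, upid, auth string) (taskStatus, error) {
	var env struct {
		Data taskStatus `json:"data"`
	}
	u := fmt.Sprintf("%s/nodes/%s/tasks/%s/status", base, url.PathEscape(node), url.PathEscape(upid))
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return env.Data, err
	}
	req.Header.Set("Authorization", auth)
	resp, err := c.Do(req)
	if err != nil {
		return env.Data, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return env.Data, fmt.Errorf("task status: HTTP %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&env)
	return env.Data, err
}

// waitTask polls fetch at a fixed interval until the task stops or ctx ends.
func waitTask(ctx context.Context, fetch func(context.Context) (taskStatus, error), every time.Duration) error {
	for {
		st, err := fetch(ctx)
		if err != nil {
			return err
		}
		if done, err := taskDone(st); done {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(every):
		}
	}
}

func main() {
	// Stand-in fetcher so the sketch runs without a Proxmox host.
	calls := 0
	fake := func(context.Context) (taskStatus, error) {
		calls++
		if calls < 3 {
			return taskStatus{Status: "running"}, nil
		}
		return taskStatus{Status: "stopped", ExitStatus: "OK"}, nil
	}
	err := waitTask(context.Background(), fake, 10*time.Millisecond)
	fmt.Println("polls:", calls, "err:", err)
}
```

Passing the fetcher in as a function keeps the polling loop testable without a host; in real use it would be a closure over fetchTask with the node, UPID and token filled in.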
2. RBAC model — (path, principal, role)
An ACL entry is a triple of (path, user/group/token, role). A role is a bundle of privileges, assigned at the most specific path possible.
- Paths: /, /vms/<vmid>, /nodes/<node>, /storage/<store>, /pool/<pool>, /access/...
- Predefined roles include: PVEAuditor (read-only), PVEVMUser, PVEVMAdmin, PVEDatastoreUser, PVEAdmin, PVEUserAdmin.
- API tokens with privilege separation (--privsep 1): the token's effective permissions are the intersection of (a) the backing user's permissions and (b) the token's own ACLs. A privsep token can therefore never exceed its user, and you grant it a separate, minimal ACL. This is exactly the property the in-guest controller needs.
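The intersection rule can be illustrated with a toy model in Go. This is only a sketch of the set logic on a single path: the function name effective is hypothetical, and real Proxmox evaluates ACLs per path with propagation, which is not modelled here.

```go
package main

import (
	"fmt"
	"sort"
)

// effective returns the privileges present in BOTH the user's and the
// token's sets on one path, sorted for stable output.
func effective(userPrivs, tokenPrivs []string) []string {
	inUser := make(map[string]bool, len(userPrivs))
	for _, p := range userPrivs {
		inUser[p] = true
	}
	out := []string{}
	seen := map[string]bool{}
	for _, p := range tokenPrivs {
		if inUser[p] && !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	user := []string{"VM.Audit", "VM.Backup", "VM.Snapshot"}
	token := []string{"VM.Audit", "VM.Backup", "VM.PowerMgmt"}
	fmt.Println(effective(user, token)) // [VM.Audit VM.Backup]

	// A user with no privileges on the path leaves the token with none.
	fmt.Println(len(effective(nil, token))) // 0
}
```

The second call shows the practical consequence: a privsep token only gains a right on a path if its backing user also holds that right there.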
Introspection:
pveum role list
pveum role info PVEVMAdmin
pveum user permissions <user> --path /vms/<vmid>
3. Two-tier privilege model (our architecture decision)
Tier A — in-guest controller (customer-facing, NARROW).
Runs inside the customer's guest. Token scoped to that guest's own VMID only:
read its own status/config, snapshot itself, back itself up, write the backup to
the datastore. Cannot see or touch other guests. The LXC/VM's own privilege
level is irrelevant here — reaching host:8006 is just an HTTPS call + token.
Tier B — operator (provisioning, BROAD). Creates/destroys guests, builds the golden template, attaches storage, wires PBS. Lives operator-side (hub / tooling), never on the customer box.
Phase 1 runbook — minimal self-backup role + scoped token
# 1. Custom least-privilege role: "back up / snapshot myself"
# [confirm on box: exact privilege names via `pveum role list` / api-viewer]
pveum role add FelhomSelfBackup \
-privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit"
# 2. Dedicated API-only user in the PVE realm (no login password)
pveum user add felhom-ctl@pve --comment "In-guest controller (self-backup)"
# 3. Privsep token for that user (SECRET shown once)
pveum user token add felhom-ctl@pve ctl --privsep 1
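# 4. Scope the TOKEN to one guest + the backup datastore only.
#    With privsep the effective rights are (user ∩ token), so the backing
#    user needs the same role on the same paths or the token gets nothing.
pveum acl modify /vms/<vmid> -user felhom-ctl@pve -role FelhomSelfBackup
pveum acl modify /storage/<store> -user felhom-ctl@pve -role FelhomSelfBackup
pveum acl modify /vms/<vmid> -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
pveum acl modify /storage/<store> -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup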
# 5. Test FROM INSIDE the guest (single quotes stop an interactive shell
#    from treating the "!" in the token id as history expansion)
curl -k https://<host>:8006/api2/json/version \
  -H 'Authorization: PVEAPIToken=felhom-ctl@pve!ctl=<SECRET>'
curl -k -X POST https://<host>:8006/api2/json/nodes/<node>/vzdump \
  -H 'Authorization: PVEAPIToken=felhom-ctl@pve!ctl=<SECRET>' \
  -d 'vmid=<vmid>&storage=<store>&mode=snapshot'
Pass criteria: the token can back up its OWN vmid, and the same request for any other vmid is rejected with HTTP 403. That single result validates the whole controller-as-guest design.
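The pass criteria could be automated along these lines. This is a hedged sketch, not a verified test harness: vzdumpStatus and scopedCorrectly are hypothetical helper names, the node name "pve", the vmids 101/102 and the store "backups" are placeholders, and the httptest server is a fake standing in for host:8006 so the example runs anywhere. Expecting exactly HTTP 200 for the accepted request is also an assumption to [confirm on box].

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// vzdumpStatus POSTs a snapshot-mode backup request and returns the HTTP status.
func vzdumpStatus(c *http.Client, base, node, auth, vmid, store string) (int, error) {
	form := url.Values{"vmid": {vmid}, "storage": {store}, "mode": {"snapshot"}}
	req, err := http.NewRequest(http.MethodPost, base+"/nodes/"+node+"/vzdump",
		strings.NewReader(form.Encode()))
	if err != nil {
		return 0, err
	}
	req.Header.Set("Authorization", auth)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	resp, err := c.Do(req)
	if err != nil {
		return 0, err
	}
	resp.Body.Close()
	return resp.StatusCode, nil
}

// scopedCorrectly encodes the pass criteria: own vmid accepted, other vmid forbidden.
func scopedCorrectly(ownStatus, otherStatus int) bool {
	return ownStatus == http.StatusOK && otherStatus == http.StatusForbidden
}

func main() {
	// Fake API: only vmid 101 is allowed, everything else gets 403.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.FormValue("vmid") != "101" {
			w.WriteHeader(http.StatusForbidden)
			return
		}
		fmt.Fprint(w, `{"data":"UPID:..."}`)
	}))
	defer srv.Close()

	auth := "PVEAPIToken=felhom-ctl@pve!ctl=<SECRET>"
	own, err1 := vzdumpStatus(srv.Client(), srv.URL, "pve", auth, "101", "backups")
	other, err2 := vzdumpStatus(srv.Client(), srv.URL, "pve", auth, "102", "backups")
	fmt.Println(own, other, err1, err2, scopedCorrectly(own, other))
}
```

Against the real host, base would be the https://<host>:8006/api2/json URL and the client would need to trust the host certificate.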
Open question to settle here: does Tier A also need VM.PowerMgmt so it can
stop/start its own guest for stop-mode backups? Likely yes — add it and re-test.
4. Backup / restore (vzdump)
Modes:
- stop: orderly guest shutdown → live backup → resume. Highest consistency, short defined downtime.
- snapshot: lowest downtime; copies blocks while running. Small inconsistency risk unless the guest cooperates (see below).
- suspend: legacy/compat, longer downtime, not recommended.
App-consistency — the concrete version of the earlier warning:
- VM: install qemu-guest-agent in the guest and set agent: 1. snapshot-mode vzdump then calls guest-fsfreeze-freeze / guest-fsfreeze-thaw around the copy → near-free filesystem consistency. This is a real point in the VM's favour over LXC.
- LXC: no guest agent → no fsfreeze. App-consistency becomes the controller's job: quiesce in-guest first (stop stacks / flush DBs) then vzdump, or use stop mode. Same lesson as the restic work, moved to the guest layer.
CLI / API:
vzdump <vmid> --mode snapshot --storage <store> # CLI
# API (async → UPID):
POST /api2/json/nodes/<node>/vzdump params: vmid, storage, mode, ...
Restore is NOT a single "restore" call — you recreate the guest from the archive:
- VM: qmrestore <archive> <newvmid> / POST /nodes/<node>/qemu with archive=...
- LXC: pct restore <newvmid> <archive> / POST /nodes/<node>/lxc with the archive as source (ostemplate=<archive> plus restore=1; [confirm on box])
Phase 2's real-restore test = restore to a fresh vmid and boot it. Do not declare the backup "working" until a restored guest actually runs.
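Because restore is "create a guest from an archive", the two guest types need different endpoints and parameters. A small Go sketch of that mapping follows. The function name restoreRequest is hypothetical, the vmid 9001 and node "pve" are placeholders, and the parameter names (archive for qemu, ostemplate plus restore=1 for lxc) are taken from the API schema as understood here and should be treated as [confirm on box].

```go
package main

import (
	"fmt"
	"net/url"
)

// restoreRequest returns the POST path and form parameters for
// recreating a guest of the given kind ("qemu" or "lxc") from an archive.
func restoreRequest(kind, node, newVMID, archive string) (string, url.Values, error) {
	switch kind {
	case "qemu":
		return "/nodes/" + node + "/qemu",
			url.Values{"vmid": {newVMID}, "archive": {archive}}, nil
	case "lxc":
		// For containers the archive goes in ostemplate, flagged as a restore.
		return "/nodes/" + node + "/lxc",
			url.Values{"vmid": {newVMID}, "ostemplate": {archive}, "restore": {"1"}}, nil
	}
	return "", nil, fmt.Errorf("unknown guest kind %q", kind)
}

func main() {
	for _, kind := range []string{"qemu", "lxc"} {
		path, form, err := restoreRequest(kind, "pve", "9001", "<store>:backup/<archive>")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("POST", path, form.Encode())
	}
}
```

Both calls are async and return a UPID, so the restore test still has to poll the task and then boot the new guest before it counts as passed.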
5. Key REST endpoints (qemu shown; lxc is parallel under /lxc)
GET  /nodes
GET  /nodes/<node>/qemu                                  list VMs
GET  /nodes/<node>/qemu/<vmid>/status/current            live status
GET  /nodes/<node>/qemu/<vmid>/config                    config
POST /nodes/<node>/qemu/<vmid>/status/{start,stop,shutdown,reboot}
POST /nodes/<node>/qemu/<vmid>/snapshot                  (snapname, description)
GET  /nodes/<node>/qemu/<vmid>/snapshot                  list snapshots
POST /nodes/<node>/qemu/<vmid>/snapshot/<snap>/rollback
POST /nodes/<node>/vzdump                                backup (async, UPID)
GET  /nodes/<node>/tasks/<upid>/status                   poll async task
LXC: replace /qemu/ with /lxc/. For Docker-in-LXC the container needs
features nesting=1,keyctl=1 (pct set <vmid> -features nesting=1,keyctl=1, or
the features property on POST /nodes/<node>/lxc) — [confirm on box].
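Since the qemu and lxc endpoints differ only in one path segment, a controller can build both from a single helper. A minimal Go sketch, with guestPath as a hypothetical name and "pve" / 101 as placeholder node and vmid:

```go
package main

import "fmt"

// guestPath builds a per-guest API path; kind must be "qemu" or "lxc".
func guestPath(kind, node string, vmid int, suffix string) (string, error) {
	if kind != "qemu" && kind != "lxc" {
		return "", fmt.Errorf("unknown guest kind %q", kind)
	}
	return fmt.Sprintf("/nodes/%s/%s/%d/%s", node, kind, vmid, suffix), nil
}

func main() {
	for _, kind := range []string{"qemu", "lxc"} {
		p, _ := guestPath(kind, "pve", 101, "status/current")
		fmt.Println("GET", p)
	}
	// GET /nodes/pve/qemu/101/status/current
	// GET /nodes/pve/lxc/101/status/current
}
```

Rejecting unknown kinds up front keeps a typo from turning into a confusing 404 or 501 from the API.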
6. Phase 0 confirm-on-box checklist
- PVE 9.2 installed; storage = LVM-thin (leave free space to also test dir/qcow2)
- Exact privilege set for FelhomSelfBackup (pveum role info)
- UPID task-polling response shape
- Docker official apt repo has a trixie channel
- LXC features nesting=1,keyctl=1 syntax + Docker actually runs inside an LXC
- Baseline idle + under-load RAM/CPU: one Debian VM vs one Debian LXC, identical resources