feat(agent): scaffold + proxmox interaction layer (slice 1)

Stand up the felhom-agent project (module gitea.dooplex.hu/admin/felhom-agent,
binary felhom-agent) and the internal/proxmox package: the typed library every
other agent module calls to talk to Proxmox.

- API-first Client (hand-rolled REST over net/http, PVEAPIToken auth) with typed
  read ops (version/nodes/status/lxc/config/storage) and async mutating ops
  (restore/vzdump/snapshot/rollback/delete-snapshot/setconfig/start/stop), each
  returning a UPID. WaitTask polls task status until stopped and asserts
  exitstatus OK (authz can surface at task exec, not the POST — phase1-2 §1.3).
- Fenced Privileged (root-CLI) backend for the THREE proven exceptions only
  (keyctl pct create, USB mount/fstab, SMART/sensors); each cites why it can't be
  the API. Fence is structural (Client never shells out, Privileged never HTTPs)
  and asserted in routing_test.go.
- TLS: SHA-256 leaf-cert pinning or CA file; insecure mode explicit + off by
  default. No blanket verification disable.
- 403 -> privilege-named APIError; failed task -> privilege-named TaskError.
- JSON config + env overrides (token never logged); slog logging.
- cmd/felhom-agent --selftest (read-only health report) + gated --selftest=task
  (reversible snapshot/rollback/delete exercise of WaitTask). No daemon loop yet.
- Types grounded in the spike findings and exact JSON shapes captured live from
  demo-felhom (PVE 9.2.2). Unit tests use a mock transport + runner.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 14:34:32 +02:00
parent 4d84207572
commit a042316d6d
24 changed files with 2240 additions and 0 deletions
+11
@@ -0,0 +1,11 @@
# build output
/felhom-agent
/felhom-agent.exe
/dist/
# local config that may carry a real token secret
/agent.json
*.local.json
# go
/vendor/
+126
@@ -0,0 +1,126 @@
# felhom-agent
The **host agent** for the Felhom platform — the operator-tier component that runs on each
Proxmox host and owns *all* Proxmox interaction (provision/restore guests, host storage,
backups, host+tunnel monitoring, hub control loop, per-guest local API). Design:
[`felhom.eu/documentation/architecture/03-host-agent.md`](https://gitea.dooplex.hu/admin/felhom.eu/raw/branch/main/documentation/architecture/03-host-agent.md).
> **Status — slice 1 of N.** This repo currently contains the project scaffold and the
> **`internal/proxmox`** interaction layer (the typed library every other module will call to
> talk to Proxmox), plus a runnable read-only `--selftest`. **No** reconcile loop, hub client,
> signing, or storage/backup orchestration yet — those are later slices.
Module: `gitea.dooplex.hu/admin/felhom-agent` · binary: `felhom-agent` · Go 1.24.
## Layout
```
cmd/felhom-agent/ # entry point + --selftest (wiring only; no daemon loop yet)
internal/proxmox/ # the Proxmox interaction layer (API-first + fenced root-CLI)
internal/config/ # JSON config + env overrides (secrets never logged)
internal/log/ # slog setup
configs/agent.example.json
```
## The `proxmox` package — model
Two backends, one fixed routing policy (the fence is structural — `Client` never shells out,
`Privileged` never makes an HTTP call; asserted in `routing_test.go`):
| | Backend | Used for |
|---|---|---|
| **API (default)** | `proxmox.Client` | everything the scoped **FelhomAgent** token can do |
| **root-CLI (fenced)** | `proxmox.Privileged` | the **three** proven OS-root exceptions only |
Grounded entirely in the spike findings (`felhom.eu/documentation/proxmox-platform.md`,
`tests/phase{0,1-2,3}-findings.md`). Every mutating API op is **async**: it returns a UPID and
the caller `WaitTask`s until the task stops, then asserts `exitstatus == "OK"` — authorization
can surface at task execution, not the HTTP POST (phase1-2 §1.3).
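The async contract above can be sketched as a poll loop. This is a minimal sketch under stated assumptions, not the package's actual `WaitTask`: `taskStatus`, `statusFunc`, `waitTask`, `fakeTask`, and `demo` are hypothetical names, and the status fetch is injected so the block runs without a Proxmox host.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// taskStatus is a hypothetical, trimmed view of one task-status response.
type taskStatus struct {
	Status     string // "running" until the task ends, then "stopped"
	ExitStatus string // only meaningful once stopped; "OK" means success
}

// statusFunc fetches the current status of a UPID. It is injected so the
// sketch runs without a Proxmox host.
type statusFunc func(ctx context.Context, upid string) (taskStatus, error)

// waitTask polls until the task stops, then requires exitstatus == "OK".
func waitTask(ctx context.Context, upid string, fetch statusFunc, every time.Duration) (taskStatus, error) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		st, err := fetch(ctx, upid)
		if err != nil {
			return st, err
		}
		if st.Status == "stopped" {
			if st.ExitStatus != "OK" {
				return st, fmt.Errorf("task %s failed: %s", upid, st.ExitStatus)
			}
			return st, nil
		}
		select {
		case <-ctx.Done():
			return st, ctx.Err() // timeout or cancellation ends the wait
		case <-ticker.C:
		}
	}
}

// fakeTask reports "running" for the first n polls, then stops with exit.
func fakeTask(n int, exit string) statusFunc {
	polls := 0
	return func(context.Context, string) (taskStatus, error) {
		polls++
		if polls <= n {
			return taskStatus{Status: "running"}, nil
		}
		return taskStatus{Status: "stopped", ExitStatus: exit}, nil
	}
}

// demo waits on a fake task and returns waitTask's error.
func demo(runningPolls int, exit string) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_, err := waitTask(ctx, "UPID:example", fakeTask(runningPolls, exit), time.Millisecond)
	return err
}

func main() {
	fmt.Println("accepted POST, task OK:    ", demo(2, "OK"))
	fmt.Println("accepted POST, task denied:", demo(1, "Permission check failed"))
}
```

The second call models the case the paragraph warns about: the POST was accepted and returned a UPID, yet the task itself ended with a non-OK `exitstatus`, so the caller must treat it as a failure.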
### Public surface
`Client` (API):
- Read: `Version`, `Nodes`, `NodeStatus`, `ListLXC`, `GuestStatus`, `GuestConfig`,
`ListStorage`, `NodeStorage`, `StorageContent`.
- Async mutating (return UPID): `RestoreLXC` (primary create path), `Vzdump`, `Snapshot`,
`Rollback`, `DeleteSnapshot`, `SetConfig`, `Start`, `Stop`.
- Tasks: `WaitTask`, `TaskStatusOnce`, `TaskLogTail`.
- Errors: `*APIError` (parses the offending privilege from a 403), `*TaskError` (parses it from
a failed task `exitstatus`).
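
As an illustration of the error mapping in the last bullet, the denied path and privilege can be pulled out of a 403 message with a small parser. This is a sketch, not the package's implementation: `permRe` and `parseDenied` are hypothetical names, and the pattern assumes the message shape used in the unit tests (`Permission check failed (<path>, <privilege>)`).

```go
package main

import (
	"fmt"
	"regexp"
)

// permRe matches the parenthesised "(path, privilege)" pair in a
// "Permission check failed" message. The shape is an assumption taken from
// the captured 403 body used in the unit tests.
var permRe = regexp.MustCompile(`Permission check failed \(([^,()]+),\s*([^()]+)\)`)

// parseDenied returns the denied ACL path and the missing privilege, or
// ok=false when the message does not carry them.
func parseDenied(msg string) (path, priv string, ok bool) {
	m := permRe.FindStringSubmatch(msg)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	path, priv, ok := parseDenied("Permission check failed (/nodes/demo-felhom, Sys.Audit)\n")
	fmt.Println(path, priv, ok)
}
```

Returning `ok=false` instead of an error keeps the caller free to fall back to the raw message when a 403 body has a different shape.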
`Privileged` (fenced root-CLI) — each method documents *why it can't be the API*:
- `CreateGoldenLXC`: `pct create` with `keyctl=1` (root@pam-only). This is the only root-fenced
create; the per-customer path provisions by **restore**, which preserves keyctl.
- `MountUSBByUUID`: host mount-by-UUID (not a Proxmox API op).
- `SMART`, `Sensors`: hardware reads (not API-exposed).
### API-vs-root routing table
See the table in [`internal/proxmox/doc.go`](internal/proxmox/doc.go). Summary: the entire guest
lifecycle **including restore** is API-token-covered; OS-root is confined to golden-image
`keyctl` create, host mounts, and SMART/sensors (phase3 §B3).
### TLS trust
The host serves a self-signed cert. Verification is **not** blanket-disabled. Pick one in
config: `ca_file` (PEM, full verify), `fingerprint` (SHA-256 of the host leaf cert — pinned
exact-cert match; the `/nodes` API returns each node's `ssl_fingerprint` to pin), or the
explicitly-named `insecure_skip_verify` (off by default; selftest-against-127.0.0.1 only).
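One common way to implement the `fingerprint` option in Go is sketched below. It is an assumption about the approach, not a copy of the package's TLS code: `normalizeFP`, `checkLeaf`, and `pinnedTLS` are hypothetical names. The standard chain check is switched off only because a self-signed leaf cannot pass it, and it is replaced by an exact SHA-256 match that runs on every handshake.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// normalizeFP lowercases a fingerprint and strips the ':' separators.
func normalizeFP(fp string) string {
	return strings.ToLower(strings.ReplaceAll(strings.TrimSpace(fp), ":", ""))
}

// checkLeaf compares the SHA-256 of a DER-encoded certificate with the pin.
func checkLeaf(der []byte, pinned string) error {
	sum := sha256.Sum256(der)
	if got := hex.EncodeToString(sum[:]); got != normalizeFP(pinned) {
		return fmt.Errorf("pin: leaf SHA-256 %s does not match the pinned fingerprint", got)
	}
	return nil
}

// pinnedTLS returns a tls.Config that accepts exactly one leaf certificate.
func pinnedTLS(pinned string) *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS12,
		// Chain verification is replaced, not removed: VerifyConnection
		// still runs on every handshake and rejects any other certificate.
		InsecureSkipVerify: true,
		VerifyConnection: func(cs tls.ConnectionState) error {
			if len(cs.PeerCertificates) == 0 {
				return errors.New("pin: server sent no certificate")
			}
			return checkLeaf(cs.PeerCertificates[0].Raw, pinned)
		},
	}
}

func main() {
	// SHA-256 of the bytes "abc", standing in for a real certificate's DER.
	const fp = "BA:78:16:BF:8F:01:CF:EA:41:41:40:DE:5D:AE:22:23:B0:03:61:A3:96:17:7A:9C:B4:10:FF:61:F2:00:15:AD"
	cfg := pinnedTLS(fp)
	fmt.Println("custom verifier installed:", cfg.VerifyConnection != nil)
	fmt.Println("pinned bytes accepted:", checkLeaf([]byte("abc"), fp) == nil)
	fmt.Println("other bytes accepted:", checkLeaf([]byte("abd"), fp) == nil)
}
```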
## Provisioning the token (out-of-band, operator side)
The agent only **consumes** a privilege-separated API token; role setup is a provisioning step.
The role must be granted on **both the user AND the token** for the same path, or the
intersection is empty and every call 403s (phase1-2 §1.2):
```bash
pveum role add FelhomAgent -privs "VM.Allocate VM.Audit VM.Config.Disk VM.Config.CPU \
VM.Config.Memory VM.Config.Network VM.Config.Options VM.PowerMgmt VM.Snapshot \
VM.Snapshot.Rollback VM.Backup Datastore.Allocate Datastore.AllocateSpace \
Datastore.Audit Sys.Audit SDN.Use" # 16 privileges, validated Phase 3 B3
pveum user add felhom-agent@pve
pveum user token add felhom-agent@pve agent --privsep 1 # capture the secret (shown once)
pveum acl modify / -user 'felhom-agent@pve' -role FelhomAgent
pveum acl modify / -token 'felhom-agent@pve!agent' -role FelhomAgent
```
(`VM.Config.CPUMemory` is **not** a real privilege; `SDN.Use` **is** required for bridge use.)
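The "both the user AND the token" rule is easiest to see as a set intersection. The sketch below is a deliberately simplified model (it ignores ACL paths and propagation) and `effective` is a hypothetical helper, not part of the agent.

```go
package main

import (
	"fmt"
	"sort"
)

// effective returns the privileges present in BOTH sets, sorted. It models
// a privilege-separated token on a single ACL path: the token can use only
// what the user and the token were each granted.
func effective(user, token []string) []string {
	granted := make(map[string]bool, len(user))
	for _, p := range user {
		granted[p] = true
	}
	var out []string
	for _, p := range token {
		if granted[p] {
			out = append(out, p)
			delete(granted, p) // guard against duplicates in token
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	role := []string{"VM.Audit", "Sys.Audit", "VM.Snapshot"}
	fmt.Println("role on user only:     ", effective(role, nil))
	fmt.Println("role on user and token:", effective(role, role))
}
```

With the role granted on the user alone, the result is empty, which is the "every call 403s" situation described above.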
## Run
```bash
go build ./...
# read-only health check against the host:
./felhom-agent --config configs/agent.example.json --selftest
# or via env (keeps the secret off disk):
FELHOM_AGENT_PROXMOX_TOKEN='felhom-agent@pve!agent=SECRET' \
FELHOM_AGENT_PROXMOX_NODE=demo-felhom \
FELHOM_AGENT_PROXMOX_ENDPOINT=https://192.168.0.162:8006 \
FELHOM_AGENT_PROXMOX_TLS_FINGERPRINT='BA:7C:...:CF' \
./felhom-agent --selftest
```
`--selftest` (read-only) loads config, builds the API client, and runs the read queries (version,
nodes, node status, guests, storage), printing a short health report. It mutates nothing. If the
token, node, or endpoint is not configured, it reports that and exits non-zero instead of panicking.
`--selftest=task --vmid N` (explicitly gated) exercises `WaitTask` on a **reversible** op
(snapshot → rollback → delete-snapshot) against guest `N`. Default `--selftest` never mutates.
## Process model (proposed, not finalized — see 03 §3/§12)
Native Go binary, systemd service, **non-root** service user holding the scoped token, with a
**narrow sudoers allowlist** for the three fenced ops. `privileged.mode: "sudo"` matches this;
`"direct"` is for dev/CI where the agent is already root.
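The two `privileged.mode` values differ only in how the command line is assembled. A minimal sketch follows, assuming a hypothetical `buildArgv` helper and an illustrative `smartctl` invocation (the agent's real runner and the exact commands it allowlists are not shown in this slice).

```go
package main

import (
	"errors"
	"fmt"
)

// buildArgv returns the argv to execute for a fenced command.
// "direct" runs the command as-is (the agent is already root);
// "sudo" prefixes it so a non-root agent relies on the sudoers allowlist.
func buildArgv(mode, sudoPath string, cmd ...string) ([]string, error) {
	if len(cmd) == 0 {
		return nil, errors.New("privileged: empty command")
	}
	switch mode {
	case "direct":
		return append([]string(nil), cmd...), nil
	case "sudo", "":
		if sudoPath == "" {
			sudoPath = "sudo"
		}
		// -n: never prompt for a password; fail fast when the allowlist
		// does not cover the command. "--" ends sudo's own options.
		return append([]string{sudoPath, "-n", "--"}, cmd...), nil
	default:
		return nil, fmt.Errorf("privileged: unknown mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"sudo", "direct"} {
		argv, err := buildArgv(mode, "", "smartctl", "-H", "/dev/sda")
		fmt.Println(mode, argv, err)
	}
}
```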
## Test
```bash
go vet ./... && go test ./...
```
Unit tests use a mock HTTP transport + mock runner (no live host): UPID parse, `WaitTask`
(running→OK / running→failed-403 / timeout / ctx-cancel), 403→privilege-named error, response
decoding against the captured live shapes, and the API-vs-root routing fence.
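The mock-transport idea can be sketched as follows. The tests refer to a `mockDoer` and a `jsonResp` helper, but their definitions are not shown here, so the versions below are assumptions about their shape; `canned` and `fetch` are hypothetical additions that keep the block runnable, and the response body is illustrative only.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// doer is the minimal HTTP surface a client needs; *http.Client satisfies it.
type doer interface {
	Do(*http.Request) (*http.Response, error)
}

// mockDoer answers every request through fn, so no network is involved.
type mockDoer struct {
	fn func(*http.Request) (*http.Response, error)
}

func (m *mockDoer) Do(r *http.Request) (*http.Response, error) { return m.fn(r) }

// jsonResp builds an in-memory response with the given status and body.
func jsonResp(status int, body string) *http.Response {
	return &http.Response{
		StatusCode: status,
		Header:     http.Header{"Content-Type": []string{"application/json"}},
		Body:       io.NopCloser(strings.NewReader(body)),
	}
}

// canned returns a mock that always replies with the same status and body.
func canned(status int, body string) *mockDoer {
	return &mockDoer{fn: func(*http.Request) (*http.Response, error) {
		return jsonResp(status, body), nil
	}}
}

// fetch issues a GET through d and returns the status code and raw body.
func fetch(d doer, url string) (int, string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", err
	}
	resp, err := d.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(raw), err
}

func main() {
	code, body, err := fetch(canned(200, `{"data":{"version":"9.2.2"}}`), "https://127.0.0.1:8006/api2/json/version")
	fmt.Println(code, body, err)
}
```

Because the client depends on the one-method interface rather than on `*http.Client`, the same decoding and error paths run in tests and against a live host.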
+234
@@ -0,0 +1,234 @@
// Command felhom-agent is the host agent (slice 1: scaffold + proxmox layer).
//
// This slice is wiring only: it has no daemon/reconcile loop yet (slice 3/4). It
// exposes a read-only --selftest that exercises the proxmox package against a live
// host, and an explicitly-gated --selftest=task that exercises WaitTask on a
// reversible op (snapshot -> rollback -> delete-snapshot).
package main
import (
"context"
"errors"
"flag"
"fmt"
"log/slog"
"os"
"os/signal"
"syscall"
"time"
"gitea.dooplex.hu/admin/felhom-agent/internal/config"
applog "gitea.dooplex.hu/admin/felhom-agent/internal/log"
"gitea.dooplex.hu/admin/felhom-agent/internal/proxmox"
)
func main() {
var (
cfgPath string
selftest selftestFlag
vmid int
)
flag.StringVar(&cfgPath, "config", envOr("FELHOM_AGENT_CONFIG", "/etc/felhom-agent/agent.json"), "path to the agent config file (JSON)")
flag.Var(&selftest, "selftest", "run a self-test and exit: bare/`read` = read-only queries; `task` = reversible mutating exercise (needs -vmid)")
flag.IntVar(&vmid, "vmid", 0, "guest VMID for --selftest=task (the reversible snapshot/rollback exercise)")
flag.Parse()
cfg, err := config.Load(cfgPath)
if err != nil {
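// A missing default config file is fine if env provides the values; only a
// present-but-unreadable/invalid file is fatal here.
if !(errors.Is(err, os.ErrNotExist) && cfgPath == flag.Lookup("config").DefValue) {
fmt.Fprintln(os.Stderr, "config error:", err)
os.Exit(2)
}
// Load("") skips the file but still applies the FELHOM_AGENT_* overrides;
// config.Default() alone would silently drop them.
if cfg, err = config.Load(""); err != nil {
fmt.Fprintln(os.Stderr, "config error:", err)
os.Exit(2)
}
}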
logger := applog.New(cfg.LogLevel)
switch selftest.mode {
case "":
// No daemon loop yet.
logger.Info("felhom-agent slice-1 scaffold; no run loop yet",
"hint", "use --selftest (read-only) or --selftest=task --vmid N")
// TODO: poll loop — slice 3/4.
return
case "read":
os.Exit(runSelftestRead(context.Background(), cfg, logger))
case "task":
os.Exit(runSelftestTask(context.Background(), cfg, logger, vmid))
}
}
// runSelftestRead loads config, builds the API client, and runs the read-only
// queries against the live host, printing a short health report. It mutates
// nothing. Missing/invalid config is reported cleanly (no panic).
func runSelftestRead(ctx context.Context, cfg config.Config, logger *slog.Logger) int {
if err := cfg.Validate(); err != nil {
fmt.Fprintln(os.Stderr, "selftest: not configured:", err)
return 1
}
logger.Info("selftest (read-only) starting", "config", fmt.Sprintf("%+v", cfg.Redacted().Proxmox))
client, err := proxmox.NewClient(proxmox.Config{
Endpoint: cfg.Proxmox.Endpoint,
Node: cfg.Proxmox.Node,
Token: cfg.Proxmox.Token,
TLS: proxmox.TLSConfig{
CAFile: cfg.Proxmox.TLS.CAFile,
Fingerprint: cfg.Proxmox.TLS.Fingerprint,
InsecureSkipVerify: cfg.Proxmox.TLS.InsecureSkipVerify,
},
})
if err != nil {
fmt.Fprintln(os.Stderr, "selftest: client init:", err)
return 1
}
ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()
fmt.Println("=== felhom-agent selftest (read-only) ===")
fmt.Printf("endpoint : %s node=%s\n", cfg.Proxmox.Endpoint, cfg.Proxmox.Node)
fail := 0
report := func(label string, err error) bool {
if err != nil {
fmt.Printf(" [FAIL] %-14s %v\n", label, err)
fail++
return false
}
return true
}
if v, err := client.Version(ctx); report("version", err) {
fmt.Printf(" [ ok ] %-14s PVE %s (release %s)\n", "version", v.Version, v.Release)
}
if nodes, err := client.Nodes(ctx); report("nodes", err) {
fmt.Printf(" [ ok ] %-14s %d node(s)\n", "nodes", len(nodes))
for _, n := range nodes {
marker := " "
if n.Node == cfg.Proxmox.Node {
marker = "* "
}
fmt.Printf(" %s%s status=%s fp=%s…\n", marker, n.Node, n.Status, head(n.SSLFingerprint, 17))
}
}
if s, err := client.NodeStatus(ctx); report("node status", err) {
fmt.Printf(" [ ok ] %-14s up %s, load %v, mem %s/%s, root %s/%s\n", "node status",
dur(s.Uptime), s.LoadAvg,
gib(s.Memory.Used), gib(s.Memory.Total), gib(s.RootFS.Used), gib(s.RootFS.Total))
}
if gs, err := client.ListLXC(ctx); report("list lxc", err) {
fmt.Printf(" [ ok ] %-14s %d guest(s)\n", "list lxc", len(gs))
for _, g := range gs {
fmt.Printf(" - %d %q status=%s\n", g.VMID, g.Name, g.Status)
}
}
if ss, err := client.NodeStorage(ctx); report("storage", err) {
fmt.Printf(" [ ok ] %-14s %d store(s)\n", "storage", len(ss))
for _, s := range ss {
fmt.Printf(" - %-10s type=%-8s content=%s used=%s/%s\n",
s.Storage, s.Type, s.Content, gib(s.Used), gib(s.Total))
}
}
if fail > 0 {
fmt.Printf("=== selftest FAILED (%d check(s)) ===\n", fail)
return 1
}
fmt.Println("=== selftest OK ===")
return 0
}
// runSelftestTask exercises WaitTask on a reversible op against -vmid: snapshot ->
// rollback -> delete-snapshot. Explicitly gated; never runs under bare --selftest.
func runSelftestTask(ctx context.Context, cfg config.Config, logger *slog.Logger, vmid int) int {
if err := cfg.Validate(); err != nil {
fmt.Fprintln(os.Stderr, "selftest: not configured:", err)
return 1
}
if vmid == 0 {
fmt.Fprintln(os.Stderr, "selftest=task requires -vmid N (a guest safe to snapshot/rollback)")
return 2
}
client, err := proxmox.NewClient(proxmox.Config{
Endpoint: cfg.Proxmox.Endpoint, Node: cfg.Proxmox.Node, Token: cfg.Proxmox.Token,
TLS: proxmox.TLSConfig{
CAFile: cfg.Proxmox.TLS.CAFile, Fingerprint: cfg.Proxmox.TLS.Fingerprint,
InsecureSkipVerify: cfg.Proxmox.TLS.InsecureSkipVerify,
},
})
if err != nil {
fmt.Fprintln(os.Stderr, "selftest: client init:", err)
return 1
}
// Ctrl-C aborts the wait cleanly.
ctx, stop := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)
defer stop()
const snap = "felhom-selftest"
steps := []struct {
name string
do func() (string, error)
}{
{"snapshot", func() (string, error) { return client.Snapshot(ctx, vmid, snap, "felhom-agent selftest") }},
{"rollback", func() (string, error) { return client.Rollback(ctx, vmid, snap) }},
{"delete-snapshot", func() (string, error) { return client.DeleteSnapshot(ctx, vmid, snap) }},
}
fmt.Printf("=== felhom-agent selftest=task (vmid %d, snapshot %q) ===\n", vmid, snap)
for _, st := range steps {
upid, err := st.do()
if err != nil {
fmt.Printf(" [FAIL] %-16s %v\n", st.name, err)
return 1
}
fmt.Printf(" .... %-16s upid=%s\n", st.name, upid)
status, err := client.WaitTask(ctx, upid, proxmox.WaitOptions{})
if err != nil {
fmt.Printf(" [FAIL] %-16s %v\n", st.name, err)
return 1
}
fmt.Printf(" [ ok ] %-16s exitstatus=%s\n", st.name, status.ExitStatus)
}
fmt.Println("=== selftest=task OK ===")
return 0
}
// --- small helpers / flag type ---
func envOr(key, def string) string {
if v := os.Getenv(key); v != "" {
return v
}
return def
}
func head(s string, n int) string {
if len(s) <= n {
return s
}
return s[:n]
}
func dur(seconds int64) string { return (time.Duration(seconds) * time.Second).String() }
func gib(bytes int64) string { return fmt.Sprintf("%.1fGiB", float64(bytes)/(1<<30)) }
// selftestFlag is a flag.Value that also satisfies IsBoolFlag, so `--selftest`
// works bare (read-only) and `--selftest=task` / `--selftest=read` set the mode.
type selftestFlag struct{ mode string }
func (f *selftestFlag) String() string { return f.mode }
func (f *selftestFlag) IsBoolFlag() bool { return true }
func (f *selftestFlag) Set(v string) error {
switch v {
case "true", "", "read":
f.mode = "read"
case "task":
f.mode = "task"
default:
return fmt.Errorf("invalid --selftest value %q (want read|task)", v)
}
return nil
}
+17
@@ -0,0 +1,17 @@
{
"proxmox": {
"endpoint": "https://127.0.0.1:8006",
"node": "demo-felhom",
"token": "felhom-agent@pve!agent=REPLACE_WITH_SECRET",
"tls": {
"ca_file": "",
"fingerprint": "BA:7C:99:7D:45:D0:67:91:E2:F2:72:74:6E:D6:9F:83:51:D1:61:E5:C3:BD:F6:A0:B8:0B:E3:D8:DB:89:5B:CF",
"insecure_skip_verify": false
}
},
"privileged": {
"mode": "sudo",
"sudo_path": "sudo"
},
"log_level": "info"
}
+3
@@ -0,0 +1,3 @@
module gitea.dooplex.hu/admin/felhom-agent
go 1.24
+145
@@ -0,0 +1,145 @@
// Package config loads the felhom-agent configuration the proxmox layer needs.
//
// Format: a JSON file (stdlib-only — no YAML dep, consistent with the agent's
// "pure stdlib" constraint), with per-field environment overrides. Secrets (the
// API token) are never logged; see Config.Redacted.
//
// OPEN item (noted in the slice reply): the controller/hub use YAML; if matching
// that house style is preferred over the zero-dependency constraint, the loader
// can swap to yaml.v3 without touching call sites.
package config
import (
"encoding/json"
"fmt"
"os"
"strconv"
"strings"
)
// Config is the agent configuration. Only the fields the proxmox interaction
// layer needs are present in this slice.
type Config struct {
Proxmox ProxmoxConfig `json:"proxmox"`
Privileged PrivilegedConfig `json:"privileged"`
LogLevel string `json:"log_level"` // debug|info|warn|error (default info)
}
// ProxmoxConfig configures the API client.
type ProxmoxConfig struct {
// Endpoint defaults to https://127.0.0.1:8006 (agent runs on the host).
Endpoint string `json:"endpoint"`
// Node is the Proxmox node name; confirm on the box (GET /nodes).
Node string `json:"node"`
// Token is the full API token "USER@REALM!TOKENID=SECRET".
//
// Provisioning note: this is a privilege-SEPARATED token. Its role
// (FelhomAgent, 16 privileges) must be granted on BOTH the user AND the token
// for the same path, or the intersection is empty and every call 403s
// (phase1-2 §1.2). Role setup is out-of-band; the agent only consumes the token.
Token string `json:"token"`
// TLS trust to the host's (self-signed) cert.
TLS TLSTrust `json:"tls"`
}
// TLSTrust mirrors proxmox.TLSConfig (kept dependency-free here).
type TLSTrust struct {
CAFile string `json:"ca_file"`
Fingerprint string `json:"fingerprint"` // SHA-256 of the host leaf cert
InsecureSkipVerify bool `json:"insecure_skip_verify"` // off by default; selftest-only
}
// PrivilegedConfig configures the fenced root-CLI runner.
type PrivilegedConfig struct {
// Mode: "sudo" (default — non-root agent + narrow sudoers) or "direct".
Mode string `json:"mode"`
// SudoPath overrides the sudo binary (default "sudo").
SudoPath string `json:"sudo_path"`
}
// Default returns a Config pre-populated with sane defaults.
func Default() Config {
return Config{
Proxmox: ProxmoxConfig{Endpoint: "https://127.0.0.1:8006"},
Privileged: PrivilegedConfig{Mode: "sudo"},
LogLevel: "info",
}
}
// Load reads the config file at path (if non-empty) over the defaults, then
// applies environment overrides. An empty path skips the file, so an all-env
// config is allowed. On a read or parse error the overrides are NOT applied.
func Load(path string) (Config, error) {
cfg := Default()
if path != "" {
b, err := os.ReadFile(path)
if err != nil {
return cfg, fmt.Errorf("config: reading %s: %w", path, err)
}
if err := json.Unmarshal(b, &cfg); err != nil {
return cfg, fmt.Errorf("config: parsing %s: %w", path, err)
}
}
applyEnv(&cfg)
return cfg, nil
}
// applyEnv overlays FELHOM_AGENT_* environment variables. Useful for the token in
// particular (keep the secret out of the file on disk if desired).
func applyEnv(cfg *Config) {
if v := os.Getenv("FELHOM_AGENT_PROXMOX_ENDPOINT"); v != "" {
cfg.Proxmox.Endpoint = v
}
if v := os.Getenv("FELHOM_AGENT_PROXMOX_NODE"); v != "" {
cfg.Proxmox.Node = v
}
if v := os.Getenv("FELHOM_AGENT_PROXMOX_TOKEN"); v != "" {
cfg.Proxmox.Token = v
}
if v := os.Getenv("FELHOM_AGENT_PROXMOX_TLS_CA_FILE"); v != "" {
cfg.Proxmox.TLS.CAFile = v
}
if v := os.Getenv("FELHOM_AGENT_PROXMOX_TLS_FINGERPRINT"); v != "" {
cfg.Proxmox.TLS.Fingerprint = v
}
if v := os.Getenv("FELHOM_AGENT_PROXMOX_TLS_INSECURE"); v != "" {
if b, err := strconv.ParseBool(v); err == nil {
cfg.Proxmox.TLS.InsecureSkipVerify = b
}
}
if v := os.Getenv("FELHOM_AGENT_LOG_LEVEL"); v != "" {
cfg.LogLevel = v
}
}
// Validate checks the config is usable for talking to the API.
func (c Config) Validate() error {
if c.Proxmox.Endpoint == "" {
return fmt.Errorf("config: proxmox.endpoint is required")
}
if c.Proxmox.Node == "" {
return fmt.Errorf("config: proxmox.node is required (confirm with `pvesh get /nodes`)")
}
if c.Proxmox.Token == "" {
return fmt.Errorf("config: proxmox.token is required (set proxmox.token or FELHOM_AGENT_PROXMOX_TOKEN)")
}
if !strings.Contains(c.Proxmox.Token, "!") || !strings.Contains(c.Proxmox.Token, "=") {
return fmt.Errorf("config: proxmox.token must be USER@REALM!TOKENID=SECRET")
}
return nil
}
// Redacted returns a copy safe to log: the token secret is masked.
func (c Config) Redacted() Config {
if c.Proxmox.Token != "" {
c.Proxmox.Token = redactToken(c.Proxmox.Token)
}
return c
}
// redactToken keeps the public "USER@REALM!TOKENID=" prefix and masks the secret.
// It cuts at the FIRST "=" so a secret that itself contains "=" cannot leak.
func redactToken(tok string) string {
if i := strings.Index(tok, "="); i >= 0 {
return tok[:i+1] + "********"
}
return "********"
}
+59
@@ -0,0 +1,59 @@
package config
import (
"os"
"path/filepath"
"strings"
"testing"
)
func TestRedactedMasksSecret(t *testing.T) {
c := Default()
c.Proxmox.Token = "felhom-agent@pve!agent=b6547d9d-08ec-4f22-beb8-a551dc2cd69d"
got := c.Redacted().Proxmox.Token
if strings.Contains(got, "b6547d9d") {
t.Fatalf("secret leaked in redacted token: %q", got)
}
if !strings.HasPrefix(got, "felhom-agent@pve!agent=") {
t.Errorf("redacted token lost its public prefix: %q", got)
}
// The original must be untouched (Redacted returns a copy).
if !strings.Contains(c.Proxmox.Token, "b6547d9d") {
t.Errorf("Redacted mutated the original config")
}
}
func TestValidate(t *testing.T) {
c := Default()
c.Proxmox.Node = "demo-felhom"
c.Proxmox.Token = "felhom-agent@pve!agent=secret"
if err := c.Validate(); err != nil {
t.Fatalf("valid config rejected: %v", err)
}
c.Proxmox.Token = "no-bang-no-eq"
if err := c.Validate(); err == nil {
t.Errorf("malformed token accepted")
}
}
func TestLoadFileThenEnvOverride(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "agent.json")
if err := os.WriteFile(path, []byte(`{"proxmox":{"node":"file-node","token":"u@pve!t=filesecret"}}`), 0o600); err != nil {
t.Fatal(err)
}
t.Setenv("FELHOM_AGENT_PROXMOX_NODE", "env-node")
cfg, err := Load(path)
if err != nil {
t.Fatalf("Load: %v", err)
}
if cfg.Proxmox.Node != "env-node" {
t.Errorf("env did not override node: %q", cfg.Proxmox.Node)
}
if cfg.Proxmox.Token != "u@pve!t=filesecret" {
t.Errorf("token from file lost: %q", cfg.Proxmox.Token)
}
if cfg.Proxmox.Endpoint != "https://127.0.0.1:8006" {
t.Errorf("default endpoint lost: %q", cfg.Proxmox.Endpoint)
}
}
+28
@@ -0,0 +1,28 @@
// Package log builds the agent's slog logger. Kept tiny on purpose; the agent is
// a host service, so logs go to stderr (journald-friendly). Secrets must never be
// passed to the logger — config is logged only via Config.Redacted (see config).
package log
import (
"log/slog"
"os"
"strings"
)
// New returns a text slog.Logger at the given level ("debug"|"info"|"warn"|
// "error"; unknown falls back to info), writing to stderr.
func New(level string) *slog.Logger {
var lvl slog.Level
switch strings.ToLower(level) {
case "debug":
lvl = slog.LevelDebug
case "warn", "warning":
lvl = slog.LevelWarn
case "error":
lvl = slog.LevelError
default:
lvl = slog.LevelInfo
}
h := slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: lvl})
return slog.New(h)
}
+154
@@ -0,0 +1,154 @@
package proxmox
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"net/http"
"net/url"
"strings"
"time"
)
// doer is the minimal HTTP surface the client needs; *http.Client satisfies it.
// Tests inject a mock to exercise decoding/error paths without a live host.
type doer interface {
Do(*http.Request) (*http.Response, error)
}
// Config configures a Client (the API backend).
type Config struct {
// Endpoint is the API base, e.g. "https://127.0.0.1:8006". The "/api2/json"
// suffix is added by the client.
Endpoint string
// Node is the Proxmox node name (e.g. "demo-felhom"). Confirm on the box
// (GET /nodes), never hard-code — see proxmox-platform.md §1.
Node string
// Token is the full API token "USER@REALM!TOKENID=SECRET". Never logged.
Token string
// TLS selects how the host cert is trusted.
TLS TLSConfig
// HTTPTimeout bounds a single HTTP round-trip (not a whole task wait).
// Defaults to 30s.
HTTPTimeout time.Duration
}
// Client is the API backend: a typed REST client for one Proxmox host. It is the
// default path for everything the scoped token can do. It never shells out.
type Client struct {
base string // "<endpoint>/api2/json"
node string
token string
http doer
}
// NewClient builds an API client. It validates required config and constructs the
// TLS-pinned transport.
func NewClient(cfg Config) (*Client, error) {
if cfg.Endpoint == "" {
return nil, fmt.Errorf("proxmox: endpoint is required")
}
if cfg.Node == "" {
return nil, fmt.Errorf("proxmox: node is required")
}
if cfg.Token == "" {
return nil, fmt.Errorf("proxmox: API token is required")
}
tlsCfg, err := cfg.TLS.build()
if err != nil {
return nil, err
}
timeout := cfg.HTTPTimeout
if timeout == 0 {
timeout = 30 * time.Second
}
hc := &http.Client{
Timeout: timeout,
Transport: &http.Transport{TLSClientConfig: tlsCfg},
}
return &Client{
base: strings.TrimRight(cfg.Endpoint, "/") + "/api2/json",
node: cfg.Node,
token: cfg.Token,
http: hc,
}, nil
}
// Node returns the configured node name.
func (c *Client) Node() string { return c.node }
// get performs GET <path> and decodes the {"data": ...} envelope into out.
func (c *Client) get(ctx context.Context, path string, out any) error {
return c.do(ctx, http.MethodGet, path, nil, out)
}
// postForm performs a form-encoded POST/PUT and decodes the envelope into out.
// out may be nil when the caller does not need the body.
func (c *Client) postForm(ctx context.Context, method, path string, params url.Values, out any) error {
var body io.Reader
if params != nil {
body = strings.NewReader(params.Encode())
}
return c.doBody(ctx, method, path, body, "application/x-www-form-urlencoded", out)
}
func (c *Client) do(ctx context.Context, method, path string, body io.Reader, out any) error {
return c.doBody(ctx, method, path, body, "", out)
}
// doBody is the single HTTP chokepoint: builds the request, sets auth, executes,
// maps non-2xx to APIError, and decodes the data envelope.
func (c *Client) doBody(ctx context.Context, method, path string, body io.Reader, contentType string, out any) error {
req, err := http.NewRequestWithContext(ctx, method, c.base+path, body)
if err != nil {
return fmt.Errorf("proxmox: building request: %w", err)
}
req.Header.Set("Authorization", "PVEAPIToken="+c.token)
req.Header.Set("Accept", "application/json")
if contentType != "" {
req.Header.Set("Content-Type", contentType)
}
resp, err := c.http.Do(req)
if err != nil {
return fmt.Errorf("proxmox: %s %s: %w", method, path, err)
}
defer resp.Body.Close()
raw, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("proxmox: reading %s %s response: %w", method, path, err)
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return newAPIError(resp.StatusCode, method, path, string(raw))
}
if out == nil {
return nil
}
var env struct {
Data json.RawMessage `json:"data"`
}
if err := json.Unmarshal(raw, &env); err != nil {
return fmt.Errorf("proxmox: decoding %s %s envelope: %w", method, path, err)
}
if len(env.Data) == 0 || bytes.Equal(env.Data, []byte("null")) {
return nil // no data (e.g. a sync PUT /config)
}
if err := json.Unmarshal(env.Data, out); err != nil {
return fmt.Errorf("proxmox: decoding %s %s data: %w", method, path, err)
}
return nil
}
// dataString runs a request expecting the "data" field to be a bare string
// (the UPID returned by async mutating ops). Returns "" with no error when the
// response carries no data (some sync ops).
func (c *Client) dataString(ctx context.Context, method, path string, params url.Values) (string, error) {
var s string
if err := c.postForm(ctx, method, path, params, &s); err != nil {
return "", err
}
return s, nil
}
+102
@@ -0,0 +1,102 @@
package proxmox
import (
"context"
"errors"
"net/http"
"testing"
)
func TestAPIError_403ExtractsPrivilege(t *testing.T) {
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) {
return jsonResp(403, `{"message":"Permission check failed (/nodes/demo-felhom, Sys.Audit)\n"}`), nil
}}
_, err := newTestClient(d).NodeStatus(context.Background())
var ae *APIError
if !errors.As(err, &ae) {
t.Fatalf("want *APIError, got %T: %v", err, err)
}
if !ae.IsForbidden() {
t.Errorf("IsForbidden = false")
}
if ae.Privilege != "Sys.Audit" {
t.Errorf("privilege = %q, want Sys.Audit", ae.Privilege)
}
if ae.DeniedPath != "/nodes/demo-felhom" {
t.Errorf("denied path = %q", ae.DeniedPath)
}
}
func TestDecode_ListLXC(t *testing.T) {
// Exact shape captured from the live host.
body := `{"data":[{"cpu":0,"cpus":2,"disk":0,"maxdisk":10737418240,"maxmem":2147483648,"mem":0,"name":"spike-lxc","status":"stopped","type":"lxc","uptime":0,"vmid":9001}]}`
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) { return jsonResp(200, body), nil }}
gs, err := newTestClient(d).ListLXC(context.Background())
if err != nil {
t.Fatalf("ListLXC: %v", err)
}
if len(gs) != 1 {
t.Fatalf("len = %d", len(gs))
}
g := gs[0]
if g.VMID != 9001 || g.Name != "spike-lxc" || g.Status != "stopped" || g.CPUs != 2 {
t.Errorf("decoded guest wrong: %+v", g)
}
}
func TestDecode_NodeStatus(t *testing.T) {
body := `{"data":{"cpu":0.0057,"uptime":73078,"loadavg":["0.11","0.09","0.05"],"pveversion":"pve-manager/9.2.2","memory":{"total":16537989120,"used":2043027456,"free":13587857408,"available":14494961664},"rootfs":{"total":100861726720,"used":4943888384,"free":95917838336,"avail":90747101184},"cpuinfo":{"cores":4,"cpus":4,"sockets":1,"model":"Intel(R) N100"}}}`
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) { return jsonResp(200, body), nil }}
s, err := newTestClient(d).NodeStatus(context.Background())
if err != nil {
t.Fatalf("NodeStatus: %v", err)
}
if len(s.LoadAvg) != 3 || s.LoadAvg[0] != "0.11" {
t.Errorf("loadavg = %v", s.LoadAvg)
}
if s.Memory.Total != 16537989120 || s.CPUInfo.Cores != 4 {
t.Errorf("decoded node status wrong: %+v", s)
}
}
func TestDecode_GuestConfig_FeaturesAndExtra(t *testing.T) {
// keyctl must survive as a string; mpN/netN land in Extra.
body := `{"data":{"arch":"amd64","cores":2,"features":"nesting=1,keyctl=1","hostname":"spike-lxc","memory":2048,"net0":"name=eth0,bridge=vmbr0,hwaddr=BC:24:11:D1:6D:CB,ip=dhcp,type=veth","rootfs":"local-lvm:vm-9001-disk-0,size=10G","unprivileged":1,"mp0":"local-lvm:1,mp=/mnt/bulk,backup=0"}}`
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) { return jsonResp(200, body), nil }}
cfg, err := newTestClient(d).GuestConfig(context.Background(), 9001)
if err != nil {
t.Fatalf("GuestConfig: %v", err)
}
if cfg.Features != "nesting=1,keyctl=1" {
t.Errorf("features = %q", cfg.Features)
}
if cfg.Unprivileged != 1 {
t.Errorf("unprivileged = %d", cfg.Unprivileged)
}
if mp := cfg.MountPoints(); mp["mp0"] != "local-lvm:1,mp=/mnt/bulk,backup=0" {
t.Errorf("mountpoints = %v", mp)
}
if nets := cfg.Nets(); nets["net0"] == "" {
t.Errorf("nets = %v", nets)
}
// "memory" must NOT be misread as an mp/net prefix match.
if mp := cfg.MountPoints(); len(mp) != 1 {
t.Errorf("expected exactly 1 mountpoint, got %v", mp)
}
}
func TestDataString_ReturnsUPID(t *testing.T) {
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) {
if r.Method != http.MethodPost {
t.Errorf("method = %s", r.Method)
}
return jsonResp(200, `{"data":"`+testUPID+`"}`), nil
}}
upid, err := newTestClient(d).Snapshot(context.Background(), 9001, "s1", "")
if err != nil {
t.Fatalf("Snapshot: %v", err)
}
if upid != testUPID {
t.Errorf("upid = %q", upid)
}
}
+62
@@ -0,0 +1,62 @@
// Package proxmox is the typed interaction layer the host agent uses to talk to
// a single Proxmox VE host. Every other agent module calls this package; it owns
// the API-first + fenced-root-CLI model the spikes proved
// (felhom.eu/documentation/proxmox-platform.md and tests/phase{0,1-2,3}-findings.md).
//
// # Two backends, one routing policy
//
// The package has two independent backends. Which path an operation takes is a
// fixed policy, not a per-call choice:
//
// - Client (API backend) — the default for everything the scoped FelhomAgent
// token can do. A hand-rolled REST client over https://<host>:8006/api2/json,
// auth header "Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET". Every
// mutating call is async: it returns a UPID and the caller polls the task with
// WaitTask until it stops, then asserts exitstatus == "OK". Authorization can
// surface at task execution, not the HTTP POST (phase1-2 §1.3) — so the POST's
// 200 is never trusted.
//
// - Privileged (root-CLI backend) — fenced to the three proven exceptions ONLY:
// (a) keyctl `pct create` for golden-image builds, (b) USB mount-by-UUID /
// fstab, (c) SMART / sensors reads. Each method cites why it cannot be the API.
//
// Client never shells out and Privileged never makes an HTTP call: the fence is
// structural (separate types, separate dependencies), and asserted in
// routing_test.go.
//
// # API-vs-root routing table (phase3-findings.md §B3 boundary)
//
// Operation Backend Why
// ------------------------------------------------- ----------- ----------------------------------
// node status / resources / metrics Client (API) Sys.Audit
// list guests + per-guest status/config Client (API) VM.Audit
// storage list + content Client (API) Datastore.Audit
// task status / log Client (API) task owner can read own task
// restore LXC from archive (PRIMARY create path) Client (API) VM.Allocate; restore preserves keyctl
// vzdump backup (stop/snapshot mode) Client (API) VM.Backup (stop-mode needs no PowerMgmt)
// snapshot / rollback / delete-snapshot Client (API) VM.Snapshot / VM.Snapshot.Rollback
// set config (mem/cpu/net/options/mountpoint) Client (API) VM.Config.*
// start / stop guest Client (API) VM.PowerMgmt
// ------------------------------------------------- ----------- ----------------------------------
// golden-image `pct create` with keyctl=1 Privileged keyctl is root@pam-only; no token qualifies
// USB mount-by-UUID / systemd mount unit / fstab Privileged host-level mount, not a Proxmox API op
// SMART / hardware sensors Privileged not API-exposed
//
// # Grounding notes for later slices (do not act on these here)
//
// - Provision-by-restore is the primary create path: a token-authorized restore
// preserves features=nesting=1,keyctl=1 (phase3 §B3); fresh `pct create` with
// keyctl is the only root-fenced create.
// - A Docker NAMED volume lives in the LXC rootfs (/var/lib/docker/volumes/<v>/_data)
// and is ALWAYS captured by vzdump. The backup=<bool> flag is honoured only for
// *volume* mount points; a bulk volume must be a dedicated backup=0 mountpoint or
// it is silently swept into the whole-guest image (phase3 §B2).
// - `pct restore` preserves the source MAC + hostname — reset network identity
// before starting alongside the original (phase1-2 §2.2).
// - An LXC has no guest agent, so snapshot-mode vzdump does NOT fsfreeze: an
// agent-initiated backup is crash-consistent only; app-consistency is the
// controller's job (quiesce, then POST /backup) (proxmox-platform.md §4.2).
//
// This slice (slice 1) wraps only the proven, read-tested op set. No reconcile
// loop, hub client, or signing — those are later slices.
package proxmox
+81
@@ -0,0 +1,81 @@
package proxmox
import (
"fmt"
"regexp"
)
// permRe extracts the offending privilege (and path) from a Proxmox permission
// message, e.g. "Permission check failed (/vms/9000, VM.Backup)" or
// "403 Permission check failed (/sdn/zones/localnetwork/vmbr0, SDN.Use)".
var permRe = regexp.MustCompile(`Permission check failed \(([^,]+),\s*([^)]+)\)`)
// APIError is returned for a non-2xx HTTP response from the Proxmox API. On a 403
// it parses the offending path + privilege so a role misconfiguration is
// diagnosable (the FelhomAgent role is exactly 16 privileges — see doc.go).
type APIError struct {
StatusCode int
Method string
Path string // request path
Body string // response body (trimmed)
// Populated from a permission-check message when present:
DeniedPath string // ACL path, e.g. "/vms/9000"
Privilege string // e.g. "VM.Backup"
}
func (e *APIError) Error() string {
if e.Privilege != "" {
return fmt.Sprintf("proxmox: %s %s -> HTTP %d: permission denied at %s (missing privilege %s)",
e.Method, e.Path, e.StatusCode, e.DeniedPath, e.Privilege)
}
return fmt.Sprintf("proxmox: %s %s -> HTTP %d: %s", e.Method, e.Path, e.StatusCode, e.Body)
}
// IsForbidden reports whether this was an HTTP 403.
func (e *APIError) IsForbidden() bool { return e.StatusCode == 403 }
// newAPIError builds an APIError, extracting privilege info from the body.
func newAPIError(statusCode int, method, path, body string) *APIError {
e := &APIError{StatusCode: statusCode, Method: method, Path: path, Body: trimBody(body)}
if m := permRe.FindStringSubmatch(body); m != nil {
e.DeniedPath = m[1]
e.Privilege = m[2]
}
return e
}
// TaskError is returned by WaitTask when a task stops with a non-OK exitstatus.
// The authorization failure for a mutating op surfaces here (in the task
// exitstatus), not at the HTTP POST — so callers must always WaitTask.
type TaskError struct {
UPID string
ExitStatus string // e.g. "403 Permission check failed (/vms/9000, VM.Backup)"
LogTail []string // last lines of the task log, for diagnosis
DeniedPath string
Privilege string
}
func (e *TaskError) Error() string {
if e.Privilege != "" {
return fmt.Sprintf("proxmox: task %s failed: permission denied at %s (missing privilege %s)",
e.UPID, e.DeniedPath, e.Privilege)
}
return fmt.Sprintf("proxmox: task %s failed: exitstatus %q", e.UPID, e.ExitStatus)
}
func newTaskError(upid, exitStatus string, logTail []string) *TaskError {
e := &TaskError{UPID: upid, ExitStatus: exitStatus, LogTail: logTail}
if m := permRe.FindStringSubmatch(exitStatus); m != nil {
e.DeniedPath = m[1]
e.Privilege = m[2]
}
return e
}
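func trimBody(s string) string {
const limit = 512
if len(s) <= limit {
return s
}
// Back up to a rune boundary so a multi-byte UTF-8 sequence is never split.
cut := limit
for cut > 0 && s[cut]&0xC0 == 0x80 {
cut--
}
return s[:cut] + "…"
}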
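The permission-message parsing shared by APIError and TaskError can be exercised on its own. A minimal sketch, reusing the same regular expression as permRe; the helper name parseDenied and the main function are illustrative assumptions, not part of the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as permRe in the package.
var permRe = regexp.MustCompile(`Permission check failed \(([^,]+),\s*([^)]+)\)`)

// parseDenied (hypothetical) returns the ACL path and privilege named in a
// Proxmox permission message, or ok=false when msg is not such a message.
func parseDenied(msg string) (path, privilege string, ok bool) {
	m := permRe.FindStringSubmatch(msg)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	path, priv, ok := parseDenied("403 Permission check failed (/vms/9000, VM.Backup)")
	fmt.Println(path, priv, ok)
}
```

Because the pattern is unanchored, it matches both a raw HTTP body and a task exitstatus that carries a leading "403 ".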
+50
@@ -0,0 +1,50 @@
package proxmox
import (
"context"
"io"
"net/http"
"strings"
)
// mockDoer is an injectable HTTP transport for the API client. It records call
// count and routes each request to fn.
type mockDoer struct {
calls int
fn func(*http.Request) (*http.Response, error)
}
func (m *mockDoer) Do(r *http.Request) (*http.Response, error) {
m.calls++
return m.fn(r)
}
// jsonResp builds an HTTP response with a JSON body.
func jsonResp(code int, body string) *http.Response {
return &http.Response{
StatusCode: code,
Body: io.NopCloser(strings.NewReader(body)),
Header: http.Header{"Content-Type": []string{"application/json"}},
}
}
// newTestClient wraps a mockDoer in a Client (bypassing NewClient's real transport).
func newTestClient(d doer) *Client {
return &Client{base: "https://host:8006/api2/json", node: "demo-felhom", token: "u@pve!t=secret", http: d}
}
// mockRunner records privileged command invocations and returns canned output.
type mockRunner struct {
calls int
lastCmd string
lastArg []string
out []byte
err error
}
func (m *mockRunner) Run(_ context.Context, name string, args ...string) ([]byte, []byte, error) {
m.calls++
m.lastCmd = name
m.lastArg = args
return m.out, nil, m.err
}
+148
@@ -0,0 +1,148 @@
package proxmox
import (
"context"
"fmt"
"net/http"
"net/url"
"strconv"
)
// Async mutating operations. Each is API-token-covered (the FelhomAgent role) and
// returns a UPID string; the caller MUST WaitTask on it and assert exitstatus OK.
// The HTTP 200 here is not proof of success (phase1-2 §1.3).
// BackupMode is the vzdump mode.
type BackupMode string
const (
// ModeStop: orderly guest shutdown -> backup -> restart. Highest consistency.
// For LXC the shutdown/restart is internal to vzdump and needs only VM.Backup
// (NOT VM.PowerMgmt) — phase1-2 §1.4.
ModeStop BackupMode = "stop"
// ModeSnapshot: lowest downtime; for an LXC this is crash-consistent only (no
// fsfreeze) — app-consistency is the controller's job (proxmox-platform.md §4.2).
ModeSnapshot BackupMode = "snapshot"
)
// RestoreLXCOptions parameterizes a restore. This is the PRIMARY create path:
// a token-authorized restore preserves features=nesting=1,keyctl=1 from the
// archive, so it needs no root (phase3 §B3). Fresh `pct create` with keyctl is
// the only root-fenced create (see Privileged.CreateGoldenLXC).
type RestoreLXCOptions struct {
VMID int // target VMID (fresh id)
Archive string // source archive volid, e.g. "local:backup/vzdump-lxc-9001-...tar.zst"
Storage string // target storage for the rootfs, e.g. "local-lvm"
Force bool // overwrite an existing VMID (destructive — caller must have authority)
}
// RestoreLXC restores an LXC from a vzdump/PBS archive via POST /nodes/{node}/lxc
// (restore=1). Returns the UPID. NOTE: pct restore preserves the source MAC +
// hostname — reset network identity before starting alongside the original
// (phase1-2 §2.2). Identity reset is a SetConfig call the caller makes after.
func (c *Client) RestoreLXC(ctx context.Context, opts RestoreLXCOptions) (string, error) {
if opts.VMID == 0 || opts.Archive == "" || opts.Storage == "" {
return "", fmt.Errorf("proxmox: RestoreLXC needs vmid, archive and storage")
}
v := url.Values{}
v.Set("vmid", strconv.Itoa(opts.VMID))
v.Set("ostemplate", opts.Archive) // pct restore source
v.Set("restore", "1")
v.Set("storage", opts.Storage)
if opts.Force {
v.Set("force", "1")
}
return c.dataString(ctx, http.MethodPost, "/nodes/"+c.node+"/lxc", v)
}
// VzdumpOptions parameterizes a backup.
type VzdumpOptions struct {
VMID int
Storage string // a storage whose content includes "backup" (e.g. "local") — NOT local-lvm
Mode BackupMode // ModeStop | ModeSnapshot
Compress string // "zstd" (used when empty), "lzo" or "gzip"; "0" disables compression
}
// Vzdump starts a backup via POST /nodes/{node}/vzdump. Returns the UPID. An
// agent-initiated vzdump is crash-consistent only for an LXC (no fsfreeze).
func (c *Client) Vzdump(ctx context.Context, opts VzdumpOptions) (string, error) {
if opts.VMID == 0 || opts.Storage == "" || opts.Mode == "" {
return "", fmt.Errorf("proxmox: Vzdump needs vmid, storage and mode")
}
v := url.Values{}
v.Set("vmid", strconv.Itoa(opts.VMID))
v.Set("storage", opts.Storage)
v.Set("mode", string(opts.Mode))
if opts.Compress == "" {
opts.Compress = "zstd"
}
v.Set("compress", opts.Compress)
return c.dataString(ctx, http.MethodPost, "/nodes/"+c.node+"/vzdump", v)
}
// Snapshot creates an LXC snapshot via POST /nodes/{node}/lxc/{vmid}/snapshot.
// A running, unprivileged LXC can be snapshotted on LVM-thin with no stop
// (phase1-2 §1.6) — this is the snapshot-before-change primitive.
func (c *Client) Snapshot(ctx context.Context, vmid int, snapname, description string) (string, error) {
if vmid == 0 || snapname == "" {
return "", fmt.Errorf("proxmox: Snapshot needs vmid and snapname")
}
v := url.Values{}
v.Set("snapname", snapname)
if description != "" {
v.Set("description", description)
}
path := fmt.Sprintf("/nodes/%s/lxc/%d/snapshot", c.node, vmid)
return c.dataString(ctx, http.MethodPost, path, v)
}
// Rollback rolls an LXC back to a snapshot via
// POST /nodes/{node}/lxc/{vmid}/snapshot/{snap}/rollback.
func (c *Client) Rollback(ctx context.Context, vmid int, snapname string) (string, error) {
if vmid == 0 || snapname == "" {
return "", fmt.Errorf("proxmox: Rollback needs vmid and snapname")
}
path := fmt.Sprintf("/nodes/%s/lxc/%d/snapshot/%s/rollback", c.node, vmid, url.PathEscape(snapname))
return c.dataString(ctx, http.MethodPost, path, url.Values{})
}
// DeleteSnapshot removes an LXC snapshot via
// DELETE /nodes/{node}/lxc/{vmid}/snapshot/{snap}.
func (c *Client) DeleteSnapshot(ctx context.Context, vmid int, snapname string) (string, error) {
if vmid == 0 || snapname == "" {
return "", fmt.Errorf("proxmox: DeleteSnapshot needs vmid and snapname")
}
path := fmt.Sprintf("/nodes/%s/lxc/%d/snapshot/%s", c.node, vmid, url.PathEscape(snapname))
return c.dataString(ctx, http.MethodDelete, path, nil)
}
// SetConfig applies config changes via PUT /nodes/{node}/lxc/{vmid}/config
// (e.g. memory, cores, net0, mpN with a backup flag). PVE may apply this
// synchronously (no UPID) — the returned string is empty in that case, and "" is
// not an error. When a UPID is returned, WaitTask on it.
//
// Identity reset after a restore (phase1-2 §2.2) is a SetConfig with
// params{"net0": "name=eth0,bridge=vmbr0,ip=dhcp"} (regenerates the MAC).
func (c *Client) SetConfig(ctx context.Context, vmid int, params map[string]string) (string, error) {
if vmid == 0 || len(params) == 0 {
return "", fmt.Errorf("proxmox: SetConfig needs vmid and at least one param")
}
v := url.Values{}
for k, val := range params {
v.Set(k, val)
}
path := fmt.Sprintf("/nodes/%s/lxc/%d/config", c.node, vmid)
return c.dataString(ctx, http.MethodPut, path, v)
}
// Start starts a guest via POST /nodes/{node}/lxc/{vmid}/status/start (VM.PowerMgmt).
func (c *Client) Start(ctx context.Context, vmid int) (string, error) {
if vmid == 0 {
return "", fmt.Errorf("proxmox: Start needs vmid")
}
path := fmt.Sprintf("/nodes/%s/lxc/%d/status/start", c.node, vmid)
return c.dataString(ctx, http.MethodPost, path, url.Values{})
}
// Stop stops a guest via POST /nodes/{node}/lxc/{vmid}/status/stop (VM.PowerMgmt).
func (c *Client) Stop(ctx context.Context, vmid int) (string, error) {
if vmid == 0 {
return "", fmt.Errorf("proxmox: Stop needs vmid")
}
path := fmt.Sprintf("/nodes/%s/lxc/%d/status/stop", c.node, vmid)
return c.dataString(ctx, http.MethodPost, path, url.Values{})
}
+203
@@ -0,0 +1,203 @@
package proxmox
import (
"context"
"encoding/json"
"fmt"
"os/exec"
"strconv"
)
// The Privileged backend is fenced to the THREE proven OS-root exceptions only
// (phase3 §B3 boundary, doc.go routing table):
//
// (a) keyctl `pct create` for golden-image builds,
// (b) USB mount-by-UUID / fstab,
// (c) SMART / sensors reads.
//
// It runs host commands through a Runner (direct exec or sudo). It makes NO HTTP
// call — the fence between API ops and root ops is structural: Client owns the
// API, Privileged owns the shell. routing_test.go asserts neither crosses over.
//
// Everything else — the entire guest lifecycle including restore — goes through
// the API Client. Do NOT add non-exception methods here.
// Runner executes a host command and returns its stdout/stderr. *ExecRunner is the
// production implementation; tests inject a mock to assert which commands ran.
type Runner interface {
Run(ctx context.Context, name string, args ...string) (stdout, stderr []byte, err error)
}
// RunnerMode selects how privileged commands are executed.
type RunnerMode string
const (
// RunnerDirect: exec the binary directly (agent already runs as root — not the
// recommended uid model, see README; useful in dev/CI).
RunnerDirect RunnerMode = "direct"
// RunnerSudo: prefix with sudo (the intended model — agent runs as a non-root
// service user with a narrow sudoers allowlist, 03 §3/§12).
RunnerSudo RunnerMode = "sudo"
)
// ExecRunner runs commands via os/exec, optionally through sudo.
type ExecRunner struct {
Mode RunnerMode
SudoPath string // defaults to "sudo" when Mode == RunnerSudo
}
// Run implements Runner.
func (r *ExecRunner) Run(ctx context.Context, name string, args ...string) ([]byte, []byte, error) {
var cmd *exec.Cmd
if r.Mode == RunnerSudo {
sudo := r.SudoPath
if sudo == "" {
sudo = "sudo"
}
cmd = exec.CommandContext(ctx, sudo, append([]string{"-n", name}, args...)...)
} else {
cmd = exec.CommandContext(ctx, name, args...)
}
var stdout, stderr capBuf
cmd.Stdout = &stdout
cmd.Stderr = &stderr
err := cmd.Run()
return stdout.b, stderr.b, err
}
// Privileged is the root-CLI backend.
type Privileged struct {
runner Runner
node string
}
// NewPrivileged builds the fenced root backend.
func NewPrivileged(runner Runner, node string) *Privileged {
return &Privileged{runner: runner, node: node}
}
// GoldenLXCSpec describes a golden-base CT to build fresh.
type GoldenLXCSpec struct {
VMID int
OSTemplate string // CT template volid, e.g. "local:vztmpl/debian-13-standard_..._amd64.tar.zst"
Storage string // rootfs storage, e.g. "local-lvm"
RootFSGB int
Cores int
MemoryMB int
Hostname string
// Features is forced to "nesting=1,keyctl=1" — keyctl is exactly why this is
// root-fenced.
}
// CreateGoldenLXC builds a Docker-capable golden base CT with keyctl=1.
//
// WHY THIS CANNOT BE THE API: setting feature flags other than `nesting` on
// create is `root@pam`-only — `changing feature flags (except nesting) is only
// allowed for root@pam`. No API token qualifies, not even a non-privsep root@pam
// token (same 403). This is the ONLY root-fenced create; the per-customer path
// provisions by restore, which preserves keyctl with no root (phase3 §B3).
//
// This is a one-time/maintenance op at enrollment (03 §9), off the per-customer path.
func (p *Privileged) CreateGoldenLXC(ctx context.Context, spec GoldenLXCSpec) error {
if spec.VMID == 0 || spec.OSTemplate == "" || spec.Storage == "" {
return fmt.Errorf("proxmox: CreateGoldenLXC needs vmid, ostemplate and storage")
}
rootfs := spec.Storage
if spec.RootFSGB > 0 {
rootfs = fmt.Sprintf("%s:%d", spec.Storage, spec.RootFSGB)
}
args := []string{
"create", strconv.Itoa(spec.VMID), spec.OSTemplate,
"--unprivileged", "1",
"--features", "nesting=1,keyctl=1",
"--rootfs", rootfs,
}
if spec.Cores > 0 {
args = append(args, "--cores", strconv.Itoa(spec.Cores))
}
if spec.MemoryMB > 0 {
args = append(args, "--memory", strconv.Itoa(spec.MemoryMB))
}
if spec.Hostname != "" {
args = append(args, "--hostname", spec.Hostname)
}
return p.run(ctx, "pct", args...)
}
// MountUSBByUUID mounts a filesystem by UUID at target (creating the mountpoint).
//
// WHY THIS CANNOT BE THE API: a physical host mount is not a Proxmox API op; it is
// a host-level mount handled by OS root / a narrow sudoers entry (phase3 §B3).
// fstab persistence is a later-slice concern (03 §7 storage manifest).
func (p *Privileged) MountUSBByUUID(ctx context.Context, uuid, target string) error {
if uuid == "" || target == "" {
return fmt.Errorf("proxmox: MountUSBByUUID needs uuid and target")
}
if err := p.run(ctx, "mkdir", "-p", target); err != nil {
return err
}
return p.run(ctx, "mount", "UUID="+uuid, target)
}
// SMART returns parsed `smartctl -a -j` JSON for a device.
//
// WHY THIS CANNOT BE THE API: disk SMART data is not exposed by the Proxmox API;
// it is read with OS root via smartctl (phase3 §B3).
func (p *Privileged) SMART(ctx context.Context, device string) (map[string]any, error) {
if device == "" {
return nil, fmt.Errorf("proxmox: SMART needs a device")
}
out, stderr, err := p.runner.Run(ctx, "smartctl", "-a", "-j", device)
if err != nil {
// smartctl uses nonzero exit codes as bitmask warnings even on success;
// trust parseable JSON output over the exit code.
if len(out) == 0 {
return nil, fmt.Errorf("proxmox: smartctl %s: %w: %s", device, err, stderr)
}
}
var m map[string]any
if err := json.Unmarshal(out, &m); err != nil {
return nil, fmt.Errorf("proxmox: parsing smartctl JSON: %w", err)
}
return m, nil
}
// Sensors returns parsed `sensors -j` JSON (hardware temperatures/fans).
//
// WHY THIS CANNOT BE THE API: hardware sensors are not API-exposed (phase3 §B3).
func (p *Privileged) Sensors(ctx context.Context) (map[string]any, error) {
out, stderr, err := p.runner.Run(ctx, "sensors", "-j")
if err != nil && len(out) == 0 {
return nil, fmt.Errorf("proxmox: sensors: %w: %s", err, stderr)
}
var m map[string]any
if err := json.Unmarshal(out, &m); err != nil {
return nil, fmt.Errorf("proxmox: parsing sensors JSON: %w", err)
}
return m, nil
}
// run executes a command and wraps a nonzero exit with its stderr.
func (p *Privileged) run(ctx context.Context, name string, args ...string) error {
_, stderr, err := p.runner.Run(ctx, name, args...)
if err != nil {
return fmt.Errorf("proxmox: %s %v: %w: %s", name, args, err, trimBody(string(stderr)))
}
return nil
}
// capBuf is a tiny capped buffer so a runaway command can't blow memory.
type capBuf struct{ b []byte }
func (c *capBuf) Write(p []byte) (int, error) {
const max = 1 << 20 // 1 MiB
if len(c.b) < max {
room := max - len(c.b)
if room >= len(p) {
c.b = append(c.b, p...)
} else {
c.b = append(c.b, p[:room]...)
}
}
return len(p), nil // always report full consumption
}
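How ExecRunner composes the final command line matters for a sudoers allowlist, since sudo sees the real binary as an argument. A small sketch of that composition, under the assumption that it mirrors ExecRunner.Run; buildArgv is a hypothetical helper and does not execute anything.

```go
package main

import "fmt"

// buildArgv (hypothetical) mirrors how ExecRunner.Run composes the command
// line: in sudo mode the real binary becomes an argument of `sudo -n`.
func buildArgv(useSudo bool, sudoPath, name string, args ...string) []string {
	if !useSudo {
		return append([]string{name}, args...)
	}
	if sudoPath == "" {
		sudoPath = "sudo"
	}
	// -n makes sudo fail instead of prompting for a password.
	return append([]string{sudoPath, "-n", name}, args...)
}

func main() {
	fmt.Println(buildArgv(true, "", "smartctl", "-a", "-j", "/dev/sda"))
	fmt.Println(buildArgv(false, "", "sensors", "-j"))
}
```

With the non-interactive flag, a missing sudoers entry shows up as an immediate command error rather than a hung service waiting on a prompt.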
+78
@@ -0,0 +1,78 @@
package proxmox
import (
"context"
"fmt"
"net/url"
)
// Read-only query operations. All API-backed (Datastore.Audit / VM.Audit /
// Sys.Audit). These are what `felhom-agent --selftest` exercises against a live
// host — they mutate nothing.
// Version returns GET /version.
func (c *Client) Version(ctx context.Context) (Version, error) {
var v Version
return v, c.get(ctx, "/version", &v)
}
// Nodes returns GET /nodes. Use this to confirm the node name and read each
// node's ssl_fingerprint (which is what to pin in TLSConfig).
func (c *Client) Nodes(ctx context.Context) ([]Node, error) {
var ns []Node
return ns, c.get(ctx, "/nodes", &ns)
}
// NodeStatus returns GET /nodes/{node}/status (host metrics; needs Sys.Audit).
func (c *Client) NodeStatus(ctx context.Context) (NodeStatus, error) {
var s NodeStatus
return s, c.get(ctx, "/nodes/"+c.node+"/status", &s)
}
// ListLXC returns GET /nodes/{node}/lxc (the guests on this node).
func (c *Client) ListLXC(ctx context.Context) ([]Guest, error) {
var gs []Guest
return gs, c.get(ctx, "/nodes/"+c.node+"/lxc", &gs)
}
// GuestStatus returns GET /nodes/{node}/lxc/{vmid}/status/current. The API body
// has no vmid field (it is in the path), so it is set from the argument.
func (c *Client) GuestStatus(ctx context.Context, vmid int) (Guest, error) {
var g Guest
path := fmt.Sprintf("/nodes/%s/lxc/%d/status/current", c.node, vmid)
if err := c.get(ctx, path, &g); err != nil {
return Guest{}, err
}
g.VMID = vmid
return g, nil
}
// GuestConfig returns GET /nodes/{node}/lxc/{vmid}/config.
func (c *Client) GuestConfig(ctx context.Context, vmid int) (GuestConfig, error) {
var cfg GuestConfig
path := fmt.Sprintf("/nodes/%s/lxc/%d/config", c.node, vmid)
return cfg, c.get(ctx, path, &cfg)
}
// ListStorage returns GET /storage (cluster-wide storage definitions).
func (c *Client) ListStorage(ctx context.Context) ([]Storage, error) {
var ss []Storage
return ss, c.get(ctx, "/storage", &ss)
}
// NodeStorage returns GET /nodes/{node}/storage (storage with live usage).
func (c *Client) NodeStorage(ctx context.Context) ([]Storage, error) {
var ss []Storage
return ss, c.get(ctx, "/nodes/"+c.node+"/storage", &ss)
}
// StorageContent returns GET /nodes/{node}/storage/{store}/content (e.g. vzdump
// archives + CT templates available for a restore).
func (c *Client) StorageContent(ctx context.Context, store string) ([]StorageContent, error) {
var cs []StorageContent
path := fmt.Sprintf("/nodes/%s/storage/%s/content", c.node, url.PathEscape(store))
return cs, c.get(ctx, path, &cs)
}
// urlEscape escapes a path segment (a UPID contains ':' and '@').
func urlEscape(s string) string { return url.PathEscape(s) }
+97
@@ -0,0 +1,97 @@
package proxmox
import (
"context"
"net/http"
"testing"
)
// TestRouting_APIOpsNeverShellOut asserts the API path never invokes the
// privileged runner: API ops (read + mutating) go only through the HTTP doer.
func TestRouting_APIOpsNeverShellOut(t *testing.T) {
runner := &mockRunner{}
// If any API op tried to use a runner, it would have to be wired here — it
// cannot be, because Client has no runner field. We still assert structurally:
// run a batch of API ops with a recording doer and confirm the runner is idle.
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) {
// Generic OK responses sufficient for the calls below.
if r.Method == http.MethodGet {
return jsonResp(200, `{"data":[]}`), nil
}
return jsonResp(200, `{"data":"`+testUPID+`"}`), nil
}}
c := newTestClient(d)
ctx := context.Background()
_, _ = c.Version(ctx)
_, _ = c.Nodes(ctx)
_, _ = c.ListLXC(ctx)
_, _ = c.NodeStorage(ctx)
_, _ = c.Snapshot(ctx, 9001, "s1", "")
_, _ = c.Rollback(ctx, 9001, "s1")
_, _ = c.Vzdump(ctx, VzdumpOptions{VMID: 9001, Storage: "local", Mode: ModeStop})
_, _ = c.RestoreLXC(ctx, RestoreLXCOptions{VMID: 9100, Archive: "local:backup/a.tar.zst", Storage: "local-lvm"})
_, _ = c.Start(ctx, 9001)
_, _ = c.Stop(ctx, 9001)
if runner.calls != 0 {
t.Fatalf("API ops invoked the privileged runner %d time(s) — fence broken", runner.calls)
}
if d.calls == 0 {
t.Fatalf("expected API ops to use the HTTP doer")
}
}
// TestRouting_PrivilegedOpsNeverHTTP asserts the fenced root path never makes an
// HTTP call: Privileged ops go only through the runner.
func TestRouting_PrivilegedOpsNeverHTTP(t *testing.T) {
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) {
t.Fatalf("privileged op made an HTTP call to %s — fence broken", r.URL)
return nil, nil
}}
_ = d // a Privileged has no doer field; this doer is unreachable by construction.
runner := &mockRunner{out: []byte(`{"ok":true}`)}
p := NewPrivileged(runner, "demo-felhom")
ctx := context.Background()
if err := p.CreateGoldenLXC(ctx, GoldenLXCSpec{VMID: 9999, OSTemplate: "local:vztmpl/x.tar.zst", Storage: "local-lvm"}); err != nil {
t.Fatalf("CreateGoldenLXC: %v", err)
}
if err := p.MountUSBByUUID(ctx, "1234-ABCD", "/mnt/usb"); err != nil {
t.Fatalf("MountUSBByUUID: %v", err)
}
if _, err := p.SMART(ctx, "/dev/sda"); err != nil {
t.Fatalf("SMART: %v", err)
}
if _, err := p.Sensors(ctx); err != nil {
t.Fatalf("Sensors: %v", err)
}
if runner.calls == 0 {
t.Fatalf("expected privileged ops to use the runner")
}
}
// TestPrivileged_CreateGoldenForcesKeyctl asserts the golden create always carries
// the keyctl feature flag (the whole reason it is root-fenced).
func TestPrivileged_CreateGoldenForcesKeyctl(t *testing.T) {
runner := &mockRunner{}
p := NewPrivileged(runner, "demo-felhom")
if err := p.CreateGoldenLXC(context.Background(), GoldenLXCSpec{
VMID: 9999, OSTemplate: "local:vztmpl/x.tar.zst", Storage: "local-lvm", RootFSGB: 8,
}); err != nil {
t.Fatalf("CreateGoldenLXC: %v", err)
}
if runner.lastCmd != "pct" {
t.Errorf("cmd = %q, want pct", runner.lastCmd)
}
var sawFeatures bool
for i, a := range runner.lastArg {
if a == "--features" && i+1 < len(runner.lastArg) && runner.lastArg[i+1] == "nesting=1,keyctl=1" {
sawFeatures = true
}
}
if !sawFeatures {
t.Errorf("pct create args missing keyctl features: %v", runner.lastArg)
}
}
+141
@@ -0,0 +1,141 @@
package proxmox
import (
"context"
"fmt"
"time"
)
// TaskStatus is GET /nodes/{node}/tasks/{upid}/status. While the task runs,
// Status == "running" and ExitStatus is empty; once it stops, Status == "stopped"
// and ExitStatus is "OK" or an error string (e.g. a 403 permission message).
type TaskStatus struct {
UPID string `json:"upid"`
ID string `json:"id"`
Node string `json:"node"`
Type string `json:"type"`
User string `json:"user"`
Status string `json:"status"` // "running" | "stopped"
ExitStatus string `json:"exitstatus"` // present once stopped
PID int64 `json:"pid"`
StartTime int64 `json:"starttime"`
}
// Running reports whether the task is still executing.
func (t TaskStatus) Running() bool { return t.Status == "running" }
// OK reports whether the task stopped successfully.
func (t TaskStatus) OK() bool { return t.Status == "stopped" && t.ExitStatus == "OK" }
// taskLogLine is one entry of GET /nodes/{node}/tasks/{upid}/log: {"n":N,"t":"..."}.
type taskLogLine struct {
N int `json:"n"`
T string `json:"t"`
}
// WaitOptions tunes WaitTask polling. Zero value yields sane defaults.
type WaitOptions struct {
// Interval is the first poll gap (default 1s).
Interval time.Duration
// MaxInterval caps the backed-off gap (default 5s).
MaxInterval time.Duration
// Timeout bounds the whole wait (default 10m). Restore/vzdump can be slow;
// callers may raise it. Cancelling ctx or reaching its deadline also stops the wait.
Timeout time.Duration
}
func (o WaitOptions) withDefaults() WaitOptions {
if o.Interval <= 0 {
o.Interval = 1 * time.Second
}
if o.MaxInterval <= 0 {
o.MaxInterval = 5 * time.Second
}
if o.Timeout <= 0 {
o.Timeout = 10 * time.Minute
}
return o
}
// TaskStatusOnce fetches the current task status (one HTTP call).
func (c *Client) TaskStatusOnce(ctx context.Context, upid string) (TaskStatus, error) {
u, err := ParseUPID(upid)
if err != nil {
return TaskStatus{}, err
}
var st TaskStatus
path := fmt.Sprintf("/nodes/%s/tasks/%s/status", u.Node, urlEscape(upid))
if err := c.get(ctx, path, &st); err != nil {
return TaskStatus{}, err
}
return st, nil
}
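// TaskLogTail fetches up to limit log lines for a task (for diagnosis). The log
// endpoint pages forward from line 0 unless a start offset is given, so for a log
// longer than limit these are the leading lines, not the trailing ones.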
func (c *Client) TaskLogTail(ctx context.Context, upid string, limit int) ([]string, error) {
u, err := ParseUPID(upid)
if err != nil {
return nil, err
}
if limit <= 0 {
limit = 20
}
var lines []taskLogLine
path := fmt.Sprintf("/nodes/%s/tasks/%s/log?limit=%d", u.Node, urlEscape(upid), limit)
if err := c.get(ctx, path, &lines); err != nil {
return nil, err
}
out := make([]string, 0, len(lines))
for _, l := range lines {
out = append(out, l.T)
}
return out, nil
}
// WaitTask polls a task until it stops, then asserts exitstatus == "OK". On any
// non-OK exit it returns a *TaskError carrying the exitstatus, the parsed
// privilege (if it was a permission failure), and a tail of the task log.
//
// This is the contract for EVERY mutating op: the POST's HTTP 200 is not proof of
// success — authorization can fail at task execution (phase1-2 §1.3).
func (c *Client) WaitTask(ctx context.Context, upid string, opts WaitOptions) (TaskStatus, error) {
opts = opts.withDefaults()
if _, err := ParseUPID(upid); err != nil {
return TaskStatus{}, err
}
ctx, cancel := context.WithTimeout(ctx, opts.Timeout)
defer cancel()
interval := opts.Interval
timer := time.NewTimer(0) // first poll immediately
defer timer.Stop()
for {
select {
case <-ctx.Done():
return TaskStatus{}, fmt.Errorf("proxmox: waiting for task %s: %w", upid, ctx.Err())
case <-timer.C:
}
st, err := c.TaskStatusOnce(ctx, upid)
if err != nil {
return TaskStatus{}, err
}
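if st.Running() || st.Status == "" {
// Wait the current gap (never above MaxInterval), then double it for the
// next round, so the first gap is Interval as WaitOptions documents.
if interval > opts.MaxInterval {
interval = opts.MaxInterval
}
timer.Reset(interval)
interval *= 2
continue
}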
// stopped
if st.ExitStatus == "OK" {
return st, nil
}
tail, _ := c.TaskLogTail(ctx, upid, 20) // best-effort
return st, newTaskError(upid, st.ExitStatus, tail)
}
}
+81
@@ -0,0 +1,81 @@
package proxmox
import (
"context"
"errors"
"net/http"
"strings"
"testing"
"time"
)
const testUPID = "UPID:demo-felhom:00026454:004E3431:6A265E53:vzsnapshot:9001:root@pam:"
// fastWait keeps tests quick.
var fastWait = WaitOptions{Interval: time.Millisecond, MaxInterval: 2 * time.Millisecond, Timeout: time.Second}
func TestWaitTask_RunningThenOK(t *testing.T) {
var n int
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) {
n++
if n == 1 {
return jsonResp(200, `{"data":{"upid":"`+testUPID+`","status":"running"}}`), nil
}
return jsonResp(200, `{"data":{"upid":"`+testUPID+`","status":"stopped","exitstatus":"OK"}}`), nil
}}
st, err := newTestClient(d).WaitTask(context.Background(), testUPID, fastWait)
if err != nil {
t.Fatalf("WaitTask: %v", err)
}
if !st.OK() {
t.Errorf("status not OK: %+v", st)
}
}
func TestWaitTask_FailedSurfacesPrivilege(t *testing.T) {
// vzdump against an unauthorized vmid: 200+UPID, then the 403 in exitstatus.
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) {
if strings.Contains(r.URL.Path, "/log") {
return jsonResp(200, `{"data":[{"n":1,"t":"TASK ERROR: 403 Permission check failed (/vms/9000, VM.Backup)"}]}`), nil
}
return jsonResp(200, `{"data":{"upid":"`+testUPID+`","status":"stopped","exitstatus":"403 Permission check failed (/vms/9000, VM.Backup)"}}`), nil
}}
_, err := newTestClient(d).WaitTask(context.Background(), testUPID, fastWait)
var te *TaskError
if !errors.As(err, &te) {
t.Fatalf("want *TaskError, got %T: %v", err, err)
}
if te.Privilege != "VM.Backup" {
t.Errorf("privilege = %q, want VM.Backup", te.Privilege)
}
if te.DeniedPath != "/vms/9000" {
t.Errorf("denied path = %q", te.DeniedPath)
}
if len(te.LogTail) == 0 {
t.Errorf("expected a log tail")
}
}
func TestWaitTask_Timeout(t *testing.T) {
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) {
return jsonResp(200, `{"data":{"upid":"`+testUPID+`","status":"running"}}`), nil
}}
opts := WaitOptions{Interval: time.Millisecond, MaxInterval: time.Millisecond, Timeout: 30 * time.Millisecond}
_, err := newTestClient(d).WaitTask(context.Background(), testUPID, opts)
if err == nil || !errors.Is(err, context.DeadlineExceeded) {
t.Fatalf("want deadline-exceeded, got %v", err)
}
}
func TestWaitTask_CtxCancel(t *testing.T) {
d := &mockDoer{fn: func(r *http.Request) (*http.Response, error) {
return jsonResp(200, `{"data":{"upid":"`+testUPID+`","status":"running"}}`), nil
}}
ctx, cancel := context.WithCancel(context.Background())
go func() { time.Sleep(20 * time.Millisecond); cancel() }()
opts := WaitOptions{Interval: time.Millisecond, MaxInterval: time.Millisecond, Timeout: time.Minute}
_, err := newTestClient(d).WaitTask(ctx, testUPID, opts)
if err == nil || !errors.Is(err, context.Canceled) {
t.Fatalf("want canceled, got %v", err)
}
}
+88
View File
@@ -0,0 +1,88 @@
package proxmox
import (
"crypto/sha256"
"crypto/tls"
"crypto/x509"
"encoding/hex"
"fmt"
"os"
"strings"
)
// TLSConfig describes how the client trusts the Proxmox host's certificate. The
// host serves a self-signed cert by default (proxmox-platform.md §3.1); we do NOT
// blanket-disable verification. Pick exactly one trust mechanism:
//
// - CAFile: path to a PEM bundle (the PVE CA / a real cert chain) — full verify.
// - Fingerprint: SHA-256 of the leaf cert (hex, colons optional). Verification is
// pinned to that exact cert — strong trust for a self-signed host without a CA.
// The /nodes API returns each node's ssl_fingerprint, which is what to pin.
// - InsecureSkipVerify: explicitly off by default. Only acceptable for a
// --selftest against 127.0.0.1; it is named honestly, not hidden behind a flag
// that sounds benign.
//
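// If none is set, standard system verification applies (which will fail on a
// self-signed host; that is the safe default, so the operator must pin). If more
// than one is set, build honors only the first of InsecureSkipVerify, Fingerprint,
// CAFile, in that order; the others are silently ignored.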
type TLSConfig struct {
CAFile string
Fingerprint string
InsecureSkipVerify bool
}
func (t TLSConfig) build() (*tls.Config, error) {
switch {
case t.InsecureSkipVerify:
// Caller opted in explicitly and by an honestly-named field.
return &tls.Config{InsecureSkipVerify: true}, nil //nolint:gosec // documented, config-gated, off by default
case t.Fingerprint != "":
want, err := normalizeFingerprint(t.Fingerprint)
if err != nil {
return nil, err
}
// Pin to the leaf cert's SHA-256. We disable the default chain check (a
// self-signed cert has no CA) but enforce an exact-cert match instead, so
// this is pinning, not "skip verify".
return &tls.Config{
InsecureSkipVerify: true, //nolint:gosec // replaced by the pin check below
VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
if len(rawCerts) == 0 {
return fmt.Errorf("proxmox: TLS pin: peer presented no certificate")
}
got := sha256.Sum256(rawCerts[0])
if hex.EncodeToString(got[:]) != want {
return fmt.Errorf("proxmox: TLS pin mismatch: server cert sha256 does not match configured fingerprint")
}
return nil
},
}, nil
case t.CAFile != "":
pem, err := os.ReadFile(t.CAFile)
if err != nil {
return nil, fmt.Errorf("proxmox: reading TLS CA file: %w", err)
}
pool := x509.NewCertPool()
if !pool.AppendCertsFromPEM(pem) {
return nil, fmt.Errorf("proxmox: TLS CA file %q contained no usable certificates", t.CAFile)
}
return &tls.Config{RootCAs: pool}, nil
default:
return &tls.Config{}, nil // system roots; safe default
}
}
// normalizeFingerprint lowercases and strips colons/whitespace, validating that
// the result is a 64-char (32-byte) hex SHA-256.
func normalizeFingerprint(fp string) (string, error) {
s := strings.ToLower(strings.NewReplacer(":", "", " ", "", "\t", "").Replace(fp))
if len(s) != 64 {
return "", fmt.Errorf("proxmox: fingerprint must be a SHA-256 (64 hex chars), got %d", len(s))
}
if _, err := hex.DecodeString(s); err != nil {
return "", fmt.Errorf("proxmox: fingerprint is not valid hex: %w", err)
}
return s, nil
}
+46
View File
@@ -0,0 +1,46 @@
package proxmox
import "testing"
func TestNormalizeFingerprint(t *testing.T) {
// 64-hex with colons (the /nodes ssl_fingerprint form) normalizes fine.
const withColons = "BA:7C:99:7D:45:D0:67:91:E2:F2:72:74:6E:D6:9F:83:51:D1:61:E5:C3:BD:F6:A0:B8:0B:E3:D8:DB:89:5B:CF"
got, err := normalizeFingerprint(withColons)
if err != nil {
t.Fatalf("normalize: %v", err)
}
if len(got) != 64 {
t.Errorf("len = %d", len(got))
}
if got != "ba7c997d45d06791e2f272746ed69f8351d161e5c3bdf6a0b80be3d8db895bcf" {
t.Errorf("got %q", got)
}
}
func TestNormalizeFingerprint_Bad(t *testing.T) {
for _, c := range []string{"", "tooshort", "zz7c997d45d06791e2f272746ed69f8351d161e5c3bdf6a0b80be3d8db895bcf"} {
if _, err := normalizeFingerprint(c); err == nil {
t.Errorf("normalize(%q) = nil, want error", c)
}
}
}
func TestTLSConfig_Build(t *testing.T) {
// Fingerprint pin produces a config with a pin verifier (and the documented
// InsecureSkipVerify=true that the verifier overrides).
c, err := (TLSConfig{Fingerprint: "ba7c997d45d06791e2f272746ed69f8351d161e5c3bdf6a0b80be3d8db895bcf"}).build()
if err != nil {
t.Fatalf("build pin: %v", err)
}
if c.VerifyPeerCertificate == nil {
t.Errorf("pin config missing VerifyPeerCertificate")
}
// Default (no trust set) uses system roots, no skip.
def, err := (TLSConfig{}).build()
if err != nil {
t.Fatalf("build default: %v", err)
}
if def.InsecureSkipVerify {
t.Errorf("default must verify")
}
}
+164
View File
@@ -0,0 +1,164 @@
package proxmox
import "encoding/json"
// Types mirror the exact JSON shapes captured from the live demo host
// (demo-felhom, PVE 9.2.2, 2026-06-08) via `pvesh get ... --output-format json`.
// Decoding ignores unknown fields, so we depend only on the fields we use.
// Version is GET /version.
type Version struct {
Release string `json:"release"` // "9.2"
RepoID string `json:"repoid"`
Version string `json:"version"` // "9.2.2"
}
// Node is one entry of GET /nodes.
type Node struct {
Node string `json:"node"` // node name, e.g. "demo-felhom"
Status string `json:"status"` // "online"
CPU float64 `json:"cpu"` // CPU utilization as a fraction 0..1
MaxCPU int `json:"maxcpu"`
Mem int64 `json:"mem"`
MaxMem int64 `json:"maxmem"`
Disk int64 `json:"disk"`
MaxDisk int64 `json:"maxdisk"`
Uptime int64 `json:"uptime"`
SSLFingerprint string `json:"ssl_fingerprint"`
}
// NodeStatus is GET /nodes/{node}/status (host metrics; needs Sys.Audit).
type NodeStatus struct {
CPU float64 `json:"cpu"` // CPU utilization as a fraction 0..1
Uptime int64 `json:"uptime"`
LoadAvg []string `json:"loadavg"` // 1/5/15-min, as strings in the API
PVEVersion string `json:"pveversion"`
KVersion string `json:"kversion"`
Memory struct {
Total int64 `json:"total"`
Used int64 `json:"used"`
Free int64 `json:"free"`
Available int64 `json:"available"`
} `json:"memory"`
RootFS struct {
Total int64 `json:"total"`
Used int64 `json:"used"`
Free int64 `json:"free"`
Avail int64 `json:"avail"`
} `json:"rootfs"`
Swap struct {
Total int64 `json:"total"`
Used int64 `json:"used"`
Free int64 `json:"free"`
} `json:"swap"`
CPUInfo struct {
Cores int `json:"cores"`
CPUs int `json:"cpus"`
Sockets int `json:"sockets"`
Model string `json:"model"`
} `json:"cpuinfo"`
}
// Guest is one entry of GET /nodes/{node}/lxc and the body of
// GET /nodes/{node}/lxc/{vmid}/status/current. The status/current response has no
// vmid field (it is in the path), so callers set VMID from the request argument.
type Guest struct {
VMID int `json:"vmid"`
Name string `json:"name"`
Status string `json:"status"` // "running" | "stopped"
Type string `json:"type"` // "lxc"
CPUs int `json:"cpus"`
CPU float64 `json:"cpu"`
Mem int64 `json:"mem"`
MaxMem int64 `json:"maxmem"`
Disk int64 `json:"disk"`
MaxDisk int64 `json:"maxdisk"`
Uptime int64 `json:"uptime"`
}
// GuestConfig is GET /nodes/{node}/lxc/{vmid}/config. The config surface is
// dynamic (net0..netN, mp0..mpN, unusedN), so known fields are typed and the full
// raw map is preserved in Extra for the dynamic ones.
type GuestConfig struct {
Hostname string `json:"hostname"`
Arch string `json:"arch"`
Cores int `json:"cores"`
Memory int64 `json:"memory"`
Swap int64 `json:"swap"`
OSType string `json:"ostype"`
RootFS string `json:"rootfs"`
Features string `json:"features"` // e.g. "nesting=1,keyctl=1"
Unprivileged int `json:"unprivileged"` // 1 if unprivileged
Digest string `json:"digest"`
// Extra holds every field as raw JSON, including the dynamic netN/mpN/unusedN
// keys not promoted above.
Extra map[string]json.RawMessage `json:"-"`
}
// UnmarshalJSON fills both the typed known fields and the raw Extra map.
func (g *GuestConfig) UnmarshalJSON(b []byte) error {
type alias GuestConfig // avoid recursion
var a alias
if err := json.Unmarshal(b, &a); err != nil {
return err
}
*g = GuestConfig(a)
return json.Unmarshal(b, &g.Extra)
}
// MountPoints returns the mpN entries (e.g. "mp0" -> "local-lvm:1,mp=/mnt/mp1,backup=0")
// pulled from Extra. Relevant for later slices' bulk-volume placement.
func (g *GuestConfig) MountPoints() map[string]string {
return g.prefixed("mp")
}
// Nets returns the netN entries from Extra.
func (g *GuestConfig) Nets() map[string]string {
return g.prefixed("net")
}
func (g *GuestConfig) prefixed(prefix string) map[string]string {
out := map[string]string{}
for k, raw := range g.Extra {
if len(k) <= len(prefix) || k[:len(prefix)] != prefix {
continue
}
// require a digit right after the prefix (mp0, net0), so keys that merely
// share the prefix are skipped; only that first character is checked
if c := k[len(prefix)]; c < '0' || c > '9' {
continue
}
var s string
if json.Unmarshal(raw, &s) == nil {
out[k] = s
}
}
return out
}
// Storage is one entry of GET /storage (cluster) and GET /nodes/{node}/storage
// (the latter adds usage fields). Unused fields stay zero.
type Storage struct {
Storage string `json:"storage"`
Type string `json:"type"` // "dir" | "lvmthin" | "nfs" | "cifs" | "pbs"
Content string `json:"content"` // comma list, e.g. "vztmpl,backup,iso,import"
Path string `json:"path,omitempty"`
Total int64 `json:"total,omitempty"`
Used int64 `json:"used,omitempty"`
Avail int64 `json:"avail,omitempty"`
Active int `json:"active,omitempty"`
Enabled int `json:"enabled,omitempty"`
Shared int `json:"shared,omitempty"`
UsedFraction float64 `json:"used_fraction,omitempty"`
}
// StorageContent is one entry of GET /nodes/{node}/storage/{store}/content
// (e.g. vzdump archives, CT templates, guest volumes).
type StorageContent struct {
VolID string `json:"volid"` // e.g. "local:backup/vzdump-lxc-9001-...tar.zst"
Content string `json:"content"`
Format string `json:"format"`
Size int64 `json:"size"`
CTime int64 `json:"ctime"`
VMID int `json:"vmid,omitempty"`
}
+63
View File
@@ -0,0 +1,63 @@
package proxmox
import (
"fmt"
"strconv"
"strings"
)
// UPID is a parsed Proxmox task identifier. Long operations (vzdump, restore,
// snapshot, ...) return a UPID rather than a result; the caller polls the task.
//
// Wire format (captured live, demo-felhom):
//
// UPID:demo-felhom:00026454:004E3431:6A265E53:vzdestroy:9021:root@pam:
// |node |pid-hex |pstart-hx|start-hex |worker |id |user |(trailing)
type UPID struct {
Raw string
Node string
PID uint64 // decoded from hex
PStart uint64 // decoded from hex
StartTime uint64 // decoded from hex (unix seconds)
Worker string // task type, e.g. "vzdump", "vzdestroy", "vzsnapshot"
ID string // worker target, e.g. the vmid as a string
User string // e.g. "root@pam" or "felhom-agent@pve!agent"
}
// ParseUPID parses a Proxmox UPID string. The user field may contain '@' and '!'
// but never ':', so a plain colon-split is correct.
func ParseUPID(s string) (UPID, error) {
if !strings.HasPrefix(s, "UPID:") {
return UPID{}, fmt.Errorf("proxmox: not a UPID: %q", s)
}
// UPID:node:pid:pstart:starttime:worker:id:user: -> 9 fields with the trailing
// colon (last one empty); only the first 8 are required here
parts := strings.Split(s, ":")
if len(parts) < 8 {
return UPID{}, fmt.Errorf("proxmox: malformed UPID (%d fields): %q", len(parts), s)
}
pid, err := strconv.ParseUint(parts[2], 16, 64)
if err != nil {
return UPID{}, fmt.Errorf("proxmox: bad UPID pid %q: %w", parts[2], err)
}
pstart, err := strconv.ParseUint(parts[3], 16, 64)
if err != nil {
return UPID{}, fmt.Errorf("proxmox: bad UPID pstart %q: %w", parts[3], err)
}
start, err := strconv.ParseUint(parts[4], 16, 64)
if err != nil {
return UPID{}, fmt.Errorf("proxmox: bad UPID starttime %q: %w", parts[4], err)
}
return UPID{
Raw: s,
Node: parts[1],
PID: pid,
PStart: pstart,
StartTime: start,
Worker: parts[5],
ID: parts[6],
User: parts[7],
}, nil
}
// String returns the original wire form.
func (u UPID) String() string { return u.Raw }
+59
View File
@@ -0,0 +1,59 @@
package proxmox
import "testing"
func TestParseUPID(t *testing.T) {
// Captured live from demo-felhom.
const raw = "UPID:demo-felhom:00026454:004E3431:6A265E53:vzdestroy:9021:root@pam:"
u, err := ParseUPID(raw)
if err != nil {
t.Fatalf("ParseUPID: %v", err)
}
if u.Node != "demo-felhom" {
t.Errorf("node = %q", u.Node)
}
if u.Worker != "vzdestroy" {
t.Errorf("worker = %q", u.Worker)
}
if u.ID != "9021" {
t.Errorf("id = %q", u.ID)
}
if u.User != "root@pam" {
t.Errorf("user = %q", u.User)
}
if u.PID != 0x00026454 {
t.Errorf("pid = %#x, want 0x26454", u.PID)
}
if u.StartTime != 0x6A265E53 {
t.Errorf("starttime = %#x", u.StartTime)
}
if u.String() != raw {
t.Errorf("String() round-trip = %q", u.String())
}
}
func TestParseUPID_PrivsepTokenUser(t *testing.T) {
// The user field can contain '@' and '!' (a privsep token) but never ':'.
const raw = "UPID:demo-felhom:00001234:00005678:6A265E53:vzdump:9001:felhom-agent@pve!agent:"
u, err := ParseUPID(raw)
if err != nil {
t.Fatalf("ParseUPID: %v", err)
}
if u.User != "felhom-agent@pve!agent" {
t.Errorf("user = %q", u.User)
}
}
func TestParseUPID_Invalid(t *testing.T) {
cases := []string{
"",
"not-a-upid",
"UPID:node:nothex:00:00:t:1:u:", // bad pid hex
"UPID:node:00:00", // too few fields
}
for _, c := range cases {
if _, err := ParseUPID(c); err == nil {
t.Errorf("ParseUPID(%q) = nil error, want error", c)
}
}
}