Compare commits

21 Commits

Author SHA1 Message Date
admin 2f8658981d docs: reflow CLAUDE.md; switch REPORT.md to overwrite-latest; add no-secrets rule
Unify the REPORT/CHANGELOG convention with the sibling repos (REPORT.md was
append/cumulative -> now overwrite-latest; CHANGELOG stays cumulative). Reflow
removes hard mid-paragraph line wraps; rendered output unchanged. CHANGELOG entry
in hub/CHANGELOG.md. No hub code change -> no version bump.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:54:32 +02:00
admin 7bc27c38de update 2026-06-08 20:06:11 +02:00
admin aab3e137c5 updated CLAUDE.md 2026-06-08 19:17:41 +02:00
admin 4be3bdf486 fix(hub): slice-3 follow-ups — /host-report 413 oversize + contract golden (v0.7.1)
- handleHostReport: read maxHostReportBytes+1 (4 MiB const) and reject oversize with
  413 instead of silent LimitReader truncation. Controller handleReport (1 MiB) is
  unchanged. Test asserts 413.
- contract: hub/internal/api/testdata/host-report.golden.json (byte-identical with
  felhom-agent's copy) + TestHostReport_GoldenContract drives the real handler and
  asserts 200 + denorm + both guests upserted.
- CHANGELOG v0.7.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 18:31:44 +02:00
admin 23611c20ef chore(hub): revert incidental gofmt-only reformatting outside slice-3 scope
Restores notify/templates.go, store/telemetry.go, web/configs.go to upstream —
those were alignment-only churn from a tree-wide gofmt, not part of slice 3. Keeps
the host-domain diff additions-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:38:18 +02:00
admin 7c0c75457f feat(hub): host-domain ingest — tables + /host-report + per-host auth + host dead-man's-switch (v0.7.0, slice 3)
Purely additive; the controller path (reports/customer_configs/checkAuthCustomer/
existing checkers) is untouched. Cutover remains slice 10.

- store: new hosts/guests/host_reports tables (full schema incl. columns INERT
  until slice 10, so no later ALTER); GetHostByAPIKey/GetHost/ListHosts/UpsertHost/
  SaveHostReport/UpsertGuestFromReport (preserves inert cols)/GetHostStaleness/
  GuestID; Prune also prunes host_reports.
- api: checkAuthHost (sibling of checkAuthCustomer); POST /host-report (per-host
  Bearer, 4MiB, denorm + guest upsert, control envelope); POST /admin/hosts
  (PROVISIONAL global-key host mint); host_* event types registered.
- monitor: HostStalenessChecker sibling over host_reports (host_stale/down/
  recovered), wired on the existing 60s ticker; controller checkers unchanged.
- tests (hermetic): store intent/inert-column preservation, auth, ingest
  (envelope+denorm, mismatch/unknown/blocked/oversize), admin mint round-trip,
  host staleness transitions.

CHANGELOG v0.7.0. Contract matches the agent host-report spec field-for-field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:36:16 +02:00
admin 0d832def7b fix: update repo-name refs after deploy-felhom-compose -> felhom-controller rename
- hub/internal/web/templatefetcher.go: raw-template URL now points at the renamed
  repo (was relying on Gitea's post-rename redirect)
- documentation/ (moved here from the felhom-agent repo): fix controller-source path
  refs (deploy-felhom-compose -> felhom-controller) and the platform repo name
  (proxmox-controller -> felhom-agent)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 14:03:13 +02:00
admin cb1d964620 Merge pull request 'moved documentation to felhom.eu' (#7) from fix/filebrowser-config-args into main
Reviewed-on: #7
2026-06-08 11:54:53 +00:00
admin 3d6cde8080 Merge pull request 'docs: rework repo-name references for renames' (#6) from chore/rename-repo-refs into main
Reviewed-on: #6
2026-06-08 11:52:04 +00:00
admin 715f644bf0 moved documentation to felhom.eu 2026-06-08 13:50:14 +02:00
admin 0f12e17175 docs: rework repo-name references for renames
deploy-felhom-compose -> felhom-controller, proxmox-controller -> felhom-agent in
README.md and CLAUDE.md. Hub source (templatefetcher.go) intentionally left untouched
per scope; its raw-template URL is flagged separately for the operator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 13:39:53 +02:00
admin 7b545c1ec7 Merge pull request 'fix: pass --config to filebrowser (v2.63.x changed default lookup path)' (#5) from fix/filebrowser-config-args into main 2026-06-06 12:22:05 +00:00
admin ea66afa960 manifests: pass --config to filebrowser so it reads our ConfigMap
The previous PR pinned filebrowser to v2.63.13 + runAsUser:0 which
solved the PVC permission issue, but the pod was still 0/1 Ready
because v2.63.x changed the default config-file lookup path:

  Old (v2-alpine): /.filebrowser.json (matched our existing mount)
  New (v2.63.13) : /config/settings.json (NOT mounted in this pod)

So the new image ran with its built-in defaults (port 80, in-memory
db), and the readiness probe on 8080/health timed out.

Fix: pass `args: ["-c", "/.filebrowser.json"]` so filebrowser uses the
ConfigMap we already mount there. No volumeMount changes needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 14:22:04 +02:00
admin 87b062e84a Merge pull request 'feat: umami 3.1.0 + filebrowser v2.63.13 (root)' (#4) from feat/umami-v3-filebrowser-root into main 2026-06-06 12:17:21 +00:00
admin bd0531e4a8 manifests: umami -> 3.1.0 (v3 line) + filebrowser v2.63.13 with runAsUser:0
umami:
  Switch from SHA-pinned v3.0.3 to the tagged v3.1.0 release (the v3
  line proper -- same schema lineage, normal Prisma minor-version
  migration). This is the documented forward path that the version-
  checker hint `postgresql-latest -> 3.1` indicated. The v1.x
  postgresql-vX.Y.Z line we briefly tried earlier today is a
  DIFFERENT image lineage with incompatible migrations -- avoid.

filebrowser:
  Re-pin to v2.63.13 (debian-based default) so Renovate can track
  future bumps. The non-root UID in that image can't write to the
  existing PVC contents (chowned to root by the previous v2-alpine
  image), so set pod-level securityContext runAsUser:0 + runAsGroup:0
  to keep using the same volume layout without a chown initContainer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 14:17:20 +02:00
admin dc64bb2d79 Merge pull request 'fix(URGENT): pin umami to exact SHA (v1.38.0 has schema lineage mismatch)' (#3) from fix/umami-sha-pin into main 2026-06-06 11:53:55 +00:00
admin 7e6ea9d66c manifests: pin umami to exact image SHA (schema mismatch with v1.38.0)
Previous PR pinned `ghcr.io/umami-software/umami:postgresql-v1.38.0`.
The new pod crashlooped on Prisma:

  ERROR: relation "event" does not exist
  Migration name: 02_add_event_data
  Database error code: 42P01

The 120-day-old working pod's actual image is:
  ghcr.io/umami-software/umami@sha256:28f263fe06f79ebffa5a6a6e9b...

It runs an older umami build whose schema doesn't have the `event`
table that the v1 migration `02_add_event_data` operates on. The DB
has migrations 10-14 applied (newer than 02 by name) but 02 isn't in
its applied set -- likely a schema fork between the line our 120d pod
runs and the postgresql-vX.Y.Z line that v1.38.0 advances toward.

Pin to the exact SHA that the working pod uses, so pod restarts +
ArgoCD syncs both keep producing pods on the same known-good image
(cached on the node, no registry pull needed). Renovate also stops
chasing the broken upgrade path.

Proper fix (deferred): plan a v3.x migration. The version-checker
dashboard hint `postgresql-latest → 3.1` suggests umami v3.x dropped
the `postgresql-` prefix and is what we'd want long-term. That needs
a real DB migration plan since the schema lineage is genuinely
different from this image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 13:53:54 +02:00
admin a964dc20a4 Merge pull request 'fix: revert filebrowser to v2-alpine (PVC permission issue with v2.63.13)' (#2) from fix/filebrowser-revert into main 2026-06-06 11:45:19 +00:00
admin df2a1259d9 manifests: revert filebrowser v2.63.13 -> v2-alpine (PVC permission issue)
The previous PR pinned `filebrowser/filebrowser:v2-alpine` to v2.63.13
but it crashlooped on:

  Error: open /database/filebrowser.db: permission denied

The v2.63.13 image (debian-based default) runs as a non-root UID and
can't write to files on the PVC that were created by the v2-alpine
image (which ran as root). No `v2.63.13-alpine` tag exists upstream
(filebrowser stopped publishing per-version alpine variants), so we
can't trivially preserve the same runtime.

Quick recovery: revert to v2-alpine so filebrowser is usable again.
Proper fix (deferred): either an initContainer that `chown -R 1000:1000
/database /srv` or a `securityContext.fsGroup: 1000` on the pod spec
to let the non-root UID write to the existing PVC. Both require some
care since the chown is destructive if the UID is wrong.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 13:45:18 +02:00
admin e363c6594d Merge pull request 'manifests: re-pin moving tags (umami / filebrowser)' (#1) from fix/version-pins into main 2026-06-06 11:41:51 +00:00
admin ce80dce497 manifests: re-pin moving tags so Renovate can track them
- umami       postgresql-latest  -> postgresql-v1.38.0
- filebrowser v2-alpine          -> v2.63.13

These two were "latest"-style moving tags that Renovate physically
cannot propose updates for. Pinning to current upstream versions so
future bumps go through the normal Renovate PR flow.

Note: Renovate operates from the homelab-manifests repo, not this one
yet — but felhom-system/* copies exist in homelab-manifests for
discoverability, and Renovate already tracks the pinned forms via a
new customManager for the umami `postgresql-vX.Y.Z` pattern (added in
homelab-manifests admin-system/renovate.yaml). For now, future bumps
will need to be applied to both repos until we consolidate the source
of truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 13:41:50 +02:00
28 changed files with 4877 additions and 198 deletions
+72 -169
@@ -1,191 +1,94 @@
# CLAUDE.md — Project Instructions for Claude Code # CLAUDE.md — Project Instructions for Claude Code (`felhom.eu`)
> This file is read automatically by Claude Code at the start of every session. > Read automatically by Claude Code when it works in this repo. Keep it updated as the project evolves. Cross-repo orientation (the felhom system, artifact taxonomy, access) lives in the workspace-root `e:\git\CLAUDE.md`; this file is `felhom.eu`-specific.
> It replaces the "Instructions" panel from the claude.ai Project.
> Keep it updated as the project evolves.
## Project overview ## Project overview
This repo (`felhom.eu`) contains: This repo (`felhom.eu`) contains:
- **Website** (`website/`) — Static HTML pages at felhom.eu, served via k3s nginx + git-sync sidecar - **Website** (`website/`) — static HTML at felhom.eu, served via k3s nginx + git-sync sidecar.
- **Hub** (`hub/`) — Go application (felhom-hub) — centralized dashboard for monitoring customer controllers, runs on k3s at hub.felhom.eu - **Hub** (`hub/`) — Go application (felhom-hub) — the **operator backend**, on k3s at `hub.felhom.eu`.
- **K8s manifests** (`manifests/`) — k3s deployment manifests for all felhom-system services - **K8s manifests** (`manifests/`) — k3s deployment manifests for felhom-system services.
- **Architecture docs** (`documentation/`) — the **authoritative design home for the whole Felhom system**: `architecture/01..05-*.md` (topology/trust, controller module map, host-agent, signing, hub), `proxmox-platform.md`, and `tests/phase{0,1-2,3,4}-findings.md`. Read these before designing.
See `README.md` for full architecture, DNS, email, and SEO documentation. See `README.md` for full architecture/DNS/email/SEO docs. See `TASK.md` for the current task (if any).
See `TASK.md` for the current task to implement (if it exists).
## The Felhom system (so the hub's role is in context)
Felhom is **Proxmox-based**, with a locked **three-component model**:
- **Hub** (this repo, `hub/`) — operator backend. Authors operator *intent*; mirrors box *reality*; holds **no data-plane role** and never connects inbound to a box.
- **Host agent** (repo `felhom-agent/`) — one per Proxmox host; owns all Proxmox interaction.
- **In-guest controller** (repo `felhom-controller/`) — one per customer LXC; Docker-only.
The hub is **not** just controller monitoring anymore. As of slice 3 it ingests **two report streams**: the agent's host-domain report (`POST /api/v1/host-report`, the heartbeat) and the legacy controller report (`POST /api/v1/report`). The controller path is **frozen and retires at the slice-10 cutover** — do not modify it until then.
## Hub — current state (v0.7.x)
- **Tables:** `customer_configs`, `events`, `app_telemetry`/`app_log_issues`, the legacy `reports`, and the slice-3 host-domain additions `hosts` / `guests` / `host_reports` (additive; columns marked inert exist for the slice-10 cutover but are unused now).
- **Auth:** Bearer — global key, per-customer key (legacy), and per-host key (`GetHostByAPIKey`, slice 3). Provisional global-key host mint at `POST /api/v1/admin/hosts`.
- **Monitoring:** the controller `StalenessChecker` (over `reports`) AND a sibling `HostStalenessChecker` (over `host_reports`, emitting `host_stale`/`host_down`/`host_recovered`).
- Two-tier notifications (operator English / customer Hungarian, Resend, cooldowns); `events` audit.
## Code quality rules ## Code quality rules
- Always double-check generated code for bugs, logic issues, syntax errors - Always double-check generated code for bugs, logic issues, syntax errors.
- Handle edge cases without overcomplicating the script/program - Handle edge cases without overcomplicating.
- Add debug capabilities (logging, verbose output) for easier troubleshooting - Add debug capabilities (logging, verbose output).
- If you need more input or troubleshooting command output, ask first — don't guess - If you need more input or troubleshooting output, **ask first — don't guess**.
## Workspace layout ## Workflow & artifacts
``` The planning/architecture assistant ("project Claude", in claude.ai) writes specs and validates pushes; **you (Claude Code) implement**. A file being open in the editor is NOT an instruction.
E:\git\felhom.eu\ (or /e/git/felhom.eu/ in Git Bash)
├── hub/ # felhom-hub Go application
│ ├── cmd/hub/ # Entry point (main.go)
│ ├── internal/
│ │ ├── api/ # Report ingestion API
│ │ ├── store/ # SQLite storage + queries
│ │ └── web/ # Dashboard UI
│ │ ├── server.go # Server, routing, template funcs
│ │ ├── embed.go # go:embed for templates
│ │ └── templates/ # HTML templates + CSS
│ ├── configs/ # Example config files
│ ├── Dockerfile
│ ├── Makefile
│ └── go.mod
├── manifests/ # k3s deployment manifests
│ ├── hub.yaml # Hub deployment (hub.felhom.eu)
│ ├── webpage.yaml # Website + FileBrowser + git-sync
│ ├── contact-mailer.yaml # Contact form email sender
│ ├── healthchecks.yaml # Healthchecks (status.felhom.eu)
│ └── umami.yaml # Analytics (stats.felhom.eu)
├── website/ # Static HTML pages (felhom.eu)
│ ├── index.html
│ ├── alkalmazasok.html
│ ├── ... (all Hungarian, UTF-8 with BOM)
│ └── assets/ # Logos, screenshots, OG images
├── CLAUDE.md # This file
├── README.md # Full project documentation
└── TASK.md # Current task (if exists)
```
Related repos (same parent directory): - **`TASK.md` / `TASK-*.md`** — a spec for you to implement. Then push and update this repo's changelog (`hub/CHANGELOG.md`) and root `REPORT.md` per the convention below.
``` - **`RUNBOOK-*.md`** — an operational procedure. CC executes the steps it has access and capability for, including live validation on the demo nodes and the demo Proxmox host (CC has root@felhom-pve SSH + the felhom-agent token). A step is human-only only when it genuinely needs physical presence, a real-world decision, or credentials CC truly lacks — mark those steps HUMAN. Do not decline a whole procedure because it touches a live host or a privileged token. (Judgment still applies: confirm before irreversible ops on real customer data — but demo scratch guests are fair game.)
E:\git\deploy-felhom-compose\ # felhom-controller Go app + deploy scripts - Validation of a push against a spec's criteria is project Claude's job, not yours, unless asked.
E:\git\app-catalog-felhom.eu\ # Docker Compose templates per app
E:\git\homelab-manifests\ # k3s cluster manifests (dooplex.hu services)
E:\git\misc-scripts\ # Helper scripts (build scripts, repo collector)
```
All repos hosted at `gitea.dooplex.hu/admin/`. > **In every repository where you make a change, update both files in that repo:**
> - **`CHANGELOG.md`** — a cumulative log of **all** changes; newest entry on top.
## SSH access > - **`REPORT.md`** — **overwrite** with a summary of the **most recent** implementation (or significant validation/operational run) only; not cumulative.
>
SSH key-based authentication configured. No password prompts. > **Never write secrets** — tokens, passwords, private keys, API keys — into `CHANGELOG.md`, `REPORT.md`, or any committed file. Reference them as "stored out-of-band" instead.
**IMPORTANT — SSH binary:** Claude Code runs in Git Bash, which has its own SSH at
`/usr/bin/ssh` (= `C:\Program Files\Git\usr\bin\ssh.exe`). This binary does NOT have
access to the Windows SSH agent and will fail silently. Always use the Windows native
OpenSSH binary:
```
SSH=/c/Windows/System32/OpenSSH/ssh.exe
```
All SSH commands below use `$SSH` — set it at the start of your session.
| Host | IP | User | Role |
|------|----|------|------|
| Build server (k3s node) | 192.168.0.180 | kisfenyo | Build + push images, kubectl |
| Demo node | 192.168.0.162 | kisfenyo | Test deployment (demo-felhom.eu) |
**Note:** `kubectl` on the build server requires `sudo` (k3s kubeconfig permissions).
## Build & deploy workflow — Hub
After making code changes to `hub/`, you **MUST** build, push, and deploy the new image.
Do NOT leave code changes uncommitted or undeployed.
### Step 1: Commit and push changes
```bash
cd /e/git/felhom.eu
git add -A && git commit -m "<descriptive message>" && git push
```
### Step 2: Build + push the container image on the build server
The build server (192.168.0.180) has the build toolchain. The build script lives at
`~/build/felhom-hub/build.sh` on the build server (NOT in this repo).
First, check the current running version:
```bash
$SSH kisfenyo@192.168.0.180 "sudo kubectl get deploy -n felhom-system hub -o jsonpath='{.spec.template.spec.containers[0].image}'"
```
Then build with the next version (e.g., if current is 0.1.2, use 0.1.3):
```bash
$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh <NEW_VERSION> --push"
```
The build script:
- Pulls latest code from Gitea (`git pull` on the felhom.eu repo)
- Copies `hub/` source to a clean build workspace
- Builds Docker image with version + build-time ldflags
- Pushes to `gitea.dooplex.hu/admin/felhom-hub:<VERSION>` and `:latest`
### Step 3: Deploy to k3s
```bash
$SSH kisfenyo@192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:<NEW_VERSION>"
```
### Step 4: Verify the deployment
```bash
$SSH kisfenyo@192.168.0.180 "sudo kubectl get pods -n felhom-system -l app=hub && echo '---' && sudo kubectl logs -n felhom-system -l app=hub --tail 10"
```
Should show pod Running and `[INFO] felhom-hub <VERSION> starting` in logs.
### Build workflow summary
| Step | Command | Where |
|------|---------|-------|
| 1. Commit + push | `git add -A && git commit && git push` | Local (this repo) |
| 2. Build + push image | `$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh <VER> --push"` | Build server |
| 3. Deploy | `$SSH kisfenyo@192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=...:<VER>"` | Build server (kubectl) |
| 4. Verify | `$SSH kisfenyo@192.168.0.180 "sudo kubectl get pods -n felhom-system -l app=hub"` | Build server |
## Build & deploy workflow — Website
The website auto-deploys via git-sync sidecar. Just push to `main`:
```bash
cd /e/git/felhom.eu
git add -A && git commit -m "<message>" && git push
```
Changes are live within 1-2 minutes. No build step needed.
For emergency edits, use FileBrowser at `https://files.felhom.eu`.
## Build & deploy workflow — K8s Manifests
Manifests are applied manually:
```bash
ssh kisfenyo@192.168.0.180 "sudo kubectl apply -f /home/kisfenyo/git/felhom.eu/manifests/<manifest>.yaml"
```
Remember to `git pull` on the build server first if you pushed changes locally.
## Tech stack (Hub) ## Tech stack (Hub)
- **Language:** Go 1.24+ - **Language:** Go 1.24+ (build server is go1.26.0).
- **Web framework:** stdlib `net/http` + `html/template` - **Web:** stdlib `net/http` + `html/template`. **DB:** SQLite via `modernc.org/sqlite` (pure Go).
- **Database:** SQLite via `modernc.org/sqlite` (pure Go, no CGo) - **Auth:** bcrypt + Bearer tokens. **Deploy:** Docker on k3s (felhom-system ns).
- **Auth:** bcrypt password hash + basic auth - **Storage:** Longhorn PVC at `/data/` (SQLite DB). **Config:** YAML via ConfigMap at `/etc/felhom-hub/hub.yaml`.
- **Deployment:** Docker container on k3s (felhom-system namespace)
- **Storage:** Longhorn PVC at `/data/` (SQLite DB) ## SSH access
- **Config:** YAML file mounted via k8s ConfigMap at `/etc/felhom-hub/hub.yaml`
Use the Windows OpenSSH binary (Git Bash's `/usr/bin/ssh` can't reach the Windows agent and fails silently): `SSH=/c/Windows/System32/OpenSSH/ssh.exe`. All SSH commands below use `$SSH`.
| Host | IP | User | Role |
|------|----|------|------|
| Build server (k3s node) | 192.168.0.180 | kisfenyo | Build + push images, kubectl (needs `sudo`) |
| Demo Proxmox host | 192.168.0.162 | root@pam (SSH alias felhom-pve, root, no sudo) | pveum/pct + live Proxmox validation — available to CC |
## Build & deploy — Hub
After code changes to `hub/`, you **MUST** build, push, and deploy.
1. **Commit + push:** `cd /e/git/felhom.eu && git add -A && git commit -m "<msg>" && git push`
2. **Check running version:** `$SSH kisfenyo@192.168.0.180 "sudo kubectl get deploy -n felhom-system hub -o jsonpath='{.spec.template.spec.containers[0].image}'"`
3. **Build + push image** (next version; build script lives on the build server, not in this repo): `$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh <NEW_VERSION> --push"` (pulls latest from Gitea, builds with version+build-time ldflags into `main.Version`, pushes `gitea.dooplex.hu/admin/felhom-hub:<VER>` and `:latest`.)
4. **Deploy:** `$SSH kisfenyo@192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:<NEW_VERSION>"`
5. **Verify:** `$SSH kisfenyo@192.168.0.180 "sudo kubectl get pods -n felhom-system -l app=hub && sudo kubectl logs -n felhom-system -l app=hub --tail 10"` (expect Running + `[INFO] felhom-hub <VERSION> starting`.)
> If the hub deployment is ArgoCD-managed (auto-sync), a manual `kubectl set image` may be reverted by ArgoCD drift-correction — confirm the deploy path before relying on step 4.
## Build & deploy — Website / Manifests
- **Website** auto-deploys via git-sync; just push to `main` (live in 1-2 min). Emergency edits: FileBrowser at `https://files.felhom.eu`.
- **Manifests** are applied manually (git pull on the build server first if you pushed): `$SSH kisfenyo@192.168.0.180 "sudo kubectl apply -f /home/kisfenyo/git/felhom.eu/manifests/<manifest>.yaml"`
## Key patterns ## Key patterns
- Hub receives reports from customer controllers via `POST /api/v1/report` (Bearer token auth) - Hub ingests **host-reports from agents** (`POST /api/v1/host-report`, Bearer per-host) and legacy **controller reports** (`POST /api/v1/report`). The host-report `received_at` is the dead-man's-switch liveness signal.
- Dashboard shows all customers in a table with status, CPU, memory, disk, containers, backup age - Status logic: OK (report < 30m), WARN (30m-1h or health=warn), DOWN (> 1h or health=fail).
- Customer detail page shows system info, report history, full JSON report - SQLite timestamps vary in format — use `parseSQLiteTime()`.
- Status logic: OK (report < 30m), WARN (30m-1h or health=warn), DOWN (> 1h or health=fail) - Dashboard/detail auto-refresh every 60s via `<meta http-equiv="refresh">`. Geo-restricted to Hungary via nginx ingress annotation.
- SQLite timestamps may vary in format — use `parseSQLiteTime()` for robust parsing
- Auto-refresh: dashboard and detail pages refresh every 60 seconds via `<meta http-equiv="refresh">`
- Geo-restricted to Hungary via nginx ingress annotation
## File encoding ## File encoding
All HTML files in `website/` are **UTF-8 with BOM**. Ensure your editor preserves this. All `website/` HTML is **UTF-8 with BOM** — preserve it. Hub Go source is standard UTF-8 (no BOM).
Hub Go source files are standard UTF-8 (no BOM).
+1 -1
@@ -217,7 +217,7 @@ Every page includes:
| Repository | Purpose | | Repository | Purpose |
|------------|---------| |------------|---------|
| [app-catalog-felhom.eu](https://gitea.dooplex.hu/admin/app-catalog-felhom.eu) | Docker Compose templates + .felhom.yml metadata for 45+ apps | | [app-catalog-felhom.eu](https://gitea.dooplex.hu/admin/app-catalog-felhom.eu) | Docker Compose templates + .felhom.yml metadata for 45+ apps |
| [deploy-felhom-compose](https://gitea.dooplex.hu/admin/deploy-felhom-compose) | felhom-controller Go app + customer deploy scripts | | [felhom-controller](https://gitea.dooplex.hu/admin/felhom-controller) | felhom-controller Go app + customer deploy scripts |
| [deploy-portainer](https://gitea.dooplex.hu/admin/deploy-portainer) | Legacy — Portainer-based deploy scripts (deprecated) | | [deploy-portainer](https://gitea.dooplex.hu/admin/deploy-portainer) | Legacy — Portainer-based deploy scripts (deprecated) |
| [homelab-manifests](https://gitea.dooplex.hu/admin/homelab-manifests) | k3s cluster manifests for dooplex.hu services | | [homelab-manifests](https://gitea.dooplex.hu/admin/homelab-manifests) | k3s cluster manifests for dooplex.hu services |
| [misc-scripts](https://gitea.dooplex.hu/admin/misc-scripts) | Utility scripts (collect-repos.sh, etc.) | | [misc-scripts](https://gitea.dooplex.hu/admin/misc-scripts) | Utility scripts (collect-repos.sh, etc.) |
+132
@@ -0,0 +1,132 @@
# felhom.eu — task reports
> **Overwrite** this file with a summary of the most recent task only (uniform with the other repos; not cumulative). The cumulative hub history lives in [hub/CHANGELOG.md](hub/CHANGELOG.md). Sections below predate this convention change and are retained as history.
---
## Hub slice 3 — host-domain ingest (v0.7.0) — 2026-06-08
Purely **additive** host-domain ingest in `hub/`: new tables, the agent's
`/host-report` heartbeat endpoint, per-host Bearer auth, a provisional host mint, and a
host-domain dead-man's-switch. The existing controller path is **untouched**; the schema/
auth cutover remains **slice 10**. Pushed to `main`; build/vet/test green locally and on
the build server.
### New tables (`store.go migrate()`, idempotent — `// v0.7.0: host-domain`)
- **`hosts`** — one per customer agent. Reality columns (`agent_version`, `last_report_at`)
+ operator-intent columns **INERT until slice 10** (`desired_json`, `desired_generation`,
`dr_record_json`).
- **`guests`** — one per controller LXC, PK `guest_id = "<host_id>/<vmid>"` (hub-derived).
Reality columns (`display_name`, `status`, `controller_version`, `vmid`, `last_seen_at`)
+ **INERT** `api_key`, `desired_spec_json`.
- **`host_reports`** — the report stream + denormalized columns (cpu/mem/disk %, guest
counts, cloudflared status); pruned by `Prune(maxDays)` alongside `reports`.
> Inert columns exist **now** so slice 10 needs no `ALTER`; nothing reads/writes them this
> slice. Migration is additive-only (no `DROP`, no edits to `reports`/`customer_configs`)
> and idempotent.
### New store methods
`GetHostByAPIKey`, `GetHost`, `ListHosts`, `UpsertHost` (updates only identity + `updated_at`
on conflict), `SaveHostReport` (inserts a report row + bumps reality columns only),
`UpsertGuestFromReport` (updates reality columns only — **preserves** `api_key`/
`desired_spec_json`), `GetHostStaleness` (skips never-reported hosts), `GuestID`.
Structs: `Host`, `Guest`, `HostReportDenorm`, `HostStaleRow`.
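The "updates reality columns only" behaviour can be illustrated with an in-memory stand-in for the `guests` table. This is a sketch under assumptions: the real method runs against SQLite, and the `store` type, field names, and method signature here are hypothetical.

```go
package main

import "fmt"

// Guest mirrors a subset of the guests columns named above.
type Guest struct {
	GuestID           string
	DisplayName       string // reality
	Status            string // reality
	ControllerVersion string // reality
	APIKey            string // inert until slice 10
	DesiredSpecJSON   string // inert until slice 10
}

// store is a hypothetical in-memory stand-in for the SQLite table.
type store struct{ guests map[string]Guest }

// upsertGuestFromReport creates the row if missing and otherwise overwrites
// only the reality fields, leaving APIKey and DesiredSpecJSON as they were.
func (s *store) upsertGuestFromReport(id, name, status, version string) {
	g := s.guests[id] // zero value when the guest is new
	g.GuestID = id
	g.DisplayName, g.Status, g.ControllerVersion = name, status, version
	s.guests[id] = g
}

func main() {
	s := &store{guests: map[string]Guest{
		"acme-1a2b/101": {GuestID: "acme-1a2b/101", APIKey: "stored-out-of-band", DesiredSpecJSON: "{}"},
	}}
	s.upsertGuestFromReport("acme-1a2b/101", "web", "running", "1.2.3")
	fmt.Printf("%+v\n", s.guests["acme-1a2b/101"])
}
```

In SQL the same effect would typically come from an upsert whose update clause lists only the reality columns.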
### Auth (added; existing path unchanged)
`checkAuthHost(r)` → `(hostID, customerID, isGlobal, ok)`: global key → trust `body.host_id`;
per-host key → bound identity; failure → not-ok. `checkAuthCustomer` is byte-for-byte unchanged.
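A minimal sketch of that resolution order follows. It assumes a map lookup in place of `GetHostByAPIKey`; the extra parameters, the `hostIdentity` type, and the demo keys are hypothetical and exist only to keep the example self-contained.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// hostIdentity is the identity bound to a per-host key (hypothetical type).
type hostIdentity struct{ HostID, CustomerID string }

// checkAuthHost resolves a Bearer token. With the global key the caller must
// take host_id from the request body, so hostID is returned empty here.
func checkAuthHost(r *http.Request, globalKey string, byKey map[string]hostIdentity) (hostID, customerID string, isGlobal, ok bool) {
	token, found := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
	if !found || token == "" {
		return "", "", false, false
	}
	if subtle.ConstantTimeCompare([]byte(token), []byte(globalKey)) == 1 {
		return "", "", true, true
	}
	if id, known := byKey[token]; known {
		return id.HostID, id.CustomerID, false, true
	}
	return "", "", false, false
}

// Demo values only; real keys are stored out-of-band.
const demoGlobalKey = "demo-global-key"

var demoHosts = map[string]hostIdentity{
	"demo-host-key": {HostID: "acme-1a2b", CustomerID: "acme"},
}

// describe runs checkAuthHost for one token and formats the result.
func describe(token string) string {
	req := httptest.NewRequest(http.MethodPost, "/api/v1/host-report", nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	h, c, g, ok := checkAuthHost(req, demoGlobalKey, demoHosts)
	return fmt.Sprintf("host=%q customer=%q global=%t ok=%t", h, c, g, ok)
}

func main() {
	for _, token := range []string{"demo-global-key", "demo-host-key", "unknown"} {
		fmt.Println(describe(token))
	}
}
```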
### Endpoints
- **`POST /api/v1/host-report`** (the heartbeat): per-host auth; 4 MiB body; computes denorm
(`guest_running` counts only `status=="running"`); `SaveHostReport` + per-guest
`UpsertGuestFromReport` (a guest upsert failure is logged, not fatal — liveness); returns the
control envelope `{status:"ok", poll_interval_seconds:900, blocked, desired_generation:0,
has_signed_ops:false}`. `blocked` reflects `customer_configs.status`; the other two are
reserved placeholders (slice 4). Global-key bootstrap requires the host to already exist
(else 400); per-host key requires `body.host_id == hostID` (else 403).
- **`POST /api/v1/admin/hosts`** — **PROVISIONAL**, global-key only. Mints `host_id` (legible
`<customer>-<hex>`) + a random `api_key` (`configgen.RandomHex(32)`); 201 `{host_id, api_key}`.
Flagged in code as the slice-3 bootstrap to be removed/locked at enrollment (slices 7-8).
### Host dead-man's-switch
`monitor.HostStalenessChecker` (`host_staleness.go`) — a **sibling** of the controller
`StalenessChecker`, keyed on host↔`host_reports`, emitting `host_stale`/`host_down`/
`host_recovered` (30m / 60m), attributed to the host's customer (so the existing per-customer
notification UX picks them up). Registered in `allowedEventTypes`; wired in `main.go` on the
existing 60s ticker. The controller staleness/deadline checkers are untouched and keep running.
### Contract
The `/host-report` JSON matches the agent spec §4 field-for-field (host_id, reported_at,
agent_version, host{…}, guests[{vmid,name,status,controller_version,spec}], cloudflared{status},
and the empty storage_targets/backups/restore_tests/pbs_snapshots/audit_tail — accepted
empty/absent). The envelope matches agent spec §5.
### Test matrix (new, hermetic — temp SQLite, no live data)
- **store**: upsert/lookup; a report-path update **preserves** `desired_json`/`desired_generation`;
guest upsert **preserves** `api_key`/`desired_spec_json` while updating reality; `GuestID`;
staleness skips never-reported.
- **auth**: `checkAuthHost` global / per-host / unknown.
- **ingest**: valid → 200 + envelope + denorm (`guest_running` = 1 of 2); host_id mismatch → 403;
unknown host under global key → 400; blocked customer → `blocked:true`; oversize body → 400.
- **admin mint**: non-global → 403; unknown customer → 400; success → 201 + minted key
round-trips through `/host-report`.
- **host staleness**: seed emits no events; ok→stale→down→recovered transitions.
### Untouched / deferred (explicit)
- **Controller path unchanged**: `/api/v1/report`, `reports`, `customer_configs`,
`checkAuthCustomer`, existing staleness + deadline checkers — additions only, all still green.
- **Not built** (per scope): desired-state serving, `signed_ops`, geo→hub, DR-record migration,
dashboard re-design. The cutover (drop `reports` → `guest_reports`, merge checkers, tighten the
provisional admin/global-key auth) remains **slice 10**.
### Versioning / deploy
Hub version is the `main.Version` ldflags var (`build.sh <VER>`), default `"dev"`; recorded
**v0.7.0** in `hub/CHANGELOG.md`. The image build + ArgoCD deploy are **not** part of this task
(no deploy performed).
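For context, an ldflags-injected version variable usually looks like the sketch below. The `startupLine` helper is hypothetical; the variable name, its `"dev"` default, and the log format come from this repo's notes, and the build command in the comment assumes the `cmd/hub` entry point.

```go
package main

import "fmt"

// Version is overridden at build time, for example:
//
//	go build -ldflags "-X main.Version=0.7.1" ./cmd/hub
//
// Without the flag it keeps the default below.
var Version = "dev"

// startupLine formats the log line that the deploy verification step looks for.
func startupLine() string {
	return fmt.Sprintf("[INFO] felhom-hub %s starting", Version)
}

func main() {
	fmt.Println(startupLine())
}
```

Because the value is set by the linker, recording a release touches only the changelog and the build invocation, not the source.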
### Repo state
Branch: `main`. Verified `go build/vet/test ./...` green in `hub/` locally (go1.26) and on the
build server (go1.26).
---
## Hub slice-3 follow-ups (v0.7.1) — 2026-06-08
Validation follow-ups (hub half). Pushed to `main`; build/vet/test green locally (go1.26) and on
the build server.
### §3 — `/host-report` rejects oversize with 413 (not silent truncation)
`handleHostReport` now reads `maxHostReportBytes+1` (const `4 << 20`, defined near
`defaultHostPollSeconds`) and returns **`413 Payload too large`** when exceeded, instead of relying
on `LimitReader` truncation (which could accept a truncated-but-valid JSON as a partial report,
dropping guests from the mirror). **Scope-frozen:** the controller `handleReport` 1 MiB read is
**unchanged** (diff touches only the host path); the small divergence is acceptable until cutover.
`TestHandleHostReport_OversizeRejected` now asserts 413.
### §4 — cross-repo contract golden fixture (hub half)
- `hub/internal/api/testdata/host-report.golden.json` — a **byte-identical copy** of felhom-agent's
golden (verified by md5).
- `TestHostReport_GoldenContract` — mints a host, POSTs the golden through the **real**
`handleHostReport`, asserts 200 + denorm (`guest_total=2`, `guest_running=1`,
`cloudflared_status="active"`) + both guests upserted. Proves `hostReportPayload` still extracts
the contract from the real wire shape.
**Caveat (called out):** the two golden files are a *duplicated* contract with no shared source of
truth. JSON can't hold a comment, so the mandatory "keep byte-identical" marker lives in each test
file's doc comment. When slices 5/6 add real `storage_targets`/`backups` fields, promote this to a
shared Go types module (the proper fix); this fixture is the bridge.
### Versioning / scope
Recorded **v0.7.1** in `hub/CHANGELOG.md`. The hub version is the `main.Version` ldflags var
(`build.sh <VER>`, default `"dev"`) — there is no in-repo version constant to bump (the task's
pointer to `web/version.go` is the controller-image `VersionChecker`, unrelated); the image tag is
applied at build/deploy (ArgoCD), not in this task. No deploy performed.
### Untouched (confirmed)
Controller path (`handleReport`/`reports`/`customer_configs`/`checkAuthCustomer`/existing checkers)
unchanged. The agent's Proxmox client timeout was a "confirm" item: already bounded (30s default),
no change.
### Repo state
Branch: `main`. Verified `go build/vet/test ./...` green in `hub/` locally (go1.26) and on the build server (go1.26).
@@ -0,0 +1,224 @@
# Felhom Controller Architecture — Part 1: Topology & Trust
**Status:** draft (decisions from the topology/trust design sessions).
**Platform facts** referenced here live in `docs/proxmox-platform.md`; this document
records *Felhom's decisions*, not Proxmox behaviour.
---
## 1. Model at a glance
Three components. **Control is always box-initiated** — the hub never connects *into* a
customer box.
```
    operator side                    customer box (per Proxmox host)
┌───────────────────┐         ┌───────────────────────────────────────────┐
│        HUB        │         │ Proxmox host                              │
│ (dooplex.hu, k3s) │         │ ┌──────────────┐                          │
│  - report sink    │◀──poll──┤ │  HOST AGENT  │  operator-tier           │
│  - signed jobs    │ signed  │ │  (Proxmox    │  • all Proxmox ops       │
│  - dashboard      │  jobs   │ │   token)     │  • provision / restore   │
│  - customer record│         │ └──────┬───────┘  • storage mgmt          │
│  - PBS namespace  │         │        │ local constrained API            │
└─────────▲─────────┘         │ ┌──────▼────────────────────────────────┐ │
          │                   │ │ customer LXC (one per customer)       │ │
          │  direct, app-     │ │ ┌──────────────┐  Docker:             │ │
          └───────────────────┼─┤ │  IN-GUEST    │  [app] [app] ...     │ │
             domain reports   │ │ │  CONTROLLER  │  (Docker containers) │ │
                              │ │ │ (Docker-only)│                      │ │
                              │ │ └──────────────┘                      │ │
                              │ └───────────────────────────────────────┘ │
                              └───────────────────────────────────────────┘
PBS (offsite) ◀── outbound, client-side-encrypted backups ── customer box
end-users / customer ◀── Cloudflare Tunnel ── apps + controller UI
```
---
## 2. The customer node
- One **Proxmox host** per box (PVE 9.2, Debian 13, LVM-thin).
- **Default workload topology:** one **customer LXC**, Docker inside it, each app a Docker
container/stack. Apps are isolated at the Docker layer (separate containers, networks,
volumes, cgroup limits); they share one LXC/kernel/Docker daemon.
- **Escape hatch:** promote an individual app to its own guest (LXC or VM) only for a
specific reason — a non-Linux/Windows app, a genuinely untrusted or exposed app needing
hard isolation, or a resource hog needing guarantees.
- **Multi-tenant:** one customer per host is the home default; multiple customer LXCs on
one host (a company environment) is **not precluded** — the agent manages a *set* of
guests. The only multi-tenant-specific work deferred to "if it becomes real" is resource
fairness (per-guest disk/RAM/CPU quotas).
---
## 3. Components & responsibilities
| | **Hub** | **Host agent** | **In-guest controller** |
|---|---|---|---|
| Runs on | dooplex.hu (k3s) | the Proxmox host | the customer LXC |
| Tier | operator backend | operator (high-privilege) | customer-facing (app) |
| Holds | customer records, signed-job source, PBS namespaces, escrowed keys | the **only** Proxmox API token; per-host operator identity | **no Proxmox creds**; its own hub API key + a local-API token to the agent |
| Does | reporting sink, dashboard, job queue, source of durable truth | all Proxmox ops (provision, restore, snapshot, backup, storage mgmt, LXC lifecycle); polls hub for signed jobs; exposes a constrained local API to the controller; **per-guest authorization gate** | Docker/app lifecycle, catalog deploy, customer UI, app-level (data-layer) backup; reports app-domain to the hub directly |
| Never does | initiate a connection *into* a box | — | touch the Proxmox API directly |
**Key separation:** the controller manages Docker; the agent manages Proxmox. The controller's
only path to guest-level operations (snapshot-before-deploy, "grow my RAM") is a constrained
**local API call to the agent**, which the agent authorizes (scoped to that controller's own
guest) and executes with its operator-tier token. This consolidates all Proxmox access and
all per-guest authorization in one auditable place and leaves the guest with zero Proxmox
credentials.
---
## 4. Control plane — box-initiated
- CGNAT does **not** force this: the Cloudflare Tunnel already makes a box reachable through
Cloudflare's edge. We *choose* box-initiated control for the smallest attack surface — the
box exposes no control endpoint at all.
- The agent and the controller **poll** the hub; the hub never initiates inbound.
- Operator actions are delivered as **signed jobs**: the agent verifies an operator signature
before executing, so a compromised hub database alone cannot forge commands.
- All operator-initiated actions are recorded in a **customer-visible audit log**.
---
## 5. Trust boundaries
| Boundary | What crosses | Mechanism | Blast radius if breached |
|---|---|---|---|
| end-user ↔ apps | app traffic | Cloudflare Tunnel → Traefik (Host routing) | that app |
| customer ↔ controller UI | management UI | Cloudflare Tunnel; UI auth (bcrypt) | the customer's own box |
| controller ↔ agent | snapshot/resize/backup requests | local constrained RPC; agent authorizes per-guest | the controller's own guest only |
| agent ↔ hub | reports + signed jobs | outbound poll; signed jobs | one box; signed jobs limit forgery |
| controller ↔ hub | app-domain reports/jobs (incl. geo desired-state) | outbound, own API key | app-domain of one customer |
| box ↔ PBS | encrypted backups | outbound; per-customer namespace; client-side encryption | ciphertext only (operator can't read) |
| guest ↔ Proxmox host | **(none direct)** | the guest holds no Proxmox creds; all via the agent | — |
| hub ↔ Cloudflare API | geo-restriction WAF (enforcement) | the **hub** holds the CF API token; reconciles geo desired-state → WAF | the customer's zone/WAF |
---
## 6. Enrollment & identity
- **Physical presence at provisioning** (on-site install, or pre-imaged-and-delivered).
This removes any zero-touch remote-enrollment problem.
- A **one-time retrieval code** mints durable identity. Single-use (burned on the successful
config fetch) plus a short *pre-use* TTL; one-click regenerate for the only real failure
case (fetch fails before anything is persisted). After the fetch, the code is irrelevant —
everything downstream runs on durable credentials, so retries don't need it.
- **Order:** the agent enrolls first (and, running as root at setup, mints its own scoped
operator-tier Proxmox token), then provisions the customer LXC from the golden template and
deploys the controller into it — injecting the controller's hub API key and its local-API
token. The controller is the agent's product, never the other way around.
- The **hub customer record is the durable source of truth**, and it survives box loss:
identity, domain, **Cloudflare tunnel token**, **PBS namespace**, **storage manifest**, a
**mirrored app inventory** (bottom-up reality, not operator-declared intent — apps themselves
restore from the PBS guest snapshot, never re-deployed from this record; see `05` §1/§9), and the
**escrowed (zero-knowledge) backup key**. This is what makes hardware replacement possible.
---
## 7. Networking
- **Cloudflare Tunnel** provides inbound access to apps and the controller UI (the CGNAT
solution). Tunnel token lives in the hub record → **reused on new hardware during DR**, so
DNS/routing stay intact through an outage.
- **Outbound only** for control/report/backup (poll to hub, push to PBS). No inbound control
endpoint exists in the chosen model.
- **Tunnel placement: host** (resolved, Part 3 §3/§5). `cloudflared` runs on the Proxmox host
as its own **agent-managed systemd service** — not inside the guest — so the data path
survives control-plane death by construction. Geo-restriction WAF is **hub-enforced** (the
hub holds the CF API token; the controller only reports geo desired-state).
---
## 8. Storage & backup
**Tiers** (escalating failure scope):
| Layer | Mechanism | Survives | Note |
|---|---|---|---|
| Snapshot | LVM-thin snapshot (transient) | *logical* loss only | whole-LXC rollback; **not a backup** |
| Local — second storage | vzdump to `dir`/`nfs`/`cifs` | primary-disk failure (USB) / box death (NAS) | first *real* backup tier |
| Offsite — PBS | dedup'd, incremental, encrypted | site loss | the DR substrate; paid tier |
- **Storage manifest** (hub-held, agent-reconciled): per target → type, durable identity
(UUID / `server:/export` / repo+fingerprint), **class** (fast/slow + rough IOPS, set once
at attach), role, encrypted credentials, schedule/retention. The agent creates the Proxmox
storages, continuously checks presence/reachability, and reports per-target status (a
disconnected target → actionable notification).
- **App data placement is per-volume, not per-app:** `.felhom.yml` classifies each volume
**hot** (DB/config/cache → fast storage, enforced) vs **bulk** (media/files → may be slow).
A photo app's DB stays on SSD while its blobs go to the USB.
- **Backup scoping:** hot data (LXC rootfs) rides the guest `vzdump` → tiers + PBS. Bulk data
on external mount points is **excluded** from the guest vzdump (per-mount `backup` flag) and
gets its own per-volume policy (file-level to a tier, slower cadence — or explicitly *not*
backed up for re-downloadable content, with the customer informed).
- **Tiers double as the DR restore-source priority:** restore from the fastest *surviving*
source (local if still attachable, PBS on true site loss).
- **Key custody (zero-knowledge default):** three tiers the customer chooses —
*customer-only* / *zero-knowledge escrow (default)* / *operator-managed*. Default escrows
the **PBS passphrase-protected keyfile** in the hub, wrapped under a **customer recovery
code** the operator can't open; DR needs the customer's code. Access-notification is an
audit signal, never the primary guard. (Don't build bespoke crypto — use PBS's native
keyfile passphrase.)
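The per-volume placement and backup-scoping rules above reduce to a small decision function. This is a sketch with hypothetical type and field names; the `"fast"`/`"slow"` labels stand in for the manifest's storage class.

```go
package main

import "fmt"

type volumeClass string

const (
	classHot  volumeClass = "hot"  // DB/config/cache
	classBulk volumeClass = "bulk" // media/files
)

type volume struct {
	Name  string
	Class volumeClass
}

// placement says where a volume may live and how it is backed up.
type placement struct {
	StorageClass  string // "fast" is enforced for hot; bulk may use "slow"
	InGuestVzdump bool   // true: rides the guest vzdump; false: own per-volume policy
}

// place applies the hot/bulk rules: hot data stays on fast storage and is
// covered by the guest vzdump, bulk data may be slow and is excluded from it.
func place(v volume) (placement, error) {
	switch v.Class {
	case classHot:
		return placement{StorageClass: "fast", InGuestVzdump: true}, nil
	case classBulk:
		return placement{StorageClass: "slow", InGuestVzdump: false}, nil
	default:
		return placement{}, fmt.Errorf("volume %q has unknown class %q", v.Name, v.Class)
	}
}

func main() {
	for _, v := range []volume{{"photos-db", classHot}, {"photos-blobs", classBulk}} {
		p, err := place(v)
		fmt.Println(v.Name, p, err)
	}
}
```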
---
## 9. Disaster recovery
- **Guest-loss (host + agent alive):** the agent restores the guest from the fastest
surviving tier, **resets identity** (MAC/hostname — see `proxmox-platform.md`), boots it,
controller returns. Validated mechanics: Phase 2.
- **Host / hardware-loss (agent gone):** re-provision (§6) in **restore mode** — the hub,
knowing the customer has PBS backups, hands the freshly-enrolled agent the existing identity
+ PBS namespace + a restore directive instead of a clean-provision directive. The agent
restores from PBS; the controller returns on the same domain (tunnel reused from the hub
record). DR = provisioning + a restore mode, not a separate mechanism.
- **Snapshot-before-deploy:** controller asks the agent to snapshot, deploys, runs its
post-deploy health check, asks the agent to roll back on failure. (Transient snapshot, §8.)
---
## 10. How this embodies the product values
- **Zero-knowledge offsite** — the operator holds the offsite backup but cannot read it.
- **Box-initiated control + signed jobs** — no standing operator backdoor; a hub compromise
alone can't forge commands.
- **Customer-visible audit log** — every operator action is visible to the customer.
- **Never hold data hostage** — subscriptions cover ongoing labour (monitoring, offsite,
support, new deployments); the customer's data and deployed apps remain recoverable by the
customer (recovery code), with nothing locked behind the operator.
---
## 11. Open sub-decisions (carried into later parts)
- **RTO/RPO targets** → drive the backup + offsite-replication schedule (§8).
- Offboarding / decommission (scenario 6) — not yet designed; must honour "never hold data
hostage" in credential revocation + data hand-off.
- Multi-tenant resource fairness — deferred until multi-tenant is real (§2).
---
## Appendix — relationship to the spike
- **Phase 0** → §2: LXC-default for the workload; overhead numbers.
- **Phase 1** → §3/§5: validated the privilege boundary (create/allocate is operator-tier).
The guest-side scoped-backup-token it proved possible is **not** used — we chose the
agent-mediated path — but it confirmed restore = operator-tier, which shapes the agent.
- **Phase 2** → §8/§9: backup→restore round-trip; identity reset on restore.
---
## Changelog — design-review + Phase-3 fold-in (2026-06-08)
- §5 trust boundaries: **added `hub ↔ Cloudflare API`** row (hub holds the CF token, enforces
geo→WAF); controller↔hub row notes it carries geo desired-state (S4).
- §7 networking: **tunnel placement resolved → host** (agent-managed systemd service); geo is
hub-enforced (S4/S5).
- §11 open items: removed the now-resolved **tunnel placement** and **self-update flow** entries
(S5; self-update designed in 03 §11).
- §6 durable record: **"declarative app inventory" → "mirrored app inventory"** — aligns the wording
with the locked two-driver model (`05` §1: apps are bottom-up mirror, never operator-declared;
`05` §9: apps restore from the PBS guest snapshot, not re-deployed from this record).
@@ -0,0 +1,374 @@
# Felhom Controller Architecture — Part 2: Controller Module Map
**Status:** audit (keep / port / delete / modify / add), grounded in the v0.33 source.
**Subject:** the v0.33 controller in `felhom-controller/controller/` (110 `.go` files,
~40 K LOC) audited against [01-topology-and-trust.md](01-topology-and-trust.md) and
[../proxmox-platform.md](../proxmox-platform.md).
> This is a **planning map, not the port.** No controller code was changed. Source
> citations use `controller/internal/...:line` (a different repo, so links are not
> clickable). Classifications reflect the **target model**: the in-guest controller is
> **Docker-only and holds no Proxmox credentials**; everything host/disk/Proxmox moves to
> a new **host agent** (out of scope here); the controller reaches the agent through a
> constrained **local API**.
## Classification scheme
**KEEP** (host-agnostic, ~unchanged) · **PORT** (survives, needs rework) ·
**DELETE (→agent)** (responsibility moves to the host agent) ·
**DELETE (obsolete)** (no longer needed) · **MODIFY** (stays, materially changes) ·
**NEW** (no v0.33 equivalent).
Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-target with a keep/port target).
---
## 0. Executive summary
- The **app domain is largely intact and portable**: stack lifecycle (`stacks/`), catalog
git-sync (`sync/`), app-to-app integrations (`integrations/`), `.fab` export/import
(`appexport/`), the scheduler, crypto, asset sync, the hub report/notify *channels*, and
most of the web UI **KEEP/PORT cleanly**.
- The **disk/storage/host half deletes wholesale to the agent**: all of `storage/`,
`monitor/watchdog.go`, the restic/cross-drive/disk-layout/drive-mount parts of `backup/`,
`report/infra_backup*`+`infra_pull`, and the host-physical parts of `system/`.
- The **setup wizard (`setup/`) is obsolete** — the agent provisions the controller.
- **The single biggest hazard is `backup/`**: the keep side (DB dumps, Docker-volume
archive, per-app restore — needed by `appexport/` and the backup UI) and the delete side
(restic, cross-drive, drive-mount) are **interleaved inside the same files**
(`backup.go`, `restore.go`, `paths.go`), not cleanly file-separated. Extracting the
app-data-backup subset into a clean retained package is the critical refactor.
- **Intent-vs-reality corrections** (vs the task's provisional split): `monitor/pinger.go`
is already **dead** (legacy Healthchecks.io, "deprecated… now handled by Hub" per
`main.go`) → DELETE(obsolete), not keep. `backup.go`/`restore.go`/`paths.go` do **not**
split on file boundaries — they split *within* the file. `settings/` is **not** pure app
domain — it stores disk/disconnect/decommission state. `system/` is genuinely
mixed-per-function, not per-file.
---
## 1. v0.33 module inventory (package → purpose, key deps)
| Package | Purpose | Key internal deps |
|---|---|---|
| `cmd/controller/main.go` | Entry point; wires all subsystems; 6 adapters break import cycles; branches into setup mode | imports **every** package |
| `api/` | REST API (`router.go`) + geo endpoints (`geo.go`) | stacks, backup, metrics, notify, selfupdate, sync, system, assets, integrations, cloudflare, config, settings |
| `appexport/` | `.fab` app export/import (config+DB+volumes, AES-256-CTR+scrypt) | **backup** (DB dump), (provider iface → stacks) |
| `assets/` | Download/cache app assets from Hub API | — (HTTP only) |
| `backup/` | DB dumps, Docker-volume archive, **restic**, **cross-drive rsync**, per-app restore, **drive mount**, disk-layout, infra-backup metadata | config, monitor, settings, system, util |
| `cloudflare/` | Geo-restriction via Cloudflare WAF (zone/waf/geosync/countries) — **enforcement → hub** (S4) | settings |
| `config/` | `controller.yaml` schema + load | — |
| `crypto/` | AES-256-GCM for app.yaml secrets | — |
| `integrations/` | App-to-app (OnlyOffice→FileBrowser/Nextcloud) via docker exec / config patch | stacks, crypto, settings |
| `metrics/` | SQLite time-series: system + container metrics, log scan | system |
| `monitor/` | App health (`healthcheck`,`pinger`) + **storage/USB watchdog** | config, notify, settings, system |
| `notify/` | Hub event push (direct, own API key) | settings |
| `recovery/` | Generate `recovery-info.txt` (DR guide) | — |
| `report/` | Build+push hub report; **infra-backup payload**; **recovery pull** | backup, config, metrics, monitor, scheduler, settings, stacks, system |
| `scheduler/` | Cron/interval jobs, Budapest TZ | — |
| `selftest/` | Startup checks (docker/dirs/catalog/hub/**restic repos**/mountpoint) | backup, config, settings, system |
| `selfupdate/` | Self-update: pull image, edit compose, `up -d` | config |
| `settings/` | `settings.json` persistent state: **storage paths/disconnect/decommission**, cross-drive cfg, notif prefs, geo, integration state, DB-validation cache | — |
| `setup/` | **First-run wizard** (scan drives, hub-restore, manual config) | backup, config, report, settings, web |
| `stacks/` | Docker Compose lifecycle, deploy + memory validation, metadata (`.felhom.yml`), HDD-data delete | config, crypto, system |
| `storage/` | **Physical disk** scan/format/attach/mount/migrate/fstab/safety | backup, settings, util |
| `sync/` | Catalog git-sync (pull templates) | config |
| `system/` | Resource info: mem/cpu/load (guest) + **temp/disk-model/USB/mount topology (host)** | — |
| `util/` | String helper | — |
| `web/` | Hungarian dashboard: pages, auth, deploy, backup UI, **storage/disk UI**, DR restore UI, export UI, debug | appexport, backup, config, crypto, integrations, monitor, notify, scheduler, selfupdate, settings, stacks, storage, system |
---
## 2. Classification table (per package/file)
### `cmd/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `cmd/controller/main.go` | **MODIFY** | Wiring stays, but drop the setup-mode branch, the storage/watchdog/drive-migrator/restic/cross-drive/infra-backup wiring, and add the **agent local-API client**. 6 adapters shrink. | hazard |
### `api/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `api/router.go` | **PORT/MODIFY** | Keep stacks/deploy/integrations/metrics/sync/assets/selfupdate routes; **remove `/api/storage/*` (disk)**; backup routes become **agent-coordinated guest-backup** requests; `config/apply` (hub-pushes-yaml) changes since the **agent** now injects config at provision. | needs-rework |
| `api/geo.go` | **PORT/MODIFY** | Keep the customer-facing geo **preference** endpoints (set/get global + per-app); **drop the Cloudflare-sync trigger** — enforcement → hub (S4). The controller reports geo desired-state up instead of calling the CF API. | needs-rework |
### `appexport/` — KEEP/PORT (Docker-volume + DB level, no disk ops)
| File | Class | Reason | Risk |
|---|---|---|---|
| `crypto.go` | **KEEP** | Self-contained AES-256-CTR+HMAC+scrypt for `.fab`. | clean |
| `manifest.go`, `provider.go` | **KEEP** | Bundle metadata; provider interface (impl in main). | clean |
| `export.go` | **PORT** | Docker-volume `tar`, DB dump via `backup.DumpOne`, config copy. Depends on the **retained** app-data-backup subset of `backup/`; HDD-mount enumeration reworked to **per-volume placement**. | needs-rework |
| `restore.go` | **PORT** | `docker volume create`/`tar xf`, DB import, compose up. Same per-volume rework. | needs-rework |
| `estimate.go` | **PORT** | `du`/`df` on mounts → per-volume sizing. | clean |
### `assets/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `syncer.go` | **KEEP** | Hub API download + checksum cache; already a direct hub channel. | clean |
### `backup/` — THE SPLIT (delete side interleaved with keep side; see §3)
| File | Class | Reason | Risk |
|---|---|---|---|
| `dbdump.go` | **KEEP** | Pure `docker exec pg_dump`/`mariadb-dump` — app/DB data layer; the retained per-app backup. | clean |
| `appdata.go` | **PORT** | App-data discovery (stacks/volumes/DB containers, `du`). "HDD mount" concept → per-volume. | needs-rework |
| `backup.go` (1478 L) | **MODIFY (split)** | Mixes **keep** (`RunDBDumps`, `DumpAppVolumes(Safe)`, app restore) with **delete→agent** (`RunBackup`/`backupDrive`/restic snapshot/prune/check on per-drive repos). Must be torn in two. | hazard |
| `restore.go` (442 L) | **MODIFY (split)** | `RestoreApp` restic path → agent; Docker-volume + Tier-2 rsync restore (app layer) → keep. | hazard |
| `restore_app_linux.go`/`_other.go` | **PORT** | Per-app restore: compose pull/up, rsync app data, DB-dump restore. App layer; depends on backup location that changes. | needs-rework |
| `paths.go` | **MODIFY (split)** | `AppDBDumpPath`/`AppVolumeDumpPath` keep; `Primary/SecondaryResticRepoPath`, `InfraBackupDir` → agent. | needs-rework |
| `restic.go` | **DELETE (→agent)** | restic repos on drives = infra backup tier; agent does vzdump/PBS. | hazard |
| `crossdrive.go` | **DELETE (→agent)** | Tier-2 cross-drive rsync to secondary storage = storage-tier (agent + storage manifest). | hazard |
| `restore_drives_linux.go`/`_other.go` | **DELETE (→agent)** | `lsblk`/`blkid`/`mount`/fstab — pure host disk. | hazard |
| `disk_layout.go` | **DELETE (→agent)** | Disk topology for DR → agent. | clean |
| `local_infra.go` | **DELETE (→agent)** | Per-drive infra-backup metadata → agent. | clean |
| `restore_scan.go` | **DELETE (→agent)** | Scans drives to build a DR restore plan = agent-tier DR. | needs-rework |
### `cloudflare/` — DELETE (→hub): CF-API enforcement moves to the hub (S4)
| File | Class | Reason | Risk |
|---|---|---|---|
| `client.go`,`zone.go`,`waf.go`,`geosync.go`,`countries.go` | **DELETE (→hub)** | The **hub** holds the CF API token and reconciles geo desired-state → WAF (doc 01 §5, doc 03 §2). The controller no longer calls the Cloudflare API — it reports geo desired-state up. The customer-facing geo *preference UI/data* stays (see `api/geo.go`). | needs-rework |
### `config/`, `crypto/`, `util/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `config/config.go` | **MODIFY** | Drop `BackupConfig` (restic/retention), storage-drive keys, and `InfrastructureConfig.cf_api_token` (→hub, S4); keep customer/paths/web/git/stacks/monitoring/hub/assets/system; **add agent local-API endpoint+token**. | needs-rework |
| `crypto/crypto.go` | **KEEP** | App.yaml secret encryption. | clean |
| `util/strings.go` | **KEEP** | Trivial helper. | clean |
### `integrations/` — all KEEP (pure app-domain)
| File | Class | Reason | Risk |
|---|---|---|---|
| `integrations.go`,`lifecycle.go`,`manager.go`,`onlyoffice_filebrowser.go`,`onlyoffice_nextcloud.go` | **KEEP** | App-to-app via `docker exec` / compose-config patch; no host ops. | clean |
### `metrics/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `store.go`,`logscanner.go`,`telemetry.go`,`types.go` | **KEEP** | SQLite store, `docker logs` scan, container telemetry — app-domain. | clean |
| `collector.go` | **PORT** | Container metrics (`docker stats`) keep; host metrics via `system.GetInfo` (temp, physical disk) become **agent-provided or dropped**. | needs-rework |
| `sysinfo.go`/`sysinfo_other.go` | **MODIFY** | Reads `/host/etc`, `/proc/cpuinfo`, uptime — host static info; in-guest some is meaningful, hardware identity via agent. | needs-rework |
### `monitor/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `healthcheck.go` | **PORT (split)** | Keep guest health (mem/cpu/docker/protected-containers); host health (temp, **physical disk**, storage-path mount status) becomes **agent-fed**. | needs-rework |
| `pinger.go` | **DELETE (obsolete)** | Legacy Healthchecks.io; `main.go` itself marks it "deprecated… now handled by Hub". *(Corrects the task's KEEP/PORT guess.)* | clean |
| `watchdog.go` (902 L) | **DELETE (→agent)** | Storage/USB disconnect monitoring: `umount -l`, `mount -T /host-fstab`, UUID probing, restic-lock cleanup — pure host storage. | hazard |
### `notify/`, `recovery/`, `scheduler/`, `selftest/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `notify/notifier.go` | **KEEP/MODIFY** | Direct hub event channel (own API key) — keep; prune infra event types that move to the agent (`storage_disconnected`, `crossdrive_*`, `disaster_recovery_*`). | clean |
| `recovery/info.go` | **DELETE (obsolete)** | Generates a DR text guide (OS install, docker-setup.sh, hub restore UI); DR is now agent+hub provisioning. | clean |
| `scheduler/scheduler.go` | **KEEP** | Generic cron/interval, Budapest TZ. | clean |
| `selftest/selftest.go` | **PORT** | Keep docker/dirs/catalog/hub checks; drop restic-repo + system-data **mountpoint** checks (→agent). | needs-rework |
### `report/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `pusher.go` | **KEEP** | Direct hub push (`/api/v1/report`, Bearer). | clean |
| `telemetry.go` | **KEEP** | Per-app telemetry section. | clean |
| `builder.go` (326 L) | **MODIFY** | Keep containers/telemetry/stacks/geo/app-health; drop/relocate host system info, physical storage, **restic backup status incl. restic password**. | hazard |
| `types.go` | **MODIFY** | Schema: drop infra fields (`restic password`, physical storage), keep app-domain. | needs-rework |
| `infra_backup.go`/`_linux.go`/`_other.go` | **DELETE (→agent)** | Builds infra-backup payload (disk layout, restic/enc passwords) for hub. | hazard |
| `infra_pull.go` | **DELETE (→agent)** | Pulls recovery config + infra backup from hub (setup-wizard DR). | needs-rework |
### `selfupdate/` — controller is agent-managed (doc 03 §11)
| File | Class | Reason | Risk |
|---|---|---|---|
| `version.go` | **KEEP** | Semver parse / version string (still used for reporting). | clean |
| `state.go` | **DELETE (obsolete)** | Self-update audit state — the agent owns controller updates now (doc 03 §11). | clean |
| `updater.go` | **DELETE (→agent)** | Resolved (doc 03 §11): the controller is **agent-managed** — the agent snapshots → redeploys → health-gates → rolls back the controller. The controller's old self-update path (image pull + compose edit) is **removed**. | clean |
### `settings/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `settings/settings.go` (1101 L) | **MODIFY (split)** | Keep notif prefs, integration state, geo, DB-validation cache, cross-drive *intent*. The **storage-path registry** (`StoragePath` with `Disconnected`/`DisconnectedAt`/`StoppedStacks`/decommission) is disk-management state → reshape to **per-volume placement** fed by the agent's storage manifest; disconnect/decommission/migrate state leaves. (UUID is *not* a persisted field — runtime-derived from fstab.) | hazard |
### `setup/` — all DELETE (obsolete); the agent provisions the controller
| File | Class | Reason | Risk |
|---|---|---|---|
| `handlers.go`,`setup.go`,`csrf.go`,`network.go` | **DELETE (obsolete)** | First-run wizard (hub-restore, manual config, LAN-IP detection). | needs-rework |
| `scanner.go` | **DELETE (→agent)** | Drive scan (`lsblk`+temp mounts) for backup discovery — host op; its capability informs the agent. | clean |
### `stacks/` — core app domain (KEEP/PORT)
| File | Class | Reason | Risk |
|---|---|---|---|
| `manager.go` (1074 L) | **KEEP/PORT** | Docker Compose orchestration, scan/state/start/stop/logs — the heart. Minor port. | clean |
| `deploy.go` | **PORT** | Memory validation (`system.GetMemoryMB` → **guest** mem, fine in LXC), secret gen, encrypted app.yaml. **Add snapshot-before-deploy → agent** hook. | needs-rework |
| `healthprobe.go` | **KEEP** | TCP/HTTP app probes. | clean |
| `metadata.go` | **PORT** | `.felhom.yml` parse. **Add per-volume hot/bulk classification** (doc 01 §8). | needs-rework |
| `delete.go` | **PORT** | Stack delete + HDD-data `os.RemoveAll` on bind mounts → per-volume cleanup. | needs-rework |
### `storage/` — entire package DELETE (→agent)
| File | Class | Reason | Risk |
|---|---|---|---|
| `scan*`,`format*`,`attach*`,`migrate*`,`migrate_drive*`,`safety*` | **DELETE (→agent)** | Physical disk: `lsblk`/`sfdisk`/`wipefs`/`mkfs.ext4`/`partprobe`/`mount`/`umount`/fstab/`blkid`/drive-rsync. The agent owns all of this (doc 01 §3, §8). | hazard |
### `sync/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `sync/sync.go` | **KEEP** | Catalog git-sync (clone/fetch/reset, copy compose+`.felhom.yml`, never overwrite app.yaml). | clean |
### `system/` — split per-function (not per-file)
| File | Class | Reason | Risk |
|---|---|---|---|
| `cpu_linux.go`/`cpu_other.go` | **KEEP** | `/proc/stat` works inside an LXC. | clean |
| `info.go`/`info_other.go` | **KEEP** | Structs/stubs. | clean |
| `info_linux.go` | **MODIFY (split)** | Keep mem (`/proc/meminfo`)/load/statfs (guest); **temp via `/host/sys`, hwmon → agent**. | needs-rework |
| `mounts_linux.go`/`mounts_other.go` | **DELETE (→agent)** mostly | Mount-point detection, USB, disk model, fstab, probe — host/disk. Guest-meaningful `statfs` disk-usage is the only keep-candidate → fold into the kept `info`. | hazard |
### `web/` — split by UI surface
| File | Class | Reason | Risk |
|---|---|---|---|
| `auth.go`,`csrf.go`,`logbuffer.go`,`embed.go`,`templates.go` | **KEEP** | Session/CSRF, log ring buffer, embeds/logo. | clean |
| `funcmap.go` | **KEEP/PORT** | Template helpers; a few backup/state labels track the backup rework. | clean |
| `server.go` (559 L) | **MODIFY** | Routing/wiring; remove storage/DR-restore/watchdog wiring; keep app/deploy/backup/settings/export/debug. | needs-rework |
| `handlers.go` (1883 L) | **PORT/MODIFY** | Core pages keep; the embedded **storage-path management** (add/remove/label/schedulable, storage bars, FileBrowser mount sync) → per-volume / agent-fed. | hazard |
| `handler_export.go` | **KEEP/PORT** | `.fab` UI. | clean |
| `handler_debug.go` (823 L) | **PORT** | Drop storage-simulate/infra-push/DR debug; keep the rest. | needs-rework |
| `alerts.go` | **PORT/MODIFY** | Storage-disconnect alert now sourced from **agent** status; backup/update alerts keep. | needs-rework |
| `handler_restore.go` | **DELETE (→agent) / MODIFY** | DR restore-mode UI; DR is agent-tier — replace with an agent-status view or remove. | needs-rework |
| `storage_handlers.go` (1600 L) | **DELETE (→agent)** | Format/attach/mount/disconnect/migrate-drive/decommission disk UI. Any survivor is a **thin client calling the agent API** (e.g. per-volume placement requests). | hazard |
| `templates/` (HTML, non-Go) | **PORT** | Remove disk-wizard + DR pages; keep app/deploy/backup/settings pages. | needs-rework |
### `scripts/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `scripts/hashpass.go` | **KEEP** | Standalone bcrypt helper. | clean |
---
## 3. Coupling hazards (delete-targets depended on by keep/port)
1. **`backup/` is half-deleted but split *inside files*, not across them.** `backup.go`
contains both `RunDBDumps`/`DumpAppVolumesSafe`/app-restore (keep) and
`RunBackup`/`backupDrive` + restic (delete→agent); `restore.go` and `paths.go` are
likewise mixed. **Keep/port consumers reach into this same package:**
- `appexport/export.go:295` → `backup.DiscoverDatabases`/`DumpOne` (DB dump is app-layer; must survive)
- `report/builder.go:buildBackupReport` → backup status (MODIFY)
- `web/handlers.go` (backups page, `buildAppBackupRows`), `web/funcmap.go`, `web/alerts.go`, `web/handler_restore.go`, `web/handler_debug.go`
- `selftest/selftest.go:217` → `checkResticRepos` (restic path; delete)
- `main.go` scheduler chain `RunFullBackup` (DB→volume→restic→infra-push) interleaves both sides.
**Action:** extract the app-data-backup subset (DB dump, volume archive, per-app
restore) into a clean retained package *before* deleting the restic/cross-drive code,
or every keep consumer breaks.
2. **`backup/crossdrive.go` (delete→agent) is wired as `crossDriveRunner` into**
`main.go`, `api/router.go`, `web/server.go`, and surfaced by `report/builder.go` and the
backups page. Removing it requires reworking the backup UI/report to the agent's
guest-backup status.
3. **`storage/` (delete→agent) depended on by keep/port UI:** `web/storage_handlers.go`
(delete) and `web/server.go`/`web/handlers.go` (port) — the latter renders storage
labels/bars and runs **FileBrowser mount sync** off the storage-path registry.
`storage/migrate*.go` also imports `backup` (also being split). Untangle the per-volume
placement UI from the disk-management UI.
4. **`monitor/watchdog.go` (delete→agent) depended on by** `web/alerts.go` (port),
`web/server.go`, `web/handler_debug.go`, `main.go`. The disconnect **alert** must instead
consume agent-reported storage status.
5. **`system/` mixed-per-function, consumed by both sides.** Keep consumers —
`stacks/deploy.go` (`GetMemoryMB`, guest), `metrics/collector.go` (container) — must not
drag in the host-disk/temp/USB code that goes to the agent (`mounts_linux.go`,
`info_linux.go` temp). Also consumed by `report/builder.go` (MODIFY), `monitor/healthcheck.go`
(PORT), `selftest`, `crossdrive` (delete). **Split `system/` cleanly into guest-info vs
host-info first.**
6. **`settings/StoragePath` carries disk state into an app-domain store.** Disk fields
(`Disconnected`,`DisconnectedAt`,`StoppedStacks`, decommission — UUID is *not* persisted, it's runtime-derived from fstab via `system.ParseFstabUUID`/`watchdog.go`) are written by
`watchdog.go`/`storage_handlers.go`/`crossdrive.go` (all delete) but the same struct is
read by `stacks`/`web` for labels and **placement** (keep). Reshape `StoragePath` to a
placement record fed by the agent manifest.
7. **`report/builder.go` imports almost everything** (backup, monitor, scheduler, stacks,
system, metrics, settings, config). Its MODIFY must land *after* the backup and system
splits, or it pulls deleted code along.
8. **`backup/paths.go` shared both ways** — `appexport` + `selftest` + the kept DB-dump
flow use the app-dump path helpers; the same file holds the restic/secondary helpers
that leave.
9. **DR/provisioning chain is cross-cut:** `setup/` (obsolete) → `report/infra_pull` +
`recovery/info` + `backup.MountDrivesFromLayout` + `backup.ReadLocalInfraBackup`. All
obsolete/→agent, but `main.go`'s setup branch and `web/handler_restore.go` reference
them; remove together.
---
## 4. Moves to the host agent (consolidated — feeds the future agent design)
> Reporting only; **not** designing the agent here.
- **All physical-disk management** — `storage/` in full: scan/classify, format
(`wipefs`/`sfdisk`/`mkfs.ext4`/`partprobe`), attach (raw mount + bind + fstab), per-app
and full-drive migration (rsync), safety checks (system-disk detection).
- **Storage/USB watchdog** — `monitor/watchdog.go`: disconnect/reconnect detection,
`umount -l`, `mount -T /host-fstab`, UUID-by-id probing, safe-disconnect, restic-lock
cleanup.
- **Infra/disk backup tier** — `backup/restic.go`, `crossdrive.go`,
`restore_drives_*`, `disk_layout.go`, `local_infra.go`, `restore_scan.go`, plus the
restic-snapshot half of `backup.go`, the restic-restore half of `restore.go`, and the
restic/secondary path helpers in `paths.go`. (Maps to the agent's `vzdump`→tiers→PBS in
doc 01 §8.)
- **Infra-backup payload + recovery pull** — `report/infra_backup*`, `report/infra_pull`.
- **Host-physical telemetry** — `system/mounts_linux.go` (mount topology, USB, disk
model), the temp/hwmon parts of `system/info_linux.go`, and the host-hardware parts of
`metrics/sysinfo.go`.
- **Drive scanning for provisioning/DR** — `setup/scanner.go`.
- **Self-restore-test execution** — the agent performs the restore-to-scratch-guest; the
controller only orchestrates/validates (see §5).
---
## 5. New components to build (no v0.33 equivalent)
1. **Agent local-API client** — the controller's only path to guest-level Proxmox
operations (doc 01 §3, §5): `snapshot-before-deploy` + rollback, "grow my RAM", request
guest backup/restore, read the storage manifest / mount placement, query per-target
storage status. Replaces the deleted direct host/disk code with constrained RPC. The
controller holds **no Proxmox creds** — only a local-API token.
2. **Per-volume storage placement** (doc 01 §8) — `.felhom.yml` `hot`/`bulk` volume
classification (extend `stacks/metadata.go`), enforcement at deploy (extend
`stacks/deploy.go`), and a placement record in `settings`. Replaces the per-app
HDD-path + cross-drive model. A `bulk` volume must be realized as a `backup=0` mount point,
**never** a rootfs Docker named volume (validated recipe: `phase3-findings.md` B2 / doc 03 §7).
3. **Self-restore-test status display** (read-only) — the **agent owns orchestration** (it
holds the PBS key and creates the scratch guest — operator-tier, doc 03 §8); the controller
only surfaces `GET /restore-test/status` in its UI. (Round-trip validated: Phase 2,
[../proxmox-platform.md](../proxmox-platform.md) §4.)
4. **Snapshot-before-deploy/rollback flow** in the deploy path — wraps the existing
compose deploy with agent snapshot → health check → agent rollback-on-failure
(doc 01 §9). New behaviour on top of `stacks/deploy.go` + `stacks/healthprobe.go`.
5. **Agent-provisioning bootstrap receiver** — the controller accepts its injected hub API
key + local-API token from the agent at provision time (doc 01 §6), replacing the
deleted `setup/` wizard.
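
The snapshot → deploy → health check → rollback-on-failure flow from item 4 can be sketched as below. This is a minimal illustration, not the controller's actual code: `AgentClient`, `SafeDeploy`, and `fakeAgent` are hypothetical names, and the real local-API client would issue the `POST /snapshot` / `POST /rollback` calls described in doc 03 §6.

```go
package main

import (
	"errors"
	"fmt"
)

// AgentClient is a hypothetical stand-in for the agent local-API client.
// The method names are illustrative only.
type AgentClient interface {
	Snapshot(name string) error
	Rollback(name string) error
}

// SafeDeploy snapshots the guest, runs the deploy, checks health, and rolls
// back to the snapshot if either the deploy or the health check fails.
func SafeDeploy(a AgentClient, snap string, deploy func() error, healthy func() bool) error {
	if err := a.Snapshot(snap); err != nil {
		// Nothing has changed yet, so there is nothing to roll back.
		return fmt.Errorf("snapshot before deploy: %w", err)
	}
	err := deploy()
	if err == nil && !healthy() {
		err = errors.New("health check failed after deploy")
	}
	if err == nil {
		return nil
	}
	if rbErr := a.Rollback(snap); rbErr != nil {
		return fmt.Errorf("deploy failed (%v) and rollback failed: %w", err, rbErr)
	}
	return fmt.Errorf("deploy rolled back: %w", err)
}

// fakeAgent records calls so the flow can be exercised without a real agent.
type fakeAgent struct{ calls []string }

func (f *fakeAgent) Snapshot(n string) error {
	f.calls = append(f.calls, "snapshot:"+n)
	return nil
}

func (f *fakeAgent) Rollback(n string) error {
	f.calls = append(f.calls, "rollback:"+n)
	return nil
}

func main() {
	a := &fakeAgent{}
	// Deploy succeeds but the app never becomes healthy, so a rollback runs.
	err := SafeDeploy(a, "pre-deploy", func() error { return nil }, func() bool { return false })
	fmt.Println(err)
	fmt.Println(a.calls)
}
```

A failed snapshot aborts before any change is made, which keeps the "no deploy without a rollback point" property explicit.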
---
## 6. Open / blocked items
- **Geo — resolved (S4):** CF-API **enforcement moves to the hub** (it holds the CF token and
reconciles geo → WAF); the controller keeps the geo **preference UI/data** and reports
desired-state up. Tunnel placement is settled (host, agent-managed, doc 03 §3/§5). The
`cloudflare/` package + `api/geo.go`'s CF-sync are DELETE-from-controller → hub.
- **Self-update — resolved (doc 03 §11):** the controller is agent-managed; its self-update
path is removed.
- **`settings`/`stacks` per-volume reshape** — depends on the storage-manifest contract
between hub ↔ agent ↔ controller (doc 01 §8), not yet specified.
- **Backup UI/report surface** — depends on the agent's guest-backup status API shape
(what the controller can see about vzdump/PBS state) — undefined.
- **Notification event taxonomy** — which infra events (`storage_disconnected`,
`crossdrive_*`, `disaster_recovery_*`) the **agent** emits vs the controller, once those
responsibilities move.
---
## Changelog — design-review + Phase-3 fold-in (2026-06-08)
- **M1:** removed `UUID` from the `settings.StoragePath` field lists (§ settings, hazard #6) —
it is runtime-derived from fstab, not persisted.
- **S4 (geo):** `cloudflare/` reclassified **PORT(blocked) → DELETE(→hub)** (CF-API enforcement
moves to the hub); `api/geo.go` → **PORT/MODIFY** (keep geo *preference* endpoints, drop the
CF-sync trigger); `config/config.go` also drops `cf_api_token`. §6 + §1 updated.
- **S5:** cloudflare/geo no longer "blocked on tunnel placement" (resolved).
- **S6:** §5(3) self-restore-test → **status-display only**; the agent owns orchestration.
- **Self-update resolved (03 §11):** `updater.go` → **DELETE(→agent)**, `state.go`
DELETE(obsolete), `version.go` KEEP; §6 + §5(2) updated (bulk = `backup=0` mountpoint recipe).
@@ -0,0 +1,299 @@
# Architecture Part 3 — The Host Agent
> Status: design draft (decision content). To be grounded by Claude Code against
> `docs/proxmox-platform.md` and `docs/architecture/02-controller-module-map.md`,
> then placed at `docs/architecture/03-host-agent.md`.
>
> Builds on Part 1 (`01-topology-and-trust.md`) and Part 2 (`02-controller-module-map.md`).
> Where this doc and the locked decisions disagree, the locked decisions win and this
> draft is wrong — flag it.
## 1. Purpose & scope
The **host agent** is the operator-tier component that runs on each Proxmox host and
owns *all* Proxmox interaction. It is the trusted host actor: it provisions and restores
guests, manages host storage, orchestrates backups and restore-tests, watches the host
and the tunnel, talks to the hub, and exposes a narrow local API to the in-guest
controllers it deploys.
It is the privileged tier. The controller deliberately holds **no** Proxmox credentials
(Part 1) — the privilege the controller shed by losing `storage/` did not disappear, it
**moved here**. That makes the agent's hardening and blast-radius discipline the most
security-sensitive part of the platform.
The agent manages a **set** of guests on its host (usually one customer = one guest, but
the multi-tenant/company case is not precluded — the agent's data model is per-host,
N-guests, never "the guest").
## 2. Responsibilities (and explicit non-responsibilities)
Owns:
1. **Proxmox lifecycle** — create/start/stop/destroy guests, snapshots, storage allocation. Via a scoped Proxmox API token (the **`FelhomAgent` operator role** — `proxmox-platform.md` §3.6, validated Phase 3 B3) for everything the API covers; raw host ops only where unavoidable.
2. **Storage management** — attach/classify targets, reconcile the storage manifest, mount USB-by-UUID, present mounts into guests.
3. **Backup/restore orchestration** — vzdump to the tiers, PBS, snapshot management, and the **self-restore-test**.
4. **Host & tunnel monitoring** — host metrics, guest up/down, storage-target status, and `cloudflared` health; reports the host domain to the hub.
5. **Provisioning** — provision a guest **by restoring the golden base image** (§9), deploy the controller into it, hand it its bootstrap config; also **build and refresh the golden base image** itself.
6. **Hub control loop** — poll for desired state + signed jobs, reconcile, execute, report, heartbeat.
7. **Local API** — the per-guest authorization gate the controller calls.
8. **Self-update** — update itself (carefully — it is a host service) and update the controllers it owns.
Explicitly does **not**:
- Serve application traffic or sit in the data path. **Control plane, not data plane**: if the agent dies, apps keep serving (Docker + LXC run without it); only *management* degrades — no new backups, no provisioning, hub loses the heartbeat.
- Hold or proxy customer application data.
- Run inside a guest. It is the thing that recovers guests and the host; it cannot be one of them.
- Manage **geo-restriction / the Cloudflare API**. Geo is hub-owned: the customer sets it in the controller UI, the controller reports the geo desired-state to the hub, and the **hub** (holding the CF API token) reconciles the WAF (S4). The agent manages only the *tunnel* service (`cloudflared`, §3/§5), never WAF rules.
## 3. Process model & host integration
- **Native Go binary, systemd service** on the host: boot-start, `Restart=always`, systemd watchdog (kill+restart on hang), journald logging, resource limits.
- **Root-minimized (boundary settled — Phase 3 B3).** The agent runs as a **non-root** service user with the scoped `FelhomAgent` token for all API-covered work + a **narrow `sudoers` allowlist** for true host ops. Per Phase 3 (B3) the boundary is settled: the entire per-customer guest lifecycle — provision (by restore, §9), config, start/stop, snapshot, backup, **restore**, destroy — is token-covered. Genuine OS-root is confined to: (1) building/refreshing the **golden base image** (`keyctl` create is `root@pam`-only — one-time at enrollment + a maintenance cadence, §9); (2) **host mounts** (USB mount-by-UUID, systemd mount units / fstab); (3) **SMART / hardware sensors**. Root therefore never sits on the per-customer path. See `proxmox-platform.md` §3.6 for the role + boundary table.
- **`cloudflared` is a separate systemd service**, not embedded in the agent. This is what makes the data path survive control-plane death by construction. The agent **manages and health-watches** it (see §5) but the tunnel does not live or die with the agent process.
## 4. Control model — reconcile + signed destructive ops
Two channels, split by **reversibility**, not by transport.
**(a) Desired-state reconciliation — steady state.**
The hub holds desired state for the host: which guests should exist (and at what spec),
the storage manifest, backup/retention policies, controller image versions. The agent
runs a reconcile loop converging actual Proxmox state → desired: idempotent, self-healing,
and tolerant of missed polls (drift is corrected on the next loop). Provisioning retries,
re-attach of a flapping USB target, redeploy of a crashed controller — all fall out of
reconciliation for free.
**(b) Signed one-shot jobs — operator actions.**
Restore-now, decommission, force-backup, break-glass-enable. Discrete, run-once
(idempotency key), written to the customer-visible audit log, and **outside** the reconcile
loop — they are point-in-time and often destructive, and a reconciler must never re-run a
restore because it "sees drift." A one-shot job names a **target** ("restore guest X from
snapshot S"), not a procedure; the agent owns the *how*.
**The reversibility gate (security-critical).**
"Signed jobs resist hub compromise" only holds if the agent also distrusts hub-supplied
*desired state* for destructive changes. The gate is by **provenance + data-bearing-ness, not
by verb**:
- **The reconciler MAY act without an operator signature** when: (a) creating/starting/restarting; (b) destroying resources it created earlier **within the same journaled transaction** (compensating rollback, §10); (c) destroying resources it **tagged ephemeral/scratch** (e.g. restore-test scratch guests, §8). The ephemeral/scratch tag is **agent-internal provenance and is never accepted from the hub** — else a compromised hub could relabel a data-bearing guest as scratch to walk the gate.
- **An operator signature is always required** to destroy/overwrite any resource holding the only/primary copy of customer data — live-guest destroy, storage detach/wipe, restore-overwrite, decommission — *regardless of whether it arrives as a job or as a desired-state delta*. A compromised hub cannot forge them because the signing key is **not held by the hub** (it lives with the operator / a separate signing path; the hub only queues opaque signed blobs).
- **Healing a crashed controller is non-destructive by construction:** it is reconstructable from its image + the guest's persistent volume, so "redeploy" = restart the LXC / `docker compose up -d` **inside the existing guest** — never a guest destroy. (v0.33 precedent: `watchdog.go` restarts stopped stacks, it never destroys the guest.)
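
The gate rules above reduce to a small decision function. The sketch below is illustrative only: the type and field names (`Provenance`, `Op`, `Allowed`) are assumptions, and it presumes the signature has already been verified elsewhere (Part 4). The key property it shows is that provenance is looked up in the agent's own journal, never read from hub input.

```go
package main

import "fmt"

// Provenance records how the agent itself came to know a resource.
// It is agent-internal and is never populated from hub-supplied data.
type Provenance int

const (
	ProvUnknown      Provenance = iota // pre-existing or hub-described resource
	ProvSameTxn                        // created earlier in the current journaled transaction
	ProvAgentScratch                   // tagged ephemeral/scratch by the agent (e.g. restore-test guest)
)

// Op is a hypothetical, already-parsed operation reaching the gate,
// whether it arrived as a one-shot job or as a desired-state delta.
type Op struct {
	Destructive    bool       // destroys or overwrites a resource
	Provenance     Provenance // from the agent's journal, not from the request
	OperatorSigned bool       // a valid operator signature accompanied the op
}

// Allowed applies the reversibility gate: non-destructive ops pass,
// destructive ops pass only on agent-held provenance or an operator signature.
func Allowed(op Op) bool {
	if !op.Destructive {
		return true
	}
	if op.Provenance == ProvSameTxn || op.Provenance == ProvAgentScratch {
		return true
	}
	return op.OperatorSigned
}

func main() {
	fmt.Println(Allowed(Op{Destructive: false}))                              // create/start/restart
	fmt.Println(Allowed(Op{Destructive: true, Provenance: ProvAgentScratch})) // scratch teardown
	fmt.Println(Allowed(Op{Destructive: true, Provenance: ProvUnknown}))      // live guest, unsigned
	fmt.Println(Allowed(Op{Destructive: true, OperatorSigned: true}))         // live guest, signed
}
```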
Signed payloads carry a **nonce + expiry** (anti-replay: a captured "restore" job cannot be
re-injected later) and a target binding (host + guest id) so a signature can't be retargeted.
Notification-on-destructive-op is an **audit signal, never the guard** — a compromised hub
could both issue and suppress the notice, which is exactly why the *signature* (not the
notification) is the control.
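
The nonce, expiry, and target-binding checks can be sketched as follows. This assumes the signature over the claims has already been verified (the mechanism is Part 4's subject) and shows only the anti-replay step. `JobClaims`, `ReplayGuard`, and the Unix-seconds expiry encoding are assumptions made for illustration, not the wire format.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	ErrExpired     = errors.New("job expired")
	ErrWrongTarget = errors.New("job bound to a different host")
	ErrReplayed    = errors.New("nonce already used")
)

// JobClaims is a hypothetical view of the signed payload's fields.
// All of them sit inside the signed bytes, so none can be altered in transit.
type JobClaims struct {
	Nonce       string
	ExpiresUnix int64 // expiry as Unix seconds (assumed encoding)
	Host        string
	GuestID     string
	Action      string
}

// ReplayGuard remembers nonces until they expire. A real agent would persist
// this set so a restart does not reopen the replay window.
type ReplayGuard struct {
	mu   sync.Mutex
	seen map[string]int64 // nonce -> expiry
}

func NewReplayGuard() *ReplayGuard {
	return &ReplayGuard{seen: make(map[string]int64)}
}

// Accept returns nil exactly once per unexpired job addressed to thisHost.
func (g *ReplayGuard) Accept(c JobClaims, thisHost string, nowUnix int64) error {
	if nowUnix >= c.ExpiresUnix {
		return ErrExpired
	}
	if c.Host != thisHost {
		return ErrWrongTarget
	}
	g.mu.Lock()
	defer g.mu.Unlock()
	// Expired nonces can be dropped: the expiry check above already rejects them.
	for n, exp := range g.seen {
		if nowUnix >= exp {
			delete(g.seen, n)
		}
	}
	if _, dup := g.seen[c.Nonce]; dup {
		return ErrReplayed
	}
	g.seen[c.Nonce] = c.ExpiresUnix
	return nil
}

func main() {
	g := NewReplayGuard()
	job := JobClaims{Nonce: "n-1", ExpiresUnix: 1000, Host: "host-a", GuestID: "101", Action: "restore"}
	fmt.Println(g.Accept(job, "host-a", 900)) // first delivery
	fmt.Println(g.Accept(job, "host-a", 950)) // captured and re-injected
	fmt.Println(g.Accept(job, "host-b", 950)) // retargeted at another host
}
```

Because the nonce is only remembered until expiry, the set stays bounded by the number of jobs issued within one validity window.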
## 5. Hub ↔ agent protocol (host domain)
**Box-initiated poll.** The hub never connects inbound. Each poll cycle exchanges:
- **Up:** heartbeat + a host-domain state report — host CPU/RAM/disk, per-guest up/down + spec, storage-target status (USB connected? NFS/CIFS reachable? PBS reachable?), last backup per target, last restore-test result, `cloudflared` health, agent + controller versions, audit-log tail.
- **Down:** the current desired state, any pending signed one-shot jobs, and config (poll interval, update window, policy changes).
**Dead-man's-switch (essential, not optional).** In a box-initiated model the heartbeat
*is* the liveness signal — a box that stops checking in is otherwise invisible. The hub
alerts the operator when an agent misses its expected check-in window. This is the worst
failure mode for a managed service, so it gets first-class treatment hub-side.
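
A hub-side dead-man's-switch check might look like the sketch below. The `Monitor` type, the ten-minute `demoWindow`, and the `at` helper are hypothetical and exist only to make the example deterministic. The hub's real checker and its window are not specified here.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Monitor tracks the last check-in per host. This sketch is not safe for
// concurrent use; a real hub would guard the map or keep it in its store.
type Monitor struct {
	window   time.Duration
	lastSeen map[string]time.Time
}

func NewMonitor(window time.Duration) *Monitor {
	return &Monitor{window: window, lastSeen: make(map[string]time.Time)}
}

// Heartbeat records that host checked in at the given time.
func (m *Monitor) Heartbeat(host string, at time.Time) {
	m.lastSeen[host] = at
}

// Overdue returns, sorted, every host whose last check-in is older than the window.
func (m *Monitor) Overdue(now time.Time) []string {
	var out []string
	for host, seen := range m.lastSeen {
		if now.Sub(seen) > m.window {
			out = append(out, host)
		}
	}
	sort.Strings(out)
	return out
}

// demoWindow and at are illustration-only helpers.
const demoWindow = 10 * time.Minute

func at(min int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(min) * time.Minute)
}

func main() {
	m := NewMonitor(demoWindow)
	m.Heartbeat("host-a", at(0))
	m.Heartbeat("host-b", at(8))
	// At minute 12, host-a has been silent for 12 minutes and host-b for 4.
	fmt.Println(m.Overdue(at(12)))
}
```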
**Break-glass.** Standing inbound control is off. But when the poll loop *itself* is wedged
(agent hung, host sick) you cannot fix it through the poll loop. So there is an explicit,
**off-by-default, customer-consented, fully-audited** emergency path: SSH to the host via
the Cloudflare Tunnel behind Cloudflare Access (or on-site). Enabling it is itself a signed,
logged operation; it auto-expires.
## 6. Agent ↔ controller local API
The controller (in its LXC) reaches the agent (on the host) over the local bridge.
- **Transport:** HTTPS to the host's bridge IP on a fixed port.
- **Auth:** a per-guest local token, minted by the agent when it deploys the controller and written into the guest's bootstrap config. The agent maps token → guest and **authorizes per guest**: a controller can only act on *its own* guest. This is the agent acting as the per-guest authorization gate from Part 1.
- **Surface (minimal, all scoped to the caller's own guest):**
- `GET /storage` — mounts available to this guest and their **class** (fast/slow), so the controller can place hot vs bulk volumes per `.felhom.yml`. (The agent owns the actual mounts; the controller just binds to the paths it's given.)
- `POST /snapshot` — snapshot *this* guest (the snapshot-before-deploy primitive).
- `POST /rollback` — roll *this* guest back to a named snapshot (post-deploy failure recovery).
- `POST /backup` — request a backup-now of *this* guest (enqueued; non-destructive).
- `GET /backup/due` — whether a policy-scheduled backup is due for *this* guest, so the controller can quiesce then call `POST /backup` (the app-consistent path, §8).
- `GET /backup/status`, `GET /restore-test/status` — read-only status for the controller's UI.
Note what is *absent*: nothing here lets a controller touch another guest, the host, storage
attachment, or restore-overwrite. Destructive/cross-guest power stays operator-signed (§4).
A controller can only `POST /rollback` (or snapshot/backup) **its own** guest — the agent maps
token → guest and authorizes per guest, so a compromised controller's blast radius is
**self-scoped and bounded** to its own guest.
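
One way to get the self-scoping property is for requests to never name a guest at all: the guest is derived solely from the presented token. The sketch below illustrates that with the standard library. `Gate`, the token values, and the `202` response are assumptions for the example, not the agent's defined behaviour.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Gate maps per-guest local tokens to guest ids (hypothetical structure).
type Gate struct{ tokens map[string]string }

func NewGate(tokens map[string]string) *Gate { return &Gate{tokens: tokens} }

// guestFor resolves the caller's guest from its bearer token.
// The request carries no guest id, so a caller cannot name another guest.
func (g *Gate) guestFor(r *http.Request) (string, bool) {
	const prefix = "Bearer "
	h := r.Header.Get("Authorization")
	if !strings.HasPrefix(h, prefix) {
		return "", false
	}
	presented := []byte(strings.TrimPrefix(h, prefix))
	for tok, guest := range g.tokens {
		if subtle.ConstantTimeCompare(presented, []byte(tok)) == 1 {
			return guest, true
		}
	}
	return "", false
}

// Handler exposes only POST /snapshot to keep the sketch short.
func (g *Gate) Handler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/snapshot", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		guest, ok := g.guestFor(r)
		if !ok {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusAccepted)
		fmt.Fprintf(w, "snapshot queued for %s", guest)
	})
	return mux
}

// call issues an in-memory request against the handler.
func call(h http.Handler, token string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/snapshot", nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	h := NewGate(map[string]string{"tok-a": "guest-101"}).Handler()
	fmt.Println(call(h, "tok-a"))
	code, _ := call(h, "stolen-or-wrong")
	fmt.Println(code)
}
```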
## 7. Storage manifest & reconciliation
The manifest is the load-bearing contract. It absorbs the **persisted** disk-state fields that
`settings.StoragePath` carries today **and adds** `durable_id`/UUID — today the controller
re-derives the UUID from fstab each boot (Part 2 / Phase-3), so persisting it is an
improvement. Held in the hub, reconciled by the agent.
Per target:
| field | meaning |
|---|---|
| `type` | `local-dir` / `usb` / `nfs` / `cifs` / `pbs` |
| `durable_id` | UUID (USB), `server:export` (NFS/CIFS), `repo+fingerprint` (PBS) — survives box loss |
| `class` | `fast` or `slow`, set **once at attach**, with an IOPS marker; no runtime speed-test |
| `role` | `primary` / `vzdump-target` / `pbs-offsite` / `bulk-data` |
| `creds` | encrypted (NFS/CIFS/PBS); USB has none |
| `policy` | schedule + retention for this target |
| `state` | `attached` / `disconnected` / `decommissioned` |
Reconciliation: ensure each `attached` target is mounted (USB-by-UUID via the sudoers
allowlist), each Proxmox storage entry matches, and `disconnected` targets are surfaced to
the hub (the storage watchdog — detect a USB drop in seconds, not at the next health cycle).
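
As a rough illustration of the manifest and one reconcile pass, the sketch below models a target with a subset of the table's fields and plans actions from observed state. The Go field names, the `Observed` struct, and the action strings are assumptions, and credentials and policy are omitted.

```go
package main

import "fmt"

// Target mirrors a subset of the per-target manifest fields.
type Target struct {
	Type      string // local-dir, usb, nfs, cifs, pbs
	DurableID string // UUID, server:export, or repo+fingerprint
	Class     string // fast or slow
	Role      string // primary, vzdump-target, pbs-offsite, bulk-data
	State     string // attached, disconnected, decommissioned
}

// Observed is what the agent currently sees on the host (hypothetical).
type Observed struct {
	Present bool // device or remote endpoint is reachable
	Mounted bool
}

// Action is one planned step, keyed by the target's durable id.
type Action struct {
	Kind      string // "mount" or "report-disconnected"
	DurableID string
}

// Plan compares the desired manifest with observed state. It only plans
// non-destructive steps; detach or wipe stays behind the operator signature.
func Plan(desired []Target, observed map[string]Observed) []Action {
	var plan []Action
	for _, t := range desired {
		if t.State != "attached" {
			continue
		}
		o := observed[t.DurableID] // zero value: not present, not mounted
		switch {
		case !o.Present:
			plan = append(plan, Action{"report-disconnected", t.DurableID})
		case !o.Mounted:
			plan = append(plan, Action{"mount", t.DurableID})
		}
	}
	return plan
}

func main() {
	desired := []Target{
		{Type: "usb", DurableID: "uuid-1", Class: "slow", Role: "vzdump-target", State: "attached"},
		{Type: "nfs", DurableID: "nas:/export", Class: "slow", Role: "bulk-data", State: "attached"},
		{Type: "pbs", DurableID: "repo+fp", Class: "slow", Role: "pbs-offsite", State: "attached"},
	}
	observed := map[string]Observed{
		"uuid-1":  {Present: true, Mounted: false},
		"repo+fp": {Present: true, Mounted: true},
	}
	for _, a := range Plan(desired, observed) {
		fmt.Println(a.Kind, a.DurableID)
	}
}
```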
**Placement is per-volume, not per-app.** Hot volumes (DB/config) → a `fast` target,
**enforced**; bulk volumes (media) → may live on `slow`, declared in `.felhom.yml`.
A `bulk` volume **MUST** be realized as a `backup=0` **volume mount point** (or an external
bind mount) — **never** a Docker named volume in rootfs, which `vzdump` always captures
(verified, `phase3-findings.md` B2). Proven recipe: attach
`-mpN <storage>:<size>,mp=/mnt/bulk,backup=0`, then
`docker volume create --driver local -o type=none -o o=bind -o device=/mnt/bulk <vol>` (or a
compose bind). The per-volume placement component (Part 2 §5(2)) enforces this at deploy. The
**DR consequence** of excluding bulk is covered in §8.
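
The deploy-time enforcement could be expressed as a validation step like the one below. It is a sketch under assumed names: `Volume`, its `Backing` values, and `InBackup` are illustrative, since the `.felhom.yml` schema for hot/bulk is not yet defined here.

```go
package main

import "fmt"

// Volume is a hypothetical, already-parsed volume declaration plus
// the placement the controller intends to use for it.
type Volume struct {
	Name        string
	Tier        string // "hot" or "bulk"
	Backing     string // "rootfs-named", "mountpoint", or "bind"
	TargetClass string // class of the chosen storage target: "fast" or "slow"
	InBackup    bool   // true if the mount point is included in vzdump (backup=1)
}

// Validate enforces the placement rules: hot volumes need a fast target,
// bulk volumes must not live in rootfs and must be excluded from vzdump.
func Validate(v Volume) error {
	switch v.Tier {
	case "hot":
		if v.TargetClass != "fast" {
			return fmt.Errorf("volume %q: hot volumes require a fast target", v.Name)
		}
	case "bulk":
		if v.Backing == "rootfs-named" {
			return fmt.Errorf("volume %q: bulk must not be a Docker named volume in rootfs", v.Name)
		}
		if v.Backing == "mountpoint" && v.InBackup {
			return fmt.Errorf("volume %q: bulk mount point must be attached with backup=0", v.Name)
		}
	default:
		return fmt.Errorf("volume %q: unknown tier %q", v.Name, v.Tier)
	}
	return nil
}

func main() {
	vols := []Volume{
		{Name: "db", Tier: "hot", Backing: "rootfs-named", TargetClass: "fast"},
		{Name: "media", Tier: "bulk", Backing: "mountpoint", TargetClass: "slow", InBackup: false},
		{Name: "media-bad", Tier: "bulk", Backing: "rootfs-named", TargetClass: "slow"},
	}
	for _, v := range vols {
		fmt.Println(v.Name, Validate(v))
	}
}
```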
**Field re-homing (from `settings.StoragePath`, Part 2):** `Label` → manifest (canonical);
`IsDefault`/`Schedulable` → manifest `policy`; `MigratedTo` + decommission → manifest `state`;
`StoppedStacks` → the **controller's `settings`** (app-domain: which apps to restart on
reconnect, not a host concern).
## 8. Backup/restore orchestration
Tiers double as backup *and* restore-source priority (fastest surviving source first),
per Part 1: **snapshot** (LVM-thin, transient, whole-guest rollback — not a backup) →
**local second storage** (vzdump to dir/NFS/CIFS) → **PBS offsite** (the DR substrate).
- **Quiescing (controller-driven for app-consistency):** an LXC has no fsfreeze
(`proxmox-platform.md` §4.2), so app-consistency is the controller's job: it learns a backup
is due (`GET /backup/due`, §6, or via its hub channel) → **quiesces** the app stack →
`POST /backup` → polls `GET /backup/status` → unquiesces. **An agent-initiated vzdump is
crash-consistent only** (there is no inbound-to-guest channel to trigger a quiesce — §3/§5).
Every Proxmox op is async → the agent polls `task exitstatus`, never trusts the POST return.
- **Bulk volumes have no DR coverage from the guest vzdump** — they are excluded (§7). Every
`bulk` volume needs an explicit own-backup decision: its own backup target per the manifest
`policy`, **or deliberately none** when the data is re-downloadable (customer informed). On
host-loss, un-backed-up bulk is gone; a **bind-mounted** bulk volume re-attaches only on the
*same* host, so cross-host DR needs the separate backup. A deliberate per-volume choice,
never a silent loss.
- **Key custody (PBS):** the **live** PBS key sits on the box so the agent can both back up
*and* run restore-tests. The hub holds only the **recovery-code-wrapped escrow** copy it
cannot open (zero-knowledge default). So: the box can restore-test; the operator cannot
read the data; the customer's offsite recovery code is the irreducible residual.
- **Self-restore-test:** this closes the "tested restore is the critical gap" theme. The
agent periodically restores a backup into a **throwaway scratch guest**, boots it, runs
health checks, reports pass/fail, and tears it down. Zero-knowledge backups can *only* be
restore-tested by the box (the operator lacks the key) — so this lives in the agent by
necessity, not just convenience. Integrity-verify (cheap, ciphertext-level) runs more often
as the lighter check.
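
The controller-driven quiescing sequence from the first bullet (due → quiesce → `POST /backup` → poll status → unquiesce) can be sketched as below. The `Agent` and `Stack` interfaces, the status strings, and the choice to unquiesce even on timeout are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Agent is a hypothetical client for the local-API backup endpoints.
type Agent interface {
	BackupDue() (bool, error)      // GET /backup/due
	RequestBackup() error          // POST /backup
	BackupStatus() (string, error) // GET /backup/status: "running", "ok", or "failed"
}

// Stack is whatever the controller uses to pause and resume the apps.
type Stack interface {
	Quiesce() error
	Unquiesce() error
}

// BackupIfDue runs one app-consistent backup cycle. The stack is always
// unquiesced on the way out, so a failed backup never leaves the apps paused.
func BackupIfDue(a Agent, s Stack, maxPolls int, wait func()) (ran bool, err error) {
	due, err := a.BackupDue()
	if err != nil || !due {
		return false, err
	}
	if qerr := s.Quiesce(); qerr != nil {
		return false, qerr
	}
	defer func() {
		if uerr := s.Unquiesce(); uerr != nil && err == nil {
			err = uerr
		}
	}()
	if berr := a.RequestBackup(); berr != nil {
		return false, berr
	}
	for i := 0; i < maxPolls; i++ {
		st, serr := a.BackupStatus()
		if serr != nil {
			return false, serr
		}
		switch st {
		case "ok":
			return true, nil
		case "failed":
			return false, errors.New("backup failed")
		}
		wait()
	}
	return false, errors.New("backup status poll timed out")
}

// fake implements both interfaces and records the call order.
type fake struct {
	calls    []string
	statuses []string
}

func (f *fake) BackupDue() (bool, error) { f.calls = append(f.calls, "due"); return true, nil }
func (f *fake) RequestBackup() error     { f.calls = append(f.calls, "backup"); return nil }
func (f *fake) Quiesce() error           { f.calls = append(f.calls, "quiesce"); return nil }
func (f *fake) Unquiesce() error         { f.calls = append(f.calls, "unquiesce"); return nil }
func (f *fake) BackupStatus() (string, error) {
	f.calls = append(f.calls, "status")
	st := f.statuses[0]
	if len(f.statuses) > 1 {
		f.statuses = f.statuses[1:]
	}
	return st, nil
}

func main() {
	f := &fake{statuses: []string{"running", "ok"}}
	ran, err := BackupIfDue(f, f, 5, func() {})
	fmt.Println(ran, err, f.calls)
}
```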
## 9. Provisioning & DR flows
**Provisioning (reconcile-driven, by restore).** Fresh creation of a Docker-capable LXC needs
the `keyctl=1` feature flag, which Proxmox permits only for `root@pam` (Phase 3, B3) — not the
scoped token. But a token-authorized **restore preserves `keyctl`** (Phase 3, B3), so the agent
provisions **by restoring a golden base image**, never by `pct create` on the per-customer path:
- A **golden base archive** — minimal Debian + Docker, `nesting=1,keyctl=1`, overlayfs — is
built once as `root@pam` **at enrollment** (when the agent legitimately holds root to mint its
Proxmox token) and refreshed on a maintenance cadence. This is the one place `keyctl`/root
provisioning lives — off the per-customer path.
- To provision guest G: restore the golden archive → new VMID (token-covered: `VM.Allocate` +
`Datastore.AllocateSpace`; `keyctl` preserved) → reset identity (MAC/hostname) → size the guest
(CPU/mem config + `pct resize` rootfs, token-covered) → attach storage mounts per the manifest
→ deploy the controller → hand it bootstrap config. A mid-flight failure is journaled and
compensating-rolled-back (destroy the just-restored guest — allowed without a signature per §4,
same-transaction provenance).
**Unified bring-up primitive.** Provisioning and DR-restore share the same token-covered front
half — *restore an archive → reset identity* — and differ only in the archive and the back half:
provisioning restores the **golden base** then deploys a fresh controller; DR-restore restores
the **customer's backup** (already containing controller + data), brings it up, and reattaches
external storage. One code path, exercised by every restore-test (§8).
**Guest loss.** Agent restores G from the fastest surviving tier and resets identity
(MAC/hostname) so the restored guest rejoins cleanly — this *is* the unified restore primitive
above (customer-backup archive, DR back half).
**Host/hardware loss.** Re-enroll the new host in **restore mode**; the hub — the durable
source of truth that survives box death — hands the new agent the existing identity, PBS
namespace, tunnel token, storage manifest, and a restore directive. Tunnel is reused from
the hub record, so DNS stays intact.
## 10. Concurrency, crash-safety, idempotency
- **Per-guest serialization.** Reconcile, one-shot jobs, and local-API calls all feed a
work queue that serializes mutations **per guest** (Proxmox dislikes concurrent conflicting
ops on the same guest). Independent guests proceed in parallel.
- **Operation journaling.** Multi-step async ops (provision, restore, controller-update, agent
self-update) are journaled with their in-flight Proxmox task ids. On agent restart, the
journal is replayed: resume-or-rollback, so a crash mid-restore never leaves a corrupt or
half-built guest.
- **Idempotency keys** on one-shot jobs (run-once across retries and restarts).
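
Per-guest serialization with cross-guest parallelism can be approximated with one lock per guest, as in the sketch below. `GuestQueue` and the `maxInFlight` measurement helper are hypothetical. A real agent would more likely use a journaled queue than bare mutexes, but the ordering guarantee is the same.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// GuestQueue serializes mutations per guest while letting
// different guests proceed in parallel.
type GuestQueue struct {
	mu    sync.Mutex
	locks map[int]*sync.Mutex // VMID -> per-guest lock
}

func NewGuestQueue() *GuestQueue {
	return &GuestQueue{locks: make(map[int]*sync.Mutex)}
}

// Do runs fn while holding the lock for vmid.
func (q *GuestQueue) Do(vmid int, fn func()) {
	q.mu.Lock()
	l, ok := q.locks[vmid]
	if !ok {
		l = &sync.Mutex{}
		q.locks[vmid] = l
	}
	q.mu.Unlock()

	l.Lock()
	defer l.Unlock()
	fn()
}

// maxInFlight starts n concurrent operations on one guest and returns the
// highest number observed running at the same time (demo helper).
func maxInFlight(q *GuestQueue, vmid, n int) int32 {
	var cur, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			q.Do(vmid, func() {
				c := atomic.AddInt32(&cur, 1)
				for {
					p := atomic.LoadInt32(&peak)
					if c <= p || atomic.CompareAndSwapInt32(&peak, p, c) {
						break
					}
				}
				atomic.AddInt32(&cur, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	q := NewGuestQueue()
	fmt.Println("peak concurrent ops on guest 101:", maxInFlight(q, 101, 50))
}
```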
## 11. Self-update
- **Agent (the hard case — a host service, no snapshot-rollback).** **A/B layout:** download →
verify signature → stage as the inactive slot → flip a `current → good|new` symlink → restart.
**Revert authority lives outside the swapped binary**: `Restart=always` alone just
crash-loops a bad binary — so a **separate health-gate** (a systemd oneshot `ExecStartPost`
probe, or a tiny supervisor unit) flips `current` back to last-good and restarts on a failed
health window. The new version is **committed as "good" only after a clean health window**.
Triggered by a hub signed job within the update window; manual always allowed. Journaled (§10).
- **Controller (the easy case — it's a guest).** The agent owns the controller's lifecycle,
so the **agent updates the controller**: snapshot-before-update (free rollback, because the
controller *is* a snapshottable guest) → pull new image → redeploy → health-check → rollback
on failure. This resolves the Part-2 `selfupdate/` open: the controller is **agent-managed**,
not self-updating; the controller's old self-update path is removed.
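
For the agent's A/B layout, the flip and the external revert reduce to atomically repointing a `current` symlink. The sketch below shows the logic a separate health-gate might run. The slot names, directory layout, and the `healthy` callback are assumptions, and it presumes a POSIX filesystem where renaming over an existing symlink is atomic.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pointCurrent atomically repoints dir/current at slot by creating a
// temporary symlink and renaming it over the old one.
func pointCurrent(dir, slot string) error {
	tmp := filepath.Join(dir, "current.tmp")
	_ = os.Remove(tmp) // clear any leftover from an interrupted flip
	if err := os.Symlink(slot, tmp); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(dir, "current"))
}

// Update flips to newSlot, then reverts to the previous slot if the health
// window fails. It returns the slot that ends up active. This logic is meant
// to live outside the binary being swapped.
func Update(dir, newSlot string, healthy func() bool) (string, error) {
	lastGood, err := os.Readlink(filepath.Join(dir, "current"))
	if err != nil {
		return "", err
	}
	if err := pointCurrent(dir, newSlot); err != nil {
		return lastGood, err
	}
	if healthy() {
		return newSlot, nil // committed as good only after a clean window
	}
	if err := pointCurrent(dir, lastGood); err != nil {
		return newSlot, err
	}
	return lastGood, nil
}

// newLayout creates a scratch directory with current -> slot-a (demo helper).
func newLayout() (string, error) {
	dir, err := os.MkdirTemp("", "agent-ab-")
	if err != nil {
		return "", err
	}
	return dir, pointCurrent(dir, "slot-a")
}

func main() {
	dir, err := newLayout()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	active, err := Update(dir, "slot-b", func() bool { return false })
	fmt.Println(active, err) // the bad update is reverted
	active, err = Update(dir, "slot-b", func() bool { return true })
	fmt.Println(active, err) // the good update is committed
}
```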
## 12. Secrets at rest on the host
The agent holds, root-only on the host fs: the scoped Proxmox token, the hub API key, the
operator's **public** verify key (for §4 signatures — public, low-risk), the Cloudflare
tunnel token, encrypted storage creds (NFS/CIFS/PBS), and the **live PBS key**. The privilege
and the secret footprint that left the controller now concentrate here — which is the whole
argument for §3's root-minimization and a small, auditable agent.
## 13. Open items / what this unblocks
Resolved here: tunnel placement (host, agent-managed, own systemd service), the
reconcile-vs-jobs fork (hybrid, gated by reversibility), agent process model, self-update
ownership, the local-API surface, the storage-manifest schema, **provision-by-restore**, and
the **root-vs-API boundary** (Phase 3, B3).
Still open:
- Multi-tenant **resource fairness** on a shared host (per-guest cgroup limits, noisy-neighbor) — deferred to the company-case pass.
- Operator-side **signing tooling** — where the operator signing key lives operationally and how a destructive op gets signed without undue friction (offline key vs. a small signing service; the security floor is "not in the hub").
- Hub-side **desired-state editing UX** and the host-domain report schema details — belong to the hub architecture doc.
- **Golden base image** refresh cadence + fleet versioning — who triggers a rebuild, how the per-host image version is tracked (operational detail, not blocking; §9).
This doc hands the implementation three contracts it was waiting on:
1. the **local-API surface** (§6) → the controller's NEW local-API client, snapshot-before-deploy, and self-restore-test wiring (Part 2);
2. the **storage-manifest schema** (§7) → the `settings.StoragePath` reshape and per-volume hot/bulk placement (Part 2);
3. the **backup contract** (§7/§8) → the destination for the app-data-backup package extracted in the Part-2 refactor.
---
## Changelog — design-review + Phase-3 fold-in (2026-06-08)
- **NEW provision-by-restore** (§9): the agent provisions by **restoring a golden base image**
(token-covered, preserves `keyctl`), never `pct create` on the per-customer path; one unified
restore primitive shared with DR. §2 responsibility + §3 boundary updated.
- **B3** (§2/§3): replaced "Phase-1 minimal role" with the validated **`FelhomAgent`** operator
role; root-vs-API boundary **settled** (root only for golden-image build, host mounts, SMART).
- **B1** (§4): reversibility gate rewritten as **provenance + data-bearing** (scratch tag is
agent-internal, never hub-supplied; crashed-controller heal is non-destructive in-place).
- **B2** (§7/§8): validated bulk-as-`backup=0`-mountpoint recipe + the **bulk-DR consequence**
(excluded bulk needs its own backup decision).
- **S1** (§6/§8): `GET /backup/due` added; controller-driven quiescing; agent vzdump is
crash-consistent only. **S2** (§10/§11): A/B self-update with external revert authority;
controller-update + agent self-update journaled. **S3** (§7): `StoragePath` field re-homing.
**S4:** geo non-responsibility added (§2). **M2** (§7): manifest "absorbs + adds durable_id".
**§6:** rollback is self-scoped/bounded. **§13:** golden-image refresh cadence added as open.
@@ -0,0 +1,154 @@
# Architecture Part 4 — Control-plane authorization (operator signing)
> Status: design draft (decision content), grounded on `docs/tests/phase4-signing-findings.md`.
> To be reviewed by Claude Code against that spike + `03` §4, then placed at
> `docs/architecture/04-control-plane-authorization.md`.
>
> Builds on Part 1 (enrollment / trust), Part 3 (the agent verifies + the §4 reversibility gate).
> This doc defines the **mechanism** behind `03` §4's "an operator signature the hub can't forge."
## 1. Purpose & scope
`03` §4 gates **destructive/irreversible** operations behind an operator signature the hub cannot
forge. That gate is only real if signing is real. This doc defines the signing mechanism: the
primitive, the keys, rotation, the three components' roles, and the operator workflow. The
*policy* (what needs a signature) lives in `03` §4; this is the *how*.
**Recap of what needs a signature** (from `03` §4, by reversibility, not by verb): destroying or
overwriting any resource holding the only/primary copy of customer data — live-guest destroy,
storage detach/wipe, restore-overwrite, decommission — **regardless of whether it arrives as a job
or a desired-state delta**. Benign convergence (deploy a guest, attach storage, restore to a *new*
guest, bump a version) runs on normal hub auth, unsigned. Most recovery is therefore unsigned;
signed ops are rare and deliberate.
## 2. Primitive — SSH signatures (SSHSIG)
Confirmed by Phase 4: destructive ops carry an **SSH signature** (`ssh-keygen -Y sign`, the armored
`SSHSIG` format), verified by the agent in Go (`golang.org/x/crypto/ssh`): `pem.Decode` →
`ssh.Unmarshal` → `ssh.ParsePublicKey` → `pub.Verify`. ~40 lines of framing, no hand-rolled crypto.
**Why SSHSIG and not raw Ed25519 / minisign:** SSHSIG verification dispatches on the key type
embedded in the signature, so the **same verifier accepts a software key (`ssh-ed25519`) today and
a FIDO2 hardware key (`sk-ssh-ed25519@openssh.com`) later** — which is exactly the hardware-ready
foundation we want (§7). A raw-Ed25519 verifier cannot consume an sk signature (flags+counter,
different signed-data), so it would force a verifier change on every box at hardware-adoption time.
SSHSIG buys key-type-agnosticism for a one-file framing cost (Phase 4 §5-6).
### 2.1 The signed object — canonical op blob
The signature covers an op blob (Phase 4 §2):
```
{ op, target:{host_id, guest_id}, params, nonce, issued_at, expires_at, key_id }
```
- **Canonical form is a *signer-side* requirement** — JSON, keys sorted at every level, no
insignificant whitespace, UTF-8 — so the blob is deterministic and human-auditable. The
**verifier trusts the exact bytes it receives** (it verifies the signature over the raw bytes and
parses those same bytes for fields), so there is no canonicalization-mismatch risk on the verify
side. The canonical form is the shared contract between the operator CLI and the agent (both Go).
- `nonce` ≥128-bit random; `issued_at`/`expires_at` a short window (minutes); `key_id` identifies
the signing key (rotation/audit).
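
A minimal sketch of the signer-side canonicalization, assuming Go's `encoding/json` is used on the operator CLI. The helper name `canonicalJSON` and the sample field values are hypothetical; the real blob layout is whatever Phase 4 §2 fixes. It relies on `encoding/json` emitting map keys in sorted order and compact output.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// canonicalJSON (hypothetical helper) re-encodes a JSON document with object
// keys sorted at every level and no insignificant whitespace.
func canonicalJSON(raw []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber() // keep numbers verbatim instead of round-tripping via float64
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // leave <, >, & literal so the blob stays human-auditable
	// Objects were decoded into map[string]any; Marshal writes map keys sorted.
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode appends a newline; the canonical form has none.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	// Illustrative values only; not a real op.
	in := []byte(`{ "op": "guest_destroy", "target": {"host_id": "h1", "guest_id": "g1"}, "nonce": "abc" }`)
	out, err := canonicalJSON(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Only the signer needs this step. The verifier checks the signature over the bytes it received and parses those same bytes, so it never re-canonicalizes.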
### 2.2 Domain separation — the namespace
The SSHSIG **namespace** `felhom-op-v1` is a **fixed constant in the verifier**, never
caller-supplied. A signature minted for any other namespace must not verify (proven). This stops a
signature made for one purpose being reused for another.
### 2.3 Verify pipeline (order is load-bearing)
`namespace → allow-list → crypto verify → target binding → time window → nonce`. The **nonce is
recorded last**, only after everything else passes, so an invalid signature can never consume a
nonce (DoS-safe). Each layer is mandatory and was proven to reject independently (Phase 4 §3-4):
- **target binding** — `target.host_id`/`guest_id` must equal *this* box/guest (a signature for box
A cannot be replayed at box B);
- **time window** — `now ∈ [issued_at, expires_at]`;
- **nonce** — unseen within the window (the nonce store **must be persistent across agent restarts**
and expiry-pruned; a non-persistent store reopens the replay window after a restart).
The Phase-4 reference verifier (`VerifySignedOp`) is the seed of the agent's implementation.
## 3. The keys — two-key model, software now
Both software (SSH-format) keys today; both are also valid FIDO2-resident keys later with no box
change (§7).
- **Operational signing key** — the "master stamp" for destructive ops. A **dedicated** key (NOT
the operator's daily SSH login key), passphrase-protected, on the operator workstation. Used only
for destructive ops — rare, so its exposure is low.
- **Cold recovery key** — generated once, kept **offline** (password manager / a USB held back /
printed). Never used for ordinary ops; its sole power is to authorize rotating the operational key
if that key is lost or compromised.
Both **public** keys are pinned onto the agent at enrollment (the allowed-signers set). The
operational key is authorized for ops; the recovery key is authorized **only** for key-rotation
instructions.
**Allowed-signers is a set** → single signer today; **quorum (N-of-M) for the highest-blast ops is
just set sizing + a threshold policy**, addable later without a redesign (Phase 4 §8). Out of scope
now.
## 4. Rotation & compromise recovery
The agents pin the operator public keys. The danger: rotation must **not** flow as plain hub config,
or a compromised hub re-pins its own key and forges everything. So **every re-pin is itself a signed
op the agent verifies** (same pipeline, §2.3) — never unauthenticated config.
- **Planned rotation:** the *current* operational key signs a "new operational public key = X" op;
the agent accepts it because it's signed by the trusted current key (key-signs-key).
- **Operational key lost/compromised:** the **cold recovery key** signs the re-pin; the agent accepts
it because the recovery key is pinned and authorized for rotation. The compromised key is removed
from the allowed set in the same signed op.
- **Both keys gone:** on-site physical re-enrollment (last resort — re-establishes the trust root the
way initial enrollment did).
## 5. Component roles
- **Operator tooling (the workstation).** A signing CLI behind a thin **`Signer` interface**
(`Sign(blob) → signature`). The backend today is a **file key**; a **FIDO2/PIV** backend drops in
later (§7) with no change to the blob format, the hub, or the agent. Holds the operational private
key (passphrase-protected); can reach the cold recovery key when rotation is needed.
- **Hub.** Queues the **opaque** signed blobs and surfaces pending destructive ops + their signature
status in the operator UI. Holds **no** private key and cannot sign — a compromised hub can only
queue blobs the agent rejects. (Matches `03` §4 / box-initiated poll.)
- **Agent (each box).** Pins the allowed-signers set (operational + recovery) at enrollment; runs the
verify pipeline (§2.3) on any destructive op before executing; writes every signed op to the
customer-visible **audit log**. Notification-on-destructive-op is an audit signal, never the guard
(a compromised hub could issue *and* suppress notice — the signature is the control).
- **Enrollment.** Pins the initial operational + recovery public keys onto the agent during the
physical-presence provisioning step (the trust root is established on-site, not via the hub).
## 6. Operator workflow
- **Routine work** (deploy, monitor, attach storage, restore to a *new* guest): no signing, zero
overhead.
- **A destructive op** (rare): the operator runs the signing CLI on their workstation — which builds
the canonical blob, signs it (passphrase, or later a hardware touch), and posts it to the hub
queue — then the agent polls, verifies, executes, and audits. One command + passphrase, from the
desk. **Never** a site visit.
## 7. Hardware readiness (Viktor's "build the foundation now")
Software `ssh-ed25519` now; a FIDO2 `sk-ssh-ed25519@openssh.com` key later is a **no-op on the
boxes** — proven end-to-end against the OpenSSH spec in Phase 4 §5 (the unchanged verifier accepts a
spec-faithful sk signature). At hardware adoption the operator generates an sk-key, points the
`Signer` backend at it, and updates the allowed-signers entry; nothing on the boxes changes.
Two honest notes:
- **Confirm with a real device at adoption.** Phase 4 §5 was validated to spec, not against live
hardware; a 5-minute real-key round-trip should confirm it (no surprise expected, since the signer,
library, and device all follow the same spec).
- **Optional future hardening:** require the FIDO2 **user-presence (touch) flag**. The verifier is
crypto-only today (correct for software keys); enforcing the flag is a small later option once
hardware is in use.
## 8. Open items
- **Quorum policy** (N-of-M per op-class, e.g. two signatures for decommission) — deferred; the
allowed-signers-set foundation supports it.
- **Signing-key passphrase UX** on the workstation (ssh-agent / askpass) — minor operator-tooling
detail.
- **Hub-side pending-op UI** (showing ops awaiting signature + audit) — belongs to the hub doc.
## 9. What this unblocks
Closes the `03` §4 "undesigned signing path." Hands the implementation: the **canonical blob spec**
(§2.1) + the **`VerifySignedOp` reference** (Phase 4 §7) for the agent's verify path, the
**`Signer` interface** for the operator CLI, and the **allowed-signers pinning** step for enrollment.
The hub's signed-job queue + pending-op UI carry into the hub architecture doc.
@@ -0,0 +1,223 @@
# Architecture Part 5 — The Hub
> Status: design draft (decision content). To be validated by Claude Code against the **actual
> felhom-hub source** (`felhom.eu` repo, `hub/`) + Parts 01-04, then placed at
> `docs/architecture/05-hub-architecture.md`.
>
> The hub is **not** greenfield — it's a mature service (felhom-hub v0.6.3, Go + SQLite on k3s,
> `hub.felhom.eu`). This doc is the **deltas** to evolve it for the Proxmox model, plus the new
> data model. Builds on Part 1 (trust/enrollment), Part 3 (the agent + reconcile), Part 4 (signing).
## 1. Source-of-truth model — two drivers, two directions
The single most important framing, and the one that governs everything below: the hub is **not** a
monolithic source of truth. State flows in two directions with opposite drivers.
- **Operator-driven *intent* — hub authors, agent reconciles (top-down).** Which guests should
exist and their spec, storage *policy* (a target's role/class/backup schedule), controller +
golden-image versions, identity, tunnel. The operator sets these in the hub; the agent converges
toward them. Here the hub *is* the source of truth.
- **Box/customer-driven *reality* — box authors, pushes up, hub mirrors (bottom-up).** Which USB
drive is *physically* attached (and its `durable_id`), what apps are deployed and where, the
customer's controller configs/settings, host/guest health, latest PBS snapshot pointers. The
customer or the physical world drives these; the box reports them; the hub stays an up-to-date
**mirror** but is **never** the driver.
They meet at a **handshake**, not a tug-of-war. Storage is the clearest case: the customer plugs in
a drive → the agent *detects* it and reports `durable_id X attached` (reality) → the operator
assigns `role=bulk, class=slow, backup=weekly` (policy, intent) → the agent reconciles that policy
*onto the detected drive*. **Apps never enter the reconcile loop** — app deployment is the
controller's domain (customer- or operator-driven, inside the guest); the hub only mirrors the
resulting inventory. **Reconciliation applies to infrastructure; the app/customer layer is mirrored.**
## 2. Data model (Part 1 decision (b): customer-anchored)
A customer's deployment is one **Host** (its agent) plus one-or-more **Guests** (its controllers).
1 customer = 1 host + N guests; the shared-host multi-tenant case is deferred (not precluded — the
`hosts` table is the seam it would use).
- **`customer_configs`** (existing) — the Customer anchor: identity, domain, email,
`retrieval_password`, status, config_json. Unchanged role.
- **`hosts`** (new) — `host_id PK, customer_id, api_key` (the agent's hub key), `agent_version`,
desired-state intent (storage manifest + policies + golden-image version, as JSON), a per-host
**`desired_generation`** counter, the slim DR record (§9), timestamps.
- **`guests`** (new) — `guest_id PK, customer_id, host_id, api_key` (the controller's hub key),
`display_name, controller_version`, per-guest **`desired_spec_json`** (CPU/mem/disk, versions),
timestamps.
**Per-reporter keys:** today's per-customer `customer_configs.api_key` becomes per-reporter —
`hosts.api_key` (agent) and `guests.api_key` (controller). The hub resolves a presented Bearer key →
host or guest → customer; `customer_configs.api_key` goes unused once auth resolves via the new keys.
**Clean cutover:** no dual-model support; the demo re-enrolls fresh into `host + guests`.
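
The key resolution described above might look like the sketch below. The `Reporter` and `KeyIndex` types are hypothetical, and the in-memory maps stand in for lookups against `hosts.api_key` and `guests.api_key`; a production version would presumably compare hashed keys rather than raw map lookups.

```go
package main

import (
	"fmt"
	"strings"
)

// Reporter is what a presented Bearer key resolves to (hypothetical shape).
type Reporter struct {
	Kind       string // "host" (agent) or "guest" (controller)
	ID         string
	CustomerID string
}

// KeyIndex stands in for the hosts/guests tables, keyed by api_key.
type KeyIndex struct {
	hosts  map[string]Reporter
	guests map[string]Reporter
}

// Resolve maps an Authorization header to host or guest, and so to a customer.
func (k KeyIndex) Resolve(authHeader string) (Reporter, bool) {
	const prefix = "Bearer "
	if !strings.HasPrefix(authHeader, prefix) {
		return Reporter{}, false
	}
	key := strings.TrimPrefix(authHeader, prefix)
	if key == "" {
		return Reporter{}, false
	}
	if r, ok := k.hosts[key]; ok {
		return r, true
	}
	if r, ok := k.guests[key]; ok {
		return r, true
	}
	return Reporter{}, false
}

func main() {
	idx := KeyIndex{
		hosts:  map[string]Reporter{"agent-key": {Kind: "host", ID: "h1", CustomerID: "c1"}},
		guests: map[string]Reporter{"ctrl-key": {Kind: "guest", ID: "g1", CustomerID: "c1"}},
	}
	r, ok := idx.Resolve("Bearer ctrl-key")
	fmt.Println(r.Kind, r.ID, r.CustomerID, ok)
}
```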
## 3. Report ingest — two domains
The single controller report splits. The de-privileged controller no longer sees host disks/storage/
backup, so its report **slims** (it loses System/Storage/Backup, keeps app-domain).
- **`POST /api/v1/host-report`** (new, agent) → **`host_reports`**: host CPU/RAM/disk, per-guest
up/down + spec, storage-target status (attached drives + `durable_id` + reachability), last backup
+ restore-test per target, latest PBS snapshot pointers, `cloudflared` health, agent + controller
versions. Denormalized columns for the dashboard; full `report_json`. Index `(host_id, received_at
DESC)` + `(customer_id, received_at DESC)`.
- **`POST /api/v1/report`** (existing, slimmed controller) → the renamed **`guest_reports`**: it
gains `guest_id` + `host_id`; its `cpu/memory` denorm now means *guest-level*; `backup_last_snapshot`
goes quiet (backup status lives in `host_reports`). App telemetry / log issues stay.
These two streams are the bottom-up mirror of §1 — they keep the hub current without a separate push.
## 4. Liveness / dead-man's-switch
Evolves the existing staleness checker (60s **cadence**, 30m/1h **thresholds** — OK <30m, down at
2× = >1h; today: controller-report recency → `node_stale`/`down`/`recovered`):
- **Primary = host-report recency → `host_stale` / `host_down`.** The agent heartbeat is the box's
liveness signal; a silent agent = the box is gone (the critical alert).
- **Guest up/down comes from the host report's per-guest status** — authoritative, every poll, faster
than waiting for a guest report to go stale.
- **Guest-report recency = secondary** app-level signal.
**Backup-deadline checker:** today it is *event-based* — it scans for `backup_completed`/`backup_failed`
events since local midnight and alerts if none. Two changes: (1) **mechanism** — move it to a field
check on `host_reports`' last-backup-per-target (cleaner now that backup state arrives in the host
report); (2) **emitter** — the de-privileged controller no longer runs backups, so the **agent** is the
source of the last-backup status (Part 3 §8). Without re-homing the source, the deadline check would go
silent after the controller stops backing up.
## 5. Desired-state serving
The operator's **intent** (§1 top-down) lives as JSON on `hosts`/`guests` (storage manifest +
policies + golden version on the host; per-guest spec + versions on the guest) with a per-host
`desired_generation`. The agent pulls its host's desired state on poll (with the generation, so it
reconciles only on change and reports which generation it has converged to).
- **Benign convergence** (create a guest, attach storage per policy, bump a version, adjust a
non-destructive policy) → the agent reconciles freely.
- **Destructive convergence** (guest removal = destroy, storage detach/wipe, data-losing resize) →
the agent requires a **matching signed op** (§6) before executing that delta; absent/invalid → it
refuses and reports `pending_signature`.
**Geo is *not* in the agent's desired state** — it's customer→hub→Cloudflare (§7); the agent never
touches WAF.
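
The agent-side decision for one poll can be sketched as below. The `Delta` and `Decision` types, the delta kind strings, and the `plan` function are hypothetical; `signed` represents deltas for which a matching signed op has already passed the Part 4 verify pipeline.

```go
package main

import "fmt"

// Delta is one difference between desired and actual state (hypothetical shape).
type Delta struct {
	Kind   string
	Target string
}

// Decision records what the agent does with a delta on this poll.
type Decision struct {
	Delta  Delta
	Action string // "apply" or "pending_signature"
}

// destructiveKinds is an illustrative list, not the authoritative classification.
var destructiveKinds = map[string]bool{
	"guest_destroy":      true,
	"storage_detach":     true,
	"storage_wipe":       true,
	"resize_data_losing": true,
}

// plan reconciles only when the generation changed, applies benign deltas
// freely, and refuses destructive ones that lack a verified signed op.
func plan(desiredGen, convergedGen int64, deltas []Delta, signed map[Delta]bool) []Decision {
	if desiredGen == convergedGen {
		return nil // nothing changed since the last converged generation
	}
	out := make([]Decision, 0, len(deltas))
	for _, d := range deltas {
		action := "apply"
		if destructiveKinds[d.Kind] && !signed[d] {
			action = "pending_signature"
		}
		out = append(out, Decision{Delta: d, Action: action})
	}
	return out
}

func main() {
	deltas := []Delta{{"guest_create", "g2"}, {"guest_destroy", "g1"}}
	for _, dec := range plan(8, 7, deltas, map[Delta]bool{}) {
		fmt.Println(dec.Delta.Kind, dec.Delta.Target, dec.Action)
	}
}
```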
## 6. Authorization — signed-op queue + editing flow
Implements Part 4's gate on the hub side. The hub holds **no signing key**.
- **`signed_ops`** (new): `op_id, customer_id, host_id, target_guest, op_type, op_blob (canonical
JSON), signature (armored SSHSIG), status (pending_signature → signed → delivered → executed /
failed / expired / rejected), nonce, issued_at, expires_at, executed_at, result`.
- **Editing flow:** the operator edits a customer's desired state, reusing the existing config-form +
diff UX. Note the **transport inverts**: today's "Push" is a hub→box *inbound* POST (forbidden by the
box-initiated model); here "publish" means **write to desired state, delivered on the next agent/
controller poll**. The form and diff carry over; the push transport does not. The hub diffs vs current
and **classifies each delta** (B1 rule):
- **benign** → published straight to desired state;
- **destructive** → the hub generates the canonical op blob and routes it through signing.
- **Signing hand-off (Part 4 option (b)):** a local operator CLI (`felhom-sign --pending`) fetches
the pending blob from the hub, signs it on the workstation with the dedicated key, and posts the
signature back into `signed_ops`. The hub never sees the key.
- The agent polls `signed_ops` for its host alongside desired state, verifies (Part 4 pipeline),
executes, and reports status → the hub logs to the existing **`events`** audit trail.
- **Classification lives in both places, with different jobs:** the hub classifies at *edit time*
for UX (prompt to sign); the **agent's classification is the authoritative guard** (a compromised
hub could skip the prompt, but the agent still enforces the signature).
- A **pending-ops view** per customer shows the lifecycle (awaiting signature → awaiting agent →
executed).
## 7. Geo enforcement (Part-2 S4)
The hub already holds the CF API token and already has a remove-all path
(`internal/web/configs.go` `handleGeoDisable` → `cloudflare.RemoveGeoRules`). **But the token is
dual-purpose today** — DNS-01/ACME *and* WAF/geo — and `configgen.Generate` deep-merges it (via
`config_json`) into the generated `controller.yaml`, so it currently ships **down to the box**. Two
things follow:
- **ACME assumption (must be stated, not skipped):** in the Cloudflare-Tunnel-default model the edge
terminates TLS, so the box needs no public certificate and the **DNS-01/ACME use of the token goes
away**. Granting that, the token comes fully off the box and lives hub-only. (If any box still does
DNS-01, the token cannot fully come off — so this assumption is load-bearing.)
- **`configgen` must stop emitting `cf_api_token`** into `controller.yaml` (drop it from the merge /
relocate it to a hub-only field).
The delta: the **customer sets geo in the controller UI → the controller reports the geo desired-state
up → the hub reconciles it into the Cloudflare WAF** (rather than the box calling the CF API). The hub
keeps the remove-all override for self-lockout. The controller no longer calls the CF API.
## 8. Enrollment (evolution of the existing retrieval-password/config-gen flow)
Today: `GET /config/{id}` with an `X-Retrieval-Password` (Hungarian passphrase) returns a deep-merged
`controller.yaml`. New:
- Enrollment mints the **agent identity first** (the agent then provisions controllers), pins the
**operator signing public keys** (Part 4 — operational + cold recovery) onto the agent, and the
agent mints each controller's bootstrap (its hub guest key + local-API token).
- A **restore-mode** re-enrollment (§9) hands an existing identity to a fresh agent.
The existing `configgen` deep-merge + Hungarian-passphrase machinery is the base; it grows the
agent-first + key-pinning + restore-mode steps.
## 9. DR model
The headline: the **old heavy infra-backup push retires** — not because the hub authors everything
(§1 says it doesn't), but because (a) the box-driven mirror already arrives via the §3 report streams,
and (b) the actual app **data + configs live inside the PBS guest snapshot**. So a separate
config+secrets+restic-password infra-backup blob is redundant.
What remains:
- the **report streams** keep the hub's mirror current (storage layout + `durable_id`s, app inventory,
snapshot pointers) — but this mirror is **convenience, not the DR source of record** (reports are
pruned by age);
- the agent **escrows the recovery-code-wrapped PBS key** to the hub (the one artifact only the box
can produce — zero-knowledge: the hub stores it, cannot open it);
- a **slim DR record** on the `hosts` row (PBS namespace + repo fingerprint + the wrapped escrow key).
These last two are *box-reported* columns on an otherwise operator-intent row — labelled as such so
the §1 two-driver split stays legible per column.
Both existing infra-backup tables retire — `infra_backup_versions` (the current/live one, all readers
hit it) **and** `infra_backups` (the deprecated legacy mirror). The slim DR record folds onto `hosts`
instead. The **controller's infra-backup push is removed** (it's de-privileged).
**Recovery (host loss):** the new agent re-enrolls in **restore mode**; the hub hands it the durable
record — and DR reads from the **durable sources, not the prunable report mirror**: operator intent
(desired-state on `hosts`/`guests` — identity, tunnel token, storage manifest), the slim DR record
(PBS namespace + repo fingerprint), the **wrapped escrow key**, and **PBS's own snapshot enumeration**
(the agent lists snapshots once it has the namespace + unwrapped key). Guest inventory + app data come
from **inside the PBS guest snapshots**, not from a retained `host_report`, so recovery doesn't degrade
when the last report has aged out. The **customer provides their recovery code at the agent**, which
unwraps the PBS key locally (never sent to the hub); the agent restores guests from PBS, resets
identity, reuses the tunnel. The customer recovery code is the irreducible residual (the premium
operator-managed custody tier avoids it, at the cost of the operator holding the key). The old
controller-targeted `GET /recovery/{id}` is replaced by this agent restore-mode flow.
## 10. What persists from today (unchanged or lightly adapted)
The Customer record (`customer_configs`); config generation/retrieval (`configgen`); the two-tier
notification system (operator English / customer Hungarian, Resend, cooldowns); `events` + audit;
`app_telemetry` / `app_log_issues`; customer lifecycle actions (block/unblock, trigger-update,
delete); the asset manager; and the dashboard — adapted to render the **host + guests** view per
customer instead of a single controller.
## 11. Schema deltas (grounded in store.go's idempotent style; clean cutover)
- **NEW:** `hosts`, `guests`, `host_reports`, `signed_ops`.
- **DROP `reports` + CREATE `guest_reports`** (under the clean cutover this is drop+create with no data
migration, not an in-place rename); `guest_reports` adds `guest_id`, `host_id`; `cpu/memory` mean
guest-level; `backup_last_snapshot` goes quiet.
- **ADD** desired-state JSON + `desired_generation` to `hosts`; `desired_spec_json` to `guests`; the
slim DR record (PBS namespace + repo fingerprint + wrapped escrow key) onto `hosts`.
- **DROP both** `infra_backup_versions` (current/live) **and** `infra_backups` (legacy mirror) — the DR
record replaces them on `hosts`.
- **KEEP** `customer_configs`, `events`, `customer_notifications`, `notification_log`,
`app_telemetry`, `app_log_issues`.
- **Authz cleanup the cutover enables:** several endpoints today use global-or-any-customer-key auth
rather than customer-scoped (the infra-backup GETs, `/notify`). Most retire with the infra-backup
push; any that carry over should scope to the resolved host/guest → customer under §2.
## 12. Open items
- Operator signing-key operational mechanics (Part 4 §8) — the hub-side pending-op UI is here; the
key custody/rotation tooling is Part 4's.
- Multi-tenant resource fairness (deferred shared-host case).
- Hub-side desired-state **editing UX** specifics (form/diff wiring) — to be grounded against
`hub/internal/web/configs.go` at implementation.
- Golden-image refresh cadence / fleet versioning (carried from Part 3 §13).
@@ -0,0 +1,260 @@
# Critical design review — Proxmox re-platform doc set
> ✅ **RESOLVED (2026-06-08).** All findings folded into 01/02/03 + `proxmox-platform.md`
> (Phase-3 spike run for B2/B3 → `tests/phase3-findings.md`). **Folded:** B1 (03 §4), B2
> (03 §7/§8 + platform §4.7), B3 (03 §2/§3 + platform §3.6), S1 (03 §6/§8), S2 (03 §10/§11),
> S3 (03 §7), S4 (01 §5/§7 + 02 + 03 §2), S5 (01 §7/§11 + 02 §6), S6 (02 §5), M1 (02 §3),
> M2 (03 §7), M3 (03 §10), §6-residual (03 §6). Plus the two Phase-3 design updates:
> provision-by-restore (03 §9) and the settled root-vs-API boundary (03 §3). **Deferred/none:**
> no finding was deferred; the pre-existing open items (operator signing-key mechanics,
> multi-tenant fairness, hub-side desired-state UX, golden-image refresh cadence) remain
> flagged in 03 §13. This artifact can be deleted once confirmed.
Working artifact. Review pass over `01-topology-and-trust.md`, `02-controller-module-map.md`,
`03-host-agent.md`, `proxmox-platform.md`, and the Phase 0 / Phase 1-2 findings, grounded
against the v0.33 source (`felhom-controller/controller/`). Every finding cites a
file+line or a doc section. Severity: **blocking** / **should-fix** / **minor**.
Two findings are self-corrections of my own earlier work (`02` and `proxmox-platform.md`) —
flagged as such.
---
## Ranked summary
| # | Severity | Finding | Where |
|---|---|---|---|
| B1 | **blocking** | Reversibility gate contradicts the self-heal reconcile loop — crashed-guest healing can require a signature-gated destroy → reconcile stalls | `03` §4 vs §4(a) |
| B2 | **blocking** | vzdump bulk-exclusion only works for **volume** mount points; Docker **named volumes live in the LXC rootfs and ARE captured** → naive placement silently backs up the 1 TB media drive. Unvalidated by spike. | `03` §7 vs `proxmox-platform.md` §4.3 + pct manpage |
| B3 | **blocking** | Agent's Proxmox role is called "the minimal role from Phase 1" — but that role is the *narrow self-backup* role that Phase 1 proved is **denied** create/allocate/restore. The agent's operator-tier role is undefined. | `03` §2/§3 vs `phase1-2` §1.3-1.4, `01` appendix |
| S1 | should-fix | Quiescing for agent/hub-scheduled backups has **no agent→controller channel** — the local API is controller→agent only | `03` §6, §8 |
| S2 | should-fix | Agent self-update revert authority unspecified — if the new binary won't boot, nothing outside it can flip back | `03` §11 |
| S3 | should-fix | Storage manifest drops fields `settings.StoragePath` carries today (Label, Schedulable/default, StoppedStacks, MigratedTo) with no re-homing stated | `03` §7 vs `settings.go:90-103` |
| S4 | should-fix | Geo-restriction WAF ownership + Cloudflare **API token** placement unspecified after tunnel placement was locked; zone-wide token in a guest is a blast-radius concern | `03` (absent), `01` §3, `config.go` InfrastructureConfig |
| S5 | should-fix | Cross-doc staleness: `01` §11 still lists tunnel placement OPEN; `02` §6 lists geo "blocked on tunnel placement" — both resolved by `03` §13 | `01` §11, `02` §6 vs `03` §13 |
| S6 | should-fix (self-correct) | `02` put self-restore-test **orchestration** in the controller; `03` correctly makes it agent-owned (controller only reads status) | `02` §5(3) vs `03` §6/§8 |
| M1 | minor (self-correct) | `02` §3 lists `UUID` as a `settings.StoragePath` field — it isn't; UUID is derived from fstab at runtime | `02` §3 vs `settings.go:91-103` |
| M2 | minor | `03` §7 says the manifest "absorbs the disk-state fields StoragePath carries today" incl. UUID — UUID isn't persisted today, so the manifest *adds* it (an improvement, not absorption) | `03` §7 |
| M3 | minor | controller-update is not in `03` §10's journaled-ops list, though it's a multi-step async op | `03` §10 vs §11 |
**Values check: clean.** No DR/key-custody/offboarding path leaves a customer locked out.
Zero-knowledge DR (`03` §8, `01` §8) correctly makes the customer recovery code the
irreducible residual; the operator cannot read data and the box can still restore-test.
No hostage path found.
**Locked premises:** reviewed for soundness/consistency only; not relitigated.
---
## Blocking findings
### B1 — The reversibility gate stalls the self-healing reconcile loop
**Where:** `03` §4(a) vs the gate in §4.
**What:** §4(a) lists "redeploy of a crashed controller" as benign convergence that "falls
out of reconciliation for free." The gate then lists **guest destroy** among the
irreversible ops that require an operator signature "*regardless of whether they arrive as a
job or as a desired-state delta*." These collide: if healing a wedged guest requires
destroy+recreate (corrupt rootfs, failed in-place restart, half-built guest from an
interrupted provision), the reconciler hits a signature-gated op and **cannot proceed
without an operator** — the loop either stalls or silently gives up, defeating "self-healing
… tolerant of missed polls."
**Why it matters:** This is the security-critical control model. A fuzzy benign/destructive
line is unimplementable: either the reconciler can destroy (and a compromised hub's desired
state can wipe guests — the exact threat §4 exists to stop), or it can't (and self-heal is a
fiction for the crashed-guest case).
**Grounding:** `03` §4 self-describes the gate as "security-critical"; §9/§10 already rely on
the reconciler rolling back "a half-built guest" — which *is* a destroy of a customer-id-bound
resource, contradicting the blanket "guest destroy needs a signature."
**Suggested fix (crisp, implementable rule):** Scope the reconciler's destructive verbs by
*provenance and data-bearing-ness*, not by verb:
- The reconciler MAY, without a signature: (a) create/start/restart; (b) destroy resources it
**created earlier in the same journaled transaction** (compensating rollback, §10); (c)
destroy resources **tagged ephemeral/scratch** (restore-test scratch guests, §8).
- Destroying or overwriting any resource that **holds the only/primary copy of customer data**
always needs an operator signature.
- **Healing a crashed controller is non-destructive by construction:** the controller is
reconstructable from its image + the guest's persistent volume, so "redeploy" = restart the
LXC / `docker compose up -d` **inside the existing guest** — never a guest destroy. State
this explicitly so the two clauses stop colliding. (The v0.33 self-heal precedent is already
in-place restart: `watchdog.go` restarts stopped stacks, it never destroys the guest.)
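
The suggested rule can be sketched as a single predicate. The `Resource` fields and the function name are hypothetical, and letting the data-bearing check take precedence over the other two clauses is an assumption about how the clauses combine.

```go
package main

import "fmt"

// Resource describes a guest or volume the reconciler may want to destroy.
type Resource struct {
	CreatedInTx      string // journal transaction that created it, "" if unknown
	Scratch          bool   // agent-internal scratch tag, never hub-supplied
	HoldsPrimaryData bool   // only/primary copy of customer data
}

// mayDestroyUnsigned reports whether the reconciler may destroy r without an
// operator signature while running journal transaction currentTx.
func mayDestroyUnsigned(r Resource, currentTx string) bool {
	if r.HoldsPrimaryData {
		return false // always needs an operator signature
	}
	if r.Scratch {
		return true // e.g. restore-test scratch guests
	}
	// Compensating rollback: only what this same transaction created.
	return r.CreatedInTx != "" && r.CreatedInTx == currentTx
}

func main() {
	fmt.Println(mayDestroyUnsigned(Resource{CreatedInTx: "tx-1"}, "tx-1"))            // half-built guest
	fmt.Println(mayDestroyUnsigned(Resource{Scratch: true}, "tx-2"))                  // scratch guest
	fmt.Println(mayDestroyUnsigned(Resource{HoldsPrimaryData: true}, "tx-2"))         // live guest
	fmt.Println(mayDestroyUnsigned(Resource{CreatedInTx: "tx-1"}, "tx-2"))            // older resource
}
```

Healing a crashed controller never reaches this predicate, because under the rule it is an in-place restart rather than a destroy.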
### B2 — vzdump bulk-exclusion: the rootfs-Docker-volume trap
**Where:** `03` §7 ("Bulk external mounts are excluded from the guest's vzdump (a per-mount
backup flag)").
**What:** Two grounded problems:
1. The flag is real but narrow. The pct manpage (verified): `backup=<boolean>`
*"Whether to include the mount point in backups (**only used for volume mount points**)."*
It does **not** apply to bind mounts / device mounts (those are handled separately).
2. The trap: `proxmox-platform.md` §4.3 (validated in `phase1-2` §2.2) proved that **Docker
named volumes live inside the LXC rootfs and ARE captured by vzdump** — a sentinel in
`pgdata` survived. The default Felhom app uses Docker named volumes. So unless bulk data is
deliberately placed on a **dedicated Proxmox volume mount point** (backup=0) or a bind
mount, a "bulk" volume will be an ordinary named volume in rootfs and will be **silently
swept into the whole-guest image** — exactly the 1 TB-media-in-every-backup outcome §7 says
it prevents.
**Why it matters:** Backup size/cost and RPO blow up silently; the failure is invisible until
a media drive fills the vzdump target. This is load-bearing for the §8 tier model.
**Grounding:** pct manpage (fetched 2026); `proxmox-platform.md` §4.3; `phase1-2` §2.2.
Not covered by any spike — `proxmox-platform.md` §6 "not yet validated" should gain this row.
**Suggested fix:** Make the placement contract explicit: a `bulk` volume **must** be realized
as a dedicated LXC mount point (volume mountpoint with `backup=0`, or an external bind mount),
**never** a Docker named volume in rootfs. The per-volume placement component (`02` §5(2))
must enforce this at deploy. Add a Phase-3 spike: create an LXC with a `backup=0` volume
mountpoint + a bind mount, vzdump it, confirm both are excluded and the rootfs+`backup=1`
volume are included.
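The placement contract above could be enforced at deploy time with a check along these lines. This is a minimal sketch, not the Felhom implementation: the `Volume` type, the `Realization` values, and `ValidatePlacement` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Realization is how a volume is physically placed. These names are
// illustrative only; they are not taken from the Felhom codebase.
type Realization int

const (
	DockerNamedVolume Realization = iota // lives in the LXC rootfs, so vzdump captures it
	VolumeMountPoint                     // Proxmox volume mount point, honors backup=<0|1>
	BindMount                            // external host path bind-mounted into the guest
)

// Volume is a hypothetical per-volume placement record.
type Volume struct {
	Name        string
	Bulk        bool
	Realization Realization
	BackupFlag  bool // meaningful only for VolumeMountPoint
}

// ErrBulkPlacement marks a bulk volume that would land in the whole-guest image.
var ErrBulkPlacement = errors.New("bulk volume would be swept into the guest vzdump")

// ValidatePlacement enforces the contract: bulk data may only sit on a
// backup=0 volume mount point or on a bind mount.
func ValidatePlacement(v Volume) error {
	if !v.Bulk {
		return nil
	}
	switch v.Realization {
	case BindMount:
		return nil
	case VolumeMountPoint:
		if !v.BackupFlag {
			return nil
		}
	}
	return fmt.Errorf("%w: %s", ErrBulkPlacement, v.Name)
}

func main() {
	media := Volume{Name: "media", Bulk: true, Realization: DockerNamedVolume}
	fmt.Println(ValidatePlacement(media))
}
```

Rejecting at deploy, rather than warning, matches the "silent" failure mode described above: the mistake is otherwise invisible until the vzdump target fills.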
### B3 — The agent's Proxmox role is mis-grounded as "the Phase-1 minimal role"
**Where:** `03` §2 ("scoped Proxmox API token (minimal role from Phase 1)"), §3 ("the
Phase-1 minimal role is the API floor").
**What:** Phase 1's minimal role (`FelhomSelfBackup` = `VM.Audit, VM.Snapshot, VM.Backup,
Datastore.AllocateSpace, Datastore.Audit`) is the **narrow self-backup** role scoped to one
guest, and Phase 1 explicitly proved it is **denied (403)** on create/allocate
(`phase1-2` §1.3 call #7) — i.e. exactly the operator-tier ops the agent's whole job consists
of (provision, restore, storage allocation). Worse, `01` appendix states that guest-side role
"**is not used** — we chose the agent-mediated path." So `03` cites, as the agent's role
floor, a role that (a) the architecture discarded and (b) is provably insufficient for the
agent.
**Why it matters:** The agent's actual operator-tier role is **undefined**. Provisioning,
restore, and storage management cannot be built or hardened against an undefined privilege
set, and §3's root-minimization argument ("the Phase-1 minimal role is the API floor")
collapses because that floor can't create a guest.
**Grounding:** `phase1-2` §1.3 (create CT = 403), §1.4 (role = self-backup only); `01`
appendix ("not used … confirmed restore = operator-tier"); `proxmox-platform.md` §3.4.
**Suggested fix:** Replace the Phase-1 reference with a **new agent operator role** to be
defined and least-privilege-tested in a Phase-3 spike — minimally `VM.Allocate`, `VM.Config.*`,
`VM.PowerMgmt`, `VM.Snapshot(.Rollback)`, `VM.Backup`, `VM.Audit`, `Datastore.Allocate(Space)`,
`Datastore.Audit`, plus whatever storage-attach needs (see S4/root-boundary below). Keep §3's
"API token, not root, where the API suffices" principle — that part is sound — but stop
calling it the Phase-1 role.
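The gap can be made concrete by diffing the Phase-1 role against what a provisioning call needs. The sketch below is illustrative: `phase1Role` copies the privileges listed above for `FelhomSelfBackup`, while `provisionNeeds` is an assumed subset of the candidate operator role, and the real set is whatever the Phase-3 spike establishes.

```go
package main

import (
	"fmt"
	"sort"
)

// phase1Role lists the privileges this review quotes for FelhomSelfBackup.
var phase1Role = []string{
	"VM.Audit", "VM.Snapshot", "VM.Backup",
	"Datastore.AllocateSpace", "Datastore.Audit",
}

// provisionNeeds is an assumed, illustrative subset of the proposed agent
// operator role; it is not a validated least-privilege set.
var provisionNeeds = []string{
	"VM.Allocate", "VM.PowerMgmt", "VM.Backup", "VM.Audit", "Datastore.AllocateSpace",
}

// Missing returns the privileges in need that role does not grant, sorted.
func Missing(role, need []string) []string {
	granted := make(map[string]bool, len(role))
	for _, p := range role {
		granted[p] = true
	}
	var out []string
	for _, p := range need {
		if !granted[p] {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(Missing(phase1Role, provisionNeeds))
}
```

A check of this shape could also serve as the spike's pass criterion: the new role is acceptable when `Missing` is empty for every operation the agent performs and the role grants nothing beyond that.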
---
## Should-fix findings
### S1 — No agent→controller channel for backup quiescing
**Where:** `03` §6 (local API is controller→agent only) vs §8 ("the controller stops the app
stack … before a guest vzdump where app-consistency matters").
**What:** App-consistent LXC backup requires the controller to quiesce (no fsfreeze for LXC —
`proxmox-platform.md` §4.2, `phase1-2` §2.1). But the §6 surface is entirely controller→agent;
the box-initiated model forbids the hub calling in, and there is no agent→controller call
defined. For a **hub/agent-scheduled** backup (schedule lives in the manifest `policy`, §7),
the agent has no way to tell the controller "quiesce now."
**Why it matters:** Either scheduled backups silently fall back to crash-consistent (relying
on WAL recovery, which `phase1-2` §3 warns is unvalidated under write load), or the feature
can't be built as drawn.
**Suggested fix:** Make backups **controller-driven for app-consistency**: the controller
learns due/policy via its own hub channel (or a `GET /backup/due` on the local API), quiesces,
calls the existing `POST /backup`, then unquiesces on completion. Document that agent-initiated
vzdump is crash-consistent only. (No inbound-to-guest channel needed — preserves §3/§5.)
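The controller-driven sequence could look roughly like this. It is a sketch under stated assumptions: `Quiescer` and `Agent` are hypothetical interfaces, and `Backup` stands in for `POST /backup` on the local API and is assumed to return only once the backup task has finished.

```go
package main

import (
	"errors"
	"fmt"
)

// Quiescer stops and restarts the app stack (hypothetical interface).
type Quiescer interface {
	Quiesce() error
	Unquiesce() error
}

// Agent stands in for the local API; Backup is assumed to block until the
// backup task has completed.
type Agent interface {
	Backup() error
}

// RunConsistentBackup quiesces, backs up, and always unquiesces afterwards.
func RunConsistentBackup(q Quiescer, a Agent) (err error) {
	if err = q.Quiesce(); err != nil {
		return fmt.Errorf("quiesce: %w", err)
	}
	// Bring the stack back even when the backup fails.
	defer func() {
		if uerr := q.Unquiesce(); uerr != nil {
			err = errors.Join(err, fmt.Errorf("unquiesce: %w", uerr))
		}
	}()
	if err = a.Backup(); err != nil {
		return fmt.Errorf("backup: %w", err)
	}
	return nil
}

// recorder is a test double that logs the call order.
type recorder struct {
	log        []string
	failBackup bool
}

func (r *recorder) Quiesce() error   { r.log = append(r.log, "quiesce"); return nil }
func (r *recorder) Unquiesce() error { r.log = append(r.log, "unquiesce"); return nil }
func (r *recorder) Backup() error {
	r.log = append(r.log, "backup")
	if r.failBackup {
		return errors.New("backup task failed")
	}
	return nil
}

func main() {
	r := &recorder{failBackup: true}
	err := RunConsistentBackup(r, r)
	fmt.Println(r.log, err)
}
```

The deferred unquiesce is the important part: a failed backup must not leave the customer's stack stopped.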
### S2 — Agent self-update revert authority unspecified
**Where:** `03` §11 ("a watchdog reverts to last-good if the new binary fails to come up
healthy").
**What:** The agent is a single host systemd service with `Restart=always` (§3). If the new
binary crashes on startup, systemd just restarts the **same bad binary** in a loop. "Revert
to last-good" cannot be done *by* the thing that won't boot. §11 doesn't name the actor.
**Why it matters:** A bad self-update can brick the crown-jewel host agent — the one component
that recovers everything else — with no automatic recovery, requiring break-glass.
**Suggested fix:** Put revert authority **outside** the swapped binary: e.g. an A/B symlink
(`current → good|new`) guarded by a separate systemd oneshot health-gate (`ExecStartPost` probe; on
failure, flip the symlink back and restart), or a tiny supervisor unit. Boot-into-last-good +
explicit "commit" after a clean health window is the robust pattern. Add agent-update to the
§10 journal so an interrupted swap is resumable.
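The symlink flip and the external health-gate could be sketched as follows. All names here (`flip`, `healthGate`, `demo`, and the `good`/`new` targets) are hypothetical, and the probe is a stub; the point is only that the revert logic lives outside the binary being swapped.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// flip atomically repoints link at target: it creates a temporary symlink
// and renames it over the old one.
func flip(link, target string) error {
	tmp := link + ".tmp"
	_ = os.Remove(tmp) // clear a leftover from an interrupted swap
	if err := os.Symlink(target, tmp); err != nil {
		return err
	}
	return os.Rename(tmp, link)
}

// healthGate runs outside the swapped binary. If the probe fails it points
// the link back at the last-good target and reports that it reverted.
func healthGate(link, lastGood string, healthy func() bool) (bool, error) {
	if healthy() {
		return false, nil
	}
	return true, flip(link, lastGood)
}

// demo stages a bad update in a temp dir and returns where "current"
// points after the gate has run.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "agent-ab")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	link := filepath.Join(dir, "current")
	if err := flip(link, "new"); err != nil {
		return "", err
	}
	if _, err := healthGate(link, "good", func() bool { return false }); err != nil {
		return "", err
	}
	return os.Readlink(link)
}

func main() {
	target, err := demo()
	fmt.Println(target, err)
}
```

Restarting the service after a revert, and the explicit "commit" after a clean health window, are left out of this sketch.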
### S3 — Manifest schema omits live `StoragePath` fields without re-homing them
**Where:** `03` §7 table vs `settings.go:90-103`.
**What:** Today's `StoragePath` carries `Label`, `IsDefault`, `Schedulable`, `StoppedStacks`,
`Decommissioned`/`DecommissionedAt`/`MigratedTo`. The manifest covers state (attached/
disconnected/decommissioned) and durable_id, but drops: **Label** (human name, e.g. "Külső
HDD 1TB" — UI), **Schedulable/IsDefault** (default placement target for new apps),
**StoppedStacks** (which apps to restart on reconnect — app-domain), **MigratedTo** (decommission
target pointer).
**Why it matters:** `02` named this manifest as the contract that the `settings.StoragePath`
reshape depends on. Silently dropped fields become lost behavior (no default-drive choice, no
restart-after-reconnect list, no friendly labels).
**Suggested fix:** Either add Label + a placement-default marker to the manifest, or explicitly
state which fields re-home to the controller's `settings` (StoppedStacks and Label are
plausibly controller-side; default/schedulable placement must live wherever placement decisions
are made). Make the split explicit so neither side assumes the other owns it.
### S4 — Geo-WAF ownership + Cloudflare API token placement unspecified
**Where:** `03` covers `cloudflared` (tunnel) health but is silent on geo-restriction WAF; `02`
§6 had `cloudflare/`+`geo` "blocked on tunnel placement"; `01` §3 lists the controller's creds
as "hub API key + local-API token" only.
**What:** Now that tunnel placement is locked (host), the **geo-restriction WAF** management
(`cloudflare/` package: zone/waf/geosync) still has no home. It requires a Cloudflare **API
token** (`config.go` InfrastructureConfig.cf_api_token) with zone-wide WAF edit rights. If geo
stays in the controller (app-domain, per `02`), a **zone-wide Cloudflare token sits inside the
customer guest** — a real blast-radius concern (compromise → edit/disable WAF for the whole
zone, potentially other customers on the same zone).
**Why it matters:** Trust-boundary gap. `01` §5's boundary table has no row for controller↔
Cloudflare-API. Unspecified ownership blocks the `02` geo classification from being unblocked.
**Suggested fix:** Decide geo-WAF ownership explicitly and add it to `01` §5. Options: (a) move
WAF management to the **agent/hub** (operator-tier, token off the customer box); (b) keep it in
the controller but scope the CF token per-zone/per-customer if the account model allows. Note
this is now *unblocked* by the tunnel decision and should leave `02` §6's "blocked" state.
### S5 — Cross-doc staleness on the now-locked tunnel placement
**Where:** `01` §11 ("Cloudflare Tunnel placement: host vs guest (§7)") and `02` §6
("`cloudflare/` + `api/geo.go` — blocked on tunnel placement") vs `03` §13 ("Resolved here:
tunnel placement (host, agent-managed)") and the LOCKED list.
**What:** `01` and `02` still present as OPEN/blocked a decision `03` and the locked set have
resolved.
**Why it matters:** A dev reading `01`/`02` would treat a settled decision as open (or a
classification as blocked when only geo-ownership, S4, actually remains).
**Suggested fix:** When folding this review in: update `01` §7/§11 to record tunnel=host
(agent-managed systemd service); update `02` §6 to reduce the cloudflare item from "blocked on
tunnel placement" to the narrower "blocked on geo-WAF ownership (S4)."
### S6 — (self-correction) self-restore-test orchestration belongs to the agent, not the controller
**Where:** `02` §5(3) said "Self-restore-test orchestration — *controller* asks the agent to
restore to scratch guest, validates, reports." `03` §8 makes the **agent** drive it
autonomously; §6 gives the controller only `GET /restore-test/status` (read-only).
**What:** `03` is right and `02` overreached. Zero-knowledge means only the box/agent holds the
PBS key (`03` §8); creating a scratch guest is operator-tier (create/allocate — `phase1-2`
§1.3 #7); the controller cannot do either. The controller's only piece is surfacing status.
**Why it matters:** Keeps the NEW-component list honest — this is not a controller component to
build beyond a status read.
**Suggested fix:** Amend `02` §5(3) to "self-restore-test **status display** (read-only); the
agent owns orchestration."
---
## Minor findings
- **M1 (self-correction):** `02` §3 lists `UUID` among `settings.StoragePath` fields. It is
**not** there (`settings.go:91-103`: Path, Label, IsDefault, Schedulable, AddedAt,
Disconnected/At, StoppedStacks, Decommissioned/At, MigratedTo). UUID is derived at runtime
from fstab / `/host-dev/disk/by-uuid` by `system.ParseFstabUUID` and `watchdog.go`. The
classification (settings = MODIFY/split) is unaffected; the field list was wrong.
- **M2:** Consequently `03` §7's "absorbs the disk-state fields `settings.StoragePath` carries
today" overstates: `durable_id`/UUID is *not* carried today, so the manifest **adds** durable
identity (a genuine improvement — today the controller re-derives UUID from fstab each boot,
which is fragile). Reword "absorbs" → "absorbs + adds durable_id."
- **M3:** `03` §10 journals "provision, restore" but not **controller-update** (§11), which is
also a multi-step async op (snapshot→pull→redeploy→health→rollback). Add it so an agent crash
mid-controller-update is resume-or-rollback like the others.
---
## Verified-correct (no action) — grounding that held up
- LXC flags `nesting=1,keyctl=1` + overlayfs (`03` §9) match `proxmox-platform.md` §2.3 /
`phase0` §3. ✓
- async `task exitstatus`, not POST return (`03` §8) matches `proxmox-platform.md` §3.5. ✓
- stop-mode backup not requiring `VM.PowerMgmt` (`03` §8 "per Phase 1") matches
`proxmox-platform.md` §3.4. ✓ (applies to the agent role too.)
- running-LXC snapshot on LVM-thin (`03` §6/§8/§11) matches `proxmox-platform.md` §4.5 /
`phase1-2` §1.6. ✓
- `monitor/pinger.go` deprecation (`02` DELETE-obsolete) confirmed in `main.go:168,175`
("legacy, will be removed" / "no longer used — monitoring is now handled by the Hub"). ✓
- backup keep/delete **intra-file tear** (`02` hazard) confirmed: `backup.go` holds both
`RunDBDumps`/`DumpAppVolumes(Safe)` (keep) and `RunBackup`/`RunFullBackup` (restic, delete);
`restore.go` holds `RestoreApp` (restic) + `RestoreAppFromTier2` (app). The §7-8 backup
contract gives the extracted app-data-backup package a coherent destination. ✓
- Control-plane-not-data-plane (`03` §2/§43): apps keep serving if the agent dies — consistent
with Docker-in-LXC running independently (`phase0` §3). ✓
- §6 per-guest local-API authorization (token→guest map): sound; a leaked token acts only on
its own guest. Residual: a compromised controller can `POST /rollback` its **own** guest
(blast radius = self) — acceptable per design; worth a one-line note that rollback is
self-scoped and bounded.
@@ -0,0 +1,221 @@
# `05-hub-architecture.md` — critical review (grounded against felhom-hub v0.6.3 source + Parts 01-04)
Method: every claim about the existing hub was checked against `felhom.eu/hub/` source; every
cross-doc claim against Parts 01/03/04. Citations are `file:line`. Severity: **blocking** (wrong /
breaks an assumption) · **should-fix** (real gap or contradiction, low blast) · **minor**.
The two highest-value catches (doc assumes something the code contradicts) are **S1** and **S2**.
---
## Ranked summary
| # | What | Where (doc → code) | Severity |
|---|---|---|---|
| S1 | §9/§11 name the **wrong infra-backup table as current**: `infra_backup_versions` is the live/primary one; `infra_backups` is the deprecated write-only mirror | 05 §9/§11 → `store.go:198-217,541-578` | should-fix (code-contradiction) |
| S2 | §7 treats the CF token as **geo-only**; it is **dual-purpose (DNS-01/ACME + WAF)** and is injected into the generated `controller.yaml` | 05 §7 → `config_form.html:76-80`, `controller.yaml.default:26`, `configgen.go:28-37`, `configs.go:1041` | should-fix (code-contradiction / unverified assumption) |
| S3 | §6 leans on the existing **"Push"**, but that is a hub→box **inbound** POST — forbidden by the box-initiated model; transport must invert to poll | 05 §6 → `configs.go:569-570,1148-1150`; Part 1 §4/§5/§11; Part 3 §5 | should-fix |
| S4 | Part 1 §6 calls app inventory **"declarative"**; 05 §1 (LOCKED) says apps are mirrored, never declared/reconciled, restored from PBS | Part 1 §6 ↔ 05 §1/§9 | should-fix (cross-doc) |
| S5 | §9 hands "guest inventory + snapshots" **from the prunable report mirror**; DR soundness actually rests on durable sources | 05 §9/§3 → `store.go:809-816` | should-fix (DR robustness) |
| S6 | §4 says backup-deadline checker "maps onto host_reports' last-backup field"; today it is **event-based** and controller-emitted | 05 §4 → `deadline.go:31-86` | should-fix (mechanism) |
| M1 | "60s staleness checker" conflates the 60s **cadence** with the 30m/1h **threshold** | 05 §4 → `main.go:207-217,99-102`, `staleness.go:33-37` | minor |
| M2 | §2 `customer_configs` field list omits `api_key` — the very field the per-reporter plan retires | 05 §2 → `store.go:102-112` | minor |
| M3 | §11 `reports` → `guest_reports` "rename" is really drop+create under the locked clean cutover | 05 §11 → `store.go:55-119` | minor |
| M4 | Pre-existing weak authz on infra-backup GET / `/notify` (any valid key, not customer-scoped) | `handler.go:407,536,568,596` | minor |
No **blocking** findings — the data model and the two-driver framing are sound, and the LOCKED clean
cutover absorbs most schema risk. The items below are gaps/contradictions worth fixing before the doc
drives work.
---
## Highest-value: doc assumes something the code contradicts
### S1 — `infra_backups` vs `infra_backup_versions` is inverted (should-fix, code-contradiction)
05 §9: *"`infra_backup_versions` retires; `infra_backups` is repurposed into the slim DR record."*
§11 repeats: *"RETIRE `infra_backup_versions`; repurpose `infra_backups`."*
The code is the other way round:
- `infra_backup_versions` (added v0.7.0, `store.go:198-211`) is the **live/primary** table. **Every read
path hits it**: `GetInfraBackup` (`store.go:565-578`), `GetInfraBackupByID` (`store.go:581-593`),
`GetInfraBackupMeta` (`store.go:604`), `ListInfraBackupVersions` (`store.go:640`), and the recovery
endpoint (`handler.go:670-686`).
- `infra_backups` (original single-row, `store.go:96-100`) is **deprecated**. It is now **written only
as a legacy mirror** ("for backward compatibility during rollback window", `store.go:552-558`) and is
**never read** except as the one-time migration *source* (`store.go:214-217`).
So the doc proposes retiring the current table and repurposing the dead one. Under the LOCKED clean
cutover both are discarded anyway, so blast radius is low — but an implementer following §9/§11
literally would point the DR record at the wrong table.
**Fix:** take §11's own alternative — *fold the slim DR record onto `hosts`* and **drop both**
infra-backup tables. If a standalone table is kept, base it on `infra_backup_versions` (the one with the
data/readers), and correct the "which is current" framing.
### S2 — the CF API token is **not** geo-only; it is the ACME token too, and ships into `controller.yaml` (should-fix, code-contradiction)
05 §7: *"The hub already holds the CF API token (the config form notes Zone WAF:Edit)… rather than
pushing the token down to the controller… The controller no longer calls the CF API."*
Grounding confirms the hub **does** hold the token and **does** have a remove-all path:
`config_json → infrastructure.cf_api_token` (`configs.go:714-715,1041-1042,1089-1096`) →
`cfClient.RemoveGeoRules(cfToken, cfg.Domain, …)` in `handleGeoDisable` (`configs.go:1112`), route
`/customers/{id}/geo/disable` (`server.go:201-205`). ✓ The §7 framing of geo-enforcement-moves-to-hub
is also consistent with Part 1 §5/§7 and Part 3 §2/§46.
**But the doc's assumption that the token is *for geo* is contradicted by the code:** the same
`cf_api_token` is **dual-purpose**:
- the config-form hint says **"Zone DNS:Edit (ACME), Zone WAF:Edit (geo)"** (`config_form.html:80`),
- `controller.yaml.default:26` documents it as the **"Cloudflare API token (DNS-01 challenge)"**,
- and it is **deep-merged into the generated `controller.yaml`** via `configgen.Generate` (config_json
overrides, `configgen.go:28-37`), i.e. **today it is shipped down to the box** and served at
`/config/{id}` and `/recovery/{id}`.
Consequences §7 must address:
1. **"Token off the controller" is incomplete** if the box still does DNS-01/ACME. In the CF-Tunnel
model the box may no longer need a public cert at all (edge-terminated), making the ACME use moot —
but that is an assumption the doc must state, not skip. Either confirm ACME is gone, or the CF token
cannot fully come off the box.
2. **`configgen` must stop emitting `cf_api_token` into `controller.yaml`** (or relocate it to a
hub-only field). As written, the generated config still carries it.
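One way to satisfy consequence 2 is to strip hub-only keys after the deep merge and before the YAML is rendered. The helper below is a hypothetical sketch, not the actual `configgen` code; only the `infrastructure.cf_api_token` path comes from the source, and the other keys and values are placeholders.

```go
package main

import "fmt"

// stripPath deletes the key addressed by path (one segment per nesting
// level) from a nested config map and reports whether it was present.
func stripPath(cfg map[string]any, path ...string) bool {
	if len(path) == 0 {
		return false
	}
	cur := cfg
	for _, seg := range path[:len(path)-1] {
		next, ok := cur[seg].(map[string]any)
		if !ok {
			return false
		}
		cur = next
	}
	last := path[len(path)-1]
	if _, ok := cur[last]; !ok {
		return false
	}
	delete(cur, last)
	return true
}

func main() {
	// merged stands in for the deep-merged config; values are placeholders.
	merged := map[string]any{
		"domain": "example.test",
		"infrastructure": map[string]any{
			"cf_api_token":  "hub-only-secret",
			"other_setting": "value",
		},
	}
	removed := stripPath(merged, "infrastructure", "cf_api_token")
	fmt.Println(removed, merged)
}
```

Stripping at generation time covers both places the generated config is served from, which is why it is sketched here rather than at the handlers.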
---
## Should-fix
### S3 — §6 "Push" is an inbound-to-box mechanism the new model forbids
05 §6: *"the operator edits a customer's desired state (building on the existing config-form +
Push/Pull/Diff)."* The form + diff/pull/push handlers exist — `handlePushConfig` (`configs.go:569`),
`handlePullConfig` (`configs.go:952`), `handleConfigDiff` (`configs.go:861`), routes at
`server.go:209-229`. ✓ So the UI base is real.
The wrinkle: **"Push" today is a hub→controller POST, outbound from the hub and inbound to the box** (`handlePushConfig` "sends the generated
YAML config to the controller", `configs.go:569-570`), as is the geo-disable notify
(`notifyControllerGeoDisable` → `POST controllerURL/api/geo/settings`, `configs.go:1148-1153`). Both are
the hub **connecting into the box** — explicitly disallowed by the box-initiated model (Part 1 §4
"the hub never initiates inbound"; §5 row `agent↔hub`/`controller↔hub` = outbound poll; Part 3 §5 "The
hub never connects inbound"). 05's own §5 already resolves this (desired state is **pulled** on poll
with a `desired_generation`). So the doc is internally consistent in *mechanism* but loose in *wording*:
**make §6 explicit that "Push" becomes "publish to desired state, delivered on the next agent/controller
poll," not a reuse of the inbound push transport.** The form/diff UX carries over; the transport inverts.
(Same applies to the geo-disable controller-notify path.)
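The inverted transport can be sketched as a generation counter that the box polls against. This is an illustration only: `DesiredState`, `PollResponse`, `Publish`, and `Poll` are hypothetical names, loosely modeled on the `desired_generation` and `desired_spec_json` columns the doc proposes.

```go
package main

import "fmt"

// DesiredState is a hypothetical hub-side record of operator intent.
type DesiredState struct {
	Generation int64
	SpecJSON   string
}

// PollResponse is what the hub returns on the box's outbound poll.
type PollResponse struct {
	Changed    bool
	Generation int64
	SpecJSON   string
}

// Publish replaces the inbound push: it records the new spec and bumps the
// generation. Nothing is sent to the box.
func (d *DesiredState) Publish(spec string) {
	d.Generation++
	d.SpecJSON = spec
}

// Poll answers a box that reports the generation it has already applied.
func (d *DesiredState) Poll(applied int64) PollResponse {
	if applied >= d.Generation {
		return PollResponse{Generation: d.Generation}
	}
	return PollResponse{Changed: true, Generation: d.Generation, SpecJSON: d.SpecJSON}
}

func main() {
	var d DesiredState
	d.Publish(`{"geo":"disabled"}`)
	fmt.Printf("%+v\n", d.Poll(0)) // box is behind: spec is delivered
	fmt.Printf("%+v\n", d.Poll(1)) // box is current: nothing to do
}
```

With this shape the operator-facing "Push" button only calls `Publish`; delivery latency is bounded by the poll interval rather than by hub-to-box reachability.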
### S4 — "declarative app inventory" (Part 1 §6) vs "apps are mirrored, never reconciled" (05 §1)
Part 1 §6 lists the durable record as including a **"declarative app inventory"** that survives box loss
— wording that implies an operator-authored, re-deployable spec. 05 §1 (LOCKED two-driver model) is
explicit the opposite way: *"Apps never enter the reconcile loop… the hub only mirrors the resulting
inventory… the app/customer layer is mirrored,"* and 05 §9 restores apps **from the PBS guest snapshot**,
not by re-deploying a declared inventory. These are reconcilable (the mirror *is* durable last-known
truth) but the word "declarative" contradicts the locked framing and the §9 restore-from-snapshot path.
**Fix (align the older doc to the locked model):** in Part 1 §6 change "declarative app inventory" →
"mirrored / last-reported app inventory," and note apps are recovered from the guest snapshot, not
re-declared. (Flagging an internal inconsistency, not relitigating the locked premise.)
### S5 — §9 reads DR inputs from a prunable mirror; soundness rests on durable sources
05 §9 hands the recovering agent *"identity, tunnel token, storage manifest, PBS namespace, guest
inventory + snapshots."* §3 places "guest inventory" and "latest PBS snapshot pointers" in
`host_reports` — the bottom-up mirror. But reports are **pruned** (`Prune` deletes rows older than
`maxDays`, `store.go:809-816`; the doc keeps this), so after a long pre-DR outage the last `host_report`
can be gone or stale. The actually-durable DR inputs are: desired-state on `hosts`/`guests` (§5), the
slim DR record (PBS namespace + repo fingerprint + wrapped escrow key, §9/§11), and **PBS's own snapshot
enumeration** (the agent lists snapshots once it has the namespace + unwrapped key). The mirrored
inventory/pointers are convenience, not the source of record.
**Fix:** state in §9 that DR reads from the durable sources (desired-state + DR record + PBS), **not**
from prunable `host_reports`, so recovery doesn't degrade when the last report has aged out. This also
keeps §1's two-driver discipline clean: DR must not depend on bottom-up mirror rows being retained.
(Note: the `hosts` row legitimately mixes top-down intent columns with a few box-reported columns —
repo fingerprint, wrapped escrow key. That is fine; just label them as box-reported so the §1 split
stays legible at the column level.)
### S6 — backup-deadline checker: doc says field-based, code is event-based (and re-emitter changes)
05 §4: *"The existing backup-deadline checker maps onto `host_reports`' last-backup-per-target."* The
existing checker is **event-based**, not field-based: `CheckBackupDeadlines` looks for
`backup_completed` / `backup_failed` (and `db_dump_*`) **events** since Budapest midnight and emits
`expected_backup_missed` if neither is present (`deadline.go:31-86`). Two changes the doc should make
explicit:
1. **Mechanism:** either keep it event-based (someone emits `backup_completed`) or genuinely move it to
a `host_reports.last_backup_per_target` field check — the doc says the latter but the impl is the
former.
2. **Emitter:** today the **controller** emits backup events; in the de-privileged model the **agent**
owns backup/PBS (Part 3 §8), so the agent must now emit `backup_completed`/`backup_failed` (or the
host report carries last-backup-per-target). Without re-homing the emitter, the deadline check goes
silent after the controller stops doing backups.
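For reference, the event-based mechanism reduces to a scan like the one below. It is a simplified sketch, not the code in `deadline.go`: the `Event` type and `backupMissed` are hypothetical, the `db_dump_*` events are ignored, and the cutoff is passed in (UTC in the demo) instead of being computed as Budapest midnight.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical, trimmed-down hub event.
type Event struct {
	Type string
	At   time.Time
}

// backupMissed reports whether no backup outcome event (completed or
// failed) has been seen at or after the cutoff.
func backupMissed(events []Event, since time.Time) bool {
	for _, e := range events {
		if e.At.Before(since) {
			continue
		}
		switch e.Type {
		case "backup_completed", "backup_failed":
			return false
		}
	}
	return true
}

// at returns a fixed demo timestamp; out-of-range hours are normalized by
// time.Date, so at(-3) is 21:00 on the previous day.
func at(hour int) time.Time {
	return time.Date(2026, 6, 8, hour, 0, 0, 0, time.UTC)
}

func main() {
	midnight := at(0)
	events := []Event{
		{Type: "backup_completed", At: at(-3)}, // outside today's window
		{Type: "node_recovered", At: at(2)},
	}
	if backupMissed(events, midnight) {
		fmt.Println("emit expected_backup_missed")
	}
}
```

Whoever owns backups has to produce the two outcome events for this check to stay meaningful, which is the re-homing point made in item 2.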
---
## Minor
- **M1 — "60s staleness checker" (§4).** 60s is the **check cadence** (`main.go:207-217`,
`ticker := time.NewTicker(60 * time.Second)`); the **staleness threshold** is 30m (default,
`main.go:99-102`) with down at 2× = 60m (`staleness.go:33-37`; CLAUDE.md "OK <30m, DOWN >1h"). The
event-transition mechanism (`node_stale`/`node_down`/`node_recovered`) is described correctly
(`staleness.go:155-185`). Reword to "the staleness checker (60s cadence, 30m/1h thresholds)."
- **M2 — `customer_configs` fields (§2).** The list ("identity, domain, email, retrieval_password,
status, config_json") omits **`api_key`** (`store.go:108`) — the field §2's per-reporter plan
actually retires. Worth noting `customer_configs.api_key` becomes unused once auth resolves via
`hosts.api_key` / `guests.api_key`.
- **M3 — rename under clean cutover (§11).** `migrate()` is all `CREATE TABLE IF NOT EXISTS` +
idempotent `ALTER` (`store.go:55-119,146-149`). §11's claim "grounded in store.go's idempotent style"
is accurate. But a `reports` → `guest_reports` **rename** isn't part of that style; under the LOCKED
clean cutover (demo re-enrolls fresh, §2) it is really **drop `reports` + create `guest_reports`**
with no data migration. Name it as such to avoid implying an in-place rename + backfill.
- **M4 — pre-existing weak authz.** `handleInfraBackupGet`/`Versions` and `handleNotify`/
`handleSavePreferences`/`handleInfraBackupPush` use `checkAuth` (global **or any** customer key,
`handler.go:63-66`), not customer-scoped `checkAuthCustomer`. Most retire with the infra-backup push
(§9); for any that carry over, the per-reporter model (§2) should scope them to the resolved
host/guest→customer. Not a regression the doc introduces — a cleanup the cutover enables.
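To make the M1 distinction concrete, the cadence and the thresholds are separate quantities. The sketch below is not the code in `staleness.go`; `classify` and the constant names are hypothetical, and the exact boundary handling (strictly greater than) is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

const (
	checkCadence   = 60 * time.Second // how often the checker runs
	staleThreshold = 30 * time.Minute // default staleness threshold
)

// classify maps the age of the last report to a state. "down" starts at
// twice the stale threshold, i.e. 60m with the default.
func classify(age, threshold time.Duration) string {
	switch {
	case age > 2*threshold:
		return "down"
	case age > threshold:
		return "stale"
	default:
		return "ok"
	}
}

func main() {
	for _, age := range []time.Duration{10 * time.Minute, 45 * time.Minute, 90 * time.Minute} {
		fmt.Printf("%v -> %s\n", age, classify(age, staleThreshold))
	}
	fmt.Println("checked every", checkCadence)
}
```

Changing `checkCadence` only changes how quickly a transition is noticed; it never changes which state a given report age maps to.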
---
## Confirmed accurate (grounding that holds — so the rest of the doc can be trusted)
- **§10 KEEP list** matches the schema exactly: `customer_configs`, `events`, `customer_notifications`,
`notification_log`, `app_telemetry`, `app_log_issues` all present (`store.go:74-189,102-135`). The
asset manager exists (`handler.go:57,834-867`). ✓
- **§10 two-tier notifications** (operator English / customer Hungarian, Resend, cooldowns) match
`notify/dispatcher.go`: `processOperator` (1h cooldown, `FormatOperatorEmail`, gated by `operatorOn`,
`dispatcher.go:91-114`) + `processCustomer` (prefs-driven, default 6h, `FormatCustomerEmail`,
`dispatcher.go:116-158`); wired in `main.go:134`. ✓
- **§8 enrollment / §11 configgen** — deep-merge + Hungarian passphrase base is real:
`configgen.deepMerge` (`configgen.go:76-91`), programmatic overrides + `hub.api_key = cfg.APIKey`
(`configgen.go:40-47`), retrieval-password gate (`handler.go:709-753`). The evolution to agent-first +
per-guest keys + key-pinning is a clean extension. ✓
- **§2 auth extension** (Bearer → reporter → customer) is clean against today's
`checkAuthCustomer` (global key, else `GetCustomerConfigByAPIKey`, `handler.go:72-90`,
`store.go:913-935`); adding host/guest key lookups slots straight in. ✓
- **§11 "idempotent style"** is accurate (`store.go:55-119`). New tables/columns (`hosts`, `guests`,
`host_reports`, `signed_ops`, `desired_generation`, `desired_spec_json`) follow the existing
`CREATE IF NOT EXISTS` / `ALTER … ` pattern cleanly.
- **§9 escrow/custody** is consistent with Part 1 §8 (three-tier custody, zero-knowledge default,
recovery-code-wrapped PBS keyfile, operator can't open) and Part 3 §8 (live PBS key on the box for
backup + restore-test; hub holds only the wrapped escrow). The "customer recovery code is the
irreducible residual; operator-managed tier avoids it" matches Part 1 §8 verbatim in spirit. ✓
- **§4 dead-man's-switch** (host-report recency = primary liveness) is consistent with Part 3 §5
("the heartbeat *is* the liveness signal… first-class treatment hub-side"). ✓
- **§5/§6 signed-op + desired-state** are consistent with Part 4 and Part 3 §4:
hub holds **no** signing key and queues opaque blobs (Part 4 §5; 05 §6 "The hub holds no signing
key"); agent runs the verify pipeline and is the authoritative guard (Part 4 §2.3, Part 3 §4; 05 §6
"the agent's classification is the authoritative guard"); hub classifies at edit-time for UX only.
05 §6's `signed_ops` columns are a consistent superset of Part 4 §2.1's blob
`{op, target:{host_id,guest_id}, params, nonce, issued_at, expires_at, key_id}` (05 adds hub-side
lifecycle states `delivered`/`rejected` — fine). The local-CLI hand-off (`felhom-sign --pending`)
matches Part 4 §56's `Signer`-on-the-workstation model. ✓
## Two-driver soundness (axis 3) — holds
No place in 05 has the hub **drive** box/customer-owned state. Desired-state (§5) is all infrastructure
intent (guests, storage *policy*, versions, identity, tunnel) — top-down and legitimate. Apps are
explicitly excluded from reconcile (§1, §5) and mirrored only. Storage is the handshake (detect →
assign policy → reconcile policy onto the detected drive), matching Part 3 §7. The one nuance (S5): the
`hosts` row holds both top-down intent and a few box-reported columns (repo fingerprint, wrapped escrow
key) — acceptable, just label provenance per column. Reconcile (§5) never collides with app/storage
reality because the reality columns (`durable_id` attached, snapshot pointers, app inventory) are
mirror-only and never serve as desired state.
## DR completeness (axis 4) — safe to retire the heavy push, with S5's clarification
Retiring the controller's infra-backup push is safe **given** that DR reads from durable sources, not
the prunable mirror (S5). What the old push carried — `deployed_stacks` + `disk_layout.mounts`
(`store.go:768-795`, surfaced by `handleRecovery`, `handler.go:620-705`) — is reconstructible:
storage layout/`durable_id`s from the storage manifest (desired-state, durable) + host-report mirror;
app inventory from the guest **inside the PBS snapshot** (so it need not be separately stored); snapshot
list from PBS itself. The one artifact only the box can produce — the recovery-code-wrapped PBS key — is
explicitly escrowed (§9), zero-knowledge, consistent with Part 1 §8 / Part 3 §8. So nothing
DR-essential is lost by removing the push **provided** §9 is amended per S5 to name durable sources and
not lean on `host_reports` retention.
@@ -0,0 +1,385 @@
# Proxmox Platform Reference
Authoritative, living reference for the Proxmox platform underneath `felhom-agent`.
It records **facts about Proxmox and what we validated about it** — not Felhom design
decisions. Where a design choice exists, this doc points to the (future) controller
architecture document rather than making the choice here.
**Evidence base** (raw, chronological spike logs — kept as the underlying record):
- [tests/phase0-findings.md](tests/phase0-findings.md) — VM-vs-LXC overhead, Docker-in-LXC viability
- [tests/phase1-2-findings.md](tests/phase1-2-findings.md) — privilege model, backup/restore round-trip
- [tests/Proxmox_Spike_-_API_&_Access-Control_Reference.md](tests/Proxmox_Spike_-_API_&_Access-Control_Reference.md) — **superseded** pre-spike reference (contains a known privsep error; do not cite as authoritative)
Every nontrivial claim links to its evidence section. Validated on a single host
(`demo-felhom`, 192.168.0.162, 4 vCPU / 16 GB) on 2026-06-07; treat single-run timings and
measurements as indicative, not benchmarks.
---
## 1. Platform baseline
Validated stack [[phase0 §1](tests/phase0-findings.md)]:
| Component | Version |
|---|---|
| Proxmox VE (`pve-manager`) | **9.2.2** (`b9984c6d90a4bd80`) |
| OS | Debian 13 (Trixie) |
| Kernel | proxmox-kernel **7.0.2-6-pve** |
| `pve-qemu-kvm` | 11.0.0-3 |
| `qemu-server` | 9.1.15 |
| `pve-container` | 6.1.10 |
| `lxc-pve` / `lxcfs` | 7.0.0-2 / 7.0.0-pve1 |
| `criu` | 4.1.1-1 |
`pvesh get /version` → release 9.2. Always confirm the node name on the box
(`pvesh get /nodes`) rather than hard-coding it.
### 1.1 Storage backends
Two backends were present and exercised [[phase0 §1](tests/phase0-findings.md), [phase1-2 §pre-flight](tests/phase1-2-findings.md)]:
| Storage | Type | Path / VG | Content types | Holds |
|---|---|---|---|---|
| `local` | `dir` | `/var/lib/vz` | `iso, vztmpl, backup, import` | ISOs, CT templates, **vzdump archives** |
| `local-lvm` | `lvmthin` | VG `pve`, thinpool `data` | `rootdir, images` | guest disk volumes |
**Why backups cannot live on LVM-thin:** LVM-thin is a *block* backend — it allocates
logical volumes for guest disks. Backup archives and templates are *files*, which require a
file-level backend (`dir`, NFS, CIFS, or PBS). A `vzdump` target must therefore be a
storage whose content types include `backup` (here, `local`); pointing `vzdump` at
`local-lvm` is not valid. [[phase1-2 §pre-flight / §2.1](tests/phase1-2-findings.md)]
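A pre-flight check for a `vzdump` target follows directly from the content-type rule. This is a minimal sketch: `canHoldBackups` is a hypothetical helper, and the content strings are copied from the storage table above rather than read from the Proxmox API.

```go
package main

import (
	"fmt"
	"strings"
)

// canHoldBackups reports whether a storage's comma-separated content-type
// list includes "backup".
func canHoldBackups(content string) bool {
	for _, c := range strings.Split(content, ",") {
		if strings.TrimSpace(c) == "backup" {
			return true
		}
	}
	return false
}

func main() {
	storages := []struct{ name, content string }{
		{"local", "iso, vztmpl, backup, import"},
		{"local-lvm", "rootdir, images"},
	}
	for _, s := range storages {
		fmt.Printf("%s: valid vzdump target = %v\n", s.name, canHoldBackups(s.content))
	}
}
```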
### 1.2 Repositories
PVE 9 uses **deb822** `.sources` files under `/etc/apt/sources.list.d/`. For a host
without a subscription, the enterprise repos (`pve-enterprise.sources`,
`ceph-*-enterprise.sources`) must be disabled (they return 401) and a no-subscription repo
enabled. *The spike host arrived with the no-subscription repo already configured and the
host updated [[phase0 baseline](tests/phase0-findings.md)]; the repo setup itself was not a
spike deliverable* — the canonical no-subscription `.sources` is the standard Proxmox 9
procedure (`/etc/apt/sources.list.d/pve-no-subscription.sources` with
`Components: pve-no-subscription`). Treat the exact commands as standard setup, not
spike-validated.
**Docker repository (validated):** Docker's official apt repo **has a `trixie` channel**;
no fallback to Debian's `docker.io` was needed. Installed Docker **29.5.3** from it in both
guest types. [[phase0 §1](tests/phase0-findings.md)]
---
## 2. Guest model (LXC vs VM) — validated facts
Both guest types ran the **identical** workload (Debian 13, Docker 29.5.3, a
postgres/redis/nginx compose stack) under identical resources (2 vCPU, 2048 MB, ~10 GB)
[[phase0](tests/phase0-findings.md)].
### 2.1 Isolation characteristic (fact, not recommendation)
- **LXC** is an OS-level container: it **shares the host kernel**. Docker-in-LXC needs the
container configured for nesting (see §2.3).
- **VM** runs its **own guest kernel** under KVM/QEMU, with full hardware-level isolation
and its own firmware.
The trade-offs below follow directly from this difference.
### 2.2 Resource overhead (measured)
Host RAM used = `MemTotal - MemAvailable`, deltas vs a both-stopped baseline of 1702 MB;
one guest measured at a time [[phase0 §2](tests/phase0-findings.md)]:
| Metric | LXC | VM | Note |
|---|---|---|---|
| Idle host-RAM delta | **+211 MB** | **+2056 MB** | structural, see below |
| Under-load host-RAM delta | **+410 MB** | **+2084 MB** | |
| Per-guest attribution | cgroup `memory.current` 1961 MB¹ | KVM RSS ~2031 MB | |
| Idle host CPU used | ~0.3 % | ~6.0 % | VM has an emulation/guest-kernel floor |
| Under-load host CPU used | ~39.4 % | ~53.9 % | VM work shows as `%guest` (31.9 %) |
| pgbench throughput | 2211 tps | 1820 tps | identical load, 0 failed both |
| Disk used (host thin-LV) | ~2.67 GiB | ~2.94 GiB | of 10 GiB allocated |
| Provisioning (create→ready) | ~10-15 s | ~60-75 s | template-extract vs qcow2-import+boot |
¹ `cgroup memory.current` counts reclaimable page cache shared with the host and
**overstates** the LXC's true incremental cost; the +211 MB host delta is the honest
number [[phase0 §4.4](tests/phase0-findings.md)].
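The host-side measurement can be reproduced with a few lines of shell. This is a minimal sketch, assuming the standard `/proc/meminfo` layout (values in kB); the function names are hypothetical and the spike's actual sampling harness is not shown in this document.

```bash
# Host RAM used in MB, computed as MemTotal - MemAvailable from a
# meminfo-format file (defaults to /proc/meminfo, whose values are in kB).
host_ram_used_mb() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END              { printf "%d\n", (total - avail) / 1024 }' "${1:-/proc/meminfo}"
}

# Delta in MB against a baseline (the spike's both-stopped baseline was 1702 MB).
# $1 = baseline in MB, $2 = optional meminfo-format file
host_ram_delta_mb() {
    used_mb=$(host_ram_used_mb "${2:-/proc/meminfo}")
    echo $(( used_mb - $1 ))
}
```

Calling `host_ram_delta_mb 1702` on the host with one guest running yields the per-guest delta in the same terms as the table.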
**Why the RAM gap is structural** [[phase0 §4.3](tests/phase0-findings.md)]: LXC processes
share the host kernel and page cache, so only the working set counts against the host. A VM
with **no ballooning configured** has KVM back every guest-touched page (including the
guest's own page cache), so its host cost ≈ the full RAM allocation and is largely
load-independent. *Ballooning / KSM were not tested* and could change the VM figure.
### 2.3 Docker-in-LXC viability (validated)
Docker ran **cleanly in an *unprivileged* LXC** configured with
`--features nesting=1,keyctl=1 --unprivileged 1` (PVE 9 syntax, accepted by `pct create`)
[[phase0 §3](tests/phase0-findings.md)]:
- `docker run hello-world` → success; full 3-container stack healthy.
- **Storage driver: `overlayfs`** (cgroup v2, systemd cgroup driver) — **no `vfs`
fallback**. (Docker 29 names the overlay driver `overlayfs` via the containerd
snapshotter image store; same overlay technology as the legacy `overlay2`.)
- Named volume persisted writes; multi-container networking + published port worked
(`curl localhost:8080` → 200); 0 failed transactions under load.
- No privileged-container fallback was needed.
### 2.4 Guest agent & app-consistency capability
- **VM:** `qemu-guest-agent` installs and reports (`agent: 1`), enabling
`guest-fsfreeze`-based app-consistent `snapshot` backups [[phase0 §4.8](tests/phase0-findings.md)].
The Debian genericcloud image does **not** ship the agent — it must be installed
in-guest.
- **LXC:** no guest agent exists → **no fsfreeze** (see §4.2).
---
## 3. API & access control
### 3.1 Fundamentals
- **Base URL:** `https://<host>:8006/api2/json`. Every `pve*` CLI is a thin wrapper over
this REST API.
- **Token auth header:** `Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET`. The
secret is shown **once** at creation. Response envelope: `{"data": ...}`.
- **TLS reality:** the host serves the default **self-signed** certificate. `curl` without
`-k` fails `SSL certificate problem: unable to get local issuer certificate`
[[phase1-2 §1.5](tests/phase1-2-findings.md)]. Production trust (pin the PVE CA / install
a real cert) is a separate, not-yet-decided concern.
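The fundamentals above translate into a very small client. The following is a sketch under stated assumptions, not a validated tool: `pve_auth_header` and `pve_get` are hypothetical names, `curl` is assumed to be installed, and `-k` is used only because the spike host serves a self-signed certificate.

```bash
# Build the token auth header. The single-quoted format keeps "!" literal.
# $1 = user@realm, $2 = token id, $3 = secret
pve_auth_header() {
    printf 'Authorization: PVEAPIToken=%s!%s=%s\n' "$1" "$2" "$3"
}

# GET an API path and print the raw JSON envelope ({"data": ...}).
# -k skips certificate verification; acceptable only until production
# trust is decided.
# $1 = host, $2 = API path (e.g. /version), $3 = full auth header
pve_get() {
    curl -sS -k -H "$3" "https://$1:8006/api2/json$2"
}
```

Usage would look like `pve_get "<host>" /version "$(pve_auth_header '<user>@pve' '<tokenid>' "$SECRET")"`, with the secret supplied from the environment rather than written into a script.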
### 3.2 RBAC model
An ACL entry is a triple **(path, principal, role)**; a role is a bundle of privileges,
assigned at the most specific path. Paths include `/`, `/vms/<vmid>`, `/nodes/<node>`,
`/storage/<store>`, `/pool/<pool>`, `/access/...`.
Introspection (**corrected for PVE 9**) [[phase1-2 §1.1](tests/phase1-2-findings.md)]:
- `pveum role list` — lists roles **with their privileges**.
- ⚠️ `pveum role info <role>` **does not exist in PVE 9** (the old reference used it).
- `pveum acl list`, `pveum user permissions <user> --path <path>`.
### 3.3 Privilege-separated tokens — the intersection rule (corrected)
> **A privsep token's (`--privsep 1`) effective permissions are the *intersection* of (a)
> the backing user's permissions and (b) the token's own ACLs.** The role must therefore be
> granted on **BOTH the user AND the token** for the same path. Granting it on the token
> only yields an **empty intersection** and a **403 even on self-calls.**
> [[phase1-2 §1.2](tests/phase1-2-findings.md)]
This corrects the superseded reference (§3 there grants the ACL to the token only). The
intersection is what keeps a privsep token ≤ its user while still being independently
scopeable to a narrow path.
Working pattern (validated):
```bash
pveum role add <Role> -privs "<priv> <priv> ..." # NB: -privs is space-separated
pveum user add <user>@pve
pveum user token add <user>@pve <tokenid> --privsep 1 # capture SECRET (shown once)
pveum acl modify <path> -user '<user>@pve' -role <Role> # BOTH the user...
pveum acl modify <path> -token '<user>@pve!<tokenid>' -role <Role> # ...AND the token
```
`pveum acl delete` **requires `--roles`** (a bare `-user`/`-token` path errors
`400 roles: property is missing`). Deleting the token/user/role auto-invalidates the
referencing ACLs. [[phase1-2 §5](tests/phase1-2-findings.md)]
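The intersection rule can be modelled as a set operation on privilege lists. This is a deliberately simplified illustration: `effective_privs` is a hypothetical helper that considers a single path only, whereas PVE evaluates ACLs per path.

```bash
# Simplified model of a privsep token's effective privileges on ONE path:
# the intersection of the user's and the token's space-separated lists.
# $1 = privileges granted to the user, $2 = privileges granted to the token
effective_privs() {
    out=""
    for p in $2; do
        case " $1 " in
            *" $p "*) out="${out:+$out }$p" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Token-only grant: the user side is empty, so nothing survives.
echo "token only : '$(effective_privs "" "VM.Audit VM.Backup")'"
# Dual grant: the role is on both, so the full set survives.
echo "dual grant : '$(effective_privs "VM.Audit VM.Backup" "VM.Audit VM.Backup")'"
```

The first call prints an empty set, which corresponds to the 403 on self-calls described above.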
### 3.4 Validated minimal self-backup role
A token scoped to **one VMID + the backup datastore** can audit, snapshot, and back up
**only that guest**, and is denied on every other guest and on create/allocate
[[phase1-2 §1.3-1.4](tests/phase1-2-findings.md)]:
> **Minimal role for self-audit + self-snapshot + both `snapshot`- and `stop`-mode
> self-backup:**
> `VM.Audit, VM.Snapshot, VM.Backup, Datastore.AllocateSpace, Datastore.Audit`
⚠️ **`VM.PowerMgmt` is NOT required for stop-mode backup** — `vzdump` performs the guest
shutdown/restart internally under `VM.Backup` (tested: stop-mode self-backup returned
`exitstatus OK` without it) [[phase1-2 §1.4](tests/phase1-2-findings.md)]. This corrects the
old reference's "likely yes" guess.
Validated boundary (token scoped to `/vms/<self>` + `/storage/local`):
| Operation | Result |
|---|---|
| `GET /version` | 200 |
| `GET` self status, `POST` self snapshot, `POST` self vzdump | 200 / task `OK` |
| `GET`/`POST` against **another** guest's vmid | **403** (read) / task **403** (backup) |
| `POST /nodes/<node>/lxc` (create/allocate a guest) | **403** — create/allocate is operator-tier |
### 3.5 Async tasks — trust `exitstatus`, not the POST
Long operations (`vzdump`, `snapshot`, clone, restore) return a **UPID**, not a result.
Poll `GET /nodes/<node>/tasks/<upid>/status` until `status: stopped`, then read
`exitstatus` [[phase1-2 §1.3](tests/phase1-2-findings.md)].
> ⚠️ **Authorization can surface at task execution, not at the HTTP POST.** A `vzdump`
> against an unauthorized vmid returns **HTTP 200 + a UPID**, but the task then ends
> `exitstatus: "403 Permission check failed (/vms/<id>, VM.Backup)"` and produces **no
> archive**. A caller that trusts the 200 would wrongly believe the backup ran. Always poll
> the task and check `exitstatus`.
(The task owner — including a token — can read its own task status: 200.)
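A polling loop that follows this rule might look like the sketch below. It assumes `curl` and `jq` are available and that the status response carries `data.status` and `data.exitstatus` as described above; `task_outcome` and `wait_for_task` are hypothetical names, and the 2-second interval is arbitrary.

```bash
# Classify a task from the two status fields. Prints: running | ok | failed
# $1 = status ("running" or "stopped"), $2 = exitstatus (may be empty)
task_outcome() {
    if [ "$1" != "stopped" ]; then
        echo running
    elif [ "$2" = "OK" ]; then
        echo ok
    else
        echo failed
    fi
}

# Poll a UPID until the task stops. Returns 0 only when exitstatus is OK,
# 1 when the task failed, 2 when the status could not be read.
# $1 = host, $2 = node, $3 = UPID, $4 = full auth header
wait_for_task() {
    while :; do
        body=$(curl -sS -k -H "$4" \
            "https://$1:8006/api2/json/nodes/$2/tasks/$3/status") || return 2
        status=$(printf '%s' "$body" | jq -r '.data.status // empty')
        exitstatus=$(printf '%s' "$body" | jq -r '.data.exitstatus // empty')
        [ -n "$status" ] || return 2
        case $(task_outcome "$status" "$exitstatus") in
            ok)     return 0 ;;
            failed) echo "task failed: $exitstatus" >&2; return 1 ;;
        esac
        sleep 2
    done
}
```

The HTTP status of the original POST never enters the decision; only `exitstatus` does.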
### 3.6 Operator-tier agent role & root-vs-API boundary (validated)
The operator-tier **host agent** (`03-host-agent.md`) needs a far broader role than the
Phase-1 *guest self-backup* role (which is denied create/allocate — §3.4). The minimal role
that drives the full guest lifecycle via an API token, validated by paring
[[phase3 §B3](tests/phase3-findings.md)]:
> **`FelhomAgent` (operator-tier, 16 privileges):**
> `VM.Allocate, VM.Audit, VM.Config.Disk, VM.Config.CPU, VM.Config.Memory, VM.Config.Network,
> VM.Config.Options, VM.PowerMgmt, VM.Snapshot, VM.Snapshot.Rollback, VM.Backup,
> Datastore.Allocate, Datastore.AllocateSpace, Datastore.Audit, Sys.Audit, SDN.Use`
>
> Paring proved: `SDN.Use` is **required** (PVE 9 gates bridge use; omitting it → `403
> (/sdn/zones/localnetwork/vmbr0, SDN.Use)`); `Sys.Audit` required for host metrics
> (`GET /nodes/<node>/status`); `VM.Config.Network`/`VM.Config.Options` required for NIC/onboot
> config; `Datastore.AllocateTemplate` **not** needed (drop it). NB `VM.Config.CPUMemory` is
> not a real privilege — it is `VM.Config.CPU` + `VM.Config.Memory`.
**Root-vs-API boundary** [[phase3 §B3](tests/phase3-findings.md)] — nearly the entire guest
lifecycle, **including restore**, is API-token-covered; the genuine OS-root residual is narrow:
| Operation | Coverage |
|---|---|
| Create LXC (nesting-only), config, allocate, start/stop, snapshot/rollback, vzdump, **restore**, destroy, add storage definition, host metrics | **scoped API token** (the `FelhomAgent` role) |
| ⚠️ **Create LXC with `keyctl=1`** (Docker needs it — §2.3) | **OS root `root@pam` only** |
| USB physical mount-by-UUID / systemd mount unit / fstab; SMART/sensors | OS root / narrow sudoers |
> ⚠️ **`keyctl=1` (and any feature flag except `nesting`) can be set only by an actual
> `root@pam` session** — `changing feature flags (except nesting) is only allowed for
> root@pam`. **No API token qualifies**, not even a non-privsep `root@pam` token (same 403).
> So *fresh provisioning* of a Docker-capable LXC needs `pct create` as OS root (or a narrow
> sudoers entry). **Restore is exempt:** a token-authorized `vzrestore` **preserves
> `keyctl=1`** from the archive — the DR path needs no root.
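
A provisioning script can decide up front which path a create request has to take. The check below is a hypothetical pre-flight helper that only encodes the rule stated above (any feature flag other than `nesting` requires a `root@pam` session); it does not query PVE.

```bash
# Exit 0 when a features string contains any flag other than "nesting",
# i.e. when a fresh create must run as OS root rather than via an API token.
# $1 = features string, e.g. "nesting=1,keyctl=1"
features_need_root() {
    old_ifs=$IFS
    IFS=','
    for flag in $1; do
        case ${flag%%=*} in
            nesting|'') ;;
            *) IFS=$old_ifs; return 0 ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

if features_need_root "nesting=1,keyctl=1"; then
    echo "create via pct as OS root"
else
    echo "create via scoped API token"
fi
```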
---
## 4. Backup & restore (`vzdump` / `pct restore`)
### 4.1 Modes
- **`stop`** — orderly guest shutdown → backup → restart. Highest consistency, defined
downtime. (For LXC the shutdown/restart is internal to `vzdump`; it needs only `VM.Backup`,
see §3.4.)
- **`snapshot`** — lowest downtime; copies blocks while running. Consistency depends on the
guest cooperating (§4.2).
- **`suspend`** — legacy/compat, not used.
### 4.2 Consistency: crash-consistent vs quiesced, and no-fsfreeze-for-LXC
> ⚠️ **An LXC has no guest agent, so `snapshot`-mode `vzdump` does NOT fsfreeze.** A
> running-stack LXC backup is therefore **crash-consistent** (filesystem-level), not
> app-consistent. App-consistency for an LXC is the caller's job: quiesce in-guest first
> (stop the stack / flush DBs) or use `stop` mode. A **VM** with `qemu-guest-agent` gets
> `guest-fsfreeze` around the copy → near-free app-consistency. [[phase1-2 §2.1](tests/phase1-2-findings.md), [phase0 §4.8](tests/phase0-findings.md)]
**Validated restore behaviour** (LXC, Postgres) [[phase1-2 §2.2](tests/phase1-2-findings.md)]:
- **Crash-consistent (running):** on first start Postgres ran **automatic WAL recovery**
(`database system was interrupted … not properly shut down; automatic recovery in
progress … redo done … ready to accept connections`) and the data was intact.
- **Quiesced (stack stopped):** clean start, no recovery, data intact.
- Both restored correctly here on an idle-at-backup DB; this is **not** a durability
guarantee under heavy write load (§6).
### 4.3 What a backup captures
A single LXC `vzdump` captures the container rootfs **including the Docker named volumes**
(they live in the rootfs) — one backup = the whole guest and its data. Validated: a
sentinel row survived both variants [[phase1-2 §2.2](tests/phase1-2-findings.md)].
Sizes/timings (2.5 GiB source, zstd) [[phase1-2 §2.1-2.2](tests/phase1-2-findings.md)]:
backup ~934 MB (~2.7:1) in ~22-25 s; restore in ~11-12 s.
### 4.4 Restore = recreate-from-archive (identity is preserved)
There is no single "restore" call — you recreate the guest from the archive into a **new
VMID**:
- **LXC:** `pct restore <newid> <archive> --storage <store>`
- **VM:** `qmrestore <archive> <newid>` (or `POST /nodes/<node>/qemu` with `archive=`)
> ⚠️ **`pct restore` preserves the source config — including the MAC address and
> hostname.** Restoring while the original still runs causes a **MAC/hostname collision** on
> the bridge; reset network identity (`pct set <id> -net0 name=eth0,bridge=vmbr0,ip=dhcp`
> regenerates the MAC) before starting. [[phase1-2 §2.2](tests/phase1-2-findings.md)]
**Restored config survives intact:** `unprivileged: 1` and `features: nesting=1,keyctl=1`
are preserved, so Docker runs in the restored CT [[phase1-2 §2.2](tests/phase1-2-findings.md)].
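Put together, a side-by-side restore is a short, ordered sequence. The sketch below only prints the commands (a dry run) so the plan can be reviewed first. `lxc_restore_plan` is a hypothetical helper; the `pct restore` and `-net0` lines are the ones given above, while the explicit `-hostname` step is an assumption about how to reset the hostname and was not spelled out in the findings.

```bash
# Print (do not execute) the commands for a collision-free LXC restore.
# $1 = new VMID, $2 = archive, $3 = target storage, $4 = new hostname
lxc_restore_plan() {
    echo "pct restore $1 $2 --storage $3"
    # Re-declaring net0 regenerates the MAC (see the warning above).
    echo "pct set $1 -net0 name=eth0,bridge=vmbr0,ip=dhcp"
    # Assumption: give the copy its own hostname before first start.
    echo "pct set $1 -hostname $4"
    echo "pct start $1"
}

lxc_restore_plan "<newid>" "<archive>" "<store>" "<new-hostname>"
```

The guest is started only after both identity resets, which is the ordering the collision warning requires.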
### 4.5 Snapshots
A **running, unprivileged LXC can be snapshotted on LVM-thin** with no stop required
(`exitstatus OK`; snapshot listed while the CT stays `running`)
[[phase1-2 §1.6](tests/phase1-2-findings.md)]. This is the mechanism available for a
snapshot-before-change rollback flow.
### 4.6 PBS (Proxmox Backup Server)
**Not yet validated.** No PBS datastore was configured or tested in the spike. All backup
findings above are for `vzdump` to a `dir` storage. PBS (dedup, incremental, remote,
dirty-bitmap) is pending.
### 4.7 vzdump scope by LXC mount type (validated)
A stop-mode `vzdump` includes/excludes each LXC mount point by **type and the `backup` flag**
[[phase3 §B2](tests/phase3-findings.md)]. Validated three ways (vzdump log, archive grep,
restore):
| Location | `backup` flag | In the vzdump? |
|---|---|---|
| rootfs (and anything inside it) | — | **included** (always) |
| **Docker named volume** (default driver) | — | **included** — it lives in the rootfs (`/var/lib/docker/volumes/<v>/_data`) |
| volume mount point (`mpN`) | `backup=1` | included |
| volume mount point (`mpN`) | `backup=0` | **excluded** (vol recreated empty on restore) |
| bind mount point (`mpN: /host/path`) | n/a | **excluded** ("not a volume"); data is *not* in the archive |
> ⚠️ **The `backup=<boolean>` flag is honoured ONLY for *volume* mount points.** A **Docker
> named volume is in the rootfs and is always captured** — so a "bulk" volume left as a
> default named volume is silently swept into the whole-guest image. To keep bulk data **out**,
> realize it as a dedicated `backup=0` volume mount point (proven recipe:
> `pct set <id> -mpN <storage>:<size>,mp=/mnt/bulk,backup=0` then
> `docker volume create --driver local -o type=none -o o=bind -o device=/mnt/bulk bulkvol`).
> A **bind mount's** data is excluded from the archive entirely; on same-host restore it
> reappears only because the bind config re-attaches the same host dir — on a *different* host
> (true DR) it is gone unless backed up separately.
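
The table reduces to a small decision function, which can be useful when auditing a guest's mounts before relying on a backup. This is a sketch that encodes only the validated rows; `in_vzdump` and its type labels are hypothetical, and a volume mount point with no explicit `backup` flag is reported as `unspecified` because that case was not tested.

```bash
# Decide whether an LXC location ends up in a stop-mode vzdump archive.
# $1 = rootfs | named-volume | volume-mp | bind-mp
# $2 = backup flag for volume mount points (1 or 0); ignored otherwise
in_vzdump() {
    case $1 in
        rootfs|named-volume) echo included ;;   # named volumes live in the rootfs
        volume-mp)
            case $2 in
                1) echo included ;;
                0) echo excluded ;;
                *) echo unspecified ;;          # not covered by the spike
            esac ;;
        bind-mp) echo excluded ;;               # "not a volume"
        *) echo unknown; return 1 ;;
    esac
}
```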
---
## 5. Gotchas & operational notes (quick reference)
| Gotcha | Detail | Evidence |
|---|---|---|
| **deb822 repos** | PVE 9 repos are `.sources` files; disable enterprise, enable no-subscription | standard setup |
| **Privsep dual-grant** | privsep token needs the role on **both** user and token, else empty intersection → 403 | [phase1-2 §1.2](tests/phase1-2-findings.md) |
| **Async authz** | `vzdump` POST returns 200+UPID even when unauthorized; the 403 is in the task `exitstatus`; poll it | [phase1-2 §1.3](tests/phase1-2-findings.md) |
| **No fsfreeze for LXC** | running-LXC `snapshot` backup is crash-consistent only; quiesce or use `stop` for app-consistency | [phase1-2 §2.1](tests/phase1-2-findings.md) |
| **Restore identity collision** | `pct restore` keeps source MAC + hostname; reset before starting alongside the original | [phase1-2 §2.2](tests/phase1-2-findings.md) |
| **Restart policy for self-heal** | restored/rebooted containers come up `exited` with no restart policy; need a restart policy or an explicit `compose up -d` to return automatically | [phase1-2 §2.2/§3](tests/phase1-2-findings.md) |
| **Self-signed TLS** | host cert is self-signed; `curl` needs `-k` until trust is set up | [phase1-2 §1.5](tests/phase1-2-findings.md) |
| **`pveum role info` gone** | use `pveum role list` in PVE 9 | [phase1-2 §1.1](tests/phase1-2-findings.md) |
| **`pveum acl delete` needs `--roles`** | bare `-user`/`-token` path errors `400 roles: property is missing` | [phase1-2 §5](tests/phase1-2-findings.md) |
| **`VM.PowerMgmt` not needed** | stop-mode backup works under `VM.Backup` alone | [phase1-2 §1.4](tests/phase1-2-findings.md) |
| **`keyctl=1` is root-only** | feature flags except `nesting` need a `root@pam` session; no API token (even root's) can set them; restore preserves them | [phase3 §B3](tests/phase3-findings.md) |
| **`SDN.Use` gates bridge use** | PVE 9 needs `SDN.Use` to attach a NIC to `vmbr0`; omit it → 403 | [phase3 §B3](tests/phase3-findings.md) |
| **Docker named vol = always backed up** | named volumes live in rootfs; only *volume mountpoints* honour `backup=0`; bulk must be a dedicated `backup=0` mp | [phase3 §B2](tests/phase3-findings.md) |
---
## 6. Validated vs open
### Validated by the spike
| Fact | Evidence |
|---|---|
| PVE 9.2.2 / Debian 13 / kernel 7.0.2 baseline; `local` (dir) vs `local-lvm` (thin) roles | [phase0 §1](tests/phase0-findings.md), [phase1-2 pre-flight](tests/phase1-2-findings.md) |
| Docker runs in an **unprivileged** LXC (`nesting=1,keyctl=1`), driver `overlayfs`, cgroup v2 | [phase0 §3](tests/phase0-findings.md) |
| LXC vs VM overhead (idle host RAM +211 MB vs +2056 MB; CPU/throughput/provisioning) | [phase0 §2](tests/phase0-findings.md) |
| Privsep token = intersection of user ∩ token ACLs (dual-grant required) | [phase1-2 §1.2](tests/phase1-2-findings.md) |
| Minimal self-backup role; `VM.PowerMgmt` unnecessary | [phase1-2 §1.4](tests/phase1-2-findings.md) |
| Token scoped to one VMID: self-ops succeed, cross-guest + create/allocate denied | [phase1-2 §1.3](tests/phase1-2-findings.md) |
| Async UPID model; vzdump authz surfaces in `exitstatus`, not the POST | [phase1-2 §1.3](tests/phase1-2-findings.md) |
| Running, unprivileged LXC snapshots on LVM-thin (no stop) | [phase1-2 §1.6](tests/phase1-2-findings.md) |
| `vzdump` → `pct restore` round-trip; one backup captures Docker volumes; config survives | [phase1-2 §2](tests/phase1-2-findings.md) |
| Crash-consistent restore recovers via Postgres WAL; quiesced restores clean | [phase1-2 §2.2](tests/phase1-2-findings.md) |
| LXC vzdump scope by mount type; `backup=0` excludes volume mps; Docker named vols ride rootfs; proven bulk-exclusion recipe | [phase3 §B2](tests/phase3-findings.md) |
| Operator agent role (16 privs); guest lifecycle incl. restore is API-token-covered; `keyctl` create is `root@pam`-only | [phase3 §B3](tests/phase3-findings.md) |
### Not yet validated (do not assume)
| Open item | Why it matters |
|---|---|
| **PBS** (dedup/incremental/remote backup) | the only backup path tested was `vzdump` to a `dir` |
| **The real controller running inside an LXC** reaching `host:8006` | spike used `curl`/CLI, not the actual Go controller |
| **App-consistency under heavy write load** | WAL recovery was validated only on an idle-at-backup DB |
| **Live migration / restore to a different host** | single-node spike only |
| **Ballooning / KSM** effect on VM RAM cost | VM RAM measured with neither configured |
| **Cluster / HA** behaviour | single node |
| **Production TLS trust** for the API | all calls used `-k` against a self-signed cert |
| **deb822 no-subscription repo setup** as a controlled step | host arrived pre-configured |
---
## 7. Scope boundary
This document holds **platform facts only.** Felhom design decisions — e.g. which guest
type is the default, whether to use privsep or non-privsep tokens, where PBS lives — are
**out of scope** and belong in the controller-architecture document. Where this reference
notes a decision exists, the decision itself is recorded there, not here.
@@ -0,0 +1,176 @@
> ⚠️ **SUPERSEDED — spike evidence only, not authoritative.** This is the *pre-spike*
> reference and contains at least one known error (the privsep/ACL mechanism in §3 — it
> grants the ACL to the token only, which yields an empty intersection and a 403 even on
> self-calls). For the corrected, validated facts read
> [`../proxmox-platform.md`](../proxmox-platform.md). Kept here unchanged as the record of
> what we believed going into the spike.
# Proxmox Spike — API & Access-Control Reference
Reference for the **controller-as-guest** architecture, synthesized from current
Proxmox VE 9.x documentation (June 2026).
Items marked **[confirm on box]** should be verified once PVE is installed —
treat them as Phase 0/1 verification steps, not gospel. Every Proxmox CLI tool
is a thin wrapper over the same REST API, so anything below is reachable from Go.
---
## 1. API fundamentals
- **Base URL:** `https://192.168.0.162:8006/api2/json`
- **Auth (API token):** HTTP header
`Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET`
The secret is shown **once** at creation — capture it immediately, it can't be
retrieved again.
- **Response shape:** `{ "data": ... }`; errors come back via HTTP status + body.
- **Discovery (do this live on the box instead of trusting any doc):**
- `pvesh get /version`
- `pvesh ls /nodes/<node>/qemu/<vmid>`
- Full schema browser: `https://pve.proxmox.com/pve-docs/api-viewer/`
- "What call does the GUI make?" → perform the action in the web UI with
browser DevTools → Network open and read the request. Fastest way to find
the exact endpoint + params for anything.
- **Async tasks:** long operations (backup, restore, clone) return a **UPID**
(task id), not a result. Poll `GET /nodes/<node>/tasks/<upid>/status` until
`status: stopped`, then check `exitstatus`. The controller must poll, not
block. **[confirm on box]** the exact polling/response shape.
---
## 2. RBAC model — (path, principal, role)
An ACL entry is a triple of **(path, user/group/token, role)**. A role is a
bundle of privileges, assigned at the most specific path possible.
- **Paths:** `/`, `/vms/<vmid>`, `/nodes/<node>`, `/storage/<store>`,
`/pool/<pool>`, `/access/...`
- **Predefined roles include:** `PVEAuditor` (read-only), `PVEVMUser`,
`PVEVMAdmin`, `PVEDatastoreUser`, `PVEAdmin`, `PVEUserAdmin`.
- **API tokens with privilege separation (`--privsep 1`):** the token's
effective permissions are the **intersection** of (a) the backing user's
permissions and (b) the token's own ACLs. A privsep token can therefore never
exceed its user, and you grant it a separate, minimal ACL. This is exactly the
property the in-guest controller needs.
Introspection:
```bash
pveum role list
pveum role info PVEVMAdmin
pveum user permissions <user> --path /vms/<vmid>
```
---
## 3. Two-tier privilege model (our architecture decision)
**Tier A — in-guest controller (customer-facing, NARROW).**
Runs inside the customer's guest. Token scoped to *that guest's own VMID only*:
read its own status/config, snapshot itself, back itself up, write the backup to
the datastore. Cannot see or touch other guests. The LXC/VM's own privilege
level is irrelevant here — reaching `host:8006` is just an HTTPS call + token.
**Tier B — operator (provisioning, BROAD).**
Creates/destroys guests, builds the golden template, attaches storage, wires PBS.
Lives operator-side (hub / tooling), never on the customer box.
### Phase 1 runbook — minimal self-backup role + scoped token
```bash
# 1. Custom least-privilege role: "back up / snapshot myself"
# [confirm on box: exact privilege names via `pveum role list` / api-viewer]
pveum role add FelhomSelfBackup \
-privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit"
# 2. Dedicated API-only user in the PVE realm (no login password)
pveum user add felhom-ctl@pve --comment "In-guest controller (self-backup)"
# 3. Privsep token for that user (SECRET shown once)
pveum user token add felhom-ctl@pve ctl --privsep 1
# 4. Scope the TOKEN to one guest + the backup datastore only
pveum acl modify /vms/<vmid> -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
pveum acl modify /storage/<store> -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
# 5. Test FROM INSIDE the guest
curl -k https://<host>:8006/api2/json/version \
-H "Authorization: PVEAPIToken=felhom-ctl@pve!ctl=<SECRET>"
curl -k -X POST https://<host>:8006/api2/json/nodes/<node>/vzdump \
-H "Authorization: PVEAPIToken=felhom-ctl@pve!ctl=<SECRET>" \
-d "vmid=<vmid>&storage=<store>&mode=snapshot"
```
**Pass criteria:** the token backs up its OWN vmid, and returns **403** on any
other vmid. That single result validates the whole controller-as-guest design.
**Open question to settle here:** does Tier A also need `VM.PowerMgmt` so it can
stop/start its own guest for `stop`-mode backups? Likely yes — add it and re-test.
---
## 4. Backup / restore (vzdump)
**Modes:**
- **`stop`** — orderly guest shutdown → live backup → resume. Highest
consistency, short defined downtime.
- **`snapshot`** — lowest downtime; copies blocks while running. *Small
inconsistency risk* unless the guest cooperates (see below).
- **`suspend`** — legacy/compat, longer downtime, not recommended.
**App-consistency — the concrete version of the earlier warning:**
- **VM:** install `qemu-guest-agent` in the guest and set `agent: 1`.
`snapshot`-mode vzdump then calls `guest-fsfreeze-freeze` / `-thaw` around the
copy → near-free filesystem consistency. **This is a real point in the VM's
favour over LXC.**
- **LXC:** no guest agent → no fsfreeze. App-consistency becomes the
*controller's* job: quiesce in-guest first (stop stacks / flush DBs) **then**
vzdump, or use `stop` mode. Same lesson as the restic work, moved to the guest
layer.
**CLI / API:**
```bash
vzdump <vmid> --mode snapshot --storage <store> # CLI
# API (async → UPID):
POST /api2/json/nodes/<node>/vzdump params: vmid, storage, mode, ...
```
**Restore is NOT a single "restore" call** — you recreate the guest from the
archive:
- **VM:** `qmrestore <archive> <newvmid>` / `POST /nodes/<node>/qemu` with `archive=...`
- **LXC:** `pct restore <newvmid> <archive>` / `POST /nodes/<node>/lxc` with the archive as source
Phase 2's real-restore test = restore to a **fresh vmid** and boot it. Do not
declare the backup "working" until a restored guest actually runs.
---
## 5. Key REST endpoints (qemu shown; lxc is parallel under `/lxc`)
```
GET /nodes
GET /nodes/<node>/qemu list VMs
GET /nodes/<node>/qemu/<vmid>/status/current live status
GET /nodes/<node>/qemu/<vmid>/config config
POST /nodes/<node>/qemu/<vmid>/status/{start,stop,shutdown,reboot}
POST /nodes/<node>/qemu/<vmid>/snapshot (snapname, description)
GET /nodes/<node>/qemu/<vmid>/snapshot list snapshots
POST /nodes/<node>/qemu/<vmid>/snapshot/<snap>/rollback
POST /nodes/<node>/vzdump backup (async, UPID)
GET /nodes/<node>/tasks/<upid>/status poll async task
```
LXC: replace `/qemu/` with `/lxc/`. For **Docker-in-LXC** the container needs
`features nesting=1,keyctl=1` (`pct set <vmid> -features nesting=1,keyctl=1`, or
the `features` property on `POST /nodes/<node>/lxc`) — **[confirm on box]**.
---
## 6. Phase 0 confirm-on-box checklist
- [ ] PVE 9.2 installed; storage = LVM-thin (leave free space to also test dir/qcow2)
- [ ] Exact privilege set for `FelhomSelfBackup` (`pveum role info`)
- [ ] UPID task-polling response shape
- [ ] Docker official apt repo has a `trixie` channel
- [ ] LXC `features nesting=1,keyctl=1` syntax + Docker actually runs inside an LXC
- [ ] Baseline idle + under-load RAM/CPU: one Debian VM vs one Debian LXC, identical resources
@@ -0,0 +1,331 @@
# Phase 0 — VM vs LXC Overhead Spike: Findings
**Host:** `demo-felhom` (192.168.0.162) — Proxmox VE 9.2.2, Debian 13 (Trixie),
kernel 7.0.2-6-pve, 4 vCPU, 16 GB RAM (15771 MB `MemTotal`).
**Date:** 2026-06-07. **Measured one guest at a time, the other fully stopped.**
> This document presents **data and observations only**. No recommendation or verdict —
> the architecture decision is made elsewhere.
---
## 1. Provenance
### Platform
| Component | Version |
|---|---|
| pve-manager | 9.2.2 (`b9984c6d90a4bd80`) |
| kernel | proxmox-kernel 7.0.2-6-pve |
| pve-qemu-kvm | 11.0.0-3 |
| qemu-server | 9.1.15 |
| pve-container | 6.1.10 |
| lxc-pve / lxcfs | 7.0.0-2 / 7.0.0-pve1 |
| criu | 4.1.1-1 |
`pvesh get /version` → release 9.2, version 9.2.2.
### Guest images
| | LXC (9001) | VM (9000) |
|---|---|---|
| Source | `local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst` | `debian-13-genericcloud-amd64.qcow2` |
| Build | Debian 13.1 standard CT template (downloaded via `pveam`, checksum verified) | cloud build **20260601-2496**; in-guest reports Debian **13.5** after `apt update` |
| qcow2 | n/a | virtual 3 GiB, on-disk 323 MiB, compat 1.1/zlib |
### Docker (identical in both guests)
| | LXC | VM |
|---|---|---|
| Source | Docker official apt repo, **`trixie` channel** (confirmed present) | same |
| Version | **29.5.3** build d1c06ef | **29.5.3** build d1c06ef |
| Storage Driver | **`overlayfs`** (not vfs) | **`overlayfs`** (not vfs) |
| Cgroup Version / Driver | **v2 / systemd** | **v2 / systemd** |
| `hello-world` | OK | OK |
> Docker's official repo **does** have a `trixie` channel — no fallback to Debian's
> `docker.io` was needed. Docker 29 reports the driver as `overlayfs` (the containerd
> snapshotter image store) rather than the legacy name `overlay2`; this is the same
> overlay technology and is **not** a `vfs` fallback.
---
## 2. Comparison table
Baseline (both guests stopped): host RAM used **median 1702 MB** (range 1699-1703);
host CPU **~0.1 % used** (99.9 % idle). All RAM deltas below are vs this baseline.
Host RAM used = `MemTotal - MemAvailable`, 5 samples ~3 s apart (median reported).
| Metric | LXC (9001) | VM (9000) | Δ (VM - LXC) |
|---|---|---|---|
| **Idle host-RAM delta** | **+211 MB** (1913) | **+2056 MB** (3758) | **+1845 MB** |
| **Under-load host-RAM delta** | **+410 MB** (2112) | **+2084 MB** (3786) | **+1674 MB** |
| **Per-guest mem attribution** | cgroup `memory.current` = **1961 MB**¹ | KVM process RSS = **2031 MB** (idle) / **2047 MB** (load) | — |
| **Idle host CPU used** | **~0.3 %** (0.20 usr + 0.10 sys) | **~6.0 %** (3.37 usr + 2.31 sys + 0.29 guest) | **+5.7 pp** |
| **Under-load host CPU used** | **~39.4 %** (17.1 usr + 7.5 sys + 14.5 iowait + 0.3 soft) | **~53.9 %** (31.9 guest + 16.4 iowait + 3.4 sys + 1.7 usr + 0.6 soft) | **+14.5 pp** |
| **pgbench throughput** | **2211.7 tps**, lat 1.809 ms, 132 710 tx/60 s, 0 failed | **1819.6 tps**, lat 2.198 ms, 163 764 tx/90 s, 0 failed² | **-392 tps** |
| **Disk allocated** | 10 GiB | 10 GiB | 0 |
| **Disk used (host thin-LV)** | 26.73 % ≈ **2.67 GiB** | 29.33 % ≈ **2.94 GiB** | +0.27 GiB |
| **Disk used (inside guest)** | 2.1 GiB / 9.7 GiB | 2.4 GiB / 9.7 GiB | +0.3 GiB |
| **Provisioning (rough, create→ready)** | ~10-15 s³ | ~60-75 s³ | n/a |
¹ `memory.current` counts reclaimable page cache shared with the host and therefore
**overstates** the LXC's true incremental cost; the +211 MB host-RAM delta is the honest
number. ² VM 60 s runs gave 1739 & 1759 tps — consistent with the 90 s definitive run.
³ Guest-creation step only; see §4. Docker install + first image pull (~network-bound,
~identical for both) is excluded.
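The "median of 5 samples" reduction is simple to reproduce. The helper below is a sketch with a hypothetical name (`median`); it takes samples as arguments and is not the script used for the measurements in this document.

```bash
# Median of numeric samples passed as arguments.
median() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

Feeding it five host-RAM readings taken about 3 s apart gives the single figure reported per table cell.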
### Inside-guest `free -m` (context only — not the decisive number)
| | total | used | buff/cache | available |
|---|---|---|---|---|
| LXC idle | 2048 | 125 | 1851 | 1922 |
| VM idle | 1974 | 509 | 1524 | 1464 |
The VM sees **1974 MB** usable of 2048 allocated (firmware/kernel reservation).
---
## 3. Docker-in-LXC viability
**Worked cleanly in an *unprivileged* LXC with `--features nesting=1,keyctl=1`. No
privileged fallback was needed.**
- `--features nesting=1,keyctl=1 --unprivileged 1` accepted by `pct create` (PVE 9
syntax confirmed via `pct help create`).
- `docker run hello-world` → success.
- **Storage driver: `overlayfs`** (cgroup v2, systemd cgroup driver) — **no `vfs`
fallback**.
- Full 3-container stack (`postgres:17`, `redis:7`, `nginx:alpine`) came up healthy.
- Named volume `pgdata` persisted a write (`SELECT count` returned 1 after table
create/insert).
- Multi-container networking + published port worked: `curl localhost:8080` → **HTTP 200**.
- 60 s pgbench load: **0 failed transactions**.
No errors, no `dmesg`/`journalctl` anomalies, no workarounds. The privileged-LXC
fallback path (step A5) was therefore **not exercised**.
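A check like the storage-driver one above can be scripted so that a silent `vfs` fallback fails loudly. The following is a minimal sketch, not the spike's own tooling; `check_storage_driver` is a hypothetical helper name, and it only inspects the driver string it is given.

```bash
# check_storage_driver DRIVER: fail if Docker fell back to the vfs driver.
# Hypothetical helper for illustration; not part of the spike scripts.
check_storage_driver() {
  case $1 in
    vfs) echo "FAIL: vfs fallback"; return 1 ;;
    "")  echo "FAIL: no driver reported"; return 1 ;;
    *)   echo "OK: $1" ;;
  esac
}

# Usage inside the guest:
#   check_storage_driver "$(docker info --format '{{.Driver}}')"
```

In the LXC measured here this would be expected to report `overlayfs`, matching the `docker info` output in the raw log.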
---
## 4. Observations & confounds
1. **VM under-load CPU required a re-measurement (diagnosed, not hidden).** The first
VM-load sample showed host CPU ~5 % — identical to *idle* — while pgbench nonetheless
completed a full 60 s run (1739 tps). Root cause: the VM load was launched through a
**nested SSH + `nohup &`** layer (host→VM), which started pgbench *after* the sampling
window. The LXC path used local `pct exec` (no nested SSH) so its first sample was
valid. Re-running with pgbench held in the **foreground of a long-lived SSH channel**
(guaranteed active) and sampling during a confirmed window gave the true **53.9 %**
(`%guest`=31.9). **Confound:** the two guests' load was driven through different
plumbing (`pct exec` vs nested SSH); the *throughput* numbers are unaffected
(pgbench self-reports its own duration), but the CPU figures came from
methodologically asymmetric harnesses.
2. **Baseline drift from residual page cache.** After stopping each guest, host RAM did
not snap back to 1702 MB immediately (e.g. 1895 MB just after the LXC stopped;
1965→1794 MB drifting down after the VM). This is reclaimable cache, not a leak.
Treat all RAM deltas as ±~100 MB.
3. **The headline RAM gap is structural, not incidental.** LXC processes share the host
kernel and page cache, so only the working set counts against the host (+211 MB idle).
The VM, with **no ballooning configured**, has KVM back every guest-touched page —
including the guest's own 1.5 GB page cache — so the host cost ≈ the full 2 GB
allocation (KVM RSS ≈ 2031 MB) and is **largely load-independent** (3758 idle → 3786
load). Ballooning / KSM were not tested and could change this.
4. **`cgroup memory.current` ≠ host cost.** For the LXC it read 1961 MB (near the 2 GB
limit) because it includes reclaimable page cache; the real incremental host cost was
+211 MB. Per the protocol, `MemTotal − MemAvailable` is the decisive metric.
5. **VM idle CPU floor (~6 %) vs LXC (~0.3 %).** QEMU device emulation + a full guest
kernel's timer/housekeeping impose a small constant CPU cost even at rest.
6. **Throughput vs CPU trade.** The VM did slightly *less* work (1820 vs 2211 tps) for
*more* host CPU (53.9 vs 39.4 %). The extra cost surfaces as `%guest` (31.9 %) — the
actual DB work *plus* virtualization overhead — whereas in the LXC the same DB work
appears directly as host `%usr`/`%sys`. iowait was comparable (~15–16 %, WAL fsync).
7. **Workload fits in RAM.** pgbench scale `-s 10` (~150 MB) fits in cache in both
guests, so the test is commit/CPU-bound rather than disk-bound; a larger-than-RAM
dataset would stress the storage paths differently and is not covered here.
8. **qemu-guest-agent confirmed on the VM** (`qm guest cmd 9000 ping` → OK). This enables
`guest-fsfreeze`-based app-consistent `snapshot`-mode vzdump for the VM — a capability
the LXC has no equivalent for. The genericcloud image does **not** ship the agent;
it had to be installed in-guest (and the VM IP had to be found via `nmap`/MAC until
the agent was up).
9. **Provisioning asymmetry foreshadows cloning.** LXC create is template-extract-bound
(526 MiB at 387 MiB/s + SSH keygen, ~10–15 s). VM create is qcow2-import-bound (3 GiB
→ LVM ≈ 30 s) plus a full firmware boot to SSH-ready (~30–45 s). Figures are rough,
single-run, and exclude the shared network-bound Docker install + first image pull.
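The "host CPU used" figures quoted throughout are 100 minus the `%idle` column of the `mpstat` `Average:` row. A small sketch of that reduction follows, assuming the sysstat layout in which `%idle` is the last column; `cpu_used_pct` is a hypothetical helper name, not something the spike used.

```bash
# cpu_used_pct: read mpstat output on stdin, print 100 - %idle for the
# "Average: all" row. Assumes %idle is the final column (sysstat layout).
cpu_used_pct() {
  awk '$1 == "Average:" && $2 == "all" {printf "%.1f\n", 100 - $NF}'
}

# Usage on the host (LC_ALL=C keeps the decimal separator a dot):
#   LC_ALL=C mpstat 1 5 | cpu_used_pct
```

Deriving the figure from `%idle` rather than summing the busy columns avoids missing a category such as `%guest`, which is where most of the VM's load shows up.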
---
## 5. Raw command log (appendix)
### 5.1 Provenance
```
$ pveversion -v | grep ...
pve-manager: 9.2.2 (running version: 9.2.2/b9984c6d90a4bd80)
proxmox-kernel-7.0: 7.0.2-6
criu: 4.1.1-1
lxc-pve: 7.0.0-2
lxcfs: 7.0.0-pve1
pve-container: 6.1.10
pve-qemu-kvm: 11.0.0-3
qemu-server: 9.1.15
$ pvesm status
local dir active 98497780 4333576 89114656 4.40%
local-lvm lvmthin active 365760512 0 365760512 0.00%
# Docker repo trixie channel:
$ curl -fsSL https://download.docker.com/linux/debian/dists/ | grep -oE 'trixie|bookworm|bullseye'
bookworm / bullseye / trixie # trixie present
# Cloud image:
$ qemu-img info debian-13-genericcloud-amd64.qcow2
virtual size: 3 GiB ; disk size: 323 MiB ; compat 1.1 ; build 20260601-2496
```
### 5.2 Baseline (both guests stopped)
```
$ for i in 1..5; awk MemTotal-MemAvailable /proc/meminfo ; sleep 3
used=1699 MB / 1702 / 1702 / 1702 / 1703 MB (median 1702)
$ mpstat 1 5
Average: all 0.05 usr 0.05 sys ... 99.90 idle
```
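The baseline above is `MemTotal - MemAvailable` sampled five times with the median taken. The logged command is shorthand, so here is a minimal runnable sketch of the same calculation. The helper names `host_used_mb` and `median` are illustrative assumptions, and the optional file argument exists only so the parser can be run against a saved copy of `/proc/meminfo`.

```bash
# host_used_mb [FILE]: print (MemTotal - MemAvailable) in whole MB.
# /proc/meminfo reports both values in kB.
host_used_mb() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
       END {printf "%d\n", (t - a) / 1024}' "${1:-/proc/meminfo}"
}

# median: read numbers on stdin (any whitespace layout), print the middle
# value. An odd number of samples is assumed.
median() {
  awk '{for (i = 1; i <= NF; i++) print $i}' | sort -n |
    awk '{v[NR] = $1} END {print v[int((NR + 1) / 2)]}'
}

# Usage on the host:
#   for i in 1 2 3 4 5; do host_used_mb; sleep 3; done | median
```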
### 5.3 LXC 9001 — create + Docker
```
$ pct create 9001 local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst \
--hostname spike-lxc --cores 2 --memory 2048 --rootfs local-lvm:10 \
--net0 name=eth0,bridge=vmbr0,ip=dhcp --features nesting=1,keyctl=1 \
--unprivileged 1 --start 1
Logical volume "vm-9001-disk-0" created.
extracting archive ... Total bytes read: 551505920 (526MiB, 387MiB/s)
Creating SSH host key ... done
=== exit: 0 ; status: running
features: nesting=1,keyctl=1 ; unprivileged: 1 ; ip 192.168.0.115/24
# Docker install (official repo, trixie stable): DOCKER-INSTALL-OK
$ docker --version -> Docker version 29.5.3, build d1c06ef
$ docker run --rm hello-world -> Hello from Docker!
$ docker info | grep -iE 'Storage Driver|Cgroup'
Storage Driver: overlayfs
Cgroup Driver: systemd
Cgroup Version: 2
Server Version: 29.5.3 ; Kernel: 7.0.2-6-pve ; OS: Debian GNU/Linux 13 (trixie)
```
### 5.4 LXC 9001 — stack health
```
$ docker compose ps
spike-cache-1 running Up
spike-db-1 running Up
spike-web-1 running Up
$ curl -s -o /dev/null -w 'HTTP %{http_code}' localhost:8080 -> HTTP 200
$ psql CREATE TABLE spike_persist; INSERT; SELECT count(*) -> 1 (volume persists)
```
### 5.5 LXC 9001 — idle measurement
```
Host RAM used (5x3s): 1913 / 1914 / 1913 / 1914 / 1913 MB (median 1913, Δ +211)
cgroup memory.current: 2056036352 B = 1961 MB
inside free -m: total 2048 used 125 buff/cache 1851 available 1922
mpstat 1 5 Average: 0.20 usr 0.10 sys ... 99.70 idle (~0.3% used)
pct df 9001: rootfs 9.7G size, 2.1G used, 21.6%
```
### 5.6 LXC 9001 — under-load measurement
```
$ pgbench -i -s 10 -> done in 1.39 s
$ pgbench -T 60 -c 4 (run concurrently with sampling):
Host RAM used (5x3s): 2149 / 2143 / 2112 / 2086 / 2071 MB (median 2112, Δ +410)
cgroup memory.current: 2130382848 B = 2032 MB
mpstat 1 5 Average: 17.10 usr 7.50 sys 14.50 iowait 0.31 soft 60.59 idle (~39.4% used)
pgbench result: scaling 10, clients 4, 60 s
transactions: 132710 ; failed 0 (0.000%)
latency average = 1.809 ms ; tps = 2211.713864
host thin LV vm-9001-disk-0: 10240 MB, Data% 26.73 (≈2.67 GiB)
```
### 5.7 VM 9000 — create + cloud-init
```
$ qm create 9000 --name spike-vm --cores 2 --memory 2048 \
--net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-single --agent 1
$ qm set 9000 --scsi0 local-lvm:0,import-from=/var/lib/vz/template/qcow2/debian-13-genericcloud-amd64.qcow2
transferred 3.0 GiB of 3.0 GiB (100.00%)
scsi0: successfully created disk 'local-lvm:vm-9000-disk-0,size=3G'
$ qm set 9000 --ide2 local-lvm:cloudinit --boot order=scsi0 --serial0 socket --vga serial0
$ qm disk resize 9000 scsi0 10G -> resized 3.00 -> 10.00 GiB
$ qm set 9000 --ciuser spike --cipassword spike --sshkeys /root/spike-pubkey.pub --ipconfig0 ip=dhcp
# pubkey file = the two real keys from the host's /etc/pve/priv/authorized_keys
# (incl. ssh-ed25519 ...kisfenyo@windows — the same workstation key)
$ qm start 9000 -> start-ok
```
### 5.8 VM 9000 — IP discovery + guest agent + Docker
```
# genericcloud has no guest-agent at first boot -> qm guest cmd ping failed.
# IP found via MAC on the bridge:
$ nmap -sn 192.168.0.0/24 | grep -B2 BC:24:11:C7:41:87
Nmap scan report for 192.168.0.155 ; MAC BC:24:11:C7:41:87 (Proxmox)
$ ssh -i /root/.ssh/id_rsa spike@192.168.0.155 'hostname; cat /etc/debian_version'
spike-vm ; 13.5
# install qemu-guest-agent + Docker (official repo, trixie): VM-INSTALL-OK
$ qm guest cmd 9000 ping -> AGENT OK (fsfreeze available)
$ docker --version -> Docker version 29.5.3, build d1c06ef
$ docker run --rm hello-world -> Hello from Docker!
$ docker info | grep -iE 'Storage Driver|Cgroup'
Storage Driver: overlayfs ; Cgroup Driver: systemd ; Cgroup Version: 2
```
### 5.9 VM 9000 — stack health
```
$ docker compose ps -> spike-cache-1 / spike-db-1 / spike-web-1 all running
$ curl ... localhost:8080 -> HTTP 200
$ psql ... SELECT count(*) -> 1 (volume persists)
```
### 5.10 VM 9000 — idle measurement
```
Host RAM used (5x3s): 3758 / 3757 / 3754 / 3759 / 3758 MB (median 3758, Δ +2056)
KVM process RSS / VSZ: 2079988 / 3380896 KiB (RSS = 2031 MB)
inside free -m: total 1974 used 509 buff/cache 1524 available 1464
mpstat 1 5 Average: 3.37 usr 2.31 sys 0.29 guest ... 94.04 idle (~6.0% used)
qm config: scsi0 local-lvm:vm-9000-disk-0,size=10G
host thin LV vm-9000-disk-0: 10240 MB, Data% 29.33 (≈2.94 GiB)
inside df -h /: 9.7G size, 2.4G used, 25%
```
### 5.11 VM 9000 — under-load measurement (definitive, load confirmed active)
```
# First attempt (nested-ssh + nohup &) launched pgbench AFTER the sample window ->
# host CPU read a false ~5% (identical to idle). Diagnosed; re-run below holds
# pgbench in the foreground of a long-lived SSH channel and samples during it.
$ pgbench -T 90 -c 4 (foreground, channel held):
transactions: 163764 ; failed 0 (0.000%)
latency average = 2.198 ms ; tps = 1819.602345
(60 s confirmation runs: 1739 & 1759 tps)
# Sampled 10 s into the confirmed-active load:
Host RAM used (5x3s): 3784 / 3786 / 3786 / 3786 / 3786 MB (median 3786, Δ +2084)
KVM process RSS / VSZ: 2096508 / 4495008 KiB (RSS = 2047 MB)
guest uptime: load average 1.71 (2 vCPU) -> vCPUs busy
mpstat 1 8 Average:
1.70 usr 3.40 sys 16.35 iowait 0.58 soft 31.89 guest 46.08 idle (~53.9% used)
```
### 5.12 Teardown state
```
$ qm list -> 9000 spike-vm stopped
$ pct list -> 9001 spike-lxc stopped
# both present, both stopped (numbers can be re-checked)
```
---
## 6. Teardown — destroy commands (NOT run)
Both guests were left **stopped but present**. To remove them:
```bash
qm destroy 9000 --purge # VM (also removes cloudinit + disks)
pct destroy 9001 --purge # LXC
# optional spike artifacts on the host:
rm -f /var/lib/vz/template/qcow2/debian-13-genericcloud-amd64.qcow2
rm -f /root/spike-pubkey.pub /root/vm-install.sh
# (Debian 13 CT template left in place: local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst)
```
+315
@@ -0,0 +1,315 @@
# Phase 1 + 2 — Privilege Model & Backup/Restore Round-Trip: Findings
**Host:** `demo-felhom` (192.168.0.162) — Proxmox VE 9.2.2, node confirmed via
`pvesh get /nodes` → `demo-felhom`. Storage: `local` (dir, content
`iso,vztmpl,backup,import`), `local-lvm` (LVM-thin, `rootdir,images`).
**Subject:** LXC `9001` (`spike-lxc`, unprivileged, `nesting=1,keyctl=1`, Docker +
postgres/redis/nginx stack). **Date:** 2026-06-07.
> Data and observations only — **no recommendation or verdict**.
## Hypotheses — verdicts at a glance
| | Hypothesis | Result |
|---|---|---|
| **H1** | Backup scopes to one VMID; restore/create needs node/pool allocate → denied to narrow token | **CONFIRMED** (create CT = 403) |
| **H2** | An LXC vzdump captures the Docker volumes (they live in the container rootfs) | **CONFIRMED** (sentinel survived both restores) |
| **H3** | Crash-consistent (running) *and* quiesced (stopped) backups both restore cleanly | **CONFIRMED** (A via WAL recovery, B clean start) |
| **H4** | Running unprivileged LXC snapshots on LVM-thin; restored CT keeps unprivileged+nesting/keyctl | **CONFIRMED** (live snapshot OK; config survived) |
---
## 1. Phase 1 — Privilege model
### 1.1 Setup (operator side, root)
```
pveum role add FelhomSelfBackup -privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit"
pveum user add felhom-ctl@pve --comment "spike in-guest controller"
pveum user token add felhom-ctl@pve ctl --privsep 1 # secret: b6547d9d-... (ephemeral, spike-only)
pveum acl modify /vms/9001 -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
pveum acl modify /storage/local -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup
```
Privilege names were verified against `PVEVMAdmin` / `PVEDatastoreUser` via
`pveum role list` first. **Note:** the reference doc's introspection command
`pveum role info <role>` **does not exist in PVE 9** — only `pveum role list` works.
### 1.2 ⚠️ Privsep gotcha — the doc's runbook is incomplete
With `--privsep 1`, a token's effective rights are the **intersection of the backing
user's permissions AND the token's own ACLs**. The reference doc (§3) grants ACLs to the
**token only**. With the user `felhom-ctl@pve` holding **no** permissions, the
intersection was **empty** — the first self-audit call returned:
```
HTTP 403 {"message":"Permission check failed (/vms/9001, VM.Audit)\n"}
```
**Fix applied:** also grant the user the role on the same paths
(`pveum acl modify /vms/9001 -user felhom-ctl@pve -role FelhomSelfBackup`, same for
`/storage/local`). After that the self-calls succeeded. **A privsep token needs the
permission present on *both* the user and the token** (the token ACL is what keeps the
token ≤ user / narrowly scoped). This must be reflected in the controller provisioning.
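The dual grant is easy to get half-right, so provisioning could generate both ACL lines per path from one place. The sketch below only prints the `pveum acl modify` commands in the form used in this report and executes nothing; `acl_grant_cmds` is a hypothetical helper name.

```bash
# acl_grant_cmds ROLE USER TOKENID PATH...
# Print the pveum commands that grant ROLE on each PATH to both the backing
# user and its privsep token. Prints only; review before running as root.
acl_grant_cmds() {
  role=$1 user=$2 tokenid=$3
  shift 3
  token="$user"'!'"$tokenid"   # full token id, e.g. user@pve!ctl
  for acl_path in "$@"; do
    printf "pveum acl modify %s -user '%s' -role %s\n" "$acl_path" "$user" "$role"
    printf "pveum acl modify %s -token '%s' -role %s\n" "$acl_path" "$token" "$role"
  done
}

# Usage:
#   acl_grant_cmds FelhomSelfBackup felhom-ctl@pve ctl /vms/9001 /storage/local
```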
### 1.3 Test matrix (every call run from **inside** the unprivileged LXC, `pct exec 9001`)
`H=192.168.0.162 N=demo-felhom AUTH="PVEAPIToken=felhom-ctl@pve!ctl=<secret>"`
| # | Call | Expected | **Actual** | Notes |
|---|---|---|---|---|
| 1 | `GET /version` | 200 | **200** | reachable + auth from inside LXC (no privilege needed) |
| 2 | `GET /nodes/$N/lxc/9001/status/current` | 200 | **200**¹ | self audit (after privsep fix) |
| 3 | `POST /nodes/$N/lxc/9001/snapshot snapname=spk1` | 200/UPID→OK | **200, task exitstatus OK** | **running-LXC self-snapshot (H4)** |
| 4 | `POST /nodes/$N/vzdump vmid=9001 storage=local mode=snapshot` | 200/UPID→OK | **200, task exitstatus OK** | self backup, archive produced |
| 5 | `GET /nodes/$N/qemu/9000/status/current` | 403 | **403** | `Permission check failed (/vms/9000, VM.Audit)` |
| 6 | `POST /nodes/$N/vzdump vmid=9000 storage=local` | 403 | **200 POST → task exitstatus 403**² | see note |
| 7 | `POST /nodes/$N/lxc` (create CT) | 403 | **403** | `Permission check failed` → **proves create/allocate is operator-tier (H1)** |
¹ before the privsep fix this was 403; see §1.2.
² **Important nuance:** the `vzdump` endpoint accepts the POST and returns a UPID even for
an unauthorized vmid; the authorization failure surfaces at **task execution**, not at the
HTTP layer. Polled from root:
`exitstatus: "403 Permission check failed (/vms/9000, VM.Backup)"`, and **no 9000 archive
was created**. The boundary holds — but a controller must **poll the task exitstatus**, not
trust the POST's 200, to know a cross-guest backup was actually refused.
**Pass criteria met:** self-ops (1–4) succeed; cross-guest read (5), cross-guest backup
(6, at task level), and create/allocate (7) are denied. The controller-as-guest boundary
and the two-tier split are validated.
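Because the POST's 200 is not the verdict, a controller needs a poll loop on the task status. The following is a minimal sketch under several assumptions: the helper names `classify_task` and `wait_task` are hypothetical, the task-status body is parsed as compact JSON with plain string matching (`jq` would be more robust), and the UPID is passed into the URL as-is, which may need URL-encoding in practice.

```bash
# classify_task JSON: reduce a task-status body to
#   running | ok | failed: <exitstatus>
# Assumes compact JSON (no spaces around ':').
classify_task() {
  case $1 in
    *'"status":"running"'*) echo running ;;
    *'"exitstatus":"OK"'*)  echo ok ;;
    *) printf 'failed: %s\n' \
         "$(printf '%s' "$1" | sed -n 's/.*"exitstatus":"\([^"]*\)".*/\1/p')" ;;
  esac
}

# wait_task HOST NODE UPID: poll until the task stops; succeed only on OK.
# Needs AUTH="PVEAPIToken=..." in the environment. -k mirrors the spike's
# self-signed certificate setup and is not a production recommendation.
wait_task() {
  while :; do
    body=$(curl -ksS -H "Authorization: $AUTH" \
      "https://$1:8006/api2/json/nodes/$2/tasks/$3/status") || return 2
    verdict=$(classify_task "$body")
    if [ "$verdict" != running ]; then
      echo "$verdict"
      [ "$verdict" = ok ]
      return
    fi
    sleep 2
  done
}
```

Fed the root-side poll result from call #6, `classify_task` would report `failed: 403 Permission check failed (/vms/9000, VM.Backup)` rather than success.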
### 1.4 Final minimal role — `VM.PowerMgmt` **not** required
The doc left an open question: "does Tier A need `VM.PowerMgmt` for stop-mode backups? Likely
yes". **Tested and refuted:** a **stop-mode** self-vzdump submitted by the token
(`vmid=9001 mode=stop`) completed with **`exitstatus: OK`** using the role *without*
`VM.PowerMgmt`. `vzdump` performs the guest shutdown/restart internally under
`VM.Backup`; no separate power privilege is needed.
> **Final minimal role (`FelhomSelfBackup`) — satisfies self-audit, self-snapshot, and
> both `snapshot`- and `stop`-mode self-backup:**
> `VM.Audit, VM.Snapshot, VM.Backup, Datastore.AllocateSpace, Datastore.Audit`
> (`VM.PowerMgmt` deliberately omitted — confirmed unnecessary.)
### 1.5 TLS observation
From inside the LXC, `curl` **without** `-k`:
```
curl: (60) SSL certificate problem: unable to get local issuer certificate
```
The host serves the default self-signed PVE cert; all tests used `-k`. Production trust
(pin the PVE CA / issue a proper cert) is a separate design decision, flagged here.
### 1.6 Running-LXC snapshot (H4)
Call #3 snapshotted the **running** unprivileged LXC on LVM-thin (`exitstatus OK`).
`pct listsnapshot 9001` shows `spk1` with `pct status 9001 = running`. **No stop
required** — the snapshot-before-update rollback flow is viable on a live container.
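The snapshot-before-update flow could look roughly like the sketch below. It is an operator-side illustration using the `pct` CLI, whereas the in-guest controller would use the REST endpoints. Only the snapshot step was exercised in this phase, so the rollback branch is an untested assumption here. `with_snapshot` and the `PCT` override are hypothetical names.

```bash
PCT=${PCT:-pct}   # override (e.g. PCT=echo) for a dry run

# with_snapshot VMID SNAPNAME CMD...
# Take a snapshot, run CMD, and roll back to the snapshot if CMD fails.
# On success the snapshot is deleted; keeping it instead is a valid choice.
with_snapshot() {
  vmid=$1 snap=$2
  shift 2
  "$PCT" snapshot "$vmid" "$snap" || return 1
  if "$@"; then
    "$PCT" delsnapshot "$vmid" "$snap"
  else
    "$PCT" rollback "$vmid" "$snap"
    return 1
  fi
}

# Usage (the update command is a placeholder):
#   with_snapshot 9001 pre-update pct exec 9001 -- sh -c 'apt-get -y upgrade'
```

Whether the guest needs an explicit restart after a rollback was not tested and would have to be checked before relying on this.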
---
## 2. Phase 2 — Backup → real restore round-trip
Sentinel written pre-flight into the `pgdata` volume:
`restore_check(42,'phase2-sentinel')` → clean read `42|phase2-sentinel`.
### 2.1 Backups (operator/root side)
| Variant | Mode | Stack state | Task time | Wall | Archive | Size (zstd) |
|---|---|---|---|---|---|---|
| **A — crash-consistent** | `snapshot` | **running** | 00:00:24 | 25 s | `vzdump-lxc-9001-2026_06_07-20_13_43.tar.zst` | **934 MB** (979,718,569 B) |
| **B — quiesced** | `snapshot` | **stopped** (`docker compose stop`) | 00:00:21 | 22 s | `vzdump-lxc-9001-2026_06_07-20_14_40.tar.zst` | **934 MB** (979,671,582 B) |
Both from a 2.5 GiB source; zstd → ~934 MB (~2.6:1). The stack was restarted after
Variant B. **LXC snapshot-mode vzdump does *not* fsfreeze** (no guest agent in an LXC —
consistent with the Phase 0 finding) → Variant A is genuinely crash-consistent.
### 2.2 Restore → fresh VMID → boot → verify
| Check | 9002 (Variant A) | 9003 (Variant B) |
|---|---|---|
| Restore time (`pct restore … --storage local-lvm`) | **12 s** | **11 s** |
| `unprivileged: 1` survived | **yes** | **yes** |
| `features: nesting=1,keyctl=1` survived | **yes** | **yes** |
| Containers after boot | `exited` (no restart policy) → `docker compose up -d` | same |
| 3 containers healthy | **yes** | **yes** |
| `curl localhost:8080` | **HTTP 200** | **HTTP 200** |
| **Sentinel `(42,'phase2-sentinel')`** | **PRESENT** | **PRESENT** |
| Postgres first-start | **WAL crash recovery** (see below) | **clean start, no recovery** |
> Restored CTs inherit 9001's fixed `hwaddr`. To avoid a MAC clash with the still-running
> 9001 on `vmbr0`, `net0` was reset to auto-generate a fresh MAC before boot. All
> verification (stack health, `curl localhost`, sentinel) is guest-internal and needs no
> external network — and the Docker images are inside the restored rootfs, so no pulls.
**Variant A — Postgres automatic WAL recovery on 9002 (verbatim, post-restore boot):**
```
LOG: database system was interrupted; last known up at 2026-06-07 18:13:21 UTC
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/CB12838
LOG: invalid record length at 0/CB12870: expected at least 24, got 0 # normal end-of-WAL
LOG: redo done at 0/CB12838 ...
LOG: checkpoint starting: end-of-recovery immediate wait
LOG: database system is ready to accept connections
```
**Variant B — clean start on 9003 (verbatim, post-restore boot):**
```
LOG: database system was shut down at 2026-06-07 18:14:39 UTC
LOG: database system is ready to accept connections
```
**H2 confirmed:** one LXC vzdump captured the whole customer including the Docker named
volume — the sentinel data restored in both guests. **H3 confirmed:** both variants
restored to a bootable guest with intact data; the crash-consistent one recovered via WAL
with no manual intervention, the quiesced one started clean. **H4 confirmed:** restored
config preserved `unprivileged` + `nesting/keyctl`, so Docker ran in the restored CT.
---
## 3. Observations & confounds
1. **Privsep token needs perms on user *and* token** (§1.2) — the single most important
correction to the reference runbook; without it every scoped call 403s.
2. **vzdump authorization is task-level, not POST-level** (§1.3 note ²) — a 200 + UPID
does **not** mean authorized. The controller must poll `exitstatus`. This is also the
general async-task lesson: every backup/snapshot/restore returns a UPID and the real
result is in the task status.
3. **`pveum role info` is gone in PVE 9** — use `pveum role list`. Minor doc drift.
4. **`VM.PowerMgmt` not needed for stop-mode backup** (§1.4) — narrower role than the doc
assumed.
5. **No fsfreeze for LXC** — Variant A relied on Postgres's own WAL crash recovery, which
worked here for an idle-at-backup DB. Under heavy write load, app-consistency for LXC
still rests on the controller quiescing first (or stop-mode), exactly as the reference
warned. This single test is not a durability guarantee under load.
6. **Restore MAC collision** (§2.2) — `pct restore` preserves the source `hwaddr`;
restoring while the original runs needs a MAC reset (or the original stopped). The
controller's restore flow must handle identity (MAC/hostname/IP) to avoid clashes.
7. **No restart policy on the compose services** — restored containers came up `exited`;
`docker compose up -d` (or a restart policy / systemd unit) is required for the stack
to return automatically after a restore or guest reboot.
8. **Restore is fast, backup dominated by I/O:** restores were 11–12 s (extract at
~524 MiB/s); backups ~22–25 s (read 2.5 GiB at ~108–119 MiB/s + zstd). Single runs,
idle host, ~150 MB DB; not a throughput benchmark.
9. **Sequencing artifact:** a Phase-1 stop-mode self-backup ran before Phase 2 and
stopped/started 9001; the stack was brought back up and the sentinel re-verified
before the Variant A/B backups, so it does not affect the round-trip results.
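For the MAC collision in observation 6, the restore flow can drop the inherited `hwaddr` from `net0` and let Proxmox generate a new one, which is what the manual `pct set` in this spike did. Here is a minimal sketch of the string handling; `strip_hwaddr` is a hypothetical helper, and hostname/IP identity is not addressed.

```bash
# strip_hwaddr NETCONF: remove the hwaddr=... field from a net0 value so a
# fresh MAC is generated when the config is applied.
strip_hwaddr() {
  printf '%s\n' "$1" |
    sed -e 's/,hwaddr=[^,]*//' -e 's/^hwaddr=[^,]*,\{0,1\}//'
}

# Usage on the host, before the first boot of the restored CT:
#   net0=$(pct config 9002 | sed -n 's/^net0: //p')
#   pct set 9002 -net0 "$(strip_hwaddr "$net0")"
```

Rewriting the existing value rather than hard-coding `name=eth0,bridge=vmbr0,ip=dhcp` keeps any other `net0` options the source guest had.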
---
## 4. Raw command log (appendix)
### 4.1 Pre-flight
```
$ pvesh get /nodes -> node: demo-felhom
$ cat /etc/pve/storage.cfg
dir: local ... content iso,vztmpl,backup,import # 'backup' present
lvmthin: local-lvm ... content rootdir,images # no backup (expected)
$ pct start 9001 ; docker compose up -d -> 3 containers Started
$ curl localhost:8080 -> HTTP 200
# sentinel:
CREATE TABLE ; INSERT 0 1 ; SELECT count -> 1 ; SELECT * -> 42 | phase2-sentinel
```
### 4.2 Phase 1 — role/user/token/ACL
```
$ pveum role add FelhomSelfBackup -privs "VM.Audit VM.Snapshot VM.Backup Datastore.AllocateSpace Datastore.Audit" -> role-ok
$ pveum user add felhom-ctl@pve --comment "spike in-guest controller" -> user-ok
$ pveum user token add felhom-ctl@pve ctl --privsep 1
{"full-tokenid":"felhom-ctl@pve!ctl","info":{"privsep":"1"},"value":"b6547d9d-08ec-4f22-beb8-a551dc2cd69d"}
$ pveum acl modify /vms/9001 -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup -> ok
$ pveum acl modify /storage/local -token 'felhom-ctl@pve!ctl' -role FelhomSelfBackup -> ok
$ pveum role list | grep FelhomSelfBackup
FelhomSelfBackup | Datastore.AllocateSpace,Datastore.Audit,VM.Audit,VM.Backup,VM.Snapshot
$ pveum role info FelhomSelfBackup -> ERROR: unknown command 'pveum role info' # PVE9 has no 'role info'
```
### 4.3 Phase 1 — matrix (from inside LXC)
```
# TLS without -k:
curl: (60) SSL certificate problem: unable to get local issuer certificate
# BEFORE privsep fix:
#2 GET self status -> HTTP 403 {"message":"Permission check failed (/vms/9001, VM.Audit)\n"}
# privsep fix:
$ pveum acl modify /vms/9001 -user 'felhom-ctl@pve' -role FelhomSelfBackup -> ok
$ pveum acl modify /storage/local -user 'felhom-ctl@pve' -role FelhomSelfBackup -> ok
# AFTER fix:
#1 GET /version -> HTTP 200
#2 GET /nodes/.../lxc/9001/status/current -> HTTP 200 {"data":{...,"status":"running",...}}
#5 GET /nodes/.../qemu/9000/status/current -> HTTP 403 (/vms/9000, VM.Audit)
#6 POST vzdump vmid=9000 -> HTTP 200 {"data":"UPID:...vzdump:9000:felhom-ctl@pve!ctl:"}
root poll: exitstatus="403 Permission check failed (/vms/9000, VM.Backup)"
task log: TASK ERROR: 403 Permission check failed (/vms/9000, VM.Backup)
/var/lib/vz/dump: no 9000 archive created
#7 POST /nodes/.../lxc (create CT vmid=9009) -> HTTP 403 {"message":"Permission check failed\n"}
#3 POST lxc/9001/snapshot snapname=spk1 -> HTTP 200 UPID:...vzsnapshot:9001...
root: exitstatus "OK" ; pct listsnapshot 9001 -> spk1 ; pct status 9001 -> running
#4 POST vzdump vmid=9001 storage=local mode=snapshot -> HTTP 200 UPID:...vzdump:9001...
root: exitstatus "OK"
token can read own task status: HTTP 200 {"...exitstatus":"OK"} # earlier poll TIMEOUTs were a shell-quoting bug in the helper, not a perms issue
# stop-mode self-backup (VM.PowerMgmt test):
$ token POST vzdump vmid=9001 storage=local mode=stop -> HTTP 200 UPID:...vzdump:9001...
root poll: exitstatus "OK" # SUCCEEDED without VM.PowerMgmt in the role
```
### 4.4 Phase 2 — backups
```
# Variant A (running):
$ vzdump 9001 --mode snapshot --storage local --compress zstd
INFO: Total bytes written: 2585589760 (2.5GiB, 108MiB/s)
INFO: archive file size: 934MB
INFO: Finished Backup of VM 9001 (00:00:24) ; WALL_SECONDS=25
-> vzdump-lxc-9001-2026_06_07-20_13_43.tar.zst (979718569 B)
# Variant B (stopped):
$ docker compose stop (cache,db,web Stopped)
$ vzdump 9001 --mode snapshot --storage local --compress zstd
INFO: Total bytes written: 2585825280 (2.5GiB, 119MiB/s)
INFO: Finished Backup of VM 9001 (00:00:21) ; WALL_SECONDS=22
-> vzdump-lxc-9001-2026_06_07-20_14_40.tar.zst (979671582 B)
$ docker compose start (db,cache,web Started)
```
### 4.5 Phase 2 — restores + verification
```
# A -> 9002:
$ pct restore 9002 .../20_13_43.tar.zst --storage local-lvm
Total bytes read: 2585589760 (2.5GiB, 524MiB/s) ; RESTORE_A_SECONDS=12
$ pct config 9002 -> features: nesting=1,keyctl=1 ; unprivileged: 1
$ pct set 9002 -net0 name=eth0,bridge=vmbr0,ip=dhcp # fresh MAC BC:24:11:E3:F4:64
$ pct start 9002 ; docker compose up -d -> 3 running ; curl -> HTTP 200
$ psql SELECT * FROM restore_check -> 42 | phase2-sentinel
db log: "was interrupted ... not properly shut down; automatic recovery in progress
redo starts/redo done ... database system is ready to accept connections"
# B -> 9003:
$ pct restore 9003 .../20_14_40.tar.zst --storage local-lvm
Total bytes read: 2585825280 (2.5GiB, 524MiB/s) ; RESTORE_B_SECONDS=11
$ pct config 9003 -> features: nesting=1,keyctl=1 ; unprivileged: 1
$ pct set 9003 -net0 ... (fresh MAC) ; pct start 9003 ; docker compose up -d -> 3 running ; curl 200
$ psql SELECT * FROM restore_check -> 42 | phase2-sentinel
db log: "database system was shut down at ... ; database system is ready to accept connections" # clean
```
---
## 5. Teardown (executed)
Restore targets destroyed; Phase 1 objects and spike artifacts removed; `9000`/`9001`
left **stopped-but-present**. Verified clean: `felhom-ctl@pve` deleted, no spike ACLs,
empty `dump/`, `spk1` removed.
> **Correction:** `pveum acl delete` **requires `--roles`** (a bare `-user`/`-token`
> path errors `400 roles: property is missing`). In practice the explicit ACL deletes
> are unnecessary — deleting the token/user/role **auto-invalidates** the referencing
> ACLs (PVE logs `ignore invalid acl token …` and drops them).
```bash
pct stop 9002 ; pct stop 9003 ; pct destroy 9002 --purge ; pct destroy 9003 --purge
# correct ACL-delete syntax (needs --roles), or just let user/role deletion clean them:
pveum acl delete /vms/9001 --roles FelhomSelfBackup --users 'felhom-ctl@pve'
pveum acl delete /vms/9001 --roles FelhomSelfBackup --tokens 'felhom-ctl@pve!ctl'
pveum acl delete /storage/local --roles FelhomSelfBackup --users 'felhom-ctl@pve'
pveum acl delete /storage/local --roles FelhomSelfBackup --tokens 'felhom-ctl@pve!ctl'
pveum user token remove felhom-ctl@pve ctl ; pveum user delete felhom-ctl@pve ; pveum role delete FelhomSelfBackup
pct delsnapshot 9001 spk1
rm -f /var/lib/vz/dump/vzdump-lxc-9001-*.tar.zst /var/lib/vz/dump/vzdump-lxc-9001-*.log
pct stop 9001 # back to stopped-but-present
```
## 6. To destroy 9000/9001 later (NOT run — left stopped-but-present)
```bash
qm destroy 9000 --purge # VM (Phase 0 subject)
pct destroy 9001 --purge # LXC (Phase 0/1/2 subject)
# Debian 13 CT template left in place: local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst
```
+234
@@ -0,0 +1,234 @@
# Phase 3 — vzdump exclusion (B2) & agent operator role + root boundary (B3): Findings
**Host:** `demo-felhom` (192.168.0.162) — Proxmox VE 9.2.2, node confirmed via
`pvesh get /nodes` → `demo-felhom`. **Date:** 2026-06-08. Throwaway resources (VMIDs
9010-9023, role/user `FelhomAgent`/`felhom-agent@pve`); all torn down (only the pre-existing
9000/9001 remain, stopped). Every Proxmox op polled to `task exitstatus` (not the POST
return).
> Validates the two items the design review (`_design-review.md`) flagged as unvalidated:
> **B2** (what vzdump includes/excludes per LXC mount type + how to keep bulk out) and **B3**
> (the least-privilege operator role + the root-vs-API boundary). Data only.
---
## B2 — vzdump inclusion/exclusion matrix
**Setup:** one unprivileged LXC `9010` (`nesting=1,keyctl=1`, overlayfs), Docker 29.5.3
installed, with six sentinel locations:
| # | location | config |
|---|---|---|
| 1 | rootfs file `/SENTINEL_ROOTFS` | rootfs (`local-lvm:8`) |
| 2 | Docker **named** volume `b2vol` → `SENTINEL_DOCKERVOL` | default driver |
| 3 | `mp1` volume mount `/mnt/mp1` → `SENTINEL_MP1` | `local-lvm:1,backup=1` |
| 4 | `mp2` volume mount `/mnt/mp2` → `SENTINEL_MP2` | `local-lvm:1,backup=0` |
| 5 | `mp3` **bind** mount `/mnt/mp3` → `SENTINEL_MP3` | host `/root/b2-bindsrc` |
| 6 | bulk Docker vol `bulkvol` bound onto mp2 → `SENTINEL_BULK` | `--driver local -o type=none -o o=bind -o device=/mnt/mp2` |
**The "trap" confirmed at setup:** the Docker named volume's on-disk path is
`/var/lib/docker/volumes/b2vol/_data`, which is **inside the LXC rootfs**.
### Result matrix (stop-mode vzdump → `local`, verified 3 ways: vzdump log, archive grep, restore to 9011)
| Sentinel | location | flag | **in archive?** | restored 9011 |
|---|---|---|---|---|
| `SENTINEL_ROOTFS` | rootfs | — | **INCLUDED** | present |
| `SENTINEL_DOCKERVOL` | Docker named vol (in rootfs) | — | **INCLUDED** ⚠️ the trap | present |
| `SENTINEL_MP1` | volume mp | `backup=1` | **INCLUDED** | present |
| `SENTINEL_MP2` | volume mp | `backup=0` | **EXCLUDED** | absent (vol recreated empty) |
| `SENTINEL_MP3` | bind mount | n/a | **EXCLUDED** | reappears via re-bind only¹ |
| `SENTINEL_BULK` | Docker vol on mp2 | `backup=0` | **EXCLUDED** | absent |
¹ The bind-mount **data is not in the archive** (archive grep shows no mp3 path). It
reappears in the restored 9011 only because `pct restore` preserves the bind config
`mp3: /root/b2-bindsrc` and re-attaches the **same host dir**. On a *different* host (true DR)
the bind data would be gone unless backed up separately — important for DR planning.
**vzdump log (verbatim) — the authoritative per-mount decision:**
```
INFO: including mount point rootfs ('/') in backup
INFO: including mount point mp1 ('/mnt/mp1') in backup
INFO: excluding volume mount point mp2 ('/mnt/mp2') from backup (disabled)
INFO: excluding bind mount point mp3 ('/mnt/mp3') from backup (not a volume)
```
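Since the log is the authoritative record, a controller could reduce it to a per-mount table and compare that against the intended placement. The sketch below assumes the exact `INFO:` wording shown in the log above; `mount_decisions` is a hypothetical helper name.

```bash
# mount_decisions: read a vzdump log on stdin and print one line per mount
# point: "<name> included" or "<name> excluded".
mount_decisions() {
  sed -n \
    -e 's/^INFO: including mount point \([^ ]*\) .* in backup$/\1 included/p' \
    -e 's/^INFO: excluding .* mount point \([^ ]*\) .* from backup.*$/\1 excluded/p'
}

# Usage on the host (log path is a placeholder):
#   mount_decisions < /var/lib/vz/dump/vzdump-lxc-9010-....log
```

The reason text in parentheses (`disabled`, `not a volume`) is dropped here; a stricter version could keep it to distinguish `backup=0` volumes from bind mounts.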
**Archive contents (verbatim) — `tar --zstd -tf … | grep SENTINEL`:**
```
./var/lib/docker/volumes/b2vol/_data/SENTINEL_DOCKERVOL
./SENTINEL_ROOTFS
./mnt/mp1/SENTINEL_MP1
```
**Restore verification (verbatim) — sentinels in restored 9011:**
```
PRESENT : /SENTINEL_ROOTFS
PRESENT : /var/lib/docker/volumes/b2vol/_data/SENTINEL_DOCKERVOL
PRESENT : /mnt/mp1/SENTINEL_MP1
ABSENT : /mnt/mp2/SENTINEL_MP2
ABSENT : /mnt/mp2/SENTINEL_BULK
PRESENT : /mnt/mp3/SENTINEL_MP3 # via re-bind to same host dir, NOT from archive
```
### Proven bulk-exclusion recipe
A "bulk" Docker volume is kept out of the guest vzdump by binding it onto a **volume
mountpoint with `backup=0`**:
1. Attach a Proxmox volume mountpoint with the flag:
`pct set <id> -mpN <storage>:<size>,mp=/mnt/bulk,backup=0`
2. Realize the Docker volume on that path:
`docker volume create --driver local -o type=none -o o=bind -o device=/mnt/bulk bulkvol`
(or a compose bind to `/mnt/bulk`).
3. Data written through `bulkvol` lands on the `backup=0` mountpoint → **excluded** from
vzdump, while rootfs/hot sentinels are **included**. Verified: `SENTINEL_BULK` absent from
archive and restore; `SENTINEL_ROOTFS` present.
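The archive check used to verify the recipe can be repeated after any backup. Below is a minimal sketch that takes the archive listing on stdin and asserts that one path is present and one name is absent. `verify_listing` is a hypothetical helper, and the sentinel names are the ones from this test.

```bash
# verify_listing INCLUDED_PATH EXCLUDED_NAME   (archive listing on stdin)
# Succeeds only if INCLUDED_PATH appears as an exact line and EXCLUDED_NAME
# appears nowhere in the listing.
verify_listing() {
  names=$(cat)
  printf '%s\n' "$names" | grep -qxF -- "$1" ||
    { echo "MISSING $1"; return 1; }
  if printf '%s\n' "$names" | grep -qF -- "$2"; then
    echo "LEAKED $2"
    return 1
  fi
  echo "OK"
}

# Usage on the host (archive path is a placeholder):
#   tar --zstd -tf <archive>.tar.zst | verify_listing ./SENTINEL_ROOTFS SENTINEL_BULK
```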
### The trap, stated for the placement component
`backup=<boolean>` is **only honoured for volume mount points** (confirmed: pct manpage +
vzdump log "excluding volume mount point … (disabled)"). A Docker **named volume uses the
default driver and lands in the rootfs**, which is **always backed up** — so a "bulk" volume
left as an ordinary named volume is **silently swept into the whole-guest image**. The
per-volume placement component **must** realize every `bulk` volume as a dedicated `backup=0`
mountpoint (or external bind mount), never a default named volume.
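The placement rule can be restated as a small classifier. This is a minimal sketch rather than the agent's code: `MountKind`, `Mount`, and `IncludedInVzdump` are hypothetical names, and the logic only mirrors the four vzdump log lines captured in this test.

```go
package main

import "fmt"

// MountKind is a hypothetical classification of where guest data lives.
type MountKind int

const (
	RootFS      MountKind = iota // container rootfs; default Docker named volumes land here
	VolumeMount                  // Proxmox-managed volume mount point (mpN on a storage)
	BindMount                    // host directory bind-mounted into the guest
)

// Mount is a hypothetical, simplified view of one mount point's config.
type Mount struct {
	Kind   MountKind
	Backup bool // the backup=<boolean> flag; only honoured for VolumeMount
}

// IncludedInVzdump mirrors the behaviour observed in the B2 vzdump log:
// rootfs is included, volume mount points follow the backup flag, and
// bind mounts are excluded whatever the flag says.
func IncludedInVzdump(m Mount) bool {
	switch m.Kind {
	case RootFS:
		return true
	case VolumeMount:
		return m.Backup
	default:
		return false
	}
}

func main() {
	fmt.Println("rootfs (default named volume):", IncludedInVzdump(Mount{Kind: RootFS}))
	fmt.Println("mp1 volume, backup=1:", IncludedInVzdump(Mount{Kind: VolumeMount, Backup: true}))
	fmt.Println("mp2 volume, backup=0:", IncludedInVzdump(Mount{Kind: VolumeMount, Backup: false}))
	fmt.Println("mp3 bind mount:", IncludedInVzdump(Mount{Kind: BindMount}))
}
```

A placement component could run a check of this shape before every backup and refuse to proceed if a volume tagged `bulk` resolves to `RootFS`.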
---
## B3 — agent operator role + root-vs-API boundary
**Caveat applied (Phase 1):** privsep token needs the role on **both** user and token. Setup:
user `felhom-agent@pve` + privsep token `agent`, role `FelhomAgent`, dual-granted at `/`.
All ops driven **as the token** via the REST API; task `exitstatus` polled.
> ⚠️ **Terminology:** the Phase-1 `FelhomSelfBackup` role is the discarded **guest-side
> self-backup** role (scoped to one guest, *denied* create/allocate). `FelhomAgent` here is
> its **operator-tier replacement** — a different, broader role. Do not conflate.
### Op matrix (as the scoped token)
| # | Operation | API call | Result |
|---|---|---|---|
| read | host status | `GET /nodes/$N/status` | **200** (needs `Sys.Audit`) |
| read | storage list | `GET /storage` | **200** (`Datastore.Audit`) |
| 1 | **create LXC, `nesting=1,keyctl=1`** | `POST /nodes/$N/lxc` | **403**: `changing feature flags (except nesting) is only allowed for root@pam` |
| 1 | create LXC, **nesting-only** | `POST /nodes/$N/lxc` | **200 / OK** |
| 2 | set config (mem/cpu/options + mountpoint w/ `backup` flag) | `PUT /nodes/$N/lxc/<id>/config` | **200** |
| 3 | allocate volume | `POST /nodes/$N/storage/local-lvm/content` | **200** (`Datastore.AllocateSpace`) |
| 4 | start | `POST …/status/start` | **OK** (`VM.PowerMgmt`) |
| 5 | stop | `POST …/status/stop` | **OK** |
| 6a | snapshot | `POST …/snapshot` | **OK** (`VM.Snapshot`) |
| 6b | rollback | `POST …/snapshot/s1/rollback` | **OK** (`VM.Snapshot.Rollback`) |
| 7 | stop-mode backup | `POST /nodes/$N/vzdump mode=stop` | **OK** (`VM.Backup`) |
| 8 | restore → fresh vmid | `POST /nodes/$N/lxc restore=1` | **OK** — and **restored CT kept `features: nesting=1,keyctl=1`** |
| 9 | destroy CT | `DELETE /nodes/$N/lxc/<id>?purge=1` | **OK** (`VM.Allocate`) |
| 9b | add storage definition (dir) | `POST /storage` | **200** (`Datastore.Allocate`, **no root**) |
**The two headline results:**
1. **`keyctl=1` on create is `root@pam`-only.** Verbatim:
`Permission check failed (changing feature flags (except nesting) is only allowed for root@pam)`.
Confirmed this is **not** token-fixable: a **non-privsep `root@pam` token** got the **same
403**. Only an actual `root@pam` session (OS root / `pct create` as root) can set it.
`nesting` alone is allowed for a scoped token.
2. **Restore preserves `keyctl`.** A token-authorized `vzrestore` of a keyctl archive produced
`9021` with `features: nesting=1,keyctl=1, unprivileged: 1`. So the **DR/restore path is
fully token-covered**; only *fresh provisioning* needs root for the keyctl flag.
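The first result implies a routing decision at provisioning time: a create request that touches any feature flag other than `nesting` has to take the OS-root path. A minimal sketch of that decision follows. `NeedsRootForCreate` is a hypothetical helper, and it is deliberately conservative: it treats any non-`nesting` flag as root-only, even when set to `0`, because that case was not tested here.

```go
package main

import (
	"fmt"
	"strings"
)

// NeedsRootForCreate reports whether a Proxmox `features` string such as
// "nesting=1,keyctl=1" names any feature flag other than nesting.
// Assumption: any such flag, whatever its value, triggers the root@pam-only check.
func NeedsRootForCreate(features string) bool {
	for _, kv := range strings.Split(features, ",") {
		kv = strings.TrimSpace(kv)
		if kv == "" {
			continue
		}
		name, _, _ := strings.Cut(kv, "=")
		if name != "nesting" {
			return true
		}
	}
	return false
}

func main() {
	for _, f := range []string{"nesting=1", "nesting=1,keyctl=1", ""} {
		fmt.Printf("%q needs root: %v\n", f, NeedsRootForCreate(f))
	}
}
```

Restores do not need this check, since the restored container kept its `keyctl` flag under the scoped token.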
### Paring (each drop shown to still pass, or proven needed)
| Privilege | Verdict | Evidence |
|---|---|---|
| `Datastore.AllocateTemplate` | **DROP** (unnecessary) | create-from-template succeeded without it (200/OK) |
| `Sys.Audit` | **KEEP** | `GET /nodes/$N/status` → **403** without it (host metrics, `03` §5) |
| `VM.Config.Network` | **KEEP** | create with `net0` → **403 (/vms/…, VM.Config.Network)** without it |
| `VM.Config.Options` | **KEEP** | config `onboot=1` → **403 (/vms/…, VM.Config.Options)** without it |
| `SDN.Use` | **KEEP (added vs review sketch)** | create → **403 (/sdn/zones/localnetwork/vmbr0, SDN.Use)** without it |
> Corrections to the review's candidate sketch: `VM.Config.CPUMemory` is **not a real
> privilege** — split into `VM.Config.CPU` + `VM.Config.Memory`. `SDN.Use` was **missing** and
> is **required** (PVE 9 gates bridge use behind it). `Datastore.AllocateTemplate` is **not
> needed**.
### Final minimal `FelhomAgent` role (proven sufficient for ops 1′–9b)
```
VM.Allocate VM.Audit VM.Config.Disk VM.Config.CPU VM.Config.Memory
VM.Config.Network VM.Config.Options VM.PowerMgmt VM.Snapshot VM.Snapshot.Rollback
VM.Backup Datastore.Allocate Datastore.AllocateSpace Datastore.Audit Sys.Audit SDN.Use
```
(16 privileges. `Datastore.Allocate` is for the storage-definition add; drop it if the agent
never creates Proxmox storage entries via the API. `VM.PowerMgmt` is for start/stop lifecycle
— not for the backup itself, consistent with `proxmox-platform.md` §3.4.)
### Root-vs-API boundary table (answers `03` §3)
| Agent host operation | Coverage | Notes |
|---|---|---|
| Create unprivileged LXC, **nesting-only** | **API token** | `VM.Allocate`+`VM.Config.*`+`Datastore.AllocateSpace`+`SDN.Use` |
| **Create with `keyctl=1` (Docker needs it — Phase 0)** | **OS root `root@pam`** (`pct create` as root / sudoers) | no API token works, incl. a root@pam token |
| Set config (mem/cpu/net/options/mountpoint + `backup` flag) | API token | |
| Allocate guest volume | API token | `Datastore.AllocateSpace` |
| Start / stop / snapshot / rollback | API token | `VM.PowerMgmt` / `VM.Snapshot(.Rollback)` |
| vzdump backup (stop/snapshot mode) | API token | `VM.Backup` |
| **Restore from vzdump (preserves keyctl)** | **API token** | DR path needs no root |
| Destroy guest (scratch + compensating rollback, B1) | API token | `VM.Allocate` |
| Add Proxmox **storage definition** (dir/nfs/cifs/pbs) | API token | `Datastore.Allocate`; the *definition* only |
| Host status / metrics report | API token | `Sys.Audit` |
| **USB physical mount-by-UUID / systemd mount unit / fstab** | **OS root / narrow sudoers** | not a Proxmox API op (host-level mount; not tested here) |
| **SMART / hardware sensors** | OS root | not API-exposed |
**Boundary summary:** nearly the entire guest lifecycle — including **restore** — is covered
by the scoped token. The genuine OS-root residual is narrow: **(1) fresh creation of a
Docker-capable LXC (the `keyctl` flag), (2) physical USB mount-by-UUID / systemd mount units /
fstab, (3) hardware/SMART.** This supports `03` §3's "non-root service + scoped token + narrow
sudoers" model — with the **specific** sudoers/root entries being: `pct create` (or just the
keyctl-setting step) and the host mount operations.
---
## Raw command log (appendix)
### B2
```
pct create 9010 ... --features nesting=1,keyctl=1 --unprivileged 1 # rootfs local-lvm:8
pct set 9010 -mp1 local-lvm:1,mp=/mnt/mp1,backup=1
pct set 9010 -mp2 local-lvm:1,mp=/mnt/mp2,backup=0
pct set 9010 -mp3 /root/b2-bindsrc,mp=/mnt/mp3
# docker named vol: docker volume inspect b2vol -> /var/lib/docker/volumes/b2vol/_data
# bulk: docker volume create --driver local -o type=none -o o=bind -o device=/mnt/mp2 bulkvol
vzdump 9010 --mode stop --storage local --compress zstd
# INFO: including mount point rootfs ('/') in backup
# INFO: including mount point mp1 ('/mnt/mp1') in backup
# INFO: excluding volume mount point mp2 ('/mnt/mp2') from backup (disabled)
# INFO: excluding bind mount point mp3 ('/mnt/mp3') from backup (not a volume)
tar --zstd -tf <archive> | grep SENTINEL # -> rootfs, dockervol, mp1 only
pct restore 9011 <archive> --storage local-lvm # -> mp2/bulk absent, mp3 via re-bind
```
### B3
```
pveum role add FelhomAgent -privs "VM.Allocate VM.Audit VM.Config.Disk VM.Config.CPU VM.Config.Memory VM.Config.Network VM.Config.Options VM.PowerMgmt VM.Snapshot VM.Snapshot.Rollback VM.Backup Datastore.Allocate Datastore.AllocateSpace Datastore.AllocateTemplate Datastore.Audit Sys.Audit" # candidate (pre-SDN)
pveum user add felhom-agent@pve ; pveum user token add felhom-agent@pve agent --privsep 1
pveum acl modify / -user 'felhom-agent@pve' -role FelhomAgent
pveum acl modify / -token 'felhom-agent@pve!agent' -role FelhomAgent
# token create with keyctl:
POST /nodes/demo-felhom/lxc ... features=nesting=1,keyctl=1
-> 403 "changing feature flags (except nesting) is only allowed for root@pam"
# + SDN.Use missing initially:
-> 403 "Permission check failed (/sdn/zones/localnetwork/vmbr0, SDN.Use)"
# root@pam non-privsep token, keyctl create:
-> 403 (same "only allowed for root@pam") # tokens never qualify
# token nesting-only create / config(PUT) / start / stop / snapshot / rollback /
# vzdump(stop) / restore->9021 (kept keyctl) / destroy / POST /storage -> all 200/OK
# paring:
GET /nodes/$N/status without Sys.Audit -> 403 (KEEP)
create net0 without VM.Config.Network -> 403 (KEEP)
config onboot=1 without VM.Config.Options -> 403 (KEEP)
create from template without Datastore.AllocateTemplate -> OK (DROP)
```
### Teardown
```
pct destroy 9010 9011 9021 --purge # 9020/9022/9023 already destroyed during tests
pveum user token remove felhom-agent@pve agent ; pveum user delete felhom-agent@pve
pveum role delete FelhomAgent # ACLs at / auto-invalidated
rm -f /var/lib/vz/dump/vzdump-lxc-9010-* /var/lib/vz/dump/vzdump-lxc-9020-*
# verified: only 9000/9001 remain (stopped-but-present); no felhom-agent user/role; dump dir empty
```
@@ -0,0 +1,257 @@
# Phase 4 — Control-plane signing primitive (SSHSIG + Go verify): Findings
**Where run:** build server `192.168.0.180` (Debian 13, **Go 1.24.4**, **OpenSSH 10.0p2**),
no Proxmox. **Date:** 2026-06-08. Throwaway key generated, used, and **deleted** — no private
key, passphrase, or `.sig` committed.
> De-risks the signing primitive *before* it is written into `04-control-plane-authorization.md`
> or the agent's verify code. **Verdict up front: the approach works cleanly and is key-type-
> agnostic — no fallback needed.** Go verifies the armored `SSHSIG` format, every tamper/replay/
> authorization case is rejected, and a synthetic FIDO2 `sk-ssh-ed25519` signature verifies
> through the **unchanged** code path (true hardware drop-in).
---
## 0. Result at a glance — 14/14 checks pass
```
== Step 2: SSHSIG signature verification (key-type-agnostic path) ==
PASS correct verified, op="guest_destroy"
PASS wrong key rejected: signer not in allowed set
PASS tampered blob rejected: signature invalid: ssh: signature did not verify
PASS wrong namespace rejected: namespace mismatch: got "felhom-op-wrong" want "felhom-op-v1"
== Step 3: anti-replay / authorization (valid signature, still rejected) ==
PASS first use verified, op="guest_destroy"
PASS replay (same nonce) rejected: replay: nonce a1b2c3d4...8f90 already seen
PASS expired rejected: expired (expires_at=2020-01-02 ..., now=2026-06-08 ...)
PASS not-yet-valid rejected: not yet valid (issued_at=2030-01-01 ...)
PASS retargeted host rejected: target mismatch: blob=demo-felhom/9001 this=other-host/9001
PASS retargeted guest rejected: target mismatch: blob=demo-felhom/9001 this=demo-felhom/8888
== Step 4: key-type-agnosticism — FIDO2 sk-ssh-ed25519 (synthetic, no device) ==
PASS parses sk pubkey type="sk-ssh-ed25519@openssh.com"
PASS authorized_keys form sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5...
PASS sk end-to-end verify verified, op="guest_destroy"
```
---
## 1. Software round-trip (baseline, CLI)
- Key: `ssh-keygen -t ed25519 -f felhom-op -N '<passphrase>' -C felhom-operator`.
(Signing non-interactively used an `SSH_ASKPASS` helper + `setsid -w`; in production the
operator key lives behind an agent or a FIDO2 device, so the at-sign passphrase prompt is a
non-issue. The passphrase mechanics are **not** what this spike de-risks.)
- Sign with a **domain-separated namespace**:
`ssh-keygen -Y sign -f felhom-op -n felhom-op-v1 blob.json` → `blob.json.sig`
(armored `-----BEGIN SSH SIGNATURE-----`).
- Baseline verify (CLI sanity) with an allow-list:
```
allowed_signers: felhom-operator namespaces="felhom-op-v1" ssh-ed25519 AAAAC3...
$ ssh-keygen -Y verify -f allowed_signers -I felhom-operator -n felhom-op-v1 \
-s blob.json.sig < blob.json
Good "felhom-op-v1" signature for felhom-operator with ED25519 key SHA256:y0Lj8dIYTM6...
```
## 2. Canonical op blob spec (documented)
The signature covers **these exact bytes**; the operator CLI (also Go) must reproduce them
byte-for-byte. **Canonical form: JSON, keys sorted lexicographically at every level, no
insignificant whitespace, no trailing newline, UTF-8.**
```json
{"expires_at":"<RFC3339 UTC>","issued_at":"<RFC3339 UTC>","key_id":"<id>","nonce":"<128-bit hex>","op":"<op>","params":{...},"target":{"guest_id":"<vmid>","host_id":"<node>"}}
```
| field | meaning |
|---|---|
| `op` | the operation, e.g. `guest_destroy`, `storage_detach`, `restore_overwrite` |
| `target.host_id` / `target.guest_id` | the box + guest the op is bound to (anti-retarget) |
| `params` | op-specific arguments (themselves canonical-sorted) |
| `nonce` | unique per op (anti-replay); ≥128-bit random |
| `issued_at` / `expires_at` | validity window (short — minutes) |
| `key_id` | which operator key (for rotation / audit) |
Exact test blob (236 bytes): `{"expires_at":"2026-06-09T00:00:00Z","issued_at":"2026-06-08T00:00:00Z","key_id":"felhom-op-1","nonce":"a1b2c3d4e5f60718293a4b5c6d7e8f90","op":"guest_destroy","params":{"purge":true},"target":{"guest_id":"9001","host_id":"demo-felhom"}}`
> Note: the SSHSIG **namespace** (`felhom-op-v1`) is the cryptographic domain separator and is
> a **fixed constant in the verifier**, never caller-supplied — a signature minted for any
> other namespace must not verify (proven: "wrong namespace" rejected).
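
The canonical form in this section can be produced with the Go standard library alone. The sketch below is illustrative, not the operator CLI: `Canonicalize` is a hypothetical helper, and it relies on `encoding/json` emitting map keys in sorted order with no extra whitespace. String escaping follows Go's encoder, which is an assumption; the spec above does not pin escaping down.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Canonicalize re-encodes JSON with keys sorted at every level, no
// insignificant whitespace, and no trailing newline.
func Canonicalize(in []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(in))
	dec.UseNumber() // keep numbers as written instead of passing through float64
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // do not rewrite <, >, & as \u00XX escapes
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encoder.Encode appends a newline; the canonical form has none.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	// Same content as the test blob, with keys shuffled and whitespace added.
	loose := []byte(`{
  "target": {"host_id": "demo-felhom", "guest_id": "9001"},
  "op": "guest_destroy",
  "params": {"purge": true},
  "nonce": "a1b2c3d4e5f60718293a4b5c6d7e8f90",
  "key_id": "felhom-op-1",
  "issued_at": "2026-06-08T00:00:00Z",
  "expires_at": "2026-06-09T00:00:00Z"
}`)
	out, err := Canonicalize(loose)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: %s\n", len(out), out)
}
```

Signer and verifier should both hash the canonical bytes exactly as transmitted; the verifier in particular should not re-canonicalize before checking the signature.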
## 3. Go SSHSIG verify — approach + implementation cost
**It is not a one-call verify, but it is clean — no hand-rolled crypto.** The only manual work
is SSHSIG *framing*; all crypto and key-type dispatch is the library's. Steps:
1. `pem.Decode` the armor → `block.Type == "SSH SIGNATURE"`, `block.Bytes` is the binary SSHSIG.
*(Go's `encoding/pem` parses the armor directly — no manual base64/line handling.)*
2. Strip the literal 6-byte `SSHSIG` magic preamble (it is **not** length-prefixed).
3. `ssh.Unmarshal` the rest into a struct `{Version uint32; PublicKey, Namespace, Reserved,
HashAlgo, Signature string}` — library does the SSH wire parsing.
4. `ssh.ParsePublicKey([]byte(PublicKey))` → an `ssh.PublicKey`.
5. Recompute the signed data per spec: `"SSHSIG" || string(namespace) || string(reserved) ||
string(hash_algorithm) || string(H(message))`, where `H` is the **named** hash
(`sha256`/`sha512`) — built with one `ssh.Marshal`.
6. `ssh.Unmarshal([]byte(Signature))` into `ssh.Signature`, then **`pub.Verify(signed, &sig)`** —
which **dispatches on the key's own algorithm** (this is what makes it key-agnostic).
**Cost verdict:** ~40 lines of framing in one file, zero crypto implemented by us. Well within
the agent's budget; **no reason to fall back** to a different primitive.
## 4. Anti-replay / authorization layer (on top of signature validity)
Enforced in `VerifySignedOp` *after* the signature check, each proven to reject **even with a
valid signature** (Step 3 output above):
- **replay**: nonce already recorded in the window → reject;
- **expired / not-yet-valid**: `now ∉ [issued_at, expires_at]` → reject (both sides shown);
- **retargeted**: `target.host_id`/`guest_id` ≠ this box/guest → reject (both shown).
(Order matters: signature → namespace → allow-list → crypto verify → target → time → nonce, so
a replayed *but otherwise valid* op is still caught, and an invalid sig never consumes a nonce.)
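The nonce check needs a store behind the `SeenOrRecord(nonce, exp)` method used by the verifier. A minimal in-memory sketch follows. `MemNonceStore` and `replayDemo` are hypothetical names; a real agent would presumably need a persistent store, since an in-memory map forgets its nonces on restart.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// MemNonceStore is a hypothetical in-memory nonce store.
type MemNonceStore struct {
	mu   sync.Mutex
	seen map[string]time.Time // nonce -> expiry of the op that carried it
	now  func() time.Time     // injectable clock, handy in tests
}

func NewMemNonceStore() *MemNonceStore {
	return &MemNonceStore{seen: make(map[string]time.Time), now: time.Now}
}

// SeenOrRecord returns true if the nonce was already recorded (a replay);
// otherwise it records the nonce until exp and returns false.
func (s *MemNonceStore) SeenOrRecord(nonce string, exp time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	now := s.now()
	// Forget nonces whose op has expired: the time-window check, which runs
	// before the nonce check, already rejects those ops.
	for n, e := range s.seen {
		if now.After(e) {
			delete(s.seen, n)
		}
	}
	if _, dup := s.seen[nonce]; dup {
		return true
	}
	s.seen[nonce] = exp
	return false
}

// replayDemo records the same nonce twice and returns both results.
func replayDemo() (first, second bool) {
	store := NewMemNonceStore()
	exp := time.Now().Add(5 * time.Minute)
	first = store.SeenOrRecord("a1b2c3d4e5f60718293a4b5c6d7e8f90", exp)
	second = store.SeenOrRecord("a1b2c3d4e5f60718293a4b5c6d7e8f90", exp)
	return first, second
}

func main() {
	first, second := replayDemo()
	fmt.Println("first use seen:", first)
	fmt.Println("replay seen:", second)
}
```

Pruning by expiry keeps the map bounded by the number of ops issued within one validity window, which is why a short `expires_at` matters.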
## 5. Key-type-agnosticism — **TRUE DROP-IN** (no box change for FIDO2 later)
No FIDO2 device was used (by choice). Instead the spike **emulated the authenticator exactly**:
- Synthesized a well-formed `sk-ssh-ed25519@openssh.com` public key; `ssh.ParsePublicKey` parses
it and `ssh.MarshalAuthorizedKey` round-trips it.
- Constructed a real `SSHSIG` whose inner signature follows the sk scheme (per OpenSSH
`PROTOCOL.u2f`): `ed25519` over `sha256(application) || flags || counter || sha256(signed_data)`,
with the blob `string(format) string(ed25519_sig) byte(flags) uint32(counter)` — i.e. exactly
what a FIDO2 key emits.
- Ran it through the **unchanged `VerifySignedOp`** → **verified** (`op="guest_destroy"`).
**Verdict: true drop-in.** `pub.Verify` for `sk-ssh-ed25519` is implemented in
`golang.org/x/crypto/ssh` **v0.52.0** (it reconstructs `appDigest‖flags‖counter‖dataDigest` and
`ed25519.Verify`s it). Introducing a hardware operator key later is a **no-op on the boxes**:
the agent's verify code is identical; only the operator's signer key (and the allowed-signers
set entry) changes. No sk-specific handler is needed.
> Because verification dispatches on the key type embedded in the signature, the same path also
> accepts `ssh-ed25519`, `rsa-sha2-*`, `ecdsa-sha2-*`, etc. — algorithm choice is the operator's,
> not the agent's.
## 6. Fallback (not taken) and its cost
A fallback would be a **raw Ed25519 detached signature** (or `minisign`): trivially one
`ed25519.Verify` call, no SSHSIG framing. **Rejected** because it **loses the clean FIDO2 path**:
a raw-Ed25519 verifier cannot consume an `sk-ssh-ed25519` signature (which carries flags+counter
and a different signed-data construction), so the future hardware swap would require **changing
the verifier on every box**. SSHSIG buys exactly the key-type-agnosticism (§5) that a raw scheme
forfeits, at a one-file framing cost (§3). **No fallback is warranted.**
## 7. Reference verifier (seed of the agent's verify code)
Verified working on Go 1.24.4 / `x/crypto` v0.52.0. (Test harness omitted; this is the verify
core + SSHSIG framing + anti-replay/authz.)
```go
// Imports required by this excerpt (package clause and test harness omitted).
import (
	"bytes"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/json"
	"encoding/pem"
	"errors"
	"fmt"
	"hash"
	"time"

	"golang.org/x/crypto/ssh"
)

const Namespace = "felhom-op-v1" // FIXED domain separator, never caller-supplied
const sshsigMagic = "SSHSIG"

// Target binds an op to one host and one guest (anti-retarget).
type Target struct {
	HostID  string `json:"host_id"`
	GuestID string `json:"guest_id"`
}

// OpBlob is the decoded canonical op blob (§2).
type OpBlob struct {
	Op        string          `json:"op"`
	Target    Target          `json:"target"`
	Params    json.RawMessage `json:"params"`
	Nonce     string          `json:"nonce"`
	IssuedAt  time.Time       `json:"issued_at"`
	ExpiresAt time.Time       `json:"expires_at"`
	KeyID     string          `json:"key_id"`
}

// NonceStore returns true if the nonce was already seen; otherwise it records it.
type NonceStore interface {
	SeenOrRecord(nonce string, exp time.Time) bool
}

// sshsigBlob mirrors the SSHSIG wire fields that follow the 6-byte magic.
type sshsigBlob struct {
	Version                                             uint32
	PublicKey, Namespace, Reserved, HashAlgo, Signature string
}

func hashByName(n string) (hash.Hash, error) {
	switch n {
	case "sha256":
		return sha256.New(), nil
	case "sha512":
		return sha512.New(), nil
	}
	return nil, fmt.Errorf("unsupported SSHSIG hash %q", n)
}

// parseArmoredSSHSIG decodes the PEM armor, checks the magic preamble and the
// version, and unmarshals the remaining SSH wire fields.
func parseArmoredSSHSIG(armored []byte) (*sshsigBlob, error) {
	block, _ := pem.Decode(armored)
	if block == nil || block.Type != "SSH SIGNATURE" {
		return nil, errors.New("not an SSH SIGNATURE armor")
	}
	if len(block.Bytes) < 6 || string(block.Bytes[:6]) != sshsigMagic {
		return nil, errors.New("missing SSHSIG magic")
	}
	var sb sshsigBlob
	if err := ssh.Unmarshal(block.Bytes[6:], &sb); err != nil {
		return nil, err
	}
	if sb.Version != 1 {
		return nil, fmt.Errorf("bad version %d", sb.Version)
	}
	return &sb, nil
}

// signedData rebuilds the bytes that were actually signed:
// "SSHSIG" || namespace || reserved || hash_algorithm || H(message).
func signedData(sb *sshsigBlob, msg []byte) ([]byte, error) {
	h, err := hashByName(sb.HashAlgo)
	if err != nil {
		return nil, err
	}
	h.Write(msg)
	md := h.Sum(nil)
	body := ssh.Marshal(struct {
		Namespace, Reserved, HashAlgo string
		Hash                          []byte
	}{sb.Namespace, sb.Reserved, sb.HashAlgo, md})
	return append([]byte(sshsigMagic), body...), nil
}

// VerifySignedOp: key-type-agnostic signature verify + anti-replay/authorization.
// allowedSigners is the trusted operator set (one key now; a quorum set later).
func VerifySignedOp(blob, sigArmored []byte, allowedSigners []ssh.PublicKey,
	thisHostID, thisGuestID string, seenNonces NonceStore) (string, error) {
	sb, err := parseArmoredSSHSIG(sigArmored)
	if err != nil {
		return "", err
	}
	if sb.Namespace != Namespace {
		return "", fmt.Errorf("namespace mismatch: got %q want %q", sb.Namespace, Namespace)
	}
	pub, err := ssh.ParsePublicKey([]byte(sb.PublicKey))
	if err != nil {
		return "", err
	}
	// Allow-list check: the embedded key must be one of the trusted signers.
	allowed := false
	for _, a := range allowedSigners {
		if bytes.Equal(a.Marshal(), pub.Marshal()) {
			allowed = true
			break
		}
	}
	if !allowed {
		return "", errors.New("signer not in allowed set")
	}
	signed, err := signedData(sb, blob)
	if err != nil {
		return "", err
	}
	var inner ssh.Signature
	if err := ssh.Unmarshal([]byte(sb.Signature), &inner); err != nil {
		return "", err
	}
	if err := pub.Verify(signed, &inner); err != nil { // dispatches on key algorithm
		return "", fmt.Errorf("signature invalid: %w", err)
	}
	// Signature is valid; now enforce target binding, time window, and nonce.
	var op OpBlob
	if err := json.Unmarshal(blob, &op); err != nil {
		return "", err
	}
	if op.Target.HostID != thisHostID || op.Target.GuestID != thisGuestID {
		return "", fmt.Errorf("target mismatch")
	}
	now := time.Now().UTC()
	if now.Before(op.IssuedAt) {
		return "", errors.New("not yet valid")
	}
	if now.After(op.ExpiresAt) {
		return "", errors.New("expired")
	}
	if seenNonces.SeenOrRecord(op.Nonce, op.ExpiresAt) {
		return "", fmt.Errorf("replay: nonce %s already seen", op.Nonce)
	}
	return op.Op, nil
}
```
## 8. Inputs to the design doc (`04-control-plane-authorization.md`)
- **Primitive confirmed:** SSHSIG (`ssh-keygen -Y sign` / armored `BEGIN SSH SIGNATURE`),
verified in Go via `pem.Decode` + `ssh.Unmarshal` + `ssh.ParsePublicKey` + `pub.Verify`. Low
implementation cost; no crypto hand-rolled.
- **Hub cannot forge:** the operator private key never touches the hub; the hub only queues the
opaque armored blob (matches `03` §4).
- **Key-type-agnostic / hardware-ready:** software `ed25519` now, FIDO2 `sk-ssh-ed25519` later is
a **box no-op** (proven end-to-end). The verifier hardcodes neither key type nor algorithm.
- **`allowedSigners` is a set:** single signer today; **threshold/quorum is just set sizing** plus
an N-of-M policy on top (out of scope here).
- **Anti-replay/authz are mandatory and cheap:** namespace (fixed), allow-list, then crypto,
then target-binding, time-window, nonce — all enforced and tested.
- **Canonical blob (§2)** is the shared contract between the operator CLI and the agent verifier.
@@ -1,5 +1,33 @@
# Felhom Hub — Changelog
## Repo docs — no hub version change (2026-06-08)
### Changed
- **Reflowed `felhom.eu/CLAUDE.md`** — removed hard mid-paragraph line wraps (prose, list items, blockquotes now single-line); tables untouched; rendered output unchanged.
- **Unified the REPORT/CHANGELOG convention**: this repo's `REPORT.md` switches from *append/cumulative* to **overwrite-latest** (uniform with the sibling repos); `CHANGELOG.md` (this file) stays the cumulative log, newest on top. Updated `REPORT.md`'s header note accordingly (existing sections retained as history). Added an explicit **no-secrets** rule. No hub code change → no version bump.
## v0.7.1 (2026-06-08)
### Changed
- **`/host-report` rejects oversize bodies explicitly with 413** (`handler.go`) instead of silently truncating at the 4 MiB `LimitReader` cap. Reads one byte past `maxHostReportBytes` and returns `413 Payload too large` — a truncated-but-valid JSON could otherwise be accepted as a partial report (silently dropping guests from the mirror). The controller `handleReport` 1 MiB path is **unchanged** (frozen until slice-10 cutover).
### Added
- **Cross-repo contract fixture** `hub/internal/api/testdata/host-report.golden.json` (byte-identical with felhom-agent's copy) + `TestHostReport_GoldenContract` — POSTs the golden through the real `handleHostReport` and asserts 200 + denorm (`guest_total`/`guest_running`/`cloudflared_status`) + both guests upserted, proving `hostReportPayload` still extracts the contract from the real shape. Duplicated contract (no shared types module yet); revisit at slices 5/6.
## v0.7.0 (2026-06-08)
### Added — host-domain ingest (slice 3, additive; controller path untouched)
- **New tables** `hosts`, `guests`, `host_reports` (`store.go migrate()`, idempotent). Full schema now, including columns **inert until slice 10** (`hosts.desired_json`/`desired_generation`/`dr_record_json`, `guests.api_key`/`desired_spec_json`) so the cutover needs no `ALTER`. Nothing reads/writes the inert columns this slice.
- **`POST /api/v1/host-report`** — the agent's heartbeat. Per-host Bearer auth; 4 MiB body; persists the full report + denormalized fields (cpu/mem/disk %, guest counts, cloudflared status); upserts each guest's **reality** columns (`guest_id = "<host_id>/<vmid>"`, hub-derived); returns the control envelope `{status, poll_interval_seconds:900, blocked, desired_generation:0, has_signed_ops:false}` (`blocked` reflects the customer's status; the latter two are reserved/placeholder for slice 4).
- **Per-host key auth**: `checkAuthHost` (Bearer → host → customer), added alongside the unchanged `checkAuthCustomer`. Global key remains a bootstrap fallback.
- **`POST /api/v1/admin/hosts`** — **PROVISIONAL** global-key-only host mint (host_id + per-host api_key); the slice-3 bootstrap until enrollment (slices 7–8) replaces it.
- **Host dead-man's-switch**: `monitor.HostStalenessChecker` over `host_reports`, emitting `host_stale`/`host_down`/`host_recovered` (30m/60m), attributed to the host's customer; registered in `allowedEventTypes`; wired in `cmd/hub/main.go` on the existing 60s ticker. A deliberate **sibling** of the controller `StalenessChecker` (both run until slice 10).
- **Store methods**: `GetHostByAPIKey`, `GetHost`, `ListHosts`, `UpsertHost`, `SaveHostReport`, `UpsertGuestFromReport` (preserves inert columns on conflict), `GetHostStaleness` (skips never-reported hosts), `GuestID`. `Prune` now also prunes `host_reports` (same retention).
- **Tests** (new, hermetic): store, auth (`checkAuthHost`), ingest (valid+envelope+denorm, host_id mismatch→403, unknown-host-under-global→400, blocked→true, oversize→400), admin mint (non-global→403, unknown customer→400, mint+round-trip), host staleness transitions.
### Unchanged (explicit)
- The controller path — `/api/v1/report`, `reports`, `customer_configs`, `checkAuthCustomer`, the existing staleness/deadline checkers — is untouched and still green. The old controller and the new agent report in parallel during slices 3–9; the schema/auth cutover is **slice 10**.
## v0.6.2 (2026-02-26)
### Added
@@ -206,6 +206,9 @@ func main() {
	// Staleness checker — runs every 60s
	stalenessChecker := monitor.NewStalenessChecker(dataStore, staleThreshold, dispatcher.ProcessEvent, logger)
	// v0.7.0: host-domain dead-man's-switch (sibling; the controller checker above is
	// unchanged and keeps running until the slice-10 cutover). Same 60s cadence.
	hostStalenessChecker := monitor.NewHostStalenessChecker(dataStore, staleThreshold, dispatcher.ProcessEvent, logger)
	go func() {
		ticker := time.NewTicker(60 * time.Second)
		defer ticker.Stop()
@@ -215,6 +218,7 @@ func main() {
				return
			case <-ticker.C:
				stalenessChecker.Check()
				hostStalenessChecker.Check()
			}
		}
	}()
@@ -89,6 +89,30 @@ func (h *Handler) checkAuthCustomer(r *http.Request) (customerID string, isGloba
	return cfg.CustomerID, false, true
}
// checkAuthHost resolves a Bearer token to a HOST identity (the agent's auth
// path). It is a sibling of checkAuthCustomer — the controller path is unchanged.
//   - global key   -> ("", "", true, true)   caller trusts body.host_id
//   - per-host key -> (hostID, customerID, false, true)
//   - failure      -> ("", "", false, false)
func (h *Handler) checkAuthHost(r *http.Request) (hostID, customerID string, isGlobal, ok bool) {
	auth := r.Header.Get("Authorization")
	if !strings.HasPrefix(auth, "Bearer ") {
		return "", "", false, false
	}
	token := strings.TrimPrefix(auth, "Bearer ")
	// Global key first (same constant-time compare as checkAuthCustomer).
	if h.apiKey != "" && subtle.ConstantTimeCompare([]byte(token), []byte(h.apiKey)) == 1 {
		return "", "", true, true
	}
	host, err := h.store.GetHostByAPIKey(token)
	if err != nil || host == nil {
		return "", "", false, false
	}
	return host.HostID, host.CustomerID, false, true
}
// ServeHTTP routes API requests.
func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	path := strings.TrimPrefix(r.URL.Path, "/api/v1")
@@ -96,6 +120,10 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	switch {
	case r.Method == http.MethodPost && path == "/report":
		h.handleReport(w, r)
	case r.Method == http.MethodPost && path == "/host-report":
		h.handleHostReport(w, r)
	case r.Method == http.MethodPost && path == "/admin/hosts":
		h.handleAdminCreateHost(w, r)
	case r.Method == http.MethodPost && path == "/event":
		h.handleEvent(w, r)
	case r.Method == http.MethodPost && path == "/notify":
@@ -194,6 +222,203 @@ func (h *Handler) handleReport(w http.ResponseWriter, r *http.Request) {
	json.NewEncoder(w).Encode(resp)
}
// defaultHostPollSeconds is the cadence the hub hands every agent this slice (no
// per-host override UI yet — that is a later slice).
const defaultHostPollSeconds = 900
// maxHostReportBytes bounds a host-report body. Larger than the controller path's
// 1 MiB because host reports carry the full guest list + (later) storage/backup
// arrays. We read one byte past it and reject explicitly (413) rather than letting
// LimitReader silently truncate — a truncated-but-valid JSON would otherwise be
// accepted as a partial report, dropping guests from the mirror.
const maxHostReportBytes = 4 << 20 // 4 MiB
// hostReportPayload is the subset of the agent host-report (slice-3 contract,
// §3 / agent spec §4) the hub needs for denorm + guest reality. Unknown fields
// (storage_targets/backups/restore_tests/pbs_snapshots/audit_tail) are ignored,
// so an empty or absent collection is accepted without error.
type hostReportPayload struct {
	HostID       string `json:"host_id"`
	AgentVersion string `json:"agent_version"`
	Host         struct {
		CPUPercent    float64 `json:"cpu_percent"`
		MemoryPercent float64 `json:"memory_percent"`
		DiskPercent   float64 `json:"disk_percent"`
	} `json:"host"`
	Guests []struct {
		VMID              int    `json:"vmid"`
		Name              string `json:"name"`
		Status            string `json:"status"`
		ControllerVersion string `json:"controller_version"`
	} `json:"guests"`
	Cloudflared struct {
		Status string `json:"status"`
	} `json:"cloudflared"`
}
// handleHostReport ingests the agent's host-report (the heartbeat) and returns the
// control envelope (agent spec §5).
func (h *Handler) handleHostReport(w http.ResponseWriter, r *http.Request) {
hostID, custID, isGlobal, ok := h.checkAuthHost(r)
if !ok {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
body, err := io.ReadAll(io.LimitReader(r.Body, maxHostReportBytes+1))
if err != nil {
http.Error(w, "Bad request", http.StatusBadRequest)
return
}
if len(body) > maxHostReportBytes {
http.Error(w, "Payload too large", http.StatusRequestEntityTooLarge)
return
}
var rep hostReportPayload
if err := json.Unmarshal(body, &rep); err != nil || rep.HostID == "" {
http.Error(w, "Invalid payload: host_id required", http.StatusBadRequest)
return
}
if isGlobal {
// Global-key bootstrap: trust body.host_id but require the host to exist
// (it must be minted first) and resolve its customer from the row.
host, err := h.store.GetHost(rep.HostID)
if err != nil {
h.logger.Printf("[ERROR] host lookup failed for %s: %v", rep.HostID, err)
http.Error(w, "Internal error", http.StatusInternalServerError)
return
}
if host == nil {
http.Error(w, "Unknown host_id (mint via /admin/hosts first)", http.StatusBadRequest)
return
}
hostID, custID = rep.HostID, host.CustomerID
} else if rep.HostID != hostID {
http.Error(w, "Forbidden: host_id mismatch", http.StatusForbidden)
return
}
running := 0
for _, g := range rep.Guests {
if g.Status == "running" {
running++
}
}
denorm := store.HostReportDenorm{
AgentVersion: rep.AgentVersion,
CPUPercent: rep.Host.CPUPercent,
MemoryPercent: rep.Host.MemoryPercent,
DiskPercent: rep.Host.DiskPercent,
GuestTotal: len(rep.Guests),
GuestRunning: running,
CloudflaredStatus: rep.Cloudflared.Status,
}
if err := h.store.SaveHostReport(hostID, custID, body, denorm); err != nil {
h.logger.Printf("[ERROR] Failed to save host-report from %s: %v", hostID, err)
http.Error(w, "Internal error", http.StatusInternalServerError)
return
}
for _, g := range rep.Guests {
status := g.Status
if status == "" {
status = "unknown"
}
guest := &store.Guest{
GuestID: store.GuestID(hostID, g.VMID),
CustomerID: custID,
HostID: hostID,
VMID: g.VMID,
DisplayName: g.Name,
Status: status,
ControllerVersion: g.ControllerVersion,
}
if err := h.store.UpsertGuestFromReport(guest); err != nil {
// A guest upsert failure must not drop the whole report (liveness).
h.logger.Printf("[WARN] Failed to upsert guest %s: %v", guest.GuestID, err)
}
}
h.logger.Printf("[INFO] host-report from %s (%d guests, %d bytes)", hostID, len(rep.Guests), len(body))
blocked := false
if cc, err := h.store.GetCustomerConfig(custID); err == nil && cc != nil && cc.Status == "blocked" {
blocked = true
}
resp := map[string]interface{}{
"status": "ok",
"poll_interval_seconds": defaultHostPollSeconds,
"blocked": blocked,
"desired_generation": 0, // reserved (slice 4)
"has_signed_ops": false, // reserved (slice 4)
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
json.NewEncoder(w).Encode(resp)
}
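The oversize check in handleHostReport depends on asking the LimitReader for one byte more than the limit, so a body that fits exactly can be told apart from one that was cut off. A minimal standalone sketch of that pattern follows. The names `readBounded`, `readBoundedString`, and `errTooLarge` are illustrative assumptions and do not appear in the hub's code.

```go
package main

import (
	"errors"
	"io"
	"strings"
)

// errTooLarge is a hypothetical sentinel; the handler above answers 413 instead.
var errTooLarge = errors.New("payload too large")

// readBounded reads at most limit bytes from r. It asks the LimitReader for
// limit+1 bytes: if that extra byte arrives, the body exceeded the limit and
// is rejected rather than silently truncated.
func readBounded(r io.Reader, limit int64) ([]byte, error) {
	body, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(body)) > limit {
		return nil, errTooLarge
	}
	return body, nil
}

// readBoundedString is a convenience wrapper for trying the helper on literals.
func readBoundedString(s string, limit int64) (string, error) {
	b, err := readBounded(strings.NewReader(s), limit)
	return string(b), err
}
```

With a limit of 4, `readBoundedString("abcd", 4)` succeeds while `readBoundedString("abcde", 4)` returns `errTooLarge`. Reading exactly `limit` bytes instead would make those two cases indistinguishable.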
// handleAdminCreateHost mints a host identity (host_id + per-host api_key).
//
// PROVISIONAL (slice-3 bootstrap): global-key only, so the demo agent can
// authenticate before enrollment (slices 7–8) exists. Enrollment will mint host
// identity + pin signing keys; this endpoint should be removed/locked down then
// (tracked under doc 05 §11 auth-tightening at cutover).
func (h *Handler) handleAdminCreateHost(w http.ResponseWriter, r *http.Request) {
_, _, isGlobal, ok := h.checkAuthHost(r)
if !ok || !isGlobal {
http.Error(w, "Forbidden: global key required", http.StatusForbidden)
return
}
body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20))
if err != nil {
http.Error(w, "Bad request", http.StatusBadRequest)
return
}
var req struct {
CustomerID string `json:"customer_id"`
HostID string `json:"host_id"`
DisplayName string `json:"display_name"`
}
if err := json.Unmarshal(body, &req); err != nil || req.CustomerID == "" {
http.Error(w, "Invalid payload: customer_id required", http.StatusBadRequest)
return
}
cc, err := h.store.GetCustomerConfig(req.CustomerID)
if err != nil {
http.Error(w, "Internal error", http.StatusInternalServerError)
return
}
if cc == nil {
http.Error(w, "Unknown customer_id", http.StatusBadRequest)
return
}
hostID := req.HostID
if hostID == "" {
sfx, err := configgen.RandomHex(3) // 6 hex chars — human-legible for the demo
if err != nil {
http.Error(w, "Internal error", http.StatusInternalServerError)
return
}
hostID = req.CustomerID + "-" + sfx
}
apiKey, err := configgen.RandomHex(32)
if err != nil {
http.Error(w, "Internal error", http.StatusInternalServerError)
return
}
if err := h.store.UpsertHost(&store.Host{HostID: hostID, CustomerID: req.CustomerID, APIKey: apiKey}); err != nil {
h.logger.Printf("[ERROR] Failed to mint host for %s: %v", req.CustomerID, err)
http.Error(w, "Internal error", http.StatusInternalServerError)
return
}
h.logger.Printf("[INFO] provisional host mint: %s (customer %s)", hostID, req.CustomerID)
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusCreated)
json.NewEncoder(w).Encode(map[string]string{"host_id": hostID, "api_key": apiKey})
}
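The mint path above leans on configgen.RandomHex, whose source is not part of this diff. The sketch below assumes it turns n random bytes into 2n hex characters, which is consistent with the "6 hex chars" comment on the `RandomHex(3)` call. The names `randomHex` and `mintHostID` are hypothetical stand-ins.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// randomHex returns 2*n lowercase hex characters built from n random bytes.
// Assumption: this mirrors configgen.RandomHex, which is not shown here.
func randomHex(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// mintHostID keeps an explicitly requested id, and otherwise builds
// "<customerID>-<6 hex chars>", following the fallback in the handler above.
func mintHostID(customerID, requested string) (string, error) {
	if requested != "" {
		return requested, nil
	}
	sfx, err := randomHex(3)
	if err != nil {
		return "", err
	}
	return customerID + "-" + sfx, nil
}
```

Under that assumption `randomHex(32)` yields a 64-character API key, and `mintHostID("c1", "")` yields an id of the form `c1-xxxxxx`.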
// allowedEventTypes lists all valid event_type values the Hub accepts.
var allowedEventTypes = map[string]bool{
// Controller-pushed events
@@ -219,11 +444,15 @@ var allowedEventTypes = map[string]bool{
"disaster_recovery_started": true,
"disaster_recovery_completed": true,
// Hub-generated events
"node_stale": true,
"node_down": true,
"node_recovered": true,
// Hub-generated host-domain events (v0.7.0, slice 3)
"host_stale": true,
"host_down": true,
"host_recovered": true,
"expected_backup_missed": true,
"expected_dbdump_missed": true,
// Special
"test": true,
}
@@ -686,10 +915,10 @@ func (h *Handler) handleRecovery(w http.ResponseWriter, r *http.Request, custome
}
resp := struct {
CustomerID string `json:"customer_id"`
ConfigYAML string `json:"config_yaml"`
InfraBackup json.RawMessage `json:"infra_backup"`
HasInfraBackup bool `json:"has_infra_backup"`
BackupVersions []store.InfraBackupVersion `json:"backup_versions,omitempty"`
}{
CustomerID: customerID,
+232
@@ -0,0 +1,232 @@
package api
import (
"database/sql"
"encoding/json"
"io"
"log"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strings"
"testing"
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
_ "modernc.org/sqlite"
)
const globalKey = "GLOBALKEY"
func newTestHandler(t *testing.T) (*Handler, *store.Store, string) {
t.Helper()
path := filepath.Join(t.TempDir(), "test.db")
st, err := store.New(path, log.New(io.Discard, "", 0))
if err != nil {
t.Fatalf("store.New: %v", err)
}
t.Cleanup(func() { st.Close() })
h := New(st, globalKey, "", "", nil, log.New(io.Discard, "", 0))
return h, st, path
}
func do(h *Handler, method, path, bearer, body string) *httptest.ResponseRecorder {
req := httptest.NewRequest(method, "/api/v1"+path, strings.NewReader(body))
if bearer != "" {
req.Header.Set("Authorization", "Bearer "+bearer)
}
rr := httptest.NewRecorder()
h.ServeHTTP(rr, req)
return rr
}
func TestCheckAuthHost(t *testing.T) {
h, st, _ := newTestHandler(t)
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
mk := func(bearer string) *http.Request {
req := httptest.NewRequest(http.MethodPost, "/api/v1/host-report", nil)
if bearer != "" {
req.Header.Set("Authorization", "Bearer "+bearer)
}
return req
}
if _, _, isGlobal, ok := h.checkAuthHost(mk(globalKey)); !ok || !isGlobal {
t.Error("global key should resolve isGlobal=true")
}
hostID, custID, isGlobal, ok := h.checkAuthHost(mk("HKEY"))
if !ok || isGlobal || hostID != "h1" || custID != "c1" {
t.Errorf("per-host key = %q/%q global=%v ok=%v", hostID, custID, isGlobal, ok)
}
if _, _, _, ok := h.checkAuthHost(mk("bogus")); ok {
t.Error("unknown key should fail")
}
}
func validReportBody(hostID string) string {
return `{"host_id":"` + hostID + `","agent_version":"0.3.0",` +
`"host":{"cpu_percent":3.2,"memory_percent":25,"disk_percent":19,"loadavg":["0.1"],"uptime_seconds":100},` +
`"guests":[{"vmid":100,"name":"acme","status":"running","controller_version":""},` +
`{"vmid":101,"name":"beta","status":"stopped"}],` +
`"storage_targets":[],"backups":[],"cloudflared":{"status":"active"},"audit_tail":[]}`
}
func TestHandleHostReport_ValidAndEnvelopeAndDenorm(t *testing.T) {
h, st, dbPath := newTestHandler(t)
st.SaveCustomerConfig(&store.CustomerConfig{CustomerID: "c1", APIKey: "ckey", RetrievalPassword: "p"})
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
rr := do(h, http.MethodPost, "/host-report", "HKEY", validReportBody("h1"))
if rr.Code != 200 {
t.Fatalf("status = %d, body=%s", rr.Code, rr.Body.String())
}
var env struct {
Status string `json:"status"`
PollIntervalSeconds int `json:"poll_interval_seconds"`
Blocked bool `json:"blocked"`
DesiredGeneration int `json:"desired_generation"`
HasSignedOps bool `json:"has_signed_ops"`
}
json.Unmarshal(rr.Body.Bytes(), &env)
if env.Status != "ok" || env.PollIntervalSeconds != 900 || env.Blocked || env.DesiredGeneration != 0 || env.HasSignedOps {
t.Errorf("envelope = %+v", env)
}
// Denorm: guest_running counts only "running" (1 of 2). Read via a 2nd connection.
db, _ := sql.Open("sqlite", dbPath)
defer db.Close()
var total, running int
var cf string
db.QueryRow(`SELECT guest_total, guest_running, cloudflared_status FROM host_reports WHERE host_id='h1' ORDER BY id DESC LIMIT 1`).
Scan(&total, &running, &cf)
if total != 2 || running != 1 || cf != "active" {
t.Errorf("denorm total=%d running=%d cloudflared=%q (want 2,1,active)", total, running, cf)
}
// Guests upserted.
var gname, gstatus string
if err := db.QueryRow(`SELECT display_name, status FROM guests WHERE guest_id='h1/100'`).Scan(&gname, &gstatus); err != nil {
t.Fatalf("guest h1/100 not upserted: %v", err)
}
if gname != "acme" || gstatus != "running" {
t.Errorf("guest = %q/%q", gname, gstatus)
}
}
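The envelope asserted in the test above is what an agent receives after each host-report. As a rough illustration of the consuming side, the sketch below decodes the same JSON keys. The type `controlEnvelope` and both helpers are hypothetical, and the fallback rule in `nextPollSeconds` is an assumed agent policy, not something this diff specifies.

```go
package main

import "encoding/json"

// controlEnvelope mirrors the JSON keys the handler writes in its response.
type controlEnvelope struct {
	Status              string `json:"status"`
	PollIntervalSeconds int    `json:"poll_interval_seconds"`
	Blocked             bool   `json:"blocked"`
	DesiredGeneration   int64  `json:"desired_generation"` // reserved (slice 4)
	HasSignedOps        bool   `json:"has_signed_ops"`     // reserved (slice 4)
}

// parseEnvelope decodes a host-report response body.
func parseEnvelope(body []byte) (controlEnvelope, error) {
	var env controlEnvelope
	err := json.Unmarshal(body, &env)
	return env, err
}

// nextPollSeconds picks the next heartbeat delay. Assumed policy: honor the
// hub's interval when it is positive, otherwise use the caller's fallback.
func nextPollSeconds(env controlEnvelope, fallback int) int {
	if env.PollIntervalSeconds > 0 {
		return env.PollIntervalSeconds
	}
	return fallback
}
```

Keeping the reserved fields in the struct means a later slice can start populating them without changing the decode step.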
func TestHandleHostReport_HostIDMismatch(t *testing.T) {
h, st, _ := newTestHandler(t)
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
rr := do(h, http.MethodPost, "/host-report", "HKEY", validReportBody("other-host"))
if rr.Code != http.StatusForbidden {
t.Errorf("status = %d, want 403", rr.Code)
}
}
func TestHandleHostReport_UnknownHostUnderGlobalKey(t *testing.T) {
h, _, _ := newTestHandler(t)
rr := do(h, http.MethodPost, "/host-report", globalKey, validReportBody("ghost"))
if rr.Code != http.StatusBadRequest {
t.Errorf("status = %d, want 400 (unknown host_id)", rr.Code)
}
}
func TestHandleHostReport_BlockedCustomer(t *testing.T) {
h, st, _ := newTestHandler(t)
st.SaveCustomerConfig(&store.CustomerConfig{CustomerID: "c1", APIKey: "ckey", RetrievalPassword: "p"})
st.SetCustomerConfigStatus("c1", "blocked")
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
rr := do(h, http.MethodPost, "/host-report", "HKEY", validReportBody("h1"))
if rr.Code != 200 {
t.Fatalf("status = %d", rr.Code)
}
var env struct {
Blocked bool `json:"blocked"`
}
json.Unmarshal(rr.Body.Bytes(), &env)
if !env.Blocked {
t.Error("blocked customer should yield blocked:true")
}
}
func TestHandleHostReport_OversizeRejected(t *testing.T) {
h, st, _ := newTestHandler(t)
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "HKEY"})
big := `{"host_id":"h1","guests":[{"vmid":1,"name":"` + strings.Repeat("a", 5<<20) + `"}]}`
rr := do(h, http.MethodPost, "/host-report", "HKEY", big)
if rr.Code != http.StatusRequestEntityTooLarge {
t.Errorf("oversize body status = %d, want 413", rr.Code)
}
}
func TestAdminCreateHost(t *testing.T) {
h, st, _ := newTestHandler(t)
st.SaveCustomerConfig(&store.CustomerConfig{CustomerID: "c1", APIKey: "ckey", RetrievalPassword: "p"})
// non-global key (per-customer) → 403
if rr := do(h, http.MethodPost, "/admin/hosts", "ckey", `{"customer_id":"c1"}`); rr.Code != http.StatusForbidden {
t.Errorf("per-customer key status = %d, want 403", rr.Code)
}
// missing/unknown customer → 400
if rr := do(h, http.MethodPost, "/admin/hosts", globalKey, `{"customer_id":"nope"}`); rr.Code != http.StatusBadRequest {
t.Errorf("unknown customer status = %d, want 400", rr.Code)
}
// success → 201 + usable key (round-trip)
rr := do(h, http.MethodPost, "/admin/hosts", globalKey, `{"customer_id":"c1"}`)
if rr.Code != http.StatusCreated {
t.Fatalf("mint status = %d, body=%s", rr.Code, rr.Body.String())
}
var minted struct {
HostID string `json:"host_id"`
APIKey string `json:"api_key"`
}
json.Unmarshal(rr.Body.Bytes(), &minted)
if minted.HostID == "" || minted.APIKey == "" {
t.Fatalf("mint response = %+v", minted)
}
// the minted key authenticates a host-report
rr2 := do(h, http.MethodPost, "/host-report", minted.APIKey, validReportBody(minted.HostID))
if rr2.Code != 200 {
t.Errorf("round-trip host-report with minted key = %d, body=%s", rr2.Code, rr2.Body.String())
}
}
// TestHostReport_GoldenContract drives the real handler with the shared golden
// host-report and proves hostReportPayload still extracts what it needs from the
// real wire shape (denorm + guest upsert).
//
// testdata/host-report.golden.json MUST be kept byte-identical with felhom-agent's
// internal/hub/testdata/host-report.golden.json — it is a duplicated contract until
// a shared types module exists (revisit when slices 5/6 add real fields).
func TestHostReport_GoldenContract(t *testing.T) {
h, st, dbPath := newTestHandler(t)
st.SaveCustomerConfig(&store.CustomerConfig{CustomerID: "c1", APIKey: "ckey", RetrievalPassword: "p"})
st.UpsertHost(&store.Host{HostID: "demo-host-01", CustomerID: "c1", APIKey: "HKEY"})
golden, err := os.ReadFile("testdata/host-report.golden.json")
if err != nil {
t.Fatal(err)
}
rr := do(h, http.MethodPost, "/host-report", "HKEY", string(golden))
if rr.Code != 200 {
t.Fatalf("golden report status = %d, body=%s", rr.Code, rr.Body.String())
}
db, _ := sql.Open("sqlite", dbPath)
defer db.Close()
var total, running int
var cf string
if err := db.QueryRow(`SELECT guest_total, guest_running, cloudflared_status FROM host_reports WHERE host_id='demo-host-01' ORDER BY id DESC LIMIT 1`).
Scan(&total, &running, &cf); err != nil {
t.Fatal(err)
}
if total != 2 || running != 1 || cf != "active" {
t.Errorf("denorm total=%d running=%d cloudflared=%q (want 2,1,active)", total, running, cf)
}
var guestCount int
db.QueryRow(`SELECT COUNT(*) FROM guests WHERE host_id='demo-host-01'`).Scan(&guestCount)
if guestCount != 2 {
t.Errorf("guests upserted = %d, want 2", guestCount)
}
}
+38
@@ -0,0 +1,38 @@
{
"host_id": "demo-host-01",
"reported_at": "2026-06-08T12:00:00Z",
"agent_version": "0.3.1",
"host": {
"node": "demo-felhom",
"cpu_percent": 3.2,
"memory_total_bytes": 16777216000,
"memory_used_bytes": 4194304000,
"memory_percent": 25,
"disk_total_bytes": 152000000000,
"disk_used_bytes": 30000000000,
"disk_percent": 19.7,
"loadavg": ["0.10", "0.20", "0.15"],
"uptime_seconds": 86400
},
"guests": [
{
"vmid": 100,
"name": "felhom-cust-acme",
"status": "running",
"controller_version": "",
"spec": { "cores": 2, "memory_bytes": 2147483648, "disk_bytes": 21474836480 }
},
{
"vmid": 101,
"name": "felhom-cust-beta",
"status": "stopped",
"controller_version": ""
}
],
"storage_targets": [],
"backups": [],
"restore_tests": [],
"pbs_snapshots": [],
"cloudflared": { "status": "active" },
"audit_tail": []
}
+176
@@ -0,0 +1,176 @@
package monitor
import (
"log"
"sync"
"time"
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
)
// HostStalenessChecker is the host-domain dead-man's-switch (v0.7.0, slice 3). It
// is a deliberate SIBLING of StalenessChecker, not a rename: during slices 3–9 the
// controller report stream (reports) and the agent host-report stream
// (host_reports) are both live, so both checkers run. It keys on host↔host_reports
// and emits host_stale / host_down / host_recovered. Merging is a slice-10 job.
//
// Events are attributed to the host's CUSTOMER (SaveEvent + onEvent take the
// customer_id) so the existing per-customer notification/event UX picks them up
// unchanged.
type HostStalenessChecker struct {
store *store.Store
threshold time.Duration // "stale" after this (default 30m — same as the controller checker)
downAfter time.Duration // "down" after this (2x threshold)
logger *log.Logger
onEvent EventNotifyFunc
mu sync.Mutex
states map[string]string // hostID → "ok" | "stale" | "down"
customerOf map[string]string // hostID → customerID (for event attribution)
downtimeStart map[string]time.Time // hostID → when it first became unreachable
}
// NewHostStalenessChecker creates the checker and seeds state from current
// host-report recency. No events are generated during initialization.
func NewHostStalenessChecker(s *store.Store, threshold time.Duration, onEvent EventNotifyFunc, logger *log.Logger) *HostStalenessChecker {
sc := &HostStalenessChecker{
store: s,
threshold: threshold,
downAfter: 2 * threshold,
logger: logger,
onEvent: onEvent,
states: make(map[string]string),
customerOf: make(map[string]string),
downtimeStart: make(map[string]time.Time),
}
rows, err := s.GetHostStaleness()
if err != nil {
logger.Printf("[WARN] Host staleness checker: failed to seed states: %v", err)
return sc
}
var okCount, staleCount, downCount int
for _, row := range rows {
if s.IsCustomerBlocked(row.CustomerID) {
continue
}
sc.customerOf[row.HostID] = row.CustomerID
age := time.Since(row.LastReportAt)
switch {
case age > sc.downAfter:
sc.states[row.HostID] = "down"
downCount++
case age > sc.threshold:
sc.states[row.HostID] = "stale"
staleCount++
default:
sc.states[row.HostID] = "ok"
okCount++
}
}
logger.Printf("[INFO] Host staleness checker initialized: %d ok, %d stale, %d down", okCount, staleCount, downCount)
return sc
}
// Check evaluates all hosts and emits events on state transitions. Call every 60s.
func (sc *HostStalenessChecker) Check() {
rows, err := sc.store.GetHostStaleness()
if err != nil {
sc.logger.Printf("[WARN] Host staleness check failed: %v", err)
return
}
sc.mu.Lock()
defer sc.mu.Unlock()
seen := make(map[string]bool, len(rows))
for _, row := range rows {
seen[row.HostID] = true
if sc.store.IsCustomerBlocked(row.CustomerID) {
delete(sc.states, row.HostID)
continue
}
sc.customerOf[row.HostID] = row.CustomerID
age := time.Since(row.LastReportAt)
var newState string
switch {
case age > sc.downAfter:
newState = "down"
case age > sc.threshold:
newState = "stale"
default:
newState = "ok"
}
oldState := sc.states[row.HostID]
if oldState == "" {
sc.states[row.HostID] = newState // first observation — no event
continue
}
if oldState == newState {
continue
}
sc.states[row.HostID] = newState
if newState == "stale" && oldState == "ok" {
sc.downtimeStart[row.HostID] = time.Now()
}
downtimeDur := age
if newState == "ok" {
if t, ok := sc.downtimeStart[row.HostID]; ok {
downtimeDur = time.Since(t)
}
delete(sc.downtimeStart, row.HostID)
}
sc.emitTransition(row.HostID, row.CustomerID, oldState, newState, downtimeDur)
}
for id := range sc.states {
if !seen[id] {
delete(sc.states, id)
delete(sc.downtimeStart, id)
}
}
}
// GetState returns the current staleness state for a host.
func (sc *HostStalenessChecker) GetState(hostID string) string {
sc.mu.Lock()
defer sc.mu.Unlock()
s := sc.states[hostID]
if s == "" {
return "unknown"
}
return s
}
func (sc *HostStalenessChecker) emitTransition(hostID, customerID, oldState, newState string, age time.Duration) {
var eventType, severity, message string
switch {
case newState == "stale":
eventType = "host_stale"
severity = "warning"
message = "Host " + hostID + ": no report for " + formatDuration(age)
case newState == "down":
eventType = "host_down"
severity = "error"
message = "Host " + hostID + ": no report for " + formatDuration(age)
case newState == "ok" && (oldState == "stale" || oldState == "down"):
eventType = "host_recovered"
severity = "info"
message = "Host " + hostID + ": reports resumed (was " + oldState + " for " + formatDuration(age) + ")"
default:
return
}
sc.logger.Printf("[INFO] Host staleness: %s %s → %s (%s)", hostID, oldState, newState, eventType)
if _, err := sc.store.SaveEvent(customerID, eventType, severity, message, "{}", "hub"); err != nil {
sc.logger.Printf("[WARN] Failed to save host staleness event for %s: %v", hostID, err)
return
}
if sc.onEvent != nil {
sc.onEvent(customerID, eventType, severity, message, "{}", "hub")
}
}
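The checker above reduces to two pure decisions: which state a given report age maps to, and which event (if any) a state change produces. The sketch below restates both as standalone functions so the rules can be read in one place. The names `classify`, `eventFor`, and `minutes` are illustrative and not part of the hub.

```go
package main

import "time"

// classify maps report age onto the three states using the same strict
// comparisons as the checker: "down" past 2x threshold, "stale" past
// threshold, otherwise "ok".
func classify(age, threshold time.Duration) string {
	switch {
	case age > 2*threshold:
		return "down"
	case age > threshold:
		return "stale"
	default:
		return "ok"
	}
}

// eventFor returns the event type for a state transition, or "" when nothing
// should be emitted (first observation, no change, or an unlisted transition).
func eventFor(oldState, newState string) string {
	if oldState == "" || oldState == newState {
		return ""
	}
	switch {
	case newState == "stale":
		return "host_stale"
	case newState == "down":
		return "host_down"
	case newState == "ok" && (oldState == "stale" || oldState == "down"):
		return "host_recovered"
	}
	return ""
}

// minutes is a small helper so call sites read like the test's backdate values.
func minutes(n int) time.Duration { return time.Duration(n) * time.Minute }
```

With a 30-minute threshold, an age of exactly 60 minutes is still "stale" because the comparison is strict. A host first seen in a stale state produces no event, which matches the seeding behavior of NewHostStalenessChecker.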
@@ -0,0 +1,88 @@
package monitor
import (
"database/sql"
"fmt"
"io"
"log"
"path/filepath"
"testing"
"time"
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
_ "modernc.org/sqlite"
)
// backdate sets a host's last_report_at to N minutes ago, simulating the passage
// of time without sleeping. Uses a second connection (the checker reads via store).
func backdate(t *testing.T, db *sql.DB, hostID string, minutesAgo int) {
t.Helper()
if _, err := db.Exec(`UPDATE hosts SET last_report_at = datetime('now', ?) WHERE host_id = ?`,
fmt.Sprintf("-%d minutes", minutesAgo), hostID); err != nil {
t.Fatal(err)
}
}
func TestHostStalenessChecker(t *testing.T) {
path := filepath.Join(t.TempDir(), "test.db")
st, err := store.New(path, log.New(io.Discard, "", 0))
if err != nil {
t.Fatal(err)
}
defer st.Close()
db, _ := sql.Open("sqlite", path)
defer db.Close()
st.SaveCustomerConfig(&store.CustomerConfig{CustomerID: "c1", APIKey: "ck", RetrievalPassword: "p"})
st.UpsertHost(&store.Host{HostID: "h1", CustomerID: "c1", APIKey: "k1"})
st.SaveHostReport("h1", "c1", []byte(`{}`), store.HostReportDenorm{}) // sets last_report_at
var events []string
onEvent := func(customerID, eventType, severity, message, detailsJSON, source string) {
events = append(events, eventType)
}
// Seed already-stale (40m) → state stale, but NO event on init.
backdate(t, db, "h1", 40)
sc := NewHostStalenessChecker(st, 30*time.Minute, onEvent, log.New(io.Discard, "", 0))
if len(events) != 0 {
t.Fatalf("seed must not emit events, got %v", events)
}
if sc.GetState("h1") != "stale" {
t.Fatalf("seeded state = %q, want stale", sc.GetState("h1"))
}
// Same age → no transition.
sc.Check()
if len(events) != 0 {
t.Fatalf("no transition expected, got %v", events)
}
// Fresh report → host_recovered.
backdate(t, db, "h1", 2)
sc.Check()
if last(events) != "host_recovered" {
t.Fatalf("events = %v, want last host_recovered", events)
}
// Aged to stale → host_stale.
backdate(t, db, "h1", 40)
sc.Check()
if last(events) != "host_stale" {
t.Fatalf("events = %v, want last host_stale", events)
}
// Aged past 2× → host_down.
backdate(t, db, "h1", 130)
sc.Check()
if last(events) != "host_down" {
t.Fatalf("events = %v, want last host_down", events)
}
}
func last(s []string) string {
if len(s) == 0 {
return ""
}
return s[len(s)-1]
}
+122
@@ -0,0 +1,122 @@
package store
import (
"io"
"log"
"path/filepath"
"testing"
)
func newTestStore(t *testing.T) *Store {
t.Helper()
s, err := New(filepath.Join(t.TempDir(), "test.db"), log.New(io.Discard, "", 0))
if err != nil {
t.Fatalf("store.New: %v", err)
}
t.Cleanup(func() { s.Close() })
return s
}
func TestGuestID(t *testing.T) {
if got := GuestID("demo-host-01", 100); got != "demo-host-01/100" {
t.Errorf("GuestID = %q", got)
}
}
func TestUpsertHost_AndLookup(t *testing.T) {
s := newTestStore(t)
if err := s.UpsertHost(&Host{HostID: "h1", CustomerID: "c1", APIKey: "k1"}); err != nil {
t.Fatalf("UpsertHost: %v", err)
}
h, err := s.GetHost("h1")
if err != nil || h == nil {
t.Fatalf("GetHost: %v / %v", h, err)
}
if h.CustomerID != "c1" || h.APIKey != "k1" || h.DesiredJSON != "{}" || h.LastReportAt != nil {
t.Errorf("host = %+v", h)
}
byKey, err := s.GetHostByAPIKey("k1")
if err != nil || byKey == nil || byKey.HostID != "h1" {
t.Errorf("GetHostByAPIKey hit = %+v / %v", byKey, err)
}
miss, err := s.GetHostByAPIKey("nope")
if err != nil || miss != nil {
t.Errorf("GetHostByAPIKey miss = %+v / %v (want nil,nil)", miss, err)
}
}
func TestSaveHostReport_BumpsRealityPreservesIntent(t *testing.T) {
s := newTestStore(t)
if err := s.UpsertHost(&Host{HostID: "h1", CustomerID: "c1", APIKey: "k1"}); err != nil {
t.Fatal(err)
}
// Operator-owned intent columns (inert this slice) set out-of-band.
if _, err := s.db.Exec(`UPDATE hosts SET desired_json='{"want":1}', desired_generation=7 WHERE host_id='h1'`); err != nil {
t.Fatal(err)
}
denorm := HostReportDenorm{AgentVersion: "0.3.0", CPUPercent: 3.2, MemoryPercent: 25, DiskPercent: 19, GuestTotal: 2, GuestRunning: 1, CloudflaredStatus: "active"}
if err := s.SaveHostReport("h1", "c1", []byte(`{"host_id":"h1"}`), denorm); err != nil {
t.Fatalf("SaveHostReport: %v", err)
}
h, _ := s.GetHost("h1")
if h.AgentVersion != "0.3.0" || h.LastReportAt == nil {
t.Errorf("reality not bumped: %+v", h)
}
if h.DesiredJSON != `{"want":1}` || h.DesiredGeneration != 7 {
t.Errorf("a report must NOT clobber intent columns: desired_json=%q gen=%d", h.DesiredJSON, h.DesiredGeneration)
}
var n int
s.db.QueryRow(`SELECT COUNT(*) FROM host_reports WHERE host_id='h1'`).Scan(&n)
if n != 1 {
t.Errorf("host_reports rows = %d, want 1", n)
}
}
func TestUpsertGuestFromReport_PreservesInertColumns(t *testing.T) {
s := newTestStore(t)
gid := GuestID("h1", 100)
if err := s.UpsertGuestFromReport(&Guest{GuestID: gid, CustomerID: "c1", HostID: "h1", VMID: 100, DisplayName: "acme", Status: "running"}); err != nil {
t.Fatal(err)
}
// Slice-10 columns set out-of-band; a report upsert must not touch them.
if _, err := s.db.Exec(`UPDATE guests SET api_key='controllerkey', desired_spec_json='{"cores":4}' WHERE guest_id=?`, gid); err != nil {
t.Fatal(err)
}
// A later report changes reality (status/name).
if err := s.UpsertGuestFromReport(&Guest{GuestID: gid, CustomerID: "c1", HostID: "h1", VMID: 100, DisplayName: "acme-renamed", Status: "stopped"}); err != nil {
t.Fatal(err)
}
var apiKey, desiredSpec, status, name string
err := s.db.QueryRow(`SELECT api_key, desired_spec_json, status, display_name FROM guests WHERE guest_id=?`, gid).
Scan(&apiKey, &desiredSpec, &status, &name)
if err != nil {
t.Fatal(err)
}
if apiKey != "controllerkey" || desiredSpec != `{"cores":4}` {
t.Errorf("inert columns clobbered: api_key=%q desired_spec_json=%q", apiKey, desiredSpec)
}
if status != "stopped" || name != "acme-renamed" {
t.Errorf("reality not updated: status=%q name=%q", status, name)
}
}
func TestGetHostStaleness_SkipsNeverReported(t *testing.T) {
s := newTestStore(t)
s.UpsertHost(&Host{HostID: "h1", CustomerID: "c1", APIKey: "k1"})
rows, err := s.GetHostStaleness()
if err != nil {
t.Fatal(err)
}
if len(rows) != 0 {
t.Errorf("never-reported host should be skipped, got %d rows", len(rows))
}
s.SaveHostReport("h1", "c1", []byte(`{}`), HostReportDenorm{})
rows, _ = s.GetHostStaleness()
if len(rows) != 1 || rows[0].HostID != "h1" {
t.Errorf("after a report expected 1 row, got %+v", rows)
}
}
+277 -16
@@ -5,6 +5,7 @@ import (
"encoding/json"
"fmt"
"log"
"strconv"
"time"
_ "modernc.org/sqlite"
@@ -18,18 +19,18 @@ type Store struct {
// CustomerSummary holds the latest status for a customer (for dashboard).
type CustomerSummary struct {
CustomerID string
CustomerName string
ControllerVersion string
ReceivedAt time.Time
HealthStatus string
CPUPercent float64
MemoryPercent float64
ContainerTotal int
ContainerRunning int
BackupLastSnapshot *time.Time
ReportJSON string
ControllerURL string
// Computed fields (not stored)
TimeSinceReport time.Duration
@@ -216,6 +217,63 @@ func (s *Store) migrate() error {
WHERE NOT EXISTS (SELECT 1 FROM infra_backup_versions
WHERE infra_backup_versions.customer_id = infra_backups.customer_id)`)
// v0.7.0: host-domain (slice 3). Purely additive — the controller path
// (reports/customer_configs) is untouched; the schema cutover is slice 10.
// Columns marked INERT exist now so slice 10 needs no ALTER; nothing reads or
// writes them this slice.
_, err = s.db.Exec(`
CREATE TABLE IF NOT EXISTS hosts (
host_id TEXT PRIMARY KEY,
customer_id TEXT NOT NULL,
api_key TEXT NOT NULL,
agent_version TEXT NOT NULL DEFAULT '',
last_report_at DATETIME,
desired_json TEXT NOT NULL DEFAULT '{}',
desired_generation INTEGER NOT NULL DEFAULT 0,
dr_record_json TEXT NOT NULL DEFAULT '{}',
created_at DATETIME NOT NULL DEFAULT (datetime('now')),
updated_at DATETIME NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_hosts_customer ON hosts(customer_id);
CREATE TABLE IF NOT EXISTS guests (
guest_id TEXT PRIMARY KEY,
customer_id TEXT NOT NULL,
host_id TEXT NOT NULL,
vmid INTEGER NOT NULL,
display_name TEXT NOT NULL DEFAULT '',
status TEXT NOT NULL DEFAULT 'unknown',
controller_version TEXT NOT NULL DEFAULT '',
last_seen_at DATETIME,
api_key TEXT NOT NULL DEFAULT '',
desired_spec_json TEXT NOT NULL DEFAULT '{}',
created_at DATETIME NOT NULL DEFAULT (datetime('now')),
updated_at DATETIME NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_guests_host ON guests(host_id);
CREATE INDEX IF NOT EXISTS idx_guests_customer ON guests(customer_id);
CREATE TABLE IF NOT EXISTS host_reports (
id INTEGER PRIMARY KEY AUTOINCREMENT,
host_id TEXT NOT NULL,
customer_id TEXT NOT NULL,
received_at DATETIME NOT NULL DEFAULT (datetime('now')),
report_json TEXT NOT NULL,
agent_version TEXT,
cpu_percent REAL,
memory_percent REAL,
disk_percent REAL,
guest_total INTEGER,
guest_running INTEGER,
cloudflared_status TEXT
);
CREATE INDEX IF NOT EXISTS idx_host_reports_host ON host_reports(host_id, received_at DESC);
CREATE INDEX IF NOT EXISTS idx_host_reports_customer ON host_reports(customer_id, received_at DESC);
`)
if err != nil {
return err
}
return nil
}
@@ -812,7 +870,13 @@ func (s *Store) Prune(maxDays int) (int64, error) {
if err != nil {
return 0, err
}
- return res.RowsAffected()
n, _ := res.RowsAffected()
// v0.7.0: prune the parallel host-domain report stream, same retention.
if hres, herr := s.db.Exec("DELETE FROM host_reports WHERE received_at < ?", cutoff); herr == nil {
hn, _ := hres.RowsAffected()
n += hn
}
return n, nil
}
// Close closes the database connection.
@@ -1138,11 +1202,11 @@ func scanEvents(rows *sql.Rows) ([]Event, error) {
// parseSQLiteTime tries multiple formats that modernc.org/sqlite may return.
func parseSQLiteTime(s string) time.Time {
formats := []string{
"2006-01-02 15:04:05", // SQLite datetime('now')
"2006-01-02T15:04:05Z", // RFC3339 without fractional
time.RFC3339, // 2006-01-02T15:04:05Z07:00
time.RFC3339Nano, // with fractional seconds
"2006-01-02 15:04:05+00:00", // with explicit UTC offset
"2006-01-02 15:04:05.999999999", // with fractional, no TZ
}
for _, f := range formats {
@@ -1180,3 +1244,200 @@ func parseDiskSummary(reportJSON string) string {
}
return result
}
// ---- v0.7.0: host-domain (slice 3) ----
// Additive store surface for the agent's host-report stream. The controller-path
// methods above are untouched.
// Host is one customer agent. Mixes operator-intent columns (Desired*, DRRecord —
// INERT until slice 10) with box-reported reality (AgentVersion, LastReportAt).
type Host struct {
HostID string
CustomerID string
APIKey string
AgentVersion string
LastReportAt *time.Time
DesiredJSON string
DesiredGeneration int64
DRRecordJSON string
CreatedAt time.Time
UpdatedAt time.Time
}
// Guest is one controller LXC. Reality columns are report-driven; APIKey and
// DesiredSpecJSON are INERT until slice 10 and must survive report upserts.
type Guest struct {
GuestID string
CustomerID string
HostID string
VMID int
DisplayName string
Status string
ControllerVersion string
LastSeenAt *time.Time
APIKey string
DesiredSpecJSON string
CreatedAt time.Time
UpdatedAt time.Time
}
// HostReportDenorm are the denormalized fields pulled from a host-report for the
// dashboard / staleness, mirroring the reports table's denorm pattern.
type HostReportDenorm struct {
AgentVersion string
CPUPercent float64
MemoryPercent float64
DiskPercent float64
GuestTotal int
GuestRunning int
CloudflaredStatus string
}
// HostStaleRow is the minimal per-host recency row the dead-man's-switch reads.
type HostStaleRow struct {
HostID string
CustomerID string
LastReportAt time.Time
}
// GuestID derives the interim guest primary key from host + vmid. The hub owns the
// id scheme (locked decision 3) so the slice-10 swap to durable ids is hub-only.
func GuestID(hostID string, vmid int) string {
return hostID + "/" + strconv.Itoa(vmid)
}
func scanHost(scan func(dest ...any) error) (*Host, error) {
var h Host
var lastReport sql.NullString
var createdAt, updatedAt string
err := scan(&h.HostID, &h.CustomerID, &h.APIKey, &h.AgentVersion, &lastReport,
&h.DesiredJSON, &h.DesiredGeneration, &h.DRRecordJSON, &createdAt, &updatedAt)
if err != nil {
return nil, err
}
if lastReport.Valid {
t := parseSQLiteTime(lastReport.String)
h.LastReportAt = &t
}
h.CreatedAt = parseSQLiteTime(createdAt)
h.UpdatedAt = parseSQLiteTime(updatedAt)
return &h, nil
}
const hostSelectCols = `host_id, customer_id, api_key, agent_version, last_report_at,
desired_json, desired_generation, dr_record_json, created_at, updated_at`
// GetHostByAPIKey looks up a host by its per-host hub key. Returns nil (no error)
// if no match — parallels GetCustomerConfigByAPIKey.
func (s *Store) GetHostByAPIKey(apiKey string) (*Host, error) {
h, err := scanHost(s.db.QueryRow(`SELECT `+hostSelectCols+` FROM hosts WHERE api_key = ?`, apiKey).Scan)
if err == sql.ErrNoRows {
return nil, nil
}
return h, err
}
// GetHost looks up a host by id. Returns nil (no error) if not found.
func (s *Store) GetHost(hostID string) (*Host, error) {
h, err := scanHost(s.db.QueryRow(`SELECT `+hostSelectCols+` FROM hosts WHERE host_id = ?`, hostID).Scan)
if err == sql.ErrNoRows {
return nil, nil
}
return h, err
}
// ListHosts returns all hosts (debug / host-domain views).
func (s *Store) ListHosts() ([]Host, error) {
rows, err := s.db.Query(`SELECT ` + hostSelectCols + ` FROM hosts ORDER BY host_id`)
if err != nil {
return nil, err
}
defer rows.Close()
var hosts []Host
for rows.Next() {
h, err := scanHost(rows.Scan)
if err != nil {
return nil, err
}
hosts = append(hosts, *h)
}
return hosts, rows.Err()
}
// UpsertHost creates or updates a host identity (used by the admin mint). On
// conflict it updates only operator-settable identity fields + updated_at; it does
// NOT touch the reality columns (agent_version/last_report_at) or the inert intent
// columns (desired_*/dr_record_json) — those are owned elsewhere.
func (s *Store) UpsertHost(h *Host) error {
_, err := s.db.Exec(`
INSERT INTO hosts (host_id, customer_id, api_key, updated_at)
VALUES (?, ?, ?, datetime('now'))
ON CONFLICT(host_id) DO UPDATE SET
customer_id = excluded.customer_id,
api_key = excluded.api_key,
updated_at = datetime('now')`,
h.HostID, h.CustomerID, h.APIKey,
)
return err
}
// SaveHostReport inserts a host_reports row and bumps the host's reality columns
// (agent_version/last_report_at/updated_at) — never the inert intent columns.
func (s *Store) SaveHostReport(hostID, customerID string, reportJSON []byte, d HostReportDenorm) error {
_, err := s.db.Exec(`
INSERT INTO host_reports (host_id, customer_id, report_json, agent_version,
cpu_percent, memory_percent, disk_percent, guest_total, guest_running, cloudflared_status)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`,
hostID, customerID, string(reportJSON), d.AgentVersion,
d.CPUPercent, d.MemoryPercent, d.DiskPercent, d.GuestTotal, d.GuestRunning, d.CloudflaredStatus,
)
if err != nil {
return err
}
_, err = s.db.Exec(`
UPDATE hosts SET agent_version = ?, last_report_at = datetime('now'), updated_at = datetime('now')
WHERE host_id = ?`, d.AgentVersion, hostID)
return err
}
// UpsertGuestFromReport upserts the REALITY columns of a guest. On conflict it
// must NOT clobber the inert columns (api_key / desired_spec_json).
func (s *Store) UpsertGuestFromReport(g *Guest) error {
_, err := s.db.Exec(`
INSERT INTO guests (guest_id, customer_id, host_id, vmid, display_name, status,
controller_version, last_seen_at, updated_at)
VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'), datetime('now'))
ON CONFLICT(guest_id) DO UPDATE SET
vmid = excluded.vmid,
display_name = excluded.display_name,
status = excluded.status,
controller_version = excluded.controller_version,
last_seen_at = datetime('now'),
updated_at = datetime('now')`,
g.GuestID, g.CustomerID, g.HostID, g.VMID, g.DisplayName, g.Status,
g.ControllerVersion,
)
return err
}
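The conflict rule in UpsertGuestFromReport can be pictured with an in-memory analogue. This is only a sketch, not the store's implementation: the `upsertFromReport` helper and the map-backed "table" are hypothetical, the Guest type is trimmed to four fields, and the empty inert values on first insert are an assumption about the schema defaults.

```go
package main

import "fmt"

// Guest is a trimmed stand-in for the store type: two reality fields
// (Status, ControllerVersion) and the two inert fields.
type Guest struct {
	GuestID           string
	Status            string
	ControllerVersion string
	APIKey            string
	DesiredSpecJSON   string
}

// upsertFromReport is a hypothetical in-memory analogue of the
// ON CONFLICT(guest_id) DO UPDATE clause: it overwrites reality fields only.
func upsertFromReport(table map[string]Guest, reported Guest) {
	existing, ok := table[reported.GuestID]
	if !ok {
		// First sighting: only reality fields are written. The inert fields
		// stay empty here, which assumes empty schema defaults.
		table[reported.GuestID] = Guest{
			GuestID:           reported.GuestID,
			Status:            reported.Status,
			ControllerVersion: reported.ControllerVersion,
		}
		return
	}
	// Conflict: refresh reality, leave APIKey and DesiredSpecJSON untouched.
	existing.Status = reported.Status
	existing.ControllerVersion = reported.ControllerVersion
	table[reported.GuestID] = existing
}

func main() {
	table := map[string]Guest{
		"host-a/101": {
			GuestID:         "host-a/101",
			Status:          "stopped",
			APIKey:          "operator-set-key",
			DesiredSpecJSON: `{"cores":2}`,
		},
	}
	// A report never carries the inert fields, so they arrive empty.
	upsertFromReport(table, Guest{GuestID: "host-a/101", Status: "running", ControllerVersion: "1.2.3"})
	g := table["host-a/101"]
	fmt.Println(g.Status, g.ControllerVersion, g.APIKey, g.DesiredSpecJSON)
}
```

The point of the design is that a report, which never carries the inert fields, cannot blank them out by accident.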
// GetHostStaleness returns per-host recency for the dead-man's-switch. Hosts that
// have never reported (NULL last_report_at) are skipped — a freshly-minted host is
// not "down" until it has checked in at least once.
func (s *Store) GetHostStaleness() ([]HostStaleRow, error) {
rows, err := s.db.Query(`SELECT host_id, customer_id, last_report_at FROM hosts WHERE last_report_at IS NOT NULL`)
if err != nil {
return nil, err
}
defer rows.Close()
var out []HostStaleRow
for rows.Next() {
var r HostStaleRow
var last string
if err := rows.Scan(&r.HostID, &r.CustomerID, &last); err != nil {
return nil, err
}
r.LastReportAt = parseSQLiteTime(last)
out = append(out, r)
}
return out, rows.Err()
}
+1 -1
@@ -10,7 +10,7 @@ import (
"time" "time"
)
-const templateRawURL = "https://gitea.dooplex.hu/admin/deploy-felhom-compose/raw/branch/main/controller/configs/controller.yaml.example"
+const templateRawURL = "https://gitea.dooplex.hu/admin/felhom-controller/raw/branch/main/controller/configs/controller.yaml.example"
// TemplateFetcher periodically fetches controller.yaml.example from the Gitea
// repo and caches it for config generation. Falls back to go:embed default.
+1 -1
@@ -187,7 +187,7 @@ spec:
cpu: "50m"
containers:
- name: umami
-image: ghcr.io/umami-software/umami:postgresql-latest
+image: ghcr.io/umami-software/umami:3.1.0
ports:
- containerPort: 3000
env:
+14 -1
@@ -105,9 +105,22 @@ spec:
labels:
app: filebrowser
spec:
# filebrowser v2.63.13 (debian default) runs as a non-root UID by default
# and can't write to PVC files left by the previous v2-alpine image (which
# ran as root). Force root explicitly so the existing PVC contents are
# readable + writable. (The alternative -- chown the PVC then drop perms --
# needs a one-shot initContainer; not worth the moving parts here.)
securityContext:
runAsUser: 0
runAsGroup: 0
containers:
- name: filebrowser
-image: filebrowser/filebrowser:v2-alpine
+image: filebrowser/filebrowser:v2.63.13
# v2.63.x default config path is `/config/settings.json`; our ConfigMap
# is mounted at `/.filebrowser.json`. Tell filebrowser to read it
# explicitly so it picks up port 8080 (else it falls back to port 80
# and the readiness probe on 8080 fails).
args: ["-c", "/.filebrowser.json"]
ports:
- containerPort: 8080
volumeMounts: