2026-06-08 09:15:16 +02:00
parent 8ae6e8abf3
commit bb0a9e7205
4 changed files with 165 additions and 58 deletions
+18 -6
@@ -91,9 +91,10 @@ credentials.
| customer ↔ controller UI | management UI | Cloudflare Tunnel; UI auth (bcrypt) | the customer's own box | | customer ↔ controller UI | management UI | Cloudflare Tunnel; UI auth (bcrypt) | the customer's own box |
| controller ↔ agent | snapshot/resize/backup requests | local constrained RPC; agent authorizes per-guest | the controller's own guest only | | controller ↔ agent | snapshot/resize/backup requests | local constrained RPC; agent authorizes per-guest | the controller's own guest only |
| agent ↔ hub | reports + signed jobs | outbound poll; signed jobs | one box; signed jobs limit forgery | | agent ↔ hub | reports + signed jobs | outbound poll; signed jobs | one box; signed jobs limit forgery |
| controller ↔ hub | app-domain reports/jobs | outbound, own API key | app-domain of one customer | | controller ↔ hub | app-domain reports/jobs (incl. geo desired-state) | outbound, own API key | app-domain of one customer |
| box ↔ PBS | encrypted backups | outbound; per-customer namespace; client-side encryption | ciphertext only (operator can't read) | | box ↔ PBS | encrypted backups | outbound; per-customer namespace; client-side encryption | ciphertext only (operator can't read) |
| guest ↔ Proxmox host | **(none direct)** | the guest holds no Proxmox creds; all via the agent | — | | guest ↔ Proxmox host | **(none direct)** | the guest holds no Proxmox creds; all via the agent | — |
| hub ↔ Cloudflare API | geo-restriction WAF (enforcement) | the **hub** holds the CF API token; reconciles geo desired-state → WAF | the customer's zone/WAF |
--- ---
@@ -123,8 +124,10 @@ credentials.
DNS/routing stay intact through an outage. DNS/routing stay intact through an outage.
- **Outbound only** for control/report/backup (poll to hub, push to PBS). No inbound control - **Outbound only** for control/report/backup (poll to hub, push to PBS). No inbound control
endpoint exists in the chosen model. endpoint exists in the chosen model.
- **OPEN:** Cloudflare Tunnel placement host vs guest (`cloudflared` on the Proxmox host - **Tunnel placement: host** (resolved, Part 3 §3/§5). `cloudflared` runs on the Proxmox host
routing to guest services, or inside the customer LXC). To resolve in a later part. as its own **agent-managed systemd service** — not inside the guest — so the data path
survives control-plane death by construction. Geo-restriction WAF is **hub-enforced** (the
hub holds the CF API token; the controller only reports geo desired-state).
--- ---
@@ -190,9 +193,7 @@ credentials.
## 11. Open sub-decisions (carried into later parts) ## 11. Open sub-decisions (carried into later parts)
- Cloudflare Tunnel placement: host vs guest (§7).
- **RTO/RPO targets** → drive the backup + offsite-replication schedule (§8). - **RTO/RPO targets** → drive the backup + offsite-replication schedule (§8).
- Self-update flow (scenario 5) — not yet designed.
- Offboarding / decommission (scenario 6) — not yet designed; must honour "never hold data - Offboarding / decommission (scenario 6) — not yet designed; must honour "never hold data
hostage" in credential revocation + data hand-off. hostage" in credential revocation + data hand-off.
- Multi-tenant resource fairness — deferred until multi-tenant is real (§2). - Multi-tenant resource fairness — deferred until multi-tenant is real (§2).
@@ -205,4 +206,15 @@ credentials.
- **Phase 1** → §3/§5: validated the privilege boundary (create/allocate is operator-tier). - **Phase 1** → §3/§5: validated the privilege boundary (create/allocate is operator-tier).
The guest-side scoped-backup-token it proved possible is **not** used — we chose the The guest-side scoped-backup-token it proved possible is **not** used — we chose the
agent-mediated path — but it confirmed restore = operator-tier, which shapes the agent. agent-mediated path — but it confirmed restore = operator-tier, which shapes the agent.
- **Phase 2** → §8/§9: backup→restore round-trip; identity reset on restore. - **Phase 2** → §8/§9: backup→restore round-trip; identity reset on restore.
---
## Changelog — design-review + Phase-3 fold-in (2026-06-08)
- §5 trust boundaries: **added `hub ↔ Cloudflare API`** row (hub holds the CF token, enforces
geo→WAF); controller↔hub row notes it carries geo desired-state (S4).
- §7 networking: **tunnel placement resolved → host** (agent-managed systemd service); geo is
hub-enforced (S4/S5).
- §11 open items: removed the now-resolved **tunnel placement** and **self-update flow** entries
(S5; self-update designed in 03 §11).
+36 -21
@@ -54,7 +54,7 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
| `appexport/` | `.fab` app export/import (config+DB+volumes, AES-256-CTR+scrypt) | **backup** (DB dump), (provider iface → stacks) | | `appexport/` | `.fab` app export/import (config+DB+volumes, AES-256-CTR+scrypt) | **backup** (DB dump), (provider iface → stacks) |
| `assets/` | Download/cache app assets from Hub API | — (HTTP only) | | `assets/` | Download/cache app assets from Hub API | — (HTTP only) |
| `backup/` | DB dumps, Docker-volume archive, **restic**, **cross-drive rsync**, per-app restore, **drive mount**, disk-layout, infra-backup metadata | config, monitor, settings, system, util | | `backup/` | DB dumps, Docker-volume archive, **restic**, **cross-drive rsync**, per-app restore, **drive mount**, disk-layout, infra-backup metadata | config, monitor, settings, system, util |
| `cloudflare/` | Geo-restriction via Cloudflare WAF (zone/waf/geosync/countries) | settings | | `cloudflare/` | Geo-restriction via Cloudflare WAF (zone/waf/geosync/countries); **enforcement → hub** (S4) | settings |
| `config/` | `controller.yaml` schema + load | — | | `config/` | `controller.yaml` schema + load | — |
| `crypto/` | AES-256-GCM for app.yaml secrets | — | | `crypto/` | AES-256-GCM for app.yaml secrets | — |
| `integrations/` | App-to-app (OnlyOffice→FileBrowser/Nextcloud) via docker exec / config patch | stacks, crypto, settings | | `integrations/` | App-to-app (OnlyOffice→FileBrowser/Nextcloud) via docker exec / config patch | stacks, crypto, settings |
@@ -88,7 +88,7 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
| File | Class | Reason | Risk | | File | Class | Reason | Risk |
|---|---|---|---| |---|---|---|---|
| `api/router.go` | **PORT/MODIFY** | Keep stacks/deploy/integrations/metrics/sync/assets/selfupdate routes; **remove `/api/storage/*` (disk)**; backup routes become **agent-coordinated guest-backup** requests; `config/apply` (hub-pushes-yaml) changes since the **agent** now injects config at provision. | needs-rework | | `api/router.go` | **PORT/MODIFY** | Keep stacks/deploy/integrations/metrics/sync/assets/selfupdate routes; **remove `/api/storage/*` (disk)**; backup routes become **agent-coordinated guest-backup** requests; `config/apply` (hub-pushes-yaml) changes since the **agent** now injects config at provision. | needs-rework |
| `api/geo.go` | **PORT (blocked)** | Geo is app-domain, but gated on the tunnel-placement decision (doc 01 §7/§11). | blocked | | `api/geo.go` | **PORT/MODIFY** | Keep the customer-facing geo **preference** endpoints (set/get global + per-app); **drop the Cloudflare-sync trigger** — enforcement → hub (S4). The controller reports geo desired-state up instead of calling the CF API. | needs-rework |
### `appexport/` — KEEP/PORT (Docker-volume + DB level, no disk ops) ### `appexport/` — KEEP/PORT (Docker-volume + DB level, no disk ops)
| File | Class | Reason | Risk | | File | Class | Reason | Risk |
@@ -120,15 +120,15 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
| `local_infra.go` | **DELETE (→agent)** | Per-drive infra-backup metadata → agent. | clean | | `local_infra.go` | **DELETE (→agent)** | Per-drive infra-backup metadata → agent. | clean |
| `restore_scan.go` | **DELETE (→agent)** | Scans drives to build a DR restore plan = agent-tier DR. | needs-rework | | `restore_scan.go` | **DELETE (→agent)** | Scans drives to build a DR restore plan = agent-tier DR. | needs-rework |
### `cloudflare/` — BLOCKED on tunnel-placement (doc 01 §7/§11) ### `cloudflare/` — DELETE (→hub): CF-API enforcement moves to the hub (S4)
| File | Class | Reason | Risk | | File | Class | Reason | Risk |
|---|---|---|---| |---|---|---|---|
| `client.go`,`zone.go`,`waf.go`,`geosync.go`,`countries.go` | **PORT (blocked)** | Geo-restriction WAF is app-domain and could stay in the controller, but it shares the Cloudflare account/zone with the **tunnel**, whose host-vs-guest placement is undecided. Classify provisionally PORT; do not force. | blocked | | `client.go`,`zone.go`,`waf.go`,`geosync.go`,`countries.go` | **DELETE (→hub)** | The **hub** holds the CF API token and reconciles geo desired-state → WAF (doc 01 §5, doc 03 §2). The controller no longer calls the Cloudflare API — it reports geo desired-state up. The customer-facing geo *preference UI/data* stays (see `api/geo.go`). | needs-rework |
### `config/`, `crypto/`, `util/` ### `config/`, `crypto/`, `util/`
| File | Class | Reason | Risk | | File | Class | Reason | Risk |
|---|---|---|---| |---|---|---|---|
| `config/config.go` | **MODIFY** | Drop `BackupConfig` (restic/retention) and storage-drive keys; keep customer/paths/web/git/stacks/monitoring/hub/assets/system; **add agent local-API endpoint+token**. Self-update section gated (open). | needs-rework | | `config/config.go` | **MODIFY** | Drop `BackupConfig` (restic/retention), storage-drive keys, and `InfrastructureConfig.cf_api_token` (→hub, S4); keep customer/paths/web/git/stacks/monitoring/hub/assets/system; **add agent local-API endpoint+token**. | needs-rework |
| `crypto/crypto.go` | **KEEP** | App.yaml secret encryption. | clean | | `crypto/crypto.go` | **KEEP** | App.yaml secret encryption. | clean |
| `util/strings.go` | **KEEP** | Trivial helper. | clean | | `util/strings.go` | **KEEP** | Trivial helper. | clean |
@@ -169,16 +169,17 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
| `infra_backup.go`/`_linux.go`/`_other.go` | **DELETE (→agent)** | Builds infra-backup payload (disk layout, restic/enc passwords) for hub. | hazard | | `infra_backup.go`/`_linux.go`/`_other.go` | **DELETE (→agent)** | Builds infra-backup payload (disk layout, restic/enc passwords) for hub. | hazard |
| `infra_pull.go` | **DELETE (→agent)** | Pulls recovery config + infra backup from hub (setup-wizard DR). | needs-rework | | `infra_pull.go` | **DELETE (→agent)** | Pulls recovery config + infra backup from hub (setup-wizard DR). | needs-rework |
### `selfupdate/` — OPEN (doc 01 §11: "self-update flow not yet designed") ### `selfupdate/` — controller is agent-managed (doc 03 §11)
| File | Class | Reason | Risk | | File | Class | Reason | Risk |
|---|---|---|---| |---|---|---|---|
| `version.go`,`state.go` | **KEEP** | Semver parse; update audit state. | clean | | `version.go` | **KEEP** | Semver parse / version string (still used for reporting). | clean |
| `updater.go` | **PORT (open)** | Pulls image + edits `docker-compose.yml` + `compose up -d`. In the agent model the controller is the **agent's product** (doc 01 §3) — self-update may move under the agent. Flag as open. | blocked | | `state.go` | **DELETE (obsolete)** | Self-update audit state — the agent owns controller updates now (doc 03 §11). | clean |
| `updater.go` | **DELETE (→agent)** | Resolved (doc 03 §11): the controller is **agent-managed** — the agent snapshots → redeploys → health-gates → rolls back the controller. The controller's old self-update path (image pull + compose edit) is **removed**. | clean |
### `settings/` ### `settings/`
| File | Class | Reason | Risk | | File | Class | Reason | Risk |
|---|---|---|---| |---|---|---|---|
| `settings/settings.go` (1101 L) | **MODIFY (split)** | Keep notif prefs, integration state, geo, DB-validation cache, cross-drive *intent*. The **storage-path registry** (`StoragePath` with `Disconnected`/`DisconnectedAt`/`StoppedStacks`/decommission/UUID) is disk-management state → reshape to **per-volume placement** fed by the agent's storage manifest; disconnect/decommission/migrate state leaves. | hazard | | `settings/settings.go` (1101 L) | **MODIFY (split)** | Keep notif prefs, integration state, geo, DB-validation cache, cross-drive *intent*. The **storage-path registry** (`StoragePath` with `Disconnected`/`DisconnectedAt`/`StoppedStacks`/decommission) is disk-management state → reshape to **per-volume placement** fed by the agent's storage manifest; disconnect/decommission/migrate state leaves. (UUID is *not* a persisted field — runtime-derived from fstab.) | hazard |
### `setup/` — all DELETE (obsolete); the agent provisions the controller ### `setup/` — all DELETE (obsolete); the agent provisions the controller
| File | Class | Reason | Risk | | File | Class | Reason | Risk |
@@ -272,7 +273,7 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
host-info first.** host-info first.**
6. **`settings/StoragePath` carries disk state into an app-domain store.** Disk fields 6. **`settings/StoragePath` carries disk state into an app-domain store.** Disk fields
(`Disconnected`,`DisconnectedAt`,`StoppedStacks`, decommission, UUID) are written by (`Disconnected`,`DisconnectedAt`,`StoppedStacks`, decommission — UUID is *not* persisted, it's runtime-derived from fstab via `system.ParseFstabUUID`/`watchdog.go`) are written by
`watchdog.go`/`storage_handlers.go`/`crossdrive.go` (all delete) but the same struct is `watchdog.go`/`storage_handlers.go`/`crossdrive.go` (all delete) but the same struct is
read by `stacks`/`web` for labels and **placement** (keep). Reshape `StoragePath` to a read by `stacks`/`web` for labels and **placement** (keep). Reshape `StoragePath` to a
placement record fed by the agent manifest. placement record fed by the agent manifest.
@@ -327,10 +328,11 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
2. **Per-volume storage placement** (doc 01 §8) — `.felhom.yml` `hot`/`bulk` volume 2. **Per-volume storage placement** (doc 01 §8) — `.felhom.yml` `hot`/`bulk` volume
classification (extend `stacks/metadata.go`), enforcement at deploy (extend classification (extend `stacks/metadata.go`), enforcement at deploy (extend
`stacks/deploy.go`), and a placement record in `settings`. Replaces the per-app `stacks/deploy.go`), and a placement record in `settings`. Replaces the per-app
HDD-path + cross-drive model. HDD-path + cross-drive model. A `bulk` volume must be realized as a `backup=0` mount point,
3. **Self-restore-test orchestration** — controller asks the agent to restore the latest **never** a rootfs Docker named volume (validated recipe: `phase3-findings.md` B2 / doc 03 §7).
guest backup to a scratch guest, runs its post-restore health probes, reports the 3. **Self-restore-test status display** (read-only) — the **agent owns orchestration** (it
verdict to the hub. (Backed by the validated Phase 2 round-trip in holds the PBS key and creates the scratch guest — operator-tier, doc 03 §8); the controller
only surfaces `GET /restore-test/status` in its UI. (Round-trip validated: Phase 2,
[../proxmox-platform.md](../proxmox-platform.md) §4.) [../proxmox-platform.md](../proxmox-platform.md) §4.)
4. **Snapshot-before-deploy/rollback flow** in the deploy path — wraps the existing 4. **Snapshot-before-deploy/rollback flow** in the deploy path — wraps the existing
compose deploy with agent snapshot → health check → agent rollback-on-failure compose deploy with agent snapshot → health check → agent rollback-on-failure
@@ -343,13 +345,12 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
## 6. Open / blocked items ## 6. Open / blocked items
- **`cloudflare/` + `api/geo.go` — blocked on tunnel placement** (doc 01 §7, §11: host vs - **Geo — resolved (S4):** CF-API **enforcement moves to the hub** (it holds the CF token and
guest `cloudflared`). Geo-WAF is app-domain and likely PORT, but it shares the reconciles geo → WAF); the controller keeps the geo **preference UI/data** and reports
Cloudflare account/zone with the tunnel; do not finalize until placement is decided. desired-state up. Tunnel placement is settled (host, agent-managed, doc 03 §3/§5). The
- **`selfupdate/updater.go` — open** (doc 01 §11: self-update flow undesigned). Because the `cloudflare/` package + `api/geo.go`'s CF-sync are DELETE-from-controller → hub.
controller is "the agent's product" (doc 01 §3), self-update may move under the agent - **Self-update — resolved (doc 03 §11):** the controller is agent-managed; its self-update
(snapshot → swap → health-gate → rollback) rather than the controller editing its own path is removed.
compose file. Provisionally PORT.
- **`settings`/`stacks` per-volume reshape** — depends on the storage-manifest contract - **`settings`/`stacks` per-volume reshape** — depends on the storage-manifest contract
between hub ↔ agent ↔ controller (doc 01 §8), not yet specified. between hub ↔ agent ↔ controller (doc 01 §8), not yet specified.
- **Backup UI/report surface** — depends on the agent's guest-backup status API shape - **Backup UI/report surface** — depends on the agent's guest-backup status API shape
@@ -357,3 +358,17 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
- **Notification event taxonomy** — which infra events (`storage_disconnected`, - **Notification event taxonomy** — which infra events (`storage_disconnected`,
`crossdrive_*`, `disaster_recovery_*`) the **agent** emits vs the controller, once those `crossdrive_*`, `disaster_recovery_*`) the **agent** emits vs the controller, once those
responsibilities move. responsibilities move.
---
## Changelog — design-review + Phase-3 fold-in (2026-06-08)
- **M1:** removed `UUID` from the `settings.StoragePath` field lists (§ settings, hazard #6) —
it is runtime-derived from fstab, not persisted.
- **S4 (geo):** `cloudflare/` reclassified **PORT(blocked) → DELETE(→hub)** (CF-API enforcement
moves to the hub); `api/geo.go` → **PORT/MODIFY** (keep geo *preference* endpoints, drop the
CF-sync trigger); `config/config.go` also drops `cf_api_token`. §6 + §1 updated.
- **S5:** cloudflare/geo no longer "blocked on tunnel placement" (resolved).
- **S6:** §5(3) self-restore-test → **status-display only**; the agent owns orchestration.
- **Self-update resolved (03 §11):** `updater.go` → **DELETE(→agent)**, `state.go` →
DELETE(obsolete), `version.go` KEEP; §6 + §5(2) updated (bulk = `backup=0` mountpoint recipe).
+101 -31
@@ -29,11 +29,11 @@ N-guests, never "the guest").
Owns: Owns:
1. **Proxmox lifecycle** — create/start/stop/destroy guests, snapshots, storage allocation. Via a scoped Proxmox API token (minimal role from Phase 1) for everything the API covers; raw host ops only where unavoidable. 1. **Proxmox lifecycle** — create/start/stop/destroy guests, snapshots, storage allocation. Via a scoped Proxmox API token (the **`FelhomAgent` operator role** — `proxmox-platform.md` §3.6, validated Phase 3 B3) for everything the API covers; raw host ops only where unavoidable.
2. **Storage management** — attach/classify targets, reconcile the storage manifest, mount USB-by-UUID, present mounts into guests. 2. **Storage management** — attach/classify targets, reconcile the storage manifest, mount USB-by-UUID, present mounts into guests.
3. **Backup/restore orchestration** — vzdump to the tiers, PBS, snapshot management, and the **self-restore-test**. 3. **Backup/restore orchestration** — vzdump to the tiers, PBS, snapshot management, and the **self-restore-test**.
4. **Host & tunnel monitoring** — host metrics, guest up/down, storage-target status, and `cloudflared` health; reports the host domain to the hub. 4. **Host & tunnel monitoring** — host metrics, guest up/down, storage-target status, and `cloudflared` health; reports the host domain to the hub.
5. **Provisioning**: build a guest, deploy the controller into it, hand it its bootstrap config. 5. **Provisioning**: provision a guest **by restoring the golden base image** (§9), deploy the controller into it, hand it its bootstrap config; also **build and refresh the golden base image** itself.
6. **Hub control loop** — poll for desired state + signed jobs, reconcile, execute, report, heartbeat. 6. **Hub control loop** — poll for desired state + signed jobs, reconcile, execute, report, heartbeat.
7. **Local API** — the per-guest authorization gate the controller calls. 7. **Local API** — the per-guest authorization gate the controller calls.
8. **Self-update** — update itself (carefully — it is a host service) and update the controllers it owns. 8. **Self-update** — update itself (carefully — it is a host service) and update the controllers it owns.
@@ -43,11 +43,12 @@ Explicitly does **not**:
- Serve application traffic or sit in the data path. **Control plane, not data plane**: if the agent dies, apps keep serving (Docker + LXC run without it); only *management* degrades — no new backups, no provisioning, hub loses the heartbeat. - Serve application traffic or sit in the data path. **Control plane, not data plane**: if the agent dies, apps keep serving (Docker + LXC run without it); only *management* degrades — no new backups, no provisioning, hub loses the heartbeat.
- Hold or proxy customer application data. - Hold or proxy customer application data.
- Run inside a guest. It is the thing that recovers guests and the host; it cannot be one of them. - Run inside a guest. It is the thing that recovers guests and the host; it cannot be one of them.
- Manage **geo-restriction / the Cloudflare API**. Geo is hub-owned: the customer sets it in the controller UI, the controller reports the geo desired-state to the hub, and the **hub** (holding the CF API token) reconciles the WAF (S4). The agent manages only the *tunnel* service (`cloudflared`, §3/§5), never WAF rules.
## 3. Process model & host integration ## 3. Process model & host integration
- **Native Go binary, systemd service** on the host: boot-start, `Restart=always`, systemd watchdog (kill+restart on hang), journald logging, resource limits. - **Native Go binary, systemd service** on the host: boot-start, `Restart=always`, systemd watchdog (kill+restart on hang), journald logging, resource limits.
- **Root-minimized.** Default to a **non-root** service user with the scoped Proxmox token for API-covered work + a **narrow `sudoers` allowlist** for the handful of true host ops (USB mount-by-UUID, systemd mount units). Full root on the crown-jewel host is what a compromise most wants; avoid it where the API or a scoped sudoers entry suffices. *(Open: confirm during build which ops genuinely need host root vs. are API-covered — the Phase-1 minimal role is the API floor.)* - **Root-minimized (boundary settled — Phase 3 B3).** The agent runs as a **non-root** service user with the scoped `FelhomAgent` token for all API-covered work + a **narrow `sudoers` allowlist** for true host ops. Per Phase 3 (B3) the boundary is settled: the entire per-customer guest lifecycle — provision (by restore, §9), config, start/stop, snapshot, backup, **restore**, destroy — is token-covered. Genuine OS-root is confined to: (1) building/refreshing the **golden base image** (`keyctl` create is `root@pam`-only — one-time at enrollment + a maintenance cadence, §9); (2) **host mounts** (USB mount-by-UUID, systemd mount units / fstab); (3) **SMART / hardware sensors**. Root therefore never sits on the per-customer path. See `proxmox-platform.md` §3.6 for the role + boundary table.
- **`cloudflared` is a separate systemd service**, not embedded in the agent. This is what makes the data path survive control-plane death by construction. The agent **manages and health-watches** it (see §5) but the tunnel does not live or die with the agent process. - **`cloudflared` is a separate systemd service**, not embedded in the agent. This is what makes the data path survive control-plane death by construction. The agent **manages and health-watches** it (see §5) but the tunnel does not live or die with the agent process.
## 4. Control model — reconcile + signed destructive ops ## 4. Control model — reconcile + signed destructive ops
@@ -71,10 +72,12 @@ snapshot S"), not a procedure; the agent owns the *how*.
**The reversibility gate (security-critical).** **The reversibility gate (security-critical).**
"Signed jobs resist hub compromise" only holds if the agent also distrusts hub-supplied "Signed jobs resist hub compromise" only holds if the agent also distrusts hub-supplied
*desired state* for destructive changes. So: *desired state* for destructive changes. The gate is by **provenance + data-bearing-ness, not
by verb**:
- **Irreversible/destructive operations** — guest destroy, storage detach/wipe, restore-overwrite, decommission — require a valid **operator signature**, *regardless of whether they arrive as a job or as a desired-state delta*. A compromised hub cannot forge them because the signing key is **not held by the hub** (it lives with the operator / a separate signing path; the hub only queues opaque signed blobs). - **The reconciler MAY act without an operator signature** when: (a) creating/starting/restarting; (b) destroying resources it created earlier **within the same journaled transaction** (compensating rollback, §10); (c) destroying resources it **tagged ephemeral/scratch** (e.g. restore-test scratch guests, §8). The ephemeral/scratch tag is **agent-internal provenance and is never accepted from the hub** — else a compromised hub could relabel a data-bearing guest as scratch to walk the gate.
- **Benign convergence** — deploy a guest, attach storage, adjust a non-destructive policy, bump a controller version — runs on normal hub API auth, no signature. - **An operator signature is always required** to destroy/overwrite any resource holding the only/primary copy of customer data — live-guest destroy, storage detach/wipe, restore-overwrite, decommission — *regardless of whether it arrives as a job or as a desired-state delta*. A compromised hub cannot forge them because the signing key is **not held by the hub** (it lives with the operator / a separate signing path; the hub only queues opaque signed blobs).
- **Healing a crashed controller is non-destructive by construction:** it is reconstructable from its image + the guest's persistent volume, so "redeploy" = restart the LXC / `docker compose up -d` **inside the existing guest** — never a guest destroy. (v0.33 precedent: `watchdog.go` restarts stopped stacks, it never destroys the guest.)
Signed payloads carry a **nonce + expiry** (anti-replay: a captured "restore" job cannot be Signed payloads carry a **nonce + expiry** (anti-replay: a captured "restore" job cannot be
re-injected later) and a target binding (host + guest id) so a signature can't be retargeted. re-injected later) and a target binding (host + guest id) so a signature can't be retargeted.
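
The gate above ("by provenance + data-bearing-ness, not by verb") can be sketched as a single decision function. This is a minimal illustration, not the agent's actual code: the `Provenance` and `Op` types, their constants, and `NeedsOperatorSignature` are hypothetical names, and the choice to always require a signature for detach/wipe, restore-overwrite, and decommission (even on scratch resources) is a conservative assumption.

```go
package main

import "fmt"

// Provenance is agent-internal: it records how the agent itself came to own a
// resource. It is never populated from hub-supplied input (hypothetical type).
type Provenance int

const (
	ProvLive    Provenance = iota // holds the only/primary copy of customer data
	ProvScratch                   // tagged ephemeral/scratch by the agent (e.g. restore-test guest)
	ProvSameTxn                   // created earlier in the current journaled transaction
)

// Op is the requested operation, whether it arrived as a job or a desired-state delta.
type Op int

const (
	OpCreate Op = iota
	OpStart
	OpRestart
	OpDestroy
	OpDetachWipe
	OpRestoreOverwrite
	OpDecommission
)

// NeedsOperatorSignature reports whether the reconciler must refuse to act
// without a valid operator signature.
func NeedsOperatorSignature(op Op, prov Provenance) bool {
	switch op {
	case OpCreate, OpStart, OpRestart:
		// Benign convergence: normal hub API auth is enough.
		return false
	case OpDestroy:
		// Compensating rollback and scratch cleanup are unsigned;
		// destroying a live, data-bearing guest is not.
		return prov == ProvLive
	default:
		// Assumption: detach/wipe, restore-overwrite and decommission are
		// always signed, whatever the provenance.
		return true
	}
}

func main() {
	fmt.Println("destroy live guest:   ", NeedsOperatorSignature(OpDestroy, ProvLive))
	fmt.Println("destroy scratch guest:", NeedsOperatorSignature(OpDestroy, ProvScratch))
	fmt.Println("restart live guest:   ", NeedsOperatorSignature(OpRestart, ProvLive))
}
```

Because `Provenance` is looked up from the agent's own journal rather than parsed from the request, a hub that relabels a guest as scratch has no field to put that label in.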
@@ -111,15 +114,22 @@ The controller (in its LXC) reaches the agent (on the host) over the local bridg
- `POST /snapshot` — snapshot *this* guest (the snapshot-before-deploy primitive). - `POST /snapshot` — snapshot *this* guest (the snapshot-before-deploy primitive).
- `POST /rollback` — roll *this* guest back to a named snapshot (post-deploy failure recovery). - `POST /rollback` — roll *this* guest back to a named snapshot (post-deploy failure recovery).
- `POST /backup` — request a backup-now of *this* guest (enqueued; non-destructive). - `POST /backup` — request a backup-now of *this* guest (enqueued; non-destructive).
- `GET /backup/due` — whether a policy-scheduled backup is due for *this* guest, so the controller can quiesce then call `POST /backup` (the app-consistent path, §8).
- `GET /backup/status`, `GET /restore-test/status` — read-only status for the controller's UI. - `GET /backup/status`, `GET /restore-test/status` — read-only status for the controller's UI.
Note what is *absent*: nothing here lets a controller touch another guest, the host, storage Note what is *absent*: nothing here lets a controller touch another guest, the host, storage
attachment, or restore-overwrite. Destructive/cross-guest power stays operator-signed (§4). attachment, or restore-overwrite. Destructive/cross-guest power stays operator-signed (§4).
A controller can only `POST /rollback` (or snapshot/backup) **its own** guest — the agent maps
token → guest and authorizes per guest, so a compromised controller's blast radius is
**self-scoped and bounded** to its own guest.
## 7. Storage manifest & reconciliation ## 7. Storage manifest & reconciliation
The manifest is the load-bearing contract (it absorbs the disk-state fields that The manifest is the load-bearing contract. It absorbs the **persisted** disk-state fields that
`settings.StoragePath` carries today — see Part 2). Held in the hub, reconciled by the agent. `settings.StoragePath` carries today **and adds** `durable_id`/UUID — today the controller
re-derives the UUID from fstab each boot (Part 2 / Phase-3), so persisting it is an
improvement. Held in the hub, reconciled by the agent.
Per target: Per target:
@@ -138,10 +148,20 @@ allowlist), each Proxmox storage entry matches, and `disconnected` targets are s
the hub (the storage watchdog — detect a USB drop in seconds, not at the next health cycle). the hub (the storage watchdog — detect a USB drop in seconds, not at the next health cycle).
**Placement is per-volume, not per-app.** Hot volumes (DB/config) → a `fast` target, **Placement is per-volume, not per-app.** Hot volumes (DB/config) → a `fast` target,
**enforced**; bulk volumes (media) → may live on `slow`, declared in `.felhom.yml`. **Bulk **enforced**; bulk volumes (media) → may live on `slow`, declared in `.felhom.yml`.
external mounts are excluded from the guest's vzdump** (a per-mount backup flag) and carry
their own per-volume policy (file-level to a tier, or explicitly *not* backed up for A `bulk` volume **MUST** be realized as a `backup=0` **volume mount point** (or an external
re-downloadable media). This is what keeps a 1 TB media drive out of the whole-guest image. bind mount) — **never** a Docker named volume in rootfs, which `vzdump` always captures
(verified, `phase3-findings.md` B2). Proven recipe: attach
`-mpN <storage>:<size>,mp=/mnt/bulk,backup=0`, then
`docker volume create --driver local -o type=none -o o=bind -o device=/mnt/bulk <vol>` (or a
compose bind). The per-volume placement component (Part 2 §5(2)) enforces this at deploy. The
**DR consequence** of excluding bulk is covered in §8.
**Field re-homing (from `settings.StoragePath`, Part 2):** `Label` → manifest (canonical);
`IsDefault`/`Schedulable` → manifest `policy`; `MigratedTo` + decommission → manifest `state`;
`StoppedStacks` → the **controller's `settings`** (app-domain: which apps to restart on
reconnect, not a host concern).
## 8. Backup/restore orchestration ## 8. Backup/restore orchestration
@@ -149,9 +169,18 @@ Tiers double as backup *and* restore-source priority (fastest surviving source f
 per Part 1: **snapshot** (LVM-thin, transient, whole-guest rollback — not a backup) →
 **local second storage** (vzdump to dir/NFS/CIFS) → **PBS offsite** (the DR substrate).

-- **Quiescing:** the controller stops the app stack (volume-consistent) before a guest
-vzdump where app-consistency matters; stop-mode/snapshot-mode per Phase 1. Every Proxmox
-op is async → the agent polls `task exitstatus`, never trusts the POST return.
+- **Quiescing (controller-driven for app-consistency):** an LXC has no fsfreeze
+(`proxmox-platform.md` §4.2), so app-consistency is the controller's job: it learns a backup
+is due (`GET /backup/due`, §6, or via its hub channel) → **quiesces** the app stack →
+`POST /backup` → polls `GET /backup/status` → unquiesces. **An agent-initiated vzdump is
+crash-consistent only** (there is no inbound-to-guest channel to trigger a quiesce — §3/§5).
+Every Proxmox op is async → the agent polls `task exitstatus`, never trusts the POST return.
+- **Bulk volumes have no DR coverage from the guest vzdump** — they are excluded (§7). Every
+`bulk` volume needs an explicit own-backup decision: its own backup target per the manifest
+`policy`, **or deliberately none** when the data is re-downloadable (customer informed). On
+host-loss, un-backed-up bulk is gone; a **bind-mounted** bulk volume re-attaches only on the
+*same* host, so cross-host DR needs the separate backup. A deliberate per-volume choice,
+never a silent loss.
 - **Key custody (PBS):** the **live** PBS key sits on the box so the agent can both back up
 *and* run restore-tests. The hub holds only the **recovery-code-wrapped escrow** copy it
 cannot open (zero-knowledge default). So: the box can restore-test; the operator cannot
@@ -165,15 +194,31 @@ per Part 1: **snapshot** (LVM-thin, transient, whole-guest rollback — not a ba
 ## 9. Provisioning & DR flows

-**Provisioning (reconcile-driven).** Desired state says "this customer should have guest G
-with controller C." The agent: enrolls (mints its scoped Proxmox token as root at setup) →
-creates the LXC (unprivileged, `nesting=1,keyctl=1`, overlayfs — Phase 0) → deploys the
-controller → hands it the bootstrap config (identity, hub API key, local-API token, mount
-map). If any step fails, reconciliation retries; a half-built guest is journaled (§10) and
-rolled back, never orphaned.
+**Provisioning (reconcile-driven, by restore).** Fresh creation of a Docker-capable LXC needs
+the `keyctl=1` feature flag, which Proxmox permits only for `root@pam` (Phase 3, B3) — not the
+scoped token. But a token-authorized **restore preserves `keyctl`** (Phase 3, B3), so the agent
+provisions **by restoring a golden base image**, never by `pct create` on the per-customer path:
+
+- A **golden base archive** — minimal Debian + Docker, `nesting=1,keyctl=1`, overlayfs — is
+built once as `root@pam` **at enrollment** (when the agent legitimately holds root to mint its
+Proxmox token) and refreshed on a maintenance cadence. This is the one place `keyctl`/root
+provisioning lives — off the per-customer path.
+- To provision guest G: restore the golden archive → new VMID (token-covered: `VM.Allocate` +
+`Datastore.AllocateSpace`; `keyctl` preserved) → reset identity (MAC/hostname) → size the guest
+(CPU/mem config + `pct resize` rootfs, token-covered) → attach storage mounts per the manifest
+→ deploy the controller → hand it bootstrap config. A mid-flight failure is journaled and
+compensating-rolled-back (destroy the just-restored guest — allowed without a signature per §4,
+same-transaction provenance).
+
+**Unified bring-up primitive.** Provisioning and DR-restore share the same token-covered front
+half — *restore an archive → reset identity* — and differ only in the archive and the back half:
+provisioning restores the **golden base** then deploys a fresh controller; DR-restore restores
+the **customer's backup** (already containing controller + data), brings it up, and reattaches
+external storage. One code path, exercised by every restore-test (§8).
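
One way to picture the unified primitive is as a step plan with a fixed front half and a mode-specific back half. The sketch below is illustrative only: `Mode`, `BringUpPlan`, `plan_bring_up`, and the step names are hypothetical labels for the steps listed above, not the agent's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Mode(Enum):
    PROVISION = "provision"
    DR_RESTORE = "dr-restore"


# Shared, token-covered front half: identical for both modes.
FRONT_HALF: Tuple[str, ...] = ("restore_archive", "reset_identity")

# Mode-specific back halves, named after the steps in the text (hypothetical names).
BACK_HALF = {
    Mode.PROVISION: (
        "size_guest",
        "attach_storage_mounts",
        "deploy_controller",
        "hand_bootstrap_config",
    ),
    Mode.DR_RESTORE: ("bring_up_guest", "reattach_external_storage"),
}


@dataclass(frozen=True)
class BringUpPlan:
    archive: str
    steps: Tuple[str, ...]


def plan_bring_up(mode: Mode, golden_archive: str,
                  customer_backup: Optional[str] = None) -> BringUpPlan:
    """Pick the archive and back half for a mode; the front half never changes."""
    if mode is Mode.PROVISION:
        archive = golden_archive
    else:
        if customer_backup is None:
            raise ValueError("DR restore needs a customer backup archive")
        archive = customer_backup
    return BringUpPlan(archive=archive, steps=FRONT_HALF + BACK_HALF[mode])
```

Because both plans start with the same two steps, a restore-test that runs the DR plan also exercises the front half that provisioning depends on.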

 **Guest loss.** Agent restores G from the fastest surviving tier and resets identity
-(MAC/hostname) so the restored guest rejoins cleanly.
+(MAC/hostname) so the restored guest rejoins cleanly — this *is* the unified restore primitive
+above (customer-backup archive, DR back half).

 **Host/hardware loss.** Re-enroll the new host in **restore mode**; the hub — the durable
 source of truth that survives box death — hands the new agent the existing identity, PBS
@@ -185,17 +230,21 @@ the hub record, so DNS stays intact.
 - **Per-guest serialization.** Reconcile, one-shot jobs, and local-API calls all feed a
 work queue that serializes mutations **per guest** (Proxmox dislikes concurrent conflicting
 ops on the same guest). Independent guests proceed in parallel.
-- **Operation journaling.** Multi-step async ops (provision, restore) are journaled with
-their in-flight Proxmox task ids. On agent restart, the journal is replayed:
-resume-or-rollback, so a crash mid-restore never leaves a corrupt or half-built guest.
+- **Operation journaling.** Multi-step async ops (provision, restore, controller-update, agent
+self-update) are journaled with their in-flight Proxmox task ids. On agent restart, the
+journal is replayed: resume-or-rollback, so a crash mid-restore never leaves a corrupt or
+half-built guest.
 - **Idempotency keys** on one-shot jobs (run-once across retries and restarts).
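
Per-guest serialization and idempotency keys can be illustrated together. The sketch below is a simplified, in-memory model with hypothetical names (`GuestWorkQueue`, `run`). It assumes each idempotency key belongs to a single guest, and it does not persist completed keys, so surviving restarts would additionally require storing them durably (for example alongside the journal).

```python
import threading
from collections import defaultdict
from typing import Any, Callable, Dict


class GuestWorkQueue:
    """Serialize mutations per guest and run each idempotency key at most once."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the two maps below
        self._guest_locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._done: Dict[str, Any] = {}  # idempotency key -> recorded result

    def run(self, guest_id: str, idempotency_key: str, op: Callable[[], Any]) -> Any:
        with self._guard:
            guest_lock = self._guest_locks[guest_id]
        # Only one mutation per guest at a time; other guests are not blocked.
        with guest_lock:
            with self._guard:
                if idempotency_key in self._done:
                    return self._done[idempotency_key]  # retry: reuse the result
            result = op()
            with self._guard:
                self._done[idempotency_key] = result
            return result
```

A retried job with the same key returns the recorded result instead of executing again, while jobs for different guests take different locks and can proceed in parallel.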

 ## 11. Self-update

-- **Agent (the hard case — a host service, no snapshot-rollback).** Atomic binary swap:
-download → verify signature → atomic rename → restart; **keep last-known-good**; a watchdog
-reverts to last-good if the new binary fails to come up healthy. Triggered by a hub signed
-job within the update window; manual always allowed.
+- **Agent (the hard case — a host service, no snapshot-rollback).** **A/B layout:** download →
+verify signature → stage as the inactive slot → flip a `current → good|new` symlink → restart.
+**Revert authority lives outside the swapped binary**: `Restart=always` alone just
+crash-loops a bad binary — so a **separate health-gate** (a systemd oneshot `ExecStartPost`
+probe, or a tiny supervisor unit) flips `current` back to last-good and restarts on a failed
+health window. The new version is **committed as "good" only after a clean health window**.
+Triggered by a hub signed job within the update window; manual always allowed. Journaled (§10).
 - **Controller (the easy case — it's a guest).** The agent owns the controller's lifecycle,
 so the **agent updates the controller**: snapshot-before-update (free rollback, because the
 controller *is* a snapshottable guest) → pull new image → redeploy → health-check → rollback
@@ -214,16 +263,37 @@ argument for §3's root-minimization and a small, auditable agent.
 Resolved here: tunnel placement (host, agent-managed, own systemd service), the
 reconcile-vs-jobs fork (hybrid, gated by reversibility), agent process model, self-update
-ownership, the local-API surface, and the storage-manifest schema.
+ownership, the local-API surface, the storage-manifest schema, **provision-by-restore**, and
+the **root-vs-API boundary** (Phase 3, B3).

 Still open:

 - Multi-tenant **resource fairness** on a shared host (per-guest cgroup limits, noisy-neighbor) — deferred to the company-case pass.
 - Operator-side **signing tooling** — where the operator signing key lives operationally and how a destructive op gets signed without undue friction (offline key vs. a small signing service; the security floor is "not in the hub").
 - Hub-side **desired-state editing UX** and the host-domain report schema details — belong to the hub architecture doc.
+- **Golden base image** refresh cadence + fleet versioning — who triggers a rebuild, how the per-host image version is tracked (operational detail, not blocking; §9).

 This doc hands the implementation three contracts it was waiting on:

 1. the **local-API surface** (§6) → the controller's NEW local-API client, snapshot-before-deploy, and self-restore-test wiring (Part 2);
 2. the **storage-manifest schema** (§7) → the `settings.StoragePath` reshape and per-volume hot/bulk placement (Part 2);
 3. the **backup contract** (§7-8) → the destination for the app-data-backup package extracted in the Part-2 refactor.
+
+---
+
+## Changelog — design-review + Phase-3 fold-in (2026-06-08)
+
+- **NEW provision-by-restore** (§9): the agent provisions by **restoring a golden base image**
+(token-covered, preserves `keyctl`), never `pct create` on the per-customer path; one unified
+restore primitive shared with DR. §2 responsibility + §3 boundary updated.
+- **B3** (§2/§3): replaced "Phase-1 minimal role" with the validated **`FelhomAgent`** operator
+role; root-vs-API boundary **settled** (root only for golden-image build, host mounts, SMART).
+- **B1** (§4): reversibility gate rewritten as **provenance + data-bearing** (scratch tag is
+agent-internal, never hub-supplied; crashed-controller heal is non-destructive in-place).
+- **B2** (§7/§8): validated bulk-as-`backup=0`-mountpoint recipe + the **bulk-DR consequence**
+(excluded bulk needs its own backup decision).
+- **S1** (§6/§8): `GET /backup/due` added; controller-driven quiescing; agent vzdump is
+crash-consistent only. **S2** (§10/§11): A/B self-update with external revert authority;
+controller-update + agent self-update journaled. **S3** (§7): `StoragePath` field re-homing.
+**S4:** geo non-responsibility added (§2). **M2** (§7): manifest "absorbs + adds durable_id".
+**§6:** rollback is self-scoped/bounded. **§13:** golden-image refresh cadence added as open.
+10
@@ -1,5 +1,15 @@
 # Critical design review — Proxmox re-platform doc set

+> ✅ **RESOLVED (2026-06-08).** All findings folded into 01/02/03 + `proxmox-platform.md`
+> (Phase-3 spike run for B2/B3 → `tests/phase3-findings.md`). **Folded:** B1 (03 §4), B2
+> (03 §7/§8 + platform §4.7), B3 (03 §2/§3 + platform §3.6), S1 (03 §6/§8), S2 (03 §10/§11),
+> S3 (03 §7), S4 (01 §5/§7 + 02 + 03 §2), S5 (01 §7/§11 + 02 §6), S6 (02 §5), M1 (02 §3),
+> M2 (03 §7), M3 (03 §10), §6-residual (03 §6). Plus the two Phase-3 design updates:
+> provision-by-restore (03 §9) and the settled root-vs-API boundary (03 §3). **Deferred/none:**
+> no finding was deferred; the pre-existing open items (operator signing-key mechanics,
+> multi-tenant fairness, hub-side desired-state UX, golden-image refresh cadence) remain
+> flagged in 03 §13. This artifact can be deleted once confirmed.
+
 Working artifact. Review pass over `01-topology-and-trust.md`, `02-controller-module-map.md`,
 `03-host-agent.md`, `proxmox-platform.md`, and the Phase 0 / Phase 1-2 findings, grounded
 against the v0.33 source (`deploy-felhom-compose/controller/`). Every finding cites a