2026-06-08 09:15:16 +02:00
parent 8ae6e8abf3
commit bb0a9e7205
4 changed files with 165 additions and 58 deletions
+36 -21
@@ -54,7 +54,7 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
| `appexport/` | `.fab` app export/import (config+DB+volumes, AES-256-CTR+scrypt) | **backup** (DB dump), (provider iface → stacks) |
| `assets/` | Download/cache app assets from Hub API | — (HTTP only) |
| `backup/` | DB dumps, Docker-volume archive, **restic**, **cross-drive rsync**, per-app restore, **drive mount**, disk-layout, infra-backup metadata | config, monitor, settings, system, util |
| `cloudflare/` | Geo-restriction via Cloudflare WAF (zone/waf/geosync/countries) | settings |
| `cloudflare/` | Geo-restriction via Cloudflare WAF (zone/waf/geosync/countries); **enforcement → hub** (S4) | settings |
| `config/` | `controller.yaml` schema + load | — |
| `crypto/` | AES-256-GCM for app.yaml secrets | — |
| `integrations/` | App-to-app (OnlyOffice→FileBrowser/Nextcloud) via docker exec / config patch | stacks, crypto, settings |
@@ -88,7 +88,7 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
| File | Class | Reason | Risk |
|---|---|---|---|
| `api/router.go` | **PORT/MODIFY** | Keep stacks/deploy/integrations/metrics/sync/assets/selfupdate routes; **remove `/api/storage/*` (disk)**; backup routes become **agent-coordinated guest-backup** requests; `config/apply` (hub-pushes-yaml) changes since the **agent** now injects config at provision. | needs-rework |
| `api/geo.go` | **PORT (blocked)** | Geo is app-domain, but gated on the tunnel-placement decision (doc 01 §7/§11). | blocked |
| `api/geo.go` | **PORT/MODIFY** | Keep the customer-facing geo **preference** endpoints (set/get global + per-app); **drop the Cloudflare-sync trigger** — enforcement → hub (S4). The controller reports geo desired-state up instead of calling the CF API. | needs-rework |
### `appexport/` — KEEP/PORT (Docker-volume + DB level, no disk ops)
| File | Class | Reason | Risk |
@@ -120,15 +120,15 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
| `local_infra.go` | **DELETE (→agent)** | Per-drive infra-backup metadata → agent. | clean |
| `restore_scan.go` | **DELETE (→agent)** | Scans drives to build a DR restore plan = agent-tier DR. | needs-rework |
### `cloudflare/` — BLOCKED on tunnel-placement (doc 01 §7/§11)
### `cloudflare/` — DELETE (→hub): CF-API enforcement moves to the hub (S4)
| File | Class | Reason | Risk |
|---|---|---|---|
| `client.go`,`zone.go`,`waf.go`,`geosync.go`,`countries.go` | **PORT (blocked)** | Geo-restriction WAF is app-domain and could stay in the controller, but it shares the Cloudflare account/zone with the **tunnel**, whose host-vs-guest placement is undecided. Classify provisionally PORT; do not force. | blocked |
| `client.go`,`zone.go`,`waf.go`,`geosync.go`,`countries.go` | **DELETE (→hub)** | The **hub** holds the CF API token and reconciles geo desired-state → WAF (doc 01 §5, doc 03 §2). The controller no longer calls the Cloudflare API — it reports geo desired-state up. The customer-facing geo *preference UI/data* stays (see `api/geo.go`). | needs-rework |
### `config/`, `crypto/`, `util/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `config/config.go` | **MODIFY** | Drop `BackupConfig` (restic/retention) and storage-drive keys; keep customer/paths/web/git/stacks/monitoring/hub/assets/system; **add agent local-API endpoint+token**. Self-update section gated (open). | needs-rework |
| `config/config.go` | **MODIFY** | Drop `BackupConfig` (restic/retention), storage-drive keys, and `InfrastructureConfig.cf_api_token` (→hub, S4); keep customer/paths/web/git/stacks/monitoring/hub/assets/system; **add agent local-API endpoint+token**. | needs-rework |
| `crypto/crypto.go` | **KEEP** | App.yaml secret encryption. | clean |
| `util/strings.go` | **KEEP** | Trivial helper. | clean |
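The "add agent local-API endpoint+token" change to `config/config.go` could look roughly like the sketch below. The `AgentConfig` name, the YAML keys, and the assumption that the endpoint is an absolute URL are all hypothetical; the source only says that an endpoint and a token are added.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// AgentConfig is a hypothetical controller.yaml section for the agent's
// local API. Field names and YAML keys are illustrative only.
type AgentConfig struct {
	Endpoint string `yaml:"endpoint"` // assumed to be an absolute URL
	Token    string `yaml:"token"`
}

// Validate rejects a missing token and an endpoint that is not an absolute URL.
func (c AgentConfig) Validate() error {
	if c.Token == "" {
		return errors.New("agent: token is required")
	}
	u, err := url.Parse(c.Endpoint)
	if err != nil {
		return fmt.Errorf("agent: endpoint: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return fmt.Errorf("agent: endpoint %q must be an absolute URL", c.Endpoint)
	}
	return nil
}

func main() {
	cfg := AgentConfig{Endpoint: "http://127.0.0.1:8765", Token: "example-token"}
	fmt.Println(cfg.Validate())
}
```

Validating at load time means a controller that the agent provisioned with an incomplete config fails fast instead of failing on its first agent call.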
@@ -169,16 +169,17 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
| `infra_backup.go`/`_linux.go`/`_other.go` | **DELETE (→agent)** | Builds infra-backup payload (disk layout, restic/enc passwords) for hub. | hazard |
| `infra_pull.go` | **DELETE (→agent)** | Pulls recovery config + infra backup from hub (setup-wizard DR). | needs-rework |
### `selfupdate/` — OPEN (doc 01 §11: "self-update flow not yet designed")
### `selfupdate/` — controller is agent-managed (doc 03 §11)
| File | Class | Reason | Risk |
|---|---|---|---|
| `version.go`,`state.go` | **KEEP** | Semver parse; update audit state. | clean |
| `updater.go` | **PORT (open)** | Pulls image + edits `docker-compose.yml` + `compose up -d`. In the agent model the controller is the **agent's product** (doc 01 §3) — self-update may move under the agent. Flag as open. | blocked |
| `version.go` | **KEEP** | Semver parse / version string (still used for reporting). | clean |
| `state.go` | **DELETE (obsolete)** | Self-update audit state — the agent owns controller updates now (doc 03 §11). | clean |
| `updater.go` | **DELETE (→agent)** | Resolved (doc 03 §11): the controller is **agent-managed** — the agent snapshots → redeploys → health-gates → rolls back the controller. The controller's old self-update path (image pull + compose edit) is **removed**. | clean |
### `settings/`
| File | Class | Reason | Risk |
|---|---|---|---|
| `settings/settings.go` (1101 L) | **MODIFY (split)** | Keep notif prefs, integration state, geo, DB-validation cache, cross-drive *intent*. The **storage-path registry** (`StoragePath` with `Disconnected`/`DisconnectedAt`/`StoppedStacks`/decommission/UUID) is disk-management state → reshape to **per-volume placement** fed by the agent's storage manifest; disconnect/decommission/migrate state leaves. | hazard |
| `settings/settings.go` (1101 L) | **MODIFY (split)** | Keep notif prefs, integration state, geo, DB-validation cache, cross-drive *intent*. The **storage-path registry** (`StoragePath` with `Disconnected`/`DisconnectedAt`/`StoppedStacks`/decommission) is disk-management state → reshape to **per-volume placement** fed by the agent's storage manifest; disconnect/decommission/migrate state leaves. (UUID is *not* a persisted field — runtime-derived from fstab.) | hazard |
### `setup/` — all DELETE (obsolete); the agent provisions the controller
| File | Class | Reason | Risk |
@@ -272,7 +273,7 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
host-info first.**
6. **`settings/StoragePath` carries disk state into an app-domain store.** Disk fields
(`Disconnected`,`DisconnectedAt`,`StoppedStacks`, decommission, UUID) are written by
(`Disconnected`,`DisconnectedAt`,`StoppedStacks`, decommission — UUID is *not* persisted, it's runtime-derived from fstab via `system.ParseFstabUUID`/`watchdog.go`) are written by
`watchdog.go`/`storage_handlers.go`/`crossdrive.go` (all delete) but the same struct is
read by `stacks`/`web` for labels and **placement** (keep). Reshape `StoragePath` to a
placement record fed by the agent manifest.
@@ -327,10 +328,11 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
2. **Per-volume storage placement** (doc 01 §8) — `.felhom.yml` `hot`/`bulk` volume
classification (extend `stacks/metadata.go`), enforcement at deploy (extend
`stacks/deploy.go`), and a placement record in `settings`. Replaces the per-app
HDD-path + cross-drive model.
3. **Self-restore-test orchestration** — controller asks the agent to restore the latest
guest backup to a scratch guest, runs its post-restore health probes, reports the
verdict to the hub. (Backed by the validated Phase 2 round-trip in
HDD-path + cross-drive model. A `bulk` volume must be realized as a `backup=0` mount point,
**never** a rootfs Docker named volume (validated recipe: `phase3-findings.md` B2 / doc 03 §7).
3. **Self-restore-test status display** (read-only) — the **agent owns orchestration** (it
holds the PBS key and creates the scratch guest — operator-tier, doc 03 §8); the controller
only surfaces `GET /restore-test/status` in its UI. (Round-trip validated: Phase 2,
[../proxmox-platform.md](../proxmox-platform.md) §4.)
4. **Snapshot-before-deploy/rollback flow** in the deploy path — wraps the existing
compose deploy with agent snapshot → health check → agent rollback-on-failure
@@ -343,13 +345,12 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
## 6. Open / blocked items
- **`cloudflare/` + `api/geo.go` — blocked on tunnel placement** (doc 01 §7, §11: host vs
guest `cloudflared`). Geo-WAF is app-domain and likely PORT, but it shares the
Cloudflare account/zone with the tunnel; do not finalize until placement is decided.
- **`selfupdate/updater.go` — open** (doc 01 §11: self-update flow undesigned). Because the
controller is "the agent's product" (doc 01 §3), self-update may move under the agent
(snapshot → swap → health-gate → rollback) rather than the controller editing its own
compose file. Provisionally PORT.
- **Geo — resolved (S4):** CF-API **enforcement moves to the hub** (it holds the CF token and
reconciles geo → WAF); the controller keeps the geo **preference UI/data** and reports
desired-state up. Tunnel placement is settled (host, agent-managed, doc 03 §3/§5). The
`cloudflare/` package + `api/geo.go`'s CF-sync are DELETE-from-controller → hub.
- **Self-update — resolved (doc 03 §11):** the controller is agent-managed; its self-update
path is removed.
- **`settings`/`stacks` per-volume reshape** — depends on the storage-manifest contract
between hub ↔ agent ↔ controller (doc 01 §8), not yet specified.
- **Backup UI/report surface** — depends on the agent's guest-backup status API shape
@@ -357,3 +358,17 @@ Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-targe
- **Notification event taxonomy** — which infra events (`storage_disconnected`,
`crossdrive_*`, `disaster_recovery_*`) the **agent** emits vs the controller, once those
responsibilities move.
---
## Changelog — design-review + Phase-3 fold-in (2026-06-08)
- **M1:** removed `UUID` from the `settings.StoragePath` field lists (§ settings, hazard #6) —
it is runtime-derived from fstab, not persisted.
- **S4 (geo):** `cloudflare/` reclassified **PORT(blocked) → DELETE(→hub)** (CF-API enforcement
moves to the hub); `api/geo.go` **PORT/MODIFY** (keep geo *preference* endpoints, drop the
CF-sync trigger); `config/config.go` also drops `cf_api_token`. §6 + §1 updated.
- **S5:** cloudflare/geo no longer "blocked on tunnel placement" (resolved).
- **S6:** §5(3) self-restore-test → **status-display only**; the agent owns orchestration.
- **Self-update resolved (03 §11):** `updater.go` **DELETE(→agent)**, `state.go`
DELETE(obsolete), `version.go` KEEP; §6 + §5(2) updated (bulk = `backup=0` mountpoint recipe).