From dc9ac6084d28c20ba94efc5bafa922b643ac42c8 Mon Sep 17 00:00:00 2001
From: kisfenyo
Date: Sun, 7 Jun 2026 22:52:05 +0200
Subject: [PATCH] module map added

---
 docs/architecture/02-controller-module-map.md | 359 ++++++++++++++++++
 1 file changed, 359 insertions(+)
 create mode 100644 docs/architecture/02-controller-module-map.md

diff --git a/docs/architecture/02-controller-module-map.md b/docs/architecture/02-controller-module-map.md
new file mode 100644
index 0000000..e7433b6
--- /dev/null
+++ b/docs/architecture/02-controller-module-map.md
@@ -0,0 +1,359 @@
+# Felhom Controller Architecture — Part 2: Controller Module Map
+
+**Status:** audit (keep / port / delete / modify / add), grounded in the v0.33 source.
+**Subject:** the v0.33 controller in `deploy-felhom-compose/controller/` (110 `.go` files,
+~40 K LOC) audited against [01-topology-and-trust.md](01-topology-and-trust.md) and
+[../proxmox-platform.md](../proxmox-platform.md).
+
+> This is a **planning map, not the port.** No controller code was changed. Source
+> citations use `controller/internal/...:line` (a different repo, so links are not
+> clickable). Classifications reflect the **target model**: the in-guest controller is
+> **Docker-only and holds no Proxmox credentials**; everything host/disk/Proxmox moves to
+> a new **host agent** (out of scope here); the controller reaches the agent through a
+> constrained **local API**.
+
+## Classification scheme
+**KEEP** (host-agnostic, ~unchanged) · **PORT** (survives, needs rework) ·
+**DELETE (→agent)** (responsibility moves to the host agent) ·
+**DELETE (obsolete)** (no longer needed) · **MODIFY** (stays, materially changes) ·
+**NEW** (no v0.33 equivalent).
+Risk tags: **clean** · **needs-rework** · **hazard** (entangles a delete-target with a keep/port target).
+
+---
+
+## 0. Executive summary
+
+- The **app domain is largely intact and portable**: stack lifecycle (`stacks/`), catalog
+  git-sync (`sync/`), app-to-app integrations (`integrations/`), `.fab` export/import
+  (`appexport/`), the scheduler, crypto, asset sync, the hub report/notify *channels*, and
+  most of the web UI **KEEP/PORT cleanly**.
+- The **disk/storage/host half deletes wholesale to the agent**: all of `storage/`,
+  `monitor/watchdog.go`, the restic/cross-drive/disk-layout/drive-mount parts of `backup/`,
+  `report/infra_backup*`+`infra_pull`, and the host-physical parts of `system/`.
+- The **setup wizard (`setup/`) is obsolete** — the agent provisions the controller.
+- **The single biggest hazard is `backup/`**: the keep side (DB dumps, Docker-volume
+  archive, per-app restore — needed by `appexport/` and the backup UI) and the delete side
+  (restic, cross-drive, drive-mount) are **interleaved inside the same files**
+  (`backup.go`, `restore.go`, `paths.go`), not cleanly file-separated. Extracting the
+  app-data-backup subset into a clean retained package is the critical refactor.
+- **Intent-vs-reality corrections** (vs the task's provisional split): `monitor/pinger.go`
+  is already **dead** (legacy Healthchecks.io, "deprecated… now handled by Hub" per
+  `main.go`) → DELETE(obsolete), not keep. `backup.go`/`restore.go`/`paths.go` do **not**
+  split on file boundaries — they split *within* the file. `settings/` is **not** pure app
+  domain — it stores disk/disconnect/decommission state. `system/` is genuinely
+  mixed-per-function, not per-file.
+
+---
+
+## 1. v0.33 module inventory (package → purpose, key deps)
+
+| Package | Purpose | Key internal deps |
+|---|---|---|
+| `cmd/controller/main.go` | Entry point; wires all subsystems; 6 adapters break import cycles; branches into setup mode | imports **every** package |
+| `api/` | REST API (`router.go`) + geo endpoints (`geo.go`) | stacks, backup, metrics, notify, selfupdate, sync, system, assets, integrations, cloudflare, config, settings |
+| `appexport/` | `.fab` app export/import (config+DB+volumes, AES-256-CTR+scrypt) | **backup** (DB dump), (provider iface → stacks) |
+| `assets/` | Download/cache app assets from Hub API | — (HTTP only) |
+| `backup/` | DB dumps, Docker-volume archive, **restic**, **cross-drive rsync**, per-app restore, **drive mount**, disk-layout, infra-backup metadata | config, monitor, settings, system, util |
+| `cloudflare/` | Geo-restriction via Cloudflare WAF (zone/waf/geosync/countries) | settings |
+| `config/` | `controller.yaml` schema + load | — |
+| `crypto/` | AES-256-GCM for app.yaml secrets | — |
+| `integrations/` | App-to-app (OnlyOffice→FileBrowser/Nextcloud) via docker exec / config patch | stacks, crypto, settings |
+| `metrics/` | SQLite time-series: system + container metrics, log scan | system |
+| `monitor/` | App health (`healthcheck`,`pinger`) + **storage/USB watchdog** | config, notify, settings, system |
+| `notify/` | Hub event push (direct, own API key) | settings |
+| `recovery/` | Generate `recovery-info.txt` (DR guide) | — |
+| `report/` | Build+push hub report; **infra-backup payload**; **recovery pull** | backup, config, metrics, monitor, scheduler, settings, stacks, system |
+| `scheduler/` | Cron/interval jobs, Budapest TZ | — |
+| `selftest/` | Startup checks (docker/dirs/catalog/hub/**restic repos**/mountpoint) | backup, config, settings, system |
+| `selfupdate/` | Self-update: pull image, edit compose, `up -d` | config |
+| `settings/` | `settings.json` persistent state: **storage paths/disconnect/decommission**, cross-drive cfg, notif prefs, geo, integration state, DB-validation cache | — |
+| `setup/` | **First-run wizard** (scan drives, hub-restore, manual config) | backup, config, report, settings, web |
+| `stacks/` | Docker Compose lifecycle, deploy + memory validation, metadata (`.felhom.yml`), HDD-data delete | config, crypto, system |
+| `storage/` | **Physical disk** scan/format/attach/mount/migrate/fstab/safety | backup, settings, util |
+| `sync/` | Catalog git-sync (pull templates) | config |
+| `system/` | Resource info: mem/cpu/load (guest) + **temp/disk-model/USB/mount topology (host)** | — |
+| `util/` | String helper | — |
+| `web/` | Hungarian dashboard: pages, auth, deploy, backup UI, **storage/disk UI**, DR restore UI, export UI, debug | appexport, backup, config, crypto, integrations, monitor, notify, scheduler, selfupdate, settings, stacks, storage, system |
+
+---
+
+## 2. Classification table (per package/file)
+
+### `cmd/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `cmd/controller/main.go` | **MODIFY** | Wiring stays, but drop the setup-mode branch, the storage/watchdog/drive-migrator/restic/cross-drive/infra-backup wiring, and add the **agent local-API client**. 6 adapters shrink. | hazard |
+
+### `api/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `api/router.go` | **PORT/MODIFY** | Keep stacks/deploy/integrations/metrics/sync/assets/selfupdate routes; **remove `/api/storage/*` (disk)**; backup routes become **agent-coordinated guest-backup** requests; `config/apply` (hub-pushes-yaml) changes since the **agent** now injects config at provision. | needs-rework |
+| `api/geo.go` | **PORT (blocked)** | Geo is app-domain, but gated on the tunnel-placement decision (doc 01 §7/§11). | blocked |
+
+### `appexport/` — KEEP/PORT (Docker-volume + DB level, no disk ops)
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `crypto.go` | **KEEP** | Self-contained AES-256-CTR+HMAC+scrypt for `.fab`. | clean |
+| `manifest.go`, `provider.go` | **KEEP** | Bundle metadata; provider interface (impl in main). | clean |
+| `export.go` | **PORT** | Docker-volume `tar`, DB dump via `backup.DumpOne`, config copy. Depends on the **retained** app-data-backup subset of `backup/`; HDD-mount enumeration reworked to **per-volume placement**. | needs-rework |
+| `restore.go` | **PORT** | `docker volume create`/`tar xf`, DB import, compose up. Same per-volume rework. | needs-rework |
+| `estimate.go` | **PORT** | `du`/`df` on mounts → per-volume sizing. | clean |
+
+### `assets/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `syncer.go` | **KEEP** | Hub API download + checksum cache; already a direct hub channel. | clean |
+
+### `backup/` — THE SPLIT (delete side interleaved with keep side; see §3)
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `dbdump.go` | **KEEP** | Pure `docker exec pg_dump`/`mariadb-dump` — app/DB data layer; the retained per-app backup. | clean |
+| `appdata.go` | **PORT** | App-data discovery (stacks/volumes/DB containers, `du`). "HDD mount" concept → per-volume. | needs-rework |
+| `backup.go` (1478 L) | **MODIFY (split)** | Mixes **keep** (`RunDBDumps`, `DumpAppVolumes(Safe)`, app restore) with **delete→agent** (`RunBackup`/`backupDrive`/restic snapshot/prune/check on per-drive repos). Must be torn in two. | hazard |
+| `restore.go` (442 L) | **MODIFY (split)** | `RestoreApp` restic path → agent; Docker-volume + Tier-2 rsync restore (app layer) → keep. | hazard |
+| `restore_app_linux.go`/`_other.go` | **PORT** | Per-app restore: compose pull/up, rsync app data, DB-dump restore. App layer; depends on backup location that changes. | needs-rework |
+| `paths.go` | **MODIFY (split)** | `AppDBDumpPath`/`AppVolumeDumpPath` keep; `Primary/SecondaryResticRepoPath`, `InfraBackupDir` → agent. | needs-rework |
+| `restic.go` | **DELETE (→agent)** | restic repos on drives = infra backup tier; agent does vzdump/PBS. | hazard |
+| `crossdrive.go` | **DELETE (→agent)** | Tier-2 cross-drive rsync to secondary storage = storage-tier (agent + storage manifest). | hazard |
+| `restore_drives_linux.go`/`_other.go` | **DELETE (→agent)** | `lsblk`/`blkid`/`mount`/fstab — pure host disk. | hazard |
+| `disk_layout.go` | **DELETE (→agent)** | Disk topology for DR → agent. | clean |
+| `local_infra.go` | **DELETE (→agent)** | Per-drive infra-backup metadata → agent. | clean |
+| `restore_scan.go` | **DELETE (→agent)** | Scans drives to build a DR restore plan = agent-tier DR. | needs-rework |
+
+### `cloudflare/` — BLOCKED on tunnel-placement (doc 01 §7/§11)
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `client.go`,`zone.go`,`waf.go`,`geosync.go`,`countries.go` | **PORT (blocked)** | Geo-restriction WAF is app-domain and could stay in the controller, but it shares the Cloudflare account/zone with the **tunnel**, whose host-vs-guest placement is undecided. Classify provisionally PORT; do not force. | blocked |
+
+### `config/`, `crypto/`, `util/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `config/config.go` | **MODIFY** | Drop `BackupConfig` (restic/retention) and storage-drive keys; keep customer/paths/web/git/stacks/monitoring/hub/assets/system; **add agent local-API endpoint+token**. Self-update section gated (open). | needs-rework |
+| `crypto/crypto.go` | **KEEP** | App.yaml secret encryption. | clean |
+| `util/strings.go` | **KEEP** | Trivial helper. | clean |
+
+### `integrations/` — all KEEP (pure app-domain)
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `integrations.go`,`lifecycle.go`,`manager.go`,`onlyoffice_filebrowser.go`,`onlyoffice_nextcloud.go` | **KEEP** | App-to-app via `docker exec` / compose-config patch; no host ops. | clean |
+
+### `metrics/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `store.go`,`logscanner.go`,`telemetry.go`,`types.go` | **KEEP** | SQLite store, `docker logs` scan, container telemetry — app-domain. | clean |
+| `collector.go` | **PORT** | Container metrics (`docker stats`) keep; host metrics via `system.GetInfo` (temp, physical disk) become **agent-provided or dropped**. | needs-rework |
+| `sysinfo.go`/`sysinfo_other.go` | **MODIFY** | Reads `/host/etc`, `/proc/cpuinfo`, uptime — host static info; in-guest some is meaningful, hardware identity via agent. | needs-rework |
+
+### `monitor/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `healthcheck.go` | **PORT (split)** | Keep guest health (mem/cpu/docker/protected-containers); host health (temp, **physical disk**, storage-path mount status) becomes **agent-fed**. | needs-rework |
+| `pinger.go` | **DELETE (obsolete)** | Legacy Healthchecks.io; `main.go` itself marks it "deprecated… now handled by Hub". *(Corrects the task's KEEP/PORT guess.)* | clean |
+| `watchdog.go` (902 L) | **DELETE (→agent)** | Storage/USB disconnect monitoring: `umount -l`, `mount -T /host-fstab`, UUID probing, restic-lock cleanup — pure host storage. | hazard |
+
+### `notify/`, `recovery/`, `scheduler/`, `selftest/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `notify/notifier.go` | **KEEP/MODIFY** | Direct hub event channel (own API key) — keep; prune infra event types that move to the agent (`storage_disconnected`, `crossdrive_*`, `disaster_recovery_*`). | clean |
+| `recovery/info.go` | **DELETE (obsolete)** | Generates a DR text guide (OS install, docker-setup.sh, hub restore UI); DR is now agent+hub provisioning. | clean |
+| `scheduler/scheduler.go` | **KEEP** | Generic cron/interval, Budapest TZ. | clean |
+| `selftest/selftest.go` | **PORT** | Keep docker/dirs/catalog/hub checks; drop restic-repo + system-data **mountpoint** checks (→agent). | needs-rework |
+
+### `report/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `pusher.go` | **KEEP** | Direct hub push (`/api/v1/report`, Bearer). | clean |
+| `telemetry.go` | **KEEP** | Per-app telemetry section. | clean |
+| `builder.go` (326 L) | **MODIFY** | Keep containers/telemetry/stacks/geo/app-health; drop/relocate host system info, physical storage, **restic backup status incl. restic password**. | hazard |
+| `types.go` | **MODIFY** | Schema: drop infra fields (`restic password`, physical storage), keep app-domain. | needs-rework |
+| `infra_backup.go`/`_linux.go`/`_other.go` | **DELETE (→agent)** | Builds infra-backup payload (disk layout, restic/enc passwords) for hub. | hazard |
+| `infra_pull.go` | **DELETE (→agent)** | Pulls recovery config + infra backup from hub (setup-wizard DR). | needs-rework |
+
+### `selfupdate/` — OPEN (doc 01 §11: "self-update flow not yet designed")
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `version.go`,`state.go` | **KEEP** | Semver parse; update audit state. | clean |
+| `updater.go` | **PORT (open)** | Pulls image + edits `docker-compose.yml` + `compose up -d`. In the agent model the controller is the **agent's product** (doc 01 §3) — self-update may move under the agent. Flag as open. | blocked |
+
+### `settings/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `settings/settings.go` (1101 L) | **MODIFY (split)** | Keep notif prefs, integration state, geo, DB-validation cache, cross-drive *intent*. The **storage-path registry** (`StoragePath` with `Disconnected`/`DisconnectedAt`/`StoppedStacks`/decommission/UUID) is disk-management state → reshape to **per-volume placement** fed by the agent's storage manifest; disconnect/decommission/migrate state leaves. | hazard |
+
+### `setup/` — all DELETE (obsolete); the agent provisions the controller
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `handlers.go`,`setup.go`,`csrf.go`,`network.go` | **DELETE (obsolete)** | First-run wizard (hub-restore, manual config, LAN-IP detection). | needs-rework |
+| `scanner.go` | **DELETE (→agent)** | Drive scan (`lsblk`+temp mounts) for backup discovery — host op; its capability informs the agent. | clean |
+
+### `stacks/` — core app domain (KEEP/PORT)
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `manager.go` (1074 L) | **KEEP/PORT** | Docker Compose orchestration, scan/state/start/stop/logs — the heart. Minor port. | clean |
+| `deploy.go` | **PORT** | Memory validation (`system.GetMemoryMB` — **guest** mem, fine in LXC), secret gen, encrypted app.yaml. **Add snapshot-before-deploy → agent** hook. | needs-rework |
+| `healthprobe.go` | **KEEP** | TCP/HTTP app probes. | clean |
+| `metadata.go` | **PORT** | `.felhom.yml` parse. **Add per-volume hot/bulk classification** (doc 01 §8). | needs-rework |
+| `delete.go` | **PORT** | Stack delete + HDD-data `os.RemoveAll` on bind mounts → per-volume cleanup. | needs-rework |
+
+### `storage/` — entire package DELETE (→agent)
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `scan*`,`format*`,`attach*`,`migrate*`,`migrate_drive*`,`safety*` | **DELETE (→agent)** | Physical disk: `lsblk`/`sfdisk`/`wipefs`/`mkfs.ext4`/`partprobe`/`mount`/`umount`/fstab/`blkid`/drive-rsync. The agent owns all of this (doc 01 §3, §8). | hazard |
+
+### `sync/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `sync/sync.go` | **KEEP** | Catalog git-sync (clone/fetch/reset, copy compose+`.felhom.yml`, never overwrite app.yaml). | clean |
+
+### `system/` — split per-function (not per-file)
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `cpu_linux.go`/`cpu_other.go` | **KEEP** | `/proc/stat` works inside an LXC. | clean |
+| `info.go`/`info_other.go` | **KEEP** | Structs/stubs. | clean |
+| `info_linux.go` | **MODIFY (split)** | Keep mem (`/proc/meminfo`)/load/statfs (guest); **temp via `/host/sys`, hwmon → agent**. | needs-rework |
+| `mounts_linux.go`/`mounts_other.go` | **DELETE (→agent)** mostly | Mount-point detection, USB, disk model, fstab, probe — host/disk. Guest-meaningful `statfs` disk-usage is the only keep-candidate → fold into the kept `info`. | hazard |
+
+### `web/` — split by UI surface
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `auth.go`,`csrf.go`,`logbuffer.go`,`embed.go`,`templates.go` | **KEEP** | Session/CSRF, log ring buffer, embeds/logo. | clean |
+| `funcmap.go` | **KEEP/PORT** | Template helpers; a few backup/state labels track the backup rework. | clean |
+| `server.go` (559 L) | **MODIFY** | Routing/wiring; remove storage/DR-restore/watchdog wiring; keep app/deploy/backup/settings/export/debug. | needs-rework |
+| `handlers.go` (1883 L) | **PORT/MODIFY** | Core pages keep; the embedded **storage-path management** (add/remove/label/schedulable, storage bars, FileBrowser mount sync) → per-volume / agent-fed. | hazard |
+| `handler_export.go` | **KEEP/PORT** | `.fab` UI. | clean |
+| `handler_debug.go` (823 L) | **PORT** | Drop storage-simulate/infra-push/DR debug; keep the rest. | needs-rework |
+| `alerts.go` | **PORT/MODIFY** | Storage-disconnect alert now sourced from **agent** status; backup/update alerts keep. | needs-rework |
+| `handler_restore.go` | **DELETE (→agent) / MODIFY** | DR restore-mode UI; DR is agent-tier — replace with an agent-status view or remove. | needs-rework |
+| `storage_handlers.go` (1600 L) | **DELETE (→agent)** | Format/attach/mount/disconnect/migrate-drive/decommission disk UI. Any survivor is a **thin client calling the agent API** (e.g. per-volume placement requests). | hazard |
+| `templates/` (HTML, non-Go) | **PORT** | Remove disk-wizard + DR pages; keep app/deploy/backup/settings pages. | needs-rework |
+
+### `scripts/`
+| File | Class | Reason | Risk |
+|---|---|---|---|
+| `scripts/hashpass.go` | **KEEP** | Standalone bcrypt helper. | clean |
+
+---
+
+## 3. Coupling hazards (delete-targets depended on by keep/port)
+
+1. **`backup/` is half-deleted but split *inside files*, not across them.** `backup.go`
+   contains both `RunDBDumps`/`DumpAppVolumesSafe`/app-restore (keep) and
+   `RunBackup`/`backupDrive` + restic (delete→agent); `restore.go` and `paths.go` are
+   likewise mixed. **Keep/port consumers reach into this same package:**
+   - `appexport/export.go:295` → `backup.DiscoverDatabases`/`DumpOne` (DB dump is app-layer — must survive)
+   - `report/builder.go:buildBackupReport` → backup status (MODIFY)
+   - `web/handlers.go` (backups page, `buildAppBackupRows`), `web/funcmap.go`, `web/alerts.go`, `web/handler_restore.go`, `web/handler_debug.go`
+   - `selftest/selftest.go:217` → `checkResticRepos` (restic path — delete)
+   - `main.go` scheduler chain `RunFullBackup` (DB→volume→restic→infra-push) interleaves both sides.
+   **Action:** extract the app-data-backup subset (DB dump, volume archive, per-app
+   restore) into a clean retained package *before* deleting the restic/cross-drive code,
+   or every keep consumer breaks.
+
+2. **`backup/crossdrive.go` (delete→agent) is wired as `crossDriveRunner` into**
+   `main.go`, `api/router.go`, `web/server.go`, and surfaced by `report/builder.go` and the
+   backups page. Removing it requires reworking the backup UI/report to the agent's
+   guest-backup status.
+
+3. **`storage/` (delete→agent) depended on by keep/port UI:** `web/storage_handlers.go`
+   (delete) and `web/server.go`/`web/handlers.go` (port) — the latter renders storage
+   labels/bars and runs **FileBrowser mount sync** off the storage-path registry.
+   `storage/migrate*.go` also imports `backup` (also being split). Untangle the per-volume
+   placement UI from the disk-management UI.
+
+4. **`monitor/watchdog.go` (delete→agent) depended on by** `web/alerts.go` (port),
+   `web/server.go`, `web/handler_debug.go`, `main.go`. The disconnect **alert** must instead
+   consume agent-reported storage status.
+
+5. **`system/` mixed-per-function, consumed by both sides.** Keep consumers —
+   `stacks/deploy.go` (`GetMemoryMB`, guest), `metrics/collector.go` (container) — must not
+   drag in the host-disk/temp/USB code that goes to the agent (`mounts_linux.go`,
+   `info_linux.go` temp). Also consumed by `report/builder.go` (MODIFY), `monitor/healthcheck.go`
+   (PORT), `selftest`, `crossdrive` (delete). **Split `system/` cleanly into guest-info vs
+   host-info first.**
+
+6. **`settings/StoragePath` carries disk state into an app-domain store.** Disk fields
+   (`Disconnected`,`DisconnectedAt`,`StoppedStacks`, decommission, UUID) are written by
+   `watchdog.go`/`storage_handlers.go`/`crossdrive.go` (all delete) but the same struct is
+   read by `stacks`/`web` for labels and **placement** (keep). Reshape `StoragePath` to a
+   placement record fed by the agent manifest.
+
+7. **`report/builder.go` imports almost everything** (backup, monitor, scheduler, stacks,
+   system, metrics, settings, config). Its MODIFY must land *after* the backup and system
+   splits, or it pulls deleted code along.
+
+8. **`backup/paths.go` shared both ways** — `appexport` + `selftest` + the kept DB-dump
+   flow use the app-dump path helpers; the same file holds the restic/secondary helpers
+   that leave.
+
+9. **DR/provisioning chain is cross-cut:** `setup/` (obsolete) → `report/infra_pull` +
+   `recovery/info` + `backup.MountDrivesFromLayout` + `backup.ReadLocalInfraBackup`. All
+   obsolete/→agent, but `main.go`'s setup branch and `web/handler_restore.go` reference
+   them; remove together.
+
+---
+
+## 4. Moves to the host agent (consolidated — feeds the future agent design)
+
+> Reporting only; **not** designing the agent here.
+
+- **All physical-disk management** — `storage/` in full: scan/classify, format
+  (`wipefs`/`sfdisk`/`mkfs.ext4`/`partprobe`), attach (raw mount + bind + fstab), per-app
+  and full-drive migration (rsync), safety checks (system-disk detection).
+- **Storage/USB watchdog** — `monitor/watchdog.go`: disconnect/reconnect detection,
+  `umount -l`, `mount -T /host-fstab`, UUID-by-id probing, safe-disconnect, restic-lock
+  cleanup.
+- **Infra/disk backup tier** — `backup/restic.go`, `crossdrive.go`,
+  `restore_drives_*`, `disk_layout.go`, `local_infra.go`, `restore_scan.go`, plus the
+  restic-snapshot half of `backup.go`, the restic-restore half of `restore.go`, and the
+  restic/secondary path helpers in `paths.go`. (Maps to the agent's `vzdump`→tiers→PBS in
+  doc 01 §8.)
+- **Infra-backup payload + recovery pull** — `report/infra_backup*`, `report/infra_pull`.
+- **Host-physical telemetry** — `system/mounts_linux.go` (mount topology, USB, disk
+  model), the temp/hwmon parts of `system/info_linux.go`, and the host-hardware parts of
+  `metrics/sysinfo.go`.
+- **Drive scanning for provisioning/DR** — `setup/scanner.go`.
+- **Self-restore-test execution** — the agent performs the restore-to-scratch-guest; the
+  controller only orchestrates/validates (see §5).
+
+---
+
+## 5. New components to build (no v0.33 equivalent)
+
+1. **Agent local-API client** — the controller's only path to guest-level Proxmox
+   operations (doc 01 §3, §5): `snapshot-before-deploy` + rollback, "grow my RAM", request
+   guest backup/restore, read the storage manifest / mount placement, query per-target
+   storage status. Replaces the deleted direct host/disk code with constrained RPC. The
+   controller holds **no Proxmox creds** — only a local-API token.
+2. **Per-volume storage placement** (doc 01 §8) — `.felhom.yml` `hot`/`bulk` volume
+   classification (extend `stacks/metadata.go`), enforcement at deploy (extend
+   `stacks/deploy.go`), and a placement record in `settings`. Replaces the per-app
+   HDD-path + cross-drive model.
+3. **Self-restore-test orchestration** — controller asks the agent to restore the latest
+   guest backup to a scratch guest, runs its post-restore health probes, reports the
+   verdict to the hub. (Backed by the validated Phase 2 round-trip in
+   [../proxmox-platform.md](../proxmox-platform.md) §4.)
+4. **Snapshot-before-deploy/rollback flow** in the deploy path — wraps the existing
+   compose deploy with agent snapshot → health check → agent rollback-on-failure
+   (doc 01 §9). New behaviour on top of `stacks/deploy.go` + `stacks/healthprobe.go`.
+5. **Agent-provisioning bootstrap receiver** — the controller accepts its injected hub API
+   key + local-API token from the agent at provision time (doc 01 §6), replacing the
+   deleted `setup/` wizard.
+
+---
+
+## 6. Open / blocked items
+
+- **`cloudflare/` + `api/geo.go` — blocked on tunnel placement** (doc 01 §7, §11: host vs
+  guest `cloudflared`). Geo-WAF is app-domain and likely PORT, but it shares the
+  Cloudflare account/zone with the tunnel; do not finalize until placement is decided.
+- **`selfupdate/updater.go` — open** (doc 01 §11: self-update flow undesigned). Because the
+  controller is "the agent's product" (doc 01 §3), self-update may move under the agent
+  (snapshot → swap → health-gate → rollback) rather than the controller editing its own
+  compose file. Provisionally PORT.
+- **`settings`/`stacks` per-volume reshape** — depends on the storage-manifest contract
+  between hub ↔ agent ↔ controller (doc 01 §8), not yet specified.
+- **Backup UI/report surface** — depends on the agent's guest-backup status API shape
+  (what the controller can see about vzdump/PBS state) — undefined.
+- **Notification event taxonomy** — which infra events (`storage_disconnected`,
+  `crossdrive_*`, `disaster_recovery_*`) the **agent** emits vs the controller, once those
+  responsibilities move.
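Reviewer sketch (not part of the patch): the snapshot-before-deploy/rollback flow listed under "New components to build" could look roughly like the Go below. Everything here is an assumption for illustration only: the `AgentClient` interface, its `Snapshot`/`Rollback` methods, `DeployWithSnapshot`, `ErrRolledBack`, and the fake agent are hypothetical names, since the agent local-API contract is not yet specified. The deploy and probe callbacks stand in for whatever `stacks/deploy.go` and `stacks/healthprobe.go` end up exposing.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// AgentClient is a HYPOTHETICAL slice of the agent local API. The real
// contract is undefined in the module map; these method names are assumptions.
type AgentClient interface {
	Snapshot(ctx context.Context, label string) (snapshotID string, err error)
	Rollback(ctx context.Context, snapshotID string) error
}

// DeployFunc stands in for the existing compose deploy; ProbeFunc stands in
// for the post-deploy health probes.
type DeployFunc func(ctx context.Context) error
type ProbeFunc func(ctx context.Context) error

// ErrRolledBack marks a failed deploy that was rolled back successfully.
var ErrRolledBack = errors.New("deploy failed, rolled back to snapshot")

// DeployWithSnapshot wraps a deploy as: agent snapshot, deploy, health check,
// and agent rollback if either the deploy or the health check fails.
func DeployWithSnapshot(ctx context.Context, agent AgentClient, stack string, deploy DeployFunc, probe ProbeFunc) error {
	snapID, err := agent.Snapshot(ctx, "pre-deploy-"+stack)
	if err != nil {
		// Without a snapshot there is nothing to roll back to, so do not deploy.
		return fmt.Errorf("snapshot before deploy of %q: %w", stack, err)
	}
	cause := deploy(ctx)
	if cause == nil {
		cause = probe(ctx)
	}
	if cause == nil {
		return nil
	}
	if rbErr := agent.Rollback(ctx, snapID); rbErr != nil {
		return fmt.Errorf("deploy of %q failed (%v) and rollback failed: %w", stack, cause, rbErr)
	}
	return fmt.Errorf("%w: %v", ErrRolledBack, cause)
}

// fakeAgent is an in-memory stand-in used only to exercise the flow.
type fakeAgent struct{ rolledBack []string }

func (f *fakeAgent) Snapshot(_ context.Context, label string) (string, error) {
	return "snap-" + label, nil
}

func (f *fakeAgent) Rollback(_ context.Context, id string) error {
	f.rolledBack = append(f.rolledBack, id)
	return nil
}

// demo runs one healthy deploy and one deploy whose health probe fails.
func demo() (healthyErr, failingErr error, rolledBack []string) {
	ctx := context.Background()
	agent := &fakeAgent{}
	ok := func(context.Context) error { return nil }
	unhealthy := func(context.Context) error { return errors.New("health probe failed") }
	healthyErr = DeployWithSnapshot(ctx, agent, "nextcloud", ok, ok)
	failingErr = DeployWithSnapshot(ctx, agent, "nextcloud", ok, unhealthy)
	return healthyErr, failingErr, agent.rolledBack
}

func main() {
	healthyErr, failingErr, rolledBack := demo()
	fmt.Println("healthy deploy error:", healthyErr)
	fmt.Println("failing deploy error:", failingErr)
	fmt.Println("snapshots rolled back:", rolledBack)
}
```

The sketch refuses to deploy when no snapshot can be taken, which is one possible policy rather than a decided one. Keeping the agent behind a small interface also means the controller side can be tested without Proxmox credentials, which matches the target model where it holds only a local-API token.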