diff --git a/CONTEXT.md b/CONTEXT.md index e64b411..506e6e2 100644 --- a/CONTEXT.md +++ b/CONTEXT.md @@ -13,6 +13,21 @@ Last updated: 2026-06-12 (storage UX polish) > is tracked in `CHANGELOG.md`, `controller/README.md`, and the auto-memory `MEMORY.md`. Live version: > **v0.45.0**. > +> **2026-06-13 — v0.52.0 Phase 1 GATE: deploy-side double-nest fix (catalog) + path-agreement test:** +> - The `felhom-data` double-nest lived in the **app-catalog compose templates** +> (`${HDD_PATH}/felhom-data/appdata/`), not in `deploy.go`. On a Model-A in-guest drive the mount +> already IS the `felhom-data` namespace, so it double-nested on disk while the v0.51.0 backup helpers +> resolved single-nested → divergence. Fixed all four HDD templates (romm, nextcloud, immich, +> paperless-ngx) → `${HDD_PATH}/appdata/`. +> - New `internal/stacks/hddpath_agreement_test.go` locks deploy-resolver (`ParseComposeHDDMounts`) == +> backup helper (`AppDataDir(NamespaceRoot(.,true))`). No controller runtime change → no image rebuild +> (deployed stays 0.51.0, functionally current; golden not rebaked for a no-op). +> - **Live (guest 9201):** git-sync auto-delivered the fix to all four stack files; RomM migrated +> (stop→move→verify→redeploy) from `/mnt/felhom-usb/felhom-data/appdata/romm` → +> `/mnt/felhom-usb/appdata/romm`, healthy + HTTP 200, no data loss, old namespace empty. **GATE PASSED.** +> - Next: Phase 2 (per-app recovery unit), Phase 3 (auto-enabled off-drive Tier 2 w/ rootfs-headroom +> guard), Phase 4 (FileBrowser scoping + deploy-UI DB-on-SSD note + monitoring sort). 
+> > **2026-06-12 — storage UX polish (v0.45.0), pairs with felhom-agent v0.24.0:** > - **Agent eject role-gate (Part A, felhom-agent v0.24.0):** `POST /disks/eject` now refuses to > unmount system/backup storage *at the agent* (fail-safe to protected on ambiguity) — the UI hiding diff --git a/REPORT.md b/REPORT.md index 741748d..c1fe0e4 100644 --- a/REPORT.md +++ b/REPORT.md @@ -1,43 +1,60 @@ -# REPORT — felhom-controller v0.51.0 +# REPORT — felhom-controller v0.52.0 (Phase 1 GATE: deploy-side double-nest fix) -Offsite-backup UI (felhom-pbs = real DR) + Model-A double-nest fix. Pairs with felhom-agent v0.28.0 -(whole-guest backup re-targeted to the offsite PBS tier). Live-deployed in guest 9201 on demo-felhom. +Completes the Model-A double-nest reconciliation deferred in v0.48.0 and half-fixed in v0.51.0. +v0.51.0 fixed the **backup-helper** side (`NamespaceRoot` provenance); this slice fixes the +**deploy/compose** side and locks the two together. Validated live on guest 9201 (demo-felhom). -## Backups page — whole-guest backup shown as real DR -- `backupTargetLabel` returns **"Biztonsági szerver – külön hardver (PBS)"** for a PBS-stored backup - (detected via `backupIsPBS` on the target id / archive volid), so the customer sees the backup - survives a host hardware failure. -- The app-data section's **"Távoli mentés"** card stops reading "nincs beállítva": new - `guestBackupView.Offsite` flag drives it to **"külön hardveren (PBS)"** with a ✓ when the whole-guest - backup landed on PBS. -- The restore-test "Visszaállítás ellenőrizve" trust signal is unchanged (already wired). -- Live: agent `/backup/status` reports `target_id=felhom-pbs`; `/restore-test/status` reports - `pass:true, verified:"boot+running", source_tier:"pbs"` → the page renders the PBS label, the offsite - card, and verified-restorable. +This is the GATE for the larger per-app-recovery-unit / Tier-2 slice — Phases 2–4 build on a proven +single-nested, agreeing path layout. 
Per plan: Phase 1 shipped + validated first. -## Model-A double-nest fix -- Under slice-10 Model A the host agent binds `/felhom-data` onto the guest mountpoint, so an - enrolled drive's in-guest mount IS the felhom-data namespace root (basename need not be `felhom-data`, - e.g. `/mnt/felhom-usb`). The backup path helpers were re-prepending `felhom-data`, producing - `.../felhom-data/felhom-data/...` on the host (confirmed live: `/mnt/felhom-usb/felhom-data/felhom-data/...`). -- `appbackup` path helpers now take a **namespace ROOT** (no internal `felhom-data` join) plus a new - `NamespaceRoot(drivePath, inGuestDrive)`. `backup.Manager.namespaceRoot`/`AppNamespaceRoot` resolve - provenance (`drivePath != systemDataPath` ⟺ a registered in-guest drive → namespace root as-is; the - SSD-only `systemDataPath` fallback appends `felhom-data`). -- All parallel constructions updated coherently so writes, deletion (`GetStackBackupData`, - `RemoveStack` backups-base + `ProtectedHDDPaths` — legacy double-nest dirs KEPT protected), the - wipe-warning secondary scan, and export all agree. `api.router` passes the namespace root across the - package boundary. Result: a drive-resident app's DB-dump lands single-nested at `/backups/...` - in-guest = `/felhom-data/backups/...` on the host. -- New `appbackup` test asserts no doubled `felhom-data` segment for an in-guest drive and exactly one - for the system fallback. Full `go build ./...` + tests green. +## Root cause (corrected from the spec's assumption) +The extra `felhom-data` segment was **not** built in controller `deploy.go` — `DeployStack` passes +`HDD_PATH` through verbatim, and the deploy storage dropdown only ever offers registered in-guest drives +(`GetSchedulableStoragePaths`), never the system-data path. The segment was hardcoded in the **app-catalog +compose templates** as `${HDD_PATH}/felhom-data/appdata/`. 
On a Model-A in-guest drive the guest +mount `/mnt/` already IS the host's `/felhom-data` namespace, so that segment double-nested +to `/felhom-data/felhom-data/appdata/` on disk — diverging from where the (v0.51.0) backup +helpers look: `AppDataDir(NamespaceRoot(HDD_PATH,true))` = `/mnt//appdata/`, single-nested. -## Decommission (P3) — NO controller change -- Permanent decommission is operator-signature-gated (never customer-confirmable), so it is wired - entirely agent-side (hub jobs-queue → signed-jobs runner). The controller deliberately exposes no - decommission UI. (felhom-agent v0.28.0.) +## Changes +- **Catalog (`app-catalog-felhom.eu`, the behavioral fix):** exhaustive `grep` confirmed exactly four HDD + templates carried the segment — `romm`, `nextcloud`, `immich`, `paperless-ngx`. All changed + `${HDD_PATH}/felhom-data/appdata/` → `${HDD_PATH}/appdata/` (volume mounts + header comments). +- **Controller (test-only, no runtime change):** `internal/stacks/hddpath_agreement_test.go` resolves a + compose's `${HDD_PATH}` bind mounts via the real deploy-side `ParseComposeHDDMounts` and asserts they + are byte-identical to the backup-side `AppDataDir(NamespaceRoot(HDD_PATH,true))` — no doubled + `felhom-data`, deploy↔backup locked so they can't drift again. `go test ./internal/stacks/...` and + `./internal/appbackup/...` pass; `go build ./...` clean. +- **No controller image rebuild.** The controller passes `HDD_PATH` through unchanged and already + resolved the single-nested path since v0.51.0, so no runtime change was needed. The controller in guest + 9201 stays `:0.51.0` (functionally current); v0.52.0 marks the catalog gate + test and rolls into the + next image build (Phase 2). The controller is golden-image/bootstrap-managed — not rebaked for a no-op. -## Live deploy -- `gitea.dooplex.hu/admin/felhom-controller:0.51.0` running + healthy in guest 9201 (bootstrap-launched - via `/etc/felhom-controller-image`; prior 0.50.0). 
Startup clean (catalog sync, health ok, - FileBrowser mounts synced). +## Live validation (guest 9201, demo-felhom, root@felhom-pve) +- **Only one live HDD app:** RomM (the others aren't drive-deployed). `HDD_PATH=/mnt/felhom-usb`; data was + at the doubled `/mnt/felhom-usb/felhom-data/appdata/romm`. +- **Catalog fix delivered by the real mechanism:** the controller's periodic git-sync logged + *"Sablonok frissítve — frissítve: immich, nextcloud, paperless-ngx, romm"* and updated all four + on-disk stack files to `${HDD_PATH}/appdata/` (single-nest) — confirmed in + `/opt/docker/stacks/*/docker-compose.yml`. +- **RomM migrated (stop → move → verify → redeploy, ordered, reversible `mv`, never delete-then-move):** + captured RomM's runtime env from the running containers into a guest-only temp file (secrets never left + the guest), `docker compose down` (named DB/redis volumes preserved), moved + `/mnt/felhom-usb/felhom-data/appdata/romm` → `/mnt/felhom-usb/appdata/romm`, verified file count + unchanged, then recreated with the fixed compose + captured env. +- **Result:** RomM binds the single-nest paths + (`/mnt/felhom-usb/appdata/romm/{library,resources}`), all three containers healthy, DB connected (the + captured creds worked), in-guest `HTTP/1.1 200 OK`, *"Application startup complete"*. Old namespace + `/mnt/felhom-usb/felhom-data/appdata/` confirmed empty; backups already single-nested at + `/mnt/felhom-usb/backups/primary`. Controller untouched & healthy throughout. + +## Gate outcome — PASSED +A drive-app lands single-nested AND the backup helpers resolve the identical path — proven live (not +REPORT-only): the deploy-resolver and the backup helper agree by test and by the live RomM binds, the +catalog fix propagated via real git-sync, and no doubled `felhom-data` remains. Cleared to start Phase 2. 
+ +## Not done (intentionally deferred to Phases 2–4) +Per-app recovery unit (`backup/`), Tier 2 off-drive copy (auto-enabled, durable-id target, rootfs +headroom guard), secret preserve-vs-regenerate classing, FileBrowser scoping, deploy-UI DB-on-SSD note, +monitoring storage sort/descriptions. The README's backup-paths section still describes the stale +restic/secondary layout — to be rewritten when Tier 2 is built. diff --git a/controller/README.md b/controller/README.md index 0e244fa..874ada7 100644 --- a/controller/README.md +++ b/controller/README.md @@ -328,8 +328,16 @@ The nightly backup has two phases that run sequentially. All paths are **per-dri └── media/ ← user files (not controller-managed) ``` -> **Note:** `HDD_PATH` env var in `app.yaml` is still the mount point (e.g., `/mnt/hdd_1`). The `felhom-data` segment is embedded in path helpers — not in `HDD_PATH`. -> Pre-v0.26.0 installations use `/appdata/` and `/backups/` directly (no `felhom-data/` namespace). +> **Note (Model A — corrected in v0.52.0):** `HDD_PATH` in `app.yaml` is the **in-guest mount point** +> (e.g., `/mnt/felhom-usb`). Under slice-10 Model A the host agent binds `/felhom-data` directly +> onto that mount, so the in-guest mount **already is** the `felhom-data` namespace root. Neither the +> compose templates nor the path helpers add a `felhom-data` segment for a drive-resident app: app data +> is `${HDD_PATH}/appdata/` and backups `${HDD_PATH}/backups/...`, **single-nested**. Only the +> SSD-only system-data fallback (a bare root, `inGuestDrive=false`) appends `felhom-data`. See +> `NamespaceRoot(drivePath, inGuestDrive)` in `internal/appbackup/paths.go`. +> Earlier catalog templates used `${HDD_PATH}/felhom-data/appdata/`, which double-nested to +> `.../felhom-data/felhom-data/...` on a Model-A drive; v0.52.0 dropped that segment in the catalog and +> locks deploy↔backup path agreement with `internal/stacks/hddpath_agreement_test.go`. 
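The provenance rule in the note above can be illustrated with a small Go sketch. The functions below are illustrative stand-ins, not the code in `internal/appbackup/paths.go`: `namespaceRoot` mirrors the described behaviour of `NamespaceRoot(drivePath, inGuestDrive)`, while the two-argument `appDataDir`, the `resolveTemplate` helper, and the `/srv/system-data` path are assumptions made for the example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const felhomDataDir = "felhom-data"

// namespaceRoot is a stand-in for NamespaceRoot(drivePath, inGuestDrive):
// a registered in-guest drive already IS the namespace root, while the
// SSD-only system-data fallback gets the felhom-data segment appended.
func namespaceRoot(drivePath string, inGuestDrive bool) string {
	if inGuestDrive {
		return path.Clean(drivePath)
	}
	return path.Join(drivePath, felhomDataDir)
}

// appDataDir (assumed signature) returns the app's data directory under a
// namespace root.
func appDataDir(root, app string) string {
	return path.Join(root, "appdata", app)
}

// resolveTemplate (hypothetical) expands ${HDD_PATH} in a compose bind-mount
// source, standing in for the deploy-side resolution.
func resolveTemplate(tmpl, hddPath string) string {
	return path.Clean(strings.ReplaceAll(tmpl, "${HDD_PATH}", hddPath))
}

func main() {
	hdd := "/mnt/felhom-usb"
	backupSide := appDataDir(namespaceRoot(hdd, true), "romm")

	oldTemplate := "${HDD_PATH}/felhom-data/appdata/romm" // pre-v0.52.0 catalog
	newTemplate := "${HDD_PATH}/appdata/romm"             // v0.52.0 catalog

	fmt.Println(resolveTemplate(oldTemplate, hdd) == backupSide) // false: deploy and backup diverge
	fmt.Println(resolveTemplate(newTemplate, hdd) == backupSide) // true: single-nested, paths agree

	// The system-data fallback is the only case that appends felhom-data.
	fmt.Println(namespaceRoot("/srv/system-data", false)) // /srv/system-data/felhom-data
}
```

Comparing the deploy-side and backup-side results for equality is the same idea the agreement test relies on: if a template regains the extra segment, the two strings stop matching.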
Path computation is centralized in `backup/paths.go` via the `FelhomDataDir = "felhom-data"` constant: - `PrimaryResticRepoPath(drivePath)` → `/felhom-data/backups/primary/restic/`