docs: Phase 1 gate — REPORT/CONTEXT/README for v0.52.0 double-nest fix

REPORT.md overwritten with the Phase-1 gate run (catalog template fix + agreement
test + live RomM migration on guest 9201, gate PASSED). CONTEXT.md dated entry.
README HDD_PATH/felhom-data convention note corrected for Model-A single-nesting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 09:38:42 +02:00
parent 2da23462c9
commit 5eb25c3861
3 changed files with 80 additions and 40 deletions
+15
@@ -13,6 +13,21 @@ Last updated: 2026-06-12 (storage UX polish)
 > is tracked in `CHANGELOG.md`, `controller/README.md`, and the auto-memory `MEMORY.md`. Live version:
 > **v0.45.0**.
 >
+> **2026-06-13 — v0.52.0 Phase 1 GATE: deploy-side double-nest fix (catalog) + path-agreement test:**
+> - The `felhom-data` double-nest lived in the **app-catalog compose templates**
+> (`${HDD_PATH}/felhom-data/appdata/<app>`), not in `deploy.go`. On a Model-A in-guest drive the mount
+> already IS the `felhom-data` namespace, so it double-nested on disk while the v0.51.0 backup helpers
+> resolved single-nested → divergence. Fixed all four HDD templates (romm, nextcloud, immich,
+> paperless-ngx) → `${HDD_PATH}/appdata/<app>`.
+> - New `internal/stacks/hddpath_agreement_test.go` locks deploy-resolver (`ParseComposeHDDMounts`) ==
+> backup helper (`AppDataDir(NamespaceRoot(.,true))`). No controller runtime change → no image rebuild
+> (deployed stays 0.51.0, functionally current; golden not rebaked for a no-op).
+> - **Live (guest 9201):** git-sync auto-delivered the fix to all four stack files; RomM migrated
+> (stop→move→verify→redeploy) from `/mnt/felhom-usb/felhom-data/appdata/romm` →
+> `/mnt/felhom-usb/appdata/romm`, healthy + HTTP 200, no data loss, old namespace empty. **GATE PASSED.**
+> - Next: Phase 2 (per-app recovery unit), Phase 3 (auto-enabled off-drive Tier 2 w/ rootfs-headroom
+> guard), Phase 4 (FileBrowser scoping + deploy-UI DB-on-SSD note + monitoring sort).
+>
 > **2026-06-12 — storage UX polish (v0.45.0), pairs with felhom-agent v0.24.0:**
 > - **Agent eject role-gate (Part A, felhom-agent v0.24.0):** `POST /disks/eject` now refuses to
 > unmount system/backup storage *at the agent* (fail-safe to protected on ambiguity) — the UI hiding
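To make the double-nest concrete, here is a minimal Go sketch of how an in-guest path lands on the host when the host binds `<drive>/felhom-data` onto the guest mount (Model A). The helper `hostPath` and the host mount `/media/usb0` are assumptions made for illustration; neither appears in the controller or the agent.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// hostPath maps an in-guest path to its location on the host, assuming the
// host binds <hostDrive>/felhom-data onto guestMount (Model A).
// Hypothetical helper, for illustration only.
func hostPath(hostDrive, guestMount, guestPath string) string {
	rel := strings.TrimPrefix(guestPath, guestMount)
	return path.Join(hostDrive, "felhom-data", rel)
}

func main() {
	const hostDrive = "/media/usb0" // assumed host-side mount of the drive
	const guestMount = "/mnt/felhom-usb"

	// Old template: ${HDD_PATH}/felhom-data/appdata/<app>
	fmt.Println(hostPath(hostDrive, guestMount, guestMount+"/felhom-data/appdata/romm"))
	// Fixed template: ${HDD_PATH}/appdata/<app>
	fmt.Println(hostPath(hostDrive, guestMount, guestMount+"/appdata/romm"))
}
```

The first line printed carries the `felhom-data` segment twice, the second only once, which is the divergence the entry above describes.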
+55 -38
@@ -1,43 +1,60 @@
-# REPORT — felhom-controller v0.51.0
-Offsite-backup UI (felhom-pbs = real DR) + Model-A double-nest fix. Pairs with felhom-agent v0.28.0
-(whole-guest backup re-targeted to the offsite PBS tier). Live-deployed in guest 9201 on demo-felhom.
-## Backups page — whole-guest backup shown as real DR
-- `backupTargetLabel` returns **"Biztonsági szerver külön hardver (PBS)"** for a PBS-stored backup
-(detected via `backupIsPBS` on the target id / archive volid), so the customer sees the backup
-survives a host hardware failure.
-- The app-data section's **"Távoli mentés"** card stops reading "nincs beállítva": new
-`guestBackupView.Offsite` flag drives it to **"külön hardveren (PBS)"** with a ✓ when the whole-guest
-backup landed on PBS.
-- The restore-test "Visszaállítás ellenőrizve" trust signal is unchanged (already wired).
-- Live: agent `/backup/status` reports `target_id=felhom-pbs`; `/restore-test/status` reports
-`pass:true, verified:"boot+running", source_tier:"pbs"` → the page renders the PBS label, the offsite
-card, and verified-restorable.
-## Model-A double-nest fix
-- Under slice-10 Model A the host agent binds `<drive>/felhom-data` onto the guest mountpoint, so an
-enrolled drive's in-guest mount IS the felhom-data namespace root (basename need not be `felhom-data`,
-e.g. `/mnt/felhom-usb`). The backup path helpers were re-prepending `felhom-data`, producing
-`.../felhom-data/felhom-data/...` on the host (confirmed live: `/mnt/felhom-usb/felhom-data/felhom-data/...`).
-- `appbackup` path helpers now take a **namespace ROOT** (no internal `felhom-data` join) plus a new
-`NamespaceRoot(drivePath, inGuestDrive)`. `backup.Manager.namespaceRoot`/`AppNamespaceRoot` resolve
-provenance (`drivePath != systemDataPath` ⟺ a registered in-guest drive → namespace root as-is; the
-SSD-only `systemDataPath` fallback appends `felhom-data`).
-- All parallel constructions updated coherently so writes, deletion (`GetStackBackupData`,
-`RemoveStack` backups-base + `ProtectedHDDPaths` — legacy double-nest dirs KEPT protected), the
-wipe-warning secondary scan, and export all agree. `api.router` passes the namespace root across the
-package boundary. Result: a drive-resident app's DB-dump lands single-nested at `<drive>/backups/...`
-in-guest = `<drive>/felhom-data/backups/...` on the host.
-- New `appbackup` test asserts no doubled `felhom-data` segment for an in-guest drive and exactly one
-for the system fallback. Full `go build ./...` + tests green.
-## Decommission (P3) — NO controller change
-- Permanent decommission is operator-signature-gated (never customer-confirmable), so it is wired
-entirely agent-side (hub jobs-queue → signed-jobs runner). The controller deliberately exposes no
-decommission UI. (felhom-agent v0.28.0.)
-## Live deploy
-- `gitea.dooplex.hu/admin/felhom-controller:0.51.0` running + healthy in guest 9201 (bootstrap-launched
-via `/etc/felhom-controller-image`; prior 0.50.0). Startup clean (catalog sync, health ok,
-FileBrowser mounts synced).
+# REPORT — felhom-controller v0.52.0 (Phase 1 GATE: deploy-side double-nest fix)
+Completes the Model-A double-nest reconciliation deferred in v0.48.0 and half-fixed in v0.51.0.
+v0.51.0 fixed the **backup-helper** side (`NamespaceRoot` provenance); this slice fixes the
+**deploy/compose** side and locks the two together. Validated live on guest 9201 (demo-felhom).
+This is the GATE for the larger per-app-recovery-unit / Tier-2 slice — Phases 2–4 build on a proven
+single-nested, agreeing path layout. Per plan: Phase 1 shipped + validated first.
+## Root cause (corrected from the spec's assumption)
+The extra `felhom-data` segment was **not** built in controller `deploy.go`; `DeployStack` passes
+`HDD_PATH` through verbatim, and the deploy storage dropdown only ever offers registered in-guest drives
+(`GetSchedulableStoragePaths`), never the system-data path. The segment was hardcoded in the **app-catalog
+compose templates** as `${HDD_PATH}/felhom-data/appdata/<app>`. On a Model-A in-guest drive the guest
+mount `/mnt/<drive>` already IS the host's `<drive>/felhom-data` namespace, so that segment double-nested
+to `<drive>/felhom-data/felhom-data/appdata/<app>` on disk — diverging from where the (v0.51.0) backup
+helpers look: `AppDataDir(NamespaceRoot(HDD_PATH,true))` = `/mnt/<drive>/appdata/<app>`, single-nested.
+## Changes
+- **Catalog (`app-catalog-felhom.eu`, the behavioral fix):** exhaustive `grep` confirmed exactly four HDD
+templates carried the segment — `romm`, `nextcloud`, `immich`, `paperless-ngx`. All changed
+`${HDD_PATH}/felhom-data/appdata/<app>` → `${HDD_PATH}/appdata/<app>` (volume mounts + header comments).
+- **Controller (test-only, no runtime change):** `internal/stacks/hddpath_agreement_test.go` resolves a
+compose's `${HDD_PATH}` bind mounts via the real deploy-side `ParseComposeHDDMounts` and asserts they
+are byte-identical to the backup-side `AppDataDir(NamespaceRoot(HDD_PATH,true))` — no doubled
+`felhom-data`, deploy↔backup locked so they can't drift again. `go test ./internal/stacks/...` and
+`./internal/appbackup/...` pass; `go build ./...` clean.
+- **No controller image rebuild.** The controller passes `HDD_PATH` through unchanged and already
+resolved the single-nested path since v0.51.0, so no runtime change was needed. The controller in guest
+9201 stays `:0.51.0` (functionally current); v0.52.0 marks the catalog gate + test and rolls into the
+next image build (Phase 2). The controller is golden-image/bootstrap-managed — not rebaked for a no-op.
+## Live validation (guest 9201, demo-felhom, root@felhom-pve)
+- **Only one live HDD app:** RomM (the others aren't drive-deployed). `HDD_PATH=/mnt/felhom-usb`; data was
+at the doubled `/mnt/felhom-usb/felhom-data/appdata/romm`.
+- **Catalog fix delivered by the real mechanism:** the controller's periodic git-sync logged
+*"Sablonok frissítve — frissítve: immich, nextcloud, paperless-ngx, romm"* and updated all four
+on-disk stack files to `${HDD_PATH}/appdata/<app>` (single-nest) — confirmed in
+`/opt/docker/stacks/*/docker-compose.yml`.
+- **RomM migrated (stop → move → verify → redeploy, ordered, reversible `mv`, never delete-then-move):**
+captured RomM's runtime env from the running containers into a guest-only temp file (secrets never left
+the guest), `docker compose down` (named DB/redis volumes preserved), moved
+`/mnt/felhom-usb/felhom-data/appdata/romm` → `/mnt/felhom-usb/appdata/romm`, verified file count
+unchanged, then recreated with the fixed compose + captured env.
+- **Result:** RomM binds the single-nest paths
+(`/mnt/felhom-usb/appdata/romm/{library,resources}`), all three containers healthy, DB connected (the
+captured creds worked), in-guest `HTTP/1.1 200 OK`, *"Application startup complete"*. Old namespace
+`/mnt/felhom-usb/felhom-data/appdata/` confirmed empty; backups already single-nested at
+`/mnt/felhom-usb/backups/primary`. Controller untouched & healthy throughout.
+## Gate outcome — PASSED
+A drive-app lands single-nested AND the backup helpers resolve the identical path — proven live (not
+REPORT-only): the deploy-resolver and the backup helper agree by test and by the live RomM binds, the
+catalog fix propagated via real git-sync, and no doubled `felhom-data` remains. Cleared to start Phase 2.
+## Not done (intentionally deferred to Phases 2–4)
+Per-app recovery unit (`backup/<app>`), Tier 2 off-drive copy (auto-enabled, durable-id target, rootfs
+headroom guard), secret preserve-vs-regenerate classing, FileBrowser scoping, deploy-UI DB-on-SSD note,
+monitoring storage sort/descriptions. The README's backup-paths section still describes the stale
+restic/secondary layout — to be rewritten when Tier 2 is built.
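The deploy↔backup agreement that the new test locks can be sketched roughly as follows. This is not the content of `hddpath_agreement_test.go`: `deployMountSource`, `backupAppDataDir`, and `agrees` are hypothetical stand-ins for `ParseComposeHDDMounts` and `AppDataDir(NamespaceRoot(HDD_PATH,true))`, whose real signatures are not shown in this commit, and the container-side path `/romm/library` is assumed. The sketch also accepts a mount that sits inside the app directory, which is looser than the byte-identical comparison the report describes.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strings"
)

// deployMountSource expands ${HDD_PATH} in the host side of a compose
// volume entry ("source:target[:mode]") and returns the cleaned source path.
// Hypothetical stand-in for the deploy-side resolver.
func deployMountSource(volume, hddPath string) string {
	source, _, _ := strings.Cut(volume, ":")
	expanded := os.Expand(source, func(name string) string {
		if name == "HDD_PATH" {
			return hddPath
		}
		return "" // other variables are out of scope for this sketch
	})
	return path.Clean(expanded)
}

// backupAppDataDir is where the backup side expects an app's data on an
// in-guest drive: directly under the mount, with no felhom-data segment.
func backupAppDataDir(namespaceRoot, app string) string {
	return path.Join(namespaceRoot, "appdata", app)
}

// agrees reports whether a compose mount resolves to the backup-side app
// directory or to a path inside it.
func agrees(volume, hddPath, app string) bool {
	src := deployMountSource(volume, hddPath)
	dir := backupAppDataDir(hddPath, app)
	return src == dir || strings.HasPrefix(src, dir+"/")
}

func main() {
	const hddPath = "/mnt/felhom-usb"
	fixed := "${HDD_PATH}/appdata/romm/library:/romm/library"
	old := "${HDD_PATH}/felhom-data/appdata/romm/library:/romm/library"
	fmt.Println(agrees(fixed, hddPath, "romm")) // true
	fmt.Println(agrees(old, hddPath, "romm"))   // false
}
```

Running both template shapes through one check is what makes a regression visible: the old template resolves outside the directory the backup side scans, so the comparison fails instead of silently diverging on disk.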
+10 -2
@@ -328,8 +328,16 @@ The nightly backup has two phases that run sequentially. All paths are **per-dri
 └── media/ ← user files (not controller-managed)
 ```
-> **Note:** `HDD_PATH` env var in `app.yaml` is still the mount point (e.g., `/mnt/hdd_1`). The `felhom-data` segment is embedded in path helpers — not in `HDD_PATH`.
-> Pre-v0.26.0 installations use `<drive>/appdata/` and `<drive>/backups/` directly (no `felhom-data/` namespace).
+> **Note (Model A — corrected in v0.52.0):** `HDD_PATH` in `app.yaml` is the **in-guest mount point**
+> (e.g., `/mnt/felhom-usb`). Under slice-10 Model A the host agent binds `<drive>/felhom-data` directly
+> onto that mount, so the in-guest mount **already is** the `felhom-data` namespace root. Neither the
+> compose templates nor the path helpers add a `felhom-data` segment for a drive-resident app: app data
+> is `${HDD_PATH}/appdata/<app>` and backups `${HDD_PATH}/backups/...`, **single-nested**. Only the
+> SSD-only system-data fallback (a bare root, `inGuestDrive=false`) appends `felhom-data`. See
+> `NamespaceRoot(drivePath, inGuestDrive)` in `internal/appbackup/paths.go`.
+> Earlier catalog templates used `${HDD_PATH}/felhom-data/appdata/<app>`, which double-nested to
+> `.../felhom-data/felhom-data/...` on a Model-A drive; v0.52.0 dropped that segment in the catalog and
+> locks deploy↔backup path agreement with `internal/stacks/hddpath_agreement_test.go`.
 Path computation is centralized in `backup/paths.go` via the `FelhomDataDir = "felhom-data"` constant:
 - `PrimaryResticRepoPath(drivePath)` → `<drive>/felhom-data/backups/primary/restic/`
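The provenance rule in the corrected note can be sketched in a few lines of Go. The `NamespaceRoot(drivePath, inGuestDrive)` signature is taken from the note; the function bodies below are an assumption reconstructed from the described behavior, not the code in `internal/appbackup/paths.go`, and `/opt/felhom` is a hypothetical system-data path.

```go
package main

import (
	"fmt"
	"path"
)

// felhomDataDir mirrors the FelhomDataDir constant named in the README.
const felhomDataDir = "felhom-data"

// namespaceRoot sketches the described rule: a registered in-guest drive is
// already the felhom-data namespace, so it is returned as-is; only the
// SSD-only system-data fallback gets the segment appended.
func namespaceRoot(drivePath string, inGuestDrive bool) string {
	if inGuestDrive {
		return path.Clean(drivePath)
	}
	return path.Join(drivePath, felhomDataDir)
}

// appDataDir joins an app name onto a namespace root (assumed layout).
func appDataDir(root, app string) string {
	return path.Join(root, "appdata", app)
}

func main() {
	// Model-A in-guest drive: single-nested, no felhom-data segment.
	fmt.Println(appDataDir(namespaceRoot("/mnt/felhom-usb", true), "romm"))
	// System-data fallback (hypothetical path): exactly one segment.
	fmt.Println(appDataDir(namespaceRoot("/opt/felhom", false), "romm"))
}
```

Keeping the `felhom-data` decision in one function means callers pass a root and never join the segment themselves, which is the property that prevents a second double-nest.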