docs: Phase 1 gate — REPORT/CONTEXT/README for v0.52.0 double-nest fix

REPORT.md overwritten with the Phase-1 gate run (catalog template fix + agreement
test + live RomM migration on guest 9201, gate PASSED). CONTEXT.md dated entry.
README HDD_PATH/felhom-data convention note corrected for Model-A single-nesting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 09:38:42 +02:00
parent 2da23462c9
commit 5eb25c3861
3 changed files with 80 additions and 40 deletions
+15
@@ -13,6 +13,21 @@ Last updated: 2026-06-12 (storage UX polish)
> is tracked in `CHANGELOG.md`, `controller/README.md`, and the auto-memory `MEMORY.md`. Live version:
> **v0.45.0**.
>
> **2026-06-13 — v0.52.0 Phase 1 GATE: deploy-side double-nest fix (catalog) + path-agreement test:**
> - The `felhom-data` double-nest lived in the **app-catalog compose templates**
> (`${HDD_PATH}/felhom-data/appdata/<app>`), not in `deploy.go`. On a Model-A in-guest drive the mount
> already IS the `felhom-data` namespace, so it double-nested on disk while the v0.51.0 backup helpers
> resolved single-nested → divergence. Fixed all four HDD templates (romm, nextcloud, immich,
> paperless-ngx) → `${HDD_PATH}/appdata/<app>`.
> - New `internal/stacks/hddpath_agreement_test.go` locks deploy-resolver (`ParseComposeHDDMounts`) ==
> backup helper (`AppDataDir(NamespaceRoot(.,true))`). No controller runtime change → no image rebuild
> (deployed stays 0.51.0, functionally current; golden not rebaked for a no-op).
> - **Live (guest 9201):** git-sync auto-delivered the fix to all four stack files; RomM migrated
> (stop→move→verify→redeploy) from `/mnt/felhom-usb/felhom-data/appdata/romm` →
> `/mnt/felhom-usb/appdata/romm`, healthy + HTTP 200, no data loss, old namespace empty. **GATE PASSED.**
> - Next: Phase 2 (per-app recovery unit), Phase 3 (auto-enabled off-drive Tier 2 w/ rootfs-headroom
> guard), Phase 4 (FileBrowser scoping + deploy-UI DB-on-SSD note + monitoring sort).
>
> **2026-06-12 — storage UX polish (v0.45.0), pairs with felhom-agent v0.24.0:**
> - **Agent eject role-gate (Part A, felhom-agent v0.24.0):** `POST /disks/eject` now refuses to
> unmount system/backup storage *at the agent* (fail-safe to protected on ambiguity) — the UI hiding
+55 -38
@@ -1,43 +1,60 @@
# REPORT — felhom-controller v0.51.0
# REPORT — felhom-controller v0.52.0 (Phase 1 GATE: deploy-side double-nest fix)
Offsite-backup UI (felhom-pbs = real DR) + Model-A double-nest fix. Pairs with felhom-agent v0.28.0
(whole-guest backup re-targeted to the offsite PBS tier). Live-deployed in guest 9201 on demo-felhom.
Completes the Model-A double-nest reconciliation deferred in v0.48.0 and half-fixed in v0.51.0.
v0.51.0 fixed the **backup-helper** side (`NamespaceRoot` provenance); this slice fixes the
**deploy/compose** side and locks the two together. Validated live on guest 9201 (demo-felhom).
## Backups page — whole-guest backup shown as real DR
- `backupTargetLabel` returns **"Biztonsági szerver külön hardver (PBS)"** for a PBS-stored backup
(detected via `backupIsPBS` on the target id / archive volid), so the customer sees the backup
survives a host hardware failure.
- The app-data section's **"Távoli mentés"** card stops reading "nincs beállítva": new
`guestBackupView.Offsite` flag drives it to **"külön hardveren (PBS)"** with a ✓ when the whole-guest
backup landed on PBS.
- The restore-test "Visszaállítás ellenőrizve" trust signal is unchanged (already wired).
- Live: agent `/backup/status` reports `target_id=felhom-pbs`; `/restore-test/status` reports
`pass:true, verified:"boot+running", source_tier:"pbs"` → the page renders the PBS label, the offsite
card, and verified-restorable.
This is the GATE for the larger per-app-recovery-unit / Tier-2 slice; Phases 2–4 build on a proven
single-nested, agreeing path layout. Per plan: Phase 1 shipped + validated first.
## Model-A double-nest fix
- Under slice-10 Model A the host agent binds `<drive>/felhom-data` onto the guest mountpoint, so an
enrolled drive's in-guest mount IS the felhom-data namespace root (basename need not be `felhom-data`,
e.g. `/mnt/felhom-usb`). The backup path helpers were re-prepending `felhom-data`, producing
`.../felhom-data/felhom-data/...` on the host (confirmed live: `/mnt/felhom-usb/felhom-data/felhom-data/...`).
- `appbackup` path helpers now take a **namespace ROOT** (no internal `felhom-data` join) plus a new
`NamespaceRoot(drivePath, inGuestDrive)`. `backup.Manager.namespaceRoot`/`AppNamespaceRoot` resolve
provenance (`drivePath != systemDataPath` ⟺ a registered in-guest drive → namespace root as-is; the
SSD-only `systemDataPath` fallback appends `felhom-data`).
- All parallel constructions updated coherently so writes, deletion (`GetStackBackupData`,
`RemoveStack` backups-base + `ProtectedHDDPaths` — legacy double-nest dirs KEPT protected), the
wipe-warning secondary scan, and export all agree. `api.router` passes the namespace root across the
package boundary. Result: a drive-resident app's DB-dump lands single-nested at `<drive>/backups/...`
in-guest = `<drive>/felhom-data/backups/...` on the host.
- New `appbackup` test asserts no doubled `felhom-data` segment for an in-guest drive and exactly one
for the system fallback. Full `go build ./...` + tests green.
## Root cause (corrected from the spec's assumption)
The extra `felhom-data` segment was **not** built in controller `deploy.go`: `DeployStack` passes
`HDD_PATH` through verbatim, and the deploy storage dropdown only ever offers registered in-guest drives
(`GetSchedulableStoragePaths`), never the system-data path. The segment was hardcoded in the **app-catalog
compose templates** as `${HDD_PATH}/felhom-data/appdata/<app>`. On a Model-A in-guest drive the guest
mount `/mnt/<drive>` already IS the host's `<drive>/felhom-data` namespace, so that segment double-nested
to `<drive>/felhom-data/felhom-data/appdata/<app>` on disk — diverging from where the (v0.51.0) backup
helpers look: `AppDataDir(NamespaceRoot(HDD_PATH,true))` = `/mnt/<drive>/appdata/<app>`, single-nested.
## Decommission (P3) — NO controller change
- Permanent decommission is operator-signature-gated (never customer-confirmable), so it is wired
entirely agent-side (hub jobs-queue → signed-jobs runner). The controller deliberately exposes no
decommission UI. (felhom-agent v0.28.0.)
## Changes
- **Catalog (`app-catalog-felhom.eu`, the behavioral fix):** exhaustive `grep` confirmed exactly four HDD
templates carried the segment — `romm`, `nextcloud`, `immich`, `paperless-ngx`. All changed
`${HDD_PATH}/felhom-data/appdata/<app>` → `${HDD_PATH}/appdata/<app>` (volume mounts + header comments).
- **Controller (test-only, no runtime change):** `internal/stacks/hddpath_agreement_test.go` resolves a
compose's `${HDD_PATH}` bind mounts via the real deploy-side `ParseComposeHDDMounts` and asserts they
are byte-identical to the backup-side `AppDataDir(NamespaceRoot(HDD_PATH,true))` — no doubled
`felhom-data`, deploy↔backup locked so they can't drift again. `go test ./internal/stacks/...` and
`./internal/appbackup/...` pass; `go build ./...` clean.
- **No controller image rebuild.** The controller passes `HDD_PATH` through unchanged and already
resolved the single-nested path since v0.51.0, so no runtime change was needed. The controller in guest
9201 stays `:0.51.0` (functionally current); v0.52.0 marks the catalog gate + test and rolls into the
next image build (Phase 2). The controller is golden-image/bootstrap-managed — not rebaked for a no-op.
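The shape of the agreement check described under **Controller** can be sketched as follows. This is an illustration under stated assumptions, not the contents of `hddpath_agreement_test.go`: `resolveHDDMounts` is a hypothetical stand-in for `ParseComposeHDDMounts`, the container-side paths and the `romm_db` volume name are invented, and the check accepts mounts at or below the app directory rather than asserting strict equality.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveHDDMounts is a hypothetical deploy-side resolver. It expands
// ${HDD_PATH} in short-syntax volume entries ("host:container[:mode]") and
// returns the host side of every entry that referenced the variable.
func resolveHDDMounts(volumes []string, hddPath string) []string {
	var hosts []string
	for _, v := range volumes {
		if !strings.Contains(v, "${HDD_PATH}") {
			continue // named volumes and unrelated binds are skipped
		}
		host := strings.SplitN(v, ":", 2)[0]
		hosts = append(hosts, path.Clean(strings.ReplaceAll(host, "${HDD_PATH}", hddPath)))
	}
	return hosts
}

// disagreements returns every mount that is neither the backup-side app
// directory nor a path below it.
func disagreements(mounts []string, appDir string) []string {
	var bad []string
	for _, m := range mounts {
		if m != appDir && !strings.HasPrefix(m, appDir+"/") {
			bad = append(bad, m)
		}
	}
	return bad
}

func main() {
	hddPath := "/mnt/felhom-usb"
	// Backup-side expectation for an in-guest drive: no felhom-data segment.
	appDir := path.Join(hddPath, "appdata", "romm")

	fixed := []string{
		"${HDD_PATH}/appdata/romm/library:/romm/library",
		"${HDD_PATH}/appdata/romm/resources:/romm/resources",
		"romm_db:/var/lib/mysql",
	}
	legacy := []string{"${HDD_PATH}/felhom-data/appdata/romm/library:/romm/library"}

	fmt.Println("fixed:", disagreements(resolveHDDMounts(fixed, hddPath), appDir))
	fmt.Println("legacy:", disagreements(resolveHDDMounts(legacy, hddPath), appDir))
}
```

A real test would load the catalog's compose files instead of inline strings; the point of the sketch is only that both sides are computed independently and then compared.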
## Live deploy
- `gitea.dooplex.hu/admin/felhom-controller:0.51.0` running + healthy in guest 9201 (bootstrap-launched
via `/etc/felhom-controller-image`; prior 0.50.0). Startup clean (catalog sync, health ok,
FileBrowser mounts synced).
## Live validation (guest 9201, demo-felhom, root@felhom-pve)
- **Only one live HDD app:** RomM (the others aren't drive-deployed). `HDD_PATH=/mnt/felhom-usb`; data was
at the doubled `/mnt/felhom-usb/felhom-data/appdata/romm`.
- **Catalog fix delivered by the real mechanism:** the controller's periodic git-sync logged
*"Sablonok frissítve — frissítve: immich, nextcloud, paperless-ngx, romm"* and updated all four
on-disk stack files to `${HDD_PATH}/appdata/<app>` (single-nest) — confirmed in
`/opt/docker/stacks/*/docker-compose.yml`.
- **RomM migrated (stop → move → verify → redeploy, ordered, reversible `mv`, never delete-then-move):**
captured RomM's runtime env from the running containers into a guest-only temp file (secrets never left
the guest), `docker compose down` (named DB/redis volumes preserved), moved
`/mnt/felhom-usb/felhom-data/appdata/romm` → `/mnt/felhom-usb/appdata/romm`, verified file count
unchanged, then recreated with the fixed compose + captured env.
- **Result:** RomM binds the single-nest paths
(`/mnt/felhom-usb/appdata/romm/{library,resources}`), all three containers healthy, DB connected (the
captured creds worked), in-guest `HTTP/1.1 200 OK`, *"Application startup complete"*. Old namespace
`/mnt/felhom-usb/felhom-data/appdata/` confirmed empty; backups already single-nested at
`/mnt/felhom-usb/backups/primary`. Controller untouched & healthy throughout.
## Gate outcome — PASSED
A drive-app lands single-nested AND the backup helpers resolve the identical path — proven live (not
REPORT-only): the deploy-resolver and the backup helper agree by test and by the live RomM binds, the
catalog fix propagated via real git-sync, and no doubled `felhom-data` remains. Cleared to start Phase 2.
## Not done (intentionally deferred to Phases 2–4)
Per-app recovery unit (`backup/<app>`), Tier 2 off-drive copy (auto-enabled, durable-id target, rootfs
headroom guard), secret preserve-vs-regenerate classing, FileBrowser scoping, deploy-UI DB-on-SSD note,
monitoring storage sort/descriptions. The README's backup-paths section still describes the stale
restic/secondary layout — to be rewritten when Tier 2 is built.
+10 -2
@@ -328,8 +328,16 @@ The nightly backup has two phases that run sequentially. All paths are **per-dri
└── media/ ← user files (not controller-managed)
```
> **Note:** `HDD_PATH` env var in `app.yaml` is still the mount point (e.g., `/mnt/hdd_1`). The `felhom-data` segment is embedded in path helpers — not in `HDD_PATH`.
> Pre-v0.26.0 installations use `<drive>/appdata/` and `<drive>/backups/` directly (no `felhom-data/` namespace).
> **Note (Model A — corrected in v0.52.0):** `HDD_PATH` in `app.yaml` is the **in-guest mount point**
> (e.g., `/mnt/felhom-usb`). Under slice-10 Model A the host agent binds `<drive>/felhom-data` directly
> onto that mount, so the in-guest mount **already is** the `felhom-data` namespace root. Neither the
> compose templates nor the path helpers add a `felhom-data` segment for a drive-resident app: app data
> is `${HDD_PATH}/appdata/<app>` and backups `${HDD_PATH}/backups/...`, **single-nested**. Only the
> SSD-only system-data fallback (a bare root, `inGuestDrive=false`) appends `felhom-data`. See
> `NamespaceRoot(drivePath, inGuestDrive)` in `internal/appbackup/paths.go`.
> Earlier catalog templates used `${HDD_PATH}/felhom-data/appdata/<app>`, which double-nested to
> `.../felhom-data/felhom-data/...` on a Model-A drive; v0.52.0 dropped that segment in the catalog and
> locks deploy↔backup path agreement with `internal/stacks/hddpath_agreement_test.go`.
Path computation is centralized in `backup/paths.go` via the `FelhomDataDir = "felhom-data"` constant:
- `PrimaryResticRepoPath(drivePath)` → `<drive>/felhom-data/backups/primary/restic/`