docs: Phase 1 gate — REPORT/CONTEXT/README for v0.52.0 double-nest fix
REPORT.md overwritten with the Phase-1 gate run (catalog template fix + agreement test + live RomM migration on guest 9201, gate PASSED). CONTEXT.md: dated entry added. README: HDD_PATH/felhom-data convention note corrected for Model-A single-nesting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -1,43 +1,60 @@
# REPORT — felhom-controller v0.51.0
# REPORT — felhom-controller v0.52.0 (Phase 1 GATE: deploy-side double-nest fix)
Offsite-backup UI (felhom-pbs = real DR) + Model-A double-nest fix. Pairs with felhom-agent v0.28.0
(whole-guest backup re-targeted to the offsite PBS tier). Live-deployed in guest 9201 on demo-felhom.
Completes the Model-A double-nest reconciliation deferred in v0.48.0 and half-fixed in v0.51.0.
v0.51.0 fixed the **backup-helper** side (`NamespaceRoot` provenance); this slice fixes the
**deploy/compose** side and locks the two together. Validated live on guest 9201 (demo-felhom).
## Backups page — whole-guest backup shown as real DR
- `backupTargetLabel` returns **"Biztonsági szerver – külön hardver (PBS)"** for a PBS-stored backup
(detected via `backupIsPBS` on the target id / archive volid), so the customer sees the backup
survives a host hardware failure.
- The app-data section's **"Távoli mentés"** card stops reading "nincs beállítva": new
`guestBackupView.Offsite` flag drives it to **"külön hardveren (PBS)"** with a ✓ when the whole-guest
backup landed on PBS.
- The restore-test "Visszaállítás ellenőrizve" trust signal is unchanged (already wired).
- Live: agent `/backup/status` reports `target_id=felhom-pbs`; `/restore-test/status` reports
`pass:true, verified:"boot+running", source_tier:"pbs"` → the page renders the PBS label, the offsite
card, and verified-restorable.
This is the GATE for the larger per-app-recovery-unit / Tier-2 slice — Phases 2–4 build on a proven
single-nested, agreeing path layout. Per plan: Phase 1 shipped + validated first.
## Model-A double-nest fix
- Under slice-10 Model A the host agent binds `<drive>/felhom-data` onto the guest mountpoint, so an
enrolled drive's in-guest mount IS the felhom-data namespace root (basename need not be `felhom-data`,
e.g. `/mnt/felhom-usb`). The backup path helpers were re-prepending `felhom-data`, producing
`.../felhom-data/felhom-data/...` on the host (confirmed live: `/mnt/felhom-usb/felhom-data/felhom-data/...`).
- `appbackup` path helpers now take a **namespace ROOT** (no internal `felhom-data` join) plus a new
`NamespaceRoot(drivePath, inGuestDrive)`. `backup.Manager.namespaceRoot`/`AppNamespaceRoot` resolve
provenance (`drivePath != systemDataPath` ⟺ a registered in-guest drive → namespace root as-is; the
SSD-only `systemDataPath` fallback appends `felhom-data`).
- All parallel constructions updated coherently so writes, deletion (`GetStackBackupData`,
`RemoveStack` backups-base + `ProtectedHDDPaths` — legacy double-nest dirs KEPT protected), the
wipe-warning secondary scan, and export all agree. `api.router` passes the namespace root across the
package boundary. Result: a drive-resident app's DB-dump lands single-nested at `<drive>/backups/...`
in-guest = `<drive>/felhom-data/backups/...` on the host.
- New `appbackup` test asserts no doubled `felhom-data` segment for an in-guest drive and exactly one
for the system fallback. Full `go build ./...` + tests green.
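
The provenance rule above can be sketched in a few lines of Go. This is a minimal illustration, not the controller's code: `namespaceRoot` and `backupsDir` are hypothetical stand-ins for the `appbackup` helpers, and `/opt/felhom` is an assumed system-data path used only for the demo.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// namespaceRoot is a hypothetical stand-in for appbackup.NamespaceRoot.
// An enrolled in-guest drive mount already IS the felhom-data namespace,
// so it is returned as-is; only the SSD-only system-data fallback gets a
// felhom-data segment appended.
func namespaceRoot(drivePath string, inGuestDrive bool) string {
	if inGuestDrive {
		return filepath.Clean(drivePath)
	}
	return filepath.Join(drivePath, "felhom-data")
}

// backupsDir joins below the namespace root and never adds felhom-data itself.
func backupsDir(root string) string {
	return filepath.Join(root, "backups")
}

func main() {
	// In-guest drive: single-nested, no felhom-data segment in the guest path.
	fmt.Println(backupsDir(namespaceRoot("/mnt/felhom-usb", true)))
	// System fallback (assumed path): exactly one felhom-data segment.
	fmt.Println(backupsDir(namespaceRoot("/opt/felhom", false)))
}
```

The point of the split is that every path helper receives an already-resolved root, so no helper can re-prepend `felhom-data` on its own.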
## Root cause (corrected from the spec's assumption)
The extra `felhom-data` segment was **not** built in controller `deploy.go` — `DeployStack` passes
`HDD_PATH` through verbatim, and the deploy storage dropdown only ever offers registered in-guest drives
(`GetSchedulableStoragePaths`), never the system-data path. The segment was hardcoded in the **app-catalog
compose templates** as `${HDD_PATH}/felhom-data/appdata/<app>`. On a Model-A in-guest drive the guest
mount `/mnt/<drive>` already IS the host's `<drive>/felhom-data` namespace, so that segment double-nested
to `<drive>/felhom-data/felhom-data/appdata/<app>` on disk — diverging from where the (v0.51.0) backup
helpers look: `AppDataDir(NamespaceRoot(HDD_PATH,true))` = `/mnt/<drive>/appdata/<app>`, single-nested.
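
To make the double-nest concrete, the sketch below expands the old and the fixed template for RomM and maps each in-guest result to the host, assuming the Model-A bind `<drive>/felhom-data` → guest mount. `expandHDD` and `hostPath` are illustrative helpers, not controller functions; the host drive path `/mnt/felhom-usb` is taken from the host path quoted in this report.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// expandHDD substitutes ${HDD_PATH} in a catalog template path.
// Any other variable expands to the empty string in this sketch.
func expandHDD(template, hddPath string) string {
	return os.Expand(template, func(name string) string {
		if name == "HDD_PATH" {
			return hddPath
		}
		return ""
	})
}

// hostPath maps an in-guest path to the host, assuming the host agent
// binds <hostDrive>/felhom-data onto guestMount (Model A).
func hostPath(guestPath, guestMount, hostDrive string) string {
	rel := strings.TrimPrefix(guestPath, guestMount)
	return filepath.Join(hostDrive, "felhom-data", rel)
}

func main() {
	const mount = "/mnt/felhom-usb" // guest mount and host drive share this name here
	for _, tmpl := range []string{
		"${HDD_PATH}/felhom-data/appdata/romm", // old template: double-nests on the host
		"${HDD_PATH}/appdata/romm",             // fixed template: single-nested
	} {
		guest := expandHDD(tmpl, mount)
		fmt.Printf("guest=%s host=%s\n", guest, hostPath(guest, mount, mount))
	}
}
```

With the old template the host path comes out as `/mnt/felhom-usb/felhom-data/felhom-data/appdata/romm`; with the fixed one it is `/mnt/felhom-usb/felhom-data/appdata/romm`.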
## Decommission (P3) — NO controller change
- Permanent decommission is operator-signature-gated (never customer-confirmable), so it is wired
entirely agent-side (hub jobs-queue → signed-jobs runner). The controller deliberately exposes no
decommission UI. (felhom-agent v0.28.0.)
## Changes
- **Catalog (`app-catalog-felhom.eu`, the behavioral fix):** exhaustive `grep` confirmed exactly four HDD
templates carried the segment — `romm`, `nextcloud`, `immich`, `paperless-ngx`. All changed
`${HDD_PATH}/felhom-data/appdata/<app>` → `${HDD_PATH}/appdata/<app>` (volume mounts + header comments).
- **Controller (test-only, no runtime change):** `internal/stacks/hddpath_agreement_test.go` resolves a
compose's `${HDD_PATH}` bind mounts via the real deploy-side `ParseComposeHDDMounts` and asserts they
are byte-identical to the backup-side `AppDataDir(NamespaceRoot(HDD_PATH,true))` — no doubled
`felhom-data`, deploy↔backup locked so they can't drift again. `go test ./internal/stacks/...` and
`./internal/appbackup/...` pass; `go build ./...` clean.
- **No controller image rebuild.** The controller passes `HDD_PATH` through unchanged and already
resolved the single-nested path since v0.51.0, so no runtime change was needed. The controller in guest
9201 stays `:0.51.0` (functionally current); v0.52.0 marks the catalog gate + test and rolls into the
next image build (Phase 2). The controller is golden-image/bootstrap-managed — not rebaked for a no-op.
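
The idea behind the agreement test can be sketched as follows. This is an assumption-laden simplification: `hddMounts` is a hypothetical line-based stand-in for the real `ParseComposeHDDMounts` (whose signature is not shown in this report), `appDataDir` stands in for the backup-side helper, and the check accepts mounts at or under the app data directory, since RomM binds subdirectories of it. Only unquoted short-syntax volume entries are recognized.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// bindSource captures the source side of a short-syntax volume entry
// whose source starts with ${HDD_PATH}, e.g. "- ${HDD_PATH}/a/b:/c".
var bindSource = regexp.MustCompile(`^\s*-\s*(\$\{HDD_PATH\}[^:\s]*):`)

// hddMounts returns the resolved ${HDD_PATH} bind sources found in a compose file.
func hddMounts(compose, hddPath string) []string {
	var out []string
	for _, line := range strings.Split(compose, "\n") {
		if m := bindSource.FindStringSubmatch(line); m != nil {
			out = append(out, strings.Replace(m[1], "${HDD_PATH}", hddPath, 1))
		}
	}
	return out
}

// appDataDir is a hypothetical backup-side helper: <namespace root>/appdata/<app>.
func appDataDir(root, app string) string {
	return path.Join(root, "appdata", app)
}

// agree fails if any deploy-side mount falls outside the backup-side app data dir.
func agree(compose, hddPath, app string) error {
	mounts := hddMounts(compose, hddPath)
	if len(mounts) == 0 {
		return fmt.Errorf("no ${HDD_PATH} mounts found for %s", app)
	}
	want := appDataDir(hddPath, app)
	for _, m := range mounts {
		if m != want && !strings.HasPrefix(m, want+"/") {
			return fmt.Errorf("mount %q is outside %q", m, want)
		}
	}
	return nil
}

func main() {
	// Container-side targets below are illustrative only.
	fixed := "    volumes:\n      - ${HDD_PATH}/appdata/romm/library:/romm/library\n"
	old := "    volumes:\n      - ${HDD_PATH}/felhom-data/appdata/romm/library:/romm/library\n"
	fmt.Println("fixed:", agree(fixed, "/mnt/felhom-usb", "romm"))
	fmt.Println("old:  ", agree(old, "/mnt/felhom-usb", "romm"))
}
```

Running both sides through the same comparison is what keeps a future template edit from silently reintroducing the extra segment.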
## Live deploy
- `gitea.dooplex.hu/admin/felhom-controller:0.51.0` running + healthy in guest 9201 (bootstrap-launched
via `/etc/felhom-controller-image`; prior 0.50.0). Startup clean (catalog sync, health ok,
FileBrowser mounts synced).
## Live validation (guest 9201, demo-felhom, root@felhom-pve)
- **Only one live HDD app:** RomM (the others aren't drive-deployed). `HDD_PATH=/mnt/felhom-usb`; data was
at the doubled `/mnt/felhom-usb/felhom-data/appdata/romm`.
- **Catalog fix delivered by the real mechanism:** the controller's periodic git-sync logged
*"Sablonok frissítve — frissítve: immich, nextcloud, paperless-ngx, romm"* and updated all four
on-disk stack files to `${HDD_PATH}/appdata/<app>` (single-nest) — confirmed in
`/opt/docker/stacks/*/docker-compose.yml`.
- **RomM migrated (stop → move → verify → redeploy, ordered, reversible `mv`, never delete-then-move):**
captured RomM's runtime env from the running containers into a guest-only temp file (secrets never left
the guest), `docker compose down` (named DB/redis volumes preserved), moved
`/mnt/felhom-usb/felhom-data/appdata/romm` → `/mnt/felhom-usb/appdata/romm`, verified file count
unchanged, then recreated with the fixed compose + captured env.
- **Result:** RomM binds the single-nest paths
(`/mnt/felhom-usb/appdata/romm/{library,resources}`), all three containers healthy, DB connected (the
captured creds worked), in-guest `HTTP/1.1 200 OK`, *"Application startup complete"*. Old namespace
`/mnt/felhom-usb/felhom-data/appdata/` confirmed empty; backups already single-nested at
`/mnt/felhom-usb/backups/primary`. Controller untouched & healthy throughout.
## Gate outcome — PASSED
A drive-app lands single-nested AND the backup helpers resolve the identical path — proven live (not
REPORT-only): the deploy-resolver and the backup helper agree by test and by the live RomM binds, the
catalog fix propagated via real git-sync, and no doubled `felhom-data` remains. Cleared to start Phase 2.
## Not done (intentionally deferred to Phases 2–4)
Per-app recovery unit (`backup/<app>`), Tier 2 off-drive copy (auto-enabled, durable-id target, rootfs
headroom guard), secret preserve-vs-regenerate classing, FileBrowser scoping, deploy-UI DB-on-SSD note,
monitoring storage sort/descriptions. The README's backup-paths section still describes the stale
restic/secondary layout — to be rewritten when Tier 2 is built.