docs: Phase 4 + SLICE COMPLETE — REPORT/CONTEXT for v0.56.0

REPORT (Phase 4 FileBrowser scoping + deploy note + monitoring descriptions; live
validation; full slice summary 1/2/2b/3/4 all shipped + validated v0.52->v0.56).
CONTEXT entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 13:38:35 +02:00
parent 476a97376f
commit cae2bfbe5b
2 changed files with 48 additions and 47 deletions
+8
@@ -13,6 +13,14 @@ Last updated: 2026-06-12 (storage UX polish)
> is tracked in `CHANGELOG.md`, `controller/README.md`, and the auto-memory `MEMORY.md`. Live version:
> **v0.45.0**.
>
> **2026-06-13 — v0.56.0 Phase 4: FileBrowser scoping + UI polish (SLICE COMPLETE):**
> - 4A: FileBrowser bind scoped to `<drive>/appdata` (recovery units + Tier 2 copies under `backups/`
> NOT mounted → customer can't browse/delete the restore source). 4B: deploy storage step states
> files-on-drive / DB-on-fast-SSD. 4C: `buildStorageBars` stable sort + purpose description on the
> monitoring list (user-data drives only; agent local/local-lvm/pbs live on the storage page, not here).
> - Live-validated (9201): FileBrowser mount `/mnt/felhom-usb/appdata -> /srv/felhom-usb` (backups hidden);
> deploy + monitoring text rendered. **All 5 phases (1, 2, 2b, 3, 4) shipped + live-validated, v0.52→v0.56.**
>
> **2026-06-13 — v0.55.0 Phase 3: auto off-drive Tier 2 (rootfs-headroom guard):**
> - `internal/backup/tier2.go`: rsync `-a --delete` of each HDD app's recovery unit + appdata → a
> DIFFERENT physical disk (`<target>/backups/secondary/<app>/`). Auto target: prefer another registered
+40 -47
@@ -1,53 +1,46 @@
# REPORT — felhom-controller v0.55.0 (Phase 3: auto off-drive Tier 2)
# REPORT — felhom-controller v0.56.0 (Phase 4: FileBrowser scoping + UI polish) — SLICE COMPLETE
Tier 2 makes an **off-drive copy** of each HDD app's recovery unit + bulk userdata to a **different
physical disk** — the only off-drive protection browsable HDD userdata can get (PBS can't reach bind
mounts). Auto-enabled, auto-targeted, and — crucially — it **refuses rather than fills** the small guest
rootfs. Built, unit-tested, shipped, deployed, and live-validated on guest 9201. (Phases 1/2/2b shipped
as v0.52–v0.54; see git history.)
Phase 4 closes the per-app-recovery-unit / Tier-2 slice. Built, deployed, and live-validated on guest
9201. With this, **all five phases of the spec (1, 2, 2b, 3, 4) are shipped and live-validated** — see
git history (v0.52 → v0.56).
## What shipped
- **Engine** (`internal/backup/tier2.go`, `RunTier2`/`RunAllTier2`): rsync `-a --delete` mirror of the
recovery unit (`backups/primary/<app>/`) and the app's `appdata/<app>/` → `<target>/backups/secondary/<app>/`.
restic is **not** revived; this is a plain, browsable mirror.
- **Auto target selection:** prefer another registered user-data drive on a **different physical disk**
(can hold bulk userdata); else the internal SSD for **small units only**. Off-disk enforced by
`system.SamePhysicalDevice` (block-device identity — new exported helper, linux + non-linux stub),
re-checked before the copy (defense in depth).
- **Rootfs-headroom guard (the safety):** the SSD target is the ~8 GB guest rootfs, so a size-aware
guard (`tier2FitsHeadroom`, unit-tested) **refuses** unless the unit fits leaving a reserve free
(`max(2 GB, 20% of total)`). When nothing fits, it records an **honest** "needs a 2nd HDD" status —
never silently no-ops, never endangers the rootfs.
- **Status + UI:** results persist via the surviving `settings.CrossDriveBackup`. `buildAppBackupRows`
now **populates** the "2. mentés" card — real target ("belső SSD (csak DB/konfiguráció)" vs an external
drive) on success, or the honest no-target reason. Notifications via the surviving
`NotifyCrossDrive{Completed,Failed}` hooks.
- **Scheduling + trigger:** daily `tier2-backup` (03:30, after the DB dump); manual `POST /api/backup/tier2`.
- Fixed a stale pre-existing test (`TestBackupCopiesOnPath`, which still used the old
`felhom-data/backups/secondary` layout) to the Model-A in-guest layout Tier 2 actually uses.
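
The rootfs-headroom rule above (`max(2 GB, 20% of total)`) can be illustrated with a minimal Go sketch. This is not the real `tier2FitsHeadroom`: its signature is not shown in this report, so the names `reserveBytes` and `fitsHeadroom` are hypothetical, and treating "GB" as GiB is an assumption.

```go
package main

import "fmt"

const (
	mib = uint64(1) << 20
	gib = uint64(1) << 30
)

// reserveBytes returns the space that must stay free on the target:
// max(2 GiB, 20% of total). Hypothetical helper, not the shipped code.
func reserveBytes(totalBytes uint64) uint64 {
	reserve := totalBytes / 5 // 20% of total
	if floor := 2 * gib; reserve < floor {
		reserve = floor
	}
	return reserve
}

// fitsHeadroom reports whether a unit of unitBytes can be copied while
// still leaving the reserve free. It refuses (returns false) otherwise.
func fitsHeadroom(unitBytes, freeBytes, totalBytes uint64) bool {
	reserve := reserveBytes(totalBytes)
	if freeBytes < reserve {
		return false // already below the reserve: never copy
	}
	return unitBytes <= freeBytes-reserve
}

func main() {
	total := 8 * gib   // roughly the guest rootfs size named in the report
	free := 2300 * mib // roughly the 2.3 GB free seen in live validation

	fmt.Println("small unit fits:", fitsHeadroom(77*1024, free, total))
	fmt.Println("1 GiB unit fits:", fitsHeadroom(1*gib, free, total))
}
```

Under these assumptions the small recovery unit is accepted and the 1 GiB dummy is refused, which matches the happy and refuse paths in the live validation.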
## Phase 4 — what shipped (v0.56.0)
- **4A FileBrowser scoping (safety):** the FileBrowser bind mount is scoped to each drive's `appdata/`
subtree (`<drive>/appdata:/srv/<name>`) instead of the whole drive root. The recovery units + Tier 2
copies under `backups/` are **not mounted into FileBrowser at all** — the customer browses their
userdata but cannot reach (or see) the thing that restores them. `syncFileBrowserMounts` mkdir's the
appdata dir before binding; it runs on controller startup, so the scoping applies immediately.
- **4B Deploy-UI communication:** the storage-selection step states plainly (Hungarian) that the chosen
drive holds the app's **files**, while its **database runs on the fast internal SSD** and is backed up
alongside the app — so the DB-on-SSD split stops being a surprise.
- **4C Monitoring storage list:** `buildStorageBars` sorts deterministically (by path) and carries a
**purpose description** for the user-data drives, rendered on the monitoring "Tárolók kapacitása" list.
(Correction to the spec's premise: this list is the controller's registered **user-data** drives only —
the agent's local/local-lvm/pbs storage is not in this registry, so the role-tier sort and
`local`-vs-`local-lvm` descriptions live on the agent-backed storage-management page, not here.)
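
The deterministic ordering described in 4C can be sketched as a stable sort keyed on path. The `storageBar` type and its fields are hypothetical stand-ins, and the description text is a placeholder; the real `buildStorageBars` structures are not shown in this report.

```go
package main

import (
	"fmt"
	"sort"
)

// storageBar is a hypothetical stand-in for one row of the monitoring list.
type storageBar struct {
	Path        string
	Description string // purpose text shown next to the bar
}

// sortBars orders rows by path. SliceStable keeps the relative order of
// rows with equal paths, so repeated renders produce the same list.
func sortBars(bars []storageBar) {
	sort.SliceStable(bars, func(i, j int) bool {
		return bars[i].Path < bars[j].Path
	})
}

func main() {
	bars := []storageBar{
		{Path: "/mnt/felhom-usb", Description: "user-data drive"},
		{Path: "/mnt/extra-hdd", Description: "user-data drive"},
	}
	sortBars(bars)
	for _, b := range bars {
		fmt.Println(b.Path, "-", b.Description)
	}
}
```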
## Live validation (guest 9201)
- **Happy path:** triggered Tier 2 →
*"Tier 2 copied romm → /mnt/sys_drive/felhom-data/backups/secondary/romm (77.1 KB) [SSD: DB/config only]"*.
The recovery unit landed on the SSD, **off** the felhom-usb source (block devices 2065 vs 64518, so
off-disk is confirmed), auto-picking the SSD (no 2nd drive).
- **Refuse path (rootfs-headroom guard):** placed a 1 GB userdata dummy (SSD had 2.3 GB free) → Tier 2
**refused**: *"nincs elég hely a belső SSD-n — a nagy fájlok off-drive mentéséhez 2. meghajtó (vagy
távoli tárhely) szükséges"*, and did **not** copy the 1 GB to the rootfs. Removed the dummy; re-trigger
restored the successful small-unit copy.
- **UI end-to-end:** the backups page "2. mentés" card renders *Sikeres → belső SSD (csak
DB/konfiguráció)* for RomM.
- Demo left clean (dummy removed; RomM's intended small Tier 2 copy remains on the SSD).
- **4A:** after deploy, FileBrowser's mount is `/mnt/felhom-usb/appdata -> /srv/felhom-usb`;
`/srv/felhom-usb` lists `romm` (userdata), and the recovery units at `/mnt/felhom-usb/backups` are
outside the mount (confirmed via `docker inspect`).
- **4B:** the deploy page (nextcloud) renders "…adatbázis a gyors belső SSD-n…".
- **4C:** the monitoring page renders "Külső adattároló — … az adatbázisok a belső SSD-n vannak."
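
The scoped mount validated in 4A follows the `<drive>/appdata:/srv/<name>` shape. A minimal sketch of building that bind spec is below; `fileBrowserBind` is a hypothetical helper, not the code inside `syncFileBrowserMounts`, and it omits the directory creation that the report says happens before binding.

```go
package main

import (
	"fmt"
	"path"
)

// fileBrowserBind builds a "<host>:<container>" bind spec that exposes only
// the drive's appdata subtree, so a sibling backups/ directory is never
// mounted. Hypothetical helper for illustration.
func fileBrowserBind(driveRoot, name string) string {
	host := path.Join(driveRoot, "appdata")
	container := path.Join("/srv", name)
	return host + ":" + container
}

func main() {
	fmt.Println(fileBrowserBind("/mnt/felhom-usb", "felhom-usb"))
}
```

Because the host side stops at `appdata`, anything under `<drive>/backups` stays outside the container's view without needing an explicit exclusion.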
## Notes / follow-ups
- **Off-disk identity** uses block-device (`Stat_t.Dev`) equality — correct for the felhom layout
(external drive vs system rootfs). Two partitions on one physical disk would look "different"; the
agent's `DiskInfo.DurableID` is the stronger guarantee for that case (future hardening).
- Non-HDD apps (data on the rootfs, already in PBS) are skipped by Tier 2; their "2. mentés" card shows
"Nincs 2." — cosmetically it could be hidden for non-HDD apps (Phase 4 polish).
- The single-drive demo can only Tier 2 to the SSD (small units); a 2nd HDD would let bulk userdata copy
off-drive — the engine already prefers it when present.
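
The block-device equality check mentioned in the first note can be sketched with `syscall.Stat_t.Dev`. This is an illustration only: `sameDevice` is a hypothetical name, the real `system.SamePhysicalDevice` is not shown in this report, and the sketch assumes a Unix-like platform where `syscall.Stat` is available.

```go
package main

import (
	"fmt"
	"syscall"
)

// sameDevice reports whether two paths live on the same device, comparing
// the st_dev field returned by stat(2). Hypothetical sketch, Unix-only.
func sameDevice(a, b string) (bool, error) {
	var sa, sb syscall.Stat_t
	if err := syscall.Stat(a, &sa); err != nil {
		return false, fmt.Errorf("stat %s: %w", a, err)
	}
	if err := syscall.Stat(b, &sb); err != nil {
		return false, fmt.Errorf("stat %s: %w", b, err)
	}
	return sa.Dev == sb.Dev, nil
}

func main() {
	same, err := sameDevice("/", "/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("same device:", same)
}
```

As the note says, this comparison has a blind spot: two partitions on one physical disk report different device numbers, so they would pass as "different" here.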
## Slice summary (all live-validated on guest 9201)
- **Phase 1 (v0.52.0):** deploy-side Model-A double-nest fix (catalog templates) + deploy↔backup path
agreement test; RomM migrated.
- **Phase 2 (v0.53.x):** per-app **secret-free** recovery unit (compose + secret-stripped app.yaml +
db-dumps + volume-dumps + manifest), idempotent capture.
- **Phase 2b (v0.54.0):** restore-from-unit recreate + **fail-closed `data_key` gate** (proven live on
AdventureLog: refused when the encryption key was unrecoverable).
- **Phase 3 (v0.55.0):** auto **off-drive Tier 2** with the **rootfs-headroom guard** (refuse-not-fill,
proven live).
- **Phase 4 (v0.56.0):** FileBrowser scoping + deploy DB-on-SSD note + monitoring descriptions.
## Still ahead
Phase 4: FileBrowser scoping (hide recovery units), deploy-UI "DB runs on the fast internal drive" note,
monitoring storage-bar sort + descriptions. The README backup-paths section's stale restic/secondary
text should be rewritten alongside.
## Known follow-ups (small, optional)
- Off-disk identity uses block-device equality; the agent's `DiskInfo.DurableID` is stronger for the
same-disk-multiple-partitions case.
- Non-HDD apps' "2. mentés" card shows "Nincs 2." (they're in PBS); could be hidden for them.
- The README backup-paths section still has stale restic/secondary text (flagged inline) — worth a pass.
- Full readable-data restore e2e vs AdventureLog couldn't run on the 8 GB demo rootfs (images too big);
the gate + recreate are unit/integration-tested and the fail-closed path is proven live.