diff --git a/REPORT.md b/REPORT.md index 9f46c0a..12c074b 100644 --- a/REPORT.md +++ b/REPORT.md @@ -1,109 +1,91 @@ -# REPORT — felhom-controller v0.46.0: fix /backups 500 (stale disk-tier template fields) +# REPORT — USB drive availability (diagnosis) + backup-page whole-guest wiring -**Repo:** `felhom-controller` · **Version:** 0.46.0 · **Date:** 2026-06-12 - -## v0.46.0 — /backups 500 fix (most recent) - -**Symptom:** `GET /backups` → HTTP 500 ("Internal error"). - -**Diagnosis (from the live controller log, not source-guessing):** -`backups.html:64: executing "backups" at <.Backup.RepoStats>: can't evaluate field RepoStats in type -interface {}`. Not a panic / not a funcmap nil-deref. The 8C de-privileging slimmed `FullBackupStatus` -to app-data-only (DB dumps + Docker-volume tars); the disk-tier restic/cross-drive backup moved to the -host agent. But `backups.html` still carried the pre-8C restic UI, referencing `.Backup.X` **struct -fields that no longer exist**: `RepoStats, LastBackup, ResticSchedule, NextBackup, PruneSchedule, -Retention, SnapshotHistory, LastCheckTime, LastCheckOK`. While those fields existed-but-nil, the -`{{if .Backup.X}}` guards short-circuited; once removed from the struct, the field access itself errors. -(Root-level keys like `.PerDriveRepoStats`/`.Tier2DriveGroups`/`.ResticPassword` are map lookups → nil -on miss → safe; the `Tier1*/Tier2*` fields are on `AppBackupRows`, which the handler still supplies — so -the user's "Tier2/cross-drive fields are consistent" was correct; the break was the `.Backup.*` restic -stats only.) - -**Fix (template-only, no Go change — the struct was already correct):** removed the dead disk-tier UI -from `backups.html`, keeping the app-data backup view — storage overview (DB dumps), DB-dump status card -+ schedule + table, per-app backup rows (Tier1/Tier2 via `AppBackupRows`), restore. 
Removed: the restic -"Mentési tároló"/"Tároló méret"/snapshot-history/per-drive-repo-stats/integrity/retention blocks, and -the restic schedule rows. The status card + schedule summary now key on `.Backup.LastDBDump`. - -**Validation (guest 9201):** built + deployed `felhom-controller:0.46.0`; `GET /backups` → **HTTP 200**, -renders all sections (Tárhely áttekintés / Adatbázisok / Ütemezés), no template error in logs. -`TestTemplatesParse` + `TestSortDisksForView` green; golden rebaked to 0.46.0. - -`settings.html`'s `.ResticSchedule`/`.LastCheckTime` are unaffected (root-map lookups, nil-safe; the -latter is the self-update checker, not backup). +**Repo:** `felhom-controller` · **Version:** 0.47.0 · **Date:** 2026-06-12 · agent untouched --- -## v0.45.0 — storage UX polish (order, init filter, register shortcut, clarity) · pairs with **felhom-agent v0.24.0** +## PART 1 — USB drive availability: DIAGNOSIS (GATED — Branch A, no blind build) -Part B of the storage-fixes spec — controller-side ordering/filter/clarity polish on top of v0.44.0's -role-aware drive management. (Part A, the eject role-gate, lives at the agent — see felhom-agent -v0.24.0 / its REPORT.) +**Conclusion: BRANCH A confirmed — felhom-usb is NOT passed into guest 9201.** The drive is mounted on +the **host** only; the guest (and therefore the controller container and any app) cannot reach it. This +is the slice-10 additive-mount passthrough — **reported, not built.** The banner is correct. -## What changed +### Captured evidence (live, guest 9201 / felhom-pve) +**Host:** +- `pct config 9201` → mountpoints: only `mp9: …/bootstrap → /etc/felhom-bootstrap (ro)` and + `rootfs: local-lvm:vm-9201-disk-0,size=8G`. **No felhom-usb mp.** +- `findmnt /mnt/felhom-usb` → `/dev/sdb1 ext4` (mounted on the host). +- `/etc/pve/storage.cfg` → `dir: felhom-usb, path /mnt/felhom-usb, content backup, is_mountpoint 1`. 
-### B1 — deterministic disk order (`internal/web/agent_disk_handlers.go`) -`agentDisksListHandler` sorts the agent's drive list server-side before rendering (`sortDisksForView`): -**user-data → system → backup** (then unrecognized), alphabetical by storage name within each tier. -The agent's storage view iterates an unordered Go map, so the list previously reordered on every reload -(CLAUDE.md lesson #3). A stable Go-side contract beats relying on map order or template JS. -Test: `TestSortDisksForView`. +**Guest:** +- `findmnt /mnt/felhom-usb` → **nothing** (not a mount in the guest). +- `ls -la /mnt/felhom-usb` → empty dir (just `.`/`..`), created `Jun 12 07:46`. +- `df -h /mnt/felhom-usb /` → **both** `/dev/mapper/pve-vm--9201--disk--0` (the 8 GB rootfs) — i.e. + `/mnt/felhom-usb` in the guest is a plain directory **on the rootfs**, not the external drive. -### B2 — init wizard excludes mounted drives (`templates/storage_init.html`) -The `formattable` filter gained `&& !d.mount_path` (matching the attach wizard): an already-mounted -drive (e.g. `felhom-usb`) no longer appears as an "initialize" candidate. Eject it first to make it an -init target. +**Controller container:** +- `stat /mnt/felhom-usb` → **No such file or directory** (the de-privileged container has no `/mnt` + bind) → `os.Stat` in `monitor/healthcheck.go` fails → the "Adattároló nem elérhető" banner. +- logs: `Storage paths: 0 connected, 1 disconnected`. -### B3 — register shortcut for a mounted-but-unregistered user-data drive -- New `POST /api/storage/register` → `handleStorageRegister` → reuses `registerStoragePath` (the - manual-add path): records the existing mount into the `StoragePath` registry (no format, no eject), - then FileBrowser-syncs. `AddStoragePath` dedupes (clean error on double-register). -- `settings.html`: a mounted, **unregistered** user-data drive now shows **Regisztrálás** as its - PRIMARY per-card action; Leválasztás/Törlés stay secondary. 
+**Host drive contents (real data exists on the host):** `du -sh /mnt/felhom-usb` = 8.0 G; +contains `Dokumentumok/`, `felhom_data/`, `storage/`, `dump/`, `lost+found/` (Feb–Jun timestamps) — +a prior bare-metal layout. **Apps in the guest cannot see any of it.** -### B4 — system-storage clarity (presentation only, `settings.html` + `style.css`) -`local` and `local-lvm` are both kept (not collapsed). Each card now carries: -- a plain-Hungarian **purpose description** keyed on the agent's `type`/`role` (`purposeDesc`): - local-lvm → internal SSD (system, Docker, app **databases**); local → host storage, **no app data**; - pbs → backups; user-data → external store for large app files. -- an **app-backing tag** (`appBackingTag`): `local-lvm` → "Alkalmazás-rendszer"; user-data → - "Alkalmazás-adatok". -- a one-line **tiering note** above the list answering "which storage do the apps use?". -Role/type stay authoritative from the agent — no agent contract change. +### Flags (must be addressed by the slice-10 passthrough) +1. **App-data location correctness:** an app configured with `HDD_PATH=/mnt/felhom-usb` would write to + the **8 GB rootfs (local-lvm)**, silently, NOT the external drive. The only deployed app + (`actualbudget`) has **0 HDD mounts** (`ParseComposeHDDMounts: found 0`), so nothing is mislanding + *today* — but the risk is real the moment any app is placed on "external" storage. +2. **The banner was surfaced by the `Regisztrálás` shortcut** (controller v0.45.0, prior task): + registering `/mnt/felhom-usb` (mounted on the host, not the guest) created the phantom empty dir on + rootfs (the `Jun 12 07:46` dir) and the `1 disconnected` probe. Registering a host-only drive should + arguably be refused until passthrough exists — a candidate guard for the slice-10 work. 
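The guard suggested in flag 2 (refuse to register a path that is only a plain directory in the guest) could be sketched as a mount-table lookup, mirroring the `findmnt` evidence above. This is a minimal sketch, not the controller's code: `isMountPoint` is a hypothetical helper, the sample mountinfo lines are made up for illustration, and octal-escaped mount paths (for example `\040` for a space) are not decoded.

```go
package main

import (
	"bufio"
	"fmt"
	"path/filepath"
	"strings"
)

// isMountPoint is a hypothetical helper: it reports whether path is listed as
// a mount point in text that follows the /proc/self/mountinfo layout, where
// the fifth whitespace-separated field of each line is the mount point.
func isMountPoint(mountinfo, path string) bool {
	want := filepath.Clean(path)
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 5 && fields[4] == want {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative sample lines only; these are not captured from guest 9201.
	guestOnly := "22 1 253:4 / / rw,relatime - ext4 /dev/mapper/rootfs rw\n"
	withDrive := guestOnly + "95 22 8:17 / /mnt/felhom-usb rw,relatime - ext4 /dev/sdb1 rw\n"

	// A plain directory on the rootfs is not in the mount table.
	fmt.Println(isMountPoint(guestOnly, "/mnt/felhom-usb"))
	// Once the drive is actually mounted, the path is found.
	fmt.Println(isMountPoint(withDrive, "/mnt/felhom-usb"))
}
```

In a real guard the first argument would presumably come from reading `/proc/self/mountinfo` inside the namespace where the apps run, so a host-only mount is rejected before a phantom directory is created on the rootfs.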
-### B5 — eject confirmation names affected apps -Already wired pre-existing: `confirmEject` → the type-to-confirm modal fetches `/api/storage/impact` -and lists, by name, the deployed apps that lose their storage (parity with the wipe warning). Verified, -no code change needed. +### Slice-10 scope (NOT built here) +assign drive → attach as an LXC `mpN` on the guest (`reconcile/bringup.go:GuestMount`, marked "slice 10 +wires") → mount propagation into the controller container → register. Gated per the spec; to be scoped +with this evidence in hand. -## Build / deploy -- Built `felhom-controller:0.45.0` on the build server (`build.sh 0.45.0 --push`). -- Deployed to **guest 9201** on `felhom-pve`: `docker pull`, updated `/etc/felhom-controller-image`, - re-ran `felhom-controller-bootstrap.sh` → container recreated on `0.45.0`, **healthy**; - `[selfupdate] Current version 0.45.0 is up to date`. -- CHANGELOG (newest on top) + `controller/README.md` (Storage Management section) updated. +--- -## Live validation (guest 9201, auth disabled on this demo → API reachable) -- **B1 order:** `GET /api/disks` returns `[felhom-usb(user-data), local(system), local-lvm(system), - felhom-pbs(backup)]` — **stable across 3 reloads** (user-data first, system alpha, backup last). -- **B2 filter:** the only user-data drive (`felhom-usb`) has a `mount_path`, so the new filter excludes - it from init candidates (confirmed by the live disk data). -- **B3 register:** `POST /api/storage/register {where:/mnt/felhom-usb}` → `{registered:true}`; - `settings.json` now lists `/mnt/felhom-usb` (label "Tárhely (felhom-usb)", schedulable); - `FileBrowser mounts synced — 1 storage path(s)`. -- **B5 impact:** `GET /api/storage/impact?where=/mnt/felhom-usb` → `{apps:[], where:…}` (valid wiring; - no deployed app currently routes data there). -- **B4 clarity:** pure presentation rendered client-side from the agent's role/type (templates - parse-tested; the disk data carries the keys the JS branches on). 
Visual confirmation is a browser - check. -- `go test ./...` green for the whole module. +## PART 2 — Backups page: whole-guest backup visibility + manual trigger (v0.47.0, BUILT) -## Golden rebake — DONE -- Rebaked on `felhom-pve` (the gitea registry allows anonymous pull — no creds needed): - `bash /root/build-golden.sh 9100 local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst local-lvm local - vmbr0 gitea.dooplex.hu/admin/felhom-controller:0.45.0`. -- Verified the build guest baked `/etc/felhom-controller-image` = `…/felhom-controller:0.45.0` and the - docker image `…/felhom-controller:0.45.0` before archiving. -- Golden archive: `local:backup/vzdump-lxc-9100-2026_06_12-09_53_03.tar.zst` (876 MB). Transient build - guest 9100 stopped + `pct destroy 9100 --purge`'d. New provisions now bring up controller 0.45.0 - directly (self-update still covers any later drift). +### 2C gate — quiesce ownership (confirmed before wiring) +**The CONTROLLER owns quiescing.** The `quiesce.Loop` (slice 8B) stops its app stacks → `POST /backup` +→ polls `/backup/status` → resumes; the agent's vzdump is crash-consistent only (an LXC has no +fsfreeze). So the manual trigger goes **through the loop**, never a bare `StartBackup` (which would be +crash-consistent and wouldn't stop apps). Guest 9201 is on lvm-thin → **snapshot mode**, so downtime is +the until-snapshot window (~10 s here), with 8B.2 early-resume. + +### What shipped +- **2A agentapi:** `StatusResponse.Backup *BackupRecord`, `DueResponse.AgeSecs`, new + `RestoreTestStatus()`. Non-hollow tests (`backup_test.go`): parse the documented JSON; assert + `StartBackup` POSTs `/backup`. +- **2B Section "Rendszermentés (teljes mentés)":** read-only cards — last whole-guest backup (time + + size + **target PBS-vs-local**, from the archive volid), next-due (`/backup/due` age vs cadence), + restore-test, running phase. Agent-unreachable degrades to a note. 
+- **2C "Mentés most":** `quiesce.Loop` gains a mutex + `TriggerNow()` (single-flight via `TryLock` + + the persisted marker; `ErrBackupInProgress` on overlap; async, bounded by max-quiesce). New + `POST /api/guest-backup/trigger` + `GET /api/guest-backup/status` (distinct prefix from apiRouter's + app-data `/api/backup/{run,status}` — verified the collision and avoided shadowing). Button warns per + mode. +- **2D:** existing per-app DB-dump UI relabeled under an "Alkalmazás-mentések (adatbázis + konfiguráció)" + divider, distinct from the whole-guest tier. +- **2E config:** OUT OF SCOPE (hub-served policy, slice 10) — no agent config surface added. + +### Live validation (guest 9201, against the agent API — not REPORT) +- Agent `curl`: `/backup/status` → done, backup `local:backup/vzdump-lxc-9201-…`, snapshot, 1.4 GB, + success; `/backup/due` → due=false, within cadence; `/restore-test/status` → null. +- `GET /backups` → **200**; Section 1 renders "Utolsó teljes mentés", "Helyi tároló (local)", + "Visszaállítás ellenőrizve / Még nem futott", "Mentés most"; "Alkalmazás-mentések" divider present. +- **Manual trigger:** `POST /api/guest-backup/trigger` → `{started:true}`; quiesce logs show + quiesce `actualbudget` → job started → **snapshotted → early-resume (8B.2) → done**; phase polled + `snapshotted → done`; a **new backup recorded** (`…11_17_38.tar.zst`); `actualbudget` back up+healthy; + quiesce marker cleared (no stranded quiesce). +- **Single-flight:** concurrent double-trigger → one `{started:true}`, one + `{"error":"mentés már folyamatban van"}` (409). +- `go test ./internal/{web,agentapi,quiesce}/` green; `go build ./...` clean. + +### Deploy +Built + pushed `felhom-controller:0.47.0`; deployed to guest 9201 (healthy). Golden rebaked to 0.47.0.