docs(v0.47.0): REPORT — USB passthrough diagnosis (Branch A, gated) + backup-page wiring; validated on 9201

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 11:21:12 +02:00
parent bbed5af662
commit 04bacbddfd
+77 -95
@@ -1,109 +1,91 @@
# REPORT — felhom-controller v0.46.0: fix /backups 500 (stale disk-tier template fields)
# REPORT — USB drive availability (diagnosis) + backup-page whole-guest wiring
**Repo:** `felhom-controller` · **Version:** 0.46.0 · **Date:** 2026-06-12
## v0.46.0 — /backups 500 fix (most recent)
**Symptom:** `GET /backups` → HTTP 500 ("Internal error").
**Diagnosis (from the live controller log, not source-guessing):**
`backups.html:64: executing "backups" at <.Backup.RepoStats>: can't evaluate field RepoStats in type
interface {}`. Not a panic / not a funcmap nil-deref. The 8C de-privileging slimmed `FullBackupStatus`
to app-data-only (DB dumps + Docker-volume tars); the disk-tier restic/cross-drive backup moved to the
host agent. But `backups.html` still carried the pre-8C restic UI, referencing `.Backup.X` **struct
fields that no longer exist**: `RepoStats, LastBackup, ResticSchedule, NextBackup, PruneSchedule,
Retention, SnapshotHistory, LastCheckTime, LastCheckOK`. While those fields existed-but-nil, the
`{{if .Backup.X}}` guards short-circuited; once removed from the struct, the field access itself errors.
(Root-level keys like `.PerDriveRepoStats`/`.Tier2DriveGroups`/`.ResticPassword` are map lookups → nil
on miss → safe; the `Tier1*/Tier2*` fields are on `AppBackupRows`, which the handler still supplies — so
the user's claim that "Tier2/cross-drive fields are consistent" was correct; the break was the `.Backup.*` restic
stats only.)
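The difference between the two failure modes above can be reproduced in isolation. This is a minimal sketch, not the controller's code: it uses `text/template` (the field-resolution rules discussed here are the same ones `html/template` relies on), and `appStatus` and `render` are hypothetical names standing in for the slimmed `FullBackupStatus` and the page renderer.

```go
package main

import (
	"fmt"
	"io"
	"text/template"
)

// appStatus is a hypothetical stand-in for the slimmed FullBackupStatus:
// it keeps LastDBDump but no longer has a RepoStats field.
type appStatus struct {
	LastDBDump string
}

// render parses src and executes it against data, returning any error.
func render(src string, data any) error {
	t, err := template.New("t").Parse(src)
	if err != nil {
		return err
	}
	return t.Execute(io.Discard, data)
}

func main() {
	data := map[string]any{"Backup": appStatus{LastDBDump: "2026-06-12"}}

	// Root-level key missing from a map: the lookup yields no value,
	// so the {{if}} guard is false and execution succeeds.
	fmt.Println("map miss:", render(`{{if .PerDriveRepoStats}}x{{end}}`, data))

	// Field missing from a struct: the field access itself fails at
	// execution time, before the {{if}} can short-circuit anything.
	fmt.Println("struct miss:", render(`{{if .Backup.RepoStats}}x{{end}}`, data))
}
```

The first call returns a nil error and the second returns a "can't evaluate field" execution error, which is why a guard that was safe while the field existed stops being safe once the field is deleted.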
**Fix (template-only, no Go change — the struct was already correct):** removed the dead disk-tier UI
from `backups.html`, keeping the app-data backup view — storage overview (DB dumps), DB-dump status card
+ schedule + table, per-app backup rows (Tier1/Tier2 via `AppBackupRows`), restore. Removed: the restic
"Mentési tároló"/"Tároló méret"/snapshot-history/per-drive-repo-stats/integrity/retention blocks, and
the restic schedule rows. The status card + schedule summary now key on `.Backup.LastDBDump`.
**Validation (guest 9201):** built + deployed `felhom-controller:0.46.0`; `GET /backups` → **HTTP 200**,
renders all sections (Tárhely áttekintés / Adatbázisok / Ütemezés), no template error in logs.
`TestTemplatesParse` + `TestSortDisksForView` green; golden rebaked to 0.46.0.
`settings.html`'s `.ResticSchedule`/`.LastCheckTime` are unaffected (root-map lookups, nil-safe; the
latter is the self-update checker, not backup).
**Repo:** `felhom-controller` · **Version:** 0.47.0 · **Date:** 2026-06-12 · agent untouched
---
## v0.45.0 — storage UX polish (order, init filter, register shortcut, clarity) · pairs with **felhom-agent v0.24.0**
## PART 1 — USB drive availability: DIAGNOSIS (GATED — Branch A, no blind build)
Part B of the storage-fixes spec — controller-side ordering/filter/clarity polish on top of v0.44.0's
role-aware drive management. (Part A, the eject role-gate, lives at the agent — see felhom-agent
v0.24.0 / its REPORT.)
**Conclusion: BRANCH A confirmed — felhom-usb is NOT passed into guest 9201.** The drive is mounted on
the **host** only; the guest (and therefore the controller container and any app) cannot reach it. This
is the slice-10 additive-mount passthrough — **reported, not built.** The banner is correct.
## What changed
### Captured evidence (live, guest 9201 / felhom-pve)
**Host:**
- `pct config 9201` → mountpoints: only `mp9: …/bootstrap → /etc/felhom-bootstrap (ro)` and
`rootfs: local-lvm:vm-9201-disk-0,size=8G`. **No felhom-usb mp.**
- `findmnt /mnt/felhom-usb` → `/dev/sdb1 ext4` (mounted on the host).
- `/etc/pve/storage.cfg` → `dir: felhom-usb, path /mnt/felhom-usb, content backup, is_mountpoint 1`.
### B1 — deterministic disk order (`internal/web/agent_disk_handlers.go`)
`agentDisksListHandler` sorts the agent's drive list server-side before rendering (`sortDisksForView`):
**user-data → system → backup** (then unrecognized), alphabetical by storage name within each tier.
The agent's storage view iterates an unordered Go map, so the list previously reordered on every reload
(CLAUDE.md lesson #3). A stable Go-side contract beats relying on map order or template JS.
Test: `TestSortDisksForView`.
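A minimal sketch of the B1 ordering contract, assuming a reduced view model: the `disk` struct, `roleRank`, and `sortDisks` are hypothetical names, and the role strings are taken from the labels used in this report rather than from the agent's actual payload.

```go
package main

import (
	"fmt"
	"sort"
)

// disk is a hypothetical view model; the real agent payload may differ.
type disk struct {
	Storage string
	Role    string
}

// roleRank maps a role to its tier: user-data, system, backup,
// then anything unrecognized.
func roleRank(role string) int {
	switch role {
	case "user-data":
		return 0
	case "system":
		return 1
	case "backup":
		return 2
	default:
		return 3
	}
}

// sortDisks orders by tier first, then alphabetically by storage name,
// so the result does not depend on the input (map iteration) order.
func sortDisks(disks []disk) {
	sort.SliceStable(disks, func(i, j int) bool {
		ri, rj := roleRank(disks[i].Role), roleRank(disks[j].Role)
		if ri != rj {
			return ri < rj
		}
		return disks[i].Storage < disks[j].Storage
	})
}

func main() {
	disks := []disk{
		{"local-lvm", "system"},
		{"felhom-pbs", "backup"},
		{"felhom-usb", "user-data"},
		{"local", "system"},
	}
	sortDisks(disks)
	for _, d := range disks {
		fmt.Println(d.Storage, d.Role)
	}
}
```

Because the comparison is a total order on (tier, name), any permutation of the same input produces the same output, which is the property the reload test relies on.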
**Guest:**
- `findmnt /mnt/felhom-usb` → **nothing** (not a mount in the guest).
- `ls -la /mnt/felhom-usb` → empty dir (just `.`/`..`), created `Jun 12 07:46`.
- `df -h /mnt/felhom-usb /` → **both** `/dev/mapper/pve-vm--9201--disk--0` (the 8 GB rootfs), i.e.
`/mnt/felhom-usb` in the guest is a plain directory **on the rootfs**, not the external drive.
### B2 — init wizard excludes mounted drives (`templates/storage_init.html`)
The `formattable` filter gained `&& !d.mount_path` (matching the attach wizard): an already-mounted
drive (e.g. `felhom-usb`) no longer appears as an "initialize" candidate. Eject it first to make it an
init target.
**Controller container:**
- `stat /mnt/felhom-usb` → **No such file or directory** (the de-privileged container has no `/mnt`
bind) → `os.Stat` in `monitor/healthcheck.go` fails → the "Adattároló nem elérhető" banner.
- logs: `Storage paths: 0 connected, 1 disconnected`.
### B3 — register shortcut for a mounted-but-unregistered user-data drive
- New `POST /api/storage/register` → `handleStorageRegister` → reuses `registerStoragePath` (the
manual-add path): records the existing mount into the `StoragePath` registry (no format, no eject),
then FileBrowser-syncs. `AddStoragePath` dedupes (clean error on double-register).
- `settings.html`: a mounted, **unregistered** user-data drive now shows **Regisztrálás** as its
PRIMARY per-card action; Leválasztás/Törlés stay secondary.
**Host drive contents (real data exists on the host):** `du -sh /mnt/felhom-usb` = 8.0 G;
contains `Dokumentumok/`, `felhom_data/`, `storage/`, `dump/`, `lost+found/` (Feb–Jun timestamps),
a prior bare-metal layout. **Apps in the guest cannot see any of it.**
### B4 — system-storage clarity (presentation only, `settings.html` + `style.css`)
`local` and `local-lvm` are both kept (not collapsed). Each card now carries:
- a plain-Hungarian **purpose description** keyed on the agent's `type`/`role` (`purposeDesc`):
local-lvm → internal SSD (system, Docker, app **databases**); local → host storage, **no app data**;
pbs → backups; user-data → external store for large app files.
- an **app-backing tag** (`appBackingTag`): `local-lvm` → "Alkalmazás-rendszer"; user-data →
"Alkalmazás-adatok".
- a one-line **tiering note** above the list answering "which storage do the apps use?".
Role/type stay authoritative from the agent — no agent contract change.
### Flags (must be addressed by the slice-10 passthrough)
1. **App-data location correctness:** an app configured with `HDD_PATH=/mnt/felhom-usb` would write to
the **8 GB rootfs (local-lvm)**, silently, NOT the external drive. The only deployed app
(`actualbudget`) has **0 HDD mounts** (`ParseComposeHDDMounts: found 0`), so nothing is mislanding
*today* — but the risk is real the moment any app is placed on "external" storage.
2. **The banner was surfaced by the `Regisztrálás` shortcut** (controller v0.45.0, prior task):
registering `/mnt/felhom-usb` (mounted on the host, not the guest) created the phantom empty dir on
rootfs (the `Jun 12 07:46` dir) and the `1 disconnected` probe. Registering a host-only drive should
arguably be refused until passthrough exists — a candidate guard for the slice-10 work.
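One possible shape for the guard suggested in flag 2, sketched under assumptions: `isMountPoint` is a hypothetical helper, not something in the repo. It applies the same check the `df` evidence above makes by eye, comparing the device of a path with the device of its parent. It is Unix-only and does not detect a bind mount made within the same filesystem, so it would be a first-line check rather than a complete one.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// isMountPoint reports whether path is on a different device than its
// parent directory, or is the filesystem root itself.
// Limitation: a bind mount from the same filesystem is not detected.
func isMountPoint(path string) (bool, error) {
	self, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	parent, err := os.Stat(filepath.Join(path, ".."))
	if err != nil {
		return false, err
	}
	s, ok1 := self.Sys().(*syscall.Stat_t)
	p, ok2 := parent.Sys().(*syscall.Stat_t)
	if !ok1 || !ok2 {
		return false, fmt.Errorf("no stat_t available for %s", path)
	}
	if s.Dev != p.Dev {
		return true, nil
	}
	// Same device: only the root directory is its own parent.
	return os.SameFile(self, parent), nil
}

func main() {
	for _, p := range []string{"/", "/mnt/felhom-usb"} {
		ok, err := isMountPoint(p)
		fmt.Println(p, ok, err)
	}
}
```

In the guest state captured above, `/mnt/felhom-usb` shares its device with the rootfs, so a check of this kind would report it as a plain directory and registration could be refused with a clear message.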
### B5 — eject confirmation names affected apps
Already wired (pre-existing): `confirmEject`, the type-to-confirm modal, fetches `/api/storage/impact`
and lists, by name, the deployed apps that lose their storage (parity with the wipe warning). Verified,
no code change needed.
### Slice-10 scope (NOT built here)
assign drive → attach as an LXC `mpN` on the guest (`reconcile/bringup.go:GuestMount`, marked "slice 10
wires") → mount propagation into the controller container → register. Gated per the spec; to be scoped
with this evidence in hand.
## Build / deploy
- Built `felhom-controller:0.45.0` on the build server (`build.sh 0.45.0 --push`).
- Deployed to **guest 9201** on `felhom-pve`: `docker pull`, updated `/etc/felhom-controller-image`,
re-ran `felhom-controller-bootstrap.sh` → container recreated on `0.45.0`, **healthy**;
`[selfupdate] Current version 0.45.0 is up to date`.
- CHANGELOG (newest on top) + `controller/README.md` (Storage Management section) updated.
---
## Live validation (guest 9201, auth disabled on this demo → API reachable)
- **B1 order:** `GET /api/disks` returns `[felhom-usb(user-data), local(system), local-lvm(system),
felhom-pbs(backup)]` — **stable across 3 reloads** (user-data first, system alpha, backup last).
- **B2 filter:** the only user-data drive (`felhom-usb`) has a `mount_path`, so the new filter excludes
it from init candidates (confirmed by the live disk data).
- **B3 register:** `POST /api/storage/register {where:/mnt/felhom-usb}` → `{registered:true}`;
`settings.json` now lists `/mnt/felhom-usb` (label "Tárhely (felhom-usb)", schedulable);
`FileBrowser mounts synced — 1 storage path(s)`.
- **B5 impact:** `GET /api/storage/impact?where=/mnt/felhom-usb` → `{apps:[], where:…}` (valid wiring;
no deployed app currently routes data there).
- **B4 clarity:** pure presentation rendered client-side from the agent's role/type (templates
parse-tested; the disk data carries the keys the JS branches on). Visual confirmation is a browser
check.
- `go test ./...` green for the whole module.
## PART 2 — Backups page: whole-guest backup visibility + manual trigger (v0.47.0, BUILT)
## Golden rebake — DONE
- Rebaked on `felhom-pve` (the gitea registry allows anonymous pull — no creds needed):
`bash /root/build-golden.sh 9100 local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst local-lvm local
vmbr0 gitea.dooplex.hu/admin/felhom-controller:0.45.0`.
- Verified the build guest baked `/etc/felhom-controller-image` = `…/felhom-controller:0.45.0` and the
docker image `…/felhom-controller:0.45.0` before archiving.
- Golden archive: `local:backup/vzdump-lxc-9100-2026_06_12-09_53_03.tar.zst` (876 MB). Transient build
guest 9100 stopped + `pct destroy 9100 --purge`'d. New provisions now bring up controller 0.45.0
directly (self-update still covers any later drift).
### 2C gate — quiesce ownership (confirmed before wiring)
**The CONTROLLER owns quiescing.** The `quiesce.Loop` (slice 8B) stops its app stacks → `POST /backup`
→ polls `/backup/status` → resumes; the agent's vzdump is crash-consistent only (an LXC has no
fsfreeze). So the manual trigger goes **through the loop**, never a bare `StartBackup` (which would be
crash-consistent and wouldn't stop apps). Guest 9201 is on lvm-thin → **snapshot mode**, so downtime is
the until-snapshot window (~10 s here), with 8B.2 early-resume.
### What shipped
- **2A agentapi:** `StatusResponse.Backup *BackupRecord`, `DueResponse.AgeSecs`, new
`RestoreTestStatus()`. Non-hollow tests (`backup_test.go`): parse the documented JSON; assert
`StartBackup` POSTs `/backup`.
- **2B Section "Rendszermentés (teljes mentés)":** read-only cards — last whole-guest backup (time +
size + **target PBS-vs-local**, from the archive volid), next-due (`/backup/due` age vs cadence),
restore-test, running phase. Agent-unreachable degrades to a note.
- **2C "Mentés most":** `quiesce.Loop` gains a mutex + `TriggerNow()` (single-flight via `TryLock` +
the persisted marker; `ErrBackupInProgress` on overlap; async, bounded by max-quiesce). New
`POST /api/guest-backup/trigger` + `GET /api/guest-backup/status` (distinct prefix from apiRouter's
app-data `/api/backup/{run,status}` — verified the collision and avoided shadowing). Button warns per
mode.
- **2D:** existing per-app DB-dump UI relabeled under an "Alkalmazás-mentések (adatbázis + konfiguráció)"
divider, distinct from the whole-guest tier.
- **2E config:** OUT OF SCOPE (hub-served policy, slice 10) — no agent config surface added.
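The single-flight behaviour described for 2C can be sketched with `sync.Mutex.TryLock` (Go 1.18+). This is an illustration only: the `loop` type here is a hypothetical, reduced stand-in for `quiesce.Loop`, the `run` field stands for the whole quiesce → backup → resume cycle, and the persisted marker and the max-quiesce bound are left out.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrBackupInProgress is returned when a cycle is already running.
var ErrBackupInProgress = errors.New("backup already in progress")

// loop is a hypothetical, reduced stand-in for quiesce.Loop.
type loop struct {
	mu  sync.Mutex
	run func() // the quiesce -> backup -> resume cycle
}

// TriggerNow starts one cycle asynchronously and returns a channel that
// is closed when it finishes. If a cycle is already running it returns
// ErrBackupInProgress instead of queueing a second one.
func (l *loop) TriggerNow() (<-chan struct{}, error) {
	if !l.mu.TryLock() {
		return nil, ErrBackupInProgress
	}
	done := make(chan struct{})
	go func() {
		defer close(done)   // runs last: signal completion
		defer l.mu.Unlock() // runs first: release before signalling
		l.run()
	}()
	return done, nil
}

func main() {
	release := make(chan struct{})
	l := &loop{run: func() { <-release }}

	done, err := l.TriggerNow()
	fmt.Println("first trigger error:", err)

	_, err = l.TriggerNow()
	fmt.Println("second trigger rejected:", errors.Is(err, ErrBackupInProgress))

	close(release)
	<-done
}
```

An HTTP handler in front of this would map `ErrBackupInProgress` to the 409 response and any successful start to `{started:true}`; the mutex alone does not survive a restart, which is the gap the persisted marker covers.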
### Live validation (guest 9201, against the agent API — not REPORT)
- Agent `curl`: `/backup/status` → done, backup `local:backup/vzdump-lxc-9201-…`, snapshot, 1.4 GB,
success; `/backup/due` → due=false, within cadence; `/restore-test/status` → null.
- `GET /backups` → **200**; Section 1 renders "Utolsó teljes mentés", "Helyi tároló (local)",
"Visszaállítás ellenőrizve / Még nem futott", "Mentés most"; "Alkalmazás-mentések" divider present.
- **Manual trigger:** `POST /api/guest-backup/trigger` → `{started:true}`; quiesce logs show
quiesce `actualbudget` → job started → **snapshotted → early-resume (8B.2) → done**; phase polled
`snapshotted → done`; a **new backup recorded** (`…11_17_38.tar.zst`); `actualbudget` back up+healthy;
quiesce marker cleared (no stranded quiesce).
- **Single-flight:** concurrent double-trigger → one `{started:true}`, one
`{"error":"mentés már folyamatban van"}` (409).
- `go test ./internal/{web,agentapi,quiesce}/` green; `go build ./...` clean.
### Deploy
Built + pushed `felhom-controller:0.47.0`; deployed to guest 9201 (healthy). Golden rebaked to 0.47.0.