docs(v0.50.0): REPORT — controller slice-10 (P2C + activation-UX + P4); validated on 9201

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 18:06:11 +02:00
parent 4913130514
commit 1e82eebc5e
+41 -84
@@ -1,91 +1,48 @@
# REPORT — USB drive availability (diagnosis) + backup-page whole-guest wiring
# REPORT — felhom-controller: slice 10 (external user-data drives) — controller side
**Repo:** `felhom-controller` · **Version:** 0.47.0 · **Date:** 2026-06-12 · agent untouched
**Version:** 0.50.0 · **Date:** 2026-06-12 · pairs with **felhom-agent v0.25.0–v0.27.0**
---
The controller's half of slice 10 (the agent owns the host-side execution and self-heal; see
felhom-agent's REPORT for the P1 spike, P2 passthrough, P2 activation endpoint and P3 self-heal). Everything
below was live-validated on guest 9201, and the golden image was rebaked to 0.50.0.
## PART 1 — USB drive availability: DIAGNOSIS (GATED — Branch A, no blind build)
## P2C — enroll passes the drive into the guest (v0.48.0)
- `agentapi.GuestAttach(where)` → agent `POST /disks/guest-attach`. `runStorageInit` /
`runStorageAttach` / `handleStorageRegister` call `attachIntoGuest` after recording the StoragePath
(best-effort: a transient failure is logged, and P3 self-heal completes the attach). This closes Branch A:
an enrolled drive becomes usable in the guest (app `HDD_PATH` writes land on `/dev/sdb1`, and the
"nem elérhető" ("not available") banner clears).
**Conclusion: BRANCH A confirmed — felhom-usb is NOT passed into guest 9201.** The drive is mounted on
the **host** only; the guest (and therefore the controller container and any app) cannot reach it. This
is the slice-10 additive-mount passthrough — **reported, not built.** The banner is correct.
## Activation-UX (v0.49.0)
- The host-side live inject is blocked on unprivileged LXC, so a drive enrolled into a *running* guest
activates only at the next guest boot. Per the design decision, enrolment persists the attachment (no
forced reboot) and activation is left to a user-triggered restart.
- `pendingActivationDrives()` flags registered drives that the agent reports as present and attached but
that aren't a live mount in the container. The settings page shows a banner and a batched **"Újraindítás
most (~30 mp)"** ("Restart now (~30 s)") button → `POST /api/storage/activate` → `agentapi.GuestReboot` →
agent `POST /guest/reboot`. Live-validated: activate → guest reboots → drive active.
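The `pendingActivationDrives()` rule can be sketched as a plain filter. The `driveState` struct and its field names are assumptions that merge the three inputs the text mentions (controller registration, the agent's present+attached report, and the container's live mounts); they are not the controller's actual types.

```go
package main

import "sort"

// driveState is a hypothetical per-drive view merged from the controller's
// registry, the agent's report and the container's mount table.
type driveState struct {
	Path       string
	Registered bool // StoragePath recorded by the controller
	Present    bool // agent sees the device on the host
	Attached   bool // agent reports the drive attached to the guest
	LiveMount  bool // path is an actual mount inside the container
}

// pendingActivation returns drives that are registered, present and attached
// but not yet live-mounted. They only become active after a guest restart, so
// the UI can offer one batched reboot for all of them.
func pendingActivation(drives []driveState) []string {
	var pending []string
	for _, d := range drives {
		if d.Registered && d.Present && d.Attached && !d.LiveMount {
			pending = append(pending, d.Path)
		}
	}
	sort.Strings(pending) // stable order for the banner
	return pending
}
```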
### Captured evidence (live, guest 9201 / felhom-pve)
**Host:**
- `pct config 9201` → mountpoints: only `mp9: …/bootstrap → /etc/felhom-bootstrap (ro)` and
`rootfs: local-lvm:vm-9201-disk-0,size=8G`. **No felhom-usb mp.**
- `findmnt /mnt/felhom-usb` → `/dev/sdb1 ext4` (mounted on the host).
- `/etc/pve/storage.cfg` → `dir: felhom-usb, path /mnt/felhom-usb, content backup, is_mountpoint 1`.
## P4 — dual-role drives + backup-aware wipe warning (v0.50.0)
- **4A:** a user-data drive holds appdata AND is eligible as a backup target (not locked to one role);
this is surfaced in the per-card purpose note of the drive overview. The `felhom-pbs`/system/backup roles
are unchanged.
- **4B:** `handleStorageImpact` now also returns `backup_copies`: the apps whose cross-drive (secondary)
backups are stored on the drive (`backupCopiesOnPath` scans `felhom-data/backups/secondary/<app>`,
skipping the shared restic repo and `_infra`). The type-to-confirm wipe/eject modal names them ("Ez a
meghajtó más alkalmazások biztonsági másolatait is tárolja — a törlés ezeket is eltávolítja", i.e. "this
drive also stores other apps' backups; wiping removes them too"). The wipe stays **customer-confirmable**
(the copies are redundant; the originals live on the source drive).
- **OUT OF SCOPE:** the cross-drive backup ENGINE (restic USB1↔USB2, scheduling, pruning) — a follow-on
slice (needs a 2nd physical drive to validate). The 4B detection is forward-compatible (empty until
the engine writes there).
**Guest:**
- `findmnt /mnt/felhom-usb` → **nothing** (not a mount in the guest).
- `ls -la /mnt/felhom-usb` → empty dir (just `.`/`..`), created `Jun 12 07:46`.
- `df -h /mnt/felhom-usb /` → **both** `/dev/mapper/pve-vm--9201--disk--0` (the 8 GB rootfs); that is,
`/mnt/felhom-usb` in the guest is a plain directory **on the rootfs**, not the external drive.
## Live validation (9201)
- P2C: app bytes on `/dev/sdb1`, banner `[PASS] 1 connected, 0 disconnected`.
- Activation: `/api/storage/activate` → reboot → drive active.
- P4: `/api/storage/impact?where=/mnt/felhom-usb` → `backup_copies:[]`; after creating
`felhom-data/backups/secondary/immich` → `backup_copies:["immich"]` (detection live).
- `go test ./internal/{web,agentapi}/` green; golden rebaked to 0.50.0, build guest purged.
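The `backup_copies:[]` result hinges on a Go detail: `encoding/json` encodes a nil slice as `null` and an empty slice as `[]`. A sketch of normalising the field, with `impactResponse` as a hypothetical subset of the real payload:

```go
package main

import "encoding/json"

// impactResponse is a hypothetical subset of the /api/storage/impact payload;
// only backup_copies is named in the report.
type impactResponse struct {
	BackupCopies []string `json:"backup_copies"`
}

// newImpact turns a nil slice into an empty one so the field serialises as
// [] rather than null when no copies exist.
func newImpact(copies []string) impactResponse {
	if copies == nil {
		copies = []string{}
	}
	return impactResponse{BackupCopies: copies}
}

// encodeImpact marshals the normalised response.
func encodeImpact(copies []string) (string, error) {
	b, err := json.Marshal(newImpact(copies))
	return string(b), err
}
```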
**Controller container:**
- `stat /mnt/felhom-usb` → **No such file or directory** (the de-privileged container has no `/mnt`
bind) → `os.Stat` in `monitor/healthcheck.go` fails → the "Adattároló nem elérhető" ("data storage
unavailable") banner.
- logs: `Storage paths: 0 connected, 1 disconnected`.
**Host drive contents (real data exists on the host):** `du -sh /mnt/felhom-usb` = 8.0 G;
contains `Dokumentumok/`, `felhom_data/`, `storage/`, `dump/`, `lost+found/` (Feb–Jun timestamps),
a prior bare-metal layout. **Apps in the guest cannot see any of it.**
### Flags (must be addressed by the slice-10 passthrough)
1. **App-data location correctness:** an app configured with `HDD_PATH=/mnt/felhom-usb` would write to
the **8 GB rootfs (local-lvm)**, silently, NOT the external drive. The only deployed app
(`actualbudget`) has **0 HDD mounts** (`ParseComposeHDDMounts: found 0`), so nothing is mislanding
*today* — but the risk is real the moment any app is placed on "external" storage.
2. **The banner was surfaced by the `Regisztrálás` ("Register") shortcut** (controller v0.45.0, prior task):
registering `/mnt/felhom-usb` (mounted on the host, not the guest) created the phantom empty dir on
rootfs (the `Jun 12 07:46` dir) and the `1 disconnected` probe. Registering a host-only drive should
arguably be refused until passthrough exists, which makes it a candidate guard for the slice-10 work.
### Slice-10 scope (NOT built here)
Assign drive → attach as an LXC `mpN` on the guest (`reconcile/bringup.go:GuestMount`, marked "slice 10
wires") → mount propagation into the controller container → register. Gated per the spec; to be scoped
with this evidence in hand.
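For orientation, a Proxmox bind mount point is added with `pct set <vmid> -mpN <host path>,mp=<guest path>`. The sketch below only builds that command; whether `GuestMount` shells out to `pct` or edits the guest config another way is not stated here, so treat the approach and the name `bindMountCmd` as assumptions. On 9201 the index must avoid `mp9`, which the bootstrap mount already uses.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// bindMountCmd (hypothetical) builds, but does not run, the command that adds
// hostPath to LXC guest vmid as mount point mpN at guestPath.
func bindMountCmd(vmid, n int, hostPath, guestPath string) (*exec.Cmd, error) {
	if n < 0 {
		return nil, fmt.Errorf("mount point index %d must not be negative", n)
	}
	spec := fmt.Sprintf("%s,mp=%s", hostPath, guestPath)
	return exec.Command("pct", "set", strconv.Itoa(vmid), "-mp"+strconv.Itoa(n), spec), nil
}
```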
---
## PART 2 — Backups page: whole-guest backup visibility + manual trigger (v0.47.0, BUILT)
### 2C gate — quiesce ownership (confirmed before wiring)
**The CONTROLLER owns quiescing.** The `quiesce.Loop` (slice 8B) stops its app stacks → `POST /backup`
→ polls `/backup/status` → resumes; the agent's vzdump alone is only crash-consistent (an LXC has no
fsfreeze). So the manual trigger goes **through the loop**, never a bare `StartBackup` (which would be
crash-consistent only and wouldn't stop apps). Guest 9201 is on lvm-thin → **snapshot mode**, so with the
8B.2 early-resume, downtime is only the window until the snapshot is taken (~10 s here).
### What shipped
- **2A agentapi:** `StatusResponse.Backup *BackupRecord`, `DueResponse.AgeSecs`, new
`RestoreTestStatus()`. Non-hollow tests (`backup_test.go`): parse the documented JSON; assert
`StartBackup` POSTs `/backup`.
- **2B Section "Rendszermentés (teljes mentés)" ("System backup (full backup)"):** read-only cards for the
last whole-guest backup (time + size + **target PBS-vs-local**, from the archive volid), next-due
(`/backup/due` age vs cadence), restore-test and running phase. If the agent is unreachable, the section
degrades to a note.
- **2C "Mentés most" ("Back up now"):** `quiesce.Loop` gains a mutex + `TriggerNow()` (single-flight via
`TryLock` + the persisted marker; `ErrBackupInProgress` on overlap; async, bounded by max-quiesce). New
`POST /api/guest-backup/trigger` + `GET /api/guest-backup/status` (a prefix distinct from apiRouter's
app-data `/api/backup/{run,status}`; the collision was verified and shadowing avoided). The button shows
a mode-specific warning.
- **2D:** the existing per-app DB-dump UI is relabeled under an "Alkalmazás-mentések (adatbázis +
konfiguráció)" ("App backups (database + configuration)") divider, distinct from the whole-guest tier.
- **2E config:** OUT OF SCOPE (hub-served policy, slice 10) — no agent config surface added.
### Live validation (guest 9201, against the agent API — not REPORT)
- Agent `curl`: `/backup/status` → done, backup `local:backup/vzdump-lxc-9201-…`, snapshot, 1.4 GB,
success; `/backup/due` → due=false, within cadence; `/restore-test/status` → null.
- `GET /backups` → **200**; Section 1 renders "Utolsó teljes mentés" ("Last full backup"), "Helyi tároló
(local)" ("Local storage"), "Visszaállítás ellenőrizve / Még nem futott" ("Restore verified / Not run
yet"), "Mentés most"; the "Alkalmazás-mentések" divider is present.
- **Manual trigger:** `POST /api/guest-backup/trigger` → `{started:true}`; quiesce logs show
quiesce `actualbudget` → job started → **snapshotted → early-resume (8B.2) → done**; phase polled
`snapshotted → done`; a **new backup recorded** (`…11_17_38.tar.zst`); `actualbudget` back up+healthy;
quiesce marker cleared (no stranded quiesce).
- **Single-flight:** concurrent double-trigger → one `{started:true}`, one
`{"error":"mentés már folyamatban van"}` ("backup already in progress", 409).
- `go test ./internal/{web,agentapi,quiesce}/` green; `go build ./...` clean.
### Deploy
Built + pushed `felhom-controller:0.47.0`; deployed to guest 9201 (healthy). Golden rebaked to 0.47.0.
## Note (carried from P2)
The controller's app/backup path helpers still join `felhom-data` under the registered drive path; in
Model A the in-guest mount IS the felhom-data namespace, so backup paths double-nest
(`felhom-data/felhom-data/...`) — functional but untidy. Reconcile when wiring app-data-backup-to-drive
(not in this slice; app `HDD_PATH` data lands correctly today).