docs(v0.50.0): REPORT — controller slice-10 (P2C + activation-UX + P4); validated on 9201
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -1,91 +1,48 @@
# REPORT — USB drive availability (diagnosis) + backup-page whole-guest wiring
# REPORT — felhom-controller: slice 10 (external user-data drives) — controller side

**Repo:** `felhom-controller` · **Version:** 0.47.0 · **Date:** 2026-06-12 · agent untouched
**Version:** 0.50.0 · **Date:** 2026-06-12 · pairs with **felhom-agent v0.25.0–v0.27.0**

---
The controller's half of slice 10 (the agent owns the host-side execution + self-heal; see
felhom-agent's REPORT for P1 spike / P2 passthrough / P2 activation endpoint / P3 self-heal). All
live-validated on guest 9201; golden rebaked to 0.50.0.

## PART 1 — USB drive availability: DIAGNOSIS (GATED — Branch A, no blind build)
## P2C — enroll passes the drive into the guest (v0.48.0)
- `agentapi.GuestAttach(where)` → agent `POST /disks/guest-attach`. `runStorageInit` /
`runStorageAttach` / `handleStorageRegister` call `attachIntoGuest` after recording the StoragePath
(best-effort; a transient failure is logged — P3 self-heal completes it). Closes Branch A: an enrolled
drive becomes usable in the guest (app `HDD_PATH` writes land on `/dev/sdb1`; the "nem elérhető"
("not available") banner clears).
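
A minimal Go sketch of the best-effort ordering described above: the storage path is recorded first, and a failed guest attach is only logged so that a later self-heal pass can complete it. Only the name `GuestAttach` comes from the report; its signature here, the `attacher` interface, `enroll`, the `record` callback, and `failingAgent` are hypothetical stand-ins, not the controller's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// attacher is a hypothetical stand-in for the agentapi client.
type attacher interface {
	GuestAttach(where string) error
}

// enroll records the storage path, then tries the guest attach.
// An attach failure is logged but does not fail the enrollment.
func enroll(a attacher, record func(string) error, where string) error {
	if err := record(where); err != nil {
		return fmt.Errorf("record storage path %q: %w", where, err)
	}
	if err := a.GuestAttach(where); err != nil {
		log.Printf("guest attach for %s failed, leaving it to self-heal: %v", where, err)
	}
	return nil
}

// failingAgent simulates a transient agent failure (illustration only).
type failingAgent struct{}

func (failingAgent) GuestAttach(string) error { return errors.New("agent unreachable") }

func main() {
	var recorded []string
	record := func(w string) error { recorded = append(recorded, w); return nil }
	err := enroll(failingAgent{}, record, "/mnt/felhom-usb")
	fmt.Println(err, recorded)
}
```

The point of the ordering is that the registration survives a transient agent error, so the enrolled drive is not lost and only its activation is deferred.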

**Conclusion: BRANCH A confirmed — felhom-usb is NOT passed into guest 9201.** The drive is mounted on
the **host** only; the guest (and therefore the controller container and any app) cannot reach it. This
is the slice-10 additive-mount passthrough — **reported, not built.** The banner is correct.
## Activation-UX (v0.49.0)
- The host-side live inject is blocked on unprivileged LXC, so a drive enrolled into a *running* guest
activates at the next guest boot. Per decision: enroll persists (no forced reboot) + a user-triggered
restart.
- `pendingActivationDrives()` flags registered drives the agent reports present+attached but which
aren't a live mount in the container. The settings page shows a banner + a batched **"Újraindítás
most (~30 mp)"** ("Restart now (~30 s)") button → `POST /api/storage/activate` →
`agentapi.GuestReboot` → agent `POST /guest/reboot`. Live-validated: activate → guest reboots →
drive active.
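
The pending-activation rule above can be sketched in Go as a simple filter. This is an illustration under stated assumptions: the `driveState` type, its field names, and the `liveMounts` set are hypothetical, and the real `pendingActivationDrives()` may gather its inputs differently.

```go
package main

import "fmt"

// driveState is a hypothetical view of what the agent reports for a registered drive.
type driveState struct {
	Path     string
	Present  bool // the agent sees the device
	Attached bool // the agent has passed it into the guest config
}

// pendingActivation returns drives that are present and attached on the host side
// but are not yet a live mount in the container, i.e. they need a guest reboot.
func pendingActivation(registered []driveState, liveMounts map[string]bool) []string {
	var pending []string
	for _, d := range registered {
		if d.Present && d.Attached && !liveMounts[d.Path] {
			pending = append(pending, d.Path)
		}
	}
	return pending
}

func main() {
	drives := []driveState{
		{Path: "/mnt/felhom-usb", Present: true, Attached: true},
		{Path: "/mnt/other", Present: true, Attached: false},
	}
	fmt.Println(pendingActivation(drives, map[string]bool{}))
	// prints: [/mnt/felhom-usb]
}
```

Once the guest has rebooted and the path shows up as a live mount, the same filter returns nothing and the banner would clear.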

### Captured evidence (live, guest 9201 / felhom-pve)
**Host:**
- `pct config 9201` → mountpoints: only `mp9: …/bootstrap → /etc/felhom-bootstrap (ro)` and
`rootfs: local-lvm:vm-9201-disk-0,size=8G`. **No felhom-usb mp.**
- `findmnt /mnt/felhom-usb` → `/dev/sdb1 ext4` (mounted on the host).
- `/etc/pve/storage.cfg` → `dir: felhom-usb, path /mnt/felhom-usb, content backup, is_mountpoint 1`.
## P4 — dual-role drives + backup-aware wipe warning (v0.50.0)
- **4A:** a user-data drive is appdata AND backup-target-eligible (not locked to one role) — surfaced
in the drive overview's per-card purpose note. `felhom-pbs`/system/backup roles unchanged.
- **4B:** `handleStorageImpact` now also returns `backup_copies` — apps whose cross-drive (secondary)
backups are stored on the drive (`backupCopiesOnPath` scans `felhom-data/backups/secondary/<app>`,
skipping the shared restic repo / `_infra`). The type-to-confirm wipe/eject modal names them ("Ez a
meghajtó más alkalmazások biztonsági másolatait is tárolja — a törlés ezeket is eltávolítja", in
English: "This drive also stores backup copies of other applications; wiping it removes those too").
The wipe stays **customer-confirmable** (the copies are redundant; originals live on the source drive).
- **OUT OF SCOPE:** the cross-drive backup ENGINE (restic USB1↔USB2, scheduling, pruning) — a follow-on
slice (needs a 2nd physical drive to validate). The 4B detection is forward-compatible (empty until
the engine writes there).
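
The 4B scan can be sketched roughly as follows in Go. This is not the controller's `backupCopiesOnPath`; the signature, the caller-supplied `skip` set, and the treatment of a missing directory as "no copies" are assumptions made for illustration. The directory name of the shared restic repo is not given in the report, so it is left for the caller to put into `skip`.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// backupCopiesOnPath lists app names that have a directory under
// <drive>/felhom-data/backups/secondary, ignoring names in skip.
// A missing directory yields an empty result rather than an error.
func backupCopiesOnPath(drive string, skip map[string]bool) ([]string, error) {
	dir := filepath.Join(drive, "felhom-data", "backups", "secondary")
	entries, err := os.ReadDir(dir)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var apps []string
	for _, e := range entries {
		if !e.IsDir() || skip[e.Name()] {
			continue
		}
		apps = append(apps, e.Name())
	}
	return apps, nil
}

func main() {
	// Build a throwaway drive layout to exercise the scan.
	drive, err := os.MkdirTemp("", "drive")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(drive)
	base := filepath.Join(drive, "felhom-data", "backups", "secondary")
	for _, name := range []string{"immich", "_infra"} {
		if err := os.MkdirAll(filepath.Join(base, name), 0o755); err != nil {
			panic(err)
		}
	}
	apps, err := backupCopiesOnPath(drive, map[string]bool{"_infra": true})
	fmt.Println(apps, err)
	// prints: [immich] <nil>
}
```

Returning an empty list for a missing directory mirrors the forward-compatible behaviour noted above: until something writes under `secondary`, the wipe modal has nothing extra to name.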

**Guest:**
- `findmnt /mnt/felhom-usb` → **nothing** (not a mount in the guest).
- `ls -la /mnt/felhom-usb` → empty dir (just `.`/`..`), created `Jun 12 07:46`.
- `df -h /mnt/felhom-usb /` → **both** `/dev/mapper/pve-vm--9201--disk--0` (the 8 GB rootfs) — i.e.
`/mnt/felhom-usb` in the guest is a plain directory **on the rootfs**, not the external drive.
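
The `df` observation above (the directory and `/` report the same backing device) can be approximated programmatically. The Go sketch below compares the device ID of a path with that of its parent directory; `onSeparateDevice` is a hypothetical helper, not part of the controller, and it assumes a Unix-like system. It would not detect a bind mount from the same filesystem, so it is a heuristic rather than a replacement for `findmnt`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"syscall"
)

// onSeparateDevice reports whether path sits on a different device than its
// parent directory, which is what a mounted external drive would look like.
func onSeparateDevice(path string) (bool, error) {
	var self, parent syscall.Stat_t
	if err := syscall.Stat(path, &self); err != nil {
		return false, fmt.Errorf("stat %s: %w", path, err)
	}
	if err := syscall.Stat(filepath.Dir(path), &parent); err != nil {
		return false, fmt.Errorf("stat parent of %s: %w", path, err)
	}
	return self.Dev != parent.Dev, nil
}

func main() {
	separate, err := onSeparateDevice("/mnt/felhom-usb")
	if err != nil {
		fmt.Println("cannot check:", err)
		return
	}
	fmt.Println("separate device:", separate)
}
```

In the situation captured above, such a check inside the guest would report the directory as being on the same device as its parent, consistent with it being a plain directory on the rootfs.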
## Live validation (9201)
- P2C: app bytes on `/dev/sdb1`, banner `[PASS] 1 connected, 0 disconnected`.
- Activation: `/api/storage/activate` → reboot → drive active.
- P4: `/api/storage/impact?where=/mnt/felhom-usb` → `backup_copies:[]`; after creating
`felhom-data/backups/secondary/immich` → `backup_copies:["immich"]` (detection live).
- `go test ./internal/{web,agentapi}/` green; golden rebaked to 0.50.0, build guest purged.

**Controller container:**
- `stat /mnt/felhom-usb` → **No such file or directory** (the de-privileged container has no `/mnt`
bind) → `os.Stat` in `monitor/healthcheck.go` fails → the "Adattároló nem elérhető" ("Storage not
available") banner.
- logs: `Storage paths: 0 connected, 1 disconnected`.

**Host drive contents (real data exists on the host):** `du -sh /mnt/felhom-usb` = 8.0 G;
contains `Dokumentumok/`, `felhom_data/`, `storage/`, `dump/`, `lost+found/` (Feb–Jun timestamps) —
a prior bare-metal layout. **Apps in the guest cannot see any of it.**

### Flags (must be addressed by the slice-10 passthrough)
1. **App-data location correctness:** an app configured with `HDD_PATH=/mnt/felhom-usb` would write to
the **8 GB rootfs (local-lvm)**, silently, NOT the external drive. The only deployed app
(`actualbudget`) has **0 HDD mounts** (`ParseComposeHDDMounts: found 0`), so nothing is mislanding
*today* — but the risk is real the moment any app is placed on "external" storage.
2. **The banner was surfaced by the `Regisztrálás` ("Register") shortcut** (controller v0.45.0, prior task):
registering `/mnt/felhom-usb` (mounted on the host, not the guest) created the phantom empty dir on
rootfs (the `Jun 12 07:46` dir) and the `1 disconnected` probe. Registering a host-only drive should
arguably be refused until passthrough exists — a candidate guard for the slice-10 work.

### Slice-10 scope (NOT built here)
assign drive → attach as an LXC `mpN` on the guest (`reconcile/bringup.go:GuestMount`, marked "slice 10
wires") → mount propagation into the controller container → register. Gated per the spec; to be scoped
with this evidence in hand.

---

## PART 2 — Backups page: whole-guest backup visibility + manual trigger (v0.47.0, BUILT)

### 2C gate — quiesce ownership (confirmed before wiring)
**The CONTROLLER owns quiescing.** The `quiesce.Loop` (slice 8B) stops its app stacks → `POST /backup`
→ polls `/backup/status` → resumes; the agent's vzdump is crash-consistent only (an LXC has no
fsfreeze). So the manual trigger goes **through the loop**, never a bare `StartBackup` (which would be
crash-consistent and wouldn't stop apps). Guest 9201 is on lvm-thin → **snapshot mode**, so downtime is
the until-snapshot window (~10 s here), with 8B.2 early-resume.

### What shipped
- **2A agentapi:** `StatusResponse.Backup *BackupRecord`, `DueResponse.AgeSecs`, new
`RestoreTestStatus()`. Non-hollow tests (`backup_test.go`): parse the documented JSON; assert
`StartBackup` POSTs `/backup`.
- **2B Section "Rendszermentés (teljes mentés)"** ("System backup (full backup)"): read-only cards —
last whole-guest backup (time + size + **target PBS-vs-local**, from the archive volid), next-due
(`/backup/due` age vs cadence), restore-test, running phase. Agent-unreachable degrades to a note.
- **2C "Mentés most"** ("Back up now"): `quiesce.Loop` gains a mutex + `TriggerNow()` (single-flight via
`TryLock` + the persisted marker; `ErrBackupInProgress` on overlap; async, bounded by max-quiesce). New
`POST /api/guest-backup/trigger` + `GET /api/guest-backup/status` (distinct prefix from apiRouter's
app-data `/api/backup/{run,status}` — verified the collision and avoided shadowing). Button warns per
mode.
- **2D:** existing per-app DB-dump UI relabeled under an "Alkalmazás-mentések (adatbázis + konfiguráció)"
("App backups (database + configuration)") divider, distinct from the whole-guest tier.
- **2E config:** OUT OF SCOPE (hub-served policy, slice 10) — no agent config surface added.
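
The single-flight behaviour of the 2C trigger can be sketched in Go with `sync.Mutex.TryLock` (available since Go 1.18). This is an illustration only: the `loop` type and the `run` callback are hypothetical, the persisted marker and the max-quiesce bound mentioned above are omitted, and the real `quiesce.Loop` may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrBackupInProgress is returned when a trigger overlaps a running backup.
var ErrBackupInProgress = errors.New("backup already in progress")

// loop is a hypothetical, stripped-down stand-in for the quiesce loop.
type loop struct {
	mu sync.Mutex
}

// TriggerNow starts run in the background unless another run holds the lock.
// The lock is taken synchronously, so an overlapping call fails immediately.
func (l *loop) TriggerNow(run func()) error {
	if !l.mu.TryLock() {
		return ErrBackupInProgress
	}
	go func() {
		defer l.mu.Unlock()
		run()
	}()
	return nil
}

func main() {
	l := &loop{}
	release := make(chan struct{})
	done := make(chan struct{})

	first := l.TriggerNow(func() { <-release; close(done) })
	second := l.TriggerNow(func() {}) // overlaps the first run

	fmt.Println(first, second)
	// prints: <nil> backup already in progress

	close(release)
	<-done
}
```

An HTTP handler built on this would typically map `ErrBackupInProgress` to a 409 response, which matches the double-trigger result recorded in the live validation.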

### Live validation (guest 9201, against the agent API — not REPORT)
- Agent `curl`: `/backup/status` → done, backup `local:backup/vzdump-lxc-9201-…`, snapshot, 1.4 GB,
success; `/backup/due` → due=false, within cadence; `/restore-test/status` → null.
- `GET /backups` → **200**; Section 1 renders "Utolsó teljes mentés" ("Last full backup"),
"Helyi tároló (local)" ("Local storage (local)"), "Visszaállítás ellenőrizve / Még nem futott"
("Restore verified / Not yet run"), "Mentés most" ("Back up now"); "Alkalmazás-mentések"
("App backups") divider present.
- **Manual trigger:** `POST /api/guest-backup/trigger` → `{started:true}`; quiesce logs show
quiesce `actualbudget` → job started → **snapshotted → early-resume (8B.2) → done**; phase polled
`snapshotted → done`; a **new backup recorded** (`…11_17_38.tar.zst`); `actualbudget` back up+healthy;
quiesce marker cleared (no stranded quiesce).
- **Single-flight:** concurrent double-trigger → one `{started:true}`, one
`{"error":"mentés már folyamatban van"}` ("a backup is already in progress") (409).
- `go test ./internal/{web,agentapi,quiesce}/` green; `go build ./...` clean.

### Deploy
Built + pushed `felhom-controller:0.47.0`; deployed to guest 9201 (healthy). Golden rebaked to 0.47.0.
## Note (carried from P2)
The controller's app/backup path helpers still join `felhom-data` under the registered drive path; in
Model A the in-guest mount IS the felhom-data namespace, so backup paths double-nest
(`felhom-data/felhom-data/...`) — functional but untidy. Reconcile when wiring app-data-backup-to-drive
(not in this slice; app `HDD_PATH` data lands correctly today).