v0.44.0: role-aware drive management — protected lockout + customer type-to-confirm wipe + drive-list restyle

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 21:44:50 +02:00
parent 2c32c821fe
commit 12064dcd88
13 changed files with 696 additions and 182 deletions
+53 -53
@@ -1,61 +1,61 @@
# REPORT — v0.43.0: rebuilt storage management (guided init/attach/eject on the agent disk model)
# REPORT — felhom-controller v0.44.0: role-aware drive management + customer type-to-confirm wipe
**Repo:** `felhom-controller` · **Version:** 0.43.0 · **Date:** 2026-06-11
**Pushed commit:** `29a9dcd` · paired with `felhom-agent` v0.22.0 (`4734d4a`, exposes `durable_id`) + golden rebake.
**Repo:** `felhom-controller` · **Version:** 0.44.0 · **Date:** 2026-06-11 · pairs with **felhom-agent v0.23.0**
## What shipped
## What this implements
After the 8C de-privileging the storage UI's buttons pointed at deleted routes (all 404); only manual
"add already-mounted path" survived. The agent already owns disk execution + the data-bearing signature
gate, and the controller already had the `agentapi` client + `/api/disks/*` proxies + the `StoragePath`
registry. This is a **controller-only UI/orchestration layer** over those — the controller holds **no
destructive authority**.
The controller half of the storage-authorization redesign (CC SPEC, Part B). The drive UI is driven by
the agent's authoritative **role** (`system` | `backup` | `user-data`): system/backup are visibly
protected with no destructive controls; the customer manages their own data drives with informed
consent (type-to-confirm + named app impact).
- **Storage overview** (`settings.html`, `GET /api/disks`): the agent's live disk view — name/type/state/
device/mount/class + the **`data_bearing` badge** + a "registered?" cross-reference.
- **Guided init** (`/settings/storage/init` + `POST /api/storage/init`): format → resolve the new fs UUID
from the re-listed disks → assign (mount) → register the `StoragePath`. **A data-bearing device is
REFUSED** by the agent; the UI surfaces the exact `felhom-opsign …` command and **stops** — no force-format.
- **Guided attach** (`/settings/storage/attach` + `POST /api/storage/attach`): non-destructive — resolve
the existing fs UUID → assign → register.
- **Eject** (`POST /api/storage/eject`): benign unmount + deregister, surfacing the agent's dependent-guest warning.
- **`agentapi`**: `DiskInfo.DurableID` + `FSUUID()` (the assign key — strip `uuid:`); `FormatResult.PendingOp`
+ `OpsignCommand()`, now parsed from the agent's 403 body (the old client discarded it).
- **Honest buttons**: init/attach wired; migrate (drive + per-stack, both places) disabled "Hamarosan" — **no 404s**.
- **Phase 3 (de-priv template debt)**: removed the dead `CrossDrive*` blocks in `deploy.html` (the "2.
mentés" form + 3 JS fns) and `backups.html` (run buttons + 2 JS fns) — they referenced fields the
de-privileged handlers no longer provide.
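
A minimal sketch of the `uuid:` stripping that `FSUUID()` is described as doing. The free function `fsUUID`, its `(string, bool)` return shape, and the sample id `uuid:1234-abcd` are assumptions for illustration, not the actual `agentapi` code; the `byid:` value is the one quoted in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// fsUUID (hypothetical name) extracts the filesystem UUID from a durable id
// of the form "uuid:<value>". Any other scheme, such as "byid:", yields false
// because it carries no fs UUID to use as the assign key.
func fsUUID(durableID string) (string, bool) {
	v, ok := strings.CutPrefix(durableID, "uuid:")
	if !ok || v == "" {
		return "", false
	}
	return v, true
}

func main() {
	id, ok := fsUUID("uuid:1234-abcd") // made-up sample value
	fmt.Println(id, ok)

	id, ok = fsUUID("byid:wwn-0x5000039ddb108568-part1")
	fmt.Printf("%q %v\n", id, ok)
}
```

Returning a boolean rather than an empty string alone lets a caller distinguish "no fs UUID available" from a malformed id before it attempts an assign.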
### B0 — `agentapi` client
- `DiskInfo` += `role`, `total_bytes`, `used_bytes`, `used_fraction`.
- `FormatResult` += `role`, `needs_confirmation`, `durable_id`.
- `FormatDisk(ctx, device, fstype, confirmed, durableID)` — new `ErrNeedsConfirmation` (user-data,
awaiting the customer's confirmation) vs `ErrFormatRefused` (system/backup, operator signature).
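
One way a caller might branch on the two errors, sketched with `errors.Is`. The error messages, the `nextStep` helper and its return strings are assumptions; only the two sentinel names come from this report.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names as listed in B0; the message texts are assumed.
var (
	ErrNeedsConfirmation = errors.New("format needs customer confirmation")
	ErrFormatRefused     = errors.New("format refused: operator signature required")
)

// nextStep (hypothetical) maps a FormatDisk error to the action the UI takes.
func nextStep(err error) string {
	switch {
	case err == nil:
		return "formatted"
	case errors.Is(err, ErrNeedsConfirmation):
		return "ask-customer" // user-data: show the type-to-confirm modal
	case errors.Is(err, ErrFormatRefused):
		return "show-opsign" // system/backup: surface the operator command
	default:
		return "error"
	}
}

func main() {
	// errors.Is still matches when the sentinel is wrapped with %w.
	wrapped := fmt.Errorf("format /dev/sdb1: %w", ErrNeedsConfirmation)
	fmt.Println(nextStep(wrapped))
	fmt.Println(nextStep(ErrFormatRefused))
}
```

Keeping the two outcomes as distinct sentinels means a handler cannot mistake an operator-only refusal for something the customer may confirm.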
## Security invariant — held, proven live
The UI **never** bypasses the agent's data-bearing gate; there is **no force-format**. A refusal surfaces
the `felhom-opsign` command only. Unit-tested (`runStorageInit` on a data-bearing refusal performs **zero**
assign/register) **and** proven live on 9201's real `sdb`:
`POST /api/storage/init {device:/dev/sdb1}` → **HTTP 409**, `refused:true`, `registered:false`,
`opsign: felhom-opsign -op storage_wipe -host demo-felhom-01 -durable-id byid:wwn-0x5000039ddb108568-part1`.
No format, no mount, no registration.
### B1 — Role-aware overview (no destructive controls on protected drives)
- `settings.html` "Meghajtók (ügynök nézet)" restyled from a raw `<table>` to **cards** (house style):
name prominent, mono device/mount sub-detail, badges for class / data / **role** (🔒 lock for
system & backup) / registered, and a **capacity bar** reusing the monitoring `system-bar`
(green→amber→red). Eject/Wipe controls render **only** for user-data drives mounted under `/mnt`.
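
The visibility rule and the bar colouring could look roughly like this. Both function names are hypothetical, and the 0.75 / 0.90 thresholds are assumed, since the report does not state where the bar changes colour.

```go
package main

import (
	"fmt"
	"strings"
)

// showDestructiveControls (hypothetical) reports whether Eject/Wipe should
// render: only for user-data drives whose mount point lies under /mnt.
// The trailing slash keeps look-alikes such as /mntx from matching.
func showDestructiveControls(role, mount string) bool {
	return role == "user-data" && strings.HasPrefix(mount, "/mnt/")
}

// capacityClass (hypothetical) picks the bar colour from used_fraction.
// The 0.75 and 0.90 cut-offs are assumptions for this sketch.
func capacityClass(usedFraction float64) string {
	switch {
	case usedFraction >= 0.90:
		return "red"
	case usedFraction >= 0.75:
		return "amber"
	default:
		return "green"
	}
}

func main() {
	fmt.Println(showDestructiveControls("user-data", "/mnt/felhom-usb"))
	fmt.Println(showDestructiveControls("backup", "/mnt/backup"))
	fmt.Println(capacityClass(0.42), capacityClass(0.80), capacityClass(0.95))
}
```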
## Live validation (guest 9201, real 1TB USB `sdb` = `felhom-usb`)
- `/api/disks` now carries `durable_id`; `felhom-usb` → `/dev/sdb1`, `data_bearing:true` ("device is
mounted"), `durable_id:uuid:277a2179-…`. Overview badge maps correctly.
- **Init on sdb (data-bearing) → 409 + opsign, gate held** (the spec's passing gate test — sdb holds data).
- Pages render (no 404/500): `/settings`, `/settings/storage/init`, `/settings/storage/attach`,
`/stacks/<app>/deploy` (deploy.html — CrossDrive removed), `/stacks`, `/monitoring`. No dead storage links.
- Tests: refusal-surfaces-opsign-and-does-NOT-mount/register; success assigns with the resolved UUID +
registers the expected `StoragePath`; UUID resolution; a **template-parse test** guards every page.
### B2 — Customer wipe/eject flow
- **Name the apps**: `GET /api/storage/impact?where=` → `appsUsingPath` → the deployed apps (by display
name) whose `HDD_PATH` is that mount. Shown in the modal before any destructive action.
- **Type-to-confirm**: a modal with a text field; the destructive button stays disabled until the typed
value equals the mount name exactly (enforced client-side AND server-side in `/api/storage/wipe`).
- **Wipe** (`POST /api/storage/wipe`): eject (unmount + deregister) → server-side two-step
customer-confirmed format (learn the agent's durable id via the NeedsConfirmation response, then
re-submit `confirmed:true` bound to it). Deregisters the StoragePath.
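
The server-side half of the flow above, sketched against a fake agent. The `formatter` interface, the `(FormatResult, error)` return shape, `confirmedFormat`, `fakeAgent`, `runDemo`, the `ext4` fstype and the `uuid:example` id are all assumptions for illustration; the eject step is left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

var ErrNeedsConfirmation = errors.New("needs customer confirmation")

// FormatResult carries only the field this sketch needs.
type FormatResult struct{ DurableID string }

// formatter is a hypothetical stand-in for the agentapi client.
type formatter interface {
	FormatDisk(ctx context.Context, device, fstype string, confirmed bool, durableID string) (FormatResult, error)
}

// confirmedFormat re-checks the typed name, probes unconfirmed to learn the
// durable id, then re-submits confirmed and bound to exactly that id.
func confirmedFormat(ctx context.Context, f formatter, device, fstype, typed, mountName string) error {
	if typed != mountName {
		return errors.New("typed text does not match the mount name")
	}
	res, err := f.FormatDisk(ctx, device, fstype, false, "")
	if err == nil {
		return nil // blank device: formatted without confirmation
	}
	if !errors.Is(err, ErrNeedsConfirmation) {
		return err // e.g. a system/backup refusal is never retried
	}
	_, err = f.FormatDisk(ctx, device, fstype, true, res.DurableID)
	return err
}

// fakeAgent demands confirmation and checks the id binding.
type fakeAgent struct{ calls int }

func (a *fakeAgent) FormatDisk(_ context.Context, _, _ string, confirmed bool, durableID string) (FormatResult, error) {
	a.calls++
	if !confirmed {
		return FormatResult{DurableID: "uuid:example"}, ErrNeedsConfirmation
	}
	if durableID != "uuid:example" {
		return FormatResult{}, errors.New("durable id mismatch")
	}
	return FormatResult{DurableID: durableID}, nil
}

// runDemo returns how many agent calls were made and the final error.
func runDemo(typed string) (int, error) {
	a := &fakeAgent{}
	err := confirmedFormat(context.Background(), a, "/dev/sdb1", "ext4", typed, "felhom-usb")
	return a.calls, err
}

func main() {
	n, err := runDemo("felhom-usb")
	fmt.Println(n, err)
	n, err = runDemo("felhom")
	fmt.Println(n, err)
}
```

Checking the typed name before the first agent call means a client that skips the modal still never reaches the agent, and binding the second call to the returned id ties the confirmation to the device that was actually probed.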
## Deferred / flagged (NOT in this slice)
- **Phase 2 — migration (controller-side rsync):** intentionally its own slice (the migrate buttons are
disabled "Hamarosan", not dead). The controller still has `/mnt:/mnt:rw`, so it can rsync app-data
between mounts + update `app.yaml`'s `HDD_PATH` (stop→rsync→verify→start) — no agent endpoint needed.
- **`/backups` still 500s on PRE-EXISTING restic debt (NOT this change, NOT CrossDrive).** The page
references ~30 dead restic-tier fields (`.Backup.RepoStats`, `.SnapshotHistory`, `.ResticSchedule`,
`.Retention`, `.LastBackup`, `.NextBackup`, `.LastCheckOK`, …) that 8C removed from the backend — the
whole restic snapshot tier + repo stats + snapshot history + restic-password UI is dead. That's a
**backups-page de-priv rebuild** (a design slice: what the page shows in the app-data-only model), well
beyond the CrossDrive cleanup this spec scoped (the spec listed "backups.html (5)" = the CrossDrive refs,
which I removed). `/backups` was already 500'ing before this task. **Recommend it as the next slice.**
### B3 — Init/attach role-gated + restyled
- `storage_init.html`: data-bearing path now uses the **customer-confirmation** flow (type-to-confirm →
re-submit confirmed) instead of the `felhom-opsign` instruction; selector restyled to cards, lists
**only user-data** targets (system/backup are not offered). The opsign surface remains as a fallback
if a protected device somehow reaches it.
- `storage_attach.html`: restyled to cards, user-data only.
## Notes
- No agent disk-subsystem or gate changes; the only agent change is the read-only `durable_id` exposure
(v0.22.0) the user approved (without it the de-privileged controller can't learn the fs UUID `assign`
needs). Golden rebaked with controller 0.43.0 so fresh provisions get the rebuilt UI.
### B4 — UI polish
- New CSS appended to `style.css` (reusing existing tokens): `.drive-card` / `.drive-badges` /
`.drive-cap`, the missing `.badge-ok` / `.badge-lock` / `.badge-muted` / standalone `.mono`, and the
type-to-confirm `.confirm-overlay` / `.confirm-box`. No new design system; no raw table left.
## Tests
- `internal/agentapi/disks_test.go` — blank → ok; system/backup → `ErrFormatRefused` (+pending op);
user-data → `ErrNeedsConfirmation` (+durable id + role); confirmed → formatted.
- `internal/web/storage_handlers_test.go` — init on a user-data data-bearing device returns
NeedsConfirmation and does NO mount/register; confirmed init forwards the confirmation+durable id and
proceeds to register; system/backup init still surfaces the opsign; `appsUsingPathIn` names the right
deployed apps (dependency-impact). Plus `TestTemplatesParse` (all templates parse).
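
For context on the dependency-impact test, a lookup of this kind might be shaped as below. The `app` struct, the function signature, the path normalisation and the app names are assumptions; the report only says the lookup returns the display names of deployed apps whose `HDD_PATH` is the given mount.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// app is a hypothetical, trimmed view of a deployed app.
type app struct {
	DisplayName string
	HDDPath     string // value of HDD_PATH; empty when the app uses no drive
}

// appsUsingPathIn returns the sorted display names of apps whose HDD_PATH
// equals mount. Both sides are cleaned so a trailing slash does not matter.
func appsUsingPathIn(apps []app, mount string) []string {
	want := path.Clean(mount)
	var names []string
	for _, a := range apps {
		if a.HDDPath != "" && path.Clean(a.HDDPath) == want {
			names = append(names, a.DisplayName)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	apps := []app{ // made-up sample data
		{DisplayName: "Photos", HDDPath: "/mnt/felhom-usb/"},
		{DisplayName: "Media", HDDPath: "/mnt/felhom-usb"},
		{DisplayName: "Notes", HDDPath: "/mnt/other"},
		{DisplayName: "Wiki"},
	}
	fmt.Println(appsUsingPathIn(apps, "/mnt/felhom-usb"))
}
```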
`go build ./... && go vet ./... && go test ./...` all green (Go 1.26, local).
## Version
`v0.43.0 → v0.44.0` (build-time ldflags `Version`).
## Live validation / deploy
Pending in this session: build+push the image, deploy to guest 9201, rebake the golden, and run the
live checks (overview tiers, the hand-issued `confirmed:true` refusal on a system/backup device, a
user-data confirmed wipe binding to the durable id, and the audit-log entry).