docs(v0.43.0): REPORT (storage mgmt rebuild) + README agent-delegated storage note
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -1,46 +1,61 @@
-# REPORT — v0.42.1: real Let's Encrypt wildcard cert (wildcard proactive issuance)
+# REPORT — v0.43.0: rebuilt storage management (guided init/attach/eject on the agent disk model)
 
-**Repo:** `felhom-controller` · **Version:** 0.42.1 · **Date:** 2026-06-11
-**Pushed commits:** `84c3e84` (v0.42.0, superseded) → `e61e7dd` (v0.42.1) · paired with `felhom-agent`
-v0.21.0 (split-horizon LAN resolver — depends on this real cert) + golden rebake (controller 0.42.1).
+**Repo:** `felhom-controller` · **Version:** 0.43.0 · **Date:** 2026-06-11
+**Pushed commit:** `29a9dcd` · paired with `felhom-agent` v0.22.0 (`4734d4a`, exposes `durable_id`) + golden rebake.
 
 ## What shipped
 
-The base-infra traefik obtained **no** real cert (acme.json empty) — both routers relied on the
-websecure entrypoint-default `certResolver`, which does **not** trigger proactive DNS-01 issuance, so
-everything ran on traefik's self-signed default (masked externally only by the tunnel's `noTLSVerify`).
-This blocked LAN-direct (a LAN client TLS-handshakes straight to traefik and needs the real cert).
+After the 8C de-privileging the storage UI's buttons pointed at deleted routes (all 404); only manual
+"add already-mounted path" survived. The agent already owns disk execution + the data-bearing signature
+gate, and the controller already had the `agentapi` client + `/api/disks/*` proxies + the `StoragePath`
+registry. This is a **controller-only UI/orchestration layer** over those — the controller holds **no
+destructive authority**.
 
-- **`infra.RenderControllerRoute(domain, wildcardTLS)`** — the always-present controller route is now
-the **wildcard-issuance anchor**: when DNS-01 ACME is configured (`CFAPIToken && Email`) it carries
-router-level `tls.certResolver: letsencrypt` + `tls.domains: [{main: "*.<domain>", sans: ["<domain>"]}]`,
-so traefik **proactively obtains `*.<domain>` + apex at startup** via Cloudflare DNS-01. Every other
-router (filebrowser, future apps) then serves that one wildcard by SNI match — **no per-app
-certresolver labels**, real cert ready before the first client connects. `stacks.wireController` passes
-`wildcardTLS = (CFAPIToken != "" && Email != "")`.
-- **Key empirical finding (staging on 9201):** traefik v3 issues a cert from a **router-level**
-`tls.domains` but **NOT** from the entrypoint-level `http.tls.domains` (acme.json stayed 0 bytes with
-the latter). v0.42.0's entrypoint-domains attempt + `TraefikData.Domain` was reverted.
+- **Storage overview** (`settings.html`, `GET /api/disks`): the agent's live disk view — name/type/state/
+device/mount/class + the **`data_bearing` badge** + a "registered?" cross-reference.
+- **Guided init** (`/settings/storage/init` + `POST /api/storage/init`): format → resolve the new fs UUID
+from the re-listed disks → assign (mount) → register the `StoragePath`. **A data-bearing device is
+REFUSED** by the agent; the UI surfaces the exact `felhom-opsign …` command and **stops** — no force-format.
+- **Guided attach** (`/settings/storage/attach` + `POST /api/storage/attach`): non-destructive — resolve
+the existing fs UUID → assign → register.
+- **Eject** (`POST /api/storage/eject`): benign unmount + deregister, surfacing the agent's dependent-guest warning.
+- **`agentapi`**: `DiskInfo.DurableID` + `FSUUID()` (the assign key — strip `uuid:`); `FormatResult.PendingOp`
++ `OpsignCommand()`, now parsed from the agent's 403 body (the old client discarded it).
+- **Honest buttons**: init/attach wired; migrate (drive + per-stack, both places) disabled "Hamarosan" — **no 404s**.
+- **Phase 3 (de-priv template debt)**: removed the dead `CrossDrive*` blocks in `deploy.html` (the "2.
+mentés" form + 3 JS fns) and `backups.html` (run buttons + 2 JS fns) — they referenced fields the
+de-privileged handlers no longer provide.
 
-## Validation (staging → prod on guest 9201)
-- **CF token pre-check:** active, scoped to `demo-felhom.eu`, DNS read OK (DNS:Edit confirmed by the run).
-- **Staging (Fake LE):** with the router-level wildcard, acme.json went 0 → populated; `felhom.*`,
-`files.*`, and arbitrary `anything.demo-felhom.eu` all presented `*.demo-felhom.eu` (issuer `(STAGING)`)
-— one wildcard, zero per-app labels.
-- **Prod switch:** wiped acme.json, clean first-boot render (no `caServer`, no entrypoint domains) →
-PROD wildcard issued (acme.json ~16 KB).
-- **GATE (from dooplex, real LAN host, direct to guest IP):** `felhom.demo-felhom.eu` + `files.demo-felhom.eu`
-→ **`200 ssl_verify=0`**; issuer **`C=US, O=Let's Encrypt, CN=YR1`** (real prod LE); subject
-`CN=*.demo-felhom.eu`, SAN `*.demo-felhom.eu, demo-felhom.eu`. Dashboard title `Vezérlőpult — Felhom.eu`.
+## Security invariant — held, proven live
+The UI **never** bypasses the agent's data-bearing gate; there is **no force-format**. A refusal surfaces
+the `felhom-opsign` command only. Unit-tested (`runStorageInit` on a data-bearing refusal performs **zero**
+assign/register) **and** proven live on 9201's real `sdb`:
+`POST /api/storage/init {device:/dev/sdb1}` → **HTTP 409**, `refused:true`, `registered:false`,
+`opsign: felhom-opsign -op storage_wipe -host demo-felhom-01 -durable-id byid:wwn-0x5000039ddb108568-part1`.
+No format, no mount, no registration.
 
-## Live deploy
-- Built + pushed `0.42.1`; deployed to 9201 (clean first-boot: wipe acme.json → controller renders prod
-traefik.yml + wildcard route → traefik obtains the prod wildcard).
-- **Golden rebaked** with controller 0.42.1 → `local:backup/vzdump-lxc-9100-2026_06_11-18_10_11.tar.zst`
-(fresh provisions get a real wildcard cert on first boot).
+## Live validation (guest 9201, real 1TB USB `sdb` = `felhom-usb`)
+- `/api/disks` now carries `durable_id`; `felhom-usb` → `/dev/sdb1`, `data_bearing:true` ("device is
+mounted"), `durable_id:uuid:277a2179-…`. Overview badge maps correctly.
+- **Init on sdb (data-bearing) → 409 + opsign, gate held** (the spec's passing gate test — sdb holds data).
+- Pages render (no 404/500): `/settings`, `/settings/storage/init`, `/settings/storage/attach`,
+`/stacks/<app>/deploy` (deploy.html — CrossDrive removed), `/stacks`, `/monitoring`. No dead storage links.
+- Tests: refusal-surfaces-opsign-and-does-NOT-mount/register; success assigns with the resolved UUID +
+registers the expected `StoragePath`; UUID resolution; a **template-parse test** guards every page.
 
+## Deferred / flagged (NOT in this slice)
+- **Phase 2 — migration (controller-side rsync):** intentionally its own slice (the migrate buttons are
+disabled "Hamarosan", not dead). The controller still has `/mnt:/mnt:rw`, so it can rsync app-data
+between mounts + update `app.yaml`'s `HDD_PATH` (stop→rsync→verify→start) — no agent endpoint needed.
+- **`/backups` still 500s on PRE-EXISTING restic debt (NOT this change, NOT CrossDrive).** The page
+references ~30 dead restic-tier fields (`.Backup.RepoStats`, `.SnapshotHistory`, `.ResticSchedule`,
+`.Retention`, `.LastBackup`, `.NextBackup`, `.LastCheckOK`, …) that 8C removed from the backend — the
+whole restic snapshot tier + repo stats + snapshot history + restic-password UI is dead. That's a
+**backups-page de-priv rebuild** (a design slice: what the page shows in the app-data-only model), well
+beyond the CrossDrive cleanup this spec scoped (the spec listed "backups.html (5)" = the CrossDrive refs,
+which I removed). `/backups` was already 500'ing before this task. **Recommend it as the next slice.**
+
 ## Notes
-- The traefik **dashboard route** (`dynamic/dashboard.yml`) remains deferred (needs a generated
-basic-auth hash) — routing/cert for filebrowser + controller work without it.
-- HTTP-01 path (ACME email but no CF token) can't issue wildcards → falls back to a plain TLS route
-(self-signed). The felhom production always uses Cloudflare DNS-01, so the wildcard path is the norm.
+- No agent disk-subsystem or gate changes; the only agent change is the read-only `durable_id` exposure
+(v0.22.0) the user approved (without it the de-privileged controller can't learn the fs UUID `assign`
+needs). Golden rebaked with controller 0.43.0 so fresh provisions get the rebuilt UI.
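The guided-init flow and the security invariant described in the REPORT hunk (format → resolve fs UUID → assign → register, stopping on a data-bearing refusal with zero assign/register calls) can be illustrated with a small Go sketch. It is not the controller's `runStorageInit`: the `Agent` interface, `InitResult`, `runInitSketch`, and `refusingAgent` are hypothetical names introduced only for illustration, and the real `agentapi` signatures may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Agent is a HYPOTHETICAL stand-in for the controller's agent client.
// The real agentapi method names and signatures are not shown in the commit.
type Agent interface {
	FormatDisk(device string) (opsign string, refused bool, err error)
	ResolveFSUUID(device string) (string, error)
	AssignDisk(fsUUID string) (mountPath string, err error)
}

// InitResult mirrors the fields reported in the live response
// (refused, registered, opsign); the struct itself is an assumption.
type InitResult struct {
	Refused    bool
	Registered bool
	Opsign     string
	MountPath  string
}

// runInitSketch orchestrates format -> resolve -> assign -> register.
// A refusal from the agent short-circuits: nothing is mounted or registered.
func runInitSketch(a Agent, register func(path string) error, device string) (InitResult, error) {
	opsign, refused, err := a.FormatDisk(device)
	if err != nil {
		return InitResult{}, fmt.Errorf("format %s: %w", device, err)
	}
	if refused {
		// Surface the signing command and stop; there is no force path.
		return InitResult{Refused: true, Opsign: opsign}, nil
	}
	uuid, err := a.ResolveFSUUID(device)
	if err != nil {
		return InitResult{}, fmt.Errorf("resolve uuid for %s: %w", device, err)
	}
	path, err := a.AssignDisk(uuid)
	if err != nil {
		return InitResult{}, fmt.Errorf("assign %s: %w", uuid, err)
	}
	if err := register(path); err != nil {
		return InitResult{}, fmt.Errorf("register %s: %w", path, err)
	}
	return InitResult{Registered: true, MountPath: path}, nil
}

// refusingAgent simulates the agent's verdict on a data-bearing device.
type refusingAgent struct{ assignCalls int }

func (r *refusingAgent) FormatDisk(string) (string, bool, error) {
	return "felhom-opsign -op storage_wipe -host demo-felhom-01 -durable-id byid:wwn-0x5000039ddb108568-part1", true, nil
}

func (r *refusingAgent) ResolveFSUUID(string) (string, error) {
	return "", errors.New("must not be called after a refusal")
}

func (r *refusingAgent) AssignDisk(string) (string, error) {
	r.assignCalls++
	return "", errors.New("must not be called after a refusal")
}

func main() {
	agent := &refusingAgent{}
	registerCalls := 0
	res, err := runInitSketch(agent, func(string) error { registerCalls++; return nil }, "/dev/sdb1")
	fmt.Println(res.Refused, res.Registered, agent.assignCalls, registerCalls, err)
	fmt.Println(res.Opsign)
}
```

Putting the refusal check directly after the format call keeps the gate decision entirely with the agent: the orchestrator has no branch that could reach assign or register once `refused` is set.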
@@ -514,6 +514,28 @@ not just those with HDD data. Non-HDD apps can configure destination, method, an
 
 ### 4. Storage Management
 
+> **⚠️ Rebuilt on the agent-delegated disk model (v0.43.0).** After the 8C de-privileging, the controller
+> holds **no Proxmox/disk credentials and no destructive authority** — disk execution + the data-bearing
+> signature gate live entirely in the **host agent**. The controller is now a thin presenter/orchestrator:
+> - **Overview** (`settings.html` ← `GET /api/disks`): the agent's live disk view (name/type/state/device/
+> mount/class) + the **`data_bearing`** badge + "registered?" cross-reference.
+> - **Guided init** (`/settings/storage/init`, `POST /api/storage/init`, `web/storage_handlers.go`): format
+> → resolve the new fs UUID from the re-listed disks (`durable_id`, `uuid:`-stripped) → `assign` (mount)
+> → register a `StoragePath`. **A data-bearing device is REFUSED by the agent** (`pending_op`); the UI
+> surfaces the exact `felhom-opsign -op storage_wipe -host … -durable-id …` command and stops — **there
+> is no force-format**. The agent's `data_bearing` verdict (it inspects the device) is ground truth.
+> - **Guided attach** (`/settings/storage/attach`, `POST /api/storage/attach`): non-destructive — resolve
+> the existing fs UUID → `assign` → register.
+> - **Eject** (`POST /api/storage/eject`): benign unmount + deregister, with the agent's dependent-guest warning.
+> - **`agentapi`** (`internal/agentapi`) is the pinned client to the agent local API: `Disks`/`AssignDisk`/
+> `EjectDisk`/`FormatDisk`; `DiskInfo.FSUUID()` + `FormatResult.PendingOp.OpsignCommand()`.
+> - The **`StoragePath` registry** (`settings.go`: `AddStoragePath`/default/schedulable/label) is unchanged —
+> init/attach register into it; the existing per-path management handlers stay.
+> - **Migration** (drive + per-stack) is **deferred** to its own slice (buttons disabled "Hamarosan").
+>
+> The privileged controller-side disk subsections **below are historical** (the `internal/storage/*` scan/
+> format code was removed in 8C — execution is the agent's now).
+
 The storage subsystem handles the full lifecycle of external storage: detection, initialization, path registration, and data migration.
 
 #### Disk Scanning (`internal/storage/scan.go`)
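The README note names two small helpers, `DiskInfo.FSUUID()` (the assign key, with the `uuid:` prefix stripped from `durable_id`) and `PendingOp.OpsignCommand()`. A minimal Go sketch of what such helpers could look like follows. The struct fields (`Op`, `Host`, `DurableID`) and the behaviour for non-`uuid:` IDs are assumptions, not the actual `internal/agentapi` code; only the command layout is taken from the live output quoted in the REPORT.

```go
package main

import (
	"fmt"
	"strings"
)

// DiskInfo is a reduced, ASSUMED shape of the agent's disk record.
type DiskInfo struct {
	Device    string
	DurableID string // e.g. "uuid:<fs-uuid>" or "byid:<wwn-…>"
}

// FSUUID returns the filesystem UUID when the durable ID carries the
// "uuid:" prefix, and "" otherwise (assumed behaviour for other schemes).
func (d DiskInfo) FSUUID() string {
	const prefix = "uuid:"
	if strings.HasPrefix(d.DurableID, prefix) {
		return strings.TrimPrefix(d.DurableID, prefix)
	}
	return ""
}

// PendingOp is an ASSUMED shape of the pending operation returned in
// the agent's refusal body; the field names are illustrative.
type PendingOp struct {
	Op        string
	Host      string
	DurableID string
}

// OpsignCommand renders the operator command in the layout shown in the
// REPORT's live output: felhom-opsign -op … -host … -durable-id ….
func (p PendingOp) OpsignCommand() string {
	return fmt.Sprintf("felhom-opsign -op %s -host %s -durable-id %s", p.Op, p.Host, p.DurableID)
}

func main() {
	// The UUID below is a made-up placeholder, not a value from the commit.
	disk := DiskInfo{Device: "/dev/sdb1", DurableID: "uuid:0000-example"}
	fmt.Println(disk.FSUUID())

	op := PendingOp{
		Op:        "storage_wipe",
		Host:      "demo-felhom-01",
		DurableID: "byid:wwn-0x5000039ddb108568-part1",
	}
	fmt.Println(op.OpsignCommand())
}
```

Returning an empty string for a `byid:` ID lets a caller treat "no fs UUID yet" as a distinct, checkable state instead of passing a non-UUID key to `assign`.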