diff --git a/TASK.md b/TASK.md
index 7fbba80..dd17be2 100644
--- a/TASK.md
+++ b/TASK.md
@@ -1,36 +1,377 @@
-# BUGFIX: Missing disk tools (sfdisk, mkfs.ext4, etc.)
+# TASK: Bugfix — Storage Initialization (FormatAndMount)
 
-## Problem
-`sfdisk` not found in container PATH → partitioning fails.
+**Version:** 0.11.4 → 0.11.5
+**Priority:** Fix all 3 bugs + add safety improvements before testing format again.
 
-## Required packages
+## Context
 
-### In the controller Dockerfile (container image)
-```dockerfile
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    fdisk \
-    e2fsprogs \
-    mount \
-    && rm -rf /var/lib/apt/lists/*
+Storage initialization wizard (scan → select disk → format → mount → register) works
+up to the partitioning step. Three bugs prevent completion. Fix all three in one pass.
+
+**Test environment:** demo-felhom.eu, `/dev/sdb` = 931.5 GB USB HDD (HD710 PRO),
+which has an existing GPT partition table with one partition `sdb1` (no filesystem).
+
+---
+
+## Bug 1: sfdisk fails with "unsupported command" (CURRENT BLOCKER)
+
+### Error output
+```
+Old situation:
+Device         Start        End    Sectors   Size Type
+/host-dev/sdb1  2048 1953523711 1953521664 931.5G Linux filesystem
+>>> Script header accepted.
+>>> line 2: unsupported command
+Hiba: exit status 1
 ```
 
-Provides:
-- `fdisk` package → `sfdisk`, `fdisk`, `cfdisk`
-- `e2fsprogs` package → `mkfs.ext4`, `e2fsck`, `tune2fs`
-- `mount` package → `mount`, `umount`, `blkid`, `findmnt`, `lsblk`
+### Root cause
+Three issues in `format_linux.go` line ~10225:
 
-**Note:** `util-linux` (which provides `lsblk`, `blkid`, `mount`) is likely already installed as a base dependency. Check with `docker exec felhom-controller which lsblk` — if it exists, skip `mount` package. The critical missing ones are `fdisk` and `e2fsprogs`.
-
-### On the host node (docker-setup.sh)
-These should already exist on a standard Debian 13 install, but ensure:
-```bash
-apt-get install -y fdisk e2fsprogs util-linux
+```go
+sfdiskInput := "label: gpt\n,,,L\n"
+cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
 ```
 
-Add to docker-setup.sh's dependency installation section if not already there. The host needs these for `mount -a` (fstab reload) and as fallback.
+1. **`,,,L` puts `L` in the wrong field** — sfdisk's positional fields are
+   `start, size, type, bootable`, so `,,,L` leaves the type empty and sets the
+   *bootable* field to `L`. That field only accepts `*` or `-`, so the line fails to
+   parse and sfdisk reports "line 2: unsupported command". (`,,L` would set the type;
+   for GPT, `,,` already defaults to the Linux filesystem type.)
+2. **No `--force` flag** — sdb already has a GPT table with sdb1, and sfdisk's
+   consistency checks (for example, refusing a disk it considers in use) can abort
+   the write unless `--force` is given.
+3. **No `wipefs` before sfdisk** — leftover partition-table and filesystem signatures
+   on the disk can confuse sfdisk and later probes.
+
+### Fix in `controller/internal/storage/format_linux.go`
+
+Find this block (around line 10222–10230):
+
+```go
+	if req.CreatePartition {
+		send("partitioning", "Partíció létrehozása...", 15)
+
+		sfdiskInput := "label: gpt\n,,,L\n"
+		cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
+		cmd.Stdin = strings.NewReader(sfdiskInput)
+		if out, err := cmd.CombinedOutput(); err != nil {
+			return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
+		}
+```
+
+Replace with:
+
+```go
+	if req.CreatePartition {
+		send("partitioning", fmt.Sprintf("wipefs -a %s ...", HostDevicePath(req.DevicePath)), 12)
+
+		// Wipe existing partition-table and filesystem signatures from the whole disk first (best effort)
+		_ = exec.Command("wipefs", "-a", HostDevicePath(req.DevicePath)).Run()
+		time.Sleep(500 * time.Millisecond)
+
+		// Create GPT with single partition spanning whole disk
+		// ",," = start=default, size=default(fill disk), type=default(Linux filesystem GUID)
+		// --force: disable sfdisk's consistency checks (e.g. "device in use") and write anyway
+		// --wipe always: wipe old filesystem/RAID/partition-table signatures from the device
+		send("partitioning", fmt.Sprintf("sfdisk --force --wipe always %s ...", HostDevicePath(req.DevicePath)), 15)
+		sfdiskInput := "label: gpt\n,,\n"
+		cmd := exec.Command("sfdisk", "--force", "--wipe", "always", HostDevicePath(req.DevicePath))
+		cmd.Stdin = strings.NewReader(sfdiskInput)
+		if out, err := cmd.CombinedOutput(); err != nil {
+			return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
+		}
+```
+
+(`fmt` and `time` must be imported in `format_linux.go` if they are not already.)
+
+---
+
+## Bug 2: `mount mountPath` will fail (NEXT BLOCKER after Bug 1)
+
+### Current code (around line 10288–10290)
+
+```go
+	if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {
+		return "", fail("mounting", "Csatlakoztatás sikertelen: "+string(out), err)
+	}
+```
+
+### Root cause
+
+`mount /mnt/hdd_1` works by looking up `/mnt/hdd_1` in the process's `/etc/fstab` to find
+which device to mount. Inside the container, `/etc/fstab` is the file shipped in the image
+(typically empty), not the host's. The UUID entry was written to `/host-fstab` (the host's
+real fstab).
+
+So `mount /mnt/hdd_1` will fail with "can't find /mnt/hdd_1 in /etc/fstab" or similar.
+
+### Fix in `controller/internal/storage/format_linux.go`
+
+Find this line (around line 10288):
+
+```go
+	if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {
+```
+
+Replace with:
+
+```go
+	// Mount by device path explicitly — container's /etc/fstab != host fstab,
+	// so "mount /mnt/hdd_1" (fstab lookup) won't work.
+	send("mounting", fmt.Sprintf("mount -t ext4 %s %s ...", HostDevicePath(partDev), mountPath), 70)
+	if out, err := exec.Command("mount", "-t", "ext4", "-o", "defaults,noatime",
+		HostDevicePath(partDev), mountPath).CombinedOutput(); err != nil {
+```
+
+The fstab entry in `/host-fstab` still ensures persistence across host reboots.
+This explicit mount handles the immediate "mount it right now" operation.
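+
+To make the fstab failure concrete, here is a minimal Go sketch of the lookup that
+`mount <dir>` performs: find the fstab line whose second field is the mount point and use
+its first field as the source. `findFstabEntry` is a hypothetical helper, not controller
+code. It ignores details that real `mount` handles (octal escapes such as `\040`, resolving
+`UUID=`/`LABEL=`), and the UUID and container fstab contents are placeholders.
+
+```go
+package main
+
+import (
+	"bufio"
+	"fmt"
+	"strings"
+)
+
+// findFstabEntry returns the source (1st field) of the first fstab line whose
+// mount point (2nd field) equals mountPath. Hypothetical, for illustration only.
+func findFstabEntry(fstab, mountPath string) (string, bool) {
+	sc := bufio.NewScanner(strings.NewReader(fstab))
+	for sc.Scan() {
+		line := strings.TrimSpace(sc.Text())
+		if line == "" || strings.HasPrefix(line, "#") {
+			continue // skip blank lines and comments
+		}
+		fields := strings.Fields(line)
+		if len(fields) >= 2 && fields[1] == mountPath {
+			return fields[0], true
+		}
+	}
+	return "", false
+}
+
+func main() {
+	// Placeholder for the image's own /etc/fstab inside the container.
+	containerFstab := "# UNCONFIGURED FSTAB FOR BASE SYSTEM\n"
+	// Placeholder for /host-fstab after the wizard appended its entry.
+	hostFstab := "UUID=0000-placeholder /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2\n"
+
+	_, ok := findFstabEntry(containerFstab, "/mnt/hdd_1")
+	fmt.Println("container /etc/fstab has /mnt/hdd_1:", ok) // false
+
+	src, ok := findFstabEntry(hostFstab, "/mnt/hdd_1")
+	fmt.Println("/host-fstab has /mnt/hdd_1:", ok, src) // true UUID=0000-placeholder
+}
+```
+
+Only the host file has the entry, and `mount` never reads `/host-fstab`. Passing the
+device path explicitly avoids the lookup altogether.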
+
+---
+
+## Bug 3: Mount namespace isolation — mount won't be visible on host (RESTART BLOCKER)
+
+### Root cause
+
+Even with `privileged: true`, `mount` inside a container operates in the container's
+mount namespace, so the new mount is NOT visible in the host's mount namespace. Consequences:
+- After a controller container restart, the mount is gone
+- Other containers can't access `/mnt/hdd_1`
+- The bind mount `- /mnt:/mnt:rw` shares existing host mounts INTO the container, but
+  with Docker's default `rprivate` propagation, new mounts created inside the container
+  don't propagate BACK to the host
+
+### Fix: Change `/mnt` volume to use `rshared` mount propagation
+
+#### 3a. `controller/docker-compose.yml`
+
+Find these lines:
+
+```yaml
+      # All external storage — /mnt/* for multi-storage + restore
+      - /mnt:/mnt:rw
+```
+
+Replace with:
+
+```yaml
+      # All external storage — rshared propagation so mounts created inside
+      # the container (disk init) propagate to the host and vice versa
+      - type: bind
+        source: /mnt
+        target: /mnt
+        bind:
+          propagation: rshared
+```
+
+**Important:** This uses Docker Compose long-form volume syntax. The rest of the volumes
+can stay in short form; only `/mnt` needs propagation.
+
+#### 3b. `scripts/docker-setup.sh` — Add mount propagation setup
+
+Find the section where the script does final setup steps (after Docker installation,
+before or after compose generation). Add:
 
-## Quick verification after rebuild
 ```bash
-docker exec felhom-controller which sfdisk mkfs.ext4 blkid mount lsblk
-# Should show paths for all five
-```
\ No newline at end of file
+# Enable shared mount propagation on /mnt (required for controller disk init)
+# This allows mounts created inside the controller container to propagate to the host
+log_info "Configuring mount propagation for /mnt..."
+mount --make-rshared /mnt 2>/dev/null || mount --make-shared /mnt 2>/dev/null || true
+```
+
+**Also** add a comment near the controller compose generation (if any) explaining this requirement.
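+
+If `docker-setup.sh` doesn't generate the controller compose, just add the `mount --make-rshared`
+to the node preparation section. It's idempotent and safe to run multiple times.
+
+Note: `--make-rshared` only succeeds when `/mnt` is itself a mount point. If `/mnt` is a
+plain directory on `/`, the command fails (silently here, because of `|| true`). That is
+usually harmless: Docker's `rshared` option only requires the mount containing `/mnt` to be
+shared, and systemd mounts `/` as shared by default. Check with
+`findmnt -o TARGET,PROPAGATION -T /mnt`.
+
+---
+
+## Safety improvement 1: Post-mount verification
+
+### What
+After mount succeeds (exit code 0), verify that `mountPath` is actually a mount point.
+Use `findmnt --mountpoint`, which matches only an exact mount point. `--target` falls back to
+the filesystem that contains the path (e.g. `/`), so it would pass even if nothing was mounted.
+
+### Where
+In `format_linux.go`, right after the mount command succeeds and BEFORE the
+`send("mounting", "Csatlakoztatva..."` line, add:
+
+```go
+	// Verify mount actually worked (don't just trust exit code).
+	// --mountpoint requires an exact match; findmnt exits non-zero if nothing is mounted there.
+	verifyOut, verifyErr := exec.Command("findmnt", "-n", "-o", "SOURCE", "--mountpoint", mountPath).Output()
+	if verifyErr != nil || strings.TrimSpace(string(verifyOut)) == "" {
+		// UI text: "The mount cannot be verified: the mount command succeeded, but the drive is not visible in the system"
+		return "", fail("mounting", "A csatlakoztatás nem ellenőrizhető: a mount parancs sikerült, de a meghajtó nem látható a rendszerben", fmt.Errorf("mount point %s not found after mount", mountPath))
+	}
+```
+
+---
+
+## Safety improvement 2: Use ASCII mount name for ext4 filesystem label
+
+### What
+The current code uses `req.Label` (user-provided display label like "Külső HDD 1TB") for the
+ext4 `-L` label. ext4 labels are limited to 16 BYTES, and Hungarian accented characters
+(ű, ó, é) take 2 bytes each in UTF-8. "Külső HDD 1TB" is 13 characters but already 15 bytes,
+so a slightly longer label exceeds the limit. Because `label[:16]` slices bytes, it can
+also cut a character in half.
+
+### Where
+In `format_linux.go`, find the label preparation block (around line 10249–10254):
+
+```go
+	label := req.Label
+	if label == "" {
+		label = req.MountName
+	}
+	if len(label) > 16 {
+		label = label[:16]
+	}
+```
+
+Replace with:
+
+```go
+	// Use ASCII-safe mount name for ext4 filesystem label (16-byte limit).
+	// The display label (req.Label) stays in settings.json for the UI.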
+	fsLabel := req.MountName
+	if len(fsLabel) > 16 {
+		fsLabel = fsLabel[:16]
+	}
+```
+
+Then update the mkfs.ext4 call right below to use `fsLabel` instead of `label`:
+
+```go
+	mkfsCmd := exec.Command("mkfs.ext4", "-L", fsLabel, "-F", HostDevicePath(partDev))
+```
+
+---
+
+## Safety improvement 3: Smart partition handling (skip repartition when unnecessary)
+
+### What
+The scan shows sdb has 1 partition (sdb1) with no filesystem. The JS always sends
+`CreatePartition: true` (because `disk.CreatePartition` is undefined on the `BlockDevice`
+struct, so `undefined !== false` evaluates to `true` in JS).
+
+For a disk that already has exactly one partition with no filesystem, we should skip
+the destructive repartition step and just format the existing partition directly.
+
+### Where
+In `handlers.go`, in `storageInitAPIHandler`, AFTER building `fmtReq` (around line 14175–14180)
+and BEFORE the `go func()` goroutine, add:
+
+```go
+	// Smart partition: if device is a whole disk with exactly 1 partition
+	// with no filesystem, skip repartitioning — just format existing partition
+	if fmtReq.CreatePartition {
+		result, scanErr := storage.ScanDisks()
+		if scanErr == nil {
+			for _, disk := range result.AvailableDisks {
+				if disk.Path == req.DevicePath && len(disk.Partitions) == 1 && disk.Partitions[0].FSType == "" {
+					s.logger.Printf("[INFO] Disk %s has 1 empty partition (%s) — skipping repartition",
+						req.DevicePath, disk.Partitions[0].Path)
+					fmtReq.DevicePath = disk.Partitions[0].Path // e.g., "/dev/sdb1"
+					fmtReq.CreatePartition = false
+					break
+				}
+			}
+		}
+	}
+```
+
+This way, for demo sdb (which has sdb1 with no FS), it will:
+1. Set DevicePath to `/dev/sdb1`
+2. Set CreatePartition to `false`
+3. Skip wipefs + sfdisk entirely
+4. Go straight to `mkfs.ext4 /host-dev/sdb1`
+
+**Note:** The wipefs+sfdisk fix (Bug 1) is still needed as fallback for truly
+unpartitioned disks or disks with multiple/incompatible partitions.
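+
+The handler check above can also be expressed as a small pure function, which makes the
+rule unit-testable without real disks. A minimal Go sketch, assuming simplified stand-in
+types: `Disk`, `Partition` and `formatTarget` are hypothetical names that model only the
+fields the snippet reads (`Path`, `Partitions`, `FSType`), not the real `BlockDevice` type.
+
+```go
+package main
+
+import "fmt"
+
+// Partition and Disk are hypothetical stand-ins for the scanner's types.
+type Partition struct {
+	Path   string
+	FSType string // "" means no filesystem detected
+}
+
+type Disk struct {
+	Path       string
+	Partitions []Partition
+}
+
+// formatTarget returns the device to format and whether it must be partitioned
+// first. A whole disk with exactly one partition and no filesystem is formatted
+// in place; any other case keeps the wipefs + sfdisk path on the whole disk.
+func formatTarget(disks []Disk, devicePath string) (string, bool) {
+	for _, d := range disks {
+		if d.Path == devicePath && len(d.Partitions) == 1 && d.Partitions[0].FSType == "" {
+			return d.Partitions[0].Path, false
+		}
+	}
+	return devicePath, true
+}
+
+func main() {
+	disks := []Disk{
+		{Path: "/dev/sdb", Partitions: []Partition{{Path: "/dev/sdb1"}}},
+		{Path: "/dev/sdc", Partitions: []Partition{{Path: "/dev/sdc1", FSType: "ext4"}}},
+		{Path: "/dev/sdd"}, // no partitions at all
+	}
+	for _, dev := range []string{"/dev/sdb", "/dev/sdc", "/dev/sdd"} {
+		target, create := formatTarget(disks, dev)
+		fmt.Printf("%s -> %s (CreatePartition=%v)\n", dev, target, create)
+	}
+	// /dev/sdb -> /dev/sdb1 (CreatePartition=false)
+	// /dev/sdc -> /dev/sdc (CreatePartition=true)
+	// /dev/sdd -> /dev/sdd (CreatePartition=true)
+}
+```
+
+As in the handler, this would only be consulted when the request arrives with
+`CreatePartition: true`. In the handler, a scan error simply leaves the request unchanged.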
+
+---
+
+## Safety improvement 4: Descriptive progress messages
+
+### What
+Include executed command details in progress messages for remote debugging.
+The progress messages show in the UI and get logged by the handler.
+
+### Where
+Throughout `format_linux.go`, update the `send()` calls to include command info.
+Examples already shown in the Bug 1 and Bug 2 fixes above. Also update:
+
+For the mkfs step:
+```go
+	send("formatting", fmt.Sprintf("mkfs.ext4 -L %s -F %s ...", fsLabel, HostDevicePath(partDev)), 30)
+```
+
+For the blkid step (around line 10274):
+```go
+	send("mounting", fmt.Sprintf("UUID lekérése: blkid %s ...", HostDevicePath(partDev)), 65)
+```
+
+---
+
+## Summary: All changes by file
+
+### `controller/internal/storage/format_linux.go` (5 changes)
+
+1. Partition block: Add `wipefs -a`; change sfdisk input `",,,L"` → `",,"`;
+   add `--force --wipe always` flags
+2. Mount block: Change `mount mountPath` → `mount -t ext4 -o defaults,noatime HostDevicePath(partDev) mountPath`
+3. After mount: Add `findmnt` verification
+4. Label: Use `req.MountName` (ASCII) instead of `req.Label` (UTF-8) for `mkfs.ext4 -L`
+5. Progress messages: Include command details in `send()` calls
+
+### `controller/docker-compose.yml` (1 change)
+
+6. Change `/mnt:/mnt:rw` to long-form syntax with `propagation: rshared`
+
+### `controller/internal/web/handlers.go` (1 change)
+
+7. In `storageInitAPIHandler`: Add smart partition detection before launching goroutine
+
+### `scripts/docker-setup.sh` (1 change)
+
+8. Add `mount --make-rshared /mnt` to node preparation section
+
+---
+
+## Build & deploy procedure
+
+```bash
+# 1. On the host FIRST (before restarting controller).
+#    Fails if /mnt is not itself a mount point; in that case confirm that the
+#    mount containing it is shared: findmnt -o TARGET,PROPAGATION -T /mnt
+sudo mount --make-rshared /mnt
+
+# 2. Build new image with fixes (normal build process)
+
+# 3. Deploy
+cd /opt/docker/felhom-controller
+sudo docker compose up -d
+
+# 4. Verify container sees /host-dev (sh -c so the glob expands inside the container)
+docker exec felhom-controller sh -c 'ls /host-dev/sd*'
+
+# 5. Verify rshared propagation is active
+docker inspect felhom-controller --format '{{range .Mounts}}{{if eq .Destination "/mnt"}}Propagation={{.Propagation}}{{end}}{{end}}'
+# Should show: Propagation=rshared
+
+# 6. Test storage init wizard:
+#    - Scan → sdb appears
+#    - Select sdb → configure hdd_1 → type FORMÁZÁS (confirmation word)
+#    - Watch progress panel — should show command details
+#    - Should complete successfully
+
+# 7. Verify mount on HOST (proves propagation):
+findmnt /mnt/hdd_1
+# Should show sdb1 mounted at /mnt/hdd_1 (SOURCE may be printed as /host-dev/sdb1,
+# the device path used inside the container)
+
+# 8. Verify fstab entry:
+grep hdd_1 /etc/fstab
+# Should show UUID=... /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2
+
+# 9. Verify storage registered in settings:
+#    Visit Settings page → Adattárolók (Storage) → /mnt/hdd_1 should appear
+
+# 10. Restart controller — verify mount survives:
+docker restart felhom-controller
+docker exec felhom-controller ls /mnt/hdd_1/
+# Should show: storage/ Dokumentumok/
+```
+
+---
+
+## What NOT to change
+
+- **Dockerfile** — packages already correct (fdisk, e2fsprogs, util-linux, rsync, parted)
+- **scan_linux.go** — scan works correctly after v0.11.1 fixes
+- **safety_linux.go / safety.go** — system disk detection works
+- **Template/JS** — wizard UI works fine; `CreatePartition` default-true is handled in handler
\ No newline at end of file