Bugfix — Storage Initialization (FormatAndMount)

2026-02-17 11:35:52 +01:00
parent 4f3401dc4c
commit 252058aec0
+367 -26
@@ -1,36 +1,377 @@
# BUGFIX: Missing disk tools (sfdisk, mkfs.ext4, etc.)
# TASK: Bugfix — Storage Initialization (FormatAndMount)
## Problem
`sfdisk` not found in container PATH → partitioning fails.
**Version:** 0.11.4 → 0.11.5
**Priority:** Fix all 3 bugs + add safety improvements before testing format again.
## Required packages
## Context
### In the controller Dockerfile (container image)
```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
    fdisk \
    e2fsprogs \
    mount \
    && rm -rf /var/lib/apt/lists/*
```
Storage initialization wizard (scan → select disk → format → mount → register) works
up to the partitioning step. Three bugs prevent completion. Fix all three in one pass.
**Test environment:** demo-felhom.eu, `/dev/sdb` = 931.5 GB USB HDD (HD710 PRO),
has existing GPT partition table with one partition `sdb1` (no filesystem).
---
## Bug 1: sfdisk fails with "unsupported command" (CURRENT BLOCKER)
### Error output
```
Old situation: Device Start End Sectors Size Type
/host-dev/sdb1 2048 1953523711 1953521664 931.5G Linux filesystem
>>> Script header accepted.
>>> line 2: unsupported command
Hiba: exit status 1
```
Provides:
- `fdisk` package → `sfdisk`, `fdisk`, `cfdisk`
- `e2fsprogs` package → `mkfs.ext4`, `e2fsck`, `tune2fs`
- `mount` package → `mount`, `umount`, `blkid`, `findmnt`, `lsblk`
### Root cause
Three issues in `format_linux.go` line ~10225:
**Note:** `util-linux` (which provides `lsblk`, `blkid`, `mount`) is likely already installed as a base dependency. Check with `docker exec felhom-controller which lsblk` — if it exists, skip `mount` package. The critical missing ones are `fdisk` and `e2fsprogs`.
### On the host node (docker-setup.sh)
These should already exist on a standard Debian 13 install, but ensure:
```bash
apt-get install -y fdisk e2fsprogs util-linux
```
```go
sfdiskInput := "label: gpt\n,,,L\n"
cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
```
Add to docker-setup.sh's dependency installation section if not already there. The host needs these for `mount -a` (fstab reload) and as fallback.
1. **`,,,L` is not valid script input**: sfdisk's unnamed fields are start, size, type, bootable,
so the third comma most likely pushes `L` into the bootable field instead of the type field.
With the type left empty (`,,`), GPT defaults to Linux filesystem; a full type GUID also works.
2. **No `--force` flag** — sdb already has a GPT table with sdb1. sfdisk tries to apply the
script as a delta to the existing layout, not as a fresh layout.
3. **No `wipefs` before sfdisk** — existing partition signatures confuse sfdisk.
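To make the script format concrete, here is a minimal sketch of building the sfdisk input for a single whole-disk GPT partition. `buildSfdiskScript` is a hypothetical helper, not a function from `format_linux.go`; it assumes the unnamed-field order start, size, type, bootable, and uses the standard GPT "Linux filesystem" type GUID when an explicit type is wanted.

```go
package main

import (
	"fmt"
	"strings"
)

// linuxFilesystemGUID is the GPT partition type GUID for "Linux filesystem".
const linuxFilesystemGUID = "0FC63DAF-8483-4772-8E79-3D69D8477DE4"

// buildSfdiskScript (hypothetical helper) returns sfdisk script input that
// creates a GPT label with one partition. Empty start and size fields take
// sfdisk's defaults. If partType is empty, the type field is left empty too.
func buildSfdiskScript(partType string) string {
	var b strings.Builder
	b.WriteString("label: gpt\n")
	// Two commas separate three fields: start, size, type.
	b.WriteString(",," + partType + "\n")
	return b.String()
}

func main() {
	fmt.Printf("%q\n", buildSfdiskScript(""))
	fmt.Printf("%q\n", buildSfdiskScript(linuxFilesystemGUID))
}
```

The first form matches the `",,"` input used in the fix; the second is only needed if the default type ever turns out to be wrong for a given sfdisk version.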
### Fix in `controller/internal/storage/format_linux.go`
Find this block (around lines 10222-10230):
```go
if req.CreatePartition {
send("partitioning", "Partíció létrehozása...", 15)
sfdiskInput := "label: gpt\n,,,L\n"
cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
cmd.Stdin = strings.NewReader(sfdiskInput)
if out, err := cmd.CombinedOutput(); err != nil {
return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
}
```
Replace with:
```go
if req.CreatePartition {
send("partitioning", fmt.Sprintf("wipefs -a %s ...", HostDevicePath(req.DevicePath)), 12)
// Wipe existing partition table and filesystem signatures first
_ = exec.Command("wipefs", "-a", HostDevicePath(req.DevicePath)).Run()
time.Sleep(500 * time.Millisecond)
// Create GPT with single partition spanning whole disk
// ",," = start=default, size=default(fill disk), type=default(Linux filesystem GUID)
// --force: overwrite even if device appears busy
// --wipe always: wipe filesystem signatures from newly created partitions
send("partitioning", fmt.Sprintf("sfdisk --force --wipe always %s ...", HostDevicePath(req.DevicePath)), 15)
sfdiskInput := "label: gpt\n,,\n"
cmd := exec.Command("sfdisk", "--force", "--wipe", "always", HostDevicePath(req.DevicePath))
cmd.Stdin = strings.NewReader(sfdiskInput)
if out, err := cmd.CombinedOutput(); err != nil {
return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
}
```
---
## Bug 2: `mount mountPath` will fail (NEXT BLOCKER after Bug 1)
### Current code (around lines 10288-10290)
```go
if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {
return "", fail("mounting", "Csatlakoztatás sikertelen: "+string(out), err)
}
```
### Root cause
`mount /mnt/hdd_1` works by looking up `/mnt/hdd_1` in the process's `/etc/fstab` to find
which device to mount. But inside the container, `/etc/fstab` is Docker's auto-generated fstab
(not the host's). The UUID entry was written to `/host-fstab` (the host's real fstab).
So `mount /mnt/hdd_1` will fail with "can't find /mnt/hdd_1 in /etc/fstab" or similar.
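The difference between the two call styles can be sketched as an argument builder. `explicitMountArgs` is a hypothetical helper for illustration only; the point is that once the device, filesystem type, and options are all passed explicitly, `mount` has no reason to look anything up in `/etc/fstab`.

```go
package main

import (
	"fmt"
	"strings"
)

// explicitMountArgs (hypothetical helper) builds the argument list for a
// mount call that names the filesystem type, options, device, and target
// directly, so no fstab lookup is involved.
func explicitMountArgs(device, target, fsType string, opts []string) []string {
	args := []string{"-t", fsType}
	if len(opts) > 0 {
		args = append(args, "-o", strings.Join(opts, ","))
	}
	return append(args, device, target)
}

func main() {
	args := explicitMountArgs("/host-dev/sdb1", "/mnt/hdd_1", "ext4",
		[]string{"defaults", "noatime"})
	fmt.Println("mount " + strings.Join(args, " "))
}
```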
### Fix in `controller/internal/storage/format_linux.go`
Find this line (around line 10288):
```go
if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {
```
Replace with:
```go
// Mount by device path explicitly — container's /etc/fstab != host fstab,
// so "mount /mnt/hdd_1" (fstab lookup) won't work.
send("mounting", fmt.Sprintf("mount -t ext4 %s %s ...", HostDevicePath(partDev), mountPath), 70)
if out, err := exec.Command("mount", "-t", "ext4", "-o", "defaults,noatime",
HostDevicePath(partDev), mountPath).CombinedOutput(); err != nil {
```
The fstab entry in `/host-fstab` still ensures persistence across host reboots.
This explicit mount handles the immediate "mount it right now" operation.
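For reference, the persistent entry can be sketched as below. `fstabLine` is a hypothetical helper and the UUID is a placeholder; the field layout follows the entry expected in the deploy checklist (`UUID=... /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2`).

```go
package main

import "fmt"

// fstabLine (hypothetical helper) formats an fstab entry that mounts an ext4
// filesystem by UUID. "nofail" keeps the host booting if the disk is absent.
func fstabLine(uuid, mountPoint string) string {
	return fmt.Sprintf("UUID=%s %s ext4 defaults,nofail,noatime 0 2", uuid, mountPoint)
}

func main() {
	// Placeholder UUID for illustration; the real value comes from blkid.
	fmt.Println(fstabLine("00000000-0000-0000-0000-000000000000", "/mnt/hdd_1"))
}
```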
---
## Bug 3: Mount namespace isolation — mount won't be visible on host (RESTART BLOCKER)
### Root cause
Even with `privileged: true`, `mount` inside a container operates in the container's
mount namespace, and Docker bind mounts default to `rprivate` propagation. The mount
therefore does NOT appear in the host's mount namespace. Consequences:
- After controller container restart, the mount is gone
- Other containers can't access `/mnt/hdd_1`
- The bind mount `- /mnt:/mnt:rw` shares existing host mounts INTO the container,
but new mounts created inside the container don't propagate BACK to the host
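Whether a mount point is shared can be checked from `/proc/self/mountinfo`, where shared mounts carry a `shared:N` optional field. The sketch below parses one such line; `isSharedMount` is a hypothetical helper and the sample lines are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isSharedMount (hypothetical helper) reports whether a /proc/self/mountinfo
// line describes mountPoint and carries a "shared:N" optional field.
func isSharedMount(line, mountPoint string) bool {
	fields := strings.Fields(line)
	// Fields: id, parent, major:minor, root, mount point, mount options,
	// zero or more optional fields, "-", fstype, source, super options.
	if len(fields) < 7 || fields[4] != mountPoint {
		return false
	}
	for _, f := range fields[6:] {
		if f == "-" { // separator: optional fields end here
			break
		}
		if strings.HasPrefix(f, "shared:") {
			return true
		}
	}
	return false
}

func main() {
	shared := "36 25 8:1 / /mnt rw,relatime shared:1 - ext4 /dev/sda1 rw"
	private := "36 25 8:1 / /mnt rw,relatime - ext4 /dev/sda1 rw"
	fmt.Println(isSharedMount(shared, "/mnt"), isSharedMount(private, "/mnt"))
}
```

Note that `/mnt` only has its own mountinfo line if it is itself a mount point. On hosts where it is a plain directory, propagation is inherited from the enclosing mount (often `/`), and `mount --make-rshared /mnt` reports that `/mnt` is not a mount point.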
### Fix: Change `/mnt` volume to use `rshared` mount propagation
#### 3a. `controller/docker-compose.yml`
Find this line:
```yaml
# All external storage — /mnt/* for multi-storage + restore
- /mnt:/mnt:rw
```
Replace with:
```yaml
# All external storage — rshared propagation so mounts created inside
# the container (disk init) propagate to the host and vice versa
- type: bind
source: /mnt
target: /mnt
bind:
propagation: rshared
```
**Important:** This uses Docker Compose long-form volume syntax. The rest of the volumes
can stay in short form. Only `/mnt` needs propagation.
#### 3b. `scripts/docker-setup.sh` — Add mount propagation setup
Find the section where the script does final setup steps (after Docker installation,
before or after compose generation). Add:
## Quick verification after rebuild
```bash
docker exec felhom-controller which sfdisk mkfs.ext4 blkid mount lsblk
# Should show paths for all five
```
```bash
# Enable shared mount propagation on /mnt (required for controller disk init)
# This allows mounts created inside the controller container to propagate to the host
log_info "Configuring mount propagation for /mnt..."
mount --make-rshared /mnt 2>/dev/null || mount --make-shared /mnt 2>/dev/null || true
```
**Also** add a comment near the controller compose generation (if any) explaining this requirement.
If `docker-setup.sh` doesn't generate the controller compose, just add the `mount --make-rshared`
to the node preparation section. It's idempotent and safe to run multiple times.
---
## Safety improvement 1: Post-mount verification
### What
After mount succeeds (exit code 0), verify the mount is actually visible.
### Where
In `format_linux.go`, right after the mount command succeeds and BEFORE the
`send("mounting", "Csatlakoztatva..."` line, add:
```go
// Verify mount actually worked (don't just trust exit code)
verifyOut, verifyErr := exec.Command("findmnt", "-n", "-o", "SOURCE", "--target", mountPath).Output()
if verifyErr != nil || strings.TrimSpace(string(verifyOut)) == "" {
return "", fail("mounting", "A csatlakoztatás nem ellenőrizhető: a mount parancs sikerült, de a meghajtó nem látható a rendszerben", fmt.Errorf("mount point %s not found after mount", mountPath))
}
```
---
## Safety improvement 2: Use ASCII mount name for ext4 filesystem label
### What
The current code uses `req.Label` (user-provided display label like "Külső HDD 1TB") for the
ext4 `-L` label. ext4 labels are limited to 16 BYTES. Hungarian UTF-8 chars (ű, ó, é) are
2 bytes each, so "Külső HDD 1TB" (13 characters) already takes 15 bytes, and a slightly
longer label would exceed the limit or get truncated mid-character by the byte-based slice.
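The byte-versus-character problem can be illustrated with a boundary-aware truncation. This is only a sketch of the failure mode, not the fix chosen in this task (which switches to the ASCII mount name); `truncateUTF8` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 (hypothetical helper) shortens s to at most max bytes without
// cutting a multi-byte UTF-8 character in half.
func truncateUTF8(s string, max int) string {
	if len(s) <= max {
		return s
	}
	cut := max
	// Back up until the byte at the cut position starts a new character.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	label := "Külső HDD 1TB"
	// len counts bytes; RuneCountInString counts characters.
	fmt.Println(len(label), utf8.RuneCountInString(label))
	fmt.Println(truncateUTF8("abcdefghijklmnoű", 16))
}
```

A plain `label[:16]` on the second string would keep the first byte of `ű` and produce invalid UTF-8.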
### Where
In `format_linux.go`, find the label preparation block (around lines 10249-10254):
```go
label := req.Label
if label == "" {
label = req.MountName
}
if len(label) > 16 {
label = label[:16]
}
```
Replace with:
```go
// Use ASCII-safe mount name for ext4 filesystem label (16-byte limit).
// The display label (req.Label) stays in settings.json for the UI.
fsLabel := req.MountName
if len(fsLabel) > 16 {
fsLabel = fsLabel[:16]
}
```
Then update the mkfs.ext4 call right below to use `fsLabel` instead of `label`:
```go
mkfsCmd := exec.Command("mkfs.ext4", "-L", fsLabel, "-F", HostDevicePath(partDev))
```
---
## Safety improvement 3: Smart partition handling (skip repartition when unnecessary)
### What
The scan shows sdb has 1 partition (sdb1) with no filesystem. The JS always sends
`CreatePartition: true` (because `disk.CreatePartition` is undefined on the `BlockDevice`
struct, so `undefined !== false` evaluates to `true` in JS).
For a disk that already has exactly one partition with no filesystem, we should skip
the destructive repartition step and just format the existing partition directly.
### Where
In `handlers.go`, in `storageInitAPIHandler`, AFTER building `fmtReq` (around lines 14175-14180)
and BEFORE the `go func()` goroutine, add:
```go
// Smart partition: if device is a whole disk with exactly 1 partition
// with no filesystem, skip repartitioning — just format existing partition
if fmtReq.CreatePartition {
result, scanErr := storage.ScanDisks()
if scanErr == nil {
for _, disk := range result.AvailableDisks {
if disk.Path == req.DevicePath && len(disk.Partitions) == 1 && disk.Partitions[0].FSType == "" {
s.logger.Printf("[INFO] Disk %s has 1 empty partition (%s) — skipping repartition",
req.DevicePath, disk.Partitions[0].Path)
fmtReq.DevicePath = disk.Partitions[0].Path // e.g., "/dev/sdb1"
fmtReq.CreatePartition = false
break
}
}
}
}
```
This way, for demo sdb (which has sdb1 with no FS), it will:
1. Set DevicePath to `/dev/sdb1`
2. Set CreatePartition to `false`
3. Skip wipefs + sfdisk entirely
4. Go straight to `mkfs.ext4 /host-dev/sdb1`
**Note:** The wipefs+sfdisk fix (Bug 1) is still needed as fallback for truly
unpartitioned disks or disks with multiple/incompatible partitions.
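The decision itself can be isolated as a small pure function, which makes it easy to test without a real disk. The sketch below uses hypothetical stand-in types (`Disk`, `Partition`) and a hypothetical `planFormat` helper; the real scan result types in the `storage` package may differ.

```go
package main

import "fmt"

// Partition and Disk are hypothetical stand-ins for the scan result types.
type Partition struct {
	Path   string
	FSType string
}

type Disk struct {
	Path       string
	Partitions []Partition
}

// planFormat (hypothetical helper) returns the device to format and whether
// the disk must be repartitioned first. A disk with exactly one partition
// and no filesystem on it is formatted in place.
func planFormat(disk Disk) (target string, repartition bool) {
	if len(disk.Partitions) == 1 && disk.Partitions[0].FSType == "" {
		return disk.Partitions[0].Path, false
	}
	return disk.Path, true
}

func main() {
	demo := Disk{Path: "/dev/sdb", Partitions: []Partition{{Path: "/dev/sdb1"}}}
	target, repartition := planFormat(demo)
	fmt.Println(target, repartition)
}
```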
---
## Safety improvement 4: Descriptive progress messages
### What
Include executed command details in progress messages for remote debugging.
The progress messages show in the UI and get logged by the handler.
### Where
Throughout `format_linux.go`, update the `send()` calls to include command info.
Examples already shown in the Bug 1 and Bug 2 fixes above. Also update:
For the mkfs step:
```go
send("formatting", fmt.Sprintf("mkfs.ext4 -L %s -F %s ...", fsLabel, HostDevicePath(partDev)), 30)
```
For the blkid step (around line 10274):
```go
send("mounting", fmt.Sprintf("UUID lekérése: blkid %s ...", HostDevicePath(partDev)), 65)
```
---
## Summary: All changes by file
### `controller/internal/storage/format_linux.go` (5 changes)
1. Partition block: Add `wipefs -a`; change sfdisk input `",,,L"` → `",,"`;
add `--force --wipe always` flags
2. Mount block: Change `mount mountPath` → `mount -t ext4 -o defaults,noatime HostDevicePath(partDev) mountPath`
3. After mount: Add `findmnt` verification
4. Label: Use `req.MountName` (ASCII) instead of `req.Label` (UTF-8) for `mkfs.ext4 -L`
5. Progress messages: Include command details in `send()` calls
### `controller/docker-compose.yml` (1 change)
6. Change `/mnt:/mnt:rw` to long-form syntax with `propagation: rshared`
### `controller/internal/web/handlers.go` (1 change)
7. In `storageInitAPIHandler`: Add smart partition detection before launching goroutine
### `scripts/docker-setup.sh` (1 change)
8. Add `mount --make-rshared /mnt` to node preparation section
---
## Build & deploy procedure
```bash
# 1. On the host FIRST (before restarting controller):
sudo mount --make-rshared /mnt
# 2. Build new image with fixes (normal build process)
# 3. Deploy
cd /opt/docker/felhom-controller
sudo docker compose up -d
# 4. Verify container sees /host-dev
docker exec felhom-controller ls /host-dev/sd*
# 5. Verify rshared propagation is active
docker inspect felhom-controller --format '{{range .Mounts}}{{if eq .Destination "/mnt"}}Propagation={{.Propagation}}{{end}}{{end}}'
# Should show: Propagation=rshared
# 6. Test storage init wizard:
# - Scan → sdb appears
# - Select sdb → configure hdd_1 → type FORMÁZÁS
# - Watch progress panel — should show command details
# - Should complete successfully
# 7. Verify mount on HOST (proves propagation):
findmnt /mnt/hdd_1
# Should show /dev/sdb1 mounted at /mnt/hdd_1
# 8. Verify fstab entry:
grep hdd_1 /etc/fstab
# Should show UUID=... /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2
# 9. Verify storage registered in settings:
# Visit Settings page → Adattárolók → /mnt/hdd_1 should appear
# 10. Restart controller — verify mount survives:
docker restart felhom-controller
docker exec felhom-controller ls /mnt/hdd_1/
# Should show: storage/ Dokumentumok/
```
---
## What NOT to change
- **Dockerfile** — packages already correct (fdisk, e2fsprogs, util-linux, rsync, parted)
- **scan_linux.go** — scan works correctly after v0.11.1 fixes
- **safety_linux.go / safety.go** — system disk detection works
- **Template/JS** — wizard UI works fine; `CreatePartition` default-true is handled in handler