Bugfix — Storage Initialization (FormatAndMount)
@@ -1,36 +1,377 @@
-# BUGFIX: Missing disk tools (sfdisk, mkfs.ext4, etc.)
-
-## Problem
-`sfdisk` not found in container PATH → partitioning fails.
-
-## Required packages
-
-### In the controller Dockerfile (container image)
-```dockerfile
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    fdisk \
-    e2fsprogs \
-    mount \
-    && rm -rf /var/lib/apt/lists/*
-```
-
-Provides:
-- `fdisk` package → `sfdisk`, `fdisk`, `cfdisk`
-- `e2fsprogs` package → `mkfs.ext4`, `e2fsck`, `tune2fs`
-- `mount` package → `mount`, `umount`, `blkid`, `findmnt`, `lsblk`
-
-**Note:** `util-linux` (which provides `lsblk`, `blkid`, `mount`) is likely already installed as a base dependency. Check with `docker exec felhom-controller which lsblk` — if it exists, skip `mount` package. The critical missing ones are `fdisk` and `e2fsprogs`.
-
-### On the host node (docker-setup.sh)
-These should already exist on a standard Debian 13 install, but ensure:
-```bash
-apt-get install -y fdisk e2fsprogs util-linux
-```
-
-Add to docker-setup.sh's dependency installation section if not already there. The host needs these for `mount -a` (fstab reload) and as fallback.

# TASK: Bugfix — Storage Initialization (FormatAndMount)

**Version:** 0.11.4 → 0.11.5
**Priority:** Fix all 3 bugs + add safety improvements before testing format again.

## Context

The storage initialization wizard (scan → select disk → format → mount → register) works
up to the partitioning step. Three bugs prevent completion. Fix all three in one pass.

**Test environment:** demo-felhom.eu, `/dev/sdb` = 931.5 GB USB HDD (HD710 PRO),
has an existing GPT partition table with one partition `sdb1` (no filesystem).

---

## Bug 1: sfdisk fails with "unsupported command" (CURRENT BLOCKER)

### Error output
```
Old situation: Device Start End Sectors Size Type
/host-dev/sdb1 2048 1953523711 1953521664 931.5G Linux filesystem
>>> Script header accepted.
>>> line 2: unsupported command
Hiba: exit status 1
```

### Root cause
Three issues in `format_linux.go`, around line 10225:

```go
sfdiskInput := "label: gpt\n,,,L\n"
cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
```

1. **`,,,L` type shorthand fails on GPT** — this sfdisk run rejects line 2 of the script.
   Per sfdisk(8) the script fields are `start, size, type, bootable`, so in `,,,L` the `L`
   sits in the fourth (bootable) field rather than the type field, which is the likely
   trigger. The type can be omitted (defaults to Linux filesystem) or given as a full GUID.
2. **No `--force` flag** — sdb already has a GPT table with sdb1. sfdisk tries to apply the
   script as a delta to the existing layout, not as a fresh layout.
3. **No `wipefs` before sfdisk** — existing partition signatures confuse sfdisk.

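The field positions in an sfdisk script line are easy to miscount. The sketch below is only an illustration: it assumes the `start, size, type, bootable` order documented in sfdisk(8), and `describeLine` is a hypothetical helper, not part of the controller. It shows which field each comma-separated value of a script line lands in.

```go
package main

import (
    "fmt"
    "strings"
)

// fieldNames is the field order documented in sfdisk(8) for script lines.
var fieldNames = []string{"start", "size", "type", "bootable"}

// describeLine maps each comma-separated value of a script line to the
// field it occupies. Illustration only: real sfdisk also accepts
// whitespace and semicolons as separators.
func describeLine(line string) map[string]string {
    out := map[string]string{}
    for i, f := range strings.Split(line, ",") {
        if i < len(fieldNames) {
            out[fieldNames[i]] = strings.TrimSpace(f)
        }
    }
    return out
}

func main() {
    fmt.Println(describeLine(",,,L")) // "L" ends up under "bootable"
    fmt.Println(describeLine(",,L"))  // "L" ends up under "type"
    fmt.Println(describeLine(",,"))   // every field left at its default
}
```

With three commas the `L` occupies the bootable slot, while `",,"` leaves start, size and type at their defaults.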
### Fix in `controller/internal/storage/format_linux.go`

Find this block (around line 10222–10230):

```go
if req.CreatePartition {
    send("partitioning", "Partíció létrehozása...", 15)

    sfdiskInput := "label: gpt\n,,,L\n"
    cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
    cmd.Stdin = strings.NewReader(sfdiskInput)
    if out, err := cmd.CombinedOutput(); err != nil {
        return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
    }
```

Replace with (add `fmt` and `time` to the imports if the file does not already have them):

```go
if req.CreatePartition {
    hostDev := HostDevicePath(req.DevicePath)
    send("partitioning", fmt.Sprintf("wipefs -a %s ...", hostDev), 12)

    // Wipe existing partition table and filesystem signatures first.
    // Best effort: the error is ignored because sfdisk --wipe always
    // below clears the signatures again.
    _ = exec.Command("wipefs", "-a", hostDev).Run()
    time.Sleep(500 * time.Millisecond)

    // Create GPT with single partition spanning whole disk
    // ",," = start=default, size=default(fill disk), type=default(Linux filesystem GUID)
    // --force: disable sfdisk's consistency checks so the existing layout is overwritten
    // --wipe always: wipe old filesystem/RAID/partition-table signatures from the device
    send("partitioning", fmt.Sprintf("sfdisk --force --wipe always %s ...", hostDev), 15)
    sfdiskInput := "label: gpt\n,,\n"
    cmd := exec.Command("sfdisk", "--force", "--wipe", "always", hostDev)
    cmd.Stdin = strings.NewReader(sfdiskInput)
    if out, err := cmd.CombinedOutput(); err != nil {
        return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
    }
```


---

## Bug 2: `mount mountPath` will fail (NEXT BLOCKER after Bug 1)

### Current code (around line 10288–10290)

```go
if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {
    return "", fail("mounting", "Csatlakoztatás sikertelen: "+string(out), err)
}
```

### Root cause

`mount /mnt/hdd_1` works by looking up `/mnt/hdd_1` in the process's `/etc/fstab` to find
which device to mount. But inside the container, `/etc/fstab` is the container image's own
fstab, not the host's. The UUID entry was written to `/host-fstab` (the host's real fstab).

So `mount /mnt/hdd_1` will fail with "can't find /mnt/hdd_1 in /etc/fstab" or similar.

### Fix in `controller/internal/storage/format_linux.go`

Find this line (around line 10288):

```go
if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {
```

Replace with:

```go
// Mount by device path explicitly — container's /etc/fstab != host fstab,
// so "mount /mnt/hdd_1" (fstab lookup) won't work.
send("mounting", fmt.Sprintf("mount -t ext4 -o defaults,noatime %s %s ...", HostDevicePath(partDev), mountPath), 70)
if out, err := exec.Command("mount", "-t", "ext4", "-o", "defaults,noatime",
    HostDevicePath(partDev), mountPath).CombinedOutput(); err != nil {
```

The fstab entry in `/host-fstab` still ensures persistence across host reboots.
This explicit mount handles the immediate "mount it right now" operation.

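The persistent fstab entry and the immediate mount describe the same filesystem, so it can help to derive both from one place. The sketch below is an assumption-laden illustration, not the controller's code: `fstabLine` and `mountArgs` are hypothetical helpers, and the entry layout follows the `UUID=... /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2` form used in the deploy checklist.

```go
package main

import (
    "fmt"
    "strings"
)

// mountOpts are the options used for the immediate, explicit mount.
const mountOpts = "defaults,noatime"

// fstabLine builds the persistent entry. nofail is added so that a missing
// disk does not block boot; "0 2" are the dump and fsck-pass fields.
func fstabLine(uuid, mountPath string) string {
    opts := strings.Replace(mountOpts, "defaults", "defaults,nofail", 1)
    return fmt.Sprintf("UUID=%s %s ext4 %s 0 2", uuid, mountPath, opts)
}

// mountArgs builds the argument list for the explicit (non-fstab) mount.
func mountArgs(hostDev, mountPath string) []string {
    return []string{"-t", "ext4", "-o", mountOpts, hostDev, mountPath}
}

func main() {
    // The UUID here is a placeholder, not a real value.
    fmt.Println(fstabLine("1234-abcd", "/mnt/hdd_1"))
    fmt.Println("mount", strings.Join(mountArgs("/host-dev/sdb1", "/mnt/hdd_1"), " "))
}
```

Keeping the option string in one constant means the entry written to `/host-fstab` and the live mount cannot silently diverge.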

---

## Bug 3: Mount namespace isolation — mount won't be visible on host (RESTART BLOCKER)

### Root cause

Even with `privileged: true`, `mount` inside a container operates in the container's
mount namespace. The host's mount namespace does NOT see the mount. Consequences:
- After controller container restart, the mount is gone
- Other containers can't access `/mnt/hdd_1`
- The bind mount `- /mnt:/mnt:rw` shares existing host mounts INTO the container,
  but with Docker's default `rprivate` propagation, new mounts created inside the
  container don't propagate BACK to the host

### Fix: Change `/mnt` volume to use `rshared` mount propagation

#### 3a. `controller/docker-compose.yml`

Find this line:

```yaml
    # All external storage — /mnt/* for multi-storage + restore
    - /mnt:/mnt:rw
```

Replace with:

```yaml
    # All external storage — rshared propagation so mounts created inside
    # the container (disk init) propagate to the host and vice versa
    - type: bind
      source: /mnt
      target: /mnt
      bind:
        propagation: rshared
```

**Important:** This uses Docker Compose long-form volume syntax. The rest of the volumes
can stay in short form. Only `/mnt` needs propagation.

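Whether a mount point is actually shared can be read from `/proc/self/mountinfo`, where shared mounts carry a `shared:N` optional field. The sketch below assumes the line layout described in proc(5); `isShared` is a hypothetical helper and the sample lines are made up for illustration.

```go
package main

import (
    "fmt"
    "strings"
)

// isShared reports whether a mountinfo line describes mountPoint and carries
// a "shared:N" optional field. Layout per proc(5):
// ID parentID major:minor root mountPoint options [optional...] - fstype source superOptions
func isShared(mountinfoLine, mountPoint string) bool {
    fields := strings.Fields(mountinfoLine)
    if len(fields) < 7 || fields[4] != mountPoint {
        return false
    }
    for _, f := range fields[6:] {
        if f == "-" { // separator: the optional fields end here
            break
        }
        if strings.HasPrefix(f, "shared:") {
            return true
        }
    }
    return false
}

func main() {
    shared := "36 25 8:1 / /mnt rw,relatime shared:1 - ext4 /dev/sda1 rw"
    private := "36 25 8:1 / /mnt rw,relatime - ext4 /dev/sda1 rw"
    fmt.Println(isShared(shared, "/mnt"), isShared(private, "/mnt"))
}
```

A check like this only finds a line if the path is itself a mount point; for a plain directory the relevant line is the one for the filesystem that contains it.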
#### 3b. `scripts/docker-setup.sh` — Add mount propagation setup

Find the section where the script does final setup steps (after Docker installation,
before or after compose generation). Add:

```bash
# Enable shared mount propagation on /mnt (required for controller disk init)
# This allows mounts created inside the controller container to propagate to the host
log_info "Configuring mount propagation for /mnt..."
mount --make-rshared /mnt 2>/dev/null || mount --make-shared /mnt 2>/dev/null || true
```

**Also** add a comment near the controller compose generation (if any) explaining this requirement.

If `docker-setup.sh` doesn't generate the controller compose, just add the `mount --make-rshared`
to the node preparation section. It's idempotent and safe to run multiple times.

Note that `mount --make-rshared` only succeeds when `/mnt` is itself a mount point. If `/mnt`
is a plain directory, the command fails and the error is hidden by `2>/dev/null ... || true`;
propagation then depends on the filesystem that contains `/mnt` (on systemd hosts `/` is
normally shared already).

-## Quick verification after rebuild
-```bash
-docker exec felhom-controller which sfdisk mkfs.ext4 blkid mount lsblk
-# Should show paths for all five
-```


---

## Safety improvement 1: Post-mount verification

### What
After mount succeeds (exit code 0), verify that a filesystem is actually mounted at the mount path.

### Where
In `format_linux.go`, right after the mount command succeeds and BEFORE the
`send("mounting", "Csatlakoztatva..."` line, add:

```go
// Verify mount actually worked (don't just trust exit code).
// --mountpoint matches only a real mount point; --target would fall back to
// the parent filesystem and report success even when nothing is mounted here.
verifyOut, verifyErr := exec.Command("findmnt", "-n", "-o", "SOURCE", "--mountpoint", mountPath).Output()
if verifyErr != nil || strings.TrimSpace(string(verifyOut)) == "" {
    // Message: "The mount cannot be verified: the mount command succeeded,
    // but the drive is not visible in the system"
    return "", fail("mounting", "A csatlakoztatás nem ellenőrizhető: a mount parancs sikerült, de a meghajtó nem látható a rendszerben", fmt.Errorf("mount point %s not found after mount", mountPath))
}
```


---

## Safety improvement 2: Use ASCII mount name for ext4 filesystem label

### What
The current code uses `req.Label` (user-provided display label like "Külső HDD 1TB") for the
ext4 `-L` label. ext4 labels are limited to 16 BYTES. Hungarian UTF-8 chars (ű, ó, é) are
2 bytes each: "Külső HDD 1TB" is 13 characters but already 15 bytes, so a slightly longer
label exceeds the limit, and `label[:16]` can cut a character in half.

### Where
In `format_linux.go`, find the label preparation block (around line 10249–10254):

```go
label := req.Label
if label == "" {
    label = req.MountName
}
if len(label) > 16 {
    label = label[:16]
}
```

Replace with:

```go
// Use ASCII-safe mount name for ext4 filesystem label (16-byte limit).
// The display label (req.Label) stays in settings.json for the UI.
fsLabel := req.MountName
if len(fsLabel) > 16 {
    fsLabel = fsLabel[:16]
}
```

Then update the mkfs.ext4 call right below to use `fsLabel` instead of `label`:

```go
mkfsCmd := exec.Command("mkfs.ext4", "-L", fsLabel, "-F", HostDevicePath(partDev))
```

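Byte slicing is safe only while the mount name is pure ASCII. If that assumption ever stops holding, a truncation that backs up to a character boundary avoids emitting half a UTF-8 sequence. This is a minimal sketch; `truncateLabel` is a hypothetical helper, not existing controller code.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// truncateLabel shortens s to at most max bytes without splitting a
// multi-byte UTF-8 character.
func truncateLabel(s string, max int) string {
    if len(s) <= max {
        return s
    }
    cut := max
    // Step back until the byte at the cut position starts a new character.
    for cut > 0 && !utf8.RuneStart(s[cut]) {
        cut--
    }
    return s[:cut]
}

func main() {
    label := "Külső HDD 1TB"
    fmt.Println(len(label), utf8.RuneCountInString(label)) // byte count vs character count
    fmt.Println(truncateLabel("ááááááááá", 15))
}
```

The result may be shorter than the limit by a byte, which is the price of keeping the label valid UTF-8.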

---

## Safety improvement 3: Smart partition handling (skip repartition when unnecessary)

### What
The scan shows sdb has 1 partition (sdb1) with no filesystem. The JS always sends
`CreatePartition: true` (because `disk.CreatePartition` is undefined on the `BlockDevice`
struct, so `undefined !== false` evaluates to `true` in JS).

For a disk that already has exactly one partition with no filesystem, we should skip
the destructive repartition step and just format the existing partition directly.

### Where
In `handlers.go`, in `storageInitAPIHandler`, AFTER building `fmtReq` (around line 14175–14180)
and BEFORE the `go func()` goroutine, add:

```go
// Smart partition: if device is a whole disk with exactly 1 partition
// with no filesystem, skip repartitioning — just format existing partition
if fmtReq.CreatePartition {
    result, scanErr := storage.ScanDisks()
    if scanErr == nil {
        for _, disk := range result.AvailableDisks {
            if disk.Path == req.DevicePath && len(disk.Partitions) == 1 && disk.Partitions[0].FSType == "" {
                s.logger.Printf("[INFO] Disk %s has 1 empty partition (%s) — skipping repartition",
                    req.DevicePath, disk.Partitions[0].Path)
                fmtReq.DevicePath = disk.Partitions[0].Path // e.g., "/dev/sdb1"
                fmtReq.CreatePartition = false
                break
            }
        }
    }
}
```

This way, for demo sdb (which has sdb1 with no FS), it will:
1. Set DevicePath to `/dev/sdb1`
2. Set CreatePartition to `false`
3. Skip wipefs + sfdisk entirely
4. Go straight to `mkfs.ext4 /host-dev/sdb1`

**Note:** The wipefs+sfdisk fix (Bug 1) is still needed as fallback for truly
unpartitioned disks or disks with multiple/incompatible partitions.

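The decision itself can be isolated in a pure function, which makes it testable without touching a real disk. The sketch below is illustrative: `Partition`, `BlockDevice` and `reusablePartition` are hypothetical stand-ins that model only the fields the handler snippet reads (`Path`, `FSType`, `Partitions`).

```go
package main

import "fmt"

// Partition and BlockDevice are hypothetical stand-ins for the scan types;
// only the fields used by the handler snippet are modelled.
type Partition struct {
    Path   string
    FSType string
}

type BlockDevice struct {
    Path       string
    Partitions []Partition
}

// reusablePartition returns the path of the single filesystem-less partition
// on the disk at devicePath. It returns false when the disk is unknown or
// has any other layout, meaning the caller should repartition.
func reusablePartition(disks []BlockDevice, devicePath string) (string, bool) {
    for _, d := range disks {
        if d.Path != devicePath {
            continue
        }
        if len(d.Partitions) == 1 && d.Partitions[0].FSType == "" {
            return d.Partitions[0].Path, true
        }
        return "", false
    }
    return "", false
}

func main() {
    disks := []BlockDevice{
        {Path: "/dev/sdb", Partitions: []Partition{{Path: "/dev/sdb1"}}},
        {Path: "/dev/sdc", Partitions: []Partition{{Path: "/dev/sdc1", FSType: "ntfs"}}},
    }
    fmt.Println(reusablePartition(disks, "/dev/sdb"))
    fmt.Println(reusablePartition(disks, "/dev/sdc"))
}
```

A disk with an existing filesystem, several partitions, or no partitions at all falls through to the repartition path.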

---

## Safety improvement 4: Descriptive progress messages

### What
Include executed command details in progress messages for remote debugging.
The progress messages show in the UI and get logged by the handler.

### Where
Throughout `format_linux.go`, update the `send()` calls to include command info.
Examples already shown in the Bug 1 and Bug 2 fixes above. Also update:

For the mkfs step:
```go
send("formatting", fmt.Sprintf("mkfs.ext4 -L %s -F %s ...", fsLabel, HostDevicePath(partDev)), 30)
```

For the blkid step (around line 10274):
```go
send("mounting", fmt.Sprintf("UUID lekérése: blkid %s ...", HostDevicePath(partDev)), 65)
```

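Hand-written message strings can drift from the arguments that are actually executed. One possible way to avoid that, sketched here under the assumption that the same argument slice is then handed to `exec.Command`, is to render the message from that slice; `describeCmd` is a hypothetical helper.

```go
package main

import (
    "fmt"
    "strings"
)

// describeCmd renders a command and its arguments as a progress message,
// so the text shown in the UI matches what is about to be executed.
func describeCmd(name string, args ...string) string {
    return strings.Join(append([]string{name}, args...), " ") + " ..."
}

func main() {
    args := []string{"-L", "hdd_1", "-F", "/host-dev/sdb1"}
    fmt.Println(describeCmd("mkfs.ext4", args...))
    // The same slice would then be passed on: exec.Command("mkfs.ext4", args...)
}
```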

---

## Summary: All changes by file

### `controller/internal/storage/format_linux.go` (5 changes)

1. Partition block: Add `wipefs -a`; change sfdisk input `",,,L"` → `",,"`;
   add `--force --wipe always` flags
2. Mount block: Change `mount mountPath` → `mount -t ext4 -o defaults,noatime HostDevicePath(partDev) mountPath`
3. After mount: Add `findmnt` verification
4. Label: Use `req.MountName` (ASCII) instead of `req.Label` (UTF-8) for `mkfs.ext4 -L`
5. Progress messages: Include command details in `send()` calls

### `controller/docker-compose.yml` (1 change)

6. Change `/mnt:/mnt:rw` to long-form syntax with `propagation: rshared`

### `controller/internal/web/handlers.go` (1 change)

7. In `storageInitAPIHandler`: Add smart partition detection before launching goroutine

### `scripts/docker-setup.sh` (1 change)

8. Add `mount --make-rshared /mnt` to node preparation section


---

## Build & deploy procedure

```bash
# 1. On the host FIRST (before restarting controller):
sudo mount --make-rshared /mnt

# 2. Build new image with fixes (normal build process)

# 3. Deploy
cd /opt/docker/felhom-controller
sudo docker compose up -d

# 4. Verify container sees /host-dev
docker exec felhom-controller ls /host-dev/sd*

# 5. Verify rshared propagation is active
docker inspect felhom-controller --format '{{range .Mounts}}{{if eq .Destination "/mnt"}}Propagation={{.Propagation}}{{end}}{{end}}'
# Should show: Propagation=rshared

# 6. Test storage init wizard:
#    - Scan → sdb appears
#    - Select sdb → configure hdd_1 → type FORMÁZÁS
#    - Watch progress panel — should show command details
#    - Should complete successfully

# 7. Verify mount on HOST (proves propagation):
findmnt /mnt/hdd_1
# Should show /dev/sdb1 mounted at /mnt/hdd_1

# 8. Verify fstab entry:
grep hdd_1 /etc/fstab
# Should show UUID=... /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2

# 9. Verify storage registered in settings:
# Visit Settings page → Adattárolók → /mnt/hdd_1 should appear

# 10. Restart controller — verify mount survives:
docker restart felhom-controller
docker exec felhom-controller ls /mnt/hdd_1/
# Should show: storage/ Dokumentumok/
```


---

## What NOT to change

- **Dockerfile** — packages already correct (fdisk, e2fsprogs, util-linux, rsync, parted)
- **scan_linux.go** — scan works correctly after v0.11.1 fixes
- **safety_linux.go / safety.go** — system disk detection works
- **Template/JS** — wizard UI works fine; `CreatePartition` default-true is handled in handler