TASK: Bugfix — Storage Initialization (FormatAndMount)

Version: 0.11.4 → 0.11.5

Priority: Fix all 3 bugs + add safety improvements before testing format again.

Context

Storage initialization wizard (scan → select disk → format → mount → register) works up to the partitioning step. Three bugs prevent completion. Fix all three in one pass.

Test environment: demo-felhom.eu, /dev/sdb = 931.5 GB USB HDD (HD710 PRO), has existing GPT partition table with one partition sdb1 (no filesystem).


Bug 1: sfdisk fails with "unsupported command" (CURRENT BLOCKER)

Error output

Old situation: Device Start End Sectors Size Type
/host-dev/sdb1 2048 1953523711 1953521664 931.5G Linux filesystem
>>> Script header accepted.
>>> line 2: unsupported command
Hiba: exit status 1

Root cause

Three issues in format_linux.go, around line 10225:

sfdiskInput := "label: gpt\n,,,L\n"
cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
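  1. ,,,L is rejected on this GPT disk. sfdisk script lines use the field order start, size, type, bootable, so the three commas likely put L in the fourth (bootable) field instead of the type field. For GPT, give sfdisk the full type GUID or leave the type empty (it then defaults to Linux filesystem).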
  2. No --force flag — sdb already has a GPT table with sdb1. sfdisk tries to apply the script as a delta to the existing layout, not as a fresh layout.
  3. No wipefs before sfdisk — existing partition signatures confuse sfdisk.

Fix in controller/internal/storage/format_linux.go

Find this block (around lines 10222-10230):

	if req.CreatePartition {
		send("partitioning", "Partíció létrehozása...", 15)

		sfdiskInput := "label: gpt\n,,,L\n"
		cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
		cmd.Stdin = strings.NewReader(sfdiskInput)
		if out, err := cmd.CombinedOutput(); err != nil {
			return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
		}

Replace with:

	if req.CreatePartition {
		send("partitioning", fmt.Sprintf("wipefs -a %s ...", HostDevicePath(req.DevicePath)), 12)

		// Wipe existing partition table and filesystem signatures first
		_ = exec.Command("wipefs", "-a", HostDevicePath(req.DevicePath)).Run()
		time.Sleep(500 * time.Millisecond)

		// Create GPT with single partition spanning whole disk
		// ",," = start=default, size=default(fill disk), type=default(Linux filesystem GUID)
		// --force: overwrite even if device appears busy
		// --wipe always: wipe filesystem signatures from newly created partitions
		send("partitioning", fmt.Sprintf("sfdisk --force --wipe always %s ...", HostDevicePath(req.DevicePath)), 15)
		sfdiskInput := "label: gpt\n,,\n"
		cmd := exec.Command("sfdisk", "--force", "--wipe", "always", HostDevicePath(req.DevicePath))
		cmd.Stdin = strings.NewReader(sfdiskInput)
		if out, err := cmd.CombinedOutput(); err != nil {
			return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
		}

Bug 2: mount mountPath will fail (NEXT BLOCKER after Bug 1)

Current code (around lines 10288-10290)

	if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {
		return "", fail("mounting", "Csatlakoztatás sikertelen: "+string(out), err)
	}

Root cause

mount /mnt/hdd_1 works by looking up /mnt/hdd_1 in the process's /etc/fstab to find which device to mount. But inside the container, /etc/fstab is Docker's auto-generated fstab (not the host's). The UUID entry was written to /host-fstab (the host's real fstab).

So mount /mnt/hdd_1 will fail with "can't find /mnt/hdd_1 in /etc/fstab" or similar.
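
To make the lookup failure concrete, here is a minimal sketch of the fstab search that a one-argument mount call depends on. The function fstabSource and the sample file contents are assumptions for illustration (the UUID is a placeholder); real mount(8) parsing handles more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// fstabSource returns the device field of the fstab entry whose mount point
// equals mountPath, roughly what "mount <dir>" has to find before it can act.
func fstabSource(fstab, mountPath string) (string, bool) {
	for _, line := range strings.Split(fstab, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and comments
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[1] == mountPath {
			return fields[0], true
		}
	}
	return "", false
}

func main() {
	// Illustrative contents only.
	containerFstab := "# /etc/fstab inside the container\n"
	hostFstab := "UUID=0000-EXAMPLE /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2\n"

	_, inContainer := fstabSource(containerFstab, "/mnt/hdd_1")
	src, onHost := fstabSource(hostFstab, "/mnt/hdd_1")
	fmt.Println(inContainer, onHost, src) // false true UUID=0000-EXAMPLE
}
```

The entry exists only in the host file, which is why the container-side call needs the device path passed explicitly.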

Fix in controller/internal/storage/format_linux.go

Find this line (around line 10288):

	if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {

Replace with:

	// Mount by device path explicitly — container's /etc/fstab != host fstab,
	// so "mount /mnt/hdd_1" (fstab lookup) won't work.
	send("mounting", fmt.Sprintf("mount -t ext4 %s %s ...", HostDevicePath(partDev), mountPath), 70)
	if out, err := exec.Command("mount", "-t", "ext4", "-o", "defaults,noatime",
		HostDevicePath(partDev), mountPath).CombinedOutput(); err != nil {

The fstab entry in /host-fstab still ensures persistence across host reboots. This explicit mount handles the immediate "mount it right now" operation.


Bug 3: Mount namespace isolation — mount won't be visible on host (RESTART BLOCKER)

Root cause

Even with privileged: true, mount inside a container operates in the container's mount namespace. The mount does NOT appear in the host's mount namespace. Consequences:

  • After controller container restart, the mount is gone
  • Other containers can't access /mnt/hdd_1
  • The bind mount - /mnt:/mnt:rw shares existing host mounts INTO the container, but new mounts created inside the container don't propagate BACK to the host
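
Whether a mount point participates in shared propagation can be read from /proc/self/mountinfo, where shared mounts carry a shared:N optional field. The sketch below parses text in that format; isSharedMount and the sample lines are assumptions for illustration, not controller code.

```go
package main

import (
	"fmt"
	"strings"
)

// isSharedMount reports whether mountPoint appears in mountinfo-formatted
// text with a "shared:N" optional field. If the mount point is listed more
// than once (stacked mounts), any shared entry counts.
func isSharedMount(mountinfo, mountPoint string) bool {
	for _, line := range strings.Split(mountinfo, "\n") {
		fields := strings.Fields(line)
		// Fields: ID, parent ID, major:minor, root, mount point, options,
		// then zero or more optional fields terminated by a lone "-".
		if len(fields) < 7 || fields[4] != mountPoint {
			continue
		}
		for _, f := range fields[6:] {
			if f == "-" {
				break
			}
			if strings.HasPrefix(f, "shared:") {
				return true
			}
		}
	}
	return false
}

func main() {
	sample := "36 25 8:17 / /mnt rw,relatime shared:12 - ext4 /dev/sda1 rw\n" +
		"40 25 8:33 / /data rw,relatime - ext4 /dev/sdc1 rw\n"
	fmt.Println(isSharedMount(sample, "/mnt"), isSharedMount(sample, "/data")) // true false
}
```

In practice the input would come from os.ReadFile("/proc/self/mountinfo") on the host or inside the container.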

Fix: Change /mnt volume to use rshared mount propagation

3a. controller/docker-compose.yml

Find this line:

      # All external storage — /mnt/* for multi-storage + restore
      - /mnt:/mnt:rw

Replace with:

      # All external storage — rshared propagation so mounts created inside
      # the container (disk init) propagate to the host and vice versa
      - type: bind
        source: /mnt
        target: /mnt
        bind:
          propagation: rshared

Important: This uses Docker Compose long-form volume syntax. The rest of the volumes can stay in short form. Only /mnt needs propagation.

3b. scripts/docker-setup.sh — Add mount propagation setup

Find the section where the script does final setup steps (after Docker installation, before or after compose generation). Add:

# Enable shared mount propagation on /mnt (required for controller disk init)
# This allows mounts created inside the controller container to propagate to the host
log_info "Configuring mount propagation for /mnt..."
mount --make-rshared /mnt 2>/dev/null || mount --make-shared /mnt 2>/dev/null || true

Also add a comment near the controller compose generation (if any) explaining this requirement.

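If docker-setup.sh doesn't generate the controller compose, just add the mount --make-rshared to the node preparation section. It's idempotent and safe to run multiple times. Note that mount --make-rshared only succeeds when /mnt is itself a mount point; where /mnt is a plain directory the command fails, the trailing || true hides that, and propagation then depends on the parent mount (/) already being shared.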


Safety improvement 1: Post-mount verification

What

After mount succeeds (exit code 0), verify the mount is actually visible.

Where

In format_linux.go, right after the mount command succeeds and BEFORE the send("mounting", "Csatlakoztatva..." line, add:

	// Verify mount actually worked (don't just trust exit code).
	// --mountpoint matches only a real mount at mountPath; --target would fall
	// back to the filesystem that contains the path and always return a source.
	verifyOut, verifyErr := exec.Command("findmnt", "-n", "-o", "SOURCE", "--mountpoint", mountPath).Output()
	if verifyErr != nil || strings.TrimSpace(string(verifyOut)) == "" {
		return "", fail("mounting", "A csatlakoztatás nem ellenőrizhető: a mount parancs sikerült, de a meghajtó nem látható a rendszerben", fmt.Errorf("mount point %s not found after mount", mountPath))
	}
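
A stricter variant, sketched here under assumptions, would also compare the reported source with the device that was just mounted. checkMountSource is a hypothetical helper, and the exact source string findmnt reports inside the container (for example /host-dev/sdb1 versus /dev/sdb1) is an assumption to confirm on the test box.

```go
package main

import (
	"fmt"
	"strings"
)

// checkMountSource compares findmnt's SOURCE output with the expected device.
// It returns nil only when something is mounted and it is the right device.
func checkMountSource(findmntOut, wantDev string) error {
	got := strings.TrimSpace(findmntOut)
	if got == "" {
		return fmt.Errorf("nothing mounted at the target")
	}
	if got != wantDev {
		return fmt.Errorf("mounted source is %q, expected %q", got, wantDev)
	}
	return nil
}

func main() {
	fmt.Println(checkMountSource("/host-dev/sdb1\n", "/host-dev/sdb1"))
	fmt.Println(checkMountSource("overlay\n", "/host-dev/sdb1"))
	fmt.Println(checkMountSource("", "/host-dev/sdb1"))
}
```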

Safety improvement 2: Use ASCII mount name for ext4 filesystem label

What

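The current code uses req.Label (user-provided display label like "Külső HDD 1TB") for the ext4 -L label. ext4 labels are limited to 16 BYTES. Accented Hungarian characters (ü, ő, ó, é) take 2 bytes each in UTF-8, so "Külső HDD 1TB" is 13 characters but already 15 bytes, and a slightly longer label exceeds the limit. The byte slice label[:16] can then cut a character in half and produce an invalid label.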

Where

In format_linux.go, find the label preparation block (around lines 10249-10254):

	label := req.Label
	if label == "" {
		label = req.MountName
	}
	if len(label) > 16 {
		label = label[:16]
	}

Replace with:

	// Use ASCII-safe mount name for ext4 filesystem label (16-byte limit).
	// The display label (req.Label) stays in settings.json for the UI.
	fsLabel := req.MountName
	if len(fsLabel) > 16 {
		fsLabel = fsLabel[:16]
	}

Then update the mkfs.ext4 call right below to use fsLabel instead of label:

	mkfsCmd := exec.Command("mkfs.ext4", "-L", fsLabel, "-F", HostDevicePath(partDev))

Safety improvement 3: Smart partition handling (skip repartition when unnecessary)

What

The scan shows sdb has 1 partition (sdb1) with no filesystem. The JS always sends CreatePartition: true (because disk.CreatePartition is undefined on the BlockDevice struct, so undefined !== false evaluates to true in JS).

For a disk that already has exactly one partition with no filesystem, we should skip the destructive repartition step and just format the existing partition directly.

Where

In handlers.go, in storageInitAPIHandler, AFTER building fmtReq (around lines 14175-14180) and BEFORE the go func() goroutine, add:

	// Smart partition: if device is a whole disk with exactly 1 partition
	// with no filesystem, skip repartitioning — just format existing partition
	if fmtReq.CreatePartition {
		result, scanErr := storage.ScanDisks()
		if scanErr == nil {
			for _, disk := range result.AvailableDisks {
				if disk.Path == req.DevicePath && len(disk.Partitions) == 1 && disk.Partitions[0].FSType == "" {
					s.logger.Printf("[INFO] Disk %s has 1 empty partition (%s) — skipping repartition",
						req.DevicePath, disk.Partitions[0].Path)
					fmtReq.DevicePath = disk.Partitions[0].Path // e.g., "/dev/sdb1"
					fmtReq.CreatePartition = false
					break
				}
			}
		}
	}

This way, for demo sdb (which has sdb1 with no FS), it will:

  1. Set DevicePath to /dev/sdb1
  2. Set CreatePartition to false
  3. Skip wipefs + sfdisk entirely
  4. Go straight to mkfs.ext4 /host-dev/sdb1

Note: The wipefs+sfdisk fix (Bug 1) is still needed as fallback for truly unpartitioned disks or disks with multiple/incompatible partitions.
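
The decision described above can also be expressed as a pure function, which makes both branches easy to test without touching a disk. The Disk and Partition types below are simplified stand-ins for the scan result, and formatTarget is a hypothetical name; only the condition itself comes from the handler snippet.

```go
package main

import "fmt"

// Partition and Disk are simplified stand-ins for the scan result types.
type Partition struct {
	Path   string
	FSType string
}

type Disk struct {
	Path       string
	Partitions []Partition
}

// formatTarget returns the device to format and whether to repartition first.
// A disk with exactly one partition and no filesystem is formatted in place.
func formatTarget(disks []Disk, devicePath string) (target string, createPartition bool) {
	for _, d := range disks {
		if d.Path == devicePath && len(d.Partitions) == 1 && d.Partitions[0].FSType == "" {
			return d.Partitions[0].Path, false
		}
	}
	return devicePath, true
}

func main() {
	disks := []Disk{
		{Path: "/dev/sdb", Partitions: []Partition{{Path: "/dev/sdb1"}}},
		{Path: "/dev/sdc"},
	}
	fmt.Println(formatTarget(disks, "/dev/sdb")) // /dev/sdb1 false
	fmt.Println(formatTarget(disks, "/dev/sdc")) // /dev/sdc true
}
```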


Safety improvement 4: Descriptive progress messages

What

Include executed command details in progress messages for remote debugging. The progress messages show in the UI and get logged by the handler.

Where

Throughout format_linux.go, update the send() calls to include command info. Examples already shown in the Bug 1 and Bug 2 fixes above. Also update:

For the mkfs step:

	send("formatting", fmt.Sprintf("mkfs.ext4 -L %s -F %s ...", fsLabel, HostDevicePath(partDev)), 30)

For the blkid step (around line 10274):

	send("mounting", fmt.Sprintf("UUID lekérése: blkid %s ...", HostDevicePath(partDev)), 65)
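
One way to keep the progress text and the executed command from drifting apart, sketched under assumptions, is to build the message from the same name and argument slice that is later handed to exec.Command. describeCmd is a hypothetical helper, not part of the existing code.

```go
package main

import (
	"fmt"
	"strings"
)

// describeCmd renders a command and its arguments as a progress message,
// with a trailing " ..." in the style of the send() examples.
func describeCmd(name string, args ...string) string {
	parts := append([]string{name}, args...)
	return strings.Join(parts, " ") + " ..."
}

func main() {
	args := []string{"-L", "hdd_1", "-F", "/host-dev/sdb1"}
	fmt.Println(describeCmd("mkfs.ext4", args...)) // mkfs.ext4 -L hdd_1 -F /host-dev/sdb1 ...
}
```

The same args slice would then go to exec.Command("mkfs.ext4", args...), so the message always reflects what actually ran.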

Summary: All changes by file

controller/internal/storage/format_linux.go (5 changes)

  1. Partition block: Add wipefs -a; change sfdisk input ",,,L" → ",,"; add --force --wipe always flags
  2. Mount block: Change mount mountPath → mount -t ext4 -o defaults,noatime HostDevicePath(partDev) mountPath
  3. After mount: Add findmnt verification
  4. Label: Use req.MountName (ASCII) instead of req.Label (UTF-8) for mkfs.ext4 -L
  5. Progress messages: Include command details in send() calls

controller/docker-compose.yml (1 change)

  1. Change /mnt:/mnt:rw to long-form syntax with propagation: rshared

controller/internal/web/handlers.go (1 change)

  1. In storageInitAPIHandler: Add smart partition detection before launching goroutine

scripts/docker-setup.sh (1 change)

  1. Add mount --make-rshared /mnt to node preparation section

Build & deploy procedure

# 1. On the host FIRST (before restarting controller):
sudo mount --make-rshared /mnt

# 2. Build new image with fixes (normal build process)

# 3. Deploy
cd /opt/docker/felhom-controller
sudo docker compose up -d

# 4. Verify container sees /host-dev
docker exec felhom-controller ls /host-dev/sd*

# 5. Verify rshared propagation is active
docker inspect felhom-controller --format '{{range .Mounts}}{{if eq .Destination "/mnt"}}Propagation={{.Propagation}}{{end}}{{end}}'
# Should show: Propagation=rshared

# 6. Test storage init wizard:
#    - Scan → sdb appears
#    - Select sdb → configure hdd_1 → type FORMÁZÁS
#    - Watch progress panel — should show command details
#    - Should complete successfully

# 7. Verify mount on HOST (proves propagation):
findmnt /mnt/hdd_1
# Should show /dev/sdb1 mounted at /mnt/hdd_1

# 8. Verify fstab entry:
grep hdd_1 /etc/fstab
# Should show UUID=... /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2

# 9. Verify storage registered in settings:
# Visit Settings page → Adattárolók → /mnt/hdd_1 should appear

# 10. Restart controller — verify mount survives:
docker restart felhom-controller
docker exec felhom-controller ls /mnt/hdd_1/
# Should show: storage/ Dokumentumok/

What NOT to change

  • Dockerfile — packages already correct (fdisk, e2fsprogs, util-linux, rsync, parted)
  • scan_linux.go — scan works correctly after v0.11.1 fixes
  • safety_linux.go / safety.go — system disk detection works
  • Template/JS — wizard UI works fine; CreatePartition default-true is handled in handler