# TASK: Bugfix — Storage Initialization (FormatAndMount)
**Version:** 0.11.4 → 0.11.5
**Priority:** Fix all 3 bugs + add safety improvements before testing format again.
## Context
Storage initialization wizard (scan → select disk → format → mount → register) works
up to the partitioning step. Three bugs prevent completion. Fix all three in one pass.
**Test environment:** demo-felhom.eu, `/dev/sdb` = 931.5 GB USB HDD (HD710 PRO),
has existing GPT partition table with one partition `sdb1` (no filesystem).
---
## Bug 1: sfdisk fails with "unsupported command" (CURRENT BLOCKER)
### Error output
```
Old situation: Device Start End Sectors Size Type
/host-dev/sdb1 2048 1953523711 1953521664 931.5G Linux filesystem
>>> Script header accepted.
>>> line 2: unsupported command
Hiba: exit status 1
```
### Root cause
Three issues in `format_linux.go` (around line 10225):
```go
sfdiskInput := "label: gpt\n,,,L\n"
cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
```
1. **`,,,L` type shorthand fails on GPT**: this sfdisk version rejects the line as an
"unsupported command". In sfdisk's comma-separated format (`start,size,type,bootable`) the
type is the third field, so the `L` in `,,,L` actually lands in the fourth (bootable) position.
On GPT, leaving the type empty defaults to Linux filesystem; a full type GUID also works.
2. **No `--force` flag**: sdb already has a GPT table with sdb1. sfdisk tries to apply the
script as a delta to the existing layout, not as a fresh layout.
3. **No `wipefs` before sfdisk**: existing partition signatures confuse sfdisk.
### Fix in `controller/internal/storage/format_linux.go`
Find this block (around lines 10222-10230):
```go
if req.CreatePartition {
	send("partitioning", "Partíció létrehozása...", 15)
	sfdiskInput := "label: gpt\n,,,L\n"
	cmd := exec.Command("sfdisk", HostDevicePath(req.DevicePath))
	cmd.Stdin = strings.NewReader(sfdiskInput)
	if out, err := cmd.CombinedOutput(); err != nil {
		return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
	}
```
Replace with:
```go
if req.CreatePartition {
	send("partitioning", fmt.Sprintf("wipefs -a %s ...", HostDevicePath(req.DevicePath)), 12)
	// Wipe existing partition table and filesystem signatures first
	_ = exec.Command("wipefs", "-a", HostDevicePath(req.DevicePath)).Run()
	time.Sleep(500 * time.Millisecond)
	// Create GPT with single partition spanning whole disk
	// ",," = start=default, size=default(fill disk), type=default(Linux filesystem GUID)
	// --force: overwrite even if device appears busy
	// --wipe always: wipe filesystem signatures from newly created partitions
	send("partitioning", fmt.Sprintf("sfdisk --force --wipe always %s ...", HostDevicePath(req.DevicePath)), 15)
	sfdiskInput := "label: gpt\n,,\n"
	cmd := exec.Command("sfdisk", "--force", "--wipe", "always", HostDevicePath(req.DevicePath))
	cmd.Stdin = strings.NewReader(sfdiskInput)
	if out, err := cmd.CombinedOutput(); err != nil {
		return "", fail("partitioning", "Partícionálás sikertelen: "+string(out), err)
	}
```
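The partitioning arguments can also be built by a small pure function, which makes them easy to unit-test and allows a dry run before touching the disk. The sketch below is an illustration only: `sfdiskArgs` and `gptSinglePartitionScript` are hypothetical helper names that are not part of the codebase, and the dry-run path assumes sfdisk's `--no-act` flag is acceptable for this workflow.

```go
package main

import (
	"fmt"
	"strings"
)

// gptSinglePartitionScript returns the sfdisk script used in the fix above:
// a fresh GPT label plus one partition with default start, size and type.
func gptSinglePartitionScript() string {
	return "label: gpt\n,,\n"
}

// sfdiskArgs (hypothetical helper) builds the sfdisk argument list.
// With dryRun set, --no-act lets sfdisk process the script without writing.
func sfdiskArgs(hostDev string, dryRun bool) []string {
	args := []string{"--force", "--wipe", "always"}
	if dryRun {
		args = append(args, "--no-act")
	}
	return append(args, hostDev)
}

func main() {
	// Print the command instead of running it; /host-dev/sdb is the test disk.
	fmt.Printf("sfdisk %s\n", strings.Join(sfdiskArgs("/host-dev/sdb", true), " "))
	fmt.Printf("stdin: %q\n", gptSinglePartitionScript())
}
```

Running the dry-run variant first would surface script syntax errors such as the `,,,L` one without modifying the partition table.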
---
## Bug 2: `mount mountPath` will fail (NEXT BLOCKER after Bug 1)
### Current code (around lines 10288-10290)
```go
if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {
	return "", fail("mounting", "Csatlakoztatás sikertelen: "+string(out), err)
}
```
### Root cause
`mount /mnt/hdd_1` works by looking up `/mnt/hdd_1` in the process's `/etc/fstab` to find
which device to mount. But inside the container, `/etc/fstab` is the container image's own
file, not the host's. The UUID entry was written to `/host-fstab` (the host's real fstab).
So `mount /mnt/hdd_1` will fail with "can't find /mnt/hdd_1 in /etc/fstab" or similar.
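The lookup that `mount <dir>` depends on can be illustrated with a few lines of Go. This is only a sketch of the idea, not the controller's code: `fstabDeviceFor` is a hypothetical helper, and both fstab contents in `main` are made-up sample data.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// fstabDeviceFor returns the device (first field) of the fstab entry whose
// mount point (second field) equals mountPoint, mimicking the lookup that
// "mount <dir>" performs. Comment and blank lines are skipped.
func fstabDeviceFor(fstab, mountPoint string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(fstab))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[1] == mountPoint {
			return fields[0], true
		}
	}
	return "", false
}

func main() {
	// Made-up sample contents: the container's fstab has no hdd_1 entry,
	// while the host fstab (seen as /host-fstab) does.
	containerFstab := "# container image fstab\n/dev/cdrom /media/cdrom iso9660 noauto,ro 0 0\n"
	hostFstab := "UUID=1234-abcd /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2\n"

	_, inContainer := fstabDeviceFor(containerFstab, "/mnt/hdd_1")
	dev, inHost := fstabDeviceFor(hostFstab, "/mnt/hdd_1")
	fmt.Println("found in container fstab:", inContainer)
	fmt.Println("found in host fstab:", inHost, dev)
}
```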
### Fix in `controller/internal/storage/format_linux.go`
Find this line (around line 10288):
```go
if out, err := exec.Command("mount", mountPath).CombinedOutput(); err != nil {
```
Replace with:
```go
// Mount by device path explicitly: the container's /etc/fstab is not the host fstab,
// so "mount /mnt/hdd_1" (fstab lookup) won't work.
send("mounting", fmt.Sprintf("mount -t ext4 %s %s ...", HostDevicePath(partDev), mountPath), 70)
if out, err := exec.Command("mount", "-t", "ext4", "-o", "defaults,noatime",
	HostDevicePath(partDev), mountPath).CombinedOutput(); err != nil {
```
The fstab entry in `/host-fstab` still ensures persistence across host reboots.
This explicit mount handles the immediate "mount it right now" operation.
---
## Bug 3: Mount namespace isolation — mount won't be visible on host (RESTART BLOCKER)
### Root cause
Even with `privileged: true`, `mount` inside a container operates in the container's
mount namespace. With Docker's default (private) propagation, the new mount never appears
in the host's mount namespace. Consequences:
- After a controller container restart, the mount is gone
- Other containers can't access `/mnt/hdd_1`
- The bind mount `- /mnt:/mnt:rw` exposes existing host mounts inside the container,
but new mounts created inside the container don't propagate back to the host
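Whether a mount point is shared can be read from `/proc/self/mountinfo`, where shared mounts carry a `shared:N` optional field and slave mounts a `master:N` field. The following is a minimal sketch under that assumption; `propagationOf` is a hypothetical helper, not existing controller code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// propagationOf scans mountinfo-formatted text for mountPoint and reports
// "shared", "slave" or "private". Optional fields start at index 6 and end
// at the "-" separator. If the mount point occurs more than once (stacked
// mounts), the last entry wins.
func propagationOf(mountinfo, mountPoint string) (string, bool) {
	result, found := "", false
	for _, line := range strings.Split(mountinfo, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 7 || fields[4] != mountPoint {
			continue
		}
		result, found = "private", true
		for _, f := range fields[6:] {
			if f == "-" {
				break
			}
			if strings.HasPrefix(f, "shared:") {
				result = "shared"
				break
			}
			if strings.HasPrefix(f, "master:") {
				result = "slave"
			}
		}
	}
	return result, found
}

func main() {
	data, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		fmt.Println("cannot read mountinfo:", err)
		return
	}
	if p, ok := propagationOf(string(data), "/mnt"); ok {
		fmt.Println("/mnt propagation:", p)
	} else {
		fmt.Println("/mnt is not a mount point in this namespace")
	}
}
```

Run inside the controller container, a result other than `shared` for `/mnt` would suggest that mounts created by the wizard will stay invisible to the host.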
### Fix: Change `/mnt` volume to use `rshared` mount propagation
#### 3a. `controller/docker-compose.yml`
Find this line:
```yaml
# All external storage — /mnt/* for multi-storage + restore
- /mnt:/mnt:rw
```
Replace with:
```yaml
# All external storage: rshared propagation so mounts created inside
# the container (disk init) propagate to the host and vice versa
- type: bind
  source: /mnt
  target: /mnt
  bind:
    propagation: rshared
```
**Important:** This uses Docker Compose long-form volume syntax. The rest of the volumes
can stay in short form. Only `/mnt` needs propagation.
#### 3b. `scripts/docker-setup.sh` — Add mount propagation setup
Find the section where the script does final setup steps (after Docker installation,
before or after compose generation). Add:
```bash
# Enable shared mount propagation on /mnt (required for controller disk init)
# This allows mounts created inside the controller container to propagate to the host
log_info "Configuring mount propagation for /mnt..."
mount --make-rshared /mnt 2>/dev/null || mount --make-shared /mnt 2>/dev/null || true
```
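**Also** add a comment near the controller compose generation (if any) explaining this requirement.
If `docker-setup.sh` doesn't generate the controller compose, just add the `mount --make-rshared`
line to the node preparation section. It's idempotent and safe to run multiple times.
Note that `mount --make-rshared` only succeeds when `/mnt` is itself a mount point; on a plain
directory it fails (silently here, because of `|| true`). The propagation flag is also runtime
state, so it has to be reapplied after a host reboot.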
---
## Safety improvement 1: Post-mount verification
### What
After mount succeeds (exit code 0), verify the mount is actually visible.
### Where
In `format_linux.go`, right after the mount command succeeds and BEFORE the
`send("mounting", "Csatlakoztatva..."` line, add:
```go
// Verify mount actually worked (don't just trust exit code)
verifyOut, verifyErr := exec.Command("findmnt", "-n", "-o", "SOURCE", "--target", mountPath).Output()
if verifyErr != nil || strings.TrimSpace(string(verifyOut)) == "" {
	return "", fail("mounting", "A csatlakoztatás nem ellenőrizhető: a mount parancs sikerült, de a meghajtó nem látható a rendszerben", fmt.Errorf("mount point %s not found after mount", mountPath))
}
```
---
## Safety improvement 2: Use ASCII mount name for ext4 filesystem label
### What
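The current code uses `req.Label` (user-provided display label like "Külső HDD 1TB") for the
ext4 `-L` label. ext4 labels are limited to 16 bytes, not 16 characters. Accented Hungarian
characters (ü, ő, ó, é) take 2 bytes each in UTF-8: "Külső HDD 1TB" is 13 characters but
15 bytes, so a slightly longer label exceeds the limit, and the byte slice `label[:16]` can cut
a character in half and leave invalid UTF-8.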
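The byte-versus-character problem can be reproduced in isolation. The sketch below is not the fix chosen for this task (which switches to the ASCII mount name); it only shows how a byte-limited truncation could avoid splitting a multi-byte character. `truncateUTF8` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 shortens s to at most maxBytes bytes without cutting a
// multi-byte UTF-8 character: it backs up from the byte limit until it
// reaches the first byte of a character.
func truncateUTF8(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	label := "Mentések külső HDD" // byte 16 falls in the middle of "ő"
	naive := label[:16]
	safe := truncateUTF8(label, 16)
	fmt.Printf("naive: valid UTF-8 = %v\n", utf8.ValidString(naive))
	fmt.Printf("safe:  %q (%d bytes), valid UTF-8 = %v\n", safe, len(safe), utf8.ValidString(safe))
}
```

For this input the safe variant stops one byte early, at "Mentések küls" (15 bytes), while the naive slice ends on half of the "ő".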
### Where
In `format_linux.go`, find the label preparation block (around lines 10249-10254):
```go
label := req.Label
if label == "" {
	label = req.MountName
}
if len(label) > 16 {
	label = label[:16]
}
```
Replace with:
```go
// Use ASCII-safe mount name for ext4 filesystem label (16-byte limit).
// The display label (req.Label) stays in settings.json for the UI.
fsLabel := req.MountName
if len(fsLabel) > 16 {
	fsLabel = fsLabel[:16]
}
```
Then update the mkfs.ext4 call right below to use `fsLabel` instead of `label`:
```go
mkfsCmd := exec.Command("mkfs.ext4", "-L", fsLabel, "-F", HostDevicePath(partDev))
```
---
## Safety improvement 3: Smart partition handling (skip repartition when unnecessary)
### What
The scan shows sdb has 1 partition (sdb1) with no filesystem. The JS always sends
`CreatePartition: true` (because `disk.CreatePartition` is undefined on the `BlockDevice`
struct, so `undefined !== false` evaluates to `true` in JS).
For a disk that already has exactly one partition with no filesystem, we should skip
the destructive repartition step and just format the existing partition directly.
### Where
In `handlers.go`, in `storageInitAPIHandler`, AFTER building `fmtReq` (around lines 14175-14180)
and BEFORE the `go func()` goroutine, add:
```go
// Smart partition: if device is a whole disk with exactly 1 partition
// with no filesystem, skip repartitioning and just format the existing partition
if fmtReq.CreatePartition {
	result, scanErr := storage.ScanDisks()
	if scanErr == nil {
		for _, disk := range result.AvailableDisks {
			if disk.Path == req.DevicePath && len(disk.Partitions) == 1 && disk.Partitions[0].FSType == "" {
				s.logger.Printf("[INFO] Disk %s has 1 empty partition (%s), skipping repartition",
					req.DevicePath, disk.Partitions[0].Path)
				fmtReq.DevicePath = disk.Partitions[0].Path // e.g., "/dev/sdb1"
				fmtReq.CreatePartition = false
				break
			}
		}
	}
}
```
This way, for demo sdb (which has sdb1 with no FS), it will:
1. Set DevicePath to `/dev/sdb1`
2. Set CreatePartition to `false`
3. Skip wipefs + sfdisk entirely
4. Go straight to `mkfs.ext4 /host-dev/sdb1`
**Note:** The wipefs+sfdisk fix (Bug 1) is still needed as fallback for truly
unpartitioned disks or disks with multiple/incompatible partitions.
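The decision itself can be isolated into a pure function, which keeps it testable without a real disk scan. This is a sketch only: `Disk`, `Partition` and `formatTarget` are simplified, hypothetical stand-ins for the real scan types and handler logic.

```go
package main

import "fmt"

// Partition and Disk are simplified stand-ins for the scan result types;
// only the fields used by the check are modelled.
type Partition struct {
	Path   string
	FSType string
}

type Disk struct {
	Path       string
	Partitions []Partition
}

// formatTarget returns the device path to format and whether a repartition
// is still required. A disk with exactly one partition and no filesystem is
// formatted in place; every other layout keeps the caller's request.
func formatTarget(d Disk, createPartition bool) (string, bool) {
	if createPartition && len(d.Partitions) == 1 && d.Partitions[0].FSType == "" {
		return d.Partitions[0].Path, false
	}
	return d.Path, createPartition
}

func main() {
	sdb := Disk{Path: "/dev/sdb", Partitions: []Partition{{Path: "/dev/sdb1"}}}
	dev, repartition := formatTarget(sdb, true)
	fmt.Printf("format %s, repartition=%v\n", dev, repartition)
}
```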
---
## Safety improvement 4: Descriptive progress messages
### What
Include executed command details in progress messages for remote debugging.
The progress messages show in the UI and get logged by the handler.
### Where
Throughout `format_linux.go`, update the `send()` calls to include command info.
Examples already shown in the Bug 1 and Bug 2 fixes above. Also update:
For the mkfs step:
```go
send("formatting", fmt.Sprintf("mkfs.ext4 -L %s -F %s ...", fsLabel, HostDevicePath(partDev)), 30)
```
For the blkid step (around line 10274):
```go
send("mounting", fmt.Sprintf("UUID lekérése: blkid %s ...", HostDevicePath(partDev)), 65)
```
---
## Summary: All changes by file
### `controller/internal/storage/format_linux.go` (5 changes)
1. Partition block: Add `wipefs -a`; change sfdisk input `",,,L"` → `",,"`;
add `--force --wipe always` flags
2. Mount block: Change `mount mountPath` → `mount -t ext4 -o defaults,noatime HostDevicePath(partDev) mountPath`
3. After mount: Add `findmnt` verification
4. Label: Use `req.MountName` (ASCII) instead of `req.Label` (UTF-8) for `mkfs.ext4 -L`
5. Progress messages: Include command details in `send()` calls
### `controller/docker-compose.yml` (1 change)
6. Change `/mnt:/mnt:rw` to long-form syntax with `propagation: rshared`
### `controller/internal/web/handlers.go` (1 change)
7. In `storageInitAPIHandler`: Add smart partition detection before launching goroutine
### `scripts/docker-setup.sh` (1 change)
8. Add `mount --make-rshared /mnt` to node preparation section
---
## Build & deploy procedure
```bash
# 1. On the host FIRST (before restarting controller):
sudo mount --make-rshared /mnt
# 2. Build new image with fixes (normal build process)
# 3. Deploy
cd /opt/docker/felhom-controller
sudo docker compose up -d
# 4. Verify container sees /host-dev
docker exec felhom-controller ls /host-dev/sd*
# 5. Verify rshared propagation is active
docker inspect felhom-controller --format '{{range .Mounts}}{{if eq .Destination "/mnt"}}Propagation={{.Propagation}}{{end}}{{end}}'
# Should show: Propagation=rshared
# 6. Test storage init wizard:
# - Scan → sdb appears
# - Select sdb → configure hdd_1 → type FORMÁZÁS
# - Watch progress panel — should show command details
# - Should complete successfully
# 7. Verify mount on HOST (proves propagation):
findmnt /mnt/hdd_1
# Should show /dev/sdb1 mounted at /mnt/hdd_1
# 8. Verify fstab entry:
grep hdd_1 /etc/fstab
# Should show UUID=... /mnt/hdd_1 ext4 defaults,nofail,noatime 0 2
# 9. Verify storage registered in settings:
# Visit Settings page → Adattárolók → /mnt/hdd_1 should appear
# 10. Restart controller — verify mount survives:
docker restart felhom-controller
docker exec felhom-controller ls /mnt/hdd_1/
# Should show: storage/ Dokumentumok/
```
---
## What NOT to change
- **Dockerfile** — packages already correct (fdisk, e2fsprogs, util-linux, rsync, parted)
- **scan_linux.go** — scan works correctly after v0.11.1 fixes
- **safety_linux.go / safety.go** — system disk detection works
- **Template/JS** — wizard UI works fine; `CreatePartition` default-true is handled in handler