# TASK.md — Bug Fix Plan for Sonnet 4.5 ## Prompt / Context Gathering Before starting any fixes, read the following files to understand the project: 1. **`controller/README.md`** — Full architecture, module map, API endpoints, feature documentation 2. **`CHANGELOG.md`** — Recent changes (sessions 26-38, 2026-02-17) for context on what was implemented 3. **`controller/TASK.md`** (this file) — The bug list and fix instructions Then read each source file listed in a bug section before fixing it. **After all fixes are complete**, update: - `CHANGELOG.md` — Add a new entry at the top following the existing format (session 39, next version **v0.12.3**) - `controller/README.md` — If any fix changes behavior, API, or architecture, update the relevant section --- ## Fix Priority Fixes are organized into 3 tiers: - **CRITICAL** — Data races, security vulnerabilities, data loss risks. Fix these first. - **HIGH** — Logic errors, resource leaks, incorrect behavior. Fix these second. - **MEDIUM** — Edge cases, code quality, minor issues. Fix if time allows. --- ## CRITICAL Fixes ### C1. Data race on `m.lastDBDump.Results` mutation (backup/backup.go) **File:** `internal/backup/backup.go`, around lines 556-570 **Problem:** In `RefreshCache()`, the code mutates `m.lastDBDump.Results[i].Validation` while holding NO lock. The mutex is only acquired later at line 574. Other goroutines reading `m.lastDBDump` via `GetFullStatus()` can see torn writes. **Fix:** Move the `m.lastDBDump.Results` mutation block (lines 556-570) inside the `m.mu.Lock()` section that starts at line 574. The re-validation loop should happen after acquiring the lock, before setting `m.cachedStatus`. --- ### C2. SnapshotHistory reversed after unlock (backup/backup.go) **File:** `internal/backup/backup.go`, around lines 582-589 **Problem:** `m.cachedStatus = status` is set at line 582 while holding the lock, but the snapshot history reversal at lines 587-589 happens AFTER `m.mu.Unlock()`. Since `status.SnapshotHistory` is a slice backed by the same array as `m.cachedStatus.SnapshotHistory` (created by `copy` which copies elements but shares nothing — wait, actually `copy` creates a new backing array, so this is safe for the array). However, the local `status` variable IS `m.cachedStatus` (assigned at line 582), so reversing `status.SnapshotHistory` also reverses `m.cachedStatus.SnapshotHistory` without the lock. **Fix:** Move the reversal loop (lines 587-589) BEFORE `m.mu.Unlock()`, right after `copy(status.SnapshotHistory, m.snapshotHistory)` at line 581 and before assigning to `m.cachedStatus` at line 582. --- ### C3. `SetStackProvider` write without synchronization (backup/backup.go) **File:** `internal/backup/backup.go`, line 413-415 **Problem:** `m.stackProvider = provider` is a direct field write with no mutex protection. The field is read by `resolveAppBackupPaths()` and `RefreshCache()` concurrently. **Fix:** Wrap the assignment in `m.mu.Lock()` / `m.mu.Unlock()`. --- ### C4. `GetFullStatus` shallow-copies mutable pointers (backup/backup.go) **File:** `internal/backup/backup.go`, lines 619-620 **Problem:** `status.LastDBDump = m.lastDBDump` and `status.LastBackup = m.lastBackup` copy pointer values. Callers can mutate the referenced objects, creating races with the manager's internal state. **Fix:** Deep-copy these structs. If `lastDBDump` is non-nil, create a copy: `copyDump := *m.lastDBDump; status.LastDBDump = ©Dump`. Same for `lastBackup`. Also deep-copy the `Results` slice inside `lastDBDump` if present (since it's a slice of structs, a simple `copy` into a new slice suffices). --- ### C5. `IsSystemDisk` uses 8-bit major mask (storage/safety_linux.go) **File:** `internal/storage/safety_linux.go`, lines 30-31 **Problem:** `major := (stat.Rdev >> 8) & 0xff` extracts only 8 bits. Linux dev_t uses a 12-bit major: bits 8-19 with bits 0-7 at positions 8-15 and bits 8-11 at positions 44-47 (for 64-bit). For the common case, the formula should use `unix.Major()` from `golang.org/x/sys/unix` or at minimum the standard extraction: `major = (dev & 0xfff00) >> 8`. Also, the function only compares major numbers — ALL SCSI disks share major 8, so this blocks formatting of any SCSI disk if root is also SCSI. **Fix:** Use `unix.Major(uint64(stat.Rdev))` and `unix.Minor(uint64(stat.Rdev))` from `golang.org/x/sys/unix` (already a dependency). Compare both major AND the disk portion of the minor to identify the physical disk. For SCSI (major 8), disks are distinguished by `minor / 16`. For NVMe (major 259), compare the whole-disk minor. The simplest correct approach: resolve both device paths to their parent disk name (e.g., via `partitionToParentDisk`) and compare those names instead of using device numbers. --- ### C6. No `/dev/` prefix validation on `DevicePath` (storage/format_linux.go) **File:** `internal/storage/format_linux.go`, around lines 37-41 **Problem:** The `FormatAndMount` function accepts `DevicePath` from user JSON without validating it starts with `/dev/`. An attacker could pass arbitrary paths. **Fix:** At the start of `FormatAndMount`, validate: ```go if !strings.HasPrefix(req.DevicePath, "/dev/") { return fmt.Errorf("invalid device path: must start with /dev/") } if strings.Contains(req.DevicePath, "..") { return fmt.Errorf("invalid device path: must not contain ..") } ``` --- ### C7. Path traversal in `extractName` (api/router.go) **File:** `internal/api/router.go`, lines 769-771 **Problem:** `extractName()` extracts a stack name from the URL path but doesn't reject `..`, `/`, or `\` characters. This name is used in file paths and Docker commands. **Fix:** Add validation after extracting the name: ```go func extractName(r *http.Request) string { name := // ... existing extraction logic if strings.ContainsAny(name, "/\\") || name == ".." || name == "." || name == "" { return "" } return name } ``` Then ensure all callers handle empty string as an error (most already check `name == ""`). --- ### C8. Path traversal in `TargetPath` (web/storage_handlers.go) **File:** `internal/web/storage_handlers.go`, around lines 374-417 **Problem:** Migration API accepts `TargetPath` from user input without validating it's a registered storage path. Allows writing to arbitrary filesystem locations. **Fix:** Validate `TargetPath` against the list of registered storage paths from settings: ```go registeredPaths := s.settings.GetStoragePaths() // or equivalent valid := false for _, p := range registeredPaths { if req.TargetPath == p.Path { valid = true break } } if !valid { http.Error(w, "invalid target path", http.StatusBadRequest) return } ``` --- ### C9. Path traversal in `DestinationPath` (api/router.go) **File:** `internal/api/router.go`, in `saveCrossDriveConfig()` around lines 587-644 **Problem:** `DestinationPath` from user JSON not validated against registered storage paths. Allows cross-drive backup to write anywhere. **Fix:** Same pattern as C8 — validate `DestinationPath` against registered storage paths. Also validate it's not the same device as the source (this check may already exist in `crossdrive.go`, but the API should validate early). --- ### C10. Path traversal in `ParseComposeHDDMounts` (stacks/delete.go) **File:** `internal/stacks/delete.go`, lines 243-251 **Problem:** After `${HDD_PATH}` substitution, the resulting path is checked with `strings.HasPrefix(hostPath, hddPath)` BEFORE `filepath.Clean`. An attacker could craft `${HDD_PATH}/../../etc/passwd` — the prefix check passes, but the resolved path escapes the HDD directory. **Fix:** Apply `filepath.Clean()` to the resolved path BEFORE the prefix check: ```go resolved := filepath.Clean(hostPath) if !strings.HasPrefix(resolved, filepath.Clean(hddPath) + string(filepath.Separator)) { continue // skip paths that escape HDD directory } ``` --- ## HIGH Fixes ### H1. `ValidateDump` reads entire file into memory (backup/dbdump.go) **File:** `internal/backup/dbdump.go`, line 251 **Problem:** `os.ReadFile(filePath)` loads entire SQL dump into memory. For large databases (500MB+), this causes massive memory allocation during every 5-minute cache refresh. **Fix:** Replace with `bufio.Scanner` reading line by line, or `io.LimitReader` to read only the first N bytes (e.g., 64KB for header) and last N bytes (for footer/completion markers). The validation only checks for structural markers like `CREATE TABLE`, `INSERT INTO`, completion tags — these are at the start and end of the file. --- ### H2. `du` calls without timeout (backup/appdata.go) **File:** `internal/backup/appdata.go`, lines 136 and 150 **Problem:** `exec.Command("du", "-sh", path)` and `exec.Command("du", "-sb", path)` have no timeout. Can hang on slow/unmounted filesystems, blocking the web UI. **Fix:** Use `exec.CommandContext` with a 30-second timeout: ```go ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) defer cancel() cmd := exec.CommandContext(ctx, "du", "-sh", path) ``` --- ### H3. Double `du` invocation per mount (backup/appdata.go) **File:** `internal/backup/appdata.go`, lines 79-80 **Problem:** Both `appDirSizeBytes(mount)` and `appDirSizeHuman(mount)` run `du` independently. That's 2 subprocess calls per mount point. **Fix:** Create a single function `appDirSize(path string) (int64, string)` that runs `du -sb` once, parses the byte count, and formats the human string using `humanizeBytes()`. Replace both calls with this single function. --- ### H4. Snapshot validation only checks first 100 (backup/restore.go) **File:** `internal/backup/restore.go`, line 22 **Problem:** `m.restic.ListSnapshots(100)` limits to 100 snapshots. Older snapshots can't be restored. **Fix:** Either: - Pass 0 or a very large number to list all snapshots, OR - Better: validate the snapshot ID format with a regex (`^[0-9a-f]{8,64}$`) and skip the existence check (let `restic restore` fail naturally with a clear error if the ID doesn't exist). This avoids listing all snapshots entirely. --- ### H5. No pruning for cross-drive restic repos (backup/crossdrive.go) **File:** `internal/backup/crossdrive.go` **Problem:** Cross-drive restic backups accumulate snapshots forever. Main backups have pruning via `restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6`, but cross-drive repos don't. **Fix:** After a successful cross-drive restic backup, run a prune step similar to the main backup. Add a `pruneRepo(ctx, repoPath, pwFile string)` function that runs `restic forget --keep-daily 7 --keep-weekly 4 --prune` on the cross-drive repo. Call it after backup in `runResticBackup()`. --- ### H6. Temp password file management (backup/crossdrive.go) **File:** `internal/backup/crossdrive.go`, lines 230-240 **Problem:** `defer os.Remove(pwFile.Name())` is registered before the file is closed. On some systems this can fail. Also, no `defer pwFile.Close()` — the explicit `Close()` at line 240 won't run if code panics between creation and close. **Fix:** Reorganize: ```go pwFile, err := os.CreateTemp("", "felhom-crossdrive-pw-*") if err != nil { return ... } pwPath := pwFile.Name() if _, err := pwFile.WriteString(password); err != nil { pwFile.Close() os.Remove(pwPath) return ... } pwFile.Close() defer os.Remove(pwPath) ``` --- ### H7. `dirSizeBytes` swallows all walk errors (backup/crossdrive.go) **File:** `internal/backup/crossdrive.go`, lines 297-307 **Problem:** `filepath.Walk` callback returns `nil` on error, hiding permission/IO issues. **Fix:** Return the error (or at least log it): ```go err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error { if err != nil { return err // propagate instead of swallowing } if !info.IsDir() { total += info.Size() } return nil }) ``` --- ### H8. Non-atomic fstab write (storage/safety_linux.go) **File:** `internal/storage/safety_linux.go`, lines 91-102 **Problem:** `AppendFstabEntry` writes directly to `/etc/fstab`. A crash during write corrupts the file, potentially bricking the system on reboot. **Fix:** Read fstab, append entry, write to `/etc/fstab.tmp`, then `os.Rename("/etc/fstab.tmp", "/etc/fstab")`. This is the same atomic pattern used in settings.go. --- ### H9. `IsDeviceMounted` naive prefix matching (storage/safety_linux.go) **File:** `internal/storage/safety_linux.go`, line 54 **Problem:** `strings.HasPrefix(fields[0], devicePath)` causes `/dev/sdb` to match `/dev/sdb1`, `/dev/sdb2`, etc. — and worse, `/dev/sdb` matches `/dev/sdba` (hypothetical). **Fix:** After the prefix check, verify the next character is either end-of-string or a digit (for partition suffixes): ```go src := fields[0] if src == devicePath || (strings.HasPrefix(src, devicePath) && len(src) > len(devicePath) && (src[len(devicePath)] >= '0' && src[len(devicePath)] <= '9' || src[len(devicePath)] == 'p')) { return true, nil } ``` Or better: just check exact match for the device, and separately check if any partition of it is mounted. --- ### H10. eMMC device mapping bug (storage/scan_linux.go) **File:** `internal/storage/scan_linux.go`, lines 161-176 **Problem:** `partitionToParentDisk` strips trailing digits to find the parent. For eMMC (`mmcblk0p1`), it strips `0p1` → `mmcblk` instead of correctly producing `mmcblk0`. The `p` separator between device number and partition number is not handled. **Fix:** Add eMMC/NVMe pattern detection: ```go func partitionToParentDisk(partName string) string { // Handle mmcblk0p1, nvme0n1p1 patterns (separator is "p" before partition number) if idx := strings.LastIndex(partName, "p"); idx > 0 { prefix := partName[:idx] suffix := partName[idx+1:] if len(suffix) > 0 && suffix[0] >= '0' && suffix[0] <= '9' { // Check if prefix ends with a digit (e.g., mmcblk0, nvme0n1) if prefix[len(prefix)-1] >= '0' && prefix[len(prefix)-1] <= '9' { return prefix } } } // Default: strip trailing digits (sda1 → sda) return strings.TrimRight(partName, "0123456789") } ``` --- ### H11. Data race on `bytesCopied` in rsync error path (storage/migrate.go) **File:** `internal/storage/migrate.go`, around line 243 **Problem:** In `runRsync`, the `bytesCopied` variable is read (in the error return path) without holding the mutex, while a goroutine may be writing to it. **Fix:** Acquire the lock before reading `bytesCopied` in the error path, or use `atomic.Int64` instead of a mutex-protected variable. --- ### H12. Goroutine leak in rsync stdout reader (storage/migrate.go) **File:** `internal/storage/migrate.go`, lines 214-237 **Problem:** A goroutine reads rsync's stdout. If the main function returns early (e.g., on context cancellation), the goroutine may block on `scanner.Scan()` forever if stdout isn't closed, or try to send on a closed channel. **Fix:** Ensure the goroutine exits cleanly: 1. Use `cmd.StdoutPipe()` and ensure it's closed on function exit 2. Use a `done` channel or context cancellation to signal the goroutine 3. Don't send on the progress channel after function returns — use a select with a done channel --- ### H13. Path prefix match without separator (storage/migrate.go) **File:** `internal/storage/migrate.go`, lines 128-131 **Problem:** `strings.HasPrefix(stackMount, currentHDD)` causes `/mnt/hdd` to match `/mnt/hdd_backup/data`. Stacks on a similarly-named but different mount would be migrated incorrectly. **Fix:** Append a trailing separator: ```go if strings.HasPrefix(stackMount, currentHDD + "/") || stackMount == currentHDD { ``` --- ### H14. `DeleteStack` continues after failed `docker compose down` (stacks/delete.go) **File:** `internal/stacks/delete.go`, lines 88-92 **Problem:** If `docker compose down` fails, the function continues to delete stack files and HDD data. This can leave orphaned containers running while their config is deleted. **Fix:** Return the error from `docker compose down` failure. If the user wants to force-delete, that should be a separate explicit option. --- ### H15. XSS in flash messages (web/handlers.go) **File:** `internal/web/handlers.go`, around lines 282-288 **Problem:** URL query parameters (`?success=...`, `?error=...`) are passed to template data. If templates render these with `{{.}}` instead of escaping, it's XSS. Go's `html/template` auto-escapes in HTML context, but check if any use `{{. | safeHTML}}` or JavaScript context. **Fix:** Verify all flash message rendering uses standard `{{.Success}}` / `{{.Error}}` in HTML context (which Go auto-escapes). If any template uses these values inside `