Bug Fix Plan for Sonnet 4.5 (Session 2)

This commit is contained in:
2026-02-18 07:40:40 +01:00
parent 93d9b474f1
commit 9c340433b2
+291 -466
View File
@@ -1,17 +1,17 @@
# TASK.md — Bug Fix Plan for Sonnet 4.5
# TASK.md — Bug Fix Plan for Sonnet 4.5 (Session 2)
## Prompt / Context Gathering
Before starting any fixes, read the following files to understand the project:
1. **`controller/README.md`** — Full architecture, module map, API endpoints, feature documentation
2. **`CHANGELOG.md`** — Recent changes (sessions 26-38, 2026-02-17) for context on what was implemented
2. **`CHANGELOG.md`** — Recent changes for context on what was implemented (including yesterday's bug fixes)
3. **`controller/TASK.md`** (this file) — The bug list and fix instructions
Then read each source file listed in a bug section before fixing it.
**After all fixes are complete**, update:
- `CHANGELOG.md` — Add a new entry at the top following the existing format (session 39, next version **v0.12.3**)
- `CHANGELOG.md` — Add a new entry at the top following the existing format (next session, next version **v0.12.4**)
- `controller/README.md` — If any fix changes behavior, API, or architecture, update the relevant section
---
@@ -20,7 +20,7 @@ Then read each source file listed in a bug section before fixing it.
Fixes are organized into 3 tiers:
- **CRITICAL** — Data races, security vulnerabilities, data loss risks. Fix these first.
- **CRITICAL** — Data loss, panics, crashes. Fix these first.
- **HIGH** — Logic errors, resource leaks, incorrect behavior. Fix these second.
- **MEDIUM** — Edge cases, code quality, minor issues. Fix if time allows.
@@ -28,139 +28,71 @@ Fixes are organized into 3 tiers:
## CRITICAL Fixes
### C1. Data race on `m.lastDBDump.Results` mutation (backup/backup.go)
### C1. `SetAppBackupBulk` data loss + nil map panic (settings/settings.go)
**File:** `internal/backup/backup.go`, around lines 556-570
**File:** `internal/settings/settings.go`, lines 271-282
**Problem:** In `RefreshCache()`, the code mutates `m.lastDBDump.Results[i].Validation` while holding NO lock. The mutex is only acquired later at line 574. Other goroutines reading `m.lastDBDump` via `GetFullStatus()` can see torn writes.
**Problem (two bugs in one function):**
**Fix:** Move the `m.lastDBDump.Results` mutation block (lines 556-570) inside the `m.mu.Lock()` section that starts at line 574. The re-validation loop should happen after acquiring the lock, before setting `m.cachedStatus`.
1. **Data loss:** Line 280 `s.AppBackup = newMap` replaces the ENTIRE map with only the keys from the input `prefs` parameter. Any stacks NOT in the input map are permanently deleted from the settings file, including their `CrossDrive` backup configurations.
---
Example: Settings has `{"app1": {Enabled:true, CrossDrive:{...}}, "app2": {Enabled:false}}`. Caller passes `prefs = {"app1": false}` (intending to disable only app1). After this function, app2's entire config is deleted.
### C2. SnapshotHistory reversed after unlock (backup/backup.go)
2. **Nil map panic:** Line 276 reads `s.AppBackup[name]` but there is no nil check on `s.AppBackup` before the loop. If `s.AppBackup` is nil (new installation, empty settings), this panics.
**File:** `internal/backup/backup.go`, around lines 582-589
**Fix:** Update the map IN PLACE instead of replacing it. Add nil guard:
**Problem:** `m.cachedStatus = status` is set at line 582 while holding the lock, but the snapshot history reversal at lines 587-589 happens AFTER `m.mu.Unlock()`. Since `status.SnapshotHistory` is a slice backed by the same array as `m.cachedStatus.SnapshotHistory` (created by `copy` which copies elements but shares nothing — wait, actually `copy` creates a new backing array, so this is safe for the array). However, the local `status` variable IS `m.cachedStatus` (assigned at line 582), so reversing `status.SnapshotHistory` also reverses `m.cachedStatus.SnapshotHistory` without the lock.
**Fix:** Move the reversal loop (lines 587-589) BEFORE `m.mu.Unlock()`, right after `copy(status.SnapshotHistory, m.snapshotHistory)` at line 581 and before assigning to `m.cachedStatus` at line 582.
---
### C3. `SetStackProvider` write without synchronization (backup/backup.go)
**File:** `internal/backup/backup.go`, line 413-415
**Problem:** `m.stackProvider = provider` is a direct field write with no mutex protection. The field is read by `resolveAppBackupPaths()` and `RefreshCache()` concurrently.
**Fix:** Wrap the assignment in `m.mu.Lock()` / `m.mu.Unlock()`.
---
### C4. `GetFullStatus` shallow-copies mutable pointers (backup/backup.go)
**File:** `internal/backup/backup.go`, lines 619-620
**Problem:** `status.LastDBDump = m.lastDBDump` and `status.LastBackup = m.lastBackup` copy pointer values. Callers can mutate the referenced objects, creating races with the manager's internal state.
**Fix:** Deep-copy these structs. If `lastDBDump` is non-nil, create a copy: `copyDump := *m.lastDBDump; status.LastDBDump = &copyDump`. Same for `lastBackup`. Also deep-copy the `Results` slice inside `lastDBDump` if present (since it's a slice of structs, a simple `copy` into a new slice suffices).
---
### C5. `IsSystemDisk` uses 8-bit major mask (storage/safety_linux.go)
**File:** `internal/storage/safety_linux.go`, lines 30-31
**Problem:** `major := (stat.Rdev >> 8) & 0xff` extracts only 8 bits. Linux dev_t uses a 12-bit major: bits 8-19 with bits 0-7 at positions 8-15 and bits 8-11 at positions 44-47 (for 64-bit). For the common case, the formula should use `unix.Major()` from `golang.org/x/sys/unix` or at minimum the standard extraction: `major = (dev & 0xfff00) >> 8`. Also, the function only compares major numbers — ALL SCSI disks share major 8, so this blocks formatting of any SCSI disk if root is also SCSI.
**Fix:** Use `unix.Major(uint64(stat.Rdev))` and `unix.Minor(uint64(stat.Rdev))` from `golang.org/x/sys/unix` (already a dependency). Compare both major AND the disk portion of the minor to identify the physical disk. For SCSI (major 8), disks are distinguished by `minor / 16`. For NVMe (major 259), compare the whole-disk minor. The simplest correct approach: resolve both device paths to their parent disk name (e.g., via `partitionToParentDisk`) and compare those names instead of using device numbers.
---
### C6. No `/dev/` prefix validation on `DevicePath` (storage/format_linux.go)
**File:** `internal/storage/format_linux.go`, around lines 37-41
**Problem:** The `FormatAndMount` function accepts `DevicePath` from user JSON without validating it starts with `/dev/`. An attacker could pass arbitrary paths.
**Fix:** At the start of `FormatAndMount`, validate:
```go
if !strings.HasPrefix(req.DevicePath, "/dev/") {
return fmt.Errorf("invalid device path: must start with /dev/")
}
if strings.Contains(req.DevicePath, "..") {
return fmt.Errorf("invalid device path: must not contain ..")
}
```
---
### C7. Path traversal in `extractName` (api/router.go)
**File:** `internal/api/router.go`, lines 769-771
**Problem:** `extractName()` extracts a stack name from the URL path but doesn't reject `..`, `/`, or `\` characters. This name is used in file paths and Docker commands.
**Fix:** Add validation after extracting the name:
```go
func extractName(r *http.Request) string {
name := // ... existing extraction logic
if strings.ContainsAny(name, "/\\") || name == ".." || name == "." || name == "" {
return ""
func (s *Settings) SetAppBackupBulk(prefs map[string]bool) error {
s.mu.Lock()
defer s.mu.Unlock()
if s.AppBackup == nil {
s.AppBackup = make(map[string]AppBackupPrefs)
}
return name
}
```
Then ensure all callers handle empty string as an error (most already check `name == ""`).
---
### C8. Path traversal in `TargetPath` (web/storage_handlers.go)
**File:** `internal/web/storage_handlers.go`, around lines 374-417
**Problem:** Migration API accepts `TargetPath` from user input without validating it's a registered storage path. Allows writing to arbitrary filesystem locations.
**Fix:** Validate `TargetPath` against the list of registered storage paths from settings:
```go
registeredPaths := s.settings.GetStoragePaths() // or equivalent
valid := false
for _, p := range registeredPaths {
if req.TargetPath == p.Path {
valid = true
break
for name, enabled := range prefs {
existing := s.AppBackup[name]
existing.Enabled = enabled
s.AppBackup[name] = existing
}
return s.save()
}
if !valid {
http.Error(w, "invalid target path", http.StatusBadRequest)
return
```
**Key change:** Remove line 274 (`newMap := make(...)`) and line 280 (`s.AppBackup = newMap`). Iterate over `s.AppBackup` directly. The comment on line 270 says "Preserves existing CrossDrive configs" — the current code does NOT do that for stacks absent from the input.
---
### C2. `UpdateStackConfig` nil Env map panic (stacks/deploy.go)
**File:** `internal/stacks/deploy.go`, line 245
**Problem:** `appCfg.Env[key] = val` panics with "assignment to entry in nil map" when `appCfg.Env` is nil. This happens when app.yaml exists but has no `env:` section (e.g., `deployed: true` with no env vars).
The fix already exists in `UpdateOptionalConfig` (lines 312-314) but is missing from `UpdateStackConfig`.
**Fix:** Add nil check after line 233 (after `if appCfg == nil || !appCfg.Deployed` check), before the loop:
```go
if appCfg.Env == nil {
appCfg.Env = make(map[string]string)
}
```
---
### C9. Path traversal in `DestinationPath` (api/router.go)
### C3. `ValidateDump` missing `scanner.Err()` check (backup/dbdump.go)
**File:** `internal/api/router.go`, in `saveCrossDriveConfig()` around lines 587-644
**File:** `internal/backup/dbdump.go`, after line 295
**Problem:** `DestinationPath` from user JSON not validated against registered storage paths. Allows cross-drive backup to write anywhere.
**Problem:** After the `for scanner.Scan() { ... }` loop (lines 269-295), `scanner.Err()` is never checked. If an I/O error occurs during scanning (disk error, permission issue, file truncated), the scanner silently stops iterating. The function continues with whatever partial data was read and may incorrectly mark the dump as valid (`v.Valid = true` at line 315). This means a corrupted or unreadable dump file could pass validation.
**Fix:** Same pattern as C8 — validate `DestinationPath` against registered storage paths. Also validate it's not the same device as the source (this check may already exist in `crossdrive.go`, but the API should validate early).
**Fix:** Add error check immediately after the for loop ends (after line 295, before line 296):
---
### C10. Path traversal in `ParseComposeHDDMounts` (stacks/delete.go)
**File:** `internal/stacks/delete.go`, lines 243-251
**Problem:** After `${HDD_PATH}` substitution, the resulting path is checked with `strings.HasPrefix(hostPath, hddPath)` BEFORE `filepath.Clean`. An attacker could craft `${HDD_PATH}/../../etc/passwd` — the prefix check passes, but the resolved path escapes the HDD directory.
**Fix:** Apply `filepath.Clean()` to the resolved path BEFORE the prefix check:
```go
resolved := filepath.Clean(hostPath)
if !strings.HasPrefix(resolved, filepath.Clean(hddPath) + string(filepath.Separator)) {
continue // skip paths that escape HDD directory
if err := scanner.Err(); err != nil {
v.Error = fmt.Sprintf("hiba az olvasás közben: %v", err)
log.Printf("[WARN] ValidateDump FAIL: %s — scanner error: %v", filePath, err)
return v
}
```
@@ -168,432 +100,324 @@ if !strings.HasPrefix(resolved, filepath.Clean(hddPath) + string(filepath.Separa
## HIGH Fixes
### H1. `ValidateDump` reads entire file into memory (backup/dbdump.go)
### H1. `nextDailyRun` uses `Add(24h)` — wrong across DST transitions (scheduler/scheduler.go)
**File:** `internal/backup/dbdump.go`, line 251
**File:** `internal/scheduler/scheduler.go`, line 252
**Problem:** `os.ReadFile(filePath)` loads entire SQL dump into memory. For large databases (500MB+), this causes massive memory allocation during every 5-minute cache refresh.
**Problem:** `next = next.Add(24 * time.Hour)` is used when today's scheduled time has passed. During a DST transition in Europe/Budapest:
- Spring forward (March): Adding 24h to "02:30 CET" gives "03:30 CEST" instead of "02:30 CEST" — the job runs 1 hour late
- Fall back (October): Adding 24h from "02:30 CEST" gives "01:30 CET" — the job runs 1 hour early
**Fix:** Replace with `bufio.Scanner` reading line by line, or `io.LimitReader` to read only the first N bytes (e.g., 64KB for header) and last N bytes (for footer/completion markers). The validation only checks for structural markers like `CREATE TABLE`, `INSERT INTO`, completion tags — these are at the start and end of the file.
**Fix:** Replace line 252 with calendar-day arithmetic:
---
### H2. `du` calls without timeout (backup/appdata.go)
**File:** `internal/backup/appdata.go`, lines 136 and 150
**Problem:** `exec.Command("du", "-sh", path)` and `exec.Command("du", "-sb", path)` have no timeout. Can hang on slow/unmounted filesystems, blocking the web UI.
**Fix:** Use `exec.CommandContext` with a 30-second timeout:
```go
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
cmd := exec.CommandContext(ctx, "du", "-sh", path)
```
---
### H3. Double `du` invocation per mount (backup/appdata.go)
**File:** `internal/backup/appdata.go`, lines 79-80
**Problem:** Both `appDirSizeBytes(mount)` and `appDirSizeHuman(mount)` run `du` independently. That's 2 subprocess calls per mount point.
**Fix:** Create a single function `appDirSize(path string) (int64, string)` that runs `du -sb` once, parses the byte count, and formats the human string using `humanizeBytes()`. Replace both calls with this single function.
---
### H4. Snapshot validation only checks first 100 (backup/restore.go)
**File:** `internal/backup/restore.go`, line 22
**Problem:** `m.restic.ListSnapshots(100)` limits to 100 snapshots. Older snapshots can't be restored.
**Fix:** Either:
- Pass 0 or a very large number to list all snapshots, OR
- Better: validate the snapshot ID format with a regex (`^[0-9a-f]{8,64}$`) and skip the existence check (let `restic restore` fail naturally with a clear error if the ID doesn't exist). This avoids listing all snapshots entirely.
---
### H5. No pruning for cross-drive restic repos (backup/crossdrive.go)
**File:** `internal/backup/crossdrive.go`
**Problem:** Cross-drive restic backups accumulate snapshots forever. Main backups have pruning via `restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6`, but cross-drive repos don't.
**Fix:** After a successful cross-drive restic backup, run a prune step similar to the main backup. Add a `pruneRepo(ctx, repoPath, pwFile string)` function that runs `restic forget --keep-daily 7 --keep-weekly 4 --prune` on the cross-drive repo. Call it after backup in `runResticBackup()`.
---
### H6. Temp password file management (backup/crossdrive.go)
**File:** `internal/backup/crossdrive.go`, lines 230-240
**Problem:** `defer os.Remove(pwFile.Name())` is registered before the file is closed. On some systems this can fail. Also, no `defer pwFile.Close()` — the explicit `Close()` at line 240 won't run if code panics between creation and close.
**Fix:** Reorganize:
```go
pwFile, err := os.CreateTemp("", "felhom-crossdrive-pw-*")
if err != nil { return ... }
pwPath := pwFile.Name()
if _, err := pwFile.WriteString(password); err != nil {
pwFile.Close()
os.Remove(pwPath)
return ...
}
pwFile.Close()
defer os.Remove(pwPath)
```
---
### H7. `dirSizeBytes` swallows all walk errors (backup/crossdrive.go)
**File:** `internal/backup/crossdrive.go`, lines 297-307
**Problem:** `filepath.Walk` callback returns `nil` on error, hiding permission/IO issues.
**Fix:** Return the error (or at least log it):
```go
err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
if err != nil {
return err // propagate instead of swallowing
}
if !info.IsDir() {
total += info.Size()
}
return nil
})
```
---
### H8. Non-atomic fstab write (storage/safety_linux.go)
**File:** `internal/storage/safety_linux.go`, lines 91-102
**Problem:** `AppendFstabEntry` writes directly to `/etc/fstab`. A crash during write corrupts the file, potentially bricking the system on reboot.
**Fix:** Read fstab, append entry, write to `/etc/fstab.tmp`, then `os.Rename("/etc/fstab.tmp", "/etc/fstab")`. This is the same atomic pattern used in settings.go.
---
### H9. `IsDeviceMounted` naive prefix matching (storage/safety_linux.go)
**File:** `internal/storage/safety_linux.go`, line 54
**Problem:** `strings.HasPrefix(fields[0], devicePath)` causes `/dev/sdb` to match `/dev/sdb1`, `/dev/sdb2`, etc. — and worse, `/dev/sdb` matches `/dev/sdba` (hypothetical).
**Fix:** After the prefix check, verify the next character is either end-of-string or a digit (for partition suffixes):
```go
src := fields[0]
if src == devicePath || (strings.HasPrefix(src, devicePath) && len(src) > len(devicePath) && (src[len(devicePath)] >= '0' && src[len(devicePath)] <= '9' || src[len(devicePath)] == 'p')) {
return true, nil
// If the time has already passed today, schedule for tomorrow
if !next.After(now) {
next = time.Date(now.Year(), now.Month(), now.Day()+1, hour, min, 0, 0, loc)
}
```
Or better: just check exact match for the device, and separately check if any partition of it is mounted.
This uses `time.Date` which correctly handles DST by constructing the time in the target timezone for the specific calendar date.
---
### H10. eMMC device mapping bug (storage/scan_linux.go)
### H2. `nextDailyRun` calls `time.LoadLocation` on every iteration (scheduler/scheduler.go)
**File:** `internal/storage/scan_linux.go`, lines 161-176
**File:** `internal/scheduler/scheduler.go`, lines 241-245
**Problem:** `partitionToParentDisk` strips trailing digits to find the parent. For eMMC (`mmcblk0p1`), it strips `0p1``mmcblk` instead of correctly producing `mmcblk0`. The `p` separator between device number and partition number is not handled.
**Problem:** `time.LoadLocation("Europe/Budapest")` is called every time `nextDailyRun` runs — i.e., every single day for each daily job, and additionally on startup. This is both inefficient (filesystem read + parsing each time) and dangerous: if the system loses access to tzdata at runtime (e.g., container rebuild removing the package), the fallback to UTC happens silently with only a log line. Jobs would shift by 1-2 hours with no alert.
**Fix:** Cache the timezone at package level or in the Scheduler struct. Preferred approach — package-level `sync.Once`:
**Fix:** Add eMMC/NVMe pattern detection:
```go
func partitionToParentDisk(partName string) string {
// Handle mmcblk0p1, nvme0n1p1 patterns (separator is "p" before partition number)
if idx := strings.LastIndex(partName, "p"); idx > 0 {
prefix := partName[:idx]
suffix := partName[idx+1:]
if len(suffix) > 0 && suffix[0] >= '0' && suffix[0] <= '9' {
// Check if prefix ends with a digit (e.g., mmcblk0, nvme0n1)
if prefix[len(prefix)-1] >= '0' && prefix[len(prefix)-1] <= '9' {
return prefix
}
var (
budapestLoc *time.Location
budapestLocOnce sync.Once
)
func getBudapestLocation() *time.Location {
budapestLocOnce.Do(func() {
loc, err := time.LoadLocation("Europe/Budapest")
if err != nil {
log.Printf("[ERROR] Cannot load Europe/Budapest timezone: %v — using UTC", err)
loc = time.UTC
}
}
// Default: strip trailing digits (sda1 → sda)
return strings.TrimRight(partName, "0123456789")
budapestLoc = loc
})
return budapestLoc
}
```
Then in `nextDailyRun`, replace lines 241-245 with: `loc := getBudapestLocation()`.
---
### H3. `settings.save()` leaks .tmp file on `WriteFile` failure (settings/settings.go)
**File:** `internal/settings/settings.go`, lines 138-139
**Problem:** If `os.WriteFile(tmpPath, data, 0644)` fails (e.g., disk full, permission denied), the function returns the error but does NOT clean up the .tmp file. Only the `os.Rename` failure path (line 143) calls `os.Remove(tmpPath)`. Repeated failures accumulate orphaned .tmp files.
**Fix:** Add cleanup to the WriteFile error path:
```go
if err := os.WriteFile(tmpPath, data, 0644); err != nil {
os.Remove(tmpPath) // clean up partial file
return fmt.Errorf("writing tmp settings: %w", err)
}
```
---
### H11. Data race on `bytesCopied` in rsync error path (storage/migrate.go)
### H4. `SetNotificationPrefs` panics on nil input (settings/settings.go)
**File:** `internal/storage/migrate.go`, around line 243
**File:** `internal/settings/settings.go`, line 220
**Problem:** In `runRsync`, the `bytesCopied` variable is read (in the error return path) without holding the mutex, while a goroutine may be writing to it.
**Problem:** `copy := *prefs` dereferences the pointer without checking for nil. If any caller passes `nil` (possible from malformed JSON decoding or future code changes), this panics.
**Fix:** Acquire the lock before reading `bytesCopied` in the error path, or use `atomic.Int64` instead of a mutex-protected variable.
**Fix:** Add nil guard at the start of the function (after line 217, before line 218):
---
### H12. Goroutine leak in rsync stdout reader (storage/migrate.go)
**File:** `internal/storage/migrate.go`, lines 214-237
**Problem:** A goroutine reads rsync's stdout. If the main function returns early (e.g., on context cancellation), the goroutine may block on `scanner.Scan()` forever if stdout isn't closed, or try to send on a closed channel.
**Fix:** Ensure the goroutine exits cleanly:
1. Use `cmd.StdoutPipe()` and ensure it's closed on function exit
2. Use a `done` channel or context cancellation to signal the goroutine
3. Don't send on the progress channel after function returns — use a select with a done channel
---
### H13. Path prefix match without separator (storage/migrate.go)
**File:** `internal/storage/migrate.go`, lines 128-131
**Problem:** `strings.HasPrefix(stackMount, currentHDD)` causes `/mnt/hdd` to match `/mnt/hdd_backup/data`. Stacks on a similarly-named but different mount would be migrated incorrectly.
**Fix:** Append a trailing separator:
```go
if strings.HasPrefix(stackMount, currentHDD + "/") || stackMount == currentHDD {
```
---
### H14. `DeleteStack` continues after failed `docker compose down` (stacks/delete.go)
**File:** `internal/stacks/delete.go`, lines 88-92
**Problem:** If `docker compose down` fails, the function continues to delete stack files and HDD data. This can leave orphaned containers running while their config is deleted.
**Fix:** Return the error from `docker compose down` failure. If the user wants to force-delete, that should be a separate explicit option.
---
### H15. XSS in flash messages (web/handlers.go)
**File:** `internal/web/handlers.go`, around lines 282-288
**Problem:** URL query parameters (`?success=...`, `?error=...`) are passed to template data. If templates render these with `{{.}}` instead of escaping, it's XSS. Go's `html/template` auto-escapes in HTML context, but check if any use `{{. | safeHTML}}` or JavaScript context.
**Fix:** Verify all flash message rendering uses standard `{{.Success}}` / `{{.Error}}` in HTML context (which Go auto-escapes). If any template uses these values inside `<script>` tags or `onclick` attributes, switch to JSON encoding or remove the pattern. Consider using session-based flash messages instead of URL params to prevent reflected XSS entirely.
---
### H16. `exec.Command("docker")` without timeout (web/handlers.go)
**File:** `internal/web/handlers.go`, around lines 1247-1253
**Problem:** `syncFileBrowserMounts()` runs `docker compose up -d` without a context timeout. Can hang indefinitely.
**Fix:** Use `exec.CommandContext` with a 60-second timeout.
---
### H17. `SetNotificationPrefs` stores caller's pointer (settings/settings.go)
**File:** `internal/settings/settings.go`, around line 219
**Problem:** `s.data.NotificationPrefs = prefs` stores the caller's pointer/struct directly. If the caller later modifies the struct, it mutates settings state without going through the mutex.
**Fix:** Deep-copy the prefs:
```go
func (s *Settings) SetNotificationPrefs(prefs NotificationPrefs) {
func (s *Settings) SetNotificationPrefs(prefs *NotificationPrefs) error {
if prefs == nil {
return fmt.Errorf("notification preferences cannot be nil")
}
s.mu.Lock()
defer s.mu.Unlock()
copy := prefs // value copy
copy.EnabledEvents = make([]string, len(prefs.EnabledEvents))
copy(copy.EnabledEvents, prefs.EnabledEvents)
s.data.NotificationPrefs = copy
s.saveLocked()
// ... rest unchanged
}
```
---
### H18. `wipefs` error silently discarded (storage/format_linux.go)
### H5. `appDirSize` ignores `Sscanf` return value (backup/appdata.go)
**File:** `internal/storage/format_linux.go`, line 74
**File:** `internal/backup/appdata.go`, line 150
**Problem:** `_ = exec.Command("wipefs", ...).Run()` discards the error. If `wipefs` fails, stale filesystem signatures remain, causing `mkfs.ext4` or mount to misbehave.
**Problem:** `fmt.Sscanf(fields[0], "%d", &size)` return value is not checked. If `du -sb` returns malformed output, `size` silently stays 0 and the function returns `(0, "0 B")` which looks like a successful zero-size measurement rather than a parse error.
**Fix:** Check the error. If `wipefs` fails, log a warning but continue (some systems don't have `wipefs`). At minimum don't use `_ =`:
Same pattern exists in `stacks/delete.go` line 289 (`getDirSizeBytes`).
**Fix for appdata.go line 150:**
```go
if err := exec.Command("wipefs", "-a", partPath).Run(); err != nil {
log.Printf("[WARN] wipefs failed on %s: %v (continuing)", partPath, err)
var size int64
n, _ := fmt.Sscanf(fields[0], "%d", &size)
if n != 1 {
return 0, "?"
}
return size, humanizeBytes(size)
```
**Fix for stacks/delete.go line 289:**
```go
var size int64
if n, _ := fmt.Sscanf(fields[0], "%d", &size); n != 1 {
return 0
}
return size
```
---
### H19. Orphaned fstab entry on mount failure (storage/format_linux.go)
### H6. `getDirSizeBytes` has no timeout (stacks/delete.go)
**File:** `internal/storage/format_linux.go`, lines 140-150
**File:** `internal/stacks/delete.go`, lines 280-293
**Problem:** If fstab entry is written but the subsequent `mount` command fails, an orphaned entry remains in fstab. On next reboot, the system may hang waiting for a mount that doesn't work.
**Problem:** `exec.Command("du", "-sb", path)` runs with no timeout. If the path is on a slow/unmounted filesystem, this hangs the delete handler indefinitely. The analogous function in `backup/appdata.go` already uses `exec.CommandContext` with a 30s timeout.
**Fix:** If mount fails after fstab write, remove the fstab entry as rollback. Read fstab, remove the line containing the UUID, write back atomically.
**Fix:**
```go
func getDirSizeBytes(path string) int64 {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
cmd := exec.CommandContext(ctx, "du", "-sb", path)
// ... rest unchanged
}
```
Add `"context"` and `"time"` to the import block if not already present.
---
### H7. `dbdump.go` tmpFile not using `defer Close` (backup/dbdump.go)
**File:** `internal/backup/dbdump.go`, lines 178-190
**Problem:** After `os.Create(tmpPath)` at line 178, the file is closed explicitly at line 190 (`tmpFile.Close()`). But if `cmd.Run()` panics (rather than returning an error), the file handle leaks. The standard Go idiom is to use `defer` for resource cleanup.
**Fix:** Replace the explicit close with a deferred close:
```go
tmpFile, err := os.Create(tmpPath)
if err != nil {
result.Error = fmt.Errorf("creating tmp file: %w", err)
result.Duration = time.Since(start)
return result
}
defer tmpFile.Close()
cmd.Stdout = tmpFile
// ... remove the explicit tmpFile.Close() at line 190
```
Note: The `defer` runs after `cmd.Run()`, so stdout is properly flushed by the subprocess exit. On Linux, closing an already-closed fd is a no-op, so this is safe even if the OS closes it implicitly.
---
### H8. `UpdateCrossDriveStatus` comment contradicts implementation (settings/settings.go)
**File:** `internal/settings/settings.go`, line 324
**Problem:** The comment says *"fn receives a pointer to the CrossDriveBackup (creates one if nil) and may mutate it"* but the implementation at lines 332-333 does the opposite — it returns nil when CrossDrive is nil (does NOT create one).
**Fix:** Update the comment:
```go
// UpdateCrossDriveStatus updates runtime status fields for a cross-drive backup in-place.
// fn receives a pointer to the CrossDriveBackup and may mutate it.
// If no cross-drive config exists for the stack, does nothing and returns nil.
```
---
## MEDIUM Fixes
### M1. `formatBytes` duplicate in dbdump.go (backup/dbdump.go)
### M1. `notify/notifier.go` custom `containsBytes` should use `strings.Contains` (notify/notifier.go)
**File:** `internal/backup/dbdump.go`, lines 467-483
**File:** `internal/notify/notifier.go`, lines 316-327
**Problem:** Duplicates `humanizeBytes()` from `appdata.go`.
**Problem:** The `contains()` and `containsBytes()` functions are custom reimplementations of `strings.Contains()`. While technically correct (Go uses signed `int` for `len()` so the subtraction doesn't underflow), this is unnecessary complexity. The standard library function is well-tested, optimized, and immediately recognizable.
**Fix:** Delete `formatBytes()` from dbdump.go and replace all calls with `humanizeBytes()` from appdata.go.
---
### M2. Dead code `.tmp` suffix check (backup/dbdump.go)
**File:** `internal/backup/dbdump.go`, lines 331-332
**Problem:** Line 328 already filters for `.sql` suffix, so `.tmp` check is unreachable.
**Fix:** Remove the dead `.tmp` check. If intent was to skip `.sql.tmp` files, fix the filter order: check `.tmp` before `.sql`.
---
### M3. `sizeBytes()` returns 0 for string types (storage/scan_linux.go)
**File:** `internal/storage/scan_linux.go`, lines 33-39
**Problem:** `lsblk` can return size as a string. The function only handles `float64` from JSON unmarshaling.
**Fix:** Add string handling:
**Fix:** Replace both functions with:
```go
case string:
n, _ := strconv.ParseUint(v, 10, 64)
return n
func contains(s, substr string) bool {
return strings.Contains(s, substr)
}
```
---
Or better: remove the `contains` function entirely and use `strings.Contains()` directly at all call sites. Delete the `containsBytes` function.
### M4. Missing `PARTUUID=` handling in fstab (storage/scan_linux.go)
**File:** `internal/storage/scan_linux.go`, lines 130-134
**Problem:** `getSystemDiskNames` only parses `UUID=` entries from fstab, missing `PARTUUID=` entries common on Raspberry Pi.
**Fix:** Add a `PARTUUID=` case that uses `blkid -t PARTUUID=xxx -o device` to resolve to a device path.
Add `"strings"` to the import block if not already present.
---
### M5. Inconsistent rollback in migration (storage/migrate.go)
### M2. `scheduler.Every()` doesn't validate interval > 0 (scheduler/scheduler.go)
**File:** `internal/storage/migrate.go`, lines 160-163
**File:** `internal/scheduler/scheduler.go`, lines 43-53
**Problem:** If `updateFn` fails (the callback that updates app config to point to new path), the data has been copied but the config still points to old path. No cleanup of copied data on target.
**Problem:** If `interval` is zero or negative, `time.NewTicker(interval)` panics at line 134 with *"non-positive interval for NewTicker"*. While current callers always pass valid intervals, a future misconfiguration (e.g., a missing config value parsed as `0`) would crash the controller at startup with no useful error message.
**Fix:** If `updateFn` fails, attempt to clean up the target directory and return a clear error explaining the situation.
---
### M6. Dead `elapsed` variable (storage/migrate.go)
**File:** `internal/storage/migrate.go`, around line 177
**Problem:** `elapsed` is calculated but never used.
**Fix:** Remove or use it in a log message.
---
### M7. `time.LoadLocation` error silently discarded (web/handlers.go)
**File:** `internal/web/handlers.go`, lines 449-450, 541-542
**Problem:** `loc, _ := time.LoadLocation("Europe/Budapest")` — if tzdata is missing, `loc` is nil causing panic on `time.Now().In(loc)`.
**Fix:** Handle the error:
**Fix:** Add validation at the start of `Every()`:
```go
loc, err := time.LoadLocation("Europe/Budapest")
if err != nil {
loc = time.UTC
log.Printf("[WARN] timezone Europe/Budapest not available: %v", err)
func (s *Scheduler) Every(name string, interval time.Duration, fn JobFunc) {
if interval <= 0 {
s.logger.Printf("[ERROR] Periodic job %s has invalid interval %s — job not registered", name, interval)
return
}
// ... rest unchanged
}
```
---
### M8. "Run all" triggers manual-schedule backups (api/router.go)
### M3. `scheduler.executeJob` doesn't set `LastRun` on panic (scheduler/scheduler.go)
**File:** `internal/api/router.go`, lines 701-703
**File:** `internal/scheduler/scheduler.go`, lines 186-193
**Problem:** The "run all" endpoint calls `RunAllScheduled(ctx, "manual")` or iterates all configs. If the intent of "manual" schedule was "only run when explicitly triggered per-app", running all defeats this.
**Problem:** When a job panics, the panic recovery defer at lines 186-193 sets `job.LastErr` but does NOT set `job.LastRun`. The normal path sets both at lines 203-205, but those lines are never reached during a panic. This means after a panic, the job status shows the panic error with a stale `LastRun` timestamp (or zero time if the job never succeeded).
**Fix:** Clarify intent. If "Run all" should only trigger daily+weekly configs, filter out manual ones. If manual should also be included in "run all", document this behavior.
---
### M9. Background goroutines use `context.Background()` (api/router.go)
**File:** `internal/api/router.go`, lines 658-662, 693-704
**Problem:** Background goroutines for backup/restore use `context.Background()` and survive server shutdown, potentially corrupting data.
**Fix:** Use a server-level context that's cancelled on shutdown. Pass the server's context (from `http.Server.BaseContext` or a custom one) and derive goroutine contexts from it.
---
### M10. `filterSnapshotsByPaths` imprecise prefix (api/router.go)
**File:** `internal/api/router.go`, lines 476-490
**Problem:** `strings.HasPrefix(snapPath, reqPath)` causes `/mnt/hdd_1` to match `/mnt/hdd_10/data`.
**Fix:** Append separator: `strings.HasPrefix(snapPath, reqPath + "/") || snapPath == reqPath`.
---
### M11. XSS in `editStorageLabel` innerHTML (templates/settings.html)
**File:** `internal/web/templates/settings.html`, around lines 280-296
**Problem:** `cancelEditLabel` function uses `innerHTML` to restore label content. If `path` or `currentLabel` contain HTML, it's XSS.
**Fix:** Use `textContent` instead of `innerHTML`:
```javascript
el.textContent = currentLabel || path;
```
---
### M12. `GetNotificationPrefs` leaks slice reference (settings/settings.go)
**File:** `internal/settings/settings.go`, around line 196
**Problem:** Returns direct reference to `DefaultEnabledEvents` slice. Caller can modify the global default.
**Fix:** Return a copy of the slice:
**Fix:** Update the panic recovery to also set LastRun:
```go
events := make([]string, len(DefaultEnabledEvents))
copy(events, DefaultEnabledEvents)
defer func() {
if r := recover(); r != nil {
s.mu.Lock()
job.LastErr = fmt.Errorf("panic: %v", r)
job.LastRun = time.Now()
s.mu.Unlock()
s.logger.Printf("[ERROR] Job %s panicked: %v", job.Name, r)
}
}()
```
---
### M13. `CheckBackupDestination` tier priority (system/mounts_linux.go)
### M4. `logPostStartStatus` goroutine captures env slice by reference (stacks/manager.go)
**Problem:** Tier 4 (disk full) overwrites Tier 3 (system drive) warning. If a drive is both the system drive and >90% full, only the disk-full warning shows.
**File:** `internal/stacks/manager.go`, around lines 630-651
**Fix:** Accumulate warnings instead of overwriting. Return a slice of warnings, or use the highest severity.
**Problem:** The goroutine captures `env []string` from the function parameter. If the calling code reuses or appends to the slice after calling `logPostStartStatus`, the goroutine may read modified data due to Go's slice sharing semantics. While this is unlikely in current code (callers don't modify `env` after the call), it's a defensive programming issue.
---
### M14. Hardcoded paths in backup.go
**File:** `internal/backup/backup.go`, lines 245-248, 524, 683
**Problem:** `/opt/docker/felhom-controller/controller.yaml` is hardcoded in 3 places.
**Fix:** Add a constant or derive from `m.cfg.Paths.DataDir`:
**Fix:** Copy the slice at the start of the function:
```go
const controllerYAMLPath = "/opt/docker/felhom-controller/controller.yaml"
func (m *Manager) logPostStartStatus(name, stackDir string, env []string) {
envCopy := make([]string, len(env))
copy(envCopy, env)
go func() {
time.Sleep(3 * time.Second)
output, err := m.composeExecCustomEnv(stackDir, envCopy, "ps", "-a", ...)
// ... rest uses envCopy instead of env
}()
}
```
Or better: `filepath.Join(m.cfg.Paths.DataDir, "controller.yaml")`.
---
### M5. Multiple `time.LoadLocation("Europe/Budapest")` calls across the codebase
**Files:** `internal/web/handlers.go` (lines 450, 546), `internal/web/funcmap.go` (lines 15-18)
**Problem:** Same `time.LoadLocation` call duplicated in multiple places, each with its own error handling. The handlers.go calls were fixed with M7 from last session to fall back to UTC, but the repeated pattern is inefficient and error-prone.
**Fix:** Create a package-level helper in `internal/web/` (e.g., in funcmap.go since it already loads the timezone):
```go
var (
webTimezone *time.Location
webTimezoneOnce sync.Once
)
func getTimezone() *time.Location {
webTimezoneOnce.Do(func() {
loc, err := time.LoadLocation("Europe/Budapest")
if err != nil {
loc = time.UTC
}
webTimezone = loc
})
return webTimezone
}
```
Then replace all `time.LoadLocation("Europe/Budapest")` calls in the web package with `getTimezone()`.
---
## Previously Identified (from session 1) — Still Open
These were in the previous TASK.md but were not yet fixed:
### P1. Missing `PARTUUID=` handling in fstab (storage/scan_linux.go)
**File:** `internal/storage/scan_linux.go`, around line 130
`getSystemDiskNames` only parses `UUID=` entries from fstab, missing `PARTUUID=` entries common on Raspberry Pi. Add a `PARTUUID=` case that uses `blkid -t PARTUUID=xxx -o device` to resolve to a device path.
### P2. "Run all" triggers manual-schedule cross-drive backups (api/router.go)
If the "Run all" endpoint should only trigger daily+weekly configs, it should filter out manual ones. Clarify intent and filter accordingly.
### P3. Background goroutines use `context.Background()` (api/router.go)
Background goroutines for backup/restore use `context.Background()` and survive server shutdown. Use a server-level context that's cancelled on shutdown.
### P4. `CheckBackupDestination` tier priority (system/mounts_linux.go)
Tier 4 (disk full) overwrites Tier 3 (system drive) warning. Should accumulate warnings instead of overwriting.
### P5. Hardcoded `/opt/docker/felhom-controller/controller.yaml` paths (backup/backup.go)
Should use a constant or derive from `m.cfg.Paths.DataDir`.
### P6. Inconsistent rollback in migration (storage/migrate.go)
If `updateFn` callback fails after data copy, the copied data on target is not cleaned up.
---
@@ -603,14 +427,15 @@ After making fixes:
1. **Build:** `go build ./...` from the `controller/` directory must succeed
2. **Vet:** `go vet ./...` must pass
3. **Key areas to manually verify:**
- Storage format/mount still works (C5, C6, H8, H9, H18, H19)
- Backup cycle completes (C1-C4, H1-H7)
- Migration works (H11-H13, M5)
- Web UI loads without errors (H15, M11)
- Settings save/load cycle (C1 — create a settings.json with cross-drive configs, call SetAppBackupBulk with a subset, verify missing stacks are preserved)
- Stack config update with empty env (C2 — deploy a stack, clear its env, try UpdateStackConfig)
- DB dump validation on large files (C3 — ensure scanner.Err is checked)
- Daily job scheduling around midnight (H1)
- Settings file operations under disk-full conditions (H3)
## Post-Fix Checklist
- [ ] Add `CHANGELOG.md` entry (session 39, v0.12.3) listing all fixes
- [ ] Add `CHANGELOG.md` entry listing all fixes
- [ ] Update `controller/README.md` if any API behavior changed
- [ ] `go build ./...` passes
- [ ] `go vet ./...` passes