diff --git a/TASK.md b/TASK.md index 65483a4..92c5957 100644 --- a/TASK.md +++ b/TASK.md @@ -1,17 +1,17 @@ -# TASK.md — Bug Fix Plan for Sonnet 4.5 +# TASK.md — Bug Fix Plan for Sonnet 4.5 (Session 2) ## Prompt / Context Gathering Before starting any fixes, read the following files to understand the project: 1. **`controller/README.md`** — Full architecture, module map, API endpoints, feature documentation -2. **`CHANGELOG.md`** — Recent changes (sessions 26-38, 2026-02-17) for context on what was implemented +2. **`CHANGELOG.md`** — Recent changes for context on what was implemented (including yesterday's bug fixes) 3. **`controller/TASK.md`** (this file) — The bug list and fix instructions Then read each source file listed in a bug section before fixing it. **After all fixes are complete**, update: -- `CHANGELOG.md` — Add a new entry at the top following the existing format (session 39, next version **v0.12.3**) +- `CHANGELOG.md` — Add a new entry at the top following the existing format (next session, next version **v0.12.4**) - `controller/README.md` — If any fix changes behavior, API, or architecture, update the relevant section --- @@ -20,7 +20,7 @@ Then read each source file listed in a bug section before fixing it. Fixes are organized into 3 tiers: -- **CRITICAL** — Data races, security vulnerabilities, data loss risks. Fix these first. +- **CRITICAL** — Data loss, panics, crashes. Fix these first. - **HIGH** — Logic errors, resource leaks, incorrect behavior. Fix these second. - **MEDIUM** — Edge cases, code quality, minor issues. Fix if time allows. @@ -28,139 +28,71 @@ Fixes are organized into 3 tiers: ## CRITICAL Fixes -### C1. Data race on `m.lastDBDump.Results` mutation (backup/backup.go) +### C1. `SetAppBackupBulk` data loss + nil map panic (settings/settings.go) -**File:** `internal/backup/backup.go`, around lines 556-570 +**File:** `internal/settings/settings.go`, lines 271-282 -**Problem:** In `RefreshCache()`, the code mutates `m.lastDBDump.Results[i].Validation` while holding NO lock. The mutex is only acquired later at line 574. Other goroutines reading `m.lastDBDump` via `GetFullStatus()` can see torn writes. +**Problem (two bugs in one function):** -**Fix:** Move the `m.lastDBDump.Results` mutation block (lines 556-570) inside the `m.mu.Lock()` section that starts at line 574. The re-validation loop should happen after acquiring the lock, before setting `m.cachedStatus`. +1. **Data loss:** Line 280 `s.AppBackup = newMap` replaces the ENTIRE map with only the keys from the input `prefs` parameter. Any stacks NOT in the input map are permanently deleted from the settings file, including their `CrossDrive` backup configurations. ---- + Example: Settings has `{"app1": {Enabled:true, CrossDrive:{...}}, "app2": {Enabled:false}}`. Caller passes `prefs = {"app1": false}` (intending to disable only app1). After this function, app2's entire config is deleted. -### C2. SnapshotHistory reversed after unlock (backup/backup.go) +2. **Nil map panic:** Line 276 reads `s.AppBackup[name]` but there is no nil check on `s.AppBackup` before the loop. If `s.AppBackup` is nil (new installation, empty settings), this panics. -**File:** `internal/backup/backup.go`, around lines 582-589 +**Fix:** Update the map IN PLACE instead of replacing it. Add nil guard: -**Problem:** `m.cachedStatus = status` is set at line 582 while holding the lock, but the snapshot history reversal at lines 587-589 happens AFTER `m.mu.Unlock()`. Since `status.SnapshotHistory` is a slice backed by the same array as `m.cachedStatus.SnapshotHistory` (created by `copy` which copies elements but shares nothing — wait, actually `copy` creates a new backing array, so this is safe for the array). However, the local `status` variable IS `m.cachedStatus` (assigned at line 582), so reversing `status.SnapshotHistory` also reverses `m.cachedStatus.SnapshotHistory` without the lock. - -**Fix:** Move the reversal loop (lines 587-589) BEFORE `m.mu.Unlock()`, right after `copy(status.SnapshotHistory, m.snapshotHistory)` at line 581 and before assigning to `m.cachedStatus` at line 582. - ---- - -### C3. `SetStackProvider` write without synchronization (backup/backup.go) - -**File:** `internal/backup/backup.go`, line 413-415 - -**Problem:** `m.stackProvider = provider` is a direct field write with no mutex protection. The field is read by `resolveAppBackupPaths()` and `RefreshCache()` concurrently. - -**Fix:** Wrap the assignment in `m.mu.Lock()` / `m.mu.Unlock()`. - ---- - -### C4. `GetFullStatus` shallow-copies mutable pointers (backup/backup.go) - -**File:** `internal/backup/backup.go`, lines 619-620 - -**Problem:** `status.LastDBDump = m.lastDBDump` and `status.LastBackup = m.lastBackup` copy pointer values. Callers can mutate the referenced objects, creating races with the manager's internal state. - -**Fix:** Deep-copy these structs. If `lastDBDump` is non-nil, create a copy: `copyDump := *m.lastDBDump; status.LastDBDump = ©Dump`. Same for `lastBackup`. Also deep-copy the `Results` slice inside `lastDBDump` if present (since it's a slice of structs, a simple `copy` into a new slice suffices). - ---- - -### C5. `IsSystemDisk` uses 8-bit major mask (storage/safety_linux.go) - -**File:** `internal/storage/safety_linux.go`, lines 30-31 - -**Problem:** `major := (stat.Rdev >> 8) & 0xff` extracts only 8 bits. Linux dev_t uses a 12-bit major: bits 8-19 with bits 0-7 at positions 8-15 and bits 8-11 at positions 44-47 (for 64-bit). For the common case, the formula should use `unix.Major()` from `golang.org/x/sys/unix` or at minimum the standard extraction: `major = (dev & 0xfff00) >> 8`. Also, the function only compares major numbers — ALL SCSI disks share major 8, so this blocks formatting of any SCSI disk if root is also SCSI. - -**Fix:** Use `unix.Major(uint64(stat.Rdev))` and `unix.Minor(uint64(stat.Rdev))` from `golang.org/x/sys/unix` (already a dependency). Compare both major AND the disk portion of the minor to identify the physical disk. For SCSI (major 8), disks are distinguished by `minor / 16`. For NVMe (major 259), compare the whole-disk minor. The simplest correct approach: resolve both device paths to their parent disk name (e.g., via `partitionToParentDisk`) and compare those names instead of using device numbers. - ---- - -### C6. No `/dev/` prefix validation on `DevicePath` (storage/format_linux.go) - -**File:** `internal/storage/format_linux.go`, around lines 37-41 - -**Problem:** The `FormatAndMount` function accepts `DevicePath` from user JSON without validating it starts with `/dev/`. An attacker could pass arbitrary paths. - -**Fix:** At the start of `FormatAndMount`, validate: ```go -if !strings.HasPrefix(req.DevicePath, "/dev/") { - return fmt.Errorf("invalid device path: must start with /dev/") -} -if strings.Contains(req.DevicePath, "..") { - return fmt.Errorf("invalid device path: must not contain ..") -} -``` - ---- - -### C7. Path traversal in `extractName` (api/router.go) - -**File:** `internal/api/router.go`, lines 769-771 - -**Problem:** `extractName()` extracts a stack name from the URL path but doesn't reject `..`, `/`, or `\` characters. This name is used in file paths and Docker commands. - -**Fix:** Add validation after extracting the name: -```go -func extractName(r *http.Request) string { - name := // ... existing extraction logic - if strings.ContainsAny(name, "/\\") || name == ".." || name == "." || name == "" { - return "" +func (s *Settings) SetAppBackupBulk(prefs map[string]bool) error { + s.mu.Lock() + defer s.mu.Unlock() + if s.AppBackup == nil { + s.AppBackup = make(map[string]AppBackupPrefs) } - return name -} -``` -Then ensure all callers handle empty string as an error (most already check `name == ""`). - ---- - -### C8. Path traversal in `TargetPath` (web/storage_handlers.go) - -**File:** `internal/web/storage_handlers.go`, around lines 374-417 - -**Problem:** Migration API accepts `TargetPath` from user input without validating it's a registered storage path. Allows writing to arbitrary filesystem locations. - -**Fix:** Validate `TargetPath` against the list of registered storage paths from settings: -```go -registeredPaths := s.settings.GetStoragePaths() // or equivalent -valid := false -for _, p := range registeredPaths { - if req.TargetPath == p.Path { - valid = true - break + for name, enabled := range prefs { + existing := s.AppBackup[name] + existing.Enabled = enabled + s.AppBackup[name] = existing } + return s.save() } -if !valid { - http.Error(w, "invalid target path", http.StatusBadRequest) - return +``` + +**Key change:** Remove line 274 (`newMap := make(...)`) and line 280 (`s.AppBackup = newMap`). Iterate over `s.AppBackup` directly. The comment on line 270 says "Preserves existing CrossDrive configs" — the current code does NOT do that for stacks absent from the input. + +--- + +### C2. `UpdateStackConfig` nil Env map panic (stacks/deploy.go) + +**File:** `internal/stacks/deploy.go`, line 245 + +**Problem:** `appCfg.Env[key] = val` panics with "assignment to entry in nil map" when `appCfg.Env` is nil. This happens when app.yaml exists but has no `env:` section (e.g., `deployed: true` with no env vars). + +The fix already exists in `UpdateOptionalConfig` (lines 312-314) but is missing from `UpdateStackConfig`. + +**Fix:** Add nil check after line 233 (after `if appCfg == nil || !appCfg.Deployed` check), before the loop: + +```go +if appCfg.Env == nil { + appCfg.Env = make(map[string]string) } ``` --- -### C9. Path traversal in `DestinationPath` (api/router.go) +### C3. `ValidateDump` missing `scanner.Err()` check (backup/dbdump.go) -**File:** `internal/api/router.go`, in `saveCrossDriveConfig()` around lines 587-644 +**File:** `internal/backup/dbdump.go`, after line 295 -**Problem:** `DestinationPath` from user JSON not validated against registered storage paths. Allows cross-drive backup to write anywhere. +**Problem:** After the `for scanner.Scan() { ... }` loop (lines 269-295), `scanner.Err()` is never checked. If an I/O error occurs during scanning (disk error, permission issue, file truncated), the scanner silently stops iterating. The function continues with whatever partial data was read and may incorrectly mark the dump as valid (`v.Valid = true` at line 315). This means a corrupted or unreadable dump file could pass validation. -**Fix:** Same pattern as C8 — validate `DestinationPath` against registered storage paths. Also validate it's not the same device as the source (this check may already exist in `crossdrive.go`, but the API should validate early). +**Fix:** Add error check immediately after the for loop ends (after line 295, before line 296): ---- - -### C10. Path traversal in `ParseComposeHDDMounts` (stacks/delete.go) - -**File:** `internal/stacks/delete.go`, lines 243-251 - -**Problem:** After `${HDD_PATH}` substitution, the resulting path is checked with `strings.HasPrefix(hostPath, hddPath)` BEFORE `filepath.Clean`. An attacker could craft `${HDD_PATH}/../../etc/passwd` — the prefix check passes, but the resolved path escapes the HDD directory. - -**Fix:** Apply `filepath.Clean()` to the resolved path BEFORE the prefix check: ```go -resolved := filepath.Clean(hostPath) -if !strings.HasPrefix(resolved, filepath.Clean(hddPath) + string(filepath.Separator)) { - continue // skip paths that escape HDD directory +if err := scanner.Err(); err != nil { + v.Error = fmt.Sprintf("hiba az olvasás közben: %v", err) + log.Printf("[WARN] ValidateDump FAIL: %s — scanner error: %v", filePath, err) + return v } ``` @@ -168,432 +100,324 @@ if !strings.HasPrefix(resolved, filepath.Clean(hddPath) + string(filepath.Separa ## HIGH Fixes -### H1. `ValidateDump` reads entire file into memory (backup/dbdump.go) +### H1. `nextDailyRun` uses `Add(24h)` — wrong across DST transitions (scheduler/scheduler.go) -**File:** `internal/backup/dbdump.go`, line 251 +**File:** `internal/scheduler/scheduler.go`, line 252 -**Problem:** `os.ReadFile(filePath)` loads entire SQL dump into memory. For large databases (500MB+), this causes massive memory allocation during every 5-minute cache refresh. +**Problem:** `next = next.Add(24 * time.Hour)` is used when today's scheduled time has passed. During a DST transition in Europe/Budapest: +- Spring forward (March): Adding 24h to "02:30 CET" gives "03:30 CEST" instead of "02:30 CEST" — the job runs 1 hour late +- Fall back (October): Adding 24h from "02:30 CEST" gives "01:30 CET" — the job runs 1 hour early -**Fix:** Replace with `bufio.Scanner` reading line by line, or `io.LimitReader` to read only the first N bytes (e.g., 64KB for header) and last N bytes (for footer/completion markers). The validation only checks for structural markers like `CREATE TABLE`, `INSERT INTO`, completion tags — these are at the start and end of the file. +**Fix:** Replace line 252 with calendar-day arithmetic: ---- - -### H2. `du` calls without timeout (backup/appdata.go) - -**File:** `internal/backup/appdata.go`, lines 136 and 150 - -**Problem:** `exec.Command("du", "-sh", path)` and `exec.Command("du", "-sb", path)` have no timeout. Can hang on slow/unmounted filesystems, blocking the web UI. - -**Fix:** Use `exec.CommandContext` with a 30-second timeout: ```go -ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) -defer cancel() -cmd := exec.CommandContext(ctx, "du", "-sh", path) -``` - ---- - -### H3. Double `du` invocation per mount (backup/appdata.go) - -**File:** `internal/backup/appdata.go`, lines 79-80 - -**Problem:** Both `appDirSizeBytes(mount)` and `appDirSizeHuman(mount)` run `du` independently. That's 2 subprocess calls per mount point. - -**Fix:** Create a single function `appDirSize(path string) (int64, string)` that runs `du -sb` once, parses the byte count, and formats the human string using `humanizeBytes()`. Replace both calls with this single function. - ---- - -### H4. Snapshot validation only checks first 100 (backup/restore.go) - -**File:** `internal/backup/restore.go`, line 22 - -**Problem:** `m.restic.ListSnapshots(100)` limits to 100 snapshots. Older snapshots can't be restored. - -**Fix:** Either: -- Pass 0 or a very large number to list all snapshots, OR -- Better: validate the snapshot ID format with a regex (`^[0-9a-f]{8,64}$`) and skip the existence check (let `restic restore` fail naturally with a clear error if the ID doesn't exist). This avoids listing all snapshots entirely. - ---- - -### H5. No pruning for cross-drive restic repos (backup/crossdrive.go) - -**File:** `internal/backup/crossdrive.go` - -**Problem:** Cross-drive restic backups accumulate snapshots forever. Main backups have pruning via `restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6`, but cross-drive repos don't. - -**Fix:** After a successful cross-drive restic backup, run a prune step similar to the main backup. Add a `pruneRepo(ctx, repoPath, pwFile string)` function that runs `restic forget --keep-daily 7 --keep-weekly 4 --prune` on the cross-drive repo. Call it after backup in `runResticBackup()`. - ---- - -### H6. Temp password file management (backup/crossdrive.go) - -**File:** `internal/backup/crossdrive.go`, lines 230-240 - -**Problem:** `defer os.Remove(pwFile.Name())` is registered before the file is closed. On some systems this can fail. Also, no `defer pwFile.Close()` — the explicit `Close()` at line 240 won't run if code panics between creation and close. - -**Fix:** Reorganize: -```go -pwFile, err := os.CreateTemp("", "felhom-crossdrive-pw-*") -if err != nil { return ... } -pwPath := pwFile.Name() -if _, err := pwFile.WriteString(password); err != nil { - pwFile.Close() - os.Remove(pwPath) - return ... -} -pwFile.Close() -defer os.Remove(pwPath) -``` - ---- - -### H7. `dirSizeBytes` swallows all walk errors (backup/crossdrive.go) - -**File:** `internal/backup/crossdrive.go`, lines 297-307 - -**Problem:** `filepath.Walk` callback returns `nil` on error, hiding permission/IO issues. - -**Fix:** Return the error (or at least log it): -```go -err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error { - if err != nil { - return err // propagate instead of swallowing - } - if !info.IsDir() { - total += info.Size() - } - return nil -}) -``` - ---- - -### H8. Non-atomic fstab write (storage/safety_linux.go) - -**File:** `internal/storage/safety_linux.go`, lines 91-102 - -**Problem:** `AppendFstabEntry` writes directly to `/etc/fstab`. A crash during write corrupts the file, potentially bricking the system on reboot. - -**Fix:** Read fstab, append entry, write to `/etc/fstab.tmp`, then `os.Rename("/etc/fstab.tmp", "/etc/fstab")`. This is the same atomic pattern used in settings.go. - ---- - -### H9. `IsDeviceMounted` naive prefix matching (storage/safety_linux.go) - -**File:** `internal/storage/safety_linux.go`, line 54 - -**Problem:** `strings.HasPrefix(fields[0], devicePath)` causes `/dev/sdb` to match `/dev/sdb1`, `/dev/sdb2`, etc. — and worse, `/dev/sdb` matches `/dev/sdba` (hypothetical). - -**Fix:** After the prefix check, verify the next character is either end-of-string or a digit (for partition suffixes): -```go -src := fields[0] -if src == devicePath || (strings.HasPrefix(src, devicePath) && len(src) > len(devicePath) && (src[len(devicePath)] >= '0' && src[len(devicePath)] <= '9' || src[len(devicePath)] == 'p')) { - return true, nil +// If the time has already passed today, schedule for tomorrow +if !next.After(now) { + next = time.Date(now.Year(), now.Month(), now.Day()+1, hour, min, 0, 0, loc) } ``` -Or better: just check exact match for the device, and separately check if any partition of it is mounted. + +This uses `time.Date` which correctly handles DST by constructing the time in the target timezone for the specific calendar date. --- -### H10. eMMC device mapping bug (storage/scan_linux.go) +### H2. `nextDailyRun` calls `time.LoadLocation` on every iteration (scheduler/scheduler.go) -**File:** `internal/storage/scan_linux.go`, lines 161-176 +**File:** `internal/scheduler/scheduler.go`, lines 241-245 -**Problem:** `partitionToParentDisk` strips trailing digits to find the parent. For eMMC (`mmcblk0p1`), it strips `0p1` → `mmcblk` instead of correctly producing `mmcblk0`. The `p` separator between device number and partition number is not handled. +**Problem:** `time.LoadLocation("Europe/Budapest")` is called every time `nextDailyRun` runs — i.e., every single day for each daily job, and additionally on startup. This is both inefficient (filesystem read + parsing each time) and dangerous: if the system loses access to tzdata at runtime (e.g., container rebuild removing the package), the fallback to UTC happens silently with only a log line. Jobs would shift by 1-2 hours with no alert. + +**Fix:** Cache the timezone at package level or in the Scheduler struct. Preferred approach — package-level `sync.Once`: -**Fix:** Add eMMC/NVMe pattern detection: ```go -func partitionToParentDisk(partName string) string { - // Handle mmcblk0p1, nvme0n1p1 patterns (separator is "p" before partition number) - if idx := strings.LastIndex(partName, "p"); idx > 0 { - prefix := partName[:idx] - suffix := partName[idx+1:] - if len(suffix) > 0 && suffix[0] >= '0' && suffix[0] <= '9' { - // Check if prefix ends with a digit (e.g., mmcblk0, nvme0n1) - if prefix[len(prefix)-1] >= '0' && prefix[len(prefix)-1] <= '9' { - return prefix - } +var ( + budapestLoc *time.Location + budapestLocOnce sync.Once +) + +func getBudapestLocation() *time.Location { + budapestLocOnce.Do(func() { + loc, err := time.LoadLocation("Europe/Budapest") + if err != nil { + log.Printf("[ERROR] Cannot load Europe/Budapest timezone: %v — using UTC", err) + loc = time.UTC } - } - // Default: strip trailing digits (sda1 → sda) - return strings.TrimRight(partName, "0123456789") + budapestLoc = loc + }) + return budapestLoc +} +``` + +Then in `nextDailyRun`, replace lines 241-245 with: `loc := getBudapestLocation()`. + +--- + +### H3. `settings.save()` leaks .tmp file on `WriteFile` failure (settings/settings.go) + +**File:** `internal/settings/settings.go`, lines 138-139 + +**Problem:** If `os.WriteFile(tmpPath, data, 0644)` fails (e.g., disk full, permission denied), the function returns the error but does NOT clean up the .tmp file. Only the `os.Rename` failure path (line 143) calls `os.Remove(tmpPath)`. Repeated failures accumulate orphaned .tmp files. + +**Fix:** Add cleanup to the WriteFile error path: + +```go +if err := os.WriteFile(tmpPath, data, 0644); err != nil { + os.Remove(tmpPath) // clean up partial file + return fmt.Errorf("writing tmp settings: %w", err) } ``` --- -### H11. Data race on `bytesCopied` in rsync error path (storage/migrate.go) +### H4. `SetNotificationPrefs` panics on nil input (settings/settings.go) -**File:** `internal/storage/migrate.go`, around line 243 +**File:** `internal/settings/settings.go`, line 220 -**Problem:** In `runRsync`, the `bytesCopied` variable is read (in the error return path) without holding the mutex, while a goroutine may be writing to it. +**Problem:** `copy := *prefs` dereferences the pointer without checking for nil. If any caller passes `nil` (possible from malformed JSON decoding or future code changes), this panics. -**Fix:** Acquire the lock before reading `bytesCopied` in the error path, or use `atomic.Int64` instead of a mutex-protected variable. +**Fix:** Add nil guard at the start of the function (after line 217, before line 218): ---- - -### H12. Goroutine leak in rsync stdout reader (storage/migrate.go) - -**File:** `internal/storage/migrate.go`, lines 214-237 - -**Problem:** A goroutine reads rsync's stdout. If the main function returns early (e.g., on context cancellation), the goroutine may block on `scanner.Scan()` forever if stdout isn't closed, or try to send on a closed channel. - -**Fix:** Ensure the goroutine exits cleanly: -1. Use `cmd.StdoutPipe()` and ensure it's closed on function exit -2. Use a `done` channel or context cancellation to signal the goroutine -3. Don't send on the progress channel after function returns — use a select with a done channel - ---- - -### H13. Path prefix match without separator (storage/migrate.go) - -**File:** `internal/storage/migrate.go`, lines 128-131 - -**Problem:** `strings.HasPrefix(stackMount, currentHDD)` causes `/mnt/hdd` to match `/mnt/hdd_backup/data`. Stacks on a similarly-named but different mount would be migrated incorrectly. - -**Fix:** Append a trailing separator: ```go -if strings.HasPrefix(stackMount, currentHDD + "/") || stackMount == currentHDD { -``` - ---- - -### H14. `DeleteStack` continues after failed `docker compose down` (stacks/delete.go) - -**File:** `internal/stacks/delete.go`, lines 88-92 - -**Problem:** If `docker compose down` fails, the function continues to delete stack files and HDD data. This can leave orphaned containers running while their config is deleted. - -**Fix:** Return the error from `docker compose down` failure. If the user wants to force-delete, that should be a separate explicit option. - ---- - -### H15. XSS in flash messages (web/handlers.go) - -**File:** `internal/web/handlers.go`, around lines 282-288 - -**Problem:** URL query parameters (`?success=...`, `?error=...`) are passed to template data. If templates render these with `{{.}}` instead of escaping, it's XSS. Go's `html/template` auto-escapes in HTML context, but check if any use `{{. | safeHTML}}` or JavaScript context. - -**Fix:** Verify all flash message rendering uses standard `{{.Success}}` / `{{.Error}}` in HTML context (which Go auto-escapes). If any template uses these values inside `