Bug hunt and TASK.md for Sonnet
This commit is contained in:
@@ -1,192 +1,616 @@
|
||||
# 0.12.2 - Bug 4: Restore Section ("Visszaállítás") Simplification — TASK.md
|
||||
# TASK.md — Bug Fix Plan for Sonnet 4.5
|
||||
|
||||
## Current State
|
||||
## Prompt / Context Gathering
|
||||
|
||||
### Backup architecture (what goes where)
|
||||
Before starting any fixes, read the following files to understand the project:
|
||||
|
||||
**Restic snapshots** (daily, on SSD):
|
||||
- `/opt/docker/stacks/` — all Docker configs, compose files, app.yaml
|
||||
- `/opt/docker/db-dumps/` — DB dump SQL files
|
||||
- `/opt/docker/felhom-controller/controller.yaml`
|
||||
- App HDD data paths (if backup enabled) — e.g., `/mnt/hdd_1/storage/immich/`
|
||||
1. **`controller/README.md`** — Full architecture, module map, API endpoints, feature documentation
|
||||
2. **`CHANGELOG.md`** — Recent changes (sessions 26-38, 2026-02-17) for context on what was implemented
|
||||
3. **`controller/TASK.md`** (this file) — The bug list and fix instructions
|
||||
|
||||
**Cross-drive backups** (per-app, rsync/restic to secondary drive):
|
||||
- Copies app user data between drives (e.g., HDD→SSD or SSD→HDD)
|
||||
- Separate from restic — not part of snapshot restore
|
||||
Then read each source file listed in a bug section before fixing it.
|
||||
|
||||
**Key detail:** `SnapshotInfo.Paths []string` from `restic snapshots --json` includes the backup source paths. Snapshots created BEFORE an app's HDD backup was enabled won't have that app's HDD paths listed. This can be used for filtering.
|
||||
|
||||
### Current restore UI flow (backups.html Section 7)
|
||||
1. **App dropdown** — only apps with `HasHDDData && BackupEnabled` (filters out auto-only apps)
|
||||
2. **Snapshot dropdown** — loads ALL snapshots from `/api/backup/snapshots` (no filtering by app)
|
||||
3. **Path display** — shows HDD paths that will be restored (technical, confusing for customers)
|
||||
4. **Warning block** — "FELÜLÍRJA a jelenlegi adatait... NEM vonható vissza!... Javasoljuk az alkalmazás leállítását"
|
||||
5. **Confirmation checkbox** — "Megértettem, visszaállítás saját felelősségre"
|
||||
6. **Restore button** → POST `/backup/restore` with `stack_name` + `snapshot_id`
|
||||
|
||||
### Current restore backend (`RestoreApp` in `restore.go`)
|
||||
- Validates backup enabled + snapshot exists
|
||||
- Resolves app's HDD paths via `GetStackHDDMounts()`
|
||||
- Runs `restic restore {snapshot} --target / --include {path1} --include {path2}`
|
||||
- Only restores HDD data — does NOT restore Docker configs or DB dumps
|
||||
**After all fixes are complete**, update:
|
||||
- `CHANGELOG.md` — Add a new entry at the top following the existing format (session 39, next version **v0.12.3**)
|
||||
- `controller/README.md` — If any fix changes behavior, API, or architecture, update the relevant section
|
||||
|
||||
---
|
||||
|
||||
## Problems
|
||||
## Fix Priority
|
||||
|
||||
1. **Snapshot selector shows ALL snapshots** — user can pick one that predates their app's backup setup → restore fails or does nothing
|
||||
2. **Path display is confusing** — customers don't need to see `/mnt/hdd_1/storage/immich/`
|
||||
3. **Warning is intimidating** — multiple strong statements, feels dangerous
|
||||
4. **No auto-stop** — just "recommended" to stop the app, user can forget
|
||||
5. **DB-only restore** is implicitly possible (pick old snapshot) but not clear what it does
|
||||
Fixes are organized into 3 tiers:
|
||||
|
||||
- **CRITICAL** — Data races, security vulnerabilities, data loss risks. Fix these first.
|
||||
- **HIGH** — Logic errors, resource leaks, incorrect behavior. Fix these second.
|
||||
- **MEDIUM** — Edge cases, code quality, minor issues. Fix if time allows.
|
||||
|
||||
---
|
||||
|
||||
## Proposed Design
|
||||
## CRITICAL Fixes
|
||||
|
||||
### Simplified UI (3 elements instead of 6)
|
||||
### C1. Data race on `m.lastDBDump.Results` mutation (backup/backup.go)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────┐
|
||||
│ Visszaállítás │
|
||||
│ │
|
||||
│ Alkalmazás: [▼ Immich ] │
|
||||
│ Pillanatkép: [▼ 2026-02-17 03:00 (a3f2b1) ] │
|
||||
│ │
|
||||
│ ⚠ A visszaállítás felülírja az alkalmazás │
|
||||
│ jelenlegi adatait a kiválasztott mentés │
|
||||
│ állapotával. Az alkalmazás a folyamat során │
|
||||
│ automatikusan leáll és újraindul. │
|
||||
│ │
|
||||
│ ☐ Megértettem, visszaállítás indítása │
|
||||
│ │
|
||||
│ [ Visszaállítás indítása ] │
|
||||
└─────────────────────────────────────────────────┘
|
||||
```
|
||||
**File:** `internal/backup/backup.go`, around lines 556-570
|
||||
|
||||
### Changes
|
||||
**Problem:** In `RefreshCache()`, the code mutates `m.lastDBDump.Results[i].Validation` while holding NO lock. The mutex is only acquired later at line 574. Other goroutines reading `m.lastDBDump` via `GetFullStatus()` can see torn writes.
|
||||
|
||||
| # | What | Details |
|
||||
|---|------|---------|
|
||||
| 1 | **Filter snapshots by app** | When app selected, `/api/backup/snapshots?stack={name}` returns only snapshots whose `Paths` contain at least one of the app's HDD paths. Show snapshot time in human-friendly format, no raw IDs |
|
||||
| 2 | **Remove path display** | Customer doesn't need to see mount paths. The restore handler knows what to include |
|
||||
| 3 | **Single calm warning** | Replace 3-line warning + separate checkbox label with one concise block |
|
||||
| 4 | **Auto-stop + restart** | Restore handler stops the app's containers before restore, restarts after. Eliminates "javasoljuk leállítását" advice |
|
||||
| 5 | **Hide DB-only restore** | Not exposed in UI — support contacts Viktor for CLI-level `restic restore` |
|
||||
**Fix:** Move the `m.lastDBDump.Results` mutation block (lines 556-570) inside the `m.mu.Lock()` section that starts at line 574. The re-validation loop should happen after acquiring the lock, before setting `m.cachedStatus`.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Plan
|
||||
### C2. SnapshotHistory reversed after unlock (backup/backup.go)
|
||||
|
||||
### Backend changes
|
||||
**File:** `internal/backup/backup.go`, around lines 582-589
|
||||
|
||||
**1. Filtered snapshots API** (`internal/web/api_router.go` or `router.go`)
|
||||
**Problem:** `m.cachedStatus = status` is set at line 582 while holding the lock, but the snapshot history reversal at lines 587-589 happens AFTER `m.mu.Unlock()`. Since `status.SnapshotHistory` is a slice backed by the same array as `m.cachedStatus.SnapshotHistory` (created by `copy` which copies elements but shares nothing — wait, actually `copy` creates a new backing array, so this is safe for the array). However, the local `status` variable IS `m.cachedStatus` (assigned at line 582), so reversing `status.SnapshotHistory` also reverses `m.cachedStatus.SnapshotHistory` without the lock.
|
||||
|
||||
Extend `GET /api/backup/snapshots` with optional `?stack={name}` query param:
|
||||
**Fix:** Move the reversal loop (lines 587-589) BEFORE `m.mu.Unlock()`, right after `copy(status.SnapshotHistory, m.snapshotHistory)` at line 581 and before assigning to `m.cachedStatus` at line 582.
|
||||
|
||||
---
|
||||
|
||||
### C3. `SetStackProvider` write without synchronization (backup/backup.go)
|
||||
|
||||
**File:** `internal/backup/backup.go`, line 413-415
|
||||
|
||||
**Problem:** `m.stackProvider = provider` is a direct field write with no mutex protection. The field is read by `resolveAppBackupPaths()` and `RefreshCache()` concurrently.
|
||||
|
||||
**Fix:** Wrap the assignment in `m.mu.Lock()` / `m.mu.Unlock()`.
|
||||
|
||||
---
|
||||
|
||||
### C4. `GetFullStatus` shallow-copies mutable pointers (backup/backup.go)
|
||||
|
||||
**File:** `internal/backup/backup.go`, lines 619-620
|
||||
|
||||
**Problem:** `status.LastDBDump = m.lastDBDump` and `status.LastBackup = m.lastBackup` copy pointer values. Callers can mutate the referenced objects, creating races with the manager's internal state.
|
||||
|
||||
**Fix:** Deep-copy these structs. If `lastDBDump` is non-nil, create a copy: `copyDump := *m.lastDBDump; status.LastDBDump = ©Dump`. Same for `lastBackup`. Also deep-copy the `Results` slice inside `lastDBDump` if present (since it's a slice of structs, a simple `copy` into a new slice suffices).
|
||||
|
||||
---
|
||||
|
||||
### C5. `IsSystemDisk` uses 8-bit major mask (storage/safety_linux.go)
|
||||
|
||||
**File:** `internal/storage/safety_linux.go`, lines 30-31
|
||||
|
||||
**Problem:** `major := (stat.Rdev >> 8) & 0xff` extracts only 8 bits. Linux dev_t uses a 12-bit major: bits 8-19 with bits 0-7 at positions 8-15 and bits 8-11 at positions 44-47 (for 64-bit). For the common case, the formula should use `unix.Major()` from `golang.org/x/sys/unix` or at minimum the standard extraction: `major = (dev & 0xfff00) >> 8`. Also, the function only compares major numbers — ALL SCSI disks share major 8, so this blocks formatting of any SCSI disk if root is also SCSI.
|
||||
|
||||
**Fix:** Use `unix.Major(uint64(stat.Rdev))` and `unix.Minor(uint64(stat.Rdev))` from `golang.org/x/sys/unix` (already a dependency). Compare both major AND the disk portion of the minor to identify the physical disk. For SCSI (major 8), disks are distinguished by `minor / 16`. For NVMe (major 259), compare the whole-disk minor. The simplest correct approach: resolve both device paths to their parent disk name (e.g., via `partitionToParentDisk`) and compare those names instead of using device numbers.
|
||||
|
||||
---
|
||||
|
||||
### C6. No `/dev/` prefix validation on `DevicePath` (storage/format_linux.go)
|
||||
|
||||
**File:** `internal/storage/format_linux.go`, around lines 37-41
|
||||
|
||||
**Problem:** The `FormatAndMount` function accepts `DevicePath` from user JSON without validating it starts with `/dev/`. An attacker could pass arbitrary paths.
|
||||
|
||||
**Fix:** At the start of `FormatAndMount`, validate:
|
||||
```go
|
||||
func (r *Router) snapshotsHandler(w http.ResponseWriter, req *http.Request) {
|
||||
snapshots, err := r.backupMgr.ListSnapshots(50)
|
||||
// ... existing error handling ...
|
||||
|
||||
// Filter by stack if requested
|
||||
if stackName := req.URL.Query().Get("stack"); stackName != "" {
|
||||
hddMounts := r.stackProvider.GetStackHDDMounts(stackName)
|
||||
if len(hddMounts) > 0 {
|
||||
snapshots = filterSnapshotsByPaths(snapshots, hddMounts)
|
||||
}
|
||||
}
|
||||
|
||||
writeJSON(w, http.StatusOK, apiResponse{OK: true, Data: snapshots})
|
||||
if !strings.HasPrefix(req.DevicePath, "/dev/") {
|
||||
return fmt.Errorf("invalid device path: must start with /dev/")
|
||||
}
|
||||
if strings.Contains(req.DevicePath, "..") {
|
||||
return fmt.Errorf("invalid device path: must not contain ..")
|
||||
}
|
||||
```
|
||||
|
||||
func filterSnapshotsByPaths(snapshots []backup.SnapshotInfo, requiredPaths []string) []backup.SnapshotInfo {
|
||||
var filtered []backup.SnapshotInfo
|
||||
for _, snap := range snapshots {
|
||||
for _, req := range requiredPaths {
|
||||
for _, sp := range snap.Paths {
|
||||
if strings.HasPrefix(req, sp) || strings.HasPrefix(sp, req) {
|
||||
filtered = append(filtered, snap)
|
||||
goto next
|
||||
}
|
||||
---
|
||||
|
||||
### C7. Path traversal in `extractName` (api/router.go)
|
||||
|
||||
**File:** `internal/api/router.go`, lines 769-771
|
||||
|
||||
**Problem:** `extractName()` extracts a stack name from the URL path but doesn't reject `..`, `/`, or `\` characters. This name is used in file paths and Docker commands.
|
||||
|
||||
**Fix:** Add validation after extracting the name:
|
||||
```go
|
||||
func extractName(r *http.Request) string {
|
||||
name := // ... existing extraction logic
|
||||
if strings.ContainsAny(name, "/\\") || name == ".." || name == "." || name == "" {
|
||||
return ""
|
||||
}
|
||||
return name
|
||||
}
|
||||
```
|
||||
Then ensure all callers handle empty string as an error (most already check `name == ""`).
|
||||
|
||||
---
|
||||
|
||||
### C8. Path traversal in `TargetPath` (web/storage_handlers.go)
|
||||
|
||||
**File:** `internal/web/storage_handlers.go`, around lines 374-417
|
||||
|
||||
**Problem:** Migration API accepts `TargetPath` from user input without validating it's a registered storage path. Allows writing to arbitrary filesystem locations.
|
||||
|
||||
**Fix:** Validate `TargetPath` against the list of registered storage paths from settings:
|
||||
```go
|
||||
registeredPaths := s.settings.GetStoragePaths() // or equivalent
|
||||
valid := false
|
||||
for _, p := range registeredPaths {
|
||||
if req.TargetPath == p.Path {
|
||||
valid = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if !valid {
|
||||
http.Error(w, "invalid target path", http.StatusBadRequest)
|
||||
return
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### C9. Path traversal in `DestinationPath` (api/router.go)
|
||||
|
||||
**File:** `internal/api/router.go`, in `saveCrossDriveConfig()` around lines 587-644
|
||||
|
||||
**Problem:** `DestinationPath` from user JSON not validated against registered storage paths. Allows cross-drive backup to write anywhere.
|
||||
|
||||
**Fix:** Same pattern as C8 — validate `DestinationPath` against registered storage paths. Also validate it's not the same device as the source (this check may already exist in `crossdrive.go`, but the API should validate early).
|
||||
|
||||
---
|
||||
|
||||
### C10. Path traversal in `ParseComposeHDDMounts` (stacks/delete.go)
|
||||
|
||||
**File:** `internal/stacks/delete.go`, lines 243-251
|
||||
|
||||
**Problem:** After `${HDD_PATH}` substitution, the resulting path is checked with `strings.HasPrefix(hostPath, hddPath)` BEFORE `filepath.Clean`. An attacker could craft `${HDD_PATH}/../../etc/passwd` — the prefix check passes, but the resolved path escapes the HDD directory.
|
||||
|
||||
**Fix:** Apply `filepath.Clean()` to the resolved path BEFORE the prefix check:
|
||||
```go
|
||||
resolved := filepath.Clean(hostPath)
|
||||
if !strings.HasPrefix(resolved, filepath.Clean(hddPath) + string(filepath.Separator)) {
|
||||
continue // skip paths that escape HDD directory
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## HIGH Fixes
|
||||
|
||||
### H1. `ValidateDump` reads entire file into memory (backup/dbdump.go)
|
||||
|
||||
**File:** `internal/backup/dbdump.go`, line 251
|
||||
|
||||
**Problem:** `os.ReadFile(filePath)` loads entire SQL dump into memory. For large databases (500MB+), this causes massive memory allocation during every 5-minute cache refresh.
|
||||
|
||||
**Fix:** Replace with `bufio.Scanner` reading line by line, or `io.LimitReader` to read only the first N bytes (e.g., 64KB for header) and last N bytes (for footer/completion markers). The validation only checks for structural markers like `CREATE TABLE`, `INSERT INTO`, completion tags — these are at the start and end of the file.
|
||||
|
||||
---
|
||||
|
||||
### H2. `du` calls without timeout (backup/appdata.go)
|
||||
|
||||
**File:** `internal/backup/appdata.go`, lines 136 and 150
|
||||
|
||||
**Problem:** `exec.Command("du", "-sh", path)` and `exec.Command("du", "-sb", path)` have no timeout. Can hang on slow/unmounted filesystems, blocking the web UI.
|
||||
|
||||
**Fix:** Use `exec.CommandContext` with a 30-second timeout:
|
||||
```go
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
|
||||
defer cancel()
|
||||
cmd := exec.CommandContext(ctx, "du", "-sh", path)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### H3. Double `du` invocation per mount (backup/appdata.go)
|
||||
|
||||
**File:** `internal/backup/appdata.go`, lines 79-80
|
||||
|
||||
**Problem:** Both `appDirSizeBytes(mount)` and `appDirSizeHuman(mount)` run `du` independently. That's 2 subprocess calls per mount point.
|
||||
|
||||
**Fix:** Create a single function `appDirSize(path string) (int64, string)` that runs `du -sb` once, parses the byte count, and formats the human string using `humanizeBytes()`. Replace both calls with this single function.
|
||||
|
||||
---
|
||||
|
||||
### H4. Snapshot validation only checks first 100 (backup/restore.go)
|
||||
|
||||
**File:** `internal/backup/restore.go`, line 22
|
||||
|
||||
**Problem:** `m.restic.ListSnapshots(100)` limits to 100 snapshots. Older snapshots can't be restored.
|
||||
|
||||
**Fix:** Either:
|
||||
- Pass 0 or a very large number to list all snapshots, OR
|
||||
- Better: validate the snapshot ID format with a regex (`^[0-9a-f]{8,64}$`) and skip the existence check (let `restic restore` fail naturally with a clear error if the ID doesn't exist). This avoids listing all snapshots entirely.
|
||||
|
||||
---
|
||||
|
||||
### H5. No pruning for cross-drive restic repos (backup/crossdrive.go)
|
||||
|
||||
**File:** `internal/backup/crossdrive.go`
|
||||
|
||||
**Problem:** Cross-drive restic backups accumulate snapshots forever. Main backups have pruning via `restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6`, but cross-drive repos don't.
|
||||
|
||||
**Fix:** After a successful cross-drive restic backup, run a prune step similar to the main backup. Add a `pruneRepo(ctx, repoPath, pwFile string)` function that runs `restic forget --keep-daily 7 --keep-weekly 4 --prune` on the cross-drive repo. Call it after backup in `runResticBackup()`.
|
||||
|
||||
---
|
||||
|
||||
### H6. Temp password file management (backup/crossdrive.go)
|
||||
|
||||
**File:** `internal/backup/crossdrive.go`, lines 230-240
|
||||
|
||||
**Problem:** `defer os.Remove(pwFile.Name())` is registered before the file is closed. On some systems this can fail. Also, no `defer pwFile.Close()` — the explicit `Close()` at line 240 won't run if code panics between creation and close.
|
||||
|
||||
**Fix:** Reorganize:
|
||||
```go
|
||||
pwFile, err := os.CreateTemp("", "felhom-crossdrive-pw-*")
|
||||
if err != nil { return ... }
|
||||
pwPath := pwFile.Name()
|
||||
if _, err := pwFile.WriteString(password); err != nil {
|
||||
pwFile.Close()
|
||||
os.Remove(pwPath)
|
||||
return ...
|
||||
}
|
||||
pwFile.Close()
|
||||
defer os.Remove(pwPath)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### H7. `dirSizeBytes` swallows all walk errors (backup/crossdrive.go)
|
||||
|
||||
**File:** `internal/backup/crossdrive.go`, lines 297-307
|
||||
|
||||
**Problem:** `filepath.Walk` callback returns `nil` on error, hiding permission/IO issues.
|
||||
|
||||
**Fix:** Return the error (or at least log it):
|
||||
```go
|
||||
err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
|
||||
if err != nil {
|
||||
return err // propagate instead of swallowing
|
||||
}
|
||||
if !info.IsDir() {
|
||||
total += info.Size()
|
||||
}
|
||||
return nil
|
||||
})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### H8. Non-atomic fstab write (storage/safety_linux.go)
|
||||
|
||||
**File:** `internal/storage/safety_linux.go`, lines 91-102
|
||||
|
||||
**Problem:** `AppendFstabEntry` writes directly to `/etc/fstab`. A crash during write corrupts the file, potentially bricking the system on reboot.
|
||||
|
||||
**Fix:** Read fstab, append entry, write to `/etc/fstab.tmp`, then `os.Rename("/etc/fstab.tmp", "/etc/fstab")`. This is the same atomic pattern used in settings.go.
|
||||
|
||||
---
|
||||
|
||||
### H9. `IsDeviceMounted` naive prefix matching (storage/safety_linux.go)
|
||||
|
||||
**File:** `internal/storage/safety_linux.go`, line 54
|
||||
|
||||
**Problem:** `strings.HasPrefix(fields[0], devicePath)` causes `/dev/sdb` to match `/dev/sdb1`, `/dev/sdb2`, etc. — and worse, `/dev/sdb` matches `/dev/sdba` (hypothetical).
|
||||
|
||||
**Fix:** After the prefix check, verify the next character is either end-of-string or a digit (for partition suffixes):
|
||||
```go
|
||||
src := fields[0]
|
||||
if src == devicePath || (strings.HasPrefix(src, devicePath) && len(src) > len(devicePath) && (src[len(devicePath)] >= '0' && src[len(devicePath)] <= '9' || src[len(devicePath)] == 'p')) {
|
||||
return true, nil
|
||||
}
|
||||
```
|
||||
Or better: just check exact match for the device, and separately check if any partition of it is mounted.
|
||||
|
||||
---
|
||||
|
||||
### H10. eMMC device mapping bug (storage/scan_linux.go)
|
||||
|
||||
**File:** `internal/storage/scan_linux.go`, lines 161-176
|
||||
|
||||
**Problem:** `partitionToParentDisk` strips trailing digits to find the parent. For eMMC (`mmcblk0p1`), it strips `0p1` → `mmcblk` instead of correctly producing `mmcblk0`. The `p` separator between device number and partition number is not handled.
|
||||
|
||||
**Fix:** Add eMMC/NVMe pattern detection:
|
||||
```go
|
||||
func partitionToParentDisk(partName string) string {
|
||||
// Handle mmcblk0p1, nvme0n1p1 patterns (separator is "p" before partition number)
|
||||
if idx := strings.LastIndex(partName, "p"); idx > 0 {
|
||||
prefix := partName[:idx]
|
||||
suffix := partName[idx+1:]
|
||||
if len(suffix) > 0 && suffix[0] >= '0' && suffix[0] <= '9' {
|
||||
// Check if prefix ends with a digit (e.g., mmcblk0, nvme0n1)
|
||||
if prefix[len(prefix)-1] >= '0' && prefix[len(prefix)-1] <= '9' {
|
||||
return prefix
|
||||
}
|
||||
}
|
||||
next:
|
||||
}
|
||||
return filtered
|
||||
// Default: strip trailing digits (sda1 → sda)
|
||||
return strings.TrimRight(partName, "0123456789")
|
||||
}
|
||||
```
|
||||
|
||||
**2. Auto-stop/restart in RestoreApp** (`internal/backup/restore.go`)
|
||||
---
|
||||
|
||||
### H11. Data race on `bytesCopied` in rsync error path (storage/migrate.go)
|
||||
|
||||
**File:** `internal/storage/migrate.go`, around line 243
|
||||
|
||||
**Problem:** In `runRsync`, the `bytesCopied` variable is read (in the error return path) without holding the mutex, while a goroutine may be writing to it.
|
||||
|
||||
**Fix:** Acquire the lock before reading `bytesCopied` in the error path, or use `atomic.Int64` instead of a mutex-protected variable.
|
||||
|
||||
---
|
||||
|
||||
### H12. Goroutine leak in rsync stdout reader (storage/migrate.go)
|
||||
|
||||
**File:** `internal/storage/migrate.go`, lines 214-237
|
||||
|
||||
**Problem:** A goroutine reads rsync's stdout. If the main function returns early (e.g., on context cancellation), the goroutine may block on `scanner.Scan()` forever if stdout isn't closed, or try to send on a closed channel.
|
||||
|
||||
**Fix:** Ensure the goroutine exits cleanly:
|
||||
1. Use `cmd.StdoutPipe()` and ensure it's closed on function exit
|
||||
2. Use a `done` channel or context cancellation to signal the goroutine
|
||||
3. Don't send on the progress channel after function returns — use a select with a done channel
|
||||
|
||||
---
|
||||
|
||||
### H13. Path prefix match without separator (storage/migrate.go)
|
||||
|
||||
**File:** `internal/storage/migrate.go`, lines 128-131
|
||||
|
||||
**Problem:** `strings.HasPrefix(stackMount, currentHDD)` causes `/mnt/hdd` to match `/mnt/hdd_backup/data`. Stacks on a similarly-named but different mount would be migrated incorrectly.
|
||||
|
||||
**Fix:** Append a trailing separator:
|
||||
```go
|
||||
func (m *Manager) RestoreApp(stackName, snapshotID string) error {
|
||||
// ... existing validation ...
|
||||
if strings.HasPrefix(stackMount, currentHDD + "/") || stackMount == currentHDD {
|
||||
```
|
||||
|
||||
// Stop the app's containers before restore
|
||||
if m.stackProvider != nil {
|
||||
m.logger.Printf("[INFO] Stopping %s for restore...", stackName)
|
||||
if err := m.stackProvider.StopStack(stackName); err != nil {
|
||||
m.logger.Printf("[WARN] Could not stop %s: %v (proceeding anyway)", stackName, err)
|
||||
}
|
||||
}
|
||||
---
|
||||
|
||||
// Execute restore (existing logic)
|
||||
if err := m.restic.RestoreAppData(snapshotID, hddMounts); err != nil {
|
||||
// Try to restart even on failure
|
||||
m.stackProvider.StartStack(stackName)
|
||||
return err
|
||||
}
|
||||
### H14. `DeleteStack` continues after failed `docker compose down` (stacks/delete.go)
|
||||
|
||||
// Restart after successful restore
|
||||
if m.stackProvider != nil {
|
||||
m.logger.Printf("[INFO] Restarting %s after restore...", stackName)
|
||||
m.stackProvider.StartStack(stackName)
|
||||
}
|
||||
**File:** `internal/stacks/delete.go`, lines 88-92
|
||||
|
||||
return nil
|
||||
**Problem:** If `docker compose down` fails, the function continues to delete stack files and HDD data. This can leave orphaned containers running while their config is deleted.
|
||||
|
||||
**Fix:** Return the error from `docker compose down` failure. If the user wants to force-delete, that should be a separate explicit option.
|
||||
|
||||
---
|
||||
|
||||
### H15. XSS in flash messages (web/handlers.go)
|
||||
|
||||
**File:** `internal/web/handlers.go`, around lines 282-288
|
||||
|
||||
**Problem:** URL query parameters (`?success=...`, `?error=...`) are passed to template data. If templates render these with `{{.}}` instead of escaping, it's XSS. Go's `html/template` auto-escapes in HTML context, but check if any use `{{. | safeHTML}}` or JavaScript context.
|
||||
|
||||
**Fix:** Verify all flash message rendering uses standard `{{.Success}}` / `{{.Error}}` in HTML context (which Go auto-escapes). If any template uses these values inside `<script>` tags or `onclick` attributes, switch to JSON encoding or remove the pattern. Consider using session-based flash messages instead of URL params to prevent reflected XSS entirely.
|
||||
|
||||
---
|
||||
|
||||
### H16. `exec.Command("docker")` without timeout (web/handlers.go)
|
||||
|
||||
**File:** `internal/web/handlers.go`, around lines 1247-1253
|
||||
|
||||
**Problem:** `syncFileBrowserMounts()` runs `docker compose up -d` without a context timeout. Can hang indefinitely.
|
||||
|
||||
**Fix:** Use `exec.CommandContext` with a 60-second timeout.
|
||||
|
||||
---
|
||||
|
||||
### H17. `SetNotificationPrefs` stores caller's pointer (settings/settings.go)
|
||||
|
||||
**File:** `internal/settings/settings.go`, around line 219
|
||||
|
||||
**Problem:** `s.data.NotificationPrefs = prefs` stores the caller's pointer/struct directly. If the caller later modifies the struct, it mutates settings state without going through the mutex.
|
||||
|
||||
**Fix:** Deep-copy the prefs:
|
||||
```go
|
||||
func (s *Settings) SetNotificationPrefs(prefs NotificationPrefs) {
|
||||
s.mu.Lock()
|
||||
defer s.mu.Unlock()
|
||||
copy := prefs // value copy
|
||||
copy.EnabledEvents = make([]string, len(prefs.EnabledEvents))
|
||||
copy(copy.EnabledEvents, prefs.EnabledEvents)
|
||||
s.data.NotificationPrefs = copy
|
||||
s.saveLocked()
|
||||
}
|
||||
```
|
||||
|
||||
### Frontend changes
|
||||
---
|
||||
|
||||
**3. Template simplification** (`backups.html` Section 7)
|
||||
- Remove `restore-paths` div entirely
|
||||
- Replace warning text with single concise block
|
||||
- Keep app dropdown + snapshot dropdown + confirm checkbox + button
|
||||
- Update `onRestoreAppChange()` JS to call `/api/backup/snapshots?stack={name}`
|
||||
### H18. `wipefs` error silently discarded (storage/format_linux.go)
|
||||
|
||||
**4. Snapshot display format**
|
||||
- Show: `2026-02-17 vasárnap 03:00` instead of `a3f2b1 — 2026.02.17. 3:00:00`
|
||||
- Keep snapshot ID in `option.value`, show human time in label
|
||||
**File:** `internal/storage/format_linux.go`, line 74
|
||||
|
||||
**Problem:** `_ = exec.Command("wipefs", ...).Run()` discards the error. If `wipefs` fails, stale filesystem signatures remain, causing `mkfs.ext4` or mount to misbehave.
|
||||
|
||||
**Fix:** Check the error. If `wipefs` fails, log a warning but continue (some systems don't have `wipefs`). At minimum don't use `_ =`:
|
||||
```go
|
||||
if err := exec.Command("wipefs", "-a", partPath).Run(); err != nil {
|
||||
log.Printf("[WARN] wipefs failed on %s: %v (continuing)", partPath, err)
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Design Decisions
|
||||
### H19. Orphaned fstab entry on mount failure (storage/format_linux.go)
|
||||
|
||||
**D1: Snapshot dropdown format** → Show `date (id)` — human-friendly time with short ID in parentheses for support debugging. E.g. `2026-02-17 vasárnap 03:00 (a3f2b1)`
|
||||
**File:** `internal/storage/format_linux.go`, lines 140-150
|
||||
|
||||
**D2: Zero filtered snapshots** → Show inline message instead of empty dropdown: "Még nincs mentés felhasználói adattal." (shortened from original)
|
||||
**Problem:** If fstab entry is written but the subsequent `mount` command fails, an orphaned entry remains in fstab. On next reboot, the system may hang waiting for a mount that doesn't work.
|
||||
|
||||
**D3: Auto-stop/restart** → Yes, stacks must be stopped during restore. Check if StackProvider interface already exposes Stop/Start; extend it if not.
|
||||
|
||||
**D4: Progress indication** → Keep current simple behavior (button text change + page redirect) for v0.13.0. Async polling (like backup progress) is a future enhancement.
|
||||
|
||||
**D5: Restore scope** → User data (HDD paths) only. Docker configs and DB dumps are NOT restored — that's a disaster recovery task handled via CLI/support.
|
||||
**Fix:** If mount fails after fstab write, remove the fstab entry as rollback. Read fstab, remove the line containing the UUID, write back atomically.
|
||||
|
||||
---
|
||||
|
||||
## Files to modify
|
||||
## MEDIUM Fixes
|
||||
|
||||
| File | Changes |
|
||||
|------|---------|
|
||||
| `internal/web/api_router.go` (or `router.go`) | Add `?stack=` filter to snapshots endpoint |
|
||||
| `internal/backup/restore.go` | Add auto-stop/restart logic |
|
||||
| `internal/backup/types.go` | Add `filterSnapshotsByPaths` helper if needed |
|
||||
| `internal/web/templates/backups.html` | Simplify restore section HTML |
|
||||
| `internal/web/templates/backups.html` (JS) | Update `onRestoreAppChange()` to pass stack param |
|
||||
| `internal/stacks/` (maybe) | Ensure StackProvider exposes Stop/Start if needed for Q3 |
|
||||
### M1. `formatBytes` duplicate in dbdump.go (backup/dbdump.go)
|
||||
|
||||
## Estimated effort
|
||||
- Backend: ~2 hours (snapshot filtering + auto-stop/restart + interface extension)
|
||||
- Frontend: ~1 hour (template simplification + JS update)
|
||||
- Testing: ~1 hour (restore flow with different apps/snapshots)
|
||||
**File:** `internal/backup/dbdump.go`, lines 467-483
|
||||
|
||||
**Problem:** Duplicates `humanizeBytes()` from `appdata.go`.
|
||||
|
||||
**Fix:** Delete `formatBytes()` from dbdump.go and replace all calls with `humanizeBytes()` from appdata.go.
|
||||
|
||||
---
|
||||
|
||||
### M2. Dead code `.tmp` suffix check (backup/dbdump.go)
|
||||
|
||||
**File:** `internal/backup/dbdump.go`, lines 331-332
|
||||
|
||||
**Problem:** Line 328 already filters for `.sql` suffix, so `.tmp` check is unreachable.
|
||||
|
||||
**Fix:** Remove the dead `.tmp` check. If intent was to skip `.sql.tmp` files, fix the filter order: check `.tmp` before `.sql`.
|
||||
|
||||
---
|
||||
|
||||
### M3. `sizeBytes()` returns 0 for string types (storage/scan_linux.go)
|
||||
|
||||
**File:** `internal/storage/scan_linux.go`, lines 33-39
|
||||
|
||||
**Problem:** `lsblk` can return size as a string. The function only handles `float64` from JSON unmarshaling.
|
||||
|
||||
**Fix:** Add string handling:
|
||||
```go
|
||||
case string:
|
||||
n, _ := strconv.ParseUint(v, 10, 64)
|
||||
return n
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### M4. Missing `PARTUUID=` handling in fstab (storage/scan_linux.go)
|
||||
|
||||
**File:** `internal/storage/scan_linux.go`, lines 130-134
|
||||
|
||||
**Problem:** `getSystemDiskNames` only parses `UUID=` entries from fstab, missing `PARTUUID=` entries common on Raspberry Pi.
|
||||
|
||||
**Fix:** Add a `PARTUUID=` case that uses `blkid -t PARTUUID=xxx -o device` to resolve to a device path.
|
||||
|
||||
---
|
||||
|
||||
### M5. Inconsistent rollback in migration (storage/migrate.go)
|
||||
|
||||
**File:** `internal/storage/migrate.go`, lines 160-163
|
||||
|
||||
**Problem:** If `updateFn` fails (the callback that updates app config to point to new path), the data has been copied but the config still points to old path. No cleanup of copied data on target.
|
||||
|
||||
**Fix:** If `updateFn` fails, attempt to clean up the target directory and return a clear error explaining the situation.
|
||||
|
||||
---
|
||||
|
||||
### M6. Dead `elapsed` variable (storage/migrate.go)
|
||||
|
||||
**File:** `internal/storage/migrate.go`, around line 177
|
||||
|
||||
**Problem:** `elapsed` is calculated but never used.
|
||||
|
||||
**Fix:** Remove or use it in a log message.
|
||||
|
||||
---
|
||||
|
||||
### M7. `time.LoadLocation` error silently discarded (web/handlers.go)
|
||||
|
||||
**File:** `internal/web/handlers.go`, lines 449-450, 541-542
|
||||
|
||||
**Problem:** `loc, _ := time.LoadLocation("Europe/Budapest")` — if tzdata is missing, `loc` is nil causing panic on `time.Now().In(loc)`.
|
||||
|
||||
**Fix:** Handle the error:
|
||||
```go
|
||||
loc, err := time.LoadLocation("Europe/Budapest")
|
||||
if err != nil {
|
||||
loc = time.UTC
|
||||
log.Printf("[WARN] timezone Europe/Budapest not available: %v", err)
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### M8. "Run all" triggers manual-schedule backups (api/router.go)
|
||||
|
||||
**File:** `internal/api/router.go`, lines 701-703
|
||||
|
||||
**Problem:** The "run all" endpoint calls `RunAllScheduled(ctx, "manual")` or iterates all configs. If the intent of "manual" schedule was "only run when explicitly triggered per-app", running all defeats this.
|
||||
|
||||
**Fix:** Clarify intent. If "Run all" should only trigger daily+weekly configs, filter out manual ones. If manual should also be included in "run all", document this behavior.
|
||||
|
||||
---
|
||||
|
||||
### M9. Background goroutines use `context.Background()` (api/router.go)
|
||||
|
||||
**File:** `internal/api/router.go`, lines 658-662, 693-704
|
||||
|
||||
**Problem:** Background goroutines for backup/restore use `context.Background()` and survive server shutdown, potentially corrupting data.
|
||||
|
||||
**Fix:** Use a server-level context that's cancelled on shutdown. Pass the server's context (from `http.Server.BaseContext` or a custom one) and derive goroutine contexts from it.
|
||||
|
||||
---
|
||||
|
||||
### M10. `filterSnapshotsByPaths` imprecise prefix (api/router.go)
|
||||
|
||||
**File:** `internal/api/router.go`, lines 476-490
|
||||
|
||||
**Problem:** `strings.HasPrefix(snapPath, reqPath)` causes `/mnt/hdd_1` to match `/mnt/hdd_10/data`.
|
||||
|
||||
**Fix:** Append separator: `strings.HasPrefix(snapPath, reqPath + "/") || snapPath == reqPath`.
|
||||
|
||||
---
|
||||
|
||||
### M11. XSS in `editStorageLabel` innerHTML (templates/settings.html)
|
||||
|
||||
**File:** `internal/web/templates/settings.html`, around lines 280-296
|
||||
|
||||
**Problem:** `cancelEditLabel` function uses `innerHTML` to restore label content. If `path` or `currentLabel` contain HTML, it's XSS.
|
||||
|
||||
**Fix:** Use `textContent` instead of `innerHTML`:
|
||||
```javascript
|
||||
el.textContent = currentLabel || path;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### M12. `GetNotificationPrefs` leaks slice reference (settings/settings.go)
|
||||
|
||||
**File:** `internal/settings/settings.go`, around line 196
|
||||
|
||||
**Problem:** Returns direct reference to `DefaultEnabledEvents` slice. Caller can modify the global default.
|
||||
|
||||
**Fix:** Return a copy of the slice:
|
||||
```go
|
||||
events := make([]string, len(DefaultEnabledEvents))
|
||||
copy(events, DefaultEnabledEvents)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### M13. `CheckBackupDestination` tier priority (system/mounts_linux.go)
|
||||
|
||||
**Problem:** Tier 4 (disk full) overwrites Tier 3 (system drive) warning. If a drive is both the system drive and >90% full, only the disk-full warning shows.
|
||||
|
||||
**Fix:** Accumulate warnings instead of overwriting. Return a slice of warnings, or use the highest severity.
|
||||
|
||||
---
|
||||
|
||||
### M14. Hardcoded paths in backup.go
|
||||
|
||||
**File:** `internal/backup/backup.go`, lines 245-248, 524, 683
|
||||
|
||||
**Problem:** `/opt/docker/felhom-controller/controller.yaml` is hardcoded in 3 places.
|
||||
|
||||
**Fix:** Add a constant or derive from `m.cfg.Paths.DataDir`:
|
||||
```go
|
||||
const controllerYAMLPath = "/opt/docker/felhom-controller/controller.yaml"
|
||||
```
|
||||
Or better: `filepath.Join(m.cfg.Paths.DataDir, "controller.yaml")`.
|
||||
|
||||
---
|
||||
|
||||
## Testing Notes
|
||||
|
||||
After making fixes:
|
||||
1. **Build:** `go build ./...` from the `controller/` directory must succeed
|
||||
2. **Vet:** `go vet ./...` must pass
|
||||
3. **Key areas to manually verify:**
|
||||
- Storage format/mount still works (C5, C6, H8, H9, H18, H19)
|
||||
- Backup cycle completes (C1-C4, H1-H7)
|
||||
- Migration works (H11-H13, M5)
|
||||
- Web UI loads without errors (H15, M11)
|
||||
|
||||
## Post-Fix Checklist
|
||||
|
||||
- [ ] Add `CHANGELOG.md` entry (session 39, v0.12.3) listing all fixes
|
||||
- [ ] Update `controller/README.md` if any API behavior changed
|
||||
- [ ] `go build ./...` passes
|
||||
- [ ] `go vet ./...` passes
|
||||
|
||||
+623
-422
File diff suppressed because it is too large
Load Diff
Reference in New Issue
Block a user