# Bug Hunt Report — Comprehensive Controller Audit **Date:** 2026-02-25 **Scope:** Full controller codebase (`controller/` — all ~80 Go files, all packages) **Method:** Systematic code review of every file + live testing on demo node (192.168.0.162) **Verified on:** v0.30.2, Go 1.24.0 --- ## Priority Legend | Priority | Meaning | |----------|---------| | P0 | **Critical** — Data race causing corruption, potential panic in production, security vulnerability | | P1 | **High** — Logic error causing wrong behavior, resource leak, silent data loss | | P2 | **Medium** — Suboptimal behavior, inconsistency, performance issue, minor bug | | P3 | **Low** — Code smell, dead code, cosmetic issue, hardening | --- ## Table of Contents - [P0 Critical Issues](#p0-critical-issues) - [P1 High Issues](#p1-high-issues) - [P2 Medium Issues](#p2-medium-issues) - [P3 Low Issues](#p3-low-issues) - [Standardization / Optimization Opportunities](#standardization--optimization-opportunities) - [Fix Plan — Ordered by Implementation Phase](#fix-plan--ordered-by-implementation-phase) --- ## P0 Critical Issues ### C01 — `UpdateStackConfig` passes encrypted env values to docker compose **Files:** `stacks/deploy.go:340-366` **Category:** Bug — Silent data corruption `UpdateStackConfig` loads `app.yaml` via `LoadAppConfig()` (line 340) which returns encrypted `ENC:...` values for sensitive fields. It then updates the modified fields and calls `composeExecWithEnv` (line 366) passing the mixed map — old encrypted values + new plaintext values. Docker compose receives the encrypted ciphertext as literal environment variable values. Containers get `ENC:H98iUGr...` instead of the actual DB password. **Verified on demo:** `immich/app.yaml` has `DB_PASSWORD: ENC:H98i...`. If a user changes optional config via the UI, the next `docker compose up -d` would inject the encrypted string as DB_PASSWORD. **Fix:** Use `LoadAppConfigDecrypted(stackDir, m.encKey)` for the compose execution path, or call `m.stackEnv(stackDir)` instead of `m.composeExecWithEnv`. --- ### C02 — `DecryptMap` silently swallows decryption errors **File:** `crypto/crypto.go:107-113` **Category:** Silent data corruption When decryption fails (key rotated, data corrupted), `DecryptMap` silently returns the raw `ENC:...` ciphertext as the value. Applications receive garbled base64 as passwords/secrets, causing silent failures. No error is logged or returned. **Fix:** Return an error from `DecryptMap`, or at minimum log a `[WARN]`. Update callers to handle the error. --- ### C03 — `appCfg.Deployed = false` written outside lock in deploy goroutine **Files:** `stacks/deploy.go:311` **Category:** Data race In `runComposeDeploy`, on failure, `appCfg.Deployed = false` is written at line 311 **outside** the `m.mu.Lock()` block (lines 302-309). The same `appCfg` pointer is shared with `s.AppConfig` in the stacks map. Any concurrent `GetStack`/`GetStacks` call reading `AppConfig.Deployed` is a data race. **Fix:** Move line 311 (`appCfg.Deployed = false`) inside the lock block before line 309. --- ### C04 — `GetStack`/`GetStacks` returns shallow copies with shared pointer fields **Files:** `stacks/manager.go:449-470` **Category:** Data race `GetStack` does `copy := *s` — a shallow copy. `Stack.AppConfig` is a pointer, so the "copy" shares the `AppConfig` with the original. Any goroutine reading `copy.AppConfig.Env` while another modifies the original's `AppConfig` (e.g., deploy goroutine) is a data race. Same issue with `Containers`, `HealthProbe`, and `Meta.DeployFields`. **Fix:** Implement deep copy for `Stack` struct. At minimum, deep-copy `AppConfig`, `Containers`, and `HealthProbe`. --- ### C05 — Watchdog `pathProbeState` fields read/written without synchronization **File:** `monitor/watchdog.go:141-163 and throughout` **Category:** Data race `Check()` reads and writes `pathProbeState` fields (`lastProbeTime`, `consecutiveFailures`, `probeCount`, `lastStatus`, etc.) without holding `w.mu`. Meanwhile, `GetDebugStatus()` reads these same fields while holding `w.mu`. `SafeDisconnect()` and `Reconnect()` also modify state concurrently. Multiple data races. **Fix:** Either hold `w.mu` during all state modifications in `Check()`, or give each `pathProbeState` its own mutex. --- ### C06 — Metrics collector `cancel` field has data race between `Start` and `Stop` **File:** `metrics/collector.go:Start/Stop` **Category:** Data race `Start()` writes `c.cancel` without any lock. `Stop()` reads it without any lock. If called from different goroutines (the expected pattern), this is a data race. **Fix:** Protect with `c.mu` or use `sync.Once` for start. --- ### C07 — `activeRawMount` race between cleanup, mount, and set **File:** `web/storage_handlers.go:919-935` **Category:** Race condition In `storageAttachMountRawHandler`, the old mount is cleaned up under `diskJobMu` (line 919-924), then the lock is released, then `storage.MountRaw` runs without lock (line 926), then the new mount path is set under lock (line 933-935). Two concurrent requests can both clean up and create mounts, orphaning the first. **Fix:** Hold `diskJobMu` across the entire operation (cleanup + mount + set). --- ### C08 — `SetEncryptionKey` on stacks Manager has no locking **File:** `stacks/manager.go:114-116` **Category:** Data race `SetEncryptionKey` writes `m.encKey` without holding any lock. If called concurrently with `MigrateEncryption`, `stackEnv`, or `SaveAppConfig` (which read `m.encKey`), this is a data race. **Fix:** Acquire `m.mu.Lock()` before setting `m.encKey`. --- ## P1 High Issues ### H01 — Restic lock detection never triggers (checks stdout, not stderr) **File:** `backup/restic.go:135-151` **Category:** Bug — backup recovery broken `Snapshot()` uses `cmd.Output()` which returns stdout only. Restic writes lock errors to stderr. The lock-detection code at line 138 checks `string(out)` for "lock"/"locked" — but `out` is stdout, not stderr. The auto-unlock-and-retry logic likely never triggers. **Fix:** Use `cmd.CombinedOutput()`, or extract stderr from `*exec.ExitError`: ```go if exitErr, ok := err.(*exec.ExitError); ok { errStr = string(exitErr.Stderr) } ``` --- ### H02 — `activeDrives()` returns disconnected/decommissioned drives **File:** `backup/backup.go:206-222` **Category:** Logic error `activeDrives()` logs which drives are disconnected but still adds ALL drives to the result. Functions like `RunIntegrityCheck`, `ListSnapshots`, `aggregateRepoStats` will then operate on disconnected drives, causing timeouts. **Fix:** Add `continue` after logging disconnected drives to skip them: ```go if m.settings != nil && (m.settings.IsDisconnected(d) || m.settings.IsDecommissioned(d)) { disconnected = append(disconnected, d) continue // don't add to drives } ``` --- ### H03 — `MigrateEncryption` holds RLock while writing files **File:** `stacks/manager.go:124-156` **Category:** Concurrency — semantic error `MigrateEncryption()` acquires `m.mu.RLock()` then calls `SaveAppConfig()` which writes to disk. A write operation should hold a write lock, not a read lock. **Fix:** Use `m.mu.Lock()` instead of `m.mu.RLock()`. --- ### H04 — `SaveAppConfig` is not atomic **File:** `stacks/deploy.go:555` (uses `os.WriteFile`) **Category:** Data integrity `SaveAppConfig` writes directly to `app.yaml`. A crash mid-write corrupts the file. This violates the project's own pattern (settings.go uses write-to-tmp + rename). **Fix:** Write to `app.yaml.tmp`, then `os.Rename` to `app.yaml`. --- ### H05 — Deploy failure saves plaintext secrets to disk **File:** `stacks/deploy.go:311-312` **Category:** Security On deploy failure, `runComposeDeploy` calls `SaveAppConfig(stackDir, appCfg, nil, nil)` — passing nil for encKey and sensitiveVars. The plaintext secrets from the deploy form are saved unencrypted to `app.yaml`. **Fix:** Pass `m.encKey` and `SensitiveEnvVars(&meta)` to `SaveAppConfig` in the failure path. --- ### H06 — No rate limiting on login attempts **File:** `web/auth.go:91-128` **Category:** Security `handleLogin` has no rate limiting, lockout, or progressive delay. Bcrypt is slow but still allows ~5-10 attempts/sec. Note: demo node currently has no password set (`Auth: no password configured`), making this irrelevant there but critical for production deployments. **Fix:** Add per-IP rate limiter or exponential backoff on failed attempts. --- ### H07 — Template rendering writes directly to ResponseWriter **Files:** `web/server.go:393-398`, `setup/handlers.go:871-877` **Category:** Bug — corrupt HTTP responses `render()` calls `tmpl.ExecuteTemplate(w, ...)` directly on the ResponseWriter. If the template partially writes then errors, the client receives corrupt HTML with a 200 status. The subsequent `http.Error(500)` is silently ignored because headers are already sent. **Fix:** Render to a `bytes.Buffer` first, only write to `w` on success: ```go var buf bytes.Buffer if err := s.tmpl.ExecuteTemplate(&buf, name, data); err != nil { http.Error(w, "Internal error", 500) return } w.Header().Set("Content-Type", "text/html; charset=utf-8") buf.WriteTo(w) ``` --- ### H08 — `Syncer.Stop()` panics if called twice **File:** `sync/sync.go:114-116` **Category:** Panic `Stop()` calls `close(s.stopCh)`. A double-call panics with "close of closed channel". **Fix:** Use `sync.Once`: ```go s.stopOnce.Do(func() { close(s.stopCh) }) ``` --- ### H09 — Sync race: `TriggerSync` and ticker can run `doSync` concurrently **File:** `sync/sync.go:120-137, 164-167` **Category:** Race condition `TriggerSync` checks `s.syncing` under lock, releases the lock, then calls `doSync()`. Between unlock and doSync acquiring the lock to set `syncing=true`, the ticker goroutine can also enter `doSync()`. Two concurrent syncs corrupt the git cache. **Fix:** Set `s.syncing = true` inside `TriggerSync()` before releasing the lock. --- ### H10 — Cloudflare HTTP requests ignore context — cannot be cancelled **File:** `cloudflare/client.go:73` **Category:** Resource leak `do()` uses `http.NewRequest()` instead of `http.NewRequestWithContext()`. The `ctx` parameter passed to `Sync()` is never propagated to HTTP calls. During shutdown, in-flight Cloudflare API requests block until their 15s timeout. **Fix:** Pass context through to `do()` and use `http.NewRequestWithContext()`. --- ### H11 — `GetFullStatus` returns shared pointers when no cache exists **File:** `backup/backup.go:1213-1228` **Category:** Data race When `m.cachedStatus == nil`, the fallback returns a `FullBackupStatus` that directly references `m.lastDBDump` and `m.lastBackup` pointers. After `m.mu` is released, the caller reads these while backup operations may be mutating them. **Fix:** Deep-copy `m.lastDBDump` and `m.lastBackup` in the no-cache path. --- ### H12 — No HTTP request body size limits **Files:** Multiple handlers in `web/handler_debug.go`, `web/storage_handlers.go`, `api/router.go` **Category:** Security — DoS Every `json.NewDecoder(r.Body).Decode(&req)` has no body size limit. A multi-GB request consumes all memory. **Fix:** Add middleware: `r.Body = http.MaxBytesReader(w, r.Body, 1<<20)` for all POST handlers. --- ### H13 — `runDriveScan` in setup never calls `CleanupTempMounts` **File:** `setup/scanner.go:266-279` **Category:** Resource leak `ScanDrivesForInfraBackups` temporarily mounts partitions. `runDriveScan` stores results but never cleans up. Temp mounts remain forever. **Fix:** Call `CleanupTempMounts(results, s.logger)` after storing scan results. --- ### H14 — `Snapshot` retry reuses same context with diminished timeout **File:** `backup/restic.go:139-147` **Category:** Bug After lock detection + unlock, the retry reuses the same `ctx` with the original 30-minute timeout. Time spent on the first attempt is subtracted. Also, `unlockCmd.Run()` ignores its error. **Fix:** Check unlock error. Consider creating a fresh context for retry. --- ### H15 — `crossdrive` rsync leaf-name collision detection is incomplete **File:** `backup/crossdrive.go:375-389` **Category:** Logic error The collision check uses `os.Stat(dstPath)` which checks pre-existing directories from previous runs, not just current-iteration collisions. Mount index 0 skips collision check entirely. Pre-existing leftovers could cause data to be rsynced into wrong directories. **Fix:** Use a `seen` map to track leaf names within the current iteration: ```go seen := make(map[string]bool) for _, srcMount := range mounts { leaf := filepath.Base(srcMount) if seen[leaf] { leaf = fmt.Sprintf("%s_%d", leaf, i) } seen[leaf] = true // ... } ``` --- ### H16 — CSRF Bearer bypass — token not validated **File:** `web/csrf.go:36-41` **Category:** Security Any request with `Authorization: Bearer ` skips CSRF entirely. The Bearer token is not validated against the actual API key. **Fix:** Validate the Bearer token before skipping CSRF (check against `s.settings.GetAPIKey()`). --- ### H17 — Potential nil dereference on `crossDriveRunner` **File:** `web/handlers.go:919` **Category:** Potential panic `s.crossDriveRunner.ValidateDestination(...)` is called in `buildAppBackupRows()` without nil check. The field is set post-construction via `SetCrossDriveRunner`. **Fix:** Add nil guard before access. --- ### H18 — `selftest.checkDockerSocket` panics on empty output **File:** `selftest/selftest.go:99` **Category:** Potential panic `string(out[:len(out)-1])` panics with index-out-of-range if `out` is empty. **Fix:** Use `strings.TrimSpace(string(out))` instead. --- ### H19 — `stderrBuf` data race in `MigrateDrive` **File:** `storage/migrate_drive.go:314-331` **Category:** Data race A background goroutine writes to `stderrBuf`. After `cmd.Wait()` returns, `stderrBuf.String()` is read, but `Wait()` with piped stderr does NOT guarantee the reader goroutine has finished. **Fix:** Use `sync.WaitGroup` to wait for the stderr goroutine before reading. --- ### H20 — `MountRaw` UUID slice panics if UUID shorter than 8 chars **File:** `storage/attach_linux.go:53-58` **Category:** Potential panic `uuid[:8]` panics if UUID is shorter than 8 characters. **Fix:** `if len(uuid) >= 8 { dirName = uuid[:8] } else { dirName = uuid }` --- ### H21 — `removeBindFstabEntry` uses loose string matching **File:** `storage/attach_linux.go:431-456` **Category:** Logic error `strings.Contains(line, targetMountPath)` matches `/mnt/hdd_10` when target is `/mnt/hdd_1`. Can remove unrelated fstab entries. **Fix:** Parse fstab fields and compare second field exactly. --- ## P2 Medium Issues ### M01 — `http.Get` without timeout in Hub connectivity test **File:** `main.go:~691` Uses `http.DefaultClient` (no timeout) while Gitea test correctly uses 5s timeout. **Fix:** `client := &http.Client{Timeout: 5 * time.Second}` ### M02 — `alertMgr.Refresh` missing `storagePaths` arg at initial call **File:** `main.go:591` Initial refresh omits `sett.GetStoragePaths()` while all 3 other call sites pass it. **Fix:** Add `sett.GetStoragePaths()` to the variadic args. ### M03 — `DrainPendingEvents` risks duplicate delivery on save failure **File:** `settings/settings.go:801-814` Events cleared from memory before save. If save fails, events returned but still on disk — duplicated on restart. **Fix:** Restore events if save fails, or save before clearing. ### M04 — `AddStoragePath` allows duplicate paths **File:** `settings/settings.go:422-432` No duplicate check. Calling twice with same path creates duplicate entries. **Fix:** Check existing paths before append. ### M05 — `LoadFromBytes` skips env overrides **File:** `config/config.go:200-211` Unlike `Load`, `LoadFromBytes` doesn't call `applyEnvOverrides`. **Fix:** Add `applyEnvOverrides(cfg)` call. ### M06 — `ListDumpFiles` re-validates every SQL file on every cache refresh (5 min) **File:** `backup/dbdump.go:449-450` Reads and parses entire dump files every refresh. For large dumps, significant I/O. **Fix:** Use `DBValidationCache` to skip re-validation when file hasn't changed. ### M07 — `copyStackDBDumps` reads entire files into memory **File:** `backup/crossdrive.go:466-484` Uses `os.ReadFile` for DB dumps that can be hundreds of MB. **Fix:** Use `io.Copy` with `os.Open`/`os.Create` for streaming. ### M08 — `backupDrive` only records last drive's snapshot in history **File:** `backup/backup.go:369-437` Multi-drive setups only report the last drive's snapshot. **Fix:** Record all successful snapshots. ### M09 — `SubdomainInUse` does disk I/O under RLock **File:** `stacks/deploy.go:58-82` Calls `LoadAppConfig()` (disk read) for every stack while holding RLock. **Fix:** Cache subdomain in memory `Stack` struct. ### M10 — `InjectMissingFields` auto-fills subdomain without uniqueness check **File:** `stacks/deploy.go:676-690` Uses `field.Default` or `meta.Subdomain` without checking `SubdomainInUse` or reserved names. **Fix:** Add validation. ### M11 — `ParseComposeHDDMounts` uses fragile line-by-line YAML parsing **File:** `stacks/delete.go:392-454` Cannot distinguish top-level vs service-level `volumes:` key. **Fix:** Use `yaml.v3` to properly parse. ### M12 — Backup path removal in `RemoveStack` has no path validation **File:** `stacks/delete.go:304-316` Accepts arbitrary `backupPathsToRemove` from caller, calls `os.RemoveAll`. No validation paths are actually backup paths. **Fix:** Validate paths start with expected backup directory prefix. ### M13 — `getDirSizeHuman` has no timeout **File:** `stacks/delete.go:457-468` `exec.Command("du", "-sh", path)` with no timeout. Can hang on NFS. **Fix:** Add `context.WithTimeout(30s)` like `getDirSizeBytes` does. ### M14 — Scheduler doesn't guard against double-start **File:** `scheduler/scheduler.go:99-116` Calling `Start()` twice overwrites `ctx`/`cancel`, leaking the first set of goroutines. **Fix:** Guard with `if s.ctx != nil { return }`. ### M15 — Jobs registered after `Start()` never execute **File:** `scheduler/scheduler.go:60-96 vs 99-116` `Start()` only launches goroutines for currently-registered jobs. **Fix:** Document requirement, or auto-launch late-registered jobs. ### M16 — `selfupdate.performUpdate` leaves "pending" state on compose-up failure **File:** `selfupdate/updater.go:396-410` State file stays "pending" when compose-up fails, creating contradictory API state. **Fix:** Set `state.Status = "failed"` on compose-up failure. ### M17 — `selfupdate` docker commands have no timeout **File:** `selfupdate/updater.go:495-502` `docker pull` and `docker compose up -d` can hang indefinitely. **Fix:** Use `exec.CommandContext` with timeout. ### M18 — `SetBackupRunningCheck` has no mutex protection **File:** `selfupdate/updater.go:66-68` Writes `u.backupRunning` without lock. `DryRun()` also reads it without lock. **Fix:** Use `u.mu.Lock()`. ### M19 — `goroutine leak` in `ProbeStoragePath` on hung mounts **File:** `system/mounts_linux.go:258-289` `syscall.Statfs` on a dead mount blocks forever. Each probe leaks a goroutine. **Fix:** Track outstanding probes per path, prevent duplicate probes. ### M20 — `rsync --info=progress2` carriage returns not handled **File:** `storage/format.go:31-53` Progress scanner splits on `\n` only; `\r`-delimited updates get concatenated. **Fix:** Custom scanner splitting on both `\r` and `\n`. ### M21 — `mmcblk` device partition path construction is wrong **File:** `storage/format_linux.go:119-122` Only NVMe gets `p` separator. `mmcblk0` → `mmcblk01` instead of `mmcblk0p1`. **Fix:** Add `|| strings.Contains(req.DevicePath, "mmcblk")`. ### M22 — Integer underflow in memory availability message **File:** `api/router.go:381-386` If `reservedMB > totalMB`, `usableMB` goes negative, blocking all app starts. **Fix:** Clamp `usableMB` to minimum 0. ### M23 — `triggerAllCrossBackups` runs "manual" schedule **File:** `api/router.go:907-929` "Manual" schedule meant for per-app on-demand, not bulk triggers. **Fix:** Remove `RunAllScheduled(ctx, "manual")`. ### M24 — `writeFreshConfig` creates bare `&settings.Settings{}` without path **File:** `setup/handlers.go:668-693` On settings load failure, creates Settings with no file path. `SetPasswordHash` (which calls save) writes to empty path. **Fix:** Create Settings with proper path and logger. ### M25 — `SetStep` race condition between unlock and Save **File:** `setup/setup.go:99-106` State modified under lock, then saved after unlock. Another goroutine can modify between. **Fix:** Use `saveLocked()` variant called while holding lock. ### M26 — Setup CSRF relies on client-controlled cookie + form **File:** `setup/csrf.go:34-43` Both sides of the comparison are client-controlled. No server-side token storage. **Fix:** Store CSRF token server-side in SetupState. ### M27 — `PushEvent` in notifier spawns goroutine without tracking **File:** `notify/notifier.go` Goroutine leak potential — no WaitGroup or context for clean shutdown. **Fix:** Track goroutines with sync.WaitGroup. ### M28 — `SafeDisconnect` and `Check` can race on same storage path **File:** `monitor/watchdog.go:498-644` Both modify `pathProbeState` concurrently without coordination. Can double-stop stacks. **Fix:** Use broader mutex or state-transition lock. ### M29 — `stackProvider` reads in backup package lack mutex protection **File:** `backup/backup.go:171-181,184-203,478-485` `SetStackProvider` writes under mutex, but all reads are unprotected. **Fix:** Ensure single-call-at-startup invariant is documented, or add read locks. ### M30 — Multiple `GetStacks()` calls in same handler **File:** `web/handlers.go:214,233` `stacksHandler` calls `GetStacks()` 3 times — each may return different snapshots. **Fix:** Call once, reuse result. --- ## P3 Low Issues ### L01 — Dead code: `fileExists` function in main.go (line 1071) ### L02 — Dead code: `imageName` function in selfupdate/updater.go (line 231) ### L03 — Dead code: `detectHostIPViaRoute` in setup/network.go (always returns "") ### L04 — Dead code: `handleRoot` in setup/handlers.go (both if branches do same redirect) ### L05 — `TruncateStr` operates on bytes, not runes — breaks multi-byte UTF-8 - File: `util/strings.go:6-12` and `stacks/manager.go:718-724` ### L06 — `TruncateStr` no guard for negative maxLen ### L07 — Variable `copy` shadows builtin in settings.go:294 ### L08 — `save()` in settings.go does not guard nil logger (line 218) ### L09 — `GetNotificationPrefs` early return shares global `DefaultEnabledEvents` slice ### L10 — `os.ExpandEnv` in config.go may corrupt `$` in YAML values ### L11 — Key file written non-atomically (crypto.go:37) ### L12 — `ValidateDump` uses global `log.Printf` instead of injected logger ### L13 — `hasPrefix` reimplements `strings.HasPrefix` in restore_scan.go ### L14 — `AllDone` returns true for empty Apps list (restore_scan.go:134) ### L15 — `RunIntegrityCheck` only reports last error across drives ### L16 — `restic.Stats()` casts uint64 to int64 unsafely ### L17 — `logPostStartStatus` goroutine leak potential (no context/WaitGroup) ### L18 — `Pinger.send` response body not drained before close ### L19 — `sampleContainers` creates `context.Background()` instead of using parent ctx ### L20 — Docker commands without timeout in healthcheck.go ### L21 — Non-deterministic Cloudflare rule order due to map iteration ### L22 — `parseDailyTime` accepts trailing garbage ("12:30:45") ### L23 — `Syncer.Start()` can be called multiple times (goroutine leak) ### L24 — `Stop()` on scheduler reads `s.cancel` without mutex ### L25 — `stripPartition` doesn't handle `mmcblk` devices ### L26 — `sizeBytes()` silently returns 0 on ParseUint error ### L27 — Duplicate `dirSizeHuman`/`dirSizeBytesHuman`/`fmtBytes` functions ### L28 — `w.Write` errors ignored in multiple HTTP handlers ### L29 — `json.Encode` errors ignored in multiple handlers ### L30 — `r.ParseForm()` errors ignored (multiple handlers) ### L31 — No upper bound on log `lines` query parameter (DoS) ### L32 — Logout accepts GET requests (CSRF) ### L33 — Content-Type check uses exact match, fails for `application/json; charset=utf-8` ### L34 — `SyncFileBrowserMounts` writes config files non-atomically ### L35 — `DriveMigrator` adapter types have triplicated methods (main.go:840-1038) ### L36 — Goroutines in main.go use `time.Sleep` without context cancellation check ### L37 — `hubPusher.Push()` error silently ignored in multiple callbacks ### L38 — Setup shutdown errors silently discarded (main.go:1136-1137) ### L39 — `PushOnce` in report/pusher.go always returns nil (misleading return type) ### L40 — Static fallback CSRF token ("fallback-csrf-token") in setup/csrf.go ### L41 — `extractName` returns "" but most API callers don't check for empty ### L42 — Hardcoded config path `/opt/docker/felhom-controller/controller.yaml` in 3 setup places ### L43 — `GetStatus()` in backup returns raw shared pointers (not deep-copied like `GetFullStatus`) ### L44 — Fstab entries accumulate extra blank lines from double-newline appends ### L45 — `logFileHashes` called after write — shows identical src/dst hashes ### L46 — `LoadMetadata` prints to stderr instead of using logger ### L47 — `LoadMetadata` silently ignores YAML parse errors, returns defaults ### L48 — `simpleHash` in alerts.go has 32-bit collision risk ### L49 — `Close()` on auth SessionStore can panic if called twice ### L50 — CSS path in setup/handlers.go always fails (dead code) ### L51 — `isError` in setup uses string comparison instead of `errors.Is` ### L52 — Password stored in plaintext in setup-state.json during setup --- ## Standardization / Optimization Opportunities ### S01 — Unify env var construction for compose commands Three different code paths build env vars: `stackEnv`, `composeExecWithEnv`, `composeExecCustomEnv(env=nil)`. `DOMAIN` can be appended twice. Should consolidate into single method. ### S02 — Unify `humanizeBytes` / `fmtBytes` / `dirSizeBytesHuman` / `dirSizeHuman` At least 4 separate implementations of byte-to-human-readable conversion across packages. Should be one function in `util/`. ### S03 — Standardize atomic file writes `settings.go` uses write-to-tmp + rename, but `SaveAppConfig`, `SyncFileBrowserMounts`, crypto key file, and setup state do not. All persistent file writes should use the same pattern. ### S04 — Standardize HTTP client creation Many places create `&http.Client{Timeout: X}` per call. Should have shared clients with sensible defaults. ### S05 — Standardize `exec.Command` with context/timeout Most `exec.Command` calls have no timeout. Should use `exec.CommandContext` with appropriate timeouts consistently. ### S06 — Standardize error handling in HTTP handlers Some handlers return JSON errors, some use `http.Error`, some ignore write errors. Should be consistent. ### S07 — Extract common adapter pattern `stackAdapter`, `watchdogStackAdapter`, `driveMigrateStackAdapter` in main.go share identical methods. Should use embedded base adapter. ### S08 — Consolidate `getDirSizeHuman` + `getDirSizeBytes` into single call Two `du` invocations for the same directory. Call `getDirSizeBytes` once and format. ### S09 — Template rendering: always buffer before writing Multiple `render()` functions write directly to ResponseWriter. All should use bytes.Buffer. ### S10 — Add `MaxBytesReader` middleware Rather than per-handler body limiting, add a global middleware with sensible default (1MB for JSON, larger for file uploads). --- ## Fix Plan — Ordered by Implementation Phase ### Phase 1 — Critical Data Races & Security (P0) **Estimate: ~2 hours** 1. **C01** — Fix `UpdateStackConfig` to use decrypted env for compose 2. **C02** — Add error return to `DecryptMap` (or at least log) 3. **C03** — Move `appCfg.Deployed = false` inside lock block 4. **C04** — Implement `Stack.DeepCopy()`, use in `GetStack`/`GetStacks` 5. **C05** — Add per-state mutex to `pathProbeState` in watchdog 6. **C06** — Protect metrics collector `cancel` with mutex 7. **C07** — Hold `diskJobMu` across entire raw mount operation 8. **C08** — Add lock to `SetEncryptionKey` ### Phase 2 — High-Priority Bugs (P1) **Estimate: ~3 hours** 1. **H01** — Fix restic lock detection to use stderr 2. **H02** — Filter disconnected drives in `activeDrives()` 3. **H03** — Change to write lock in `MigrateEncryption` 4. **H04** — Make `SaveAppConfig` atomic (write-to-tmp + rename) 5. **H05** — Pass encKey to SaveAppConfig on deploy failure 6. **H07** — Buffer template rendering (web + setup) 7. **H08** — Use sync.Once in Syncer.Stop() 8. **H09** — Fix sync race (set syncing=true before unlock) 9. **H10** — Thread context through Cloudflare HTTP calls 10. **H11** — Deep-copy in GetFullStatus no-cache path 11. **H14** — Fix restic retry context + check unlock error 12. **H15** — Fix crossdrive leaf collision with `seen` map 13. **H17** — Add nil check for crossDriveRunner 14. **H18** — Use TrimSpace for selftest output 15. **H19** — WaitGroup for stderr goroutine in MigrateDrive 16. **H20** — Guard UUID slice length ### Phase 3 — Security Hardening (P1-P2) **Estimate: ~1.5 hours** 1. **H06** — Add login rate limiting 2. **H12** — Add MaxBytesReader middleware 3. **H16** — Validate Bearer token in CSRF middleware 4. **H21** — Fix fstab entry matching to use field parsing 5. **M12** — Validate backup paths in RemoveStack 6. **M22** — Clamp memory availability to min 0 7. **M26** — Server-side CSRF for setup wizard ### Phase 4 — Medium Priority Fixes (P2) **Estimate: ~3 hours** 1. **M01** — Add timeout to Hub connectivity test 2. **M02** — Add storagePaths to initial alert refresh 3. **M03** — Fix DrainPendingEvents to restore on save failure 4. **M04** — Add duplicate check in AddStoragePath 5. **M06** — Use validation cache to skip re-validation 6. **M07** — Stream file copies instead of reading into memory 7. **M08** — Record all drive snapshots in history 8. **M09** — Cache subdomain in Stack struct 9. **M14** — Guard scheduler double-start 10. **M16** — Set "failed" state on compose-up failure 11. **M17** — Add timeout to selfupdate docker commands 12. **M21** — Fix mmcblk partition path 13. **M23** — Remove "manual" from triggerAllCrossBackups 14. **M24** — Fix writeFreshConfig settings path 15. **M25** — Fix SetStep race (saveLocked) 16. **M30** — Single GetStacks() call per handler ### Phase 5 — Standardization (S-items) **Estimate: ~2 hours** 1. **S01** — Unify compose env construction 2. **S02** — Single `humanizeBytes` in util/ 3. **S03** — Atomic writes everywhere 4. **S04** — Shared HTTP clients 5. **S05** — exec.CommandContext everywhere 6. **S07** — Base adapter embedding 7. **S09** — Template buffer pattern ### Phase 6 — Low Priority Cleanup (P3) **Estimate: ~1.5 hours** Remove dead code (L01-L04), fix UTF-8 truncation (L05-L06), address remaining code smells and minor issues. --- **Total estimated effort: ~13 hours across 6 phases** Phases 1-3 are the most important and should be done first. Phase 4-5 can be done incrementally. Phase 6 is best done opportunistically when touching related files.