fix: deep bug hunt II — concurrency, security & optimization (25 files)

Critical: watchdog mutex panic safety, SetGeoAppOverride nil guard,
SSD-only app DB restore fallback.

High: double deploy race (atomic Deploying flag), delete/remove during
deploy guard, ScanStacks overwrite protection, FileBrowser mount mutex,
PushEvent history, PushOnce error handling, DB dump sync+close before
rename, restic retry fresh context, encrypt failure logging, cross-backup
path traversal validation, deepCopyStack completeness.

Security: constant-time API key comparison, login rate limiting (5/min),
git credential masking in logs, storage path prefix traversal fix.

Concurrency: MigrateEncryption lock ordering, SubdomainInUse I/O outside
lock, scheduler late-registered jobs, SQLite WAL verification, metrics
shutdown context, telemetry scan error logging, asset sync lock scope.

Optimization: streaming file copy for DB dumps, restic stats dedup,
atomic infra config copy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 14:21:09 +01:00
parent 72ab145b41
commit db83db383c
25 changed files with 930 additions and 626 deletions
## Changelog
### v0.30.4 — Deep Bug Hunt II: Concurrency, Security & Optimization (2026-02-25)
#### Fixed (Critical)
- **Watchdog mutex panic** — Wrapped `handleDisconnect` call in anonymous func with deferred re-lock to guarantee mutex re-acquisition even on panic (C1)
- **SetGeoAppOverride nil crash** — Added nil guard; passing nil override now correctly deletes the entry instead of panicking (C2)
- **SSD-only app DB restore** — `restoreDBDumps` now falls back to `app.DrivePath` when `HDDPath` is empty (C3)
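The C1 fix uses a common Go idiom. When code that holds a mutex has to release it around a callback, it re-acquires the mutex in a `defer`, so the lock state is restored even if the callback panics. The sketch below shows that idiom under stated assumptions. The `watchdog` type, its `handleDisconnect` hook, and the `onDisconnectLocked` and `tick` methods are hypothetical stand-ins, not the project's actual code.

```go
package main

import "sync"

// watchdog is a hypothetical stand-in; the real type and the handleDisconnect
// signature are not shown in the changelog.
type watchdog struct {
	mu               sync.Mutex
	handleDisconnect func() // hypothetical hook; may panic
}

// onDisconnectLocked must be called with w.mu held. It drops the lock while the
// handler runs and re-acquires it in a defer, so the lock is held again when
// this method returns or unwinds, even if the handler panics.
func (w *watchdog) onDisconnectLocked() {
	func() {
		w.mu.Unlock()
		defer w.mu.Lock() // runs on normal return and during a panic
		w.handleDisconnect()
	}()
}

// tick mimics a caller that holds the lock for its whole body. Its deferred
// Unlock is only safe because onDisconnectLocked always re-locks first.
func (w *watchdog) tick() (recovered any) {
	w.mu.Lock()
	defer w.mu.Unlock()
	defer func() { recovered = recover() }() // runs before the Unlock above
	w.onDisconnectLocked()
	return nil
}
```

Without the deferred `Lock`, a panic in the handler would unwind with the mutex already unlocked. The caller's `defer w.mu.Unlock()` would then stop the process with a fatal "sync: unlock of unlocked mutex" error, which, unlike an ordinary panic, cannot be recovered.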
#### Fixed (High)
- **Double deploy race** — Added an atomic check-and-set of the `Deploying` flag, plus a `clearDeploying()` helper that resets it on all error paths (H1)
- **Delete/Remove during deploy** — Both `DeleteStack` and `RemoveStack` now reject operations while stack is deploying (H2)
- **ScanStacks overwrite** — Skips updating `Deployed`/`AppConfig` for stacks with active deploy in progress (H3)
- **FileBrowser mount race** — Added `fileBrowserMu` mutex to prevent concurrent `SyncFileBrowserMounts` calls (H5)
- **PushEvent history gap** — Added `recordHistory` calls on both success and failure paths in PushEvent goroutine (H6)
- **PushOnce silent failure** — Now returns an error for non-2xx HTTP responses instead of nil (H7)
- **DB dump file corruption** — Added `tmpFile.Sync()` and `tmpFile.Close()` before rename in `DumpOne` (H8)
- **Restic retry timeout** — Creates a fresh 30-minute context for the retry after unlock instead of reusing the nearly expired original (H9)
- **Encrypt failure silent** — Added warning log when encryption fails in `SaveAppConfig` (H10)
- **Cross-backup path traversal** — Validates destination path against registered storage paths in both web and API handlers (H11)
- **deepCopyStack incomplete** — Now deep-copies `Meta.OptionalConfig`, `Meta.HealthCheck`, and `DeployField.Options` (H12)
#### Security
- **Constant-time API key** — Replaced `==` with `subtle.ConstantTimeCompare` for API key comparison, preventing timing attacks (M1)
- **Login rate limiting** — Added per-IP rate limiter (5 attempts/minute) to login handler (M8)
- **Git credential masking** — Applied `maskRepoURL()` in `runGitInDir` log output to prevent credential leakage (M23)
- **Path prefix traversal** — Fixed `storageAttachBrowseHandler` prefix check to require trailing `/`, preventing sibling directory matches (M24)
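Two of the security fixes above are small enough to sketch directly: the constant-time key comparison (M1) and the separator-aware prefix check (M24). The helper names `apiKeyMatches` and `withinRoot` are hypothetical. `subtle.ConstantTimeCompare` is the standard-library function the changelog names. Rejecting an empty configured key is an extra assumption that the changelog does not state.

```go
package main

import (
	"crypto/subtle"
	"path/filepath"
	"strings"
)

// apiKeyMatches compares keys in time that does not depend on where the first
// differing byte is. ConstantTimeCompare still returns early when lengths
// differ, so only the key length can leak. An empty configured key never
// matches (an assumed policy, not from the changelog).
func apiKeyMatches(provided, expected string) bool {
	if expected == "" {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(provided), []byte(expected)) == 1
}

// withinRoot reports whether p is root itself or lies underneath it. Requiring
// a trailing separator on the prefix stops "/mnt/data2" from matching the root
// "/mnt/data".
func withinRoot(root, p string) bool {
	root = filepath.Clean(root)
	p = filepath.Clean(p)
	if p == root {
		return true
	}
	prefix := root
	if !strings.HasSuffix(prefix, string(filepath.Separator)) { // "/" already ends in one
		prefix += string(filepath.Separator)
	}
	return strings.HasPrefix(p, prefix)
}
```

`filepath.Clean` collapses `..` segments lexically, so `/mnt/data/../etc` is rejected. It does not resolve symlinks inside the root, though. A handler that must defend against those would also need `filepath.EvalSymlinks`.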
#### Concurrency & Logic
- **MigrateEncryption race** — Moved `encKey == nil` check inside the mutex lock (M5)
- **SubdomainInUse I/O under lock** — Collects stack dirs under RLock, releases the lock, then performs disk I/O outside it (M4)
- **Scheduler late jobs** — Jobs registered after `Start()` now immediately get their goroutine launched (M10)
- **SQLite WAL verification** — WAL pragma now verified via `QueryRow` + `Scan` instead of silent `Exec` (M13)
- **Metrics shutdown** — `sampleContainers` now uses parent context instead of `context.Background()` for clean shutdown (M14)
- **Telemetry scan logging** — Row scan errors now logged instead of silently swallowed (M15)
- **Asset sync lock** — Refactored to hold mutex only for status updates, not during entire HTTP download (M22)
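The M4 change follows a snapshot-then-release pattern: copy what you need under the read lock, drop the lock, then do the slow work. Here is a sketch under assumptions. `stackRegistry` and the injected `readSubdomain` function are hypothetical stand-ins for the real registry and its per-directory disk read.

```go
package main

import "sync"

// stackRegistry is hypothetical; dirs maps a stack name to its directory.
type stackRegistry struct {
	mu   sync.RWMutex
	dirs map[string]string
}

// subdomainInUse copies the directory list while holding the read lock, then
// releases it before any disk access, so slow I/O never blocks writers.
// readSubdomain is a hypothetical hook standing in for the real config read.
func (r *stackRegistry) subdomainInUse(sub string, readSubdomain func(dir string) (string, error)) bool {
	r.mu.RLock()
	dirs := make([]string, 0, len(r.dirs))
	for _, d := range r.dirs {
		dirs = append(dirs, d)
	}
	r.mu.RUnlock()

	for _, d := range dirs {
		got, err := readSubdomain(d)
		if err != nil {
			continue // unreadable config: treated as no match (an assumption)
		}
		if got == sub {
			return true
		}
	}
	return false
}
```

The trade-off is that the snapshot can go stale: a stack added after `RUnlock` is not checked during that call. For a conflict check on user input this is often acceptable. Anything that must be strictly consistent would need to re-validate under the write lock.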
#### Optimization
- **DB dump copy** — Replaced `os.ReadFile`/`os.WriteFile` with streaming `io.Copy` via `copyFile` helper for large dumps (M16)
- **Restic stats dedup** — Per-drive stats now computed once and aggregated, eliminating duplicate restic subprocess calls (M17)
- **Infra config atomic** — `syncInfraConfig` controller.yaml copy now uses atomic write via `copyFile` (M20)
### v0.30.3 — Comprehensive Bug Hunt Fixes (2026-02-25)
#### Fixed (Critical — P0)