- Clear HealthProbe on StartStack/RestartStack so stale unhealthy state
isn't re-applied by RefreshStatus
- Use 10s probe interval for unhealthy/new stacks (nil HealthProbe probes
immediately on next tick), switch to normal 5m interval once healthy
- Scheduler frequency 1m → 10s to support fast probing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Concurrency fixes:
- Deep-copy stacks in GetStack/GetStacks to prevent shared state mutation (C04)
- Add per-state mutex to watchdog pathProbeState (C05)
- Guard MetricsCollector.Start() with sync.Once against double-start (C06)
- Hold diskJobMu across entire raw mount operation (C07)
- Add mutex to SetEncryptionKey (C08), MigrateEncryption write lock (H03)
- Use sync.Once for sync.Stop() channel close (H08)
- Set syncing=true before releasing lock in TriggerSync (H09)
- Deep-copy lastDBDump/lastBackup in GetFullStatus (H11)
- Add WaitGroup for stderr goroutine in MigrateDrive (H19)
- Add mutex to SetBackupRunningCheck (M18)
Security fixes:
- Validate Bearer token against Hub API key in CSRF middleware (H16)
- Validate backup paths start with expected prefix in RemoveStack (M12)
- Guard uuid[:8] slice with length check (H20)
- Parse fstab fields exactly for mount target matching (H21)
Bug fixes:
- Use decrypted env vars for compose deploy (C01)
- Log decrypt failures in DecryptMap instead of swallowing (C02)
- Move Deployed=false inside lock in runComposeDeploy (C03)
- Fix activeDrives() to skip disconnected drives (H02)
- Fix Snapshot() stderr extraction from exec.ExitError (H01)
- Check unlockCmd.Run() error in restic (H01)
- Buffer template rendering via bytes.Buffer (H07)
- Thread context.Context through cloudflare client (H10)
- Fix leaf-name collision detection in cross-drive backup (H15)
- Add nil check for crossDriveRunner (H17)
- Use strings.TrimSpace instead of slice on command output (H18)
- Make SaveAppConfig atomic with write-to-tmp+rename (H04)
- Pass encKey on deploy failure SaveAppConfig (H05)
- Fix IPv6 address format in TCP health probe
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add network-level health probing from the controller to deployed apps.
The controller probes containers over the shared Docker network and
overrides stack state to "unhealthy" if the service isn't responding.
Three probe types: http (any response = alive), api (validates status
code and body content), tcp (port reachability). Configured per-app
via healthcheck: section in .felhom.yml. Runs every minute, per-app
interval defaults to 5 minutes.
This replaces Docker-level healthchecks for distroless images (e.g.
Vikunja) that lack shell utilities, and complements existing Docker
healthchecks for other apps.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>