feat: storage watchdog — USB disconnect detection, auto-stop, safe eject, auto-reconnect (v0.17.0)
New storage watchdog monitors registered storage paths every 5s. On disconnect (3 consecutive probe failures), auto-stops affected apps, lazy-unmounts stale VFS entries, fires alerts/notifications/hub report. On reconnect (UUID detected), auto-remounts via fstab, cleans stale restic locks, offers app restart. Safe disconnect UI for USB drives: confirmation dialog, stop apps, sync, unmount. Disconnected state visible across all pages (dashboard, settings, backups, monitoring) with hatched red bars and badges. Backup guards skip disconnected drives. 22 files changed (1 new: monitor/watchdog.go), ~1500 lines added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
+48
-8
@@ -85,7 +85,7 @@ A single, lightweight Go container that replaces Portainer + scattered systemd s
|
||||
| **Backup** | `internal/backup/` | Per-drive 3-layer backup: DB dumps → restic snapshots → cross-drive copies, restore |
|
||||
| **Storage** | `internal/storage/` | Disk scanning (`lsblk`), partitioning (`sfdisk`), formatting (`mkfs.ext4`), mounting, data migration (`rsync`) |
|
||||
| **System** | `internal/system/` | System info (`/proc`), CPU collector, mount points, disk usage, FS info |
|
||||
| **Monitor** | `internal/monitor/` | Healthchecks.io pinger, system health checks |
|
||||
| **Monitor** | `internal/monitor/` | Healthchecks.io pinger, system health checks, storage watchdog |
|
||||
| **Metrics** | `internal/metrics/` | SQLite time-series store, system + container metric collection |
|
||||
| **Scheduler** | `internal/scheduler/` | Central job scheduler (periodic + daily, skip-if-running, panic recovery) |
|
||||
| **SelfUpdate** | `internal/selfupdate/` | Version checking (registry), update trigger, state persistence, startup verification |
|
||||
@@ -403,8 +403,9 @@ Multiple external storage paths supported with:
|
||||
- **Label**: Human-readable name (editable inline)
|
||||
- **Default flag**: New deploys use this path by default
|
||||
- **Schedulable flag**: Path appears in deploy dropdown
|
||||
- **Disconnected state**: `Disconnected`, `DisconnectedAt`, `StoppedStacks` — set by watchdog or safe-disconnect API, cleared on reconnect
|
||||
- **Auto-discovery**: On startup, scans deployed apps' `HDD_PATH` values and registers unknown paths
|
||||
- Thread-safe CRUD: Add, Remove, SetDefault, SetSchedulable, SetLabel
|
||||
- Thread-safe CRUD: Add, Remove, SetDefault, SetSchedulable, SetLabel, SetDisconnected, ClearDisconnected
|
||||
|
||||
#### Data Migration (`internal/storage/migrate.go`)
|
||||
|
||||
@@ -431,6 +432,39 @@ After migration, the deploy page detects leftover data on previous storage paths
|
||||
|
||||
When storage paths are added or removed, `syncFileBrowserMounts()` auto-regenerates FileBrowser's `docker-compose.yml` with volume mounts for all registered paths, then recreates the container.
|
||||
|
||||
#### Storage Watchdog (`internal/monitor/watchdog.go`)
|
||||
|
||||
Continuously monitors registered storage paths for disconnection/reconnection (primarily USB drives):
|
||||
|
||||
- **Probe loop**: `ProbeStoragePath()` calls `syscall.Statfs()` with 3-second timeout in a goroutine. Runs every 5s per connected path, 30s per disconnected path.
|
||||
- **Debouncing**: 3 consecutive probe failures required before declaring a drive disconnected (prevents false positives from transient I/O).
|
||||
- **Disconnect reaction** (automatic, ~15s detection):
|
||||
1. Stops all deployed stacks whose `HDD_PATH` is under the disconnected drive (skips protected stacks)
|
||||
2. Persists `Disconnected`, `DisconnectedAt`, `StoppedStacks` to `settings.json`
|
||||
3. Lazy-unmounts stale VFS entries (`umount -l`) — for attach-wizard drives, unmounts bind first, then raw
|
||||
4. Fires alert refresh (red banner on all pages), notification (`storage_disconnected`), and immediate hub report push
|
||||
- **Auto-reconnect** (for UUID-based fstab entries):
|
||||
1. Checks `/host-dev/disk/by-uuid/<uuid>` for device reappearance
|
||||
2. Cleans stale mounts, then `mount -T /host-fstab <path>` (raw + bind for attach-wizard drives)
|
||||
3. Verifies with a post-mount probe
|
||||
4. Runs `restic unlock` if stale lock files exist
|
||||
5. Validates `StoppedStacks` (filters to actually-stopped stacks), clears `Disconnected` flag
|
||||
6. Fires alert refresh, notification (`storage_reconnected`), hub report push
|
||||
|
||||
**Safe disconnect UI** (manual, Settings page):
|
||||
|
||||
- "Leválasztás" button shown for USB drives (detected via sysfs symlink path containing `/usb`)
|
||||
- Confirmation dialog lists affected apps
|
||||
- Flow: stop apps → `sync` → `umount` (fallback `umount -l`) → mark disconnected → notification
|
||||
- Disconnected card: dashed border, red badge, timestamp, stopped apps list, "Csatlakoztatás" (reconnect) button
|
||||
- After reconnect: "Alkalmazások indítása" button to restart auto-stopped stacks
|
||||
|
||||
**USB detection** (`system.IsUSBDevice`): Reads `/host/sys/block/<disk>` symlink — if target path contains `/usb`, it's a USB device. The `removable` sysfs flag is unreliable for USB HDDs (returns 0).
|
||||
|
||||
**Backup guards**: Nightly DB dumps, restic snapshots, and cross-drive backups all skip disconnected drives with WARN log (not treated as failures).
|
||||
|
||||
**UI integration**: Disconnected drives show with hatched red bars on dashboard, monitoring, and backup pages. Per-app backup rows show "Meghajtó leválasztva" badge. Health check emits warnings for disconnected paths.
|
||||
|
||||
---
|
||||
|
||||
### 4. Monitoring & Health
|
||||
@@ -446,7 +480,7 @@ When storage paths are added or removed, `syncFileBrowserMounts()` auto-regenera
|
||||
| CPU temperature | >= 75C | >= 85C |
|
||||
| Docker daemon | — | unreachable |
|
||||
| Protected containers | — | not running |
|
||||
| Storage paths | not a mount point (data on SSD) | path inaccessible, disk >= 95% |
|
||||
| Storage paths | not a mount point (data on SSD), drive disconnected | path inaccessible, disk >= 95% |
|
||||
|
||||
Backup destination validation (`CheckBackupDestination`) has tiered checks:
|
||||
- Path doesn't exist → critical/blocked
|
||||
@@ -694,7 +728,7 @@ Runtime-mutable settings in `settings.json` (separate from infrastructure config
|
||||
| `notifications` | email, enabled events, cooldown hours |
|
||||
| `db_validations` | per-DB dump validation results (survives restarts) |
|
||||
| `app_backup` | per-app map: enabled flag, cross-drive config (method, dest, schedule, runtime status) |
|
||||
| `storage_paths` | registered paths with label, default flag, schedulable flag |
|
||||
| `storage_paths` | registered paths with label, default flag, schedulable flag, disconnected state |
|
||||
| `cross_drive_restic_password` | auto-generated restic password for cross-drive repos |
|
||||
|
||||
All public methods use `sync.RWMutex`. File writes are atomic (`.tmp` + rename).
|
||||
@@ -704,7 +738,7 @@ All public methods use `sync.RWMutex`. File writes are atomic (`.tmp` + rename).
|
||||
Five sections:
|
||||
1. **System config** — read-only display of `controller.yaml` values
|
||||
2. **Version & update** — current/latest version, check/update buttons, auto-update status, last update result
|
||||
3. **Storage paths** — add/remove, edit labels, set default, toggle schedulable, per-path app list with sizes
|
||||
3. **Storage paths** — add/remove, edit labels, set default, toggle schedulable, per-path app list with sizes, safe disconnect/reconnect for USB drives
|
||||
4. **Password change** — current + new + confirm, min 8 chars
|
||||
5. **Notifications** — email, event checkboxes, cooldown hours, test email button
|
||||
|
||||
@@ -805,10 +839,11 @@ controller/
|
||||
│ ├── system/
|
||||
│ │ ├── info.go, info_linux.go # RAM, disk, CPU, temperature, load average
|
||||
│ │ ├── cpu_linux.go # Background /proc/stat sampling
|
||||
│ │ └── mounts_linux.go # Mount points, disk usage, FS info, backup dest checks
|
||||
│ │ └── mounts_linux.go # Mount points, disk usage, FS info, backup dest checks, storage probing, USB detection
|
||||
│ ├── monitor/
|
||||
│ │ ├── pinger.go # Healthchecks.io HTTP ping client
|
||||
│ │ └── healthcheck.go # System health checks (disk, mem, CPU, temp, Docker)
|
||||
│ │ ├── healthcheck.go # System health checks (disk, mem, CPU, temp, Docker)
|
||||
│ │ └── watchdog.go # Storage watchdog (probe, disconnect/reconnect, safe eject)
|
||||
│ ├── metrics/
|
||||
│ │ ├── store.go # SQLite time-series (WAL mode, downsampled queries)
|
||||
│ │ ├── collector.go # Background collector (60s, system + docker stats)
|
||||
@@ -827,7 +862,7 @@ controller/
|
||||
│ ├── auth.go # Session auth, login/logout, session cleanup
|
||||
│ ├── handlers.go # Page handlers (dashboard, stacks, deploy, backups, etc.)
|
||||
│ ├── handler_restore.go # DR: restore page handler + APIs (scan, restore all, skip)
|
||||
│ ├── storage_handlers.go # Storage API handlers (scan, format, attach, migrate, cleanup)
|
||||
│ ├── storage_handlers.go # Storage API handlers (scan, format, attach, migrate, cleanup, disconnect/reconnect)
|
||||
│ ├── alerts.go # State-based alert generation
|
||||
│ ├── funcmap.go # Template functions (state colors, Hungarian formatting)
|
||||
│ ├── embed.go # go:embed for templates + Chart.js
|
||||
@@ -973,6 +1008,10 @@ All daily jobs use Europe/Budapest timezone. Skip-if-running prevents concurrent
|
||||
| POST | `/api/storage/attach/cancel` | Cleanup temp raw mount |
|
||||
| POST | `/api/storage/migrate` | Start app data migration |
|
||||
| GET | `/api/storage/migrate/status` | Migration progress |
|
||||
| POST | `/api/storage/disconnect` | Safe disconnect (stop apps, unmount) |
|
||||
| POST | `/api/storage/reconnect` | Reconnect disconnected drive |
|
||||
| POST | `/api/storage/restart-apps` | Restart auto-stopped apps |
|
||||
| GET | `/api/storage/status` | All storage paths with connection state |
|
||||
|
||||
### Self-Update
|
||||
|
||||
@@ -1060,6 +1099,7 @@ See `docker-compose.yml` for the full volume configuration.
|
||||
- [x] Storage management (scan, format, mount, registry)
|
||||
- [x] Attach existing drive wizard (v0.15.0) — bind-mount subfolder from pre-formatted drive, directory browser
|
||||
- [x] App data migration between storage paths
|
||||
- [x] Storage watchdog (v0.17.0) — USB disconnect detection (~15s), auto-stop apps, auto-remount on reconnect, safe eject UI
|
||||
- [x] Central hub reporting
|
||||
- [x] Email notifications via hub relay
|
||||
- [x] Settings persistence and password management
|
||||
|
||||
Reference in New Issue
Block a user