v0.4.0: monitoring & backup — scheduler, CPU/temp metrics, healthchecks, restic backups

Phase 2 (Monitoring & Health):
- Central job scheduler replacing ad-hoc goroutines (internal/scheduler)
- CPU usage collector via /proc/stat background sampling (internal/system/cpu_linux.go)
- Temperature reading from /sys/class/thermal + /host/sys (Docker mount)
- Load average from /proc/loadavg
- Healthchecks.io-compatible HTTP pinger (internal/monitor/pinger.go)
- System health checks: disk, memory, CPU, temp, Docker, protected containers (internal/monitor/healthcheck.go)

Phase 3 (Backups):
- Database auto-discovery via docker ps + docker inspect (internal/backup/dbdump.go)
- Database dumping via docker exec (pg_dump / mariadb-dump) with atomic writes
- Restic backup integration with auto-password generation (internal/backup/restic.go)
- Backup orchestrator: DB dumps + restic snapshots + weekly prune (internal/backup/backup.go)
- Manual backup trigger via dashboard button and POST /api/backup/run

Dashboard UI:
- CPU usage bar with load average display
- Temperature with colored indicator dot
- Backup status card with last run time, DB count, repo stats
- "Mentés most" ("Back up now") button for manual backup trigger

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 11:17:10 +01:00
parent 8a988c5998
commit d32d9fb44b
21 changed files with 2060 additions and 82 deletions
+7 -1
@@ -31,6 +31,11 @@ paths:
data_dir: "/opt/docker/felhom-controller/data"
backup_dir: "/srv/backups"
db_dump_dir: "/srv/backups/db-dumps"
hdd_path: "" # Optional: HDD mount path (e.g., /mnt/hdd)
# --- System ---
system:
reserved_memory_mb: 384 # Memory reserved for OS (excluded from app budget)
# --- Web UI ---
web:
@@ -61,7 +66,7 @@ stacks:
backup:
enabled: true
restic_repo: "/srv/backups/restic-repo"
-restic_password_file: "/opt/docker/felhom-controller/restic-password"
+restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
db_dump_schedule: "02:30"
restic_schedule: "03:00"
retention:
@@ -78,6 +83,7 @@ monitoring:
db_dump: "CHANGEME-uuid-for-db-dump"
backup: "CHANGEME-uuid-for-backup"
system_health: "CHANGEME-uuid-for-system-health"
system_health_interval: "5m"
health_check_schedule: "06:00"
thresholds:
disk_warn_percent: 80