feat: Hub monitoring takeover — event system, dead man's switch, notifications (v0.3.0)

Replace external Healthchecks.io with Hub-native monitoring.

New events table + /api/v1/event endpoint for structured events from controllers.
Staleness checker (60s) detects unresponsive nodes.
Backup deadline checker (daily 05:00) catches missed backups.
Notification dispatcher sends operator (English) + customer (Hungarian) emails via Resend with per-event cooldowns.
Event timeline on customer page, dashboard badges.
Config form deprecates Monitoring UUIDs section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 18:53:24 +01:00
parent b4cb92e09f
commit 3217cb4751
16 changed files with 1319 additions and 64 deletions
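
The staleness check and per-event cooldown described in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the Hub's actual code: the names `find_stale_nodes` and `CooldownGate`, the 10-minute staleness threshold, the 1-hour cooldown, and the `node_stale` event type are all assumptions that do not appear in the commit.

```python
from datetime import datetime, timedelta, timezone

# Assumed values for illustration; the commit does not state them.
STALE_AFTER = timedelta(minutes=10)
COOLDOWN = timedelta(hours=1)


def find_stale_nodes(last_seen, now, stale_after=STALE_AFTER):
    """Return the sorted ids of nodes whose latest event is older than stale_after."""
    return sorted(node for node, ts in last_seen.items() if now - ts > stale_after)


class CooldownGate:
    """Allow at most one notification per (node, event type) per cooldown window."""

    def __init__(self, cooldown=COOLDOWN):
        self.cooldown = cooldown
        self._last_sent = {}  # (node, event_type) -> time of the last notification

    def should_send(self, node, event_type, now):
        key = (node, event_type)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown window, suppress
        self._last_sent[key] = now
        return True


if __name__ == "__main__":
    now = datetime(2026, 2, 20, 18, 0, tzinfo=timezone.utc)
    last_seen = {
        "node-a": now - timedelta(minutes=2),
        "node-b": now - timedelta(minutes=30),
    }
    gate = CooldownGate()
    for node in find_stale_nodes(last_seen, now):
        if gate.should_send(node, "node_stale", now):
            print(f"notify: {node} is unresponsive")
```

A periodic job (every 60s, per the commit message) would call `find_stale_nodes` and pass each hit through the gate, so a node that stays down produces one email per cooldown window rather than one per check.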
@@ -80,12 +80,12 @@ backup:
 monitoring:
  enabled: true
  healthchecks_base: "https://status.felhom.eu"
-  ping_uuids:
-    heartbeat: ""                                  # Every 5 min — controller process alive
-    system_health: ""                              # Every 5 min — comprehensive system check
-    db_dump: ""                                    # Daily — after database dumps
-    backup: ""                                     # Daily — after restic snapshot
-    backup_integrity: ""                           # Weekly (Sunday) — restic check
+  # ping_uuids: (deprecated — monitoring is now handled by the Hub event system)
+  #   heartbeat: ""
+  #   system_health: ""
+  #   db_dump: ""
+  #   backup: ""
+  #   backup_integrity: ""
  system_health_interval: "5m"
  health_check_schedule: "06:00"
  thresholds: