feat: Hub monitoring takeover — event system, dead man's switch, notifications (v0.3.0)

Replace external Healthchecks.io with Hub-native monitoring. New events
table + /api/v1/event endpoint for structured events from controllers.
Staleness checker (60s) detects unresponsive nodes. Backup deadline
checker (daily 05:00) catches missed backups. Notification dispatcher
sends operator (English) + customer (Hungarian) emails via Resend with
per-event cooldowns. Event timeline on customer page, dashboard badges.
Config form deprecates Monitoring UUIDs section.
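
The staleness check and per-event cooldowns described above could be sketched roughly as follows. This is a minimal illustration, not the Hub's actual code: the function names, the event-type keys, the 10-minute staleness threshold, and the cooldown durations are all assumptions. The commit message only states that the checker runs on a 60s cadence and that cooldowns are per event.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; the commit message only gives the 60s checker cadence.
STALE_AFTER = timedelta(minutes=10)
# Hypothetical per-event cooldowns; the real values are not stated in the commit.
COOLDOWNS = {
    "node_stale": timedelta(hours=1),
    "backup_missed": timedelta(hours=24),
}


def find_stale_nodes(last_seen, now, stale_after=STALE_AFTER):
    """Return sorted IDs of nodes whose latest event is older than stale_after."""
    return sorted(
        node for node, seen in last_seen.items() if now - seen > stale_after
    )


def should_notify(last_sent, node, event_type, now, cooldowns=COOLDOWNS):
    """Allow one notification per (node, event type) within its cooldown window.

    Records the send time in last_sent when a notification is allowed.
    """
    key = (node, event_type)
    previous = last_sent.get(key)
    cooldown = cooldowns.get(event_type, timedelta(0))
    if previous is not None and now - previous < cooldown:
        return False
    last_sent[key] = now
    return True
```

A periodic job would call `find_stale_nodes` on each tick and pass every hit through `should_notify` before sending mail, so a node that stays down produces one alert per cooldown window rather than one per tick.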

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 18:53:24 +01:00
parent b4cb92e09f
commit 3217cb4751
16 changed files with 1319 additions and 64 deletions
+6 -6
@@ -80,12 +80,12 @@ backup:
 monitoring:
   enabled: true
   healthchecks_base: "https://status.felhom.eu"
-  ping_uuids:
-    heartbeat: "" # Every 5 min — controller process alive
-    system_health: "" # Every 5 min — comprehensive system check
-    db_dump: "" # Daily — after database dumps
-    backup: "" # Daily — after restic snapshot
-    backup_integrity: "" # Weekly (Sunday) — restic check
+  # ping_uuids: (deprecated — monitoring is now handled by the Hub event system)
+  #   heartbeat: ""
+  #   system_health: ""
+  #   db_dump: ""
+  #   backup: ""
+  #   backup_integrity: ""
   system_health_interval: "5m"
   health_check_schedule: "06:00"
   thresholds: