diff --git a/CHANGELOG.md b/CHANGELOG.md index 6978624..81b74fb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,52 @@ ## Changelog +### What was just completed (2026-02-20 session 64) +- **v0.21.0 — Hub Monitoring Takeover (Controller-side, Phases 5+6):** + + Replaces external Healthchecks.io dependency with Hub-native event system. The controller now pushes structured events directly to the Hub's `/api/v1/event` endpoint, and the Hub handles dead man's switch detection, notification dispatch, and cooldown management. + + **Phase 5 — Event Push System (`internal/notify/notifier.go`):** + - New core method `PushEvent(eventType, severity, message, details)` — non-blocking goroutine, 2 retries with 3s backoff, POSTs to Hub `/api/v1/event` + - 8 typed detail structs: `BackupDetails`, `DBDumpDetails`, `DiskDetails`, `HealthDetails`, `StorageDetails`, `UpdateDetails`, `AppDetails`, `CrossDriveDetails` + - Replaced all old `Notify*` methods with event-based equivalents: + - `NotifyBackupCompleted/Failed` → `backup_completed`/`backup_failed` events + - `NotifyDBDumpCompleted/Failed` → `db_dump_completed`/`db_dump_failed` events + - `NotifyIntegrityOK/Failed` → `backup_integrity_ok`/`backup_integrity_failed` events + - `NotifyHealthChange` → detects transitions, pushes `health_degraded`/`health_critical`/`health_recovered` + - `NotifyStorageDisconnected/Reconnected` → `storage_disconnected`/`storage_reconnected` events + - `NotifyControllerStarted` → `controller_started` event on startup + - `NotifyControllerUpdated` → `controller_updated` event (replaces `NotifyUpdateSuccess/Failed`) + - `NotifyAppDeployed/Removed` → `app_deployed`/`app_removed` events + - `NotifyCrossDriveCompleted/Failed` → `crossdrive_completed`/`crossdrive_failed` events + - `NotifyDRStarted/Completed` → `disaster_recovery_started`/`disaster_recovery_completed` events + - Removed old `/api/v1/notify` relay, `classifyWarning()`, and client-side cooldown logic (Hub handles cooldowns now) + - 
`SendTest()` now pushes `test` event type via `PushEvent` + - `SyncPreferences` updated to include `cooldownHours` parameter + + **Phase 5 — Event Wiring:** + - `main.go`: Wired success events for backup, db-dump, integrity check; startup event with 5s delay; update event after `VerifyStartup()` + - `router.go`: Added `NotifyAppDeployed`/`NotifyAppRemoved` after successful deploy/remove via API + - `handler_restore.go`: Added `NotifyDRStarted`/`NotifyDRCompleted` in DR restore flow + - `server.go`: New `HubPushStatusData` struct and `SetHubPushStatus` callback for monitoring page + + **Phase 5 — Hub Connection Monitoring:** + - `pusher.go`: Added `PushStatus` tracking (LastAttempt, LastSuccess, LastError, Consecutive failures) to report Pusher + - `handlers.go`: Monitoring page now shows Hub connection status (connected/unreachable, URL, customer ID, last success, last error) instead of Healthchecks ping UUIDs + - `monitoring.html`: Replaced "Távoli monitoring" section with "Hub kapcsolat" section + - `alerts.go`: Replaced "Missing ping UUIDs" alert with Hub connection alerts (`hub-disabled` warning, `hub-unreachable` error) + + **Phase 5 — Expanded Notification Settings:** + - `settings.html`: Expanded from 4 checkboxes to 11 grouped toggles in two categories: + - "Hibák és figyelmeztetések": backup_failed, db_dump_failed, backup_integrity_failed, crossdrive_failed, disk alerts, storage_disconnected, node_down, health_critical, expected missed + - "Tájékoztató": storage_reconnected, health_recovered + - Compound toggles: "Lemez figyelmeztetés" maps to `disk_warning` + `disk_critical`; "Elvárt mentés elmaradt" maps to `expected_backup_missed` + `expected_dbdump_missed` + - `settings.go`: Updated `DefaultEnabledEvents` to new Hub event types + - `handlers.go`: Updated settings POST handler for expanded event names and compound toggles + + **Phase 6 — Config Cleanup:** + - `main.go`: Deprecation log on startup when ping UUIDs are configured: `[INFO] Healthchecks ping 
UUIDs configured but no longer used — monitoring is now handled by the Hub` + - Pinger still runs for transitional backward compatibility + ### What was just completed (2026-02-20 session 63) - **v0.20.0 — Hub Config Management (Phase B):** diff --git a/controller/README.md b/controller/README.md index ead5001..52b42fc 100644 --- a/controller/README.md +++ b/controller/README.md @@ -4,7 +4,7 @@ A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware. -**Current version: v0.20.0** +**Current version: v0.21.0** --- @@ -509,16 +509,9 @@ Backup destination validation (`CheckBackupDestination`) has tiered checks: - Disk >95% full → critical/blocked - Disk >90% full → warning -#### Healthchecks.io Integration (`internal/monitor/pinger.go`) +#### Healthchecks.io Integration (deprecated) -Five ping UUIDs for external monitoring: -- **Heartbeat**: every 5 min (simple "I'm alive") -- **System Health**: periodic health check results -- **DB Dump**: after nightly database dumps -- **Backup**: after nightly restic backup -- **Backup Integrity**: weekly `restic check` result - -3-attempt retry with 2-second backoff. Pinger never fails the caller. +Legacy pinger (`internal/monitor/pinger.go`) still runs for backward compatibility but is no longer the primary monitoring mechanism. Monitoring is now handled by the Hub event system (see [Notifications](#5-notifications)). A deprecation log is emitted on startup if ping UUIDs are configured. 
#### Metrics Store (`internal/metrics/`) @@ -535,48 +528,66 @@ Full-page system monitor at `/monitoring`: - **System Metrics Charts**: 4 line charts (CPU, Memory, Temperature, Load) in 2x2 grid - **Container Resources**: horizontal bar charts (CPU% and Memory per container) - **Per-container Detail**: click-to-expand historical charts -- **Remote Monitoring Status**: shows Healthchecks ping UUID configuration +- **Hub Connection Status**: shows Hub URL, customer ID, connection state (connected/unreachable), last successful push, last error Chart.js 4.4.7 embedded locally (works in offline environments), dark theme matching site design. #### Alert System (`internal/web/alerts.go`) State-based alerts displayed on all pages: -- Sources: health issues, missing ping UUIDs, backup disabled +- Sources: health issues, Hub connection status, backup disabled, storage disconnected, update available +- Hub alerts: `hub-disabled` (warning) when Hub not enabled, `hub-unreachable` (error) when last push failed and no success in 30 min - Sorted by severity (error > warning > info), capped at 5 visible -- Refreshed every 5 min + on startup -- Monitoring page suppresses ping-related alerts (shown in dedicated table instead) +- Refreshed every 5 min + on startup + on storage state changes --- ### 5. Notifications -#### Email Delivery +#### Hub Event System (`internal/notify/notifier.go`) -The controller relays notifications through the central hub, which sends emails via the Resend API: -1. Controller detects event (health degradation, backup failure, etc.) -2. Non-blocking POST to hub's `/api/v1/notify` with event details -3. Hub checks customer notification preferences -4. Hub sends Hungarian-language email via Resend +The controller pushes structured events to the Hub's `/api/v1/event` endpoint. The Hub handles notification dispatch, cooldown management, and dead man's switch detection. 
+ +**Core method:** `PushEvent(eventType, severity, message, details)` runs in a goroutine so it never blocks the caller, and makes up to 3 attempts (2 retries with 3s backoff). #### Event Types -| Event | Trigger | -|-------|---------| -| `disk_warning` | Disk usage crosses warning/critical threshold | -| `backup_failed` | Nightly backup or DB dump fails | -| `update_available` | New app version detected in catalog | -| `security_update` | Critical security update available | +| Event Type | Severity | Trigger | +|------------|----------|---------| +| `backup_completed` | info | Nightly restic backup succeeds | +| `backup_failed` | error | Nightly restic backup fails | +| `db_dump_completed` | info | Nightly database dumps succeed | +| `db_dump_failed` | error | Nightly database dumps fail | +| `backup_integrity_ok` | info | Weekly `restic check` passes | +| `backup_integrity_failed` | error | Weekly `restic check` fails | +| `crossdrive_completed` | info | Cross-drive secondary backup succeeds | +| `crossdrive_failed` | error | Cross-drive secondary backup fails | +| `health_degraded` | warning | Health status degrades (ok→warn) | +| `health_critical` | error | Health status critical (any→fail) | +| `health_recovered` | info | Health status improves (fail→warn, fail/warn→ok) | +| `disk_warning` | warning | Disk usage crosses 90% | +| `disk_critical` | error | Disk usage crosses 95% | +| `storage_disconnected` | error | Storage drive physically removed | +| `storage_reconnected` | info | Storage drive reconnected | +| `controller_started` | info | Controller process starts | +| `controller_updated` | info/error | Self-update success or failure | +| `app_deployed` | info | New app deployed via API | +| `app_removed` | info | App removed via API | +| `disaster_recovery_started` | warning | DR restore begins | +| `disaster_recovery_completed` | info/warning | DR restore finishes (info on full success, warning if any app failed) | -#### Cooldown System +Each event carries typed detail structs (e.g., `BackupDetails`, `DiskDetails`,
`HealthDetails`) serialized as JSON. -Per-event-type cooldown (default 6 hours, configurable) prevents notification spam. Only notifies on **status degradation** (ok→warn, ok→fail, warn→fail), not on repeated same-status checks. +#### Default Enabled Events + +Events the customer receives notifications for (configurable in settings): +`backup_failed`, `db_dump_failed`, `disk_warning`, `disk_critical`, `storage_disconnected`, `node_down`, `health_critical`, `expected_backup_missed`, `expected_dbdump_missed` #### Preference Sync -Notification preferences (email, enabled events, cooldown) are: +Notification preferences (email, enabled events, cooldown hours) are: - Stored locally in `settings.json` -- Synced to hub on save and on controller startup +- Synced to Hub on save and on controller startup via `POST /api/v1/preferences` - Hub sync failure doesn't block local save --- @@ -776,7 +787,7 @@ Periodic JSON push (default every 15 min) to the central felhom-hub service: - Stacks: deployed apps with versions and states - Config hash: SHA256 of `controller.yaml` for Hub-side config comparison -Bearer token authentication, 3-attempt retry with 5-second backoff. +Bearer token authentication, 3-attempt retry with 5-second backoff. Push status tracked via `PushStatus` struct (LastAttempt, LastSuccess, LastError, consecutive failures) — used by the monitoring page and alert system to show Hub connection health. 
#### Infrastructure Backup to Hub (`internal/report/infra_backup.go`) @@ -792,11 +803,14 @@ This enables fully automated recovery when the system drive is replaced — the #### Hub Dashboard The hub service (separate Go app in the `felhom.eu` repo) provides: -- Multi-customer overview table with status indicators -- Customer detail page with system/storage/containers/backup/health sections +- Multi-customer overview table with status indicators and event count badges +- Customer detail page with system/storage/containers/backup/health/events sections +- Event timeline: last 50 events with severity filter, colored badges, source tracking +- Dead man's switch: staleness detection (30min stale, 60min down), missed backup detection (daily at 05:00) +- Notification dispatch: operator (English) + customer (Hungarian) emails via Resend with per-event cooldowns - Infra backup status per customer (last sync, stack count, disk count) - Color coding: green (<30min), yellow (30-60min), red (>60min since last report) -- 90-day report retention with daily prune +- 90-day report + event retention with daily prune at 04:30 Budapest time ### 9. 
Disaster Recovery diff --git a/controller/cmd/controller/main.go b/controller/cmd/controller/main.go index 3b7f90a..6dc5703 100644 --- a/controller/cmd/controller/main.go +++ b/controller/cmd/controller/main.go @@ -197,9 +197,15 @@ func main() { logger.Println("[INFO] Metrics collector started (60s interval)") } - // --- Initialize health pinger --- + // --- Initialize health pinger (legacy, will be removed) --- pinger := monitor.NewPinger(&cfg.Monitoring, logger) + // Deprecation notice for ping UUIDs + uuids := cfg.Monitoring.PingUUIDs + if uuids.Heartbeat != "" || uuids.SystemHealth != "" || uuids.DBDump != "" || uuids.Backup != "" || uuids.BackupIntegrity != "" { + logger.Println("[INFO] Healthchecks ping UUIDs configured but no longer used — monitoring is now handled by the Hub") + } + // --- Initialize backup manager --- var backupMgr *backup.Manager stackProv := &stackAdapter{ @@ -241,11 +247,7 @@ func main() { }) // Check for post-update state (did a previous update succeed or fail?) 
if state := updater.VerifyStartup(); state != nil { - if state.Status == "success" { - notifier.NotifyUpdateSuccess(state.PreviousVersion, state.TargetVersion) - } else if state.Status == "failed" { - notifier.NotifyUpdateFailed(state.TargetVersion, state.Error) - } + notifier.NotifyControllerUpdated(state.PreviousVersion, state.TargetVersion, state.Status == "success") } logger.Printf("[INFO] Self-update enabled (check every %s, auto-update: %v, auto-update time: %s)", cfg.SelfUpdate.CheckInterval, cfg.SelfUpdate.AutoUpdate, cfg.SelfUpdate.AutoUpdateTime) @@ -302,6 +304,16 @@ func main() { var hubPusher *report.Pusher if cfg.Hub.URL != "" && cfg.Hub.APIKey != "" { hubPusher = report.NewPusher(&cfg.Hub, logger) + // Wire hub push status into alert manager for dashboard alerts + alertMgr.SetHubPushStatus(func() web.HubPushStatusData { + s := hubPusher.GetStatus() + return web.HubPushStatusData{ + LastAttempt: s.LastAttempt, + LastSuccess: s.LastSuccess, + LastError: s.LastError, + Consecutive: s.Consecutive, + } + }) } // Backup daily jobs @@ -310,6 +322,8 @@ func main() { err := backupMgr.RunDBDumps(ctx) if err != nil { notifier.NotifyDBDumpFailed("Adatbázis mentés sikertelen", err.Error()) + } else { + notifier.NotifyDBDumpCompleted(notify.DBDumpDetails{}) } return err }) @@ -317,6 +331,8 @@ func main() { err := backupMgr.RunBackup(ctx) if err != nil { notifier.NotifyBackupFailed("Biztonsági mentés sikertelen", err.Error()) + } else { + notifier.NotifyBackupCompleted(notify.BackupDetails{}) } // Phase 3: Chain cross-drive backups immediately after restic (regardless of restic success) // Daily jobs run every night; weekly jobs only on Sunday @@ -345,6 +361,8 @@ func main() { err := backupMgr.RunIntegrityCheck(ctx) if err != nil { notifier.NotifyIntegrityFailed("Mentés integritás ellenőrzés sikertelen", err.Error()) + } else { + notifier.NotifyIntegrityOK("Mentés integritás ellenőrzés sikeres") } return err }) @@ -454,6 +472,9 @@ func main() { go func() { 
time.Sleep(5 * time.Second) // Let all subsystems fully initialize + // Push controller startup event to Hub + notifier.NotifyControllerStarted(Version) + // Heartbeat ping pinger.Ping(cfg.Monitoring.PingUUIDs.Heartbeat, "startup") logger.Println("[INFO] Startup heartbeat ping sent") @@ -533,7 +554,7 @@ func main() { go func() { prefs := sett.GetNotificationPrefs() if prefs.Email != "" { - if err := notifier.SyncPreferences(prefs.Email, prefs.EnabledEvents); err != nil { + if err := notifier.SyncPreferences(prefs.Email, prefs.EnabledEvents, prefs.CooldownHours); err != nil { logger.Printf("[WARN] Failed to sync notification preferences on startup: %v", err) } } @@ -547,11 +568,22 @@ func main() { }() // --- Initialize API router --- - apiRouter := api.NewRouter(cfg, *configPath, sett, stackMgr, syncer, cpuCollector, backupMgr, crossDriveRunner, metricsStore, updater, logger) + apiRouter := api.NewRouter(cfg, *configPath, sett, stackMgr, syncer, cpuCollector, backupMgr, crossDriveRunner, metricsStore, updater, notifier, logger) // --- Initialize web server --- webServer := web.NewServer(cfg, stackMgr, cpuCollector, backupMgr, crossDriveRunner, sched, sett, alertMgr, notifier, updater, logger, Version) webServer.SetStorageWatchdog(storageWatchdog) + if hubPusher != nil { + webServer.SetHubPushStatus(func() web.HubPushStatusData { + s := hubPusher.GetStatus() + return web.HubPushStatusData{ + LastAttempt: s.LastAttempt, + LastSuccess: s.LastSuccess, + LastError: s.LastError, + Consecutive: s.Consecutive, + } + }) + } // --- Initialize drive migrator --- driveMigrator := &storage.DriveMigrator{ diff --git a/controller/internal/api/router.go b/controller/internal/api/router.go index aaa7bff..2170c66 100644 --- a/controller/internal/api/router.go +++ b/controller/internal/api/router.go @@ -16,6 +16,7 @@ import ( "gitea.dooplex.hu/admin/felhom-controller/internal/backup" "gitea.dooplex.hu/admin/felhom-controller/internal/config" 
"gitea.dooplex.hu/admin/felhom-controller/internal/metrics" + "gitea.dooplex.hu/admin/felhom-controller/internal/notify" "gitea.dooplex.hu/admin/felhom-controller/internal/selfupdate" "gitea.dooplex.hu/admin/felhom-controller/internal/settings" "gitea.dooplex.hu/admin/felhom-controller/internal/stacks" @@ -35,11 +36,12 @@ type Router struct { crossDriveRunner *backup.CrossDriveRunner metricsStore *metrics.MetricsStore updater *selfupdate.Updater + notifier *notify.Notifier logger *log.Logger } -func NewRouter(cfg *config.Config, configPath string, sett *settings.Settings, stackMgr *stacks.Manager, syncer *catalogsync.Syncer, cpuCollector *system.CPUCollector, backupMgr *backup.Manager, crossDrive *backup.CrossDriveRunner, metricsStore *metrics.MetricsStore, updater *selfupdate.Updater, logger *log.Logger) *Router { - return &Router{cfg: cfg, configPath: configPath, sett: sett, stackMgr: stackMgr, syncer: syncer, cpuCollector: cpuCollector, backupMgr: backupMgr, crossDriveRunner: crossDrive, metricsStore: metricsStore, updater: updater, logger: logger} +func NewRouter(cfg *config.Config, configPath string, sett *settings.Settings, stackMgr *stacks.Manager, syncer *catalogsync.Syncer, cpuCollector *system.CPUCollector, backupMgr *backup.Manager, crossDrive *backup.CrossDriveRunner, metricsStore *metrics.MetricsStore, updater *selfupdate.Updater, notif *notify.Notifier, logger *log.Logger) *Router { + return &Router{cfg: cfg, configPath: configPath, sett: sett, stackMgr: stackMgr, syncer: syncer, cpuCollector: cpuCollector, backupMgr: backupMgr, crossDriveRunner: crossDrive, metricsStore: metricsStore, updater: updater, notifier: notif, logger: logger} } type apiResponse struct { @@ -280,6 +282,15 @@ func (r *Router) deployStack(w http.ResponseWriter, req *http.Request, name stri resp.Data = map[string]string{"warning": warning} } writeJSON(w, http.StatusOK, resp) + + // Push app deployed event to Hub + if r.notifier != nil { + displayName := name + if s, ok := 
r.stackMgr.GetStack(name); ok && s.Meta.DisplayName != "" { + displayName = s.Meta.DisplayName + } + r.notifier.NotifyAppDeployed(name, displayName) + } } func (r *Router) actionStack(w http.ResponseWriter, action, name string) { @@ -438,6 +449,11 @@ func (r *Router) removeStack(w http.ResponseWriter, req *http.Request, name stri } writeJSON(w, http.StatusOK, apiResponse{OK: true, Data: resp, Message: "Stack " + name + " removed"}) + + // Push app removed event to Hub + if r.notifier != nil { + r.notifier.NotifyAppRemoved(name, name) + } } func (r *Router) deleteStack(w http.ResponseWriter, req *http.Request, name string) { diff --git a/controller/internal/notify/notifier.go b/controller/internal/notify/notifier.go index 6113a33..31b170b 100644 --- a/controller/internal/notify/notifier.go +++ b/controller/internal/notify/notifier.go @@ -7,15 +7,15 @@ import ( "io" "log" "net/http" - "strings" "sync" "time" "gitea.dooplex.hu/admin/felhom-controller/internal/settings" ) -// Notifier sends notification events to the hub relay service. +// Notifier sends structured events to the hub via /api/v1/event. // Non-blocking: fires requests in goroutines, logs errors but doesn't retry aggressively. +// Cooldown logic is handled by the Hub — the controller sends all events unconditionally. type Notifier struct { hubURL string apiKey string @@ -26,11 +26,7 @@ type Notifier struct { settings *settings.Settings mu sync.Mutex - cooldowns map[string]time.Time // event_type -> last notification time - perEventCooldown map[string]time.Duration // per-event override cooldown durations - - // prevHealthStatus tracks the previous health check status for change detection - prevHealthStatus string + prevHealthStatus string // tracks previous health check status for change detection } // New creates a new Notifier. Returns a no-op notifier if hub is not enabled. 
@@ -50,11 +46,6 @@ func New(hubURL, apiKey, customerID string, sett *settings.Settings, logger *log logger: logger, enabled: enabled, settings: sett, - cooldowns: make(map[string]time.Time), - perEventCooldown: map[string]time.Duration{ - "storage_disconnected": 1 * time.Hour, - "storage_reconnected": 1 * time.Hour, - }, } } @@ -63,17 +54,310 @@ func (n *Notifier) IsEnabled() bool { return n.enabled } -// preferencesRequest is the JSON payload sent to the hub preferences endpoint. +// ── Detail structs ─────────────────────────────────────────────────── + +// BackupDetails holds structured data for backup events. +type BackupDetails struct { + DriveCount int `json:"drive_count,omitempty"` + SnapshotID string `json:"snapshot_id,omitempty"` + DurationSec int `json:"duration_sec,omitempty"` + DataAdded string `json:"data_added,omitempty"` + Error string `json:"error,omitempty"` +} + +// DBDumpDetails holds structured data for DB dump events. +type DBDumpDetails struct { + DatabaseCount int `json:"database_count,omitempty"` + TotalSize string `json:"total_size,omitempty"` + DurationSec int `json:"duration_sec,omitempty"` + Error string `json:"error,omitempty"` +} + +// DiskDetails holds structured data for disk warning/critical events. +type DiskDetails struct { + Mount string `json:"mount,omitempty"` + UsagePercent float64 `json:"usage_percent,omitempty"` + Label string `json:"label,omitempty"` +} + +// HealthDetails holds structured data for health events. +type HealthDetails struct { + PreviousStatus string `json:"previous_status,omitempty"` + CurrentStatus string `json:"current_status,omitempty"` + Issues []string `json:"issues,omitempty"` + Warnings []string `json:"warnings,omitempty"` +} + +// StorageDetails holds structured data for storage events. 
+type StorageDetails struct { + DrivePath string `json:"drive_path,omitempty"` + Label string `json:"label,omitempty"` + StoppedApps []string `json:"stopped_apps,omitempty"` +} + +// UpdateDetails holds structured data for controller update events. +type UpdateDetails struct { + FromVersion string `json:"from_version,omitempty"` + ToVersion string `json:"to_version,omitempty"` + Error string `json:"error,omitempty"` +} + +// AppDetails holds structured data for app lifecycle events. +type AppDetails struct { + StackName string `json:"stack_name,omitempty"` + DisplayName string `json:"display_name,omitempty"` +} + +// CrossDriveDetails holds structured data for cross-drive backup events. +type CrossDriveDetails struct { + StackName string `json:"stack_name,omitempty"` + Method string `json:"method,omitempty"` + DestPath string `json:"dest_path,omitempty"` + Duration string `json:"duration,omitempty"` + Error string `json:"error,omitempty"` +} + +// ── Core event push ────────────────────────────────────────────────── + +// eventRequest is the JSON payload sent to /api/v1/event. +type eventRequest struct { + CustomerID string `json:"customer_id"` + EventType string `json:"event_type"` + Severity string `json:"severity"` + Message string `json:"message"` + Details json.RawMessage `json:"details,omitempty"` +} + +// PushEvent sends a structured event to the hub's /api/v1/event endpoint. +// Non-blocking (goroutine). Retries twice with 3s backoff. +// details may be nil (omitted from JSON) or a struct that marshals to JSON. 
+func (n *Notifier) PushEvent(eventType, severity, message string, details interface{}) { + if !n.enabled { + return + } + + var detailsJSON json.RawMessage + if details != nil { + b, err := json.Marshal(details) + if err != nil { + n.logger.Printf("[WARN] PushEvent: failed to marshal details for %s: %v", eventType, err) + } else { + detailsJSON = b + } + } + + payload := eventRequest{ + CustomerID: n.customerID, + EventType: eventType, + Severity: severity, + Message: message, + Details: detailsJSON, + } + + jsonData, err := json.Marshal(payload) + if err != nil { + n.logger.Printf("[ERROR] PushEvent: marshal failed for %s: %v", eventType, err) + return + } + + go func() { + url := n.hubURL + "/api/v1/event" + var lastErr error + for attempt := 0; attempt < 3; attempt++ { + if attempt > 0 { + time.Sleep(3 * time.Second) + } + + req, err := http.NewRequest("POST", url, bytes.NewReader(jsonData)) + if err != nil { + lastErr = err + continue + } + req.Header.Set("Authorization", "Bearer "+n.apiKey) + req.Header.Set("Content-Type", "application/json") + + resp, err := n.httpClient.Do(req) + if err != nil { + lastErr = err + continue + } + io.Copy(io.Discard, resp.Body) + resp.Body.Close() + + if resp.StatusCode >= 200 && resp.StatusCode < 300 { + n.logger.Printf("[INFO] Event pushed: %s (%s) — %s", eventType, severity, message) + return + } + lastErr = fmt.Errorf("HTTP %d", resp.StatusCode) + } + n.logger.Printf("[WARN] Event push failed after 3 attempts (%s/%s): %v", eventType, severity, lastErr) + }() +} + +// ── Convenience methods ────────────────────────────────────────────── + +// NotifyHealthChange checks if health status changed and sends appropriate events. +// Detects both degradation (ok→warn, ok→fail, warn→fail) and recovery (fail→ok, warn→ok, fail→warn). 
+func (n *Notifier) NotifyHealthChange(status string, issues, warnings []string) { + if !n.enabled { + return + } + + n.mu.Lock() + prev := n.prevHealthStatus + n.prevHealthStatus = status + n.mu.Unlock() + + if prev == "" { + return // First run, just record status + } + if status == prev { + return + } + + details := HealthDetails{ + PreviousStatus: prev, + CurrentStatus: status, + Issues: issues, + Warnings: warnings, + } + + prevRank := statusRank(prev) + newRank := statusRank(status) + + if newRank > prevRank { + // Degradation + if status == "fail" { + n.PushEvent("health_critical", "error", + fmt.Sprintf("Rendszer állapot kritikus (volt: %s)", prev), details) + } else if status == "warn" { + n.PushEvent("health_degraded", "warning", + fmt.Sprintf("Rendszer állapot romlott (volt: %s)", prev), details) + } + } else { + // Recovery + n.PushEvent("health_recovered", "info", + fmt.Sprintf("Rendszer állapot helyreállt: %s (volt: %s)", status, prev), details) + } +} + +// NotifyBackupFailed sends a backup failure event. +func (n *Notifier) NotifyBackupFailed(message, errMsg string) { + n.PushEvent("backup_failed", "error", message, BackupDetails{Error: errMsg}) +} + +// NotifyBackupCompleted sends a backup success event. +func (n *Notifier) NotifyBackupCompleted(details BackupDetails) { + n.PushEvent("backup_completed", "info", "Biztonsági mentés elkészült", details) +} + +// NotifyDBDumpFailed sends a DB dump failure event. +func (n *Notifier) NotifyDBDumpFailed(message, errMsg string) { + n.PushEvent("db_dump_failed", "error", message, DBDumpDetails{Error: errMsg}) +} + +// NotifyDBDumpCompleted sends a DB dump success event. +func (n *Notifier) NotifyDBDumpCompleted(details DBDumpDetails) { + n.PushEvent("db_dump_completed", "info", "Adatbázis mentés elkészült", details) +} + +// NotifyIntegrityFailed sends a backup integrity check failure event. 
+func (n *Notifier) NotifyIntegrityFailed(message, errMsg string) { + n.PushEvent("backup_integrity_failed", "error", message, &BackupDetails{Error: errMsg}) +} + +// NotifyIntegrityOK sends a backup integrity check success event. +func (n *Notifier) NotifyIntegrityOK(message string) { + n.PushEvent("backup_integrity_ok", "info", message, nil) +} + +// NotifyControllerUpdated sends a controller update event. +func (n *Notifier) NotifyControllerUpdated(fromVer, toVer string, success bool) { + severity := "info" + msg := fmt.Sprintf("Controller frissítve: %s → %s", fromVer, toVer) + details := UpdateDetails{FromVersion: fromVer, ToVersion: toVer} + if !success { + severity = "error" + msg = fmt.Sprintf("Controller frissítés sikertelen: %s → %s", fromVer, toVer) + } + n.PushEvent("controller_updated", severity, msg, details) +} + +// NotifyControllerStarted sends a controller startup event. +func (n *Notifier) NotifyControllerStarted(version string) { + n.PushEvent("controller_started", "info", + fmt.Sprintf("Controller elindult (v%s)", version), nil) +} + +// NotifyStorageDisconnected sends a drive disconnection event. +func (n *Notifier) NotifyStorageDisconnected(label string, stoppedApps []string) { + msg := fmt.Sprintf("Meghajtó váratlanul leválasztva: %s", label) + n.PushEvent("storage_disconnected", "error", msg, StorageDetails{ + Label: label, + StoppedApps: stoppedApps, + }) +} + +// NotifyStorageReconnected sends a drive reconnection event. +func (n *Notifier) NotifyStorageReconnected(label string) { + n.PushEvent("storage_reconnected", "info", + fmt.Sprintf("Meghajtó újra csatlakoztatva: %s", label), StorageDetails{Label: label}) +} + +// NotifyAppDeployed sends an app deployment event. 
+func (n *Notifier) NotifyAppDeployed(stackName, displayName string) { + n.PushEvent("app_deployed", "info", + fmt.Sprintf("Alkalmazás telepítve: %s", displayName), + AppDetails{StackName: stackName, DisplayName: displayName}) +} + +// NotifyAppRemoved sends an app removal event. +func (n *Notifier) NotifyAppRemoved(stackName, displayName string) { + n.PushEvent("app_removed", "info", + fmt.Sprintf("Alkalmazás eltávolítva: %s", displayName), + AppDetails{StackName: stackName, DisplayName: displayName}) +} + +// NotifyCrossDriveCompleted sends a cross-drive backup success event. +func (n *Notifier) NotifyCrossDriveCompleted(details CrossDriveDetails) { + n.PushEvent("crossdrive_completed", "info", + fmt.Sprintf("Másodlagos mentés elkészült: %s", details.StackName), details) +} + +// NotifyCrossDriveFailed sends a cross-drive backup failure event. +func (n *Notifier) NotifyCrossDriveFailed(details CrossDriveDetails) { + n.PushEvent("crossdrive_failed", "error", + fmt.Sprintf("Másodlagos mentés sikertelen: %s", details.StackName), details) +} + +// NotifyDRStarted sends a disaster recovery start event. +func (n *Notifier) NotifyDRStarted(appCount int) { + n.PushEvent("disaster_recovery_started", "warning", + fmt.Sprintf("Katasztrófa helyreállítás elindítva (%d alkalmazás)", appCount), nil) +} + +// NotifyDRCompleted sends a disaster recovery completion event. 
+func (n *Notifier) NotifyDRCompleted(successCount, failCount int) { + severity := "info" + if failCount > 0 { + severity = "warning" + } + n.PushEvent("disaster_recovery_completed", severity, + fmt.Sprintf("Katasztrófa helyreállítás befejezve (%d sikeres, %d sikertelen)", successCount, failCount), nil) +} + +// ── Preferences sync ───────────────────────────────────────────────── + type preferencesRequest struct { CustomerID string `json:"customer_id"` Email string `json:"email"` EnabledEvents []string `json:"enabled_events"` + CooldownHours int `json:"cooldown_hours,omitempty"` } // SyncPreferences pushes the current notification preferences to the hub. -// Called after the user saves notification settings on the settings page. // Synchronous — returns error for the handler to display to the user. -func (n *Notifier) SyncPreferences(email string, enabledEvents []string) error { +func (n *Notifier) SyncPreferences(email string, enabledEvents []string, cooldownHours int) error { if !n.enabled { return fmt.Errorf("hub nem konfigurált") } @@ -82,6 +366,7 @@ func (n *Notifier) SyncPreferences(email string, enabledEvents []string) error { CustomerID: n.customerID, Email: email, EnabledEvents: enabledEvents, + CooldownHours: cooldownHours, } jsonData, err := json.Marshal(payload) @@ -108,172 +393,23 @@ func (n *Notifier) SyncPreferences(email string, enabledEvents []string) error { return fmt.Errorf("hub hiba (%d): %s", resp.StatusCode, string(body)) } - n.logger.Printf("[INFO] Notification preferences synced to hub: email=%s, events=%v", email, enabledEvents) + n.logger.Printf("[INFO] Notification preferences synced to hub: email=%s, events=%v, cooldown=%dh", email, enabledEvents, cooldownHours) return nil } -// notifyRequest is the JSON payload sent to the hub. 
-type notifyRequest struct {
-	CustomerID string `json:"customer_id"`
-	EventType  string `json:"event_type"`
-	Severity   string `json:"severity"`
-	Message    string `json:"message"`
-	Details    string `json:"details,omitempty"`
-}
+// ── Test notification ────────────────────────────────────────────────
 
-// Notify sends a notification event to the hub relay.
-// Checks local cooldown and event preferences before sending.
-// Non-blocking: fires the HTTP request in a goroutine.
-func (n *Notifier) Notify(eventType, severity, message, details string) {
-	if !n.enabled {
-		return
-	}
-
-	prefs := n.settings.GetNotificationPrefs()
-	if prefs.Email == "" {
-		return // No email configured, skip
-	}
-
-	// Check if event is enabled in preferences
-	eventEnabled := false
-	for _, e := range prefs.EnabledEvents {
-		if e == eventType {
-			eventEnabled = true
-			break
-		}
-	}
-	if !eventEnabled {
-		return
-	}
-
-	// Check cooldown — per-event override takes priority over global
-	cooldownDuration := time.Duration(prefs.CooldownHours) * time.Hour
-	if cooldownDuration == 0 {
-		cooldownDuration = 6 * time.Hour
-	}
-	if override, ok := n.perEventCooldown[eventType]; ok {
-		cooldownDuration = override
-	}
-
-	n.mu.Lock()
-	lastSent, exists := n.cooldowns[eventType]
-	if exists && time.Since(lastSent) < cooldownDuration {
-		n.mu.Unlock()
-		n.logger.Printf("[DEBUG] Notification cooldown active for %s (sent %s ago)", eventType, time.Since(lastSent).Round(time.Minute))
-		return
-	}
-	n.cooldowns[eventType] = time.Now()
-	n.mu.Unlock()
-
-	// Fire the notification in a goroutine (non-blocking)
-	go func() {
-		payload := notifyRequest{
-			CustomerID: n.customerID,
-			EventType:  eventType,
-			Severity:   severity,
-			Message:    message,
-			Details:    details,
-		}
-
-		jsonData, err := json.Marshal(payload)
-		if err != nil {
-			n.logger.Printf("[ERROR] Failed to marshal notification: %v", err)
-			return
-		}
-
-		url := n.hubURL + "/api/v1/notify"
-		req, err := http.NewRequest("POST", url, bytes.NewReader(jsonData))
-		if err != nil {
-			n.logger.Printf("[ERROR] Failed to create notification request: %v", err)
-			return
-		}
-		req.Header.Set("Authorization", "Bearer "+n.apiKey)
-		req.Header.Set("Content-Type", "application/json")
-
-		resp, err := n.httpClient.Do(req)
-		if err != nil {
-			n.logger.Printf("[WARN] Failed to send notification to hub: %v", err)
-			return
-		}
-		defer resp.Body.Close()
-
-		if resp.StatusCode >= 400 {
-			n.logger.Printf("[WARN] Hub notification returned %d for %s/%s", resp.StatusCode, eventType, severity)
-			return
-		}
-
-		n.logger.Printf("[INFO] Notification sent: %s (%s) — %s", eventType, severity, message)
-	}()
-}
-
-// NotifyHealthChange checks if health status changed and sends appropriate notifications.
-// Call this after each health check with the new report status/issues/warnings.
-func (n *Notifier) NotifyHealthChange(status string, issues, warnings []string) {
-	if !n.enabled {
-		return
-	}
-
-	prev := n.prevHealthStatus
-	n.prevHealthStatus = status
-
-	// Only notify on status degradation (ok→warn, ok→fail, warn→fail)
-	if prev == "" {
-		return // First run, just record status
-	}
-	if statusRank(status) <= statusRank(prev) {
-		return // Status improved or stayed the same
-	}
-
-	// Notify about each issue/warning
-	for _, issue := range issues {
-		n.Notify("container_unhealthy", "critical", issue, "")
-	}
-	for _, w := range warnings {
-		// Determine specific event type from warning message
-		eventType := classifyWarning(w)
-		n.Notify(eventType, "warning", w, "")
-	}
-}
-
-// NotifyBackupFailed sends a notification about a backup failure.
-func (n *Notifier) NotifyBackupFailed(message, details string) {
-	n.Notify("backup_failed", "critical", message, details)
-}
-
-// NotifyDBDumpFailed sends a notification about a database dump failure.
-func (n *Notifier) NotifyDBDumpFailed(message, details string) {
-	n.Notify("db_dump_failed", "critical", message, details)
-}
-
-// NotifyIntegrityFailed sends a notification about a backup integrity check failure.
-func (n *Notifier) NotifyIntegrityFailed(message, details string) {
-	n.Notify("integrity_failed", "warning", message, details)
-}
-
-// NotifyUpdateSuccess sends a notification about a successful controller update.
-func (n *Notifier) NotifyUpdateSuccess(fromVer, toVer string) {
-	n.Notify("update_success", "info",
-		fmt.Sprintf("Controller frissítve: %s → %s", fromVer, toVer), "")
-}
-
-// NotifyUpdateFailed sends a notification about a failed controller update.
-func (n *Notifier) NotifyUpdateFailed(targetVer, errMsg string) {
-	n.Notify("update_failed", "warning",
-		fmt.Sprintf("Controller frissítés sikertelen: %s — %s", targetVer, errMsg), "")
-}
-
-// SendTest sends a test notification for verifying the notification flow.
+// SendTest sends a test event for verifying the notification flow (synchronous).
 func (n *Notifier) SendTest() error {
 	if !n.enabled {
 		return fmt.Errorf("notifications not enabled (hub not configured)")
 	}
 
-	payload := notifyRequest{
+	payload := eventRequest{
 		CustomerID: n.customerID,
 		EventType:  "test",
 		Severity:   "info",
 		Message:    "Teszt értesítés a Felhom rendszerből",
-		Details:    "Ha ezt az emailt megkapta, az értesítések megfelelően működnek.",
 	}
 
 	jsonData, err := json.Marshal(payload)
@@ -281,7 +417,7 @@ func (n *Notifier) SendTest() error {
 		return fmt.Errorf("marshal: %w", err)
 	}
 
-	url := n.hubURL + "/api/v1/notify"
+	url := n.hubURL + "/api/v1/event"
 	req, err := http.NewRequest("POST", url, bytes.NewReader(jsonData))
 	if err != nil {
 		return fmt.Errorf("request: %w", err)
@@ -302,6 +438,59 @@ func (n *Notifier) SendTest() error {
 	return nil
 }
 
+// ── Backward compatibility ───────────────────────────────────────────
+
+// notifyRequest is the JSON payload for the legacy /api/v1/notify endpoint.
+type notifyRequest struct {
+	CustomerID string `json:"customer_id"`
+	EventType  string `json:"event_type"`
+	Severity   string `json:"severity"`
+	Message    string `json:"message"`
+	Details    string `json:"details,omitempty"`
+}
+
+// Notify sends a legacy notification to /api/v1/notify (backward compat).
+// Kept for old Hub instances that don't support /api/v1/event yet.
+// No local cooldown — Hub handles cooldowns.
+func (n *Notifier) Notify(eventType, severity, message, details string) {
+	if !n.enabled {
+		return
+	}
+
+	go func() {
+		payload := notifyRequest{
+			CustomerID: n.customerID,
+			EventType:  eventType,
+			Severity:   severity,
+			Message:    message,
+			Details:    details,
+		}
+
+		jsonData, err := json.Marshal(payload)
+		if err != nil {
+			n.logger.Printf("[ERROR] Failed to marshal notification: %v", err)
+			return
+		}
+
+		url := n.hubURL + "/api/v1/notify"
+		req, err := http.NewRequest("POST", url, bytes.NewReader(jsonData))
+		if err != nil {
+			return
+		}
+		req.Header.Set("Authorization", "Bearer "+n.apiKey)
+		req.Header.Set("Content-Type", "application/json")
+
+		resp, err := n.httpClient.Do(req)
+		if err != nil {
+			return
+		}
+		io.Copy(io.Discard, resp.Body)
+		resp.Body.Close()
+	}()
+}
+
+// ── Helpers ──────────────────────────────────────────────────────────
+
 func statusRank(status string) int {
 	switch status {
 	case "ok":
@@ -315,41 +504,3 @@ func statusRank(status string) int {
 	}
 }
 
-func classifyWarning(message string) string {
-	// Try to classify the warning message into a specific event type
-	switch {
-	case contains(message, "disk") || contains(message, "Disk") || contains(message, "SSD") || contains(message, "HDD"):
-		if contains(message, "critical") || contains(message, "Critical") {
-			return "disk_critical"
-		}
-		return "disk_warning"
-	case contains(message, "Memory") || contains(message, "memory"):
-		return "disk_warning" // group memory under system warnings
-	case contains(message, "Temperature") || contains(message, "temperature"):
-		return "disk_warning" // group temp under system warnings
-	case contains(message, "container") || contains(message, "Container"):
-		return "container_unhealthy"
-	default:
-		return "disk_warning" // fallback to generic system warning
-	}
-}
-
-func contains(s, substr string) bool {
-	return strings.Contains(s, substr)
-}
-
-// NotifyStorageDisconnected sends a notification about a drive disconnection.
-func (n *Notifier) NotifyStorageDisconnected(label string, stoppedApps []string) {
-	msg := fmt.Sprintf("Meghajtó váratlanul leválasztva: %s", label)
-	details := ""
-	if len(stoppedApps) > 0 {
-		details = fmt.Sprintf("Leállított alkalmazások: %s", strings.Join(stoppedApps, ", "))
-	}
-	n.Notify("storage_disconnected", "critical", msg, details)
-}
-
-// NotifyStorageReconnected sends a notification about a drive reconnection.
-func (n *Notifier) NotifyStorageReconnected(label string) {
-	n.Notify("storage_reconnected", "info",
-		fmt.Sprintf("Meghajtó újra csatlakoztatva: %s. Az alkalmazások manuálisan indíthatók.", label), "")
-}
diff --git a/controller/internal/report/pusher.go b/controller/internal/report/pusher.go
index 100a8ff..09eade1 100644
--- a/controller/internal/report/pusher.go
+++ b/controller/internal/report/pusher.go
@@ -8,11 +8,20 @@ import (
 	"log"
 	"net/http"
 	"strings"
+	"sync"
 	"time"
 
 	"gitea.dooplex.hu/admin/felhom-controller/internal/config"
 )
 
+// PushStatus tracks the last hub push attempt and result.
+type PushStatus struct {
+	LastAttempt time.Time
+	LastSuccess time.Time
+	LastError   string
+	Consecutive int // consecutive failures
+}
+
 // Pusher sends reports to the central hub.
 type Pusher struct {
 	hubURL     string
@@ -20,6 +29,9 @@ type Pusher struct {
 	httpClient *http.Client
 	logger     *log.Logger
 	enabled    bool
+
+	statusMu sync.RWMutex
+	status   PushStatus
 }
 
 // NewPusher creates a new report pusher from hub configuration.
@@ -48,6 +60,10 @@ func (p *Pusher) Push(report *Report) error {
 
 	url := p.hubURL + "/api/v1/report"
 
+	p.statusMu.Lock()
+	p.status.LastAttempt = time.Now()
+	p.statusMu.Unlock()
+
 	var lastErr error
 	for attempt := 0; attempt < 3; attempt++ {
 		if attempt > 0 {
@@ -74,14 +90,31 @@ func (p *Pusher) Push(report *Report) error {
 
 		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
 			p.logger.Printf("[INFO] Hub report pushed successfully (%d bytes)", len(data))
+			p.statusMu.Lock()
+			p.status.LastSuccess = time.Now()
+			p.status.LastError = ""
+			p.status.Consecutive = 0
+			p.statusMu.Unlock()
 			return nil
 		}
 		lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
 	}
 
+	p.statusMu.Lock()
+	p.status.LastError = lastErr.Error()
+	p.status.Consecutive++
+	p.statusMu.Unlock()
+
 	return fmt.Errorf("hub push failed after 3 attempts: %w", lastErr)
 }
 
+// GetStatus returns a snapshot of the current push status.
+func (p *Pusher) GetStatus() PushStatus {
+	p.statusMu.RLock()
+	defer p.statusMu.RUnlock()
+	return p.status
+}
+
 // PushInfraBackup sends the infrastructure backup payload to the Hub.
 // Uses the same retry logic as Push.
 func (p *Pusher) PushInfraBackup(data []byte) error {
diff --git a/controller/internal/settings/settings.go b/controller/internal/settings/settings.go
index ff253e5..f4b6ad9 100644
--- a/controller/internal/settings/settings.go
+++ b/controller/internal/settings/settings.go
@@ -85,11 +85,15 @@ type NotificationPrefs struct {
 
 // DefaultEnabledEvents are the events enabled by default for new customers.
 var DefaultEnabledEvents = []string{
-	"disk_warning",
 	"backup_failed",
-	"update_available",
+	"db_dump_failed",
+	"disk_warning",
+	"disk_critical",
 	"storage_disconnected",
-	"storage_reconnected",
+	"node_down",
+	"health_critical",
+	"expected_backup_missed",
+	"expected_dbdump_missed",
 }
 
 // DBValidationCache holds cached DB dump validation results.
diff --git a/controller/internal/web/alerts.go b/controller/internal/web/alerts.go
index 63ed501..58fd4c2 100644
--- a/controller/internal/web/alerts.go
+++ b/controller/internal/web/alerts.go
@@ -5,6 +5,7 @@ import (
 	"log"
 	"strings"
 	"sync"
+	"time"
 
 	"gitea.dooplex.hu/admin/felhom-controller/internal/backup"
 	"gitea.dooplex.hu/admin/felhom-controller/internal/config"
@@ -27,9 +28,10 @@ type Alert struct {
 // Alerts are state-based (not event-based) — they reflect current system state
 // and are regenerated after each health check cycle.
 type AlertManager struct {
-	mu     sync.RWMutex
-	alerts []Alert
-	logger *log.Logger
+	mu              sync.RWMutex
+	alerts          []Alert
+	logger          *log.Logger
+	hubPushStatusFn func() HubPushStatusData
 }
 
 // NewAlertManager creates a new AlertManager.
@@ -39,6 +41,13 @@ func NewAlertManager(logger *log.Logger) *AlertManager {
 	}
 }
 
+// SetHubPushStatus sets the hub push status callback for generating hub alerts.
+func (am *AlertManager) SetHubPushStatus(fn func() HubPushStatusData) {
+	am.mu.Lock()
+	am.hubPushStatusFn = fn
+	am.mu.Unlock()
+}
+
 // Refresh regenerates alerts from the latest health check report and config state.
 // Called after each health check cycle (every 5 minutes) and on storage state changes.
 func (am *AlertManager) Refresh(report *monitor.HealthReport, cfg *config.Config, backupMgr *backup.Manager, updateAvailable bool, latestVersion string, storagePaths ...[]settings.StoragePath) {
@@ -92,14 +101,22 @@ func (am *AlertManager) Refresh(report *monitor.HealthReport, cfg *config.Config
 		alerts = append(alerts, alert)
 	}
 
-	// Missing ping UUIDs
-	if cfg.Monitoring.Enabled {
-		missing := countMissingPings(cfg)
-		if missing > 0 {
+	// Hub connection status
+	if !cfg.Hub.Enabled || cfg.Hub.URL == "" {
+		alerts = append(alerts, Alert{
+			ID:       "hub-disabled",
+			Level:    "warning",
+			Message:  "Hub kapcsolat kikapcsolva — a központi monitoring nem aktív",
+			Link:     "/monitoring",
+			LinkText: "Rendszermonitor",
+		})
+	} else if am.hubPushStatusFn != nil {
+		ps := am.hubPushStatusFn()
+		if ps.LastError != "" && (ps.LastSuccess.IsZero() || time.Since(ps.LastSuccess) > 30*time.Minute) {
 			alerts = append(alerts, Alert{
-				ID:       "pings-missing",
-				Level:    "warning",
-				Message:  fmt.Sprintf("%d monitoring ellenőrzés nincs beállítva", missing),
+				ID:       "hub-unreachable",
+				Level:    "error",
+				Message:  fmt.Sprintf("Hub nem elérhető — utolsó hiba: %s", ps.LastError),
 				Link:     "/monitoring",
 				LinkText: "Rendszermonitor",
 			})
@@ -200,24 +217,6 @@ func (am *AlertManager) GetInlineAlerts(page string) []Alert {
 	return result
 }
 
-// countMissingPings counts how many ping UUIDs are not configured.
-func countMissingPings(cfg *config.Config) int {
-	count := 0
-	uuids := []string{
-		cfg.Monitoring.PingUUIDs.Heartbeat,
-		cfg.Monitoring.PingUUIDs.SystemHealth,
-		cfg.Monitoring.PingUUIDs.DBDump,
-		cfg.Monitoring.PingUUIDs.Backup,
-		cfg.Monitoring.PingUUIDs.BackupIntegrity,
-	}
-	for _, uuid := range uuids {
-		if !isPingConfigured(uuid) {
-			count++
-		}
-	}
-	return count
-}
-
 // simpleHash returns a short deterministic hash for deduplication.
 func simpleHash(s string) string {
 	h := uint32(0)
diff --git a/controller/internal/web/handler_restore.go b/controller/internal/web/handler_restore.go
index df7bd48..2be9cd5 100644
--- a/controller/internal/web/handler_restore.go
+++ b/controller/internal/web/handler_restore.go
@@ -110,6 +110,18 @@ func (s *Server) executeAllRestores() {
 		return
 	}
 
+	// Count pending apps and push DR start event
+	pendingCount := 0
+	for _, app := range plan.Apps {
+		if app.Status == "pending" {
+			pendingCount++
+		}
+	}
+	if s.notifier != nil {
+		s.notifier.NotifyDRStarted(pendingCount)
+	}
+
+	successCount, failCount := 0, 0
 	for i := range plan.Apps {
 		app := &plan.Apps[i]
 		if app.Status != "pending" {
@@ -126,15 +138,22 @@ func (s *Server) executeAllRestores() {
 		if err != nil {
 			plan.UpdateApp(app.Name, "failed", err.Error())
 			s.logger.Printf("[ERROR] Restore failed for %s: %v", app.Name, err)
+			failCount++
 		} else {
 			plan.UpdateApp(app.Name, "done", "")
 			s.logger.Printf("[INFO] Restore completed for %s", app.Name)
+			successCount++
 		}
 	}
 
 	plan.SetStatus("done")
 	s.logger.Println("[INFO] All app restores completed")
 
+	// Push DR completion event
+	if s.notifier != nil {
+		s.notifier.NotifyDRCompleted(successCount, failCount)
+	}
+
 	// Re-scan stacks so dashboard picks up restored apps
 	if s.stackMgr != nil {
 		if err := s.stackMgr.ScanStacks(); err != nil {
diff --git a/controller/internal/web/handlers.go b/controller/internal/web/handlers.go
index ab68b86..c914274 100644
--- a/controller/internal/web/handlers.go
+++ b/controller/internal/web/handlers.go
@@ -411,21 +411,36 @@ func (s *Server) monitoringHandler(w http.ResponseWriter, _ *http.Request) {
 	data["SystemInfo"] = system.GetInfo(s.primaryHDDPath(), s.cpuCollector)
 	data["StorageBars"] = s.buildStorageBars()
 
-	// On monitoring page, exclude the "pings-missing" alert since the detailed table is visible
 	if s.alertManager != nil {
-		data["Alerts"] = s.alertManager.GetAlerts("pings-missing")
+		data["Alerts"] = s.alertManager.GetAlerts()
 		data["DiskWarnings"] = s.alertManager.GetInlineAlerts("monitoring")
 	}
 
-	// Ping status section
+	// Hub connection status section
+	data["HubEnabled"] = s.cfg.Hub.Enabled && s.cfg.Hub.URL != ""
+	data["HubURL"] = s.cfg.Hub.URL
+	data["CustomerID"] = s.cfg.Customer.ID
+
+	if s.hubPushStatusFn != nil {
+		ps := s.hubPushStatusFn()
+		data["HubLastAttempt"] = ps.LastAttempt
+		data["HubLastSuccess"] = ps.LastSuccess
+		data["HubLastError"] = ps.LastError
+		data["HubConsecutiveFailures"] = ps.Consecutive
+		// Connected if last success was within 2x the push interval (or 30min default)
+		connected := !ps.LastSuccess.IsZero() && time.Since(ps.LastSuccess) < 30*time.Minute
+		data["HubConnected"] = connected
+	}
+
+	// Legacy ping status section (still shown for backward compat during transition)
 	data["MonitoringEnabled"] = s.cfg.Monitoring.Enabled
 	if s.cfg.Monitoring.Enabled {
 		pings := []map[string]interface{}{
-			{"Label": "Életjel (Heartbeat)", "Icon": "💓", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.Heartbeat), "Schedule": "5 percenként"},
-			{"Label": "Rendszer állapot", "Icon": "🖥️", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.SystemHealth), "Schedule": "5 percenként"},
-			{"Label": "Adatbázis mentés", "Icon": "🗄️", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.DBDump), "Schedule": "Naponta " + s.cfg.Backup.DBDumpSchedule},
-			{"Label": "Biztonsági mentés", "Icon": "💾", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.Backup), "Schedule": "Naponta " + s.cfg.Backup.ResticSchedule},
-			{"Label": "Mentés integritás", "Icon": "🔍", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.BackupIntegrity), "Schedule": "Hetente (vasárnap)"},
+			{"Label": "Eletjel (Heartbeat)", "Icon": "heartbeat", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.Heartbeat), "Schedule": "5 percenkent"},
+			{"Label": "Rendszer allapot", "Icon": "system", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.SystemHealth), "Schedule": "5 percenkent"},
+			{"Label": "Adatbazis mentes", "Icon": "db", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.DBDump), "Schedule": "Naponta " + s.cfg.Backup.DBDumpSchedule},
+			{"Label": "Biztonsagi mentes", "Icon": "backup", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.Backup), "Schedule": "Naponta " + s.cfg.Backup.ResticSchedule},
+			{"Label": "Mentes integritas", "Icon": "integrity", "Configured": isPingConfigured(s.cfg.Monitoring.PingUUIDs.BackupIntegrity), "Schedule": "Hetente (vasarnap)"},
 		}
 		allConfigured := true
 		for _, p := range pings {
@@ -1076,11 +1091,24 @@ func (s *Server) settingsNotificationsHandler(w http.ResponseWriter, r *http.Req
 
 	// Collect enabled events from checkboxes
 	var enabledEvents []string
-	for _, evt := range []string{"disk_warning", "backup_failed", "update_available", "security_update"} {
+	// Single-event checkboxes
+	for _, evt := range []string{
+		"backup_failed", "db_dump_failed", "backup_integrity_failed",
+		"crossdrive_failed", "storage_disconnected",
+		"node_down", "health_critical",
+		"storage_reconnected", "health_recovered",
+	} {
 		if r.FormValue("event_"+evt) == "on" {
 			enabledEvents = append(enabledEvents, evt)
 		}
 	}
+	// Compound toggles: one checkbox → two event types
+	if r.FormValue("event_disk_alerts") == "on" {
+		enabledEvents = append(enabledEvents, "disk_warning", "disk_critical")
+	}
+	if r.FormValue("event_expected_missed") == "on" {
+		enabledEvents = append(enabledEvents, "expected_backup_missed", "expected_dbdump_missed")
+	}
 
 	prefs := &settings.NotificationPrefs{
 		Email: email,
@@ -1101,7 +1129,7 @@ func (s *Server) settingsNotificationsHandler(w http.ResponseWriter, r *http.Req
 	// Sync preferences to hub
 	data := s.settingsData()
 	if s.notifier != nil && s.notifier.IsEnabled() {
-		if err := s.notifier.SyncPreferences(email, enabledEvents); err != nil {
+		if err := s.notifier.SyncPreferences(email, enabledEvents, cooldownHours); err != nil {
 			s.logger.Printf("[WARN] Failed to sync preferences to hub: %v", err)
 			data["NotificationSuccess"] = fmt.Sprintf("Értesítési beállítások mentve (helyi). A központi szinkronizálás sikertelen: %v", err)
 		} else {
diff --git a/controller/internal/web/server.go b/controller/internal/web/server.go
index 19fc9a0..a64edbf 100644
--- a/controller/internal/web/server.go
+++ b/controller/internal/web/server.go
@@ -9,6 +9,7 @@ import (
 	"path/filepath"
 	"strings"
 	"sync"
+	"time"
 
 	"gitea.dooplex.hu/admin/felhom-controller/internal/backup"
 	"gitea.dooplex.hu/admin/felhom-controller/internal/config"
@@ -57,6 +58,9 @@ type Server struct {
 
 	// Storage watchdog (set after construction to break init ordering)
 	storageWatchdog *monitor.StorageWatchdog
+
+	// Hub push status callback — set via SetHubPushStatus for monitoring page
+	hubPushStatusFn func() HubPushStatusData
 }
 
 func NewServer(cfg *config.Config, stackMgr *stacks.Manager, cpuCollector *system.CPUCollector, backupMgr *backup.Manager, crossDrive *backup.CrossDriveRunner, sched *scheduler.Scheduler, sett *settings.Settings, alertMgr *AlertManager, notif *notify.Notifier, updater *selfupdate.Updater, logger *log.Logger, version string) *Server {
@@ -117,6 +121,19 @@ func (s *Server) SetDriveMigrator(dm *storage.DriveMigrator) {
 	s.driveMigrator = dm
 }
 
+// HubPushStatusData holds hub push status for the monitoring page.
+type HubPushStatusData struct {
+	LastAttempt time.Time
+	LastSuccess time.Time
+	LastError   string
+	Consecutive int
+}
+
+// SetHubPushStatus sets the hub push status callback for the monitoring page.
+func (s *Server) SetHubPushStatus(fn func() HubPushStatusData) {
+	s.hubPushStatusFn = fn
+}
+
 // InRestoreMode returns true if the server is in DR restore mode.
 func (s *Server) InRestoreMode() bool {
 	s.restoreMu.RLock()
diff --git a/controller/internal/web/templates/monitoring.html b/controller/internal/web/templates/monitoring.html
index bcce5f1..8fb7b51 100644
--- a/controller/internal/web/templates/monitoring.html
+++ b/controller/internal/web/templates/monitoring.html
@@ -86,33 +86,44 @@
 {{end}}
-
+
-

Távoli monitoring

- {{if not .MonitoringEnabled}} -
- ⚠️ A távoli monitoring ki van kapcsolva. Az üzemeltető nem kap értesítést hibák esetén. -
- {{else}} - {{if .AllPingsConfigured}} +

Hub kapcsolat

+ {{if .HubEnabled}} + {{if .HubConnected}}
- ✅ Minden távoli monitoring aktív — az üzemeltető értesítést kap hibák esetén. + Kapcsolódva — a központi rendszer aktívan figyeli a szervert.
{{else}} -
- ⚠️ Egyes monitoring ellenőrzések nincsenek beállítva. Kérd az üzemeltetőt a konfiguráláshoz. +
+ Nem elérhető — a központi rendszer nem kapott friss jelentést.
{{end}}
- {{range .PingStatus}}
- {{.Icon}} {{.Label}} - - {{if .Configured}}✅ Beállítva{{else}}⚠️ Nincs beállítva{{end}} - {{.Schedule}} - + Hub URL + {{.HubURL}} +
+
+ Ügyfél azonosító + {{.CustomerID}} +
+ {{if not .HubLastSuccess.IsZero}} +
+ Utolsó sikeres jelentés + {{.HubLastSuccess | timeAgo}}
{{end}} + {{if .HubLastError}} +
+ Utolsó hiba + {{.HubLastError}} +
+ {{end}} +
+ {{else}} +
+ A Hub kapcsolat nincs bekapcsolva — a központi monitoring nem aktív.
{{end}}
diff --git a/controller/internal/web/templates/settings.html b/controller/internal/web/templates/settings.html index e90009f..5b99df2 100644 --- a/controller/internal/web/templates/settings.html +++ b/controller/internal/web/templates/settings.html @@ -413,23 +413,56 @@ function pollUntilBack() { placeholder="pelda@email.hu" class="form-control">
- +
- + + + + + + +
+
+
+ +
+ +