feat: Hub monitoring takeover — event system, dead man's switch, notifications (v0.3.0)
Replace external Healthchecks.io with Hub-native monitoring. New events table + /api/v1/event endpoint for structured events from controllers. Staleness checker (60s) detects unresponsive nodes. Backup deadline checker (daily 05:00) catches missed backups. Notification dispatcher sends operator (English) + customer (Hungarian) emails via Resend with per-event cooldowns. Event timeline on customer page, dashboard badges. Config form deprecates Monitoring UUIDs section. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -1,5 +1,55 @@
|
|||||||
# Felhom Hub — Changelog
|
# Felhom Hub — Changelog
|
||||||
|
|
||||||
|
## v0.3.0 (2026-02-20)
|
||||||
|
|
||||||
|
**Hub Monitoring Takeover — Event System, Dead Man's Switch, Notifications**
|
||||||
|
|
||||||
|
Replaces external Healthchecks.io with a Hub-native event system. The Hub becomes the single source of truth for all customer monitoring, event tracking, dead man's switch alerting, and notification delivery.
|
||||||
|
|
||||||
|
### Phase 1 — Event System
|
||||||
|
- **`events` table** in SQLite: stores all events with customer_id, event_type, severity, message, details_json, source, timestamp
|
||||||
|
- **Indexes**: `idx_events_customer_created` (customer + time DESC), `idx_events_type` (type + time DESC)
|
||||||
|
- **Store methods**: `SaveEvent`, `GetRecentEvents`, `GetEventsByType`, `GetLatestEventByType`, `GetAllRecentEvents`, `CountEventsBySeverity`, `PruneEvents`, `GetActiveCustomerIDs`
|
||||||
|
- **`POST /api/v1/event`** endpoint: accepts structured events from controllers, validates event_type against 27 allowed types, validates severity (info/warning/error), stores in DB
|
||||||
|
- **Enhanced auth**: `checkAuthCustomer()` validates per-customer API keys match the customer_id in payload; global key bypasses ownership check
|
||||||
|
- **Prune**: events pruned alongside reports at 04:30 Budapest time
|
||||||
|
|
||||||
|
### Phase 2 — Dead Man's Switch
|
||||||
|
- **Staleness checker** (`internal/monitor/staleness.go`): runs every 60s, detects when controllers stop reporting
|
||||||
|
- ok→stale (>30min): inserts `node_stale` warning event
|
||||||
|
- any→down (>60min): inserts `node_down` error event
|
||||||
|
- stale/down→ok: inserts `node_recovered` info event
|
||||||
|
- Skips blocked customers, no false alerts on startup
|
||||||
|
- **Backup deadline checker** (`internal/monitor/deadline.go`): runs daily at 05:00 Budapest
|
||||||
|
- Detects missing `backup_completed` events since midnight → inserts `expected_backup_missed` error
|
||||||
|
- Detects missing `db_dump_completed` events → inserts `expected_dbdump_missed` error
|
||||||
|
- Grace: skips customers with `node_down` state
|
||||||
|
- **`scheduleDaily()`** helper: goroutine that sleeps until target time (Europe/Budapest), runs function, loops
|
||||||
|
- **`/healthz`** enhanced: returns 503 if SQLite Ping fails
|
||||||
|
|
||||||
|
### Phase 3 — Notification System
|
||||||
|
- **Dispatcher** (`internal/notify/dispatcher.go`): processes events and sends emails via Resend API
|
||||||
|
- **Operator channel**: English emails to operator for warning/error events, 1h cooldown per customer:eventType
|
||||||
|
- **Customer channel**: Hungarian emails per event_type, respects customer preferences (enabled_events, cooldown_hours), blocked customers skipped
|
||||||
|
- **Test bypass**: `test` event type skips cooldown/preferences, sends directly to customer email
|
||||||
|
- **Email templates** (`internal/notify/templates.go`): operator (concise English), customer (Hungarian per event type with complete message table)
|
||||||
|
- **Cooldown tracking**: in-memory maps with per-customer:eventType granularity
|
||||||
|
- **`customer_notifications` table**: added `cooldown_hours` column (default 6)
|
||||||
|
- **`notification_log` table**: added `channel` column (operator/customer)
|
||||||
|
- Wired into `/api/v1/event` handler and staleness/deadline checkers
|
||||||
|
|
||||||
|
### Phase 4 — Hub UI
|
||||||
|
- **Events section** on customer detail page: last 50 events, severity filter buttons (All/Errors/Warnings/Info), colored severity badges
|
||||||
|
- **Dashboard badges**: error+warning count in last 24h per customer, clickable to customer events
|
||||||
|
- **Notification log**: shows channel column (operator/customer) in customer detail page
|
||||||
|
- **Config form**: Monitoring UUIDs section marked as "Legacy" with deprecation notice, collapsed by default
|
||||||
|
|
||||||
|
### Phase 6 — Config Cleanup
|
||||||
|
- **`controller.yaml.default`**: `monitoring.ping_uuids` section commented out (deprecated)
|
||||||
|
- **`buildConfigJSON`**: only writes `ping_uuids` to config JSON if user explicitly provides UUID values (new configs get none)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## v0.2.2 (2026-02-20)
|
## v0.2.2 (2026-02-20)
|
||||||
|
|
||||||
**Config Hash Comparison**
|
**Config Hash Comparison**
|
||||||
|
|||||||
+41
-13
@@ -2,9 +2,9 @@
|
|||||||
|
|
||||||
**Central operator dashboard for monitoring and managing Felhom customer deployments.**
|
**Central operator dashboard for monitoring and managing Felhom customer deployments.**
|
||||||
|
|
||||||
A lightweight Go service that receives periodic reports from felhom-controller instances, stores them in SQLite, and provides a web dashboard for fleet monitoring. Also serves as the infrastructure backup store for disaster recovery.
|
A lightweight Go service that receives periodic reports and structured events from felhom-controller instances, stores them in SQLite, and provides a web dashboard for fleet monitoring. Also serves as the infrastructure backup store for disaster recovery, event-based dead man's switch monitoring, and notification dispatch.
|
||||||
|
|
||||||
**Current version: v0.2.2**
|
**Current version: v0.3.0**
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -72,14 +72,29 @@ The infra-backup payload contains everything needed to restore a customer deploy
|
|||||||
4. Controller uses disk UUIDs to auto-mount surviving drives
|
4. Controller uses disk UUIDs to auto-mount surviving drives
|
||||||
5. Controller restores apps from local backups on those drives
|
5. Controller restores apps from local backups on those drives
|
||||||
|
|
||||||
|
### Events
|
||||||
|
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| `POST` | `/api/v1/event` | Controller pushes structured event (27 allowed types, severity: info/warning/error) |
|
||||||
|
|
||||||
|
Events are the primary monitoring mechanism. Each event has: customer_id, event_type, severity, message, details_json, source. Per-customer API keys are validated against the customer_id in the payload. Stored in the `events` table with automatic pruning.
|
||||||
|
|
||||||
|
**Hub-generated events** (source="hub"):
|
||||||
|
- `node_stale` / `node_down` / `node_recovered` — dead man's switch from staleness checker (every 60s)
|
||||||
|
- `expected_backup_missed` / `expected_dbdump_missed` — backup deadline checker (daily at 05:00 Budapest)
|
||||||
|
|
||||||
### Notifications
|
### Notifications
|
||||||
|
|
||||||
| Method | Path | Description |
|
| Method | Path | Description |
|
||||||
|--------|------|-------------|
|
|--------|------|-------------|
|
||||||
| `POST` | `/api/v1/notify` | Controller sends event notification (backup_failed, disk_warning, etc.) |
|
| `POST` | `/api/v1/notify` | Legacy notification relay (kept for backward compatibility) |
|
||||||
| `POST` | `/api/v1/preferences` | Controller syncs customer notification preferences |
|
| `POST` | `/api/v1/preferences` | Controller syncs customer notification preferences (email, enabled_events, cooldown_hours) |
|
||||||
|
|
||||||
Notifications are sent via Resend.com email API.
|
Notifications are dispatched automatically when events are processed:
|
||||||
|
- **Operator channel**: English emails for warning/error events, 1h cooldown per customer:eventType
|
||||||
|
- **Customer channel**: Hungarian emails per event type, respects customer preferences and cooldown (default 6h)
|
||||||
|
- Email delivery via Resend.com API
|
||||||
|
|
||||||
### Customer Config Retrieval
|
### Customer Config Retrieval
|
||||||
|
|
||||||
@@ -93,7 +108,7 @@ Config retrieval uses a separate per-customer retrieval password (not the API ke
|
|||||||
|
|
||||||
| Method | Path | Description |
|
| Method | Path | Description |
|
||||||
|--------|------|-------------|
|
|--------|------|-------------|
|
||||||
| `GET` | `/healthz` | Health check (no auth required) |
|
| `GET` | `/healthz` | Health check (no auth required, returns 503 if SQLite ping fails) |
|
||||||
|
|
||||||
## Web Dashboard
|
## Web Dashboard
|
||||||
|
|
||||||
@@ -101,13 +116,13 @@ Protected by bcrypt password + session cookie (7-day expiry).
|
|||||||
|
|
||||||
### Pages
|
### Pages
|
||||||
|
|
||||||
- **Dashboard (`/`)** — Fleet overview table showing all customers with live status. Config-only customers (no reports yet) appear as "PENDING" with gray badge. Blocked customers are hidden. Auto-refreshes every 60 seconds.
|
- **Dashboard (`/`)** — Fleet overview table showing all customers with live status and event count badges (error+warning in last 24h). Config-only customers (no reports yet) appear as "PENDING" with gray badge. Blocked customers are hidden. Auto-refreshes every 60 seconds.
|
||||||
- **Customers (`/configs`)** — Customer management list. Shows all customers (both managed and manual), their status, controller version, and config type (MANAGED/MANUAL). Blocked customers shown grayed-out with BLOCKED badge.
|
- **Customers (`/configs`)** — Customer management list. Shows all customers (both managed and manual), their status, controller version, and config type (MANAGED/MANUAL). Blocked customers shown grayed-out with BLOCKED badge.
|
||||||
- **Unified Customer Detail (`/customers/{id}`)** — Single page per customer combining config management and live monitoring. Adapts content based on available data:
|
- **Unified Customer Detail (`/customers/{id}`)** — Single page per customer combining config management and live monitoring. Adapts content based on available data:
|
||||||
- **Managed + reporting:** Full view — config info, system metrics, storage, containers, backup status, credentials, setup commands, YAML preview, controller update, notifications, history
|
- **Managed + reporting:** Full view — config info, system metrics, storage, containers, backup status, events timeline (last 50, severity filter), credentials, setup commands, YAML preview, controller update, notifications (with channel column), history
|
||||||
- **Managed + no reports yet:** Config info, credentials, setup commands, "Waiting for first report" indicator
|
- **Managed + no reports yet:** Config info, credentials, setup commands, "Waiting for first report" indicator
|
||||||
- **Manual (report-only):** System metrics, storage, containers, backup, with "Create Config" button to convert to managed
|
- **Manual (report-only):** System metrics, storage, containers, backup, with "Create Config" button to convert to managed
|
||||||
- **Config Form (`/configs/new`, `/configs/{id}/edit`)** — Create/edit customer configurations with identity, infrastructure tokens, and monitoring overrides
|
- **Config Form (`/configs/new`, `/configs/{id}/edit`)** — Create/edit customer configurations with identity, infrastructure tokens, and monitoring overrides. Legacy Monitoring UUIDs section collapsed by default with deprecation notice
|
||||||
|
|
||||||
### Customer States
|
### Customer States
|
||||||
|
|
||||||
@@ -144,9 +159,10 @@ SQLite with WAL mode. Tables:
|
|||||||
| Table | Purpose |
|
| Table | Purpose |
|
||||||
|-------|---------|
|
|-------|---------|
|
||||||
| `reports` | Full JSON reports with denormalized fields for dashboard queries |
|
| `reports` | Full JSON reports with denormalized fields for dashboard queries |
|
||||||
|
| `events` | Structured events from controllers and Hub (type, severity, message, details, source) |
|
||||||
| `infra_backups` | Per-customer infrastructure snapshots for disaster recovery |
|
| `infra_backups` | Per-customer infrastructure snapshots for disaster recovery |
|
||||||
| `customer_notifications` | Email + enabled event types per customer |
|
| `customer_notifications` | Email, enabled event types, cooldown hours per customer |
|
||||||
| `notification_log` | Send/skip/fail history for notifications |
|
| `notification_log` | Send/skip/fail history for notifications with channel (operator/customer) |
|
||||||
| `customer_configs` | Pre-configured customer settings, retrieval passwords, per-customer API keys, status (active/blocked) |
|
| `customer_configs` | Pre-configured customer settings, retrieval passwords, per-customer API keys, status (active/blocked) |
|
||||||
|
|
||||||
Retention: configurable (default 90 days), daily prune at 04:30 Budapest time.
|
Retention: configurable (default 90 days), daily prune at 04:30 Budapest time.
|
||||||
@@ -164,6 +180,8 @@ api:
|
|||||||
notifications:
|
notifications:
|
||||||
resend_api_key: "" # Resend.com API key for email
|
resend_api_key: "" # Resend.com API key for email
|
||||||
from_email: "monitoring@felhom.eu"
|
from_email: "monitoring@felhom.eu"
|
||||||
|
operator_email: "" # Operator alert recipient
|
||||||
|
operator_enabled: true # Enable operator email notifications
|
||||||
|
|
||||||
retention:
|
retention:
|
||||||
max_days: 90
|
max_days: 90
|
||||||
@@ -195,16 +213,26 @@ Runs on k3s (Kubernetes) in the `felhom-system` namespace:
|
|||||||
```bash
|
```bash
|
||||||
# Build and push
|
# Build and push
|
||||||
cd hub/
|
cd hub/
|
||||||
make VERSION=0.2.2 docker docker-push
|
make VERSION=0.3.0 docker docker-push
|
||||||
|
|
||||||
# Deploy
|
# Deploy
|
||||||
kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:v0.2.2
|
kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:v0.3.0
|
||||||
kubectl rollout status -n felhom-system deploy/hub
|
kubectl rollout status -n felhom-system deploy/hub
|
||||||
|
|
||||||
# Check
|
# Check
|
||||||
kubectl logs -n felhom-system -l app=hub --tail 20
|
kubectl logs -n felhom-system -l app=hub --tail 20
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Background Services
|
||||||
|
|
||||||
|
| Service | Schedule | Description |
|
||||||
|
|---------|----------|-------------|
|
||||||
|
| **Staleness checker** | Every 60s | Detects controllers that stopped reporting. Generates `node_stale` (>30min), `node_down` (>60min), `node_recovered` events |
|
||||||
|
| **Backup deadline checker** | Daily 05:00 Budapest | Detects missing backup/db-dump events since midnight. Generates `expected_backup_missed`, `expected_dbdump_missed` events |
|
||||||
|
| **Report/event prune** | Daily 04:30 Budapest | Deletes reports and events older than retention period (default 90 days) |
|
||||||
|
| **Registry version check** | Every 30min | Checks Gitea registry for new controller image tags |
|
||||||
|
| **Template refresh** | Every 1h | Fetches latest `controller.yaml.example` from Gitea |
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
- `golang.org/x/crypto` — bcrypt for password hashing
|
- `golang.org/x/crypto` — bcrypt for password hashing
|
||||||
|
|||||||
+87
-18
@@ -13,6 +13,8 @@ import (
|
|||||||
"time"
|
"time"
|
||||||
|
|
||||||
"gitea.dooplex.hu/admin/felhom-hub/internal/api"
|
"gitea.dooplex.hu/admin/felhom-hub/internal/api"
|
||||||
|
"gitea.dooplex.hu/admin/felhom-hub/internal/monitor"
|
||||||
|
"gitea.dooplex.hu/admin/felhom-hub/internal/notify"
|
||||||
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
|
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
|
||||||
"gitea.dooplex.hu/admin/felhom-hub/internal/web"
|
"gitea.dooplex.hu/admin/felhom-hub/internal/web"
|
||||||
"gopkg.in/yaml.v3"
|
"gopkg.in/yaml.v3"
|
||||||
@@ -32,8 +34,10 @@ type Config struct {
|
|||||||
ReportAPIKey string `yaml:"report_api_key"`
|
ReportAPIKey string `yaml:"report_api_key"`
|
||||||
} `yaml:"api"`
|
} `yaml:"api"`
|
||||||
Notifications struct {
|
Notifications struct {
|
||||||
ResendAPIKey string `yaml:"resend_api_key"`
|
ResendAPIKey string `yaml:"resend_api_key"`
|
||||||
FromEmail string `yaml:"from_email"`
|
FromEmail string `yaml:"from_email"`
|
||||||
|
OperatorEmail string `yaml:"operator_email"`
|
||||||
|
OperatorEnabled bool `yaml:"operator_enabled"`
|
||||||
} `yaml:"notifications"`
|
} `yaml:"notifications"`
|
||||||
Retention struct {
|
Retention struct {
|
||||||
MaxDays int `yaml:"max_days"`
|
MaxDays int `yaml:"max_days"`
|
||||||
@@ -119,6 +123,18 @@ func main() {
|
|||||||
templateProvider = templateFetcher
|
templateProvider = templateFetcher
|
||||||
}
|
}
|
||||||
apiHandler := api.New(dataStore, cfg.API.ReportAPIKey, cfg.Notifications.ResendAPIKey, cfg.Notifications.FromEmail, templateProvider, logger)
|
apiHandler := api.New(dataStore, cfg.API.ReportAPIKey, cfg.Notifications.ResendAPIKey, cfg.Notifications.FromEmail, templateProvider, logger)
|
||||||
|
|
||||||
|
// Initialize notification dispatcher
|
||||||
|
dispatcher := notify.NewDispatcher(
|
||||||
|
dataStore,
|
||||||
|
cfg.Notifications.ResendAPIKey,
|
||||||
|
cfg.Notifications.FromEmail,
|
||||||
|
cfg.Notifications.OperatorEmail,
|
||||||
|
cfg.Notifications.OperatorEnabled,
|
||||||
|
logger,
|
||||||
|
)
|
||||||
|
apiHandler.SetDispatcher(dispatcher)
|
||||||
|
|
||||||
webServer := web.New(dataStore, cfg.Auth.PasswordHash, cfg.API.ReportAPIKey, staleThreshold, logger)
|
webServer := web.New(dataStore, cfg.Auth.PasswordHash, cfg.API.ReportAPIKey, staleThreshold, logger)
|
||||||
webServer.SetTemplateFetcher(templateFetcher)
|
webServer.SetTemplateFetcher(templateFetcher)
|
||||||
|
|
||||||
@@ -127,6 +143,11 @@ func main() {
|
|||||||
|
|
||||||
// Health check endpoint — bypasses auth (for k8s probes)
|
// Health check endpoint — bypasses auth (for k8s probes)
|
||||||
mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
|
mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
|
||||||
|
if err := dataStore.Ping(); err != nil {
|
||||||
|
w.WriteHeader(http.StatusServiceUnavailable)
|
||||||
|
w.Write([]byte("db unhealthy"))
|
||||||
|
return
|
||||||
|
}
|
||||||
w.WriteHeader(http.StatusOK)
|
w.WriteHeader(http.StatusOK)
|
||||||
w.Write([]byte("ok"))
|
w.Write([]byte("ok"))
|
||||||
})
|
})
|
||||||
@@ -165,10 +186,34 @@ func main() {
|
|||||||
}
|
}
|
||||||
webServer.SetVersionChecker(versionChecker)
|
webServer.SetVersionChecker(versionChecker)
|
||||||
|
|
||||||
|
// Prune on startup, then daily at configured time (default 04:30)
|
||||||
if cfg.Retention.MaxDays > 0 {
|
if cfg.Retention.MaxDays > 0 {
|
||||||
go pruneLoop(ctx, dataStore, cfg.Retention.MaxDays, logger)
|
pruneAll(dataStore, cfg.Retention.MaxDays, logger)
|
||||||
|
go scheduleDaily(ctx, "prune", cfg.Retention.PruneSchedule, func() {
|
||||||
|
pruneAll(dataStore, cfg.Retention.MaxDays, logger)
|
||||||
|
}, logger)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Staleness checker — runs every 60s
|
||||||
|
stalenessChecker := monitor.NewStalenessChecker(dataStore, staleThreshold, dispatcher.ProcessEvent, logger)
|
||||||
|
go func() {
|
||||||
|
ticker := time.NewTicker(60 * time.Second)
|
||||||
|
defer ticker.Stop()
|
||||||
|
for {
|
||||||
|
select {
|
||||||
|
case <-ctx.Done():
|
||||||
|
return
|
||||||
|
case <-ticker.C:
|
||||||
|
stalenessChecker.Check()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
// Backup deadline checker — runs daily at 05:00 Budapest
|
||||||
|
go scheduleDaily(ctx, "deadline-check", "05:00", func() {
|
||||||
|
monitor.CheckBackupDeadlines(dataStore, stalenessChecker, dispatcher.ProcessEvent, logger)
|
||||||
|
}, logger)
|
||||||
|
|
||||||
// Signal handling
|
// Signal handling
|
||||||
sigCh := make(chan os.Signal, 1)
|
sigCh := make(chan os.Signal, 1)
|
||||||
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
|
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
|
||||||
@@ -244,28 +289,52 @@ func loadConfig(path string, logger *log.Logger) *Config {
|
|||||||
return cfg
|
return cfg
|
||||||
}
|
}
|
||||||
|
|
||||||
func pruneLoop(ctx context.Context, s *store.Store, maxDays int, logger *log.Logger) {
|
// scheduleDaily runs fn once daily at the given "HH:MM" time in Europe/Budapest.
|
||||||
// Prune once on startup
|
// It blocks until ctx is cancelled.
|
||||||
if deleted, err := s.Prune(maxDays); err != nil {
|
func scheduleDaily(ctx context.Context, name, timeStr string, fn func(), logger *log.Logger) {
|
||||||
logger.Printf("[WARN] Prune failed: %v", err)
|
budapest, err := time.LoadLocation("Europe/Budapest")
|
||||||
} else if deleted > 0 {
|
if err != nil {
|
||||||
logger.Printf("[INFO] Pruned %d old report rows", deleted)
|
budapest = time.FixedZone("CET", 3600)
|
||||||
}
|
}
|
||||||
|
|
||||||
// Then daily
|
hour, min := parseHM(timeStr)
|
||||||
ticker := time.NewTicker(24 * time.Hour)
|
|
||||||
defer ticker.Stop()
|
|
||||||
|
|
||||||
for {
|
for {
|
||||||
|
now := time.Now().In(budapest)
|
||||||
|
next := time.Date(now.Year(), now.Month(), now.Day(), hour, min, 0, 0, budapest)
|
||||||
|
if !next.After(now) {
|
||||||
|
next = next.Add(24 * time.Hour)
|
||||||
|
}
|
||||||
|
delay := time.Until(next)
|
||||||
|
logger.Printf("[INFO] %s: next run at %s (in %s)", name, next.Format("2006-01-02 15:04 MST"), delay.Round(time.Second))
|
||||||
|
|
||||||
select {
|
select {
|
||||||
case <-ctx.Done():
|
case <-ctx.Done():
|
||||||
return
|
return
|
||||||
case <-ticker.C:
|
case <-time.After(delay):
|
||||||
if deleted, err := s.Prune(maxDays); err != nil {
|
fn()
|
||||||
logger.Printf("[WARN] Prune failed: %v", err)
|
|
||||||
} else if deleted > 0 {
|
|
||||||
logger.Printf("[INFO] Pruned %d old report rows", deleted)
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// parseHM parses "HH:MM" into hour and minute. Returns 0, 0 on invalid input.
|
||||||
|
func parseHM(s string) (int, int) {
|
||||||
|
var h, m int
|
||||||
|
if _, err := fmt.Sscanf(s, "%d:%d", &h, &m); err != nil {
|
||||||
|
return 0, 0
|
||||||
|
}
|
||||||
|
return h, m
|
||||||
|
}
|
||||||
|
|
||||||
|
func pruneAll(s *store.Store, maxDays int, logger *log.Logger) {
|
||||||
|
if deleted, err := s.Prune(maxDays); err != nil {
|
||||||
|
logger.Printf("[WARN] Prune reports failed: %v", err)
|
||||||
|
} else if deleted > 0 {
|
||||||
|
logger.Printf("[INFO] Pruned %d old report rows", deleted)
|
||||||
|
}
|
||||||
|
if deleted, err := s.PruneEvents(maxDays); err != nil {
|
||||||
|
logger.Printf("[WARN] Prune events failed: %v", err)
|
||||||
|
} else if deleted > 0 {
|
||||||
|
logger.Printf("[INFO] Pruned %d old event rows", deleted)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
+149
-11
@@ -12,6 +12,7 @@ import (
|
|||||||
"time"
|
"time"
|
||||||
|
|
||||||
"gitea.dooplex.hu/admin/felhom-hub/internal/configgen"
|
"gitea.dooplex.hu/admin/felhom-hub/internal/configgen"
|
||||||
|
"gitea.dooplex.hu/admin/felhom-hub/internal/notify"
|
||||||
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
|
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -29,6 +30,7 @@ type Handler struct {
|
|||||||
logger *log.Logger
|
logger *log.Logger
|
||||||
httpClient *http.Client
|
httpClient *http.Client
|
||||||
templateProvider ConfigTemplateProvider
|
templateProvider ConfigTemplateProvider
|
||||||
|
dispatcher *notify.Dispatcher
|
||||||
}
|
}
|
||||||
|
|
||||||
// New creates a new API handler.
|
// New creates a new API handler.
|
||||||
@@ -44,23 +46,40 @@ func New(store *store.Store, apiKey, resendAPIKey, fromEmail string, templatePro
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// SetDispatcher sets the notification dispatcher for event-triggered emails.
|
||||||
|
func (h *Handler) SetDispatcher(d *notify.Dispatcher) {
|
||||||
|
h.dispatcher = d
|
||||||
|
}
|
||||||
|
|
||||||
// checkAuth verifies the Bearer token against the global API key or a per-customer API key.
|
// checkAuth verifies the Bearer token against the global API key or a per-customer API key.
|
||||||
// Returns true if authorized.
|
// Returns true if authorized.
|
||||||
func (h *Handler) checkAuth(r *http.Request) bool {
|
func (h *Handler) checkAuth(r *http.Request) bool {
|
||||||
|
_, _, ok := h.checkAuthCustomer(r)
|
||||||
|
return ok
|
||||||
|
}
|
||||||
|
|
||||||
|
// checkAuthCustomer verifies the Bearer token and returns the authenticated customer identity.
|
||||||
|
// For per-customer keys: returns (customerID, false, true).
|
||||||
|
// For global key: returns ("", true, true) — caller must allow any customer_id.
|
||||||
|
// On failure: returns ("", false, false).
|
||||||
|
func (h *Handler) checkAuthCustomer(r *http.Request) (customerID string, isGlobal bool, ok bool) {
|
||||||
auth := r.Header.Get("Authorization")
|
auth := r.Header.Get("Authorization")
|
||||||
if !strings.HasPrefix(auth, "Bearer ") {
|
if !strings.HasPrefix(auth, "Bearer ") {
|
||||||
return false
|
return "", false, false
|
||||||
}
|
}
|
||||||
token := strings.TrimPrefix(auth, "Bearer ")
|
token := strings.TrimPrefix(auth, "Bearer ")
|
||||||
|
|
||||||
// Check global key first
|
// Check global key first
|
||||||
if h.apiKey != "" && subtle.ConstantTimeCompare([]byte(token), []byte(h.apiKey)) == 1 {
|
if h.apiKey != "" && subtle.ConstantTimeCompare([]byte(token), []byte(h.apiKey)) == 1 {
|
||||||
return true
|
return "", true, true
|
||||||
}
|
}
|
||||||
|
|
||||||
// Check per-customer key
|
// Check per-customer key
|
||||||
cfg, err := h.store.GetCustomerConfigByAPIKey(token)
|
cfg, err := h.store.GetCustomerConfigByAPIKey(token)
|
||||||
return err == nil && cfg != nil
|
if err != nil || cfg == nil {
|
||||||
|
return "", false, false
|
||||||
|
}
|
||||||
|
return cfg.CustomerID, false, true
|
||||||
}
|
}
|
||||||
|
|
||||||
// ServeHTTP routes API requests.
|
// ServeHTTP routes API requests.
|
||||||
@@ -70,6 +89,8 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
|
|||||||
switch {
|
switch {
|
||||||
case r.Method == http.MethodPost && path == "/report":
|
case r.Method == http.MethodPost && path == "/report":
|
||||||
h.handleReport(w, r)
|
h.handleReport(w, r)
|
||||||
|
case r.Method == http.MethodPost && path == "/event":
|
||||||
|
h.handleEvent(w, r)
|
||||||
case r.Method == http.MethodPost && path == "/notify":
|
case r.Method == http.MethodPost && path == "/notify":
|
||||||
h.handleNotify(w, r)
|
h.handleNotify(w, r)
|
||||||
case r.Method == http.MethodPost && path == "/infra-backup":
|
case r.Method == http.MethodPost && path == "/infra-backup":
|
||||||
@@ -97,7 +118,8 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func (h *Handler) handleReport(w http.ResponseWriter, r *http.Request) {
|
func (h *Handler) handleReport(w http.ResponseWriter, r *http.Request) {
|
||||||
if !h.checkAuth(r) {
|
authCustomerID, isGlobal, ok := h.checkAuthCustomer(r)
|
||||||
|
if !ok {
|
||||||
http.Error(w, "Unauthorized", http.StatusUnauthorized)
|
http.Error(w, "Unauthorized", http.StatusUnauthorized)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
@@ -117,6 +139,12 @@ func (h *Handler) handleReport(w http.ResponseWriter, r *http.Request) {
|
|||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Validate customer_id matches authenticated customer (unless global key)
|
||||||
|
if !isGlobal && authCustomerID != payload.CustomerID {
|
||||||
|
http.Error(w, "Forbidden: customer_id mismatch", http.StatusForbidden)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
if err := h.store.SaveReport(payload.CustomerID, body); err != nil {
|
if err := h.store.SaveReport(payload.CustomerID, body); err != nil {
|
||||||
h.logger.Printf("[ERROR] Failed to save report from %s: %v", payload.CustomerID, err)
|
h.logger.Printf("[ERROR] Failed to save report from %s: %v", payload.CustomerID, err)
|
||||||
http.Error(w, "Internal error", http.StatusInternalServerError)
|
http.Error(w, "Internal error", http.StatusInternalServerError)
|
||||||
@@ -128,6 +156,114 @@ func (h *Handler) handleReport(w http.ResponseWriter, r *http.Request) {
|
|||||||
w.Write([]byte(`{"status":"ok"}`))
|
w.Write([]byte(`{"status":"ok"}`))
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// allowedEventTypes lists all valid event_type values the Hub accepts.
|
||||||
|
var allowedEventTypes = map[string]bool{
|
||||||
|
// Controller-pushed events
|
||||||
|
"controller_started": true,
|
||||||
|
"controller_updated": true,
|
||||||
|
"backup_completed": true,
|
||||||
|
"backup_failed": true,
|
||||||
|
"db_dump_completed": true,
|
||||||
|
"db_dump_failed": true,
|
||||||
|
"backup_integrity_ok": true,
|
||||||
|
"backup_integrity_failed": true,
|
||||||
|
"crossdrive_completed": true,
|
||||||
|
"crossdrive_failed": true,
|
||||||
|
"storage_disconnected": true,
|
||||||
|
"storage_reconnected": true,
|
||||||
|
"disk_warning": true,
|
||||||
|
"disk_critical": true,
|
||||||
|
"health_degraded": true,
|
||||||
|
"health_critical": true,
|
||||||
|
"health_recovered": true,
|
||||||
|
"app_deployed": true,
|
||||||
|
"app_removed": true,
|
||||||
|
"disaster_recovery_started": true,
|
||||||
|
"disaster_recovery_completed": true,
|
||||||
|
// Hub-generated events
|
||||||
|
"node_stale": true,
|
||||||
|
"node_down": true,
|
||||||
|
"node_recovered": true,
|
||||||
|
"expected_backup_missed": true,
|
||||||
|
"expected_dbdump_missed": true,
|
||||||
|
// Special
|
||||||
|
"test": true,
|
||||||
|
}
|
||||||
|
|
||||||
|
// handleEvent processes structured events from controllers (new endpoint, replaces /notify for updated controllers).
|
||||||
|
func (h *Handler) handleEvent(w http.ResponseWriter, r *http.Request) {
|
||||||
|
authCustomerID, isGlobal, ok := h.checkAuthCustomer(r)
|
||||||
|
if !ok {
|
||||||
|
http.Error(w, "Unauthorized", http.StatusUnauthorized)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20))
|
||||||
|
if err != nil {
|
||||||
|
http.Error(w, "Bad request", http.StatusBadRequest)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
var payload struct {
|
||||||
|
CustomerID string `json:"customer_id"`
|
||||||
|
EventType string `json:"event_type"`
|
||||||
|
Severity string `json:"severity"`
|
||||||
|
Message string `json:"message"`
|
||||||
|
Details json.RawMessage `json:"details"`
|
||||||
|
}
|
||||||
|
if err := json.Unmarshal(body, &payload); err != nil {
|
||||||
|
http.Error(w, "Invalid JSON", http.StatusBadRequest)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if payload.CustomerID == "" || payload.EventType == "" {
|
||||||
|
http.Error(w, "customer_id and event_type are required", http.StatusBadRequest)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Validate customer_id matches authenticated customer (unless global key)
|
||||||
|
if !isGlobal && authCustomerID != payload.CustomerID {
|
||||||
|
http.Error(w, "Forbidden: customer_id mismatch", http.StatusForbidden)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Validate event_type
|
||||||
|
if !allowedEventTypes[payload.EventType] {
|
||||||
|
http.Error(w, fmt.Sprintf("Invalid event_type: %s", payload.EventType), http.StatusBadRequest)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Validate/default severity
|
||||||
|
switch payload.Severity {
|
||||||
|
case "info", "warning", "error":
|
||||||
|
default:
|
||||||
|
payload.Severity = "info"
|
||||||
|
}
|
||||||
|
|
||||||
|
// Store details as JSON string
|
||||||
|
detailsStr := "{}"
|
||||||
|
if len(payload.Details) > 0 && string(payload.Details) != "null" {
|
||||||
|
detailsStr = string(payload.Details)
|
||||||
|
}
|
||||||
|
|
||||||
|
_, err = h.store.SaveEvent(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, detailsStr, "controller")
|
||||||
|
if err != nil {
|
||||||
|
h.logger.Printf("[ERROR] Failed to save event from %s: %v", payload.CustomerID, err)
|
||||||
|
http.Error(w, "Internal error", http.StatusInternalServerError)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
h.logger.Printf("[INFO] Event from %s: %s (%s) — %s", payload.CustomerID, payload.EventType, payload.Severity, payload.Message)
|
||||||
|
|
||||||
|
// Dispatch notifications (non-blocking)
|
||||||
|
if h.dispatcher != nil {
|
||||||
|
go h.dispatcher.ProcessEvent(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, detailsStr, "controller")
|
||||||
|
}
|
||||||
|
|
||||||
|
w.Header().Set("Content-Type", "application/json")
|
||||||
|
w.WriteHeader(http.StatusOK)
|
||||||
|
w.Write([]byte(`{"ok":true}`))
|
||||||
|
}
|
||||||
|
|
||||||
func (h *Handler) handleCustomers(w http.ResponseWriter, r *http.Request) {
|
func (h *Handler) handleCustomers(w http.ResponseWriter, r *http.Request) {
|
||||||
customers, err := h.store.GetCustomers()
|
customers, err := h.store.GetCustomers()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
@@ -258,7 +394,7 @@ func (h *Handler) handleNotify(w http.ResponseWriter, r *http.Request) {
|
|||||||
// Check if customer is blocked
|
// Check if customer is blocked
|
||||||
if h.store.IsCustomerBlocked(payload.CustomerID) {
|
if h.store.IsCustomerBlocked(payload.CustomerID) {
|
||||||
h.logger.Printf("[INFO] Notification suppressed for blocked customer %s", payload.CustomerID)
|
h.logger.Printf("[INFO] Notification suppressed for blocked customer %s", payload.CustomerID)
|
||||||
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "skipped", "customer blocked")
|
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "skipped", "customer blocked", "customer")
|
||||||
w.WriteHeader(http.StatusOK)
|
w.WriteHeader(http.StatusOK)
|
||||||
w.Write([]byte(`{"status":"ok","sent":false,"reason":"blocked"}`))
|
w.Write([]byte(`{"status":"ok","sent":false,"reason":"blocked"}`))
|
||||||
return
|
return
|
||||||
@@ -275,7 +411,7 @@ func (h *Handler) handleNotify(w http.ResponseWriter, r *http.Request) {
|
|||||||
// Check if customer has email configured and event type is enabled
|
// Check if customer has email configured and event type is enabled
|
||||||
if prefs == nil || prefs.Email == "" {
|
if prefs == nil || prefs.Email == "" {
|
||||||
h.logger.Printf("[INFO] No email configured for %s, skipping notification", payload.CustomerID)
|
h.logger.Printf("[INFO] No email configured for %s, skipping notification", payload.CustomerID)
|
||||||
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "skipped", "no email configured")
|
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "skipped", "no email configured", "customer")
|
||||||
w.WriteHeader(http.StatusOK)
|
w.WriteHeader(http.StatusOK)
|
||||||
w.Write([]byte(`{"status":"ok","sent":false,"reason":"no_email"}`))
|
w.Write([]byte(`{"status":"ok","sent":false,"reason":"no_email"}`))
|
||||||
return
|
return
|
||||||
@@ -291,7 +427,7 @@ func (h *Handler) handleNotify(w http.ResponseWriter, r *http.Request) {
|
|||||||
}
|
}
|
||||||
if !eventEnabled {
|
if !eventEnabled {
|
||||||
h.logger.Printf("[INFO] Event %s not enabled for %s, skipping", payload.EventType, payload.CustomerID)
|
h.logger.Printf("[INFO] Event %s not enabled for %s, skipping", payload.EventType, payload.CustomerID)
|
||||||
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "skipped", "event not enabled")
|
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "skipped", "event not enabled", "customer")
|
||||||
w.WriteHeader(http.StatusOK)
|
w.WriteHeader(http.StatusOK)
|
||||||
w.Write([]byte(`{"status":"ok","sent":false,"reason":"event_disabled"}`))
|
w.Write([]byte(`{"status":"ok","sent":false,"reason":"event_disabled"}`))
|
||||||
return
|
return
|
||||||
@@ -300,7 +436,7 @@ func (h *Handler) handleNotify(w http.ResponseWriter, r *http.Request) {
|
|||||||
// Send email via Resend API
|
// Send email via Resend API
|
||||||
if h.resendAPIKey == "" {
|
if h.resendAPIKey == "" {
|
||||||
h.logger.Printf("[WARN] Resend API key not configured, cannot send notification email")
|
h.logger.Printf("[WARN] Resend API key not configured, cannot send notification email")
|
||||||
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "skipped", "resend api key not configured")
|
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "skipped", "resend api key not configured", "customer")
|
||||||
w.WriteHeader(http.StatusOK)
|
w.WriteHeader(http.StatusOK)
|
||||||
w.Write([]byte(`{"status":"ok","sent":false,"reason":"no_api_key"}`))
|
w.Write([]byte(`{"status":"ok","sent":false,"reason":"no_api_key"}`))
|
||||||
return
|
return
|
||||||
@@ -310,13 +446,13 @@ func (h *Handler) handleNotify(w http.ResponseWriter, r *http.Request) {
|
|||||||
sendErr := h.sendResendEmail(prefs.Email, subject, emailBody)
|
sendErr := h.sendResendEmail(prefs.Email, subject, emailBody)
|
||||||
if sendErr != nil {
|
if sendErr != nil {
|
||||||
h.logger.Printf("[ERROR] Failed to send notification email to %s: %v", prefs.Email, sendErr)
|
h.logger.Printf("[ERROR] Failed to send notification email to %s: %v", prefs.Email, sendErr)
|
||||||
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "failed", sendErr.Error())
|
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "failed", sendErr.Error(), "customer")
|
||||||
http.Error(w, "Failed to send email", http.StatusInternalServerError)
|
http.Error(w, "Failed to send email", http.StatusInternalServerError)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
h.logger.Printf("[INFO] Notification email sent to %s for %s/%s", prefs.Email, payload.CustomerID, payload.EventType)
|
h.logger.Printf("[INFO] Notification email sent to %s for %s/%s", prefs.Email, payload.CustomerID, payload.EventType)
|
||||||
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "sent", "")
|
h.store.LogNotification(payload.CustomerID, payload.EventType, payload.Severity, payload.Message, "sent", "", "customer")
|
||||||
|
|
||||||
w.WriteHeader(http.StatusOK)
|
w.WriteHeader(http.StatusOK)
|
||||||
w.Write([]byte(`{"status":"ok","sent":true}`))
|
w.Write([]byte(`{"status":"ok","sent":true}`))
|
||||||
@@ -339,13 +475,14 @@ func (h *Handler) handleSavePreferences(w http.ResponseWriter, r *http.Request)
|
|||||||
CustomerID string `json:"customer_id"`
|
CustomerID string `json:"customer_id"`
|
||||||
Email string `json:"email"`
|
Email string `json:"email"`
|
||||||
EnabledEvents []string `json:"enabled_events"`
|
EnabledEvents []string `json:"enabled_events"`
|
||||||
|
CooldownHours int `json:"cooldown_hours"`
|
||||||
}
|
}
|
||||||
if err := json.Unmarshal(body, &payload); err != nil || payload.CustomerID == "" {
|
if err := json.Unmarshal(body, &payload); err != nil || payload.CustomerID == "" {
|
||||||
http.Error(w, "Invalid payload: customer_id required", http.StatusBadRequest)
|
http.Error(w, "Invalid payload: customer_id required", http.StatusBadRequest)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
if err := h.store.SaveNotificationPrefs(payload.CustomerID, payload.Email, payload.EnabledEvents); err != nil {
|
if err := h.store.SaveNotificationPrefs(payload.CustomerID, payload.Email, payload.EnabledEvents, payload.CooldownHours); err != nil {
|
||||||
h.logger.Printf("[ERROR] Failed to save notification prefs for %s: %v", payload.CustomerID, err)
|
h.logger.Printf("[ERROR] Failed to save notification prefs for %s: %v", payload.CustomerID, err)
|
||||||
http.Error(w, "Internal error", http.StatusInternalServerError)
|
http.Error(w, "Internal error", http.StatusInternalServerError)
|
||||||
return
|
return
|
||||||
@@ -503,6 +640,7 @@ func formatNotificationEmail(customerID, eventType, severity, message, details s
|
|||||||
severityLabel := map[string]string{
|
severityLabel := map[string]string{
|
||||||
"info": "Információ",
|
"info": "Információ",
|
||||||
"warning": "Figyelmeztetés",
|
"warning": "Figyelmeztetés",
|
||||||
|
"error": "Hiba",
|
||||||
"critical": "Kritikus",
|
"critical": "Kritikus",
|
||||||
}
|
}
|
||||||
label := severityLabel[severity]
|
label := severityLabel[severity]
|
||||||
|
|||||||
@@ -0,0 +1,86 @@
|
|||||||
|
package monitor
|
||||||
|
|
||||||
|
import (
|
||||||
|
"log"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
|
||||||
|
)
|
||||||
|
|
||||||
|
// budapest returns the Europe/Budapest timezone (cached).
|
||||||
|
var budapest *time.Location
|
||||||
|
|
||||||
|
func init() {
|
||||||
|
var err error
|
||||||
|
budapest, err = time.LoadLocation("Europe/Budapest")
|
||||||
|
if err != nil {
|
||||||
|
// Fallback: UTC+1 (CET base; DST handled by OS if available)
|
||||||
|
budapest = time.FixedZone("CET", 3600)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// CheckBackupDeadlines checks whether active customers had their expected
|
||||||
|
// daily backups and DB dumps. Runs once daily at 05:00 Budapest time.
|
||||||
|
//
|
||||||
|
// For each active customer, it checks for backup_completed and db_dump_completed
|
||||||
|
// events since Budapest midnight. If neither success nor failure events exist,
|
||||||
|
// it inserts expected_backup_missed / expected_dbdump_missed events.
|
||||||
|
//
|
||||||
|
// Customers whose nodes are "down" (no report in >1h) are skipped — they
|
||||||
|
// already have staleness events.
|
||||||
|
func CheckBackupDeadlines(s *store.Store, staleness *StalenessChecker, onEvent EventNotifyFunc, logger *log.Logger) {
|
||||||
|
customerIDs, err := s.GetActiveCustomerIDs()
|
||||||
|
if err != nil {
|
||||||
|
logger.Printf("[WARN] Deadline check: failed to get active customers: %v", err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Budapest midnight today
|
||||||
|
now := time.Now().In(budapest)
|
||||||
|
midnightBudapest := time.Date(now.Year(), now.Month(), now.Day(), 0, 0, 0, 0, budapest)
|
||||||
|
sinceUTC := midnightBudapest.UTC()
|
||||||
|
|
||||||
|
var backupMissed, dbdumpMissed, skipped int
|
||||||
|
|
||||||
|
for _, id := range customerIDs {
|
||||||
|
// Skip nodes that are down — they already have staleness events
|
||||||
|
if staleness != nil && staleness.GetState(id) == "down" {
|
||||||
|
skipped++
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check blocked
|
||||||
|
if s.IsCustomerBlocked(id) {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check backup_completed / backup_failed since midnight
|
||||||
|
backupOK, _ := s.GetEventsByType(id, "backup_completed", sinceUTC)
|
||||||
|
backupFailed, _ := s.GetEventsByType(id, "backup_failed", sinceUTC)
|
||||||
|
if len(backupOK) == 0 && len(backupFailed) == 0 {
|
||||||
|
msg := "No backup completed or failed since midnight"
|
||||||
|
if _, err := s.SaveEvent(id, "expected_backup_missed", "error", msg, "{}", "hub"); err != nil {
|
||||||
|
logger.Printf("[WARN] Failed to save expected_backup_missed for %s: %v", id, err)
|
||||||
|
} else if onEvent != nil {
|
||||||
|
onEvent(id, "expected_backup_missed", "error", msg, "{}", "hub")
|
||||||
|
}
|
||||||
|
backupMissed++
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check db_dump_completed / db_dump_failed since midnight
|
||||||
|
dumpOK, _ := s.GetEventsByType(id, "db_dump_completed", sinceUTC)
|
||||||
|
dumpFailed, _ := s.GetEventsByType(id, "db_dump_failed", sinceUTC)
|
||||||
|
if len(dumpOK) == 0 && len(dumpFailed) == 0 {
|
||||||
|
msg := "No DB dump completed or failed since midnight"
|
||||||
|
if _, err := s.SaveEvent(id, "expected_dbdump_missed", "error", msg, "{}", "hub"); err != nil {
|
||||||
|
logger.Printf("[WARN] Failed to save expected_dbdump_missed for %s: %v", id, err)
|
||||||
|
} else if onEvent != nil {
|
||||||
|
onEvent(id, "expected_dbdump_missed", "error", msg, "{}", "hub")
|
||||||
|
}
|
||||||
|
dbdumpMissed++
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
logger.Printf("[INFO] Deadline check: %d customers, %d backup missed, %d dbdump missed, %d skipped (down)",
|
||||||
|
len(customerIDs), backupMissed, dbdumpMissed, skipped)
|
||||||
|
}
|
||||||
@@ -0,0 +1,183 @@
|
|||||||
|
package monitor
|
||||||
|
|
||||||
|
import (
|
||||||
|
"fmt"
|
||||||
|
"log"
|
||||||
|
"sync"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
|
||||||
|
)
|
||||||
|
|
||||||
|
// EventNotifyFunc is called after a hub-generated event is saved,
|
||||||
|
// to trigger notification dispatch. Keeps monitor decoupled from notify.
|
||||||
|
type EventNotifyFunc func(customerID, eventType, severity, message, detailsJSON, source string)
|
||||||
|
|
||||||
|
// StalenessChecker monitors customer report freshness and generates
|
||||||
|
// node_stale / node_down / node_recovered events on state transitions.
|
||||||
|
type StalenessChecker struct {
|
||||||
|
store *store.Store
|
||||||
|
threshold time.Duration // "stale" after this duration (default 30m)
|
||||||
|
downAfter time.Duration // "down" after this duration (2x threshold)
|
||||||
|
logger *log.Logger
|
||||||
|
onEvent EventNotifyFunc
|
||||||
|
|
||||||
|
mu sync.Mutex
|
||||||
|
states map[string]string // customerID → "ok" | "stale" | "down"
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewStalenessChecker creates a checker and initializes state from current data.
|
||||||
|
// No events are generated during initialization (binding #12).
|
||||||
|
// onEvent is called after each hub-generated event is saved (may be nil).
|
||||||
|
func NewStalenessChecker(s *store.Store, threshold time.Duration, onEvent EventNotifyFunc, logger *log.Logger) *StalenessChecker {
|
||||||
|
sc := &StalenessChecker{
|
||||||
|
store: s,
|
||||||
|
threshold: threshold,
|
||||||
|
downAfter: 2 * threshold,
|
||||||
|
logger: logger,
|
||||||
|
onEvent: onEvent,
|
||||||
|
states: make(map[string]string),
|
||||||
|
}
|
||||||
|
|
||||||
|
// Seed states from current report timestamps — no events on init
|
||||||
|
customers, err := s.GetCustomers()
|
||||||
|
if err != nil {
|
||||||
|
logger.Printf("[WARN] Staleness checker: failed to seed states: %v", err)
|
||||||
|
return sc
|
||||||
|
}
|
||||||
|
|
||||||
|
var okCount, staleCount, downCount int
|
||||||
|
for _, c := range customers {
|
||||||
|
if s.IsCustomerBlocked(c.CustomerID) {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
age := time.Since(c.ReceivedAt)
|
||||||
|
switch {
|
||||||
|
case age > sc.downAfter:
|
||||||
|
sc.states[c.CustomerID] = "down"
|
||||||
|
downCount++
|
||||||
|
case age > sc.threshold:
|
||||||
|
sc.states[c.CustomerID] = "stale"
|
||||||
|
staleCount++
|
||||||
|
default:
|
||||||
|
sc.states[c.CustomerID] = "ok"
|
||||||
|
okCount++
|
||||||
|
}
|
||||||
|
}
|
||||||
|
logger.Printf("[INFO] Staleness checker initialized: %d ok, %d stale, %d down", okCount, staleCount, downCount)
|
||||||
|
|
||||||
|
return sc
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check evaluates all customers and emits events on state transitions.
|
||||||
|
// Should be called periodically (every 60s).
|
||||||
|
func (sc *StalenessChecker) Check() {
|
||||||
|
customers, err := sc.store.GetCustomers()
|
||||||
|
if err != nil {
|
||||||
|
sc.logger.Printf("[WARN] Staleness check failed: %v", err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
sc.mu.Lock()
|
||||||
|
defer sc.mu.Unlock()
|
||||||
|
|
||||||
|
// Track which customers are still present (to clean up removed ones)
|
||||||
|
seen := make(map[string]bool, len(customers))
|
||||||
|
|
||||||
|
for _, c := range customers {
|
||||||
|
seen[c.CustomerID] = true
|
||||||
|
|
||||||
|
if sc.store.IsCustomerBlocked(c.CustomerID) {
|
||||||
|
delete(sc.states, c.CustomerID)
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
age := time.Since(c.ReceivedAt)
|
||||||
|
var newState string
|
||||||
|
switch {
|
||||||
|
case age > sc.downAfter:
|
||||||
|
newState = "down"
|
||||||
|
case age > sc.threshold:
|
||||||
|
newState = "stale"
|
||||||
|
default:
|
||||||
|
newState = "ok"
|
||||||
|
}
|
||||||
|
|
||||||
|
oldState := sc.states[c.CustomerID]
|
||||||
|
if oldState == "" {
|
||||||
|
// New customer — set state without event
|
||||||
|
sc.states[c.CustomerID] = newState
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
if oldState == newState {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// State transition — emit event
|
||||||
|
sc.states[c.CustomerID] = newState
|
||||||
|
sc.emitTransition(c.CustomerID, oldState, newState, age)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Clean up customers that no longer have reports
|
||||||
|
for id := range sc.states {
|
||||||
|
if !seen[id] {
|
||||||
|
delete(sc.states, id)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetState returns the current staleness state for a customer.
|
||||||
|
func (sc *StalenessChecker) GetState(customerID string) string {
|
||||||
|
sc.mu.Lock()
|
||||||
|
defer sc.mu.Unlock()
|
||||||
|
s := sc.states[customerID]
|
||||||
|
if s == "" {
|
||||||
|
return "unknown"
|
||||||
|
}
|
||||||
|
return s
|
||||||
|
}
|
||||||
|
|
||||||
|
func (sc *StalenessChecker) emitTransition(customerID, oldState, newState string, age time.Duration) {
|
||||||
|
var eventType, severity, message string
|
||||||
|
|
||||||
|
switch {
|
||||||
|
case newState == "stale":
|
||||||
|
eventType = "node_stale"
|
||||||
|
severity = "warning"
|
||||||
|
message = "No report received for " + formatDuration(age)
|
||||||
|
case newState == "down":
|
||||||
|
eventType = "node_down"
|
||||||
|
severity = "error"
|
||||||
|
message = "No report received for " + formatDuration(age)
|
||||||
|
case newState == "ok" && (oldState == "stale" || oldState == "down"):
|
||||||
|
eventType = "node_recovered"
|
||||||
|
severity = "info"
|
||||||
|
message = "Reports resumed (was " + oldState + " for " + formatDuration(age) + ")"
|
||||||
|
default:
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
sc.logger.Printf("[INFO] Staleness: %s %s → %s (%s)", customerID, oldState, newState, eventType)
|
||||||
|
|
||||||
|
if _, err := sc.store.SaveEvent(customerID, eventType, severity, message, "{}", "hub"); err != nil {
|
||||||
|
sc.logger.Printf("[WARN] Failed to save staleness event for %s: %v", customerID, err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
if sc.onEvent != nil {
|
||||||
|
sc.onEvent(customerID, eventType, severity, message, "{}", "hub")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func formatDuration(d time.Duration) string {
|
||||||
|
if d < time.Hour {
|
||||||
|
return fmt.Sprintf("%dm", int(d.Minutes()))
|
||||||
|
}
|
||||||
|
h := int(d.Hours())
|
||||||
|
m := int(d.Minutes()) % 60
|
||||||
|
if m == 0 {
|
||||||
|
return fmt.Sprintf("%dh", h)
|
||||||
|
}
|
||||||
|
return fmt.Sprintf("%dh%dm", h, m)
|
||||||
|
}
|
||||||
@@ -0,0 +1,201 @@
|
|||||||
|
package notify
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"encoding/json"
|
||||||
|
"fmt"
|
||||||
|
"io"
|
||||||
|
"log"
|
||||||
|
"net/http"
|
||||||
|
"sync"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"gitea.dooplex.hu/admin/felhom-hub/internal/store"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Dispatcher routes events to operator and/or customer email channels.
|
||||||
|
// Cooldowns are in-memory (lost on restart, acceptable).
|
||||||
|
type Dispatcher struct {
|
||||||
|
store *store.Store
|
||||||
|
resendAPIKey string
|
||||||
|
fromEmail string
|
||||||
|
operatorEmail string
|
||||||
|
operatorOn bool
|
||||||
|
httpClient *http.Client
|
||||||
|
logger *log.Logger
|
||||||
|
|
||||||
|
mu sync.Mutex
|
||||||
|
opCooldowns map[string]time.Time // "customerID:eventType" → last operator notify
|
||||||
|
custCooldowns map[string]time.Time // "customerID:eventType" → last customer notify
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewDispatcher creates a new notification dispatcher.
|
||||||
|
func NewDispatcher(s *store.Store, resendAPIKey, fromEmail, operatorEmail string, operatorOn bool, logger *log.Logger) *Dispatcher {
|
||||||
|
return &Dispatcher{
|
||||||
|
store: s,
|
||||||
|
resendAPIKey: resendAPIKey,
|
||||||
|
fromEmail: fromEmail,
|
||||||
|
operatorEmail: operatorEmail,
|
||||||
|
operatorOn: operatorOn,
|
||||||
|
httpClient: &http.Client{Timeout: 10 * time.Second},
|
||||||
|
logger: logger,
|
||||||
|
opCooldowns: make(map[string]time.Time),
|
||||||
|
custCooldowns: make(map[string]time.Time),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ProcessEvent evaluates an event and sends notifications as appropriate.
|
||||||
|
// Safe to call from goroutines.
|
||||||
|
func (d *Dispatcher) ProcessEvent(customerID, eventType, severity, message, detailsJSON, source string) {
|
||||||
|
if d.resendAPIKey == "" {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// "test" bypass — send directly to customer email, skip prefs/cooldown
|
||||||
|
if eventType == "test" {
|
||||||
|
d.sendTestEmail(customerID)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Only warning and error severity trigger notifications
|
||||||
|
if severity != "warning" && severity != "error" {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Operator channel
|
||||||
|
d.processOperator(customerID, eventType, severity, message, detailsJSON, source)
|
||||||
|
|
||||||
|
// Customer channel
|
||||||
|
d.processCustomer(customerID, eventType, severity, message, detailsJSON, source)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (d *Dispatcher) sendTestEmail(customerID string) {
|
||||||
|
prefs, err := d.store.GetNotificationPrefs(customerID)
|
||||||
|
if err != nil || prefs.Email == "" {
|
||||||
|
d.logger.Printf("[WARN] Test email: no email configured for %s", customerID)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
subject := "[Felhom] Teszt értesítés"
|
||||||
|
body := "Kedves Ügyfél!\n\nEz egy teszt értesítés a Felhom monitoring rendszerből.\nAz értesítések megfelelően működnek.\n\nÜdvözlettel,\nFelhom.eu monitoring"
|
||||||
|
|
||||||
|
if err := d.sendEmail(prefs.Email, subject, body); err != nil {
|
||||||
|
d.logger.Printf("[ERROR] Test email to %s failed: %v", prefs.Email, err)
|
||||||
|
d.store.LogNotification(customerID, "test", "info", "Teszt értesítés", "failed", err.Error(), "customer")
|
||||||
|
return
|
||||||
|
}
|
||||||
|
d.logger.Printf("[INFO] Test email sent to %s for %s", prefs.Email, customerID)
|
||||||
|
d.store.LogNotification(customerID, "test", "info", "Teszt értesítés", "sent", "", "customer")
|
||||||
|
}
|
||||||
|
|
||||||
|
func (d *Dispatcher) processOperator(customerID, eventType, severity, message, detailsJSON, source string) {
|
||||||
|
if !d.operatorOn || d.operatorEmail == "" {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
cooldownKey := customerID + ":" + eventType
|
||||||
|
d.mu.Lock()
|
||||||
|
if last, ok := d.opCooldowns[cooldownKey]; ok && time.Since(last) < 1*time.Hour {
|
||||||
|
d.mu.Unlock()
|
||||||
|
return
|
||||||
|
}
|
||||||
|
d.opCooldowns[cooldownKey] = time.Now()
|
||||||
|
d.mu.Unlock()
|
||||||
|
|
||||||
|
subject, body := FormatOperatorEmail(customerID, eventType, severity, message, detailsJSON)
|
||||||
|
|
||||||
|
if err := d.sendEmail(d.operatorEmail, subject, body); err != nil {
|
||||||
|
d.logger.Printf("[ERROR] Operator email failed for %s/%s: %v", customerID, eventType, err)
|
||||||
|
d.store.LogNotification(customerID, eventType, severity, message, "failed", err.Error(), "operator")
|
||||||
|
return
|
||||||
|
}
|
||||||
|
d.logger.Printf("[INFO] Operator email sent for %s/%s", customerID, eventType)
|
||||||
|
d.store.LogNotification(customerID, eventType, severity, message, "sent", "", "operator")
|
||||||
|
}
|
||||||
|
|
||||||
|
func (d *Dispatcher) processCustomer(customerID, eventType, severity, message, detailsJSON, source string) {
|
||||||
|
// Check if customer is blocked
|
||||||
|
if d.store.IsCustomerBlocked(customerID) {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Load preferences
|
||||||
|
prefs, err := d.store.GetNotificationPrefs(customerID)
|
||||||
|
if err != nil || prefs.Email == "" {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check if event type is enabled
|
||||||
|
if !isEventEnabled(prefs.EnabledEvents, eventType) {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Customer cooldown (from prefs, default 6h)
|
||||||
|
cooldownHours := prefs.CooldownHours
|
||||||
|
if cooldownHours <= 0 {
|
||||||
|
cooldownHours = 6
|
||||||
|
}
|
||||||
|
cooldownDur := time.Duration(cooldownHours) * time.Hour
|
||||||
|
|
||||||
|
cooldownKey := customerID + ":" + eventType
|
||||||
|
d.mu.Lock()
|
||||||
|
if last, ok := d.custCooldowns[cooldownKey]; ok && time.Since(last) < cooldownDur {
|
||||||
|
d.mu.Unlock()
|
||||||
|
return
|
||||||
|
}
|
||||||
|
d.custCooldowns[cooldownKey] = time.Now()
|
||||||
|
d.mu.Unlock()
|
||||||
|
|
||||||
|
subject, body := FormatCustomerEmail(customerID, eventType, severity, message, detailsJSON)
|
||||||
|
|
||||||
|
if err := d.sendEmail(prefs.Email, subject, body); err != nil {
|
||||||
|
d.logger.Printf("[ERROR] Customer email failed for %s/%s: %v", customerID, eventType, err)
|
||||||
|
d.store.LogNotification(customerID, eventType, severity, message, "failed", err.Error(), "customer")
|
||||||
|
return
|
||||||
|
}
|
||||||
|
d.logger.Printf("[INFO] Customer email sent to %s for %s/%s", prefs.Email, customerID, eventType)
|
||||||
|
d.store.LogNotification(customerID, eventType, severity, message, "sent", "", "customer")
|
||||||
|
}
|
||||||
|
|
||||||
|
func (d *Dispatcher) sendEmail(to, subject, textBody string) error {
|
||||||
|
payload := map[string]interface{}{
|
||||||
|
"from": d.fromEmail,
|
||||||
|
"to": []string{to},
|
||||||
|
"subject": subject,
|
||||||
|
"text": textBody,
|
||||||
|
}
|
||||||
|
|
||||||
|
jsonData, err := json.Marshal(payload)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("marshaling email payload: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
req, err := http.NewRequest("POST", "https://api.resend.com/emails", bytes.NewReader(jsonData))
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("creating request: %w", err)
|
||||||
|
}
|
||||||
|
req.Header.Set("Authorization", "Bearer "+d.resendAPIKey)
|
||||||
|
req.Header.Set("Content-Type", "application/json")
|
||||||
|
|
||||||
|
resp, err := d.httpClient.Do(req)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("sending request: %w", err)
|
||||||
|
}
|
||||||
|
defer resp.Body.Close()
|
||||||
|
|
||||||
|
if resp.StatusCode >= 400 {
|
||||||
|
respBody, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
|
||||||
|
return fmt.Errorf("resend API returned %d: %s", resp.StatusCode, string(respBody))
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func isEventEnabled(enabledEvents []string, eventType string) bool {
|
||||||
|
for _, e := range enabledEvents {
|
||||||
|
if e == eventType {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return false
|
||||||
|
}
|
||||||
@@ -0,0 +1,153 @@
|
|||||||
|
package notify
|
||||||
|
|
||||||
|
import (
|
||||||
|
"fmt"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// budapest timezone for formatting.
|
||||||
|
var budapest *time.Location
|
||||||
|
|
||||||
|
func init() {
|
||||||
|
var err error
|
||||||
|
budapest, err = time.LoadLocation("Europe/Budapest")
|
||||||
|
if err != nil {
|
||||||
|
budapest = time.FixedZone("CET", 3600)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// Operator email — concise, English
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
// FormatOperatorEmail returns (subject, textBody) for the operator channel.
|
||||||
|
func FormatOperatorEmail(customerID, eventType, severity, message, detailsJSON string) (string, string) {
|
||||||
|
icon := "⚠️"
|
||||||
|
if severity == "error" {
|
||||||
|
icon = "🔴"
|
||||||
|
}
|
||||||
|
|
||||||
|
subject := fmt.Sprintf("[Felhom] %s %s: %s", icon, customerID, eventType)
|
||||||
|
|
||||||
|
now := time.Now().In(budapest).Format("2006-01-02 15:04 MST")
|
||||||
|
body := fmt.Sprintf(`Customer: %s
|
||||||
|
Event: %s
|
||||||
|
Severity: %s
|
||||||
|
Time: %s
|
||||||
|
Message: %s`, customerID, eventType, severity, now, message)
|
||||||
|
|
||||||
|
if detailsJSON != "" && detailsJSON != "{}" {
|
||||||
|
body += fmt.Sprintf("\nDetails: %s", detailsJSON)
|
||||||
|
}
|
||||||
|
|
||||||
|
body += fmt.Sprintf("\n\nDashboard: https://hub.felhom.eu/customers/%s", customerID)
|
||||||
|
|
||||||
|
return subject, body
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// Customer email — Hungarian, friendly
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
// customerMessages maps event_type → Hungarian customer message.
|
||||||
|
var customerMessages = map[string]string{
|
||||||
|
// Backup events
|
||||||
|
"backup_completed": "A biztonsági mentés sikeresen elkészült.",
|
||||||
|
"backup_failed": "A biztonsági mentés sikertelen! Kérjük, ellenőrizd a rendszert.",
|
||||||
|
"db_dump_completed": "Az adatbázis mentés sikeresen elkészült.",
|
||||||
|
"db_dump_failed": "Az adatbázis mentés sikertelen!",
|
||||||
|
"backup_integrity_ok": "A mentés integritás ellenőrzés sikeres.",
|
||||||
|
"backup_integrity_failed": "A mentés integritás ellenőrzés hibát talált!",
|
||||||
|
"crossdrive_completed": "A másodlagos mentés sikeresen elkészült.",
|
||||||
|
"crossdrive_failed": "A másodlagos mentés sikertelen!",
|
||||||
|
|
||||||
|
// Disk events
|
||||||
|
"disk_warning": "A lemezterület 90% felett van — kérjük, szabadíts fel helyet.",
|
||||||
|
"disk_critical": "A lemezterület kritikusan magas (95%+) — azonnali beavatkozás szükséges!",
|
||||||
|
|
||||||
|
// Storage events
|
||||||
|
"storage_disconnected": "Egy meghajtó leválasztva — a mentések szünetelhetnek.",
|
||||||
|
"storage_reconnected": "A meghajtó újra csatlakoztatva.",
|
||||||
|
|
||||||
|
// Staleness events (Hub-generated)
|
||||||
|
"node_stale": "A szerver nem küldött jelentést az elmúlt időszakban.",
|
||||||
|
"node_down": "A szerver nem elérhető!",
|
||||||
|
"node_recovered": "A szerver újra elérhető.",
|
||||||
|
|
||||||
|
// Health events
|
||||||
|
"health_degraded": "A rendszer állapota romlott.",
|
||||||
|
"health_critical": "A rendszer állapota kritikus!",
|
||||||
|
"health_recovered": "A rendszer állapota helyreállt.",
|
||||||
|
|
||||||
|
// Controller events
|
||||||
|
"controller_started": "A vezérlő elindult.",
|
||||||
|
"controller_updated": "A vezérlő frissítve lett.",
|
||||||
|
|
||||||
|
// Deadline events (Hub-generated)
|
||||||
|
"expected_backup_missed": "A mai biztonsági mentés nem készült el a határidőig!",
|
||||||
|
"expected_dbdump_missed": "A mai adatbázis mentés nem készült el a határidőig!",
|
||||||
|
|
||||||
|
// App lifecycle events
|
||||||
|
"app_deployed": "Alkalmazás telepítve.",
|
||||||
|
"app_removed": "Alkalmazás eltávolítva.",
|
||||||
|
|
||||||
|
// Disaster recovery events
|
||||||
|
"disaster_recovery_started": "Katasztrófa helyreállítás elindítva.",
|
||||||
|
"disaster_recovery_completed": "Katasztrófa helyreállítás befejezve.",
|
||||||
|
|
||||||
|
// Test
|
||||||
|
"test": "Ez egy teszt értesítés.",
|
||||||
|
}
|
||||||
|
|
||||||
|
// severityLabels maps severity to Hungarian labels.
|
||||||
|
var severityLabels = map[string]string{
|
||||||
|
"info": "Információ",
|
||||||
|
"warning": "Figyelmeztetés",
|
||||||
|
"error": "Hiba",
|
||||||
|
}
|
||||||
|
|
||||||
|
// FormatCustomerEmail returns (subject, textBody) for the customer channel.
|
||||||
|
func FormatCustomerEmail(customerID, eventType, severity, message, detailsJSON string) (string, string) {
|
||||||
|
label := severityLabels[severity]
|
||||||
|
if label == "" {
|
||||||
|
label = severity
|
||||||
|
}
|
||||||
|
|
||||||
|
// Use the per-event-type Hungarian message if available, otherwise fall back to message
|
||||||
|
hunMessage := customerMessages[eventType]
|
||||||
|
if hunMessage == "" {
|
||||||
|
hunMessage = message
|
||||||
|
}
|
||||||
|
|
||||||
|
subject := fmt.Sprintf("[Felhom] %s: %s", label, hunMessage)
|
||||||
|
|
||||||
|
now := time.Now().In(budapest).Format("2006-01-02 15:04")
|
||||||
|
body := fmt.Sprintf(`Kedves Ügyfél!
|
||||||
|
|
||||||
|
A Felhom rendszered a következő értesítést küldte:
|
||||||
|
|
||||||
|
%s
|
||||||
|
|
||||||
|
Részletek:
|
||||||
|
- Szerver: %s
|
||||||
|
- Időpont: %s
|
||||||
|
- Szint: %s
|
||||||
|
- Típus: %s`, hunMessage, customerID, now, label, eventType)
|
||||||
|
|
||||||
|
if message != "" && message != hunMessage {
|
||||||
|
body += fmt.Sprintf("\n- Üzenet: %s", message)
|
||||||
|
}
|
||||||
|
|
||||||
|
if detailsJSON != "" && detailsJSON != "{}" {
|
||||||
|
body += fmt.Sprintf("\n- Megjegyzés: %s", detailsJSON)
|
||||||
|
}
|
||||||
|
|
||||||
|
body += `
|
||||||
|
|
||||||
|
Ha kérdésed van, vedd fel a kapcsolatot az üzemeltetővel.
|
||||||
|
|
||||||
|
Üdvözlettel,
|
||||||
|
Felhom.eu monitoring`
|
||||||
|
|
||||||
|
return subject, body
|
||||||
|
}
|
||||||
+225
-12
@@ -121,6 +121,33 @@ func (s *Store) migrate() error {
|
|||||||
// v0.2.1: add status column to customer_configs (idempotent)
|
// v0.2.1: add status column to customer_configs (idempotent)
|
||||||
s.db.Exec("ALTER TABLE customer_configs ADD COLUMN status TEXT NOT NULL DEFAULT 'active'")
|
s.db.Exec("ALTER TABLE customer_configs ADD COLUMN status TEXT NOT NULL DEFAULT 'active'")
|
||||||
|
|
||||||
|
// v0.3.0: events table for hub-native monitoring
|
||||||
|
_, err = s.db.Exec(`
|
||||||
|
CREATE TABLE IF NOT EXISTS events (
|
||||||
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
|
customer_id TEXT NOT NULL,
|
||||||
|
event_type TEXT NOT NULL,
|
||||||
|
severity TEXT NOT NULL DEFAULT 'info',
|
||||||
|
message TEXT NOT NULL DEFAULT '',
|
||||||
|
details_json TEXT NOT NULL DEFAULT '{}',
|
||||||
|
source TEXT NOT NULL DEFAULT 'controller',
|
||||||
|
created_at DATETIME NOT NULL DEFAULT (datetime('now'))
|
||||||
|
);
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_events_customer_created
|
||||||
|
ON events(customer_id, created_at DESC);
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_events_type
|
||||||
|
ON events(event_type, created_at DESC);
|
||||||
|
`)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
// v0.3.0: add cooldown_hours to customer_notifications (idempotent)
|
||||||
|
s.db.Exec("ALTER TABLE customer_notifications ADD COLUMN cooldown_hours INTEGER DEFAULT 6")
|
||||||
|
|
||||||
|
// v0.3.0: add channel column to notification_log (idempotent)
|
||||||
|
s.db.Exec("ALTER TABLE notification_log ADD COLUMN channel TEXT NOT NULL DEFAULT 'customer'")
|
||||||
|
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -129,15 +156,17 @@ type NotificationPrefs struct {
|
|||||||
CustomerID string
|
CustomerID string
|
||||||
Email string
|
Email string
|
||||||
EnabledEvents []string
|
EnabledEvents []string
|
||||||
|
CooldownHours int
|
||||||
}
|
}
|
||||||
|
|
||||||
// GetNotificationPrefs returns notification preferences for a customer.
|
// GetNotificationPrefs returns notification preferences for a customer.
|
||||||
func (s *Store) GetNotificationPrefs(customerID string) (*NotificationPrefs, error) {
|
func (s *Store) GetNotificationPrefs(customerID string) (*NotificationPrefs, error) {
|
||||||
var email, eventsJSON string
|
var email, eventsJSON string
|
||||||
|
var cooldownHours int
|
||||||
err := s.db.QueryRow(
|
err := s.db.QueryRow(
|
||||||
"SELECT email, enabled_events FROM customer_notifications WHERE customer_id = ?",
|
"SELECT email, enabled_events, COALESCE(cooldown_hours, 6) FROM customer_notifications WHERE customer_id = ?",
|
||||||
customerID,
|
customerID,
|
||||||
).Scan(&email, &eventsJSON)
|
).Scan(&email, &eventsJSON, &cooldownHours)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
if err == sql.ErrNoRows {
|
if err == sql.ErrNoRows {
|
||||||
return nil, nil
|
return nil, nil
|
||||||
@@ -150,34 +179,46 @@ func (s *Store) GetNotificationPrefs(customerID string) (*NotificationPrefs, err
|
|||||||
s.logger.Printf("[WARN] Corrupt enabled_events JSON for %s: %v", customerID, err)
|
s.logger.Printf("[WARN] Corrupt enabled_events JSON for %s: %v", customerID, err)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if cooldownHours <= 0 {
|
||||||
|
cooldownHours = 6
|
||||||
|
}
|
||||||
|
|
||||||
return &NotificationPrefs{
|
return &NotificationPrefs{
|
||||||
CustomerID: customerID,
|
CustomerID: customerID,
|
||||||
Email: email,
|
Email: email,
|
||||||
EnabledEvents: events,
|
EnabledEvents: events,
|
||||||
|
CooldownHours: cooldownHours,
|
||||||
}, nil
|
}, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
// SaveNotificationPrefs creates or updates notification preferences for a customer.
|
// SaveNotificationPrefs creates or updates notification preferences for a customer.
|
||||||
func (s *Store) SaveNotificationPrefs(customerID, email string, enabledEvents []string) error {
|
func (s *Store) SaveNotificationPrefs(customerID, email string, enabledEvents []string, cooldownHours int) error {
|
||||||
eventsJSON, _ := json.Marshal(enabledEvents)
|
eventsJSON, _ := json.Marshal(enabledEvents)
|
||||||
|
if cooldownHours <= 0 {
|
||||||
|
cooldownHours = 6
|
||||||
|
}
|
||||||
_, err := s.db.Exec(`
|
_, err := s.db.Exec(`
|
||||||
INSERT INTO customer_notifications (customer_id, email, enabled_events, updated_at)
|
INSERT INTO customer_notifications (customer_id, email, enabled_events, cooldown_hours, updated_at)
|
||||||
VALUES (?, ?, ?, datetime('now'))
|
VALUES (?, ?, ?, ?, datetime('now'))
|
||||||
ON CONFLICT(customer_id) DO UPDATE SET
|
ON CONFLICT(customer_id) DO UPDATE SET
|
||||||
email = excluded.email,
|
email = excluded.email,
|
||||||
enabled_events = excluded.enabled_events,
|
enabled_events = excluded.enabled_events,
|
||||||
|
cooldown_hours = excluded.cooldown_hours,
|
||||||
updated_at = datetime('now')`,
|
updated_at = datetime('now')`,
|
||||||
customerID, email, string(eventsJSON),
|
customerID, email, string(eventsJSON), cooldownHours,
|
||||||
)
|
)
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
// LogNotification records a notification attempt.
|
// LogNotification records a notification attempt.
|
||||||
func (s *Store) LogNotification(customerID, eventType, severity, message, status, errorMsg string) error {
|
func (s *Store) LogNotification(customerID, eventType, severity, message, status, errorMsg, channel string) error {
|
||||||
|
if channel == "" {
|
||||||
|
channel = "customer"
|
||||||
|
}
|
||||||
_, err := s.db.Exec(`
|
_, err := s.db.Exec(`
|
||||||
INSERT INTO notification_log (customer_id, event_type, severity, message, status, error_message)
|
INSERT INTO notification_log (customer_id, event_type, severity, message, status, error_message, channel)
|
||||||
VALUES (?, ?, ?, ?, ?, ?)`,
|
VALUES (?, ?, ?, ?, ?, ?, ?)`,
|
||||||
customerID, eventType, severity, message, status, errorMsg,
|
customerID, eventType, severity, message, status, errorMsg, channel,
|
||||||
)
|
)
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -189,13 +230,14 @@ type NotificationLogEntry struct {
|
|||||||
Message string
|
Message string
|
||||||
Status string // "sent", "skipped", "failed"
|
Status string // "sent", "skipped", "failed"
|
||||||
ErrorMessage string
|
ErrorMessage string
|
||||||
|
Channel string // "operator" or "customer"
|
||||||
CreatedAt time.Time
|
CreatedAt time.Time
|
||||||
}
|
}
|
||||||
|
|
||||||
// GetRecentNotifications returns the most recent notification log entries for a customer.
|
// GetRecentNotifications returns the most recent notification log entries for a customer.
|
||||||
func (s *Store) GetRecentNotifications(customerID string, limit int) ([]NotificationLogEntry, error) {
|
func (s *Store) GetRecentNotifications(customerID string, limit int) ([]NotificationLogEntry, error) {
|
||||||
rows, err := s.db.Query(`
|
rows, err := s.db.Query(`
|
||||||
SELECT event_type, severity, message, status, COALESCE(error_message, ''), created_at
|
SELECT event_type, severity, message, status, COALESCE(error_message, ''), COALESCE(channel, 'customer'), created_at
|
||||||
FROM notification_log
|
FROM notification_log
|
||||||
WHERE customer_id = ?
|
WHERE customer_id = ?
|
||||||
ORDER BY created_at DESC
|
ORDER BY created_at DESC
|
||||||
@@ -209,7 +251,7 @@ func (s *Store) GetRecentNotifications(customerID string, limit int) ([]Notifica
|
|||||||
for rows.Next() {
|
for rows.Next() {
|
||||||
var e NotificationLogEntry
|
var e NotificationLogEntry
|
||||||
var createdAt, errorMsg string
|
var createdAt, errorMsg string
|
||||||
if err := rows.Scan(&e.EventType, &e.Severity, &e.Message, &e.Status, &errorMsg, &createdAt); err != nil {
|
if err := rows.Scan(&e.EventType, &e.Severity, &e.Message, &e.Status, &errorMsg, &e.Channel, &createdAt); err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
e.CreatedAt = parseSQLiteTime(createdAt)
|
e.CreatedAt = parseSQLiteTime(createdAt)
|
||||||
@@ -658,6 +700,177 @@ func (s *Store) UpdateRetrievalPassword(customerID, newPassword string) error {
|
|||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// --- Event system ---
|
||||||
|
|
||||||
|
// Event represents a single event record.
|
||||||
|
type Event struct {
|
||||||
|
ID int64
|
||||||
|
CustomerID string
|
||||||
|
EventType string
|
||||||
|
Severity string // "info", "warning", "error"
|
||||||
|
Message string
|
||||||
|
DetailsJSON string // raw JSON
|
||||||
|
Source string // "controller" or "hub"
|
||||||
|
CreatedAt time.Time
|
||||||
|
}
|
||||||
|
|
||||||
|
// SaveEvent inserts a new event and returns its ID.
|
||||||
|
func (s *Store) SaveEvent(customerID, eventType, severity, message, detailsJSON, source string) (int64, error) {
|
||||||
|
if detailsJSON == "" {
|
||||||
|
detailsJSON = "{}"
|
||||||
|
}
|
||||||
|
if source == "" {
|
||||||
|
source = "controller"
|
||||||
|
}
|
||||||
|
res, err := s.db.Exec(`
|
||||||
|
INSERT INTO events (customer_id, event_type, severity, message, details_json, source)
|
||||||
|
VALUES (?, ?, ?, ?, ?, ?)`,
|
||||||
|
customerID, eventType, severity, message, detailsJSON, source,
|
||||||
|
)
|
||||||
|
if err != nil {
|
||||||
|
return 0, err
|
||||||
|
}
|
||||||
|
return res.LastInsertId()
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetRecentEvents returns the most recent events for a customer, newest first.
|
||||||
|
func (s *Store) GetRecentEvents(customerID string, limit int) ([]Event, error) {
|
||||||
|
rows, err := s.db.Query(`
|
||||||
|
SELECT id, customer_id, event_type, severity, message, details_json, source, created_at
|
||||||
|
FROM events
|
||||||
|
WHERE customer_id = ?
|
||||||
|
ORDER BY created_at DESC
|
||||||
|
LIMIT ?`, customerID, limit)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
defer rows.Close()
|
||||||
|
return scanEvents(rows)
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetEventsByType returns events of a specific type for a customer since a given time.
|
||||||
|
func (s *Store) GetEventsByType(customerID, eventType string, since time.Time) ([]Event, error) {
|
||||||
|
rows, err := s.db.Query(`
|
||||||
|
SELECT id, customer_id, event_type, severity, message, details_json, source, created_at
|
||||||
|
FROM events
|
||||||
|
WHERE customer_id = ? AND event_type = ? AND created_at >= ?
|
||||||
|
ORDER BY created_at DESC`,
|
||||||
|
customerID, eventType, since.UTC().Format("2006-01-02 15:04:05"))
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
defer rows.Close()
|
||||||
|
return scanEvents(rows)
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetLatestEventByType returns the most recent event of a given type for a customer.
|
||||||
|
func (s *Store) GetLatestEventByType(customerID, eventType string) (*Event, error) {
|
||||||
|
var e Event
|
||||||
|
var createdAt string
|
||||||
|
err := s.db.QueryRow(`
|
||||||
|
SELECT id, customer_id, event_type, severity, message, details_json, source, created_at
|
||||||
|
FROM events
|
||||||
|
WHERE customer_id = ? AND event_type = ?
|
||||||
|
ORDER BY created_at DESC
|
||||||
|
LIMIT 1`, customerID, eventType,
|
||||||
|
).Scan(&e.ID, &e.CustomerID, &e.EventType, &e.Severity, &e.Message, &e.DetailsJSON, &e.Source, &createdAt)
|
||||||
|
if err == sql.ErrNoRows {
|
||||||
|
return nil, nil
|
||||||
|
}
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
e.CreatedAt = parseSQLiteTime(createdAt)
|
||||||
|
return &e, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetAllRecentEvents returns the most recent events across all customers.
|
||||||
|
func (s *Store) GetAllRecentEvents(limit int) ([]Event, error) {
|
||||||
|
rows, err := s.db.Query(`
|
||||||
|
SELECT id, customer_id, event_type, severity, message, details_json, source, created_at
|
||||||
|
FROM events
|
||||||
|
ORDER BY created_at DESC
|
||||||
|
LIMIT ?`, limit)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
defer rows.Close()
|
||||||
|
return scanEvents(rows)
|
||||||
|
}
|
||||||
|
|
||||||
|
// CountEventsBySeverity returns a count of events per severity for a customer since a given time.
|
||||||
|
func (s *Store) CountEventsBySeverity(customerID string, since time.Time) (map[string]int, error) {
|
||||||
|
rows, err := s.db.Query(`
|
||||||
|
SELECT severity, COUNT(*) FROM events
|
||||||
|
WHERE customer_id = ? AND created_at >= ?
|
||||||
|
GROUP BY severity`,
|
||||||
|
customerID, since.UTC().Format("2006-01-02 15:04:05"))
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
defer rows.Close()
|
||||||
|
|
||||||
|
counts := make(map[string]int)
|
||||||
|
for rows.Next() {
|
||||||
|
var sev string
|
||||||
|
var count int
|
||||||
|
if err := rows.Scan(&sev, &count); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
counts[sev] = count
|
||||||
|
}
|
||||||
|
return counts, rows.Err()
|
||||||
|
}
|
||||||
|
|
||||||
|
// PruneEvents deletes events older than the given number of days.
|
||||||
|
func (s *Store) PruneEvents(maxDays int) (int64, error) {
|
||||||
|
cutoff := time.Now().AddDate(0, 0, -maxDays).UTC().Format("2006-01-02 15:04:05")
|
||||||
|
res, err := s.db.Exec("DELETE FROM events WHERE created_at < ?", cutoff)
|
||||||
|
if err != nil {
|
||||||
|
return 0, err
|
||||||
|
}
|
||||||
|
return res.RowsAffected()
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetActiveCustomerIDs returns customer IDs from customer_configs where status is 'active'.
|
||||||
|
func (s *Store) GetActiveCustomerIDs() ([]string, error) {
|
||||||
|
rows, err := s.db.Query("SELECT customer_id FROM customer_configs WHERE status = 'active'")
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
defer rows.Close()
|
||||||
|
|
||||||
|
var ids []string
|
||||||
|
for rows.Next() {
|
||||||
|
var id string
|
||||||
|
if err := rows.Scan(&id); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
ids = append(ids, id)
|
||||||
|
}
|
||||||
|
return ids, rows.Err()
|
||||||
|
}
|
||||||
|
|
||||||
|
// Ping verifies the database is accessible.
|
||||||
|
func (s *Store) Ping() error {
|
||||||
|
var n int
|
||||||
|
return s.db.QueryRow("SELECT 1").Scan(&n)
|
||||||
|
}
|
||||||
|
|
||||||
|
func scanEvents(rows *sql.Rows) ([]Event, error) {
|
||||||
|
var events []Event
|
||||||
|
for rows.Next() {
|
||||||
|
var e Event
|
||||||
|
var createdAt string
|
||||||
|
if err := rows.Scan(&e.ID, &e.CustomerID, &e.EventType, &e.Severity, &e.Message, &e.DetailsJSON, &e.Source, &createdAt); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
e.CreatedAt = parseSQLiteTime(createdAt)
|
||||||
|
events = append(events, e)
|
||||||
|
}
|
||||||
|
return events, rows.Err()
|
||||||
|
}
|
||||||
|
|
||||||
// parseSQLiteTime tries multiple formats that modernc.org/sqlite may return.
|
// parseSQLiteTime tries multiple formats that modernc.org/sqlite may return.
|
||||||
func parseSQLiteTime(s string) time.Time {
|
func parseSQLiteTime(s string) time.Time {
|
||||||
formats := []string{
|
formats := []string{
|
||||||
|
|||||||
@@ -223,12 +223,14 @@ func (s *Server) handleCustomerUnified(w http.ResponseWriter, r *http.Request, c
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// History, notifications, infra backup
|
// History, notifications, events, infra backup
|
||||||
var history []store.CustomerSummary
|
var history []store.CustomerSummary
|
||||||
var notifPrefs *store.NotificationPrefs
|
var notifPrefs *store.NotificationPrefs
|
||||||
var recentNotifs []store.NotificationLogEntry
|
var recentNotifs []store.NotificationLogEntry
|
||||||
var infraMeta *store.InfraBackupMeta
|
var infraMeta *store.InfraBackupMeta
|
||||||
var infraBackupAge string
|
var infraBackupAge string
|
||||||
|
var events []store.Event
|
||||||
|
var eventCounts map[string]int
|
||||||
|
|
||||||
if customer != nil {
|
if customer != nil {
|
||||||
history, _ = s.store.GetCustomerHistory(customerID, 24*time.Hour)
|
history, _ = s.store.GetCustomerHistory(customerID, 24*time.Hour)
|
||||||
@@ -238,6 +240,8 @@ func (s *Server) handleCustomerUnified(w http.ResponseWriter, r *http.Request, c
|
|||||||
if infraMeta != nil {
|
if infraMeta != nil {
|
||||||
infraBackupAge = timeAgo(infraMeta.UpdatedAt)
|
infraBackupAge = timeAgo(infraMeta.UpdatedAt)
|
||||||
}
|
}
|
||||||
|
events, _ = s.store.GetRecentEvents(customerID, 50)
|
||||||
|
eventCounts, _ = s.store.CountEventsBySeverity(customerID, time.Now().Add(-24*time.Hour))
|
||||||
}
|
}
|
||||||
|
|
||||||
type pageData struct {
|
type pageData struct {
|
||||||
@@ -270,6 +274,9 @@ func (s *Server) handleCustomerUnified(w http.ResponseWriter, r *http.Request, c
|
|||||||
RecentNotifications []store.NotificationLogEntry
|
RecentNotifications []store.NotificationLogEntry
|
||||||
History []store.CustomerSummary
|
History []store.CustomerSummary
|
||||||
|
|
||||||
|
Events []store.Event
|
||||||
|
EventCounts map[string]int // severity → count (last 24h)
|
||||||
|
|
||||||
Flash string
|
Flash string
|
||||||
ActiveNav string
|
ActiveNav string
|
||||||
}
|
}
|
||||||
@@ -304,6 +311,9 @@ func (s *Server) handleCustomerUnified(w http.ResponseWriter, r *http.Request, c
|
|||||||
RecentNotifications: recentNotifs,
|
RecentNotifications: recentNotifs,
|
||||||
History: history,
|
History: history,
|
||||||
|
|
||||||
|
Events: events,
|
||||||
|
EventCounts: eventCounts,
|
||||||
|
|
||||||
Flash: r.URL.Query().Get("flash"),
|
Flash: r.URL.Query().Get("flash"),
|
||||||
ActiveNav: "configs",
|
ActiveNav: "configs",
|
||||||
}
|
}
|
||||||
@@ -697,7 +707,7 @@ func buildConfigJSON(r *http.Request) string {
|
|||||||
overrides["git"] = git
|
overrides["git"] = git
|
||||||
}
|
}
|
||||||
|
|
||||||
// Monitoring UUIDs
|
// Monitoring UUIDs (legacy — only written if user explicitly provides values)
|
||||||
uuids := make(map[string]interface{})
|
uuids := make(map[string]interface{})
|
||||||
for _, key := range []string{"heartbeat", "system_health", "db_dump", "backup", "backup_integrity"} {
|
for _, key := range []string{"heartbeat", "system_health", "db_dump", "backup", "backup_integrity"} {
|
||||||
if v := strings.TrimSpace(r.FormValue("uuid_" + key)); v != "" {
|
if v := strings.TrimSpace(r.FormValue("uuid_" + key)); v != "" {
|
||||||
|
|||||||
@@ -80,12 +80,12 @@ backup:
|
|||||||
monitoring:
|
monitoring:
|
||||||
enabled: true
|
enabled: true
|
||||||
healthchecks_base: "https://status.felhom.eu"
|
healthchecks_base: "https://status.felhom.eu"
|
||||||
ping_uuids:
|
# ping_uuids: (deprecated — monitoring is now handled by the Hub event system)
|
||||||
heartbeat: "" # Every 5 min — controller process alive
|
# heartbeat: ""
|
||||||
system_health: "" # Every 5 min — comprehensive system check
|
# system_health: ""
|
||||||
db_dump: "" # Daily — after database dumps
|
# db_dump: ""
|
||||||
backup: "" # Daily — after restic snapshot
|
# backup: ""
|
||||||
backup_integrity: "" # Weekly (Sunday) — restic check
|
# backup_integrity: ""
|
||||||
system_health_interval: "5m"
|
system_health_interval: "5m"
|
||||||
health_check_schedule: "06:00"
|
health_check_schedule: "06:00"
|
||||||
thresholds:
|
thresholds:
|
||||||
|
|||||||
@@ -40,6 +40,13 @@ func New(store *store.Store, passwordHash, apiKey string, staleThreshold time.Du
|
|||||||
b, _ := json.Marshal(v)
|
b, _ := json.Marshal(v)
|
||||||
return template.JS(b)
|
return template.JS(b)
|
||||||
},
|
},
|
||||||
|
"add": func(a, b int) int { return a + b },
|
||||||
|
"mapGet": func(m map[string]int, key string) int {
|
||||||
|
if m == nil {
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
return m[key]
|
||||||
|
},
|
||||||
}
|
}
|
||||||
|
|
||||||
tmpl := template.Must(template.New("").Funcs(funcMap).ParseFS(templateFS, "templates/*.html"))
|
tmpl := template.Must(template.New("").Funcs(funcMap).ParseFS(templateFS, "templates/*.html"))
|
||||||
@@ -232,6 +239,8 @@ func (s *Server) handleDashboard(w http.ResponseWriter, r *http.Request) {
|
|||||||
store.CustomerSummary
|
store.CustomerSummary
|
||||||
OverallStatus string // "ok", "warn", "down", "pending"
|
OverallStatus string // "ok", "warn", "down", "pending"
|
||||||
BackupAge string
|
BackupAge string
|
||||||
|
EventErrors int
|
||||||
|
EventWarnings int
|
||||||
}
|
}
|
||||||
|
|
||||||
// Build map of report customers keyed by ID
|
// Build map of report customers keyed by ID
|
||||||
@@ -266,6 +275,12 @@ func (s *Server) handleDashboard(w http.ResponseWriter, r *http.Request) {
|
|||||||
dc.BackupAge = "–"
|
dc.BackupAge = "–"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Event counts (last 24h)
|
||||||
|
if counts, err := s.store.CountEventsBySeverity(c.CustomerID, time.Now().Add(-24*time.Hour)); err == nil {
|
||||||
|
dc.EventErrors = counts["error"]
|
||||||
|
dc.EventWarnings = counts["warning"]
|
||||||
|
}
|
||||||
|
|
||||||
data = append(data, dc)
|
data = append(data, dc)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -96,8 +96,9 @@
|
|||||||
</div>
|
</div>
|
||||||
</details>
|
</details>
|
||||||
|
|
||||||
<details class="card" {{if index .Overrides "monitoring"}}open{{end}}>
|
<details class="card">
|
||||||
<summary><h2 style="display:inline">Monitoring UUIDs</h2></summary>
|
<summary><h2 style="display:inline">Monitoring UUIDs</h2> <span class="severity-badge severity-warning" style="font-size: 0.7em; vertical-align: middle;">Legacy</span></summary>
|
||||||
|
<p class="text-muted" style="margin: 0.5rem 0;">Healthchecks ping UUIDs are deprecated. Monitoring is now handled natively by the Hub event system. These fields are kept for backward compatibility with older controllers.</p>
|
||||||
<div class="form-grid" style="margin-top: 1rem;">
|
<div class="form-grid" style="margin-top: 1rem;">
|
||||||
{{$uuids := ""}}
|
{{$uuids := ""}}
|
||||||
{{with .Overrides}}{{with index . "monitoring"}}{{with index . "ping_uuids"}}{{$uuids = .}}{{end}}{{end}}{{end}}
|
{{with .Overrides}}{{with index . "monitoring"}}{{with index . "ping_uuids"}}{{$uuids = .}}{{end}}{{end}}{{end}}
|
||||||
|
|||||||
@@ -403,6 +403,61 @@
|
|||||||
{{end}}
|
{{end}}
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<!-- Events -->
|
||||||
|
<section class="card">
|
||||||
|
<h2>Events
|
||||||
|
{{if .EventCounts}}
|
||||||
|
{{with mapGet .EventCounts "error"}}<span class="severity-badge severity-error">{{.}} error{{if gt . 1}}s{{end}}</span>{{end}}
|
||||||
|
{{with mapGet .EventCounts "warning"}}<span class="severity-badge severity-warning">{{.}} warning{{if gt . 1}}s{{end}}</span>{{end}}
|
||||||
|
{{end}}
|
||||||
|
<span class="text-muted" style="font-size: 0.7em; font-weight: normal;"> (last 24h)</span>
|
||||||
|
</h2>
|
||||||
|
{{if .Events}}
|
||||||
|
<div style="margin-bottom: 0.5rem;">
|
||||||
|
<button class="btn btn-sm btn-outline event-filter active" data-filter="all">All</button>
|
||||||
|
<button class="btn btn-sm btn-outline event-filter" data-filter="error">Errors</button>
|
||||||
|
<button class="btn btn-sm btn-outline event-filter" data-filter="warning">Warnings</button>
|
||||||
|
<button class="btn btn-sm btn-outline event-filter" data-filter="info">Info</button>
|
||||||
|
</div>
|
||||||
|
<table class="history-table" id="events-table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Time</th>
|
||||||
|
<th>Severity</th>
|
||||||
|
<th>Type</th>
|
||||||
|
<th>Message</th>
|
||||||
|
<th>Source</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
{{range .Events}}
|
||||||
|
<tr data-severity="{{.Severity}}">
|
||||||
|
<td title="{{.CreatedAt.Format "2006-01-02 15:04:05"}}">{{.CreatedAt.Format "Jan 02 15:04"}}</td>
|
||||||
|
<td><span class="severity-badge severity-{{.Severity}}">{{.Severity}}</span></td>
|
||||||
|
<td><code>{{.EventType}}</code></td>
|
||||||
|
<td>{{.Message}}</td>
|
||||||
|
<td>{{.Source}}</td>
|
||||||
|
</tr>
|
||||||
|
{{end}}
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<script>
|
||||||
|
document.querySelectorAll('.event-filter').forEach(btn => {
|
||||||
|
btn.addEventListener('click', function() {
|
||||||
|
document.querySelectorAll('.event-filter').forEach(b => b.classList.remove('active'));
|
||||||
|
this.classList.add('active');
|
||||||
|
const filter = this.dataset.filter;
|
||||||
|
document.querySelectorAll('#events-table tbody tr').forEach(row => {
|
||||||
|
row.style.display = (filter === 'all' || row.dataset.severity === filter) ? '' : 'none';
|
||||||
|
});
|
||||||
|
});
|
||||||
|
});
|
||||||
|
</script>
|
||||||
|
{{else}}
|
||||||
|
<p class="text-muted">No events recorded yet.</p>
|
||||||
|
{{end}}
|
||||||
|
</section>
|
||||||
|
|
||||||
<!-- Notifications -->
|
<!-- Notifications -->
|
||||||
<section class="card">
|
<section class="card">
|
||||||
<h2>Notifications</h2>
|
<h2>Notifications</h2>
|
||||||
@@ -424,6 +479,7 @@
|
|||||||
<thead>
|
<thead>
|
||||||
<tr>
|
<tr>
|
||||||
<th>Time</th>
|
<th>Time</th>
|
||||||
|
<th>Channel</th>
|
||||||
<th>Event</th>
|
<th>Event</th>
|
||||||
<th>Status</th>
|
<th>Status</th>
|
||||||
<th>Message</th>
|
<th>Message</th>
|
||||||
@@ -433,6 +489,7 @@
|
|||||||
{{range .RecentNotifications}}
|
{{range .RecentNotifications}}
|
||||||
<tr>
|
<tr>
|
||||||
<td>{{.CreatedAt.Format "Jan 02 15:04"}}</td>
|
<td>{{.CreatedAt.Format "Jan 02 15:04"}}</td>
|
||||||
|
<td><span class="status-badge status-badge-{{.Channel}}">{{.Channel}}</span></td>
|
||||||
<td>{{.EventType}}</td>
|
<td>{{.EventType}}</td>
|
||||||
<td><span class="status-badge status-badge-{{.Status}}">{{.Status}}</span></td>
|
<td><span class="status-badge status-badge-{{.Status}}">{{.Status}}</span></td>
|
||||||
<td>{{.Message}}</td>
|
<td>{{.Message}}</td>
|
||||||
|
|||||||
@@ -28,6 +28,7 @@
|
|||||||
<tr>
|
<tr>
|
||||||
<th>Customer</th>
|
<th>Customer</th>
|
||||||
<th>Status</th>
|
<th>Status</th>
|
||||||
|
<th>Events</th>
|
||||||
<th>Last Seen</th>
|
<th>Last Seen</th>
|
||||||
<th>CPU</th>
|
<th>CPU</th>
|
||||||
<th>Memory</th>
|
<th>Memory</th>
|
||||||
@@ -49,6 +50,7 @@
|
|||||||
{{if eq .OverallStatus "ok"}}OK{{else if eq .OverallStatus "warn"}}WARN{{else if eq .OverallStatus "disabled"}}PAUSED{{else if eq .OverallStatus "pending"}}PENDING{{else}}DOWN{{end}}
|
{{if eq .OverallStatus "ok"}}OK{{else if eq .OverallStatus "warn"}}WARN{{else if eq .OverallStatus "disabled"}}PAUSED{{else if eq .OverallStatus "pending"}}PENDING{{else}}DOWN{{end}}
|
||||||
</span>
|
</span>
|
||||||
</td>
|
</td>
|
||||||
|
<td>{{if eq .OverallStatus "pending"}}—{{else}}{{if gt (add .EventErrors .EventWarnings) 0}}{{if gt .EventErrors 0}}<span class="severity-badge severity-error">{{.EventErrors}}</span>{{end}}{{if gt .EventWarnings 0}}<span class="severity-badge severity-warning">{{.EventWarnings}}</span>{{end}}{{else}}<span class="text-muted">—</span>{{end}}{{end}}</td>
|
||||||
<td>{{if eq .OverallStatus "pending"}}—{{else}}{{timeAgo .ReceivedAt}}{{end}}</td>
|
<td>{{if eq .OverallStatus "pending"}}—{{else}}{{timeAgo .ReceivedAt}}{{end}}</td>
|
||||||
<td>{{if eq .OverallStatus "pending"}}—{{else}}{{formatFloat .CPUPercent}}%{{end}}</td>
|
<td>{{if eq .OverallStatus "pending"}}—{{else}}{{formatFloat .CPUPercent}}%{{end}}</td>
|
||||||
<td>{{if eq .OverallStatus "pending"}}—{{else}}{{formatFloat .MemoryPercent}}%{{end}}</td>
|
<td>{{if eq .OverallStatus "pending"}}—{{else}}{{formatFloat .MemoryPercent}}%{{end}}</td>
|
||||||
|
|||||||
@@ -564,6 +564,55 @@ code {
|
|||||||
color: var(--text-muted);
|
color: var(--text-muted);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Severity badges */
|
||||||
|
.severity-badge {
|
||||||
|
display: inline-block;
|
||||||
|
padding: 0.15em 0.5em;
|
||||||
|
border-radius: 4px;
|
||||||
|
font-size: 0.8em;
|
||||||
|
font-weight: 600;
|
||||||
|
line-height: 1.4;
|
||||||
|
}
|
||||||
|
|
||||||
|
.severity-error {
|
||||||
|
background: rgba(239, 68, 68, 0.15);
|
||||||
|
color: #ef4444;
|
||||||
|
}
|
||||||
|
|
||||||
|
.severity-warning {
|
||||||
|
background: rgba(245, 158, 11, 0.15);
|
||||||
|
color: #f59e0b;
|
||||||
|
}
|
||||||
|
|
||||||
|
.severity-info {
|
||||||
|
background: rgba(59, 130, 246, 0.15);
|
||||||
|
color: #3b82f6;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Event filter buttons */
|
||||||
|
.event-filter {
|
||||||
|
font-size: 0.8em;
|
||||||
|
padding: 0.2em 0.6em;
|
||||||
|
cursor: pointer;
|
||||||
|
}
|
||||||
|
|
||||||
|
.event-filter.active {
|
||||||
|
background: var(--accent);
|
||||||
|
color: #fff;
|
||||||
|
border-color: var(--accent);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Notification channel badges */
|
||||||
|
.status-badge-operator {
|
||||||
|
background: rgba(139, 92, 246, 0.15);
|
||||||
|
color: #8b5cf6;
|
||||||
|
}
|
||||||
|
|
||||||
|
.status-badge-customer {
|
||||||
|
background: rgba(59, 130, 246, 0.15);
|
||||||
|
color: #3b82f6;
|
||||||
|
}
|
||||||
|
|
||||||
/* Responsive */
|
/* Responsive */
|
||||||
@media (max-width: 768px) {
|
@media (max-width: 768px) {
|
||||||
.container { padding: 1rem; }
|
.container { padding: 1rem; }
|
||||||
|
|||||||
Reference in New Issue
Block a user