# TASK: Phase 2 — Monitoring Warnings, Dashboard Alerts & Notification System
**Version target:** 0.7.1
**Repos:** `deploy-felhom-compose` (controller) + `felhom.eu` (notification-relay on k3s)
## Overview
Three workstreams in this phase:
1. **Monitoring page warnings** — Show healthcheck ping configuration status, warn about missing UUIDs
2. **Dashboard alert system** — Persistent in-app banners for active issues/warnings
3. **Notification system** — Central email relay on k3s + customer-side preferences UI
---
## 1. Monitoring Page: Healthcheck Ping Status
### 1.1 Problem
The monitoring page shows system metrics but doesn't indicate whether healthcheck pings are actually configured. If a ping UUID is empty or `CHANGEME`, the pinger silently skips it — the customer has no visibility into whether remote monitoring is working.
### 1.2 New section: "Távoli monitoring" (Remote Monitoring)
Add a new section to `monitoring.html` **between** "Rendszer áttekintés" and "Rendszer metrikák" (sections 1 and 2). This section is server-rendered rather than loaded via JS/API, because the ping config is static and known at page load.
Display a table showing each healthcheck ping's configuration status:
| Ellenőrzés | UUID státusz | Gyakoriság |
|---|---|---|
| 💓 Életjel (Heartbeat) | ✅ Beállítva | 5 percenként |
| 🖥️ Rendszer állapot | ✅ Beállítva | 5 percenként |
| 🗄️ Adatbázis mentés | ⚠️ Nincs beállítva | Naponta 02:30 |
| 💾 Biztonsági mentés | ✅ Beállítva | Naponta 03:00 |
| 🔍 Mentés integritás | ⚠️ Nincs beállítva | Hetente (vasárnap) |
**Logic for each row:**
- Read `monitoring.ping_uuids.*` from config
- UUID is "configured" if: non-empty AND doesn't start with `CHANGEME`
- If configured: show `✅ Beállítva` (green text)
- If not configured: show `⚠️ Nincs beállítva` (yellow/orange warning text)
- If monitoring is disabled entirely (`monitoring.enabled = false`): show a single warning banner instead of the table: "A távoli monitoring ki van kapcsolva. Az üzemeltető nem kap értesítést hibák esetén."
**Summary banner above the table:**
- All configured: green banner — "✅ Minden távoli monitoring aktív — az üzemeltető értesítést kap hibák esetén."
- Some missing: yellow banner — "⚠️ Egyes monitoring ellenőrzések nincsenek beállítva. Kérd az üzemeltetőt a konfiguráláshoz."
- Monitoring disabled: red/orange banner — as above
### 1.3 Data flow
Add a new template data struct for the monitoring handler:
```go
type MonitoringPageData struct {
	// Existing fields...
	SystemInfo *system.Info
	ActivePage string

	// New: healthcheck ping status
	MonitoringEnabled  bool
	PingStatus         []PingStatusItem
	AllPingsConfigured bool
}

type PingStatusItem struct {
	Label      string // Hungarian display name
	Icon       string // emoji
	Configured bool   // UUID is valid
	Schedule   string // "5 percenként" / "Naponta 02:30" etc.
}
```
Build `PingStatus` slice in the handler from `cfg.Monitoring.PingUUIDs`:
```go
// Inside the monitoring handler:
pings := []PingStatusItem{
	{Label: "Életjel (Heartbeat)", Icon: "💓", Configured: isConfigured(cfg.Monitoring.PingUUIDs.Heartbeat), Schedule: "5 percenként"},
	{Label: "Rendszer állapot", Icon: "🖥️", Configured: isConfigured(cfg.Monitoring.PingUUIDs.SystemHealth), Schedule: "5 percenként"},
	{Label: "Adatbázis mentés", Icon: "🗄️", Configured: isConfigured(cfg.Monitoring.PingUUIDs.DBDump), Schedule: "Naponta " + cfg.Backup.DBDumpSchedule},
	{Label: "Biztonsági mentés", Icon: "💾", Configured: isConfigured(cfg.Monitoring.PingUUIDs.Backup), Schedule: "Naponta " + cfg.Backup.ResticSchedule},
	{Label: "Mentés integritás", Icon: "🔍", Configured: isConfigured(cfg.Monitoring.PingUUIDs.BackupIntegrity), Schedule: "Hetente (vasárnap)"},
}

// Package-level helper: a UUID counts as configured when it is
// non-empty and is not the CHANGEME placeholder.
func isConfigured(uuid string) bool {
	return uuid != "" && !strings.HasPrefix(uuid, "CHANGEME")
}
```
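The summary banner rule from 1.2 reduces to a three-way decision over the `PingStatus` slice. The following is a minimal sketch of that decision, not the final handler code: `bannerVariant` and the variant names `green`/`yellow`/`red` are hypothetical, and `PingStatusItem` is trimmed to the fields used here.

```go
package main

import "fmt"

// PingStatusItem is a trimmed copy of the struct from section 1.3.
type PingStatusItem struct {
	Label      string
	Configured bool
}

// bannerVariant (hypothetical helper) picks the summary banner colour.
func bannerVariant(monitoringEnabled bool, pings []PingStatusItem) string {
	if !monitoringEnabled {
		return "red" // monitoring is off: the banner replaces the table
	}
	for _, p := range pings {
		if !p.Configured {
			return "yellow" // at least one check has no usable UUID
		}
	}
	return "green" // every check is configured
}

func main() {
	pings := []PingStatusItem{
		{Label: "Életjel (Heartbeat)", Configured: true},
		{Label: "Adatbázis mentés", Configured: false},
	}
	fmt.Println(bannerVariant(true, pings))  // yellow
	fmt.Println(bannerVariant(false, pings)) // red
}
```

`AllPingsConfigured` in `MonitoringPageData` would then simply be `bannerVariant(...) == "green"`, or the loop could set the field directly.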
### 1.4 CSS
Reuse existing `.settings-row` / `.sysinfo-row` pattern for the table. Add:
- `.ping-status-ok` — green text (same as `.state-text-green`)
- `.ping-status-warn` — orange/yellow text
- `.monitoring-banner` — full-width banner with icon, green/yellow/red variants
---
## 2. Dashboard Alert System
### 2.1 Concept
Display persistent alert banners at the top of the main content area (below page header, above page content). Alerts are generated from the latest health check results and other events. They show on ALL pages, not just monitoring.
### 2.2 Alert sources
The controller already runs health checks every 5 minutes (`RunHealthCheck`). The resulting `HealthReport` contains `Issues` (critical) and `Warnings` (non-critical). Use these directly.
Additionally, generate alerts for:
- Missing healthcheck ping UUIDs (from section 1 above)
- Backup not configured (`backup.enabled = false`)
- Hub reporting not configured when it should be
- Recent backup failures (from backup manager state)
### 2.3 Implementation: Alert Manager
Create `internal/web/alerts.go`:
```go
type Alert struct {
	ID       string // unique, for dismiss tracking
	Level    string // "error", "warning", "info"
	Message  string // Hungarian text
	Link     string // optional link to relevant page (e.g., "/monitoring", "/backups")
	LinkText string // "Részletek" etc.
}

type AlertManager struct {
	mu     sync.RWMutex
	alerts []Alert
	logger *log.Logger
}
```
**Alert generation** runs after each health check cycle (every 5 min):
```go
func (am *AlertManager) Refresh(healthReport *HealthReport, cfg *config.Config, backupMgr *backup.Manager) {
	var alerts []Alert

	// From health check issues
	for _, issue := range healthReport.Issues {
		alerts = append(alerts, Alert{
			ID: "health-" + hash(issue), Level: "error",
			Message: issue, Link: "/monitoring", LinkText: "Rendszermonitor",
		})
	}

	// From health check warnings
	for _, w := range healthReport.Warnings {
		alerts = append(alerts, Alert{
			ID: "health-" + hash(w), Level: "warning",
			Message: w, Link: "/monitoring", LinkText: "Rendszermonitor",
		})
	}

	// Missing ping UUIDs
	missingCount := countMissingPings(cfg)
	if missingCount > 0 {
		alerts = append(alerts, Alert{
			ID: "pings-missing", Level: "warning",
			Message: fmt.Sprintf("%d monitoring ellenőrzés nincs beállítva", missingCount),
			Link:    "/monitoring", LinkText: "Rendszermonitor",
		})
	}

	// Backup disabled
	if !cfg.Backup.Enabled {
		alerts = append(alerts, Alert{
			ID: "backup-disabled", Level: "warning",
			Message: "A biztonsági mentés nincs bekapcsolva",
			Link:    "/settings", LinkText: "Beállítások",
		})
	}

	// Not shown here: hub reporting and recent backup failures
	// (from backupMgr), the remaining sources listed in 2.2.

	// Swap in the new list atomically for readers.
	am.mu.Lock()
	am.alerts = alerts
	am.mu.Unlock()
}
```
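The `Refresh` sketch calls `hash` and `countMissingPings` without defining them. One possible shape for both is shown below, as an assumption rather than the required implementation: the 8-character SHA-256 prefix is an arbitrary choice, and `countMissing` takes the UUID strings directly instead of a `*config.Config` so that the example stays self-contained.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hash returns a short, stable identifier for an alert message.
// Assumption: 8 hex characters are enough to tell alerts apart.
func hash(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:4])
}

// countMissing counts UUIDs that are empty or still carry the
// CHANGEME placeholder (same rule as isConfigured in section 1.3).
func countMissing(uuids ...string) int {
	n := 0
	for _, u := range uuids {
		if u == "" || strings.HasPrefix(u, "CHANGEME") {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println("health-" + hash("SSD disk usage: 85%"))
	fmt.Println(countMissing("a1b2", "", "CHANGEME-backup")) // 2
}
```

Because the ID is derived from the message text, the same ongoing issue keeps the same ID across refresh cycles as long as its message does not change.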
### 2.4 Template integration
In the `layout_start` template (or a new `alerts` partial), render alerts above the page content:
```html
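{{if .Alerts}}
<div class="alerts-container">
  {{range .Alerts}}
  <div class="alert-banner alert-banner-{{.Level}}">
    <span>{{if eq .Level "error"}}🔴{{else if eq .Level "warning"}}🟡{{else}}ℹ️{{end}}</span>
    <span>{{.Message}}</span>
    {{if .Link}}<a class="alert-link" href="{{.Link}}">{{.LinkText}} →</a>{{end}}
  </div>
  {{end}}
</div>
{{end}}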
```
**Key decisions:**
- Alerts are NOT dismissible (they reflect real state — they disappear when the issue is resolved)
- Maximum 5 alerts shown, with "+N more" indicator if overflow
- On the monitoring page, skip the "pings-missing" alert since the detailed table is already visible
- Error alerts (red) above warning alerts (yellow)
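The ordering, the cap of 5, and the per-page skip can all live in one read-side helper. This is a sketch under assumptions: `visibleAlerts`, `levelRank`, and `maxVisibleAlerts` are hypothetical names, and `Alert` is trimmed to the fields used here.

```go
package main

import (
	"fmt"
	"sort"
)

type Alert struct {
	ID      string
	Level   string // "error", "warning", "info"
	Message string
}

const maxVisibleAlerts = 5 // cap from the key decisions

// levelRank orders errors before warnings before everything else.
func levelRank(level string) int {
	switch level {
	case "error":
		return 0
	case "warning":
		return 1
	default:
		return 2
	}
}

// visibleAlerts drops alerts whose ID is listed in skipIDs, sorts the
// rest by severity, and returns at most maxVisibleAlerts of them plus
// the number hidden (for the "+N more" indicator).
func visibleAlerts(all []Alert, skipIDs ...string) (shown []Alert, hidden int) {
	skip := make(map[string]bool, len(skipIDs))
	for _, id := range skipIDs {
		skip[id] = true
	}
	for _, a := range all {
		if !skip[a.ID] {
			shown = append(shown, a)
		}
	}
	// Stable sort keeps the generation order within each level.
	sort.SliceStable(shown, func(i, j int) bool {
		return levelRank(shown[i].Level) < levelRank(shown[j].Level)
	})
	if len(shown) > maxVisibleAlerts {
		hidden = len(shown) - maxVisibleAlerts
		shown = shown[:maxVisibleAlerts]
	}
	return shown, hidden
}

func main() {
	all := []Alert{
		{ID: "pings-missing", Level: "warning", Message: "2 monitoring ellenőrzés nincs beállítva"},
		{ID: "health-1", Level: "error", Message: "SSD disk usage: 95%"},
	}
	// On the monitoring page the detailed table replaces this alert.
	shown, hidden := visibleAlerts(all, "pings-missing")
	fmt.Println(len(shown), hidden) // 1 0
}
```

The helper copies into a new slice, so the list held by the alert manager is never reordered or truncated by a page render.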
### 2.5 Passing alerts to templates
Every page handler already passes template data via a struct. Add an `Alerts []Alert` field to each page's data struct (or use a shared base struct). The alert manager is available via the web server struct.
```go
// In each handler:
data.Alerts = s.alertManager.GetAlerts()
```
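If the shared base struct route is taken, Go struct embedding keeps the templates unchanged, because `html/template` resolves promoted fields such as `.Alerts` on the outer struct. A minimal sketch, where `BasePageData`, `SettingsPageData`, and `render` are hypothetical names:

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

type Alert struct {
	Level   string
	Message string
}

// BasePageData (hypothetical) holds what every page needs.
type BasePageData struct {
	ActivePage string
	Alerts     []Alert
}

// SettingsPageData embeds the base, so .Alerts and .ActivePage
// are reachable from the template without extra plumbing.
type SettingsPageData struct {
	BasePageData
	Email string
}

var page = template.Must(template.New("page").Parse(
	"{{range .Alerts}}[{{.Level}}] {{.Message}}\n{{end}}page={{.ActivePage}}\n"))

// render executes the template into a string.
func render(data any) (string, error) {
	var sb strings.Builder
	if err := page.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := SettingsPageData{
		BasePageData: BasePageData{
			ActivePage: "settings",
			Alerts:     []Alert{{Level: "warning", Message: "A biztonsági mentés nincs bekapcsolva"}},
		},
	}
	out, err := render(data)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

With this layout, a single helper on the server struct can fill `BasePageData` for every handler instead of repeating the `GetAlerts()` call.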
### 2.6 CSS
```css
.alerts-container { margin-bottom: 1rem; }
.alert-banner {
  display: flex; align-items: center; gap: 0.75rem;
  padding: 0.75rem 1rem; border-radius: 8px; margin-bottom: 0.5rem;
  font-size: 0.9rem;
}
.alert-banner-error { background: rgba(248, 113, 113, 0.1); border: 1px solid rgba(248, 113, 113, 0.3); color: #f87171; }
.alert-banner-warning { background: rgba(250, 204, 21, 0.1); border: 1px solid rgba(250, 204, 21, 0.3); color: #facc15; }
.alert-banner-info { background: rgba(96, 165, 250, 0.1); border: 1px solid rgba(96, 165, 250, 0.3); color: #60a5fa; }
.alert-link { margin-left: auto; white-space: nowrap; }
```
---
## 3. Notification System
### 3.1 Architecture
```
Customer Node k3s Cluster
┌──────────────────────┐ ┌──────────────────────────────┐
│ felhom-controller │ HTTP POST │ notification-relay │
│ │ ─────────────────>│ (notify.felhom.eu) │
│ Event detected: │ {customer_id, │ │
│ - disk_warning │ event_type, │ 1. Validate API key │
│ - backup_failed │ message, │ 2. Format email │
│ - ... │ severity} │ 3. Send via Resend API │
│ │ │ 4. Return 200/4xx/5xx │
└──────────────────────┘ └──────────────────────────────┘
│
│ Resend API
▼
┌──────────────┐
│ Customer │
│ email inbox │
└──────────────┘
```
**Why a relay?**
- Resend API key stays on trusted infrastructure (k3s), never on customer hardware
- Central rate limiting and logging of all notifications
- Operator visibility into what notifications were sent
- Customer controllers only need hub URL + API key (already have these for hub reporting)
### 3.2 Notification Relay Service (k3s side)
**Repo:** `felhom.eu`. Under Option A below, the service lives in a new directory `notification-relay/` alongside `hub/`; under Option B, the code is added to `hub/` itself.
The relay is a small Go service, similar to `contact-mailer`. Deploy it on k3s either at `notify.felhom.eu` or as a path under the hub, e.g., `hub.felhom.eu/api/v1/notify`.
**Option A: Standalone service at notify.felhom.eu**
- Separate deployment, its own ingress
- Clean separation of concerns
- More k3s resources
**Option B: Add notify endpoint to the existing hub**
- Hub already runs, has API key auth, knows customer IDs
- Just add a `POST /api/v1/notify` endpoint
- Reuse hub's Resend integration
- Less infrastructure
**Recommendation: Option B** — Add to the hub. The hub already authenticates customers by API key and has all the context needed. Adding a `/api/v1/notify` endpoint is minimal work.
#### Hub notify endpoint
```
POST /api/v1/notify
Authorization: Bearer <API_KEY>
Content-Type: application/json
{
"customer_id": "demo-felhom",
"event_type": "disk_warning",
"severity": "warning", // "info", "warning", "critical"
"message": "SSD disk usage: 85%",
"details": "Threshold: 80%" // optional
}
```
**Hub processing:**
1. Validate API key (same auth as report push)
2. Look up customer notification preferences (stored in hub's SQLite)
3. If customer has email configured AND event_type is in their enabled events:
- Format email (Hungarian template)
- Send via Resend API (direct HTTP call, same pattern as contact-mailer)
4. Log the notification attempt and result
5. Return 200 (accepted), 400 (bad request), 401 (unauthorized)
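The processing steps above can be sketched as three small, separately testable functions. This is an illustration under assumptions, not the hub's actual code: `validate`, `shouldSend`, and `statusFor` are hypothetical names, and the preferences are passed in as plain strings (the customer email and the JSON array of enabled events) rather than read from SQLite.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

type NotifyRequest struct {
	CustomerID string `json:"customer_id"`
	EventType  string `json:"event_type"`
	Severity   string `json:"severity"`
	Message    string `json:"message"`
	Details    string `json:"details,omitempty"`
}

// validate checks the required fields and the severity value.
func validate(r NotifyRequest) error {
	if r.CustomerID == "" || r.EventType == "" || r.Message == "" {
		return errors.New("customer_id, event_type and message are required")
	}
	switch r.Severity {
	case "info", "warning", "critical":
		return nil
	}
	return fmt.Errorf("unknown severity %q", r.Severity)
}

// shouldSend implements step 3: an email must be configured and the
// event type must appear in the customer's enabled events.
func shouldSend(email, enabledEventsJSON, eventType string) bool {
	if email == "" {
		return false
	}
	var events []string
	if err := json.Unmarshal([]byte(enabledEventsJSON), &events); err != nil {
		return false
	}
	for _, e := range events {
		if e == eventType {
			return true
		}
	}
	return false
}

// statusFor maps the auth and validation outcome to the HTTP status.
func statusFor(authorized bool, r NotifyRequest) int {
	if !authorized {
		return http.StatusUnauthorized
	}
	if validate(r) != nil {
		return http.StatusBadRequest
	}
	return http.StatusOK
}

func main() {
	req := NotifyRequest{CustomerID: "demo-felhom", EventType: "disk_warning",
		Severity: "warning", Message: "SSD disk usage: 85%"}
	fmt.Println(statusFor(true, req)) // 200
	fmt.Println(shouldSend("ugyfel@example.com", `["disk_warning"]`, req.EventType)) // true
}
```

Read literally, step 3 would also filter out the `test` event unless it is listed in the enabled events, so the real handler probably needs an explicit exemption for it.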
**Hub config additions** (hub.yaml secret):
```yaml
RESEND_API_KEY: "re_XZZenCJs..." # Same key as healthchecks/contact-mailer
FROM_EMAIL: "monitoring@felhom.eu"
```
#### Customer notification config (hub-side storage)
The hub stores per-customer notification preferences in its SQLite DB:
```sql
CREATE TABLE customer_notifications (
customer_id TEXT PRIMARY KEY,
email TEXT NOT NULL DEFAULT '', -- customer email address
enabled_events TEXT NOT NULL DEFAULT '[]', -- JSON array of event types
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
**How the customer's email and preferences get there:**
- **Phase 2a (this task):** Operator sets them manually via hub dashboard or SQLite
- **Phase 2b (future):** Controller pushes notification preferences to hub along with reports, hub saves them
For now (2a), the operator configures each customer's email in the hub after setup. This avoids needing the controller to push preferences to the hub yet.
### 3.3 Controller-side: Notification Trigger
Add `internal/notify/notifier.go`:
```go
type Notifier struct {
	hubURL     string
	apiKey     string
	customerID string // sent as customer_id in every request
	httpClient *http.Client
	logger     *log.Logger
	enabled    bool
	prefs      *settings.NotificationPrefs // local preferences
}

// Notify sends a notification event to the hub relay.
// Non-blocking: fires and forgets (logs errors but doesn't retry aggressively).
func (n *Notifier) Notify(eventType, severity, message, details string) {
	if !n.enabled {
		return
	}
	if !n.prefs.IsEventEnabled(eventType) {
		return
	}
	// POST to hub
	payload := NotifyRequest{
		CustomerID: n.customerID,
		EventType:  eventType,
		Severity:   severity,
		Message:    message,
		Details:    details,
	}
	// ... HTTP POST of payload to n.hubURL + "/api/v1/notify"
}
```
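The elided HTTP POST could look roughly like the following. Treat it as a sketch: `newNotifyRequest` and `send` are hypothetical helpers, the 10-second timeout is an arbitrary choice, and treating only 200 as success follows the status codes listed for the hub endpoint.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
	"time"
)

type NotifyRequest struct {
	CustomerID string `json:"customer_id"`
	EventType  string `json:"event_type"`
	Severity   string `json:"severity"`
	Message    string `json:"message"`
	Details    string `json:"details,omitempty"`
}

// newNotifyRequest builds the authenticated JSON POST for the hub.
func newNotifyRequest(hubURL, apiKey string, p NotifyRequest) (*http.Request, error) {
	body, err := json.Marshal(p)
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(hubURL, "/") + "/api/v1/notify"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// send performs the request; the caller would run it in a goroutine
// and only log the returned error (fire and forget).
func send(client *http.Client, req *http.Request) error {
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("hub returned %s", resp.Status)
	}
	return nil
}

func main() {
	req, err := newNotifyRequest("https://hub.felhom.eu/", "example-key", NotifyRequest{
		CustomerID: "demo-felhom", EventType: "disk_warning",
		Severity: "warning", Message: "SSD disk usage: 85%",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)

	// A client with a timeout keeps a slow hub from piling up goroutines.
	client := &http.Client{Timeout: 10 * time.Second}
	_ = client // in the controller: go func() { if err := send(client, req); err != nil { /* log */ } }()
}
```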
**Integration points** — trigger notifications from:
1. `monitor/healthcheck.go` — after RunHealthCheck, if status changed from ok to warn/fail
2. `backup/backup.go` — after backup failure
3. `backup/dbdump.go` — after DB dump failure
4. `scheduler/scheduler.go` — after integrity check failure
**Important: Don't spam.** Track last notification time per event type. Don't re-notify for the same ongoing issue within 6 hours (configurable). This is handled locally with an in-memory map.
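A minimal sketch of such a cooldown tracker follows. The names `cooldown`, `newCooldown`, `allow`, and `demoStart` are hypothetical; the current time is passed in explicitly so the behaviour can be tested without waiting.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cooldown remembers when each event type was last notified.
type cooldown struct {
	mu       sync.Mutex
	period   time.Duration
	lastSent map[string]time.Time
}

// newCooldown takes the period in hours, matching a cooldown_hours setting.
func newCooldown(hours int) *cooldown {
	return &cooldown{
		period:   time.Duration(hours) * time.Hour,
		lastSent: make(map[string]time.Time),
	}
}

// allow reports whether eventType may be sent at time now and, if so,
// records now as the last send time. Suppressed attempts do not
// extend the cooldown window.
func (c *cooldown) allow(eventType string, now time.Time) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if last, ok := c.lastSent[eventType]; ok && now.Sub(last) < c.period {
		return false
	}
	c.lastSent[eventType] = now
	return true
}

// demoStart is a fixed timestamp used only by the example below.
var demoStart = time.Date(2026, 2, 16, 14, 30, 0, 0, time.UTC)

func main() {
	c := newCooldown(6)
	fmt.Println(c.allow("disk_warning", demoStart))                   // true
	fmt.Println(c.allow("disk_warning", demoStart.Add(time.Hour)))    // false
	fmt.Println(c.allow("backup_failed", demoStart.Add(time.Hour)))   // true
	fmt.Println(c.allow("disk_warning", demoStart.Add(6*time.Hour)))  // true
}
```

`Notify` would call `allow(eventType, time.Now())` right after the preference check and return early when it reports false.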
### 3.4 Notification Preferences (settings.json + settings page)
Expand the `NotificationPrefs` struct:
```go
type NotificationPrefs struct {
	// Customer email for notifications. In Phase 2a it is stored locally only;
	// the hub delivers via Resend to the address the operator configured there.
	Email string `json:"email,omitempty"`

	// Which events to be notified about
	EnabledEvents []string `json:"enabled_events,omitempty"`

	// Notification cooldown in hours (don't re-send for same issue within this period)
	CooldownHours int `json:"cooldown_hours,omitempty"` // default: 6
}

// Default events if not configured
var DefaultCustomerEvents = []string{
	"disk_warning",
	"backup_failed",
	"update_available",
}
```
### 3.5 Settings page: Notification Preferences UI
Add a **third section** to the existing settings page (below "Jelszó módosítás"):
**Section C: "Értesítések" (Notifications)**
Only shown if hub is enabled (`hub.enabled = true`). If hub is disabled, show info message:
"Az értesítések a központi rendszeren keresztül működnek, ami jelenleg nincs bekapcsolva."
```
┌─────────────────────────────────────────────────────────┐
│ Értesítések │
│ │
│ E-mail cím: [________________________] (text input) │
│ │
│ Az alábbi eseményekről kapjon értesítést: │
│ │
│ [x] Lemez figyelmeztetés (80%+) │
│ [x] Biztonsági mentés sikertelen │
│ [x] Frissítés elérhető │
│ [ ] Biztonsági frissítés │
│ │
│ Értesítési szünet: [6] óra │
│ (Azonos probléma esetén ennyi ideig nem küld újat) │
│ │
│ [Mentés] [Teszt email küldése]│
└─────────────────────────────────────────────────────────┘
```
**Routes:**
| Method | Path | Auth? | Handler |
|--------|------|-------|---------|
| POST | `/settings/notifications` | Yes | Save notification preferences |
| POST | `/settings/notifications/test` | Yes | Send test notification via hub relay |
**POST `/settings/notifications` handler:**
1. Parse form: email, enabled_events (checkbox list), cooldown_hours
2. Validate email format (basic regex, allow empty = disable)
3. Save to `settings.json` → `notifications`
4. Show success flash: "Értesítési beállítások mentve."
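The parsing and validation in steps 1 and 2 might look like this. It is a sketch under assumptions: `parsePrefs`, `emailRe`, and `selectableEvents` are hypothetical, the regex is deliberately basic, and the set of selectable events is taken from the four checkboxes in the mockup.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// emailRe is a basic shape check (something@something.tld), not full RFC validation.
var emailRe = regexp.MustCompile(`^[^@\s]+@[^@\s]+\.[^@\s]+$`)

// selectableEvents lists the event types offered as checkboxes.
var selectableEvents = map[string]bool{
	"disk_warning": true, "backup_failed": true,
	"update_available": true, "security_update": true,
}

type NotificationPrefs struct {
	Email         string
	EnabledEvents []string
	CooldownHours int
}

// parsePrefs validates the raw form values. An empty email is allowed
// and means notifications are disabled; an empty cooldown means 6 hours.
func parsePrefs(email string, events []string, cooldown string) (NotificationPrefs, error) {
	email = strings.TrimSpace(email)
	if email != "" && !emailRe.MatchString(email) {
		return NotificationPrefs{}, fmt.Errorf("invalid email address %q", email)
	}
	hours := 6
	if cooldown != "" {
		n, err := strconv.Atoi(cooldown)
		if err != nil || n < 1 {
			return NotificationPrefs{}, fmt.Errorf("invalid cooldown %q", cooldown)
		}
		hours = n
	}
	var enabled []string
	for _, e := range events {
		if selectableEvents[e] { // silently drop unknown checkbox values
			enabled = append(enabled, e)
		}
	}
	return NotificationPrefs{Email: email, EnabledEvents: enabled, CooldownHours: hours}, nil
}

func main() {
	prefs, err := parsePrefs(" ugyfel@example.com ", []string{"disk_warning", "bogus"}, "12")
	fmt.Println(prefs, err)
}
```

In the handler, the three arguments would come from `r.FormValue("email")`, `r.Form["enabled_events"]`, and `r.FormValue("cooldown_hours")` after `r.ParseForm()`.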
**POST `/settings/notifications/test` handler:**
1. Read current notification preferences from settings
2. Send a test notification via the hub relay:
```json
{
"customer_id": "demo-felhom",
"event_type": "test",
"severity": "info",
"message": "Teszt értesítés a Felhom rendszerből",
"details": "Ha ezt az emailt megkapta, az értesítések megfelelően működnek."
}
```
3. Show result: "Teszt email elküldve." or error message
### 3.6 Event Types Reference
| Event type | Severity | Trigger | Hungarian label |
|---|---|---|---|
| `disk_warning` | warning | Disk usage >= warn threshold | Lemez figyelmeztetés |
| `disk_critical` | critical | Disk usage >= crit threshold | Lemez kritikus |
| `backup_failed` | critical | Restic snapshot failed | Biztonsági mentés sikertelen |
| `db_dump_failed` | critical | DB dump failed | Adatbázis mentés sikertelen |
| `update_available` | info | New controller version available | Frissítés elérhető |
| `security_update` | warning | Security update available | Biztonsági frissítés |
| `container_unhealthy` | warning | Protected container not running | Alkalmazás leállt |
| `integrity_failed` | warning | Weekly restic check failed | Mentés integritás hiba |
| `test` | info | Manual test from settings page | Teszt |
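Keeping this table in one Go map lets the notifier, the settings page, and the email formatter share the same severities and labels. A possible shape, where `EventInfo`, `eventTypes`, and `describe` are hypothetical names and the values are copied from the table:

```go
package main

import "fmt"

// EventInfo carries the default severity and Hungarian label of an event type.
type EventInfo struct {
	Severity string
	Label    string
}

var eventTypes = map[string]EventInfo{
	"disk_warning":        {"warning", "Lemez figyelmeztetés"},
	"disk_critical":       {"critical", "Lemez kritikus"},
	"backup_failed":       {"critical", "Biztonsági mentés sikertelen"},
	"db_dump_failed":      {"critical", "Adatbázis mentés sikertelen"},
	"update_available":    {"info", "Frissítés elérhető"},
	"security_update":     {"warning", "Biztonsági frissítés"},
	"container_unhealthy": {"warning", "Alkalmazás leállt"},
	"integrity_failed":    {"warning", "Mentés integritás hiba"},
	"test":                {"info", "Teszt"},
}

// describe looks up an event type; ok is false for unknown types.
func describe(eventType string) (info EventInfo, ok bool) {
	info, ok = eventTypes[eventType]
	return info, ok
}

func main() {
	info, ok := describe("disk_critical")
	fmt.Println(info.Severity, info.Label, ok) // critical Lemez kritikus true
}
```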
---
## 4. Implementation Order
### Step 1: Monitoring page — ping status section
- Add `PingStatusItem` struct and builder in monitoring handler
- Add "Távoli monitoring" section to `monitoring.html`
- Add CSS for ping status rows and banner
- **Test:** Check monitoring page shows ✅/⚠️ for each ping UUID
### Step 2: Alert manager + dashboard banners
- Create `internal/web/alerts.go` with AlertManager
- Wire AlertManager into health check cycle
- Add alert rendering to `layout_start` template
- Add CSS for alert banners
- **Test:** Set a ping UUID to empty → warning banner appears on all pages. Fix it → banner disappears.
### Step 3: Hub notification endpoint (felhom.eu repo)
- Add Resend API key to hub's k8s secret
- Add `POST /api/v1/notify` endpoint to hub
- Add `customer_notifications` table to hub's SQLite
- Add email sending via Resend HTTP API (not SMTP — direct API call)
- Add hub admin page or CLI to set customer email/preferences
- **Test:** `curl -X POST https://hub.felhom.eu/api/v1/notify -H "Authorization: Bearer ..." -H "Content-Type: application/json" -d '{"customer_id":"demo-felhom","event_type":"test","severity":"info","message":"Test"}'` → email arrives
### Step 4: Controller-side notifier
- Create `internal/notify/notifier.go`
- Wire into health check, backup, DB dump, and integrity check flows
- Add cooldown tracking (in-memory map, not persisted)
- **Test:** Trigger a disk warning → notification sent to hub → email arrives
### Step 5: Notification preferences UI
- Expand `NotificationPrefs` struct in settings.go
- Add "Értesítések" section to settings.html
- Add POST handlers for save and test
- Push email preference to hub when saving (optional — can be deferred)
- **Test:** Set email → save → test → email arrives → change events → save → verify filtering works
### Step 6: Cleanup & version bump
- Update CONTEXT.md
- Bump controller version to 0.7.1
- Bump hub version accordingly
- Build + deploy both → verify on demo-felhom.eu
---
## 5. Files to Create / Modify
### Controller repo (`deploy-felhom-compose`):
**New files:**
- `controller/internal/web/alerts.go` — AlertManager
- `controller/internal/notify/notifier.go` — Hub notification client
**Modified files:**
- `controller/internal/web/server.go` — Add AlertManager, wire into handlers, pass alerts to all templates
- `controller/internal/web/handlers.go` — Monitoring handler: add ping status data. Settings handler: add notification preferences section + POST handlers.
- `controller/internal/web/templates/monitoring.html` — Add "Távoli monitoring" section
- `controller/internal/web/templates/settings.html` — Add "Értesítések" section
- `controller/internal/web/templates/layout.html` — Add alert banner rendering
- `controller/internal/web/templates/style.css` — New styles for alerts and ping status
- `controller/internal/settings/settings.go` — Expand NotificationPrefs struct
- `controller/internal/monitor/healthcheck.go` — After health check, update AlertManager + trigger notifications
- `controller/internal/backup/backup.go` — Trigger notification on backup failure
- `controller/internal/backup/dbdump.go` — Trigger notification on dump failure
- `controller/cmd/controller/main.go` — Initialize Notifier, AlertManager, wire dependencies
### Hub repo (`felhom.eu`):
**Modified files:**
- `hub/internal/api/server.go` (or new `notify.go`) — Add `POST /api/v1/notify` endpoint
- `hub/internal/store/store.go` — Add `customer_notifications` table + queries
- `hub/cmd/hub/main.go` — Add Resend API key config
- `manifests/hub.yaml` — Add `RESEND_API_KEY` to hub secret
---
## 6. Design Decisions & Notes
### Why add notify to the hub instead of a new service?
- Hub already authenticates customers, has SQLite, knows customer IDs
- Adding one endpoint is simpler than deploying+maintaining a separate service
- Shared Resend API key, shared k8s secret
- One less DNS record, ingress, deployment to manage
### Customer email configuration flow (Phase 2a — operator-managed)
For now, the operator sets each customer's email by editing the hub's SQLite database directly or through a simple hub admin endpoint. The customer can see and change their email in the controller's settings page, but syncing it to the hub is deferred: the controller only stores it locally, and the operator manually ensures the hub has the right email.
This is acceptable for the initial small customer base. Phase 2b (future) will add automatic preference sync via the report push.
### Notification cooldown
The controller tracks, in memory, when a notification was last sent for each event type. If the same event type fires again within the cooldown period (default 6 hours), the notification is suppressed. This prevents email spam during prolonged issues (e.g., disk stays at 85% for days).
The cooldown resets on controller restart, which is fine — restarting the controller during an active issue should re-trigger a notification.
### Dashboard alerts are state-based, not event-based
Alerts reflect current system state. They're regenerated every 5 minutes from the latest health check. When the issue resolves, the alert disappears. No persistence needed — alerts live in memory only.
### Resend API usage from hub
Use Resend's HTTP API directly (POST to `https://api.resend.com/emails`) rather than SMTP. This avoids SMTP connection management complexity and is more idiomatic for a Go service. The contact-mailer already demonstrates this pattern.
```go
// Example Resend API call
req, err := http.NewRequest(http.MethodPost, "https://api.resend.com/emails", bytes.NewReader(payload))
if err != nil {
	return err
}
req.Header.Set("Authorization", "Bearer "+resendAPIKey)
req.Header.Set("Content-Type", "application/json")
```
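The `payload` in that call is a JSON document. A minimal sketch of building it is shown here, assuming a plain-text email and using the `from`, `to`, `subject`, and `text` fields of Resend's send-email endpoint; `resendEmail` and `buildResendPayload` are hypothetical names, and the recipient address is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resendEmail mirrors the subset of Resend's request body used here.
type resendEmail struct {
	From    string   `json:"from"`
	To      []string `json:"to"`
	Subject string   `json:"subject"`
	Text    string   `json:"text"`
}

// buildResendPayload serialises one plain-text email to a single recipient.
func buildResendPayload(from, to, subject, text string) ([]byte, error) {
	return json.Marshal(resendEmail{
		From:    from,
		To:      []string{to},
		Subject: subject,
		Text:    text,
	})
}

func main() {
	payload, err := buildResendPayload(
		"monitoring@felhom.eu",
		"ugyfel@example.com", // placeholder recipient
		"[Felhom] Figyelmeztetés: SSD lemez használat 85%",
		"SSD lemez használat: 85% (küszöb: 80%)",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))
}
```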
### Email template
Notifications should be simple, text-focused emails in Hungarian:
```
Tárgy: [Felhom] Figyelmeztetés: SSD lemez használat 85%
Kedves Ügyfél!
A Felhom rendszered a következő figyelmeztetést jelezte:
SSD lemez használat: 85% (küszöb: 80%)
Részletek:
- Szerver: demo-felhom.eu
- Időpont: 2026-02-16 14:30
- Szint: Figyelmeztetés
Ha kérdésed van, vedd fel a kapcsolatot az üzemeltetővel.
Üdvözlettel,
Felhom.eu monitoring
```
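The template above maps naturally onto Go's `text/template`. The sketch below is one way to render it; `emailData`, `bodyTmpl`, and `renderEmail` are hypothetical names, and the intro sentence is kept in its warning wording, so other severities would need their own phrasing.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// emailData holds the values substituted into the email.
type emailData struct {
	Level   string // Hungarian severity label, e.g. "Figyelmeztetés"
	Summary string // short text for the subject line
	Message string // full message line for the body
	Server  string
	Time    string
}

var bodyTmpl = template.Must(template.New("email").Parse(`Kedves Ügyfél!

A Felhom rendszered a következő figyelmeztetést jelezte:

{{.Message}}

Részletek:
- Szerver: {{.Server}}
- Időpont: {{.Time}}
- Szint: {{.Level}}

Ha kérdésed van, vedd fel a kapcsolatot az üzemeltetővel.

Üdvözlettel,
Felhom.eu monitoring
`))

// renderEmail returns the subject line and the plain-text body.
func renderEmail(d emailData) (subject, body string, err error) {
	var sb strings.Builder
	if err := bodyTmpl.Execute(&sb, d); err != nil {
		return "", "", err
	}
	return fmt.Sprintf("[Felhom] %s: %s", d.Level, d.Summary), sb.String(), nil
}

func main() {
	subject, body, err := renderEmail(emailData{
		Level:   "Figyelmeztetés",
		Summary: "SSD lemez használat 85%",
		Message: "SSD lemez használat: 85% (küszöb: 80%)",
		Server:  "demo-felhom.eu",
		Time:    "2026-02-16 14:30",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(subject)
	fmt.Print(body)
}
```

`text/template` is used rather than `html/template` because the email is plain text and should not be HTML-escaped.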
### Monitoring page vs Settings page — what goes where
- **Rendszermonitor** shows: live ping status table (read-only), system metrics, alerts related to monitoring
- **Beállítások** shows: notification email + event preferences (editable), test button
- No overlap — monitoring shows status, settings allows configuration