TASK: Phase 2 — Monitoring Warnings, Dashboard Alerts & Notification System

Version target: 0.7.1
Repos: deploy-felhom-compose (controller) + felhom.eu (notification-relay on k3s)

Overview

Three workstreams in this phase:

  1. Monitoring page warnings — Show healthcheck ping configuration status, warn about missing UUIDs
  2. Dashboard alert system — Persistent in-app banners for active issues/warnings
  3. Notification system — Central email relay on k3s + customer-side preferences UI

1. Monitoring Page: Healthcheck Ping Status

1.1 Problem

The monitoring page shows system metrics but doesn't indicate whether healthcheck pings are actually configured. If a ping UUID is empty or CHANGEME, the pinger silently skips it — the customer has no visibility into whether remote monitoring is working.

1.2 New section: "Távoli monitoring" (Remote Monitoring)

Add a new section to monitoring.html between "Rendszer áttekintés" (System Overview) and "Rendszer metrikák" (System Metrics), i.e. between the current sections 1 and 2. The section is server-rendered rather than loaded via JS/API, because the ping configuration is static and known at page load.

Display a table showing each healthcheck ping's configuration status:

Ellenőrzés | UUID státusz | Gyakoriság
💓 Életjel (Heartbeat) | ✅ Beállítva | 5 percenként
🖥️ Rendszer állapot | ✅ Beállítva | 5 percenként
🗄️ Adatbázis mentés | ⚠️ Nincs beállítva | Naponta 02:30
💾 Biztonsági mentés | ✅ Beállítva | Naponta 03:00
🔍 Mentés integritás | ⚠️ Nincs beállítva | Hetente (vasárnap)

Logic for each row:

  • Read monitoring.ping_uuids.* from config
  • UUID is "configured" if: non-empty AND doesn't start with CHANGEME
  • If configured: show ✅ Beállítva (green text)
  • If not configured: show ⚠️ Nincs beállítva (yellow/orange warning text)
  • If monitoring is disabled entirely (monitoring.enabled = false): show a single warning banner instead of the table: "A távoli monitoring ki van kapcsolva. Az üzemeltető nem kap értesítést hibák esetén."

Summary banner above the table:

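  • All configured (green banner): "✅ Minden távoli monitoring aktív — az üzemeltető értesítést kap hibák esetén." (All remote monitoring is active; the operator is notified of failures.)
  • Some missing (yellow banner): "⚠️ Egyes monitoring ellenőrzések nincsenek beállítva. Kérd az üzemeltetőt a konfiguráláshoz." (Some monitoring checks are not configured. Ask the operator to configure them.)
  • Monitoring disabled (red/orange banner): the "monitoring is disabled" message from the list above, shown instead of the table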

1.3 Data flow

Add a new template data struct for the monitoring handler:

type MonitoringPageData struct {
    // Existing fields...
    SystemInfo     *system.Info
    ActivePage     string

    // New: healthcheck ping status
    MonitoringEnabled bool
    PingStatus        []PingStatusItem
    AllPingsConfigured bool
}

type PingStatusItem struct {
    Label      string // Hungarian display name
    Icon       string // emoji
    Configured bool   // UUID is valid
    Schedule   string // "5 percenként" / "Naponta 02:30" etc.
}

Build PingStatus slice in the handler from cfg.Monitoring.PingUUIDs:

pings := []PingStatusItem{
    {Label: "Életjel (Heartbeat)", Icon: "💓", Configured: isConfigured(cfg.Monitoring.PingUUIDs.Heartbeat), Schedule: "5 percenként"},
    {Label: "Rendszer állapot", Icon: "🖥️", Configured: isConfigured(cfg.Monitoring.PingUUIDs.SystemHealth), Schedule: "5 percenként"},
    {Label: "Adatbázis mentés", Icon: "🗄️", Configured: isConfigured(cfg.Monitoring.PingUUIDs.DBDump), Schedule: "Naponta " + cfg.Backup.DBDumpSchedule},
    {Label: "Biztonsági mentés", Icon: "💾", Configured: isConfigured(cfg.Monitoring.PingUUIDs.Backup), Schedule: "Naponta " + cfg.Backup.ResticSchedule},
    {Label: "Mentés integritás", Icon: "🔍", Configured: isConfigured(cfg.Monitoring.PingUUIDs.BackupIntegrity), Schedule: "Hetente (vasárnap)"},
}

func isConfigured(uuid string) bool {
    return uuid != "" && !strings.HasPrefix(uuid, "CHANGEME")
}
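The AllPingsConfigured flag and the choice of summary banner from section 1.2 can both be derived from the PingStatus slice. The following is a minimal sketch under that assumption; countMissing and bannerVariant are hypothetical helper names, and the variant strings "ok", "warn" and "disabled" are placeholders for whatever the template ends up switching on.

```go
package main

import "strings"

// PingStatusItem mirrors the struct from section 1.3 (only the fields used here).
type PingStatusItem struct {
	Label      string
	Configured bool
}

// isConfigured is the helper from section 1.3.
func isConfigured(uuid string) bool {
	return uuid != "" && !strings.HasPrefix(uuid, "CHANGEME")
}

// countMissing returns how many pings have no usable UUID.
func countMissing(pings []PingStatusItem) int {
	missing := 0
	for _, p := range pings {
		if !p.Configured {
			missing++
		}
	}
	return missing
}

// bannerVariant (hypothetical name) picks the summary banner:
// "disabled" when monitoring is off, "ok" when every ping is configured,
// "warn" when at least one UUID is missing.
func bannerVariant(monitoringEnabled bool, pings []PingStatusItem) string {
	switch {
	case !monitoringEnabled:
		return "disabled"
	case countMissing(pings) == 0:
		return "ok"
	default:
		return "warn"
	}
}
```

With this shape, AllPingsConfigured is simply countMissing(pings) == 0, and the same count can feed the "pings-missing" dashboard alert in section 2.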

1.4 CSS

Reuse existing .settings-row / .sysinfo-row pattern for the table. Add:

  • .ping-status-ok — green text (same as .state-text-green)
  • .ping-status-warn — orange/yellow text
  • .monitoring-banner — full-width banner with icon, green/yellow/red variants

2. Dashboard Alert System

2.1 Concept

Display persistent alert banners at the top of the main content area (below page header, above page content). Alerts are generated from the latest health check results and other events. They show on ALL pages, not just monitoring.

2.2 Alert sources

The controller already runs health checks every 5 minutes (RunHealthCheck). The resulting HealthReport contains Issues (critical) and Warnings (non-critical). Use these directly.

Additionally, generate alerts for:

  • Missing healthcheck ping UUIDs (from section 1 above)
  • Backup not configured (backup.enabled = false)
  • Hub reporting not configured when it should be
  • Recent backup failures (from backup manager state)

2.3 Implementation: Alert Manager

Create internal/web/alerts.go:

type Alert struct {
    ID       string // unique, for dismiss tracking
    Level    string // "error", "warning", "info"
    Message  string // Hungarian text
    Link     string // optional link to relevant page (e.g., "/monitoring", "/backups")
    LinkText string // "Részletek" etc.
}

type AlertManager struct {
    mu     sync.RWMutex
    alerts []Alert
    logger *log.Logger
}

Alert generation runs after each health check cycle (every 5 min):

func (am *AlertManager) Refresh(healthReport *HealthReport, cfg *config.Config, backupMgr *backup.Manager) {
    var alerts []Alert

    // From health check issues
    for _, issue := range healthReport.Issues {
        alerts = append(alerts, Alert{
            ID: "health-" + hash(issue), Level: "error",
            Message: issue, Link: "/monitoring", LinkText: "Rendszermonitor",
        })
    }

    // From health check warnings
    for _, w := range healthReport.Warnings {
        alerts = append(alerts, Alert{
            ID: "health-" + hash(w), Level: "warning",
            Message: w, Link: "/monitoring", LinkText: "Rendszermonitor",
        })
    }

    // Missing ping UUIDs
    missingCount := countMissingPings(cfg)
    if missingCount > 0 {
        alerts = append(alerts, Alert{
            ID: "pings-missing", Level: "warning",
            Message: fmt.Sprintf("%d monitoring ellenőrzés nincs beállítva", missingCount),
            Link: "/monitoring", LinkText: "Rendszermonitor",
        })
    }

    // Backup disabled
    if !cfg.Backup.Enabled {
        alerts = append(alerts, Alert{
            ID: "backup-disabled", Level: "warning",
            Message: "A biztonsági mentés nincs bekapcsolva",
            Link: "/settings", LinkText: "Beállítások",
        })
    }

    am.mu.Lock()
    am.alerts = alerts
    am.mu.Unlock()
}
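Refresh calls a hash helper that is not defined anywhere in this task. One possible sketch is shown below, assuming the only requirement is a short ID that stays stable while the message text stays the same; alertID is a hypothetical name, and SHA-256 truncated to 4 bytes is an arbitrary choice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// alertID (hypothetical helper) builds a stable alert ID from a prefix and
// the message text: the same message always yields the same ID, so an
// ongoing issue keeps its identity across 5-minute refresh cycles.
func alertID(prefix, message string) string {
	sum := sha256.Sum256([]byte(message))
	// 4 bytes (8 hex characters) is plenty to tell apart a handful of alerts.
	return prefix + "-" + hex.EncodeToString(sum[:4])
}
```

Note that a message embedding a changing value (for example a disk percentage) would get a new ID whenever the value changes. That is harmless here because alerts are not dismissible and are rebuilt from scratch on each cycle.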

2.4 Template integration

In layout_start template (or a new alerts partial), render alerts above the page content:

{{if .Alerts}}
<div class="alerts-container">
    {{range .Alerts}}
    <div class="alert-banner alert-banner-{{.Level}}">
        <span class="alert-icon">{{if eq .Level "error"}}🔴{{else if eq .Level "warning"}}🟡{{else}}ℹ️{{end}}</span>
        <span class="alert-message">{{.Message}}</span>
        {{if .Link}}<a href="{{.Link}}" class="alert-link">{{.LinkText}} →</a>{{end}}
    </div>
    {{end}}
</div>
{{end}}

Key decisions:

  • Alerts are NOT dismissible (they reflect real state — they disappear when the issue is resolved)
  • Maximum 5 alerts shown, with "+N more" indicator if overflow
  • On the monitoring page, skip the "pings-missing" alert since the detailed table is already visible
  • Error alerts (red) above warning alerts (yellow)
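The template in 2.4 renders whatever slice it receives, so the ordering, the 5-alert cap and the per-page skip have to happen before the data reaches it. A minimal sketch of that selection step follows; visibleAlerts, levelRank and maxVisibleAlerts are hypothetical names, and the Alert struct is trimmed to the fields needed here.

```go
package main

import "sort"

// Alert is a trimmed copy of the struct from section 2.3.
type Alert struct {
	ID      string
	Level   string // "error", "warning", "info"
	Message string
}

const maxVisibleAlerts = 5

// levelRank orders levels so errors sort first, then warnings, then the rest.
func levelRank(level string) int {
	switch level {
	case "error":
		return 0
	case "warning":
		return 1
	default:
		return 2
	}
}

// visibleAlerts returns the alerts to render plus the overflow count for the
// "+N more" indicator. skipID drops one alert by ID (e.g. "pings-missing" on
// the monitoring page); pass "" to keep everything.
func visibleAlerts(all []Alert, skipID string) (shown []Alert, more int) {
	for _, a := range all {
		if skipID != "" && a.ID == skipID {
			continue
		}
		shown = append(shown, a)
	}
	// Stable sort keeps the generation order within each level.
	sort.SliceStable(shown, func(i, j int) bool {
		return levelRank(shown[i].Level) < levelRank(shown[j].Level)
	})
	if len(shown) > maxVisibleAlerts {
		more = len(shown) - maxVisibleAlerts
		shown = shown[:maxVisibleAlerts]
	}
	return shown, more
}
```

Because the function copies into a fresh slice, it does not reorder the AlertManager's internal state, which matters if several handlers read it concurrently.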

2.5 Passing alerts to templates

Every page handler already passes template data via a struct. Add an Alerts []Alert field to each page's data struct (or use a shared base struct). The alert manager is available via the web server struct.

// In each handler:
data.Alerts = s.alertManager.GetAlerts()
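If the shared base struct route is taken, Go's struct embedding lets templates keep using {{.Alerts}} unchanged, because promoted fields are resolved by html/template. The sketch below illustrates this; BasePageData and renderAlerts are hypothetical names and the inline template is a stand-in for the real layout.

```go
package main

import (
	"bytes"
	"html/template"
)

// Alert is a trimmed copy of the struct from section 2.3.
type Alert struct {
	Level   string
	Message string
}

// BasePageData (hypothetical name) holds the fields every page needs.
type BasePageData struct {
	ActivePage string
	Alerts     []Alert
}

// MonitoringPageData embeds the base struct, so .Alerts and .ActivePage are
// promoted and visible to templates without any extra plumbing.
type MonitoringPageData struct {
	BasePageData
	MonitoringEnabled bool
}

// Stand-in for the alerts partial in layout_start.
const alertsTmpl = `{{range .Alerts}}[{{.Level}}] {{.Message}};{{end}}`

// renderAlerts executes the stand-in template against any page data struct.
func renderAlerts(data interface{}) (string, error) {
	tmpl, err := template.New("alerts").Parse(alertsTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

A handler would then set the embedded field once, for example data.BasePageData.Alerts = s.alertManager.GetAlerts(), or a small constructor could fill the base for every page.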

2.6 CSS

.alerts-container { margin-bottom: 1rem; }
.alert-banner {
    display: flex; align-items: center; gap: 0.75rem;
    padding: 0.75rem 1rem; border-radius: 8px; margin-bottom: 0.5rem;
    font-size: 0.9rem;
}
.alert-banner-error { background: rgba(248, 113, 113, 0.1); border: 1px solid rgba(248, 113, 113, 0.3); color: #f87171; }
.alert-banner-warning { background: rgba(250, 204, 21, 0.1); border: 1px solid rgba(250, 204, 21, 0.3); color: #facc15; }
.alert-banner-info { background: rgba(96, 165, 250, 0.1); border: 1px solid rgba(96, 165, 250, 0.3); color: #60a5fa; }
.alert-link { margin-left: auto; white-space: nowrap; }

3. Notification System

3.1 Architecture

Customer Node                              k3s Cluster
┌──────────────────────┐                   ┌──────────────────────────────┐
│  felhom-controller   │   HTTP POST       │  notification-relay          │
│                      │ ─────────────────>│  (notify.felhom.eu)          │
│  Event detected:     │  {customer_id,    │                              │
│  - disk_warning      │   event_type,     │  1. Validate API key         │
│  - backup_failed     │   message,        │  2. Format email             │
│  - ...               │   severity}       │  3. Send via Resend API      │
│                      │                   │  4. Return 200/4xx/5xx       │
└──────────────────────┘                   └──────────────────────────────┘
                                                      │
                                                      │ Resend API
                                                      ▼
                                               ┌──────────────┐
                                               │ Customer     │
                                               │ email inbox  │
                                               └──────────────┘

Why a relay?

  • Resend API key stays on trusted infrastructure (k3s), never on customer hardware
  • Central rate limiting and logging of all notifications
  • Operator visibility into what notifications were sent
  • Customer controllers only need hub URL + API key (already have these for hub reporting)

3.2 Notification Relay Service (k3s side)

Repo: felhom.eu — new directory notification-relay/ alongside hub/

This is a small Go service, similar to contact-mailer. Deploy on k3s at notify.felhom.eu (or as a path under hub, e.g., hub.felhom.eu/api/v1/notify).

Option A: Standalone service at notify.felhom.eu

  • Separate deployment, its own ingress
  • Clean separation of concerns
  • More k3s resources

Option B: Add notify endpoint to the existing hub

  • Hub already runs, has API key auth, knows customer IDs
  • Just add a POST /api/v1/notify endpoint
  • Reuse hub's Resend integration
  • Less infrastructure

Recommendation: Option B — Add to the hub. The hub already authenticates customers by API key and has all the context needed. Adding a /api/v1/notify endpoint is minimal work.

Hub notify endpoint

POST /api/v1/notify
Authorization: Bearer <customer_api_key>
Content-Type: application/json

{
    "customer_id": "demo-felhom",
    "event_type": "disk_warning",
    "severity": "warning",           // "info", "warning", "critical"
    "message": "SSD disk usage: 85%",
    "details": "Threshold: 80%"      // optional
}

Hub processing:

  1. Validate API key (same auth as report push)
  2. Look up customer notification preferences (stored in hub's SQLite)
  3. If customer has email configured AND event_type is in their enabled events:
    • Format email (Hungarian template)
    • Send via Resend API (direct HTTP call, same pattern as contact-mailer)
  4. Log the notification attempt and result
  5. Return 200 (accepted), 400 (bad request), 401 (unauthorized)
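Steps 1 and 5 above can be sketched as a decode-and-validate function plus a status mapping. This is only an illustration under the assumption that customer_id, event_type and message are mandatory; the task does not state which fields are required, and decodeNotifyRequest and notifyStatus are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
)

// NotifyRequest matches the JSON body of POST /api/v1/notify.
type NotifyRequest struct {
	CustomerID string `json:"customer_id"`
	EventType  string `json:"event_type"`
	Severity   string `json:"severity"`
	Message    string `json:"message"`
	Details    string `json:"details,omitempty"`
}

// decodeNotifyRequest parses and validates a request body.
// Assumption: customer_id, event_type and message must be non-empty.
func decodeNotifyRequest(body []byte) (NotifyRequest, error) {
	var req NotifyRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, err
	}
	if req.CustomerID == "" || req.EventType == "" || req.Message == "" {
		return req, errors.New("customer_id, event_type and message are required")
	}
	switch req.Severity {
	case "info", "warning", "critical":
		return req, nil
	default:
		return req, errors.New(`severity must be "info", "warning" or "critical"`)
	}
}

// notifyStatus maps the auth and validation outcome to the status codes
// listed in step 5: 401 before 400, otherwise 200.
func notifyStatus(authorized bool, decodeErr error) int {
	switch {
	case !authorized:
		return http.StatusUnauthorized
	case decodeErr != nil:
		return http.StatusBadRequest
	default:
		return http.StatusOK
	}
}
```

The hub should probably also check that customer_id in the body matches the customer the API key belongs to, so one customer cannot trigger email for another.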

Hub config additions (hub.yaml secret):

RESEND_API_KEY: "re_XZZenCJs..."    # Same key as healthchecks/contact-mailer
FROM_EMAIL: "monitoring@felhom.eu"

Customer notification config (hub-side storage)

The hub stores per-customer notification preferences in its SQLite DB:

CREATE TABLE customer_notifications (
    customer_id TEXT PRIMARY KEY,
    email TEXT NOT NULL DEFAULT '',                -- customer email address
    enabled_events TEXT NOT NULL DEFAULT '[]',     -- JSON array of event types
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
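Since enabled_events is stored as a JSON array in a TEXT column, the hub has to decode it before deciding whether to send. A minimal sketch of that check, with shouldSend as a hypothetical name and the two string arguments standing for the email and enabled_events columns of one row:

```go
package main

import "encoding/json"

// shouldSend reports whether the hub should email this customer for the
// given event type. An empty email or an event missing from the list means
// "do not send"; malformed JSON in the column is returned as an error.
func shouldSend(email, enabledEventsJSON, eventType string) (bool, error) {
	if email == "" {
		return false, nil
	}
	var events []string
	if err := json.Unmarshal([]byte(enabledEventsJSON), &events); err != nil {
		return false, err
	}
	for _, e := range events {
		if e == eventType {
			return true, nil
		}
	}
	return false, nil
}
```

With the column defaults from the table above (email '' and enabled_events '[]'), a freshly created row never triggers an email until the operator fills it in.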

How the customer's email and preferences get there:

  • Phase 2a (this task): Operator sets them manually via hub dashboard or SQLite
  • Phase 2b (future): Controller pushes notification preferences to hub along with reports, hub saves them

For now (2a), the operator configures each customer's email in the hub after setup. This avoids needing the controller to push preferences to the hub yet.

3.3 Controller-side: Notification Trigger

Add internal/notify/notifier.go:

type Notifier struct {
    hubURL     string
    apiKey     string
    customerID string // sent as customer_id in every request
    httpClient *http.Client
    logger     *log.Logger
    enabled    bool
    prefs      *settings.NotificationPrefs  // local preferences
}

// Notify sends a notification event to the hub relay.
// Non-blocking: fire and forget (errors are logged, with no aggressive retry).
func (n *Notifier) Notify(eventType, severity, message, details string) {
    if !n.enabled || !n.prefs.IsEventEnabled(eventType) {
        return
    }

    payload := NotifyRequest{
        CustomerID: n.customerID,
        EventType:  eventType,
        Severity:   severity,
        Message:    message,
        Details:    details,
    }
    // post (not shown) does the HTTP POST to hubURL + "/api/v1/notify"
    // with the Bearer API key and logs any failure.
    go n.post(payload)
}
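The HTTP POST itself is left open in the Notifier code. One way to build the request is sketched below, assuming the hub accepts the same Bearer API key as for report pushes (as section 3.2 states); newNotifyHTTPRequest is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"strings"
)

// NotifyRequest matches the JSON body of POST /api/v1/notify.
type NotifyRequest struct {
	CustomerID string `json:"customer_id"`
	EventType  string `json:"event_type"`
	Severity   string `json:"severity"`
	Message    string `json:"message"`
	Details    string `json:"details,omitempty"`
}

// newNotifyHTTPRequest builds the POST request for the hub notify endpoint.
// A trailing slash on hubURL is tolerated.
func newNotifyHTTPRequest(hubURL, apiKey string, payload NotifyRequest) (*http.Request, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(hubURL, "/") + "/api/v1/notify"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

The caller would pass the request to httpClient.Do, close the response body, and log any non-2xx status. Setting a Timeout on the http.Client keeps a slow hub from leaving goroutines hanging.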

Integration points — trigger notifications from:

  1. monitor/healthcheck.go — after RunHealthCheck, if status changed from ok to warn/fail
  2. backup/backup.go — after backup failure
  3. backup/dbdump.go — after DB dump failure
  4. scheduler/scheduler.go — after integrity check failure

Important: Don't spam. Track last notification time per event type. Don't re-notify for the same ongoing issue within 6 hours (configurable). This is handled locally with an in-memory map.

3.4 Notification Preferences (settings.json + settings page)

Expand the NotificationPrefs struct:

type NotificationPrefs struct {
    // Customer email for notifications. Stored locally in Phase 2a; the hub, which
    // delivers via Resend, keeps its own operator-managed copy until sync is added.
    Email string `json:"email,omitempty"`

    // Which events to be notified about
    EnabledEvents []string `json:"enabled_events,omitempty"`

    // Notification cooldown in hours (don't re-send for same issue within this period)
    CooldownHours int `json:"cooldown_hours,omitempty"` // default: 6
}

// Default events if not configured
var DefaultCustomerEvents = []string{
    "disk_warning",
    "backup_failed",
    "update_available",
}

3.5 Settings page: Notification Preferences UI

Add a third section to the existing settings page (below "Jelszó módosítás"):

Section C: "Értesítések" (Notifications)

Only shown if hub is enabled (hub.enabled = true). If hub is disabled, show info message: "Az értesítések a központi rendszeren keresztül működnek, ami jelenleg nincs bekapcsolva."

┌─────────────────────────────────────────────────────────┐
│ Értesítések                                              │
│                                                          │
│ E-mail cím:  [________________________] (text input)     │
│                                                          │
│ Az alábbi eseményekről kapjon értesítést:                 │
│                                                          │
│ [x] Lemez figyelmeztetés (80%+)                          │
│ [x] Biztonsági mentés sikertelen                         │
│ [x] Frissítés elérhető                                   │
│ [ ] Biztonsági frissítés                                 │
│                                                          │
│ Értesítési szünet: [6] óra                               │
│ (Azonos probléma esetén ennyi ideig nem küld újat)       │
│                                                          │
│ [Mentés]                            [Teszt email küldése]│
└─────────────────────────────────────────────────────────┘

Route:

Method | Path | Auth? | Handler
POST | /settings/notifications | Yes | Save notification preferences
POST | /settings/notifications/test | Yes | Send test notification via hub relay

POST /settings/notifications handler:

  1. Parse form: email, enabled_events (checkbox list), cooldown_hours
  2. Validate email format (basic regex, allow empty = disable)
  3. Save to settings.json under the notifications key
  4. Show success flash: "Értesítési beállítások mentve."
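The "basic regex, allow empty" check from step 2 might look like the sketch below. The pattern is deliberately loose (something@something.tld with no spaces) and is an assumption, not a full RFC 5322 validator; validNotificationEmail is a hypothetical name.

```go
package main

import (
	"regexp"
	"strings"
)

// basicEmailRe accepts "local@domain.tld" with no whitespace and exactly
// one "@". It is intentionally permissive; the test email button is the
// real proof that an address works.
var basicEmailRe = regexp.MustCompile(`^[^@\s]+@[^@\s]+\.[^@\s]+$`)

// validNotificationEmail returns true for an empty value (which disables
// notifications) or for anything matching the basic pattern. Surrounding
// whitespace from the form field is ignored.
func validNotificationEmail(input string) bool {
	email := strings.TrimSpace(input)
	if email == "" {
		return true
	}
	return basicEmailRe.MatchString(email)
}
```

The handler should store the trimmed value, so that a stray space from copy-paste does not end up in settings.json.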

POST /settings/notifications/test handler:

  1. Read current notification preferences from settings
  2. Send a test notification via the hub relay:
    {
        "customer_id": "demo-felhom",
        "event_type": "test",
        "severity": "info",
        "message": "Teszt értesítés a Felhom rendszerből",
        "details": "Ha ezt az emailt megkapta, az értesítések megfelelően működnek."
    }
    
  3. Show result: "Teszt email elküldve." or error message

3.6 Event Types Reference

Event type | Severity | Trigger | Hungarian label
disk_warning | warning | Disk usage >= warn threshold | Lemez figyelmeztetés
disk_critical | critical | Disk usage >= crit threshold | Lemez kritikus
backup_failed | critical | Restic snapshot failed | Biztonsági mentés sikertelen
db_dump_failed | critical | DB dump failed | Adatbázis mentés sikertelen
update_available | info | New controller version available | Frissítés elérhető
security_update | warning | Security update available | Biztonsági frissítés
container_unhealthy | warning | Protected container not running | Alkalmazás leállt
integrity_failed | warning | Weekly restic check failed | Mentés integritás hiba
test | info | Manual test from settings page | Teszt

4. Implementation Order

Step 1: Monitoring page — ping status section

  • Add PingStatusItem struct and builder in monitoring handler
  • Add "Távoli monitoring" section to monitoring.html
  • Add CSS for ping status rows and banner
  • Test: Check monitoring page shows ✅/⚠️ for each ping UUID

Step 2: Alert manager + dashboard banners

  • Create internal/web/alerts.go with AlertManager
  • Wire AlertManager into health check cycle
  • Add alert rendering to layout_start template
  • Add CSS for alert banners
  • Test: Set a ping UUID to empty → warning banner appears on all pages. Fix it → banner disappears.

Step 3: Hub notification endpoint (felhom.eu repo)

  • Add Resend API key to hub's k8s secret
  • Add POST /api/v1/notify endpoint to hub
  • Add customer_notifications table to hub's SQLite
  • Add email sending via Resend HTTP API (not SMTP — direct API call)
  • Add hub admin page or CLI to set customer email/preferences
  • Test: curl -X POST https://hub.felhom.eu/api/v1/notify -H "Authorization: Bearer ..." -H "Content-Type: application/json" -d '{"customer_id":"demo-felhom","event_type":"test","severity":"info","message":"Test"}' → email arrives

Step 4: Controller-side notifier

  • Create internal/notify/notifier.go
  • Wire into health check, backup, and DB dump flows
  • Add cooldown tracking (in-memory map, not persisted)
  • Test: Trigger a disk warning → notification sent to hub → email arrives

Step 5: Notification preferences UI

  • Expand NotificationPrefs struct in settings.go
  • Add "Értesítések" section to settings.html
  • Add POST handlers for save and test
  • Push email preference to hub when saving (optional — can be deferred)
  • Test: Set email → save → test → email arrives → change events → save → verify filtering works

Step 6: Cleanup & version bump

  • Update CONTEXT.md
  • Bump controller version to 0.7.1
  • Bump hub version accordingly
  • Build + deploy both → verify on demo-felhom.eu

5. Files to Create / Modify

Controller repo (deploy-felhom-compose):

New files:

  • controller/internal/web/alerts.go — AlertManager
  • controller/internal/notify/notifier.go — Hub notification client

Modified files:

  • controller/internal/web/server.go — Add AlertManager, wire into handlers, pass alerts to all templates
  • controller/internal/web/handlers.go — Monitoring handler: add ping status data. Settings handler: add notification preferences section + POST handlers.
  • controller/internal/web/templates/monitoring.html — Add "Távoli monitoring" section
  • controller/internal/web/templates/settings.html — Add "Értesítések" section
  • controller/internal/web/templates/layout.html — Add alert banner rendering
  • controller/internal/web/templates/style.css — New styles for alerts and ping status
  • controller/internal/settings/settings.go — Expand NotificationPrefs struct
  • controller/internal/monitor/healthcheck.go — After health check, update AlertManager + trigger notifications
  • controller/internal/backup/backup.go — Trigger notification on backup failure
  • controller/internal/backup/dbdump.go — Trigger notification on dump failure
  • controller/cmd/controller/main.go — Initialize Notifier, AlertManager, wire dependencies

Hub repo (felhom.eu):

Modified files:

  • hub/internal/api/server.go (or new notify.go) — Add POST /api/v1/notify endpoint
  • hub/internal/store/store.go — Add customer_notifications table + queries
  • hub/cmd/hub/main.go — Add Resend API key config
  • manifests/hub.yaml — Add RESEND_API_KEY to hub secret

6. Design Decisions & Notes

Why add notify to the hub instead of a new service?

  • Hub already authenticates customers, has SQLite, knows customer IDs
  • Adding one endpoint is simpler than deploying+maintaining a separate service
  • Shared Resend API key, shared k8s secret
  • One less DNS record, ingress, deployment to manage

Customer email configuration flow (Phase 2a — operator-managed)

For now, the operator sets each customer's email via direct SQLite or a simple hub admin endpoint. The customer can see and change their email in the controller's settings page, but actually syncing this to the hub is deferred — the controller just stores it locally. The operator manually ensures the hub has the right email.

This is acceptable for the initial small customer base. Phase 2b (future) will add automatic preference sync via the report push.

Notification cooldown

The controller tracks in-memory when each event type was last notified. If the same event type fires again within the cooldown period (default 6 hours), the notification is suppressed. This prevents email spam during prolonged issues (e.g., disk stays at 85% for days).

The cooldown resets on controller restart, which is fine — restarting the controller during an active issue should re-trigger a notification.

Dashboard alerts are state-based, not event-based

Alerts reflect current system state. They're regenerated every 5 minutes from the latest health check. When the issue resolves, the alert disappears. No persistence needed — alerts live in memory only.

Resend API usage from hub

Use Resend's HTTP API directly (POST to https://api.resend.com/emails) rather than SMTP. This avoids SMTP connection management complexity and is more idiomatic for a Go service. The contact-mailer already demonstrates this pattern.

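// Example Resend API call (payload is the JSON-encoded email; fragment from a function returning error)
req, err := http.NewRequest(http.MethodPost, "https://api.resend.com/emails", bytes.NewReader(payload))
if err != nil {
    return fmt.Errorf("build Resend request: %w", err)
}
req.Header.Set("Authorization", "Bearer "+resendAPIKey)
req.Header.Set("Content-Type", "application/json")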

Email template

Notifications should be simple, text-focused emails in Hungarian:

Tárgy: [Felhom] Figyelmeztetés: SSD lemez használat 85%

Kedves Ügyfél!

A Felhom rendszered a következő figyelmeztetést jelezte:

SSD lemez használat: 85% (küszöb: 80%)

Részletek:
- Szerver: demo-felhom.eu
- Időpont: 2026-02-16 14:30
- Szint: Figyelmeztetés

Ha kérdésed van, vedd fel a kapcsolatot az üzemeltetővel.

Üdvözlettel,
Felhom.eu monitoring

Monitoring page vs Settings page — what goes where

  • Rendszermonitor shows: live ping status table (read-only), system metrics, alerts related to monitoring
  • Beállítások shows: notification email + event preferences (editable), test button
  • No overlap — monitoring shows status, settings allows configuration