From d1032a3a4f9fbb5a6f3a6d1ef0bba341b9b09077 Mon Sep 17 00:00:00 2001
From: kisfenyo
Date: Mon, 16 Feb 2026 19:34:24 +0100
Subject: [PATCH] =?UTF-8?q?Update=20CONTEXT.md=20for=20session=2023=20?=
 =?UTF-8?q?=E2=80=94=20v0.7.1=20Phase=202=20summary?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.6
---
 CONTEXT.md | 72 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 59 insertions(+), 13 deletions(-)

diff --git a/CONTEXT.md b/CONTEXT.md
index 4c867b7..6e6599d 100644
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -7,7 +7,7 @@
 >
 > Ask Claude Code: "Please update CONTEXT.md with what we did today"
 
-Last updated: 2026-02-16 (session 22)
+Last updated: 2026-02-16 (session 23)
 
 ---
 
@@ -22,17 +22,59 @@ Last updated: 2026-02-16 (session 22)
 ## Current project state
 
 ### felhom-controller (this repo)
-- **Version:** v0.7.0
+- **Version:** v0.7.1
 - **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
 - **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
 - **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger, **dedicated backup page**)
 - **Phase 4:** ✅ COMPLETE — Monitoring Page with Metrics Store (SQLite, Chart.js, system + container metrics)
 - **Phase 5:** ✅ COMPLETE — Authentication, Persistence & Settings Page (settings.json, password change, session management)
+- **Phase 6:** ✅ COMPLETE — Monitoring Warnings, Dashboard Alerts & Notification System
 - **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
 - **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
 - **All Phase 1-5 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups, backup detail page, system monitoring page, settings page
 
-### What was just completed (2026-02-16 session 22)
+### What was just completed (2026-02-16 session 23)
+- **v0.7.1 — Phase 2: Monitoring Warnings, Dashboard
Alerts & Notification System:**
+  - **Three workstreams across two repos** (deploy-felhom-compose + felhom.eu):
+  - **Monitoring page "Távoli monitoring" section** (`monitoring.html`, `handlers.go`):
+    - New section between System Overview and System Metrics showing healthcheck ping UUID status
+    - 5 rows: Heartbeat, System Health, DB Dump, Backup, Backup Integrity — each shows ✅ configured or ⚠️ missing
+    - Banner: green (all configured), yellow (some missing), red (monitoring disabled)
+    - `isPingConfigured()` helper checks non-empty AND not "CHANGEME" prefix
+  - **Dashboard alert banners** (new `alerts.go`, `layout.html`):
+    - `AlertManager` struct with `Refresh()` + `GetAlerts()` — generates alerts from health report, missing pings, backup disabled
+    - Alert types: `Alert{ID, Level, Message, Link, LinkText}` — levels: error/warning/info
+    - Renders colored banners (red/yellow/blue) after `
` on all pages
+    - Caps at 5 alerts with "+N more" overflow; monitoring page excludes "pings-missing" (shown in table instead)
+    - Refreshed every 5 min via system-health scheduler task + once at startup
+  - **Hub notification relay** (felhom.eu repo — `hub/internal/api/handler.go`, `hub/internal/store/store.go`):
+    - `POST /api/v1/notify` endpoint: Bearer auth, JSON payload (customer_id, event_type, severity, message, details)
+    - New `customer_notifications` table (email, enabled_events JSON) + `notification_log` audit table
+    - Resend email integration: direct HTTP POST to `https://api.resend.com/emails`
+    - Hungarian email template with event details, timestamp, severity
+    - `hub.yaml.example` updated with notifications config section
+  - **Controller-side notifier** (new `internal/notify/notifier.go`):
+    - `Notifier` struct: fires HTTP POST to hub `/api/v1/notify`, non-blocking (goroutine)
+    - Cooldown tracking per event type (default 6h, configurable via UI)
+    - Checks notification preferences (email configured + event enabled) before sending
+    - `NotifyHealthChange()`: only notifies on status degradation (ok→warn, ok→fail, warn→fail)
+    - `NotifyBackupFailed/NotifyDBDumpFailed/NotifyIntegrityFailed` convenience methods
+    - `SendTest()` for test email flow
+    - Wired into scheduler: system-health task calls `NotifyHealthChange()`, backup tasks call failure notifiers
+  - **Notification preferences UI** (`settings.html`, `handlers.go`):
+    - New "Értesítések" Section C on Settings page (only shown when hub enabled)
+    - Email input, 4 event checkboxes (disk_warning, backup_failed, update_available, security_update)
+    - Cooldown hours input (default 6)
+    - "Mentés" + "Teszt email küldése" buttons
+    - Saved to `settings.json` via `NotificationPrefs` struct (Email, EnabledEvents, CooldownHours)
+  - **Settings persistence expanded** (`settings.go`):
+    - `NotificationPrefs` struct with Email, EnabledEvents, CooldownHours
+    - `DefaultEnabledEvents`: disk_warning,
backup_failed, update_available
+    - `GetNotificationPrefs()` returns defaults if nil, `SetNotificationPrefs()` saves atomically
+  - **Files changed**: 3 new (alerts.go, notifier.go, notify package), ~12 modified across both repos
+  - **Deployed:** Controller v0.7.1 to demo-felhom.eu, verified healthy (0 alerts on clean system)
+
+### What was previously completed (2026-02-16 session 22)
 - **v0.7.0 — Phase 1: Authentication, Persistence & Settings Page:**
   - **New `internal/settings/settings.go`:** Shared persistence layer via `settings.json` in the data directory. Atomic writes (tmp + rename), thread-safe with `sync.RWMutex`. Stores password hash overrides and DB validation cache. Graceful handling if file doesn't exist.
   - **Auth improvements:**
@@ -427,19 +469,18 @@ Last updated: 2026-02-16 (session 22)
 7. Documentation: restart vs up -d for image updates
 
 ### What's next (priorities)
-1. **Manual steps for v0.6.0** — Viktor needs to:
-   - Create 5 healthcheck checks on status.felhom.eu with correct periods/grace
-   - Update controller.yaml on demo-felhom with real UUIDs
-   - Build + deploy felhom-hub to k3s (`cd hub && make docker-push`, `kubectl apply -f manifests/hub.yaml`)
-   - Configure hub.felhom.eu DNS in Cloudflare
-   - Enable hub reporting on demo-felhom (`hub.enabled: true`, `hub.api_key: `)
-2. **Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
-3. **Test backup integrity check** — wait for Sunday 04:00 or manually trigger
+1. **Manual steps for v0.7.1** — Viktor needs to:
+   - Rebuild + redeploy felhom-hub to k3s (hub code updated with notification endpoint + Resend integration)
+   - Configure `notifications.resend_api_key` in hub.yaml
+   - Set notification email in Settings → Értesítések on demo-felhom
+   - Test notification flow end-to-end (Settings → "Teszt email küldése")
+2. **Test alert banners** — Configure some missing ping UUIDs or disable backup to verify yellow/red banners appear
+3.
**Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
 4. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
 5. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets)
 6. Test on Raspberry Pi (pi-customer-1)
-7. Phase 4: Self-update mechanism
-8. v0.6.1: Hub alerting (webhook to Healthchecks for stale customers)
+7. Self-update mechanism
+8. Hub alerting (webhook to Healthchecks for stale customers)
 
 ## Architecture decisions
 
@@ -471,6 +512,11 @@ Last updated: 2026-02-16 (session 22)
 | DB discovery via docker inspect | No config needed — discovers postgres/mariadb containers by image name + env vars |
 | Backup orchestrator with running flag | Prevents concurrent backups, supports both scheduled and manual trigger |
 | modernc.org/sqlite (pure Go) | No CGO/gcc needed in Docker build stage — keeps `CGO_ENABLED=0` static binary |
+| AlertManager state-based refresh | Alerts regenerated every 5min from health report — no persistent storage needed, always reflects current state |
+| Notification relay via hub | Controller → hub → Resend → email. Hub acts as central relay: knows customer email, handles Resend API. Controller only needs hub URL + API key |
+| In-memory notification cooldowns | Per-event-type cooldown map (default 6h). Lost on restart = acceptable (better to re-notify than miss). No persistence needed |
+| Health status change detection | Only notify on degradation (ok→warn, ok→fail, warn→fail). Avoids spam on flapping. First run records baseline, doesn't notify |
+| Resend HTTP API (no SMTP) | Direct POST to api.resend.com — same pattern as website contact-mailer.
Simpler than SMTP setup, good deliverability |
 | Chart.js embedded locally | Customer hardware may not have internet — CDN not reliable for offline environments |
 | Metrics downsampling via SQL | Bucket-based AVG in GROUP BY keeps Chart.js responsive with up to 30 days of data |
 | 60s metrics collection interval | Good balance of resolution vs. storage — ~44K rows/month for system metrics |