Update CONTEXT.md for session 23 — v0.7.1 Phase 2 summary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+59
-13
@@ -7,7 +7,7 @@
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"
Last updated: 2026-02-16 (session 22)
Last updated: 2026-02-16 (session 23)
---
@@ -22,17 +22,59 @@ Last updated: 2026-02-16 (session 22)
## Current project state
### felhom-controller (this repo)
- **Version:** v0.7.0
- **Version:** v0.7.1
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
- **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger, **dedicated backup page**)
- **Phase 4:** ✅ COMPLETE — Monitoring Page with Metrics Store (SQLite, Chart.js, system + container metrics)
- **Phase 5:** ✅ COMPLETE — Authentication, Persistence & Settings Page (settings.json, password change, session management)
- **Phase 6:** ✅ COMPLETE — Monitoring Warnings, Dashboard Alerts & Notification System
- **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1-5 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups, backup detail page, system monitoring page, settings page
### What was just completed (2026-02-16 session 22)
### What was just completed (2026-02-16 session 23)
- **v0.7.1 — Phase 2: Monitoring Warnings, Dashboard Alerts & Notification System:**
- **Three workstreams across two repos** (deploy-felhom-compose + felhom.eu):
- **Monitoring page "Távoli monitoring" section** (`monitoring.html`, `handlers.go`):
- New section between System Overview and System Metrics showing healthcheck ping UUID status
- 5 rows: Heartbeat, System Health, DB Dump, Backup, Backup Integrity — each shows ✅ configured or ⚠️ missing
- Banner: green (all configured), yellow (some missing), red (monitoring disabled)
- `isPingConfigured()` helper returns true only when the ping UUID is non-empty and does not start with the "CHANGEME" placeholder
- **Dashboard alert banners** (new `alerts.go`, `layout.html`):
- `AlertManager` struct with `Refresh()` + `GetAlerts()` — generates alerts from health report, missing pings, backup disabled
- Alert types: `Alert{ID, Level, Message, Link, LinkText}` — levels: error/warning/info
- Renders colored banners (red/yellow/blue) after `<main class="content">` on all pages
- Caps at 5 alerts with "+N more" overflow; monitoring page excludes "pings-missing" (shown in table instead)
- Refreshed every 5 min via system-health scheduler task + once at startup
- **Hub notification relay** (felhom.eu repo — `hub/internal/api/handler.go`, `hub/internal/store/store.go`):
- `POST /api/v1/notify` endpoint: Bearer auth, JSON payload (customer_id, event_type, severity, message, details)
- New `customer_notifications` table (email, enabled_events JSON) + `notification_log` audit table
- Resend email integration: direct HTTP POST to `https://api.resend.com/emails`
- Hungarian email template with event details, timestamp, severity
- `hub.yaml.example` updated with notifications config section
- **Controller-side notifier** (new `internal/notify/notifier.go`):
- `Notifier` struct: fires HTTP POST to hub `/api/v1/notify`, non-blocking (goroutine)
- Cooldown tracking per event type (default 6h, configurable via UI)
- Checks notification preferences (email configured + event enabled) before sending
- `NotifyHealthChange()`: only notifies on status degradation (ok→warn, ok→fail, warn→fail)
- `NotifyBackupFailed/NotifyDBDumpFailed/NotifyIntegrityFailed` convenience methods
- `SendTest()` for test email flow
- Wired into scheduler: system-health task calls `NotifyHealthChange()`, backup tasks call failure notifiers
- **Notification preferences UI** (`settings.html`, `handlers.go`):
- New "Értesítések" Section C on Settings page (only shown when hub enabled)
- Email input, 4 event checkboxes (disk_warning, backup_failed, update_available, security_update)
- Cooldown hours input (default 6)
- "Mentés" + "Teszt email küldése" buttons
- Saved to `settings.json` via `NotificationPrefs` struct (Email, EnabledEvents, CooldownHours)
- **Settings persistence expanded** (`settings.go`):
- `NotificationPrefs` struct with Email, EnabledEvents, CooldownHours
- `DefaultEnabledEvents`: disk_warning, backup_failed, update_available
- `GetNotificationPrefs()` returns defaults if nil, `SetNotificationPrefs()` saves atomically
- **Files changed**: 3 new (alerts.go, notifier.go, notify package), ~12 modified across both repos
- **Deployed:** Controller v0.7.1 to demo-felhom.eu, verified healthy (0 alerts on clean system)
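
The ping check and the alert cap described in the list above could look roughly like this in Go. This is a minimal sketch, not the code from `handlers.go` or `alerts.go`: only the `isPingConfigured` name and the `Alert` field names come from the notes, while the `capAlerts` helper, the `maxAlerts` constant, and the string field types are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isPingConfigured reports whether a healthcheck ping UUID looks usable:
// it must be non-empty and must not start with the "CHANGEME" placeholder.
func isPingConfigured(uuid string) bool {
	return uuid != "" && !strings.HasPrefix(uuid, "CHANGEME")
}

// Alert uses the field names listed in the notes; the types are assumed.
type Alert struct {
	ID, Level, Message, Link, LinkText string
}

// maxAlerts is a hypothetical constant for the "caps at 5 alerts" rule.
const maxAlerts = 5

// capAlerts (hypothetical helper) returns the alerts to render and the
// number hidden behind the "+N more" overflow line.
func capAlerts(alerts []Alert) (shown []Alert, overflow int) {
	if len(alerts) <= maxAlerts {
		return alerts, 0
	}
	return alerts[:maxAlerts], len(alerts) - maxAlerts
}

func main() {
	for _, uuid := range []string{"", "CHANGEME-heartbeat", "example-uuid"} {
		fmt.Printf("%q configured: %v\n", uuid, isPingConfigured(uuid))
	}

	shown, overflow := capAlerts(make([]Alert, 7))
	fmt.Printf("showing %d alerts, +%d more\n", len(shown), overflow)
}
```

Returning the overflow count separately keeps the template logic simple: it can render the visible banners and print the "+N more" line only when the count is non-zero.
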
### What was previously completed (2026-02-16 session 22)
- **v0.7.0 — Phase 1: Authentication, Persistence & Settings Page:**
- **New `internal/settings/settings.go`:** Shared persistence layer via `settings.json` in the data directory. Atomic writes (tmp + rename), thread-safe with `sync.RWMutex`. Stores password hash overrides and DB validation cache. Graceful handling if file doesn't exist.
- **Auth improvements:**
@@ -427,19 +469,18 @@ Last updated: 2026-02-16 (session 22)
7. Documentation: restart vs up -d for image updates
### What's next (priorities)
1. **Manual steps for v0.6.0** — Viktor needs to:
- Create 5 healthcheck checks on status.felhom.eu with correct periods/grace
- Update controller.yaml on demo-felhom with real UUIDs
- Build + deploy felhom-hub to k3s (`cd hub && make docker-push`, `kubectl apply -f manifests/hub.yaml`)
- Configure hub.felhom.eu DNS in Cloudflare
- Enable hub reporting on demo-felhom (`hub.enabled: true`, `hub.api_key: <key>`)
2. **Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
3. **Test backup integrity check** — wait for Sunday 04:00 or manually trigger
1. **Manual steps for v0.7.1** — Viktor needs to:
- Rebuild + redeploy felhom-hub to k3s (hub code updated with notification endpoint + Resend integration)
- Configure `notifications.resend_api_key` in hub.yaml
- Set notification email in Settings → Értesítések on demo-felhom
- Test notification flow end-to-end (Settings → "Teszt email küldése")
2. **Test alert banners** — Leave some ping UUIDs unconfigured or disable backup to verify that the yellow/red banners appear
3. **Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
4. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
5. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets)
6. Test on Raspberry Pi (pi-customer-1)
7. Phase 4: Self-update mechanism
8. v0.6.1: Hub alerting (webhook to Healthchecks for stale customers)
7. Self-update mechanism
8. Hub alerting (webhook to Healthchecks for stale customers)
## Architecture decisions
@@ -471,6 +512,11 @@ Last updated: 2026-02-16 (session 22)
| DB discovery via docker inspect | No config needed — discovers postgres/mariadb containers by image name + env vars |
| Backup orchestrator with running flag | Prevents concurrent backups, supports both scheduled and manual trigger |
| modernc.org/sqlite (pure Go) | No CGO/gcc needed in Docker build stage — keeps `CGO_ENABLED=0` static binary |
| AlertManager state-based refresh | Alerts regenerated every 5min from health report — no persistent storage needed, always reflects current state |
| Notification relay via hub | Controller → hub → Resend → email. Hub acts as central relay: knows customer email, handles Resend API. Controller only needs hub URL + API key |
| In-memory notification cooldowns | Per-event-type cooldown map (default 6h). Lost on restart = acceptable (better to re-notify than miss). No persistence needed |
| Health status change detection | Only notify on degradation (ok→warn, ok→fail, warn→fail). Avoids spam on flapping. First run records baseline, doesn't notify |
| Resend HTTP API (no SMTP) | Direct POST to api.resend.com — same pattern as website contact-mailer. Simpler than SMTP setup, good deliverability |
| Chart.js embedded locally | Customer hardware may not have internet — CDN not reliable for offline environments |
| Metrics downsampling via SQL | Bucket-based AVG in GROUP BY keeps Chart.js responsive with up to 30 days of data |
| 60s metrics collection interval | Good balance of resolution vs. storage — ~44K rows/month for system metrics |