
# Felhom Hub — Changelog
## v0.3.0 (2026-02-20)
**Hub Monitoring Takeover — Event System, Dead Man's Switch, Notifications**
Replaces external Healthchecks.io with a Hub-native event system. The Hub becomes the single source of truth for all customer monitoring, event tracking, dead man's switch alerting, and notification delivery.
### Phase 1 — Event System
- **`events` table** in SQLite: stores all events with customer_id, event_type, severity, message, details_json, source, timestamp
- **Indexes**: `idx_events_customer_created` (customer + time DESC), `idx_events_type` (type + time DESC)
- **Store methods**: `SaveEvent`, `GetRecentEvents`, `GetEventsByType`, `GetLatestEventByType`, `GetAllRecentEvents`, `CountEventsBySeverity`, `PruneEvents`, `GetActiveCustomerIDs`
- **`POST /api/v1/event`** endpoint: accepts structured events from controllers, validates event_type against 27 allowed types, validates severity (info/warning/error), stores in DB
- **Enhanced auth**: `checkAuthCustomer()` verifies that a per-customer API key matches the `customer_id` in the payload; the global key bypasses the ownership check
- **Prune**: events pruned alongside reports at 04:30 Budapest time
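
The validation step of `POST /api/v1/event` can be sketched roughly as follows. This is a minimal illustration, not the Hub's actual handler: the `Event` struct, `validateEvent`, and the required `customer_id` check are assumptions, and `allowedTypes` holds only the event types named in this changelog rather than the full list of 27.

```go
package main

import (
	"errors"
	"fmt"
)

// Event mirrors some of the columns listed for the events table.
// The Go field names and JSON tags are assumptions for this sketch.
type Event struct {
	CustomerID string `json:"customer_id"`
	EventType  string `json:"event_type"`
	Severity   string `json:"severity"`
	Message    string `json:"message"`
}

// allowedTypes is a partial, illustrative set: only the event types that
// this changelog mentions by name. The real list has 27 entries.
var allowedTypes = map[string]bool{
	"node_stale": true, "node_down": true, "node_recovered": true,
	"backup_completed": true, "db_dump_completed": true,
	"expected_backup_missed": true, "expected_dbdump_missed": true,
	"test": true,
}

// allowedSeverities holds the three severities named in the changelog.
var allowedSeverities = map[string]bool{"info": true, "warning": true, "error": true}

// validateEvent rejects events with a missing customer ID (an assumption),
// an unknown event type, or an invalid severity.
func validateEvent(e Event) error {
	if e.CustomerID == "" {
		return errors.New("customer_id is required")
	}
	if !allowedTypes[e.EventType] {
		return fmt.Errorf("unknown event_type %q", e.EventType)
	}
	if !allowedSeverities[e.Severity] {
		return fmt.Errorf("invalid severity %q", e.Severity)
	}
	return nil
}

func main() {
	ok := Event{CustomerID: "cust-1", EventType: "backup_completed", Severity: "info"}
	bad := Event{CustomerID: "cust-1", EventType: "backup_completed", Severity: "critical"}
	fmt.Println(validateEvent(ok))
	fmt.Println(validateEvent(bad))
}
```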
### Phase 2 — Dead Man's Switch
- **Staleness checker** (`internal/monitor/staleness.go`): runs every 60s, detects when controllers stop reporting
  - ok→stale (>30min): inserts `node_stale` warning event
  - any→down (>60min): inserts `node_down` error event
  - stale/down→ok: inserts `node_recovered` info event
  - Skips blocked customers, no false alerts on startup
- **Backup deadline checker** (`internal/monitor/deadline.go`): runs daily at 05:00 Budapest
  - Detects missing `backup_completed` events since midnight → inserts `expected_backup_missed` error
  - Detects missing `db_dump_completed` events → inserts `expected_dbdump_missed` error
  - Grace: skips customers with `node_down` state
- **`scheduleDaily()`** helper: goroutine that sleeps until target time (Europe/Budapest), runs function, loops
- **`/healthz`** enhanced: returns 503 if SQLite Ping fails
### Phase 3 — Notification System
- **Dispatcher** (`internal/notify/dispatcher.go`): processes events and sends emails via Resend API
  - **Operator channel**: English emails to operator for warning/error events, 1h cooldown per customer:eventType
  - **Customer channel**: Hungarian emails per event_type, respects customer preferences (enabled_events, cooldown_hours), blocked customers skipped
  - **Test bypass**: `test` event type skips cooldown/preferences, sends directly to customer email
- **Email templates** (`internal/notify/templates.go`): operator (concise English), customer (Hungarian per event type with complete message table)
- **Cooldown tracking**: in-memory maps with per-customer:eventType granularity
- **`customer_notifications` table**: added `cooldown_hours` column (default 6)
- **`notification_log` table**: added `channel` column (operator/customer)
- Wired into `/api/v1/event` handler and staleness/deadline checkers
### Phase 4 — Hub UI
- **Events section** on customer detail page: last 50 events, severity filter buttons (All/Errors/Warnings/Info), colored severity badges
- **Dashboard badges**: error+warning count in last 24h per customer, clickable to customer events
- **Notification log**: shows channel column (operator/customer) in customer detail page
- **Config form**: Monitoring UUIDs section marked as "Legacy" with deprecation notice, collapsed by default
### Phase 6 — Config Cleanup
- **`controller.yaml.default`**: `monitoring.ping_uuids` section commented out (deprecated)
- **`buildConfigJSON`**: only writes `ping_uuids` to config JSON if user explicitly provides UUID values (new configs get none)
---
## v0.2.2 (2026-02-20)
**Config Hash Comparison**
- **Config sync status** on unified customer page: compares SHA256 hash of controller's
`controller.yaml` (from report payload) against Hub-generated YAML. Shows "In sync",
"Config mismatch", or "Unknown" (controller needs v0.20.0+ to report hash).
- Visible in the Controller Update section next to Push Config button.
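
A minimal sketch of the comparison, assuming the controller reports a lowercase hex-encoded SHA256 digest and that an empty value means the controller did not report a hash. `configHash` and `syncStatus` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHash returns the hex-encoded SHA256 digest of a YAML document.
func configHash(yaml []byte) string {
	sum := sha256.Sum256(yaml)
	return hex.EncodeToString(sum[:])
}

// syncStatus compares the hash reported by the controller with the hash of
// the YAML the Hub would generate, returning one of the three UI labels.
func syncStatus(reportedHash string, hubYAML []byte) string {
	if reportedHash == "" {
		return "Unknown"
	}
	if reportedHash == configHash(hubYAML) {
		return "In sync"
	}
	return "Config mismatch"
}

func main() {
	hubYAML := []byte("customer:\n  id: demo\n") // hypothetical content
	fmt.Println(syncStatus("", hubYAML))
	fmt.Println(syncStatus(configHash(hubYAML), hubYAML))
	fmt.Println(syncStatus(configHash([]byte("edited locally")), hubYAML))
}
```

A byte-for-byte hash only matches when both sides hash identical bytes, so whitespace or key-order differences would also show up as a mismatch.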
---
## v0.2.1 (2026-02-20)
**Unified Customer Management**
All customer views consolidated into a single page. New management features: blocked status,
dashboard merge, config push, and auto-config creation.
### New features
- **Unified customer page — `/customers/{id}`:**
  - Single page showing both configuration info and live report data
  - Replaces separate `/configs/{id}` (config detail) and `/customers/{id}` (report detail) pages
  - Shows config management (credentials, setup commands, YAML preview) when config exists
  - Shows "Create Config" button for manual (report-only) customers
  - Old `/configs/{id}` URLs redirect to `/customers/{id}`
- **Dashboard shows pending customers:**
  - Customers with config but no reports appear on dashboard with "PENDING" status
  - All metric columns show "—" for pending customers
- **Blocked/Banned status:**
  - Customers can be blocked via button on detail page
  - Blocked customers hidden from Dashboard
  - Reports still accepted (prevents controller retry loops) but notifications suppressed
  - "BLOCKED" badge shown on Customers list and detail page
  - One-click unblock button
- **Config push to controller:**
  - "Push Config" button on unified page (visible when controller URL known)
  - Generates YAML and POSTs to `{controller_url}/api/config/apply`
  - Note: requires controller v0.20.0+ with config apply endpoint
- **Auto-create config from report data:**
  - "Create Config" button on manual customer pages
  - Pre-fills customer name from report, generates credentials
  - Redirects to edit form for additional fields
### Changes
- Customers list: all rows now link to `/customers/{id}` (unified page)
- Config badges: new MANAGED/MANUAL/BLOCKED pill-style badges
- `customer_configs` table: added `status` column (active/blocked)
- Status functions handle "pending" and "blocked" status values
---
## v0.2.0 (2026-02-20)
**Customer Configuration Management**
New "Configurations" section for pre-provisioning customer nodes. Operators can configure
customer settings in the Hub web UI, then `docker-setup.sh` downloads a ready-made
`controller.yaml` — reducing deployment to a customer ID and password.
### New features
- **Web UI — `/configs` pages:**
  - List all customer configurations in a table
  - Create new configuration: customer identity, infrastructure secrets (CF tunnel/API tokens),
    git sync credentials, monitoring UUIDs — organized in collapsible sections
  - Detail page: shows credentials (retrieval password, per-customer API key) with copy-to-clipboard,
    setup commands (`docker-setup.sh` and `curl`), live YAML preview
  - Edit and delete configurations
  - Navigation tabs (Dashboard / Configurations) on all pages
- **Config retrieval API — `GET /api/v1/config/{customer_id}`:**
  - Authenticated via `X-Retrieval-Password` header (separate from Bearer token)
  - Generates complete `controller.yaml` by deep-merging template with customer overrides
  - Template sourced from `controller.yaml.example` (fetched from Gitea repo periodically)
  - Falls back to embedded default template if fetcher not configured
- **Per-customer API keys:**
  - Each customer config gets its own API key (auto-generated, 64 hex chars)
  - Controllers can authenticate with per-customer key instead of the shared global key
  - Backward compatible — global `report_api_key` continues to work alongside per-customer keys
- **YAML generation (`internal/configgen` package):**
  - Deep-merge of template + customer-specific overrides
  - Programmatic injection: customer identity, hub config, session secret
  - Shared by both API handler and web UI preview
- **Template fetcher (background goroutine):**
  - Periodically fetches `controller.yaml.example` from Gitea (configurable interval)
  - Requires `registry.username` + `registry.token` in hub.yaml
  - Falls back to `go:embed` default template when not configured
- **Data layer:**
  - New `customer_configs` SQLite table
  - 6 CRUD methods: Save, Get, List, Delete, GetByAPIKey, UpdateRetrievalPassword
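
Two of the mechanisms above lend themselves to a short sketch: deep-merging a template with customer overrides, and generating a 64-hex-character API key. Both functions are illustrative only. `deepMerge` and `newAPIKey` are hypothetical names, the map keys in `main` are invented, and using 32 bytes from `crypto/rand` is just one assumed way to arrive at 64 hex characters.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// deepMerge returns a new map with override applied on top of base.
// When both sides hold a nested map under the same key, the maps are merged
// recursively; otherwise the override value replaces the base value.
// Nested maps that are not overridden are shared with base, not copied.
func deepMerge(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		baseMap, baseIsMap := out[k].(map[string]any)
		overMap, overIsMap := v.(map[string]any)
		if baseIsMap && overIsMap {
			out[k] = deepMerge(baseMap, overMap)
		} else {
			out[k] = v
		}
	}
	return out
}

// newAPIKey returns 32 random bytes encoded as 64 hex characters.
func newAPIKey() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	// Hypothetical template and overrides; the key names are not the real schema.
	template := map[string]any{
		"customer": map[string]any{"id": "", "name": ""},
		"hub":      map[string]any{"url": "https://hub.example", "interval": "5m"},
	}
	overrides := map[string]any{
		"customer": map[string]any{"id": "cust-1", "name": "Demo"},
	}
	fmt.Println(deepMerge(template, overrides))

	key, err := newAPIKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(key))
}
```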
### Configuration
New `registry` section in `hub.yaml`:
```yaml
registry:
  image: "gitea.dooplex.hu/admin/felhom-controller"
  username: ""              # Gitea credentials (for version checker + template fetcher)
  token: ""
  check_interval: "6h"
  template_interval: "1h"   # How often to refresh controller.yaml.example
```
### Files added
- `internal/configgen/configgen.go` — shared YAML generation package
- `internal/web/configs.go` — web handlers for config CRUD
- `internal/web/templatefetcher.go` — background template refresh
- `internal/web/controller.yaml.default` — embedded fallback template
- `internal/web/templates/configs.html` — config list page
- `internal/web/templates/config_form.html` — create/edit form
- `internal/web/templates/config_detail.html` — detail + credentials page
### Files modified
- `internal/store/store.go` — customer_configs table + CRUD methods
- `internal/api/handler.go` — config retrieval endpoint, per-customer auth, `ConfigTemplateProvider` interface
- `internal/web/server.go` — `/configs/*` routes, `SetTemplateFetcher()`
- `internal/web/embed.go` — embedded default template
- `internal/web/templates/dashboard.html` — navigation bar
- `internal/web/templates/customer.html` — navigation bar
- `internal/web/templates/style.css` — form, nav, button, credential styles
- `cmd/hub/main.go` — template fetcher wiring, `TemplateInterval` config
- `configs/hub.yaml.example` — registry section
---
## v0.1.8 (2026-02-16)
- Controller update trigger: "Update" button on customer detail page calls controller's self-update endpoint
- Registry version checker: background goroutine checks Gitea registry for latest controller image tag
- Update available indicator on customer detail page
## v0.1.7 (2026-02-15)
- Infrastructure backup endpoints for disaster recovery (POST + GET `/api/v1/infra-backup`)
## v0.1.6 (2026-02-14)
- Handle disabled reporting status
- Storage labels display
- Date in history table
## v0.1.5 (2026-02-13)
- Notification preferences sync endpoint (`POST /api/v1/preferences`)
- Notification display on customer detail page
## v0.1.4 (2026-02-12)
- Resend API key support for email notifications
- Notification endpoint (`POST /api/v1/notify`)
## v0.1.3 (2026-02-11)
- Customer detail page: system info, storage bars, container table
- 24h history graphs
## v0.1.2 (2026-02-10)
- Dashboard auto-refresh (60s cycle)
- Status logic (green/yellow/red based on report age + health)
## v0.1.1 (2026-02-09)
- Basic dashboard with customer overview table
- Report ingest API
## v0.1.0 (2026-02-08)
- Initial release: SQLite store, report API, basic web dashboard