Felhom Hub — Changelog
v0.3.0 (2026-02-20)
Hub Monitoring Takeover — Event System, Dead Man's Switch, Notifications
Replaces external Healthchecks.io with a Hub-native event system. The Hub becomes the single source of truth for all customer monitoring, event tracking, dead man's switch alerting, and notification delivery.
Phase 1 — Event System
- `events` table in SQLite: stores all events with `customer_id`, `event_type`, `severity`, `message`, `details_json`, `source`, and a timestamp
- Indexes: `idx_events_customer_created` (customer + time DESC), `idx_events_type` (type + time DESC)
- Store methods: `SaveEvent`, `GetRecentEvents`, `GetEventsByType`, `GetLatestEventByType`, `GetAllRecentEvents`, `CountEventsBySeverity`, `PruneEvents`, `GetActiveCustomerIDs`
- `POST /api/v1/event` endpoint: accepts structured events from controllers, validates `event_type` against 27 allowed types, validates severity (info/warning/error), stores in DB
- Enhanced auth: `checkAuthCustomer()` validates that a per-customer API key matches the `customer_id` in the payload; the global key bypasses the ownership check
- Prune: events are pruned alongside reports at 04:30 Budapest time
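A minimal sketch of the validation and ownership rules described above. The `Event` struct, `validateEvent`, `authorizeCustomer`, and the `keyOwner` lookup map are hypothetical names, not the Hub's code. The allowlist contains only the event types named elsewhere in this changelog, not all 27.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

// Event is a hypothetical shape for the JSON body of POST /api/v1/event;
// field names follow the events table columns.
type Event struct {
	CustomerID string `json:"customer_id"`
	EventType  string `json:"event_type"`
	Severity   string `json:"severity"`
	Message    string `json:"message"`
	Source     string `json:"source"`
}

// allowedTypes is an illustrative subset of the 27 allowed event types.
var allowedTypes = map[string]bool{
	"backup_completed":       true,
	"db_dump_completed":      true,
	"node_stale":             true,
	"node_down":              true,
	"node_recovered":         true,
	"expected_backup_missed": true,
	"expected_dbdump_missed": true,
	"test":                   true,
}

var allowedSeverities = map[string]bool{"info": true, "warning": true, "error": true}

// validateEvent rejects a missing customer, an unknown type, or a severity
// outside info/warning/error.
func validateEvent(e Event) error {
	if e.CustomerID == "" {
		return errors.New("customer_id is required")
	}
	if !allowedTypes[e.EventType] {
		return fmt.Errorf("unknown event_type %q", e.EventType)
	}
	if !allowedSeverities[e.Severity] {
		return fmt.Errorf("invalid severity %q", e.Severity)
	}
	return nil
}

// authorizeCustomer: the global key may post for any customer; a
// per-customer key (looked up in keyOwner) only for its own customer_id.
func authorizeCustomer(token, globalKey string, keyOwner map[string]string, customerID string) bool {
	if globalKey != "" && subtle.ConstantTimeCompare([]byte(token), []byte(globalKey)) == 1 {
		return true
	}
	owner, ok := keyOwner[token]
	return ok && owner == customerID
}
```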
Phase 2 — Dead Man's Switch
- Staleness checker (`internal/monitor/staleness.go`): runs every 60s, detects when controllers stop reporting
  - ok→stale (>30min): inserts `node_stale` warning event
  - any→down (>60min): inserts `node_down` error event
  - stale/down→ok: inserts `node_recovered` info event
  - Skips blocked customers; no false alerts on startup
- Backup deadline checker (`internal/monitor/deadline.go`): runs daily at 05:00 Budapest time
  - Detects missing `backup_completed` events since midnight → inserts `expected_backup_missed` error
  - Detects missing `db_dump_completed` events → inserts `expected_dbdump_missed` error
  - Grace: skips customers in `node_down` state
- `scheduleDaily()` helper: goroutine that sleeps until the target time (Europe/Budapest), runs the function, then loops
- `/healthz` enhanced: returns 503 if the SQLite `Ping` fails
Phase 3 — Notification System
- Dispatcher (`internal/notify/dispatcher.go`): processes events and sends emails via the Resend API
  - Operator channel: English emails to the operator for warning/error events, 1h cooldown per `customer:eventType`
  - Customer channel: Hungarian emails per `event_type`; respects customer preferences (`enabled_events`, `cooldown_hours`); blocked customers are skipped
  - Test bypass: the `test` event type skips cooldown/preferences and sends directly to the customer email
- Email templates (`internal/notify/templates.go`): operator (concise English), customer (Hungarian, per event type, with a complete message table)
- Cooldown tracking: in-memory maps with per-`customer:eventType` granularity
- `customer_notifications` table: added `cooldown_hours` column (default 6)
- `notification_log` table: added `channel` column (operator/customer)
- Wired into the `/api/v1/event` handler and the staleness/deadline checkers
Phase 4 — Hub UI
- Events section on customer detail page: last 50 events, severity filter buttons (All/Errors/Warnings/Info), colored severity badges
- Dashboard badges: error + warning count over the last 24h per customer; each badge links to that customer's events
- Notification log: shows channel column (operator/customer) in customer detail page
- Config form: Monitoring UUIDs section marked as "Legacy" with deprecation notice, collapsed by default
Phase 6 — Config Cleanup
- `controller.yaml.default`: `monitoring.ping_uuids` section commented out (deprecated)
- `buildConfigJSON`: only writes `ping_uuids` to the config JSON if the user explicitly provides UUID values (new configs get none)
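The "only if the user provides values" rule maps naturally onto Go's `omitempty` JSON tag. The sketch assumes `ping_uuids` is a map from check name to UUID. `MonitoringConfig` and `buildMonitoringJSON` are hypothetical and are not the Hub's `buildConfigJSON`.

```go
package main

import "encoding/json"

// MonitoringConfig is a hypothetical slice of the generated config JSON.
// With omitempty, an empty PingUUIDs map is dropped from the output,
// so new configs carry no ping_uuids key at all.
type MonitoringConfig struct {
	PingUUIDs map[string]string `json:"ping_uuids,omitempty"`
}

// buildMonitoringJSON keeps only the UUID values the user actually filled in.
func buildMonitoringJSON(input map[string]string) ([]byte, error) {
	uuids := make(map[string]string)
	for name, v := range input {
		if v != "" {
			uuids[name] = v
		}
	}
	return json.Marshal(MonitoringConfig{PingUUIDs: uuids})
}
```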
v0.2.2 (2026-02-20)
Config Hash Comparison
- Config sync status on the unified customer page: compares the SHA256 hash of the controller's `controller.yaml` (from the report payload) against the Hub-generated YAML. Shows "In sync", "Config mismatch", or "Unknown" (the controller needs v0.20.0+ to report the hash).
- Visible in the Controller Update section next to the Push Config button.
v0.2.1 (2026-02-20)
Unified Customer Management
All customer views consolidated into a single page. New management features: blocked status, dashboard merge, config push, and auto-config creation.
New features
- Unified customer page (`/customers/{id}`):
  - Single page showing both configuration info and live report data
  - Replaces the separate `/configs/{id}` (config detail) and `/customers/{id}` (report detail) pages
  - Shows config management (credentials, setup commands, YAML preview) when a config exists
  - Shows "Create Config" button for manual (report-only) customers
  - Old `/configs/{id}` URLs redirect to `/customers/{id}`
- Dashboard shows pending customers:
  - Customers with a config but no reports appear on the dashboard with "PENDING" status
  - All metric columns show "—" for pending customers
- Blocked/Banned status:
  - Customers can be blocked via a button on the detail page
  - Blocked customers are hidden from the Dashboard
  - Reports are still accepted (prevents controller retry loops), but notifications are suppressed
  - "BLOCKED" badge shown on the Customers list and detail page
  - One-click unblock button
- Config push to controller:
  - "Push Config" button on the unified page (visible when the controller URL is known)
  - Generates YAML and POSTs it to `{controller_url}/api/config/apply`
  - Note: requires controller v0.20.0+ with the config apply endpoint
- Auto-create config from report data:
  - "Create Config" button on manual customer pages
  - Pre-fills the customer name from the report, generates credentials
  - Redirects to the edit form for additional fields
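A minimal sketch of the config push, assuming a plain `POST` of the YAML body. The `application/x-yaml` content type, the 15s timeout, and the absence of any auth header are all assumptions. The controller's apply endpoint may expect more. `applyURL` and `pushConfig` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// applyURL builds the controller's apply endpoint, tolerating a trailing slash.
func applyURL(controllerURL string) string {
	return strings.TrimRight(controllerURL, "/") + "/api/config/apply"
}

// pushConfig POSTs generated YAML to the controller and treats any non-2xx
// response as a failure, including a short excerpt of the response body.
func pushConfig(controllerURL string, yamlBody []byte) error {
	client := &http.Client{Timeout: 15 * time.Second}
	resp, err := client.Post(applyURL(controllerURL), "application/x-yaml", bytes.NewReader(yamlBody))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		excerpt, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		return fmt.Errorf("config apply failed: %s: %s", resp.Status, excerpt)
	}
	return nil
}
```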
Changes
- Customers list: all rows now link to `/customers/{id}` (unified page)
- Config badges: new MANAGED/MANUAL/BLOCKED pill-style badges
- `customer_configs` table: added `status` column (active/blocked)
- Status functions handle "pending" and "blocked" status values
v0.2.0 (2026-02-20)
Customer Configuration Management
New "Configurations" section for pre-provisioning customer nodes. Operators can configure
customer settings in the Hub web UI, then `docker-setup.sh` downloads a ready-made
`controller.yaml`, so deployment needs only a customer ID and password.
New features
- Web UI (`/configs` pages):
  - List all customer configurations in a table
  - Create new configuration: customer identity, infrastructure secrets (CF tunnel/API tokens), git sync credentials, monitoring UUIDs, organized in collapsible sections
  - Detail page: shows credentials (retrieval password, per-customer API key) with copy-to-clipboard, setup commands (`docker-setup.sh` and `curl`), and a live YAML preview
  - Edit and delete configurations
  - Navigation tabs (Dashboard / Configurations) on all pages
- Config retrieval API (`GET /api/v1/config/{customer_id}`):
  - Authenticated via the `X-Retrieval-Password` header (separate from the Bearer token)
  - Generates a complete `controller.yaml` by deep-merging the template with customer overrides
  - Template sourced from `controller.yaml.example` (fetched periodically from the Gitea repo)
  - Falls back to the embedded default template if the fetcher is not configured
- Per-customer API keys:
  - Each customer config gets its own API key (auto-generated, 64 hex chars)
  - Controllers can authenticate with the per-customer key instead of the shared global key
  - Backward compatible: the global `report_api_key` continues to work alongside per-customer keys
- YAML generation (`internal/configgen` package):
  - Deep-merge of template + customer-specific overrides
  - Programmatic injection: customer identity, hub config, session secret
  - Shared by both the API handler and the web UI preview
- Template fetcher (background goroutine):
  - Periodically fetches `controller.yaml.example` from Gitea (configurable interval)
  - Requires `registry.username` + `registry.token` in `hub.yaml`
  - Falls back to the `go:embed` default template when not configured
- Data layer:
  - New `customer_configs` SQLite table
  - 6 CRUD methods: Save, Get, List, Delete, GetByAPIKey, UpdateRetrievalPassword
Configuration
New `registry` section in `hub.yaml`:

```yaml
registry:
  image: "gitea.dooplex.hu/admin/felhom-controller"
  username: ""              # Gitea credentials (for version checker + template fetcher)
  token: ""
  check_interval: "6h"
  template_interval: "1h"   # How often to refresh controller.yaml.example
```
Files added
- `internal/configgen/configgen.go`: shared YAML generation package
- `internal/web/configs.go`: web handlers for config CRUD
- `internal/web/templatefetcher.go`: background template refresh
- `internal/web/controller.yaml.default`: embedded fallback template
- `internal/web/templates/configs.html`: config list page
- `internal/web/templates/config_form.html`: create/edit form
- `internal/web/templates/config_detail.html`: detail + credentials page
Files modified
- `internal/store/store.go`: `customer_configs` table + CRUD methods
- `internal/api/handler.go`: config retrieval endpoint, per-customer auth, `ConfigTemplateProvider` interface
- `internal/web/server.go`: `/configs/*` routes, `SetTemplateFetcher()`
- `internal/web/embed.go`: embedded default template
- `internal/web/templates/dashboard.html`: navigation bar
- `internal/web/templates/customer.html`: navigation bar
- `internal/web/templates/style.css`: form, nav, button, credential styles
- `cmd/hub/main.go`: template fetcher wiring, `TemplateInterval` config
- `configs/hub.yaml.example`: registry section
v0.1.8 (2026-02-16)
- Controller update trigger: "Update" button on customer detail page calls controller's self-update endpoint
- Registry version checker: background goroutine checks Gitea registry for latest controller image tag
- Update available indicator on customer detail page
v0.1.7 (2026-02-15)
- Infrastructure backup endpoints for disaster recovery (`POST` and `GET /api/v1/infra-backup`)
v0.1.6 (2026-02-14)
- Handle disabled reporting status
- Storage labels display
- Date in history table
v0.1.5 (2026-02-13)
- Notification preferences sync endpoint (`POST /api/v1/preferences`)
- Notification display on customer detail page
v0.1.4 (2026-02-12)
- Resend API key support for email notifications
- Notification endpoint (`POST /api/v1/notify`)
v0.1.3 (2026-02-11)
- Customer detail page: system info, storage bars, container table
- 24h history graphs
v0.1.2 (2026-02-10)
- Dashboard auto-refresh (60s cycle)
- Status logic (green/yellow/red based on report age + health)
v0.1.1 (2026-02-09)
- Basic dashboard with customer overview table
- Report ingest API
v0.1.0 (2026-02-08)
- Initial release: SQLite store, report API, basic web dashboard