Felhom Hub — Changelog

v0.4.0 (2026-02-23)

App Telemetry & Analytics Dashboard

Added

  • app_telemetry and app_log_issues SQLite tables (store/store.go) — store per-app resource metrics and deduplicated log issues reported by v0.28.0+ controllers.
  • internal/store/telemetry.go — New store methods: SaveAppTelemetry, GetFleetAppSummary (with P95 memory calculation), GetAppTelemetryHistory, GetAppCustomerBreakdown, GetCustomerAppSummary, GetAppIssues, GetRecentIssuesAllApps, PruneAppTelemetry, PruneStaleIssues. New types: AppTelemetryRecord, FleetAppSummary, AppTelemetryPoint, AppCustomerStats, CustomerAppSummary, AppIssue.
  • /api/v1/report handler update (api/handler.go) — After saving the standard report, parses the optional app_telemetry JSON field and persists it. Backward-compatible: old controllers (no app_telemetry key) are unaffected.
  • Fleet app list page (GET /apps) — Hungarian-language dashboard showing all deployed apps fleet-wide with deployment count, avg/P95 memory, catalog estimate/limit accuracy, error/warning badges. Sortable columns, 24h/7d/30d period selector.
  • Per-app detail page (GET /apps/{name}) — Memory trend Chart.js chart (avg + peak, with catalog limit line), per-customer breakdown table, known log issues table (severity, message, occurrence count, affected customers). Includes a suggested mem_limit computed as P95×1.2, rounded to a multiple of 32M.
  • Customer detail page telemetry section (customer_unified.html) — New "Alkalmazás telemetria" card with per-app memory (current/avg/peak) and log error/warning counts linking to /apps/{name}.
  • Chart.js (static/chart.min.js) — Embedded from controller build, served at /static/chart.min.js.
  • "Alkalmazások" nav link — Added to header navigation across all templates.
  • New CSS (style.css) — .badge, .badge-error, .badge-warn, .summary-cards, .summary-card, .chart-container, .period-selector, .period-btn, .accuracy-dot, .mem-ok/warn/danger, .data-table styles.
  • Telemetry pruning (cmd/hub/main.go) — pruneAll() now also prunes app_telemetry rows older than 90 days and stale log issues not seen in 30 days.
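
The suggested mem_limit mentioned above (P95×1.2 rounded to 32M) could be computed roughly as in the sketch below. The nearest-rank percentile method, the choice to round up rather than to nearest, and the names p95 and suggestMemLimitMB are all assumptions made for illustration; the Hub's actual store code may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// p95 returns the 95th percentile of samples using the nearest-rank method.
// Assumption: the real GetFleetAppSummary may compute P95 differently.
func p95(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(0.95 * float64(len(sorted))))
	return sorted[rank-1]
}

// suggestMemLimitMB adds 20% headroom to the P95 value and rounds the
// result up to the next multiple of 32 MB (rounding direction is assumed).
func suggestMemLimitMB(p95MB float64) int {
	const step = 32
	withHeadroom := p95MB * 1.2
	return int(math.Ceil(withHeadroom/step)) * step
}

func main() {
	// Hypothetical memory samples in MB.
	samples := []float64{100, 110, 120, 130, 140, 150, 160, 170, 180, 190}
	p := p95(samples)
	fmt.Printf("P95=%.0f MB, suggested mem_limit=%dM\n", p, suggestMemLimitMB(p))
}
```

Rounding up keeps the suggestion at or above the headroom target, which seems the safer reading for a memory limit.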

Changed

  • internal/web/apps.go (new file) — handleApps, handleAppDetail, parsePeriod, sortFleetSummary, aggregateHistoryForChart, parseLimitMB, memoryColor, accuracyClass, getCSRFToken helper functions.
  • internal/web/server.go — Added routes for /apps, /apps/{name}, /static/chart.min.js. Added memoryColor, accuracyClass, gt template functions.
  • internal/web/embed.go — Added //go:embed static/chart.min.js directive.

v0.3.7 (2026-02-21)

Asset management API

  • New internal/assets package: manages app assets (logos, screenshots) on Hub PVC (/data/assets/) with automatic seeding from baked-in image copy on first run.
  • Two new authenticated API endpoints for controllers to sync assets:
    • GET /api/v1/assets/manifest — returns JSON manifest with filenames + SHA-256 checksums
    • GET /api/v1/assets/file/{filename} — serves individual asset files
  • Dockerfile updated to COPY assets/ /usr/share/felhom/assets-seed/ for first-run seeding.
  • Build script syncs website assets (*-logo.{svg,png}, *-screenshot-*.webp) into Docker build context.

v0.3.6 (2026-02-21)

Human-friendly retrieval passwords

  • Retrieval passwords now use Hungarian word passphrases (e.g. áldás-plazmid-palánta-süvítve-pócgém) instead of 64-char hex strings.
  • Embedded 29K+ curated Hungarian word list (hungarian.txt) via go:embed; 5-word passphrases give ~74 bits of entropy.
  • New configgen.RandomPassphrase(wordCount) function; all 3 retrieval password generation sites updated.
  • API keys remain as hex (machine-to-machine, never typed by humans).

v0.3.5 (2026-02-21)

Recovery Endpoint & Customer Standing

  • New GET /api/v1/recovery/{customer_id} endpoint: returns both generated controller.yaml and infra backup in a single response for disaster recovery. Auth via X-Retrieval-Password header (same as config retrieval).
  • Report response now includes customer_blocked: true when customer status is "blocked" — allows controllers to detect standing and enter limited mode.

v0.3.4 (2026-02-20)

  • Rename version labels: "Current version" → "Controller version", "Latest version" → "Registry latest".

v0.3.3 (2026-02-20)

Bugfixes

  • Fix double "v" prefix in controller version display (showed "vv0.21.1" instead of "v0.21.1").
  • Skip deprecated monitoring.ping_uuids.* keys in config diff comparison (added to volatile keys).
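
One common way to avoid a double "v" prefix like the one fixed above is to strip any existing prefix before adding exactly one. The displayVersion helper here is a hypothetical illustration, not the Hub's actual template function.

```go
package main

import (
	"fmt"
	"strings"
)

// displayVersion returns the tag with exactly one leading "v",
// whether or not the input already carries the prefix.
func displayVersion(tag string) string {
	return "v" + strings.TrimPrefix(strings.TrimSpace(tag), "v")
}

func main() {
	fmt.Println(displayVersion("v0.21.1"))
	fmt.Println(displayVersion("0.21.1"))
}
```

Both calls print v0.21.1.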

v0.3.2 (2026-02-20)

Hub Version Display

  • Show Hub version in footer of all pages via hubVersion template function.
  • web.New() now accepts version parameter (4th arg) — set via ldflags at build time.

v0.3.1 (2026-02-20)

Config Diff Display + Pull Config

  • Value-based config comparison: Replaced broken SHA256 hash comparison with semantic YAML comparison. Both configs are parsed into maps, flattened to dot-notation keys, and compared by value. Ignores key ordering, whitespace, comments, and volatile fields (web.session_secret). Shows actual diff count on customer page ("⚠ Config mismatch — N differences").
  • Config diff endpoint (GET /customers/{id}/config-diff): Fetches live YAML from controller via new GET /api/config endpoint, generates Hub YAML via configgen.Generate(), returns JSON with per-key diffs (key, hub value, controller value, status). Sensitive values (tokens, passwords, secrets) are masked.
  • Pull Config (POST /customers/{id}/pull-config): Reverse of Push Config — imports controller's current config into the Hub. Extracts identity fields (name, domain, email) and override fields (infrastructure tokens, git credentials, monitoring UUIDs). Preserves existing APIKey and RetrievalPassword.
  • Diff display UI: "Show Diff" button on customer page expands a table showing all key-value differences with color-coded rows (yellow=changed, blue=hub-only, orange=controller-only).
  • Pull Config button: Added next to existing "Push Config" with confirmation dialog.
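
The value-based comparison described above (flatten to dot-notation keys, compare by value, skip volatile keys) can be sketched as follows. The code assumes both YAML documents are already parsed into map[string]any; the Diff type, the status strings, and the exact-match volatile set are illustrative assumptions rather than the Hub's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// Diff is a hypothetical per-key result row.
type Diff struct {
	Key, Hub, Controller, Status string
}

// flatten walks nested maps and writes "a.b.c" -> value into out.
func flatten(prefix string, in map[string]any, out map[string]string) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]any); ok {
			flatten(key, nested, out)
			continue
		}
		out[key] = fmt.Sprint(v)
	}
}

// diffConfigs compares two parsed configs by value, ignoring volatile keys.
func diffConfigs(hub, ctrl map[string]any, volatile map[string]bool) []Diff {
	h, c := map[string]string{}, map[string]string{}
	flatten("", hub, h)
	flatten("", ctrl, c)
	var diffs []Diff
	for k, hv := range h {
		if volatile[k] {
			continue
		}
		cv, ok := c[k]
		switch {
		case !ok:
			diffs = append(diffs, Diff{Key: k, Hub: hv, Status: "hub-only"})
		case hv != cv:
			diffs = append(diffs, Diff{Key: k, Hub: hv, Controller: cv, Status: "changed"})
		}
	}
	for k, cv := range c {
		if _, ok := h[k]; !ok && !volatile[k] {
			diffs = append(diffs, Diff{Key: k, Controller: cv, Status: "controller-only"})
		}
	}
	// Sort by key so the output does not depend on map iteration order.
	sort.Slice(diffs, func(i, j int) bool { return diffs[i].Key < diffs[j].Key })
	return diffs
}

func main() {
	hub := map[string]any{"web": map[string]any{"port": 8080, "session_secret": "a"}}
	ctrl := map[string]any{"web": map[string]any{"port": 9090, "session_secret": "b"}}
	for _, d := range diffConfigs(hub, ctrl, map[string]bool{"web.session_secret": true}) {
		fmt.Printf("%s: hub=%s controller=%s (%s)\n", d.Key, d.Hub, d.Controller, d.Status)
	}
}
```

Because maps carry no ordering, whitespace, or comments, this comparison ignores all three by construction, which is what a raw hash of the YAML text cannot do.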

v0.3.0 (2026-02-20)

Hub Monitoring Takeover — Event System, Dead Man's Switch, Notifications

Replaces external Healthchecks.io with a Hub-native event system. The Hub becomes the single source of truth for all customer monitoring, event tracking, dead man's switch alerting, and notification delivery.

Phase 1 — Event System

  • events table in SQLite: stores all events with customer_id, event_type, severity, message, details_json, source, timestamp
  • Indexes: idx_events_customer_created (customer + time DESC), idx_events_type (type + time DESC)
  • Store methods: SaveEvent, GetRecentEvents, GetEventsByType, GetLatestEventByType, GetAllRecentEvents, CountEventsBySeverity, PruneEvents, GetActiveCustomerIDs
  • POST /api/v1/event endpoint: accepts structured events from controllers, validates event_type against 27 allowed types, validates severity (info/warning/error), stores in DB
  • Enhanced auth: checkAuthCustomer() validates per-customer API keys match the customer_id in payload; global key bypasses ownership check
  • Prune: events pruned alongside reports at 04:30 Budapest time
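
The ownership rule described for checkAuthCustomer() (a per-customer key must match the customer_id in the payload, while the global key bypasses the check) might look roughly like this. The authorize function, its parameters, and the in-memory keyOwners map are hypothetical; the real handler looks keys up in the store.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// authorize reports whether token may submit data for payloadCustomerID.
// keyOwners maps a per-customer API key to the customer ID that owns it.
func authorize(token, payloadCustomerID, globalKey string, keyOwners map[string]string) bool {
	if token == "" {
		return false
	}
	// The global key is accepted for any customer.
	if globalKey != "" && subtle.ConstantTimeCompare([]byte(token), []byte(globalKey)) == 1 {
		return true
	}
	// A per-customer key is only valid for its own customer ID.
	owner, ok := keyOwners[token]
	return ok && owner == payloadCustomerID
}

func main() {
	owners := map[string]string{"key-alice": "alice"}
	fmt.Println(authorize("key-alice", "alice", "global", owners))
	fmt.Println(authorize("key-alice", "bob", "global", owners))
	fmt.Println(authorize("global", "bob", "global", owners))
}
```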

Phase 2 — Dead Man's Switch

  • Staleness checker (internal/monitor/staleness.go): runs every 60s, detects when controllers stop reporting
    • ok→stale (>30min): inserts node_stale warning event
    • any→down (>60min): inserts node_down error event
    • stale/down→ok: inserts node_recovered info event
    • Skips blocked customers, no false alerts on startup
  • Backup deadline checker (internal/monitor/deadline.go): runs daily at 05:00 Budapest
    • Detects missing backup_completed events since midnight → inserts expected_backup_missed error
    • Detects missing db_dump_completed events → inserts expected_dbdump_missed error
    • Grace: skips customers with node_down state
  • scheduleDaily() helper: goroutine that sleeps until target time (Europe/Budapest), runs function, loops
  • /healthz enhanced: returns 503 if SQLite Ping fails

Phase 3 — Notification System

  • Dispatcher (internal/notify/dispatcher.go): processes events and sends emails via Resend API
    • Operator channel: English emails to operator for warning/error events, 1h cooldown per customer:eventType
    • Customer channel: Hungarian emails per event_type, respects customer preferences (enabled_events, cooldown_hours), blocked customers skipped
    • Test bypass: test event type skips cooldown/preferences, sends directly to customer email
  • Email templates (internal/notify/templates.go): operator (concise English), customer (Hungarian per event type with complete message table)
  • Cooldown tracking: in-memory maps with per-customer:eventType granularity
  • customer_notifications table: added cooldown_hours column (default 6)
  • notification_log table: added channel column (operator/customer)
  • Wired into /api/v1/event handler and staleness/deadline checkers

Phase 4 — Hub UI

  • Events section on customer detail page: last 50 events, severity filter buttons (All/Errors/Warnings/Info), colored severity badges
  • Dashboard badges: error+warning count in last 24h per customer, clickable to customer events
  • Notification log: shows channel column (operator/customer) in customer detail page
  • Config form: Monitoring UUIDs section marked as "Legacy" with deprecation notice, collapsed by default

Phase 6 — Config Cleanup

  • controller.yaml.default: monitoring.ping_uuids section commented out (deprecated)
  • buildConfigJSON: only writes ping_uuids to config JSON if user explicitly provides UUID values (new configs get none)

v0.2.2 (2026-02-20)

Config Hash Comparison

  • Config sync status on unified customer page: compares SHA256 hash of controller's controller.yaml (from report payload) against Hub-generated YAML. Shows "In sync", "Config mismatch", or "Unknown" (controller needs v0.20.0+ to report hash).
  • Visible in the Controller Update section next to Push Config button.

v0.2.1 (2026-02-20)

Unified Customer Management

All customer views consolidated into a single page. New management features: blocked status, dashboard merge, config push, and auto-config creation.

New features

  • Unified customer page — /customers/{id}:

    • Single page showing both configuration info and live report data
    • Replaces separate /configs/{id} (config detail) and /customers/{id} (report detail) pages
    • Shows config management (credentials, setup commands, YAML preview) when config exists
    • Shows "Create Config" button for manual (report-only) customers
    • Old /configs/{id} URLs redirect to /customers/{id}
  • Dashboard shows pending customers:

    • Customers with config but no reports appear on dashboard with "PENDING" status
    • All metric columns show "—" for pending customers
  • Blocked/Banned status:

    • Customers can be blocked via button on detail page
    • Blocked customers hidden from Dashboard
    • Reports still accepted (prevents controller retry loops) but notifications suppressed
    • "BLOCKED" badge shown on Customers list and detail page
    • One-click unblock button
  • Config push to controller:

    • "Push Config" button on unified page (visible when controller URL known)
    • Generates YAML and POSTs to {controller_url}/api/config/apply
    • Note: requires controller v0.20.0+ with config apply endpoint
  • Auto-create config from report data:

    • "Create Config" button on manual customer pages
    • Pre-fills customer name from report, generates credentials
    • Redirects to edit form for additional fields

Changes

  • Customers list: all rows now link to /customers/{id} (unified page)
  • Config badges: new MANAGED/MANUAL/BLOCKED pill-style badges
  • customer_configs table: added status column (active/blocked)
  • Status functions handle "pending" and "blocked" status values

v0.2.0 (2026-02-20)

Customer Configuration Management

New "Configurations" section for pre-provisioning customer nodes. Operators can configure customer settings in the Hub web UI, then docker-setup.sh downloads a ready-made controller.yaml — reducing deployment to a customer ID and password.

New features

  • Web UI — /configs pages:

    • List all customer configurations in a table
    • Create new configuration: customer identity, infrastructure secrets (CF tunnel/API tokens), git sync credentials, monitoring UUIDs — organized in collapsible sections
    • Detail page: shows credentials (retrieval password, per-customer API key) with copy-to-clipboard, setup commands (docker-setup.sh and curl), live YAML preview
    • Edit and delete configurations
    • Navigation tabs (Dashboard / Configurations) on all pages
  • Config retrieval API — GET /api/v1/config/{customer_id}:

    • Authenticated via X-Retrieval-Password header (separate from Bearer token)
    • Generates complete controller.yaml by deep-merging template with customer overrides
    • Template sourced from controller.yaml.example (fetched from Gitea repo periodically)
    • Falls back to embedded default template if fetcher not configured
  • Per-customer API keys:

    • Each customer config gets its own API key (auto-generated, 64 hex chars)
    • Controllers can authenticate with per-customer key instead of the shared global key
    • Backward compatible — global report_api_key continues to work alongside per-customer keys
  • YAML generation (internal/configgen package):

    • Deep-merge of template + customer-specific overrides
    • Programmatic injection: customer identity, hub config, session secret
    • Shared by both API handler and web UI preview
  • Template fetcher (background goroutine):

    • Periodically fetches controller.yaml.example from Gitea (configurable interval)
    • Requires registry.username + registry.token in hub.yaml
    • Falls back to go:embed default template when not configured
  • Data layer:

    • New customer_configs SQLite table
    • 6 CRUD methods: Save, Get, List, Delete, GetByAPIKey, UpdateRetrievalPassword
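
The deep-merge of template and customer overrides mentioned above can be illustrated on already-parsed maps. This is a minimal sketch under the assumption that override values win and nested maps merge recursively; deepMerge is a hypothetical name, and the real configgen package may treat lists, nulls, or injected fields differently.

```go
package main

import "fmt"

// deepMerge returns a new map containing base with override applied.
// Nested maps present on both sides are merged recursively; any other
// override value replaces the base value. Neither input is modified,
// although nested maps that are not overridden are shared, not copied.
func deepMerge(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, ov := range override {
		bm, baseIsMap := out[k].(map[string]any)
		om, overrideIsMap := ov.(map[string]any)
		if baseIsMap && overrideIsMap {
			out[k] = deepMerge(bm, om)
			continue
		}
		out[k] = ov
	}
	return out
}

func main() {
	// Hypothetical template and per-customer override values.
	template := map[string]any{
		"web": map[string]any{"port": 8080, "session_secret": ""},
		"hub": map[string]any{"url": "https://hub.example"},
	}
	overrides := map[string]any{
		"web":      map[string]any{"session_secret": "generated"},
		"customer": map[string]any{"id": "cust1"},
	}
	fmt.Println(deepMerge(template, overrides))
}
```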

Configuration

New registry section in hub.yaml:

registry:
  image: "gitea.dooplex.hu/admin/felhom-controller"
  username: ""               # Gitea credentials (for version checker + template fetcher)
  token: ""
  check_interval: "6h"
  template_interval: "1h"   # How often to refresh controller.yaml.example

Files added

  • internal/configgen/configgen.go — shared YAML generation package
  • internal/web/configs.go — web handlers for config CRUD
  • internal/web/templatefetcher.go — background template refresh
  • internal/web/controller.yaml.default — embedded fallback template
  • internal/web/templates/configs.html — config list page
  • internal/web/templates/config_form.html — create/edit form
  • internal/web/templates/config_detail.html — detail + credentials page

Files modified

  • internal/store/store.go — customer_configs table + CRUD methods
  • internal/api/handler.go — config retrieval endpoint, per-customer auth, ConfigTemplateProvider interface
  • internal/web/server.go — /configs/* routes, SetTemplateFetcher()
  • internal/web/embed.go — embedded default template
  • internal/web/templates/dashboard.html — navigation bar
  • internal/web/templates/customer.html — navigation bar
  • internal/web/templates/style.css — form, nav, button, credential styles
  • cmd/hub/main.go — template fetcher wiring, TemplateInterval config
  • configs/hub.yaml.example — registry section

v0.1.8 (2026-02-16)

  • Controller update trigger: "Update" button on customer detail page calls controller's self-update endpoint
  • Registry version checker: background goroutine checks Gitea registry for latest controller image tag
  • Update available indicator on customer detail page

v0.1.7 (2026-02-15)

  • Infrastructure backup endpoints for disaster recovery (POST + GET /api/v1/infra-backup)

v0.1.6 (2026-02-14)

  • Handle disabled reporting status
  • Storage labels display
  • Date in history table

v0.1.5 (2026-02-13)

  • Notification preferences sync endpoint (POST /api/v1/preferences)
  • Notification display on customer detail page

v0.1.4 (2026-02-12)

  • Resend API key support for email notifications
  • Notification endpoint (POST /api/v1/notify)

v0.1.3 (2026-02-11)

  • Customer detail page: system info, storage bars, container table
  • 24h history graphs

v0.1.2 (2026-02-10)

  • Dashboard auto-refresh (60s cycle)
  • Status logic (green/yellow/red based on report age + health)

v0.1.1 (2026-02-09)

  • Basic dashboard with customer overview table
  • Report ingest API

v0.1.0 (2026-02-08)

  • Initial release: SQLite store, report API, basic web dashboard