# Felhom Hub — Changelog
## v0.4.1 (2026-02-23)
### Added
- **Per-app telemetry reset** (`store/telemetry.go`, `web/apps.go`) — New "Telemetria törlése" ("Delete telemetry") button on the app detail page that deletes all telemetry records and known issues for the selected app. Useful after major app updates when old data is no longer representative. Includes confirmation dialog and flash notification.
- **`DeleteAppTelemetry()`** and **`DeleteAppIssues()`** store methods (`store/telemetry.go`) — Delete all telemetry/issue rows for a specific app_name.
- **`POST /apps/{name}/reset-telemetry`** route (`web/server.go`) — CSRF-protected endpoint that triggers the reset and redirects back with flash message.
## v0.4.0 (2026-02-23)
**App Telemetry & Analytics Dashboard**
### Added
- **`app_telemetry` and `app_log_issues` SQLite tables** (`store/store.go`) — store per-app resource metrics and deduplicated log issues reported by v0.28.0+ controllers.
- **`internal/store/telemetry.go`** — New store methods: `SaveAppTelemetry`, `GetFleetAppSummary` (with P95 memory calculation), `GetAppTelemetryHistory`, `GetAppCustomerBreakdown`, `GetCustomerAppSummary`, `GetAppIssues`, `GetRecentIssuesAllApps`, `PruneAppTelemetry`, `PruneStaleIssues`. New types: `AppTelemetryRecord`, `FleetAppSummary`, `AppTelemetryPoint`, `AppCustomerStats`, `CustomerAppSummary`, `AppIssue`.
- **`/api/v1/report` handler update** (`api/handler.go`) — After saving the standard report, parses the optional `app_telemetry` JSON field and persists it. Backward-compatible: old controllers (no `app_telemetry` key) are unaffected.
- **Fleet app list page** (`GET /apps`) — Hungarian-language dashboard showing all deployed apps fleet-wide with deployment count, avg/P95 memory, catalog estimate/limit accuracy, error/warning badges. Sortable columns, 24h/7d/30d period selector.
- **Per-app detail page** (`GET /apps/{name}`) — Memory trend Chart.js chart (avg + peak, with catalog limit line), per-customer breakdown table, known log issues table (severity, message, occurrence count, affected customers). Includes suggested mem_limit from P95×1.2 rounded to 32M.
- **Customer detail page telemetry section** (`customer_unified.html`) — New "Alkalmazás telemetria" ("Application telemetry") card with per-app memory (current/avg/peak) and log error/warning counts linking to /apps/{name}.
- **Chart.js** (`static/chart.min.js`) — Embedded from controller build, served at `/static/chart.min.js`.
- **"Alkalmazások" ("Applications") nav link** — Added to header navigation across all templates.
- **New CSS** (`style.css`) — `.badge`, `.badge-error`, `.badge-warn`, `.summary-cards`, `.summary-card`, `.chart-container`, `.period-selector`, `.period-btn`, `.accuracy-dot`, `.mem-ok/warn/danger`, `.data-table` styles.
- **Telemetry pruning** (`cmd/hub/main.go`) — `pruneAll()` now also prunes app_telemetry rows older than 90 days and stale log issues not seen in 30 days.
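
The suggested `mem_limit` rule from the per-app detail page (P95×1.2 rounded to 32M) can be sketched as follows. This is a minimal illustration, not the Hub's actual code: the function names are hypothetical, the percentile uses the nearest-rank method, and rounding is assumed to go up to the next multiple of 32 MB, none of which the changelog confirms.

```go
package main

import (
	"fmt"
	"sort"
)

// p95 returns the 95th percentile of the samples (in MB) using the
// nearest-rank method. It returns 0 for an empty slice.
// Assumption: the Hub may compute P95 differently (for example in SQL).
func p95(samplesMB []int) int {
	if len(samplesMB) == 0 {
		return 0
	}
	sorted := append([]int(nil), samplesMB...) // copy so the caller's slice is untouched
	sort.Ints(sorted)
	rank := (95*len(sorted) + 99) / 100 // ceil(0.95 * n) in integer arithmetic
	return sorted[rank-1]
}

// suggestMemLimitMB adds 20% headroom to P95 and rounds the result up to
// the next multiple of 32 MB (rounding direction is an assumption).
func suggestMemLimitMB(samplesMB []int) int {
	withHeadroom := (p95(samplesMB)*12 + 9) / 10 // ceil(p95 * 1.2)
	return (withHeadroom + 31) / 32 * 32
}

func main() {
	samples := make([]int, 0, 100)
	for mb := 1; mb <= 100; mb++ {
		samples = append(samples, mb)
	}
	fmt.Println(p95(samples), suggestMemLimitMB(samples)) // prints "95 128"
}
```

Integer arithmetic is used throughout so that values sitting exactly on a 32 MB boundary are not pushed to the next step by floating-point error.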
### Changed
- **`internal/web/apps.go`** (new file) — `handleApps`, `handleAppDetail`, `parsePeriod`, `sortFleetSummary`, `aggregateHistoryForChart`, `parseLimitMB`, `memoryColor`, `accuracyClass`, `getCSRFToken` helper functions.
- **`internal/web/server.go`** — Added routes for `/apps`, `/apps/{name}`, `/static/chart.min.js`. Added `memoryColor`, `accuracyClass`, `gt` template functions.
- **`internal/web/embed.go`** — Added `//go:embed static/chart.min.js` directive.
## v0.3.7 (2026-02-21)
**Asset management API**
- New `internal/assets` package: manages app assets (logos, screenshots) on Hub PVC (`/data/assets/`) with automatic seeding from baked-in image copy on first run.
- Two new authenticated API endpoints for controllers to sync assets:
  - `GET /api/v1/assets/manifest` — returns JSON manifest with filenames + SHA-256 checksums
  - `GET /api/v1/assets/file/{filename}` — serves individual asset files
- Dockerfile updated to `COPY assets/ /usr/share/felhom/assets-seed/` for first-run seeding.
- Build script syncs website assets (`*-logo.{svg,png}`, `*-screenshot-*.webp`) into Docker build context.
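
As a rough sketch of how a filename plus SHA-256 manifest supports asset sync: the Hub hashes each file, and a controller downloads only the files whose checksum is missing or different locally. The JSON field names, type names, and example filenames below are assumptions made for illustration; the changelog does not specify the manifest's shape.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// manifestEntry is a hypothetical manifest row; the real JSON shape may differ.
type manifestEntry struct {
	Filename string `json:"filename"`
	SHA256   string `json:"sha256"`
}

// checksum returns the lowercase hex SHA-256 digest of data.
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// buildManifest hashes every file and returns the entries sorted by filename
// so that the output is deterministic.
func buildManifest(files map[string][]byte) []manifestEntry {
	entries := make([]manifestEntry, 0, len(files))
	for name, data := range files {
		entries = append(entries, manifestEntry{Filename: name, SHA256: checksum(data)})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].Filename < entries[j].Filename })
	return entries
}

// needsDownload lists manifest files that are missing locally or whose
// local checksum differs from the manifest.
func needsDownload(remote []manifestEntry, local map[string]string) []string {
	var out []string
	for _, e := range remote {
		if local[e.Filename] != e.SHA256 {
			out = append(out, e.Filename)
		}
	}
	return out
}

func main() {
	manifest := buildManifest(map[string][]byte{
		"example-logo.svg":          []byte("<svg/>"),
		"example-screenshot-1.webp": []byte("placeholder bytes"),
	})
	out, err := json.MarshalIndent(manifest, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))

	// The controller already has an up-to-date logo but no screenshot.
	local := map[string]string{"example-logo.svg": manifest[0].SHA256}
	fmt.Println(needsDownload(manifest, local)) // prints "[example-screenshot-1.webp]"
}
```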
## v0.3.6 (2026-02-21)
**Human-friendly retrieval passwords**
- Retrieval passwords now use Hungarian word passphrases (e.g. `áldás-plazmid-palánta-süvítve-pócgém`) instead of 64-char hex strings.
- Embedded 29K+ curated Hungarian word list (`hungarian.txt`) via go:embed; 5-word passphrases give ~74 bits of entropy.
- New `configgen.RandomPassphrase(wordCount)` function; all 3 retrieval password generation sites updated.
- API keys remain as hex (machine-to-machine, never typed by humans).
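
The entropy figure follows from 5 × log2(29,000) ≈ 74 bits, assuming each word is drawn uniformly and independently. A minimal sketch of such a generator is shown below; it is not the actual `configgen.RandomPassphrase` implementation. The lowercase function names, the explicit word-list parameter, and the four-word stand-in list are assumptions for illustration.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"math"
	"math/big"
	"strings"
)

// randomPassphrase picks n words uniformly at random (with replacement)
// using crypto/rand and joins them with hyphens.
func randomPassphrase(words []string, n int) (string, error) {
	if len(words) == 0 || n <= 0 {
		return "", errors.New("need a non-empty word list and a positive word count")
	}
	limit := big.NewInt(int64(len(words)))
	picked := make([]string, n)
	for i := range picked {
		idx, err := rand.Int(rand.Reader, limit) // uniform in [0, len(words))
		if err != nil {
			return "", err
		}
		picked[i] = words[idx.Int64()]
	}
	return strings.Join(picked, "-"), nil
}

// entropyBits returns n * log2(listSize), the entropy of n independent
// uniform draws from a list of listSize words.
func entropyBits(listSize, n int) float64 {
	return float64(n) * math.Log2(float64(listSize))
}

func main() {
	// Tiny stand-in list; the real list has 29K+ words.
	words := []string{"alma", "körte", "szilva", "barack"}
	p, err := randomPassphrase(words, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
	fmt.Printf("%.1f bits\n", entropyBits(29000, 5))
}
```

Using `crypto/rand` with `rand.Int` avoids the modulo bias that a naive `randomByte % len(words)` would introduce.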
## v0.3.5 (2026-02-21)
**Recovery Endpoint & Customer Standing**
- New `GET /api/v1/recovery/{customer_id}` endpoint: returns both generated controller.yaml and infra backup in a single response for disaster recovery. Auth via `X-Retrieval-Password` header (same as config retrieval).
- Report response now includes `customer_blocked: true` when customer status is "blocked" — allows controllers to detect standing and enter limited mode.
## v0.3.4 (2026-02-20)
- Rename version labels: "Current version" → "Controller version", "Latest version" → "Registry latest".
## v0.3.3 (2026-02-20)
**Bugfixes**
- Fix double "v" prefix in controller version display (showed "vv0.21.1" instead of "v0.21.1").
- Skip deprecated `monitoring.ping_uuids.*` keys in config diff comparison (added to volatile keys).
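
One way a double "v" can be avoided is to normalize the version string before display instead of prepending a prefix unconditionally. The helper below is a hypothetical sketch; the changelog does not describe how the Hub actually fixed it.

```go
package main

import (
	"fmt"
	"strings"
)

// displayVersion returns the version with exactly one leading "v",
// regardless of how many the input carries. Empty input stays empty.
func displayVersion(raw string) string {
	if raw == "" {
		return ""
	}
	return "v" + strings.TrimLeft(raw, "v")
}

func main() {
	fmt.Println(displayVersion("0.21.1"))   // prints "v0.21.1"
	fmt.Println(displayVersion("v0.21.1"))  // prints "v0.21.1"
	fmt.Println(displayVersion("vv0.21.1")) // prints "v0.21.1"
}
```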
## v0.3.2 (2026-02-20)
**Hub Version Display**
- Show Hub version in footer of all pages via `hubVersion` template function.
- `web.New()` now accepts `version` parameter (4th arg) — set via ldflags at build time.
## v0.3.1 (2026-02-20)
**Config Diff Display + Pull Config**
- **Value-based config comparison**: Replaced broken SHA256 hash comparison with semantic YAML comparison. Both configs are parsed into maps, flattened to dot-notation keys, and compared by value. Ignores key ordering, whitespace, comments, and volatile fields (`web.session_secret`). Shows actual diff count on customer page ("⚠ Config mismatch — N differences").
- **Config diff endpoint** (`GET /customers/{id}/config-diff`): Fetches live YAML from controller via new `GET /api/config` endpoint, generates Hub YAML via `configgen.Generate()`, returns JSON with per-key diffs (key, hub value, controller value, status). Sensitive values (tokens, passwords, secrets) are masked.
- **Pull Config** (`POST /customers/{id}/pull-config`): Reverse of Push Config — imports controller's current config into the Hub. Extracts identity fields (name, domain, email) and override fields (infrastructure tokens, git credentials, monitoring UUIDs). Preserves existing APIKey and RetrievalPassword.
- **Diff display UI**: "Show Diff" button on customer page expands a table showing all key-value differences with color-coded rows (yellow=changed, blue=hub-only, orange=controller-only).
- **Pull Config button**: Added next to existing "Push Config" with confirmation dialog.
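
The value-based comparison can be sketched as below, assuming both YAML documents have already been parsed into nested `map[string]any` values. The type and function names are hypothetical, the status strings mirror the three row colors of the diff table, and values are compared by their printed form, which is a simplification (an integer `8080` and a string `"8080"` compare equal here).

```go
package main

import (
	"fmt"
	"sort"
)

// keyDiff describes one differing key (hypothetical shape).
type keyDiff struct {
	Key, Hub, Controller, Status string
}

// flatten walks nested maps and writes dot-notation keys into out.
func flatten(prefix string, m map[string]any, out map[string]string) {
	for k, v := range m {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]any); ok {
			flatten(key, nested, out)
			continue
		}
		out[key] = fmt.Sprint(v)
	}
}

// diffConfigs compares two parsed configs by value, skipping volatile keys,
// and returns the differences sorted by key.
func diffConfigs(hub, controller map[string]any, volatile map[string]bool) []keyDiff {
	h, c := map[string]string{}, map[string]string{}
	flatten("", hub, h)
	flatten("", controller, c)

	var diffs []keyDiff
	for k, hv := range h {
		if volatile[k] {
			continue
		}
		cv, ok := c[k]
		switch {
		case !ok:
			diffs = append(diffs, keyDiff{k, hv, "", "hub-only"})
		case hv != cv:
			diffs = append(diffs, keyDiff{k, hv, cv, "changed"})
		}
	}
	for k, cv := range c {
		if _, ok := h[k]; !ok && !volatile[k] {
			diffs = append(diffs, keyDiff{k, "", cv, "controller-only"})
		}
	}
	sort.Slice(diffs, func(i, j int) bool { return diffs[i].Key < diffs[j].Key })
	return diffs
}

func main() {
	hub := map[string]any{
		"customer": map[string]any{"name": "demo"},
		"web":      map[string]any{"port": 8080, "session_secret": "aaa"},
	}
	controller := map[string]any{
		"web": map[string]any{"port": 9090, "session_secret": "bbb"},
	}
	volatile := map[string]bool{"web.session_secret": true}
	for _, d := range diffConfigs(hub, controller, volatile) {
		fmt.Printf("%-16s %-10s hub=%q controller=%q\n", d.Key, d.Status, d.Hub, d.Controller)
	}
}
```

Because the comparison works on parsed values, key order, whitespace, and comments in the YAML source cannot cause a mismatch.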
## v0.3.0 (2026-02-20)
**Hub Monitoring Takeover — Event System, Dead Man's Switch, Notifications**
Replaces external Healthchecks.io with a Hub-native event system. The Hub becomes the single source of truth for all customer monitoring, event tracking, dead man's switch alerting, and notification delivery.
### Phase 1 — Event System
- **`events` table** in SQLite: stores all events with customer_id, event_type, severity, message, details_json, source, timestamp
- **Indexes**: `idx_events_customer_created` (customer + time DESC), `idx_events_type` (type + time DESC)
- **Store methods**: `SaveEvent`, `GetRecentEvents`, `GetEventsByType`, `GetLatestEventByType`, `GetAllRecentEvents`, `CountEventsBySeverity`, `PruneEvents`, `GetActiveCustomerIDs`
- **`POST /api/v1/event`** endpoint: accepts structured events from controllers, validates event_type against 27 allowed types, validates severity (info/warning/error), stores in DB
- **Enhanced auth**: `checkAuthCustomer()` validates per-customer API keys match the customer_id in payload; global key bypasses ownership check
- **Prune**: events pruned alongside reports at 04:30 Budapest time
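
Jobs in this release run at fixed Budapest wall-clock times (the 04:30 prune here, the 05:00 backup deadline check in Phase 2). A minimal sketch of computing the next run time is given below. It is an illustration only: `nextDailyRun` and the helpers are hypothetical names, and this is not the body of the Hub's `scheduleDaily()`.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embed the zone database so LoadLocation works on minimal images
)

var budapest = mustLoad("Europe/Budapest")

// mustLoad loads a time zone or panics; acceptable for a fixed, known zone.
func mustLoad(name string) *time.Location {
	loc, err := time.LoadLocation(name)
	if err != nil {
		panic(err)
	}
	return loc
}

// mustParse parses an RFC 3339 timestamp or panics (demo helper).
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// nextDailyRun returns the next occurrence of hour:minute in loc that is
// strictly after now. Building the time with time.Date (instead of adding
// 24h) keeps the wall-clock time stable across DST changes.
func nextDailyRun(now time.Time, hour, minute int, loc *time.Location) time.Time {
	local := now.In(loc)
	next := time.Date(local.Year(), local.Month(), local.Day(), hour, minute, 0, 0, loc)
	if !next.After(local) {
		// Day overflow (e.g. Feb 29 in a non-leap year) is normalized by time.Date.
		next = time.Date(local.Year(), local.Month(), local.Day()+1, hour, minute, 0, 0, loc)
	}
	return next
}

func main() {
	now := mustParse("2026-02-23T03:00:00Z") // 04:00 in Budapest (CET)
	next := nextDailyRun(now, 4, 30, budapest)
	fmt.Println("next prune:", next.Format(time.RFC3339), "sleep for:", next.Sub(now))
}
```

A scheduling goroutine would sleep for `next.Sub(time.Now())`, run the job, and then compute the following run the same way.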
### Phase 2 — Dead Man's Switch
- **Staleness checker** (`internal/monitor/staleness.go`): runs every 60s, detects when controllers stop reporting
  - ok→stale (>30min): inserts `node_stale` warning event
  - any→down (>60min): inserts `node_down` error event
  - stale/down→ok: inserts `node_recovered` info event
  - Skips blocked customers, no false alerts on startup
- **Backup deadline checker** (`internal/monitor/deadline.go`): runs daily at 05:00 Budapest
  - Detects missing `backup_completed` events since midnight → inserts `expected_backup_missed` error
  - Detects missing `db_dump_completed` events → inserts `expected_dbdump_missed` error
  - Grace: skips customers with `node_down` state
- **`scheduleDaily()`** helper: goroutine that sleeps until target time (Europe/Budapest), runs function, loops
- **`/healthz`** enhanced: returns 503 if SQLite Ping fails
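
The staleness transitions can be read as a small state machine. The sketch below uses the thresholds and event names from the list above; the function names are hypothetical, and treating an unknown previous state as "record silently" is an assumed way to get the "no false alerts on startup" behavior.

```go
package main

import (
	"fmt"
	"time"
)

const (
	staleAfter = 30 * time.Minute
	downAfter  = 60 * time.Minute
)

// classify maps the time since the last report to a node state.
func classify(sinceLastReport time.Duration) string {
	switch {
	case sinceLastReport > downAfter:
		return "down"
	case sinceLastReport > staleAfter:
		return "stale"
	default:
		return "ok"
	}
}

// transitionEvent returns the event type to insert for a state change, or ""
// when nothing should be emitted. An empty prev means "first check after
// startup": the state is recorded without raising an event (assumption).
func transitionEvent(prev, cur string) string {
	if prev == "" || prev == cur {
		return ""
	}
	switch cur {
	case "down":
		return "node_down" // any -> down
	case "stale":
		if prev == "ok" {
			return "node_stale" // ok -> stale
		}
	case "ok":
		return "node_recovered" // stale/down -> ok
	}
	return "" // down -> stale is not described in the changelog; emit nothing
}

func main() {
	prev := ""
	ages := []time.Duration{5 * time.Minute, 35 * time.Minute, 65 * time.Minute, 2 * time.Minute}
	for _, age := range ages {
		cur := classify(age)
		fmt.Printf("age=%-8v state=%-5s event=%q\n", age, cur, transitionEvent(prev, cur))
		prev = cur
	}
}
```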
### Phase 3 — Notification System
- **Dispatcher** (`internal/notify/dispatcher.go`): processes events and sends emails via Resend API
- **Operator channel**: English emails to operator for warning/error events, 1h cooldown per customer:eventType
- **Customer channel**: Hungarian emails per event_type, respects customer preferences (enabled_events, cooldown_hours), blocked customers skipped
- **Test bypass**: `test` event type skips cooldown/preferences, sends directly to customer email
- **Email templates** (`internal/notify/templates.go`): operator (concise English), customer (Hungarian per event type with complete message table)
- **Cooldown tracking**: in-memory maps with per-customer:eventType granularity
- **`customer_notifications` table**: added `cooldown_hours` column (default 6)
- **`notification_log` table**: added `channel` column (operator/customer)
- Wired into `/api/v1/event` handler and staleness/deadline checkers
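
The in-memory cooldown tracking keyed by customer and event type might look roughly like this. The type and method names are hypothetical, and the sketch shows only the operator channel's 1h window; customer preferences and the `test` bypass are left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const operatorCooldown = time.Hour

// cooldowns remembers when each "customer:eventType" pair last triggered
// a notification. State lives in memory only and is lost on restart.
type cooldowns struct {
	mu     sync.Mutex
	window time.Duration
	last   map[string]time.Time
}

func newCooldowns(window time.Duration) *cooldowns {
	return &cooldowns{window: window, last: make(map[string]time.Time)}
}

// allow reports whether a notification may be sent now and, if so, records
// the send time. Suppressed attempts do not extend the cooldown.
func (c *cooldowns) allow(customerID, eventType string, now time.Time) bool {
	key := customerID + ":" + eventType
	c.mu.Lock()
	defer c.mu.Unlock()
	if sent, ok := c.last[key]; ok && now.Sub(sent) < c.window {
		return false
	}
	c.last[key] = now
	return true
}

// demoStart is an arbitrary fixed timestamp used by the example.
var demoStart = time.Date(2026, 2, 20, 12, 0, 0, 0, time.UTC)

func main() {
	c := newCooldowns(operatorCooldown)
	fmt.Println(c.allow("cust-1", "node_stale", demoStart))                     // prints "true"
	fmt.Println(c.allow("cust-1", "node_stale", demoStart.Add(10*time.Minute))) // prints "false"
	fmt.Println(c.allow("cust-1", "node_down", demoStart.Add(10*time.Minute)))  // prints "true"
	fmt.Println(c.allow("cust-1", "node_stale", demoStart.Add(time.Hour)))      // prints "true"
}
```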
### Phase 4 — Hub UI
- **Events section** on customer detail page: last 50 events, severity filter buttons (All/Errors/Warnings/Info), colored severity badges
- **Dashboard badges**: error+warning count in last 24h per customer, clickable to customer events
- **Notification log**: shows channel column (operator/customer) in customer detail page
- **Config form**: Monitoring UUIDs section marked as "Legacy" with deprecation notice, collapsed by default
### Phase 6 — Config Cleanup
- **`controller.yaml.default`**: `monitoring.ping_uuids` section commented out (deprecated)
- **`buildConfigJSON`**: only writes `ping_uuids` to config JSON if user explicitly provides UUID values (new configs get none)
---
## v0.2.2 (2026-02-20)
**Config Hash Comparison**
- **Config sync status** on unified customer page: compares SHA256 hash of controller's
`controller.yaml` (from report payload) against Hub-generated YAML. Shows "In sync",
"Config mismatch", or "Unknown" (controller needs v0.20.0+ to report hash).
- Visible in the Controller Update section next to Push Config button.
---
## v0.2.1 (2026-02-20)
**Unified Customer Management**
All customer views consolidated into a single page. New management features: blocked status,
dashboard merge, config push, and auto-config creation.
### New features
- **Unified customer page — `/customers/{id}`:**
  - Single page showing both configuration info and live report data
  - Replaces separate `/configs/{id}` (config detail) and `/customers/{id}` (report detail) pages
  - Shows config management (credentials, setup commands, YAML preview) when config exists
  - Shows "Create Config" button for manual (report-only) customers
  - Old `/configs/{id}` URLs redirect to `/customers/{id}`
- **Dashboard shows pending customers:**
  - Customers with config but no reports appear on dashboard with "PENDING" status
  - All metric columns show "—" for pending customers
- **Blocked/Banned status:**
  - Customers can be blocked via button on detail page
  - Blocked customers hidden from Dashboard
  - Reports still accepted (prevents controller retry loops) but notifications suppressed
  - "BLOCKED" badge shown on Customers list and detail page
  - One-click unblock button
- **Config push to controller:**
  - "Push Config" button on unified page (visible when controller URL known)
  - Generates YAML and POSTs to `{controller_url}/api/config/apply`
  - Note: requires controller v0.20.0+ with config apply endpoint
- **Auto-create config from report data:**
  - "Create Config" button on manual customer pages
  - Pre-fills customer name from report, generates credentials
  - Redirects to edit form for additional fields
### Changes
- Customers list: all rows now link to `/customers/{id}` (unified page)
- Config badges: new MANAGED/MANUAL/BLOCKED pill-style badges
- `customer_configs` table: added `status` column (active/blocked)
- Status functions handle "pending" and "blocked" status values
---
## v0.2.0 (2026-02-20)
**Customer Configuration Management**
New "Configurations" section for pre-provisioning customer nodes. Operators can configure
customer settings in the Hub web UI, then `docker-setup.sh` downloads a ready-made
`controller.yaml` — reducing deployment to a customer ID and password.
### New features
- **Web UI — `/configs` pages:**
  - List all customer configurations in a table
  - Create new configuration: customer identity, infrastructure secrets (CF tunnel/API tokens),
    git sync credentials, monitoring UUIDs — organized in collapsible sections
  - Detail page: shows credentials (retrieval password, per-customer API key) with copy-to-clipboard,
    setup commands (`docker-setup.sh` and `curl`), live YAML preview
  - Edit and delete configurations
  - Navigation tabs (Dashboard / Configurations) on all pages
- **Config retrieval API — `GET /api/v1/config/{customer_id}`:**
  - Authenticated via `X-Retrieval-Password` header (separate from Bearer token)
  - Generates complete `controller.yaml` by deep-merging template with customer overrides
  - Template sourced from `controller.yaml.example` (fetched from Gitea repo periodically)
  - Falls back to embedded default template if fetcher not configured
- **Per-customer API keys:**
  - Each customer config gets its own API key (auto-generated, 64 hex chars)
  - Controllers can authenticate with per-customer key instead of the shared global key
  - Backward compatible — global `report_api_key` continues to work alongside per-customer keys
- **YAML generation (`internal/configgen` package):**
  - Deep-merge of template + customer-specific overrides
  - Programmatic injection: customer identity, hub config, session secret
  - Shared by both API handler and web UI preview
- **Template fetcher (background goroutine):**
  - Periodically fetches `controller.yaml.example` from Gitea (configurable interval)
  - Requires `registry.username` + `registry.token` in hub.yaml
  - Falls back to `go:embed` default template when not configured
- **Data layer:**
  - New `customer_configs` SQLite table
  - 6 CRUD methods: Save, Get, List, Delete, GetByAPIKey, UpdateRetrievalPassword
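
The deep-merge of template and customer overrides can be illustrated on already-parsed maps. This is a sketch under stated assumptions, not the `configgen` code: `deepMerge` is a hypothetical name, nested maps are merged key by key, and any other value (including lists) in the override replaces the template value wholesale.

```go
package main

import "fmt"

// deepMerge returns a new map containing base overlaid with override.
// Nested maps are merged recursively; all other values are replaced.
// Neither input map is modified.
func deepMerge(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base)+len(override))
	for k, v := range base {
		if m, ok := v.(map[string]any); ok {
			out[k] = deepMerge(m, nil) // deep-copy nested maps
		} else {
			out[k] = v
		}
	}
	for k, v := range override {
		ov, ovIsMap := v.(map[string]any)
		bv, baseIsMap := out[k].(map[string]any)
		switch {
		case ovIsMap && baseIsMap:
			out[k] = deepMerge(bv, ov)
		case ovIsMap:
			out[k] = deepMerge(ov, nil)
		default:
			out[k] = v
		}
	}
	return out
}

func main() {
	// Keys and values here are invented for the example.
	template := map[string]any{
		"web": map[string]any{"port": 8080, "session_secret": ""},
		"hub": map[string]any{"url": "https://hub.example"},
	}
	overrides := map[string]any{
		"customer": map[string]any{"id": "demo"},
		"web":      map[string]any{"session_secret": "generated"},
	}
	fmt.Println(deepMerge(template, overrides))
}
```

Returning a fresh map keeps the shared template untouched, which matters when the same template serves both the API handler and the web UI preview.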
### Configuration
New `registry` section in `hub.yaml`:
```yaml
registry:
  image: "gitea.dooplex.hu/admin/felhom-controller"
  username: ""              # Gitea credentials (for version checker + template fetcher)
  token: ""
  check_interval: "6h"
  template_interval: "1h"   # How often to refresh controller.yaml.example
```
### Files added
- `internal/configgen/configgen.go` — shared YAML generation package
- `internal/web/configs.go` — web handlers for config CRUD
- `internal/web/templatefetcher.go` — background template refresh
- `internal/web/controller.yaml.default` — embedded fallback template
- `internal/web/templates/configs.html` — config list page
- `internal/web/templates/config_form.html` — create/edit form
- `internal/web/templates/config_detail.html` — detail + credentials page
### Files modified
- `internal/store/store.go` — customer_configs table + CRUD methods
- `internal/api/handler.go` — config retrieval endpoint, per-customer auth, `ConfigTemplateProvider` interface
- `internal/web/server.go` — `/configs/*` routes, `SetTemplateFetcher()`
- `internal/web/embed.go` — embedded default template
- `internal/web/templates/dashboard.html` — navigation bar
- `internal/web/templates/customer.html` — navigation bar
- `internal/web/templates/style.css` — form, nav, button, credential styles
- `cmd/hub/main.go` — template fetcher wiring, `TemplateInterval` config
- `configs/hub.yaml.example` — registry section
---
## v0.1.8 (2026-02-16)
- Controller update trigger: "Update" button on customer detail page calls controller's self-update endpoint
- Registry version checker: background goroutine checks Gitea registry for latest controller image tag
- Update available indicator on customer detail page
## v0.1.7 (2026-02-15)
- Infrastructure backup endpoints for disaster recovery (POST + GET `/api/v1/infra-backup`)
## v0.1.6 (2026-02-14)
- Handle disabled reporting status
- Storage labels display
- Date in history table
## v0.1.5 (2026-02-13)
- Notification preferences sync endpoint (`POST /api/v1/preferences`)
- Notification display on customer detail page
## v0.1.4 (2026-02-12)
- Resend API key support for email notifications
- Notification endpoint (`POST /api/v1/notify`)
## v0.1.3 (2026-02-11)
- Customer detail page: system info, storage bars, container table
- 24h history graphs
## v0.1.2 (2026-02-10)
- Dashboard auto-refresh (60s cycle)
- Status logic (green/yellow/red based on report age + health)
## v0.1.1 (2026-02-09)
- Basic dashboard with customer overview table
- Report ingest API
## v0.1.0 (2026-02-08)
- Initial release: SQLite store, report API, basic web dashboard