felhom-hub
Central operator dashboard for monitoring and managing Felhom customer deployments.
A lightweight Go service that receives periodic reports and structured events from felhom-controller instances, stores them in SQLite, and provides a web dashboard for fleet monitoring. Also serves as the infrastructure backup store for disaster recovery, event-based dead man's switch monitoring, and notification dispatch.
Current version: v0.5.0
Architecture
Customer nodes                            Central Hub (k3s)
┌───────────────────┐                     ┌─────────────────────────┐
│ felhom-controller │──── JSON push ─────▶│ felhom-hub              │
│ (every 15 min)    │    (Bearer auth)    │                         │
│                   │                     │  ┌───────────────────┐  │
│ POST /api/v1/     │                     │  │ API Handler       │  │
│   report          │                     │  │ (ingest reports,  │  │
│   infra-backup    │◀─── config push ────│  │  infra backups,   │  │
│   notify          │     (YAML body)     │  │  config push,     │  │
│                   │                     │  │  asset serving)   │  │
│ GET /api/v1/      │                     │  └─────────┬─────────┘  │
│   assets/*        │◀── asset download ──│            │            │
└───────────────────┘    (Bearer auth)    │  ┌─────────▼─────────┐  │
                                          │  │ SQLite Store      │  │
Operator browser                          │  │ (reports,         │  │
┌───────────────────┐                     │  │  assets,          │  │
│ Web Dashboard     │◀─── HTML pages ─────│  │  infra_backups,   │  │
│ (hub.felhom.eu)   │    (bcrypt auth)    │  │  configs,         │  │
└───────────────────┘                     │  │  notifications)   │  │
                                          │  └───────────────────┘  │
                                          │                         │
                                          │  ┌───────────────────┐  │
                                          │  │ Asset Manager     │  │
                                          │  │ (PVC storage,     │  │
                                          │  │  SHA-256 manifest,│  │
                                          │  │  file serving)    │  │
                                          │  └───────────────────┘  │
                                          │                         │
                                          │  ┌───────────────────┐  │
                                          │  │ Web Dashboard     │  │
                                          │  │ (unified customer │  │
                                          │  │  management)      │  │
                                          │  └───────────────────┘  │
                                          └─────────────────────────┘
API Endpoints
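All API endpoints require Authorization: Bearer <api_key>, except /healthz (no auth) and the retrieval endpoints /api/v1/config/{customer_id} and /api/v1/recovery/{customer_id}, which use the X-Retrieval-Password header instead. Bearer auth accepts both the global report_api_key and per-customer API keys (generated when creating customer configs).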
Report Ingest
| Method | Path | Description |
|---|---|---|
| `POST` | `/api/v1/report` | Controller pushes periodic status report (v0.28.0+ includes `app_telemetry` field) |
| `GET` | `/api/v1/customers` | List all customers with latest report summary |
| `GET` | `/api/v1/customers/{id}` | Get latest full report for a customer |
| `GET` | `/api/v1/customers/{id}/history?period=7d` | Get report history |
The POST /api/v1/report handler (v0.4.0+) automatically parses the optional app_telemetry JSON array from the request body and stores it in app_telemetry / app_log_issues tables. Old controllers (no app_telemetry key) continue to work unchanged.
Infrastructure Backup (Disaster Recovery)
| Method | Path | Description |
|---|---|---|
| `POST` | `/api/v1/infra-backup` | Controller pushes infrastructure snapshot |
| `GET` | `/api/v1/infra-backup/{customer_id}` | Fresh controller pulls backup for restore |
The infra-backup payload contains everything needed to restore a customer deployment:
- `controller.yaml` (base64, full config including secrets)
- `settings.json` (base64, backup preferences, storage paths)
- Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
- Deployed stacks manifest (app names, HDD paths, display names)
- Restic passwords (primary + cross-drive, for encrypted backup access)
Disaster recovery flow:
- Customer's system drive fails → replaced with fresh Debian install
- `docker-setup.sh` deploys controller with minimal config (domain only)
- Controller enters setup wizard → user chooses restore from local drive or Hub
- For Hub restore: calls `GET /api/v1/recovery/{customer_id}` (gets config + infra backup)
- Controller uses disk UUIDs to auto-mount surviving drives
- Controller restores apps from local backups on those drives
Recovery (Disaster Recovery)
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/v1/recovery/{customer_id}` | Combined recovery: returns generated `controller.yaml` + infra backup in one response |
Auth: X-Retrieval-Password header (same per-customer password as config retrieval). Response:
{
"customer_id": "example",
"config_yaml": "customer:\n id: example\n ...",
"infra_backup": { ... },
"has_infra_backup": true
}
If no infra backup exists yet, infra_backup is null and has_infra_backup is false.
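A controller-side decoder for this response could look roughly like the sketch below. The struct and function names are hypothetical; only the four JSON field names come from the response shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// recoveryResponse mirrors the documented JSON fields.
// InfraBackup is kept as raw JSON because its inner schema is not shown here.
type recoveryResponse struct {
	CustomerID     string          `json:"customer_id"`
	ConfigYAML     string          `json:"config_yaml"`
	InfraBackup    json.RawMessage `json:"infra_backup"`
	HasInfraBackup bool            `json:"has_infra_backup"`
}

// parseRecovery decodes the body of GET /api/v1/recovery/{customer_id}.
func parseRecovery(body []byte) (recoveryResponse, error) {
	var resp recoveryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return resp, fmt.Errorf("decode recovery response: %w", err)
	}
	return resp, nil
}

func main() {
	body := []byte(`{"customer_id":"example","config_yaml":"customer:\n  id: example\n","infra_backup":null,"has_infra_backup":false}`)
	resp, err := parseRecovery(body)
	if err != nil {
		panic(err)
	}
	// Branch on the explicit flag rather than on the raw infra_backup bytes.
	if resp.HasInfraBackup {
		fmt.Println("restoring from Hub infra backup for", resp.CustomerID)
	} else {
		fmt.Println("config only, no infra backup yet for", resp.CustomerID)
	}
}
```

Checking `has_infra_backup` instead of inspecting `infra_backup` avoids depending on how a JSON `null` is represented after decoding.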
Report Response
The POST /api/v1/report response includes customer_blocked: true when the customer's status is "blocked". Controllers use this flag to detect that they have been blocked and enter limited mode after a grace period.
Events
| Method | Path | Description |
|---|---|---|
| `POST` | `/api/v1/event` | Controller pushes structured event (27 allowed types, severity: info/warning/error) |
Events are the primary monitoring mechanism. Each event has: customer_id, event_type, severity, message, details_json, source. Per-customer API keys are validated against the customer_id in the payload. Stored in the events table with automatic pruning.
Hub-generated events (source="hub"):
- `node_stale` / `node_down` / `node_recovered`: dead man's switch from staleness checker (every 60s)
- `expected_backup_missed` / `expected_dbdump_missed`: backup deadline checker (daily at 05:00 Budapest)
Notifications
| Method | Path | Description |
|---|---|---|
| `POST` | `/api/v1/notify` | Legacy notification relay (kept for backward compatibility) |
| `POST` | `/api/v1/preferences` | Controller syncs customer notification preferences (email, enabled_events, cooldown_hours) |
Notifications are dispatched automatically when events are processed:
- Operator channel: English emails for warning/error events, 1h cooldown per customer:eventType
- Customer channel: Hungarian emails per event type, respects customer preferences and cooldown (default 6h)
- Email delivery via Resend.com API
Customer Config Retrieval
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/v1/config/{customer_id}` | Download generated `controller.yaml` (auth: `X-Retrieval-Password` header) |
Config retrieval uses a separate per-customer retrieval password (not the API key). Retrieval passwords are auto-generated as Hungarian word passphrases (e.g., alma-kerék-madár-felhő) for easy phone-based entry during disaster recovery. The Hub generates a complete controller.yaml by deep-merging controller.yaml.example (periodically fetched from the Gitea repo) with customer-specific overrides (identity, infrastructure tokens, hub API key, session secret).
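The deep-merge step can be illustrated on already-parsed YAML trees. The sketch below is an assumption about how such a merge might work (override wins, nested maps merge recursively); the key names in the example are hypothetical, and YAML parsing itself is left out to keep the example dependency-free.

```go
package main

import "fmt"

// deepMerge returns a new map containing base overlaid with override.
// Nested maps are merged recursively; any other value in override replaces
// the value from base. Neither input map is modified.
func deepMerge(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, ov := range override {
		baseMap, baseIsMap := out[k].(map[string]any)
		overMap, overIsMap := ov.(map[string]any)
		if baseIsMap && overIsMap {
			out[k] = deepMerge(baseMap, overMap)
			continue
		}
		out[k] = ov
	}
	return out
}

func main() {
	// Hypothetical excerpt of controller.yaml.example after parsing.
	template := map[string]any{
		"customer": map[string]any{"id": "", "domain": ""},
		"hub":      map[string]any{"report_interval": "15m", "api_key": ""},
	}
	// Hypothetical customer-specific overrides stored by the Hub.
	overrides := map[string]any{
		"customer": map[string]any{"id": "example"},
		"hub":      map[string]any{"api_key": "per-customer-key"},
	}
	merged := deepMerge(template, overrides)
	fmt.Println(merged["customer"], merged["hub"])
}
```

Because template defaults survive wherever no override exists, new keys added to `controller.yaml.example` reach every generated config on the next template refresh.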
Assets
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/v1/assets/manifest` | JSON manifest of all assets with SHA-256 checksums |
| `GET` | `/api/v1/assets/file/{filename}` | Download a single asset file (logo, screenshot) |
Assets are stored on the Hub PVC at <dataDir>/assets/. On every startup, assets are seeded or updated from the Docker image (/usr/share/felhom/assets-seed/) by SHA-256 comparison. The manifest includes filename, size, and SHA-256 hash for each file, which controllers use for efficient change detection.
Asset types served: {slug}-logo.svg, {slug}-logo.png, {slug}-screenshot-{N}.webp
The asset manager (internal/assets/) scans the assets directory on startup, builds an in-memory manifest, and serves files with appropriate Content-Type and cache headers. Both endpoints require Bearer token auth (global or per-customer API key).
Health
| Method | Path | Description |
|---|---|---|
| `GET` | `/healthz` | Health check (no auth required, returns 503 if SQLite ping fails) |
Web Dashboard
Protected by bcrypt password + session cookie (7-day expiry).
Authentication & Session Model (internal/web/server.go)
- Login generates a cryptographically random 64-char hex session token stored server-side in a `map[string]*hubSession` (+ `sync.RWMutex`). The old literal `hub_session=authenticated` cookie is gone.
- Each session also stores a per-session CSRF token (separate 64-char hex random value).
- Cookie attributes: `SameSite=Lax`, `Secure` (when TLS), `HttpOnly`, 7-day `Max-Age`.
- `RequireAuth` middleware validates the session token with `subtle.ConstantTimeCompare` and redirects to `/login` on failure.
- `CleanupSessions(ctx)` goroutine runs hourly to purge expired sessions.
CSRF Protection (internal/web/server.go)
Synchronizer-token CSRF protection on all browser POST/DELETE/PATCH operations:
- CSRF validation block runs at the top of `ServeHTTP` before routing.
- Skipped when: no session cookie present (API/Basic-Auth path); or safe methods (GET/HEAD/OPTIONS).
- Token read from `_csrf` form field or `X-CSRF-Token` request header.
- On failure: JSON `{"ok":false,"error":"CSRF token missing or invalid"}` for `/api/` paths; HTTP 403 text otherwise.
- Template delivery: `csrfToken(r)` and `csrfField(r)` helpers inject `CSRFToken` and `CSRFField` into every render data struct via `configs.go`. Templates use `{{.CSRFField}}` in forms and `csrfHeaders()` JS helper for fetch calls.
Pages
- Dashboard (`/`): Fleet overview table showing all customers with live status and event count badges (error+warning in last 24h). Config-only customers (no reports yet) appear as "PENDING" with gray badge. Blocked customers are hidden. Auto-refreshes every 60 seconds.
- Customers (`/configs`): Customer management list. Shows all customers (both managed and manual), their status, controller version, and config type (MANAGED/MANUAL). Blocked customers shown grayed-out with BLOCKED badge.
- Fleet App Analytics (`/apps`): Fleet-wide app telemetry overview (v0.4.0+). Shows all deployed apps across all customers with deployment count, avg/P95 memory, catalog estimate/limit accuracy indicators, and 24h error/warning badge counts. Sortable columns (deployments/memory/errors), 24h/7d/30d time period selector.
- App Detail (`/apps/{name}`): Per-app drill-down page with Chart.js memory trend (avg + peak lines, catalog limit dashed line), per-customer breakdown table, and known log issues table (severity, message, occurrence count, affected customers, first/last seen). Shows suggested mem_limit from P95×1.2 rounded to 32 MB.
- Unified Customer Detail (`/customers/{id}`): Single page per customer combining config management and live monitoring. Adapts content based on available data:
  - Managed + reporting: Full view with config info, system metrics, storage, containers, backup status, events timeline (last 50, severity filter), credentials, setup commands, YAML preview, controller update, notifications (with channel column), history
  - Managed + no reports yet: Config info, credentials, setup commands, "Waiting for first report" indicator
  - Manual (report-only): System metrics, storage, containers, backup, with "Create Config" button to convert to managed
- Config Form (`/configs/new`, `/configs/{id}/edit`): Create/edit customer configurations with identity, infrastructure tokens, and monitoring overrides. Legacy Monitoring UUIDs section collapsed by default with deprecation notice
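The suggested `mem_limit` shown on the App Detail page (P95×1.2 rounded to 32 MB) can be worked through in a few lines. The function name is hypothetical, and rounding up to the next 32 MB multiple is an assumption, since the text does not say whether the value is rounded up or to the nearest step.

```go
package main

import "fmt"

// suggestedMemLimitMB returns P95 * 1.2 rounded UP to a multiple of 32 MB.
// Integer arithmetic is used so exact multiples are not disturbed by
// floating-point error: p95*1.2/32 == p95*6/160.
func suggestedMemLimitMB(p95MB int) int {
	if p95MB <= 0 {
		return 0
	}
	steps := (p95MB*6 + 159) / 160 // ceiling of p95MB*6/160
	return steps * 32
}

func main() {
	for _, p95 := range []int{100, 160, 200} {
		fmt.Printf("P95 %d MB -> suggested mem_limit %d MB\n", p95, suggestedMemLimitMB(p95))
	}
}
```

For example, a P95 of 100 MB gives 120 MB after the 20% headroom, which rounds up to 128 MB; a P95 of 160 MB gives exactly 192 MB and stays there.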
Customer States
| State | Dashboard | Customers List | Detail Page |
|---|---|---|---|
| Active + reporting | Shown with live status | MANAGED + status badge | Full unified view |
| Active + no reports | Shown as PENDING (gray) | MANAGED + no status | Config + "waiting for report" |
| Manual (report-only) | Shown with live status | MANUAL + status badge | Reports + "Create Config" button |
| Blocked | Hidden | Shown grayed-out, BLOCKED badge | Blocked banner + Unblock button |
Customer Actions
| Action | Description |
|---|---|
| Block/Unblock | Toggle blocked status — blocked customers are hidden from dashboard and notifications are suppressed, but reports are still accepted and stored |
| Push Config | Generate YAML from Hub config and POST it to the controller's /api/config/apply endpoint (requires controller URL from reports) |
| Pull Config | Import controller's current config into Hub — fetches live YAML via GET /api/config, extracts identity and override fields, updates Hub's stored config |
| Show Diff | Compare Hub-generated config with controller's live config — shows per-key differences in a color-coded table (value-based comparison, ignores key ordering and volatile fields) |
| Create Config | Auto-create a managed config from a manual customer's report data, then redirect to edit form |
| Trigger Update | Instruct controller to self-update to the latest version |
| Delete | Remove customer config (customer reappears as manual if reports continue) |
Status Logic
- OK (green): report < 30 min old, health = ok
- WARN (yellow): 30-60 min stale or health = warn
- DOWN (red): > 60 min stale or health = fail
- DISABLED (gray): controller monitoring paused
- PENDING (gray): config exists but no reports received yet
- BLOCKED (gray): customer blocked by operator
Data Storage
SQLite with WAL mode. Tables:
| Table | Purpose |
|---|---|
| `reports` | Full JSON reports with denormalized fields for dashboard queries |
| `events` | Structured events from controllers and Hub (type, severity, message, details, source) |
| `infra_backups` | Per-customer infrastructure snapshots for disaster recovery |
| `customer_notifications` | Email, enabled event types, cooldown hours per customer |
| `notification_log` | Send/skip/fail history for notifications with channel (operator/customer) |
| `customer_configs` | Pre-configured customer settings, retrieval passwords, per-customer API keys, status (active/blocked) |
Retention: configurable (default 90 days), daily prune at 04:30 Budapest time.
PVC Asset Storage
App assets (logos, screenshots, branding) are stored on the PVC at <dataDir>/assets/. On every startup, the Hub compares SHA-256 checksums between the image seed (/usr/share/felhom/assets-seed/) and the PVC, updating any changed files. This means redeploying the Hub image with updated assets automatically propagates changes without PVC deletion.
A manual "Refresh Assets from Image" button is available on the Configuration page (/configuration) for triggering a re-seed + manifest rebuild on demand.
Configuration
# hub.yaml
auth:
  password_hash: "" # bcrypt hash for dashboard login (empty = no auth)
api:
  report_api_key: "" # Bearer token for API auth
notifications:
  resend_api_key: "" # Resend.com API key for email
  from_email: "monitoring@felhom.eu"
  operator_email: "" # Operator alert recipient
  operator_enabled: true # Enable operator email notifications
retention:
  max_days: 90
  prune_schedule: "04:30"
alerting:
  stale_threshold: "30m" # Customer considered stale after this duration
registry:
  image: "gitea.dooplex.hu/admin/felhom-controller"
  username: "" # Gitea registry credentials
  token: ""
  check_interval: "30m" # How often to check for new controller versions
  template_interval: "1h" # How often to refresh controller.yaml.example
server:
  listen: ":8080"
  data_dir: "/data" # SQLite database location
Deployment
Runs on k3s (Kubernetes) in the felhom-system namespace:
- PVC: 1GB Longhorn volume for SQLite database + app assets
- Resources: 64Mi-256Mi memory, 50m-500m CPU
- Ingress: `hub.felhom.eu` with TLS (cert-manager)
- Geo-restriction: Hungary only (nginx annotation)
# Build and push (on 192.168.0.180)
cd ~/build/felhom-hub
./build.sh v0.3.8 --push
# Build script auto-syncs app assets from website/assets/ into the image
# Deploy (ArgoCD managed — update manifests/hub.yaml image tag, commit+push)
git pull && kubectl apply -f manifests/hub.yaml
# Check
kubectl logs -n felhom-system -l app=hub --tail 20
Note: kubectl set image alone does NOT persist — ArgoCD reverts it. Always update manifests/hub.yaml and apply.
The Dockerfile includes COPY assets/ /usr/share/felhom/assets-seed/ which bakes app assets into the image as a seed for the PVC. The build script copies *-logo.svg, *-logo.png, and *-screenshot-*.webp from the website repo's assets/ directory.
Background Services
| Service | Schedule | Description |
|---|---|---|
| Staleness checker | Every 60s | Detects controllers that stopped reporting. Generates node_stale (>30min), node_down (>60min), node_recovered events |
| Backup deadline checker | Daily 05:00 Budapest | Detects missing backup/db-dump events since midnight. Generates expected_backup_missed, expected_dbdump_missed events |
| Report/event prune | Daily 04:30 Budapest | Deletes reports and events older than retention period (default 90 days) |
| Registry version check | Every 30min | Checks Gitea registry for new controller image tags |
| Template refresh | Every 1h | Fetches latest controller.yaml.example from Gitea |
| Asset seeding | On startup | Compares SHA-256 checksums and updates changed assets from Docker image seed |
Internal Packages
| Package | Purpose |
|---|---|
| `internal/api` | REST API handler (report ingest, config, events, assets, notifications) |
| `internal/web` | Web dashboard (session auth, customer management, fleet overview) |
| `internal/assets` | PVC asset manager (manifest generation, SHA-256 checksums, file serving, image seed) |
| `internal/configgen` | Shared YAML config generation (deep-merge template + customer overrides) |
Dependencies
- `golang.org/x/crypto`: bcrypt for password hashing
- `gopkg.in/yaml.v3`: YAML config parsing
- `modernc.org/sqlite`: Pure Go SQLite (no CGo)