# felhom-hub

**Central operator dashboard for monitoring and managing Felhom customer deployments.**

A lightweight Go service that receives periodic reports from felhom-controller instances, stores them in SQLite, and provides a web dashboard for fleet monitoring. It also serves as the infrastructure backup store for disaster recovery.

**Current version: v0.2.0**

---

## Architecture

```
Customer nodes                           Central Hub (k3s)
┌───────────────────┐                    ┌────────────────────────┐
│ felhom-controller │──── JSON push ────▶│  felhom-hub            │
│ (every 15 min)    │   (Bearer auth)    │                        │
│                   │                    │  ┌─────────────────┐   │
│ POST /api/v1/     │                    │  │ API Handler     │   │
│   report          │                    │  │ (ingest reports,│   │
│   infra-backup    │                    │  │  infra backups) │   │
│   notify          │                    │  └────────┬────────┘   │
│                   │                    │           │            │
└───────────────────┘                    │  ┌────────▼────────┐   │
                                         │  │ SQLite Store    │   │
Operator browser                         │  │ (reports,       │   │
┌───────────────────┐                    │  │  infra_backups, │   │
│ Web Dashboard     │◀── HTML pages ─────│  │  notifications) │   │
│ (hub.felhom.eu)   │   (bcrypt auth)    │  └─────────────────┘   │
└───────────────────┘                    │                        │
                                         │  ┌─────────────────┐   │
                                         │  │ Web Dashboard   │   │
                                         │  │ (multi-customer │   │
                                         │  │  overview)      │   │
                                         │  └─────────────────┘   │
                                         └────────────────────────┘
```

## API Endpoints

All API endpoints require `Authorization: Bearer <token>` (except `/healthz` and `/api/v1/config/{id}`). Auth accepts both the global `report_api_key` and per-customer API keys (generated when creating customer configs).

### Report Ingest

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/report` | Controller pushes periodic status report |
| `GET` | `/api/v1/customers` | List all customers with latest report summary |
| `GET` | `/api/v1/customers/{id}` | Get latest full report for a customer |
| `GET` | `/api/v1/customers/{id}/history?period=7d` | Get report history |

### Infrastructure Backup (Disaster Recovery)

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/infra-backup` | Controller pushes infrastructure snapshot |
| `GET` | `/api/v1/infra-backup/{customer_id}` | Fresh controller pulls backup for restore |

The infra-backup payload contains everything needed to restore a customer deployment:

- `controller.yaml` (base64, full config including secrets)
- `settings.json` (base64, backup preferences, storage paths)
- Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
- Deployed stacks manifest (app names, HDD paths, display names)
- Restic passwords (primary + cross-drive, for encrypted backup access)

**Disaster recovery flow:**

1. Customer's system drive fails → replaced with fresh Debian install
2. `docker-setup.sh` deploys controller with Hub details (customer_id + API key)
3. Controller detects fresh deployment → calls `GET /api/v1/infra-backup/{customer_id}`
4. Controller uses disk UUIDs to auto-mount surviving drives
5. Controller restores apps from local backups on those drives

### Notifications

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/notify` | Controller sends event notification (backup_failed, disk_warning, etc.) |
| `POST` | `/api/v1/preferences` | Controller syncs customer notification preferences |

Notifications are sent via the Resend.com email API.

### Customer Config Retrieval

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/v1/config/{customer_id}` | Download generated controller.yaml (auth: `X-Retrieval-Password` header) |

Config retrieval uses a separate per-customer retrieval password (not the API key). The Hub generates a complete `controller.yaml` by deep-merging `controller.yaml.example` (periodically fetched from the Gitea repo) with customer-specific overrides (identity, infrastructure tokens, hub API key, session secret).

### Health

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/healthz` | Health check (no auth required) |

## Web Dashboard

Protected by a bcrypt-hashed password and a session cookie (7-day expiry).

- **Customer overview table:** status indicators (OK/WARN/DOWN), CPU/memory %, disk usage, container counts, backup age, controller version
- **Customer detail page:** system info, storage bars, container table, notification preferences, notification log, 24h history graphs
- **Configurations page:** CRUD management for customer configs: pre-configure customer identity, infrastructure secrets, monitoring UUIDs; auto-generates retrieval password + per-customer API key; shows setup commands (`docker-setup.sh` and `curl`); YAML preview
- **Auto-refresh:** 60-second cycle
- **Status logic:**
  - Green: report < 30 min old, health = ok
  - Yellow: 30-60 min stale or health = warn
  - Red: > 60 min stale or health = fail

## Data Storage

SQLite with WAL mode.
Tables:

| Table | Purpose |
|-------|---------|
| `reports` | Full JSON reports with denormalized fields for dashboard queries |
| `infra_backups` | Per-customer infrastructure snapshots for disaster recovery |
| `customer_notifications` | Email + enabled event types per customer |
| `notification_log` | Send/skip/fail history for notifications |
| `customer_configs` | Pre-configured customer settings, retrieval passwords, per-customer API keys |

Retention: configurable (default 90 days), daily prune at 04:30 Budapest time.

## Configuration

```yaml
# hub.yaml
auth:
  password_hash: ""          # bcrypt hash for dashboard login (empty = no auth)

api:
  report_api_key: ""         # Bearer token for API auth

notifications:
  resend_api_key: ""         # Resend.com API key for email
  from_email: "monitoring@felhom.eu"

retention:
  max_days: 90
  prune_schedule: "04:30"

alerting:
  stale_threshold: "30m"     # Customer considered stale after this duration

registry:
  image: "gitea.dooplex.hu/admin/felhom-controller"
  username: ""               # Gitea registry credentials
  token: ""
  check_interval: "30m"      # How often to check for new controller versions
  template_interval: "1h"    # How often to refresh controller.yaml.example

server:
  listen: ":8080"
  data_dir: "/data"          # SQLite database location
```

## Deployment

Runs on k3s (Kubernetes) in the `felhom-system` namespace:

- **PVC:** 1GB Longhorn volume for SQLite database
- **Resources:** 64Mi-256Mi memory, 50m-500m CPU
- **Ingress:** `hub.felhom.eu` with TLS (cert-manager)
- **Geo-restriction:** Hungary only (nginx annotation)

```bash
# Build and push
cd hub/
make VERSION=0.2.0 docker docker-push

# Deploy
kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:v0.2.0
kubectl rollout status -n felhom-system deploy/hub

# Check
kubectl logs -n felhom-system -l app=hub --tail 20
```

## Dependencies

- `golang.org/x/crypto`: bcrypt for password hashing
- `gopkg.in/yaml.v3`: YAML config parsing
- `modernc.org/sqlite`: Pure Go SQLite (no CGo)