admin 41e313bf36 hub v0.1.7: Infrastructure backup endpoints for disaster recovery
Add infra-backup push/pull API for controller DR:
- POST /api/v1/infra-backup — controller pushes infrastructure snapshot
- GET /api/v1/infra-backup/{customer_id} — fresh controller pulls backup
- infra_backups SQLite table with per-customer snapshots
- Customer detail page shows infra backup status card
- README.md with full API docs and DR flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:17:12 +01:00

felhom-hub

Central operator dashboard for monitoring and managing Felhom customer deployments.

A lightweight Go service that receives periodic reports from felhom-controller instances, stores them in SQLite, and provides a web dashboard for fleet monitoring. It also serves as the infrastructure backup store for disaster recovery.

Current version: v0.1.7


Architecture

   Customer nodes                           Central Hub (k3s)
┌──────────────────┐                    ┌─────────────────────────┐
│ felhom-controller│──── JSON push ────▶│  felhom-hub             │
│ (every 15 min)   │    (Bearer auth)   │                         │
│                  │                    │  ┌──────────────────┐   │
│ POST /api/v1/    │                    │  │ API Handler      │   │
│   report         │                    │  │ (ingest reports, │   │
│   infra-backup   │                    │  │  infra backups)  │   │
│   notify         │                    │  └─────────┬────────┘   │
│                  │                    │            │            │
└──────────────────┘                    │  ┌─────────▼────────┐   │
                                        │  │ SQLite Store     │   │
   Operator browser                     │  │ (reports,        │   │
┌──────────────────┐                    │  │  infra_backups,  │   │
│ Web Dashboard    │◀── HTML pages ─────│  │  notifications)  │   │
│ (hub.felhom.eu)  │    (bcrypt auth)   │  └──────────────────┘   │
└──────────────────┘                    │                         │
                                        │  ┌──────────────────┐   │
                                        │  │ Web Dashboard    │   │
                                        │  │ (multi-customer  │   │
                                        │  │  overview)       │   │
                                        │  └──────────────────┘   │
                                        └─────────────────────────┘

API Endpoints

All API endpoints require Authorization: Bearer <report_api_key> (except /healthz).

Report Ingest

| Method | Path | Description |
|--------|------|-------------|
| POST | /api/v1/report | Controller pushes periodic status report |
| GET | /api/v1/customers | List all customers with latest report summary |
| GET | /api/v1/customers/{id} | Get latest full report for a customer |
| GET | /api/v1/customers/{id}/history?period=7d | Get report history |

Infrastructure Backup (Disaster Recovery)

| Method | Path | Description |
|--------|------|-------------|
| POST | /api/v1/infra-backup | Controller pushes infrastructure snapshot |
| GET | /api/v1/infra-backup/{customer_id} | Fresh controller pulls backup for restore |

The infra-backup payload contains everything needed to restore a customer deployment:

  • controller.yaml (base64, full config including secrets)
  • settings.json (base64, backup preferences, storage paths)
  • Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
  • Deployed stacks manifest (app names, HDD paths, display names)
  • Restic passwords (primary + cross-drive, for encrypted backup access)

Disaster recovery flow:

  1. Customer's system drive fails → replaced with fresh Debian install
  2. docker-setup.sh deploys controller with Hub details (customer_id + API key)
  3. Controller detects fresh deployment → calls GET /api/v1/infra-backup/{customer_id}
  4. Controller uses disk UUIDs to auto-mount surviving drives
  5. Controller restores apps from local backups on those drives

Notifications

| Method | Path | Description |
|--------|------|-------------|
| POST | /api/v1/notify | Controller sends event notification (backup_failed, disk_warning, etc.) |
| POST | /api/v1/preferences | Controller syncs customer notification preferences |

Notifications are sent via Resend.com email API.

Health

| Method | Path | Description |
|--------|------|-------------|
| GET | /healthz | Health check (no auth required) |

Web Dashboard

Protected by a bcrypt-hashed password and a session cookie (7-day expiry).

  • Customer overview table: status indicators (OK/WARN/DOWN), CPU/memory %, disk usage, container counts, backup age, controller version
  • Customer detail page: system info, storage bars, container table, notification preferences, notification log, 24h history graphs
  • Auto-refresh: 60-second cycle
  • Status logic:
    • Green: report < 30 min old, health = ok
    • Yellow: 30-60 min stale or health = warn
    • Red: > 60 min stale or health = fail

Data Storage

SQLite with WAL mode. Tables:

| Table | Purpose |
|-------|---------|
| reports | Full JSON reports with denormalized fields for dashboard queries |
| infra_backups | Per-customer infrastructure snapshots for disaster recovery |
| customer_notifications | Email + enabled event types per customer |
| notification_log | Send/skip/fail history for notifications |

Retention: configurable (default 90 days), daily prune at 04:30 Budapest time.

Configuration

# hub.yaml
auth:
  password_hash: ""           # bcrypt hash for dashboard login (empty = no auth)

api:
  report_api_key: ""          # Bearer token for API auth

notifications:
  resend_api_key: ""          # Resend.com API key for email
  from_email: "monitoring@felhom.eu"

retention:
  max_days: 90
  prune_schedule: "04:30"

alerting:
  stale_threshold: "30m"      # Customer considered stale after this duration

server:
  listen: ":8080"
  data_dir: "/data"           # SQLite database location

Deployment

Runs on k3s (Kubernetes) in the felhom-system namespace:

  • PVC: 1GB Longhorn volume for SQLite database
  • Resources: 64Mi-256Mi memory, 50m-500m CPU
  • Ingress: hub.felhom.eu with TLS (cert-manager)
  • Geo-restriction: Hungary only (nginx annotation)

# Build and push
cd hub/
make VERSION=0.2.0 docker docker-push

# Deploy
kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:v0.2.0
kubectl rollout status -n felhom-system deploy/hub

# Check
kubectl logs -n felhom-system -l app=hub --tail 20

Dependencies

  • golang.org/x/crypto — bcrypt for password hashing
  • gopkg.in/yaml.v3 — YAML config parsing
  • modernc.org/sqlite — Pure Go SQLite (no CGo)