diff --git a/hub/README.md b/hub/README.md
new file mode 100644
index 0000000..bf7051c
--- /dev/null
+++ b/hub/README.md
@@ -0,0 +1,164 @@
+# felhom-hub
+
+**Central operator dashboard for monitoring and managing Felhom customer deployments.**
+
+A lightweight Go service that receives periodic reports from felhom-controller instances, stores them in SQLite, and provides a web dashboard for fleet monitoring. Also serves as the infrastructure backup store for disaster recovery.
+
+**Current version: v0.1.6**
+
+---
+
+## Architecture
+
+```
+  Customer nodes                         Central Hub (k3s)
+┌─────────────────┐                    ┌────────────────────────┐
+│ felhom-controller│──── JSON push ────▶│  felhom-hub            │
+│ (every 15 min)  │    (Bearer auth)   │                        │
+│                 │                    │  ┌─────────────────┐   │
+│ POST /api/v1/   │                    │  │ API Handler     │   │
+│   report        │                    │  │ (ingest reports, │  │
+│   infra-backup  │                    │  │  infra backups) │   │
+│   notify        │                    │  └────────┬────────┘   │
+│                 │                    │           │            │
+└─────────────────┘                    │  ┌────────▼────────┐   │
+                                       │  │ SQLite Store    │   │
+  Operator browser                     │  │ (reports,       │   │
+┌─────────────────┐                    │  │  infra_backups, │   │
+│ Web Dashboard   │◀── HTML pages ──────│  │  notifications) │  │
+│ (hub.felhom.eu) │    (bcrypt auth)   │  └─────────────────┘   │
+└─────────────────┘                    │                        │
+                                       │  ┌─────────────────┐   │
+                                       │  │ Web Dashboard   │   │
+                                       │  │ (multi-customer │   │
+                                       │  │  overview)      │   │
+                                       │  └─────────────────┘   │
+                                       └────────────────────────┘
+```
+
+## API Endpoints
+
+All API endpoints require `Authorization: Bearer <token>` (except `/healthz`).
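The bearer check that the API endpoints rely on can be sketched as follows. This mirrors the `strings.HasPrefix` / `strings.TrimPrefix` test used by the handlers in this change, including the behaviour that an empty key disables the check. The function name `bearerOK` and the sample tokens are hypothetical, and the constant-time comparison via `crypto/subtle` is a swapped-in hardening, not what the handlers currently do.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// bearerOK reports whether an Authorization header value carries the
// expected token. An empty apiKey disables the check, as in the handlers.
// (Hypothetical helper; the handlers inline this logic.)
func bearerOK(header, apiKey string) bool {
	if apiKey == "" {
		return true
	}
	if !strings.HasPrefix(header, "Bearer ") {
		return false
	}
	token := strings.TrimPrefix(header, "Bearer ")
	// Constant-time comparison avoids leaking how many leading bytes matched.
	return subtle.ConstantTimeCompare([]byte(token), []byte(apiKey)) == 1
}

func main() {
	fmt.Println(bearerOK("Bearer s3cret", "s3cret")) // true
	fmt.Println(bearerOK("s3cret", "s3cret"))        // false: missing "Bearer " prefix
	fmt.Println(bearerOK("Bearer wrong", "s3cret"))  // false: token mismatch
}
```

Factoring the check into one helper would also remove the duplicated block in `handleInfraBackupPush` and `handleInfraBackupGet`, though whether that fits the existing handler layout is a judgment for the author.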
+
+### Report Ingest
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `POST` | `/api/v1/report` | Controller pushes periodic status report |
+| `GET` | `/api/v1/customers` | List all customers with latest report summary |
+| `GET` | `/api/v1/customers/{id}` | Get latest full report for a customer |
+| `GET` | `/api/v1/customers/{id}/history?period=7d` | Get report history |
+
+### Infrastructure Backup (Disaster Recovery)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `POST` | `/api/v1/infra-backup` | Controller pushes infrastructure snapshot |
+| `GET` | `/api/v1/infra-backup/{customer_id}` | Fresh controller pulls backup for restore |
+
+The infra-backup payload contains everything needed to restore a customer deployment:
+- `controller.yaml` (base64, full config including secrets)
+- `settings.json` (base64, backup preferences, storage paths)
+- Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
+- Deployed stacks manifest (app names, HDD paths, display names)
+- Restic passwords (primary + cross-drive, for encrypted backup access)
+
+**Disaster recovery flow:**
+1. Customer's system drive fails → replaced with fresh Debian install
+2. `docker-setup.sh` deploys controller with Hub details (customer_id + API key)
+3. Controller detects fresh deployment → calls `GET /api/v1/infra-backup/{customer_id}`
+4. Controller uses disk UUIDs to auto-mount surviving drives
+5. Controller restores apps from local backups on those drives
+
+### Notifications
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `POST` | `/api/v1/notify` | Controller sends event notification (backup_failed, disk_warning, etc.) |
+| `POST` | `/api/v1/preferences` | Controller syncs customer notification preferences |
+
+Notifications are sent via Resend.com email API.
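Step 3 of the disaster recovery flow above (a fresh controller pulling its backup) could look roughly like this on the client side. The sketch only builds the request and does not send it. The helper name `newInfraBackupRequest`, the customer ID `customer-42`, and the API key value are hypothetical; the path and the `Authorization: Bearer` header follow the endpoint table and handler code in this change.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newInfraBackupRequest builds the GET request a freshly deployed controller
// would send to fetch its stored infrastructure backup.
// (Hypothetical helper; the controller's real client code is not part of this diff.)
func newInfraBackupRequest(baseURL, customerID, apiKey string) (*http.Request, error) {
	endpoint := strings.TrimRight(baseURL, "/") +
		"/api/v1/infra-backup/" + url.PathEscape(customerID)
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	// Placeholder customer ID and key, for illustration only.
	req, err := newInfraBackupRequest("https://hub.felhom.eu", "customer-42", "example-key")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Judging by the handler in this change, the caller should expect a 404 when no backup has been stored yet for that customer and a 401 when the key does not match.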
+
+### Health
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `GET` | `/healthz` | Health check (no auth required) |
+
+## Web Dashboard
+
+Protected by bcrypt password + session cookie (7-day expiry).
+
+- **Customer overview table:** status indicators (OK/WARN/DOWN), CPU/memory %, disk usage, container counts, backup age, controller version
+- **Customer detail page:** system info, storage bars, container table, notification preferences, notification log, 24h history graphs
+- **Auto-refresh:** 60-second cycle
+- **Status logic:**
+  - Green: report < 30 min old, health = ok
+  - Yellow: 30-60 min stale or health = warn
+  - Red: > 60 min stale or health = fail
+
+## Data Storage
+
+SQLite with WAL mode. Tables:
+
+| Table | Purpose |
+|-------|---------|
+| `reports` | Full JSON reports with denormalized fields for dashboard queries |
+| `infra_backups` | Per-customer infrastructure snapshots for disaster recovery |
+| `customer_notifications` | Email + enabled event types per customer |
+| `notification_log` | Send/skip/fail history for notifications |
+
+Retention: configurable (default 90 days), daily prune at 04:30 Budapest time.
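The dashboard status logic listed under Web Dashboard can be read as a small classification function. The sketch below is one interpretation, not the hub's actual code: the name `statusFor`, the colour strings, the treatment of exactly 30 and 60 minutes as yellow, and the fallback to green for unknown health values are all assumptions.

```go
package main

import (
	"fmt"
	"time"
)

const (
	staleAfter = 30 * time.Minute // README: yellow once the report is 30 min old
	downAfter  = 60 * time.Minute // README: red beyond 60 min
)

// statusFor maps the age of the latest report and its reported health to a
// dashboard colour. Red conditions win over yellow, yellow over green.
func statusFor(age time.Duration, health string) string {
	switch {
	case age > downAfter || health == "fail":
		return "red"
	case age >= staleAfter || health == "warn":
		return "yellow"
	default:
		return "green"
	}
}

func main() {
	fmt.Println(statusFor(5*time.Minute, "ok"))   // green
	fmt.Println(statusFor(45*time.Minute, "ok"))  // yellow
	fmt.Println(statusFor(5*time.Minute, "fail")) // red
}
```

Checking the red conditions first means a fresh report with `health = fail` still shows red, which matches the "or" wording of the README bullets.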
+
+## Configuration
+
+```yaml
+# hub.yaml
+auth:
+  password_hash: ""      # bcrypt hash for dashboard login (empty = no auth)
+
+api:
+  report_api_key: ""     # Bearer token for API auth
+
+notifications:
+  resend_api_key: ""     # Resend.com API key for email
+  from_email: "monitoring@felhom.eu"
+
+retention:
+  max_days: 90
+  prune_schedule: "04:30"
+
+alerting:
+  stale_threshold: "30m" # Customer considered stale after this duration
+
+server:
+  listen: ":8080"
+  data_dir: "/data"      # SQLite database location
+```
+
+## Deployment
+
+Runs on k3s (Kubernetes) in the `felhom-system` namespace:
+- **PVC:** 1GB Longhorn volume for SQLite database
+- **Resources:** 64Mi-256Mi memory, 50m-500m CPU
+- **Ingress:** `hub.felhom.eu` with TLS (cert-manager)
+- **Geo-restriction:** Hungary only (nginx annotation)
+
+```bash
+# Build and push
+cd hub/
+make VERSION=0.2.0 docker docker-push
+
+# Deploy
+kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:v0.2.0
+kubectl rollout status -n felhom-system deploy/hub
+
+# Check
+kubectl logs -n felhom-system -l app=hub --tail 20
+```
+
+## Dependencies
+
+- `golang.org/x/crypto` — bcrypt for password hashing
+- `gopkg.in/yaml.v3` — YAML config parsing
+- `modernc.org/sqlite` — Pure Go SQLite (no CGo)
diff --git a/hub/internal/api/handler.go b/hub/internal/api/handler.go
index 5508fa6..6db205f 100644
--- a/hub/internal/api/handler.go
+++ b/hub/internal/api/handler.go
@@ -44,6 +44,10 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 		h.handleReport(w, r)
 	case r.Method == http.MethodPost && path == "/notify":
 		h.handleNotify(w, r)
+	case r.Method == http.MethodPost && path == "/infra-backup":
+		h.handleInfraBackupPush(w, r)
+	case r.Method == http.MethodGet && strings.HasPrefix(path, "/infra-backup/"):
+		h.handleInfraBackupGet(w, r, strings.TrimPrefix(path, "/infra-backup/"))
 	case r.Method == http.MethodPost && path == "/preferences":
 		h.handleSavePreferences(w, r)
 	case r.Method == http.MethodGet && path == "/customers":
@@ -322,6 +326,71 @@ func (h *Handler) handleSavePreferences(w http.ResponseWriter, r *http.Request)
 	w.Write([]byte(`{"status":"ok"}`))
 }
 
+// handleInfraBackupPush stores an infrastructure snapshot from a controller.
+func (h *Handler) handleInfraBackupPush(w http.ResponseWriter, r *http.Request) {
+	if h.apiKey != "" {
+		auth := r.Header.Get("Authorization")
+		if !strings.HasPrefix(auth, "Bearer ") || strings.TrimPrefix(auth, "Bearer ") != h.apiKey {
+			http.Error(w, "Unauthorized", http.StatusUnauthorized)
+			return
+		}
+	}
+
+	body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // 1MB limit
+	if err != nil {
+		http.Error(w, "Bad request", http.StatusBadRequest)
+		return
+	}
+
+	var payload struct {
+		CustomerID string `json:"customer_id"`
+	}
+	if err := json.Unmarshal(body, &payload); err != nil || payload.CustomerID == "" {
+		http.Error(w, "Invalid payload: customer_id required", http.StatusBadRequest)
+		return
+	}
+
+	if err := h.store.SaveInfraBackup(payload.CustomerID, body); err != nil {
+		h.logger.Printf("[ERROR] Failed to save infra backup for %s: %v", payload.CustomerID, err)
+		http.Error(w, "Internal error", http.StatusInternalServerError)
+		return
+	}
+
+	h.logger.Printf("[INFO] Infra backup saved for %s (%d bytes)", payload.CustomerID, len(body))
+	w.WriteHeader(http.StatusOK)
+	w.Write([]byte(`{"status":"ok"}`))
+}
+
+// handleInfraBackupGet returns the infrastructure backup for a customer.
+func (h *Handler) handleInfraBackupGet(w http.ResponseWriter, r *http.Request, customerID string) {
+	if h.apiKey != "" {
+		auth := r.Header.Get("Authorization")
+		if !strings.HasPrefix(auth, "Bearer ") || strings.TrimPrefix(auth, "Bearer ") != h.apiKey {
+			http.Error(w, "Unauthorized", http.StatusUnauthorized)
+			return
+		}
+	}
+
+	if customerID == "" {
+		http.Error(w, "Missing customer_id", http.StatusBadRequest)
+		return
+	}
+
+	data, err := h.store.GetInfraBackup(customerID)
+	if err != nil {
+		h.logger.Printf("[ERROR] Failed to get infra backup for %s: %v", customerID, err)
+		http.Error(w, "Internal error", http.StatusInternalServerError)
+		return
+	}
+	if data == nil {
+		http.NotFound(w, r)
+		return
+	}
+
+	w.Header().Set("Content-Type", "application/json")
+	w.Write(data)
+}
+
 // sendResendEmail sends an email via the Resend HTTP API.
 func (h *Handler) sendResendEmail(to, subject, textBody string) error {
 	payload := map[string]interface{}{
diff --git a/hub/internal/store/store.go b/hub/internal/store/store.go
index bfc71f5..0d2f9ff 100644
--- a/hub/internal/store/store.go
+++ b/hub/internal/store/store.go
@@ -91,6 +91,12 @@ func (s *Store) migrate() error {
 
 	CREATE INDEX IF NOT EXISTS idx_notification_log_customer
 		ON notification_log(customer_id, created_at DESC);
+
+	CREATE TABLE IF NOT EXISTS infra_backups (
+		customer_id TEXT PRIMARY KEY,
+		backup_json TEXT NOT NULL,
+		updated_at DATETIME NOT NULL DEFAULT (datetime('now'))
+	);
 	`)
 	return err
 }
@@ -380,6 +386,75 @@ func (s *Store) GetCustomerHistory(customerID string, since time.Duration) ([]Cu
 	return history, rows.Err()
 }
 
+// SaveInfraBackup upserts the infrastructure backup for a customer.
+func (s *Store) SaveInfraBackup(customerID string, backupJSON []byte) error {
+	_, err := s.db.Exec(`
+		INSERT INTO infra_backups (customer_id, backup_json, updated_at)
+		VALUES (?, ?, datetime('now'))
+		ON CONFLICT(customer_id) DO UPDATE SET
+			backup_json = excluded.backup_json,
+			updated_at = datetime('now')`,
+		customerID, string(backupJSON),
+	)
+	return err
+}
+
+// GetInfraBackup returns the raw infra backup JSON for a customer, or nil if not found.
+func (s *Store) GetInfraBackup(customerID string) ([]byte, error) {
+	var data string
+	err := s.db.QueryRow(
+		"SELECT backup_json FROM infra_backups WHERE customer_id = ?",
+		customerID,
+	).Scan(&data)
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, err
+	}
+	return []byte(data), nil
+}
+
+// InfraBackupMeta holds summary info for the dashboard (avoids parsing full JSON).
+type InfraBackupMeta struct {
+	UpdatedAt  time.Time
+	StackCount int
+	DiskCount  int
+}
+
+// GetInfraBackupMeta returns summary metadata for a customer's infra backup.
+func (s *Store) GetInfraBackupMeta(customerID string) (*InfraBackupMeta, error) {
+	var backupJSON, updatedAt string
+	err := s.db.QueryRow(
+		"SELECT backup_json, updated_at FROM infra_backups WHERE customer_id = ?",
+		customerID,
+	).Scan(&backupJSON, &updatedAt)
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, err
+	}
+
+	meta := &InfraBackupMeta{
+		UpdatedAt: parseSQLiteTime(updatedAt),
+	}
+
+	// Parse just the fields we need
+	var parsed struct {
+		DeployedStacks []json.RawMessage `json:"deployed_stacks"`
+		DiskLayout     struct {
+			Mounts []json.RawMessage `json:"mounts"`
+		} `json:"disk_layout"`
+	}
+	if json.Unmarshal([]byte(backupJSON), &parsed) == nil {
+		meta.StackCount = len(parsed.DeployedStacks)
+		meta.DiskCount = len(parsed.DiskLayout.Mounts)
+	}
+
+	return meta, nil
+}
+
 // Prune deletes reports older than the given number of days.
 func (s *Store) Prune(maxDays int) (int64, error) {
 	cutoff := time.Now().AddDate(0, 0, -maxDays).Format("2006-01-02 15:04:05")
diff --git a/hub/internal/web/server.go b/hub/internal/web/server.go
index bf50c68..0d20fb1 100644
--- a/hub/internal/web/server.go
+++ b/hub/internal/web/server.go
@@ -191,6 +191,9 @@ func (s *Server) handleCustomerDetail(w http.ResponseWriter, r *http.Request, cu
 	notifPrefs, _ := s.store.GetNotificationPrefs(customerID)
 	recentNotifs, _ := s.store.GetRecentNotifications(customerID, 10)
 
+	// Get infra backup metadata
+	infraMeta, _ := s.store.GetInfraBackupMeta(customerID)
+
 	type detailData struct {
 		Customer            *store.CustomerSummary
 		Report              map[string]interface{}
@@ -198,6 +201,8 @@ func (s *Server) handleCustomerDetail(w http.ResponseWriter, r *http.Request, cu
 		OverallStatus       string
 		NotifPrefs          *store.NotificationPrefs
 		RecentNotifications []store.NotificationLogEntry
+		InfraBackup         *store.InfraBackupMeta
+		InfraBackupAge      string
 	}
 
 	overallStatus := "ok"
@@ -211,6 +216,11 @@ func (s *Server) handleCustomerDetail(w http.ResponseWriter, r *http.Request, cu
 		overallStatus = "down"
 	}
 
+	var infraBackupAge string
+	if infraMeta != nil {
+		infraBackupAge = timeAgo(infraMeta.UpdatedAt)
+	}
+
 	data := detailData{
 		Customer:            customer,
 		Report:              report,
@@ -218,6 +228,8 @@ func (s *Server) handleCustomerDetail(w http.ResponseWriter, r *http.Request, cu
 		OverallStatus:       overallStatus,
 		NotifPrefs:          notifPrefs,
 		RecentNotifications: recentNotifs,
+		InfraBackup:         infraMeta,
+		InfraBackupAge:      infraBackupAge,
 	}
 
 	w.Header().Set("Content-Type", "text/html; charset=utf-8")
diff --git a/hub/internal/web/templates/customer.html b/hub/internal/web/templates/customer.html
index 4f66a91..14ba483 100644
--- a/hub/internal/web/templates/customer.html
+++ b/hub/internal/web/templates/customer.html
@@ -127,6 +127,29 @@
 {{end}}
+
+
+

Infra Backup

+ {{if .InfraBackup}} +
+
+ Last Updated + {{.InfraBackupAge}} +
+
+ Deployed Stacks + {{.InfraBackup.StackCount}} +
+
+ Disks + {{.InfraBackup.DiskCount}} +
+
+ {{else}} +

No infra backup received yet

+ {{end}} +
+

Health
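The two counts rendered in the Infra Backup card come from `GetInfraBackupMeta`, which decodes only `deployed_stacks` and `disk_layout.mounts` and leaves each element as raw JSON. A standalone sketch of that partial decode is shown here. The function name `countStacksAndDisks` and the sample payload (including the inner `name` and `uuid` fields) are hypothetical; only the two top-level field names are taken from the store code in this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// countStacksAndDisks decodes just enough of an infra-backup payload to count
// deployed stacks and disk mounts. json.RawMessage keeps each element
// unparsed, so the element schema does not need to be known here.
func countStacksAndDisks(backupJSON []byte) (int, int, error) {
	var parsed struct {
		DeployedStacks []json.RawMessage `json:"deployed_stacks"`
		DiskLayout     struct {
			Mounts []json.RawMessage `json:"mounts"`
		} `json:"disk_layout"`
	}
	if err := json.Unmarshal(backupJSON, &parsed); err != nil {
		return 0, 0, err
	}
	return len(parsed.DeployedStacks), len(parsed.DiskLayout.Mounts), nil
}

func main() {
	// Made-up payload for illustration; the real schema may differ.
	sample := []byte(`{
		"customer_id": "customer-42",
		"deployed_stacks": [{"name": "app-a"}, {"name": "app-b"}],
		"disk_layout": {"mounts": [{"uuid": "0000-example"}]}
	}`)
	stacks, disks, err := countStacksAndDisks(sample)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(stacks, disks) // 2 1
}
```

Unlike this sketch, the store method ignores a decode error and leaves both counts at zero, so a malformed stored payload would show as "0 stacks, 0 disks" on the dashboard rather than as an error.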