hub v0.1.7: Infrastructure backup endpoints for disaster recovery

Add infra-backup push/pull API for controller DR:
- POST /api/v1/infra-backup — controller pushes infrastructure snapshot
- GET /api/v1/infra-backup/{customer_id} — fresh controller pulls backup
- infra_backups SQLite table with per-customer snapshots
- Customer detail page shows infra backup status card
- README.md with full API docs and DR flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -0,0 +1,164 @@
# felhom-hub

**Central operator dashboard for monitoring and managing Felhom customer deployments.**

A lightweight Go service that receives periodic reports from felhom-controller instances, stores them in SQLite, and provides a web dashboard for fleet monitoring. Also serves as the infrastructure backup store for disaster recovery.

**Current version: v0.1.6**

---

## Architecture

```
Customer nodes                          Central Hub (k3s)
┌─────────────────┐                     ┌────────────────────────┐
│ felhom-controller│──── JSON push ────▶│  felhom-hub            │
│ (every 15 min)  │    (Bearer auth)    │                        │
│                 │                     │  ┌─────────────────┐   │
│ POST /api/v1/   │                     │  │ API Handler     │   │
│   report        │                     │  │ (ingest reports,│   │
│   infra-backup  │                     │  │  infra backups) │   │
│   notify        │                     │  └────────┬────────┘   │
│                 │                     │           │            │
└─────────────────┘                     │  ┌────────▼────────┐   │
                                        │  │ SQLite Store    │   │
Operator browser                        │  │ (reports,       │   │
┌─────────────────┐                     │  │  infra_backups, │   │
│ Web Dashboard   │◀── HTML pages ──────│  │  notifications) │   │
│ (hub.felhom.eu) │    (bcrypt auth)    │  └─────────────────┘   │
└─────────────────┘                     │                        │
                                        │  ┌─────────────────┐   │
                                        │  │ Web Dashboard   │   │
                                        │  │ (multi-customer │   │
                                        │  │  overview)      │   │
                                        │  └─────────────────┘   │
                                        └────────────────────────┘
```

## API Endpoints

All API endpoints require `Authorization: Bearer <report_api_key>` (except `/healthz`).

### Report Ingest

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/report` | Controller pushes periodic status report |
| `GET` | `/api/v1/customers` | List all customers with latest report summary |
| `GET` | `/api/v1/customers/{id}` | Get latest full report for a customer |
| `GET` | `/api/v1/customers/{id}/history?period=7d` | Get report history |

### Infrastructure Backup (Disaster Recovery)

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/infra-backup` | Controller pushes infrastructure snapshot |
| `GET` | `/api/v1/infra-backup/{customer_id}` | Fresh controller pulls backup for restore |

The infra-backup payload contains everything needed to restore a customer deployment:

- `controller.yaml` (base64, full config including secrets)
- `settings.json` (base64, backup preferences, storage paths)
- Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
- Deployed stacks manifest (app names, HDD paths, display names)
- Restic passwords (primary + cross-drive, for encrypted backup access)
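
As a rough illustration of the push side, the Go sketch below posts a minimal snapshot to `POST /api/v1/infra-backup` with the Bearer header. Only `customer_id`, `deployed_stacks` and `disk_layout.mounts` are JSON keys that appear in the hub code of this commit. The `infraBackup` type, `pushInfraBackup`, and `stubHub` (a stand-in server so the example runs offline) are hypothetical names, and the real controller payload carries more fields whose keys are not shown here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// infraBackup models only the JSON keys visible in the hub code of this commit.
// The full controller payload is assumed to contain additional fields.
type infraBackup struct {
	CustomerID     string            `json:"customer_id"`
	DeployedStacks []json.RawMessage `json:"deployed_stacks"`
	DiskLayout     struct {
		Mounts []json.RawMessage `json:"mounts"`
	} `json:"disk_layout"`
}

// pushInfraBackup (hypothetical helper) POSTs the snapshot and returns the HTTP status code.
func pushInfraBackup(client *http.Client, hubURL, apiKey string, b infraBackup) (int, error) {
	body, err := json.Marshal(b)
	if err != nil {
		return 0, err
	}
	req, err := http.NewRequest(http.MethodPost, hubURL+"/api/v1/infra-backup", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	return resp.StatusCode, nil
}

// stubHub imitates the documented behaviour (Bearer check, customer_id required)
// so the sketch can run without a real hub.
func stubHub(apiKey string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+apiKey {
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		var p struct {
			CustomerID string `json:"customer_id"`
		}
		if json.NewDecoder(r.Body).Decode(&p) != nil || p.CustomerID == "" {
			http.Error(w, "Invalid payload: customer_id required", http.StatusBadRequest)
			return
		}
		w.Write([]byte(`{"status":"ok"}`))
	}))
}

func main() {
	srv := stubHub("secret")
	defer srv.Close()
	status, err := pushInfraBackup(srv.Client(), srv.URL, "secret", infraBackup{CustomerID: "demo"})
	fmt.Println(status, err)
}
```

The push handler in this commit reads at most 1 MB of request body, so a controller would presumably want to keep the snapshot well under that size.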

**Disaster recovery flow:**
1. Customer's system drive fails → replaced with fresh Debian install
2. `docker-setup.sh` deploys controller with Hub details (customer_id + API key)
3. Controller detects fresh deployment → calls `GET /api/v1/infra-backup/{customer_id}`
4. Controller uses disk UUIDs to auto-mount surviving drives
5. Controller restores apps from local backups on those drives

### Notifications

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/notify` | Controller sends event notification (backup_failed, disk_warning, etc.) |
| `POST` | `/api/v1/preferences` | Controller syncs customer notification preferences |

Notifications are sent via Resend.com email API.

### Health

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/healthz` | Health check (no auth required) |

## Web Dashboard

Protected by bcrypt password + session cookie (7-day expiry).

- **Customer overview table:** status indicators (OK/WARN/DOWN), CPU/memory %, disk usage, container counts, backup age, controller version
- **Customer detail page:** system info, storage bars, container table, notification preferences, notification log, 24h history graphs
- **Auto-refresh:** 60-second cycle
- **Status logic:**
  - Green: report < 30 min old, health = ok
  - Yellow: 30-60 min stale or health = warn
  - Red: > 60 min stale or health = fail

## Data Storage

SQLite with WAL mode. Tables:

| Table | Purpose |
|-------|---------|
| `reports` | Full JSON reports with denormalized fields for dashboard queries |
| `infra_backups` | Per-customer infrastructure snapshots for disaster recovery |
| `customer_notifications` | Email + enabled event types per customer |
| `notification_log` | Send/skip/fail history for notifications |

Retention: configurable (default 90 days), daily prune at 04:30 Budapest time.

## Configuration

```yaml
# hub.yaml
auth:
  password_hash: ""  # bcrypt hash for dashboard login (empty = no auth)

api:
  report_api_key: ""  # Bearer token for API auth

notifications:
  resend_api_key: ""  # Resend.com API key for email
  from_email: "monitoring@felhom.eu"

retention:
  max_days: 90
  prune_schedule: "04:30"

alerting:
  stale_threshold: "30m"  # Customer considered stale after this duration

server:
  listen: ":8080"
  data_dir: "/data"  # SQLite database location
```

## Deployment

Runs on k3s (Kubernetes) in the `felhom-system` namespace:

- **PVC:** 1GB Longhorn volume for SQLite database
- **Resources:** 64Mi-256Mi memory, 50m-500m CPU
- **Ingress:** `hub.felhom.eu` with TLS (cert-manager)
- **Geo-restriction:** Hungary only (nginx annotation)

```bash
# Build and push
cd hub/
make VERSION=0.2.0 docker docker-push

# Deploy
kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:v0.2.0
kubectl rollout status -n felhom-system deploy/hub

# Check
kubectl logs -n felhom-system -l app=hub --tail 20
```

## Dependencies

- `golang.org/x/crypto` — bcrypt for password hashing
- `gopkg.in/yaml.v3` — YAML config parsing
- `modernc.org/sqlite` — Pure Go SQLite (no CGo)
@@ -44,6 +44,10 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 		h.handleReport(w, r)
 	case r.Method == http.MethodPost && path == "/notify":
 		h.handleNotify(w, r)
+	case r.Method == http.MethodPost && path == "/infra-backup":
+		h.handleInfraBackupPush(w, r)
+	case r.Method == http.MethodGet && strings.HasPrefix(path, "/infra-backup/"):
+		h.handleInfraBackupGet(w, r, strings.TrimPrefix(path, "/infra-backup/"))
 	case r.Method == http.MethodPost && path == "/preferences":
 		h.handleSavePreferences(w, r)
 	case r.Method == http.MethodGet && path == "/customers":
@@ -322,6 +326,71 @@ func (h *Handler) handleSavePreferences(w http.ResponseWriter, r *http.Request)
 	w.Write([]byte(`{"status":"ok"}`))
 }
 
+// handleInfraBackupPush stores an infrastructure snapshot from a controller.
+func (h *Handler) handleInfraBackupPush(w http.ResponseWriter, r *http.Request) {
+	if h.apiKey != "" {
+		auth := r.Header.Get("Authorization")
+		if !strings.HasPrefix(auth, "Bearer ") || strings.TrimPrefix(auth, "Bearer ") != h.apiKey {
+			http.Error(w, "Unauthorized", http.StatusUnauthorized)
+			return
+		}
+	}
+
+	body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // 1MB limit
+	if err != nil {
+		http.Error(w, "Bad request", http.StatusBadRequest)
+		return
+	}
+
+	var payload struct {
+		CustomerID string `json:"customer_id"`
+	}
+	if err := json.Unmarshal(body, &payload); err != nil || payload.CustomerID == "" {
+		http.Error(w, "Invalid payload: customer_id required", http.StatusBadRequest)
+		return
+	}
+
+	if err := h.store.SaveInfraBackup(payload.CustomerID, body); err != nil {
+		h.logger.Printf("[ERROR] Failed to save infra backup for %s: %v", payload.CustomerID, err)
+		http.Error(w, "Internal error", http.StatusInternalServerError)
+		return
+	}
+
+	h.logger.Printf("[INFO] Infra backup saved for %s (%d bytes)", payload.CustomerID, len(body))
+	w.WriteHeader(http.StatusOK)
+	w.Write([]byte(`{"status":"ok"}`))
+}
+
+// handleInfraBackupGet returns the infrastructure backup for a customer.
+func (h *Handler) handleInfraBackupGet(w http.ResponseWriter, r *http.Request, customerID string) {
+	if h.apiKey != "" {
+		auth := r.Header.Get("Authorization")
+		if !strings.HasPrefix(auth, "Bearer ") || strings.TrimPrefix(auth, "Bearer ") != h.apiKey {
+			http.Error(w, "Unauthorized", http.StatusUnauthorized)
+			return
+		}
+	}
+
+	if customerID == "" {
+		http.Error(w, "Missing customer_id", http.StatusBadRequest)
+		return
+	}
+
+	data, err := h.store.GetInfraBackup(customerID)
+	if err != nil {
+		h.logger.Printf("[ERROR] Failed to get infra backup for %s: %v", customerID, err)
+		http.Error(w, "Internal error", http.StatusInternalServerError)
+		return
+	}
+	if data == nil {
+		http.NotFound(w, r)
+		return
+	}
+
+	w.Header().Set("Content-Type", "application/json")
+	w.Write(data)
+}
+
 // sendResendEmail sends an email via the Resend HTTP API.
 func (h *Handler) sendResendEmail(to, subject, textBody string) error {
 	payload := map[string]interface{}{
@@ -91,6 +91,12 @@ func (s *Store) migrate() error {
 
 	CREATE INDEX IF NOT EXISTS idx_notification_log_customer
 		ON notification_log(customer_id, created_at DESC);
+
+	CREATE TABLE IF NOT EXISTS infra_backups (
+		customer_id TEXT PRIMARY KEY,
+		backup_json TEXT NOT NULL,
+		updated_at DATETIME NOT NULL DEFAULT (datetime('now'))
+	);
 	`)
 	return err
 }
@@ -380,6 +386,75 @@ func (s *Store) GetCustomerHistory(customerID string, since time.Duration) ([]Cu
 	return history, rows.Err()
 }
 
+// SaveInfraBackup upserts the infrastructure backup for a customer.
+func (s *Store) SaveInfraBackup(customerID string, backupJSON []byte) error {
+	_, err := s.db.Exec(`
+		INSERT INTO infra_backups (customer_id, backup_json, updated_at)
+		VALUES (?, ?, datetime('now'))
+		ON CONFLICT(customer_id) DO UPDATE SET
+			backup_json = excluded.backup_json,
+			updated_at = datetime('now')`,
+		customerID, string(backupJSON),
+	)
+	return err
+}
+
+// GetInfraBackup returns the raw infra backup JSON for a customer, or nil if not found.
+func (s *Store) GetInfraBackup(customerID string) ([]byte, error) {
+	var data string
+	err := s.db.QueryRow(
+		"SELECT backup_json FROM infra_backups WHERE customer_id = ?",
+		customerID,
+	).Scan(&data)
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, err
+	}
+	return []byte(data), nil
+}
+
+// InfraBackupMeta holds summary info for the dashboard (avoids parsing full JSON).
+type InfraBackupMeta struct {
+	UpdatedAt  time.Time
+	StackCount int
+	DiskCount  int
+}
+
+// GetInfraBackupMeta returns summary metadata for a customer's infra backup.
+func (s *Store) GetInfraBackupMeta(customerID string) (*InfraBackupMeta, error) {
+	var backupJSON, updatedAt string
+	err := s.db.QueryRow(
+		"SELECT backup_json, updated_at FROM infra_backups WHERE customer_id = ?",
+		customerID,
+	).Scan(&backupJSON, &updatedAt)
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, err
+	}
+
+	meta := &InfraBackupMeta{
+		UpdatedAt: parseSQLiteTime(updatedAt),
+	}
+
+	// Parse just the fields we need
+	var parsed struct {
+		DeployedStacks []json.RawMessage `json:"deployed_stacks"`
+		DiskLayout     struct {
+			Mounts []json.RawMessage `json:"mounts"`
+		} `json:"disk_layout"`
+	}
+	if json.Unmarshal([]byte(backupJSON), &parsed) == nil {
+		meta.StackCount = len(parsed.DeployedStacks)
+		meta.DiskCount = len(parsed.DiskLayout.Mounts)
+	}
+
+	return meta, nil
+}
+
 // Prune deletes reports older than the given number of days.
 func (s *Store) Prune(maxDays int) (int64, error) {
 	cutoff := time.Now().AddDate(0, 0, -maxDays).Format("2006-01-02 15:04:05")
@@ -191,6 +191,9 @@ func (s *Server) handleCustomerDetail(w http.ResponseWriter, r *http.Request, cu
 	notifPrefs, _ := s.store.GetNotificationPrefs(customerID)
 	recentNotifs, _ := s.store.GetRecentNotifications(customerID, 10)
 
+	// Get infra backup metadata
+	infraMeta, _ := s.store.GetInfraBackupMeta(customerID)
+
 	type detailData struct {
 		Customer            *store.CustomerSummary
 		Report              map[string]interface{}
@@ -198,6 +201,8 @@ func (s *Server) handleCustomerDetail(w http.ResponseWriter, r *http.Request, cu
 		OverallStatus       string
 		NotifPrefs          *store.NotificationPrefs
 		RecentNotifications []store.NotificationLogEntry
+		InfraBackup         *store.InfraBackupMeta
+		InfraBackupAge      string
 	}
 
 	overallStatus := "ok"
@@ -211,6 +216,11 @@ func (s *Server) handleCustomerDetail(w http.ResponseWriter, r *http.Request, cu
 		overallStatus = "down"
 	}
 
+	var infraBackupAge string
+	if infraMeta != nil {
+		infraBackupAge = timeAgo(infraMeta.UpdatedAt)
+	}
+
 	data := detailData{
 		Customer:            customer,
 		Report:              report,
@@ -218,6 +228,8 @@ func (s *Server) handleCustomerDetail(w http.ResponseWriter, r *http.Request, cu
 		OverallStatus:       overallStatus,
 		NotifPrefs:          notifPrefs,
 		RecentNotifications: recentNotifs,
+		InfraBackup:         infraMeta,
+		InfraBackupAge:      infraBackupAge,
 	}
 
 	w.Header().Set("Content-Type", "text/html; charset=utf-8")
@@ -127,6 +127,29 @@
         {{end}}
     </section>
 
+    <!-- Infra Backup (Disaster Recovery) -->
+    <section class="card">
+        <h2>Infra Backup</h2>
+        {{if .InfraBackup}}
+        <div class="info-grid">
+            <div class="info-item">
+                <span class="label">Last Updated</span>
+                <span class="value">{{.InfraBackupAge}}</span>
+            </div>
+            <div class="info-item">
+                <span class="label">Deployed Stacks</span>
+                <span class="value">{{.InfraBackup.StackCount}}</span>
+            </div>
+            <div class="info-item">
+                <span class="label">Disks</span>
+                <span class="value">{{.InfraBackup.DiskCount}}</span>
+            </div>
+        </div>
+        {{else}}
+        <p style="color: #facc15">No infra backup received yet</p>
+        {{end}}
+    </section>
+
     <!-- Health -->
     <section class="card">
         <h2>Health</h2>