
# felhom-hub

**Central operator dashboard for monitoring and managing Felhom customer deployments.**

A lightweight Go service that receives periodic reports from felhom-controller instances, stores them in SQLite, and serves a web dashboard for fleet monitoring. It also acts as the infrastructure backup store used for disaster recovery.

**Current version: v0.1.6**

---

## Architecture

```
   Customer nodes                          Central Hub (k3s)
┌──────────────────┐                    ┌────────────────────────┐
│ felhom-controller│──── JSON push ────▶│  felhom-hub            │
│ (every 15 min)   │    (Bearer auth)   │                        │
│                  │                    │  ┌──────────────────┐  │
│ POST /api/v1/    │                    │  │ API Handler      │  │
│   report         │                    │  │ (ingest reports, │  │
│   infra-backup   │                    │  │  infra backups)  │  │
│   notify         │                    │  └────────┬─────────┘  │
│   preferences    │                    │           │            │
│                  │                    │  ┌────────▼─────────┐  │
└──────────────────┘                    │  │ SQLite Store     │  │
                                        │  │ (reports,        │  │
                                        │  │  infra_backups,  │  │
                                        │  │  notifications)  │  │
   Operator browser                     │  └──────────────────┘  │
┌──────────────────┐                    │                        │
│ Web Dashboard    │                    │  ┌──────────────────┐  │
│ (hub.felhom.eu)  │◀── HTML pages ─────│  │ Web Dashboard    │  │
│                  │   (bcrypt auth)    │  │ (multi-customer  │  │
└──────────────────┘                    │  │  overview)       │  │
                                        │  └──────────────────┘  │
                                        └────────────────────────┘
```

## API Endpoints

All `/api/v1/*` endpoints require an `Authorization: Bearer <report_api_key>` header. The health check at `/healthz` is exempt.
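
A minimal sketch of this check as Go middleware, using only the standard library. The names `requireBearer`, `newMux` and `statusFor` are hypothetical, and rejecting every request when `report_api_key` is empty is an assumption of this sketch (the config only documents "empty = no auth" for the dashboard password). `crypto/subtle.ConstantTimeCompare` keeps the comparison time independent of how much of a guessed key is correct.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireBearer (hypothetical name) wraps an API handler and rejects any
// request whose Authorization header does not carry the configured key.
func requireBearer(apiKey string, next http.Handler) http.Handler {
	const prefix = "Bearer "
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := r.Header.Get("Authorization")
		// Assumption: an empty configured key rejects everything rather than
		// disabling auth. ConstantTimeCompare avoids leaking, via response
		// timing, how much of a guessed key was correct.
		if apiKey == "" || !strings.HasPrefix(h, prefix) ||
			subtle.ConstantTimeCompare([]byte(h[len(prefix):]), []byte(apiKey)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newMux wires /healthz outside the middleware and everything under
// /api/v1/ behind it. The handler is a placeholder that answers 200.
func newMux(apiKey string) *http.ServeMux {
	okHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux := http.NewServeMux()
	mux.Handle("/healthz", okHandler)
	mux.Handle("/api/v1/", requireBearer(apiKey, okHandler))
	return mux
}

// statusFor issues a GET against h and returns the response status code.
func statusFor(h http.Handler, path, auth string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if auth != "" {
		req.Header.Set("Authorization", auth)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newMux("secret")
	fmt.Println(statusFor(mux, "/healthz", ""))                       // 200
	fmt.Println(statusFor(mux, "/api/v1/customers", ""))              // 401
	fmt.Println(statusFor(mux, "/api/v1/customers", "Bearer wrong"))  // 401
	fmt.Println(statusFor(mux, "/api/v1/customers", "Bearer secret")) // 200
}
```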

### Report Ingest

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/report` | Controller pushes periodic status report |
| `GET` | `/api/v1/customers` | List all customers with latest report summary |
| `GET` | `/api/v1/customers/{id}` | Get latest full report for a customer |
| `GET` | `/api/v1/customers/{id}/history?period=7d` | Get report history |

### Infrastructure Backup (Disaster Recovery)

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/infra-backup` | Controller pushes infrastructure snapshot |
| `GET` | `/api/v1/infra-backup/{customer_id}` | Freshly deployed controller pulls the backup for restore |

The infra-backup payload contains everything needed to restore a customer deployment:
- `controller.yaml` (base64, full config including secrets)
- `settings.json` (base64, backup preferences, storage paths)
- Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
- Deployed stacks manifest (app names, HDD paths, display names)
- Restic passwords (primary + cross-drive, for encrypted backup access)

**Disaster recovery flow:**
1. Customer's system drive fails → replaced with a fresh Debian install
2. `docker-setup.sh` deploys the controller with Hub details (customer_id + API key)
3. Controller detects a fresh deployment → calls `GET /api/v1/infra-backup/{customer_id}`
4. Controller uses the disk UUIDs to auto-mount the surviving drives
5. Controller restores apps from the local backups on those drives

### Notifications

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/notify` | Controller sends event notification (backup_failed, disk_warning, etc.) |
| `POST` | `/api/v1/preferences` | Controller syncs customer notification preferences |

Notifications are sent via the Resend.com email API.

### Health

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/healthz` | Health check (no auth required) |

## Web Dashboard

Protected by a bcrypt-hashed password and a session cookie (7-day expiry). If `auth.password_hash` is empty, the dashboard has no login.

- **Customer overview table:** status indicators (OK/WARN/DOWN), CPU/memory %, disk usage, container counts, backup age, controller version
- **Customer detail page:** system info, storage bars, container table, notification preferences, notification log, 24h history graphs
- **Auto-refresh:** 60-second cycle
- **Status logic:**
  - Green: report < 30 min old, health = ok
  - Yellow: 30-60 min stale or health = warn
  - Red: > 60 min stale or health = fail

## Data Storage

SQLite with WAL mode. Tables:

| Table | Purpose |
|-------|---------|
| `reports` | Full JSON reports with denormalized fields for dashboard queries |
| `infra_backups` | Per-customer infrastructure snapshots for disaster recovery |
| `customer_notifications` | Email + enabled event types per customer |
| `notification_log` | Send/skip/fail history for notifications |

Retention: configurable via `retention.max_days` (default 90 days); a daily prune runs at 04:30 Budapest time (`retention.prune_schedule`).

## Configuration

```yaml
# hub.yaml
auth:
  password_hash: ""           # bcrypt hash for dashboard login (empty = no auth)

api:
  report_api_key: ""          # Bearer token for API auth

notifications:
  resend_api_key: ""          # Resend.com API key for email
  from_email: "monitoring@felhom.eu"

retention:
  max_days: 90
  prune_schedule: "04:30"

alerting:
  stale_threshold: "30m"      # Customer considered stale after this duration

server:
  listen: ":8080"
  data_dir: "/data"           # SQLite database location
```

## Deployment

Runs on k3s (Kubernetes) in the `felhom-system` namespace:
- **PVC:** 1GB Longhorn volume for SQLite database
- **Resources:** 64Mi-256Mi memory, 50m-500m CPU
- **Ingress:** `hub.felhom.eu` with TLS (cert-manager)
- **Geo-restriction:** Hungary only (nginx annotation)

```bash
# Build and push (substitute the version being released)
cd hub/
make VERSION=0.2.0 docker docker-push

# Deploy
kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:v0.2.0
kubectl rollout status -n felhom-system deploy/hub

# Check
kubectl logs -n felhom-system -l app=hub --tail 20
```

## Dependencies

- `golang.org/x/crypto` — bcrypt for password hashing
- `gopkg.in/yaml.v3` — YAML config parsing
- `modernc.org/sqlite` — Pure Go SQLite (no CGo)