admin 94efc39c34 v0.6.0: Healthcheck Implementation + Central Push + Multi-Customer Dashboard

2026-02-16 13:05:01 +01:00

TASK.md — v0.6.0: Healthcheck Implementation + Central Push + Multi-Customer Dashboard

Version: v0.6.0
Depends on: v0.5.4 (current)
Repo: deploy-felhom-compose (controller/ subfolder)
Build: ~/build/felhom-controller/build.sh 0.6.0 --push
Deploy target: demo-felhom.eu (N100) + k3s cluster (dooplex.hu)

Context

The controller already has health monitoring infrastructure built in v0.4.0:

internal/monitor/pinger.go — Healthchecks.io-compatible HTTP ping client (success/fail/start, retries)
internal/monitor/healthcheck.go — System health checks (disk, memory, CPU, temp, Docker, protected containers)
Scheduler jobs in main.go: system-health (every 5m), db-dump (daily), backup (daily)
Backup manager already calls pinger.Ping()/pinger.Fail() after each operation

Problem: The demo-felhom Healthchecks project has zero checks created (screenshot confirms empty project at status.felhom.eu/projects/.../checks/). The controller.yaml on demo-felhom has all CHANGEME placeholder UUIDs. Nothing is actually pinging.

Additionally, there are legacy bash scripts (backup-healthcheck.sh, monitoring-setup.sh) from the pre-controller era that duplicate functionality now built into the controller. These should be deprecated in favor of controller-native pings.

This version has two major parts:

Prerequisite: Get healthchecks actually working on demo-felhom (create checks, configure UUIDs, verify pings)
New feature: Central push from customer controllers to k3s + multi-customer overview dashboard

Part 0: Healthcheck Ping Design (controller.yaml schema update)

Current ping types (already implemented in code)

| Ping | Schedule | Source | What it proves |
|------|----------|--------|----------------|
| `system_health` | Every 5 min | `monitor.RunHealthCheck()` | Server alive, Docker running, disks OK, protected containers up, CPU/mem/temp within thresholds |
| `db_dump` | Daily 02:30 | `backup.RunDBDumps()` | Database dumps completed successfully |
| `backup` | Daily 03:00 | `backup.RunBackup()` | Restic snapshot completed successfully |

New ping types to add

| Ping | Schedule | Source | What it proves |
|------|----------|--------|----------------|
| `backup_integrity` | Weekly (Sunday 04:00) | New: `backup.RunIntegrityCheck()` | Restic repo passes `restic check` — data is not corrupted |
| `heartbeat` | Every 5 min | New: lightweight HTTP POST, no logic | Controller process is alive (distinct from `system_health` which does heavy checks and could fail due to a bug while the controller itself is fine) |

Revised `controller.yaml` monitoring section

monitoring:
  enabled: true
  healthchecks_base: "https://status.felhom.eu"
  ping_uuids:
    heartbeat: ""              # NEW — every 5 min, controller alive
    system_health: ""          # existing — every 5 min, comprehensive check
    db_dump: ""                # existing — daily after db dumps
    backup: ""                 # existing — daily after restic snapshot
    backup_integrity: ""       # NEW — weekly after restic check
  system_health_interval: "5m"
  health_check_schedule: "06:00"
  thresholds:
    disk_warn_percent: 80
    disk_crit_percent: 90
    backup_max_age_hours: 36
    cpu_warn_percent: 90
    memory_warn_percent: 85
    temperature_warn_celsius: 75

Note: Empty string and "CHANGEME..." UUIDs are both skipped by the pinger (already implemented). This means any check can be left unconfigured — the controller just skips it silently.
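The skip rule can be illustrated with a small helper. This is only a sketch, not the existing pinger code: the function name `pingURL`, the `/ping/<uuid>` path layout (the usual default for a self-hosted Healthchecks instance), and the sample UUID are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// pingURL builds the URL for one ping, or returns false when the UUID is
// unconfigured. kind is "" (success), "fail", or "start".
// Assumption: the instance serves pings under <base>/ping/<uuid>.
func pingURL(base, uuid, kind string) (string, bool) {
	uuid = strings.TrimSpace(uuid)
	if uuid == "" || strings.HasPrefix(uuid, "CHANGEME") {
		return "", false // unconfigured check: skip silently
	}
	u := strings.TrimRight(base, "/") + "/ping/" + uuid
	if kind != "" {
		u += "/" + kind
	}
	return u, true
}

func main() {
	// The last entry is a made-up placeholder UUID.
	uuids := []string{"", "CHANGEME-heartbeat", "3f0c0000-0000-4000-8000-000000000001"}
	for _, uuid := range uuids {
		if u, ok := pingURL("https://status.felhom.eu", uuid, "fail"); ok {
			fmt.Println("would ping:", u)
		} else {
			fmt.Printf("skipping %q\n", uuid)
		}
	}
}
```

Returning a boolean instead of an error keeps the "unconfigured" case distinct from a real failure, so callers can skip without logging noise.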

Healthchecks check configuration (to be created manually on status.felhom.eu)

For each customer project, create these checks:

| Check name | Period | Grace | Tags |
|------------|--------|-------|------|
| `heartbeat` | 5 minutes | 10 minutes | `heartbeat` |
| `system-health` | 5 minutes | 10 minutes | `system`, `health` |
| `db-dump` | 1 day (02:30 CET) | 30 minutes | `backup`, `db` |
| `backup` | 1 day (03:00 CET) | 60 minutes | `backup`, `restic` |
| `backup-integrity` | 7 days | 24 hours | `backup`, `integrity` |

Part 1: Controller-side healthcheck implementation

Task 1.1: Add heartbeat ping

Files: cmd/controller/main.go

Add a new scheduler job — the simplest possible ping, no health check logic:

// Heartbeat — lightweight "I'm alive" signal
sched.Every("heartbeat", 5*time.Minute, func(ctx context.Context) error {
    pinger.Ping(cfg.Monitoring.PingUUIDs.Heartbeat, "")
    return nil
})

Files: internal/config/config.go

Add Heartbeat and BackupIntegrity fields to PingUUIDsConfig:

type PingUUIDsConfig struct {
    Heartbeat       string `yaml:"heartbeat"`         // new
    DBDump          string `yaml:"db_dump"`
    Backup          string `yaml:"backup"`
    SystemHealth    string `yaml:"system_health"`
    BackupIntegrity string `yaml:"backup_integrity"`  // new
}

Task 1.2: Add backup integrity check

Files: internal/backup/restic.go

Add a Check() method (may already exist as part of prune logic — verify first):

// Check runs `restic check` to verify repository integrity.
func (r *ResticRunner) Check() error {
    args := []string{"check", "--repo", r.repo, "--json"}
    // ... standard exec with password file, timeout 30 min
}

Files: internal/backup/backup.go

Add RunIntegrityCheck():

// RunIntegrityCheck runs restic check and pings healthchecks with the result.
func (m *Manager) RunIntegrityCheck(ctx context.Context) error {
    err := m.restic.Check()
    uuid := m.cfg.Monitoring.PingUUIDs.BackupIntegrity
    if err != nil {
        m.pinger.Fail(uuid, fmt.Sprintf("restic check failed: %v", err))
        return err
    }
    m.pinger.Ping(uuid, "restic check passed")
    return nil
}

Files: cmd/controller/main.go

if cfg.Backup.Enabled && backupMgr != nil {
    // ... existing daily jobs ...

    // Weekly integrity check — Sunday 04:00
    sched.Daily("backup-integrity", "04:00", func(ctx context.Context) error {
        if time.Now().Weekday() != time.Sunday {
            return nil // skip non-Sundays
        }
        return backupMgr.RunIntegrityCheck(ctx)
    })
}

Note on scheduler: Daily() fires every day at the given time. To make it weekly, check the weekday inside the function. If you prefer, add a Weekly() method to the scheduler — but the weekday check is simpler and consistent with how prune already works.

Task 1.3: Update example config

Files: controller/configs/controller.yaml.example

Update the monitoring.ping_uuids section to include heartbeat and backup_integrity fields. Add comments explaining each.

Task 1.4: Deprecation note for bash monitoring scripts

The following files in deploy-felhom-compose/monitoring/ are superseded by the controller's built-in monitoring:

backup-healthcheck.sh → replaced by internal/monitor/healthcheck.go (scheduler: system-health)
monitoring-setup.sh → no longer needed (controller reads controller.yaml directly)
monitoring.conf.template → replaced by controller.yaml monitoring section
backup-healthcheck.service / .timer → replaced by controller's scheduler

Action: Add a DEPRECATED.md in deploy-felhom-compose/monitoring/ explaining that these scripts are kept for reference only and should not be used on nodes running felhom-controller v0.4.0+. Do NOT delete the files yet — they may be needed if a customer is still on a pre-controller setup.

Verification (Part 1)

After building and deploying v0.6.0 to demo-felhom:

Check controller logs: docker logs felhom-controller --since 5m | grep -i "ping\|health\|heartbeat"
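Verify pings arrive at status.felhom.eu: heartbeat and system-health should show green within 10 minutes; db-dump, backup, and backup-integrity turn green only after their first scheduled run (daily or weekly)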
Test failure: docker stop traefik, wait 5 min, check that system-health goes red (protected container missing)
Restart traefik: docker start traefik, verify recovery

Part 2: Central push to k3s (customer → operator reporting)

Architecture

┌─────────────────────────┐         HTTPS POST /api/v1/report
│  Customer controller    │────────────────────────────────────────┐
│  (demo-felhom.eu)       │   every 15 min (configurable)         │
└─────────────────────────┘                                       ▼
                                                    ┌─────────────────────────────┐
┌─────────────────────────┐         HTTPS POST      │  felhom-hub                 │
│  Customer controller    │────────────────────────▶│  (k3s pod on dooplex.hu)    │
│  (customer-2)           │                         │                             │
└─────────────────────────┘                         │  - Receives reports         │
                                                    │  - Stores in SQLite         │
                                                    │  - Serves dashboard         │
                                                    │  - Alerts on stale reports  │
                                                    └─────────────────────────────┘
                                                          hub.felhom.eu

Task 2.1: Define the report payload

The controller pushes a JSON summary every 15 minutes. This is not raw metrics — it's an aggregated health summary.

{
  "version": 1,
  "customer_id": "demo-felhom",
  "customer_name": "Demo Ügyfél",
  "controller_version": "0.6.0",
  "timestamp": "2026-02-16T12:00:00Z",
  "system": {
    "hostname": "demo-felhom",
    "os": "Debian GNU/Linux 13 (trixie)",
    "kernel": "6.12.69+deb13-amd64",
    "cpu_model": "Intel N100",
    "cpu_cores": 4,
    "uptime_seconds": 345600,
    "cpu_percent": 12.5,
    "memory_total_mb": 15872,
    "memory_used_mb": 4200,
    "memory_percent": 26.5,
    "temperature_celsius": 48.0,
    "load_avg_1": 0.45,
    "load_avg_5": 0.38,
    "load_avg_15": 0.32
  },
  "storage": [
    { "mount": "/", "total_gb": 476.0, "used_gb": 28.5, "percent": 6.0 },
    { "mount": "/mnt/hdd_1", "total_gb": 931.0, "used_gb": 120.3, "percent": 12.9 }
  ],
  "containers": {
    "total": 16,
    "running": 14,
    "stopped": 2,
    "unhealthy": 0,
    "list": [
      { "name": "paperless-ngx-webserver-1", "state": "running", "cpu_percent": 2.1, "memory_mb": 350 },
      { "name": "traefik", "state": "running", "cpu_percent": 0.3, "memory_mb": 45 }
    ]
  },
  "backup": {
    "enabled": true,
    "last_db_dump": "2026-02-16T02:30:15Z",
    "last_snapshot": "2026-02-16T03:02:45Z",
    "snapshot_count": 42,
    "repo_size_mb": 2048,
    "last_integrity_check": "2026-02-08T04:00:00Z",
    "integrity_ok": true
  },
  "health": {
    "status": "ok",
    "issues": [],
    "warnings": []
  },
  "stacks": {
    "deployed": ["paperless-ngx", "immich", "jellyfin"],
    "available": ["nextcloud", "vaultwarden", "home-assistant"],
    "updates_available": 1
  }
}

Task 2.2: Implement report builder in the controller

New file: controller/internal/report/builder.go (the Report struct definitions themselves go in controller/internal/report/types.go)

package report

// Report is the JSON payload pushed to the central hub.
type Report struct {
    Version           int              `json:"version"`
    CustomerID        string           `json:"customer_id"`
    CustomerName      string           `json:"customer_name"`
    ControllerVersion string           `json:"controller_version"`
    Timestamp         time.Time        `json:"timestamp"`
    System            SystemReport     `json:"system"`
    Storage           []StorageReport  `json:"storage"`
    Containers        ContainerReport  `json:"containers"`
    Backup            BackupReport     `json:"backup"`
    Health            HealthReport     `json:"health"`
    Stacks            StacksReport     `json:"stacks"`
}

// BuildReport collects current state from all subsystems and returns a Report.
func BuildReport(cfg *config.Config, stackMgr *stacks.Manager,
    backupMgr *backup.Manager, cpuCollector *system.CPUCollector,
    pinger *monitor.Pinger, version string) *Report {
    // Gather system info from system.GetInfo()
    // Gather container info from stackMgr
    // Gather backup info from backupMgr.GetFullStatus()
    // Gather health from monitor.RunHealthCheck()
    // Gather stack list from stackMgr.GetStacks()
    // Return assembled Report
}

This function should call existing methods — do not duplicate logic. Use the same data sources the dashboard and monitoring page already use.

Task 2.3: Implement report pusher in the controller

New file: controller/internal/report/pusher.go

package report

// Pusher sends reports to the central hub.
type Pusher struct {
    hubURL     string
    apiKey     string
    httpClient *http.Client
    logger     *log.Logger
    enabled    bool
}

// Push sends a report to the hub, retrying up to 3 times with a 5s backoff.
// Failures are logged, not propagated: Push returns nil even when every
// attempt fails, so a hub outage never affects controller operation.
func (p *Pusher) Push(report *Report) error {
    // JSON marshal
    // POST to hubURL + "/api/v1/report"
    // Header: Authorization: Bearer <apiKey>
    // Header: Content-Type: application/json
    // Retry on failure
    // Log but don't propagate errors
}

Task 2.4: Add hub configuration to controller.yaml

Files: internal/config/config.go, controller/configs/controller.yaml.example

# --- Central hub (operator dashboard) ---
hub:
  enabled: false                              # Enable central reporting
  url: "https://hub.felhom.eu"                # Hub API endpoint
  api_key: ""                                 # Shared secret for authentication
  push_interval: "15m"                        # How often to push reports

type HubConfig struct {
    Enabled      bool   `yaml:"enabled"`
    URL          string `yaml:"url"`
    APIKey       string `yaml:"api_key"`
    PushInterval string `yaml:"push_interval"`
}

Add a Hub HubConfig `yaml:"hub"` field to the main Config struct.

Task 2.5: Wire the pusher into main.go

// --- Central hub reporting ---
if cfg.Hub.Enabled && cfg.Hub.URL != "" {
    pushInterval, err := time.ParseDuration(cfg.Hub.PushInterval)
    if err != nil {
        pushInterval = 15 * time.Minute
    }
    pusher := report.NewPusher(&cfg.Hub, logger)
    sched.Every("hub-report", pushInterval, func(ctx context.Context) error {
        r := report.BuildReport(cfg, stackMgr, backupMgr, cpuCollector, pinger, version)
        return pusher.Push(r)
    })
    logger.Printf("[INFO] Hub reporting enabled (every %s to %s)", pushInterval, cfg.Hub.URL)
}

Verification (Part 2)

Set hub.enabled: true and hub.url to a temporary endpoint (e.g., https://webhook.site/...) in demo-felhom's controller.yaml
Restart controller, check logs for "Hub reporting enabled"
Wait 15 min (or set push_interval: "1m" for testing), verify JSON arrives at the endpoint
Validate JSON structure matches the spec above
Reset push_interval to "15m" after testing

Part 3: Hub service on k3s (operator side)

Overview

The hub is a lightweight Go service deployed on Viktor's k3s cluster in the felhom-system namespace. It receives reports from customer controllers, stores them in SQLite, and serves an English-language dashboard for Viktor.

Domain: hub.felhom.eu (Nginx Ingress, cert-manager TLS)
Namespace: felhom-system (alongside Healthchecks and other felhom infra)
Code: felhom.eu repo on Gitea, hub/ subfolder

Task 3.1: Hub service (subfolder in felhom.eu repository)

The hub lives in the existing felhom.eu repository on Gitea as a hub/ subfolder and is deployed to the k3s cluster in the felhom-system namespace. Its K8s manifests live in the same repository under manifests/, not in homelab-manifests, which holds Viktor's personal k3s apps.

Structure (inside felhom.eu repo):

hub/
├── cmd/hub/main.go              # Entry point
├── internal/
│   ├── api/
│   │   └── handler.go           # POST /api/v1/report, GET /api/v1/customers
│   ├── store/
│   │   └── store.go             # SQLite: save reports, query latest per customer
│   └── web/
│       ├── server.go            # Dashboard HTTP server
│       ├── templates/
│       │   ├── dashboard.html   # Multi-customer overview (English)
│       │   ├── customer.html    # Single customer detail (English)
│       │   └── style.css        # Dark theme matching felhom.eu
│       └── embed.go
├── configs/
│   └── hub.yaml.example
├── Dockerfile
├── Makefile
└── go.mod

K8s manifests in felhom.eu/manifests/ (alongside healthchecks.yaml, webpage.yaml, etc.):

manifests/hub.yaml               # Deployment, Service, Ingress, PVC

Task 3.2: Hub API endpoints

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| `POST` | `/api/v1/report` | Bearer token | Receive customer report (JSON body) |
| `GET` | `/api/v1/customers` | Session/Basic | List all customers with latest status |
| `GET` | `/api/v1/customers/{id}` | Session/Basic | Get latest report for a customer |
| `GET` | `/api/v1/customers/{id}/history` | Session/Basic | Get report history (last 24h/7d/30d) |
| `GET` | `/` | Session/Basic | Dashboard HTML page |
| `GET` | `/customers/{id}` | Session/Basic | Customer detail HTML page |

Authentication:

Report ingest: Bearer token (shared secret per customer, or a single hub-wide key for simplicity)
Dashboard: Basic auth or simple password (Viktor only) — reuse the same bcrypt approach as the controller

Task 3.3: Hub SQLite schema

CREATE TABLE IF NOT EXISTS reports (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id TEXT NOT NULL,
    received_at DATETIME NOT NULL DEFAULT (datetime('now')),
    report_json TEXT NOT NULL,           -- Full JSON payload
    -- Denormalized fields for fast queries:
    health_status TEXT,                  -- "ok", "warn", "fail"
    cpu_percent REAL,
    memory_percent REAL,
    container_total INTEGER,
    container_running INTEGER,
    backup_last_snapshot DATETIME,
    controller_version TEXT
);

CREATE INDEX IF NOT EXISTS idx_reports_customer ON reports(customer_id, received_at DESC);

-- Prune old reports daily. The cutoff should follow retention.max_days in hub.yaml;
-- 30 days is shown here as an example: DELETE FROM reports WHERE received_at < datetime('now', '-30 days');

Task 3.4: Hub dashboard UI (English)

Overview page (/):

A table/grid showing all customers at a glance:

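| Customer | Status | Last seen | CPU | Memory | Disk | Containers | Last backup | Version |
|----------|--------|-----------|-----|--------|------|------------|-------------|---------|
| 🟢 Demo Ügyfél | OK | 2 min ago | 12% | 26% | 6%/13% | 14/16 | 3h ago | 0.6.0 |
| 🟡 Kovács Péter | WARN | 18 min ago | 45% | 78% | 82% ⚠️ | 8/8 | 4h ago | 0.5.4 |
| 🔴 Nagy Anna | DOWN | 2h ago | – | – | – | – | 26h ago ⚠️ | 0.5.4 |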

Color coding:

🟢 Green: last seen < 30 min AND health = "ok"
🟡 Yellow: last seen < 30 min AND health = "warn", OR last seen 30-60 min
🔴 Red: last seen > 60 min OR health = "fail"

Customer detail page (/customers/{id}):

Last report timestamp
Full system info section (same layout as controller's monitoring page)
Container list with CPU/memory
Backup status details
Health issues/warnings
Report history (collapsible list, last 24h)

Design: English language. Dark theme matching felhom.eu / the controller dashboard. Use the same CSS variables and fonts.

Task 3.5: Hub Kubernetes manifests

File: felhom.eu/manifests/hub.yaml (alongside healthchecks.yaml, webpage.yaml, etc.)

# Namespace: felhom-system (shared with healthchecks and other felhom infra)
# Deployment: 1 replica, 64Mi-256Mi memory
# Service: ClusterIP port 8080
# PVC: 1Gi for SQLite (Longhorn)
# Ingress: hub.felhom.eu via nginx-internal, cert-manager TLS
# Auth: same geo-restriction as other dooplex.hu services (HU only)

ConfigMap for hub.yaml config:

auth:
  password_hash: ""          # bcrypt hash, same approach as controller
api:
  report_api_key: ""         # Bearer token for report ingest
retention:
  max_days: 90               # Keep 90 days of report history
  prune_schedule: "04:30"    # Daily prune
alerting:
  stale_threshold: "30m"     # Alert if customer not seen for 30 min

Task 3.6: Alerting (optional, future enhancement)

When a customer is "stale" (no report for > 30 min), the hub could:

Send a webhook to Healthchecks (one "customer-X-reporting" check per customer)
Send email via Resend
Push to Telegram

For v0.6.0 scope: just show the status on the dashboard. Alerting can be added in v0.6.1.

Part 4: Manual steps for Viktor (demo-felhom setup)

These steps must be done by Viktor manually — Claude Code cannot access status.felhom.eu or the demo-felhom server.

4.1: Create Healthchecks checks on status.felhom.eu

Log into status.felhom.eu
Open the "demo-felhom" project
Create 5 checks with the settings from the table in Part 0
Copy the ping UUIDs for each check

4.2: Update controller.yaml on demo-felhom

SSH into demo-felhom and update /opt/docker/felhom-controller/controller.yaml:

monitoring:
  enabled: true
  healthchecks_base: "https://status.felhom.eu"
  ping_uuids:
    heartbeat: "<UUID-from-step-4.1>"
    system_health: "<UUID-from-step-4.1>"
    db_dump: "<UUID-from-step-4.1>"
    backup: "<UUID-from-step-4.1>"
    backup_integrity: "<UUID-from-step-4.1>"
  system_health_interval: "5m"
  health_check_schedule: "06:00"
  thresholds:
    disk_warn_percent: 80
    disk_crit_percent: 90
    backup_max_age_hours: 36
    cpu_warn_percent: 90
    memory_warn_percent: 85
    temperature_warn_celsius: 75

4.3: Restart controller

cd /opt/docker/felhom-controller
docker compose pull
docker compose up -d
docker logs -f felhom-controller --since 1m

4.4: Verify pings

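Wait 5 to 10 minutes, then check status.felhom.eu: heartbeat and system-health should be green. The db-dump, backup, and backup-integrity checks turn green only after their first scheduled run.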

4.5: Deploy hub to k3s (after Part 3 is built)

# Build and push hub image (from felhom.eu repo, hub/ subfolder)
cd hub && make docker-push

# Apply k8s manifests (from felhom.eu repo, manifests/ folder)
kubectl apply -f manifests/hub.yaml

# Configure hub.felhom.eu DNS in Cloudflare
# Update demo-felhom controller.yaml with hub config

Implementation order

Part 1 (controller-side, in deploy-felhom-compose repo):
- Task 1.1: Heartbeat ping (5 min)
- Task 1.2: Backup integrity check (20 min)
- Task 1.3: Update example config (5 min)
- Task 1.4: Deprecation note for bash scripts (5 min)
Part 4.1–4.4 (Viktor manual: create checks, configure UUIDs, verify)
Part 2 (controller-side, report push):
- Task 2.1: Report payload types (10 min)
- Task 2.2: Report builder (30 min)
- Task 2.3: Report pusher (15 min)
- Task 2.4: Hub config in controller.yaml (10 min)
- Task 2.5: Wire into main.go (5 min)
Part 3 (hub and its k8s manifests, both in felhom.eu repo):
- Task 3.1: Project scaffold in hub/ subfolder (10 min)
- Task 3.2: API handlers (30 min)
- Task 3.3: SQLite store (20 min)
- Task 3.4: Dashboard UI — English (60 min)
- Task 3.5: K8s manifests in felhom.eu/manifests/ (20 min)
Part 4.5 (Viktor manual: deploy hub, wire everything)

Files to modify (controller repo)

controller/cmd/controller/main.go                     — heartbeat job, integrity job, hub pusher
controller/internal/config/config.go                   — PingUUIDsConfig + HubConfig
controller/internal/backup/backup.go                   — RunIntegrityCheck()
controller/internal/backup/restic.go                   — Check() method (verify/add)
controller/internal/report/builder.go                  — NEW: report assembly
controller/internal/report/pusher.go                   — NEW: HTTP push client
controller/internal/report/types.go                    — NEW: Report struct definitions
controller/configs/controller.yaml.example             — updated monitoring + new hub section
monitoring/DEPRECATED.md                               — NEW: deprecation notice for bash scripts

Files to create (hub — in felhom.eu repo)

hub/cmd/hub/main.go
hub/internal/api/handler.go
hub/internal/store/store.go
hub/internal/web/server.go
hub/internal/web/templates/dashboard.html
hub/internal/web/templates/customer.html
hub/internal/web/templates/style.css
hub/internal/web/embed.go
hub/configs/hub.yaml.example
hub/Dockerfile
hub/Makefile
hub/go.mod
hub/README.md

Files to create (k8s manifests — in felhom.eu repo)

manifests/hub.yaml

Verification checklist

Heartbeat ping arrives every 5 min at status.felhom.eu
System health ping arrives every 5 min with diagnostic body
DB dump ping arrives daily at ~02:30
Backup ping arrives daily at ~03:00
Backup integrity ping arrives weekly on Sunday ~04:00
Stopping a protected container triggers system-health FAIL
Controller logs show "Hub reporting enabled" when hub.enabled=true
Hub receives JSON reports from controller
Hub dashboard shows demo-felhom with green status
Hub dashboard shows "last seen: X min ago" updating correctly
Hub shows red status when controller is stopped for > 60 min
Hub SQLite prunes old reports automatically
All UUIDs are configurable (empty/CHANGEME = silently skipped)

CONTEXT.md update (after completion)

Add to "What was just completed" section:

### What was just completed (session N)
- **v0.6.0 — Healthcheck Implementation + Central Push + Hub Dashboard:**
  - **Healthcheck pings fully operational:** 5 check types (heartbeat, system-health, db-dump, backup, backup-integrity) configured on demo-felhom, all pinging status.felhom.eu
  - **Backup integrity check:** Weekly `restic check` with Healthchecks ping
  - **Central hub reporting:** Controller pushes JSON health summary every 15 min to hub.felhom.eu
  - **felhom-hub service:** New Go service in felhom.eu repo (`hub/` subfolder), k8s manifests in `felhom.eu/manifests/hub.yaml`, deployed on k3s in felhom-system namespace, SQLite storage, English multi-customer dashboard
  - **Deprecated:** Legacy bash monitoring scripts (backup-healthcheck.sh, monitoring-setup.sh) superseded by controller-native monitoring

Also update the repository distinction in CONTEXT.md:

## Repository & manifest layout

- **homelab-manifests** — Viktor's personal k3s apps (*.dooplex.hu): mon-system, servarr, pihole, etc.
- **felhom.eu** — Everything felhom-related:
  - `website/` — felhom.eu public website HTML
  - `manifests/` — k8s manifests for felhom infra in felhom-system namespace (webpage, healthchecks, contact-mailer, umami, hub, felhom.secret)
  - `hub/` — felhom-hub Go service (central multi-customer dashboard)
- **deploy-felhom-compose** — Customer-side: felhom-controller code, deploy scripts, monitoring scripts
- **app-catalog-felhom.eu** — Docker Compose templates for customer apps
