# felhom-controller

**Central management container for Felhom home servers.**

A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.

**Current version: v0.12.9**

---

## Table of Contents

- [Architecture](#architecture)
- [Features](#features)
  - [App Management](#1-app-management)
  - [Backup System](#2-backup-system)
  - [Storage Management](#3-storage-management)
  - [Monitoring & Health](#4-monitoring--health)
  - [Notifications](#5-notifications)
  - [Update Management](#6-update-management)
  - [Authentication & Settings](#7-authentication--settings)
  - [Central Hub](#8-central-hub-reporting)
- [Repository Layout](#repository-layout)
- [Configuration](#configuration)
- [REST API](#rest-api)
- [Build & Deploy](#build--deploy)
- [Roadmap](#roadmap)

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│  Customer Hardware (N100 mini PC / Raspberry Pi)                │
│                                                                 │
│  ┌──────────┐   ┌────────────────────────────────────────────┐  │
│  │ Traefik  │   │  felhom-controller (privileged container)  │  │
│  │ (reverse │──▶│                                            │  │
│  │  proxy)  │   │  ┌──────────┐   ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Web UI   │   │ Stack Manager           ││  │
│                 │  │ (HU dash │   │ (compose ops, git sync, ││  │
│  ┌──────────┐   │  │  board)  │   │ deploy, delete, update) ││  │
│  │cloudflared│  │  └──────────┘   └─────────────────────────┘│  │
│  │ (tunnel) │   │  ┌──────────┐   ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Backup   │   │ Storage Manager         ││  │
│                 │  │ (3-layer │   │ (disk scan, format,     ││  │
│  ┌──────────┐   │  │  restic) │   │ mount, migrate)         ││  │
│  │ App      │   │  └──────────┘   └─────────────────────────┘│  │
│  │ stacks   │   │  ┌──────────┐   ┌─────────────────────────┐│  │
│  │ (docker  │   │  │Scheduler │   │ Monitor & Metrics       ││  │
│  │ compose) │   │  │(cron-like│   │ (health, pings, SQLite  ││  │
│  └──────────┘   │  │ jobs)    │   │ time-series, Chart.js)  ││  │
│                 │  └──────────┘   └─────────────────────────┘│  │
│                 │  ┌──────────┐   ┌─────────────────────────┐│  │
│                 │  │ Notify   │   │ REST API + Hub Reporter ││  │
│                 │  │ (email)  │   │ (JSON push to hub)      ││  │
│                 │  └──────────┘   └─────────────────────────┘│  │
│                 └────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │ pings              │ JSON push          │ git pull
         ▼                    ▼                    ▼
  status.felhom.eu     hub.felhom.eu        gitea.dooplex.hu
  (Healthchecks)       (central dashboard)  (stack definitions)
```

### Key Architecture Decisions

- **Pure Go, no frameworks** — stdlib `net/http` + `html/template`. Only external deps: `bcrypt`, `yaml.v3`, `modernc.org/sqlite` (pure Go, no CGO).
- **Privileged container** — Required for disk operations (format, mount, fstab), `/dev` access, and Docker socket control.
- **`/host-dev` indirection** — Docker overrides `/dev` with a tmpfs. The host's `/dev` is mounted at `/host-dev` to access block devices.
- **`StackDataProvider` interface** — Breaks circular import between backup and stacks packages. Implemented by `stackAdapter` in `main.go`.
- **Atomic file writes** — All persistent state (`settings.json`, `app.yaml`) written to `.tmp` then `os.Rename` for crash safety.
- **`go:embed` templates** — All HTML/CSS/JS compiled into the binary. No runtime file dependencies.
- **Europe/Budapest timezone** — All scheduled jobs, timestamps, and UI labels use Hungarian timezone.

### Module Map

| Module | Path | Responsibility |
|--------|------|----------------|
| **Config** | `internal/config/` | YAML loader, validation, `FELHOM_*` env overrides |
| **Settings** | `internal/settings/` | Runtime-mutable `settings.json` (passwords, backup prefs, storage paths, notifications) |
| **Stacks** | `internal/stacks/` | Compose operations, scanning, `.felhom.yml` metadata, deploy/delete flow |
| **Sync** | `internal/sync/` | Git-based app catalog sync (clone/pull, content-hash copy) |
| **Backup** | `internal/backup/` | 3-layer backup: DB dumps → restic snapshots → cross-drive copies, restore |
| **Storage** | `internal/storage/` | Disk scanning (`lsblk`), partitioning (`sfdisk`), formatting (`mkfs.ext4`), mounting, data migration (`rsync`) |
| **System** | `internal/system/` | System info (`/proc`), CPU collector, mount points, disk usage, FS info |
| **Monitor** | `internal/monitor/` | Healthchecks.io pinger, system health checks |
| **Metrics** | `internal/metrics/` | SQLite time-series store, system + container metric collection |
| **Scheduler** | `internal/scheduler/` | Central job scheduler (periodic + daily, skip-if-running, panic recovery) |
| **Notify** | `internal/notify/` | Email notifications via hub relay, preference sync, per-event cooldowns |
| **Report** | `internal/report/` | Hub report builder + HTTP pusher (system, stacks, backup, health) |
| **API** | `internal/api/` | REST JSON endpoints |
| **Web** | `internal/web/` | Hungarian dashboard, auth, page handlers, template functions, alerts |

---

## Features

### 1. App Management

The controller manages Docker Compose stacks through a complete lifecycle: catalog sync, first-time deployment, runtime operations, and deletion.

#### Git Sync (`internal/sync/`)

The app catalog lives in a separate Git repository. The controller:

- Shallow-clones the catalog on startup
- Periodically fetches updates (configurable, default 15 min)
- Copies only `docker-compose.yml` and `.felhom.yml` to the stacks directory
- **Never overwrites** `app.yaml` or `.env` (user secrets are safe)
- Uses SHA-256 content hashing — only writes files that actually changed
- Triggers stack rescan after sync so the dashboard updates immediately
- Manual sync via "Sablonok frissitese" button or `POST /api/sync`

#### First-Time Deploy Flow

1. Customer sees app card with "Telepites" button
2. Deploy page shows auto-filled fields (domain), auto-generated secrets (DB passwords, hex keys), and user-configurable inputs (admin password, language, storage path)
3. `checkBeforeDeploy()` JS guard fetches live state first (prevents double-deploy from another tab)
4. **Memory validation** checks `mem_request` against available RAM:
   - `usable_memory = total_ram - reserved_memory_mb` (default 384MB reserved)
   - Hard block if requests exceed usable memory
   - Soft warning if limits exceed total RAM (overcommit OK)
5. Controller generates secrets, saves `app.yaml`, sets in-memory `Deployed` flag **before** `docker compose up -d` (avoids stale UI during slow image pulls), reverts on failure
6. 3-step progress panel polls `GET /api/stacks/{name}` every 3s: config saved → containers starting → health check passed
7. Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only

#### App Info Pages

Each app can define rich metadata in `.felhom.yml`:

- `app_info`: tagline, use_cases, first_steps, prerequisites, default_creds, docs_url
- `optional_config`: groups of post-deploy configurable env vars (e.g., API keys for metadata providers)
- `resources`: mem_request, mem_limit, pi_compatible, needs_hdd

The `/apps/{slug}` page renders hero section, screenshots, setup guide, and optional config form.

#### Stack Operations

| Operation | What it does |
|-----------|-------------|
| Start | `docker compose up -d` |
| Stop | `docker compose stop` (blocked for protected stacks) |
| Restart | `docker compose restart` |
| Update | `docker compose pull` + `docker compose up -d` |
| Delete | `docker compose down --rmi local --volumes` + optional HDD data cleanup |

**Protected stacks** (traefik, cloudflared, felhom-controller) cannot be stopped or deleted from the UI. Restart is allowed.

**Orphan detection**: Deployed stacks with no matching catalog template are marked as orphaned with an "Elavult" badge and can be safely deleted.

#### Container State Display

| State | Color | Label | Meaning |
|-------|-------|-------|---------|
| Running + healthy | Green | "Fut" | All containers running and healthy |
| Running + starting | Orange | "Indulas..." | Healthcheck not yet passed |
| Running + unhealthy | Yellow | "Nem egeszseges" | Healthcheck failing |
| Stopped/exited | Red | "Leallitva" | All containers stopped |
| Restarting | Yellow | "Ujrainditas..." | Restart loop |
| Not deployed | Gray | "Nincs telepitve" | Compose file exists, not deployed |

---

### 2. Backup System

The backup system implements a **3-2-1 backup architecture**. Each tier is a **complete, self-sufficient backup** — any single tier can fully restore an app.

| Tier | Contents | Location | Can fully restore? |
|------|----------|----------|--------------------|
| **1. Nightly restic** | DB + Config + User data | Same drive as app | Yes (not against drive failure) |
| **2. Cross-drive** | DB + Config + User data | Different physical device | Yes |
| **3. Remote** | Everything | Cloud / remote server | Future |

**Key principles:**

- User data backup is **mandatory** — every app with HDD bind mounts is included automatically. There is no per-app toggle.
- Each tier includes **everything** needed to restore: DB dumps, config, and user data. No tier depends on another tier's data.
- **Tier 2 is configurable for ALL apps** — not just apps with HDD data. Non-HDD apps back up config + DB dumps to the secondary drive (small but protects against drive failure).
- The `AppBackupPrefs.Enabled` field in settings.json is legacy and not read by any code.

**Per-app Tier 2 contents by app type:**

| App type | Tier 2 contents | Example |
|----------|----------------|---------|
| HDD + DB | Config + DB + User data | Immich, Paperless-ngx |
| HDD, no DB | Config + User data | — |
| DB, no HDD | Config + DB | Mealie, Vikunja |
| Config only | Config | Gokapi, Homepage |

#### Tier 1: Nightly Backup (mandatory, same drive)

The nightly backup has two phases that run sequentially:

**Phase 1 — Database Dumps** (`internal/backup/dbdump.go`, scheduled 02:30)

- **Auto-discovery** of PostgreSQL and MariaDB containers via `docker ps` + `docker inspect`
- Dumps via `docker exec pg_dump` / `docker exec mariadb-dump` with 5-minute timeout
- Atomic writes (`.tmp` → `.sql`) to prevent corruption
- **Validation** after each dump: checks file size, header presence, counts `CREATE TABLE`
- Results cached in `settings.json` surviving container restarts

**Phase 2 — Restic Snapshot** (`internal/backup/restic.go`, scheduled 03:00)

- Auto-generated repository password (32 random bytes, base64url), synced to hub
- **Paths included in every snapshot:**
  - Stacks dir (all compose.yml + app.yaml + .felhom.yml)
  - DB dump dir (all `.sql` dump files from Phase 1)
  - `controller.yaml` (controller config)
  - **ALL deployed apps' HDD mount paths** — discovered via `resolveAppBackupPaths()` which iterates `ListDeployedStacks()`, no `Enabled` flag
- Auto-detects and unlocks stale locks (restic repo lock)
- Weekly prune on Sundays with configurable retention (keep-daily, keep-weekly, keep-monthly)
- Weekly integrity check (`restic check`) on Sunday 04:00

**Protects against:** accidental deletion, data corruption, point-in-time rollback. Does NOT protect against drive failure (backup is on the same physical drive).

#### Tier 2: Cross-Drive Backup (opt-in, different device) (`internal/backup/crossdrive.go`)

**Complete backup** to a different physical drive. Available for **all apps** — apps with HDD data back up config + DB + user data; apps without HDD back up config + DB dumps only.

- **Two methods:**
  - **rsync** — Simple mirror with `--delete` (fast, no versioning, **browsable** on disk)
  - **restic** — Versioned, deduplicated, encrypted (shared repo across apps, not browsable)
- Per-app configuration in settings.json: destination path, method, schedule (daily/weekly/manual)
- **Pre-backup DB dump:** `DumpStackDB()` runs fresh pg_dump/mariadb-dump before each cross-drive backup; non-fatal on failure (wired via `DBDumper` interface to avoid circular imports)
- **Empty mounts allowed:** `RunAppBackup` accepts apps with no HDD mounts — the rsync mount loop simply doesn't execute, but DB + config copy still runs
- **Drive-type-aware validation** (`ValidateDestination`):

  | Destination type | Space checks |
  |-----------------|--------------|
  | External mount (different device than `/`) | Block if <100 MB free |
  | System drive (same device as `/`) | Require ≥10 GB free AND <90% used; logged warning |

- **Rsync destination layout** (complete — can restore app independently):

  ```
  backups/rsync//
    _db/        ← DB dump files (stackName_postgres.sql, etc.)
    _config/    ← compose.yml, app.yaml, .felhom.yml
                ← HDD mount contents (only for apps with HDD data)
  ```

  - DB dump files excluded from user data rsync (`--exclude backups/*.sql.gz/sql/dump`) to avoid duplicating app-internal dumps
  - `_` prefix directories prevent collision with user data
  - For non-HDD apps, only `_db/` and `_config/` are present (no user data directory)
- **Restic backup paths:** includes HDD mounts (if any) + config dir + DB dump dir (deduplication handles overlap)
- Safety guards: destination ≠ source, path-overlap check (HDD mounts only), writable check
- **Chained execution:** runs immediately after nightly restic — daily apps every night, weekly apps on Sundays
- Per-app concurrency lock prevents overlapping runs
- Status (last_run, duration, size, error) persisted to settings.json

**Protects against:** primary drive failure, drive theft/damage.

#### Tier 3: Remote Backup (future)

Complete offsite backup for disaster recovery. Not yet implemented. Placeholder shown in UI ("3. mentés — Hamarosan").

#### Restore (`internal/backup/restore.go`)

All deployed apps appear in the restore dropdown — every app has restic snapshot data (stacks dir + DB dumps are always backed up).

| App type | Config restored | DB restored | User data restored |
|----------|----------------|------------|-------------------|
| Has HDD data | Yes | Yes | Yes (always — backup is mandatory) |
| DB only, no HDD | Yes | Yes | n/a |
| No DB, no HDD | Yes | — | n/a |

- **Snapshot API** returns ALL snapshots unfiltered — older snapshots still allow config+DB restore; `RestoreApp` extracts whatever paths are available
- **Restore type info** shown per-app when selected in dropdown (Hungarian banners):
  - Has HDD: "Teljes visszaállitas: adatbazis + konfiguracio + felhasznaloi adatok"
  - Has DB, no HDD: "Adatbazis es konfiguracio visszaallitasa"
  - No DB, no HDD: "Csak konfiguracio visszaallitasa"
- **Execution flow:** stop app → `restic restore --target / --include ...` → restart app
- Running flag prevents concurrent backup/restore operations
- Snapshot ID validated (8-64 lowercase hex)

**Note:** Restore currently uses Tier 1 (primary restic repo) only. Restoring from Tier 2 (cross-drive) is a future enhancement.

#### Backup Page UI (`internal/web/templates/backups.html`)

Unified per-app status table with expandable rows showing **per-tier** backup status:

**Status dot per app:**

| Dot color | Meaning |
|-----------|---------|
| Green | 2+ tiers configured with successful backups + destination healthy |
| Yellow | Only 1 tier, or Tier 2 failing, or Tier 2 configured but never run |
| Red | Tier 2 destination blocked or inaccessible |

Every app starts as yellow (1 tier only). Green requires Tier 2 configured with successful backup.

**Per-app backup tiers (3 rows per app):**

- **1. mentes** (Tier 1, always present) — Auto badge + "helyi" + last run + contents (e.g., "DB + Konfig + Adatok")
- **2. mentes** (Tier 2, configurable for ALL apps) — one of:
  - Configured: method (rsync/restic) + destination + schedule + last run + status + contents + browsable indicator (folder icon for rsync) + action buttons
  - Not configured: "1. mentes auto" + "Nincs 2. masolat" + settings link
- **3. mentes** (Tier 3, placeholder) — grayed out "Hamarosan" + "tavoli (offsite)" + future note

**Backup contents per app** (shown per tier):

- Apps with DB + HDD: "DB + Konfig + Adatok"
- Apps with DB only: "DB + Konfig"
- Apps with HDD, no DB: "Konfig + Adatok"
- Apps with neither: "Konfig"

**Deploy page** shows cross-drive (Tier 2) configuration form for **all deployed apps**, not just those with HDD data. Non-HDD apps can configure destination, method, and schedule.

**Other sections:**

- Schedule overview with next run times for DB dump, restic, prune
- Snapshot history table (last 20 snapshots with ID, time, files new/changed, data added)
- Repository info card (path, size, snapshot count, encryption key with show/copy)
- Restore section: app dropdown → snapshot dropdown → restore type info → confirmation checkbox → execute

---

### 3. Storage Management

The storage subsystem handles the full lifecycle of external storage: detection, initialization, path registration, and data migration.

#### Disk Scanning (`internal/storage/scan.go`)

- `ScanDisks()` uses `lsblk -J -b` for block device enumeration
- System disk detection via host fstab parsing (`/host-fstab`) + UUID resolution via `blkid`
- Partitions enriched with filesystem type, UUID, and label from direct `blkid` probing (Docker containers have incomplete udev cache)
- Returns `AvailableDisks` (non-system, non-loop, non-CDROM) and `SystemDisks` separately
- Handles NVMe (`nvme0n1p1`), SCSI (`sdb1`), and eMMC (`mmcblk0p1`) naming

#### Disk Initialization Wizard (`internal/storage/format.go`)

A step-by-step UI at `/settings/storage/init`:

1. **Scan** — Lists available disks with model, size, partition info
2. **Select** — User picks a disk and enters a mount name (e.g., `hdd_1`)
3. **Confirm** — User types "FORMAZAS" to confirm destructive operation
4. **Format pipeline**: `wipefs` → `sfdisk` (GPT) → `mkfs.ext4` → `blkid` UUID → backup fstab → append UUID-based fstab entry → mount → `findmnt` verification → `chown 1000:1000` → create `storage/` and `Dokumentumok/` subdirectories
5. Auto-registers new storage path in settings.json
6. Smart partition detection: skips repartitioning for existing empty partitions

Safety guards: system disk detection, mount path conflict check, confirmation required, progress channel for real-time UI feedback.

#### Storage Path Registry (`internal/settings/settings.go`)

Multiple external storage paths supported with:

- **Label**: Human-readable name (editable inline)
- **Default flag**: New deploys use this path by default
- **Schedulable flag**: Path appears in deploy dropdown
- **Auto-discovery**: On startup, scans deployed apps' `HDD_PATH` values and registers unknown paths
- Thread-safe CRUD: Add, Remove, SetDefault, SetSchedulable, SetLabel

#### Data Migration (`internal/storage/migrate.go`)

Move app data between storage paths (e.g., SSD → HDD, HDD → new HDD):

1. Validate: stack exists, deployed, has HDD data, target differs from source
2. Estimate total size, check free space on target
3. Stop the application
4. `rsync -a --info=progress2` per mount path with real-time progress parsing
5. Update `app.yaml` HDD_PATH to new location
6. Start the application
7. **Rollback on failure**: reverts config, restarts on old storage

Progress UI at `/stacks/{name}/migrate` with byte counter and percentage.

#### Stale Data Cleanup

After migration, the deploy page detects leftover data on previous storage paths:

- Shows path, size, and a delete button
- Two-step confirmation required
- Protected paths (storage root, media, Dokumentumok, appdata) cannot be deleted

#### FileBrowser Mount Sync

When storage paths are added or removed, `syncFileBrowserMounts()` auto-regenerates FileBrowser's `docker-compose.yml` with volume mounts for all registered paths, then recreates the container.

---

### 4. Monitoring & Health

#### System Health Checks (`internal/monitor/healthcheck.go`)

`RunHealthCheck()` evaluates multiple subsystems and returns a `HealthReport` with status (`ok`/`warn`/`fail`):

| Check | Warning | Critical |
|-------|---------|----------|
| Disk usage (SSD/HDD) | >= 90% | >= 95% |
| Memory | available < 512MB | available < 256MB |
| CPU temperature | >= 75C | >= 85C |
| Docker daemon | — | unreachable |
| Protected containers | — | not running |
| Storage paths | not a mount point (data on SSD) | path inaccessible, disk >= 95% |

Backup destination validation (`CheckBackupDestination`) has tiered checks:

- Path doesn't exist → critical/blocked
- Not writable → critical/blocked
- Same block device as root → warning (data on system drive)
- Disk >95% full → critical/blocked
- Disk >90% full → warning

#### Healthchecks.io Integration (`internal/monitor/pinger.go`)

Five ping UUIDs for external monitoring:

- **Heartbeat**: every 5 min (simple "I'm alive")
- **System Health**: periodic health check results
- **DB Dump**: after nightly database dumps
- **Backup**: after nightly restic backup
- **Backup Integrity**: weekly `restic check` result

3-attempt retry with 2-second backoff. Pinger never fails the caller.
#### Metrics Store (`internal/metrics/`) - **SQLite with WAL mode** for concurrent reads during collection - **System metrics**: CPU%, memory (total/used/available), temperature, load average — collected every 60 seconds - **Container metrics**: CPU%, memory, network I/O, block I/O per container - Downsampled queries for chart time ranges (1h, 6h, 24h, 7d, 30d) - 30-day auto-prune via daily scheduler job #### Monitoring Page Full-page system monitor at `/monitoring`: - **System Overview**: hostname, OS, kernel, CPU model/cores, uptime - **System Metrics Charts**: 4 line charts (CPU, Memory, Temperature, Load) in 2x2 grid - **Container Resources**: horizontal bar charts (CPU% and Memory per container) - **Per-container Detail**: click-to-expand historical charts - **Remote Monitoring Status**: shows Healthchecks ping UUID configuration Chart.js 4.4.7 embedded locally (works in offline environments), dark theme matching site design. #### Alert System (`internal/web/alerts.go`) State-based alerts displayed on all pages: - Sources: health issues, missing ping UUIDs, backup disabled - Sorted by severity (error > warning > info), capped at 5 visible - Refreshed every 5 min + on startup - Monitoring page suppresses ping-related alerts (shown in dedicated table instead) --- ### 5. Notifications #### Email Delivery The controller relays notifications through the central hub, which sends emails via the Resend API: 1. Controller detects event (health degradation, backup failure, etc.) 2. Non-blocking POST to hub's `/api/v1/notify` with event details 3. Hub checks customer notification preferences 4. 
Hub sends Hungarian-language email via Resend #### Event Types | Event | Trigger | |-------|---------| | `disk_warning` | Disk usage crosses warning/critical threshold | | `backup_failed` | Nightly backup or DB dump fails | | `update_available` | New app version detected in catalog | | `security_update` | Critical security update available | #### Cooldown System Per-event-type cooldown (default 6 hours, configurable) prevents notification spam. Only notifies on **status degradation** (ok→warn, ok→fail, warn→fail), not on repeated same-status checks. #### Preference Sync Notification preferences (email, enabled events, cooldown) are: - Stored locally in `settings.json` - Synced to hub on save and on controller startup - Hub sync failure doesn't block local save --- ### 6. Update Management #### App Catalog Sync - Periodic `git fetch` + `git reset --hard` of the app catalog repo - Content-hash comparison prevents unnecessary file writes - Post-sync stack rescan detects new/changed apps immediately #### Planned Update Classifications | Marker | Behavior | |--------|----------| | No marker | Optional — shown on dashboard, customer clicks "Update" | | `UPDATE_REQUIRED=true` | Mandatory — auto-applied during next update window | | `UPDATE_SECURITY=true` | Critical — applied immediately | --- ### 7. 
Authentication & Settings #### Session Auth (`internal/web/auth.go`) - bcrypt password verification with configurable source priority: `settings.json` → `controller.yaml` → no auth (open access) - 7-day session duration with random 32-byte hex tokens - `?next=` redirect after login preserves the page the user was visiting - Session cleanup every 15 minutes - All sessions invalidated on password change - Conditional logout link (hidden when auth is disabled) #### Settings Persistence (`internal/settings/settings.go`) Runtime-mutable settings in `settings.json` (separate from infrastructure config): | Section | Contents | |---------|----------| | `password_hash` | bcrypt hash override | | `notifications` | email, enabled events, cooldown hours | | `db_validations` | per-DB dump validation results (survives restarts) | | `app_backup` | per-app map: enabled flag, cross-drive config (method, dest, schedule, runtime status) | | `storage_paths` | registered paths with label, default flag, schedulable flag | | `cross_drive_restic_password` | auto-generated restic password for cross-drive repos | All public methods use `sync.RWMutex`. File writes are atomic (`.tmp` + rename). #### Settings Page (`/settings`) Three sections: 1. **System config** — read-only display of `controller.yaml` values 2. **Password change** — current + new + confirm, min 8 chars 3. **Storage paths** — add/remove, edit labels, set default, toggle schedulable, per-path app list with sizes 4. **Notifications** — email, event checkboxes, cooldown hours, test email button --- ### 8. 
Central Hub Reporting #### Report Push (`internal/report/`) Periodic JSON push (default every 15 min) to the central felhom-hub service: - System: hostname, OS, CPU, memory, disk usage, uptime - Containers: running/stopped counts, per-container CPU/memory - Backup: last run, success, repo stats, snapshot count, restic password (for disaster recovery) - Health: current status, issues, warnings - Stacks: deployed apps with versions and states Bearer token authentication, 3-attempt retry with 5-second backoff. #### Hub Dashboard The hub service (separate Go app in the `felhom.eu` repo) provides: - Multi-customer overview table with status indicators - Customer detail page with system/storage/containers/backup/health sections - Color coding: green (<30min), yellow (30-60min), red (>60min since last report) - 90-day report retention with daily prune --- ## Repository Layout ``` controller/ ├── cmd/controller/main.go # Entry point, wires all 14 modules ├── internal/ │ ├── config/config.go # YAML loader, validation, env overrides │ ├── settings/settings.go # Runtime settings (JSON, atomic writes, RWMutex) │ ├── stacks/ │ │ ├── manager.go # Stack scanning, compose ops, container status │ │ ├── metadata.go # Parse .felhom.yml app metadata │ │ ├── deploy.go # First-deploy: secret gen, app.yaml, compose up │ │ └── delete.go # Stack deletion + HDD data cleanup │ ├── sync/sync.go # Git sync: clone/pull app catalog, content-hash copy │ ├── storage/ │ │ ├── scan.go, scan_linux.go # Disk detection via lsblk + blkid │ │ ├── format.go, format_linux.go # Partition, format, mount pipeline │ │ ├── safety.go, safety_linux.go # System disk detection, mount guards, fstab ops │ │ ├── migrate.go # App data migration (rsync with progress) │ │ └── *_other.go # Non-Linux stubs for cross-compilation │ ├── backup/ │ │ ├── backup.go # Orchestrator (dumps + restic + cross-drive chain) │ │ ├── dbdump.go # DB auto-discovery + dump (pg_dump, mariadb-dump) │ │ ├── restic.go # Restic operations (init, 
snapshot, prune, check) │ │ ├── appdata.go # StackDataProvider interface, app data discovery │ │ ├── crossdrive.go # Per-app backup to secondary storage (rsync/restic) │ │ └── restore.go # Per-app restore with auto stop/restart │ ├── api/router.go # REST API endpoints (~30 routes) │ ├── scheduler/scheduler.go # Central job scheduler (Every, Daily) │ ├── system/ │ │ ├── info.go, info_linux.go # RAM, disk, CPU, temperature, load average │ │ ├── cpu_linux.go # Background /proc/stat sampling │ │ └── mounts_linux.go # Mount points, disk usage, FS info, backup dest checks │ ├── monitor/ │ │ ├── pinger.go # Healthchecks.io HTTP ping client │ │ └── healthcheck.go # System health checks (disk, mem, CPU, temp, Docker) │ ├── metrics/ │ │ ├── store.go # SQLite time-series (WAL mode, downsampled queries) │ │ ├── collector.go # Background collector (60s, system + docker stats) │ │ └── sysinfo.go # Static system info (/proc, /etc) │ ├── notify/notifier.go # Email relay to hub, preference sync, cooldowns │ ├── report/ │ │ ├── builder.go # Hub report builder (all subsystems → JSON) │ │ └── pusher.go # HTTP POST to hub (retry, Bearer auth) │ └── web/ │ ├── server.go # HTTP server, routing, static files │ ├── auth.go # Session auth, login/logout, session cleanup │ ├── handlers.go # Page handlers (dashboard, stacks, deploy, backups, etc.) 
│ ├── storage_handlers.go # Storage API handlers (scan, format, migrate, cleanup) │ ├── alerts.go # State-based alert generation │ ├── funcmap.go # Template functions (state colors, Hungarian formatting) │ ├── embed.go # go:embed for templates + Chart.js │ └── templates/ # 12 HTML files + style.css (Hungarian UI) ├── configs/ │ ├── controller.yaml.example # Full config reference │ └── example-felhom-metadata.yml # .felhom.yml format reference ├── Dockerfile # Multi-stage: Go 1.24 builder + debian-slim runtime ├── docker-compose.yml # Controller's own compose (privileged, /mnt rshared) └── go.mod # Go 1.24, deps: bcrypt, yaml.v3, modernc.org/sqlite ``` --- ## Configuration ### Controller config (`controller.yaml`) Single YAML file per customer, infrastructure-only. Does **not** contain app-specific config. Key sections: ```yaml customer: name: "Demo Felhom" node_id: "demo-felhom" paths: stacks_dir: "/opt/docker/stacks" data_dir: "/opt/docker/felhom-controller/data" db_dump_dir: "/srv/backups/db-dumps" restic_repo: "/srv/backups/restic-repo" git: repo_url: "https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git" sync_interval: "15m" backup: enabled: true db_dump_time: "02:30" restic_time: "03:00" retention: { keep_daily: 7, keep_weekly: 4, keep_monthly: 6 } monitoring: health_interval: "5m" ping_uuids: heartbeat: "uuid-here" system_health: "uuid-here" db_dump: "uuid-here" backup: "uuid-here" backup_integrity: "uuid-here" hub: enabled: true url: "https://hub.felhom.eu" api_key: "bearer-token-here" system: reserved_memory_mb: 384 # RAM reserved for OS + controller ``` Environment variable overrides: `FELHOM_LOGGING_LEVEL=debug`, `FELHOM_HUB_ENABLED=false`, etc. ### Runtime settings (`settings.json`) Auto-managed by the controller. Contains password hash overrides, notification preferences, per-app backup configs, storage path registry, DB validation cache. All writes are atomic. ### Per-app config (`app.yaml`) Auto-generated during deployment. 
Contains env vars, locked fields list, deploy timestamp. Secret fields are locked (read-only after first deploy).

---

## Scheduler Jobs

| Job | Type | When | Purpose |
|-----|------|------|---------|
| status-refresh | periodic | 30s | Refresh container states |
| stack-scan | periodic | 2m | Rescan stacks directory |
| heartbeat | periodic | 5m | Ping Healthchecks "I'm alive" |
| system-health | periodic | configurable | Health checks + alert refresh |
| backup-cache | periodic | 5m | Refresh backup status cache |
| hub-report | periodic | 15m | Push report to central hub |
| db-dump | daily | 02:30 | Database dumps |
| backup | daily | 03:00 | Restic backup → cross-drive chain |
| backup-integrity | daily | Sun 04:00 | Restic check |
| metrics-prune | daily | 04:00 | Delete metrics older than 30 days |

All daily jobs use Europe/Budapest timezone. Skip-if-running prevents concurrent execution. Panic recovery in all jobs.

---

## REST API

### Stack Operations

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/health` | Health check (no auth) |
| GET | `/api/stacks` | List all stacks |
| GET | `/api/stacks/{name}` | Stack details |
| POST | `/api/stacks/{name}/deploy` | First-time deploy |
| POST | `/api/stacks/{name}/start` | Start stack |
| POST | `/api/stacks/{name}/stop` | Stop stack |
| POST | `/api/stacks/{name}/restart` | Restart stack |
| POST | `/api/stacks/{name}/update` | Pull + recreate |
| POST | `/api/stacks/{name}/optional-config` | Update optional env vars |
| GET | `/api/stacks/{name}/logs` | Container logs (`?raw=1` for plain text) |
| GET | `/api/stacks/{name}/hdd-data` | HDD data paths + sizes |
| DELETE | `/api/stacks/{name}` | Delete stack |
| POST | `/api/sync` | Trigger catalog sync |
| GET | `/api/system/info` | System info + sync status |

### Backup & Restore

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/backup/status` | Full backup status |
| POST | `/api/backup/run` | Trigger manual backup |
| GET | `/api/backup/snapshots` | List snapshots (`?stack={name}` for filtering) |
| POST | `/api/stacks/{name}/cross-backup` | Save cross-drive config |
| POST | `/api/stacks/{name}/cross-backup/run` | Trigger cross-drive backup |
| GET | `/api/stacks/{name}/cross-backup/status` | Cross-drive status |
| POST | `/api/backup/cross-drive/run-all` | Run all scheduled cross-drive backups |

### Storage

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/storage/scan` | Scan available disks |
| POST | `/api/storage/init` | Format and mount a disk |
| GET | `/api/storage/init/status` | Format progress |
| POST | `/api/storage/migrate` | Start app data migration |
| GET | `/api/storage/migrate/status` | Migration progress |

### Metrics

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/metrics/system` | System metrics time-series (`?range=` one of `1h`, `6h`, `24h`, `7d`, `30d`) |
| GET | `/api/metrics/containers/summary` | Current container stats |
| GET | `/api/metrics/containers/{name}` | Per-container time-series |
| GET | `/api/metrics/sysinfo` | Static system info |

Response format: `{"ok": true/false, "data": ..., "error": "...", "message": "..."}`

---

## Build & Deploy

### Build

```bash
# On build server (192.168.0.180)
cd ~/build/felhom-controller
git -C ~/git/deploy-felhom-compose pull
./build.sh v0.12.2 --push
```

### Deploy on customer node

```bash
# On customer node (e.g., 192.168.0.162)
cd /opt/docker/felhom-controller
sudo docker pull gitea.dooplex.hu/admin/felhom-controller:v0.12.2
sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:v0.12.2|' docker-compose.yml
sudo docker compose up -d
```

**Important:** Always use `docker compose up -d`, NOT `docker compose restart`. A restart doesn't pick up new images.
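
After `docker compose up -d`, it can be useful to confirm that the new container actually answers on `/api/health`, the one endpoint listed as needing no auth. The Go program below is a minimal sketch of such a check and is not part of the controller. It assumes that `/api/health` returns the same response envelope documented for the REST API, which this README implies but does not state. The `Envelope` type, the `decodeEnvelope` and `checkHealth` helpers, and any base URL passed on the command line are hypothetical names introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"time"
)

// Envelope mirrors the documented response format:
// {"ok": true/false, "data": ..., "error": "...", "message": "..."}.
// The Go type is hypothetical; only the field names come from the README.
type Envelope struct {
	OK      bool            `json:"ok"`
	Data    json.RawMessage `json:"data"`
	Error   string          `json:"error"`
	Message string          `json:"message"`
}

// decodeEnvelope parses a raw response body into an Envelope.
func decodeEnvelope(body []byte) (Envelope, error) {
	var env Envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return Envelope{}, fmt.Errorf("decode envelope: %w", err)
	}
	return env, nil
}

// checkHealth calls GET {baseURL}/api/health and returns an error when the
// request fails, the status is not 200, or the envelope carries ok=false.
func checkHealth(client *http.Client, baseURL string) error {
	resp, err := client.Get(baseURL + "/api/health")
	if err != nil {
		return fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()

	// Cap the body at 1 MiB; a health response should be tiny.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return fmt.Errorf("read body: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	env, err := decodeEnvelope(body)
	if err != nil {
		return err
	}
	if !env.OK {
		return fmt.Errorf("controller reported failure: %s", env.Error)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}

	// With an argument, check a real controller at that base URL.
	if len(os.Args) > 1 {
		if err := checkHealth(client, os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, "health check failed:", err)
			os.Exit(1)
		}
		fmt.Println("controller is healthy")
		return
	}

	// Without an argument, run against an in-process stub so the sketch
	// can be tried without a controller. The stub payload is made up.
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"ok": true, "message": "stub"}`)
	}))
	defer stub.Close()
	fmt.Println("stub check error:", checkHealth(client, stub.URL))
}
```

Saved under a name of your choice, the program takes the controller's base URL as its only argument and exits non-zero on failure, so it can gate a deploy script. The address and port depend on how the container is published on the node and are not specified here.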

### Docker Requirements

The controller container needs:

- `privileged: true` (disk operations)
- Docker socket mount (`/var/run/docker.sock`)
- `/mnt` mount with `propagation: rshared` (container mounts visible to host)
- `/dev` mounted as `/host-dev` (block device access)
- `/etc/fstab` mounted as `/host-fstab` (persistent mount config)

See `docker-compose.yml` for the full volume configuration.

---

## Roadmap

### Completed

- [x] Stack management with deploy flow and memory validation
- [x] Git-based app catalog sync
- [x] Central job scheduler
- [x] System monitoring with SQLite metrics and Chart.js charts
- [x] Healthchecks.io integration (5 ping types)
- [x] 3-layer backup system (DB dumps + restic + cross-drive)
- [x] Per-app backup restore with auto stop/restart
- [x] Storage management (scan, format, mount, registry)
- [x] App data migration between storage paths
- [x] Central hub reporting
- [x] Email notifications via hub relay
- [x] Settings persistence and password management
- [x] Dashboard alert system

### In Progress / Planned

- [ ] Update classification and auto-apply (optional/required/security markers)
- [ ] Self-update mechanism with health-based rollback
- [ ] Docker volume backup (`/var/lib/docker/volumes:ro`)
- [ ] Raspberry Pi testing (pi-customer-1)
- [ ] Cross-drive restic pruning (unbounded snapshot growth)
- [ ] CSRF protection on POST endpoints
- [ ] Login rate limiting

---

## Test Environments

| Node | Hardware | Domain | Status |
|------|----------|--------|--------|
| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | Controller v0.12.2 running |
| pi-customer-1 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | Not yet tested |

## Related Repositories

| Repository | Purpose |
|------------|---------|
| [deploy-felhom-compose](https://gitea.dooplex.hu/admin/deploy-felhom-compose) | This repo: controller + deploy scripts |
| [app-catalog-felhom.eu](https://gitea.dooplex.hu/admin/app-catalog-felhom.eu) | Docker Compose templates + .felhom.yml metadata |
| [felhom.eu](https://gitea.dooplex.hu/admin/felhom.eu) | Website + app assets + felhom-hub service |