felhom-controller
Central management container for Felhom home servers.
A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.
Current version: v0.12.7
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Customer Hardware (N100 mini PC / Raspberry Pi) │
│ │
│ ┌──────────┐ ┌────────────────────────────────────────────┐ │
│ │ Traefik │ │ felhom-controller (privileged container) │ │
│ │ (reverse │──▶│ │ │
│ │ proxy) │ │ ┌──────────┐ ┌─────────────────────────┐│ │
│ └──────────┘ │ │ Web UI │ │ Stack Manager ││ │
│ │ │ (HU dash │ │ (compose ops, git sync, ││ │
│ ┌──────────┐ │ │ board) │ │ deploy, delete, update) ││ │
│ │cloudflared│ │ └──────────┘ └─────────────────────────┘│ │
│ │ (tunnel) │ │ ┌──────────┐ ┌─────────────────────────┐│ │
│ └──────────┘ │ │ Backup │ │ Storage Manager ││ │
│ │ │ (3-layer │ │ (disk scan, format, ││ │
│ ┌──────────┐ │ │ restic) │ │ mount, migrate) ││ │
│ │ App │ │ └──────────┘ └─────────────────────────┘│ │
│ │ stacks │ │ ┌──────────┐ ┌─────────────────────────┐│ │
│ │ (docker │ │ │Scheduler │ │ Monitor & Metrics ││ │
│ │ compose) │ │ │(cron-like│ │ (health, pings, SQLite ││ │
│ └──────────┘ │ │ jobs) │ │ time-series, Chart.js) ││ │
│ │ └──────────┘ └─────────────────────────┘│ │
│ │ ┌──────────┐ ┌─────────────────────────┐│ │
│ │ │ Notify │ │ REST API + Hub Reporter ││ │
│ │ │ (email) │ │ (JSON push to hub) ││ │
│ │ └──────────┘ └─────────────────────────┘│ │
│ └────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│ pings │ JSON push │ git pull
▼ ▼ ▼
status.felhom.eu hub.felhom.eu gitea.dooplex.hu
(Healthchecks) (central dashboard) (stack definitions)
Key Architecture Decisions
- Pure Go, no frameworks — stdlib net/http + html/template. Only external deps: bcrypt, yaml.v3, modernc.org/sqlite (pure Go, no CGO).
- Privileged container — Required for disk operations (format, mount, fstab), /dev access, and Docker socket control.
- /host-dev indirection — Docker overrides /dev with a tmpfs. The host's /dev is mounted at /host-dev to access block devices.
- StackDataProvider interface — Breaks circular import between backup and stacks packages. Implemented by stackAdapter in main.go.
- Atomic file writes — All persistent state (settings.json, app.yaml) written to .tmp then os.Rename for crash safety.
- go:embed templates — All HTML/CSS/JS compiled into the binary. No runtime file dependencies.
- Europe/Budapest timezone — All scheduled jobs, timestamps, and UI labels use Hungarian timezone.
Module Map
| Module | Path | Responsibility |
|---|---|---|
| Config | internal/config/ | YAML loader, validation, FELHOM_* env overrides |
| Settings | internal/settings/ | Runtime-mutable settings.json (passwords, backup prefs, storage paths, notifications) |
| Stacks | internal/stacks/ | Compose operations, scanning, .felhom.yml metadata, deploy/delete flow |
| Sync | internal/sync/ | Git-based app catalog sync (clone/pull, content-hash copy) |
| Backup | internal/backup/ | 3-layer backup: DB dumps → restic snapshots → cross-drive copies, restore |
| Storage | internal/storage/ | Disk scanning (lsblk), partitioning (sfdisk), formatting (mkfs.ext4), mounting, data migration (rsync) |
| System | internal/system/ | System info (/proc), CPU collector, mount points, disk usage, FS info |
| Monitor | internal/monitor/ | Healthchecks.io pinger, system health checks |
| Metrics | internal/metrics/ | SQLite time-series store, system + container metric collection |
| Scheduler | internal/scheduler/ | Central job scheduler (periodic + daily, skip-if-running, panic recovery) |
| Notify | internal/notify/ | Email notifications via hub relay, preference sync, per-event cooldowns |
| Report | internal/report/ | Hub report builder + HTTP pusher (system, stacks, backup, health) |
| API | internal/api/ | REST JSON endpoints |
| Web | internal/web/ | Hungarian dashboard, auth, page handlers, template functions, alerts |
Features
1. App Management
The controller manages Docker Compose stacks through a complete lifecycle: catalog sync, first-time deployment, runtime operations, and deletion.
Git Sync (internal/sync/)
The app catalog lives in a separate Git repository. The controller:
- Shallow-clones the catalog on startup
- Periodically fetches updates (configurable, default 15 min)
- Copies only docker-compose.yml and .felhom.yml to the stacks directory
- Never overwrites app.yaml or .env (user secrets are safe)
- Uses SHA-256 content hashing — only writes files that actually changed
- Triggers stack rescan after sync so the dashboard updates immediately
- Manual sync via "Sablonok frissitese" button or POST /api/sync
First-Time Deploy Flow
- Customer sees app card with "Telepites" button
- Deploy page shows auto-filled fields (domain), auto-generated secrets (DB passwords, hex keys), and user-configurable inputs (admin password, language, storage path)
- checkBeforeDeploy() JS guard fetches live state first (prevents double-deploy from another tab)
- Memory validation checks mem_request against available RAM:
  - usable_memory = total_ram - reserved_memory_mb (default 384MB reserved)
  - Hard block if requests exceed usable memory
  - Soft warning if limits exceed total RAM (overcommit OK)
- Controller generates secrets, saves app.yaml, sets in-memory Deployed flag before docker compose up -d (avoids stale UI during slow image pulls), reverts on failure
- 3-step progress panel polls GET /api/stacks/{name} every 3s: config saved → containers starting → health check passed
- Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only
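The memory validation rule in the deploy flow can be illustrated with a small sketch. The type and function names are hypothetical, and the assumption that requests and limits are summed over all deployed apps plus the new one is not confirmed by the text above.

```go
package main

import "fmt"

// memCheck is the outcome of the pre-deploy memory validation.
type memCheck struct {
	Blocked bool // requests exceed usable memory: deploy is refused
	Warning bool // limits exceed total RAM: deploy is allowed with a warning
}

// checkMemory (hypothetical helper) applies the rule
// usable_memory = total_ram - reserved_memory_mb. All values are in MB.
// requestsMB and limitsMB are assumed to be sums over all apps.
func checkMemory(totalMB, reservedMB, requestsMB, limitsMB int) memCheck {
	usable := totalMB - reservedMB
	return memCheck{
		Blocked: requestsMB > usable,
		Warning: limitsMB > totalMB,
	}
}

func main() {
	// A 1 GB node with the default 384 MB reserve leaves 640 MB usable.
	fmt.Printf("%+v\n", checkMemory(1024, 384, 700, 900))
	fmt.Printf("%+v\n", checkMemory(1024, 384, 512, 1536))
}
```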
App Info Pages
Each app can define rich metadata in .felhom.yml:
- app_info: tagline, use_cases, first_steps, prerequisites, default_creds, docs_url
- optional_config: groups of post-deploy configurable env vars (e.g., API keys for metadata providers)
- resources: mem_request, mem_limit, pi_compatible, needs_hdd
The /apps/{slug} page renders hero section, screenshots, setup guide, and optional config form.
Stack Operations
| Operation | What it does |
|---|---|
| Start | docker compose up -d |
| Stop | docker compose stop (blocked for protected stacks) |
| Restart | docker compose restart |
| Update | docker compose pull + docker compose up -d |
| Delete | docker compose down --rmi local --volumes + optional HDD data cleanup |
Protected stacks (traefik, cloudflared, felhom-controller) cannot be stopped or deleted from the UI. Restart is allowed.
Orphan detection: Deployed stacks with no matching catalog template are marked as orphaned with an "Elavult" badge and can be safely deleted.
Container State Display
| State | Color | Label | Meaning |
|---|---|---|---|
| Running + healthy | Green | "Fut" | All containers running and healthy |
| Running + starting | Orange | "Indulas..." | Healthcheck not yet passed |
| Running + unhealthy | Yellow | "Nem egeszseges" | Healthcheck failing |
| Stopped/exited | Red | "Leallitva" | All containers stopped |
| Restarting | Yellow | "Ujrainditas..." | Restart loop |
| Not deployed | Gray | "Nincs telepitve" | Compose file exists, not deployed |
2. Backup System
The backup system implements a 3-2-1 backup architecture. Each tier is a complete, self-sufficient backup — any single tier can fully restore an app.
| Tier | Contents | Location | Can fully restore? |
|---|---|---|---|
| 1. Nightly restic | DB + Config + User data | Same drive as app | Yes (not against drive failure) |
| 2. Cross-drive | DB + Config + User data | Different physical device | Yes |
| 3. Remote | Everything | Cloud / remote server | Future |
Key principles:
- User data backup is mandatory — every app with HDD bind mounts is included automatically. There is no per-app toggle.
- Each tier includes everything needed to restore: DB dumps, config, and user data. No tier depends on another tier's data.
- The AppBackupPrefs.Enabled field in settings.json is legacy and not read by any code.
Tier 1: Nightly Backup (mandatory, same drive)
The nightly backup has two phases that run sequentially:
Phase 1 — Database Dumps (internal/backup/dbdump.go, scheduled 02:30)
- Auto-discovery of PostgreSQL and MariaDB containers via docker ps + docker inspect
- Dumps via docker exec pg_dump / docker exec mariadb-dump with 5-minute timeout
- Atomic writes (.tmp → .sql) to prevent corruption
- Validation after each dump: checks file size, header presence, counts CREATE TABLE
- Results cached in settings.json surviving container restarts
Phase 2 — Restic Snapshot (internal/backup/restic.go, scheduled 03:00)
- Auto-generated repository password (32 random bytes, base64url), synced to hub
- Paths included in every snapshot:
  - Stacks dir (all compose.yml + app.yaml + .felhom.yml)
  - DB dump dir (all .sql dump files from Phase 1)
  - controller.yaml (controller config)
  - ALL deployed apps' HDD mount paths — discovered via resolveAppBackupPaths() which iterates ListDeployedStacks(), no Enabled flag
- Auto-detects and unlocks stale locks (restic repo lock)
- Weekly prune on Sundays with configurable retention (keep-daily, keep-weekly, keep-monthly)
- Weekly integrity check (restic check) on Sunday 04:00
Protects against: accidental deletion, data corruption, point-in-time rollback. Does NOT protect against drive failure (backup is on the same physical drive).
Tier 2: Cross-Drive Backup (opt-in, different device) (internal/backup/crossdrive.go)
Complete backup to a different physical drive — DB dumps + config + user data.
- Two methods:
  - rsync — Simple mirror with --delete (fast, no versioning, browsable on disk)
  - restic — Versioned, deduplicated, encrypted (shared repo across apps, not browsable)
- Per-app configuration in settings.json: destination path, method, schedule (daily/weekly/manual)
- Pre-backup DB dump: DumpStackDB() runs fresh pg_dump/mariadb-dump before each cross-drive backup; non-fatal on failure (wired via DBDumper interface to avoid circular imports)
- Drive-type-aware validation (ValidateDestination):

  | Destination type | Space checks |
  |---|---|
  | External mount (different device than /) | Block if <100 MB free |
  | System drive (same device as /) | Require ≥10 GB free AND <90% used; logged warning |

- Rsync destination layout (complete — can restore app independently):

  backups/rsync/<app>/
    _db/         ← DB dump files (stackName_postgres.sql, etc.)
    _config/     ← compose.yml, app.yaml, .felhom.yml
    <user data>  ← HDD mount contents (single mount: flat; multi-mount: leaf subfolders)

  - DB dump files excluded from user data rsync (--exclude backups/*.sql.gz/sql/dump) to avoid duplicating app-internal dumps
  - _ prefix directories prevent collision with user data
- Restic backup paths: includes HDD mounts + config dir + DB dump dir (deduplication handles overlap)
- Safety guards: destination ≠ source, path-overlap check, writable check
- Chained execution: runs immediately after nightly restic — daily apps every night, weekly apps on Sundays
- Per-app concurrency lock prevents overlapping runs
- Status (last_run, duration, size, error) persisted to settings.json
Protects against: primary drive failure, drive theft/damage.
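The drive-type-aware space checks of ValidateDestination can be sketched as a pure function. The types below are hypothetical, and treating "MB" and "GB" as binary units (MiB, GiB) is an assumption.

```go
package main

import "fmt"

const (
	mib = 1 << 20
	gib = 1 << 30
)

// destination describes the filesystem behind a backup target (hypothetical type).
type destination struct {
	SameDeviceAsRoot bool
	FreeBytes        uint64
	TotalBytes       uint64
}

// verdict is the result of the space check.
type verdict struct {
	Allowed bool
	Warning bool // set for system-drive destinations
	Reason  string
}

// validateDestination sketches the two rules: external mounts need at least
// 100 MB free; system-drive targets need >= 10 GB free and < 90% usage.
func validateDestination(d destination) verdict {
	if !d.SameDeviceAsRoot {
		if d.FreeBytes < 100*mib {
			return verdict{Reason: "less than 100 MB free"}
		}
		return verdict{Allowed: true}
	}
	v := verdict{Warning: true}
	if d.TotalBytes == 0 || d.FreeBytes > d.TotalBytes {
		v.Reason = "disk size unknown"
		return v
	}
	usedPct := float64(d.TotalBytes-d.FreeBytes) / float64(d.TotalBytes) * 100
	switch {
	case d.FreeBytes < 10*gib:
		v.Reason = "less than 10 GB free on system drive"
	case usedPct >= 90:
		v.Reason = "system drive is 90% full or more"
	default:
		v.Allowed = true
	}
	return v
}

func main() {
	fmt.Printf("%+v\n", validateDestination(destination{FreeBytes: 50 * mib, TotalBytes: 1000 * gib}))
	fmt.Printf("%+v\n", validateDestination(destination{SameDeviceAsRoot: true, FreeBytes: 20 * gib, TotalBytes: 100 * gib}))
}
```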
Tier 3: Remote Backup (future)
Complete offsite backup for disaster recovery. Not yet implemented.
Restore (internal/backup/restore.go)
All deployed apps appear in the restore dropdown — every app has restic snapshot data (stacks dir + DB dumps are always backed up).
| App type | Config restored | DB restored | User data restored |
|---|---|---|---|
| Has HDD data | ✓ | ✓ | ✓ (always — backup is mandatory) |
| DB only, no HDD | ✓ | ✓ | n/a |
| No DB, no HDD | ✓ | — | n/a |
- Snapshot API returns ALL snapshots unfiltered — older snapshots still allow config+DB restore; RestoreApp extracts whatever paths are available
- Restore type info shown per-app when selected in dropdown (Hungarian banners):
  - Has HDD: "Teljes visszaállítás: adatbázis + konfiguráció + felhasználói adatok"
  - Has DB, no HDD: "Adatbázis és konfiguráció visszaállítása"
  - No DB, no HDD: "Csak konfiguráció visszaállítása"
- Execution flow: stop app → restic restore <id> --target / --include <path>... → restart app
- Running flag prevents concurrent backup/restore operations
- Snapshot ID validated (8–64 lowercase hex)
Note: Restore currently uses Tier 1 (primary restic repo) only. Restoring from Tier 2 (cross-drive) is a future enhancement.
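The snapshot ID check and the restic argument list from the execution flow can be sketched together. The function name restoreArgs is hypothetical; the pattern encodes the "8–64 lowercase hex" rule, and the arguments follow the restic restore invocation shown above.

```go
package main

import (
	"fmt"
	"regexp"
)

// snapshotIDPattern accepts 8 to 64 lowercase hex characters and nothing else.
var snapshotIDPattern = regexp.MustCompile(`^[0-9a-f]{8,64}$`)

// restoreArgs (hypothetical helper) validates the snapshot ID and builds the
// arguments for: restic restore <id> --target / --include <path>...
func restoreArgs(snapshotID string, paths []string) ([]string, error) {
	if !snapshotIDPattern.MatchString(snapshotID) {
		return nil, fmt.Errorf("invalid snapshot ID %q", snapshotID)
	}
	args := []string{"restore", snapshotID, "--target", "/"}
	for _, p := range paths {
		args = append(args, "--include", p)
	}
	return args, nil
}

func main() {
	args, err := restoreArgs("a1b2c3d4", []string{"/opt/docker/stacks/demo"})
	fmt.Println(args, err)

	_, err = restoreArgs("a1b2c3d4 --target /tmp", nil)
	fmt.Println(err)
}
```

Validating the ID before it reaches the command line keeps user-supplied input from being interpreted as extra restic flags.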
Backup Page UI (internal/web/templates/backups.html)
Unified per-app status table with expandable rows showing per-tier backup status:
Status dot per app:
| Dot color | Meaning |
|---|---|
| Green | Fully covered — cross-drive configured and last run OK |
| Yellow | Warning — no second copy, or last backup failed, or disk space issue |
| Red | Cross-drive destination blocked or inaccessible |
| Gray (auto) | No user data — only config/DB backup (automatic) |
Per-app backup tiers:
- 1. mentés (Tier 1, always present) — Auto badge + "helyi" + last run + contents (e.g., "DB + Konfig + Adatok")
- 2. mentés (Tier 2, only for apps with HDD data) — one of:
- Configured: method (rsync/restic) + destination + schedule + last run + status + contents + browsable indicator (📁 for rsync) + action buttons
- Not configured: "✓ 1. mentés auto" + "⚠ Nincs 2. másolat" + settings link
Backup contents per app (shown per tier):
- Apps with DB + HDD: "DB + Konfig + Adatok"
- Apps with DB only: "DB + Konfig"
- Apps with HDD, no DB: "Konfig + Adatok"
- Apps with neither: "Konfig"
Other sections:
- Schedule overview with next run times for DB dump, restic, prune
- Snapshot history table (last 20 snapshots with ID, time, files new/changed, data added)
- Repository info card (path, size, snapshot count, encryption key with show/copy)
- Restore section: app dropdown → snapshot dropdown → restore type info → confirmation checkbox → execute
3. Storage Management
The storage subsystem handles the full lifecycle of external storage: detection, initialization, path registration, and data migration.
Disk Scanning (internal/storage/scan.go)
- ScanDisks() uses lsblk -J -b for block device enumeration
- System disk detection via host fstab parsing (/host-fstab) + UUID resolution via blkid
- Partitions enriched with filesystem type, UUID, and label from direct blkid probing (Docker containers have incomplete udev cache)
- Returns AvailableDisks (non-system, non-loop, non-CDROM) and SystemDisks separately
- Handles NVMe (nvme0n1p1), SCSI (sdb1), and eMMC (mmcblk0p1) naming
Disk Initialization Wizard (internal/storage/format.go)
A step-by-step UI at /settings/storage/init:
- Scan — Lists available disks with model, size, partition info
- Select — User picks a disk and enters a mount name (e.g., hdd_1)
- Confirm — User types "FORMAZAS" to confirm destructive operation
- Format pipeline: wipefs → sfdisk (GPT) → mkfs.ext4 → blkid UUID → backup fstab → append UUID-based fstab entry → mount → findmnt verification → chown 1000:1000 → create storage/ and Dokumentumok/ subdirectories
- Auto-registers new storage path in settings.json
- Smart partition detection: skips repartitioning for existing empty partitions
Safety guards: system disk detection, mount path conflict check, confirmation required, progress channel for real-time UI feedback.
Storage Path Registry (internal/settings/settings.go)
Multiple external storage paths supported with:
- Label: Human-readable name (editable inline)
- Default flag: New deploys use this path by default
- Schedulable flag: Path appears in deploy dropdown
- Auto-discovery: On startup, scans deployed apps' HDD_PATH values and registers unknown paths
- Thread-safe CRUD: Add, Remove, SetDefault, SetSchedulable, SetLabel
Data Migration (internal/storage/migrate.go)
Move app data between storage paths (e.g., SSD → HDD, HDD → new HDD):
- Validate: stack exists, deployed, has HDD data, target differs from source
- Estimate total size, check free space on target
- Stop the application
- rsync -a --info=progress2 per mount path with real-time progress parsing
- Update app.yaml HDD_PATH to new location
- Start the application
- Rollback on failure: reverts config, restarts on old storage
Progress UI at /stacks/{name}/migrate with byte counter and percentage.
Stale Data Cleanup
After migration, the deploy page detects leftover data on previous storage paths:
- Shows path, size, and a delete button
- Two-step confirmation required
- Protected paths (storage root, media, Dokumentumok, appdata) cannot be deleted
FileBrowser Mount Sync
When storage paths are added or removed, syncFileBrowserMounts() auto-regenerates FileBrowser's docker-compose.yml with volume mounts for all registered paths, then recreates the container.
4. Monitoring & Health
System Health Checks (internal/monitor/healthcheck.go)
RunHealthCheck() evaluates multiple subsystems and returns a HealthReport with status (ok/warn/fail):
| Check | Warning | Critical |
|---|---|---|
| Disk usage (SSD/HDD) | >= 90% | >= 95% |
| Memory | available < 512MB | available < 256MB |
| CPU temperature | >= 75C | >= 85C |
| Docker daemon | — | unreachable |
| Protected containers | — | not running |
| Storage paths | not a mount point (data on SSD) | path inaccessible, disk >= 95% |
Backup destination validation (CheckBackupDestination) has tiered checks:
- Path doesn't exist → critical/blocked
- Not writable → critical/blocked
- Same block device as root → warning (data on system drive)
- Disk >95% full → critical/blocked
- Disk >90% full → warning
Healthchecks.io Integration (internal/monitor/pinger.go)
Five ping UUIDs for external monitoring:
- Heartbeat: every 5 min (simple "I'm alive")
- System Health: periodic health check results
- DB Dump: after nightly database dumps
- Backup: after nightly restic backup
- Backup Integrity: weekly restic check result
3-attempt retry with 2-second backoff. Pinger never fails the caller.
Metrics Store (internal/metrics/)
- SQLite with WAL mode for concurrent reads during collection
- System metrics: CPU%, memory (total/used/available), temperature, load average — collected every 60 seconds
- Container metrics: CPU%, memory, network I/O, block I/O per container
- Downsampled queries for chart time ranges (1h, 6h, 24h, 7d, 30d)
- 30-day auto-prune via daily scheduler job
Monitoring Page
Full-page system monitor at /monitoring:
- System Overview: hostname, OS, kernel, CPU model/cores, uptime
- System Metrics Charts: 4 line charts (CPU, Memory, Temperature, Load) in 2x2 grid
- Container Resources: horizontal bar charts (CPU% and Memory per container)
- Per-container Detail: click-to-expand historical charts
- Remote Monitoring Status: shows Healthchecks ping UUID configuration
Chart.js 4.4.7 embedded locally (works in offline environments), dark theme matching site design.
Alert System (internal/web/alerts.go)
State-based alerts displayed on all pages:
- Sources: health issues, missing ping UUIDs, backup disabled
- Sorted by severity (error > warning > info), capped at 5 visible
- Refreshed every 5 min + on startup
- Monitoring page suppresses ping-related alerts (shown in dedicated table instead)
5. Notifications
Email Delivery
The controller relays notifications through the central hub, which sends emails via the Resend API:
- Controller detects event (health degradation, backup failure, etc.)
- Non-blocking POST to hub's /api/v1/notify with event details
- Hub checks customer notification preferences
- Hub sends Hungarian-language email via Resend
Event Types
| Event | Trigger |
|---|---|
| disk_warning | Disk usage crosses warning/critical threshold |
| backup_failed | Nightly backup or DB dump fails |
| update_available | New app version detected in catalog |
| security_update | Critical security update available |
Cooldown System
Per-event-type cooldown (default 6 hours, configurable) prevents notification spam. Only notifies on status degradation (ok→warn, ok→fail, warn→fail), not on repeated same-status checks.
Preference Sync
Notification preferences (email, enabled events, cooldown) are:
- Stored locally in settings.json
- Synced to hub on save and on controller startup
- Hub sync failure doesn't block local save
6. Update Management
App Catalog Sync
- Periodic git fetch + git reset --hard of the app catalog repo
- Content-hash comparison prevents unnecessary file writes
- Post-sync stack rescan detects new/changed apps immediately
Planned Update Classifications
| Marker | Behavior |
|---|---|
| No marker | Optional — shown on dashboard, customer clicks "Update" |
| UPDATE_REQUIRED=true | Mandatory — auto-applied during next update window |
| UPDATE_SECURITY=true | Critical — applied immediately |
7. Authentication & Settings
Session Auth (internal/web/auth.go)
- bcrypt password verification with configurable source priority: settings.json → controller.yaml → no auth (open access)
- 7-day session duration with random 32-byte hex tokens
- ?next= redirect after login preserves the page the user was visiting
- Session cleanup every 15 minutes
- All sessions invalidated on password change
- Conditional logout link (hidden when auth is disabled)
Settings Persistence (internal/settings/settings.go)
Runtime-mutable settings in settings.json (separate from infrastructure config):
| Section | Contents |
|---|---|
| password_hash | bcrypt hash override |
| notifications | email, enabled events, cooldown hours |
| db_validations | per-DB dump validation results (survives restarts) |
| app_backup | per-app map: enabled flag, cross-drive config (method, dest, schedule, runtime status) |
| storage_paths | registered paths with label, default flag, schedulable flag |
| cross_drive_restic_password | auto-generated restic password for cross-drive repos |
All public methods use sync.RWMutex. File writes are atomic (.tmp + rename).
Settings Page (/settings)
Four sections:
- System config — read-only display of controller.yaml values
- Password change — current + new + confirm, min 8 chars
- Storage paths — add/remove, edit labels, set default, toggle schedulable, per-path app list with sizes
- Notifications — email, event checkboxes, cooldown hours, test email button
8. Central Hub Reporting
Report Push (internal/report/)
Periodic JSON push (default every 15 min) to the central felhom-hub service:
- System: hostname, OS, CPU, memory, disk usage, uptime
- Containers: running/stopped counts, per-container CPU/memory
- Backup: last run, success, repo stats, snapshot count, restic password (for disaster recovery)
- Health: current status, issues, warnings
- Stacks: deployed apps with versions and states
Bearer token authentication, 3-attempt retry with 5-second backoff.
Hub Dashboard
The hub service (separate Go app in the felhom.eu repo) provides:
- Multi-customer overview table with status indicators
- Customer detail page with system/storage/containers/backup/health sections
- Color coding: green (<30min), yellow (30-60min), red (>60min since last report)
- 90-day report retention with daily prune
Repository Layout
controller/
├── cmd/controller/main.go # Entry point, wires all 14 modules
├── internal/
│ ├── config/config.go # YAML loader, validation, env overrides
│ ├── settings/settings.go # Runtime settings (JSON, atomic writes, RWMutex)
│ ├── stacks/
│ │ ├── manager.go # Stack scanning, compose ops, container status
│ │ ├── metadata.go # Parse .felhom.yml app metadata
│ │ ├── deploy.go # First-deploy: secret gen, app.yaml, compose up
│ │ └── delete.go # Stack deletion + HDD data cleanup
│ ├── sync/sync.go # Git sync: clone/pull app catalog, content-hash copy
│ ├── storage/
│ │ ├── scan.go, scan_linux.go # Disk detection via lsblk + blkid
│ │ ├── format.go, format_linux.go # Partition, format, mount pipeline
│ │ ├── safety.go, safety_linux.go # System disk detection, mount guards, fstab ops
│ │ ├── migrate.go # App data migration (rsync with progress)
│ │ └── *_other.go # Non-Linux stubs for cross-compilation
│ ├── backup/
│ │ ├── backup.go # Orchestrator (dumps + restic + cross-drive chain)
│ │ ├── dbdump.go # DB auto-discovery + dump (pg_dump, mariadb-dump)
│ │ ├── restic.go # Restic operations (init, snapshot, prune, check)
│ │ ├── appdata.go # StackDataProvider interface, app data discovery
│ │ ├── crossdrive.go # Per-app backup to secondary storage (rsync/restic)
│ │ └── restore.go # Per-app restore with auto stop/restart
│ ├── api/router.go # REST API endpoints (~30 routes)
│ ├── scheduler/scheduler.go # Central job scheduler (Every, Daily)
│ ├── system/
│ │ ├── info.go, info_linux.go # RAM, disk, CPU, temperature, load average
│ │ ├── cpu_linux.go # Background /proc/stat sampling
│ │ └── mounts_linux.go # Mount points, disk usage, FS info, backup dest checks
│ ├── monitor/
│ │ ├── pinger.go # Healthchecks.io HTTP ping client
│ │ └── healthcheck.go # System health checks (disk, mem, CPU, temp, Docker)
│ ├── metrics/
│ │ ├── store.go # SQLite time-series (WAL mode, downsampled queries)
│ │ ├── collector.go # Background collector (60s, system + docker stats)
│ │ └── sysinfo.go # Static system info (/proc, /etc)
│ ├── notify/notifier.go # Email relay to hub, preference sync, cooldowns
│ ├── report/
│ │ ├── builder.go # Hub report builder (all subsystems → JSON)
│ │ └── pusher.go # HTTP POST to hub (retry, Bearer auth)
│ └── web/
│ ├── server.go # HTTP server, routing, static files
│ ├── auth.go # Session auth, login/logout, session cleanup
│ ├── handlers.go # Page handlers (dashboard, stacks, deploy, backups, etc.)
│ ├── storage_handlers.go # Storage API handlers (scan, format, migrate, cleanup)
│ ├── alerts.go # State-based alert generation
│ ├── funcmap.go # Template functions (state colors, Hungarian formatting)
│ ├── embed.go # go:embed for templates + Chart.js
│ └── templates/ # 12 HTML files + style.css (Hungarian UI)
├── configs/
│ ├── controller.yaml.example # Full config reference
│ └── example-felhom-metadata.yml # .felhom.yml format reference
├── Dockerfile # Multi-stage: Go 1.24 builder + debian-slim runtime
├── docker-compose.yml # Controller's own compose (privileged, /mnt rshared)
└── go.mod # Go 1.24, deps: bcrypt, yaml.v3, modernc.org/sqlite
Configuration
Controller config (controller.yaml)
Single YAML file per customer, infrastructure-only. Does not contain app-specific config.
Key sections:
customer:
  name: "Demo Felhom"
  node_id: "demo-felhom"
paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  db_dump_dir: "/srv/backups/db-dumps"
  restic_repo: "/srv/backups/restic-repo"
git:
  repo_url: "https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git"
  sync_interval: "15m"
backup:
  enabled: true
  db_dump_time: "02:30"
  restic_time: "03:00"
  retention: { keep_daily: 7, keep_weekly: 4, keep_monthly: 6 }
monitoring:
  health_interval: "5m"
  ping_uuids:
    heartbeat: "uuid-here"
    system_health: "uuid-here"
    db_dump: "uuid-here"
    backup: "uuid-here"
    backup_integrity: "uuid-here"
hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "bearer-token-here"
system:
  reserved_memory_mb: 384   # RAM reserved for OS + controller
Environment variable overrides: FELHOM_LOGGING_LEVEL=debug, FELHOM_HUB_ENABLED=false, etc.
Runtime settings (settings.json)
Auto-managed by the controller. Contains password hash overrides, notification preferences, per-app backup configs, storage path registry, DB validation cache. All writes are atomic.
Per-app config (app.yaml)
Auto-generated during deployment. Contains env vars, locked fields list, deploy timestamp. Secret fields are locked (read-only after first deploy).
Scheduler Jobs
| Job | Type | When | Purpose |
|---|---|---|---|
| status-refresh | periodic | 30s | Refresh container states |
| stack-scan | periodic | 2m | Rescan stacks directory |
| heartbeat | periodic | 5m | Ping Healthchecks "I'm alive" |
| system-health | periodic | configurable | Health checks + alert refresh |
| backup-cache | periodic | 5m | Refresh backup status cache |
| hub-report | periodic | 15m | Push report to central hub |
| db-dump | daily | 02:30 | Database dumps |
| backup | daily | 03:00 | Restic backup → cross-drive chain |
| backup-integrity | daily | Sun 04:00 | Restic check |
| metrics-prune | daily | 04:00 | Delete metrics older than 30 days |
All daily jobs use Europe/Budapest timezone. Skip-if-running prevents concurrent execution. Panic recovery in all jobs.
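Skip-if-running and panic recovery can be combined in one small wrapper, sketched below. The job type and its run method are hypothetical and do not reflect the scheduler's real API.

```go
package main

import (
	"fmt"
	"log"
	"sync/atomic"
)

// job wraps a function with a running flag (hypothetical type).
type job struct {
	name    string
	fn      func()
	running atomic.Bool
}

// run executes the job unless a previous run is still in progress.
// It returns false when the run was skipped. A panic inside fn is
// recovered and logged so it cannot take down the scheduler.
func (j *job) run() (ran bool) {
	if !j.running.CompareAndSwap(false, true) {
		log.Printf("job %s still running, skipping", j.name)
		return false
	}
	defer j.running.Store(false)
	defer func() {
		if r := recover(); r != nil {
			log.Printf("job %s panicked: %v", j.name, r)
		}
	}()
	ran = true
	j.fn()
	return ran
}

func main() {
	j := &job{name: "demo", fn: func() { panic("boom") }}
	// Both calls start the job: the panic is recovered and the running
	// flag is cleared again by the deferred Store.
	fmt.Println(j.run())
	fmt.Println(j.run())
}
```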
REST API
Stack Operations
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health | Health check (no auth) |
| GET | /api/stacks | List all stacks |
| GET | /api/stacks/{name} | Stack details |
| POST | /api/stacks/{name}/deploy | First-time deploy |
| POST | /api/stacks/{name}/start | Start stack |
| POST | /api/stacks/{name}/stop | Stop stack |
| POST | /api/stacks/{name}/restart | Restart stack |
| POST | /api/stacks/{name}/update | Pull + recreate |
| POST | /api/stacks/{name}/optional-config | Update optional env vars |
| GET | /api/stacks/{name}/logs | Container logs (?raw=1 for plain text) |
| GET | /api/stacks/{name}/hdd-data | HDD data paths + sizes |
| DELETE | /api/stacks/{name} | Delete stack |
| POST | /api/sync | Trigger catalog sync |
| GET | /api/system/info | System info + sync status |
Backup & Restore
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/backup/status | Full backup status |
| POST | /api/backup/run | Trigger manual backup |
| GET | /api/backup/snapshots | List snapshots (?stack={name} for filtering) |
| POST | /api/stacks/{name}/cross-backup | Save cross-drive config |
| POST | /api/stacks/{name}/cross-backup/run | Trigger cross-drive backup |
| GET | /api/stacks/{name}/cross-backup/status | Cross-drive status |
| POST | /api/backup/cross-drive/run-all | Run all scheduled cross-drive backups |
Storage
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/storage/scan | Scan available disks |
| POST | /api/storage/init | Format and mount a disk |
| GET | /api/storage/init/status | Format progress |
| POST | /api/storage/migrate | Start app data migration |
| GET | /api/storage/migrate/status | Migration progress |
Metrics
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/metrics/system | System metrics time-series (?range=1h…) |
| GET | /api/metrics/containers/summary | Current container stats |
| GET | /api/metrics/containers/{name} | Per-container time-series |
| GET | /api/metrics/sysinfo | Static system info |
Response format: {"ok": true/false, "data": ..., "error": "...", "message": "..."}
Build & Deploy
Build
# On build server (192.168.0.180)
cd ~/build/felhom-controller
git -C ~/git/deploy-felhom-compose pull
./build.sh v0.12.2 --push
Deploy on customer node
# On customer node (e.g., 192.168.0.162)
cd /opt/docker/felhom-controller
sudo docker pull gitea.dooplex.hu/admin/felhom-controller:v0.12.2
sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:v0.12.2|' docker-compose.yml
sudo docker compose up -d
Important: Always use docker compose up -d, NOT docker compose restart — restart doesn't pick up new images.
Docker Requirements
The controller container needs:
- privileged: true (disk operations)
- Docker socket mount (/var/run/docker.sock)
- /mnt mount with propagation: rshared (container mounts visible to host)
- /dev mounted as /host-dev (block device access)
- /etc/fstab mounted as /host-fstab (persistent mount config)
See docker-compose.yml for the full volume configuration.
Roadmap
Completed
- Stack management with deploy flow and memory validation
- Git-based app catalog sync
- Central job scheduler
- System monitoring with SQLite metrics and Chart.js charts
- Healthchecks.io integration (5 ping types)
- 3-layer backup system (DB dumps + restic + cross-drive)
- Per-app backup restore with auto stop/restart
- Storage management (scan, format, mount, registry)
- App data migration between storage paths
- Central hub reporting
- Email notifications via hub relay
- Settings persistence and password management
- Dashboard alert system
In Progress / Planned
- Update classification and auto-apply (optional/required/security markers)
- Self-update mechanism with health-based rollback
- Docker volume backup (/var/lib/docker/volumes:ro)
- Raspberry Pi testing (pi-customer-1)
- Cross-drive restic pruning (unbounded snapshot growth)
- CSRF protection on POST endpoints
- Login rate limiting
Test Environments
| Node | Hardware | Domain | Status |
|---|---|---|---|
| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | Controller v0.12.2 running |
| pi-customer-1 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | Not yet tested |
Related Repositories
| Repository | Purpose |
|---|---|
| deploy-felhom-compose | This repo — controller + deploy scripts |
| app-catalog-felhom.eu | Docker Compose templates + .felhom.yml metadata |
| felhom.eu | Website + app assets + felhom-hub service |