felhom-controller

Central management container for Felhom home servers.

A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.

Current version: v0.12.7


Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Customer Hardware (N100 mini PC / Raspberry Pi)                │
│                                                                 │
│  ┌──────────┐   ┌────────────────────────────────────────────┐  │
│  │ Traefik  │   │  felhom-controller (privileged container)  │  │
│  │ (reverse │──▶│                                            │  │
│  │  proxy)  │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Web UI   │  │ Stack Manager           ││  │
│                 │  │ (HU dash │  │ (compose ops, git sync,  ││  │
│  ┌──────────┐   │  │  board)  │  │  deploy, delete, update) ││  │
│  │cloudflared│   │  └──────────┘  └─────────────────────────┘│  │
│  │ (tunnel) │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Backup   │  │ Storage Manager         ││  │
│                 │  │ (3-layer │  │ (disk scan, format,     ││  │
│  ┌──────────┐   │  │  restic) │  │  mount, migrate)        ││  │
│  │ App      │   │  └──────────┘  └─────────────────────────┘│  │
│  │ stacks   │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  │ (docker  │   │  │Scheduler │  │ Monitor & Metrics       ││  │
│  │ compose) │   │  │(cron-like│  │ (health, pings, SQLite  ││  │
│  └──────────┘   │  │  jobs)   │  │  time-series, Chart.js) ││  │
│                 │  └──────────┘  └─────────────────────────┘│  │
│                 │  ┌──────────┐  ┌─────────────────────────┐│  │
│                 │  │ Notify   │  │ REST API + Hub Reporter ││  │
│                 │  │ (email)  │  │ (JSON push to hub)      ││  │
│                 │  └──────────┘  └─────────────────────────┘│  │
│                 └────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │ pings              │ JSON push          │ git pull
         ▼                    ▼                    ▼
  status.felhom.eu      hub.felhom.eu       gitea.dooplex.hu
  (Healthchecks)        (central dashboard)  (stack definitions)

Key Architecture Decisions

  • Pure Go, no frameworks — stdlib net/http + html/template. Only external deps: bcrypt, yaml.v3, modernc.org/sqlite (pure Go, no CGO).
  • Privileged container — Required for disk operations (format, mount, fstab), /dev access, and Docker socket control.
  • /host-dev indirection — Docker overrides /dev with a tmpfs. The host's /dev is mounted at /host-dev to access block devices.
  • StackDataProvider interface — Breaks circular import between backup and stacks packages. Implemented by stackAdapter in main.go.
  • Atomic file writes — All persistent state (settings.json, app.yaml) written to .tmp then os.Rename for crash safety.
  • go:embed templates — All HTML/CSS/JS compiled into the binary. No runtime file dependencies.
  • Europe/Budapest timezone — All scheduled jobs, timestamps, and UI labels use Hungarian timezone.
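
The atomic-write decision above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the helper names `writeFileAtomic` and `roundTrip` are hypothetical, and the `Sync` call before the rename is an extra safety step assumed here rather than something the list states.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to "<path>.tmp" and renames it over path.
// On POSIX filesystems the rename replaces the target in one step, so a
// crash leaves either the old file or the new one, never a partial write.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp := path + ".tmp"
	f, err := os.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, perm)
	if err != nil {
		return err
	}
	_, werr := f.Write(data)
	if werr == nil {
		werr = f.Sync() // assumption: flush to disk before the rename
	}
	if cerr := f.Close(); werr == nil {
		werr = cerr
	}
	if werr != nil {
		os.Remove(tmp) // best-effort cleanup of the temporary file
		return werr
	}
	return os.Rename(tmp, path)
}

// roundTrip writes content into a throwaway directory and reads it back.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "felhom-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "settings.json")
	if err := writeFileAtomic(target, []byte(content), 0o600); err != nil {
		return "", err
	}
	got, err := os.ReadFile(target)
	return string(got), err
}

func main() {
	got, err := roundTrip(`{"storage_paths":[]}`)
	fmt.Println(got, err)
}
```

The temporary file lives next to the target so that both are on the same filesystem, which is what makes the rename a single-step replacement.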

Module Map

Module Path Responsibility
Config internal/config/ YAML loader, validation, FELHOM_* env overrides
Settings internal/settings/ Runtime-mutable settings.json (passwords, backup prefs, storage paths, notifications)
Stacks internal/stacks/ Compose operations, scanning, .felhom.yml metadata, deploy/delete flow
Sync internal/sync/ Git-based app catalog sync (clone/pull, content-hash copy)
Backup internal/backup/ 3-layer backup: DB dumps → restic snapshots → cross-drive copies, restore
Storage internal/storage/ Disk scanning (lsblk), partitioning (sfdisk), formatting (mkfs.ext4), mounting, data migration (rsync)
System internal/system/ System info (/proc), CPU collector, mount points, disk usage, FS info
Monitor internal/monitor/ Healthchecks.io pinger, system health checks
Metrics internal/metrics/ SQLite time-series store, system + container metric collection
Scheduler internal/scheduler/ Central job scheduler (periodic + daily, skip-if-running, panic recovery)
Notify internal/notify/ Email notifications via hub relay, preference sync, per-event cooldowns
Report internal/report/ Hub report builder + HTTP pusher (system, stacks, backup, health)
API internal/api/ REST JSON endpoints
Web internal/web/ Hungarian dashboard, auth, page handlers, template functions, alerts

Features

1. App Management

The controller manages Docker Compose stacks through a complete lifecycle: catalog sync, first-time deployment, runtime operations, and deletion.

Git Sync (internal/sync/)

The app catalog lives in a separate Git repository. The controller:

  • Shallow-clones the catalog on startup
  • Periodically fetches updates (configurable, default 15 min)
  • Copies only docker-compose.yml and .felhom.yml to the stacks directory
  • Never overwrites app.yaml or .env (user secrets are safe)
  • Uses SHA-256 content hashing — only writes files that actually changed
  • Triggers stack rescan after sync so the dashboard updates immediately
  • Manual sync via "Sablonok frissitese" button or POST /api/sync
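
The copy rules above (allow-list of two files, write only on changed content) could look like this in Go. It is a sketch under assumptions: `syncedFiles` and `shouldCopy` are illustrative names, and passing `nil` for a missing destination file is a convention chosen for the example.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// syncedFiles lists the only files copied from the catalog into a stack.
var syncedFiles = map[string]bool{
	"docker-compose.yml": true,
	".felhom.yml":        true,
}

// shouldCopy reports whether a catalog file must be written to the stacks
// directory. existing is nil when the destination file does not exist yet.
func shouldCopy(name string, existing, incoming []byte) bool {
	if !syncedFiles[name] {
		return false // app.yaml, .env and anything else is never touched
	}
	if existing == nil {
		return true
	}
	// Compare SHA-256 digests; [32]byte arrays are directly comparable.
	return sha256.Sum256(existing) != sha256.Sum256(incoming)
}

func main() {
	fmt.Println(shouldCopy("docker-compose.yml", []byte("v1"), []byte("v1"))) // false
	fmt.Println(shouldCopy("docker-compose.yml", []byte("v1"), []byte("v2"))) // true
	fmt.Println(shouldCopy(".env", []byte("A=1"), []byte("A=2")))             // false
}
```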

First-Time Deploy Flow

  1. Customer sees app card with "Telepites" button
  2. Deploy page shows auto-filled fields (domain), auto-generated secrets (DB passwords, hex keys), and user-configurable inputs (admin password, language, storage path)
  3. checkBeforeDeploy() JS guard fetches live state first (prevents double-deploy from another tab)
  4. Memory validation checks mem_request against available RAM:
    • usable_memory = total_ram - reserved_memory_mb (default 384MB reserved)
    • Hard block if requests exceed usable memory
    • Soft warning if limits exceed total RAM (overcommit OK)
  5. Controller generates secrets, saves app.yaml, sets in-memory Deployed flag before docker compose up -d (avoids stale UI during slow image pulls), reverts on failure
  6. 3-step progress panel polls GET /api/stacks/{name} every 3s: config saved → containers starting → health check passed
  7. Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only
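
The memory validation in step 4 reduces to two comparisons. The sketch below assumes that "requests" and "limits" are the summed `mem_request` and `mem_limit` values of all deployed apps plus the one being deployed; the type and function names are hypothetical.

```go
package main

import "fmt"

// memoryVerdict is the outcome of the pre-deploy memory check.
type memoryVerdict struct {
	UsableMB int
	Blocked  bool // hard block: requests exceed usable memory
	Warning  bool // soft warning: limits exceed total RAM (overcommit)
}

// checkMemory applies usable_memory = total_ram - reserved_memory_mb.
// requestsMB and limitsMB are assumed to already include the new app.
func checkMemory(totalMB, reservedMB, requestsMB, limitsMB int) memoryVerdict {
	usable := totalMB - reservedMB
	return memoryVerdict{
		UsableMB: usable,
		Blocked:  requestsMB > usable,
		Warning:  limitsMB > totalMB,
	}
}

func main() {
	// A 1 GB board with the default 384 MB reserve has 640 MB usable.
	fmt.Printf("%+v\n", checkMemory(1024, 384, 700, 900))
	// A 16 GB machine: requests fit, limits overcommit total RAM.
	fmt.Printf("%+v\n", checkMemory(16384, 384, 4096, 20000))
}
```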

App Info Pages

Each app can define rich metadata in .felhom.yml:

  • app_info: tagline, use_cases, first_steps, prerequisites, default_creds, docs_url
  • optional_config: groups of post-deploy configurable env vars (e.g., API keys for metadata providers)
  • resources: mem_request, mem_limit, pi_compatible, needs_hdd

The /apps/{slug} page renders hero section, screenshots, setup guide, and optional config form.

Stack Operations

Operation What it does
Start docker compose up -d
Stop docker compose stop (blocked for protected stacks)
Restart docker compose restart
Update docker compose pull + docker compose up -d
Delete docker compose down --rmi local --volumes + optional HDD data cleanup

Protected stacks (traefik, cloudflared, felhom-controller) cannot be stopped or deleted from the UI. Restart is allowed.
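
A guard for protected stacks can be as small as the sketch below. The operation names mirror the table above; `operationAllowed` and the `nextcloud` stack name are illustrative only.

```go
package main

import "fmt"

// protectedStacks are the stacks the UI must not stop or delete.
var protectedStacks = map[string]bool{
	"traefik":           true,
	"cloudflared":       true,
	"felhom-controller": true,
}

// operationAllowed blocks stop and delete for protected stacks and lets
// every other operation (including restart) through.
func operationAllowed(stack, op string) bool {
	if !protectedStacks[stack] {
		return true
	}
	return op != "stop" && op != "delete"
}

func main() {
	fmt.Println(operationAllowed("traefik", "stop"))     // false
	fmt.Println(operationAllowed("traefik", "restart"))  // true
	fmt.Println(operationAllowed("nextcloud", "delete")) // true
}
```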

Orphan detection: Deployed stacks with no matching catalog template are marked as orphaned with an "Elavult" badge and can be safely deleted.

Container State Display

State Color Label Meaning
Running + healthy Green "Fut" All containers running and healthy
Running + starting Orange "Indulas..." Healthcheck not yet passed
Running + unhealthy Yellow "Nem egeszseges" Healthcheck failing
Stopped/exited Red "Leallitva" All containers stopped
Restarting Yellow "Ujrainditas..." Restart loop
Not deployed Gray "Nincs telepitve" Compose file exists, not deployed

2. Backup System

The backup system implements a 3-2-1 backup architecture:

Rule What Where Status
1. Nightly backup DB dumps + config + ALL user data Same drive as app Mandatory, automatic
2. Cross-drive backup User data copy to secondary drive Different physical device Opt-in per app
3. Remote backup Offsite copy for disaster recovery Cloud / remote server Future

Key principle: User data backup is mandatory — every app with HDD bind mounts is included in the nightly restic snapshot automatically. There is no per-app toggle. The AppBackupPrefs.Enabled field in settings.json is legacy and not read by any code.

Rule 1: Nightly Backup (mandatory, same drive)

The nightly backup has two phases that run sequentially:

Phase 1 — Database Dumps (internal/backup/dbdump.go, scheduled 02:30)

  • Auto-discovery of PostgreSQL and MariaDB containers via docker ps + docker inspect
  • Dumps via docker exec pg_dump / docker exec mariadb-dump with 5-minute timeout
  • Atomic writes (.tmp.sql) to prevent corruption
  • Validation after each dump: checks file size, header presence, counts CREATE TABLE
  • Results are cached in settings.json so they survive container restarts
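
The post-dump validation (size, header, CREATE TABLE count) might be sketched as follows. The header strings and the 512-byte search window are assumptions based on typical pg_dump and mariadb-dump output, and `validateDump` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// knownHeaders are comment lines that typically open a dump file.
// Assumption: the real check may look for different markers.
var knownHeaders = []string{"-- PostgreSQL database dump", "-- MariaDB dump"}

// validateDump checks minimum size and header presence, then returns the
// number of CREATE TABLE statements found in the dump.
func validateDump(dump string, minBytes int) (int, error) {
	if len(dump) < minBytes {
		return 0, errors.New("dump is smaller than the minimum size")
	}
	head := dump
	if len(head) > 512 {
		head = head[:512] // the header should appear near the top
	}
	found := false
	for _, h := range knownHeaders {
		if strings.Contains(head, h) {
			found = true
			break
		}
	}
	if !found {
		return 0, errors.New("dump header not found")
	}
	return strings.Count(dump, "CREATE TABLE"), nil
}

func main() {
	dump := "--\n-- PostgreSQL database dump\n--\nCREATE TABLE users (id int);\n"
	tables, err := validateDump(dump, 16)
	fmt.Println(tables, err) // 1 <nil>
}
```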

Phase 2 — Restic Snapshot (internal/backup/restic.go, scheduled 03:00)

  • Auto-generated repository password (32 random bytes, base64url), synced to hub
  • Paths included in every snapshot:
    • Stacks dir (all compose.yml + app.yaml + .felhom.yml)
    • DB dump dir (all .sql dump files from Phase 1)
    • controller.yaml (controller config)
    • ALL deployed apps' HDD mount paths — discovered via resolveAppBackupPaths() which iterates ListDeployedStacks(), no Enabled flag
  • Auto-detects and unlocks stale locks (restic repo lock)
  • Weekly prune on Sundays with configurable retention (keep-daily, keep-weekly, keep-monthly)
  • Weekly integrity check (restic check) on Sunday 04:00
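
Generating a repository password from 32 random bytes takes only a few lines. The sketch uses unpadded base64url; whether the controller pads the output is not stated above, so treat that choice (and the name `newRepoPassword`) as an assumption.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// newRepoPassword returns 32 cryptographically random bytes encoded as
// unpadded base64url, which yields a 43-character string.
func newRepoPassword() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

func main() {
	pw, err := newRepoPassword()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(pw)) // 43
}
```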

What this protects against: accidental deletion, data corruption, point-in-time rollback. Does NOT protect against drive failure (backup is on the same physical drive).

Rule 2: Cross-Drive Backup (opt-in, different device) (internal/backup/crossdrive.go)

Copies user data to a different physical drive, providing the second copy for 3-2-1.

  • Two methods:

    • rsync — Simple mirror with --delete (fast, no versioning)
    • restic — Versioned, deduplicated, encrypted (shared repo across apps, auto-generated password)
  • Per-app configuration in settings.json: destination path, method, schedule (daily/weekly/manual)

  • Pre-backup DB dump: DumpStackDB() runs fresh pg_dump/mariadb-dump before each cross-drive backup to ensure DB state matches user data; non-fatal on failure (wired via DBDumper interface to avoid circular imports)

  • Drive-type-aware validation (ValidateDestination):

    Destination type Space checks
    External mount (different device than /) Block if <100 MB free
    System drive (same device as /) Require ≥10 GB free AND <90% used; logged warning
  • Rsync destination layout:

    • Single mount: backups/rsync/<app>/ (flat, no extra nesting)
    • Multiple mounts: backups/rsync/<app>/<leaf>/ per mount; duplicate leaf names get _N suffix
    • DB dump files excluded (--exclude backups/*.sql.gz/sql/dump) — already handled by pg_dump
  • Safety guards: destination ≠ source, path-overlap check, writable check

  • Chained execution: runs immediately after nightly restic — daily apps every night, weekly apps on Sundays

  • Per-app concurrency lock prevents overlapping runs

  • Status (last_run, duration, size, error) persisted to settings.json
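
The drive-type-aware space checks in the table above can be expressed as one function. This is a sketch: it assumes binary units (MiB/GiB) for the "100 MB" and "10 GB" thresholds and uses a hypothetical `checkDestinationSpace` signature rather than the real `ValidateDestination`.

```go
package main

import "fmt"

const (
	mib = 1 << 20
	gib = 1 << 30
)

// checkDestinationSpace returns whether the destination may be used and a
// note explaining a block or warning. onSystemDrive means the destination
// shares a device with /.
func checkDestinationSpace(onSystemDrive bool, freeBytes, totalBytes uint64) (bool, string) {
	if freeBytes > totalBytes {
		return false, "inconsistent filesystem statistics"
	}
	if !onSystemDrive {
		if freeBytes < 100*mib {
			return false, "less than 100 MB free on external drive"
		}
		return true, ""
	}
	if freeBytes < 10*gib {
		return false, "system drive needs at least 10 GB free"
	}
	used := totalBytes - freeBytes
	if used*100 >= totalBytes*90 {
		return false, "system drive is at least 90% full"
	}
	// Allowed, but the caller is expected to log a warning.
	return true, "destination is on the system drive"
}

func main() {
	fmt.Println(checkDestinationSpace(false, 50*mib, 500*gib))
	fmt.Println(checkDestinationSpace(true, 20*gib, 100*gib))
}
```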

What this protects against: primary drive failure, drive theft/damage.

Rule 3: Remote Backup (future)

Offsite backup for disaster recovery. Not yet implemented.

Restore (internal/backup/restore.go)

All deployed apps appear in the restore dropdown — every app has restic snapshot data (stacks dir + DB dumps are always backed up).

App type Config restored DB restored User data restored
Has HDD data ✓ ✓ ✓ (always; backup is mandatory)
DB only, no HDD ✓ ✓ n/a
No DB, no HDD ✓ n/a n/a
  • Snapshot API returns ALL snapshots unfiltered — older snapshots (pre-mandatory HDD backup) still allow config+DB restore; RestoreApp extracts whatever paths are available
  • Restore type info shown per-app when selected in dropdown (Hungarian banners):
    • Has HDD: "Teljes visszaállítás: adatbázis + konfiguráció + felhasználói adatok"
    • Has DB, no HDD: "Adatbázis és konfiguráció visszaállítása"
    • No DB, no HDD: "Csak konfiguráció visszaállítása"
  • Execution flow: stop app → restic restore <id> --target / --include <path>... → restart app
  • Running flag prevents concurrent backup/restore operations
  • Snapshot ID validated (8-64 lowercase hex characters)
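
Because the snapshot ID ends up as an argument to `restic restore`, validating it before use also keeps option-like strings out of the command line. A minimal sketch, assuming the accepted form is 8 to 64 lowercase hex characters (restic's short and full ID lengths); `validSnapshotID` is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
)

// snapshotIDPattern accepts 8 to 64 lowercase hexadecimal characters.
var snapshotIDPattern = regexp.MustCompile(`^[0-9a-f]{8,64}$`)

// validSnapshotID reports whether id is safe to pass to restic.
func validSnapshotID(id string) bool {
	return snapshotIDPattern.MatchString(id)
}

func main() {
	fmt.Println(validSnapshotID("1a2b3c4d")) // true
	fmt.Println(validSnapshotID("1A2B3C4D")) // false
	fmt.Println(validSnapshotID("--target")) // false
}
```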

Backup Page UI (internal/web/templates/backups.html)

Unified per-app status table with expandable rows showing 3 backup layers per app:

Status dot per app:

Dot color Meaning
Green Fully covered — cross-drive configured and last run OK
Yellow Warning — no second copy, or last backup failed, or disk space issue
Red Cross-drive destination blocked or inaccessible
Gray (auto) No user data — only config/DB backup (automatic)

Three backup layers per app row:

  1. Adatbázis mentés — Auto badge + last run timestamp + status
  2. Konfiguráció — Auto badge + last restic snapshot timestamp + status
  3. Felhasználói adatok — one of:
    • Cross-drive configured: method + destination + schedule + last run + status + "Futtatás most" button
    • HDD data, no cross-drive: "✓ Helyi mentés auto" (green) + "⚠ Nincs 2. másolat" (yellow) + settings link
    • No HDD data: "— (nincs HDD adat)" (muted)

Other sections:

  • Schedule overview with next run times for DB dump, restic, prune
  • Snapshot history table (last 20 snapshots with ID, time, files new/changed, data added)
  • Repository info card (path, size, snapshot count, encryption key with show/copy)
  • Restore section: app dropdown → snapshot dropdown → restore type info → confirmation checkbox → execute

3. Storage Management

The storage subsystem handles the full lifecycle of external storage: detection, initialization, path registration, and data migration.

Disk Scanning (internal/storage/scan.go)

  • ScanDisks() uses lsblk -J -b for block device enumeration
  • System disk detection via host fstab parsing (/host-fstab) + UUID resolution via blkid
  • Partitions enriched with filesystem type, UUID, and label from direct blkid probing (Docker containers have incomplete udev cache)
  • Returns AvailableDisks (non-system, non-loop, non-CDROM) and SystemDisks separately
  • Handles NVMe (nvme0n1p1), SCSI (sdb1), and eMMC (mmcblk0p1) naming
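
The three naming schemes differ only in whether a `p` separates the disk name from the partition number: the Linux kernel inserts it when the disk name itself ends in a digit. A small sketch of that rule (the function name is hypothetical):

```go
package main

import (
	"fmt"
	"strconv"
)

// partitionName builds the kernel name of partition n on disk.
// Disks whose name ends in a digit (nvme0n1, mmcblk0) get a "p" separator;
// others (sdb) do not.
func partitionName(disk string, n int) string {
	if disk == "" {
		return ""
	}
	last := disk[len(disk)-1]
	if last >= '0' && last <= '9' {
		return disk + "p" + strconv.Itoa(n)
	}
	return disk + strconv.Itoa(n)
}

func main() {
	fmt.Println(partitionName("nvme0n1", 1)) // nvme0n1p1
	fmt.Println(partitionName("sdb", 1))     // sdb1
	fmt.Println(partitionName("mmcblk0", 1)) // mmcblk0p1
}
```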

Disk Initialization Wizard (internal/storage/format.go)

A step-by-step UI at /settings/storage/init:

  1. Scan — Lists available disks with model, size, partition info
  2. Select — User picks a disk and enters a mount name (e.g., hdd_1)
  3. Confirm — User types "FORMAZAS" to confirm destructive operation
  4. Format pipeline: wipefs → sfdisk (GPT) → mkfs.ext4 → blkid UUID → backup fstab → append UUID-based fstab entry → mount → findmnt verification → chown 1000:1000 → create storage/ and Dokumentumok/ subdirectories
  5. Auto-registers new storage path in settings.json
  6. Smart partition detection: skips repartitioning for existing empty partitions

Safety guards: system disk detection, mount path conflict check, confirmation required, progress channel for real-time UI feedback.

Storage Path Registry (internal/settings/settings.go)

Multiple external storage paths supported with:

  • Label: Human-readable name (editable inline)
  • Default flag: New deploys use this path by default
  • Schedulable flag: Path appears in deploy dropdown
  • Auto-discovery: On startup, scans deployed apps' HDD_PATH values and registers unknown paths
  • Thread-safe CRUD: Add, Remove, SetDefault, SetSchedulable, SetLabel

Data Migration (internal/storage/migrate.go)

Move app data between storage paths (e.g., SSD → HDD, HDD → new HDD):

  1. Validate: stack exists, deployed, has HDD data, target differs from source
  2. Estimate total size, check free space on target
  3. Stop the application
  4. rsync -a --info=progress2 per mount path with real-time progress parsing
  5. Update app.yaml HDD_PATH to new location
  6. Start the application
  7. Rollback on failure: reverts config, restarts on old storage

Progress UI at /stacks/{name}/migrate with byte counter and percentage.

Stale Data Cleanup

After migration, the deploy page detects leftover data on previous storage paths:

  • Shows path, size, and a delete button
  • Two-step confirmation required
  • Protected paths (storage root, media, Dokumentumok, appdata) cannot be deleted

FileBrowser Mount Sync

When storage paths are added or removed, syncFileBrowserMounts() auto-regenerates FileBrowser's docker-compose.yml with volume mounts for all registered paths, then recreates the container.


4. Monitoring & Health

System Health Checks (internal/monitor/healthcheck.go)

RunHealthCheck() evaluates multiple subsystems and returns a HealthReport with status (ok/warn/fail):

Check Warning Critical
Disk usage (SSD/HDD) >= 90% >= 95%
Memory available < 512MB available < 256MB
CPU temperature >= 75C >= 85C
Docker daemon unreachable
Protected containers not running
Storage paths not a mount point (data on SSD) path inaccessible, disk >= 95%
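
The numeric thresholds in the table map naturally onto small classifier functions whose results are combined into an overall status. The sketch below covers only the disk, memory, and temperature rows; all function names are illustrative.

```go
package main

import "fmt"

// diskStatus classifies disk usage in percent.
func diskStatus(usedPercent float64) string {
	switch {
	case usedPercent >= 95:
		return "fail"
	case usedPercent >= 90:
		return "warn"
	}
	return "ok"
}

// memoryStatus classifies available memory in MB.
func memoryStatus(availableMB int) string {
	switch {
	case availableMB < 256:
		return "fail"
	case availableMB < 512:
		return "warn"
	}
	return "ok"
}

// temperatureStatus classifies CPU temperature in degrees Celsius.
func temperatureStatus(celsius float64) string {
	switch {
	case celsius >= 85:
		return "fail"
	case celsius >= 75:
		return "warn"
	}
	return "ok"
}

// worst returns the most severe of the given statuses.
func worst(statuses ...string) string {
	rank := map[string]int{"ok": 0, "warn": 1, "fail": 2}
	out := "ok"
	for _, s := range statuses {
		if rank[s] > rank[out] {
			out = s
		}
	}
	return out
}

func main() {
	overall := worst(diskStatus(91.5), memoryStatus(2048), temperatureStatus(60))
	fmt.Println(overall) // warn
}
```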

Backup destination validation (CheckBackupDestination) has tiered checks:

  • Path doesn't exist → critical/blocked
  • Not writable → critical/blocked
  • Same block device as root → warning (data on system drive)
  • Disk >95% full → critical/blocked
  • Disk >90% full → warning

Healthchecks.io Integration (internal/monitor/pinger.go)

Five ping UUIDs for external monitoring:

  • Heartbeat: every 5 min (simple "I'm alive")
  • System Health: periodic health check results
  • DB Dump: after nightly database dumps
  • Backup: after nightly restic backup
  • Backup Integrity: weekly restic check result

3-attempt retry with 2-second backoff. Pinger never fails the caller.
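
A fixed-backoff retry helper along these lines would cover the behaviour described. `withRetry` and `errTransient` are hypothetical names; the demo uses a 10 ms backoff instead of the 2 seconds mentioned above so it finishes quickly.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a failed HTTP ping in this demo.
var errTransient = errors.New("transient failure")

// withRetry calls fn up to attempts times, sleeping backoff between tries.
// It returns nil on the first success, or the last error wrapped.
func withRetry(attempts int, backoff time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(backoff)
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := withRetry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

To keep the "never fails the caller" property, the pinger would log the returned error and drop it rather than propagate it.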

Metrics Store (internal/metrics/)

  • SQLite with WAL mode for concurrent reads during collection
  • System metrics: CPU%, memory (total/used/available), temperature, load average — collected every 60 seconds
  • Container metrics: CPU%, memory, network I/O, block I/O per container
  • Downsampled queries for chart time ranges (1h, 6h, 24h, 7d, 30d)
  • 30-day auto-prune via daily scheduler job

Monitoring Page

Full-page system monitor at /monitoring:

  • System Overview: hostname, OS, kernel, CPU model/cores, uptime
  • System Metrics Charts: 4 line charts (CPU, Memory, Temperature, Load) in 2x2 grid
  • Container Resources: horizontal bar charts (CPU% and Memory per container)
  • Per-container Detail: click-to-expand historical charts
  • Remote Monitoring Status: shows Healthchecks ping UUID configuration

Chart.js 4.4.7 embedded locally (works in offline environments), dark theme matching site design.

Alert System (internal/web/alerts.go)

State-based alerts displayed on all pages:

  • Sources: health issues, missing ping UUIDs, backup disabled
  • Sorted by severity (error > warning > info), capped at 5 visible
  • Refreshed every 5 min + on startup
  • Monitoring page suppresses ping-related alerts (shown in dedicated table instead)

5. Notifications

Email Delivery

The controller relays notifications through the central hub, which sends emails via the Resend API:

  1. Controller detects event (health degradation, backup failure, etc.)
  2. Non-blocking POST to hub's /api/v1/notify with event details
  3. Hub checks customer notification preferences
  4. Hub sends Hungarian-language email via Resend

Event Types

Event Trigger
disk_warning Disk usage crosses warning/critical threshold
backup_failed Nightly backup or DB dump fails
update_available New app version detected in catalog
security_update Critical security update available

Cooldown System

Per-event-type cooldown (default 6 hours, configurable) prevents notification spam. Only notifies on status degradation (ok→warn, ok→fail, warn→fail), not on repeated same-status checks.
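
The two rules (notify only on degradation, and at most once per cooldown window per event type) combine as in the sketch below. The `notifier` type is hypothetical, unknown status strings are treated like `ok`, and the timestamps are arbitrary demo values.

```go
package main

import (
	"fmt"
	"time"
)

// severity orders the health statuses from best to worst.
var severity = map[string]int{"ok": 0, "warn": 1, "fail": 2}

// notifier remembers when each event type was last sent.
type notifier struct {
	cooldown time.Duration
	lastSent map[string]time.Time
}

func newNotifier(cooldown time.Duration) *notifier {
	return &notifier{cooldown: cooldown, lastSent: make(map[string]time.Time)}
}

// shouldNotify returns true only when the status got worse and the
// cooldown for this event type has elapsed; it then records the send time.
func (n *notifier) shouldNotify(event, prev, cur string, now time.Time) bool {
	if severity[cur] <= severity[prev] {
		return false // same status or recovery: no notification
	}
	if last, ok := n.lastSent[event]; ok && now.Sub(last) < n.cooldown {
		return false // still inside the cooldown window
	}
	n.lastSent[event] = now
	return true
}

// demo runs a fixed scenario with the default 6-hour cooldown.
func demo() []bool {
	n := newNotifier(6 * time.Hour)
	t0 := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	var out []bool
	out = append(out, n.shouldNotify("disk_warning", "ok", "warn", t0))
	out = append(out, n.shouldNotify("disk_warning", "warn", "fail", t0.Add(1*time.Hour)))
	out = append(out, n.shouldNotify("disk_warning", "warn", "warn", t0.Add(7*time.Hour)))
	out = append(out, n.shouldNotify("disk_warning", "warn", "fail", t0.Add(7*time.Hour)))
	return out
}

func main() {
	fmt.Println(demo()) // [true false false true]
}
```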

Preference Sync

Notification preferences (email, enabled events, cooldown) are:

  • Stored locally in settings.json
  • Synced to hub on save and on controller startup
  • Hub sync failure doesn't block local save

6. Update Management

App Catalog Sync

  • Periodic git fetch + git reset --hard of the app catalog repo
  • Content-hash comparison prevents unnecessary file writes
  • Post-sync stack rescan detects new/changed apps immediately

Planned Update Classifications

Marker Behavior
No marker Optional — shown on dashboard, customer clicks "Update"
UPDATE_REQUIRED=true Mandatory — auto-applied during next update window
UPDATE_SECURITY=true Critical — applied immediately

7. Authentication & Settings

Session Auth (internal/web/auth.go)

  • bcrypt password verification with configurable source priority: settings.json → controller.yaml → no auth (open access)
  • 7-day session duration with random 32-byte hex tokens
  • ?next= redirect after login preserves the page the user was visiting
  • Session cleanup every 15 minutes
  • All sessions invalidated on password change
  • Conditional logout link (hidden when auth is disabled)

Settings Persistence (internal/settings/settings.go)

Runtime-mutable settings in settings.json (separate from infrastructure config):

Section Contents
password_hash bcrypt hash override
notifications email, enabled events, cooldown hours
db_validations per-DB dump validation results (survives restarts)
app_backup per-app map: enabled flag, cross-drive config (method, dest, schedule, runtime status)
storage_paths registered paths with label, default flag, schedulable flag
cross_drive_restic_password auto-generated restic password for cross-drive repos

All public methods use sync.RWMutex. File writes are atomic (.tmp + rename).

Settings Page (/settings)

Four sections:

  1. System config — read-only display of controller.yaml values
  2. Password change — current + new + confirm, min 8 chars
  3. Storage paths — add/remove, edit labels, set default, toggle schedulable, per-path app list with sizes
  4. Notifications — email, event checkboxes, cooldown hours, test email button

8. Central Hub Reporting

Report Push (internal/report/)

Periodic JSON push (default every 15 min) to the central felhom-hub service:

  • System: hostname, OS, CPU, memory, disk usage, uptime
  • Containers: running/stopped counts, per-container CPU/memory
  • Backup: last run, success, repo stats, snapshot count, restic password (for disaster recovery)
  • Health: current status, issues, warnings
  • Stacks: deployed apps with versions and states

Bearer token authentication, 3-attempt retry with 5-second backoff.

Hub Dashboard

The hub service (separate Go app in the felhom.eu repo) provides:

  • Multi-customer overview table with status indicators
  • Customer detail page with system/storage/containers/backup/health sections
  • Color coding: green (<30min), yellow (30-60min), red (>60min since last report)
  • 90-day report retention with daily prune
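
The color coding is a function of the time since the last report. A sketch with the boundaries taken from the list above, assuming that exactly 30 and exactly 60 minutes both count as yellow; `reportColor` and the constant names are illustrative, not the hub's own.

```go
package main

import (
	"fmt"
	"time"
)

const (
	freshFor = 30 * time.Minute // younger than this: green
	staleFor = 60 * time.Minute // older than this: red
)

// reportColor maps the age of the last report to a status color.
func reportColor(age time.Duration) string {
	switch {
	case age < freshFor:
		return "green"
	case age <= staleFor:
		return "yellow"
	default:
		return "red"
	}
}

func main() {
	fmt.Println(reportColor(10 * time.Minute)) // green
	fmt.Println(reportColor(45 * time.Minute)) // yellow
	fmt.Println(reportColor(2 * time.Hour))    // red
}
```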

Repository Layout

controller/
├── cmd/controller/main.go           # Entry point, wires all 14 modules
├── internal/
│   ├── config/config.go             # YAML loader, validation, env overrides
│   ├── settings/settings.go         # Runtime settings (JSON, atomic writes, RWMutex)
│   ├── stacks/
│   │   ├── manager.go               # Stack scanning, compose ops, container status
│   │   ├── metadata.go              # Parse .felhom.yml app metadata
│   │   ├── deploy.go                # First-deploy: secret gen, app.yaml, compose up
│   │   └── delete.go                # Stack deletion + HDD data cleanup
│   ├── sync/sync.go                 # Git sync: clone/pull app catalog, content-hash copy
│   ├── storage/
│   │   ├── scan.go, scan_linux.go   # Disk detection via lsblk + blkid
│   │   ├── format.go, format_linux.go  # Partition, format, mount pipeline
│   │   ├── safety.go, safety_linux.go  # System disk detection, mount guards, fstab ops
│   │   ├── migrate.go              # App data migration (rsync with progress)
│   │   └── *_other.go              # Non-Linux stubs for cross-compilation
│   ├── backup/
│   │   ├── backup.go               # Orchestrator (dumps + restic + cross-drive chain)
│   │   ├── dbdump.go               # DB auto-discovery + dump (pg_dump, mariadb-dump)
│   │   ├── restic.go               # Restic operations (init, snapshot, prune, check)
│   │   ├── appdata.go              # StackDataProvider interface, app data discovery
│   │   ├── crossdrive.go           # Per-app backup to secondary storage (rsync/restic)
│   │   └── restore.go              # Per-app restore with auto stop/restart
│   ├── api/router.go               # REST API endpoints (~30 routes)
│   ├── scheduler/scheduler.go      # Central job scheduler (Every, Daily)
│   ├── system/
│   │   ├── info.go, info_linux.go  # RAM, disk, CPU, temperature, load average
│   │   ├── cpu_linux.go            # Background /proc/stat sampling
│   │   └── mounts_linux.go         # Mount points, disk usage, FS info, backup dest checks
│   ├── monitor/
│   │   ├── pinger.go               # Healthchecks.io HTTP ping client
│   │   └── healthcheck.go          # System health checks (disk, mem, CPU, temp, Docker)
│   ├── metrics/
│   │   ├── store.go                # SQLite time-series (WAL mode, downsampled queries)
│   │   ├── collector.go            # Background collector (60s, system + docker stats)
│   │   └── sysinfo.go              # Static system info (/proc, /etc)
│   ├── notify/notifier.go          # Email relay to hub, preference sync, cooldowns
│   ├── report/
│   │   ├── builder.go              # Hub report builder (all subsystems → JSON)
│   │   └── pusher.go               # HTTP POST to hub (retry, Bearer auth)
│   └── web/
│       ├── server.go               # HTTP server, routing, static files
│       ├── auth.go                 # Session auth, login/logout, session cleanup
│       ├── handlers.go             # Page handlers (dashboard, stacks, deploy, backups, etc.)
│       ├── storage_handlers.go     # Storage API handlers (scan, format, migrate, cleanup)
│       ├── alerts.go               # State-based alert generation
│       ├── funcmap.go              # Template functions (state colors, Hungarian formatting)
│       ├── embed.go                # go:embed for templates + Chart.js
│       └── templates/              # 12 HTML files + style.css (Hungarian UI)
├── configs/
│   ├── controller.yaml.example     # Full config reference
│   └── example-felhom-metadata.yml # .felhom.yml format reference
├── Dockerfile                      # Multi-stage: Go 1.24 builder + debian-slim runtime
├── docker-compose.yml              # Controller's own compose (privileged, /mnt rshared)
└── go.mod                          # Go 1.24, deps: bcrypt, yaml.v3, modernc.org/sqlite

Configuration

Controller config (controller.yaml)

Single YAML file per customer, infrastructure-only. Does not contain app-specific config.

Key sections:

customer:
  name: "Demo Felhom"
  node_id: "demo-felhom"

paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  db_dump_dir: "/srv/backups/db-dumps"
  restic_repo: "/srv/backups/restic-repo"

git:
  repo_url: "https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git"
  sync_interval: "15m"

backup:
  enabled: true
  db_dump_time: "02:30"
  restic_time: "03:00"
  retention: { keep_daily: 7, keep_weekly: 4, keep_monthly: 6 }

monitoring:
  health_interval: "5m"
  ping_uuids:
    heartbeat: "uuid-here"
    system_health: "uuid-here"
    db_dump: "uuid-here"
    backup: "uuid-here"
    backup_integrity: "uuid-here"

hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "bearer-token-here"

system:
  reserved_memory_mb: 384  # RAM reserved for OS + controller

Environment variable overrides: FELHOM_LOGGING_LEVEL=debug, FELHOM_HUB_ENABLED=false, etc.
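
Judging from the two examples, an override name appears to be the YAML path upper-cased with dots replaced by underscores. The sketch below relies on that inferred mapping, which may not match the real loader; `envKey` and `overrideBool` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// envKey turns a dotted config path into a FELHOM_* variable name.
// Assumption: the mapping is inferred from the documented examples.
func envKey(path string) string {
	return "FELHOM_" + strings.ToUpper(strings.ReplaceAll(path, ".", "_"))
}

// overrideBool returns the environment value for path if it is set and
// parses as a boolean; otherwise it keeps the value loaded from YAML.
func overrideBool(current bool, path string, lookup func(string) (string, bool)) bool {
	raw, ok := lookup(envKey(path))
	if !ok {
		return current
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return current
	}
	return v
}

func main() {
	fmt.Println(envKey("logging.level")) // FELHOM_LOGGING_LEVEL
	hubEnabled := overrideBool(true, "hub.enabled", os.LookupEnv)
	fmt.Println(hubEnabled)
}
```

Passing the lookup function in (instead of calling `os.LookupEnv` directly) keeps the override logic testable without touching the process environment.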

Runtime settings (settings.json)

Auto-managed by the controller. Contains password hash overrides, notification preferences, per-app backup configs, storage path registry, DB validation cache. All writes are atomic.

Per-app config (app.yaml)

Auto-generated during deployment. Contains env vars, locked fields list, deploy timestamp. Secret fields are locked (read-only after first deploy).


Scheduler Jobs

Job Type When Purpose
status-refresh periodic 30s Refresh container states
stack-scan periodic 2m Rescan stacks directory
heartbeat periodic 5m Ping Healthchecks "I'm alive"
system-health periodic configurable Health checks + alert refresh
backup-cache periodic 5m Refresh backup status cache
hub-report periodic 15m Push report to central hub
db-dump daily 02:30 Database dumps
backup daily 03:00 Restic backup → cross-drive chain
backup-integrity daily Sun 04:00 Restic check
metrics-prune daily 04:00 Delete metrics older than 30 days

All daily jobs use Europe/Budapest timezone. Skip-if-running prevents concurrent execution. Panic recovery in all jobs.
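
Skip-if-running and panic recovery can both live in the wrapper that executes a job. A minimal sketch, assuming an atomic flag per job (the real scheduler may use a mutex or another mechanism); the `job` type is hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// job wraps a function so that overlapping runs are skipped and panics
// are converted into errors instead of crashing the controller.
type job struct {
	name    string
	fn      func()
	running atomic.Bool
}

// run executes the job unless it is already running. ran is false when
// the run was skipped; err is non-nil when the job panicked.
func (j *job) run() (ran bool, err error) {
	if !j.running.CompareAndSwap(false, true) {
		return false, nil // previous run still in progress
	}
	defer j.running.Store(false)
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("job %q panicked: %v", j.name, r)
		}
	}()
	ran = true
	j.fn()
	return ran, nil
}

func main() {
	boom := &job{name: "metrics-prune", fn: func() { panic("disk gone") }}
	fmt.Println(boom.run())

	// A job that tries to start itself while running is skipped.
	var self *job
	self = &job{name: "backup", fn: func() {
		ran, _ := self.run()
		fmt.Println("nested run executed:", ran)
	}}
	fmt.Println(self.run())
}
```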


REST API

Stack Operations

Method Endpoint Description
GET /api/health Health check (no auth)
GET /api/stacks List all stacks
GET /api/stacks/{name} Stack details
POST /api/stacks/{name}/deploy First-time deploy
POST /api/stacks/{name}/start Start stack
POST /api/stacks/{name}/stop Stop stack
POST /api/stacks/{name}/restart Restart stack
POST /api/stacks/{name}/update Pull + recreate
POST /api/stacks/{name}/optional-config Update optional env vars
GET /api/stacks/{name}/logs Container logs (?raw=1 for plain text)
GET /api/stacks/{name}/hdd-data HDD data paths + sizes
DELETE /api/stacks/{name} Delete stack
POST /api/sync Trigger catalog sync
GET /api/system/info System info + sync status

Backup & Restore

Method Endpoint Description
GET /api/backup/status Full backup status
POST /api/backup/run Trigger manual backup
GET /api/backup/snapshots List snapshots (?stack={name} for filtering)
POST /api/stacks/{name}/cross-backup Save cross-drive config
POST /api/stacks/{name}/cross-backup/run Trigger cross-drive backup
GET /api/stacks/{name}/cross-backup/status Cross-drive status
POST /api/backup/cross-drive/run-all Run all scheduled cross-drive backups

Storage

Method Endpoint Description
GET /api/storage/scan Scan available disks
POST /api/storage/init Format and mount a disk
GET /api/storage/init/status Format progress
POST /api/storage/migrate Start app data migration
GET /api/storage/migrate/status Migration progress

Metrics

Method Endpoint Description
GET /api/metrics/system System metrics time-series (?range=1h, ...)
GET /api/metrics/containers/summary Current container stats
GET /api/metrics/containers/{name} Per-container time-series
GET /api/metrics/sysinfo Static system info

Response format: {"ok": true/false, "data": ..., "error": "...", "message": "..."}
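
In Go, the response envelope could be modelled with a struct like the one below. The field names follow the JSON keys shown above; the `omitempty` tags, the `any` type for `data`, and the `encode` helper are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResponse mirrors the documented JSON envelope.
type apiResponse struct {
	OK      bool   `json:"ok"`
	Data    any    `json:"data,omitempty"`
	Error   string `json:"error,omitempty"`
	Message string `json:"message,omitempty"`
}

// encode marshals a response to its JSON text.
func encode(r apiResponse) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	okBody, _ := encode(apiResponse{OK: true, Message: "sync started"})
	fmt.Println(okBody) // {"ok":true,"message":"sync started"}

	errBody, _ := encode(apiResponse{OK: false, Error: "stack not found"})
	fmt.Println(errBody) // {"ok":false,"error":"stack not found"}
}
```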


Build & Deploy

Build

# On build server (192.168.0.180)
cd ~/build/felhom-controller
git -C ~/git/deploy-felhom-compose pull
./build.sh v0.12.2 --push

Deploy on customer node

# On customer node (e.g., 192.168.0.162)
cd /opt/docker/felhom-controller
sudo docker pull gitea.dooplex.hu/admin/felhom-controller:v0.12.2
sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:v0.12.2|' docker-compose.yml
sudo docker compose up -d

Important: Always use docker compose up -d, NOT docker compose restart — restart doesn't pick up new images.

Docker Requirements

The controller container needs:

  • privileged: true (disk operations)
  • Docker socket mount (/var/run/docker.sock)
  • /mnt mount with propagation: rshared (container mounts visible to host)
  • /dev mounted as /host-dev (block device access)
  • /etc/fstab mounted as /host-fstab (persistent mount config)

See docker-compose.yml for the full volume configuration.


Roadmap

Completed

  • Stack management with deploy flow and memory validation
  • Git-based app catalog sync
  • Central job scheduler
  • System monitoring with SQLite metrics and Chart.js charts
  • Healthchecks.io integration (5 ping types)
  • 3-layer backup system (DB dumps + restic + cross-drive)
  • Per-app backup restore with auto stop/restart
  • Storage management (scan, format, mount, registry)
  • App data migration between storage paths
  • Central hub reporting
  • Email notifications via hub relay
  • Settings persistence and password management
  • Dashboard alert system

In Progress / Planned

  • Update classification and auto-apply (optional/required/security markers)
  • Self-update mechanism with health-based rollback
  • Docker volume backup (/var/lib/docker/volumes:ro)
  • Raspberry Pi testing (pi-customer-1)
  • Cross-drive restic pruning (unbounded snapshot growth)
  • CSRF protection on POST endpoints
  • Login rate limiting

Test Environments

Node Hardware Domain Status
demo-felhom Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD demo-felhom.eu Controller v0.12.2 running
pi-customer-1 Raspberry Pi 3B+, 1G RAM, 32G SD pi-customer-1.local Not yet tested

Related Repositories

Repository Purpose
deploy-felhom-compose This repo — controller + deploy scripts
app-catalog-felhom.eu Docker Compose templates + .felhom.yml metadata
felhom.eu Website + app assets + felhom-hub service