felhom-controller

Central management container for Felhom home servers.

Replaces Portainer + scattered systemd scripts with a single, lightweight container that provides:

  • Hungarian-language web dashboard for customers
  • Docker Compose stack management (start/stop/update)
  • Interactive first-deployment flow with auto-generated secrets
  • Health-aware container state monitoring (starting/unhealthy/running)
  • Backup orchestration (DB dumps + restic snapshots) — Phase 3
  • System health monitoring with Healthchecks pings — Phase 2
  • Git-based stack synchronization with update management — Phase 4
  • Self-update with automatic rollback on failure — Phase 5

Current Status

Phase 1 — Stack Manager + Deploy Flow: COMPLETE

The controller is built, deployed, and running on the N100 test node (demo-felhom.eu). First application (Paperless-ngx) successfully deployed end-to-end through the dashboard on 2026-02-13.

Milestone achieved: Full deploy cycle works — customer clicks "Telepítés", fills in fields, controller generates secrets, saves app.yaml, runs docker compose up -d, and the app comes up with Traefik routing and health checks. The dashboard correctly shows real-time container states including health substatus (starting → healthy → running).

Current version: v0.6.1

What works

  • Dashboard with live container state (green/orange/yellow/red)
  • Deploy form with password validation, auto-generation, and field locking
  • Stack operations: start, stop, restart, update (pull + recreate)
  • Live-scrolling log viewer with auto-refresh (3s polling), pause/resume, and scroll position tracking
  • Deploy page doubles as config viewer (read-only mode for deployed apps)
  • App detail/info pages with use cases, setup guide, screenshots, and optional config
  • Optional config saves to app.yaml and restarts deployed apps (e.g., metadata provider API keys)
  • Periodic stack rescanning (every 2 minutes)
  • Manual rescan endpoint (POST /api/stacks/rescan)
  • Alphabetically sorted stack display (consistent card ordering)
  • Protected stacks (traefik, cloudflared, felhom-controller) can't be stopped
  • System info bar on dashboard: RAM, SSD, and HDD usage with progress bars
  • Docker Compose memory limits enforced via deploy.resources.limits.memory
  • Pre-deploy memory validation (hard block on mem_request overcommit, soft warning on mem_limit overcommit)
  • Memory summary bar shown on deploy page before deployment
  • Felhom.eu logo SVG in sidebar and login page
  • Verbose debug logging with operation timing, post-start container state checks, and image pull detection
  • Clickable app cards on dashboard and applications pages (navigate to info page)
  • Memory bar with two-segment visualization on deploy page (committed vs new app allocation)
  • Deployment progress UI: 3-step progress panel with real-time health polling (config → containers → health check)
  • CPU usage bar with load average display (1/5/15 min)
  • Temperature display with colored indicator dot (thermal zone reading)
  • Central job scheduler replacing ad-hoc goroutines (periodic + daily jobs)
  • Healthchecks.io-compatible system health pings with retry logic
  • Database auto-discovery and dump (PostgreSQL/MariaDB via docker exec)
  • Restic backup with auto-password generation, snapshot, prune, stats
  • Backup status card on dashboard with manual "Mentés most" trigger button
  • Backup API endpoints: status query and manual trigger
  • SQLite metrics store (system + container metrics, 60s collection, 30-day retention)
  • Heartbeat ping (5-minute "I'm alive" signal to Healthchecks)
  • Weekly backup integrity check (restic check, Sunday 04:00)
  • Central hub reporting (periodic JSON push to felhom-hub service)

Known issues / next priorities

  • Cloudflare Tunnel + Traefik TLS: paperless.demo-felhom.eu works locally but shows "Not secure" (certificate chain not fully validated through tunnel)
  • No undo/delete for deployed apps yet
  • Dashboard theme doesn't yet match felhom.eu dark theme

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Customer Hardware (N100 mini PC / Raspberry Pi)                │
│                                                                 │
│  ┌──────────┐   ┌────────────────────────────────────────────┐  │
│  │ Traefik  │   │  felhom-controller                         │  │
│  │ (reverse │──▶│                                            │  │
│  │  proxy)  │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Web UI   │  │ Stack Manager           ││  │
│                 │  │ (HU dash │  │ (compose up/down/pull,  ││  │
│  ┌──────────┐   │  │  board)  │  │  git sync, update mgmt) ││  │
│  │cloudflared│   │  └──────────┘  └─────────────────────────┘│  │
│  │ (tunnel) │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Backup   │  │ Monitor & Pinger        ││  │
│                 │  │ (db dump │  │ (healthchecks pings,    ││  │
│  ┌──────────┐   │  │  restic) │  │  system metrics)        ││  │
│  │ App      │   │  └──────────┘  └─────────────────────────┘│  │
│  │ stacks   │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  │ (docker  │   │  │Scheduler │  │ REST API                ││  │
│  │ compose) │   │  │(cron-like│  │ (for UI + remote mgmt)  ││  │
│  └──────────┘   │  │  jobs)   │  └─────────────────────────┘│  │
│                 │  └──────────┘                              │  │
│                 └────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │ pings              │ JSON push          │ git pull
         ▼                    ▼                    ▼
  status.felhom.eu      hub.felhom.eu       gitea.dooplex.hu
  (Healthchecks)        (central dashboard)  (stack definitions)

Repository Layout

This is the controller/ subfolder of the deploy-felhom-compose repository.

controller/
├── cmd/controller/main.go           # Entry point, wires all modules
├── internal/
│   ├── config/config.go             # YAML loader, validation, env overrides
│   ├── stacks/
│   │   ├── manager.go               # Stack scanning, compose ops, container status, debug logging
│   │   ├── metadata.go              # Parse .felhom.yml app metadata
│   │   └── deploy.go                # First-deploy flow: secret gen, app.yaml, compose up
│   ├── sync/
│   │   └── sync.go                  # Git sync: clone/pull app catalog, content-hash copy
│   ├── api/router.go                # REST API endpoints
│   ├── scheduler/
│   │   └── scheduler.go             # Central job scheduler (Every, Daily, skip-if-running)
│   ├── system/
│   │   ├── info.go                  # SystemInfo struct
│   │   ├── info_linux.go            # Linux: /proc/meminfo + statfs + loadavg + temperature
│   │   ├── info_other.go            # Non-Linux stub
│   │   ├── cpu_linux.go             # CPU collector (background /proc/stat sampling)
│   │   └── cpu_other.go             # CPU collector stub (non-Linux)
│   ├── monitor/
│   │   ├── pinger.go                # Healthchecks.io HTTP ping client
│   │   └── healthcheck.go           # System health checks (disk, mem, CPU, temp, Docker)
│   ├── backup/
│   │   ├── backup.go                # Backup orchestrator (DB dumps + restic + prune + integrity)
│   │   ├── dbdump.go                # Database auto-discovery + dump (pg_dump, mariadb-dump)
│   │   └── restic.go                # Restic operations (init, snapshot, prune, check, stats)
│   ├── metrics/
│   │   ├── store.go                 # SQLite metrics storage (system + container, downsampled queries)
│   │   ├── collector.go             # Background collector (60s interval, system + docker stats)
│   │   ├── types.go                 # SystemSample, ContainerSample, StaticSystemInfo structs
│   │   └── sysinfo.go               # Host-level static info (/proc, /etc)
│   ├── report/
│   │   ├── types.go                 # Hub report JSON payload definitions
│   │   ├── builder.go               # Builds report from system/stacks/backup/metrics state
│   │   └── pusher.go                # HTTP POST to central hub (retry, Bearer auth)
│   └── web/
│       ├── server.go                # HTTP server, routing, static file serving
│       ├── auth.go                  # Session auth, login/logout handlers
│       ├── handlers.go              # Page handlers (dashboard, stacks, deploy, etc.)
│       ├── funcmap.go               # Template function map (state colors, formatting)
│       ├── embed.go                 # go:embed directive for templates
│       ├── templates.go             # Felhom logo SVG constant
│       └── templates/               # go:embed HTML/CSS files (Hungarian UI)
│           ├── layout.html, dashboard.html, stacks.html, login.html
│           ├── logs.html, deploy.html, app_info.html
│           └── style.css
├── configs/
│   ├── controller.yaml.example      # Full config reference (infrastructure only)
│   └── example-felhom-metadata.yml  # .felhom.yml format reference
├── scripts/hashpass.go              # Bcrypt password hash generator
├── assets/                          # App logos + screenshots (gitignored, synced at build)
├── docs/BUILDING.md                 # Container image build & registry guide
├── Dockerfile                       # Multi-stage build (Go 1.22 + debian-slim)
├── docker-compose.yml               # Controller's own compose definition
├── Makefile                         # Build targets + asset sync
└── go.mod

Module Overview

| Module | Path | Status | Responsibility |
|---|---|---|---|
| Config | internal/config/ | Done | Load & validate controller.yaml, env overrides |
| Stacks | internal/stacks/ | Done | Compose operations, scanning, metadata, deploy flow |
| API | internal/api/ | Done | REST endpoints (stacks, deploy, rescan, system info, health) |
| System | internal/system/ | Done | System resource info (RAM, disk, CPU, temperature, load) |
| Web | internal/web/ | Done | Hungarian dashboard, auth, deploy pages, asset serving |
| Sync | internal/sync/ | Done | Git-based app catalog sync (clone/pull, content-hash copy) |
| Scheduler | internal/scheduler/ | Done | Central job scheduler (periodic + daily, skip-if-running) |
| Monitor | internal/monitor/ | Done | Healthchecks.io pings, system health checks |
| Metrics | internal/metrics/ | Done | SQLite time-series store, system + container collection |
| Report | internal/report/ | Done | Central hub push (JSON report builder + HTTP pusher) |
| Backup | internal/backup/ | Done | DB auto-discovery + dump, restic snapshots, prune, manual trigger |
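
The scheduler's "skip-if-running" behavior could look roughly like the sketch below. This is an illustration only: the `Job` type and `TryRun` method are hypothetical names, and the real scheduler.go may be structured differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Job is a hypothetical wrapper; the real scheduler's types are not shown in this README.
type Job struct {
	Name    string
	Run     func()
	running atomic.Bool
}

// TryRun executes the job unless a previous run is still in progress.
// It reports whether the job actually ran.
func (j *Job) TryRun() bool {
	if !j.running.CompareAndSwap(false, true) {
		return false // previous run still active: skip this tick
	}
	defer j.running.Store(false)
	j.Run()
	return true
}

func main() {
	var job *Job
	job = &Job{Name: "rescan", Run: func() {
		// Simulate a tick that fires while the job is still running.
		fmt.Println("nested tick ran:", job.TryRun())
	}}
	fmt.Println("first tick ran:", job.TryRun())
}
```

Using a compare-and-swap flag rather than a mutex means an overlapping tick returns immediately instead of queueing behind the running job.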

Stack Management

How stacks get onto the machine

  1. During initial setup, deploy-felhom-compose.sh clones the app catalog
  2. Compose files + .felhom.yml metadata land in /opt/docker/stacks/<app>/
  3. The controller periodically pulls from Git to detect changes (Phase 4)

First deployment flow (via dashboard)

  1. Customer sees app card with "🚀 Telepítés" (Deploy) button
  2. Clicks → deploy page shows:
    • Auto-filled: DOMAIN (from controller config), read-only
    • Auto-generated: DB passwords, secret keys (shown as "✓ Generated")
    • User input: HDD path, admin password, language, etc.
    • "🎲 Generálás" button next to password fields
  3. Clicks "Telepítés" → checkBeforeDeploy() JS guard fetches live state from API first (prevents deploying if already deployed from another tab). Then controller:
    • Memory validation: checks mem_request against available system RAM (see below)
    • Validates all required fields (password fields must be explicitly filled or generated)
    • Generates auto-secrets (DB passwords, hex keys)
    • Saves app.yaml (env vars + locked fields list)
    • Updates in-memory state immediately (so UI shows "deployed" during slow compose ops)
    • Runs docker compose up -d with env vars injected
    • On failure: reverts both in-memory state and disk (app.yaml deployed: false)
  4. Progress UI replaces the form with a 3-step progress panel:
    • "Konfiguráció mentve" — shown immediately after API success
    • "Konténer(ek) indítása..." → when containers are up
    • "Alkalmazás inicializálása..." → when state = running (healthy)
    • Polls GET /api/stacks/{name} every 3 seconds for real-time health status
    • Handles: running (auto-redirect), starting (keep polling), unhealthy (warning), exited (error)
    • Timeout after 120 seconds with informational message
  5. Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only
  6. "Részletek" button opens deploy page in read-only mode showing current config
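
The secret generation and revert-on-failure steps above can be sketched as follows. This is a minimal sketch under assumptions: `Stack`, `deploy`, `generateHexSecret`, and the injected `save` / `composeUp` functions are hypothetical stand-ins for writing app.yaml and running docker compose up -d, not the controller's actual code.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// Stack is a hypothetical in-memory record for one app.
type Stack struct {
	Name     string
	Deployed bool
	Env      map[string]string
}

// errPullDenied is a sample failure used in the demo below.
var errPullDenied = errors.New("pull access denied")

// generateHexSecret returns a hex string built from n random bytes (2*n characters).
func generateHexSecret(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// deploy marks the stack as deployed before the slow compose step and
// reverts the flag, in memory and on disk, if a later step fails.
func deploy(s *Stack, save, composeUp func(*Stack) error) error {
	secret, err := generateHexSecret(16)
	if err != nil {
		return err
	}
	s.Env["DB_PASSWORD"] = secret
	s.Deployed = true // UI can show "deployed" while compose is still running
	if err := save(s); err != nil {
		s.Deployed = false
		return fmt.Errorf("save app.yaml: %w", err)
	}
	if err := composeUp(s); err != nil {
		s.Deployed = false
		_ = save(s) // persist deployed: false again
		return fmt.Errorf("compose up: %w", err)
	}
	return nil
}

func main() {
	s := &Stack{Name: "paperless-ngx", Env: map[string]string{}}
	saveOK := func(*Stack) error { return nil }
	composeFails := func(*Stack) error { return errPullDenied }
	err := deploy(s, saveOK, composeFails)
	fmt.Println(err, s.Deployed)
}
```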

Memory validation during deploy

Before deploying an app, the controller checks if there's enough RAM. This uses the Kubernetes-inspired mem_request / mem_limit model:

| Field | In .felhom.yml | Purpose | Validation |
|---|---|---|---|
| mem_request | resources.mem_request: "500M" | Expected memory usage during normal operation | Hard block: sum of requests must not exceed usable RAM |
| mem_limit | resources.mem_limit: "1152M" | Docker deploy.resources.limits.memory total across all containers | Soft warning: overcommit is allowed for limits |
| pi_compatible | resources.pi_compatible: true | Whether the app can run on Raspberry Pi | Display-only hint |
| needs_hdd | resources.needs_hdd: true | Whether the app needs external storage | Display-only hint |

How it works:

  • usable_memory = total_ram - reserved_memory_mb (default: 384MB reserved for OS + controller)
  • If sum(deployed mem_requests) + new_mem_request > usable_memory → deploy is blocked with error
  • If sum(deployed mem_limits) + new_mem_limit > total_ram → deploy proceeds with warning
  • Apps without mem_request set are treated as 0MB (never blocked)
  • The deploy page shows a memory summary bar before the user clicks deploy
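
The rules above translate into a small check like the following sketch. The `App`, `MemCheck`, and `checkMemory` names are hypothetical, values are assumed to be already parsed into megabytes, and the figures for the already-deployed app are made up for illustration.

```go
package main

import "fmt"

// App carries the two memory figures from .felhom.yml, already parsed to MB.
type App struct {
	Name         string
	MemRequestMB int // 0 when mem_request is not set
	MemLimitMB   int
}

// MemCheck is the outcome of the pre-deploy validation.
type MemCheck struct {
	Blocked bool // hard block: requests exceed usable RAM
	Warning bool // soft warning: limits exceed total RAM
}

// checkMemory applies the request/limit rules to a candidate app.
func checkMemory(totalRAMMB, reservedMB int, deployed []App, candidate App) MemCheck {
	usable := totalRAMMB - reservedMB
	sumReq, sumLim := candidate.MemRequestMB, candidate.MemLimitMB
	for _, a := range deployed {
		sumReq += a.MemRequestMB
		sumLim += a.MemLimitMB
	}
	return MemCheck{
		Blocked: sumReq > usable,
		Warning: sumLim > totalRAMMB,
	}
}

func main() {
	deployed := []App{{Name: "app-a", MemRequestMB: 600, MemLimitMB: 900}}
	candidate := App{Name: "paperless-ngx", MemRequestMB: 500, MemLimitMB: 1152}
	// 1024 MB total with 384 MB reserved leaves 640 MB usable;
	// 600 + 500 requested exceeds that, so the deploy is blocked.
	fmt.Printf("%+v\n", checkMemory(1024, 384, deployed, candidate))
}
```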

Configure the reserved memory via controller.yaml:

system:
  reserved_memory_mb: 384  # default

Container state display

The dashboard shows health-aware container states with distinct colors:

| State | Color | Label | Meaning |
|---|---|---|---|
| Running + healthy | 🟢 Green | "Fut" | All containers running and healthy |
| Running + health: starting | 🟠 Orange | "Indulás..." | Container up but healthcheck not yet passed |
| Running + unhealthy | 🟡 Yellow | "Nem egészséges" | Container running but healthcheck failing |
| Stopped/exited | 🔴 Red | "Leállítva" | All containers stopped |
| Restarting | 🟡 Yellow | "Újraindítás..." | Container in restart loop |
| Not deployed | Gray | "Nincs telepítve" | Compose file exists but not yet deployed |

Action buttons adapt: "operational" states (running/starting/unhealthy/restarting) show restart/stop, while stopped states show a start button.
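
One way to derive these labels is to combine Docker's container state with the health suffix in its status text (for example "Up 3 seconds (health: starting)"). The sketch below is an assumption about how such parsing could work, covering a single deployed container; `classify` and `isOperational` are hypothetical names, and treating a running container without a healthcheck as "Fut" is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// classify maps Docker's state ("running", "exited", "restarting") and its
// human-readable status line to the dashboard label.
func classify(state, status string) string {
	switch state {
	case "running":
		switch {
		case strings.Contains(status, "(health: starting)"):
			return "Indulás..."
		case strings.Contains(status, "(unhealthy)"):
			return "Nem egészséges"
		default:
			return "Fut" // healthy, or no healthcheck defined (assumption)
		}
	case "restarting":
		return "Újraindítás..."
	default:
		return "Leállítva"
	}
}

// isOperational reports whether the restart/stop buttons apply.
// Starting and unhealthy containers still have Docker state "running".
func isOperational(state string) bool {
	return state == "running" || state == "restarting"
}

func main() {
	fmt.Println(classify("running", "Up 3 seconds (health: starting)"))
	fmt.Println(classify("running", "Up 3 seconds (healthy)"))
	fmt.Println(classify("exited", "Exited (1) 2 minutes ago"))
}
```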

Update strategy (Phase 4)

Stack updates are classified in the Git repository via markers:

| Marker | Behavior |
|---|---|
| No marker | Optional update: shown on dashboard, customer clicks "Update" |
| UPDATE_REQUIRED=true | Mandatory: auto-applied during next update window |
| UPDATE_SECURITY=true | Critical: applied immediately (within minutes) |
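
Since this phase is still planned, the following is only a sketch of how the markers might be classified once parsed into key/value pairs. `UpdateClass` and `classifyUpdate` are hypothetical, and letting the security marker take precedence when both markers are present is an assumption.

```go
package main

import "fmt"

// UpdateClass is a hypothetical enum for the three update categories.
type UpdateClass int

const (
	Optional UpdateClass = iota
	Required
	Security
)

func (c UpdateClass) String() string {
	return [...]string{"optional", "required", "security"}[c]
}

// classifyUpdate inspects the markers found for a stack.
// Assumption: UPDATE_SECURITY wins if both markers are set.
func classifyUpdate(markers map[string]string) UpdateClass {
	switch {
	case markers["UPDATE_SECURITY"] == "true":
		return Security
	case markers["UPDATE_REQUIRED"] == "true":
		return Required
	default:
		return Optional
	}
}

func main() {
	fmt.Println(classifyUpdate(map[string]string{"UPDATE_REQUIRED": "true"}))
	fmt.Println(classifyUpdate(nil))
}
```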

Protected stacks

The following stacks cannot be stopped from the customer UI:

  • traefik (reverse proxy)
  • cloudflared (tunnel)
  • felhom-controller (this container)

Logging

The controller uses two-tier logging controlled by logging.level in controller.yaml:

logging:
  level: debug    # debug | info | warn | error (default: info)
  file: ""        # optional log file path
  max_size_mb: 10
  max_files: 3

The level can also be set via an environment variable: FELHOM_LOGGING_LEVEL=debug

What gets logged at each level

| Level | What's logged |
|---|---|
| info | Operation success/failure with elapsed time, post-start container states, stack scan counts, deploy memory checks |
| debug | Everything logged at info, plus env var keys per compose command, local image availability checks, compose command timing, log fetch byte counts |

Example output (debug level)

[INFO] Starting stack: immich
[DEBUG] Env vars for compose: [DOMAIN, DB_PASSWORD, HDD_PATH] (3 app + 42 system)
[DEBUG] Running: docker compose up -d (in /opt/docker/stacks/immich)
[DEBUG] Command completed: docker compose up -d (took 12.3s)
[INFO] Stack immich started successfully (took 12.3s)
[INFO] Stack immich post-start status:
[INFO]   immich-server   ghcr.io/immich-app/immich-server:release   running   Up 3 seconds (health: starting)
[INFO]   immich-postgres docker.io/tensorchord/pgvecto-rs:pg16...   running   Up 3 seconds (healthy)
[INFO]   immich-redis    docker.io/library/redis:7-alpine            running   Up 3 seconds (healthy)

On failure:

[ERROR] Command failed: docker compose up -d (in /opt/docker/stacks/immich) — exit code 1 (took 2.1s)
[ERROR] stderr: Error response from daemon: pull access denied...
[ERROR] Stack immich start failed after 2.1s: exit code 1

Security

  • Env var values are never logged — only keys appear in debug output
  • stdout/stderr in error logs are truncated to 500 characters to prevent log spam
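
Both rules can be sketched with two small helpers. The names `envKeys` and `truncate` are hypothetical, and counting the 500-character limit in runes rather than bytes is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// envKeys returns only the variable names, sorted, so values never reach the log.
func envKeys(env map[string]string) []string {
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// truncate cuts s to at most max characters without splitting a UTF-8 sequence.
func truncate(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max])
}

func main() {
	env := map[string]string{"DOMAIN": "demo-felhom.eu", "DB_PASSWORD": "secret"}
	fmt.Printf("[DEBUG] Env vars for compose: %v\n", envKeys(env))

	stderr := strings.Repeat("x", 600)
	fmt.Println(len(truncate(stderr, 500)))
}
```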

Configuration

Controller config (infrastructure only)

Single YAML file per customer: /opt/docker/felhom-controller/controller.yaml

Contains customer identity, infrastructure secrets, backup/monitoring settings. Does not contain app-specific config (HDD paths, DB passwords, etc.).

See configs/controller.yaml.example for the full reference.

Per-app config (created during deployment)

Each deployed app gets an app.yaml in its stack directory:

# /opt/docker/stacks/paperless-ngx/app.yaml
# Auto-generated by felhom-controller — do not edit locked fields manually
deployed: true
deployed_at: "2026-02-13T21:10:00Z"
env:
  DOMAIN: "demo-felhom.eu"
  DB_PASSWORD: "a7f2b9c1e4d..."       # locked
  PAPERLESS_SECRET_KEY: "8b3e..."      # locked
  PAPERLESS_ADMIN_USER: "admin"        # editable
  PAPERLESS_OCR_LANGUAGE: "hun+eng"    # editable
  HDD_PATH: "/mnt/hdd_placeholder"    # locked
locked_fields:
  - DB_PASSWORD
  - PAPERLESS_SECRET_KEY
  - DOMAIN
  - HDD_PATH
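
To illustrate how locked fields might be enforced when a later config change arrives, here is a minimal sketch. `AppConfig` and `Update` are hypothetical names that mirror the file layout above; rejecting the whole change set when any locked field differs is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// AppConfig mirrors the app.yaml layout shown above (hypothetical type).
type AppConfig struct {
	Deployed     bool
	Env          map[string]string
	LockedFields []string
}

// Update applies changes to Env but refuses to modify any locked field.
// Nothing is applied if at least one locked field would change.
func (c *AppConfig) Update(changes map[string]string) error {
	locked := make(map[string]bool, len(c.LockedFields))
	for _, k := range c.LockedFields {
		locked[k] = true
	}
	var rejected []string
	for k, v := range changes {
		if locked[k] && c.Env[k] != v {
			rejected = append(rejected, k)
		}
	}
	if len(rejected) > 0 {
		sort.Strings(rejected)
		return fmt.Errorf("locked fields cannot be changed: %v", rejected)
	}
	for k, v := range changes {
		c.Env[k] = v
	}
	return nil
}

func main() {
	cfg := &AppConfig{
		Deployed:     true,
		Env:          map[string]string{"DOMAIN": "demo-felhom.eu", "PAPERLESS_OCR_LANGUAGE": "hun+eng"},
		LockedFields: []string{"DOMAIN"},
	}
	fmt.Println(cfg.Update(map[string]string{"PAPERLESS_OCR_LANGUAGE": "hun"}))
	fmt.Println(cfg.Update(map[string]string{"DOMAIN": "example.org"}))
}
```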

App assets (logos, screenshots)

App assets are baked into the container image at build time, so there are no external dependencies at runtime. They are synced from the felhom.eu website repo before building.

Served locally at /static/assets/. Logos try SVG first, fall back to PNG.

Build & Deploy

Source: https://gitea.dooplex.hu/admin/deploy-felhom-compose (controller/ subfolder).

Build happens outside the repo in ~/build/felhom-controller/ to keep the repo clean. See docs/BUILDING.md for the full guide.

# Quick build (current platform only)
cd ~/build/felhom-controller
./build.sh 0.2.11

# Build + push to Gitea registry
./build.sh 0.2.11 --push

Deploy on customer node

# Pull new image
docker pull gitea.dooplex.hu/admin/felhom-controller:0.2.11

# IMPORTANT: use 'up -d', NOT 'restart' — restart doesn't pick up new images
cd /opt/docker/felhom-controller
docker compose up -d

Test Environments

| Node | Hardware | Domain | IP | Status |
|---|---|---|---|---|
| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | Controller v0.6.0 + Paperless-ngx running |
| pi-customer-1 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | | 📲 Not yet tested |

First deployment log (Paperless-ngx on demo-felhom)

  • Date: 2026-02-13
  • App: Paperless-ngx (document management)
  • Deploy method: Dashboard UI → "Telepítés" button
  • Issues encountered & resolved:
    1. Password fields accepted empty values → Added server-side + client-side validation
    2. "Telepítés" button appeared for already-deployed apps → Fixed in-memory Deployed flag update
    3. Green status shown for (health: starting) containers → Added health-aware state parsing
    4. Stack cards switched positions on refresh → Added alphabetical sorting in GetStacks()
    5. "Részletek" button did nothing for deployed apps → Redirects to deploy page (read-only)
    6. OCR crash: PAPERLESS_OCR_LANGUAGE=hun not installed → Added PAPERLESS_OCR_LANGUAGES (plural) to docker-compose
    7. Container restart vs recreate: docker compose restart doesn't pick up new images → Documented: always use docker compose up -d

REST API

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/health | No | Health check (for monitoring) |
| GET | /api/stacks | Yes | List all stacks |
| GET | /api/stacks/{name} | Yes | Stack details |
| GET | /api/stacks/{name}/deploy-fields | Yes | Get deploy form fields |
| POST | /api/stacks/{name}/deploy | Yes | First-time deploy with config |
| POST | /api/stacks/{name}/start | Yes | Start stack |
| POST | /api/stacks/{name}/stop | Yes | Stop stack (not protected) |
| POST | /api/stacks/{name}/restart | Yes | Restart stack |
| POST | /api/stacks/{name}/update | Yes | Pull images + recreate |
| POST | /api/stacks/{name}/optional-config | Yes | Update optional config env vars |
| GET | /api/stacks/{name}/logs | Yes | Container logs (add ?raw=1 for plain text) |
| POST | /api/stacks/rescan | Yes | Trigger manual stack discovery |
| GET | /api/system/info | Yes | System resource usage (RAM, disk, CPU, temp, load) |
| GET | /api/backup/status | Yes | Backup status (last run, DB dump count, repo stats) |
| POST | /api/backup/run | Yes | Trigger manual backup (DB dumps + restic snapshot) |
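
A remote client could reuse the same polling pattern the deploy progress UI applies to GET /api/stacks/{name}: poll every few seconds, stop on running or exited, and give up after a timeout. The sketch below assumes the caller supplies a `fetch` function that performs the authenticated request and extracts the state string, because the response format is not documented here; `waitForRunning` and `errPollTimeout` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errPollTimeout = errors.New("timed out waiting for stack to become healthy")

// waitForRunning polls fetch until the stack reports "running", reports
// "exited", or the timeout elapses. Other states (starting, unhealthy)
// keep the loop going.
func waitForRunning(fetch func() (string, error), interval, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		state, err := fetch()
		if err != nil {
			return "", err
		}
		switch state {
		case "running":
			return state, nil
		case "exited":
			return state, errors.New("container exited during startup")
		}
		if time.Now().Add(interval).After(deadline) {
			return state, errPollTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	// Fake state sequence standing in for real API responses.
	states := []string{"starting", "starting", "running"}
	i := 0
	fetch := func() (string, error) {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return s, nil
	}
	// The dashboard uses 3 s and 120 s; shorter values keep the demo fast.
	fmt.Println(waitForRunning(fetch, 10*time.Millisecond, time.Second))
}
```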

Status & Roadmap

Phase 1 — Stack Manager + Deploy Flow COMPLETE

  • Project skeleton & config format
  • .felhom.yml app metadata format with deploy fields
  • Per-app config persistence (app.yaml)
  • Secret generation engine (password, hex, static)
  • Stack catalog (read compose files + metadata from disk)
  • Docker Compose operations (up/down/pull/ps/logs)
  • Deploy flow with interactive field input
  • Password validation (server-side + client-side, no empty passwords)
  • Basic web dashboard with start/stop/deploy buttons
  • Health-aware container states (starting/unhealthy/running)
  • REST API for stack + deploy operations
  • Simple web authentication (bcrypt sessions)
  • App assets baked into container (SVG/PNG logos, webp screenshots)
  • Container image build pipeline (Dockerfile + build.sh)
  • Build + push to Gitea container registry
  • Deploy on N100 test node — dashboard accessible
  • Stack scanning + display working
  • First app deployed: Paperless-ngx via dashboard (2026-02-13)
  • Periodic stack rescanning (every 2 minutes)
  • Alphabetically sorted stack display
  • Deploy page doubles as read-only config viewer for deployed apps

Phase 2 — Monitoring & Health COMPLETE

  • System metrics on dashboard (RAM, SSD, HDD usage bars)
  • /api/system/info endpoint with live resource data
  • Pre-deploy memory validation (mem_request hard block, mem_limit soft warning)
  • Memory summary bar on deploy page
  • CPU usage collector (background /proc/stat sampling, 5s interval)
  • CPU usage bar on dashboard with load average display
  • Temperature reading from /sys/class/thermal (with /host/sys Docker mount)
  • Temperature display with colored indicator dot (green/yellow/red)
  • Central job scheduler (replaces ad-hoc goroutines)
  • Healthchecks.io-compatible HTTP pinger with retry logic
  • System health checks (disk, memory, CPU, temp, Docker, protected containers)
  • Heartbeat ping (5-minute "I'm alive" signal)
  • SQLite metrics store (system + container metrics, 60s collection, 30-day prune)
  • Backup integrity check (weekly restic check with Healthchecks ping)
  • Customer notifications (email/Telegram)

Phase 3 — Backups COMPLETE

  • DB auto-discovery (PostgreSQL/MariaDB containers via docker inspect)
  • DB dump engine (pg_dump/mariadb-dump via docker exec, atomic writes)
  • Restic integration (auto-init, snapshot, prune, check, stats)
  • Restic password auto-generation (no manual setup needed)
  • Backup orchestrator (DB dumps + restic + weekly prune)
  • Backup status on dashboard (last run, DB count, repo stats)
  • Manual backup trigger from UI ("Mentés most" button)
  • GET /api/backup/status and POST /api/backup/run endpoints
  • Restore workflow

Phase 4 — Git Sync & Updates

  • Periodic git pull for stack definitions (git sync module)
  • Manual sync button on Alkalmazások page ("Sablonok frissítése")
  • Sync status in /api/system/info
  • Update classification (optional/required/security)
  • Update window enforcement
  • Dashboard update notifications with "Update" button

Phase 5 — Self-Update & Resilience

  • Self-update check & execution
  • Pre-update config backup
  • Health-based rollback mechanism
  • Config export/import

Phase 6 — Central Management (in progress)

  • Central hub reporting (controller → hub JSON push with Bearer auth)
  • Hub report builder (system, stacks, backup, health, containers, metrics)
  • Hub service (felhom-hub: REST API + SQLite + dark-theme dashboard)
  • K8s manifests for hub deployment on k3s
  • Fleet-wide update management
  • Customer notifications (email/Telegram)

| Repository | Purpose |
|---|---|
| deploy-felhom-compose | This repo: controller + deploy scripts |
| app-catalog-felhom.eu | Docker Compose templates + .felhom.yml metadata |
| felhom.eu | Website + app assets + felhom infra manifests (incl. felhom-hub) |
| homelab-manifests | k3s cluster manifests (dooplex.hu) |