felhom-controller

Central management container for Felhom home servers.

Replaces Portainer + scattered systemd scripts with a single, lightweight container that provides:

  • Hungarian-language web dashboard for customers
  • Docker Compose stack management (start/stop/update)
  • Interactive first-deployment flow with auto-generated secrets
  • Health-aware container state monitoring (starting/unhealthy/running)
  • Backup orchestration (DB dumps + restic snapshots) — Phase 3
  • System health monitoring with Healthchecks pings — Phase 2
  • Git-based stack synchronization with update management — Phase 4
  • Self-update with automatic rollback on failure — Phase 5

Current Status

Phase 1 — Stack Manager + Deploy Flow: COMPLETE

The controller is built, deployed, and running on the N100 test node (demo-felhom.eu). First application (Paperless-ngx) successfully deployed end-to-end through the dashboard on 2026-02-13.

Milestone achieved: Full deploy cycle works — customer clicks "Telepítés", fills in fields, controller generates secrets, saves app.yaml, runs docker compose up -d, and the app comes up with Traefik routing and health checks. The dashboard correctly shows real-time container states including health substatus (starting → healthy → running).

Current version: v0.4.0

What works

  • Dashboard with live container state (green/orange/yellow/red)
  • Deploy form with password validation, auto-generation, and field locking
  • Stack operations: start, stop, restart, update (pull + recreate)
  • Live-scrolling log viewer with auto-refresh (3s polling), pause/resume, and scroll position tracking
  • Deploy page doubles as config viewer (read-only mode for deployed apps)
  • App detail/info pages with use cases, setup guide, screenshots, and optional config
  • Optional config saves to app.yaml and restarts deployed apps (e.g., metadata provider API keys)
  • Periodic stack rescanning (every 2 minutes)
  • Manual rescan endpoint (POST /api/stacks/rescan)
  • Alphabetically sorted stack display (consistent card ordering)
  • Protected stacks (traefik, cloudflared, felhom-controller) can't be stopped
  • System info bar on dashboard: RAM, SSD, and HDD usage with progress bars
  • Docker Compose memory limits enforced via deploy.resources.limits.memory
  • Pre-deploy memory validation (hard block on mem_request overcommit, soft warning on mem_limit overcommit)
  • Memory summary bar shown on deploy page before deployment
  • Felhom.eu logo SVG in sidebar and login page
  • Verbose debug logging with operation timing, post-start container state checks, and image pull detection
  • Clickable app cards on dashboard and applications pages (navigate to info page)
  • Memory bar with two-segment visualization on deploy page (committed vs new app allocation)
  • Deployment progress UI: 3-step progress panel with real-time health polling (config → containers → health check)
  • CPU usage bar with load average display (1/5/15 min)
  • Temperature display with colored indicator dot (thermal zone reading)
  • Central job scheduler replacing ad-hoc goroutines (periodic + daily jobs)
  • Healthchecks.io-compatible system health pings with retry logic
  • Database auto-discovery and dump (PostgreSQL/MariaDB via docker exec)
  • Restic backup with auto-password generation, snapshot, prune, stats
  • Backup status card on dashboard with manual "Mentés most" trigger button
  • Backup API endpoints: status query and manual trigger

Known issues / next priorities

  • Cloudflare Tunnel + Traefik TLS: paperless.demo-felhom.eu works locally but shows "Not secure" (certificate chain not fully validated through tunnel)
  • No undo/delete for deployed apps yet
  • Dashboard theme doesn't yet match felhom.eu dark theme

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Customer Hardware (N100 mini PC / Raspberry Pi)                │
│                                                                 │
│  ┌──────────┐   ┌────────────────────────────────────────────┐  │
│  │ Traefik  │   │  felhom-controller                         │  │
│  │ (reverse │──▶│                                            │  │
│  │  proxy)  │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Web UI   │  │ Stack Manager           ││  │
│                 │  │ (HU dash │  │ (compose up/down/pull,  ││  │
│  ┌──────────┐   │  │  board)  │  │  git sync, update mgmt) ││  │
│  │cloudflared│   │  └──────────┘  └─────────────────────────┘│  │
│  │ (tunnel) │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Backup   │  │ Monitor & Pinger        ││  │
│                 │  │ (db dump │  │ (healthchecks pings,    ││  │
│  ┌──────────┐   │  │  restic) │  │  system metrics)        ││  │
│  │ App      │   │  └──────────┘  └─────────────────────────┘│  │
│  │ stacks   │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  │ (docker  │   │  │Scheduler │  │ REST API                ││  │
│  │ compose) │   │  │(cron-like│  │ (for UI + remote mgmt)  ││  │
│  └──────────┘   │  │  jobs)   │  └─────────────────────────┘│  │
│                 │  └──────────┘                              │  │
│                 └────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │ pings                              │ git pull
         ▼                                    ▼
  status.felhom.eu                   gitea.dooplex.hu
  (Healthchecks on k3s)              (stack definitions)

Repository Layout

This is the controller/ subfolder of the deploy-felhom-compose repository.

controller/
├── cmd/controller/main.go           # Entry point, wires all modules
├── internal/
│   ├── config/config.go             # YAML loader, validation, env overrides
│   ├── stacks/
│   │   ├── manager.go               # Stack scanning, compose ops, container status, debug logging
│   │   ├── metadata.go              # Parse .felhom.yml app metadata
│   │   └── deploy.go                # First-deploy flow: secret gen, app.yaml, compose up
│   ├── sync/
│   │   └── sync.go                  # Git sync: clone/pull app catalog, content-hash copy
│   ├── api/router.go                # REST API endpoints
│   ├── scheduler/
│   │   └── scheduler.go             # Central job scheduler (Every, Daily, skip-if-running)
│   ├── system/
│   │   ├── info.go                  # SystemInfo struct
│   │   ├── info_linux.go            # Linux: /proc/meminfo + statfs + loadavg + temperature
│   │   ├── info_other.go            # Non-Linux stub
│   │   ├── cpu_linux.go             # CPU collector (background /proc/stat sampling)
│   │   └── cpu_other.go             # CPU collector stub (non-Linux)
│   ├── monitor/
│   │   ├── pinger.go                # Healthchecks.io HTTP ping client
│   │   └── healthcheck.go           # System health checks (disk, mem, CPU, temp, Docker)
│   ├── backup/
│   │   ├── backup.go                # Backup orchestrator (DB dumps + restic + prune)
│   │   ├── dbdump.go                # Database auto-discovery + dump (pg_dump, mariadb-dump)
│   │   └── restic.go                # Restic operations (init, snapshot, prune, stats)
│   └── web/
│       ├── server.go                # HTTP server, routing, static file serving
│       ├── auth.go                  # Session auth, login/logout handlers
│       ├── handlers.go              # Page handlers (dashboard, stacks, deploy, etc.)
│       ├── funcmap.go               # Template function map (state colors, formatting)
│       ├── embed.go                 # go:embed directive for templates
│       ├── templates.go             # Felhom logo SVG constant
│       └── templates/               # go:embed HTML/CSS files (Hungarian UI)
│           ├── layout.html, dashboard.html, stacks.html, login.html
│           ├── logs.html, deploy.html, app_info.html
│           └── style.css
├── configs/
│   ├── controller.yaml.example      # Full config reference (infrastructure only)
│   └── example-felhom-metadata.yml  # .felhom.yml format reference
├── scripts/hashpass.go              # Bcrypt password hash generator
├── assets/                          # App logos + screenshots (gitignored, synced at build)
├── docs/BUILDING.md                 # Container image build & registry guide
├── Dockerfile                       # Multi-stage build (Go 1.22 + debian-slim)
├── docker-compose.yml               # Controller's own compose definition
├── Makefile                         # Build targets + asset sync
└── go.mod

Module Overview

| Module | Path | Status | Responsibility |
| --- | --- | --- | --- |
| Config | internal/config/ | Done | Load & validate controller.yaml, env overrides |
| Stacks | internal/stacks/ | Done | Compose operations, scanning, metadata, deploy flow |
| API | internal/api/ | Done | REST endpoints (stacks, deploy, rescan, system info, health) |
| System | internal/system/ | Done | System resource info (RAM, disk, CPU, temperature, load) |
| Web | internal/web/ | Done | Hungarian dashboard, auth, deploy pages, asset serving |
| Sync | internal/sync/ | Done | Git-based app catalog sync (clone/pull, content-hash copy) |
| Scheduler | internal/scheduler/ | Done | Central job scheduler (periodic + daily, skip-if-running) |
| Monitor | internal/monitor/ | Done | Healthchecks.io pings, system health checks |
| Backup | internal/backup/ | Done | DB auto-discovery + dump, restic snapshots, prune, manual trigger |

Stack Management

How stacks get onto the machine

  1. During initial setup, deploy-felhom-compose.sh clones the app catalog
  2. Compose files + .felhom.yml metadata land in /opt/docker/stacks/<app>/
  3. The controller periodically pulls from Git to detect changes (Phase 4)

First deployment flow (via dashboard)

  1. Customer sees app card with "🚀 Telepítés" (Deploy) button
  2. Clicks → deploy page shows:
    • Auto-filled: DOMAIN (from controller config), read-only
    • Auto-generated: DB passwords, secret keys (shown as "✓ Generated")
    • User input: HDD path, admin password, language, etc.
    • "🎲 Generálás" button next to password fields
  3. Clicks "Telepítés" → checkBeforeDeploy() JS guard fetches live state from API first (prevents deploying if already deployed from another tab). Then controller:
    • Memory validation: checks mem_request against available system RAM (see below)
    • Validates all required fields (password fields must be explicitly filled or generated)
    • Generates auto-secrets (DB passwords, hex keys)
    • Saves app.yaml (env vars + locked fields list)
    • Updates in-memory state immediately (so UI shows "deployed" during slow compose ops)
    • Runs docker compose up -d with env vars injected
    • On failure: reverts both in-memory state and disk (app.yaml deployed: false)
  4. Progress UI replaces the form with a 3-step progress panel:
    • "Konfiguráció mentve" (Configuration saved): shown immediately after API success
    • "Konténer(ek) indítása..." (Starting container(s)...) → when containers are up
    • "Alkalmazás inicializálása..." (Initializing application...) → when state = running (healthy)
    • Polls GET /api/stacks/{name} every 3 seconds for real-time health status
    • Handles: running (auto-redirect), starting (keep polling), unhealthy (warning), exited (error)
    • Timeout after 120 seconds with informational message
  5. Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only
  6. "Részletek" (Details) button opens deploy page in read-only mode showing current config
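
The "generates auto-secrets (DB passwords, hex keys)" step in the flow above can be sketched as follows. This is a minimal illustration and not the controller's actual code: the function name generateHexSecret and the 32-byte size are assumptions. It only shows one common way to produce hex keys from the operating system's CSPRNG.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// generateHexSecret returns a random hex string carrying nBytes of entropy
// from the OS CSPRNG. The output is 2*nBytes characters long.
// Hypothetical helper: name and sizes are not taken from the controller source.
func generateHexSecret(nBytes int) (string, error) {
	if nBytes <= 0 {
		return "", fmt.Errorf("nBytes must be positive, got %d", nBytes)
	}
	buf := make([]byte, nBytes)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("read random bytes: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	secret, err := generateHexSecret(32)
	if err != nil {
		panic(err)
	}
	// Print only the length; secret values should never be logged.
	fmt.Println(len(secret)) // 64
}
```

Using crypto/rand rather than math/rand matters here, because these values end up as database passwords and application secret keys.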

Memory validation during deploy

Before deploying an app, the controller checks if there's enough RAM. This uses the Kubernetes-inspired mem_request / mem_limit model:

| Field | In .felhom.yml | Purpose | Validation |
| --- | --- | --- | --- |
| mem_request | resources.mem_request: "500M" | Expected memory usage during normal operation | Hard block: sum of requests must not exceed usable RAM |
| mem_limit | resources.mem_limit: "1152M" | Docker deploy.resources.limits.memory total across all containers | Soft warning: overcommit is allowed for limits |
| pi_compatible | resources.pi_compatible: true | Whether the app can run on Raspberry Pi | Display-only hint |
| needs_hdd | resources.needs_hdd: true | Whether the app needs external storage | Display-only hint |

How it works:

  • usable_memory = total_ram - reserved_memory_mb (default: 384MB reserved for OS + controller)
  • If sum(deployed mem_requests) + new_mem_request > usable_memory → deploy is blocked with error
  • If sum(deployed mem_limits) + new_mem_limit > total_ram → deploy proceeds with warning
  • Apps without mem_request set are treated as 0MB (never blocked)
  • The deploy page shows a memory summary bar before the user clicks deploy
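
The two rules above can be written as a small function. This is a sketch under stated assumptions: the MemCheck type, the checkMemory name, and the all-values-in-MB convention are illustrative and not taken from the controller source.

```go
package main

import "fmt"

// MemCheck is the outcome of a pre-deploy memory check (hypothetical type).
type MemCheck struct {
	Blocked bool   // hard block: requests exceed usable RAM
	Warning bool   // soft warning: limits exceed total RAM
	Message string // human-readable explanation
}

// checkMemory applies the request and limit rules. All values are in MB.
// Apps without mem_request contribute 0 and therefore never block.
func checkMemory(totalMB, reservedMB, deployedReqMB, deployedLimMB, newReqMB, newLimMB int) MemCheck {
	usableMB := totalMB - reservedMB
	if need := deployedReqMB + newReqMB; need > usableMB {
		return MemCheck{Blocked: true,
			Message: fmt.Sprintf("requests %d MB exceed usable %d MB", need, usableMB)}
	}
	if lim := deployedLimMB + newLimMB; lim > totalMB {
		return MemCheck{Warning: true,
			Message: fmt.Sprintf("limits %d MB exceed total %d MB", lim, totalMB)}
	}
	return MemCheck{}
}

func main() {
	// 4096 MB total with 384 MB reserved leaves 3712 MB usable.
	// Requests: 3000 + 500 = 3500 (fits). Limits: 3500 + 1152 = 4652 (overcommit).
	fmt.Printf("%+v\n", checkMemory(4096, 384, 3000, 3500, 500, 1152))
}
```

In this example the deploy proceeds with a warning, because only the limit sum exceeds total RAM. Raising the new request to 800 MB would push the request sum to 3800 MB and block the deploy.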

Configure the reserved memory via controller.yaml:

system:
  reserved_memory_mb: 384  # default

Container state display

The dashboard shows health-aware container states with distinct colors:

| State | Color | Label | Meaning |
| --- | --- | --- | --- |
| Running + healthy | 🟢 Green | "Fut" (Running) | All containers running and healthy |
| Running + health: starting | 🟠 Orange | "Indulás..." (Starting...) | Container up but healthcheck not yet passed |
| Running + unhealthy | 🟡 Yellow | "Nem egészséges" (Unhealthy) | Container running but healthcheck failing |
| Stopped/exited | 🔴 Red | "Leállítva" (Stopped) | All containers stopped |
| Restarting | 🟡 Yellow | "Újraindítás..." (Restarting...) | Container in restart loop |
| Not deployed | Gray | "Nincs telepítve" (Not installed) | Compose file exists but not yet deployed |

Action buttons adapt: "operational" states (running/starting/unhealthy/restarting) show restart/stop, while stopped states show a start button.
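
The mapping in the table above could be implemented roughly as shown below, for a single container. This is a sketch and not the controller's funcmap.go: the Display type and classify function are hypothetical, and the status substrings assume the Docker status text format seen in the log example later in this README (for example "Up 3 seconds (health: starting)").

```go
package main

import (
	"fmt"
	"strings"
)

// Display pairs a dashboard color with its Hungarian label.
type Display struct {
	Color string
	Label string
}

// classify maps one container's state and status text to a dashboard display.
// Hypothetical helper for illustration only.
func classify(deployed bool, state, status string) Display {
	if !deployed {
		return Display{"gray", "Nincs telepítve"}
	}
	switch state {
	case "running":
		switch {
		case strings.Contains(status, "(health: starting)"):
			return Display{"orange", "Indulás..."}
		case strings.Contains(status, "(unhealthy)"):
			return Display{"yellow", "Nem egészséges"}
		default:
			// Healthy, or no healthcheck defined at all.
			return Display{"green", "Fut"}
		}
	case "restarting":
		return Display{"yellow", "Újraindítás..."}
	default:
		// Assumption: any other state (exited, created, dead) shows as stopped.
		return Display{"red", "Leállítva"}
	}
}

func main() {
	d := classify(true, "running", "Up 3 seconds (health: starting)")
	fmt.Println(d.Color, d.Label) // orange Indulás...
}
```

Checking for "(unhealthy)" with its parentheses avoids a false match, since the plain word "healthy" is a substring of "unhealthy".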

Update strategy (Phase 4)

Stack updates are classified in the Git repository via markers:

| Marker | Behavior |
| --- | --- |
| No marker | Optional update: shown on dashboard, customer clicks "Update" |
| UPDATE_REQUIRED=true | Mandatory: auto-applied during next update window |
| UPDATE_SECURITY=true | Critical: applied immediately (within minutes) |
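
Since Phase 4 is still planned, the following is only a sketch of how the three classes might be derived from a list of marker lines. The README does not say where the markers are stored, so the []string input is an assumption. The UpdateClass type, the classifyUpdate name, and the rule that a security marker outranks a required marker are also assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// UpdateClass is the urgency of a pending stack update (hypothetical type).
type UpdateClass string

const (
	UpdateOptional UpdateClass = "optional"
	UpdateRequired UpdateClass = "required"
	UpdateSecurity UpdateClass = "security"
)

// classifyUpdate picks the most urgent class found among the marker lines.
// Assumption: UPDATE_SECURITY outranks UPDATE_REQUIRED when both are present.
func classifyUpdate(markers []string) UpdateClass {
	class := UpdateOptional
	for _, m := range markers {
		switch strings.TrimSpace(m) {
		case "UPDATE_SECURITY=true":
			return UpdateSecurity
		case "UPDATE_REQUIRED=true":
			class = UpdateRequired
		}
	}
	return class
}

func main() {
	fmt.Println(classifyUpdate([]string{"UPDATE_REQUIRED=true"})) // required
}
```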

Protected stacks

The following stacks cannot be stopped from the customer UI:

  • traefik (reverse proxy)
  • cloudflared (tunnel)
  • felhom-controller (this container)
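
A guard for this rule can be as small as a set lookup before the stop operation runs. The names protectedStacks and stopAllowed below are hypothetical. The sketch only illustrates the check and does not reflect how the controller wires it into its handlers.

```go
package main

import "fmt"

// protectedStacks lists the stacks the customer UI must never stop.
var protectedStacks = map[string]bool{
	"traefik":           true,
	"cloudflared":       true,
	"felhom-controller": true,
}

// stopAllowed returns an error if the named stack is protected.
// Hypothetical helper for illustration.
func stopAllowed(name string) error {
	if protectedStacks[name] {
		return fmt.Errorf("stack %q is protected and cannot be stopped", name)
	}
	return nil
}

func main() {
	fmt.Println(stopAllowed("traefik"))
	fmt.Println(stopAllowed("paperless-ngx"))
}
```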

Logging

The controller uses two-tier logging controlled by logging.level in controller.yaml:

logging:
  level: debug    # debug | info | warn | error (default: info)
  file: ""        # optional log file path
  max_size_mb: 10
  max_files: 3

The level can also be set via an environment variable: FELHOM_LOGGING_LEVEL=debug
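
One plausible way to combine the YAML value with the environment override is sketched below. The resolveLogLevel function is hypothetical. The precedence (environment over file) and the fallback to info for unknown values are assumptions based on the defaults documented above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// validLevels mirrors the levels documented for logging.level.
var validLevels = map[string]bool{"debug": true, "info": true, "warn": true, "error": true}

// resolveLogLevel returns the effective level: default "info", then the YAML
// value if valid, then FELHOM_LOGGING_LEVEL if set and valid.
// lookupEnv is injected so the function can be tested without touching the
// process environment. Hypothetical helper.
func resolveLogLevel(fileLevel string, lookupEnv func(string) (string, bool)) string {
	level := "info"
	if l := strings.ToLower(strings.TrimSpace(fileLevel)); validLevels[l] {
		level = l
	}
	if v, ok := lookupEnv("FELHOM_LOGGING_LEVEL"); ok {
		if l := strings.ToLower(strings.TrimSpace(v)); validLevels[l] {
			level = l
		}
	}
	return level
}

func main() {
	// Uses the real environment; prints "warn" unless the variable is set.
	fmt.Println(resolveLogLevel("warn", os.LookupEnv))
}
```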

What gets logged at each level

| Level | What's logged |
| --- | --- |
| info | Operation success/failure with elapsed time, post-start container states, stack scan counts, deploy memory checks |
| debug | All of the above + env var keys per compose command, local image availability checks, compose command timing, log fetch byte counts |

Example output (debug level)

[INFO] Starting stack: immich
[DEBUG] Env vars for compose: [DOMAIN, DB_PASSWORD, HDD_PATH] (3 app + 42 system)
[DEBUG] Running: docker compose up -d (in /opt/docker/stacks/immich)
[DEBUG] Command completed: docker compose up -d (took 12.3s)
[INFO] Stack immich started successfully (took 12.3s)
[INFO] Stack immich post-start status:
[INFO]   immich-server   ghcr.io/immich-app/immich-server:release   running   Up 3 seconds (health: starting)
[INFO]   immich-postgres docker.io/tensorchord/pgvecto-rs:pg16...   running   Up 3 seconds (healthy)
[INFO]   immich-redis    docker.io/library/redis:7-alpine            running   Up 3 seconds (healthy)

On failure:

[ERROR] Command failed: docker compose up -d (in /opt/docker/stacks/immich) — exit code 1 (took 2.1s)
[ERROR] stderr: Error response from daemon: pull access denied...
[ERROR] Stack immich start failed after 2.1s: exit code 1

Security

  • Env var values are never logged — only keys appear in debug output
  • stdout/stderr in error logs are truncated to 500 characters to prevent log spam
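
Both rules can be illustrated with two small helpers. The names envKeys, truncate, and maxLogOutput are hypothetical. Truncating by runes rather than bytes is a design choice of this sketch, so that multi-byte characters in command output are not cut in half.

```go
package main

import (
	"fmt"
	"sort"
)

// maxLogOutput is the character cap for stdout/stderr in error logs.
const maxLogOutput = 500

// envKeys returns only the variable names, sorted, so values never reach a log line.
func envKeys(env map[string]string) []string {
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// truncate cuts s to at most max characters (runes, not bytes).
func truncate(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max])
}

func main() {
	env := map[string]string{"DOMAIN": "demo-felhom.eu", "DB_PASSWORD": "secret"}
	fmt.Printf("[DEBUG] Env vars for compose: %v\n", envKeys(env))
	fmt.Println(len(truncate("short stderr", maxLogOutput)))
}
```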

Configuration

Controller config (infrastructure only)

Single YAML file per customer: /opt/docker/felhom-controller/controller.yaml

Contains customer identity, infrastructure secrets, backup/monitoring settings. Does not contain app-specific config (HDD paths, DB passwords, etc.).

See configs/controller.yaml.example for the full reference.

Per-app config (created during deployment)

Each deployed app gets an app.yaml in its stack directory:

# /opt/docker/stacks/paperless-ngx/app.yaml
# Auto-generated by felhom-controller — do not edit locked fields manually
deployed: true
deployed_at: "2026-02-13T21:10:00Z"
env:
  DOMAIN: "demo-felhom.eu"
  DB_PASSWORD: "a7f2b9c1e4d..."       # locked
  PAPERLESS_SECRET_KEY: "8b3e..."      # locked
  PAPERLESS_ADMIN_USER: "admin"        # editable
  PAPERLESS_OCR_LANGUAGE: "hun+eng"    # editable
  HDD_PATH: "/mnt/hdd_placeholder"    # locked
locked_fields:
  - DB_PASSWORD
  - PAPERLESS_SECRET_KEY
  - DOMAIN
  - HDD_PATH
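
The locked_fields list suggests a simple rule for post-deploy edits: locked keys keep their value and everything else may change. The sketch below shows that rule on an in-memory struct. It deliberately skips YAML parsing, and the AppConfig type, its field names, and ApplyEdits are assumptions rather than the controller's actual types.

```go
package main

import "fmt"

// AppConfig mirrors the shape of app.yaml shown above (hypothetical type).
type AppConfig struct {
	Deployed     bool
	Env          map[string]string
	LockedFields []string
}

// isLocked reports whether key appears in LockedFields.
func (c *AppConfig) isLocked(key string) bool {
	for _, k := range c.LockedFields {
		if k == key {
			return true
		}
	}
	return false
}

// ApplyEdits validates all edits first, then applies them, so a rejected
// edit leaves the config untouched. Locks apply only once the app is deployed.
func (c *AppConfig) ApplyEdits(edits map[string]string) error {
	for k, v := range edits {
		if c.Deployed && c.isLocked(k) && c.Env[k] != v {
			return fmt.Errorf("field %s is locked after deployment", k)
		}
	}
	for k, v := range edits {
		c.Env[k] = v
	}
	return nil
}

// demoConfig builds a small config resembling the Paperless-ngx example.
func demoConfig() *AppConfig {
	return &AppConfig{
		Deployed:     true,
		Env:          map[string]string{"DB_PASSWORD": "old", "PAPERLESS_OCR_LANGUAGE": "hun+eng"},
		LockedFields: []string{"DB_PASSWORD"},
	}
}

func main() {
	cfg := demoConfig()
	fmt.Println(cfg.ApplyEdits(map[string]string{"PAPERLESS_OCR_LANGUAGE": "eng"}))
	fmt.Println(cfg.ApplyEdits(map[string]string{"DB_PASSWORD": "new"}))
}
```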

App assets (logos, screenshots)

Baked into the container image at build time — no external dependencies at runtime. Synced from the felhom.eu website repo before building.

Served locally at /static/assets/. Logos try SVG first, fall back to PNG.
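
The SVG-first, PNG-fallback lookup could look like the sketch below. The logos/<app>.<ext> layout, the findLogo name, and the in-memory demo filesystem are assumptions for illustration, since the README does not describe the asset directory structure.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// findLogo returns the first existing logo for app, preferring SVG over PNG.
// Assumption: logos live under "logos/<app>.<ext>" in the asset filesystem.
func findLogo(assets fs.FS, app string) (string, bool) {
	for _, ext := range []string{".svg", ".png"} {
		name := path.Join("logos", app+ext)
		if _, err := fs.Stat(assets, name); err == nil {
			return name, true
		}
	}
	return "", false
}

// demoAssets is an in-memory stand-in for the embedded asset directory.
func demoAssets() fs.FS {
	return fstest.MapFS{
		"logos/immich.svg":        &fstest.MapFile{},
		"logos/immich.png":        &fstest.MapFile{},
		"logos/paperless-ngx.png": &fstest.MapFile{},
	}
}

func main() {
	name, ok := findLogo(demoAssets(), "paperless-ngx")
	fmt.Println(name, ok) // logos/paperless-ngx.png true
}
```

Taking an fs.FS keeps the lookup independent of whether assets come from go:embed, a directory on disk, or a test fixture.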

Build & Deploy

Source: https://gitea.dooplex.hu/admin/deploy-felhom-compose, controller/ subfolder.

Build happens outside the repo in ~/build/felhom-controller/ to keep the repo clean. See docs/BUILDING.md for the full guide.

# Quick build (current platform only)
cd ~/build/felhom-controller
./build.sh 0.2.11

# Build + push to Gitea registry
./build.sh 0.2.11 --push

Deploy on customer node

# Pull new image
docker pull gitea.dooplex.hu/admin/felhom-controller:0.2.11

# IMPORTANT: use 'up -d', NOT 'restart' — restart doesn't pick up new images
cd /opt/docker/felhom-controller
docker compose up -d

Test Environments

| Node | Hardware | Domain | IP | Status |
| --- | --- | --- | --- | --- |
| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | Controller v0.4.0 + Paperless-ngx running |
| pi-customer-1 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | | Not yet tested |

First deployment log (Paperless-ngx on demo-felhom)

  • Date: 2026-02-13
  • App: Paperless-ngx (document management)
  • Deploy method: Dashboard UI → "Telepítés" button
  • Issues encountered & resolved:
    1. Password fields accepted empty values → Added server-side + client-side validation
    2. "Telepítés" button appeared for already-deployed apps → Fixed in-memory Deployed flag update
    3. Green status shown for (health: starting) containers → Added health-aware state parsing
    4. Stack cards switched positions on refresh → Added alphabetical sorting in GetStacks()
    5. "Részletek" button did nothing for deployed apps → Redirects to deploy page (read-only)
    6. OCR crash: PAPERLESS_OCR_LANGUAGE=hun not installed → Added PAPERLESS_OCR_LANGUAGES (plural) to docker-compose
    7. Container restart vs recreate: docker compose restart doesn't pick up new images → Documented: always use docker compose up -d

REST API

| Method | Endpoint | Auth | Description |
| --- | --- | --- | --- |
| GET | /api/health | No | Health check (for monitoring) |
| GET | /api/stacks | Yes | List all stacks |
| GET | /api/stacks/{name} | Yes | Stack details |
| GET | /api/stacks/{name}/deploy-fields | Yes | Get deploy form fields |
| POST | /api/stacks/{name}/deploy | Yes | First-time deploy with config |
| POST | /api/stacks/{name}/start | Yes | Start stack |
| POST | /api/stacks/{name}/stop | Yes | Stop stack (not protected) |
| POST | /api/stacks/{name}/restart | Yes | Restart stack |
| POST | /api/stacks/{name}/update | Yes | Pull images + recreate |
| POST | /api/stacks/{name}/optional-config | Yes | Update optional config env vars |
| GET | /api/stacks/{name}/logs | Yes | Container logs (add ?raw=1 for plain text) |
| POST | /api/stacks/rescan | Yes | Trigger manual stack discovery |
| GET | /api/system/info | Yes | System resource usage (RAM, disk, CPU, temp, load) |
| GET | /api/backup/status | Yes | Backup status (last run, DB dump count, repo stats) |
| POST | /api/backup/run | Yes | Trigger manual backup (DB dumps + restic snapshot) |
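
A client that waits for a deployment to become healthy could poll GET /api/stacks/{name} in a loop, much like the progress UI does every 3 seconds with a 120 second timeout. The response format of that endpoint is not documented here, so the sketch below injects the fetch step as a function and only shows the polling logic. The names waitForRunning, errExited, and errTimeout are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errExited  = errors.New("stack exited")
	errTimeout = errors.New("timed out waiting for stack")
)

// waitForRunning calls fetch until it reports "running" (success) or
// "exited" (failure), sleeping interval between attempts. Other states such
// as "starting" or "unhealthy" keep the loop going until timeout elapses.
// fetch stands in for an authenticated GET /api/stacks/{name} call.
func waitForRunning(fetch func() (string, error), interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		state, err := fetch()
		if err != nil {
			return fmt.Errorf("fetch state: %w", err)
		}
		switch state {
		case "running":
			return nil
		case "exited":
			return errExited
		}
		if time.Now().Add(interval).After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	// Fake state sequence; a real client would use 3*time.Second and 120*time.Second.
	states := []string{"starting", "starting", "running"}
	i := 0
	fetch := func() (string, error) {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return s, nil
	}
	fmt.Println(waitForRunning(fetch, 10*time.Millisecond, time.Second)) // <nil>
}
```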

Status & Roadmap

Phase 1 — Stack Manager + Deploy Flow COMPLETE

  • Project skeleton & config format
  • .felhom.yml app metadata format with deploy fields
  • Per-app config persistence (app.yaml)
  • Secret generation engine (password, hex, static)
  • Stack catalog (read compose files + metadata from disk)
  • Docker Compose operations (up/down/pull/ps/logs)
  • Deploy flow with interactive field input
  • Password validation (server-side + client-side, no empty passwords)
  • Basic web dashboard with start/stop/deploy buttons
  • Health-aware container states (starting/unhealthy/running)
  • REST API for stack + deploy operations
  • Simple web authentication (bcrypt sessions)
  • App assets baked into container (SVG/PNG logos, webp screenshots)
  • Container image build pipeline (Dockerfile + build.sh)
  • Build + push to Gitea container registry
  • Deploy on N100 test node — dashboard accessible
  • Stack scanning + display working
  • First app deployed: Paperless-ngx via dashboard (2026-02-13)
  • Periodic stack rescanning (every 2 minutes)
  • Alphabetically sorted stack display
  • Deploy page doubles as read-only config viewer for deployed apps

Phase 2 — Monitoring & Health COMPLETE

  • System metrics on dashboard (RAM, SSD, HDD usage bars)
  • /api/system/info endpoint with live resource data
  • Pre-deploy memory validation (mem_request hard block, mem_limit soft warning)
  • Memory summary bar on deploy page
  • CPU usage collector (background /proc/stat sampling, 5s interval)
  • CPU usage bar on dashboard with load average display
  • Temperature reading from /sys/class/thermal (with /host/sys Docker mount)
  • Temperature display with colored indicator dot (green/yellow/red)
  • Central job scheduler (replaces ad-hoc goroutines)
  • Healthchecks.io-compatible HTTP pinger with retry logic
  • System health checks (disk, memory, CPU, temp, Docker, protected containers)
  • Customer notifications (email/Telegram)
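
For the /proc/stat sampling item above, CPU usage is usually computed from the difference between two samples of the aggregate "cpu" line. The sketch below shows that calculation on string input. It is not the controller's cpu_linux.go, and the type and function names are assumptions. It sums the first eight counters (user through steal) and treats idle plus iowait as idle time.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuSample holds cumulative jiffies from one read of the aggregate cpu line.
type cpuSample struct {
	idle  uint64
	total uint64
}

// parseCPULine parses a line such as "cpu 100 0 100 800 0 0 0 0 0 0".
// Guest counters (fields 9 and 10) are skipped because the kernel already
// includes them in user and nice.
func parseCPULine(line string) (cpuSample, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || fields[0] != "cpu" {
		return cpuSample{}, fmt.Errorf("not an aggregate cpu line: %q", line)
	}
	var s cpuSample
	for i, f := range fields[1:] {
		if i >= 8 {
			break
		}
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return cpuSample{}, fmt.Errorf("parse field %d: %w", i, err)
		}
		s.total += v
		if i == 3 || i == 4 { // idle and iowait
			s.idle += v
		}
	}
	return s, nil
}

// usagePercent returns the busy share between two samples, from 0 to 100.
func usagePercent(prev, cur cpuSample) float64 {
	if cur.total <= prev.total || cur.idle < prev.idle {
		return 0 // counters did not advance or were reset
	}
	dTotal := cur.total - prev.total
	dIdle := cur.idle - prev.idle
	if dIdle > dTotal {
		dIdle = dTotal
	}
	return 100 * float64(dTotal-dIdle) / float64(dTotal)
}

func main() {
	prev, _ := parseCPULine("cpu 100 0 100 800 0 0 0 0 0 0")
	cur, _ := parseCPULine("cpu 150 0 150 900 0 0 0 0 0 0")
	fmt.Printf("%.1f%%\n", usagePercent(prev, cur)) // 50.0%
}
```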

Phase 3 — Backups COMPLETE

  • DB auto-discovery (PostgreSQL/MariaDB containers via docker inspect)
  • DB dump engine (pg_dump/mariadb-dump via docker exec, atomic writes)
  • Restic integration (auto-init, snapshot, prune, check, stats)
  • Restic password auto-generation (no manual setup needed)
  • Backup orchestrator (DB dumps + restic + weekly prune)
  • Backup status on dashboard (last run, DB count, repo stats)
  • Manual backup trigger from UI ("Mentés most" button)
  • GET /api/backup/status and POST /api/backup/run endpoints
  • Restore workflow
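
The "atomic writes" mentioned for the DB dump engine above typically mean writing to a temporary file in the same directory and renaming it over the target. A half-written dump then never replaces a good one. The sketch below shows that pattern in general form. The names writeFileAtomic and demoRoundTrip and the file names are illustrative, and the controller's dbdump.go may do this differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file next to path, syncs it, and
// renames it into place. On POSIX filesystems the rename replaces the target
// atomically, so readers see either the old or the new content.
func writeFileAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".dump-*.tmp")
	if err != nil {
		return fmt.Errorf("create temp file: %w", err)
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // fails harmlessly once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return fmt.Errorf("write temp file: %w", err)
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return fmt.Errorf("sync temp file: %w", err)
	}
	if err := tmp.Close(); err != nil {
		return fmt.Errorf("close temp file: %w", err)
	}
	return os.Rename(tmpName, path)
}

// demoRoundTrip overwrites a dump twice in a scratch directory and returns
// the final content.
func demoRoundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "dbdump-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	target := filepath.Join(dir, "paperless.sql")
	for _, content := range []string{"-- dump v1\n", "-- dump v2\n"} {
		if err := writeFileAtomic(target, []byte(content)); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(target)
	return string(got), err
}

func main() {
	got, err := demoRoundTrip()
	fmt.Printf("%q %v\n", got, err)
}
```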

Phase 4 — Git Sync & Updates

  • Periodic git pull for stack definitions (git sync module)
  • Manual sync button on Alkalmazások (Applications) page ("Sablonok frissítése", Update templates)
  • Sync status in /api/system/info
  • Update classification (optional/required/security)
  • Update window enforcement
  • Dashboard update notifications with "Update" button

Phase 5 — Self-Update & Resilience

  • Self-update check & execution
  • Pre-update config backup
  • Health-based rollback mechanism
  • Config export/import

Phase 6 — Central Management (future)

  • API authentication for remote management
  • Central dashboard on k3s querying all customer controllers
  • Fleet-wide update management

| Repository | Purpose |
| --- | --- |
| deploy-felhom-compose | This repo: controller + deploy scripts |
| app-catalog-felhom.eu | Docker Compose templates + .felhom.yml metadata |
| felhom.eu | Website + app assets + felhom infra manifests |
| homelab-manifests | k3s cluster manifests (dooplex.hu) |