# felhom-controller

**Central management container for Felhom home servers.**

Replaces Portainer + scattered systemd scripts with a single, lightweight container that provides:

- Hungarian-language web dashboard for customers
- Docker Compose stack management (start/stop/update)
- Interactive first-deployment flow with auto-generated secrets
- Health-aware container state monitoring (starting/unhealthy/running)
- Backup orchestration (DB dumps + restic snapshots) — Phase 3
- System health monitoring with Healthchecks pings — Phase 2
- Git-based stack synchronization with update management — Phase 4
- Self-update with automatic rollback on failure — Phase 5

## Current Status

**Phase 1 — Stack Manager + Deploy Flow: ✅ COMPLETE**

The controller is built, deployed, and running on the N100 test node (demo-felhom.eu). First application (Paperless-ngx) successfully deployed end-to-end through the dashboard on 2026-02-13.

**Milestone achieved:** Full deploy cycle works — customer clicks "Telepítés", fills in fields, controller generates secrets, saves app.yaml, runs `docker compose up -d`, and the app comes up with Traefik routing and health checks. The dashboard correctly shows real-time container states including health substatus (starting → healthy → running).

Current version: **v0.2.1**

### What works

- Dashboard with live container state (green/orange/yellow/red)
- Deploy form with password validation, auto-generation, and field locking
- Stack operations: start, stop, restart, update (pull + recreate)
- Log viewer for each stack
- Deploy page doubles as config viewer (read-only mode for deployed apps)
- Periodic stack rescanning (every 2 minutes)
- Manual rescan endpoint (`POST /api/stacks/rescan`)
- Alphabetically sorted stack display (consistent card ordering)
- Protected stacks (traefik, cloudflared, felhom-controller) can't be stopped
- System info bar on dashboard: RAM, SSD, and HDD usage with progress bars
- Docker Compose memory limits enforced via `deploy.resources.limits.memory`
- Pre-deploy memory validation (hard block on `mem_request` overcommit, soft warning on `mem_limit` overcommit)
- Memory summary bar shown on deploy page before deployment
- Felhom.eu logo SVG in sidebar and login page
- Verbose debug logging with operation timing, post-start container state checks, and image pull detection

### Known issues / next priorities

- Cloudflare Tunnel + Traefik TLS: paperless.demo-felhom.eu works locally but shows "Not secure" (certificate chain not fully validated through tunnel)
- No undo/delete for deployed apps yet
- Dashboard theme doesn't yet match felhom.eu dark theme

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│  Customer Hardware (N100 mini PC / Raspberry Pi)                │
│                                                                 │
│  ┌──────────┐   ┌────────────────────────────────────────────┐  │
│  │ Traefik  │   │         felhom-controller                  │  │
│  │ (reverse │──▶│                                            │  │
│  │  proxy)  │   │  ┌──────────┐   ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Web UI   │   │ Stack Manager           ││  │
│                 │  │ (HU dash │   │ (compose up/down/pull,  ││  │
│  ┌──────────┐   │  │  board)  │   │  git sync, update mgmt) ││  │
│  │cloudflared│  │  └──────────┘   └─────────────────────────┘│  │
│  │ (tunnel) │   │  ┌──────────┐   ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Backup   │   │ Monitor & Pinger        ││  │
│                 │  │ (db dump │   │ (healthchecks pings,    ││  │
│  ┌──────────┐   │  │  restic) │   │  system metrics)        ││  │
│  │ App      │   │  └──────────┘   └─────────────────────────┘│  │
│  │ stacks   │   │  ┌──────────┐   ┌─────────────────────────┐│  │
│  │ (docker  │   │  │Scheduler │   │ REST API                ││  │
│  │ compose) │   │  │(cron-like│   │ (for UI + remote mgmt)  ││  │
│  └──────────┘   │  │ jobs)    │   └─────────────────────────┘│  │
│                 │  └──────────┘                              │  │
│                 └────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │ pings                        │ git pull
         ▼                              ▼
   status.felhom.eu              gitea.dooplex.hu
   (Healthchecks on k3s)         (stack definitions)
```

## Repository Layout

This is the `controller/` subfolder of the `deploy-felhom-compose` repository.

```
controller/
├── cmd/controller/main.go          # Entry point, wires all modules
├── internal/
│   ├── config/config.go            # YAML loader, validation, env overrides
│   ├── stacks/
│   │   ├── manager.go              # Stack scanning, compose ops, container status, debug logging
│   │   ├── metadata.go             # Parse .felhom.yml app metadata
│   │   └── deploy.go               # First-deploy flow: secret gen, app.yaml, compose up
│   ├── sync/
│   │   └── sync.go                 # Git sync: clone/pull app catalog, content-hash copy
│   ├── api/router.go               # REST API endpoints
│   ├── system/
│   │   ├── info.go                 # SystemInfo struct
│   │   ├── info_linux.go           # Linux: /proc/meminfo + statfs
│   │   └── info_other.go           # Non-Linux stub
│   └── web/
│       ├── server.go               # HTTP server, auth, page handlers, asset serving
│       └── templates.go            # Embedded HTML templates + CSS (Hungarian UI)
├── configs/
│   ├── controller.yaml.example     # Full config reference (infrastructure only)
│   └── example-felhom-metadata.yml # .felhom.yml format reference
├── scripts/hashpass.go             # Bcrypt password hash generator
├── assets/                         # App logos + screenshots (gitignored, synced at build)
├── docs/BUILDING.md                # Container image build & registry guide
├── Dockerfile                      # Multi-stage build (Go 1.22 + debian-slim)
├── docker-compose.yml              # Controller's own compose definition
├── Makefile                        # Build targets + asset sync
└── go.mod
```

## Module Overview

| Module | Path | Status | Responsibility |
|--------|------|--------|----------------|
| **Config** | `internal/config/` | ✅ Done | Load & validate controller.yaml, env overrides |
| **Stacks** | `internal/stacks/` | ✅ Done | Compose operations, scanning, metadata, deploy flow |
| **API** | `internal/api/` | ✅ Done | REST endpoints (stacks, deploy, rescan, system info, health) |
| **System** | `internal/system/` | ✅ Done | System resource info (RAM, disk usage) for dashboard & API |
| **Web** | `internal/web/` | ✅ Done | Hungarian dashboard, auth, deploy pages, asset serving |
| **Sync** | `internal/sync/` | ✅ Done | Git-based app catalog sync (clone/pull, content-hash copy) |
| **Backup** | `internal/backup/` | 📲 Phase 3 | DB dumps, restic snapshots, restore |
| **Monitor** | `internal/monitor/` | 📲 Phase 2 | Health checks, Healthchecks pings, system metrics |
| **Scheduler** | `internal/scheduler/` | 📲 Phase 2 | Cron-like job runner for all periodic tasks |

## Stack Management

### How stacks get onto the machine

1. During initial setup, `deploy-felhom-compose.sh` clones the app catalog
2. Compose files + `.felhom.yml` metadata land in `/opt/docker/stacks//`
3. The controller periodically pulls from Git to detect changes (Phase 4)

### First deployment flow (via dashboard)

1. Customer sees app card with "🚀 Telepítés" (Deploy) button
2. Clicks → deploy page shows:
   - **Auto-filled**: DOMAIN (from controller config), read-only
   - **Auto-generated**: DB passwords, secret keys (shown as "✓ Generated")
   - **User input**: HDD path, admin password, language, etc.
   - **"🎲 Generálás"** (Generate) button next to password fields
3. Clicks "Telepítés" → controller:
   - **Memory validation**: checks `mem_request` against available system RAM (see below)
   - Validates all required fields (password fields must be explicitly filled or generated)
   - Generates auto-secrets (DB passwords, hex keys)
   - Saves `app.yaml` (env vars + locked fields list)
   - Runs `docker compose up -d` with env vars injected
   - Updates in-memory state immediately (no stale "Telepítés" button)
4. Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only
5. "Részletek" (Details) button opens deploy page in read-only mode showing current config

### Memory validation during deploy

Before deploying an app, the controller checks whether enough RAM is available. This uses the Kubernetes-inspired `mem_request` / `mem_limit` model:

| Field | In `.felhom.yml` | Purpose | Validation |
|-------|-------------------|---------|------------|
| `mem_request` | `resources.mem_request: "500M"` | Expected memory usage during normal operation | **Hard block** — sum of requests must not exceed usable RAM |
| `mem_limit` | `resources.mem_limit: "1152M"` | Docker `deploy.resources.limits.memory` total across all containers | **Soft warning** — overcommit is allowed for limits |
| `pi_compatible` | `resources.pi_compatible: true` | Whether the app can run on Raspberry Pi | Display-only hint |
| `needs_hdd` | `resources.needs_hdd: true` | Whether the app needs external storage | Display-only hint |

**How it works:**

- `usable_memory = total_ram - reserved_memory_mb` (default: 384MB reserved for OS + controller)
- If `sum(deployed mem_requests) + new_mem_request > usable_memory` → deploy is **blocked** with error
- If `sum(deployed mem_limits) + new_mem_limit > total_ram` → deploy proceeds with **warning**
- Apps without `mem_request` set are treated as 0MB (never blocked)
- The deploy page shows a memory summary bar before the user clicks deploy

Configure the reserved memory via `controller.yaml`:

```yaml
system:
  reserved_memory_mb: 384  # default
```

### Container state display

The dashboard shows health-aware container states with distinct colors:

| State | Color | Label | Meaning |
|-------|-------|-------|---------|
| Running + healthy | 🟢 Green | "Fut" | All containers running and healthy |
| Running + health: starting | 🟠 Orange | "Indulás..." | Container up but healthcheck not yet passed |
| Running + unhealthy | 🟡 Yellow | "Nem egészséges" | Container running but healthcheck failing |
| Stopped/exited | 🔴 Red | "Leállítva" | All containers stopped |
| Restarting | 🟡 Yellow | "Újraindítás..." | Container in restart loop |
| Not deployed | ⚪ Gray | "Nincs telepítve" | Compose file exists but not yet deployed |

Action buttons adapt: "operational" states (running/starting/unhealthy/restarting) show restart/stop, while stopped states show a start button.

### Update strategy (Phase 4)

Stack updates are classified in the Git repository via markers:

| Marker | Behavior |
|--------|----------|
| No marker | Optional update — shown on dashboard, customer clicks "Update" |
| `UPDATE_REQUIRED=true` | Mandatory — auto-applied during next update window |
| `UPDATE_SECURITY=true` | Critical — applied immediately (within minutes) |

### Protected stacks

The following stacks cannot be stopped from the customer UI:

- `traefik` (reverse proxy)
- `cloudflared` (tunnel)
- `felhom-controller` (this container)

## Logging

The controller uses two-tier logging controlled by `logging.level` in `controller.yaml`:

```yaml
logging:
  level: debug      # debug | info | warn | error (default: info)
  file: ""          # optional log file path
  max_size_mb: 10
  max_files: 3
```

Can also be set via environment variable: `FELHOM_LOGGING_LEVEL=debug`

### What gets logged at each level

| Level | What's logged |
|-------|--------------|
| **info** | Operation success/failure with elapsed time, post-start container states, stack scan counts, deploy memory checks |
| **debug** | All of the above + env var keys per compose command, local image availability checks, compose command timing, log fetch byte counts |

### Example output (debug level)

```
[INFO] Starting stack: immich
[DEBUG] Env vars for compose: [DOMAIN, DB_PASSWORD, HDD_PATH] (3 app + 42 system)
[DEBUG] Running: docker compose up -d (in /opt/docker/stacks/immich)
[DEBUG] Command completed: docker compose up -d (took 12.3s)
[INFO] Stack immich started successfully (took 12.3s)
[INFO] Stack immich post-start status:
[INFO]   immich-server    ghcr.io/immich-app/immich-server:release     running  Up 3 seconds (health: starting)
[INFO]   immich-postgres  docker.io/tensorchord/pgvecto-rs:pg16...     running  Up 3 seconds (healthy)
[INFO]   immich-redis     docker.io/library/redis:7-alpine             running  Up 3 seconds (healthy)
```

On failure:

```
[ERROR] Command failed: docker compose up -d (in /opt/docker/stacks/immich) — exit code 1 (took 2.1s)
[ERROR]   stderr: Error response from daemon: pull access denied...
[ERROR] Stack immich start failed after 2.1s: exit code 1
```

### Security

- Env var **values** are never logged — only keys appear in debug output
- stdout/stderr in error logs are truncated to 500 characters to prevent log spam

## Configuration

### Controller config (infrastructure only)

Single YAML file per customer: `/opt/docker/felhom-controller/controller.yaml`

Contains customer identity, infrastructure secrets, backup/monitoring settings. Does **not** contain app-specific config (HDD paths, DB passwords, etc.).

See `configs/controller.yaml.example` for the full reference.

### Per-app config (created during deployment)

Each deployed app gets an `app.yaml` in its stack directory:

```yaml
# /opt/docker/stacks/paperless-ngx/app.yaml
# Auto-generated by felhom-controller — do not edit locked fields manually
deployed: true
deployed_at: "2026-02-13T21:10:00Z"
env:
  DOMAIN: "demo-felhom.eu"
  DB_PASSWORD: "a7f2b9c1e4d..."         # locked
  PAPERLESS_SECRET_KEY: "8b3e..."       # locked
  PAPERLESS_ADMIN_USER: "admin"         # editable
  PAPERLESS_OCR_LANGUAGE: "hun+eng"     # editable
  HDD_PATH: "/mnt/hdd_placeholder"      # locked
locked_fields:
  - DB_PASSWORD
  - PAPERLESS_SECRET_KEY
  - DOMAIN
  - HDD_PATH
```

### App assets (logos, screenshots)

Baked into the container image at build time — no external dependencies at runtime. Synced from the felhom.eu website repo before building. Served locally at `/static/assets/`. Logos try SVG first, fall back to PNG.

## Build & Deploy

Source: `https://gitea.dooplex.hu/admin/deploy-felhom-compose` → `controller/` subfolder.

Build happens outside the repo in `~/build/felhom-controller/` to keep the repo clean. See `docs/BUILDING.md` for the full guide.

```bash
# Quick build (current platform only)
cd ~/build/felhom-controller
./build.sh 0.2.1

# Build + push to Gitea registry
./build.sh 0.2.1 --push
```

### Deploy on customer node

```bash
# Pull new image
docker pull gitea.dooplex.hu/admin/felhom-controller:0.2.1

# IMPORTANT: use 'up -d', NOT 'restart' — restart doesn't pick up new images
cd /opt/docker/felhom-controller
docker compose up -d
```

## Test Environments

| Node | Hardware | Domain | IP | Status |
|------|----------|--------|----|--------|
| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | ✅ Controller v0.2.1 + Paperless-ngx running |
| pi-customer-1 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | — | 📲 Not yet tested |

### First deployment log (Paperless-ngx on demo-felhom)

- **Date:** 2026-02-13
- **App:** Paperless-ngx (document management)
- **Deploy method:** Dashboard UI → "Telepítés" button
- **Issues encountered & resolved:**
  1. Password fields accepted empty values → Added server-side + client-side validation
  2. "Telepítés" button appeared for already-deployed apps → Fixed in-memory Deployed flag update
  3. Green status shown for `(health: starting)` containers → Added health-aware state parsing
  4. Stack cards switched positions on refresh → Added alphabetical sorting in GetStacks()
  5. "Részletek" button did nothing for deployed apps → Redirects to deploy page (read-only)
  6. OCR crash: the language for `PAPERLESS_OCR_LANGUAGE=hun` was not installed → Added `PAPERLESS_OCR_LANGUAGES` (plural) to docker-compose
  7. Container restart vs recreate: `docker compose restart` doesn't pick up new images → Documented: always use `docker compose up -d`

## REST API

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | `/api/health` | No | Health check (for monitoring) |
| GET | `/api/stacks` | Yes | List all stacks |
| GET | `/api/stacks/{name}` | Yes | Stack details |
| GET | `/api/stacks/{name}/deploy-fields` | Yes | Get deploy form fields |
| POST | `/api/stacks/{name}/deploy` | Yes | First-time deploy with config |
| POST | `/api/stacks/{name}/start` | Yes | Start stack |
| POST | `/api/stacks/{name}/stop` | Yes | Stop stack (unless protected) |
| POST | `/api/stacks/{name}/restart` | Yes | Restart stack |
| POST | `/api/stacks/{name}/update` | Yes | Pull images + recreate |
| GET | `/api/stacks/{name}/logs` | Yes | Container logs |
| POST | `/api/stacks/rescan` | Yes | Trigger manual stack discovery |
| GET | `/api/system/info` | Yes | System resource usage (RAM, disk, HDD) |

## Status & Roadmap

### Phase 1 — Stack Manager + Deploy Flow ✅ COMPLETE

- [x] Project skeleton & config format
- [x] .felhom.yml app metadata format with deploy fields
- [x] Per-app config persistence (app.yaml)
- [x] Secret generation engine (password, hex, static)
- [x] Stack catalog (read compose files + metadata from disk)
- [x] Docker Compose operations (up/down/pull/ps/logs)
- [x] Deploy flow with interactive field input
- [x] Password validation (server-side + client-side, no empty passwords)
- [x] Basic web dashboard with start/stop/deploy buttons
- [x] Health-aware container states (starting/unhealthy/running)
- [x] REST API for stack + deploy operations
- [x] Simple web authentication (bcrypt sessions)
- [x] App assets baked into container (SVG/PNG logos, webp screenshots)
- [x] Container image build pipeline (Dockerfile + build.sh)
- [x] Build + push to Gitea container registry
- [x] Deploy on N100 test node — dashboard accessible
- [x] Stack scanning + display working
- [x] **First app deployed: Paperless-ngx via dashboard** (2026-02-13)
- [x] Periodic stack rescanning (every 2 minutes)
- [x] Alphabetically sorted stack display
- [x] Deploy page doubles as read-only config viewer for deployed apps

### Phase 2 — Monitoring & Health

- [x] System metrics on dashboard (RAM, SSD, HDD usage bars)
- [x] `/api/system/info` endpoint with live resource data
- [x] Pre-deploy memory validation (mem_request hard block, mem_limit soft warning)
- [x] Memory summary bar on deploy page
- [ ] CPU and temperature metrics
- [ ] Healthchecks.io ping integration
- [ ] Customer notifications (email/Telegram)

### Phase 3 — Backups

- [ ] DB dump engine (PostgreSQL, MariaDB/MySQL, SQLite)
- [ ] Restic integration (snapshot, prune, check)
- [ ] Backup status on dashboard
- [ ] Manual backup trigger from UI
- [ ] Restore workflow

### Phase 4 — Git Sync & Updates

- [x] Periodic git pull for stack definitions (git sync module)
- [x] Manual sync button on Alkalmazások (Applications) page ("Sablonok frissítése", i.e. "Refresh templates")
- [x] Sync status in `/api/system/info`
- [ ] Update classification (optional/required/security)
- [ ] Update window enforcement
- [ ] Dashboard update notifications with "Update" button

### Phase 5 — Self-Update & Resilience

- [ ] Self-update check & execution
- [ ] Pre-update config backup
- [ ] Health-based rollback mechanism
- [ ] Config export/import

### Phase 6 — Central Management (future)

- [ ] API authentication for remote management
- [ ] Central dashboard on k3s querying all customer controllers
- [ ] Fleet-wide update management

## Related Repositories

| Repository | Purpose |
|------------|---------|
| [deploy-felhom-compose](https://gitea.dooplex.hu/admin/deploy-felhom-compose) | This repo — controller + deploy scripts |
| [app-catalog-felhom.eu](https://gitea.dooplex.hu/admin/app-catalog-felhom.eu) | Docker Compose templates + .felhom.yml metadata |
| [felhom.eu](https://gitea.dooplex.hu/admin/felhom.eu) | Website + app assets + felhom infra manifests |
| [homelab-manifests](https://gitea.dooplex.hu/admin/homelab-manifests) | k3s cluster manifests (dooplex.hu) |