felhom-controller

Central management container for Felhom home servers.

A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.

Current version: v0.15.5

Architecture
Features
Repository Layout
Configuration
REST API
Build & Deploy
Roadmap

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Customer Hardware (N100 mini PC / Raspberry Pi)                │
│                                                                 │
│  ┌──────────┐   ┌────────────────────────────────────────────┐  │
│  │ Traefik  │   │  felhom-controller (privileged container)  │  │
│  │ (reverse │──▶│                                            │  │
│  │  proxy)  │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Web UI   │  │ Stack Manager           ││  │
│                 │  │ (HU dash │  │ (compose ops, git sync,  ││  │
│  ┌──────────┐   │  │  board)  │  │  deploy, delete, update) ││  │
│  │cloudflared│   │  └──────────┘  └─────────────────────────┘│  │
│  │ (tunnel) │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Backup   │  │ Storage Manager         ││  │
│                 │  │ (3-layer │  │ (disk scan, format,     ││  │
│  ┌──────────┐   │  │  restic) │  │  mount, migrate)        ││  │
│  │ App      │   │  └──────────┘  └─────────────────────────┘│  │
│  │ stacks   │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  │ (docker  │   │  │Scheduler │  │ Monitor & Metrics       ││  │
│  │ compose) │   │  │(cron-like│  │ (health, pings, SQLite  ││  │
│  └──────────┘   │  │  jobs)   │  │  time-series, Chart.js) ││  │
│                 │  └──────────┘  └─────────────────────────┘│  │
│                 │  ┌──────────┐  ┌─────────────────────────┐│  │
│                 │  │ Notify   │  │ REST API + Hub Reporter ││  │
│                 │  │ (email)  │  │ (JSON push to hub)      ││  │
│                 │  └──────────┘  └─────────────────────────┘│  │
│                 └────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │ pings              │ JSON push          │ git pull
         ▼                    ▼                    ▼
  status.felhom.eu      hub.felhom.eu       gitea.dooplex.hu
  (Healthchecks)        (central dashboard)  (stack definitions)

Key Architecture Decisions

Pure Go, no frameworks — stdlib net/http + html/template. Only external deps: bcrypt, yaml.v3, modernc.org/sqlite (pure Go, no CGO).
Privileged container — Required for disk operations (format, mount, fstab), /dev access, and Docker socket control.
/host-dev indirection — Docker overrides /dev with a tmpfs. The host's /dev is mounted at /host-dev to access block devices.
StackDataProvider interface — Breaks circular import between backup and stacks packages. Implemented by stackAdapter in main.go. Provides GetStackHDDPath() for per-drive backup routing.
Atomic file writes — All persistent state (settings.json, app.yaml) written to .tmp then os.Rename for crash safety.
go:embed templates — All HTML/CSS/JS compiled into the binary. No runtime file dependencies.
Europe/Budapest timezone — All scheduled jobs, timestamps, and UI labels use Hungarian timezone.
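
The atomic-write decision above can be illustrated with a minimal sketch. The helper name writeFileAtomic and the demo function roundTrip are hypothetical and not taken from the controller's source; the sketch only assumes the write-to-.tmp-then-os.Rename pattern described in the list.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic (hypothetical name) writes data to "<path>.tmp", flushes it
// to disk, and renames it over path. On POSIX filesystems the rename is
// atomic, so a reader sees either the old file or the new one.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp := path + ".tmp"
	f, err := os.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, perm)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, path)
}

// roundTrip writes a file in a temporary directory and reads it back.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "settings.json")
	if err := writeFileAtomic(path, []byte(content), 0o600); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := roundTrip(`{"storage_paths":[]}`)
	fmt.Println(got, err)
}
```

The Sync call before the rename is a design choice of this sketch: without it, a power loss could leave a renamed but empty file on some filesystems.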

Module Map

Module	Path	Responsibility
Config	`internal/config/`	YAML loader, validation, `FELHOM_*` env overrides
Settings	`internal/settings/`	Runtime-mutable `settings.json` (passwords, backup prefs, storage paths, notifications)
Stacks	`internal/stacks/`	Compose operations, scanning, `.felhom.yml` metadata, deploy/delete flow
Sync	`internal/sync/`	Git-based app catalog sync (clone/pull, content-hash copy)
Backup	`internal/backup/`	Per-drive 3-layer backup: DB dumps → restic snapshots → cross-drive copies, restore
Storage	`internal/storage/`	Disk scanning (`lsblk`), partitioning (`sfdisk`), formatting (`mkfs.ext4`), mounting, data migration (`rsync`)
System	`internal/system/`	System info (`/proc`), CPU collector, mount points, disk usage, FS info
Monitor	`internal/monitor/`	Healthchecks.io pinger, system health checks
Metrics	`internal/metrics/`	SQLite time-series store, system + container metric collection
Scheduler	`internal/scheduler/`	Central job scheduler (periodic + daily, skip-if-running, panic recovery)
Notify	`internal/notify/`	Email notifications via hub relay, preference sync, per-event cooldowns
Report	`internal/report/`	Hub report builder + HTTP pusher (system, stacks, backup, health)
API	`internal/api/`	REST JSON endpoints
Web	`internal/web/`	Hungarian dashboard, auth, page handlers, template functions, alerts

Features

1. App Management

The controller manages Docker Compose stacks through a complete lifecycle: catalog sync, first-time deployment, runtime operations, and deletion.

Git Sync (`internal/sync/`)

The app catalog lives in a separate Git repository. The controller:

Shallow-clones the catalog on startup
Periodically fetches updates (configurable, default 15 min)
Copies only docker-compose.yml and .felhom.yml to the stacks directory
Never overwrites app.yaml or .env (user secrets are safe)
Uses SHA-256 content hashing — only writes files that actually changed
Triggers stack rescan after sync so the dashboard updates immediately
Manual sync via "Sablonok frissitese" button or POST /api/sync
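
A minimal sketch of the content-hash copy described above, under the assumption that each file is compared by its full SHA-256 digest. The function names copyIfChanged and syncTwice and the file names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// copyIfChanged copies src to dst only when their SHA-256 digests differ
// (or dst does not exist yet). It reports whether dst was written.
func copyIfChanged(src, dst string) (bool, error) {
	newData, err := os.ReadFile(src)
	if err != nil {
		return false, err
	}
	oldData, err := os.ReadFile(dst)
	switch {
	case err == nil:
		if sha256.Sum256(newData) == sha256.Sum256(oldData) {
			return false, nil // identical content: skip the write
		}
	case !errors.Is(err, fs.ErrNotExist):
		return false, err
	}
	if err := os.WriteFile(dst, newData, 0o644); err != nil {
		return false, err
	}
	return true, nil
}

// syncTwice copies the same template twice and reports whether each pass wrote.
func syncTwice() (first, second bool, err error) {
	dir, err := os.MkdirTemp("", "sync-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "catalog-compose.yml")
	dst := filepath.Join(dir, "stack-compose.yml")
	if err = os.WriteFile(src, []byte("services: {}\n"), 0o644); err != nil {
		return false, false, err
	}
	if first, err = copyIfChanged(src, dst); err != nil {
		return first, false, err
	}
	second, err = copyIfChanged(src, dst)
	return first, second, err
}

func main() {
	first, second, err := syncTwice()
	fmt.Println(first, second, err)
}
```

The first pass writes the file because the destination is missing; the second pass finds equal digests and skips the write.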

First-Time Deploy Flow

Customer sees app card with "Telepites" button
Deploy page shows auto-filled fields (domain), auto-generated secrets (DB passwords, hex keys), and user-configurable inputs (admin password, language, storage path)
checkBeforeDeploy() JS guard fetches live state first (prevents double-deploy from another tab)
Memory validation checks mem_request against available RAM:
- usable_memory = total_ram - reserved_memory_mb (default 384MB reserved)
- Hard block if requests exceed usable memory
- Soft warning if limits exceed total RAM (overcommit OK)
Controller generates secrets, saves app.yaml, and sets the in-memory Deployed flag before running docker compose up -d (this avoids a stale UI during slow image pulls); it reverts on failure
3-step progress panel polls GET /api/stacks/{name} every 3s: config saved → containers starting → health check passed
Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only
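
The memory validation step in the flow above can be sketched as follows. This is an illustration only: the function name checkMemory is hypothetical, and it assumes that requestsMB and limitsMB are the summed mem_request and mem_limit values of all deployed apps plus the app being deployed.

```go
package main

import "fmt"

// memCheck is the outcome of the pre-deploy memory validation.
type memCheck struct {
	Blocked bool // hard block: requests do not fit into usable memory
	Warning bool // soft warning: limits overcommit total RAM
}

// checkMemory applies the two rules from the deploy flow:
// usable = total - reserved; block if requests > usable; warn if limits > total.
func checkMemory(totalMB, reservedMB, requestsMB, limitsMB int) memCheck {
	usableMB := totalMB - reservedMB
	return memCheck{
		Blocked: requestsMB > usableMB,
		Warning: limitsMB > totalMB,
	}
}

func main() {
	// 8 GB machine with the default 384 MB reserved.
	fmt.Printf("%+v\n", checkMemory(8192, 384, 4096, 9000))
}
```

With these inputs the deploy is allowed (4096 MB of requests fit into 7808 MB usable) but carries the overcommit warning, because the limits exceed total RAM.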

App Info Pages

Each app can define rich metadata in .felhom.yml:

app_info: tagline, use_cases, first_steps, prerequisites, default_creds, docs_url
optional_config: groups of post-deploy configurable env vars (e.g., API keys for metadata providers)
resources: mem_request, mem_limit, pi_compatible, needs_hdd

The /apps/{slug} page renders hero section, screenshots, setup guide, and optional config form.

Stack Operations

Operation	What it does
Start	`docker compose up -d`
Stop	`docker compose stop` (blocked for protected stacks)
Restart	`docker compose restart`
Update	`docker compose pull` + `docker compose up -d`
Delete	`docker compose down --rmi local --volumes` + optional HDD data cleanup

Protected stacks (traefik, cloudflared, felhom-controller) cannot be stopped or deleted from the UI. Restart is allowed.

Orphan detection: Deployed stacks with no matching catalog template are marked as orphaned with an "Elavult" badge and can be safely deleted.

Container State Display

State	Color	Label	Meaning
Running + healthy	Green	"Fut"	All containers running and healthy
Running + starting	Orange	"Indulas..."	Healthcheck not yet passed
Running + unhealthy	Yellow	"Nem egeszseges"	Healthcheck failing
Stopped/exited	Red	"Leallitva"	All containers stopped
Restarting	Yellow	"Ujrainditas..."	Restart loop
Not deployed	Gray	"Nincs telepitve"	Compose file exists, not deployed

2. Backup System

The backup system uses a three-tier design modeled on the 3-2-1 backup rule. Tiers 1 and 2 are implemented; Tier 3 (remote) is planned. Each implemented tier is a complete, self-sufficient backup: any single tier can fully restore an app.

Tier	Contents	Location	Can fully restore?
1. Nightly restic	DB + Config + User data	Same drive as app	Yes (but not after a drive failure)
2. Cross-drive	DB + Config + User data	Different physical device	Yes
3. Remote	Everything	Cloud / remote server	Future

Key principles:

User data backup is mandatory — every app with HDD bind mounts is included automatically. There is no per-app toggle.
Each tier includes everything needed to restore: DB dumps, config, and user data. No tier depends on another tier's data.
Tier 2 is configurable for ALL apps — not just apps with HDD data. Non-HDD apps back up config + DB dumps to the secondary drive (small but protects against drive failure).
The AppBackupPrefs.Enabled field in settings.json is legacy and not read by any code.

Per-app Tier 2 contents by app type:

App type	Tier 2 contents	Example
HDD + DB	Config + DB + User data	Immich, Paperless-ngx
HDD, no DB	Config + User data	—
DB, no HDD	Config + DB	Mealie, Vikunja
Config only	Config	Gokapi, Homepage

Tier 1: Nightly Backup (mandatory, same drive)

The nightly backup has two phases that run sequentially. All paths are per-drive — each physical drive gets its own restic repo and per-app DB dump directories.

Drive layout (v0.14.1):

<drive>/
├── appdata/<app>/              ← app user data
└── backups/
    └── primary/
        ├── restic/             ← one restic repo per drive (all apps on this drive)
        └── <app>/db-dumps/     ← per-app DB dump files

Path computation is centralized in backup/paths.go:

PrimaryResticRepoPath(drivePath) → <drive>/backups/primary/restic/
AppDBDumpPath(drivePath, stackName) → <drive>/backups/primary/<stack>/db-dumps/
AppDataDir(drivePath, stackName) → <drive>/appdata/<stack>/
SecondaryInfraPath(drivePath) → <drive>/backups/secondary/_infra/
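
A sketch of what these helpers might look like. The function names and directory layout come from the list above; the signatures and the use of filepath.Join are assumptions, and Join does not emit the trailing slash shown in the list.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// PrimaryResticRepoPath returns the per-drive restic repository.
func PrimaryResticRepoPath(drivePath string) string {
	return filepath.Join(drivePath, "backups", "primary", "restic")
}

// AppDBDumpPath returns the per-app DB dump directory on the app's drive.
func AppDBDumpPath(drivePath, stackName string) string {
	return filepath.Join(drivePath, "backups", "primary", stackName, "db-dumps")
}

// AppDataDir returns the app's user-data directory.
func AppDataDir(drivePath, stackName string) string {
	return filepath.Join(drivePath, "appdata", stackName)
}

// SecondaryInfraPath returns the infrastructure config mirror on a
// secondary (cross-drive) destination.
func SecondaryInfraPath(drivePath string) string {
	return filepath.Join(drivePath, "backups", "secondary", "_infra")
}

func main() {
	drive := "/mnt/hdd_1" // example mount name
	fmt.Println(PrimaryResticRepoPath(drive))
	fmt.Println(AppDBDumpPath(drive, "immich"))
	fmt.Println(AppDataDir(drive, "immich"))
	fmt.Println(SecondaryInfraPath(drive))
}
```

Keeping every path computation in one file means a layout change touches a single place instead of every caller.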

Phase 1 — Database Dumps (internal/backup/dbdump.go, scheduled 02:30)

Auto-discovery of PostgreSQL and MariaDB containers via docker ps + docker inspect
Dumps via docker exec pg_dump / docker exec mariadb-dump with 5-minute timeout
Dumps are written to the app's home drive: AppDBDumpPath(appDrive, stackName)
Atomic writes (.tmp → .sql) to prevent corruption
Validation after each dump: checks file size, header presence, counts CREATE TABLE
Validation results are cached in settings.json, so they survive container restarts
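
The post-dump validation (size, header, CREATE TABLE count) could look roughly like the sketch below. The function name validateDump, the header strings, and the 1 KiB header window are assumptions; the actual checks in dbdump.go may differ, and header text can vary between dump tool versions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// dumpReport summarizes a validated SQL dump.
type dumpReport struct {
	SizeBytes int
	Tables    int
}

// knownHeaders lists comment headers assumed to appear near the top of
// plain-text dumps.
var knownHeaders = []string{
	"-- PostgreSQL database dump",
	"-- MariaDB dump",
	"-- MySQL dump",
}

// validateDump rejects empty dumps and dumps without a recognizable header,
// then counts CREATE TABLE statements as a rough completeness signal.
func validateDump(sql string) (dumpReport, error) {
	r := dumpReport{SizeBytes: len(sql)}
	if r.SizeBytes == 0 {
		return r, errors.New("dump is empty")
	}
	head := sql
	if len(head) > 1024 {
		head = head[:1024] // only inspect the start of the file
	}
	found := false
	for _, h := range knownHeaders {
		if strings.Contains(head, h) {
			found = true
			break
		}
	}
	if !found {
		return r, errors.New("dump header not found")
	}
	r.Tables = strings.Count(sql, "CREATE TABLE")
	return r, nil
}

func main() {
	sample := "--\n-- PostgreSQL database dump\n--\nCREATE TABLE a (id int);\n"
	fmt.Println(validateDump(sample))
}
```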

Phase 2 — Restic Snapshot (internal/backup/restic.go, scheduled 03:00)

Apps are grouped by drive via groupStacksByDrive() — each drive's apps are backed up to that drive's restic repo
App drive resolution: GetStackHDDPath() (from StackDataProvider) → falls back to SystemDataPath
Auto-generated repository password (32 random bytes, base64url), shared across all repos, synced to hub
Paths included in every per-drive snapshot:
- Per-app DB dump dirs on that drive
- Per-app HDD mount paths (user data)
- Stacks dir (compose.yml + app.yaml + .felhom.yml for all apps)
- controller.yaml (controller config)
Auto-detects and removes stale restic repository locks
Weekly prune on Sundays with configurable retention (keep-daily, keep-weekly, keep-monthly)
Weekly integrity check (restic check) on Sunday 04:00 — checks all primary repos
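
The drive grouping with its fallback to SystemDataPath can be sketched like this. The function name groupStacksByDrive appears above, but this signature is an assumption: the hddPath callback stands in for GetStackHDDPath() and is assumed to return an empty string for apps without HDD data.

```go
package main

import (
	"fmt"
	"sort"
)

// groupStacksByDrive maps each drive path to the stacks whose data lives on
// it. Stacks without an HDD path fall back to systemDataPath.
func groupStacksByDrive(stacks []string, hddPath func(string) string, systemDataPath string) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range stacks {
		drive := hddPath(s)
		if drive == "" {
			drive = systemDataPath
		}
		groups[drive] = append(groups[drive], s)
	}
	for d := range groups {
		sort.Strings(groups[d]) // stable order for logs and UI
	}
	return groups
}

func main() {
	hdd := func(stack string) string {
		if stack == "immich" {
			return "/mnt/hdd_1"
		}
		return ""
	}
	groups := groupStacksByDrive([]string{"mealie", "immich", "vikunja"}, hdd, "/opt/felhom/data")
	fmt.Println(groups)
}
```

Each map key then corresponds to one restic repository and one snapshot run per night. The path /opt/felhom/data is a placeholder, not the controller's real default.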

Protects against accidental deletion and data corruption, and allows point-in-time rollback. Does NOT protect against drive failure (the backup is on the same physical drive).

Tier 2: Cross-Drive Backup (opt-in, different device) (`internal/backup/crossdrive.go`)

Complete backup to a different physical drive. Available for all apps — apps with HDD data back up config + DB + user data; apps without HDD back up config + DB dumps only.

Auto-enable for small apps (v0.14.1): Apps without HDD mounts (config-only, DB-only) are automatically configured for daily rsync Tier 2 when ≥2 storage paths are registered. AutoEnableSmallApps() runs at the start of each nightly backup cycle. Never overwrites existing user-configured cross-drive settings (even disabled ones).
Infrastructure config backup (v0.14.1): syncInfraConfig() rsyncs the stacks directory and controller.yaml to <dest>/backups/secondary/_infra/ on every secondary destination drive. Runs before per-app backups. Cross-drive restic also includes infra paths.
Two methods:
- rsync — Simple mirror with --delete (fast, no versioning, browsable on disk)
- restic — Versioned, deduplicated, encrypted (shared repo across apps, not browsable)
Per-app configuration in settings.json: destination path, method, schedule (daily/weekly/manual)
Pre-backup DB dump: DumpStackDB() runs fresh pg_dump/mariadb-dump before each cross-drive backup; non-fatal on failure (wired via DBDumper interface to avoid circular imports)
Empty mounts allowed: RunAppBackup accepts apps with no HDD mounts — the rsync mount loop simply doesn't execute, but DB + config copy still runs
Drive-type-aware validation (ValidateDestination):

Destination type	Space checks
External mount (different device than `/`)	Block if <100 MB free
System drive (same device as `/`)	Require ≥10 GB free AND <90% used; logged warning
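
The space checks in the table can be expressed as a small decision function. This is a sketch, not the body of ValidateDestination: the name validateDestination, the return shape, and the binary (MiB/GiB) interpretation of the thresholds are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	minFreeExternal = 100 << 20 // 100 MiB
	minFreeSystem   = 10 << 30  // 10 GiB
)

// validateDestination returns an error when the destination must be blocked
// and a non-empty warning when it is allowed but sits on the system drive.
func validateDestination(onSystemDrive bool, freeBytes, totalBytes uint64) (warning string, err error) {
	if totalBytes == 0 || freeBytes > totalBytes {
		return "", errors.New("invalid disk size information")
	}
	if !onSystemDrive {
		if freeBytes < minFreeExternal {
			return "", errors.New("less than 100 MB free on destination")
		}
		return "", nil
	}
	usedPct := float64(totalBytes-freeBytes) / float64(totalBytes) * 100
	if freeBytes < minFreeSystem || usedPct >= 90 {
		return "", errors.New("system drive needs at least 10 GB free and less than 90% used")
	}
	return "destination is on the system drive", nil
}

func main() {
	fmt.Println(validateDestination(false, 200<<20, 1<<40))
	fmt.Println(validateDestination(true, 20<<30, 100<<30))
	fmt.Println(validateDestination(true, 5<<30, 100<<30))
}
```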

Secondary drive layout (v0.14.1):

<dest-drive>/backups/secondary/
├── _infra/              ← infrastructure config mirror (v0.14.1)
│   ├── controller.yaml
│   └── stacks/          ← full stacks dir (all app configs)
├── <app>/rsync/         ← per-app rsync mirror
│   ├── _db/             ← DB dump files
│   ├── _config/         ← compose.yml, app.yaml, .felhom.yml
│   └── <user data>      ← HDD mount contents (if app has HDD data)
└── restic/              ← shared restic repo (all cross-drive apps)

DB dump files read from per-app home drive path (AppDBDumpPath)
_ prefix directories prevent collision with user data
For non-HDD apps, only _db/ and _config/ are present (no user data directory)

Restic backup paths: includes HDD mounts (if any) + config dir + per-app DB dump dir from home drive + stacks dir + controller.yaml (infra, v0.14.1)
Safety guards: destination ≠ source, path-overlap check (HDD mounts only), writable check
Chained execution: runs immediately after nightly restic — daily apps every night, weekly apps on Sundays
Per-app concurrency lock prevents overlapping runs
Status (last_run, duration, size, error) persisted to settings.json

Protects against: primary drive failure, drive theft/damage.

Tier 3: Remote Backup (future)

Complete offsite backup for disaster recovery. Not yet implemented. Placeholder shown in UI ("3. mentés — Hamarosan").

Restore (`internal/backup/restore.go`)

All deployed apps appear in the restore dropdown — every app has restic snapshot data (stacks dir + DB dumps are always backed up).

App type	Config restored	DB restored	User data restored
Has HDD data	Yes	Yes	Yes (always — backup is mandatory)
DB only, no HDD	Yes	Yes	n/a
No DB, no HDD	Yes	—	n/a

Snapshot API returns ALL snapshots unfiltered — older snapshots still allow config+DB restore; RestoreApp extracts whatever paths are available
Restore type info shown per-app when selected in dropdown (Hungarian banners):
- Has HDD: "Teljes visszaállitas: adatbazis + konfiguracio + felhasznaloi adatok"
- Has DB, no HDD: "Adatbazis es konfiguracio visszaallitasa"
- No DB, no HDD: "Csak konfiguracio visszaallitasa"
Execution flow: stop app → resolve app's home drive → restic restore <id> --target / --include <path>... from per-drive repo → restart app
Restic repo resolved via PrimaryResticRepoPath(appDrivePath)
DB dumps restored from AppDBDumpPath(appDrivePath, stackName)
Running flag prevents concurrent backup/restore operations
Snapshot ID validated (8-64 lowercase hex)
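
The snapshot ID check and the restore command line above can be sketched as follows. The helper names validSnapshotID and restoreArgs are hypothetical; the restic flags used (-r, restore, --target, --include) are standard, and the repository password is assumed to be supplied separately, for example through the RESTIC_PASSWORD environment variable.

```go
package main

import (
	"fmt"
	"regexp"
)

// snapshotIDRe accepts 8 to 64 lowercase hexadecimal characters.
var snapshotIDRe = regexp.MustCompile(`^[0-9a-f]{8,64}$`)

// validSnapshotID reports whether id looks like a restic snapshot ID.
func validSnapshotID(id string) bool {
	return snapshotIDRe.MatchString(id)
}

// restoreArgs builds the argument list for "restic restore" into "/",
// limited to the given include paths.
func restoreArgs(repo, snapshotID string, includes []string) ([]string, error) {
	if !validSnapshotID(snapshotID) {
		return nil, fmt.Errorf("invalid snapshot ID %q", snapshotID)
	}
	args := []string{"-r", repo, "restore", snapshotID, "--target", "/"}
	for _, p := range includes {
		args = append(args, "--include", p)
	}
	return args, nil
}

func main() {
	args, err := restoreArgs(
		"/mnt/hdd_1/backups/primary/restic",
		"4923afa6",
		[]string{"/mnt/hdd_1/appdata/immich", "/mnt/hdd_1/backups/primary/immich/db-dumps"},
	)
	fmt.Println(args, err)
}
```

Validating the ID before it reaches an exec call rejects values such as "latest" or strings with shell metacharacters, which matters because the ID arrives from the web UI.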

Note: Restore currently uses Tier 1 (primary restic repo on app's home drive) only. Restoring from Tier 2 (cross-drive) is a future enhancement.

Backup Page UI (`internal/web/templates/backups.html`)

Unified per-app status table with expandable rows showing per-tier backup status:

Status dot per app:

Dot color	Meaning
Green	2+ tiers configured with successful backups + destination healthy
Yellow	Only 1 tier, or Tier 2 failing, or Tier 2 configured but never run
Red	Tier 2 destination blocked or inaccessible

Every app starts as yellow (1 tier only). Green requires Tier 2 configured with successful backup.

Per-app backup tiers (3 rows per app):

1. mentes (Tier 1, always present) — Auto badge + "helyi" + last run + contents (e.g., "DB + Konfig + Adatok")
2. mentes (Tier 2, configurable for ALL apps) — one of:
- Configured: method (rsync/restic) + destination + schedule + last run + status + contents + browsable indicator (folder icon for rsync) + action buttons
- Not configured: "1. mentes auto" + "Nincs 2. masolat" + settings link
3. mentes (Tier 3, placeholder) — grayed out "Hamarosan" + "tavoli (offsite)" + future note

Backup contents per app (shown per tier):

Apps with DB + HDD: "DB + Konfig + Adatok"
Apps with DB only: "DB + Konfig"
Apps with HDD, no DB: "Konfig + Adatok"
Apps with neither: "Konfig"

Deploy page shows cross-drive (Tier 2) configuration form for all deployed apps, not just those with HDD data. Non-HDD apps can configure destination, method, and schedule.

Other sections:

Schedule overview with next run times for DB dump, restic, prune
Snapshot history table (last 20 snapshots aggregated from all per-drive repos, sorted by time)
Storage overview card (total size across repos, snapshot count, DB dump count/size, encryption key with show/copy)
Restore section: app dropdown → snapshot dropdown → restore type info → confirmation checkbox → execute

3. Storage Management

The storage subsystem handles the full lifecycle of external storage: detection, initialization, path registration, and data migration.

Disk Scanning (`internal/storage/scan.go`)

ScanDisks() uses lsblk -J -b for block device enumeration
System disk detection via host fstab parsing (/host-fstab) + UUID resolution via blkid
Partitions enriched with filesystem type, UUID, and label from direct blkid probing (Docker containers have incomplete udev cache)
Returns AvailableDisks (non-system, non-loop, non-CDROM) and SystemDisks separately
Handles NVMe (nvme0n1p1), SCSI (sdb1), and eMMC (mmcblk0p1) naming
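
As an illustration of the enumeration step, the sketch below parses lsblk -J -b style JSON and filters out non-disk and system devices. The struct and function names are hypothetical, the sample JSON is hand-written, and the set of system disks is passed in directly rather than derived from fstab as the controller does. It also assumes a util-linux version that emits sizes as JSON numbers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockDevice holds the subset of lsblk JSON fields this sketch needs.
type blockDevice struct {
	Name     string        `json:"name"`
	Type     string        `json:"type"`
	Size     int64         `json:"size"`
	Children []blockDevice `json:"children"`
}

type lsblkOutput struct {
	BlockDevices []blockDevice `json:"blockdevices"`
}

// availableDisks returns whole disks that are not system disks.
// Loop devices and optical drives have other type values and are skipped.
func availableDisks(raw []byte, systemDisks map[string]bool) ([]blockDevice, error) {
	var out lsblkOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	var disks []blockDevice
	for _, d := range out.BlockDevices {
		if d.Type != "disk" || systemDisks[d.Name] {
			continue
		}
		disks = append(disks, d)
	}
	return disks, nil
}

// sample is hand-written example data, not captured output.
const sample = `{"blockdevices":[
 {"name":"loop0","type":"loop","size":4096},
 {"name":"sr0","type":"rom","size":1073741312},
 {"name":"nvme0n1","type":"disk","size":256060514304,
  "children":[{"name":"nvme0n1p1","type":"part","size":256059465728}]},
 {"name":"sdb","type":"disk","size":1000204886016,
  "children":[{"name":"sdb1","type":"part","size":1000203837440}]}
]}`

func main() {
	disks, err := availableDisks([]byte(sample), map[string]bool{"nvme0n1": true})
	fmt.Println(disks, err)
}
```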

Disk Initialization Wizard (`internal/storage/format.go`)

A step-by-step UI at /settings/storage/init:

Scan — Lists available disks with model, size, partition info
Select — User picks a disk and enters a mount name (e.g., hdd_1)
Confirm — User types "FORMAZAS" to confirm destructive operation
Format pipeline: wipefs → sfdisk (GPT) → mkfs.ext4 → blkid UUID → backup fstab → append UUID-based fstab entry → mount → findmnt verification → chown 1000:1000 → create appdata/, backups/, and Dokumentumok/ subdirectories
Auto-registers new storage path in settings.json
Smart partition detection: skips repartitioning for existing empty partitions

Safety guards: system disk detection, mount path conflict check, confirmation required, progress channel for real-time UI feedback.

Attach Existing Drive Wizard (`internal/storage/attach.go`)

A step-by-step UI at /settings/storage/attach for drives that already have a filesystem (e.g., a previously used ext4 drive). Unlike the init wizard, this does not format the drive — existing data is preserved.

Problem solved: Mounting a whole drive at /mnt/<name> would mix existing user data with the controller's directory structure (storage/, Dokumentumok/, backup repos). The bind-mount approach isolates the controller's working directory from other data on the drive.

Scan — Lists available disks, filtered to partitions that have an existing filesystem (FSType != "")
Mount raw — Partition is mounted read-only at a hidden staging path (/mnt/.felhom-raw/<label>)
Browse — Directory browser shows the drive's contents. User can navigate and create a new folder (e.g., felhom_data)
Configure — User enters a mount name and display label. Warning: mount path is immutable until detached
Finalize — Bind-mounts the selected subfolder at /mnt/<name>. Two fstab entries are created (both with nofail):
- Raw mount: UUID=<uuid> /mnt/.felhom-raw/<x> <fstype> defaults,nofail,noatime 0 2
- Bind mount: /mnt/.felhom-raw/<x>/<subfolder> /mnt/<name> none bind,nofail 0 0
Sets permissions (chown 1000:1000), creates storage/ and Dokumentumok/ subdirectories
Auto-registers the storage path in settings.json + syncs FileBrowser mounts
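
The two fstab entries from the Finalize step can be generated as in the sketch below. The function name attachFstabEntries and its parameters are hypothetical; the line formats follow the entries listed above. Paths containing spaces would need fstab escaping (\040), which this sketch does not handle.

```go
package main

import (
	"fmt"
	"path"
)

// attachFstabEntries builds the raw-mount and bind-mount fstab lines for an
// attached drive. rawName is the staging directory name under
// /mnt/.felhom-raw, subfolder is the directory chosen in the browser step.
func attachFstabEntries(uuid, fsType, rawName, subfolder, mountName string) (rawLine, bindLine string) {
	rawMount := path.Join("/mnt/.felhom-raw", rawName)
	rawLine = fmt.Sprintf("UUID=%s %s %s defaults,nofail,noatime 0 2", uuid, rawMount, fsType)
	bindLine = fmt.Sprintf("%s %s none bind,nofail 0 0",
		path.Join(rawMount, subfolder), path.Join("/mnt", mountName))
	return rawLine, bindLine
}

func main() {
	raw, bind := attachFstabEntries("1234-abcd", "ext4", "disk1", "felhom_data", "hdd_1")
	fmt.Println(raw)
	fmt.Println(bind)
}
```

The nofail option on both lines lets the host boot even when the drive is unplugged, and the raw entry must precede the bind entry so the bind source exists when it is mounted.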

Cancel at any point cleans up the temporary raw mount. The bind mount path (/mnt/<name>) is a real mount point, so all existing code (disk usage, IsMountPoint checks, etc.) works unchanged.

Storage Path Registry (`internal/settings/settings.go`)

Multiple external storage paths supported with:

Label: Human-readable name (editable inline)
Default flag: New deploys use this path by default
Schedulable flag: Path appears in deploy dropdown
Auto-discovery: On startup, scans deployed apps' HDD_PATH values and registers unknown paths
Thread-safe CRUD: Add, Remove, SetDefault, SetSchedulable, SetLabel

Data Migration (`internal/storage/migrate.go`)

Move app data between storage paths (e.g., SSD → HDD, HDD → new HDD):

Validate: stack exists, deployed, has HDD data, target differs from source
Estimate total size, check free space on target
Stop the application
rsync -a --info=progress2 per mount path with real-time progress parsing
Update app.yaml HDD_PATH to new location
Start the application
Rollback on failure: reverts config, restarts on old storage
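
The real-time progress parsing in the rsync step might be sketched as below. The function name parseProgress and the regular expression are assumptions, as is the sample line, which follows the usual --info=progress2 layout with comma thousands separators. rsync separates progress updates with carriage returns, so a real reader has to split its output on '\r' as well as '\n'.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// progressRe matches the leading "<bytes>  <percent>%" of a progress2 line.
var progressRe = regexp.MustCompile(`^\s*([0-9,]+)\s+([0-9]{1,3})%`)

// parseProgress extracts transferred bytes and percentage from one line.
// ok is false for lines that are not progress updates.
func parseProgress(line string) (transferred int64, percent int, ok bool) {
	m := progressRe.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	n, err := strconv.ParseInt(strings.ReplaceAll(m[1], ",", ""), 10, 64)
	if err != nil {
		return 0, 0, false
	}
	p, err := strconv.Atoi(m[2])
	if err != nil {
		return 0, 0, false
	}
	return n, p, true
}

func main() {
	line := "    1,238,099  12%  146.38kB/s    0:00:08 (xfr#5, to-chk=169/396)"
	fmt.Println(parseProgress(line))
}
```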

Progress UI at /stacks/{name}/migrate with byte counter and percentage.

Stale Data Cleanup

After migration, the deploy page detects leftover data on previous storage paths:

Shows path, size, and a delete button
Two-step confirmation required
Protected paths (appdata, backups, media, Dokumentumok) cannot be deleted

FileBrowser Mount Sync

When storage paths are added or removed, syncFileBrowserMounts() auto-regenerates FileBrowser's docker-compose.yml with volume mounts for all registered paths, then recreates the container.

4. Monitoring & Health

System Health Checks (`internal/monitor/healthcheck.go`)

RunHealthCheck() evaluates multiple subsystems and returns a HealthReport with status (ok/warn/fail):

Check	Warning	Critical
Disk usage (SSD/HDD)	>= 90%	>= 95%
Memory	available < 512 MB	available < 256 MB
CPU temperature	>= 75 °C	>= 85 °C
Docker daemon	—	unreachable
Protected containers	—	not running
Storage paths	not a mount point (data on SSD)	path inaccessible, disk >= 95%

Backup destination validation (CheckBackupDestination) has tiered checks:

Path doesn't exist → critical/blocked
Not writable → critical/blocked
Same block device as root → warning (data on system drive)
Disk >95% full → critical/blocked
Disk >90% full → warning

Healthchecks.io Integration (`internal/monitor/pinger.go`)

Five ping UUIDs for external monitoring:

Heartbeat: every 5 min (simple "I'm alive")
System Health: periodic health check results
DB Dump: after nightly database dumps
Backup: after nightly restic backup
Backup Integrity: weekly restic check result

3-attempt retry with 2-second backoff. Pinger never fails the caller.
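
A sketch of the retry behavior, assuming a fixed delay between attempts. The names retry, pingQuietly, and errTemporary are hypothetical; the point is that the caller only ever receives a boolean and is never failed by a monitoring outage.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary is a stand-in error for the demo.
var errTemporary = errors.New("temporary failure")

// retry runs fn up to attempts times, sleeping backoff between tries.
func retry(attempts int, backoff time.Duration, fn func() error) error {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(backoff)
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

// pingQuietly tries a ping three times and swallows the final error, so a
// monitoring outage can never fail the job that triggered the ping.
func pingQuietly(ping func() error, backoff time.Duration) bool {
	if err := retry(3, backoff, ping); err != nil {
		fmt.Println("ping failed:", err)
		return false
	}
	return true
}

func main() {
	calls := 0
	ok := pingQuietly(func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	}, 10*time.Millisecond) // the controller's documented backoff is 2 seconds
	fmt.Println(ok, calls)
}
```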

Metrics Store (`internal/metrics/`)

SQLite with WAL mode for concurrent reads during collection
System metrics: CPU%, memory (total/used/available), temperature, load average — collected every 60 seconds
Container metrics: CPU%, memory, network I/O, block I/O per container
Downsampled queries for chart time ranges (1h, 6h, 24h, 7d, 30d)
30-day auto-prune via daily scheduler job

Monitoring Page

Full-page system monitor at /monitoring:

System Overview: hostname, OS, kernel, CPU model/cores, uptime
System Metrics Charts: 4 line charts (CPU, Memory, Temperature, Load) in 2x2 grid
Container Resources: horizontal bar charts (CPU% and Memory per container)
Per-container Detail: click-to-expand historical charts
Remote Monitoring Status: shows Healthchecks ping UUID configuration

Chart.js 4.4.7 embedded locally (works in offline environments), dark theme matching site design.

Alert System (`internal/web/alerts.go`)

State-based alerts displayed on all pages:

Sources: health issues, missing ping UUIDs, backup disabled
Sorted by severity (error > warning > info), capped at 5 visible
Refreshed every 5 min + on startup
Monitoring page suppresses ping-related alerts (shown in dedicated table instead)

5. Notifications

Email Delivery

The controller relays notifications through the central hub, which sends emails via the Resend API:

Controller detects event (health degradation, backup failure, etc.)
Non-blocking POST to hub's /api/v1/notify with event details
Hub checks customer notification preferences
Hub sends Hungarian-language email via Resend

Event Types

Event	Trigger
`disk_warning`	Disk usage crosses warning/critical threshold
`backup_failed`	Nightly backup or DB dump fails
`update_available`	New app version detected in catalog
`security_update`	Critical security update available

Cooldown System

Per-event-type cooldown (default 6 hours, configurable) prevents notification spam. Only notifies on status degradation (ok→warn, ok→fail, warn→fail), not on repeated same-status checks.
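
The interaction between the degradation rule and the cooldown can be sketched as follows. The type and method names are hypothetical, and one behavior is an assumption of this sketch rather than something the text states: an escalation (warn→fail) that happens inside the cooldown window is suppressed as well.

```go
package main

import (
	"fmt"
	"time"
)

// severity orders the health states so that degradation is a simple compare.
var severity = map[string]int{"ok": 0, "warn": 1, "fail": 2}

// notifier tracks the last send time per event type.
type notifier struct {
	cooldown time.Duration
	lastSent map[string]time.Time
}

// shouldNotify returns true only when the status got worse and the event
// type is outside its cooldown window. It records the send time on success.
func (n *notifier) shouldNotify(event, prev, curr string, now time.Time) bool {
	if severity[curr] <= severity[prev] {
		return false // same or improved status
	}
	if last, ok := n.lastSent[event]; ok && now.Sub(last) < n.cooldown {
		return false // still cooling down
	}
	n.lastSent[event] = now
	return true
}

// demo replays a short scenario with the default 6-hour cooldown.
func demo() []bool {
	n := &notifier{cooldown: 6 * time.Hour, lastSent: map[string]time.Time{}}
	t0 := time.Date(2026, 2, 19, 3, 0, 0, 0, time.UTC)
	return []bool{
		n.shouldNotify("disk_warning", "ok", "warn", t0),
		n.shouldNotify("disk_warning", "warn", "warn", t0.Add(1*time.Hour)),
		n.shouldNotify("disk_warning", "warn", "fail", t0.Add(2*time.Hour)),
		n.shouldNotify("disk_warning", "ok", "fail", t0.Add(7*time.Hour)),
		n.shouldNotify("backup_failed", "ok", "fail", t0.Add(7*time.Hour)),
	}
}

func main() {
	fmt.Println(demo())
}
```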

Preference Sync

Notification preferences (email, enabled events, cooldown) are:

Stored locally in settings.json
Synced to hub on save and on controller startup
Hub sync failure doesn't block local save

6. Update Management

App Catalog Sync

Periodic git fetch + git reset --hard of the app catalog repo
Content-hash comparison prevents unnecessary file writes
Post-sync stack rescan detects new/changed apps immediately

Planned Update Classifications

Marker	Behavior
No marker	Optional — shown on dashboard, customer clicks "Update"
`UPDATE_REQUIRED=true`	Mandatory — auto-applied during next update window
`UPDATE_SECURITY=true`	Critical — applied immediately

7. Authentication & Settings

Session Auth (`internal/web/auth.go`)

bcrypt password verification with configurable source priority: settings.json → controller.yaml → no auth (open access)
7-day session duration with random 32-byte hex tokens
?next= redirect after login preserves the page the user was visiting
Session cleanup every 15 minutes
All sessions invalidated on password change
Conditional logout link (hidden when auth is disabled)
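
Session creation as described above could be sketched like this. The names are hypothetical, and the sketch reads "random 32-byte hex tokens" as 32 random bytes encoded to 64 hex characters, which is an interpretation rather than a confirmed detail.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// sessionTTL is the 7-day session duration.
const sessionTTL = 7 * 24 * time.Hour

// session pairs an opaque token with its expiry time.
type session struct {
	Token   string
	Expires time.Time
}

// newSession creates a token from 32 cryptographically random bytes.
func newSession(now time.Time) (session, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return session{}, err
	}
	return session{Token: hex.EncodeToString(buf), Expires: now.Add(sessionTTL)}, nil
}

// valid reports whether the session has not expired at time now.
func (s session) valid(now time.Time) bool {
	return now.Before(s.Expires)
}

// demo creates a session and checks it now and eight days later.
func demo() (tokenLen int, validNow, validAfter8Days bool, err error) {
	now := time.Now()
	s, err := newSession(now)
	if err != nil {
		return 0, false, false, err
	}
	return len(s.Token), s.valid(now), s.valid(now.Add(8 * 24 * time.Hour)), nil
}

func main() {
	fmt.Println(demo())
}
```

A periodic cleanup job, such as the 15-minute one mentioned above, would delete every stored session for which valid returns false.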

Settings Persistence (`internal/settings/settings.go`)

Runtime-mutable settings in settings.json (separate from infrastructure config):

Section	Contents
`password_hash`	bcrypt hash override
`notifications`	email, enabled events, cooldown hours
`db_validations`	per-DB dump validation results (survives restarts)
`app_backup`	per-app map: legacy `enabled` flag (not read by any code), cross-drive config (method, dest, schedule, runtime status)
`storage_paths`	registered paths with label, default flag, schedulable flag
`cross_drive_restic_password`	auto-generated restic password for cross-drive repos

All public methods use sync.RWMutex. File writes are atomic (.tmp + rename).

Settings Page (`/settings`)

Four sections:

System config — read-only display of controller.yaml values
Password change — current + new + confirm, min 8 chars
Storage paths — add/remove, edit labels, set default, toggle schedulable, per-path app list with sizes
Notifications — email, event checkboxes, cooldown hours, test email button

8. Central Hub Reporting

Report Push (`internal/report/`)

Periodic JSON push (default every 15 min) to the central felhom-hub service:

System: hostname, OS, CPU, memory, disk usage, uptime
Containers: running/stopped counts, per-container CPU/memory
Backup: last run, success, repo stats, snapshot count, restic password (for disaster recovery)
Health: current status, issues, warnings
Stacks: deployed apps with versions and states

Bearer token authentication, 3-attempt retry with 5-second backoff.

Infrastructure Backup to Hub (`internal/report/infra_backup.go`)

After each backup cycle, the controller pushes a full infrastructure snapshot to the Hub for disaster recovery. This snapshot includes:

controller.yaml (base64-encoded, full config including secrets)
settings.json (base64-encoded, backup prefs, storage paths, cross-drive configs)
Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
Deployed stacks manifest (app names, HDD paths)
Restic passwords (primary + cross-drive, base64-encoded)

This enables fully automated recovery when the system drive is replaced — the new controller pulls the snapshot from the Hub, auto-mounts surviving drives by UUID, and restores all applications.

Hub Dashboard

The hub service (separate Go app in the felhom.eu repo) provides:

Multi-customer overview table with status indicators
Customer detail page with system/storage/containers/backup/health sections
Infra backup status per customer (last sync, stack count, disk count)
Color coding: green (<30min), yellow (30-60min), red (>60min since last report)
90-day report retention with daily prune

9. Disaster Recovery

When a system drive fails and is replaced, the controller can automatically restore the full deployment:

1. docker-setup.sh deploys fresh controller (Hub enabled, customer_id configured)
2. Controller detects empty data dir → fresh deployment
3. Controller pulls infra backup from Hub → gets disk layout, passwords, configs
4. Controller scans block devices for UUIDs matching stored disk layout
5. Controller mounts surviving drives (e.g., HDD with backups)
6. Controller scans mounted drives for local backup data (_infra/ + rsync copies)
7. Controller auto-restores stack configs → apps appear in dashboard
8. User opens dashboard → "Visszaállítás" (Restore) wizard
9. User confirms → sequential restore: rsync first, restic fallback, DB import
10. Apps restored and running
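
Steps 4 and 5 above, matching detected block devices against the stored disk layout, can be sketched as a pure planning function. All type and function names are hypothetical, and the detected map (UUID to device path) stands in for the result of a blkid scan.

```go
package main

import "fmt"

// storedDisk is one entry of the disk layout saved in the Hub infra backup.
type storedDisk struct {
	UUID       string
	MountPoint string
	FSType     string
}

// mountAction describes one mount the recovery should perform.
type mountAction struct {
	Device     string
	MountPoint string
	FSType     string
}

// planMounts pairs stored layout entries with detected devices by UUID.
// Entries whose UUID is no longer present (for example the failed system
// drive) are returned in missing instead of producing a mount.
func planMounts(layout []storedDisk, detected map[string]string) (plan []mountAction, missing []string) {
	for _, d := range layout {
		dev, ok := detected[d.UUID]
		if !ok {
			missing = append(missing, d.UUID)
			continue
		}
		plan = append(plan, mountAction{Device: dev, MountPoint: d.MountPoint, FSType: d.FSType})
	}
	return plan, missing
}

func main() {
	layout := []storedDisk{
		{UUID: "aaaa-1111", MountPoint: "/mnt/ssd_old", FSType: "ext4"},
		{UUID: "bbbb-2222", MountPoint: "/mnt/hdd_1", FSType: "ext4"},
	}
	detected := map[string]string{"bbbb-2222": "/dev/sdb1"}
	plan, missing := planMounts(layout, detected)
	fmt.Println(plan, missing)
}
```

Separating planning from execution keeps the destructive part (the actual mount calls) small and makes the matching logic testable without block devices.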

Backup sources (priority order):

Rsync copies (cross-drive, plain files, no password needed) — fastest, most reliable
Restic snapshots (encrypted, needs password from Hub) — comprehensive but slower

Fallback: If the Hub is unreachable, the controller can still detect backups on already-mounted drives (manual mount or pre-existing fstab entries).

Repository Layout

controller/
├── cmd/controller/main.go           # Entry point, wires all 14 modules
├── internal/
│   ├── config/config.go             # YAML loader, validation, env overrides
│   ├── settings/settings.go         # Runtime settings (JSON, atomic writes, RWMutex)
│   ├── stacks/
│   │   ├── manager.go               # Stack scanning, compose ops, container status
│   │   ├── metadata.go              # Parse .felhom.yml app metadata
│   │   ├── deploy.go                # First-deploy: secret gen, app.yaml, compose up
│   │   └── delete.go                # Stack deletion + HDD data cleanup
│   ├── sync/sync.go                 # Git sync: clone/pull app catalog, content-hash copy
│   ├── storage/
│   │   ├── scan.go, scan_linux.go   # Disk detection via lsblk + blkid
│   │   ├── format.go, format_linux.go  # Partition, format, mount pipeline
│   │   ├── attach.go, attach_linux.go  # Attach existing FS drive (raw mount + bind mount)
│   │   ├── safety.go, safety_linux.go  # System disk detection, mount guards, fstab ops
│   │   ├── migrate.go              # App data migration (rsync with progress)
│   │   └── *_other.go              # Non-Linux stubs for cross-compilation
│   ├── backup/
│   │   ├── backup.go               # Orchestrator (per-drive dumps + restic + cross-drive chain)
│   │   ├── paths.go                # Per-drive path helpers (PrimaryResticRepoPath, AppDBDumpPath, etc.)
│   │   ├── dbdump.go               # DB auto-discovery + dump (pg_dump, mariadb-dump)
│   │   ├── restic.go               # Restic operations (init, snapshot, prune, check) — repoPath as param
│   │   ├── appdata.go              # StackDataProvider interface, app data discovery
│   │   ├── crossdrive.go           # Per-app backup to secondary storage (rsync/restic)
│   │   ├── restore.go              # Per-app restore from per-drive repo
│   │   ├── restore_scan.go         # DR: scan drives for backup data, build restore plan
│   │   ├── restore_app_linux.go    # DR: per-app restore (rsync config/data + docker compose up)
│   │   └── restore_drives_linux.go # DR: auto-mount drives by UUID from Hub infra backup
│   ├── api/router.go               # REST API endpoints (~30 routes)
│   ├── scheduler/scheduler.go      # Central job scheduler (Every, Daily)
│   ├── system/
│   │   ├── info.go, info_linux.go  # RAM, disk, CPU, temperature, load average
│   │   ├── cpu_linux.go            # Background /proc/stat sampling
│   │   └── mounts_linux.go         # Mount points, disk usage, FS info, backup dest checks
│   ├── monitor/
│   │   ├── pinger.go               # Healthchecks.io HTTP ping client
│   │   └── healthcheck.go          # System health checks (disk, mem, CPU, temp, Docker)
│   ├── metrics/
│   │   ├── store.go                # SQLite time-series (WAL mode, downsampled queries)
│   │   ├── collector.go            # Background collector (60s, system + docker stats)
│   │   └── sysinfo.go              # Static system info (/proc, /etc)
│   ├── notify/notifier.go          # Email relay to hub, preference sync, cooldowns
│   ├── report/
│   │   ├── builder.go              # Hub report builder (all subsystems → JSON)
│   │   ├── pusher.go               # HTTP POST to hub (retry, Bearer auth)
│   │   └── infra_pull.go           # DR: pull infra backup from Hub for fresh deployment
│   └── web/
│       ├── server.go               # HTTP server, routing, static files
│       ├── auth.go                 # Session auth, login/logout, session cleanup
│       ├── handlers.go             # Page handlers (dashboard, stacks, deploy, backups, etc.)
│       ├── handler_restore.go      # DR: restore page handler + APIs (scan, restore all, skip)
│       ├── storage_handlers.go     # Storage API handlers (scan, format, attach, migrate, cleanup)
│       ├── alerts.go               # State-based alert generation
│       ├── funcmap.go              # Template functions (state colors, Hungarian formatting)
│       ├── embed.go                # go:embed for templates + Chart.js
│       └── templates/              # 13 HTML files + style.css (Hungarian UI)
├── configs/
│   ├── controller.yaml.example     # Full config reference
│   └── example-felhom-metadata.yml # .felhom.yml format reference
├── Dockerfile                      # Multi-stage: Go 1.24 builder + debian-slim runtime
├── docker-compose.yml              # Controller's own compose (privileged, /mnt rshared)
└── go.mod                          # Go 1.24, deps: bcrypt, yaml.v3, modernc.org/sqlite

Configuration

Controller config (`controller.yaml`)

Single YAML file per customer, infrastructure-only. Does not contain app-specific config.

Key sections:

customer:
  name: "Demo Felhom"
  id: "demo-felhom"

paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  system_data_path: "/mnt/sys_drive"   # NVMe/system drive — fallback for apps without HDD

git:
  repo_url: "https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git"
  sync_interval: "15m"

# Per-drive backup paths are computed automatically:
#   <drive>/backups/primary/restic/          — restic repo per drive
#   <drive>/backups/primary/<app>/db-dumps/  — DB dumps per app
#   <drive>/backups/secondary/               — cross-drive rsync + restic
backup:
  enabled: true
  restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
  db_dump_schedule: "02:30"
  restic_schedule: "03:00"
  retention: { keep_daily: 7, keep_weekly: 4, keep_monthly: 6 }

monitoring:
  health_interval: "5m"
  ping_uuids:
    heartbeat: "uuid-here"
    system_health: "uuid-here"
    db_dump: "uuid-here"
    backup: "uuid-here"
    backup_integrity: "uuid-here"

hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "bearer-token-here"

system:
  reserved_memory_mb: 384  # RAM reserved for OS + controller

Environment variable overrides: FELHOM_LOGGING_LEVEL=debug, FELHOM_HUB_ENABLED=false, etc.
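
The two variables above suggest a `FELHOM_<SECTION>_<KEY>` naming scheme. The sketch below shows one way such overrides could be applied after the YAML file is loaded. It is an illustration, not the code in `internal/config/config.go`: the `Config` struct is reduced to two fields, and `applyEnvOverrides` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds only the two fields needed for this sketch.
type Config struct {
	LoggingLevel string
	HubEnabled   bool
}

// applyEnvOverrides overwrites config values with FELHOM_* variables when they are set.
// lookup has the same shape as os.LookupEnv so a fake environment can be injected.
func applyEnvOverrides(cfg *Config, lookup func(string) (string, bool)) error {
	if v, ok := lookup("FELHOM_LOGGING_LEVEL"); ok {
		cfg.LoggingLevel = v
	}
	if v, ok := lookup("FELHOM_HUB_ENABLED"); ok {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return fmt.Errorf("FELHOM_HUB_ENABLED: %w", err)
		}
		cfg.HubEnabled = b
	}
	return nil
}

func main() {
	cfg := Config{LoggingLevel: "info", HubEnabled: true} // values as if loaded from YAML
	if err := applyEnvOverrides(&cfg, os.LookupEnv); err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Rejecting an unparsable boolean instead of silently ignoring it keeps a typo such as `FELHOM_HUB_ENABLED=flase` from leaving the hub enabled unnoticed.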

Runtime settings (`settings.json`)

Auto-managed by the controller. Contains password hash overrides, notification preferences, per-app backup configs, storage path registry, DB validation cache. All writes are atomic.
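
Atomic writes are commonly implemented by writing to a temporary file in the same directory and renaming it over the target. The sketch below shows that general technique; whether `internal/settings/settings.go` does exactly this is an assumption. The names `writeJSONAtomic` and `demo` and the sample `language` key are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// writeJSONAtomic marshals v and replaces path with a single rename, so readers
// see either the old file or the new one, never a partially written file.
func writeJSONAtomic(path string, v any) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return err
	}
	// The temp file lives next to the target: rename is only atomic within one filesystem.
	tmp, err := os.CreateTemp(filepath.Dir(path), ".settings-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush contents to disk before the rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo writes a settings file into a scratch directory and reads one value back.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "felhom-settings-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "settings.json")
	if err := writeJSONAtomic(path, map[string]string{"language": "hu"}); err != nil {
		return "", err
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	var back map[string]string
	if err := json.Unmarshal(raw, &back); err != nil {
		return "", err
	}
	return back["language"], nil
}

func main() {
	fmt.Println(demo())
}
```

The rename protects against torn files after a crash or power loss. Concurrent writers inside the process would still need a lock such as the RWMutex noted for `settings.go` in the repository layout.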

Per-app config (`app.yaml`)

Auto-generated during deployment. Contains env vars, locked fields list, deploy timestamp. Secret fields are locked (read-only after first deploy).

Scheduler Jobs

Job	Type	When	Purpose
status-refresh	periodic	30s	Refresh container states
stack-scan	periodic	2m	Rescan stacks directory
heartbeat	periodic	5m	Ping Healthchecks "I'm alive"
system-health	periodic	configurable	Health checks + alert refresh
backup-cache	periodic	5m	Refresh backup status cache
hub-report	periodic	15m	Push report to central hub
db-dump	daily	02:30	Database dumps
backup	daily	03:00	Restic backup → cross-drive chain
backup-integrity	daily	Sun 04:00	Restic check
metrics-prune	daily	04:00	Delete metrics older than 30 days

All daily jobs run in the Europe/Budapest timezone. A skip-if-running guard prevents concurrent execution, and every job runs with panic recovery.
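
A skip-if-running guard combined with panic recovery can be built from an atomic flag and a deferred `recover`. The sketch below is one possible shape, not the code in `internal/scheduler/scheduler.go`; the `Job` type and its `Run` method are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Job wraps a function with a skip-if-running guard and panic recovery.
// Always use it through a pointer, because atomic.Bool must not be copied.
type Job struct {
	Name    string
	Fn      func()
	running atomic.Bool
}

// Run executes the job unless a previous run is still in progress.
// It reports whether the job was started and converts a panic into an error.
func (j *Job) Run() (ran bool, err error) {
	if !j.running.CompareAndSwap(false, true) {
		return false, nil // previous run still active: skip this tick
	}
	defer j.running.Store(false) // runs last, so the flag is cleared even after a panic
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("job %s panicked: %v", j.Name, r)
		}
	}()
	ran = true
	j.Fn()
	return ran, nil
}

func main() {
	var job *Job
	job = &Job{Name: "backup", Fn: func() {
		// Simulate an overlapping tick while the job is still running.
		ran, _ := job.Run()
		fmt.Println("overlapping run started:", ran)
	}}
	ran, err := job.Run()
	fmt.Println("first run started:", ran, "error:", err)

	crashy := &Job{Name: "crashy", Fn: func() { panic("boom") }}
	ran, err = crashy.Run()
	fmt.Println("crashy run started:", ran, "error:", err)
}
```

Because the flag is cleared in a deferred call, a job that panics does not stay marked as running and can be attempted again on its next tick.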

REST API

Stack Operations

Method	Endpoint	Description
GET	`/api/health`	Health check (no auth)
GET	`/api/stacks`	List all stacks
GET	`/api/stacks/{name}`	Stack details
POST	`/api/stacks/{name}/deploy`	First-time deploy
POST	`/api/stacks/{name}/start`	Start stack
POST	`/api/stacks/{name}/stop`	Stop stack
POST	`/api/stacks/{name}/restart`	Restart stack
POST	`/api/stacks/{name}/update`	Pull + recreate
POST	`/api/stacks/{name}/optional-config`	Update optional env vars
GET	`/api/stacks/{name}/logs`	Container logs (`?raw=1` for plain text)
GET	`/api/stacks/{name}/hdd-data`	HDD data paths + sizes
DELETE	`/api/stacks/{name}`	Delete stack
POST	`/api/sync`	Trigger catalog sync
GET	`/api/system/info`	System info + sync status

Backup & Restore

Method	Endpoint	Description
GET	`/api/backup/status`	Full backup status
POST	`/api/backup/run`	Trigger manual backup
GET	`/api/backup/snapshots`	List snapshots (`?stack={name}` for filtering)
POST	`/api/stacks/{name}/cross-backup`	Save cross-drive config
POST	`/api/stacks/{name}/cross-backup/run`	Trigger cross-drive backup
GET	`/api/stacks/{name}/cross-backup/status`	Cross-drive status
POST	`/api/backup/cross-drive/run-all`	Run all scheduled cross-drive backups

Storage

Method	Endpoint	Description
GET	`/api/storage/scan`	Scan available disks
POST	`/api/storage/init`	Format and mount a disk
GET	`/api/storage/init/status`	Format progress
POST	`/api/storage/attach/mount-raw`	Temp-mount partition for browsing
GET	`/api/storage/attach/browse?path=`	List directories on raw mount
POST	`/api/storage/attach/mkdir`	Create folder on raw mount
POST	`/api/storage/attach`	Finalize attach (bind mount + fstab)
GET	`/api/storage/attach/status`	Attach progress
POST	`/api/storage/attach/cancel`	Cleanup temp raw mount
POST	`/api/storage/migrate`	Start app data migration
GET	`/api/storage/migrate/status`	Migration progress

Metrics

Method	Endpoint	Description
GET	`/api/metrics/system`	System metrics time-series (`?range=1h
GET	`/api/metrics/containers/summary`	Current container stats
GET	`/api/metrics/containers/{name}`	Per-container time-series
GET	`/api/metrics/sysinfo`	Static system info

Response format: {"ok": true/false, "data": ..., "error": "...", "message": "..."}
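
A client could decode this envelope as shown below. This is a sketch under stated assumptions: the struct name `APIResponse`, the use of `json.RawMessage` for `data`, the `omitempty` tags, and the sample payloads are all hypothetical. Only the four field names come from the format above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APIResponse mirrors the documented envelope. Data stays raw because its
// shape differs per endpoint; that choice is an assumption of this sketch.
type APIResponse struct {
	OK      bool            `json:"ok"`
	Data    json.RawMessage `json:"data,omitempty"`
	Error   string          `json:"error,omitempty"`
	Message string          `json:"message,omitempty"`
}

// decodeResponse parses an envelope and turns ok=false into a Go error.
func decodeResponse(body []byte) (*APIResponse, error) {
	var r APIResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("invalid response: %w", err)
	}
	if !r.OK {
		return &r, fmt.Errorf("api error: %s", r.Error)
	}
	return &r, nil
}

func main() {
	// Both payloads are made-up examples, not captured controller output.
	okBody := []byte(`{"ok":true,"data":{"status":"healthy"}}`)
	failBody := []byte(`{"ok":false,"error":"stack not found"}`)

	if r, err := decodeResponse(okBody); err == nil {
		fmt.Println("data:", string(r.Data))
	}
	if _, err := decodeResponse(failBody); err != nil {
		fmt.Println(err)
	}
}
```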

Build & Deploy

Build

# On build server (192.168.0.180)
cd ~/build/felhom-controller
git -C ~/git/deploy-felhom-compose pull
./build.sh v0.14.1 --push

Deploy on customer node

# On customer node (e.g., 192.168.0.162)
cd /opt/docker/felhom-controller
sudo docker pull gitea.dooplex.hu/admin/felhom-controller:v0.14.1
sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:v0.14.1|' docker-compose.yml
sudo docker compose up -d

Important: Always use docker compose up -d, NOT docker compose restart — restart doesn't pick up new images.

Docker Requirements

The controller container needs:

privileged: true (disk operations)
Docker socket mount (/var/run/docker.sock)
/mnt mount with propagation: rshared (container mounts visible to host)
/dev mounted as /host-dev (block device access)
/etc/fstab mounted as /host-fstab (persistent mount config)

See docker-compose.yml for the full volume configuration.

Roadmap

Completed

Stack management with deploy flow and memory validation
Git-based app catalog sync
Central job scheduler
System monitoring with SQLite metrics and Chart.js charts
Healthchecks.io integration (5 ping types)
3-layer backup system (DB dumps + restic + cross-drive)
Per-app backup restore with auto stop/restart
Storage management (scan, format, mount, registry)
Attach existing drive wizard (v0.15.0) — bind-mount subfolder from pre-formatted drive, directory browser
App data migration between storage paths
Central hub reporting
Email notifications via hub relay
Settings persistence and password management
Dashboard alert system
Per-drive backup architecture (v0.14.0) — per-drive restic repos, per-app DB dumps, path helpers
Cross-drive restic pruning (v0.14.0)
Auto Tier 2 for small apps (v0.14.1) — auto-enable daily rsync for non-HDD apps when ≥2 drives
Infrastructure config in cross-drive backup (v0.14.1) — stacks dir + controller.yaml in _infra/ + restic
Disaster recovery (v0.15.5) — Hub-based infra backup, auto-mount by UUID, restore UI with full-page takeover

In Progress / Planned

Update classification and auto-apply (optional/required/security markers)
Self-update mechanism with health-based rollback
Docker volume backup (/var/lib/docker/volumes:ro)
Raspberry Pi testing (pi-customer-1)
CSRF protection on POST endpoints
Login rate limiting

Test Environments

Node	Hardware	Domain	Status
demo-felhom	Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD	demo-felhom.eu	Controller v0.15.5
pi-customer-1	Raspberry Pi 3B+, 1G RAM, 32G SD	pi-customer-1.local	Not yet tested

Related Repositories

Repository	Purpose
deploy-felhom-compose	This repo — controller + deploy scripts
app-catalog-felhom.eu	Docker Compose templates + .felhom.yml metadata
felhom.eu	Website + app assets + felhom-hub service
