felhom-controller

Central management container for Felhom home servers.

A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.

Current version: v0.33.0

Architecture
Features
Repository Layout
Configuration
REST API
Build & Deploy
Roadmap

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Customer Hardware (N100 mini PC / Raspberry Pi)                │
│                                                                 │
│  ┌──────────┐   ┌────────────────────────────────────────────┐  │
│  │ Traefik  │   │  felhom-controller (privileged container)  │  │
│  │ (reverse │──▶│                                            │  │
│  │  proxy)  │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Web UI   │  │ Stack Manager           ││  │
│                 │  │ (HU dash │  │ (compose ops, git sync,  ││  │
│  ┌──────────┐   │  │  board)  │  │  deploy, delete, update) ││  │
│  │cloudflared│   │  └──────────┘  └─────────────────────────┘│  │
│  │ (tunnel) │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  └──────────┘   │  │ Backup   │  │ Storage Manager         ││  │
│                 │  │ (3-layer │  │ (disk scan, format,     ││  │
│  ┌──────────┐   │  │  restic) │  │  mount, migrate)        ││  │
│  │ App      │   │  └──────────┘  └─────────────────────────┘│  │
│  │ stacks   │   │  ┌──────────┐  ┌─────────────────────────┐│  │
│  │ (docker  │   │  │Scheduler │  │ Monitor & Metrics       ││  │
│  │ compose) │   │  │(cron-like│  │ (health, SQLite         ││  │
│  └──────────┘   │  │  jobs)   │  │  time-series, Chart.js) ││  │
│                 │  └──────────┘  └─────────────────────────┘│  │
│                 │  ┌──────────┐  ┌─────────────────────────┐│  │
│                 │  │ Notify   │  │ REST API + Hub Reporter ││  │
│                 │  │ (events) │  │ (JSON push + events)    ││  │
│                 │  └──────────┘  └─────────────────────────┘│  │
│                 │  ┌──────────┐                              │  │
│                 │  │ Assets   │                              │  │
│                 │  │ (Hub     │                              │  │
│                 │  │  sync)   │                              │  │
│                 │  └──────────┘                              │  │
│                 └────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │ events + reports      │ git pull       │ asset sync
         ▼                       ▼                ▼
   hub.felhom.eu           gitea.dooplex.hu  hub.felhom.eu
   (central dashboard)     (stack definitions) (logos, screenshots)

Key Architecture Decisions

Pure Go, no frameworks — stdlib net/http + html/template. Only external deps: bcrypt, yaml.v3, modernc.org/sqlite (pure Go, no CGO).
Privileged container — Required for disk operations (format, mount, fstab), /dev access, and Docker socket control.
/host-dev indirection — Docker overrides /dev with a tmpfs. The host's /dev is mounted at /host-dev to access block devices.
StackDataProvider interface — Breaks circular import between backup and stacks packages. Implemented by stackAdapter in main.go. Provides GetStackHDDPath() for per-drive backup routing.
Atomic file writes — All persistent state (settings.json, app.yaml) written to .tmp then os.Rename for crash safety.
go:embed templates — All HTML/CSS/JS compiled into the binary. No runtime file dependencies.
Europe/Budapest timezone — All scheduled jobs, timestamps, and UI labels use Hungarian timezone.

Module Map

Module	Path	Responsibility
Config	`internal/config/`	YAML loader, validation, `FELHOM_*` env overrides
Settings	`internal/settings/`	Runtime-mutable `settings.json` (passwords, backup prefs, storage paths, notifications)
Stacks	`internal/stacks/`	Compose operations, scanning, `.felhom.yml` metadata, deploy/delete flow
Crypto	`internal/crypto/`	AES-256-GCM encryption for sensitive app.yaml values (passwords, secrets), key management
Sync	`internal/sync/`	Git-based app catalog sync (clone/pull, content-hash copy)
Backup	`internal/backup/`	Per-drive 3-layer backup: DB dumps → restic snapshots → cross-drive copies, restore
Storage	`internal/storage/`	Disk scanning (`lsblk`), partitioning (`sfdisk`), formatting (`mkfs.ext4`), mounting, data migration (`rsync`)
System	`internal/system/`	System info (`/proc`), CPU collector, mount points, disk usage, FS info
Monitor	`internal/monitor/`	System health checks, storage watchdog, legacy Healthchecks pinger (deprecated)
Metrics	`internal/metrics/`	SQLite time-series store, system + container metric collection
Scheduler	`internal/scheduler/`	Central job scheduler (periodic + daily, skip-if-running, panic recovery)
SelfUpdate	`internal/selfupdate/`	Version checking (registry), update trigger, state persistence, startup verification
Notify	`internal/notify/`	Email notifications via hub relay, preference sync, per-event cooldowns
Report	`internal/report/`	Hub report builder + HTTP pusher (system, stacks, backup, health)
Assets	`internal/assets/`	Hub-managed asset syncer: downloads logos/screenshots with SHA-256 change detection
SelfTest	`internal/selftest/`	Startup self-test: 9 diagnostic checks (Docker, dirs, storage, hub, restic, metrics)
Util	`internal/util/`	Shared utilities: `TruncateStr` for debug log output truncation
AppExport	`internal/appexport/`	Per-app export/import via `.fab` bundles (config + DB + user data), optional AES-256 encryption
API	`internal/api/`	REST JSON endpoints, diagnostic dump (`/api/debug/dump`)
Web	`internal/web/`	Hungarian dashboard, auth, page handlers, template functions, alerts

Features

1. App Management

The controller manages Docker Compose stacks through a complete lifecycle: catalog sync, first-time deployment, runtime operations, and deletion.

Git Sync (`internal/sync/`)

The app catalog lives in a separate Git repository. The controller:

Shallow-clones the catalog on startup
Periodically fetches updates (configurable, default 15 min)
Copies only docker-compose.yml and .felhom.yml to the stacks directory
Never overwrites app.yaml (user secrets are safe)
Uses SHA-256 content hashing — only writes files that actually changed
Triggers stack rescan after sync so the dashboard updates immediately
Post-sync hook: auto-injects missing deploy fields (new secrets, domains) into existing app.yaml for stacks whose templates were updated (see Missing Field Injection below)
Manual sync via "Sablonok frissitese" button or POST /api/sync

First-Time Deploy Flow

Customer sees app card with "Telepites" button
Deploy page pre-generates and displays all auto-values before the user clicks deploy:
- domain fields: shown as readonly text input with the customer's configured base domain
- subdomain fields: editable text input pre-filled with the default from .felhom.yml, shown with .base-domain suffix. Validated for DNS-safe format, reserved names, and uniqueness across deployed stacks. Locked after deploy — changing requires Remove + Redeploy
- secret fields: pre-generated and shown as masked password inputs with a "Megjelenítés" reveal button — user can see/copy all DB passwords and keys before deploying
- User-configurable inputs (admin password, language, storage path) remain editable
- Section header prompts the user to note down any passwords they need
checkBeforeDeploy() JS guard fetches live state first (prevents double-deploy from another tab)
Memory validation uses real system memory from /proc/meminfo:
- usable_memory = total_ram - reserved_memory_mb (default 384MB reserved)
- system.GetMemoryMB() returns real-time total and used memory (not declared reservations)
- Hard block if used_mb + new_request > usable_memory
- CommittedMemory() (declared sum) still used for soft overcommit warning only
- Deploy page shows real memory usage bar (not declared reservations)
Pre-generated secret values are submitted as hidden form inputs so the same values the user saw are saved to app.yaml (no silent re-generation on submit). Controller saves app.yaml, sets in-memory Deployed + Deploying flags, then runs docker compose up -d asynchronously in a goroutine — API returns immediately so the UI switches to the progress panel without waiting for image pulls. On failure the goroutine reverts both disk and in-memory state and sets DeployError.
3-step progress panel polls GET /api/stacks/{name} every 3s: config saved → deploying (pulling images) → containers starting → health check passed. New StateDeploying state shown while compose-up is in progress (no containers yet).
Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only; the "Automatikusan generált értékek" section continues to show the saved values on the settings page
The deploy/settings page includes start/stop/restart buttons for deployed apps, plus a "Megnyitás ↗" link to the app's subdomain URL (only visible when running)
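The hard memory block in the flow above reduces to a small comparison. The sketch below assumes all values are in megabytes; `checkDeployMemory` is a hypothetical name, and only the 384 MB default and the formula come from the description.

```go
package main

import "fmt"

// defaultReservedMB mirrors the documented default for reserved_memory_mb.
const defaultReservedMB = 384

// checkDeployMemory (hypothetical helper) applies the documented rule:
// usable = total - reserved, and the deploy is blocked when
// used + request > usable.
func checkDeployMemory(totalMB, usedMB, reservedMB, requestMB int) error {
	usable := totalMB - reservedMB
	if usedMB+requestMB > usable {
		return fmt.Errorf("insufficient memory: used %d MB + requested %d MB exceeds usable %d MB",
			usedMB, requestMB, usable)
	}
	return nil
}
```

For example, on a 4096 MB machine with 3500 MB in use, a 512 MB request is rejected because 4012 MB exceeds the 3712 MB usable budget.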

Catch-All Page for Stopped Apps

When a user visits a stopped or undeployed app's subdomain (e.g., travel.demo-felhom.eu), the controller serves a branded error page instead of Traefik's raw 404:

Traefik catch-all router: The controller's docker-compose.yml registers a second router (catchall) with priority=1 (lowest) and HostRegexp(.+). Running apps always win; only requests with no matching container reach the controller.
CatchAllMiddleware in server.go intercepts requests where Host ≠ felhom.DOMAIN, serves the catch-all page without auth (user has no session on the app subdomain).
findStackBySubdomain() identifies the app by matching the subdomain against deployed app.yaml SUBDOMAIN env or metadata fallback.
catchall.html — standalone template (no layout, inline CSS) showing the app name, status ("leállítva" / "nincs telepítve" / "nem található"), and links to the controller dashboard or the app's detail page.
Subdomain links on the Alkalmazások page are only shown for deployed apps (non-deployed apps have no guaranteed subdomain yet).
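One way to derive the app subdomain from the request Host header is sketched below. The function `appSubdomain` is an assumption made for illustration (the README names only CatchAllMiddleware and findStackBySubdomain), and it treats the `felhom` label as the controller's own host.

```go
package main

import (
	"net"
	"strings"
)

// appSubdomain (hypothetical helper) returns the label(s) in front of
// baseDomain, or false when the host is the controller itself or does not
// belong to baseDomain.
func appSubdomain(host, baseDomain string) (string, bool) {
	// Drop an optional ":port" suffix.
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h
	}
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	suffix := "." + strings.ToLower(baseDomain)
	if !strings.HasSuffix(host, suffix) {
		return "", false
	}
	sub := strings.TrimSuffix(host, suffix)
	if sub == "" || sub == "felhom" {
		return "", false
	}
	return sub, true
}
```

With the example from the text, `appSubdomain("travel.demo-felhom.eu", "demo-felhom.eu")` yields `"travel"`, which could then be matched against deployed stacks.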

Dashboard "Megnyitás" Button

Running apps on the Vezérlőpult (dashboard) show a "Megnyitás ↗" button that opens the app's subdomain in a new tab. The Subdomains map is built in dashboardHandler from the app.yaml env, falling back to metadata.

App Info Pages

Each app can define rich metadata in .felhom.yml:

app_info: tagline, use_cases, first_steps, prerequisites, default_creds, docs_url
optional_config: groups of post-deploy configurable env vars (e.g., API keys for metadata providers)
resources: mem_request, mem_limit, pi_compatible, needs_hdd, hungarian_ui

The /apps/{slug} page renders hero section, screenshots, setup guide, and optional config form.

Stack Operations

Operation	What it does
Start	`docker compose up -d` — pre-start memory check rejects with 409 if insufficient RAM
Stop	`docker compose stop` (blocked for protected stacks)
Restart	`docker compose restart`
Update	`docker compose pull` + `docker compose up -d`
Remove	`docker compose down --volumes` + remove `app.yaml` + optional HDD/backup cleanup; template preserved for redeploy
Delete	`docker compose down --rmi local --volumes` + optional HDD data cleanup (orphaned stacks only)

Remove vs Delete: "Eltávolítás" (Remove) is for deployed catalog stacks — it reverts the stack to "Nincs telepítve" state while keeping the template for easy redeployment. "Törlés" (Delete) is for orphaned stacks — it removes the entire stack directory including templates. Both require stopping the stack first.

Remove modal shows three sections: (1) always-removed items (Docker volumes, app.yaml, cross-drive schedule), (2) optional HDD data deletion with reimport warning, (3) optional backup data deletion (DB dumps + cross-drive rsync) with restic retention note.

Protected stacks (traefik, cloudflared, felhom-controller) cannot be stopped, removed, or deleted from the UI. Restart is allowed.

Orphan detection: Deployed stacks with no matching catalog template are marked as orphaned with an "Elavult" badge and can be safely deleted.

Missing Field Injection (`deploy.go`)

When app templates are updated (e.g., a new APP_KEY secret is added to .felhom.yml), existing deployed apps need the new field in their app.yaml. The controller handles this automatically:

On startup: InjectMissingFields() runs for all deployed stacks
After sync: the post-sync hook runs for stacks whose templates were updated
For each deployed stack, compares .felhom.yml deploy_fields against app.yaml env vars
Missing secret fields: auto-generated using the field's generator spec (password:N, hex:N, base64key:N)
Missing domain fields: filled with the customer's configured domain
Missing subdomain fields: filled with the field's default value or the .felhom.yml subdomain: metadata
Other field types (e.g., text, select): logged as warning for manual configuration
Locked fields are added to the locked list automatically

Generator types: password:N (alphanumeric), hex:N (hex-encoded random bytes), base64key:N (the literal prefix base64: followed by N random bytes, base64-encoded; used for Laravel APP_KEY and similar), static:VALUE (literal value).
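A generator along these lines could be sketched as below. The function name `generateValue` is hypothetical, and two details are assumptions: that N in password:N counts characters, and that N in hex:N counts random bytes (so the hex string has 2N characters).

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"math/big"
	"strconv"
	"strings"
)

const alphanumeric = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

// generateValue (hypothetical) expands a spec such as "password:24",
// "hex:32", "base64key:32" or "static:VALUE" into a concrete value.
func generateValue(spec string) (string, error) {
	kind, arg, ok := strings.Cut(spec, ":")
	if !ok {
		return "", fmt.Errorf("invalid generator spec %q", spec)
	}
	if kind == "static" {
		return arg, nil
	}
	n, err := strconv.Atoi(arg)
	if err != nil || n <= 0 {
		return "", fmt.Errorf("invalid length in spec %q", spec)
	}
	switch kind {
	case "password":
		out := make([]byte, n)
		limit := big.NewInt(int64(len(alphanumeric)))
		for i := range out {
			// rand.Int gives an unbiased index into the alphabet.
			idx, err := rand.Int(rand.Reader, limit)
			if err != nil {
				return "", err
			}
			out[i] = alphanumeric[idx.Int64()]
		}
		return string(out), nil
	case "hex", "base64key":
		buf := make([]byte, n)
		if _, err := rand.Read(buf); err != nil {
			return "", err
		}
		if kind == "hex" {
			return hex.EncodeToString(buf), nil
		}
		return "base64:" + base64.StdEncoding.EncodeToString(buf), nil
	}
	return "", fmt.Errorf("unknown generator %q", kind)
}
```

All randomness comes from crypto/rand, which is the appropriate source for secrets such as database passwords.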

Container State Display

State	Color	Label	Meaning
Running + healthy	Green	"Fut"	All containers running and healthy
Running + starting	Orange	"Indulas..."	Healthcheck not yet passed
Deploying	Orange	"Telepítés..."	Compose up in progress (image pull, container creation)
Running + unhealthy	Yellow	"Nem egeszseges"	Docker or controller-side healthcheck failing
Stopped/exited	Red	"Leallitva"	All containers stopped
Restarting	Yellow	"Ujrainditas..."	Restart loop
Not deployed	Gray	"Nincs telepitve"	Compose file exists, not deployed

Controller-side Health Probes (`internal/stacks/healthprobe.go`)

For apps that declare a healthcheck: section in .felhom.yml, the controller probes the container directly over the Docker network (both are attached to the traefik-public network). This complements Docker-level healthchecks and is the only health mechanism for distroless/scratch images that lack shell utilities.

Three probe types are supported:

http — Any HTTP response (even 4xx/5xx) = service is alive. Only connection refused/timeout = unhealthy.
api — HTTP request with response validation (expected status code, body content). Fails if expectations aren't met.
tcp — Simple port reachability check via net.Dial.

Multiple checks per app are supported (all must pass). The probe scheduler runs every 10 seconds; per-app intervals default to 5 minutes and are configurable via healthcheck.interval in .felhom.yml. Probe results are stored in Stack.HealthProbe and exposed via the API. Failed probes override the stack state to StateUnhealthy; the override clears automatically when the next probe passes.

Fast initial probing: On start/restart, stale health probe results are cleared (so the stack doesn't immediately appear "unhealthy" from a previous result). Until the first healthy probe, the controller checks every 10 seconds instead of the normal 5-minute interval, giving fast feedback on whether the app came up successfully.

2. App Export/Import (.fab bundles)

Per-app export creates a self-contained .fab file (tar.gz, optionally encrypted) that can be stored externally or used to restore the app on the same server. It is distinct from the automatic backup system: exports are user-initiated, per-app, and produce a single portable file.

Bundle contents: manifest.json + config/ (compose, .felhom.yml, app.yaml with plaintext secrets) + database/ (gzipped SQL dump) + data/ (HDD bind mount tars or Docker named volume tars).

Encryption: Optional AES-256-CTR + HMAC-SHA256 with scrypt key derivation (N=32768). Format: "FABE" magic header + salt + IV + encrypted tar.gz + HMAC tag. Streaming for multi-GB files.

Export flow: Estimate size → check free space → optionally stop app → copy config → dump DB → tar user data → create tar.gz → optionally encrypt → atomic rename. App restarts automatically after export if it was stopped.

Import flow: Decrypt if needed → extract → prepare stack dir (create new or compose down --volumes for existing) → restore config (re-encrypt app.yaml with current server key) → restore user data (HDD or volumes) → restore DB (start DB service, wait for ready, import dump) → start full stack → refresh UI.

Architecture: internal/appexport/ package with ExportStackProvider adapter interface (same pattern as backup.StackDataProvider). exportAdapter in main.go bridges stacks.Manager to the provider.

API endpoints: /api/export/estimate, /api/export/start, /api/export/status, /api/export/bundles, /api/export/manifest, /api/export/import, /api/export/import/status.

UI: Export button on app info page, standalone import page at /import accessible from the stacks page header.

3. Backup System

The backup system implements a 3-2-1 backup architecture. Each tier is a complete, self-sufficient backup — any single tier can fully restore an app.

Tier	Contents	Location	Can fully restore?
1. Nightly restic	DB + Config + User data	Same drive as app	Yes (not against drive failure)
2. Cross-drive	DB + Config + User data	Different physical device	Yes
3. Remote	Everything	Cloud / remote server	Future

Key principles:

User data backup is mandatory — every app with HDD bind mounts is included automatically. There is no per-app toggle.
Each tier includes everything needed to restore: DB dumps, config, and user data. No tier depends on another tier's data.
Tier 2 is configurable for ALL apps — not just apps with HDD data. Non-HDD apps back up config + DB dumps to the secondary drive (small but protects against drive failure).
The AppBackupPrefs.Enabled field in settings.json is legacy and not read by any code.

Per-app Tier 2 contents by app type:

App type	Tier 2 contents	Example
HDD + DB	Config + DB + User data	Immich, Paperless-ngx
HDD, no DB	Config + User data	—
Docker volumes + DB	Config + DB + Volume data	Tandoor
Docker volumes, no DB	Config + Volume data	Mealie (SQLite)
DB, no HDD/volumes	Config + DB	Vikunja
Config only	Config	Gokapi, Homepage

Tier 1: Nightly Backup (mandatory, same drive)

The nightly backup has two phases that run sequentially. All paths are per-drive — each physical drive gets its own restic repo and per-app DB dump directories.

Drive layout (v0.26.0):

<drive>/
├── felhom-data/                ← all controller-managed data (namespace, v0.26.0+)
│   ├── appdata/<app>/          ← app user data
│   └── backups/
│       ├── primary/
│       │   ├── restic/              ← one restic repo per drive (all apps on this drive)
│       │   └── <app>/
│       │       ├── db-dumps/       ← per-app DB dump files
│       │       └── volume-dumps/   ← per-app Docker volume tars (v0.33.0)
│       └── secondary/
│           ├── restic/         ← secondary restic repo (cross-drive)
│           ├── _infra/         ← infra config mirror
│           └── <app>/rsync/    ← per-app rsync data
├── .felhom-infra-backup/       ← DR marker (stays at drive root for scanner)
├── Dokumentumok/               ← user files (not controller-managed)
└── media/                      ← user files (not controller-managed)

Note: HDD_PATH env var in app.yaml is still the mount point (e.g., /mnt/hdd_1). The felhom-data segment is embedded in path helpers — not in HDD_PATH. Pre-v0.26.0 installations use <drive>/appdata/ and <drive>/backups/ directly (no felhom-data/ namespace).

Path computation is centralized in backup/paths.go via the FelhomDataDir = "felhom-data" constant:

PrimaryResticRepoPath(drivePath) → <drive>/felhom-data/backups/primary/restic/
AppDBDumpPath(drivePath, stackName) → <drive>/felhom-data/backups/primary/<stack>/db-dumps/
AppVolumeDumpPath(drivePath, stackName) → <drive>/felhom-data/backups/primary/<stack>/volume-dumps/
AppDataDir(drivePath, stackName) → <drive>/felhom-data/appdata/<stack>/
SecondaryResticRepoPath(drivePath) → <drive>/felhom-data/backups/secondary/restic/
AppSecondaryRsyncPath(drivePath, stackName) → <drive>/felhom-data/backups/secondary/<stack>/rsync/
SecondaryInfraPath(drivePath) → <drive>/felhom-data/backups/secondary/_infra/
InfraBackupDir(mountPath) → <drive>/.felhom-infra-backup/ (unchanged — stays at drive root for DR scanner)
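A few of these helpers might be written as follows. The function names and the `FelhomDataDir` constant come from the list above; the bodies are an assumed implementation based on the documented results.

```go
package main

import "path/filepath"

// FelhomDataDir is the namespace directory under each drive root.
const FelhomDataDir = "felhom-data"

// PrimaryResticRepoPath returns the per-drive Tier 1 restic repository.
func PrimaryResticRepoPath(drivePath string) string {
	return filepath.Join(drivePath, FelhomDataDir, "backups", "primary", "restic")
}

// AppDBDumpPath returns the per-app DB dump directory on the app's drive.
func AppDBDumpPath(drivePath, stackName string) string {
	return filepath.Join(drivePath, FelhomDataDir, "backups", "primary", stackName, "db-dumps")
}

// AppSecondaryRsyncPath returns the per-app rsync mirror on a Tier 2 drive.
func AppSecondaryRsyncPath(drivePath, stackName string) string {
	return filepath.Join(drivePath, FelhomDataDir, "backups", "secondary", stackName, "rsync")
}

// InfraBackupDir stays at the drive root so the DR scanner can find it.
func InfraBackupDir(mountPath string) string {
	return filepath.Join(mountPath, ".felhom-infra-backup")
}
```

Keeping the `felhom-data` segment inside the helpers is what allows HDD_PATH in app.yaml to remain the plain mount point.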

Phase 1 — Database Dumps (internal/backup/dbdump.go, scheduled 02:30)

Auto-discovery of PostgreSQL and MariaDB containers via docker ps + docker inspect
Dumps via docker exec pg_dump / docker exec mariadb-dump with 5-minute timeout
Dumps are written to the app's home drive: AppDBDumpPath(appDrive, stackName)
Atomic writes (.tmp → .sql) to prevent corruption
Validation after each dump: checks file size, header presence, counts CREATE TABLE
Results are cached in settings.json, so they survive container restarts
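The post-dump validation step could be sketched like this. The names `validateDump` and `dumpStats` are hypothetical, and the choice to look for the header only in the first kilobyte is an assumption; the three checks (size, header, CREATE TABLE count) follow the list above.

```go
package main

import (
	"bytes"
	"errors"
)

// dumpStats (hypothetical) summarises a validated dump.
type dumpStats struct {
	SizeBytes int
	Tables    int
}

// validateDump (hypothetical) rejects empty dumps and dumps whose first
// kilobyte lacks the expected header, then counts CREATE TABLE statements.
// For pg_dump plain-text output the header contains
// "PostgreSQL database dump".
func validateDump(data []byte, header string) (dumpStats, error) {
	if len(data) == 0 {
		return dumpStats{}, errors.New("dump is empty")
	}
	head := data
	if len(head) > 1024 {
		head = head[:1024]
	}
	if !bytes.Contains(head, []byte(header)) {
		return dumpStats{}, errors.New("dump header not found")
	}
	return dumpStats{
		SizeBytes: len(data),
		Tables:    bytes.Count(data, []byte("CREATE TABLE")),
	}, nil
}
```

For multi-gigabyte dumps a streaming scan would be preferable to loading the whole file; the in-memory version keeps the example short.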

Phase 1b — Docker Volume Dumps (internal/backup/backup.go, runs after DB dumps)

Iterates all deployed stacks that have Docker named volumes (GetDockerVolumes())
For each volume: docker run --rm -v <vol>:/vol:ro -v <dumpDir>:/out alpine tar cf /out/<vol>.tar -C /vol .
10-minute timeout per volume; warnings on failure (non-fatal)
Stale tars cleaned up (volumes that no longer exist)
Volume names resolved with project prefix via ResolveDockerVolumeNames() (e.g., mealie_mealie_data)
Dumps written to AppVolumeDumpPath(appDrive, stackName)
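The volume dump command and the 10-minute timeout can be sketched as below. The helper names are hypothetical; the docker arguments follow the command shown above, and the `<project>_<volume>` naming is Docker Compose's default for volumes without an explicit name.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// resolveVolumeName (hypothetical) applies Compose's default project prefix.
func resolveVolumeName(project, volume string) string {
	return project + "_" + volume
}

// volumeDumpArgs (hypothetical) builds the docker arguments that tar a named
// volume, mounted read-only, into dumpDir.
func volumeDumpArgs(volume, dumpDir string) []string {
	return []string{
		"run", "--rm",
		"-v", volume + ":/vol:ro",
		"-v", dumpDir + ":/out",
		"alpine", "tar", "cf", "/out/" + volume + ".tar", "-C", "/vol", ".",
	}
}

// dumpVolume (hypothetical) runs the dump with a 10-minute timeout. Per the
// description, a caller would log a failure as a warning and continue.
func dumpVolume(ctx context.Context, volume, dumpDir string) error {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Minute)
	defer cancel()
	out, err := exec.CommandContext(ctx, "docker", volumeDumpArgs(volume, dumpDir)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("dump volume %s: %w: %s", volume, err, out)
	}
	return nil
}
```

Passing the arguments as a slice rather than a shell string avoids quoting problems with unusual volume names.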

Phase 2 — Restic Snapshot (internal/backup/restic.go, scheduled 03:00)

Apps are grouped by drive via groupStacksByDrive() — each drive's apps are backed up to that drive's restic repo
App drive resolution: GetStackHDDPath() (from StackDataProvider) → falls back to SystemDataPath
Auto-generated repository password (32 random bytes, base64url), shared across all repos, synced to hub
Paths included in every per-drive snapshot:
- Per-app DB dump dirs on that drive
- Per-app Docker volume dump dirs (volume-dumps/*.tar)
- Per-app HDD mount paths (user data)
- Stacks dir (compose.yml + app.yaml + .felhom.yml for all apps)
- controller.yaml (controller config)
Auto-detects and unlocks stale locks (restic repo lock)
Weekly prune on Sundays with configurable retention (keep-daily, keep-weekly, keep-monthly)
Weekly integrity check (restic check) on Sunday 04:00 — checks all primary repos
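The per-drive grouping with its SystemDataPath fallback might look like the following sketch. The function name groupStacksByDrive appears above; the `stackInfo` type and the signature are assumptions for illustration.

```go
package main

// stackInfo (hypothetical) carries what the grouping needs: the stack name
// and its HDD path, which is empty for apps without HDD data.
type stackInfo struct {
	Name    string
	HDDPath string
}

// groupStacksByDrive maps each drive path to the stacks that live on it.
// Stacks without an HDD path fall back to systemDataPath.
func groupStacksByDrive(stacks []stackInfo, systemDataPath string) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range stacks {
		drive := s.HDDPath
		if drive == "" {
			drive = systemDataPath
		}
		groups[drive] = append(groups[drive], s.Name)
	}
	return groups
}
```

Each key of the resulting map would then correspond to one restic repository and one snapshot run.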

Protects against accidental deletion and data corruption, and allows point-in-time rollback. Does NOT protect against drive failure (the backup is on the same physical drive).

Tier 2: Cross-Drive Backup (opt-in, different device) (`internal/backup/crossdrive.go`)

Complete backup to a different physical drive. Available for all apps — apps with HDD data back up config + DB + user data + Docker volumes; apps without HDD back up config + DB dumps + Docker volumes.

Auto-enable for small apps (v0.14.1): Apps without HDD mounts (config-only, DB-only) are automatically configured for daily rsync Tier 2 when ≥2 storage paths are registered. AutoEnableSmallApps() runs at the start of each nightly backup cycle. Never overwrites existing user-configured cross-drive settings (even disabled ones).
Infrastructure config backup (v0.14.1): syncInfraConfig() rsyncs the stacks directory and controller.yaml to <dest>/backups/secondary/_infra/ on every secondary destination drive. Runs before per-app backups. Cross-drive restic also includes infra paths.
Two methods:
- rsync — Simple mirror with --delete (fast, no versioning, browsable on disk)
- restic — Versioned, deduplicated, encrypted (shared repo across apps, not browsable)
Per-app configuration in settings.json: destination path, method, schedule (daily/weekly/manual)
Pre-backup DB dump: DumpStackDB() runs fresh pg_dump/mariadb-dump before each cross-drive backup; non-fatal on failure (wired via DBDumper interface to avoid circular imports)
Pre-backup volume dump (v0.33.0): DumpAppVolumes() exports Docker named volumes to tar before each cross-drive backup (wired via VolumeDumper interface)
Empty mounts allowed: RunAppBackup accepts apps with no HDD mounts — the rsync mount loop simply doesn't execute, but DB + config copy still runs
Drive-type-aware validation (ValidateDestination):

Destination type	Space checks
External mount (different device than `/`)	Block if <100 MB free
System drive (same device as `/`)	Require ≥10 GB free AND <90% used; logged warning
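The space rules in the table can be expressed as a small check. This is a sketch under assumptions: `checkDestinationSpace` is a hypothetical helper (the README names only ValidateDestination), MB and GB are taken as binary units, and exactly 90% used is treated as a failure.

```go
package main

import "fmt"

const (
	minExternalFreeBytes uint64 = 100 << 20 // 100 MB
	minSystemFreeBytes   uint64 = 10 << 30  // 10 GB
	maxSystemUsedPercent        = 90.0
)

// checkDestinationSpace (hypothetical) applies the drive-type-aware rules:
// external mounts need at least 100 MB free; the system drive needs at least
// 10 GB free and must be under 90% used.
func checkDestinationSpace(isSystemDrive bool, totalBytes, freeBytes uint64) error {
	if totalBytes == 0 || freeBytes > totalBytes {
		return fmt.Errorf("invalid filesystem stats: total=%d free=%d", totalBytes, freeBytes)
	}
	if !isSystemDrive {
		if freeBytes < minExternalFreeBytes {
			return fmt.Errorf("external destination has only %d bytes free", freeBytes)
		}
		return nil
	}
	if freeBytes < minSystemFreeBytes {
		return fmt.Errorf("system drive has only %d bytes free", freeBytes)
	}
	usedPercent := float64(totalBytes-freeBytes) / float64(totalBytes) * 100
	if usedPercent >= maxSystemUsedPercent {
		return fmt.Errorf("system drive is %.1f%% used", usedPercent)
	}
	return nil
}
```

The total and free values would typically come from a statfs call on the destination path; that lookup is left out here.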

Secondary drive layout (v0.14.1):

<dest-drive>/backups/secondary/
├── _infra/              ← infrastructure config mirror (v0.14.1)
│   ├── controller.yaml
│   └── stacks/          ← full stacks dir (all app configs)
├── <app>/rsync/         ← per-app rsync mirror
│   ├── _db/             ← DB dump files
│   ├── _config/         ← compose.yml, app.yaml, .felhom.yml
│   ├── _volumes/        ← Docker volume tars (v0.33.0)
│   └── <user data>      ← HDD mount contents (if app has HDD data)
└── restic/              ← shared restic repo (all cross-drive apps)

DB dump files read from per-app home drive path (AppDBDumpPath)
_ prefix directories prevent collision with user data
For non-HDD apps, only _db/, _config/, and _volumes/ (if applicable) are present (no user data directory)

Restic backup paths: includes HDD mounts (if any) + config dir + per-app DB dump dir from home drive + stacks dir + controller.yaml (infra, v0.14.1)
Safety guards: destination ≠ source, path-overlap check (HDD mounts only), writable check
Chained execution: runs immediately after nightly restic — daily apps every night, weekly apps on Sundays
Hub reporting after manual triggers (v0.27.2): OnCrossDriveComplete callback on Router pushes infra backup snapshot to Hub + writes local infra backup after both single-app and run-all manual triggers complete (previously only automatic scheduled runs reported)
Per-app concurrency lock prevents overlapping runs
Status (last_run, duration, size, error) persisted to settings.json
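A per-app concurrency lock of the kind mentioned above can be sketched with a mutex-guarded set. The type `appLocks` and its methods are hypothetical; the README does not describe the actual mechanism.

```go
package main

import "sync"

// appLocks (hypothetical) tracks which apps currently have a backup running.
// The zero value is ready to use.
type appLocks struct {
	mu      sync.Mutex
	running map[string]bool
}

// tryAcquire marks app as running and returns true, or returns false when a
// run for that app is already in progress.
func (l *appLocks) tryAcquire(app string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running == nil {
		l.running = make(map[string]bool)
	}
	if l.running[app] {
		return false
	}
	l.running[app] = true
	return true
}

// release clears the running mark for app.
func (l *appLocks) release(app string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.running, app)
}
```

A backup routine would call tryAcquire at the start, return early on false, and defer release so the lock is cleared even when the run fails.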

Protects against: primary drive failure, drive theft/damage.

Tier 3: Remote Backup (future)

Complete offsite backup for disaster recovery. Not yet implemented. Placeholder shown in UI ("3. mentés — Hamarosan").

Restore (`internal/backup/restore.go`)

Both Tier 1 (restic) and Tier 2 (rsync) restores are supported. All deployed apps appear in the restore dropdown with per-app snapshot filtering.

App type	Config restored	DB restored	User data restored	Docker volumes restored
Has HDD data	Yes	Yes	Yes (always)	Yes (if present)
Docker volumes, no HDD	Yes	Yes	n/a	Yes
DB only, no HDD/volumes	Yes	Yes	n/a	n/a
Config only	Yes	—	n/a	n/a

Snapshot API (/api/backup/snapshots?stack=<name>):

Returns snapshots only from the app's home drive primary repo (prevents showing irrelevant snapshots from other drives)
Appends a synthetic Tier 2 entry (ID tier2-rsync) from cross-drive config when last backup was successful
Dropdown groups by tier: "1. szint — Helyi mentes" and "2. szint — Masodlagos masolat"

Restore type info shown per-app when selected in dropdown (Hungarian banners):

Has HDD or Docker volumes: "Teljes visszaallitas: adatbazis + konfiguracio + felhasznaloi adatok"
Has DB, no user data: "Adatbazis es konfiguracio visszaallitasa"
Config only: "Csak konfiguracio visszaallitasa"

Tier 1 restore (RestoreApp):

Stop app → resolve app's home drive → restic restore <id> --target / --include <path>... → populate Docker volumes from restored tars → restart app
Restore paths: config dir, DB dump dir, volume dump dir, HDD mounts
Docker volumes restored via restoreDockerVolumes(): docker volume rm -f → docker volume create → docker run alpine tar xf

Tier 2 restore (RestoreAppFromTier2):

Stop app → rsync config from _config/ → rsync HDD data (single/multi-mount) → copy DB dumps from _db/ → restore Docker volumes from _volumes/ tars → restart app
Uses rsync --delete for config and HDD data to ensure exact mirror state
Single-mount apps: data directly in rsync dir (excluding _*); multi-mount: per-leaf subdirectories

Common:

Running flag prevents concurrent backup/restore operations
Snapshot ID validated (8-64 lowercase hex, or special tier2-rsync)
Import from .fab bundle link shown in restore section for cross-system migration
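The snapshot ID rule above maps naturally onto a regular expression. The names `validSnapshotID` and `snapshotIDPattern` are assumptions for this sketch; the accepted forms (8 to 64 lowercase hex characters, or the special tier2-rsync ID) come from the text.

```go
package main

import "regexp"

// tier2SnapshotID is the synthetic ID used for the Tier 2 rsync entry.
const tier2SnapshotID = "tier2-rsync"

// snapshotIDPattern accepts 8 to 64 lowercase hexadecimal characters.
var snapshotIDPattern = regexp.MustCompile(`^[0-9a-f]{8,64}$`)

// validSnapshotID (hypothetical) reports whether id is safe to pass on to a
// restore operation.
func validSnapshotID(id string) bool {
	return id == tier2SnapshotID || snapshotIDPattern.MatchString(id)
}
```

Validating the ID strictly before it reaches a command line is a cheap guard against argument injection.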

Backup Page UI (`internal/web/templates/backups.html`)

Unified per-app status table with expandable rows showing per-tier backup status:

Status dot per app:

Dot color	Meaning
Green	2+ tiers configured with successful backups + destination healthy
Yellow	Only 1 tier, or Tier 2 failing, or Tier 2 configured but never run, or destination disconnected/inactive
Red	Tier 2 destination blocked or inaccessible

Every app starts as yellow (1 tier only). Green requires Tier 2 configured with successful backup.

Per-app backup tiers (3 rows per app):

1. mentes (Tier 1, always present) — Auto badge + "helyi" + last run + contents (e.g., "DB + Konfig + Adatok")
2. mentes (Tier 2, configurable for ALL apps) — one of:
- Configured: method (rsync/restic) + destination + schedule + last run + status + contents + browsable indicator (folder icon for rsync) + action buttons
- Not configured: "1. mentes auto" + "Nincs 2. masolat" + settings link
3. mentes (Tier 3, placeholder) — grayed out "Hamarosan" + "tavoli (offsite)" + future note

Backup contents per app (shown per tier):

Apps with DB + HDD: "DB + Konfig + Adatok"
Apps with Docker volumes (no HDD): "Konfig + DB + Adatok" or "Konfig + Adatok"
Apps with DB only: "DB + Konfig"
Apps with HDD, no DB: "Konfig + Adatok"
Apps with neither: "Konfig"

Deploy page shows cross-drive (Tier 2) configuration form for all deployed apps, not just those with HDD data. Non-HDD apps can configure destination, method, and schedule.

Other sections:

Schedule overview with next run times for DB dump, restic, prune
Snapshot history table (last 20 snapshots aggregated from all per-drive repos, sorted by time)
Storage overview card (total size across repos, snapshot count, DB dump count/size, encryption key with show/copy)
Restore section: app dropdown → per-app snapshot dropdown (Tier 1 + Tier 2 grouped) → restore type info → confirmation checkbox → execute → import from .fab bundle link

4. Storage Management

The storage subsystem handles the full lifecycle of external storage: detection, initialization, path registration, and data migration.

Disk Scanning (`internal/storage/scan.go`)

ScanDisks() uses lsblk -J -b for block device enumeration
System disk detection via host fstab parsing (/host-fstab) + UUID resolution via blkid
Partitions enriched with filesystem type, UUID, and label from direct blkid probing (Docker containers have incomplete udev cache)
Returns AvailableDisks (non-system, non-loop, non-CDROM), SystemDisks, and FormatablePartitions (empty partitions on system disks that are safe to format)
Handles NVMe (nvme0n1p1), SCSI (sdb1), and eMMC (mmcblk0p1) naming
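
The three naming schemes differ in how a partition name relates to its parent disk. A minimal sketch of that mapping, assuming a hypothetical helper called `ParentDisk` (the source does not show how scan.go does this):

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// nvme0n1p1, mmcblk0p1: the disk name ends in a digit, so partitions use a "p" separator.
	pSuffix = regexp.MustCompile(`^(nvme\d+n\d+|mmcblk\d+)p\d+$`)
	// sdb1, vda2: the partition number is appended directly to the disk name.
	numSuffix = regexp.MustCompile(`^([a-z]+)\d+$`)
)

// ParentDisk returns the disk a partition belongs to, or the name unchanged
// if it does not look like a partition.
func ParentDisk(name string) string {
	if m := pSuffix.FindStringSubmatch(name); m != nil {
		return m[1]
	}
	if m := numSuffix.FindStringSubmatch(name); m != nil {
		return m[1]
	}
	return name
}

func main() {
	for _, n := range []string{"nvme0n1p1", "sdb1", "mmcblk0p1", "sda"} {
		fmt.Printf("%s -> %s\n", n, ParentDisk(n))
	}
}
```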

Disk Initialization Wizard (`internal/storage/format.go`)

A step-by-step UI at /settings/storage/init:

Scan — Lists available disks with model, size, partition info
Select — User picks a disk and enters a mount name (e.g., hdd_1)
Confirm — User types "FORMAZAS" ("formatting") to confirm the destructive operation
Format pipeline: wipefs → sfdisk (GPT) → mkfs.ext4 → blkid UUID → backup fstab → append UUID-based fstab entry → mount → findmnt verification → chown 1000:1000 → create felhom-data/ and Dokumentumok/ subdirectories
Auto-registers new storage path in settings.json
Smart partition detection: skips repartitioning for existing empty partitions

Safety guards: system disk detection, mount path conflict check, confirmation required, progress channel for real-time UI feedback.

System-disk partition formatting: When the system disk has an empty partition (no filesystem, not mounted, not used for /, /boot, /boot/efi, or swap), the init wizard detects it via FormatablePartitions in the scan result and offers to format just that partition. Uses IsSystemPartition() (granular per-partition check via fstab) instead of IsSystemDisk() (whole-disk block), so sda1 can be formatted while sda3 (root) remains protected.

Attach Existing Drive Wizard (`internal/storage/attach.go`)

A step-by-step UI at /settings/storage/attach for drives that already have a filesystem (e.g., a previously used ext4 drive). Unlike the init wizard, this does not format the drive — existing data is preserved.

Problem solved: Mounting a whole drive at /mnt/<name> would mix existing user data with the controller's directory structure (felhom-data/, Dokumentumok/, etc.). The bind-mount approach isolates the controller's working directory from other data on the drive.

Scan — Lists available disks, filtered to partitions that have an existing filesystem (FSType != "")
Mount raw — Partition is mounted read-only at a hidden staging path (/mnt/.felhom-raw/<label>)
Browse — Directory browser shows the drive's contents. User can navigate and create a new folder (e.g., felhom_data)
Configure — User enters a mount name and display label. Warning: mount path is immutable until detached
Finalize — Bind-mounts the selected subfolder at /mnt/<name>. Two fstab entries are created (both with nofail):
- Raw mount: UUID=<uuid> /mnt/.felhom-raw/<x> <fstype> defaults,nofail,noatime 0 2
- Bind mount: /mnt/.felhom-raw/<x>/<subfolder> /mnt/<name> none bind,nofail 0 0
Sets permissions (chown 1000:1000), creates felhom-data/ and Dokumentumok/ subdirectories
Auto-registers the storage path in settings.json + syncs FileBrowser mounts
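
The two fstab entries from the Finalize step can be generated from five inputs. This is an illustrative sketch only: `AttachFstabEntries` and its parameter names are hypothetical, and the line formats are copied from the step above.

```go
package main

import (
	"fmt"
	"path"
)

// rawBase is the hidden staging directory named in the wizard description.
const rawBase = "/mnt/.felhom-raw"

// AttachFstabEntries builds the raw-mount and bind-mount fstab lines.
// Both carry nofail so a missing drive does not block boot.
func AttachFstabEntries(uuid, fstype, rawLabel, subfolder, name string) (raw, bind string) {
	rawMount := path.Join(rawBase, rawLabel)
	raw = fmt.Sprintf("UUID=%s %s %s defaults,nofail,noatime 0 2", uuid, rawMount, fstype)
	bind = fmt.Sprintf("%s %s none bind,nofail 0 0",
		path.Join(rawMount, subfolder), path.Join("/mnt", name))
	return raw, bind
}

func main() {
	raw, bind := AttachFstabEntries("1234-ABCD", "ext4", "usb1", "felhom_data", "hdd_1")
	fmt.Println(raw)
	fmt.Println(bind)
}
```

The raw entry must appear before the bind entry in fstab, because the bind source only exists once the raw mount is in place.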

Cancel at any point cleans up the temporary raw mount. The bind mount path (/mnt/<name>) is a real mount point, so all existing code (disk usage, IsMountPoint checks, etc.) works unchanged.

Storage Path Registry (`internal/settings/settings.go`)

Multiple external storage paths supported with:

Label: Human-readable name (editable inline)
Default flag: New deploys use this path by default
Schedulable flag: Path appears in deploy dropdown
Disconnected state: Disconnected, DisconnectedAt, StoppedStacks — set by watchdog or safe-disconnect API, cleared on reconnect
Auto-discovery: On startup, scans deployed apps' HDD_PATH values and registers unknown paths
Thread-safe CRUD: Add, Remove, SetDefault, SetSchedulable, SetLabel, SetDisconnected, ClearDisconnected

Data Migration (`internal/storage/migrate.go`)

Move app data between storage paths (e.g., SSD → HDD, HDD → new HDD):

Validate: stack exists, deployed, has HDD data, target differs from source
Estimate total size, check free space on target
Stop the application
rsync -a --info=progress2 per mount path with real-time progress parsing
Update app.yaml HDD_PATH to new location
Start the application
Rollback on failure: reverts config, restarts on old storage

Progress UI at /stacks/{name}/migrate with byte counter and percentage.
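
For the byte counter and percentage, each progress update printed by `rsync --info=progress2` has to be parsed. The sketch below assumes the usual line shape (transferred bytes with thousands separators, then a percentage); the `ParseProgress` name and the sample line are illustrative, not taken from migrate.go.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// progressRe matches the leading "bytes percent%" part of a progress2 line.
var progressRe = regexp.MustCompile(`^\s*([\d,]+)\s+(\d{1,3})%`)

// ParseProgress extracts transferred bytes and percent from one update.
// ok is false for lines that are not progress updates.
func ParseProgress(line string) (bytes int64, percent int, ok bool) {
	m := progressRe.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	n, err := strconv.ParseInt(strings.ReplaceAll(m[1], ",", ""), 10, 64)
	if err != nil {
		return 0, 0, false
	}
	p, err := strconv.Atoi(m[2])
	if err != nil {
		return 0, 0, false
	}
	return n, p, true
}

func main() {
	// Assumed sample of a progress2 update line.
	line := "  1,238,459,392  45%  110.32MB/s    0:00:10 (xfr#5, to-chk=10/20)"
	b, p, ok := ParseProgress(line)
	fmt.Println(b, p, ok)
}
```

rsync rewrites the progress line in place using carriage returns, so a reader of its stdout would need to split on `\r` as well as `\n` before calling a parser like this.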

Stale Data Cleanup

After migration, the deploy page detects leftover data on previous storage paths:

Shows path, size, and a delete button
Two-step confirmation required
Protected paths (felhom-data/, felhom-data/appdata/, felhom-data/backups/, media/, Dokumentumok/) cannot be deleted

FileBrowser Mount Sync

When storage paths are added or removed, syncFileBrowserMounts() auto-regenerates FileBrowser's docker-compose.yml with volume mounts for all registered paths, then recreates the container.

Storage Watchdog (`internal/monitor/watchdog.go`)

Continuously monitors registered storage paths for disconnection/reconnection (primarily USB drives):

Probe loop: ProbeStoragePath() calls syscall.Statfs() with 3-second timeout in a goroutine. Runs every 5s per connected path, 30s per disconnected path.
Debouncing: 3 consecutive probe failures required before declaring a drive disconnected (prevents false positives from transient I/O).
Disconnect reaction (automatic, ~15s detection):
1. Stops all deployed stacks whose HDD_PATH is under the disconnected drive (skips protected stacks)
2. Persists Disconnected, DisconnectedAt, StoppedStacks to settings.json
3. Lazy-unmounts stale VFS entries (umount -l) — for attach-wizard drives, unmounts bind first, then raw
4. Fires alert refresh (red banner on all pages), notification (storage_disconnected), and immediate hub report push
Auto-reconnect (for UUID-based fstab entries):
1. Checks /host-dev/disk/by-uuid/<uuid> for device reappearance
2. Cleans stale mounts, then mount -T /host-fstab <path> (raw + bind for attach-wizard drives)
3. Verifies with a post-mount probe
4. Runs restic unlock if stale lock files exist
5. Validates StoppedStacks (filters to actually-stopped stacks), clears Disconnected flag
6. Fires alert refresh, notification (storage_reconnected), hub report push
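
The probe and debounce behavior can be sketched as follows. This is a simplified illustration under stated assumptions: the `Probe` and `Debouncer` names are hypothetical, and the real `ProbeStoragePath()` may differ. It relies on `syscall.Statfs`, so it targets Linux.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

var errProbeTimeout = errors.New("probe timed out")

// Probe runs statfs in a goroutine so a hung mount cannot block the caller.
func Probe(path string, timeout time.Duration) error {
	done := make(chan error, 1) // buffered: the goroutine can still finish after a timeout
	go func() {
		var st syscall.Statfs_t
		done <- syscall.Statfs(path, &st)
	}()
	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		return errProbeTimeout
	}
}

// Debouncer reports a disconnect only after Threshold consecutive failures.
type Debouncer struct {
	Threshold int
	failures  int
}

// Observe records one probe result and returns true once the threshold is reached.
func (d *Debouncer) Observe(err error) bool {
	if err == nil {
		d.failures = 0 // any success resets the streak
		return false
	}
	d.failures++
	return d.failures >= d.Threshold
}

func main() {
	d := &Debouncer{Threshold: 3}
	err := Probe("/", 3*time.Second)
	fmt.Println("probe error:", err, "disconnected:", d.Observe(err))
}
```

With a 5-second probe interval and a threshold of 3, a disconnect is declared after roughly 15 seconds, which is where the detection time quoted above comes from.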

Safe disconnect UI (manual, Settings page):

"Leválasztás" (disconnect) button shown for USB drives (detected via sysfs symlink path containing /usb)
Confirmation dialog lists affected apps
Flow: stop apps → sync → umount (fallback umount -l) → mark disconnected → notification
Disconnected card: dashed border, red badge, timestamp, stopped apps list, "Csatlakoztatás" (reconnect) button
After reconnect: "Alkalmazások indítása" (start applications) button to restart auto-stopped stacks

USB detection (system.IsUSBDevice): Reads /host/sys/block/<disk> symlink — if target path contains /usb, it's a USB device. The removable sysfs flag is unreliable for USB HDDs (returns 0). USB drives show an orange "USB" badge on their storage card alongside Aktív/Alapértelmezett badges (v0.27.2). Handles findmnt bind-mount suffix stripping (/dev/sdb1[/subdir] → /dev/sdb1) for attach-wizard drives (v0.32.5).
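
Both parts of this detection are simple string checks. A minimal sketch, assuming hypothetical helper names (`StripBindSuffix`, `IsUSBTarget`, `IsUSBDisk`) rather than the actual `system.IsUSBDevice` implementation:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// StripBindSuffix turns findmnt's "/dev/sdb1[/subdir]" into "/dev/sdb1".
func StripBindSuffix(source string) string {
	if i := strings.IndexByte(source, '['); i >= 0 {
		return source[:i]
	}
	return source
}

// IsUSBTarget reports whether a resolved sysfs link target passes through a USB bus.
func IsUSBTarget(target string) bool {
	return strings.Contains(target, "/usb")
}

// IsUSBDisk reads the <sysBlock>/<disk> symlink and checks its target.
func IsUSBDisk(sysBlock, disk string) (bool, error) {
	target, err := os.Readlink(filepath.Join(sysBlock, disk))
	if err != nil {
		return false, err
	}
	return IsUSBTarget(target), nil
}

func main() {
	fmt.Println(StripBindSuffix("/dev/sdb1[/subdir]"))
	usb, err := IsUSBDisk("/sys/block", "sda")
	fmt.Println(usb, err)
}
```

Inside the controller container the sysfs root would be `/host/sys/block` as described above; the example uses `/sys/block` only so it can run on a plain Linux host.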

Backup guards: Nightly DB dumps, restic snapshots, and cross-drive backups all skip disconnected, removed, and inactive drives with WARN log (not treated as failures). Cross-drive RunAppBackup() returns nil (not error) for unavailable destinations — prevents noisy error aggregation in scheduled runs (v0.32.5).

Tier2 destination unavailable (v0.32.5): When a Tier2 backup destination drive is disconnected, removed from storage, or deactivated ("Inaktív", inactive), the backup page shows:

Yellow status dot with "2. mentés szünetel" (2nd backup paused) tooltip (not red)
Warning badge: "Cél meghajtó leválasztva" (target drive disconnected; used for disconnected/removed) or "Cél meghajtó inaktív" (target drive inactive; used for deactivated)
Grayed-out last-run info and backup contents
Hidden "Futtatás most" (run now) button (prevents futile manual triggers)
"Beállítás" (settings) link preserved for reconfiguration
Tier2 config persists — backups auto-resume when drive returns/reactivates
Detection: IsStoragePathKnown() catches removed paths, IsStoragePathSchedulable() catches inactive/disconnected/decommissioned

UI integration: Disconnected drives show with hatched red bars on dashboard, monitoring, and backup pages. Per-app backup rows show a "Meghajtó leválasztva" (drive disconnected) badge. Health check emits warnings for disconnected paths.

5. Monitoring & Health

System Health Checks (`internal/monitor/healthcheck.go`)

RunHealthCheck() evaluates multiple subsystems and returns a HealthReport with status (ok/warn/fail):

Check	Warning	Critical
Disk usage (SSD/HDD)	>= 90%	>= 95%
Memory	available < 512MB	available < 256MB
CPU temperature	>= 75 °C	>= 85 °C
Docker daemon	—	unreachable
Protected containers	—	not running
Storage paths	not a mount point (data on SSD), drive disconnected	path inaccessible, disk >= 95%

Backup destination validation (CheckBackupDestination) has tiered checks:

Path doesn't exist → critical/blocked
Not writable → critical/blocked
Same block device as root → warning (data on system drive)
Disk >95% full → critical/blocked
Disk >90% full → warning

Healthchecks.io Integration (deprecated)

Legacy pinger (internal/monitor/pinger.go) still runs for backward compatibility but is no longer the primary monitoring mechanism. Monitoring is now handled by the Hub event system (see Notifications). A deprecation log is emitted on startup if ping UUIDs are configured.

Metrics Store (`internal/metrics/`)

SQLite with WAL mode for concurrent reads during collection
System metrics: CPU%, memory (total/used/available), temperature, load average — collected every 60 seconds
Container metrics: CPU%, memory, network I/O, block I/O per container
Downsampled queries for chart time ranges (1h, 6h, 24h, 7d, 30d)
30-day auto-prune via daily scheduler job

Monitoring Page

Full-page system monitor at /monitoring:

System Overview: hostname, OS, kernel, CPU model/cores, uptime
System Metrics Charts: 4 line charts (CPU, Memory, Temperature, Load) in 2x2 grid
Memory Distribution Bar: stacked bar showing per-container memory usage, OS/system overhead, and free memory (real-time from /proc/meminfo + container stats)
Container Resources: horizontal bar charts (CPU% and Memory per container)
Per-container Detail: click-to-expand historical charts
Hub Connection Status: shows Hub URL, customer ID, connection state (connected/unreachable), last successful push, last error

Chart.js 4.4.7 embedded locally (works in offline environments), dark theme matching site design.

Alert System (`internal/web/alerts.go`)

State-based alerts displayed on all pages:

Sources: health issues, Hub connection status, backup disabled, storage disconnected, update available
Hub alerts: hub-disabled (warning) when Hub not enabled, hub-unreachable (error) when last push failed and no success in 30 min
Sorted by severity (error > warning > info), capped at 5 visible
Refreshed every 5 min + on startup + on storage state changes
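
The sort-and-cap rule can be illustrated in a few lines. This is a sketch with hypothetical names (`Alert`, `VisibleAlerts`), not the code in alerts.go:

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a hypothetical minimal alert record.
type Alert struct {
	ID       string
	Severity string // "error", "warning" or "info"
}

// severityRank orders error before warning before info; unknown values sort last.
func severityRank(s string) int {
	switch s {
	case "error":
		return 0
	case "warning":
		return 1
	case "info":
		return 2
	}
	return 3
}

// VisibleAlerts returns a severity-sorted copy capped at max entries.
func VisibleAlerts(all []Alert, max int) []Alert {
	out := append([]Alert(nil), all...)
	// Stable sort keeps the original order within one severity level.
	sort.SliceStable(out, func(i, j int) bool {
		return severityRank(out[i].Severity) < severityRank(out[j].Severity)
	})
	if len(out) > max {
		out = out[:max]
	}
	return out
}

func main() {
	alerts := []Alert{{"update", "info"}, {"hub-unreachable", "error"}, {"hub-disabled", "warning"}}
	fmt.Println(VisibleAlerts(alerts, 5))
}
```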

6. Notifications

Hub Event System (`internal/notify/notifier.go`)

The controller pushes structured events to the Hub's /api/v1/event endpoint. The Hub handles notification dispatch, cooldown management, and dead man's switch detection.

Core method: PushEvent(eventType, severity, message, details) — non-blocking goroutine, 2 retries with 3s backoff, never blocks the caller.
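
The retry behavior might look roughly like the sketch below. It assumes "2 retries" means up to three attempts in total, and it replaces the HTTP call with a hypothetical `send` function; `sendWithRetry` and `PushAsync` are illustrative names, not the notifier's API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errSend = errors.New("send failed")

// sendWithRetry calls send once, then up to `retries` more times,
// sleeping `backoff` before each retry. It returns the last error.
func sendWithRetry(send func() error, retries int, backoff time.Duration) error {
	var err error
	for attempt := 0; attempt <= retries; attempt++ {
		if attempt > 0 {
			time.Sleep(backoff)
		}
		if err = send(); err == nil {
			return nil
		}
	}
	return err
}

// PushAsync runs the retry loop in its own goroutine so the caller never blocks.
// onDone receives the final result (nil on success).
func PushAsync(send func() error, onDone func(error)) {
	go func() {
		onDone(sendWithRetry(send, 2, 3*time.Second))
	}()
}

func main() {
	done := make(chan error, 1)
	PushAsync(func() error { return nil }, func(err error) { done <- err })
	fmt.Println("push result:", <-done)
}
```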

Event Types

Event Type	Severity	Trigger
`backup_completed`	info	Nightly restic backup succeeds
`backup_failed`	error	Nightly restic backup fails
`db_dump_completed`	info	Nightly database dumps succeed
`db_dump_failed`	error	Nightly database dumps fail
`backup_integrity_ok`	info	Weekly `restic check` passes
`backup_integrity_failed`	error	Weekly `restic check` fails
`crossdrive_completed`	info	Cross-drive secondary backup succeeds
`crossdrive_failed`	error	Cross-drive secondary backup fails
`health_degraded`	warning	Health status degrades (ok→warn)
`health_critical`	error	Health status critical (any→fail)
`health_recovered`	info	Health status recovers (fail/warn→ok)
`disk_warning`	warning	Disk usage crosses 90%
`disk_critical`	error	Disk usage crosses 95%
`storage_disconnected`	error	Storage drive physically removed
`storage_reconnected`	info	Storage drive reconnected
`controller_started`	info	Controller process starts
`controller_updated`	info/error	Self-update success or failure
`app_deployed`	info	New app deployed via API
`app_removed`	info	App removed via API
`disaster_recovery_started`	warning	DR restore begins
`disaster_recovery_completed`	info/error	DR restore finishes (success/partial)

Each event carries typed detail structs (e.g., BackupDetails, DiskDetails, HealthDetails) serialized as JSON.

Default Enabled Events

Events the customer receives notifications for (configurable in settings): backup_failed, db_dump_failed, disk_warning, disk_critical, storage_disconnected, node_down, health_critical, expected_backup_missed, expected_dbdump_missed

Preference Sync

Notification preferences (email, enabled events, cooldown hours) are:

Stored locally in settings.json
Synced to Hub on save and on controller startup via POST /api/v1/preferences
Hub sync failure doesn't block local save

7. Update Management

App Catalog Sync

Periodic git fetch + git reset --hard of the app catalog repo
Content-hash comparison prevents unnecessary file writes
Post-sync stack rescan detects new/changed apps immediately
Stale lock recovery: automatically removes .git/index.lock, .git/shallow.lock, and .git/HEAD.lock before each fetch — prevents permanent sync failures after interrupted operations (e.g. container restart mid-sync)

Planned Update Classifications

Marker	Behavior
No marker	Optional — shown on dashboard, customer clicks "Update"
`UPDATE_REQUIRED=true`	Mandatory — auto-applied during next update window
`UPDATE_SECURITY=true`	Critical — applied immediately

Controller Self-Update (`internal/selfupdate/`)

The controller can update itself — a Watchtower-style pull-and-restart mechanism for a single container. Replaces manual SSH-based docker pull + sed + docker compose up -d with a one-click Settings page button or scheduled auto-update.

How It Works

1. Check Gitea Docker Registry V2 API for new image tags
2. Compare highest semver tag with current Version (set at build time via ldflags)
3. If newer version exists → pull image → update compose file → docker compose up -d
4. Current container is replaced by Docker → new container starts with new version
5. On startup, new container reads update-state.json → marks update success/failure

Design Philosophy

No automatic rollback — follows the Watchtower pattern (24k+ GitHub stars, no rollback). Docker's restart: unless-stopped policy is the crash safety net. The Hub's dead man's switch detects when the controller goes down.
Audit state file — update-state.json in the data volume records every update attempt (previous version, target version, initiator, result). Operators can SSH in and revert using PreviousImage from this file.
Backup-aware — refuses to start an update while a backup is in progress (backupRunning() guard).

Package Structure

File	Purpose
`version.go`	`ParseVersion("X.Y.Z")` → `Version{Major,Minor,Patch}`, `Compare()` returns -1/0/1. Hand-rolled, no external deps. Rejects "dev" and "latest".
`state.go`	`UpdateState` struct persisted as JSON. `LoadState()`, `SaveState()` (atomic: `.tmp` + rename), `ClearState()`. Status values: `"pending"`, `"success"`, `"failed"`.
`updater.go`	Core `Updater` struct. Registry check via HTTP GET to `gitea.dooplex.hu/v2/admin/felhom-controller/tags/list` with Basic Auth (git username/token). Update trigger: `docker pull` → compose file regex replace → `docker compose up -d`. Thread-safe with `sync.Mutex`.
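
A hand-rolled version parser and comparison as described for `version.go` could look like the sketch below. It follows the documented shape (`ParseVersion`, `Version{Major,Minor,Patch}`, `Compare()` returning -1/0/1); the `Highest` helper for picking the newest registry tag is an assumption added for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Version is a plain X.Y.Z semantic version.
type Version struct{ Major, Minor, Patch int }

// ParseVersion accepts only "X.Y.Z"; strings such as "dev" or "latest" are rejected.
func ParseVersion(s string) (Version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return Version{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return Version{}, fmt.Errorf("invalid version %q", s)
		}
		n[i] = v
	}
	return Version{n[0], n[1], n[2]}, nil
}

// Compare returns -1, 0 or 1 when v is older than, equal to or newer than o.
func (v Version) Compare(o Version) int {
	a := [3]int{v.Major, v.Minor, v.Patch}
	b := [3]int{o.Major, o.Minor, o.Patch}
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// Highest returns the newest parseable tag, skipping anything that is not X.Y.Z.
func Highest(tags []string) (Version, bool) {
	var best Version
	found := false
	for _, t := range tags {
		v, err := ParseVersion(t)
		if err != nil {
			continue
		}
		if !found || v.Compare(best) > 0 {
			best, found = v, true
		}
	}
	return best, found
}

func main() {
	fmt.Println(Highest([]string{"latest", "0.5.0", "0.33.0", "0.9.1"}))
}
```

Comparing the components numerically matters here: a plain string comparison would rank "0.9.1" above "0.33.0".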

Update Trigger Flow

Guard checks: concurrent update lock, dev version check, backup running check, compose file accessible
Write update-state.json with status "pending" (audit trail)
docker pull <image>:<targetVersion>
Read compose file → replace image tag via regexp → atomic write (.tmp + rename)
docker compose -f /opt/docker/felhom-controller/docker-compose.yml -p felhom-controller up -d
Docker kills the current container, starts the new one
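
The compose-file rewrite in the flow above comes down to one regular-expression replacement. A minimal sketch, assuming a hypothetical `ReplaceImageTag` helper and a single-line `image:` entry; the actual pattern used by `updater.go` is not shown in this document.

```go
package main

import (
	"fmt"
	"regexp"
)

// ReplaceImageTag swaps the tag on the `image: <image>:<tag>` line.
// It reports false (and returns the input unchanged) if no such line exists.
func ReplaceImageTag(compose, image, newTag string) (string, bool) {
	re := regexp.MustCompile(`(?m)^(\s*image:\s*["']?` + regexp.QuoteMeta(image) + `:)[^\s"']+`)
	if !re.MatchString(compose) {
		return compose, false
	}
	// ${1} re-emits everything up to and including the colon before the tag.
	return re.ReplaceAllString(compose, "${1}"+newTag), true
}

func main() {
	compose := "services:\n  controller:\n    image: gitea.dooplex.hu/admin/felhom-controller:0.32.5\n"
	out, ok := ReplaceImageTag(compose, "gitea.dooplex.hu/admin/felhom-controller", "0.33.0")
	fmt.Println(ok)
	fmt.Print(out)
}
```

Quoting the image name with `regexp.QuoteMeta` keeps the dots in the registry host from matching arbitrary characters.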

Startup Verification

Called once from main.go before the scheduler starts:

Load update-state.json — if missing or status != "pending", nothing to do
Compare running Version with state.TargetVersion
Match → mark "success", notify via hub
Mismatch → mark "failed", notify via hub
No rollback attempt — operator reverts manually if needed

Auto-Update Scheduling

Two separate scheduler jobs prevent interference with backups:

Job	Type	Default	Purpose
`selfupdate-check`	`sched.Every`	6h	Check registry, cache result (for UI). Never triggers update.
`selfupdate-auto`	`sched.Daily`	04:30	If auto-update enabled + update available + backup not running → trigger.

The auto-update time (config.SelfUpdate.AutoUpdateTime, default "04:30") is deliberately scheduled after the backup window (02:30 to roughly 04:00) to avoid collisions. The backupRunning() guard is the hard safety check: if backups run past 04:30, the update is skipped and retried the next day.

An initial version check fires 30s after startup so the Settings page shows version info quickly.

Compose File Access

The controller needs write access to its own docker-compose.yml. This is achieved via Docker volume mount ordering:

volumes:
  # 1. Directory mount — gives access to compose file + config
  - /opt/docker/felhom-controller:/opt/docker/felhom-controller
  # 2. Read-only override — prevents accidental config writes
  - /opt/docker/felhom-controller/controller.yaml:/opt/docker/felhom-controller/controller.yaml:ro
  # 3. Named volume override — persistent data in Docker-managed volume
  - controller-data:/opt/docker/felhom-controller/data

API Endpoints

Method	Path	Auth	Description
GET	`/api/selfupdate/status`	Session or API key	Current status (cached, no network call)
POST	`/api/selfupdate/check`	Session or API key	Force registry check, return result
POST	`/api/selfupdate/update`	Session or API key	Trigger update (async, returns immediately)

Self-update endpoints accept either session auth (for UI) or hub API key as bearer token (for external triggering from build scripts or hub). This enables the post-v0.16.0 deploy workflow:

# After building + pushing new image:
curl -s -X POST https://felhom.demo-felhom.eu/api/selfupdate/update \
  -H "Authorization: Bearer <HUB_API_KEY>"

Settings Page UI

The "Verzió és frissítés" (version and update) card on the Settings page (/settings) shows:

Current version and latest available version
"Frissítés elérhető" (update available) badge
Last check time and any errors
Auto-update status with configured time
Last update result (success/failed/pending)
Buttons: "Frissítés keresése" (check) + "Frissítés telepítése" (apply)

After triggering an update, the page polls /api/health every 3s and reloads when the new container responds.

A global info-level alert ("Új controller verzió elérhető", i.e. "new controller version available") appears on all pages when an update is available, linking to the Settings page.

Configuration

self_update:
  enabled: true
  check_interval: "6h"          # How often to check registry
  image: "gitea.dooplex.hu/admin/felhom-controller"  # Default
  auto_update: false             # Set true for unattended updates
  auto_update_time: "04:30"     # When to auto-apply (after backups)
  health_timeout_seconds: 60    # Reserved for future use

Edge Cases

Scenario	Behavior
`Version == "dev"`	`ParseVersion` returns error → no updates reported, trigger refused
Registry unreachable	Log warning, return error in check result. No crash.
No registry credentials	Return error "Registry hitelesítő adatok hiányoznak" (registry credentials missing)
Compose file not writable	Refuse update before doing anything
Backup running	Refuse with "Mentés fut, próbálja később" (backup running, try again later)
Concurrent update	Mutex prevents duplicates: "Frissítés már folyamatban" (update already in progress)
Bad update (crash loop)	Docker restarts container. State file stays "pending". Operator SSH-reverts using `PreviousImage`.
Corrupt state file	Treated as "no pending update", logged, deleted

8. Authentication & Settings

Session Auth (`internal/web/auth.go`)

bcrypt password verification with configurable source priority: settings.json → controller.yaml → no auth (open access)
7-day session duration with random 32-byte hex tokens
?next= redirect after login preserves the page the user was visiting
Session cleanup every 15 minutes
All sessions invalidated on password change
Conditional logout link (hidden when auth is disabled)
Each session stores a dedicated CSRF token (separate 32-byte random value) alongside the session token

CSRF Protection (`internal/web/csrf.go`)

Synchronizer-token CSRF protection on all browser-facing state-mutating endpoints.

How it works:

CsrfProtect middleware wraps all route handlers in main.go
Safe methods (GET, HEAD, OPTIONS) pass through without validation
For POST/DELETE/PATCH: reads token from _csrf form field or X-CSRF-Token request header; constant-time compares against the session's stored CSRF token
On rejection: JSON {"ok":false,"error":"CSRF token missing or invalid"} for /api/ paths; HTTP 403 text page for UI routes
Logs: [WARN] CSRF rejected: METHOD /path from addr (reason)
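
The core of the check is a constant-time comparison against the session's token. The sketch below reduces it to a pure function for illustration; `CSRFAllowed` is a hypothetical name, and the real middleware additionally handles the exemptions and the response formats described in this section.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// CSRFAllowed decides whether a request passes the synchronizer-token check.
// formToken is the _csrf form field, headerToken the X-CSRF-Token header.
func CSRFAllowed(method, formToken, headerToken, sessionToken string) bool {
	switch method {
	case "GET", "HEAD", "OPTIONS":
		return true // safe methods are not validated
	}
	got := formToken
	if got == "" {
		got = headerToken
	}
	if got == "" || sessionToken == "" {
		return false
	}
	// Constant-time comparison avoids leaking how many leading bytes matched.
	return subtle.ConstantTimeCompare([]byte(got), []byte(sessionToken)) == 1
}

func main() {
	fmt.Println(CSRFAllowed("POST", "", "abc123", "abc123"))
	fmt.Println(CSRFAllowed("POST", "", "", "abc123"))
}
```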

Exempt paths (no CSRF check):

Requests with Authorization: Bearer ... header — hub→controller API calls (selfupdate, config/apply). Browsers cannot auto-send Bearer headers, so cross-site requests are impossible on these endpoints.
Auth-disabled mode (authEnabled() == false) — CSRF is meaningless when there is no session.

Token delivery to templates:

executeTemplate(w, r, name, data) wrapper in server.go auto-injects CSRFField (template.HTML hidden <input>) and CSRFToken (raw string) into every page's data map
layout.html emits <meta name="csrf-token" content="{{.CSRFToken}}"> and defines csrfHeaders() JS function in <head> (before page scripts)
Forms: {{.CSRFField}} (or {{$.CSRFField}} inside {{range}} loops — outer scope required)
JS fetch() calls: headers: csrfHeaders() — returns {'X-CSRF-Token': metaContent}
Dynamically-created JS forms: read token from document.querySelector('meta[name="csrf-token"]').content
navigator.sendBeacon() replaced with fetch(..., {keepalive: true}) where used — sendBeacon cannot send custom headers

Settings Persistence (`internal/settings/settings.go`)

Runtime-mutable settings in settings.json (separate from infrastructure config):

Section	Contents
`password_hash`	bcrypt hash override
`notifications`	email, enabled events, cooldown hours
`db_validations`	per-DB dump validation results (survives restarts)
`app_backup`	per-app map: enabled flag, cross-drive config (method, dest, schedule, runtime status)
`storage_paths`	registered paths with label, default flag, schedulable flag, disconnected state
`cross_drive_restic_password`	auto-generated restic password for cross-drive repos

All public methods use sync.RWMutex. File writes are atomic (.tmp + rename).
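
The ".tmp + rename" pattern mentioned above can be sketched as follows. `WriteFileAtomic` is a hypothetical helper name and the `demo` function exists only to exercise it; the approach relies on rename being atomic when both paths are on the same filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// WriteFileAtomic writes data to path+".tmp", syncs it, then renames it over path.
// Readers see either the old file or the new one, never a partial write.
func WriteFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp := path + ".tmp"
	f, err := os.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, perm)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, path)
}

// demo writes a file twice in a temp dir and returns the final content.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "settings-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "settings.json")
	for _, body := range []string{`{"v":1}`, `{"v":2}`} {
		if err := WriteFileAtomic(path, []byte(body), 0o600); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	fmt.Println(demo())
}
```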

Settings Page (`/settings`)

Five sections:

System config — read-only display of controller.yaml values
Version & update — current/latest version, check/update buttons, auto-update status, last update result
Storage paths — add/remove, edit labels, set default, toggle schedulable, per-path app list with sizes, safe disconnect/reconnect for USB drives
Password change — current + new + confirm, min 8 chars
Notifications — email, event checkboxes, cooldown hours, test email button

9. Central Hub Reporting

Report Push (`internal/report/`)

Periodic JSON push (default every 15 min) to the central felhom-hub service:

System: hostname, OS, CPU, memory, disk usage, uptime
Containers: running/stopped counts, per-container CPU/memory
Backup: last run, success, repo stats, snapshot count, restic password (for disaster recovery)
Health: current status, issues, warnings
Stacks: deployed apps with versions and states
Config hash: SHA256 of controller.yaml for Hub-side config comparison
App telemetry (v0.28.0+): Per-stack memory (current/avg/peak) and CPU averages from the last 15 minutes of metrics data, plus log scan results (error/warning counts with deduplicated issues). Only non-protected, deployed stacks are included. Backward-compatible: old Hub versions silently ignore this field.
Controller telemetry (v0.32.4+): The controller's own container (felhom-controller) is included as a special entry in the app_telemetry array. Its memory/CPU metrics come from the same metrics collector, and its log warnings/errors are scanned via docker logs using the same pipeline as app containers. This reuses all existing Hub telemetry infrastructure (memory trend charts, known issues, fleet aggregation) with zero Hub-side changes.

Bearer token authentication, 3-attempt retry with 5-second backoff. Push status tracked via PushStatus struct (LastAttempt, LastSuccess, LastError, consecutive failures) — used by the monitoring page and alert system to show Hub connection health.

App Telemetry (`internal/metrics/telemetry.go`, `internal/metrics/logscanner.go`, `internal/report/telemetry.go`)

Each report push now includes per-app telemetry data:

Metrics collection (telemetry.go):

MetricsStore.GetContainerTelemetry(since) aggregates container-level memory (avg, peak, current) and CPU averages from the container_metrics SQLite table for the last 15 minutes.

Log scanning (logscanner.go):

ScanContainerLogs(containerNames, since, logger) runs docker logs --since=15m --tail=1000 sequentially on all non-protected deployed containers.
Classifies lines by keyword match (errors: error, fatal, panic, crit, oom, killed, exception, traceback; warnings: warn, warning) on the first 5 words (case-insensitive).
Deduplicates via fingerprinting: strips ANSI escape codes, ISO timestamps (with timezone offsets), and syslog timestamps (including mid-line); replaces 6+ digit numbers with <N>, 8+ char hex with <HEX>, UUIDs with <UUID>. Groups identical fingerprints, keeps top 10 per container.
Returns []ContainerLogSummary with ErrorCount, WarnCount, RecentIssues []LogIssue.
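
The fingerprinting step can be approximated with a handful of regular expressions. The patterns below are assumptions written to match the description (ANSI codes, ISO and syslog timestamps, UUIDs, long numbers, long hex strings); the ones in logscanner.go may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	ansiRe   = regexp.MustCompile(`\x1b\[[0-9;]*[A-Za-z]`)
	isoRe    = regexp.MustCompile(`\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?`)
	syslogRe = regexp.MustCompile(`[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}`)
	uuidRe   = regexp.MustCompile(`(?i)\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b`)
	numRe    = regexp.MustCompile(`\b\d{6,}\b`)
	hexRe    = regexp.MustCompile(`(?i)\b[0-9a-f]{8,}\b`)
	spaceRe  = regexp.MustCompile(`\s+`)
)

// Fingerprint normalizes a log line so that repeats of the same issue compare equal.
func Fingerprint(line string) string {
	s := ansiRe.ReplaceAllString(line, "")
	s = isoRe.ReplaceAllString(s, "")
	s = syslogRe.ReplaceAllString(s, "")
	// UUIDs first, then long numbers, then hex, so all-digit runs become <N> rather than <HEX>.
	s = uuidRe.ReplaceAllString(s, "<UUID>")
	s = numRe.ReplaceAllString(s, "<N>")
	s = hexRe.ReplaceAllString(s, "<HEX>")
	return strings.TrimSpace(spaceRe.ReplaceAllString(s, " "))
}

func main() {
	a := "2026-02-27T21:43:02+01:00 ERROR request 1234567 failed"
	b := "2026-02-27T21:44:10+01:00 ERROR request 7654321 failed"
	fmt.Println(Fingerprint(a) == Fingerprint(b))
}
```

Two lines that differ only in timestamp and request number collapse to the same fingerprint, so they can be counted as one issue.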

Report integration (report/telemetry.go):

buildAppTelemetrySection() calls both, then buildAppTelemetry() aggregates by stack — summing container metrics, merging issues, capping at 10 per app. Additionally, buildControllerTelemetry() creates a special entry for the controller container itself (app_name: "felhom-controller").
Results stored as []AppTelemetry in the Report struct field app_telemetry.

Infrastructure Backup to Hub (`internal/report/infra_backup.go`)

After each backup cycle (including manual Tier 2 triggers via OnCrossDriveComplete callback), the controller pushes a full infrastructure snapshot to the Hub for disaster recovery. This snapshot includes:

controller.yaml (base64-encoded, full config including secrets)
settings.json (base64-encoded, backup prefs, storage paths, cross-drive configs)
Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
Deployed stacks manifest (app names, HDD paths)
Restic passwords (primary + cross-drive, base64-encoded)

This enables fully automated recovery when the system drive is replaced — the new controller pulls the snapshot from the Hub, auto-mounts surviving drives by UUID, and restores all applications.

Hub Dashboard

The hub service (separate Go app in the felhom.eu repo) provides:

Multi-customer overview table with status indicators and event count badges
Customer detail page with system/storage/containers/backup/health/events sections
Event timeline: last 50 events with severity filter, colored badges, source tracking
Dead man's switch: staleness detection (30min stale, 60min down), missed backup detection (daily at 05:00)
Notification dispatch: operator (English) + customer (Hungarian) emails via Resend with per-event cooldowns
Infra backup status per customer (last sync, stack count, disk count)
Color coding: green (<30min), yellow (30-60min), red (>60min since last report)
90-day report + event retention with daily prune at 04:30 Budapest time

10. First-Run Setup Wizard

When the controller starts with no valid customer configuration (customer.id empty), it enters setup mode — a web-based wizard that handles all initial configuration. This replaces the old interactive shell wizard in docker-setup.sh.

Setup Mode Detection (`internal/setup/setup.go`)

NeedsSetup(cfg) returns true when customer.id is empty or a .needs-setup marker file exists. In setup mode, the controller skips normal startup (no scheduler, no backup, no stacks) and serves only the wizard UI on two listeners:

:8080 — behind Traefik (accessible via domain, e.g. https://felhom.example.com)
:8081 — direct HTTP (accessible via LAN IP, e.g. http://192.168.0.100:8081)

Wizard Flow

┌──────────────────────────────────┐
│  1. Welcome                      │
│  Choose: Restore / Fresh install │
└─────────┬───────────┬────────────┘
          │           │
    ┌─────▼─────┐  ┌──▼───────────────┐
    │ 2a. Scan  │  │ 2b. Hub download  │
    │ drives for│  │ (customer ID +    │
    │ local     │  │  password)        │
    │ backups   │  │                   │
    └─────┬─────┘  └──────┬────────────┘
          │               │
    ┌─────▼─────┐         │
    │ 2a.2 Hub  │         │
    │ recovery  │         │
    │ (fallback)│         │
    └─────┬─────┘         │
          │               │
    ┌─────▼─────┐  ┌──────▼───────────┐
    │ Execute   │  │ Execute fresh    │
    │ restore   │  │ install          │
    └─────┬─────┘  └──────┬───────────┘
          │               │
          └───────┬───────┘
                  ▼
          os.Exit(0) → Docker restarts
          → normal mode

Hub Pre-Seeding

When docker-setup.sh is run with --hub-customer / --hub-password, the controller receives pre-seeded credentials via environment variables:

Env var	Purpose
`FELHOM_SETUP_CUSTOMER_ID`	Pre-fills customer ID in wizard forms
`FELHOM_SETUP_PASSWORD`	Pre-fills retrieval password for auto-processing

In hub mode, the welcome page shows three cards instead of two:

"Visszaállítás a Hub-ról" (restore from the Hub) — auto-calls PullRecovery(), shows infra backup details
"Visszaállítás helyi meghajtóról" (restore from a local drive) — standard drive scan
"Friss telepítés" (fresh install) — auto-calls PullConfig(), downloads config only

Both hub paths auto-process when credentials are pre-seeded (no form entry needed). On error, the wizard falls back to the manual form with the error displayed.

Key Components

File	Purpose
`setup/setup.go`	`NeedsSetup()` detection, `SetupState` persistence to `setup-state.json`
`setup/handlers.go`	HTTP handlers for each wizard step (welcome, scan, hub-restore, fresh, manual)
`setup/scanner.go`	Scans all block devices for `.felhom-infra-backup/` directories (current + `history/`) via `lsblk` + temp mounts; returns rich info (app names, disk count)
`setup/hub.go`	Hub recovery pull (`GET /api/v1/recovery/{id}`) and config download
`setup/csrf.go`	Lightweight CSRF protection (cookie + hidden field, `SameSite=Strict`)
`setup/network.go`	Detects local IPs for LAN access URL display
`setup/templates/`	8 embedded HTML templates (Hungarian, dark theme matching main UI) — includes `setup_hub_versions.html` for Hub backup version picker

Local Infra Backup (`internal/backup/local_infra.go`)

The controller writes infrastructure snapshots to every connected drive after each backup cycle and on startup. Location: <drive>/.felhom-infra-backup/. Files:

backup.json — full infra backup (config, settings, disk layout, passwords, stacks)
metadata.json — schema version, timestamp, customer ID, controller version, SHA256 checksum
history/ — previous backup versions (last 5), rotated automatically before each write
- {timestamp}-backup.json + {timestamp}-metadata.json pairs (timestamp format: 20060102T150405Z)
- Oldest entries pruned when count exceeds 5
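
The history naming and pruning rules can be sketched like this. The layout string is the one given above; `StampFromUnix`, `HistoryName` and `PruneList` are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// tsLayout is the Go reference-time layout for names like 20260227T204302Z.
const tsLayout = "20060102T150405Z"

// StampFromUnix formats a Unix time (seconds) as a UTC history timestamp.
func StampFromUnix(sec int64) string {
	return time.Unix(sec, 0).UTC().Format(tsLayout)
}

// HistoryName builds "<timestamp>-<kind>.json", e.g. kind "backup" or "metadata".
func HistoryName(stamp, kind string) string {
	return stamp + "-" + kind + ".json"
}

// PruneList returns the oldest timestamps to delete so that at most keep remain.
func PruneList(stamps []string, keep int) []string {
	sorted := append([]string(nil), stamps...)
	sort.Strings(sorted) // this layout sorts lexicographically in time order
	if len(sorted) <= keep {
		return nil
	}
	return sorted[:len(sorted)-keep]
}

func main() {
	stamp := StampFromUnix(time.Now().Unix())
	fmt.Println(HistoryName(stamp, "backup"), HistoryName(stamp, "metadata"))
}
```

Because the timestamp is fixed-width and ordered from year down to second, plain string sorting is enough to find the oldest entries.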

During the setup wizard's drive scan, both current and historical backups are discovered, integrity-verified, and offered for one-click restore. The scan results table shows app names/count, disk count, and a "korábbi" (earlier) badge for historical versions.

Recovery Info (`internal/recovery/info.go`)

Generates recovery-info.txt on the system data partition with customer ID, Hub URL, retrieval password, and recovery instructions in Hungarian. Updated on startup and after config changes. Also displayed on the Settings page in a "Vészhelyzeti információk" (Emergency information) section.

11. Disaster Recovery

When a system drive fails and is replaced, the recovery flow uses the setup wizard:

1. docker-setup.sh deploys fresh controller with minimal config
   - With --hub-customer: credentials pre-seeded via env vars
   - Without: user enters credentials manually in wizard
2. Controller detects empty customer.id → enters setup mode
3. User opens wizard at http://<LAN-IP>:8081
4. Hub mode: welcome page shows Hub restore / local scan / fresh install
   Non-hub mode: welcome page shows restore / fresh install
5. Hub restore: auto-connects to Hub, shows version picker if multiple versions
   Local restore: scans all drives for .felhom-infra-backup/ directories (current + history/)
6. User selects backup version → restore: config, settings, passwords, disk layout
7. Controller restarts into normal mode with full config
8. Controller auto-mounts surviving drives by UUID from disk layout
9. Dashboard shows "Visszaállítás" (Restore) page for app-level recovery
10. User confirms → sequential restore: rsync first, restic fallback, DB import

Backup sources (priority order):

Local infra backup (.felhom-infra-backup/ on surviving drives) — fastest, no network needed
Hub recovery endpoint (GET /api/v1/recovery/{id}) — requires retrieval password, supports ?version=ID for specific versions; Hub retains ~14 versions via GFS pruning (7 daily / 4 weekly / 3 monthly)
Manual config (wizard form) — enter all details manually as last resort

Hub verification: After setup, the controller periodically verifies customer standing via the Hub report push response (customer_blocked field). If the customer is blocked, or the Hub has been unreachable for more than 7 days, the controller enters limited mode (no new deployments).
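The limited-mode rule reduces to a small predicate. The sketch below assumes the controller tracks the time elapsed since the last successful Hub contact; `limitedMode` and `hubGrace` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// hubGrace is how long the Hub may stay unreachable before limited mode.
const hubGrace = 7 * 24 * time.Hour

// limitedMode reports whether new deployments should be refused.
// blocked mirrors the customer_blocked field of the Hub response;
// sinceLastContact is the time since the last successful report push.
func limitedMode(blocked bool, sinceLastContact time.Duration) bool {
	if blocked {
		return true
	}
	return sinceLastContact > hubGrace
}

func main() {
	fmt.Println(limitedMode(false, 3*24*time.Hour)) // within grace period
	fmt.Println(limitedMode(false, 8*24*time.Hour)) // Hub unreachable too long
	fmt.Println(limitedMode(true, time.Minute))     // blocked by Hub
}
```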

12. Asset Sync

App assets (logos, screenshots) are managed centrally by the Hub and downloaded to each controller via a daily sync process. This decouples asset updates from controller image rebuilds — new app icons only require a Hub redeploy.

How It Works (`internal/assets/syncer.go`)

1. Fetch manifest from Hub: GET /api/v1/assets/manifest (Bearer auth)
2. Compare SHA-256 checksums with local cache (<dataDir>/assets/)
3. Download changed/new files: GET /api/v1/assets/file/{filename}
4. Remove local files not in Hub manifest (stale cleanup)
5. Save local manifest copy for next comparison
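Steps 2 to 4 amount to a diff between two manifests. A minimal sketch, assuming a manifest can be reduced to a filename-to-SHA-256 map (the real manifest format is not shown here); `manifest` and `diffManifests` are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

// manifest maps an asset filename to its hex SHA-256 (assumed shape).
type manifest map[string]string

// diffManifests returns the files to download (new or changed on the Hub)
// and the local files to remove (no longer listed by the Hub).
func diffManifests(remote, local manifest) (download, remove []string) {
	for name, sum := range remote {
		if have, ok := local[name]; !ok || have != sum {
			download = append(download, name)
		}
	}
	for name := range local {
		if _, ok := remote[name]; !ok {
			remove = append(remove, name)
		}
	}
	// Sort for deterministic logs and tests; map iteration order is random.
	sort.Strings(download)
	sort.Strings(remove)
	return download, remove
}

func main() {
	remote := manifest{"mealie-logo.svg": "aa", "nextcloud-logo.svg": "bb", "immich-logo.svg": "cc"}
	local := manifest{"mealie-logo.svg": "aa", "nextcloud-logo.svg": "old", "removed-app-logo.svg": "dd"}

	download, remove := diffManifests(remote, local)
	fmt.Println("download:", download)
	fmt.Println("remove:  ", remove)
}
```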

Asset Resolution (two-tier)

Priority	Path	Source
1	`<dataDir>/assets/`	Downloaded from Hub (synced cache)
2	`/usr/share/felhom/assets/`	Baked into Docker image (fallback)

The Resolve(filename) method checks the synced cache first, then falls back to the baked-in directory. This ensures assets are always available even before the first sync.
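A sketch of the two-tier lookup. The real Resolve is a method on the syncer; this standalone version takes the directory list and an existence check as parameters so it can be exercised without a filesystem. `resolve` and `fileExists` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolve returns the first existing path for filename, trying dirs in
// priority order. filepath.Base drops directory components so a crafted
// name cannot escape the asset directories.
func resolve(filename string, dirs []string, exists func(string) bool) (string, bool) {
	name := filepath.Base(filename)
	for _, dir := range dirs {
		candidate := filepath.Join(dir, name)
		if exists(candidate) {
			return candidate, true
		}
	}
	return "", false
}

// fileExists reports whether path is an existing regular file or symlink target.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

func main() {
	dirs := []string{
		"/opt/docker/felhom-controller/data/assets", // synced cache
		"/usr/share/felhom/assets",                  // baked into the image
	}
	path, ok := resolve("mealie-logo.svg", dirs, fileExists)
	fmt.Println(path, ok)
}
```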

The Felhom logo (/static/felhom-logo.svg) also uses this two-tier resolution: the logo handler checks synced assets first, then falls back to the embedded SVG constant. This allows logo updates via Hub without a controller rebuild. The logo is also used as an SVG favicon.

Configuration

assets:
  sync_enabled: true       # Opt-in: download assets from Hub API
  sync_schedule: "05:00"   # Daily sync time (HH:MM, Budapest timezone)

Asset sync requires hub.enabled: true with valid hub.url and hub.api_key. The initial sync runs 10 seconds after startup (to let subsystems initialize), then daily at the configured time.
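Computing the next run for an HH:MM schedule in Budapest time can be sketched as below. This is not the scheduler's actual code: `nextDaily` and `nextDailyString` are hypothetical helpers, and the blank import of time/tzdata is an assumption made so the example works on images without a system zone database.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embed zone data so LoadLocation works on slim images
)

// nextDaily returns the first instant strictly after now at which the
// wall clock in loc reads hhmm. Building the time with time.Date keeps
// the wall-clock hour stable across DST changes.
func nextDaily(now time.Time, hhmm string, loc *time.Location) (time.Time, error) {
	clock, err := time.Parse("15:04", hhmm)
	if err != nil {
		return time.Time{}, fmt.Errorf("invalid schedule %q: %w", hhmm, err)
	}
	local := now.In(loc)
	next := time.Date(local.Year(), local.Month(), local.Day(), clock.Hour(), clock.Minute(), 0, 0, loc)
	if !next.After(local) {
		next = time.Date(local.Year(), local.Month(), local.Day()+1, clock.Hour(), clock.Minute(), 0, 0, loc)
	}
	return next, nil
}

// nextDailyString is a string-in, string-out wrapper fixed to Europe/Budapest.
func nextDailyString(nowRFC3339, hhmm string) (string, error) {
	loc, err := time.LoadLocation("Europe/Budapest")
	if err != nil {
		return "", err
	}
	now, err := time.Parse(time.RFC3339, nowRFC3339)
	if err != nil {
		return "", err
	}
	next, err := nextDaily(now, hhmm, loc)
	if err != nil {
		return "", err
	}
	return next.Format(time.RFC3339), nil
}

func main() {
	next, err := nextDailyString("2026-02-27T21:43:02+01:00", "05:00")
	fmt.Println(next, err)
}
```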

Sync Status

The syncer tracks status (last sync time, result, file count, total bytes) accessible via GET /api/assets/status. On-demand sync can be triggered via POST /api/assets/sync.

File Types

The Hub serves three asset types per app:

{slug}-logo.svg — primary SVG logo
{slug}-logo.png — PNG fallback
{slug}-screenshot-{N}.webp — app screenshots

Key Design Decisions

Opt-in via sync_enabled — backward compatible, baked-in assets still work without Hub
SHA-256 change detection — only downloads files that actually changed (bandwidth efficient)
Atomic file writes — downloads to .tmp then os.Rename for crash safety
Stale file cleanup — removes local files not in the Hub manifest (e.g., deleted apps)
Non-blocking initial sync — runs in a goroutine with 10s delay, doesn't block startup
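The atomic-write decision can be illustrated with a small helper. This sketch uses os.CreateTemp for a unique temporary name in the target directory, which differs slightly from a fixed `.tmp` suffix; `writeFileAtomic` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file next to path, then renames it
// over path. Readers see either the old file or the complete new one.
// The temp file must live in the same directory so the rename stays on
// one filesystem.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // harmless once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush to disk before the rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmpName, perm); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

func main() {
	path := filepath.Join(os.TempDir(), "felhom-atomic-demo.json")
	if err := writeFileAtomic(path, []byte(`{"ok":true}`), 0o600); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	data, _ := os.ReadFile(path)
	fmt.Println(string(data))
	os.Remove(path)
}
```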

13. Debug Mode

When logging.level: "debug" is set in controller.yaml, the controller exposes a full diagnostic dashboard at /debug with 9 testing sections. All debug endpoints are gated — at info level, the sidebar link disappears and all /api/debug/* routes return 404.
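The gating behaviour can be sketched as a middleware that answers 404 unless debug mode is on. `debugOnly` and `statusFor` are hypothetical names, and the handler body is a stand-in for the real dump endpoint.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// debugOnly wraps next so it is reachable only when debug mode is enabled.
// Returning 404 (rather than 403) hides the existence of the endpoints.
func debugOnly(enabled bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !enabled {
			http.NotFound(w, r)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor issues a test request against a gated handler and returns
// the HTTP status code.
func statusFor(enabled bool) int {
	dump := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"ok":true}`)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/debug/dump", nil)
	debugOnly(enabled, dump).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("info level: ", statusFor(false))
	fmt.Println("debug level:", statusFor(true))
}
```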

Debug Page Sections

#	Section	Endpoints	Description
1	Rendszer diagnosztika (System diagnostics)	`GET /api/debug/dump`	Full state dump: controller info, storage, stacks, scheduler, health, alerts. JSON download.
2	Értesítés teszt (Notification test)	`POST /api/debug/event/test`, `GET /api/debug/event/history`	Send test events with configurable type/severity, view event history ring buffer.
3	Mentés teszt (Backup test)	`POST /api/debug/backup/{dbdump,crossdrive,integrity,infra}`	Trigger individual backup phases independently.
4	Tárhely teszt (Storage test)	`POST /api/debug/storage/simulate-{disconnect,reconnect}`, `GET /api/debug/storage/watchdog-status`	Simulate drive disconnect/reconnect without unmounting. Per-path probe state with 5s auto-refresh.
5	Hub & Kapcsolatok (Hub & Connections)	`POST /api/debug/hub/{push,infra-push,test-connectivity,preferences-sync}`, `POST /api/debug/gitea/test-connectivity`	Test Hub/Gitea connectivity with latency. Push reports and sync preferences.
—	Telemetria teszt	`GET /api/debug/telemetry`	Run the full telemetry collection pipeline on-demand (metrics query + log scan). Returns per-app table: container list, memory current/avg/peak, CPU avg, catalog limit, log error/warning counts, and top issues. Useful for verifying container→stack mapping and testing log scanner patterns without waiting for the 15-minute report cycle.
6	Önfrissítés teszt (Self-update test)	`POST /api/debug/selfupdate/dry-run`	Dry-run update check: current vs new image lines, compose writability, backup state.
7	DR / Telepítő varázsló (DR / Setup wizard)	`POST /api/debug/dr/trigger-setup`, `GET /api/debug/dr/infra-status`	Infra backup status per drive. Trigger setup mode via marker file (requires "RESET" + infra backup pre-check).
8	Naplóviewer (Log viewer)	`GET /api/debug/logs?level=&limit=&after=`	In-memory log viewer (last 1000 entries), level filter, 2s auto-refresh, color-coded entries.

Key Implementation Details

Log buffer (internal/web/logbuffer.go): Ring buffer implementing io.Writer, created before all modules via io.MultiWriter(os.Stdout, logBuffer). Parses [DEBUG]/[INFO]/[WARN]/[ERROR] tags from standard log format.
Storage simulation: simulatedPaths map in watchdog prevents the watchdog from re-probing simulated-disconnected paths. Disconnect runs all real steps except lazyUnmount (drive stays physically mounted).
DR trigger safety: Uses marker file (data/.needs-setup) instead of modifying controller.yaml. Pre-checks that infra backup exists on at least one drive.
Routing: /api/debug/ carved out in HTTP mux (same pattern as /api/storage/), routed to web server with auth + CSRF.
DebugCallbacks: 7 closures wired from main.go for operations needing modules not on Server struct (hub push, infra backup, connectivity tests, telemetry preview).
Telemetry debug: GetTelemetryPreview callback calls report.BuildAppTelemetryForDebug() (exported wrapper around the private buildAppTelemetrySection()). Result renders as a table with collapsible raw JSON. Available regardless of hub configuration.
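The log buffer described above can be approximated with a fixed-size ring that implements io.Writer. This is a sketch under assumptions, not the contents of logbuffer.go: the type and function names are illustrative, untagged lines default to INFO, and the capacity must be greater than zero.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"strings"
	"sync"
)

type entry struct {
	Level string
	Line  string
}

// ringBuffer keeps the most recent len(entries) log lines.
type ringBuffer struct {
	mu      sync.Mutex
	entries []entry
	next    int  // index of the slot to overwrite next
	full    bool // true once the buffer has wrapped
}

func newRingBuffer(capacity int) *ringBuffer {
	return &ringBuffer{entries: make([]entry, capacity)}
}

// parseLevel returns the first [LEVEL] tag found in line, or INFO if none.
func parseLevel(line string) string {
	best, level := -1, "INFO"
	for _, lvl := range []string{"DEBUG", "INFO", "WARN", "ERROR"} {
		if i := strings.Index(line, "["+lvl+"]"); i >= 0 && (best < 0 || i < best) {
			best, level = i, lvl
		}
	}
	return level
}

// Write implements io.Writer. The standard log package issues exactly one
// Write per message, so each call is stored as one entry.
func (b *ringBuffer) Write(p []byte) (int, error) {
	line := strings.TrimRight(string(p), "\n")
	b.mu.Lock()
	defer b.mu.Unlock()
	b.entries[b.next] = entry{Level: parseLevel(line), Line: line}
	b.next = (b.next + 1) % len(b.entries)
	if b.next == 0 {
		b.full = true
	}
	return len(p), nil
}

// Snapshot returns the buffered entries, oldest first.
func (b *ringBuffer) Snapshot() []entry {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !b.full {
		return append([]entry(nil), b.entries[:b.next]...)
	}
	out := append([]entry(nil), b.entries[b.next:]...)
	return append(out, b.entries[:b.next]...)
}

func main() {
	buf := newRingBuffer(1000)
	logger := log.New(io.MultiWriter(os.Stdout, buf), "", log.LstdFlags)
	logger.Println("[INFO] controller starting")
	logger.Println("[DEBUG] [stacks] scanning stacks dir")
	for _, e := range buf.Snapshot() {
		fmt.Println(e.Level)
	}
}
```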

Per-Module Logging

All modules emit structured log lines at [INFO], [WARN], and [ERROR] levels for operational events (state changes, completions, failures). When logging.level: "debug", additional detailed [DEBUG] [module] prefixed log lines are emitted. Each module with stateful debug (struct-based) exposes a SetDebug(bool) method, wired from main.go. Modules without a struct use package-level DebugLogger variables (e.g., system.DebugLogger).

Standard-level logging (always active):

[INFO] — Operational events: stack deploy/start/stop, backup completion, config changes, disk operations, sync results
[WARN] — Degraded states: health threshold breaches, unsafe backup destinations, retryable failures, best-effort operation failures
[ERROR] — Hard failures: data restore errors, integration apply failures, compose file update errors, disk format failures

Module	Debug Field	Prefix	Key Areas
`stacks`	`cfg.Logging.Level`	`[DEBUG] [stacks]`	Stack CRUD, compose commands, env vars, HDD mounts, encryption migration, health probes
`backup`	`ResticManager.debug`	`[DEBUG] [restic]` / `[DEBUG] [backup]`	Restic commands, snapshot operations, restore scanning, drive mounting
`cloudflare`	`Client.debug` + `GeoSyncManager.debug`	`[CF-DEBUG]` / `[DEBUG] [cloudflare]`	API requests/responses, WAF rule CRUD, zone resolution, geo sync diff
`integrations`	`Manager.debug`	`[DEBUG] [integrations]`	Toggle apply/revoke timing, lifecycle hooks, config reapply
`system`	`DebugLogger`	`[DEBUG] [system]`	Memory/disk/CPU/load/temp collection, mount probing, USB detection
`monitor`	`Pinger.debug`	`[DEBUG] [pinger]`	Health ping URLs, retry attempts, response codes
`settings`	`Settings.debug`	`[DEBUG] [settings]`	Load/save sizes, storage path ops, geo/integration state changes
`scheduler`	`Scheduler.debug`	`[DEBUG] [sched]`	Job registration, execution timing, daily schedule calculations
`web`	`cfg.Logging.Level`	`[DEBUG] [web]`	HTTP requests, auth decisions, session management, storage API ops
`api`	`Router.debug`	`[DEBUG] [api]`	API routing, handler entry points, request details
`selfupdate`	`Updater.debug`	`[DEBUG] [selfupdate]`	Version checks, update preconditions, docker pull timing
`assets`	`Syncer.debug`	`[DEBUG] [assets]`	Manifest fetch, hash comparison, file download timing
`storage`	logger-based	`[DEBUG] [storage]`	Disk scanning, formatting, attach, drive migration
`metrics`	logger-based	`[DEBUG] [metrics]`	Per-container log scanning, error/warning counts
`appexport`	`Exporter.debug`	`[DEBUG] [appexport]`	Export/import steps, crypto operations, bundle scanning

14. Geo-Restriction

Country-based access control via Cloudflare WAF Custom Rules. The controller manages WAF rules in the http_request_firewall_custom phase to block requests from non-allowed countries. Rules are identified by a [felhom-geo] description prefix — other WAF rules are never touched.

Prerequisites

The existing cf_api_token (used for DNS-01 ACME) needs Zone WAF:Edit permission added. No new token is needed — just expanded permissions on the same token. The settings UI only appears when a CF API token is configured.

Architecture

┌─────────────┐     ┌──────────────────┐     ┌──────────────────────┐
│  Settings UI │────▶│  GeoSyncManager  │────▶│  Cloudflare WAF API  │
│ (settings.   │     │  (geosync.go)    │     │  /zones/{id}/        │
│  html)       │     │  diff & apply    │     │  rulesets/{id}/rules │
└─────────────┘     └──────────────────┘     └──────────────────────┘
       │                     ▲
       │  POST /api/geo/*    │  Scheduler (6h)
       ▼                     │  + deploy/remove hooks
┌─────────────┐              │
│  API layer  │──────────────┘
│  (geo.go)   │
└─────────────┘

Rule structure:

Global rule: (not ip.src.country in {"HU"}) → block (with http.host ne exclusions for apps that have per-app overrides)
Per-app rule: (http.host eq "app.example.com" and not ip.src.country in {"HU" "US"}) → block
Block response: HTTP 403 with Hungarian message

Local network access is inherently unaffected — traffic from the LAN goes directly to the server, bypassing Cloudflare entirely.
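The rule expressions above can be produced by small string builders. The sketch below is illustrative and does not reproduce `BuildGlobalExpression` / `BuildAppExpression` from waf.go: the function names are hypothetical, and the way `http.host ne` exclusions are joined with `and` inside the global rule is an assumption, since only the per-app form is shown verbatim.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// countrySet renders codes as a Cloudflare set literal, e.g. {"HU" "US"}.
// Codes are upper-cased and sorted so the output is stable between syncs.
func countrySet(codes []string) string {
	quoted := make([]string, 0, len(codes))
	for _, c := range codes {
		quoted = append(quoted, `"`+strings.ToUpper(c)+`"`)
	}
	sort.Strings(quoted)
	return "{" + strings.Join(quoted, " ") + "}"
}

// globalExpression blocks every country outside allowed, except for hosts
// that have their own per-app rule.
func globalExpression(allowed, excludedHosts []string) string {
	var b strings.Builder
	b.WriteString("(not ip.src.country in " + countrySet(allowed))
	hosts := append([]string(nil), excludedHosts...)
	sort.Strings(hosts)
	for _, h := range hosts {
		fmt.Fprintf(&b, " and http.host ne %q", h)
	}
	b.WriteString(")")
	return b.String()
}

// appExpression blocks every country outside allowed for a single host.
func appExpression(host string, allowed []string) string {
	return fmt.Sprintf("(http.host eq %q and not ip.src.country in %s)", host, countrySet(allowed))
}

func main() {
	fmt.Println(globalExpression([]string{"HU"}, []string{"app.example.com"}))
	fmt.Println(appExpression("app.example.com", []string{"HU", "US"}))
}
```

Stable ordering matters for a diff-based sync: if the same settings always yield the same expression string, an unchanged rule is not rewritten on every sync.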

Cloudflare API Client (`internal/cloudflare/`)

File	Purpose
`client.go`	HTTP client with Bearer token auth, 15s timeout, generic `do()` helper
`zone.go`	Zone ID resolution — tries exact domain, then parent domains progressively
`waf.go`	WAF rule CRUD, expression builders (`BuildGlobalExpression`, `BuildAppExpression`)
`countries.go`	~250 ISO 3166-1 alpha-2 codes with Hungarian names
`geosync.go`	Sync orchestrator — diffs desired vs existing rules, creates/updates/deletes
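The progressive zone lookup listed for zone.go can be sketched as a candidate generator plus a lookup callback. `zoneCandidates` and `resolveZone` are hypothetical names; the callback stands in for the Cloudflare zones query, and stopping before the bare TLD is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// zoneCandidates lists the domain itself followed by each parent domain,
// stopping before the bare top-level label.
func zoneCandidates(domain string) []string {
	domain = strings.Trim(strings.ToLower(domain), ".")
	labels := strings.Split(domain, ".")
	var out []string
	for i := 0; i+1 < len(labels); i++ {
		out = append(out, strings.Join(labels[i:], "."))
	}
	return out
}

// resolveZone returns the ID of the first candidate that lookup recognises.
func resolveZone(domain string, lookup func(name string) (id string, ok bool)) (string, bool) {
	for _, name := range zoneCandidates(domain) {
		if id, ok := lookup(name); ok {
			return id, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(zoneCandidates("demo.home.example.com"))

	// Stand-in for the API call: only example.com is a registered zone.
	lookup := func(name string) (string, bool) {
		if name == "example.com" {
			return "zone-123", true
		}
		return "", false
	}
	fmt.Println(resolveZone("demo.home.example.com", lookup))
}
```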

GeoSyncManager uses a StackLister interface (implemented by geoStackAdapter in main.go) to get deployed app hostnames without circular imports.

Settings Model

Stored in settings.json (runtime-modifiable):

type GeoRestriction struct {
    Enabled          bool                      `json:"enabled"`
    AllowedCountries []string                  `json:"allowed_countries"`
    AppOverrides     map[string]AppGeoOverride `json:"app_overrides,omitempty"`
    LastSync         string                    `json:"last_sync,omitempty"`
    LastSyncError    string                    `json:"last_sync_error,omitempty"`
    ZoneID           string                    `json:"zone_id,omitempty"`
    RulesetID        string                    `json:"ruleset_id,omitempty"`
}

Thread-safe access via GetGeoRestriction(), SetGeoRestriction(), SetGeoAppOverride(), RemoveGeoAppOverride(), SetGeoSyncState().

API Endpoints

Method	Path	Description
GET	`/api/geo/status`	Current geo settings + sync state
POST	`/api/geo/settings`	Update global settings (enable/disable, countries)
POST	`/api/geo/sync`	Trigger manual sync
GET	`/api/geo/countries`	Full country list for search UI
POST	`/api/stacks/{name}/geo/override`	Set per-app country override
DELETE	`/api/stacks/{name}/geo/override`	Remove per-app override

All mutating endpoints trigger an async Cloudflare sync. The /api/geo/ path accepts both session auth and Hub Bearer token auth (via selfUpdateAuthMiddleware), enabling Hub-side geo-disable for lockout recovery.

Sync Triggers

Settings change — user saves geo settings or per-app override
Deploy/remove — app deployment or removal changes the hostname list
Scheduler — periodic verification every 6 hours
Startup — delayed initial sync 15s after boot
Manual — "Szinkronizálás" button on settings page

UI

Settings page ("Beállítások" → "Földrajzi korlátozás", i.e. Settings → Geographic restriction):

Enable/disable toggle
Searchable country autocomplete with tag-based selection
Hungary pinned with confirm() warning on removal
Per-app overrides summary with add/edit/remove
Sync status display (last sync time, errors)

App detail page (per-app override, shown when geo is globally enabled):

Toggle for custom country restriction
Independent country selector

15. App-to-App Integrations

Generic framework for connecting deployed applications to each other. Provider apps declare available integrations in .felhom.yml, and users enable/disable them via toggle switches on the provider's deploy/settings page ("Beállítások").

Architecture (`internal/integrations/`)

integrations.go — Core types: Handler interface (Apply/Revoke), ApplyContext (carries domain, decrypted env vars, provider metadata, stacks dir, logger, restart func), StatusInfo (UI data), IntegrationKey()/ParseIntegrationKey() key helpers
manager.go — Manager coordinates toggle operations, builds apply contexts from decrypted app.yaml env vars. Uses StackProvider interface (GetStack, GetStacks, RestartStack) to break circular imports with stacks package — adapted via integrationStackAdapter in main.go. Key methods:
- Toggle(ctx, provider, target, enable) — Validates both apps deployed+running, calls Apply/Revoke, persists state
- ListForProvider(slug) — Returns []StatusInfo for UI with target deployment/running status
- ReapplyConfigForTarget(name) — Re-applies all active integrations targeting a stack (config-only, no restart). Used by SyncFileBrowserMounts after config regeneration
lifecycle.go — Lifecycle hooks called from API router goroutines:
- OnStackStop — Revokes active integrations, sets "provider_stopped"/"target_unavailable" (keeps enabled=true)
- OnStackStart — Re-applies enabled integrations after 5s delay (waits for stack state refresh). Accepts both StateRunning and StateStarting via isStackUp() helper
- OnStackRemove — Revokes and permanently deletes integration state
Handler implementations — One file per integration pair (e.g. onlyoffice_filebrowser.go, onlyoffice_nextcloud.go)

Integration State

Stored in settings.json under integrations map (key: "provider:target"):

enabled — User intent (survives stop/restart)
status — Current state: "active", "error", "disabled", "provider_stopped", "target_unavailable"
last_error — Most recent error message
enabled_at — RFC3339 timestamp

CRUD methods in settings.go: GetIntegrationState, SetIntegrationState, RemoveIntegrationState, GetIntegrationsForProvider, GetIntegrationsForTarget (all use existing RWMutex + atomic write pattern).

Lifecycle

Enable: User toggles on → validates both apps deployed+running → calls Handler.Apply() → persists state as "active"
Disable: User toggles off → calls Handler.Revoke() → persists state as "disabled"
Provider/target stops: OnStackStop → calls Handler.Revoke() → sets status to "provider_stopped" or "target_unavailable" (keeps enabled=true)
Provider/target starts: OnStackStart (5s delay) → finds enabled integrations with non-active status → re-applies if both sides running/starting
Provider/target removed: OnStackRemove → revokes and deletes integration state permanently
FileBrowser config regen: SyncFileBrowserMounts regenerates config.yaml from scratch → ReapplyConfigForTarget("filebrowser") patches integration config synchronously before docker compose up -d --force-recreate

Important: SyncFileBrowserMounts uses --force-recreate because config.yaml is a bind mount — without it, docker compose up -d won't recreate the container when only the config file changes (compose only detects compose file changes). ReapplyConfigForTarget calls each handler's Apply with a no-op RestartStack since the caller handles the restart.

Built-in Handlers

OnlyOffice → FileBrowser (onlyoffice_filebrowser.go):

Apply: Reads JWT_SECRET + SUBDOMAIN from OnlyOffice app.yaml (decrypted), strips any existing integrations: block from FileBrowser config.yaml via removeIntegrationsSection(), appends new block with url (public HTTPS), internalUrl (http://onlyoffice:80), secret, viewOnly: false. Atomic write (.tmp + rename). Restarts FileBrowser
Revoke: Strips integrations: block from config.yaml, restarts FileBrowser

OnlyOffice → Nextcloud (onlyoffice_nextcloud.go):

Apply: Runs docker exec -u www-data nextcloud php occ commands:
1. app:install onlyoffice (tolerates "already installed")
2. app:enable onlyoffice
3. config:app:set onlyoffice DocumentServerUrl --value=https://{subdomain}.{domain}
4. config:app:set onlyoffice DocumentServerInternalUrl --value=http://onlyoffice:80
5. config:app:set onlyoffice jwt_secret --value={JWT_SECRET}
6. config:app:set onlyoffice StorageUrl --value=http://nextcloud (internal callback URL)
Revoke: Runs occ app:disable onlyoffice (tolerates container not running / app not enabled)

OnlyOffice compose template notes: Requires Traefik middleware X-Forwarded-Proto=https in labels so the Document Server generates HTTPS URLs for editor resources (prevents mixed content errors in browser).

Metadata (`.felhom.yml`)

Provider apps declare integrations in their .felhom.yml. Parsed into IntegrationDef struct in metadata.go, with HasIntegrations() helper.

integrations:
  - target: filebrowser
    label: "FileBrowser integráció"
    description: "Dokumentumok szerkesztése a fájlkezelőben"
  - target: nextcloud
    label: "Nextcloud integráció"
    description: "Dokumentumok szerkesztése a Nextcloudban"

API Endpoints

Method	Endpoint	Description
GET	`/api/integrations/{provider}`	List integrations for a provider app (status, target availability)
POST	`/api/integrations/{provider}/{target}`	Enable/disable integration (`{"enabled": true/false}`)

Routes registered before hasSuffix-based stack routes in router.go (see router bug pattern).

UI

Toggle switches on the provider's deploy/settings page ("Integrációk" section, within deploy.html). Data wired in deployHandler() for deployed apps only. Each integration shows:

Label and description from .felhom.yml metadata
Status badge: "Aktív" (active), "Nincs telepítve" (not installed), "Célalkalmazás leállítva" (target app stopped), "Hiba" (error)
Toggle checkbox (disabled when target not deployed/running)
JS toggleIntegration() → POST to API → reload on success

Wiring (main.go)

integrationStackAdapter type implements integrations.StackProvider (same pattern as stackAdapter, geoStackAdapter)
integrations.NewManager(sett, adapter, domain, stacksDir, encKey, logger) — registers built-in handlers
Wired into API router via SetIntegrationManager() and web server via SetIntegrationManager()

Repository Layout

controller/
├── cmd/controller/main.go           # Entry point, wires all 17 modules (setup mode branch + normal startup)
├── internal/
│   ├── config/config.go             # YAML loader, validation, env overrides
│   ├── crypto/crypto.go             # AES-256-GCM encryption for app.yaml secrets, key management
│   ├── settings/settings.go         # Runtime settings (JSON, atomic writes, RWMutex)
│   ├── stacks/
│   │   ├── manager.go               # Stack scanning, compose ops, container status
│   │   ├── metadata.go              # Parse .felhom.yml app metadata
│   │   ├── deploy.go                # First-deploy: secret gen, app.yaml, compose up; missing field injection
│   │   └── delete.go                # Stack deletion/removal + HDD/backup data cleanup
│   ├── sync/sync.go                 # Git sync: clone/pull app catalog, content-hash copy
│   ├── storage/
│   │   ├── scan.go, scan_linux.go   # Disk detection via lsblk + blkid
│   │   ├── format.go, format_linux.go  # Partition, format, mount pipeline
│   │   ├── attach.go, attach_linux.go  # Attach existing FS drive (raw mount + bind mount)
│   │   ├── safety.go, safety_linux.go  # System disk detection, mount guards, fstab ops
│   │   ├── migrate.go              # App data migration (rsync with progress)
│   │   └── *_other.go              # Non-Linux stubs for cross-compilation
│   ├── backup/
│   │   ├── backup.go               # Orchestrator (per-drive dumps + restic + cross-drive chain)
│   │   ├── paths.go                # Per-drive path helpers (FelhomDataDir constant, PrimaryResticRepoPath, AppDataDir, InfraBackupDir, etc.)
│   │   ├── local_infra.go          # Local infra backup to all drives (.felhom-infra-backup/)
│   │   ├── dbdump.go               # DB auto-discovery + dump (pg_dump, mariadb-dump)
│   │   ├── restic.go               # Restic operations (init, snapshot, prune, check) — repoPath as param
│   │   ├── appdata.go              # StackDataProvider interface, app data discovery
│   │   ├── crossdrive.go           # Per-app backup to secondary storage (rsync/restic)
│   │   ├── restore.go              # Per-app restore from per-drive repo
│   │   ├── restore_scan.go         # DR: scan drives for backup data, build restore plan
│   │   ├── restore_app_linux.go    # DR: per-app restore (rsync config/data + docker compose up)
│   │   └── restore_drives_linux.go # DR: auto-mount drives by UUID from Hub infra backup
│   ├── cloudflare/
│   │   ├── client.go               # CF API client (Bearer auth, generic JSON helper)
│   │   ├── zone.go                 # Zone ID resolution (domain → zone)
│   │   ├── waf.go                  # WAF rule CRUD + expression builders
│   │   ├── countries.go            # ISO 3166-1 country codes + Hungarian names
│   │   └── geosync.go              # Geo sync orchestrator (diff & apply rules)
│   ├── integrations/
│   │   ├── integrations.go          # Core types: Handler interface, ApplyContext, StatusInfo
│   │   ├── manager.go               # Manager: Toggle, ListForProvider, StackProvider interface
│   │   ├── lifecycle.go             # OnStackStop, OnStackStart, OnStackRemove hooks
│   │   ├── onlyoffice_filebrowser.go # OnlyOffice → FileBrowser handler (config.yaml patch)
│   │   └── onlyoffice_nextcloud.go  # OnlyOffice → Nextcloud handler (occ commands)
│   ├── assets/syncer.go             # Hub asset sync (download, SHA-256 compare, resolve)
│   ├── api/
│   │   ├── router.go               # REST API endpoints (~36 routes)
│   │   └── geo.go                  # Geo-restriction API handlers
│   ├── scheduler/scheduler.go      # Central job scheduler (Every, Daily)
│   ├── system/
│   │   ├── info.go, info_linux.go  # RAM, disk, CPU, temperature, load average
│   │   ├── cpu_linux.go            # Background /proc/stat sampling
│   │   └── mounts_linux.go         # Mount points, disk usage, FS info, backup dest checks, storage probing, USB detection
│   ├── monitor/
│   │   ├── pinger.go               # Healthchecks.io HTTP ping client
│   │   ├── healthcheck.go          # System health checks (disk, mem, CPU, temp, Docker)
│   │   └── watchdog.go             # Storage watchdog (probe, disconnect/reconnect, safe eject)
│   ├── metrics/
│   │   ├── store.go                # SQLite time-series (WAL mode, downsampled queries)
│   │   ├── collector.go            # Background collector (60s, system + docker stats)
│   │   └── sysinfo.go              # Static system info (/proc, /etc)
│   ├── selfupdate/
│   │   ├── version.go              # Semver parsing + comparison (hand-rolled)
│   │   ├── state.go                # Update audit state (JSON, atomic writes)
│   │   └── updater.go              # Registry check, update trigger, startup verify
│   ├── notify/notifier.go          # Email relay to hub, preference sync, cooldowns
│   ├── report/
│   │   ├── builder.go              # Hub report builder (all subsystems → JSON)
│   │   ├── pusher.go               # HTTP POST to hub (retry, Bearer auth, parses customer_blocked)
│   │   └── infra_pull.go           # DR: pull recovery/config from Hub (retrieval password auth)
│   ├── setup/                      # First-run setup wizard (web-based, replaces docker-setup.sh wizard)
│   │   ├── setup.go                # NeedsSetup() detection, state persistence
│   │   ├── handlers.go             # HTTP handlers for all wizard steps
│   │   ├── scanner.go              # Drive scanner for local infra backups
│   │   ├── csrf.go                 # Lightweight CSRF (cookie + hidden field)
│   │   ├── network.go              # Local IP detection for LAN access URLs
│   │   └── templates/              # 8 wizard HTML templates (Hungarian)
│   ├── recovery/info.go            # Recovery info file generator (recovery-info.txt)
│   └── web/
│       ├── server.go               # HTTP server, routing, static files, catch-all middleware, executeTemplate wrapper
│       ├── auth.go                 # Session auth + per-session CSRF token, login/logout, session cleanup
│       ├── csrf.go                 # CsrfProtect middleware, csrfToken/csrfField helpers
│       ├── handlers.go             # Page handlers (dashboard, stacks, deploy, backups, etc.)
│       ├── handler_restore.go      # DR: restore page handler + APIs (scan, restore all, skip)
│       ├── handler_debug.go        # Debug page handler + 20 debug API endpoints (debug-mode only)
│       ├── logbuffer.go            # Ring buffer (io.Writer) for in-memory log capture
│       ├── storage_handlers.go     # Storage API handlers (scan, format, attach, migrate, cleanup, disconnect/reconnect)
│       ├── alerts.go               # State-based alert generation
│       ├── funcmap.go              # Template functions (state colors, Hungarian formatting)
│       ├── embed.go                # go:embed for templates + Chart.js
│       └── templates/              # 15 HTML files + style.css (Hungarian UI, incl. debug.html, catchall.html)
├── configs/
│   ├── controller.yaml.example     # Full config reference
│   └── example-felhom-metadata.yml # .felhom.yml format reference
├── Dockerfile                      # Multi-stage: Go 1.24 builder + debian-slim runtime
├── docker-compose.yml              # Controller's own compose (privileged, /mnt rshared)
└── go.mod                          # Go 1.24, deps: bcrypt, yaml.v3, modernc.org/sqlite

Configuration

Controller config (`controller.yaml`)

Single YAML file per customer, infrastructure-only. Does not contain app-specific config.

Key sections:

customer:
  name: "Demo Felhom"
  id: "demo-felhom"

paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  system_data_path: "/mnt/sys_drive"   # NVMe/system drive — fallback for apps without HDD

git:
  repo_url: "https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git"
  sync_interval: "15m"

# Per-drive backup paths are computed automatically:
#   <drive>/backups/primary/restic/          — restic repo per drive
#   <drive>/backups/primary/<app>/db-dumps/  — DB dumps per app
#   <drive>/backups/secondary/               — cross-drive rsync + restic
backup:
  enabled: true
  restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
  db_dump_schedule: "02:30"
  restic_schedule: "03:00"
  retention: { keep_daily: 7, keep_weekly: 4, keep_monthly: 6 }

monitoring:
  health_interval: "5m"
  ping_uuids:
    heartbeat: "uuid-here"
    system_health: "uuid-here"
    db_dump: "uuid-here"
    backup: "uuid-here"
    backup_integrity: "uuid-here"

web:
  listen: ":8080"
  setup_listen: ":8081"   # Plain HTTP for setup wizard LAN access

hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "bearer-token-here"

assets:
  sync_enabled: true       # Download app assets (logos, screenshots) from Hub API
  sync_schedule: "05:00"   # Daily sync time (HH:MM, Budapest timezone)

system:
  reserved_memory_mb: 384  # RAM reserved for OS + controller

Environment variable overrides: FELHOM_LOGGING_LEVEL=debug, FELHOM_HUB_ENABLED=false, etc.

Runtime settings (`settings.json`)

Auto-managed by the controller. Contains password hash overrides, notification preferences, per-app backup configs, storage path registry, DB validation cache, Hub verification state (hub_verified, hub_verified_at), retrieval password for disaster recovery, and pending event queue. All writes are atomic (write .tmp, rename).

Per-app config (`app.yaml`)

Auto-generated during deployment. Contains env vars, locked fields list, deploy timestamp. Secret fields are locked (read-only after first deploy). Missing fields from updated templates are auto-injected on startup and after sync (see Missing Field Injection).

Encryption at rest: Sensitive env values (type: password and type: secret from .felhom.yml metadata) are stored encrypted as ENC:base64(nonce+ciphertext) using AES-256-GCM. The 32-byte encryption key is stored at {dataDir}/encryption.key (generated on first run, 0600 permissions). Values are decrypted transparently when passed to docker-compose or displayed in the UI. The key is included in infra backups (Hub + local drives) and restored during disaster recovery. On upgrade, existing plaintext values are migrated automatically on startup.
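The `ENC:base64(nonce+ciphertext)` format can be sketched with the standard library's AES-GCM. This is an illustration, not the contents of crypto.go: the function names are hypothetical, and standard (padded) base64 plus plaintext passthrough for values without the prefix are assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

const encPrefix = "ENC:"

// newGCM builds an AES-256-GCM AEAD from a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptValue returns ENC: + base64(nonce || ciphertext || tag).
func encryptValue(key []byte, plaintext string) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil) // appends to nonce
	return encPrefix + base64.StdEncoding.EncodeToString(sealed), nil
}

// decryptValue reverses encryptValue. Values without the prefix are
// returned unchanged, which covers not-yet-migrated plaintext entries.
func decryptValue(key []byte, stored string) (string, error) {
	if !strings.HasPrefix(stored, encPrefix) {
		return stored, nil
	}
	raw, err := base64.StdEncoding.DecodeString(strings.TrimPrefix(stored, encPrefix))
	if err != nil {
		return "", err
	}
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	if len(raw) < gcm.NonceSize() {
		return "", errors.New("ciphertext too short")
	}
	nonce, ciphertext := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
	plain, err := gcm.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	stored, _ := encryptValue(key, "s3cret-db-password")
	plain, err := decryptValue(key, stored)
	fmt.Println(strings.HasPrefix(stored, encPrefix), plain, err)
}
```

Because GCM authenticates the ciphertext, decrypting with the wrong key fails with an error instead of returning garbage, which is why the key file has to travel with the infra backup.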

Scheduler Jobs

Job	Type	When	Purpose
status-refresh	periodic	30s	Refresh container states
stack-scan	periodic	2m	Rescan stacks directory
heartbeat	periodic	5m	Legacy Healthchecks ping (deprecated — Hub handles via event system)
system-health	periodic	configurable	Health checks + alert refresh
backup-cache	periodic	5m	Refresh backup status cache
hub-report	periodic	15m	Push report to central hub
db-dump	daily	02:30	Database dumps
backup	daily	03:00	Restic backup → cross-drive chain
backup-integrity	daily	Sun 04:00	Restic check
metrics-prune	daily	04:00	Delete metrics older than 30 days
selfupdate-check	periodic	6h	Check registry for new version (cache for UI)
selfupdate-auto	daily	04:30	Auto-update if enabled + backup not running
asset-sync	daily	05:00	Download changed app assets from Hub

All daily jobs use the Europe/Budapest timezone. Skip-if-running prevents concurrent execution, and all jobs have panic recovery.
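Skip-if-running and panic recovery can be combined in one wrapper. The sketch below assumes the guard is per job and is not taken from scheduler.go; `job` and `trigger` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// job is a named task guarded against overlapping runs.
type job struct {
	name    string
	run     func()
	running atomic.Bool
}

// trigger runs the job unless a previous run is still in progress.
// ran is false when the run was skipped; err is non-nil when the job
// panicked. The running flag is always released, even after a panic.
func (j *job) trigger() (ran bool, err error) {
	if !j.running.CompareAndSwap(false, true) {
		return false, nil // still running: skip this tick
	}
	defer j.running.Store(false)
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("job %s panicked: %v", j.name, r)
		}
	}()
	ran = true
	j.run()
	return ran, nil
}

func main() {
	ok := &job{name: "status-refresh", run: func() { fmt.Println("refreshing") }}
	fmt.Println(ok.trigger())

	bad := &job{name: "hub-report", run: func() { panic("nil pointer") }}
	fmt.Println(bad.trigger())
	fmt.Println(bad.trigger()) // flag was released, so the job runs again
}
```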

REST API

Stack Operations

Method	Endpoint	Description
GET	`/api/health`	Health check (no auth)
GET	`/api/stacks`	List all stacks
GET	`/api/stacks/{name}`	Stack details
POST	`/api/stacks/{name}/deploy`	First-time deploy
POST	`/api/stacks/{name}/start`	Start stack (409 if insufficient memory)
POST	`/api/stacks/{name}/stop`	Stop stack
POST	`/api/stacks/{name}/restart`	Restart stack
POST	`/api/stacks/{name}/update`	Pull + recreate
POST	`/api/stacks/{name}/optional-config`	Update optional env vars
GET	`/api/stacks/{name}/logs`	Container logs (`?raw=1` for plain text)
GET	`/api/stacks/{name}/hdd-data`	HDD data paths + sizes
GET	`/api/stacks/{name}/backup-data`	Backup data paths + sizes (DB dumps, cross-drive rsync)
POST	`/api/stacks/{name}/remove`	Remove deployed stack (revert to "not deployed")
DELETE	`/api/stacks/{name}`	Delete orphaned stack
POST	`/api/sync`	Trigger catalog sync
GET	`/api/system/info`	System info + sync status

Backup & Restore

Method	Endpoint	Description
GET	`/api/backup/status`	Full backup status
POST	`/api/backup/run`	Trigger manual backup
GET	`/api/backup/snapshots`	List snapshots (`?stack={name}` for filtering)
POST	`/api/stacks/{name}/cross-backup`	Save cross-drive config
POST	`/api/stacks/{name}/cross-backup/run`	Trigger cross-drive backup
GET	`/api/stacks/{name}/cross-backup/status`	Cross-drive status
POST	`/api/backup/cross-drive/run-all`	Run all scheduled cross-drive backups

Storage

Method	Endpoint	Description
GET	`/api/storage/scan`	Scan available disks
POST	`/api/storage/init`	Format and mount a disk
GET	`/api/storage/init/status`	Format progress
POST	`/api/storage/attach/mount-raw`	Temp-mount partition for browsing
GET	`/api/storage/attach/browse?path=`	List directories on raw mount
POST	`/api/storage/attach/mkdir`	Create folder on raw mount
POST	`/api/storage/attach`	Finalize attach (bind mount + fstab)
GET	`/api/storage/attach/status`	Attach progress
POST	`/api/storage/attach/cancel`	Cleanup temp raw mount
POST	`/api/storage/migrate`	Start app data migration
GET	`/api/storage/migrate/status`	Migration progress
POST	`/api/storage/disconnect`	Safe disconnect (stop apps, unmount)
POST	`/api/storage/reconnect`	Reconnect disconnected drive
POST	`/api/storage/restart-apps`	Restart auto-stopped apps
GET	`/api/storage/status`	All storage paths with connection state

Self-Update

Method	Endpoint	Description
GET	`/api/selfupdate/status`	Update status (cached check result + last state)
POST	`/api/selfupdate/check`	Force registry check
POST	`/api/selfupdate/update`	Trigger self-update (async)

Self-update endpoints accept session auth OR Authorization: Bearer <hub_api_key> for external triggering.
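One way to express the "session auth OR Bearer key" rule is a small middleware. The sketch below is an assumption about shape only: `bearerMatches`, `requireSessionOrKey`, the `hasSession` callback, and the 401 response are all hypothetical names and choices, not the controller's real handlers.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"strings"
)

// bearerMatches reports whether an Authorization header value carries
// "Bearer <key>" with the expected key. An empty key never matches.
func bearerMatches(authHeader, key string) bool {
	const prefix = "Bearer "
	if key == "" || !strings.HasPrefix(authHeader, prefix) {
		return false
	}
	token := strings.TrimPrefix(authHeader, prefix)
	// Constant-time comparison avoids leaking how many leading bytes matched.
	return subtle.ConstantTimeCompare([]byte(token), []byte(key)) == 1
}

// requireSessionOrKey lets a request through when either check passes.
// hasSession stands in for the controller's real session lookup.
func requireSessionOrKey(hubKey string, hasSession func(*http.Request) bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if hasSession(r) || bearerMatches(r.Header.Get("Authorization"), hubKey) {
			next.ServeHTTP(w, r)
			return
		}
		http.Error(w, "unauthorized", http.StatusUnauthorized)
	})
}

func main() {
	fmt.Println(bearerMatches("Bearer abc123", "abc123"))
	fmt.Println(bearerMatches("Bearer wrong", "abc123"))
}
```

Rejecting an empty configured key keeps a node without a Hub API key from accepting `Authorization: Bearer ` with an empty token.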

Config Management

Method	Endpoint	Description
POST	`/api/config/apply`	Apply new controller.yaml from Hub (atomic write)
GET	`/api/config/hash`	Get SHA256 hash of current controller.yaml
GET	`/api/config`	Get raw controller.yaml content (text/yaml) for live diff and pull

Config endpoints accept session auth OR Authorization: Bearer <hub_api_key> (same as self-update). The /api/config/apply endpoint:

Accepts raw YAML body (the generated config from Hub)
Validates YAML is parseable before writing
Atomic write: writes to .tmp then os.Rename for crash safety
Does NOT reload config — restart required to apply changes
Returns {"ok": true, "message": "Config applied. Restart controller to apply changes."}
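The validate-then-atomic-write steps above can be sketched as follows. The names `applyConfig`, `notEmpty`, and `demo` are hypothetical, the `0600` file mode is an assumption, and `notEmpty` is only a stand-in for the real YAML parse check (which would need a YAML library).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// notEmpty is a stand-in validator; the real endpoint checks that the body parses as YAML.
func notEmpty(body []byte) error {
	if len(body) == 0 {
		return errors.New("empty config body")
	}
	return nil
}

// applyConfig validates body, writes it to path+".tmp", and renames the
// temp file over path so readers never observe a half-written config.
func applyConfig(path string, body []byte, validate func([]byte) error) error {
	if err := validate(body); err != nil {
		return fmt.Errorf("invalid config: %w", err)
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, body, 0o600); err != nil {
		return err
	}
	if err := os.Rename(tmp, path); err != nil {
		os.Remove(tmp) // best-effort cleanup of the temp file
		return err
	}
	return nil
}

// demo applies a config inside a throwaway directory and reads it back.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "cfg")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "controller.yaml")
	if err := applyConfig(path, []byte("logging:\n  level: info\n"), notEmpty); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := demo()
	fmt.Printf("%q %v\n", got, err)
}
```

The rename is only atomic when the temp file lives on the same filesystem as the target, which is why the sketch writes it next to the final path rather than into a system temp directory.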

Metrics

Method	Endpoint	Description
GET	`/api/metrics/system`	System metrics time-series (`?range=` query parameter, e.g. `?range=1h`)
GET	`/api/metrics/containers/summary`	Current container stats
GET	`/api/metrics/containers/{name}`	Per-container time-series
GET	`/api/metrics/sysinfo`	Static system info

Assets

Method	Endpoint	Description
POST	`/api/assets/sync`	Trigger on-demand asset sync from Hub (async)
GET	`/api/assets/status`	Asset sync status (last sync, file count, total bytes)

Integrations

Method	Endpoint	Description
GET	`/api/integrations/{provider}`	List integrations for provider app (status, target availability)
POST	`/api/integrations/{provider}/{target}`	Enable/disable integration (`{"enabled": true/false}`)

Debug (debug mode only)

Method	Endpoint	Description
GET	`/api/debug/dump`	Full diagnostic JSON dump (controller state, storage, stacks, backup, hub, scheduler, health, alerts). Returns 404 when `logging.level` is not `"debug"`.
GET	`/api/debug/telemetry`	Run telemetry collection on-demand; returns per-app metrics + log summary with latency. Response: `{latency_ms, app_count, total_errors, total_warnings, app_telemetry[]}`.

Response format: {"ok": true/false, "data": ..., "error": "...", "message": "..."}
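A client can decode this envelope with a single struct. The sketch below assumes `data` is kept as raw JSON because its shape varies per endpoint; the `apiResponse` type, the `decode` helper, the `omitempty` tags, and the sample error text are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResponse mirrors the documented envelope. Treating data, error, and
// message as optional (omitempty) is an assumption.
type apiResponse struct {
	OK      bool            `json:"ok"`
	Data    json.RawMessage `json:"data,omitempty"`
	Error   string          `json:"error,omitempty"`
	Message string          `json:"message,omitempty"`
}

// decode parses a raw response body and converts ok=false into a Go error.
func decode(body []byte) (apiResponse, error) {
	var resp apiResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return resp, fmt.Errorf("malformed response: %w", err)
	}
	if !resp.OK {
		return resp, fmt.Errorf("api error: %s", resp.Error)
	}
	return resp, nil
}

func main() {
	resp, err := decode([]byte(`{"ok": true, "message": "Config applied. Restart controller to apply changes."}`))
	fmt.Println(resp.Message, err)

	// The error text here is made up for illustration.
	_, err = decode([]byte(`{"ok": false, "error": "insufficient memory"}`))
	fmt.Println(err)
}
```

Keeping `data` as `json.RawMessage` lets each caller unmarshal it into the type that matches the endpoint it called.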

Build & Deploy

Build

# On build server (192.168.0.180)
cd ~/build/felhom-controller
git -C ~/git/deploy-felhom-compose pull
./build.sh v0.20.0 --push

Deploy on customer node

Option A: Self-Update API (v0.16.0+)

After building and pushing the new image, trigger the controller's self-update endpoint:

curl -s -X POST https://felhom.demo-felhom.eu/api/selfupdate/update \
  -H "Authorization: Bearer <HUB_API_KEY>"

The controller pulls the new image, updates its own compose file, and runs docker compose up -d to replace itself. The Settings page also has a "Frissítés telepítése" button for manual triggering.

Option B: Manual SSH (pre-v0.16.0 or fallback)

# On customer node (e.g., 192.168.0.162)
cd /opt/docker/felhom-controller
sudo docker pull gitea.dooplex.hu/admin/felhom-controller:<VERSION>
sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:<VERSION>|' docker-compose.yml
sudo docker compose up -d

Important: Always use docker compose up -d, NOT docker compose restart — restart doesn't pick up new images.

Docker Requirements

The controller container needs:

privileged: true (disk operations)
Docker socket mount (/var/run/docker.sock)
/mnt mount with propagation: rshared (container mounts visible to host)
/dev mounted as /host-dev (block device access)
/etc/fstab mounted as /host-fstab (persistent mount config)

See docker-compose.yml for the full volume configuration.

Roadmap

Completed

Stack management with deploy flow and memory validation
Git-based app catalog sync
Central job scheduler
System monitoring with SQLite metrics and Chart.js charts
Healthchecks.io integration (5 ping types)
3-layer backup system (DB dumps + restic + cross-drive)
Per-app backup restore with auto stop/restart
Storage management (scan, format, mount, registry)
Attach existing drive wizard (v0.15.0) — bind-mount subfolder from pre-formatted drive, directory browser
App data migration between storage paths
Storage watchdog (v0.17.0) — USB disconnect detection (~15s), auto-stop apps, auto-remount on reconnect, safe eject UI
Central hub reporting
Email notifications via hub relay
Settings persistence and password management
Dashboard alert system
Per-drive backup architecture (v0.14.0) — per-drive restic repos, per-app DB dumps, path helpers
Cross-drive restic pruning (v0.14.0)
Auto Tier 2 for small apps (v0.14.1) — auto-enable daily rsync for non-HDD apps when ≥2 drives
Infrastructure config in cross-drive backup (v0.14.1) — stacks dir + controller.yaml in _infra/ + restic
Disaster recovery (v0.15.5) — Hub-based infra backup, auto-mount by UUID, restore UI with full-page takeover
Controller self-update (v0.16.0) — Watchtower-style pull + restart, Settings page UI, API key auth, auto-update scheduling
Hub-managed config (v0.20.0) — Config apply endpoint (POST /api/config/apply), config hash in reports for sync comparison
Config content endpoint (v0.21.1) — GET /api/config returns raw YAML for Hub live diff and pull operations
First-run setup wizard (v0.22.0) — Web-based wizard replaces shell scripts, drive scan for local backups, Hub recovery, fresh install flow
Setup wizard logo fix (v0.22.2) — Use embedded SVG instead of filesystem path
Hub-managed asset sync (v0.22.3) — Download app logos/screenshots from Hub API with SHA-256 change detection, daily sync schedule

In Progress / Planned

Update classification and auto-apply (optional/required/security markers)
Docker volume backup + Tier 2 restore (v0.33.0)
Raspberry Pi testing (pi-customer-1)
CSRF protection on POST endpoints (v0.23.0)
Verbose debug logging across all modules (v0.24.0)
Diagnostic dump endpoint /api/debug/dump (v0.24.0)
Startup self-test with 9 subsystem checks (v0.24.0)
Login rate limiting

Test Environments

Node	Hardware	Domain	Status
demo-felhom	Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD	demo-felhom.eu	Active
felhotest	Proxmox VM (4-16G RAM, 8 vCPU, 200G + 100G SCSI)	router.abonet.hu:33022	Active
pi-customer-1	Raspberry Pi 3B+, 1G RAM, 32G SD	pi-customer-1.local	Not yet tested

Repository	Purpose
deploy-felhom-compose	This repo — controller + deploy scripts
app-catalog-felhom.eu	Docker Compose templates + .felhom.yml metadata
felhom.eu	Website + app assets + felhom-hub service

