# felhom-controller

**Central management container for Felhom home servers.**

A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.

**Current version: v0.32.4**

---

## Table of Contents

- [Architecture](#architecture)
- [Features](#features)
  - [App Management](#1-app-management)
  - [App Export/Import](#2-app-exportimport-fab-bundles)
  - [Backup System](#3-backup-system)
  - [Storage Management](#4-storage-management)
  - [Monitoring & Health](#5-monitoring--health)
  - [Notifications](#6-notifications)
  - [Update Management](#7-update-management)
  - [Authentication & Settings](#8-authentication--settings)
  - [Central Hub](#9-central-hub-reporting)
  - [Setup Wizard](#10-first-run-setup-wizard)
  - [Disaster Recovery](#11-disaster-recovery)
  - [Asset Sync](#12-asset-sync)
  - [Debug Mode](#13-debug-mode)
  - [Geo-Restriction](#14-geo-restriction)
  - [App-to-App Integrations](#15-app-to-app-integrations)
- [Repository Layout](#repository-layout)
- [Configuration](#configuration)
- [REST API](#rest-api)
- [Build & Deploy](#build--deploy)
- [Roadmap](#roadmap)

---

## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│  Customer Hardware (N100 mini PC / Raspberry Pi)                 │
│                                                                  │
│  ┌───────────┐   ┌────────────────────────────────────────────┐  │
│  │ Traefik   │   │  felhom-controller (privileged container)  │  │
│  │ (reverse  │──▶│                                            │  │
│  │ proxy)    │   │  ┌──────────┐  ┌─────────────────────────┐ │  │
│  └───────────┘   │  │ Web UI   │  │ Stack Manager           │ │  │
│                  │  │ (HU dash │  │ (compose ops, git sync, │ │  │
│  ┌───────────┐   │  │ board)   │  │ deploy, delete, update) │ │  │
│  │cloudflared│   │  └──────────┘  └─────────────────────────┘ │  │
│  │ (tunnel)  │   │  ┌──────────┐  ┌─────────────────────────┐ │  │
│  └───────────┘   │  │ Backup   │  │ Storage Manager         │ │  │
│                  │  │ (3-layer │  │ (disk scan, format,     │ │  │
│  ┌───────────┐   │  │ restic)  │  │ mount, migrate)         │ │  │
│  │ App       │   │  └──────────┘  └─────────────────────────┘ │  │
│  │ stacks    │   │  ┌──────────┐  ┌─────────────────────────┐ │  │
│  │ (docker   │   │  │Scheduler │  │ Monitor & Metrics       │ │  │
│  │ compose)  │   │  │(cron-like│  │ (health, SQLite         │ │  │
│  └───────────┘   │  │ jobs)    │  │ time-series, Chart.js)  │ │  │
│                  │  └──────────┘  └─────────────────────────┘ │  │
│                  │  ┌──────────┐  ┌─────────────────────────┐ │  │
│                  │  │ Notify   │  │ REST API + Hub Reporter │ │  │
│                  │  │ (events) │  │ (JSON push + events)    │ │  │
│                  │  └──────────┘  └─────────────────────────┘ │  │
│                  │  ┌──────────┐                              │  │
│                  │  │ Assets   │                              │  │
│                  │  │ (Hub     │                              │  │
│                  │  │ sync)    │                              │  │
│                  │  └──────────┘                              │  │
│                  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
  │ events + reports      │ git pull              │ asset sync
  ▼                       ▼                       ▼
  hub.felhom.eu           gitea.dooplex.hu        hub.felhom.eu
  (central dashboard)     (stack definitions)     (logos, screenshots)
```

### Key Architecture Decisions

- **Pure Go, no frameworks** — stdlib `net/http` + `html/template`. Only external deps: `bcrypt`, `yaml.v3`, `modernc.org/sqlite` (pure Go, no CGO).
- **Privileged container** — Required for disk operations (format, mount, fstab), `/dev` access, and Docker socket control.
- **`/host-dev` indirection** — Docker overrides `/dev` with a tmpfs. The host's `/dev` is mounted at `/host-dev` to access block devices.
- **`StackDataProvider` interface** — Breaks the circular import between the backup and stacks packages. Implemented by `stackAdapter` in `main.go`. Provides `GetStackHDDPath()` for per-drive backup routing.
- **Atomic file writes** — All persistent state (`settings.json`, `app.yaml`) is written to a `.tmp` file and then moved into place with `os.Rename` for crash safety.
- **`go:embed` templates** — All HTML/CSS/JS is compiled into the binary. No runtime file dependencies.
- **Europe/Budapest timezone** — All scheduled jobs, timestamps, and UI labels use the Hungarian time zone.

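The atomic-write pattern mentioned above can be sketched roughly as follows. This is an illustration, not the controller's code: `writeFileAtomic` is a hypothetical name, and the sketch uses a randomly named temporary file in the target directory where the project is described as using a `.tmp` suffix.

```go
package main

import (
	"os"
	"path/filepath"
)

// writeFileAtomic (hypothetical helper) writes data to a temporary file in
// the same directory as path and then renames it over path, so a crash
// never leaves a half-written file at the final location.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // fails harmlessly once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush contents before the rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmpName, perm); err != nil {
		return err
	}
	return os.Rename(tmpName, path) // atomic on the same filesystem
}
```

The temporary file is created next to the target because `os.Rename` is only atomic when source and destination are on the same filesystem.
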
### Module Map

| Module | Path | Responsibility |
|--------|------|----------------|
| **Config** | `internal/config/` | YAML loader, validation, `FELHOM_*` env overrides |
| **Settings** | `internal/settings/` | Runtime-mutable `settings.json` (passwords, backup prefs, storage paths, notifications) |
| **Stacks** | `internal/stacks/` | Compose operations, scanning, `.felhom.yml` metadata, deploy/delete flow |
| **Crypto** | `internal/crypto/` | AES-256-GCM encryption for sensitive `app.yaml` values (passwords, secrets), key management |
| **Sync** | `internal/sync/` | Git-based app catalog sync (clone/pull, content-hash copy) |
| **Backup** | `internal/backup/` | Per-drive 3-layer backup: DB dumps → restic snapshots → cross-drive copies, restore |
| **Storage** | `internal/storage/` | Disk scanning (`lsblk`), partitioning (`sfdisk`), formatting (`mkfs.ext4`), mounting, data migration (`rsync`) |
| **System** | `internal/system/` | System info (`/proc`), CPU collector, mount points, disk usage, FS info |
| **Monitor** | `internal/monitor/` | System health checks, storage watchdog, legacy Healthchecks pinger (deprecated) |
| **Metrics** | `internal/metrics/` | SQLite time-series store, system + container metric collection |
| **Scheduler** | `internal/scheduler/` | Central job scheduler (periodic + daily, skip-if-running, panic recovery) |
| **SelfUpdate** | `internal/selfupdate/` | Version checking (registry), update trigger, state persistence, startup verification |
| **Notify** | `internal/notify/` | Email notifications via hub relay, preference sync, per-event cooldowns |
| **Report** | `internal/report/` | Hub report builder + HTTP pusher (system, stacks, backup, health) |
| **Assets** | `internal/assets/` | Hub-managed asset syncer: downloads logos/screenshots with SHA-256 change detection |
| **SelfTest** | `internal/selftest/` | Startup self-test: 9 diagnostic checks (Docker, dirs, storage, hub, restic, metrics) |
| **Util** | `internal/util/` | Shared utilities: `TruncateStr` for debug log output truncation |
| **AppExport** | `internal/appexport/` | Per-app export/import via `.fab` bundles (config + DB + user data), optional AES-256 encryption |
| **API** | `internal/api/` | REST JSON endpoints, diagnostic dump (`/api/debug/dump`) |
| **Web** | `internal/web/` | Hungarian dashboard, auth, page handlers, template functions, alerts |

---

## Features

### 1. App Management

The controller manages Docker Compose stacks through a complete lifecycle: catalog sync, first-time deployment, runtime operations, and deletion.

#### Git Sync (`internal/sync/`)

The app catalog lives in a separate Git repository. The controller:
- Shallow-clones the catalog on startup
- Periodically fetches updates (configurable, default 15 min)
- Copies only `docker-compose.yml` and `.felhom.yml` to the stacks directory
- **Never overwrites** `app.yaml` (user secrets are safe)
- Uses SHA-256 content hashing — only writes files that actually changed
- Triggers stack rescan after sync so the dashboard updates immediately
- **Post-sync hook**: auto-injects missing deploy fields (new secrets, domains) into existing `app.yaml` for stacks whose templates were updated (see Missing Field Injection below)
- Manual sync via "Sablonok frissitese" button or `POST /api/sync`

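The content-hash copy step could look roughly like the sketch below. `syncFile` is a hypothetical name and the `0o644` file mode is an assumption; the only behaviour taken from the description above is "compare SHA-256 digests and write only when they differ".

```go
package main

import (
	"crypto/sha256"
	"errors"
	"io/fs"
	"os"
)

// syncFile (hypothetical helper) writes content to dst only when its
// SHA-256 digest differs from the file already on disk. It reports
// whether a write actually happened.
func syncFile(dst string, content []byte) (bool, error) {
	existing, err := os.ReadFile(dst)
	switch {
	case err == nil:
		if sha256.Sum256(existing) == sha256.Sum256(content) {
			return false, nil // unchanged: leave the file untouched
		}
	case !errors.Is(err, fs.ErrNotExist):
		return false, err // unreadable for a reason other than "missing"
	}
	if err := os.WriteFile(dst, content, 0o644); err != nil {
		return false, err
	}
	return true, nil
}
```

Skipping identical files keeps modification times stable, which in turn makes it easy to tell which stacks had their templates updated by a sync.
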
#### First-Time Deploy Flow

1. Customer sees app card with "Telepites" button
2. Deploy page pre-generates and **displays** all auto-values before the user clicks deploy:
   - `domain` fields: shown as readonly text input with the customer's configured base domain
   - `subdomain` fields: editable text input pre-filled with the default from `.felhom.yml`, shown with `.base-domain` suffix. Validated for DNS-safe format, reserved names, and uniqueness across deployed stacks. Locked after deploy — changing requires Remove + Redeploy
   - `secret` fields: pre-generated and shown as masked password inputs with a "Megjelenítés" reveal button — user can see/copy all DB passwords and keys before deploying
   - User-configurable inputs (admin password, language, storage path) remain editable
   - Section header prompts the user to note down any passwords they need
3. `checkBeforeDeploy()` JS guard fetches live state first (prevents double-deploy from another tab)
4. **Memory validation** uses real system memory from `/proc/meminfo`:
   - `usable_memory = total_ram - reserved_memory_mb` (default 384MB reserved)
   - `system.GetMemoryMB()` returns real-time total and used memory (not declared reservations)
   - Hard block if `used_mb + new_request > usable_memory`
   - `CommittedMemory()` (declared sum) still used for soft overcommit warning only
   - Deploy page shows real memory usage bar (not declared reservations)
5. Pre-generated secret values are submitted as hidden form inputs so the **same values** the user saw are saved to `app.yaml` (no silent re-generation on submit). Controller saves `app.yaml`, sets in-memory `Deployed` + `Deploying` flags, then runs `docker compose up -d` **asynchronously** in a goroutine — API returns immediately so the UI switches to the progress panel without waiting for image pulls. On failure the goroutine reverts both disk and in-memory state and sets `DeployError`.
6. 3-step progress panel polls `GET /api/stacks/{name}` every 3s: config saved → `deploying` (pulling images) → containers starting → health check passed. The `StateDeploying` state is shown while compose-up is in progress (no containers yet).
7. Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only; the "Automatikusan generált értékek" section continues to show the saved values on the settings page
8. The deploy/settings page includes **start/stop/restart** buttons for deployed apps, plus a "Megnyitás ↗" link to the app's subdomain URL (only visible when running)

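The hard memory block from step 4 reduces to a small comparison. The sketch below is an assumption-laden illustration: `checkDeployMemory` and its parameter names are hypothetical, and only the inequality `used_mb + new_request > usable_memory` comes from the text above.

```go
package main

import "fmt"

// checkDeployMemory (hypothetical helper) applies the hard block:
// usable = total - reserved, and a deploy is rejected when the memory
// already in use plus the new app's request exceeds the usable amount.
func checkDeployMemory(totalMB, usedMB, reservedMB, requestMB int) error {
	usable := totalMB - reservedMB
	if usedMB+requestMB > usable {
		return fmt.Errorf("insufficient memory: %d MB used + %d MB requested > %d MB usable",
			usedMB, requestMB, usable)
	}
	return nil
}
```

For example, on an 8192 MB machine with the default 384 MB reserved, 7808 MB is usable, so an app requesting 512 MB is accepted while 7000 MB is in use, and one requesting 1024 MB is rejected.
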
#### Catch-All Page for Stopped Apps

When a user visits a stopped or undeployed app's subdomain (e.g., `travel.demo-felhom.eu`), the controller serves a branded error page instead of Traefik's raw 404:

- **Traefik catch-all router**: The controller's `docker-compose.yml` registers a second router (`catchall`) with `priority=1` (lowest) and `HostRegexp(.+)`. Running apps always win; only requests with no matching container reach the controller.
- **`CatchAllMiddleware`** in `server.go` intercepts requests where `Host` ≠ `felhom.DOMAIN`, serves the catch-all page **without auth** (user has no session on the app subdomain).
- **`findStackBySubdomain()`** identifies the app by matching the subdomain against deployed `app.yaml` `SUBDOMAIN` env or metadata fallback.
- **`catchall.html`** — standalone template (no layout, inline CSS) showing the app name, status ("leállítva" / "nincs telepítve" / "nem található"), and links to the controller dashboard or the app's detail page.
- **Subdomain links** on the Alkalmazások page are only shown for deployed apps (non-deployed apps have no guaranteed subdomain yet).

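A host-based middleware of this kind might be structured as in the following sketch. The function names and signatures are illustrative assumptions and do not reproduce `server.go`; the sketch only shows the routing decision (foreign `Host` goes to the catch-all handler, ahead of any auth check).

```go
package main

import (
	"net"
	"net/http"
	"strings"
)

// isControllerHost reports whether the request Host header addresses the
// controller itself. A port suffix is stripped and case is ignored.
func isControllerHost(host, controllerHost string) bool {
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h
	}
	return strings.EqualFold(host, controllerHost)
}

// catchAllMiddleware (hypothetical signature) sends requests for any other
// host to catchAll and passes controller requests on to next.
func catchAllMiddleware(controllerHost string, catchAll, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !isControllerHost(r.Host, controllerHost) {
			catchAll.ServeHTTP(w, r) // served before any auth check
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

Placing this wrapper outside the authentication layer is what lets the branded page render for visitors who have no session on the app subdomain.
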
#### Dashboard "Megnyitás" Button

Running apps on the Vezérlőpult show a "Megnyitás ↗" button that opens the app's subdomain in a new tab. The `Subdomains` map is built in `dashboardHandler` from the `app.yaml` env or the metadata fallback.

#### App Info Pages

Each app can define rich metadata in `.felhom.yml`:
- `app_info`: tagline, use_cases, first_steps, prerequisites, default_creds, docs_url
- `optional_config`: groups of post-deploy configurable env vars (e.g., API keys for metadata providers)
- `resources`: mem_request, mem_limit, pi_compatible, needs_hdd, hungarian_ui

The `/apps/{slug}` page renders a hero section, screenshots, a setup guide, and the optional config form.

#### Stack Operations

| Operation | What it does |
|-----------|-------------|
| Start | `docker compose up -d` — pre-start memory check rejects with 409 if insufficient RAM |
| Stop | `docker compose stop` (blocked for protected stacks) |
| Restart | `docker compose restart` |
| Update | `docker compose pull` + `docker compose up -d` |
| Remove | `docker compose down --volumes` + remove `app.yaml` + optional HDD/backup cleanup; template preserved for redeploy |
| Delete | `docker compose down --rmi local --volumes` + optional HDD data cleanup (orphaned stacks only) |

**Remove vs Delete**: "Eltávolítás" (Remove) is for deployed catalog stacks — it reverts the stack to "Nincs telepítve" state while keeping the template for easy redeployment. "Törlés" (Delete) is for orphaned stacks — it removes the entire stack directory including templates. Both require stopping the stack first.

**Remove modal** shows three sections: (1) always-removed items (Docker volumes, app.yaml, cross-drive schedule), (2) optional HDD data deletion with reimport warning, (3) optional backup data deletion (DB dumps + cross-drive rsync) with restic retention note.

**Protected stacks** (traefik, cloudflared, felhom-controller) cannot be stopped, removed, or deleted from the UI. Restart is allowed.

**Orphan detection**: Deployed stacks with no matching catalog template are marked as orphaned with an "Elavult" badge and can be safely deleted.

#### Missing Field Injection (`deploy.go`)

When app templates are updated (e.g., a new `APP_KEY` secret is added to `.felhom.yml`), existing deployed apps need the new field in their `app.yaml`. The controller handles this automatically:

- **On startup**: `InjectMissingFields()` runs for all deployed stacks
- **After sync**: the post-sync hook runs for stacks whose templates were updated
- For each deployed stack, compares `.felhom.yml` `deploy_fields` against `app.yaml` env vars
- Missing `secret` fields: auto-generated using the field's generator spec (`password:N`, `hex:N`, `base64key:N`)
- Missing `domain` fields: filled with the customer's configured domain
- Missing `subdomain` fields: filled with the field's default value or the `.felhom.yml` `subdomain:` metadata
- Other field types (e.g., `text`, `select`): logged as warning for manual configuration
- Locked fields are added to the locked list automatically

**Generator types**: `password:N` (alphanumeric), `hex:N` (hex-encoded random bytes), `base64key:N` (`base64:` + N random bytes base64-encoded, for Laravel APP_KEY etc.), `static:VALUE` (literal value).

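A generator-spec evaluator for the four types above might look like this sketch. It is not the code in `deploy.go`: the function name `generate` is hypothetical, and it assumes that `N` counts characters for `password` and random bytes for `hex` and `base64key`.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"math/big"
	"strconv"
	"strings"
)

const alphanumeric = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

// generate (hypothetical helper) evaluates a spec such as "password:24",
// "hex:16", "base64key:32" or "static:VALUE".
func generate(spec string) (string, error) {
	kind, arg, ok := strings.Cut(spec, ":")
	if !ok {
		return "", fmt.Errorf("malformed generator spec %q", spec)
	}
	if kind == "static" {
		return arg, nil // literal value, may itself contain ':'
	}
	n, err := strconv.Atoi(arg)
	if err != nil || n <= 0 {
		return "", fmt.Errorf("invalid length in %q", spec)
	}
	switch kind {
	case "password":
		out := make([]byte, n)
		for i := range out {
			// rand.Int avoids the modulo bias of reducing raw random bytes.
			idx, err := rand.Int(rand.Reader, big.NewInt(int64(len(alphanumeric))))
			if err != nil {
				return "", err
			}
			out[i] = alphanumeric[idx.Int64()]
		}
		return string(out), nil
	case "hex", "base64key":
		buf := make([]byte, n)
		if _, err := rand.Read(buf); err != nil {
			return "", err
		}
		if kind == "hex" {
			return hex.EncodeToString(buf), nil
		}
		return "base64:" + base64.StdEncoding.EncodeToString(buf), nil
	}
	return "", fmt.Errorf("unknown generator %q", kind)
}
```

Under these assumptions `hex:8` yields 16 hex characters and `base64key:32` yields the `base64:` prefix followed by 44 base64 characters.
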
#### Container State Display

| State | Color | Label | Meaning |
|-------|-------|-------|---------|
| Running + healthy | Green | "Fut" | All containers running and healthy |
| Running + starting | Orange | "Indulas..." | Healthcheck not yet passed |
| Deploying | Orange | "Telepítés..." | Compose up in progress (image pull, container creation) |
| Running + unhealthy | Yellow | "Nem egeszseges" | Docker or controller-side healthcheck failing |
| Stopped/exited | Red | "Leallitva" | All containers stopped |
| Restarting | Yellow | "Ujrainditas..." | Restart loop |
| Not deployed | Gray | "Nincs telepitve" | Compose file exists, not deployed |

#### Controller-side Health Probes (`internal/stacks/healthprobe.go`)

For apps that declare a `healthcheck:` section in `.felhom.yml`, the controller probes the container directly over the Docker network (both are on `traefik-public`). This complements Docker-level healthchecks and is the **only** health mechanism for distroless/scratch images that lack shell utilities.

Three probe types are supported:
- **`http`** — Any HTTP response (even 4xx/5xx) = service is alive. Only connection refused/timeout = unhealthy.
- **`api`** — HTTP request with response validation (expected status code, body content). Fails if expectations aren't met.
- **`tcp`** — Simple port reachability check via `net.Dial`.

Multiple checks per app are supported (all must pass). The probe scheduler runs every 10 seconds; per-app intervals default to 5 minutes and are configurable via `healthcheck.interval` in `.felhom.yml`. Probe results are stored in `Stack.HealthProbe` and exposed via the API. Failed probes override the stack state to `StateUnhealthy`; the override clears automatically when the next probe passes.

**Fast initial probing:** On start/restart, stale health probe results are cleared (so the stack doesn't immediately appear "unhealthy" from a previous result). Until the first healthy probe, the controller checks every 10 seconds instead of the normal 5-minute interval, giving fast feedback on whether the app came up successfully.

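The `tcp` and `http` probe semantics can be illustrated with the sketch below. The function names and the 3-second timeout are assumptions, and the sketch uses `net.DialTimeout` so that a hung connection cannot stall the probe; it is not a copy of `healthprobe.go`.

```go
package main

import (
	"net"
	"net/http"
	"time"
)

const probeTimeout = 3 * time.Second // assumed value, not from the project

// probeTCP succeeds when a TCP connection to addr can be established.
func probeTCP(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, probeTimeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// probeHTTP treats any HTTP response, including 4xx/5xx, as "alive".
// Only transport errors (refused connection, timeout) count as failure.
func probeHTTP(url string) error {
	client := &http.Client{
		Timeout: probeTimeout,
		// Do not follow redirects: a 3xx answer already proves liveness.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// probeAll runs every check and fails on the first error, matching the
// "all must pass" rule for apps with multiple checks.
func probeAll(checks ...func() error) error {
	for _, check := range checks {
		if err := check(); err != nil {
			return err
		}
	}
	return nil
}
```

A caller would typically wrap each configured check in a closure, for example `probeAll(func() error { return probeTCP("app:5432") })`, where the address is whatever the app's `healthcheck:` section declares.
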
---

### 2. App Export/Import (.fab bundles)

Per-app export creates a self-contained `.fab` file (tar.gz, optionally encrypted) that can be stored externally or used to restore the app on the same server. Distinct from the automatic backup system — user-initiated, per-app, produces a single portable file.

**Bundle contents:** `manifest.json` + `config/` (compose, .felhom.yml, app.yaml with plaintext secrets) + `database/` (gzipped SQL dump) + `data/` (HDD bind mount tars or Docker named volume tars).

**Encryption:** Optional AES-256-CTR + HMAC-SHA256 with scrypt key derivation (N=32768). Format: `"FABE"` magic header + salt + IV + encrypted tar.gz + HMAC tag. Streaming for multi-GB files.

**Export flow:** Estimate size → check free space → optionally stop app → copy config → dump DB → tar user data → create tar.gz → optionally encrypt → atomic rename. App restarts automatically after export if it was stopped.

**Import flow:** Decrypt if needed → extract → prepare stack dir (create new or `compose down --volumes` for existing) → restore config (re-encrypt app.yaml with current server key) → restore user data (HDD or volumes) → restore DB (start DB service, wait for ready, import dump) → start full stack → refresh UI.

**Architecture:** `internal/appexport/` package with `ExportStackProvider` adapter interface (same pattern as `backup.StackDataProvider`). `exportAdapter` in `main.go` bridges `stacks.Manager` to the provider.

**API endpoints:** `/api/export/estimate`, `/api/export/start`, `/api/export/status`, `/api/export/bundles`, `/api/export/manifest`, `/api/export/import`, `/api/export/import/status`.

**UI:** Export button on app info page, standalone import page at `/import` accessible from the stacks page header.

---

### 3. Backup System

The backup system implements a **3-2-1 backup architecture**. Each tier is a **complete,
self-sufficient backup** — any single tier can fully restore an app.

| Tier | Contents | Location | Can fully restore? |
|------|----------|----------|--------------------|
| **1. Nightly restic** | DB + Config + User data | Same drive as app | Yes (but not after a drive failure) |
| **2. Cross-drive** | DB + Config + User data | Different physical device | Yes |
| **3. Remote** | Everything | Cloud / remote server | Future |

**Key principles:**
- User data backup is **mandatory** — every app with HDD bind mounts is included
  automatically. There is no per-app toggle.
- Each tier includes **everything** needed to restore: DB dumps, config, and user data.
  No tier depends on another tier's data.
- **Tier 2 is configurable for ALL apps** — not just apps with HDD data. Non-HDD apps
  back up config + DB dumps to the secondary drive (small, but it protects against drive failure).
- The `AppBackupPrefs.Enabled` field in settings.json is legacy and not read by any code.

**Per-app Tier 2 contents by app type:**

| App type | Tier 2 contents | Example |
|----------|----------------|---------|
| HDD + DB | Config + DB + User data | Immich, Paperless-ngx |
| HDD, no DB | Config + User data | — |
| DB, no HDD | Config + DB | Mealie, Vikunja |
| Config only | Config | Gokapi, Homepage |

#### Tier 1: Nightly Backup (mandatory, same drive)

The nightly backup has two phases that run sequentially. All paths are **per-drive** — each physical drive gets its own restic repo and per-app DB dump directories.

**Drive layout (v0.26.0):**
```
<drive>/
├── felhom-data/                  ← all controller-managed data (namespace, v0.26.0+)
│   ├── appdata/<app>/            ← app user data
│   └── backups/
│       ├── primary/
│       │   ├── restic/           ← one restic repo per drive (all apps on this drive)
│       │   └── <app>/db-dumps/   ← per-app DB dump files
│       └── secondary/
│           ├── restic/           ← secondary restic repo (cross-drive)
│           ├── _infra/           ← infra config mirror
│           └── <app>/rsync/      ← per-app rsync data
├── .felhom-infra-backup/         ← DR marker (stays at drive root for scanner)
├── Dokumentumok/                 ← user files (not controller-managed)
└── media/                        ← user files (not controller-managed)
```

> **Note:** `HDD_PATH` env var in `app.yaml` is still the mount point (e.g., `/mnt/hdd_1`). The `felhom-data` segment is embedded in path helpers — not in `HDD_PATH`.
> Pre-v0.26.0 installations use `<drive>/appdata/` and `<drive>/backups/` directly (no `felhom-data/` namespace).

Path computation is centralized in `backup/paths.go` via the `FelhomDataDir = "felhom-data"` constant:
- `PrimaryResticRepoPath(drivePath)` → `<drive>/felhom-data/backups/primary/restic/`
- `AppDBDumpPath(drivePath, stackName)` → `<drive>/felhom-data/backups/primary/<stack>/db-dumps/`
- `AppDataDir(drivePath, stackName)` → `<drive>/felhom-data/appdata/<stack>/`
- `SecondaryResticRepoPath(drivePath)` → `<drive>/felhom-data/backups/secondary/restic/`
- `AppSecondaryRsyncPath(drivePath, stackName)` → `<drive>/felhom-data/backups/secondary/<stack>/rsync/`
- `SecondaryInfraPath(drivePath)` → `<drive>/felhom-data/backups/secondary/_infra/`
- `InfraBackupDir(mountPath)` → `<drive>/.felhom-infra-backup/` (**unchanged** — stays at drive root for DR scanner)

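A few of these helpers could be written as below. The function names, the constant, and the resulting paths come from the list above; the bodies are an assumed implementation using `filepath.Join` (which drops the trailing slash) and are not taken from `backup/paths.go`.

```go
package main

import "path/filepath"

// FelhomDataDir is the namespace directory for controller-managed data.
const FelhomDataDir = "felhom-data"

// PrimaryResticRepoPath returns the per-drive primary restic repo.
func PrimaryResticRepoPath(drivePath string) string {
	return filepath.Join(drivePath, FelhomDataDir, "backups", "primary", "restic")
}

// AppDBDumpPath returns the per-app DB dump directory on the app's drive.
func AppDBDumpPath(drivePath, stackName string) string {
	return filepath.Join(drivePath, FelhomDataDir, "backups", "primary", stackName, "db-dumps")
}

// AppDataDir returns the directory holding the app's user data.
func AppDataDir(drivePath, stackName string) string {
	return filepath.Join(drivePath, FelhomDataDir, "appdata", stackName)
}

// AppSecondaryRsyncPath returns the per-app rsync mirror on a secondary drive.
func AppSecondaryRsyncPath(drivePath, stackName string) string {
	return filepath.Join(drivePath, FelhomDataDir, "backups", "secondary", stackName, "rsync")
}

// InfraBackupDir stays at the drive root so the DR scanner can find it.
func InfraBackupDir(mountPath string) string {
	return filepath.Join(mountPath, ".felhom-infra-backup")
}
```

Keeping the `felhom-data` segment inside the helpers is what allows `HDD_PATH` to remain the plain mount point.
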
**Phase 1 — Database Dumps** (`internal/backup/dbdump.go`, scheduled 02:30)

- **Auto-discovery** of PostgreSQL and MariaDB containers via `docker ps` + `docker inspect`
- Dumps via `docker exec pg_dump` / `docker exec mariadb-dump` with 5-minute timeout
- Dumps are written to the app's **home drive**: `AppDBDumpPath(appDrive, stackName)`
- Atomic writes (`.tmp` → `.sql`) to prevent corruption
- **Validation** after each dump: checks file size, header presence, counts `CREATE TABLE`
- Results are cached in `settings.json`, so they survive container restarts

**Phase 2 — Restic Snapshot** (`internal/backup/restic.go`, scheduled 03:00)

- Apps are **grouped by drive** via `groupStacksByDrive()` — each drive's apps are backed up to that drive's restic repo
- App drive resolution: `GetStackHDDPath()` (from `StackDataProvider`) → falls back to `SystemDataPath`
- Auto-generated repository password (32 random bytes, base64url), shared across all repos, synced to hub
- **Paths included in every per-drive snapshot:**
  - Per-app DB dump dirs on that drive
  - Per-app HDD mount paths (user data)
  - Stacks dir (compose.yml + app.yaml + .felhom.yml for all apps)
  - `controller.yaml` (controller config)
- Auto-detects and removes stale restic repository locks
- Weekly prune on Sundays with configurable retention (keep-daily, keep-weekly, keep-monthly)
- Weekly integrity check (`restic check`) on Sunday 04:00 — checks **all** primary repos

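The per-drive grouping in Phase 2 can be pictured with the sketch below. The function name matches the text, but the signature is an assumption: `hddPath` stands in for `GetStackHDDPath()` and is assumed to return an empty string for apps without an HDD path, in which case the system data path is used.

```go
package main

import "sort"

// groupStacksByDrive buckets stack names by their home drive. Apps for
// which hddPath returns "" fall back to systemDataPath. The signature is
// illustrative, not the project's actual one.
func groupStacksByDrive(stacks []string, hddPath func(string) string, systemDataPath string) map[string][]string {
	groups := make(map[string][]string)
	for _, name := range stacks {
		drive := hddPath(name)
		if drive == "" {
			drive = systemDataPath
		}
		groups[drive] = append(groups[drive], name)
	}
	for _, names := range groups {
		sort.Strings(names) // stable order for logs and snapshot tags
	}
	return groups
}
```

Each key of the returned map then corresponds to one restic repository, so a single `restic backup` run per drive covers all apps that live on it.
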
**Protects against:** accidental deletion, data corruption, point-in-time rollback.
Does NOT protect against drive failure (backup is on the same physical drive).

#### Tier 2: Cross-Drive Backup (opt-in, different device) (`internal/backup/crossdrive.go`)

**Complete backup** to a different physical drive. Available for **all apps** — apps with HDD
data back up config + DB + user data; apps without HDD back up config + DB dumps only.

- **Auto-enable for small apps (v0.14.1):** Apps without HDD mounts (config-only, DB-only) are
  automatically configured for daily rsync Tier 2 when ≥2 storage paths are registered.
  `AutoEnableSmallApps()` runs at the start of each nightly backup cycle. Never overwrites
  existing user-configured cross-drive settings (even disabled ones).
- **Infrastructure config backup (v0.14.1):** `syncInfraConfig()` rsyncs the stacks directory
  and `controller.yaml` to `<dest>/backups/secondary/_infra/` on every secondary destination
  drive. Runs before per-app backups. Cross-drive restic also includes infra paths.
- **Two methods:**
  - **rsync** — Simple mirror with `--delete` (fast, no versioning, **browsable** on disk)
  - **restic** — Versioned, deduplicated, encrypted (shared repo across apps, not browsable)
- Per-app configuration in settings.json: destination path, method, schedule (daily/weekly/manual)
- **Pre-backup DB dump:** `DumpStackDB()` runs fresh pg_dump/mariadb-dump before each cross-drive backup; non-fatal on failure (wired via `DBDumper` interface to avoid circular imports)
- **Empty mounts allowed:** `RunAppBackup` accepts apps with no HDD mounts — the rsync
  mount loop simply doesn't execute, but DB + config copy still runs
- **Drive-type-aware validation** (`ValidateDestination`):

  | Destination type | Space checks |
  |-----------------|--------------|
  | External mount (different device than `/`) | Block if <100 MB free |
  | System drive (same device as `/`) | Require ≥10 GB free AND <90% used; a warning is logged |

- **Secondary drive layout (v0.14.1):**
  ```
  <dest-drive>/backups/secondary/
  ├── _infra/                 ← infrastructure config mirror (v0.14.1)
  │   ├── controller.yaml
  │   └── stacks/             ← full stacks dir (all app configs)
  ├── <app>/rsync/            ← per-app rsync mirror
  │   ├── _db/                ← DB dump files
  │   ├── _config/            ← compose.yml, app.yaml, .felhom.yml
  │   └── <user data>         ← HDD mount contents (if app has HDD data)
  └── restic/                 ← shared restic repo (all cross-drive apps)
  ```
  - DB dump files read from **per-app home drive** path (`AppDBDumpPath`)
  - `_` prefix directories prevent collision with user data
  - For non-HDD apps, only `_db/` and `_config/` are present (no user data directory)
- **Restic backup paths:** includes HDD mounts (if any) + config dir + per-app DB dump dir from home drive + stacks dir + controller.yaml (infra, v0.14.1)
- Safety guards: destination ≠ source, path-overlap check (HDD mounts only), writable check
- **Chained execution:** runs immediately after nightly restic — daily apps every night, weekly apps on Sundays
- **Hub reporting after manual triggers (v0.27.2):** `OnCrossDriveComplete` callback on Router pushes infra backup snapshot to Hub + writes local infra backup after both single-app and run-all manual triggers complete (previously only automatic scheduled runs reported)
- Per-app concurrency lock prevents overlapping runs
- Status (last_run, duration, size, error) persisted to settings.json

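The drive-type-aware space rules of `ValidateDestination` can be expressed as a pure function, sketched below. The function and constant names are hypothetical, and the sketch assumes binary units (MiB/GiB) for the "100 MB" and "10 GB" thresholds, which the text does not specify.

```go
package main

import "fmt"

const (
	minFreeExternal = 100 << 20 // 100 MiB (assumed unit)
	minFreeSystem   = 10 << 30  // 10 GiB (assumed unit)
	maxUsedSystem   = 0.90
)

// checkDestinationSpace (hypothetical helper) applies the space rules for
// a cross-drive destination, depending on whether it is on the system drive.
func checkDestinationSpace(onSystemDrive bool, freeBytes, totalBytes uint64) error {
	if !onSystemDrive {
		if freeBytes < minFreeExternal {
			return fmt.Errorf("external destination has only %d bytes free", freeBytes)
		}
		return nil
	}
	if totalBytes == 0 {
		return fmt.Errorf("system drive reports zero capacity")
	}
	used := 1 - float64(freeBytes)/float64(totalBytes)
	if freeBytes < minFreeSystem || used >= maxUsedSystem {
		return fmt.Errorf("system drive too full: %d bytes free, %.0f%% used", freeBytes, used*100)
	}
	return nil
}
```

The stricter system-drive rule reflects that filling the root filesystem would endanger the controller and every app, not just the backup.
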
**Protects against:** primary drive failure, drive theft/damage.

#### Tier 3: Remote Backup (future)

Complete offsite backup for disaster recovery. Not yet implemented.
Placeholder shown in UI ("3. mentés — Hamarosan").

#### Restore (`internal/backup/restore.go`)

All deployed apps appear in the restore dropdown — every app has restic snapshot data
(stacks dir + DB dumps are always backed up).

| App type | Config restored | DB restored | User data restored |
|----------|----------------|------------|-------------------|
| Has HDD data | Yes | Yes | Yes (always — backup is mandatory) |
| DB only, no HDD | Yes | Yes | n/a |
| No DB, no HDD | Yes | — | n/a |

- **Snapshot API** returns ALL snapshots unfiltered — older snapshots still allow config+DB restore; `RestoreApp` extracts whatever paths are available
- **Restore type info** shown per-app when selected in dropdown (Hungarian banners):
  - Has HDD: "Teljes visszaállitas: adatbazis + konfiguracio + felhasznaloi adatok"
  - Has DB, no HDD: "Adatbazis es konfiguracio visszaallitasa"
  - No DB, no HDD: "Csak konfiguracio visszaallitasa"
- **Execution flow:** stop app → resolve app's home drive → `restic restore <id> --target / --include <path>...` from per-drive repo → restart app
- Restic repo resolved via `PrimaryResticRepoPath(appDrivePath)`
- DB dumps restored from `AppDBDumpPath(appDrivePath, stackName)`
- Running flag prevents concurrent backup/restore operations
- Snapshot ID validated (8-64 lowercase hex)

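The snapshot ID rule (8 to 64 lowercase hex characters) is easy to express as a regular expression. The names below are hypothetical; the point of the sketch is that validating the ID before it reaches the `restic restore` command line rules out aliases and injected arguments.

```go
package main

import "regexp"

// snapshotIDPattern accepts 8-64 lowercase hexadecimal characters,
// covering both short and full restic snapshot IDs.
var snapshotIDPattern = regexp.MustCompile(`^[0-9a-f]{8,64}$`)

// validSnapshotID (hypothetical helper) reports whether id is safe to pass
// to restic as a snapshot identifier.
func validSnapshotID(id string) bool {
	return snapshotIDPattern.MatchString(id)
}
```

With this check, values such as `latest` or anything containing spaces or shell metacharacters are rejected before a restore is attempted.
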
**Note:** Restore currently uses Tier 1 (primary restic repo on app's home drive) only.
Restoring from Tier 2 (cross-drive) is a future enhancement.

#### Backup Page UI (`internal/web/templates/backups.html`)

Unified per-app status table with expandable rows showing **per-tier** backup status:

**Status dot per app:**

| Dot color | Meaning |
|-----------|---------|
| Green | 2+ tiers configured with successful backups + destination healthy |
| Yellow | Only 1 tier, or Tier 2 failing, or Tier 2 configured but never run |
| Red | Tier 2 destination blocked or inaccessible |

Every app starts as yellow (1 tier only). Green requires Tier 2 configured with successful backup.

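One way to read the status-dot table is as the decision function below. The struct, its field names, and the precedence (red is checked before green and yellow) are assumptions made for illustration; the table itself does not state which rule wins when several apply.

```go
package main

// tier2Status is a hypothetical summary of an app's Tier 2 state.
type tier2Status struct {
	Configured  bool // a cross-drive backup is configured
	EverRun     bool // it has run at least once
	LastRunOK   bool // the most recent run succeeded
	DestBlocked bool // destination is blocked or inaccessible
}

// statusDot maps the Tier 2 state to a dot color. Tier 1 is always
// present, so an app without Tier 2 is yellow rather than red.
func statusDot(t tier2Status) string {
	switch {
	case t.Configured && t.DestBlocked:
		return "red"
	case t.Configured && t.EverRun && t.LastRunOK:
		return "green"
	default:
		return "yellow"
	}
}
```

Under this reading, a freshly deployed app yields `yellow`, and it turns `green` only after its first successful cross-drive run.
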
**Per-app backup tiers (3 rows per app):**
- **1. mentes** (Tier 1, always present) — Auto badge + "helyi" + last run + contents (e.g., "DB + Konfig + Adatok")
- **2. mentes** (Tier 2, configurable for ALL apps) — one of:
  - Configured: method (rsync/restic) + destination + schedule + last run + status + contents + browsable indicator (folder icon for rsync) + action buttons
  - Not configured: "1. mentes auto" + "Nincs 2. masolat" + settings link
- **3. mentes** (Tier 3, placeholder) — grayed out "Hamarosan" + "tavoli (offsite)" + future note

**Backup contents per app** (shown per tier):
- Apps with DB + HDD: "DB + Konfig + Adatok"
- Apps with DB only: "DB + Konfig"
- Apps with HDD, no DB: "Konfig + Adatok"
- Apps with neither: "Konfig"

**Deploy page** shows cross-drive (Tier 2) configuration form for **all deployed apps**,
not just those with HDD data. Non-HDD apps can configure destination, method, and schedule.

**Other sections:**
- Schedule overview with next run times for DB dump, restic, prune
- Snapshot history table (last 20 snapshots aggregated from all per-drive repos, sorted by time)
- Storage overview card (total size across repos, snapshot count, DB dump count/size, encryption key with show/copy)
- Restore section: app dropdown → snapshot dropdown → restore type info → confirmation checkbox → execute

---

### 4. Storage Management
|
|
|
|
The storage subsystem handles the full lifecycle of external storage: detection, initialization, path registration, and data migration.
|
|
|
|
#### Disk Scanning (`internal/storage/scan.go`)
|
|
|
|
- `ScanDisks()` uses `lsblk -J -b` for block device enumeration
|
|
- System disk detection via host fstab parsing (`/host-fstab`) + UUID resolution via `blkid`
|
|
- Partitions enriched with filesystem type, UUID, and label from direct `blkid` probing (Docker containers have incomplete udev cache)
|
|
- Returns `AvailableDisks` (non-system, non-loop, non-CDROM) and `SystemDisks` separately
|
|
- Handles NVMe (`nvme0n1p1`), SCSI (`sdb1`), and eMMC (`mmcblk0p1`) naming
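
The naming schemes differ in how a partition relates to its parent disk: NVMe and eMMC names end in a digit, so partitions get a `p<N>` suffix, while SCSI-style names append the number directly. The following sketch shows one way to resolve the parent disk; `parentDisk` and both patterns are assumptions for illustration and may differ from what `scan.go` does.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// nvme0n1p1, mmcblk0p1: disk name ends in a digit, partition uses "p<N>".
	pSuffixPart = regexp.MustCompile(`^(nvme\d+n\d+|mmcblk\d+)p\d+$`)
	// sdb1, vda2: letters-only disk name followed directly by the partition number.
	plainPart = regexp.MustCompile(`^((?:sd|vd|hd)[a-z]+)\d+$`)
)

// parentDisk returns the whole-disk name for a partition name.
// Names that are already whole disks (or unknown) are returned unchanged.
func parentDisk(name string) string {
	if m := pSuffixPart.FindStringSubmatch(name); m != nil {
		return m[1]
	}
	if m := plainPart.FindStringSubmatch(name); m != nil {
		return m[1]
	}
	return name
}

func main() {
	for _, n := range []string{"nvme0n1p1", "sdb1", "mmcblk0p1", "sda"} {
		fmt.Printf("%s -> %s\n", n, parentDisk(n))
	}
}
```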
|
|
|
|
#### Disk Initialization Wizard (`internal/storage/format.go`)
|
|
|
|
A step-by-step UI at `/settings/storage/init`:
|
|
|
|
1. **Scan** — Lists available disks with model, size, partition info
|
|
2. **Select** — User picks a disk and enters a mount name (e.g., `hdd_1`)
|
|
3. **Confirm** — User types "FORMAZAS" to confirm destructive operation
|
|
4. **Format pipeline**: `wipefs` → `sfdisk` (GPT) → `mkfs.ext4` → `blkid` UUID → backup fstab → append UUID-based fstab entry → mount → `findmnt` verification → `chown 1000:1000` → create `felhom-data/` and `Dokumentumok/` subdirectories
|
|
5. Auto-registers new storage path in settings.json
|
|
6. Smart partition detection: skips repartitioning for existing empty partitions
|
|
|
|
Safety guards: system disk detection, mount path conflict check, confirmation required, progress channel for real-time UI feedback.
|
|
|
|
#### Attach Existing Drive Wizard (`internal/storage/attach.go`)
|
|
|
|
A step-by-step UI at `/settings/storage/attach` for drives that already have a filesystem (e.g., a previously used ext4 drive). Unlike the init wizard, this does **not** format the drive — existing data is preserved.
|
|
|
|
**Problem solved:** Mounting a whole drive at `/mnt/<name>` would mix existing user data with the controller's directory structure (`felhom-data/`, `Dokumentumok/`, etc.). The bind-mount approach isolates the controller's working directory from other data on the drive.
|
|
|
|
1. **Scan** — Lists available disks, filtered to partitions that have an existing filesystem (FSType != "")
|
|
2. **Mount raw** — Partition is mounted read-only at a hidden staging path (`/mnt/.felhom-raw/<label>`)
|
|
3. **Browse** — Directory browser shows the drive's contents. User can navigate and create a new folder (e.g., `felhom_data`)
|
|
4. **Configure** — User enters a mount name and display label. Warning: mount path is immutable until detached
|
|
5. **Finalize** — Bind-mounts the selected subfolder at `/mnt/<name>`. Two fstab entries are created (both with `nofail`):
|
|
- Raw mount: `UUID=<uuid> /mnt/.felhom-raw/<x> <fstype> defaults,nofail,noatime 0 2`
|
|
- Bind mount: `/mnt/.felhom-raw/<x>/<subfolder> /mnt/<name> none bind,nofail 0 0`
|
|
6. Sets permissions (`chown 1000:1000`), creates `felhom-data/` and `Dokumentumok/` subdirectories
|
|
7. Auto-registers the storage path in settings.json + syncs FileBrowser mounts
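
The two fstab lines written in the finalize step can be sketched as plain string formatting. `attachFstabEntries` is a hypothetical helper; the option strings come from the entries listed above. The sketch assumes paths without whitespace, since fstab requires spaces to be escaped as `\040`.

```go
package main

import (
	"fmt"
	"path"
)

const rawRoot = "/mnt/.felhom-raw"

// attachFstabEntries (hypothetical) returns the raw-mount and bind-mount
// fstab lines for an attached drive. Paths must not contain whitespace.
func attachFstabEntries(uuid, fsType, rawLabel, subfolder, name string) (raw, bind string) {
	rawMount := path.Join(rawRoot, rawLabel)
	raw = fmt.Sprintf("UUID=%s %s %s defaults,nofail,noatime 0 2", uuid, rawMount, fsType)
	bind = fmt.Sprintf("%s %s none bind,nofail 0 0",
		path.Join(rawMount, subfolder), path.Join("/mnt", name))
	return raw, bind
}

func main() {
	raw, bind := attachFstabEntries("1234-ABCD", "ext4", "usb1", "felhom_data", "hdd_2")
	fmt.Println(raw)
	fmt.Println(bind)
}
```

Because both entries carry `nofail`, a missing drive does not block boot; the raw entry must appear before the bind entry so the bind source exists when it is mounted.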
|
|
|
|
Cancel at any point cleans up the temporary raw mount. The bind mount path (`/mnt/<name>`) is a real mount point, so all existing code (disk usage, IsMountPoint checks, etc.) works unchanged.
|
|
|
|
#### Storage Path Registry (`internal/settings/settings.go`)
|
|
|
|
Multiple external storage paths are supported. Each registered path carries:
|
|
- **Label**: Human-readable name (editable inline)
|
|
- **Default flag**: New deploys use this path by default
|
|
- **Schedulable flag**: Path appears in deploy dropdown
|
|
- **Disconnected state**: `Disconnected`, `DisconnectedAt`, `StoppedStacks` — set by watchdog or safe-disconnect API, cleared on reconnect
|
|
- **Auto-discovery**: On startup, scans deployed apps' `HDD_PATH` values and registers unknown paths
|
|
- Thread-safe CRUD: Add, Remove, SetDefault, SetSchedulable, SetLabel, SetDisconnected, ClearDisconnected
|
|
|
|
#### Data Migration (`internal/storage/migrate.go`)
|
|
|
|
Move app data between storage paths (e.g., SSD → HDD, HDD → new HDD):
|
|
|
|
1. Validate: stack exists, deployed, has HDD data, target differs from source
|
|
2. Estimate total size, check free space on target
|
|
3. Stop the application
|
|
4. `rsync -a --info=progress2` per mount path with real-time progress parsing
|
|
5. Update `app.yaml` HDD_PATH to new location
|
|
6. Start the application
|
|
7. **Rollback on failure**: reverts config, restarts on old storage
|
|
|
|
Progress UI at `/stacks/{name}/migrate` with byte counter and percentage.
|
|
|
|
#### Stale Data Cleanup
|
|
|
|
After migration, the deploy page detects leftover data on previous storage paths:
|
|
- Shows path, size, and a delete button
|
|
- Two-step confirmation required
|
|
- Protected paths (`felhom-data/`, `felhom-data/appdata/`, `felhom-data/backups/`, `media/`, `Dokumentumok/`) cannot be deleted
|
|
|
|
#### FileBrowser Mount Sync
|
|
|
|
When storage paths are added or removed, `syncFileBrowserMounts()` auto-regenerates FileBrowser's `docker-compose.yml` with volume mounts for all registered paths, then recreates the container.
|
|
|
|
#### Storage Watchdog (`internal/monitor/watchdog.go`)
|
|
|
|
Continuously monitors registered storage paths for disconnection/reconnection (primarily USB drives):
|
|
|
|
- **Probe loop**: `ProbeStoragePath()` calls `syscall.Statfs()` with 3-second timeout in a goroutine. Runs every 5s per connected path, 30s per disconnected path.
|
|
- **Debouncing**: 3 consecutive probe failures required before declaring a drive disconnected (prevents false positives from transient I/O).
|
|
- **Disconnect reaction** (automatic, ~15s detection):
|
|
1. Stops all deployed stacks whose `HDD_PATH` is under the disconnected drive (skips protected stacks)
|
|
2. Persists `Disconnected`, `DisconnectedAt`, `StoppedStacks` to `settings.json`
|
|
3. Lazy-unmounts stale VFS entries (`umount -l`) — for attach-wizard drives, unmounts bind first, then raw
|
|
4. Fires alert refresh (red banner on all pages), notification (`storage_disconnected`), and immediate hub report push
|
|
- **Auto-reconnect** (for UUID-based fstab entries):
|
|
1. Checks `/host-dev/disk/by-uuid/<uuid>` for device reappearance
|
|
2. Cleans stale mounts, then `mount -T /host-fstab <path>` (raw + bind for attach-wizard drives)
|
|
3. Verifies with a post-mount probe
|
|
4. Runs `restic unlock` if stale lock files exist
|
|
5. Validates `StoppedStacks` (filters to actually-stopped stacks), clears `Disconnected` flag
|
|
6. Fires alert refresh, notification (`storage_reconnected`), hub report push
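
The debouncing rule (3 consecutive failures at a 5s probe interval, hence roughly 15s to detect a disconnect) can be sketched as a small counter. The `debouncer` type is illustrative only; in this sketch the tripped state is cleared by an explicit `reset`, standing in for the reconnect path described above.

```go
package main

import "fmt"

const failThreshold = 3 // consecutive failed probes before declaring a disconnect

// debouncer (hypothetical) tracks consecutive probe failures for one path.
type debouncer struct {
	failures int
	tripped  bool
}

// observe records one probe result and returns true exactly once,
// on the probe that crosses the failure threshold.
func (d *debouncer) observe(ok bool) bool {
	if ok {
		d.failures = 0 // a single success clears transient failures
		return false
	}
	if d.tripped {
		return false // already reported as disconnected
	}
	d.failures++
	if d.failures >= failThreshold {
		d.tripped = true
		return true
	}
	return false
}

// reset is called after a successful reconnect.
func (d *debouncer) reset() { *d = debouncer{} }

func main() {
	d := &debouncer{}
	for i, ok := range []bool{false, false, true, false, false, false} {
		fmt.Printf("probe %d ok=%v disconnect=%v\n", i, ok, d.observe(ok))
	}
	d.reset()
}
```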
|
|
|
|
**Safe disconnect UI** (manual, Settings page):
|
|
|
|
- "Leválasztás" button shown for USB drives (detected via sysfs symlink path containing `/usb`)
|
|
- Confirmation dialog lists affected apps
|
|
- Flow: stop apps → `sync` → `umount` (fallback `umount -l`) → mark disconnected → notification
|
|
- Disconnected card: dashed border, red badge, timestamp, stopped apps list, "Csatlakoztatás" (reconnect) button
|
|
- After reconnect: "Alkalmazások indítása" button to restart auto-stopped stacks
|
|
|
|
**USB detection** (`system.IsUSBDevice`): Reads `/host/sys/block/<disk>` symlink — if target path contains `/usb`, it's a USB device. The `removable` sysfs flag is unreliable for USB HDDs (returns 0). USB drives show an orange "USB" badge on their storage card alongside Aktív/Alapértelmezett badges (v0.27.2).
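
A minimal sketch of the symlink check, assuming the host's sysfs is mounted at `/host/sys` as described. The function names are illustrative and not the actual `system.IsUSBDevice` implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// targetLooksUSB reports whether a resolved sysfs block-device link
// runs through a USB bus segment such as ".../usb2/2-1/...".
func targetLooksUSB(target string) bool {
	return strings.Contains(target, "/usb")
}

// isUSBDevice reads <sysRoot>/block/<disk> and inspects the link target.
func isUSBDevice(sysRoot, disk string) (bool, error) {
	target, err := os.Readlink(filepath.Join(sysRoot, "block", disk))
	if err != nil {
		return false, err
	}
	return targetLooksUSB(target), nil
}

func main() {
	usb, err := isUSBDevice("/host/sys", "sdb")
	fmt.Println(usb, err)
}
```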
|
|
|
|
**Backup guards**: Nightly DB dumps, restic snapshots, and cross-drive backups all skip disconnected drives with WARN log (not treated as failures).
|
|
|
|
**UI integration**: Disconnected drives show with hatched red bars on dashboard, monitoring, and backup pages. Per-app backup rows show "Meghajtó leválasztva" badge. Health check emits warnings for disconnected paths.
|
|
|
|
---
|
|
|
|
### 5. Monitoring & Health
|
|
|
|
#### System Health Checks (`internal/monitor/healthcheck.go`)
|
|
|
|
`RunHealthCheck()` evaluates multiple subsystems and returns a `HealthReport` with status (`ok`/`warn`/`fail`):
|
|
|
|
| Check | Warning | Critical |
|-------|---------|----------|
| Disk usage (SSD/HDD) | >= 90% | >= 95% |
| Memory | available < 512 MB | available < 256 MB |
| CPU temperature | >= 75 °C | >= 85 °C |
| Docker daemon | — | unreachable |
| Protected containers | — | not running |
| Storage paths | not a mount point (data on SSD), drive disconnected | path inaccessible, disk >= 95% |
|
|
|
|
Backup destination validation (`CheckBackupDestination`) has tiered checks:
|
|
- Path doesn't exist → critical/blocked
|
|
- Not writable → critical/blocked
|
|
- Same block device as root → warning (data on system drive)
|
|
- Disk >95% full → critical/blocked
|
|
- Disk >90% full → warning
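
The tiers above can be sketched as a pure classification over already-collected facts. The types and the evaluation order are assumptions: this sketch checks every blocking condition before any warning, so that a nearly full disk on the system drive is still reported as critical. The real `CheckBackupDestination` may order its checks differently.

```go
package main

import "fmt"

// destFacts (hypothetical) holds what was measured about a destination.
type destFacts struct {
	Exists           bool
	Writable         bool
	SameDeviceAsRoot bool
	UsedPercent      float64
}

// verdict (hypothetical) is the tiered result.
type verdict struct {
	Level   string // "ok", "warning" or "critical"
	Blocked bool
	Reason  string
}

func classifyDestination(f destFacts) verdict {
	switch {
	case !f.Exists:
		return verdict{"critical", true, "path does not exist"}
	case !f.Writable:
		return verdict{"critical", true, "path is not writable"}
	case f.UsedPercent > 95:
		return verdict{"critical", true, "disk more than 95% full"}
	case f.SameDeviceAsRoot:
		return verdict{"warning", false, "destination is on the system drive"}
	case f.UsedPercent > 90:
		return verdict{"warning", false, "disk more than 90% full"}
	}
	return verdict{"ok", false, ""}
}

func main() {
	fmt.Printf("%+v\n", classifyDestination(destFacts{Exists: true, Writable: true, UsedPercent: 92}))
}
```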
|
|
|
|
#### Healthchecks.io Integration (deprecated)
|
|
|
|
Legacy pinger (`internal/monitor/pinger.go`) still runs for backward compatibility but is no longer the primary monitoring mechanism. Monitoring is now handled by the Hub event system (see [Notifications](#6-notifications)). A deprecation log is emitted on startup if ping UUIDs are configured.
|
|
|
|
#### Metrics Store (`internal/metrics/`)
|
|
|
|
- **SQLite with WAL mode** for concurrent reads during collection
|
|
- **System metrics**: CPU%, memory (total/used/available), temperature, load average — collected every 60 seconds
|
|
- **Container metrics**: CPU%, memory, network I/O, block I/O per container
|
|
- Downsampled queries for chart time ranges (1h, 6h, 24h, 7d, 30d)
|
|
- 30-day auto-prune via daily scheduler job
|
|
|
|
#### Monitoring Page
|
|
|
|
Full-page system monitor at `/monitoring`:
|
|
- **System Overview**: hostname, OS, kernel, CPU model/cores, uptime
|
|
- **System Metrics Charts**: 4 line charts (CPU, Memory, Temperature, Load) in 2x2 grid
|
|
- **Memory Distribution Bar**: stacked bar showing per-container memory usage, OS/system overhead, and free memory (real-time from `/proc/meminfo` + container stats)
|
|
- **Container Resources**: horizontal bar charts (CPU% and Memory per container)
|
|
- **Per-container Detail**: click-to-expand historical charts
|
|
- **Hub Connection Status**: shows Hub URL, customer ID, connection state (connected/unreachable), last successful push, last error
|
|
|
|
Chart.js 4.4.7 embedded locally (works in offline environments), dark theme matching site design.
|
|
|
|
#### Alert System (`internal/web/alerts.go`)
|
|
|
|
State-based alerts displayed on all pages:
|
|
- Sources: health issues, Hub connection status, backup disabled, storage disconnected, update available
|
|
- Hub alerts: `hub-disabled` (warning) when Hub not enabled, `hub-unreachable` (error) when last push failed and no success in 30 min
|
|
- Sorted by severity (error > warning > info), capped at 5 visible
|
|
- Refreshed every 5 min + on startup + on storage state changes
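
The ordering and cap can be sketched with a stable sort, which keeps alerts of equal severity in their original order. The `alert` type and `visibleAlerts` are illustrative names, not the ones used in `alerts.go`.

```go
package main

import (
	"fmt"
	"sort"
)

const maxVisible = 5

// alert (hypothetical) carries only what the ordering needs.
type alert struct {
	ID       string
	Severity string // "error", "warning" or "info"
}

// severityRank maps a severity to its sort position; unknown values sort last.
func severityRank(s string) int {
	switch s {
	case "error":
		return 0
	case "warning":
		return 1
	case "info":
		return 2
	}
	return 3
}

// visibleAlerts returns a sorted copy capped at maxVisible entries.
func visibleAlerts(in []alert) []alert {
	out := append([]alert(nil), in...)
	sort.SliceStable(out, func(i, j int) bool {
		return severityRank(out[i].Severity) < severityRank(out[j].Severity)
	})
	if len(out) > maxVisible {
		out = out[:maxVisible]
	}
	return out
}

func main() {
	fmt.Println(visibleAlerts([]alert{
		{"update-available", "info"},
		{"hub-unreachable", "error"},
		{"hub-disabled", "warning"},
	}))
}
```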
|
|
|
|
---
|
|
|
|
### 6. Notifications
|
|
|
|
#### Hub Event System (`internal/notify/notifier.go`)
|
|
|
|
The controller pushes structured events to the Hub's `/api/v1/event` endpoint. The Hub handles notification dispatch, cooldown management, and dead man's switch detection.
|
|
|
|
**Core method:** `PushEvent(eventType, severity, message, details)` runs in its own goroutine, so it never blocks the caller. A failed push is retried twice with a 3s backoff.
|
|
|
|
#### Event Types
|
|
|
|
| Event Type | Severity | Trigger |
|------------|----------|---------|
| `backup_completed` | info | Nightly restic backup succeeds |
| `backup_failed` | error | Nightly restic backup fails |
| `db_dump_completed` | info | Nightly database dumps succeed |
| `db_dump_failed` | error | Nightly database dumps fail |
| `backup_integrity_ok` | info | Weekly `restic check` passes |
| `backup_integrity_failed` | error | Weekly `restic check` fails |
| `crossdrive_completed` | info | Cross-drive secondary backup succeeds |
| `crossdrive_failed` | error | Cross-drive secondary backup fails |
| `health_degraded` | warning | Health status degrades (ok→warn) |
| `health_critical` | error | Health status critical (any→fail) |
| `health_recovered` | info | Health status recovers (fail/warn→ok) |
| `disk_warning` | warning | Disk usage crosses 90% |
| `disk_critical` | error | Disk usage crosses 95% |
| `storage_disconnected` | error | Storage drive physically removed |
| `storage_reconnected` | info | Storage drive reconnected |
| `controller_started` | info | Controller process starts |
| `controller_updated` | info/error | Self-update success or failure |
| `app_deployed` | info | New app deployed via API |
| `app_removed` | info | App removed via API |
| `disaster_recovery_started` | warning | DR restore begins |
| `disaster_recovery_completed` | info/error | DR restore finishes (success/partial) |
|
|
|
|
Each event carries typed detail structs (e.g., `BackupDetails`, `DiskDetails`, `HealthDetails`) serialized as JSON.
|
|
|
|
#### Default Enabled Events
|
|
|
|
Events the customer receives notifications for (configurable in settings):
|
|
`backup_failed`, `db_dump_failed`, `disk_warning`, `disk_critical`, `storage_disconnected`, `node_down`, `health_critical`, `expected_backup_missed`, `expected_dbdump_missed`
|
|
|
|
#### Preference Sync
|
|
|
|
Notification preferences (email, enabled events, cooldown hours) are:
|
|
- Stored locally in `settings.json`
|
|
- Synced to Hub on save and on controller startup via `POST /api/v1/preferences`
|
|
- Hub sync failure doesn't block local save
|
|
|
|
---
|
|
|
|
### 7. Update Management
|
|
|
|
#### App Catalog Sync
|
|
|
|
- Periodic `git fetch` + `git reset --hard` of the app catalog repo
|
|
- Content-hash comparison prevents unnecessary file writes
|
|
- Post-sync stack rescan detects new/changed apps immediately
|
|
- **Stale lock recovery**: automatically removes `.git/index.lock`, `.git/shallow.lock`, and `.git/HEAD.lock` before each fetch — prevents permanent sync failures after interrupted operations (e.g. container restart mid-sync)
|
|
|
|
#### Planned Update Classifications
|
|
|
|
| Marker | Behavior |
|--------|----------|
| No marker | Optional — shown on dashboard, customer clicks "Update" |
| `UPDATE_REQUIRED=true` | Mandatory — auto-applied during next update window |
| `UPDATE_SECURITY=true` | Critical — applied immediately |
|
|
|
|
#### Controller Self-Update (`internal/selfupdate/`)
|
|
|
|
The controller can update itself — a Watchtower-style pull-and-restart mechanism for a single container. Replaces manual SSH-based `docker pull + sed + docker compose up -d` with a one-click Settings page button or scheduled auto-update.
|
|
|
|
##### How It Works
|
|
|
|
```
1. Check Gitea Docker Registry V2 API for new image tags
2. Compare highest semver tag with current Version (set at build time via ldflags)
3. If newer version exists → pull image → update compose file → docker compose up -d
4. Current container is replaced by Docker → new container starts with new version
5. On startup, new container reads update-state.json → marks update success/failure
```
|
|
|
|
##### Design Philosophy
|
|
|
|
- **No automatic rollback** — follows the Watchtower pattern (24k+ GitHub stars, no rollback). Docker's `restart: unless-stopped` policy is the crash safety net. The Hub's dead man's switch detects when the controller goes down.
|
|
- **Audit state file** — `update-state.json` in the data volume records every update attempt (previous version, target version, initiator, result). Operators can SSH in and revert using `PreviousImage` from this file.
|
|
- **Backup-aware** — refuses to start an update while a backup is in progress (`backupRunning()` guard).
|
|
|
|
##### Package Structure
|
|
|
|
| File | Purpose |
|------|---------|
| `version.go` | `ParseVersion("X.Y.Z")` → `Version{Major,Minor,Patch}`, `Compare()` returns -1/0/1. Hand-rolled, no external deps. Rejects "dev" and "latest". |
| `state.go` | `UpdateState` struct persisted as JSON. `LoadState()`, `SaveState()` (atomic: `.tmp` + rename), `ClearState()`. Status values: `"pending"`, `"success"`, `"failed"`. |
| `updater.go` | Core `Updater` struct. Registry check via HTTP GET to `gitea.dooplex.hu/v2/admin/felhom-controller/tags/list` with Basic Auth (git username/token). Update trigger: `docker pull` → compose file regex replace → `docker compose up -d`. Thread-safe with `sync.Mutex`. |
|
|
|
|
##### Update Trigger Flow
|
|
|
|
1. **Guard checks:** concurrent update lock, dev version check, backup running check, compose file accessible
|
|
2. Write `update-state.json` with status `"pending"` (audit trail)
|
|
3. `docker pull <image>:<targetVersion>`
|
|
4. Read compose file → replace image tag via regexp → atomic write (`.tmp` + rename)
|
|
5. `docker compose -f /opt/docker/felhom-controller/docker-compose.yml -p felhom-controller up -d`
|
|
6. Docker kills the current container, starts the new one
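
Step 4 (replacing the image tag via regexp) could be done along the following lines. The pattern and the `replaceImageTag` helper are assumptions for illustration; the controller's actual expression is not shown in this document. The sketch only handles `image:` lines that already carry an explicit tag.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceImageTag (hypothetical) rewrites "image: <image>:<old>" to use newTag.
// It reports false when no matching image line was found.
func replaceImageTag(compose, image, newTag string) (string, bool) {
	re := regexp.MustCompile(
		`(?m)^(\s*image:\s*["']?` + regexp.QuoteMeta(image) + `):[\w.\-]+`)
	if !re.MatchString(compose) {
		return compose, false
	}
	// ${1} keeps the indentation, key, optional quote and image name.
	return re.ReplaceAllString(compose, "${1}:"+newTag), true
}

func main() {
	in := "services:\n  controller:\n" +
		"    image: gitea.dooplex.hu/admin/felhom-controller:0.32.3\n"
	out, ok := replaceImageTag(in, "gitea.dooplex.hu/admin/felhom-controller", "0.32.4")
	fmt.Println(ok)
	fmt.Print(out)
}
```

Refusing to continue when the function reports `false` avoids running `docker compose up -d` against an unchanged file.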
|
|
|
|
##### Startup Verification
|
|
|
|
Called once from `main.go` before the scheduler starts:
|
|
1. Load `update-state.json` — if missing or status != `"pending"`, nothing to do
|
|
2. Compare running `Version` with `state.TargetVersion`
|
|
3. **Match** → mark `"success"`, notify via hub
|
|
4. **Mismatch** → mark `"failed"`, notify via hub
|
|
5. No rollback attempt — operator reverts manually if needed
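
A hand-rolled version type and the startup decision can be sketched as follows. `ParseVersion` and `Compare` mirror the names in the package table, but the bodies are assumptions; `resolvePending` is a hypothetical helper, and accepting a leading `v` is a choice made for this sketch only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type Version struct{ Major, Minor, Patch int }

// ParseVersion accepts "X.Y.Z" (optionally prefixed with "v") and
// rejects anything else, including "dev" and "latest".
func ParseVersion(s string) (Version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return Version{}, fmt.Errorf("not an X.Y.Z version: %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return Version{}, fmt.Errorf("bad component %q in %q", p, s)
		}
		n[i] = v
	}
	return Version{n[0], n[1], n[2]}, nil
}

// Compare returns -1, 0 or 1 when a is older than, equal to or newer than b.
func (a Version) Compare(b Version) int {
	for _, d := range [3]int{a.Major - b.Major, a.Minor - b.Minor, a.Patch - b.Patch} {
		if d < 0 {
			return -1
		}
		if d > 0 {
			return 1
		}
	}
	return 0
}

// resolvePending turns a "pending" state into "success" or "failed"
// by comparing the running version with the update target.
func resolvePending(status, target, running string) string {
	if status != "pending" {
		return status
	}
	t, errT := ParseVersion(target)
	r, errR := ParseVersion(running)
	if errT == nil && errR == nil && r.Compare(t) == 0 {
		return "success"
	}
	return "failed"
}

func main() {
	fmt.Println(resolvePending("pending", "0.32.4", "0.32.4"))
}
```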
|
|
|
|
##### Auto-Update Scheduling
|
|
|
|
Two separate scheduler jobs prevent interference with backups:
|
|
|
|
| Job | Type | Default | Purpose |
|-----|------|---------|---------|
| `selfupdate-check` | `sched.Every` | 6h | Check registry, cache result (for UI). Never triggers update. |
| `selfupdate-auto` | `sched.Daily` | 04:30 | If auto-update enabled + update available + backup not running → trigger. |
|
|
|
|
The auto-update time (`config.SelfUpdate.AutoUpdateTime`, default `"04:30"`) is deliberately separate from the backup window (02:30-~04:00) to avoid collisions. The `backupRunning()` guard is the hard safety check — if backups run long past 04:30, the update is skipped and retried the next day.
|
|
|
|
An initial version check fires 30s after startup so the Settings page shows version info quickly.
|
|
|
|
##### Compose File Access
|
|
|
|
The controller needs write access to its own `docker-compose.yml`. This is achieved via Docker volume mount ordering:
|
|
|
|
```yaml
volumes:
  # 1. Directory mount — gives access to compose file + config
  - /opt/docker/felhom-controller:/opt/docker/felhom-controller
  # 2. Read-only override — prevents accidental config writes
  - /opt/docker/felhom-controller/controller.yaml:/opt/docker/felhom-controller/controller.yaml:ro
  # 3. Named volume override — persistent data in Docker-managed volume
  - controller-data:/opt/docker/felhom-controller/data
```
|
|
|
|
##### API Endpoints
|
|
|
|
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/api/selfupdate/status` | Session or API key | Current status (cached, no network call) |
| POST | `/api/selfupdate/check` | Session or API key | Force registry check, return result |
| POST | `/api/selfupdate/update` | Session or API key | Trigger update (async, returns immediately) |
|
|
|
|
Self-update endpoints accept either session auth (for UI) or hub API key as bearer token (for external triggering from build scripts or hub). This enables the post-v0.16.0 deploy workflow:
|
|
|
|
```bash
# After building + pushing new image:
curl -s -X POST https://felhom.demo-felhom.eu/api/selfupdate/update \
  -H "Authorization: Bearer <HUB_API_KEY>"
```
|
|
|
|
##### Settings Page UI
|
|
|
|
The "Verzió és frissítés" card on the Settings page (`/settings`) shows:
|
|
- Current version and latest available version
|
|
- "Frissítés elérhető" (update available) badge
|
|
- Last check time and any errors
|
|
- Auto-update status with configured time
|
|
- Last update result (success/failed/pending)
|
|
- **Buttons:** "Frissítés keresése" (check) + "Frissítés telepítése" (apply)
|
|
|
|
After triggering an update, the page polls `/api/health` every 3s and reloads when the new container responds.
|
|
|
|
A global info-level alert ("Új controller verzió elérhető") appears on all pages when an update is available, linking to the Settings page.
|
|
|
|
##### Configuration
|
|
|
|
```yaml
self_update:
  enabled: true
  check_interval: "6h"              # How often to check registry
  image: "gitea.dooplex.hu/admin/felhom-controller"  # Default
  auto_update: false                # Set true for unattended updates
  auto_update_time: "04:30"         # When to auto-apply (after backups)
  health_timeout_seconds: 60        # Reserved for future use
```
|
|
|
|
##### Edge Cases
|
|
|
|
| Scenario | Behavior |
|----------|----------|
| `Version == "dev"` | `ParseVersion` returns error → no updates reported, trigger refused |
| Registry unreachable | Log warning, return error in check result. No crash. |
| No registry credentials | Return error "Registry hitelesítő adatok hiányoznak" |
| Compose file not writable | Refuse update before doing anything |
| Backup running | Refuse with "Mentés fut, próbálja később" |
| Concurrent update | Mutex prevents duplicates: "Frissítés már folyamatban" |
| Bad update (crash loop) | Docker restarts container. State file stays "pending". Operator SSH-reverts using `PreviousImage`. |
| Corrupt state file | Treated as "no pending update", logged, deleted |
|
|
|
|
---
|
|
|
|
### 8. Authentication & Settings
|
|
|
|
#### Session Auth (`internal/web/auth.go`)
|
|
|
|
- bcrypt password verification with configurable source priority: `settings.json` → `controller.yaml` → no auth (open access)
|
|
- 7-day session duration with random 32-byte hex tokens
|
|
- `?next=` redirect after login preserves the page the user was visiting
|
|
- Session cleanup every 15 minutes
|
|
- All sessions invalidated on password change
|
|
- Conditional logout link (hidden when auth is disabled)
|
|
- Each session stores a dedicated CSRF token (separate 32-byte random value) alongside the session token
|
|
|
|
#### CSRF Protection (`internal/web/csrf.go`)
|
|
|
|
Synchronizer-token CSRF protection on all browser-facing state-mutating endpoints.
|
|
|
|
**How it works:**
|
|
- `CsrfProtect` middleware wraps all route handlers in `main.go`
|
|
- Safe methods (GET, HEAD, OPTIONS) pass through without validation
|
|
- For POST/DELETE/PATCH: reads token from `_csrf` form field or `X-CSRF-Token` request header; constant-time compares against the session's stored CSRF token
|
|
- On rejection: JSON `{"ok":false,"error":"CSRF token missing or invalid"}` for `/api/` paths; HTTP 403 text page for UI routes
|
|
- Logs: `[WARN] CSRF rejected: METHOD /path from addr (reason)`
|
|
|
|
**Exempt paths (no CSRF check):**
|
|
- Requests with `Authorization: Bearer ...` header — hub→controller API calls (selfupdate, config/apply). Browsers cannot auto-send Bearer headers, so cross-site requests are impossible on these endpoints.
|
|
- Auth-disabled mode (`authEnabled() == false`) — CSRF is meaningless when there is no session.
|
|
|
|
**Token delivery to templates:**
|
|
- `executeTemplate(w, r, name, data)` wrapper in `server.go` auto-injects `CSRFField` (`template.HTML` hidden `<input>`) and `CSRFToken` (raw string) into every page's data map
|
|
- `layout.html` emits `<meta name="csrf-token" content="{{.CSRFToken}}">` and defines `csrfHeaders()` JS function in `<head>` (before page scripts)
|
|
- Forms: `{{.CSRFField}}` (or `{{$.CSRFField}}` inside `{{range}}` loops — outer scope required)
|
|
- JS `fetch()` calls: `headers: csrfHeaders()` — returns `{'X-CSRF-Token': metaContent}`
|
|
- Dynamically-created JS forms: read token from `document.querySelector('meta[name="csrf-token"]').content`
|
|
- `navigator.sendBeacon()` replaced with `fetch(..., {keepalive: true})` where used — `sendBeacon` cannot send custom headers
|
|
|
|
#### Settings Persistence (`internal/settings/settings.go`)
|
|
|
|
Runtime-mutable settings in `settings.json` (separate from infrastructure config):
|
|
|
|
| Section | Contents |
|---------|----------|
| `password_hash` | bcrypt hash override |
| `notifications` | email, enabled events, cooldown hours |
| `db_validations` | per-DB dump validation results (survives restarts) |
| `app_backup` | per-app map: enabled flag, cross-drive config (method, dest, schedule, runtime status) |
| `storage_paths` | registered paths with label, default flag, schedulable flag, disconnected state |
| `cross_drive_restic_password` | auto-generated restic password for cross-drive repos |
|
|
|
|
All public methods use `sync.RWMutex`. File writes are atomic (`.tmp` + rename).
|
|
|
|
#### Settings Page (`/settings`)
|
|
|
|
Five sections:
|
|
1. **System config** — read-only display of `controller.yaml` values
|
|
2. **Version & update** — current/latest version, check/update buttons, auto-update status, last update result
|
|
3. **Storage paths** — add/remove, edit labels, set default, toggle schedulable, per-path app list with sizes, safe disconnect/reconnect for USB drives
|
|
4. **Password change** — current + new + confirm, min 8 chars
|
|
5. **Notifications** — email, event checkboxes, cooldown hours, test email button
|
|
|
|
---
|
|
|
|
### 9. Central Hub Reporting
|
|
|
|
#### Report Push (`internal/report/`)
|
|
|
|
Periodic JSON push (default every 15 min) to the central felhom-hub service:
|
|
- System: hostname, OS, CPU, memory, disk usage, uptime
|
|
- Containers: running/stopped counts, per-container CPU/memory
|
|
- Backup: last run, success, repo stats, snapshot count, restic password (for disaster recovery)
|
|
- Health: current status, issues, warnings
|
|
- Stacks: deployed apps with versions and states
|
|
- Config hash: SHA256 of `controller.yaml` for Hub-side config comparison
|
|
- **App telemetry** (v0.28.0+): Per-stack memory (current/avg/peak) and CPU averages from the last 15 minutes of metrics data, plus log scan results (error/warning counts with deduplicated issues). Only non-protected, deployed stacks are included. Backward-compatible: old Hub versions silently ignore this field.
|
|
- **Controller telemetry** (v0.32.4+): The controller's own container (`felhom-controller`) is included as a special entry in the `app_telemetry` array. Its memory/CPU metrics come from the same metrics collector, and its log warnings/errors are scanned via `docker logs` using the same pipeline as app containers. This reuses all existing Hub telemetry infrastructure (memory trend charts, known issues, fleet aggregation) with zero Hub-side changes.
|
|
|
|
Bearer token authentication, 3-attempt retry with 5-second backoff. Push status tracked via `PushStatus` struct (LastAttempt, LastSuccess, LastError, consecutive failures) — used by the monitoring page and alert system to show Hub connection health.
|
|
|
|
#### App Telemetry (`internal/metrics/telemetry.go`, `internal/metrics/logscanner.go`, `internal/report/telemetry.go`)
|
|
|
|
Each report push now includes per-app telemetry data:
|
|
|
|
**Metrics collection** (`telemetry.go`):
|
|
- `MetricsStore.GetContainerTelemetry(since)` aggregates container-level memory (avg, peak, current) and CPU averages from the `container_metrics` SQLite table for the last 15 minutes.
|
|
|
|
**Log scanning** (`logscanner.go`):
|
|
- `ScanContainerLogs(containerNames, since, logger)` runs `docker logs --since=15m --tail=1000` sequentially on all non-protected deployed containers.
|
|
- Classifies lines by keyword match (errors: `error`, `fatal`, `panic`, `crit`, `oom`, `killed`, `exception`, `traceback`; warnings: `warn`, `warning`) on the first 5 words (case-insensitive).
|
|
- Deduplicates via fingerprinting: strips ANSI escape codes, ISO timestamps (with timezone offsets), and syslog timestamps (including mid-line); replaces 6+ digit numbers with `<N>`, 8+ char hex with `<HEX>`, UUIDs with `<UUID>`. Groups identical fingerprints, keeps top 10 per container.
|
|
- Returns `[]ContainerLogSummary` with `ErrorCount`, `WarnCount`, `RecentIssues []LogIssue`.
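
The fingerprinting step can be approximated with a handful of regular expressions. The patterns below are assumptions written for this sketch (the ones in `logscanner.go` are not shown here), and they are applied with UUIDs before plain numbers so that a UUID's digit-only segment is not replaced first.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	ansiRe   = regexp.MustCompile(`\x1b\[[0-9;]*[A-Za-z]`)
	isoRe    = regexp.MustCompile(`\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?`)
	syslogRe = regexp.MustCompile(`(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}`)
	uuidRe   = regexp.MustCompile(`[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`)
	numRe    = regexp.MustCompile(`\b\d{6,}\b`)
	hexRe    = regexp.MustCompile(`\b[0-9a-fA-F]{8,}\b`)
)

// fingerprint normalizes a log line so that repeats of the same message
// with different timestamps or IDs collapse to one key.
func fingerprint(line string) string {
	s := ansiRe.ReplaceAllString(line, "")
	s = isoRe.ReplaceAllString(s, "")
	s = syslogRe.ReplaceAllString(s, "")
	s = uuidRe.ReplaceAllString(s, "<UUID>")
	s = numRe.ReplaceAllString(s, "<N>")
	s = hexRe.ReplaceAllString(s, "<HEX>")
	return strings.Join(strings.Fields(s), " ") // collapse leftover whitespace
}

func main() {
	lines := []string{
		"2024-05-01T12:00:00+02:00 ERROR request 1234567 failed",
		"2024-05-01T12:00:07+02:00 ERROR request 7654321 failed",
	}
	counts := map[string]int{}
	for _, l := range lines {
		counts[fingerprint(l)]++
	}
	fmt.Println(counts)
}
```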
|
|
|
|
**Report integration** (`report/telemetry.go`):
|
|
- `buildAppTelemetrySection()` calls both, then `buildAppTelemetry()` aggregates by stack — summing container metrics, merging issues, capping at 10 per app. Additionally, `buildControllerTelemetry()` creates a special entry for the controller container itself (`app_name: "felhom-controller"`).
|
|
- Results stored as `[]AppTelemetry` in the `Report` struct field `app_telemetry`.
|
|
|
|
#### Infrastructure Backup to Hub (`internal/report/infra_backup.go`)
|
|
|
|
After each backup cycle (including manual Tier 2 triggers via `OnCrossDriveComplete` callback), the controller pushes a full infrastructure snapshot to the Hub for disaster recovery. This snapshot includes:
|
|
- `controller.yaml` (base64-encoded, full config including secrets)
|
|
- `settings.json` (base64-encoded, backup prefs, storage paths, cross-drive configs)
|
|
- Disk layout (UUIDs, labels, mount points, fstab options, bind-mount topology)
|
|
- Deployed stacks manifest (app names, HDD paths)
|
|
- Restic passwords (primary + cross-drive, base64-encoded)
|
|
|
|
This enables fully automated recovery when the system drive is replaced — the new controller pulls the snapshot from the Hub, auto-mounts surviving drives by UUID, and restores all applications.
|
|
|
|
#### Hub Dashboard
|
|
|
|
The hub service (separate Go app in the `felhom.eu` repo) provides:
|
|
- Multi-customer overview table with status indicators and event count badges
|
|
- Customer detail page with system/storage/containers/backup/health/events sections
|
|
- Event timeline: last 50 events with severity filter, colored badges, source tracking
|
|
- Dead man's switch: staleness detection (30min stale, 60min down), missed backup detection (daily at 05:00)
|
|
- Notification dispatch: operator (English) + customer (Hungarian) emails via Resend with per-event cooldowns
|
|
- Infra backup status per customer (last sync, stack count, disk count)
|
|
- Color coding: green (<30min), yellow (30-60min), red (>60min since last report)
|
|
- 90-day report + event retention with daily prune at 04:30 Budapest time
|
|
|
|
### 10. First-Run Setup Wizard
|
|
|
|
When the controller starts with no valid customer configuration (`customer.id` empty), it enters **setup mode** — a web-based wizard that handles all initial configuration. This replaces the old interactive shell wizard in `docker-setup.sh`.
|
|
|
|
#### Setup Mode Detection (`internal/setup/setup.go`)
|
|
|
|
`NeedsSetup(cfg)` returns true when `customer.id` is empty or a `.needs-setup` marker file exists. In setup mode, the controller skips normal startup (no scheduler, no backup, no stacks) and serves only the wizard UI on two listeners:
|
|
- `:8080` — behind Traefik (accessible via domain, e.g. `https://felhom.example.com`)
|
|
- `:8081` — direct HTTP (accessible via LAN IP, e.g. `http://192.168.0.100:8081`)
|
|
|
|
#### Wizard Flow
|
|
|
|
```
┌──────────────────────────────────┐
│            1. Welcome            │
│ Choose: Restore / Fresh install  │
└─────────┬───────────┬────────────┘
          │           │
    ┌─────▼─────┐  ┌──▼───────────────┐
    │ 2a. Scan  │  │ 2b. Hub download │
    │ drives for│  │ (customer ID +   │
    │ local     │  │  password)       │
    │ backups   │  │                  │
    └─────┬─────┘  └──────┬───────────┘
          │               │
    ┌─────▼─────┐         │
    │ 2a.2 Hub  │         │
    │ recovery  │         │
    │ (fallback)│         │
    └─────┬─────┘         │
          │               │
    ┌─────▼─────┐  ┌──────▼───────────┐
    │ Execute   │  │ Execute fresh    │
    │ restore   │  │ install          │
    └─────┬─────┘  └──────┬───────────┘
          │               │
          └───────┬───────┘
                  ▼
         os.Exit(0) → Docker restarts
                    → normal mode
```
|
|
|
|
#### Hub Pre-Seeding
|
|
|
|
When `docker-setup.sh` is run with `--hub-customer` / `--hub-password`, the controller receives pre-seeded credentials via environment variables:
|
|
|
|
| Env var | Purpose |
|---------|---------|
| `FELHOM_SETUP_CUSTOMER_ID` | Pre-fills customer ID in wizard forms |
| `FELHOM_SETUP_PASSWORD` | Pre-fills retrieval password for auto-processing |
|
|
|
|
In hub mode, the welcome page shows three cards instead of two:
|
|
1. **"Visszaállítás a Hub-ról"** — auto-calls `PullRecovery()`, shows infra backup details
|
|
2. **"Visszaállítás helyi meghajtóról"** — standard drive scan
|
|
3. **"Friss telepítés"** — auto-calls `PullConfig()`, downloads config only
|
|
|
|
Both hub paths auto-process when credentials are pre-seeded (no form entry needed). On error, the wizard falls back to the manual form with the error displayed.

#### Key Components

| File | Purpose |
|------|---------|
| `setup/setup.go` | `NeedsSetup()` detection, `SetupState` persistence to `setup-state.json` |
| `setup/handlers.go` | HTTP handlers for each wizard step (welcome, scan, hub-restore, fresh, manual) |
| `setup/scanner.go` | Scans all block devices for `.felhom-infra-backup/` directories (current + `history/`) via `lsblk` + temp mounts; returns rich info (app names, disk count) |
| `setup/hub.go` | Hub recovery pull (`GET /api/v1/recovery/{id}`) and config download |
| `setup/csrf.go` | Lightweight CSRF protection (cookie + hidden field, `SameSite=Strict`) |
| `setup/network.go` | Detects local IPs for LAN access URL display |
| `setup/templates/` | 8 embedded HTML templates (Hungarian, dark theme matching main UI) — includes `setup_hub_versions.html` for Hub backup version picker |

#### Local Infra Backup (`internal/backup/local_infra.go`)

The controller writes infrastructure snapshots to **every connected drive** after each backup cycle and on startup. Location: `<drive>/.felhom-infra-backup/`. Files:
- `backup.json` — full infra backup (config, settings, disk layout, passwords, stacks)
- `metadata.json` — schema version, timestamp, customer ID, controller version, SHA256 checksum
- `history/` — previous backup versions (last 5), rotated automatically before each write
  - `{timestamp}-backup.json` + `{timestamp}-metadata.json` pairs (timestamp format: `20060102T150405Z`)
  - Oldest entries pruned when count exceeds 5

During the setup wizard's drive scan, both current and historical backups are discovered, integrity-verified, and offered for one-click restore. The scan results table shows app names/count, disk count, and a "korábbi" (earlier) badge for historical versions.
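
The history rotation described in this section comes down to "keep the five newest timestamped pairs". The sketch below shows one way to pick the entries to prune; the function name and the decision to count only `-backup.json` files are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

const (
	maxHistory  = 5                  // the README says the last 5 versions are kept
	stampLayout = "20060102T150405Z" // timestamp format used in history/ file names
)

// staleTimestamps takes the file names found in history/ and returns the
// timestamps whose backup/metadata pairs should be deleted, oldest first.
func staleTimestamps(names []string) []string {
	seen := map[string]bool{}
	var stamps []string
	for _, name := range names {
		if !strings.HasSuffix(name, "-backup.json") {
			continue // metadata files share the timestamp, so count backups only
		}
		ts := strings.TrimSuffix(name, "-backup.json")
		if _, err := time.Parse(stampLayout, ts); err != nil || seen[ts] {
			continue // ignore files that do not follow the naming scheme
		}
		seen[ts] = true
		stamps = append(stamps, ts)
	}
	// The layout is fixed-width, so lexical order equals chronological order.
	sort.Strings(stamps)
	if len(stamps) <= maxHistory {
		return nil
	}
	return stamps[:len(stamps)-maxHistory]
}

func main() {
	var names []string
	start := time.Date(2025, 1, 1, 2, 30, 0, 0, time.UTC)
	for day := 0; day < 7; day++ {
		ts := start.AddDate(0, 0, day).Format(stampLayout)
		names = append(names, ts+"-backup.json", ts+"-metadata.json")
	}
	for _, ts := range staleTimestamps(names) {
		fmt.Println("prune:", ts+"-backup.json", ts+"-metadata.json")
	}
}
```

Because the timestamp format sorts lexically, no date arithmetic is needed to find the oldest entries.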

#### Recovery Info (`internal/recovery/info.go`)

Generates `recovery-info.txt` on the system data partition with customer ID, Hub URL, retrieval password, and recovery instructions in Hungarian. Updated on startup and after config changes. Also displayed on the Settings page in a "Vészhelyzeti információk" (Emergency information) section.

### 11. Disaster Recovery

When a system drive fails and is replaced, the recovery flow uses the setup wizard:

```
1. docker-setup.sh deploys fresh controller with minimal config
   - With --hub-customer: credentials pre-seeded via env vars
   - Without: user enters credentials manually in wizard
2. Controller detects empty customer.id → enters setup mode
3. User opens wizard at http://<LAN-IP>:8081
4. Hub mode: welcome page shows Hub restore / local scan / fresh install
   Non-hub mode: welcome page shows restore / fresh install
5. Hub restore: auto-connects to Hub, shows version picker if multiple versions
   Local restore: scans all drives for .felhom-infra-backup/ directories (current + history/)
6. User selects backup version → restore: config, settings, passwords, disk layout
7. Controller restarts into normal mode with full config
8. Controller auto-mounts surviving drives by UUID from disk layout
9. Dashboard shows "Visszaállítás" (Restore) page for app-level recovery
10. User confirms → sequential restore: rsync first, restic fallback, DB import
```

**Backup sources (priority order):**
1. **Local infra backup** (`.felhom-infra-backup/` on surviving drives) — fastest, no network needed
2. **Hub recovery endpoint** (`GET /api/v1/recovery/{id}`) — requires retrieval password, supports `?version=ID` for specific versions; Hub retains ~14 versions via GFS pruning (7 daily / 4 weekly / 3 monthly)
3. **Manual config** (wizard form) — enter all details manually as last resort

**Hub verification:** After setup, the controller periodically verifies customer standing via the Hub report push response (`customer_blocked` field). If the customer is blocked, or the Hub has been unreachable for more than 7 days, the controller enters limited mode (no new deployments).

---

### 12. Asset Sync

App assets (logos, screenshots) are managed centrally by the Hub and downloaded to each controller via a daily sync process. This decouples asset updates from controller image rebuilds — new app icons only require a Hub redeploy.

#### How It Works (`internal/assets/syncer.go`)

```
1. Fetch manifest from Hub: GET /api/v1/assets/manifest (Bearer auth)
2. Compare SHA-256 checksums with local cache (<dataDir>/assets/)
3. Download changed/new files: GET /api/v1/assets/file/{filename}
4. Remove local files not in Hub manifest (stale cleanup)
5. Save local manifest copy for next comparison
```
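
Steps 2 to 4 amount to a diff between two manifests. The sketch below assumes a manifest can be reduced to a map from file name to hex-encoded SHA-256; the real manifest format and the `planSync` name are not taken from the source.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// checksum returns the hex-encoded SHA-256 of a file's contents.
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// planSync compares the Hub manifest with the local one (both map a file
// name to its checksum) and returns the files to download and to remove.
func planSync(remote, local map[string]string) (download, remove []string) {
	for name, sum := range remote {
		if local[name] != sum { // missing locally or content changed
			download = append(download, name)
		}
	}
	for name := range local {
		if _, ok := remote[name]; !ok { // stale: no longer listed by the Hub
			remove = append(remove, name)
		}
	}
	sort.Strings(download)
	sort.Strings(remove)
	return download, remove
}

func main() {
	remote := map[string]string{
		"nextcloud-logo.svg": checksum([]byte("<svg>v2</svg>")),
		"immich-logo.svg":    checksum([]byte("<svg>immich</svg>")),
	}
	local := map[string]string{
		"nextcloud-logo.svg": checksum([]byte("<svg>v1</svg>")),  // outdated copy
		"oldapp-logo.svg":    checksum([]byte("<svg>old</svg>")), // removed on the Hub
	}
	download, remove := planSync(remote, local)
	fmt.Println("download:", download)
	fmt.Println("remove:", remove)
}
```

Sorting the two result slices keeps log output and tests deterministic, since Go map iteration order is random.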

#### Asset Resolution (two-tier)

| Priority | Path | Source |
|----------|------|--------|
| 1 | `<dataDir>/assets/` | Downloaded from Hub (synced cache) |
| 2 | `/usr/share/felhom/assets/` | Baked into Docker image (fallback) |

The `Resolve(filename)` method checks the synced cache first, then falls back to the baked-in directory. This ensures assets are always available even before the first sync.

The Felhom logo (`/static/felhom-logo.svg`) also uses this two-tier resolution: the logo handler checks synced assets first, then falls back to the embedded SVG constant. This allows logo updates via Hub without a controller rebuild. The logo is also used as an SVG favicon.
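
A two-tier lookup of this kind could look roughly like the following. The `Resolver` type, its fields, and the injectable `Exists` function are illustrative assumptions; only the lookup order (synced cache first, baked-in directory second) comes from the description above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Resolver looks up an asset in the synced cache, then in the baked-in directory.
type Resolver struct {
	CacheDir string
	BakedDir string
	Exists   func(path string) bool // injectable so the lookup order can be tested
}

// NewResolver wires the resolver to the real file system.
func NewResolver(cacheDir, bakedDir string) *Resolver {
	return &Resolver{CacheDir: cacheDir, BakedDir: bakedDir, Exists: fileExists}
}

func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

// Resolve returns the first existing path for filename, or false if neither tier has it.
func (r *Resolver) Resolve(filename string) (string, bool) {
	// filepath.Base drops directory components, so "../x" cannot leave the asset dirs.
	name := filepath.Base(filename)
	for _, dir := range []string{r.CacheDir, r.BakedDir} {
		path := filepath.Join(dir, name)
		if r.Exists(path) {
			return path, true
		}
	}
	return "", false
}

func main() {
	r := NewResolver("/opt/docker/felhom-controller/data/assets", "/usr/share/felhom/assets")
	if path, ok := r.Resolve("nextcloud-logo.svg"); ok {
		fmt.Println("serving", path)
	} else {
		fmt.Println("asset not found in either tier")
	}
}
```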

#### Configuration

```yaml
assets:
  sync_enabled: true       # Opt-in: download assets from Hub API
  sync_schedule: "05:00"   # Daily sync time (HH:MM, Budapest timezone)
```

Asset sync requires `hub.enabled: true` with valid `hub.url` and `hub.api_key`. The initial sync runs 10 seconds after startup (to let subsystems initialize), then daily at the configured time.

#### Sync Status

The syncer tracks status (last sync time, result, file count, total bytes) accessible via `GET /api/assets/status`. On-demand sync can be triggered via `POST /api/assets/sync`.

#### File Types

The Hub serves three asset types per app:
- `{slug}-logo.svg` — primary SVG logo
- `{slug}-logo.png` — PNG fallback
- `{slug}-screenshot-{N}.webp` — app screenshots

#### Key Design Decisions

- **Opt-in via `sync_enabled`** — backward compatible, baked-in assets still work without Hub
- **SHA-256 change detection** — only downloads files that actually changed (bandwidth efficient)
- **Atomic file writes** — downloads to `.tmp` then `os.Rename` for crash safety
- **Stale file cleanup** — removes local files not in the Hub manifest (e.g., deleted apps)
- **Non-blocking initial sync** — runs in a goroutine with 10s delay, doesn't block startup

---

### 13. Debug Mode

When `logging.level: "debug"` is set in `controller.yaml`, the controller exposes a full diagnostic dashboard at `/debug` with 9 testing sections. All debug endpoints are gated — at `info` level, the sidebar link disappears and all `/api/debug/*` routes return 404.
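
One simple way to get the "404 unless debug" behaviour is to wrap the debug routes in a small gate. The sketch below is an assumption about how such gating could be built with `net/http`; the `debugOnly` helper is hypothetical, and it assumes the log level is fixed at startup.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// debugOnly returns next when debug mode is on; otherwise every request
// under the wrapped route gets a plain 404, as if the route did not exist.
func debugOnly(debugEnabled bool, next http.Handler) http.Handler {
	if debugEnabled {
		return next
	}
	return http.NotFoundHandler()
}

// statusFor builds a mux with a gated /api/debug/ subtree and returns the
// status code a GET /api/debug/dump request receives.
func statusFor(debugEnabled bool) int {
	dump := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"ok":true}`)
	})
	mux := http.NewServeMux()
	mux.Handle("/api/debug/", debugOnly(debugEnabled, dump))

	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/api/debug/dump", nil))
	return rec.Code
}

func main() {
	fmt.Println("info level: ", statusFor(false))
	fmt.Println("debug level:", statusFor(true))
}
```

Returning 404 rather than 403 avoids advertising that the debug endpoints exist at all.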

#### Debug Page Sections

| # | Section | Endpoints | Description |
|---|---------|-----------|-------------|
| 1 | Rendszer diagnosztika (System diagnostics) | `GET /api/debug/dump` | Full state dump: controller info, storage, stacks, scheduler, health, alerts. JSON download. |
| 2 | Értesítés teszt (Notification test) | `POST /api/debug/event/test`, `GET /api/debug/event/history` | Send test events with configurable type/severity, view event history ring buffer. |
| 3 | Mentés teszt (Backup test) | `POST /api/debug/backup/{dbdump,crossdrive,integrity,infra}` | Trigger individual backup phases independently. |
| 4 | Tárhely teszt (Storage test) | `POST /api/debug/storage/simulate-{disconnect,reconnect}`, `GET /api/debug/storage/watchdog-status` | Simulate drive disconnect/reconnect without unmounting. Per-path probe state with 5s auto-refresh. |
| 5 | Hub & Kapcsolatok (Hub & Connections) | `POST /api/debug/hub/{push,infra-push,test-connectivity,preferences-sync}`, `POST /api/debug/gitea/test-connectivity` | Test Hub/Gitea connectivity with latency. Push reports and sync preferences. |
| — | Telemetria teszt (Telemetry test) | `GET /api/debug/telemetry` | Run the full telemetry collection pipeline on-demand (metrics query + log scan). Returns per-app table: container list, memory current/avg/peak, CPU avg, catalog limit, log error/warning counts, and top issues. Useful for verifying container→stack mapping and testing log scanner patterns without waiting for the 15-minute report cycle. |
| 6 | Önfrissítés teszt (Self-update test) | `POST /api/debug/selfupdate/dry-run` | Dry-run update check: current vs new image lines, compose writability, backup state. |
| 7 | DR / Telepítő varázsló (DR / Setup wizard) | `POST /api/debug/dr/trigger-setup`, `GET /api/debug/dr/infra-status` | Infra backup status per drive. Trigger setup mode via marker file (requires "RESET" + infra backup pre-check). |
| 8 | Naplóviewer (Log viewer) | `GET /api/debug/logs?level=&limit=&after=` | In-memory log viewer (last 1000 entries), level filter, 2s auto-refresh, color-coded entries. |

#### Key Implementation Details

- **Log buffer** (`internal/web/logbuffer.go`): Ring buffer implementing `io.Writer`, created before all modules via `io.MultiWriter(os.Stdout, logBuffer)`. Parses `[DEBUG]`/`[INFO]`/`[WARN]`/`[ERROR]` tags from standard log format.
- **Storage simulation**: `simulatedPaths` map in watchdog prevents the watchdog from re-probing simulated-disconnected paths. Disconnect runs all real steps except `lazyUnmount` (drive stays physically mounted).
- **DR trigger safety**: Uses marker file (`data/.needs-setup`) instead of modifying controller.yaml. Pre-checks that infra backup exists on at least one drive.
- **Routing**: `/api/debug/` carved out in HTTP mux (same pattern as `/api/storage/`), routed to web server with auth + CSRF.
- **DebugCallbacks**: 7 closures wired from main.go for operations needing modules not on Server struct (hub push, infra backup, connectivity tests, telemetry preview).
- **Telemetry debug**: `GetTelemetryPreview` callback calls `report.BuildAppTelemetryForDebug()` (exported wrapper around the private `buildAppTelemetrySection()`). Result renders as a table with collapsible raw JSON. Available regardless of hub configuration.
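
The log buffer item above describes a ring buffer that doubles as an `io.Writer`. A compact sketch of that idea follows; the type and method names are invented for illustration, and it assumes every `Write` call carries complete lines, which is how the standard `log` package writes.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"strings"
	"sync"
)

// Entry is one captured log line with its parsed level.
type Entry struct {
	Level string
	Line  string
}

// LogBuffer keeps the most recent lines in a fixed-size ring.
type LogBuffer struct {
	mu      sync.Mutex
	entries []Entry
	next    int  // slot that the next line overwrites
	full    bool // true once the ring has wrapped around
}

func NewLogBuffer(capacity int) *LogBuffer {
	if capacity < 1 {
		capacity = 1
	}
	return &LogBuffer{entries: make([]Entry, capacity)}
}

// Write implements io.Writer so the buffer can sit behind io.MultiWriter.
func (b *LogBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, line := range strings.Split(strings.TrimRight(string(p), "\n"), "\n") {
		if line == "" {
			continue
		}
		b.entries[b.next] = Entry{Level: parseLevel(line), Line: line}
		b.next = (b.next + 1) % len(b.entries)
		if b.next == 0 {
			b.full = true
		}
	}
	return len(p), nil
}

// parseLevel finds the first known [LEVEL] tag; untagged lines count as INFO.
func parseLevel(line string) string {
	for _, lvl := range []string{"DEBUG", "INFO", "WARN", "ERROR"} {
		if strings.Contains(line, "["+lvl+"]") {
			return lvl
		}
	}
	return "INFO"
}

// Snapshot returns a copy of the buffered entries, oldest first.
func (b *LogBuffer) Snapshot() []Entry {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !b.full {
		return append([]Entry(nil), b.entries[:b.next]...)
	}
	out := append([]Entry(nil), b.entries[b.next:]...)
	return append(out, b.entries[:b.next]...)
}

func main() {
	buf := NewLogBuffer(1000)
	logger := log.New(io.MultiWriter(os.Stdout, buf), "", log.LstdFlags)
	logger.Println("[WARN] disk usage above threshold")
	fmt.Println("buffered entries:", len(buf.Snapshot()))
}
```

Creating the buffer before any module starts, as the README notes, is what lets the viewer show startup messages too.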

#### Per-Module Logging

All modules emit structured log lines at `[INFO]`, `[WARN]`, and `[ERROR]` levels for operational events (state changes, completions, failures). When `logging.level: "debug"`, additional detailed `[DEBUG] [module]` prefixed log lines are emitted. Each module with stateful debug (struct-based) exposes a `SetDebug(bool)` method, wired from `main.go`. Modules without a struct use package-level `DebugLogger` variables (e.g., `system.DebugLogger`).

**Standard-level logging** (always active):
- `[INFO]` — Operational events: stack deploy/start/stop, backup completion, config changes, disk operations, sync results
- `[WARN]` — Degraded states: health threshold breaches, unsafe backup destinations, retryable failures, best-effort operation failures
- `[ERROR]` — Hard failures: data restore errors, integration apply failures, compose file update errors, disk format failures

| Module | Debug Field | Prefix | Key Areas |
|--------|------------|--------|-----------|
| `stacks` | `cfg.Logging.Level` | `[DEBUG] [stacks]` | Stack CRUD, compose commands, env vars, HDD mounts, encryption migration, health probes |
| `backup` | `ResticManager.debug` | `[DEBUG] [restic]` / `[DEBUG] [backup]` | Restic commands, snapshot operations, restore scanning, drive mounting |
| `cloudflare` | `Client.debug` + `GeoSyncManager.debug` | `[CF-DEBUG]` / `[DEBUG] [cloudflare]` | API requests/responses, WAF rule CRUD, zone resolution, geo sync diff |
| `integrations` | `Manager.debug` | `[DEBUG] [integrations]` | Toggle apply/revoke timing, lifecycle hooks, config reapply |
| `system` | `DebugLogger` | `[DEBUG] [system]` | Memory/disk/CPU/load/temp collection, mount probing, USB detection |
| `monitor` | `Pinger.debug` | `[DEBUG] [pinger]` | Health ping URLs, retry attempts, response codes |
| `settings` | `Settings.debug` | `[DEBUG] [settings]` | Load/save sizes, storage path ops, geo/integration state changes |
| `scheduler` | `Scheduler.debug` | `[DEBUG] [sched]` | Job registration, execution timing, daily schedule calculations |
| `web` | `cfg.Logging.Level` | `[DEBUG] [web]` | HTTP requests, auth decisions, session management, storage API ops |
| `api` | `Router.debug` | `[DEBUG] [api]` | API routing, handler entry points, request details |
| `selfupdate` | `Updater.debug` | `[DEBUG] [selfupdate]` | Version checks, update preconditions, docker pull timing |
| `assets` | `Syncer.debug` | `[DEBUG] [assets]` | Manifest fetch, hash comparison, file download timing |
| `storage` | logger-based | `[DEBUG] [storage]` | Disk scanning, formatting, attach, drive migration |
| `metrics` | logger-based | `[DEBUG] [metrics]` | Per-container log scanning, error/warning counts |
| `appexport` | `Exporter.debug` | `[DEBUG] [appexport]` | Export/import steps, crypto operations, bundle scanning |

---

### 14. Geo-Restriction

Country-based access control via **Cloudflare WAF Custom Rules**. The controller manages WAF rules in the `http_request_firewall_custom` phase to block requests from non-allowed countries. Rules are identified by a `[felhom-geo]` description prefix — other WAF rules are never touched.

#### Prerequisites

The existing `cf_api_token` (used for DNS-01 ACME) needs **Zone WAF:Edit** permission added. No new token is needed — just expanded permissions on the same token. The settings UI only appears when a CF API token is configured.

#### Architecture

```
┌─────────────┐     ┌──────────────────┐     ┌──────────────────────┐
│ Settings UI │────▶│ GeoSyncManager   │────▶│ Cloudflare WAF API   │
│ (settings.  │     │ (geosync.go)     │     │ /zones/{id}/         │
│  html)      │     │ diff & apply     │     │ rulesets/{id}/rules  │
└─────────────┘     └──────────────────┘     └──────────────────────┘
       │                     ▲
       │ POST /api/geo/*     │ Scheduler (6h)
       ▼                     │ + deploy/remove hooks
┌─────────────┐              │
│ API layer   │──────────────┘
│ (geo.go)    │
└─────────────┘
```

**Rule structure:**
- **Global rule**: `(not ip.src.country in {"HU"})` → block (with `http.host ne` exclusions for apps that have per-app overrides)
- **Per-app rule**: `(http.host eq "app.example.com" and not ip.src.country in {"HU" "US"})` → block
- **Block response**: HTTP 403 with Hungarian message

**Local network access** is inherently unaffected — traffic from the LAN goes directly to the server, bypassing Cloudflare entirely.
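
The two expression shapes listed under "Rule structure" can be produced with plain string building. The helpers below are stand-ins, not the real `BuildGlobalExpression` / `BuildAppExpression` from `waf.go`, whose signatures are not shown in this README. How the `http.host ne` exclusions are joined onto the global rule is also an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// countrySet renders country codes as a rule-language set, e.g. {"HU" "US"}.
func countrySet(codes []string) string {
	quoted := make([]string, len(codes))
	for i, c := range codes {
		quoted[i] = strconv.Quote(strings.ToUpper(c))
	}
	return "{" + strings.Join(quoted, " ") + "}"
}

// globalExpression blocks every country outside the allow-list, skipping
// hosts that carry their own per-app rule (assumed "and http.host ne" form).
func globalExpression(allowed, excludedHosts []string) string {
	var b strings.Builder
	b.WriteString("(not ip.src.country in " + countrySet(allowed))
	for _, host := range excludedHosts {
		b.WriteString(" and http.host ne " + strconv.Quote(host))
	}
	b.WriteString(")")
	return b.String()
}

// appExpression restricts a single hostname to its own allow-list.
func appExpression(host string, allowed []string) string {
	return fmt.Sprintf("(http.host eq %s and not ip.src.country in %s)",
		strconv.Quote(host), countrySet(allowed))
}

func main() {
	fmt.Println(globalExpression([]string{"HU"}, []string{"app.example.com"}))
	fmt.Println(appExpression("app.example.com", []string{"HU", "US"}))
}
```

Excluding overridden hosts from the global rule matters because a request blocked by the global rule would never reach the more permissive per-app rule.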

#### Cloudflare API Client (`internal/cloudflare/`)

| File | Purpose |
|------|---------|
| `client.go` | HTTP client with Bearer token auth, 15s timeout, generic `do()` helper |
| `zone.go` | Zone ID resolution — tries exact domain, then parent domains progressively |
| `waf.go` | WAF rule CRUD, expression builders (`BuildGlobalExpression`, `BuildAppExpression`) |
| `countries.go` | ~250 ISO 3166-1 alpha-2 codes with Hungarian names |
| `geosync.go` | Sync orchestrator — diffs desired vs existing rules, creates/updates/deletes |

**GeoSyncManager** uses a `StackLister` interface (implemented by `geoStackAdapter` in main.go) to get deployed app hostnames without circular imports.
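
The "exact domain, then parent domains" lookup attributed to `zone.go` can be illustrated without touching the network. In this sketch the `lookup` callback stands in for the Cloudflare zone query, and stopping at two-label names is an assumption rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// zoneCandidates lists the names to try, from the full host name down to
// its two-label parent, e.g. a.b.example.com, b.example.com, example.com.
func zoneCandidates(domain string) []string {
	labels := strings.Split(strings.Trim(strings.ToLower(domain), "."), ".")
	var out []string
	for i := 0; i+2 <= len(labels); i++ {
		out = append(out, strings.Join(labels[i:], "."))
	}
	return out
}

// resolveZone tries each candidate in order and returns the first zone ID
// found. lookup is a hypothetical stand-in for the real API call.
func resolveZone(domain string, lookup func(name string) (string, bool)) (string, bool) {
	for _, name := range zoneCandidates(domain) {
		if id, ok := lookup(name); ok {
			return id, true
		}
	}
	return "", false
}

func main() {
	zones := map[string]string{"example.com": "zone-123"} // made-up zone ID
	lookup := func(name string) (string, bool) {
		id, ok := zones[name]
		return id, ok
	}
	fmt.Println(zoneCandidates("cloud.home.example.com"))
	fmt.Println(resolveZone("cloud.home.example.com", lookup))
}
```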

#### Settings Model

Stored in `settings.json` (runtime-modifiable):

```go
type GeoRestriction struct {
	Enabled          bool                      `json:"enabled"`
	AllowedCountries []string                  `json:"allowed_countries"`
	AppOverrides     map[string]AppGeoOverride `json:"app_overrides,omitempty"`
	LastSync         string                    `json:"last_sync,omitempty"`
	LastSyncError    string                    `json:"last_sync_error,omitempty"`
	ZoneID           string                    `json:"zone_id,omitempty"`
	RulesetID        string                    `json:"ruleset_id,omitempty"`
}
```

Thread-safe access via `GetGeoRestriction()`, `SetGeoRestriction()`, `SetGeoAppOverride()`, `RemoveGeoAppOverride()`, `SetGeoSyncState()`.

#### API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/geo/status` | Current geo settings + sync state |
| POST | `/api/geo/settings` | Update global settings (enable/disable, countries) |
| POST | `/api/geo/sync` | Trigger manual sync |
| GET | `/api/geo/countries` | Full country list for search UI |
| POST | `/api/stacks/{name}/geo/override` | Set per-app country override |
| DELETE | `/api/stacks/{name}/geo/override` | Remove per-app override |

All mutating endpoints trigger an async Cloudflare sync. The `/api/geo/` path accepts both session auth and Hub Bearer token auth (via `selfUpdateAuthMiddleware`), enabling Hub-side geo-disable for lockout recovery.

#### Sync Triggers

1. **Settings change** — user saves geo settings or per-app override
2. **Deploy/remove** — app deployment or removal changes the hostname list
3. **Scheduler** — periodic verification every 6 hours
4. **Startup** — delayed initial sync 15s after boot
5. **Manual** — "Szinkronizálás" (Synchronize) button on settings page

#### UI

**Settings page** ("Beállítások" → "Földrajzi korlátozás", i.e. Settings → Geo-restriction):
- Enable/disable toggle
- Searchable country autocomplete with tag-based selection
- Hungary pinned with `confirm()` warning on removal
- Per-app overrides summary with add/edit/remove
- Sync status display (last sync time, errors)

**App detail page** (per-app override, shown when geo is globally enabled):
- Toggle for custom country restriction
- Independent country selector

---

### 15. App-to-App Integrations

Generic framework for connecting deployed applications to each other. Provider apps declare available integrations in `.felhom.yml`, and users enable/disable them via toggle switches on the provider's deploy/settings page ("Beállítások").

#### Architecture (`internal/integrations/`)

- **`integrations.go`** — Core types: `Handler` interface (`Apply`/`Revoke`), `ApplyContext` (carries domain, decrypted env vars, provider metadata, stacks dir, logger, restart func), `StatusInfo` (UI data), `IntegrationKey()`/`ParseIntegrationKey()` key helpers
- **`manager.go`** — `Manager` coordinates toggle operations, builds apply contexts from decrypted app.yaml env vars. Uses `StackProvider` interface (GetStack, GetStacks, RestartStack) to break circular imports with stacks package — adapted via `integrationStackAdapter` in main.go. Key methods:
  - `Toggle(ctx, provider, target, enable)` — Validates both apps deployed+running, calls Apply/Revoke, persists state
  - `ListForProvider(slug)` — Returns `[]StatusInfo` for UI with target deployment/running status
  - `ReapplyConfigForTarget(name)` — Re-applies all active integrations targeting a stack (config-only, no restart). Used by `SyncFileBrowserMounts` after config regeneration
- **`lifecycle.go`** — Lifecycle hooks called from API router goroutines:
  - `OnStackStop` — Revokes active integrations, sets `"provider_stopped"`/`"target_unavailable"` (keeps `enabled=true`)
  - `OnStackStart` — Re-applies enabled integrations after 5s delay (waits for stack state refresh). Accepts both `StateRunning` and `StateStarting` via `isStackUp()` helper
  - `OnStackRemove` — Revokes and permanently deletes integration state
- **Handler implementations** — One file per integration pair (e.g. `onlyoffice_filebrowser.go`, `onlyoffice_nextcloud.go`)

#### Integration State

Stored in `settings.json` under `integrations` map (key: `"provider:target"`):
- `enabled` — User intent (survives stop/restart)
- `status` — Current state: `"active"`, `"error"`, `"disabled"`, `"provider_stopped"`, `"target_unavailable"`
- `last_error` — Most recent error message
- `enabled_at` — RFC3339 timestamp

CRUD methods in settings.go: `GetIntegrationState`, `SetIntegrationState`, `RemoveIntegrationState`, `GetIntegrationsForProvider`, `GetIntegrationsForTarget` (all use existing RWMutex + atomic write pattern).

#### Lifecycle

1. **Enable**: User toggles on → validates both apps deployed+running → calls `Handler.Apply()` → persists state as `"active"`
2. **Disable**: User toggles off → calls `Handler.Revoke()` → persists state as `"disabled"`
3. **Provider/target stops**: `OnStackStop` → calls `Handler.Revoke()` → sets status to `"provider_stopped"` or `"target_unavailable"` (keeps `enabled=true`)
4. **Provider/target starts**: `OnStackStart` (5s delay) → finds enabled integrations with non-active status → re-applies if both sides running/starting
5. **Provider/target removed**: `OnStackRemove` → revokes and deletes integration state permanently
6. **FileBrowser config regen**: `SyncFileBrowserMounts` regenerates `config.yaml` from scratch → `ReapplyConfigForTarget("filebrowser")` patches integration config synchronously before `docker compose up -d --force-recreate`

**Important**: `SyncFileBrowserMounts` uses `--force-recreate` because `config.yaml` is a bind mount. Without the flag, `docker compose up -d` does not recreate the container when only the config file changes, because Compose reacts to changes in the compose definition, not to the contents of bind-mounted files. `ReapplyConfigForTarget` calls each handler's `Apply` with a no-op `RestartStack` since the caller handles the restart.

#### Built-in Handlers

**OnlyOffice → FileBrowser** (`onlyoffice_filebrowser.go`):
- Apply: Reads `JWT_SECRET` + `SUBDOMAIN` from OnlyOffice app.yaml (decrypted), strips any existing `integrations:` block from FileBrowser `config.yaml` via `removeIntegrationsSection()`, appends new block with `url` (public HTTPS), `internalUrl` (`http://onlyoffice:80`), `secret`, `viewOnly: false`. Atomic write (`.tmp` + rename). Restarts FileBrowser
- Revoke: Strips `integrations:` block from config.yaml, restarts FileBrowser

**OnlyOffice → Nextcloud** (`onlyoffice_nextcloud.go`):
- Apply: Runs `docker exec -u www-data nextcloud php occ` commands:
  1. `app:install onlyoffice` (tolerates "already installed")
  2. `app:enable onlyoffice`
  3. `config:app:set onlyoffice DocumentServerUrl --value=https://{subdomain}.{domain}`
  4. `config:app:set onlyoffice DocumentServerInternalUrl --value=http://onlyoffice:80`
  5. `config:app:set onlyoffice jwt_secret --value={JWT_SECRET}`
  6. `config:app:set onlyoffice StorageUrl --value=http://nextcloud` (internal callback URL)
- Revoke: Runs `occ app:disable onlyoffice` (tolerates container not running / app not enabled)

**OnlyOffice compose template notes**: Requires Traefik middleware `X-Forwarded-Proto=https` in labels so the Document Server generates HTTPS URLs for editor resources (prevents mixed content errors in browser).
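
The strip-then-append approach of the FileBrowser handler described above can be sketched with line-based string handling. `stripIntegrations` here is a stand-in for the real `removeIntegrationsSection()`, and the `office:` nesting inside the block is an assumption about the FileBrowser config layout, not something this README states.

```go
package main

import (
	"fmt"
	"strings"
)

// stripIntegrations removes a top-level "integrations:" key together with
// its indented children. It is a line-based sketch, not a YAML parser, so
// unindented list items under the key would not be recognised.
func stripIntegrations(config string) string {
	var kept []string
	skipping := false
	for _, line := range strings.Split(config, "\n") {
		if strings.HasPrefix(line, "integrations:") {
			skipping = true
			continue
		}
		if skipping {
			// Indented or blank lines still belong to the removed block.
			if line == "" || line[0] == ' ' || line[0] == '\t' {
				continue
			}
			skipping = false
		}
		kept = append(kept, line)
	}
	out := strings.TrimRight(strings.Join(kept, "\n"), "\n")
	if out == "" {
		return ""
	}
	return out + "\n"
}

// appendOfficeBlock strips any previous block first, so applying the
// integration repeatedly never stacks up duplicate sections.
func appendOfficeBlock(config, publicURL, secret string) string {
	block := fmt.Sprintf("integrations:\n  office:\n    url: %q\n"+
		"    internalUrl: \"http://onlyoffice:80\"\n    secret: %q\n    viewOnly: false\n",
		publicURL, secret)
	return stripIntegrations(config) + block
}

func main() {
	base := "server:\n  port: 80\n"
	fmt.Print(appendOfficeBlock(base, "https://office.example.com", "example-secret"))
}
```

Because the block is always removed before it is appended, Apply is idempotent, which fits the re-apply path used after `config.yaml` is regenerated.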

#### Metadata (`.felhom.yml`)

Provider apps declare integrations in their `.felhom.yml`. Parsed into `IntegrationDef` struct in `metadata.go`, with `HasIntegrations()` helper.

```yaml
integrations:
  - target: filebrowser
    label: "FileBrowser integráció"
    description: "Dokumentumok szerkesztése a fájlkezelőben"
  - target: nextcloud
    label: "Nextcloud integráció"
    description: "Dokumentumok szerkesztése a Nextcloudban"
```

#### API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/integrations/{provider}` | List integrations for a provider app (status, target availability) |
| POST | `/api/integrations/{provider}/{target}` | Enable/disable integration (`{"enabled": true/false}`) |

Routes registered **before** `hasSuffix`-based stack routes in router.go (see router bug pattern).

#### UI

Toggle switches on the provider's deploy/settings page ("Integrációk" (Integrations) section, within `deploy.html`). Data wired in `deployHandler()` for deployed apps only. Each integration shows:
- Label and description from `.felhom.yml` metadata
- Status badge: "Aktív" (Active), "Nincs telepítve" (Not installed), "Célalkalmazás leállítva" (Target app stopped), "Hiba" (Error)
- Toggle checkbox (disabled when target not deployed/running)
- JS `toggleIntegration()` → POST to API → reload on success

#### Wiring (main.go)

- `integrationStackAdapter` type implements `integrations.StackProvider` (same pattern as `stackAdapter`, `geoStackAdapter`)
- `integrations.NewManager(sett, adapter, domain, stacksDir, encKey, logger)` — registers built-in handlers
- Wired into both the API router and the web server; each exposes its own `SetIntegrationManager()` setter

---

## Repository Layout

```
controller/
├── cmd/controller/main.go  # Entry point, wires all 17 modules (setup mode branch + normal startup)
├── internal/
│   ├── config/config.go  # YAML loader, validation, env overrides
│   ├── crypto/crypto.go  # AES-256-GCM encryption for app.yaml secrets, key management
│   ├── settings/settings.go  # Runtime settings (JSON, atomic writes, RWMutex)
│   ├── stacks/
│   │   ├── manager.go  # Stack scanning, compose ops, container status
│   │   ├── metadata.go  # Parse .felhom.yml app metadata
│   │   ├── deploy.go  # First-deploy: secret gen, app.yaml, compose up; missing field injection
│   │   └── delete.go  # Stack deletion/removal + HDD/backup data cleanup
│   ├── sync/sync.go  # Git sync: clone/pull app catalog, content-hash copy
│   ├── storage/
│   │   ├── scan.go, scan_linux.go  # Disk detection via lsblk + blkid
│   │   ├── format.go, format_linux.go  # Partition, format, mount pipeline
│   │   ├── attach.go, attach_linux.go  # Attach existing FS drive (raw mount + bind mount)
│   │   ├── safety.go, safety_linux.go  # System disk detection, mount guards, fstab ops
│   │   ├── migrate.go  # App data migration (rsync with progress)
│   │   └── *_other.go  # Non-Linux stubs for cross-compilation
│   ├── backup/
│   │   ├── backup.go  # Orchestrator (per-drive dumps + restic + cross-drive chain)
│   │   ├── paths.go  # Per-drive path helpers (FelhomDataDir constant, PrimaryResticRepoPath, AppDataDir, InfraBackupDir, etc.)
│   │   ├── local_infra.go  # Local infra backup to all drives (.felhom-infra-backup/)
│   │   ├── dbdump.go  # DB auto-discovery + dump (pg_dump, mariadb-dump)
│   │   ├── restic.go  # Restic operations (init, snapshot, prune, check) — repoPath as param
│   │   ├── appdata.go  # StackDataProvider interface, app data discovery
│   │   ├── crossdrive.go  # Per-app backup to secondary storage (rsync/restic)
│   │   ├── restore.go  # Per-app restore from per-drive repo
│   │   ├── restore_scan.go  # DR: scan drives for backup data, build restore plan
│   │   ├── restore_app_linux.go  # DR: per-app restore (rsync config/data + docker compose up)
│   │   └── restore_drives_linux.go  # DR: auto-mount drives by UUID from Hub infra backup
│   ├── cloudflare/
│   │   ├── client.go  # CF API client (Bearer auth, generic JSON helper)
│   │   ├── zone.go  # Zone ID resolution (domain → zone)
│   │   ├── waf.go  # WAF rule CRUD + expression builders
│   │   ├── countries.go  # ISO 3166-1 country codes + Hungarian names
│   │   └── geosync.go  # Geo sync orchestrator (diff & apply rules)
│   ├── integrations/
│   │   ├── integrations.go  # Core types: Handler interface, ApplyContext, StatusInfo
│   │   ├── manager.go  # Manager: Toggle, ListForProvider, StackProvider interface
│   │   ├── lifecycle.go  # OnStackStop, OnStackStart, OnStackRemove hooks
│   │   ├── onlyoffice_filebrowser.go  # OnlyOffice → FileBrowser handler (config.yaml patch)
│   │   └── onlyoffice_nextcloud.go  # OnlyOffice → Nextcloud handler (occ commands)
│   ├── assets/syncer.go  # Hub asset sync (download, SHA-256 compare, resolve)
│   ├── api/
│   │   ├── router.go  # REST API endpoints (~36 routes)
│   │   └── geo.go  # Geo-restriction API handlers
│   ├── scheduler/scheduler.go  # Central job scheduler (Every, Daily)
│   ├── system/
│   │   ├── info.go, info_linux.go  # RAM, disk, CPU, temperature, load average
│   │   ├── cpu_linux.go  # Background /proc/stat sampling
│   │   └── mounts_linux.go  # Mount points, disk usage, FS info, backup dest checks, storage probing, USB detection
│   ├── monitor/
│   │   ├── pinger.go  # Healthchecks.io HTTP ping client
│   │   ├── healthcheck.go  # System health checks (disk, mem, CPU, temp, Docker)
│   │   └── watchdog.go  # Storage watchdog (probe, disconnect/reconnect, safe eject)
│   ├── metrics/
│   │   ├── store.go  # SQLite time-series (WAL mode, downsampled queries)
│   │   ├── collector.go  # Background collector (60s, system + docker stats)
│   │   └── sysinfo.go  # Static system info (/proc, /etc)
│   ├── selfupdate/
│   │   ├── version.go  # Semver parsing + comparison (hand-rolled)
│   │   ├── state.go  # Update audit state (JSON, atomic writes)
│   │   └── updater.go  # Registry check, update trigger, startup verify
│   ├── notify/notifier.go  # Email relay to hub, preference sync, cooldowns
│   ├── report/
│   │   ├── builder.go  # Hub report builder (all subsystems → JSON)
│   │   ├── pusher.go  # HTTP POST to hub (retry, Bearer auth, parses customer_blocked)
│   │   └── infra_pull.go  # DR: pull recovery/config from Hub (retrieval password auth)
│   ├── setup/  # First-run setup wizard (web-based, replaces docker-setup.sh wizard)
│   │   ├── setup.go  # NeedsSetup() detection, state persistence
│   │   ├── handlers.go  # HTTP handlers for all wizard steps
│   │   ├── scanner.go  # Drive scanner for local infra backups
│   │   ├── csrf.go  # Lightweight CSRF (cookie + hidden field)
│   │   ├── network.go  # Local IP detection for LAN access URLs
│   │   └── templates/  # 7 wizard HTML templates (Hungarian)
│   ├── recovery/info.go  # Recovery info file generator (recovery-info.txt)
│   └── web/
│       ├── server.go  # HTTP server, routing, static files, catch-all middleware, executeTemplate wrapper
│       ├── auth.go  # Session auth + per-session CSRF token, login/logout, session cleanup
│       ├── csrf.go  # CsrfProtect middleware, csrfToken/csrfField helpers
│       ├── handlers.go  # Page handlers (dashboard, stacks, deploy, backups, etc.)
│       ├── handler_restore.go  # DR: restore page handler + APIs (scan, restore all, skip)
│       ├── handler_debug.go  # Debug page handler + 20 debug API endpoints (debug-mode only)
│       ├── logbuffer.go  # Ring buffer (io.Writer) for in-memory log capture
│       ├── storage_handlers.go  # Storage API handlers (scan, format, attach, migrate, cleanup, disconnect/reconnect)
│       ├── alerts.go  # State-based alert generation
│       ├── funcmap.go  # Template functions (state colors, Hungarian formatting)
│       ├── embed.go  # go:embed for templates + Chart.js
│       └── templates/  # 15 HTML files + style.css (Hungarian UI, incl. debug.html, catchall.html)
├── configs/
│   ├── controller.yaml.example  # Full config reference
│   └── example-felhom-metadata.yml  # .felhom.yml format reference
├── Dockerfile  # Multi-stage: Go 1.24 builder + debian-slim runtime
├── docker-compose.yml  # Controller's own compose (privileged, /mnt rshared)
└── go.mod  # Go 1.24, deps: bcrypt, yaml.v3, modernc.org/sqlite
```

---
|
|
|
|
## Configuration
|
|
|
|
### Controller config (`controller.yaml`)
|
|
|
|
Single YAML file per customer, infrastructure-only. Does **not** contain app-specific config.
|
|
|
|
Key sections:
|
|
```yaml
customer:
  name: "Demo Felhom"
  id: "demo-felhom"

paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  system_data_path: "/mnt/sys_drive"   # NVMe/system drive; fallback for apps without HDD

git:
  repo_url: "https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git"
  sync_interval: "15m"

# Per-drive backup paths are computed automatically:
#   <drive>/backups/primary/restic/              restic repo per drive
#   <drive>/backups/primary/<app>/db-dumps/      DB dumps per app
#   <drive>/backups/secondary/                   cross-drive rsync + restic
backup:
  enabled: true
  restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
  db_dump_schedule: "02:30"
  restic_schedule: "03:00"
  retention: { keep_daily: 7, keep_weekly: 4, keep_monthly: 6 }

monitoring:
  health_interval: "5m"
  ping_uuids:
    heartbeat: "uuid-here"
    system_health: "uuid-here"
    db_dump: "uuid-here"
    backup: "uuid-here"
    backup_integrity: "uuid-here"

web:
  listen: ":8080"
  setup_listen: ":8081"   # Plain HTTP for setup wizard LAN access

hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "bearer-token-here"

assets:
  sync_enabled: true      # Download app assets (logos, screenshots) from Hub API
  sync_schedule: "05:00"  # Daily sync time (HH:MM, Budapest timezone)

system:
  reserved_memory_mb: 384 # RAM reserved for OS + controller
```

Environment variables override values from the file, for example `FELHOM_LOGGING_LEVEL=debug` or `FELHOM_HUB_ENABLED=false`.
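
The two examples suggest a `FELHOM_<SECTION>_<KEY>` naming pattern. The sketch below shows one way such a name could be mapped to a dotted config path. The function `envToConfigKey` and the rule "first segment is the section, the rest is the key" are assumptions for illustration, not the controller's confirmed behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// envToConfigKey maps an override variable such as FELHOM_HUB_ENABLED to a
// dotted config path such as "hub.enabled".
// Assumption: the first segment after the prefix names the YAML section and
// everything after it is the key inside that section.
func envToConfigKey(name string) (string, bool) {
	const prefix = "FELHOM_"
	if !strings.HasPrefix(name, prefix) {
		return "", false
	}
	rest := strings.ToLower(strings.TrimPrefix(name, prefix))
	section, key, found := strings.Cut(rest, "_")
	if !found || section == "" || key == "" {
		return "", false
	}
	return section + "." + key, true
}

func main() {
	names := []string{"FELHOM_LOGGING_LEVEL", "FELHOM_HUB_ENABLED", "PATH"}
	for _, name := range names {
		if key, ok := envToConfigKey(name); ok {
			fmt.Printf("%s -> %s\n", name, key)
		} else {
			fmt.Printf("%s is not an override\n", name)
		}
	}
}
```

Under this assumed rule, keys that contain underscores still resolve to a single section, so `FELHOM_PATHS_STACKS_DIR` would map to `paths.stacks_dir`.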

### Runtime settings (`settings.json`)

Auto-managed by the controller. It contains password hash overrides, notification preferences, per-app backup configs, the storage path registry, the DB validation cache, Hub verification state (`hub_verified`, `hub_verified_at`), the retrieval password for disaster recovery, and the pending event queue. All writes are atomic (write `.tmp`, then rename).

### Per-app config (`app.yaml`)

Auto-generated during deployment. It contains env vars, the list of locked fields, and the deploy timestamp. Secret fields are locked (read-only after first deploy). Fields that are missing relative to an updated template are auto-injected on startup and after sync (see Missing Field Injection).

**Encryption at rest**: Sensitive env values (`type: password` and `type: secret` from `.felhom.yml` metadata) are stored encrypted as `ENC:base64(nonce+ciphertext)` using AES-256-GCM. The 32-byte encryption key is stored at `{dataDir}/encryption.key` (generated on first run, 0600 permissions). Values are decrypted transparently when passed to docker-compose or displayed in the UI. The key is included in infra backups (Hub + local drives) and restored during disaster recovery. On upgrade, existing plaintext values are migrated automatically on startup.

---

## Scheduler Jobs

| Job | Type | When | Purpose |
|-----|------|------|---------|
| status-refresh | periodic | 30s | Refresh container states |
| stack-scan | periodic | 2m | Rescan stacks directory |
| heartbeat | periodic | 5m | Legacy Healthchecks ping (deprecated; the Hub handles this via the event system) |
| system-health | periodic | configurable | Health checks + alert refresh |
| backup-cache | periodic | 5m | Refresh backup status cache |
| hub-report | periodic | 15m | Push report to central hub |
| db-dump | daily | 02:30 | Database dumps |
| backup | daily | 03:00 | Restic backup → cross-drive chain |
| backup-integrity | daily | Sun 04:00 | Restic check |
| metrics-prune | daily | 04:00 | Delete metrics older than 30 days |
| selfupdate-check | periodic | 6h | Check registry for new version (cache for UI) |
| selfupdate-auto | daily | 04:30 | Auto-update if enabled + backup not running |
| asset-sync | daily | 05:00 | Download changed app assets from Hub |

All daily jobs use the Europe/Budapest timezone. Skip-if-running prevents concurrent execution, and every job runs with panic recovery.
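
A skip-if-running guard combined with panic recovery can be sketched as below. The `job` type and `tryRun` method are hypothetical names, and the per-job atomic flag is one possible technique; the source does not say how the scheduler implements either behavior.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// job is a hypothetical scheduler entry with a per-job running flag.
type job struct {
	name    string
	running atomic.Bool
	run     func()
}

// tryRun executes the job unless a previous run is still active.
// It reports whether the job was started; a panic inside run is recovered
// and logged so that it cannot take down the scheduler.
func (j *job) tryRun() (started bool) {
	if !j.running.CompareAndSwap(false, true) {
		return false // skip-if-running
	}
	defer j.running.Store(false)
	defer func() {
		if r := recover(); r != nil {
			fmt.Printf("job %s panicked: %v\n", j.name, r)
		}
	}()
	started = true
	j.run()
	return started
}

func main() {
	backup := &job{name: "backup"}
	backup.run = func() {
		// A second trigger while the job is active is skipped.
		fmt.Println("overlapping run started:", backup.tryRun())
	}
	fmt.Println("first run started:", backup.tryRun())

	dump := &job{name: "db-dump", run: func() { panic("disk gone") }}
	fmt.Println("panicking run started:", dump.tryRun())
	fmt.Println("next run started:", dump.tryRun())
}
```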

---

## REST API

### Stack Operations

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/health` | Health check (no auth) |
| GET | `/api/stacks` | List all stacks |
| GET | `/api/stacks/{name}` | Stack details |
| POST | `/api/stacks/{name}/deploy` | First-time deploy |
| POST | `/api/stacks/{name}/start` | Start stack (409 if insufficient memory) |
| POST | `/api/stacks/{name}/stop` | Stop stack |
| POST | `/api/stacks/{name}/restart` | Restart stack |
| POST | `/api/stacks/{name}/update` | Pull + recreate |
| POST | `/api/stacks/{name}/optional-config` | Update optional env vars |
| GET | `/api/stacks/{name}/logs` | Container logs (`?raw=1` for plain text) |
| GET | `/api/stacks/{name}/hdd-data` | HDD data paths + sizes |
| GET | `/api/stacks/{name}/backup-data` | Backup data paths + sizes (DB dumps, cross-drive rsync) |
| POST | `/api/stacks/{name}/remove` | Remove deployed stack (revert to "not deployed") |
| DELETE | `/api/stacks/{name}` | Delete orphaned stack |
| POST | `/api/sync` | Trigger catalog sync |
| GET | `/api/system/info` | System info + sync status |

### Backup & Restore

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/backup/status` | Full backup status |
| POST | `/api/backup/run` | Trigger manual backup |
| GET | `/api/backup/snapshots` | List snapshots (`?stack={name}` for filtering) |
| POST | `/api/stacks/{name}/cross-backup` | Save cross-drive config |
| POST | `/api/stacks/{name}/cross-backup/run` | Trigger cross-drive backup |
| GET | `/api/stacks/{name}/cross-backup/status` | Cross-drive status |
| POST | `/api/backup/cross-drive/run-all` | Run all scheduled cross-drive backups |

### Storage

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/storage/scan` | Scan available disks |
| POST | `/api/storage/init` | Format and mount a disk |
| GET | `/api/storage/init/status` | Format progress |
| POST | `/api/storage/attach/mount-raw` | Temp-mount partition for browsing |
| GET | `/api/storage/attach/browse?path=` | List directories on raw mount |
| POST | `/api/storage/attach/mkdir` | Create folder on raw mount |
| POST | `/api/storage/attach` | Finalize attach (bind mount + fstab) |
| GET | `/api/storage/attach/status` | Attach progress |
| POST | `/api/storage/attach/cancel` | Cleanup temp raw mount |
| POST | `/api/storage/migrate` | Start app data migration |
| GET | `/api/storage/migrate/status` | Migration progress |
| POST | `/api/storage/disconnect` | Safe disconnect (stop apps, unmount) |
| POST | `/api/storage/reconnect` | Reconnect disconnected drive |
| POST | `/api/storage/restart-apps` | Restart auto-stopped apps |
| GET | `/api/storage/status` | All storage paths with connection state |

### Self-Update

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/selfupdate/status` | Update status (cached check result + last state) |
| POST | `/api/selfupdate/check` | Force registry check |
| POST | `/api/selfupdate/update` | Trigger self-update (async) |

Self-update endpoints accept session auth OR `Authorization: Bearer <hub_api_key>` for external triggering.

### Config Management

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/config/apply` | Apply new controller.yaml from Hub (atomic write) |
| GET | `/api/config/hash` | Get SHA256 hash of current controller.yaml |
| GET | `/api/config` | Get raw controller.yaml content (text/yaml) for live diff and pull |

Config endpoints accept session auth OR `Authorization: Bearer <hub_api_key>` (same as self-update). The `/api/config/apply` endpoint:

- Accepts a raw YAML body (the generated config from Hub)
- Validates that the YAML is parseable before writing
- Writes atomically: writes to `.tmp`, then `os.Rename`, for crash safety
- Does NOT reload the config; a restart is required to apply changes
- Returns `{"ok": true, "message": "Config applied. Restart controller to apply changes."}`
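
The hash endpoint makes it cheap to decide whether an apply is needed at all. The sketch below shows the comparison from the caller's side; `configHash` and `needsApply` are hypothetical names, and the hex encoding of the SHA-256 digest is an assumption about how the hash is represented.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHash returns the SHA-256 digest of the raw YAML bytes.
// Assumption: the digest is exchanged as a lowercase hex string.
func configHash(yamlBytes []byte) string {
	sum := sha256.Sum256(yamlBytes)
	return hex.EncodeToString(sum[:])
}

// needsApply reports whether the generated config differs from what the
// node currently has, given the hash the node reported.
func needsApply(generated []byte, nodeHash string) bool {
	return configHash(generated) != nodeHash
}

func main() {
	current := []byte("web:\n  listen: \":8080\"\n")
	generated := []byte("web:\n  listen: \":9090\"\n")

	nodeHash := configHash(current) // stands in for the /api/config/hash result
	fmt.Println("unchanged config needs apply:", needsApply(current, nodeHash))
	fmt.Println("changed config needs apply:", needsApply(generated, nodeHash))
}
```

Because the hash is computed over raw bytes, a change in whitespace or comment text counts as a difference even when the parsed settings are identical.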

### Metrics

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/metrics/system` | System metrics time-series (`?range=1h\|6h\|24h\|7d\|30d`) |
| GET | `/api/metrics/containers/summary` | Current container stats |
| GET | `/api/metrics/containers/{name}` | Per-container time-series |
| GET | `/api/metrics/sysinfo` | Static system info |

### Assets

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/assets/sync` | Trigger on-demand asset sync from Hub (async) |
| GET | `/api/assets/status` | Asset sync status (last sync, file count, total bytes) |

### Integrations

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/integrations/{provider}` | List integrations for provider app (status, target availability) |
| POST | `/api/integrations/{provider}/{target}` | Enable/disable integration (`{"enabled": true/false}`) |

### Debug (debug mode only)

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/debug/dump` | Full diagnostic JSON dump (controller state, storage, stacks, backup, hub, scheduler, health, alerts). Returns 404 when `logging.level` is not `"debug"`. |
| GET | `/api/debug/telemetry` | Run telemetry collection on-demand; returns per-app metrics + log summary with latency. Response: `{latency_ms, app_count, total_errors, total_warnings, app_telemetry[]}`. |

Response format: `{"ok": true/false, "data": ..., "error": "...", "message": "..."}`

---

## Build & Deploy

### Build

```bash
# On build server (192.168.0.180)
cd ~/build/felhom-controller
git -C ~/git/deploy-felhom-compose pull
./build.sh v0.20.0 --push
```

### Deploy on customer node

**Option A: Self-Update API (v0.16.0+)**

After building and pushing the new image, trigger the controller's self-update endpoint:

```bash
curl -s -X POST https://felhom.demo-felhom.eu/api/selfupdate/update \
  -H "Authorization: Bearer <HUB_API_KEY>"
```

The controller pulls the new image, updates its own compose file, and runs `docker compose up -d` to replace itself. The Settings page also has a "Frissítés telepítése" ("Install update") button for manual triggering.

**Option B: Manual SSH (pre-v0.16.0 or fallback)**

```bash
# On customer node (e.g., 192.168.0.162)
cd /opt/docker/felhom-controller
sudo docker pull gitea.dooplex.hu/admin/felhom-controller:<VERSION>
sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:<VERSION>|' docker-compose.yml
sudo docker compose up -d
```

**Important:** Always use `docker compose up -d`, not `docker compose restart`. A restart does not pick up new images.

### Docker Requirements

The controller container needs:

- `privileged: true` (disk operations)
- Docker socket mount (`/var/run/docker.sock`)
- `/mnt` mount with `propagation: rshared` (container mounts visible to host)
- `/dev` mounted as `/host-dev` (block device access)
- `/etc/fstab` mounted as `/host-fstab` (persistent mount config)

See `docker-compose.yml` for the full volume configuration.

---
## Roadmap

### Completed

- [x] Stack management with deploy flow and memory validation
- [x] Git-based app catalog sync
- [x] Central job scheduler
- [x] System monitoring with SQLite metrics and Chart.js charts
- [x] Healthchecks.io integration (5 ping types)
- [x] 3-layer backup system (DB dumps + restic + cross-drive)
- [x] Per-app backup restore with auto stop/restart
- [x] Storage management (scan, format, mount, registry)
- [x] Attach existing drive wizard (v0.15.0): bind-mount subfolder from pre-formatted drive, directory browser
- [x] App data migration between storage paths
- [x] Storage watchdog (v0.17.0): USB disconnect detection (~15s), auto-stop apps, auto-remount on reconnect, safe eject UI
- [x] Central hub reporting
- [x] Email notifications via hub relay
- [x] Settings persistence and password management
- [x] Dashboard alert system
- [x] Per-drive backup architecture (v0.14.0): per-drive restic repos, per-app DB dumps, path helpers
- [x] Cross-drive restic pruning (v0.14.0)
- [x] Auto Tier 2 for small apps (v0.14.1): auto-enable daily rsync for non-HDD apps when ≥2 drives
- [x] Infrastructure config in cross-drive backup (v0.14.1): stacks dir + controller.yaml in `_infra/` + restic
- [x] Disaster recovery (v0.15.5): Hub-based infra backup, auto-mount by UUID, restore UI with full-page takeover
- [x] Controller self-update (v0.16.0): Watchtower-style pull + restart, Settings page UI, API key auth, auto-update scheduling
- [x] Hub-managed config (v0.20.0): Config apply endpoint (`POST /api/config/apply`), config hash in reports for sync comparison
- [x] Config content endpoint (v0.21.1): `GET /api/config` returns raw YAML for Hub live diff and pull operations
- [x] First-run setup wizard (v0.22.0): Web-based wizard replaces shell scripts, drive scan for local backups, Hub recovery, fresh install flow
- [x] Setup wizard logo fix (v0.22.2): Use embedded SVG instead of filesystem path
- [x] Hub-managed asset sync (v0.22.3): Download app logos/screenshots from Hub API with SHA-256 change detection, daily sync schedule
- [x] CSRF protection on POST endpoints (v0.23.0)
- [x] Verbose debug logging across all modules (v0.24.0)
- [x] Diagnostic dump endpoint `/api/debug/dump` (v0.24.0)
- [x] Startup self-test with 9 subsystem checks (v0.24.0)

### In Progress / Planned

- [ ] Update classification and auto-apply (optional/required/security markers)
- [ ] Docker volume backup (`/var/lib/docker/volumes:ro`)
- [ ] Raspberry Pi testing (pi-customer-1)
- [ ] Login rate limiting

---

## Test Environments

| Node | Hardware | Domain | Status |
|------|----------|--------|--------|
| demo-felhom | Acemagic GK3PLUS N100, 16 GB RAM, 512 GB SSD + 1 TB HDD | demo-felhom.eu | Controller v0.22.3 |
| pi-customer-1 | Raspberry Pi 3B+, 1 GB RAM, 32 GB SD | pi-customer-1.local | Not yet tested |

## Related Repositories

| Repository | Purpose |
|------------|---------|
| [deploy-felhom-compose](https://gitea.dooplex.hu/admin/deploy-felhom-compose) | This repo: controller + deploy scripts |
| [app-catalog-felhom.eu](https://gitea.dooplex.hu/admin/app-catalog-felhom.eu) | Docker Compose templates + .felhom.yml metadata |
| [felhom.eu](https://gitea.dooplex.hu/admin/felhom.eu) | Website + app assets + felhom-hub service |