# CONTEXT.md — Project Memory

> This file serves as persistent project memory across Claude Code sessions.
> It replaces the auto-generated "Memory" from the claude.ai Project.
> **Update this file at the end of each working session** with current state,
> recent decisions, and anything the next session needs to know.
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"

Last updated: 2026-02-16 (session 21)

---

## About Viktor (project owner)

- Works at Magyar Telekom (Budapest), building Felhom as a side business
- Felhom: managed home-server service for Hungarian households
- Technical, but prefers pragmatic solutions over over-engineering
- Runs all infrastructure on Gitea (gitea.dooplex.hu) and a k3s cluster for management
- Customer deployments use Docker Compose (not Kubernetes) for simplicity

## Current project state

### felhom-controller (this repo)

- **Version:** v0.6.3
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
- **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger, **dedicated backup page**)
- **Phase 4:** ✅ COMPLETE — Monitoring Page with Metrics Store (SQLite, Chart.js, system + container metrics)
- **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1-4 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups, backup detail page, system monitoring page

### What was just completed (2026-02-16 session 21)

- **v0.6.3 — Bug fixes from v0.6.2 code scan (4 minor fixes):**
  - **Bug 1:** `--hdd-path` in `docker-setup.sh` now uses `require_arg` validation like all other flags. Previously, passing `--hdd-path` as the last argument without a value crashed with a cryptic bash error under `set -u` instead of a friendly message.
  - **Bug 2:** `stackAction()` in `layout.html` now receives `event` as an explicit parameter instead of relying on the deprecated implicit `window.event`. All 10 onclick call sites in `dashboard.html` and `stacks.html` were updated to pass `event` as the first argument.
  - **Bug 3:** The page `<title>` now has an em dash separator: `"Vezérlőpult — Felhom.eu"` instead of `"VezérlőpultFelhom.eu"`.
  - **Bug 4:** `nextPruneLabel()` in `funcmap.go` now returns `"ma"` (Hungarian for "today") on Sunday before 4am, consistent with the `nextRunLabel` function. Previously it returned the date in `"2006-01-02"` format.
  - **Deployed:** Controller v0.6.3 to demo-felhom.eu, verified healthy

### What was previously completed (2026-02-16 session 20)

- **Hub Dashboard Bugs + Backup Validation Fix (3 bugs):**
  - **Bug 1&2 (Hub repo, felhom-hub v0.1.2):** Hub timestamp parsing failure — `time.Parse` with a single hardcoded format silently failed for the formats returned by `modernc.org/sqlite`. Added `parseSQLiteTime()`, which tries 6 common formats. Fixed: hub main page showing DOWN despite OK status, and report history timestamps showing 00:00:00.
  - **Bug 3 (Controller repo, v0.6.2):** Backup page showing "Hiba" for all DB validations — a zero-value `DumpValidation{}` (never assigned) hit the `{{else}}` branch in the template. Three fixes:
    - Template: 4-branch guard (Valid → OK / Error → Hiba / zero-value → "–" with tooltip)
    - Debug logging: Added `[DEBUG]` and `[WARN]` log lines to all `ValidateDump()` code paths
    - Re-validation: `RefreshCache()` now cross-checks `lastDBDump` results against fresh `ListDumpFiles()` validation, healing stale in-memory state
  - **Deployed:** Hub v0.1.2 to k3s, Controller v0.6.2 to demo-felhom
  - **Verified:** Controller logs show `ValidateDump OK` for all 3 databases (immich: 60 tables, paperless: 67 tables, romm: 14 tables)

### What was previously completed (2026-02-16 session 19)

- **v0.6.1 — Code Review Bugfixes (7 fixes):**
  - **Fix 1:** `http.NotFound(w, nil)` → pass the actual `*http.Request` in `deployHandler` and `appDetailHandler`
  - **Fix 2:** Dashboard running/stopped counts are now computed from the filtered `deployedStacks` set (previously counted ALL stacks, including non-deployed ones)
  - **Fix 3:** Session cookie `Secure` flag is now set dynamically based on `r.TLS != nil || X-Forwarded-Proto == "https"`. `SameSite` changed from `Strict` to `Lax` (Strict breaks Cloudflare Tunnel redirects)
  - **Fix 4:** Removed the misleading `subtle.ConstantTimeCompare` from `isValidSession()` (the map lookup already leaks timing, and comparing the token to itself is meaningless). Removed the unused `token` field from the `session` struct and the `crypto/subtle` import.
  - **Fix 5:** Replaced `time.Tick()` (the cleanup goroutine could never be stopped, so it leaked) with a proper `time.NewTicker` + `done` channel in `cleanupSessions()`. Added a `Close()` method and a `done chan struct{}` field to the Server struct.
  - **Fix 6:** Added `http.MaxBytesReader(w, req.Body, 1<<20)` (1MB limit) to the `deployStack`, `updateOptionalConfig`, and `deleteStack` API handlers via a `limitBody()` helper.
  - **Fix 7:** Cached `time.LoadLocation("Europe/Budapest")` once at the top of `templateFuncMap()`, removing 5 per-function `LoadLocation` calls (timeAgo, fmtTime, fmtTimeShort, nextRunLabel, nextPruneLabel).
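
    A minimal sketch of the Fix 5 pattern, assuming a simplified `Server` with a mutex-guarded session map. The `session.expires` field, `evictExpired`, and the `sync.Once` guard in `Close` are illustrative, not taken from the controller source. Closing `done` ends the loop and the deferred `Stop` releases the ticker, which a bare `time.Tick` loop offers no way to do.

    ```go
    package main

    import (
    	"fmt"
    	"sync"
    	"time"
    )

    // session is a hypothetical stand-in for the controller's session record.
    type session struct {
    	expires time.Time
    }

    // Server keeps sessions in a mutex-guarded map; done stops the cleanup loop.
    type Server struct {
    	mu        sync.Mutex
    	sessions  map[string]session
    	done      chan struct{}
    	closeOnce sync.Once
    	wg        sync.WaitGroup
    }

    // NewServer starts the background cleanup goroutine.
    func NewServer(interval time.Duration) *Server {
    	s := &Server{
    		sessions: make(map[string]session),
    		done:     make(chan struct{}),
    	}
    	s.wg.Add(1)
    	go s.cleanupSessions(interval)
    	return s
    }

    // cleanupSessions evicts expired sessions on every tick until done is closed.
    func (s *Server) cleanupSessions(interval time.Duration) {
    	defer s.wg.Done()
    	ticker := time.NewTicker(interval)
    	defer ticker.Stop() // released when the loop exits
    	for {
    		select {
    		case now := <-ticker.C:
    			s.evictExpired(now)
    		case <-s.done:
    			return
    		}
    	}
    }

    // evictExpired deletes every session whose expiry is before now.
    func (s *Server) evictExpired(now time.Time) {
    	s.mu.Lock()
    	defer s.mu.Unlock()
    	for token, sess := range s.sessions {
    		if now.After(sess.expires) {
    			delete(s.sessions, token) // deleting during range is allowed in Go
    		}
    	}
    }

    // Close signals the cleanup goroutine and waits for it to exit.
    func (s *Server) Close() {
    	s.closeOnce.Do(func() { close(s.done) })
    	s.wg.Wait()
    }

    func main() {
    	s := NewServer(10 * time.Millisecond)
    	s.mu.Lock()
    	s.sessions["old"] = session{expires: time.Now().Add(-time.Minute)}
    	s.sessions["new"] = session{expires: time.Now().Add(time.Hour)}
    	s.mu.Unlock()

    	time.Sleep(50 * time.Millisecond) // let a few ticks run
    	s.Close()

    	s.mu.Lock()
    	fmt.Println("sessions left:", len(s.sessions))
    	s.mu.Unlock()
    }
    ```

    Calling `Close()` twice is harmless here because of the `sync.Once`; whether the real `Close()` needs that guard depends on how `main.go` shuts the server down.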
  - **Post-fix verification:** All 4 grep checks pass (0 results for `NotFound(w,nil)`, `ConstantTimeCompare`, `time.Tick(`, `Secure:.*true`). `go vet ./...` is clean.
  - **Controller version:** v0.6.1 — deployed and verified on demo-felhom.eu

### What was previously completed (2026-02-16 session 18)

- **v0.6.0 — Healthcheck Implementation + Central Push + Hub Dashboard:**
  - **Part 1 — Healthcheck enhancements (controller-side):**
    - Added `heartbeat` ping — lightweight "I'm alive" signal every 5 min (no logic, just ping)
    - Added `backup_integrity` ping — weekly `restic check` on Sunday 04:00, pings healthchecks with the result
    - Added `Heartbeat` and `BackupIntegrity` fields to `PingUUIDsConfig`
    - Added `RunIntegrityCheck()` to the backup Manager (calls restic Check(), updates lastCheckTime/lastCheckOK, pings)
    - Updated `controller.yaml.example` with the new monitoring ping_uuids
    - Created `monitoring/DEPRECATED.md` for the legacy bash monitoring scripts
  - **Part 2 — Central hub reporting (controller-side):**
    - New `internal/report/` package: types.go (Report struct), builder.go (BuildReport), pusher.go (HTTP push)
    - Report builder gathers data from all subsystems: system info (via metrics.GetStaticInfo + system.GetInfo), container stats (via metricsStore.QueryContainerSummary), backup status (via backupMgr.GetFullStatus), health (via monitor.RunHealthCheck), stacks (via stackMgr.GetStacks)
    - Report pusher: POSTs JSON to the hub with Bearer token auth, 3 retries with 5s backoff, never fails the caller
    - Added `HubConfig` to config.go (enabled, url, api_key, push_interval)
    - Wired hub reporting into the scheduler (configurable interval, default 15m)
    - Hub reporting disabled by default (hub.enabled: false)
  - **Part 3 — Hub service (felhom.eu repo, new `hub/` subfolder):**
    - Full Go service: `cmd/hub/main.go`, `internal/api/handler.go`, `internal/store/store.go`, `internal/web/server.go`
    - SQLite store with WAL mode, auto-migration, denormalized fields for fast queries
    - REST API: POST /api/v1/report (Bearer token auth), GET /api/v1/customers, GET /api/v1/customers/{id}, GET /api/v1/customers/{id}/history
    - Dark theme dashboard (English): multi-customer overview table with status indicators, customer detail page with system/storage/containers/backup/health sections
    - Color coding: green (OK, <30min), yellow (warn or 30-60min), red (fail or >60min)
    - K8s manifest: Deployment + Service + Ingress for hub.felhom.eu in felhom-system namespace
    - Dockerfile, Makefile, hub.yaml.example config
    - 90-day report retention with daily auto-prune
  - **Controller version:** v0.6.0 — deployed and verified on demo-felhom.eu (9 scheduler jobs, all new jobs registered)
  - **Manual steps remaining for Viktor (Part 4 of TASK.md):**
    - Create 5 healthcheck checks on status.felhom.eu (heartbeat, system-health, db-dump, backup, backup-integrity)
    - Update controller.yaml on demo-felhom with real UUIDs
    - Build and deploy felhom-hub to k3s cluster
    - Configure hub.felhom.eu DNS in Cloudflare
    - Enable hub reporting on demo-felhom controller.yaml

### What was previously completed (2026-02-16 session 17)

- **v0.5.4 — Monitoring Page Frontend Fixes (4 bugs, frontend-only):**
  - **Bug 1: Tooltip "Invalid Date"** — `items[0].parsed.x` is unreliable across Chart.js versions. Fixed the tooltip callback to use `items[0].raw.x` (direct {x,y} data access) with `parsed.x` as fallback.
  - **Bug 2: Charts fill full width regardless of data density** — `setChartXBounds()` setting `min/max` at runtime was ignored because the scale was created without them. Fixed by including `min: now - defaultRangeMs, max: now` in the initial `chartOpts()` options. Now "7 nap" shows the full 7-day x-axis with data clustered on the right.
  - **Bug 3: Sysinfo values not consistently right-aligned** — `.sysinfo-grid` used `auto-fill`, creating variable-width cells. Fixed to `1fr 1fr` (fixed 2-column). Added `align-items: baseline`, `gap: 1rem`, `white-space: nowrap` on labels, `font-weight: 600` + `word-break: break-word` on values. Removed the redundant `<style>` block from monitoring.html (styles now in style.css).
  - **Bug 4: Charts overflow on mobile** — Added `min-width: 0` on `.chart-box` (critical CSS grid fix), `overflow: hidden` + `max-width: 100%` on `.chart-wrap` and `.chart-wrap-bar`, `max-width: 100%` on canvas.
  - **Controller version:** v0.5.4 — deployed and verified on demo-felhom.eu

### What was previously completed (2026-02-16 session 16)

- **v0.5.1 — Monitoring Page Bugfixes:**
  - **Bug 1: Hostname** — `os.Hostname()` returns the container ID inside Docker. Fixed by mounting `/etc/hostname:/host/etc/hostname:ro` and reading it first in `sysinfo.go`. Now shows `demo-felhom`.
  - **Bug 2: Tooltip timestamps** — The Chart.js tooltip callback used `items[0].parsed.x` (category index 0,1,2...) instead of `items[0].label` (actual timestamp). Index 0 worked by accident (`0 || label` falls through), but all other points showed 1970-01-01.
  - **Bug 3+4: Default range + empty charts** — The default range was `24h`, but the new system had only minutes of data. Changed to a `1h` default for both system and container detail charts. Moved the `active` class to the "1 óra" button.
  - **Controller version:** v0.5.1 — deployed and verified on demo-felhom.eu

### What was previously completed (2026-02-16 session 15)

- **v0.5.0 — Backup Bugfixes + Monitoring Page with Metrics Store:**
  - **Task 1: Fixed "Helyi mentés" showing "–" after restart** — `GetFullStatus()` now synthesizes `LastBackup` from `SnapshotHistory` and `LastDBDump` from `DumpFiles` on disk when the in-memory values are nil (e.g., after a controller restart). The dashboard handler also uses `GetFullStatus()` instead of `GetStatus()` for consistent behavior.
  - **Task 2: Verified backup page caching** — Already implemented in v0.4.7 (`RefreshCache`, scheduler job, `AfterBackup` callback). No changes needed.
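
    The Task 1 fallback can be sketched as a small helper: prefer the in-memory value, otherwise derive it from the newest entry that is still available. The `SnapshotRecord` and `DumpFile` shapes below are hypothetical simplifications; only the rule itself (nil in memory after a restart, so fall back to snapshot history or dump files on disk) comes from the notes above.

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // SnapshotRecord is a hypothetical, trimmed-down snapshot history entry.
    type SnapshotRecord struct {
    	ID   string
    	Time time.Time
    }

    // DumpFile is a hypothetical entry for a .sql dump found on disk.
    type DumpFile struct {
    	Name    string
    	ModTime time.Time
    }

    // effectiveLastBackup prefers the in-memory value; when it is nil (for
    // example right after a restart) it falls back to the newest snapshot.
    func effectiveLastBackup(inMemory *time.Time, history []SnapshotRecord) *time.Time {
    	if inMemory != nil {
    		return inMemory
    	}
    	var newest *time.Time
    	for _, snap := range history {
    		t := snap.Time // fresh variable per iteration, safe to take its address
    		if newest == nil || t.After(*newest) {
    			newest = &t
    		}
    	}
    	return newest // nil when there is no history either
    }

    // effectiveLastDump applies the same rule to dump files on disk.
    func effectiveLastDump(inMemory *time.Time, files []DumpFile) *time.Time {
    	if inMemory != nil {
    		return inMemory
    	}
    	var newest *time.Time
    	for _, f := range files {
    		t := f.ModTime
    		if newest == nil || t.After(*newest) {
    			newest = &t
    		}
    	}
    	return newest
    }

    func main() {
    	now := time.Now()
    	history := []SnapshotRecord{
    		{ID: "a1b2c3d4", Time: now.Add(-26 * time.Hour)},
    		{ID: "e5f6a7b8", Time: now.Add(-2 * time.Hour)},
    	}
    	if t := effectiveLastBackup(nil, history); t != nil {
    		fmt.Println("last backup:", t.Format(time.RFC3339))
    	}
    	dumps := []DumpFile{{Name: "paperless.sql", ModTime: now.Add(-3 * time.Hour)}}
    	if t := effectiveLastDump(nil, dumps); t != nil {
    		fmt.Println("last dump:", t.Format(time.RFC3339))
    	}
    }
    ```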
  - **Task 3: New Monitoring Page ("Rendszermonitor")** — Full system monitoring subsystem:
    - **SQLite metrics store** (`internal/metrics/store.go`, `types.go`): WAL-mode SQLite via `modernc.org/sqlite` (pure Go, no CGO). Stores system metrics (CPU%, memory, temperature, load) and container metrics (CPU%, memory, net/block I/O) with timestamps. Downsampled queries via bucket-based `GROUP BY` for Chart.js. 30-day auto-prune via a daily scheduler job at 04:00.
    - **Metrics collector** (`internal/metrics/collector.go`): Background goroutine collects system + container metrics every 60 seconds. System data comes from `system.GetInfo()`, container data from `docker stats --no-stream` with tab-separated format parsing.
    - **System info provider** (`internal/metrics/sysinfo.go`, `sysinfo_other.go`): Reads hostname, OS, kernel, CPU model/cores, and uptime from the `/proc` filesystem. Linux-specific, with a build-tag fallback for cross-compilation.
    - **REST API endpoints** (4 new routes in `router.go`): `GET /api/metrics/system` (time-series with range presets), `GET /api/metrics/containers/summary` (current stats), `GET /api/metrics/containers/{name}` (per-container time-series), `GET /api/metrics/sysinfo` (static system info).
    - **Monitoring page template** (`monitoring.html`): 5 sections — System Overview (sysinfo via API), System Metrics Charts (4 line charts: CPU, Memory, Temperature, Load in a 2×2 grid), Container Resources (2 horizontal bar charts: CPU% and Memory), Per-container Detail (click to expand with historical charts), Storage (server-rendered progress bars). Time range selectors (1h/6h/24h/7d/30d). Auto-refresh every 60s.
    - **Chart.js 4.4.7** embedded locally (for offline environments, ~200KB UMD), with a dark theme configuration matching the site design.
    - **CSS**: ~100 lines added for the monitoring page (`.monitor-card`, `.charts-grid`, `.chart-box`, `.container-charts-row`, `.storage-bars`, responsive rules).
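
      A sketch of the tab-separated `docker stats --no-stream` parsing mentioned in the metrics collector bullet. It assumes a format string such as `--format '{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}'`; the collector's real column list (it also records net/block I/O) is not recorded here, so `ContainerSample`, `parseSize`, `parseLine`, and `parseStats` are hypothetical names. Docker typically prints memory with binary units (`MiB`, `GiB`) and I/O with decimal ones (`kB`, `MB`), so both are accepted.

      ```go
      package main

      import (
      	"bufio"
      	"fmt"
      	"strconv"
      	"strings"
      )

      // ContainerSample is a hypothetical row for the container metrics table.
      type ContainerSample struct {
      	Name     string
      	CPUPct   float64
      	MemBytes float64
      }

      // sizeUnits covers decimal (I/O) and binary (memory) unit suffixes.
      var sizeUnits = map[string]float64{
      	"B":  1,
      	"kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12,
      	"KiB": 1 << 10, "MiB": 1 << 20, "GiB": 1 << 30, "TiB": 1 << 40,
      }

      // parseSize turns strings like "512MiB" or "1.2kB" into bytes.
      func parseSize(s string) (float64, error) {
      	s = strings.TrimSpace(s)
      	i := strings.IndexFunc(s, func(r rune) bool { return (r < '0' || r > '9') && r != '.' })
      	if i <= 0 {
      		return 0, fmt.Errorf("malformed size %q", s)
      	}
      	n, err := strconv.ParseFloat(s[:i], 64)
      	if err != nil {
      		return 0, err
      	}
      	mult, ok := sizeUnits[s[i:]]
      	if !ok {
      		return 0, fmt.Errorf("unknown unit in %q", s)
      	}
      	return n * mult, nil
      }

      // parseLine parses one "name<TAB>cpu%<TAB>used / limit" line.
      func parseLine(line string) (ContainerSample, error) {
      	fields := strings.Split(line, "\t")
      	if len(fields) != 3 {
      		return ContainerSample{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
      	}
      	cpu, err := strconv.ParseFloat(strings.TrimSuffix(strings.TrimSpace(fields[1]), "%"), 64)
      	if err != nil {
      		return ContainerSample{}, err
      	}
      	used := strings.SplitN(fields[2], "/", 2)[0] // keep "used", drop "limit"
      	mem, err := parseSize(used)
      	if err != nil {
      		return ContainerSample{}, err
      	}
      	return ContainerSample{Name: strings.TrimSpace(fields[0]), CPUPct: cpu, MemBytes: mem}, nil
      }

      // parseStats parses the whole command output, skipping blank or malformed lines.
      func parseStats(out string) []ContainerSample {
      	var samples []ContainerSample
      	sc := bufio.NewScanner(strings.NewReader(out))
      	for sc.Scan() {
      		line := strings.TrimSpace(sc.Text())
      		if line == "" {
      			continue
      		}
      		s, err := parseLine(line)
      		if err != nil {
      			continue // a real collector would probably log this
      		}
      		samples = append(samples, s)
      	}
      	return samples
      }

      func main() {
      	out := "paperless-webserver\t1.50%\t512MiB / 7.5GiB\nimmich-server\t0.25%\t1.2GiB / 7.5GiB\n"
      	for _, s := range parseStats(out) {
      		fmt.Printf("%s cpu=%.2f%% mem=%.0f bytes\n", s.Name, s.CPUPct, s.MemBytes)
      	}
      }
      ```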
    - **Wiring**: 4th sidebar nav item "Rendszermonitor", metrics DB path in named volume (`data/metrics.db`), `/etc/os-release:/host/etc/os-release:ro` volume mount in docker-compose.yml, Dockerfile updated to `golang:1.24-bookworm` (required by `modernc.org/sqlite`), `go.mod` upgraded to `go 1.24.0`.
  - **Controller version:** v0.5.0 — deployed and verified on demo-felhom.eu (metrics collecting, 16 containers reporting, sysinfo showing Intel N100 correctly)

### What was previously completed (2026-02-16 session 14)

- **v0.4.7 — Protected Stack Detail Pages + Backup Page Caching:**
  - **Protected stacks clickable** — `data-href` gating changed from `{{if not .Protected}}` to `{{if .Meta.Slug}}` in both `stacks.html` and `dashboard.html`. Protected stacks with `.felhom.yml` (i.e. a slug) are now clickable, linking to `/apps/{slug}`. Stacks without `.felhom.yml` remain non-clickable.
  - **"Részletek" button for protected stacks** — The protected stack action section in `stacks.html` now shows a "Részletek" link next to the restart button when the stack has a slug.
  - **FileBrowser `.felhom.yml` resources** — Added a `resources` section (mem_request: 128M, mem_limit: 256M, pi_compatible: true, needs_hdd: true) both to `install_filebrowser()` in `docker-setup.sh` and manually on the demo node. The FileBrowser detail page now shows memory/Pi/HDD badges.
  - **Backup page caching** — `GetFullStatus()` no longer runs expensive subprocess calls (restic stats, docker inspect, disk listing) on every page load. Instead, a new `RefreshCache()` method runs these in the background:
    - Every 5 minutes via the `backup-cache` scheduler job
    - After each successful backup via the `AfterBackup` callback
    - On startup via a goroutine (non-blocking)
  - `GetFullStatus()` returns the cached `FullBackupStatus` instantly, updating only dynamic fields (running flag, next run times, snapshot history). It falls back to a minimal status if the cache hasn't been populated yet.
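
    A minimal sketch of the caching split described above, assuming a reduced `FullBackupStatus` and a `collect` function that stands in for the expensive restic/docker/disk calls (all names other than `RefreshCache` and `GetFullStatus` are hypothetical). The slow work runs without holding the lock, readers only copy a pointer under `RLock`, and the dynamic running flag is overlaid on every read.

    ```go
    package main

    import (
    	"fmt"
    	"sync"
    	"sync/atomic"
    )

    // FullBackupStatus is a hypothetical, reduced version of the page model.
    type FullBackupStatus struct {
    	RepoSizeBytes int64
    	SnapshotCount int
    	Running       bool // dynamic: filled in on every read
    	FromCache     bool // false means the cache has not been populated yet
    }

    // Manager caches the expensive part of the status. collect stands in for
    // restic stats, docker inspect and dump-directory listing.
    type Manager struct {
    	mu      sync.RWMutex
    	cache   *FullBackupStatus
    	running atomic.Bool
    	collect func() FullBackupStatus
    }

    // RefreshCache does the slow work without holding the lock, then swaps in a
    // freshly allocated result. Meant for a scheduler job, an after-backup
    // callback, and a one-off call at startup.
    func (m *Manager) RefreshCache() {
    	fresh := m.collect()
    	fresh.FromCache = true
    	m.mu.Lock()
    	m.cache = &fresh
    	m.mu.Unlock()
    }

    // GetFullStatus never calls collect. It copies the cached struct (or returns
    // a minimal zero status) and overlays the dynamic fields.
    func (m *Manager) GetFullStatus() FullBackupStatus {
    	m.mu.RLock()
    	cached := m.cache
    	m.mu.RUnlock()

    	var st FullBackupStatus
    	if cached != nil {
    		st = *cached // copy, so callers cannot mutate the cache
    	}
    	st.Running = m.running.Load()
    	return st
    }

    func main() {
    	m := &Manager{collect: func() FullBackupStatus {
    		return FullBackupStatus{RepoSizeBytes: 3 << 30, SnapshotCount: 2}
    	}}
    	fmt.Printf("before refresh: %+v\n", m.GetFullStatus())
    	m.RefreshCache()
    	m.running.Store(true)
    	fmt.Printf("after refresh:  %+v\n", m.GetFullStatus())
    }
    ```

    Because `RefreshCache` always publishes a newly allocated struct and never mutates an old one, dereferencing the pointer after releasing the read lock is safe.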
  - **Controller version:** v0.4.7 — deployed and verified on demo-felhom.eu

### What was previously completed (2026-02-16 session 13)

- **v0.4.6 — MariaDB Validation Fix + Dashboard & Protected Stack UX:**
  - **Bugfix: MariaDB dump validation false positive** — MariaDB 11.4+ prepends `/*M!999999\- enable the sandbox mode */` before the dump header comment. `ValidateDump()` now scans the first 10 lines for the expected header pattern instead of checking only line 1. It accepts `-- MariaDB dump`, `-- MySQL dump`, and `-- mysqldump` for MariaDB, and `-- PostgreSQL database dump` for PostgreSQL.
  - **Dashboard shows deployed apps only** — `dashboardHandler()` filters to deployed + protected stacks only. Non-deployed apps remain on the Alkalmazások page. Section heading changed to "Telepített alkalmazások". The `TotalCount` stat card still shows all 52 apps.
  - **Protected stack restart button** — Protected stacks (traefik, cloudflared, felhom-controller, filebrowser) now show an "Újraindítás" restart button when operational, on both the dashboard (compact ↻) and the Alkalmazások page (full button). The "Védett" / "Védett rendszerkomponens" badge is still shown.
  - **API protection guard** — A centralized guard in `actionStack()` blocks all actions except `restart` on protected stacks (HTTP 403). Defense in depth: `StopStack()` and `DeleteStack()` retain their own guards.
  - **FileBrowser `.felhom.yml`** — `install_filebrowser()` in `docker-setup.sh` now creates `.felhom.yml` with `subdomain: files` metadata, so the controller shows the `files.DOMAIN ↗` URL link. Manually created on the demo node.
  - **Controller version:** v0.4.6 — deployed and verified on demo-felhom.eu

### What was previously completed (2026-02-16 session 12)

- **v0.4.5 — Dedicated Backup Page ("Biztonsági mentés"):**
  - **New `/backups` page** with full backup system visibility — 5 sections:
    1. **Status overview cards**: Local backup status (green/gray), remote placeholder (gray), DB count, repo size
    2. **Schedule section**: DB dump/restic/prune schedule with next-run times, last backup time + duration, retention policy, "Mentés most" button
    3. **Database table**: Lists all discovered DBs with type badge (PostgreSQL/MariaDB), dump file size, last dump time, validation (table count), status
    4. **Snapshot history table**: Last 20 snapshots with ID, time, data added, files new/changed
    5. **Repository info card**: Path, size, snapshot count, integrity check status, backed-up paths list, remote copy placeholder
  - **Backend extensions:**
    - `SnapshotRecord` type + ring buffer (20 entries) in Manager for per-snapshot stats
    - `DumpValidation` — scans dump files for CREATE TABLE statements, validates header and file size
    - `ValidateDump()` runs after each successful dump in `DumpOne()`
    - `ListDumpFiles()` scans the dump directory for existing `.sql` files (fallback when in-memory results are empty)
    - `ListSnapshots()` on ResticManager — returns all snapshots from restic (newest first)
    - `GetFullStatus()` on Manager — a single call returns everything the page needs
    - `LoadSnapshotHistory()` populates history from restic on startup (without delta stats)
    - Restic check result tracking (`lastCheckTime`, `lastCheckOK`)
    - `NextDailyRun()` exported from the scheduler for next-run time calculation
  - **Server wiring:**
    - `Server` struct now holds `*scheduler.Scheduler`
    - `NewServer()` accepts a scheduler parameter
    - `/backups` route + `backupsHandler()` in handlers.go
  - **New template functions** (`funcmap.go`): `timeAgo`, `fmtTime`, `fmtTimeShort`, `dbTypeLabel`, `nextRunLabel`, `pruneLabel`, `nextPruneLabel`, `fmtDuration`, `fmtBytes`, `shortID`
  - **Navigation**: Sidebar now has 3 items (Vezérlőpult, Alkalmazások, Biztonsági mentés)
  - **Dashboard**: Backup card title is now a clickable link to `/backups`
  - **Auto-refresh**: Page polls `/api/backup/status` every 3s while a backup is in progress and reloads when it completes
  - **CSS**: Full dark-theme styles for schedule card, database table, snapshot table, repository card, validation badges, DB type badges, empty state
  - **Controller version:** v0.4.5 — deployed and verified on demo-felhom.eu (2 historical snapshots loaded)

### What was previously completed (2026-02-15 session 11)

- **v0.4.1 — App Filtering + Bugfixes:**
  - **Filter bar on Alkalmazások page**: Four pill-shaped filter buttons (Mind/Futó/Leállítva/Telepíthető) with live count badges computed from the DOM. Filtering hides stack cards via `display: none` and updates the URL with `?filter=running` via `history.replaceState`. The filter is read from the URL on page load for deep-linking support.
  - **New `filterCategory` template function** (`funcmap.go`): Maps container state + deployed flag to filter categories (running/stopped/available). Each stack card gets a `data-filter-state` attribute for client-side filtering.
  - **Clickable dashboard stat cards**: Stat cards (Futó/Leállítva/Összes) changed from `<div>` to `<a>` with `href` linking to `/stacks?filter=running`, `/stacks?filter=stopped`, and `/stacks` respectively. Hover effect with translateY + box-shadow.
  - **docker-compose.yml synced to demo node**: Fixed the stale compose file that still had the `dashboard.${DOMAIN}` Traefik label (from pre-v0.3.0). It now uses the correct `felhom.${DOMAIN}` label + `/sys:/host/sys:ro` mount.
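
    A sketch of a `filterCategory`-style template function and how it feeds the `data-filter-state` attribute. Which container states count as "running" is an assumption here (`starting` and `unhealthy` are grouped with `running`), and `stackCard` is a hypothetical view model; the real mapping in `funcmap.go` may differ.

    ```go
    package main

    import (
    	"html/template"
    	"os"
    )

    // filterCategory maps a container state plus the deployed flag to one of the
    // filter-bar buckets. Grouping "starting"/"unhealthy" with "running" is an
    // assumption of this sketch.
    func filterCategory(state string, deployed bool) string {
    	if !deployed {
    		return "available"
    	}
    	switch state {
    	case "running", "starting", "unhealthy":
    		return "running"
    	default:
    		return "stopped"
    	}
    }

    // stackCard is a hypothetical view model for one card on the page.
    type stackCard struct {
    	Name     string
    	State    string
    	Deployed bool
    }

    const cardTmpl = `{{range .}}<div class="stack-card" data-filter-state="{{filterCategory .State .Deployed}}">{{.Name}}</div>{{end}}`

    func main() {
    	// Funcs must be registered before Parse so the function name resolves.
    	t := template.Must(template.New("cards").
    		Funcs(template.FuncMap{"filterCategory": filterCategory}).
    		Parse(cardTmpl))
    	cards := []stackCard{
    		{Name: "paperless-ngx", State: "running", Deployed: true},
    		{Name: "immich", State: "exited", Deployed: true},
    		{Name: "mealie", Deployed: false},
    	}
    	if err := t.Execute(os.Stdout, cards); err != nil {
    		panic(err)
    	}
    }
    ```

    Registering the function with `Funcs` before `Parse` is required: `html/template` resolves function names at parse time and fails on unknown ones.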
- **Controller version:** v0.4.1 — deployed and verified on demo-felhom.eu - **Remaining manual tasks for Viktor (Task 2 & 3 from TASK.md):** - Verify `felhom.demo-felhom.eu` resolves correctly (Cloudflare Tunnel public hostname may need updating from `dashboard.*` to `felhom.*`) - Update Pi-hole local DNS if applicable - Enable backup in `controller.yaml` on demo node (`backup.enabled: true`) - Create `/srv/backups` directories on demo node ### What was previously completed (2026-02-15 session 10) - **v0.4.0 — Monitoring & Health + Backups (Phase 2 & 3):** - **Central job scheduler** (`internal/scheduler/scheduler.go`): - Replaces ad-hoc goroutines in main.go with a unified scheduler - `Every(name, interval, fn)` for periodic jobs, `Daily(name, timeStr, fn)` for scheduled tasks - Panic recovery, skip-if-running, quiet mode for high-frequency jobs (≤30s) - Daily jobs use `Europe/Budapest` timezone with `time.Timer` for DST correctness - Graceful shutdown with 30s timeout for running jobs - **CPU usage collector** (`internal/system/cpu_linux.go`): - Background goroutine samples `/proc/stat` every 5s, computes delta-based CPU % - Platform stubs for non-Linux in `cpu_other.go` - **Temperature & load metrics** (`internal/system/info_linux.go`): - Reads `/proc/loadavg` for 1/5/15 min load averages - Reads thermal zones from `/host/sys/class/thermal/` (Docker mount) with `/sys/` fallback - Handles millidegree values, picks highest zone, with hwmon fallback - **Healthchecks.io pinger** (`internal/monitor/pinger.go`): - HTTP ping client for Healthchecks.io-compatible endpoints - POST to `/ping/{uuid}` (success), `/fail` (failure), `/start` (started) - 10s timeout, 3 retries with 2s backoff, skips CHANGEME UUIDs - **System health checks** (`internal/monitor/healthcheck.go`): - Checks disk, memory, CPU, temperature, Docker reachability, protected containers - Returns HealthReport with status "ok"/"warn"/"fail" + formatted message for pings - **Database dump engine** 
(`internal/backup/dbdump.go`): - Auto-discovers PostgreSQL/MariaDB containers via `docker ps` + `docker inspect` - Dumps via `docker exec pg_dump`/`mariadb-dump` with 5min timeout - Atomic writes (`.tmp` → `.sql`), empty file detection, stale temp cleanup - **Restic integration** (`internal/backup/restic.go`): - Auto-generates repository password (32 random bytes, base64url) - Init, snapshot (JSON output), prune, check, stats, latest snapshot - Stale lock detection with automatic unlock + retry - **Backup orchestrator** (`internal/backup/backup.go`): - DB dumps + restic snapshots, weekly prune on Sundays - Thread-safe running flag, Healthchecks.io pings with results - `RunFullBackup()` for manual trigger (sequential: dumps → snapshot) - **Wiring updates:** - `main.go`: scheduler-based job registration, cpuCollector lifecycle, pinger + backupMgr init - `api/router.go`: `GET /api/backup/status`, `POST /api/backup/run` - `web/server.go` + `handlers.go`: pass cpuCollector to GetInfo(), backup status on dashboard - `funcmap.go`: `tempColor`, `fmtTemp`, `fmtLoad` template functions - **Dashboard UI enhancements:** - CPU usage bar with load average display below - Temperature with colored indicator dot (green/yellow/red at 60°/75°C) - Backup status card: last run time, DB count, repo size/snapshots - "Mentés most" button triggers manual backup via API - **Config updates:** - `controller.yaml.example`: added `system_health_interval`, `hdd_path`, `system.reserved_memory_mb` - `docker-compose.yml`: added `/sys:/host/sys:ro` mount for temperature reading - `restic_password_file` default changed to `data/` subdir (auto-generated in named volume) - **Controller version:** v0.4.0 — deployed and verified on demo-felhom.eu ### What was previously completed (2026-02-15 session 9) - **v0.3.0 — Structural refactoring (templates + server split + domain rename):** - **Templates: go:embed migration** — moved all 7 HTML templates + CSS from Go string constants to individual files in 
`internal/web/templates/`. Created `embed.go` with `//go:embed` directive. Template loading now uses `ParseFS()` instead of `Parse()`. CSS served from embed.FS via `ReadFile()`. Zero runtime file dependencies — still compiled into the binary. - **Server decomposition** — split monolithic `server.go` (540 lines) into focused files: - `auth.go`: session struct, auth middleware, login/logout handlers, session management - `handlers.go`: page handlers (dashboard, stacks, logs, deploy, app detail) - `funcmap.go`: template FuncMap with 14 custom functions - `server.go`: Server struct, NewServer, loadTemplates (3-liner), ServeHTTP routing, render helper, static file serving - **Domain rename** — controller subdomain changed from `dashboard.*` to `felhom.*` in Traefik labels and setup script - **Documentation updated** — CLAUDE.md, README.md, CONTEXT.md all reflect new file structure - **Reminder for Viktor:** Update Cloudflare Tunnel public hostname (`dashboard.demo-felhom.eu` → `felhom.demo-felhom.eu`) and Pi-hole DNS if needed - **Controller version:** v0.3.0 ### What was previously completed (2026-02-15 session 8) - **FileBrowser as infrastructure service:** - Created `scripts/hdd-setup.sh` (adapted from deploy-portainer) — sets up HDD folder structure with `Dokumentumok` user dir - Created `scripts/docker-setup.sh` (adapted from deploy-portainer) — installs Docker, Traefik, FileBrowser as infra services - Added `filebrowser` to protected stacks in `controller.yaml.example` - Removed `templates/filebrowser/` from app-catalog-felhom.eu (no longer a catalog app) - **Orphan stack detection and deletion:** - Added `Orphaned` field to Stack struct + `getCatalogTemplateSlugs()` helper - Orphan detection in `ScanStacks()` — deployed stacks with no matching catalog template marked as orphaned - New `delete.go`: `DeleteStack()` (compose down + HDD cleanup + dir removal), `GetStackHDDData()`, `parseComposeHDDMounts()` - Safety: protected HDD paths (root, media, storage, 
Dokumentumok, appdata) can never be deleted - New API endpoints: `DELETE /api/stacks/{name}` and `GET /api/stacks/{name}/hdd-data` - UI: orange "Elavult" badge on orphaned stacks, "Törlés" button, delete confirmation modal - Modal shows HDD data paths/sizes, checkbox for "Felhasználói adatok törlése a merevlemezről" - Hides "Frissítés" and "Részletek" buttons for orphaned stacks - **Verified:** 1 orphaned stack detected on startup (filebrowser — now infra, removed from catalog) - **Controller version:** v0.2.15 ### Previously completed (2026-02-14 session 7) - **Fixed YAML parse error in romm `.felhom.yml`** (app-catalog repo): - Root cause: Hungarian opening quote `„` (U+201E) paired with ASCII `"` (0x22) inside YAML double-quoted strings terminated the string prematurely - Affected lines: `help_text` for IGDB Client Secret and SteamGridDB API Key fields - Fix: escaped inner ASCII double quotes with `\"` in the YAML strings - This caused `LoadMetadata()` to silently fail and return empty defaults for ALL romm metadata (tagline, resources, category — everything) - **Added error logging to `LoadMetadata()`** in `metadata.go`: - `[ERROR]` log on YAML parse failure (was silently swallowed — critical bug) - Temporary `[DEBUG]` log used for diagnosis, then removed - **Fixed deploy command in CLAUDE.md**: - `sed` pattern now targets only `image:` lines (was matching service name too, breaking YAML) - Added `sudo` for both sed and docker compose (directory is root-owned) - **Controller version:** v0.2.14 ### Previously completed (2026-02-14 session 6) - **Bug fix: App info logo SVG rendering** — `.app-info-logo` CSS in `templates.go`: - Added `min-width`, `min-height`, `max-width`, `max-height: 80px` and `overflow: hidden` - Prevents SVG images with explicit dimensions or no viewBox from overflowing container - Logo now reliably renders at 80x80 regardless of SVG intrinsic size - **Controller version:** v0.2.12 ### Previously completed (2026-02-14 session 5) - **App 
detail/info pages** — new feature: - New route: `GET /apps/{slug}` renders a full info page (was redirect to deploy page) - Hero section with logo, tagline, resource badges - Screenshots section (graceful — hidden via `onerror` if assets don't exist) - Info cards: use cases, first steps, prerequisites, default credentials, docs link - Optional config form with AJAX save (POST `/api/stacks/{name}/optional-config`) - New `.felhom.yml` fields: `app_info` (tagline, use_cases, first_steps, prerequisites, default_creds, docs_url) and `optional_config` (groups of env var fields) - New structs in `metadata.go`: `AppInfo`, `OptionalConfigGroup`, `OptionalConfigField` - `UpdateOptionalConfig` in `deploy.go`: saves optional env vars to `app.yaml`, restarts deployed stacks with `docker compose up -d` to pick up new env vars - Navigation updated: stack cards on dashboard/stacks pages now link to `/apps/{slug}`, deploy page has "Részletek" link back to info page - **RoMM metadata updated** (app-catalog repo): - Full `app_info` section: tagline, 5 use cases, 6 first steps, 3 prerequisites, default creds, docs URL - 6 optional config fields for metadata providers: IGDB (client_id + secret), SteamGridDB, ScreenScraper (user + password), MobyGames - docker-compose.yml updated with SCREENSCRAPER_USER, SCREENSCRAPER_PASSWORD, MOBYGAMES_API_KEY env vars - Display name fixed: "ROMM" → "RomM" - **Controller version:** v0.2.11 ### Previously completed (2026-02-14 session 4) - **Fixed deploy race condition** in `internal/stacks/deploy.go`: - In-memory `Deployed` flag now set BEFORE `docker compose up -d` (compose up can take 30-60s for image pulls) - On failure: both in-memory state and disk (app.yaml) are reverted - Eliminates stale "Telepítés" button during long compose operations - **Added `checkBeforeDeploy()` JS guard** in `internal/web/templates.go`: - Telepítés buttons on Vezérlőpult and Alkalmazások pages now fetch live state from `/api/stacks/{name}` before navigating - If app is 
already deployed (e.g., another tab deployed it), shows alert and reloads page instead of navigating to deploy form - Catches stale UI state gracefully ### Previously completed (2026-02-14 session 3) - **Enhanced debug logging** across all stack operations in `internal/stacks/`: - **Operation timing**: All stack ops (start, stop, restart, update, deploy) now log elapsed time - **Post-start container state check**: Async goroutine after start/restart/update/deploy - **Image pull detection**: Checks local images before deploy/update (debug level) - **GetLogs/ScanStacks improvements**: Byte count logging, deployed/available counts - All verbose checks gated on `cfg.Logging.Level == "debug"`; timing always at INFO - **UI improvements** in `internal/web/templates.go` and `server.go`: - **Memory bar fix on deploy page**: Bar segments now always visible (min-width: 3px), new app segment uses translucent green with distinct border for clear visual separation from committed memory - **Clickable app cards**: Cards on Vezérlőpult and Alkalmazások pages are now clickable (navigates to deploy/detail page). Uses `data-href` attribute + delegated click handler. Protected stacks excluded. Actions area (buttons, state labels) excluded from click-to-navigate - **Live-scrolling logs**: Logs page now auto-refreshes every 3s via AJAX polling (`?raw=1` returns plain text). Fixed-height container (70vh) with auto-scroll to bottom. Pulsing green "Élő" indicator. Pause/resume toggle ("Szüneteltetés"/"Folytatás"). User scroll position preserved when scrolled up to read history - **Deployment progress UI**: Deploy button no longer shows alert+redirect immediately. Instead shows 3-step progress panel: config saved → containers starting → app initializing. Polls `GET /api/stacks/{name}` every 3s to track actual container health state. Handles running (auto-redirect), starting (keep polling), unhealthy (warning), exited (error), and 120s timeout. 
Shows elapsed time counter - **Mealie healthcheck fix** (app-catalog-felhom.eu): - `wget --spider` replaced with Python TCP socket check — mealie image doesn't include wget - `start_period` increased to 60s (DB migrations take ~40s on first start) - **Healthcheck audit**: filebrowser (Alpine, has BusyBox wget — OK), stirling-pdf (Ubuntu, has wget — OK) ### Previously completed (2026-02-15 session 2) - **Phase 4: Git Sync + App Catalog Audit** — major milestone - **Git sync module** (`internal/sync/sync.go`): - Clones/pulls app-catalog-felhom.eu repo to local cache on startup - Periodic sync based on `git.sync_interval` (default 15m) - Copies `docker-compose.yml` + `.felhom.yml` to stacks dir (never overwrites `app.yaml`/`.env`) - SHA-256 content comparison — only writes changed files - Triggers `ScanStacks()` after sync so dashboard updates immediately - Uses `os/exec` git CLI — no Go git library dependency - **Manual sync button** ("Sablonok frissítése") on Alkalmazások page: - `POST /api/sync` endpoint with 30s debounce - Toast notification shows result (success/failure/what changed) - Auto-reloads page if new apps or updates detected - **Sync status** added to `/api/system/info` (last_sync, last_status, syncing flag) - **.felhom.yml files created for all 10 apps** (paperless-ngx already had one): - actualbudget, docmost, filebrowser, homebox, immich, mealie, romm, stirling-pdf, vaultwarden - All follow the same format: display_name, description, category, subdomain, resources, deploy_fields - **Docker Compose templates audited and fixed** for all 10 apps: - Fixed `{{DOMAIN}}` → `${DOMAIN}` syntax in homebox, mealie, romm, stirling-pdf - Fixed `{{HDD_PATH}}` → `${HDD_PATH}` in romm - Added `deploy.resources.limits.memory` to all services across all templates - Added `TZ=Europe/Budapest` to all sidecar services (postgres, redis, mariadb) - Added healthcheck to romm main service - Added `romm-redis` `condition: service_healthy` (was `service_started`) - 
    Standardized header comment blocks across all templates
- **Documentation updated**: app-catalog README, CLAUDE.md, CONTEXT.md

### Previously completed (2026-02-15 session 1)

- **Memory validation during deployment**:
  - Pre-deploy memory check: compares `mem_request` sum against usable system RAM
  - Hard block if requests exceed usable memory (total RAM minus `system.reserved_memory_mb`, 384MB by default)
  - Soft warning if `mem_limit` sum exceeds total RAM (overcommit OK for limits)
  - `ParseMemoryMB()` supports "500M", "1G", "1.5G", "1024" formats
  - `CommittedMemory()` sums requests/limits across all deployed stacks
  - Memory summary bar shown on deploy page before the user clicks deploy
  - `system.reserved_memory_mb` configurable in controller.yaml (default: 384)
- **Display: `~` prefix on `mem_request`** in UI badges (display-only, exact value stored)
- **Felhom.eu logo**: the actual SVG logo replaced the text logos in the sidebar and login page
  - Logo SVG embedded as Go string constant, served at `/static/felhom-logo.svg`

### Previously completed (2026-02-14)

- **System info bar on Vezérlőpult dashboard**: RAM, SSD, and optional HDD usage
  - Progress bars with color coding (green < 70%, yellow 70-85%, red > 85%)
  - New `internal/system` package reads `/proc/meminfo` + `syscall.Statfs`
  - Platform-specific: Linux impl + non-Linux stub (build tags)
  - Hungarian labels: "Memória" (Memory), "SSD tárhely" (SSD storage), "Külső HDD" (External HDD)
- **Docker Compose memory limits** on paperless-ngx template:
  - paperless-webserver: 768M, postgres: 256M, redis: 128M
  - Added `mem_limit` field to `.felhom.yml` ResourceHints (total: 1152M)
- **`/api/system/info` endpoint** now returns live system metrics (was customer info)
- **Config**: Added `paths.hdd_path` for external HDD monitoring
- Controller image builds via build.sh, pushes to Gitea container registry

### Previously completed (2026-02-13)

- Built the entire felhom-controller from scratch (Go, no frameworks)
- Debugged and fixed 7 issues during first real deployment:
  1.
     Password validation (empty passwords accepted)
  2. In-memory Deployed flag not updating after deploy
  3. Health-aware state parsing (starting/unhealthy detection)
  4. Random card ordering (Go map iteration)
  5. "Részletek" ("Details") button redirect for deployed apps
  6. Paperless OCR language installation (`PAPERLESS_OCR_LANGUAGES` vs `PAPERLESS_OCR_LANGUAGE` env var)
  7. Documentation: `docker compose restart` vs `up -d` for image updates

### What's next (priorities)

1. **Manual steps for v0.6.0** — Viktor needs to:
   - Create 5 healthcheck checks on status.felhom.eu with correct periods/grace
   - Update controller.yaml on demo-felhom with real UUIDs
   - Build + deploy felhom-hub to k3s (`cd hub && make docker-push`, `kubectl apply -f manifests/hub.yaml`)
   - Configure hub.felhom.eu DNS in Cloudflare
   - Enable hub reporting on demo-felhom (`hub.enabled: true`, `hub.api_key: <key>`)
2. **Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
3. **Test backup integrity check** — wait for Sunday 04:00 or trigger it manually
4. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
5. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets)
6. Test on Raspberry Pi (pi-customer-1)
7. Phase 4: Self-update mechanism
8.
   v0.6.1: Hub alerting (webhook to Healthchecks for stale customers)

## Architecture decisions

| Decision | Rationale |
|----------|-----------|
| Go stdlib for web (no Gin/Echo) | Minimal dependencies, single binary, easy to embed templates |
| Templates as go:embed HTML/CSS files | Zero runtime file dependencies (compiled into binary), but each template is a separate editable file |
| Docker Compose for customers (not k8s) | Simpler troubleshooting, customers don't need k8s knowledge |
| k3s for management infra only | Viktor's own services (gitea, monitoring, website) run on k3s |
| Cloudflare Tunnel for remote access | No port forwarding needed, works behind any NAT |
| app.yaml per stack | Separates deploy config from compose files, survives git pulls |
| Password fields require explicit input | Prevents accidental empty-password deployments |
| Health-aware state from Docker Status field | Docker's State says "running" even for unhealthy containers |
| Memory limits via deploy.resources.limits | Prevents runaway containers; ~50% headroom over expected usage |
| System info from /proc/meminfo + statfs | No external dependencies, cheap to read on each page load |
| mem_request vs mem_limit (K8s-inspired) | Requests = expected usage (hard block), limits = peak (overcommit OK) |
| 384MB reserved for system | Prevents deploying apps that would starve the OS/controller |
| Logo SVG embedded as Go constant | Same approach as CSS/HTML — zero external file deps |
| Git sync via os/exec git CLI | No Go git library needed, git is in the container image |
| SHA-256 for content comparison | Only copy changed files, avoid unnecessary disk writes |
| 30s debounce on manual sync | Prevents spamming the git server |
| Orphan = deployed but not in catalog | Safe lifecycle: remove from catalog → mark orphaned → user deletes via UI |
| FileBrowser as infra (not catalog) | Needed even after apps are deleted (user browses HDD data); deployed by setup script |
| Protected HDD paths | Safety net: never delete top-level HDD dirs (media, storage, Dokumentumok, appdata) |
| Central scheduler (not ad-hoc goroutines) | Single place to register/monitor all periodic tasks, graceful shutdown, skip-if-running |
| CPU sampling via background goroutine | /proc/stat delta needs two readings — collector runs every 5s, GetInfo() reads cached value |
| Temperature from /host/sys (Docker mount) | Container can't read host /sys directly — mount /sys:/host/sys:ro, try /host/sys first |
| Restic password auto-generated | No manual setup needed — generated on first backup run, stored in named volume |
| DB discovery via docker inspect | No config needed — discovers postgres/mariadb containers by image name + env vars |
| Backup orchestrator with running flag | Prevents concurrent backups, supports both scheduled and manual trigger |
| modernc.org/sqlite (pure Go) | No CGO/gcc needed in Docker build stage — keeps `CGO_ENABLED=0` static binary |
| Chart.js embedded locally | Customer hardware may not have internet — CDN not reliable for offline environments |
| Metrics downsampling via SQL | Bucket-based AVG in GROUP BY keeps Chart.js responsive with up to 30 days of data |
| 60s metrics collection interval | Good balance of resolution vs. storage — ~44K rows/month for system metrics |
| /etc/os-release mounted read-only | Container can't read host OS info directly — mount to /host/etc/os-release:ro |

## Key file locations on demo-felhom

```
/opt/docker/felhom-controller/      # Controller compose + config
├── controller.yaml                 # Customer config (domain, auth, paths)
├── docker-compose.yml              # Controller's own compose
└── .env                            # DOMAIN=demo-felhom.eu

/opt/docker/stacks/                 # All app stacks
├── traefik/                        # Reverse proxy (protected)
├── cloudflared/                    # Tunnel (protected)
├── paperless-ngx/                  # First deployed app ✅
│   ├── docker-compose.yml
│   ├── .felhom.yml                 # App metadata
│   └── app.yaml                    # Deploy config (env vars, locked fields)
└── whoami/                         # Test stack (not deployed)

/mnt/hdd_placeholder/storage/       # HDD storage for apps
└── paperless/
    ├── consume/                    # Drop files here for OCR
    ├── media/                      # Processed documents
    └── export/                     # Backup exports
```

## Related repositories and their state

| Repository | Status | Notes |
|------------|--------|-------|
| deploy-felhom-compose | Active | This repo. Controller code + deploy scripts |
| app-catalog-felhom.eu | Active | 10 app templates, all with .felhom.yml metadata + memory limits |
| felhom.eu | Active | Website + hub/ subfolder (felhom-hub service) + k8s manifests |
| homelab-manifests | Stable | k3s cluster running (dooplex.hu services) |
| misc-scripts | Utility | collect-repo.sh, backup helpers |

## Gotchas & lessons learned

- `docker compose restart` ≠ `docker compose up -d` — restart doesn't pick up new images
- Go maps have random iteration order — always collect keys into a slice and sort it before displaying
- Docker `.State`="running" doesn't mean healthy — check `.Status` for "(health: starting)" / "(unhealthy)"
- Paperless-ngx needs `PAPERLESS_OCR_LANGUAGES` (plural) to install language packs, `PAPERLESS_OCR_LANGUAGE` (singular) to select
- In-memory Deployed flag must be set BEFORE `docker compose up -d` (not after) — compose can take 30-60s for image pulls, during which the UI would show a stale "Telepítés" ("Install") button
- Cloudflare Tunnel handles `*.demo-felhom.eu` → Traefik handles `Host()`-based routing to containers
- BIOS "AC Power Recovery" must be enabled on N100 for auto-restart after power outage
- `docker compose up -d` returns exit 0 even when containers immediately crash-loop — need post-start status check to detect this
- When logging env vars for debugging, only log keys (not values) to avoid leaking secrets in log files
- Mealie image (`ghcr.io/mealie-recipes/mealie`) doesn't include wget/curl — use Python TCP socket check for healthcheck
- Mealie DB migrations on first start take ~40s (alembic) — use `start_period: 60s` to avoid premature unhealthy status
- Alpine-based images (filebrowser, vaultwarden) have wget via BusyBox — healthchecks with `wget --spider` work fine
- Deploy `sed` command to update image version must target only the `image:` line — naive `sed 's|name:OLD|name:NEW|'` also matches the service name line (e.g., `felhom-controller:` → `felhom-controller:0.2.12`), breaking YAML. Use `sudo sed -i 's|image:.*felhom-controller:[^ ]*|image: ...felhom-controller:NEW|'` or a similar scoped pattern
- Hungarian quotation marks `„”` in YAML: `„` (U+201E) is safe inside YAML double-quoted strings, but the closing `”` must NOT be typed as ASCII `"` (0x22), because that terminates the YAML string. Use the `\"` escape or the Unicode `”` (U+201D). This caused a silent parse failure for the entire `.felhom.yml` file
- Never silently swallow parse errors — always log them. Silent failures make debugging impossible (took a dedicated debug session to find a simple quoting issue)
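The health-aware state gotcha can be illustrated with a small Go sketch. It assumes the two inputs are the `State` and `Status` values reported by `docker ps` / `docker compose ps` (e.g. `running` and `Up 12 seconds (health: starting)`); the function name `healthAwareState` and its return labels are hypothetical and not necessarily what the controller uses.

```go
package main

import (
	"fmt"
	"strings"
)

// healthAwareState is a hypothetical helper: it combines Docker's coarse
// State ("running", "exited", ...) with the human-readable Status string,
// because State stays "running" even while the healthcheck is failing.
func healthAwareState(state, status string) string {
	if state != "running" {
		// Non-running states ("exited", "restarting", "created", ...) pass through.
		return state
	}
	switch {
	case strings.Contains(status, "(health: starting)"):
		return "starting"
	case strings.Contains(status, "(unhealthy)"):
		return "unhealthy"
	default:
		// Either "(healthy)" or the container has no healthcheck at all.
		return "running"
	}
}

func main() {
	fmt.Println(healthAwareState("running", "Up 12 seconds (health: starting)")) // starting
	fmt.Println(healthAwareState("running", "Up 3 minutes (unhealthy)"))        // unhealthy
	fmt.Println(healthAwareState("running", "Up 3 minutes (healthy)"))          // running
	fmt.Println(healthAwareState("exited", "Exited (1) 2 minutes ago"))         // exited
}
```

Checking `(health: starting)` before `(unhealthy)` keeps the mapping unambiguous, and containers without a healthcheck fall back to plain `running`.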
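If the image bump is ever done from Go instead of `sed`, the same scoping idea applies: anchor the pattern on the `image:` key so the `felhom-controller:` service key can never match. This is a minimal sketch: `bumpControllerTag` and the registry host in the example are hypothetical, and it assumes LF line endings and a tag without whitespace or `$` (which `ReplaceAllString` would treat as a group reference).

```go
package main

import (
	"fmt"
	"regexp"
)

// imageLine matches only lines whose key is `image:` and whose value ends in
// `felhom-controller:<tag>`. Group 1 keeps everything up to the tag separator,
// group 2 is the old tag. A service key like `  felhom-controller:` has no
// `image:` prefix, so it is never touched. [ \t] is used instead of \s so a
// match cannot spill across newlines in multi-line mode.
var imageLine = regexp.MustCompile(`(?m)^([ \t]*image:[ \t]*\S*felhom-controller):(\S+)[ \t]*$`)

// bumpControllerTag (hypothetical) rewrites the controller image tag in a
// compose file's contents and returns the new contents.
func bumpControllerTag(compose, newTag string) string {
	return imageLine.ReplaceAllString(compose, "${1}:"+newTag)
}

func main() {
	compose := "services:\n" +
		"  felhom-controller:\n" +
		"    image: registry.example.com/felhom/felhom-controller:0.6.2\n" +
		"    restart: unless-stopped\n"
	fmt.Print(bumpControllerTag(compose, "0.6.3"))
}
```

Only the `image:` line changes; the service key line and the rest of the file are returned byte-for-byte.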
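Two of the gotchas (random Go map iteration order, and logging only env var names, never values) combine naturally in one small helper. A sketch, assuming the deploy environment is held as a `map[string]string`; `envKeysForLog` is a hypothetical name, not an existing controller function.

```go
package main

import (
	"fmt"
	"sort"
)

// envKeysForLog (hypothetical) returns only the variable names from an env
// map, sorted so log output is stable across runs (Go randomizes map
// iteration order). Values are dropped so secrets never reach the log file.
func envKeysForLog(env map[string]string) []string {
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	env := map[string]string{
		"POSTGRES_PASSWORD": "s3cret",
		"DOMAIN":            "demo-felhom.eu",
		"TZ":                "Europe/Budapest",
	}
	// prints: deploy env keys: [DOMAIN POSTGRES_PASSWORD TZ]
	fmt.Printf("deploy env keys: %v\n", envKeysForLog(env))
}
```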