# CONTEXT.md — Project Memory
|
|
|
|
> This file serves as persistent project memory across Claude Code sessions.
> It replaces the auto-generated "Memory" from the claude.ai Project.
> **Update this file at the end of each working session** with current state,
> recent decisions, and anything the next session needs to know.
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"

Last updated: 2026-02-16 (session 13)

---

## About Viktor (project owner)

- Works at Magyar Telekom (Budapest), building Felhom as a side business
- Felhom: managed home-server service for Hungarian households
- Technical, but prefers pragmatic solutions over over-engineering
- Keeps all infrastructure in Gitea (gitea.dooplex.hu) and runs his management services on a k3s cluster
- Customer deployments use Docker Compose (not Kubernetes) for simplicity
|
|
|
|
## Current project state

### felhom-controller (this repo)

- **Version:** v0.4.6
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
- **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger, **dedicated backup page**)
- **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1–3 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups, backup detail page
|
|
|
|
### What was just completed (2026-02-16 session 13)

- **v0.4.6 — MariaDB Validation Fix + Dashboard & Protected Stack UX:**
  - **Bugfix: MariaDB dump validation false positive** — MariaDB 11.4+ prepends `/*M!999999\- enable the sandbox mode */` before the dump header comment. `ValidateDump()` now scans the first 10 lines for the expected header pattern instead of checking only line 1. It accepts `-- MariaDB dump`, `-- MySQL dump` and `-- mysqldump` for MariaDB, and `-- PostgreSQL database dump` for PostgreSQL.
  - **Dashboard shows deployed apps only** — `dashboardHandler()` filters to deployed + protected stacks only. Non-deployed apps remain on the Alkalmazások page. Section heading changed to "Telepített alkalmazások". `TotalCount` stat card still shows all 52 apps.
  - **Protected stack restart button** — Protected stacks (traefik, cloudflared, felhom-controller, filebrowser) now show an "Újraindítás" restart button when operational, on both the dashboard (compact ↻) and the Alkalmazások page (full button). The "Védett" / "Védett rendszerkomponens" badge is still shown.
  - **API protection guard** — Centralized guard in `actionStack()` blocks all actions except `restart` on protected stacks (HTTP 403). Defense-in-depth: `StopStack()` and `DeleteStack()` retain their own guards.
  - **FileBrowser `.felhom.yml`** — `install_filebrowser()` in `docker-setup.sh` now creates `.felhom.yml` with `subdomain: files` metadata, so the controller shows the `files.DOMAIN ↗` URL link. Manually created on demo node.
  - **Controller version:** v0.4.6 — deployed and verified on demo-felhom.eu
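
The header check from the MariaDB bugfix can be sketched as follows. This is a minimal illustration, not the controller's actual `ValidateDump()`. The function name `hasDumpHeader` and the `"mariadb"`/`"postgres"` keys are hypothetical. Note that `bufio.Scanner` gives up on lines longer than 64 KiB. That is fine for header lines, but worth knowing for dumps with long `INSERT` statements.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// headerPrefixes lists the header comments accepted for each database type.
// The map keys are hypothetical; the prefixes come from the notes above.
var headerPrefixes = map[string][]string{
	"mariadb":  {"-- MariaDB dump", "-- MySQL dump", "-- mysqldump"},
	"postgres": {"-- PostgreSQL database dump"},
}

// hasDumpHeader reports whether an expected header comment appears within the
// first maxLines lines of dump. Looking past line 1 tolerates preambles such as
// the sandbox-mode comment that MariaDB 11.4+ writes before the header.
func hasDumpHeader(dump, dbType string, maxLines int) bool {
	prefixes, ok := headerPrefixes[dbType]
	if !ok {
		return false
	}
	sc := bufio.NewScanner(strings.NewReader(dump))
	for i := 0; i < maxLines && sc.Scan(); i++ {
		line := strings.TrimSpace(sc.Text())
		for _, p := range prefixes {
			if strings.HasPrefix(line, p) {
				return true
			}
		}
	}
	return false
}

func main() {
	dump := "/*M!999999\\- enable the sandbox mode */\n" +
		"-- MariaDB dump 10.19  Distrib 11.4.2-MariaDB\n" +
		"CREATE TABLE `t` (id int);\n"
	fmt.Println("scan 10 lines:", hasDumpHeader(dump, "mariadb", 10))
	fmt.Println("line 1 only:  ", hasDumpHeader(dump, "mariadb", 1))
}
```

The second call reproduces the old false positive: when only line 1 is checked, the sandbox-mode preamble hides the real header.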
|
|
|
|
### What was previously completed (2026-02-16 session 12)

- **v0.4.5 — Dedicated Backup Page ("Biztonsági mentés"):**
  - **New `/backups` page** with full backup system visibility — 5 sections:
    1. **Status overview cards**: Local backup status (green/gray), remote placeholder (gray), DB count, repo size
    2. **Schedule section**: DB dump/restic/prune schedule with next-run times, last backup time + duration, retention policy, "Mentés most" button
    3. **Database table**: Lists all discovered DBs with type badge (PostgreSQL/MariaDB), dump file size, last dump time, validation (table count), status
    4. **Snapshot history table**: Last 20 snapshots with ID, time, data added, files new/changed
    5. **Repository info card**: Path, size, snapshot count, integrity check status, backed-up paths list, remote copy placeholder
  - **Backend extensions:**
    - `SnapshotRecord` type + ring buffer (20 entries) in Manager for per-snapshot stats
    - `DumpValidation` — scans dump files for CREATE TABLE statements, validates header and file size
    - `ValidateDump()` runs after each successful dump in `DumpOne()`
    - `ListDumpFiles()` scans dump directory for existing `.sql` files (fallback when in-memory results are empty)
    - `ListSnapshots()` on ResticManager — returns all snapshots from restic (newest first)
    - `GetFullStatus()` on Manager — single call returns everything the page needs
    - `LoadSnapshotHistory()` populates history from restic on startup (without delta stats)
    - Restic check result tracking (`lastCheckTime`, `lastCheckOK`)
    - `NextDailyRun()` exported from scheduler for next-run time calculation
  - **Server wiring:**
    - `Server` struct now holds `*scheduler.Scheduler`
    - `NewServer()` accepts scheduler parameter
    - `/backups` route + `backupsHandler()` in handlers.go
  - **New template functions** (`funcmap.go`): `timeAgo`, `fmtTime`, `fmtTimeShort`, `dbTypeLabel`, `nextRunLabel`, `pruneLabel`, `nextPruneLabel`, `fmtDuration`, `fmtBytes`, `shortID`
  - **Navigation**: Sidebar now has 3 items (Vezérlőpult, Alkalmazások, Biztonsági mentés)
  - **Dashboard**: Backup card title is now a clickable link to `/backups`
  - **Auto-refresh**: Page polls `/api/backup/status` every 3s during backup-in-progress, reloads when complete
  - **CSS**: Full dark-theme styles for schedule card, database table, snapshot table, repository card, validation badges, DB type badges, empty state
  - **Controller version:** v0.4.5 — deployed and verified on demo-felhom.eu (2 historical snapshots loaded)
|
|
|
|
### What was previously completed (2026-02-15 session 11)

- **v0.4.1 — App Filtering + Bugfixes:**
  - **Filter bar on Alkalmazások page**: Four pill-shaped filter buttons (Mind/Futó/Leállítva/Telepíthető) with live count badges computed from the DOM. Filters hide stack cards via `display: none` and update the URL with `?filter=running` via `history.replaceState`. Reads the filter from the URL on page load for deep-linking support.
  - **New `filterCategory` template function** (`funcmap.go`): Maps container state + deployed flag to filter categories (running/stopped/available). Each stack card gets a `data-filter-state` attribute for client-side filtering.
  - **Clickable dashboard stat cards**: Stat cards (Futó/Leállítva/Összes) changed from `<div>` to `<a>` with `href` linking to `/stacks?filter=running`, `/stacks?filter=stopped`, `/stacks` respectively. Hover effect with translateY + box-shadow.
  - **docker-compose.yml synced to demo node**: Fixed the stale compose file that still had the `dashboard.${DOMAIN}` Traefik label (from pre-v0.3.0). Now uses the correct `felhom.${DOMAIN}` label + `/sys:/host/sys:ro` mount.
  - **Controller version:** v0.4.1 — deployed and verified on demo-felhom.eu
- **Remaining manual tasks for Viktor (Task 2 & 3 from TASK.md):**
  - Verify `felhom.demo-felhom.eu` resolves correctly (Cloudflare Tunnel public hostname may need updating from `dashboard.*` to `felhom.*`)
  - Update Pi-hole local DNS if applicable
  - Enable backup in `controller.yaml` on demo node (`backup.enabled: true`)
  - Create `/srv/backups` directories on demo node
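
A `filterCategory`-style mapping could look like the sketch below. The state strings that count as "running" are an assumption: here, anything with live containers counts, including `starting` and `unhealthy`. In a template it would be registered in the FuncMap and used roughly as `data-filter-state="{{filterCategory .State .Deployed}}"`. The `.State` and `.Deployed` field names are hypothetical.

```go
package main

import "fmt"

// filterCategory maps a stack's container state and deployed flag to the
// data-filter-state value used by the Alkalmazások filter bar. Which states
// count as "running" is an assumption of this sketch.
func filterCategory(state string, deployed bool) string {
	if !deployed {
		return "available" // Telepíthető
	}
	switch state {
	case "running", "starting", "unhealthy":
		return "running" // Futó: containers are up, even if not yet healthy
	default:
		return "stopped" // Leállítva: exited, created, or no containers
	}
}

func main() {
	cases := []struct {
		state    string
		deployed bool
	}{{"running", true}, {"exited", true}, {"", false}}
	for _, c := range cases {
		fmt.Printf("%q deployed=%v -> %s\n", c.state, c.deployed, filterCategory(c.state, c.deployed))
	}
}
```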
|
|
|
|
### What was previously completed (2026-02-15 session 10)

- **v0.4.0 — Monitoring & Health + Backups (Phase 2 & 3):**
  - **Central job scheduler** (`internal/scheduler/scheduler.go`):
    - Replaces ad-hoc goroutines in main.go with a unified scheduler
    - `Every(name, interval, fn)` for periodic jobs, `Daily(name, timeStr, fn)` for scheduled tasks
    - Panic recovery, skip-if-running, quiet mode for high-frequency jobs (≤30s)
    - Daily jobs use `Europe/Budapest` timezone with `time.Timer` for DST correctness
    - Graceful shutdown with 30s timeout for running jobs
  - **CPU usage collector** (`internal/system/cpu_linux.go`):
    - Background goroutine samples `/proc/stat` every 5s, computes delta-based CPU %
    - Platform stubs for non-Linux in `cpu_other.go`
  - **Temperature & load metrics** (`internal/system/info_linux.go`):
    - Reads `/proc/loadavg` for 1/5/15 min load averages
    - Reads thermal zones from `/host/sys/class/thermal/` (Docker mount) with `/sys/` fallback
    - Handles millidegree values, picks highest zone, with hwmon fallback
  - **Healthchecks.io pinger** (`internal/monitor/pinger.go`):
    - HTTP ping client for Healthchecks.io-compatible endpoints
    - POST to `/ping/{uuid}` (success), `/ping/{uuid}/fail` (failure), `/ping/{uuid}/start` (started)
    - 10s timeout, 3 retries with 2s backoff, skips CHANGEME UUIDs
  - **System health checks** (`internal/monitor/healthcheck.go`):
    - Checks disk, memory, CPU, temperature, Docker reachability, protected containers
    - Returns HealthReport with status "ok"/"warn"/"fail" + formatted message for pings
  - **Database dump engine** (`internal/backup/dbdump.go`):
    - Auto-discovers PostgreSQL/MariaDB containers via `docker ps` + `docker inspect`
    - Dumps via `docker exec pg_dump`/`mariadb-dump` with 5min timeout
    - Atomic writes (`.tmp` → `.sql`), empty file detection, stale temp cleanup
  - **Restic integration** (`internal/backup/restic.go`):
    - Auto-generates repository password (32 random bytes, base64url)
    - Init, snapshot (JSON output), prune, check, stats, latest snapshot
    - Stale lock detection with automatic unlock + retry
  - **Backup orchestrator** (`internal/backup/backup.go`):
    - DB dumps + restic snapshots, weekly prune on Sundays
    - Thread-safe running flag, Healthchecks.io pings with results
    - `RunFullBackup()` for manual trigger (sequential: dumps → snapshot)
  - **Wiring updates:**
    - `main.go`: scheduler-based job registration, cpuCollector lifecycle, pinger + backupMgr init
    - `api/router.go`: `GET /api/backup/status`, `POST /api/backup/run`
    - `web/server.go` + `handlers.go`: pass cpuCollector to GetInfo(), backup status on dashboard
    - `funcmap.go`: `tempColor`, `fmtTemp`, `fmtLoad` template functions
  - **Dashboard UI enhancements:**
    - CPU usage bar with load average display below
    - Temperature with colored indicator dot (green/yellow/red at 60°/75°C)
    - Backup status card: last run time, DB count, repo size/snapshots
    - "Mentés most" button triggers manual backup via API
  - **Config updates:**
    - `controller.yaml.example`: added `system_health_interval`, `hdd_path`, `system.reserved_memory_mb`
    - `docker-compose.yml`: added `/sys:/host/sys:ro` mount for temperature reading
    - `restic_password_file` default changed to `data/` subdir (auto-generated in named volume)
  - **Controller version:** v0.4.0 — deployed and verified on demo-felhom.eu
|
|
|
|
### What was previously completed (2026-02-15 session 9)

- **v0.3.0 — Structural refactoring (templates + server split + domain rename):**
  - **Templates: go:embed migration** — moved all 7 HTML templates + CSS from Go string constants to individual files in `internal/web/templates/`. Created `embed.go` with `//go:embed` directive. Template loading now uses `ParseFS()` instead of `Parse()`. CSS served from embed.FS via `ReadFile()`. Zero runtime file dependencies — still compiled into the binary.
  - **Server decomposition** — split monolithic `server.go` (540 lines) into focused files:
    - `auth.go`: session struct, auth middleware, login/logout handlers, session management
    - `handlers.go`: page handlers (dashboard, stacks, logs, deploy, app detail)
    - `funcmap.go`: template FuncMap with 14 custom functions
    - `server.go`: Server struct, NewServer, loadTemplates (3-liner), ServeHTTP routing, render helper, static file serving
  - **Domain rename** — controller subdomain changed from `dashboard.*` to `felhom.*` in Traefik labels and setup script
  - **Documentation updated** — CLAUDE.md, README.md, CONTEXT.md all reflect new file structure
  - **Reminder for Viktor:** Update Cloudflare Tunnel public hostname (`dashboard.demo-felhom.eu` → `felhom.demo-felhom.eu`) and Pi-hole DNS if needed
  - **Controller version:** v0.3.0
|
|
|
|
### What was previously completed (2026-02-15 session 8)

- **FileBrowser as infrastructure service:**
  - Created `scripts/hdd-setup.sh` (adapted from deploy-portainer) — sets up HDD folder structure with `Dokumentumok` user dir
  - Created `scripts/docker-setup.sh` (adapted from deploy-portainer) — installs Docker, Traefik, FileBrowser as infra services
  - Added `filebrowser` to protected stacks in `controller.yaml.example`
  - Removed `templates/filebrowser/` from app-catalog-felhom.eu (no longer a catalog app)
- **Orphan stack detection and deletion:**
  - Added `Orphaned` field to Stack struct + `getCatalogTemplateSlugs()` helper
  - Orphan detection in `ScanStacks()` — deployed stacks with no matching catalog template marked as orphaned
  - New `delete.go`: `DeleteStack()` (compose down + HDD cleanup + dir removal), `GetStackHDDData()`, `parseComposeHDDMounts()`
  - Safety: protected HDD paths (root, media, storage, Dokumentumok, appdata) can never be deleted
  - New API endpoints: `DELETE /api/stacks/{name}` and `GET /api/stacks/{name}/hdd-data`
  - UI: orange "Elavult" badge on orphaned stacks, "Törlés" button, delete confirmation modal
  - Modal shows HDD data paths/sizes, checkbox for "Felhasználói adatok törlése a merevlemezről"
  - Hides "Frissítés" and "Részletek" buttons for orphaned stacks
- **Verified:** 1 orphaned stack detected on startup (filebrowser — now infra, removed from catalog)
- **Controller version:** v0.2.15
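
The protected-HDD-path rule could be enforced with a check like the sketch below. It assumes the protected directories sit directly under the HDD root, and it also rejects targets that resolve outside the root. `checkDeletable` and `protectedHDDDirs` are hypothetical names, not the real `delete.go` API.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// protectedHDDDirs mirrors the list in the notes; placing them directly
// under the HDD root is an assumption of this sketch.
var protectedHDDDirs = []string{"media", "storage", "Dokumentumok", "appdata"}

// checkDeletable returns an error if target is the HDD root, one of the
// protected top-level directories, or a path outside the HDD root.
func checkDeletable(hddRoot, target string) error {
	root := filepath.Clean(hddRoot)
	t := filepath.Clean(target) // resolves "a/../b" before comparing
	rel, err := filepath.Rel(root, t)
	if err != nil {
		return err
	}
	if rel == "." {
		return errors.New("refusing to delete the HDD root")
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return fmt.Errorf("%s is outside %s", t, root)
	}
	for _, p := range protectedHDDDirs {
		if rel == p {
			return fmt.Errorf("%s is a protected directory", t)
		}
	}
	return nil
}

func main() {
	root := "/mnt/hdd_placeholder"
	for _, p := range []string{root, root + "/storage", root + "/storage/paperless", "/etc"} {
		fmt.Printf("%-35s %v\n", p, checkDeletable(root, p))
	}
}
```

Cleaning the path before comparing matters. Otherwise a mount path such as `storage/../media` could slip past a plain string comparison.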
|
|
|
|
### Previously completed (2026-02-14 session 7)

- **Fixed YAML parse error in romm `.felhom.yml`** (app-catalog repo):
  - Root cause: Hungarian opening quote `„` (U+201E) paired with ASCII `"` (0x22) inside YAML double-quoted strings terminated the string prematurely
  - Affected lines: `help_text` for IGDB Client Secret and SteamGridDB API Key fields
  - Fix: escaped inner ASCII double quotes with `\"` in the YAML strings
  - This caused `LoadMetadata()` to silently fail and return empty defaults for ALL romm metadata (tagline, resources, category — everything)
- **Added error logging to `LoadMetadata()`** in `metadata.go`:
  - `[ERROR]` log on YAML parse failure (was silently swallowed — critical bug)
  - Temporary `[DEBUG]` log used for diagnosis, then removed
- **Fixed deploy command in CLAUDE.md**:
  - `sed` pattern now targets only `image:` lines (was matching service name too, breaking YAML)
  - Added `sudo` for both sed and docker compose (directory is root-owned)
- **Controller version:** v0.2.14
|
|
|
|
### Previously completed (2026-02-14 session 6)

- **Bug fix: App info logo SVG rendering** — `.app-info-logo` CSS in `templates.go`:
  - Added `min-width`, `min-height`, `max-width`, `max-height: 80px` and `overflow: hidden`
  - Prevents SVG images with explicit dimensions or no viewBox from overflowing the container
  - Logo now reliably renders at 80x80 regardless of SVG intrinsic size
- **Controller version:** v0.2.12
|
|
|
|
### Previously completed (2026-02-14 session 5)

- **App detail/info pages** — new feature:
  - New route: `GET /apps/{slug}` renders a full info page (previously redirected to the deploy page)
  - Hero section with logo, tagline, resource badges
  - Screenshots section (graceful — hidden via `onerror` if assets don't exist)
  - Info cards: use cases, first steps, prerequisites, default credentials, docs link
  - Optional config form with AJAX save (POST `/api/stacks/{name}/optional-config`)
  - New `.felhom.yml` fields: `app_info` (tagline, use_cases, first_steps, prerequisites, default_creds, docs_url) and `optional_config` (groups of env var fields)
  - New structs in `metadata.go`: `AppInfo`, `OptionalConfigGroup`, `OptionalConfigField`
  - `UpdateOptionalConfig` in `deploy.go`: saves optional env vars to `app.yaml`, restarts deployed stacks with `docker compose up -d` to pick up new env vars
  - Navigation updated: stack cards on dashboard/stacks pages now link to `/apps/{slug}`, deploy page has "Részletek" link back to info page
- **RoMM metadata updated** (app-catalog repo):
  - Full `app_info` section: tagline, 5 use cases, 6 first steps, 3 prerequisites, default creds, docs URL
  - 6 optional config fields for metadata providers: IGDB (client_id + secret), SteamGridDB, ScreenScraper (user + password), MobyGames
  - docker-compose.yml updated with SCREENSCRAPER_USER, SCREENSCRAPER_PASSWORD, MOBYGAMES_API_KEY env vars
  - Display name fixed: "ROMM" → "RomM"
- **Controller version:** v0.2.11
|
|
|
|
### Previously completed (2026-02-14 session 4)

- **Fixed deploy race condition** in `internal/stacks/deploy.go`:
  - In-memory `Deployed` flag now set BEFORE `docker compose up -d` (compose up can take 30-60s for image pulls)
  - On failure: both in-memory state and disk (app.yaml) are reverted
  - Eliminates stale "Telepítés" button during long compose operations
- **Added `checkBeforeDeploy()` JS guard** in `internal/web/templates.go`:
  - Telepítés buttons on Vezérlőpult and Alkalmazások pages now fetch live state from `/api/stacks/{name}` before navigating
  - If the app is already deployed (e.g., another tab deployed it), shows an alert and reloads the page instead of navigating to the deploy form
  - Catches stale UI state gracefully
|
|
|
|
### Previously completed (2026-02-14 session 3)

- **Enhanced debug logging** across all stack operations in `internal/stacks/`:
  - **Operation timing**: All stack ops (start, stop, restart, update, deploy) now log elapsed time
  - **Post-start container state check**: Async goroutine after start/restart/update/deploy
  - **Image pull detection**: Checks local images before deploy/update (debug level)
  - **GetLogs/ScanStacks improvements**: Byte count logging, deployed/available counts
  - All verbose checks gated on `cfg.Logging.Level == "debug"`; timing always at INFO
- **UI improvements** in `internal/web/templates.go` and `server.go`:
  - **Memory bar fix on deploy page**: Bar segments now always visible (min-width: 3px); the new app segment uses translucent green with a distinct border for clear visual separation from committed memory
  - **Clickable app cards**: Cards on Vezérlőpult and Alkalmazások pages are now clickable (navigate to deploy/detail page). Uses a `data-href` attribute + delegated click handler. Protected stacks excluded. Actions area (buttons, state labels) excluded from click-to-navigate
  - **Live-scrolling logs**: Logs page now auto-refreshes every 3s via AJAX polling (`?raw=1` returns plain text). Fixed-height container (70vh) with auto-scroll to bottom. Pulsing green "Élő" indicator. Pause/resume toggle ("Szüneteltetés"/"Folytatás"). User scroll position preserved when scrolled up to read history
  - **Deployment progress UI**: Deploy button no longer shows alert+redirect immediately. Instead it shows a 3-step progress panel: config saved → containers starting → app initializing. Polls `GET /api/stacks/{name}` every 3s to track actual container health state. Handles running (auto-redirect), starting (keep polling), unhealthy (warning), exited (error), and 120s timeout. Shows elapsed time counter
- **Mealie healthcheck fix** (app-catalog-felhom.eu):
  - `wget --spider` replaced with Python TCP socket check — mealie image doesn't include wget
  - `start_period` increased to 60s (DB migrations take ~40s on first start)
- **Healthcheck audit**: filebrowser (Alpine, has BusyBox wget — OK), stirling-pdf (Ubuntu, has wget — OK)
|
|
|
|
### Previously completed (2026-02-15 session 2)

- **Phase 4: Git Sync + App Catalog Audit** — major milestone
  - **Git sync module** (`internal/sync/sync.go`):
    - Clones/pulls app-catalog-felhom.eu repo to local cache on startup
    - Periodic sync based on `git.sync_interval` (default 15m)
    - Copies `docker-compose.yml` + `.felhom.yml` to stacks dir (never overwrites `app.yaml`/`.env`)
    - SHA-256 content comparison — only writes changed files
    - Triggers `ScanStacks()` after sync so dashboard updates immediately
    - Uses `os/exec` git CLI — no Go git library dependency
  - **Manual sync button** ("Sablonok frissítése") on Alkalmazások page:
    - `POST /api/sync` endpoint with 30s debounce
    - Toast notification shows result (success/failure/what changed)
    - Auto-reloads page if new apps or updates detected
  - **Sync status** added to `/api/system/info` (last_sync, last_status, syncing flag)
  - **.felhom.yml files created for all 10 apps** (paperless-ngx already had one):
    - actualbudget, docmost, filebrowser, homebox, immich, mealie, romm, stirling-pdf, vaultwarden
    - All follow the same format: display_name, description, category, subdomain, resources, deploy_fields
  - **Docker Compose templates audited and fixed** for all 10 apps:
    - Fixed `{{DOMAIN}}` → `${DOMAIN}` syntax in homebox, mealie, romm, stirling-pdf
    - Fixed `{{HDD_PATH}}` → `${HDD_PATH}` in romm
    - Added `deploy.resources.limits.memory` to all services across all templates
    - Added `TZ=Europe/Budapest` to all sidecar services (postgres, redis, mariadb)
    - Added healthcheck to romm main service
    - Added `romm-redis` `condition: service_healthy` (was `service_started`)
    - Standardized header comment blocks across all templates
  - **Documentation updated**: app-catalog README, CLAUDE.md, CONTEXT.md
|
|
|
|
### Previously completed (2026-02-15 session 1)

- **Memory validation during deployment**:
  - Pre-deploy memory check: compares `mem_request` sum against usable system RAM
  - Hard block if requests exceed usable memory (total - 384MB reserved)
  - Soft warning if `mem_limit` sum exceeds total RAM (overcommit OK for limits)
  - `ParseMemoryMB()` supports "500M", "1G", "1.5G", "1024" formats
  - `CommittedMemory()` sums requests/limits across all deployed stacks
  - Memory summary bar shown on deploy page before user clicks deploy
  - `system.reserved_memory_mb` configurable in controller.yaml (default: 384)
- **Display: `~` prefix on mem_request** in UI badges (display-only, exact value stored)
- **Felhom.eu logo** replaced text logos in sidebar and login page with actual SVG logo
  - Logo SVG embedded as Go string constant, served at `/static/felhom-logo.svg`
|
|
|
|
### Previously completed (2026-02-14)

- **System info bar on Vezérlőpult dashboard**: RAM, SSD, and optional HDD usage
  - Progress bars with color coding (green < 70%, yellow 70-85%, red > 85%)
  - New `internal/system` package reads `/proc/meminfo` + `syscall.Statfs`
  - Platform-specific: Linux impl + non-Linux stub (build tags)
  - Hungarian labels: "Memória", "SSD tárhely", "Külső HDD"
- **Docker Compose memory limits** on paperless-ngx template:
  - paperless-webserver: 768M, postgres: 256M, redis: 128M
  - Added `mem_limit` field to `.felhom.yml` ResourceHints (total: 1152M)
- **`/api/system/info` endpoint** now returns live system metrics (previously returned customer info)
- **Config**: Added `paths.hdd_path` for external HDD monitoring
- Controller image builds via build.sh, pushes to Gitea container registry
|
|
|
|
### Previously completed (2026-02-13)

- Built the entire felhom-controller from scratch (Go, no frameworks)
- Debugged and fixed 7 issues during first real deployment:
  1. Password validation (empty passwords accepted)
  2. In-memory Deployed flag not updating after deploy
  3. Health-aware state parsing (starting/unhealthy detection)
  4. Random card ordering (Go map iteration)
  5. "Részletek" button redirect for deployed apps
  6. Paperless OCR language installation (LANGUAGES vs LANGUAGE env var)
  7. Documentation: restart vs up -d for image updates
|
|
|
|
### What's next (priorities)

1. **Configure Healthchecks.io UUIDs** on demo-felhom.eu (replace CHANGEME in controller.yaml)
2. **Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
3. **Test orphan delete flow** — try deleting the orphaned filebrowser stack via the UI
4. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
5. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets)
6. Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
7. Test on Raspberry Pi (pi-customer-1)
8. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
9. Phase 4: Self-update mechanism
|
|
|
|
## Architecture decisions

| Decision | Rationale |
|----------|-----------|
| Go stdlib for web (no Gin/Echo) | Minimal dependencies, single binary, easy to embed templates |
| Templates as go:embed HTML/CSS files | Zero runtime file dependencies (compiled into binary), but each template is a separate editable file |
| Docker Compose for customers (not k8s) | Simpler troubleshooting, customers don't need k8s knowledge |
| k3s for management infra only | Viktor's own services (gitea, monitoring, website) run on k3s |
| Cloudflare Tunnel for remote access | No port forwarding needed, works behind any NAT |
| app.yaml per stack | Separates deploy config from compose files, survives git pulls |
| Password fields require explicit input | Prevents accidental empty-password deployments |
| Health-aware state from Docker Status field | Docker's State says "running" even for unhealthy containers |
| Memory limits via deploy.resources.limits | Prevents runaway containers; ~50% headroom over expected usage |
| System info from /proc/meminfo + statfs | No external dependencies, cheap to read on each page load |
| mem_request vs mem_limit (K8s-inspired) | Requests = expected usage (hard block), limits = peak (overcommit OK) |
| 384MB reserved for system | Prevents deploying apps that would starve the OS/controller |
| Logo SVG embedded as Go constant | Same approach as CSS/HTML — zero external file deps |
| Git sync via os/exec git CLI | No Go git library needed, git is in the container image |
| SHA-256 for content comparison | Only copy changed files, avoid unnecessary disk writes |
| 30s debounce on manual sync | Prevents spamming the git server |
| Orphan = deployed but not in catalog | Safe lifecycle: remove from catalog → mark orphaned → user deletes via UI |
| FileBrowser as infra (not catalog) | Needed even after apps deleted (user browses HDD data); deployed by setup script |
| Protected HDD paths | Safety net: never delete top-level HDD dirs (media, storage, Dokumentumok, appdata) |
| Central scheduler (not ad-hoc goroutines) | Single place to register/monitor all periodic tasks, graceful shutdown, skip-if-running |
| CPU sampling via background goroutine | /proc/stat delta needs two readings — collector runs every 5s, GetInfo() reads cached value |
| Temperature from /host/sys (Docker mount) | Container can't read host /sys directly — mount /sys:/host/sys:ro, try /host/sys first |
| Restic password auto-generated | No manual setup needed — generated on first backup run, stored in named volume |
| DB discovery via docker inspect | No config needed — discovers postgres/mariadb containers by image name + env vars |
| Backup orchestrator with running flag | Prevents concurrent backups, supports both scheduled and manual trigger |
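
The "two readings" behind the CPU sampling decision work as follows. CPU busy % is the non-idle share of the jiffies that elapsed between two samples of the aggregate `cpu` line in `/proc/stat`. The sketch shows only the arithmetic. In the controller, a background goroutine would take a sample every 5s and cache the result for `GetInfo()`. Two choices here are assumptions, though common conventions: `idle + iowait` is counted as idle, and only the first eight counters are summed, because guest time is already included in user/nice.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// cpuSample holds aggregate jiffy counters from the "cpu" line of /proc/stat.
type cpuSample struct {
	idle, total uint64
}

// parseCPULine parses "cpu user nice system idle iowait irq softirq steal guest guest_nice".
func parseCPULine(line string) (cpuSample, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || fields[0] != "cpu" {
		return cpuSample{}, fmt.Errorf("unexpected /proc/stat line: %q", line)
	}
	var s cpuSample
	for i, f := range fields[1:] {
		if i >= 8 { // skip guest/guest_nice: already counted in user/nice
			break
		}
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return cpuSample{}, err
		}
		s.total += v
		if i == 3 || i == 4 { // idle, iowait
			s.idle += v
		}
	}
	return s, nil
}

// cpuPercent returns the busy percentage between two samples, clamped to [0, 100].
func cpuPercent(prev, cur cpuSample) float64 {
	if cur.total <= prev.total {
		return 0
	}
	dTotal := float64(cur.total - prev.total)
	dIdle := float64(cur.idle) - float64(prev.idle) // iowait can occasionally go backwards
	pct := (dTotal - dIdle) / dTotal * 100
	return math.Max(0, math.Min(100, pct))
}

func main() {
	prev, err := parseCPULine("cpu  100 0 100 800 0 0 0 0 0 0")
	if err != nil {
		panic(err)
	}
	cur, err := parseCPULine("cpu  150 0 150 900 0 0 0 0 0 0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("CPU: %.1f%%\n", cpuPercent(prev, cur))
}
```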
|
|
|
|
## Key file locations on demo-felhom

```
/opt/docker/felhom-controller/    # Controller compose + config
├── controller.yaml               # Customer config (domain, auth, paths)
├── docker-compose.yml            # Controller's own compose
└── .env                          # DOMAIN=demo-felhom.eu

/opt/docker/stacks/               # All app stacks
├── traefik/                      # Reverse proxy (protected)
├── cloudflared/                  # Tunnel (protected)
├── paperless-ngx/                # First deployed app ✅
│   ├── docker-compose.yml
│   ├── .felhom.yml               # App metadata
│   └── app.yaml                  # Deploy config (env vars, locked fields)
└── whoami/                       # Test stack (not deployed)

/mnt/hdd_placeholder/storage/     # HDD storage for apps
└── paperless/
    ├── consume/                  # Drop files here for OCR
    ├── media/                    # Processed documents
    └── export/                   # Backup exports
```
|
|
|
|
## Related repositories and their state

| Repository | Status | Notes |
|------------|--------|-------|
| deploy-felhom-compose | Active | This repo. Controller code + deploy scripts |
| app-catalog-felhom.eu | Active | 10 app templates, all with .felhom.yml metadata + memory limits |
| felhom.eu | Stable | Website live, SEO indexed, email working |
| homelab-manifests | Stable | k3s cluster running (dooplex.hu services) |
| misc-scripts | Utility | collect-repo.sh, backup helpers |
|
|
|
|
## Gotchas & lessons learned

- `docker compose restart` ≠ `docker compose up -d` — restart doesn't pick up new images
- Go maps have random iteration order — always sort slices before displaying
- Docker `.State`="running" doesn't mean healthy — check `.Status` for "(health: starting)" / "(unhealthy)"
- Paperless-ngx needs `PAPERLESS_OCR_LANGUAGES` (plural) to install language packs, `PAPERLESS_OCR_LANGUAGE` (singular) to select
- In-memory Deployed flag must be set BEFORE `docker compose up -d` (not after) — compose can take 30-60s for image pulls, during which the UI would show a stale "Telepítés" button
- Cloudflare Tunnel handles `*.demo-felhom.eu` → Traefik handles Host()-based routing to containers
- BIOS "AC Power Recovery" must be enabled on N100 for auto-restart after power outage
- `docker compose up -d` returns exit 0 even when containers immediately crash-loop — need post-start status check to detect this
- When logging env vars for debugging, only log keys (not values) to avoid leaking secrets in log files
- Mealie image (`ghcr.io/mealie-recipes/mealie`) doesn't include wget/curl — use Python TCP socket check for healthcheck
- Mealie DB migrations on first start take ~40s (alembic) — use `start_period: 60s` to avoid premature unhealthy status
- Alpine-based images (filebrowser, vaultwarden) have wget via BusyBox — healthchecks with `wget --spider` work fine
- Deploy `sed` command to update image version must target only the `image:` line — naive `sed 's|name:OLD|name:NEW|'` also matches the service name line (e.g., `felhom-controller:` → `felhom-controller:0.2.12`), breaking YAML. Use `sudo sed -i 's|image:.*felhom-controller:[^ ]*|image: ...felhom-controller:NEW|'` or a similarly scoped pattern
- Hungarian quotation marks `„”` in YAML: `„` (U+201E) is safe inside YAML double-quoted strings, but the closing quote must NOT be ASCII `"` (0x22), because that terminates the YAML string. Use the `\"` escape or the Unicode `”` (U+201D). This caused a silent parse failure for the entire `.felhom.yml` file
- Never silently swallow parse errors — always log them. Silent failures make debugging impossible (took a dedicated debug session to find a simple quoting issue)
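
The `.State` vs `.Status` gotcha can be made concrete with a small mapping like the one below. It is a sketch with a hypothetical `healthState` name. It relies only on the health suffixes that `docker ps` prints in its status column: `(healthy)`, `(unhealthy)` and `(health: starting)`.

```go
package main

import (
	"fmt"
	"strings"
)

// healthState derives a display state from Docker's State and Status fields.
// State stays "running" while a healthcheck is starting or failing, so the
// parenthesised health suffix in Status decides.
func healthState(state, status string) string {
	if state != "running" {
		return state // e.g. "exited", "restarting", "created"
	}
	switch {
	case strings.Contains(status, "(health: starting)"):
		return "starting"
	case strings.Contains(status, "(unhealthy)"):
		return "unhealthy"
	default:
		return "running" // "(healthy)" or no healthcheck defined
	}
}

func main() {
	fmt.Println(healthState("running", "Up 12 seconds (health: starting)"))
	fmt.Println(healthState("running", "Up 5 minutes (unhealthy)"))
	fmt.Println(healthState("running", "Up 2 hours (healthy)"))
	fmt.Println(healthState("exited", "Exited (1) 3 minutes ago"))
}
```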