# CONTEXT.md — Project Memory

> This file serves as persistent project memory across Claude Code sessions.
> It replaces the auto-generated "Memory" from the claude.ai Project.
> **Update this file at the end of each working session** with current state,
> recent decisions, and anything the next session needs to know.
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"

Last updated: 2026-02-14 (session 6)

---

## About Viktor (project owner)

- Works at Magyar Telekom (Budapest), building Felhom as a side business
- Felhom: managed home-server service for Hungarian households
- Technical but prefers pragmatic solutions over over-engineering
- Runs all infrastructure on Gitea (gitea.dooplex.hu), k3s cluster for management
- Customer deployments use Docker Compose (not Kubernetes) for simplicity

## Current project state

### felhom-controller (this repo)

- **Version:** v0.2.12
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth

### What was just completed (2026-02-14 session 6)

- **Bug fix: App info logo SVG rendering** — `.app-info-logo` CSS in `templates.go`:
  - Added `min-width`, `min-height`, `max-width`, `max-height: 80px` and `overflow: hidden`
  - Prevents SVG images with explicit dimensions or no viewBox from overflowing container
  - Logo now reliably renders at 80x80 regardless of SVG intrinsic size
- **Bug 2 (app info sections not rendering) was already resolved** — confirmed:
  - Demo node `.felhom.yml` already has `app_info` and `optional_config` sections (synced from previous session)
  - `docker-compose.yml` already has SCREENSCRAPER_USER, SCREENSCRAPER_PASSWORD, MOBYGAMES_API_KEY env vars
- **Controller version:** v0.2.12

### Previously completed (2026-02-14 session 5)

- **App detail/info pages** — new
  feature:
  - New route: `GET /apps/{slug}` renders a full info page (was redirect to deploy page)
  - Hero section with logo, tagline, resource badges
  - Screenshots section (graceful — hidden via `onerror` if assets don't exist)
  - Info cards: use cases, first steps, prerequisites, default credentials, docs link
  - Optional config form with AJAX save (POST `/api/stacks/{name}/optional-config`)
  - New `.felhom.yml` fields: `app_info` (tagline, use_cases, first_steps, prerequisites, default_creds, docs_url) and `optional_config` (groups of env var fields)
  - New structs in `metadata.go`: `AppInfo`, `OptionalConfigGroup`, `OptionalConfigField`
  - `UpdateOptionalConfig` in `deploy.go`: saves optional env vars to `app.yaml`, restarts deployed stacks with `docker compose up -d` to pick up new env vars
  - Navigation updated: stack cards on dashboard/stacks pages now link to `/apps/{slug}`, deploy page has "Részletek" link back to info page
- **RoMM metadata updated** (app-catalog repo):
  - Full `app_info` section: tagline, 5 use cases, 6 first steps, 3 prerequisites, default creds, docs URL
  - 6 optional config fields for metadata providers: IGDB (client_id + secret), SteamGridDB, ScreenScraper (user + password), MobyGames
  - docker-compose.yml updated with SCREENSCRAPER_USER, SCREENSCRAPER_PASSWORD, MOBYGAMES_API_KEY env vars
  - Display name fixed: "ROMM" → "RomM"
- **Controller version:** v0.2.11

### Previously completed (2026-02-14 session 4)

- **Fixed deploy race condition** in `internal/stacks/deploy.go`:
  - In-memory `Deployed` flag now set BEFORE `docker compose up -d` (compose up can take 30-60s for image pulls)
  - On failure: both in-memory state and disk (app.yaml) are reverted
  - Eliminates stale "Telepítés" button during long compose operations
- **Added `checkBeforeDeploy()` JS guard** in `internal/web/templates.go`:
  - Telepítés buttons on Vezérlőpult and Alkalmazások pages now fetch live state from `/api/stacks/{name}` before navigating
  - If app is already deployed (e.g.,
    another tab deployed it), shows alert and reloads page instead of navigating to deploy form
  - Catches stale UI state gracefully

### Previously completed (2026-02-14 session 3)

- **Enhanced debug logging** across all stack operations in `internal/stacks/`:
  - **Operation timing**: All stack ops (start, stop, restart, update, deploy) now log elapsed time
  - **Post-start container state check**: Async goroutine after start/restart/update/deploy
  - **Image pull detection**: Checks local images before deploy/update (debug level)
  - **GetLogs/ScanStacks improvements**: Byte count logging, deployed/available counts
  - All verbose checks gated on `cfg.Logging.Level == "debug"`; timing always at INFO
- **UI improvements** in `internal/web/templates.go` and `server.go`:
  - **Memory bar fix on deploy page**: Bar segments now always visible (min-width: 3px), new app segment uses translucent green with distinct border for clear visual separation from committed memory
  - **Clickable app cards**: Cards on Vezérlőpult and Alkalmazások pages are now clickable (navigates to deploy/detail page). Uses `data-href` attribute + delegated click handler. Protected stacks excluded. Actions area (buttons, state labels) excluded from click-to-navigate
  - **Live-scrolling logs**: Logs page now auto-refreshes every 3s via AJAX polling (`?raw=1` returns plain text). Fixed-height container (70vh) with auto-scroll to bottom. Pulsing green "Élő" indicator. Pause/resume toggle ("Szüneteltetés"/"Folytatás"). User scroll position preserved when scrolled up to read history
  - **Deployment progress UI**: Deploy button no longer shows alert+redirect immediately. Instead shows 3-step progress panel: config saved → containers starting → app initializing. Polls `GET /api/stacks/{name}` every 3s to track actual container health state. Handles running (auto-redirect), starting (keep polling), unhealthy (warning), exited (error), and 120s timeout.
    Shows elapsed time counter
- **Mealie healthcheck fix** (app-catalog-felhom.eu):
  - `wget --spider` replaced with Python TCP socket check — mealie image doesn't include wget
  - `start_period` increased to 60s (DB migrations take ~40s on first start)
- **Healthcheck audit**: filebrowser (Alpine, has BusyBox wget — OK), stirling-pdf (Ubuntu, has wget — OK)

### Previously completed (2026-02-15 session 2)

- **Phase 4: Git Sync + App Catalog Audit** — major milestone
- **Git sync module** (`internal/sync/sync.go`):
  - Clones/pulls app-catalog-felhom.eu repo to local cache on startup
  - Periodic sync based on `git.sync_interval` (default 15m)
  - Copies `docker-compose.yml` + `.felhom.yml` to stacks dir (never overwrites `app.yaml`/`.env`)
  - SHA-256 content comparison — only writes changed files
  - Triggers `ScanStacks()` after sync so dashboard updates immediately
  - Uses `os/exec` git CLI — no Go git library dependency
- **Manual sync button** ("Sablonok frissítése") on Alkalmazások page:
  - `POST /api/sync` endpoint with 30s debounce
  - Toast notification shows result (success/failure/what changed)
  - Auto-reloads page if new apps or updates detected
- **Sync status** added to `/api/system/info` (last_sync, last_status, syncing flag)
- **.felhom.yml files created for all 10 apps** (paperless-ngx already had one):
  - actualbudget, docmost, filebrowser, homebox, immich, mealie, romm, stirling-pdf, vaultwarden
  - All follow the same format: display_name, description, category, subdomain, resources, deploy_fields
- **Docker Compose templates audited and fixed** for all 10 apps:
  - Fixed `{{DOMAIN}}` → `${DOMAIN}` syntax in homebox, mealie, romm, stirling-pdf
  - Fixed `{{HDD_PATH}}` → `${HDD_PATH}` in romm
  - Added `deploy.resources.limits.memory` to all services across all templates
  - Added `TZ=Europe/Budapest` to all sidecar services (postgres, redis, mariadb)
  - Added healthcheck to romm main service
  - Added `romm-redis` `condition: service_healthy` (was `service_started`)
  - Standardized header comment blocks across all templates
- **Documentation updated**: app-catalog README, CLAUDE.md, CONTEXT.md

### Previously completed (2026-02-15 session 1)

- **Memory validation during deployment**:
  - Pre-deploy memory check: compares `mem_request` sum against usable system RAM
  - Hard block if requests exceed usable memory (total - 384MB reserved)
  - Soft warning if `mem_limit` sum exceeds total RAM (overcommit OK for limits)
  - `ParseMemoryMB()` supports "500M", "1G", "1.5G", "1024" formats
  - `CommittedMemory()` sums requests/limits across all deployed stacks
  - Memory summary bar shown on deploy page before user clicks deploy
  - `system.reserved_memory_mb` configurable in controller.yaml (default: 384)
- **Display: `~` prefix on mem_request** in UI badges (display-only, exact value stored)
- **Felhom.eu logo** replaced text logos in sidebar and login page with actual SVG logo
  - Logo SVG embedded as Go string constant, served at `/static/felhom-logo.svg`

### Previously completed (2026-02-14)

- **System info bar on Vezérlőpult dashboard**: RAM, SSD, and optional HDD usage
  - Progress bars with color coding (green < 70%, yellow 70-85%, red > 85%)
  - New `internal/system` package reads `/proc/meminfo` + `syscall.Statfs`
  - Platform-specific: Linux impl + non-Linux stub (build tags)
  - Hungarian labels: "Memória", "SSD tárhely", "Külső HDD"
- **Docker Compose memory limits** on paperless-ngx template:
  - paperless-webserver: 768M, postgres: 256M, redis: 128M
  - Added `mem_limit` field to `.felhom.yml` ResourceHints (total: 1152M)
- **`/api/system/info` endpoint** now returns live system metrics (was customer info)
- **Config**: Added `paths.hdd_path` for external HDD monitoring
- Controller image builds via build.sh, pushes to Gitea container registry

### Previously completed (2026-02-13)

- Built the entire felhom-controller from scratch (Go, no frameworks)
- Debugged and fixed 7 issues during first real deployment:
  1. Password validation (empty passwords accepted)
  2. In-memory Deployed flag not updating after deploy
  3. Health-aware state parsing (starting/unhealthy detection)
  4. Random card ordering (Go map iteration)
  5. "Részletek" button redirect for deployed apps
  6. Paperless OCR language installation (LANGUAGES vs LANGUAGE env var)
  7. Documentation: restart vs up -d for image updates

### What's next (priorities)

1. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
2. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets) to validate all .felhom.yml files
3. Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
4. Test on Raspberry Pi (pi-customer-1)
5. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
6. Phase 2 continued: CPU/temperature metrics, Healthchecks.io pings
7. Phase 3: Backup system (DB dumps + restic)

## Architecture decisions

| Decision | Rationale |
|----------|-----------|
| Go stdlib for web (no Gin/Echo) | Minimal dependencies, single binary, easy to embed templates |
| Templates as Go string constants | Zero runtime file dependencies, everything in the binary |
| Docker Compose for customers (not k8s) | Simpler troubleshooting, customers don't need k8s knowledge |
| k3s for management infra only | Viktor's own services (gitea, monitoring, website) run on k3s |
| Cloudflare Tunnel for remote access | No port forwarding needed, works behind any NAT |
| app.yaml per stack | Separates deploy config from compose files, survives git pulls |
| Password fields require explicit input | Prevents accidental empty-password deployments |
| Health-aware state from Docker Status field | Docker's State says "running" even for unhealthy containers |
| Memory limits via deploy.resources.limits | Prevents runaway containers; ~50% headroom over expected usage |
| System info from /proc/meminfo + statfs | No external dependencies, cheap to read on each page load |
| mem_request vs mem_limit (K8s-inspired) | Requests = expected usage (hard block), limits = peak (overcommit OK) |
| 384MB reserved for system | Prevents deploying apps that would starve the OS/controller |
| Logo SVG embedded as Go constant | Same approach as CSS/HTML — zero external file deps |
| Git sync via os/exec git CLI | No Go git library needed, git is in the container image |
| SHA-256 for content comparison | Only copy changed files, avoid unnecessary disk writes |
| 30s debounce on manual sync | Prevents spamming the git server |

## Key file locations on demo-felhom

```
/opt/docker/felhom-controller/     # Controller compose + config
├── controller.yaml                # Customer config (domain, auth, paths)
├── docker-compose.yml             # Controller's own compose
└── .env                           # DOMAIN=demo-felhom.eu

/opt/docker/stacks/                # All app stacks
├── traefik/                       # Reverse proxy (protected)
├── cloudflared/                   # Tunnel (protected)
├── paperless-ngx/                 # First deployed app ✅
│   ├── docker-compose.yml
│   ├── .felhom.yml                # App metadata
│   └── app.yaml                   # Deploy config (env vars, locked fields)
└── whoami/                        # Test stack (not deployed)

/mnt/hdd_placeholder/storage/      # HDD storage for apps
└── paperless/
    ├── consume/                   # Drop files here for OCR
    ├── media/                     # Processed documents
    └── export/                    # Backup exports
```

## Related repositories and their state

| Repository | Status | Notes |
|------------|--------|-------|
| deploy-felhom-compose | Active | This repo. Controller code + deploy scripts |
| app-catalog-felhom.eu | Active | 10 app templates, all with .felhom.yml metadata + memory limits |
| felhom.eu | Stable | Website live, SEO indexed, email working |
| homelab-manifests | Stable | k3s cluster running (dooplex.hu services) |
| misc-scripts | Utility | collect-repo.sh, backup helpers |

## Gotchas & lessons learned

- `docker compose restart` ≠ `docker compose up -d` — restart doesn't pick up new images
- Go maps have random iteration order — always sort slices before displaying
- Docker `.State`="running" doesn't mean healthy — check `.Status` for "(health: starting)" / "(unhealthy)"
- Paperless-ngx needs `PAPERLESS_OCR_LANGUAGES` (plural) to install language packs, `PAPERLESS_OCR_LANGUAGE` (singular) to select
- In-memory Deployed flag must be set BEFORE `docker compose up -d` (not after) — compose can take 30-60s for image pulls, during which the UI would show a stale "Telepítés" button
- Cloudflare Tunnel handles *.demo-felhom.eu → Traefik handles Host()-based routing to containers
- BIOS "AC Power Recovery" must be enabled on N100 for auto-restart after power outage
- `docker compose up -d` returns exit 0 even when containers immediately crash-loop — need post-start status check to detect this
- When logging env vars for debugging, only log keys (not values) to avoid leaking secrets in log files
- Mealie image (`ghcr.io/mealie-recipes/mealie`) doesn't include wget/curl — use Python TCP socket check for healthcheck
- Mealie DB migrations on first start take ~40s (alembic) — use `start_period: 60s` to avoid premature unhealthy status
- Alpine-based images (filebrowser, vaultwarden) have wget via BusyBox — healthchecks with `wget --spider` work fine
- Deploy `sed` command to update image version must target only the `image:` line — naive `sed 's|name:OLD|name:NEW|'` also matches the service name line (e.g., `felhom-controller:` → `felhom-controller:0.2.12`), breaking YAML.
Use `sudo sed -i 's|image:.*felhom-controller:[^ ]*|image: ...felhom-controller:NEW|'` or similar scoped pattern
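
The `.State` versus `.Status` gotcha above could be handled roughly as follows. This is a minimal Go sketch, not the controller's actual code: `effectiveState` is a hypothetical name, and the sample status strings assume the usual `docker ps` wording ("Up ... (health: starting)", "Up ... (unhealthy)").

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveState is a hypothetical helper (not taken from the controller
// source). It combines Docker's State and Status fields into one
// health-aware state string.
func effectiveState(state, status string) string {
	// Anything other than "running" (exited, created, ...) is reported as-is.
	if state != "running" {
		return state
	}
	switch {
	case strings.Contains(status, "(health: starting)"):
		return "starting"
	case strings.Contains(status, "(unhealthy)"):
		return "unhealthy"
	default:
		// Covers "(healthy)" and containers without a healthcheck.
		return "running"
	}
}

func main() {
	// Sample inputs are illustrative, not captured from a real node.
	samples := []struct{ state, status string }{
		{"running", "Up 2 seconds (health: starting)"},
		{"running", "Up 3 minutes (unhealthy)"},
		{"running", "Up 5 minutes (healthy)"},
		{"exited", "Exited (1) 10 seconds ago"},
	}
	for _, s := range samples {
		fmt.Printf("%s | %s -> %s\n", s.state, s.status, effectiveState(s.state, s.status))
	}
}
```

Matching on the parenthesised suffix keeps the check independent of the uptime text that precedes it.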