# CONTEXT.md — Project Memory

> This file serves as persistent project memory across Claude Code sessions.
> It replaces the auto-generated "Memory" from the claude.ai Project.
> **Update this file at the end of each working session** with current state,
> recent decisions, and anything the next session needs to know.
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"

Last updated: 2026-02-17 (session 28)

---

## About Viktor (project owner)

- Works at Deutsche Telekom (Budapest), building Felhom.eu as a side business
- Felhom.eu: managed home-server service for Hungarian households
- Technical but prefers pragmatic solutions over over-engineering
- Runs all infrastructure on Gitea (gitea.dooplex.hu), k3s cluster for management
- Customer deployments use Docker Compose (not Kubernetes) for simplicity

### felhom-controller (this repo)

- **Version:** v0.11.4
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
- **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger, **dedicated backup page**)
- **Phase 4:** ✅ COMPLETE — Monitoring Page with Metrics Store (SQLite, Chart.js, system + container metrics)
- **Phase 5:** ✅ COMPLETE — Authentication, Persistence & Settings Page (settings.json, password change, session management)
- **Phase 6:** ✅ COMPLETE — Monitoring Warnings, Dashboard Alerts & Notification System
- **Phase 7:** ✅ COMPLETE — Storage Overview, Per-App Backup Toggles & Limited Restore
- **Phase A:** ✅ COMPLETE — Storage Paths Foundation (registry, auto-discovery, per-app HDD_PATH, deploy dropdown, health monitoring)
- **Phase B:** ✅ COMPLETE — Storage Management UI Polish & Health Severity Fix (flash messages, label editing, app details, FS info, deploy free space, backup context)
- **Phase C:** ✅ COMPLETE — Storage Init Wizard, Data Migration & Startup Fix (disk scan/format/mount wizard, rsync-based migration, startup pings)
- **v0.11.1 bugfix:** ✅ COMPLETE — Storage Scan: system disk detection via host fstab + blkid UUID resolution; FSType enrichment via `blkid -o export`
- **v0.11.2 bugfix:** ✅ COMPLETE — /host-dev mount for block device access; `HostDevicePath()` helper; all format/scan/safety ops use /host-dev
- **v0.11.3 bugfix:** ✅ COMPLETE — Added `fdisk` package to Dockerfile (provides `sfdisk`; not in `util-linux` on Debian bookworm)
- **v0.11.4 bugfix:** ✅ COMPLETE — FormatAndMount: fixed sfdisk (wipefs+force+`,,`), mount (explicit device path), mount propagation (rshared), ASCII label, smart partition skip, findmnt verification
- **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1-5 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups, backup detail page, system monitoring page, settings page

## Architecture decisions

| Decision | Rationale |
|----------|-----------|
| Go stdlib for web (no Gin/Echo) | Minimal dependencies, single binary, easy to embed templates |
| Templates as go:embed HTML/CSS files | Zero runtime file dependencies (compiled into binary), but each template is a separate editable file |
| Docker Compose for customers (not k8s) | Simpler troubleshooting, customers don't need k8s knowledge |
| k3s for management infra only | Viktor's own services (gitea, monitoring, website) run on k3s |
| Cloudflare Tunnel for remote access | No port forwarding needed, works behind any NAT |
| app.yaml per stack | Separates deploy config from compose files, survives git pulls |
| Password fields require explicit input | Prevents accidental empty-password deployments |
| Health-aware state from Docker Status field | Docker's State says "running" even for unhealthy containers |
| Memory limits via deploy.resources.limits | Prevents runaway containers; ~50% headroom over expected usage |
| System info from /proc/meminfo + statfs | No external dependencies, cheap to read on each page load |
| mem_request vs mem_limit (K8s-inspired) | Requests = expected usage (hard block), limits = peak (overcommit OK) |
| 384MB reserved for system | Prevents deploying apps that would starve the OS/controller |
| Logo SVG embedded as Go constant | Same approach as CSS/HTML — zero external file deps |
| Git sync via os/exec git CLI | No Go git library needed, git is in the container image |
| SHA-256 for content comparison | Only copy changed files, avoid unnecessary disk writes |
| 30s debounce on manual sync | Prevents spamming the git server |
| Orphan = deployed but not in catalog | Safe lifecycle: remove from catalog → mark orphaned → user deletes via UI |
| FileBrowser as infra (not catalog) | Needed even after apps deleted (user browses HDD data); deployed by setup script |
| Protected HDD paths | Safety net: never delete top-level HDD dirs (media, storage, Dokumentumok, appdata) |
| Central scheduler (not ad-hoc goroutines) | Single place to register/monitor all periodic tasks, graceful shutdown, skip-if-running |
| CPU sampling via background goroutine | /proc/stat delta needs two readings — collector runs every 5s, GetInfo() reads cached value |
| Temperature from /host/sys (Docker mount) | Container can't read host /sys directly — mount /sys:/host/sys:ro, try /host/sys first |
| Restic password auto-generated | No manual setup needed — generated on first backup run, stored in named volume |
| DB discovery via docker inspect | No config needed — discovers postgres/mariadb containers by image name + env vars |
| Backup orchestrator with running flag | Prevents concurrent backups, supports both scheduled and manual trigger |
| modernc.org/sqlite (pure Go) | No CGO/gcc needed in Docker build stage — keeps `CGO_ENABLED=0` static binary |
| AlertManager state-based refresh | Alerts regenerated every 5min from health report — no persistent storage needed, always reflects current state |
| Notification relay via hub | Controller → hub → Resend → email. Hub acts as central relay: knows customer email, handles Resend API. Controller only needs hub URL + API key |
| In-memory notification cooldowns | Per-event-type cooldown map (default 6h). Lost on restart = acceptable (better to re-notify than miss). No persistence needed |
| Health status change detection | Only notify on degradation (ok→warn, ok→fail, warn→fail). Avoids spam on flapping. First run records baseline, doesn't notify |
| Resend HTTP API (no SMTP) | Direct POST to api.resend.com — same pattern as website contact-mailer. Simpler than SMTP setup, good deliverability |
| Preferences sync on save + startup | Controller pushes prefs to hub (not pull). Startup sync handles hub DB rebuild. Local save always succeeds even if sync fails |
| Chart.js embedded locally | Customer hardware may not have internet — CDN not reliable for offline environments |
| StackDataProvider interface | backup package needs stack data but can't import stacks (circular). Interface in backup, thin adapter in main.go |
| Password sync to hub via report | Restic password in Docker named volume on SSD. Hub sync provides redundancy for disaster recovery |
| App backup via HDD mounts only | Docker volumes at /var/lib/docker/volumes/ not mounted in controller. HDD data is the important user data; DB in volumes covered by nightly dump |
| Restore uses running mutex | Prevents concurrent backup+restore on same restic repo. Reuses existing `m.running` flag |
| Storage paths registry in settings.json | Multi-storage support: each app's HDD_PATH from app.yaml is authoritative. Auto-discovery on startup avoids manual config. Registry enables UI management + health monitoring per path |
| /mnt:/mnt:rw mount in controller | Replaces per-path HDD_PATH mount. Enables multi-storage + restore writes. All customer HDD mounts are under /mnt/ by convention |
| Per-app HDD_PATH resolution (app.yaml > global) | App's own env HDD_PATH is Priority 1, registered storage paths as fallback. Eliminates dependency on global controller.yaml hdd_path |
| Mount-point detection via syscall.Stat_t.Dev | Compares device ID of path vs parent dir — reliable check that path is on separate filesystem. Prevents data writes to SSD |
| Health severity: mount-point = warning | Non-mount-point is informational, not a service failure. FAIL reserved for genuinely broken things. Avoids false alarms on demo/test environments |
| FS info via findmnt + sysfs | `findmnt -n -o SOURCE,FSTYPE --target ` for filesystem type/device. `/sys/block//device/model` for disk model. Best-effort, returns nil on failure |
| Query param flash messages | Stateless, no session store needed. Consistent with backup page pattern. `?storage_msg=success&storage_detail=...` |
| StorageLabels map on stacks page | Separate map passed to template (not modifying Stack struct). Built from deployed apps' HDD_PATH → registered path label lookup |
| Metrics downsampling via SQL | Bucket-based AVG in GROUP BY keeps Chart.js responsive with up to 30 days of data |
| 60s metrics collection interval | Good balance of resolution vs. storage — ~44K rows/month for system metrics |
| /etc/os-release mounted read-only | Container can't read host OS info directly — mount to /host/etc/os-release:ro |

## Key file locations on demo-felhom

```
/opt/docker/felhom-controller/          # Controller compose + config
├── controller.yaml                     # Customer config (domain, auth, paths)
├── docker-compose.yml                  # Controller's own compose
└── .env                                # DOMAIN=demo-felhom.eu

/opt/docker/stacks/                     # All app stacks
├── traefik/                            # Reverse proxy (protected)
├── cloudflared/                        # Tunnel (protected)
├── paperless-ngx/                      # First deployed app ✅
│   ├── docker-compose.yml
│   ├── .felhom.yml                     # App metadata
│   └── app.yaml                        # Deploy config (env vars, locked fields)
└── whoami/                             # Test stack (not deployed)

/mnt/hdd_placeholder/storage/           # HDD storage for apps
└── paperless/
    ├── consume/                        # Drop files here for OCR
    ├── media/                          # Processed documents
    └── export/                         # Backup exports
```

## Related repositories and their state

| Repository | Status | Notes |
|------------|--------|-------|
| deploy-felhom-compose | Active | This repo. Controller code + deploy scripts |
| app-catalog-felhom.eu | Active | 10 app templates, all with .felhom.yml metadata + memory limits |
| felhom.eu | Active | Website + hub/ subfolder (felhom-hub service) + k8s manifests |
| homelab-manifests | Stable | k3s cluster running (dooplex.hu services) |
| misc-scripts | Utility | collect-repo.sh, backup helpers |

## Gotchas & lessons learned

- `docker compose restart` ≠ `docker compose up -d` — restart doesn't pick up new images
- Go maps have random iteration order — always sort slices before displaying
- Docker `.State`="running" doesn't mean healthy — check `.Status` for "(health: starting)" / "(unhealthy)"
- Paperless-ngx needs `PAPERLESS_OCR_LANGUAGES` (plural) to install language packs, `PAPERLESS_OCR_LANGUAGE` (singular) to select
- In-memory Deployed flag must be set BEFORE `docker compose up -d` (not after) — compose can take 30-60s for image pulls, during which the UI would show a stale "Telepítés" button
- Cloudflare Tunnel handles *.demo-felhom.eu → Traefik handles Host()-based routing to containers
- BIOS "AC Power Recovery" must be enabled on N100 for auto-restart after power outage
- `docker compose up -d` returns exit 0 even when containers immediately crash-loop — need post-start status check to detect this
- When logging env vars for debugging, only log keys (not values) to avoid leaking secrets in log files
- Mealie image (`ghcr.io/mealie-recipes/mealie`) doesn't include wget/curl — use Python TCP socket check for healthcheck
- Mealie DB migrations on first start take ~40s (alembic) — use `start_period: 60s` to avoid premature unhealthy status
- Alpine-based images (filebrowser, vaultwarden) have wget via BusyBox — healthchecks with `wget --spider` work fine
- Deploy `sed` command to update image version must target only the `image:` line — naive `sed 's|name:OLD|name:NEW|'` also matches the service name line (e.g., `felhom-controller:` → `felhom-controller:0.2.12`), breaking YAML. Use `sudo sed -i 's|image:.*felhom-controller:[^ ]*|image: ...felhom-controller:NEW|'` or similar scoped pattern
- Hungarian quotation marks `„”` in YAML: `„` (U+201E) is safe inside YAML double-quoted strings, but the closing `”` must NOT be ASCII `"` (0x22) — it terminates the YAML string. Use `\"` escape or Unicode `”` (U+201D). This caused a silent parse failure for the entire `.felhom.yml` file
- Never silently swallow parse errors — always log them. Silent failures make debugging impossible (took a dedicated debug session to find a simple quoting issue)
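
The "Health-aware state from Docker Status field" decision and the matching gotcha could be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the function name `HealthState` and the returned labels are assumptions, and only the `(unhealthy)` and `(health: starting)` substrings come from the notes above.

```go
package main

import (
	"fmt"
	"strings"
)

// HealthState derives a UI state from Docker's State and Status fields.
// Name and return values are hypothetical, chosen for illustration only.
func HealthState(state, status string) string {
	if state != "running" {
		// e.g. "exited", "created", "restarting": pass through unchanged.
		return state
	}
	switch {
	case strings.Contains(status, "(unhealthy)"):
		return "unhealthy"
	case strings.Contains(status, "(health: starting)"):
		return "starting"
	default:
		// Covers "(healthy)" and containers that define no healthcheck.
		return "running"
	}
}

func main() {
	fmt.Println(HealthState("running", "Up 2 minutes (unhealthy)"))
	fmt.Println(HealthState("running", "Up 5 seconds (health: starting)"))
	fmt.Println(HealthState("running", "Up 3 hours (healthy)"))
	fmt.Println(HealthState("exited", "Exited (1) 10 seconds ago"))
}
```

Checking `State` first keeps stopped containers from being misread when their `Status` text happens to contain parentheses.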
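
The "Health status change detection" rule (notify only on ok→warn, ok→fail, warn→fail; the first run records a baseline) can be expressed by ordering the statuses by severity. The sketch below is one possible shape under that assumption; `Status`, `Detector`, and `Observe` are hypothetical names, not taken from the repo.

```go
package main

import "fmt"

// Status levels are ordered by severity so that "degradation" is cur > prev.
type Status int

const (
	OK Status = iota
	Warn
	Fail
)

// Detector remembers the last status seen for each named check.
type Detector struct {
	last map[string]Status
}

func NewDetector() *Detector {
	return &Detector{last: make(map[string]Status)}
}

// Observe stores the current status and reports whether a notification is
// due. The first observation of a check only records the baseline.
func (d *Detector) Observe(check string, cur Status) bool {
	prev, seen := d.last[check]
	d.last[check] = cur
	return seen && cur > prev
}

func main() {
	d := NewDetector()
	fmt.Println(d.Observe("disk", Warn)) // false: baseline only
	fmt.Println(d.Observe("disk", Fail)) // true: warn -> fail
	fmt.Println(d.Observe("disk", OK))   // false: recovery
	fmt.Println(d.Observe("disk", Warn)) // true: ok -> warn
}
```

A per-event cooldown, as described in the "In-memory notification cooldowns" row, would sit on top of this and is left out here.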
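
The "CPU sampling via background goroutine" row notes that a /proc/stat delta needs two readings. A minimal sketch of that calculation follows, assuming the standard aggregate `cpu` line layout (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). The type and function names are hypothetical, and the 5s collector loop and caching are omitted.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuSample holds cumulative jiffies taken from the aggregate "cpu" line.
type cpuSample struct {
	idle, total uint64
}

// parseCPULine parses the first line of /proc/stat,
// e.g. "cpu  100 0 50 800 50 0 0 0 0 0".
func parseCPULine(line string) (cpuSample, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || fields[0] != "cpu" {
		return cpuSample{}, fmt.Errorf("not an aggregate cpu line: %q", line)
	}
	var s cpuSample
	for i, f := range fields[1:] {
		if i >= 8 {
			// guest and guest_nice are already counted in user and nice.
			break
		}
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return cpuSample{}, err
		}
		s.total += v
		if i == 3 || i == 4 { // idle and iowait
			s.idle += v
		}
	}
	return s, nil
}

// usagePercent returns the busy share between two samples, in percent.
func usagePercent(prev, cur cpuSample) float64 {
	if cur.total <= prev.total || cur.idle < prev.idle {
		return 0
	}
	dTotal := cur.total - prev.total
	dIdle := cur.idle - prev.idle
	if dIdle > dTotal {
		return 0
	}
	return 100 * float64(dTotal-dIdle) / float64(dTotal)
}

func main() {
	prev, _ := parseCPULine("cpu  100 0 50 800 50 0 0 0 0 0")
	cur, _ := parseCPULine("cpu  160 0 90 900 50 0 0 0 0 0")
	fmt.Printf("%.1f%%\n", usagePercent(prev, cur)) // 50.0%
}
```

In the controller the two readings would come from the collector's successive reads of /proc/stat, with `GetInfo()` returning the last computed value.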
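
The "mem_request vs mem_limit" and "384MB reserved for system" rows together describe an admission check at deploy time. The sketch below shows one way such a check might look. Whether the controller compares against total or currently available memory is not stated in these notes, so comparing against total RAM is an assumption, and `CanDeploy` is a hypothetical name.

```go
package main

import "fmt"

// systemReservedMB is the headroom kept for the OS and the controller.
const systemReservedMB = 384

// CanDeploy reports whether a new app's mem_request still fits once the
// system reserve and the requests of already deployed apps are subtracted.
// Limits are not checked because overcommitting limits is allowed.
func CanDeploy(totalMB int, deployedRequestsMB []int, newRequestMB int) bool {
	used := systemReservedMB
	for _, r := range deployedRequestsMB {
		used += r
	}
	return used+newRequestMB <= totalMB
}

func main() {
	deployed := []int{1024, 2048}
	fmt.Println(CanDeploy(8192, deployed, 4096)) // true:  384+3072+4096 = 7552
	fmt.Println(CanDeploy(8192, deployed, 5000)) // false: 384+3072+5000 = 8456
}
```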
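
The `sed` gotcha above (update only the `image:` line, never the service name line) can be illustrated with the same scoped pattern in Go. This is a sketch, not the deploy script: `SetImageVersion` is a hypothetical helper and `registry.example.com` is a placeholder registry.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageLine matches only lines that start with "image:" (after indentation)
// and reference felhom-controller, capturing everything before the tag.
var imageLine = regexp.MustCompile(`(?m)^([ \t]*image:[ \t]*\S*felhom-controller):\S+`)

// SetImageVersion replaces the tag on the image line and leaves the
// "felhom-controller:" service key untouched.
func SetImageVersion(compose, newVersion string) string {
	return imageLine.ReplaceAllString(compose, "${1}:"+newVersion)
}

func main() {
	compose := "services:\n" +
		"  felhom-controller:\n" +
		"    image: registry.example.com/felhom-controller:0.2.11\n"
	fmt.Print(SetImageVersion(compose, "0.2.12"))
}
```

Anchoring the pattern at the start of the line is what prevents the service key from matching.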