CONTEXT.md — Project Memory
This file serves as persistent project memory across Claude Code sessions. It replaces the auto-generated "Memory" from the claude.ai Project. Update this file at the end of each working session with current state, recent decisions, and anything the next session needs to know.
Ask Claude Code: "Please update CONTEXT.md with what we did today"
Last updated: 2026-02-15 (session 8)
About Viktor (project owner)
- Works at Magyar Telekom (Budapest), building Felhom as a side business
- Felhom: managed home-server service for Hungarian households
- Technical but prefers pragmatic solutions over over-engineering
- Runs all infrastructure on Gitea (gitea.dooplex.hu), k3s cluster for management
- Customer deployments use Docker Compose (not Kubernetes) for simplicity
Current project state
felhom-controller (this repo)
- Version: v0.2.15
- Phase 1: ✅ COMPLETE — Stack Manager + Deploy Flow
- First app deployed: Paperless-ngx on demo-felhom.eu (2026-02-13)
- Running on: demo-felhom (N100 mini PC) at 192.168.0.162:8080
- All Phase 1 features working: deploy, start/stop/restart/update, logs, health-aware states, auth
What was just completed (2026-02-15 session 8)
- FileBrowser as infrastructure service:
  - Created `scripts/hdd-setup.sh` (adapted from deploy-portainer): sets up HDD folder structure with `Dokumentumok` user dir
  - Created `scripts/docker-setup.sh` (adapted from deploy-portainer): installs Docker, Traefik, FileBrowser as infra services
  - Added `filebrowser` to protected stacks in `controller.yaml.example`
  - Removed `templates/filebrowser/` from app-catalog-felhom.eu (no longer a catalog app)
- Orphan stack detection and deletion:
  - Added `Orphaned` field to Stack struct + `getCatalogTemplateSlugs()` helper
  - Orphan detection in `ScanStacks()`: deployed stacks with no matching catalog template are marked as orphaned
  - New `delete.go`: `DeleteStack()` (compose down + HDD cleanup + dir removal), `GetStackHDDData()`, `parseComposeHDDMounts()`
  - Safety: protected HDD paths (root, media, storage, Dokumentumok, appdata) can never be deleted
  - New API endpoints: `DELETE /api/stacks/{name}` and `GET /api/stacks/{name}/hdd-data`
  - UI: orange "Elavult" (Obsolete) badge on orphaned stacks, "Törlés" (Delete) button, delete confirmation modal
  - Modal shows HDD data paths/sizes, checkbox for "Felhasználói adatok törlése a merevlemezről" (Delete user data from the hard drive)
  - Hides "Frissítés" (Update) and "Részletek" (Details) buttons for orphaned stacks
- Verified: 1 orphaned stack detected on startup (filebrowser — now infra, removed from catalog)
- Controller version: v0.2.15
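The orphan rule and the protected-path safety net described above can be sketched roughly as follows. This is a minimal illustration, not the code in `delete.go`: the trimmed `Stack` struct, `markOrphans`, and `isProtectedHDDPath` are hypothetical names, and the HDD root used in `main` is only an example.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Stack is a trimmed-down, hypothetical version of the controller's Stack struct.
type Stack struct {
	Name     string
	Deployed bool
	Orphaned bool
}

// markOrphans flags deployed stacks whose name has no matching catalog slug.
func markOrphans(stacks []Stack, catalogSlugs map[string]bool) {
	for i := range stacks {
		stacks[i].Orphaned = stacks[i].Deployed && !catalogSlugs[stacks[i].Name]
	}
}

// protectedDirs lists the top-level HDD directories that must survive any delete.
var protectedDirs = []string{"media", "storage", "Dokumentumok", "appdata"}

// isProtectedHDDPath reports whether target is the HDD root itself
// or one of the protected top-level directories below it.
func isProtectedHDDPath(hddRoot, target string) bool {
	root := filepath.Clean(hddRoot)
	t := filepath.Clean(target)
	if t == root {
		return true
	}
	for _, d := range protectedDirs {
		if t == filepath.Join(root, d) {
			return true
		}
	}
	return false
}

func main() {
	stacks := []Stack{
		{Name: "paperless-ngx", Deployed: true},
		{Name: "filebrowser", Deployed: true},
	}
	markOrphans(stacks, map[string]bool{"paperless-ngx": true})
	for _, s := range stacks {
		fmt.Printf("%s orphaned=%v\n", s.Name, s.Orphaned)
	}
	// Example root only; an app subdirectory is deletable, the top-level dir is not.
	fmt.Println(isProtectedHDDPath("/mnt/hdd", "/mnt/hdd/storage"))
	fmt.Println(isProtectedHDDPath("/mnt/hdd", "/mnt/hdd/storage/paperless"))
}
```

A delete routine would presumably call a guard like `isProtectedHDDPath` on every path parsed from the compose file before removing anything.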
Previously completed (2026-02-14 session 7)
- Fixed YAML parse error in romm `.felhom.yml` (app-catalog repo):
  - Root cause: Hungarian opening quote `„` (U+201E) paired with ASCII `"` (0x22) inside YAML double-quoted strings terminated the string prematurely
  - Affected lines: `help_text` for IGDB Client Secret and SteamGridDB API Key fields
  - Fix: escaped inner ASCII double quotes with `\"` in the YAML strings
  - This caused `LoadMetadata()` to silently fail and return empty defaults for ALL romm metadata (tagline, resources, category, everything)
- Added error logging to `LoadMetadata()` in `metadata.go`:
  - `[ERROR]` log on YAML parse failure (was silently swallowed, a critical bug)
  - Temporary `[DEBUG]` log used for diagnosis, then removed
- Fixed deploy command in CLAUDE.md:
  - `sed` pattern now targets only `image:` lines (was matching service name too, breaking YAML)
  - Added `sudo` for both sed and docker compose (directory is root-owned)
- Controller version: v0.2.14
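The "log instead of swallow" fix can be illustrated with a small sketch. To stay within the standard library it uses `encoding/json` as a stand-in for the YAML parser, so it is not the actual `LoadMetadata()`; the `Metadata` struct and `loadMetadata` are hypothetical. The quoting bug is the same class of problem: an unescaped inner `"` ends the string early.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Metadata is a hypothetical subset of the app metadata fields.
type Metadata struct {
	DisplayName string `json:"display_name"`
	Category    string `json:"category"`
}

// loadMetadata parses raw and returns the error to the caller
// instead of silently falling back to empty defaults.
func loadMetadata(slug string, raw []byte) (Metadata, error) {
	var m Metadata
	if err := json.Unmarshal(raw, &m); err != nil {
		log.Printf("[ERROR] %s: metadata parse failed: %v", slug, err)
		return Metadata{}, fmt.Errorf("parse metadata for %s: %w", slug, err)
	}
	return m, nil
}

func main() {
	// Unescaped inner ASCII quote: the string ends early and parsing fails.
	bad := []byte(`{"display_name": "RomM „quoted" text"}`)
	if _, err := loadMetadata("romm", bad); err != nil {
		fmt.Println("caught:", err)
	}

	// Escaped inner quote: parses cleanly.
	good := []byte(`{"display_name": "RomM „quoted\" text"}`)
	m, err := loadMetadata("romm", good)
	fmt.Println(m.DisplayName, err)
}
```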
Previously completed (2026-02-14 session 6)
- Bug fix: App info logo SVG rendering (`.app-info-logo` CSS in `templates.go`):
  - Added `min-width`, `min-height`, `max-width`, `max-height: 80px` and `overflow: hidden`
  - Prevents SVG images with explicit dimensions or no viewBox from overflowing container
  - Logo now reliably renders at 80x80 regardless of SVG intrinsic size
- Controller version: v0.2.12
Previously completed (2026-02-14 session 5)
- App detail/info pages (new feature):
  - New route: `GET /apps/{slug}` renders a full info page (was redirect to deploy page)
  - Hero section with logo, tagline, resource badges
  - Screenshots section (graceful: hidden via `onerror` if assets don't exist)
  - Info cards: use cases, first steps, prerequisites, default credentials, docs link
  - Optional config form with AJAX save (POST `/api/stacks/{name}/optional-config`)
  - New `.felhom.yml` fields: `app_info` (tagline, use_cases, first_steps, prerequisites, default_creds, docs_url) and `optional_config` (groups of env var fields)
  - New structs in `metadata.go`: `AppInfo`, `OptionalConfigGroup`, `OptionalConfigField`
  - `UpdateOptionalConfig` in `deploy.go`: saves optional env vars to `app.yaml`, restarts deployed stacks with `docker compose up -d` to pick up new env vars
  - Navigation updated: stack cards on dashboard/stacks pages now link to `/apps/{slug}`, deploy page has "Részletek" (Details) link back to info page
- RoMM metadata updated (app-catalog repo):
  - Full `app_info` section: tagline, 5 use cases, 6 first steps, 3 prerequisites, default creds, docs URL
  - 6 optional config fields for metadata providers: IGDB (client_id + secret), SteamGridDB, ScreenScraper (user + password), MobyGames
  - docker-compose.yml updated with SCREENSCRAPER_USER, SCREENSCRAPER_PASSWORD, MOBYGAMES_API_KEY env vars
  - Display name fixed: "ROMM" → "RomM"
- Controller version: v0.2.11
Previously completed (2026-02-14 session 4)
- Fixed deploy race condition in `internal/stacks/deploy.go`:
  - In-memory `Deployed` flag now set BEFORE `docker compose up -d` (compose up can take 30-60s for image pulls)
  - On failure: both in-memory state and disk (app.yaml) are reverted
  - Eliminates stale "Telepítés" (Install) button during long compose operations
- Added `checkBeforeDeploy()` JS guard in `internal/web/templates.go`:
  - Telepítés buttons on Vezérlőpult (Dashboard) and Alkalmazások (Applications) pages now fetch live state from `/api/stacks/{name}` before navigating
  - If app is already deployed (e.g., another tab deployed it), shows alert and reloads page instead of navigating to deploy form
  - Catches stale UI state gracefully
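The ordering fix for the deploy race (flag first, compose second, revert on failure) might look roughly like this. It is a sketch under assumptions: `Manager`, `Deploy`, and the injected `composeUp` function are hypothetical, and the on-disk `app.yaml` revert is only indicated by a comment.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errComposeFailed stands in for a failed `docker compose up -d`.
var errComposeFailed = errors.New("compose up failed")

// Manager holds the in-memory deployed state; the field names are hypothetical.
type Manager struct {
	mu       sync.Mutex
	deployed map[string]bool
}

// IsDeployed is what a status endpoint would read while compose is still running.
func (m *Manager) IsDeployed(name string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.deployed[name]
}

func (m *Manager) setDeployed(name string, v bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.deployed[name] = v
}

// Deploy sets the flag first, runs composeUp without holding the lock,
// and reverts the flag if composeUp fails.
func (m *Manager) Deploy(name string, composeUp func() error) error {
	m.setDeployed(name, true)
	if err := composeUp(); err != nil {
		m.setDeployed(name, false) // the real fix also reverts app.yaml on disk
		return fmt.Errorf("deploy %s: %w", name, err)
	}
	return nil
}

func main() {
	m := &Manager{deployed: map[string]bool{}}
	err := m.Deploy("mealie", func() error {
		fmt.Println("during compose up, deployed =", m.IsDeployed("mealie"))
		return errComposeFailed
	})
	fmt.Println(err, "| after failure, deployed =", m.IsDeployed("mealie"))
}
```

Releasing the lock before the long-running compose call is what lets the status endpoint answer during a 30-60s image pull.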
Previously completed (2026-02-14 session 3)
- Enhanced debug logging across all stack operations in `internal/stacks/`:
  - Operation timing: All stack ops (start, stop, restart, update, deploy) now log elapsed time
  - Post-start container state check: Async goroutine after start/restart/update/deploy
  - Image pull detection: Checks local images before deploy/update (debug level)
  - GetLogs/ScanStacks improvements: Byte count logging, deployed/available counts
  - All verbose checks gated on `cfg.Logging.Level == "debug"`; timing always at INFO
- UI improvements in `internal/web/templates.go` and `server.go`:
  - Memory bar fix on deploy page: Bar segments now always visible (min-width: 3px), new app segment uses translucent green with distinct border for clear visual separation from committed memory
  - Clickable app cards: Cards on Vezérlőpult (Dashboard) and Alkalmazások (Applications) pages are now clickable (navigates to deploy/detail page). Uses `data-href` attribute + delegated click handler. Protected stacks excluded. Actions area (buttons, state labels) excluded from click-to-navigate
  - Live-scrolling logs: Logs page now auto-refreshes every 3s via AJAX polling (`?raw=1` returns plain text). Fixed-height container (70vh) with auto-scroll to bottom. Pulsing green "Élő" (Live) indicator. Pause/resume toggle ("Szüneteltetés"/"Folytatás"). User scroll position preserved when scrolled up to read history
  - Deployment progress UI: Deploy button no longer shows alert+redirect immediately. Instead shows 3-step progress panel: config saved → containers starting → app initializing. Polls `GET /api/stacks/{name}` every 3s to track actual container health state. Handles running (auto-redirect), starting (keep polling), unhealthy (warning), exited (error), and 120s timeout. Shows elapsed time counter
- Mealie healthcheck fix (app-catalog-felhom.eu):
  - `wget --spider` replaced with Python TCP socket check, because the mealie image doesn't include wget
  - `start_period` increased to 60s (DB migrations take ~40s on first start)
- Healthcheck audit: filebrowser (Alpine, has BusyBox wget — OK), stirling-pdf (Ubuntu, has wget — OK)
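The decision table behind the deployment progress panel (running, starting, unhealthy, exited, 120s timeout) is small enough to write down. The real logic lives in browser JavaScript; the sketch below restates it in Go only to keep the examples in this file in one language, and `nextStep` with its action labels is hypothetical.

```go
package main

import "fmt"

// nextStep maps a polled container state and the elapsed seconds to a UI action.
// The action names are hypothetical labels for this sketch.
func nextStep(state string, elapsedSec int) string {
	switch state {
	case "running":
		return "redirect" // healthy and up: go to the app page
	case "unhealthy":
		return "warn"
	case "exited":
		return "error"
	}
	if elapsedSec >= 120 {
		return "timeout"
	}
	return "poll" // "starting" or not yet known: check again in 3s
}

func main() {
	for _, c := range []struct {
		state   string
		elapsed int
	}{{"starting", 30}, {"starting", 120}, {"running", 45}, {"exited", 5}} {
		fmt.Printf("%s after %ds -> %s\n", c.state, c.elapsed, nextStep(c.state, c.elapsed))
	}
}
```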
Previously completed (2026-02-15 session 2)
- Phase 4: Git Sync + App Catalog Audit — major milestone
- Git sync module (`internal/sync/sync.go`):
  - Clones/pulls app-catalog-felhom.eu repo to local cache on startup
  - Periodic sync based on `git.sync_interval` (default 15m)
  - Copies `docker-compose.yml` + `.felhom.yml` to stacks dir (never overwrites `app.yaml`/`.env`)
  - SHA-256 content comparison: only writes changed files
  - Triggers `ScanStacks()` after sync so dashboard updates immediately
  - Uses `os/exec` git CLI, so no Go git library dependency
- Manual sync button ("Sablonok frissítése", i.e. "Update templates") on Alkalmazások (Applications) page:
  - `POST /api/sync` endpoint with 30s debounce
  - Toast notification shows result (success/failure/what changed)
  - Auto-reloads page if new apps or updates detected
- Sync status added to `/api/system/info` (last_sync, last_status, syncing flag)
- .felhom.yml files created for all 10 apps (paperless-ngx already had one):
  - actualbudget, docmost, filebrowser, homebox, immich, mealie, romm, stirling-pdf, vaultwarden
  - All follow the same format: display_name, description, category, subdomain, resources, deploy_fields
- Docker Compose templates audited and fixed for all 10 apps:
  - Fixed `{{DOMAIN}}` → `${DOMAIN}` syntax in homebox, mealie, romm, stirling-pdf
  - Fixed `{{HDD_PATH}}` → `${HDD_PATH}` in romm
  - Added `deploy.resources.limits.memory` to all services across all templates
  - Added `TZ=Europe/Budapest` to all sidecar services (postgres, redis, mariadb)
  - Added healthcheck to romm main service
  - Added `romm-redis` `condition: service_healthy` (was `service_started`)
  - Standardized header comment blocks across all templates
- Documentation updated: app-catalog README, CLAUDE.md, CONTEXT.md
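The copy rule used by the sync (only template files, only when content differs) can be expressed as a pure function. This is a sketch, not the code in `sync.go`: `needsWrite` and `syncedFiles` are hypothetical names, and file I/O is left out so the comparison logic stands on its own.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// syncedFiles are the only files the sync copies; app.yaml and .env are never touched.
var syncedFiles = map[string]bool{"docker-compose.yml": true, ".felhom.yml": true}

// needsWrite reports whether the file called name should be (re)written in the stacks dir.
// existing is nil when the destination file does not exist yet.
func needsWrite(name string, existing, incoming []byte) bool {
	if !syncedFiles[name] {
		return false
	}
	if existing == nil {
		return true
	}
	// Sum256 returns fixed-size arrays, which Go can compare directly.
	return sha256.Sum256(existing) != sha256.Sum256(incoming)
}

func main() {
	old := []byte("image: app:1.0\n")
	fmt.Println(needsWrite("docker-compose.yml", nil, old))                        // new file
	fmt.Println(needsWrite("docker-compose.yml", old, old))                        // unchanged
	fmt.Println(needsWrite("docker-compose.yml", old, []byte("image: app:1.1\n"))) // changed
	fmt.Println(needsWrite("app.yaml", old, []byte("anything")))                   // never synced
}
```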
Previously completed (2026-02-15 session 1)
- Memory validation during deployment:
  - Pre-deploy memory check: compares `mem_request` sum against usable system RAM
  - Hard block if requests exceed usable memory (total - 384MB reserved)
  - Soft warning if `mem_limit` sum exceeds total RAM (overcommit OK for limits)
  - `ParseMemoryMB()` supports "500M", "1G", "1.5G", "1024" formats
  - `CommittedMemory()` sums requests/limits across all deployed stacks
  - Memory summary bar shown on deploy page before user clicks deploy
  - `system.reserved_memory_mb` configurable in controller.yaml (default: 384)
- Display: `~` prefix on mem_request in UI badges (display-only, exact value stored)
- Felhom.eu logo replaced text logos in sidebar and login page with actual SVG logo
  - Logo SVG embedded as Go string constant, served at `/static/felhom-logo.svg`
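The memory rules above (parse the size strings, hard-block on requests, warn on limits) can be sketched as follows. The real `ParseMemoryMB()` signature is not recorded here, so the one below is a guess; treating a bare number such as "1024" as megabytes is an assumption, and `CheckDeploy` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// ParseMemoryMB converts "500M", "1G", "1.5G" or a bare "1024" to megabytes.
// Assumption: a value without a suffix is already in MB.
func ParseMemoryMB(s string) (int, error) {
	t := strings.ToUpper(strings.TrimSpace(s))
	mult := 1.0
	switch {
	case strings.HasSuffix(t, "G"):
		mult, t = 1024, strings.TrimSuffix(t, "G")
	case strings.HasSuffix(t, "M"):
		t = strings.TrimSuffix(t, "M")
	}
	v, err := strconv.ParseFloat(t, 64)
	if err != nil || math.IsNaN(v) || math.IsInf(v, 0) || v < 0 {
		return 0, fmt.Errorf("invalid memory value %q", s)
	}
	return int(v * mult), nil
}

// CheckDeploy applies the two rules: a hard block when the request sum exceeds
// usable RAM (total minus reserved), a soft warning when the limit sum exceeds total RAM.
func CheckDeploy(totalMB, reservedMB, requestSumMB, limitSumMB int) (blocked, warn bool) {
	blocked = requestSumMB > totalMB-reservedMB
	warn = limitSumMB > totalMB
	return blocked, warn
}

func main() {
	for _, s := range []string{"500M", "1G", "1.5G", "1024"} {
		mb, err := ParseMemoryMB(s)
		fmt.Println(s, mb, err)
	}
	// 4 GB machine, 384 MB reserved, sums already include the app being deployed.
	fmt.Println(CheckDeploy(4096, 384, 3800, 4000))
}
```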
Previously completed (2026-02-14)
- System info bar on Vezérlőpult (Dashboard): RAM, SSD, and optional HDD usage
  - Progress bars with color coding (green < 70%, yellow 70-85%, red > 85%)
  - New `internal/system` package reads `/proc/meminfo` + `syscall.Statfs`
  - Platform-specific: Linux impl + non-Linux stub (build tags)
  - Hungarian labels: "Memória" (Memory), "SSD tárhely" (SSD storage), "Külső HDD" (External HDD)
- Docker Compose memory limits on paperless-ngx template:
  - paperless-webserver: 768M, postgres: 256M, redis: 128M
  - Added `mem_limit` field to `.felhom.yml` ResourceHints (total: 1152M)
- `/api/system/info` endpoint now returns live system metrics (was customer info)
- Config: Added `paths.hdd_path` for external HDD monitoring
- Controller image builds via build.sh, pushes to Gitea container registry
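Reading RAM figures from `/proc/meminfo` and mapping usage to the bar colors can be sketched like this. It is an illustration only: `parseMemInfo` and `barColor` are hypothetical names, and using `MemAvailable` as the basis for "used" is an assumption about the controller. The parser takes the file content as a string so it also runs on non-Linux machines.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMemInfo extracts MemTotal and MemAvailable (both in kB) from /proc/meminfo content.
func parseMemInfo(content string) (totalKB, availKB uint64) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text()) // e.g. ["MemTotal:", "8000000", "kB"]
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		switch fields[0] {
		case "MemTotal:":
			totalKB = v
		case "MemAvailable:":
			availKB = v
		}
	}
	return totalKB, availKB
}

// barColor maps a usage percentage to the bands used by the progress bars:
// green below 70%, yellow from 70% to 85%, red above 85%.
func barColor(usedPct float64) string {
	switch {
	case usedPct < 70:
		return "green"
	case usedPct <= 85:
		return "yellow"
	default:
		return "red"
	}
}

func main() {
	sample := "MemTotal:       8000000 kB\nMemFree:         500000 kB\nMemAvailable:   2000000 kB\n"
	total, avail := parseMemInfo(sample)
	usedPct := 100 * float64(total-avail) / float64(total)
	fmt.Printf("used %.0f%% -> %s\n", usedPct, barColor(usedPct))
}
```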
Previously completed (2026-02-13)
- Built the entire felhom-controller from scratch (Go, no frameworks)
- Debugged and fixed 7 issues during first real deployment:
  - Password validation (empty passwords accepted)
  - In-memory Deployed flag not updating after deploy
  - Health-aware state parsing (starting/unhealthy detection)
  - Random card ordering (Go map iteration)
  - "Részletek" button redirect for deployed apps
  - Paperless OCR language installation (LANGUAGES vs LANGUAGE env var)
  - Documentation: restart vs up -d for image updates
What's next (priorities)
- Test orphan delete flow: try deleting the orphaned filebrowser stack via the UI
- Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
- Deploy a second app (e.g., ActualBudget as the simplest, or Immich to test HDD + secrets) to validate all .felhom.yml files
- Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
- Test on Raspberry Pi (pi-customer-1)
- Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
- Phase 2 continued: CPU/temperature metrics, Healthchecks.io pings
- Phase 3: Backup system (DB dumps + restic)
Architecture decisions
| Decision | Rationale |
|---|---|
| Go stdlib for web (no Gin/Echo) | Minimal dependencies, single binary, easy to embed templates |
| Templates as Go string constants | Zero runtime file dependencies, everything in the binary |
| Docker Compose for customers (not k8s) | Simpler troubleshooting, customers don't need k8s knowledge |
| k3s for management infra only | Viktor's own services (gitea, monitoring, website) run on k3s |
| Cloudflare Tunnel for remote access | No port forwarding needed, works behind any NAT |
| app.yaml per stack | Separates deploy config from compose files, survives git pulls |
| Password fields require explicit input | Prevents accidental empty-password deployments |
| Health-aware state from Docker Status field | Docker's State says "running" even for unhealthy containers |
| Memory limits via deploy.resources.limits | Prevents runaway containers; ~50% headroom over expected usage |
| System info from /proc/meminfo + statfs | No external dependencies, cheap to read on each page load |
| mem_request vs mem_limit (K8s-inspired) | Requests = expected usage (hard block), limits = peak (overcommit OK) |
| 384MB reserved for system | Prevents deploying apps that would starve the OS/controller |
| Logo SVG embedded as Go constant | Same approach as CSS/HTML — zero external file deps |
| Git sync via os/exec git CLI | No Go git library needed, git is in the container image |
| SHA-256 for content comparison | Only copy changed files, avoid unnecessary disk writes |
| 30s debounce on manual sync | Prevents spamming the git server |
| Orphan = deployed but not in catalog | Safe lifecycle: remove from catalog → mark orphaned → user deletes via UI |
| FileBrowser as infra (not catalog) | Needed even after apps deleted (user browses HDD data); deployed by setup script |
| Protected HDD paths | Safety net: never delete top-level HDD dirs (media, storage, Dokumentumok, appdata) |
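The "health-aware state" row in the table above comes down to a short string check. A minimal sketch, assuming the state and status strings arrive in the form `docker ps` prints them; `effectiveState` is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveState refines Docker's State using the health suffix found in Status.
// State alone reports "running" even while the healthcheck is failing.
func effectiveState(state, status string) string {
	if state != "running" {
		return state
	}
	switch {
	case strings.Contains(status, "(unhealthy)"):
		return "unhealthy"
	case strings.Contains(status, "(health: starting)"):
		return "starting"
	default:
		return "running"
	}
}

func main() {
	fmt.Println(effectiveState("running", "Up 5 seconds (health: starting)"))
	fmt.Println(effectiveState("running", "Up 3 minutes (unhealthy)"))
	fmt.Println(effectiveState("running", "Up 2 hours (healthy)"))
	fmt.Println(effectiveState("exited", "Exited (1) 10 seconds ago"))
}
```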
Key file locations on demo-felhom
/opt/docker/felhom-controller/ # Controller compose + config
├── controller.yaml # Customer config (domain, auth, paths)
├── docker-compose.yml # Controller's own compose
└── .env # DOMAIN=demo-felhom.eu
/opt/docker/stacks/ # All app stacks
├── traefik/ # Reverse proxy (protected)
├── cloudflared/ # Tunnel (protected)
├── paperless-ngx/ # First deployed app ✅
│ ├── docker-compose.yml
│ ├── .felhom.yml # App metadata
│ └── app.yaml # Deploy config (env vars, locked fields)
└── whoami/ # Test stack (not deployed)
/mnt/hdd_placeholder/storage/ # HDD storage for apps
└── paperless/
├── consume/ # Drop files here for OCR
├── media/ # Processed documents
└── export/ # Backup exports
Related repositories and their state
| Repository | Status | Notes |
|---|---|---|
| deploy-felhom-compose | Active | This repo. Controller code + deploy scripts |
| app-catalog-felhom.eu | Active | 10 app templates, all with .felhom.yml metadata + memory limits |
| felhom.eu | Stable | Website live, SEO indexed, email working |
| homelab-manifests | Stable | k3s cluster running (dooplex.hu services) |
| misc-scripts | Utility | collect-repo.sh, backup helpers |
Gotchas & lessons learned
- `docker compose restart` ≠ `docker compose up -d`: restart doesn't pick up new images
- Go maps have random iteration order, so always sort slices before displaying
- Docker `.State` = "running" doesn't mean healthy: check `.Status` for "(health: starting)" / "(unhealthy)"
- Paperless-ngx needs `PAPERLESS_OCR_LANGUAGES` (plural) to install language packs, `PAPERLESS_OCR_LANGUAGE` (singular) to select
- In-memory Deployed flag must be set BEFORE `docker compose up -d` (not after): compose can take 30-60s for image pulls, during which the UI would show a stale "Telepítés" button
- Cloudflare Tunnel handles `*.demo-felhom.eu` → Traefik handles `Host()`-based routing to containers
- BIOS "AC Power Recovery" must be enabled on N100 for auto-restart after power outage
- `docker compose up -d` returns exit 0 even when containers immediately crash-loop, so a post-start status check is needed to detect this
- When logging env vars for debugging, only log keys (not values) to avoid leaking secrets in log files
- Mealie image (`ghcr.io/mealie-recipes/mealie`) doesn't include wget/curl: use Python TCP socket check for healthcheck
- Mealie DB migrations on first start take ~40s (alembic): use `start_period: 60s` to avoid premature unhealthy status
- Alpine-based images (filebrowser, vaultwarden) have wget via BusyBox, so healthchecks with `wget --spider` work fine
- Deploy `sed` command to update image version must target only the `image:` line: naive `sed 's|name:OLD|name:NEW|'` also matches the service name line (e.g., `felhom-controller:` → `felhom-controller:0.2.12`), breaking YAML. Use `sudo sed -i 's|image:.*felhom-controller:[^ ]*|image: ...felhom-controller:NEW|'` or similar scoped pattern
- Hungarian quotation marks `„”` in YAML: `„` (U+201E) is safe inside YAML double-quoted strings, but the closing `”` must NOT be ASCII `"` (0x22), which terminates the YAML string. Use `\"` escape or Unicode `”` (U+201D). This caused a silent parse failure for the entire `.felhom.yml` file
- Never silently swallow parse errors; always log them. Silent failures make debugging impossible (took a dedicated debug session to find a simple quoting issue)
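The scoped-replacement gotcha for the image tag is easy to reproduce outside of `sed`. The sketch below restates it with Go's `regexp` package purely as an illustration; `setImageTag` is a hypothetical helper and `registry.example` is a placeholder registry, not the real one.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageLine matches only lines of the form "image: <path>felhom-controller:<tag>",
// so the bare service key "felhom-controller:" is left alone.
var imageLine = regexp.MustCompile(`(?m)^([ \t]*image:[ \t]*\S*felhom-controller:)\S+`)

// setImageTag replaces the tag on matching image: lines.
// "${1}" (with braces) is required: "$1" followed by a digit would be read as another group.
// Assumption: tag contains no "$" characters.
func setImageTag(compose, tag string) string {
	return imageLine.ReplaceAllString(compose, "${1}"+tag)
}

func main() {
	compose := "services:\n" +
		"  felhom-controller:\n" +
		"    image: registry.example/felhom-controller:0.2.14\n" +
		"    restart: unless-stopped\n"
	fmt.Print(setImageTag(compose, "0.2.15"))
}
```

Anchoring on `image:` at the start of the line is the same idea as the scoped `sed` pattern: the match can only begin on an image line, never on the service name.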