CONTEXT.md — Project Memory
This file serves as persistent project memory across Claude Code sessions. It replaces the auto-generated "Memory" from the claude.ai Project. Update this file at the end of each working session with current state, recent decisions, and anything the next session needs to know.
Ask Claude Code: "Please update CONTEXT.md with what we did today"
Last updated: 2026-02-19 (session 59)
About Viktor (project owner)
- Works at Deutsche Telekom (Budapest), building Felhom.eu as a side business
- Felhom.eu: managed home-server service for Hungarian households
- Technical but prefers pragmatic solutions over over-engineering
- Runs all infrastructure on Gitea (gitea.dooplex.hu), k3s cluster for management
- Customer deployments use Docker Compose (not Kubernetes) for simplicity
felhom-controller (this repo)
- Version: v0.16.1
- Phase 1: ✅ COMPLETE — Stack Manager + Deploy Flow
- Phase 2: ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
- Phase 3: ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger, dedicated backup page)
- Phase 4: ✅ COMPLETE — Monitoring Page with Metrics Store (SQLite, Chart.js, system + container metrics)
- Phase 5: ✅ COMPLETE — Authentication, Persistence & Settings Page (settings.json, password change, session management)
- Phase 6: ✅ COMPLETE — Monitoring Warnings, Dashboard Alerts & Notification System
- Phase 7: ✅ COMPLETE — Storage Overview, Per-App Backup Toggles & Limited Restore
- Phase A: ✅ COMPLETE — Storage Paths Foundation (registry, auto-discovery, per-app HDD_PATH, deploy dropdown, health monitoring)
- Phase B: ✅ COMPLETE — Storage Management UI Polish & Health Severity Fix (flash messages, label editing, app details, FS info, deploy free space, backup context)
- Phase C: ✅ COMPLETE — Storage Init Wizard, Data Migration & Startup Fix (disk scan/format/mount wizard, rsync-based migration, startup pings)
- v0.11.1 bugfix: ✅ COMPLETE — Storage Scan: system disk detection via host fstab + blkid UUID resolution; FSType enrichment via `blkid -o export`
- v0.11.2 bugfix: ✅ COMPLETE — /host-dev mount for block device access; `HostDevicePath()` helper; all format/scan/safety ops use /host-dev
- v0.11.3 bugfix: ✅ COMPLETE — Added `fdisk` package to Dockerfile (provides `sfdisk`; not in `util-linux` on Debian bookworm)
- v0.11.4 bugfix: ✅ COMPLETE — FormatAndMount: fixed sfdisk (wipefs+force+`,,`), mount (explicit device path), mount propagation (rshared), ASCII label, smart partition skip, findmnt verification
- v0.11.6: ✅ COMPLETE — FileBrowser auto-mount sync (`syncFileBrowserMounts()`) + 3 UI fixes (badge color, progress bar, button text)
- v0.11.7: ✅ COMPLETE — Stale data cleanup + FileBrowser sync after migration + deploy page title fix
- v0.11.8: ✅ COMPLETE — Per-App Cross-Drive Backup (3-2-1 rule): rsync/restic to secondary drive, deploy page UI, backup page summary, scheduler jobs, API endpoints
- v0.11.9: ✅ COMPLETE — UI Polish Fixes: spacing, tooltip on "Módszer", status dot instead of disabled checkbox, progressive disclosure, emoji cleanup
- First app deployed: Paperless-ngx on demo-felhom.eu (2026-02-13)
- Running on: demo-felhom (N100 mini PC) at 192.168.0.162:8080, felhotest (Proxmox VM) at router.abonet.hu:33022
- All Phase 1-5 features working: deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups, backup detail page, system monitoring page, settings page
Architecture decisions
| Decision | Rationale |
|---|---|
| Go stdlib for web (no Gin/Echo) | Minimal dependencies, single binary, easy to embed templates |
| Templates as go:embed HTML/CSS files | Zero runtime file dependencies (compiled into binary), but each template is a separate editable file |
| Docker Compose for customers (not k8s) | Simpler troubleshooting, customers don't need k8s knowledge |
| k3s for management infra only | Viktor's own services (gitea, monitoring, website) run on k3s |
| Cloudflare Tunnel for remote access | No port forwarding needed, works behind any NAT |
| app.yaml per stack | Separates deploy config from compose files, survives git pulls |
| Password fields require explicit input | Prevents accidental empty-password deployments |
| Health-aware state from Docker Status field | Docker's State says "running" even for unhealthy containers |
| Memory limits via deploy.resources.limits | Prevents runaway containers; ~50% headroom over expected usage |
| System info from /proc/meminfo + statfs | No external dependencies, cheap to read on each page load |
| mem_request vs mem_limit (K8s-inspired) | Requests = expected usage (hard block), limits = peak (overcommit OK) |
| 384MB reserved for system | Prevents deploying apps that would starve the OS/controller |
| Logo SVG embedded as Go constant | Same approach as CSS/HTML — zero external file deps |
| Git sync via os/exec git CLI | No Go git library needed, git is in the container image |
| SHA-256 for content comparison | Only copy changed files, avoid unnecessary disk writes |
| 30s debounce on manual sync | Prevents spamming the git server |
| Orphan = deployed but not in catalog | Safe lifecycle: remove from catalog → mark orphaned → user deletes via UI |
| FileBrowser as infra (not catalog) | Needed even after apps deleted (user browses HDD data); deployed by setup script |
| Protected HDD paths | Safety net: never delete top-level HDD dirs (media, storage, Dokumentumok, appdata) |
| Central scheduler (not ad-hoc goroutines) | Single place to register/monitor all periodic tasks, graceful shutdown, skip-if-running |
| CPU sampling via background goroutine | /proc/stat delta needs two readings — collector runs every 5s, GetInfo() reads cached value |
| Temperature from /host/sys (Docker mount) | Container can't read host /sys directly — mount /sys:/host/sys:ro, try /host/sys first |
| Restic password auto-generated | No manual setup needed — generated on first backup run, stored in named volume |
| DB discovery via docker inspect | No config needed — discovers postgres/mariadb containers by image name + env vars |
| Backup orchestrator with running flag | Prevents concurrent backups, supports both scheduled and manual trigger |
| modernc.org/sqlite (pure Go) | No CGO/gcc needed in Docker build stage — keeps CGO_ENABLED=0 static binary |
| AlertManager state-based refresh | Alerts regenerated every 5min from health report — no persistent storage needed, always reflects current state |
| Notification relay via hub | Controller → hub → Resend → email. Hub acts as central relay: knows customer email, handles Resend API. Controller only needs hub URL + API key |
| In-memory notification cooldowns | Per-event-type cooldown map (default 6h). Lost on restart = acceptable (better to re-notify than miss). No persistence needed |
| Health status change detection | Only notify on degradation (ok→warn, ok→fail, warn→fail). Avoids spam on flapping. First run records baseline, doesn't notify |
| Resend HTTP API (no SMTP) | Direct POST to api.resend.com — same pattern as website contact-mailer. Simpler than SMTP setup, good deliverability |
| Preferences sync on save + startup | Controller pushes prefs to hub (not pull). Startup sync handles hub DB rebuild. Local save always succeeds even if sync fails |
| Chart.js embedded locally | Customer hardware may not have internet — CDN not reliable for offline environments |
| StackDataProvider interface | backup package needs stack data but can't import stacks (circular). Interface in backup, thin adapter in main.go |
| Password sync to hub via report | Restic password in Docker named volume on SSD. Hub sync provides redundancy for disaster recovery |
| App backup via HDD mounts only | Docker volumes at /var/lib/docker/volumes/ not mounted in controller. HDD data is the important user data; DB in volumes covered by nightly dump |
| Restore uses running mutex | Prevents concurrent backup+restore on same restic repo. Reuses existing m.running flag |
| Storage paths registry in settings.json | Multi-storage support: each app's HDD_PATH from app.yaml is authoritative. Auto-discovery on startup avoids manual config. Registry enables UI management + health monitoring per path |
| /mnt:/mnt:rw mount in controller | Replaces per-path HDD_PATH mount. Enables multi-storage + restore writes. All customer HDD mounts are under /mnt/ by convention |
| Per-app HDD_PATH resolution (app.yaml > global) | App's own env HDD_PATH is Priority 1, registered storage paths as fallback. Eliminates dependency on global controller.yaml hdd_path |
| Mount-point detection via syscall.Stat_t.Dev | Compares device ID of path vs parent dir — reliable check that path is on separate filesystem. Prevents data writes to SSD |
| Health severity: mount-point = warning | Non-mount-point is informational, not a service failure. FAIL reserved for genuinely broken things. Avoids false alarms on demo/test environments |
| FS info via findmnt + sysfs | findmnt -n -o SOURCE,FSTYPE --target <path> for filesystem type/device. /sys/block/<dev>/device/model for disk model. Best-effort, returns nil on failure |
| Query param flash messages | Stateless, no session store needed. Consistent with backup page pattern. ?storage_msg=success&storage_detail=... |
| StorageLabels map on stacks page | Separate map passed to template (not modifying Stack struct). Built from deployed apps' HDD_PATH → registered path label lookup |
| Metrics downsampling via SQL | Bucket-based AVG in GROUP BY keeps Chart.js responsive with up to 30 days of data |
| 60s metrics collection interval | Good balance of resolution vs. storage — ~44K rows/month for system metrics |
| /etc/os-release mounted read-only | Container can't read host OS info directly — mount to /host/etc/os-release:ro |
Key file locations on demo-felhom
/opt/docker/felhom-controller/ # Controller compose + config
├── controller.yaml # Customer config (domain, auth, paths)
├── docker-compose.yml # Controller's own compose
└── data/ # Controller persistent data (named volume)
/opt/docker/stacks/ # All app stacks
├── traefik/ # Reverse proxy (protected)
├── cloudflared/ # Tunnel (protected)
├── paperless-ngx/ # First deployed app ✅
│ ├── docker-compose.yml
│ ├── .felhom.yml # App metadata
│ └── app.yaml # Deploy config (env vars, locked fields)
└── whoami/ # Test stack (not deployed)
/mnt/hdd_placeholder/storage/ # HDD storage for apps
└── paperless/
├── consume/ # Drop files here for OCR
├── media/ # Processed documents
└── export/ # Backup exports
Related repositories and their state
| Repository | Status | Notes |
|---|---|---|
| deploy-felhom-compose | Active | This repo. Controller code + deploy scripts |
| app-catalog-felhom.eu | Active | 10 app templates, all with .felhom.yml metadata + memory limits |
| felhom.eu | Active | Website + hub/ subfolder (felhom-hub service) + k8s manifests |
| homelab-manifests | Stable | k3s cluster running (dooplex.hu services) |
| misc-scripts | Utility | collect-repo.sh, backup helpers |
Gotchas & lessons learned
- `docker compose restart` ≠ `docker compose up -d` — restart doesn't pick up new images
- Go maps have random iteration order — always sort slices before displaying
- Docker `.State` = "running" doesn't mean healthy — check `.Status` for "(health: starting)" / "(unhealthy)"
- Paperless-ngx needs `PAPERLESS_OCR_LANGUAGES` (plural) to install language packs, `PAPERLESS_OCR_LANGUAGE` (singular) to select
- In-memory Deployed flag must be set BEFORE `docker compose up -d` (not after) — compose can take 30-60s for image pulls, during which the UI would show a stale "Telepítés" button
- Cloudflare Tunnel handles *.demo-felhom.eu → Traefik handles Host()-based routing to containers
- BIOS "AC Power Recovery" must be enabled on N100 for auto-restart after power outage
- `docker compose up -d` returns exit 0 even when containers immediately crash-loop — need post-start status check to detect this
- When logging env vars for debugging, only log keys (not values) to avoid leaking secrets in log files
- Mealie image (`ghcr.io/mealie-recipes/mealie`) doesn't include wget/curl — use Python TCP socket check for healthcheck
- Mealie DB migrations on first start take ~40s (alembic) — use `start_period: 60s` to avoid premature unhealthy status
- Alpine-based images (filebrowser, vaultwarden) have wget via BusyBox — healthchecks with `wget --spider` work fine
- Deploy `sed` command to update image version must target only the `image:` line — naive `sed 's|name:OLD|name:NEW|'` also matches the service name line (e.g., `felhom-controller:` → `felhom-controller:0.2.12`), breaking YAML. Use `sudo sed -i 's|image:.*felhom-controller:[^ ]*|image: ...felhom-controller:NEW|'` or similar scoped pattern
- Hungarian quotation marks `„”` in YAML: `„` (U+201E) is safe inside YAML double-quoted strings, but the closing `”` must NOT be ASCII `"` (0x22) — it terminates the YAML string. Use `\"` escape or Unicode `”` (U+201D). This caused a silent parse failure for the entire `.felhom.yml` file
- Never silently swallow parse errors — always log them. Silent failures make debugging impossible (took a dedicated debug session to find a simple quoting issue)