# CONTEXT.md — Project Memory
> This file serves as persistent project memory across Claude Code sessions.
> It replaces the auto-generated "Memory" from the claude.ai Project.
> **Update this file at the end of each working session** with current state,
> recent decisions, and anything the next session needs to know.
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"

Last updated: 2026-02-14 (session 3)

---

## About Viktor (project owner)
- Works at Magyar Telekom (Budapest), building Felhom as a side business
- Felhom: managed home-server service for Hungarian households
- Technical but prefers pragmatic solutions over over-engineering
- Runs all infrastructure on Gitea (gitea.dooplex.hu), k3s cluster for management
- Customer deployments use Docker Compose (not Kubernetes) for simplicity
## Current project state
### felhom-controller (this repo)
- **Version:** v0.2.1
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth
### What was just completed (2026-02-14 session 3)
- **Enhanced debug logging** across all stack operations in `internal/stacks/`:
  - **Operation timing**: All stack ops (start, stop, restart, update, deploy) now log elapsed time
    - Success: `[INFO] Stack immich started successfully (took 45.2s)`
    - Failure: `[ERROR] Stack immich start failed after 3.1s: exit code 1`
  - **`composeExecCustomEnv` improvements**:
    - Logs env var **keys** at debug level (never values, so secrets stay safe)
    - Logs exit code, truncated stdout/stderr (max 500 chars) on failure
    - Logs command completion time on success
  - **Post-start container state check** (StartStack, RestartStack, UpdateStack, DeployStack):
    - Async goroutine: sleeps 3s, runs `docker compose ps -a`, logs each container's state
    - Critical for detecting crash-loops that `docker compose up -d` wouldn't surface
    - Non-blocking: never fails the operation, just logs a warning if the check fails
  - **Image pull detection** (DeployStack, UpdateStack at debug level):
    - Parses `docker-compose.yml` for `image:` lines
    - Runs `docker image inspect` per image to check local availability
    - Skips images with `${VAR}` interpolation (can't resolve at check time)
  - **GetLogs improvement**: Logs byte count of returned logs (distinguishes empty vs failure)
  - **ScanStacks improvement**: `[INFO] Scanned stacks: 10 found (3 deployed, 7 available)`
  - **New helpers added to manager.go**: `isDebug()`, `truncateStr()`, `logPostStartStatus()`, `checkLocalImages()`
  - All verbose checks gated on `cfg.Logging.Level == "debug"`; timing and container states always logged at INFO
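
The key-only env logging and the 500-character truncation described above could look roughly like this. It is a minimal sketch, not the code in `manager.go`: `envKeys` is a hypothetical helper name, and the `truncateStr` signature and the truncation marker are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// envKeys returns only the variable names from KEY=VALUE pairs, sorted,
// so that values (which may be secrets) never reach the log.
// Hypothetical helper; the real controller may structure this differently.
func envKeys(env []string) []string {
	keys := make([]string, 0, len(env))
	for _, kv := range env {
		key, _, _ := strings.Cut(kv, "=")
		keys = append(keys, key)
	}
	sort.Strings(keys)
	return keys
}

// truncateStr caps s at limit bytes and marks the cut.
// Assumed signature; it cuts on bytes, so a multi-byte rune may be split.
func truncateStr(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	return s[:limit] + "...(truncated)"
}

func main() {
	env := []string{"DOMAIN=demo-felhom.eu", "DB_PASSWORD=not-logged"}
	fmt.Printf("[DEBUG] env keys: %v\n", envKeys(env))

	stderr := strings.Repeat("x", 600)
	fmt.Printf("[ERROR] stderr (%d bytes): %s\n", len(stderr), truncateStr(stderr, 500))
}
```

Sorting the keys keeps the debug line stable between runs, which makes log diffs easier to read.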
### Previously completed (2026-02-15 session 2)
- **Phase 4: Git Sync + App Catalog Audit** (major milestone)
- **Git sync module** (`internal/sync/sync.go`):
  - Clones/pulls app-catalog-felhom.eu repo to local cache on startup
  - Periodic sync based on `git.sync_interval` (default 15m)
  - Copies `docker-compose.yml` + `.felhom.yml` to stacks dir (never overwrites `app.yaml`/`.env`)
  - SHA-256 content comparison: only writes changed files
  - Triggers `ScanStacks()` after sync so dashboard updates immediately
  - Uses `os/exec` git CLI, so there is no Go git library dependency
- **Manual sync button** ("Sablonok frissítése", i.e. "Update templates") on the Alkalmazások (Applications) page:
  - `POST /api/sync` endpoint with 30s debounce
  - Toast notification shows result (success/failure/what changed)
  - Auto-reloads page if new apps or updates detected
- **Sync status** added to `/api/system/info` (last_sync, last_status, syncing flag)
- **.felhom.yml files created for all 10 apps** (paperless-ngx already had one):
  - actualbudget, docmost, filebrowser, homebox, immich, mealie, romm, stirling-pdf, vaultwarden
  - All follow the same format: display_name, description, category, subdomain, resources, deploy_fields
- **Docker Compose templates audited and fixed** for all 10 apps:
  - Fixed `{{DOMAIN}}` → `${DOMAIN}` syntax in homebox, mealie, romm, stirling-pdf
  - Fixed `{{HDD_PATH}}` → `${HDD_PATH}` in romm
  - Added `deploy.resources.limits.memory` to all services across all templates
  - Added `TZ=Europe/Budapest` to all sidecar services (postgres, redis, mariadb)
  - Added healthcheck to romm main service
  - Added `romm-redis` `condition: service_healthy` (was `service_started`)
  - Standardized header comment blocks across all templates
- **Documentation updated**: app-catalog README, CLAUDE.md, CONTEXT.md
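
The "only write changed files" rule of the git sync module can be sketched as follows. This is an illustration under assumptions, not the contents of `internal/sync/sync.go`: `sameContent` and `writeIfChanged` are hypothetical names, and the file mode `0o644` is a guess.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// sameContent compares two byte slices by their SHA-256 digests.
func sameContent(a, b []byte) bool {
	return sha256.Sum256(a) == sha256.Sum256(b)
}

// writeIfChanged writes content to dst only when dst is missing or differs.
// It reports whether a write actually happened. Hypothetical helper.
func writeIfChanged(dst string, content []byte) (bool, error) {
	existing, err := os.ReadFile(dst)
	switch {
	case err == nil && sameContent(existing, content):
		return false, nil // identical: skip the disk write
	case err != nil && !os.IsNotExist(err):
		return false, err // unreadable for a reason other than "missing"
	}
	if err := os.WriteFile(dst, content, 0o644); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	dir, err := os.MkdirTemp("", "sync-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	dst := filepath.Join(dir, "docker-compose.yml")
	for _, body := range []string{"v1", "v1", "v2"} {
		changed, err := writeIfChanged(dst, []byte(body))
		fmt.Printf("content=%s written=%v err=%v\n", body, changed, err)
	}
}
```

The boolean return is what would let a caller decide whether a rescan or a "what changed" toast is needed.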
### Previously completed (2026-02-15 session 1)
- **Memory validation during deployment**:
  - Pre-deploy memory check: compares `mem_request` sum against usable system RAM
  - Hard block if requests exceed usable memory (total - 384MB reserved)
  - Soft warning if `mem_limit` sum exceeds total RAM (overcommit OK for limits)
  - `ParseMemoryMB()` supports "500M", "1G", "1.5G", "1024" formats
  - `CommittedMemory()` sums requests/limits across all deployed stacks
  - Memory summary bar shown on deploy page before user clicks deploy
  - `system.reserved_memory_mb` configurable in controller.yaml (default: 384)
- **Display: `~` prefix on mem_request** in UI badges (display-only, exact value stored)
- **Felhom.eu logo** replaced text logos in sidebar and login page with actual SVG logo
  - Logo SVG embedded as Go string constant, served at `/static/felhom-logo.svg`
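
The parsing and the two-tier check described above might be sketched like this. The function names are lower-cased stand-ins, not the real `ParseMemoryMB()`; treating a bare number as megabytes and 1G as 1024M are assumptions, and `checkMemory` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseMemoryMB converts "500M", "1G", "1.5G" or a bare "1024" into megabytes.
// Assumptions: a bare number is already in MB, and 1G equals 1024M.
func parseMemoryMB(raw string) (int, error) {
	s := strings.ToUpper(strings.TrimSpace(raw))
	mult := 1.0
	switch {
	case strings.HasSuffix(s, "G"):
		mult, s = 1024, strings.TrimSuffix(s, "G")
	case strings.HasSuffix(s, "M"):
		s = strings.TrimSuffix(s, "M")
	}
	v, err := strconv.ParseFloat(s, 64)
	if err != nil || v < 0 || math.IsNaN(v) || math.IsInf(v, 0) {
		return 0, fmt.Errorf("invalid memory value %q", raw)
	}
	return int(v * mult), nil
}

// checkMemory applies both rules: requests are a hard block against usable RAM
// (total minus reserve); limits only raise a warning against total RAM.
func checkMemory(totalMB, reservedMB, requestSumMB, limitSumMB int) (blocked, warn bool) {
	blocked = requestSumMB > totalMB-reservedMB
	warn = limitSumMB > totalMB
	return blocked, warn
}

func main() {
	for _, in := range []string{"500M", "1G", "1.5G", "1024"} {
		mb, err := parseMemoryMB(in)
		fmt.Printf("%-5s -> %d MB (err=%v)\n", in, mb, err)
	}
	blocked, warn := checkMemory(8192, 384, 7900, 9000)
	fmt.Printf("blocked=%v warn=%v\n", blocked, warn)
}
```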
### Previously completed (2026-02-14)
- **System info bar on Vezérlőpult dashboard**: RAM, SSD, and optional HDD usage
  - Progress bars with color coding (green < 70%, yellow 70-85%, red > 85%)
  - New `internal/system` package reads `/proc/meminfo` + `syscall.Statfs`
  - Platform-specific: Linux impl + non-Linux stub (build tags)
  - Hungarian labels: "Memória" (memory), "SSD tárhely" (SSD storage), "Külső HDD" (external HDD)
- **Docker Compose memory limits** on paperless-ngx template:
  - paperless-webserver: 768M, postgres: 256M, redis: 128M
  - Added `mem_limit` field to `.felhom.yml` ResourceHints (total: 1152M)
- **`/api/system/info` endpoint** now returns live system metrics (was customer info)
- **Config**: Added `paths.hdd_path` for external HDD monitoring
- Controller image builds via build.sh, pushes to Gitea container registry
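
As an illustration of the RAM bar, the sketch below parses `/proc/meminfo`-style text and maps the result onto the color thresholds listed above. Both function names are hypothetical, and defining "used" as `MemTotal` minus `MemAvailable` is an assumption; the real `internal/system` package may compute it differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memUsedPercent parses /proc/meminfo-style text and returns used RAM in percent.
// Assumption: used = MemTotal - MemAvailable.
func memUsedPercent(meminfo string) (float64, error) {
	var total, avail float64
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text()) // e.g. ["MemTotal:", "16384000", "kB"]
		if len(fields) < 2 {
			continue
		}
		kb, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		switch fields[0] {
		case "MemTotal:":
			total = kb
		case "MemAvailable:":
			avail = kb
		}
	}
	if total <= 0 {
		return 0, fmt.Errorf("MemTotal not found")
	}
	return (total - avail) / total * 100, nil
}

// usageColor maps a usage percentage to the bar color:
// green below 70, yellow from 70 to 85 inclusive, red above 85.
func usageColor(pct float64) string {
	switch {
	case pct < 70:
		return "green"
	case pct <= 85:
		return "yellow"
	default:
		return "red"
	}
}

func main() {
	sample := "MemTotal:       16000000 kB\nMemFree:         1000000 kB\nMemAvailable:    4000000 kB\n"
	pct, err := memUsedPercent(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("RAM used: %.1f%% (%s)\n", pct, usageColor(pct))
}
```

Taking the text as a string rather than opening the file keeps the parser usable on the non-Linux stub and in tests.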
### Previously completed (2026-02-13)
- Built the entire felhom-controller from scratch (Go, no frameworks)
- Debugged and fixed 7 issues during first real deployment:
  1. Password validation (empty passwords accepted)
  2. In-memory Deployed flag not updating after deploy
  3. Health-aware state parsing (starting/unhealthy detection)
  4. Random card ordering (Go map iteration)
  5. "Részletek" ("Details") button redirect for deployed apps
  6. Paperless OCR language installation (LANGUAGES vs LANGUAGE env var)
  7. Documentation: restart vs up -d for image updates
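
Issue 4 in the list above comes from Go's randomized map iteration. One common remedy, sketched here with a hypothetical `Stack` type and `sortedStacks` helper (neither is taken from the controller's code), is to copy the values into a slice and sort it before rendering.

```go
package main

import (
	"fmt"
	"sort"
)

// Stack is a hypothetical stand-in for the controller's stack type.
type Stack struct {
	Name     string
	Deployed bool
}

// sortedStacks returns the map's values ordered by name, so the
// dashboard renders cards in the same order on every request.
func sortedStacks(m map[string]Stack) []Stack {
	out := make([]Stack, 0, len(m))
	for _, s := range m {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	stacks := map[string]Stack{
		"paperless-ngx": {Name: "paperless-ngx", Deployed: true},
		"immich":        {Name: "immich"},
		"actualbudget":  {Name: "actualbudget"},
	}
	for _, s := range sortedStacks(stacks) {
		fmt.Printf("%-14s deployed=%v\n", s.Name, s.Deployed)
	}
}
```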
### What's next (priorities)
1. Build + deploy the updated controller with git sync module
2. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets) to validate all .felhom.yml files
3. Test git sync end-to-end: push a template change to app-catalog, verify controller picks it up
4. Test on Raspberry Pi (pi-customer-1)
5. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
6. Phase 2 continued: CPU/temperature metrics, Healthchecks.io pings
7. Phase 3: Backup system (DB dumps + restic)
## Architecture decisions
| Decision | Rationale |
|----------|-----------|
| Go stdlib for web (no Gin/Echo) | Minimal dependencies, single binary, easy to embed templates |
| Templates as Go string constants | Zero runtime file dependencies, everything in the binary |
| Docker Compose for customers (not k8s) | Simpler troubleshooting, customers don't need k8s knowledge |
| k3s for management infra only | Viktor's own services (gitea, monitoring, website) run on k3s |
| Cloudflare Tunnel for remote access | No port forwarding needed, works behind any NAT |
| app.yaml per stack | Separates deploy config from compose files, survives git pulls |
| Password fields require explicit input | Prevents accidental empty-password deployments |
| Health-aware state from Docker Status field | Docker's State says "running" even for unhealthy containers |
| Memory limits via deploy.resources.limits | Prevents runaway containers; ~50% headroom over expected usage |
| System info from /proc/meminfo + statfs | No external dependencies, cheap to read on each page load |
| mem_request vs mem_limit (K8s-inspired) | Requests = expected usage (hard block), limits = peak (overcommit OK) |
| 384MB reserved for system | Prevents deploying apps that would starve the OS/controller |
| Logo SVG embedded as Go constant | Same approach as CSS/HTML — zero external file deps |
| Git sync via os/exec git CLI | No Go git library needed, git is in the container image |
| SHA-256 for content comparison | Only copy changed files, avoid unnecessary disk writes |
| 30s debounce on manual sync | Prevents spamming the git server |
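
The 30s debounce in the last row can be illustrated with a small mutex-guarded gate. This is a sketch under assumptions: `Debouncer`, `Allow`, and `simulate` are hypothetical names, and how the real `POST /api/sync` handler responds to a rejected request is not shown.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Debouncer allows an action at most once per interval. Hypothetical type.
type Debouncer struct {
	mu       sync.Mutex
	interval time.Duration
	last     time.Time
}

// Allow reports whether a call at time now may proceed, and records it if so.
// Passing the clock in makes the behavior easy to test.
func (d *Debouncer) Allow(now time.Time) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if !d.last.IsZero() && now.Sub(d.last) < d.interval {
		return false
	}
	d.last = now
	return true
}

// simulate replays request offsets (in seconds) against a 30s debouncer.
func simulate(offsets []int) []bool {
	d := &Debouncer{interval: 30 * time.Second}
	start := time.Date(2026, 2, 15, 12, 0, 0, 0, time.UTC)
	results := make([]bool, 0, len(offsets))
	for _, off := range offsets {
		at := start.Add(time.Duration(off) * time.Second)
		results = append(results, d.Allow(at))
	}
	return results
}

func main() {
	// Requests at 0s, 10s, 29s, 30s, 45s and 61s after the first click.
	fmt.Println(simulate([]int{0, 10, 29, 30, 45, 61}))
	// prints [true false false true false true]
}
```

Rejected calls do not reset the timer here, so repeated clicking cannot postpone the next allowed sync indefinitely.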
## Key file locations on demo-felhom
```
/opt/docker/felhom-controller/ # Controller compose + config
├── controller.yaml # Customer config (domain, auth, paths)
├── docker-compose.yml # Controller's own compose
└── .env # DOMAIN=demo-felhom.eu
/opt/docker/stacks/ # All app stacks
├── traefik/ # Reverse proxy (protected)
├── cloudflared/ # Tunnel (protected)
├── paperless-ngx/ # First deployed app ✅
│ ├── docker-compose.yml
│ ├── .felhom.yml # App metadata
│ └── app.yaml # Deploy config (env vars, locked fields)
└── whoami/ # Test stack (not deployed)
/mnt/hdd_placeholder/storage/ # HDD storage for apps
└── paperless/
    ├── consume/ # Drop files here for OCR
    ├── media/ # Processed documents
    └── export/ # Backup exports
```
## Related repositories and their state
| Repository | Status | Notes |
|------------|--------|-------|
| deploy-felhom-compose | Active | This repo. Controller code + deploy scripts |
| app-catalog-felhom.eu | Active | 10 app templates, all with .felhom.yml metadata + memory limits |
| felhom.eu | Stable | Website live, SEO indexed, email working |
| homelab-manifests | Stable | k3s cluster running (dooplex.hu services) |
| misc-scripts | Utility | collect-repo.sh, backup helpers |
## Gotchas & lessons learned
- `docker compose restart` ≠ `docker compose up -d`: restart doesn't pick up new images
- Go maps have random iteration order — always sort slices before displaying
- Docker `.State`="running" doesn't mean healthy — check `.Status` for "(health: starting)" / "(unhealthy)"
- Paperless-ngx needs `PAPERLESS_OCR_LANGUAGES` (plural) to install language packs, `PAPERLESS_OCR_LANGUAGE` (singular) to select
- After deploying a stack, update the in-memory Deployed flag immediately — RefreshStatus() only reads docker ps
- Cloudflare Tunnel handles `*.demo-felhom.eu` → Traefik handles `Host()`-based routing to containers
- BIOS "AC Power Recovery" must be enabled on N100 for auto-restart after power outage
- `docker compose up -d` returns exit 0 even when containers immediately crash-loop — need post-start status check to detect this
- When logging env vars for debugging, only log keys (not values) to avoid leaking secrets in log files
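
The `.State` versus `.Status` gotcha above can be sketched as a small mapping function. `effectiveState` is a hypothetical name, and the returned labels ("starting", "unhealthy") are assumptions about how the controller names its states.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveState refines Docker's coarse State using health hints in Status.
// A container whose State is "running" may still be starting or unhealthy.
func effectiveState(state, status string) string {
	if state != "running" {
		return state // exited, restarting, etc. are passed through unchanged
	}
	switch {
	case strings.Contains(status, "(unhealthy)"):
		return "unhealthy"
	case strings.Contains(status, "(health: starting)"):
		return "starting"
	default:
		return "running"
	}
}

func main() {
	samples := [][2]string{
		{"running", "Up 5 minutes (healthy)"},
		{"running", "Up 3 seconds (health: starting)"},
		{"running", "Up 2 minutes (unhealthy)"},
		{"exited", "Exited (1) 3 seconds ago"},
	}
	for _, s := range samples {
		fmt.Printf("%-8s | %-32s -> %s\n", s[0], s[1], effectiveState(s[0], s[1]))
	}
}
```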