docs: update CONTEXT.md and README.md for v0.4.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 11:20:30 +01:00
parent d32d9fb44b
commit 1596e86e69
2 changed files with 118 additions and 29 deletions
+71 -12
@@ -7,7 +7,7 @@
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"
Last updated: 2026-02-15 (session 9)
Last updated: 2026-02-15 (session 10)
---
@@ -22,13 +22,65 @@ Last updated: 2026-02-15 (session 9)
## Current project state
### felhom-controller (this repo)
- **Version:** v0.3.0
- **Version:** v0.4.0
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
- **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger)
- **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth
- **All Phase 1-3 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups
### What was just completed (2026-02-15 session 9)
### What was just completed (2026-02-15 session 10)
- **v0.4.0 — Monitoring & Health + Backups (Phase 2 & 3):**
- **Central job scheduler** (`internal/scheduler/scheduler.go`):
- Replaces ad-hoc goroutines in main.go with a unified scheduler
- `Every(name, interval, fn)` for periodic jobs, `Daily(name, timeStr, fn)` for scheduled tasks
- Panic recovery, skip-if-running, quiet mode for high-frequency jobs (≤30s)
- Daily jobs use `Europe/Budapest` timezone with `time.Timer` for DST correctness
- Graceful shutdown with 30s timeout for running jobs
- **CPU usage collector** (`internal/system/cpu_linux.go`):
- Background goroutine samples `/proc/stat` every 5s, computes delta-based CPU %
- Platform stubs for non-Linux in `cpu_other.go`
- **Temperature & load metrics** (`internal/system/info_linux.go`):
- Reads `/proc/loadavg` for 1/5/15 min load averages
- Reads thermal zones from `/host/sys/class/thermal/` (Docker mount) with `/sys/` fallback
- Handles millidegree values, picks highest zone, with hwmon fallback
- **Healthchecks.io pinger** (`internal/monitor/pinger.go`):
- HTTP ping client for Healthchecks.io-compatible endpoints
- POST to `/ping/{uuid}` (success), `/ping/{uuid}/fail` (failure), `/ping/{uuid}/start` (started)
- 10s timeout, 3 retries with 2s backoff, skips CHANGEME UUIDs
- **System health checks** (`internal/monitor/healthcheck.go`):
- Checks disk, memory, CPU, temperature, Docker reachability, protected containers
- Returns HealthReport with status "ok"/"warn"/"fail" + formatted message for pings
- **Database dump engine** (`internal/backup/dbdump.go`):
- Auto-discovers PostgreSQL/MariaDB containers via `docker ps` + `docker inspect`
- Dumps via `docker exec pg_dump`/`mariadb-dump` with 5min timeout
- Atomic writes (`.tmp` → `.sql`), empty file detection, stale temp cleanup
- **Restic integration** (`internal/backup/restic.go`):
- Auto-generates repository password (32 random bytes, base64url)
- Init, snapshot (JSON output), prune, check, stats, latest snapshot
- Stale lock detection with automatic unlock + retry
- **Backup orchestrator** (`internal/backup/backup.go`):
- DB dumps + restic snapshots, weekly prune on Sundays
- Thread-safe running flag, Healthchecks.io pings with results
- `RunFullBackup()` for manual trigger (sequential: dumps → snapshot)
- **Wiring updates:**
- `main.go`: scheduler-based job registration, cpuCollector lifecycle, pinger + backupMgr init
- `api/router.go`: `GET /api/backup/status`, `POST /api/backup/run`
- `web/server.go` + `handlers.go`: pass cpuCollector to GetInfo(), backup status on dashboard
- `funcmap.go`: `tempColor`, `fmtTemp`, `fmtLoad` template functions
- **Dashboard UI enhancements:**
- CPU usage bar with load average display below
- Temperature with colored indicator dot (green/yellow/red, thresholds at 60°C and 75°C)
- Backup status card: last run time, DB count, repo size/snapshots
- "Mentés most" button triggers manual backup via API
- **Config updates:**
- `controller.yaml.example`: added `system_health_interval`, `hdd_path`, `system.reserved_memory_mb`
- `docker-compose.yml`: added `/sys:/host/sys:ro` mount for temperature reading
- `restic_password_file` default changed to `data/` subdir (auto-generated in named volume)
- **Controller version:** v0.4.0 — deployed and verified on demo-felhom.eu
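
The delta-based CPU calculation mentioned for the CPU usage collector can be sketched as follows. This is a minimal illustration, not the code in `internal/system/cpu_linux.go`: the names `cpuSample`, `parseCPULine`, and `usagePercent` are hypothetical, and treating idle plus iowait as "idle time" is an assumption about how the collector counts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuSample holds cumulative jiffies taken from the aggregate "cpu" line
// of /proc/stat. (Hypothetical type, for illustration only.)
type cpuSample struct {
	idle  uint64 // idle + iowait
	total uint64 // user + nice + system + idle + iowait + irq + softirq + steal
}

// parseCPULine parses the aggregate "cpu ..." line of /proc/stat.
func parseCPULine(line string) (cpuSample, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || fields[0] != "cpu" {
		return cpuSample{}, fmt.Errorf("unexpected /proc/stat line: %q", line)
	}
	var s cpuSample
	for i, f := range fields[1:] {
		if i >= 8 { // guest and guest_nice are already counted in user and nice
			break
		}
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return cpuSample{}, err
		}
		s.total += v
		if i == 3 || i == 4 { // idle, iowait
			s.idle += v
		}
	}
	return s, nil
}

// usagePercent returns the busy share of CPU time between two samples.
// A single reading is not enough because the counters are cumulative since boot.
func usagePercent(prev, cur cpuSample) float64 {
	if cur.total <= prev.total || cur.idle < prev.idle {
		return 0 // no time elapsed or counters went backwards
	}
	dTotal := cur.total - prev.total
	dIdle := cur.idle - prev.idle
	if dIdle > dTotal {
		return 0
	}
	return 100 * float64(dTotal-dIdle) / float64(dTotal)
}

func main() {
	// Two made-up readings standing in for samples taken 5s apart.
	prev, _ := parseCPULine("cpu  100 0 100 700 100 0 0 0 0 0")
	cur, _ := parseCPULine("cpu  150 0 150 750 150 0 0 0 0 0")
	fmt.Printf("%.1f%%\n", usagePercent(prev, cur))
}
```

In a collector like the one described, a background goroutine would keep the previous sample and store the latest percentage, so that `GetInfo()` only reads a cached value.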
### What was previously completed (2026-02-15 session 9)
- **v0.3.0 — Structural refactoring (templates + server split + domain rename):**
- **Templates: go:embed migration** — moved all 7 HTML templates + CSS from Go string constants to individual files in `internal/web/templates/`. Created `embed.go` with `//go:embed` directive. Template loading now uses `ParseFS()` instead of `Parse()`. CSS served from embed.FS via `ReadFile()`. Zero runtime file dependencies — still compiled into the binary.
- **Server decomposition** — split monolithic `server.go` (540 lines) into focused files:
@@ -190,14 +242,15 @@ Last updated: 2026-02-15 (session 9)
7. Documentation: restart vs up -d for image updates
### What's next (priorities)
1. **Test orphan delete flow** — try deleting the orphaned filebrowser stack via the UI
2. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
3. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets) to validate all .felhom.yml files
4. Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
5. Test on Raspberry Pi (pi-customer-1)
6. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
7. Phase 2 continued: CPU/temperature metrics, Healthchecks.io pings
8. Phase 3: Backup system (DB dumps + restic)
1. **Configure Healthchecks.io UUIDs** on demo-felhom.eu (replace CHANGEME in controller.yaml)
2. **Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
3. **Test orphan delete flow** — try deleting the orphaned filebrowser stack via the UI
4. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
5. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets)
6. Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
7. Test on Raspberry Pi (pi-customer-1)
8. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
9. Phase 4: Self-update mechanism
## Architecture decisions
@@ -222,6 +275,12 @@ Last updated: 2026-02-15 (session 9)
| Orphan = deployed but not in catalog | Safe lifecycle: remove from catalog → mark orphaned → user deletes via UI |
| FileBrowser as infra (not catalog) | Needed even after apps deleted (user browses HDD data); deployed by setup script |
| Protected HDD paths | Safety net: never delete top-level HDD dirs (media, storage, Dokumentumok, appdata) |
| Central scheduler (not ad-hoc goroutines) | Single place to register/monitor all periodic tasks, graceful shutdown, skip-if-running |
| CPU sampling via background goroutine | /proc/stat delta needs two readings — collector runs every 5s, GetInfo() reads cached value |
| Temperature from /host/sys (Docker mount) | Container can't read host /sys directly — mount /sys:/host/sys:ro, try /host/sys first |
| Restic password auto-generated | No manual setup needed — generated on first backup run, stored in named volume |
| DB discovery via docker inspect | No config needed — discovers postgres/mariadb containers by image name + env vars |
| Backup orchestrator with running flag | Prevents concurrent backups, supports both scheduled and manual trigger |
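
The skip-if-running behaviour shared by the scheduler and the backup orchestrator can be illustrated with an atomic flag. This is a hedged sketch assuming Go 1.19+ (`atomic.Bool`); `backupGuard`, `run`, and `errAlreadyRunning` are hypothetical names and may not match the actual implementation in `internal/backup/backup.go`.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errAlreadyRunning is returned when a trigger arrives while a run is active.
var errAlreadyRunning = errors.New("backup already running")

// backupGuard lets at most one job run at a time. (Hypothetical type.)
type backupGuard struct {
	running atomic.Bool
}

// run executes job unless another run is in progress. The flag is cleared
// even if job panics, so a crashed run cannot block later ones.
func (g *backupGuard) run(job func() error) (err error) {
	// CompareAndSwap makes "check and set" a single atomic step.
	if !g.running.CompareAndSwap(false, true) {
		return errAlreadyRunning
	}
	defer g.running.Store(false)
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("job panicked: %v", r)
		}
	}()
	return job()
}

func main() {
	var g backupGuard
	err := g.run(func() error {
		// A second trigger (e.g. manual, during a scheduled run) is rejected.
		fmt.Println("nested attempt:", g.run(func() error { return nil }))
		return nil
	})
	fmt.Println("outer run:", err)
	fmt.Println("after completion:", g.run(func() error { return nil }))
}
```

With a guard like this, both the scheduled job and a manual trigger such as `POST /api/backup/run` could call the same entry point, and the second caller would get an error instead of starting a concurrent backup.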
## Key file locations on demo-felhom