docs: update CONTEXT.md and README.md for v0.4.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 11:20:30 +01:00
parent d32d9fb44b
commit 1596e86e69
2 changed files with 118 additions and 29 deletions
+71 -12
@@ -7,7 +7,7 @@
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"
Last updated: 2026-02-15 (session 9)
Last updated: 2026-02-15 (session 10)
---
@@ -22,13 +22,65 @@ Last updated: 2026-02-15 (session 9)
## Current project state
### felhom-controller (this repo)
- **Version:** v0.3.0
- **Version:** v0.4.0
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
- **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger)
- **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth
- **All Phase 1-3 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups
### What was just completed (2026-02-15 session 9)
### What was just completed (2026-02-15 session 10)
- **v0.4.0 — Monitoring & Health + Backups (Phase 2 & 3):**
- **Central job scheduler** (`internal/scheduler/scheduler.go`):
- Replaces ad-hoc goroutines in main.go with a unified scheduler
- `Every(name, interval, fn)` for periodic jobs, `Daily(name, timeStr, fn)` for scheduled tasks
- Panic recovery, skip-if-running, quiet mode for high-frequency jobs (≤30s)
- Daily jobs use `Europe/Budapest` timezone with `time.Timer` for DST correctness
- Graceful shutdown with 30s timeout for running jobs
- **CPU usage collector** (`internal/system/cpu_linux.go`):
- Background goroutine samples `/proc/stat` every 5s, computes delta-based CPU %
- Platform stubs for non-Linux in `cpu_other.go`
- **Temperature & load metrics** (`internal/system/info_linux.go`):
- Reads `/proc/loadavg` for 1/5/15 min load averages
- Reads thermal zones from `/host/sys/class/thermal/` (Docker mount) with `/sys/` fallback
- Handles millidegree values, picks highest zone, with hwmon fallback
- **Healthchecks.io pinger** (`internal/monitor/pinger.go`):
- HTTP ping client for Healthchecks.io-compatible endpoints
- POST to `/ping/{uuid}` (success), `/fail` (failure), `/start` (started)
- 10s timeout, 3 retries with 2s backoff, skips CHANGEME UUIDs
- **System health checks** (`internal/monitor/healthcheck.go`):
- Checks disk, memory, CPU, temperature, Docker reachability, protected containers
- Returns HealthReport with status "ok"/"warn"/"fail" + formatted message for pings
- **Database dump engine** (`internal/backup/dbdump.go`):
- Auto-discovers PostgreSQL/MariaDB containers via `docker ps` + `docker inspect`
- Dumps via `docker exec pg_dump`/`mariadb-dump` with 5min timeout
- Atomic writes (`.tmp` → `.sql`), empty file detection, stale temp cleanup
- **Restic integration** (`internal/backup/restic.go`):
- Auto-generates repository password (32 random bytes, base64url)
- Init, snapshot (JSON output), prune, check, stats, latest snapshot
- Stale lock detection with automatic unlock + retry
- **Backup orchestrator** (`internal/backup/backup.go`):
- DB dumps + restic snapshots, weekly prune on Sundays
- Thread-safe running flag, Healthchecks.io pings with results
- `RunFullBackup()` for manual trigger (sequential: dumps → snapshot)
- **Wiring updates:**
- `main.go`: scheduler-based job registration, cpuCollector lifecycle, pinger + backupMgr init
- `api/router.go`: `GET /api/backup/status`, `POST /api/backup/run`
- `web/server.go` + `handlers.go`: pass cpuCollector to GetInfo(), backup status on dashboard
- `funcmap.go`: `tempColor`, `fmtTemp`, `fmtLoad` template functions
- **Dashboard UI enhancements:**
- CPU usage bar with load average display below
- Temperature with colored indicator dot (green/yellow/red at 60°/75°C)
- Backup status card: last run time, DB count, repo size/snapshots
- "Mentés most" button triggers manual backup via API
- **Config updates:**
- `controller.yaml.example`: added `system_health_interval`, `hdd_path`, `system.reserved_memory_mb`
- `docker-compose.yml`: added `/sys:/host/sys:ro` mount for temperature reading
- `restic_password_file` default changed to `data/` subdir (auto-generated in named volume)
- **Controller version:** v0.4.0 — deployed and verified on demo-felhom.eu
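
The temperature handling described above (millidegree values, highest zone wins, `/host/sys` tried before `/sys`) could look roughly like the sketch below. This is an illustration under stated assumptions, not the contents of `info_linux.go`: the function names `parseMilliC`, `maxZoneTempC`, `hostTempC` and `dotColor` are hypothetical, the hwmon fallback is omitted, and treating exactly 60°C as yellow and exactly 75°C as red is a guess about the boundary behaviour.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseMilliC converts the contents of a thermal zone "temp" file
// (millidegrees Celsius, e.g. "47000\n") to degrees Celsius.
func parseMilliC(raw string) (float64, error) {
	milli, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return 0, err
	}
	return float64(milli) / 1000, nil
}

// maxZoneTempC returns the hottest thermal zone found under sysRoot,
// or false if no zone could be read.
func maxZoneTempC(sysRoot string) (float64, bool) {
	pattern := filepath.Join(sysRoot, "class", "thermal", "thermal_zone*", "temp")
	paths, err := filepath.Glob(pattern)
	if err != nil {
		return 0, false
	}
	maxC, found := 0.0, false
	for _, p := range paths {
		raw, err := os.ReadFile(p)
		if err != nil {
			continue // unreadable zone: try the next one
		}
		c, err := parseMilliC(string(raw))
		if err != nil {
			continue
		}
		if !found || c > maxC {
			maxC, found = c, true
		}
	}
	return maxC, found
}

// hostTempC tries the Docker mount first and falls back to /sys.
func hostTempC() (float64, bool) {
	for _, root := range []string{"/host/sys", "/sys"} {
		if c, ok := maxZoneTempC(root); ok {
			return c, true
		}
	}
	return 0, false
}

// dotColor maps a temperature to an indicator color using the 60/75
// thresholds; the inclusive boundaries are an assumption.
func dotColor(c float64) string {
	switch {
	case c >= 75:
		return "red"
	case c >= 60:
		return "yellow"
	default:
		return "green"
	}
}

func main() {
	if c, ok := hostTempC(); ok {
		fmt.Printf("%.1f°C (%s)\n", c, dotColor(c))
	} else {
		fmt.Println("no thermal zone readable")
	}
}
```

On a machine without readable thermal zones the program only reports that nothing was found, which matches the idea of platform stubs returning no value.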
### What was previously completed (2026-02-15 session 9)
- **v0.3.0 — Structural refactoring (templates + server split + domain rename):**
- **Templates: go:embed migration** — moved all 7 HTML templates + CSS from Go string constants to individual files in `internal/web/templates/`. Created `embed.go` with `//go:embed` directive. Template loading now uses `ParseFS()` instead of `Parse()`. CSS served from embed.FS via `ReadFile()`. Zero runtime file dependencies — still compiled into the binary.
- **Server decomposition** — split monolithic `server.go` (540 lines) into focused files:
@@ -190,14 +242,15 @@ Last updated: 2026-02-15 (session 9)
7. Documentation: restart vs up -d for image updates
### What's next (priorities)
1. **Test orphan delete flow** — try deleting the orphaned filebrowser stack via the UI
2. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
3. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets) to validate all .felhom.yml files
4. Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
5. Test on Raspberry Pi (pi-customer-1)
6. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
7. Phase 2 continued: CPU/temperature metrics, Healthchecks.io pings
8. Phase 3: Backup system (DB dumps + restic)
1. **Configure Healthchecks.io UUIDs** on demo-felhom.eu (replace CHANGEME in controller.yaml)
2. **Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
3. **Test orphan delete flow** — try deleting the orphaned filebrowser stack via the UI
4. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
5. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets)
6. Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
7. Test on Raspberry Pi (pi-customer-1)
8. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
9. Phase 4: Self-update mechanism
## Architecture decisions
@@ -222,6 +275,12 @@ Last updated: 2026-02-15 (session 9)
| Orphan = deployed but not in catalog | Safe lifecycle: remove from catalog → mark orphaned → user deletes via UI |
| FileBrowser as infra (not catalog) | Needed even after apps deleted (user browses HDD data); deployed by setup script |
| Protected HDD paths | Safety net: never delete top-level HDD dirs (media, storage, Dokumentumok, appdata) |
| Central scheduler (not ad-hoc goroutines) | Single place to register/monitor all periodic tasks, graceful shutdown, skip-if-running |
| CPU sampling via background goroutine | /proc/stat delta needs two readings — collector runs every 5s, GetInfo() reads cached value |
| Temperature from /host/sys (Docker mount) | Container can't read host /sys directly — mount /sys:/host/sys:ro, try /host/sys first |
| Restic password auto-generated | No manual setup needed — generated on first backup run, stored in named volume |
| DB discovery via docker inspect | No config needed — discovers postgres/mariadb containers by image name + env vars |
| Backup orchestrator with running flag | Prevents concurrent backups, supports both scheduled and manual trigger |
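
The skip-if-running and panic-recovery behaviour listed for the central scheduler can be sketched as follows. This is a minimal illustration, assuming a per-job atomic flag; the `job` type and `runOnce` method are hypothetical names and do not reflect the actual API in `internal/scheduler/scheduler.go`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// job wraps a function with a running flag so overlapping runs are skipped.
// The type and its fields are hypothetical, for illustration only.
type job struct {
	name    string
	fn      func()
	running atomic.Bool
}

// runOnce executes the job unless a previous run is still in progress.
// It reports whether the function was actually started.
func (j *job) runOnce() (started bool) {
	if !j.running.CompareAndSwap(false, true) {
		return false // skip-if-running: another run holds the flag
	}
	// Deferred calls run in reverse order: recover first, then clear the flag.
	defer j.running.Store(false)
	defer func() {
		if r := recover(); r != nil {
			fmt.Printf("job %s panicked: %v\n", j.name, r)
		}
	}()
	started = true
	j.fn()
	return started
}

func main() {
	j := &job{name: "demo", fn: func() { panic("boom") }}
	// The panic is recovered, so the flag is cleared and the job can run again.
	fmt.Println(j.runOnce())
	fmt.Println(j.runOnce())
}
```

The same flag-based guard would also cover the backup orchestrator row: a manual trigger that arrives while a scheduled run is active is simply rejected instead of starting a second backup.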
## Key file locations on demo-felhom
+47 -17
@@ -24,7 +24,7 @@ controller generates secrets, saves app.yaml, runs `docker compose up -d`, and t
with Traefik routing and health checks. The dashboard correctly shows real-time container states
including health substatus (starting → healthy → running).
Current version: **v0.3.0**
Current version: **v0.4.0**
### What works
- Dashboard with live container state (green/orange/yellow/red)
@@ -47,6 +47,14 @@ Current version: **v0.3.0**
- Clickable app cards on dashboard and applications pages (navigate to info page)
- Memory bar with two-segment visualization on deploy page (committed vs new app allocation)
- Deployment progress UI: 3-step progress panel with real-time health polling (config → containers → health check)
- CPU usage bar with load average display (1/5/15 min)
- Temperature display with colored indicator dot (thermal zone reading)
- Central job scheduler replacing ad-hoc goroutines (periodic + daily jobs)
- Healthchecks.io-compatible system health pings with retry logic
- Database auto-discovery and dump (PostgreSQL/MariaDB via docker exec)
- Restic backup with auto-password generation, snapshot, prune, stats
- Backup status card on dashboard with manual "Mentés most" trigger button
- Backup API endpoints: status query and manual trigger
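
The atomic dump write mentioned in the backup features (write to a temporary file, reject empty output, then rename) might be sketched like this. It is an assumption-laden illustration: `promoteDump`, `errEmptyDump` and the `paperless.tmp` / `paperless.sql` file names are hypothetical, and the real `dbdump.go` streams `pg_dump` output rather than writing a byte slice.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errEmptyDump signals that the dump command produced no output.
var errEmptyDump = errors.New("dump file is empty")

// promoteDump renames tmpPath to finalPath only if the temp file is
// non-empty; an empty temp file is removed and reported as an error.
func promoteDump(tmpPath, finalPath string) error {
	info, err := os.Stat(tmpPath)
	if err != nil {
		return err
	}
	if info.Size() == 0 {
		_ = os.Remove(tmpPath)
		return errEmptyDump
	}
	// Rename is atomic on the same filesystem, so readers never see a partial dump.
	return os.Rename(tmpPath, finalPath)
}

// demo exercises both the success path and the empty-file path in a temp dir.
func demo() error {
	dir, err := os.MkdirTemp("", "dbdump-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	tmp := filepath.Join(dir, "paperless.tmp")
	final := filepath.Join(dir, "paperless.sql")

	if err := os.WriteFile(tmp, []byte("-- dump contents\n"), 0o600); err != nil {
		return err
	}
	if err := promoteDump(tmp, final); err != nil {
		return err
	}
	if _, err := os.Stat(final); err != nil {
		return err
	}

	if err := os.WriteFile(tmp, nil, 0o600); err != nil {
		return err
	}
	if err := promoteDump(tmp, final); !errors.Is(err, errEmptyDump) {
		return fmt.Errorf("expected empty-dump error, got %v", err)
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("demo ok")
}
```

Because the previous `.sql` file is only replaced by a successful rename, a failed or empty dump leaves the last good dump in place for the next restic snapshot.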
### Known issues / next priorities
- Cloudflare Tunnel + Traefik TLS: paperless.demo-felhom.eu works locally but shows "Not secure" (certificate chain not fully validated through tunnel)
@@ -101,10 +109,21 @@ controller/
│ ├── sync/
│ │ └── sync.go # Git sync: clone/pull app catalog, content-hash copy
│ ├── api/router.go # REST API endpoints
│ ├── scheduler/
│ │ └── scheduler.go # Central job scheduler (Every, Daily, skip-if-running)
│ ├── system/
│ │ ├── info.go # SystemInfo struct
│ │ ├── info_linux.go # Linux: /proc/meminfo + statfs
│ │ └── info_other.go # Non-Linux stub
│ │ ├── info_linux.go # Linux: /proc/meminfo + statfs + loadavg + temperature
│ │ ├── info_other.go # Non-Linux stub
│ │ ├── cpu_linux.go # CPU collector (background /proc/stat sampling)
│ │ └── cpu_other.go # CPU collector stub (non-Linux)
│ ├── monitor/
│ │ ├── pinger.go # Healthchecks.io HTTP ping client
│ │ └── healthcheck.go # System health checks (disk, mem, CPU, temp, Docker)
│ ├── backup/
│ │ ├── backup.go # Backup orchestrator (DB dumps + restic + prune)
│ │ ├── dbdump.go # Database auto-discovery + dump (pg_dump, mariadb-dump)
│ │ └── restic.go # Restic operations (init, snapshot, prune, stats)
│ └── web/
│ ├── server.go # HTTP server, routing, static file serving
│ ├── auth.go # Session auth, login/logout handlers
@@ -135,12 +154,12 @@ controller/
| **Config** | `internal/config/` | ✅ Done | Load & validate controller.yaml, env overrides |
| **Stacks** | `internal/stacks/` | ✅ Done | Compose operations, scanning, metadata, deploy flow |
| **API** | `internal/api/` | ✅ Done | REST endpoints (stacks, deploy, rescan, system info, health) |
| **System** | `internal/system/` | ✅ Done | System resource info (RAM, disk usage) for dashboard & API |
| **System** | `internal/system/` | ✅ Done | System resource info (RAM, disk, CPU, temperature, load) |
| **Web** | `internal/web/` | ✅ Done | Hungarian dashboard, auth, deploy pages, asset serving |
| **Sync** | `internal/sync/` | ✅ Done | Git-based app catalog sync (clone/pull, content-hash copy) |
| **Backup** | `internal/backup/` | 📲 Phase 3 | DB dumps, restic snapshots, restore |
| **Monitor** | `internal/monitor/` | 📲 Phase 2 | Health checks, Healthchecks pings, system metrics |
| **Scheduler** | `internal/scheduler/` | 📲 Phase 2 | Cron-like job runner for all periodic tasks |
| **Scheduler** | `internal/scheduler/` | ✅ Done | Central job scheduler (periodic + daily, skip-if-running) |
| **Monitor** | `internal/monitor/` | ✅ Done | Healthchecks.io pings, system health checks |
| **Backup** | `internal/backup/` | ✅ Done | DB auto-discovery + dump, restic snapshots, prune, manual trigger |
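
For the Monitor module, a Healthchecks.io-style ping with retries could be sketched as below. The `pinger` type, `pingURL` helper and the example base URL are hypothetical; interpreting "3 retries" as three attempts in total and detecting placeholders with a substring match on `CHANGEME` are both assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// pinger is a hypothetical client for Healthchecks.io-compatible endpoints.
type pinger struct {
	baseURL string
	client  *http.Client
	retries int           // total attempts (assumption)
	backoff time.Duration // pause between attempts
}

// pingURL builds /ping/{uuid}{suffix}; suffix is "", "/fail" or "/start".
// It returns false for unset or placeholder UUIDs so they can be skipped.
func pingURL(baseURL, uuid, suffix string) (string, bool) {
	if uuid == "" || strings.Contains(uuid, "CHANGEME") {
		return "", false
	}
	return strings.TrimRight(baseURL, "/") + "/ping/" + uuid + suffix, true
}

// send POSTs body to the ping endpoint, retrying on errors and non-2xx replies.
func (p *pinger) send(uuid, suffix, body string) error {
	url, ok := pingURL(p.baseURL, uuid, suffix)
	if !ok {
		return nil // placeholder UUID: nothing to ping
	}
	var lastErr error
	for attempt := 1; attempt <= p.retries; attempt++ {
		resp, err := p.client.Post(url, "text/plain", strings.NewReader(body))
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode < 300 {
				return nil
			}
			err = fmt.Errorf("unexpected status %s", resp.Status)
		}
		lastErr = err
		if attempt < p.retries {
			time.Sleep(p.backoff)
		}
	}
	return lastErr
}

func main() {
	p := &pinger{
		baseURL: "https://hc.example.invalid", // placeholder host, not a real endpoint
		client:  &http.Client{Timeout: 10 * time.Second},
		retries: 3,
		backoff: 2 * time.Second,
	}
	// A CHANGEME UUID is skipped, so no network request is made here.
	fmt.Println(p.send("CHANGEME-uuid", "/start", "backup starting"))
}
```

Skipping placeholder UUIDs keeps a freshly installed controller quiet until real check UUIDs are configured in `controller.yaml`.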
## Stack Management
@@ -352,7 +371,7 @@ docker compose up -d
| Node | Hardware | Domain | IP | Status |
|------|----------|--------|----|--------|
| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | ✅ Controller v0.2.11 + Paperless-ngx running |
| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | ✅ Controller v0.4.0 + Paperless-ngx running |
| pi-customer-1 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | — | 📲 Not yet tested |
### First deployment log (Paperless-ngx on demo-felhom)
@@ -385,7 +404,9 @@ docker compose up -d
| POST | `/api/stacks/{name}/optional-config` | Yes | Update optional config env vars |
| GET | `/api/stacks/{name}/logs` | Yes | Container logs (add `?raw=1` for plain text) |
| POST | `/api/stacks/rescan` | Yes | Trigger manual stack discovery |
| GET | `/api/system/info` | Yes | System resource usage (RAM, disk, HDD) |
| GET | `/api/system/info` | Yes | System resource usage (RAM, disk, CPU, temp, load) |
| GET | `/api/backup/status` | Yes | Backup status (last run, DB dump count, repo stats) |
| POST | `/api/backup/run` | Yes | Trigger manual backup (DB dumps + restic snapshot) |
## Status & Roadmap
@@ -412,20 +433,29 @@ docker compose up -d
- [x] Alphabetically sorted stack display
- [x] Deploy page doubles as read-only config viewer for deployed apps
### Phase 2 — Monitoring & Health
### Phase 2 — Monitoring & Health ✅ COMPLETE
- [x] System metrics on dashboard (RAM, SSD, HDD usage bars)
- [x] `/api/system/info` endpoint with live resource data
- [x] Pre-deploy memory validation (mem_request hard block, mem_limit soft warning)
- [x] Memory summary bar on deploy page
- [ ] CPU and temperature metrics
- [ ] Healthchecks.io ping integration
- [x] CPU usage collector (background /proc/stat sampling, 5s interval)
- [x] CPU usage bar on dashboard with load average display
- [x] Temperature reading from /sys/class/thermal (with /host/sys Docker mount)
- [x] Temperature display with colored indicator dot (green/yellow/red)
- [x] Central job scheduler (replaces ad-hoc goroutines)
- [x] Healthchecks.io-compatible HTTP pinger with retry logic
- [x] System health checks (disk, memory, CPU, temp, Docker, protected containers)
- [ ] Customer notifications (email/Telegram)
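
The delta-based CPU percentage from the collector item above needs two `/proc/stat` readings, which is why a background sampler is used. A minimal sketch of the calculation, assuming idle time is counted as `idle + iowait` and that only the first eight counters are summed (guest time is already included in user/nice); `cpuSample`, `parseCPULine` and `usagePercent` are hypothetical names:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuSample holds cumulative jiffies parsed from the aggregate "cpu" line.
type cpuSample struct {
	idle, total uint64
}

// parseCPULine parses the first line of /proc/stat:
// cpu user nice system idle iowait irq softirq steal guest guest_nice
func parseCPULine(line string) (cpuSample, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || fields[0] != "cpu" {
		return cpuSample{}, fmt.Errorf("unexpected /proc/stat line: %q", line)
	}
	var s cpuSample
	for i, f := range fields[1:] {
		if i >= 8 { // guest and guest_nice are already part of user/nice
			break
		}
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return cpuSample{}, err
		}
		s.total += v
		if i == 3 || i == 4 { // idle and iowait
			s.idle += v
		}
	}
	return s, nil
}

// usagePercent returns the busy share of the interval between two samples.
func usagePercent(prev, cur cpuSample) float64 {
	if cur.total <= prev.total || cur.idle < prev.idle {
		return 0 // no elapsed time or counter reset
	}
	totalDelta := cur.total - prev.total
	idleDelta := cur.idle - prev.idle
	if idleDelta > totalDelta {
		return 0
	}
	return 100 * float64(totalDelta-idleDelta) / float64(totalDelta)
}

func main() {
	// Two hard-coded samples stand in for readings taken 5s apart.
	prev, _ := parseCPULine("cpu 100 0 100 700 100 0 0 0 0 0")
	cur, _ := parseCPULine("cpu 200 0 150 1000 150 0 0 0 0 0")
	fmt.Printf("%.1f%%\n", usagePercent(prev, cur))
}
```

A collector goroutine would keep the previous sample, read a new one on each tick, and store the result so that request handlers only read a cached value.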
### Phase 3 — Backups
- [ ] DB dump engine (PostgreSQL, MariaDB/MySQL, SQLite)
- [ ] Restic integration (snapshot, prune, check)
- [ ] Backup status on dashboard
- [ ] Manual backup trigger from UI
### Phase 3 — Backups ✅ COMPLETE
- [x] DB auto-discovery (PostgreSQL/MariaDB containers via docker inspect)
- [x] DB dump engine (pg_dump/mariadb-dump via docker exec, atomic writes)
- [x] Restic integration (auto-init, snapshot, prune, check, stats)
- [x] Restic password auto-generation (no manual setup needed)
- [x] Backup orchestrator (DB dumps + restic + weekly prune)
- [x] Backup status on dashboard (last run, DB count, repo stats)
- [x] Manual backup trigger from UI ("Mentés most" button)
- [x] `GET /api/backup/status` and `POST /api/backup/run` endpoints
- [ ] Restore workflow
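
The restic password auto-generation (32 random bytes, base64url) can be illustrated with the standard library alone. This is a sketch, not the code in `restic.go`: `generatePassword` is a hypothetical name, and using the unpadded `RawURLEncoding` variant is an assumption, since the source only says "base64url".

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// generatePassword returns 32 cryptographically random bytes encoded as
// unpadded base64url (padding choice is an assumption).
func generatePassword() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

func main() {
	pw, err := generatePassword()
	if err != nil {
		fmt.Println("could not generate password:", err)
		return
	}
	// Only the length is printed; the secret itself should go to a 0600 file.
	fmt.Println("generated password length:", len(pw))
}
```

Since the repository cannot be opened without this password, the generated file needs to be part of any restore procedure, which is worth keeping in mind for the open "Restore workflow" item.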
### Phase 4 — Git Sync & Updates