docs: update CONTEXT.md and README.md for v0.4.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CONTEXT.md (+71, -12)
```diff
@@ -7,7 +7,7 @@
 >
 > Ask Claude Code: "Please update CONTEXT.md with what we did today"
 
-Last updated: 2026-02-15 (session 9)
+Last updated: 2026-02-15 (session 10)
 
 ---
 
```
```diff
@@ -22,13 +22,65 @@ Last updated: 2026-02-15 (session 9)
 ## Current project state
 
 ### felhom-controller (this repo)
-- **Version:** v0.3.0
+- **Version:** v0.4.0
 - **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
+- **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
+- **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger)
 - **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
 - **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
-- **All Phase 1 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth
+- **All Phase 1-3 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups
 
-### What was just completed (2026-02-15 session 9)
+### What was just completed (2026-02-15 session 10)
+- **v0.4.0 — Monitoring & Health + Backups (Phase 2 & 3):**
+  - **Central job scheduler** (`internal/scheduler/scheduler.go`):
+    - Replaces ad-hoc goroutines in main.go with a unified scheduler
+    - `Every(name, interval, fn)` for periodic jobs, `Daily(name, timeStr, fn)` for scheduled tasks
+    - Panic recovery, skip-if-running, quiet mode for high-frequency jobs (≤30s)
+    - Daily jobs use `Europe/Budapest` timezone with `time.Timer` for DST correctness
+    - Graceful shutdown with 30s timeout for running jobs
+  - **CPU usage collector** (`internal/system/cpu_linux.go`):
+    - Background goroutine samples `/proc/stat` every 5s, computes delta-based CPU %
+    - Platform stubs for non-Linux in `cpu_other.go`
+  - **Temperature & load metrics** (`internal/system/info_linux.go`):
+    - Reads `/proc/loadavg` for 1/5/15 min load averages
+    - Reads thermal zones from `/host/sys/class/thermal/` (Docker mount) with `/sys/` fallback
+    - Handles millidegree values, picks highest zone, with hwmon fallback
+  - **Healthchecks.io pinger** (`internal/monitor/pinger.go`):
+    - HTTP ping client for Healthchecks.io-compatible endpoints
+    - POST to `/ping/{uuid}` (success), `/fail` (failure), `/start` (started)
+    - 10s timeout, 3 retries with 2s backoff, skips CHANGEME UUIDs
+  - **System health checks** (`internal/monitor/healthcheck.go`):
+    - Checks disk, memory, CPU, temperature, Docker reachability, protected containers
+    - Returns HealthReport with status "ok"/"warn"/"fail" + formatted message for pings
+  - **Database dump engine** (`internal/backup/dbdump.go`):
+    - Auto-discovers PostgreSQL/MariaDB containers via `docker ps` + `docker inspect`
+    - Dumps via `docker exec pg_dump`/`mariadb-dump` with 5min timeout
+    - Atomic writes (`.tmp` → `.sql`), empty file detection, stale temp cleanup
+  - **Restic integration** (`internal/backup/restic.go`):
+    - Auto-generates repository password (32 random bytes, base64url)
+    - Init, snapshot (JSON output), prune, check, stats, latest snapshot
+    - Stale lock detection with automatic unlock + retry
+  - **Backup orchestrator** (`internal/backup/backup.go`):
+    - DB dumps + restic snapshots, weekly prune on Sundays
+    - Thread-safe running flag, Healthchecks.io pings with results
+    - `RunFullBackup()` for manual trigger (sequential: dumps → snapshot)
+  - **Wiring updates:**
+    - `main.go`: scheduler-based job registration, cpuCollector lifecycle, pinger + backupMgr init
+    - `api/router.go`: `GET /api/backup/status`, `POST /api/backup/run`
+    - `web/server.go` + `handlers.go`: pass cpuCollector to GetInfo(), backup status on dashboard
+    - `funcmap.go`: `tempColor`, `fmtTemp`, `fmtLoad` template functions
+  - **Dashboard UI enhancements:**
+    - CPU usage bar with load average display below
+    - Temperature with colored indicator dot (green/yellow/red at 60°/75°C)
+    - Backup status card: last run time, DB count, repo size/snapshots
+    - "Mentés most" button triggers manual backup via API
+  - **Config updates:**
+    - `controller.yaml.example`: added `system_health_interval`, `hdd_path`, `system.reserved_memory_mb`
+    - `docker-compose.yml`: added `/sys:/host/sys:ro` mount for temperature reading
+    - `restic_password_file` default changed to `data/` subdir (auto-generated in named volume)
+  - **Controller version:** v0.4.0 — deployed and verified on demo-felhom.eu
+
+### What was previously completed (2026-02-15 session 9)
 - **v0.3.0 — Structural refactoring (templates + server split + domain rename):**
   - **Templates: go:embed migration** — moved all 7 HTML templates + CSS from Go string constants to individual files in `internal/web/templates/`. Created `embed.go` with `//go:embed` directive. Template loading now uses `ParseFS()` instead of `Parse()`. CSS served from embed.FS via `ReadFile()`. Zero runtime file dependencies — still compiled into the binary.
   - **Server decomposition** — split monolithic `server.go` (540 lines) into focused files:
```
```diff
@@ -190,14 +242,15 @@ Last updated: 2026-02-15 (session 9)
 7. Documentation: restart vs up -d for image updates
 
 ### What's next (priorities)
-1. **Test orphan delete flow** — try deleting the orphaned filebrowser stack via the UI
-2. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
-3. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets) to validate all .felhom.yml files
-4. Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
-5. Test on Raspberry Pi (pi-customer-1)
-6. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
-7. Phase 2 continued: CPU/temperature metrics, Healthchecks.io pings
-8. Phase 3: Backup system (DB dumps + restic)
+1. **Configure Healthchecks.io UUIDs** on demo-felhom.eu (replace CHANGEME in controller.yaml)
+2. **Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
+3. **Test orphan delete flow** — try deleting the orphaned filebrowser stack via the UI
+4. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
+5. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets)
+6. Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
+7. Test on Raspberry Pi (pi-customer-1)
+8. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
+9. Phase 4: Self-update mechanism
 
 ## Architecture decisions
 
```
```diff
@@ -222,6 +275,12 @@ Last updated: 2026-02-15 (session 9)
 | Orphan = deployed but not in catalog | Safe lifecycle: remove from catalog → mark orphaned → user deletes via UI |
 | FileBrowser as infra (not catalog) | Needed even after apps deleted (user browses HDD data); deployed by setup script |
 | Protected HDD paths | Safety net: never delete top-level HDD dirs (media, storage, Dokumentumok, appdata) |
+| Central scheduler (not ad-hoc goroutines) | Single place to register/monitor all periodic tasks, graceful shutdown, skip-if-running |
+| CPU sampling via background goroutine | /proc/stat delta needs two readings — collector runs every 5s, GetInfo() reads cached value |
+| Temperature from /host/sys (Docker mount) | Container can't read host /sys directly — mount /sys:/host/sys:ro, try /host/sys first |
+| Restic password auto-generated | No manual setup needed — generated on first backup run, stored in named volume |
+| DB discovery via docker inspect | No config needed — discovers postgres/mariadb containers by image name + env vars |
+| Backup orchestrator with running flag | Prevents concurrent backups, supports both scheduled and manual trigger |
 
 ## Key file locations on demo-felhom
 
```
README.md (+47, -17)
```diff
@@ -24,7 +24,7 @@ controller generates secrets, saves app.yaml, runs `docker compose up -d`, and t
 with Traefik routing and health checks. The dashboard correctly shows real-time container states
 including health substatus (starting → healthy → running).
 
-Current version: **v0.3.0**
+Current version: **v0.4.0**
 
 ### What works
 - Dashboard with live container state (green/orange/yellow/red)
```
```diff
@@ -47,6 +47,14 @@ Current version: **v0.3.0**
 - Clickable app cards on dashboard and applications pages (navigate to info page)
 - Memory bar with two-segment visualization on deploy page (committed vs new app allocation)
 - Deployment progress UI: 3-step progress panel with real-time health polling (config → containers → health check)
+- CPU usage bar with load average display (1/5/15 min)
+- Temperature display with colored indicator dot (thermal zone reading)
+- Central job scheduler replacing ad-hoc goroutines (periodic + daily jobs)
+- Healthchecks.io-compatible system health pings with retry logic
+- Database auto-discovery and dump (PostgreSQL/MariaDB via docker exec)
+- Restic backup with auto-password generation, snapshot, prune, stats
+- Backup status card on dashboard with manual "Mentés most" trigger button
+- Backup API endpoints: status query and manual trigger
 
 ### Known issues / next priorities
 - Cloudflare Tunnel + Traefik TLS: paperless.demo-felhom.eu works locally but shows "Not secure" (certificate chain not fully validated through tunnel)
```
```diff
@@ -101,10 +109,21 @@ controller/
 │ ├── sync/
 │ │ └── sync.go # Git sync: clone/pull app catalog, content-hash copy
 │ ├── api/router.go # REST API endpoints
+│ ├── scheduler/
+│ │ └── scheduler.go # Central job scheduler (Every, Daily, skip-if-running)
 │ ├── system/
 │ │ ├── info.go # SystemInfo struct
-│ │ ├── info_linux.go # Linux: /proc/meminfo + statfs
-│ │ └── info_other.go # Non-Linux stub
+│ │ ├── info_linux.go # Linux: /proc/meminfo + statfs + loadavg + temperature
+│ │ ├── info_other.go # Non-Linux stub
+│ │ ├── cpu_linux.go # CPU collector (background /proc/stat sampling)
+│ │ └── cpu_other.go # CPU collector stub (non-Linux)
+│ ├── monitor/
+│ │ ├── pinger.go # Healthchecks.io HTTP ping client
+│ │ └── healthcheck.go # System health checks (disk, mem, CPU, temp, Docker)
+│ ├── backup/
+│ │ ├── backup.go # Backup orchestrator (DB dumps + restic + prune)
+│ │ ├── dbdump.go # Database auto-discovery + dump (pg_dump, mariadb-dump)
+│ │ └── restic.go # Restic operations (init, snapshot, prune, stats)
 │ └── web/
 │ ├── server.go # HTTP server, routing, static file serving
 │ ├── auth.go # Session auth, login/logout handlers
```
```diff
@@ -135,12 +154,12 @@ controller/
 | **Config** | `internal/config/` | ✅ Done | Load & validate controller.yaml, env overrides |
 | **Stacks** | `internal/stacks/` | ✅ Done | Compose operations, scanning, metadata, deploy flow |
 | **API** | `internal/api/` | ✅ Done | REST endpoints (stacks, deploy, rescan, system info, health) |
-| **System** | `internal/system/` | ✅ Done | System resource info (RAM, disk usage) for dashboard & API |
+| **System** | `internal/system/` | ✅ Done | System resource info (RAM, disk, CPU, temperature, load) |
 | **Web** | `internal/web/` | ✅ Done | Hungarian dashboard, auth, deploy pages, asset serving |
 | **Sync** | `internal/sync/` | ✅ Done | Git-based app catalog sync (clone/pull, content-hash copy) |
-| **Backup** | `internal/backup/` | 📲 Phase 3 | DB dumps, restic snapshots, restore |
-| **Monitor** | `internal/monitor/` | 📲 Phase 2 | Health checks, Healthchecks pings, system metrics |
-| **Scheduler** | `internal/scheduler/` | 📲 Phase 2 | Cron-like job runner for all periodic tasks |
+| **Scheduler** | `internal/scheduler/` | ✅ Done | Central job scheduler (periodic + daily, skip-if-running) |
+| **Monitor** | `internal/monitor/` | ✅ Done | Healthchecks.io pings, system health checks |
+| **Backup** | `internal/backup/` | ✅ Done | DB auto-discovery + dump, restic snapshots, prune, manual trigger |
 
 ## Stack Management
 
```
```diff
@@ -352,7 +371,7 @@ docker compose up -d
 
 | Node | Hardware | Domain | IP | Status |
 |------|----------|--------|----|--------|
-| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | ✅ Controller v0.2.11 + Paperless-ngx running |
+| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | ✅ Controller v0.4.0 + Paperless-ngx running |
 | pi-customer-1 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | — | 📲 Not yet tested |
 
 ### First deployment log (Paperless-ngx on demo-felhom)
```
```diff
@@ -385,7 +404,9 @@ docker compose up -d
 | POST | `/api/stacks/{name}/optional-config` | Yes | Update optional config env vars |
 | GET | `/api/stacks/{name}/logs` | Yes | Container logs (add `?raw=1` for plain text) |
 | POST | `/api/stacks/rescan` | Yes | Trigger manual stack discovery |
-| GET | `/api/system/info` | Yes | System resource usage (RAM, disk, HDD) |
+| GET | `/api/system/info` | Yes | System resource usage (RAM, disk, CPU, temp, load) |
+| GET | `/api/backup/status` | Yes | Backup status (last run, DB dump count, repo stats) |
+| POST | `/api/backup/run` | Yes | Trigger manual backup (DB dumps + restic snapshot) |
 
 ## Status & Roadmap
 
```
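The two backup endpoints share the orchestrator's running flag, so a scheduled run and a click on "Mentés most" ("Back up now") cannot overlap. A sketch under assumptions: the `Manager` type, the `BackupStatus` JSON fields and the 202/409 responses are illustrative, not the controller's actual handlers in `api/router.go`.

```go
package main

import (
	"encoding/json"
	"net/http"
	"sync"
	"time"
)

// BackupStatus is a hypothetical JSON shape for GET /api/backup/status.
type BackupStatus struct {
	Running   bool      `json:"running"`
	LastRun   time.Time `json:"last_run"`
	LastError string    `json:"last_error,omitempty"`
	DBDumps   int       `json:"db_dumps"`
}

// Manager owns the running flag shared by scheduled and manual backups.
type Manager struct {
	mu     sync.Mutex
	status BackupStatus
	run    func() (dbDumps int, err error) // hypothetical: DB dumps, then restic snapshot
}

// TryStart claims the running flag; false means a backup is already in progress.
func (m *Manager) TryStart() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.status.Running {
		return false
	}
	m.status.Running = true
	return true
}

// finish records the outcome and releases the running flag.
func (m *Manager) finish(dbDumps int, err error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.status.Running = false
	m.status.LastRun = time.Now()
	m.status.DBDumps = dbDumps
	m.status.LastError = ""
	if err != nil {
		m.status.LastError = err.Error()
	}
}

// Status returns a copy that is safe to serialize outside the lock.
func (m *Manager) Status() BackupStatus {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.status
}

// handleRun serves POST /api/backup/run: 202 when a backup starts in the
// background, 409 when one is already running.
func (m *Manager) handleRun(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	if !m.TryStart() {
		http.Error(w, "backup already running", http.StatusConflict)
		return
	}
	go func() { m.finish(m.run()) }()
	w.WriteHeader(http.StatusAccepted)
}

// handleStatus serves GET /api/backup/status as JSON.
func (m *Manager) handleStatus(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(m.Status()) // a write error means the client went away
}
```

The handlers could be mounted with `mux.HandleFunc("/api/backup/run", m.handleRun)` behind the existing session auth. Answering 202 right away keeps the dashboard responsive while the dumps and the restic snapshot run, and the status endpoint reports the outcome.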
```diff
@@ -412,20 +433,29 @@ docker compose up -d
 - [x] Alphabetically sorted stack display
 - [x] Deploy page doubles as read-only config viewer for deployed apps
 
-### Phase 2 — Monitoring & Health
+### Phase 2 — Monitoring & Health ✅ COMPLETE
 - [x] System metrics on dashboard (RAM, SSD, HDD usage bars)
 - [x] `/api/system/info` endpoint with live resource data
 - [x] Pre-deploy memory validation (mem_request hard block, mem_limit soft warning)
 - [x] Memory summary bar on deploy page
-- [ ] CPU and temperature metrics
-- [ ] Healthchecks.io ping integration
+- [x] CPU usage collector (background /proc/stat sampling, 5s interval)
+- [x] CPU usage bar on dashboard with load average display
+- [x] Temperature reading from /sys/class/thermal (with /host/sys Docker mount)
+- [x] Temperature display with colored indicator dot (green/yellow/red)
+- [x] Central job scheduler (replaces ad-hoc goroutines)
+- [x] Healthchecks.io-compatible HTTP pinger with retry logic
+- [x] System health checks (disk, memory, CPU, temp, Docker, protected containers)
 - [ ] Customer notifications (email/Telegram)
 
-### Phase 3 — Backups
-- [ ] DB dump engine (PostgreSQL, MariaDB/MySQL, SQLite)
-- [ ] Restic integration (snapshot, prune, check)
-- [ ] Backup status on dashboard
-- [ ] Manual backup trigger from UI
+### Phase 3 — Backups ✅ COMPLETE
+- [x] DB auto-discovery (PostgreSQL/MariaDB containers via docker inspect)
+- [x] DB dump engine (pg_dump/mariadb-dump via docker exec, atomic writes)
+- [x] Restic integration (auto-init, snapshot, prune, check, stats)
+- [x] Restic password auto-generation (no manual setup needed)
+- [x] Backup orchestrator (DB dumps + restic + weekly prune)
+- [x] Backup status on dashboard (last run, DB count, repo stats)
+- [x] Manual backup trigger from UI ("Mentés most" button)
+- [x] `GET /api/backup/status` and `POST /api/backup/run` endpoints
 - [ ] Restore workflow
 
 ### Phase 4 — Git Sync & Updates
```