docs: update CONTEXT.md and README for v0.6.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:25:05 +01:00
parent 97074e7a0c
commit 8a1b9e57ae
2 changed files with 80 additions and 22 deletions
+47 -10
@@ -7,7 +7,7 @@
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"
-Last updated: 2026-02-16 (session 17)
+Last updated: 2026-02-16 (session 18)
---
@@ -22,7 +22,7 @@ Last updated: 2026-02-16 (session 17)
## Current project state
### felhom-controller (this repo)
-- **Version:** v0.5.4
+- **Version:** v0.6.0
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
- **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger, **dedicated backup page**)
@@ -31,7 +31,40 @@ Last updated: 2026-02-16 (session 17)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1-4 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups, backup detail page, system monitoring page
-### What was just completed (2026-02-16 session 17)
+### What was just completed (2026-02-16 session 18)
- **v0.6.0 — Healthcheck Implementation + Central Push + Hub Dashboard:**
- **Part 1 — Healthcheck enhancements (controller-side):**
- Added `heartbeat` ping — lightweight "I'm alive" signal every 5 min (no logic, just ping)
- Added `backup_integrity` ping — weekly `restic check` on Sunday 04:00, pings healthchecks with result
- Added `Heartbeat` and `BackupIntegrity` fields to `PingUUIDsConfig`
- Added `RunIntegrityCheck()` to backup Manager (calls restic Check(), updates lastCheckTime/lastCheckOK, pings)
- Updated `controller.yaml.example` with new monitoring ping_uuids
- Created `monitoring/DEPRECATED.md` for legacy bash monitoring scripts
- **Part 2 — Central hub reporting (controller-side):**
- New `internal/report/` package: types.go (Report struct), builder.go (BuildReport), pusher.go (HTTP push)
- Report builder gathers data from all subsystems: system info (via metrics.GetStaticInfo + system.GetInfo), container stats (via metricsStore.QueryContainerSummary), backup status (via backupMgr.GetFullStatus), health (via monitor.RunHealthCheck), stacks (via stackMgr.GetStacks)
- Report pusher: POST JSON to hub with Bearer token auth, 3 retries with 5s backoff, never fails caller
- Added `HubConfig` to config.go (enabled, url, api_key, push_interval)
- Wired hub reporting into scheduler (configurable interval, default 15m)
- Hub reporting disabled by default (hub.enabled: false)
- **Part 3 — Hub service (felhom.eu repo, new `hub/` subfolder):**
- Full Go service: `cmd/hub/main.go`, `internal/api/handler.go`, `internal/store/store.go`, `internal/web/server.go`
- SQLite store with WAL mode, auto-migration, denormalized fields for fast queries
- REST API: POST /api/v1/report (Bearer token auth), GET /api/v1/customers, GET /api/v1/customers/{id}, GET /api/v1/customers/{id}/history
- Dark theme dashboard (English): multi-customer overview table with status indicators, customer detail page with system/storage/containers/backup/health sections
- Color coding: green (OK, <30min), yellow (warn or 30-60min), red (fail or >60min)
- K8s manifest: Deployment + Service + Ingress for hub.felhom.eu in felhom-system namespace
- Dockerfile, Makefile, hub.yaml.example config
- 90-day report retention with daily auto-prune
- **Controller version:** v0.6.0 — deployed and verified on demo-felhom.eu (9 scheduler jobs, all new jobs registered)
- **Manual steps remaining for Viktor (Part 4 of TASK.md):**
- Create 5 healthcheck checks on status.felhom.eu (heartbeat, system-health, db-dump, backup, backup-integrity)
- Update controller.yaml on demo-felhom with real UUIDs
- Build and deploy felhom-hub to k3s cluster
- Configure hub.felhom.eu DNS in Cloudflare
- Enable hub reporting on demo-felhom controller.yaml
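The pusher behavior listed under Part 2 (POST JSON with Bearer token auth, 3 retries with 5s backoff, never fails the caller) can be sketched roughly as follows. This is a minimal illustration, not the code in `internal/report/pusher.go`: the `Report` fields, the `Pusher` struct and its field names, and the `demo` helper are all hypothetical, and the demo uses a 10ms backoff against a local test server instead of the real hub.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// Report is a hypothetical, trimmed-down stand-in for the real report struct.
type Report struct {
	CustomerID string `json:"customer_id"`
	Version    string `json:"version"`
}

// Pusher holds the hub settings; the field names are illustrative only.
type Pusher struct {
	URL      string
	APIKey   string
	Attempts int           // 3 in the notes above
	Backoff  time.Duration // 5s in the notes above
	Client   *http.Client
}

// pushOnce performs one POST and treats any non-2xx status as an error.
func (p *Pusher) pushOnce(body []byte) error {
	req, err := http.NewRequest(http.MethodPost, p.URL, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+p.APIKey)
	resp, err := p.Client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("hub returned %s", resp.Status)
	}
	return nil
}

// Push retries with a fixed backoff and only logs failures, so a hub outage
// cannot fail the scheduler job that called it. Success is reported as a bool.
func (p *Pusher) Push(r Report) bool {
	body, err := json.Marshal(r)
	if err != nil {
		log.Printf("report: marshal failed: %v", err)
		return false
	}
	for attempt := 1; attempt <= p.Attempts; attempt++ {
		if err = p.pushOnce(body); err == nil {
			return true
		}
		log.Printf("report: push attempt %d/%d failed: %v", attempt, p.Attempts, err)
		if attempt < p.Attempts {
			time.Sleep(p.Backoff)
		}
	}
	return false
}

// demo runs Push against a local test server that rejects the first two
// requests, and returns the outcome plus the number of requests received.
func demo() (bool, int) {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n := atomic.AddInt32(&calls, 1)
		if r.Header.Get("Authorization") != "Bearer test-key" || n < 3 {
			http.Error(w, "try again", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	p := &Pusher{URL: srv.URL, APIKey: "test-key", Attempts: 3,
		Backoff: 10 * time.Millisecond, Client: srv.Client()}
	ok := p.Push(Report{CustomerID: "demo-felhom", Version: "v0.6.0"})
	return ok, int(atomic.LoadInt32(&calls))
}

func main() {
	ok, calls := demo()
	fmt.Println(ok, calls)
}
```

Returning a bool rather than an error is one way to express "never fails caller": the scheduler job can log the outcome without the push ever aborting other work.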
### What was previously completed (2026-02-16 session 17)
- **v0.5.4 — Monitoring Page Frontend Fixes (4 bugs, frontend-only):**
- **Bug 1: Tooltip "Invalid Date"** — `items[0].parsed.x` unreliable across Chart.js versions. Fixed tooltip callback to use `items[0].raw.x` (direct {x,y} data access) with `parsed.x` as fallback.
- **Bug 2: Charts fill full width regardless of data density** — `setChartXBounds()` setting `min/max` at runtime was ignored because the scale was created without them. Fixed by including `min: now - defaultRangeMs, max: now` in the initial `chartOpts()` options. Now "7 nap" shows full 7-day x-axis with data clustered on the right.
@@ -336,15 +369,19 @@ Last updated: 2026-02-16 (session 17)
7. Documentation: restart vs up -d for image updates
### What's next (priorities)
-1. **Configure Healthchecks.io UUIDs** on demo-felhom.eu (replace CHANGEME in controller.yaml)
+1. **Manual steps for v0.6.0** — Viktor needs to:
- Create 5 healthcheck checks on status.felhom.eu with correct periods/grace
- Update controller.yaml on demo-felhom with real UUIDs
- Build + deploy felhom-hub to k3s (`cd hub && make docker-push`, `kubectl apply -f manifests/hub.yaml`)
- Configure hub.felhom.eu DNS in Cloudflare
- Enable hub reporting on demo-felhom (`hub.enabled: true`, `hub.api_key: <key>`)
2. **Test backup flow** — trigger manual backup via dashboard, verify restic repo + DB dumps
-3. **Test orphan delete flow** — try deleting the orphaned filebrowser stack via the UI
+3. **Test backup integrity check** — wait for Sunday 04:00 or manually trigger
4. Add `app_info` + `optional_config` to more apps (start with Immich, Mealie, Vaultwarden)
5. Deploy a second app (e.g., ActualBudget — simplest, or Immich — tests HDD + secrets)
-6. Add app screenshots to the asset pipeline (romm-screenshot-1.webp etc.)
+6. Test on Raspberry Pi (pi-customer-1)
-7. Test on Raspberry Pi (pi-customer-1)
+7. Phase 4: Self-update mechanism
-8. Add `paths.hdd_path` to demo-felhom controller.yaml to enable HDD bar
+8. v0.6.1: Hub alerting (webhook to Healthchecks for stale customers)
-9. Phase 4: Self-update mechanism
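The hub dashboard's color rule from the v0.6.0 notes (green for OK and under 30 min, yellow for warn or 30-60 min, red for fail or over 60 min) is also a natural basis for deciding which customers count as stale. A minimal sketch of that rule is shown here, assuming the reported health arrives as one of the strings `"ok"`, `"warn"` or `"fail"`; those values and the `statusColor` name are assumptions, not taken from the hub source.

```go
package main

import (
	"fmt"
	"time"
)

// statusColor maps a customer's last reported health and the age of that
// report to a dashboard color. The worse of the two signals wins.
// The health strings "ok", "warn" and "fail" are assumed values.
func statusColor(health string, age time.Duration) string {
	switch {
	case health == "fail" || age > 60*time.Minute:
		return "red"
	case health == "warn" || age >= 30*time.Minute:
		return "yellow"
	default:
		return "green"
	}
}

func main() {
	fmt.Println(statusColor("ok", 10*time.Minute))  // green
	fmt.Println(statusColor("ok", 45*time.Minute))  // yellow
	fmt.Println(statusColor("warn", 5*time.Minute)) // yellow
	fmt.Println(statusColor("ok", 2*time.Hour))     // red
}
```

With the default 15-minute push interval, a customer turns yellow after missing roughly two consecutive reports and red after about four.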
## Architecture decisions
@@ -411,7 +448,7 @@ Last updated: 2026-02-16 (session 17)
|------------|--------|-------|
| deploy-felhom-compose | Active | This repo. Controller code + deploy scripts |
| app-catalog-felhom.eu | Active | 10 app templates, all with .felhom.yml metadata + memory limits |
-| felhom.eu | Stable | Website live, SEO indexed, email working |
+| felhom.eu | Active | Website + hub/ subfolder (felhom-hub service) + k8s manifests |
| homelab-manifests | Stable | k3s cluster running (dooplex.hu services) |
| misc-scripts | Utility | collect-repo.sh, backup helpers |
+33 -12
@@ -24,7 +24,7 @@ controller generates secrets, saves app.yaml, runs `docker compose up -d`, and t
with Traefik routing and health checks. The dashboard correctly shows real-time container states
including health substatus (starting → healthy → running).
-Current version: **v0.4.0**
+Current version: **v0.6.0**
### What works
- Dashboard with live container state (green/orange/yellow/red)
@@ -55,6 +55,10 @@ Current version: **v0.4.0**
- Restic backup with auto-password generation, snapshot, prune, stats
- Backup status card on dashboard with manual "Mentés most" trigger button
- Backup API endpoints: status query and manual trigger
- SQLite metrics store (system + container metrics, 60s collection, 30-day retention)
- Heartbeat ping (5-minute "I'm alive" signal to Healthchecks)
- Weekly backup integrity check (restic check, Sunday 04:00)
- Central hub reporting (periodic JSON push to felhom-hub service)
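For the weekly integrity check, the scheduler has to work out when the next "Sunday 04:00" falls. One way to compute that is sketched below; `nextWeekly` and `nextCheckString` are hypothetical helpers written for this illustration, and the demo uses UTC, whereas the real scheduler in `internal/scheduler/` may use local time and a different approach entirely.

```go
package main

import (
	"fmt"
	"time"
)

// nextWeekly returns the first occurrence of the given weekday and time of
// day that is strictly after now, in now's time zone.
func nextWeekly(now time.Time, day time.Weekday, hour, minute int) time.Time {
	candidate := time.Date(now.Year(), now.Month(), now.Day(), hour, minute, 0, 0, now.Location())
	// Days until the target weekday, 0 if it is today.
	daysAhead := (int(day) - int(now.Weekday()) + 7) % 7
	candidate = candidate.AddDate(0, 0, daysAhead)
	// If today's slot has already passed (or is right now), use next week.
	if !candidate.After(now) {
		candidate = candidate.AddDate(0, 0, 7)
	}
	return candidate
}

// nextCheckString is a demo helper: it builds a UTC timestamp from plain
// integers and formats the next Sunday 04:00 after it.
func nextCheckString(year, month, day, hour, minute int) string {
	now := time.Date(year, time.Month(month), day, hour, minute, 0, 0, time.UTC)
	return nextWeekly(now, time.Sunday, 4, 0).Format("2006-01-02 15:04")
}

func main() {
	// Monday 2026-02-16 13:25, the timestamp of this commit.
	fmt.Println(nextCheckString(2026, 2, 16, 13, 25)) // 2026-02-22 04:00
	// Sunday 04:00 exactly: the slot is not in the future, so it rolls over.
	fmt.Println(nextCheckString(2026, 2, 22, 4, 0)) // 2026-03-01 04:00
}
```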
### Known issues / next priorities
- Cloudflare Tunnel + Traefik TLS: paperless.demo-felhom.eu works locally but shows "Not secure" (certificate chain not fully validated through tunnel)
@@ -87,10 +91,10 @@ Current version: **v0.4.0**
│ │ └──────────┘ │ │
│ └────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
-│ pings │ git pull
+│ pings │ JSON push │ git pull
▼ ▼
-status.felhom.eu gitea.dooplex.hu
+status.felhom.eu hub.felhom.eu gitea.dooplex.hu
-(Healthchecks on k3s) (stack definitions)
+(Healthchecks) (central dashboard) (stack definitions)
```
## Repository Layout
@@ -121,9 +125,18 @@ controller/
│ │ ├── pinger.go # Healthchecks.io HTTP ping client
│ │ └── healthcheck.go # System health checks (disk, mem, CPU, temp, Docker)
│ ├── backup/
-│ │ ├── backup.go # Backup orchestrator (DB dumps + restic + prune)
+│ │ ├── backup.go # Backup orchestrator (DB dumps + restic + prune + integrity)
│ │ ├── dbdump.go # Database auto-discovery + dump (pg_dump, mariadb-dump)
-│ │ └── restic.go # Restic operations (init, snapshot, prune, stats)
+│ │ └── restic.go # Restic operations (init, snapshot, prune, check, stats)
│ ├── metrics/
│ │ ├── store.go # SQLite metrics storage (system + container, downsampled queries)
│ │ ├── collector.go # Background collector (60s interval, system + docker stats)
│ │ ├── types.go # SystemSample, ContainerSample, StaticSystemInfo structs
│ │ └── sysinfo.go # Host-level static info (/proc, /etc)
│ ├── report/
│ │ ├── types.go # Hub report JSON payload definitions
│ │ ├── builder.go # Builds report from system/stacks/backup/metrics state
│ │ └── pusher.go # HTTP POST to central hub (retry, Bearer auth)
│ └── web/
│ ├── server.go # HTTP server, routing, static file serving
│ ├── auth.go # Session auth, login/logout handlers
@@ -159,6 +172,8 @@ controller/
| **Sync** | `internal/sync/` | ✅ Done | Git-based app catalog sync (clone/pull, content-hash copy) |
| **Scheduler** | `internal/scheduler/` | ✅ Done | Central job scheduler (periodic + daily, skip-if-running) |
| **Monitor** | `internal/monitor/` | ✅ Done | Healthchecks.io pings, system health checks |
| **Metrics** | `internal/metrics/` | ✅ Done | SQLite time-series store, system + container collection |
| **Report** | `internal/report/` | ✅ Done | Central hub push (JSON report builder + HTTP pusher) |
| **Backup** | `internal/backup/` | ✅ Done | DB auto-discovery + dump, restic snapshots, prune, manual trigger |
## Stack Management
@@ -371,7 +386,7 @@ docker compose up -d
| Node | Hardware | Domain | IP | Status |
|------|----------|--------|----|--------|
-| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | ✅ Controller v0.4.0 + Paperless-ngx running |
+| demo-felhom | Acemagic GK3PLUS N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | ✅ Controller v0.6.0 + Paperless-ngx running |
| pi-customer-1 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | — | 📲 Not yet tested |
### First deployment log (Paperless-ngx on demo-felhom)
@@ -445,6 +460,9 @@ docker compose up -d
- [x] Central job scheduler (replaces ad-hoc goroutines)
- [x] Healthchecks.io-compatible HTTP pinger with retry logic
- [x] System health checks (disk, memory, CPU, temp, Docker, protected containers)
- [x] Heartbeat ping (5-minute "I'm alive" signal)
- [x] SQLite metrics store (system + container metrics, 60s collection, 30-day prune)
- [x] Backup integrity check (weekly restic check with Healthchecks ping)
- [ ] Customer notifications (email/Telegram)
### Phase 3 — Backups ✅ COMPLETE
@@ -472,10 +490,13 @@ docker compose up -d
- [ ] Health-based rollback mechanism
- [ ] Config export/import
-### Phase 6 — Central Management (future)
+### Phase 6 — Central Management (in progress)
-- [ ] API authentication for remote management
+- [x] Central hub reporting (controller → hub JSON push with Bearer auth)
-- [ ] Central dashboard on k3s querying all customer controllers
+- [x] Hub report builder (system, stacks, backup, health, containers, metrics)
- [x] Hub service (felhom-hub: REST API + SQLite + dark-theme dashboard)
- [x] K8s manifests for hub deployment on k3s
- [ ] Fleet-wide update management
- [ ] Customer notifications (email/Telegram)
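On the receiving side, the hub's report endpoint (POST with Bearer token auth) might look roughly like the handler below. This is a sketch under stated assumptions, not the contents of `internal/api/handler.go`: the `Report` type with a single `customer_id` field, the `save` callback, the 1 MiB body limit, and the specific status codes are all illustrative choices.

```go
package main

import (
	"crypto/subtle"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Report is a hypothetical minimal payload; the real one carries much more.
type Report struct {
	CustomerID string `json:"customer_id"`
}

// reportHandler accepts POSTed JSON reports that carry the expected Bearer
// token and hands them to save (standing in for the SQLite store).
func reportHandler(apiKey string, save func(Report)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		const prefix = "Bearer "
		auth := r.Header.Get("Authorization")
		// Constant-time comparison avoids leaking key contents via timing.
		if !strings.HasPrefix(auth, prefix) ||
			subtle.ConstantTimeCompare([]byte(auth[len(prefix):]), []byte(apiKey)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		var rep Report
		body := http.MaxBytesReader(w, r.Body, 1<<20) // cap request size
		if err := json.NewDecoder(body).Decode(&rep); err != nil || rep.CustomerID == "" {
			http.Error(w, "invalid report", http.StatusBadRequest)
			return
		}
		save(rep)
		w.WriteHeader(http.StatusNoContent)
	})
}

// demoStatus sends one in-memory request to a handler configured with the
// key "secret" and returns the HTTP status code it produced.
func demoStatus(method, token, body string) int {
	h := reportHandler("secret", func(Report) {})
	req := httptest.NewRequest(method, "/api/v1/report", strings.NewReader(body))
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(demoStatus("POST", "secret", `{"customer_id":"demo-felhom"}`)) // 204
	fmt.Println(demoStatus("POST", "wrong", `{"customer_id":"demo-felhom"}`))  // 401
	fmt.Println(demoStatus("GET", "secret", ""))                               // 405
	fmt.Println(demoStatus("POST", "secret", `not json`))                      // 400
}
```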
## Related Repositories
@@ -483,5 +504,5 @@ docker compose up -d
|------------|---------|
| [deploy-felhom-compose](https://gitea.dooplex.hu/admin/deploy-felhom-compose) | This repo — controller + deploy scripts |
| [app-catalog-felhom.eu](https://gitea.dooplex.hu/admin/app-catalog-felhom.eu) | Docker Compose templates + .felhom.yml metadata |
-| [felhom.eu](https://gitea.dooplex.hu/admin/felhom.eu) | Website + app assets + felhom infra manifests |
+| [felhom.eu](https://gitea.dooplex.hu/admin/felhom.eu) | Website + app assets + felhom infra manifests (incl. felhom-hub) |
| [homelab-manifests](https://gitea.dooplex.hu/admin/homelab-manifests) | k3s cluster manifests (dooplex.hu) |