# CONTEXT.md — Project Memory

> This file serves as persistent project memory across Claude Code sessions.
> It replaces the auto-generated "Memory" from the claude.ai Project.
> **Update this file at the end of each working session** with current state,
> recent decisions, and anything the next session needs to know.
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"

Last updated: 2026-02-16 (session 20)

---

## About Viktor (project owner)

- Works at Magyar Telekom (Budapest), building Felhom as a side business
- Felhom: managed home-server service for Hungarian households
- Technical but prefers pragmatic solutions over over-engineering
- Runs all infrastructure on Gitea (gitea.dooplex.hu), k3s cluster for management
- Customer deployments use Docker Compose (not Kubernetes) for simplicity

## Current project state

### felhom-controller (this repo)

- **Version:** v0.6.2
- **Phase 1:** ✅ COMPLETE — Stack Manager + Deploy Flow
- **Phase 2:** ✅ COMPLETE — Monitoring & Health (scheduler, CPU/temp, healthchecks.io pings)
- **Phase 3:** ✅ COMPLETE — Backups (DB dumps, restic integration, manual trigger, **dedicated backup page**)
- **Phase 4:** ✅ COMPLETE — Monitoring Page with Metrics Store (SQLite, Chart.js, system + container metrics)
- **First app deployed:** Paperless-ngx on demo-felhom.eu (2026-02-13)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1-4 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth, monitoring, backups, backup detail page, system monitoring page

### What was just completed (2026-02-16 session 20)

- **Hub Dashboard Bugs + Backup Validation Fix (3 bugs):**
  - **Bug 1&2 (Hub repo, felhom-hub v0.1.2):** Hub timestamp parsing failure — `time.Parse` with single hardcoded format silently failed for formats returned by `modernc.org/sqlite`. Added `parseSQLiteTime()` that tries 6 common formats.
    Fixed: hub main page showing DOWN despite OK status, and report history timestamps showing 00:00:00.
  - **Bug 3 (Controller repo, v0.6.2):** Backup page showing "Hiba" for all DB validations — zero-value `DumpValidation{}` (never assigned) hit the `{{else}}` branch in template. Three fixes:
    - Template: 4-branch guard (Valid → OK / Error → Hiba / zero-value → "–" with tooltip)
    - Debug logging: Added `[DEBUG]` and `[WARN]` log lines to all `ValidateDump()` code paths
    - Re-validation: `RefreshCache()` now cross-checks `lastDBDump` results against fresh `ListDumpFiles()` validation, healing stale in-memory state
- **Deployed:** Hub v0.1.2 to k3s, Controller v0.6.2 to demo-felhom
- **Verified:** Controller logs show `ValidateDump OK` for all 3 databases (immich: 60 tables, paperless: 67 tables, romm: 14 tables)

### What was previously completed (2026-02-16 session 19)

- **v0.6.1 — Code Review Bugfixes (7 fixes):**
  - **Fix 1:** `http.NotFound(w, nil)` → pass actual `*http.Request` in `deployHandler` and `appDetailHandler`
  - **Fix 2:** Dashboard running/stopped counts now computed from the filtered `deployedStacks` set (was counting ALL stacks including non-deployed)
  - **Fix 3:** Session cookie `Secure` flag now dynamic based on `r.TLS != nil || X-Forwarded-Proto == "https"`. `SameSite` changed from `Strict` to `Lax` (Strict breaks Cloudflare Tunnel redirects)
  - **Fix 4:** Removed misleading `subtle.ConstantTimeCompare` from `isValidSession()` (map lookup already leaks timing; comparing token to itself is meaningless). Removed unused `token` field from `session` struct. Removed `crypto/subtle` import.
  - **Fix 5:** Replaced `time.Tick()` (goroutine leak) with proper `time.NewTicker` + `done` channel in `cleanupSessions()`. Added `Close()` method to Server. Added `done chan struct{}` to Server struct.
  - **Fix 6:** Added `http.MaxBytesReader(w, req.Body, 1<<20)` (1MB limit) to `deployStack`, `updateOptionalConfig`, `deleteStack` API handlers via `limitBody()` helper.
  - **Fix 7:** Cached `time.LoadLocation("Europe/Budapest")` once at top of `templateFuncMap()`, removed 5 per-function `LoadLocation` calls (timeAgo, fmtTime, fmtTimeShort, nextRunLabel, nextPruneLabel).
- **Post-fix verification:** All 4 grep checks pass (0 results for `NotFound(w,nil)`, `ConstantTimeCompare`, `time.Tick(`, `Secure:.*true`). `go vet ./...` clean.
- **Controller version:** v0.6.1 — deployed and verified on demo-felhom.eu

### What was previously completed (2026-02-16 session 18)

- **v0.6.0 — Healthcheck Implementation + Central Push + Hub Dashboard:**
  - **Part 1 — Healthcheck enhancements (controller-side):**
    - Added `heartbeat` ping — lightweight "I'm alive" signal every 5 min (no logic, just ping)
    - Added `backup_integrity` ping — weekly `restic check` on Sunday 04:00, pings healthchecks with result
    - Added `Heartbeat` and `BackupIntegrity` fields to `PingUUIDsConfig`
    - Added `RunIntegrityCheck()` to backup Manager (calls restic Check(), updates lastCheckTime/lastCheckOK, pings)
    - Updated `controller.yaml.example` with new monitoring ping_uuids
    - Created `monitoring/DEPRECATED.md` for legacy bash monitoring scripts
  - **Part 2 — Central hub reporting (controller-side):**
    - New `internal/report/` package: types.go (Report struct), builder.go (BuildReport), pusher.go (HTTP push)
    - Report builder gathers data from all subsystems: system info (via metrics.GetStaticInfo + system.GetInfo), container stats (via metricsStore.QueryContainerSummary), backup status (via backupMgr.GetFullStatus), health (via monitor.RunHealthCheck), stacks (via stackMgr.GetStacks)
    - Report pusher: POST JSON to hub with Bearer token auth, 3 retries with 5s backoff, never fails caller
    - Added `HubConfig` to config.go (enabled, url, api_key, push_interval)
    - Wired hub reporting into scheduler (configurable interval, default 15m)
    - Hub reporting disabled by default (hub.enabled: false)
  - **Part 3 — Hub service (felhom.eu repo, new `hub/` subfolder):**
    - Full Go service: `cmd/hub/main.go`,
      `internal/api/handler.go`, `internal/store/store.go`, `internal/web/server.go`
    - SQLite store with WAL mode, auto-migration, denormalized fields for fast queries
    - REST API: POST /api/v1/report (Bearer token auth), GET /api/v1/customers, GET /api/v1/customers/{id}, GET /api/v1/customers/{id}/history
    - Dark theme dashboard (English): multi-customer overview table with status indicators, customer detail page with system/storage/containers/backup/health sections
    - Color coding: green (OK, <30min), yellow (warn or 30-60min), red (fail or >60min)
    - K8s manifest: Deployment + Service + Ingress for hub.felhom.eu in felhom-system namespace
    - Dockerfile, Makefile, hub.yaml.example config
    - 90-day report retention with daily auto-prune
- **Controller version:** v0.6.0 — deployed and verified on demo-felhom.eu (9 scheduler jobs, all new jobs registered)
- **Manual steps remaining for Viktor (Part 4 of TASK.md):**
  - Create 5 healthcheck checks on status.felhom.eu (heartbeat, system-health, db-dump, backup, backup-integrity)
  - Update controller.yaml on demo-felhom with real UUIDs
  - Build and deploy felhom-hub to k3s cluster
  - Configure hub.felhom.eu DNS in Cloudflare
  - Enable hub reporting on demo-felhom controller.yaml

### What was previously completed (2026-02-16 session 17)

- **v0.5.4 — Monitoring Page Frontend Fixes (4 bugs, frontend-only):**
  - **Bug 1: Tooltip "Invalid Date"** — `items[0].parsed.x` unreliable across Chart.js versions. Fixed tooltip callback to use `items[0].raw.x` (direct {x,y} data access) with `parsed.x` as fallback.
  - **Bug 2: Charts fill full width regardless of data density** — `setChartXBounds()` setting `min/max` at runtime was ignored because the scale was created without them. Fixed by including `min: now - defaultRangeMs, max: now` in the initial `chartOpts()` options. Now "7 nap" shows full 7-day x-axis with data clustered on the right.
  - **Bug 3: Sysinfo values not consistently right-aligned** — `.sysinfo-grid` used `auto-fill` creating variable-width cells. Fixed to `1fr 1fr` (fixed 2-column). Added `align-items: baseline`, `gap: 1rem`, `white-space: nowrap` on labels, `font-weight: 600` + `word-break: break-word` on values. Removed redundant `