Code Review Bugfixes (v0.6.1)

2026-02-16 14:35:16 +01:00
parent 59b239739e
commit 104c97040c
1 changed files with 275 additions and 654 deletions
@@ -1,720 +1,341 @@
-# TASK.md — v0.6.0: Healthcheck Implementation + Central Push + Multi-Customer Dashboard
+# TASK.md — Code Review Bugfixes (v0.6.1)

-> **Version:** v0.6.0
-> **Depends on:** v0.5.4 (current)
-> **Repo:** `deploy-felhom-compose` (controller/ subfolder)
-> **Build:** `~/build/felhom-controller/build.sh 0.6.0 --push`
-> **Deploy target:** demo-felhom.eu (N100) + k3s cluster (dooplex.hu)
+## Overview
+
+Fix bugs and logic issues identified during the v0.6.0 code review. All changes are in the `controller/` subtree.
+No new features — only correctness, safety, and quality fixes.
+
+After all fixes: bump version to **v0.6.1**, commit, build, push, deploy, verify.

 ---

-## Context
+## Fix 1: `http.NotFound(w, nil)` — pass request, not nil

-The controller already has health monitoring infrastructure built in v0.4.0:
-- `internal/monitor/pinger.go` — Healthchecks.io-compatible HTTP ping client (success/fail/start, retries)
-- `internal/monitor/healthcheck.go` — System health checks (disk, memory, CPU, temp, Docker, protected containers)
-- Scheduler jobs in `main.go`: `system-health` (every 5m), `db-dump` (daily), `backup` (daily)
-- Backup manager already calls `pinger.Ping()`/`pinger.Fail()` after each operation
+**File:** `internal/web/handlers.go`

-**Problem:** The demo-felhom Healthchecks project has **zero checks created** (screenshot confirms empty project at `status.felhom.eu/projects/.../checks/`). The `controller.yaml` on demo-felhom has all `CHANGEME` placeholder UUIDs. Nothing is actually pinging.
+**Problem:** Two handlers discard the `*http.Request` parameter as `_`, then call `http.NotFound(w, nil)`. Go's current stdlib does not dereference the request inside `NotFound`, so this works today, but it relies on an implementation detail: any wrapper or future stdlib version that reads the request would panic on the nil pointer.

-Additionally, there are legacy bash scripts (`backup-healthcheck.sh`, `monitoring-setup.sh`) from the pre-controller era that duplicate functionality now built into the controller. These should be deprecated in favor of controller-native pings.
+**Changes:**

-**This version has two major parts:**
-1. **Prerequisite:** Get healthchecks actually working on demo-felhom (create checks, configure UUIDs, verify pings)
-2. **New feature:** Central push from customer controllers to k3s + multi-customer overview dashboard
+In `deployHandler`, change signature and call:
+```go
+// BEFORE:
+func (s *Server) deployHandler(w http.ResponseWriter, _ *http.Request, name string) {
+    ...
+    if err != nil {
+        http.NotFound(w, nil)

---
-
-## Part 0: Healthcheck Ping Design (controller.yaml schema update)
-
-### Current ping types (already implemented in code)
-
-| Ping | Schedule | Source | What it proves |
-|------|----------|--------|----------------|
-| `system_health` | Every 5 min | `monitor.RunHealthCheck()` | Server alive, Docker running, disks OK, protected containers up, CPU/mem/temp within thresholds |
-| `db_dump` | Daily 02:30 | `backup.RunDBDumps()` | Database dumps completed successfully |
-| `backup` | Daily 03:00 | `backup.RunBackup()` | Restic snapshot completed successfully |
-
-### New ping types to add
-
-| Ping | Schedule | Source | What it proves |
-|------|----------|--------|----------------|
-| `backup_integrity` | Weekly (Sunday 04:00) | New: `backup.RunIntegrityCheck()` | Restic repo passes `restic check` — data is not corrupted |
-| `heartbeat` | Every 5 min | New: lightweight HTTP POST, no logic | Controller process is alive (distinct from `system_health` which does heavy checks and could fail due to a bug while the controller itself is fine) |
-
-### Revised `controller.yaml` monitoring section
-
-```yaml
-monitoring:
-  enabled: true
-  healthchecks_base: "https://status.felhom.eu"
-  ping_uuids:
-    heartbeat: ""              # NEW — every 5 min, controller alive
-    system_health: ""          # existing — every 5 min, comprehensive check
-    db_dump: ""                # existing — daily after db dumps
-    backup: ""                 # existing — daily after restic snapshot
-    backup_integrity: ""       # NEW — weekly after restic check
-  system_health_interval: "5m"
-  health_check_schedule: "06:00"
-  thresholds:
-    disk_warn_percent: 80
-    disk_crit_percent: 90
-    backup_max_age_hours: 36
-    cpu_warn_percent: 90
-    memory_warn_percent: 85
-    temperature_warn_celsius: 75
+// AFTER:
+func (s *Server) deployHandler(w http.ResponseWriter, r *http.Request, name string) {
+    ...
+    if err != nil {
+        http.NotFound(w, r)
 ```

-> **Note:** Empty string and "CHANGEME..." UUIDs are both skipped by the pinger (already implemented). This means any check can be left unconfigured — the controller just skips it silently.
+In `appDetailHandler`, same fix:
+```go
+// BEFORE:
+func (s *Server) appDetailHandler(w http.ResponseWriter, _ *http.Request, slug string) {
+    ...
+    if found == nil {
+        http.NotFound(w, nil)

-### Healthchecks check configuration (to be created manually on status.felhom.eu)
+// AFTER:
+func (s *Server) appDetailHandler(w http.ResponseWriter, r *http.Request, slug string) {
+    ...
+    if found == nil {
+        http.NotFound(w, r)
+```

-For each customer project, create these checks:
-
-| Check name | Period | Grace | Tags |
-|-----------|--------|-------|------|
-| `heartbeat` | 5 minutes | 10 minutes | `heartbeat` |
-| `system-health` | 5 minutes | 10 minutes | `system`, `health` |
-| `db-dump` | 1 day (02:30 CET) | 30 minutes | `backup`, `db` |
-| `backup` | 1 day (03:00 CET) | 60 minutes | `backup`, `restic` |
-| `backup-integrity` | 7 days | 24 hours | `backup`, `integrity` |
+**Verify:** Grep for `NotFound(w, nil)` — should return 0 results after fix.
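
A regression check for this fix can be sketched with `net/http/httptest`. This is a minimal example under stated assumptions, not code from the repository: `notFoundStatus` is a hypothetical helper and `/apps/missing` is a placeholder path.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// notFoundStatus (hypothetical helper) calls http.NotFound with a real,
// non-nil request and returns the status code that was written.
func notFoundStatus() int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/apps/missing", nil)
	http.NotFound(rec, req)
	return rec.Code
}
```

The same recorder/request pair could drive `deployHandler` and `appDetailHandler` directly once they accept `r`.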

 ---

-## Part 1: Controller-side healthcheck implementation
+## Fix 2: Dashboard running/stopped counts don't match displayed stacks

-### Task 1.1: Add heartbeat ping
+**File:** `internal/web/handlers.go`, `dashboardHandler`

-**Files:** `cmd/controller/main.go`
+**Problem:** The `running`/`stopped` stat counters iterate over ALL stacks (including non-deployed ones), but the dashboard only displays deployed + protected stacks. The numbers don't match what the user sees.

-Add a new scheduler job — the simplest possible ping, no health check logic:
+**Fix:** Compute the counts from the same filtered set (`deployedStacks`), not from `stackList`. Move the filter loop first, then count from the filtered result.

 ```go
-// Heartbeat — lightweight "I'm alive" signal
-sched.Every("heartbeat", 5*time.Minute, func(ctx context.Context) error {
-    pinger.Ping(cfg.Monitoring.PingUUIDs.Heartbeat, "")
-    return nil
+func (s *Server) dashboardHandler(w http.ResponseWriter, _ *http.Request) {
+    stackList := s.stackMgr.GetStacks()
+
+    // Filter to deployed + protected stacks first
+    var deployedStacks []stacks.Stack
+    for _, st := range stackList {
+        if st.Deployed || st.Protected {
+            deployedStacks = append(deployedStacks, st)
+        }
+    }
+
+    // Count from the DISPLAYED set only
+    running, stopped := 0, 0
+    for _, st := range deployedStacks {
+        switch st.State {
+        case stacks.StateRunning, stacks.StateStarting, stacks.StateUnhealthy, stacks.StateRestarting:
+            running++
+        case stacks.StateStopped, stacks.StateExited:
+            stopped++
+        }
+    }
+
+    // ... rest unchanged, but use deployedStacks for display ...
+    data["Stacks"] = deployedStacks
+    data["RunningCount"] = running
+    data["StoppedCount"] = stopped
+    data["TotalCount"] = len(stackList)  // keep this as total catalog size
+```
+
+**Verify:** Deploy, open dashboard. Count of green + red badges on cards should match the stat numbers.
+
+---
+
+## Fix 3: `Secure: true` cookie blocks HTTP login
+
+**File:** `internal/web/auth.go`, `handleLogin`
+
+**Problem:** The session cookie has `Secure: true` hardcoded. When accessing via plain HTTP (e.g., `http://192.168.0.162:8080` during local setup), the browser either refuses to store the cookie or never sends it back, so login fails with no visible error.
+
+**Fix:** Set `Secure` dynamically based on the incoming request:
+
+```go
+// BEFORE:
+http.SetCookie(w, &http.Cookie{
+    Name:     sessionCookieName,
+    Value:    token,
+    Path:     "/",
+    MaxAge:   int(sessionMaxAge.Seconds()),
+    HttpOnly: true,
+    SameSite: http.SameSiteStrictMode,
+    Secure:   true,
+})
+
+// AFTER:
+isSecure := r.TLS != nil || r.Header.Get("X-Forwarded-Proto") == "https"
+http.SetCookie(w, &http.Cookie{
+    Name:     sessionCookieName,
+    Value:    token,
+    Path:     "/",
+    MaxAge:   int(sessionMaxAge.Seconds()),
+    HttpOnly: true,
+    SameSite: http.SameSiteLaxMode,  // Lax needed: Strict can break redirects through CF tunnel
+    Secure:   isSecure,
 })
 ```

-**Files:** `internal/config/config.go`
+Note: Also change `SameSiteStrictMode` → `SameSiteLaxMode`. Strict mode can cause issues when users arrive via Cloudflare Tunnel redirects (the cookie won't be sent on the first navigation from an external link).

-Add `Heartbeat` field to `PingUUIDsConfig`:
+**Verify:** Access `http://192.168.0.162:8080` in browser, log in — should work. Also verify HTTPS login still works via `https://vezerlo.demo-felhom.eu`.
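
The scheme check could live in a small helper so that every cookie write applies the same rule. The following is a minimal sketch, assuming the controller is only reachable directly or through a proxy that sets `X-Forwarded-Proto` itself; `isSecureRequest` and `secureFor` are hypothetical names, not functions from `auth.go`.

```go
package main

import (
	"crypto/tls"
	"net/http"
	"net/http/httptest"
)

// isSecureRequest (hypothetical helper) reports whether the request arrived
// over TLS, either directly or via a proxy that terminated TLS upstream.
func isSecureRequest(r *http.Request) bool {
	return r.TLS != nil || r.Header.Get("X-Forwarded-Proto") == "https"
}

// secureFor builds an in-memory login request and returns the helper's
// verdict. It exists only to exercise the sketch.
func secureFor(directTLS bool, forwardedProto string) bool {
	r := httptest.NewRequest(http.MethodPost, "/login", nil)
	if directTLS {
		r.TLS = &tls.ConnectionState{}
	}
	if forwardedProto != "" {
		r.Header.Set("X-Forwarded-Proto", forwardedProto)
	}
	return isSecureRequest(r)
}
```

A client that connects over plain HTTP and forges `X-Forwarded-Proto: https` only causes its own cookie to be marked `Secure`, so the header does not need to be validated for this purpose.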
+
+---
+
+## Fix 4: Remove misleading `subtle.ConstantTimeCompare` in session check
+
+**File:** `internal/web/auth.go`, `isValidSession`
+
+**Problem:** The map lookup `s.sessions[token]` is not constant-time, so a constant-time comparison afterwards cannot make the check as a whole timing-safe. In addition, `ConstantTimeCompare` compares the token to the copy stored under that same key, so it always returns 1 and adds no security. Keeping it is misleading.
+
+**Fix:** Simplify:

 ```go
-type PingUUIDsConfig struct {
-    Heartbeat       string `yaml:"heartbeat"`
-    DBDump          string `yaml:"db_dump"`
-    Backup          string `yaml:"backup"`
-    SystemHealth    string `yaml:"system_health"`
-    BackupIntegrity string `yaml:"backup_integrity"`  // new
-}
-```
-
-### Task 1.2: Add backup integrity check
-
-**Files:** `internal/backup/restic.go`
-
-Add a `Check()` method (may already exist as part of prune logic — verify first):
-
-```go
-// Check runs `restic check` to verify repository integrity.
-func (r *ResticRunner) Check() error {
-    args := []string{"check", "--repo", r.repo, "--json"}
-    // ... standard exec with password file, timeout 30 min
-}
-```
-
-**Files:** `internal/backup/backup.go`
-
-Add `RunIntegrityCheck()`:
-
-```go
-// RunIntegrityCheck runs restic check and pings healthchecks with the result.
-func (m *Manager) RunIntegrityCheck(ctx context.Context) error {
-    err := m.restic.Check()
-    uuid := m.cfg.Monitoring.PingUUIDs.BackupIntegrity
-    if err != nil {
-        m.pinger.Fail(uuid, fmt.Sprintf("restic check failed: %v", err))
-        return err
+// BEFORE:
+func (s *Server) isValidSession(token string) bool {
+    s.sessionsMu.RLock()
+    defer s.sessionsMu.RUnlock()
+    sess, ok := s.sessions[token]
+    if !ok || time.Now().After(sess.expiresAt) {
+        return false
    }
-    m.pinger.Ping(uuid, "restic check passed")
-    return nil
+    return subtle.ConstantTimeCompare([]byte(sess.token), []byte(token)) == 1
+}
+
+// AFTER:
+func (s *Server) isValidSession(token string) bool {
+    s.sessionsMu.RLock()
+    defer s.sessionsMu.RUnlock()
+    sess, ok := s.sessions[token]
+    return ok && time.Now().Before(sess.expiresAt)
 }
 ```

-**Files:** `cmd/controller/main.go`
-
-Register the weekly job:
+Also: the `token` field in the `session` struct is now unused (it duplicates the map key). Remove it:

 ```go
-if cfg.Backup.Enabled && backupMgr != nil {
-    // ... existing daily jobs ...
+// BEFORE:
+type session struct {
+    token     string
+    expiresAt time.Time
+}

-    // Weekly integrity check — Sunday 04:00
-    sched.Daily("backup-integrity", "04:00", func(ctx context.Context) error {
-        if time.Now().Weekday() != time.Sunday {
-            return nil // skip non-Sundays
+// AFTER:
+type session struct {
+    expiresAt time.Time
+}
+```
+
+And update `createSession`:
+```go
+// BEFORE:
+s.sessions[token] = &session{token: token, expiresAt: time.Now().Add(sessionMaxAge)}
+
+// AFTER:
+s.sessions[token] = &session{expiresAt: time.Now().Add(sessionMaxAge)}
+```
+
+After these changes, remove the `"crypto/subtle"` import if no longer used.
+
+**Verify:** Log in, navigate around — session should work. Log out — should redirect to login.
+
+---
+
+## Fix 5: `cleanupSessions` goroutine leak
+
+**File:** `internal/web/auth.go`
+
+**Problem:** `cleanupSessions` relies on `time.Tick()`, which gives the caller no way to stop the ticker (and before Go 1.23 such a ticker was never garbage-collected). The goroutine has no exit path, so it runs forever, even during shutdown.
+
+**Fix:** Lower priority, since the controller runs as a long-lived process, but the fix is simple. We don't currently pass a context to `NewServer`, so signal shutdown through a `done` channel on the server instead.
+
+Add a `done` channel to the Server struct:
+```go
+type Server struct {
+    // ... existing fields ...
+    done chan struct{}
+}
+```
+
+Initialize it in `NewServer`:
+```go
+func NewServer(...) *Server {
+    s := &Server{
+        // ... existing ...
+        done: make(chan struct{}),
+    }
+    s.loadTemplates()
+    go s.cleanupSessions()
+    return s
+}
+```
+
+Rewrite `cleanupSessions`:
+```go
+func (s *Server) cleanupSessions() {
+    ticker := time.NewTicker(15 * time.Minute)
+    defer ticker.Stop()
+    for {
+        select {
+        case <-s.done:
+            return
+        case <-ticker.C:
+            s.sessionsMu.Lock()
+            now := time.Now()
+            for t, sess := range s.sessions {
+                if now.After(sess.expiresAt) {
+                    delete(s.sessions, t)
+                }
+            }
+            s.sessionsMu.Unlock()
        }
-        return backupMgr.RunIntegrityCheck(ctx)
-    })
-}
-```
-
-> **Note on scheduler:** `Daily()` fires every day at the given time. To make it weekly, check the weekday inside the function. If you prefer, add a `Weekly()` method to the scheduler — but the weekday check is simpler and consistent with how prune already works.
-
-### Task 1.3: Update example config
-
-**Files:** `controller/configs/controller.yaml.example`
-
-Update the `monitoring.ping_uuids` section to include `heartbeat` and `backup_integrity` fields. Add comments explaining each.
-
-### Task 1.4: Deprecation note for bash monitoring scripts
-
-The following files in `deploy-felhom-compose/monitoring/` are **superseded** by the controller's built-in monitoring:
-
-- `backup-healthcheck.sh` → replaced by `internal/monitor/healthcheck.go` (scheduler: `system-health`)
-- `monitoring-setup.sh` → no longer needed (controller reads `controller.yaml` directly)
-- `monitoring.conf.template` → replaced by `controller.yaml` monitoring section
-- `backup-healthcheck.service` / `.timer` → replaced by controller's scheduler
-
-**Action:** Add a `DEPRECATED.md` in `deploy-felhom-compose/monitoring/` explaining that these scripts are kept for reference only and should not be used on nodes running felhom-controller v0.4.0+. Do NOT delete the files yet — they may be needed if a customer is still on a pre-controller setup.
-
-### Verification (Part 1)
-
-After building and deploying v0.6.0 to demo-felhom:
-
-1. Check controller logs: `docker logs felhom-controller --since 5m | grep -i "ping\|health\|heartbeat"`
-2. Verify pings arrive at `status.felhom.eu` — all 5 checks should show green within 10 minutes
-3. Test failure: `docker stop traefik`, wait 5 min, check that `system-health` goes red (protected container missing)
-4. Restart traefik: `docker start traefik`, verify recovery
-
---
-
-## Part 2: Central push to k3s (customer → operator reporting)
-
-### Architecture
-
-```
-┌─────────────────────────┐         HTTPS POST /api/v1/report
-│  Customer controller    │────────────────────────────────────────┐
-│  (demo-felhom.eu)       │   every 15 min (configurable)         │
-└─────────────────────────┘                                       ▼
-                                                    ┌─────────────────────────────┐
-┌─────────────────────────┐         HTTPS POST      │  felhom-hub                 │
-│  Customer controller    │────────────────────────▶│  (k3s pod on dooplex.hu)    │
-│  (customer-2)           │                         │                             │
-└─────────────────────────┘                         │  - Receives reports         │
-                                                    │  - Stores in SQLite         │
-                                                    │  - Serves dashboard         │
-                                                    │  - Alerts on stale reports  │
-                                                    └─────────────────────────────┘
-                                                          hub.felhom.eu
-```
-
-### Task 2.1: Define the report payload
-
-The controller pushes a JSON summary every 15 minutes. This is **not** raw metrics — it's an aggregated health summary.
-
-```json
-{
-  "version": 1,
-  "customer_id": "demo-felhom",
-  "customer_name": "Demo Ügyfél",
-  "controller_version": "0.6.0",
-  "timestamp": "2026-02-16T12:00:00Z",
-  "system": {
-    "hostname": "demo-felhom",
-    "os": "Debian GNU/Linux 13 (trixie)",
-    "kernel": "6.12.69+deb13-amd64",
-    "cpu_model": "Intel N100",
-    "cpu_cores": 4,
-    "uptime_seconds": 345600,
-    "cpu_percent": 12.5,
-    "memory_total_mb": 15872,
-    "memory_used_mb": 4200,
-    "memory_percent": 26.5,
-    "temperature_celsius": 48.0,
-    "load_avg_1": 0.45,
-    "load_avg_5": 0.38,
-    "load_avg_15": 0.32
-  },
-  "storage": [
-    { "mount": "/", "total_gb": 476.0, "used_gb": 28.5, "percent": 6.0 },
-    { "mount": "/mnt/hdd_1", "total_gb": 931.0, "used_gb": 120.3, "percent": 12.9 }
-  ],
-  "containers": {
-    "total": 16,
-    "running": 14,
-    "stopped": 2,
-    "unhealthy": 0,
-    "list": [
-      { "name": "paperless-ngx-webserver-1", "state": "running", "cpu_percent": 2.1, "memory_mb": 350 },
-      { "name": "traefik", "state": "running", "cpu_percent": 0.3, "memory_mb": 45 }
-    ]
-  },
-  "backup": {
-    "enabled": true,
-    "last_db_dump": "2026-02-16T02:30:15Z",
-    "last_snapshot": "2026-02-16T03:02:45Z",
-    "snapshot_count": 42,
-    "repo_size_mb": 2048,
-    "last_integrity_check": "2026-02-09T04:00:00Z",
-    "integrity_ok": true
-  },
-  "health": {
-    "status": "ok",
-    "issues": [],
-    "warnings": ["Disk /mnt/hdd_1 at 82%"]
-  },
-  "stacks": {
-    "deployed": ["paperless-ngx", "immich", "jellyfin"],
-    "available": ["nextcloud", "vaultwarden", "home-assistant"],
-    "updates_available": 1
-  }
-}
-```
-
-### Task 2.2: Implement report builder in the controller
-
-**New file:** `controller/internal/report/builder.go`
-
-```go
-package report
-
-// Report is the JSON payload pushed to the central hub.
-type Report struct {
-    Version           int              `json:"version"`
-    CustomerID        string           `json:"customer_id"`
-    CustomerName      string           `json:"customer_name"`
-    ControllerVersion string           `json:"controller_version"`
-    Timestamp         time.Time        `json:"timestamp"`
-    System            SystemReport     `json:"system"`
-    Storage           []StorageReport  `json:"storage"`
-    Containers        ContainerReport  `json:"containers"`
-    Backup            BackupReport     `json:"backup"`
-    Health            HealthReport     `json:"health"`
-    Stacks            StacksReport     `json:"stacks"`
-}
-
-// BuildReport collects current state from all subsystems and returns a Report.
-func BuildReport(cfg *config.Config, stackMgr *stacks.Manager,
-    backupMgr *backup.Manager, cpuCollector *system.CPUCollector,
-    pinger *monitor.Pinger, version string) *Report {
-    // Gather system info from system.GetInfo()
-    // Gather container info from stackMgr
-    // Gather backup info from backupMgr.GetFullStatus()
-    // Gather health from monitor.RunHealthCheck()
-    // Gather stack list from stackMgr.GetStacks()
-    // Return assembled Report
-}
-```
-
-This function should call existing methods — **do not duplicate logic**. Use the same data sources the dashboard and monitoring page already use.
-
-### Task 2.3: Implement report pusher in the controller
-
-**New file:** `controller/internal/report/pusher.go`
-
-```go
-package report
-
-// Pusher sends reports to the central hub.
-type Pusher struct {
-    hubURL     string
-    apiKey     string
-    httpClient *http.Client
-    logger     *log.Logger
-    enabled    bool
-}
-
-// Push sends a report to the hub. Returns nil on success.
-// Retries 3 times with 5s backoff. Never returns error to caller
-// (push failures should not affect controller operation).
-func (p *Pusher) Push(report *Report) error {
-    // JSON marshal
-    // POST to hubURL + "/api/v1/report"
-    // Header: Authorization: Bearer <apiKey>
-    // Header: Content-Type: application/json
-    // Retry on failure
-    // Log but don't propagate errors
-}
-```
-
-### Task 2.4: Add hub configuration to controller.yaml
-
-**Files:** `internal/config/config.go`, `controller/configs/controller.yaml.example`
-
-```yaml
-# --- Central hub (operator dashboard) ---
-hub:
-  enabled: false                              # Enable central reporting
-  url: "https://hub.felhom.eu"                # Hub API endpoint
-  api_key: ""                                 # Shared secret for authentication
-  push_interval: "15m"                        # How often to push reports
-```
-
-```go
-type HubConfig struct {
-    Enabled      bool   `yaml:"enabled"`
-    URL          string `yaml:"url"`
-    APIKey       string `yaml:"api_key"`
-    PushInterval string `yaml:"push_interval"`
-}
-```
-
-Add `Hub HubConfig `yaml:"hub"`` to the main `Config` struct.
-
-### Task 2.5: Wire the pusher into main.go
-
-```go
-// --- Central hub reporting ---
-if cfg.Hub.Enabled && cfg.Hub.URL != "" {
-    pushInterval, err := time.ParseDuration(cfg.Hub.PushInterval)
-    if err != nil {
-        pushInterval = 15 * time.Minute
    }
-    pusher := report.NewPusher(&cfg.Hub, logger)
-    sched.Every("hub-report", pushInterval, func(ctx context.Context) error {
-        r := report.BuildReport(cfg, stackMgr, backupMgr, cpuCollector, pinger, version)
-        return pusher.Push(r)
-    })
-    logger.Printf("[INFO] Hub reporting enabled (every %s to %s)", pushInterval, cfg.Hub.URL)
 }
 ```

-### Verification (Part 2)
+Add a `Close` method (called from main during shutdown, optional for now):
+```go
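+// Close stops the cleanupSessions goroutine. Call it at most once:
+// closing an already-closed channel panics.
+func (s *Server) Close() {
+    close(s.done)
+}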
+```

-1. Set `hub.enabled: true` and `hub.url` to a temporary endpoint (e.g., `https://webhook.site/...`) in demo-felhom's `controller.yaml`
-2. Restart controller, check logs for "Hub reporting enabled"
-3. Wait 15 min (or set `push_interval: "1m"` for testing), verify JSON arrives at the endpoint
-4. Validate JSON structure matches the spec above
-5. Reset `push_interval` to `"15m"` after testing
+**Verify:** Build succeeds, controller starts without errors.
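
If `Close` might be called from more than one shutdown path, wrapping the `close` in a `sync.Once` makes it safe to call repeatedly. The sketch below is an assumption about how that could look, not the repository's code: the `cleaner` type, the `stopped` channel, and `closeStopsLoop` are hypothetical and exist only to demonstrate the pattern.

```go
package main

import (
	"sync"
	"time"
)

// cleaner is a hypothetical stand-in for the Server's cleanup loop.
type cleaner struct {
	done      chan struct{}
	closeOnce sync.Once
	stopped   chan struct{} // closed when the loop returns (demo only)
}

func newCleaner(interval time.Duration) *cleaner {
	c := &cleaner{done: make(chan struct{}), stopped: make(chan struct{})}
	go c.loop(interval)
	return c
}

func (c *cleaner) loop(interval time.Duration) {
	defer close(c.stopped)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-c.done:
			return
		case <-ticker.C:
			// The expired-session sweep would run here.
		}
	}
}

// Close signals the loop to exit; repeated calls are no-ops.
func (c *cleaner) Close() {
	c.closeOnce.Do(func() { close(c.done) })
}

// closeStopsLoop reports whether the loop exits after Close is called twice.
func closeStopsLoop() bool {
	c := newCleaner(time.Hour)
	c.Close()
	c.Close() // no panic: the second call does nothing
	select {
	case <-c.stopped:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}
```

Whether the extra field is worth it depends on how many callers `Close` ends up with; a single call from `main` works fine with the plain version.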

 ---

-## Part 3: Hub service on k3s (operator side)
+## Fix 6: Add `http.MaxBytesReader` to API POST endpoints

-### Overview
+**File:** `internal/api/router.go`

-The hub is a lightweight Go service deployed on Viktor's k3s cluster in the `felhom-system` namespace. It receives reports from customer controllers, stores them in SQLite, and serves an English-language dashboard for Viktor.
+**Problem:** `json.NewDecoder(req.Body).Decode(&body)` has no size limit. A malicious or accidental large POST could exhaust memory.

-**Domain:** `hub.felhom.eu` (Nginx Ingress, cert-manager TLS)
-**Namespace:** `felhom-system` (alongside Healthchecks and other felhom infra)
-**Code:** `felhom.eu` repo on Gitea, `hub/` subfolder
+**Fix:** Add a helper and use it in all handlers that decode JSON bodies (`deployStack`, `updateOptionalConfig`, `deleteStack`):

-### Task 3.1: Hub service (subfolder in felhom.eu repository)
-
-The hub lives in the existing `felhom.eu` repository on Gitea as a `hub/` subfolder. It's deployed to the k3s cluster in the `felhom-system` namespace (alongside Healthchecks and other felhom infra). K8s manifests go in the `homelab-manifests` repo as usual.
-
-**Structure (inside felhom.eu repo):**
-
-```
-hub/
-├── cmd/hub/main.go              # Entry point
-├── internal/
-│   ├── api/
-│   │   └── handler.go           # POST /api/v1/report, GET /api/v1/customers
-│   ├── store/
-│   │   └── store.go             # SQLite: save reports, query latest per customer
-│   └── web/
-│       ├── server.go            # Dashboard HTTP server
-│       ├── templates/
-│       │   ├── dashboard.html   # Multi-customer overview (English)
-│       │   ├── customer.html    # Single customer detail (English)
-│       │   └── style.css        # Dark theme matching felhom.eu
-│       └── embed.go
-├── configs/
-│   └── hub.yaml.example
-├── Dockerfile
-├── Makefile
-└── go.mod
+Add helper at the bottom of `router.go`:
+```go
+// limitBody caps the request body at 1 MiB; reads past the limit return an error.
+func limitBody(w http.ResponseWriter, req *http.Request) {
+    req.Body = http.MaxBytesReader(w, req.Body, 1<<20) // 1 MiB
+}
 ```

-K8s manifests in `felhom.eu/manifests/` (alongside healthchecks.yaml, webpage.yaml, etc.):
-```
-manifests/hub.yaml               # Deployment, Service, Ingress, PVC
+Then at the start of each handler that reads the body:
+```go
+func (r *Router) deployStack(w http.ResponseWriter, req *http.Request, name string) {
+    limitBody(w, req)
+    // ... existing json decode ...
+}
+
+func (r *Router) updateOptionalConfig(w http.ResponseWriter, req *http.Request, name string) {
+    limitBody(w, req)
+    // ...
+}
+
+func (r *Router) deleteStack(w http.ResponseWriter, req *http.Request, name string) {
+    limitBody(w, req)
+    // ...
+}
 ```

-### Task 3.2: Hub API endpoints
-
-| Method | Path | Auth | Description |
-|--------|------|------|-------------|
-| `POST` | `/api/v1/report` | Bearer token | Receive customer report (JSON body) |
-| `GET` | `/api/v1/customers` | Session/Basic | List all customers with latest status |
-| `GET` | `/api/v1/customers/{id}` | Session/Basic | Get latest report for a customer |
-| `GET` | `/api/v1/customers/{id}/history` | Session/Basic | Get report history (last 24h/7d/30d) |
-| `GET` | `/` | Session/Basic | Dashboard HTML page |
-| `GET` | `/customers/{id}` | Session/Basic | Customer detail HTML page |
-
-**Authentication:**
-- Report ingest: Bearer token (shared secret per customer, or a single hub-wide key for simplicity)
-- Dashboard: Basic auth or simple password (Viktor only) — reuse the same bcrypt approach as the controller
-
-### Task 3.3: Hub SQLite schema
-
-```sql
-CREATE TABLE IF NOT EXISTS reports (
-    id INTEGER PRIMARY KEY AUTOINCREMENT,
-    customer_id TEXT NOT NULL,
-    received_at DATETIME NOT NULL DEFAULT (datetime('now')),
-    report_json TEXT NOT NULL,           -- Full JSON payload
-    -- Denormalized fields for fast queries:
-    health_status TEXT,                  -- "ok", "warn", "fail"
-    cpu_percent REAL,
-    memory_percent REAL,
-    container_total INTEGER,
-    container_running INTEGER,
-    backup_last_snapshot DATETIME,
-    controller_version TEXT
-);
-
-CREATE INDEX IF NOT EXISTS idx_reports_customer ON reports(customer_id, received_at DESC);
-
--- Prune old reports: keep 30 days of history
--- Run daily: DELETE FROM reports WHERE received_at < datetime('now', '-30 days');
-```
-
-### Task 3.4: Hub dashboard UI (English)
-
-**Overview page (`/`):**
-
-A table/grid showing all customers at a glance:
-
-| Customer | Status | Last seen | CPU | Memory | Disk | Containers | Last backup | Version |
-|----------|--------|-----------|-----|--------|------|------------|-------------|---------|
-| 🟢 Demo Ügyfél | OK | 2 min ago | 12% | 26% | 6%/13% | 14/16 | 3h ago | 0.6.0 |
-| 🟡 Kovács Péter | WARN | 18 min ago | 45% | 78% | 82% ⚠️ | 8/8 | 4h ago | 0.5.4 |
-| 🔴 Nagy Anna | DOWN | 2h ago | – | – | – | – | 26h ago ⚠️ | 0.5.4 |
-
-**Color coding:**
-- 🟢 Green: last seen < 30 min AND health = "ok"
-- 🟡 Yellow: last seen < 30 min AND health = "warn", OR last seen 30-60 min
-- 🔴 Red: last seen > 60 min OR health = "fail"
-
-**Customer detail page (`/customers/{id}`):**
-
-- Last report timestamp
-- Full system info section (same layout as controller's monitoring page)
-- Container list with CPU/memory
-- Backup status details
-- Health issues/warnings
-- Report history (collapsible list, last 24h)
-
-**Design:** English language. Dark theme matching felhom.eu / the controller dashboard. Use the same CSS variables and fonts.
-
-### Task 3.5: Hub Kubernetes manifests
-
-**File:** `felhom.eu/manifests/hub.yaml` (alongside `healthchecks.yaml`, `webpage.yaml`, etc.)
-
-```yaml
-# Namespace: felhom-system (shared with healthchecks and other felhom infra)
-# Deployment: 1 replica, 64Mi-256Mi memory
-# Service: ClusterIP port 8080
-# PVC: 1Gi for SQLite (Longhorn)
-# Ingress: hub.felhom.eu via nginx-internal, cert-manager TLS
-# Auth: same geo-restriction as other dooplex.hu services (HU only)
-```
-
-**ConfigMap** for `hub.yaml` config:
-```yaml
-auth:
-  password_hash: ""          # bcrypt hash, same approach as controller
-api:
-  report_api_key: ""         # Bearer token for report ingest
-retention:
-  max_days: 90               # Keep 90 days of report history
-  prune_schedule: "04:30"    # Daily prune
-alerting:
-  stale_threshold: "30m"     # Alert if customer not seen for 30 min
-```
-
-### Task 3.6: Alerting (optional, future enhancement)
-
-When a customer is "stale" (no report for > 30 min), the hub could:
-- Send a webhook to Healthchecks (one "customer-X-reporting" check per customer)
-- Send email via Resend
-- Push to Telegram
-
-For v0.6.0 scope: just show the status on the dashboard. Alerting can be added in v0.6.1.
+**Verify:** Build succeeds. Normal deploy/config/delete still works (payloads are tiny).
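
Once the limit is in place, the decode error can be mapped to a status code so oversized requests get a clear response. A minimal sketch, assuming Go 1.19 or newer (for `*http.MaxBytesError`); `decodeLimited`, `statusForBodySize`, and the request path are hypothetical and not part of `router.go`.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
	"net/http/httptest"
	"strings"
)

const maxBody = 1 << 20 // 1 MiB, same limit as limitBody

// decodeLimited (hypothetical) caps the body, decodes JSON into dst, and
// returns the HTTP status the handler should respond with.
func decodeLimited(w http.ResponseWriter, req *http.Request, dst any) int {
	req.Body = http.MaxBytesReader(w, req.Body, maxBody)
	if err := json.NewDecoder(req.Body).Decode(dst); err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			return http.StatusRequestEntityTooLarge
		}
		return http.StatusBadRequest
	}
	return http.StatusOK
}

// statusForBodySize sends a JSON string of n characters through
// decodeLimited and returns the resulting status (demo only).
func statusForBodySize(n int) int {
	body := `"` + strings.Repeat("a", n) + `"`
	req := httptest.NewRequest(http.MethodPost, "/api/stacks/demo/deploy", strings.NewReader(body))
	var payload any
	return decodeLimited(httptest.NewRecorder(), req, &payload)
}
```

Returning 413 rather than a generic 400 is a design choice; the existing handlers may already have their own error-response convention that should take precedence.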

 ---

-## Part 4: Manual steps for Viktor (demo-felhom setup)
+## Fix 7: Cache `time.LoadLocation` in template funcmap

-These steps must be done by Viktor manually — Claude Code cannot access status.felhom.eu or the demo-felhom server.
+**File:** `internal/web/funcmap.go`

-### 4.1: Create Healthchecks checks on status.felhom.eu
+**Problem:** At least 5 template functions call `time.LoadLocation("Europe/Budapest")` on every render. `LoadLocation` does not cache its result, so each call looks up and parses the zone data again, which is wasted work on every page load.

-1. Log into `status.felhom.eu`
-2. Open the "demo-felhom" project
-3. Create 5 checks with the settings from the table in Part 0
-4. Copy the ping UUIDs for each check
+**Fix:** Load once at the top of `templateFuncMap` and capture in the closures:

-### 4.2: Update controller.yaml on demo-felhom
+```go
+func (s *Server) templateFuncMap() template.FuncMap {
+    loc, err := time.LoadLocation("Europe/Budapest")
+    if err != nil {
+        loc = time.UTC
+    }

-SSH into demo-felhom and update `/opt/docker/felhom-controller/controller.yaml`:
-
-```yaml
-monitoring:
-  enabled: true
-  healthchecks_base: "https://status.felhom.eu"
-  ping_uuids:
-    heartbeat: "<UUID-from-step-4.1>"
-    system_health: "<UUID-from-step-4.1>"
-    db_dump: "<UUID-from-step-4.1>"
-    backup: "<UUID-from-step-4.1>"
-    backup_integrity: "<UUID-from-step-4.1>"
-  system_health_interval: "5m"
-  health_check_schedule: "06:00"
-  thresholds:
-    disk_warn_percent: 80
-    disk_crit_percent: 90
-    backup_max_age_hours: 36
-    cpu_warn_percent: 90
-    memory_warn_percent: 85
-    temperature_warn_celsius: 75
+    return template.FuncMap{
+        // ... in every function that currently calls time.LoadLocation,
+        // replace with the captured `loc` variable.
+        // Remove the per-function `loc, _ := time.LoadLocation(...)` lines.
+        // Example:
+        "timeAgo": func(t time.Time) string {
+            if t.IsZero() { return "–" }
+            now := time.Now().In(loc)
+            d := now.Sub(t.In(loc))
+            // ... rest unchanged ...
+        },
+        // Apply same pattern to: fmtTime, fmtTimeShort, nextRunLabel, nextPruneLabel
+    }
+}
 ```

-### 4.3: Restart controller
+There are 5 functions that need this change: `timeAgo`, `fmtTime`, `fmtTimeShort`, `nextRunLabel`, `nextPruneLabel`.

-```bash
-cd /opt/docker/felhom-controller
-docker compose pull
-docker compose up -d
-docker logs -f felhom-controller --since 1m
-```
-
-### 4.4: Verify pings
-
-Wait 5 minutes, then check `status.felhom.eu` — all 5 checks should be green.
-
-### 4.5: Deploy hub to k3s (after Part 3 is built)
-
-```bash
-# Build and push hub image (from felhom.eu repo, hub/ subfolder)
-cd hub && make docker-push
-
-# Apply k8s manifests (from felhom.eu repo, manifests/ folder)
-kubectl apply -f manifests/hub.yaml
-
-# Configure hub.felhom.eu DNS in Cloudflare
-# Update demo-felhom controller.yaml with hub config
-```
+**Verify:** Build succeeds. Dashboard/backup page timestamps still display correctly in Budapest time.

 ---

-## Implementation order
+## Post-fix checklist

-1. **Part 1** (controller-side, in `deploy-felhom-compose` repo):
-   - Task 1.1: Heartbeat ping (5 min)
-   - Task 1.2: Backup integrity check (20 min)
-   - Task 1.3: Update example config (5 min)
-   - Task 1.4: Deprecation note for bash scripts (5 min)
-
-2. **Part 4.1–4.4** (Viktor manual: create checks, configure UUIDs, verify)
-
-3. **Part 2** (controller-side, report push):
-   - Task 2.1: Report payload types (10 min)
-   - Task 2.2: Report builder (30 min)
-   - Task 2.3: Report pusher (15 min)
-   - Task 2.4: Hub config in controller.yaml (10 min)
-   - Task 2.5: Wire into main.go (5 min)
-
-4. **Part 3** (hub in `felhom.eu` repo, k8s manifests in `homelab-manifests`):
-   - Task 3.1: Project scaffold in `hub/` subfolder (10 min)
-   - Task 3.2: API handlers (30 min)
-   - Task 3.3: SQLite store (20 min)
-   - Task 3.4: Dashboard UI — English (60 min)
-   - Task 3.5: K8s manifests in `felhom.eu/manifests/` (20 min)
-
-5. **Part 4.5** (Viktor manual: deploy hub, wire everything)
-
---
-
-## Files to modify (controller repo)
-
-```
-controller/cmd/controller/main.go                     — heartbeat job, integrity job, hub pusher
-controller/internal/config/config.go                   — PingUUIDsConfig + HubConfig
-controller/internal/backup/backup.go                   — RunIntegrityCheck()
-controller/internal/backup/restic.go                   — Check() method (verify/add)
-controller/internal/report/builder.go                  — NEW: report assembly
-controller/internal/report/pusher.go                   — NEW: HTTP push client
-controller/internal/report/types.go                    — NEW: Report struct definitions
-controller/configs/controller.yaml.example             — updated monitoring + new hub section
-monitoring/DEPRECATED.md                               — NEW: deprecation notice for bash scripts
-```
-
-## Files to create (hub — in felhom.eu repo)
-
-```
-hub/cmd/hub/main.go
-hub/internal/api/handler.go
-hub/internal/store/store.go
-hub/internal/web/server.go
-hub/internal/web/templates/dashboard.html
-hub/internal/web/templates/customer.html
-hub/internal/web/templates/style.css
-hub/internal/web/embed.go
-hub/configs/hub.yaml.example
-hub/Dockerfile
-hub/Makefile
-hub/go.mod
-hub/README.md
-```
-
-## Files to create (k8s manifests — in felhom.eu repo)
-
-```
-manifests/hub.yaml
-```
-
---
-
-## Verification checklist
-
- [ ] Heartbeat ping arrives every 5 min at status.felhom.eu
- [ ] System health ping arrives every 5 min with diagnostic body
- [ ] DB dump ping arrives daily at ~02:30
- [ ] Backup ping arrives daily at ~03:00
- [ ] Backup integrity ping arrives weekly on Sunday ~04:00
- [ ] Stopping a protected container triggers system-health FAIL
- [ ] Controller logs show "Hub reporting enabled" when hub.enabled=true
- [ ] Hub receives JSON reports from controller
- [ ] Hub dashboard shows demo-felhom with green status
- [ ] Hub dashboard shows "last seen: X min ago" updating correctly
- [ ] Hub shows red status when controller is stopped for > 60 min
- [ ] Hub SQLite prunes old reports automatically
- [ ] All UUIDs are configurable (empty/CHANGEME = silently skipped)
-
---
-
-## CONTEXT.md update (after completion)
-
-Add to "What was just completed" section:
-
-```
-### What was just completed (session N)
- **v0.6.0 — Healthcheck Implementation + Central Push + Hub Dashboard:**
-  - **Healthcheck pings fully operational:** 5 check types (heartbeat, system-health, db-dump, backup, backup-integrity) configured on demo-felhom, all pinging status.felhom.eu
-  - **Backup integrity check:** Weekly `restic check` with Healthchecks ping
-  - **Central hub reporting:** Controller pushes JSON health summary every 15 min to hub.felhom.eu
-  - **felhom-hub service:** New Go service in felhom.eu repo (`hub/` subfolder), k8s manifests in `felhom.eu/manifests/hub.yaml`, deployed on k3s in felhom-system namespace, SQLite storage, English multi-customer dashboard
-  - **Deprecated:** Legacy bash monitoring scripts (backup-healthcheck.sh, monitoring-setup.sh) superseded by controller-native monitoring
-```
-
-Also update the repository distinction in CONTEXT.md:
-
-```
-## Repository & manifest layout
-
- **homelab-manifests** — Viktor's personal k3s apps (*.dooplex.hu): mon-system, servarr, pihole, etc.
- **felhom.eu** — Everything felhom-related:
-  - `website/` — felhom.eu public website HTML
-  - `manifests/` — k8s manifests for felhom infra in felhom-system namespace (webpage, healthchecks, contact-mailer, umami, hub, felhom.secret)
-  - `hub/` — felhom-hub Go service (central multi-customer dashboard)
- **deploy-felhom-compose** — Customer-side: felhom-controller code, deploy scripts, monitoring scripts
- **app-catalog-felhom.eu** — Docker Compose templates for customer apps
-```
+1. `grep -rn 'NotFound(w, nil)' internal/` → 0 results
+2. `grep -rn 'subtle.ConstantTimeCompare' internal/` → 0 results (unless used elsewhere)
+3. `grep -rn 'time.Tick(' internal/` → 0 results
+4. `grep -rn 'Secure:.*true' internal/web/auth.go` → 0 results (now dynamic)
+5. Build: `go build ./cmd/controller/` succeeds with no errors
+6. `go vet ./...` passes
+7. Version bump in build to v0.6.1
+8. Commit, push, build, deploy, verify on demo-felhom.eu