# TASK.md — Hub Dashboard Bugs + Backup Validation Fix ## Overview Three bugs identified from the live hub.felhom.eu and controller backup page: 1. **Hub main page shows DOWN** despite the detail page showing STATUS: OK 2. **Hub report history timestamps show 00:00:00** instead of actual times 3. **Backup page shows "Hiba" for all DB validations** with no tooltip detail Bugs 1 and 2 share the same root cause (timestamp parsing). Bug 3 is in the controller. --- ## Bug 1 & 2: Hub timestamp parsing failure **Repository:** `felhom.eu` → `hub/` ### Root cause The hub's SQLite store parses `received_at` timestamps with a single format: ```go c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt) ``` The parse error is silently discarded (`_`). When the format doesn't match what the `modernc.org/sqlite` driver returns, `ReceivedAt` becomes Go's zero time (`0001-01-01 00:00:00`). **Consequences:** - `time.Since(zeroTime)` ≈ 740,000+ hours → `TimeSinceReport > 1 hour` → **OverallStatus = "down"** - `zeroTime.Format("15:04:05")` → **"00:00:00"** in report history - Detail page health status shows OK because that comes from the report JSON payload, not the timestamp The `modernc.org/sqlite` driver may return datetime strings in various formats depending on how the value was stored and the SQLite version: - `2026-02-16 14:30:00` (what we expect) - `2026-02-16T14:30:00Z` (ISO 8601 / RFC3339-ish) - `2026-02-16 14:30:00+00:00` (with timezone offset) - `2026-02-16 14:30:00.123456` (with fractional seconds) ### Fix: `hub/internal/store/store.go` **Step 1:** Add a robust timestamp parser function at the bottom of store.go: ```go // parseSQLiteTime tries multiple formats that modernc.org/sqlite may return. func parseSQLiteTime(s string) time.Time { formats := []string{ "2006-01-02 15:04:05", // SQLite datetime('now') "2006-01-02T15:04:05Z", // RFC3339 without fractional time.RFC3339, // 2006-01-02T15:04:05Z07:00 time.RFC3339Nano, // with fractional seconds "2006-01-02 15:04:05+00:00", // with explicit UTC offset "2006-01-02 15:04:05.999999999", // with fractional, no TZ } for _, f := range formats { if t, err := time.Parse(f, s); err == nil { return t } } // Last resort: if string is non-empty, log it for debugging if s != "" { log.Printf("[WARN] Could not parse timestamp: %q", s) } return time.Time{} // zero time } ``` Note: Add `"log"` to the import block if not already present. **Step 2:** Replace ALL occurrences of `time.Parse("2006-01-02 15:04:05", receivedAt)` in store.go. There are **three** locations: 1. **`GetCustomers()`** — in the `for rows.Next()` loop: ```go // BEFORE: c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt) // AFTER: c.ReceivedAt = parseSQLiteTime(receivedAt) ``` 2. **`GetCustomer()`** — after `row.Scan`: ```go // BEFORE: c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt) // AFTER: c.ReceivedAt = parseSQLiteTime(receivedAt) ``` 3. **`GetCustomerHistory()`** — in the `for rows.Next()` loop: ```go // BEFORE: c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt) // AFTER: c.ReceivedAt = parseSQLiteTime(receivedAt) ``` **Step 3 (optional diagnostic):** Temporarily add a log line in `SaveReport` to see what format SQLite actually stores/returns. This can be removed after verifying the fix: ```go // Add after the INSERT in SaveReport, before return: // Debug: check what format SQLite returns var dbTime string s.db.QueryRow("SELECT received_at FROM reports WHERE customer_id = ? ORDER BY id DESC LIMIT 1", customerID).Scan(&dbTime) s.logger.Printf("[DEBUG] SQLite received_at raw value: %q", dbTime) ``` ### Verify After rebuilding and deploying the hub: 1. Wait for the next controller report push (or trigger manually) 2. Check hub.felhom.eu — status should show **OK** (green), not DOWN 3. Click into customer detail — "Last report: X min ago" should show a reasonable value 4. Report History timestamps should show actual times like `14:36:32`, not `00:00:00` 5. Check hub pod logs for any `[WARN] Could not parse timestamp` messages (should be none) ### Post-fix grep ```bash grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/store.go # Should return 0 results — all replaced with parseSQLiteTime() ``` --- ## Bug 3: Backup page shows "Hiba" for all DB validations **Repository:** `deploy-felhom-compose` → `controller/` ### Symptoms - All 3 databases (immich, paperless, romm) show "Hiba" in the Érvényesítés column - The Állapot column shows "OK" (dump succeeded) - No tooltip text on hover (meaning `Validation.Error` is empty) - Dump files are valid — headers are correct, sizes are reasonable (43.2 MB / 319.6 KB / 38.7 KB) ### Analysis The template condition for "Hiba" in the `LastDBDump` path is: ```html {{if .Error}} → shows "–" (dump failed) {{else if .Validation.Valid}} → shows "X tábla" (validation passed) {{else}} → shows "Hiba" (THIS IS WHAT WE SEE) ``` "Hiba" with empty tooltip means `Validation.Valid == false` AND `Validation.Error == ""`. This is the **zero-value** of `DumpValidation{}` — meaning validation was never assigned. The code in `DumpOne()` calls `ValidateDump()` and the code in `ListDumpFiles()` also calls `ValidateDump()`. Both paths should populate the Validation field. Yet the UI shows zero-value. **Most likely cause:** The `lastDBDump` state was populated by an older code version (before validation was wired), OR there's a race condition where `RefreshCache` captures `lastDBDump` mid-construction, OR the validation ran but hit an unexpected issue (permissions, encoding). ### Diagnostic step (run on demo-felhom FIRST) Before applying fixes, check the controller logs to understand what happened: ```bash # Check the last DB dump run sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "db dump|table|valid|dump:" # Check if there was a controller restart sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "starting|version|shutdown" # Check if the old bash systemd timer is ALSO running (double-dump conflict!) systemctl is-active backup-db-dump.timer systemctl list-timers | grep backup ``` **IMPORTANT:** If `backup-db-dump.timer` is still active, it will race with the controller's built-in `db-dump` scheduler job. Both write to the same directory. The bash script overwrites files directly (no `.tmp` + rename), which could corrupt the file mid-validation. **Disable it:** ```bash sudo systemctl stop backup-db-dump.timer sudo systemctl disable backup-db-dump.timer ``` ### Fix 1: Add debug logging to `ValidateDump` **File:** `controller/internal/backup/dbdump.go`, function `ValidateDump` Add a log parameter and diagnostic output so we can see what's happening: ```go // BEFORE: func ValidateDump(filePath string, dbType DBType) DumpValidation { // AFTER: func ValidateDump(filePath string, dbType DBType) DumpValidation { log.Printf("[DEBUG] ValidateDump: %s (type=%s)", filePath, dbType) ``` And at the end, before `return v`: ```go v.Valid = true log.Printf("[DEBUG] ValidateDump OK: %s — %d tables, header found", filePath, tableCount) return v } ``` Also add logging to the error paths: After `v.Error = "dump file too small (< 100 bytes)"`: ```go log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error) ``` After `v.Error = fmt.Sprintf("read failed: %v", err)`: ```go log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error) ``` After `v.Error = "... dump missing comment header"`: ```go log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error) ``` After `v.Error = "no CREATE TABLE statements found"`: ```go log.Printf("[WARN] ValidateDump FAIL: %s — %s (header was found, scanned %d lines)", filePath, v.Error, len(strings.Split(content, "\n"))) ``` Note: Import `"log"` at the top of the file if not already imported (use the standard `log` package, not the `*log.Logger` parameter — this is a quick debug addition. Can be cleaned up later.) ### Fix 2: Template guard against zero-value Validation Even with debug logging, we should make the template resilient to zero-value Validation. The "Hiba" label with no explanation is a bad UX. **File:** `controller/internal/web/templates/backups.html` In the `LastDBDump` section, change the Érvényesítés (validation) column: ```html {{if .Error}} {{else if .Validation.Valid}} {{.Validation.TableCount}} tábla {{else}} Hiba {{end}} {{if .Error}} {{else if .Validation.Valid}} {{.Validation.TableCount}} tábla {{else if .Validation.Error}} Hiba {{else}} {{end}} ``` This ensures: - If validation passed → green badge with table count - If validation failed with a reason → red "Hiba" with tooltip - If validation never ran (zero-value) → gray "–" with explanatory tooltip ### Fix 3: Re-validate on cache refresh (belt-and-suspenders) Since `RefreshCache` already calls `ListDumpFiles()` which runs `ValidateDump()` per file, the `DumpFiles` fallback always has fresh validation. The issue is only in the `LastDBDump` path when in-memory results have stale/missing validation. Add a cross-check: if `LastDBDump` results have zero-value Validation but the file exists, re-validate it. Add this in `RefreshCache`, after the existing code: **File:** `controller/internal/backup/backup.go`, function `RefreshCache` After the line `status.DumpFiles = files` and before the lock section, add: ```go // Cross-check: if LastDBDump results have empty validation but files exist, // re-validate from disk. This handles controller restarts and race conditions. if m.lastDBDump != nil { fileValidation := make(map[string]DumpValidation) // keyed by filename for _, f := range files { fileValidation[f.FileName] = f.Validation } for i, r := range m.lastDBDump.Results { if !r.Validation.Valid && r.Validation.Error == "" && r.FilePath != "" { filename := filepath.Base(r.FilePath) if fv, ok := fileValidation[filename]; ok { m.lastDBDump.Results[i].Validation = fv m.logger.Printf("[INFO] Re-validated %s from disk: valid=%v tables=%d", filename, fv.Valid, fv.TableCount) } } } } ``` Note: Add `"path/filepath"` to imports if not already present. This runs every 5 minutes (same cadence as the cache refresh) and will automatically heal any stale validation state in `lastDBDump` by cross-referencing the fresh `ListDumpFiles` results. ### Fix 4: Disable conflicting systemd timer (manual step) If the diagnostic step above reveals that `backup-db-dump.timer` is still active: ```bash sudo systemctl stop backup-db-dump.timer sudo systemctl disable backup-db-dump.timer # Optionally verify: systemctl list-timers | grep backup # Should show nothing ``` The controller's built-in `db-dump` scheduler job at 02:30 replaces this timer entirely. Having both run simultaneously can corrupt dump files mid-write. ### Verify After deploying fixes: 1. Wait for cache refresh (5 minutes) or trigger a manual backup ("Mentés most") 2. Check `/backups` page — validation column should show "X tábla" for all databases 3. Check controller logs for `[DEBUG] ValidateDump` lines confirming validation ran 4. Verify no `[WARN] ValidateDump FAIL` lines in logs --- ## Post-fix checklist ### Hub (felhom.eu repo → hub/) - [ ] `grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/` → 0 results - [ ] `parseSQLiteTime` function exists in store.go - [ ] `go build ./cmd/hub/` succeeds - [ ] `go vet ./...` passes - [ ] Build new image, deploy to k3s - [ ] hub.felhom.eu shows OK status for demo-felhom - [ ] Report history shows real timestamps ### Controller (deploy-felhom-compose repo → controller/) - [ ] Template has 4-branch validation check (Valid / Error / zero-value guard) - [ ] `RefreshCache` has cross-check re-validation logic - [ ] `ValidateDump` has debug logging - [ ] `backup-db-dump.timer` is disabled on demo-felhom - [ ] `go build ./cmd/controller/` succeeds - [ ] `go vet ./...` passes - [ ] Build, deploy to demo-felhom - [ ] Backup page shows table counts, not "Hiba" - [ ] Controller logs show `[DEBUG] ValidateDump OK` entries ### Version bumps - Hub: bump to next patch version - Controller: include in v0.6.1 release (alongside the code review fixes from the other TASK.md)