361 lines
13 KiB
Markdown
361 lines
13 KiB
Markdown
# TASK.md — Hub Dashboard Bugs + Backup Validation Fix
|
||
|
||
## Overview
|
||
|
||
Three bugs identified from the live hub.felhom.eu and controller backup page:
|
||
|
||
1. **Hub main page shows DOWN** despite the detail page showing STATUS: OK
|
||
2. **Hub report history timestamps show 00:00:00** instead of actual times
|
||
3. **Backup page shows "Hiba" for all DB validations** with no tooltip detail
|
||
|
||
Bugs 1 and 2 share the same root cause (timestamp parsing). Bug 3 is in the controller.
|
||
|
||
---
|
||
|
||
## Bug 1 & 2: Hub timestamp parsing failure
|
||
|
||
**Repository:** `felhom.eu` → `hub/`
|
||
|
||
### Root cause
|
||
|
||
The hub's SQLite store parses `received_at` timestamps with a single format:
|
||
|
||
```go
|
||
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
|
||
```
|
||
|
||
The parse error is silently discarded (`_`). When the format doesn't match what the
|
||
`modernc.org/sqlite` driver returns, `ReceivedAt` becomes Go's zero time (`0001-01-01 00:00:00`).
|
||
|
||
**Consequences:**
|
||
- `time.Since(zeroTime)` ≈ 740,000+ hours → `TimeSinceReport > 1 hour` → **OverallStatus = "down"**
|
||
- `zeroTime.Format("15:04:05")` → **"00:00:00"** in report history
|
||
- Detail page health status shows OK because that comes from the report JSON payload, not the timestamp
|
||
|
||
The `modernc.org/sqlite` driver may return datetime strings in various formats depending on
|
||
how the value was stored and the SQLite version:
|
||
- `2026-02-16 14:30:00` (what we expect)
|
||
- `2026-02-16T14:30:00Z` (ISO 8601 / RFC3339-ish)
|
||
- `2026-02-16 14:30:00+00:00` (with timezone offset)
|
||
- `2026-02-16 14:30:00.123456` (with fractional seconds)
|
||
|
||
### Fix: `hub/internal/store/store.go`
|
||
|
||
**Step 1:** Add a robust timestamp parser function at the bottom of store.go:
|
||
|
||
```go
|
||
// parseSQLiteTime tries multiple formats that modernc.org/sqlite may return.
|
||
func parseSQLiteTime(s string) time.Time {
|
||
formats := []string{
|
||
"2006-01-02 15:04:05", // SQLite datetime('now')
|
||
"2006-01-02T15:04:05Z", // RFC3339 without fractional
|
||
time.RFC3339, // 2006-01-02T15:04:05Z07:00
|
||
time.RFC3339Nano, // with fractional seconds
|
||
"2006-01-02 15:04:05+00:00", // with explicit UTC offset
|
||
"2006-01-02 15:04:05.999999999", // with fractional, no TZ
|
||
}
|
||
for _, f := range formats {
|
||
if t, err := time.Parse(f, s); err == nil {
|
||
return t
|
||
}
|
||
}
|
||
// Last resort: if string is non-empty, log it for debugging
|
||
if s != "" {
|
||
log.Printf("[WARN] Could not parse timestamp: %q", s)
|
||
}
|
||
return time.Time{} // zero time
|
||
}
|
||
```
|
||
|
||
Note: Add `"log"` to the import block if not already present.
|
||
|
||
**Step 2:** Replace ALL occurrences of `time.Parse("2006-01-02 15:04:05", receivedAt)` in store.go.
|
||
|
||
There are **three** locations:
|
||
|
||
1. **`GetCustomers()`** — in the `for rows.Next()` loop:
|
||
```go
|
||
// BEFORE:
|
||
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
|
||
|
||
// AFTER:
|
||
c.ReceivedAt = parseSQLiteTime(receivedAt)
|
||
```
|
||
|
||
2. **`GetCustomer()`** — after `row.Scan`:
|
||
```go
|
||
// BEFORE:
|
||
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
|
||
|
||
// AFTER:
|
||
c.ReceivedAt = parseSQLiteTime(receivedAt)
|
||
```
|
||
|
||
3. **`GetCustomerHistory()`** — in the `for rows.Next()` loop:
|
||
```go
|
||
// BEFORE:
|
||
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
|
||
|
||
// AFTER:
|
||
c.ReceivedAt = parseSQLiteTime(receivedAt)
|
||
```
|
||
|
||
**Step 3 (optional diagnostic):** Temporarily add a log line in `SaveReport` to see what format
|
||
SQLite actually stores/returns. This can be removed after verifying the fix:
|
||
|
||
```go
|
||
// Add after the INSERT in SaveReport, before return:
|
||
// Debug: check what format SQLite returns
|
||
var dbTime string
|
||
s.db.QueryRow("SELECT received_at FROM reports WHERE customer_id = ? ORDER BY id DESC LIMIT 1", customerID).Scan(&dbTime)
|
||
s.logger.Printf("[DEBUG] SQLite received_at raw value: %q", dbTime)
|
||
```
|
||
|
||
### Verify
|
||
|
||
After rebuilding and deploying the hub:
|
||
1. Wait for the next controller report push (or trigger manually)
|
||
2. Check hub.felhom.eu — status should show **OK** (green), not DOWN
|
||
3. Click into customer detail — "Last report: X min ago" should show a reasonable value
|
||
4. Report History timestamps should show actual times like `14:36:32`, not `00:00:00`
|
||
5. Check hub pod logs for any `[WARN] Could not parse timestamp` messages (should be none)
|
||
|
||
### Post-fix grep
|
||
|
||
```bash
|
||
grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/store.go
|
||
# Should return 0 results — all replaced with parseSQLiteTime()
|
||
```
|
||
|
||
---
|
||
|
||
## Bug 3: Backup page shows "Hiba" for all DB validations
|
||
|
||
**Repository:** `deploy-felhom-compose` → `controller/`
|
||
|
||
### Symptoms
|
||
|
||
- All 3 databases (immich, paperless, romm) show "Hiba" in the Érvényesítés column
|
||
- The Állapot column shows "OK" (dump succeeded)
|
||
- No tooltip text on hover (meaning `Validation.Error` is empty)
|
||
- Dump files are valid — headers are correct, sizes are reasonable (43.2 MB / 319.6 KB / 38.7 KB)
|
||
|
||
### Analysis
|
||
|
||
The template condition for "Hiba" in the `LastDBDump` path is:
|
||
```html
|
||
{{if .Error}} → shows "–" (dump failed)
|
||
{{else if .Validation.Valid}} → shows "X tábla" (validation passed)
|
||
{{else}} → shows "Hiba" (THIS IS WHAT WE SEE)
|
||
```
|
||
|
||
"Hiba" with empty tooltip means `Validation.Valid == false` AND `Validation.Error == ""`.
|
||
This is the **zero-value** of `DumpValidation{}` — meaning validation was never assigned.
|
||
|
||
The code in `DumpOne()` calls `ValidateDump()` and the code in `ListDumpFiles()` also calls
|
||
`ValidateDump()`. Both paths should populate the Validation field. Yet the UI shows zero-value.
|
||
|
||
**Most likely cause:** The `lastDBDump` state was populated by an older code version (before
|
||
validation was wired), OR there's a race condition where `RefreshCache` captures `lastDBDump`
|
||
mid-construction, OR the validation ran but hit an unexpected issue (permissions, encoding).
|
||
|
||
### Diagnostic step (run on demo-felhom FIRST)
|
||
|
||
Before applying fixes, check the controller logs to understand what happened:
|
||
|
||
```bash
|
||
# Check the last DB dump run
|
||
sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "db dump|table|valid|dump:"
|
||
|
||
# Check if there was a controller restart
|
||
sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "starting|version|shutdown"
|
||
|
||
# Check if the old bash systemd timer is ALSO running (double-dump conflict!)
|
||
systemctl is-active backup-db-dump.timer
|
||
systemctl list-timers | grep backup
|
||
```
|
||
|
||
**IMPORTANT:** If `backup-db-dump.timer` is still active, it will race with the controller's
|
||
built-in `db-dump` scheduler job. Both write to the same directory. The bash script overwrites
|
||
files directly (no `.tmp` + rename), which could corrupt the file mid-validation. **Disable it:**
|
||
|
||
```bash
|
||
sudo systemctl stop backup-db-dump.timer
|
||
sudo systemctl disable backup-db-dump.timer
|
||
```
|
||
|
||
### Fix 1: Add debug logging to `ValidateDump`
|
||
|
||
**File:** `controller/internal/backup/dbdump.go`, function `ValidateDump`
|
||
|
||
Add a log parameter and diagnostic output so we can see what's happening:
|
||
|
||
```go
|
||
// BEFORE:
|
||
func ValidateDump(filePath string, dbType DBType) DumpValidation {
|
||
|
||
// AFTER:
|
||
func ValidateDump(filePath string, dbType DBType) DumpValidation {
|
||
log.Printf("[DEBUG] ValidateDump: %s (type=%s)", filePath, dbType)
|
||
```
|
||
|
||
And at the end, before `return v`:
|
||
|
||
```go
|
||
v.Valid = true
|
||
log.Printf("[DEBUG] ValidateDump OK: %s — %d tables, header found", filePath, tableCount)
|
||
return v
|
||
}
|
||
```
|
||
|
||
Also add logging to the error paths:
|
||
|
||
After `v.Error = "dump file too small (< 100 bytes)"`:
|
||
```go
|
||
log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)
|
||
```
|
||
|
||
After `v.Error = fmt.Sprintf("read failed: %v", err)`:
|
||
```go
|
||
log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)
|
||
```
|
||
|
||
After `v.Error = "... dump missing comment header"`:
|
||
```go
|
||
log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)
|
||
```
|
||
|
||
After `v.Error = "no CREATE TABLE statements found"`:
|
||
```go
|
||
log.Printf("[WARN] ValidateDump FAIL: %s — %s (header was found, scanned %d lines)", filePath, v.Error, len(strings.Split(content, "\n")))
|
||
```
|
||
|
||
Note: Import `"log"` at the top of the file if not already imported (use the standard `log`
|
||
package, not the `*log.Logger` parameter — this is a quick debug addition. Can be cleaned up later.)
|
||
|
||
### Fix 2: Template guard against zero-value Validation
|
||
|
||
Even with debug logging, we should make the template resilient to zero-value Validation.
|
||
The "Hiba" label with no explanation is a bad UX.
|
||
|
||
**File:** `controller/internal/web/templates/backups.html`
|
||
|
||
In the `LastDBDump` section, change the Érvényesítés (validation) column:
|
||
|
||
```html
|
||
<!-- BEFORE: -->
|
||
{{if .Error}}
|
||
<span class="validation-badge validation-na">–</span>
|
||
{{else if .Validation.Valid}}
|
||
<span class="validation-badge validation-ok">{{.Validation.TableCount}} tábla</span>
|
||
{{else}}
|
||
<span class="validation-badge validation-fail" title="{{.Validation.Error}}">Hiba</span>
|
||
{{end}}
|
||
|
||
<!-- AFTER: -->
|
||
{{if .Error}}
|
||
<span class="validation-badge validation-na">–</span>
|
||
{{else if .Validation.Valid}}
|
||
<span class="validation-badge validation-ok">{{.Validation.TableCount}} tábla</span>
|
||
{{else if .Validation.Error}}
|
||
<span class="validation-badge validation-fail" title="{{.Validation.Error}}">Hiba</span>
|
||
{{else}}
|
||
<span class="validation-badge validation-na" title="Az érvényesítés nem futott le">–</span>
|
||
{{end}}
|
||
```
|
||
|
||
This ensures:
|
||
- If validation passed → green badge with table count
|
||
- If validation failed with a reason → red "Hiba" with tooltip
|
||
- If validation never ran (zero-value) → gray "–" with explanatory tooltip
|
||
|
||
### Fix 3: Re-validate on cache refresh (belt-and-suspenders)
|
||
|
||
Since `RefreshCache` already calls `ListDumpFiles()` which runs `ValidateDump()` per file,
|
||
the `DumpFiles` fallback always has fresh validation. The issue is only in the `LastDBDump`
|
||
path when in-memory results have stale/missing validation.
|
||
|
||
Add a cross-check: if `LastDBDump` results have zero-value Validation but the file exists,
|
||
re-validate it. Add this in `RefreshCache`, after the existing code:
|
||
|
||
**File:** `controller/internal/backup/backup.go`, function `RefreshCache`
|
||
|
||
After the line `status.DumpFiles = files` and before the lock section, add:
|
||
|
||
```go
|
||
// Cross-check: if LastDBDump results have empty validation but files exist,
|
||
// re-validate from disk. This handles controller restarts and race conditions.
|
||
if m.lastDBDump != nil {
|
||
fileValidation := make(map[string]DumpValidation) // keyed by filename
|
||
for _, f := range files {
|
||
fileValidation[f.FileName] = f.Validation
|
||
}
|
||
for i, r := range m.lastDBDump.Results {
|
||
if !r.Validation.Valid && r.Validation.Error == "" && r.FilePath != "" {
|
||
filename := filepath.Base(r.FilePath)
|
||
if fv, ok := fileValidation[filename]; ok {
|
||
m.lastDBDump.Results[i].Validation = fv
|
||
m.logger.Printf("[INFO] Re-validated %s from disk: valid=%v tables=%d",
|
||
filename, fv.Valid, fv.TableCount)
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Note: Add `"path/filepath"` to imports if not already present.
|
||
|
||
This runs every 5 minutes (same cadence as the cache refresh) and will automatically
|
||
heal any stale validation state in `lastDBDump` by cross-referencing the fresh
|
||
`ListDumpFiles` results.
|
||
|
||
### Fix 4: Disable conflicting systemd timer (manual step)
|
||
|
||
If the diagnostic step above reveals that `backup-db-dump.timer` is still active:
|
||
|
||
```bash
|
||
sudo systemctl stop backup-db-dump.timer
|
||
sudo systemctl disable backup-db-dump.timer
|
||
# Optionally verify:
|
||
systemctl list-timers | grep backup
|
||
# Should show nothing
|
||
```
|
||
|
||
The controller's built-in `db-dump` scheduler job at 02:30 replaces this timer entirely.
|
||
Having both run simultaneously can corrupt dump files mid-write.
|
||
|
||
### Verify
|
||
|
||
After deploying fixes:
|
||
1. Wait for cache refresh (5 minutes) or trigger a manual backup ("Mentés most")
|
||
2. Check `/backups` page — validation column should show "X tábla" for all databases
|
||
3. Check controller logs for `[DEBUG] ValidateDump` lines confirming validation ran
|
||
4. Verify no `[WARN] ValidateDump FAIL` lines in logs
|
||
|
||
---
|
||
|
||
## Post-fix checklist
|
||
|
||
### Hub (felhom.eu repo → hub/)
|
||
- [ ] `grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/` → 0 results
|
||
- [ ] `parseSQLiteTime` function exists in store.go
|
||
- [ ] `go build ./cmd/hub/` succeeds
|
||
- [ ] `go vet ./...` passes
|
||
- [ ] Build new image, deploy to k3s
|
||
- [ ] hub.felhom.eu shows OK status for demo-felhom
|
||
- [ ] Report history shows real timestamps
|
||
|
||
### Controller (deploy-felhom-compose repo → controller/)
|
||
- [ ] Template has 4-branch validation check (Valid / Error / zero-value guard)
|
||
- [ ] `RefreshCache` has cross-check re-validation logic
|
||
- [ ] `ValidateDump` has debug logging
|
||
- [ ] `backup-db-dump.timer` is disabled on demo-felhom
|
||
- [ ] `go build ./cmd/controller/` succeeds
|
||
- [ ] `go vet ./...` passes
|
||
- [ ] Build, deploy to demo-felhom
|
||
- [ ] Backup page shows table counts, not "Hiba"
|
||
- [ ] Controller logs show `[DEBUG] ValidateDump OK` entries
|
||
|
||
### Version bumps
|
||
- Hub: bump to next patch version
|
||
- Controller: include in v0.6.1 release (alongside the code review fixes from the other TASK.md) |