Files
deploy-felhom-compose/TASK.md
T

13 KiB
Raw Blame History

TASK.md — Hub Dashboard Bugs + Backup Validation Fix

Overview

Three bugs identified from the live hub.felhom.eu and controller backup page:

  1. Hub main page shows DOWN despite the detail page showing STATUS: OK
  2. Hub report history timestamps show 00:00:00 instead of actual times
  3. Backup page shows "Hiba" for all DB validations with no tooltip detail

Bugs 1 and 2 share the same root cause (timestamp parsing). Bug 3 is in the controller.


Bug 1 & 2: Hub timestamp parsing failure

Repository: felhom.euhub/

Root cause

The hub's SQLite store parses received_at timestamps with a single format:

c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)

The parse error is silently discarded (_). When the format doesn't match what the modernc.org/sqlite driver returns, ReceivedAt becomes Go's zero time (0001-01-01 00:00:00).

Consequences:

  • time.Since(zeroTime) ≈ 740,000+ hours → TimeSinceReport > 1 hourOverallStatus = "down"
  • zeroTime.Format("15:04:05")"00:00:00" in report history
  • Detail page health status shows OK because that comes from the report JSON payload, not the timestamp

The modernc.org/sqlite driver may return datetime strings in various formats depending on how the value was stored and the SQLite version:

  • 2026-02-16 14:30:00 (what we expect)
  • 2026-02-16T14:30:00Z (ISO 8601 / RFC3339-ish)
  • 2026-02-16 14:30:00+00:00 (with timezone offset)
  • 2026-02-16 14:30:00.123456 (with fractional seconds)

Fix: hub/internal/store/store.go

Step 1: Add a robust timestamp parser function at the bottom of store.go:

// parseSQLiteTime tries multiple formats that modernc.org/sqlite may return.
func parseSQLiteTime(s string) time.Time {
	formats := []string{
		"2006-01-02 15:04:05",       // SQLite datetime('now')
		"2006-01-02T15:04:05Z",      // RFC3339 without fractional
		time.RFC3339,                 // 2006-01-02T15:04:05Z07:00
		time.RFC3339Nano,            // with fractional seconds
		"2006-01-02 15:04:05+00:00", // with explicit UTC offset
		"2006-01-02 15:04:05.999999999", // with fractional, no TZ
	}
	for _, f := range formats {
		if t, err := time.Parse(f, s); err == nil {
			return t
		}
	}
	// Last resort: if string is non-empty, log it for debugging
	if s != "" {
		log.Printf("[WARN] Could not parse timestamp: %q", s)
	}
	return time.Time{} // zero time
}

Note: Add "log" to the import block if not already present.

Step 2: Replace ALL occurrences of time.Parse("2006-01-02 15:04:05", receivedAt) in store.go.

There are three locations:

  1. GetCustomers() — in the for rows.Next() loop:
// BEFORE:
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)

// AFTER:
c.ReceivedAt = parseSQLiteTime(receivedAt)
  1. GetCustomer() — after row.Scan:
// BEFORE:
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)

// AFTER:
c.ReceivedAt = parseSQLiteTime(receivedAt)
  1. GetCustomerHistory() — in the for rows.Next() loop:
// BEFORE:
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)

// AFTER:
c.ReceivedAt = parseSQLiteTime(receivedAt)

Step 3 (optional diagnostic): Temporarily add a log line in SaveReport to see what format SQLite actually stores/returns. This can be removed after verifying the fix:

// Add after the INSERT in SaveReport, before return:
// Debug: check what format SQLite returns
var dbTime string
s.db.QueryRow("SELECT received_at FROM reports WHERE customer_id = ? ORDER BY id DESC LIMIT 1", customerID).Scan(&dbTime)
s.logger.Printf("[DEBUG] SQLite received_at raw value: %q", dbTime)

Verify

After rebuilding and deploying the hub:

  1. Wait for the next controller report push (or trigger manually)
  2. Check hub.felhom.eu — status should show OK (green), not DOWN
  3. Click into customer detail — "Last report: X min ago" should show a reasonable value
  4. Report History timestamps should show actual times like 14:36:32, not 00:00:00
  5. Check hub pod logs for any [WARN] Could not parse timestamp messages (should be none)

Post-fix grep

grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/store.go
# Should return 0 results — all replaced with parseSQLiteTime()

Bug 3: Backup page shows "Hiba" for all DB validations

Repository: deploy-felhom-composecontroller/

Symptoms

  • All 3 databases (immich, paperless, romm) show "Hiba" in the Érvényesítés column
  • The Állapot column shows "OK" (dump succeeded)
  • No tooltip text on hover (meaning Validation.Error is empty)
  • Dump files are valid — headers are correct, sizes are reasonable (43.2 MB / 319.6 KB / 38.7 KB)

Analysis

The template condition for "Hiba" in the LastDBDump path is:

{{if .Error}}         → shows "" (dump failed)
{{else if .Validation.Valid}} → shows "X tábla" (validation passed)
{{else}}              → shows "Hiba" (THIS IS WHAT WE SEE)

"Hiba" with empty tooltip means Validation.Valid == false AND Validation.Error == "". This is the zero-value of DumpValidation{} — meaning validation was never assigned.

The code in DumpOne() calls ValidateDump() and the code in ListDumpFiles() also calls ValidateDump(). Both paths should populate the Validation field. Yet the UI shows zero-value.

Most likely cause: The lastDBDump state was populated by an older code version (before validation was wired), OR there's a race condition where RefreshCache captures lastDBDump mid-construction, OR the validation ran but hit an unexpected issue (permissions, encoding).

Diagnostic step (run on demo-felhom FIRST)

Before applying fixes, check the controller logs to understand what happened:

# Check the last DB dump run
sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "db dump|table|valid|dump:"

# Check if there was a controller restart
sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "starting|version|shutdown"

# Check if the old bash systemd timer is ALSO running (double-dump conflict!)
systemctl is-active backup-db-dump.timer
systemctl list-timers | grep backup

IMPORTANT: If backup-db-dump.timer is still active, it will race with the controller's built-in db-dump scheduler job. Both write to the same directory. The bash script overwrites files directly (no .tmp + rename), which could corrupt the file mid-validation. Disable it:

sudo systemctl stop backup-db-dump.timer
sudo systemctl disable backup-db-dump.timer

Fix 1: Add debug logging to ValidateDump

File: controller/internal/backup/dbdump.go, function ValidateDump

Add a log parameter and diagnostic output so we can see what's happening:

// BEFORE:
func ValidateDump(filePath string, dbType DBType) DumpValidation {

// AFTER:
func ValidateDump(filePath string, dbType DBType) DumpValidation {
	log.Printf("[DEBUG] ValidateDump: %s (type=%s)", filePath, dbType)

And at the end, before return v:

	v.Valid = true
	log.Printf("[DEBUG] ValidateDump OK: %s — %d tables, header found", filePath, tableCount)
	return v
}

Also add logging to the error paths:

After v.Error = "dump file too small (< 100 bytes)":

log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)

After v.Error = fmt.Sprintf("read failed: %v", err):

log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)

After v.Error = "... dump missing comment header":

log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)

After v.Error = "no CREATE TABLE statements found":

log.Printf("[WARN] ValidateDump FAIL: %s — %s (header was found, scanned %d lines)", filePath, v.Error, len(strings.Split(content, "\n")))

Note: Import "log" at the top of the file if not already imported (use the standard log package, not the *log.Logger parameter — this is a quick debug addition. Can be cleaned up later.)

Fix 2: Template guard against zero-value Validation

Even with debug logging, we should make the template resilient to zero-value Validation. The "Hiba" label with no explanation is a bad UX.

File: controller/internal/web/templates/backups.html

In the LastDBDump section, change the Érvényesítés (validation) column:

<!-- BEFORE: -->
{{if .Error}}
    <span class="validation-badge validation-na"></span>
{{else if .Validation.Valid}}
    <span class="validation-badge validation-ok">{{.Validation.TableCount}} tábla</span>
{{else}}
    <span class="validation-badge validation-fail" title="{{.Validation.Error}}">Hiba</span>
{{end}}

<!-- AFTER: -->
{{if .Error}}
    <span class="validation-badge validation-na"></span>
{{else if .Validation.Valid}}
    <span class="validation-badge validation-ok">{{.Validation.TableCount}} tábla</span>
{{else if .Validation.Error}}
    <span class="validation-badge validation-fail" title="{{.Validation.Error}}">Hiba</span>
{{else}}
    <span class="validation-badge validation-na" title="Az érvényesítés nem futott le"></span>
{{end}}

This ensures:

  • If validation passed → green badge with table count
  • If validation failed with a reason → red "Hiba" with tooltip
  • If validation never ran (zero-value) → gray "" with explanatory tooltip

Fix 3: Re-validate on cache refresh (belt-and-suspenders)

Since RefreshCache already calls ListDumpFiles() which runs ValidateDump() per file, the DumpFiles fallback always has fresh validation. The issue is only in the LastDBDump path when in-memory results have stale/missing validation.

Add a cross-check: if LastDBDump results have zero-value Validation but the file exists, re-validate it. Add this in RefreshCache, after the existing code:

File: controller/internal/backup/backup.go, function RefreshCache

After the line status.DumpFiles = files and before the lock section, add:

	// Cross-check: if LastDBDump results have empty validation but files exist,
	// re-validate from disk. This handles controller restarts and race conditions.
	if m.lastDBDump != nil {
		fileValidation := make(map[string]DumpValidation) // keyed by filename
		for _, f := range files {
			fileValidation[f.FileName] = f.Validation
		}
		for i, r := range m.lastDBDump.Results {
			if !r.Validation.Valid && r.Validation.Error == "" && r.FilePath != "" {
				filename := filepath.Base(r.FilePath)
				if fv, ok := fileValidation[filename]; ok {
					m.lastDBDump.Results[i].Validation = fv
					m.logger.Printf("[INFO] Re-validated %s from disk: valid=%v tables=%d",
						filename, fv.Valid, fv.TableCount)
				}
			}
		}
	}

Note: Add "path/filepath" to imports if not already present.

This runs every 5 minutes (same cadence as the cache refresh) and will automatically heal any stale validation state in lastDBDump by cross-referencing the fresh ListDumpFiles results.

Fix 4: Disable conflicting systemd timer (manual step)

If the diagnostic step above reveals that backup-db-dump.timer is still active:

sudo systemctl stop backup-db-dump.timer
sudo systemctl disable backup-db-dump.timer
# Optionally verify:
systemctl list-timers | grep backup
# Should show nothing

The controller's built-in db-dump scheduler job at 02:30 replaces this timer entirely. Having both run simultaneously can corrupt dump files mid-write.

Verify

After deploying fixes:

  1. Wait for cache refresh (5 minutes) or trigger a manual backup ("Mentés most")
  2. Check /backups page — validation column should show "X tábla" for all databases
  3. Check controller logs for [DEBUG] ValidateDump lines confirming validation ran
  4. Verify no [WARN] ValidateDump FAIL lines in logs

Post-fix checklist

Hub (felhom.eu repo → hub/)

  • grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/ → 0 results
  • parseSQLiteTime function exists in store.go
  • go build ./cmd/hub/ succeeds
  • go vet ./... passes
  • Build new image, deploy to k3s
  • hub.felhom.eu shows OK status for demo-felhom
  • Report history shows real timestamps

Controller (deploy-felhom-compose repo → controller/)

  • Template has 4-branch validation check (Valid / Error / zero-value guard)
  • RefreshCache has cross-check re-validation logic
  • ValidateDump has debug logging
  • backup-db-dump.timer is disabled on demo-felhom
  • go build ./cmd/controller/ succeeds
  • go vet ./... passes
  • Build, deploy to demo-felhom
  • Backup page shows table counts, not "Hiba"
  • Controller logs show [DEBUG] ValidateDump OK entries

Version bumps

  • Hub: bump to next patch version
  • Controller: include in v0.6.1 release (alongside the code review fixes from the other TASK.md)