Hub Dashboard Bugs + Backup Validation Fix

This commit is contained in:
2026-02-16 14:53:35 +01:00
parent 62a185afb8
commit f511433754
+300 -280
View File
@@ -1,341 +1,361 @@
# TASK.md — Code Review Bugfixes (v0.6.1)
# TASK.md — Hub Dashboard Bugs + Backup Validation Fix
## Overview
Fix bugs and logic issues identified during the v0.6.0 code review. All changes are in the `controller/` subtree.
No new features — only correctness, safety, and quality fixes.
Three bugs identified from the live hub.felhom.eu and controller backup page:
After all fixes: bump version to **v0.6.1**, commit, build, push, deploy, verify.
1. **Hub main page shows DOWN** despite the detail page showing STATUS: OK
2. **Hub report history timestamps show 00:00:00** instead of actual times
3. **Backup page shows "Hiba" for all DB validations** with no tooltip detail
Bugs 1 and 2 share the same root cause (timestamp parsing). Bug 3 is in the controller.
---
## Fix 1: `http.NotFound(w, nil)` — pass request, not nil
## Bug 1 & 2: Hub timestamp parsing failure
**Files:** `internal/web/handlers.go`
**Repository:** `felhom.eu``hub/`
**Problem:** Two handlers discard the `*http.Request` parameter as `_`, then call `http.NotFound(w, nil)`. While Go's current stdlib doesn't dereference the request in `NotFound`, this is incorrect and will break if middleware wraps it.
### Root cause
**Changes:**
The hub's SQLite store parses `received_at` timestamps with a single format:
In `deployHandler`, change signature and call:
```go
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
```
The parse error is silently discarded (`_`). When the format doesn't match what the
`modernc.org/sqlite` driver returns, `ReceivedAt` becomes Go's zero time (`0001-01-01 00:00:00`).
**Consequences:**
- `time.Since(zeroTime)` ≈ 740,000+ hours → `TimeSinceReport > 1 hour`**OverallStatus = "down"**
- `zeroTime.Format("15:04:05")`**"00:00:00"** in report history
- Detail page health status shows OK because that comes from the report JSON payload, not the timestamp
The `modernc.org/sqlite` driver may return datetime strings in various formats depending on
how the value was stored and the SQLite version:
- `2026-02-16 14:30:00` (what we expect)
- `2026-02-16T14:30:00Z` (ISO 8601 / RFC3339-ish)
- `2026-02-16 14:30:00+00:00` (with timezone offset)
- `2026-02-16 14:30:00.123456` (with fractional seconds)
### Fix: `hub/internal/store/store.go`
**Step 1:** Add a robust timestamp parser function at the bottom of store.go:
```go
// parseSQLiteTime tries multiple formats that modernc.org/sqlite may return.
func parseSQLiteTime(s string) time.Time {
formats := []string{
"2006-01-02 15:04:05", // SQLite datetime('now')
"2006-01-02T15:04:05Z", // RFC3339 without fractional
time.RFC3339, // 2006-01-02T15:04:05Z07:00
time.RFC3339Nano, // with fractional seconds
"2006-01-02 15:04:05+00:00", // with explicit UTC offset
"2006-01-02 15:04:05.999999999", // with fractional, no TZ
}
for _, f := range formats {
if t, err := time.Parse(f, s); err == nil {
return t
}
}
// Last resort: if string is non-empty, log it for debugging
if s != "" {
log.Printf("[WARN] Could not parse timestamp: %q", s)
}
return time.Time{} // zero time
}
```
Note: Add `"log"` to the import block if not already present.
**Step 2:** Replace ALL occurrences of `time.Parse("2006-01-02 15:04:05", receivedAt)` in store.go.
There are **three** locations:
1. **`GetCustomers()`** — in the `for rows.Next()` loop:
```go
// BEFORE:
func (s *Server) deployHandler(w http.ResponseWriter, _ *http.Request, name string) {
...
if err != nil {
http.NotFound(w, nil)
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
// AFTER:
func (s *Server) deployHandler(w http.ResponseWriter, r *http.Request, name string) {
...
if err != nil {
http.NotFound(w, r)
c.ReceivedAt = parseSQLiteTime(receivedAt)
```
In `appDetailHandler`, same fix:
2. **`GetCustomer()`** — after `row.Scan`:
```go
// BEFORE:
func (s *Server) appDetailHandler(w http.ResponseWriter, _ *http.Request, slug string) {
...
if found == nil {
http.NotFound(w, nil)
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
// AFTER:
func (s *Server) appDetailHandler(w http.ResponseWriter, r *http.Request, slug string) {
...
if found == nil {
http.NotFound(w, r)
c.ReceivedAt = parseSQLiteTime(receivedAt)
```
**Verify:** Grep for `NotFound(w, nil)`should return 0 results after fix.
3. **`GetCustomerHistory()`**in the `for rows.Next()` loop:
```go
// BEFORE:
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
---
// AFTER:
c.ReceivedAt = parseSQLiteTime(receivedAt)
```
## Fix 2: Dashboard running/stopped counts don't match displayed stacks
**File:** `internal/web/handlers.go`, `dashboardHandler`
**Problem:** The `running`/`stopped` stat counters iterate over ALL stacks (including non-deployed ones), but the dashboard only displays deployed + protected stacks. The numbers don't match what the user sees.
**Fix:** Compute the counts from the same filtered set (`deployedStacks`), not from `stackList`. Move the filter loop first, then count from the filtered result.
**Step 3 (optional diagnostic):** Temporarily add a log line in `SaveReport` to see what format
SQLite actually stores/returns. This can be removed after verifying the fix:
```go
func (s *Server) dashboardHandler(w http.ResponseWriter, _ *http.Request) {
stackList := s.stackMgr.GetStacks()
// Filter to deployed + protected stacks first
var deployedStacks []stacks.Stack
for _, st := range stackList {
if st.Deployed || st.Protected {
deployedStacks = append(deployedStacks, st)
}
}
// Count from the DISPLAYED set only
running, stopped := 0, 0
for _, st := range deployedStacks {
switch st.State {
case stacks.StateRunning, stacks.StateStarting, stacks.StateUnhealthy, stacks.StateRestarting:
running++
case stacks.StateStopped, stacks.StateExited:
stopped++
}
}
// ... rest unchanged, but use deployedStacks for display ...
data["Stacks"] = deployedStacks
data["RunningCount"] = running
data["StoppedCount"] = stopped
data["TotalCount"] = len(stackList) // keep this as total catalog size
// Add after the INSERT in SaveReport, before return:
// Debug: check what format SQLite returns
var dbTime string
s.db.QueryRow("SELECT received_at FROM reports WHERE customer_id = ? ORDER BY id DESC LIMIT 1", customerID).Scan(&dbTime)
s.logger.Printf("[DEBUG] SQLite received_at raw value: %q", dbTime)
```
**Verify:** Deploy, open dashboard. Count of green + red badges on cards should match the stat numbers.
### Verify
After rebuilding and deploying the hub:
1. Wait for the next controller report push (or trigger manually)
2. Check hub.felhom.eu — status should show **OK** (green), not DOWN
3. Click into customer detail — "Last report: X min ago" should show a reasonable value
4. Report History timestamps should show actual times like `14:36:32`, not `00:00:00`
5. Check hub pod logs for any `[WARN] Could not parse timestamp` messages (should be none)
### Post-fix grep
```bash
grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/store.go
# Should return 0 results — all replaced with parseSQLiteTime()
```
---
## Fix 3: `Secure: true` cookie blocks HTTP login
## Bug 3: Backup page shows "Hiba" for all DB validations
**File:** `internal/web/auth.go`, `handleLogin`
**Repository:** `deploy-felhom-compose``controller/`
**Problem:** The session cookie has `Secure: true` hardcoded. When accessing via plain HTTP (e.g., `http://192.168.0.162:8080` during local setup), the browser silently refuses to send the cookie back, making login impossible with no visible error.
### Symptoms
**Fix:** Set `Secure` dynamically based on the incoming request:
- All 3 databases (immich, paperless, romm) show "Hiba" in the Érvényesítés column
- The Állapot column shows "OK" (dump succeeded)
- No tooltip text on hover (meaning `Validation.Error` is empty)
- Dump files are valid — headers are correct, sizes are reasonable (43.2 MB / 319.6 KB / 38.7 KB)
### Analysis
The template condition for "Hiba" in the `LastDBDump` path is:
```html
{{if .Error}} → shows "" (dump failed)
{{else if .Validation.Valid}} → shows "X tábla" (validation passed)
{{else}} → shows "Hiba" (THIS IS WHAT WE SEE)
```
"Hiba" with empty tooltip means `Validation.Valid == false` AND `Validation.Error == ""`.
This is the **zero-value** of `DumpValidation{}` — meaning validation was never assigned.
The code in `DumpOne()` calls `ValidateDump()` and the code in `ListDumpFiles()` also calls
`ValidateDump()`. Both paths should populate the Validation field. Yet the UI shows zero-value.
**Most likely cause:** The `lastDBDump` state was populated by an older code version (before
validation was wired), OR there's a race condition where `RefreshCache` captures `lastDBDump`
mid-construction, OR the validation ran but hit an unexpected issue (permissions, encoding).
### Diagnostic step (run on demo-felhom FIRST)
Before applying fixes, check the controller logs to understand what happened:
```bash
# Check the last DB dump run
sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "db dump|table|valid|dump:"
# Check if there was a controller restart
sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "starting|version|shutdown"
# Check if the old bash systemd timer is ALSO running (double-dump conflict!)
systemctl is-active backup-db-dump.timer
systemctl list-timers | grep backup
```
**IMPORTANT:** If `backup-db-dump.timer` is still active, it will race with the controller's
built-in `db-dump` scheduler job. Both write to the same directory. The bash script overwrites
files directly (no `.tmp` + rename), which could corrupt the file mid-validation. **Disable it:**
```bash
sudo systemctl stop backup-db-dump.timer
sudo systemctl disable backup-db-dump.timer
```
### Fix 1: Add debug logging to `ValidateDump`
**File:** `controller/internal/backup/dbdump.go`, function `ValidateDump`
Add a log parameter and diagnostic output so we can see what's happening:
```go
// BEFORE:
http.SetCookie(w, &http.Cookie{
Name: sessionCookieName,
Value: token,
Path: "/",
MaxAge: int(sessionMaxAge.Seconds()),
HttpOnly: true,
SameSite: http.SameSiteStrictMode,
Secure: true,
})
func ValidateDump(filePath string, dbType DBType) DumpValidation {
// AFTER:
isSecure := r.TLS != nil || r.Header.Get("X-Forwarded-Proto") == "https"
http.SetCookie(w, &http.Cookie{
Name: sessionCookieName,
Value: token,
Path: "/",
MaxAge: int(sessionMaxAge.Seconds()),
HttpOnly: true,
SameSite: http.SameSiteLaxMode, // Lax needed: Strict can break redirects through CF tunnel
Secure: isSecure,
})
func ValidateDump(filePath string, dbType DBType) DumpValidation {
log.Printf("[DEBUG] ValidateDump: %s (type=%s)", filePath, dbType)
```
Note: Also change `SameSiteStrictMode``SameSiteLaxMode`. Strict mode can cause issues when users arrive via Cloudflare Tunnel redirects (the cookie won't be sent on the first navigation from an external link).
**Verify:** Access `http://192.168.0.162:8080` in browser, log in — should work. Also verify HTTPS login still works via `https://vezerlo.demo-felhom.eu`.
---
## Fix 4: Remove misleading `subtle.ConstantTimeCompare` in session check
**File:** `internal/web/auth.go`, `isValidSession`
**Problem:** The map lookup `s.sessions[token]` already reveals the token via timing. The subsequent `ConstantTimeCompare` compares the token to itself (it was just fetched by that key), so it always returns 1 and adds no security. It's misleading to keep it.
**Fix:** Simplify:
And at the end, before `return v`:
```go
// BEFORE:
func (s *Server) isValidSession(token string) bool {
s.sessionsMu.RLock()
defer s.sessionsMu.RUnlock()
sess, ok := s.sessions[token]
if !ok || time.Now().After(sess.expiresAt) {
return false
}
return subtle.ConstantTimeCompare([]byte(sess.token), []byte(token)) == 1
}
// AFTER:
func (s *Server) isValidSession(token string) bool {
s.sessionsMu.RLock()
defer s.sessionsMu.RUnlock()
sess, ok := s.sessions[token]
return ok && time.Now().Before(sess.expiresAt)
v.Valid = true
log.Printf("[DEBUG] ValidateDump OK: %s — %d tables, header found", filePath, tableCount)
return v
}
```
Also: the `token` field in the `session` struct is now unused (it duplicates the map key). Remove it:
Also add logging to the error paths:
After `v.Error = "dump file too small (< 100 bytes)"`:
```go
log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)
```
After `v.Error = fmt.Sprintf("read failed: %v", err)`:
```go
log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)
```
After `v.Error = "... dump missing comment header"`:
```go
log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)
```
After `v.Error = "no CREATE TABLE statements found"`:
```go
log.Printf("[WARN] ValidateDump FAIL: %s — %s (header was found, scanned %d lines)", filePath, v.Error, len(strings.Split(content, "\n")))
```
Note: Import `"log"` at the top of the file if not already imported (use the standard `log`
package, not the `*log.Logger` parameter — this is a quick debug addition. Can be cleaned up later.)
### Fix 2: Template guard against zero-value Validation
Even with debug logging, we should make the template resilient to zero-value Validation.
The "Hiba" label with no explanation is a bad UX.
**File:** `controller/internal/web/templates/backups.html`
In the `LastDBDump` section, change the Érvényesítés (validation) column:
```html
<!-- BEFORE: -->
{{if .Error}}
<span class="validation-badge validation-na"></span>
{{else if .Validation.Valid}}
<span class="validation-badge validation-ok">{{.Validation.TableCount}} tábla</span>
{{else}}
<span class="validation-badge validation-fail" title="{{.Validation.Error}}">Hiba</span>
{{end}}
<!-- AFTER: -->
{{if .Error}}
<span class="validation-badge validation-na"></span>
{{else if .Validation.Valid}}
<span class="validation-badge validation-ok">{{.Validation.TableCount}} tábla</span>
{{else if .Validation.Error}}
<span class="validation-badge validation-fail" title="{{.Validation.Error}}">Hiba</span>
{{else}}
<span class="validation-badge validation-na" title="Az érvényesítés nem futott le"></span>
{{end}}
```
This ensures:
- If validation passed → green badge with table count
- If validation failed with a reason → red "Hiba" with tooltip
- If validation never ran (zero-value) → gray "" with explanatory tooltip
### Fix 3: Re-validate on cache refresh (belt-and-suspenders)
Since `RefreshCache` already calls `ListDumpFiles()` which runs `ValidateDump()` per file,
the `DumpFiles` fallback always has fresh validation. The issue is only in the `LastDBDump`
path when in-memory results have stale/missing validation.
Add a cross-check: if `LastDBDump` results have zero-value Validation but the file exists,
re-validate it. Add this in `RefreshCache`, after the existing code:
**File:** `controller/internal/backup/backup.go`, function `RefreshCache`
After the line `status.DumpFiles = files` and before the lock section, add:
```go
// BEFORE:
type session struct {
token string
expiresAt time.Time
}
// AFTER:
type session struct {
expiresAt time.Time
}
// Cross-check: if LastDBDump results have empty validation but files exist,
// re-validate from disk. This handles controller restarts and race conditions.
if m.lastDBDump != nil {
fileValidation := make(map[string]DumpValidation) // keyed by filename
for _, f := range files {
fileValidation[f.FileName] = f.Validation
}
for i, r := range m.lastDBDump.Results {
if !r.Validation.Valid && r.Validation.Error == "" && r.FilePath != "" {
filename := filepath.Base(r.FilePath)
if fv, ok := fileValidation[filename]; ok {
m.lastDBDump.Results[i].Validation = fv
m.logger.Printf("[INFO] Re-validated %s from disk: valid=%v tables=%d",
filename, fv.Valid, fv.TableCount)
}
}
}
}
```
And update `createSession`:
```go
// BEFORE:
s.sessions[token] = &session{token: token, expiresAt: time.Now().Add(sessionMaxAge)}
Note: Add `"path/filepath"` to imports if not already present.
// AFTER:
s.sessions[token] = &session{expiresAt: time.Now().Add(sessionMaxAge)}
This runs every 5 minutes (same cadence as the cache refresh) and will automatically
heal any stale validation state in `lastDBDump` by cross-referencing the fresh
`ListDumpFiles` results.
### Fix 4: Disable conflicting systemd timer (manual step)
If the diagnostic step above reveals that `backup-db-dump.timer` is still active:
```bash
sudo systemctl stop backup-db-dump.timer
sudo systemctl disable backup-db-dump.timer
# Optionally verify:
systemctl list-timers | grep backup
# Should show nothing
```
After these changes, remove the `"crypto/subtle"` import if no longer used.
The controller's built-in `db-dump` scheduler job at 02:30 replaces this timer entirely.
Having both run simultaneously can corrupt dump files mid-write.
**Verify:** Log in, navigate around — session should work. Log out — should redirect to login.
### Verify
---
## Fix 5: `cleanupSessions` goroutine leak
**File:** `internal/web/auth.go`
**Problem:** `time.Tick()` creates a ticker that can never be GC'd. The goroutine runs forever, even during shutdown.
**Fix:** This one is lower priority since the controller runs as a long-lived process, but the fix is simple. Since we don't currently pass a context to `NewServer`, use a `done` channel on the server:
Add a `done` channel to the Server struct:
```go
type Server struct {
// ... existing fields ...
done chan struct{}
}
```
Initialize it in `NewServer`:
```go
func NewServer(...) *Server {
s := &Server{
// ... existing ...
done: make(chan struct{}),
}
s.loadTemplates()
go s.cleanupSessions()
return s
}
```
Rewrite `cleanupSessions`:
```go
func (s *Server) cleanupSessions() {
ticker := time.NewTicker(15 * time.Minute)
defer ticker.Stop()
for {
select {
case <-s.done:
return
case <-ticker.C:
s.sessionsMu.Lock()
now := time.Now()
for t, sess := range s.sessions {
if now.After(sess.expiresAt) {
delete(s.sessions, t)
}
}
s.sessionsMu.Unlock()
}
}
}
```
Add a `Close` method (called from main during shutdown, optional for now):
```go
func (s *Server) Close() {
close(s.done)
}
```
**Verify:** Build succeeds, controller starts without errors.
---
## Fix 6: Add `http.MaxBytesReader` to API POST endpoints
**File:** `internal/api/router.go`
**Problem:** `json.NewDecoder(req.Body).Decode(&body)` has no size limit. A malicious or accidental large POST could exhaust memory.
**Fix:** Add a helper and use it in all handlers that decode JSON bodies (`deployStack`, `updateOptionalConfig`, `deleteStack`):
Add helper at the bottom of `router.go`:
```go
// limitBody wraps the request body with a size limit (default 1MB).
func limitBody(w http.ResponseWriter, req *http.Request) {
req.Body = http.MaxBytesReader(w, req.Body, 1<<20) // 1MB
}
```
Then at the start of each handler that reads the body:
```go
func (r *Router) deployStack(w http.ResponseWriter, req *http.Request, name string) {
limitBody(w, req)
// ... existing json decode ...
}
func (r *Router) updateOptionalConfig(w http.ResponseWriter, req *http.Request, name string) {
limitBody(w, req)
// ...
}
func (r *Router) deleteStack(w http.ResponseWriter, req *http.Request, name string) {
limitBody(w, req)
// ...
}
```
**Verify:** Build succeeds. Normal deploy/config/delete still works (payloads are tiny).
---
## Fix 7: Cache `time.LoadLocation` in template funcmap
**File:** `internal/web/funcmap.go`
**Problem:** At least 5 template functions call `time.LoadLocation("Europe/Budapest")` on every render. While Go caches internally, it still acquires a mutex each time.
**Fix:** Load once at the top of `templateFuncMap` and capture in the closures:
```go
func (s *Server) templateFuncMap() template.FuncMap {
loc, err := time.LoadLocation("Europe/Budapest")
if err != nil {
loc = time.UTC
}
return template.FuncMap{
// ... in every function that currently calls time.LoadLocation,
// replace with the captured `loc` variable.
// Remove the per-function `loc, _ := time.LoadLocation(...)` lines.
// Example:
"timeAgo": func(t time.Time) string {
if t.IsZero() { return "" }
now := time.Now().In(loc)
d := now.Sub(t.In(loc))
// ... rest unchanged ...
},
// Apply same pattern to: fmtTime, fmtTimeShort, nextRunLabel, nextPruneLabel
}
}
```
There are 5 functions that need this change: `timeAgo`, `fmtTime`, `fmtTimeShort`, `nextRunLabel`, `nextPruneLabel`.
**Verify:** Build succeeds. Dashboard/backup page timestamps still display correctly in Budapest time.
After deploying fixes:
1. Wait for cache refresh (5 minutes) or trigger a manual backup ("Mentés most")
2. Check `/backups` page — validation column should show "X tábla" for all databases
3. Check controller logs for `[DEBUG] ValidateDump` lines confirming validation ran
4. Verify no `[WARN] ValidateDump FAIL` lines in logs
---
## Post-fix checklist
1. `grep -rn 'NotFound(w, nil)' internal/` → 0 results
2. `grep -rn 'subtle.ConstantTimeCompare' internal/` → 0 results (unless used elsewhere)
3. `grep -rn 'time.Tick(' internal/` → 0 results
4. `grep -rn 'Secure:.*true' internal/web/auth.go` → 0 results (now dynamic)
5. Build: `go build ./cmd/controller/` succeeds with no errors
6. `go vet ./...` passes
7. Version bump in build to v0.6.1
8. Commit, push, build, deploy, verify on demo-felhom.eu
### Hub (felhom.eu repo → hub/)
- [ ] `grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/` → 0 results
- [ ] `parseSQLiteTime` function exists in store.go
- [ ] `go build ./cmd/hub/` succeeds
- [ ] `go vet ./...` passes
- [ ] Build new image, deploy to k3s
- [ ] hub.felhom.eu shows OK status for demo-felhom
- [ ] Report history shows real timestamps
### Controller (deploy-felhom-compose repo → controller/)
- [ ] Template has 4-branch validation check (Valid / Error / zero-value guard)
- [ ] `RefreshCache` has cross-check re-validation logic
- [ ] `ValidateDump` has debug logging
- [ ] `backup-db-dump.timer` is disabled on demo-felhom
- [ ] `go build ./cmd/controller/` succeeds
- [ ] `go vet ./...` passes
- [ ] Build, deploy to demo-felhom
- [ ] Backup page shows table counts, not "Hiba"
- [ ] Controller logs show `[DEBUG] ValidateDump OK` entries
### Version bumps
- Hub: bump to next patch version
- Controller: include in v0.6.1 release (alongside the code review fixes from the other TASK.md)