Create CLAUDE.md + cleanup + statusIcon fix (felhom.eu repo)

2026-02-16 15:49:53 +01:00
parent ec99aad217
commit 4bee8c3526
+281 -326
@@ -1,361 +1,316 @@
# TASK.md — Hub Dashboard Bugs + Backup Validation Fix
# TASK: Create CLAUDE.md + cleanup + statusIcon fix (felhom.eu repo)
## Overview
## Context
Three bugs identified from the live hub.felhom.eu and controller backup page:
The `felhom.eu` repo lacks a CLAUDE.md with build instructions for the hub, has no `.gitignore` (so `hub.exe` got committed), and has a statusIcon rendering bug on the hub dashboard.
1. **Hub main page shows DOWN** despite the detail page showing STATUS: OK
2. **Hub report history timestamps show 00:00:00** instead of actual times
3. **Backup page shows "Hiba" for all DB validations** with no tooltip detail
**Current state:** Hub v0.1.2 running on k3s. Controller v0.6.2 on demo node.
Bugs 1 and 2 share the same root cause (timestamp parsing). Bug 3 is in the controller.
All changes in this task are in the **felhom.eu repo** only.
---
## Bug 1 & 2: Hub timestamp parsing failure
## Task 1: Create CLAUDE.md
**Repository:** `felhom.eu` → `hub/`
Create `CLAUDE.md` in the repo root (`E:\git\felhom.eu\CLAUDE.md`) with the following content.
Use the controller's `CLAUDE.md` (in `deploy-felhom-compose`) as a style reference.
### Root cause
The CLAUDE.md should include these sections:
The hub's SQLite store parses `received_at` timestamps with a single format:
### Project overview
```go
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
This repo (`felhom.eu`) contains:
- **Website** (`website/`) — Static HTML pages at felhom.eu, served via k3s nginx + git-sync sidecar
- **Hub** (`hub/`) — Go application (felhom-hub) — centralized dashboard for monitoring customer controllers, runs on k3s at hub.felhom.eu
- **K8s manifests** (`manifests/`) — k3s deployment manifests for all felhom-system services
See `README.md` for full architecture, DNS, email, and SEO documentation.
See `TASK.md` for the current task to implement (if it exists).
### Code quality rules
Same as controller CLAUDE.md:
- Always double-check generated code for bugs, logic issues, syntax errors
- Handle edge cases without overcomplicating
- Add debug capabilities for troubleshooting
- Ask for more input rather than guessing
### Workspace layout
```
E:\git\felhom.eu\ (or /e/git/felhom.eu/ in Git Bash)
├── hub/ # felhom-hub Go application
│ ├── cmd/hub/ # Entry point (main.go)
│ ├── internal/
│ │ ├── api/ # Report ingestion API
│ │ ├── store/ # SQLite storage + queries
│ │ └── web/ # Dashboard UI
│ │ ├── server.go # Server, routing, template funcs
│ │ ├── embed.go # go:embed for templates
│ │ └── templates/ # HTML templates + CSS
│ ├── configs/ # Example config files
│ ├── Dockerfile
│ ├── Makefile
│ └── go.mod
├── manifests/ # k3s deployment manifests
│ ├── hub.yaml # Hub deployment (hub.felhom.eu)
│ ├── webpage.yaml # Website + FileBrowser + git-sync
│ ├── contact-mailer.yaml # Contact form email sender
│ ├── healthchecks.yaml # Healthchecks (status.felhom.eu)
│ └── umami.yaml # Analytics (stats.felhom.eu)
├── website/ # Static HTML pages (felhom.eu)
│ ├── index.html
│ ├── alkalmazasok.html
│ ├── ... (all Hungarian, UTF-8 with BOM)
│ └── assets/ # Logos, screenshots, OG images
├── CLAUDE.md # This file
├── README.md # Full project documentation
└── TASK.md # Current task (if exists)
```
The parse error is silently discarded (`_`). When the format doesn't match what the
`modernc.org/sqlite` driver returns, `ReceivedAt` becomes Go's zero time (`0001-01-01 00:00:00`).
Related repos (same parent directory):
```
E:\git\deploy-felhom-compose\ # felhom-controller Go app + deploy scripts
E:\git\app-catalog-felhom.eu\ # Docker Compose templates per app
E:\git\homelab-manifests\ # k3s cluster manifests (dooplex.hu services)
E:\git\misc-scripts\ # Helper scripts (build scripts, repo collector)
```
**Consequences:**
- `time.Since(zeroTime)` saturates at Go's maximum `Duration` (about 292 years, roughly 2.5 million hours) → `TimeSinceReport > 1 hour` → **OverallStatus = "down"**
- `zeroTime.Format("15:04:05")` → **"00:00:00"** in report history
- Detail page health status shows OK because that comes from the report JSON payload, not the timestamp
All repos hosted at `gitea.dooplex.hu/admin/`.
The `modernc.org/sqlite` driver may return datetime strings in various formats depending on
how the value was stored and the SQLite version:
- `2026-02-16 14:30:00` (what we expect)
- `2026-02-16T14:30:00Z` (ISO 8601 / RFC3339-ish)
- `2026-02-16 14:30:00+00:00` (with timezone offset)
- `2026-02-16 14:30:00.123456` (with fractional seconds)
### SSH access
### Fix: `hub/internal/store/store.go`
SSH key-based authentication configured. No password prompts.
**Step 1:** Add a robust timestamp parser function at the bottom of store.go:
| Host | IP | User | Role |
|------|----|------|------|
| Build server (k3s node) | 192.168.0.180 | kisfenyo | Build + push images, kubectl |
| Demo node | 192.168.0.162 | kisfenyo | Test deployment (demo-felhom.eu) |
**Note:** `kubectl` on the build server requires `sudo` (k3s kubeconfig permissions).
### Build & deploy workflow — Hub
After making code changes to `hub/`, you **MUST** build, push, and deploy the new image.
Do NOT leave code changes uncommitted or undeployed.
#### Step 1: Commit and push changes
```bash
cd /e/git/felhom.eu
git add -A && git commit -m "<descriptive message>" && git push
```
#### Step 2: Build + push the container image on the build server
The build server (192.168.0.180) has the build toolchain. The build script lives at
`~/build/felhom-hub/build.sh` on the build server (NOT in this repo).
First, check the current running version:
```bash
ssh kisfenyo@192.168.0.180 "sudo kubectl get deploy -n felhom-system hub -o jsonpath='{.spec.template.spec.containers[0].image}'"
```
Then build with the next version (e.g., if current is 0.1.2, use 0.1.3):
```bash
ssh kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh <NEW_VERSION> --push"
```
The build script:
- Pulls latest code from Gitea (`git pull` on the felhom.eu repo)
- Copies `hub/` source to a clean build workspace
- Builds Docker image with version + build-time ldflags
- Pushes to `gitea.dooplex.hu/admin/felhom-hub:<VERSION>` and `:latest`
#### Step 3: Deploy to k3s
```bash
ssh kisfenyo@192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:<NEW_VERSION>"
```
#### Step 4: Verify the deployment
```bash
ssh kisfenyo@192.168.0.180 "sudo kubectl get pods -n felhom-system -l app=hub && echo '---' && sudo kubectl logs -n felhom-system -l app=hub --tail 10"
```
Should show pod Running and `[INFO] felhom-hub <VERSION> starting` in logs.
#### Build workflow summary
| Step | Command | Where |
|------|---------|-------|
| 1. Commit + push | `git add -A && git commit && git push` | Local (this repo) |
| 2. Build + push image | `ssh 192.168.0.180 "cd ~/build/felhom-hub && ./build.sh <VER> --push"` | Build server |
| 3. Deploy | `ssh 192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=...:<VER>"` | Build server (kubectl) |
| 4. Verify | `ssh 192.168.0.180 "sudo kubectl get pods -n felhom-system -l app=hub"` | Build server |
### Build & deploy workflow — Website
The website auto-deploys via git-sync sidecar. Just push to `main`:
```bash
cd /e/git/felhom.eu
git add -A && git commit -m "<message>" && git push
```
Changes are live within 1-2 minutes. No build step needed.
For emergency edits, use FileBrowser at `https://files.felhom.eu`.
### Build & deploy workflow — K8s Manifests
Manifests are applied manually:
```bash
ssh kisfenyo@192.168.0.180 "sudo kubectl apply -f /home/kisfenyo/git/felhom.eu/manifests/<manifest>.yaml"
```
Remember to `git pull` on the build server first if you pushed changes locally.
### Tech stack (Hub)
- **Language:** Go 1.24+
- **Web framework:** stdlib `net/http` + `html/template`
- **Database:** SQLite via `modernc.org/sqlite` (pure Go, no CGo)
- **Auth:** bcrypt password hash + basic auth
- **Deployment:** Docker container on k3s (felhom-system namespace)
- **Storage:** Longhorn PVC at `/data/` (SQLite DB)
- **Config:** YAML file mounted via k8s ConfigMap at `/etc/felhom-hub/hub.yaml`
### Key patterns
- Hub receives reports from customer controllers via `POST /api/v1/report` (Bearer token auth)
- Dashboard shows all customers in a table with status, CPU, memory, disk, containers, backup age
- Customer detail page shows system info, report history, full JSON report
- Status logic: OK (report < 30m), WARN (30m-1h or health=warn), DOWN (> 1h or health=fail)
- SQLite timestamps may vary in format — use `parseSQLiteTime()` for robust parsing
- Auto-refresh: dashboard and detail pages refresh every 60 seconds via `<meta http-equiv="refresh">`
- Geo-restricted to Hungary via nginx ingress annotation
### File encoding
All HTML files in `website/` are **UTF-8 with BOM**. Ensure your editor preserves this.
Hub Go source files are standard UTF-8 (no BOM).
---
## Task 2: Create .gitignore
Create `.gitignore` in the repo root with appropriate entries:
```gitignore
# Go binaries
hub/hub
hub/hub.exe
hub/bin/
# Build artifacts
*.exe
*.dll
*.so
*.dylib
# Test and coverage
*.test
*.out
coverage.html
# IDE
.idea/
.vscode/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
# Temporary files
*.tmp
*.bak
```
## Task 3: Remove hub.exe from git tracking
After creating `.gitignore`, remove the committed binary:
```bash
git rm --cached hub/hub.exe
```
This removes it from tracking, and the `.gitignore` prevents it from being re-added. No need to rewrite
history; just remove it from the current tree.
Also check for any other binaries that shouldn't be tracked:
```bash
find hub/ -type f \( -name "*.exe" -o \( -name "hub" -executable \) \)
```
## Task 4: Fix hub statusIcon rendering
**File:** `hub/internal/web/server.go`
**Problem:** `statusIcon()` returns HTML entities (`&#x1F7E2;`), but Go's `html/template`
auto-escapes them to literal text (`&amp;#x1F7E2;`). Additionally, emoji don't respond to
CSS `color` — but the templates already apply `style="color: {{statusColor .OverallStatus}}"`.
**Fix:** Change `statusIcon()` to return `●` (U+25CF, BLACK CIRCLE) — a plain Unicode character
that responds to CSS color styling. The existing `statusColor()` function handles color differentiation.
```go
// parseSQLiteTime tries multiple formats that modernc.org/sqlite may return.
func parseSQLiteTime(s string) time.Time {
formats := []string{
"2006-01-02 15:04:05", // SQLite datetime('now')
"2006-01-02T15:04:05Z", // RFC3339 without fractional
time.RFC3339, // 2006-01-02T15:04:05Z07:00
time.RFC3339Nano, // with fractional seconds
"2006-01-02 15:04:05+00:00", // with explicit UTC offset
"2006-01-02 15:04:05.999999999", // with fractional, no TZ
// BEFORE (broken):
func statusIcon(status string) string {
switch status {
case "ok":
return "&#x1F7E2;" // green circle
case "warn":
return "&#x1F7E1;" // yellow circle
case "down", "fail":
return "&#x1F534;" // red circle
default:
return "&#x26AA;" // white circle
}
for _, f := range formats {
if t, err := time.Parse(f, s); err == nil {
return t
}
}
// Last resort: if string is non-empty, log it for debugging
if s != "" {
log.Printf("[WARN] Could not parse timestamp: %q", s)
}
return time.Time{} // zero time
}
// AFTER (works with CSS color):
func statusIcon(status string) string {
return "●"
}
```
Note: Add `"log"` to the import block if not already present.
No template changes needed — `statusColor()` already provides the correct color per status.
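The escaping behaviour behind this bug can be reproduced outside the hub. The sketch below is a standalone illustration, not the hub's code: `renderIcon` is a hypothetical helper that registers a zero-argument `statusIcon` template function and renders it through `html/template`.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// renderIcon (hypothetical helper) executes a one-line template that prints
// whatever the registered statusIcon function returns.
func renderIcon(icon string) string {
	funcs := template.FuncMap{"statusIcon": func() string { return icon }}
	tmpl := template.Must(template.New("icon").Funcs(funcs).Parse(`<span>{{statusIcon}}</span>`))
	var sb strings.Builder
	if err := tmpl.Execute(&sb, nil); err != nil {
		return "error: " + err.Error()
	}
	return sb.String()
}

func main() {
	// A plain string containing an HTML entity has its "&" escaped,
	// so the browser shows the entity source as literal text.
	fmt.Println(renderIcon("&#x1F7E2;"))
	// A plain Unicode character contains nothing to escape and passes through.
	fmt.Println(renderIcon("●"))
}
```

Returning `template.HTML` from the function would also stop the escaping, but the emoji would still ignore the CSS `color` the templates already set, so the plain `●` is the simpler option.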
**Step 2:** Replace ALL occurrences of `time.Parse("2006-01-02 15:04:05", receivedAt)` in store.go.
There are **three** locations:
1. **`GetCustomers()`** — in the `for rows.Next()` loop:
```go
// BEFORE:
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
// AFTER:
c.ReceivedAt = parseSQLiteTime(receivedAt)
```
2. **`GetCustomer()`** — after `row.Scan`:
```go
// BEFORE:
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
// AFTER:
c.ReceivedAt = parseSQLiteTime(receivedAt)
```
3. **`GetCustomerHistory()`** — in the `for rows.Next()` loop:
```go
// BEFORE:
c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt)
// AFTER:
c.ReceivedAt = parseSQLiteTime(receivedAt)
```
**Step 3 (optional diagnostic):** Temporarily add a log line in `SaveReport` to see what format
SQLite actually stores/returns. This can be removed after verifying the fix:
```go
// Add after the INSERT in SaveReport, before return:
// Debug: check what format SQLite returns
var dbTime string
s.db.QueryRow("SELECT received_at FROM reports WHERE customer_id = ? ORDER BY id DESC LIMIT 1", customerID).Scan(&dbTime)
s.logger.Printf("[DEBUG] SQLite received_at raw value: %q", dbTime)
```
### Verify
After rebuilding and deploying the hub:
1. Wait for the next controller report push (or trigger manually)
2. Check hub.felhom.eu — status should show **OK** (green), not DOWN
3. Click into customer detail — "Last report: X min ago" should show a reasonable value
4. Report History timestamps should show actual times like `14:36:32`, not `00:00:00`
5. Check hub pod logs for any `[WARN] Could not parse timestamp` messages (should be none)
### Post-fix grep
```bash
grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/store.go
# Should return 0 results — all replaced with parseSQLiteTime()
```
**Verification:**
1. Dashboard: colored dot (green/yellow/red) before customer name, no `&#x` text
2. Customer detail: colored dot in header
3. Colors match status (green=OK, yellow=WARN, red=DOWN)
---
## Bug 3: Backup page shows "Hiba" for all DB validations
## Build & Deploy
**Repository:** `deploy-felhom-compose` → `controller/`
### Symptoms
- All 3 databases (immich, paperless, romm) show "Hiba" in the Érvényesítés column
- The Állapot column shows "OK" (dump succeeded)
- No tooltip text on hover (meaning `Validation.Error` is empty)
- Dump files are valid — headers are correct, sizes are reasonable (43.2 MB / 319.6 KB / 38.7 KB)
### Analysis
The template condition for "Hiba" in the `LastDBDump` path is:
```html
{{if .Error}} → shows "" (dump failed)
{{else if .Validation.Valid}} → shows "X tábla" (validation passed)
{{else}} → shows "Hiba" (THIS IS WHAT WE SEE)
```
"Hiba" with empty tooltip means `Validation.Valid == false` AND `Validation.Error == ""`.
This is the **zero-value** of `DumpValidation{}` — meaning validation was never assigned.
The code in `DumpOne()` calls `ValidateDump()` and the code in `ListDumpFiles()` also calls
`ValidateDump()`. Both paths should populate the Validation field. Yet the UI shows zero-value.
**Most likely cause:** The `lastDBDump` state was populated by an older code version (before
validation was wired), OR there's a race condition where `RefreshCache` captures `lastDBDump`
mid-construction, OR the validation ran but hit an unexpected issue (permissions, encoding).
### Diagnostic step (run on demo-felhom FIRST)
Before applying fixes, check the controller logs to understand what happened:
After all changes, commit and deploy hub v0.1.3:
```bash
# Check the last DB dump run
sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "db dump|table|valid|dump:"
# 1. Commit
cd /e/git/felhom.eu
git add -A && git commit -m "add CLAUDE.md, .gitignore, fix statusIcon rendering" && git push
# Check if there was a controller restart
sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "starting|version|shutdown"
# 2. Build
ssh kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh 0.1.3 --push"
# Check if the old bash systemd timer is ALSO running (double-dump conflict!)
systemctl is-active backup-db-dump.timer
systemctl list-timers | grep backup
# 3. Deploy
ssh kisfenyo@192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:0.1.3"
# 4. Verify
ssh kisfenyo@192.168.0.180 "sudo kubectl rollout status -n felhom-system deploy/hub && sudo kubectl logs -n felhom-system -l app=hub --tail 5"
```
**IMPORTANT:** If `backup-db-dump.timer` is still active, it will race with the controller's
built-in `db-dump` scheduler job. Both write to the same directory. The bash script overwrites
files directly (no `.tmp` + rename), which could corrupt the file mid-validation. **Disable it:**
## Post-deploy checklist
```bash
sudo systemctl stop backup-db-dump.timer
sudo systemctl disable backup-db-dump.timer
```
### Fix 1: Add debug logging to `ValidateDump`
**File:** `controller/internal/backup/dbdump.go`, function `ValidateDump`
Add diagnostic output so we can see what's happening (the function signature stays unchanged):
```go
// BEFORE:
func ValidateDump(filePath string, dbType DBType) DumpValidation {
// AFTER:
func ValidateDump(filePath string, dbType DBType) DumpValidation {
log.Printf("[DEBUG] ValidateDump: %s (type=%s)", filePath, dbType)
```
And at the end, before `return v`:
```go
v.Valid = true
log.Printf("[DEBUG] ValidateDump OK: %s — %d tables, header found", filePath, tableCount)
return v
}
```
Also add logging to the error paths:
After `v.Error = "dump file too small (< 100 bytes)"`:
```go
log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)
```
After `v.Error = fmt.Sprintf("read failed: %v", err)`:
```go
log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)
```
After `v.Error = "... dump missing comment header"`:
```go
log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error)
```
After `v.Error = "no CREATE TABLE statements found"`:
```go
log.Printf("[WARN] ValidateDump FAIL: %s — %s (header was found, scanned %d lines)", filePath, v.Error, len(strings.Split(content, "\n")))
```
Note: Import `"log"` at the top of the file if not already imported. Use the standard `log`
package rather than threading a `*log.Logger` through; this is a quick debug addition that can be cleaned up later.
### Fix 2: Template guard against zero-value Validation
Even with debug logging, we should make the template resilient to zero-value Validation.
The "Hiba" label with no explanation is a bad UX.
**File:** `controller/internal/web/templates/backups.html`
In the `LastDBDump` section, change the Érvényesítés (validation) column:
```html
<!-- BEFORE: -->
{{if .Error}}
<span class="validation-badge validation-na"></span>
{{else if .Validation.Valid}}
<span class="validation-badge validation-ok">{{.Validation.TableCount}} tábla</span>
{{else}}
<span class="validation-badge validation-fail" title="{{.Validation.Error}}">Hiba</span>
{{end}}
<!-- AFTER: -->
{{if .Error}}
<span class="validation-badge validation-na"></span>
{{else if .Validation.Valid}}
<span class="validation-badge validation-ok">{{.Validation.TableCount}} tábla</span>
{{else if .Validation.Error}}
<span class="validation-badge validation-fail" title="{{.Validation.Error}}">Hiba</span>
{{else}}
<span class="validation-badge validation-na" title="Az érvényesítés nem futott le"></span>
{{end}}
```
This ensures:
- If validation passed → green badge with table count
- If validation failed with a reason → red "Hiba" with tooltip
- If validation never ran (zero-value) → gray "" with explanatory tooltip
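The branch order can be exercised without the controller. The sketch below is illustrative only: the `DumpValidation` field names follow the ones used in this task, while `DumpResult`, `label`, and the plain-text labels ("dump failed", "not validated") are hypothetical stand-ins for the real struct and badge markup.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// DumpValidation mirrors the fields referenced in the template.
type DumpValidation struct {
	Valid      bool
	Error      string
	TableCount int
}

// DumpResult is a hypothetical stand-in for one LastDBDump entry.
type DumpResult struct {
	Error      string
	Validation DumpValidation
}

// badge uses the same four-branch order as the AFTER template,
// with plain text in place of the badge markup.
var badge = template.Must(template.New("badge").Parse(
	`{{if .Error}}dump failed{{else if .Validation.Valid}}{{.Validation.TableCount}} tábla{{else if .Validation.Error}}Hiba: {{.Validation.Error}}{{else}}not validated{{end}}`))

// label renders the badge text for one result.
func label(r DumpResult) string {
	var sb strings.Builder
	if err := badge.Execute(&sb, r); err != nil {
		return "error: " + err.Error()
	}
	return sb.String()
}

func main() {
	fmt.Println(label(DumpResult{})) // zero-value Validation: never ran
	fmt.Println(label(DumpResult{Validation: DumpValidation{Valid: true, TableCount: 3}}))
	fmt.Println(label(DumpResult{Validation: DumpValidation{Error: "no CREATE TABLE statements found"}}))
	fmt.Println(label(DumpResult{Error: "pg_dump exited with status 1"}))
}
```

The zero-value case is the one the three-branch template mislabelled as "Hiba"; with the extra `else if` it falls through to the neutral branch.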
### Fix 3: Re-validate on cache refresh (belt-and-suspenders)
Since `RefreshCache` already calls `ListDumpFiles()` which runs `ValidateDump()` per file,
the `DumpFiles` fallback always has fresh validation. The issue is only in the `LastDBDump`
path when in-memory results have stale/missing validation.
Add a cross-check: if `LastDBDump` results have zero-value Validation but the file exists,
re-validate it. Add this in `RefreshCache`, after the existing code:
**File:** `controller/internal/backup/backup.go`, function `RefreshCache`
After the line `status.DumpFiles = files` and before the lock section, add:
```go
// Cross-check: if LastDBDump results have empty validation but files exist,
// re-validate from disk. This handles controller restarts and race conditions.
if m.lastDBDump != nil {
fileValidation := make(map[string]DumpValidation) // keyed by filename
for _, f := range files {
fileValidation[f.FileName] = f.Validation
}
for i, r := range m.lastDBDump.Results {
if !r.Validation.Valid && r.Validation.Error == "" && r.FilePath != "" {
filename := filepath.Base(r.FilePath)
if fv, ok := fileValidation[filename]; ok {
m.lastDBDump.Results[i].Validation = fv
m.logger.Printf("[INFO] Re-validated %s from disk: valid=%v tables=%d",
filename, fv.Valid, fv.TableCount)
}
}
}
}
```
Note: Add `"path/filepath"` to imports if not already present.
This runs every 5 minutes (same cadence as the cache refresh) and will automatically
heal any stale validation state in `lastDBDump` by cross-referencing the fresh
`ListDumpFiles` results.
### Fix 4: Disable conflicting systemd timer (manual step)
If the diagnostic step above reveals that `backup-db-dump.timer` is still active:
```bash
sudo systemctl stop backup-db-dump.timer
sudo systemctl disable backup-db-dump.timer
# Optionally verify:
systemctl list-timers | grep backup
# Should show nothing
```
The controller's built-in `db-dump` scheduler job at 02:30 replaces this timer entirely.
Having both run simultaneously can corrupt dump files mid-write.
### Verify
After deploying fixes:
1. Wait for cache refresh (5 minutes) or trigger a manual backup ("Mentés most")
2. Check `/backups` page — validation column should show "X tábla" for all databases
3. Check controller logs for `[DEBUG] ValidateDump` lines confirming validation ran
4. Verify no `[WARN] ValidateDump FAIL` lines in logs
---
## Post-fix checklist
### Hub (felhom.eu repo → hub/)
- [ ] `grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/` → 0 results
- [ ] `parseSQLiteTime` function exists in store.go
- [ ] `go build ./cmd/hub/` succeeds
- [ ] `go vet ./...` passes
- [ ] Build new image, deploy to k3s
- [ ] hub.felhom.eu shows OK status for demo-felhom
- [ ] Report history shows real timestamps
### Controller (deploy-felhom-compose repo → controller/)
- [ ] Template has 4-branch validation check (dump error / Valid / Validation.Error / zero-value guard)
- [ ] `RefreshCache` has cross-check re-validation logic
- [ ] `ValidateDump` has debug logging
- [ ] `backup-db-dump.timer` is disabled on demo-felhom
- [ ] `go build ./cmd/controller/` succeeds
- [ ] `go vet ./...` passes
- [ ] Build, deploy to demo-felhom
- [ ] Backup page shows table counts, not "Hiba"
- [ ] Controller logs show `[DEBUG] ValidateDump OK` entries
### Version bumps
- Hub: bump to next patch version
- Controller: include in v0.6.1 release (alongside the code review fixes from the other TASK.md)
- [ ] `hub.felhom.eu` shows colored `●` dot, not `&#x1F7E2;` text
- [ ] `hub.exe` no longer in repo (`git ls-files hub/hub.exe` returns empty)
- [ ] `CLAUDE.md` exists in repo root
- [ ] `.gitignore` exists in repo root