From 4bee8c35264914ec6bc3d8974adeb8c6070054ae Mon Sep 17 00:00:00 2001 From: kisfenyo Date: Mon, 16 Feb 2026 15:49:53 +0100 Subject: [PATCH] Create CLAUDE.md + cleanup + statusIcon fix (felhom.eu repo) --- TASK.md | 607 ++++++++++++++++++++++++++------------------------------ 1 file changed, 281 insertions(+), 326 deletions(-) diff --git a/TASK.md b/TASK.md index 84defc1..e319be1 100644 --- a/TASK.md +++ b/TASK.md @@ -1,361 +1,316 @@ -# TASK.md — Hub Dashboard Bugs + Backup Validation Fix +# TASK: Create CLAUDE.md + cleanup + statusIcon fix (felhom.eu repo) -## Overview +## Context -Three bugs identified from the live hub.felhom.eu and controller backup page: +The `felhom.eu` repo lacks a CLAUDE.md with build instructions for the hub, has no `.gitignore` (so `hub.exe` got committed), and has a statusIcon rendering bug on the hub dashboard. -1. **Hub main page shows DOWN** despite the detail page showing STATUS: OK -2. **Hub report history timestamps show 00:00:00** instead of actual times -3. **Backup page shows "Hiba" for all DB validations** with no tooltip detail +**Current state:** Hub v0.1.2 running on k3s. Controller v0.6.2 on demo node. -Bugs 1 and 2 share the same root cause (timestamp parsing). Bug 3 is in the controller. +All changes in this task are in the **felhom.eu repo** only. --- -## Bug 1 & 2: Hub timestamp parsing failure +## Task 1: Create CLAUDE.md -**Repository:** `felhom.eu` → `hub/` +Create `CLAUDE.md` in the repo root (`E:\git\felhom.eu\CLAUDE.md`) with the following content. +Use the controller's `CLAUDE.md` (in `deploy-felhom-compose`) as a style reference. 
-### Root cause +The CLAUDE.md should include these sections: -The hub's SQLite store parses `received_at` timestamps with a single format: +### Project overview -```go -c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt) +This repo (`felhom.eu`) contains: +- **Website** (`website/`) — Static HTML pages at felhom.eu, served via k3s nginx + git-sync sidecar +- **Hub** (`hub/`) — Go application (felhom-hub) — centralized dashboard for monitoring customer controllers, runs on k3s at hub.felhom.eu +- **K8s manifests** (`manifests/`) — k3s deployment manifests for all felhom-system services + +See `README.md` for full architecture, DNS, email, and SEO documentation. +See `TASK.md` for the current task to implement (if it exists). + +### Code quality rules + +Same as controller CLAUDE.md: +- Always double-check generated code for bugs, logic issues, syntax errors +- Handle edge cases without overcomplicating +- Add debug capabilities for troubleshooting +- Ask for more input rather than guessing + +### Workspace layout + +``` +E:\git\felhom.eu\ (or /e/git/felhom.eu/ in Git Bash) +├── hub/ # felhom-hub Go application +│ ├── cmd/hub/ # Entry point (main.go) +│ ├── internal/ +│ │ ├── api/ # Report ingestion API +│ │ ├── store/ # SQLite storage + queries +│ │ └── web/ # Dashboard UI +│ │ ├── server.go # Server, routing, template funcs +│ │ ├── embed.go # go:embed for templates +│ │ └── templates/ # HTML templates + CSS +│ ├── configs/ # Example config files +│ ├── Dockerfile +│ ├── Makefile +│ └── go.mod +├── manifests/ # k3s deployment manifests +│ ├── hub.yaml # Hub deployment (hub.felhom.eu) +│ ├── webpage.yaml # Website + FileBrowser + git-sync +│ ├── contact-mailer.yaml # Contact form email sender +│ ├── healthchecks.yaml # Healthchecks (status.felhom.eu) +│ └── umami.yaml # Analytics (stats.felhom.eu) +├── website/ # Static HTML pages (felhom.eu) +│ ├── index.html +│ ├── alkalmazasok.html +│ ├── ... 
(all Hungarian, UTF-8 with BOM) +│ └── assets/ # Logos, screenshots, OG images +├── CLAUDE.md # This file +├── README.md # Full project documentation +└── TASK.md # Current task (if exists) ``` -The parse error is silently discarded (`_`). When the format doesn't match what the -`modernc.org/sqlite` driver returns, `ReceivedAt` becomes Go's zero time (`0001-01-01 00:00:00`). +Related repos (same parent directory): +``` +E:\git\deploy-felhom-compose\ # felhom-controller Go app + deploy scripts +E:\git\app-catalog-felhom.eu\ # Docker Compose templates per app +E:\git\homelab-manifests\ # k3s cluster manifests (dooplex.hu services) +E:\git\misc-scripts\ # Helper scripts (build scripts, repo collector) +``` -**Consequences:** -- `time.Since(zeroTime)` ≈ 740,000+ hours → `TimeSinceReport > 1 hour` → **OverallStatus = "down"** -- `zeroTime.Format("15:04:05")` → **"00:00:00"** in report history -- Detail page health status shows OK because that comes from the report JSON payload, not the timestamp +All repos hosted at `gitea.dooplex.hu/admin/`. -The `modernc.org/sqlite` driver may return datetime strings in various formats depending on -how the value was stored and the SQLite version: -- `2026-02-16 14:30:00` (what we expect) -- `2026-02-16T14:30:00Z` (ISO 8601 / RFC3339-ish) -- `2026-02-16 14:30:00+00:00` (with timezone offset) -- `2026-02-16 14:30:00.123456` (with fractional seconds) +### SSH access -### Fix: `hub/internal/store/store.go` +SSH key-based authentication configured. No password prompts. -**Step 1:** Add a robust timestamp parser function at the bottom of store.go: +| Host | IP | User | Role | +|------|----|------|------| +| Build server (k3s node) | 192.168.0.180 | kisfenyo | Build + push images, kubectl | +| Demo node | 192.168.0.162 | kisfenyo | Test deployment (demo-felhom.eu) | + +**Note:** `kubectl` on the build server requires `sudo` (k3s kubeconfig permissions). 
+ +### Build & deploy workflow — Hub + +After making code changes to `hub/`, you **MUST** build, push, and deploy the new image. +Do NOT leave code changes uncommitted or undeployed. + +#### Step 1: Commit and push changes + +```bash +cd /e/git/felhom.eu +git add -A && git commit -m "" && git push +``` + +#### Step 2: Build + push the container image on the build server + +The build server (192.168.0.180) has the build toolchain. The build script lives at +`~/build/felhom-hub/build.sh` on the build server (NOT in this repo). + +First, check the current running version: +```bash +ssh kisfenyo@192.168.0.180 "sudo kubectl get deploy -n felhom-system hub -o jsonpath='{.spec.template.spec.containers[0].image}'" +``` + +Then build with the next version (e.g., if current is 0.1.2, use 0.1.3): +```bash +ssh kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh --push" +``` + +The build script: +- Pulls latest code from Gitea (`git pull` on the felhom.eu repo) +- Copies `hub/` source to a clean build workspace +- Builds Docker image with version + build-time ldflags +- Pushes to `gitea.dooplex.hu/admin/felhom-hub:` and `:latest` + +#### Step 3: Deploy to k3s + +```bash +ssh kisfenyo@192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:" +``` + +#### Step 4: Verify the deployment + +```bash +ssh kisfenyo@192.168.0.180 "sudo kubectl get pods -n felhom-system -l app=hub && echo '---' && sudo kubectl logs -n felhom-system -l app=hub --tail 10" +``` + +Should show pod Running and `[INFO] felhom-hub starting` in logs. + +#### Build workflow summary + +| Step | Command | Where | +|------|---------|-------| +| 1. Commit + push | `git add -A && git commit && git push` | Local (this repo) | +| 2. Build + push image | `ssh 192.168.0.180 "cd ~/build/felhom-hub && ./build.sh --push"` | Build server | +| 3. Deploy | `ssh 192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=...:"` | Build server (kubectl) | +| 4. 
Verify | `ssh 192.168.0.180 "sudo kubectl get pods -n felhom-system -l app=hub"` | Build server | + +### Build & deploy workflow — Website + +The website auto-deploys via git-sync sidecar. Just push to `main`: + +```bash +cd /e/git/felhom.eu +git add -A && git commit -m "" && git push +``` + +Changes are live within 1-2 minutes. No build step needed. + +For emergency edits, use FileBrowser at `https://files.felhom.eu`. + +### Build & deploy workflow — K8s Manifests + +Manifests are applied manually: + +```bash +ssh kisfenyo@192.168.0.180 "sudo kubectl apply -f /home/kisfenyo/git/felhom.eu/manifests/.yaml" +``` + +Remember to `git pull` on the build server first if you pushed changes locally. + +### Tech stack (Hub) + +- **Language:** Go 1.24+ +- **Web framework:** stdlib `net/http` + `html/template` +- **Database:** SQLite via `modernc.org/sqlite` (pure Go, no CGo) +- **Auth:** bcrypt password hash + basic auth +- **Deployment:** Docker container on k3s (felhom-system namespace) +- **Storage:** Longhorn PVC at `/data/` (SQLite DB) +- **Config:** YAML file mounted via k8s ConfigMap at `/etc/felhom-hub/hub.yaml` + +### Key patterns + +- Hub receives reports from customer controllers via `POST /api/v1/report` (Bearer token auth) +- Dashboard shows all customers in a table with status, CPU, memory, disk, containers, backup age +- Customer detail page shows system info, report history, full JSON report +- Status logic: OK (report < 30m), WARN (30m-1h or health=warn), DOWN (> 1h or health=fail) +- SQLite timestamps may vary in format — use `parseSQLiteTime()` for robust parsing +- Auto-refresh: dashboard and detail pages refresh every 60 seconds via `` +- Geo-restricted to Hungary via nginx ingress annotation + +### File encoding + +All HTML files in `website/` are **UTF-8 with BOM**. Ensure your editor preserves this. +Hub Go source files are standard UTF-8 (no BOM). 
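The status thresholds listed under "Key patterns" could be sketched in Go roughly as follows. This is a minimal illustration, not the hub's actual code: the function name `overallStatus`, its parameters, the rule that DOWN takes precedence over WARN, and the treatment of exactly 30 minutes and exactly 1 hour as WARN are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// overallStatus is a hypothetical helper (not taken from the hub source).
// It maps the age of the last report and the reported health string to
// "ok", "warn", or "down" using the thresholds from the Key patterns list.
func overallStatus(age time.Duration, health string) string {
	switch {
	case age > time.Hour || health == "fail":
		// DOWN: report older than 1h, or the controller reported a failure.
		return "down"
	case age >= 30*time.Minute || health == "warn":
		// WARN: report between 30m and 1h old, or health is degraded.
		return "warn"
	default:
		// OK: fresh report and no health problem.
		return "ok"
	}
}

func main() {
	fmt.Println(overallStatus(5*time.Minute, "ok"))
	fmt.Println(overallStatus(45*time.Minute, "ok"))
	fmt.Println(overallStatus(5*time.Minute, "warn"))
	fmt.Println(overallStatus(2*time.Hour, "ok"))
	fmt.Println(overallStatus(5*time.Minute, "fail"))
}
```

Checking the DOWN conditions first means a stale report or a failed health check can never be downgraded to WARN by the other input, which seems to match the intent of the list.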
+ +--- + +## Task 2: Create .gitignore + +Create `.gitignore` in the repo root with appropriate entries: + +```gitignore +# Go binaries +hub/hub +hub/hub.exe +hub/bin/ + +# Build artifacts +*.exe +*.dll +*.so +*.dylib + +# Test and coverage +*.test +*.out +coverage.html + +# IDE +.idea/ +.vscode/ +*.swp +*.swo +*~ + +# OS +.DS_Store +Thumbs.db + +# Temporary files +*.tmp +*.bak +``` + +## Task 3: Remove hub.exe from git tracking + +After creating `.gitignore`, remove the committed binary: + +```bash +git rm --cached hub/hub.exe +``` + +This removes it from tracking, and the `.gitignore` prevents it from being re-added. No need to rewrite +history; just remove it from the current tree. + +Also check for any other binaries that shouldn't be tracked: +```bash +find hub/ -type f \( -name "*.exe" -o \( -name "hub" -executable \) \) +``` + +## Task 4: Fix hub statusIcon rendering + +**File:** `hub/internal/web/server.go` + +**Problem:** `statusIcon()` returns HTML entity strings (`&#x1F7E2;`), but Go's `html/template` +auto-escapes the `&`, so the page shows the literal text `&#x1F7E2;` instead of the icon. Additionally, emoji don't respond to +CSS `color`, even though the templates already apply `style="color: {{statusColor .OverallStatus}}"`. + +**Fix:** Change `statusIcon()` to return `●` (U+25CF, BLACK CIRCLE), a plain Unicode character +that responds to CSS color styling. The existing `statusColor()` function handles color differentiation. ```go -// parseSQLiteTime tries multiple formats that modernc.org/sqlite may return. 
-func parseSQLiteTime(s string) time.Time { - formats := []string{ - "2006-01-02 15:04:05", // SQLite datetime('now') - "2006-01-02T15:04:05Z", // RFC3339 without fractional - time.RFC3339, // 2006-01-02T15:04:05Z07:00 - time.RFC3339Nano, // with fractional seconds - "2006-01-02 15:04:05+00:00", // with explicit UTC offset - "2006-01-02 15:04:05.999999999", // with fractional, no TZ +// BEFORE (broken): +func statusIcon(status string) string { + switch status { + case "ok": + return "&#x1F7E2;" // green circle + case "warn": + return "&#x1F7E1;" // yellow circle + case "down", "fail": + return "&#x1F534;" // red circle + default: + return "&#x26AA;" // white circle } - for _, f := range formats { - if t, err := time.Parse(f, s); err == nil { - return t - } - } - // Last resort: if string is non-empty, log it for debugging - if s != "" { - log.Printf("[WARN] Could not parse timestamp: %q", s) - } - return time.Time{} // zero time +} + +// AFTER (works with CSS color): +func statusIcon(status string) string { + return "●" } ``` -Note: Add `"log"` to the import block if not already present. +No template changes needed — `statusColor()` already provides the correct color per status. -**Step 2:** Replace ALL occurrences of `time.Parse("2006-01-02 15:04:05", receivedAt)` in store.go. - -There are **three** locations: - -1. **`GetCustomers()`** — in the `for rows.Next()` loop: -```go -// BEFORE: -c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt) - -// AFTER: -c.ReceivedAt = parseSQLiteTime(receivedAt) -``` - -2. **`GetCustomer()`** — after `row.Scan`: -```go -// BEFORE: -c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt) - -// AFTER: -c.ReceivedAt = parseSQLiteTime(receivedAt) -``` - -3. 
**`GetCustomerHistory()`** — in the `for rows.Next()` loop: -```go -// BEFORE: -c.ReceivedAt, _ = time.Parse("2006-01-02 15:04:05", receivedAt) - -// AFTER: -c.ReceivedAt = parseSQLiteTime(receivedAt) -``` - -**Step 3 (optional diagnostic):** Temporarily add a log line in `SaveReport` to see what format -SQLite actually stores/returns. This can be removed after verifying the fix: - -```go -// Add after the INSERT in SaveReport, before return: -// Debug: check what format SQLite returns -var dbTime string -s.db.QueryRow("SELECT received_at FROM reports WHERE customer_id = ? ORDER BY id DESC LIMIT 1", customerID).Scan(&dbTime) -s.logger.Printf("[DEBUG] SQLite received_at raw value: %q", dbTime) -``` - -### Verify - -After rebuilding and deploying the hub: -1. Wait for the next controller report push (or trigger manually) -2. Check hub.felhom.eu — status should show **OK** (green), not DOWN -3. Click into customer detail — "Last report: X min ago" should show a reasonable value -4. Report History timestamps should show actual times like `14:36:32`, not `00:00:00` -5. Check hub pod logs for any `[WARN] Could not parse timestamp` messages (should be none) - -### Post-fix grep - -```bash -grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/store.go -# Should return 0 results — all replaced with parseSQLiteTime() -``` +**Verification:** +1. Dashboard: colored dot (green/yellow/red) before customer name, no `&#x` text +2. Customer detail: colored dot in header +3. 
Colors match status (green=OK, yellow=WARN, red=DOWN) --- -## Bug 3: Backup page shows "Hiba" for all DB validations +## Build & Deploy -**Repository:** `deploy-felhom-compose` → `controller/` - -### Symptoms - -- All 3 databases (immich, paperless, romm) show "Hiba" in the Érvényesítés column -- The Állapot column shows "OK" (dump succeeded) -- No tooltip text on hover (meaning `Validation.Error` is empty) -- Dump files are valid — headers are correct, sizes are reasonable (43.2 MB / 319.6 KB / 38.7 KB) - -### Analysis - -The template condition for "Hiba" in the `LastDBDump` path is: -```html -{{if .Error}} → shows "–" (dump failed) -{{else if .Validation.Valid}} → shows "X tábla" (validation passed) -{{else}} → shows "Hiba" (THIS IS WHAT WE SEE) -``` - -"Hiba" with empty tooltip means `Validation.Valid == false` AND `Validation.Error == ""`. -This is the **zero-value** of `DumpValidation{}` — meaning validation was never assigned. - -The code in `DumpOne()` calls `ValidateDump()` and the code in `ListDumpFiles()` also calls -`ValidateDump()`. Both paths should populate the Validation field. Yet the UI shows zero-value. - -**Most likely cause:** The `lastDBDump` state was populated by an older code version (before -validation was wired), OR there's a race condition where `RefreshCache` captures `lastDBDump` -mid-construction, OR the validation ran but hit an unexpected issue (permissions, encoding). - -### Diagnostic step (run on demo-felhom FIRST) - -Before applying fixes, check the controller logs to understand what happened: +After all changes, commit and deploy hub v0.1.3: ```bash -# Check the last DB dump run -sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "db dump|table|valid|dump:" +# 1. 
Commit +cd /e/git/felhom.eu +git add -A && git commit -m "add CLAUDE.md, .gitignore, fix statusIcon rendering" && git push -# Check if there was a controller restart -sudo journalctl -u felhom-controller --since "2026-02-16 00:00" | grep -iE "starting|version|shutdown" +# 2. Build +ssh kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh 0.1.3 --push" -# Check if the old bash systemd timer is ALSO running (double-dump conflict!) -systemctl is-active backup-db-dump.timer -systemctl list-timers | grep backup +# 3. Deploy +ssh kisfenyo@192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:0.1.3" + +# 4. Verify +ssh kisfenyo@192.168.0.180 "sudo kubectl rollout status -n felhom-system deploy/hub && sudo kubectl logs -n felhom-system -l app=hub --tail 5" ``` -**IMPORTANT:** If `backup-db-dump.timer` is still active, it will race with the controller's -built-in `db-dump` scheduler job. Both write to the same directory. The bash script overwrites -files directly (no `.tmp` + rename), which could corrupt the file mid-validation. 
**Disable it:** +## Post-deploy checklist -```bash -sudo systemctl stop backup-db-dump.timer -sudo systemctl disable backup-db-dump.timer -``` - -### Fix 1: Add debug logging to `ValidateDump` - -**File:** `controller/internal/backup/dbdump.go`, function `ValidateDump` - -Add a log parameter and diagnostic output so we can see what's happening: - -```go -// BEFORE: -func ValidateDump(filePath string, dbType DBType) DumpValidation { - -// AFTER: -func ValidateDump(filePath string, dbType DBType) DumpValidation { - log.Printf("[DEBUG] ValidateDump: %s (type=%s)", filePath, dbType) -``` - -And at the end, before `return v`: - -```go - v.Valid = true - log.Printf("[DEBUG] ValidateDump OK: %s — %d tables, header found", filePath, tableCount) - return v -} -``` - -Also add logging to the error paths: - -After `v.Error = "dump file too small (< 100 bytes)"`: -```go -log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error) -``` - -After `v.Error = fmt.Sprintf("read failed: %v", err)`: -```go -log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error) -``` - -After `v.Error = "... dump missing comment header"`: -```go -log.Printf("[WARN] ValidateDump FAIL: %s — %s", filePath, v.Error) -``` - -After `v.Error = "no CREATE TABLE statements found"`: -```go -log.Printf("[WARN] ValidateDump FAIL: %s — %s (header was found, scanned %d lines)", filePath, v.Error, len(strings.Split(content, "\n"))) -``` - -Note: Import `"log"` at the top of the file if not already imported (use the standard `log` -package, not the `*log.Logger` parameter — this is a quick debug addition. Can be cleaned up later.) - -### Fix 2: Template guard against zero-value Validation - -Even with debug logging, we should make the template resilient to zero-value Validation. -The "Hiba" label with no explanation is a bad UX. 
- -**File:** `controller/internal/web/templates/backups.html` - -In the `LastDBDump` section, change the Érvényesítés (validation) column: - -```html - -{{if .Error}} - -{{else if .Validation.Valid}} - {{.Validation.TableCount}} tábla -{{else}} - Hiba -{{end}} - - -{{if .Error}} - -{{else if .Validation.Valid}} - {{.Validation.TableCount}} tábla -{{else if .Validation.Error}} - Hiba -{{else}} - -{{end}} -``` - -This ensures: -- If validation passed → green badge with table count -- If validation failed with a reason → red "Hiba" with tooltip -- If validation never ran (zero-value) → gray "–" with explanatory tooltip - -### Fix 3: Re-validate on cache refresh (belt-and-suspenders) - -Since `RefreshCache` already calls `ListDumpFiles()` which runs `ValidateDump()` per file, -the `DumpFiles` fallback always has fresh validation. The issue is only in the `LastDBDump` -path when in-memory results have stale/missing validation. - -Add a cross-check: if `LastDBDump` results have zero-value Validation but the file exists, -re-validate it. Add this in `RefreshCache`, after the existing code: - -**File:** `controller/internal/backup/backup.go`, function `RefreshCache` - -After the line `status.DumpFiles = files` and before the lock section, add: - -```go - // Cross-check: if LastDBDump results have empty validation but files exist, - // re-validate from disk. This handles controller restarts and race conditions. 
- if m.lastDBDump != nil { - fileValidation := make(map[string]DumpValidation) // keyed by filename - for _, f := range files { - fileValidation[f.FileName] = f.Validation - } - for i, r := range m.lastDBDump.Results { - if !r.Validation.Valid && r.Validation.Error == "" && r.FilePath != "" { - filename := filepath.Base(r.FilePath) - if fv, ok := fileValidation[filename]; ok { - m.lastDBDump.Results[i].Validation = fv - m.logger.Printf("[INFO] Re-validated %s from disk: valid=%v tables=%d", - filename, fv.Valid, fv.TableCount) - } - } - } - } -``` - -Note: Add `"path/filepath"` to imports if not already present. - -This runs every 5 minutes (same cadence as the cache refresh) and will automatically -heal any stale validation state in `lastDBDump` by cross-referencing the fresh -`ListDumpFiles` results. - -### Fix 4: Disable conflicting systemd timer (manual step) - -If the diagnostic step above reveals that `backup-db-dump.timer` is still active: - -```bash -sudo systemctl stop backup-db-dump.timer -sudo systemctl disable backup-db-dump.timer -# Optionally verify: -systemctl list-timers | grep backup -# Should show nothing -``` - -The controller's built-in `db-dump` scheduler job at 02:30 replaces this timer entirely. -Having both run simultaneously can corrupt dump files mid-write. - -### Verify - -After deploying fixes: -1. Wait for cache refresh (5 minutes) or trigger a manual backup ("Mentés most") -2. Check `/backups` page — validation column should show "X tábla" for all databases -3. Check controller logs for `[DEBUG] ValidateDump` lines confirming validation ran -4. 
Verify no `[WARN] ValidateDump FAIL` lines in logs - ---- - -## Post-fix checklist - -### Hub (felhom.eu repo → hub/) -- [ ] `grep -rn 'time.Parse("2006-01-02 15:04:05"' hub/internal/store/` → 0 results -- [ ] `parseSQLiteTime` function exists in store.go -- [ ] `go build ./cmd/hub/` succeeds -- [ ] `go vet ./...` passes -- [ ] Build new image, deploy to k3s -- [ ] hub.felhom.eu shows OK status for demo-felhom -- [ ] Report history shows real timestamps - -### Controller (deploy-felhom-compose repo → controller/) -- [ ] Template has 4-branch validation check (Valid / Error / zero-value guard) -- [ ] `RefreshCache` has cross-check re-validation logic -- [ ] `ValidateDump` has debug logging -- [ ] `backup-db-dump.timer` is disabled on demo-felhom -- [ ] `go build ./cmd/controller/` succeeds -- [ ] `go vet ./...` passes -- [ ] Build, deploy to demo-felhom -- [ ] Backup page shows table counts, not "Hiba" -- [ ] Controller logs show `[DEBUG] ValidateDump OK` entries - -### Version bumps -- Hub: bump to next patch version -- Controller: include in v0.6.1 release (alongside the code review fixes from the other TASK.md) \ No newline at end of file +- [ ] `hub.felhom.eu` shows colored `●` dot, not literal `&#x1F7E2;` text +- [ ] `hub.exe` no longer in repo (`git ls-files hub/hub.exe` returns empty) +- [ ] `CLAUDE.md` exists in repo root +- [ ] `.gitignore` exists in repo root \ No newline at end of file
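
The escaping behavior behind the statusIcon fix in Task 4 can be reproduced in isolation. The following is a minimal sketch, not the hub's template code: the helper `render` and the one-element `<span>` template are made up for the demonstration. It shows that a plain string holding an HTML entity has its `&` escaped by `html/template`, while a plain Unicode character such as `●` passes through unchanged.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// render is a hypothetical helper for this demo. It executes a tiny
// html/template with the given icon string in a text context and returns
// the generated HTML.
func render(icon string) string {
	tmpl := template.Must(template.New("icon").Parse(`<span>{{.}}</span>`))
	var sb strings.Builder
	if err := tmpl.Execute(&sb, icon); err != nil {
		return "error: " + err.Error()
	}
	return sb.String()
}

func main() {
	// The entity string is escaped, so a browser displays it as literal text.
	fmt.Println(render("&#x1F7E2;")) // <span>&amp;#x1F7E2;</span>
	// A plain Unicode character needs no escaping and renders as a dot.
	fmt.Println(render("●")) // <span>●</span>
}
```

Returning `template.HTML` from the function would also stop the escaping, but the emoji would still ignore CSS `color`, which is presumably why the task switches to `●` instead.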