# TASK: Fix startup hub report - Push() silently swallows errors (v0.15.5)

## Problem

The startup hub report exists but silently fails. On the latest deployment, the controller tried to push a report 5 seconds after boot, but the hub returned HTTP 503 (it was still starting up). `Push()` always returns `nil` by design, so `main.go` logged `[INFO] Startup hub report sent` even though the push actually failed. The hub shows stale data until the first scheduled report fires (15 minutes later).

Evidence from logs:

```
09:46:47 [INFO] Hub reporting enabled (every 15m0s to https://hub.felhom.eu)
09:47:02 [WARN] Hub report push failed after 3 attempts: HTTP 503     ← Push() logged this internally
09:47:02 [INFO] Startup hub report sent                                ← main.go logged "sent" because Push() returned nil
```

The hub pod only became ready at 09:47:02, the same second `Push()` gave up.

## Root cause

`Push()` in `pusher.go` (line 39-86) has comment: "Never returns error to caller — push failures should not affect controller operation." It always returns `nil`. The startup code in `main.go` checks `err` from `Push()` but it's always nil, so it always takes the success branch.

The scheduler (`scheduler.go:223`) already handles errors from `JobFunc` gracefully: it logs the error and continues. So returning real errors from `Push()` is safe for scheduled calls too.

## Fix

### Step 1: Make `Push()` return actual errors

**File:** `controller/internal/report/pusher.go`

Change `Push()` to return the real error instead of always `nil`:

**Current** (line 38-86):

```go
// Push sends a report to the hub. Retries 3 times with 5s backoff.
// Never returns error to caller — push failures should not affect controller operation.
func (p *Pusher) Push(report *Report) error {
	if !p.enabled {
		return nil
	}

	data, err := json.Marshal(report)
	if err != nil {
		p.logger.Printf("[WARN] Hub report marshal failed: %v", err)
		return nil
	}

	url := p.hubURL + "/api/v1/report"

	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		if attempt > 0 {
			time.Sleep(5 * time.Second)
		}

		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
		if err != nil {
			lastErr = err
			continue
		}
		req.Header.Set("Content-Type", "application/json")
		if p.apiKey != "" {
			req.Header.Set("Authorization", "Bearer "+p.apiKey)
		}

		resp, err := p.httpClient.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()

		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
			p.logger.Printf("[INFO] Hub report pushed successfully (%d bytes)", len(data))
			return nil
		}
		lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
	}

	p.logger.Printf("[WARN] Hub report push failed after 3 attempts: %v", lastErr)
	return nil
}
```

**Replace with:**

```go
// Push sends a report to the hub, making up to 3 attempts 5s apart.
// It returns the last error if every attempt fails.
func (p *Pusher) Push(report *Report) error {
	if !p.enabled {
		return nil
	}

	data, err := json.Marshal(report)
	if err != nil {
		return fmt.Errorf("marshal report: %w", err)
	}

	url := p.hubURL + "/api/v1/report"

	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		if attempt > 0 {
			time.Sleep(5 * time.Second)
		}

		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
		if err != nil {
			lastErr = err
			continue
		}
		req.Header.Set("Content-Type", "application/json")
		if p.apiKey != "" {
			req.Header.Set("Authorization", "Bearer "+p.apiKey)
		}

		resp, err := p.httpClient.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()

		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
			p.logger.Printf("[INFO] Hub report pushed successfully (%d bytes)", len(data))
			return nil
		}
		lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
	}

	return fmt.Errorf("hub push failed after 3 attempts: %w", lastErr)
}
```

Changes:

- Removed "Never returns error" comment
- Marshal error: return wrapped error instead of logging + nil
- After retries exhausted: return error instead of logging + nil
- Success path: unchanged (returns nil)

This is safe because:

- The scheduler (`executeJob` in `scheduler.go:223-235`) already catches and logs errors from `JobFunc`
- The startup code in `main.go` already checks `err`; it just never saw one before

### Step 2: Add startup retry with longer delay

**File:** `controller/cmd/controller/main.go`

The startup goroutine (starting at ~line 270) sends the hub report once. If Push() fails (hub not ready), it should retry a few times with delay. The hub typically takes 10-15 seconds to start.
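To judge how much hub startup time a nested retry covers, the attempt schedule can be worked out ahead of time. The sketch below is only an illustration: `attemptOffsets` is a hypothetical helper, not part of the controller, and it assumes the values used in this task (first push about 5 seconds after boot, `Push()` making 3 attempts 5 seconds apart, and the 3 startup attempts 15 seconds apart proposed in this step). It ignores the time each HTTP request takes.

```go
package main

import "time"

// attemptOffsets is a hypothetical helper (not controller code). It returns the
// time after boot at which each HTTP attempt would start when an outer retry
// loop wraps a call that retries internally. Request latency is ignored.
func attemptOffsets(initial time.Duration, outer int, outerDelay time.Duration, inner int, innerDelay time.Duration) []time.Duration {
	var offsets []time.Duration
	t := initial
	for o := 0; o < outer; o++ {
		if o > 0 {
			t += outerDelay // wait between failed Push() calls
		}
		for i := 0; i < inner; i++ {
			if i > 0 {
				t += innerDelay // backoff inside a single Push() call
			}
			offsets = append(offsets, t)
		}
	}
	return offsets
}
```

Under those assumptions, `attemptOffsets(5*time.Second, 3, 15*time.Second, 3, 5*time.Second)` yields nine attempts starting at 5s, 10s, 15s, 30s, 35s, 40s, 55s, 60s, and 65s after boot.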
**Current** (~line 289-297):

```go
// Hub report
if hubPusher != nil {
	if cfg.Hub.Enabled {
		r := report.BuildReport(cfg, stackMgr, backupMgr, cpuCollector, metricsStore, Version, sett.GetStoragePaths())
		if err := hubPusher.Push(r); err != nil {
			logger.Printf("[WARN] Startup hub report failed: %v", err)
		} else {
			logger.Println("[INFO] Startup hub report sent")
		}
	} else {
```

**Replace the `if cfg.Hub.Enabled` block** (keep the `else` disabled-notification branch unchanged):

```go
// Hub report
if hubPusher != nil {
	if cfg.Hub.Enabled {
		r := report.BuildReport(cfg, stackMgr, backupMgr, cpuCollector, metricsStore, Version, sett.GetStoragePaths())
		var pushErr error
		for attempt := 1; attempt <= 3; attempt++ {
			pushErr = hubPusher.Push(r)
			if pushErr == nil {
				logger.Println("[INFO] Startup hub report sent")
				break
			}
			logger.Printf("[WARN] Startup hub report attempt %d/3 failed: %v", attempt, pushErr)
			if attempt < 3 {
				time.Sleep(15 * time.Second)
			}
		}
		if pushErr != nil {
			logger.Printf("[WARN] Startup hub report failed after 3 attempts — next scheduled push in %s", cfg.Hub.PushInterval)
		}
	} else {
```

This gives the hub roughly a minute to come up. The first request goes out about 5 seconds after boot, each `Push()` call makes 3 attempts 5 seconds apart (about 10 seconds per call), and the startup loop waits 15 seconds between calls, so the last attempt fires about 65 seconds after boot, not counting the time each request takes.

**IMPORTANT:** The `else` branch (disabled notification via `PushOnce`) stays as-is; no changes are needed there.

---

## Summary of changes

| File | Change |
|------|--------|
| `controller/internal/report/pusher.go` | `Push()` returns actual errors instead of always nil |
| `controller/cmd/controller/main.go` | Startup hub push retries 3 times with 15s delay between attempts |

Only **2 files** changed. No new types, no new methods, no template changes.

---

## Build & Deploy

```bash
SSH=/c/Windows/System32/OpenSSH/ssh.exe

# 1. Commit & push
cd e:/git/deploy-felhom-compose
git add -A && git commit -m "v0.15.5: Fix startup hub report — Push() returns real errors, startup retries" && git push

# 2. Build
$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-controller && git -C ~/git/deploy-felhom-compose pull && ./build.sh v0.15.5 --push"

# 3. Deploy
$SSH kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && sudo docker pull gitea.dooplex.hu/admin/felhom-controller:v0.15.5 && sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:v0.15.5|' docker-compose.yml && sudo docker compose up -d"

# 4. Verify — look for successful startup push
$SSH kisfenyo@192.168.0.162 "sleep 10 && docker logs felhom-controller --tail 15 2>&1 | grep -i hub"
```

### Compile check

Always run `go build ./...` in `controller/` before committing.

## Documentation

Add a CHANGELOG.md entry. Read the first 30 lines for format, then insert a new entry:

```markdown
### vX.X.X (2026-02-19 session XX)
- **v0.15.5 — Fix startup hub report silently failing:** `Push()` now returns actual errors instead of always nil. Previously, push failures were logged internally but the caller could never detect them, leading to a misleading "Startup hub report sent" log even when the push failed (e.g., hub returning HTTP 503 during simultaneous deployment). Startup hub push now retries 3 times with 15-second delays between attempts, giving the hub time to come up when both are deployed together. Each startup attempt uses Push()'s own 3-attempt retry logic internally. **Files modified (2):** `internal/report/pusher.go`, `cmd/controller/main.go`
```

Update version in `C:\Users\User\.claude\projects\e--git\memory\MEMORY.md` to `v0.15.5`.

## Verification

After deploying v0.15.5:

1. Check logs: `docker logs felhom-controller 2>&1 | grep -i hub`
   - Should show `[INFO] Startup hub report sent` (success)
   - OR `[WARN] Startup hub report attempt 1/3 failed: ...` followed by eventual success
2. Check hub dashboard at `hub.felhom.eu`: it should show fresh data with a current timestamp
3. If the hub is deployed at the same time, the retries should handle the delay
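The error-returning behavior can also be checked locally, without a hub, before deploying. The following is a minimal sketch rather than the controller's code: `push` copies the retry loop of the revised `Push()` into a standalone function with hypothetical `attempts` and `backoff` parameters, and `fakeHub` is a made-up stub transport that answers HTTP 503 a given number of times before returning 200. The URL is a placeholder that is never resolved.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// roundTripFunc lets a plain function act as an http.RoundTripper, so the
// sketch needs neither a network nor a running hub.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeHub (hypothetical) returns a client whose first `failures` responses are
// HTTP 503 and whose later responses are HTTP 200, plus a request counter.
func fakeHub(failures int) (*http.Client, *int) {
	calls := 0
	rt := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		calls++
		status := http.StatusOK
		if calls <= failures {
			status = http.StatusServiceUnavailable
		}
		return &http.Response{
			StatusCode: status,
			Body:       io.NopCloser(strings.NewReader("")),
			Header:     make(http.Header),
			Request:    r,
		}, nil
	})
	return &http.Client{Transport: rt}, &calls
}

// push mirrors the retry loop of the revised Push(): it returns nil on the
// first 2xx response and a wrapped error once all attempts are used up.
func push(client *http.Client, url string, data []byte, attempts int, backoff time.Duration) error {
	var lastErr error
	for attempt := 0; attempt < attempts; attempt++ {
		if attempt > 0 {
			time.Sleep(backoff)
		}
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
		if err != nil {
			lastErr = err
			continue
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := client.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
			return nil
		}
		lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
	}
	return fmt.Errorf("hub push failed after %d attempts: %w", attempts, lastErr)
}
```

With `fakeHub(3)` and three attempts, `push` returns the error `hub push failed after 3 attempts: HTTP 503`, which is what the startup loop needs to see in order to retry. With `fakeHub(1)` it returns `nil` on the second attempt.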