Major rewrite of scripts/docker-setup.sh (v5.0)

2026-02-19 11:12:39 +01:00
parent 00c668fc92
commit 5d993b66a2
2 changed files with 500 additions and 235 deletions
+497 -231
@@ -1,246 +1,512 @@
# TASK: Fix startup hub report — Push() silently swallows errors (v0.15.5)
# TASK: Major rewrite of `scripts/docker-setup.sh` (v5.0)
## Problem
## Overview
The startup hub report exists but silently fails. On the latest deployment, the controller tried to push a report 5 seconds after boot, but the hub returned HTTP 503 (it was still starting up). `Push()` always returns `nil` by design, so `main.go` logged `[INFO] Startup hub report sent` even though the push actually failed. The hub shows stale data until the first scheduled report fires (15 minutes later).
Rewrite `docker-setup.sh` to bring it up to date with the current Felhom architecture.
The script should now be a complete end-to-end provisioning tool: install infrastructure,
run an interactive configuration wizard, generate `controller.yaml`, deploy FileBrowser
as a protected stack, and deploy felhom-controller — all in one run.
Evidence from logs:
```
09:46:47 [INFO] Hub reporting enabled (every 15m0s to https://hub.felhom.eu)
09:47:02 [WARN] Hub report push failed after 3 attempts: HTTP 503 ← Push() logged this internally
09:47:02 [INFO] Startup hub report sent ← main.go logged "sent" because Push() returned nil
```
The hub pod only became ready at 09:47:02 — the same second Push() gave up.
## Root cause
`Push()` in `pusher.go` (lines 39-86) carries the comment: "Never returns error to caller — push failures should not affect controller operation." It always returns `nil`. The startup code in `main.go` checks `err` from `Push()`, but since it is always nil, it always takes the success branch.
The scheduler (`scheduler.go:223`) already handles errors from `JobFunc` gracefully — it logs the error and continues. So returning real errors from `Push()` is safe for scheduled calls too.
## Fix
### Step 1: Make `Push()` return actual errors
**File:** `controller/internal/report/pusher.go`
Change `Push()` to return the real error instead of always `nil`:
**Current** (line 38-86):
```go
// Push sends a report to the hub. Retries 3 times with 5s backoff.
// Never returns error to caller — push failures should not affect controller operation.
func (p *Pusher) Push(report *Report) error {
	if !p.enabled {
		return nil
	}
	data, err := json.Marshal(report)
	if err != nil {
		p.logger.Printf("[WARN] Hub report marshal failed: %v", err)
		return nil
	}
	url := p.hubURL + "/api/v1/report"
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		if attempt > 0 {
			time.Sleep(5 * time.Second)
		}
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
		if err != nil {
			lastErr = err
			continue
		}
		req.Header.Set("Content-Type", "application/json")
		if p.apiKey != "" {
			req.Header.Set("Authorization", "Bearer "+p.apiKey)
		}
		resp, err := p.httpClient.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
			p.logger.Printf("[INFO] Hub report pushed successfully (%d bytes)", len(data))
			return nil
		}
		lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
	}
	p.logger.Printf("[WARN] Hub report push failed after 3 attempts: %v", lastErr)
	return nil
}
```
**Replace with:**
```go
// Push sends a report to the hub. Retries 3 times with 5s backoff.
func (p *Pusher) Push(report *Report) error {
	if !p.enabled {
		return nil
	}
	data, err := json.Marshal(report)
	if err != nil {
		return fmt.Errorf("marshal report: %w", err)
	}
	url := p.hubURL + "/api/v1/report"
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		if attempt > 0 {
			time.Sleep(5 * time.Second)
		}
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
		if err != nil {
			lastErr = err
			continue
		}
		req.Header.Set("Content-Type", "application/json")
		if p.apiKey != "" {
			req.Header.Set("Authorization", "Bearer "+p.apiKey)
		}
		resp, err := p.httpClient.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
			p.logger.Printf("[INFO] Hub report pushed successfully (%d bytes)", len(data))
			return nil
		}
		lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
	}
	return fmt.Errorf("hub push failed after 3 attempts: %w", lastErr)
}
```
Changes:
- Removed "Never returns error" comment
- Marshal error: return wrapped error instead of logging + nil
- After retries exhausted: return error instead of logging + nil
- Success path: unchanged (returns nil)
This is safe because:
- The scheduler (`executeJob` in `scheduler.go:223-235`) already catches and logs errors from `JobFunc`
- The startup code in `main.go` already checks `err` — it just never saw one before
### Step 2: Add startup retry with longer delay
**File:** `controller/cmd/controller/main.go`
The startup goroutine (starting at ~line 270) sends the hub report once. If Push() fails (hub not ready), it should retry a few times with delay. The hub typically takes 10-15 seconds to start.
**Current** (~line 289-297):
```go
// Hub report
if hubPusher != nil {
	if cfg.Hub.Enabled {
		r := report.BuildReport(cfg, stackMgr, backupMgr, cpuCollector, metricsStore, Version, sett.GetStoragePaths())
		if err := hubPusher.Push(r); err != nil {
			logger.Printf("[WARN] Startup hub report failed: %v", err)
		} else {
			logger.Println("[INFO] Startup hub report sent")
		}
	} else {
```
**Replace the `if cfg.Hub.Enabled` block** (keep the `else` disabled-notification branch unchanged):
```go
// Hub report
if hubPusher != nil {
	if cfg.Hub.Enabled {
		r := report.BuildReport(cfg, stackMgr, backupMgr, cpuCollector, metricsStore, Version, sett.GetStoragePaths())
		var pushErr error
		for attempt := 1; attempt <= 3; attempt++ {
			pushErr = hubPusher.Push(r)
			if pushErr == nil {
				logger.Println("[INFO] Startup hub report sent")
				break
			}
			logger.Printf("[WARN] Startup hub report attempt %d/3 failed: %v", attempt, pushErr)
			if attempt < 3 {
				time.Sleep(15 * time.Second)
			}
		}
		if pushErr != nil {
			logger.Printf("[WARN] Startup hub report failed after 3 attempts — next scheduled push in %s", cfg.Hub.PushInterval)
		}
	} else {
```
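This gives the hub roughly a minute to come up: the 5s initial delay, then three `Push()` calls that each retry internally (two 5s backoffs, about 10s per call), with a 15s wait between calls. That adds up to about 65 seconds of waiting before the last request is sent, plus the time spent on the HTTP requests themselves. The `else` branch for disabled notifications stays unchanged.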
**IMPORTANT:** The `else` branch (disabled notification via `PushOnce`) stays as-is — no changes needed there.
**Read the entire current `scripts/docker-setup.sh` before starting. This is a rewrite
of an existing ~1600-line script, not a new file.**
---
## Summary of changes
## Changes Required
| File | Change |
|------|--------|
| `controller/internal/report/pusher.go` | `Push()` returns actual errors instead of always nil |
| `controller/cmd/controller/main.go` | Startup hub push retries 3 times with 15s delay between attempts |
### 1. Update banner and version
Only **2 files** changed. No new types, no new methods, no template changes.
- Set `SCRIPT_VERSION="5.0.0"`
- Update `print_banner()` — no Portainer, the title should be:
```
Felhom Infrastructure Setup v5.0.0
```
- Update the comment header block at the top of the file to match the new scope
(Docker + Traefik + FileBrowser + Controller + configuration wizard).
- Update `print_help()` to reflect all removed/changed options.
### 2. Remove Portainer (confirm clean)
The current script has no Portainer code (already removed in a prior version).
Just make sure there are zero references to "portainer" or "Portainer" anywhere —
banner, comments, help text, variables. Search and confirm.
### 3. Remove `--cf-tunnel-token` CLI option
**Remove** the `--cf-tunnel-token` CLI flag and the `CF_TUNNEL_TOKEN` variable from
`parse_args()`. The Cloudflare tunnel token is now collected by the configuration wizard
and written into `controller.yaml` (see §7 below). The `install_cloudflare_tunnel()`
function stays but reads the token from the wizard variable instead of a CLI flag.
Also remove `--hdd-path` CLI option and `HDD_PATH` variable — deprecated.
Keep these CLI options (still useful for non-interactive/scripted runs):
- `--ip`, `--gateway`, `--dns`, `--interface` (network config)
- `--domain`, `--email`, `--cf-token` (TLS/domain — can pre-seed wizard)
- `--customer` (customer ID — can pre-seed wizard)
- `--traefik-password`, `--self-signed-cert`
- `--skip-filebrowser`
- `--dry-run`, `--debug`, `--help`, `--bootstrap`
### 4. Remove `--hdd-path` references
Remove `HDD_PATH` variable, `--hdd-path` argument parsing, and all references.
FileBrowser mounts are determined by the wizard (system_data_path and any existing
`/mnt/*` mounts).
### 5. FileBrowser deployment as protected stack
The current `install_filebrowser()` function needs to be rewritten:
**Location:** Deploy to `/opt/docker/stacks/filebrowser/` (already the current
`FILEBROWSER_DIR` — keep this).
**Compose file:** Generate a compose file matching the current production layout
on the demo node. Key differences from current script template:
```yaml
services:
  filebrowser:
    image: gtstef/filebrowser:latest
    container_name: filebrowser
    restart: unless-stopped
    environment:
      - TZ=Europe/Budapest
    volumes:
      - filebrowser_data:/home/filebrowser/data
      # Mount discovered drives — populated by wizard
      # e.g. /mnt/hdd_1:/srv/hdd_1, /mnt/sys_drive:/srv/sys_drive
    networks:
      - traefik-public
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:80/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.filebrowser.rule=Host(`files.<DOMAIN>`)"
      - "traefik.http.routers.filebrowser.entrypoints=websecure"
      - "traefik.http.routers.filebrowser.tls=true"
      - "traefik.http.services.filebrowser.loadbalancer.server.port=80"
      - "traefik.docker.network=traefik-public"
```
**Drive discovery for volumes:** The wizard (§7) collects `system_data_path`.
Additionally, scan `/mnt/` for existing mount points at install time. For each
discovered mount (e.g., `/mnt/hdd_1`, `/mnt/sys_drive`), add a volume mapping:
`/mnt/<name>:/srv/<name>`. If no mounts found, only mount the `system_data_path`.
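The discovery rule above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `list_volume_mappings` and `is_mounted` are hypothetical names, the use of `mountpoint -q` as the mount check is an assumption, and mapping the fallback path to `/srv/<basename>` is also an assumption.

```bash
# Hypothetical seam for the mount check; by default it relies on mountpoint(1).
is_mounted() { mountpoint -q "$1"; }

# Print one compose volume line per mount found under a base directory.
# $1: base directory to scan (default /mnt)
# $2: fallback path (system_data_path) used when nothing is found
list_volume_mappings() {
    local base="${1:-/mnt}" fallback="$2" found=0 dir name
    for dir in "$base"/*/; do
        [ -d "$dir" ] || continue          # the glob stays literal when nothing matches
        dir="${dir%/}"
        is_mounted "$dir" || continue      # skip plain directories that are not mounts
        name="${dir##*/}"
        printf '      - %s:/srv/%s\n' "$dir" "$name"
        found=1
    done
    if [ "$found" -eq 0 ] && [ -n "$fallback" ]; then
        printf '      - %s:/srv/%s\n' "$fallback" "${fallback##*/}"
    fi
}
```

The output lines are indented to drop straight into the `volumes:` list of the generated compose file, for example `list_volume_mappings /mnt "$SYSTEM_DATA_PATH"`.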
**Hardcode domain** in the Traefik host rule (no `${DOMAIN}` env var needed).
Use the wizard's domain value directly: `Host(\`files.ACTUAL-DOMAIN\`)`.
**Also generate `.felhom.yml`** metadata file — keep the existing one from the
current script (Hungarian text, category: storage, etc.).
**No `.env` file needed** for filebrowser (domain is hardcoded in compose labels).
### 6. Controller deployment (NEW step)
Add a new step to deploy felhom-controller. This is currently missing from the
script — the user had to deploy it manually.
**Location:** `/opt/docker/felhom-controller/`
**docker-compose.yml** — generate matching the current production layout:
```yaml
services:
  felhom-controller:
    image: gitea.dooplex.hu/admin/felhom-controller:latest
    container_name: felhom-controller
    restart: unless-stopped
    privileged: true
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/docker/felhom-controller/controller.yaml:/opt/docker/felhom-controller/controller.yaml:ro
      - controller-data:/opt/docker/felhom-controller/data
      - /opt/docker/stacks:/opt/docker/stacks
      - /srv/backups:/srv/backups
      - type: bind
        source: /mnt
        target: /mnt
        bind:
          propagation: rshared
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /etc/hostname:/host/etc/hostname:ro
      - /dev:/host-dev:rw
      - /etc/fstab:/host-fstab
      - /run/udev:/run/udev:ro
    environment:
      - TZ=Europe/Budapest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.controller.rule=Host(`felhom.<DOMAIN>`)"
      - "traefik.http.routers.controller.entrypoints=websecure"
      - "traefik.http.routers.controller.tls=true"
      - "traefik.http.services.controller.loadbalancer.server.port=8080"
      - "traefik.docker.network=traefik-public"
      - "felhom.managed=true"
      - "felhom.component=controller"
    networks:
      - traefik-public
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/health"]
      interval: 30s
      timeout: 5s
      start_period: 10s
      retries: 3
volumes:
  controller-data:
networks:
  traefik-public:
    external: true
```
**Hardcode domain** in Traefik labels (like filebrowser).
**No `.env` file:** Skip the `.env` file entirely. Compose doesn't need it because
the domain is hardcoded in the labels.
**Use `latest` tag** for the image. The controller has self-update capability
so it will manage its own version after initial deployment.
**Pull and start** the controller, then verify health via the healthcheck endpoint.
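One possible shape for the "verify health" part is a small polling helper. This is a sketch under assumptions: `wait_until` is a hypothetical name, and the attempt count and delay in the usage line are illustrative values, not requirements from this task. Only the `/api/health` URL comes from the compose healthcheck above.

```bash
# Hypothetical helper: run a command repeatedly until it succeeds.
# $1: maximum number of attempts, $2: seconds to sleep between attempts,
# remaining arguments: the probe command to run.
wait_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # No point sleeping after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (assumed values): poll the controller for up to about a minute.
# wait_until 12 5 curl -fsS http://localhost:8080/api/health >/dev/null
```

Keeping the probe as a parameter means the same helper can wait on FileBrowser or any other service without duplicating the loop.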
### 7. Configuration wizard for `controller.yaml`
Add an interactive wizard function `run_config_wizard()` that runs AFTER
infrastructure setup but BEFORE deploying the controller. It generates
`/opt/docker/felhom-controller/controller.yaml`.
**CLI pre-seeding:** If `--domain`, `--customer`, `--email`, `--cf-token` are
provided via CLI, use them as defaults in the wizard (user can still change).
**Wizard flow** (each question is a `read -p` prompt with a default shown in brackets):
```
===========================================================
Felhom Controller Configuration Wizard
===========================================================
--- Customer identity ---
Customer ID [demo-felhom]: _
Customer display name [Demo Ügyfél]: _
Domain [homeserver.local]: _
Customer email (optional) []: _
--- Infrastructure secrets ---
Cloudflare Tunnel token (optional, leave empty to skip) []: _
Cloudflare API token (for DNS-01 certs, optional) []: _
--- Paths ---
System data partition mount point
(if the system drive was partitioned for user data,
provide the mount point, e.g., /mnt/sys_drive)
System data path [/mnt/sys_drive]: _
--- Dashboard password ---
Set a password for the controller dashboard?
(leave empty for first-visit setup prompt)
Dashboard password []: _
--- Git sync ---
App catalog repository URL [https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git]: _
Git username []: _
Git token []: _
--- Healthcheck monitoring ---
Healthchecks.io ping UUIDs (leave empty to skip):
Heartbeat UUID []: _
System health UUID []: _
DB dump UUID []: _
Backup UUID []: _
Backup integrity UUID []: _
--- Ready ---
```
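Each prompt in the flow above could go through one small helper. The sketch below is illustrative only: `ask` is a hypothetical name, and it prints the prompt with `printf` to stderr rather than `read -p` so the answer can be captured with command substitution.

```bash
# Hypothetical helper: show "Label [default]: " and print the chosen value.
# An empty answer (plain Enter) selects the default.
ask() {
    local label="$1" default="$2" input=""
    printf '%s [%s]: ' "$label" "$default" >&2
    IFS= read -r input || true             # tolerate EOF on non-interactive stdin
    printf '%s\n' "${input:-$default}"
}

# Example: a value passed with --domain pre-seeds the default.
# DOMAIN="$(ask "Domain" "${DOMAIN:-homeserver.local}")"
```

Because the CLI value is only used as the default, the user can still override it at the prompt, which matches the pre-seeding behaviour described above.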
**Password hashing:** If user provides a dashboard password, hash it with bcrypt.
Use `htpasswd -bnBC 10 "" "PASSWORD" | tr -d ':'` or the `python3 -c` fallback.
Store the hash in `web.password_hash`.
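A minimal sketch of the `htpasswd` path follows. `hash_password` is a hypothetical name, and the `python3` fallback is deliberately left out here because its exact form is not specified in this task.

```bash
# Hypothetical helper: print a bcrypt hash for the given password.
# An empty password prints nothing, which leaves password_hash empty
# so the dashboard shows its first-visit setup prompt.
hash_password() {
    local password="$1"
    [ -n "$password" ] || return 0
    if ! command -v htpasswd >/dev/null 2>&1; then
        echo "htpasswd not found (apache2-utils)" >&2
        return 1
    fi
    # -b: take the password from the command line, -n: print to stdout,
    # -B: use bcrypt, -C 10: bcrypt cost factor.
    # The empty user name leaves a leading ':' that tr strips with the newlines.
    htpasswd -bnBC 10 "" "$password" | tr -d ':\n'
}
```

Note that `-b` puts the password on the command line, where it is briefly visible in the process list; that may be acceptable on a single-user node during setup, but it is worth keeping in mind.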
**Session secret:** Auto-generate: `openssl rand -hex 32`
**Hub config:** Always enabled, with the hardcoded API key:
```yaml
hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"
```
**Backup:** Keep `enabled: true` — the user confirmed it should stay for
troubleshooting purposes.
**hdd_path:** Do NOT include in generated config. It's deprecated. Remove it
from the template entirely.
**Full template** — write this to `/opt/docker/felhom-controller/controller.yaml`:
```yaml
# Felhom Controller Configuration
# Generated by docker-setup.sh v5.0.0 on <DATE>
customer:
  id: "<CUSTOMER_ID>"
  name: "<CUSTOMER_NAME>"
  domain: "<DOMAIN>"
  email: "<EMAIL>"
  telegram_chat_id: ""
infrastructure:
  cf_tunnel_token: "<CF_TUNNEL_TOKEN>"
  cf_api_token: "<CF_API_TOKEN>"
paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  system_data_path: "<SYSTEM_DATA_PATH>"
system:
  reserved_memory_mb: 384
web:
  listen: ":8080"
  password_hash: "<BCRYPT_HASH_OR_EMPTY>"
  session_secret: "<AUTO_GENERATED_HEX>"
git:
  repo_url: "<GIT_REPO_URL>"
  branch: "main"
  sync_interval: "15m"
  username: "<GIT_USERNAME>"
  token: "<GIT_TOKEN>"
stacks:
  protected:
    - "traefik"
    - "cloudflared"
    - "felhom-controller"
    - "filebrowser"
  update_window: "03:00-05:00"
  compose_command: ""
backup:
  enabled: true
  restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
  db_dump_schedule: "02:30"
  restic_schedule: "03:00"
  retention:
    keep_daily: 7
    keep_weekly: 4
    keep_monthly: 6
  prune_schedule: "weekly"
monitoring:
  enabled: true
  healthchecks_base: "https://status.felhom.eu"
  ping_uuids:
    heartbeat: "<HEARTBEAT_UUID>"
    system_health: "<SYSTEM_HEALTH_UUID>"
    db_dump: "<DB_DUMP_UUID>"
    backup: "<BACKUP_UUID>"
    backup_integrity: "<BACKUP_INTEGRITY_UUID>"
  system_health_interval: "5m"
  health_check_schedule: "06:00"
  thresholds:
    disk_warn_percent: 80
    disk_crit_percent: 90
    backup_max_age_hours: 36
    cpu_warn_percent: 90
    memory_warn_percent: 85
    temperature_warn_celsius: 75
hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"
self_update:
  enabled: true
  check_interval: "6h"
  image: "gitea.dooplex.hu/admin/felhom-controller"
  auto_update: false
  health_timeout_seconds: 60
notifications:
  customer_events:
    - "disk_warning"
    - "backup_failed"
    - "update_available"
    - "security_update"
  operator_events:
    - "disk_critical"
    - "backup_failed"
    - "self_update_failed"
    - "container_unhealthy"
logging:
  level: "info"
  file: ""
  max_size_mb: 10
  max_files: 3
assets:
  source_url: "https://felhom.eu"
```
### 8. Update `controller.yaml.example`
Update `controller/configs/controller.yaml.example` to match the wizard template:
- **Remove** `hdd_path` line entirely
- **Set** `hub.enabled: true` (was `false`)
- **Set** `hub.api_key` to the real key: `094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8`
- **Improve** `system_data_path` comment to be clearer:
```yaml
system_data_path: "/mnt/sys_drive" # Mount point of user-data partition on system drive (e.g., /mnt/sys_drive)
```
### 9. Update `install_cloudflare_tunnel()`
The function currently reads from `CF_TUNNEL_TOKEN` (CLI arg). Change it to
read from the wizard variable (same variable name is fine, just populated by the
wizard instead of CLI). The function body stays the same — it creates the
docker-compose at `/opt/docker/cloudflared/` and starts it.
**Guard:** If wizard left the CF tunnel token empty, skip this step (already
handled by the existing `if [[ -z "$CF_TUNNEL_TOKEN" ]]` check).
### 10. Update execution order in `main()`
New execution order:
```
1. Install base packages
2. Configure network (static IP, if requested)
3. Install Docker Engine + Compose
4. Install Traefik reverse proxy
5. Generate self-signed certificate (if requested)
6. Run configuration wizard → generates controller.yaml
7. Install Cloudflare Tunnel (if token provided in wizard)
8. Install FileBrowser (protected stack)
9. Deploy felhom-controller
10. Install helper tools
11. Print summary
```
Update step numbering and `get_total_steps()` accordingly.
### 11. Update `print_summary()`
Update the summary to reflect:
- Controller is deployed and accessible at `https://felhom.<DOMAIN>`
- FileBrowser at `https://files.<DOMAIN>`
- Remove manual "deploy felhom-controller" instructions (it's automated now)
- Show healthcheck UUID status (configured / not configured)
- Show hub status (enabled)
- Remove the `CUSTOMER_ID` display bug (the "Note: No --customer specified"
message is inside the `if [[ -n "$CUSTOMER_ID" ]]` block — wrong logic)
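The corrected branch for the last bullet might look like the sketch below. `print_customer_line` is a hypothetical name for illustration, and the "Customer ID:" label is assumed; the real logic lives inside `print_summary()`.

```bash
# Hypothetical excerpt: show the customer ID when set, the note otherwise.
print_customer_line() {
    if [ -n "${CUSTOMER_ID:-}" ]; then
        echo "Customer ID: $CUSTOMER_ID"
    else
        # Previously this note sat inside the -n branch, so it was printed
        # only when a customer WAS specified.
        echo "Note: No --customer specified"
    fi
}
```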
### 12. Update `print_help()`
Update help text to reflect:
- Removed `--cf-tunnel-token` (now in wizard)
- Removed `--hdd-path` (deprecated)
- Mention the interactive wizard
- Updated "WHAT THIS SCRIPT INSTALLS" list:
1. Base packages
2. Docker Engine + Compose
3. Traefik reverse proxy
4. TLS certificates
5. Felhom Controller (with interactive configuration)
6. FileBrowser Quantum (web file manager)
7. Cloudflare Tunnel (if configured)
8. Helper tools
---
## Build & Deploy
## Additional observations
### Bugs in current script
1. **`print_summary()` CUSTOMER_ID logic is inverted** (line ~1507):
The "Note: No --customer specified" message is inside `if [[ -n "$CUSTOMER_ID" ]]`
which only triggers when a customer IS specified. Should be in an else branch
or removed.
2. **Step numbering is fragile**: The `get_total_steps()` and hardcoded step
numbers (e.g., `log_step "3/$(get_total_steps)"`) will desync if steps are
added/removed. Consider using a counter variable incremented at each step.
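The counter idea from item 2 could be sketched as follows. `plan_steps` and `next_step` are hypothetical names, and the output format is an assumption; the existing `log_step` would presumably do the actual printing.

```bash
CURRENT_STEP=0
TOTAL_STEPS=0

# Hypothetical helper: record how many steps will run in this invocation.
# Pass one argument per planned step (for example, the step titles).
plan_steps() {
    TOTAL_STEPS=$#
    CURRENT_STEP=0
}

# Hypothetical helper: advance the counter and print "[n/total] title".
# Call it directly, not inside $(...), so the increment persists.
next_step() {
    CURRENT_STEP=$((CURRENT_STEP + 1))
    printf '[%d/%d] %s\n' "$CURRENT_STEP" "$TOTAL_STEPS" "$1"
}

# Example:
# plan_steps "base packages" "docker" "traefik"
# next_step "Install base packages"    # prints: [1/3] Install base packages
```

With this shape, adding or removing an optional step only changes the list passed to `plan_steps`; no hardcoded step number has to be edited.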
### Things NOT to change
- `bootstrap_sudo()` — works fine, keep as-is
- Network configuration (step 2) — keep all network manager detection logic
- Docker installation (step 3) — keep as-is
- Traefik installation (step 4) — keep as-is
- Self-signed cert generation — keep as-is
- Helper tools installation — keep as-is
- Error trap and diagnostics — keep as-is
- Color/logging functions — keep as-is
### Template completeness check
The controller.yaml template covers all sections from the current example.
Sections that use sensible defaults and don't need wizard prompts:
- `system.reserved_memory_mb` (384)
- `backup.*` (all defaults are fine)
- `stacks.protected` (hardcoded list)
- `stacks.update_window` ("03:00-05:00")
- `monitoring.thresholds.*` (all defaults)
- `self_update.*` (all defaults)
- `notifications.*` (all defaults)
- `logging.*` (all defaults)
- `assets.*` (hardcoded)
---
## Implementation notes
- The script is bash — no external YAML parser needed. Use `cat > file << EOF`
with variable substitution for generating YAML.
- For bcrypt hashing, prefer `htpasswd -bnBC 10 "" "$password" | tr -d ':\n'`
(apache2-utils is installed in step 1). Fallback: `python3 -c "import bcrypt; ..."`
- The wizard should show current/default values in brackets and accept Enter
for defaults: `read -p "Domain [$default]: " input; value="${input:-$default}"`
- Dry-run mode should show what the wizard WOULD generate without writing files.
- All generated files should have appropriate permissions:
- `controller.yaml`: `chmod 600` (contains secrets)
- `docker-compose.yml` files: `chmod 644`
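The heredoc, dry-run, and permission notes above could combine roughly as in this sketch. `write_config` is a hypothetical name, `DRY_RUN` as a `true`/`false` string is an assumption about the existing script, and only two fields of the template are shown.

```bash
# Hypothetical helper: render a config fragment and write it with 0600 permissions.
# In dry-run mode it prints what would be written and touches nothing.
write_config() {
    local target="$1" content
    content=$(cat <<EOF
customer:
  id: "${CUSTOMER_ID}"
  domain: "${DOMAIN}"
EOF
)
    if [ "${DRY_RUN:-false}" = "true" ]; then
        printf '%s\n' "--- would write $target ---" "$content"
        return 0
    fi
    # Create the file under a restrictive umask so secrets are never
    # world-readable, even for a moment; chmod then fixes pre-existing files.
    ( umask 077; printf '%s\n' "$content" > "$target" )
    chmod 600 "$target"
}
```

One caveat with plain variable substitution: a value containing a double quote or backslash would produce invalid YAML, so the wizard may want to reject or escape such input.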
---
## Build & test
After implementing, test the script with `--dry-run` to verify:
```bash
sudo ./docker-setup.sh --domain test.local --customer test --dry-run
```
For a real deployment test on the demo node:
```bash
# Copy script to demo node
SSH=/c/Windows/System32/OpenSSH/ssh.exe
# 1. Commit & push
cd e:/git/deploy-felhom-compose
git add -A && git commit -m "v0.15.5: Fix startup hub report — Push() returns real errors, startup retries" && git push
# 2. Build
$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-controller && git -C ~/git/deploy-felhom-compose pull && ./build.sh v0.15.5 --push"
# 3. Deploy
$SSH kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && sudo docker pull gitea.dooplex.hu/admin/felhom-controller:v0.15.5 && sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:v0.15.5|' docker-compose.yml && sudo docker compose up -d"
# 4. Verify — look for successful startup push
$SSH kisfenyo@192.168.0.162 "sleep 10 && docker logs felhom-controller --tail 15 2>&1 | grep -i hub"
scp scripts/docker-setup.sh kisfenyo@192.168.0.162:/tmp/
# Run on demo node (it already has infrastructure, so most steps will skip)
$SSH kisfenyo@192.168.0.162 "sudo bash /tmp/docker-setup.sh --domain demo-felhom.eu --customer demo-felhom --email certs@felhom.eu --cf-token <token>"
```
### Compile check
Always run `go build ./...` in `controller/` before committing.
## Documentation
Add a CHANGELOG.md entry. Read the first 30 lines for format, then insert a new entry:
```markdown
### vX.X.X (2026-02-19 session XX)
- **v0.15.5 — Fix startup hub report silently failing:**
`Push()` now returns actual errors instead of always nil. Previously, push failures were logged internally but the caller could never detect them, leading to misleading "Startup hub report sent" log even when the push failed (e.g., hub returning HTTP 503 during simultaneous deployment).
Startup hub push now retries 3 times with 15-second delays between attempts, giving the hub time to come up when both are deployed together. Each attempt uses Push()'s own 3-retry logic internally.
**Files modified (2):** `internal/report/pusher.go`, `cmd/controller/main.go`
```
Update version in `C:\Users\User\.claude\projects\e--git\memory\MEMORY.md` to `v0.15.5`.
## Verification
After deploying v0.15.5:
1. Check logs: `docker logs felhom-controller 2>&1 | grep -i hub`
- Should show `[INFO] Startup hub report sent` (success)
- OR `[WARN] Startup hub report attempt 1/3 failed: ...` followed by eventual success
2. Check hub dashboard at `hub.felhom.eu` — should show fresh data with current timestamp
3. If hub is deployed at the same time: the retries should handle the delay