From 0be798af5da77c1dcee46dc62ab86741cdf40cf9 Mon Sep 17 00:00:00 2001
From: kisfenyo
Date: Sat, 14 Feb 2026 18:57:20 +0100
Subject: [PATCH] updated startup monitoring

---
 CLAUDE.md                            |   5 +-
 CONTEXT.md                           |  12 +-
 controller/README.md                 |  12 +-
 controller/internal/web/templates.go | 172 +++++++++++++++++++++++++--
 4 files changed, 187 insertions(+), 14 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 297c367..f24a02d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -102,6 +102,7 @@ ssh kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && docker pull gite
 - App cards on dashboard and stacks pages are clickable via `data-href` attribute (skip protected stacks)
 - Logs page uses AJAX polling (`?raw=1` query param returns plain text) with auto-scroll and pause/resume
 - Memory bar on deploy page uses two-segment stacked bar (committed = solid green, new = translucent green)
+- Deploy flow shows 3-step progress panel (config → containers → health), polls `GET /api/stacks/{name}` every 3s until running/unhealthy/timeout(120s)
 
 ## Git sync module (internal/sync)
 
@@ -138,4 +139,6 @@ Key patterns used in `internal/stacks/`:
 3. Go map iteration order is random — always sort before displaying in UI
 4. Docker's `.State` field says "running" even for unhealthy containers — must parse `.Status` for health info
 5. After `DeployStack()` succeeds, update in-memory `Deployed` flag immediately — `RefreshStatus()` only reads docker ps, not app.yaml
-6. `docker compose up -d` returns exit 0 even when containers crash-loop — post-start status check is essential for detecting failures
\ No newline at end of file
+6. `docker compose up -d` returns exit 0 even when containers crash-loop — post-start status check is essential for detecting failures
+7. Mealie image has no wget/curl — use Python TCP socket check for healthcheck; set `start_period: 60s` for DB migration time
+8. Always verify container images have the healthcheck tool (`wget`, `curl`, etc.) before using it — Alpine has BusyBox wget, Python images have `python3`
\ No newline at end of file
diff --git a/CONTEXT.md b/CONTEXT.md
index b75fb82..2b6f86f 100644
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -37,8 +37,13 @@ Last updated: 2026-02-14 (session 3)
   - All verbose checks gated on `cfg.Logging.Level == "debug"`; timing always at INFO
 - **UI improvements** in `internal/web/templates.go` and `server.go`:
   - **Memory bar fix on deploy page**: Bar segments now always visible (min-width: 3px), new app segment uses translucent green with distinct border for clear visual separation from committed memory
-  - **Clickable app cards**: Cards on Vezérlőpult and Alkalmazások pages are now clickable (navigates to deploy/detail page). Uses `data-href` attribute + delegated click handler. Protected stacks excluded
+  - **Clickable app cards**: Cards on Vezérlőpult and Alkalmazások pages are now clickable (navigates to deploy/detail page). Uses `data-href` attribute + delegated click handler. Protected stacks excluded. Actions area (buttons, state labels) excluded from click-to-navigate
   - **Live-scrolling logs**: Logs page now auto-refreshes every 3s via AJAX polling (`?raw=1` returns plain text). Fixed-height container (70vh) with auto-scroll to bottom. Pulsing green "Élő" indicator. Pause/resume toggle ("Szüneteltetés"/"Folytatás"). User scroll position preserved when scrolled up to read history
+  - **Deployment progress UI**: Deploy button no longer shows alert+redirect immediately. Instead shows 3-step progress panel: config saved → containers starting → app initializing. Polls `GET /api/stacks/{name}` every 3s to track actual container health state. Handles running (auto-redirect), starting (keep polling), unhealthy (warning), exited (error), and 120s timeout. Shows elapsed time counter
+- **Mealie healthcheck fix** (app-catalog-felhom.eu):
+  - `wget --spider` replaced with Python TCP socket check — mealie image doesn't include wget
+  - `start_period` increased to 60s (DB migrations take ~40s on first start)
+- **Healthcheck audit**: filebrowser (Alpine, has BusyBox wget — OK), stirling-pdf (Ubuntu, has wget — OK)
 
 ### Previously completed (2026-02-15 session 2)
 - **Phase 4: Git Sync + App Catalog Audit** — major milestone
@@ -178,4 +183,7 @@ Last updated: 2026-02-14 (session 3)
 - Cloudflare Tunnel handles *.demo-felhom.eu → Traefik handles Host()-based routing to containers
 - BIOS "AC Power Recovery" must be enabled on N100 for auto-restart after power outage
 - `docker compose up -d` returns exit 0 even when containers immediately crash-loop — need post-start status check to detect this
-- When logging env vars for debugging, only log keys (not values) to avoid leaking secrets in log files
\ No newline at end of file
+- When logging env vars for debugging, only log keys (not values) to avoid leaking secrets in log files
+- Mealie image (`ghcr.io/mealie-recipes/mealie`) doesn't include wget/curl — use Python TCP socket check for healthcheck
+- Mealie DB migrations on first start take ~40s (alembic) — use `start_period: 60s` to avoid premature unhealthy status
+- Alpine-based images (filebrowser, vaultwarden) have wget via BusyBox — healthchecks with `wget --spider` work fine
\ No newline at end of file
diff --git a/controller/README.md b/controller/README.md
index 46a3186..794d992 100644
--- a/controller/README.md
+++ b/controller/README.md
@@ -44,6 +44,7 @@ Current version: **v0.2.1**
 - Verbose debug logging with operation timing, post-start container state checks, and image pull detection
 - Clickable app cards on dashboard and applications pages (navigate to detail/deploy page)
 - Memory bar with two-segment visualization on deploy page (committed vs new app allocation)
+- Deployment progress UI: 3-step progress panel with real-time health polling (config → containers → health check)
 
 ### Known issues / next priorities
 - Cloudflare Tunnel + Traefik TLS: paperless.demo-felhom.eu works locally but shows "Not secure" (certificate chain not fully validated through tunnel)
@@ -154,8 +155,15 @@ controller/
    - Saves `app.yaml` (env vars + locked fields list)
    - Runs `docker compose up -d` with env vars injected
    - Updates in-memory state immediately (no stale "Telepítés" button)
-4. Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only
-5. "Részletek" button opens deploy page in read-only mode showing current config
+4. **Progress UI** replaces the form with a 3-step progress panel:
+   - ✅ "Konfiguráció mentve" — shown immediately after API success
+   - ⏳ "Konténer(ek) indítása..." → ✅ when containers are up
+   - ⏳ "Alkalmazás inicializálása..." → ✅ when state = `running` (healthy)
+   - Polls `GET /api/stacks/{name}` every 3 seconds for real-time health status
+   - Handles: `running` (auto-redirect), `starting` (keep polling), `unhealthy` (warning), `exited` (error)
+   - Timeout after 120 seconds with informational message
+5. Post-deploy: locked fields (DB_PASSWORD, etc.) become read-only
+6. "Részletek" button opens deploy page in read-only mode showing current config
 
 ### Memory validation during deploy
 
diff --git a/controller/internal/web/templates.go b/controller/internal/web/templates.go
index 4bfff13..675b42b 100644
--- a/controller/internal/web/templates.go
+++ b/controller/internal/web/templates.go
@@ -426,6 +426,27 @@ const deployTmpl = `
 {{end}}
+
+