Fix deployment race condition

2026-02-14 19:17:01 +01:00
parent 0be798af5d
commit a8096faf59
5 changed files with 55 additions and 21 deletions
+3 -1
@@ -98,11 +98,13 @@ ssh kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && docker pull gite
- Stacks are sorted alphabetically by DisplayName
- Protected stacks (traefik, cloudflared, felhom-controller) can't be stopped from UI
- `app.yaml` persists deploy config; `deployed: true` flag controls UI state
- In-memory `Deployed` flag is set BEFORE `docker compose up -d` (avoids race condition with slow image pulls); reverted on failure
- Password fields require explicit user input or generation (no silent auto-fill)
- App cards on dashboard and stacks pages are clickable via `data-href` attribute (skip protected stacks)
- Logs page uses AJAX polling (`?raw=1` query param returns plain text) with auto-scroll and pause/resume
- Memory bar on deploy page uses two-segment stacked bar (committed = solid green, new = translucent green)
- Deploy flow shows 3-step progress panel (config → containers → health), polls `GET /api/stacks/{name}` every 3s until running/unhealthy/timeout(120s)
- "Telepítés" (Deploy) buttons have a `checkBeforeDeploy()` onclick guard that fetches live state from the API before navigating to the deploy page
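
The alphabetical ordering by DisplayName noted in the list above can be sketched as follows. This is a minimal illustration, not the controller's actual code: the `Stack` type and the `sortedStacks` helper are hypothetical stand-ins, and the tie-break on `Name` is an assumption added to keep the order deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Stack is a hypothetical, trimmed-down stand-in for the controller's stack type.
type Stack struct {
	Name        string
	DisplayName string
}

// sortedStacks copies the map values into a slice and orders them by
// DisplayName, falling back to Name so that ties are deterministic.
// Go randomizes map iteration order, so the slice must be sorted before
// it is rendered.
func sortedStacks(stacks map[string]*Stack) []*Stack {
	out := make([]*Stack, 0, len(stacks))
	for _, s := range stacks {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].DisplayName != out[j].DisplayName {
			return out[i].DisplayName < out[j].DisplayName
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	stacks := map[string]*Stack{
		"traefik":   {Name: "traefik", DisplayName: "Traefik"},
		"mealie":    {Name: "mealie", DisplayName: "Mealie"},
		"paperless": {Name: "paperless", DisplayName: "Paperless-ngx"},
	}
	for _, s := range sortedStacks(stacks) {
		fmt.Println(s.DisplayName)
	}
}
```

The comparison is byte-wise and case-sensitive; a UI that mixes upper- and lower-case display names may want to compare lower-cased copies instead.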
## Git sync module (internal/sync)
@@ -138,7 +140,7 @@ Key patterns used in `internal/stacks/`:
2. `docker compose restart` does NOT pick up new images — always use `docker compose up -d`
3. Go map iteration order is random — always sort before displaying in UI
4. Docker's `.State` field says "running" even for unhealthy containers — must parse `.Status` for health info
5. After `DeployStack()` succeeds, update in-memory `Deployed` flag immediately — `RefreshStatus()` only reads docker ps, not app.yaml
5. In-memory `Deployed` flag must be set BEFORE `docker compose up -d` (not after) — compose can take 30-60s for image pulls; revert both in-memory and disk on failure
6. `docker compose up -d` returns exit 0 even when containers crash-loop — post-start status check is essential for detecting failures
7. Mealie image has no wget/curl — use Python TCP socket check for healthcheck; set `start_period: 60s` for DB migration time
8. Always verify container images have the healthcheck tool (`wget`, `curl`, etc.) before using it — Alpine has BusyBox wget, Python images have `python3`
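
The pattern in item 5 (set the flag before the slow step, revert on failure) can be sketched in isolation. This is a simplified illustration under stated assumptions: `manager`, `stack`, `setDeployed`, `isDeployed`, and `errPull` are hypothetical names, the `composeUp` callback stands in for the real `docker compose up -d` call, and the disk-side revert of `app.yaml` is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errPull is a hypothetical error used to simulate a failed compose run.
var errPull = errors.New("image pull failed")

// stack and manager are trimmed-down stand-ins for the real types.
type stack struct {
	Deployed bool
}

type manager struct {
	mu     sync.Mutex
	stacks map[string]*stack
}

// setDeployed updates the in-memory flag under the lock.
func (m *manager) setDeployed(name string, v bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if s, ok := m.stacks[name]; ok {
		s.Deployed = v
	}
}

// isDeployed reads the in-memory flag under the lock.
func (m *manager) isDeployed(name string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	s, ok := m.stacks[name]
	return ok && s.Deployed
}

// deploy sets the flag first, runs the slow step without holding the
// lock, and reverts the flag if that step fails.
func (m *manager) deploy(name string, composeUp func() error) error {
	m.setDeployed(name, true)
	if err := composeUp(); err != nil {
		m.setDeployed(name, false)
		return fmt.Errorf("docker compose up failed: %w", err)
	}
	return nil
}

func main() {
	m := &manager{stacks: map[string]*stack{"mealie": {}}}
	err := m.deploy("mealie", func() error {
		// Readers that check the flag while compose is running see "deployed".
		fmt.Println("during compose up, deployed =", m.isDeployed("mealie"))
		return errPull
	})
	fmt.Println("after failure, deployed =", m.isDeployed("mealie"), "err:", err)
}
```

Releasing the mutex before the slow step matters here: holding it for the whole compose run would block every reader for as long as the image pull takes.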
+13 -3
@@ -7,7 +7,7 @@
>
> Ask Claude Code: "Please update CONTEXT.md with what we did today"
Last updated: 2026-02-14 (session 3)
Last updated: 2026-02-14 (session 4)
---
@@ -28,7 +28,17 @@ Last updated: 2026-02-14 (session 3)
- **Running on:** demo-felhom (N100 mini PC) at 192.168.0.162:8080
- **All Phase 1 features working:** deploy, start/stop/restart/update, logs, health-aware states, auth
### What was just completed (2026-02-14 session 3)
### What was just completed (2026-02-14 session 4)
- **Fixed deploy race condition** in `internal/stacks/deploy.go`:
- In-memory `Deployed` flag now set BEFORE `docker compose up -d` (compose up can take 30-60s for image pulls)
- On failure: both in-memory state and disk (app.yaml) are reverted
- Eliminates stale "Telepítés" button during long compose operations
- **Added `checkBeforeDeploy()` JS guard** in `internal/web/templates.go`:
- "Telepítés" (Deploy) buttons on the Vezérlőpult (Dashboard) and Alkalmazások (Applications) pages now fetch live state from `/api/stacks/{name}` before navigating
- If app is already deployed (e.g., another tab deployed it), shows alert and reloads page instead of navigating to deploy form
- Catches stale UI state gracefully
### Previously completed (2026-02-14 session 3)
- **Enhanced debug logging** across all stack operations in `internal/stacks/`:
- **Operation timing**: All stack ops (start, stop, restart, update, deploy) now log elapsed time
- **Post-start container state check**: Async goroutine after start/restart/update/deploy
@@ -179,7 +189,7 @@ Last updated: 2026-02-14 (session 3)
- Go maps have random iteration order — always sort slices before displaying
- Docker `.State`="running" doesn't mean healthy — check `.Status` for "(health: starting)" / "(unhealthy)"
- Paperless-ngx needs `PAPERLESS_OCR_LANGUAGES` (plural) to install language packs, `PAPERLESS_OCR_LANGUAGE` (singular) to select
- After deploying a stack, update the in-memory Deployed flag immediately — RefreshStatus() only reads docker ps
- In-memory Deployed flag must be set BEFORE `docker compose up -d` (not after) — compose can take 30-60s for image pulls, during which the UI would show a stale "Telepítés" button
- Cloudflare Tunnel handles *.demo-felhom.eu → Traefik handles Host()-based routing to containers
- BIOS "AC Power Recovery" must be enabled on N100 for auto-restart after power outage
- `docker compose up -d` returns exit 0 even when containers immediately crash-loop — need post-start status check to detect this
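
The `.State` versus `.Status` note above can be illustrated with a small sketch. The function name `effectiveState` and the returned state labels are assumptions for illustration only; the matched substrings are the health markers quoted in the note.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveState derives a health-aware state from the State and Status
// values reported for a container. State alone says "running" even when
// the healthcheck is failing, so the Status text is inspected as well.
func effectiveState(state, status string) string {
	if state != "running" {
		return state
	}
	switch {
	case strings.Contains(status, "(unhealthy)"):
		return "unhealthy"
	case strings.Contains(status, "(health: starting)"):
		return "starting"
	default:
		return "running"
	}
}

func main() {
	fmt.Println(effectiveState("running", "Up 3 minutes (unhealthy)"))
	fmt.Println(effectiveState("running", "Up 5 seconds (health: starting)"))
	fmt.Println(effectiveState("running", "Up 2 hours (healthy)"))
	fmt.Println(effectiveState("exited", "Exited (1) 10 seconds ago"))
}
```

Containers without a healthcheck carry no health marker in `.Status`, so they fall through to plain "running" in this sketch.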
+3 -2
@@ -148,13 +148,14 @@ controller/
- **Auto-generated**: DB passwords, secret keys (shown as "✓ Generated")
- **User input**: HDD path, admin password, language, etc.
- **"🎲 Generálás"** button next to password fields
3. Clicks "Telepítés" → controller:
3. Clicks "Telepítés" → `checkBeforeDeploy()` JS guard fetches live state from API first (prevents deploying if already deployed from another tab). Then controller:
- **Memory validation**: checks `mem_request` against available system RAM (see below)
- Validates all required fields (password fields must be explicitly filled or generated)
- Generates auto-secrets (DB passwords, hex keys)
- Saves `app.yaml` (env vars + locked fields list)
- **Updates in-memory state immediately** (so UI shows "deployed" during slow compose ops)
- Runs `docker compose up -d` with env vars injected
- Updates in-memory state immediately (no stale "Telepítés" button)
- On failure: reverts both in-memory state and disk (app.yaml `deployed: false`)
4. **Progress UI** replaces the form with a 3-step progress panel:
- ✅ "Konfiguráció mentve" — shown immediately after API success
- ⏳ "Konténer(ek) indítása..." → ✅ when containers are up
+21 -13
@@ -183,19 +183,9 @@ func (m *Manager) DeployStack(req DeployRequest) (string, error) {
m.checkLocalImages(req.StackName, stackDir)
}
// Run docker compose up -d
start := time.Now()
_, composeErr := m.composeExecWithEnv(stackDir, env, "up", "-d")
if composeErr != nil {
m.logger.Printf("[ERROR] Stack %s deploy failed after %.1fs: %v", req.StackName, time.Since(start).Seconds(), composeErr)
// Deployment failed — keep app.yaml for debugging but mark as not deployed
appCfg.Deployed = false
_ = SaveAppConfig(stackDir, appCfg)
return "", fmt.Errorf("docker compose up failed: %w", composeErr)
}
// Update in-memory stack state immediately so the UI reflects the deployment
// without waiting for the next ScanStacks() cycle.
// Update in-memory stack state BEFORE compose up so the UI reflects
// "deployed" immediately (compose up can take 30-60s for image pulls).
// If compose up fails, we revert both disk and in-memory state below.
m.mu.Lock()
if s, ok := m.stacks[req.StackName]; ok {
s.Deployed = true
@@ -203,6 +193,24 @@ func (m *Manager) DeployStack(req DeployRequest) (string, error) {
}
m.mu.Unlock()
// Run docker compose up -d
start := time.Now()
_, composeErr := m.composeExecWithEnv(stackDir, env, "up", "-d")
if composeErr != nil {
m.logger.Printf("[ERROR] Stack %s deploy failed after %.1fs: %v", req.StackName, time.Since(start).Seconds(), composeErr)
// Revert in-memory state
m.mu.Lock()
if s, ok := m.stacks[req.StackName]; ok {
s.Deployed = false
s.AppConfig = nil
}
m.mu.Unlock()
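// Revert disk state: keep app.yaml for debugging but mark as not deployed.
// A failed save is logged, because it would leave disk and memory out of sync.
appCfg.Deployed = false
if saveErr := SaveAppConfig(stackDir, appCfg); saveErr != nil {
m.logger.Printf("[WARN] Stack %s: could not revert app.yaml: %v", req.StackName, saveErr)
}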
return "", fmt.Errorf("docker compose up failed: %w", composeErr)
}
m.logger.Printf("[INFO] Stack %s deployed successfully (took %.1fs)", req.StackName, time.Since(start).Seconds())
// Post-deploy container state check (async, non-blocking)
+15 -2
@@ -42,6 +42,19 @@ const layoutTmpl = `
var card = e.target.closest('[data-href]');
if (card) window.location.href = card.dataset.href;
});
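async function checkBeforeDeploy(e, name) {
    // Cancel the click synchronously. An async function returns a Promise,
    // so "return false" and a preventDefault() after an await come too late
    // to stop the link from navigating.
    e.preventDefault();
    var href = e.currentTarget.href;
    try {
        var resp = await fetch('/api/stacks/' + encodeURIComponent(name));
        var data = await resp.json();
        if (data.ok && data.data && data.data.deployed) {
            // Message: "This application is already deployed."
            alert('Ez az alkalmazás már telepítve van.');
            window.location.reload();
            return false;
        }
    } catch(err) {}
    // Not deployed, or the check itself failed: continue to the deploy page.
    window.location.href = href;
    return false;
}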
async function syncTemplates() {
const btn = document.getElementById('sync-btn');
const toast = document.getElementById('sync-toast');
@@ -189,7 +202,7 @@ const dashboardTmpl = `
{{if .Protected}}
<span class="badge badge-protected">Védett</span>
{{else if not .Deployed}}
<a href="/stacks/{{.Name}}/deploy" class="btn btn-sm btn-primary">Telepítés</a>
<a href="/stacks/{{.Name}}/deploy" class="btn btn-sm btn-primary" onclick="return checkBeforeDeploy(event, '{{.Name}}')">Telepítés</a>
{{else}}
{{if isOperational .State}}
<button class="btn btn-sm btn-warning" onclick="stackAction('{{.Name}}', 'restart')">↻</button>
@@ -267,7 +280,7 @@ const stacksTmpl = `
{{if .Protected}}
<span class="badge badge-protected">Védett rendszerkomponens</span>
{{else if not .Deployed}}
<a href="/stacks/{{.Name}}/deploy" class="btn btn-primary">Telepítés</a>
<a href="/stacks/{{.Name}}/deploy" class="btn btn-primary" onclick="return checkBeforeDeploy(event, '{{.Name}}')">Telepítés</a>
<a href="{{appPageURL .Meta.Slug}}" class="btn btn-outline">Részletek</a>
{{else}}
{{if isOperational .State}}