feat: controller-side HTTP/TCP health probes

Add network-level health probing from the controller to deployed apps. The controller probes containers over the shared Docker network and overrides the stack state to "unhealthy" if the service isn't responding.

Three probe types:
- http (any response = alive)
- api (validates status code and body content)
- tcp (port reachability)

Probes are configured per app via the healthcheck: section in .felhom.yml. The scheduler runs every minute; the per-app interval defaults to 5 minutes.

This replaces Docker-level healthchecks for distroless images (e.g. Vikunja) that lack shell utilities, and complements existing Docker healthchecks for other apps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+12
-1
@@ -212,11 +212,22 @@ When app templates are updated (e.g., a new `APP_KEY` secret is added to `.felho
| Running + healthy | Green | "Fut" | All containers running and healthy |
| Running + starting | Orange | "Indulas..." | Healthcheck not yet passed |
| Deploying | Orange | "Telepítés..." | Compose up in progress (image pull, container creation) |
| Running + unhealthy | Yellow | "Nem egeszseges" | Healthcheck failing |
| Running + unhealthy | Yellow | "Nem egeszseges" | Docker or controller-side healthcheck failing |
| Stopped/exited | Red | "Leallitva" | All containers stopped |
| Restarting | Yellow | "Ujrainditas..." | Restart loop |
| Not deployed | Gray | "Nincs telepitve" | Compose file exists, not deployed |

#### Controller-side Health Probes (`internal/stacks/healthprobe.go`)
For apps that declare a `healthcheck:` section in `.felhom.yml`, the controller probes the container directly over the Docker network (the controller and the app containers are both attached to `traefik-public`). This complements Docker-level healthchecks and is the **only** health mechanism for distroless/scratch images, which lack the shell utilities that in-container healthcheck commands need.

Three probe types are supported:
- **`http`**: Any HTTP response (even 4xx/5xx) means the service is alive. Only a refused connection or a timeout counts as unhealthy.
- **`api`**: An HTTP request with response validation (expected status code, body content). The probe fails if any expectation isn't met.
- **`tcp`**: A simple port reachability check via `net.Dial`.
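The three probe types can be sketched roughly as follows. This is an illustrative sketch, not the code in `internal/stacks/healthprobe.go`: the function names (`probeTCP`, `probeHTTP`, `probeAPI`), the 2-second `probeTimeout`, and the `startDemoServer` helper (a local stand-in for an app container) are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// probeTimeout is an assumed per-probe timeout, not a value from the project.
const probeTimeout = 2 * time.Second

// probeTCP succeeds if a TCP connection to addr ("host:port") can be opened.
func probeTCP(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, probeTimeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// probeHTTP treats any HTTP response, including 4xx/5xx, as "alive".
// Only transport errors (refused connection, timeout) are failures.
func probeHTTP(url string) error {
	client := &http.Client{
		Timeout: probeTimeout,
		// Do not follow redirects: the first response already proves liveness.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// probeAPI also validates the status code and, when wantBody is non-empty,
// that the response body contains it.
func probeAPI(url string, wantStatus int, wantBody string) error {
	client := &http.Client{Timeout: probeTimeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != wantStatus {
		return fmt.Errorf("status %d, want %d", resp.StatusCode, wantStatus)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return err
	}
	if wantBody != "" && !strings.Contains(string(body), wantBody) {
		return fmt.Errorf("body does not contain %q", wantBody)
	}
	return nil
}

// startDemoServer starts a local server that stands in for an app container.
func startDemoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"status":"ok"}`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := startDemoServer()
	defer srv.Close()

	// A 404 still counts as alive for the http probe, but fails the api probe.
	fmt.Println("http /missing:", probeHTTP(srv.URL+"/missing"))
	fmt.Println("api  /health: ", probeAPI(srv.URL+"/health", 200, `"status":"ok"`))
	fmt.Println("api  /missing:", probeAPI(srv.URL+"/missing", 200, ""))
	fmt.Println("tcp:          ", probeTCP(srv.Listener.Addr().String()))
}
```

In a real deployment the URL or address would presumably point at the container's name on the shared Docker network rather than at a local test server.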

Multiple checks per app are supported, and all of them must pass for the app to be considered healthy. The probe scheduler runs every minute; per-app intervals default to 5 minutes and are configurable via `healthcheck.interval` in `.felhom.yml`. Probe results are stored in `Stack.HealthProbe` and exposed via the API. A failed probe overrides the stack state to `StateUnhealthy`; the override clears automatically when the next probe passes.

---

### 2. Backup System