v0.41.0: first-boot base-infra bring-up + self-heal (+ Section-G mount fix)
New internal/infra package renders traefik/cloudflared/filebrowser from config (pinned images, single source of truth; web filebrowser path delegates here). stacks.EnsureBaseStack deploys the traefik-public network + the three stacks, single-flight + idempotent + non-fatal; wired to first boot and every health tick. monitor.EffectiveProtected drops cloudflared when no tunnel token. Section-G fix lives in felhom-agent build-golden.sh (same-path stacks bind).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -1,5 +1,41 @@
## Changelog
### v0.41.0 — first-boot base-infrastructure bring-up + self-heal (+ Section-G mount fix) (2026-06-11)
Lockstep with `felhom-agent` v0.20.0 + a golden rebake. A freshly onboarded controller came up ONLINE
but **Health = FAIL: protected containers not running — traefik, cloudflared, filebrowser**: nothing
ever deployed the base stack on a Proxmox bootstrap (it was only ever created by the bare-metal
`scripts/docker-setup.sh`), and the health loop only *detected* the gap. This release makes the
controller stand up its own base infrastructure.
- **New `internal/infra` package** — pure renderers (`//go:embed` templates lifted verbatim from
`scripts/docker-setup.sh`) for traefik (`traefik.yml` + compose + a 0600 `.env` carrying the CF DNS
token only when set), cloudflared (compose; `TUNNEL_TOKEN`), and filebrowser (compose + `config.yaml`).
**Image tags are PINNED here as the single source of truth** — `traefik:v3.6.7`,
`cloudflare/cloudflared:2026.6.0`, `gtstef/filebrowser:1.3.3-stable` (no `:latest`). The web
FileBrowser sync path now **delegates** to `infra` so the pins can never diverge.
- **`stacks.Manager.EnsureBaseStack`** (`internal/stacks/infra.go`) — creates the `traefik-public`
network, then deploys traefik → cloudflared → filebrowser under `${stacks_dir}/<name>`. **Single-flight**
(TryLock — it's fired from both first-boot and every health tick), **idempotent** (skips a stack whose
container is already running), **non-fatal** (logs, never crashes). cloudflared is deployed only when a
tunnel token is configured; filebrowser is not overwritten if its compose already exists (preserves the
storage mounts the web sync path manages).
- **Triggers** (`cmd/controller/main.go`): first-boot bring-up after stack init (goroutine, non-fatal);
self-heal calls `EnsureBaseStack` unconditionally on every `system-health` tick (decoupled from the
issue strings — safe because of the single-flight + idempotency).
- **Dynamic protected set** (`monitor.EffectiveProtected`): cloudflared counts as a protected container
only when a tunnel token is configured, so a LAN-only node doesn't report FAIL forever for a stack it
intentionally skips. Detection and the bring-up condition agree.
- **Section-G fix (in `felhom-agent` build-golden.sh):** the controller writes compose stacks under
`/opt/docker/stacks` inside its container, but the bootstrap `docker run` never bind-mounted that path,
so the guest daemon resolved every relative bind source on the guest filesystem (empty dirs) — breaking
**all** bind-mounted stacks (base infra + customer apps). Fixed with a same-path host bind
(`-v /opt/docker/stacks:/opt/docker/stacks`). Empirically confirmed on guest 9201 (probe printed
`cat: read error: Is a directory` before, `hello-from-controller` after).
- Tests: non-hollow `infra` render tests (customer params present, no `:latest` survives, both ACME/CF
branches render, `.env` 0600, rendered YAML parses), `EnsureBaseStack` single-flight, and
`EffectiveProtected`.
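
The single-flight, idempotent, non-fatal behaviour described for `EnsureBaseStack` can be sketched roughly as follows. This is an illustration under assumptions, not the code in `internal/stacks/infra.go`: the `Manager` fields, the `deploy` callback, the return values, and the choice to continue with the next stack after a failed deploy are all hypothetical. The only thing taken from the entry above is the shape: `TryLock`, skip what is already running, skip cloudflared without a tunnel token, log instead of crashing.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// baseStacks lists the stacks in the deploy order given in the changelog.
var baseStacks = []string{"traefik", "cloudflared", "filebrowser"}

// Manager is a hypothetical stand-in for stacks.Manager.
type Manager struct {
	mu          sync.Mutex
	tunnelToken string                  // empty on a LAN-only node
	running     map[string]bool         // hypothetical view of container state
	deploy      func(name string) error // hypothetical "compose up" hook
}

// EnsureBaseStack reports which stacks it deployed and whether it ran at all.
// It never returns an error: failures are logged and skipped (non-fatal).
func (m *Manager) EnsureBaseStack() (deployed []string, ran bool) {
	// Single-flight: if another caller holds the lock, do nothing.
	if !m.mu.TryLock() {
		return nil, false
	}
	defer m.mu.Unlock()

	for _, name := range baseStacks {
		if name == "cloudflared" && m.tunnelToken == "" {
			continue // no tunnel token configured: stack is intentionally absent
		}
		if m.running[name] {
			continue // idempotent: already up, nothing to do
		}
		if err := m.deploy(name); err != nil {
			// Assumption: keep going with the remaining stacks.
			log.Printf("base stack %s: deploy failed: %v", name, err)
			continue
		}
		m.running[name] = true
		deployed = append(deployed, name)
	}
	return deployed, true
}

func main() {
	m := &Manager{
		running: map[string]bool{"traefik": true},
		deploy:  func(name string) error { return nil },
	}
	fmt.Println(m.EnsureBaseStack()) // first call deploys what is missing
	fmt.Println(m.EnsureBaseStack()) // second call finds nothing to do
}
```

Because a concurrent call returns immediately and a repeated call finds nothing to deploy, the sketch can be invoked from both a first-boot goroutine and a periodic tick without coordination between the two.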
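
The dynamic protected set from the v0.41.0 entry can be illustrated with a small sketch. The function name matches `monitor.EffectiveProtected`, but the signature, the `baseProtected` list, and passing the token as a plain string are assumptions made for the example; the real function may take different inputs.

```go
package main

import "fmt"

// baseProtected is a hypothetical static list of protected containers.
var baseProtected = []string{"traefik", "cloudflared", "filebrowser"}

// EffectiveProtected drops cloudflared from the protected set when no tunnel
// token is configured, so a LAN-only node is not flagged for a stack it
// intentionally skips.
func EffectiveProtected(tunnelToken string) []string {
	out := make([]string, 0, len(baseProtected))
	for _, name := range baseProtected {
		if name == "cloudflared" && tunnelToken == "" {
			continue
		}
		out = append(out, name)
	}
	return out
}

func main() {
	fmt.Println(EffectiveProtected(""))             // LAN-only node
	fmt.Println(EffectiveProtected("tunnel-token")) // tunnel configured
}
```

Using the same token check for detection and for bring-up is what keeps the two in agreement: a stack that is never deployed is also never reported as missing.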
### v0.40.0 — bootstrap pull+merge onboarding (controller pulls its config from the hub) (2026-06-11)
Lockstep with `felhom-agent` v0.19.0. Fixes the onboarding 401: a freshly provisioned guest used to