# REPORT — slice 8B.2 (controller half): resume at `snapshotted` (v0.38.0) (2026-06-10) > Overwrite-latest report. Cumulative history: [CHANGELOG.md](CHANGELOG.md). Implements the > controller half of `TASK — Slice 8B.2`. Pairs with `felhom-agent` v0.13.0. No hub change. ## Outcome The quiesce loop (8B) kept the app stopped for the **whole backup**. In snapshot mode the app only needs to be stopped until the **storage snapshot** is taken; after that vzdump reads from the snapshot. The controller now resumes its app at the agent's **`snapshotted`** phase instead of `done` — app downtime drops from *whole-backup* to *until-snapshot*, with no loss of app-consistency. **Measured live: ~3s vs ~23s (~87% cut)**, restore still clean. ## What landed (`internal/quiesce`) - The status-poll loop **resumes (`StartStack` + clears the marker) at `snapshotted`**, then **keeps polling to `done`/`failed`** — so a new backup isn't started until this one truly finishes and a post-snapshot failure is still observed (the backup isn't "successful" until `done`; the early resume does not mark it done). - **Fallback:** if `snapshotted` never arrives (stop/downgraded storage), it resumes at `done` exactly as 8B. The agent only emits `snapshotted` when the actual mode is snapshot. - **Crash-safety unchanged:** marker written before stop; guaranteed unquiesce (deferred); startup `Recover()`. A failure *after* `snapshotted` is harmless — the app is already up. ## Tests `go build ./...` + `go test ./...` green. quiesce: resume at `snapshotted` (RESUME event before `done`, marker cleared, then tracked to `done`); stop-mode fallback (resume at `done`, no `snapshotted`); fail-after-`snapshotted` (single resume, app stays up); the 8B crash-safety tests stay green. ## Live validation (demo-felhom) A provisioned controller v0.38.0 with a postgres stack, short quiesce poll: timeline — `quiescing [pgtest]` 12:58:45 → `snapshotted — resuming app early` 12:58:48 → `backup done` 12:59:08. **App downtime ≈ 3s** (vs ≈ 23s to `done`). The snapshot backup restored to a scratch guest came up **clean** (`database system was shut down at 12:58:45`, no WAL replay) — the early resume preserved app-consistency. The controller kept tracking to `done` after resuming (no overlapping backup). ## Deferred / dependency Snapshot-capable storage (lvm-thin/ZFS) required for the win; stop/downgraded storage falls back to resume-at-`done` (8B). No consistency-contract or crash-safety change. No secrets committed.