slice 8B.2 (controller): resume app at snapshotted, keep tracking to done (v0.38.0)
The quiesce loop now resumes the app (StartStack + clear marker) at the snapshotted phase instead of at done, so downtime shrinks from the whole backup to until-snapshot with no consistency loss. It keeps polling to done/failed, so no overlapping backup is started and a post-snapshot failure is still observed. The stop-mode fallback (resume at done) and crash-safety are preserved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
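The resume rule in the commit message can be sketched as a small simulation over a sequence of polled phases. This is a minimal sketch, not the controller's code: `simulate`, `outcome`, and the non-terminal phase name `"running"` are hypothetical; only `snapshotted`, `done`, and `failed` come from the commit.

```go
package main

import "fmt"

// Phase names taken from the commit. Any other string (e.g. "running" below)
// is an illustrative placeholder for a non-terminal phase.
const (
	phaseSnapshotted = "snapshotted"
	phaseDone        = "done"
	phaseFailed      = "failed"
)

// outcome records what a simulated poll sequence produced (hypothetical type).
type outcome struct {
	ResumedAt int    // index of the poll at which the app was resumed, -1 if never
	Final     string // terminal phase that ended polling, "" if none was seen
}

// simulate applies the rule from the commit message: resume once at
// snapshotted, otherwise at done/failed, and stop polling only at done/failed.
func simulate(phases []string) outcome {
	out := outcome{ResumedAt: -1}
	for i, p := range phases {
		switch p {
		case phaseSnapshotted:
			if out.ResumedAt < 0 {
				out.ResumedAt = i // early resume; polling continues
			}
		case phaseDone, phaseFailed:
			if out.ResumedAt < 0 {
				out.ResumedAt = i // fallback: resume at the terminal phase
			}
			out.Final = p
			return out
		}
	}
	return out
}

func main() {
	fmt.Println(simulate([]string{"running", "snapshotted", "snapshotted", "done"}))
	fmt.Println(simulate([]string{"running", "running", "done"}))
	fmt.Println(simulate([]string{"running", "snapshotted", "failed"}))
}
```

In the first and third sequences the app resumes at poll index 1 while polling continues to the terminal phase; in the second (no snapshotted phase, as in stop mode) it resumes only at done.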
@@ -38,8 +38,9 @@ type Stacks interface {
 
 // Backup status phases (mirror the agent's vocabulary).
 const (
-	phaseDone   = "done"
-	phaseFailed = "failed"
+	phaseSnapshotted = "snapshotted" // 8B.2: storage snapshot taken → app may resume early
+	phaseDone        = "done"
+	phaseFailed      = "failed"
 )
 
 // Marker is the persisted quiesce state — the crash-safety + single-flight record. It is written
@@ -200,11 +201,25 @@ func (l *Loop) runOnce(ctx context.Context) error {
 			return fmt.Errorf("poll backup status: %w", err)
 		}
 		switch phase {
+		case phaseSnapshotted:
+			// 8B.2: the storage snapshot is taken — the app-stopped state is captured, so the app
+			// may resume NOW (downtime = until-snapshot, not until-backup-done) with no loss of
+			// app-consistency. unquiesce is idempotent (fires once); we then KEEP polling to
+			// done/failed so a new backup isn't started until this one truly finishes (and so a
+			// post-snapshot failure is observed). The marker is cleared on resume — a crash in this
+			// tail leaves the app already up, nothing to recover.
+			if !unquiesced {
+				l.logger.Printf("[INFO] [quiesce] backup job %s snapshotted — resuming app early (8B.2)", jobID)
+				unquiesce("snapshotted (early resume)")
+			}
 		case phaseDone:
+			// Fallback (stop/downgraded mode never emits snapshotted): resume at done, exactly 8B.
 			l.logger.Printf("[INFO] [quiesce] backup job %s done", jobID)
 			unquiesce("backup done")
 			return nil
 		case phaseFailed:
+			// If we already resumed at snapshotted, the app is up — just note the backup failed
+			// (recorded for the agent's due window when it stores the failed result).
 			l.logger.Printf("[WARN] [quiesce] backup job %s failed", jobID)
 			unquiesce("backup failed")
 			return nil
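The hunk calls `unquiesce` from all three branches and relies on it firing only once. The helper's definition is outside the diff, so the following is only one way such a closure could be shaped: `fakeStack`, `newUnquiesce`, and `fired` are hypothetical names, and the real helper presumably calls StartStack and clears the persisted marker instead of updating counters.

```go
package main

import "fmt"

// fakeStack stands in for the controller's stack and marker handling
// (hypothetical; used only to observe the effects of a resume).
type fakeStack struct {
	starts        int  // how many times the stack was started
	markerPresent bool // whether the quiesce marker is still persisted
}

// newUnquiesce returns a resume function that acts only on its first call,
// plus an accessor reporting whether it has fired. Assumed shape: the real
// helper is not shown in the diff.
func newUnquiesce(s *fakeStack) (unquiesce func(reason string), fired func() bool) {
	unquiesced := false
	unquiesce = func(reason string) {
		if unquiesced {
			return // already resumed; later calls are no-ops
		}
		unquiesced = true
		s.starts++              // stands in for StartStack
		s.markerPresent = false // stands in for clearing the marker
		fmt.Printf("resumed: %s\n", reason)
	}
	fired = func() bool { return unquiesced }
	return unquiesce, fired
}

func main() {
	s := &fakeStack{markerPresent: true}
	unquiesce, fired := newUnquiesce(s)
	unquiesce("snapshotted (early resume)")
	unquiesce("backup done") // no-op: the app is already up
	fmt.Println(s.starts, s.markerPresent, fired())
}
```

With this shape, the terminal `done` and `failed` branches can call `unquiesce` unconditionally: after an early resume the second call changes nothing, and in stop mode it performs the only resume.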