v0.54.0: Phase 2b — restore-from-recovery-unit + fail-closed data-key gate
Restore recreates an app from its on-drive unit + the guest's own secrets, regenerating nothing. reconcileRestoreSecrets (pure, unit-tested) merges the unit's non-secret env with secrets recovered from the live app.yaml and FAILS CLOSED if a data-encrypting key is unrecoverable: it refuses (a PBS whole-guest restore is needed) rather than regenerating the key and corrupting data. Resettable secrets missing → warn + proceed.

- backup: RestoreFromRecoveryUnit (manifest -> recover secrets -> gate -> restore volumes -> recreate definition + redeploy w/ re-pull); falls back to volume-only.
- seams: RecoverStackSecrets/RecreateStackFromUnit (adapter +encKey), stacks.RedeployFromEnv. Wired into /backup/restore.
- tests: gate (refuse/proceed/verbatim) + data_key parsing.

Gate + reconcile + data_key parsing unit-tested; capture live-validated (v0.53.1). Full readable-data e2e vs AdventureLog needs the auth-gated dashboard restore: pending.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -1,5 +1,33 @@
 ## Changelog
 
+### v0.54.0 — Phase 2b: restore-from-recovery-unit + fail-closed data-key gate (2026-06-13)
+
+Restore now recreates an app from its on-drive recovery unit **plus the guest's own secrets** — never
+from secrets stored in the unit (there are none), and **regenerating nothing**.
+
+- **Fail-closed data-key gate** (`reconcileRestoreSecrets`, `internal/backup/restore_unit.go` — a pure,
+  exhaustively unit-tested function): merges the unit's non-secret env with the secret values recovered
+  from the guest's live app.yaml. A missing/empty **data-encrypting key** (`data_key`) **aborts the
+  restore** with a clear message (a PBS whole-guest restore is required) — because regenerating it would
+  render stored data unreadable. A missing *resettable* secret (DB/admin password) is non-fatal (warn +
+  proceed; the app may need a credential reset). Secrets are recovered, never regenerated.
+- **`RestoreFromRecoveryUnit`**: reads the unit manifest → recovers secrets from the guest
+  (`RecoverStackSecrets`) → applies the gate → restores named-volume data from the unit's tars →
+  recovers the app definition from the unit and redeploys with the reconstructed env (re-pulling the
+  pinned image). Falls back to the legacy volume-only `RestoreApp` if no unit exists. Wired into the
+  `/backup/restore` web handler.
+- **New seams:** `StackDataProvider.RecoverStackSecrets` / `RecreateStackFromUnit` (main.go
+  `stackAdapter`, with the controller `encKey` for decrypting the live app.yaml); `stacks.Manager.
+  RedeployFromEnv` (writes app.yaml from the full env incl. locked secrets, then `compose up -d`).
+- **Tests:** the gate (all recovered / data-key missing → refuse / empty data-key → refuse / resettable
+  missing → proceed+warn, recovered values used verbatim) and `data_key` parsing from `.felhom.yml`
+  (`Metadata.DataKeyEnvVars()`).
+- **Validation status:** the gate + reconciliation + data_key parsing are unit-tested (authoritative for
+  the refuse/proceed/regenerate-nothing behaviour); the capture side is live-validated (v0.53.1, RomM).
+  The full live **readable-data e2e** against AdventureLog (deploy → back up → restore → confirm the
+  data decrypts) requires triggering the **auth-gated** `/backup/restore` from the dashboard — pending an
+  operator-run on the demo.
+
 ### v0.53.1 — Phase 2: recovery units refresh on the periodic cache cycle (idempotent) (2026-06-13)
 
 The recovery-unit capture now also runs from `RefreshCache` (controller startup + every 5m), not only
@@ -218,6 +218,7 @@ func main() {
 	stackProv := &stackAdapter{
 		mgr:             stackMgr,
 		getStoragePaths: func() []settings.StoragePath { return sett.GetStoragePaths() },
+		encKey:          encKey,
 	}
 	if cfg.Backup.Enabled {
 		backupMgr = backup.NewManager(cfg, sett, logger)
@@ -768,6 +769,7 @@ func setupLogger(cfg *config.Config) (*log.Logger, *web.LogBuffer) {
 type stackAdapter struct {
 	mgr             *stacks.Manager
 	getStoragePaths func() []settings.StoragePath
+	encKey          []byte // for decrypting live app.yaml secrets during restore-from-unit
 }
 
 func (a *stackAdapter) GetStackComposePath(name string) (string, bool) {
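`RecoverStackSecrets` relies on the new `encKey` field to read secrets out of the live app.yaml. The diff does not show which encryption scheme `SaveAppConfig` and `LoadAppConfigDecrypted` use. The sketch below therefore assumes AES-GCM purely for illustration; `encryptField`, `decryptField` and `encryptSensitive` are hypothetical helpers. It shows the general shape: only sensitive fields are sealed, and with an authenticated cipher a wrong key fails with an error instead of yielding a plausible-looking but wrong data key.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// encryptField seals one value with AES-GCM (ASSUMED scheme, illustration only).
// key must be 16, 24 or 32 bytes.
func encryptField(key []byte, plaintext string) (string, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	// Prepend the nonce so decryptField can recover it.
	sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil)
	return base64.StdEncoding.EncodeToString(sealed), nil
}

// decryptField reverses encryptField; a wrong key fails GCM authentication.
func decryptField(key []byte, encoded string) (string, error) {
	sealed, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	if len(sealed) < gcm.NonceSize() {
		return "", fmt.Errorf("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	pt, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return "", err
	}
	return string(pt), nil
}

// encryptSensitive returns a copy of env with only the sensitive names encrypted.
func encryptSensitive(key []byte, env map[string]string, sensitive []string) (map[string]string, error) {
	out := make(map[string]string, len(env))
	for k, v := range env {
		out[k] = v
	}
	for _, name := range sensitive {
		v, ok := env[name]
		if !ok {
			continue
		}
		enc, err := encryptField(key, v)
		if err != nil {
			return nil, err
		}
		out[name] = enc
	}
	return out, nil
}

func main() {
	key := make([]byte, 32) // all-zero demo key; a real key must be random
	env := map[string]string{"SUBDOMAIN": "trips", "SECRET_KEY": "deadbeef"}
	stored, err := encryptSensitive(key, env, []string{"SECRET_KEY"})
	if err != nil {
		panic(err)
	}
	back, err := decryptField(key, stored["SECRET_KEY"])
	fmt.Println(stored["SUBDOMAIN"], back, err) // trips deadbeef <nil>
}
```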
@@ -904,6 +906,49 @@ func (a *stackAdapter) GetStackRecoveryInfo(name string) (backup.RecoveryInfo, b
 	}, true
 }
 
+// RecoverStackSecrets returns the live decrypted values for the named secret env vars present in the
+// stack's app.yaml (the guest's own — live rootfs or PBS-restored). Absent/empty names are omitted;
+// the caller's fail-closed gate decides. Secrets come from the guest, never from the recovery unit.
+func (a *stackAdapter) RecoverStackSecrets(name string, names []string) map[string]string {
+	s, ok := a.mgr.GetStack(name)
+	if !ok {
+		return nil
+	}
+	cfg := stacks.LoadAppConfigDecrypted(filepath.Dir(s.ComposePath), a.encKey)
+	if cfg == nil {
+		return nil
+	}
+	out := make(map[string]string)
+	for _, n := range names {
+		if v, ok := cfg.Env[n]; ok && v != "" {
+			out[n] = v
+		}
+	}
+	return out
+}
+
+// RecreateStackFromUnit restores the app definition from the unit's compose dir into the stack dir,
+// then redeploys with the reconstructed full env (re-pulling the pinned image). Secrets in fullEnv were
+// recovered from the guest, never regenerated.
+func (a *stackAdapter) RecreateStackFromUnit(name, composeSrcDir string, fullEnv map[string]string) error {
+	s, ok := a.mgr.GetStack(name)
+	if !ok {
+		return fmt.Errorf("stack %q not found", name)
+	}
+	stackDir := filepath.Dir(s.ComposePath)
+	// Recover the app definition from the unit (compose + .felhom.yml) into the stack dir.
+	for _, fname := range []string{"docker-compose.yml", ".felhom.yml"} {
+		data, err := os.ReadFile(filepath.Join(composeSrcDir, fname))
+		if err != nil {
+			continue // capture whichever existed
+		}
+		if err := os.WriteFile(filepath.Join(stackDir, fname), data, 0644); err != nil {
+			return fmt.Errorf("restoring %s from unit: %w", fname, err)
+		}
+	}
+	return a.mgr.RedeployFromEnv(name, fullEnv)
+}
+
 // RefreshAndIsRunning forces a docker ps scan before checking state.
 // Called during post-restore health check (~every 5s for up to 90s).
 // Full refresh is acceptable here since restores are rare operations.
@@ -30,6 +30,19 @@ type StackDataProvider interface {
 	// from the guest's own app.yaml, live or via the PBS whole-guest snapshot). ok=false if the
 	// stack is unknown.
 	GetStackRecoveryInfo(name string) (RecoveryInfo, bool)
+
+	// --- Phase 2b: restore-from-recovery-unit ---
+
+	// RecoverStackSecrets returns the live decrypted values for the named secret env vars that are
+	// currently present (non-empty) in the stack's app.yaml (the guest's own — live rootfs, or
+	// PBS-restored). Names that are absent/empty are simply omitted from the map; the caller's
+	// fail-closed gate decides what to do. The unit is never the source of secrets.
+	RecoverStackSecrets(name string, names []string) map[string]string
+
+	// RecreateStackFromUnit restores an app's definition from the unit's compose dir into the stack
+	// dir, writes app.yaml from fullEnv (encrypting secret fields), and (re-)deploys it via
+	// `docker compose up -d`, which re-pulls the pinned image. Secrets are NEVER regenerated.
+	RecreateStackFromUnit(name, composeSrcDir string, fullEnv map[string]string) error
 }
 
 // RecoveryInfo carries everything needed to write a secret-free recovery unit for a stack.
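Both implementations of `RecoverStackSecrets` in this commit return `nil` rather than an error when nothing can be recovered. The adapter does this for an unknown stack or when `LoadAppConfigDecrypted` returns nil, and the test fake always does. This is safe because reading a nil map in Go behaves like reading an empty one, so the gate sees every secret as missing. The restore then fails closed if the manifest declares any data key, and only warns otherwise. The demonstration below reuses the present-and-non-empty check from `reconcileRestoreSecrets`; `have` and `unrecoverable` are illustrative names.

```go
package main

import "fmt"

// have mirrors the check inside reconcileRestoreSecrets: a secret counts as
// recovered only if it is present AND non-empty.
func have(recovered map[string]string, name string) bool {
	v, ok := recovered[name]
	return ok && v != ""
}

// unrecoverable lists the data keys that would make the gate fail closed.
func unrecoverable(recovered map[string]string, dataKeys []string) []string {
	var out []string
	for _, dk := range dataKeys {
		if !have(recovered, dk) {
			out = append(out, dk)
		}
	}
	return out
}

func main() {
	var recovered map[string]string // a nil return, as the caller sees it

	// Reading a nil map is safe and yields the zero value with ok == false.
	fmt.Println(have(recovered, "SECRET_KEY"))                    // false
	fmt.Println(unrecoverable(recovered, []string{"SECRET_KEY"})) // [SECRET_KEY]
	fmt.Println(unrecoverable(recovered, nil))                    // []
}
```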
@@ -30,6 +30,10 @@ func (f *fakeRecoveryProvider) RefreshAndIsRunning(string) bool { retur
 func (f *fakeRecoveryProvider) GetStackRecoveryInfo(string) (RecoveryInfo, bool) {
 	return f.info, true
 }
+func (f *fakeRecoveryProvider) RecoverStackSecrets(string, []string) map[string]string { return nil }
+func (f *fakeRecoveryProvider) RecreateStackFromUnit(string, string, map[string]string) error {
+	return nil
+}
 
 // TestCaptureRecoveryUnitIsSecretFree proves the captured unit (a) contains compose+config+manifest,
 // (b) enumerates the existing dumps, and (c) is SECRET-FREE: a secret value present in the SOURCE
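Widening `StackDataProvider` breaks every implementation at compile time, which is why the test fake gains two stub methods in this hunk. The usual Go way to pin that requirement to the implementation itself is a blank-identifier assertion. The sketch below uses a trimmed, hypothetical `provider` interface. For the real types the equivalent line would be `var _ StackDataProvider = (*fakeRecoveryProvider)(nil)`, assuming the fake lives in package `backup`, as the surrounding test suggests.

```go
package main

import "fmt"

// provider is a hypothetical, trimmed stand-in for StackDataProvider,
// keeping only the two methods this commit adds.
type provider interface {
	RecoverStackSecrets(name string, names []string) map[string]string
	RecreateStackFromUnit(name, composeSrcDir string, fullEnv map[string]string) error
}

type fakeProvider struct{}

func (f *fakeProvider) RecoverStackSecrets(string, []string) map[string]string { return nil }
func (f *fakeProvider) RecreateStackFromUnit(string, string, map[string]string) error {
	return nil
}

// Compile-time check: if provider gains a method that *fakeProvider lacks,
// the build fails on this line rather than at some distant call site.
var _ provider = (*fakeProvider)(nil)

func main() {
	var p provider = &fakeProvider{}
	fmt.Println(p.RecoverStackSecrets("adventurelog", []string{"SECRET_KEY"}) == nil) // true
}
```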
@@ -0,0 +1,140 @@
+package backup
+
+import (
+	"fmt"
+	"os"
+	"path/filepath"
+	"time"
+
+	"gopkg.in/yaml.v3"
+)
+
+// reconcileRestoreSecrets merges the recovery unit's non-secret env with the secrets recovered from
+// the guest's own app.yaml, and applies the FAIL-CLOSED data-key gate. It is the safety-critical heart
+// of Phase 2b and is deliberately a pure function (no I/O) so it can be exhaustively unit-tested.
+//
+// Policy (per the Phase 2 design — see REPORT/CHANGELOG):
+//   - Regenerate NOTHING. Every secret comes from the guest (live rootfs, or PBS whole-guest restore).
+//   - A missing DATA-ENCRYPTING key (`dataKeyNames`) is FATAL: regenerating it would render the
+//     restored data unreadable, so we refuse and tell the operator to do a PBS whole-guest restore.
+//   - A missing resettable secret (DB password, admin password) is NON-fatal: it's returned in
+//     `missing` so the caller can warn; the app may simply need a credential reset, no data is lost.
+func reconcileRestoreSecrets(nonSecretEnv, recoveredSecrets map[string]string, secretNames, dataKeyNames []string) (fullEnv map[string]string, missing []string, err error) {
+	fullEnv = make(map[string]string, len(nonSecretEnv)+len(secretNames))
+	for k, v := range nonSecretEnv {
+		fullEnv[k] = v
+	}
+	have := func(n string) bool {
+		v, ok := recoveredSecrets[n]
+		return ok && v != ""
+	}
+	for _, n := range secretNames {
+		if have(n) {
+			fullEnv[n] = recoveredSecrets[n]
+		} else {
+			missing = append(missing, n)
+		}
+	}
+	// Fail-closed: any unrecoverable data-encrypting key aborts the restore.
+	var missingDataKeys []string
+	for _, dk := range dataKeyNames {
+		if !have(dk) {
+			missingDataKeys = append(missingDataKeys, dk)
+		}
+	}
+	if len(missingDataKeys) > 0 {
+		return nil, missing, fmt.Errorf(
+			"refusing to restore: data-encrypting key(s) %v could not be recovered from the guest's app.yaml — "+
+				"a PBS whole-guest restore is required first (regenerating the key would render stored data unreadable)",
+			missingDataKeys)
+	}
+	return fullEnv, missing, nil
+}
+
+// readStrippedEnv parses the non-secret env from a recovery unit's secret-stripped app.yaml.
+func readStrippedEnv(path string) map[string]string {
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return map[string]string{}
+	}
+	var s strippedAppYaml
+	if yaml.Unmarshal(data, &s) != nil || s.Env == nil {
+		return map[string]string{}
+	}
+	return s.Env
+}
+
+// RestoreFromRecoveryUnit recreates an app from its on-drive recovery unit + the guest's own secrets.
+//
+// It reads the unit manifest, recovers the secret values from the guest's live app.yaml, applies the
+// fail-closed data-key gate, restores the named-volume data from the unit's tars, then restores the
+// app's definition from the unit and redeploys it with the reconstructed env (re-pulling the pinned
+// image). No secret is ever regenerated, and no secret is read from the unit. If no unit exists it
+// falls back to the legacy volume-only RestoreApp.
+func (m *Manager) RestoreFromRecoveryUnit(stackName string) error {
+	if m.stackProvider == nil {
+		return fmt.Errorf("stack provider not configured")
+	}
+
+	m.mu.Lock()
+	if m.running {
+		m.mu.Unlock()
+		return fmt.Errorf("backup or restore already in progress")
+	}
+	m.running = true
+	m.mu.Unlock()
+	defer func() {
+		m.mu.Lock()
+		m.running = false
+		m.mu.Unlock()
+	}()
+
+	drivePath := m.GetAppDrivePath(stackName)
+	if drivePath == "" || !filepath.IsAbs(drivePath) {
+		return fmt.Errorf("cannot determine drive path for %s", stackName)
+	}
+	nsRoot := m.namespaceRoot(drivePath)
+
+	manifest := readManifest(RecoveryUnitManifestPath(nsRoot, stackName))
+	if manifest == nil {
+		m.logger.Printf("[WARN] [backup] No recovery unit for %s — falling back to volume-only restore", stackName)
+		m.mu.Lock()
+		m.running = false // RestoreApp re-acquires the running flag
+		m.mu.Unlock()
+		return m.RestoreApp(stackName, "")
+	}
+
+	composeDir := RecoveryUnitComposePath(nsRoot, stackName)
+	nonSecretEnv := readStrippedEnv(filepath.Join(composeDir, "app.yaml"))
+
+	// Recover secrets from the GUEST (never the unit), then apply the fail-closed gate.
+	recovered := m.stackProvider.RecoverStackSecrets(stackName, manifest.SecretEnvVars)
+	fullEnv, missing, err := reconcileRestoreSecrets(nonSecretEnv, recovered, manifest.SecretEnvVars, manifest.DataKeyEnvVars)
+	if err != nil {
+		m.logger.Printf("[ERROR] [backup] Restore REFUSED for %s: %v", stackName, err)
+		return err
+	}
+	if len(missing) > 0 {
+		m.logger.Printf("[WARN] [backup] Restore %s: %d resettable secret(s) unrecoverable %v — proceeding (may need a credential reset; no data-key affected)",
+			stackName, len(missing), missing)
+	}
+	m.logger.Printf("[INFO] [backup] Restoring %s from recovery unit: images=%d, secrets recovered=%d/%d, data_keys=%d",
+		stackName, len(manifest.ImagePins), len(manifest.SecretEnvVars)-len(missing), len(manifest.SecretEnvVars), len(manifest.DataKeyEnvVars))
+
+	// Stop, restore named-volume data, then recreate the definition + redeploy with the recovered env.
+	if err := m.stackProvider.StopStack(stackName); err != nil {
+		m.logger.Printf("[WARN] [backup] could not stop %s before restore: %v (continuing)", stackName, err)
+	}
+	if err := m.restoreDockerVolumes(stackName, drivePath); err != nil {
+		m.logger.Printf("[WARN] [backup] volume restore for %s: %v (continuing)", stackName, err)
+	}
+	if err := m.stackProvider.RecreateStackFromUnit(stackName, composeDir, fullEnv); err != nil {
+		return fmt.Errorf("recreating %s from unit: %w", stackName, err)
+	}
+	if err := m.waitForHealthy(stackName, 90*time.Second); err != nil {
+		m.logger.Printf("[WARN] [backup] %s restored but health check failed: %v", stackName, err)
+	}
+
+	m.logger.Printf("[INFO] [backup] Restore-from-unit completed: %s", stackName)
+	return nil
+}
@@ -0,0 +1,62 @@
+package backup
+
+import "testing"
+
+// TestReconcileRestoreSecrets covers the safety-critical fail-closed gate + secret reconciliation.
+func TestReconcileRestoreSecrets(t *testing.T) {
+	nonSecret := map[string]string{"SUBDOMAIN": "trips", "DOMAIN": "demo-felhom.eu"}
+
+	t.Run("all recovered, no data_key — full env, no error", func(t *testing.T) {
+		recovered := map[string]string{"DB_PASSWORD": "pw", "SECRET_KEY": "deadbeef"}
+		full, missing, err := reconcileRestoreSecrets(nonSecret, recovered,
+			[]string{"DB_PASSWORD", "SECRET_KEY"}, nil)
+		if err != nil {
+			t.Fatalf("unexpected error: %v", err)
+		}
+		if len(missing) != 0 {
+			t.Errorf("missing: %v", missing)
+		}
+		// Non-secret + both secrets present, and recovered values used VERBATIM (regenerate nothing).
+		if full["SUBDOMAIN"] != "trips" || full["DB_PASSWORD"] != "pw" || full["SECRET_KEY"] != "deadbeef" {
+			t.Errorf("full env wrong: %v", full)
+		}
+	})
+
+	t.Run("data_key missing — FAIL CLOSED (refuse)", func(t *testing.T) {
+		recovered := map[string]string{"DB_PASSWORD": "pw"} // SECRET_KEY (a data_key) is gone
+		full, _, err := reconcileRestoreSecrets(nonSecret, recovered,
+			[]string{"DB_PASSWORD", "SECRET_KEY"}, []string{"SECRET_KEY"})
+		if err == nil {
+			t.Fatal("expected fail-closed error for missing data-encrypting key, got nil")
+		}
+		if full != nil {
+			t.Errorf("full env should be nil on refusal, got %v", full)
+		}
+	})
+
+	t.Run("data_key empty value — FAIL CLOSED", func(t *testing.T) {
+		recovered := map[string]string{"SECRET_KEY": ""} // present but empty == unrecoverable
+		_, _, err := reconcileRestoreSecrets(nonSecret, recovered, []string{"SECRET_KEY"}, []string{"SECRET_KEY"})
+		if err == nil {
+			t.Fatal("empty data-key value must fail closed")
+		}
+	})
+
+	t.Run("resettable secret missing — proceed with warning", func(t *testing.T) {
+		recovered := map[string]string{"SECRET_KEY": "deadbeef"} // data_key ok; DB_PASSWORD missing
+		full, missing, err := reconcileRestoreSecrets(nonSecret, recovered,
+			[]string{"DB_PASSWORD", "SECRET_KEY"}, []string{"SECRET_KEY"})
+		if err != nil {
+			t.Fatalf("a missing resettable secret must NOT fail closed: %v", err)
+		}
+		if len(missing) != 1 || missing[0] != "DB_PASSWORD" {
+			t.Errorf("missing should be [DB_PASSWORD], got %v", missing)
+		}
+		if full["SECRET_KEY"] != "deadbeef" {
+			t.Errorf("data-key should be preserved verbatim: %v", full)
+		}
+		if _, present := full["DB_PASSWORD"]; present {
+			t.Errorf("missing resettable secret should be absent, not regenerated")
+		}
+	})
+}
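The four subtests cover representative cases. Because `reconcileRestoreSecrets` is pure, its invariants can also be checked over every combination of which secrets were recovered (2^n cases for n secrets). Three invariants should hold in every case. The gate errors exactly when a data key is lost. `missing` is the complement of what was recovered. Every secret in `fullEnv` is the recovered value, never a new one. The sketch copies the function from this commit with a shortened error message so that it runs on its own. `checkAllSubsets` and the `ADMIN_PASSWORD` name are hypothetical.

```go
package main

import "fmt"

// reconcileRestoreSecrets is copied from restore_unit.go in this commit
// (error message shortened) so the check below is self-contained.
func reconcileRestoreSecrets(nonSecretEnv, recoveredSecrets map[string]string, secretNames, dataKeyNames []string) (fullEnv map[string]string, missing []string, err error) {
	fullEnv = make(map[string]string, len(nonSecretEnv)+len(secretNames))
	for k, v := range nonSecretEnv {
		fullEnv[k] = v
	}
	have := func(n string) bool {
		v, ok := recoveredSecrets[n]
		return ok && v != ""
	}
	for _, n := range secretNames {
		if have(n) {
			fullEnv[n] = recoveredSecrets[n]
		} else {
			missing = append(missing, n)
		}
	}
	var missingDataKeys []string
	for _, dk := range dataKeyNames {
		if !have(dk) {
			missingDataKeys = append(missingDataKeys, dk)
		}
	}
	if len(missingDataKeys) > 0 {
		return nil, missing, fmt.Errorf("refusing to restore: data-encrypting key(s) %v could not be recovered", missingDataKeys)
	}
	return fullEnv, missing, nil
}

// checkAllSubsets runs the gate for every subset of recoverable secrets and
// returns the first invariant violation, or nil. Hypothetical helper.
func checkAllSubsets(secretNames []string, dataKeys map[string]bool) error {
	var dkList []string
	for _, n := range secretNames {
		if dataKeys[n] {
			dkList = append(dkList, n)
		}
	}
	for mask := 0; mask < 1<<len(secretNames); mask++ {
		recovered := map[string]string{}
		dataKeyLost := false
		for i, n := range secretNames {
			if mask&(1<<i) != 0 {
				recovered[n] = "value-of-" + n
			} else if dataKeys[n] {
				dataKeyLost = true
			}
		}
		full, missing, err := reconcileRestoreSecrets(map[string]string{"DOMAIN": "demo"}, recovered, secretNames, dkList)
		if (err != nil) != dataKeyLost {
			return fmt.Errorf("mask %b: err=%v but dataKeyLost=%v", mask, err, dataKeyLost)
		}
		if len(missing)+len(recovered) != len(secretNames) {
			return fmt.Errorf("mask %b: missing=%v does not complement recovered", mask, missing)
		}
		if err != nil {
			continue // refused: fullEnv is nil by design
		}
		for _, n := range secretNames {
			v, present := full[n]
			if present != (recovered[n] != "") || (present && v != recovered[n]) {
				return fmt.Errorf("mask %b: %s regenerated or altered", mask, n)
			}
		}
	}
	return nil
}

func main() {
	names := []string{"DB_PASSWORD", "SECRET_KEY", "ADMIN_PASSWORD"}
	fmt.Println(checkAllSubsets(names, map[string]bool{"SECRET_KEY": true})) // <nil>
}
```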
@@ -431,6 +431,48 @@ func (m *Manager) UpdateStackConfig(name string, values map[string]string) error
 	return m.RefreshStatus()
 }
 
+// RedeployFromEnv writes app.yaml from the given FULL env (encrypting secret fields) and (re-)deploys
+// the stack with `docker compose up -d`, which re-pulls the pinned image. Used by the restore-from-unit
+// flow (Phase 2b): unlike UpdateStackConfig it sets the full env INCLUDING locked secrets — which were
+// recovered from the guest's own app.yaml, never regenerated. Caller is responsible for the gate.
+func (m *Manager) RedeployFromEnv(name string, env map[string]string) error {
+	stack, ok := m.GetStack(name)
+	if !ok {
+		return fmt.Errorf("stack %q not found", name)
+	}
+	stackDir := filepath.Dir(stack.ComposePath)
+	meta := LoadMetadata(stackDir)
+
+	cfg := &AppConfig{
+		Deployed:   true,
+		DeployedAt: time.Now().UTC().Format(time.RFC3339),
+		Env:        env,
+	}
+	for _, f := range meta.DeployFields {
+		if f.LockedAfterDeploy {
+			cfg.LockedFields = append(cfg.LockedFields, f.EnvVar)
+		}
+	}
+	if err := SaveAppConfig(stackDir, cfg, m.encKey, SensitiveEnvVars(&meta)); err != nil {
+		return fmt.Errorf("saving app config: %w", err)
+	}
+
+	m.mu.Lock()
+	if s, ok := m.stacks[name]; ok {
+		s.Deployed = true
+		s.AppConfig = cfg
+	}
+	m.mu.Unlock()
+
+	m.logger.Printf("[INFO] [stacks] Redeploying %s from recovery unit with %d env vars", name, len(env))
+	deployEnv := m.stackEnv(stackDir) // decrypts secrets back for compose
+	if _, err := m.composeExecCustomEnv(stackDir, deployEnv, "up", "-d"); err != nil {
+		return fmt.Errorf("compose up: %w", err)
+	}
+	m.logPostStartStatus(name, stackDir, deployEnv)
+	return m.RefreshStatus()
+}
+
 // composeExecWithEnv runs a compose command with custom env vars injected.
 func (m *Manager) composeExecWithEnv(dir string, env map[string]string, args ...string) (string, error) {
 	cmdEnv := os.Environ()
@@ -0,0 +1,40 @@
+package stacks
+
+import (
+	"os"
+	"path/filepath"
+	"testing"
+)
+
+// TestDataKeyParsing proves the catalog `data_key: true` annotation flows through .felhom.yml parsing
+// into Metadata.DataKeyEnvVars() — the capture-side half of the Phase 2b fail-closed mechanism. The
+// fail-closed gate itself is unit-tested in internal/backup (reconcileRestoreSecrets).
+func TestDataKeyParsing(t *testing.T) {
+	dir := t.TempDir()
+	// Mirrors adventurelog/.felhom.yml: SECRET_KEY is a data-encrypting key, DB_PASSWORD is resettable.
+	yml := `display_name: AdventureLog
+deploy_fields:
+  - env_var: SECRET_KEY
+    label: "Titkosítási kulcs"
+    type: secret
+    data_key: true
+  - env_var: DB_PASSWORD
+    label: "Adatbázis jelszó"
+    type: secret
+`
+	if err := os.WriteFile(filepath.Join(dir, ".felhom.yml"), []byte(yml), 0644); err != nil {
+		t.Fatal(err)
+	}
+
+	meta := LoadMetadata(dir)
+	dk := meta.DataKeyEnvVars()
+	if len(dk) != 1 || dk[0] != "SECRET_KEY" {
+		t.Fatalf("DataKeyEnvVars() = %v, want [SECRET_KEY]", dk)
+	}
+
+	// Both secrets are sensitive (stripped from the unit); only SECRET_KEY is a data_key (fail-closed).
+	sens := SensitiveEnvVars(&meta)
+	if len(sens) != 2 {
+		t.Errorf("SensitiveEnvVars() = %v, want both SECRET_KEY and DB_PASSWORD", sens)
+	}
+}
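Without the YAML parsing, the two lists this test checks reduce to simple filters over the `deploy_fields` entries. Data keys are the fields marked `data_key: true`, and they trip the fail-closed gate. Sensitive fields are stripped from the unit. The sketch assumes a trimmed, hypothetical `deployField` struct, and it assumes "sensitive" means `type: secret`. The real `SensitiveEnvVars` rule is not shown in this diff and may cover more.

```go
package main

import "fmt"

// deployField is a hypothetical, trimmed mirror of one `deploy_fields` entry.
type deployField struct {
	EnvVar  string // env_var
	Type    string // type, e.g. "secret"
	DataKey bool   // data_key
}

// dataKeyEnvVars returns the env vars whose loss must fail a restore closed.
func dataKeyEnvVars(fields []deployField) []string {
	var out []string
	for _, f := range fields {
		if f.DataKey {
			out = append(out, f.EnvVar)
		}
	}
	return out
}

// sensitiveEnvVars returns the secret-typed env vars (stripped from the unit).
// ASSUMPTION: "sensitive" means type == "secret"; the real rule may differ.
func sensitiveEnvVars(fields []deployField) []string {
	var out []string
	for _, f := range fields {
		if f.Type == "secret" {
			out = append(out, f.EnvVar)
		}
	}
	return out
}

func main() {
	fields := []deployField{
		{EnvVar: "SECRET_KEY", Type: "secret", DataKey: true},
		{EnvVar: "DB_PASSWORD", Type: "secret"},
	}
	fmt.Println(dataKeyEnvVars(fields), sensitiveEnvVars(fields)) // [SECRET_KEY] [SECRET_KEY DB_PASSWORD]
}
```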
@@ -714,7 +714,9 @@ func (s *Server) backupRestoreHandler(w http.ResponseWriter, r *http.Request) {
 	s.logger.Printf("[WARN] [web] Restore requested: stack=%s, snapshot=%s from %s", stackName, snapshotID, r.RemoteAddr)
 
 	start := time.Now()
-	err := s.backupMgr.RestoreApp(stackName, snapshotID)
+	// Phase 2b: restore from the app's recovery unit (recovers secrets from the guest, fail-closed on
+	// an unrecoverable data-encrypting key; falls back to volume-only restore if no unit exists).
+	err := s.backupMgr.RestoreFromRecoveryUnit(stackName)
 	if err != nil {
 		s.logger.Printf("[ERROR] [web] Restore failed: %v", err)
 		if s.isDebug() {