Validated both unvalidated 10D mechanisms: (1) identity-bundle escrow round-trip via age scrypt+AEAD (recover on a secret-less box, wrong-R fails closed), (2) Cloudflare tunnel re-establishment — running the recovered token on a new box routes the hostname there immediately (no DNS change); the old connector is a hot standby, superseded in routing but not auto-retired -> 10D must rotate the tunnel/PBS token + retire the stale connector for host-loss security. Redacted; secrets shredded; live demo untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
felhom.eu — task reports
Overwrite this file with a summary of the most recent task only (uniform with the other repos; not cumulative). The cumulative hub history lives in hub/CHANGELOG.md.
REPORT — Slice 10D core SPIKE: identity-escrow round-trip + tunnel re-establishment (2026-06-10)
Type
SPIKE runbook (CC-executed on the demo). Validated the two previously unvalidated mechanisms that the 10D DR
capstone depends on, before speccing the orchestration. Deliverable: the redacted findings doc
documentation/tests/slice10d-identity-restore-spike-findings.md.
Handled crown jewels (R + identity/tunnel tokens) — staged 0600, by reference, shredded at teardown; no secret committed.
Results — GO to spec 10D
S1 — identity-escrow round-trip (age): the identity bundle {tunnel_token, pbs_token} wraps under
an EFF-wordlist R via age (scrypt + ChaCha20-Poly1305 AEAD), recovers byte-identical on a
secret-less fresh box given only blob + R, and a wrong R fails closed (no plaintext). Mirrors the
proven K-escrow → 10D reuses the 10C Consume shape for the identity bundle.
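The three S1 properties (byte-identical recovery, recovery from blob + R alone, wrong R fails closed) can be sketched as a small self-check. This is a minimal illustration, not the spike's tooling: the spike used age itself, whereas this sketch stands in standard-library scrypt plus an encrypt-then-MAC construction for age's scrypt + ChaCha20-Poly1305. The names `wrap` and `unwrap`, the blob layout, and the scrypt parameters are all assumptions made for the example, and the construction is not intended for production use.

```python
import hashlib
import hmac
import os

SALT_LEN = 16
NONCE_LEN = 16
TAG_LEN = 32


def _derive_keys(passphrase: str, salt: bytes) -> tuple[bytes, bytes]:
    # scrypt stretches the passphrase R into 64 bytes: one key to encrypt, one to authenticate.
    okm = hashlib.scrypt(passphrase.encode(), salt=salt, n=2**12, r=8, p=1, dklen=64)
    return okm[:32], okm[32:]


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Stand-in stream cipher (SHA-256 in counter mode); age uses ChaCha20-Poly1305 instead.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(plaintext: bytes, passphrase: str) -> bytes:
    """Return a blob (salt | nonce | tag | ciphertext) recoverable with the passphrase alone."""
    salt, nonce = os.urandom(SALT_LEN), os.urandom(NONCE_LEN)
    enc_key, mac_key = _derive_keys(passphrase, salt)
    ciphertext = _xor_keystream(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).digest()
    return salt + nonce + tag + ciphertext


def unwrap(blob: bytes, passphrase: str) -> bytes:
    """Verify the tag first; on any mismatch raise and release no plaintext (fail closed)."""
    header = SALT_LEN + NONCE_LEN + TAG_LEN
    if len(blob) < header:
        raise ValueError("blob too short")
    salt = blob[:SALT_LEN]
    nonce = blob[SALT_LEN:SALT_LEN + NONCE_LEN]
    tag = blob[SALT_LEN + NONCE_LEN:header]
    ciphertext = blob[header:]
    enc_key, mac_key = _derive_keys(passphrase, salt)
    expected = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong passphrase or corrupted blob")
    return _xor_keystream(enc_key, nonce, ciphertext)
```

Because the tag is checked before any decryption happens, a wrong R surfaces as an exception rather than as garbage plaintext, which is the behaviour the spike observed from age.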
S2 — tunnel re-establishment: running the recovered Cloudflare tunnel token's connector on a NEW box → the customer's hostname routes to it immediately, no DNS change (the CNAME→tunnel is stable; only the connector moves). With both connectors up, 14/14 requests served from NEW; stopping NEW fell back to OLD (6/6) — the old connector is a hot standby, superseded in routing but NOT auto-retired.
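A tally like the 14/14 and 6/6 counts above can be produced with a short probe loop. The sketch below assumes each box's origin answers with a distinguishing marker such as "NEW" or "OLD"; that marker, the helper names, and the URL in the usage note are hypothetical and not taken from the spike.

```python
from collections import Counter
from typing import Callable
import urllib.request


def fetch_marker(url: str, timeout: float = 5.0) -> str:
    # Hypothetical: the origin on each box returns its own marker as the response body.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


def tally_origins(fetch: Callable[[], str], requests: int) -> Counter:
    """Issue `requests` probes and count how many each origin answered."""
    return Counter(fetch() for _ in range(requests))


def served_only_by(tally: Counter, box: str) -> bool:
    """True when every probe in the tally was answered by `box`."""
    return set(tally) == {box}
```

Against a live hostname this would be called as `tally_origins(lambda: fetch_marker("https://<test-hostname>/"), 14)`, once with both connectors up and again after stopping the new one.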
Load-bearing consequence for 10D: routing failover is automatic, but the old box's connector and the (unchanged) tunnel token both stay valid, so 10D must rotate the tunnel/PBS tokens and/or delete the stale connector after re-establishment (host-LOSS security). That needs an account-level, Cloudflare Tunnel-scoped hub credential (broader than the current WAF-only zone token), which feeds the design-review S4 CF-token-placement decision. Also: a remotely-managed tunnel uses its dashboard ingress (cloudflared ignores local config), so the new box must run the origin that the tunnel's ingress expects (the restore orchestration brings it up).
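The retirement half of that consequence reduces to a simple selection: once the new box's connector is confirmed registered, every other connector on the tunnel is stale. The sketch below shows only that selection logic. The `Connector` record and its fields are hypothetical, and no Cloudflare API call is made or implied; how the connector list is fetched and how a connector is deleted are left to the 10D spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    # Hypothetical shape: an identifier plus the host the connector runs on.
    connector_id: str
    host: str


def stale_connectors(connectors: list[Connector], new_connector_id: str) -> list[Connector]:
    """Return every connector except the newly established one.

    Refuses to select anything if the new connector is not registered, so a
    failed re-establishment can never lead to retiring the only live connector.
    """
    if not any(c.connector_id == new_connector_id for c in connectors):
        raise ValueError("new connector not registered; refusing to retire anything")
    return [c for c in connectors if c.connector_id != new_connector_id]
```

The guard reflects the hot-standby finding: until the new connector is up, the old one is still the only thing serving the hostname.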
Safety / teardown
Per operator instruction the test used a new dr-spike.demo-felhom.eu subdomain on the demo's own
(idle — guests down) tunnel; the live *.demo-felhom.eu wildcard + all other records were untouched,
the tunnel's remote config was never modified (the zone API token lacks cfd_tunnel permission), and
the throwaway subdomain + both connectors + all secrets were removed/shredded at teardown. The demo
returns to exactly its prior state.
Out of scope (→ 10D spec)
Recovery-mode toggle + re-enroll handshake + cred rotation; identity-escrow creation wired into
provisioning; the restore orchestration (consume → pull → RestoreLXC → bring up origin → re-establish).
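The step order named above matters because each step depends on the one before it (for example, the origin must be up before the tunnel is re-established). One way to keep that ordering explicit is a fail-fast runner, sketched below. This is an illustration only and not the 10D design: the `run_restore` name and the callable-per-step shape are assumptions, and the steps themselves are placeholders.

```python
from typing import Callable, Sequence

# Each step is a (name, action) pair; the action raises on failure.
Step = tuple[str, Callable[[], None]]


def run_restore(steps: Sequence[Step]) -> list[str]:
    """Run steps strictly in order and halt at the first failure.

    Returns the names of the completed steps. On failure, raises RuntimeError
    naming the failed step, so later steps never run against a half-restored box.
    """
    completed: list[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise RuntimeError(f"restore halted at {name!r} after {completed}") from exc
        completed.append(name)
    return completed
```

With the order from this report, the step list would be consume, pull, RestoreLXC, bring up origin, re-establish, each bound to whatever implementation the spec settles on.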