felhom.eu/REPORT.md
admin a22b87e6e3 docs: slice 10D core spike findings (identity-escrow + tunnel re-establishment) — GO
Validated both unvalidated 10D mechanisms: (1) identity-bundle escrow round-trip
via age scrypt+AEAD (recover on a secret-less box, wrong-R fails closed), (2)
Cloudflare tunnel re-establishment — running the recovered token on a new box
routes the hostname there immediately (no DNS change); the old connector is a
hot standby, superseded in routing but not auto-retired -> 10D must rotate the
tunnel/PBS token + retire the stale connector for host-loss security. Redacted;
secrets shredded; live demo untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 23:17:53 +02:00


felhom.eu — task reports

Overwrite this file with a summary of the most recent task only (uniform with the other repos; not cumulative). The cumulative hub history lives in hub/CHANGELOG.md.


REPORT — Slice 10D core SPIKE: identity-escrow round-trip + tunnel re-establishment (2026-06-10)

Type

SPIKE runbook (CC-executed on the demo). Validated the two unvalidated mechanisms under the 10D DR capstone before speccing the orchestration. Deliverable: the redacted findings doc documentation/tests/slice10d-identity-restore-spike-findings.md. The run handled crown jewels (R + identity/tunnel tokens): staged in 0600 files, passed by reference, and shredded at teardown; no secret was committed.
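The secret-handling discipline described here (0600 staging, shred at teardown) can be sketched as follows. This is a minimal illustration for a POSIX box, not the tooling the spike used, which the report does not name; `stage_secret` and `shred` are hypothetical helper names.

```python
import os


def stage_secret(path: str, data: bytes) -> None:
    """Create `path` readable/writable by the owner only (0600) and write the secret."""
    # O_EXCL refuses to reuse an existing file, so a looser mode is never inherited.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        os.fchmod(f.fileno(), 0o600)  # make the mode exact regardless of umask
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def shred(path: str) -> None:
    """Best-effort overwrite, then unlink.

    Overwriting in place is not a guarantee on SSDs or copy-on-write
    filesystems; treat it as hygiene, not as proof of erasure.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(os.urandom(size))
        f.flush()
        os.fsync(f.fileno())
    os.remove(path)
```

Passing the path around instead of the secret itself is what "by reference" amounts to in this sketch: the value never appears in argv, logs, or shell history.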

Results — GO to spec 10D

S1 — identity-escrow round-trip (age): the identity bundle {tunnel_token, pbs_token} wraps under an EFF-wordlist R via age (scrypt + ChaCha20-Poly1305 AEAD), recovers byte-identical on a secret-less fresh box given only blob + R, and a wrong R fails closed (no plaintext). Mirrors the proven K-escrow → 10D reuses the 10C Consume shape for the identity bundle.
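The two properties S1 checks (byte-identical recovery from blob + R alone, and a wrong R failing closed) can be illustrated with a standard-library sketch. It is a stand-in, not age: it uses scrypt for key derivation as age does, but replaces ChaCha20-Poly1305 with an HMAC-SHA256 keystream plus encrypt-then-MAC, and the blob layout is invented for the example. The names `wrap` and `unwrap` and the scrypt parameters are assumptions.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost; age chooses its own work factor.
SCRYPT = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=64)


def _keys(r_phrase: str, salt: bytes) -> tuple[bytes, bytes]:
    """Derive separate encryption and MAC keys from R with scrypt."""
    material = hashlib.scrypt(r_phrase.encode(), salt=salt, **SCRYPT)
    return material[:32], material[32:]


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode; a stand-in for ChaCha20, not age's format."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def wrap(bundle: bytes, r_phrase: str) -> bytes:
    """Encrypt then MAC; layout is salt | nonce | ciphertext | tag."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = _keys(r_phrase, salt)
    stream = _keystream(enc_key, nonce, len(bundle))
    ct = bytes(a ^ b for a, b in zip(bundle, stream))
    tag = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()
    return salt + nonce + ct + tag


def unwrap(blob: bytes, r_phrase: str) -> bytes:
    """Verify the tag before decrypting, so a wrong R yields no plaintext."""
    if len(blob) < 64:
        raise ValueError("blob too short")
    salt, nonce, ct, tag = blob[:16], blob[16:32], blob[32:-32], blob[-32:]
    enc_key, mac_key = _keys(r_phrase, salt)
    expected = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")  # fail closed
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, stream))
```

The point of the sketch is the check order in `unwrap`: authentication happens before any decryption, which is what makes a wrong R fail closed. A real implementation should use age itself and not hand-rolled primitives.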

S2 — tunnel re-establishment: running the recovered Cloudflare tunnel token's connector on a NEW box → the customer's hostname routes to it immediately, no DNS change (the CNAME→tunnel is stable; only the connector moves). With both connectors up, 14/14 requests served from NEW; stopping NEW fell back to OLD (6/6) — the old connector is a hot standby, superseded in routing but NOT auto-retired.
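A tally like the 14/14 and 6/6 counts can be produced with a small probe loop. The sketch below assumes each origin returns a distinguishing label (for example "NEW" or "OLD") as its response body; the report does not say how the spike told the two boxes apart, so `http_probe` and that convention are hypothetical.

```python
import urllib.request
from collections import Counter
from typing import Callable


def http_probe(url: str) -> str:
    """Fetch `url` and return the body as the origin label (assumed convention)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode().strip()


def tally_origins(probe: Callable[[], str], requests: int) -> Counter:
    """Call `probe` `requests` times and count which origin answered each time."""
    return Counter(probe() for _ in range(requests))


def served_only_by(counts: Counter, label: str) -> bool:
    """True when every counted response came from `label` and at least one arrived."""
    return set(counts) == {label}
```

With both connectors up, `served_only_by(tally_origins(lambda: http_probe(url), 14), "NEW")` would express the first observation; repeating the tally after stopping the new connector and checking for "OLD" would express the fallback.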

Load-bearing consequence for 10D: routing failover is automatic, but the old box's connector and the (same) tunnel token stay valid → 10D must rotate the tunnel/PBS tokens and/or delete the stale connector after re-establishment (host-LOSS security). That needs an Account-level, Cloudflare Tunnel-scoped hub credential (broader than the current WAF-only zone token), which feeds the design-review S4 CF-token-placement decision. Also: a remotely-managed tunnel uses its dashboard ingress (cloudflared ignores local config), so the new box must run the origin the tunnel expects (the restore orchestration brings it up).
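The tunnel side of that cleanup might look like the request plan below. It only builds the calls and sends nothing. The endpoint paths and the `tunnel_secret` and `client_id` fields reflect my reading of Cloudflare's public API for `cfd_tunnel` and should be verified against the current reference before use; `retire_plan`, `Call`, and all identifiers passed in are hypothetical. PBS token rotation is not covered.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

API = "https://api.cloudflare.com/client/v4"


@dataclass(frozen=True)
class Call:
    method: str
    url: str
    body: Optional[dict] = None


def retire_plan(account_id: str, tunnel_id: str,
                stale_client_id: str, new_secret_b64: str) -> list[Call]:
    """Ordered API calls to invalidate the old token and drop the old connector."""
    base = f"{API}/accounts/{account_id}/cfd_tunnel/{tunnel_id}"
    query = urlencode({"client_id": stale_client_id})
    return [
        # 1. Rotate the tunnel secret so the escrowed token cannot reconnect.
        Call("PATCH", base, {"tunnel_secret": new_secret_b64}),
        # 2. Fetch the new token for the box that now hosts the origin.
        Call("GET", f"{base}/token"),
        # 3. Drop the stale connector; with the secret rotated it cannot return.
        Call("DELETE", f"{base}/connections?{query}"),
    ]
```

The ordering is the design choice worth noting: deleting the connector before rotating the secret would let the old box simply reconnect with the still-valid token.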

Safety / teardown

Per operator instruction the test used a new dr-spike.demo-felhom.eu subdomain on the demo's own (idle — guests down) tunnel; the live *.demo-felhom.eu wildcard + all other records were untouched, the tunnel's remote config was never modified (the zone API token lacks cfd_tunnel permission), and the throwaway subdomain + both connectors + all secrets were removed/shredded at teardown. The demo returns to exactly its prior state.

Out of scope (→ 10D spec)

Recovery-mode toggle + re-enroll handshake + cred rotation; identity-escrow creation wired into provisioning; the restore orchestration (consume → pull → RestoreLXC → bring up origin → re-establish).
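The ordering constraint in the restore chain (consume → pull → RestoreLXC → bring up origin → re-establish) can be sketched as a fail-stop runner. This illustrates the sequencing only and is not the 10D design, which is still to be specced; the step callables are placeholders.

```python
from typing import Callable, Sequence

Step = tuple[str, Callable[[], None]]


def run_in_order(steps: Sequence[Step]) -> list[str]:
    """Run steps sequentially; a failure in one step stops everything after it."""
    done: list[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Report how far the restore got so an operator can resume or roll back.
            raise RuntimeError(f"restore stopped at {name!r} after {done}") from exc
        done.append(name)
    return done
```

Stopping at the first failure matters here because re-establishing the tunnel before the origin is up would route the customer's hostname to a box with nothing behind it.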