# Slice 10D core (identity-escrow round-trip + tunnel re-establishment): Findings

**Hosts:** "box1"/OLD = `demo-felhom` (192.168.0.162); "NEW box" = the build server (192.168.0.180). Cloudflare zone `demo-felhom.eu` (per operator instruction; see the zone note in §2), tunnel **`demo-minipc`** (`8b4edf48-…`). `cloudflared` 2026.6.0; `age` (filippo.io/age) scrypt+ChaCha20-Poly1305.

**Date:** 2026-06-10.

**Driver:** SPIKE to validate the two unvalidated mechanisms under the 10D DR capstone (identity-escrow round-trip + tunnel re-establishment) BEFORE speccing the orchestration.

> **REDACTED by policy.** No recovery code `R`, no Cloudflare **tunnel token**, no **API token**, no
> tunnel **connector secret**, and no identity-bundle token values appear here; only mechanism, command
> *shapes*, and routing *behaviour* are recorded. Tunnel/zone/connector *identifiers* (non-secret) are shown. R +
> all tokens were staged to `0600` files, referenced by path, and **shredded at teardown**.

---

## 1. Phase S1, identity-escrow round-trip (age over R): **PASS**

The identity bundle `{tunnel_token, pbs_token}` is wrapped under a recovery code `R` and recovered on a secret-less box. This is the K-escrow mechanism (slice 7/10C), applied to the identity bundle.

- **Crypto:** `age` with a **scrypt** passphrase recipient + **ChaCha20-Poly1305 AEAD** (the blob header is `age-encryption.org/v1` / `-> scrypt …`). No hand-rolled crypto: a vetted passphrase-AEAD, equivalent to `age -p`. `R` = a 10-word EFF-wordlist code (the slice-7 generator).
- **Wrap → recover:** `wrap(bundle, R) → identity.blob`; on a **fenced, secret-less fresh box** handed ONLY the blob + R, `unwrap(blob, R)` recovered the bundle **byte-identical** to the original (`sha256` match), with `tunnel_token` + `pbs_token` intact.
- **Negative (wrong R):** `unwrap(blob, WRONG-R)` **failed closed** with `incorrect passphrase` and **no plaintext emitted** (no file written). This is identical to the K-escrow's fail-closed behaviour on a wrong R.

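
The round-trip and wrong-R checks above can be sketched as a small harness. This is a minimal sketch under stated assumptions: `wrap`, `unwrap`, and `round_trip_ok` are hypothetical names, the bundle values are dummies, and the scrypt + encrypt-then-MAC construction is a stdlib-only toy stand-in for `age`. It is not the crypto the spike used and is not suitable for real secrets; it only illustrates the shape of the byte-identical and fail-closed assertions.

```python
import hashlib
import hmac
import os


def _derive(code: str, salt: bytes) -> tuple[bytes, bytes]:
    """Derive an encryption key and a MAC key from the recovery code via scrypt."""
    key = hashlib.scrypt(code.encode(), salt=salt, n=2**14, r=8, p=1, dklen=64)
    return key[:32], key[32:]


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy SHA-256 counter keystream (assumption: stand-in for a real AEAD cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def wrap(bundle: bytes, code: str) -> bytes:
    """Return salt | nonce | tag | ciphertext for the bundle under the code."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = _derive(code, salt)
    stream = _keystream(enc_key, nonce, len(bundle))
    ct = bytes(a ^ b for a, b in zip(bundle, stream))
    tag = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()
    return salt + nonce + tag + ct


def unwrap(blob: bytes, code: str) -> bytes:
    """Verify the tag first; raise before producing any plaintext on a wrong code."""
    salt, nonce, tag, ct = blob[:16], blob[16:32], blob[32:64], blob[64:]
    enc_key, mac_key = _derive(code, salt)
    expected = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("incorrect passphrase")
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, stream))


def round_trip_ok(bundle: bytes, code: str, wrong_code: str) -> bool:
    """True if recovery is byte-identical (sha256) and the wrong code fails closed."""
    blob = wrap(bundle, code)
    recovered = unwrap(blob, code)
    same = hashlib.sha256(recovered).digest() == hashlib.sha256(bundle).digest()
    try:
        unwrap(blob, wrong_code)
        failed_closed = False
    except ValueError:
        failed_closed = True
    return same and failed_closed


if __name__ == "__main__":
    dummy = b'{"tunnel_token": "dummy-tunnel", "pbs_token": "dummy-pbs"}'
    print(round_trip_ok(dummy, "dummy recovery code", "some other code"))
```

The point of the design is the ordering inside `unwrap`: the authentication check runs before any decryption, so a wrong code raises without ever materialising plaintext, which mirrors the "no file written" observation.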
**F-D1: the identity bundle escrows exactly like `K`.** Same two-factor, zero-knowledge shape: the blob is opaque without `R`; `R` is the only out-of-band secret. 10D can reuse the **10C `Consume` pattern** (Unwrap → install) for the identity bundle, with `age` (or the PBS-key path) as the AEAD.

## 2. Phase S2, tunnel re-establishment on a NEW box: **PASS (with a security caveat)**

**Zone note:** the operator directed the test to the `demo-felhom.eu` zone (the `sajatfelhom.hu` throwaway zone resolved IPv6-only and was unreachable from the demo host). A **new** test subdomain `dr-spike.demo-felhom.eu` was added (the live `*.demo-felhom.eu` wildcard + all other records were **untouched**) and removed at teardown. The tunnel used was the demo's own `demo-minipc`. Its live connector was **down** (demo guests stopped), so no live traffic was displaced; this made it a faithful "host X is back" test (X = the demo).

### Setup

- A `CNAME dr-spike.demo-felhom.eu → .cfargotunnel.com` (proxied) was created with the **zone-scoped** API token (DNS:Edit). The same token **lacked** Account `Cloudflare Tunnel:Edit` (`cfd_tunnel` → auth error), so the tunnel's ingress config could not be set via the API.
- OLD box (162) + NEW box (180): each ran `cloudflared` with the **recovered tunnel token**, plus a distinguishable HTTPS origin ("OLD box" / "NEW box") behind the hostname the remote ingress expects.

### Results

- **Routing to the connector works.** `dr-spike.demo-felhom.eu` → Cloudflare edge → the tunnel → **the running connector** (the cloudflared log shows `dest=https://dr-spike.demo-felhom.eu/` arriving at the connector). The DNS CNAME → tunnel is **stable**; only the *connector* moves, so **no DNS change is needed to move a hostname to a new box.**
- **New box takes over routing immediately.** With BOTH connectors up (OLD 162 + NEW 180), **14/14** requests were served from **NEW** and **0** from OLD. In this test, Cloudflare routed to the most recently established connector.
- **Old connector is a HOT STANDBY, not auto-retired.** The OLD connector stayed **active + registered** (no unregister/lost events) while serving 0 traffic. On **stopping NEW**, traffic **fell back to OLD (6/6)** within seconds, so OLD was a live failover the whole time.

**F-D2: a tunnel TOKEN carries the credentials, but a remotely-managed tunnel ignores local ingress.** The base64 token decodes to `{AccountTag, TunnelID, TunnelSecret}` → a cloudflared credentials file. BUT for a **remotely-managed** tunnel (dashboard/API config), cloudflared uses the **REMOTE** ingress (here `originService=https://traefik`) and **ignores any local `config.yml`**. So on DR the new box's connector serves the tunnel's **hub/dashboard-owned** ingress → the **origin service (traefik/the app) must be running on the new box** (the restore orchestration brings it up; the 502 here was only the missing origin, not a routing failure). Alternatively, DR uses a **locally-managed** tunnel (credentials file + local config) for full local ingress control.

**F-D3: identity continuity is automatic on running the recovered token.** Recovered tunnel token → run cloudflared on the new box → the customer's hostname routes to the new box, with **no DNS edit and no operator routing step.** This is the "the host is back as host X" mechanism.

**F-D4 / the load-bearing DR consequence: the OLD connector + token stay valid → 10D MUST rotate.** Routing needs **no** explicit old-connector retirement (newest wins, old is standby). BUT the old box's connector remains **registered and authenticated with the SAME tunnel token**, and a leaked token would still grant tunnel access. In **host-LOSS DR** (the old box is gone/untrusted/compromised), that is a security gap: a recovered old box (or a leaked token) can silently **re-register and co-serve** the customer's hostname.
**10D must, after re-establishment, ROTATE the tunnel token (and PBS token) and/or explicitly delete the stale connector** (`cleanup_connections` / the connector DELETE API). This needs an **Account `Cloudflare Tunnel:Edit` token**, which the geo-restriction zone token does NOT have (the hub's CF credential placement, design-review S4, must cover tunnel + connector management for DR, not just WAF).

### Gotchas (test-environment, not DR)

- **Split-horizon DNS:** the LAN pi-hole resolves `*.demo-felhom.eu → 192.168.0.162`, masking the Cloudflare edge from internal hosts. Tested via the real edge with `curl --resolve :443:` (CF IP from `dig @1.1.1.1`).
- **Origin TLS:** the remote ingress origin was `https://traefik`; the spike pointed `traefik → 127.0.0.1` (`/etc/hosts`) at a self-signed HTTPS responder, which the remote config accepted (its `originRequest.noTLSVerify` is set for the internal traefik). On DR the new box must present the real origin.

## 3. GO / NO-GO

**GO** to spec **10D**. Both unvalidated mechanisms are proven:

1. The **identity bundle escrows + recovers exactly like `K`** (age scrypt+AEAD; wrong-R fails closed) → reuse the 10C `Consume` shape.
2. **Tunnel re-establishment is automatic**: run the recovered token on the new box → the customer's hostname routes there (no DNS step). The old connector is a hot standby, superseded in routing.

**The 10D spec MUST include (consequences of this spike):**

- **Identity-escrow CREATION at provisioning** (extend slice-7 escrow to also emit the identity blob: `{tunnel_token, pbs_token, …}` wrapped under the SAME `R`, or a sibling blob), so DR has it.
- **Restore-mode consumption** of the identity blob (10C `Consume` pattern; `R` by hand) + install the tunnel/PBS tokens.
- **The new box must run the tunnel's expected origin** (restore orchestration brings up traefik/apps before/with the connector), OR DR uses a locally-managed tunnel config.
- **Cred ROTATION after re-establishment** (rotate tunnel + PBS tokens; delete the stale connector): the security capstone for host-LOSS DR. Requires an **Account Cloudflare-Tunnel-scoped** credential on the hub (broader than the current WAF-only zone token).

## 4. Teardown (verify the live demo is untouched)

- **Connectors stopped + removed** on both boxes (cloudflared + the HTTPS/responder units); `cloudflared` binaries removed; `/etc/hosts` `traefik` entries removed.
- **DNS:** the throwaway `dr-spike.demo-felhom.eu` record **deleted**; the live `*.demo-felhom.eu` wildcard + all other records **untouched**; the `sajatfelhom.hu` test record (created then abandoned on the zone-switch) **deleted**.
- **Tunnel:** its **remote config was never modified** (the API token lacked `cfd_tunnel` permission), so `demo-minipc` returns to exactly its prior state (no spike connectors; the demo's own connector reclaims it when the demo guest restarts).
- **Secrets shredded:** `R`, the identity bundle/blob, the tunnel token, the API token, the cloudflared credentials file (`AccountTag/TunnelID/TunnelSecret`), the throwaway `age` harness. No secret committed.

## Out of scope (note; don't build → 10D spec)

- The recovery-mode toggle + re-enroll handshake + **cred rotation**.
- Identity-escrow **creation wired into provisioning** (slice-7 escrow extension).
- The **restore orchestration** (consume → pull → `RestoreLXC` → bring up origin → re-establish under identity).