slice 10D (hub): DR capstone — recovery mode + re-enroll + directive serving (hub v0.11.0)
Recovery-mode toggle (global key, bounded auto-expiry) gates re-enroll + restore-directive serving. Re-enroll rotates the agent<->hub credential to the new box (old key revoked); returns the opaque escrow blobs + non-secret directive. Store gains recovery_mode_until + identity_blob + directive_json. Hub holds no usable secret + no Cloudflare write-power (operator-side rotation). Doc 03 §9: slice 10 CLOSED.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -1,5 +1,39 @@
# Felhom Hub — Changelog

## v0.11.0 — slice 10D: DR capstone — recovery mode + re-enroll + directive serving (2026-06-10)

The hub half of the slice-10 DR capstone (closes slice 10). The hub ORCHESTRATES recovery but holds
**no usable secret and no Cloudflare write-power**: the escrow blobs it serves are opaque (need `R`,
which the hub never has), and the destructive tunnel/PBS rotation is the **operator's** step from a
trusted environment. A compromised hub can at most hand out opaque blobs + rotate/revoke its own
per-host credential — it cannot hijack a customer's tunnel.

### Added

- **`PUT /admin/hosts/{id}/recovery-mode`** (global key) — arm recovery mode with a bounded TTL
  (`ttl_seconds`, clamped [60s, 4h], default 30m → **auto-expires**); **`DELETE`** to disable. The
  restore directive + re-enroll are served ONLY while recovery mode is active.
- **`POST /hosts/{id}/re-enroll`** — gated ONLY on recovery mode (the lost box has no old key; the
  operator armed recovery mode after out-of-band validation). Rotates the host's API key to the new
  box's key (**the old box's hub access is revoked instantly**) and returns the DR directive + the two
  **opaque** escrow blobs. Without recovery mode → 403. Zero-knowledge: even a wrongful re-enroll in
  the window leaks nothing recoverable (the blobs need `R`).
- **`GET /hosts/{id}/restore-directive`** (re-enrolled key, recovery-gated) — re-fetch the directive.
- **Store/escrow**: `hosts.recovery_mode_until` (additive); `host_escrow.identity_blob` +
  `directive_json` (the age-wrapped identity blob + non-secret directive, stored alongside the
  K-escrow). Methods: `SetRecoveryMode`/`ClearRecoveryMode`, `RotateHostAPIKey`, `SaveHostDRBundle`/
  `GetHostDRBundle`. The slice-7 escrow upload (`PUT /hosts/{id}/escrow`) now also accepts
  `identity_blob_b64` + `directive` (additive).

### Not built (by design — the locked rotation model)

- **No Cloudflare write-credential in the hub.** The operator deletes the stale tunnel connector +
  rotates the tunnel/PBS token from their trusted environment (a documented procedure / future small
  operator CLI). The hub may optionally hold a read-only CF token to surface connector state.

### Tests

- re-enroll refused without recovery mode (403); recovery-mode arm is global-key-only; re-enroll
  **rotates + revokes** (old key → 401, new key → 200); directive served only in recovery mode +
  **expires**; clear disables re-enroll.

## v0.10.0 — slice 10B: signed-op job completion (clear-job) (2026-06-10)

The hub half of slice 10B is small by design — the hub stores + serves the operator-signed blobs

@@ -0,0 +1,183 @@
package api

import (
	"database/sql"
	"encoding/base64"
	"encoding/json"
	"io"
	"net/http"
	"time"

	"gitea.dooplex.hu/admin/felhom-hub/internal/configgen"
)

// Slice 10D — DR capstone, hub side. The hub ORCHESTRATES recovery (recovery-mode toggle, directive
// serving, re-enroll + its OWN agent↔hub credential rotation) but holds **no usable secret and no
// Cloudflare write-power**: the escrow blobs it serves are opaque (need R, which the hub never has),
// and the destructive tunnel/PBS rotation is the operator's step from a trusted environment. A
// compromised hub can at most hand out opaque blobs + revoke/rotate its own per-host credential.

const (
	defaultRecoveryTTL = 30 * time.Minute // bounded auto-expiry default
	maxRecoveryTTL     = 4 * time.Hour
)

func writeJSON(w http.ResponseWriter, code int, v any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	json.NewEncoder(w).Encode(v)
}

// handleSetRecoveryMode arms recovery mode for a host (GLOBAL/operator key only). Body:
// {"ttl_seconds": N} (clamped to [60, maxRecoveryTTL]; default 30m). The directive + re-enroll are
// served ONLY while this is active; it auto-expires.
func (h *Handler) handleSetRecoveryMode(w http.ResponseWriter, r *http.Request, hostID string) {
	if _, _, isGlobal, ok := h.checkAuthHost(r); !ok || !isGlobal {
		http.Error(w, "Forbidden: global key required", http.StatusForbidden)
		return
	}
	if hostID == "" {
		http.Error(w, "Missing host_id", http.StatusBadRequest)
		return
	}
	body, _ := io.ReadAll(io.LimitReader(r.Body, 1<<16))
	var req struct {
		TTLSeconds int `json:"ttl_seconds"`
	}
	json.Unmarshal(body, &req)
	ttl := defaultRecoveryTTL
	if req.TTLSeconds > 0 {
		ttl = time.Duration(req.TTLSeconds) * time.Second
	}
	if ttl < time.Minute {
		ttl = time.Minute
	}
	if ttl > maxRecoveryTTL {
		ttl = maxRecoveryTTL
	}
	until := time.Now().UTC().Add(ttl)
	if err := h.store.SetRecoveryMode(hostID, until); err == sql.ErrNoRows {
		http.Error(w, "Unknown host_id", http.StatusNotFound)
		return
	} else if err != nil {
		h.logger.Printf("[ERROR] set recovery mode for %s: %v", hostID, err)
		http.Error(w, "Internal error", http.StatusInternalServerError)
		return
	}
	h.logger.Printf("[INFO] DR: recovery mode ARMED for host %s until %s (auto-expires)", hostID, until.Format(time.RFC3339))
	writeJSON(w, http.StatusOK, map[string]any{"status": "ok", "recovery_mode_until": until.Format(time.RFC3339)})
}

// handleClearRecoveryMode disables recovery mode (GLOBAL key).
func (h *Handler) handleClearRecoveryMode(w http.ResponseWriter, r *http.Request, hostID string) {
	if _, _, isGlobal, ok := h.checkAuthHost(r); !ok || !isGlobal {
		http.Error(w, "Forbidden: global key required", http.StatusForbidden)
		return
	}
	if err := h.store.ClearRecoveryMode(hostID); err != nil {
		h.logger.Printf("[ERROR] clear recovery mode for %s: %v", hostID, err)
		http.Error(w, "Internal error", http.StatusInternalServerError)
		return
	}
	h.logger.Printf("[INFO] DR: recovery mode DISABLED for host %s", hostID)
	writeJSON(w, http.StatusOK, map[string]any{"status": "ok"})
}

// reEnrollResponse is the re-enroll / restore-directive payload (slice 10D). The blobs are OPAQUE.
type reEnrollResponse struct {
	HostID            string          `json:"host_id"`
	APIKeyRotated     bool            `json:"api_key_rotated"`
	Directive         json.RawMessage `json:"directive"`           // non-secret DR directive
	KEscrowB64        string          `json:"k_escrow_b64"`        // opaque PBS-key escrow blob
	IdentityEscrowB64 string          `json:"identity_escrow_b64"` // opaque identity escrow blob
}

// handleReEnroll is the re-enroll handshake (slice 10D.2). Gated ONLY on RECOVERY MODE (the lost box
// has no key, so no old-key auth) — the operator armed recovery mode (operational gate) after
// out-of-band validation. The new box posts a fresh api_key; the hub ROTATES the host's credential
// to it (the old box's hub access is revoked instantly) and returns the DR directive + opaque blobs.
// Without recovery mode → 403. The blobs are useless without R (zero-knowledge): even a wrongful
// re-enroll within the window leaks nothing recoverable.
func (h *Handler) handleReEnroll(w http.ResponseWriter, r *http.Request, hostID string) {
	if hostID == "" {
		http.Error(w, "Missing host_id", http.StatusBadRequest)
		return
	}
	host, err := h.store.GetHost(hostID)
	if err != nil {
		http.Error(w, "Internal error", http.StatusInternalServerError)
		return
	}
	if host == nil {
		http.Error(w, "Unknown host_id", http.StatusNotFound)
		return
	}
	// THE GATE: recovery mode must be active (operator-armed, not expired).
	if !host.InRecoveryMode(time.Now().UTC()) {
		h.logger.Printf("[WARN] DR: re-enroll REFUSED for %s — recovery mode not active", hostID)
		http.Error(w, "Forbidden: host not in recovery mode (operator must arm it)", http.StatusForbidden)
		return
	}
	body, _ := io.ReadAll(io.LimitReader(r.Body, 1<<16))
	var req struct {
		NewAPIKey string `json:"new_api_key"`
	}
	if json.Unmarshal(body, &req) != nil || req.NewAPIKey == "" {
		// If the box did not supply one, mint it (still rotates the credential). A failed mint
		// must abort: rotating to an empty key would leave the host with an unusable credential.
		minted, kerr := configgen.RandomHex(32)
		if kerr != nil {
			h.logger.Printf("[ERROR] re-enroll mint key for %s: %v", hostID, kerr)
			http.Error(w, "Internal error", http.StatusInternalServerError)
			return
		}
		req.NewAPIKey = minted
	}
	// Rotate the agent↔hub credential to the new box — the old box's key is revoked here.
	if err := h.store.RotateHostAPIKey(hostID, req.NewAPIKey); err != nil {
		h.logger.Printf("[ERROR] re-enroll rotate key for %s: %v", hostID, err)
		http.Error(w, "Internal error", http.StatusInternalServerError)
		return
	}
	resp := reEnrollResponse{HostID: hostID, APIKeyRotated: true, Directive: json.RawMessage("{}")}
	if bundle, err := h.store.GetHostDRBundle(hostID); err == nil && bundle != nil {
		resp.KEscrowB64 = base64.StdEncoding.EncodeToString(bundle.KEscrowBlob)
		resp.IdentityEscrowB64 = base64.StdEncoding.EncodeToString(bundle.IdentityBlob)
		if bundle.DirectiveJSON != "" {
			resp.Directive = json.RawMessage(bundle.DirectiveJSON)
		}
	}
	h.logger.Printf("[INFO] DR: host %s RE-ENROLLED (hub credential rotated; old key revoked; directive served)", hostID)
	// The new key is returned so the box can use it; the operator sees the rotation in the response.
	writeJSON(w, http.StatusOK, map[string]any{
		"host_id": hostID, "api_key_rotated": true, "new_api_key": req.NewAPIKey,
		"directive": resp.Directive, "k_escrow_b64": resp.KEscrowB64, "identity_escrow_b64": resp.IdentityEscrowB64,
	})
}

// handleGetRestoreDirective serves the directive to an already-re-enrolled box (its rotated per-host
// key), gated on recovery mode. Lets the box re-fetch without re-rotating.
func (h *Handler) handleGetRestoreDirective(w http.ResponseWriter, r *http.Request, hostID string) {
	authHostID, _, isGlobal, ok := h.checkAuthHost(r)
	if !ok {
		http.Error(w, "Unauthorized", http.StatusUnauthorized)
		return
	}
	if !isGlobal && authHostID != hostID {
		http.Error(w, "Forbidden: host_id mismatch", http.StatusForbidden)
		return
	}
	host, err := h.store.GetHost(hostID)
	if err != nil {
		// A store failure is not "unknown host"; report it as a 500, as handleReEnroll does.
		http.Error(w, "Internal error", http.StatusInternalServerError)
		return
	}
	if host == nil {
		http.Error(w, "Unknown host_id", http.StatusNotFound)
		return
	}
	if !host.InRecoveryMode(time.Now().UTC()) {
		http.Error(w, "Forbidden: host not in recovery mode", http.StatusForbidden)
		return
	}
	resp := reEnrollResponse{HostID: hostID, Directive: json.RawMessage("{}")}
	if bundle, err := h.store.GetHostDRBundle(hostID); err == nil && bundle != nil {
		resp.KEscrowB64 = base64.StdEncoding.EncodeToString(bundle.KEscrowBlob)
		resp.IdentityEscrowB64 = base64.StdEncoding.EncodeToString(bundle.IdentityBlob)
		if bundle.DirectiveJSON != "" {
			resp.Directive = json.RawMessage(bundle.DirectiveJSON)
		}
	}
	writeJSON(w, http.StatusOK, resp)
}

@@ -0,0 +1,117 @@
package api

import (
	"encoding/base64"
	"encoding/json"
	"net/http"
	"testing"
	"time"

	"gitea.dooplex.hu/admin/felhom-hub/internal/store"
)

// Recovery-mode arm is global-key-only; re-enroll is REFUSED unless recovery mode is active.
func TestReEnroll_GatedOnRecoveryMode(t *testing.T) {
	h, st, _ := newTestHandler(t)
	seedHost(t, st, "h1", "c1", "OLDKEY")

	// Re-enroll with recovery mode OFF → 403.
	if rr := do(h, http.MethodPost, "/hosts/h1/re-enroll", "", `{"new_api_key":"NEWKEY"}`); rr.Code != http.StatusForbidden {
		t.Fatalf("re-enroll without recovery mode = %d, want 403", rr.Code)
	}
	// Arm recovery mode requires the global key (per-host key refused).
	if rr := do(h, http.MethodPut, "/admin/hosts/h1/recovery-mode", "OLDKEY", `{"ttl_seconds":600}`); rr.Code != http.StatusForbidden {
		t.Errorf("per-host arm recovery = %d, want 403", rr.Code)
	}
	if rr := do(h, http.MethodPut, "/admin/hosts/h1/recovery-mode", globalKey, `{"ttl_seconds":600}`); rr.Code != http.StatusOK {
		t.Fatalf("global arm recovery = %d, want 200", rr.Code)
	}

	// Now re-enroll succeeds, rotates the credential, returns the directive.
	rr := do(h, http.MethodPost, "/hosts/h1/re-enroll", "", `{"new_api_key":"NEWKEY"}`)
	if rr.Code != http.StatusOK {
		t.Fatalf("re-enroll in recovery mode = %d body=%s", rr.Code, rr.Body.String())
	}
	var resp struct {
		APIKeyRotated bool   `json:"api_key_rotated"`
		NewAPIKey     string `json:"new_api_key"`
	}
	json.Unmarshal(rr.Body.Bytes(), &resp)
	if !resp.APIKeyRotated || resp.NewAPIKey != "NEWKEY" {
		t.Errorf("re-enroll resp = %+v, want rotated NEWKEY", resp)
	}
}

// Re-enroll ROTATES the hub credential: the old key no longer authenticates; the new one does.
func TestReEnroll_RevokesOldKey(t *testing.T) {
	h, st, _ := newTestHandler(t)
	st.SaveCustomerConfig(&store.CustomerConfig{CustomerID: "c1", APIKey: "ckey", RetrievalPassword: "p"})
	seedHost(t, st, "h1", "c1", "OLDKEY")
	do(h, http.MethodPut, "/admin/hosts/h1/recovery-mode", globalKey, `{"ttl_seconds":600}`)

	// Before: the OLD key authenticates a host-report.
	if rr := do(h, http.MethodPost, "/host-report", "OLDKEY", validReportBody("h1")); rr.Code != 200 {
		t.Fatalf("pre-rotate host-report with OLD key = %d, want 200", rr.Code)
	}
	// Re-enroll → rotate to NEWKEY.
	if rr := do(h, http.MethodPost, "/hosts/h1/re-enroll", "", `{"new_api_key":"NEWKEY"}`); rr.Code != 200 {
		t.Fatalf("re-enroll = %d", rr.Code)
	}
	// After: the OLD key is REVOKED (401), the NEW key works.
	if rr := do(h, http.MethodPost, "/host-report", "OLDKEY", validReportBody("h1")); rr.Code != http.StatusUnauthorized {
		t.Errorf("post-rotate OLD key = %d, want 401 (revoked)", rr.Code)
	}
	if rr := do(h, http.MethodPost, "/host-report", "NEWKEY", validReportBody("h1")); rr.Code != 200 {
		t.Errorf("post-rotate NEW key = %d, want 200", rr.Code)
	}
}

// The restore directive (the opaque blobs) is served ONLY in recovery mode, and expires.
func TestRestoreDirective_GatedAndExpires(t *testing.T) {
	h, st, _ := newTestHandler(t)
	seedHost(t, st, "h1", "c1", "HKEY")
	// Seed a DR bundle: K-escrow row + identity blob + directive.
	st.SaveHostEscrow("h1", []byte("opaque-K-escrow"), "01:36:e9:…", "zero_knowledge", time.Now().UTC().Format(time.RFC3339))
	st.SaveHostDRBundle("h1", []byte("opaque-identity"), `{"pbs_repo":"r","tunnel_id":"t","expected_key_fingerprint":"01:36:e9:…"}`)

	// Not in recovery mode → 403.
	if rr := do(h, http.MethodGet, "/hosts/h1/restore-directive", "HKEY", ""); rr.Code != http.StatusForbidden {
		t.Fatalf("directive without recovery mode = %d, want 403", rr.Code)
	}
	// Arm recovery mode → served, with both opaque blobs + directive.
	do(h, http.MethodPut, "/admin/hosts/h1/recovery-mode", globalKey, `{"ttl_seconds":600}`)
	rr := do(h, http.MethodGet, "/hosts/h1/restore-directive", "HKEY", "")
	if rr.Code != 200 {
		t.Fatalf("directive in recovery mode = %d", rr.Code)
	}
	var d struct {
		KEscrowB64        string          `json:"k_escrow_b64"`
		IdentityEscrowB64 string          `json:"identity_escrow_b64"`
		Directive         json.RawMessage `json:"directive"`
	}
	json.Unmarshal(rr.Body.Bytes(), &d)
	kb, _ := base64.StdEncoding.DecodeString(d.KEscrowB64)
	ib, _ := base64.StdEncoding.DecodeString(d.IdentityEscrowB64)
	if string(kb) != "opaque-K-escrow" || string(ib) != "opaque-identity" {
		t.Errorf("served blobs wrong: K=%q identity=%q", kb, ib)
	}

	// Simulate EXPIRY: set recovery_mode_until in the past → directive refused again.
	st.SetRecoveryMode("h1", time.Now().UTC().Add(-time.Minute))
	if rr := do(h, http.MethodGet, "/hosts/h1/restore-directive", "HKEY", ""); rr.Code != http.StatusForbidden {
		t.Errorf("expired recovery mode directive = %d, want 403", rr.Code)
	}
}

// Clearing recovery mode (global key) disables re-enroll.
func TestRecoveryMode_Clear(t *testing.T) {
	h, st, _ := newTestHandler(t)
	seedHost(t, st, "h1", "c1", "HKEY")
	do(h, http.MethodPut, "/admin/hosts/h1/recovery-mode", globalKey, `{"ttl_seconds":600}`)
	if rr := do(h, http.MethodDelete, "/admin/hosts/h1/recovery-mode", globalKey, ""); rr.Code != 200 {
		t.Fatalf("clear recovery = %d", rr.Code)
	}
	if rr := do(h, http.MethodPost, "/hosts/h1/re-enroll", "", `{"new_api_key":"X"}`); rr.Code != http.StatusForbidden {
		t.Errorf("re-enroll after clear = %d, want 403", rr.Code)
	}
}

@@ -129,6 +129,20 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	case r.Method == http.MethodPut && strings.HasPrefix(path, "/hosts/") && strings.HasSuffix(path, "/escrow"):
		hostID := strings.TrimSuffix(strings.TrimPrefix(path, "/hosts/"), "/escrow")
		h.handleHostEscrowPut(w, r, hostID)
	// DR capstone (slice 10D). Recovery-mode toggle (global key); re-enroll + restore-directive
	// (gated on recovery mode — no old key needed, the box is lost).
	case r.Method == http.MethodPut && strings.HasPrefix(path, "/admin/hosts/") && strings.HasSuffix(path, "/recovery-mode"):
		hostID := strings.TrimSuffix(strings.TrimPrefix(path, "/admin/hosts/"), "/recovery-mode")
		h.handleSetRecoveryMode(w, r, hostID)
	case r.Method == http.MethodDelete && strings.HasPrefix(path, "/admin/hosts/") && strings.HasSuffix(path, "/recovery-mode"):
		hostID := strings.TrimSuffix(strings.TrimPrefix(path, "/admin/hosts/"), "/recovery-mode")
		h.handleClearRecoveryMode(w, r, hostID)
	case r.Method == http.MethodPost && strings.HasPrefix(path, "/hosts/") && strings.HasSuffix(path, "/re-enroll"):
		hostID := strings.TrimSuffix(strings.TrimPrefix(path, "/hosts/"), "/re-enroll")
		h.handleReEnroll(w, r, hostID)
	case r.Method == http.MethodGet && strings.HasPrefix(path, "/hosts/") && strings.HasSuffix(path, "/restore-directive"):
		hostID := strings.TrimSuffix(strings.TrimPrefix(path, "/hosts/"), "/restore-directive")
		h.handleGetRestoreDirective(w, r, hostID)
	// Desired-state serving (slice 10A) — per-host-key, self-scoped (a host reads only its own).
	case r.Method == http.MethodGet && strings.HasPrefix(path, "/hosts/") && strings.HasSuffix(path, "/desired-state"):
		hostID := strings.TrimSuffix(strings.TrimPrefix(path, "/hosts/"), "/desired-state")
@@ -619,6 +633,9 @@ type escrowUploadRequest struct {
	KeyFingerprint string `json:"key_fingerprint"` // for operator display only
	Posture        string `json:"posture"`         // e.g. "zero_knowledge"
	CreatedAt      string `json:"created_at"`      // RFC3339
	// Slice 10D.1 — optional DR bundle, stored alongside the K-escrow (both opaque/non-secret).
	IdentityBlobB64 string          `json:"identity_blob_b64,omitempty"` // age-wrapped {tunnel_token, pbs_token}
	DirectiveJSON   json.RawMessage `json:"directive,omitempty"`         // non-secret directive (pbs repo/ns, expected fp, tunnel id)
}

// handleHostEscrowPut stores a host's opaque escrow blob (doc 03 §8a). Authed with the PER-HOST key
@@ -664,6 +681,26 @@ func (h *Handler) handleHostEscrowPut(w http.ResponseWriter, r *http.Request, pa
		http.Error(w, "Internal error", http.StatusInternalServerError)
		return
	}
	// Slice 10D.1: optionally store the IDENTITY escrow blob + the non-secret DR directive alongside
	// the K-escrow (both opaque / non-secret — no usable secret hub-side). Additive: a slice-7
	// upload without these is unchanged.
	if req.IdentityBlobB64 != "" {
		idBlob, derr := base64.StdEncoding.DecodeString(req.IdentityBlobB64)
		if derr != nil || len(idBlob) == 0 {
			http.Error(w, "Invalid payload: identity_blob_b64 not valid base64", http.StatusBadRequest)
			return
		}
		directive := req.DirectiveJSON
		if len(directive) == 0 || !json.Valid(directive) {
			directive = json.RawMessage("{}")
		}
		if err := h.store.SaveHostDRBundle(pathHostID, idBlob, string(directive)); err != nil {
			h.logger.Printf("[ERROR] Failed to store DR bundle for host %s: %v", pathHostID, err)
			http.Error(w, "Internal error", http.StatusInternalServerError)
			return
		}
		h.logger.Printf("[INFO] stored DR bundle for host %s (identity %d bytes + directive)", pathHostID, len(idBlob))
	}
	h.logger.Printf("[INFO] stored opaque escrow blob for host %s (%d bytes, posture=%s, fp=%s)",
		pathHostID, len(blob), req.Posture, req.KeyFingerprint)
	w.WriteHeader(http.StatusOK)

+104
-3
@@ -301,6 +301,16 @@ func (s *Store) migrate() error {
		return err
	}

	// Slice 10D (DR capstone) — additive columns on existing tables (fire-and-forget; a duplicate
	// column on re-run is ignored). `recovery_mode_until` gates restore-directive serving + re-enroll
	// (NULL/past = off; future = recovery mode active, auto-expires). host_escrow gains the IDENTITY
	// blob (age-wrapped {tunnel_token, pbs_token}) + the NON-secret DR directive (pbs repo/namespace,
	// expected key fingerprint, tunnel id) — the hub serves these only in recovery mode; no usable
	// secret is hub-held (the blobs need R, which the hub never has).
	s.db.Exec(`ALTER TABLE hosts ADD COLUMN recovery_mode_until DATETIME`)
	s.db.Exec(`ALTER TABLE host_escrow ADD COLUMN identity_blob BLOB`)
	s.db.Exec(`ALTER TABLE host_escrow ADD COLUMN directive_json TEXT NOT NULL DEFAULT '{}'`)

	return nil
}

@@ -1287,10 +1297,16 @@ type Host struct {
	DesiredJSON       string
	DesiredGeneration int64
	DRRecordJSON      string
	RecoveryModeUntil *time.Time // slice 10D: recovery mode active until this time (nil/past = off)
	CreatedAt         time.Time
	UpdatedAt         time.Time
}

// InRecoveryMode reports whether the host is currently in recovery mode (set + not expired).
func (h *Host) InRecoveryMode(now time.Time) bool {
	return h.RecoveryModeUntil != nil && now.Before(*h.RecoveryModeUntil)
}

// Guest is one controller LXC. Reality columns are report-driven; APIKey and
// DesiredSpecJSON are INERT until slice 10 and must survive report upserts.
type Guest struct {
@@ -1335,10 +1351,10 @@ func GuestID(hostID string, vmid int) string {

 func scanHost(scan func(dest ...any) error) (*Host, error) {
 	var h Host
-	var lastReport sql.NullString
+	var lastReport, recoveryUntil sql.NullString
 	var createdAt, updatedAt string
 	err := scan(&h.HostID, &h.CustomerID, &h.APIKey, &h.AgentVersion, &lastReport,
-		&h.DesiredJSON, &h.DesiredGeneration, &h.DRRecordJSON, &createdAt, &updatedAt)
+		&h.DesiredJSON, &h.DesiredGeneration, &h.DRRecordJSON, &recoveryUntil, &createdAt, &updatedAt)
 	if err != nil {
 		return nil, err
 	}
@@ -1346,13 +1362,17 @@ func scanHost(scan func(dest ...any) error) (*Host, error) {
 		t := parseSQLiteTime(lastReport.String)
 		h.LastReportAt = &t
 	}
+	if recoveryUntil.Valid && recoveryUntil.String != "" {
+		t := parseSQLiteTime(recoveryUntil.String)
+		h.RecoveryModeUntil = &t
+	}
 	h.CreatedAt = parseSQLiteTime(createdAt)
 	h.UpdatedAt = parseSQLiteTime(updatedAt)
 	return &h, nil
 }

 const hostSelectCols = `host_id, customer_id, api_key, agent_version, last_report_at,
-	desired_json, desired_generation, dr_record_json, created_at, updated_at`
+	desired_json, desired_generation, dr_record_json, recovery_mode_until, created_at, updated_at`

 // GetHostByAPIKey looks up a host by its per-host hub key. Returns nil (no error)
 // if no match — parallels GetCustomerConfigByAPIKey.
@@ -1525,6 +1545,87 @@ func (s *Store) DeleteSignedJob(hostID, jobID string) error {
	return err
}

// ---- slice 10D: DR capstone (recovery mode, DR bundle, re-enroll) ----------------------------

// SetRecoveryMode arms recovery mode for a host until `until` (the operator toggle; bounded
// auto-expiry). While active, the hub serves the restore directive + allows re-enroll. Errors
// ErrNoRows for an unknown host.
func (s *Store) SetRecoveryMode(hostID string, until time.Time) error {
	res, err := s.db.Exec(`UPDATE hosts SET recovery_mode_until = ?, updated_at = datetime('now') WHERE host_id = ?`,
		until.UTC().Format("2006-01-02 15:04:05"), hostID)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return sql.ErrNoRows
	}
	return nil
}

// ClearRecoveryMode disables recovery mode (operator confirm, or after re-enroll completes).
func (s *Store) ClearRecoveryMode(hostID string) error {
	_, err := s.db.Exec(`UPDATE hosts SET recovery_mode_until = NULL, updated_at = datetime('now') WHERE host_id = ?`, hostID)
	return err
}

// RotateHostAPIKey replaces a host's API key (the re-enroll credential rotation — the old box's hub
// access is revoked the instant this commits; purely hub-internal, no Cloudflare/PBS write needed).
func (s *Store) RotateHostAPIKey(hostID, newAPIKey string) error {
	res, err := s.db.Exec(`UPDATE hosts SET api_key = ?, updated_at = datetime('now') WHERE host_id = ?`, newAPIKey, hostID)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return sql.ErrNoRows
	}
	return nil
}

// SaveHostDRBundle stores the IDENTITY escrow blob + the NON-secret DR directive alongside the
// existing K-escrow blob (slice 10D.1). The K-escrow row must already exist (slice-7 escrow upload);
// this updates the additive 10D columns. The hub holds only ciphertext + non-secret directive.
func (s *Store) SaveHostDRBundle(hostID string, identityBlob []byte, directiveJSON string) error {
	if directiveJSON == "" {
		directiveJSON = "{}"
	}
	res, err := s.db.Exec(`UPDATE host_escrow SET identity_blob = ?, directive_json = ?, updated_at = datetime('now') WHERE host_id = ?`,
		identityBlob, directiveJSON, hostID)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return sql.ErrNoRows // no K-escrow row yet — upload the escrow first
	}
	return nil
}

// HostDRBundle is the full DR directive served to a re-enrolling box (slice 10D): the two OPAQUE
// escrow blobs (K + identity — useless without R) + the non-secret directive fields.
type HostDRBundle struct {
	KEscrowBlob   []byte
	IdentityBlob  []byte
	DirectiveJSON string
}

// GetHostDRBundle returns a host's DR bundle (nil if no escrow row). The blobs are opaque — the hub
// cannot open them (it has no R).
func (s *Store) GetHostDRBundle(hostID string) (*HostDRBundle, error) {
	var b HostDRBundle
	var directive sql.NullString
	err := s.db.QueryRow(`SELECT blob, identity_blob, directive_json FROM host_escrow WHERE host_id = ?`, hostID).
		Scan(&b.KEscrowBlob, &b.IdentityBlob, &directive)
	if err == sql.ErrNoRows {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	if directive.Valid {
		b.DirectiveJSON = directive.String
	}
	return &b, nil
}

// SaveHostReport inserts a host_reports row and bumps the host's reality columns
// (agent_version/last_report_at/updated_at) — never the inert intent columns.
func (s *Store) SaveHostReport(hostID, customerID string, reportJSON []byte, d HostReportDenorm) error {