Commit Graph

20 Commits

Author SHA1 Message Date
admin 3457415117 slice 10D (hub): DR capstone — recovery mode + re-enroll + directive serving (hub v0.11.0)
Recovery-mode toggle (global key, bounded auto-expiry) gates re-enroll +
restore-directive serving. Re-enroll rotates the agent<->hub credential to the
new box (old key revoked); returns the opaque escrow blobs + non-secret
directive. Store gains recovery_mode_until + identity_blob + directive_json.
Hub holds no usable secret + no Cloudflare write-power (operator-side rotation).
Doc 03 §9: slice 10 CLOSED.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 09:48:38 +02:00
admin a22b87e6e3 docs: slice 10D core spike findings (identity-escrow + tunnel re-establishment) — GO
Validated both unvalidated 10D mechanisms: (1) identity-bundle escrow round-trip
via age scrypt+AEAD (recover on a secret-less box, wrong-R fails closed), (2)
Cloudflare tunnel re-establishment — running the recovered token on a new box
routes the hostname there immediately (no DNS change); the old connector is a
hot standby, superseded in routing but not auto-retired -> 10D must rotate the
tunnel/PBS token + retire the stale connector for host-loss security. Redacted;
secrets shredded; live demo untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 23:17:53 +02:00
admin a98210ae00 docs: slice 10C escrow consumption productionized (doc 03 §8a/§9)
Agent-only implementation (felhom-agent v0.17.0 escrow.Consume); no hub code
change. 10C done; 10D is the last piece of slice 10.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 22:18:10 +02:00
admin 0c843286a2 slice 10B: signed-op job completion (DELETE clear-job) (hub v0.10.0)
Add DELETE /hosts/{id}/jobs/{job_id} (per-host self-scoped, idempotent) so the
agent clears a job after executing or terminally rejecting it. The hub stores
the operator-signed blobs opaquely (no signing key — cannot forge or open);
the agent verifies + executes. Doc 03 §4/§6/§9 updated (operator-signed path
live; 8C wipe completes; 10B done).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 20:14:32 +02:00
admin e54f882e70 slice 10A: hub desired-state serving + signed-jobs queue (Down channel) (hub v0.9.0)
Serve operator intent to authenticated hosts: PUT /admin/hosts/{id}/desired-state
(global key) bumps desired_generation; GET /hosts/{id}/desired-state + /jobs are
per-host self-scoped; the host-report envelope now carries the real generation +
has_signed_ops. New signed_jobs table + store methods. Desired-state stored/served
opaquely (agent owns the schema). Cross-repo golden (envelope + desired-state)
byte-identical with felhom-agent; doc 03 §4/§9 updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 19:03:14 +02:00
admin f9af3243b9 docs: slice 10C escrow-consumption spike findings (GO)
Validated escrow consumption end-to-end on a genuinely key-less box against
the real felhom-spike datastore: recover K from (blob,R) via the real
escrow.Unwrap, restore REAL data (spike-lxc rootfs, 2.5G) with the recovered
key only, wrong-R fails closed (no plausible-but-wrong key), live K
byte-unchanged. Redacted (no R/K/secret). GO to spec 10C + build 10D.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 17:10:31 +02:00
admin 4590fc0ee0 slice 9 docs + wire-contract: host.cpu_temp_c golden + doc 03 GET /host/metrics
Update the cross-repo host-report golden byte-identical with felhom-agent
(host.cpu_temp_c). Document GET /host/metrics in doc 03 section 6 and define
slice 9 in the section 9 roadmap. No hub code change / no version bump.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 16:16:38 +02:00
admin 5dc363771b doc 03 §8/§9: slice 8B.2 implemented — resume at snapshotted (downtime ~24s->~3s) (2026-06-10)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 15:02:14 +02:00
admin c6dd0ed505 doc 03 §6/§4/§9 + doc 02: slice 8C implemented — controller de-privileged, slice 8 CLOSED (2026-06-10)
§6: disk-management endpoints + reframed principle (non-data-destructive
self-serve; data-destructive stays operator-signed; classifier = agent-internal
device inspection). §4: data-bearing-ness is agent-internal, never caller-claimed.
§9: 8C implemented, slice 8 CLOSED. doc 02: EXECUTED banner. Validated live
(data-bearing format refused; de-privileged controller).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 14:06:55 +02:00
admin d1a3cd0625 doc 03: slice 8B implemented — §8 controller-driven quiesce, §9 table, changelog (2026-06-10)
§8: controller-driven quiesce (stop stacks -> POST /backup -> restart) implemented
(controller v0.36.0 internal/quiesce + agent v0.11.0 cadence/phases); crash-safety
centerpiece + 8B.2 snapshot-mode fast-follow documented. Validated live: quiesced
postgres restore clean vs crash-consistent WAL recovery. §9 table: 8B implemented.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 11:04:36 +02:00
admin e436b61368 doc 03: slice 8A implemented — §6a local-API impl, §9 back-half row, §13 (2026-06-10)
§6a (new): the local-API implementation — stable leaf-SHA-256 pin, token->guest
self-scoping (cross-guest 403), bootstrap.json contract + controller ingestion
(c), baked-controller deploy (no registry cred in guest), firewall narrowing.
§9 slice table: back-half = slice 8A implemented (8B quiesce / 8C de-priv split
out); build-golden.sh bakes the controller. §13 + doc changelog.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 10:02:11 +02:00
admin 4a81a96678 slice 8A spike: agent<->controller channel + controller deploy plumbing findings
Doc-only spike (no hub code change). Validated on demo-felhom (guest 8200,
torn down): (1) guest->host HTTPS over vmbr0 with fingerprint-pin + bearer +
self-scoping (200/401/403, wrong-pin TLS fail, no firewall rule needed);
(2) config-mount + golden-baked bootstrap unit deploys+runs the controller
(docker login/pull/run v0.34.0) with no pct exec. Verdict: GO to 8A spec.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 08:57:48 +02:00
admin 7eb3772000 hub: opaque PBS recovery-code escrow storage (v0.8.0) + doc 03 §8a posture model
Slice-7 close-out (hub half). PUT /api/v1/hosts/{host_id}/escrow (per-host key)
stores the agent's OPAQUE R-wrapped blob verbatim against the host; the hub never
decrypts it (no recovery code, no decrypt path). host_escrow table + Save/GetHostEscrow.
Tests: verbatim store, rotation last-write-wins, 401/403/400 auth+body, wire contract.

doc 03 §8a rewritten into the key-custody posture model: separation principle,
topology matrix, default + anti-lockout ladder, SSH-vs-key, breach/legal, integrity
caveat. Corrected: hub opaque storage is slice 7 (this task); serving is slice 10.
Slice table + §13 updated.

No secrets committed (R/K never appear; spike findings + docs use placeholders).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 07:46:33 +02:00
admin fe7d0850a5 spike(slice7): PBS recovery-code escrow round-trip findings (redacted)
Validated wrap->lose->unwrap->restore on a fenced throwaway: the R-recovered key
decrypts a real encrypted snapshot. Pins the PBS-native command sequence (key
change-passphrase --kdf scrypt/none), the pty requirement (F-A1: TTY-only, env var
ignored) + the echo caveat (F-A2: discard pty output so R can't leak), the blob
format/size, and the R format (EFF wordlist, >=128-bit). No K/R/token value recorded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 07:27:35 +02:00
admin 15c4728e2c doc(03-host-agent): slice-7 bring-up front half + golden host-key unit implemented
§9: the provision front half, guest-loss DR front half, and golden recipe are now
implemented (agent v0.8.0, internal/reconcile/bringup.go; configs/build-golden.sh).
Identity reset settled + implemented: provision resets MAC (unconditional, F1) +
hostname host-side; machine-id + SSH host keys regenerate guest-side (systemd + the
baked first-boot felhom-regen-hostkeys unit, F3) — agent stays host-side-only. Slice
mapping table statuses updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 21:37:54 +02:00
admin 33429933af spike(slice7): golden base build + live bring-up front-half findings
SPIKE-RUNBOOK Slice 7 Phase 0, executed live on demo-felhom. Golden base
(Debian 13 + Docker, nesting=1,keyctl=1, identity-cleaned) built as root@pam,
archived, then token-restored to a throwaway guest and brought up LINK-UP with
the FelhomAgent token (restore/config/resize/start all token-covered).

Key findings:
- MAC reset is UNCONDITIONAL — vzrestore preserves the archived MAC (F1).
- hostname reset is host-side token config (F2).
- machine-id auto-regenerates on first boot (free); SSH host keys do NOT —
  ssh.service fails, agent must run ssh-keygen -A guest-side OR bake a first-boot
  unit (F3, the one surface-widening design consequence).
- keyctl-through-restore is functional (Docker hello-world in the restored guest);
  storage driver overlayfs (F5/F6).
- Settles the §9 / doc-13 identity-reset field list for the provision path.

Verdict: READY to spec the unified bring-up reconcile job (Phase 7.1).
Golden archive kept; both spike guests torn down.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 20:48:50 +02:00
admin e7ed8a8483 doc(03-host-agent): slice-7 scope, scenario-specific identity-reset, PBS escrow (§8a)
- §9 rewritten: bring-up is a shared FRONT HALF only; identity-reset policy is
  scenario-specific (provision = fresh everything; guest-loss DR = preserve
  restic/tunnel/hub continuity, reset only collision-prone host-local identity).
  Added the slice 7/8/10 mapping table.
- NEW §8a: PBS recovery-code escrow (zero-knowledge) — live key on box; agent-generated
  recovery code R; PBS-native passphrase-wrap of K under R escrowed to hub; consumption
  slice 10; irreducible-residual + rotation != key-rotation stated.
- §13 updated (resolved: provision/DR slice boundary + escrow design; open: identity-reset
  set, hub-side escrow storage + restore-mode serving).

Doc-only; no version bump.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 20:25:11 +02:00
admin 94a236328b docs(spike): phase 5 PBS mechanism findings (DooPlex server ← N100 client)
Empirical PBS validation before the slice-6 Phase B spec. Records: PBS install on
Debian-13 DooPlex (trixie key ships in proxmox-archive-keyring, no standalone .gpg),
datastore + cert fingerprint, the PBS privsep gotcha (grant role on user AND token),
the encrypted pbs storage + key location (/etc/pve/priv/storage/<id>.enc), the snapshot
volid format + native fields (→ PBSSnapshot shape), restore-from-PBS works unchanged,
the verify mechanism (server-side; agent drives it remotely via the PBS API, result read
from snapshot verification.state), no operator-token privilege gap, and zero-knowledge
confirmed (server can't decrypt without the client key). PBS+datastore+storage left up
for Phase B; no secrets committed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 16:26:57 +02:00
admin 0d832def7b fix: update repo-name refs after deploy-felhom-compose -> felhom-controller rename
- hub/internal/web/templatefetcher.go: raw-template URL now points at the renamed
  repo (was relying on Gitea's post-rename redirect)
- documentation/ (moved here from the felhom-agent repo): fix controller-source path
  refs (deploy-felhom-compose -> felhom-controller) and the platform repo name
  (proxmox-controller -> felhom-agent)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 14:03:13 +02:00
admin 715f644bf0 moved documentation to felhom.eu
2026-06-08 13:50:14 +02:00