admin 3457415117 slice 10D (hub): DR capstone — recovery mode + re-enroll + directive serving (hub v0.11.0)
Recovery-mode toggle (global key, bounded auto-expiry) gates re-enroll +
restore-directive serving. Re-enroll rotates the agent<->hub credential to the
new box (old key revoked); returns the opaque escrow blobs + non-secret
directive. Store gains recovery_mode_until + identity_blob + directive_json.
Hub holds no usable secret + no Cloudflare write-power (operator-side rotation).
Doc 03 §9: slice 10 CLOSED.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 09:48:38 +02:00

# Architecture Part 3 — The Host Agent
> Status: design draft (decision content). To be grounded by Claude Code against
> `docs/proxmox-platform.md` and `docs/architecture/02-controller-module-map.md`,
> then placed at `docs/architecture/03-host-agent.md`.
>
> Builds on Part 1 (`01-topology-and-trust.md`) and Part 2 (`02-controller-module-map.md`).
> Where this doc and the locked decisions disagree, the locked decisions win and this
> draft is wrong — flag it.
## 1. Purpose & scope
The **host agent** is the operator-tier component that runs on each Proxmox host and
owns *all* Proxmox interaction. It is the trusted host actor: it provisions and restores
guests, manages host storage, orchestrates backups and restore-tests, watches the host
and the tunnel, talks to the hub, and exposes a narrow local API to the in-guest
controllers it deploys.
It is the privileged tier. The controller deliberately holds **no** Proxmox credentials
(Part 1) — the privilege the controller shed by losing `storage/` did not disappear, it
**moved here**. That makes the agent's hardening and blast-radius discipline the most
security-sensitive part of the platform.
The agent manages a **set** of guests on its host (usually one customer = one guest, but
the multi-tenant/company case is not precluded — the agent's data model is per-host,
N-guests, never "the guest").
## 2. Responsibilities (and explicit non-responsibilities)
Owns:
1. **Proxmox lifecycle** — create/start/stop/destroy guests, snapshots, storage allocation. Via a scoped Proxmox API token (the **`FelhomAgent` operator role** — `proxmox-platform.md` §3.6, validated Phase 3 B3) for everything the API covers; raw host ops only where unavoidable.
2. **Storage management** — attach/classify targets, reconcile the storage manifest, mount USB-by-UUID, present mounts into guests.
3. **Backup/restore orchestration** — vzdump to the tiers, PBS, snapshot management, and the **self-restore-test**.
4. **Host & tunnel monitoring** — host metrics, guest up/down, storage-target status, and `cloudflared` health; reports the host domain to the hub.
5. **Provisioning** — provision a guest **by restoring the golden base image** (§9), deploy the controller into it, hand it its bootstrap config; also **build and refresh the golden base image** itself.
6. **Hub control loop** — poll for desired state + signed jobs, reconcile, execute, report, heartbeat.
7. **Local API** — the per-guest authorization gate the controller calls.
8. **Self-update** — update itself (carefully — it is a host service) and update the controllers it owns.
Explicitly does **not**:
- Serve application traffic or sit in the data path. **Control plane, not data plane**: if the agent dies, apps keep serving (Docker + LXC run without it); only *management* degrades — no new backups, no provisioning, hub loses the heartbeat.
- Hold or proxy customer application data.
- Run inside a guest. It is the thing that recovers guests and the host; it cannot be one of them.
- Manage **geo-restriction / the Cloudflare API**. Geo is hub-owned: the customer sets it in the controller UI, the controller reports the geo desired-state to the hub, and the **hub** (holding the CF API token) reconciles the WAF (S4). The agent manages only the *tunnel* service (`cloudflared`, §3/§5), never WAF rules.
## 3. Process model & host integration
- **Native Go binary, systemd service** on the host: boot-start, `Restart=always`, systemd watchdog (kill+restart on hang), journald logging, resource limits.
- **Root-minimized (boundary settled — Phase 3 B3).** The agent runs as a **non-root** service user with the scoped `FelhomAgent` token for all API-covered work + a **narrow `sudoers` allowlist** for true host ops. Per Phase 3 (B3) the boundary is settled: the entire per-customer guest lifecycle — provision (by restore, §9), config, start/stop, snapshot, backup, **restore**, destroy — is token-covered. Genuine OS-root is confined to: (1) building/refreshing the **golden base image** (`keyctl` create is `root@pam`-only — one-time at enrollment + a maintenance cadence, §9); (2) **host mounts** (USB mount-by-UUID, systemd mount units / fstab); (3) **SMART / hardware sensors**. Root therefore never sits on the per-customer path. See `proxmox-platform.md` §3.6 for the role + boundary table.
- **`cloudflared` is a separate systemd service**, not embedded in the agent. This is what makes the data path survive control-plane death by construction. The agent **manages and health-watches** it (see §5) but the tunnel does not live or die with the agent process.
## 4. Control model — reconcile + signed destructive ops
Two channels, split by **reversibility**, not by transport.
**(a) Desired-state reconciliation — steady state.**
The hub holds desired state for the host: which guests should exist (and at what spec),
the storage manifest, backup/retention policies, controller image versions. The agent
runs a reconcile loop converging actual Proxmox state → desired: idempotent, self-healing,
and tolerant of missed polls (drift is corrected on the next loop). Provisioning retries,
re-attach of a flapping USB target, redeploy of a crashed controller — all fall out of
reconciliation for free.
**(b) Signed one-shot jobs — operator actions.**
Restore-now, decommission, force-backup, break-glass-enable. Discrete, run-once
(idempotency key), written to the customer-visible audit log, and **outside** the reconcile
loop — they are point-in-time and often destructive, and a reconciler must never re-run a
restore because it "sees drift." A one-shot job names a **target** ("restore guest X from
snapshot S"), not a procedure; the agent owns the *how*.
**The reversibility gate (security-critical).**
"Signed jobs resist hub compromise" only holds if the agent also distrusts hub-supplied
*desired state* for destructive changes. The gate is by **provenance + data-bearing-ness, not
by verb**:
- **The reconciler MAY act without an operator signature** when: (a) creating/starting/restarting; (b) destroying resources it created earlier **within the same journaled transaction** (compensating rollback, §10); (c) destroying resources it **tagged ephemeral/scratch** (e.g. restore-test scratch guests, §8). The ephemeral/scratch tag is **agent-internal provenance and is never accepted from the hub** — else a compromised hub could relabel a data-bearing guest as scratch to walk the gate.
- **An operator signature is always required** to destroy/overwrite any resource holding the only/primary copy of customer data — live-guest destroy, storage detach/wipe, restore-overwrite, decommission — *regardless of whether it arrives as a job or as a desired-state delta*. A compromised hub cannot forge them because the signing key is **not held by the hub** (it lives with the operator / a separate signing path; the hub only queues opaque signed blobs).
- **Data-bearing-ness is agent-internal evidence, never a caller's claim (slice 8C).** For a customer-driven storage op (`POST /disks/format`, §6) the agent **inspects the actual device** (filesystem signature / partition table / partitions / mount, conservative — ambiguous → data-bearing) to decide the class. A blank device → benign self-serve `mkfs`; a data-bearing device → `ClassStorageWipe` → this gate → `pending_signature`. The **destructive completion of a data-bearing wipe is slice 10** (the operator-signed path); 8C refuses it. This mirrors the provenance rule above: just as the scratch tag is agent-internal (never hub-sourced), data-bearing-ness is agent-observed (never controller-asserted) — a compromised controller cannot relabel a data-bearing drive "blank" to walk the gate.
- **Healing a crashed controller is non-destructive by construction:** it is reconstructable from its image + the guest's persistent volume, so "redeploy" = restart the LXC / `docker compose up -d` **inside the existing guest** — never a guest destroy. (v0.33 precedent: `watchdog.go` restarts stopped stacks, it never destroys the guest.)
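The gate rules above can be sketched as a single decision function. This is a minimal illustration, not the agent's code: the type and constant names (`Provenance`, `Op`, `Allowed`) are hypothetical, and the only point it makes is that the unsigned path is decided by agent-recorded provenance, never by a field the hub can set.

```go
package main

import "fmt"

// Provenance records how the agent itself came to know a resource.
// It is agent-internal state and is never parsed from hub input.
type Provenance int

const (
	ProvUnknown         Provenance = iota // treated as data-bearing (conservative)
	ProvSameTransaction                   // created earlier in the current journaled transaction
	ProvScratch                           // tagged ephemeral/scratch by the agent itself
)

type Verb int

const (
	VerbCreate Verb = iota
	VerbStart
	VerbRestart
	VerbDestroy
	VerbOverwrite
)

// Op is one change the reconciler or a job wants to apply.
type Op struct {
	Verb       Verb
	Provenance Provenance
	Signed     bool // true only after an operator signature was verified
}

// Allowed reports whether the op may run under the reversibility gate.
func Allowed(op Op) bool {
	switch op.Verb {
	case VerbCreate, VerbStart, VerbRestart:
		return true // non-destructive: no signature needed
	case VerbDestroy:
		if op.Provenance == ProvSameTransaction || op.Provenance == ProvScratch {
			return true // compensating rollback or scratch teardown
		}
		return op.Signed
	default:
		return op.Signed // overwrite and anything unrecognized needs a signature
	}
}

func main() {
	fmt.Println(Allowed(Op{Verb: VerbDestroy, Provenance: ProvScratch}))
	fmt.Println(Allowed(Op{Verb: VerbDestroy, Provenance: ProvUnknown}))
}
```

Unknown provenance falls through to the signature requirement, which mirrors the conservative default the section describes.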
Signed payloads carry a **nonce + expiry** (anti-replay: a captured "restore" job cannot be
re-injected later) and a target binding (host + guest id) so a signature can't be retargeted.
Notification-on-destructive-op is an **audit signal, never the guard** — a compromised hub
could both issue and suppress the notice, which is exactly why the *signature* (not the
notification) is the control.
## 5. Hub ↔ agent protocol (host domain)
**Box-initiated poll.** The hub never connects inbound. Each poll cycle exchanges:
- **Up:** heartbeat + a host-domain state report — host CPU/RAM/disk, per-guest up/down + spec, storage-target status (USB connected? NFS/CIFS reachable? PBS reachable?), last backup per target, last restore-test result, `cloudflared` health, agent + controller versions, audit-log tail.
- **Down:** the current desired state, any pending signed one-shot jobs, and config (poll interval, update window, policy changes).
**Dead-man's-switch (essential, not optional).** In a box-initiated model the heartbeat
*is* the liveness signal — a box that stops checking in is otherwise invisible. The hub
alerts the operator when an agent misses its expected check-in window. This is the worst
failure mode for a managed service, so it gets first-class treatment hub-side.
**Break-glass.** Standing inbound control is off. But when the poll loop *itself* is wedged
(agent hung, host sick) you cannot fix it through the poll loop. So there is an explicit,
**off-by-default, customer-consented, fully-audited** emergency path: SSH to the host via
the Cloudflare Tunnel behind Cloudflare Access (or on-site). Enabling it is itself a signed,
logged operation; it auto-expires.
## 6. Agent ↔ controller local API
The controller (in its LXC) reaches the agent (on the host) over the local bridge.
- **Transport:** HTTPS to the host's bridge IP on a fixed port.
- **Auth:** a per-guest local token, minted by the agent when it deploys the controller and written into the guest's bootstrap config. The agent maps token → guest and **authorizes per guest**: a controller can only act on *its own* guest. This is the agent acting as the per-guest authorization gate from Part 1.
- **Surface (minimal, all scoped to the caller's own guest):**
- `GET /storage` — mounts available to this guest and their **class** (fast/slow), so the controller can place hot vs bulk volumes per `.felhom.yml`. (The agent owns the actual mounts; the controller just binds to the paths it's given.)
- `POST /snapshot` — snapshot *this* guest (the snapshot-before-deploy primitive).
- `POST /rollback` — roll *this* guest back to a named snapshot (post-deploy failure recovery).
- `POST /backup` — request a backup-now of *this* guest (enqueued; non-destructive).
- `GET /backup/due` — whether a policy-scheduled backup is due for *this* guest, so the controller can quiesce then call `POST /backup` (the app-consistent path, §8).
- `GET /backup/status`, `GET /restore-test/status` — read-only status for the controller's UI.
- **Host metrics (slice 9):** `GET /host/metrics` serves **host-wide** health for the customer's
monitoring view: cpu%/mem/load/uptime, **CPU/chassis temperature** (`cpu_temp_c`, nullable —
"n/a" when the hardware exposes no sensor), and per-storage capacity (total/used/fraction,
thin-pool fill, disk SMART temp+wear). It **reuses the slice-4 collector** (no duplicate
collection) and serves a **fresh** collect (current cpu%/temp, not the 15-min hub snapshot).
Unlike the rest of the surface this is **host-wide, not per-guest** (the box, not the caller's
guest) — correct for "see my box's health" — but still **token-authed** via the per-guest token.
**Assumption: one customer per host** (the home-server model); if a host ever served multiple
customers, host-wide CPU/mem would leak cross-customer load → revisit then. The de-privileged
controller (slice 8C) sees only its own cgroup, so it cannot read host health itself; this
re-serves the agent's existing host + storage observation to the customer. **Status:
implemented** (agent v0.14.0 `internal/localapi` + `internal/hub/cputemp.go`; controller v0.39.0
`internal/web/agent_host_metrics_handler.go` + the monitoring page's host-health card).
- **Disk management (slice 8C):** `GET /disks` (host drives + a **data-bearing flag**),
`POST /disks/assign` (attach a drive as a mount — benign, additive, self-serve), `POST
/disks/eject` (safe-unmount, **data preserved**, returns the dependent guests so the controller
warns which apps lose that storage — benign), `POST /disks/format` (see the reframed principle
below). The controller is Docker-only (de-privileged, slice 8C); **execution is the agent's**.
**The principle (reframed for 8C):** a controller may do **non-data-destructive** storage setup
**self-serve** (list, assign, eject, format a *blank* drive); **anything that can lose customer data
stays operator-signed (§4)**. The enforcer is the **classifier**: for `POST /disks/format` the agent
**inspects the actual device itself** (filesystem signature / partition table / partitions / mount —
agent-internal evidence, NEVER the caller's claim) and classifies conservatively (ambiguous →
data-bearing). A blank device → benign → `mkfs`. A data-bearing device → `ClassStorageWipe`
destructive → the §4 gate → refused **`pending_signature`** (the operator-signed completion is slice
10). So a compromised controller asserting "this drive is blank" **cannot** wipe a data-bearing
drive — the 8C analog of self-scoping. **Status: implemented** (agent v0.12.0 `internal/localapi` +
`internal/storage`; controller v0.37.0 `internal/web/agent_disk_handlers.go`).
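The classifier principle can be sketched as follows. `ClassStorageWipe` and `pending_signature` are the names used above; everything else (`DeviceFacts`, `Classify`, `FormatOutcome`, the string values) is hypothetical. Note that the caller's claim is not even a parameter: the decision takes only facts the agent probed itself.

```go
package main

import "fmt"

// DeviceFacts is what the agent observed on the device itself.
// Nothing in it comes from the controller's request body.
type DeviceFacts struct {
	ProbeOK        bool   // false when any probe failed or was inconclusive
	FSSignature    string // e.g. "ext4"; empty when none was found
	PartitionTable string // e.g. "gpt"; empty when none was found
	Partitions     int
	Mounted        bool
}

type Class string

const (
	ClassBlank       Class = "blank"
	ClassStorageWipe Class = "storage_wipe" // string value is illustrative
)

// Classify is conservative: any sign of data, or any doubt, is data-bearing.
func Classify(f DeviceFacts) Class {
	if !f.ProbeOK || f.FSSignature != "" || f.PartitionTable != "" || f.Partitions > 0 || f.Mounted {
		return ClassStorageWipe
	}
	return ClassBlank
}

// FormatOutcome is what a format request resolves to in this sketch:
// a blank device is formatted, a data-bearing one waits for the operator.
func FormatOutcome(f DeviceFacts) string {
	if Classify(f) == ClassBlank {
		return "mkfs"
	}
	return "pending_signature"
}

func main() {
	fmt.Println(FormatOutcome(DeviceFacts{ProbeOK: true}))
	fmt.Println(FormatOutcome(DeviceFacts{ProbeOK: true, FSSignature: "ext4"}))
	fmt.Println(FormatOutcome(DeviceFacts{ProbeOK: false}))
}
```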
Note what is *absent*: nothing here lets a controller touch **another guest**, the **host** beyond
this narrow disk surface, or **restore-overwrite**; and within the disk surface, **data-destructive**
power stays operator-signed (§4).
A controller can only `POST /rollback` (or snapshot/backup) **its own** guest — the agent maps
token → guest and authorizes per guest, so a compromised controller's blast radius is
**self-scoped and bounded** to its own guest.
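A minimal sketch of the token → guest mapping and the self-scoping check. It stores only a SHA-256 hash of each token and resolves the VMID from the token, never from the request. The type and method names are hypothetical, and treating `0` as "no explicit vmid supplied" is an assumption of this sketch.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
)

// TokenStore maps SHA-256(token) to the guest that token was minted for.
// The plaintext token is never kept.
type TokenStore struct {
	byHash map[string]int
}

func NewTokenStore() *TokenStore { return &TokenStore{byHash: map[string]int{}} }

func hashToken(tok string) string {
	sum := sha256.Sum256([]byte(tok))
	return hex.EncodeToString(sum[:])
}

// Add records a freshly minted token; a later Add for the same token wins.
func (s *TokenStore) Add(tok string, vmid int) { s.byHash[hashToken(tok)] = vmid }

// Authorize returns the guest the caller may act on and an HTTP status.
// claimedVMID is an optional caller-supplied id (0 means absent); it can
// only confirm the token's guest, never select a different one.
func (s *TokenStore) Authorize(tok string, claimedVMID int) (int, int) {
	vmid, ok := s.byHash[hashToken(tok)]
	if tok == "" || !ok {
		return 0, http.StatusUnauthorized
	}
	if claimedVMID != 0 && claimedVMID != vmid {
		return 0, http.StatusForbidden
	}
	return vmid, http.StatusOK
}

func main() {
	s := NewTokenStore()
	s.Add("token-for-guest-101", 101)
	fmt.Println(s.Authorize("token-for-guest-101", 0))
	fmt.Println(s.Authorize("token-for-guest-101", 102))
	fmt.Println(s.Authorize("unknown", 101))
}
```

A production lookup would likely want a constant-time comparison or a keyed hash; the plain map here keeps the sketch short.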
### 6a. Implementation (slice 8A — implemented)
**Status: implemented** (agent v0.10.0 `internal/localapi`; controller v0.35.0 `internal/bootstrap`
+ `internal/agentapi`). Grounded by `documentation/tests/slice8a-channel-deploy-spike-findings.md`
(commit `4a81a96`). The 7 per-guest endpoints in §6 (`GET /storage` through `GET /restore-test/status`) are live; `GET /backup/due` is **thin** in 8A (the
quiesce-on-due consumer is 8B), the rest wrap the existing slice-5/6/7 machinery.
- **Transport / pin.** The agent serves a **persisted self-signed leaf** bound to the host bridge IP
on a fixed port (default `:8443`). The controller pins the **leaf-cert SHA-256** (decision:
consistency with the agent's Proxmox/PBS cert pinning), carried in its bootstrap. The leaf is
generated **once and persisted**, so its fingerprint is stable across agent restarts (a fresh cert
each boot would invalidate every already-issued bootstrap pin). Defense-in-depth: the listener
binds the **bridge IP** (not `0.0.0.0`) and a host firewall rule narrows the port to the guest
bridge subnet (`configs/felhom-localapi-firewall.example`) — the **per-guest token stays the gate**.
- **Token custody.** The per-guest token is minted by the back-half (§9), persisted as a **SHA-256
hash** only (the plaintext exists transiently at mint→write-to-mount, then is discarded), in a
durable last-write-wins map. **Self-scoping** is enforced by the token→guest map alone: the VMID is
resolved from the token, never from a caller-supplied id; an explicit `vmid` that disagrees is
refused (**403**) and the Proxmox op is never issued for the other guest. Absent/unknown token → 401.
- **The bootstrap contract `(c)`.** The agent emits a stable `bootstrap.json`
(`schema: felhom.bootstrap/v1`: customer identity, hub, and the local-API `{endpoint, fingerprint,
token}`) into a read-only config mount; the controller **ingests it on first run and seeds its own
`controller.yaml`, skipping setup mode** (idempotent — never clobbers an existing config; fail-safe
— a malformed/absent bootstrap stays in setup). The agent emits the contract; the controller owns
the translation — they stay decoupled (no shared config schema). **No registry credential ever
enters a guest**: the controller image is **baked into the golden** (§9), so deploy does no
`docker login`/`pull`.
## 7. Storage manifest & reconciliation
The manifest is the load-bearing contract. It absorbs the **persisted** disk-state fields that
`settings.StoragePath` carries today **and adds** `durable_id`/UUID — today the controller
re-derives the UUID from fstab each boot (Part 2 / Phase-3), so persisting it is an
improvement. Held in the hub, reconciled by the agent.
Per target:
| field | meaning |
|---|---|
| `type` | `local-dir` / `usb` / `nfs` / `cifs` / `pbs` |
| `durable_id` | UUID (USB), `server:export` (NFS/CIFS), `repo+fingerprint` (PBS) — survives box loss |
| `class` | `fast` or `slow`, set **once at attach**, with an IOPS marker; no runtime speed-test |
| `role` | `primary` / `vzdump-target` / `pbs-offsite` / `bulk-data` |
| `creds` | encrypted (NFS/CIFS/PBS); USB has none |
| `policy` | schedule + retention for this target |
| `state` | `attached` / `disconnected` / `decommissioned` |
Reconciliation: ensure each `attached` target is mounted (USB-by-UUID via the sudoers
allowlist), each Proxmox storage entry matches, and `disconnected` targets are surfaced to
the hub (the storage watchdog — detect a USB drop in seconds, not at the next health cycle).
**Placement is per-volume, not per-app.** Hot volumes (DB/config) → a `fast` target,
**enforced**; bulk volumes (media) → may live on `slow`, declared in `.felhom.yml`.
A `bulk` volume **MUST** be realized as a `backup=0` **volume mount point** (or an external
bind mount) — **never** a Docker named volume in rootfs, which `vzdump` always captures
(verified, `phase3-findings.md` B2). Proven recipe: attach
`-mpN <storage>:<size>,mp=/mnt/bulk,backup=0`, then
`docker volume create --driver local -o type=none -o o=bind -o device=/mnt/bulk <vol>` (or a
compose bind). The per-volume placement component (Part 2 §5(2)) enforces this at deploy. The
**DR consequence** of excluding bulk is covered in §8.
**Field re-homing (from `settings.StoragePath`, Part 2):** `Label` → manifest (canonical);
`IsDefault`/`Schedulable` → manifest `policy`; `MigratedTo` + decommission → manifest `state`;
`StoppedStacks` → the **controller's `settings`** (app-domain: which apps to restart on
reconnect, not a host concern).
## 8. Backup/restore orchestration
Tiers double as backup *and* restore-source priority (fastest surviving source first),
per Part 1: **snapshot** (LVM-thin, transient, whole-guest rollback — not a backup) →
**local second storage** (vzdump to dir/NFS/CIFS) → **PBS offsite** (the DR substrate).
- **Quiescing (controller-driven for app-consistency) — implemented (slice 8B):** an LXC has no
fsfreeze (`proxmox-platform.md` §4.2), so app-consistency is the controller's job: it learns a
backup is due (`GET /backup/due`, §6) → **quiesces** (stops its app stacks) → `POST /backup` →
polls `GET /backup/status` to `done` → **unquiesces** (restarts exactly the stacks it stopped).
Implemented in `felhom-controller` v0.36.0 (`internal/quiesce`) + `felhom-agent` v0.11.0 (the
`/backup/due` cadence policy + `/backup/status` phases). **An agent-initiated vzdump is
crash-consistent only** (there is no inbound-to-guest channel to trigger a quiesce — §3/§5); the
controller stopping its stacks first is what makes the captured state **clean-shutdown-consistent**
(validated live: a quiesced postgres restore comes up clean — "database system was shut down" — vs
a crash-consistent restore doing WAL recovery — "redo starts… redo done"). Every Proxmox op is
async → the agent polls `task exitstatus`, never trusts the POST return.
- **Crash-safety (the centerpiece — a stranded-down app is worse than a crash-consistent backup):**
a persisted marker written **before** stopping anything; **guaranteed unquiesce** (restart on a
backup error, a status-poll error, the max-quiesce bound, or controller shutdown); a
**max-quiesce-duration** hard bound (restart the app no matter what — the backup finishes on the
agent); and **crash recovery** at controller startup (restart stacks left stopped by a mid-quiesce
crash). The marker also single-flights the loop. All proven live + unit-tested.
- **8B.2 downtime optimization — implemented (agent v0.13.0 + controller v0.38.0):** in snapshot
mode, vzdump only needs the app-stopped state captured at the **storage-snapshot moment**; after
that it reads from the snapshot. The agent watches the vzdump task log for the snapshot marker
(`create storage snapshot`, validated on PVE 9.2.2) and emits a **`snapshotted`** phase on
`/backup/status`; the controller **resumes its app at `snapshotted`** (not `done`), cutting app
downtime from *whole-backup* to *until-snapshot* (~24s→~1s for a 934 MB guest) with **no loss of
app-consistency** (the snapshot froze the app-stopped state). Depends on snapshot-capable storage
(lvm-thin/ZFS); on stop/downgraded storage the marker never appears and the controller **falls
back to resume-at-`done`** (8B). The controller keeps tracking to `done`/`failed` after early
resume (no overlapping backup; the backup isn't "successful" until `done`).
- **Bulk volumes have no DR coverage from the guest vzdump** — they are excluded (§7). Every
`bulk` volume needs an explicit own-backup decision: its own backup target per the manifest
`policy`, **or deliberately none** when the data is re-downloadable (customer informed). On
host-loss, un-backed-up bulk is gone; a **bind-mounted** bulk volume re-attaches only on the
*same* host, so cross-host DR needs the separate backup. A deliberate per-volume choice,
never a silent loss.
- **Key custody (PBS):** the **live** PBS key sits on the box so the agent can both back up
*and* run restore-tests. The hub holds only the **recovery-code-wrapped escrow** copy it
cannot open (zero-knowledge default). So: the box can restore-test; the operator cannot
read the data; the customer's offsite recovery code is the irreducible residual.
- **Self-restore-test:** this closes the "tested restore is the critical gap" theme. The
agent periodically restores a backup into a **throwaway scratch guest**, boots it, runs
health checks, reports pass/fail, and tears it down. Zero-knowledge backups can *only* be
restore-tested by the box (the operator lacks the key) — so this lives in the agent by
necessity, not just convenience. Integrity-verify (cheap, ciphertext-level) runs more often
as the lighter check.
### 8a. PBS recovery-code escrow + the key-custody posture model (zero-knowledge offsite-key recovery)
The DR substrate is the PBS offsite tier, client-side encrypted (zero-knowledge): if the box dies,
restoring the offsite backups requires the **PBS client encryption key `K`**, which died with the
box. The escrow is how `K` comes back **without** Felhom ever being able to read customer data.
**Status: implemented** — escrow *creation* (agent v0.9.0, `internal/escrow`) + hub *opaque storage*
(hub v0.8.0, `PUT /api/v1/hosts/{host_id}/escrow`). Validated end-to-end on a throwaway in
`documentation/tests/slice7-escrow-spike-findings.md`. Restore-mode *serving/consumption* is slice 10.
#### The separation principle (the rule that governs every posture)
Reading customer data needs **BOTH** the encrypted chunks **AND** a usable key. **Zero-knowledge
holds for exactly as long as Felhom never holds both at once.** Every posture below is just a
choice about where the data and the key live; the principle decides who can read.
#### Topology matrix (data location × key custody → who can read)
| Data location | Key custody | Who can read | Notes |
|---|---|---|---|
| **Felhom storage** | customer-only key | **only the customer** | **the DEFAULT** — genuine zero-knowledge |
| **Felhom storage** | Felhom also holds a key | **Felhom can read** | the one dangerous cell — explicit, informed opt-in only; never default, never silent |
| Customer's own offsite | customer key | only the customer | self-hosted data; key XOR data |
| Customer's own offsite | Felhom holds a key | only the customer | safe by separation (key and data never co-located at Felhom) |
#### The escrow mechanism (decisions + the rationale that pins them)
- **Live key unencrypted on the box** (`0600`, root): the agent backs up *and* runs restore-tests
unattended — no passphrase prompt on the management path. The privilege concentration this implies
is the whole argument for §3 root-minimization + a small auditable agent.
- **Wrap — PBS-native, not custom crypto.** At enrollment the agent generates a high-entropy
**recovery code `R`** and produces a **passphrase-protected copy of `K` under `R`** via PBS's own
key passphrase KDF (`proxmox-backup-client key change-passphrase --kdf scrypt`; no bespoke AEAD).
The spike pinned two implementation constraints: that command is **TTY-only** (drive it over a
pty), and the pty **echoes the passphrase** (discard the pty output so `R` can't leak) — F-A1/F-A2.
- **Agent-side generation.** `R` is generated **on the box** (it already holds `K` and does the
wrapping), so `R` never touches the hub even in transit — zero-knowledge by construction. `R` is
≥128 bits, **word-list form** (EFF large wordlist, 10 words ≈ 129 bits) for off-paper transcription.
- **Self-verify before shipping.** Creation unwraps a copy of the blob with `R` and checks the key
fingerprint matches — "an escrow you haven't recovered isn't an escrow."
- **Escrow = the `R`-wrapped blob → hub (opaque storage, slice 7).** The hub stores the ciphertext
bytes against the host record and **never decrypts them** (it has no `R`; there is no decrypt
path). Per-host-key authed; rotation is last-write-wins. **Restore-mode serving is slice 10.**
- **Recovery code custody.** `R` is surfaced to the customer **exactly once** at enrollment
(printed/displayed) and **never stored by Felhom in any recoverable form**.
#### Default posture + the anti-lockout ladder (opt-in, increasing trust)
**Default:** *Felhom storage + customer-only key*, and **`R` is delivered durably (printed) always**
— note this is distinct from a raw-key paperkey: `R` is a safe two-factor *passphrase* (useless
without the hub's blob); the raw key is the footgun. The ladder trades resilience for trust:
- **(b) `R`-wrapped offline copy** — the same two-factor blob, for the customer to print/store. **No
extra trust**; resilience if the hub ever vanishes (still needs `R`). *Implemented (opt-in).*
- **(a) raw paperkey** — `proxmox-backup-client key paperkey` of the unwrapped key, for a safe.
Covers **losing `R`**, but it is **single-factor and unrevocable**. *Implemented (opt-in, loud
caveat).*
- **Felhom-holds-a-key** — maximum convenience, but **gives up zero-knowledge** (the dangerous
matrix cell). **Not implemented** — it needs a separate Felhom-side secure key store + explicit
opt-in UX, built only when a customer asks.
#### SSH-for-support is a SEPARATE grant — deliberately not coupled to key custody
Support access (active / consented / observable — customer-toggleable, commands shown) is **not**
the same as a standing / passive / invisible decryption capability. The transparency features prove
*controlled* support access **without Felhom holding a key**. Conflating the two is exactly the
mistake the separation principle prevents.
#### Why zero-knowledge stays the default (breach + legal)
Holding data **and** a key makes a single hub breach an **all-customer data leak**, and makes Felhom
**compellable** — a court can order what Felhom *can* produce. Genuine zero-knowledge means *"we
can't be forced to hand over what we can't read."* This is core to the sovereignty pitch, not a
nicety.
#### Honesty properties (stated to the customer at enrollment)
- **Irreducible residual:** losing `R` *and* the box (and, if not opted in, having no paperkey) =
the offsite backups are **unrecoverable, by anyone, including Felhom.** The cost of genuine
zero-knowledge — communicated, not buried.
- **Rotation ≠ key rotation:** rotating `R` re-wraps the escrow blob (and re-shows a new code) but
does **not** re-encrypt existing PBS data — that stays keyed by `K`. Changing `K` itself is a
separate, heavier op (new key → new backups; old backups still need old `K`), out of scope for
routine recovery-code rotation.
- **Integrity caveat (self-hosted-data postures):** moving data to the customer's own offsite
**loses Felhom's backup guarantees** — no PBS verify / monitoring on storage we can't reach. An
honest signup-time tradeoff, not a hidden one.
## 9. Provisioning & DR flows
**Provisioning (reconcile-driven, by restore).** Fresh creation of a Docker-capable LXC needs
the `keyctl=1` feature flag, which Proxmox permits only for `root@pam` (Phase 3, B3) — not the
scoped token. But a token-authorized **restore preserves `keyctl`** (Phase 3, B3, empirically:
a token `vzrestore` of a keyctl archive produced a guest that kept `features:
nesting=1,keyctl=1,unprivileged:1`), so the agent provisions **by restoring a golden base
image**, never by `pct create` on the per-customer path.
**Golden base image.** A **golden base archive** — minimal Debian + Docker, `nesting=1,keyctl=1`,
overlayfs — is built once as `root@pam` **at enrollment** (when the agent legitimately holds root
to mint its Proxmox token) and refreshed on a maintenance cadence. This is the one place
`keyctl`/root provisioning lives — off the per-customer path. Refresh cadence + fleet versioning
remain an operational open item (§13).
**Unified bring-up primitive (shared *front half* — NOT shared identity policy).** Provisioning
and DR-restore share one token-covered front-half code path:
> restore an archive → **reset identity** → size the guest (CPU/mem config + `pct resize`
> rootfs, token-covered) → attach storage mounts per the manifest
The sequence runs as a **journaled reconcile job**; a mid-flight failure triggers a compensating rollback (destroy
the just-restored guest — allowed unsigned per §4, same-transaction provenance). They diverge in
the *archive* and the *back half*, **and in identity policy** (below).
**Identity reset is scenario-specific — this is a correctness boundary, not a detail.** "Reset
identity" is shorthand for two different operations:
- **Provision (golden base) → fresh identity, everything.** A provisioned guest is new: reset
MAC + hostname **host-side via the token config** (the agent does NOT touch guest internals),
while **`/etc/machine-id`** (a duplicate breaks journald/DHCP/systemd) and **SSH host keys**
regenerate **guest-side on first boot** — machine-id by systemd for free, host keys by a baked,
Condition-gated `felhom-regen-hostkeys.service` unit in the golden (the F3 decision: Debian does
NOT auto-regenerate host keys after a restore, so the golden carries the regeneration, keeping
the agent host-side-only). It then receives a **fresh** controller identity (host-id, local
token, hub channel), **fresh restic repo identity**, and a fresh tunnel association — all minted
in the back half (slice 8A — implemented).
- **Guest-loss DR (customer backup) → preserve continuity identity, reset only what would
collide.** The restored guest must *continue* the customer's world: **keep** the restic repo
identity (resetting it orphans the existing backup chain — a silent data-continuity bug), the
tunnel/DNS association, and the hub host/customer binding. Reset only collision-prone host-local
identity (`machine-id`, SSH host keys, hostname as needed). **MAC is reset only when a source
guest may still be live** (e.g. partial loss, or the restore-*test* which boots link-down for
exactly this reason); in a true total guest-loss the original is gone, so the MAC can be kept to
preserve DHCP reservations. The agent decides MAC handling from the scenario, not a fixed rule.
The exact reset set was pinned empirically by the slice-7 bring-up spike (live, link-up —
`documentation/tests/slice7-bringup-spike-findings.md`, commit `3342993`) and **implemented in the
unified bring-up reconcile job** (agent v0.8.0, `internal/reconcile/bringup.go`): F1 — a restore
preserves the archived MAC, so provision reset is unconditional (`PUT net0` with `hwaddr` omitted);
F3 — host keys via the baked golden unit, not an agent guest-internal op.
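
The scenario-specific policy above can be sketched as a pure decision function. This is an illustration under assumptions: `Mode`, `IdentityPlan`, and `planIdentity` are hypothetical names, and the DR hostname reset ("as needed" in the text) is left to the caller rather than decided here.

```go
package main

import "fmt"

// Mode selects which identity policy the shared front half applies.
type Mode int

const (
	Provision   Mode = iota // golden base: everything fresh
	GuestLossDR             // customer backup: preserve continuity identity
)

// IdentityPlan records what is reset or minted fresh (true) versus preserved (false).
type IdentityPlan struct {
	ResetMAC         bool
	ResetHostname    bool
	ResetMachineID   bool
	ResetSSHHostKeys bool
	FreshController  bool // host-id, local token, hub channel
	FreshResticRepo  bool
	FreshTunnel      bool
}

// planIdentity maps the scenario to a plan. sourceMayBeLive only matters for
// DR: the MAC is reset when the original guest could still be on the network.
func planIdentity(mode Mode, sourceMayBeLive bool) IdentityPlan {
	switch mode {
	case Provision:
		return IdentityPlan{
			ResetMAC: true, ResetHostname: true,
			ResetMachineID: true, ResetSSHHostKeys: true,
			FreshController: true, FreshResticRepo: true, FreshTunnel: true,
		}
	case GuestLossDR:
		return IdentityPlan{
			ResetMAC:         sourceMayBeLive,
			ResetMachineID:   true,
			ResetSSHHostKeys: true,
			// Restic repo, tunnel and hub binding stay untouched: resetting
			// them would orphan the backup chain or break DNS continuity.
		}
	default:
		panic("unknown mode")
	}
}

func main() {
	fmt.Printf("provision:       %+v\n", planIdentity(Provision, false))
	fmt.Printf("total loss DR:   %+v\n", planIdentity(GuestLossDR, false))
	fmt.Printf("partial loss DR: %+v\n", planIdentity(GuestLossDR, true))
}
```

Keeping the decision separate from the Proxmox calls makes the correctness boundary testable without a host.
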
**Guest loss (slice 7).** Agent restores G from the fastest surviving tier (snapshot → local →
PBS) and applies the **DR identity policy** above so the restored guest rejoins cleanly. The
customer backup already contains the controller + data, so there is **no controller deploy** in
this path — bring up + reattach external storage and it is whole. This is fully in slice 7.
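
The "fastest surviving tier" choice reduces to an ordered lookup. A small sketch, assuming a hypothetical `Tier` type and an availability map supplied by the caller (how the agent actually probes each tier is not covered here):

```go
package main

import (
	"errors"
	"fmt"
)

// Tier names a restore source, fastest first.
type Tier string

const (
	TierSnapshot Tier = "snapshot"
	TierLocal    Tier = "local"
	TierPBS      Tier = "pbs"
)

// restoreOrder is the preference order from the text: snapshot, local, PBS.
var restoreOrder = []Tier{TierSnapshot, TierLocal, TierPBS}

var errNoTier = errors.New("no surviving backup tier")

// fastestSurviving returns the first tier in restoreOrder marked available.
func fastestSurviving(available map[Tier]bool) (Tier, error) {
	for _, t := range restoreOrder {
		if available[t] {
			return t, nil
		}
	}
	return "", errNoTier
}

func main() {
	// The snapshot died with the guest; local and PBS copies survive.
	tier, err := fastestSurviving(map[Tier]bool{TierLocal: true, TierPBS: true})
	fmt.Println(tier, err)
}
```
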
### Slice mapping (what is built where — keep this current)
| Capability | Slice | Status |
|---|---|---|
| Golden base image build (root@pam, at enrollment) | **7** | **recipe implemented** (`felhom-agent/configs/build-golden.sh`, incl. the F3 host-key unit; **now also bakes the controller image + a controller-bootstrap unit**, slice 8A); golden archived at enrollment |
| Unified bring-up **front half** (restore→reset identity→size→attach storage), journaled + compensating rollback | **7** | **implemented** (agent v0.8.0, `internal/reconcile/bringup.go`) |
| **Guest-loss DR** (front half + DR identity policy; no controller deploy) | **7** | **implemented** (v0.8.0, `dr_guest_loss` mode — continuity identity preserved) |
| PBS recovery-code escrow **creation** + **hub opaque storage** (§8a) | **7** | **implemented** (agent v0.9.0 `internal/escrow`; hub v0.8.0 `PUT /hosts/{id}/escrow`) |
| **Local API** server (§6) + provisioning **back half** — deploy controller, hand bootstrap config, mint per-guest local token | **8A** | **implemented** (agent v0.10.0 `internal/localapi` + `internal/provision`; controller v0.35.0 `internal/bootstrap` + `internal/agentapi`). The controller image is **baked into the golden** (no registry cred in any guest); the back-half mints the token, writes a 0600 `bootstrap.json` to a `chown 100000:100000` config mount, and `pct set`-attaches it read-only; the golden's baked unit deploys the controller, which ingests the bootstrap, comes up configured, and reaches the agent over the bridge (leaf-pin + token). Validated live end-to-end on the demo. |
| **Quiesced app-consistent backup** (`/backup/due`-driven stack-stop) | **8B** | **implemented** (agent v0.11.0 `/backup/due` cadence + `/backup/status` phases; controller v0.36.0 `internal/quiesce` — stop stacks → backup → restart, with crash-safety marker/guaranteed-unquiesce/max-bound/crash-recovery). Validated live incl. the postgres clean-vs-crash-recovery restore contrast. **8B.2 downtime optimization (resume at `snapshotted`) implemented** (agent v0.13.0 + controller v0.38.0 — §8). |
| **Controller de-privileging** (retire the disk-execution subsystem; new customer disk endpoints behind the slice-4 data-bearing classifier) | **8C** | **implemented — slice 8 CLOSED** (agent v0.12.0: `/disks` endpoints + the data-bearing classifier gate + `mkfs`; controller v0.37.0: ~12.3k LOC of disk-execution retired — storage/restic/cross-drive/migrate/watchdog/scanner/infra-backup — `backup.Manager` split to app-data only, disk mgmt rewired to the agent, container de-privileged). The data-bearing format refusal (§6) is the security centerpiece. |
| **Host metrics to the controller** (`GET /host/metrics` — the customer host-health view) | **9** | **implemented** (agent v0.14.0: `GET /host/metrics` reuses the slice-4 collector + a new CPU/chassis-temp collector `internal/hub/cputemp.go`, graceful-null; the shared `HostMetrics` gains `cpu_temp_c` so the hub report carries it too — cross-repo golden updated; controller v0.39.0: agentapi `HostMetrics()` + a thin `/api/host-metrics` proxy + the monitoring page's host-health card). **Host-wide, token-authed, fresh** (not the 15-min hub snapshot). **Assumption: one customer per host** (the home-server model) — host-wide CPU/mem would leak cross-customer load on a multi-customer host; revisit then. Out of scope: multi-tenant metric filtering; historical/time-series storage (this is a live snapshot). |
| **Hub desired-state serving** (the "Down" channel) — store + serve per-host desired-state, bump `desired_generation`, signed-jobs queue + `has_signed_ops`; agent activates the envelope + a hub-backed provider (benign reconciled, destructive gated pending) | **10A** | **implemented** (hub v0.9.0: `PUT /admin/hosts/{id}/desired-state` bumps the generation, `GET /hosts/{id}/desired-state` + `/jobs` self-scoped, `signed_jobs` queue; agent v0.15.0: `ControlEnvelope` fields live, `Client.FetchDesiredState`, `internal/desired` Syncer + `reconcile.CachingProvider` feeding the engine — an explicit guest `decommission` is the destructive delta, gated `pending_signature`). Serves to already-authenticated hosts only; desired-state stored opaquely (agent owns the schema). Cross-repo golden (envelope + desired-state) byte-identical. |
| **Signed-op execution** (verify + run the gated destructive op) | **10B** | **implemented** (agent v0.16.0: `cmd/felhom-opsign` offline signing CLI + `internal/signedjobs` runner/WipeExecutor + `internal/storage` durable-device resolution; hub v0.10.0: `DELETE /hosts/{id}/jobs/{job_id}` completion). Verify → durable nonce-burn → execute → clear; pinned-key (multi-key rotation, trusted path), host + **durable-id** anti-retarget, 8C re-inspect. Closes the 8C data-bearing-wipe gap. Other destructive executors (guest_destroy, decommission, restore-overwrite → 10D) reuse the same gate+runner machinery. |
| **PBS escrow consumption** (recover `K` on a new box) | **10C** | **implemented** (agent v0.17.0: `escrow.Consume` = Unwrap → fingerprint-gate → atomic install; spike-proven crypto + real-data restore productionized; `--selftest=escrow-consume`). Zero-knowledge holds (hub serves all but R). Spike findings: `documentation/tests/slice10-escrow-consumption-spike-findings.md`. The four inputs are sourced from the hub directive in 10D. |
| **Host/hardware loss** DR — re-enroll in "restore mode"; hub serves identity / tunnel token / restore directive; consume + restore + re-establish under identity; **operator-side** cred rotation | **10D** | **implemented — SLICE 10 CLOSED** (agent v0.18.0: identity escrow via `age` + `Consume`/identity-consume + restore-mode orchestration; hub v0.11.0: recovery-mode toggle + auto-expiry + re-enroll credential rotation + directive serving). Locked rotation model: **hub holds no Cloudflare write-power**; the operator deletes the stale connector + rotates the tunnel/PBS token from a trusted environment. Both 10D mechanisms spike-validated. Deferred (non-blocking): the DR web-UI page + a small operator rotation CLI. |
| Golden base refresh cadence + fleet versioning | post-launch | operational, non-blocking (§13) |
**Host/hardware loss (design intent — slice 10).** Re-enroll the new host in **restore mode**;
the hub — the durable source of truth that survives box death — hands the new agent the existing
identity, PBS namespace, tunnel token, storage manifest, a restore directive, and the **escrow
blob** (§8a) for the customer to unlock with their recovery code. Tunnel is reused from the hub
record, so DNS stays intact. This depended on hub desired-state serving (slice 10A) and was not
buildable until then; it is now implemented as slice 10D (see the slice table above), building on the
front half from slice 7.
## 10. Concurrency, crash-safety, idempotency
- **Per-guest serialization.** Reconcile, one-shot jobs, and local-API calls all feed a
work queue that serializes mutations **per guest** (Proxmox dislikes concurrent conflicting
ops on the same guest). Independent guests proceed in parallel.
- **Operation journaling.** Multi-step async ops (provision, restore, controller-update, agent
self-update) are journaled with their in-flight Proxmox task ids. On agent restart, the
journal is replayed: resume-or-rollback, so a crash mid-restore never leaves a corrupt or
half-built guest.
- **Idempotency keys** on one-shot jobs (run-once across retries and restarts).
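
A minimal sketch of per-guest serialization combined with idempotency keys. Assumptions: the `Queue` type and its methods are hypothetical, the completed-key set is in memory (the journal would persist it across restarts), and each idempotency key targets a single guest.

```go
package main

import (
	"fmt"
	"sync"
)

// Queue serializes mutations per guest and runs each idempotency key once.
type Queue struct {
	mu    sync.Mutex
	locks map[int]*sync.Mutex // one lock per guest id
	done  map[string]bool     // completed idempotency keys (in-memory in this sketch)
}

func NewQueue() *Queue {
	return &Queue{locks: map[int]*sync.Mutex{}, done: map[string]bool{}}
}

// guestLock returns the lock for a guest, creating it on first use.
func (q *Queue) guestLock(vmid int) *sync.Mutex {
	q.mu.Lock()
	defer q.mu.Unlock()
	l, ok := q.locks[vmid]
	if !ok {
		l = &sync.Mutex{}
		q.locks[vmid] = l
	}
	return l
}

// Do runs op while holding the guest's lock. A non-empty key that already
// completed is skipped; ran reports whether op was actually invoked.
func (q *Queue) Do(vmid int, key string, op func() error) (ran bool, err error) {
	l := q.guestLock(vmid)
	l.Lock()
	defer l.Unlock()

	if key != "" {
		q.mu.Lock()
		seen := q.done[key]
		q.mu.Unlock()
		if seen {
			return false, nil
		}
	}
	if err := op(); err != nil {
		return true, err // not marked done, so a retry may run it again
	}
	if key != "" {
		q.mu.Lock()
		q.done[key] = true
		q.mu.Unlock()
	}
	return true, nil
}

func main() {
	q := NewQueue()
	var wg sync.WaitGroup
	for _, vmid := range []int{101, 102} { // independent guests run in parallel
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			ran, err := q.Do(id, fmt.Sprintf("resize-%d", id), func() error { return nil })
			fmt.Println(id, ran, err)
		}(vmid)
	}
	wg.Wait()
	ran, _ := q.Do(101, "resize-101", func() error { return nil })
	fmt.Println("retry ran:", ran)
}
```
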
## 11. Self-update
- **Agent (the hard case — a host service, no snapshot-rollback).** **A/B layout:** download →
verify signature → stage as the inactive slot → flip a `current → good|new` symlink → restart.
**Revert authority lives outside the swapped binary**: `Restart=always` alone just
crash-loops a bad binary — so a **separate health-gate** (a systemd oneshot `ExecStartPost`
probe, or a tiny supervisor unit) flips `current` back to last-good and restarts on a failed
health window. The new version is **committed as "good" only after a clean health window**.
Triggered by a hub signed job within the update window; manual always allowed. Journaled (§10).
- **Controller (the easy case — it's a guest).** The agent owns the controller's lifecycle,
so the **agent updates the controller**: snapshot-before-update (free rollback, because the
controller *is* a snapshottable guest) → pull new image → redeploy → health-check → rollback
on failure. This resolves the Part-2 `selfupdate/` open: the controller is **agent-managed**,
not self-updating; the controller's old self-update path is removed.
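
The A/B flip and the external health gate can be sketched as follows. This assumes a Linux filesystem (a symlink is replaced atomically by renaming a temporary link over it); `flip`, `healthGate`, and the slot names are hypothetical, and signature verification and the restart itself are omitted.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// flip atomically points dir/current at slot by renaming a temp symlink over it.
func flip(dir, slot string) error {
	tmp := filepath.Join(dir, "current.tmp")
	_ = os.Remove(tmp) // ignore "not exist"
	if err := os.Symlink(slot, tmp); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(dir, "current"))
}

// healthGate is meant to run OUTSIDE the swapped binary. If the probe fails
// it flips back to lastGood; otherwise the candidate is committed as good.
// It returns the slot that is active afterwards.
func healthGate(dir, candidate, lastGood string, healthy func() bool) (string, error) {
	if healthy() {
		return candidate, nil
	}
	if err := flip(dir, lastGood); err != nil {
		return "", err
	}
	return lastGood, nil
}

// demo stages slot-b over slot-a, fails the health probe, and reports where
// the gate left the symlink.
func demo() (active, target string, err error) {
	dir, err := os.MkdirTemp("", "ab-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	for _, slot := range []string{"slot-a", "slot-b"} { // a = last-good, b = new
		if err := flip(dir, slot); err != nil {
			return "", "", err
		}
	}
	active, err = healthGate(dir, "slot-b", "slot-a", func() bool { return false })
	if err != nil {
		return "", "", err
	}
	target, err = os.Readlink(filepath.Join(dir, "current"))
	return active, target, err
}

func main() {
	fmt.Println(demo())
}
```

The point of the separation is that a binary too broken to start can never be responsible for reverting itself.
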
## 12. Secrets at rest on the host
The agent holds, root-only on the host fs: the scoped Proxmox token, the hub API key, the
operator's **public** verify key (for §4 signatures — public, low-risk), the Cloudflare
tunnel token, encrypted storage creds (NFS/CIFS/PBS), and the **live PBS key**. The privilege
and the secret footprint that left the controller now concentrate here — which is the whole
argument for §3's root-minimization and a small, auditable agent.
## 13. Open items / what this unblocks
Resolved here: tunnel placement (host, agent-managed, own systemd service), the
reconcile-vs-jobs fork (hybrid, gated by reversibility), agent process model, self-update
ownership, the local-API surface (**implemented, slice 8A — §6a**), the storage-manifest schema,
**provision-by-restore**, the **provision/DR slice boundary** (7 front-half + guest-loss DR +
escrow creation; **8A provisioning back-half + local API — implemented**; 8B quiesced backup; 8C
controller de-privileging; 10 host-loss DR + escrow consumption — §9 table), the **PBS
recovery-code escrow design** (§8a), and the **root-vs-API boundary** (Phase 3, B3 — the slice-8A
back-half's host-side `chown`/`pct set` bind-mount is a deliberate, narrow addition OUTSIDE the
API token, in `internal/provision`, not the 3-exception `proxmox.Privileged` fence).
Still open:
- Multi-tenant **resource fairness** on a shared host (per-guest cgroup limits, noisy-neighbor) — deferred to the company-case pass.
- Operator-side **signing tooling** — where the operator signing key lives operationally and how a destructive op gets signed without undue friction (offline key vs. a small signing service; the security floor is "not in the hub").
- Hub-side **desired-state editing UX** and the host-domain report schema details — belong to the hub architecture doc.
- **Golden base image** refresh cadence + fleet versioning — operational, non-blocking (§9).
- **Identity-reset set** (live, link-up) — pinned empirically by the slice-7 bring-up spike; the
scenario-specific policy is settled in §9, the exact field list is the spike's deliverable.
- **Escrow restore-mode serving / consumption** — handing the opaque blob back to a re-enrolling
box and unwrapping `K` with `R` is slice-10 / doc-05 (§8a, §9 host-loss). *Escrow creation + hub
opaque storage are done (slice 7).*
This doc hands the implementation three contracts it was waiting on:
1. the **local-API surface** (§6) → the controller's NEW local-API client, snapshot-before-deploy, and self-restore-test wiring (Part 2);
2. the **storage-manifest schema** (§7) → the `settings.StoragePath` reshape and per-volume hot/bulk placement (Part 2);
3. the **backup contract** (§7/§8) → the destination for the app-data-backup package extracted in the Part-2 refactor.
---
## Changelog — design-review + Phase-3 fold-in (2026-06-08)
### Slice-10D implemented — DR capstone; SLICE 10 CLOSED (2026-06-10)
- The host/hardware-loss DR flow is wired end-to-end, grounded by both 10-series spikes. **Rotation
model (locked): the hub holds no Cloudflare write-power** — it orchestrates recovery (recovery-mode
toggle, directive serving, re-enroll + its OWN agent↔hub credential rotation) and at most read-only
*verifies* connector state; the **destructive tunnel/PBS rotation + stale-connector delete is the
operator's step from a trusted environment** (same spirit as 10B — the operator authorizes/executes
the dangerous op). A compromised hub can only hand out opaque blobs + rotate its own per-host cred.
- **10D.1 identity escrow:** `{tunnel_token, pbs_token}` wrapped under the SAME `R` via `age` (scrypt +
ChaCha20-Poly1305) — a second opaque blob; the K-escrow + 10C `Consume` are untouched. The hub
stores both ciphertext blobs + the **non-secret** directive (pbs repo/ns, expected key fingerprint,
tunnel id). **No usable secret in the hub.**
- **10D.2 recovery mode + re-enroll:** operator-armed **recovery-mode toggle** with bounded
**auto-expiry** gates directive serving + re-enroll. The re-enroll handshake rotates the agent↔hub
credential to the new box's key (**old box's hub access revoked**, hub-internal). Re-enroll auth =
recovery-mode toggle + **R** (zero-knowledge for data *and* identity) + **out-of-band phone
validation** (operator protocol) + auto-expiry + rotation.
- **10D.3 restore mode (agent):** receive directive (10A) → prompt for **R** by hand → `Consume`
(K-escrow → K installed, fingerprint-gated; identity-escrow → tunnel/pbs tokens) → restore guests
from PBS (restore-overwrite gated by **10B** on a non-blank target) → re-establish the tunnel (run
the recovered connector + reconstitute the dashboard-expected origin) → host routes as host X. The
destructive cred rotation is then the operator's step.
- §9 slice table: **10D done → SLICE 10 CLOSED**. Status: implemented (agent v0.18.0; hub v0.11.0).
Deferred (non-blocking): the hub Config DR/Recovery **web UI** (functional via the recovery-mode
admin API today) + a small operator rotation CLI (the rotation is a documented operator procedure).
### Slice-10C implemented — escrow consumption (productionized) (2026-06-10)
- §8a: escrow **consumption** is now a real, tested path (`escrow.Consume`): **Unwrap → fingerprint-
gate → install**. The throwaway 10C spike harness is gone; the spike's findings are baked in (F-C2
install = place the raw `kdf=none` key where the restore reads it; **F-C3** wrong-R fails closed;
**F-C4** fingerprint-gate BEFORE any multi-GB restore; **F-C6** the blob is read-only/retryable, `K`
never mutated). **Zero-knowledge holds end-to-end through consumption**: the hub serves the blob +
expected fingerprint + PBS connection; **R comes from the customer by hand, never the hub** — a hub
compromise alone still cannot decrypt. The four inputs are PARAMETERS (standalone-testable);
`--selftest=escrow-consume` invokes the real path live (R via env, never a flag). Status:
implemented (agent v0.17.0; **no hub change** — 10D wires the hub directive that supplies inputs
1/3/4 and prompts for R).
- §9 slice table: **10C done**. **10D** (DR capstone) is the last piece of slice 10.
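
The Unwrap → fingerprint-gate → install order can be illustrated with injected stand-ins. Loud assumptions: `consume`, `toyUnwrap`, and the SHA-256 hex fingerprint are placeholders for this sketch only; the real path uses the PBS-native key wrap and its own fingerprint format, neither of which is reproduced here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

var (
	errWrongCode   = errors.New("unwrap failed")
	errFingerprint = errors.New("fingerprint mismatch")
)

// unwrapFunc recovers the key from the opaque blob using recovery code r.
type unwrapFunc func(blob []byte, r string) ([]byte, error)

// fingerprint is a stand-in (SHA-256 hex), not the PBS fingerprint format.
func fingerprint(key []byte) string {
	sum := sha256.Sum256(key)
	return hex.EncodeToString(sum[:])
}

// consume enforces the order: unwrap, then gate on the expected fingerprint,
// and only then install. The blob is never modified, so a failed attempt can
// simply be retried.
func consume(blob []byte, r, expectedFP string, unwrap unwrapFunc, install func([]byte) error) error {
	key, err := unwrap(blob, r)
	if err != nil {
		return fmt.Errorf("unwrap: %w", err) // wrong R fails closed
	}
	if fingerprint(key) != expectedFP {
		return errFingerprint // stop before any multi-GB restore starts
	}
	return install(key)
}

// toyUnwrap is NOT cryptography; it only models "wrong code fails".
func toyUnwrap(blob []byte, r string) ([]byte, error) {
	if r != "correct-code" {
		return nil, errWrongCode
	}
	return blob, nil
}

// demoConsume runs consume with the toy unwrap and reports whether install ran.
func demoConsume(r string) (installed bool, err error) {
	key := []byte("demo-key-material")
	err = consume(key, r, fingerprint(key), toyUnwrap, func([]byte) error {
		installed = true
		return nil
	})
	return installed, err
}

func main() {
	fmt.Println(demoConsume("wrong-code"))
	fmt.Println(demoConsume("correct-code"))
}
```
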
### Slice-10B implemented — operator-signed destructive completion (offline key + signing CLI) (2026-06-10)
- §4: the **operator-signed path is LIVE**. gate → pending op (the agent surfaces the bound intent:
op + target + params on **durable** ids) → the operator **signs OFFLINE** (`cmd/felhom-opsign`,
`ssh-keygen -Y sign` — hardware-ready) → uploads to the hub jobs queue → the agent fetches +
**verifies** (pinned-key SSHSIG, namespace, allow-list by key MATERIAL, crypto over raw bytes,
host target, time window, **durable nonce-burn**) → **executes**, journaled. Order: verify → burn
nonce → execute → report. Pinning is via a **trusted path** (provision/agent config, NEVER
hub-alone), **multi-key** for rotation (KeyID selects, role-scoped). **Key floor: the signing key
is not in the hub and not in the agent.** Resource-level anti-retarget: params bind a **durable
device id** (wwn/serial), and execution re-resolves + **re-inspects (8C)** — "wipe device X" wipes
exactly X, never whatever is at `/dev/sdb` now.
- §6: the **8C data-bearing wipe now COMPLETES** via 10B — `POST /disks/format` of a data-bearing
device still refuses `pending_signature`, but now surfaces the bound `storage_wipe` op (durable id
+ host) to sign; the signed job is executed by the agent's signed-jobs runner (re-resolve +
re-inspect → `mkfs`). A path-only, vanished, or no-longer-data-bearing target is refused **even
with a valid signature**.
- §9 slice table: **10B done**. 10C (escrow consumption — spike validated) / 10D (DR capstone,
reuses this gate for restore-overwrite) pending. Status: implemented (agent v0.16.0; hub v0.10.0).
The signed blob is opaque on the jobs wire (no golden change).
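
A sketch of the verify → burn nonce → execute order. Assumptions, stated loudly: Ed25519 from the standard library stands in for the pinned-key SSHSIG verification, the `Job` and `Runner` types and the `|`-joined signed encoding are hypothetical, and the nonce set is in memory where the real one must be durable.

```go
package main

import (
	"crypto/ed25519"
	"errors"
	"fmt"
	"time"
)

// Job is a hypothetical signed destructive op bound to a host and durable id.
type Job struct {
	Host, DurableID, Nonce string
	NotBefore, NotAfter    time.Time
	Sig                    []byte
}

// signedBytes is the exact byte string the signature covers, so host, target,
// nonce and time window are all bound by it.
func (j Job) signedBytes() []byte {
	return []byte(fmt.Sprintf("%s|%s|%s|%d|%d",
		j.Host, j.DurableID, j.Nonce, j.NotBefore.Unix(), j.NotAfter.Unix()))
}

var (
	errSig    = errors.New("no pinned key verifies the signature")
	errHost   = errors.New("job targets another host")
	errWindow = errors.New("outside the signed time window")
	errReplay = errors.New("nonce already burned")
)

// Runner holds the pinned keys (several, to allow rotation) and burned nonces.
type Runner struct {
	host   string
	pinned []ed25519.PublicKey
	burned map[string]bool // in-memory here; must be durable in practice
}

// Run verifies everything first, burns the nonce BEFORE executing, then executes.
func (r *Runner) Run(j Job, now time.Time, exec func(durableID string) error) error {
	verified := false
	for _, k := range r.pinned {
		if ed25519.Verify(k, j.signedBytes(), j.Sig) {
			verified = true
			break
		}
	}
	switch {
	case !verified:
		return errSig
	case j.Host != r.host:
		return errHost
	case now.Before(j.NotBefore) || now.After(j.NotAfter):
		return errWindow
	case r.burned[j.Nonce]:
		return errReplay
	}
	r.burned[j.Nonce] = true // a crash after this point cannot cause a re-run
	return exec(j.DurableID)
}

// demoSigned signs one job, runs it twice, and counts executions.
func demoSigned() (first, replay error, execs int) {
	pub, priv, err := ed25519.GenerateKey(nil)
	if err != nil {
		return err, err, 0
	}
	now := time.Now()
	j := Job{Host: "host-x", DurableID: "wwn-example", Nonce: "n-1",
		NotBefore: now.Add(-time.Minute), NotAfter: now.Add(time.Minute)}
	j.Sig = ed25519.Sign(priv, j.signedBytes())
	r := &Runner{host: "host-x", pinned: []ed25519.PublicKey{pub}, burned: map[string]bool{}}
	exec := func(string) error { execs++; return nil }
	first = r.Run(j, now, exec)
	replay = r.Run(j, now, exec)
	return first, replay, execs
}

func main() {
	fmt.Println(demoSigned())
}
```

The executor receives only the durable id; re-resolving it to a device and re-inspecting that device (8C) would happen inside `exec`.
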
### Slice-10A implemented — hub desired-state serving (the "Down" channel) (2026-06-10)
- §4: the **control loop is live**. The report IS the heartbeat; its response — the **control
envelope** — is the Down channel. The envelope is a cheap change-notification: `desired_generation`
(version) + `has_signed_ops` (flag) + `poll_interval_seconds`. The agent **caches** the desired-state
+ its generation and re-fetches the heavy state (`GET /hosts/{id}/desired-state`, self-scoped) **only
when the generation advances**. The engine reconciles **benign** deltas; an explicit **destructive**
delta (a guest `decommission`) is classified Destructive → the gate refuses it **`pending_signature`**
(no signer in 10A → never executed). **Signed-job execution is 10B**; the `restore_directive` field
is carried in desired-state now but **consumed in 10D**.
- §9 slice table: **10A done** (hub serves desired-state + bumps generation + signed-jobs queue/flag;
agent activates the envelope + a hub-backed `CachingProvider` feeding the engine). 10B/10C/10D pending.
- Wire: the envelope's now-active fields + the `desired-state` response are a cross-repo contract —
`control-envelope.golden.json` + `desired-state.golden.json`, **byte-identical** agent↔hub. Status:
implemented (hub v0.9.0; agent v0.15.0). **Out of 10A (deliberate):** the hub stores/serves
desired-state **opaquely** (the agent owns the schema); signed-op **execution** + verification is 10B;
**restore-mode/re-enroll** consumption (a new box's first directive) is 10D — 10A serves only
already-authenticated hosts.
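
The change-notification pattern can be sketched as below. The three JSON field names come from the text; the `cache` type, its method, and the injected `fetch` callback are hypothetical, and the desired-state body is kept opaque because the agent owns that schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ControlEnvelope mirrors the three envelope fields named in the text.
type ControlEnvelope struct {
	DesiredGeneration   int64 `json:"desired_generation"`
	HasSignedOps        bool  `json:"has_signed_ops"`
	PollIntervalSeconds int   `json:"poll_interval_seconds"`
}

// cache holds the last fetched desired-state and its generation.
type cache struct {
	generation int64
	state      json.RawMessage // opaque in this sketch
	fetches    int
}

// onEnvelope re-fetches the heavy state only when the generation advances.
func (c *cache) onEnvelope(env ControlEnvelope, fetch func() (json.RawMessage, int64, error)) (bool, error) {
	if env.DesiredGeneration <= c.generation {
		return false, nil // cheap path: nothing changed
	}
	state, gen, err := fetch()
	if err != nil {
		return false, err // keep the old cache; retry on the next heartbeat
	}
	c.state, c.generation = state, gen
	c.fetches++
	return true, nil
}

// demoSync feeds three heartbeats carrying the same generation and reports
// how many fetches happened and which generation is cached.
func demoSync() (fetches int, generation int64) {
	c := &cache{}
	fetch := func() (json.RawMessage, int64, error) {
		return json.RawMessage(`{}`), 7, nil
	}
	body := `{"desired_generation":7,"has_signed_ops":false,"poll_interval_seconds":60}`
	for i := 0; i < 3; i++ {
		var env ControlEnvelope
		if err := json.Unmarshal([]byte(body), &env); err != nil {
			continue
		}
		if _, err := c.onEnvelope(env, fetch); err != nil {
			continue
		}
	}
	return c.fetches, c.generation
}

func main() {
	fmt.Println(demoSync())
}
```
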
### Slice-9 implemented — host metrics to the controller (customer host-health view) (2026-06-10)
- §6: added **`GET /host/metrics`** — host-wide health (cpu%/mem/load/uptime/**`cpu_temp_c`**) +
per-storage capacity for the customer's monitoring view. **Reuses the slice-4 collector** (no
duplicate collection); host-wide, **token-authed**, **fresh** (not the 15-min hub snapshot).
- §9 slice table: **defined + marked slice 9** (the roadmap previously jumped 8→10; this fills it).
Noted the **one-customer-per-host** assumption (host-wide CPU/mem would leak cross-customer load on
a multi-customer host) and the out-of-scope items (multi-tenant filtering; time-series history).
- The one new collector is **CPU/chassis temp** (`internal/hub/cputemp.go`, sysfs hwmon/thermal-zone,
**graceful-null**), added to the **shared `HostMetrics`** → the hub report gains `cpu_temp_c` too
(operator freebie) → **cross-repo host-report golden updated** byte-identical. Status: implemented
(agent v0.14.0; controller v0.39.0).
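
A sketch of a graceful-null temperature read. Assumptions: it takes the first readable Linux thermal-zone file matching a glob and treats the value as millidegrees Celsius; choosing which zone or hwmon sensor is actually the CPU is out of scope, and `readCPUTempC` / `parseMilliC` are hypothetical names rather than the contents of `internal/hub/cputemp.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// HostMetrics shows only the field relevant here; a nil pointer encodes as null.
type HostMetrics struct {
	CPUTempC *float64 `json:"cpu_temp_c"`
}

// parseMilliC converts a sysfs value such as "42000\n" to degrees Celsius.
func parseMilliC(raw string) (float64, bool) {
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return 0, false
	}
	return float64(n) / 1000, true
}

// readCPUTempC returns the first readable temperature matching pattern, or
// nil when the host exposes none (VMs and some boards have no sensor).
func readCPUTempC(pattern string) *float64 {
	paths, err := filepath.Glob(pattern)
	if err != nil {
		return nil
	}
	for _, p := range paths {
		raw, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		if c, ok := parseMilliC(string(raw)); ok {
			return &c
		}
	}
	return nil
}

func main() {
	m := HostMetrics{CPUTempC: readCPUTempC("/sys/class/thermal/thermal_zone*/temp")}
	out, _ := json.Marshal(m)
	fmt.Println(string(out)) // cpu_temp_c is null when no sensor is readable
}
```
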
### Slice-8C implemented — controller de-privileged, slice 8 CLOSED (2026-06-10)
- §6: added the **disk-management endpoints** (`/disks`, `/disks/assign|eject|format`) and
**reframed the principle** — a controller may do non-data-destructive storage setup self-serve;
**anything that can lose customer data stays operator-signed (§4)**, with the **classifier
(agent-internal device inspection)** as the enforcer. The 8C invariant: the agent decides
data-bearing-ness by **inspecting the device itself**, never the caller's claim; a data-bearing
format → `ClassStorageWipe` → gate → `pending_signature` (signed completion is slice 10).
- §9 slice table: **8C implemented — slice 8 CLOSED** (agent v0.12.0 `/disks` + classifier gate +
`mkfs`; controller v0.37.0 retired ~12.3k LOC of disk-execution + de-privileged + rewired to the
agent). The controller-side re-platform milestone: the in-guest controller is now Docker-only with
no disk/Proxmox privileges.
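
The 8C invariant (inspect the device, ignore the caller's claim) in sketch form. `ClassStorageWipe` and the `pending_signature` status come from the text; the `Inspection` fields, `ClassBenign`, `handleFormat`, and the injected `inspect`/`mkfs` callbacks are hypothetical.

```go
package main

import "fmt"

// Class is the gate's classification of a requested operation.
type Class string

const (
	ClassBenign      Class = "benign"       // hypothetical name for the non-destructive case
	ClassStorageWipe Class = "storage_wipe" // data-bearing format: needs an operator signature
)

// Inspection is what the agent itself observed on the device.
type Inspection struct {
	HasFilesystem     bool
	HasPartitionTable bool
}

// FormatRequest is what the controller asked for. ClaimedBlank is carried
// only to show that it is never consulted.
type FormatRequest struct {
	DurableID    string
	ClaimedBlank bool
}

// classify decides data-bearing-ness from the inspection alone.
func classify(in Inspection) Class {
	if in.HasFilesystem || in.HasPartitionTable {
		return ClassStorageWipe
	}
	return ClassBenign
}

// handleFormat formats a blank device directly and refuses a data-bearing one
// with pending_signature, regardless of what the caller claimed.
func handleFormat(req FormatRequest, inspect func(string) (Inspection, error), mkfs func(string) error) (string, error) {
	in, err := inspect(req.DurableID)
	if err != nil {
		return "", err
	}
	if classify(in) == ClassStorageWipe {
		return "pending_signature", nil
	}
	if err := mkfs(req.DurableID); err != nil {
		return "", err
	}
	return "formatted", nil
}

func main() {
	dataBearing := func(string) (Inspection, error) { return Inspection{HasFilesystem: true}, nil }
	noop := func(string) error { return nil }
	status, err := handleFormat(FormatRequest{DurableID: "disk-1", ClaimedBlank: true}, dataBearing, noop)
	fmt.Println(status, err)
}
```
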
### Slice-8B implemented: app-consistent backup (quiesce / stack-stop) (2026-06-10)
- §8: the **controller-driven quiesce** (stop app stacks → `POST /backup` → restart) is **implemented**
(controller v0.36.0 `internal/quiesce` + agent v0.11.0 `/backup/due` cadence + `/backup/status`
phases). Documented the **crash-safety** centerpiece (marker-before-stop, guaranteed unquiesce,
max-quiesce bound, startup crash-recovery, single-flight) and the **8B.2** downtime-optimization
fast-follow (snapshot mode + a `snapshotted` phase). Validated live: a **quiesced** postgres restore
comes up clean ("database system was shut down") vs a **crash-consistent** restore doing WAL recovery.
- §9 slice table: **8B → implemented**; 8C (controller de-privileging) still pending.
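
A sketch of the crash-safety ordering: marker before stop, guaranteed unquiesce, a max-quiesce bound, and startup recovery. Assumptions: the `env` type keeps the marker in memory (a real marker must be on disk to survive a crash), single-flight is omitted, and all names are hypothetical rather than those of `internal/quiesce`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// env models the controller side: a crash marker and an event log.
type env struct {
	marker bool
	log    []string
}

func (e *env) stop() error { e.log = append(e.log, "stop"); return nil }
func (e *env) start()      { e.log = append(e.log, "start") }

// quiescedBackup sets the marker BEFORE stopping stacks, bounds the quiesce
// window with a timeout, and always restarts stacks on the way out.
func quiescedBackup(e *env, maxQuiesce time.Duration, backup func(context.Context) error) error {
	e.marker = true
	e.log = append(e.log, "marker-set")
	defer func() { // guaranteed unquiesce, even on error or panic
		e.start()
		e.marker = false
		e.log = append(e.log, "marker-cleared")
	}()
	if err := e.stop(); err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), maxQuiesce)
	defer cancel()
	return backup(ctx)
}

// recoverOnStartup restarts stacks if a previous run died while quiesced.
func recoverOnStartup(e *env) bool {
	if !e.marker {
		return false
	}
	e.start()
	e.marker = false
	return true
}

// demoQuiesce runs a backup that never finishes on its own, so the
// max-quiesce bound is what ends it.
func demoQuiesce() ([]string, error) {
	e := &env{}
	err := quiescedBackup(e, 10*time.Millisecond, func(ctx context.Context) error {
		<-ctx.Done()
		return ctx.Err()
	})
	return e.log, err
}

func main() {
	fmt.Println(demoQuiesce())
	crashed := &env{marker: true} // simulate a marker left behind by a crash
	fmt.Println("recovered:", recoverOnStartup(crashed), crashed.log)
}
```
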
### Slice-8A implemented: local API + provisioning back-half (2026-06-10)
- NEW §6a: the **local-API implementation** (agent v0.10.0 `internal/localapi`; controller v0.35.0
`internal/bootstrap` + `internal/agentapi`) — persisted self-signed leaf with a **stable
leaf-SHA-256 pin**, the **token→guest self-scoping** (explicit cross-guest id → 403, op never
issued), the stable **`bootstrap.json` contract + controller ingestion `(c)`** (seed
`controller.yaml`, skip setup; idempotent + fail-safe), and the **baked-controller deploy** (no
registry credential in any guest). Firewall narrowing = defense-in-depth; the token stays the gate.
- §9: the provisioning **back half** row is now **slice 8A — implemented** (split from the old "8");
`build-golden.sh` now **bakes the controller + a bootstrap unit**; quiesced backup → 8B, controller
de-privileging → 8C. The host-side `chown`/`pct set` bind-mount is a deliberate narrow surface in
`internal/provision` (NOT the 3-exception `proxmox.Privileged` fence). Validated live end-to-end.
- §13 updated accordingly.
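
The token→guest self-scoping rule can be sketched as HTTP middleware. Assumptions: the bearer-token header, the `guest` query parameter, and the token table are hypothetical wire details chosen for illustration; only the behavior (an explicit cross-guest id yields 403 and the op is never issued) is taken from the text.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// scoped resolves the caller's token to its own guest id and rejects any
// request that names a different guest before the operation is issued.
func scoped(tokens map[string]string, next func(w http.ResponseWriter, guest string)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tok := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		guest, ok := tokens[tok]
		if !ok {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		if want := r.URL.Query().Get("guest"); want != "" && want != guest {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next(w, guest) // the guest id always comes from the token
	})
}

// statusFor issues one request against a single-token table and returns the
// HTTP status code.
func statusFor(token, guestParam string) int {
	h := scoped(map[string]string{"tok-101": "101"}, func(w http.ResponseWriter, guest string) {
		fmt.Fprintf(w, "backup queued for %s", guest)
	})
	target := "/backup"
	if guestParam != "" {
		target += "?guest=" + guestParam
	}
	req := httptest.NewRequest(http.MethodPost, target, nil)
	req.Header.Set("Authorization", "Bearer "+token)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("tok-101", ""))    // own guest, implicit
	fmt.Println(statusFor("tok-101", "102")) // explicit cross-guest id
	fmt.Println(statusFor("stolen", ""))     // unknown token
}
```
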
### Slice-7 scope + escrow design (2026-06-09)
- §9 rewritten: the bring-up primitive is a **shared front half only** — identity-reset policy is
**scenario-specific** (provision = fresh everything; guest-loss DR = preserve restic/tunnel/hub
continuity identity, reset only collision-prone host-local identity). Added the **slice 7/8/10
mapping table** (front half + guest-loss DR + escrow creation in 7; provisioning back-half in 8;
host-loss DR + escrow consumption in 10).
- NEW §8a: **PBS recovery-code escrow** — live key unencrypted on box for unattended ops; agent
generates recovery code `R`; PBS-native passphrase-wrap of `K` under `R` escrowed to the hub
(zero-knowledge); consumption is slice 10. Irreducible-residual + rotation≠key-rotation stated.
- §13 updated accordingly.
- **NEW provision-by-restore** (§9): the agent provisions by **restoring a golden base image**
(token-covered, preserves `keyctl`), never `pct create` on the per-customer path; one unified
restore primitive shared with DR. §2 responsibility + §3 boundary updated.
- **B3** (§2/§3): replaced "Phase-1 minimal role" with the validated **`FelhomAgent`** operator
role; root-vs-API boundary **settled** (root only for golden-image build, host mounts, SMART).
- **B1** (§4): reversibility gate rewritten as **provenance + data-bearing** (scratch tag is
agent-internal, never hub-supplied; crashed-controller heal is non-destructive in-place).
- **B2** (§7/§8): validated bulk-as-`backup=0`-mountpoint recipe + the **bulk-DR consequence**
(excluded bulk needs its own backup decision).
- **S1** (§6/§8): `GET /backup/due` added; controller-driven quiescing; agent vzdump is
crash-consistent only. **S2** (§10/§11): A/B self-update with external revert authority;
controller-update + agent self-update journaled. **S3** (§7): `StoragePath` field re-homing.
**S4:** geo non-responsibility added (§2). **M2** (§7): manifest "absorbs + adds durable_id".
**§6:** rollback is self-scoped/bounded. **§13:** golden-image refresh cadence added as open.