4a81a96678
Doc-only spike (no hub code change). Validated on demo-felhom (guest 8200, torn down): (1) guest->host HTTPS over vmbr0 with fingerprint-pin + bearer + self-scoping (200/401/403, wrong-pin TLS fail, no firewall rule needed); (2) config-mount + golden-baked bootstrap unit deploys+runs the controller (docker login/pull/run v0.34.0) with no pct exec. Verdict: GO to 8A spec. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Slice 8 Phase A (agent↔controller channel + controller deploy plumbing): Findings

**Host:** `demo-felhom` (192.168.0.162), Proxmox VE 9.2.2 on Debian 13 (Trixie). Bridge `vmbr0`,
LAN DHCP (router 192.168.0.1). The host's **`vmbr0` IP is 192.168.0.162**: its LAN address, and
the address at which the guest reaches the agent.

**Date:** 2026-06-10. **Driver:** SPIKE-RUNBOOK (root@pam for the throwaway stub and the guest
plumbing; the real bring-up job, `felhom-agent v0.9.0`, to provision the spike guest).

**VMID:** spike guest `8200` (torn down). Fixed port **8443**.

> This document presents **data, observations, and design consequences**. It de-risks and feeds the
> **8A spec** (the real local-API server + the 7 §6 endpoints) and the **provisioning back-half**
> (deploy + per-guest token mint + bootstrap). The test local-API tokens and the registry pull
> credential are **secrets**: they are referenced by location and **redacted** here.

---

## 0. Setup / provenance

| Component | Value |
|---|---|
| Host `vmbr0` IP:port | `192.168.0.162:8443` (nothing else bound there pre-spike) |
| Controller image | `gitea.dooplex.hu/admin/felhom-controller:v0.34.0` (the registry holds 44 tags; the latest is v0.34.0) |
| Registry pull credential | Gitea token in k8s `secret/gitea-creds` (user `admin`), used **by reference** (never echoed or committed) |
| Spike guest 8200 | provisioned by the **real bring-up job** from golden `local:backup/vzdump-lxc-9100-2026_06_09-21_32_58.tar.zst` (`-mode provision -keep`) |
| Guest 8200 facts | DHCP IP `192.168.0.145`, fresh MAC `BC:24:11:59:F2:DD`, `features: nesting=1,keyctl=1`, **Docker 29.5.3 active** |

The bring-up job was confirmed reusable as the spike's guest factory: `Pass:true`,
`Verified:"boot+running"`, 8 s, fresh MAC. The unmodified slice-7 primitive delivered a golden,
link-up, Docker-ready guest.

---

## 1. The channel (guest → host HTTPS over the bridge, fingerprint-pinned): **PASS**

A throwaway HTTPS stub listened on `192.168.0.162:8443` (self-signed certificate; serves
`GET /storage`; never logs the `Authorization` header). Two tokens were issued: one scoped to guest
8200 and one scoped to a *different* guest.

| Cert handle | Value (public, not secret) |
|---|---|
| Leaf-cert SHA-256 | `CC:7B:03:DC:0F:FA:AC:94:C8:79:35:50:03:3F:FC:CF:CB:2B:49:AE:A7:8A:7D:7C:C7:49:80:9E:3D:EB:92:BC` |
| SPKI public-key SHA-256 (curl `--pinnedpubkey sha256//`) | `uSSmg6cuEJj9CF7hiBdQ5OEJKOs0NszXJXjRNBwq8DM=` |

From **inside guest 8200** (`curl -k --pinnedpubkey sha256//<spki>`; `-k` skips CA validation of the
self-signed certificate while curl still enforces the pin; the token is read from a file, so its
value never appears on the command line):

| # | Case | Expected | Result |
|---|---|---|---|
| T1 | correct pin + guest-8200 token | 200 | **HTTP 200** (`{"storage":"ok","guest":8200}`) |
| T2 | correct pin + **no** token | 401 | **HTTP 401** |
| T3 | correct pin + **other-guest** token | 403 | **HTTP 403** (self-scoping holds) |
| T4 | **wrong** pin + valid token | TLS failure | **HTTP 000, curl exit 90** (`CURLE_SSL_PINNEDPUBKEYNOTMATCH`): the pin check aborts the TLS handshake, so no request is ever sent |

**Reachability / firewall:** **no rule needed.** The PVE firewall is **off** by default on this demo
host: there is no `cluster.fw`, `host.fw`, or `8200.fw`; the host's `iptables` INPUT policy is ACCEPT
and the `nft` ruleset is empty. Guest and host share the `vmbr0` L2 segment (192.168.0.0/24), so the
guest's route to the host is direct (`192.168.0.162 dev eth0 src 192.168.0.145`).

**Security observation (design consequence):** the local-API binds the host's **LAN IP**, so it is
reachable by *anything on the LAN*, not only by guests on the bridge; network isolation does **not**
gate it. Bearer + self-scoping are the *only* server-side access gate (the pin protects the guest
from a spoofed endpoint, not the endpoint from LAN clients), and at the plumbing level all three
held. The back-half should still consider narrowing exposure as defence-in-depth: binding to the
bridge subnet and/or a PVE firewall rule that ACCEPTs only the guest subnet and DROPs everything else.

**Pin form:** curl validated the **SPKI** pin (`--pinnedpubkey`), whereas the agent's existing
convention is **leaf-cert SHA-256** pinning. Both fingerprints are captured above; **8A must pick
one** for the Go controller's pin (leaf-cert SHA-256 is the lower-friction match to the agent's
existing PVE-cert pinning).

---

## 2. The deploy plumbing (host-side mount + golden-baked unit, no `pct exec`): **PASS**

This validates the F3 principle end-to-end: the **agent stays host-side** and populates a config
mount, while a **golden-baked oneshot** unit does the guest-side work.

**Config mount (host-side, agent-simulated):** a host directory bind-mounted **read-only** at
`/etc/felhom-bootstrap` (`pct set 8200 -mp0 <hostdir>,mp=/etc/felhom-bootstrap,ro=1`), carrying
`bootstrap.json` = `{ hub_url, host_id, local_api_endpoint, local_api_pin_spki_sha256,
local_api_token, registry{host,username,token}, controller_image }`.

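For the 8A contract discussion, here is a hypothetical Go type for the spike's `bootstrap.json` shape. The JSON keys come from the field list above. The Go names, the all-string types (including `host_id`), the placeholder values, and the strict-decoding policy are assumptions for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Registry holds the pull credential delivered through the mount.
type Registry struct {
	Host     string `json:"host"`
	Username string `json:"username"`
	Token    string `json:"token"` // secret: never log
}

// Bootstrap mirrors the spike's bootstrap.json keys (types assumed).
type Bootstrap struct {
	HubURL           string   `json:"hub_url"`
	HostID           string   `json:"host_id"`
	LocalAPIEndpoint string   `json:"local_api_endpoint"`
	LocalAPIPinSPKI  string   `json:"local_api_pin_spki_sha256"`
	LocalAPIToken    string   `json:"local_api_token"` // secret: never log
	Registry         Registry `json:"registry"`
	ControllerImage  string   `json:"controller_image"`
}

// parseBootstrap decodes strictly: unknown keys and empty required fields are
// errors, so contract drift (cf. gotcha 3) fails loudly.
func parseBootstrap(data []byte) (*Bootstrap, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var b Bootstrap
	if err := dec.Decode(&b); err != nil {
		return nil, fmt.Errorf("bootstrap.json: %w", err)
	}
	required := []struct{ name, val string }{
		{"hub_url", b.HubURL},
		{"host_id", b.HostID},
		{"local_api_endpoint", b.LocalAPIEndpoint},
		{"local_api_pin_spki_sha256", b.LocalAPIPinSPKI},
		{"local_api_token", b.LocalAPIToken},
		{"registry.host", b.Registry.Host},
		{"registry.username", b.Registry.Username},
		{"registry.token", b.Registry.Token},
		{"controller_image", b.ControllerImage},
	}
	for _, f := range required {
		if f.val == "" {
			return nil, fmt.Errorf("bootstrap.json: missing %s", f.name)
		}
	}
	return &b, nil
}

func main() {
	// Placeholder values only; the real hub URL, host id and tokens are not used here.
	sample := []byte(`{
  "hub_url": "https://hub.example.invalid",
  "host_id": "<host-id>",
  "local_api_endpoint": "https://192.168.0.162:8443",
  "local_api_pin_spki_sha256": "uSSmg6cuEJj9CF7hiBdQ5OEJKOs0NszXJXjRNBwq8DM=",
  "local_api_token": "<redacted>",
  "registry": {"host": "gitea.dooplex.hu", "username": "admin", "token": "<redacted>"},
  "controller_image": "gitea.dooplex.hu/admin/felhom-controller:v0.34.0"
}`)
	b, err := parseBootstrap(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Log only non-secret fields.
	fmt.Printf("image=%s endpoint=%s\n", b.ControllerImage, b.LocalAPIEndpoint)
}
```
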
- **Hotplugged live:** the bind mount appeared inside the running guest with **no restart**.
- **GOTCHA (unprivileged uid mapping):** host files must be chowned to `100000:100000` (the
  container's mapped root), with mode 0600, so that they appear as `root:root 0600` inside the guest.
  Files owned by host uid 0 fall outside the container's id map and show up as `nobody`, leaving the
  secret config unreadable. The provisioning back-half's mount-populate step **must chown to the
  container's mapped root**. Verified: after the chown, the guest saw `bootstrap.json` as
  `-rw------- root root`.

**Golden-baked bootstrap unit** (`felhom-controller-bootstrap.service`: oneshot, `RemainAfterExit`,
`ConditionPathExists=/etc/felhom-bootstrap/bootstrap.json`, `After=docker.service
network-online.target`) runs `/usr/local/sbin/felhom-controller-bootstrap.sh`, which does
`docker login` (token piped via `--password-stdin`, never echoed) → `docker pull` → `docker run`.

| Step | Result |
|---|---|
| `docker login gitea.dooplex.hu` (admin + pull token, from the mount) | **Login Succeeded** |
| `docker pull …/felhom-controller:v0.34.0` (guest→registry) | **Downloaded** (digest `sha256:463733a1…`); registry creds and guest egress both work |
| Unit fired and finished | `active` (RemainAfterExit); journal clean; **no `pct exec`** used |
| Controller container | **Up (healthy)**, real `v0.34.0` |
| **Tie-to-S1:** an in-guest process reads the bootstrap token from the mount and calls host `/storage` | **HTTP 200** |

**Controller boot (informational):** the container came up in **setup mode** (`[INFO]
felhom-controller v0.34.0 — setup mode`, setup wizard on :8080/:8081) because it looks for
`/opt/docker/felhom-controller/controller.yaml`, whereas the spike mounted `/config/bootstrap.json`.
The container *running and healthy* is the spike's success criterion; **full self-configuration is
an 8A concern** (see gotcha 3).

---

## 3. Gotchas (carry into 8A / the back-half)

1. **Unprivileged-LXC uid mapping for the config mount:** the agent must `chown 100000:100000` (the
   container's mapped root) the files it writes into the mount, or the guest sees them as owned by
   `nobody` and the secret config is inaccessible. (The bind mount itself hotplugs fine, with no
   restart.)
2. **Registry-credential distribution:** the bootstrap currently carries the **shared `admin` pull
   token** into every guest's mount. For production this should be a **narrow, read-only, ideally
   per-guest and short-lived registry token**. The mount is the right delivery channel; the
   credential's *scope* is the issue. Treat this as a back-half decision.
3. **Controller config contract mismatch:** `bootstrap.json` (this spike's shape and path) ≠ the
   controller's expected `controller.yaml` at `/opt/docker/felhom-controller/`. 8A must either (a)
   emit the controller's real config format at the path it reads, or (b) have the bootstrap unit
   translate `bootstrap.json` → `controller.yaml`. Until then the controller boots to *setup mode*.
4. **Pin form:** SPKI (validated by curl) vs leaf-cert SHA-256 (agent convention). 8A picks one for
   the Go controller; both fingerprints are captured in §1.
5. **LAN exposure:** per §1's security observation, the local-API listens on the host's LAN IP and
   is gated by auth only. Consider bridge-bind / firewall narrowing in the back-half.

---

## 4. Verdict: **GO** to spec 8A + the provisioning back-half

Both previously unvalidated foundations are now proven at the plumbing level:

- **Channel (doc §6 transport):** guest→host over `vmbr0` works with **no firewall rule** on this
  demo; **fingerprint pinning gates** the handshake (wrong pin = hard TLS failure); **bearer +
  self-scoping** behave as designed (200 / 401 / 403). → 8A can spec the real local-API server + the
  7 §6 endpoints with confidence in the transport.
- **Deploy:** the **config mount + golden-baked bootstrap unit** cleanly deploys *and* starts the
  controller **without `pct exec`** (the F3 principle holds); `docker login` + `pull` from the guest
  with a Gitea pull token works; the controller runs healthy (in setup mode, per gotcha 3), and an
  in-guest process reaches the host endpoint with its bootstrap token. → The provisioning back-half
  can adopt this mechanism (mount + baked unit + per-guest token mint), addressing gotchas 1–3.

---

## Out of scope (noted, not built here)

- The **real local-API server** + the 7 §6 endpoints, the per-guest token→guest map, and
  self-scoping *enforcement* → **8A spec**.
- The **provisioning back-half** proper (the agent mints the per-guest token and writes the bootstrap
  mount; the controller-bootstrap unit becomes a permanent golden-recipe addition; the config-format
  alignment of gotcha 3) → **8A spec**, informed by this spike.
- **Quiesced app-consistent backup** (stack-stop contract) → **8B**.
- **Controller de-privileging** (retire the disk-*execution* subsystem; bind `GET /storage`; new
  customer disk-management endpoints behind the slice-4 data-bearing classifier) → **8C**.

## Secret handling (held)

The test local-API tokens and the registry pull credential were kept in `0600` files on the host,
referenced by location, and **never** logged or committed; the stub never logged the `Authorization`
header, and `docker login` used `--password-stdin`. No real per-guest token or registry credential
appears in git. Only public cert fingerprints are recorded above.