Doc-only spike (no hub code change). Validated on demo-felhom (guest 8200, torn down): (1) guest->host HTTPS over vmbr0 with fingerprint-pin + bearer + self-scoping (200/401/403, wrong-pin TLS fail, no firewall rule needed); (2) config-mount + golden-baked bootstrap unit deploys+runs the controller (docker login/pull/run v0.34.0) with no pct exec. Verdict: GO to 8A spec. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Slice 8 Phase A — agent↔controller channel + controller deploy plumbing: Findings
Host: demo-felhom (192.168.0.162) — Proxmox VE 9.2.2, Debian 13 (Trixie). Bridge vmbr0,
LAN DHCP (router 192.168.0.1). The host's vmbr0 IP = 192.168.0.162 (its LAN address — the
guest reaches the agent here).
Date: 2026-06-10. Driver: SPIKE-RUNBOOK (root@pam for the throwaway stub + guest plumbing;
the real bring-up job — felhom-agent v0.9.0 — to provision the spike guest).
VMID: spike guest 8200 (torn down). Fixed port 8443.
This document presents data, observations, and design consequences. It de-risks and feeds the 8A spec (the real local-API server + the 7 §6 endpoints) and the provisioning back-half (deploy + per-guest token mint + bootstrap). The test local-API token and the registry pull credential are secrets — referenced by location, redacted here.
0. Setup / provenance
| Component | Value |
|---|---|
| Host vmbr0 IP : port | 192.168.0.162:8443 (nothing else bound there pre-spike) |
| Controller image | gitea.dooplex.hu/admin/felhom-controller:v0.34.0 (registry has 44 tags; latest is v0.34.0) |
| Registry pull cred | Gitea token — k8s secret/gitea-creds (user admin), by reference (never echoed/committed) |
| Spike guest 8200 | provisioned by the real bring-up job from golden local:backup/vzdump-lxc-9100-2026_06_09-21_32_58.tar.zst (-mode provision -keep) |
| Guest 8200 facts | DHCP IP 192.168.0.145, fresh MAC BC:24:11:59:F2:DD, features: nesting=1,keyctl=1, Docker 29.5.3 active |
The bring-up job confirmed re-usable as the spike's guest factory: Pass:true, Verified:"boot+running",
8s, fresh MAC — the slice-7 primitive delivered a golden, link-up, Docker-ready guest unchanged.
1. The channel (guest → host HTTPS over the bridge, fingerprint-pinned) — PASS
Throwaway HTTPS stub on 192.168.0.162:8443 (self-signed; GET /storage; the stub never logs the
Authorization header). Two tokens: one scoped to guest 8200, one scoped to a different guest.
| Cert handle | Value (public; not secret) |
|---|---|
| Leaf-cert SHA-256 | CC:7B:03:DC:0F:FA:AC:94:C8:79:35:50:03:3F:FC:CF:CB:2B:49:AE:A7:8A:7D:7C:C7:49:80:9E:3D:EB:92:BC |
| SPKI pubkey SHA-256 (curl --pinnedpubkey sha256//) | uSSmg6cuEJj9CF7hiBdQ5OEJKOs0NszXJXjRNBwq8DM= |
From inside guest 8200 (curl -k --pinnedpubkey sha256//<spki>, token read from a file — value
never on the command line):
| # | Case | Expected | Result |
|---|---|---|---|
| T1 | correct pin + guest-8200 token | 200 | HTTP 200 ({"storage":"ok","guest":8200}) |
| T2 | correct pin + no token | 401 | HTTP 401 |
| T3 | correct pin + other-guest token | 403 | HTTP 403 (self-scoping holds) |
| T4 | wrong pin + valid token | TLS failure | HTTP 000, curl exit 90 (CURLE_SSL_PINNEDPUBKEYNOTMATCH) — the pin gates the handshake before any request is sent |
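The 200 / 401 / 403 outcomes in T1–T3 can be illustrated with a minimal sketch of the decision the stub has to make. This is not the stub's code: the function name `authorize`, the `token_to_guest` map, and the way the server learns `requested_guest` are all assumptions made for illustration.

```python
import hmac
from typing import Mapping, Optional


def authorize(auth_header: Optional[str], requested_guest: int,
              token_to_guest: Mapping[str, int]) -> int:
    """Return the HTTP status for a request: 200, 401, or 403.

    Hypothetical helper: `token_to_guest` maps each bearer token to the
    single guest it is scoped to; `requested_guest` is the guest whose
    resource is being asked for.
    """
    prefix = "Bearer "
    if not auth_header or not auth_header.startswith(prefix):
        return 401  # T2: no (or malformed) bearer token
    presented = auth_header[len(prefix):].encode()

    owner = None
    for token, guest in token_to_guest.items():
        # Constant-time comparison, so timing does not leak token bytes.
        if hmac.compare_digest(presented, token.encode()):
            owner = guest
    if owner is None:
        return 401  # token is not known at all
    if owner != requested_guest:
        return 403  # T3: valid token, but scoped to a different guest
    return 200      # T1: valid token for this guest


# Placeholder tokens, not the spike's real values.
tokens = {"token-for-8200": 8200, "token-for-other": 8201}
print(authorize("Bearer token-for-8200", 8200, tokens))
print(authorize(None, 8200, tokens))
print(authorize("Bearer token-for-other", 8200, tokens))
```

T4 is deliberately absent from this logic: a wrong pin fails the TLS handshake, so no request ever reaches the authorization step.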
Reachability / firewall: no rule needed. PVE firewall is off by default on this demo
(no cluster.fw / host.fw / 8200.fw; host iptables INPUT policy ACCEPT, nft empty). Guest
and host share the vmbr0 L2 segment (192.168.0.0/24); the guest's route to the host is direct
(192.168.0.162 dev eth0 src 192.168.0.145).
Security observation (design consequence): the local-API binds the host's LAN IP, so it is reachable by anything on the LAN, not just guests on the bridge — network isolation does not gate it. The pin + bearer + self-scoping are the only gate, and at the plumbing level they held airtight. The back-half should still consider narrowing exposure (bind to the bridge subnet and/or a PVE firewall ACCEPT limited to the guest subnet → DROP otherwise) as defence-in-depth.
Pin form: curl validated the SPKI (--pinnedpubkey). The agent's existing convention is
leaf-cert SHA-256 pinning. Both fingerprints are captured above; 8A picks one for the Go
controller's pin (leaf-cert SHA-256 is the lower-friction match to the agent's PVE-cert pinning).
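To make the difference between the two pin forms concrete, here is a minimal sketch of how each fingerprint is derived. The function names are hypothetical. The sketch assumes the caller already holds the DER bytes: the leaf certificate can be obtained in Python with `ssl.SSLSocket.getpeercert(binary_form=True)`, while extracting the SubjectPublicKeyInfo needs an ASN.1 parser that the standard library does not provide, so `spki_der` is taken as given.

```python
import base64
import hashlib
import hmac


def leaf_fingerprint(cert_der: bytes) -> str:
    """SHA-256 over the whole DER-encoded leaf certificate,
    rendered as colon-separated upper-case hex (the form in the table above)."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def spki_pin(spki_der: bytes) -> str:
    """SHA-256 over the DER-encoded SubjectPublicKeyInfo only,
    base64-encoded with the sha256// prefix that curl's --pinnedpubkey takes."""
    digest = hashlib.sha256(spki_der).digest()
    return "sha256//" + base64.b64encode(digest).decode("ascii")


def pin_matches(observed: str, expected: str) -> bool:
    """Compare two pins of the same form in constant time."""
    return hmac.compare_digest(observed.encode(), expected.encode())
```

One practical difference follows from the inputs: the leaf-cert fingerprint changes whenever the certificate is reissued, whereas the SPKI pin stays the same across reissues that keep the same key pair.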
2. The deploy plumbing (no pct exec — host-side mount + golden-baked unit) — PASS
Validates the F3 principle end-to-end: the agent stays host-side, populates a config mount; a golden-baked oneshot does the guest-side work.
Config mount (host-side, agent-simulated): a host dir bind-mounted read-only at
/etc/felhom-bootstrap (pct set 8200 -mp0 <hostdir>,mp=/etc/felhom-bootstrap,ro=1) carrying
bootstrap.json = { hub_url, host_id, local_api_endpoint, local_api_pin_spki_sha256, local_api_token, registry{host,username,token}, controller_image }.
- Hotplugged live — the bind mount appeared inside the running guest with no restart.
- GOTCHA (unprivileged uid mapping): host files must be chown 100000:100000 so they appear as root:root 0600 inside the guest (host uid 0 maps to guest nobody, leaving the secret config unreadable otherwise). The provisioning back-half's mount-populate step must chown to the container's mapped root. Verified: after the chown, the guest saw bootstrap.json as -rw------- root root.
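The uid arithmetic behind the chown can be sketched as follows. The base of 100000 comes from the finding above; the range size of 65536 is the usual default for unprivileged containers and is an assumption here, as are the helper names. The real agent may read the mapping from the container config instead.

```python
import os


def host_id(guest_id: int, base: int = 100000, count: int = 65536) -> int:
    """Map a uid/gid as seen inside an unprivileged guest to the host-side id.

    Assumes one contiguous mapping: guest ids 0..count-1 map to
    host ids base..base+count-1.
    """
    if not 0 <= guest_id < count:
        raise ValueError(f"guest id {guest_id} is outside the mapped range")
    return base + guest_id


def make_readable_by_guest_root(path: str) -> None:
    """Hypothetical mount-populate step: owner = the guest's root, mode 0600.

    Must run as root on the host; not called in this sketch.
    """
    os.chown(path, host_id(0), host_id(0))
    os.chmod(path, 0o600)


print(host_id(0))  # 100000: the host uid that the guest sees as root
```

A file left owned by host uid 0 falls outside the mapped range, which is why the guest sees it as owned by nobody.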
Golden-baked bootstrap unit (felhom-controller-bootstrap.service, oneshot, RemainAfterExit,
ConditionPathExists=/etc/felhom-bootstrap/bootstrap.json, After=docker.service network-online.target) → /usr/local/sbin/felhom-controller-bootstrap.sh:
docker login (token piped via --password-stdin, never echoed) → docker pull → docker run.
| Step | Result |
|---|---|
| docker login gitea.dooplex.hu (admin + pull token, from the mount) | Login Succeeded |
| docker pull …/felhom-controller:v0.34.0 (guest→registry) | Downloaded (digest sha256:463733a1…) — registry creds + guest egress both work |
| Unit fired + finished | active (RemainAfterExit); journal clean; no pct exec used |
| Controller container | Up (healthy), real v0.34.0 |
| Tie-to-S1: in-guest process reads the bootstrap token from the mount → host /storage | HTTP 200 |
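The login → pull → run sequence can be sketched as a plan built from bootstrap.json. The spike's actual script is shell and is not reproduced here; the function names are hypothetical, and the real `docker run` arguments (mounts, ports, container name) are not recorded in this document, so the run step below is a bare placeholder.

```python
import json
import subprocess
from typing import List, Optional, Tuple

REQUIRED_KEYS = ("hub_url", "host_id", "local_api_endpoint",
                 "local_api_pin_spki_sha256", "local_api_token",
                 "registry", "controller_image")
Step = Tuple[List[str], Optional[str]]  # (argv, text to feed on stdin)


def load_bootstrap(text: str) -> dict:
    """Parse bootstrap.json and fail early if a documented key is absent."""
    cfg = json.loads(text)
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    registry = cfg.get("registry")
    if not isinstance(registry, dict):
        registry = {}
    missing += [f"registry.{k}" for k in ("host", "username", "token")
                if k not in registry]
    if missing:
        raise ValueError("bootstrap.json is missing: " + ", ".join(missing))
    return cfg


def docker_steps(cfg: dict) -> List[Step]:
    """Build the three commands; the token travels on stdin, never in argv."""
    reg, image = cfg["registry"], cfg["controller_image"]
    return [
        (["docker", "login", "--username", reg["username"],
          "--password-stdin", reg["host"]], reg["token"]),
        (["docker", "pull", image], None),
        # Placeholder run step: the real flags are an 8A concern.
        (["docker", "run", "--detach", image], None),
    ]


def run_steps(steps: List[Step]) -> None:
    """Execute each step in order, stopping at the first failure."""
    for argv, stdin_text in steps:
        subprocess.run(argv, input=stdin_text, text=True, check=True)
```

Keeping the token out of argv mirrors the spike's use of --password-stdin: command lines are visible in process listings, while stdin is not.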
Controller boot (informational): the container came up in setup mode ([INFO] felhom-controller v0.34.0 — setup mode, setup wizard on :8080/:8081) because it looks for
/opt/docker/felhom-controller/controller.yaml and the spike mounted /config/bootstrap.json. The
container running and healthy is the spike's success criterion; full self-configuration is an 8A
concern (see gotcha 3).
3. Gotchas (carry into 8A / the back-half)
- Unprivileged-LXC uid mapping for the config mount: the agent must chown 100000:100000 (the container's mapped root) the files it writes into the mount, or the guest reads them as nobody and the secret config is inaccessible. (The bind mount itself hotplugs fine, no restart.)
- Registry-cred distribution: the bootstrap currently carries the shared admin pull token into every guest's mount. For production this should be a narrow, read-only, ideally per-guest / short-lived registry token (the mount is the right delivery channel; the cred's scope is the issue). Treat as a back-half decision.
- Controller config contract mismatch: bootstrap.json (this spike's shape/path) ≠ the controller's expected controller.yaml at /opt/docker/felhom-controller/. 8A must either (a) emit the controller's real config format at the path it reads, or (b) have the bootstrap unit translate bootstrap.json → controller.yaml. Until then the controller boots to setup mode.
- Pin form: SPKI (validated by curl) vs leaf-cert SHA-256 (agent convention). 8A picks one for the Go controller; both fingerprints captured in §1.
- LAN exposure: §1's security observation applies. The local-API is on the host LAN IP, gated by auth only. Consider bridge-bind / firewall narrowing in the back-half.
4. Verdict — GO to spec 8A + the provisioning back-half
Both unvalidated foundations are proven at the plumbing level:
- Channel (doc §6 transport): guest→host over vmbr0 works with no firewall rule on this demo; fingerprint-pinning gates the handshake (wrong pin = hard TLS failure); bearer + self-scoping behave (200 / 401 / 403). → 8A can spec the real local-API server + the 7 §6 endpoints with confidence in the transport.
- Deploy: the config-mount + golden-baked bootstrap unit cleanly deploys and runs the controller without pct exec (F3 principle holds); docker login + pull from the guest with a Gitea pull token works; the controller runs healthy (in setup mode until gotcha 3 is resolved) and an in-guest process reaches the host endpoint with its bootstrap token. → the provisioning back-half can adopt this mechanism (mount + baked unit + per-guest token mint), addressing gotchas 1–3.
Out of scope (noted, not built here)
- The real local-API server + the 7 §6 endpoints, the per-guest token→guest map and self-scoping enforcement → 8A spec.
- The provisioning back-half proper (agent mints the per-guest token, writes the bootstrap mount, the controller-bootstrap unit as a permanent golden-recipe addition + the config-format alignment of gotcha 3) → 8A spec, informed by this spike.
- Quiesced app-consistent backup (stack-stop contract) → 8B.
- Controller de-privileging (retire the disk-execution subsystem; bind
GET /storage; new customer disk-management endpoints behind the slice-4 data-bearing classifier) → 8C.
Secret handling (held)
The test local-API tokens and the registry pull credential were kept in 0600 files on the host,
referenced by location, never logged or committed; the stub never logged the Authorization
header; docker login used --password-stdin. No real per-guest token or registry cred appears in
git. Only public cert fingerprints are recorded above.