felhom.eu/documentation/tests/slice8a-channel-deploy-spike-findings.md
admin 4a81a96678 slice 8A spike: agent<->controller channel + controller deploy plumbing findings
Doc-only spike (no hub code change). Validated on demo-felhom (guest 8200,
torn down): (1) guest->host HTTPS over vmbr0 with fingerprint-pin + bearer +
self-scoping (200/401/403, wrong-pin TLS fail, no firewall rule needed);
(2) config-mount + golden-baked bootstrap unit deploys+runs the controller
(docker login/pull/run v0.34.0) with no pct exec. Verdict: GO to 8A spec.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 08:57:48 +02:00


Slice 8 Phase A — agent↔controller channel + controller deploy plumbing: Findings

Host: demo-felhom (192.168.0.162), Proxmox VE 9.2.2 on Debian 13 (Trixie). Bridge vmbr0, LAN DHCP (router 192.168.0.1). The host's vmbr0 IP is its LAN address, 192.168.0.162, and that is where the guest reaches the agent. Date: 2026-06-10. Driver: SPIKE-RUNBOOK (root@pam for the throwaway stub and the guest plumbing; the real bring-up job, felhom-agent v0.9.0, to provision the spike guest). VMID: spike guest 8200 (torn down). Fixed port: 8443.

This document presents data, observations, and design consequences. It de-risks and feeds the 8A spec (the real local-API server + the 7 §6 endpoints) and the provisioning back-half (deploy + per-guest token mint + bootstrap). The test local-API tokens and the registry pull credential are secrets: they are referenced by location and redacted here.


0. Setup / provenance

| Component | Value |
|---|---|
| Host vmbr0 IP:port | 192.168.0.162:8443 (nothing else bound there before the spike) |
| Controller image | gitea.dooplex.hu/admin/felhom-controller:v0.34.0 (the registry has 44 tags; the latest is v0.34.0) |
| Registry pull credential | Gitea token in k8s secret/gitea-creds (user admin), by reference only (never echoed or committed) |
| Spike guest | 8200, provisioned by the real bring-up job from golden local:backup/vzdump-lxc-9100-2026_06_09-21_32_58.tar.zst (-mode provision -keep) |
| Guest 8200 facts | DHCP IP 192.168.0.145, fresh MAC BC:24:11:59:F2:DD, features nesting=1,keyctl=1, Docker 29.5.3 active |

The bring-up job proved reusable as the spike's guest factory: Pass:true, Verified:"boot+running", 8s, fresh MAC. The slice-7 primitive delivered a golden, link-up, Docker-ready guest without any changes.


1. The channel (guest → host HTTPS over the bridge, fingerprint-pinned) — PASS

Throwaway HTTPS stub on 192.168.0.162:8443. It used a self-signed certificate, served GET /storage, and never logged the Authorization header. Two tokens were issued: one scoped to guest 8200 and one scoped to a different guest.

| Cert handle | Value (public; not secret) |
|---|---|
| Leaf-cert SHA-256 | CC:7B:03:DC:0F:FA:AC:94:C8:79:35:50:03:3F:FC:CF:CB:2B:49:AE:A7:8A:7D:7C:C7:49:80:9E:3D:EB:92:BC |
| SPKI pubkey SHA-256 (curl --pinnedpubkey sha256//) | uSSmg6cuEJj9CF7hiBdQ5OEJKOs0NszXJXjRNBwq8DM= |
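Both handles can be recomputed from the stub's certificate. Below is a minimal Go sketch that assumes the certificate is available as a PEM file; the program name "pins" and its argument are illustrative.

- The leaf-cert value is the SHA-256 of the whole DER certificate, written as colon-separated uppercase hex. This is the same form that openssl's -fingerprint -sha256 prints.
- The SPKI value is the base64 SHA-256 of the DER SubjectPublicKeyInfo, prefixed with sha256// as curl expects.

```go
// pins: print the two pin forms recorded in §1 for a PEM certificate.
package main

import (
	"crypto/sha256"
	"crypto/x509"
	"encoding/base64"
	"encoding/pem"
	"fmt"
	"os"
	"strings"
)

// leafCertSHA256 hashes the whole DER certificate and renders it as
// colon-separated uppercase hex (the agent's leaf-cert pin form).
func leafCertSHA256(certDER []byte) string {
	sum := sha256.Sum256(certDER)
	parts := make([]string, len(sum))
	for i, b := range sum {
		parts[i] = fmt.Sprintf("%02X", b)
	}
	return strings.Join(parts, ":")
}

// spkiPin hashes the DER SubjectPublicKeyInfo and renders it in the
// "sha256//<base64>" form accepted by curl --pinnedpubkey.
func spkiPin(spkiDER []byte) string {
	sum := sha256.Sum256(spkiDER)
	return "sha256//" + base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: pins <cert.pem>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	block, _ := pem.Decode(data)
	if block == nil || block.Type != "CERTIFICATE" {
		fmt.Fprintln(os.Stderr, "no CERTIFICATE PEM block found")
		os.Exit(1)
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("leaf-cert SHA-256:", leafCertSHA256(cert.Raw))
	fmt.Println("SPKI pin:", spkiPin(cert.RawSubjectPublicKeyInfo))
}
```

The two values change under different conditions. The leaf hash covers the whole certificate, so it changes on every re-issue. The SPKI hash covers only the public key.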

From inside guest 8200 (curl -k --pinnedpubkey sha256//<spki>, token read from a file — value never on the command line):

| # | Case | Expected | Result |
|---|---|---|---|
| T1 | correct pin + guest-8200 token | 200 | HTTP 200 ({"storage":"ok","guest":8200}) |
| T2 | correct pin + no token | 401 | HTTP 401 |
| T3 | correct pin + other-guest token | 403 | HTTP 403 (self-scoping holds) |
| T4 | wrong pin + valid token | TLS failure | HTTP 000, curl exit 90 (CURLE_SSL_PINNEDPUBKEYNOTMATCH); the pin gates the handshake, so no request is ever sent |
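The 200/401/403 split reduces to a single decision function. Below is a minimal Go sketch of such a stub, with these assumptions stated up front:

- The token values and the source-IP→VMID table are hypothetical placeholders.
- Identifying the calling guest by its source IP is only one possible design. The findings do not say how the stub, or the future 8A server, learns the caller's identity.

```go
// stub: a sketch of the spike's GET /storage authorization behaviour.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
	"strings"
)

// Hypothetical scoping tables (placeholders, not the spike's real tokens).
var tokenGuest = map[string]int{ // bearer token -> VMID it is scoped to
	"token-for-8200":  8200,
	"token-for-other": 8201,
}
var ipGuest = map[string]int{ // source IP -> VMID (assumed caller identity)
	"192.168.0.145": 8200,
}

// authorize returns 401 for a missing or unknown bearer token, 403 for a
// valid token scoped to a different guest than the caller, else 200.
func authorize(authHeader, remoteIP string) (status, guest int) {
	if !strings.HasPrefix(authHeader, "Bearer ") {
		return http.StatusUnauthorized, 0
	}
	scoped, known := tokenGuest[strings.TrimPrefix(authHeader, "Bearer ")]
	if !known {
		return http.StatusUnauthorized, 0
	}
	if caller, ok := ipGuest[remoteIP]; !ok || caller != scoped {
		return http.StatusForbidden, 0
	}
	return http.StatusOK, scoped
}

func storage(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		w.WriteHeader(http.StatusMethodNotAllowed)
		return
	}
	ip, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		w.WriteHeader(http.StatusBadRequest)
		return
	}
	// The Authorization header is evaluated but never logged or echoed.
	status, guest := authorize(r.Header.Get("Authorization"), ip)
	if status != http.StatusOK {
		w.WriteHeader(status)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(struct {
		Storage string `json:"storage"`
		Guest   int    `json:"guest"`
	}{"ok", guest})
}

func main() {
	if len(os.Args) != 4 {
		fmt.Fprintln(os.Stderr, "usage: stub <listen-addr> <cert.pem> <key.pem>")
		return
	}
	http.HandleFunc("/storage", storage)
	log.Fatal(http.ListenAndServeTLS(os.Args[1], os.Args[2], os.Args[3], nil))
}
```

T1 to T3 map onto the three return paths of authorize. T4 never reaches the handler, because the client aborts during the handshake.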

Reachability / firewall: no rule needed. PVE firewall is off by default on this demo (no cluster.fw / host.fw / 8200.fw; host iptables INPUT policy ACCEPT, nft empty). Guest and host share the vmbr0 L2 segment (192.168.0.0/24); the guest's route to the host is direct (192.168.0.162 dev eth0 src 192.168.0.145).

Security observation (design consequence): the local-API binds the host's LAN IP, so anything on the LAN can reach it, not just guests on the bridge. Network isolation does not gate it. The pin, the bearer token, and self-scoping are the only gate, and at the plumbing level they held in every tested case (T1 to T4). The back-half should still consider narrowing exposure as defence-in-depth: bind to the bridge subnet and/or add a PVE firewall ACCEPT limited to the guest subnet → DROP otherwise. Note that on this demo the guests share 192.168.0.0/24 with the rest of the LAN, so a subnet-wide rule would not, by itself, keep other LAN hosts out.
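A source-address allowlist in front of the handlers is one way to add that defence-in-depth at the application layer, alongside or instead of a PVE firewall rule. Below is a minimal Go sketch with a hypothetical allowlist. The guests share 192.168.0.0/24 with the rest of this demo LAN, so the example admits only the spike guest's /32 rather than the whole subnet.

```go
// allowlist: reject local-API callers whose source address is not listed.
package main

import (
	"fmt"
	"net/http"
	"net/netip"
)

// sourceAllowed reports whether remoteAddr ("ip:port", the form of
// http.Request.RemoteAddr) falls inside any of the allowed prefixes.
func sourceAllowed(remoteAddr string, allow []string) (bool, error) {
	ap, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return false, err
	}
	addr := ap.Addr().Unmap() // treat ::ffff:a.b.c.d as a.b.c.d
	for _, s := range allow {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			return false, err
		}
		if p.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

// withAllowlist wraps a handler; pin, bearer and self-scoping checks still
// run inside next for callers that pass this source check.
func withAllowlist(allow []string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if ok, err := sourceAllowed(r.RemoteAddr, allow); err != nil || !ok {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	allow := []string{"192.168.0.145/32"} // hypothetical: spike guest only
	for _, a := range []string{"192.168.0.145:51234", "192.168.0.20:40000"} {
		ok, err := sourceAllowed(a, allow)
		fmt.Println(a, ok, err)
	}
	// In a real server the local-API mux would be wrapped with withAllowlist.
	_ = withAllowlist(allow, http.NotFoundHandler())
}
```

Source addresses on a shared LAN can be spoofed by other LAN hosts, so this check supplements the pin and bearer checks and does not replace them.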

Pin form: curl validated the SPKI pin (--pinnedpubkey), while the agent's existing convention is leaf-cert SHA-256 pinning. Both fingerprints are captured above, and 8A picks one for the Go controller's pin. Leaf-cert SHA-256 is the lower-friction match to the agent's PVE-cert pinning. An SPKI pin, by contrast, survives re-issuing the certificate with the same key.
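On the controller side, Go's crypto/tls can reproduce what curl did: skip chain and hostname verification (the stub is self-signed) and accept the connection only if the pin matches. Below is a minimal sketch, assuming the leaf-cert SHA-256 form is chosen. The program name "pinget" and its three arguments are illustrative, and the token is read from a file so it never appears in argv.

```go
// pinget: GET a local-API URL over TLS pinned to a leaf-cert SHA-256.
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"crypto/x509"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// checkLeafPin compares SHA-256(leaf DER) with a hex pin; colons and
// letter case in the pin are ignored.
func checkLeafPin(rawCerts [][]byte, pin string) error {
	if len(rawCerts) == 0 {
		return errors.New("no peer certificate")
	}
	sum := sha256.Sum256(rawCerts[0])
	want := strings.ToLower(strings.ReplaceAll(pin, ":", ""))
	if hex.EncodeToString(sum[:]) != want {
		return errors.New("leaf certificate does not match pin")
	}
	return nil
}

// pinnedClient replaces chain/hostname verification with the pin, like
// curl -k --pinnedpubkey. No ClientSessionCache is configured, so there is
// no session resumption and the callback runs on every handshake.
func pinnedClient(pin string) *http.Client {
	tlsCfg := &tls.Config{
		MinVersion:         tls.VersionTLS12,
		InsecureSkipVerify: true, // acceptable only because the pin is enforced below
		VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
			return checkLeafPin(rawCerts, pin)
		},
	}
	return &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{TLSClientConfig: tlsCfg},
	}
}

func main() {
	if len(os.Args) != 4 {
		fmt.Fprintln(os.Stderr, "usage: pinget <url> <leaf-sha256-pin> <token-file>")
		return
	}
	token, err := os.ReadFile(os.Args[3])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	req, err := http.NewRequest(http.MethodGet, os.Args[1], nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(string(token)))
	resp, err := pinnedClient(os.Args[2]).Do(req)
	if err != nil { // a pin mismatch surfaces here, before any request is sent
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.StatusCode, strings.TrimSpace(string(body)))
}
```

Because the pin check runs inside the handshake, a mismatch aborts before any HTTP request is written. That is the same behaviour T4 showed with curl.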


2. The deploy plumbing (no pct exec — host-side mount + golden-baked unit) — PASS

This validates the F3 principle end to end: the agent stays host-side and populates a config mount, and a golden-baked oneshot unit does the guest-side work.

Config mount (host-side, agent-simulated): a host dir bind-mounted read-only at /etc/felhom-bootstrap (pct set 8200 -mp0 <hostdir>,mp=/etc/felhom-bootstrap,ro=1) carrying bootstrap.json = { hub_url, host_id, local_api_endpoint, local_api_pin_spki_sha256, local_api_token, registry{host,username,token}, controller_image }.
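On the guest side, the mounted file can be decoded strictly, so that a missing or misspelled key fails loudly instead of quietly producing a half-configured controller. Below is a minimal Go sketch. The key names come from the list above. The all-string field types and the "every field required" rule are assumptions, since the spike did not publish a schema.

```go
// bootstrapcfg: strict decoding of the mounted bootstrap.json.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
)

type Registry struct {
	Host     string `json:"host"`
	Username string `json:"username"`
	Token    string `json:"token"`
}

// Bootstrap mirrors the bootstrap.json keys listed in §2 (types assumed).
type Bootstrap struct {
	HubURL                string   `json:"hub_url"`
	HostID                string   `json:"host_id"`
	LocalAPIEndpoint      string   `json:"local_api_endpoint"`
	LocalAPIPinSPKISHA256 string   `json:"local_api_pin_spki_sha256"`
	LocalAPIToken         string   `json:"local_api_token"`
	Registry              Registry `json:"registry"`
	ControllerImage       string   `json:"controller_image"`
}

// parseBootstrap rejects unknown keys and empty required values.
func parseBootstrap(data []byte) (*Bootstrap, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var b Bootstrap
	if err := dec.Decode(&b); err != nil {
		return nil, fmt.Errorf("bootstrap.json: %w", err)
	}
	required := []struct{ key, val string }{
		{"hub_url", b.HubURL},
		{"host_id", b.HostID},
		{"local_api_endpoint", b.LocalAPIEndpoint},
		{"local_api_pin_spki_sha256", b.LocalAPIPinSPKISHA256},
		{"local_api_token", b.LocalAPIToken},
		{"registry.host", b.Registry.Host},
		{"registry.username", b.Registry.Username},
		{"registry.token", b.Registry.Token},
		{"controller_image", b.ControllerImage},
	}
	for _, f := range required {
		if f.val == "" {
			return nil, fmt.Errorf("bootstrap.json: missing %s", f.key)
		}
	}
	return &b, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: bootstrapcfg /etc/felhom-bootstrap/bootstrap.json")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	b, err := parseBootstrap(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Print only non-secret fields; both tokens stay in memory.
	fmt.Println("hub:", b.HubURL, "local API:", b.LocalAPIEndpoint, "image:", b.ControllerImage)
}
```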

  • Hotplugged live — the bind mount appeared inside the running guest with no restart.
  • GOTCHA (unprivileged uid mapping): the host files must be chowned to 100000:100000 (the container's mapped root) so that they appear as root:root inside the guest. Host uid 0 is unmapped and shows up as nobody inside the guest, which would leave the 0600 secret config unreadable. The provisioning back-half's mount-populate step must therefore chown to the container's mapped root. Verified: after the chown, the guest saw bootstrap.json as -rw------- root root.
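The chown target follows from the container's idmap. Below is a minimal Go sketch of the mount-populate step. It assumes the Proxmox default unprivileged mapping "u 0 100000 65536" / "g 0 100000 65536", under which guest id N appears as host id 100000+N. The sketch does two things:

- It computes the host ids for guest root.
- It writes the file as 0600 through a temp file and a rename, so the guest never observes a partially written secret.

It must run as root on the host. hostIDFor and writeSecret are illustrative names.

```go
// mountpopulate: write a secret into the bind-mount source as guest root.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// hostIDFor maps a guest uid/gid to its host id using one LXC idmap entry,
// "u|g <guest-start> <host-start> <count>", e.g. "u 0 100000 65536".
func hostIDFor(guestID int, entry string) (int, error) {
	f := strings.Fields(entry)
	if len(f) != 4 || (f[0] != "u" && f[0] != "g") {
		return 0, fmt.Errorf("malformed idmap entry %q", entry)
	}
	var n [3]int
	for i := range n {
		v, err := strconv.Atoi(f[i+1])
		if err != nil {
			return 0, err
		}
		n[i] = v
	}
	guestStart, hostStart, count := n[0], n[1], n[2]
	if guestID < guestStart || guestID >= guestStart+count {
		return 0, fmt.Errorf("guest id %d is not mapped by %q", guestID, entry)
	}
	return hostStart + guestID - guestStart, nil
}

// writeSecret writes dir/name with mode 0600 owned by uid:gid, using a temp
// file plus rename so readers only ever see a complete file.
func writeSecret(dir, name string, data []byte, uid, gid int) error {
	tmp, err := os.CreateTemp(dir, "."+name+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // returns an ignored error once renamed away
	_, err = tmp.Write(data)
	if err == nil {
		err = tmp.Chmod(0o600) // CreateTemp already uses 0600; kept explicit
	}
	if err == nil {
		err = tmp.Chown(uid, gid) // requires root on the host
	}
	if cerr := tmp.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, name))
}

func main() {
	uid, err := hostIDFor(0, "u 0 100000 65536") // assumed default map
	if err == nil {
		var gid int
		gid, err = hostIDFor(0, "g 0 100000 65536")
		fmt.Printf("guest root:root = host %d:%d\n", uid, gid)
		if err == nil && len(os.Args) == 3 { // <host-mount-dir> <source.json>
			var data []byte
			if data, err = os.ReadFile(os.Args[2]); err == nil {
				err = writeSecret(os.Args[1], "bootstrap.json", data, uid, gid)
			}
		}
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```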

Golden-baked bootstrap unit (felhom-controller-bootstrap.service: oneshot, RemainAfterExit, ConditionPathExists=/etc/felhom-bootstrap/bootstrap.json, After=docker.service network-online.target) → /usr/local/sbin/felhom-controller-bootstrap.sh: docker login (token piped via --password-stdin, never echoed) → docker pull → docker run.
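The script's three steps can be sketched in Go to make the secret handling explicit. The registry token reaches docker login only on stdin and never in argv, where other local processes could read it from /proc. Below is a minimal sketch under these assumptions:

- The container name, the restart policy and the /config/bootstrap.json bind target (the path the spike used) are illustrative.
- A rerun would fail on the existing container name, so the real unit needs an idempotency rule. This sketch does not implement one.

```go
// controller-bootstrap: docker login -> pull -> run from the mounted config.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

const configPath = "/etc/felhom-bootstrap/bootstrap.json"

type registry struct {
	Host     string `json:"host"`
	Username string `json:"username"`
	Token    string `json:"token"`
}

type bootstrap struct {
	Registry        registry `json:"registry"`
	ControllerImage string   `json:"controller_image"`
}

// dockerSteps builds argv for each step; the token is deliberately absent.
func dockerSteps(b bootstrap) [][]string {
	return [][]string{
		{"docker", "login", "--username", b.Registry.Username, "--password-stdin", b.Registry.Host},
		{"docker", "pull", b.ControllerImage},
		{"docker", "run", "-d", "--name", "felhom-controller", "--restart", "unless-stopped",
			"-v", configPath + ":/config/bootstrap.json:ro", b.ControllerImage},
	}
}

func main() {
	data, err := os.ReadFile(configPath)
	if err != nil { // the unit's ConditionPathExists normally prevents this
		fmt.Fprintln(os.Stderr, "no bootstrap config:", err)
		return
	}
	var b bootstrap
	if err := json.Unmarshal(data, &b); err != nil {
		fmt.Fprintln(os.Stderr, "bad bootstrap config:", err)
		os.Exit(1)
	}
	for i, argv := range dockerSteps(b) {
		cmd := exec.Command(argv[0], argv[1:]...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if i == 0 {
			cmd.Stdin = strings.NewReader(b.Registry.Token) // --password-stdin
		}
		if err := cmd.Run(); err != nil {
			fmt.Fprintf(os.Stderr, "docker %s failed: %v\n", argv[1], err)
			os.Exit(1) // a non-zero exit marks the oneshot unit as failed
		}
	}
}
```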

| Step | Result |
|---|---|
| docker login gitea.dooplex.hu (admin + pull token, from the mount) | Login Succeeded |
| docker pull …/felhom-controller:v0.34.0 (guest → registry) | Downloaded (digest sha256:463733a1…); registry creds and guest egress both work |
| Unit fired and finished | active (RemainAfterExit); journal clean; no pct exec used |
| Controller container | Up (healthy), real v0.34.0 |
| Tie-to-S1: in-guest process reads the bootstrap token from the mount → host /storage | HTTP 200 |

Controller boot (informational): the container came up in setup mode ([INFO] felhom-controller v0.34.0 — setup mode, setup wizard on :8080/:8081) because it looks for /opt/docker/felhom-controller/controller.yaml and the spike mounted /config/bootstrap.json. The container running and healthy is the spike's success criterion; full self-configuration is an 8A concern (see gotcha 3).


3. Gotchas (carry into 8A / the back-half)

  1. Unprivileged-LXC uid mapping for the config mount: the agent must chown the files it writes into the mount to 100000:100000 (the container's mapped root). Otherwise the guest sees them as owned by nobody and cannot read the secret config. (The bind mount itself hotplugs fine, with no restart.)
  2. Registry-cred distribution — the bootstrap currently carries the shared admin pull token into every guest's mount. For production this should be a narrow, read-only, ideally per-guest / short-lived registry token (the mount is the right delivery channel; the cred's scope is the issue). Treat as a back-half decision.
  3. Controller config contract mismatch: bootstrap.json (this spike's shape and path) ≠ the controller's expected controller.yaml at /opt/docker/felhom-controller/. 8A must either (a) emit the controller's real config format at the path it reads, or (b) have the bootstrap unit translate bootstrap.json → controller.yaml. Until then the controller boots into setup mode.
  4. Pin form — SPKI (validated by curl) vs leaf-cert SHA-256 (agent convention). 8A picks one for the Go controller; both fingerprints captured in §1.
  5. LAN exposure — §1's security observation: the local-API is on the host LAN IP, gated by auth only. Consider bridge-bind / firewall narrowing in the back-half.
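Option (b) in gotcha 3 amounts to a small translation step that runs in the bootstrap unit before docker run. Below is a minimal Go sketch, with these assumptions stated up front:

- The controller.yaml keys below simply reuse the bootstrap.json names as placeholders, because the controller's real schema is not recorded in these findings. Replace them with its actual keys.
- The sketch emits flat YAML without a YAML library. This works because strconv.Quote output is also a valid YAML double-quoted scalar when the input is valid UTF-8.

```go
// toyaml: translate bootstrap.json into a flat controller.yaml (placeholder keys).
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// kv is one controller.yaml entry; key names are hypothetical placeholders.
type kv struct{ key, val string }

// renderYAML emits one `key: "value"` line per entry.
func renderYAML(entries []kv) string {
	var sb strings.Builder
	for _, e := range entries {
		fmt.Fprintf(&sb, "%s: %s\n", e.key, strconv.Quote(e.val))
	}
	return sb.String()
}

type bootstrap struct {
	HubURL           string `json:"hub_url"`
	HostID           string `json:"host_id"`
	LocalAPIEndpoint string `json:"local_api_endpoint"`
	LocalAPIPin      string `json:"local_api_pin_spki_sha256"`
	LocalAPIToken    string `json:"local_api_token"`
}

// entriesFor maps bootstrap.json fields to placeholder controller.yaml keys.
func entriesFor(b bootstrap) []kv {
	return []kv{
		{"hub_url", b.HubURL},
		{"host_id", b.HostID},
		{"local_api_endpoint", b.LocalAPIEndpoint},
		{"local_api_pin_spki_sha256", b.LocalAPIPin},
		{"local_api_token", b.LocalAPIToken},
	}
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: toyaml <bootstrap.json> <controller.yaml>")
		return
	}
	fail := func(err error) {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fail(err)
	}
	var b bootstrap
	if err := json.Unmarshal(data, &b); err != nil {
		fail(err)
	}
	out := os.Args[2]
	if err := os.MkdirAll(filepath.Dir(out), 0o700); err != nil {
		fail(err)
	}
	// Mode 0600 applies when the file is created; it carries the token.
	if err := os.WriteFile(out, []byte(renderYAML(entriesFor(b))), 0o600); err != nil {
		fail(err)
	}
}
```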

4. Verdict — GO to spec 8A + the provisioning back-half

Both unvalidated foundations are proven at the plumbing level:

  • Channel (doc §6 transport): guest→host over vmbr0 works with no firewall rule on this demo; fingerprint-pinning gates the handshake (wrong pin = hard TLS failure); bearer + self-scoping behave (200 / 401 / 403). → 8A can spec the real local-API server + the 7 §6 endpoints with confidence in the transport.
  • Deploy: the config-mount + golden-baked bootstrap unit cleanly deploys and starts the controller without pct exec (the F3 principle holds; full self-configuration still waits on gotcha 3). docker login + pull from the guest with a Gitea pull token works. The controller runs healthy, and an in-guest process reaches the host endpoint with its bootstrap token. → the provisioning back-half can adopt this mechanism (mount + baked unit + per-guest token mint), addressing gotchas 1-3.

Out of scope (noted, not built here)

  • The real local-API server + the 7 §6 endpoints, the per-guest token→guest map and self-scoping enforcement → 8A spec.
  • The provisioning back-half proper (agent mints the per-guest token, writes the bootstrap mount, the controller-bootstrap unit as a permanent golden-recipe addition + the config-format alignment of gotcha 3) → 8A spec, informed by this spike.
  • Quiesced app-consistent backup (stack-stop contract) → 8B.
  • Controller de-privileging (retire the disk-execution subsystem; bind GET /storage; new customer disk-management endpoints behind the slice-4 data-bearing classifier) → 8C.
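The per-guest token mint listed above for 8A can stay small. Below is a minimal Go sketch. The names, the 256-bit token size, and storing only a SHA-256 digest on the host are illustrative choices, not decisions taken in this spike. In this design the agent:

- generates a random token,
- writes the plaintext only into that guest's bootstrap mount, and
- keeps a digest → VMID map for self-scoping, so a leaked host-side map does not reveal usable tokens.

```go
// tokenmint: mint a per-guest bearer token and keep only its digest.
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"os"
)

// mintToken returns a fresh 256-bit token (base64url, unpadded) and the
// SHA-256 hex digest the host would store instead of the plaintext.
func mintToken() (token, digest string, err error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", "", err
	}
	token = base64.RawURLEncoding.EncodeToString(buf)
	return token, digestOf(token), nil
}

func digestOf(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// tokenGuestMap is a hypothetical host-side store: token digest -> VMID.
type tokenGuestMap map[string]int

// lookup returns the VMID a presented token is scoped to, if any.
func (m tokenGuestMap) lookup(token string) (int, bool) {
	vmid, ok := m[digestOf(token)]
	return vmid, ok
}

func main() {
	token, digest, err := mintToken()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	m := tokenGuestMap{digest: 8200}
	vmid, ok := m.lookup(token)
	// The plaintext token is not printed; it would go into the guest's mount.
	fmt.Println("digest", digest, "scoped to", vmid, ok)
}
```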

Secret handling (held)

The test local-API tokens and the registry pull credential were kept in 0600 files on the host, referenced by location, never logged or committed; the stub never logged the Authorization header; docker login used --password-stdin. No real per-guest token or registry cred appears in git. Only public cert fingerprints are recorded above.