Phase 0 — VM vs LXC Overhead Spike: Findings

Host: demo-felhom (192.168.0.162) — Proxmox VE 9.2.2, Debian 13 (Trixie), kernel 7.0.2-6-pve, 4 vCPU, 16 GB RAM (15771 MB MemTotal). Date: 2026-06-07. Measured one guest at a time, the other fully stopped.

This document presents data and observations only. No recommendation or verdict — the architecture decision is made elsewhere.


1. Provenance

Platform

| Component | Version |
| --- | --- |
| pve-manager | 9.2.2 (b9984c6d90a4bd80) |
| kernel | proxmox-kernel 7.0.2-6-pve |
| pve-qemu-kvm | 11.0.0-3 |
| qemu-server | 9.1.15 |
| pve-container | 6.1.10 |
| lxc-pve / lxcfs | 7.0.0-2 / 7.0.0-pve1 |
| criu | 4.1.1-1 |

pvesh get /version → release 9.2, version 9.2.2.

Guest images

| | LXC (9001) | VM (9000) |
| --- | --- | --- |
| Source | local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst | debian-13-genericcloud-amd64.qcow2 |
| Build | Debian 13.1 standard CT template (downloaded via pveam, checksum verified) | cloud build 20260601-2496; in-guest reports Debian 13.5 after apt update |
| qcow2 | n/a | virtual 3 GiB, on-disk 323 MiB, compat 1.1/zlib |

Docker (identical in both guests)

| | LXC | VM |
| --- | --- | --- |
| Source | Docker official apt repo, trixie channel (confirmed present) | same |
| Version | 29.5.3 build d1c06ef | 29.5.3 build d1c06ef |
| Storage Driver | overlayfs (not vfs) | overlayfs (not vfs) |
| Cgroup Version / Driver | v2 / systemd | v2 / systemd |
| hello-world | OK | OK |

Docker's official repo does have a trixie channel — no fallback to Debian's docker.io was needed. Docker 29 reports the driver as overlayfs (the containerd snapshotter image store) rather than the legacy name overlay2; this is the same overlay technology and is not a vfs fallback.


2. Comparison table

Baseline (both guests stopped): host RAM used median 1702 MB (range 1699–1703); host CPU ~0.1 % used (99.9 % idle). All RAM deltas below are relative to this baseline. Host RAM used = MemTotal − MemAvailable, 5 samples ~3 s apart (median reported).
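The sampling procedure described above could be scripted roughly as follows. This is a minimal sketch, not the harness actually used: the function names (mem_used_mb, median, sample_median) and the optional file argument are assumptions introduced here, and the integer truncation may differ from how the reported figures were rounded.

```shell
#!/usr/bin/env bash
# Host RAM used in MB, defined as MemTotal - MemAvailable.
# /proc/meminfo reports both values in kB. The optional file argument is a
# hypothetical convenience for testing; it defaults to /proc/meminfo.
mem_used_mb() {
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
         END {printf "%d\n", (t - a) / 1024}' "${1:-/proc/meminfo}"
}

# Median of the integers given as arguments (lower-middle value for even counts).
median() {
    printf '%s\n' "$@" | sort -n | awk '{v[NR]=$1} END {print v[int((NR + 1) / 2)]}'
}

# Take n samples, gap seconds apart, and print their median.
sample_median() {
    local n="${1:-5}" gap="${2:-3}" samples=() i
    for ((i = 0; i < n; i++)); do
        samples+=("$(mem_used_mb)")
        if ((i < n - 1)); then sleep "$gap"; fi
    done
    median "${samples[@]}"
}
```

Calling `sample_median 5 3` on the host would print one number comparable to the 1702 MB baseline; `median 1699 1702 1702 1702 1703` reproduces the baseline median from the raw log.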

| Metric | LXC (9001) | VM (9000) | Δ (VM − LXC) |
| --- | --- | --- | --- |
| Idle host-RAM delta | +211 MB (1913) | +2056 MB (3758) | +1845 MB |
| Under-load host-RAM delta | +410 MB (2112) | +2084 MB (3786) | +1674 MB |
| Per-guest mem attribution | cgroup memory.current = 1961 MB¹ | KVM process RSS = 2031 MB (idle) / 2047 MB (load) | |
| Idle host CPU used | ~0.3 % (0.20 usr + 0.10 sys) | ~6.0 % (3.37 usr + 2.31 sys + 0.29 guest) | +5.7 pp |
| Under-load host CPU used | ~39.4 % (17.1 usr + 7.5 sys + 14.5 iowait + 0.3 soft) | ~53.9 % (31.9 guest + 16.4 iowait + 3.4 sys + 1.7 usr + 0.6 soft) | +14.5 pp |
| pgbench throughput | 2211.7 tps, lat 1.809 ms, 132 710 tx/60 s, 0 failed | 1819.6 tps, lat 2.198 ms, 163 764 tx/90 s, 0 failed² | −392 tps |
| Disk allocated | 10 GiB | 10 GiB | 0 |
| Disk used (host thin-LV) | 26.73 % ≈ 2.67 GiB | 29.33 % ≈ 2.94 GiB | +0.27 GiB |
| Disk used (inside guest) | 2.1 GiB / 9.7 GiB | 2.4 GiB / 9.7 GiB | +0.3 GiB |
| Provisioning (rough, create→ready) | ~10–15 s³ | ~60–75 s³ | |

¹ memory.current counts reclaimable page cache shared with the host and therefore overstates the LXC's true incremental cost; the +211 MB host-RAM delta is the honest number.
² VM 60 s runs gave 1739 and 1759 tps, consistent with the 90 s definitive run.
³ Guest-creation step only; see §4. Docker install + first image pull (network-bound, roughly identical for both) is excluded.
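The derived figures in the table can be re-checked from the raw readings with a few lines of arithmetic. The sketch below is illustrative only: the helper names are assumptions introduced here, the inputs are copied from the table and the raw command log, and the latency relation (average latency ≈ clients / tps) is a consistency check on the reported numbers rather than a statement of how pgbench computes them internally.

```shell
#!/usr/bin/env bash
# Baseline host RAM used (MB) with both guests stopped, from the table above.
baseline=1702

# Host-RAM delta in MB relative to the baseline.
delta() { echo $(( $1 - baseline )); }

# Unit conversions used for the per-guest attribution row, rounded to nearest MB.
bytes_to_mb() { awk -v b="$1" 'BEGIN {printf "%.0f\n", b / 1048576}'; }  # cgroup memory.current
kib_to_mb()   { awk -v k="$1" 'BEGIN {printf "%.0f\n", k / 1024}'; }     # process RSS in KiB

# Average latency in ms implied by client count and throughput.
latency_ms() { awk -v c="$1" -v t="$2" 'BEGIN {printf "%.3f\n", c / t * 1000}'; }
```

For example, `delta 3758` prints 2056, `bytes_to_mb 2056036352` prints 1961, `kib_to_mb 2079988` prints 2031, and `latency_ms 4 1819.602345` prints 2.198, matching the VM and LXC rows above.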

Inside-guest free -m (context only — not the decisive number)

| Guest (values in MB) | total | used | buff/cache | available |
| --- | --- | --- | --- | --- |
| LXC idle | 2048 | 125 | 1851 | 1922 |
| VM idle | 1974 | 509 | 1524 | 1464 |

The VM sees 1974 MB usable of 2048 allocated (firmware/kernel reservation).


3. Docker-in-LXC viability

Worked cleanly in an unprivileged LXC with --features nesting=1,keyctl=1. No privileged fallback was needed.

  • --features nesting=1,keyctl=1 --unprivileged 1 accepted by pct create (PVE 9 syntax confirmed via pct help create).
  • docker run hello-world → success.
  • Storage driver: overlayfs (cgroup v2, systemd cgroup driver) — no vfs fallback.
  • Full 3-container stack (postgres:17, redis:7, nginx:alpine) came up healthy.
  • Named volume pgdata persisted a write (SELECT count returned 1 after table create/insert).
  • Multi-container networking + published port worked: curl localhost:8080 → HTTP 200.
  • 60 s pgbench load: 0 failed transactions.

No errors, no dmesg/journalctl anomalies, no workarounds. The privileged-LXC fallback path (step A5) was therefore not exercised.
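The storage-driver check above lends itself to a small guard that could be run in either guest. This is a sketch under assumptions: classify_driver and check_docker_driver are hypothetical helper names, and the only Docker call used is `docker info --format '{{.Driver}}'`.

```shell
#!/usr/bin/env bash
# Classify a Docker storage-driver name.
# "overlayfs" (containerd snapshotter) and "overlay2" (legacy name) are both
# overlay-backed; "vfs" is the copy-based fallback this spike wanted to rule out.
classify_driver() {
    case "$1" in
        overlayfs|overlay2) echo "overlay" ;;
        vfs)                echo "vfs-fallback" ;;
        *)                  echo "other" ;;
    esac
}

# Ask the local daemon for its driver; succeeds only for an overlay-backed driver.
# Returns 2 if docker itself cannot be queried.
check_docker_driver() {
    local driver
    driver="$(docker info --format '{{.Driver}}')" || return 2
    [ "$(classify_driver "$driver")" = "overlay" ]
}
```

Inside the LXC this might be invoked as `check_docker_driver && echo "no vfs fallback"`; the classification is kept separate from the daemon query so it can be exercised without Docker installed.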


4. Observations & confounds

  1. VM under-load CPU required a re-measurement (diagnosed, not hidden). The first VM-load sample showed host CPU ~5 % — identical to idle — while pgbench nonetheless completed a full 60 s run (1739 tps). Root cause: the VM load was launched through a nested SSH + nohup & layer (host→VM), which started pgbench after the sampling window. The LXC path used local pct exec (no nested SSH) so its first sample was valid. Re-running with pgbench held in the foreground of a long-lived SSH channel (guaranteed active) and sampling during a confirmed window gave the true 53.9 % (%guest=31.9). Confound: the two guests' load was driven through different plumbing (pct exec vs nested SSH); the throughput numbers are unaffected (pgbench self-reports its own duration), but the CPU figures came from methodologically asymmetric harnesses.
  2. Baseline drift from residual page cache. After stopping each guest, host RAM did not snap back to 1702 MB immediately (e.g. 1895 MB just after the LXC stopped; 1965→1794 MB drifting down after the VM). This is reclaimable cache, not a leak. Treat all RAM deltas as ±~100 MB.
  3. The headline RAM gap is structural, not incidental. LXC processes share the host kernel and page cache, so only the working set counts against the host (+211 MB idle). The VM, with no ballooning configured, has KVM back every guest-touched page — including the guest's own 1.5 GB page cache — so the host cost ≈ the full 2 GB allocation (KVM RSS ≈ 2031 MB) and is largely load-independent (3758 idle → 3786 load). Ballooning / KSM were not tested and could change this.
  4. cgroup memory.current ≠ host cost. For the LXC it read 1961 MB (near the 2 GB limit) because it includes reclaimable page cache; the real incremental host cost was +211 MB. Per the protocol, MemTotal − MemAvailable is the decisive metric.
  5. VM idle CPU floor (~6 %) vs LXC (~0.3 %). QEMU device emulation + a full guest kernel's timer/housekeeping impose a small constant CPU cost even at rest.
  6. Throughput vs CPU trade. The VM did less work (1820 vs 2212 tps, about 18 % lower) for more host CPU (53.9 vs 39.4 %). The extra cost surfaces as %guest (31.9 %), which covers the actual DB work plus virtualization overhead, whereas in the LXC the same DB work appears directly as host %usr/%sys. iowait was comparable (~15–16 %, WAL fsync).
  7. Workload fits in RAM. pgbench scale -s 10 (~150 MB) fits in cache in both guests, so the test is commit/CPU-bound rather than disk-bound; a larger-than-RAM dataset would stress the storage paths differently and is not covered here.
  8. qemu-guest-agent confirmed on the VM (qm guest cmd 9000 ping → OK). This enables guest-fsfreeze-based app-consistent snapshot-mode vzdump for the VM — a capability the LXC has no equivalent for. The genericcloud image does not ship the agent; it had to be installed in-guest (and the VM IP had to be found via nmap/MAC until the agent was up).
  9. Provisioning asymmetry foreshadows cloning. LXC create is template-extract-bound (526 MiB at 387 MiB/s + SSH keygen, ~10–15 s). VM create is qcow2-import-bound (3 GiB → LVM ≈ 30 s) plus a full firmware boot to SSH-ready (~30–45 s). Figures are rough, single-run, and exclude the shared network-bound Docker install + first image pull.
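One way to avoid the mistimed sample described in observation 1 is to gate the sampling on a probe that confirms the load is running. The sketch below is not the harness used in this spike: wait_until is a hypothetical helper, and the probe shown in the comments is an assumption about how pgbench might be detected.

```shell
#!/usr/bin/env bash
# Poll a probe command about once per second until it succeeds or the timeout
# (in attempts, roughly seconds) is reached.
# Returns 0 as soon as the probe succeeds, 1 on timeout.
wait_until() {
    local timeout="$1" elapsed=0
    shift
    until "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$timeout" ]; then return 1; fi
        sleep 1
    done
    return 0
}

# Hypothetical usage on the host (the probe command is an assumption):
#   wait_until 30 pct exec 9001 -- pgrep -x pgbench && mpstat 1 5
```

Using the same gate for both guests, with only the probe differing, would also reduce the harness asymmetry noted in that observation.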

5. Raw command log (appendix)

5.1 Provenance

$ pveversion -v | grep ...
pve-manager: 9.2.2 (running version: 9.2.2/b9984c6d90a4bd80)
proxmox-kernel-7.0: 7.0.2-6
criu: 4.1.1-1
lxc-pve: 7.0.0-2
lxcfs: 7.0.0-pve1
pve-container: 6.1.10
pve-qemu-kvm: 11.0.0-3
qemu-server: 9.1.15

$ pvesm status
local         dir      active  98497780  4333576  89114656  4.40%
local-lvm  lvmthin    active 365760512        0 365760512  0.00%

# Docker repo trixie channel:
$ curl -fsSL https://download.docker.com/linux/debian/dists/ | grep -oE 'trixie|bookworm|bullseye'
bookworm / bullseye / trixie        # trixie present

# Cloud image:
$ qemu-img info debian-13-genericcloud-amd64.qcow2
virtual size: 3 GiB ; disk size: 323 MiB ; compat 1.1 ; build 20260601-2496

5.2 Baseline (both guests stopped)

$ for i in 1..5; awk MemTotal-MemAvailable /proc/meminfo ; sleep 3
used=1699 MB / 1702 / 1702 / 1702 / 1703 MB      (median 1702)

$ mpstat 1 5
Average: all 0.05 usr 0.05 sys ... 99.90 idle
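The "% used" figures in this report correspond to 100 minus the %idle column of the mpstat average row. A minimal sketch of that reduction follows; cpu_used_pct is a hypothetical helper name, and it assumes the standard sysstat layout in which %idle is the last column of the "Average: all" row.

```shell
#!/usr/bin/env bash
# Read mpstat output on stdin and print host CPU used (%) as 100 - %idle,
# taken from the "Average: all" row, whose last column is %idle.
cpu_used_pct() {
    awk '$1 == "Average:" && $2 == "all" {printf "%.1f\n", 100 - $NF}'
}
```

A typical invocation would be `LC_ALL=C mpstat 1 5 | cpu_used_pct`; forcing the C locale keeps the decimal separator a dot so awk parses the values numerically.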

5.3 LXC 9001 — create + Docker

$ pct create 9001 local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst \
    --hostname spike-lxc --cores 2 --memory 2048 --rootfs local-lvm:10 \
    --net0 name=eth0,bridge=vmbr0,ip=dhcp --features nesting=1,keyctl=1 \
    --unprivileged 1 --start 1
  Logical volume "vm-9001-disk-0" created.
  extracting archive ... Total bytes read: 551505920 (526MiB, 387MiB/s)
  Creating SSH host key ... done
=== exit: 0 ; status: running
features: nesting=1,keyctl=1 ; unprivileged: 1 ; ip 192.168.0.115/24

# Docker install (official repo, trixie stable): DOCKER-INSTALL-OK
$ docker --version            -> Docker version 29.5.3, build d1c06ef
$ docker run --rm hello-world -> Hello from Docker!
$ docker info | grep -iE 'Storage Driver|Cgroup'
 Storage Driver: overlayfs
 Cgroup Driver: systemd
 Cgroup Version: 2
 Server Version: 29.5.3 ; Kernel: 7.0.2-6-pve ; OS: Debian GNU/Linux 13 (trixie)

5.4 LXC 9001 — stack health

$ docker compose ps
spike-cache-1  running   Up
spike-db-1     running   Up
spike-web-1    running   Up
$ curl -s -o /dev/null -w 'HTTP %{http_code}' localhost:8080   -> HTTP 200
$ psql CREATE TABLE spike_persist; INSERT; SELECT count(*)     -> 1   (volume persists)

5.5 LXC 9001 — idle measurement

Host RAM used (5x3s): 1913 / 1914 / 1913 / 1914 / 1913 MB     (median 1913, Δ +211)
cgroup memory.current: 2056036352 B = 1961 MB
inside free -m: total 2048 used 125 buff/cache 1851 available 1922
mpstat 1 5 Average: 0.20 usr 0.10 sys ... 99.70 idle   (~0.3% used)
pct df 9001: rootfs 9.7G size, 2.1G used, 21.6%

5.6 LXC 9001 — under-load measurement

$ pgbench -i -s 10  -> done in 1.39 s
$ pgbench -T 60 -c 4 (run concurrently with sampling):
Host RAM used (5x3s): 2149 / 2143 / 2112 / 2086 / 2071 MB     (median 2112, Δ +410)
cgroup memory.current: 2130382848 B = 2032 MB
mpstat 1 5 Average: 17.10 usr 7.50 sys 14.50 iowait 0.31 soft 60.59 idle  (~39.4% used)
pgbench result: scaling 10, clients 4, 60 s
  transactions: 132710 ; failed 0 (0.000%)
  latency average = 1.809 ms ; tps = 2211.713864
host thin LV vm-9001-disk-0: 10240 MB, Data% 26.73  (≈2.67 GiB)

5.7 VM 9000 — create + cloud-init

$ qm create 9000 --name spike-vm --cores 2 --memory 2048 \
    --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-single --agent 1
$ qm set 9000 --scsi0 local-lvm:0,import-from=/var/lib/vz/template/qcow2/debian-13-genericcloud-amd64.qcow2
  transferred 3.0 GiB of 3.0 GiB (100.00%)
  scsi0: successfully created disk 'local-lvm:vm-9000-disk-0,size=3G'
$ qm set 9000 --ide2 local-lvm:cloudinit --boot order=scsi0 --serial0 socket --vga serial0
$ qm disk resize 9000 scsi0 10G        -> resized 3.00 -> 10.00 GiB
$ qm set 9000 --ciuser spike --cipassword spike --sshkeys /root/spike-pubkey.pub --ipconfig0 ip=dhcp
   # pubkey file = the two real keys from the host's /etc/pve/priv/authorized_keys
   #   (incl. ssh-ed25519 ...kisfenyo@windows — the same workstation key)
$ qm start 9000   -> start-ok

5.8 VM 9000 — IP discovery + guest agent + Docker

# genericcloud has no guest-agent at first boot -> qm guest cmd ping failed.
# IP found via MAC on the bridge:
$ nmap -sn 192.168.0.0/24 | grep -B2 BC:24:11:C7:41:87
  Nmap scan report for 192.168.0.155 ; MAC BC:24:11:C7:41:87 (Proxmox)
$ ssh -i /root/.ssh/id_rsa spike@192.168.0.155 'hostname; cat /etc/debian_version'
  spike-vm ; 13.5
# install qemu-guest-agent + Docker (official repo, trixie): VM-INSTALL-OK
$ qm guest cmd 9000 ping            -> AGENT OK   (fsfreeze available)
$ docker --version                  -> Docker version 29.5.3, build d1c06ef
$ docker run --rm hello-world       -> Hello from Docker!
$ docker info | grep -iE 'Storage Driver|Cgroup'
 Storage Driver: overlayfs ; Cgroup Driver: systemd ; Cgroup Version: 2

5.9 VM 9000 — stack health

$ docker compose ps -> spike-cache-1 / spike-db-1 / spike-web-1 all running
$ curl ... localhost:8080 -> HTTP 200
$ psql ... SELECT count(*) -> 1   (volume persists)

5.10 VM 9000 — idle measurement

Host RAM used (5x3s): 3758 / 3757 / 3754 / 3759 / 3758 MB     (median 3758, Δ +2056)
KVM process RSS / VSZ: 2079988 / 3380896 KiB  (RSS = 2031 MB)
inside free -m: total 1974 used 509 buff/cache 1524 available 1464
mpstat 1 5 Average: 3.37 usr 2.31 sys 0.29 guest ... 94.04 idle  (~6.0% used)
qm config: scsi0 local-lvm:vm-9000-disk-0,size=10G
host thin LV vm-9000-disk-0: 10240 MB, Data% 29.33  (≈2.94 GiB)
inside df -h /: 9.7G size, 2.4G used, 25%

5.11 VM 9000 — under-load measurement (definitive, load confirmed active)

# First attempt (nested-ssh + nohup &) launched pgbench AFTER the sample window ->
# host CPU read a false ~5% (identical to idle). Diagnosed; re-run below holds
# pgbench in the foreground of a long-lived SSH channel and samples during it.

$ pgbench -T 90 -c 4 (foreground, channel held):
  transactions: 163764 ; failed 0 (0.000%)
  latency average = 2.198 ms ; tps = 1819.602345
  (60 s confirmation runs: 1739 & 1759 tps)

# Sampled 10 s into the confirmed-active load:
Host RAM used (5x3s): 3784 / 3786 / 3786 / 3786 / 3786 MB     (median 3786, Δ +2084)
KVM process RSS / VSZ: 2096508 / 4495008 KiB  (RSS = 2047 MB)
guest uptime: load average 1.71 (2 vCPU)  -> vCPUs busy
mpstat 1 8 Average:
  1.70 usr  3.40 sys  16.35 iowait  0.58 soft  31.89 guest  46.08 idle   (~53.9% used)

5.12 Teardown state

$ qm list  -> 9000 spike-vm stopped
$ pct list -> 9001 spike-lxc stopped
# both present, both stopped (numbers can be re-checked)

6. Teardown — destroy commands (NOT run)

Both guests were left stopped but present. To remove them:

qm destroy 9000 --purge            # VM   (also removes cloudinit + disks)
pct destroy 9001 --purge           # LXC
# optional spike artifacts on the host:
rm -f /var/lib/vz/template/qcow2/debian-13-genericcloud-amd64.qcow2
rm -f /root/spike-pubkey.pub /root/vm-install.sh
# (Debian 13 CT template left in place: local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst)