# Phase 0 — VM vs LXC Overhead Spike: Findings

**Host:** `demo-felhom` (192.168.0.162) — Proxmox VE 9.2.2, Debian 13 (Trixie), kernel 7.0.2-6-pve, 4 vCPU, 16 GB RAM (15771 MB `MemTotal`).

**Date:** 2026-06-07. **Measured one guest at a time, the other fully stopped.**

> This document presents **data and observations only**. No recommendation or verdict —
> the architecture decision is made elsewhere.

---

## 1. Provenance

### Platform

| Component | Version |
|---|---|
| pve-manager | 9.2.2 (`b9984c6d90a4bd80`) |
| kernel | proxmox-kernel 7.0.2-6-pve |
| pve-qemu-kvm | 11.0.0-3 |
| qemu-server | 9.1.15 |
| pve-container | 6.1.10 |
| lxc-pve / lxcfs | 7.0.0-2 / 7.0.0-pve1 |
| criu | 4.1.1-1 |

`pvesh get /version` → release 9.2, version 9.2.2.

### Guest images

| | LXC (9001) | VM (9000) |
|---|---|---|
| Source | `local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst` | `debian-13-genericcloud-amd64.qcow2` |
| Build | Debian 13.1 standard CT template (downloaded via `pveam`, checksum verified) | cloud build **20260601-2496**; in-guest reports Debian **13.5** after `apt update` |
| qcow2 | n/a | virtual 3 GiB, on-disk 323 MiB, compat 1.1/zlib |

### Docker (identical in both guests)

| | LXC | VM |
|---|---|---|
| Source | Docker official apt repo, **`trixie` channel** (confirmed present) | same |
| Version | **29.5.3** build d1c06ef | **29.5.3** build d1c06ef |
| Storage Driver | **`overlayfs`** (not vfs) | **`overlayfs`** (not vfs) |
| Cgroup Version / Driver | **v2 / systemd** | **v2 / systemd** |
| `hello-world` | OK | OK |

> Docker's official repo **does** have a `trixie` channel — no fallback to Debian's
> `docker.io` was needed. Docker 29 reports the driver as `overlayfs` (the containerd
> snapshotter image store) rather than the legacy name `overlay2`; this is the same
> overlay technology and is **not** a `vfs` fallback.

---

## 2. Comparison table

Baseline (both guests stopped): host RAM used **median 1702 MB** (range 1699–1703); host CPU **~0.1 % used** (99.9 % idle). All RAM deltas below are relative to this baseline. Host RAM used = `MemTotal − MemAvailable`, 5 samples ~3 s apart (median reported). Values in parentheses in the RAM rows are the absolute host-RAM medians in MB.

| Metric | LXC (9001) | VM (9000) | Δ (VM − LXC) |
|---|---|---|---|
| **Idle host-RAM delta** | **+211 MB** (1913) | **+2056 MB** (3758) | **+1845 MB** |
| **Under-load host-RAM delta** | **+410 MB** (2112) | **+2084 MB** (3786) | **+1674 MB** |
| **Per-guest mem attribution** | cgroup `memory.current` = **1961 MB** (idle) / **2032 MB** (load)¹ | KVM process RSS = **2031 MB** (idle) / **2047 MB** (load) | — |
| **Idle host CPU used** | **~0.3 %** (0.20 usr + 0.10 sys) | **~6.0 %** (3.37 usr + 2.31 sys + 0.29 guest) | **+5.7 pp** |
| **Under-load host CPU used** | **~39.4 %** (17.1 usr + 7.5 sys + 14.5 iowait + 0.3 soft) | **~53.9 %** (31.9 guest + 16.4 iowait + 3.4 sys + 1.7 usr + 0.6 soft) | **+14.5 pp** |
| **pgbench throughput** | **2211.7 tps**, lat 1.809 ms, 132 710 tx/60 s, 0 failed | **1819.6 tps**, lat 2.198 ms, 163 764 tx/90 s, 0 failed² | **−392 tps** |
| **Disk allocated** | 10 GiB | 10 GiB | 0 |
| **Disk used (host thin-LV)** | 26.73 % ≈ **2.67 GiB** | 29.33 % ≈ **2.93 GiB** | +0.26 GiB |
| **Disk used (inside guest)** | 2.1 GiB / 9.7 GiB | 2.4 GiB / 9.7 GiB | +0.3 GiB |
| **Provisioning (rough, create→ready)** | ~10–15 s³ | ~60–75 s³ | — |

¹ `memory.current` counts reclaimable page cache shared with the host and therefore **overstates** the LXC's true incremental cost; the +211 MB host-RAM delta is the more representative figure.

² VM 60 s runs gave 1739 & 1759 tps — consistent with the 90 s definitive run.

³ Guest-creation step only; see §4, item 9. Docker install + first image pull (network-bound, roughly identical for both) is excluded.

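
The host-RAM metric above (`MemTotal − MemAvailable`, five samples about 3 s apart, median reported) can be sketched in shell as follows. This is a minimal illustration, not the exact commands used in the spike: the function names `host_used_mb`, `median`, and `sample_median` are hypothetical, and rounding KiB / 1024 to the nearest MB is an assumption chosen to match figures in the log such as 2079988 KiB = 2031 MB.

```bash
#!/usr/bin/env bash
# Sketch only: function names and rounding are assumptions, not taken from the spike.

# Host RAM used in MB = (MemTotal - MemAvailable) KiB / 1024, rounded to nearest.
# Reads a meminfo-format file; defaults to /proc/meminfo.
host_used_mb() {
    awk '/^MemTotal:/ {t = $2}
         /^MemAvailable:/ {a = $2}
         END {printf "%.0f\n", (t - a) / 1024}' "${1:-/proc/meminfo}"
}

# Median of newline-separated integers read from stdin.
median() {
    sort -n | awk '{v[NR] = $1}
        END {
            if (NR % 2) print v[(NR + 1) / 2]
            else printf "%.0f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Take N samples (default 5), INTERVAL seconds apart (default 3), print the median.
sample_median() {
    local n="${1:-5}" interval="${2:-3}" i=0
    while [ "$i" -lt "$n" ]; do
        host_used_mb
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then sleep "$interval"; fi
    done | median
}
```

Running `sample_median` on the host with a guest idle, and subtracting the baseline median, would give a delta comparable to the RAM rows in the table. As a check against the logged samples, `printf '%s\n' 1699 1702 1702 1702 1703 | median` prints `1702`, the reported baseline.
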
### Inside-guest `free -m` (context only — not the decisive number)

| | total | used | buff/cache | available |
|---|---|---|---|---|
| LXC idle | 2048 | 125 | 1851 | 1922 |
| VM idle | 1974 | 509 | 1524 | 1464 |

The VM sees **1974 MB** usable of 2048 allocated (firmware/kernel reservation).

---

## 3. Docker-in-LXC viability

**Worked cleanly in an *unprivileged* LXC with `--features nesting=1,keyctl=1`. No privileged fallback was needed.**

- `--features nesting=1,keyctl=1 --unprivileged 1` accepted by `pct create` (PVE 9 syntax confirmed via `pct help create`).
- `docker run hello-world` → success.
- **Storage driver: `overlayfs`** (cgroup v2, systemd cgroup driver) — **no `vfs` fallback**.
- Full 3-container stack (`postgres:17`, `redis:7`, `nginx:alpine`) came up healthy.
- Named volume `pgdata` persisted a write (`SELECT count` returned 1 after table create/insert).
- Multi-container networking + published port worked: `curl localhost:8080` → **HTTP 200**.
- 60 s pgbench load: **0 failed transactions**.

No errors, no `dmesg`/`journalctl` anomalies, no workarounds. The privileged-LXC fallback path (step A5) was therefore **not exercised**.

---

## 4. Observations & confounds

1. **VM under-load CPU required a re-measurement (diagnosed, not hidden).** The first VM-load sample showed host CPU ~5 % — identical to *idle* — while pgbench nonetheless completed a full 60 s run (1739 tps). Root cause: the VM load was launched through a **nested SSH + `nohup &`** layer (host→VM), which started pgbench *after* the sampling window. The LXC path used local `pct exec` (no nested SSH) so its first sample was valid. Re-running with pgbench held in the **foreground of a long-lived SSH channel** (guaranteed active) and sampling during a confirmed window gave the true **53.9 %** (`%guest`=31.9).
   **Confound:** the two guests' load was driven through different plumbing (`pct exec` vs nested SSH); the *throughput* numbers are unaffected (pgbench self-reports its own duration), but the CPU figures came from methodologically asymmetric harnesses.
2. **Baseline drift from residual page cache.** After stopping each guest, host RAM did not snap back to 1702 MB immediately (e.g. 1895 MB just after the LXC stopped; 1965→1794 MB drifting down after the VM). This is reclaimable cache, not a leak. Treat all RAM deltas as accurate to roughly ±100 MB.
3. **The headline RAM gap is structural, not incidental.** LXC processes share the host kernel and page cache, so only the working set counts against the host (+211 MB idle). The VM, with **no ballooning configured**, has KVM back every guest-touched page — including the guest's own 1.5 GB page cache — so the host cost ≈ the full 2 GB allocation (KVM RSS ≈ 2031 MB) and is **largely load-independent** (3758 idle → 3786 load). Ballooning / KSM were not tested and could change this.
4. **`cgroup memory.current` ≠ host cost.** For the LXC it read 1961 MB (near the 2 GB limit) because it includes reclaimable page cache; the real incremental host cost was +211 MB. Per the protocol, `MemTotal − MemAvailable` is the decisive metric.
5. **VM idle CPU floor (~6 %) vs LXC (~0.3 %).** QEMU device emulation + a full guest kernel's timer/housekeeping impose a small constant CPU cost even at rest.
6. **Throughput vs CPU trade.** The VM did slightly *less* work (1820 vs 2212 tps) for *more* host CPU (53.9 vs 39.4 %). The extra cost surfaces as `%guest` (31.9 %) — the actual DB work *plus* virtualization overhead — whereas in the LXC the same DB work appears directly as host `%usr`/`%sys`. iowait was comparable (~15–16 %, WAL fsync).
7. **Workload fits in RAM.** pgbench scale `-s 10` (~150 MB) fits in cache in both guests, so the test is commit/CPU-bound rather than disk-bound; a larger-than-RAM dataset would stress the storage paths differently and is not covered here.
8. **qemu-guest-agent confirmed on the VM** (`qm guest cmd 9000 ping` → OK). This enables `guest-fsfreeze`-based app-consistent `snapshot`-mode vzdump for the VM — a capability for which the LXC has no equivalent. The genericcloud image does **not** ship the agent; it had to be installed in-guest (and the VM IP had to be found via `nmap`/MAC until the agent was up).
9. **Provisioning asymmetry foreshadows cloning.** LXC create is template-extract-bound (526 MiB at 387 MiB/s + SSH keygen, ~10–15 s). VM create is qcow2-import-bound (3 GiB → LVM ≈ 30 s) plus a full firmware boot to SSH-ready (~30–45 s). Figures are rough, single-run, and exclude the shared network-bound Docker install + first image pull.

---

## 5. Raw command log (appendix)

### 5.1 Provenance

```
$ pveversion -v | grep ...
pve-manager: 9.2.2 (running version: 9.2.2/b9984c6d90a4bd80)
proxmox-kernel-7.0: 7.0.2-6
criu: 4.1.1-1
lxc-pve: 7.0.0-2
lxcfs: 7.0.0-pve1
pve-container: 6.1.10
pve-qemu-kvm: 11.0.0-3
qemu-server: 9.1.15

$ pvesm status
local      dir      active   98497780  4333576   89114656  4.40%
local-lvm  lvmthin  active  365760512        0  365760512  0.00%

# Docker repo trixie channel:
$ curl -fsSL https://download.docker.com/linux/debian/dists/ | grep -oE 'trixie|bookworm|bullseye'
bookworm / bullseye / trixie   # trixie present

# Cloud image:
$ qemu-img info debian-13-genericcloud-amd64.qcow2
virtual size: 3 GiB ; disk size: 323 MiB ; compat 1.1 ; build 20260601-2496
```

### 5.2 Baseline (both guests stopped)

```
$ for i in 1 2 3 4 5; do awk '/^MemTotal:/{t=$2} /^MemAvailable:/{a=$2} END{printf "used=%d MB\n",(t-a)/1024}' /proc/meminfo; sleep 3; done
used=1699 MB / 1702 / 1702 / 1702 / 1703 MB   (median 1702)
$ mpstat 1 5
Average: all 0.05 usr 0.05 sys ... 99.90 idle
```

### 5.3 LXC 9001 — create + Docker

```
$ pct create 9001 local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst \
    --hostname spike-lxc --cores 2 --memory 2048 --rootfs local-lvm:10 \
    --net0 name=eth0,bridge=vmbr0,ip=dhcp --features nesting=1,keyctl=1 \
    --unprivileged 1 --start 1
Logical volume "vm-9001-disk-0" created.
extracting archive ...
Total bytes read: 551505920 (526MiB, 387MiB/s)
Creating SSH host key ... done
=== exit: 0 ; status: running
features: nesting=1,keyctl=1 ; unprivileged: 1 ; ip 192.168.0.115/24

# Docker install (official repo, trixie stable):
DOCKER-INSTALL-OK
$ docker --version -> Docker version 29.5.3, build d1c06ef
$ docker run --rm hello-world -> Hello from Docker!
$ docker info | grep -iE 'Storage Driver|Cgroup'
Storage Driver: overlayfs
Cgroup Driver: systemd
Cgroup Version: 2
Server Version: 29.5.3 ; Kernel: 7.0.2-6-pve ; OS: Debian GNU/Linux 13 (trixie)
```

### 5.4 LXC 9001 — stack health

```
$ docker compose ps
spike-cache-1   running   Up
spike-db-1      running   Up
spike-web-1     running   Up
$ curl -s -o /dev/null -w 'HTTP %{http_code}' localhost:8080 -> HTTP 200
$ psql CREATE TABLE spike_persist; INSERT; SELECT count(*) -> 1 (volume persists)
```

### 5.5 LXC 9001 — idle measurement

```
Host RAM used (5x3s): 1913 / 1914 / 1913 / 1914 / 1913 MB (median 1913, Δ +211)
cgroup memory.current: 2056036352 B = 1961 MB
inside free -m: total 2048 used 125 buff/cache 1851 available 1922
mpstat 1 5 Average: 0.20 usr 0.10 sys ... 99.70 idle (~0.3% used)
pct df 9001: rootfs 9.7G size, 2.1G used, 21.6%
```

### 5.6 LXC 9001 — under-load measurement

```
$ pgbench -i -s 10 -> done in 1.39 s
$ pgbench -T 60 -c 4 (run concurrently with sampling):
Host RAM used (5x3s): 2149 / 2143 / 2112 / 2086 / 2071 MB (median 2112, Δ +410)
cgroup memory.current: 2130382848 B = 2032 MB
mpstat 1 5 Average: 17.10 usr 7.50 sys 14.50 iowait 0.31 soft 60.59 idle (~39.4% used)
pgbench result: scaling 10, clients 4, 60 s
  transactions: 132710 ; failed 0 (0.000%)
  latency average = 1.809 ms ; tps = 2211.713864
host thin LV vm-9001-disk-0: 10240 MB, Data% 26.73 (≈2.67 GiB)
```

### 5.7 VM 9000 — create + cloud-init

```
$ qm create 9000 --name spike-vm --cores 2 --memory 2048 \
    --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-single --agent 1
$ qm set 9000 --scsi0 local-lvm:0,import-from=/var/lib/vz/template/qcow2/debian-13-genericcloud-amd64.qcow2
transferred 3.0 GiB of 3.0 GiB (100.00%)
scsi0: successfully created disk 'local-lvm:vm-9000-disk-0,size=3G'
$ qm set 9000 --ide2 local-lvm:cloudinit --boot order=scsi0 --serial0 socket --vga serial0
$ qm disk resize 9000 scsi0 10G -> resized 3.00 -> 10.00 GiB
$ qm set 9000 --ciuser spike --cipassword spike --sshkeys /root/spike-pubkey.pub --ipconfig0 ip=dhcp
# pubkey file = the two real keys from the host's /etc/pve/priv/authorized_keys
# (incl. ssh-ed25519 ...kisfenyo@windows — the same workstation key)
$ qm start 9000 -> start-ok
```

### 5.8 VM 9000 — IP discovery + guest agent + Docker

```
# genericcloud has no guest-agent at first boot -> qm guest cmd ping failed.
# IP found via MAC on the bridge:
$ nmap -sn 192.168.0.0/24 | grep -B2 BC:24:11:C7:41:87
Nmap scan report for 192.168.0.155 ; MAC BC:24:11:C7:41:87 (Proxmox)
$ ssh -i /root/.ssh/id_rsa spike@192.168.0.155 'hostname; cat /etc/debian_version'
spike-vm ; 13.5
# install qemu-guest-agent + Docker (official repo, trixie):
VM-INSTALL-OK
$ qm guest cmd 9000 ping -> AGENT OK (fsfreeze available)
$ docker --version -> Docker version 29.5.3, build d1c06ef
$ docker run --rm hello-world -> Hello from Docker!
$ docker info | grep -iE 'Storage Driver|Cgroup'
Storage Driver: overlayfs ; Cgroup Driver: systemd ; Cgroup Version: 2
```

### 5.9 VM 9000 — stack health

```
$ docker compose ps -> spike-cache-1 / spike-db-1 / spike-web-1 all running
$ curl ... localhost:8080 -> HTTP 200
$ psql ... SELECT count(*) -> 1 (volume persists)
```

### 5.10 VM 9000 — idle measurement

```
Host RAM used (5x3s): 3758 / 3757 / 3754 / 3759 / 3758 MB (median 3758, Δ +2056)
KVM process RSS / VSZ: 2079988 / 3380896 KiB (RSS = 2031 MB)
inside free -m: total 1974 used 509 buff/cache 1524 available 1464
mpstat 1 5 Average: 3.37 usr 2.31 sys 0.29 guest ... 94.04 idle (~6.0% used)
qm config: scsi0 local-lvm:vm-9000-disk-0,size=10G
host thin LV vm-9000-disk-0: 10240 MB, Data% 29.33 (≈2.93 GiB)
inside df -h /: 9.7G size, 2.4G used, 25%
```

### 5.11 VM 9000 — under-load measurement (definitive, load confirmed active)

```
# First attempt (nested-ssh + nohup &) launched pgbench AFTER the sample window ->
# host CPU read a false ~5% (identical to idle). Diagnosed; re-run below holds
# pgbench in the foreground of a long-lived SSH channel and samples during it.
$ pgbench -T 90 -c 4 (foreground, channel held):
  transactions: 163764 ; failed 0 (0.000%)
  latency average = 2.198 ms ; tps = 1819.602345
  (60 s confirmation runs: 1739 & 1759 tps)
# Sampled 10 s into the confirmed-active load:
Host RAM used (5x3s): 3784 / 3786 / 3786 / 3786 / 3786 MB (median 3786, Δ +2084)
KVM process RSS / VSZ: 2096508 / 4495008 KiB (RSS = 2047 MB)
guest uptime: load average 1.71 (2 vCPU) -> vCPUs busy
mpstat 1 8 Average: 1.70 usr 3.40 sys 16.35 iowait 0.58 soft 31.89 guest 46.08 idle (~53.9% used)
```

### 5.12 Teardown state

```
$ qm list -> 9000 spike-vm stopped
$ pct list -> 9001 spike-lxc stopped
# both present, both stopped (numbers can be re-checked)
```

---

## 6. Teardown — destroy commands (NOT run)

Both guests were left **stopped but present**. To remove them:

```bash
qm destroy 9000 --purge    # VM: removes its disks (incl. cloudinit); --purge also clears backup/replication/HA references
pct destroy 9001 --purge   # LXC
# optional spike artifacts on the host:
rm -f /var/lib/vz/template/qcow2/debian-13-genericcloud-amd64.qcow2
rm -f /root/spike-pubkey.pub /root/vm-install.sh
# (Debian 13 CT template left in place: local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst)
```
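
Because the destroy commands are irreversible, one cautious way to run them is behind a state check. The following is a sketch under stated assumptions, not part of the spike: the helper name `destroy_if_stopped` and the `DRY_RUN` variable are hypothetical, and it assumes `qm status <id>` and `pct status <id>` print a line of the form `status: stopped`. By default it only prints what it would do.

```bash
#!/usr/bin/env bash
# Sketch only: helper name and DRY_RUN convention are assumptions.
# Destroys a guest only when its CLI reports "status: stopped".
# DRY_RUN defaults to 1, so nothing is removed unless DRY_RUN=0 is set.

destroy_if_stopped() {
    local tool="$1" id="$2" state
    # Extract the state word from a line such as "status: stopped".
    state="$("$tool" status "$id" 2>/dev/null | awk '/^status:/ {print $2}')"
    if [ "$state" != "stopped" ]; then
        echo "skip $tool $id: state is '${state:-unknown}'"
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $tool destroy $id --purge"
    else
        "$tool" destroy "$id" --purge
    fi
}
```

Usage on the host would be `destroy_if_stopped qm 9000` followed by `destroy_if_stopped pct 9001`, first as a dry run and then with `DRY_RUN=0` once the printed commands look right.
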