How I came to build a cheap server cluster for VDI

Michael Lipp
5 min read · Jul 11, 2024


In 2019, our department of electrical engineering noticed that we were spending too much of our budget on maintaining the virtual desktop infrastructure (VDI) for our labs. At that time, the server side of our infrastructure consisted of a small, professional-style “data center” built from rack components and running VMware. I wondered whether we really needed this. Of course, the system has to be reliable, especially during exams, so the failure of a single component should be compensated for automatically. But anything beyond that seemed excessive. And there simply had to be a way to run VMs without incurring license costs.

Favorable and unexpected circumstances

Fortunately, I had access to a small cluster of three tower-case servers (each equipped with 24 Zen 2 cores, 256 GiB of main memory, six hard drives, an M.2 NVMe drive, and two 10 Gbit/s network interfaces), two managed switches (each with four 10 Gbit/s and eight 1 Gbit/s ports), and two routers running pfSense. There is little to no freely available information about sizing servers for VDI, but my estimate was that this setup should support at least 100 concurrently running VMs. The total hardware cost came to about 20,000 € at the time, less than what the department had paid to replace a single disk array after the failure of an air-conditioning unit a few months earlier.

I had just started to evaluate tools such as Pacemaker, Heartbeat, and oVirt when two events occurred: the COVID-19 pandemic and the university’s decision to provide VDI as a central service, at no cost to the department. The former regrettably put my evaluation on hold in the short term, because devising online lectures and labs obviously had the highest priority. The latter fortunately freed me from any deadlines and allowed me to do more thorough research in the long term.

A new attempt

When I picked up the project again, Kubernetes had evolved into a well-established platform. So why not build everything on top of that? And instead of having RAID arrays in each node plus some non-trivial mirroring configuration, why not put Ceph on the nodes and make this a hyper-converged cluster? And then use the upcoming KubeVirt to run the VMs in pods … but wait.

At about the same time, the QEMU VMs on a lab server stopped working when we upgraded to AlmaLinux 9, because Red Hat had decided to drop SPICE protocol support from RHEL. In the corresponding issue, you can find the remark that KubeVirt isn’t interested in supporting SPICE. Oops. Remember that ours is a department of electrical engineering? Many lab exercises involve programming microcontrollers on evaluation boards that have to be connected to the student’s VM via USB, and SPICE excels at supporting this. So the lack of long-term SPICE support rules out KubeVirt for our use case.

A fresh approach

When the QEMU VMs stopped working on the lab server, I put them into a container as an emergency workaround. This is basically what KubeVirt does, too: from a high-level perspective, it wraps libvirt, which wraps QEMU, and then adds some management components. But as I only need to run QEMU VMs, this seemed unnecessarily complicated. And besides, what if Red Hat decides to exclude SPICE from its libvirt support as well? I decided to evaluate a much simpler approach: start QEMU (optionally with a software TPM) in a pod without any additional overhead, and make this configurable using a Helm chart.
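
To make this concrete, here is a minimal sketch of what such a pod’s entrypoint might do. The paths, device IDs, and resource sizes are placeholders for illustration, not the actual chart’s values:

# Start the software TPM (swtpm) and hand its socket to QEMU.
swtpm socket --tpm2 \
  --tpmstate dir=/var/lib/swtpm \
  --ctrl type=unixio,path=/var/run/swtpm.sock &

# Start the VM; KVM must be available in the pod (/dev/kvm).
exec qemu-system-x86_64 \
  -machine q35,accel=kvm -cpu host -smp 4 -m 8192 \
  -drive file=/var/lib/vm/disk.qcow2,if=virtio,format=qcow2 \
  -drive file=/var/lib/vm/cd.iso,media=cdrom,id=cd0 \
  -chardev socket,id=chrtpm,path=/var/run/swtpm.sock \
  -tpmdev emulator,id=tpm0,chardev=chrtpm \
  -device tpm-tis,tpmdev=tpm0 \
  -qmp unix:/var/run/qmp.sock,server=on,wait=off \
  -spice port=5900,disable-ticketing=on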

To be honest, I was a bit surprised that this approach worked extremely well. Some features, however, such as changing the VM’s CD-ROM, cannot easily be implemented in a shell script acting as the controller for QEMU. So I “converted” the shell script to a Java program and extended its functionality. As I wanted to gain a deeper understanding of Kubernetes anyway, I also added a “VM-Operator”, which allows me to easily provide a web interface for both administrators and users of the VMs.
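
For illustration, changing the medium goes through the QMP socket configured above. A one-off exchange like the following is still doable from the shell (the drive id and ISO path are made up for this example), but building a long-running, stateful controller around such exchanges is where a shell script becomes unwieldy:

# Negotiate QMP capabilities, then swap the ISO in drive cd0.
(
  echo '{"execute": "qmp_capabilities"}'
  echo '{"execute": "blockdev-change-medium", "arguments": {"device": "cd0", "filename": "/var/lib/vm/new.iso", "format": "raw"}}'
  sleep 1   # keep the connection open for the reply
) | socat - UNIX-CONNECT:/var/run/qmp.sock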

The result

When I started 100 VMs for the first time, I was quite disappointed. The first few dozen came up smoothly, but then the rate at which new VMs were created dropped significantly, effectively coming to a halt, and the last VM started more than 10 minutes after the first.

First boot (requested RAM, which corresponds to the number of running VMs)

As it turns out, even VMs booting from a cloned snapshot of a master image have to write a certain amount of data at first boot, and this completely saturated the write capacity of the hard disks. (And maybe, just maybe, I shouldn’t have used a pool with three replicas. But let’s leave optimization for later.)
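
For context, “booting from a cloned snapshot” means something like the following, assuming the VM disks live in a Ceph RBD pool (the pool and image names are made up):

# Create and protect a snapshot of the master image, then give
# each VM a copy-on-write clone of it as its own disk.
rbd snap create vms/master@gold
rbd snap protect vms/master@gold
rbd clone vms/master@gold vms/vm-001

# A pool with three replicas multiplies every write accordingly:
ceph osd pool set vms size 3

A clone initially shares all its data with the snapshot, so the first boot of each VM triggers copy-on-write for every block it touches, which is presumably the write burst that saturated the disks.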

Subsequent boot (requested RAM, which corresponds to the number of running VMs)

Subsequent boots of the 100 VMs are much faster: the boot shown above took only 1:15. Note that I have configured LVs on each NVMe drive as write-through caches for the hard disks, so most of the data read comes from memory and the NVMes. Of course, you would use SSDs when buying hardware nowadays, so I repeated the tests with two SSDs in each server. As you would expect, this speeds up the first boot of the VMs considerably (down to 1:49), but has little effect on subsequent boots (1:02). We are going to do a few more tests, but booting definitely generates a higher parallel load than students working on their exercises.
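
For reference, the cache setup looks roughly like this; the device, volume-group, and LV names are assumptions, and the sizes are placeholders:

# Add the NVMe to the volume group that holds the data LV.
pvcreate /dev/nvme0n1
vgextend vg0 /dev/nvme0n1

# Create a cache volume on the NVMe and attach it to the data LV
# in write-through mode (writes always hit the hard disks, too,
# so a failing cache device cannot lose data).
lvcreate -L 200G -n cache0 vg0 /dev/nvme0n1
lvconvert --type cache --cachevol cache0 \
  --cachemode writethrough vg0/data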

The scenario described is certainly not everybody’s use case. But it shows that you can build a server cluster for 100 VMs, with some redundancy, for (nowadays less than) 20,000 €. And should our university decide to suspend the free VDI offering when new licensing terms arrive in the wake of the Broadcom/VMware deal, our department now has an affordable plan B to fall back on.
