Simulating the Performance of Load-Balancers in Videoconferencing

Florian Held
omi-uulm
Jul 28, 2021

This article was written by Louis Schell, a high-school student at the Karl-Maybach-Gymnasium in Friedrichshafen, Germany. It describes his work on a topic assigned to him as part of the ‘Simulierte Welten’ scholarship programme offered by the kiz of Ulm University. He gave permission to publish it under the omi-uulm group.

The Problem

I am a high school student in Friedrichshafen, Germany. I was informed of the ‘Simulierte Welten’ scholarships by our headmaster, C. Felder of the Karl-Maybach-Gymnasium. Over the past two years I have frequently used services such as Big Blue Button, Zoom, and WebEx at school. While I am generally happy with the quality and performance of these conferencing systems, problems occasionally occur. We tend to blame a bad Internet connection or a slow PC, but the growing number of users that came with Covid-19 highlighted a new problem.

When you connect to a video conferencing service, the server that connects you to the other meeting participants has to reserve memory, network bandwidth, and computational power for your session. If the server lacks any of these resources, it buckles under the load.

Under these circumstances, the connection may terminate, video could freeze, or audio may have a high latency. Even worse, services running in parallel on the same server may come to a standstill.

All in all, this results in a poor user experience and severely harms user satisfaction and Quality of Experience.

There is very little that can be done to combat this behaviour on a single server. Either you can upgrade the hardware, or you can upgrade the hardware. Really, you have no choice (besides not using a single server).

How you upgrade your hardware is a more interesting question. While improving a single server is intuitive, expanding to multiple servers makes more sense, both financially and in terms of redundancy. More servers mean that less powerful (and hence cheaper) hardware suffices for each node, and they leave you with a larger array of fail-safes.

This solution seems attractive, but comes with a challenge. While it would be nice for clients to know which server to join, it’s wishful thinking. We do not want to hand out permissions to clients, so adding an intermediary instance is unavoidable. This intermediary instance is now in charge of distributing the clients among the nodes. We call this the “load balancer”, because it is supposed to distribute the load across the servers.

Load balancing is the process of distributing network traffic across multiple servers. In the case of the Big Blue Button video conferencing tool, this translates to hosting multiple conferences on different servers, allowing the benefits of a multi-node configuration to be exploited in full.

While the process of balancing seems simple enough, there are a few constraints that make this problem hard:

  • Users are organized in meeting-rooms.
  • Meeting-rooms cannot span multiple servers.
  • Once attached to a server, meeting-rooms cannot be moved.

The dynamic nature of this problem is what makes it so interesting to research. The challenge is to distribute n rooms to m servers, while distributing the load as equally as possible among all servers. With the help of simulations, we can analyse different load balancing strategies and rate them according to how well they distribute the load.
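The constraints above can be captured in a minimal model sketch. The Room and Server classes below are illustrative, not taken from the actual engine:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of the balancing problem described above.
// Names (Room, Server) are illustrative, not from the real engine.
class Room {
    int users;
    Server host; // once set, the room may never move (constraint 3)

    Room(int users) { this.users = users; }
}

class Server {
    final List<Room> rooms = new ArrayList<>();

    // A room is attached as a whole: it cannot span servers (constraint 2).
    void attach(Room room) {
        if (room.host != null)
            throw new IllegalStateException("rooms cannot be moved once attached");
        room.host = this;
        rooms.add(room);
    }

    int load() {
        return rooms.stream().mapToInt(r -> r.users).sum();
    }
}
```

A balancer then has to pick a Server for each fresh Room while keeping all server loads as close to equal as possible.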

But why simulate something when we could test it out in real life?

The Uses of a Simulation

While it may seem intuitive at first, practically implementing, deploying, and observing a model load-balancer in a real-world system is not a good idea. Here are a couple of reasons why simulations are superior.

  1. Repeatability
    The importance of repeatability as a fundamental property of a simulation must not be underestimated. For a load-balancing simulation, this means we can use an identical dataset over and over again. This is beneficial when comparing results for different algorithms while keeping the same environment variables.
  2. Reproducibility
    Reproducibility is another major principle of research. While in reality fluctuating environment variables may cause deviations among results, a simulation is deterministic (with the exception of probabilistic algorithms). This contributes to a clean set of results that can be used to compare load-balancers fairly.
  3. Straightforwardness
    While understanding the big picture is the goal of all scientists, progress can only be achieved in small steps. Applied to load-balancing, this means understanding the effects of an algorithm on a system bit by bit. While simulations mostly try to be as close to reality as possible, they still map reality onto a simpler depiction of our world. Therefore, they can assist in revealing patterns that a complex system may not explicitly display.
  4. Friendliness
    Practically implementing a load-balancing algorithm in a real videoconferencing system seems intuitive at first, but the added complexity imposed by mammoth systems like Big Blue Button not only unbalances the ratio of “time spent implementing an algorithm” to “time spent researching an algorithm”, but also makes analysing and extracting data needlessly difficult. Simulations allow for test-case scenarios without the need to implement the algorithm in a framework. They also remove safety concerns: if something goes wrong, it's no big deal. “Rinse and repeat”; the real world stays unharmed.
  5. Customizability
    Simulations allow for targeted simulation of edge cases that may be rare or non-existent in the real world.

All in all, a load-balancing simulation is a great alternative to implementing a load-balancer in a real system. This is the reason that I worked on a simulation engine for the last 6 months.

The Simulation Engine

First off, the simulator is programmed in Java. Java lends itself well to framework development and allows the simulator to be extended and adapted easily.

Development was iterative, and the simulator was refactored at its core multiple times. Though tedious, this work towards a clean structure helped us provide a solution built for extensibility and modularity.

How does the Simulator work?

The simulation is event-based and driven by the Simulation class. Each simulation is made up of a data iterator, a load balancer, and event listeners.

  • A data iterator is responsible for the ETL part of the simulation and provides its input data.
  • The load balancer distributes rooms to servers in the simulation.
  • Event listeners provide a useful interface for developing additional logic on top of the core. This functionality is used to create graphs, process output data, and rate the load balancer.
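Expressed as hypothetical Java interfaces (the engine's real API may well differ), the three components could look like this:

```java
import java.util.Iterator;
import java.util.List;

// Placeholder types; the real engine has richer room/server classes.
class Room { int users; }
class Server { /* tracks attached rooms and their load */ }

// A data iterator yields one batch of meeting-rooms per simulation step.
interface DataIterator extends Iterator<List<Room>> {
}

// A load balancer attaches each new room to one of the available servers.
interface LoadBalancer {
    void balance(List<Room> newRooms, List<Server> servers);
}

// Listeners are notified of every simulation event, e.g. to build
// graphs, process output data, or rate the balancer.
interface SimulationListener {
    void onEvent(String event);
}
```

Because each piece sits behind its own interface, a new balancing algorithm or data source can be swapped in without touching the simulation core.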

The simulation runs in a loop for as long as the database provides data. This behaviour is regulated through schedules that define a start and an end for the simulation. While it is running, the simulation repeatedly calls the data iterator for new data. The iterator deserializes data from the database (or generates it) and turns it into meeting-rooms, which are then passed on to the load balancer. This is where things get interesting.

The balancing part of the simulation is split into two steps:

  1. By default, the balancer pre-balances the load. This means handling all user joins/leaves that do not result in a new room being created or deleted; these changes in user count can no longer be balanced (see constraints), because the affected rooms have already been attached to a server.
  2. In the balancing phase, the load balancer invokes its placement logic, which attaches rooms to servers. User joins/leaves are handled automatically, so the logic only has to distribute the rooms to the servers.
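The two phases above can be sketched roughly as follows; all names are illustrative, and the real engine's classes differ. The placement logic shown here simply picks the least-loaded server:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the two balancing phases described above.
class TwoPhaseBalancer {
    // one map per server: room id -> user count
    final List<Map<String, Integer>> servers = new ArrayList<>();

    TwoPhaseBalancer(int serverCount) {
        for (int i = 0; i < serverCount; i++) servers.add(new HashMap<>());
    }

    // Phase 1: apply user joins/leaves to rooms that already live on a
    // server. No placement decision is possible here (see constraints).
    void preBalance(Map<String, Integer> userDeltas) {
        for (Map<String, Integer> server : servers) {
            userDeltas.forEach((room, delta) ->
                server.computeIfPresent(room, (id, users) -> users + delta));
        }
    }

    // Phase 2: the balancer's placement logic attaches new rooms.
    // Here: put each new room on the currently least-loaded server.
    void balance(Map<String, Integer> newRooms) {
        newRooms.forEach((room, users) -> {
            Map<String, Integer> target = servers.get(0);
            for (Map<String, Integer> s : servers)
                if (load(s) < load(target)) target = s;
            target.put(room, users);
        });
    }

    static int load(Map<String, Integer> server) {
        return server.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```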

After the load has been balanced, the simulation moves on to the next iteration of the loop and repeats the process. With every event, the registered event listeners are notified. This concludes the base functionality of the simulation.

Results

We were able to implement some custom balancing algorithms and rate them against each other.

In total, we tested 6 balancers that relied on both deterministic and heuristic approaches. We found that heuristic approaches generally outperformed the other load-balancing algorithms. We presume this is due to the dynamic nature of the problem: rooms still grow and shrink after they have been distributed. While this behaviour is somewhat predictable, exploiting it would require a neural network or a pattern-detection algorithm.

The Assessment Process:

To evaluate the algorithms we relied on a simple loss function from machine learning, the quadratic loss function, also known as the mean squared error (MSE).

This function gives us a rating of how close our value is to the optimal value: the higher the MSE, the worse our load-balancer is performing. The only remaining question is what to use as the optimal value. While there are ways to calculate the realistic optimal distribution of a system, doing so is computationally expensive. Instead, we took the average load (often a non-integer value) as the optimal value. While this is unrealistic, it still allows us to compare the algorithms to each other.
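As a sketch, the rating can be computed as follows; the class name MseRater is ours, not the engine's:

```java
// Rates a distribution by its mean squared error against the average
// load, as described above. A lower value means a better balance.
class MseRater {
    static double mse(int[] loads) {
        double avg = 0;
        for (int l : loads) avg += l;
        avg /= loads.length;              // the (often non-integer) optimum

        double sum = 0;
        for (int l : loads) sum += (l - avg) * (l - avg);
        return sum / loads.length;
    }
}
```

A perfectly balanced system scores 0; for example, server loads of 10, 20, and 30 users average to 20 and yield an MSE of (100 + 0 + 100) / 3 ≈ 66.7.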

We calculated the mean squared error for both the user count and the room count and repeated the simulation multiple times in case of a heuristic approach.

Above is the result of a simulation comparing two different types of random balancers and two round-robin balancers with differing quanta. While the values differ from each other only minimally, the balancer that takes the best of 3 random servers (in turquoise) outperforms the other load balancers.
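The "best of 3 random servers" strategy is an instance of the well-known "power of d choices" scheme: sample d servers uniformly at random and pick the least loaded of them. A minimal sketch under assumed names (BestOfDBalancer, place):

```java
import java.util.Random;

// "Best of d random servers" placement: sample d servers at random and
// attach the room to the least-loaded one. With d = 3 this corresponds
// to the turquoise balancer described above.
class BestOfDBalancer {
    final int[] serverLoads;
    final int d;
    final Random rng;

    BestOfDBalancer(int servers, int d, long seed) {
        this.serverLoads = new int[servers];
        this.d = d;
        this.rng = new Random(seed);
    }

    // Place a room with the given user count; returns the chosen server.
    int place(int users) {
        int best = rng.nextInt(serverLoads.length);
        for (int i = 1; i < d; i++) {
            int candidate = rng.nextInt(serverLoads.length);
            if (serverLoads[candidate] < serverLoads[best]) best = candidate;
        }
        serverLoads[best] += users;
        return best;
    }
}
```

Compared with a purely random balancer (d = 1), even a small d spreads the load noticeably more evenly, which matches the result in the plot.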

While there are more elegant algorithms out there, our mission was not to find the best load-balancing algorithm, but to help others develop such algorithms.

In Conclusion

Overall, I think my partner and I were successful. We accomplished our goal of simulating the loads of a video conferencing tool and had fun working on this project. It was a great learning opportunity for both of us, and I can recommend applying for the “Simulierte Welten” scholarships to anyone interested in a challenge.

If by any means you would like to contact me, please send an e-mail to the official “Simulierte Welten” mailing address: info@simulierte-welten.de. I am happy to answer any questions regarding the simulation engine.

Florian Held, omi-uulm
Researcher with interests in: Energy-Efficient Computing, Middlewares in the IoT