Testing with Kubernetes

Published in Smartbox engineering

Written by Sam Blair & Aniruddha Shinde — July 31, 2019

In this article, we describe our journey from a static Windows-based Selenium grid to a more dynamic setup using Kubernetes. First, we define the challenge we were facing, which was the driving factor behind the solution we implemented. We then explain how the solution helped us address this challenge.

The Challenge

With every solution that we design, it's important to understand what triggered the need for a change. The three key factors for us were the expansion of the IT team, the addition of new websites, and the need to enable teams to deliver quality software faster. The biggest challenge we face as a QA team at Smartbox is the number of websites to validate. Currently, we have 10 websites to validate, and this is projected to increase to 15. This is why we focus on our automation suite to ensure quality. New tests added as part of each feature led to an ever-expanding test automation suite. In addition, we have multiple IT teams working on several microservices. These teams want to ship their code to production in parallel, and as part of each code release we run the automation tests, which in turn puts more stress on the grid. Together, these factors turned our Selenium grid into a bottleneck for parallel code deliveries. Below was our Selenium grid setup before moving to Kubernetes.

26 Static Windows VMs
Supported up to 90 parallel tests
Supported 2 different browsers (Firefox & Chrome)
Enabled Parallel releases

The Solution

After investigating and researching many potential solutions, we decided upon Kubernetes as a platform for running Selenium in Docker containers. We initially set up a test Kubernetes cluster, which took a lot of experimenting and tweaking to get working. Because this was a new technology for both of us, we had to learn all of it from scratch. Once it was up and running, we discovered how powerful this tool is. It used to take us several hours to set up new Windows machines running Selenium. Now we can scale up from zero to one hundred Selenium nodes in a matter of minutes. With our automation, we can also set up new Kubernetes nodes in a matter of minutes. The time savings were truly extraordinary. We also saw an unexpected benefit: our Selenium cluster performed better when running the regression suites, while consuming much less computing power.

Below is a diagram of a high-level view of what our setup looks like.

We have Selenium running in Docker, and these Docker containers are managed by Kubernetes, which controls and manages everything resource related. We worked out the threshold for how many Selenium nodes we could run on our test infrastructure and estimated what we could potentially achieve on a fast, dedicated environment.

Implementation

We implemented this by setting up seven virtual machines: one master node and six slave nodes. The master node just manages the Kubernetes slaves, and the slaves run the Selenium Docker containers. The Kubernetes nodes are set up and maintained using Chef, which manages the installation and setup of Kubernetes on the virtual machines and handles the Kubernetes bootstrapping. We estimated that we could comfortably run one hundred and sixty Selenium nodes. We had hoped to run at least two hundred, but the cluster was unstable when we pushed it that far, so we settled for one hundred and eighty. We set up Selenium by finding the latest Selenium Docker images, creating Kubernetes files for them, and deploying them on the cluster. We ran one Selenium hub deployment and scaled one Selenium Chrome node deployment to one hundred and eighty replicas.
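
To put the node counts above into per-machine terms, the arithmetic can be sketched as below. This is a rough illustration only: it assumes the Selenium nodes are spread evenly across the six slaves, which the Kubernetes scheduler does not guarantee, and the function name `nodes_per_worker` is made up for this example.

```shell
#!/bin/sh
# Sketch: how many Selenium nodes each Kubernetes slave would carry,
# ASSUMING an even spread across slaves (the scheduler may place them unevenly).

# nodes_per_worker TOTAL WORKERS: print TOTAL / WORKERS, rounded up.
nodes_per_worker() {
    total="$1"
    workers="$2"
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (total + workers - 1) / workers ))
}
```

For example, `nodes_per_worker 180 6` gives the per-slave density for the figure we settled on, and `nodes_per_worker 200 6` gives the density at the level where the cluster became unstable.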

Below are a few commands which we use to create our Selenium cluster and scale the nodes up and down. As you can see we can spin up an entire cluster with a few commands, which is a huge improvement to overall efficiency.

Setting up Selenium Hub

kubectl run selenium-hub --image selenium/hub:3.14.0-gallium --port 4444
kubectl expose deployment selenium-hub --type=NodePort

Setting up Selenium Nodes

kubectl run selenium-node-chrome --image selenium/node-chrome:3.14.0-gallium --env="HUB_PORT_4444_TCP_ADDR=selenium-hub" --env="HUB_PORT_4444_TCP_PORT=4444"
kubectl scale deployment selenium-node-chrome --replicas=180
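
A note for readers on newer clusters: the commands above rely on `kubectl run` creating a Deployment, which was the behaviour at the time of writing. From kubectl 1.18 onwards `kubectl run` creates a bare Pod instead, so the `expose deployment` and `scale deployment` steps would have nothing to act on. The sketch below shows one possible equivalent using `kubectl create deployment` and `kubectl set env`. It has not been run against the cluster described in this article, it assumes a kubectl recent enough to accept `--port` on `create deployment`, and the `create_grid` function and the `KUBECTL` override are illustrative names only.

```shell
#!/bin/sh
# Sketch: create the hub and Chrome node deployments on newer kubectl versions.
# Set KUBECTL="echo kubectl" to preview the commands without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"
SELENIUM_TAG="3.14.0-gallium"   # image tag used in the article

# create_grid [REPLICAS]: create hub + node deployments, then scale the nodes.
create_grid() {
    grid_replicas="${1:-1}"
    # Hub deployment, exposed as a NodePort service on the declared port.
    $KUBECTL create deployment selenium-hub \
        --image="selenium/hub:$SELENIUM_TAG" --port=4444 || return 1
    $KUBECTL expose deployment selenium-hub --type=NodePort || return 1
    # Chrome node deployment; the env vars tell each node where the hub is.
    $KUBECTL create deployment selenium-node-chrome \
        --image="selenium/node-chrome:$SELENIUM_TAG" || return 1
    $KUBECTL set env deployment/selenium-node-chrome \
        HUB_PORT_4444_TCP_ADDR=selenium-hub HUB_PORT_4444_TCP_PORT=4444 || return 1
    $KUBECTL scale deployment selenium-node-chrome --replicas="$grid_replicas"
}
```

Calling `create_grid 180` would then mirror the four commands above.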

Results

The results of these changes have been very good. We drastically reduced our footprint in the data center while increasing our throughput, thanks to the larger number of Selenium nodes available for test execution. Below is the hardware utilisation before and after the move to Kubernetes. We have significantly lowered the number of nodes, CPU usage and RAM utilisation while doubling our parallel execution capability: on our previous grid we were able to run 90 tests in parallel, and with the new grid we can run 180.

Comparison between old and new selenium cluster

In addition to better resource utilisation, we have saved time on setting up new Selenium nodes and on general administration of the Selenium cluster. The Selenium grid on Kubernetes is more robust than the previous Windows-based grid. If something goes wrong, we can shut the grid down and spin up a new one in a matter of minutes. We have created a simple Jenkins job to restart the grid.

Limitations

We did experience a few limitations which required workarounds. The main issue was finding the sweet spot for the number of Selenium nodes given the resources available on our Kubernetes slaves. We wanted to get as much power out of our cluster as we could, and it took a trial period and a few crashed clusters to figure this out. The next biggest limitation was that Selenium in Docker can be unstable if the containers are left running for an extended period of time. We found that the Selenium hub sometimes crashed without an exit code, which meant the Selenium hub container did not get restarted. To combat this, we added a Jenkins job which restarts the cluster every third day. This took the failure rate to zero.
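
The script behind a restart job of this kind could look roughly like the sketch below. It is not the actual Jenkins job: it assumes the deployment names from the setup commands, restarts the grid by scaling each deployment to zero and back, and uses the illustrative names `restart_grid` and `KUBECTL`.

```shell
#!/bin/sh
# Sketch: restart the whole Selenium grid by scaling deployments down and up.
# Set KUBECTL="echo kubectl" to preview the commands without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"
HUB_DEPLOYMENT="selenium-hub"
NODE_DEPLOYMENT="selenium-node-chrome"

# restart_grid [REPLICAS]: recreate the hub, then bring REPLICAS nodes back.
restart_grid() {
    node_replicas="${1:-180}"
    # Stop the nodes first so none is left registered with the old hub.
    $KUBECTL scale deployment "$NODE_DEPLOYMENT" --replicas=0 || return 1
    # Recreate the hub pod by scaling it to zero and back to one.
    $KUBECTL scale deployment "$HUB_DEPLOYMENT" --replicas=0 || return 1
    $KUBECTL scale deployment "$HUB_DEPLOYMENT" --replicas=1 || return 1
    # Wait until the new hub pod is rolled out before starting nodes.
    $KUBECTL rollout status "deployment/$HUB_DEPLOYMENT" || return 1
    $KUBECTL scale deployment "$NODE_DEPLOYMENT" --replicas="$node_replicas"
}
```

A scheduled job would simply call `restart_grid` at a time when no regression suite is running, since any test in flight is lost when the nodes are scaled to zero.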

What’s next?

The current Selenium grid setup with Kubernetes is now stable, and we will continue to enhance it in the coming months. Some of the key points to work on are:

Add the capability to connect to Selenium workers via VNC to view live execution: this will let anyone watch test cases as they run and will make debugging easier.
Dynamic Selenium grid setup: the end goal is to spin up a new grid on the fly when a regression job is kicked off. Once the job is completed, it will shut down the grid and free up the Kubernetes nodes.
Moving to the cloud: once we implement the dynamic setup mentioned above, the idea is to move this setup to a cloud service such as AWS, taking security concerns into account. This will let us use resources only when needed and free up the VMs when they are not in use, which should eventually reduce our running costs.
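
As a rough illustration of the dynamic grid idea, the sketch below wraps a regression command so that Selenium nodes exist only while it runs. It is one possible shape rather than a finished design: it assumes the hub deployment is already running, it scales only the Chrome node deployment instead of creating a whole new grid, and `run_with_grid` and `KUBECTL` are hypothetical names.

```shell
#!/bin/sh
# Sketch: scale Selenium nodes up for one job, then back to zero afterwards.
# Set KUBECTL="echo kubectl" to preview the commands without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"
NODE_DEPLOYMENT="selenium-node-chrome"

# run_with_grid REPLICAS COMMAND [ARGS...]
# Runs COMMAND with REPLICAS nodes available and returns COMMAND's exit status.
run_with_grid() {
    job_replicas="$1"
    shift
    $KUBECTL scale deployment "$NODE_DEPLOYMENT" --replicas="$job_replicas" || return 1
    # Run the job, remembering its exit status even if it fails.
    job_status=0
    "$@" || job_status=$?
    # Always free the nodes again, whether the job passed or failed.
    $KUBECTL scale deployment "$NODE_DEPLOYMENT" --replicas=0
    return "$job_status"
}
```

A regression job could then be started as `run_with_grid 180 ./run-regression.sh`, where `run-regression.sh` stands in for whatever command launches the test suite.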

The capabilities of Kubernetes are vast, and there are several ways to keep enhancing and optimising deployment environments, not just for a Selenium grid but for other systems as well. Its ease of configuration and natural ability to scale make Kubernetes a great choice for setting up a Selenium grid.