Running Cross-Browser Selenium Tests in Parallel using Selenium Grid and Docker Containers
Table of Contents
- Common Misconceptions
- Understanding resource utilization
- Understanding the tools we have for parallelism
- Using Containers for deploying multiple instances of the SUT
Recently, I’ve been getting a lot of questions about running tests in parallel and cross-platform. In many cases, the person asking the question mentions that they want to use Selenium Grid and Docker to achieve this. When I ask some clarifying questions, I often find that the person doesn’t really know what these tools are for, and what simpler alternatives can help them achieve the same goals.
In this (pretty long) article, I’ll try to put some order in the different tools and options, in order to help you make the right decisions. But before I explain things, I want to start with some common misconceptions.
Misconception #1 — Selenium Grid runs Selenium tests in parallel
In fact, the basis for this misconception is another, deeper misconception, which is that Selenium is a testing framework. So let’s make it clear right away: Selenium is not a testing framework. In fact, Selenium doesn’t have any notion of a test! Selenium is just a software library that enables code to interact with the browser. In order to write tests using Selenium, you typically use a real testing framework, like JUnit or TestNG for Java; MSTest, NUnit or xUnit.net for C#; PyTest for Python; etc. Selenium and Selenium Grid have nothing to do with running tests in parallel.
Misconception #2 — You need Selenium Grid in order to run Cross-Browser testing
While Selenium Grid gives you more flexibility in running cross-browser tests in a distributed fashion, you can still run cross-browser tests on a single machine without Selenium Grid, or use another means to distribute the tests across multiple machines (e.g. using Jenkins or Azure DevOps agents), and have each of them run against a different browser.
Misconception #3 — Docker Containers help me run my tests in parallel
Docker Containers (and Kubernetes, AKA K8s) are great technologies. Some people tend to think that because containers are like lightweight VMs, and because they can spawn as many container instances as they like very easily, containers can help them run more tests in parallel on a single host. Obviously, containers don’t add physical hardware resources, and frankly, most of what containers allow you to do, you could do before containers were around, using standard processes. The big advantage of containers is that they help you create isolated and predictable environments, e.g. with regard to the pre-installed applications, existing files and folders, IP ports, etc. But as we’ll see soon, this still doesn’t solve many of the common isolation problems.
Misconception #4 — Docker Containers help me avoid collisions between tests
Well, it’s true that one of the ideas behind containers is to improve isolation between environments. But before we can discuss how containers can help you avoid collisions between tests, we need to distinguish between using containers for the client (which runs the browser and the driver) and using containers for the server.
If you thought about using containers for the client, then it can help you avoid collisions in your tests only if your tests manipulate some resources on the client (local) machine, not directly through Selenium. For example, if tests need to read the content of some downloaded files, then it can help you avoid collisions between tests that try to download and read the same file. But this is a pretty unique scenario and for most other scenarios it won’t help you avoid any collisions whatsoever, because the collisions are mostly related to data that is used by the server.
Using containers for the server can help a bit more, but still, most of the collisions are related to the data which is stored in the database, and is typically left outside of the containers anyway.
Understanding resource utilization
In order to understand what we can really achieve using these technologies and how, it’s important to understand what resources they use and manage.
Let’s start with the resources that the SUT itself uses. A typical Web application consists of a server side, which may consist of multiple (micro-)services, one or more databases, and a web client. Most of the computing resources (i.e. CPU and memory) that the SUT uses are consumed by the server side and the database. As long as we’re using a single, well-known “Test” environment (as most of us do, as opposed to creating a new environment dynamically on every run), and this environment is designed to support many concurrent users, then these resources have no direct effect on our ability to run tests in parallel efficiently.
Another process that we have in our Selenium tests, which takes its portion of resources from the client machine, is the driver of the corresponding browser (e.g. ChromeDriver.exe, GeckoDriver.exe, etc.). Like the rest of the client side, and even more so, the resources it takes are pretty negligible.
And then there are the tests themselves. The process that runs the tests is typically a unit-testing framework, like JUnit, MSTest, etc., which loads your tests along with the appropriate language binding of Selenium WebDriver. While most tests mainly just communicate with the browser through the WebDriver, they also consume their share of compute resources. Figure 1 shows the round-trip that most operations the test performs (e.g. clicking a button) go through in a standard configuration.
In most cases, most of the time a test run takes is spent on processing at the server side and on the processing and disk access of the database. Often, most of the time the other players in the flow take is spent just waiting for the server and the database. In addition, a significant part of the time is taken by the communication overhead between the different processes. This overhead is even more significant when the processes reside on different machines, and more so if the machines are far away from each other (e.g. in the cloud).
Understanding the tools we have for parallelism
Now let’s try to clarify the effect that each of these tools has on the resources and on our ability to run tests in parallel.
Splitting the Tests between Threads
Most unit-testing frameworks have an option to split tests to run on different threads (refer to the documentation of your specific framework for more details). At the lowest level, if we run the tests using multiple threads, we take advantage of more cores on the machine that runs the tests. Even with a single core, when one test waits for a response from the WebDriver (which in turn waits for a response from the browser, which in turn waits for a response from the server…), another test can use the CPU at the same time, making the entire test run complete sooner.
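To make this concrete, here is a minimal Python sketch of running tests on a thread pool. The `run_test` function is a made-up stand-in that simulates a test spending most of its time waiting (as Selenium tests do); four such tests complete in roughly the time of one:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_test(name):
    # Simulates a test that mostly waits on the browser/server
    # (a stand-in for Selenium round-trips).
    time.sleep(0.2)
    return f"{name}: passed"

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_test, [f"test_{i}" for i in range(4)]))
elapsed = time.time() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")  # roughly 0.2s instead of ~0.8s sequentially
```

While the threads mostly wait, the operating system overlaps their idle time, which is exactly why I/O-bound Selenium tests benefit from threads even on a single core.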
However, because threads run in the same process and share the same memory space, you must make sure that your tests manage access to any shared memory resource. In practice, it usually means that you should avoid using static fields, including singletons, in your test code. If you didn’t plan your tests to support this, then chances are that it will be very difficult to retrofit this ability into your tests, and if you try to run your tests on different threads, you’ll most likely end up with race conditions and very unstable tests. Even if you don’t use static fields, if you didn’t design your tests to run in parallel, the tests may use shared data in the application (e.g. use the same user), which may cause them to interfere with each other.
Another thing to note, given that your tests are designed to avoid any shared state, is that every thread (and every test) should create its own instance of the Selenium driver, which in turn spawns its own instance of the corresponding WebDriver process and a new instance of a browser. However, these browsers will typically communicate with the same server and database, creating more load on that server and database. Figure 2 depicts the configuration of running tests on multiple threads.
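The per-thread driver idea can be sketched like this (a plain object stands in for a real WebDriver instance, and the helper names are made up): each thread lazily creates and reuses its own “driver” via thread-local storage, so no static field is shared between tests.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread-local storage: each thread sees its own `driver` attribute,
# avoiding the shared-static-field problem described above.
_local = threading.local()

def get_driver():
    # In real tests this would create a new Selenium WebDriver instance;
    # here a plain object() stands in for it.
    if not hasattr(_local, "driver"):
        _local.driver = object()
    return _local.driver

barrier = threading.Barrier(3)

def run_test(_):
    barrier.wait()  # force all three threads to run concurrently
    return id(get_driver())

with ThreadPoolExecutor(max_workers=3) as pool:
    driver_ids = set(pool.map(run_test, range(3)))

print(len(driver_ids))  # three threads, three distinct "drivers"
```

The barrier is only there to make the demonstration deterministic; the essential part is that `get_driver()` never hands one thread another thread’s instance.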
Splitting the Tests between Processes
Like threads, processes allow you to take advantage of more cores and utilize each core better. Unlike threads, processes don’t share memory, so you don’t have to worry about static fields. However, if your tests weren’t designed to run in parallel in terms of the data that the application uses, then you may still have race conditions. For example, if one test adds a product to a shopping cart and another test removes it, then these tests might interfere with one another if they share the same cart.
Note that some frameworks let you split the tests between different processes pretty seamlessly. These frameworks do this by using yet another process that synchronizes and manages the actual test processes. Other frameworks don’t have a means to split the tests between processes, and you need to manage it yourself. For example, you can invoke the test executable twice, providing a different list of tests to run to each instance.
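As an illustration of managing the split yourself, here is a small Python sketch that divides a test list into per-process batches and builds the command line each process would be invoked with (the `run-tests` executable and its `--tests` flag are hypothetical, for illustration only):

```python
# Hypothetical sketch: split a test list into batches, one per process,
# and build the command line each process would run.
def split_tests(tests, num_processes):
    # Round-robin split keeps the batches evenly sized.
    return [tests[i::num_processes] for i in range(num_processes)]

all_tests = ["test_login", "test_search", "test_cart", "test_checkout"]
batches = split_tests(all_tests, 2)

# Each command would be launched as a separate OS process.
commands = [f"run-tests --tests {','.join(batch)}" for batch in batches]
for cmd in commands:
    print(cmd)
```

Each command runs in its own process with its own memory space, which is precisely what sidesteps the static-field concerns of the threading approach.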
In terms of resources, because processes don’t share memory, running multiple processes takes up more memory than running the tests on different threads of a single process. But unless you’re very limited in memory, this typically shouldn’t bother you much. In most other aspects, it’s pretty similar to using different threads. Figure 3 depicts the configuration of running tests using multiple processes.
Splitting the Tests between Physical Machines
If you have multiple physical machines, then by definition you have more compute resources. While this won’t affect the time your tests wait for the server side to do its work, and may even incur some communication overhead, the test, driver, and browser (client) side can take advantage of these added resources.
This is usually achieved by some kind of collaboration between the unit-testing framework and the agents of the appropriate build system, e.g. Jenkins, Azure DevOps (formerly VSTS), etc. This way the tests are split pretty evenly between the different agents. Alternatively, if you want to run all the tests on different browsers, you may choose to install a different browser and driver on each machine, and simply run all the tests on each agent with a different setting that tells it which driver to use.
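A minimal sketch of the “different setting per agent” approach, assuming each agent exports a `BROWSER` environment variable that the test run reads at startup (the variable name and helper function are made up for illustration):

```python
import os

# Hypothetical sketch: each build agent sets a BROWSER environment
# variable, and the test run reads it to decide which driver to create.
SUPPORTED_BROWSERS = {"chrome", "firefox", "edge"}

def browser_from_env(default="chrome"):
    browser = os.environ.get("BROWSER", default).lower()
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"Unsupported browser: {browser}")
    return browser

# In a real pipeline, the agent sets this before launching the tests.
os.environ["BROWSER"] = "firefox"
print(browser_from_env())  # → firefox
```

The same test binaries then run unmodified on every agent; only the environment differs.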
An important thing to understand about this method is that the binaries of your test code are copied by the build system to the appropriate agent, and then run on that agent. This means that if the tests have dependencies on external libraries, files, or anything specific that should be installed on the machine for the tests to run correctly, these should either be installed there in advance or be copied by the build process. In particular, the appropriate browsers and their corresponding Web Drivers should be on the agents in order for the tests to run correctly. Figure 4 depicts the configuration of running Selenium tests on multiple machines.
Splitting the Tests between Virtual Machines
Running tests on multiple VMs is technically identical to running them on different physical machines. The advantage of VMs is that they’re much more flexible to manage and maintain. But obviously, multiple VMs share the same hardware resources of their host. If your company has its own data center, it probably has many hosts and tries to optimize their resource utilization in the most efficient way, so the details are probably transparent to you, and you can treat the VMs just like physical machines. However, because of that flexibility, you can consider requesting more compute resources from your admins instead of requesting new VMs. There are pros and cons to doing that, but it may be simpler for you to manage one strong test VM that runs all the tests rather than many VMs that split the tests between them. The communication overhead would also be reduced. On the other hand, this big VM will always reside on a single host, which may limit the optimizations that the admins can do to better utilize the hardware resources.
VMs can also run on regular PCs or servers, e.g. with VirtualBox, or with Hyper-V on a Windows Server. This option may have an advantage over splitting the tests using mere processes on the host only in terms of isolation; in terms of resources, it will only consume more, due to the overhead of the VMs, rather than save you anything. Figure 5 depicts the configuration of running tests on 2 VMs hosted on the same physical machine.
Using VMs (or Agents) in the Cloud
Today, most companies use one of the cloud vendors instead of managing their own data centers. In the cloud you can lease compute resources in the form of VMs, and from an operational standpoint it’s pretty much the same as using VMs. Typically the cloud vendor charges you per VM, and the more compute resources you request for it, the more you pay. The main advantages of the cloud are that the cloud vendor takes care of the maintenance of the hardware resources, and that you can easily expand or shrink the amount of resources you consume.
One more aspect to consider when using the cloud is that the network latency is usually higher than when running everything locally. Figure 6 depicts the configuration of running 2 test VMs in the cloud.
Using Selenium Grid
As we saw, Selenium Grid isn’t required in order to run tests in parallel, nor for cross-browser testing. When you’re using Selenium Grid, the tests run on one machine (or more) and the drivers and browsers run on another set of machines. Obviously, these machines may be either physical or virtual (VMs), and even containers, as we’ll discuss shortly. The important thing to remember is that your tests don’t run on the same machine where the browser runs. This means that each call to Selenium (e.g. click, sendKeys, etc.) takes a round-trip between the machine that runs the test and the machine that runs the browser. As always, premature optimization is the root of all evil, but because these round-trips occur very frequently, there’s a high chance that they will impact the performance of your tests significantly.
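To see what travels over the wire, here is a sketch (pure standard library, with a hypothetical hub address) of the W3C WebDriver new-session payload that a client posts to a Selenium Grid hub; in practice, your language binding builds and sends this for you:

```python
import json

HUB_URL = "http://selenium-hub.example.com:4444/wd/hub"  # hypothetical address

def new_session_payload(browser_name):
    # W3C WebDriver new-session body; the Grid hub uses browserName
    # to route the session to a matching node.
    return json.dumps({
        "capabilities": {"alwaysMatch": {"browserName": browser_name}}
    })

payload = new_session_payload("firefox")
print(payload)
# A real client would POST this to f"{HUB_URL}/session", and every
# subsequent command (click, sendKeys, ...) makes a similar HTTP round-trip.
```

Each of those HTTP round-trips is exactly the per-call overhead discussed above, which is why distance between the test machine and the browser machine matters.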
Selenium Grid’s main role is to act as a router and a resource manager between the tests and the available nodes that run the browsers. For example, you can configure 3 nodes running Chrome and 2 running Firefox. When a test requests a Firefox driver, Selenium Grid routes the test to one of the Firefox nodes, and associates that node with that test (in fact, it associates the node with an instance of WebDriver and not with the test itself, as Selenium doesn’t have a notion of a test). If another test requests another Firefox instance while the first one is still running, then Selenium Grid will reserve the 2nd Firefox node for the new test. When the first test completes and closes the driver, it releases that node so it’s available to serve yet another test.
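The reserve-and-release behavior described above can be sketched as a toy resource manager (this is an illustration of the idea, not Grid’s actual implementation):

```python
# A toy sketch of Grid-style routing: a pool of free nodes per browser,
# reserved per session and released when the driver quits.
class MiniGrid:
    def __init__(self, nodes):
        # nodes: e.g. {"chrome": 3, "firefox": 2}
        self.free = {browser: list(range(n)) for browser, n in nodes.items()}
        self.sessions = {}

    def new_session(self, session_id, browser):
        if not self.free.get(browser):
            raise RuntimeError(f"no free {browser} node")
        node = self.free[browser].pop()      # reserve a node
        self.sessions[session_id] = (browser, node)
        return node

    def quit(self, session_id):
        browser, node = self.sessions.pop(session_id)
        self.free[browser].append(node)      # release it for reuse

grid = MiniGrid({"chrome": 3, "firefox": 2})
a = grid.new_session("test-a", "firefox")
b = grid.new_session("test-b", "firefox")   # the 2nd Firefox node is reserved
grid.quit("test-a")                          # node released...
c = grid.new_session("test-c", "firefox")   # ...and reused by the next test
print(a, b, c)
```

Note how, just like in Grid, a session (not a test) is the unit of reservation, and a released node can immediately serve the next request.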
Note that if you use Selenium Grid but all of the tests run from a single thread, then only one browser will be used at a time, making Selenium Grid pretty much redundant. As mentioned above: Selenium Grid by itself won’t make your tests run in parallel. Figure 7 depicts a configuration employing Selenium Grid with a single thread of tests.
However, Selenium Grid provides some important benefits, at least in some scenarios:
- If the browser is your bottleneck, then you can have multiple machines running either the same browser or different browsers. However, you’ll still have to run your tests in parallel using one of the above mentioned methods in order to take advantage of it.
- Most browsers don’t let you install different versions of them on the same machine. If you want to test your code on different versions of the same browser, then you must install them on different machines anyway. Using Selenium Grid allows you to keep the tests on one machine and direct the Selenium calls to the appropriate machine. This may be useful even if you don’t run the tests in parallel.
- Sometimes you want to test the application not just on different browsers, but also on different operating systems (e.g., Windows, Linux, macOS) or different versions of the same operating system (Windows 7, Windows 10, etc.). Selenium Grid can manage all of these machines and let you direct your Selenium calls to the appropriate machine.
- Sometimes you have multiple teams that want to run cross-browser tests, and while each team has its own set of tests that run from a different machine, the pool of machines running the browsers needs to be shared between them, maybe due to security and IT maintenance considerations. Selenium Grid can route the traffic between the clients (test machines) and the appropriate nodes (machines that run the browsers), and manage the availability of the nodes.
Figure 8 depicts a configuration of multiple test machines using Selenium Grid to manage a pool of Selenium nodes. Of course, configurations with 2 test processes, or even 2 threads, on the same machine are also possible.
Browser cloud providers
There are companies, like BrowserStack and Sauce Labs to name a few, that hold a massive pool of Selenium Grid nodes, allowing you to rent these nodes in order to run cross-browser tests on them. The concept is almost identical to the use of Selenium Grid, with the advantage that you don’t have to manage and maintain the machines, their resources, updates, etc., and of course it’s much more scalable.
Note that the server of the application doesn’t have to run in the cloud. Either way, keep in mind that every call to Selenium makes a round-trip from the test machine to the cloud, from there to the server, and all the way back to the test.
Figure 9 depicts the configuration of using a browser cloud provider.
Containers
The last technology, without which we can’t complete this discussion, is containers. By far the most popular and significant tool in this space is Docker. If you’re not familiar with containers, here’s a short description:
You can think of containers as something between a process and a very lightweight VM. Containers apply virtualization capabilities of the hardware and the host OS in a way that resembles VMs, but with one main difference: while VMs run a full-blown OS (the guest OS), which can be completely different from the host OS (for example, you can run a Windows 7 VM on a Linux host), containers share the OS of their host and only virtualize resources like the file system, processes, and networking, which makes them more isolated and reproducible than simple processes. Note that Windows Server is able to run Linux containers (by running a Linux subsystem inside Windows), but at least currently, it’s not possible to run Windows containers on Linux. The main advantages of containers are:
- They’re very lightweight. While an image of a full VM is very big because it has to contain the entire guest OS, an image of a container contains only the process or processes that we want to run, and maybe some files that we need the contained application to access.
- They can be started almost instantaneously. While a VM can take pretty long to be “turned on”, containers start almost immediately.
- It’s easy to create multiple instances of the same container.
- A container can easily be transferred from one host to another, as long as the hosts run the same OS.
- In Docker, it’s easy to configure how to compose the content of the container (files, processes, network resources, etc.) using a simple, textual configuration file that you can keep and manage inside your source-control repository. You can also compose images based on other images, and only specify the differences.
Regarding their usage for parallel execution of cross-browser tests, containers can be used almost anywhere that a VM or a physical machine can be used. However, keep in mind these 2 things:
- Except for the case of running a Linux container on Windows, the container’s OS must be the same as the host OS. This means that containers are not appropriate if you want to run cross-platform tests. For example, you can’t use them to test that your application runs in Chrome both on Windows 10 and on Windows 7.
- As long as the containers are hosted inside the same physical machine, using them to run tests in parallel won’t make your tests run any faster compared to running them using different processes or threads, because the hardware resources are still the same.
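As an illustration of how a containerized Selenium Grid is typically described, here is a sketch of a `docker-compose.yml` based on the official `selenium/*` images (image tags, environment variables, and ports vary between Grid versions, so treat this as a starting point rather than a definitive configuration):

```yaml
version: "3"
services:
  selenium-hub:
    image: selenium/hub
    ports:
      - "4444:4444"      # tests connect to http://<host>:4444/wd/hub
  chrome:
    image: selenium/node-chrome
    depends_on:
      - selenium-hub
    environment:
      - HUB_HOST=selenium-hub
  firefox:
    image: selenium/node-firefox
    depends_on:
      - selenium-hub
    environment:
      - HUB_HOST=selenium-hub
```

A single textual file like this, kept in source control, is exactly the reproducibility benefit described above; but remember that scaling the node services on one host doesn’t add any hardware resources.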
Container orchestration tools (Kubernetes)
If you want to deploy and manage a complex environment that consists of different machines and services, you’d probably need a tool like Kubernetes to help you do that. For example, you can create and duplicate complete environments that contain a complete Selenium Grid lab, including all of the nodes, and even the server of the application under test. However, while there are valid use cases for this scenario, in most cases the true benefit of container orchestration is in the deployment of a micro-services oriented application, and less so in just running cross-browser tests in parallel.
Using Containers for deploying multiple instances of the SUT
As mentioned in the section about resource utilization above, in many cases the bottleneck for running tests in parallel is the application itself. Many companies employ a limited number of environments to which they deploy the application, in a pipeline manner. These are typically named Dev, Test, Staging and Production (with some possible variations). Typically the automated tests run against the Test environment.
In order to break this bottleneck, it makes sense to duplicate the entire “Test” environment and create multiple instances of it. Docker containers and Kubernetes, driven through the CI/CD pipeline, are a natural fit for the job. Figure 10 depicts a configuration of 2 test machines running tests against 2 separate servers, each residing in its own container, but both hosted on the same physical machine.
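A sketch of what one such self-contained “Test” environment might look like as a `docker-compose.yml` (the application image name is hypothetical); running several copies is then just a matter of starting the same file under different project names (e.g. `docker-compose -p test-env-1 up`, `docker-compose -p test-env-2 up`):

```yaml
# One self-contained "Test" environment: the application plus its own
# database, with nothing shared between copies.
version: "3"
services:
  app:
    image: mycompany/webapp:latest   # hypothetical application image
    ports:
      - "8080"                        # let Docker pick a free host port
    depends_on:
      - db
  db:
    image: postgres:13
    environment:
      - POSTGRES_PASSWORD=test
```

Because each copy gets its own database service, the environments are isolated from one another by construction, which leads directly to the database discussion below.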
Breaking the database bottleneck
Duplicating the application servers and micro-services is usually the easy part, and sometimes it may remove your main bottleneck. However, often the real bottleneck is the database. Also, besides performance, when a shared database serves multiple tests running simultaneously, it’s hard to ensure that tests don’t interfere with one another, which hurts their reliability and consistency. Moreover, especially with relational databases, where the structure of the tables is fixed, a shared database prevents you from testing different versions (or branches) of the application simultaneously. But if you have a separate database for each environment, then you can more easily experiment with new features without affecting all other test runs.
Note that in production, it’s typical to have multiple application servers behind a network load balancer (NLB) to improve performance, while there’s usually only one instance of the database server, because the data itself should be shared. But functional tests are better off isolated rather than sharing data between them, so for that reason, and also to improve the performance of the test runs, it often makes sense to have a separate database for each instance of the “Test” environment.
Creating multiple Test databases may sound infeasible to you, because they’re too big. But keep in mind that most of your tests don’t really need all the data that your Test database contains. It suffices to have the minimal set of data that your tests really need, or even to start with an empty database and let the tests insert the data they need. This also makes the tests more self-contained and easier to maintain.
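The “start empty and let each test insert what it needs” approach can be sketched like this (SQLite is used here only for brevity; the table and helper names are made up):

```python
import sqlite3

# Hypothetical sketch: instead of copying a huge shared Test database,
# each test run starts from an empty database and seeds only the data
# it actually needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")

def seed_for_test(products):
    # Insert just the rows this specific test depends on.
    conn.executemany("INSERT INTO products (name) VALUES (?)",
                     [(p,) for p in products])
    conn.commit()

seed_for_test(["keyboard", "mouse"])  # only what this test needs
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(count)
```

Because each environment seeds its own data, tests neither depend on nor interfere with data created by other test runs.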
Figure 11 depicts 2 completely separate test environments, each belonging to its own Kubernetes cluster and containing 3 containers: the Test container, holding the test, driver and browser processes; another container for the server; and a third container for the database. The physical host of each of these containers is insignificant, as you can easily port it from one host to another. Note that this configuration doesn’t make use of Selenium Grid, because each cluster can have a different browser to start with, and you don’t need a mechanism to route between the test and the browser (though there can be configurations in which it does make sense).
As you can see, there are many ways to run tests in parallel and to test your application across multiple browsers and platforms. Each has its pros and cons, and it’s important to understand what you really need. The new and shiny tools, like Docker and Kubernetes, are great, but use them only if they solve your problem. I recommend that you start by stating the problem you’re trying to solve, and then look for the simplest solution that solves it. You can always change and improve it later if you need to.