Distributed Automation: How to run 1000 UI Automation Tests in 5 mins

Ambighananthan Ragavan
Expedia Group Technology
9 min read · Jun 15, 2018

Distributed Automation Using SeleniumGridScaler + AWS

Until late 2014, all of our UI automation tests (~9k) ran as part of more than 350 Jenkins jobs, directly on Jenkins executors (i.e., the browsers would open on the executors themselves). This presented us with numerous problems:

  • Each executor can run only a small number of browsers concurrently
  • No way to scale without adding more jobs and executors, and even that wouldn’t come close to truly concurrent execution
  • Stale browsers hanging in executors made the executors unusable
  • More than 350 (and ever-growing) Jenkins jobs just to run UI automation were simply too many!
  • No easy way to analyze test failures quickly

This was choking us as a company: automation jobs were taking hours to run and weren’t helping us move faster. As soon as we realized this, we set ourselves a goal:

To run all UI automated tests within the time taken by the slowest test case.

This is Distributed Automation (DA) using Amazon Web Services (AWS) and SeleniumGridScaler (which is Selenium Grid plus the EC2 API). I presented this implementation at the 2015 Selenium Conference in Portland, OR, but lots of things have changed since then.

We are now running more than 90 SeleniumGridScaler hubs in AWS, owned by various teams that run their tests in their own CI/CD pipelines, executing more than 150,000 tests on a daily basis!

The Need for Speed

Production-ready changes are a must for CI/CD. Whether the changes are verified and validated by automation before or after check-in, the feedback cycle to validate those changes should be super fast, especially when UI tests are involved.

When a project has, for example, over 300 UI tests which are used to validate every code change, those tests might take around ninety minutes to complete without parallelization. This is super slow and not productive at all. An engineer should know whether his or her change broke the product in a few minutes, not in hours.

SeleniumGridScaler (SeleniumGrid + AWS EC2) talks to AWS using the EC2 API and can auto scale the number of nodes running tests. This makes it easy to achieve our goal: Make the time to run all UI automation tests be the time taken by the longest-running test. If there are 300 tests to run and the slowest test takes 3 minutes to complete, then all 300 tests should be completed within 3 minutes.
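As a concrete sketch of that handshake (the servlet path and parameter names below are assumptions based on my recollection of the SeleniumGridScaler admin servlet, so verify them against your build), a run can ask the hub up front for enough capacity for all of its tests:

    # Ask the hub to scale up enough Chrome nodes for 300 concurrent tests.
    # Hub hostname is a placeholder; servlet path and params are assumptions.
    curl "http://da-hub.example.com:4444/grid/admin/AutomationTestRunServlet?uuid=run-$(date +%s)&threadCount=300&browser=chrome"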

The Need for Controlled Cost

When we use our data center machines for running automation, we seldom think about the cost we are incurring, because those machines are just there, and are always ours. But in fact, dedicated data center machines cost a lot in terms of maintenance, OS licenses, staff, electricity, air conditioning, building rent, and so forth. Running in AWS is a paradigm shift in how we think about cost: we are billed per second for all of the AWS resources we provision, whether we actually use them or not. The ability to pay for only what we use is a boon, but if we don’t manage our resources well and fail to terminate or stop them when they are not in use, we are paying Amazon extra money unnecessarily.

SeleniumGridScaler can autoscale when running automated tests, spinning up nodes for tests and terminating them when they are no longer needed. This way, after an automation run completes, all the nodes attached to the hub can be terminated immediately and the hub can be shut down. You pay only for what you use. A stopped AWS instance incurs no cost except for its EBS volumes.

The Need for Autoscaling

With microservices, a product is a combination of hundreds of different apps, each running its own CI/CD pipeline and releasing to production. Each app’s requirements vary: one might need 300 tests to validate a change while another requires just 20. You tell your SeleniumGridScaler hub how many tests are going to run for your app, and it immediately autoscales to the required number of nodes and attaches them to the hub. This flexibility allows different teams to have their own hubs and use them according to their needs. Once a test run completes, the nodes can be terminated and the cycle continues. You pay only for what you use! Everything is automated and no manual intervention is required.

One note of caution: You must scale your test infrastructure appropriately to handle the throughput generated by tests running concurrently. If you have switched to Distributed Automation but your infrastructure hasn’t scaled, then it is not going to be able to handle the throughput, and the result will be lots of flaky tests.

The Need for Persistence

Nothing gives a deeper understanding of a problem than being able to recognize the pattern that underlies it. To recognize a pattern, we need persisted data spanning a reasonable period of time. When running UI tests on Jenkins, there are plugins that show test status trends, but data for individual tests is lost when the jobs are purged, so it cannot be analyzed or processed with other tools. Persisting browser console JavaScript errors, screenshots, HTML reports, automation errors, test scenarios, test environment, Splunk errors, and other artifacts into a lightweight database, and being able to show them as a trend dashboard via an app, helps narrow down issues quickly.

The Distributed Automation system includes a Node.js based dashboard in addition to SeleniumGridScaler. The dashboard app reads automation results from MongoDB and shows various trends over a period of four weeks, which help teams narrow down the root cause of any failure. Any sudden spike of red in the dashboard points directly to the change against which a particular batch of automation ran, making it easier and more straightforward to find the root cause.

Since DA is super fast due to its concurrent nature, the entire automation suite can be run separately for every change. Hence a test failure maps directly to the single change against which the automation was run, which reduces the time to find the cause of the failure.

Persisting automation artifacts also allows us to do a lot of magic, like normalizing errors and grouping them, which can show, for example, that just two errors caused 10 failed test cases. This further reduces failure-analysis time. Persisting also enables the use of machine learning to categorize error types. When we can programmatically classify a failure as an automation issue, flakiness, or a real bug, we can automatically create the appropriate defect in Jira. This is something we are trying to achieve now.
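As a sketch of that error grouping (the database, collection, and field names below are hypothetical; the real schema depends on what your framework persists), a single MongoDB aggregation can count how many failed tests each normalized error caused:

    # Group failed tests by normalized error and count the tests per error.
    # Database, collection, and field names are illustrative only.
    mongo automation --quiet --eval '
      db.test_results.aggregate([
        { $match: { status: "FAILED" } },
        { $group: { _id: "$normalizedError", tests: { $sum: 1 } } },
        { $sort: { tests: -1 } }
      ]).forEach(printjson)
    '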

Distributed Automation Fact Sheet

The Hub

Use c5.large (up to 300 tests in parallel) or c5.2xlarge (300–1000 tests in parallel). Depending on your circumstances, you might do best with a different instance type if you have many more or many fewer tests.

The reason for using different hub instance types is that, when running a high number of tests concurrently, the Hub’s bandwidth will be saturated for a few seconds during the WebDriver creation process. This is expected and is not, by itself, a sign that you should change your instance type. But if the bandwidth used exceeds an instance type’s bandwidth limit, you might have to move up to the next size to fix the problem.

Running the SeleniumGridScaler JAR starts the Hub, which accepts requests to: autoscale nodes for the number of tests; launch new nodes if tests need to run but no nodes are available; terminate nodes when not in use; terminate nodes on demand; and shut down the hub itself on demand.
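For example (a minimal sketch; the JAR name and flags depend on how you build SeleniumGridScaler, and AWS credentials must be available to the process, e.g. via an instance profile):

    # Start the hub as a plain Java process on the default Grid port.
    # The JAR name is a placeholder for your SeleniumGridScaler build.
    java -jar SeleniumGridScaler.jar -role hub -port 4444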

For a small or medium number of tests, a Dockerized Hub works. But for a larger number of tests, a Dockerized Hub can cause stability issues (communication problems between the hub and its nodes), especially when running hundreds of tests concurrently. This is an observation from our own runs: after switching to running the hub as a regular Java process, the issue disappeared. It could be due to Docker being an additional layer on top of the OS, combined with the traffic between the hub and nodes.

For smaller projects, instead of c5.large, you can try t2.medium, which can give burst performance when necessary and is 45% cheaper.

Nodes

Use c5.xlarge. It can run 15 instances of either Chrome or Firefox in parallel. The Hub creates each node based on a property file with details like the node AMI, security group, tags, VPC, subnet, etc., as sketched below. Bootstrap code starts the Selenium process on each node and attaches it to the Hub that created it.
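Such a property file might look like this sketch; the key names are purely illustrative, since the real keys are defined by your SeleniumGridScaler build:

    # node.properties (hypothetical): consumed by the hub when launching
    # nodes. Key names are illustrative and the values are fake.
    node_ami=ami-0123456789abcdef0
    instance_type=c5.xlarge
    security_group=sg-0123456789abcdef0
    vpc=vpc-0123456789abcdef0
    subnet=subnet-0123456789abcdef0
    tag_team=my-team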

OS: Ubuntu, because it supports running the latest Firefox and Chrome (Amazon Linux does not support those browsers).

Why c5.xlarge instead of running one test per node on a smaller instance type? Simply, cost control: one c5.xlarge costs about half as much as the equivalent number of t2.small instances. Also, with c5.xlarge you save 14 IP addresses for every 15 tests.
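As a rough worked example (assuming on-demand prices of about $0.17/hour for c5.xlarge and about $0.023/hour for t2.small, roughly the published us-east-1 rates at the time), 15 concurrent tests cost about $0.17/hour on a single c5.xlarge versus 15 × $0.023 ≈ $0.35/hour on t2.small instances, i.e. close to half.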

t2.small vs c5.xlarge

On an average day, our 90+ hubs create, use, and terminate more than 4,500 nodes on demand as part of the feedback loops of various projects’ CI/CD pipelines.

Subnet

Make sure the Hub and nodes are created in the same subnet to avoid FORWARDING_TO_NODE_FAILED errors in Selenium. Also, the subnet should have enough free IP addresses.
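A quick way to check the remaining capacity (the subnet ID is a placeholder):

    # Show how many free private IPs remain in the shared subnet.
    aws ec2 describe-subnets \
      --subnet-ids subnet-0123456789abcdef0 \
      --query 'Subnets[0].AvailableIpAddressCount'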

Automation Framework

Any framework that wraps Selenium WebDriver works: Watir, Nightwatch.js, ScalaTest, Geb, Java + Selenium, etc.

Ability to run your tests in parallel

Make sure you can run your tests in parallel to make use of concurrent execution provided by the hub.

  • Example:

    mvn test --projects tests/projectname -Denvironment=TEST -DtestGroups=AcceptanceLive -Dbrowser=firefox -Dparallel=true -Dnumthreads=300 -DtestGroupsExclude=tier3

Pay only for what you use

Automate starting the Hub before running a test. (Have each DA Jenkins job attempt to start the hub, assuming it is down.)

Automate on-demand termination of instances after automation is completed.

The Hub can also terminate the nodes based on idle time. This is handy if your test suites run many times an hour, but the on-demand termination shown above is another option.
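Putting those pieces together, a CI job’s shell step might look like the sketch below. The hub address is a placeholder, and the shutdown endpoint is an assumption: the Expedia fork exposes node termination and hub shutdown via a URL (see the repo notes later in this post), but the exact path depends on your build.

    #!/usr/bin/env bash
    set -euo pipefail
    HUB="http://da-hub.example.com:4444"   # placeholder hub address

    # 1. Start the hub if it is down. start_hub.sh is a placeholder for
    #    whatever boots your hub instance.
    if ! curl -sf "${HUB}/grid/api/hub" > /dev/null; then
      ./start_hub.sh
    fi

    # 2. Run the suite in parallel against the hub.
    mvn test --projects tests/projectname -Dparallel=true -Dnumthreads=300

    # 3. Terminate all nodes and shut down the hub on demand.
    #    The endpoint path is an assumption; check your build.
    curl -sf "${HUB}/grid/admin/ShutdownHubServlet" || true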

Cross-browser cloud providers vs. Distributed Automation

Our strategy is to use our internal Distributed Automation solution in our regular pipelines, using Firefox or Chrome. For cross-browser coverage, we run selected tests on cloud-based cross-browser providers to get coverage of OS X Safari, IE11, and mobile browsers, based on our customer usage stats.

Source code repositories

Original SeleniumGridScaler repo

With Expedia-specific changes

  • https://github.com/ambirag/seleniumgridscaler/tree/SeleniumGridScalerExp

Important changes:

  • We run 15 Firefox or Chrome sessions in one box instead of one session per instance.
  • Ability to run multiple Selenium hubs within the same subnet.
  • The Hub can terminate all nodes on demand and stop itself, invoked by calling a URL.
  • The Hub can also terminate a node based on idle time and the per-second billing cycle instead of hourly.

Most of these changes will be available in the master repo soon.

Different Distributed Automation Topologies

Depending on your project size, requirements, and spending limit, you can choose from a number of topologies.

Distributed Automation Dashboard

Persisting automation artifacts helps us narrow down issues very quickly by showing test status trends that span a period of four weeks. Full details in another blog post!

Trend of automation results, for different categories of automated test results (Acceptance/Regression and Stubbed/Un-stubbed)

Having the ability to run any number of UI tests concurrently, on multiple CI/CD pipelines, without any limitation has enabled us to build lots of tools and features around Distributed Automation that increase developer productivity and the quality of our releases. I hope to address those tools and features in future posts.

I presented this topic at the 2015 Selenium Conference in Portland, OR (https://www.youtube.com/watch?v=cbIfU1fvGeo), but lots of things have changed (read: improved) since then!
