Do you want to speed up your integration tests by 10x?

EG Tech
Expedia Group Technology
6 min read · Aug 19, 2015

By Yani Zhang and Joan Gamell

Everyone knows how important it is to have integration tests that verify the operation of your entire software stack as part of your development process. Our team builds an overlay that makes booking and cross-selling other products extremely easy; partners and other internal teams inject this overlay into their confirmation pages via JavaScript. In the beginning it was simple, supporting just hotels on a single partner site, but it has grown to encompass several lines of business and seamlessly integrates with just about any page.

As our project grew, so did our integration test suite; run times that started at the 10 minute mark stretched to over an hour. To maintain the velocity and test coverage we wanted, we needed to make the tests run much faster. Speeding up an individual test makes the suite faster, but as new tests are added, total execution time will always trend upwards. To get the gains we wanted and keep them over time, parallelization seemed like the best answer. It would allow us to increase the team’s productivity, improve code quality, and potentially save costs in AWS. Today we’ll share our experience parallelizing our test suite. The general technique can be applied to just about any long-running task that consists of many smaller tasks. Below you can see the huge difference in execution time before and after we adopted parallelization:

[Chart: integration test execution times before and after parallelization]

The problem

Our team uses an extensive build pipeline to help us ensure we are always close to a shippable build. Every time we check in code to our master branch:

  • a build is produced
  • Jenkins (our continuous integration server) creates a new stack on AWS and deploys the new build there
  • Jenkins runs the integration test suite against the new build
  • if the tests are green, the stack is released to testers as the new test environment and the previous version is deleted.

With only a few feature tests at the beginning of the project, we didn’t worry much about test execution time. As the project grew we reached almost 30 feature files (each with several scenarios), and build/deployment times had climbed to 45–50 minutes.

We’ve learned over time that long test suite execution times encourage developers to skip running the tests before committing and rely on the pipeline to catch errors. As a result, not only were build times climbing, but builds were broken increasingly often. We eventually reached the (really bad) point where our CI pipeline stayed broken for so long that we started to ignore its results.

Let’s fix it

This situation was far from where we wanted to be, so it was time to fix it. Should we start from scratch and look for a better/faster framework to run our tests (we currently use CasperJS, which runs on PhantomJS rather than Node), or should we try to optimize the current solution?

Vivian, a developer on the team, observed that the CPUs on the test nodes were significantly underused, so she suggested executing the test runners in parallel. We started by building a Node.js script called test-manager.js to handle all the parallelization logic.

Our first idea was to use web workers, often described as “multithreading done right” for JavaScript, with each feature file running in a different thread (web worker) in the same process. It was a good idea in theory, but in practice we ran into problems instantiating separate processes from inside a web worker, as casper.js is a standalone binary.

Spawn all the tests!

Given that the hip web worker solution failed, we decided to go old school and implement the parallelization with our old friend fork(). Fork creates child processes that run independently and report their status back to the parent when they finish executing.

test-manager.js uses Node’s child_process module to interact with the OS and spawn a child process for every test, each of which is nothing more than a terminal command:

[javascript]
// Run a single test as a child process; `command` is the terminal
// command for that test and PROCESS_OPTIONS configures exec
var exec = require('child_process').exec;
var child = exec(command, PROCESS_OPTIONS);
[/javascript]

This approach lets us execute any commands we want with the script, making it totally technology-agnostic so any other team can use it (as long as you run on a UNIX platform :).
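We haven’t shown what goes into PROCESS_OPTIONS above; it is just the standard options object accepted by child_process.exec. A plausible configuration might look like the following (the values are illustrative, not our exact production settings):

[javascript]
// A possible PROCESS_OPTIONS; all values here are illustrative
var PROCESS_OPTIONS = {
  cwd: __dirname,           // directory each test command runs from
  env: process.env,         // inherit env vars (e.g. the test stack URL)
  timeout: 10 * 60 * 1000,  // kill any test stuck for over 10 minutes
  maxBuffer: 1024 * 1024    // allow up to 1 MB of output per process
};
[/javascript]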

For each child process we spawn, we assign a specific color to its logs to distinguish them in the console (see the screenshot below; colors are provided by the “colors” npm package), and then we push the command into the running array where we keep track of all running child processes.
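A minimal sketch of that wiring, assuming the “colors” package (which adds color getters to String.prototype); the COLOR_POOL name and the output prefixing are our illustration, not the exact production code:

[javascript]
var colors = require('colors'); // adds .cyan, .green, etc. to strings
var exec = require('child_process').exec;

var COLOR_POOL = ['cyan', 'green', 'yellow', 'magenta', 'blue'];
var running = []; // child processes currently executing

function spawnTest(command, index) {
  var color = COLOR_POOL[index % COLOR_POOL.length];
  var child = exec(command, PROCESS_OPTIONS);
  // Prefix and colorize each chunk of output so interleaved logs stay readable
  child.stdout.on('data', function (data) {
    process.stdout.write(('[' + index + '] ' + data)[color]);
  });
  running.push(child);
  return child;
}
[/javascript]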

[Screenshot: test-manager console output, with each child process’s logs in a different color]

When a child process fails abnormally or exits with a non-zero code, we print the error to the console and mark the whole integration test suite as failed. The test manager is also configurable: you can decide how many processes run at the same time and which commands to run.
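Building on the sketch above, that failure handling can hang off the child’s exit event (suiteFailed is a hypothetical name):

[javascript]
var suiteFailed = false; // hypothetical flag for the overall suite result

child.on('exit', function (code) {
  if (code !== 0) {
    suiteFailed = true;
    console.error(('FAILED (exit code ' + code + '): ' + command).red);
  }
});
[/javascript]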

In the following diagram we explain how the script manages the different running processes:

[Diagram: the test manager moving commands from the waiting queue into the running array]

The script pushes the first 6 commands into the running array, and a run loop executes constantly, moving unexecuted commands into the running array as slots free up. For each command in the running array the test manager spawns a child process to execute it.

Once a child process finishes, we remove its command from the running array and print its result report on the console. We repeat this until everything has run (i.e. both the waiting queue and the running array are empty).
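Putting the pieces together, the whole loop fits in a few lines. This is a minimal sketch of the idea; MAX_PARALLEL, waiting, and buildCommandList are illustrative names, and the real test-manager.js differs in the details:

[javascript]
var exec = require('child_process').exec;

var MAX_PARALLEL = 6;
var waiting = buildCommandList(); // queue of commands still to run
var running = [];
var failed = false;

function buildCommandList() {
  // Hypothetical: in our case, one casperjs invocation per feature file
  return ['casperjs test features/booking.feature',
          'casperjs test features/crosssell.feature'];
}

function runNext() {
  // Keep the running array topped up while commands are waiting
  while (running.length < MAX_PARALLEL && waiting.length > 0) {
    start(waiting.shift());
  }
  if (running.length === 0 && waiting.length === 0) {
    process.exit(failed ? 1 : 0); // all done: report the overall result
  }
}

function start(command) {
  var child = exec(command, function (err) {
    if (err) { failed = true; } // non-zero exit code marks the suite failed
    running.splice(running.indexOf(child), 1);
    console.log('Finished: ' + command);
    runNext(); // fill the slot this process just freed
  });
  running.push(child);
}

runNext();
[/javascript]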

Challenges

We thought our integration tests were airtight, independent, and ready to run in any order or in isolation. Whoops. A couple of runs in parallel were enough to uncover a number of issues in our tests:

  • Dependencies: Some features were partially dependent on other features being executed first.
  • Fooled by performance: Some of our features worked solely because they were executed quickly, i.e. “click button A on page Y” worked only because the action completed in under 250ms. That was no longer the case with the tests running in parallel, since individual execution times had climbed a bit (see the sketch after this list).
  • Global state: Some tests relied on global state (e.g. a global variable or a text file) to store information.
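For the timing issue, the natural fix in CasperJS is to wait explicitly for the element rather than assuming the page settles quickly; a sketch (the selector and timeout are illustrative):

[javascript]
// Wait for the element instead of clicking immediately after page load
casper.waitForSelector('#button-a', function () {
  this.click('#button-a');
}, function onTimeout() {
  this.test.fail('Button A never appeared on page Y');
}, 5000); // a generous timeout now that tests run in parallel
[/javascript]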

Once we realized these dependencies were there, it was fairly straightforward to find the places that shared global state or to expand the scenarios to remove the dependencies.
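For the global-state issue, one illustrative fix is to give each child process its own scratch file instead of a single shared file (the naming scheme here is hypothetical):

[javascript]
var path = require('path');
var os = require('os');

// Each process gets its own state file, keyed by its PID, so parallel
// runs can no longer clobber each other's data
var scratchFile = path.join(os.tmpdir(), 'test-state-' + process.pid + '.txt');
[/javascript]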

Results & Benefits

The results of this parallelization have been very promising:

  • 10x faster test execution (total execution time is now 5.5 minutes in the AWS test environment and 1.5 minutes on a local development sandbox).
  • AWS cost savings, as the current and candidate test deployment stacks now run side by side for much less time
  • Better code quality, since developers can now run the tests much more often during their development cycle
  • Improved test code quality, because parallelization forces you to fix common hidden bugs in your test code that would otherwise surface much more rarely
  • The faster feedback cycle lets us fix problems much faster and keeps the CI pipeline green much more often than before.
  • Improved team productivity as a result of all of the above. Before, we had around 1–2 test suite runs per day; now we average between 3 and 4.

How can your team benefit from all this?

The test script is not that difficult to write, and it can run virtually any test or task you need in parallel (Cucumber, Jasmine, Selenium, etc.). You can download our sample test-manager.js (it is provided only as a guideline to get you started, not to be used verbatim; usual disclaimers apply).

There are only a few requirements for you to use the script:

  • It needs a UNIX platform to run on (Mac is fine) with node & npm installed.
  • You need to decide how to split your tests and make minimal changes to the script to feed it the right commands. We decided to split at the feature level (see the sketch after this list), but that is completely subjective and will depend on your test architecture.
  • Your tests must return standard UNIX exit codes (0 for success, non-zero for error).
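For instance, splitting at the feature level can be as simple as turning each feature file into one command (the paths and the casperjs invocation here are assumptions; adapt them to your layout):

[javascript]
var fs = require('fs');

// Turn every .feature file into one runnable command for the test manager
var commands = fs.readdirSync('features')
  .filter(function (file) { return /\.feature$/.test(file); })
  .map(function (file) { return 'casperjs test features/' + file; });
[/javascript]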

Future improvement

With the current implementation, the exact test failure is still too hard to locate among all the other processes’ logs. The first slated improvement is to collect the logs for each feature file in a buffer and only print them out when the whole task finishes. This will give us contiguous blocks of log output belonging to the same process, for easier debugging.
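In sketch form, that improvement could look like this (startBuffered is a hypothetical name; this is not implemented yet):

[javascript]
var exec = require('child_process').exec;

function startBuffered(command) {
  var buffer = '';
  var child = exec(command);
  // Accumulate output instead of printing it as it arrives...
  child.stdout.on('data', function (data) { buffer += data; });
  child.stderr.on('data', function (data) { buffer += data; });
  // ...and flush it as one contiguous block when the process exits
  child.on('exit', function (code) {
    console.log('===== ' + command + ' (exit code ' + code + ') =====');
    console.log(buffer);
  });
}
[/javascript]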
