JavaScript Test-Runners Benchmark

(Part 1, The Unit Testing)

Photo by Braden Collum

Performance is an important criterion when choosing a test-runner. Tests should run as fast as possible to detect errors earlier, improve the developer experience and reduce CI server time. In this story I will compare the most popular JavaScript test-runners on the same set of unit tests and find the winners.

Candidates

I will not go deep into the specific features of the test-runners. Each one has many pros and cons. For a detailed comparison of features I can suggest the awesome Overview of JavaScript Testing in 2017 by Vitalik Zaidman. Here I will talk only about cold-run execution time. I will use all test-runners in their out-of-the-box setup and toggle only a few common options that I will describe later.

So, please welcome the candidates for today’s performance competition:

  • mocha
  • mocha.parallel
  • mocha-parallel-tests
  • jasmine
  • tape
  • qunit
  • lab
  • tap
  • jest
  • ava

If your favorite test-runner is missing here, please feel free to mention it in the comments.

Preparation

Before firing the starter pistol, let’s discuss the rules. To make the comparison fair I should apply each runner to the same set of tests with the same running options. But that is not directly possible, because each runner has its own test format and its own running options. Therefore I will perform some unification.

Generating test-files

For the tests themselves I’ve defined 3 common logical parameters:

  • synchronous / asynchronous (constant delay, random delay)
  • with nested suites / without nested suites
  • with before|after|beforeEach|afterEach hooks / without hooks

Every combination of these parameters is used to generate test-files. For example, synchronous tests without nested suites and hooks look like this:

describe('suite 0', function () {
  it('test 0 0', function () {});
  it('test 0 1', function () {});
  it('test 0 2', function () {});
  it('test 0 3', function () {});
  it('test 0 4', function () {});
});

and asynchronous tests with random delay look like this:

describe('suite 0', function () {
  it('test 0 0', function (done) { setTimeout(done, 3); });
  it('test 0 1', function (done) { setTimeout(done, 0); });
  it('test 0 2', function (done) { setTimeout(done, 8); });
  it('test 0 3', function (done) { setTimeout(done, 2); });
  it('test 0 4', function (done) { setTimeout(done, 5); });
});

None of the tests perform any assertions. This makes it possible to measure the overhead of the test-runner itself.
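To make the parameter space concrete, here is a minimal sketch that enumerates every combination of the three parameters listed above. The names `modes`, `flags` and `listCombinations` are my own assumptions for illustration and are not taken from the benchmark code.

```javascript
// Hypothetical parameter values mirroring the three bullets above.
const modes = ['sync', 'async-constant-delay', 'async-random-delay'];
const flags = [false, true];

// Build the cartesian product of all parameter values.
function listCombinations() {
  const combinations = [];
  for (const mode of modes) {
    for (const nestedSuites of flags) {
      for (const hooks of flags) {
        combinations.push({ mode, nestedSuites, hooks });
      }
    }
  }
  return combinations;
}

console.log(listCombinations().length); // 3 * 2 * 2 = 12
```

Each resulting object could then be handed to a generator that writes the corresponding test-files.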

Split per runner

Moreover, all test-files are generated for each test-runner in its own format. For example, a Mocha file looks like this:

describe('suite 0', function () {
  it('test 0 0', function () {});
  it('test 0 1', function () {});
  it('test 0 2', function () {});
  it('test 0 3', function () {});
  it('test 0 4', function () {});
});

But the same file for AVA is:

import test from 'ava'; 

test('test 0 0', () => {});
test('test 0 1', () => {});
test('test 0 2', () => {});
test('test 0 3', () => {});
test('test 0 4', () => {});
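One way to produce both formats from the same description is a small renderer per runner. The sketch below is only an illustration under my own assumptions (the `renderers` object and its function signatures are hypothetical); it reproduces the two synchronous files shown above.

```javascript
// Hypothetical per-runner renderers: same logical tests, different syntax.
const renderers = {
  mocha(suiteIndex, testsCount) {
    const lines = [`describe('suite ${suiteIndex}', function () {`];
    for (let i = 0; i < testsCount; i++) {
      lines.push(`  it('test ${suiteIndex} ${i}', function () {});`);
    }
    lines.push('});');
    return lines.join('\n');
  },
  ava(suiteIndex, testsCount) {
    // AVA has no describe(): tests are declared at the top level of the file.
    const lines = [`import test from 'ava';`, ''];
    for (let i = 0; i < testsCount; i++) {
      lines.push(`test('test ${suiteIndex} ${i}', () => {});`);
    }
    return lines.join('\n');
  },
};

console.log(renderers.mocha(0, 5));
console.log(renderers.ava(0, 5));
```

Adding another runner would then mean adding one more renderer, while the logical set of tests stays the same.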

Runner options

Each runner has different CLI options. I’ve picked 3 common options that exist in most runners and affect performance:

  • serial execution / parallel execution
  • for parallel execution — number of concurrent workers
  • with Babel transpiling / without Babel

For example, to start tap with 4 parallel workers:

tap test.js --jobs=4

while for Jest the equivalent command is:

jest test.js --maxWorkers=4
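To hide such differences from the benchmark loop, the flag for each runner can live in a lookup table. This is a minimal sketch: only the two flags shown above are taken from the article, and the names `workerFlags` and `buildCommand` are hypothetical.

```javascript
// Maps a runner name to a function producing its "number of workers" flag.
// Only tap and jest are listed, since those are the flags shown above;
// other runners would need their own entries.
const workerFlags = {
  tap: workers => `--jobs=${workers}`,
  jest: workers => `--maxWorkers=${workers}`,
};

function buildCommand(runner, testsPath, workers) {
  const flag = workerFlags[runner];
  if (!flag) {
    throw new Error(`No worker flag known for runner "${runner}"`);
  }
  return `${runner} ${testsPath} ${flag(workers)}`;
}

console.log(buildCommand('tap', 'test.js', 4));  // tap test.js --jobs=4
console.log(buildCommand('jest', 'test.js', 4)); // jest test.js --maxWorkers=4
```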

Referee’s stopwatch

For measuring execution time I will use the bash time command. I will apply it to each runner’s CLI call and take the real (wall-clock) time. For example, to measure Mocha:

> time mocha %path_to_tests%
real    0m0.282s
user    0m0.265s
sys     0m0.043s

To call different runners with different options I’ve automated the benchmarking process. Using the shelljs library I’ve arranged a racing loop: it iterates over all runners, calls the corresponding CLI command and parses the time from the output:

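runners.forEach(runner => {
  // bash's time keyword prints a line like "real 0m0.282s" to stderr
  const cmd = `time ${runner} ${testsPath}`;
  const proc = shelljs.exec(cmd, {silent: true});
  const matches = proc.stderr.match(/real\s+([0-9]+)m([0-9.]+)s/);
  if (!matches) {
    throw new Error(`Cannot parse time output for ${runner}`);
  }
  const minutes = parseInt(matches[1], 10);
  const seconds = parseFloat(matches[2]);
  // convert minutes and seconds into total seconds
  const totalTime = minutes * 60 + seconds;
  results.push({runner, totalTime});
});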

At the end I get an array of execution times, one per runner. This data is enough to build a chart.

Charts

To visualize the results I will use bar charts. The Chart.js library is a great tool for that.
All charts are also available online on the benchmark’s GitHub page. I will share the link in the conclusion section.
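As a rough sketch of this step, the results array from the racing loop can be turned into a Chart.js bar chart configuration. The function name `toBarChartConfig`, the dataset label and the sample values are assumptions made for illustration; they are not measured results.

```javascript
// Convert [{runner, totalTime}, ...] into a Chart.js "bar" configuration,
// sorted from the fastest runner to the slowest.
function toBarChartConfig(results) {
  const sorted = results.slice().sort((a, b) => a.totalTime - b.totalTime);
  return {
    type: 'bar',
    data: {
      labels: sorted.map(r => r.runner),
      datasets: [{
        label: 'Execution time (s)',
        data: sorted.map(r => r.totalTime),
      }],
    },
  };
}

// Made-up sample values, only to show the shape of the data.
const sampleConfig = toBarChartConfig([
  { runner: 'runner-b', totalTime: 2.5 },
  { runner: 'runner-a', totalTime: 0.5 },
]);
console.log(sampleConfig.data.labels); // [ 'runner-a', 'runner-b' ]
```

In a browser page, such a configuration object would typically be passed to `new Chart(ctx, config)`, where `ctx` is a canvas context.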

Environment and equipment

I will run all benchmarks on Node.js 7.2 on my MacBook Pro with a 2.6 GHz Intel Core i5 (4 CPUs, OS X El Capitan). All runners are the latest versions installed from npm at the time of writing.

RUNNER                 VERSION
mocha                  3.4.2
mocha.parallel         0.15.2
mocha-parallel-tests   1.2.9
jasmine                2.6.0
tape                   4.6.3
qunit                  2.3.3
lab                    13.1.0
tap                    10.3.3
jest                   20.0.4
ava                    0.19.1

Of course, if you run the benchmark on your machine you will get different absolute values. But the relative results should be similar. If you get quite a different picture, feel free to comment.

Ready, Steady, Go!

The competition consists of two major groups:

  • synchronous tests
  • asynchronous tests

Synchronous tests

Such tests are simply empty functions:

it('test', function() {});

Let’s start with the simplest case: synchronous tests with no nested suites and no hooks. All runners can run such tests. The final chart is:

The top 7 runners have very close times, all within 0.5 seconds. Jasmine is the leader with 215 ms. The other 3 runners (Jest, tap and AVA) are several times slower. I guess the reason for the slowness of Jest and AVA is that both perform Babel transpiling by default, which takes considerable time.

Let’s do the next run, where all participants perform Babel transpiling. Some runners are excluded as they do not support it out of the box:

As expected, execution time increased for all runners, and lab took the lead. But AVA is still 3x–5x slower than the others (even when I set concurrency=4, which equals the 4 cores of my machine, as suggested here).

I decided to arrange a separate run for AVA with different concurrency values on the same 50 test-files:

The fastest result is again about 9 seconds. I’ve looked at several performance-related issues in the AVA repository. It seems the main “time-eater” is the forking of Node.js processes.
The chart also shows that the default run of AVA is not optimal, since it does not set a default concurrency. For the best performance you should set it manually depending on your CPU count.
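A run like this can be scripted by generating one command per concurrency value. The sketch below assumes that AVA accepts a `--concurrency` CLI flag (that is my understanding of its CLI, so please check the documentation of your AVA version); the helper names `avaCommand` and `concurrencyLevels` are hypothetical.

```javascript
const os = require('os');

// Assumption: AVA exposes a --concurrency flag on its CLI.
function avaCommand(testsPath, concurrency) {
  return `ava ${testsPath} --concurrency=${concurrency}`;
}

// Concurrency values to try: 1, 2, ..., max.
function concurrencyLevels(max) {
  const levels = [];
  for (let n = 1; n <= max; n++) {
    levels.push(n);
  }
  return levels;
}

// Use the number of logical CPUs reported by Node.js as the upper bound.
const cpuCount = os.cpus().length;
concurrencyLevels(cpuCount).forEach(n => {
  console.log(avaCommand('%path_to_tests%', n));
});
```

Each printed command could then be timed in the same way as the other runners.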

The last run for synchronous tests is with nested suites and hooks. Some people love nested suites and some consider them redundant for unit testing. Some runners support them and some do not. This benchmark should help you choose a runner if you are from the “nested-suites camp”. Each test-file contains 2 nested suites with 5 tests per suite, for a total of 50 files and 500 tests:

describe('suite 0', function () {
  describe('nested suite 0', function () {
    // 5 tests
  });
  describe('nested suite 1', function () {
    // 5 tests
  });
});
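For illustration, a file with this structure could be generated as follows. This is a sketch under my own assumptions: the function name, the test titles and the placement of the empty hooks at the top-level suite are hypothetical, as the article does not show where the hooks go.

```javascript
// Render one Mocha-style file with nested suites and empty hooks.
function renderNestedSuiteFile(fileIndex, nestedCount, testsPerSuite) {
  const lines = [
    `describe('suite ${fileIndex}', function () {`,
    '  before(function () {});',
    '  after(function () {});',
    '  beforeEach(function () {});',
    '  afterEach(function () {});',
  ];
  for (let s = 0; s < nestedCount; s++) {
    lines.push(`  describe('nested suite ${s}', function () {`);
    for (let t = 0; t < testsPerSuite; t++) {
      lines.push(`    it('test ${s} ${t}', function () {});`);
    }
    lines.push('  });');
  }
  lines.push('});');
  return lines.join('\n');
}

// 50 files with 2 nested suites of 5 tests each.
const nestedFiles = [];
for (let f = 0; f < 50; f++) {
  nestedFiles.push(renderNestedSuiteFile(f, 2, 5));
}
console.log(nestedFiles.length * 2 * 5); // 500 tests in total
```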

The result:

Basically, nesting of suites does not have a big performance impact. The result is very similar to the first chart: Jasmine, QUnit and Mocha are the leaders.

Asynchronous tests

Each asynchronous test is an empty function that completes via setTimeout:

it('test', function(done) { setTimeout(done, 0); });

Asynchronous tests can be executed by a runner in parallel thanks to the async nature of Node.js. But not all runners support it. For example, Mocha does not support parallel test execution. That’s why I’ve included mocha.parallel and mocha-parallel-tests in the benchmark: they wrap the original Mocha and allow tests to run in parallel.
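To see why this matters, here is a minimal sketch that is independent of any test-runner. It compares running timer-based "tests" one after another with starting them all at once. The helper names are hypothetical, and the code uses async/await, so it assumes a newer Node.js than the 7.2 used for the benchmark.

```javascript
// Hypothetical stand-in for an async test: resolves after `ms` milliseconds.
const fakeTest = ms => new Promise(resolve => setTimeout(resolve, ms));

// Serial: each test starts only after the previous one has finished,
// so the total time is roughly the sum of all delays.
async function runSerial(delays) {
  const start = Date.now();
  for (const ms of delays) {
    await fakeTest(ms);
  }
  return Date.now() - start;
}

// Concurrent: all tests start at once, so the total time is roughly
// the longest single delay.
async function runConcurrent(delays) {
  const start = Date.now();
  await Promise.all(delays.map(ms => fakeTest(ms)));
  return Date.now() - start;
}

(async () => {
  const delays = [30, 30, 30, 30, 30];
  console.log('serial, ms:', await runSerial(delays));
  console.log('concurrent, ms:', await runConcurrent(delays));
})();
```

The exact numbers depend on the machine, but the serial run should take several times longer than the concurrent one.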

The first run is all tests with zero delay and no suites / no hooks:

Lab is the fastest again. Mocha.parallel and mocha-parallel-tests look very good, but they are not far ahead of the original Mocha. The picture is similar to the synchronous case, except that QUnit became significantly slower.

But real-life tests do not have a constant zero delay; they actually take some time. To emulate that I could simply insert a Math.random() delay into each test. But this approach is incorrect: the random values would differ from runner to runner and the benchmark would not be fair. For a fair emulation I’ve pre-generated a sequence of random numbers in the range 0–10 and used these numbers as delays in the test-files for each runner. This ensures that the benchmarks are identical.
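One possible way to get such a reproducible sequence is a seeded pseudo-random generator. The sketch below uses a small linear congruential generator, which is my own assumption and not necessarily how the benchmark pre-generates its numbers; generating the numbers once and saving them to a file would work equally well.

```javascript
// Small seeded PRNG (linear congruential generator with the well-known
// "Numerical Recipes" constants). The same seed gives the same sequence.
function createSeededRandom(seed) {
  let state = seed >>> 0;
  return function next() {
    // Keep the state in the unsigned 32-bit range.
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Generate `count` integer delays in the range 0..10 (milliseconds).
function generateDelays(count, seed) {
  const random = createSeededRandom(seed);
  const delays = [];
  for (let i = 0; i < count; i++) {
    delays.push(Math.floor(random() * 11));
  }
  return delays;
}

// The same call yields the same delays, so every runner's test-files
// can be generated from an identical sequence.
const delaysForAllRunners = generateDelays(500, 2017);
console.log(delaysForAllRunners.slice(0, 5));
```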

Let’s look at the results:

Here mocha-parallel-tests is the winner. It is 2x faster than the runner-up, lab, and 4x faster than Mocha. The other runners lined up in a similar order as in the previous zero-delay run.

And the final loop. Keep the random delay but with nested suites and hooks:

The result is very interesting! Mocha-parallel-tests finished the job within a second, while the other runners took 10 seconds, 18 seconds and 61 seconds. To be sure, I re-ran it several times and the result is consistent. Mocha-parallel-tests definitely deserves attention.
This is also the first case where Jest is faster than Mocha and Jasmine, despite the fact that Jest performs Babel transpiling. It shows that parallelization and utilizing all CPUs are very important for testing real asynchronous code.

The finish

While the runners are catching their breath, I’ll draw some short conclusions:

  1. If you just need a fast and simple runner for synchronous unit testing, one of the time-proven Jasmine / Mocha / tape or the somewhat younger lab is a good choice.
  2. For asynchronous tests you may consider Mocha wrappers like mocha.parallel and mocha-parallel-tests. Lab also shows pretty good results.
  3. The trendy Jest and AVA are slower in general. But in return they offer many additional features and can significantly improve your testing experience.
  4. The benchmark itself is also a tool that can be improved. Each runner offers some additional options for performance tuning. If you know how to boost it, feel free to share. I’ve published all the benchmark code on GitHub. You can play with it locally and build your own charts. I believe it will help other developers make the right choice and save more testing time, as time is one of the main treasures in our life!

Thanks for reading and happy testing!