Testing the V8 JavaScript Engine

Peter Smith
Published in Compilers · 11 min read · Sep 27, 2020

I’m a compiler enthusiast, who has been learning how the V8 JavaScript Engine works. Of course, the best way to learn something is to write about it, so that’s why I’m sharing my experiences here. I hope this might be interesting to others too.

Given how widespread the V8 JavaScript Engine has become, being a major part of Google Chrome, Microsoft Edge, and NodeJS, it’s obviously important to test it carefully. In this blog post, I’ll summarize the different test suites included with the V8 source code.

If you’re following along at home, you’ll find these test suites in the v8/test directory within the V8 source code repository. Each subdirectory within v8/test is considered a test suite if it contains a testcfg.py file (not all of them do), although I excluded a few suites that don’t seem to do much. Each suite can be invoked with the ./tools/run-tests.py command.

% ls -1 test/*/testcfg.py
test/benchmarks/testcfg.py
test/cctest/testcfg.py
test/debugger/testcfg.py
...
test/test262/testcfg.py
test/unittests/testcfg.py
test/wasm-api-tests/testcfg.py
test/wasm-js/testcfg.py
test/wasm-spec-tests/testcfg.py
test/webkit/testcfg.py
% ./tools/run-tests.py --outdir=out/x64.release benchmarks
Build found: /Users/peter_smith/CompilerProjects/v8/out/x64.release
>>> Autodetected:
pointer_compression
>>> Running tests for x64.release
>>> Running with test processors
[00:06|% 100|+ 55|- 0]: Done
>>> 55 base tests produced 55 (100%) non-filtered tests
>>> 55 tests ran

We’ll do a quick tour of all 15 test suites in the v8/test directory:

  • benchmarks — Standard performance tuning benchmarks.
  • test262 — Conformance tests against the ECMAScript specification.
  • mjsunit — Unit tests written in JavaScript.
  • cctest / unittests — C++ unit tests for internal V8 classes.
  • fuzzer — Input fuzzer tests providing invalid input, possibly crashing V8.
  • intl — Tests for Internationalization features of ECMAScript.
  • message — Validates error messages produced by invalid JavaScript code.
  • webkit — Test cases borrowed from the WebKit JavaScript Engine.
  • mozilla — Test cases borrowed from the Mozilla JavaScript Engine.
  • wasm-js — Validation of WebAssembly, using the JavaScript API.
  • wasm-api-tests — Validation of WebAssembly, using the C++ API.
  • wasm-spec-tests — Conformance to the WebAssembly specification.
  • inspector — Validates the V8 inspector interface (used for debugging).
  • debugger — Validates the built-in debugger command.

If you read my previous blog post on building V8 from source code, you’ll know that run-tests.py is invoked by the gm.py build script. All of the test suites depend on binary executables being compiled first. Many suites use the d8 executable (a simple JavaScript command shell) to execute JavaScript programs and validate the results, whereas other suites, such as the code-level unit tests, require a special-purpose test driver.

Let’s dig into the detail…

Test Suite: benchmarks

  • Run time: 32 seconds (single threaded with run-tests.py -j 1 on a 2015 MacBook Pro)
  • Test binary: d8

This first test suite is focused on three performance-tuning benchmarks. The goal of each benchmark is to provide a comparison between different JavaScript engines when faced with typical code scenarios, such as processing JSON input, decompressing data, or rendering graphics. The competing JavaScript engines (such as V8, JavaScriptCore, or SpiderMonkey) are evaluated side-by-side to show how quickly they can compile and evaluate each benchmark. As a result, a lot of time has been spent on optimizing V8 to outperform the competing JavaScript engines.

Unfortunately, experience shows that relying too much on specific benchmarks leads to over-fitting of the optimizations, with too much emphasis placed on the exact benchmark code. More recently, effort has been put into optimizing for real-world scenarios that are more representative of a web browser’s overall workload. For example, by observing the loading time for common applications such as Facebook, or Google Maps, optimizations will be more applicable to everyday use.

Inside the v8/test/benchmarks directory, there is code for three important benchmarks, each with its own origin:

  • The “SunSpider” Benchmark — Originally created by Apple in 2007, as part of their WebKit project, the SunSpider benchmark focuses on intensive algorithms such as cryptography, string manipulation, and ray tracing. According to their website, this benchmark is no longer supported (as of 2015) and has been replaced by the JetStream benchmark.
  • The “Kraken” Benchmark — Created as part of the Mozilla project in 2010, the Kraken benchmark also focuses on complex algorithms that were extracted from real-world workloads (but are not the full workload itself). Kraken still appears to be maintained, and can even be executed in your browser.
  • The “Octane” Benchmark — First released by Google in 2012, and then retired in 2017, the Octane benchmark similarly focuses on computationally complex algorithms. It can also be executed inside a browser.
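To make the basic idea concrete, here’s a hand-rolled sketch of what a micro-benchmark harness boils down to. This is my own illustration, not code from any of the three suites: a workload run in a loop, with a timer around it.

```javascript
// A hand-rolled sketch of a micro-benchmark harness (not part of V8's
// benchmark suites): run a workload many times and report the elapsed time.
function benchmark(name, workload, iterations = 100000) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    workload();
  }
  return { name, iterations, elapsedMs: Date.now() - start };
}

// Example workload: a JSON round-trip, one of the scenarios mentioned above.
const result = benchmark('json-roundtrip', () => {
  JSON.parse(JSON.stringify({ a: [1, 2, 3], b: 'hello' }));
});
console.log(`${result.name}: ${result.iterations} iterations in ${result.elapsedMs} ms`);
```

The real harnesses add warm-up runs and statistical aggregation across repeated runs, but the overall shape is the same.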

Let’s now look at how V8 is tested for conformance against the ECMAScript specification.

Test Suite: test262

  • Run time: 37 minutes (also single threaded, on a 2015 MacBook Pro)
  • Test binary: d8

The official name for the JavaScript language is ECMAScript; JavaScript is more of a marketing name. The ECMA-262 standard provides an exact specification of the language and its standard libraries. All JavaScript engines are expected to conform to this standard, and all major browser vendors participate in TC39, the committee that maintains ECMA-262. Obviously, loading a web page into Chrome must have the same effect as loading it into Firefox, Safari, or Edge, so conforming to this specification is vital.

To help with this conformance, a test suite known as Test262 has been created. This is a browser-agnostic test suite maintained by supporters of the ECMA-262 standard. Test262 contains test cases to validate the ECMAScript language and libraries, the Internationalization API, and the JSON Data Interchange Format. Although the maintainers claim there’s always room for improvement, Test262 does an excellent job of validating conformance to ECMA-262.

At the language level, every aspect of the specification is covered, including grammar definition, expressions, statements, modules, and pretty much everything else. To be specific, Test262 contains 43665 individual JavaScript test files, resulting in 74677 test cases that are run through the d8 interpreter. On my machine, executing these test cases took 37 minutes, using a single CPU core.

As an example, the following test case validates the new Optional Chain feature in JavaScript. In the header comment, a reference is made to the exact part of the ECMAScript specification, showing how optional chains can appear within loops (in this case, within a for-in statement):

/*---
esid: prod-OptionalExpression
description: >
  optional chain in test portion of do while statement
info: |
  IterationStatement
    for (LeftHandSideExpression in Expression) Statement
features: [optional-chaining]
---*/
const obj = {
  inner: {
    a: 1,
    b: 2
  }
};
let str = '';
for (const key in obj?.inner) {
  str += key;
}
assert.sameValue('ab', str);

When this .js file is passed into V8 (specifically the d8 executable), the code snippet executes, with the final assert.sameValue validating whether the behaviour was correct or not.

Test Suite: mjsunit

  • Run time: 3 minutes 37 seconds
  • Test binary: d8

The mjsunit suite is similar to Test262, although it was written specifically for V8 rather than being browser-agnostic. There are 5068 test cases implemented in .js or .mjs files, taking several minutes to execute.

As an example, the function-arguments-duplicate.js file validates V8’s behaviour when the same parameter name is used twice in the same function.

function f(a, a) {
  assertEquals(2, a);
  assertEquals(1, arguments[0]);
  assertEquals(2, arguments[1]);
  assertEquals(2, arguments.length);
  %HeapObjectVerify(arguments);
}
f(1, 2);

The familiar assertEquals function is available (as are many other matchers). This test also uses %HeapObjectVerify, one of V8’s built-in “natives” functions (enabled in d8 with the --allow-natives-syntax flag), which is not very well documented.

Test Suite: cctest

  • Run time: 5 minutes 40 seconds
  • Test binary: cctest

This test suite contains almost 7000 unit tests, spread across 246 C++ files. These are code-level unit tests directly invoking methods within the V8 core. As such, a special cctest executable is first compiled, which is in contrast to other test suites relying on the d8 executable to parse and validate JavaScript code.

Each C++ file has one or more test methods, each conforming to the TEST(testName) signature. Methods call the V8 internal classes, then use macros such as CHECK, CHECK_EQ, or CHECK_GE to validate the results. For example, in test-heap.cc:

TEST(InitialObjects) {
  LocalContext env;
  HandleScope scope(CcTest::i_isolate());
  Handle<Context> context = v8::Utils::OpenHandle(*env);
  // Initial ArrayIterator prototype.
  CHECK_EQ(
      context->initial_array_iterator_prototype(),
      *v8::Utils::OpenHandle(
          *CompileRun("[][Symbol.iterator]().__proto__")));
  ...

  // Initial Object prototype.
  CHECK_EQ(context->initial_object_prototype(),
           *v8::Utils::OpenHandle(*CompileRun("Object.prototype")));
}

If a test fails, a stack trace is displayed, making it easy to debug the problem.

The run-tests.py script allows invocation of individual test cases. For example: ./tools/run-tests.py cctest/test-code-pages/* runs the seven test methods in test-code-pages.cc, whereas ./tools/run-tests.py cctest/test-code-pages/OptimizedCodeWithCodePages invokes only that single test case.

These unit tests are clearly designed with developers in mind. Each test provides concise examples of how to call the V8 APIs, as well as the internal methods and data structures. I’ve found these particular test cases to be invaluable for learning the V8 internals. I’m sure I’ll be writing more about them in the future.

Test Suite: unittests

  • Run time: 2 minutes 58 seconds
  • Test binary: unittests

The unittests suite is very similar to the cctest suite, providing 3763 test cases spread across 237 different C++ source files. The most visible difference is that unittests is built on the GoogleTest framework, whereas cctest uses V8’s own test macros; beyond that, the distinction appears to be largely historical.

Test Suite: fuzzer

  • Run time: unknown
  • Test binary: many (see below)

The fuzzer test suite allows for fuzz testing of the input passed into V8. These tests randomly mutate valid JavaScript programs, surgically generating invalid inputs in the hope of crashing the V8 engine. The goal is not merely a JavaScript-level exception, but corruption in the underlying C++ code, which could allow security breaches.

For example, to identify a potential bug in V8’s expression evaluation, the fuzzer modifies only that part of the code, yet provides valid input for the remainder of the program. Starting with valid code:

function f(a, b) {
  console.log(a + b)
}

the fuzzer creates an erroneous program:

function f(a, b) {
  console.log(a + %)
}

All portions of this code before, and after, the a + % expression must be valid, otherwise V8 simply rejects the program before reaching the expression evaluation code. For more detail, see how fuzzing is done with Chromium.

This fuzzer test suite generates several different executable programs, to support the range of different input-types that can be fuzzed.

v8_simple_json_fuzzer
v8_simple_multi_return_fuzzer
v8_simple_parser_fuzzer
v8_simple_regexp_builtins_fuzzer
v8_simple_regexp_fuzzer
v8_simple_wasm_async_fuzzer
v8_simple_wasm_code_fuzzer
v8_simple_wasm_compile_fuzzer
v8_simple_wasm_fuzzer

Each of these executables has a main program (a C++ file), taking the fuzzed input and passing it into V8 using the necessary test fixtures, such as providing JSON as input, a regular expression, or WASM code.

int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  ... code for calling V8 functions that might crash ...
  return 0;
}

If the C++ function returns 0, the program ran (or was rejected) correctly, but if the fuzz-attack was successful, the V8 engine would have already crashed.

Clearly this type of testing could take a very long time to execute, especially with all the possible ways of mutating the input. Therefore tests are performed on a large test cluster, running for an extended period of time (many hours or days).

Test Suite: intl

  • Run time: 9 seconds
  • Test binary: d8

This test suite, consisting of 218 JavaScript source files, performs validation of the internationalization features of V8. For example, there are test cases for time and date formats, time zone manipulation, character set collation (sort orders), as well as numeric data formats. Each test case uses JavaScript functions such as assertEquals or assertFalse to validate their results.
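As a hedged sketch of the kind of check these tests make (my own example, not a file from the suite), we can format a number with German conventions and verify the separators, then check collation order:

```javascript
// My own example of an intl-style check (not from the suite). German
// number formatting uses '.' for thousands and ',' for decimals.
function assertEquals(expected, actual) {
  if (expected !== actual) {
    throw new Error(`Expected "${expected}" but got "${actual}"`);
  }
}

assertEquals('1.234,5', new Intl.NumberFormat('de-DE').format(1234.5));

// Collation: in German, 'ä' sorts alongside 'a', before 'b' (not after 'z').
const cmp = new Intl.Collator('de-DE').compare('ä', 'b');
if (!(cmp < 0)) throw new Error("'ä' should sort before 'b' in German");
```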

Test Suite: message

  • Run time: 8 seconds
  • Test binary: d8

This test suite provides validation of error messages. Each test case has a single .js (or .mjs) file, and a corresponding .out file sharing the same base file name. For example, here’s the content of arrow-formal-parameters.js, which contains invalid JavaScript code.

(b, a, a, d) => a

and the corresponding arrow-formal-parameters.out file specifies the expected error message when arrow-formal-parameters.js is passed through the d8 interpreter.

*%(basename)s:5: SyntaxError: Duplicate parameter name not allowed in this context
(b, a, a, d) => a
^
SyntaxError: Duplicate parameter name not allowed in this context

If the actual output doesn’t match the expected output, the test case is considered a failure. A very simple, yet very effective test suite.
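The same idea can be sketched outside the test harness. The snippet below is my own illustration (not suite code): compile a source string, capture the resulting error message, and compare it against what we expect for the duplicate-parameter case shown above.

```javascript
// My own illustration of the message-suite idea (not actual suite code):
// compile a snippet and capture the error message it produces.
function errorMessageOf(source) {
  try {
    new Function(source);  // compilation throws on a syntax error
    return null;           // no error: the snippet compiled cleanly
  } catch (e) {
    return e.message;
  }
}

// Duplicate parameter names are always a SyntaxError in arrow functions.
const msg = errorMessageOf('(b, a, a, d) => a');
console.log(msg);  // in V8: 'Duplicate parameter name not allowed in this context'
```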

Test Suite: webkit

  • Run time: 20 seconds
  • Test binary: d8

This test suite is borrowed from the WebKit project, the basis of the Safari Browser. It consists of 543 JavaScript files (with .js suffix), each paired with a corresponding -expected.txt file. Each JavaScript file is passed through the d8 executable, with the actual console output being captured and compared against the expected output.

Test Suite: mozilla

  • Run time: unknown
  • Test binary: d8

This test suite appears to be a snapshot of the regression tests for the Mozilla JavaScript engine. According to the repository commits, the snapshot taken from Mozilla is at least five years old, possibly even ten.

This directory contains 3481 individual .js files, as well as some .java files! After running the test suite, run-tests.py reported that 1921 tests were executed, although it also showed a number of test failures, with a total completion of 0%. I suspect this test suite isn’t actively maintained.

Test Suite: wasm-js

  • Run time: 16 seconds
  • Test binary: d8

This suite validates the standard WebAssembly object, used for accessing the WebAssembly functionality within V8. There are 94 JavaScript source files, each exercising the WebAssembly object in some way.
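As a small, self-contained illustration of the API these tests exercise (my own example, not one from the suite), WebAssembly.validate can check raw module bytes. The smallest valid module is just the 8-byte header:

```javascript
// My own example of the JavaScript-level WebAssembly API (not from the
// suite). The smallest valid module is just the 8-byte header.
const emptyModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,  // magic number: '\0asm'
  0x01, 0x00, 0x00, 0x00,  // binary format version 1
]);

console.log(WebAssembly.validate(emptyModule));        // true: a valid (empty) module
console.log(WebAssembly.validate(new Uint8Array(4)));  // false: short/bad header
```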

Test Suite: wasm-api-tests

  • Run time: 1 second
  • Test binary: wasm_api_tests

Similar to the previous test cases, these validate the WebAssembly functionality within V8. However, rather than expressing the tests in JavaScript (using the WebAssembly object), they directly call V8’s C++ API. There are 17 such test cases in this suite.

Test Suite: wasm-spec-tests

  • Run time: 22 seconds
  • Test binary: d8

This third WebAssembly-related test suite provides 190 different JavaScript source files, and the same number of matching .wast files (a human-readable WebAssembly format). Here’s an example of this format:

(module
  (memory 1)
  (data (i32.const 0) "abcdefghijklmnopqrstuvwxyz")
  (func (export "8u_good1") (param $i i32) (result i32)
    (i32.load8_u offset=0 (local.get $i))  ;; 97 'a'
  )
  ...
)

Presumably, these test cases are derived from the WebAssembly specification.

Test Suite: inspector

  • Run time: 12 seconds
  • Test binary: inspector-test

This suite validates the Inspector Protocol, used by external debuggers (such as Chrome DevTools) to inspect and control the state of the JavaScript engine. There are 282 individual test cases (with .js file extension) paired up with the same number of -expected.txt files. The JavaScript file is executed, and the expected output is compared with the actual behaviour.

For example, here’s the content of the scoped-variables.js test case, showing how snippets of code can be injected into a running V8 engine:

InspectorTest.log('Evaluating \'let a = 42;\'');
var {result:{result}} = await Protocol.Runtime.evaluate({
    expression: 'let a = 42;'});
InspectorTest.logMessage(result);
InspectorTest.log('Evaluating \'a\'');
var {result:{result}} = await Protocol.Runtime.evaluate({
    expression: 'a'});
InspectorTest.logMessage(result);

The output of this test run is compared against the expected output in scoped-variables-expected.txt:

Evaluating 'let a = 42;'
{
  type : undefined
}
Evaluating 'a'
{
  description : 42
  type : number
  value : 42
}

As you’d expect, if the output doesn’t match, the Inspector Protocol has a failure.

Test Suite: debugger

  • Run time: 18 seconds
  • Test binary: d8

This final test suite validates the built-in debugger. All 316 JavaScript test files invoke the debugger command (or a similar feature) to halt the program execution. The script then uses the standard debugger features to introspect the state of the program, ensuring that breakpoint debugging works as expected.
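For illustration (my own snippet, not one of the suite’s files), the debugger statement at the heart of these tests is a no-op when no debugger is attached, so a file like this runs cleanly under plain d8 or Node:

```javascript
// My own snippet showing the `debugger` statement (not from the suite).
// With a debugger attached, execution pauses here; otherwise it's a no-op.
function compute(x) {
  const doubled = x * 2;
  debugger;  // a debugger test would inspect locals like `doubled` at this point
  return doubled + 1;
}

console.log(compute(20));  // prints 41 when run without a debugger attached
```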

Summary

That’s it! A total of 15 different test suites for validating various aspects of the V8 JavaScript engine. Some of the test suites are written in JavaScript, whereas others are written directly in C++. Some of the test suites were written by the V8 maintainers, whereas others were from third-parties. In all, these test suites are a major reason why V8 is such a high-quality and performant product.
