Calculating 1 + 1 in JavaScript — Part 2

Published in Compilers, Mar 1, 2021

I’m a compiler enthusiast who has been learning how the V8 JavaScript Engine works. Of course, the best way to learn something is to write about it, so that’s why I’m sharing my experiences here. I hope this might be interesting to others too.

This is the second part of a multi-part series describing how the V8 JavaScript Engine calculates 1 + 1. It’s a very simple expression with an obvious answer, but it still requires the full mechanism of scanning and parsing the input string, generating and executing byte codes, then displaying the result, all while maintaining data on the JavaScript heap.

If you haven’t read Part 1, it’s important to start there. It also helps to come with a passion for compiler technology.

Last Time…

Last time we saw our example client program, including how it calls C++ methods in the standard V8 libraries. Our program stores the literal string of 1 + 1 in the JavaScript heap (as a SeqOneByteString object), then compiles the expression to byte code, executes that byte code, then displays the result on the console.

// Create a string containing the JavaScript source code.
Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");

// Compile the source code.
Local<Script> script =
    Script::Compile(context, source).ToLocalChecked();

// Run the script to get the result.
Local<Value> result = script->Run(context).ToLocalChecked();

// Convert the result to Number and print it.
Local<Number> number = Local<Number>::Cast(result);
printf("%f\n", number->Value());

In the first blog post we traced the full String::NewFromUtf8Literal() method, so this time we’ll continue with the Script::Compile() method:

Local<Script> script = 
    Script::Compile(context, source).ToLocalChecked();

Script::Compile() is responsible for a large number of activities:

Checking the Compilation Cache to see if the same script had already been compiled before. This saves us from repeatedly generating byte codes for commonly used scripts.
Scanning the input string into Tokens. As we’ll see, 1 + 1 is converted to a sequence of token values: Token::SMI (small integer), Token::ADD, and then a second Token::SMI.
Parsing the tokens into an Abstract Syntax Tree (AST), providing an in-memory view of the program.
Generating the corresponding V8 byte codes, while performing some amount of optimization.

The return value from Script::Compile() is a Local<Script> handle, referring to byte codes to be executed by the V8 virtual machine. For now though, we’ll focus exclusively on the first step above. That is, checking if the compiled code is already available in a cache.

It shouldn’t come as a surprise that almost all JavaScript code is downloaded multiple times, either in the same browser session, or in different sessions over a period of time. To avoid recompiling source code that hasn’t changed, V8 provides two purpose-built cache mechanisms. The first is the per-Isolate cache, storing compiled byte codes directly in V8’s local memory. The second approach allows embedder applications (such as Chromium or NodeJS) to save their own copy of the compiled byte codes, most likely in a disk-based format. Let’s look at each approach.

Approach 1 — The Per-Isolate Cache

The per-Isolate cache is built into V8, and is enabled by default. In V8 terminology, an Isolate is an instance of a JavaScript virtual machine, complete with its own heap memory. When V8 is embedded into applications, such as a browser, it’s common to use different V8 Isolates as a means of separating (aka “isolating”) one JavaScript run-time environment from another. Perhaps the best example is browser tabs, where code running inside one tab must not impact the code in other tabs.

When a script is submitted to an Isolate for compilation, the source code string (such as 1 + 1) is used as a key for an in-memory hash table. If that exact source code had been compiled before, a SharedFunctionInfo object, containing the script’s byte codes, is read from the cache and returned to the caller. However, if there’s a cache miss, the script must be compiled from scratch, with the generated byte codes inserted into the cache for next time.
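To make the hit-or-miss behavior concrete, here is a minimal sketch of a cache keyed by the exact source text. It is not V8’s code: ToyCompilationCache, CompiledScript, and CompileWithCache are hypothetical names invented for illustration, and the “compilation” step simply records the source string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for V8's SharedFunctionInfo: it only remembers the
// source it was "compiled" from.
struct CompiledScript {
  std::string source;
};

// Toy cache keyed by the exact source text (not V8's real CompilationCache).
class ToyCompilationCache {
 public:
  // Returns the cached entry, or nullptr on a cache miss.
  std::shared_ptr<CompiledScript> Lookup(const std::string& source) const {
    auto it = table_.find(source);
    if (it == table_.end()) return nullptr;
    return it->second;
  }

  // Stores a compiled script so the next lookup for `source` is a hit.
  void Put(const std::string& source, std::shared_ptr<CompiledScript> script) {
    assert(script != nullptr);
    table_[source] = std::move(script);
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<CompiledScript>> table_;
};

// Compiles only on a miss. `compile_count` lets the caller observe how often
// the (pretend) compiler actually ran.
std::shared_ptr<CompiledScript> CompileWithCache(ToyCompilationCache& cache,
                                                 const std::string& source,
                                                 int& compile_count) {
  if (auto hit = cache.Lookup(source)) return hit;
  ++compile_count;
  auto script = std::make_shared<CompiledScript>(CompiledScript{source});
  cache.Put(source, script);
  return script;
}
```

Calling CompileWithCache() twice with "1 + 1" runs the pretend compiler once and returns the same object both times, while a different string such as "1 + 2" is a miss and is compiled separately.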

The per-Isolate cache (in the CompilationCache class, see src/codegen/compilation-cache.h) is not just a simple hash table, but has a number of features catering for different types of script. For example, the LookupScript() and PutScript() methods cache “normal” JavaScript source code, delegating their work to the CompilationCacheScript class. In contrast, the LookupEval() and PutEval() methods manage the cache of JavaScript strings passed into the eval() function, delegating their work to the CompilationCacheEval class. Likewise, there are sub-caches for regular expressions (regexes) and other code objects.

In addition, each sub-cache in the per-Isolate cache has multiple generations, allowing older cached items to be aged out over time if they haven’t been used recently. There has clearly been a lot of thought and optimization put into the design of this in-memory cache system.
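The idea of generations can be sketched with two hash tables. This is only a loose illustration under my own assumptions: the two-generation layout, the promote-on-hit rule, and the names ToyGenerationalCache and Age() are choices made for this example, not a description of V8’s actual data structures.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Toy two-generation cache: entries that are not touched between two calls
// to Age() are dropped. The int value stands in for compiled code.
class ToyGenerationalCache {
 public:
  // New entries always start in the young generation.
  void Put(const std::string& key, int value) {
    old_.erase(key);
    young_[key] = value;
  }

  // Returns true on a hit and writes the value to *value. A hit in the old
  // generation promotes the entry back to the young generation.
  bool Lookup(const std::string& key, int* value) {
    assert(value != nullptr);
    auto young_it = young_.find(key);
    if (young_it != young_.end()) {
      *value = young_it->second;
      return true;
    }
    auto old_it = old_.find(key);
    if (old_it == old_.end()) return false;
    *value = old_it->second;
    young_[key] = old_it->second;
    old_.erase(old_it);
    return true;
  }

  // Ages the cache: the old generation is discarded and the young
  // generation becomes the old one.
  void Age() {
    old_ = std::move(young_);
    young_.clear();
  }

 private:
  std::unordered_map<std::string, int> young_;
  std::unordered_map<std::string, int> old_;
};
```

In this sketch an entry survives as long as it is looked up at least once between agings, and disappears after two consecutive Age() calls with no lookups.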

Here’s an example of how the compilation cache is laid out in memory, showing the hierarchical hash tables:

To see the per-Isolate cache in action, enter 1 + 1 into the d8 interpreter multiple times:

$ ./out/x64.debug/d8 --print-bytecode
V8 version 8.8.0 (candidate)
d8> 1 + 1
... lots of output given, including byte codes ...
2
d8> 1 + 1
2
d8> 1 + 1
2

As expected, a large amount of compilation output is generated the first time (thanks to the --print-bytecode flag), but no byte codes are generated the second or third time. In contrast, if you specify the --no-compilation-cache command-line flag, you’ll see the code being recompiled every time.

Approach 2 — Caching Byte Codes in the Embedder

There are several limitations of the per-Isolate cache mentioned above. In particular, an in-memory cache will not survive when the application restarts (such as shutting down your browser). Additionally, the cache is not shared between different instances of V8, implying that a web page loaded in one browser tab does not share the cache with other browser tabs.

To solve these issues, a second type of cache is available. As discussed in “Code caching”, the application can request that V8 provide a serialized version of the compiled code. This serialized data is passed back to the application using the GetCachedData() method of the Source object (see include/v8.h), and is then saved in the application’s own cache (such as the Chromium browser cache).

When the application attempts to compile the same script again, such as when a web page downloads the same .js file multiple times, the browser passes the CachedData back to V8 to avoid regenerating the byte codes.

The clear advantage is that code can be cached for long periods of time, even if the application restarts. However, the downside is that byte codes must be serialized (see the CodeSerializer class) from V8’s in-memory format to a sequence of bytes more suited for on-disk storage. At a later point in time, this serialized data must be deserialized again before it can be executed. All of this takes extra time, partly offsetting the benefit of caching byte codes in the first place.

Because of this extra overhead, V8 only serializes the data the second time it’s compiled, ensuring it’s not just a one-time script that will never be seen again. Also, V8 defers that serialization work until after the code has been executed, ensuring the serialization does not diminish the user’s experience.
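The “serialize only on the second compile” policy can be modeled in a few lines. The sketch below is a toy under stated assumptions: ToyEmbedderCache, ToyBytecode, LoadOutcome, and the "TOY1" tag format are all invented for illustration, and deferring serialization until after execution is not modeled.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for compiled byte codes.
struct ToyBytecode {
  std::string body;
};

// Toy wire format: a fixed tag followed by the body (not V8's real format).
const std::string kTag = "TOY1";

std::string Serialize(const ToyBytecode& code) { return kTag + code.body; }

ToyBytecode Deserialize(const std::string& blob) {
  assert(blob.compare(0, kTag.size(), kTag) == 0);  // reject foreign data
  return ToyBytecode{blob.substr(kTag.size())};
}

enum class LoadOutcome { kCompiledOnly, kCompiledAndSerialized, kDeserialized };

// Toy embedder-side cache: a script is serialized only once it has been
// compiled a second time, so one-off scripts never pay the cost.
class ToyEmbedderCache {
 public:
  LoadOutcome Load(const std::string& source) {
    auto stored = blobs_.find(source);
    if (stored != blobs_.end()) {
      ToyBytecode code = Deserialize(stored->second);
      (void)code;  // a real embedder would hand this back to the engine
      return LoadOutcome::kDeserialized;
    }
    ToyBytecode code{"bytecode(" + source + ")"};  // pretend compilation
    if (++times_compiled_[source] < 2) return LoadOutcome::kCompiledOnly;
    blobs_[source] = Serialize(code);
    return LoadOutcome::kCompiledAndSerialized;
  }

 private:
  std::unordered_map<std::string, int> times_compiled_;
  std::unordered_map<std::string, std::string> blobs_;
};
```

Loading the same source three times in this model yields kCompiledOnly, then kCompiledAndSerialized, then kDeserialized, which mirrors the idea that only scripts seen more than once are worth serializing.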

Tracing the Code — Making the Cache Decision

To see how these caching techniques fit into our big picture of computing the 1 + 1 expression, let’s walk through the full code path. As mentioned earlier, we start by calling upon V8’s Script::Compile() method with 1 + 1 as an input parameter. Although this method initiates the entire compilation process, we’ll only look at how the caching mechanisms are involved.

Local<Script> script = 
    Script::Compile(context, source).ToLocalChecked();

As we saw in the first blog post, this calls into V8’s API layer (see src/api/api.cc) to validate the input arguments, add a few more important values (such as the pointer to the Isolate object), as well as translate between external Local handles and their corresponding V8 internal objects.

Before too long, we reach the Compiler::GetSharedFunctionInfoForScript() method, which is where the caching decisions are made (see src/codegen/compiler.cc). Here are the basic steps that are followed:

Line 2647 — One of the parameters for GetSharedFunctionInfoForScript() is compile_options, specifying how the embedder cache should be used. If the caller passes kConsumeCodeCache as the value for compile_options, V8 is asked to consider using the serialized byte codes that were saved in the embedder’s cache (available in the cached_data parameter). In our case though, this defaulted to kNoCompileOptions, indicating that no serialized data is available.
Line 2655 — For tracking purposes, we record the number of bytes loaded and compiled for this isolate. There are 5 bytes in 1 + 1.
Line 2659 — We must take JavaScript’s language mode into account, since it impacts code generation and therefore the byte codes that are cached. The options are kSloppy and kStrict, representing the traditional JavaScript syntax, versus the newer strict mode.
Line 2676 — Regardless of whether the embedder provided a cached_data parameter, we check whether the source code is already cached in the per-Isolate cache. This dives into the CompilationCache::LookupScript() method, which delegates to the CompilationCacheScript::Lookup() method in the “script” sub-cache. Eventually, that code performs a lookup in the multi-generational hash table. Checking this cache (even if we were passed cached_data by the embedder) is super fast, given that the byte codes are already in V8’s memory.
Line 2686 — Given that our 1 + 1 script had not previously been compiled and cached in V8’s memory, we now consider using cached_data from the embedder. In our example though, we weren’t given any cached_data by our embedder (our simple example program), so neither of the caches provides a hit. However, if cached_data had been provided, we’d need to deserialize it into the in-memory format, and then insert it into V8’s per-Isolate cache.
Line 2727 — Given that neither of our caches contained the pre-compiled byte codes for 1 + 1, we now proceed to compile the source code. This is done by the CompileScriptOnMainThread() method. As we’ll see in the next blog post, this is where all the complexity of scanning, parsing, and code generation takes place.
Line 2736 — If the compilation was successful, the SharedFunctionInfo object (containing the generated byte codes) is inserted into the per-Isolate cache, ready for the next time that 1 + 1 is evaluated.
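The order of these decisions can be summarized in a small, self-contained model. This is a sketch rather than V8’s code: ToyIsolate, Origin, and ToyGetSharedFunctionInfoForScript() are hypothetical names, the byte codes are plain strings, and “deserializing” is just a copy. Only the two option names, kNoCompileOptions and kConsumeCodeCache, are taken from the text above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

enum class CompileOptions { kNoCompileOptions, kConsumeCodeCache };
enum class Origin { kIsolateCacheHit, kEmbedderCacheHit, kFreshCompile };

// Toy Isolate holding only the in-memory cache: source text -> fake byte codes.
struct ToyIsolate {
  std::unordered_map<std::string, std::string> compilation_cache;
};

// Mirrors the order of the steps listed above:
//   1. check the per-Isolate cache,
//   2. otherwise consume embedder-provided data if requested,
//   3. otherwise compile from scratch,
// inserting into the per-Isolate cache in cases 2 and 3.
Origin ToyGetSharedFunctionInfoForScript(ToyIsolate& isolate,
                                         const std::string& source,
                                         CompileOptions options,
                                         const std::string* cached_data) {
  // Assumption of this sketch: a consume request must come with data.
  assert(options != CompileOptions::kConsumeCodeCache ||
         cached_data != nullptr);

  if (isolate.compilation_cache.count(source) > 0) {
    return Origin::kIsolateCacheHit;
  }
  if (options == CompileOptions::kConsumeCodeCache) {
    isolate.compilation_cache[source] = *cached_data;  // pretend deserialize
    return Origin::kEmbedderCacheHit;
  }
  isolate.compilation_cache[source] = "bytecode(" + source + ")";
  return Origin::kFreshCompile;
}
```

For our 1 + 1 example the first call takes the fresh-compile path, and a second call with the same source is answered from the per-Isolate cache without consulting the embedder data at all.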

So, that’s an overview of the V8 code caching mechanism. If you’re interested, there are several really great articles and presentations from the V8 team on how caching works, including the very comprehensive Code caching for JavaScript developers and BlinkOn 9: Caching (more) JavaScript code in Chrome.

Next Time…

In Part 3 of this blog post series, we’ll continue by tracing further along into the Script::Compile() method. That is, we’ll learn more about how V8’s lexical scanner reads a sequence of input characters (in our case, 1 + 1) and forms them into tokens to use as input into the parsing process.