If you haven’t read it yet, it’s important that you start with Part 1. It’s also recommended that you come with a passion for compiler technology.
Last time we saw our example client program, including how it calls C++ methods in the standard V8 libraries. Our program stores the literal string 1 + 1 in V8’s heap (as a SeqOneByteString object), then compiles the expression to byte code, executes that byte code, and finally displays the result on the console.
```cpp
Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");

// Compile the source code.
Local<Script> script =
    Script::Compile(context, source).ToLocalChecked();

// Run the script to get the result.
Local<Value> result = script->Run(context).ToLocalChecked();

// Convert the result to Number and print it.
Local<Number> number = Local<Number>::Cast(result);
printf("%f\n", number->Value());
```
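In the first blog post we traced the full String::NewFromUtf8Literal() method, so this time we’ll continue with the next statement, the call to Script::Compile():

```cpp
Local<Script> script =
    Script::Compile(context, source).ToLocalChecked();
```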
Script::Compile() is responsible for a large number of activities:
- Checking the Compilation Cache to see if the same script has already been compiled. This saves us from repeatedly generating byte codes for commonly used scripts.
- Scanning the input string into tokens. As we’ll see, 1 + 1 is converted to a sequence of token values: a token for the first 1, Token::ADD, and then a second token for the final 1.
- Parsing the tokens into an Abstract Syntax Tree (AST), providing an in-memory view of the program.
- Generating the corresponding V8 byte codes, while performing some amount of optimization.
The return value from Script::Compile() is a Local<Script> handle, referring to byte codes to be executed by the V8 virtual machine. For now, though, we’ll focus exclusively on the first step above: checking whether the compiled code is already available in a cache.
Approach 1 — The Per-Isolate Cache
When a script is submitted to an Isolate for compilation, the source code string (such as 1 + 1) is used as a key into an in-memory hash table. If that exact source code has been compiled before, a SharedFunctionInfo object containing the script’s byte codes is read from the cache and returned to the caller. However, if there’s a cache miss, the script must be compiled from scratch, and the generated byte codes are inserted into the cache for next time.
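To make the lookup-or-compile flow concrete, here is a minimal C++ sketch of a cache keyed by the exact source string. It is not V8’s implementation: SharedFunctionInfo here is a hypothetical stand-in struct, and ToyCompiler and PerIsolateCache are illustrative names invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for V8's SharedFunctionInfo: it just holds "byte codes".
struct SharedFunctionInfo {
  std::vector<std::string> bytecodes;
};

// Hypothetical compiler that fakes code generation and counts how often it runs.
class ToyCompiler {
 public:
  std::shared_ptr<SharedFunctionInfo> Compile(const std::string& source) {
    ++compile_count_;
    auto sfi = std::make_shared<SharedFunctionInfo>();
    sfi->bytecodes.push_back("<byte codes for " + source + ">");  // placeholder
    return sfi;
  }
  int compile_count() const { return compile_count_; }

 private:
  int compile_count_ = 0;
};

// Cache keyed by the exact source string, as described for the per-Isolate cache.
class PerIsolateCache {
 public:
  std::shared_ptr<SharedFunctionInfo> GetOrCompile(const std::string& source,
                                                   ToyCompiler& compiler) {
    auto it = table_.find(source);
    if (it != table_.end()) {
      return it->second;  // cache hit: reuse the previously generated byte codes
    }
    auto sfi = compiler.Compile(source);  // cache miss: compile from scratch
    table_.emplace(source, sfi);          // insert for next time
    return sfi;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<SharedFunctionInfo>> table_;
};
```

Because the table stores a shared_ptr, a second request for 1 + 1 hands back the very same object rather than a fresh copy, and ToyCompiler::Compile() runs only once per distinct source string.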
The per-Isolate cache (in the CompilationCache class, see src/codegen/compilation-cache.h) is not just a simple hash table, but has a number of features catering for different types of script. For example, regular top-level scripts, such as our 1 + 1 example, are handled by the CompilationCacheScript class. In contrast, scripts compiled through JavaScript’s eval() function are handled separately, delegating their work to the CompilationCacheEval class. Likewise, there are sub-caches for regular expressions (regexes) and other code objects.
In addition, each sub-cache in the per-Isolate cache has multiple generations, allowing older cached items to be aged out over time if they haven’t been used recently. There has clearly been a lot of thought and optimization put into the design of this in-memory cache system.
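The generational idea can be sketched as a list of hash tables, youngest first. This is a simplified model, not V8’s sub-cache code: GenerationalCache is a hypothetical class, values are plain strings, and the rule that a hit in an older generation is promoted back to the youngest one is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical multi-generation cache. generations_[0] is the youngest table.
class GenerationalCache {
 public:
  explicit GenerationalCache(std::size_t generations)
      : generations_(generations == 0 ? 1 : generations) {}

  // Search youngest to oldest; promote a hit found in an older generation.
  std::optional<std::string> Lookup(const std::string& key) {
    for (std::size_t g = 0; g < generations_.size(); ++g) {
      auto it = generations_[g].find(key);
      if (it == generations_[g].end()) continue;
      std::string value = it->second;
      if (g != 0) {
        generations_[g].erase(it);
        generations_[0][key] = value;  // assumption: hits are promoted
      }
      return value;
    }
    return std::nullopt;
  }

  // New entries always go into the youngest generation.
  void Put(const std::string& key, const std::string& value) {
    for (auto& table : generations_) table.erase(key);
    generations_[0][key] = value;
  }

  // Age the cache: drop the oldest table and start a fresh, empty youngest one.
  void Age() {
    generations_.pop_back();
    generations_.emplace_front();
  }

 private:
  std::deque<std::unordered_map<std::string, std::string>> generations_;
};
```

Each call to Age() discards the oldest table, so in this model an entry survives only if it is looked up (and therefore promoted) often enough between agings.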
Here’s an example of how the compilation cache is laid out in memory, showing the hierarchical hash tables:
To see the per-Isolate cache in action, enter 1 + 1 into the d8 interpreter multiple times:
```
$ ./out/x64.debug/d8 --print-bytecode
V8 version 8.8.0 (candidate)
d8> 1 + 1
... lots of output given, including byte codes ...
d8> 1 + 1
d8> 1 + 1
```
As expected, a large amount of compilation output is generated the first time (thanks to the --print-bytecode flag), but no byte codes are generated the second (or third) time. In contrast, if you were to specify the --no-compilation-cache command-line flag, you’d instead see the code being recompiled every time.
Approach 2 — Caching Byte Codes in the Embedder
The per-Isolate cache described above has several limitations. In particular, an in-memory cache does not survive an application restart (such as shutting down your browser). Additionally, the cache is not shared between different instances of V8, so a web page loaded in one browser tab does not share the cache with other browser tabs.
To solve these issues, a second type of cache is available. As discussed in Code caching, the application can request that V8 provide a serialized version of the compiled code, which the application then saves in its own cache (such as the Chromium browser cache). This serialized data is passed back to the application using the GetCachedData() method of the ScriptCompiler::Source object (see include/v8.h).
When the application attempts to compile the same script again, such as when a web page downloads the same .js file multiple times, the browser passes the CachedData back to V8 to avoid regenerating the byte codes.
The clear advantage is that code can be cached for long periods of time, even across application restarts. The downside, however, is that byte codes must be serialized (see the CodeSerializer class) from V8’s in-memory format into a sequence of bytes better suited to on-disk storage. Later, this serialized data must be deserialized again before it can be executed. All of this takes extra time, partly offsetting the benefit of caching byte codes in the first place.
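The round trip through the embedder’s cache can be modelled in a few lines. In this sketch, byte codes are represented as a hypothetical list of opcode strings, Serialize() and Deserialize() stand in for the much more involved work done by CodeSerializer, and EmbedderCache plays the role of a browser’s disk cache keyed by URL. The source-hash check, which rejects data produced for a different script, is a simplifying assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Flatten hypothetical opcode strings into a byte sequence suitable for disk.
// Each opcode is written as a one-byte length followed by its characters
// (assumes every opcode is shorter than 256 characters).
std::vector<std::uint8_t> Serialize(const std::vector<std::string>& bytecodes) {
  std::vector<std::uint8_t> out;
  for (const auto& op : bytecodes) {
    out.push_back(static_cast<std::uint8_t>(op.size()));
    out.insert(out.end(), op.begin(), op.end());
  }
  return out;
}

// Rebuild the opcode list; returns nullopt if the data is truncated.
std::optional<std::vector<std::string>> Deserialize(
    const std::vector<std::uint8_t>& bytes) {
  std::vector<std::string> ops;
  std::size_t i = 0;
  while (i < bytes.size()) {
    std::size_t len = bytes[i++];
    if (i + len > bytes.size()) return std::nullopt;
    auto first = bytes.begin() + static_cast<std::ptrdiff_t>(i);
    ops.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
    i += len;
  }
  return ops;
}

// Stand-in for the embedder's persistent cache (e.g. a browser disk cache).
class EmbedderCache {
 public:
  void Store(const std::string& url, const std::string& source,
             std::vector<std::uint8_t> bytes) {
    entries_[url] = Entry{std::hash<std::string>{}(source), std::move(bytes)};
  }

  // Hand back byte codes only if they were produced for this exact source.
  std::optional<std::vector<std::string>> Consume(
      const std::string& url, const std::string& source) const {
    auto it = entries_.find(url);
    if (it == entries_.end()) return std::nullopt;
    if (it->second.source_hash != std::hash<std::string>{}(source)) {
      return std::nullopt;  // stale entry: the script has changed
    }
    return Deserialize(it->second.bytes);
  }

 private:
  struct Entry {
    std::size_t source_hash;
    std::vector<std::uint8_t> bytes;
  };
  std::map<std::string, Entry> entries_;
};
```

Even in this toy version, every cache hit pays for a full Deserialize() pass, which is the overhead the paragraph above describes.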
Because of this extra overhead, V8 only serializes a script’s code the second time that script is compiled, ensuring it’s not just a one-time script that will never be seen again. V8 also defers the serialization work until after the code has been executed, so that serialization does not diminish the user’s experience.
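The two timing rules above (serialize only on the second compilation, and only after execution) can be expressed as a small hypothetical ScriptHost class that records the order of events. The class name and its event log are invented for illustration; this is a sketch of the policy as described, not V8 or Chromium code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical host that records the order of compile/run/serialize events.
class ScriptHost {
 public:
  void CompileAndRun(const std::string& source) {
    log_.push_back("compile " + source);
    // Only the second compilation triggers serialization, so one-off
    // scripts never pay the serialization cost.
    bool produce_cache = (++compile_counts_[source] == 2);
    log_.push_back("run " + source);  // execute first...
    if (produce_cache) {
      log_.push_back("serialize " + source);  // ...then do the deferred work
    }
  }

  const std::vector<std::string>& log() const { return log_; }

 private:
  std::unordered_map<std::string, int> compile_counts_;
  std::vector<std::string> log_;
};
```

Using == 2 rather than >= 2 means the sketch serializes a given script only once; on later compilations the embedder would be expected to pass the cached data back instead.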
Tracing the Code — Making the Cache Decision
To see how these caching techniques fit into our big picture of computing the 1 + 1 expression, let’s walk through the full code path. As mentioned earlier, we start by calling upon V8’s Script::Compile() method with 1 + 1 as an input parameter. Although this method initiates the entire compilation process, we’ll only look at how the caching mechanisms are involved.
```cpp
Local<Script> script =
    Script::Compile(context, source).ToLocalChecked();
```
As we saw in the first blog post, this calls into V8’s API layer (see src/api/api.cc) to validate the input arguments, add a few more important values (such as the pointer to the Isolate object), and translate between external Local handles and their corresponding V8 internal objects.
Before long, we reach the Compiler::GetSharedFunctionInfoForScript() method (see src/codegen/compiler.cc), which is where the caching decisions are made. Here are the basic steps it follows:
- Line 2647: One of the parameters for GetSharedFunctionInfoForScript() is compile_options, specifying how the embedder cache should be used. If the caller passes kConsumeCodeCache as the value of compile_options, V8 is asked to consider using the serialized byte codes that were saved in the embedder’s cache (available in the cached_data parameter). In our case, though, this defaulted to kNoCompileOptions, indicating that no serialized data is available.
- Line 2655: For tracking purposes, we record the number of bytes loaded and compiled for this Isolate. There are 5 bytes in 1 + 1.
- Line 2676: Regardless of whether the embedder provided a cached_data parameter, we check whether the source code is already in the per-Isolate cache. This dives into the CompilationCache::LookupScript() method, which delegates to the CompilationCacheScript::LookUp() method in the “script” sub-cache. Eventually, that code performs a lookup in the multi-generational hash table. Checking this cache first (even if we were passed cached_data by the embedder) makes sense because it’s super fast: the byte codes are already in V8’s memory, so no deserialization is needed.
- Line 2686: Given that our 1 + 1 script had not previously been compiled and cached in V8’s memory, we now consider using cached_data from the embedder. In our example, though, our embedder (the simple example program) didn’t provide any cached_data, so neither cache provides a hit. If cached_data had been provided, we’d need to deserialize it into the in-memory format and then insert it into V8’s per-Isolate cache.
- Line 2727: Given that neither of our caches contained pre-compiled byte codes for 1 + 1, we now proceed to compile the source code. This is done by the CompileScriptOnMainThread() method. As we’ll see in the next blog post, this is where all the complexity of scanning, parsing, and code generation takes place.
- Line 2736: If the compilation was successful, the SharedFunctionInfo object (containing the generated byte codes) is inserted into the per-Isolate cache, ready for the next time that 1 + 1 is evaluated.
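Putting these steps together, the order of the checks can be sketched as follows. ToyIsolate, Origin, and Sfi are hypothetical names; the local enum values merely mirror the kNoCompileOptions and kConsumeCodeCache options described above, and copying a string stands in for real deserialization and compilation. The sketch only illustrates the decision flow, not V8’s actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// Mirrors the two option values discussed above (hypothetical local enum).
enum class CompileOptions { kNoCompileOptions, kConsumeCodeCache };

// Records which path produced the result, for illustration only.
enum class Origin { kIsolateCache, kEmbedderCache, kCompiled };

// Hypothetical stand-in for a SharedFunctionInfo.
struct Sfi {
  std::string bytecodes;
  Origin origin;
};

class ToyIsolate {
 public:
  Sfi GetSharedFunctionInfoForScript(const std::string& source,
                                     const std::optional<std::string>& cached_data,
                                     CompileOptions compile_options) {
    source_bytes_seen_ += source.size();  // bookkeeping: 5 bytes for "1 + 1"

    // 1. Always check the in-memory per-isolate cache first: it is the cheapest.
    auto it = isolate_cache_.find(source);
    if (it != isolate_cache_.end()) return {it->second, Origin::kIsolateCache};

    // 2. Consider the embedder's serialized data, if the caller asked for it.
    if (compile_options == CompileOptions::kConsumeCodeCache && cached_data) {
      std::string code = *cached_data;  // stands in for deserialization
      isolate_cache_[source] = code;    // keep it in memory for next time
      return {code, Origin::kEmbedderCache};
    }

    // 3. Neither cache hit: compile from scratch, then populate the cache.
    std::string code = "<byte codes for " + source + ">";
    isolate_cache_[source] = code;
    return {code, Origin::kCompiled};
  }

  std::size_t source_bytes_seen() const { return source_bytes_seen_; }

 private:
  std::unordered_map<std::string, std::string> isolate_cache_;
  std::size_t source_bytes_seen_ = 0;
};
```

Calling this twice with 1 + 1 and kNoCompileOptions takes the compile path first and the per-isolate cache path second, matching the behaviour seen in the d8 session earlier.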
In Part 3 of this blog post series, we’ll continue by tracing further into the Script::Compile() method. In particular, we’ll learn how V8’s lexical scanner reads a sequence of input characters (in our case, 1 + 1) and forms them into tokens to use as input to the parsing process.