Calculating 1 + 1 in JavaScript — Part 1

Published in

Compilers

12 min readNov 13, 2020

I’m a compiler enthusiast who has been learning how the V8 JavaScript Engine works. Of course, the best way to learn something is to write about it, so that’s why I’m sharing my experiences here. I hope this might be interesting to others too (also available in Chinese, thanks to @qqqqqqcy for the translation).

Yes, obviously the answer is that 1 + 1 = 2, but how does the V8 JavaScript Engine compute this answer?

Stepping back for a moment… one of my favourite interview questions is:

“When I type a URL into a web browser, and then press enter, what are all the things that happen to load the web page? Go into as much detail as you like”.

It’s a great question, because it shows the depth and breadth of a person’s understanding, allowing them to illustrate which parts of the process are the most interesting for them.

This is the first in a series of blog posts that will look at everything V8 does when 1 + 1 is entered. To start with, we’ll focus on how V8 stores the 1 + 1 string in its heap memory. It sounds simple, but it’s an entire blog post on its own!

The Client Application

To compute 1 + 1, the most likely approach you’d take is to fire up NodeJS, or the Chrome developer console, and simply enter 1 + 1. That will work, but in order to show the V8 internals, I’ve decided to modify hello-world.cc, one of the standard sample applications in the V8 source code.

I took the original code, which printed “Hello World”, and replaced it with the expression 1 + 1:

// Create a string containing the JavaScript source code.
Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");// Compile the source code.
Local<Script> script = 
    Script::Compile(context, source).ToLocalChecked();// Run the script to get the result.
Local<Value> result = script->Run(context).ToLocalChecked();// Convert the result to Number and print it.
Local<Number> number = Local<Number>::Cast(result);
printf("%f\n", number->Value());

Take a quick look at this code to get an idea of what it does. The lines of C++ might be fairly cryptic, but the comments will help. For this blog post we’ll focus exclusively on the first statement, allocating a new 1 + 1 string in the V8 heap:

Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");

To understand this code, let’s start with the high-level sequence of V8 modules involved. In this diagram, the flow of execution is from left-to-right, with the return value being passed back from right-to-left to be inserted into the source variable.

Application — This represents the client of V8. In our example, it’s the hello-world.cc program, although more realistically it could be the entire Chrome Browser, the NodeJS run-time system, or any other piece of software that embeds the V8 JavaScript Engine.
V8 External API — This is the client-facing API providing access to the functionality of V8. Although it’s implemented in C++, the API is shaped around various JavaScript concepts, such as numbers, strings, arrays, functions, and objects, allowing them to be created and manipulated in various ways.
Heap Factory — Internal to the V8 Engine (not exposed via the API) is a “factory” for creating various data objects on the heap. Quite surprisingly, the set of factory methods available is very different to what the external API provides, so a lot of translation is done within the API layer.
New Space — V8’s heap is very complicated, but newly-allocated objects are usually stored in the New Space, often known as the Young Generation. We won’t cover the detail here, but the new space is managed using Cheney’s Algorithm, a well-known algorithm for performing garbage collection.

Let’s now go into more detail on this flow, focusing on:

How the API layer decides what type of string to create, and where it should be stored within the heap.
What the string’s internal memory layout will be. This depends on the range of characters appearing in the string.
How the storage is allocated from the heap. In our example, 20 bytes are required.
Finally, how the pointer to the string is returned to the application, with the goal of garbage collecting in the future.

Determining How and Where to Store the String

As mentioned above, there’s a fair amount of translation that must happen between the client application and the heap factory, where the object is actually created. The bulk of this work is performed in src/api/api.cc.

Let’s start with the client application’s call:

String::NewFromUtf8Literal(isolate, "1 + 1");

The first argument is for an “Isolate” which is V8’s main internal data structure representing the state of the run-time system, isolated from other possible V8 instances. To understand this, think about having multiple browser windows open, where each window has a completely separate instance of V8 running, each with its own isolated heap. We won’t talk about the isolate argument much, other than noting that a very large number of API calls expect this parameter.

The String::NewFromUtf8Literal method (see src/api/api.cc) starts with basic string length checking, but also decides how to store the string in memory. Given that we only provided two arguments to the call, the third type argument defaults to NewStringType::kNormal, indicating the string should be allocated as a regular object on the heap. The alternative would have been to pass NewStringType::kInternalized indicating that de-duplication of the string is desired. This feature is very useful to avoid storing multiple copies of the same constant string.

The next step is to call the NewString method (see src/api/api.cc) which invokes factory->NewStringFromUtf8(string). Note that string here has been mapped into an internal Vector data structure, instead of a regular C++ string, because the heap factory has quite a different set of methods than the external API. This difference will become more apparent later when the return value is passed back to the client application.

Inside NewStringFromUtf8 (see src/heap/factory.cc), a decision is made on the optimal storage format for the string. Naturally, UTF-8 is a convenient format for storing a wide range of Unicode characters, but when only basic ASCII characters are used (such as1 + 1) V8 stores the string in “one byte” format. To make this decision, the string’s characters are passed into Utf8Decoder decoder(utf8_data) (declared in src/strings/unicode-decoder.h) .

Now that we’ve decided to allocate a one-byte string, using the normal (not internalized) approach, the next step is to invokeNewRawOneByteString (see src/heap/factory-base.cc), where the heap memory is allocated, and the string’s content is written into that memory.

The String’s In-Memory Structure

Inside V8, our 1 + 1 string is represented as an instance of the v8::internal::SeqOneByteString class (see src/objects/string.h). If you’re like most object-oriented developers, you’d expect that SeqOneByteString would have a number of public methods, as well as several private members, such as an array of characters or an integer storing the string’s length. However, that’s not the case! Instead, all internal object classes are actually just pointers to the heap where that data is stored.

As you can see from the code comment in src/objects/objects.h, there are roughly 150 internal classes that have the common parent class of v8::internal::Object. Each of these classes consists solely of a single 8-byte value (on a 64-bit machine) referring to the object’s heap location.

Keeping this in mind, here’s what our string looks like in memory:

There are many interesting parts here:

The SeqOneByteString Object

As mentioned, this is not a fully-fledged string class, but is instead a pointer to the on-heap location of the string’s actual content. On a 64-bit machine, this “pointer” will be an 8-byte unsigned long, with the type alias of Address. Note that the data on the heap (on the right side of the diagram) is not actually a real C++ object, so there’s no point in treating this Address as if it was a pointer to something strongly-typed (such as String *)

But, you might be wondering why the additional level of indirection exists in the first place. Why not simply access the heap block directly? This approach makes sense when you consider that garbage collection can result in objects being moved around the heap. It’s important that data can move, without the client application getting confused.

To clarify, in Generational Garbage Collection, objects are first allocated in the Young Generation (New Space), and if they survive long enough, they’ll be moved to the Old Generation (Old Space). To make this work, the garbage collector copies the heap block to the new heap space, then updates the Address value to point to the new memory location. Given that the SeqOneByteString object itself is still at exactly the same memory address as before, the client software won’t notice the change.

Compressed Pointer To Map (Bytes 0–3 of Heap Block)

JavaScript is a dynamically-typed language, which means that variables don’t have types, yet the values stored in those variables do have types. The “map” is V8’s way of associating each object in the heap with a description of the object’s data type. After all, if the object wasn’t tagged with its type, the heap block becomes a meaningless sequence of bytes.

We won’t go into much detail about the map for our 1 + 1 string, other than mentioning that maps are also a type of heap object, stored in the Read Only Space. Maps (also known as Shapes or Hidden Classes) can become very complex, although our constant string uses a pre-defined map by calling read_only_roots().one_byte_string_map() (see src/heap/factory-base.cc).

Interestingly, although this map field is a pointer to another heap object, it cleverly uses Pointer Compression to store a 64-bit pointer value in a 32-bit field.

Object Hash Value (Bytes 4–7 of Heap Block)

Every object has an internal hash value, but in this example it defaults to kEmptyHashField (value of 3) to indicate the hash is not yet computed.

String Length (Bytes 8–11 of Heap Block)

This is the number of bytes in the string (5).

The Characters and the Padding (Bytes 12–19 of Heap Block)

As you’d expect, the five single-byte characters are stored next. Additionally, to ensure that future heap objects are aligned based on the CPU’s architecture requirements, an additional three bytes of padding are added (aligning the object to a 4-byte boundary).

Allocating Memory From the Heap

We briefly mentioned that the factory class allocates a block of memory from the heap (20 bytes in our case), then fills that block with the object’s data. One remaining question is how that 20 bytes is allocated.

In Cheney’s algorithm for garbage collection, the Young Generation (New Space) is divided into two semi-spaces. To allocate a block of memory in the heap, the allocator determines if there are enough free bytes between the limit of the current semi-space, and the current top of that semi-space. If there’s enough room, the algorithm returns the address of the next block, then increments the top pointer by the requested number of bytes.

This basic case is shown here, showing the before and after states of the current semi-space:

If the current semi-space were to run out of free memory (top and limit get too close), then the collection portion of Cheney’s algorithm starts. Once collection is complete, all the live objects will have been copied to the beginning of the second semi-space, and all dead objects (remaining in the first semi-space) will be discarded. No matter what, a semi-space is guaranteed to have all its used space at the bottom, and all its free space at the top, so it’ll always look like the above diagram.

In our case though, there’s plenty of free memory in the current semi-space, so we carve off 20 bytes, then increase the top pointer. There’s no need for garbage collection, and the second semi-space isn’t involved. In the V8 code, there are numerous special cases to consider, but the final allocation of 20 bytes is handled by the NewSpace::AllocateFastUnaligned method in src/heap/new-spaces-inl.h.

Returning a Handle

Now that we have a pointer to a heap block, fully populated with the string’s content (including length, hash, and map) the pointer must be returned to the client application. If you recall, the client invoked this line of code:

Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");

But, what exactly is the type of source, and what does Local<String> actually mean? There are two key observations here:

Translating Internal to External Classes

First, it’s interesting to recall that V8 stored our string object using the v8::internal::SeqOneByteString class, which is simply a pointer to the data on the heap. However, the client application expects the data to be of type v8::String, which is part of the V8 API.

What may surprise you is that v8::internal::SeqOneByteString (a subclass of v8::internal::String) is in a completely different class hierarchy than v8::String. In fact, all of the internal classes are defined in the src/objects directory using the v8::internal namespace, whereas the external classes are defined in include/v8.h using the v8 namespace.

Revisiting the NewFromUtf8Literal method we discussed earlier (see src/api/api.cc), the very last step before returning the object pointer to the client application is to cast the result from a v8::internal::String to a v8::String.

return Utils::ToLocal(handle_result);

This conversion magic is done using macros defined in src/api/api-inl.h.

Managing the “Roots” for Garbage Collection

Second, let’s discuss what Local<String> means (which incidentally is an abbreviation for v8::Local<v8::String>). The Local concept is how we deal with garbage collection of the string object when it’s no longer needed.

As any JavaScript developer will know, objects are garbage collected when there are no remaining references to them. The collection algorithm starts at the “roots”, then traverses the entire heap to find all reachable objects. A root is a non-heap reference, such as a global variable, or a stack-based local variable that’s still in scope. If these variables are assigned new values, or if they were to go out of scope (their enclosing function ends), the data they once pointed to is potentially now garbage.

In the case of thehello-world.cc program, we also have pointers on the C++ stack that can refer to heap objects. These have no corresponding JavaScript variable name, since they only exist in the context of the C++ application (such as hello-world.cc, or Chrome, or NodeJS). For example:

Local<String> source = ...

In this case, source is a reference to a heap object, although now with an additional level of indirection. This diagram will explain:

On the left side is the C++ stack, which grows from top to bottom as the program executes, and the right side is the heap memory we saw earlier. As the client application executes, it pushes a HandleScope object onto the local C++ stack (see src/samples/hello-world.cc). Next, the return value from calling String::NewFromUtf8Literal() is stored on the C++ stack as a Local<String> object.

It looks like we’ve add yet another level of indirection, but there are benefits of doing this:

Finding roots is easier — The HandleScope object is a place to store “Handles” (aka pointers) to heap objects. As you recall, this is exactly what our SeqOneByteString object was, an 8-byte pointer to the underlying heap data. When garbage collection is initiated, V8 quickly scans the HandleScope object to find all the root pointers. It can then update those pointers if the underlying heap data is moved.
Local pointers are easy to manage — In contrast to HandleScope which is quite large, the Local<String> object is an 8-byte value on the C++ stack, which can be used in the same context as any other 8-byte value, such as pointers or integers. In particular, it can be stored in CPU registers, be passed to functions, or provided as return values. What’s notable is that the garbage collector is not required to locate or update these values when garbage collection occurs.
Eliminating scopes is easy — Finally, when the C++ function in the client application finishes, the HandleScope and Local objects on the C++ stack are removed, but only after their C++ object destructors have been called. These destructors remove all the handles from the garbage collector’s list of roots. They’re no longer in scope, so the underlying heap objects may have become garbage.

To conclude the story, the source variable, referring to our 1 + 1 string is now ready to be passed to the next line in our client application:

Local<Script> script = 
    Script::Compile(context, source).ToLocalChecked();

Next Time…

There was clearly a lot of work to allocate the 1 + 1 string on the heap. Hopefully it illustrated some parts of V8’s internal architecture, as well as how data is represented in different parts of the system. In future blog posts, I’ll look more into how our simple expression is parsed and executed, which will expose a lot more about how V8 operates.

In Part 2 of this blog post series, I’ll dig into how the Compilation Cache works, to avoid compiling code more than necessary.