# Calculating 1 + 1 in JavaScript — Part 1

Published in

--

I’m a compiler enthusiast who has been learning how the V8 JavaScript Engine works. Of course, the best way to learn something is to write about it, so that’s why I’m sharing my experiences here. I hope this might be interesting to others too (also available in Chinese, thanks to @qqqqqqcy for the translation).

Yes, obviously the answer is that `1 + 1 = 2`, but how does the V8 JavaScript Engine compute this answer?

Stepping back for a moment… one of my favourite interview questions is:

When I type a URL into a web browser, and then press enter, what are all the things that happen to load the web page? Go into as much detail as you like”.

It’s a great question, because it shows the depth and breadth of a person’s understanding, allowing them to illustrate which parts of the process are the most interesting for them.

This is the first in a series of blog posts that will look at everything V8 does when `1 + 1` is entered. To start with, we’ll focus on how V8 stores the `1 + 1` string in its heap memory. It sounds simple, but it’s an entire blog post on its own!

# The Client Application

To compute `1 + 1`, the most likely approach you’d take is to fire up NodeJS, or the Chrome developer console, and simply enter `1 + 1`. That will work, but in order to show the V8 internals, I’ve decided to modify `hello-world.cc`, one of the standard sample applications in the V8 source code.

I took the original code, which printed “Hello World”, and replaced it with the expression `1 + 1`:

`// Create a string containing the JavaScript source code.Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");// Compile the source code.Local<Script> script =     Script::Compile(context, source).ToLocalChecked();// Run the script to get the result.Local<Value> result = script->Run(context).ToLocalChecked();// Convert the result to Number and print it.Local<Number> number = Local<Number>::Cast(result);printf("%f\n", number->Value());`

Take a quick look at this code to get an idea of what it does. The lines of C++ might be fairly cryptic, but the comments will help. For this blog post we’ll focus exclusively on the first statement, allocating a new `1 + 1` string in the V8 heap:

`Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");`

To understand this code, let’s start with the high-level sequence of V8 modules involved. In this diagram, the flow of execution is from left-to-right, with the return value being passed back from right-to-left to be inserted into the `source` variable.

• Application — This represents the client of V8. In our example, it’s the `hello-world.cc` program, although more realistically it could be the entire Chrome Browser, the NodeJS run-time system, or any other piece of software that embeds the V8 JavaScript Engine.
• V8 External API — This is the client-facing API providing access to the functionality of V8. Although it’s implemented in C++, the API is shaped around various JavaScript concepts, such as numbers, strings, arrays, functions, and objects, allowing them to be created and manipulated in various ways.
• Heap Factory — Internal to the V8 Engine (not exposed via the API) is a “factory” for creating various data objects on the heap. Quite surprisingly, the set of factory methods available is very different to what the external API provides, so a lot of translation is done within the API layer.
• New Space — V8’s heap is very complicated, but newly-allocated objects are usually stored in the New Space, often known as the Young Generation. We won’t cover the detail here, but the new space is managed using Cheney’s Algorithm, a well-known algorithm for performing garbage collection.

Let’s now go into more detail on this flow, focusing on:

• How the API layer decides what type of string to create, and where it should be stored within the heap.
• What the string’s internal memory layout will be. This depends on the range of characters appearing in the string.
• How the storage is allocated from the heap. In our example, 20 bytes are required.
• Finally, how the pointer to the string is returned to the application, with the goal of garbage collecting in the future.

# Determining How and Where to Store the String

As mentioned above, there’s a fair amount of translation that must happen between the client application and the heap factory, where the object is actually created. The bulk of this work is performed in `src/api/api.cc`.

`String::NewFromUtf8Literal(isolate, "1 + 1");`

The first argument is for an “Isolate” which is V8’s main internal data structure representing the state of the run-time system, isolated from other possible V8 instances. To understand this, think about having multiple browser windows open, where each window has a completely separate instance of V8 running, each with its own isolated heap. We won’t talk about the `isolate` argument much, other than noting that a very large number of API calls expect this parameter.

The `String::NewFromUtf8Literal` method (see `src/api/api.cc`) starts with basic string length checking, but also decides how to store the string in memory. Given that we only provided two arguments to the call, the third `type` argument defaults to `NewStringType::kNormal`, indicating the string should be allocated as a regular object on the heap. The alternative would have been to pass `NewStringType::kInternalized` indicating that de-duplication of the string is desired. This feature is very useful to avoid storing multiple copies of the same constant string.

The next step is to call the `NewString` method (see `src/api/api.cc`) which invokes `factory->NewStringFromUtf8(string)`. Note that `string` here has been mapped into an internal `Vector` data structure, instead of a regular C++ string, because the heap factory has quite a different set of methods than the external API. This difference will become more apparent later when the return value is passed back to the client application.

Inside `NewStringFromUtf8` (see `src/heap/factory.cc`), a decision is made on the optimal storage format for the string. Naturally, UTF-8 is a convenient format for storing a wide range of Unicode characters, but when only basic ASCII characters are used (such as`1 + 1`) V8 stores the string in “one byte” format. To make this decision, the string’s characters are passed into `Utf8Decoder decoder(utf8_data)` (declared in `src/strings/unicode-decoder.h)` .

Now that we’ve decided to allocate a one-byte string, using the normal (not internalized) approach, the next step is to invoke`NewRawOneByteString` (see `src/heap/factory-base.cc`), where the heap memory is allocated, and the string’s content is written into that memory.

# The String’s In-Memory Structure

Inside V8, our `1 + 1` string is represented as an instance of the `v8::internal::SeqOneByteString` class (see `src/objects/string.h`). If you’re like most object-oriented developers, you’d expect that `SeqOneByteString` would have a number of public methods, as well as several private members, such as an array of characters or an integer storing the string’s length. However, that’s not the case! Instead, all internal object classes are actually just pointers to the heap where that data is stored.

As you can see from the code comment in `src/objects/objects.h`, there are roughly 150 internal classes that have the common parent class of `v8::internal::Object`. Each of these classes consists solely of a single 8-byte value (on a 64-bit machine) referring to the object’s heap location.

Keeping this in mind, here’s what our string looks like in memory:

There are many interesting parts here:

## The SeqOneByteString Object

As mentioned, this is not a fully-fledged string class, but is instead a pointer to the on-heap location of the string’s actual content. On a 64-bit machine, this “pointer” will be an 8-byte `unsigned long`, with the type alias of `Address`. Note that the data on the heap (on the right side of the diagram) is not actually a real C++ object, so there’s no point in treating this `Address` as if it was a pointer to something strongly-typed (such as `String *`)

But, you might be wondering why the additional level of indirection exists in the first place. Why not simply access the heap block directly? This approach makes sense when you consider that garbage collection can result in objects being moved around the heap. It’s important that data can move, without the client application getting confused.

To clarify, in Generational Garbage Collection, objects are first allocated in the Young Generation (New Space), and if they survive long enough, they’ll be moved to the Old Generation (Old Space). To make this work, the garbage collector copies the heap block to the new heap space, then updates the `Address` value to point to the new memory location. Given that the `SeqOneByteString` object itself is still at exactly the same memory address as before, the client software won’t notice the change.

## Compressed Pointer To Map (Bytes 0–3 of Heap Block)

JavaScript is a dynamically-typed language, which means that variables don’t have types, yet the values stored in those variables do have types. The “map” is V8’s way of associating each object in the heap with a description of the object’s data type. After all, if the object wasn’t tagged with its type, the heap block becomes a meaningless sequence of bytes.

We won’t go into much detail about the map for our `1 + 1` string, other than mentioning that maps are also a type of heap object, stored in the Read Only Space. Maps (also known as Shapes or Hidden Classes) can become very complex, although our constant string uses a pre-defined map by calling `read_only_roots().one_byte_string_map()` (see `src/heap/factory-base.cc`).

Interestingly, although this map field is a pointer to another heap object, it cleverly uses Pointer Compression to store a 64-bit pointer value in a 32-bit field.

## Object Hash Value (Bytes 4–7 of Heap Block)

Every object has an internal hash value, but in this example it defaults to `kEmptyHashField` (value of 3) to indicate the hash is not yet computed.

## String Length (Bytes 8–11 of Heap Block)

This is the number of bytes in the string (5).

## The Characters and the Padding (Bytes 12–19 of Heap Block)

As you’d expect, the five single-byte characters are stored next. Additionally, to ensure that future heap objects are aligned based on the CPU’s architecture requirements, an additional three bytes of padding are added (aligning the object to a 4-byte boundary).

# Allocating Memory From the Heap

We briefly mentioned that the factory class allocates a block of memory from the heap (20 bytes in our case), then fills that block with the object’s data. One remaining question is how that 20 bytes is allocated.

In Cheney’s algorithm for garbage collection, the Young Generation (New Space) is divided into two semi-spaces. To allocate a block of memory in the heap, the allocator determines if there are enough free bytes between the `limit` of the current semi-space, and the current `top` of that semi-space. If there’s enough room, the algorithm returns the address of the next block, then increments the `top` pointer by the requested number of bytes.

This basic case is shown here, showing the before and after states of the current semi-space:

If the current semi-space were to run out of free memory (`top` and `limit` get too close), then the collection portion of Cheney’s algorithm starts. Once collection is complete, all the live objects will have been copied to the beginning of the second semi-space, and all dead objects (remaining in the first semi-space) will be discarded. No matter what, a semi-space is guaranteed to have all its used space at the bottom, and all its free space at the top, so it’ll always look like the above diagram.

In our case though, there’s plenty of free memory in the current semi-space, so we carve off 20 bytes, then increase the `top` pointer. There’s no need for garbage collection, and the second semi-space isn’t involved. In the V8 code, there are numerous special cases to consider, but the final allocation of 20 bytes is handled by the `NewSpace::AllocateFastUnaligned` method in `src/heap/new-spaces-inl.h`.

# Returning a Handle

Now that we have a pointer to a heap block, fully populated with the string’s content (including length, hash, and map) the pointer must be returned to the client application. If you recall, the client invoked this line of code:

`Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");`

But, what exactly is the type of `source`, and what does `Local<String>` actually mean? There are two key observations here:

## Translating Internal to External Classes

First, it’s interesting to recall that V8 stored our string object using the `v8::internal::SeqOneByteString` class, which is simply a pointer to the data on the heap. However, the client application expects the data to be of type `v8::String`, which is part of the V8 API.

What may surprise you is that `v8::internal::SeqOneByteString` (a subclass of `v8::internal::String`) is in a completely different class hierarchy than `v8::String`. In fact, all of the internal classes are defined in the `src/objects` directory using the `v8::internal` namespace, whereas the external classes are defined in `include/v8.h` using the `v8` namespace.

Revisiting the `NewFromUtf8Literal` method we discussed earlier (see `src/api/api.cc`), the very last step before returning the object pointer to the client application is to cast the result from a `v8::internal::String` to a `v8::String`.

`return Utils::ToLocal(handle_result);`

This conversion magic is done using macros defined in `src/api/api-inl.h`.

## Managing the “Roots” for Garbage Collection

Second, let’s discuss what `Local<String>` means (which incidentally is an abbreviation for `v8::Local<v8::String>`). The Local concept is how we deal with garbage collection of the string object when it’s no longer needed.

As any JavaScript developer will know, objects are garbage collected when there are no remaining references to them. The collection algorithm starts at the “roots”, then traverses the entire heap to find all reachable objects. A root is a non-heap reference, such as a global variable, or a stack-based local variable that’s still in scope. If these variables are assigned new values, or if they were to go out of scope (their enclosing function ends), the data they once pointed to is potentially now garbage.

In the case of the`hello-world.cc` program, we also have pointers on the C++ stack that can refer to heap objects. These have no corresponding JavaScript variable name, since they only exist in the context of the C++ application (such as `hello-world.cc`, or Chrome, or NodeJS). For example:

`Local<String> source = ...`

In this case, `source` is a reference to a heap object, although now with an additional level of indirection. This diagram will explain:

On the left side is the C++ stack, which grows from top to bottom as the program executes, and the right side is the heap memory we saw earlier. As the client application executes, it pushes a `HandleScope` object onto the local C++ stack (see `src/samples/hello-world.cc`). Next, the return value from calling `String::NewFromUtf8Literal()` is stored on the C++ stack as a `Local<String>` object.

It looks like we’ve add yet another level of indirection, but there are benefits of doing this:

• Finding roots is easier — The `HandleScope` object is a place to store “Handles” (aka pointers) to heap objects. As you recall, this is exactly what our `SeqOneByteString` object was, an 8-byte pointer to the underlying heap data. When garbage collection is initiated, V8 quickly scans the `HandleScope` object to find all the root pointers. It can then update those pointers if the underlying heap data is moved.
• Local pointers are easy to manage — In contrast to `HandleScope` which is quite large, the `Local<String>` object is an 8-byte value on the C++ stack, which can be used in the same context as any other 8-byte value, such as pointers or integers. In particular, it can be stored in CPU registers, be passed to functions, or provided as return values. What’s notable is that the garbage collector is not required to locate or update these values when garbage collection occurs.
• Eliminating scopes is easy — Finally, when the C++ function in the client application finishes, the `HandleScope` and `Local` objects on the C++ stack are removed, but only after their C++ object destructors have been called. These destructors remove all the handles from the garbage collector’s list of roots. They’re no longer in scope, so the underlying heap objects may have become garbage.

To conclude the story, the `source` variable, referring to our `1 + 1` string is now ready to be passed to the next line in our client application:

`Local<Script> script =     Script::Compile(context, source).ToLocalChecked();`

# Next Time…

There was clearly a lot of work to allocate the `1 + 1` string on the heap. Hopefully it illustrated some parts of V8’s internal architecture, as well as how data is represented in different parts of the system. In future blog posts, I’ll look more into how our simple expression is parsed and executed, which will expose a lot more about how V8 operates.

In Part 2 of this blog post series, I’ll dig into how the Compilation Cache works, to avoid compiling code more than necessary.

--

--