V8 Bug Hunting Part 2: Memory representation of JS types

Disclaimer: This is not really a write-up, it’s just me taking notes. The content may be chaotic or wrong, or there may be better ways to do things. If you notice that, please point it out.

<<< V8 Bug Hunting Part 1: Setting up the debug environment

Intro

After setting up the debug environment, I was ready to play around with V8. The first thing I wanted to look at was how various JS types are represented in memory. JS types are split into two main categories: primitive types (e.g. numbers, strings, booleans) and objects. We’ll have a look at both categories.

Numbers

In JavaScript, all numbers are doubles (64-bit floats), which is the “number” primitive type. That’s what the spec tells us. But how does V8 implement them? In most cases, when using numbers in our code, we just use integers. We could just represent them as floats, but wouldn’t that be wasteful?

Let’s first have a look at a screenshot from my previous blog post:

There I have created an array like a=[1,2,3,4,5,6,7,8,9] and have identified it in memory. I have selected what I believed is the integer 2 from the array. I thought that would make sense considering that I’m on a 64 bit system and that memory is little-endian.

And although in C terms that may make sense, this is not how higher level languages like JavaScript work. JavaScript is a dynamically typed language, which means a variable can contain any type, and in order to be able to do that, JavaScript needs to store some type information with the variable.

JavaScriptCore (the WebKit JS engine) uses NaN-boxing to store both type information and the variable value inside a 64-bit float. V8 uses tagged pointers to do that. Because of how memory is aligned, pointers usually point to memory locations that are multiples of 4 or 8 bytes. That means that the last 2–3 bits of a pointer will always be 0 and carry no information. V8 makes use of that and encodes some type information inside the last bit. That works like this:

For pointers:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1

For pointers, V8 will always set the last bit to 1. If that bit is 1, it means we are dealing with a pointer. That also means that before using that pointer, we need to clear that last bit (set it to 0) because it was set to 1 just to mark that variable as a pointer.

For small integers (SMI):

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0

For small integers (SMI), the last bit will be set to 0. That means that small integers are 31 bits long on 32-bit systems.

On 64-bit systems it works slightly differently (at least in a build without pointer compression, like the one used here): a SMI will be 32 bits, stored in the upper half of the word, and the lower 32 bits will always be set to 0:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx00000000000000000000000000000000

But the principle is the same: if the last bit of a memory location is a 1, we are dealing with a pointer, and if it’s 0, we are dealing with a SMI.
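
To make the rule concrete, here is a minimal sketch that models a 64-bit word with BigInt and applies the checks described above. The helper names (isHeapPointer, isSmi, untag, smiToWord, wordToSmi) are made up for illustration and are not V8 identifiers, and the SMI layout assumed is the 64-bit one shown above (value in the upper 32 bits).

```javascript
// Sketch of the tagging scheme for a 64-bit word, modelled with BigInt.
// All helper names are hypothetical, not taken from the V8 source.

// Last bit 1 -> tagged heap pointer.
function isHeapPointer(word) {
  return (word & 1n) === 1n;
}

// Last bit 0 -> SMI.
function isSmi(word) {
  return (word & 1n) === 0n;
}

// Clear the tag bit to get the real address.
function untag(word) {
  return word & ~1n;
}

// Assumed 64-bit layout: value in the upper 32 bits, lower 32 bits zero.
function smiToWord(value) {
  return BigInt.asUintN(64, BigInt(value) << 32n);
}

// Reinterpret the word as signed and shift the value back down.
function wordToSmi(word) {
  return Number(BigInt.asIntN(64, word) >> 32n);
}

const five = smiToWord(5);
console.log(five.toString(16).padStart(16, "0")); // 0000000500000000
console.log(isSmi(five), isHeapPointer(five));     // true false
console.log(wordToSmi(smiToWord(-7)));             // -7
console.log(untag(0x1001n).toString(16));          // 1000
```

The address 0x1001 in the last line is an arbitrary example value, only there to show the tag bit being cleared.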

Let’s try that out by creating the following array and doing a Debug Print on it in D8:

a=[5,"test"]

Notice how all pointers in the debug output here have the last bit set to 1. If we want to use one of these pointers, we have to clear its last bit first, e.g. the address of the elements of this array becomes 02ae456d6e98 (instead of 02ae456d6e99). Let’s go to that address in WinDbg and inspect the elements:

There we’ll find our SMI 5 and right below it the pointer to our string “test”. The SMI is encoded exactly how we explained earlier. If we keep in mind that memory is little-endian and read the slot as one 64-bit value, our SMI looks like this in hex:

00 00 00 05 00 00 00 00

That means the lower 32 bits are set to 0, and the upper 32 bits hold the SMI value.
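
As a sanity check, the decoding can be sketched in a few lines of JavaScript. The byte order in the array below is my assumption of what a raw little-endian byte dump of that slot would show (lowest address first), derived from the 64-bit value above rather than copied from the debugger.

```javascript
// Bytes of the slot in memory order (lowest address first). This order is an
// assumption derived from the 64-bit value 0x0000000500000000 shown above.
const bytes = new Uint8Array([0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00]);
const view = new DataView(bytes.buffer);

// Read the whole slot as one little-endian 64-bit word.
const word = view.getBigUint64(0, true);

// Lower 32 bits: expected to be zero for a SMI in this layout.
const low = word & 0xffffffffn;

// Upper 32 bits (bytes 4..7): the SMI value itself, read as a signed integer.
const value = view.getInt32(4, true);

console.log(word.toString(16).padStart(16, "0")); // 0000000500000000
console.log(low === 0n);                           // true
console.log(value);                                // 5
```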

Perfect! :) Let’s now try to add an integer larger than 32 bits and a float to the array to see what will happen:

We see that both numbers have been stored on the heap. If we follow one of these pointers, we will see two things: 1) a pointer to a map (more on maps later), and 2) the value of the number/float in IEEE 754 encoded form:

You can convert those values back to decimal form by using any IEEE 754 converter (64 bit in my case).
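
If you don’t want to use an online converter, the same conversion can be sketched with a DataView. The function names are my own, and 1.5 and 2 ** 32 are just example inputs, not the values from my array.

```javascript
// Convert between a JS number and its 64-bit IEEE 754 bit pattern (as hex).
// doubleToHex and hexToDouble are hypothetical helper names.

function doubleToHex(num) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, num); // DataView is big-endian by default
  return view.getBigUint64(0).toString(16).padStart(16, "0");
}

function hexToDouble(hex) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, BigInt("0x" + hex));
  return view.getFloat64(0);
}

console.log(doubleToHex(1.5));                  // 3ff8000000000000
console.log(hexToDouble("3ff8000000000000"));   // 1.5
console.log(hexToDouble(doubleToHex(2 ** 32))); // 4294967296
```

The hex string here is the value as a 64-bit number, so when copying bytes from a little-endian memory dump they have to be reversed first.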

Strings and other primitive types

Strings are stored much like floats: the value is a pointer to a memory location containing two things: 1) a map describing the variable and 2) the value of the variable itself. Primitives like null or undefined, which do not carry a value of their own, are likewise pointers to small heap objects (V8 calls them oddballs) that start with a map.

It is worth mentioning that although we talked about some types like strings and numbers as primitive types, it’s also possible to create them as objects:

var a = "test"; //primitive type
var o = new String("test"); //object
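
The difference between the two is visible from plain JavaScript, without a debugger. A small sketch, reusing the two variables from above:

```javascript
var a = "test";             // primitive type
var o = new String("test"); // object

console.log(typeof a);          // string
console.log(typeof o);          // object
console.log(a === o);           // false: a primitive and an object
console.log(a === o.valueOf()); // true: the wrapped primitive is the same string
console.log(a.length);          // 4: property access still works on the primitive
```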

So let’s talk about objects now.

Objects

Let’s create an object and do a Debug Print for it:

var o={color:"yellow",shape:"round"};

There are quite a few new things here, but let’s try to load the memory address of that object in WinDbg because the only goal for now is to understand how it’s laid out in memory:

To make sense of this, let’s first understand what a JS object is supposed to look like in memory:

++++++++++++++++++++++++
+ JS OBJECT            +
++++++++++++++++++++++++
| Map                  |
------------------------
| Properties           |
------------------------
| Elements             |
------------------------
| In-Object Property 1 |
------------------------
| In-Object Property 2 |
------------------------
| ...                  |
------------------------

That means the first pointer at the memory location of the object is a pointer to the Map of that object. A Map is like a hidden class that describes the layout of the object, the mapping of property names to offsets, and a few other things, like the pointer to the prototype of the object. Maps would be a topic of their own and I haven’t dug into that yet, but the memory layout of a Map itself and some other details can be found in the V8 source code: https://github.com/v8/v8/blob/master/src/objects/map.h

The second pointer is a pointer to the properties of that object. There are three types of object properties in V8:

  1. Very fast properties
  2. Fast properties
  3. Slow properties

In case an object has just a few properties, V8 will just place them inside the object itself directly (In-Object properties in the Object graph above), and those would be very fast properties. In case there are too many properties, V8 will use that second pointer in the Object to point to an array that contains the other properties, which could in that case be Fast or Slow. I won’t dig into the details now, but more details can be found here: https://v8.dev/blog/fast-properties

The third pointer in the Object is a pointer to the elements of the object. Elements are like properties, but with numeric names, e.g. {1:"green", 2:"red"}, which is what arrays use.
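
The split between numeric and named keys also shows up at the language level. The sketch below only demonstrates the JavaScript-visible behavior (key ordering), not the V8 storage itself; the object literal is a made-up example.

```javascript
// Integer-like keys and string keys mixed on one object.
var mixed = { shape: "round", 2: "red", 1: "green", color: "yellow" };

// Integer-like keys are listed first in ascending order,
// then string keys in insertion order.
console.log(Object.keys(mixed).join(",")); // 1,2,shape,color

// An array is an object whose indices are such integer-like keys.
var arr = ["green", "red"];
console.log(Object.keys(arr).join(","));   // 0,1
console.log(typeof arr);                   // object
```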

After that pointer come all the very fast properties, and if you compare the 4th memory location/line in the memory layout in WinDbg from the screenshot above, you will see that it contains the address/pointer to the string/value of our first property.
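
Put as arithmetic, the layout above gives each field a fixed offset from the (untagged) object address. This is a minimal sketch assuming 8-byte slots, as on my 64-bit build; the constant and function names are made up for illustration and are not V8 identifiers.

```javascript
// Assumption: every slot is 8 bytes wide (64-bit build, full-width pointers).
const SLOT_SIZE = 8;

// Header fields in the order shown in the diagram above.
const HEADER_FIELDS = ["map", "properties", "elements"];

// Byte offset of a header field from the untagged object address.
function headerOffset(name) {
  const index = HEADER_FIELDS.indexOf(name);
  if (index < 0) throw new Error("unknown header field: " + name);
  return index * SLOT_SIZE;
}

// Byte offset of the n-th in-object property (0-based), right after the header.
function inObjectPropertyOffset(n) {
  return (HEADER_FIELDS.length + n) * SLOT_SIZE;
}

console.log(headerOffset("map"));       // 0
console.log(headerOffset("elements"));  // 16
console.log(inObjectPropertyOffset(0)); // 24, the 4th slot (first property)
console.log(inObjectPropertyOffset(1)); // 32, the 5th slot (second property)
```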

So far so good. Off to figuring out what to explore next :)

Resources

https://github.com/thlorenz/v8-perf/blob/master/data-types.md

https://stackoverflow.com/questions/7413168/how-does-v8-manage-the-memory-of-object-instances

https://v8.dev/blog/fast-properties

https://javascript.info/primitives-methods

https://www.youtube.com/watch?v=5nmpokoRaZI