V8 Internals: How Small is a “Small Integer?”
When binary is useful outside of a coding interview
This article is available in Chinese.
Smis, small integers. This article digs into V8’s source code to discover how large
Smis actually are.
Smis, small integers.
The range for
Smis on 64-bit platforms is -2³¹ to 2³¹-1 (2³¹≈ 2*10⁹). That might not be immediately obvious when you look at V8’s source code.
kSmiMaxValue are defined in include/v8.h as follows:
static const int kSmiMinValue =
(static_cast<unsigned int>(-1)) << (kSmiValueSize — 1);static const int kSmiMaxValue = -(kSmiMinValue + 1);
How is that equal to -2³¹ and 2³¹-1? Let’s dissect the
C++ code into smaller chunks.
<< is a bitwise left shift. Left shift means that we shift the binary representation of a number to the left by filling up the right side with zeros. For example,
5 << 3 = 40.
You might notice that left shift for positive numbers is the same as multiplication by 2.
Static Cast to an Unsigned Integer
What happens when we cast a negative value to an unsigned integer, i.e., a positive number? The bits stay the same but the value is interpreted differently. Casting to an unsigned integer allows us to use left shift because it is only defined for positive numbers in
What is the binary representation of
-1? In Two’s complement,
-1 is represented as
(111...111)_2 because 2⁶³-2⁶²-2⁶¹- … -2²-2¹–1 = 1.
Putting it All Together
If you follow the definitions in V8’s source code, you will find that
kSmiValueSize is defined as
32 on 64-bit machines, resulting in:
kSmiMinValue =(static_cast<unsigned int>(-1)) << (kSmiValueSize — 1)
= (111...111)_2 << (32-1)
= (111...111)_2 << 31
= (11...1100...00)_2 // 31 zeros
Now we use this result in the initial definition of
int kSmiMaxValue = -(kSmiMinValue + 1);
That is easy,
kSmiMaxValue = -(-2^31+1) = 2^31-1. The range for
Smis in V8 on 64-bit platforms is [-2³¹, 2³¹-1].
On 32-bit platforms,
kSmiValueSize = 31. So we shift by 30, and end up with
kMinValue = -2^30. Note, 2³⁰≈ 10⁹.
Why is the range for
Smis or heap objects using the least significant bit. If it is
1, it is a pointer. If the least significant bit is a
0, it is a
Smi. That means that 32-bit integers can use only 31 bits for the
Smi value, because one bit is used as the tag.
V8 tags all values as either Smis or heap pointers using the least significant bit.
Smis are not as small as you might have thought, but they easily fit into a 32-bit or 64-bit integer together with their encoding bit.
Bonus question: Given a non-empty array of integers, every element appears twice except for one. Find that single one. Can you use binary representation for a linear time, constant space solution?