Calculating 1 + 1 in JavaScript — Part 5

Peter Smith
Compilers
Published in
11 min readMay 18, 2021

I’m a compiler enthusiast who has been learning how the V8 JavaScript Engine works. Of course, the best way to learn something is to write about it, so that’s why I’m sharing my experiences here. I hope this might be interesting to others too.

This blog post discusses how JavaScript is compiled differently when entered into the REPL or via the eval command, versus being in a function body.

This is the fifth part in a series that dives into how the V8 JavaScript Engine computes the expression 1 + 1. This seems like a simple task, but utilizes a large portion of the JavaScript run-time environment, and so far has required four blog posts to describe. There will likely be four more to complete the full story, so kudos to anyone who reads them all!

Since I started writing this blog series, I’ve learn a lot of the unusual JavaScript quirks you wouldn’t normally think about. If you’re curious about the entire blog series, here’s what we’ve seen so far:

Let’s now see how V8 rewrites certain statements if they’re entered into the REPL or passed into the eval command.

Entering Statements into the REPL

Here’s a simple example. What’s the result of the following statement, when entered into the V8 JavaScript REPL (such as Node.js, or Chrome console)?

if (3 > 5) {
10
} else {
20
}

Alternatively, what’s the expected value when passed to eval?

eval("if (3 > 5) { 10 } else { 20 }")

It should be no surprise that 20 is returned in both cases. How about this similar example, using a function definition and call?

function f() {
if (3 > 5) {
10
} else {
20
}
}
f()

You might think it also returns 20, but that’s not the case. This function-based approach instead returns undefined. If you think about it, the constants 10 and 20 are effectively dead code since they’re executed without side-effects (not assigned to a variable), and they’re not explicitly returned, so the result will immediately be discarded.

Unlike some languages, JavaScript does not return the value of the then or else code paths as the value returned by the if statement itself. That is, the following code will fail:

> const max = if (a > b) { a } else { b }
^^
Uncaught SyntaxError: Unexpected token 'if'

Additionally, JavaScript does not support the implicit return of the last expression in a statement:

> function f(a, b) { a + b }
> f(10, 20)
undefined

Although many people think of JavaScript as a great language for functional programming, these examples demonstrate some weaknesses in that vision. As an aside, using the ternary ? : operator solves the first limitation, and lambda/arrow functions solve the second case, so at least some aspects of JavaScript allow functional style.

Back to our problem — why did our first example return 20, while the second example (the same code inside a function) gave us undefined? Let’s investigate.

Rewriting the Immediate Code

Our example worked because the JavaScript REPL, as well as the built-in eval feature, both modify the standard JavaScript semantics to be more functional in nature. Let’s face it, the REPL wouldn’t be useful if you couldn’t evaluate simple expressions. Consider what would happen if side-effect-free expressions were basically discarded:

> 1 + 1
undefined
> eval("1 + 1")
undefined

To make this practical, V8 rewrites JavaScript code (as long as it’s not inside a function) to actually return the value. Here’s our first example again, but with the code rewritten to include the necessary side-effects.

function f() {
let result = undefined
if (3 > 5) {
result = 10
} else {
result = 20
}
return result
}
f()

Note the use of the explicit result variable, with a single return statement at the end of the function. You might feel that adding return 10 and return 20 statements would be more efficient, but consider the following example:

if (3 > 5) {
10
} else {
20
25
}

The answer should be 25, but rewriting to use return 20; return 25; would incorrectly return 20.

Similarly, an upfront result = undefined statement might be necessary if there’s a path where no valid expression will be seen:

function f() {
let result = undefined
if (3 > 5) {
result = 10
}
return result
}
f()

In this case, the lack of an else clause results in undefined being returned if the false path is taken.

How V8 Rewrites the Abstract Syntax Tree

As we saw in previous blog posts, V8 encapsulates all standalone code (such as 1 + 1) into an implicit function (with no parameters and no properties). This allows V8 to use the FunctionLiteral AST node as the basic unit of compilation in a consistent way across the V8 engine.

However, simply inserting the standalone code into a function body causes the computed values to be lost, and undefined to be returned. To avoid this, V8 rewrites the AST using the Rewriter::Rewrite() method (see src/parsing/rewriter.cc) to assign expressions to a temporary result variable, then automatically inserts a return statement at the end of the function body.

In terms of the actual AST changes required for our earlier if example, Rewrite() takes the following high-level steps:

  1. Allocate a new temporary variable to contain the function’s result. This will be named .result, with a period at the start of the name so it won’t conflict with any user-defined variable names.
  2. For each child of the IfStatement AST node (the true path and the false path), search backward through the list of child statements and insert a new Assignment AST node for the last statement that creates a value. We’ll learn more about this shortly.
  3. At the end of the main FunctionLiteral's list of child statements, add a new ReturnStatement node to return the value of the temporary variable (.result).

As you’d expect, the end result will be:

function f() {
let .result = undefined
if (3 > 5) {
.result = 10
} else {
.result = 20
}
return .result
}

Let’s examine this process in more detail.

Creating the Temporary Variable

The rewriter code starts in the Rewriter::Rewrite() method, which calls RewriteBody() with the list of statements (aka “body”) that appears as the child of the FunctionLiteral AST node.

Other than a few sanity checks, the most important step here is to create the temporary variable, .result, using the following code:

Variable* result = 
scope->AsDeclarationScope()->
NewTemporary(info->ast_value_factory()->dot_result_string());

Once created, the temporary variable and the function’s body of statements are passed into a new instance of the Processor class, which proceeds to rewrite the AST to include the necessary variable assignments. This is all done by calling Processor::Process().

Visiting the AST

The Processor class is a subclass of AstVisitor which provides common “tree walking” functionality for traversing the AST in various ways. For example, AstVisitor<AstPrinter> is used when pretty-printing the AST (see the --print-ast option), and AstVisitor<BytecodeGenerator> is used when generating executable byte codes. In our case, AstVisitor<Processor> is used to traverse the AST to add the .result = statements.

AstVisitor and its subclasses all have a Visit() method that takes an AstNode * as input. This method determines the type of that AstNode, then dispatches to the appropriate tree-walker method. For example, if the AST node has type ExpressionStatement, then the VisitExpressionStatement() method is called. Likewise, seeing an IfStatement AST node results in the VisitIfStatement() method being called.

Inside each of the tree-walker methods, the code does whatever is necessary to process the information in that node. This likely involves a recursive call to Visit() to walk through the child AST nodes. For example, VisitIfStatement() calls Visit() on each of its children, to visit the true statement body, and then the else statement body, if it exists.

As you can see from eye-balling the src/parsing/rewriter.cc file, there are numerous VisitX() methods implemented, with each one handling a different type of AST node. Not all AST nodes have a corresponding VisitX() method though, but anything required for rewriting our AST will be present.

Walking Backwards Through Blocks

One interesting part of the rewriting algorithm is that it intelligently considers when to add the .result = assignment. In the following contrived example, we can see that only 10 and 25 need to be assigned to the .result variable, whereas assigning 20 would be pointless.

if (3 > 5) {
10
} else {
20
25
}

Our AST traversal code therefore walks backwards through the list of statements, only assigning the final expression to the .result variable. However, this assumes that the block is not “breakable” in the sense that the flow of control could leave the block earlier than the last statement.

For example, here’s another contrived example:

while (true) { 
10
break
20
}

In this case, if we had stopped after setting .result = 20, we’d miss the fact that the code exits the loop before getting to that point, so .result = 10 must also be assigned.

Inside the Processor::Process() method, the algorithm walks backward through the list of statements in the FunctionLiteral's body. It calls Visit() on each statement, then replaces the statement with the rewritten version (in the case where .result = was added). If rewriting wasn’t necessary, the corresponding VisitX() method simply sets replacement_ to be the original statement, unchanged.

Here’s the code:

void Processor::Process(ZonePtrList<Statement>* statements) {
for (int i = statements->length() - 1;
i >= 0 && (breakable_ || !is_set_); --i) {
Visit(statements->at(i));
statements->Set(i, replacement_);
}
}

Note the use of is_set_ and breakable_ variables in the loop. These ensure the loop terminates as soon as one of the statements (starting at the end of the list) is rewritten, unless the block of statements could potentially contain a break statement.

Assigning Values to the Temporary Variable

Let’s now see how each individual statement is rewritten. First, note that constants like 10 or 20 have type ExpressionStatement when they appear in the FunctionLiteral's statement body.

The following VisitExpressionStatement() method is invoked whenever an ExpressionStatement node is seen:

void Processor::VisitExpressionStatement(ExpressionStatement* node){
// Rewrite : <x>; -> .result = <x>;
if (!is_set_) {
node->set_expression(SetResult(node->expression()));
is_set_ = true;
}
replacement_ = node;
}

Note that the code only has an effect if is_set_ is false, indicating that no value has yet been assigned to .result in the current statement body (working backwards from the last statement, to the first). In this case, SetResult() is called to insert a new Assignment AST node into the AST.

// Returns ".result = value"
Expression* SetResult(Expression* value) {
result_assigned_ = true;

VariableProxy* result_proxy =
factory()->NewVariableProxy(result_);

return factory()->NewAssignment(
Token::ASSIGN, result_proxy, value, kNoSourcePosition);
}

The NewAssignment() method creates a new Assignment AST node, with a VariableProxy AST node as the left child (the thing being assigned to), and the original expression as the right child.

A More Complex Example: If Statements

Now for a more complex scenario, the case where an IfStatement node appears in the AST. We must handle the case where both a true branch and a false branch are included, but also consider that the false branch is optional, therefore returning undefined.

void Processor::VisitIfStatement(IfStatement* node) {
// Rewrite both branches.
bool set_after = is_set_;
Visit(node->then_statement());
node->set_then_statement(replacement_);
bool set_in_then = is_set_;
is_set_ = set_after;
Visit(node->else_statement());
node->set_else_statement(replacement_);
replacement_ = set_in_then && is_set_ ?
node : AssignUndefinedBefore(node);
is_set_ = true;
}

In this code, we see two calls to Visit(), once for the true (then) case, and once for the false (else) case. In both scenarios, they replace the true or false statement lists with the rewritten lists, assuming anything was actually rewritten. Note how the is_set_ variable is reset back to its original value before evaluating the false case.

The replacement_ variable is conditionally updated depending on whether both the true and false branches generated values. If they weren’t both used, the entire IfStatement AST node is prepended with an assignment of .result = undefined, so as to account for the path where no expression value was assigned to .result.

Finally, the is_set_ flag is always set true at the end, indicating that a value must have been set at this point in the code. Either the value came from one of the two branches, or the if statement will have returned undefined if there was no else case.

Adding the Final Return

To finish up the rewriting process, at the end of Rewriter::RewriteBody, the following code adds a ReturnStatement AST node at the end of the FunctionLiteral's statement list.

Statement* result_statement =
processor.factory()->NewReturnStatement(result_value, pos);
body->Add(result_statement, info->zone());

Back to Our 1 + 1 Example

Now that we understand the concept of rewriting the AST, and why it’s necessary, let’s return back to our over-arching story — how JavaScript calculates 1 + 1. It was necessary to branch off to give examples using if statements, otherwise it wouldn’t be clear why rewriting the AST was necessary.

If you recall from the previous blog post, we’ve already used constant folding to simplify our 1 + 1 expression into the even simpler 2 expression. Here’s the AST we ended up with:

Notice how V8 converted the simple expression into a function definition (with no arguments or properties), implying that the FunctionLiteral class is the basic unit of compilation.

Next, the Rewriter class modifies our code to:

function() {
let .result = 2
return .result
}

which gives us the following AST:

This function is now ready for compilation, and eventually execution. The result can be displayed on the console, or returned from the eval command.

Next Time…

In the next blog post, we’ll start with this rewritten AST, then generate byte codes. There’s a lot of complexity in generating byte codes, so it’ll surely be another detailed blog post.

--

--