Javascript evaluator part 2: Parser and Basic evaluator

This is the second part of writing Javascript evaluator series. I’m going to talk about my project developing Javascript evaluator in Rust. This post is going to briefly introduce Parsing that’s build on top of the results from Lexer in the first post. Then I will cover elements of evaluation of abstract syntax tree (AST).

I would like to thank Jason Williams, since my parser is inspired by Boa parser. And looking at similar project in Rust definitely helped me a lot to recognize problems that I’m going to face.


Parsing

My parser is build on top of the result from Lexer that was introduced in the previous blog post. Just to recap quickly the result of lexing Javascript is array of tokens and looks like:

KFunction,
Identifier("name"),
LBrace,
Identifier("console"),
Dot
Identifier("log"),
LRound
String("hello"),
RRound,
Semicolon,
RBrace,

The parser was build using ECMAScript specification. ECMAScript is defined by Context free grammar with a few exceptions. Example rule for parsing white space looks like:

WhiteSpace::
<TAB>
<NBSP>
<SP>

Every line covers alternative for what is considered to be a white space. In this case it’s tab, space, non-breakable space. These are called terminals, can be visualized as a leaf of tree. When terminal is reached we have to consume one token from lexer and token has to mach terminal if no we try another rule. Grammar additionally uses non-terminals. Non-terminals are place holder for another rule which they are going to be substituted with, these are nodes of the tree. For example in grammar:

S::
Aa
A::
b
c

We define S and A as non-terminals and a, b, c as terminals. If we start in S first we rewrite non-terminal to either terminal b or c and then we continue by consuming a.


For my parser I used nom combinator library which uses syntax very similar to context free grammars. This means we have to pretty much rewrite ECMAScript standard using nom library. Let’s start by showing example how to implement one rule of ECMAScript in nom syntax.

named!(pub Statement<Input<InputWrapper>, node::Statement>, alt!(
BlockStatement |
VariableStatement |
DebuggerStatement |
...
));
fn DebuggerStatement(tokens: Input<InputWrapper>) -> IResult<Input<InputWrapper>, node::Statement> {
let debugger = is_token!(tokens, Token::KDebugger)?;
let semicolon = is_token!(debugger.0, Token::Semicolon)?;
Ok((
semicolon.0,
node::Statement::DebuggerStatement(node::DebuggerStatement {}),
))
}

This simplified example implements parser for Statement. Terminals are defined by Rust functions which consume tokens and produce part of the abstract syntax tree (AST). Javascript code consisting from var a = 1 would produce this AST

AST can be later used by evaluator to produce result of program.

Converting ECMAScript to nom is tedious, but quite straightforward. Resulting tree looks very similar to one defined by ESTree specification.

Evaluator

In this section I will cover my Javascript evaluator that we’ve been continuously building up to. My evaluator uses Parser and Lexer explained earlier. Evaluator is by far the most complex part of my project and today I will cover only basics that I had to consider when writing my evaluator.

Let’s start by creating an Enum with all possible Javascript types as defined by ECMAScript: Undefined, Null, Boolean, String, Symbol, Number, and Object

enum ValueData {
Null,
Undefined,
Boolean(bool),
String(String),
Number(Number),
Object(GcCell<ObjectData>),
Function(GcCell<Function>),
}

We’ll be seeing Gc and GcCell structs all over the place. This allows us to use garbage collector for our Javascript implementation. Thanks to Gc we can store data that can be accessed and mutated by different branches of code.


Another fundamental part of Javascript is scoping. Scope defines visibility of variables. By default script starts with only Global scope and its predefined properties. All statement expression have a scope in which they are defined and where they can be accessed. To defined a scope we will use a new structure Scope.

pub struct Scope {
parent: Option<*mut Scope>,
isolated: bool,
context: VarMap,
}
type VarMap = HashMap<String, var::Var>;

When creating and evaluator we initialize only Global scope which doesn’t have any parents. Any variable modifies current state. When evaluating any expression a state could be modified. However, in function calls we will have multiple levels of scope. It’s valid to use declaration from parent scopes and modify them. When resolving identifier we will have to traverse whole parent list until we find required variable or reach Global scope.

For scope we want to support three variable declarations: var, let, const. Let and const are simple we just have to define variable in the current scope. Var is a bit more complicated as standard specifies that variable is defined in closest function scope. We can have other scopes than function as curly brackets are expressions which are allowed in the most places. Function scope can be recognized from the isolated property in the scope structure. When resolving variable we start at the current scope and traverse up to Global scope. If variable is not in the Global scope we report evaluation error.

Variable definition inside scope consists only from value itself and type of variable declaration to check if modification and re-declaration is allowed. For variable we use this structure:

enum VarKind {
Var,
Let,
Const,
}
struct Var {
kind: VarKind,
value: Value,
}

This concludes today's post. Thank you for getting this far. Today we have covered briefly parsing of Javascript and introduction to Javascript evaluation. In the next post I will continue with describing all Javascript types, their evaluation and we will see how functions, if conditions and loops are implemented