Making Languages

Part 3: Parse Trees

Christopher Pitt
Making Languages
Published in
3 min readOct 13, 2014

--

← Back | Forward →

A flat stream of tokens is better than a meaningless string of characters. Yet it’s still not all that useful for defining behaviour. Programming languages are hierarchical structures. Each operation is of a defined type and has a series of child nodes which form the operands. Consider the simple example we’ve used so far:

This equation can be represented as a hierarchical operation. An addition statement with a left-hand side and a right-hand side. These contain numerical values. The more complex the language construct, the more clear their hierarchical structure becomes:

This loop structure has an underlying structure: while (E) { E }. Seen as a series of tokens; this construct might be easy to miss. It’s just a combination of many tokens, beginning with while followed by whitespace etc.

A Parse Tree is what a compiler makes when it turns a series of tokens into a hierarchical structure.

Creating Parse Trees

We can create a class to replace groups of tokens with nested structures:

This is from Parser.php

We begin by stripping the whitespace tokens. Removing them makes parsing easier to understand, for the moment. We then try to identify operations in the token array:

This is from Parser.php

This gives us a nested array of operations, but it doesn’t tell us whether all the tokens formed part of an operation or not. For that, we’ll need to create another array to store the operations so we can isolate the unprocessed tokens:

This is from Parser.php

Now, all non-addition operations will remain in $tokens so we can identify parse errors. The complete function resembles:

This is from Parser.php

Given this, and a list of tokens, we can see more complex structures emerging from the code string:

← Back | Forward →

--

--