Leveling Up One’s Parsing Game With ASTs

Vaidehi Joshi
Dec 5, 2017 · 13 min read
Leveling up one’s parsing game with ASTs!

From concrete to abstract

Every good quest starts with a solid foundation, and our mission to demystify this structure should begin in the exact same way: with a definition, of course!

Abstract syntax tree: a definition

By contrast, an AST only contains the information related to analyzing the source text, and skips any other extra content that is used while parsing the text.

This distinction starts to make a whole lot more sense if we focus in on the “abstractness” of an AST.

Concrete versus abstract syntax trees
Revisiting the events leading up to parsing!
The syntax analysis phase generates the parse tree

We can think of the parts of the compiler as good friends, who all depend on each other to make sure that our code is correctly transformed from a text or file into a parse tree.

But back to our original question: where does the abstract syntax tree fit into this friend group? Well, in order to answer that question, it helps to understand the need for an AST in the first place.

Condensing one tree into another

Okay, so now we have two trees to keep straight in our heads. We already had a parse tree, and how there’s yet another data structure to learn! And apparently, this AST data structure is just a simplified parse tree. So, why do we need it? What even is the point of it?

What does it mean to actually represent a program by its most distinct parts?

As it turns out, sometimes all of the distinct parts of a program actually aren’t all that useful to us all the time.

Parse tree can often be super verbose.
Compressing a parse tree allows us to avoid redundancy.
Superfluous information that is of no use to us can be removed from a parse tree.
An AST abstracts away from the concrete syntax.
An AST is an abstract representation of a source text.
  1. An AST will have collapsed version of what would otherwise appear as single-successor nodes; it will never contain “chains” of nodes with a single child.
  2. Finally, any operator tokens (such as +, -, x, and /) will become internal (parent) nodes in the tree, rather than the leaves which terminate in a parse tree.

It would stand to reason, then, that if an AST is a compacted version of a parse tree, we can only really create an abstract syntax tree if we have the things to build a parse tree to begin with!

This is, indeed, how the abstract syntax tree fits into the larger compilation process. An AST has a direct connection to the parse trees that we have already learned about, while simultaneously relying upon the lexer to do its job before an AST can ever be created.

An AST is always the output of the parser.

Anatomy of an AST

Now that we know that the abstract syntax tree is important (but not necessarily intimidating!), we can start to dissect it a tiny bit more. An interesting aspect about how the AST is constructed has to do with the nodes of this tree.

The anatomy of an AST node.
A simplified visualization of our AST expression.
A code visualization of our AST expression, using JavaScript.
Building an AST can be complex sometimes.

Building an AST directly can be tricky, since the parser has to not only find the tokens and represent them correctly, but also decide which tokens matter to us, and which ones don’t.

In compiler design, the AST ends up being super important for more than one reason. Yes, it can be tricky to construct (and probably easy to mess up), but also, it’s the last and final result of the lexical and syntax analysis phases combined! The lexical and syntax analysis phases are often jointly called the analysis phase, or the front-end of the compiler.

The inner workings of the front-end of our compiler.

Resources

There are a whole lot of resources out there on ASTs, in a variety of languages. Knowing where to start can be tricky, especially if you’re looking to learn more. Below are a few beginner-friendly resources that dive into a whole lot more detail without being too overhwelming. Happy asbtracting!

  1. What’s the difference between parse trees and abstract syntax trees?, StackOverflow
  2. Abstract vs. Concrete Syntax Trees, Eli Bendersky
  3. Abstract Syntax Trees, Professor Stephen A. Edwards
  4. Abstract Syntax Trees & Top-Down Parsing, Professor Konstantinos (Kostis) Sagonas
  5. Lexical and Syntax Analysis of Programming Languages, Chris Northwood
  6. AST Explorer, Felix Kling

basecs

Exploring the basics of computer science, every Monday, for a year.

Vaidehi Joshi

Written by

Writing words, writing code. Sometimes doing both at once.

basecs

basecs

Exploring the basics of computer science, every Monday, for a year.