Implementing a Programming Language in Swift — Part 4: The Parser

This is the fourth part in a tutorial series on “Writing a Programming Language in Swift.” Be sure to check out Part 3.

Language parsing is widely considered something only experienced programmers can do. This is a common misunderstanding found in almost any field.

Certainly, being a core contributor to a widely used language such as Swift, indicates that you must have a lot of experience in the subject. But that goes with every type of programming and shouldn’t scare us away from creating a parser of our own.

What is a Parser?

The main objectives of a Parser is to convert a list the tokens generated by our Lexer into a data structure our language’s syntax, usually this “data structure” is a tree like structure, such as an Abstract Syntax Tree. Like with everything within computer science it’s best to look to wikipedia (:]) for definition of such a tree.

An abstract syntax tree is a tree representation of the abstract syntactic structure of source code written in a programming language.

Wow, that’s a lot of badly defined concepts. Let’s break it down:

  • Abstract — As it turns out, we don’t care about all of the tokens that the parser receives. Consider the expression 2 * (2 + 3), here the tree doesn’t really need information on the opening and closing parentheses. Just that the priority of the expression within them. The parentheses have been abstracted out.
  • Syntactic structure — In Swift terms, this generally means any structure struct, enum or class, representing how the source code is formed in our language’s syntax.

Top Down Parsing
There are many methods to implementing parsers, but the one we will look at today is top down parsing.

In top down parsing, highest level structures of the syntax syntax tree are generated before moving on to more detailed parts. It is generally concidered a slower technique than its counterpart (bottom up), but is much easier to grasp, which makes it perfect for this tutorial.

Recursive Decent Parsing
A very popular method for top down parsing is called recursive decent parsing. This popularity is no coincidence, as they’re simply the easiest to implement.

As you probably assumed, this simplicity comes at a cost and you would be right, as they are also the least performant. But we will select the recursive decent method anyways for the sake of keeping this tutorial as simple as possible.

Implementation

For our Parser, let’s start with simple class definition:

class Parser {
    let tokens: [Token]
    var index = 0
    init(tokens: [Token]) {
        self.tokens = tokens
    }
}

Before moving to its full implementation its good to recap on our grammar:

E ⇒ <Number> | <Number> <Operator> <E> | ( <E> )
Operator ⇒ <Plus> | <Minus> | <Multiplication> | <Division>
Number ⇒ <Decimal>.<Decimal> | <Decimal>
Decimal ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | <Decimal><Decimal>
--------
Example:
--------
       *
/ \
2 (+)
/ | \
1 + 3

To make our Parser as declarative as possible, we need a couple of helpers. For one, we are dealing with operations that differ in precedence, e.g. expressions surrounded by () should be evaluated before * and /, * and / should be evaluated before + and -, etc.

To to take these kinds of properties into account when parsing, we will add some helper structs, around each major Node in our AST. For our VERY simple language, the only such structs will be InfixOperation and Number, which will just be represented by Float.

protocol Node {}
extension Float: Node {}
struct InfixOperation: Node {
    let op: Operator
    let lhs: Node
    let rhs: Node
    var precedence: Int {
        switch self {
        case .minus, .plus: return 10
        case .times, .divideBy: return 20      
        }
    }
}

Now we are ready for a more complete version of our Parser, don’t worry, if you don’t understand the code right away. I’ll walk you through it.

class Parser {
    enum Error: Swift.Error {
        case expectedNumber
        case expectedOperator
        case expectedExpression
        case expected(String)
    }
    let tokens: [Token]
    var index = 0
    var canPop: Bool {
        return index < tokens.count
    }
    init(tokens: [Token]) {
        self.tokens = tokens
    }
    func peek() -> Token {
        return tokens[index]
    }

    func peekPrecedence() throws -> Int {
        guard canPop, case let .op(op) = peek() else {

return -1
        }
        return op.precedence
    }
    func popToken() -> Token {
        let token = tokens[index]
        index += 1
        return token
    }
    func parseNumber() throws -> Float {
        guard case let .number(float) = popToken() else {
            throw Error.expectedNumber
        }
        return float
    }
    func parseParens() throws -> Node {
        guard case .parensOpen = popToken() else {
            throw Error.expected("(")
        }
        let expressionNode = try parse()
        guard case .parensClose = popToken() else {
            throw Error.expected("(")
        }
        return expressionNode
    }
    func parseValue() throws -> Node {
        switch (peek()) {
        case .number:
             return try parseNumber()
         case .parensOpen:
             return try parseParens()
         default:
              throw Error.expected("<Expression>")
         }
    }
    func parseInfixOperation(node: Node,
nodePrecedence: Int = 0) throws -> Node {
        // TODO: IMPLEMENT
    }
    func parse() throws -> Node {
        guard canPop else { throw Error.expectedExpression }
        let node = try parseValue()
        return try parseInfixOperation(node: node)
    }
}

All the implemented parts so far are fairly trivial:

  • Our Parser class is initialized with tokens but also has a property called index which we use to know the current location of the parsing phase.
  • peekToken() — A way of checking the element at the current location without incrementing the index.
  • peekPrecedence() — A method we use to get the current token’s precedence. This requires the current token to be an InfixOperator.
  • popToken() — A way of checking the element at the current location, but ultimately incrementing the index.
  • parseValue() — A way of getting either a number or a parenthesized expression.
  • parseInfixOperation(node:) — This is where the magic will happen. This method takes in a node representing the left-hand side of an infix operation and should try to parse an operator followed by the right-hand side. If no expression is found on the right-hand side is found it should throw an error and if no operator is found it should simply return node.
  • parse() — With all of the aforementioned helpers it becomes very easy to implement the top level parsing method. All we have to do is extract a value (parenthesized expression or number) and call our parseInfixOperation(node:) with the extracted value as the left-hand side node parameter.

Like I mentioned, the real magic is actually inside our parseInfixOperation(node:) method. It might look simple, but for us to handle precedence properly we will have to jump a few hoops.

Don’t worry. I’ll make it easy to understand.

The function will be an implementation of method to writing a precedence parser called Precedence Climbing Method (wiki) .

Here goes:

func parseInfixOperation(node: Node, nodePrecedence: Int = 0) throws -> Node {
    var leftNode = node
    while let precedence = try peekPrecedence() as? Int,
precedence >= nodePrecedence {
        guard case let .op(op) = popToken() else {
            throw Error.expectedOperator
        }
        var rightNode = try parseValue()
        let nextPrecedence = try peekPrecedence()
        if precedence < nextPrecedence {
            rightNode = try parseInfixOperation(node: rightNode, nodePrecedence: precedence + 1)
        }
        leftNode = InfixOperation(op: op, lhs: leftNode, rhs: rightNode)
    }
    return leftNode
}

Here’s what’s happening:

  • If the next token is not an operator, or has a lower precedence than the current token, we’re done.
  • Consume and read the next value which should be a Float initially.
  • If the operator after the next value has a higher precedence, recursively parse the all tokens starting at the next value .
  • Form a sub-tree (in our Abstract Syntax Tree) with the current operator InfixOperation(op: op, lhs: leftNode, rhs: rightNode).

Next Steps

Stay tuned for next tutorial, where we’ll continue to implement the actual traversal of our AST and finally see our interpreter up and running.

If you found this topic exciting, I suggest you continue reading up on interpreter/compiler design. For this there are two very interesting resources:


I hope you guys found this tutorial interesting and as always, don’t be shy to give feedback e.g. by clapping or writing a response.

P.S. Remember that you can follow me here on Medium or twitter for notifications on future tutorials.

Until next time!