Writing Lispex, a Lisp interpreter in Elixir

Faraz Haider
11 min readJul 3, 2018

--

Photo by Katherine Hanlon on Unsplash

Motive

I started working with Elixir a few months back but never fully delved deep into the language’s intricacies. Elixir is a fairly recent language which runs on BEAM, the same VM which runs Erlang. So it inherits all the properties which make Erlang great like fault tolerance, high availability and distributed computing. The only barrier to entry in Erlang for most developers was it’s ecosystem and the language’s syntax. Elixir addresses these issues very elegantly with a modern syntax and a complete development toolchain.

In my learning process I realized writing a compiler/interpreter for some simple language would be the best way to learn and apply most of the constructs Elixir has to offer. I found Peter Norvig’s article on writing a Lisp interpreter in Python to be a very excellent starting point. Lisp is a very simple language as it follows a coherent structure. But there exists a plethora of Lisp variants, Scheme being one of the simplest, having very few constructs.

If you are new to Elixir going through this article you’ll see how most of the constructs in Elixir like Pattern Matching, Piping and List processing make your life easier as a programmer.

You can check out the source code and the documentation for reference.

Structure of Lisp ( Scheme Variant)

In Lisp, every statement is modeled as a list with 2 or 3 elements. The first element could be any of these : a language construct, an operator or a user defined function. The second and third parameters are always the arguments to the first element and these parameters could themselves be a list in itself thereby creating a recursive structure.

For example let’s find the max of two numbers in a language like C and in Scheme.

C

if( x > y) {
max = x;
}
else {
max = y;
}

Scheme

(begin (if (> x y) (set! max x) (set! max y)) 

C has a bigger set of constructs with things like line terminators, different types of brackets etc, while Scheme’s syntax is much more uniform.

In Scheme everything is an expression, numbers and symbols are called atomic expressions everything else being a list expression.

Interpreter Process

To write an interpreter for any language there are two main steps which are followed : Parsing and Evaluation.

Parsing

At the parsing stage the source program is broken down into a list of keywords after which an Abstract Syntax Tree (AST) of that program is created. AST closely mirrors the nested structure of the language and can be fed to the evaluation step as the input.

Abstract Syntax tree to find out max of two numbers in a C like language.

Evaluation

In this step the AST is processed according to the semantic rules of the language. For example the operator > means find the greater of the two child nodes in the AST.

Implementation of the Parser in Elixir

The parser is made up of two parts, the tokenizer breaks the code into keywords and an AST maker.

  • Tokenizer

In the example given below every single nested or non-nested Scheme Expression is separated by ( ) .

(begin (if (> x y) (set! max x) (set! max y))

We have to tokenize the source program by ( ) so let’s do that.

There’s a lot going on here so let’s unpack it: def is an Elixir macro to declare a named function. Macros are a very powerful construct used for metaprogramming in Elixir which won’t be in the scope for this article. Functions in Elixir are identified by their name and arity. Arity of a function is the number of arguments it takes. The tokenize function takes a single argument str so tokenize is identified as tokenize/1 .

Elixir also provides helpful string handling functions in String module. Modules in Elixir are a group of functions. A module can be defined using defmodule macro. We’ll later add more functions in our parsing step and group those functions in a Parse module.

Now for the tokenizing step we have to ensure there is a minimum of a single whitespace between keywords so we space out the brackets. |> is called the pipe operator and it works similar to the Unix pipe. The output of the first function is piped as the first argument to the next function. So str is passed as the first argument to String.replace/3 . Finally String.split/2 gets the spaced out string, String.split/1 has a second default argument which is a whitespace ' ' and it returns a list of Strings. List is one of the enumerable types in Elixir, some of the other being Tuple and Map . So for the above example code, the output of tokenize/1 would be

[ "(", "begin", "(", "if", "(", ">", "x", "y", ")", "(", "set!", "max", "x", ")", "(", "set!", "max", "y", ")", ")" ]
  • AST Maker

Before we delve into this step we need to know more about Lists, pattern matching and recursion in Elixir.

Understanding Lists

Lists in Elixir can be defined by enclosing comma separated values in square brackets.

list = [ 1, 2, 3]
[1, 2, 3]
length(list)
3

Internally in Elixir a list is represented as a linked list. This has some subtle effects such as finding the length of the list is a linear operation now.

We can concatenate two lists using ++ operator

[4] ++ list
[4, 1, 2, 3]
list ++ [4]
[1, 2, 3, 4]

Prepending to a list occurs in constant time while appending to a list would take linear time.

A list in Elixir can be divided into a head and tail similar to a linked list using | operator.

[head | tail] = listhead
1
tail
[2, 3]

head will have 1 and tail would itself be a list of 2 elements [2, 3]

Elixir provides helper functions hd/1 and tl/1 which find the head and tail of the list provided as the argument.

hd(list)
1
tl(list)
[2, 3]

Elixir also provides a dedicated List module which has plenty of functions to manipulate lists.

List.first(list)
1
List.last(list)
3

Atoms in Elixir

An atom is a constant who’s name represents its value. They can be defined by prepending a colon : to a name. The boolean values true and false are in fact Elixir atoms. Elixir provides a helper function to check for an atom.

my_atom = :johnis_atom(my_atom)
true
is_atom(false)
true

Pattern Matching Demsytified

In Elixir = operator is a match operator and it compares both the left hand and the right hand side values and if any of the sides don’t match it throws an error.

[1, 2, 3] = list
[1, 2, 3]
[1, 2, 5] = list
** (MatchError) no match of right hand side value: [1, 2, 3]

This might look similar to == operator but that operator only does a comparison. The match operator = here does a comparison as well as binds a value to a variable but the variable binding can only take place at the left side of the match operator. This makes for a very interesting use case in de-structuring complex types such as a List or a Tuple .

[first, mid, last ] = list
[1, 2, 3]
[first, mid, last]
[1, 2, 3]
[1, 2, _] = list
[1, 2, 3]

In the third example the last element is underscore _ . The underscore is a special construct in Elixir which will match to anything and will not bind the value to an element. It is commonly used in pattern matching where an an element has to be ignored.

Now to understand pattern matching let’s take an example where we query an API to fetch list of three students and store them in a list called students. Based on the elements of the list we have to do specific tasks.

students = fetchCoolKids()doCoolStuff(students)def doCoolStuff(["Stu", "Alan", "_"]) do
IO.puts "We don't need a third person."
end
def doCoolStuff(["Doug", _, _]) do
IO.puts "I don't need nobody."
end
# Scenario 1
students = ["Doug", "Alan", "Mark"]
I don't need nobody
#Scenario 2
students = ["Stu", "Alan", "Mark"]
We don't need a third person
#Scenario 3
students = ["Mark", "Stu", "Alan"]
(FunctionClauseError) no function clause matching in doCoolStuff/1

Type Definitions

Scheme has few types of objects, this is how we are going to represent them in Elixir

  • Symbol -> Implemented as an Atom . begin => :begin
  • Atom -> Implemented as an Atom or a Number .
  • Number -> Implemented as Float or Integer .
  • List -> Implemented as a List . (1, 2, 3) => [1, 2, 3]
  • Expression -> implemented either as an Atom or a List

Creating the AST

Below is the code for creating the AST with some helpful comments.

I have used recursion to go through all the tokens as there is no concept of while loop in Elixir because of immutability. parse/2 takes the list of tokens as the first argument and an accumulator acc as the second. [head | tail] form has been used to represent the list.

To understand the above code better lets take an example lisp code

(begin (define r 10) (* r r))

The tokenized form of the above lisp code would be this

["(", "begin", "(", "define", "r", "10", ")", "(", "*", "r", "r", ")", ")"]

We would pass the above list of tokens and an empty accumulator list to our parse/2 function. Pattern matching will take care of the specific function to be called. What we are trying to achieve is to make a list of operator and arguments every time we encounter '(' and ') . In an AST form the non-leaf nodes will always be a function or an operator and leaf nodes would be a symbol ( number or a string ).

Below is the output of the parsing stage

[ :begin, [ :define, :r, 10 ], [ :*, :r, :r ] ]
AST Representation

Evaluation of the AST in Elixir

To evaluate the Lisp code we have to traverse through the AST and evaluate each statement recursively. To the evaluate the code we need an Environment, a map which stores the language constructs and user defined variables. Keys of this map are Lisp constructs like ‘define’, ‘begin’ etc and values are anonymous functions which perform the required task. We’ll explore all this further before discussing more about evaluation.

Maps

Map is an enumerable type in Elixir. It is a key value pair, where there’s no restriction on the key type. A map can be created using %{} syntax and the key value pairs can be represented as key => value .

id_map = %{"Name" => "John", "Age" => 45, "Citizen" => "USA"}Map.get(id_map,"Name")
"John"
Map.put(id_map,"DateOfBirth","03-03-1973")%{"Name" => "John", "DateOfBirth" => "03-03-1973" "Age" => 45, "Citizen" => "USA"}

Anonymous Functions

We have previously seen how to define named functions using the def macro. We can also define functions without a name called Anonymous functions in Elixir and lambda functions in some other languages.

multiply = fn (a, b) -> a * b endmultiply.(3,4)
12
toss = fn "heads" -> "England won the toss."
"tails" -> "Australia won the toss."
_ -> "Invalid toss."
end
toss.("heads")
England won the toss

We define an anonymous function using the fn keyword and -> operator. These functions can be invoked by using . operator and they can also have multiple bodies invoked using pattern matching.

There is one more way to represent anonymous functions in Elixir.

multiply = &( &1 * &2 )

This type of function definition can neither have named parameters nor multiple function bodies. & is called the capture operator. &1 and &2 are the first and second argument analogous to a and b in the fn definition.

Scheme constructs implemented in Lispex

  • Definition : (define symbol exp)
  • Assignment : (set! symbol exp)
  • Variable reference: symbol
  • Constant Literal: number
  • Conditional: ( if test conseq alt)
  • Procedure: (proc args…)

Procedure is anything other than if, define and set! . Examples are all the arithmetic and logical operators, math functions, list operations such as car, cdr, cons .

Environment map

The above scheme constructs have been implemented through a map in Elixir, a gist of that map is given below.

Environment Map.

This Environment map is defined in an Env module. The module contains functions for inserting and fetching values from the map.

The get/2 function seems a little complex. I will break it down for your understanding. Similar to imperative languages, Scheme has a local and a global scope. The global scope is the first set of ( ) . A local scope starts with a new ( ) inside it. This can be nested to make any number of scopes, commonly referred to as parent and child scopes.

So when evaluating the Scheme code, for every instance we encounter a new ( ) we create a new environment, referring the parent environment by an:outer key. In get/2 we recursively search for a key until we reach the global environment.

Case statement and Function guards

Case statement allows us to pattern match a value and execute the statements corresponding to that match.

case [1, 3, 5] do
[_, 3, 5] ->
"Match at 1"
[_, _, 5] ->
"Match at 2"
_ ->
"No match found"
end

Function guards are an addition to the pattern matching. A pattern match only happens when the guard condition is satisfied.

Guards can be used in named and anonymous functions and case statements. Guards are defined by using a when keyword followed by an expression.

def parse(x) when is_list(x) do
IO.puts "List"
end

Type checks such as is_list/1 , is_atom/1 etc can be used with guards in addition to a host of other operators.

Evaluation of AST

We recursively evaluate the AST, evaluating each constructs and its respective arguments.

When the scheme expression is :

  • not a list and is an atom, fetch the value for that atom from the environment.
  • not a list and is a number, return that number.
  • a list and the head is :if , evaluate the second element, if true evaluate the third element, if false evaluate the tail.
  • a list and the head is :define , then the second element is the symbol. Evaluate the tail and put its value in the Environment map with symbol as the key.
  • a list and the head is :set!, then the behavior is similar to :define except that the variable has to exist in the environment map.
  • a list and the head is anything other than :if, :set! ,or :define then the head is the procedure , and the tail is the list of arguments for that procedure. Evaluate each element in the tail, get the value of the procedure from the environment map.

Creating a REPL

Manually calling the interpret function gets tedious and we also the environment map state, so creating a REPL is the best way to mirror the Scheme interpreter.

Here’s an example of the Lispex REPL in action

iex> Lispex.repl
lispex> (begin (define a 4) (* a 5))
20
lispex> (if a<3 (* a 4) (* a 6))
24

All of the code given here is included in four modules called Lispex , Parse , Env and Eval .

Conclusion

I tried to give you a practical introduction to Elixir as it helps in digesting these concepts easily. The source code also has a lot of documentation which may be helpful if you find some functions a bit cryptic.

In my next article I’ll discuss about metaprogramming in Elixir and the Erlang OTP which would be accompanied by a practical problem.

References

https://hexdocs.pm/elixir

http://norvig.com/lispy.html

--

--