Building a Scheme-to-WebAssembly Compiler: Part 3

In which we implement lambda functions and start discussing some of the drawbacks of our REPL, thus prompting a major refactor.

Sean Chen
Nov 4

Link to the previous post in the series in case you aren’t caught up.

Lambda functions are very useful in any programming language. It should come as no surprise then that when it comes to functional languages, they’re absolutely essential. So we definitely need to add the ability to execute them in our Scheme REPL. We want to be able to execute some code such as this:

(define double (lambda (x) (* x 2)))
(double 5) ;; => 10

Implementing Lambda Functions

We’ll start off by first adding a Lambda expression type to our Expression enum:

enum Expression:
    ...
    Lambda(LambdaExp)
    ...

A Lambda expression needs to know a few things in order to execute properly: its parameters, its body, and its enclosing scope. One of the highlight features of lambda expressions is that they can capture values from their parent scope and make use of them.
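For example, a lambda body should be able to refer to a binding from the scope it was defined in (here, the global scope); something like this should work once everything in this post is wired up:

(define y 10)
(define add-y (lambda (x) (+ x y)))
(add-y 5) ;; => 15, with y looked up in the parent scope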

The LambdaExp class will keep track of a lambda function’s parameters and body, both of which are Expression types themselves:

class LambdaExp:
    params: Expression,
    body: Expression,

Let’s first update our eval_keyword function so that it can also check for the lambda keyword:

def eval_keyword(
    expr: Expression,
    args: Array<Expression>,
    env: Env
) -> Option<Expression>:
    if typeof(expr) == Expression::Symbol:
        ...
        if expr == "lambda":
            return eval_lambda(args)
        ...

From there, the eval_lambda function will be responsible for pulling out the params and body from the expression and returning a new LambdaExp instance wrapped in a Lambda Expression:

def eval_lambda(args: Array<Expression>) -> Expression:
    if args.len() != 2:
        return Error("lambda function requires exactly two forms")

    params, body = args[0], args[1]
    return Expression::Lambda(
        LambdaExp {
            params: params,
            body: body
        }
    )

Let’s see what this might look like so far in Rust. The snippet below is a sketch that mirrors the pseudocode above; the names (Expression, LambdaExp) and the error handling are illustrative rather than exact:
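use std::rc::Rc;

// Mirrors the pseudocode above. Rc lets the params and body be
// shared cheaply when the lambda gets cloned around during eval.
#[derive(Clone)]
enum Expression {
    Symbol(String),
    Number(f64),
    List(Vec<Expression>),
    Lambda(LambdaExp),
}

#[derive(Clone)]
struct LambdaExp {
    params: Rc<Expression>,
    body: Rc<Expression>,
}

fn eval_lambda(args: &[Expression]) -> Result<Expression, String> {
    // a lambda form is exactly (lambda (params...) body)
    if args.len() != 2 {
        return Err("lambda function requires exactly two forms".to_string());
    }
    Ok(Expression::Lambda(LambdaExp {
        params: Rc::new(args[0].clone()),
        body: Rc::new(args[1].clone()),
    }))
}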

With that setup done, we still need to implement the ability for lambda functions to access their parent scope. We’ll do that by extending our Env:

class Env:
    operations: HashMap<String, Expression>,
    parent_scope: Option<Env> // might contain an outer scope

Then we’ll have a separate function that handles initializing an Env instance for lambda functions. This new Env instance will contain all of the operations that the lambda function needs in order to execute, as well as a reference to the parent scope that the lambda was initialized from:

def init_lambda_env(
    params: Expression,
    args: Array<Expression>,
    parent_scope: Env
) -> Env:
    // ensure that `params` is a List
    if typeof(params) != Expression::List:
        return Error("expected params to be a list")

    // fetch all the params into an Array of Expressions
    symbols = params.filter(lambda x: typeof(x) == Expression::Symbol)
    if symbols.len() != args.len():
        return Error("expected {} args, got {}", symbols.len(), args.len())

    evaluated_forms = args.map(lambda x: eval(x, parent_scope))

    // once we've evaluated the args, it's time to insert them
    // into a new Env HashMap
    operations = HashMap()

    // insert into Env with symbols as keys and forms as values
    for key, value in symbols.zip(evaluated_forms):
        operations.insert(key, value)

    return Env {
        operations: operations,
        parent_scope: parent_scope
    }

With all this in place, we can now update our eval function accordingly to handle lambdas:

def eval(expression: Expression, env: Env) -> Expression:
    ...
    if typeof(expression) == Expression::List:
        first, rest = expression[0], expression[1:]
        result = eval_keyword(first, rest, env)
        if result != None:
            return result
        else:
            first_eval = eval(first, env)
            ...
            if typeof(first_eval) == Expression::Lambda:
                params, body = first_eval.params, first_eval.body
                new_env = init_lambda_env(params, rest, env)
                return eval(body, new_env)
            ...

Nice! Here’s a sketch of what the Rust version might look like, building on the snippet above. The eval function here is just a stand-in for the full eval from earlier in the series, and the lifetime plumbing is illustrative:
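use std::collections::HashMap;

// Each Env now optionally points back at the scope it was created in.
struct Env<'a> {
    operations: HashMap<String, Expression>,
    parent_scope: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    // lookups fall through to the parent scope, which is what lets a
    // lambda body see bindings that live outside its own Env
    fn get(&self, key: &str) -> Option<Expression> {
        match self.operations.get(key) {
            Some(expr) => Some(expr.clone()),
            None => self.parent_scope.and_then(|parent| parent.get(key)),
        }
    }
}

// stand-in for the full eval from earlier parts of the series
fn eval(expr: &Expression, _env: &mut Env) -> Result<Expression, String> {
    Ok(expr.clone())
}

fn init_lambda_env<'a>(
    params: &Expression,
    args: &[Expression],
    parent_scope: &'a mut Env,
) -> Result<Env<'a>, String> {
    // ensure that `params` is a list of symbols
    let symbols: Vec<String> = match params {
        Expression::List(list) => list
            .iter()
            .map(|x| match x {
                Expression::Symbol(s) => Ok(s.clone()),
                _ => Err("expected params to be a list of symbols".to_string()),
            })
            .collect::<Result<_, _>>()?,
        _ => return Err("expected params to be a list".to_string()),
    };
    if symbols.len() != args.len() {
        return Err(format!("expected {} args, got {}", symbols.len(), args.len()));
    }

    // evaluate each arg in the parent scope, then bind it to its symbol
    let mut operations = HashMap::new();
    for (key, arg) in symbols.into_iter().zip(args.iter()) {
        operations.insert(key, eval(arg, parent_scope)?);
    }

    Ok(Env {
        operations,
        parent_scope: Some(parent_scope),
    })
}

// Inside eval's list handling, once `first_eval` turns out to be a
// lambda, we build the new scope and evaluate the body in it:
//
//     Expression::Lambda(lambda) => {
//         let mut new_env = init_lambda_env(&lambda.params, rest, env)?;
//         eval(&lambda.body, &mut new_env)
//     }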

With all that in place, our interpreter can now handle simple lambda functions!

Drawbacks of our REPL

After adding all of these features to our Scheme REPL, it’s a good time to take a step back and examine our work so far. While we’ve only implemented two operations, addition and subtraction, it would be relatively straightforward to add more, such as multiplication, division, minimum, maximum, absolute value, exponentiation, and rounding.

One rather constraining drawback lies in how we’re tokenizing input code: we simply split on whitespace. That makes our REPL brittle, since every token has to be separated by whitespace, meaning input Scheme code has to adhere to a pretty strict format.

Also, the end goal of this series is to build a compiler, and what we’ve built so far is an interpreter, not a compiler. Compilers don’t receive input code line-by-line and evaluate it. Instead, they receive the input code in its entirety (either as a single file or as multiple files) and then emit that same logic in a different language.

As of now, our implementation can’t do this. If we were to give it a file containing multiple lines of Scheme code, it wouldn’t be able to handle line breaks, tabs, and many of the other idiosyncratic artifacts that people usually leave in their code. In order to implement a compiler, our tokenization process (also called the “lexing” phase) needs to be significantly beefed up.

Starting Our Lexing Phase Over From Scratch

Since we’re scrapping the assumption that our compiler will always receive properly-formatted Scheme code as input and thus can no longer simply split on whitespace, we’re going to instead come at it from the other direction. In other words, our lexer is going to comb through the input code, pick out every single token that we care about, and add it to an array of tokens that represent the input code.

The first step is to revamp our Expression enum, which we’re renaming to Token:

enum Token:
    LParen, // left parenthesis
    RParen, // right parenthesis
    Symbol(String),
    Number(Float),
    Bool(Boolean),
    Chr(Char), // represents a single character
    Str(String) // represents a single string of multiple chars

With that in place, our lexer will kick off the lexing process. One thing to note here is that the input code is turned into an iterator, which is pretty trivial given that strings are iterable in most programming languages. We do this because lexing isn’t always a simple matter of looking at each character one by one: what we really care about are sequences of characters that, in conjunction with one another, encapsulate some meaning.

For example, if we want to be able to handle comments (which in Scheme are denoted with two semicolons, ;;), our lexer should skip all the characters up to the end of the current line as soon as it encounters a semicolon. Iterators give us slightly more fine-grained control over how we loop, which is a desirable trait in this case.
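As a taste of what that looks like in practice, here’s a sketch (in Rust, with illustrative names) of a comment-skipping helper built on a peekable character iterator. It peeks at the next character and only starts consuming once it knows it’s looking at a comment:

use std::iter::Peekable;
use std::str::Chars;

// returns true if it consumed a comment, so the caller knows to keep skipping
fn parse_comment(iter: &mut Peekable<Chars>) -> bool {
    if iter.peek() == Some(&';') {
        // consume everything up to (but not including) the end of the line
        while iter.peek().map_or(false, |&c| c != '\n') {
            iter.next();
        }
        return true;
    }
    false
}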

def tokenize(input: String) -> Array<Token>:
    tokens = []
    iter = input.iter() // turn input into an iterator

    // keep lexing so long as our iterator
    // hasn't reached the end yet
    while iter.peek():
        token = tokenize_single(iter)
        if token:
            tokens.push(token)

    return tokens

The tokenize_single function’s job, then, is to figure out whether the current character begins a token we care about. If it encounters comments or whitespace, it skips them and continues iterating:

def tokenize_single(iter: Iterator) -> Option<Token>:
    while parse_whitespace(iter) or parse_comment(iter):
        continue

    return parse_lparen(iter)
        .or_else(parse_rparen(iter))
        .or_else(parse_string(iter))
        .or_else(parse_hash(iter))
        .or_else(parse_symbol(iter))

Here’s a Rust sketch to accompany the above logic. Since the individual parse_* helpers are next post’s material, most of them are left as todo!() stubs here; only the trivial pieces are filled in:
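use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug)]
enum Token {
    LParen, // left parenthesis
    RParen, // right parenthesis
    Symbol(String),
    Number(f64),
    Bool(bool),
    Chr(char),   // a single character
    Str(String), // a string of multiple chars
}

fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    // peekable lets us inspect the next char without consuming it
    let mut iter = input.chars().peekable();
    while iter.peek().is_some() {
        if let Some(token) = tokenize_single(&mut iter) {
            tokens.push(token);
        }
    }
    tokens
}

fn tokenize_single(iter: &mut Peekable<Chars>) -> Option<Token> {
    // skip whitespace and comments before trying to read a real token
    while parse_whitespace(iter) || parse_comment(iter) {}
    // if skipping consumed the rest of the input, there's no token left
    iter.peek()?;

    parse_lparen(iter)
        .or_else(|| parse_rparen(iter))
        .or_else(|| parse_string(iter))
        .or_else(|| parse_hash(iter))
        .or_else(|| parse_symbol(iter))
}

fn parse_whitespace(iter: &mut Peekable<Chars>) -> bool {
    if iter.peek().map_or(false, |c| c.is_whitespace()) {
        iter.next();
        return true;
    }
    false
}

// same comment-skipping helper sketched earlier, repeated so this compiles
fn parse_comment(iter: &mut Peekable<Chars>) -> bool {
    if iter.peek() == Some(&';') {
        while iter.peek().map_or(false, |&c| c != '\n') {
            iter.next();
        }
        return true;
    }
    false
}

fn parse_lparen(iter: &mut Peekable<Chars>) -> Option<Token> {
    if iter.peek() == Some(&'(') {
        iter.next();
        return Some(Token::LParen);
    }
    None
}

fn parse_rparen(iter: &mut Peekable<Chars>) -> Option<Token> {
    if iter.peek() == Some(&')') {
        iter.next();
        return Some(Token::RParen);
    }
    None
}

// the rest of the helpers are the subject of the next post
fn parse_string(_iter: &mut Peekable<Chars>) -> Option<Token> { todo!() }
fn parse_hash(_iter: &mut Peekable<Chars>) -> Option<Token> { todo!() }
fn parse_symbol(_iter: &mut Peekable<Chars>) -> Option<Token> { todo!() }

With just the paren parsers in place, calling tokenize("(()) ;; comment") already comes back as [LParen, LParen, RParen, RParen], with the trailing comment skipped entirely.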

We’ll stop here for today and pick up next time by delving into each of the functions responsible for identifying and tokenizing individual tokens in input Scheme code. Once the lexer is complete, we’ll be able to lex an input source file containing Scheme code into an array of Tokens to pass along to the parser.

Post 4: In which we fix some issues with our interpreter and start adding more operations to it.
