Make a LISP in 200 Lines of Code

Eric Weinstein
11 min readSep 18, 2018

--

In this repl.it tutorial, we’ll build a little LISP in under 200 lines of JavaScript! Collaboration in open source software is important, so we’ll be building on work done by Anton Davydov and Mary Rose Cook.

While the deep details are beyond the scope of this tutorial, our program will handle tokenizing (turning text into a stream of characters for processing) and parsing (performing that processing) in order to convert LISP programs like (+ 1 (* 2 3)) into a form that JavaScript can understand.

We also won’t go too deeply into the syntax of LISP, but here’s a quick overview of some of the main differences between the language we’ll be creating and JavaScript:

  • Expressions are either primitives, like numbers and strings, or lists of primitives. Lists are delimited with parentheses.
  • In our version of LISP, functions are are created using lambda, are named using define, and are called by using the function name. For example, (define square (lambda (x) (* x x))) is equivalent to let square = function(x) { return x * x; };.
  • Operations use Polish notation, meaning that instead of writing 1 + 1, we write (+ 1 1). This looks confusing at first, but is actually convenient for lists of arguments, since we don’t have to repeat the operator: instead of 1 + 2 + 3 + 4 + 5, we can write (+ 1 2 3 4 5).

We’ll start by creating three functions to tokenize and parse our program:

  1. parse(), which doesn’t do much right now, but we’ll update it soon to parse the tokens it receives;
  2. tokenize(), which cuts a string up into characters, first by ensuring each token is surrounded by one space on either side in order to split on that whitespace, then removing any tokens that are purely whitespace; and
  3. read(), which converts LISP style lists, delimited by parentheses, into square-bracket arrays that JavaScript can understand.

We’ll also add two more functions, atom() and eval(), which our program will use to evaluate the set of tokens it receives and take different actions depending on what those tokens are.

Let’s get started! Create a new repl.it project and select Node.js as the environment. We’ll put all our code in a single file, index.js, starting with the following (make sure to try typing it out rather than copying and pasting):

class Lisplet {
parse(program) {
return this.read(this.tokenize(program));
}
tokenize(characters) {
return characters.
replace(/\s\s+/g, " ").
replace(/\(/g, " ( ").
replace(/\)/g, " ) ").
split(" ").
filter(t => " \t\n".indexOf(t) === -1);
}
read(tokens) {
if (tokens.length === 0) {
return;
}
// Grab the first token.
let token = tokens.shift();
if (token === "(") {
let list = [];
while (tokens[0] !== ")") {
list.push(this.read(tokens));
}
// Keep going (since we may have nested lists).
tokens.shift();
return list;
} else if (token === ")") {
throw new Error("Unexpected token ')'");
} else {
return token;
}
}
}
let lisplet = new Lisplet();

As mentioned, parse() doesn’t do much at the moment (it just calls read() on the result of tokenize()), so we’ll gloss over it for now. The tokenize() method takes the text we put in (say, (define pi 3.14159)), turns multiple spaces into single spaces, adds space around parentheses, and then splits on whitespace, removing any empty tokens.

If you lisplet.tokenize("(define pi 3.14159)");, your program should output the list of individual tokens that make up that string:

[ '(', 'define', 'pi', '3.14159', ')' ]

The read() method takes our list of tokens and builds JavaScript-style arrays (with square brackets) into LISP-style lists (with parentheses). Check it out by typing lisplet.read(lisplet.tokenize("(define pi 3.14159)"));:

[ 'define', 'pi', '3.14159' ]

This is only very slightly different from what we had before, but the crucial difference is that this has been transformed into something JavaScript understands: an array of strings!

Soon, we’ll see how to teach JavaScript how to understand what to do with this array, but first, we’ll want to add some basic functions to our programming environment so we can build interesting programs. We’ll keep it simple by adding twelve small functions: comparators to check if two thing are equal or not equal, if one is less than (or equal to) or greater than (or equal to) another; arithmetic operators to add, subtract, multiply, and divide; and two identities to handle true and false values. Go ahead and add this constructor() function inside our Lisplet class at the very top, just after the line class Lisplet {:

constructor() {
this.env = {
"==": function() {
let [a, b] = Array.from(arguments)[0];
return a === b;
},
"!=": function() {
let [a, b] = Array.from(arguments)[0];
return a !== b;
},
"<": function() {
let [a, b] = Array.from(arguments)[0];
return a < b;
},
"<=": function() {
let [a, b] = Array.from(arguments)[0];
return a <= b;
},
">": function() {
let [a, b] = Array.from(arguments)[0];
return a > b;
},
">=": function() {
let [a, b] = Array.from(arguments)[0];
return a >= b;
},
"+": function() { return Array.from(arguments)[0].reduce((a, e) => { return a + e; }); },
"-": function() { return Array.from(arguments)[0].reduce((a, e) => { return a - e; }); },
"*": function() { return Array.from(arguments)[0].reduce((a, e) => { return a * e; }); },
"/": function() { return Array.from(arguments)[0].reduce((a, e) => { return a / e; }); },
"true": true,
"false": false,
};
}

These methods will come in handy when we start evaluating the tokens our program reads later on. Before we do that, though, we’ll need to do one more bit of token processing by adding another method, atom():

atom(token) {
if (/\.\d+/.test(token)) {
return parseFloat(token);
} else if (/\d+/.test(token)) {
return parseInt(token, 10);
} else if (this.env[token] && typeof this.env[token] !== "function") {
return this.env[token];
} else {
return token.toString();
}
}

The atom() method handles primitives in our new language by matching tokens against regular expressions for integers and floating point numbers, as well as a test for primitives we may have defined in our environment, this.env (for example, when we defined pi as 3.14159). If we don’t see a number or a term previously defined, we just return the string representation of the token.

We’ll also need to update our read() method’s else branch to return this.atom(token); instead of just return token;. Our code should now look like this:

class Lisplet {
constructor() {
this.env = {
"==": function() {
let [a, b] = Array.from(arguments)[0];
return a === b;
},
"!=": function() {
let [a, b] = Array.from(arguments)[0];
return a !== b;
},
"<": function() {
let [a, b] = Array.from(arguments)[0];
return a < b;
},
"<=": function() {
let [a, b] = Array.from(arguments)[0];
return a <= b;
},
">": function() {
let [a, b] = Array.from(arguments)[0];
return a > b;
},
">=": function() {
let [a, b] = Array.from(arguments)[0];
return a >= b;
},
"+": function() { return Array.from(arguments)[0].reduce((a, e) => { return a + e; }); },
"-": function() { return Array.from(arguments)[0].reduce((a, e) => { return a - e; }); },
"*": function() { return Array.from(arguments)[0].reduce((a, e) => { return a * e; }); },
"/": function() { return Array.from(arguments)[0].reduce((a, e) => { return a / e; }); },
"true": true,
"false": false,
};
}
run(code) {
return this.eval(this.parse(code));
}
parse(program) {
return this.read(this.tokenize(program));
}
tokenize(characters) {
return characters.
replace(/\s\s+/g, " ").
replace(/\(/g, " ( ").
replace(/\)/g, " ) ").
split(" ").
filter(t => " \t\n".indexOf(t) === -1);
}
read(tokens) {
if (tokens.length === 0) {
return;
}
// Grab the first token.
let token = tokens.shift();
if (token === "(") {
let list = [];
while (tokens[0] !== ")") {
list.push(this.read(tokens));
}
// Keep going (since we may have nested lists).
tokens.shift();
return list;
} else if (token === ")") {
throw new Error("Unexpected token ')'");
} else {
return this.atom(token);
}
}
atom(token) {
if (/\.\d+/.test(token)) {
return parseFloat(token);
} else if (/\d+/.test(token)) {
return parseInt(token, 10);
} else if (this.env[token] && typeof this.env[token] !== "function") {
return this.env[token];
} else {
return token.toString();
}
}
}

We can now re-run lisplet.read(lisplet.tokenize("(define pi 3.14159)"));. The result, [ 'define', 'pi', 3.14159 ], doesn’t look too different from our prior result, but notice we now have a numeric value for pi (which had been a string before). This is because when we pass the token to atom(), it matches the regular expression we wrote for floating point numbers, so our program calls parseFloat() on it to turn it into a number.

We can now focus on the last, longest, and most important function in our little language: eval()! This is where our program decides what to do based on the tokens it sees.

There are two JavaScript functions that you’ll see a few times in this code that you might not have a lot of experience with: bind() and call(). You might not see these very frequently in your code, but they come in handy when writing libraries like parsers and web frameworks, so it’s definitely worth your white to check out the linked MDN documentation to see how they work.

So! Without further ado, here’s where the magic happens:

eval(expression) {
if (!expression) {
return;
} else if (typeof expression === "number" || typeof expression === "boolean") {
return expression;
} else if (Object.keys(this.env).indexOf(expression[0]) !== -1) {
let fn = expression[0];
let args = expression.slice(1);
function handleArgs(args) {
args = [].slice.call(args);
for (let i = 0; i < args.length; i++) {
if (Array.isArray(args[i])) {
args[i] = this.eval(args[i]);
}
}
return this.eval(args);
}
return this.env[fn](handleArgs.call(this, args));
} else if (expression[0] === "define") {
let args = expression.slice(1);
this.env[args[0]] = args[1];
return global[args[0]] = this.eval(args[1]);
} else if (expression[0] === "if") {
let args = expression.slice(1);
let [test, conseq, alt] = args;
return this.eval(test) ? conseq : alt;
} else if (expression[0] === "lambda") {
let args = expression.slice(1);
let [params, body] = args;
return new Function(params, this.eval(body));
} else {
let args = expression.slice(1);
let params = expression[0][1];
if (params) {
let body = [].slice.call(expression[0][2]);
let bound = params.reduce((obj, k, i) => ({...obj, [k]: args[i] }), {});
function replace(bound, body) {
body = [].slice.call(body);
bound = Object.assign({}, bound);
for (let i = 0; i < body.length; i++) {
if (Array.isArray(body[i])) {
body[i] = this.eval(replace.call(this, bound, body[i]));
}
if (bound[body[i]]) {
body[i] = bound[body[i]];
}
}
return this.eval(body);
}
return replace.call(this, bound, body);
}
return eval(expression);
}
}

There’s a lot going on here, so let’s unpack it branch by branch. Our first check is to see whether there’s an expression to evaluate; if not, we just return. Next, we check to see if our expression is a number or a boolean; if so, we just return that.

Things get interesting in our next branch. We check to see if the expression is a variable function name we’ve put in our environment (like square; note that atom() will have caught variable names that map to primitives, like pi). We then separate the function name from the arguments passed to it and use handleArguments() to recursively parse those arguments for evaluation.

In the next branches, we handle the keywords define (used to assign variables for primitives and functions), if (to handle conditionals), and lambda (to let us create functions).

In our final branch, we evaluate function calls. This is probably the most complex part of our program, so an example might help. If we run the following:

lisplet.run("(define pi 3.14159)");
lisplet.run("(define circle-area (lambda (r) (* pi (* r r))))");
lisplet.run("(circle-area 10)");

When we run (circle-area 10), our program grabs the arguments, 10, and uses the bound object to map the value 10 to the local variable r: {r: 10}. We then replace r with 10 in the function body (using Object.assign() and [].slice.call() to make copies of our variable binding object and the arguments array) and recursively call our eval() function to process the tokens. (Our program finds pi by looking it up in the global environment.) It’s a little confusing, but if you walk through carefully and run different expressions in the REPL, you’ll start to get the hang of it!

Finally, if absolutely all else fails, we fall back to JavaScript’s native eval() to try to evaluate the code we’ve received. Note that this is not the eval() function we defined; ours is this.eval(), and JavaScript’s is just eval().

Our complete code should look like so:

class Lisplet {
constructor() {
this.env = {
"==": function() {
let [a, b] = Array.from(arguments)[0];
return a === b;
},
"!=": function() {
let [a, b] = Array.from(arguments)[0];
return a !== b;
},
"<": function() {
let [a, b] = Array.from(arguments)[0];
return a < b;
},
"<=": function() {
let [a, b] = Array.from(arguments)[0];
return a <= b;
},
">": function() {
let [a, b] = Array.from(arguments)[0];
return a > b;
},
">=": function() {
let [a, b] = Array.from(arguments)[0];
return a >= b;
},
"+": function() { return Array.from(arguments)[0].reduce((a, e) => { return a + e; }); },
"-": function() { return Array.from(arguments)[0].reduce((a, e) => { return a - e; }); },
"*": function() { return Array.from(arguments)[0].reduce((a, e) => { return a * e; }); },
"/": function() { return Array.from(arguments)[0].reduce((a, e) => { return a / e; }); },
"true": true,
"false": false,
};
}
run(code) {
return this.eval(this.parse(code));
}
parse(program) {
return this.read(this.tokenize(program));
}
tokenize(characters) {
return characters.
replace(/\s\s+/g, " ").
replace(/\(/g, " ( ").
replace(/\)/g, " ) ").
split(" ").
filter(t => " \t\n".indexOf(t) === -1);
}
read(tokens) {
if (tokens.length === 0) {
return;
}
// Grab the first token.
let token = tokens.shift();
if (token === "(") {
let list = [];
while (tokens[0] !== ")") {
list.push(this.read(tokens));
}
// Keep going (since we may have nested lists).
tokens.shift();
return list;
} else if (token === ")") {
throw new Error("Unexpected token ')'");
} else {
return this.atom(token);
}
}
atom(token) {
if (/\.\d+/.test(token)) {
return parseFloat(token);
} else if (/\d+/.test(token)) {
return parseInt(token, 10);
} else if (this.env[token] && typeof this.env[token] !== "function") {
return this.env[token];
} else {
return token.toString();
}
}
eval(expression) {
if (!expression) {
return;
} else if (typeof expression === "number" || typeof expression === "boolean") {
return expression;
} else if (Object.keys(this.env).indexOf(expression[0]) !== -1) {
let fn = expression[0];
let args = expression.slice(1);
function handleArgs(args) {
args = [].slice.call(args);
for (let i = 0; i < args.length; i++) {
if (Array.isArray(args[i])) {
args[i] = this.eval(args[i]);
}
}
return this.eval(args);
}
return this.env[fn](handleArgs.call(this, args));
} else if (expression[0] === "define") {
let args = expression.slice(1);
this.env[args[0]] = args[1];
return global[args[0]] = this.eval(args[1]);
} else if (expression[0] === "if") {
let args = expression.slice(1);
let [test, conseq, alt] = args;
return this.eval(test) ? conseq : alt;
} else if (expression[0] === "lambda") {
let args = expression.slice(1);
let [params, body] = args;
return new Function(params, this.eval(body));
} else {
let args = expression.slice(1);
let params = expression[0][1];
if (params) {
let body = [].slice.call(expression[0][2]);
let bound = params.reduce((obj, k, i) => ({...obj, [k]: args[i] }), {});
function replace(bound, body) {
body = [].slice.call(body);
bound = Object.assign({}, bound);
for (let i = 0; i < body.length; i++) {
if (Array.isArray(body[i])) {
body[i] = this.eval(replace.call(this, bound, body[i]));
}
if (bound[body[i]]) {
body[i] = bound[body[i]];
}
}
return this.eval(body);
}
return replace.call(this, bound, body);
}
return eval(expression);
}
}
}
let lisplet = new Lisplet();

And that’s it! Feel free to play around by doing arithmetic, evaluating conditionals, and defining and using your own functions. Note, however, that since we’re intentionally scoping our work to a subset of LISP’s full functionality, our version has some significant limitations (for example, it doesn’t handle list literals very well: try implementing car or cdr). Feel free to fork this REPL and add try adding quote and list, including more functions in the environment, or refactoring the code to make it more performant or more maintainable (that eval() function is very long and could probably be broken up into smaller helper functions).

Thanks for reading, and I hope you enjoyed this tutorial!

--

--