# On Lisp in 15 minutes

LISP is simple and difficult, elegant and ad hoc; it is a beautiful blend of foresight and fortuity.¹

Whether you agree with this statement or not, Lisp’s historical importance is indisputable. Even though a beautiful language in its own right, Lisp turned to be one of the most influential models* *of computer programming. It also gave birth to many widespread concepts in the industry today and, as mainstream is steadily shifting from von Neumann style to λ-calculus, we can suspect many to be adopted in the future. In effect, Lisp is of interest to both a historian and a futurist. Paul Graham suggested that what John McCarthy did for programming was something like what Euclid did for geometry². What is so important that McCarthy discovered? To answer this question we have to put aside decades’ worth of Internet mythos and take a look at Lisp’s humble beginnings — *Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I*³ by John McCarthy. In this article, I will follow his steps in order to explain how Lisp was constructed.

*I’m writing this article to practice my understanding and appreciation of McCarthy’s work. The goal of this article is to explain how Lisp came to be and discuss properties that make it special. Hopefully, someone will find it useful. I highly recommend reading the original paper, and Paul Graham’s take on it, after this short introduction (see References).*

# 1. Preliminaries

Lisp is constructed from the bottom. It is only appropriate that we start by laying out basic ideas and notations concerning functions which we will use later on. Most of the concepts presented here are well-known, so feel free to skip this section.

## 1.1. Partial functions

*Partial functions* (not to be confused with the partial application!) are functions that are defined only for parts of their domains. The *domain of a function** *is a collection of all possible arguments to a function. Most of the functions in computer programs are partial (at least in imperative languages). Consider the following example (in Python):

`def one_nth(n):`

return 1 / n

or, a bit more involved:

**def **query_purchases(client: BillingClient) -> List[Purchase]:

** if **client.is_initialized:

return client.query_purchases()

**else:**

raise Exception('Billing client is not set up')

The behavior of these functions (in a mathematical sense) is undefined for certain inputs, as passing zero to `one_nth`

or uninitialized `client`

to `query_purchases`

will result in an exception. Another possibility is that functions may not terminate, e.g. when caught in infinite recursion. It derives from the fact that the type systems are often too weak to constrain the type as much as the programmer would like or the penalty of assigning more precise type is too high. In mathematics, we don’t have this problem. In case of `one_nth`

, we could simply say that the domain of the function is ℝ \ {0}.

One example of a more “expressive type” system is Idris’, which supports dependent types. Although dependent types are a story for another time, in short, it simply means that types themselves are dynamic and first-class, in the sense that they possess properties related to values they take. For example, a function taking argument N may return the type of list which is no longer than N.

In general, any function that can result in an error, or does not terminate for some of its inputs, is considered partial.

In some languages, such as Haskell, a computation which never completes successfully is referred to as “bottom” and denoted with “⊥”. Therefore, when the evaluation of a partial function results in a partial behavior (such as a crash), its value is ⊥.

In Racket, Lisp dialect, the programmer can explicitly mark a function as partial.

Sometimes, when it is desirable to suppress errors and exceptions, a special value might be returned acting as a bottom, such as IEEE floating point’s not-a-number (

`).`

NaN

Analogically, when a function is *total*, we can say that it produces a valid result for all possible inputs. For example, the function `not`

is obviously defined for both `True`

and `False`

, which are the only possible values of the `bool`

type. Total functions allow a programmer to reason about programs using the types alone; a function with the type *A*→* B* implies that it is always possible to get a *B* when you have an *A*.

## 1.2. Predicates

A predicate is a function whose codomain (set of all possible outputs) consists only of truth values (i.e., truth or falsehood). A common convention is to denote truth with `t`

and falsehood with `f`

(in the original paper McCarthy used uppercase letters). Simply put, a predicate must return a Boolean.

## 1.3. Pure functions

One of the most important advantages of purely functional approach over imperative one (and by extension OOP) is *referential transparency*, which guarantees that for a specific set of arguments a function (or generally, an expression) will always return the same result (it is entirely predictable). In other words, an evaluation of an expression can have no effect other than to compute its value. As a result, one can freely replace expressions by their values. Functions with this property are called *pure*. Almost all functions in imperative environments are impure.

## 1.4. Conditional Expressions

The function above is a well-known *absolute value*. Because its value depends on the condition (whether *x* is greater-or-equal to zero), we can say that it is *conditional* in its behavior. However, this notation is not suited for expressing symbolically the dependence of quantities on truth values — it has to rely on English words.

A better notation for representing conditional expressions was proposed by McCarthy (now referred to as McCarthy formalism):

`(p₁ → e₁, … , pₙ → eₙ)`

Here, the *p*’s are propositional expressions and the *e*’s are expressions of any kind. It may be read as:

*If p₁ then e₁ otherwise if p₂ then e₂, … , otherwise, if pₙ then eₙ.*

Notice how we are assuming here the evaluation order — we consider *pᵢ* from left to right. With this notation under our belt, we can express the absolute value function simply as:

`(x ≥ 0 → x, x < 0 → -x)`

More importantly, however, this notation is sufficient to express **any** *if-then* statement. For example:

`if n % 3 == 0 and n % 5 == 0:`

return 'FizzBuzz'

elif n % 3 == 0:

return 'Fizz'

elif n % 5 == 0:

return 'Buzz'

else:

return n

…can be expressed as:

`(3 ∣ n ∧ 5 ∣ n → 'Fizzbuzz',`

3 ∣ n → 'Fizz',

5 ∣ n → 'Buzz',

t → n)

`means that`

m | n`is divisible by`

n`, or`

m`divides`

m`.`

n`is standard logical conjunction, or AND in Boolean algebra.`

∧Notice how by exploiting the left-to-right evaluation order we can encode the

elseclause using the`value.`

t

Conditionals were introduced by Lisp. In effect, *if* statements, as we know them today, are thanks to McCarthy. As of 1958, Fortran only had a conditional *goto *which was based on the branch instruction in the underlying hardware. McCarthy also got conditionals into Algol 60, whence they spread to most other languages⁸.

## 1.5. Recursive functions and label

Thanks to conditional expressions given above, we can now define functions recursively. For example, we can define factorial as:

`n! = (n = 0 → 1, t → n · (n − 1)!)`

Note, however, that we are making here one very important assumption — the body of the function can reference the function itself by its name (or symbol — `!`

in this case).

In most actual languages this is not a problem, but remember — we do not have any language yet!

In general, we can solve this problem with something called *Y combinator*, but that’s a topic for another time. As McCarthy most likely was not aware of Y combinator at the time of writing his paper, he introduced his notation instead.

`label(a, e)`

denotes the expression `e`

, with the special property that all occurrences of `a`

* *within `e`

are to be interpreted as referring to the expression `e`

itself. Therefore, by combining *label* with conditional expressions, we can define Euclid’s algorithm for calculating the greatest common divisor as follows:

`label(gcd, (m > n → gcd(n, m),`

m | n → m,

t → gcd(n % m, m)))

It is important to note that although recursion existed as a mathematical concept before Lisp, Lisp was the first programming language to support it.

# 2. S-expressions

First, we define *a* *symbolic expression (s-expression, *or just *expression* for* *short*)*. An s-expression is either:

*an atom*, which is a sequence of characters, e.g.`spam`

,`ham`

,`eggs`

or`f*&o$o`

. An atom cannot start with`(`

, which is reserved for lists.*a list*of zero or more s-expressions which are separated by whitespace and enclosed by parentheses. For example:`(abc)`

,`()`

,`(a (bc))`

.

In simplified extended Backus-Naur form (commonly used for describing the syntax of a language), s-expressions can be represented as:

— which is the exact restatement of the definition given above (except that atoms are assumed to be alphanumeric).

The above definition is recursive, as we are referring to s-expressions while defining lists of s-expressions. Some examples of valid s-expressions include:

`()`

foo

(bar)

(foo bar)

((spam) (ham ()) eggs ())

— but also:

`(+ 1 2 3 5 8 13)`

(> 24 512 12)

(and true false true)

# 3. Abstract machine

“The most practical way to think about the meaning of a program is what it does — when we run the program, what do we expect to happen? How do different constructs in the programming language behave at run time, and what effect do they have when they’re plugged together to make larger programs? This is the basis of operational semantics, a way of capturing the meaning of a programming language by defining rules for how its programs execute on some kind of device. This device is often an abstract machine: an imaginary, idealized computer that is designed for the specific purpose of explaining how the language’s programs will execute. Different kinds of a programming language will usually require different designs of an abstract machine in order to neatly capture their runtime behavior.”⁷ In order to construct Lisp, we first construct an imaginary machine on which it will operate.

## 3.1. Elementary functions

We start by defining seven (*The Magnificent Seven, *in fact) elementary functions on s-expressions. We call them elementary because we assume they exist on the abstract machine (just like we assume axioms to be true in some mathematical system). A function on s-expression receives an s-expression and returns a new one. By convention, we will refer to such functions as *operators*.

To keep our notation simple, we

representoperators using atoms and lists — if an s-expression is a list, the first element is an operator, and remaining ones arearguments. This means that functions on s-expressions are encoded using s-expressions themselves. For example, the`operator expects one argument, and so its invocation will take the form:`

car

`(car x)`

Atom “`car`

" on its own refers to the function itself. If `car`

expected an argument which is a function (it doesn’t), we could write `(car car)`

.

## atom

`atom`

is a very simple operator; it just returns `t`

if the argument passed is an atom, and otherwise.

`> (atom f)`

t

> (atom foo)

t

> (atom atom)

t

## quote

`quote`

is even simpler, it returns whatever was passed into it.

> (quote x)

x> (quote (atom x))

(atom x)> (quote (1 2 3 4 5))

(1 2 3 4 5)

It is crucial to understand how `quote`

behaves and why is useful. The second example — `(quote (atom x))`

—will reduce to the list containing `atom`

and `x`

. `(atom x)`

is not “evaluated” further (at this point we do not know what evaluation even is, but let us ignore that for a second). In essence, `quote`

is used to stop the evaluation — normally in`(1 2 3)`

, `1`

would be assumed to be an operator but we want to treat the list as a regular list. This is why we would use `quote`

. It is usually abbreviated with `'`

, which gives us:

`> '(1 2 3)`

(1 2 3)

Now that we have `quote`

, we can go back to `atom`

for one more example:

`> (atom '(foo bar baz))`

f

If the code would eventually be evaluated by some kind of program capable of doing that, `quote`

would prevent evaluation of any s-expression given as its argument, which is very important when passing data around.

## eq

`eq`

returns `t`

if the values of its arguments are the same atom and `f`

otherwise.

Some languages, such as Scheme, use

`. Clojure uses`

eq?`.`

=

## car

`car`

expects a list as an argument and returns its first element. It's name, similar to `cdr`

, is a historical legacy.

`> (car '(1 2 3 4))`

1

Certain languages, such as Clojure, rename

`with something more intuitive, e.g.`

car`. The “car” and “cdr” names are derived from the machine architecture that Lisp was designed for and nowadays languages use more familiar names.`

first

## cdr

`cdr`

expects a list as an argument and returns its back without the first element. For example:

`> (cdr '(1 2 3 4))`

(2 3 4)

In Clojure,

`behavior is provided via`

cdr`.`

rest

## cons

`cons`

(*construct*) takes two arguments, second of which must be a list, and prepends the list with the first argument:

`> (cons 1 '(2 3))`

(1 2 3)

Why would we prepend to lists instead of appending them is a topic on its own. In short, lists constructed as the head and rest are simpler to manipulate recursively, which is a huge benefit for a language such as Lisp.

## cond

We will now translate conditional expressions into s-expressions using `cond`

. `cond`

has the following form:

`(cond (p₁ e₁) ... (pₙ eₙ))`

The value is evaluated as follows. We evaluate `p`

expressions from left to right. If any predicate evaluates to `t`

, the value of the `cond`

expression is taken to be the value of the corresponding expression `e`

. For example:

`> (cond (t 1) (f 2))`

1

Moreover, because we evaluate from left to right:

`> (cond (t 1) (t 2))`

1

## 3.2. Denoting non-elementary functions

With basic functionality in place, we define a general scheme for defining new functions on the abstract machine. A function is expressed as

`(lambda (p₁ … pₙ) e)`

where *p₁ -pₙ* are called *parameters* and *e *is an *expression*. If you ever tinkered with lambda calculus (or with lambdas/anonymous functions in pretty much any modern programming language), you will recognize this notation.

An expression whose first element is a lambda, i.e., is of the form:

— is called a function call and its value is computed as follows. Each expression *aᵢ* is evaluated. Then *e* is evaluated. During the evaluation of *e*, the value of any occurrence of one of the *pᵢ* is the value of the corresponding *aᵢ* in the most recent function call.

Do you remember *label *notation*?* We can now translate it to s-expression form and use it on our abstract machine to obtain a general method for defining functions. The `label`

s-expression will have the following form:

`(label `*name lambda*)

— where *name** *is an atom representing the name of the function and

is the function body, also called to *lambda **form*.

Using label and the *Magnificent Seven* our expressive power is limitless.

For example, we can define the Boolean operator **NOT**:

(label not (lambda (x)

(cond (x f) (t t))))> (not t)

f> (not f)

t

As you can see, **NOT **expects one argument — *x*. In the body of the function, we perform a conditional check of *x*. If the value is equal to `t`

, we return `f`

and otherwise `t`

(`t`

in the last positional of the *cond* acts as the *else* clause in modern languages, since the condition is always true).

Next, we can define a bit more involved Boolean** AND**:

(label and (lambda (x y)

(cond (x (cond (y t) (t f)))

(t f))))> (and t f)

f> (and t t)

t

This body can be summarized as: If *x* is true, check if *y* is true and if so, return `t`

. If *y* is not true, or *x* is not true, return `f`

(we have two *else* clauses).

We can write functions to manipulate lists, Boolean expressions, pair elements, rewrite expressions, and so on. For example:

`(not x)`

returns`t`

if its argument returns`f`

and`t`

otherwise.`(pair x y)`

takes two lists of the same length and returns a third one containing two-element lists of corresponding elements in argument lits. Nowadays we would more likely call this function`zip`

.`(append x y)`

takes two lists and concatenates them.`(assoc x y)`

takes an atom`x`

and a`y`

of the form return by`pair`

and returns a second-element from a pair in`y`

whose first element is`x`

. This function emulates a lookup into a dictionary (alternatively called a hash table or associative array).

It is important to note that in Lisp, functions are first class objects — they’re data, just like numbers and strings. Therefore, functions can be stored in variables, passed around as arguments, and so on. The idea of a first-class function was pioneered by Lisp.

## 3.3. Evaluation

More importantly, however, we can write a function that takes arbitrary Lisp expression and returns its value:

That is a bit of mouthful. If you want a step by step description, which is beyond the scope of this article, I recommend reading *Recursive Functions…* by McCarthy³ and Paul Graham’s *The Roots of Lisp*², which take a thorough look at `eval`

‘s inner-workings.

The important thing is this: `eval`

**is a Lisp interpreter**, and thus, implements the language. Any valid Lisp code can be passed to it for evaluation and we don’t need any other infrastructure. With only seven functions and a bit of notation, we were able to implement the whole language and allow it to define new functions. That’s neat and very elegant, but so what? Now that we understand how Lisp operates, let’s step back and consider what it means.

If you’re curious how Lisp can be implemented in a few hundred lines of code, check out my toy Lisp implementation: Kite.

# 4. Chemical X: Homoiconicity

In Lisp, as we saw, both code and data are represented in the same form: as s-expressions (forming trees) which are available for unrestricted mutation and execution. This means that **the Lisp code is first-class**. It can be explicitly constructed from scratch, passed around, modified and executed. In other words, in Lisp

*everything is data*. A language in which the internal «representation matches the external representation is called

*homoiconic*(from the Greek words

*homo*meaning “the same” and

*icon*meaning “representation”). Other examples of a language with this property include Prolog (logic programming) and Julia (numerical analysis and computational science). One of the immediate consequences of homoiconicity is that by using just a few functions:

`quote`

, `atom`

, `eq`

, `car`

, `cdr`

, `cons`

, and `cond`

, we can define `eval`

, that fully implements the language, and with `label`

and `eval`

we can define any additional function we need. In fact, it is a famous fact that the core of the Lisp can be written in about 20–30 lines of Lisp. In (an unfair) comparison, the average person who writes a C compiler or interpreter requires about 20,000 lines of C to do so and must be moderately expert about compilers or interpreters (of course, we rely on *meta-circularity*in doing so, but that’s a topic for another post). At this point, you may be asking yourself — what’s the big deal?

The big deal, as it turns out, that when the representation of the program (code) is the same as the representation for that data, you can manipulate code like you would manipulate data. This allows, for example, for syntactic extensibility, which is the ability to extend (or completely) change the syntax of the language. For example, in a normal language, if you wished that you had the *for-each *construct, you would have to wait until creators of the language add it in the next version. In Lisp, you can write a macro in a matter of minutes. In many Lisp languages, program transformation is an explicit part of the evaluation strategy of the language.

Moreover, in Lisp, programs composed of expressions. Lisp programs are trees of expressions, each of which returns a value. At the time, this was a revolutionary idea, in contrast to languages such as Fortran which distinguish between expressions and statements. This was largely dictated by the limitations of the hardware available in the late 1950s.

Finally, what McCarthy discovered is not just a programming language but a very elegant *model of computation*, just a notch above lambda calculus. We can say that Lisp was discovered, and not invented, because McCarthy built it from axioms so fundamental that it would be very hard to make an argument for the otherwise.

Core Lisp corresponds to *an untyped, call-by-value lambda calculus extended with constants. *As it turned out, lambda calculus* corresponds to natural deduction in logic. Specifically, by Curry-Howard isomorphism, types in a program are propositions, programs are proofs and evaluation of programs is the simplification of proofs. This discovery is phenomenal. Among others, it means that we can reason about programs using logic and that we get theorems in logic for free in programming (and vice versa).

The big difference between Lips and lambda calculus is that lambda calculus doesn’t contain any non-functional data. In lambda calculus, everything is a function, even a number (if you’re curious how this is achieved see Church encodings or Mogensen-Scott encodings). Lisp, on the other hand, operates on atoms and numbers (hence, *constants*). This is most likely what makes Lisp readable to mortal humans — a property that lambda calculus lacks.

The adherence to mathematically sound principles of computation allows Lisp to reap the benefits in the form of simple semantics, clarity, consistency, and other nice things such as referential transparency (original Lisp formulation is completely value-based). In contrast, almost all imperative languages are “just” tools (sometimes very good tools nonetheless) with undefined formal semantics, which makes reasoning about them rigorously very hard or almost impossible.

Thank you for reading. Please check out the references and maybe my other stuff. If you have any questions, please feel free to send me an e-mail at *hi@iwoherka.eu* or leave a comment.

*Specifically, typed lambda calculus. For more details, I highly recommend great talk by Philip Wadler, co-creator of Haskell — Propositions as Types.