A Brief Introduction to Recursion Schemes

Recursion schemes are a powerful set of tools which have little to no visibility outside of the functional programming world. They allow you to express recursive algorithms by combining non-recursive functions with a toolkit of generic higher-order functions; effectively, giving you the ability to write algorithms which work on any recursive data type, without having to write recursion directly. This provides a level of abstraction over primitive recursion which has been compared to the abstraction that for and while loops provide over goto statements. Between their wild names and their nature as generic, recursive functions, they can be intimidating at first glance. (They definitely were for me.) This post is meant to be a very light introduction to them, introducing the general concept and the steps that you need to take to use them, and giving an example of the most basic recursion scheme, catamorphism. For now, we’ll treat recursion schemes as black boxes; the how and why of how they work will appear in future posts, as will deeper explorations of the various schemes.

Recursion schemes require the use of higher kinded types, so this post makes use of brands, as described here. Specifically, we’ll be using brands to implement functors, as is demonstrated in that second post. Other than the contents of those two posts, no prior knowledge about abstract math or functional programming is assumed.

Diving in with an example

This post will focus on the AST for a simple calculator-like language. We’ll imagine that we have a parser for strings like

AST for 2 * (1 + 1) * 4 * 3

which would produce an AST. For testing purposes, we’ll construct this AST manually, like this:

const expression = times(num(2),
times(paren(plus(num(1), num(1))),
times(num(4), num(3))));

Then we’ll write an evaluator function, evalExpr, which allows us to run the calculation and get back a numeric result:

console.log(evalExpr(expression)); // prints `48`

A non-recursion-scheme implementation

First, let’s look at an implementation of this AST written for use with explicit recursion:

type Expr
= Plus
| Times
| Paren
| Num;
class Plus {
left: Expr;
right: Expr;
constructor(left: Expr, right: Expr) {
this.left = left;
this.right = right;
}
};
class Times {
left: Expr;
right: Expr;
constructor(left: Expr, right: Expr) {
this.left = left;
this.right = right;
}
};
class Paren {
contents: Expr;
constructor(contents: Expr) {
this.contents = contents;
}
};
class Num {
value: number;
constructor(value: number) {
this.value = value;
}
};

And here is the recursive function which we’ll use to evaluate an AST:

function evalExpr (ex: Expr) : number {
return (
ex instanceof Plus ? evalExpr(ex.left) + evalExpr(ex.right)
: ex instanceof Times ? evalExpr(ex.left) * evalExpr(ex.right)
: ex instanceof Paren ? evalExpr(ex.contents)
: /* ex is a Num */ ex.value);
}

If you were wondering why we used classes for the AST implementation rather than object types, the reason was so we’d be able to easily match on the different cases in evalExpr, without having to add some sort of type field to the AST nodes. We will call Plus, Times, Paren, and Num “data constructors” of our Expr type, because when we want an Expr, we need to construct it using one of these classes. Having to use new everywhere makes for a more verbose interface when creating an AST, though, so let’s add some convenience functions to use instead:

export function plus(left: Expr, right: Expr) : Plus {
return new Plus(left, right);
}
export function times(left: Expr, right: Expr) : Times {
return new Times(left, right);
}
export function paren(contents: Expr) : Paren {
return new Paren(contents);
}
export function num(value: number) : Num {
return new Num(value);
}

These are the functions that our hypothetical parser would call in order to produce its output.

Making our AST recursion-scheme-friendly

Now, let’s convert this to use recursion schemes. Unfortunately, the claim that I made in the intro that recursion schemes work on any recursive data type was not *completely* accurate. To be more precise, recursion schemes work on data types which express recursion in a particular way, and there’s a straightforward way to convert “normal” recursive data structures into this form.

What we need to do is change our data type such that rather than being recursive, it is parameterized. What this means is that the fields which previously had a type of Expr will have a generic type instead. This will be important later, when we write functions which operate on a single node of our recursive data structure, rather than being given access to the structure as a whole.

type ExprI<A>
= Plus<A>
| Times<A>
| Paren<A>
| Num<A>;
export class Plus<A> {
left: A;
right: A;
constructor(left: A, right: A) {
this.left = left;
this.right = right;
}
};
export class Times<A> {
left: A;
right: A;
constructor(left: A, right: A) {
this.left = left;
this.right = right;
}
};
export class Paren<A> {
contents: A;
constructor(contents: A) {
this.contents = contents;
}
};
export class Num<A> {
value: number;
constructor(value: number) {
this.value = value;
}
};

Next, we need to give the scheme a way to recurse into our types. The way we’ll do this is by making Expr into a functor, whose map function applies its argument function to the generic fields inside of Expr:

export class IsExpr {};
// We call the exported type `ExprF` for `ExprFunctor`
export type ExprF<A> = HKT<IsExpr, A>;
export function inj<A>(a: ExprI<A>) : ExprF<A> {
return ((a: any): ExprF<A>);
};
export function prj<A>(a: ExprF<A>) : ExprI<A> {
return ((a: any): ExprI<A>);
};
export function map<A, B>(fn: (A) => B, a: ExprF<A>) : ExprF<B> {

const node = prj(a);
  const mapped = (
node instanceof Plus ? new Plus(fn(node.left), fn(node.right))
: node instanceof Times ? new Plus(fn(node.left), fn(node.right))
: node instanceof Paren ? new Paren(fn(node.contents))
/* Num */ node);

return inj(mapped);
};
if (false) {
({map} : Functor<IsExpr>)
}

(If you’re confused by what’s going on with inj, prj, HKT, and IsExpr, I recommend going back to my brands post and its example of creating a functor. Due to the implementation of recursion schemes, we need the type we create to be a functor, and Functor only works with types implemented using HTK.)

What we have, now, is a data structure that represents a single “slice” of a recursive data structure. Because it is generic, we can nest other data structures inside of it. Later on we’ll see utilities that make this data structure recursive again by nesting ExprF nodes within other ExprF nodes. But first, we’ll look at what we gained from this parameterization.

Transforming expressions one at a time

Next up, we’re going to rewrite evalExpr to no longer use explicit recursion. This is one of the main advantages that recursion schemes give us: the ability to write functions that only operate on a single step of a recursive algorithm, and letting the scheme worry about stringing the steps together. Let’s look at this function:

function _evalExpr (expression: ExprF<number>) : number {
const ex = prj(expression);
return (
ex instanceof Plus ? ex.left + ex.right
: ex instanceof Times ? ex.left * ex.right
: ex instanceof Paren ? ex.contents
: /* ex is a Num */ ex.value);
}
const evalExpr = ex => cata(exprFunctor, _evalExpr, ex);

The first thing this function does is project our ExprF into a usable form, an ExprI. This is just boilerplate that we need to do at the start of any function which takes a higher-kinded type as an argument, if we want to work with that argument at all. After we declare the function, we use cata to turn it into a recursive function, which also requires passing in the functor instance for expr. But these aren’t the interesting parts of this code. What’s up with the signature of _evalExpr, ExprF<number> => number? And what’s actually going on here?

For now, I’m going to give a very surface explanation: our final evalExpr function is going to walk down the recursive data type that it’s given until it gets to the Num nodes at the bottom, and then it’s going to walk back up the tree, evaluating and combining expressions as it goes. cata takes care of doing this walking, calling our _evalExpr function as it goes. At the bottom of the tree, we just return the number wrapped by our Num nodes. Cases where ex is a Plus, Times, or Paren occur higher in the tree. By the time we get to them, we’ve already converted the nodes beneath them into numbers, so cata takes care of plugging those numbers back into the types that hold them. In our original, primitively-recursive evalExpr function, the code handling Plus had to be concerned with the recursive nature of a Plus nodes contents, and manually recursed into them in order to get out the numbers that it could add together. In effect, it was responsible for evaluating both itself, and all steps of the algorithm under itself. In our new version, the code handling Plus just needs to know that this algorithm is producing numbers, so its fields will be numbers, and it just worries about combining those fields in a way which is appropriate for a Plus.

Using our new evaluation function

Unfortunately, there is one more change which we must make before we can use our new code. This is because the signature of our new evalExpr function doesn’t quite match the original. Initially it was of type Expr => number, but now it is Fix<IsExpr> => number.

Fix is a type which encapsulates recursion on a type. Effectively, our ExprF<A> nodes need to be nested inside of each other to actually form a recursive tree, and through techniques that are the subject of a future post, Fix accomplishes this nesting. Because Fix contains ExprF nodes, and ExprF is a higher-kinded type, Fix is marked with ExprF's brand, IsExpr. We can make things a little more clear by doing this:

type Expr = Fix<IsExpr>;

thus cleaning up our interface a little.

We still need to have a way to actually construct values that can be consumed by our new evalExpr function, though. The helper functions we made before come to the rescue here:

export function plus(left: Expr, right: Expr) : Expr {
return new In(new Plus(left, right));
}
export function times(left: Expr, right: Expr) : Expr {
return new In(new Times(left, right));
}
export function paren(contents: Expr) : Expr {
return new In(new Paren(contents));
}
export function num(value: number) : Expr {
return new In(new Num(value));
}

In is the one data constructor of a Fix type, and is so named because it’s a way of putting a piece of data into a Fix. All we have to do is make these functions wrap their outputs with In and change their return type to Expr and we’re done. Now, things that construct Exprs (like our hypothetical parser) can create them using the same interface as before. We can do this by hand to see everything in action:

// AST for 2 * (1 + 1) * 4 * 3
const expr = times(num(2),
times(paren(plus(num(1), num(1))),
times(num(4), num(3))))
console.log(evalExpr(expr)); // prints `48`

Hooray! This is a fairly minimal example of cata's possibilities, but more complex examples will be saved for a future post.


This concludes our introduction. I hope that I’ve given you a glimpse of what recursion schemes are: that they allow you to express a recursive algorithm without using explicit recursion, instead writing functions that just do one step of an algorithm at a time. catamorphism was our first scheme, which we could use to transform a recursive data structure from the bottom up. We covered the method of declaring data structures that can be used with recursion schemes, by turning these structures into functors and combining them with the In class. However, I’ve left a lot of things unexplained: What Fix and In really are, the real essence of catamorphism, what other schemes exist, and generally how any of this works. More posts on these are coming soon.

Code samples for this post may be found here.
The full (in-progress) list of my recursion schemes post lives
here.

Thanks to @NealEhardt for assistance editing this post.