Catamorphisms

Tearing down functors, recursively

This post is part of my series on recursion schemes.

In previous posts, we introduced recursion schemes and learned how and why they work. Now we can get to the fun stuff, and actually learn the schemes themselves. First up is the scheme which we’ve been using for examples so far, cata, the scheme which produces catamorphisms. In this post, we’ll see the definition of a catamorphism and the implementation of cata, as well as a few example catamorphisms.

So what exactly is a catamorphism? “Morphism” is a term from category theory (the mathematical field in which the paper introducing recursion schemes was written), and a morphism is basically a function which transforms one data structure to another. All recursion schemes are morphisms, with their prefixes giving a hint about the kind of transformation they involve. In this case we have the Greek root “cata-”, which means “down” or “back”, in the sense of “catastrophe” or “cataclysm.” A catamorphism is a function that “tears down” a data structure by recursing on it with a function that collapses a functor into its contained type. cata takes such a function, and returns a catamorphism.

Let’s finally look at the actual implementation of cata, from static-land-recursion-schemes:

export type Algebra<F, A> = HKT<F, A> => A;export function cata<F, A>(functor: Functor<F>,
transformer: Algebra<F, A>,
term: Fix<F>) : A {
const extracted = out(term),
childrenMapped = functor.map(
x => cata(functor, transformer, x),
extracted),
transformed = transformer(childrenMapped);
return transformed;
}

Going through this, step-by-step:

First, we define the type Algebra. An algebra is a function with the signature F<A> => A, where F is a functor. The name “algebra” here comes from category theory, in which we’d say that A is the “carrier type” of our function, and that our function evaluates the functor. I don’t particularly understand this, or why exactly this was the chosen name; I think of an algebra as extracting a value out of a functor’s context, or of collapsing a functor into its contained type, and so far this has worked for me.

The definition of cata is relatively minimal. Our input type is wrapped in Fix, so we use out to get our type out. Then we use map to recurse into the extracted type, calling cata on whatever it contains. In this way, cata drills down to the bottom of the tree, until it gets to values for which map doesn’t call its input function. (These are the values which do not make use of the functor’s type parameter. In our running example of arithmetic expressions, these values are instances of the Num class.)

Remember the signature of map: (A => B, F<A>) => F<B>. In this case, map calls cata, which returns something of type A, so childrenMapped holds something of type F<A>. This is our input functor, with anything that was nested within it already collapsed into a value of type A. We then pass this to our algebra, giving our final result of type A.

In this section, we’ll work with the Expr type that was used in previous posts. The full implementation of Expr is here.

In our introduction to recursion schemes, we saw that we could convert a nested Expr to a number using an evaluation function. Now, let’s convert Expr to a string using a pretty-print function:

const prettyPrint = ex => cata(exprFunctor,
function (expression: ExprF<string>) : string {
const ex = prj(expression);
return (
ex instanceof Plus ? `${ex.left} + ${ex.right}`
: ex instanceof Times ? `${ex.left} * ${ex.right}`
: ex instanceof Paren ? `(${ex.contents})`
: /* ex is a Num */ String(ex.value));
}, ex);

Let’s try this on an example AST and see the output:

const expr = times(num(2),
paren(times(plus(num(1), plus(num(1), num(1))),
times(num(4), num(3)))));
prettyPrint(expr); // 2 * (1 + 1 + 1 * 4 * 3)

That doesn’t look quite right. The problem is that in the input AST, the associativity of operations is clear, but when we flatten our AST to a string, we can no longer tell the order in which these operations should be performed. To make our expression readable, we need to remove the parenthesis that are in the input, and add parenthesis where they’re necessary for understanding. First, let’s perform the removal:

const flatten = ex => cata(exprFunctor,
function (expression: ExprF<Expr>) : Expr {
const ex = prj(expression);
return ex instanceof Paren ? ex.contents : new In(expression);
}, ex);
prettyPrint(flatten(expr)); // 2 * 1 + 1 + 1 * 4 * 3

Note that, while we said that catamorphisms “tear down” a recursive value, collapsing it into some inner value, we didn’t say that inner value needs to be primitive. In this case, the contained value is an Expr. The result is that we have a transformation that moves over our data structure, preserving most nodes but filtering out ones of type Paren.

Now we can add the parenthesis back in where they’re needed for readability. Addition and multiplication both follow the associative law, so we don’t need parenthesis when we add two addition nodes or multiply two multiplication nodes. The only time we really need them is in cases where we multiply something by an addition, in which case we need parenthesis around the addition nodes:

const addNecessaryParens = ex => cata(exprFunctor,
function (expression: ExprF<Expr>) : Expr {
const ex = prj(expression);
if (!(ex instanceof Times)) return new In(expression); const left = prj(out(ex.left)),
right = prj(out(ex.right));
return new In(new Times(
left instanceof Plus ? paren(ex.left) : ex.left,
right instanceof Plus ? paren(ex.right) : ex.right
));
}, ex);

Like the previous example, this algebra produces a recursively nested structure. In this case we can also see that our algebra can look multiple levels into the wrapped value, and that it can return extra nested nodes, resulting in a larger output data structure than our catamorphism took as input.

Not only can the inner value produced by our algebra be a nested expression; our algebra can also look as far into the inner value as it likes, and it can return a value that’s bigger than its initial input.

We can go one step farther, and convert one nested type into a completely different nested type. Let’s add in another fixed-point type which represents boolean statements using and and or, as described here. There’s a natural mapping from a numerical equation to a boolean equation: plus turns to or, times turns to and, and num turns to a true if the number is nonzero, and false if the number is zero. We can implement this transformation as a catamorphism as so:

function exprToBoolean(expr: Expr) : BoolExpr {
return cata(exprFunctor,
function(expression: ExprF<BoolExpr>) : BoolExpr {
const ex = prj(expression);
return (
ex instanceof Plus ? or(ex.left, ex.right)
: ex instanceof Times ? and(ex.left, ex.right)
: ex instanceof Paren ? boolExpr.paren(ex.contents)
: /* ex is a Num */ bool(ex.value !== 0));
}, expr);
}

We can implement a pretty-printing function for our boolean expression similar to that for our numerical function, which lets us see how this works:

const boolPrettyPrint = ex => cata(boolExprFunctor,
function (expression: BoolExprF<string>) : string {
const ex = boolExpr.prj(expression);
return (
ex instanceof Or ? `${ex.left} OR ${ex.right}`
: ex instanceof And ? `${ex.left} AND ${ex.right}`
: ex instanceof Paren ? `(${ex.contents})`
: /* ex is a Bool */ String(ex.value));
}, ex);
const toBooleanExpr = plus(num(0),
paren(times(num(1),
paren(plus(num(0),
num(1))))));
prettyPrint(toBooleanExpr);
// 0 + (1 * (0 + 1))
boolPrettyPrint(exprToBoolean(toBooleanExpr)));
// false OR (true AND (false OR true))

Our pretty-printing example is much the same as before: We produce “OR” for an Or node, “AND” for an And node, and parenthesis for Paren. If the node we’re dealing with is a Bool, we use the String constructor to produce “true” for true and “false” for false.

This allows us to see the results of our exprToBoolean function: 0 turned to false, 1 to true, + to OR and * to AND, just as we intended. So we see that we were successfully able to convert a recursive Expr type to a completely different recursive type.

Catamorphisms are a very versatile and useful class of functions. Their basic operation is deceptively simple: walking over a structure and applying an algebra to each value in it from the inside out. Simple as this may be, it can be used to express a huge number of transformations, including collapsing nested types into flat ones, adding or removing nodes from a requested type, and transforming one nested type into another.

Thanks to Lizz Katsnelson for editing this post.

All code from this post is available in runnable form here.

JavaScript developer with a focus on typed functional programming. He/him. https://jnkr.tech