# Catamorphisms

This post is part of my series on recursion schemes.

In previous posts, we introduced recursion schemes and learned how and why they work. Now we can get to the fun stuff, and actually learn the schemes themselves. First up is the scheme which we’ve been using for examples so far, `cata`, the scheme which produces catamorphisms. In this post, we’ll see the definition of a catamorphism and the implementation of `cata`, as well as a few example catamorphisms.

## Tearing things down

So what exactly is a catamorphism? “Morphism” is a term from category theory (the mathematical field in which the paper introducing recursion schemes was written), and a morphism is basically a function which transforms one data structure to another. All recursion schemes are morphisms, with their prefixes giving a hint about the kind of transformation they involve. In this case we have the Greek root “cata-”, which means “down” or “back”, in the sense of “catastrophe” or “cataclysm.” A catamorphism is a function that “tears down” a data structure by recursing on it with a function that collapses a functor into its contained type. `cata` takes such a function, and returns a catamorphism.

## Implementation

Let’s finally look at the actual implementation of `cata`, from `static-land-recursion-schemes`:

`export type Algebra<F, A> = HKT<F, A> => A;export function cata<F, A>(functor: Functor<F>,                           transformer: Algebra<F, A>,                           term: Fix<F>) : A {  const extracted = out(term),        childrenMapped = functor.map(          x => cata(functor, transformer, x),          extracted),        transformed = transformer(childrenMapped);  return transformed;}`

Going through this, step-by-step:

First, we define the type `Algebra`. An algebra is a function with the signature `F<A> => A`, where `F` is a functor. The name “algebra” here comes from category theory, in which we’d say that `A` is the “carrier type” of our function, and that our function evaluates the functor. I don’t particularly understand this, or why exactly this was the chosen name; I think of an algebra as extracting a value out of a functor’s context, or of collapsing a functor into its contained type, and so far this has worked for me.

The definition of `cata` is relatively minimal. Our input type is wrapped in `Fix`, so we use `out` to get our type out. Then we use `map` to recurse into the extracted type, calling `cata` on whatever it contains. In this way, `cata` drills down to the bottom of the tree, until it gets to values for which `map` doesn’t call its input function. (These are the values which do not make use of the functor’s type parameter. In our running example of arithmetic expressions, these values are instances of the `Num` class.)

Remember the signature of `map`: `(A => B, F<A>) => F<B>`. In this case, `map` calls `cata`, which returns something of type `A`, so `childrenMapped` holds something of type `F<A>`. This is our input functor, with anything that was nested within it already collapsed into a value of type `A`. We then pass this to our algebra, giving our final result of type `A`.

## Example usages

In this section, we’ll work with the `Expr` type that was used in previous posts. The full implementation of `Expr` is here.

In our introduction to recursion schemes, we saw that we could convert a nested `Expr` to a `number` using an evaluation function. Now, let’s convert `Expr` to a `string` using a pretty-print function:

`const prettyPrint = ex => cata(exprFunctor,  function (expression: ExprF<string>) : string {    const ex = prj(expression);    return (      ex instanceof Plus  ? `\${ex.left} + \${ex.right}`    : ex instanceof Times ? `\${ex.left} * \${ex.right}`    : ex instanceof Paren ? `(\${ex.contents})`    : /* ex is a Num     */ String(ex.value));  }, ex);`

Let’s try this on an example AST and see the output:

`const expr = times(num(2),                   paren(times(plus(num(1), plus(num(1), num(1))),                         times(num(4), num(3)))));prettyPrint(expr); // 2 * (1 + 1 + 1 * 4 * 3)`

That doesn’t look quite right. The problem is that in the input AST, the associativity of operations is clear, but when we flatten our AST to a string, we can no longer tell the order in which these operations should be performed. To make our expression readable, we need to remove the parenthesis that are in the input, and add parenthesis where they’re necessary for understanding. First, let’s perform the removal:

`const flatten = ex => cata(exprFunctor,  function (expression: ExprF<Expr>) : Expr {    const ex = prj(expression);    return ex instanceof Paren ? ex.contents : new In(expression);  }, ex);prettyPrint(flatten(expr)); // 2 * 1 + 1 + 1 * 4 * 3`

Note that, while we said that catamorphisms “tear down” a recursive value, collapsing it into some inner value, we didn’t say that inner value needs to be primitive. In this case, the contained value is an `Expr`. The result is that we have a transformation that moves over our data structure, preserving most nodes but filtering out ones of type `Paren`.

Now we can add the parenthesis back in where they’re needed for readability. Addition and multiplication both follow the associative law, so we don’t need parenthesis when we add two addition nodes or multiply two multiplication nodes. The only time we really need them is in cases where we multiply something by an addition, in which case we need parenthesis around the addition nodes:

`const addNecessaryParens = ex => cata(exprFunctor,  function (expression: ExprF<Expr>) : Expr {    const ex = prj(expression);    if (!(ex instanceof Times)) return new In(expression);    const left = prj(out(ex.left)),          right = prj(out(ex.right));    return new In(new Times(      left instanceof Plus ? paren(ex.left) : ex.left,      right instanceof Plus ? paren(ex.right) : ex.right    ));  }, ex);`

Like the previous example, this algebra produces a recursively nested structure. In this case we can also see that our algebra can look multiple levels into the wrapped value, and that it can return extra nested nodes, resulting in a larger output data structure than our catamorphism took as input.

Not only can the inner value produced by our algebra be a nested expression; our algebra can also look as far into the inner value as it likes, and it can return a value that’s bigger than its initial input.

We can go one step farther, and convert one nested type into a completely different nested type. Let’s add in another fixed-point type which represents boolean statements using `and` and `or`, as described here. There’s a natural mapping from a numerical equation to a boolean equation: `plus` turns to `or`, `times` turns to `and`, and `num` turns to a `true` if the number is nonzero, and `false` if the number is zero. We can implement this transformation as a catamorphism as so:

`function exprToBoolean(expr: Expr) : BoolExpr {  return cata(exprFunctor,    function(expression: ExprF<BoolExpr>) : BoolExpr {      const ex = prj(expression);      return (        ex instanceof Plus  ? or(ex.left, ex.right)      : ex instanceof Times ? and(ex.left, ex.right)      : ex instanceof Paren ? boolExpr.paren(ex.contents)      : /* ex is a Num     */ bool(ex.value !== 0));    }, expr);}`

We can implement a pretty-printing function for our boolean expression similar to that for our numerical function, which lets us see how this works:

`const boolPrettyPrint = ex => cata(boolExprFunctor,  function (expression: BoolExprF<string>) : string {    const ex = boolExpr.prj(expression);    return (      ex instanceof Or    ? `\${ex.left} OR \${ex.right}`    : ex instanceof And   ? `\${ex.left} AND \${ex.right}`    : ex instanceof Paren ? `(\${ex.contents})`    : /* ex is a Bool    */ String(ex.value));  }, ex);const toBooleanExpr = plus(num(0),                           paren(times(num(1),                                       paren(plus(num(0),                                                  num(1))))));prettyPrint(toBooleanExpr);// 0 + (1 * (0 + 1))boolPrettyPrint(exprToBoolean(toBooleanExpr)));// false OR (true AND (false OR true))`

Our pretty-printing example is much the same as before: We produce “OR” for an `Or` node, “AND” for an `And` node, and parenthesis for `Paren`. If the node we’re dealing with is a `Bool`, we use the `String` constructor to produce “true” for `true` and “false” for `false`.

This allows us to see the results of our `exprToBoolean` function: `0` turned to `false`, `1` to `true`, `+` to `OR` and `*` to `AND`, just as we intended. So we see that we were successfully able to convert a recursive `Expr` type to a completely different recursive type.

Catamorphisms are a very versatile and useful class of functions. Their basic operation is deceptively simple: walking over a structure and applying an algebra to each value in it from the inside out. Simple as this may be, it can be used to express a huge number of transformations, including collapsing nested types into flat ones, adding or removing nodes from a requested type, and transforming one nested type into another.

Thanks to Lizz Katsnelson for editing this post.

All code from this post is available in runnable form here.

--

--