Folding Right — a path towards functional origami

Giovanni Ornaghi
Published in Prima Engineering

Jan 11, 2018 · 11 min read

In this small article we explore what a fold is, what its properties are, and how it is the basis for all other higher order functions (HOF). At the end we’ll have a better appreciation for functions that fold as much as for those that unfold.

What’s in a fold? Patterns of recursion

Higher order functions abstract a common pattern of recursion. Map and filter are usually easy to understand; fold can be trickier.

def sum([]), do: 0 
def sum([head|tail]), do: head + sum(tail)

In this sum function, we’re going over a list, accumulating pending additions on the stack until we hit the empty list.

// requires using System.Linq
public int Length<T>(List<T> myList) {
    if (!myList.Any()) {
        return 0;
    }
    return 1 + Length(myList.Skip(1).ToList());
}

This C# is a bit clunkier than the previous Elixir, but it shouldn’t be too hard to read. We have a base case (the list is empty) where we return 0; otherwise we add 1 to the rest of the computation, calling Length on a shortened list (.Skip(1) is a little LINQ trick doing exactly what it says).

Both these functions evaluate according to this pattern, given a list of three elements and an infix operator “op”:

1 `op` fun([2,3])
1 `op` (2 `op` fun([3]))
1 `op` (2 `op` (3 `op` fun([])))
1 `op` (2 `op` (3 `op` 0))

Now, if we suppose that op is actually the sum operator, we can reduce the equation to:

1 + (2 + (3 + 0))
1 + (2 + 3)
1 + (5)
6

What about our length function? At first glance the operator doesn’t look binary anymore, since one of the addends is always 1, but look closer!

1 + fun([2, 3])
1 + (1 + fun([3]))
1 + (1 + (1 + fun([])))
1 + (1 + (1 + 0))
1 + (1 + 1)
1 + (2)
3

If you consider the actual implementation, the operator must be binary; in fact, it’s equivalent to the following function:

(fn _el, acc -> 1 + acc end)

One of the arguments is simply ignored.

The distinguishing elements of fold that we’ve seen so far are:

  • the inner function is binary
  • one argument of the inner function is an element of a data structure (for simplicity, the article will only feature recursion on lists)
  • the other argument of the inner function is the result of the computation so far
  • there is a “seed” element representing the value at the base case

If we consider map or filter in their most common API, we can see that their inner function is unary, i.e., it simply accepts the current element. Does that make you think of something tricky we can do with fold?

What we showed in the previous example is not a very efficient fold for large lists. You might have heard of tail-recursive functions, which are really your more common for loops in disguise. The issue we’re facing is that the stack of pending operations can get very large, producing an infamous stack overflow error in our code. We don’t want that. We can rewrite our functions like so:

def sum([], acc), do: acc
def sum([head|tail], acc), do: sum(tail, head + acc)

Look at the evaluation tree now:

fun([1,2,3], 0)
fun([2,3], 1)
fun([3], 3)
fun([], 6)
6

This is a function you must have written hundreds of times; the “max” variable acts as our accumulator:

public int GetMax(List<int> myList) {
    var max = int.MinValue; // seeding with 0 would be wrong for all-negative lists
    foreach (var el in myList) {
        if (el > max) {
            max = el;
        }
    }
    return max;
}

For now, you should notice that the only thing differing in the folds we’ve seen, besides the little aside on tail-recursion, is the binary operator. We’re ready to implement the fold function itself:

def reduce(_f, [], acc), do: acc
def reduce(f, [head|tail], acc), do: reduce(f, tail, f.(head, acc))
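If Elixir syntax is unfamiliar, here’s the same tail-recursive reduce sketched in Python (the name my_reduce is mine; since Python lacks tail-call optimization, the tail call is written as the loop it really is):

```python
def my_reduce(f, xs, acc):
    # Tail-recursive fold rewritten as a loop: at each step the
    # accumulator is replaced by f(element, accumulator).
    for head in xs:
        acc = f(head, acc)
    return acc

print(my_reduce(lambda head, acc: head + acc, [1, 2, 3], 0))  # 6
print(my_reduce(lambda _el, acc: 1 + acc, [1, 2, 3], 0))      # 3 (length)
```

Passing a different binary function gives us sum, length, max, and so on, exactly as above.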

Go right, go left

Warning: this section contains some Haskell code since there’s no right vs left fold in Elixir. Don’t panic! I promise I won’t talk about monads. Some familiarity with reading type signatures however would greatly increase your understanding.

Folds are commonly classified into “right folds” and “left folds”. Right folds are right associative, left folds are left associative.

Let’s see some code:

foldr (+) 0 [1, 2, 3]
6

The function is called “foldr”, or fold right. (+) is our binary inner function. 0 is our seed value: it’s what we return once the data structure is exhausted. [1, 2, 3] is the data structure we’re folding over.

What happens under the hood is our familiar deconstruction:

1+(2+(3+0))
1+(2+(3))
1+(5)
6

You should notice that the parentheses nest on the right; in this sense we say “right associative”, and that’s the mystery behind the term “foldr”. By contrast, let’s see a left fold:

foldl (+) 0 [1, 2, 3]
6

The result is identical, but what’s happening is slightly different:

((0+1)+2)+3
((1)+2)+3
(3)+3
6

While the number of steps is the same, the process is not. You might recall from grade school that addition is associative, so what’s the big deal? Well, if you’re working in a context with lazy evaluation it matters: because it associates to the left, fold left must traverse the whole structure before it can return anything; it cannot hand back intermediate results. This is a bit of a bummer if you’re working with infinite lists!
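A quick way to feel the difference is to try a non-associative operator. Here’s a small Python sketch (functools.reduce is a left fold; the foldr below is a hand-rolled strict right fold, so it shows the associativity but not the laziness):

```python
from functools import reduce

def foldr(f, seed, xs):
    # Strict right fold: f(x1, f(x2, f(x3, seed)))
    acc = seed
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

# Subtraction is not associative, so the two folds disagree:
sub_left = reduce(lambda acc, x: acc - x, [1, 2, 3], 0)   # ((0-1)-2)-3 = -6
sub_right = foldr(lambda x, acc: x - acc, 0, [1, 2, 3])   # 1-(2-(3-0)) = 2
print(sub_left, sub_right)
```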

Take this example:

head' xs = foldr const 0 xs

(const, given two arguments, returns the first). You might be thinking that to write “last” in terms of fold, you only need to use foldl, and up to a point, you’re correct:

last' xs = foldl (flip const) 0 xs

(flip inverts the order of the arguments, we’ll see why we need this when we consider the different type signatures of foldr and foldl).

But if you’re working with infinite lists, the matter complicates quite a lot. Take the function below. In the case of a finite list, it would return the seed value, but since the list is infinite, it will never return!

foldl (const) 0 (repeat 3)

However, we can take the head of an infinite list with foldr:

foldr (const) 0 (repeat 3) 
3

How does it work? Consider the first evaluation step for foldr:

const 3 (foldr const 0 (repeat 3))

const has type a -> b -> a. By equational reasoning, a = 3 and b is the recursive call foldr const 0 (repeat 3). Since Haskell is lazy, we don’t need to evaluate b and simply return a.
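Python is strict, but we can imitate the trick by handing the inner function its second argument as a thunk that it may choose not to force. This foldr_lazy is a hypothetical sketch, not a standard library function:

```python
from itertools import repeat

def foldr_lazy(f, seed, xs):
    # Right fold where the "rest of the fold" is a zero-argument thunk;
    # if f never calls it, the tail is never traversed.
    it = iter(xs)
    for x in it:
        return f(x, lambda: foldr_lazy(f, seed, it))
    return seed

# const-like inner function: keep the head, never force the infinite tail
print(foldr_lazy(lambda x, rest: x, 0, repeat(3)))           # 3
# Forcing the thunk recovers an ordinary fold on finite lists
print(foldr_lazy(lambda x, rest: x + rest(), 0, [1, 2, 3]))  # 6
```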

But don’t dismiss foldl just yet, it can be used to code neat tricks like:

reverse = foldl (flip (:)) []

This example allows us to introduce the structure of the inner binary operator. At each step, we apply the function we passed to fold which takes the next element in the structure and the accumulator. But which is which? In Haskell, that’s easy to answer:

foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b

Don’t mind the Foldable typeclass. The inner function of fold right takes an element (a) and the accumulator (b), in this order, while fold left works in the exact opposite fashion. So what’s so special about fold left? Could we reverse a list with fold right too? We can, but it’s a bit uglier:

reverse xs = foldr (\a b -> b ++ [a]) [] xs

which is equivalent to this beauty:

reverse = foldr (flip (++) . (:[])) []

There are three main points to take home when talking about right and left folds:

  • associativity: the order of computation is different
  • inner function: the different association is reflected in the order in which the arguments are passed to the inner binary function
  • when dealing with infinite lists, fold left is not really an option; use with caution

Properties of fold

Warning: this section is a bit math heavy, give it your best or safely skip ahead if it’s not your cup of tea.

Did you know that a whole family of recursive functions can be expressed in terms of fold? Let’s see some examples applied to lists. You might want to experiment on your own before you read further.

Map

map f = foldr (\a b -> f a : b) []

Filter

filter p = foldr (\a -> if p a then (a:) else id) []
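If you don’t have a Haskell toolchain handy, the same two definitions translate into Python like this (foldr, map_ and filter_ are made-up names, and this foldr is strict):

```python
def foldr(f, seed, xs):
    # Strict right fold: f(x1, f(x2, ... f(xn, seed)))
    acc = seed
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

def map_(f, xs):
    # map as a fold: apply f to the element, cons it onto the accumulator
    return foldr(lambda a, b: [f(a)] + b, [], xs)

def filter_(p, xs):
    # filter as a fold: cons only the elements satisfying the predicate
    return foldr(lambda a, b: [a] + b if p(a) else b, [], xs)

print(map_(lambda x: x * 2, [1, 2, 3]))          # [2, 4, 6]
print(filter_(lambda x: x % 2 == 1, [1, 2, 3]))  # [1, 3]
```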

In the past century, some clever folks started studying what’s called “recursion theory” and found some fascinating properties of fold. I’ll list some of them and provide references for the more inquisitive reader. All examples and definitions below are from Hutton, 1999.

Universal property

Any function that recurses over a list in the following pattern can be expressed in terms of fold.

If:

g [] = v
g (x:xs) = f x (g xs)

Then:

g = fold f v

Wait, what does this mean? The first equation simply says: there’s a seed value v. The second equation describes how the function works otherwise: g recurses over a list, applying f to each element and to the accumulator. When these two equations hold, we have a fold. Let’s take a familiar function as an example:

sum [] = 0 
sum (x:xs) = x + sum xs

It’s easy to see how v = 0 and f = (+), then:

sum = fold (+) 0
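We can spot-check the property mechanically; here’s a small Python sketch comparing the explicitly recursive sum with the fold (the names fold and sum_rec are mine):

```python
def fold(f, v, xs):
    # Strict right fold: f(x1, f(x2, ... f(xn, v)))
    acc = v
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

def sum_rec(xs):
    # the recursive definition from the text
    if not xs:
        return 0
    return xs[0] + sum_rec(xs[1:])

# The two definitions agree on every input tried
agree = all(sum_rec(xs) == fold(lambda x, acc: x + acc, 0, xs)
            for xs in ([], [1], [1, 2, 3], list(range(10))))
print(agree)  # True
```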

Fusion property

The composition of an arbitrary function and a fold can be fused together to give a single fold.

As you can imagine, the universal property is at play here! You can read the proof in the Hutton paper, the result is:

If:

h w = v 
h (g x y) = f x (h y)

Then:

h . fold g w = fold f v

Which, in layman’s terms, can be exemplified as:

(+1) . fold (+) 0 = fold (+) 1

Follow the previous line of thought with this example in mind to convince yourself it is true. Personally, I find the fusion property the most useful thing to remember for beginner origami students. So many times we waste resources writing stuff like:

[1,2,3] 
|> Enum.map(&(&1 + 1))
|> Enum.map(&(&1 * 2))

When the following would give the same result without enumerating twice:

[1,2,3] 
|> Enum.map(&((&1 + 1)*2))

(Alternatively, Stream acts like a fusion since we’re passing each element through instead of iterating n times). As a specific case of fusion using map and fold, it should be easy to see that:

fold f w $ map g xs = fold (f . g) w xs
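Both fusion laws are easy to check empirically. Here’s a Python sketch using functools.reduce (a left fold, which is fine here because (+) is associative):

```python
from functools import reduce

xs = [1, 2, 3]

# (+1) . fold (+) 0  ==  fold (+) 1
lhs = reduce(lambda acc, x: acc + x, xs, 0) + 1
rhs = reduce(lambda acc, x: acc + x, xs, 1)
print(lhs, rhs)  # 7 7

# fold f w (map g xs)  ==  fold (f . g) w xs: fuse g into the inner function
def g(x):
    return x * 2

staged = reduce(lambda acc, x: acc + x, map(g, xs), 0)
fused = reduce(lambda acc, x: acc + g(x), xs, 0)
print(staged, fused)  # 12 12
```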

Lastly, fun fact: did you know that Elixir’s famous |> operator is implemented in terms of fold?

Unfold your potential

There’s a little known fact among beginner functional programmers: what one can fold, one can unfold! If that just sounds like a stupid play on words, see the following example from the Elixir docs:

Enum.scan(1..5, &(&1 + &2)) 
[1, 3, 6, 10, 15]

I wonder what will happen if I do

scanl (+) 0 [1,0,1,1,2,3]

In this last section we’ll quickly go over:

  • the scan pattern of recursion
  • the difference between scanl and scanr
  • a little application of the universal property

If we take the previous Elixir function, we can imagine it does something like this:

(1)
: (1 + 2)
: (1 + 2 + 3)
: (1 + 2 + 3 + 4)
: (1 + 2 + 3 + 4 + 5)
: []

The pattern of recursion then is similar to that of fold, except we want to keep intermediate results. If we take our more analytical description of the characteristics of fold, we see that they all apply here too: both fold and scan have an inner binary function taking an element and the accumulator, and both have a seed value. What’s different is how scan operates after applying the inner function. Recall that we wrote fold as:

def reduce(_f, [], acc), do: acc
def reduce(f, [head|tail], acc), do: reduce(f, tail, f.(head, acc))

scan could then be:

def scan(_f, [], acc), do: Enum.reverse(acc)
def scan(f, [head|tail], acc = [ha|_]), do: scan(f, tail, [f.(head, ha) | acc])
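In Python the same pattern is a loop that snapshots the accumulator at every step, and the standard library even ships it as itertools.accumulate. The hand-rolled scan below mirrors the argument order of the Elixir sketch and is otherwise my own:

```python
from itertools import accumulate

def scan(f, xs, seed):
    # Like fold, but keep every intermediate accumulator
    acc, out = seed, []
    for x in xs:
        acc = f(x, acc)
        out.append(acc)
    return out

print(scan(lambda x, acc: x + acc, [1, 2, 3, 4, 5], 0))  # [1, 3, 6, 10, 15]
print(list(accumulate([1, 2, 3, 4, 5])))                 # [1, 3, 6, 10, 15]
```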

Coming to the question of left and right, we’re in a bit of a dilemma. We have an intuition that the two directions indicate associativity once again, but of what operation? Is it the inner function or the list accumulation (read : or | as cons)? Well, if you think about it, it wouldn’t make much sense to put parentheses differently around cons. Previously we showed the example of “const”, where left and right associativity had very different outcomes, but since any scan will have the same cons behavior, what’s left is the inner function.

If you managed to read the previous section on the properties of fold, you know that functions following that recursion pattern can be written in terms of fold. Scan is such a function, so how can we rewrite it?

Our previous Elixir implementation of scan doesn’t give much of a hint; that’s because we didn’t emphasize the cons operator. Remember that by the fusion property

h . fold g w = fold f v

We can imagine h being cons and g being our inner function. It doesn’t really work, but it’s an intuition. All we need to do then is to extract the cons from our previous pattern of recursion. The Haskell Prelude implementation of scanr is pretty easy to read:

scanr :: (a -> b -> b) -> b -> [a] -> [b]
scanr _ q0 []     = [q0]
scanr f q0 (x:xs) = f x q : qs
  where qs@(q:_) = scanr f q0 xs

We apply scanr recursively. scanr by itself applies the inner function (f x q) and conses it onto the rest (qs is scanr on the tail!). So where is the fold? That’s exactly the f x q part in disguise, equivalent to the f(head, ha) in our Elixir implementation. Can you see why? q is in fact the result of scanr on the tail, which is itself an application of f x q. That means that q is an accumulator of sorts, on which we apply f x.
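The Prelude definition translates almost word for word into (inefficient but readable) Python, which may make the recursion easier to trace:

```python
def scanr(f, q0, xs):
    # The head of the recursive result is the accumulator q that f is
    # applied to, exactly like qs@(q:_) in the Haskell version.
    if not xs:
        return [q0]
    qs = scanr(f, q0, xs[1:])
    return [f(xs[0], qs[0])] + qs

print(scanr(lambda x, q: x + q, 0, [1, 2, 3]))  # [6, 5, 3, 0]
```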

At this point you might be thinking: OK, I have a better understanding of fold, I can appreciate its power, but unfold? Really? Have you ever actually used it? That’s the problem with self-contained examples, sometimes it’s hard to see beyond them. So here’s a couple of code snippets from production©.

Here’s my solution to the second part of day 11 of 2017 Advent of Code, the infamous Hexagon Manhattan distance problem.

move :: String -> (Int, Int, Int)
move "n" = (0, 1, -1)
move "ne" = (1, 0, -1)
move "nw" = (-1, 1, 0)
move "s" = (0, -1, 1)
move "se" = (1, -1, 0)
move "sw" = (-1, 0, 1)

transform :: [String] -> [(Int, Int, Int)]
transform = map move

-- oh my, I could have put the coordinates in lists and used zipWith (+) a b!
sumTwoCoords :: (Int, Int, Int) -> (Int, Int, Int) -> (Int, Int, Int)
sumTwoCoords (a, b, c) (x, y, z) = (a+x, b+y, c+z)

distance :: (Int, Int, Int) -> Int
distance (x, y, z) = maximum [abs(x), abs(y), abs(z)]

solve2 :: [String] -> Int
solve2 a = maximum $ map distance $ scanl sumTwoCoords (0, 0, 0) $ transform a

What the scanl does here is create a series of coordinates from a series of moves. The input is a series of moves (n = go north, sw = go south-west) and the puzzle asks for the maximum distance from the origin ever encountered, so we simply apply each movement to the previous coordinate. If that’s not production© enough for you, here’s a snippet from our own codebase. It’s not a straight-out scan, since we don’t recurse over a given list, but the pattern is quite similar:

def unfold_substitutions(nil, acc), do: acc
def unfold_substitutions(%Quote{substitution_insurance_id: nil}, acc), do: acc

def unfold_substitutions(%Quote{substitution_insurance_id: insurance_id}, acc) do
  prev = %{save: %{quote: kuote}} =
    Insurance
    |> Insurance.by_insurance_id(insurance_id)
    |> Insurance.preload_save_quote()
    |> Repo.one()

  unfold_substitutions(kuote, acc ++ [prev])
end

Oh my, acc ++ [prev]? Who wrote this garbage?! (surprise! It was me!). What this function does is simply list all the predecessors of a particular insurance. An insurance policy can be changed while still active, spawning a second insurance. This process is commonly called “substitution” of an insurance. The id of the previous insurance is stored on the quote object related to the insurance. The funny spelling kuote is so it doesn’t clash with the Elixir reserved quote keyword.

Conclusion

In this short article we went over a lot of material, starting from the basics of fold up to its more esoteric properties. If you were uneasy about this peculiar recursion pattern, I hope you now feel a bit more comfortable. The takeaway points should be:

  • each HOF is really a particular recursion scheme
  • fold holds a special place among them because everything can be written in terms of fold
  • fold has an interesting opposite: scan!

Next time you’re doing some data transformation, remember your friends fold and scan, they can turn an ugly mess into beautiful origami!

Originally published at prima.engineering on January 11, 2018.
