Fixpoints in Haskell

I gave a brief informal talk about fixpoints and their uses in functional programming at the Norcross Haskathon yesterday. I ended up promising to write it up as a blog post, as well. This is that post.

Just to set expectations from the beginning:

  • This is not research; it’s just exposition of things that are already widely known in programming language theory. If you eat denotational semantics for breakfast, nothing here will be new.
  • This is also not an applied post. You won’t learn (at least directly) how to build better software in Haskell. You might, though, learn a new way of thinking about what your existing Haskell code means.

If you’re still with me, let’s go!

What is a fixpoint?

If you have a function from any set to itself, then a fixpoint of that function is any input that maps to itself. On a standard, graph you can think of these as being values that fall on the line x = y.

Fixpoints of a simple function (from Wikimedia Commons)

Here are a few examples:

  • Consider the function f(x) = x². It has two fixpoints: 0 and 1. These are the two real numbers that don’t change when you square them. (Negative numbers become positive, so they can’t work. Numbers between 0 and 1 become smaller. Numbers larger than 1 get larger.)
  • Now consider the cosine function, with its argument in radians. You may have unknowingly calculated the fixpoint of this function when you were bored in high school math class. If you start with any number, and repeatedly press the cosine button on your calculator, you’ll notice that the answer converges to a number around 0.739. This is the only fixpoint of the cosine function.
  • Finally, consider g(x) = x + 1. This function has no fixpoints, since if g(x) = x, then subtracting x from both sides, you could conclude that 0 = 1.

In general, deciding whether a function has fixpoints or not can be difficult. So there are many different fixpoint theorems. For example, Brouwer’s fixpoint theorem says that any continuous function from a closed ball into itself in Euclidean space must have a fixpoint. Another, which will be important in a bit, is the Knaster-Tarski theorem, which says that any order-preserving function on a complete lattice has a complete lattice of fixpoints. But more on that in a bit.

Why care about fixpoints?

That’s what a fixpoint is, but why should we care about it? Are fixpoints more interesting than a children’s amusement with the cosine button of a calculator? In fact, they are.

A really beautiful sequence of the MIT text Structure and Interpretation of Computer Programs deals with progressively more abstract representations of algorithms for computing a square root. Heron of Alexandria proposed an algorithm as follows. To compute the square root of n: (a) start with a guess, a; (b) compare a with n / a; (c) if they are close enough to each other, then stop; otherwise, average the two and repeat with that as your guess.

For example, if we start by guessing that the square root of 10 is 5, then we go through these steps:

  1. a = 5, but n / a = 2. Those are not close, so our new guess is 7/2.
  2. a = 7/2 (3.5), but n / a = 20/7 (about 2.86). Those are not close, so our new guess is 89/14.
  3. a = 89/28 (about 3.18), but n / a = 280/89 (about 3.15). Those are not quite close enough, so our new guess is 15761/4984.
  4. a = 15761/4984 (about 3.16), and n / a = 49840/15761 (also about 3.16). Those pretty close, so we can stop.

See what’s happening? Just like with the cosine button, we’re repeating a computation to make a guess better until it converges. The point where it converges is a fixpoint. But this time, the computation was chosen so that its fixpoint is the square root of n, giving a technique for calculating square roots.

To express this, we can write a Haskell implementation built around the notion of searching for a fixpoint.

You can play with this code using CodeWorld, and compute your own square roots.

This is sort of a funky way to compute a fixpoint, though.

  • First, it requires a starting value, which is a wart in the API. A function’s fixpoints don’t really depend on any particular starting value.
  • Second, it relies on the function to converge — to produce a value closer to a fixpoint in its result than its input. This is the reason the average is there at all, instead of just using n/2, which has the right fixpoint but doesn’t converge to that fixpoint.
  • Third, the function may not even have a fixpoint at all! Here, a fixpoint exists for Double because it has finite precision. But the function f(x) = (x + 10/x) / 2 has no fixpoint in the rational numbers, so there’s no chance this would also work with Rational.

These problems are hard to overcome in the current form, but it turns out that by creatively modifying some definitions, we can distill the essence of this into a more abstract combinator, which is Haskell’s fix.

Haskell’s fix combinator

In the Data.Function module of Haskell’s base library, there’s a surprising function.

fix :: (a -> a) -> a

This function computes a fixpoint of any Haskell function. It does it without any of the disadvantages of the version above, and even without any constraints on the types! But how can this be? First of all, even demonstrating the existence of fixpoints is so difficult that it requires heavy mathematical theorems. And even worse, some functions (like g(x) = x + 1, above) have no fixpoints at all!

The trick is that Haskell functions aren’t just ordinary functions on ordinary sets. Haskell functions can throw exceptions, and they can loop indefinitely. To reconcile this with the world of mathematical functions, we say that Haskell functions work not just with the set of ordinary values of its types, but the set of all partially defined values.

Here’s how that works. In math, the set of integers is {0, 1, -1, 2, -2, …}. That’s it! There is nothing else in that set. But in Haskell, we say that the type Integer includes one more value: ⊥. This special value is used to indicate that the function doesn’t produce a true number at all, instead either running forever in an infinite loop, or throwing an exception. In essence, ⊥ means “I don’t know.”

Once ⊥ is taken into account, g(x) = x + 1 does actually have a fixpoint, because g(⊥) = ⊥. That is, if you don’t know what number is passed in, you also don’t know what number comes back out. And this is, in fact, the fixpoint you get from fix. The expression fix g loops forever instead of producing a value, so we say its value is ⊥.

In fact, any time that f(⊥) = ⊥, fix f will produce ⊥. (This is unfortunate if we’re looking for actual numbers as fixpoints like we are above. But later, we’ll be able to recover our original notion of fixpoint in terms of this new one!) This might make you wonder why fix is useful at all! After all, if you don’t know the input to a function, can you ever know the output? Are there any functions for which ⊥ is not a fixpoint? Well, yes there are, and it’s all because of laziness. Consider a constant function, like h(x) = 42. Because it’s a non-strict language, Haskell will produce an output of 42 without even looking at the input. In other words, h(⊥) = 42. ⊥ is not a fixpoint. And, in fact, fix h is 42 for this function.

The definedness ordering

With functions on simple types like Double, these are the only two possibilities: if the function is constant, then fix will produce the constant value, and otherwise it will produce ⊥. (The word for these simple types is “flat”; either you know nothing about the value, or you know everything.) But the result is more interesting when the types have more structure. Given a linked list, for example, there are many possible values where different parts of the value are unknown:

  • ⊥ means that nothing about the list is known.
  • ⊥ : ⊥ means that the list is known to be non-empty, but neither the first element nor the rest of the list are known. This is more information than you have about ⊥ (which might be an empty list).
  • 3 : ⊥ means that the list starts with a 3, but you don’t know what (if anything) comes after that. This is strictly more information than ⊥ : ⊥.
  • ⊥ : [] (also known as [⊥]) means that the list is known to only contain one element, but it’s not known what that element is. This is again strictly more information than ⊥ : ⊥, but it’s incomparable to 3 : ⊥. They are both more info than just whether the list is empty, but neither is strictly more informative than the other. They provide different information.
  • 3 : [] (also known as [3]) is strictly more information than either of ⊥ : [] or 3 : ⊥.

This defines a partial order on Haskell values. It’s different from the order given by Ord, which is the usual meaningful order for that type. This order sorts the values by how much we know about them. So in this order, neither of 3 or 4 is less than the other (they are both completely defined, just different); but ⊥ is less than ⊥ : ⊥, which is in turn less than either 3 : ⊥ or ⊥ : [], and each of these is in turn less than 3 : [].

In fact, any Haskell type has a complete semilattice of values in this definedness order, which just means that given any nonempty collection of values, there is always some unique greatest lower bound, which is the most-defined possible value that’s still less defined — but compatible — with all of them. For example, for lists of numbers, the greatest lower bound of the set {[3, 2], [3, 1, 5], [3, 4, 5, ⊥]} is 3 : ⊥ : ⊥, since all three lists are compatible with this type, but making it any more defined would have to exclude some of them.

An important attribute of Haskell functions is that they preserve this definedness order. In other words, if x is less than or equal to y in this order — meaning that x is compatible with y but possibly less defined — , then f(x) is also less then or equal to f(y). If you think about it, this makes sense. If knowing a little bit about the input gives you some knowledge about the output, then learning more about the input might tell you more about the output, but it should never contradict or take away from what you already knew.

Now here’s where everything starts to come together. The fix combinator always produces the least fixpoint in this definedness ordering. This least fixpoint will be guaranteed to exist by the Knaster-Tarski theorem mentioned earlier, which says that any order-preserving function on a complete semilattice must also have a complete semilattice of fixpoints — and in particular, there must be a least one of them. Definedness is a complete semilattice, and all Haskell functions are order-preserving, and that’s good enough to guarantee that the least fixpoint exists.

Let’s look at another example. Define threeAnd list = 3 : list. A fixpoint here is a list that is not changed at all by prepending a 3 onto it. It can’t be ⊥, because 3 : ⊥ is definitely not the same as ⊥. The answer is an infinite list of 3s, so fix threeAnd gives you exactly that: an infinite list of 3s. We can check this with GHCi.

ghci> import Data.Function
ghci> fix (3 :)
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ^C

Fixpoints and recursion

The reason that fixpoints play such a dominant role in functional programming is that they are intricately related to recursion, and that relationship is an important bridge between the operational realm — understanding what the program does— and the denotational realm — understanding what the program means.

In the operational sense, Haskell’s fixpoints can be defined using recursion, like this:

fix f = x  where x = f x

This definition seems almost too cute to work, but it does! It just starts with an x, and then keeps replacing every x with f(x). After an infinite number of substitutions, we find that the least fixpoint of f is just f(f(f(f(f(f(f(f(…)))))))). In other words, any time we need to refer to the input of the function, we just substitute another function application instead.

In the opposite direction, though, general recursion can be defined in terms of the fixpoint combinator. Suppose you have a programming language with no direct recursion allowed, but you are given a fixpoint combinator. Then there’s a simple syntactic sugar for recovering general recursion. Just take your recursive definition, like x = ... x ..., and rewrite it with an extra parameter that takes the place of recursive uses: x1 x0 = ... x0 .... Notice that x1 is not defined recursively. But setting x = fix x1 satisfies the equation given for x.

In the untyped lambda calculus, fix is known as the Y combinator, and it’s a crucial step to showing that the untyped lambda calculus is computationally universal. That’s because lambda calculus doesn’t have general recursion as an a priori language feature. But Y combinators can be built within the untyped lambda calculus itself, so that general recursion can be simulated with fixpoints as described above.

Once you add types, things change a little bit. It’s impossible to define a Y combinator directly in most sound typed lambda calculi, because it would allow for nontermination, which typed lambda calculi usually try to avoid. When these calculi are used as the basis for general programming languages, the first step is usually introduce some kind of general recursion (such as a recursive let) to allow for nonterminating programs to be written.

There’s still an important place for a fixpoint combinator, though! If we care only about what our programs do, then adding recursion might be sufficient. But recursion literally means defining things in terms of themselves, and that’s not safest thing to do if you would like words to keep having meanings that make sense. So what does a recursive definition even mean?

Fortunately, the Knaster-Tarski theorem guarantees that any function has a unique least fixpoint in the definedness order! So, at least for the pure fragment of Haskell, it’s okay to just define the meaning of a recursive definition to be the least fixed point of the related non-recursive function obtained by pull recursive uses into a parameter exactly as we did above. This ensures that any recursive definition can be given a solid meaning, even though on face value it’s just a circular definition.

(Incidentally, fixpoints are a nice way out of all kinds of self-referential paradoxes like this — what Hofstadter called “strange loops”. So next time you’re debating your science fiction geek friends about time travel and the integrity of the space-time continuum, remember that there’s no paradox in time travel as long as the universe is a fixpoint of the function obtained by pulling the back references out as parameters.)

Looping back, then, we could now redefine the fixpoint function from the introduction — using iterative refinement to close in on the right value of the square root — using the least fixpoint of a Haskell function.

A fun puzzle

Putting together all that we’ve talked about, here’s a question. Suppose you get frustrated with fixing your Haskell code, and type the following into GHCi:

ghci> import Data.Function
ghci> fix error

What happens next? Give it some thought before reading on.

To solve the puzzle, let’s first explore the types. The error function in Haskell has the type String -> a. It cannot reasonably have a fixpoint unless the domain and codomain are the same, so the type specializes to String -> String. Then fix error has the type String.

What is this magical string that fixes all errors? Well, it must be a fixpoint of the error function, but the result of theerror function is always ⊥. That is its whole purpose in life! So fix error must be ⊥, too. Not so special after all?

Ah, but on the operational side, there are many kinds of ⊥. Some are just infinite loops, while others throw exceptions. If they throw exceptions, then there’s an exception value to think about. These are all invisible to the pure fragment of Haskell, but immensely important to what the program does! To understand what’s going on here, recall that fix error = error (error (error ...))), an infinite nesting of error functions. The result is quite interesting.

ghci> fix error
"*** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception:^C

Let’s just say it did not fix any errors. Instead, it threw an exception, with an associated exception message that itself throws an exception when printed, and that exception has another exploding error message, and so on as long as you allow it to continue!