I recently decided to become a major contributor to the Haskell Foundation, and I’m writing about the reasons behind my decision. I also hope to encourage others who can easily do so (and *only* those who can easily do so) to join me in pledging ongoing support for the organization as individuals with passion and love for Haskell and its community. In short, there’s a real opportunity here and we’re always stronger when we act together.

I hope this goes without saying, but I will say it anyway. Haskell has, for many years now, been a source of great joy in my life. It has been a way to unwind, whether that’s solving Project Euler problems, throwing together a physics demo, or working through a math problem. It’s been with me when I just need to get away from the stresses of daily life. It has also been my life’s work. Though I’ve yet to be paid to write Haskell, my volunteer work teaching children with Haskell in schools has been my way to give back to the world and accomplish something meaningful with my life.

But more than that: it has been my community. The first time I attended a Haskell user group, I impulsively jumped on a plane from the U.S. to Cambridge, U.K. on a few days’ notice, for an event called AngloHaskell. There, I met so many phenomenal people — Simon Peyton-Jones, Neil Mitchell, Don Stewart, Simon Marlow, and more — all in one fantastic weekend! Since then, I’ve continued to be welcomed in this community at ICFP, BayHac, Compose, the New York Haskell User’s Group, and the Atlanta Functional Programming Meetup, and I’ve even helped run a couple of Haskell events in Atlanta and New York. More importantly, the people I’ve met in this community are some of my closest friends. They’ve been with me not just through technical challenges, but also through times of loneliness and struggle as well as times of celebration.

I know this is not just me. I’ve lost count of the stories I have heard from others about how our community welcomed them when they needed a friend, felt alienated and alone, doubted themselves, or felt like they couldn’t make it in the world. These are people I’ve mentored, and also people who have mentored me. They are close friends, and a few distant acquaintances. We’re all in this together. Not all of these people are able to contribute to Haskell financially, but they are here nevertheless, passing along that kind of support to others. I like to think I’m donating in their name, also.

Of course, there are lots of great technical arguments for Haskell. That’s why I got involved in the first place, and I wholeheartedly believe in them. At some point, though, I was surprised to discover that isn’t what matters. Haskell isn’t just a programming language. It’s the place where I belong.

Now it’s time to remove those rose-colored glasses. As a community, we have challenges. We can be downright embarrassing. We can be frustrating, fractured, and lost.

As a small language community, we naturally have big gaps: documentation, compatibility, library maintenance, educational materials, and so on. Let’s face it: most mainstream programming languages operate with substantial support from big names in the tech industry, like Google, Facebook, Microsoft, Mozilla, Sun (R.I.P.), or Apple, paying dozens or even hundreds of full-time core language and library contributors. Haskell operates on a fraction of that kind of support. Instead, we have a community of mostly volunteers, who care deeply about what they are doing but often don’t even use Haskell professionally themselves!

We have our share of communication problems. It wasn’t too long ago that the whole Haskell community felt on the verge of collapse, with separate and competing Haskell language websites set up to vie for some authority and image of official support for their own ideas of the right way to do things. But we *are* strong enough to overcome these challenges, to fix these divisions, and to be better for it. We are fortunate to have people of phenomenal character who we all admire and trust — especially Simon Peyton-Jones, our figurehead and inspiration. His kindness and even-tempered friendly spirit have healed many of these divisions when it seemed beyond all hope! Thanks, Simon. The rest of us are doing better these days, as well. But it’s going to take a sustained long-term effort.

The Haskell Foundation is well placed to meet these challenges head on. As an umbrella organization looking after the entire Haskell user experience, it’s set up to recognize the places where our individual projects need help, or fail to meet needs, and make sure users aren’t slipping through the cracks between different projects, visions, and personalities. With strong leadership across the whole community, it’s also perfectly suited and thoughtfully led to help resolve conflict, build consensus, and find ways forward in the midst of differences of opinion. This is a phenomenal resource for solving some of the great challenges facing our community.

Let’s talk about the Haskell community. One of the things that’s always set Haskell’s community apart is that it’s *not* a typical programming language community. Haskell has generally maintained strong representation from several very different groups:

- Academics who use Haskell as a testbed to try cutting-edge new ideas and build a better future for programming languages.
- Educators who use Haskell to teach big concepts about programming languages and practice.
- Professionals who use Haskell, cutting-edge innovations and all, to solve so-called “real world” — that is, business — problems. (An unfortunate term, indeed, because of course *all* of Haskell’s users are part of the real world!)
- Individuals who enjoy how Haskell’s language design leads them on journeys of discovery and learning, helping shape and clarify their ideas and self-expression.

Early in Haskell’s life, the academic influence was strong, and it took a deliberate effort by the community to remain inclusive to others, including educators and professionals. Lately, the professional side has grown quite strong, and once again the community needs to evaluate where it stands. Both then and now, we’ve seen strong leadership proclaim that Haskell’s identity doesn’t lie in any one group, but in a great meeting of minds, sticking it out on the dirt road rather than taking the easy path and leaving part of the community behind.

The Haskell community can stand to make some progress, too, in diversity and inclusiveness. We still skew largely white, largely male, largely American and European. (As a white male American, of course, I’m part of that!) Nevertheless, being inclusive to many sources of enthusiasm and interests is a key part of being inclusive to many types of people. It’s difficult to consider diversity in the Haskell community without thinking about Tidal Cycles and other projects of personal passion and expression that flourish here.

This brings me to the final reason why I feel the need to personally support the Haskell Foundation. I’m offering this support because, as happy as I am to see amazing corporate sponsorship for Haskell, I don’t believe that should be the *only* financial support the Haskell Foundation receives.

Industry support is absolutely not the only contribution to Haskell! The contributions of the many academics who support GHC and other Haskell technology with their research and development are priceless. The contributions of passionate individuals who maintain key parts of Haskell’s tooling and ecosystem as enthusiasts are amazing. Nevertheless, I like to think that the Haskell Foundation can look to its own funding and see it reflect its responsibility to the entire Haskell community, corporate and non-corporate alike.

This was a very emotional post for me. Not much like my usual blog posts, I know. But ultimately, I just needed to reach out and tell all of you Haskell programmers that you are valuable to me. We have something worth keeping here. I’m going to do everything I can to help us keep it.

Feel the same? There’s a lot you can do to help!

- Financially, you can join me in making monthly donations to the Haskell community through the Haskell Foundation. There are other opportunities, as well, to support parts of the Haskell community through Patreon [1, 2, 3, 4], GitHub sponsorships [1, 2, 3, 4, 5, 6, 7, 8], Haskell.org donations, etc. But personally, I believe the power of the Haskell Foundation to shine a spotlight and focus visibility and resources on the most critical problems is extremely compelling.
- Technically, please just follow your heart. People acting out of their own passion and excitement are far more valuable than anyone going through the motions doing as they are told. Do cool stuff, and build the things you need to do it. There are a few guidelines that do wonders here. Maybe most of all: prefer to fix upstream rather than work around downstream.
- Socially, reach out to others with kindness, support, and shared enthusiasm. This really is the backbone of the Haskell community: we all love what we’re doing, believe in it, and will talk your ear off about it if you let us, whether on IRC, Reddit, mailing lists, Matrix, Discourse, or a local interest group or gathering at work.

However you personally choose to give back to our community, I look fondly on the 15 years I’ve known you all, and look forward eagerly to the many years to come. Thank you all.

Something like this happens in categories by considering limits and colimits. One draws a simple picture, follows a few simple rules, and important ideas pop out. Here, I’m going to present the technique, and then draw a particular picture and see how two big ideas in dynamical systems jump out and startle us with their unexpected appearance.

Don’t worry if you’re not up to speed on category theory. In fact, I won’t even be using *real* category theory at all here, because we don’t need that kind of abstraction. The only ideas we need are *sets*, *functions*, *function composition*, and *identity functions*. I’ll assume you know what these mean.

Here’s a game: How much can we say about sets *without* talking about their elements (that is, the things inside of them)? We can still talk about the *functions* from one set to another, including the *identity* function, and even *composition* of those functions. We just can’t talk about the members themselves.

At first, that might seem like a silly restriction, sort of like trying to go a whole day without using any words with the letter Q. That’s okay; I don’t need to convince you it’s not silly. If I were trying to convince you, though, I might mention that the more general notion of categories includes more than just sets, and *those* things might not have elements. I might also say that the whole idea of membership in a set is an *ultimately undefinable primitive idea* in conventional mathematics, and using that idea too informally has led to logical paradoxes in the past. So it makes some sense to ask whether, instead of just *assuming* that set membership makes sense, we can start with a different foundation (functions) and derive the idea of set membership in a logically sound way. But again, you don’t need to be convinced. Just humor me. It’s a game. Those are the rules.

Let’s practice:

- Can I say “The set *A* has only one element” without saying anything about elements?

…

Surprisingly, **yes**! Instead of talking about the one element, I can say this: “Given any other set *B*, there is *exactly one* function *f* : *B* → *A*.”

If *A* had more than one element, then you could choose where to map an element of *B*. *A* having only one element means that everything in *B* has to map to that element; there is no choice. So the existence of exactly one such function means the same thing as saying *A* has one element.
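This characterization is easy to check by brute force on small finite sets. Here is a quick Python sketch (my own illustration, with made-up example sets, not part of the original argument) that enumerates every function from *B* to *A* and counts them:

```python
from itertools import product

def functions(B, A):
    """All functions from finite set B to finite set A, as dicts."""
    B, A = sorted(B), sorted(A)
    return [dict(zip(B, images)) for images in product(A, repeat=len(B))]

# When A is a singleton, there is exactly one function B -> A ...
print(len(functions({1, 2, 3}, {"*"})))   # → 1
# ... but with two elements in A there are 2**|B| = 8 of them.
print(len(functions({1, 2, 3}, {0, 1})))  # → 8
```

When *A* is a singleton the count is exactly one no matter what *B* is; any larger *A* gives |*A*|^|*B*| functions instead, so the counting condition really does single out the one-element sets.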

Can you find a way to say “The set *A* is empty”? There is more than one answer. I’ll leave it as an exercise for the reader! (Hint: you can do it with a straightforward application of the constructions and diagrams in this post.)

Let’s do one that’s a little more complicated, just to get an idea of how things go in general.

- Can you say “The set *A* has *the same elements* as the set *B*” without saying anything about elements?

…

Since we have no way to actually name or describe those elements (since we’re forbidden from even talking about them), we can**not** say this. But we can get close, by saying that *A* has the same *number* of elements as *B*. Here’s how to do it: “There is a function *f* : *A* → *B*, and another function *g* : *B* → *A*, and *g* ∘ *f* is the identity function on *A*, and *f* ∘ *g* is the identity function on *B*.”

In mathematics, this is known as a *one-to-one correspondence*, and it proves that these two sets have the same number of elements (formally: the same *cardinality*, since it might actually be infinite and therefore not a number, per se.)

This will be a sort of pattern. We can’t name the things in our sets, but we can get an idea of what’s in them by showing that they *correspond* to the things we want. So instead of a set with the same elements, we got a set, and a *function* that shows how whatever is in that set corresponds to the set we wanted. Another way of saying this is to say that the sets are the same “up to one-to-one correspondence,” or more abstractly, “up to isomorphism.”

Getting the hang of it? Okay, one more. This one will come up later, so follow along. Take a couple of sets: call them *A* and *B*. The *cartesian product* of *A* and *B*, written *A* × *B*, is the set of ordered pairs (*x*, *y*), where *x* is an element of *A*, and *y* is an element of *B*.

Back to our game from earlier.

- Can you define the *cartesian product* without talking about elements?

…

Well, just like the previous example, we can’t be sure what exactly the elements *in* our set are called, but we **can** say that a set has the right *number* of elements to be the cartesian product. And what’s more, we can even provide a pair of functions that give the components of the ordered pair, giving a way to interpret everything in that set as an ordered pair.

To look at how that’s done, let’s define a couple functions *p* and *q*, called *projections*, that each pick one of the components out of an ordered pair.

The first function is *p* : *A* × *B* → *A*, and the second is *q* : *A* × *B* → *B*. They don’t really do much. They are almost as boring as the identity function! But just like the identity function was useful for saying how two sets have the same number of elements, these will be useful for defining the cartesian product.

We have a problem: the projections are defined in terms of what they do with *elements*, but we cannot talk about elements. (The rules of the game allow us to talk about the identity function as we did last time, but we have no such exception for projections.) So at first glance, the best we can do is just say that *p* and *q* are functions from *A* × *B* to *A* and *B*.

All that structure about ordered pairs and components, sadly, is lost. But we can recover it by saying something about how this relates to *other* ways to define functions to *A* and *B*. Here’s the final answer.

“*A* × *B* is a set where there are two functions *p* : *A* × *B* → *A* and *q* : *A* × *B* → *B*, **and** for any **other** set *S* and functions *f* : *S* → *A* and *g* : *S* → *B*, there is a **unique** *h* : *S* → *A* × *B*, such that *f* = *p* ∘ *h* and *g* = *q* ∘ *h*.”

Well, that was a mouthful. Here’s what it means.

- The functions *f* and *g* mean that every element of *S* has some associated pair of elements from *A* and *B*. Any particular *S* may not be associated to *all* of the possible pairs, but importantly, we can choose an *S* and *f* and *g* that associate any pair, if we like.
- That *h* exists means, therefore, that for any choice of elements from *A* and *B*, there’s an element in *A* × *B* that corresponds to it. The projections *p* and *q* tell us which *A* and which *B* each element corresponds to.
- That *h* is *unique* means that there is only *one* such element for each pair, since otherwise there would be a choice of which element *h* points at.

So, all together, this says *A* × *B* has exactly one element in it for each choice of a pair of elements from *A* and *B*, and *p* and *q* tell us which pair that is. Success!
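For small finite sets, the whole universal property can be checked mechanically. Here is a Python sketch (a toy example of my own; the sets and functions are made up) that builds the product with its projections and verifies that an arbitrary *S*, *f*, *g* factors through the pairing function *h*(*s*) = (*f*(*s*), *g*(*s*)):

```python
# A toy check of the universal property of the cartesian product,
# on small finite sets (the sets and maps are made-up examples).
A = {"a", "b"}
B = {0, 1, 2}
AxB = {(x, y) for x in A for y in B}

p = lambda pair: pair[0]  # projection A x B -> A
q = lambda pair: pair[1]  # projection A x B -> B

# Any other set S with functions f : S -> A and g : S -> B ...
S = {"s1", "s2"}
f = {"s1": "a", "s2": "b"}
g = {"s1": 2, "s2": 0}

# ... factors through A x B via the unique h(s) = (f(s), g(s)).
h = lambda s: (f[s], g[s])

assert all(h(s) in AxB for s in S)
assert all(p(h(s)) == f[s] and q(h(s)) == g[s] for s in S)
print("universal property verified for this S, f, g")
```

The uniqueness of *h* also shows up here: any function into *A* × *B* whose compositions with *p* and *q* equal *f* and *g* must send each *s* to exactly the pair (*f*(*s*), *g*(*s*)).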

That last answer has a common form. It says that among all the *possible* choices of a set and related functions (that is, *S*, *f*, and *g*), there is one particular choice (*A* × *B*, *p*, and *q*) that’s somehow *universal*, because any other choice you could make is nothing more than a mapping to that universal one. In category theory, this form of thing is called a *limit*. This is only distantly related to the limits you might have seen in calculus, so if that’s what jumped in your mind, go ahead and forget it again.

The cool thing about limits is that you can take any collection of sets and functions that you like, and ask what their *limit* is. Start from any little diagram of sets and functions, and an idea pops out. Often, it’s an important central idea about sets and functions, like cardinality and one-to-one correspondence (which come from the diagram with one set) or cartesian products (which come from the diagram with two or more sets, as we saw). The limit is always the *universal* set, along with functions to all of the sets you started with, so that any *other* set and functions are uniquely determined by a mapping from that set into the universal set. The idea then just pops out of which diagram you start with.

Let’s look at another example, though it’s a bit of an odd one. Let’s take an *empty* diagram, with no sets and functions at all. Then the *limit* of this diagram is just some set *U*, such that for any other set *S*, there is a unique function *h* : *S* → *U*.

Sound familiar? That was the answer to the first question in our game where we described sets with one element. That answer, as well, is just the limit, this time of the empty diagram!

So far, the diagrams we’ve looked at had only sets, and not functions. There’s another wrinkle in the picture if you want to construct the limit of a diagram with a function. Here’s a diagram with two sets *A* and *B* and a function *f*, as well as the universal object *U* and functions *p* and *q* that form the limit of this diagram.

The rule says that when you compose a function into the diagram (say, *p*) with a function in the diagram itself (say, *f*), you have to get the same thing as the function into the diagram that points to the same place. So *f* ∘ *p* = *q*. In category theory, this is sometimes expressed by saying it “commutes”, which is just a fancy word that means it doesn’t matter which path you take, because you get the same function in the end.

In this case, that means *q* is completely determined by *p* and *f*, so we don’t need to think about it at all! In fact, this ends up meaning we don’t need to think about *B* or *f* either, and the limit here is the same as for a diagram with just *A*: any set with the same cardinality as *A*, where *p* is a specific one-to-one correspondence that witnesses this. So this wasn’t an interesting example. But there are interesting limits with functions, and we’re about to see one.

If we’re thinking of simple diagrams to work out limits for, what about this one?

The diagram has a single object *A*, and a single function *f* from *A* to itself. One of the smallest diagrams we can think of. What could its limit be?

We can just follow the same process, and end up with this statement: “*U* is some set, together with a function *g* : *U* → *A*, where *f* ∘ *g* = *g*. Additionally, for any other set *S* and function *k* : *S* → *A* where *f* ∘ *k* = *k*, there is a unique function *h* : *S* → *U* so that *g* ∘ *h* = *k*.”

But what does that really mean? We can puzzle it out.

- *f* ∘ *g* = *g* means that *f* doesn’t change anything in the result of *g*. In other words, for any *x* in the result of *g*, we must have *f*(*x*) = *x*. There’s a word for values that are unchanged by a function: fixed points (or sometimes fixpoints, or stationary points). So *g* can only point to the fixed points of *f*.
- By the same logic, any choice of *k* can only point to the fixed points of *f*. But it’s possible to choose a *k* that points to any fixed point of *f*.
- That *h* exists, then, means that *g* must map some element of *U* to every fixed point of *f*. (If it didn’t, then *k* might map something to that fixed point, and it couldn’t be obtained by composing with *g*.)
- That *h* is unique means that *g* maps only one thing to each fixed point of *f*. (Otherwise, there would be a choice of where *h* should map some things, so *h* wouldn’t be unique.)

So *U* is in one-to-one correspondence with the fixed points of *f*, and *g* is that one-to-one correspondence. Essentially (up to isomorphism) the limit of this diagram *is* the fixed points of *f*.
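Concretely, on a finite set the limit can be computed by simply collecting the fixed points. A small Python sketch of my own (the example function is arbitrary) checks the defining equation *f* ∘ *g* = *g*:

```python
# Computing the limit of the one-set, one-function diagram:
# the set of fixed points of f, with g the inclusion into A.
# (A small finite example of my own.)
A = {0, 1, 2, 3, 4, 5}
f = lambda x: x * x % 6  # an arbitrary function from A to A

U = {x for x in A if f(x) == x}  # the fixed points of f
g = lambda u: u                  # the inclusion U -> A

assert all(f(g(u)) == g(u) for u in U)  # the limit condition f ∘ g = g
print(sorted(U))  # → [0, 1, 3, 4]
```

Here *g* is the one-to-one correspondence from the text: it identifies each element of *U* with one fixed point of *f*.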

Fixed points are a fundamental idea that pops up all over the place. I’ve written about their role in denotational semantics of programming languages before. So I find it delightful that they jump out of such a simple diagram. At the same time, though, it’s not surprising that they jump out of *this* diagram, which is the one that really expresses the essence of a discrete dynamical system: a set (the phase space), together with a function from that set to itself that defines a dynamics on that phase space.

One famous thing about category theory is that everything has a dual, obtained by just reversing the directions of all the arrows. In this case, we can convert the definition of a *limit* into a different definition: the *colimit*. This looks just like the limit, but with the direction of the arrows reversed.

As a warm-up, let’s consider the diagram with just two sets. Recall that the limit of this diagram was the cartesian product of those two sets. (More specifically, it was the cartesian product, along with the two projection functions that explained how to interpret each element as an ordered pair.)

The same diagram can be used with arrows reversed to find a colimit.

The formal definition is this: The colimit of the diagram is a set *A* + *B*, together with functions *p* : *A* → *A* + *B* and *q* : *B* → *A* + *B*, such that for every other set *S* and functions *f* : *A* → *S* and *g* : *B* → *S*, there is a unique function *h* : *A* + *B* → *S* so that *f* = *h* ∘ *p* and *g* = *h* ∘ *q*.

Let’s unpack what that means, now. First of all, everything in *A* maps to something in *A* + *B*, and likewise, everything in *B* maps to something in *A* + *B*. (We would usually call *p* and *q* injections now instead of projections.) But that’s not the whole picture, because we also have to look at *h*.

- The existence of *h* means that no two elements of *A* or *B* map to the same element of *A* + *B*. If they did, then we could find an *f* and *g* that map them somewhere different, and then we couldn’t write that *f* and *g* as compositions with *h*.
- The uniqueness of *h* means that there’s nothing *else* in *A* + *B* except these elements that are mapped from *A* and *B*. If there were, then there would be a choice of where *h* maps those extra elements.

If you’re familiar with basic ideas about sets, you might want to say that *A* + *B* is the union of *A* and *B*. But that’s not quite right, because if *A* and *B* have elements in common, *A* + *B* still needs to contain an element for each one (because *f* and *g* could map that same value to different elements of some *S*.) So it’s actually called the *disjoint union*. It’s a set that contains a copy of everything in *A* and everything in *B*, keeping two copies if *A* and *B* have elements in common.
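A common way to realize the disjoint union in code is to tag each element with its set of origin. This Python sketch (my own illustration, with made-up sets) shows both the tagging and how any *f* and *g* factor through it, even when the two sets overlap:

```python
# The colimit of the two-set diagram: the disjoint union A + B,
# built by tagging each element with its set of origin.
# (Sets and maps here are made-up examples of my own.)
A = {1, 2, 3}
B = {3, 4}

AplusB = {("A", x) for x in A} | {("B", y) for y in B}
p = lambda x: ("A", x)  # injection A -> A + B
q = lambda y: ("B", y)  # injection B -> A + B

# The shared element 3 is kept twice, so |A + B| = |A| + |B| = 5,
# unlike the plain union, which would have only 4 elements.
assert len(AplusB) == len(A) + len(B)

# Any f : A -> S and g : B -> S factor uniquely through
# h : A + B -> S, even though f and g disagree about 3.
f = {1: "x", 2: "y", 3: "x"}
g = {3: "z", 4: "y"}
h = lambda t: f[t[1]] if t[0] == "A" else g[t[1]]
assert all(h(p(x)) == f[x] for x in A)
assert all(h(q(y)) == g[y] for y in B)
print("f and g factor through the disjoint union")
```

The tags are exactly what lets *h* be well defined when *f* and *g* send the shared element 3 to different places, which is why the plain union would not satisfy the universal property.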

In some sense, both the limit and the colimit of this diagram are about combining the elements of *A* and *B*; they just combine them in different ways: the limit with a cartesian product that describes ways to choose one element from each, and the colimit with the disjoint union that describes ways to choose one element from either. In general, though, the limit and colimit of a diagram need not be different, even though they were this time. For example, the diagram with just one set has the same limit and colimit: both are just a set and a one-to-one correspondence between that set and the original set.

Let’s return to the earlier diagram of a dynamical system:

It’s now natural to ask what the colimit would be for this diagram. So let’s work that out with this picture.

Here’s what the definition of the colimit works out to this time: A colimit of this diagram is a set *U* together with a function *k* : *A* → *U* for which *k* = *k* ∘ *f*. Furthermore, for any other set *S* and function *g* : *A* → *S* where *g* = *g* ∘ *f*, there is a unique function *h* : *U* → *S* such that *g* = *h* ∘ *k*.

Let’s break down the meaning one piece at a time.

- First, *k* = *k* ∘ *f*. What this tells us is that for any *x* in *A*, *k*(*f*(*x*)) = *k*(*x*). But that’s true for anything at all in *A*, including *f*(*x*), and *f*(*f*(*x*)) and so on, so we can write a chain of equalities: *k*(*x*) = *k*(*f*(*x*)) = *k*(*f*(*f*(*x*))) = *k*(*f*(*f*(*f*(*x*)))) = …, forever and ever. This brings us to another idea in dynamical systems: the orbit. An **orbit** (or sometimes *trajectory*) is a set of values that eventually map to the same things when iterating the transition function. So this says that *k* maps everything in the same orbit of *f* to the same result. (So do all valid choices of *g*, which has the same condition.)
- The existence of *h* tells us that *k* maps distinct orbits of *f* to different places. (Otherwise, if *g* mapped them to different places, then *g* couldn’t be written as a composition with *k*.)
- The uniqueness of *h* tells us that there’s nothing else in *U* other than one element for each orbit of *f*. (Otherwise, *h* would have a choice where to map these extra elements, so it wouldn’t be unique.)

Summing it all up, *U* is effectively the set of all orbits of the function *f*. An orbit can be thought of as an “eventual” behavior of *f* when applied over and over, if you just ignore differences at the beginning of the process. In fact, a fixed point is one type of orbit! If *x* is a fixed point of *f*, then anything that ever reaches *x* will stay there forever, since applying *f* more often doesn’t change the value. But it’s not the only kind of orbit. You can also have a longer cycle of repeating values. Or, despite what the name implies, an orbit need not be cyclic at all. For example, the function *f*(*n*) = *n* + 1 on the natural numbers has only one orbit: the one that counts off toward positive infinity, increasing one number at a time.
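On a finite set, the orbits can be computed directly: iterate *f* long enough that every point has entered its eventual cycle, then group points by which cycle they end up in. A rough Python sketch of my own (it assumes a finite phase space):

```python
def orbits(A, f):
    """Partition the finite set A into the orbits of f : A -> A."""
    def cycle_of(x):
        # After |A| steps, x is guaranteed to have entered a cycle.
        for _ in range(len(A)):
            x = f(x)
        # Walk that cycle once; use it as a canonical orbit label.
        cycle, y = [x], f(x)
        while y != x:
            cycle.append(y)
            y = f(y)
        return frozenset(cycle)

    groups = {}
    for x in A:
        groups.setdefault(cycle_of(x), set()).add(x)
    return list(groups.values())

A = {0, 1, 2, 3, 4, 5}
f = lambda x: (x + 2) % 6 if x % 2 == 0 else 1
print(sorted(sorted(o) for o in orbits(A, f)))  # → [[0, 2, 4], [1, 3, 5]]
```

In the example, the even numbers cycle among themselves while every odd number falls into the fixed point 1, giving exactly two orbits; the returned partition is the colimit set *U*, and the map from each element to its group is *k*.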

Orbits come up all the time when one looks at iterated functions. The famous Collatz conjecture, for example, is just the statement that the function *f* on the positive integers defined by *f*(*n*) = *n*/2 when *n* is even, and *f*(*n*) = 3*n* + 1 when *n* is odd,

has only one orbit. If the Collatz conjecture is true, then that one orbit cycles through the sequence 4, 2, 1, 4, 2, 1, … forever, and every possible starting point eventually ends up in that cycle. If the Collatz conjecture is false, then there is some other orbit, whether it’s another cycle or a sequence that ends up growing larger and larger forever.
has only one orbit. If the Collatz conjecture is true, then that one orbit cycles through the sequence 4, 2, 1, 4, 2, 1, … forever, and every possible starting point eventually ends up in that cycle. If the Collatz conjecture is false, then there is some other orbit, whether it’s another cycle or a sequence that ends up growing larger and larger forever.

Just like disjoint unions and cartesian products were two different takes on combining two sets, fixed points and orbits are related but different takes on understanding the eventual behaviors of a dynamical system. One is the limit, and one is the colimit, but both start from the same simple diagram.

This was just one example of a fundamental idea from a branch of mathematics that just pops out of a very abstract and general construction, fully defined as if by magic. Even though the diagram involved is trivial, I haven’t actually seen this mentioned in standard lists of examples of limits and colimits in textbooks! This is surprising, but I’m also glad, because I think I’d find this a little less exciting if it had just been example 2 in a textbook.

- I would strongly encourage you to explore limits and colimits on your own by considering other simple diagrams you can draw, and checking whether there’s some natural characterization of the limit and colimit.
- I’ve also stuck to sets and functions in this article, but the “game” where we didn’t talk about elements or membership did have a point. When you play that game with sets, you get an alternative kind of set theory for mathematics called the “Elementary Theory of the Category of Sets”, or ETCS for short. In the traditional ZF set theory, you assume there’s some notion of membership, and then work to build up ordered pairs, then relations, then functions. But in ETCS, you assume there’s a notion of functions, and work backward to recover ordered pairs and such. You can even recover an idea analogous to set membership: an element of a set is analogous to a function from some other one-element set, which we defined without reference to elements, into that set.
- But because there was no direct reference to elements, the same definitions work in any other category; that is, a collection of things and arrows between them that compose in a manner like function composition. You can, for example, consider any algebraic structure (groups, rings, etc.) and the homomorphisms between them, or topological spaces and continuous functions between them, or even fundamentally non-function-like things such as partial orders (or any other reflexive transitive relation).
- Although the same definitions work for any category, they can mean very different things, and they may or may not even exist. One example: if you work out what sums and products of two things (that just means the colimit and limit of a diagram with two objects and no arrows) look like for the category of abelian groups, you’ll discover that they are the same thing! Both correspond to a notion that’s usually called a direct sum or direct product, which are the same thing for any finite number of groups. That’s very different from sets, where one was the cartesian product and the other the disjoint union.

Please do play around with your own diagrams, your own categories, and so on, and see where it takes you. If you find another interesting limit or colimit that isn’t a common example, please comment and let me know about it.

The first theorem, proved by Allan Gibbard and generalized in later work by Gibbard and others, showed that any collective decision-making process must have at least one of three properties:

- There are only two possible outcomes.
- There is a dictator, who can choose the outcome regardless of any choices made by anyone else.
- It is strategic, meaning that the best way to get what you want depends not just on your preferred outcome but also on anticipating what others will do.

This is a pretty big deal. Think about elections, for example. Assuming we have more than two candidates and don’t have a dictatorship, strategic voting must be possible. That’s too bad, because we’d like to be able to tell people that they can always just vote for their favorite candidate. Alas, this can never be the case. It’s widely known that hopeless third parties can “spoil” the election by siphoning votes from major candidates. Clever systems like instant-runoff are supposed to fix that problem, but Gibbard showed that in general, it can never completely be fixed. (At least, not without accepting a limited two-party system or a dictatorship.)

It’s not strictly part of Gibbard’s theorem, but Gibbard and others later showed that this remains true even if randomness is involved. You can wait until after the voting happens to choose the dictator, or to decide which two outcomes to allow; but ultimately the same conditions apply. That won’t be important for the rest of this article, though.

While Gibbard’s theorem addresses any collective decision-making process, Gale and Shapley’s Deferred Acceptance (DA) algorithm is a solution for a specific kind of decision-making problem: matching. The idea is to match up some set of candidates with some set of positions. Each candidate has preferences about which position they want, and each position has preferences on which candidates they prefer. (The presentation is historically given in terms of marriage, but that presents difficulties when you consider same-sex marriage, gender identity, etc., so let’s call it candidates and positions!)

Gale and Shapley proved that there exists a matching of candidates to positions which is *stable*, in the sense that no candidate and position would each prefer the other over their assigned match. (The word “stable” perhaps comes from the fact that such a pair could abandon their assignments and match with each other, so an unstable matching wouldn’t last long.) They gave a specific algorithm, called DA or Deferred Acceptance, which is described in the previous link. It starts from a list of each candidate’s and position’s preferences, and generates a stable matching. They went on to prove that the matching it generates is the unique one that all candidates would agree is best.
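To make the procedure concrete, here is a minimal sketch of candidate-proposing Deferred Acceptance, assuming one slot per position. The names (deferredAcceptance, rank) and the use of String identifiers are mine, for illustration only; this is not Gale and Shapley’s own presentation.

```haskell
import qualified Data.Map.Strict as M
import Data.List (elemIndex)
import Data.Maybe (fromMaybe)

type Candidate = String
type Position = String

-- Candidate-proposing Deferred Acceptance, one slot per position.
-- Each preference list is ordered best-first.
deferredAcceptance ::
  M.Map Candidate [Position] ->
  M.Map Position [Candidate] ->
  M.Map Position Candidate
deferredAcceptance candPrefs posPrefs = go (M.keys candPrefs) candPrefs M.empty
  where
    -- go: free candidates, remaining preference lists, tentative matches
    go [] _ held = held
    go (c : free) prefs held =
      case M.findWithDefault [] c prefs of
        [] -> go free prefs held -- c has run out of positions; stays unmatched
        p : rest ->
          let prefs' = M.insert c rest prefs
           in case M.lookup p held of
                -- p is open: tentatively hold c
                Nothing -> go free prefs' (M.insert p c held)
                Just c'
                  -- p prefers c: bump c', who becomes free again
                  | rank p c < rank p c' ->
                      go (c' : free) prefs' (M.insert p c held)
                  -- p keeps c'; c will propose further down their list
                  | otherwise -> go (c : free) prefs' held

    -- position p's rank of candidate c (lower is better)
    rank p c = fromMaybe maxBound (elemIndex c (M.findWithDefault [] p posPrefs))
```

The “deferred” part is visible in the code: a position never commits, it only *holds* its best proposal so far, releasing it if someone better comes along.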

You might imagine that someone very clever could find a way to get a better position by being untruthful about their preferences with this matching algorithm. For example, if someone wants a position but doesn’t think they’ll be accepted, would it be wise to instead express a more realistic preference? Surprisingly, Alvin Roth showed that this is impossible! The DA algorithm is strategy-proof, in the sense that the best way to get the position you most prefer is to provide your honest-to-goodness *true* list of preferences.

So now we have a dilemma. The stable matching definitely has more than two possible outcomes. It definitely doesn’t have a dictator. So it must, by Gibbard’s theorem, be strategic. But hold on… Roth showed that an honest list of preferences is always best, regardless of the preferences of others. How can both be true? The answer is subtle, and it will show that we need to be careful about exactly what words mean in each case.

Here’s the key:

- Gibbard’s theorem assumes that each participant has a set of preferences about the *entire* outcome of the process. In this case, the outcome of the process is the matching of *all* candidates to *all* positions.
- Roth, on the other hand, assumes that each candidate has a set of preferences about which position they *individually* will be assigned to. They are *not* entitled to a preference about which positions are given to which other candidates.

Roth’s preferences are limited enough that they only produce conflict indirectly, such as when there are more candidates who want a certain position than there are slots to accept them. And that turns out to yield just enough slack to escape the consequences of Gibbard’s theorem, and allow the decision process to be non-strategic in Roth’s sense.

But be careful! This means that if candidates *do* have preferences that depend on the placement of others, Roth’s result no longer applies. To take a simple example, suppose two close friends want to work together in the same position. Despite Roth’s dismissal of strategy, they will need to strategize to accomplish this. For example, they would be well-advised to *favor* applying for positions that are likely to be generally unpopular. They should also *avoid* applying to positions that wouldn’t also want their friend. Both of these are strategic choices that involve anticipating the preferences of others.

In the end, of course, Gibbard and Roth’s results can co-exist, but one must be careful in exactly what they mean.

- I want to be clear that this isn’t new. In fact, Roth cited Gibbard’s result in his paper on the matching problem, and is obviously aware of this nuance.
- For this analysis, I’m considering only candidates as the participants in the decision-making process, and the employer preferences as being baked into the decision algorithm. Roth’s result does *not* extend to employers expressing their true preferences! In fact, Roth shows that no process can exist that also incentivizes employers to indicate their true preferences among candidates.

In this article, we’ll develop a Haskell library for continued fractions. Continued fractions are a different representation for real numbers, besides the fractions and decimals we all learned about in grade school. In the process, we’ll build correct and performant software using ideas that are central to the Haskell programming language community: equational reasoning, property testing, and term rewriting.

*I posted **an early version** of this article to Reddit, and I’m grateful to those who responded with insight and questions that helped me complete this journey, especially **iggybibi** and **lpsmith**.*

Let’s start with a challenge. Assume I know how to write down an integer. Now, how can I write down a real number? In school, we’ve all learned a few answers to this question. The big two are *fractions* and *decimals*.

**Fractions** have simple forms for rational numbers. Just write two integers, for the numerator and denominator. That’s great! But alas, they are no good at all for irrational numbers like π, or the square root of 2. I could, of course, write a fraction that’s close to the irrational number, such as 22/7 as an approximation of π. But this is only good up to a certain limited precision. To get a closer approximation (say, 355/113), I have to throw it away and start over. Sure, I could write an infinite sequence of fractions which converge to the irrational number, but *every* finite prefix of such a sequence is redundant and wasted.

Irrationals are just the limiting case of rational numbers, and very precise rational numbers suffer from this, too. It’s not easy to glance at a fraction like 103993/33102 and get a sense of its value. (Answer: this is another approximation of π.) Or, can you quickly tell whether 5/29 is larger or smaller than 23/128?

Instead, we can turn to **decimals**. If you’re like me, you don’t think about the mechanics of decimal numbers very often, and conventional notation like 3.14159… really hides what’s going on. I’d like to offer a different perspective by writing it like this:

*x* = *d*₀ + (*d*₁ + (*d*₂ + (*d*₃ + ⋯)/10)/10)/10

Here, *d*₀ is the integer part, and *d*₁, *d*₂, *d*₃, *d*₄, etc. are called the decimal *digits*. Notice the symmetry here: after the integer part, a decimal has inside of it *another decimal* representing its tenths.

This recursive structure leads to the main property that we care about with decimals: we can *truncate* them. If I have a hundred-digit decimal number, and don’t need that much accuracy, I can simply chop off what I don’t need to get a simpler approximation. This is the same property that lets me write a decimal representation of an *irrational* number. As long as we accept that infinite sequences exist — and as Haskell programmers, infinite data structures don’t scare us…. much — there is a *unique* infinite sequence of digits that represents any irrational number. That the sequence must be infinite is to be expected, since there are uncountably many irrationals, and any finite representation has only countably many values. (Unfortunately, this uniqueness isn’t quite true of rational numbers, since 0.4999… and 0.5 are the same number, for example!)

As nice as that is, though, there are also some significant disadvantages to decimals as a representation:

- Although a few rational numbers have simple decimal representations, almost all rational numbers actually need infinite repeating decimals. Specifically, a rational number whose denominator has any prime factor other than 2 or 5 repeats indefinitely.
- Why 10? This is a question for historians, anthropologists, perhaps biologists, but has no answer within mathematics. It is a choice that matters, too! In a different base, a different subset of the rational numbers will have finite decimals. It’s the choice of base 10 that makes it harder to compute with 1/3. Having to make an arbitrary choice about something that matters so much is unfortunate.

We can make up some ground on these disadvantages with a third representation: **continued fractions**. A continued fraction is an expression of this form:

*x* = *n*₀ + 1/(*n*₁ + 1/(*n*₂ + 1/(*n*₃ + ⋯)))

Just like with decimals, this gives a representation of a real number as a (possibly finite, or possibly infinite) sequence [*n*₀, *n*₁, *n*₂, *n*₃, *n*₄, …]. Now we call the integer parts *terms,* instead of *digits*. The first term, *n*₀, is the integer part. The difference is that when we have a remainder to write, we take its reciprocal and continue the sequence with that.

Note:

- Continued fractions represent *all* rational numbers as finite sequences of terms, while still accounting for *all* irrationals using infinite sequences.
- Continued fractions do *not* depend on an arbitrary choice of base. The reciprocal is the same regardless of how we choose to write numbers.
- Continued fractions, like decimals, can be truncated to produce a rational approximation. Unlike decimals, though, the approximations are *optimal* in the sense that they are as close as possible to the original number without needing a larger denominator.

With this motivation in mind, let’s see what we can do about using continued fractions to represent real numbers.

Time to write some code. As a declarative language that promotes equational reasoning, Haskell is a joy to use for problems like this one.

If you want to see it all together, the code and tests for this part can be found at https://code.world/haskell#PVxpksYe_YAcGw3CNdvEFzw.

To start, we want a new type for continued fractions. A continued fraction is a (finite or infinite) sequence of terms, so we could simply represent it with a Haskell list. However, I find it more readable to define a new type so that we can choose constructors with suggestive names. (It helps with this decision that standard list combinators like map or ++ don’t have obvious interpretations for continued fractions.) I’ve defined the type like this.

data CFrac where
  (:+/) :: Integer -> CFrac -> CFrac
  Inf :: CFrac

deriving instance Eq CFrac
deriving instance Show CFrac

infixr 5 :+/

Here, x :+/ y means *x* + 1/*y*, where *x* is an integer, and *y* is another continued fraction (the reciprocal of the remainder). This is the basic building block of continued fractions, as we saw above.

The other constructor may be surprising, though. I’ve defined a special continued fraction representing infinity! Why? We’ll use it for terminating fractions. When there is no remainder, we have *x* = *x* + 0 = *x* + 1/∞, so Inf is the reciprocal of the remainder for a continued fraction that represents an integer. That we can represent ∞ itself as a top-level continued fraction wasn’t the goal, but it doesn’t seem worth dodging. Noticing that an empty list of terms acts like infinity is actually a great intuition for things to come.

There is a simple one-to-one correspondence between continued fractions and their lists of terms, as follows:

- The :+/ operator corresponds to the list cons operator, :.
- The Inf value corresponds to the empty list, [].

We can define conversions to witness this correspondence between CFrac and term lists.

terms :: CFrac -> [Integer]
terms Inf = []
terms (n :+/ x) = n : terms x

fromTerms :: [Integer] -> CFrac
fromTerms = foldr (:+/) Inf

Rational numbers can be converted into continued fractions by following the process described above, and the code is very short.

cfFromFrac :: Integer -> Integer -> CFrac
cfFromFrac _ 0 = Inf
cfFromFrac n d = n `div` d :+/ cfFromFrac d (n `mod` d)

cfFromRational :: Rational -> CFrac
cfFromRational r = cfFromFrac (numerator r) (denominator r)
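The recursion in cfFromFrac is exactly the Euclidean algorithm: the terms of *n*/*d* are the successive quotients it produces. Here is that observation as a standalone function on plain lists, so it runs without the CFrac type (cfTerms is my name for this sketch, not part of the library):

```haskell
-- The continued fraction terms of n/d are the quotients produced
-- by the Euclidean algorithm on n and d.
cfTerms :: Integer -> Integer -> [Integer]
cfTerms _ 0 = []
cfTerms n d = n `div` d : cfTerms d (n `mod` d)

-- cfTerms 3 7          → [0, 2, 3]
-- cfTerms 103993 33102 → [3, 7, 15, 1, 292], the start of pi's expansion
```

This also explains why every rational number has a *finite* term sequence: the Euclidean algorithm always terminates.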

The real fun of continued fractions, though, is that we can represent irrational numbers precisely, as well!

Rational numbers are precisely the solutions to *linear* equations with integer coefficients. The simplest *irrational* numbers to write as continued fractions are the *quadratic* irrationals: numbers that are not rational, but are solutions to quadratic equations, and these are precisely the continued fractions whose terms eventually repeat periodically. We can write a function to build the repeating part:

-- fix comes from Data.Function
cycleTerms :: [Integer] -> CFrac
cycleTerms ns = fix (go ns)
  where
    go [] x = x
    go (t : ts) x = t :+/ go ts x

And then we can write a quick catalog of easy continued fractions, including some small square roots and the golden ratio.

sqrt2 :: CFrac
sqrt2 = 1 :+/ cycleTerms [2]

sqrt3 :: CFrac
sqrt3 = 1 :+/ cycleTerms [1, 2]

sqrt5 :: CFrac
sqrt5 = 2 :+/ cycleTerms [4]

phi :: CFrac
phi = cycleTerms [1]

It’s really worth pausing to think how remarkable it is that these fundamental quantities, whose decimal representations have no obvious patterns at all, are so simple as continued fractions!
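As a quick numeric sanity check, evaluating a finite prefix of those terms really does home in on the familiar values. This helper (evalTerms, a standalone sketch, not part of the library code) works on plain term lists so it runs on its own:

```haskell
-- Numerically evaluate a finite prefix of continued fraction terms,
-- folding n + 1/rest from the right.
evalTerms :: [Integer] -> Double
evalTerms [] = error "need at least one term"
evalTerms [n] = fromIntegral n
evalTerms (n : ns) = fromIntegral n + 1 / evalTerms ns

-- evalTerms (1 : replicate 25 2) is within 1e-9 of sqrt 2
-- evalTerms (replicate 40 1)     is within 1e-9 of the golden ratio
```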

And it doesn’t end there. Euler’s number *e*, even though it’s a *transcendental* number, also has a simple pattern as a continued fraction.

exp1 :: CFrac
exp1 = 2 :+/ fromTerms (concatMap (\n -> [1, 2 * n, 1]) [1 ..])

This really strengthens the claim I made earlier, that being based on the reciprocal instead of multiplying by an arbitrary base makes continued fractions somehow less arbitrary than decimals.

I wish I could tell you that π has a similar nice pattern, but alas, it does not. π has a continued fraction just as random as its representation in other notations, and just as hard to compute to arbitrary precision.

It will be interesting later, though, to look at its first few terms. For that, it’s good enough to use the Double approximation to π from Haskell’s base library. It’s not really π; in fact, it’s a rational number! But it‘s close enough for our purposes. Computing an exact value of π needs some more machinery, which we will develop in the next section.

approxPi :: CFrac
approxPi = cfFromRational (realToFrac (pi :: Double))

We’ll now return to a more theoretical concern. We want each real number to have a unique continued fraction representation. In fact, I snuck something by you earlier when I derived an Eq instance for CFrac, because that instance is only valid when the type has unique representations. However, this isn’t quite true in general. Here are some examples of continued fractions that are written differently, but have the same value:

- (1) 2 :+/ (-2) :+/ Inf = 1 :+/ 2 :+/ Inf (both equal 3/2)
- (2) 1 :+/ 0 :+/ 2 :+/ Inf = 3 :+/ Inf (both equal 3)
- (3) 1 :+/ 1 :+/ Inf = 2 :+/ Inf (both equal 2)

Case (1) deals with negative numbers. These will actually cause quite a few problems — not just now, but in convergence properties of algorithms later, too. I have been quietly assuming up to this point that all the numbers we’re dealing with are non-negative. Let’s make that assumption explicit, and require that all continued fraction terms are positive. That takes care of case (1).

This might seem like a terrible restriction. In the end, though, we can recover negative numbers in the same way humans have done so for centuries: by keeping track of a sign, separate from the absolute value. A signed continued fraction type, then, would wrap CFrac and include a Bool field for whether it’s negative. This is entirely straightforward, and I leave it as an exercise for the interested reader.

Case (2) involves a term with a value of zero. It’s normal for the integer part of a continued fraction to be zero. After that, though, the remaining terms should never be zero, because they are the reciprocals of numbers less than one. So we will impose a second rule that only the first term of a continued fraction may be zero.

Case (3) involves a rational continued fraction with 1 as its final term. It turns out that even after solving cases (1) and (2), every rational number has two distinct continued fractions: one whose last term is a 1 (other than the integer part), and one that ends differently. This is analogous to the fact that terminating decimals have two decimal representations, one ending in an infinite sequence of 0s, and the other in an infinite sequence of 9s. In this case, the form with the trailing 1 is longer than it needs to be, so we’ll disallow it, making a third rule that a term sequence can never end in 1, unless that 1 is the integer part.

Subject to these three rules, we get the canonical forms we wanted. It’s unfortunate that non-canonical forms are representable in the CFrac type (i.e., we have failed to make illegal data unrepresentable), but we can do the next best thing: check that our computations all produce canonical values. To do so, let’s write some code to check that a CFrac obeys these rules.

isCanonical :: CFrac -> Bool
isCanonical Inf = True
isCanonical (n :+/ cont) = n >= 0 && isCanonicalCont cont

isCanonicalCont :: CFrac -> Bool
isCanonicalCont Inf = True
isCanonicalCont (1 :+/ Inf) = False
isCanonicalCont (n :+/ cont) = n > 0 && isCanonicalCont cont

One very powerful idea that originated in the Haskell community is property testing. We state a property that we want to verify about our code, and allow a testing framework like QuickCheck to make up examples and test them. Now is a great time to try this out. Here I’ve defined a few properties around canonical forms. The first is a trivial property that just asserts that we make the right decision about the specific examples above. The second, less trivial, guarantees that our cfFromRational function, the only real semantic function we’ve written, produces canonical forms.

prop_isCanonical_examples :: Bool
prop_isCanonical_examples =
  not (isCanonical (2 :+/ (-2) :+/ Inf))
    && isCanonical (1 :+/ 2 :+/ Inf)
    && not (isCanonical (1 :+/ 0 :+/ 2 :+/ Inf))
    && isCanonical (3 :+/ Inf)
    && not (isCanonical (1 :+/ 1 :+/ Inf))
    && isCanonical (2 :+/ Inf)

prop_cfFromRational_isCanonical :: NonNegative Rational -> Bool
prop_cfFromRational_isCanonical (NonNegative x) =
  isCanonical (cfFromRational x)

(We could try to check that the irrational values are canonical, as well, but alas, we run into non-termination since they are infinite. This will be a theme: we’ll have to be satisfied with checking rational test cases, and relying on the fact that any bugs with irrational values will manifest in some nearby rational value.)

We can now run our code for this section. Try it yourself at https://code.world/haskell#PVxpksYe_YAcGw3CNdvEFzw. In addition to running the tests, we’ll print out our example continued fractions so we can see them.

Testing prop_isCanonical_examples
+++ OK, passed 1 test.

Testing prop_cfFromRational_isCanonical
+++ OK, passed 100 tests.

3/7      = 0 :+/ (2 :+/ (3 :+/ Inf))
sqrt2    = 1 :+/ (2 :+/ (2 :+/ (2 :+/ (2 :+/ (2 :+/ (2 :+/ (2...
sqrt3    = 1 :+/ (1 :+/ (2 :+/ (1 :+/ (2 :+/ (1 :+/ (2 :+/ (1...
sqrt5    = 2 :+/ (4 :+/ (4 :+/ (4 :+/ (4 :+/ (4 :+/ (4 :+/ (4...
phi      = 1 :+/ (1 :+/ (1 :+/ (1 :+/ (1 :+/ (1 :+/ (1 :+/ (1...
e        = 2 :+/ (1 :+/ (2 :+/ (1 :+/ (1 :+/ (4 :+/ (1 :+/ (1...
approxPi = 3 :+/ (7 :+/ (15 :+/ (1 :+/ (292 :+/ (1 :+/ (1 :+/ (1 :+/
  (2 :+/ (1 :+/ (3 :+/ (1 :+/ (14 :+/ (3 :+/ (3 :+/ (2 :+/ (1 :+/
  (3 :+/ (3 :+/ (7 :+/ (2 :+/ (1 :+/ (1 :+/ (3 :+/ (2 :+/ (42 :+/
  (2 :+/ Inf))))))))))))))))))))))))))

In the previous section, we converted from rational numbers into continued fractions. We now turn to the inverse conversion: from a continued fraction to a rational number.

The code and tests for this part can be found at https://code.world/haskell#PemataxO6pmYz99dOpaeT9g.

Not all continued fractions correspond to rational numbers. However, one of the key benefits of the continued fraction representation is that its truncated term sequences represent particularly efficient rational approximations. These rational approximations are called the *convergents*. It’s not hard to write a simple recursive function to compute the convergents.

naiveConvergents :: CFrac -> [Rational]
naiveConvergents Inf = []
naiveConvergents (n :+/ r) =
  fromInteger n :
    map (\x -> fromInteger n + 1 / x) (naiveConvergents r)

We can also use QuickCheck to write a property test, verifying that a rational number is its own final convergent. This gives a sanity check on our implementation.

prop_naiveConvergents_end_at_Rational ::
  NonNegative Rational -> Property
prop_naiveConvergents_end_at_Rational (NonNegative r) =
  last (naiveConvergents (cfFromRational r)) === r

You may guess from my choice of name that I am not happy with this implementation. The problem with naiveConvergents is that it recursively maps a lambda over the entire tail of the list. As it recurses deeper into the list, we build up a bunch of nested mapped lambdas, and end up evaluating O(*n*²) of these lambdas to produce only *n* convergents.

Let’s fix that quadratic slowdown. A more efficient implementation can be obtained by defunctionalizing. The lambdas have the form \x -> fromInteger n + 1 / x. To avoid quadratic work, we need a representation that lets us compose functions of this form into something that can be applied in constant time. What works is *mobius transformations*.

A **mobius transformation** is a function of the form:

*f*(*x*) = (*ax* + *b*)/(*cx* + *d*)

For our purposes, the coefficients *a*, *b*, *c*, and *d* are always integers. Instead of an opaque lambda, we represent a mobius transformation by its four coefficients.

data Mobius where
  Mobius :: Integer -> Integer -> Integer -> Integer -> Mobius

deriving instance Eq Mobius
deriving instance Show Mobius

We seek to rewrite our computation of convergents using mobius transformations. First of all, we will restructure it as a left fold, which exposes the mobius transformation itself as an accumulator. We’ll want these building blocks:

- An *identity* mobius transformation, to initialize the accumulator.
- A *composition* to combine two mobius transformations into a single one.

In other words, mobius transformations should form a monoid!

instance Semigroup Mobius where
  Mobius a1 b1 c1 d1 <> Mobius a2 b2 c2 d2 =
    Mobius
      (a1 * a2 + b1 * c2)
      (a1 * b2 + b1 * d2)
      (c1 * a2 + d1 * c2)
      (c1 * b2 + d1 * d2)

instance Monoid Mobius where
  mempty = Mobius 1 0 0 1
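The Semigroup product is just 2×2 matrix multiplication of the coefficients, and it agrees with ordinary function composition. Here is a standalone check of that fact, redefining a local Mobius type and compose/apply helpers (my names, for this sketch only) so it runs on its own:

```haskell
import Data.Ratio ((%))

data Mobius = Mobius Integer Integer Integer Integer deriving (Eq, Show)

-- The same matrix product as the Semigroup instance above.
compose :: Mobius -> Mobius -> Mobius
compose (Mobius a1 b1 c1 d1) (Mobius a2 b2 c2 d2) =
  Mobius
    (a1 * a2 + b1 * c2)
    (a1 * b2 + b1 * d2)
    (c1 * a2 + d1 * c2)
    (c1 * b2 + d1 * d2)

-- Apply a mobius transformation as an actual function on rationals.
apply :: Mobius -> Rational -> Rational
apply (Mobius a b c d) x =
  (fromInteger a * x + fromInteger b) / (fromInteger c * x + fromInteger d)

-- apply (compose m1 m2) x == apply m1 (apply m2 x), for any x where
-- both sides are defined.
```

This is exactly why defunctionalization works here: composing the coefficient matrices in constant time replaces building up nested lambdas.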

There are axioms for monoids: mempty should act like an identity, and <> should be associative. We can test these with property tests.

This is as good a time as any to set up a QuickCheck generator for mobius transformations. To keep things sane, we always want to choose mobius transformations that have non-zero denominators, since otherwise the function is undefined on all inputs. We will also want non-negative coefficients in the denominator, since this ensures the transformation is defined (and monotonic, which will matter later on) for all non-negative input values.

instance Arbitrary Mobius where
  arbitrary =
    suchThat
      ( Mobius
          <$> arbitrary
          <*> arbitrary
          <*> (getNonNegative <$> arbitrary)
          <*> (getNonNegative <$> arbitrary)
      )
      (\(Mobius _ _ c d) -> max c d > 0)

  shrink (Mobius a b c d) =
    [ Mobius a' b' c' d'
      | (a', b', NonNegative c', NonNegative d') <-
          shrink (a, b, NonNegative c, NonNegative d),
        max c' d' > 0
    ]

With the generator in place, the tests are trivial to write:

prop_Mobius_id :: Mobius -> Property
prop_Mobius_id m = mempty <> m === m .&&. m <> mempty === m

prop_Mobius_assoc :: Mobius -> Mobius -> Mobius -> Property
prop_Mobius_assoc m1 m2 m3 = (m1 <> m2) <> m3 === m1 <> (m2 <> m3)

Now we return to computing convergents. Here is the improved implementation.

convergents :: CFrac -> [Rational]
convergents = go mempty
  where
    go m Inf = []
    go m (n :+/ x) = mobiusLimit m' : go m' x
      where
        m' = m <> Mobius n 1 1 0
        mobiusLimit (Mobius a _ c _) = a % c

In the expression go m x, m is a mobius transformation that turns a convergent of x into a convergent for the entire continued fraction. When x is the entire continued fraction, m is the identity function. As we recurse into the continued fraction, we compose transformations onto m so that this remains true. The lambda \x -> fromInteger n + 1 / x from the naive implementation is now defunctionalized into Mobius n 1 1 0.

The remaining task is to compute a truncated value of the continued fraction at each step. Recall that a terminating continued fraction is one that ends with an infinite term. To determine this truncated value, then, we want to consider the limit of the accumulated mobius transformation as its input tends to infinity. This is:

(*ax* + *b*)/(*cx* + *d*) → *a*/*c* as *x* → ∞

and that completes the implementation, in mobiusLimit above. A quick test helps us to be confident in the result.

prop_convergents_matches_naive :: NonNegative Rational -> Property
prop_convergents_matches_naive (NonNegative r) =
  convergents (cfFromRational r)
    === naiveConvergents (cfFromRational r)

Let’s also consider the conversion from continued fractions to decimals. This time, both representations can be approximated by just truncating, so instead of producing a sequence of approximations as we did with rational numbers, we will produce a single decimal that lazily extends its representation with more precision.

Mobius transformations are still a good tool for this job, but we need to refine our earlier observations. Consider the general form of a mobius transformation again:

*f*(*x*) = (*ax* + *b*)/(*cx* + *d*)

It’s worth taking the time now to play around with different choices for *a*, *b*, *c*, and *d*, and what they do to the function. You can do so in a graphing tool such as Desmos.

We previously noted one of these bounds: as *x* → ∞, *f*(*x*) tends to *a*/*c*. Now we add the other: at *x* = 0, we have *f*(0) = *b*/*d*. As long as *c* and *d* are positive (remember when we agreed to keep them so?), *f* is monotonic on the entire interval [0, ∞), so *f* is bounded by these two fractions. In particular, if *a*/*c* and *b*/*d* have the same integer part, then we know this is the integer part of the *result* of *f*, regardless of the (always non-negative) value of *x*. We can express this insight as a function:

mobiusIntPart :: Mobius -> Maybe Integer
mobiusIntPart (Mobius a b c d)
  | c /= 0 && d /= 0 && n1 == n2 = Just n1
  | otherwise = Nothing
  where
    n1 = a `div` c
    n2 = b `div` d
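A couple of worked examples may help; this standalone sketch restates the type and function so it runs on its own:

```haskell
data Mobius = Mobius Integer Integer Integer Integer deriving (Eq, Show)

-- Just n when both bounds a/c and b/d share the integer part n.
mobiusIntPart :: Mobius -> Maybe Integer
mobiusIntPart (Mobius a b c d)
  | c /= 0 && d /= 0 && n1 == n2 = Just n1
  | otherwise = Nothing
  where
    n1 = a `div` c
    n2 = b `div` d

-- mobiusIntPart (Mobius 7 2 3 1) == Just 2  -- 7/3 and 2/1 both start with 2
-- mobiusIntPart (Mobius 7 2 1 1) == Nothing -- 7/1 = 7 disagrees with 2/1 = 2
```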

We can now proceed in a manner similar to the computation of convergents. We maintain a mobius transformation which maps the *remaining* continued fraction to the *remaining* decimal. Initially, that is the identity. But this time, instead of blindly emitting an approximation at each step, we make a choice:

- If the mobius transformation is bounded to guarantee the integer part of its result, then we can produce that decimal digit. We then update the transformation *m* to yield the remaining decimal places after that. This will tend to widen its bounds.
- Otherwise, we will pop off the integer part of the continued fraction, and update the transformation to expect the remaining continued fraction terms after that. This will tend to narrow its bounds, so we’ll be closer to producing a new decimal digit.
- If we ever end up with the zero transformation (that is, *a* and *b* are both zero), then all remaining decimal places will be zero, so we can stop. If we encounter an infinite term of input, though, we still need to continue, but we can update *b* and *d* to match *a* and *c*, narrowing those bounds to a single point to indicate that we now know the exact value of the input.

Here is the implementation:

cfToBase :: Integer -> CFrac -> [Integer]
cfToBase base = go mempty
  where
    go (Mobius 0 0 _ _) _ = []
    go m x
      | Just n <- mobiusIntPart m,
        let m' = Mobius base (-base * n) 0 1 <> m =
          n : go m' x
    go m (n :+/ x) = go (m <> Mobius n 1 1 0) x
    go (Mobius a _ c _) Inf = go (Mobius a a c c) Inf

cfToDecimal :: CFrac -> String
cfToDecimal Inf = "Inf"
cfToDecimal x = case cfToBase 10 x of
  [] -> "0.0"
  [z] -> show z ++ ".0"
  (z : digits) -> show z ++ "." ++ concatMap show digits

To test this, we will compare the result to standard Haskell output. However, Haskell’s typical Show instances for fractional types are too complex, producing output like 1.2e6 that we don’t want to match. The Numeric module has the simpler functionality we want. There are some discrepancies due to rounding error, as well, so the test will ignore differences in the last digit of output from the built-in types.

prop_cfToDecimal_matchesRational :: NonNegative Rational -> Property
prop_cfToDecimal_matchesRational (NonNegative r) =
  take n decFromRat === take n decFromCF
  where
    decFromRat = showFFloat Nothing (realToFrac r :: Double) ""
    decFromCF = cfToDecimal (cfFromRational r)
    n = max 10 (length decFromRat - 1)

Earlier, we settled for only an approximation of π, and I mentioned that π doesn’t have a nice pattern as a continued fraction. That’s true, but it’s not the whole story. If we relax the definition of a continued fraction just a bit, there *are* several known expressions for π as a *generalized* continued fraction. The key is to allow numerators other than 1.

Here’s one that’s fairly nice:

π = 4/(1 + 1²/(3 + 2²/(5 + 3²/(7 + ⋯))))

Generalized continued fractions don’t have many of the same nice properties that standard continued fractions do. They are not unique, and sometimes converge very slowly (or not at all!), yielding poor rational approximations when truncated. But we can compute with them using mobius transformations, using the same algorithms from above. In particular, we can use the same tools as above to convert from a generalized continued fraction to a standard one.
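To convince ourselves that the expansion above really does tend to π, we can truncate it and evaluate in floating point. This is a standalone sketch (piApprox is my name for the helper, separate from the library code):

```haskell
-- Evaluate the generalized continued fraction
--   pi = 4 / (1 + 1^2 / (3 + 2^2 / (5 + 3^2 / (7 + ...))))
-- truncated after `depth` levels.
piApprox :: Int -> Double
piApprox depth = 4 / go 1
  where
    go i
      | i >= depth = fromIntegral (2 * i - 1)
      | otherwise =
          fromIntegral (2 * i - 1) + fromIntegral (i * i) / go (i + 1)
```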

First, we’ll define a type for generalized continued fractions.

data GCFrac where
  (:+#/) :: (Integer, Integer) -> GCFrac -> GCFrac
  GInf :: GCFrac

deriving instance Show GCFrac

This time I haven’t defined an Eq instance, because representations are not unique. Converting from a standard to a generalized continued fraction is trivial.

cfToGCFrac :: CFrac -> GCFrac
cfToGCFrac Inf = GInf
cfToGCFrac (n :+/ x) = (n, 1) :+#/ cfToGCFrac x

The conversion in the other direction, though, requires a mobius-like algorithm.

gcfToCFrac :: GCFrac -> CFrac
gcfToCFrac = go mempty
  where
    go (Mobius a _ c _) GInf = cfFromFrac a c
    go m gcf@((int, numer) :+#/ denom)
      | Just n <- mobiusIntPart m =
          n :+/ go (Mobius 0 1 1 (-n) <> m) gcf
      | otherwise = go (m <> Mobius int numer 1 0) denom

We can write a property test to verify that, at least, this gives a correct round trip from continued fractions to generalized, and back to standard again.

prop_GCFrac_roundtrip :: NonNegative Rational -> Property
prop_GCFrac_roundtrip (NonNegative r) =
  gcfToCFrac (cfToGCFrac x) === x
  where
    x = cfFromRational r

Here’s the definition of π above, expressed in code, and converted to a continued fraction. I’ve written a test to verify that it mostly agrees with the approximate value we defined earlier, just as a sanity check on the implementation.

gcfPi :: GCFrac
gcfPi = (0, 4) :+#/ go 1
  where
    go i = (2 * i - 1, i ^ 2) :+#/ go (i + 1)

exactPi :: CFrac
exactPi = gcfToCFrac gcfPi

prop_correct_pi :: Property
prop_correct_pi =
  take 17 (cfToDecimal approxPi)
    === take 17 (cfToDecimal exactPi)

We now have built enough to get some interesting results. For example, we have exact representations of several irrational numbers, and we can use those to obtain both rational approximations and long decimal representations of each.

In addition to running our tests, we will print the continued fraction terms, convergents, and decimal representation of each of our test values. Check out the full code at https://code.world/haskell#PemataxO6pmYz99dOpaeT9g.

Testing prop_naiveConvergents_end_at_Rational
+++ OK, passed 100 tests.

Testing prop_Mobius_id
+++ OK, passed 100 tests.

Testing prop_Mobius_assoc
+++ OK, passed 100 tests.

Testing prop_convergents_matches_naive
+++ OK, passed 100 tests.

Testing prop_cfToDecimal_matchesRational
+++ OK, passed 100 tests.

Testing prop_GCFrac_roundtrip
+++ OK, passed 100 tests.

Testing prop_correct_pi
+++ OK, passed 1 test.

3/7:

- terms: 0 :+/ (2 :+/ (3 :+/ Inf))

- frac : [0 % 1,1 % 2,3 % 7]

- dec : 0.428571428571428571428571428571428571428571428571

sqrt2:

- terms: 1 :+/ (2 :+/ (2 :+/ (2 :+/ (2 :+/ (2 :+/ (2 :+/ (2

- frac : [1 % 1,3 % 2,7 % 5,17 % 12,41 % 29,99 % 70,239 % 1

- dec : 1.414213562373095048801688724209698078569671875376

sqrt3:

- terms: 1 :+/ (1 :+/ (2 :+/ (1 :+/ (2 :+/ (1 :+/ (2 :+/ (1

- frac : [1 % 1,2 % 1,5 % 3,7 % 4,19 % 11,26 % 15,71 % 41,9

- dec : 1.732050807568877293527446341505872366942805253810

sqrt5:

- terms: 2 :+/ (4 :+/ (4 :+/ (4 :+/ (4 :+/ (4 :+/ (4 :+/ (4

- frac : [2 % 1,9 % 4,38 % 17,161 % 72,682 % 305,2889 % 129

- dec : 2.236067977499789696409173668731276235440618359611

phi:

- terms: 1 :+/ (1 :+/ (1 :+/ (1 :+/ (1 :+/ (1 :+/ (1 :+/ (1

- frac : [1 % 1,2 % 1,3 % 2,5 % 3,8 % 5,13 % 8,21 % 13,34 %

- dec : 1.618033988749894848204586834365638117720309179805

e:

- terms: 2 :+/ (1 :+/ (2 :+/ (1 :+/ (1 :+/ (4 :+/ (1 :+/ (1

- frac : [2 % 1,3 % 1,8 % 3,11 % 4,19 % 7,87 % 32,106 % 39,

- dec : 2.718281828459045235360287471352662497757247093699

approxPi:

- terms: 3 :+/ (7 :+/ (15 :+/ (1 :+/ (292 :+/ (1 :+/ (1 :+/

- frac : [3 % 1,22 % 7,333 % 106,355 % 113,103993 % 33102,1

- dec : 3.141592653589793115997963468544185161590576171875

exactPi:

- terms: 3 :+/ (7 :+/ (15 :+/ (1 :+/ (292 :+/ (1 :+/ (1 :+/

- frac : [3 % 1,22 % 7,333 % 106,355 % 113,103993 % 33102,1

- dec : 3.141592653589793238462643383279502884197169399375

Most of these are exact irrational values, so we can only look at a finite prefix of the convergents, which go on forever getting more and more accurate (at the expense of larger denominators). I’ve chopped them off at 50 characters, so that they comfortably fit on the screen, but you can change that to see as much of the value as you like.

With this data in front of us, we can learn more about continued fractions and how they behave.

Did you notice that the convergents of π jump from 355/113 all the way to 103993/33102? That’s because 355/113 is a *remarkably* good approximation of π. You have to jump to a much larger denominator to get a better approximation. You can observe this same fact directly from the continued fraction terms. The fifth term is 292, which is unusually large. 292 is the next term after the truncation that yields the convergent 355/113. Large terms like that in a continued fraction indicate unusually good rational approximations.
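To see this concretely, here's a small standalone sketch, independent of the CFrac machinery in this post (the helper names evalCF and piTerms are my own): it evaluates successive prefixes of π's terms as plain Rationals, reproducing the convergents, including the jump from 355/113 to 103993/33102.

```haskell
import Data.Ratio

-- Evaluate a finite prefix of continued fraction terms as a Rational.
evalCF :: [Integer] -> Rational
evalCF [t] = fromInteger t
evalCF (t : ts) = fromInteger t + recip (evalCF ts)
evalCF [] = error "evalCF: empty prefix"

-- The first few continued fraction terms of pi.
piTerms :: [Integer]
piTerms = [3, 7, 15, 1, 292]

main :: IO ()
main = mapM_ (print . evalCF . (`take` piTerms)) [1 .. 5]
```

Running this prints 3 % 1, 22 % 7, 333 % 106, 355 % 113, and then 103993 % 33102, matching the frac lines in the output above.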

Conversely, what if there are no large continued fraction terms? More concretely, what can we say about the number that has all 1s as its continued fraction? That’s the golden ratio! What this means is that there are *no* particularly good rational approximations to the golden ratio. And you can see that, and hear it!

- Flowers tend to grow new petals or seeds at an angle determined by the golden ratio, specifically because there are no small intervals (i.e., small denominators) after which they will overlap.
- Musical notes sound harmonic when their frequencies are close to simple rational numbers. The least harmonic frequencies are those related by the golden ratio, and you can hear it!

(By the way, did you notice the Fibonacci numbers in the convergents of the golden ratio? That’s a whole other topic I won’t get into here.)
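If you want to check that aside for yourself, here's a tiny standalone sketch (again independent of CFrac): the convergents of the all-ones continued fraction are exactly ratios of consecutive Fibonacci numbers.

```haskell
import Data.Ratio

-- Fibonacci numbers starting at 1, 2.
fibs :: [Integer]
fibs = 1 : 2 : zipWith (+) fibs (tail fibs)

-- Ratios of consecutive Fibonacci numbers: the convergents of phi.
main :: IO ()
main = print (take 6 (zipWith (%) (tail fibs) fibs))
```

This prints [2 % 1,3 % 2,5 % 3,8 % 5,13 % 8,21 % 13], the same sequence shown for phi above (after the initial 1 % 1).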

Now that we’ve got a nice type and some conversions, let’s turn our attention to computing with continued fractions.

The code and tests for this section can be found at https://code.world/haskell#P_T4CkfWGu3bgSwFV0R04Sg.

There are a few computations that are easy, so let’s warm up with those.

Computing the reciprocal of a continued fraction turns out to be almost trivial, because continued fractions are built out of reciprocals! All we need to do for this one is shift all the terms by one… in either direction, really. Since our normal form only allows zero as the first term, we’ll make the decision based on whether the whole part is already zero. We also need a special case for 1 to prevent building a non-normal representation.

```haskell
cfRecip :: CFrac -> CFrac
cfRecip (0 :+/ x) = x
cfRecip (1 :+/ Inf) = 1 :+/ Inf
cfRecip x = 0 :+/ x
```

We will test a few expected properties:

- The reciprocal is self-inverse.
- The result matches taking a reciprocal in the rationals.
- The reciprocal always gives an answer in normal form.

```haskell
prop_recipRecip_is_id :: NonNegative Rational -> Property
prop_recipRecip_is_id (NonNegative r) =
  r /= 0 ==> cfRecip (cfRecip (cfFromRational r))
    === cfFromRational r

prop_cfRecip_matches_Rational :: NonNegative Rational -> Property
prop_cfRecip_matches_Rational (NonNegative r) =
  r /= 0 ==> cfFromRational (recip r) === cfRecip (cfFromRational r)

prop_cfRecip_isCanonical :: NonNegative Rational -> Property
prop_cfRecip_isCanonical (NonNegative r) =
  r /= 0 ==> isCanonical (cfRecip (cfFromRational r))
```

Comparing two continued fractions is also not too difficult. It’s similar to comparing decimals, in that you look down the sequence for the first position where they differ, and that determines the result. However, since we take reciprocals at each term of a continued fraction, we’ll need to flip the result at each term.

```haskell
cfCompare :: CFrac -> CFrac -> Ordering
cfCompare Inf Inf = EQ
cfCompare _ Inf = LT
cfCompare Inf _ = GT
cfCompare (a :+/ a') (b :+/ b') = case compare a b of
  EQ -> cfCompare b' a'
  r -> r
```

Let’s write a QuickCheck property to verify that the result matches the Ord instance for Rational, and then define an Ord instance of our own.

```haskell
prop_cfCompare_matches_Rational ::
  NonNegative Rational -> NonNegative Rational -> Property
prop_cfCompare_matches_Rational (NonNegative x) (NonNegative y) =
  compare x y === cfCompare (cfFromRational x) (cfFromRational y)

instance Ord CFrac where
  compare = cfCompare
```

To make any progress on the rest of basic arithmetic, we must return to our trusty old hammer: the mobius transformation. Recall that we’ve used the mobius transformation in two ways so far:

- To produce convergents, we produced mobius transformations from terms, composed them together, and also produced a new approximation at each step.
- To produce decimals, we looked at the bounds on the result of a mobius transformation, and proceeded in one of two ways: produce a bit of output, and then add a new mobius transformation to the output side; or consume a term of input, and then add a new mobius transformation to the input side.

We will now implement mobius transformations that act on continued fractions and produce new continued fractions. The strategy is the same as it was when producing decimals, but modified to produce continued fraction terms as output instead of decimal digits.

That leads to this implementation.

```haskell
cfMobius :: Mobius -> CFrac -> CFrac
cfMobius (Mobius a _ c _) Inf = cfFromFrac a c
cfMobius (Mobius _ _ 0 0) _ = Inf
cfMobius m x
  | Just n <- mobiusIntPart m =
      n :+/ cfMobius (Mobius 0 1 1 (- n) <> m) x
cfMobius m (n :+/ x) = cfMobius (m <> Mobius n 1 1 0) x
```

There are two properties to test here. The first is that computations on continued fractions match the same computations on rational numbers. To implement this test, we’ll need an implementation of mobius transformations on rational numbers. Then we’ll test that cfMobius gives results in their canonical form. For both tests, we don’t care about transformations whose rational results are negative, as we will just agree not to evaluate them.

```haskell
mobius :: (Eq a, Fractional a) => Mobius -> a -> Maybe a
mobius (Mobius a b c d) x
  | q == 0 = Nothing
  | otherwise = Just (p / q)
  where
    p = fromInteger a * x + fromInteger b
    q = fromInteger c * x + fromInteger d
```

```haskell
prop_cfMobius_matches_Rational ::
  Mobius -> NonNegative Rational -> Property
prop_cfMobius_matches_Rational m (NonNegative r) =
  case mobius m r of
    Just x
      | x >= 0 ->
          cfMobius m (cfFromRational r) === cfFromRational x
    _ -> discard

prop_cfMobius_isCanonical ::
  Mobius -> NonNegative Rational -> Property
prop_cfMobius_isCanonical m (NonNegative r) =
  case mobius m r of
    Just rat
      | rat >= 0 ->
          let x = cfMobius m (cfFromRational r)
           in counterexample (show x) (isCanonical x)
    _ -> discard
```

The punchline here is that, using an appropriate mobius transformation, we can perform any of the four basic arithmetic operations — addition, subtraction, multiplication, or division — involving one continued fraction and one rational number.
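As a quick sanity check of that claim, here's a standalone sketch (evaluated on Rational rather than CFrac; the helper mob and the coefficient tuples are my own illustration): the coefficients for adding or multiplying by 7/10 fit the mobius form (a·x + b)/(c·x + d).

```haskell
import Data.Ratio

-- Evaluate a mobius transformation (a*x + b) / (c*x + d) on a Rational.
mob :: (Integer, Integer, Integer, Integer) -> Rational -> Rational
mob (a, b, c, d) x =
  (fromInteger a * x + fromInteger b) / (fromInteger c * x + fromInteger d)

main :: IO ()
main = do
  -- x + 7/10 is (10*x + 7) / (0*x + 10)
  print (mob (10, 7, 0, 10) (1 % 2) == 1 % 2 + 7 % 10)
  -- x * 7/10 is (7*x + 0) / (0*x + 10)
  print (mob (7, 0, 0, 10) (1 % 2) == (1 % 2) * (7 % 10))
```

Both checks print True: one mobius transformation per rational operand is all it takes.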

This was all sort of a warm-up, though, for the big guns: binary operations on two continued fractions, yielding a new continued fraction. Here, for the first time, our trusted mobius transformations fail us. Indeed, this remained an unsolved problem until 1972, when Bill Gosper devised a solution. That solution uses functions of the form:

z(x, y) = (a·x·y + b·x + c·y + d) / (e·x·y + f·x + g·y + h)

Again, for our purposes, all coefficients will be integers, and the coefficients in the denominator will always be positive. Since these are the generalization of mobius transformations to binary operations, we can call them the *bimobius transformations*.

```haskell
data Bimobius where
  BM ::
    Integer ->
    Integer ->
    Integer ->
    Integer ->
    Integer ->
    Integer ->
    Integer ->
    Integer ->
    Bimobius

deriving instance Eq Bimobius

deriving instance Show Bimobius
```

The rest of the implementation nearly writes itself! We simply want to follow the same old strategy. We keep a bimobius transformation that computes the rest of the output, given the rest of the inputs. At each step, we emit more output if possible, and otherwise consume a term from *one* of the two inputs. As soon as either input is exhausted (if, indeed, that happens), the remaining calculation reduces to a simple mobius transformation of the remaining argument.

First, we need an analogue of mobiusIntPart, which was how we determined whether an output was available.

```haskell
bimobiusIntPart :: Bimobius -> Maybe Integer
bimobiusIntPart (BM a b c d e f g h)
  | e /= 0 && f /= 0 && g /= 0 && h /= 0
      && n2 == n1
      && n3 == n1
      && n4 == n1 =
      Just n1
  | otherwise = Nothing
  where
    n1 = a `div` e
    n2 = b `div` f
    n3 = c `div` g
    n4 = d `div` h
```

The other building block we used to implement mobius transformations was composition with the Monoid instance. Unfortunately, that’s not so simple, since bimobius transformations have two inputs. There are now *three* composition forms. First, we can compose the mobius transformation on the left, consuming the output of the bimobius:

```haskell
(<>||) :: Mobius -> Bimobius -> Bimobius
-- (mob <>|| bimob) x y = mob (bimob x y)
Mobius a1 b1 c1 d1 <>|| BM a2 b2 c2 d2 e2 f2 g2 h2 =
  BM a b c d e f g h
  where
    a = a1 * a2 + b1 * e2
    b = a1 * b2 + b1 * f2
    c = a1 * c2 + b1 * g2
    d = a1 * d2 + b1 * h2
    e = c1 * a2 + d1 * e2
    f = c1 * b2 + d1 * f2
    g = c1 * c2 + d1 * g2
    h = c1 * d2 + d1 * h2
```

We can also compose it on the right side, transforming either the first or second argument of the bimobius.

```haskell
(||<) :: Bimobius -> Mobius -> Bimobius
-- (bimob ||< mob) x y = bimob (mob x) y
BM a1 b1 c1 d1 e1 f1 g1 h1 ||< Mobius a2 b2 c2 d2 =
  BM a b c d e f g h
  where
    a = a1 * a2 + c1 * c2
    b = b1 * a2 + d1 * c2
    c = a1 * b2 + c1 * d2
    d = b1 * b2 + d1 * d2
    e = e1 * a2 + g1 * c2
    f = f1 * a2 + h1 * c2
    g = e1 * b2 + g1 * d2
    h = f1 * b2 + h1 * d2

(||>) :: Bimobius -> Mobius -> Bimobius
-- (bimob ||> mob) x y = bimob x (mob y)
BM a1 b1 c1 d1 e1 f1 g1 h1 ||> Mobius a2 b2 c2 d2 =
  BM a b c d e f g h
  where
    a = a1 * a2 + b1 * c2
    b = a1 * b2 + b1 * d2
    c = c1 * a2 + d1 * c2
    d = c1 * b2 + d1 * d2
    e = e1 * a2 + f1 * c2
    f = e1 * b2 + f1 * d2
    g = g1 * a2 + h1 * c2
    h = g1 * b2 + h1 * d2
```

That’s a truly dizzying bunch of coefficients! I’m definitely not satisfied without property tests to ensure they match the intended meanings. To do that, we need two pieces of test code.

First, we want to generate random bimobius transformations with an Arbitrary instance. As with mobius transformations, the instance will guarantee that the denominator coefficients are non-negative and not all zero.

```haskell
instance Arbitrary Bimobius where
  arbitrary =
    suchThat
      ( BM
          <$> arbitrary
          <*> arbitrary
          <*> arbitrary
          <*> arbitrary
          <*> (getNonNegative <$> arbitrary)
          <*> (getNonNegative <$> arbitrary)
          <*> (getNonNegative <$> arbitrary)
          <*> (getNonNegative <$> arbitrary)
      )
      (\(BM _ _ _ _ e f g h) -> maximum [e, f, g, h] > 0)

  shrink (BM a b c d e f g h) =
    [ BM a' b' c' d' e' f' g' h'
      | ( a',
          b',
          c',
          d',
          NonNegative e',
          NonNegative f',
          NonNegative g',
          NonNegative h'
          ) <-
          shrink
            ( a,
              b,
              c,
              d,
              NonNegative e,
              NonNegative f,
              NonNegative g,
              NonNegative h
            ),
        maximum [e', f', g', h'] > 0
    ]
```

And second, we want a simple and obviously correct base implementation of the bimobius transformation to compare with.

```haskell
bimobius :: (Eq a, Fractional a) => Bimobius -> a -> a -> Maybe a
bimobius (BM a b c d e f g h) x y
  | q == 0 = Nothing
  | otherwise = Just (p / q)
  where
    p =
      fromInteger a * x * y
        + fromInteger b * x
        + fromInteger c * y
        + fromInteger d
    q =
      fromInteger e * x * y
        + fromInteger f * x
        + fromInteger g * y
        + fromInteger h
```

With this in mind, we’ll check that each of the three composition operators performs as expected, at least on rational numbers. There’s one slight wrinkle: when composing the two functions manually leads to an undefined result, the composition operator sometimes cancels out the singularity and produces an answer. I’m willing to live with that, so I discard cases where the original result is undefined.

```haskell
prop_mob_o_bimob ::
  Mobius -> Bimobius -> Rational -> Rational -> Property
prop_mob_o_bimob mob bimob r1 r2 =
  case mobius mob =<< bimobius bimob r1 r2 of
    Just ans -> bimobius (mob <>|| bimob) r1 r2 === Just ans
    Nothing -> discard

prop_bimob_o_leftMob ::
  Bimobius -> Mobius -> Rational -> Rational -> Property
prop_bimob_o_leftMob bimob mob r1 r2 =
  case (\x -> bimobius bimob x r2) =<< mobius mob r1 of
    Just ans -> bimobius (bimob ||< mob) r1 r2 === Just ans
    Nothing -> discard

prop_bimob_o_rightMob ::
  Bimobius -> Mobius -> Rational -> Rational -> Property
prop_bimob_o_rightMob bimob mob r1 r2 =
  case (\y -> bimobius bimob r1 y) =<< mobius mob r2 of
    Just ans -> bimobius (bimob ||> mob) r1 r2 === Just ans
    Nothing -> discard
```

Now to the task at hand: implementing bimobius transformations on continued fractions.

```haskell
cfBimobius :: Bimobius -> CFrac -> CFrac -> CFrac
cfBimobius (BM a b _ _ e f _ _) Inf y = cfMobius (Mobius a b e f) y
cfBimobius (BM a _ c _ e _ g _) x Inf = cfMobius (Mobius a c e g) x
cfBimobius (BM _ _ _ _ 0 0 0 0) _ _ = Inf
cfBimobius bm x y
  | Just n <- bimobiusIntPart bm =
      let bm' = Mobius 0 1 1 (- n) <>|| bm in n :+/ cfBimobius bm' x y
cfBimobius bm@(BM a b c d e f g h) x@(x0 :+/ x') y@(y0 :+/ y')
  | g == 0 && h == 0 = consumeX
  | f == 0 || h == 0 = consumeY
  | abs (g * (h * b - f * d)) > abs (f * (h * c - g * d)) = consumeX
  | otherwise = consumeY
  where
    consumeX = cfBimobius (bm ||< Mobius x0 1 1 0) x' y
    consumeY = cfBimobius (bm ||> Mobius y0 1 1 0) x y'
```

There are a few easy cases at the start. If either argument is infinite, then the infinite terms dominate, and the bimobius reduces to a unary mobius transformation of the remaining argument. On the other hand, if the denominator coefficients are all zero, the result is infinite.

Next is the output case. This follows the same logic as the mobius transformations above: if the integer part *n* of the output is already determined, we emit *n* and continue with a residual transformation *m'* satisfying *m* = *n* + 1/*m'*.

But since *m* and *m'* are now bimobius transformations, we just use one of the special composition forms defined earlier.

The remaining case is where we don’t have a new term to output, and must expand a term of the input, using the now-familiar equation, but composing with the ||< or ||> operators, instead.

There’s a new wrinkle here, though! Which of the two inputs should we expand? It seems best to make the choice that narrows the bounds the most, but I’ll be honest: I don’t actually understand the full logic behind this choice. I’ve simply copied it from Rosetta Code, where it’s not explained in any detail. We will rely on testing for confidence that it works.

Speaking of testing, we definitely need a few tests to ensure this works as intended. We’ll test the output against the Rational version of the function, and that the result is always in canonical form.

```haskell
prop_cfBimobius_matches_Rational ::
  NonNegative Rational ->
  NonNegative Rational ->
  Bimobius ->
  Property
prop_cfBimobius_matches_Rational
  (NonNegative r1)
  (NonNegative r2)
  bm =
    case bimobius bm r1 r2 of
      Just x
        | x >= 0 ->
            cfFromRational x
              === cfBimobius
                bm
                (cfFromRational r1)
                (cfFromRational r2)
      _ -> discard

prop_cfBimobius_isCanonical ::
  NonNegative Rational ->
  NonNegative Rational ->
  Bimobius ->
  Bool
prop_cfBimobius_isCanonical
  (NonNegative r1)
  (NonNegative r2)
  bm =
    case bimobius bm r1 r2 of
      Just x
        | x >= 0 ->
            isCanonical
              (cfBimobius bm (cfFromRational r1) (cfFromRational r2))
      _ -> discard
```

If you like, it’s interesting to compare the implementations of continued fraction arithmetic here with some other sources, such as:

- The Rosetta Code page mentioned above.
- Mark Dominus’ slides for his talk Arithmetic with Continued Fractions

These sources are great, but I think the code above is nicer. The reason is that with lazy evaluation instead of mutation, you can read the code more mathematically as a set of equations, and see what’s going on more clearly. This is quite different from the imperative implementations, which modify state in hidden places and require a lot of mental bookkeeping to keep track of it all. As I said, Haskell is a joy for this kind of programming.

We now have all the tools we need to write Num and Fractional instances for CFrac. That’s a great way to wrap this all together, so that clients of this code don’t need to worry about mobius and bimobius transformations at all.

```haskell
checkNonNegative :: CFrac -> CFrac
checkNonNegative Inf = Inf
checkNonNegative x@(x0 :+/ x')
  | x < 0 = error "checkNonNegative: CFrac is negative"
  | x > 0 = x
  | otherwise = x0 :+/ checkNonNegative x'
```

```haskell
instance Num CFrac where
  fromInteger n
    | n >= 0 = n :+/ Inf
    | otherwise = error "fromInteger: CFrac cannot be negative"

  (+) = cfBimobius (BM 0 1 1 0 0 0 0 1)
  x - y = checkNonNegative (cfBimobius (BM 0 1 (-1) 0 0 0 0 1) x y)
  (*) = cfBimobius (BM 1 0 0 0 0 0 0 1)

  signum (0 :+/ Inf) = 0
  signum _ = 1
  abs = id

  negate (0 :+/ Inf) = 0 :+/ Inf
  negate _ = error "negate: CFrac cannot be negative"

instance Fractional CFrac where
  fromRational x
    | x < 0 = error "fromRational: CFrac cannot be negative"
    | otherwise = cfFromRational x

  recip = cfRecip
  (/) = cfBimobius (BM 0 1 0 0 0 0 1 0)
```

You can find the complete code and tests for this section at https://code.world/haskell#P_T4CkfWGu3bgSwFV0R04Sg. In addition to running the tests, I’ve included a few computations, demonstrating for example that (1 + sqrt5) / 2 computes exactly the same value as phi.

Everything we’ve done so far is correct, but there’s one way that it’s not terribly satisfying. It can do a lot of unnecessary work. In this part, we’ll try to fix this performance bug.

You can find the code for this part in https://code.world/haskell#PP8uMTTchtH2F36nJYWT3VQ.

Consider, for example, a computation like x + 0.7. We know (and, indeed, Haskell also knows!) that 0.7 is a rational number. We should, then, be able to work this out by applying cfMobius to a simple mobius transformation.

But that’s not what GHC does. Instead, GHC reasons like this:

- We assume that x is a CFrac. But + requires that both arguments have the same type, so 0.7 must also be a CFrac.
- Since 0.7 is a literal, GHC will implicitly add a fromRational, applied to the *rational* form 7/10, which (using the Fractional instance defined in the last part) uses cfFromRational to convert 0.7 into the continued fraction 0 :+/ 1 :+/ 2 :+/ 3 :+/ Inf.
- Now, since we have two CFrac values, we must use the heavyweight cfBimobius to compute the answer, choosing between the two streams at every step. In the process, cfBimobius must effectively *undo* the conversion of 7/10 into continued fraction form, doubling the unnecessary work.

We can fix this with GHC’s rewrite rules. They can avoid this unnecessary conversion, and reduce constant factors at the same time, by rewriting expressions to use cfMobius instead of cfBimobius when one argument need not be a continued fraction. Similarly, when cfMobius is applied to a rational argument, it can be replaced by a simpler call to cfFromFrac after just evaluating the mobius transformation on the Rational directly. Integers are another special case: less expensive, because their continued fraction representations are trivial, but we might as well rewrite them, too.

```haskell
{-# RULES
"cfrac/cfMobiusInt" forall a b c d n.
  cfMobius (Mobius a b c d) (n :+/ Inf) =
    cfFromFrac (a * n + b) (c * n + d)
"cfrac/cfMobiusRat" forall a b c d p q.
  cfMobius (Mobius a b c d) (cfFromFrac p q) =
    cfFromFrac (a * p + b * q) (c * p + d * q)
"cfrac/cfBimobiusInt1" forall a b c d e f g h n y.
  cfBimobius (BM a b c d e f g h) (n :+/ Inf) y =
    cfMobius (Mobius (a * n + c) (b * n + d) (e * n + g) (f * n + h)) y
"cfrac/cfBimobiusRat1" forall a b c d e f g h p q y.
  cfBimobius (BM a b c d e f g h) (cfFromFrac p q) y =
    cfMobius
      ( Mobius
          (a * p + c * q)
          (b * p + d * q)
          (e * p + g * q)
          (f * p + h * q)
      )
      y
"cfrac/cfBimobiusInt2" forall a b c d e f g h n x.
  cfBimobius (BM a b c d e f g h) x (n :+/ Inf) =
    cfMobius (Mobius (a * n + b) (c * n + d) (e * n + f) (g * n + h)) x
"cfrac/cfBimobiusRat2" forall a b c d e f g h p q x.
  cfBimobius (BM a b c d e f g h) x (cfFromFrac p q) =
    cfMobius
      ( Mobius
          (a * p + b * q)
          (c * p + d * q)
          (e * p + f * q)
          (g * p + h * q)
      )
      x
  #-}
```

There’s another opportunity for rewriting, and we’ve actually already used it! When mobius or bimobius transformations are composed together, we can do the composition in advance to produce a single transformation. Especially since these transformations usually have simple integer arguments that GHC knows how to optimize, this can unlock a lot of other arithmetic optimizations.

Fortunately, we’ve already defined and tested all of the composition operators we need here.

```haskell
{-# RULES
"cfrac/mobius_o_mobius" forall m1 m2 x.
  cfMobius m1 (cfMobius m2 x) =
    cfMobius (m1 <> m2) x
"cfrac/mobius_o_bimobius" forall m bm x y.
  cfMobius m (cfBimobius bm x y) =
    cfBimobius (m <>|| bm) x y
"cfrac/bimobius_o_mobiusLeft" forall bm m x y.
  cfBimobius bm (cfMobius m x) y =
    cfBimobius (bm ||< m) x y
"cfrac/bimobius_o_mobiusRight" forall bm m x y.
  cfBimobius bm x (cfMobius m y) =
    cfBimobius (bm ||> m) x y
  #-}
```

This all looks great… but it doesn’t work. The problem is that GHC isn’t willing to inline the right things at the right times to get to exactly the form we need for these rewrite rules to fire. To fix this, I had to go *back* through all of the previous code, being careful to annotate many of the key functions with {-# INLINE ... #-} pragmas.

There are two forms of this pragma. When a trivial function might just get in the way of a rewrite rule, it’s annotated with a simple INLINE pragma so that GHC can inline and hopefully eliminate the function.

But when a function actually appears on the left hand side of a rewrite rule, we need to step more carefully. Inlining that function can *prevent* the rewrite rule from firing. We could use NOINLINE for this, but I’ve instead picked the form of INLINE that includes a phase number. The phase number *prevents* inlining from occurring prior to that phase. It looks something like:

```haskell
{-# INLINE [2] cfRecip #-}
```

Choosing a phase number seems to be something of a black art. I picked 2, and it worked, so… maybe try that one?

All these edits mean we have completely new versions of all of the code, and it’s in these files:

- Part 1: https://code.world/haskell#PKVqY8mch2YP__zZG3KJcaQ
- Part 2: https://code.world/haskell#PnYywOggsg_WaDdNXIpiHkg
- Part 3: https://code.world/haskell#PEKM_o0o1WbKnKhjUIjQ_hQ
- Part 4: https://code.world/haskell#PP8uMTTchtH2F36nJYWT3VQ

Done… but how do we know it worked?

Rewrite rules are a bit of a black art, and it’s nice to at least get some confirmation that they are working. GHC can tell us what’s going on if we pass some flags. Here are some useful ones:

- -ddump-rule-rewrites will print out a description of every rewrite rule that fires, including the before-and-after code.
- -ddump-simpl will print out the final Core, which is GHC’s Haskell-like intermediate language, after the simplifier (which applies these rules) is done. Adding -dsuppress-all makes the result more readable by hiding metadata we don’t care about.

In CodeWorld, we can pass options to GHC by adding an OPTIONS_GHC pragma to the source code. If you looked at the Part 4 link above, you saw that I’ve done so. The output is long! But there are a few things we can notice.
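For reference, such a pragma at the top of the source file looks like this (a sketch; the flags are exactly the ones listed above):

```haskell
{-# OPTIONS_GHC -ddump-rule-rewrites -ddump-simpl -dsuppress-all #-}
```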

First, we named our rules, so we can search for the rules by name to see if they have fired. They do! We see this:

```
...
Rule fired
    Rule: cfrac/cfBimobiusInt1
    Module: (CFracPart4)
...
Rule fired
    Rule: timesInteger
    Module: (BUILTIN)
    Before: GHC.Integer.Type.timesInteger ValArg 0 ValArg n_aocc
    After:  0
    Cont:   Stop[RuleArgCtxt] GHC.Integer.Type.Integer
Rule fired
    Rule: plusInteger
    Module: (BUILTIN)
    Before: GHC.Integer.Type.plusInteger
              ValArg src<GHC/In an imported module> 0 ValArg 1
    After:  1
...
Rule fired
    Rule: cfrac/bimobius_o_mobiusLeft
    Module: (CFracPart4)
...
Rule fired
    Rule: cfrac/cfBimobiusInt2
    Module: (CFracPart4)
```

Good news: our rules are *firing*!

I highlighted the timesInteger and plusInteger rules, as well. This is GHC taking the *result* of our rules, which involve a bunch of expressions like a1 * a2 + b1 * c2, and working out the math at compile time because it already knows the arguments. This is great! We want that to happen.

But what about the result? For this, we turn to the -ddump-simpl output. It’s a bit harder to read, but after enough digging, we can find this (after some cleanup):

```
main4 = 1
main3 = 0
main2 = 2
main1
  = $wunsafeTake
      50# (cfToDecimal ($wcfMobius main4 main4 main3 main2 sqrt5))
```

This is essentially the expression take 50 (cfToDecimal (cfMobius (Mobius 1 1 0 2) sqrt5)). Recall that what we originally wrote was (1 + sqrt5) / 2. We have succeeded in turning an arithmetic expression of CFrac values into a simple mobius transformation at compile time!

We finally reach the end of our journey. There are still some loose ends, as there always are. For example:

- There’s still the thing with negative numbers, which I left as an exercise.
- Arithmetic hangs when computing with irrational numbers that produce rational results. For example, try computing sqrt2 * sqrt2, and you’ll be staring at a blank screen while your CPU fan gets some exercise. This might seem like a simple bug, but it’s actually a fundamental limitation of our approach. An incorrect term arbitrarily far into the expression for sqrt2 could change the integer part of the result, so any correct algorithm must read the *entire* continued fraction before producing an answer. But the entire fraction is infinite! You could try to fix this by special-casing the cyclic continued fractions, but you can never catch all the cases.
- There’s probably plenty more low-hanging fruit for performance. I haven’t verified the rewrite rules in earnest, really. Something like Joachim Breitner’s inspection testing idea could bring more rigor to this. (Sadly, libraries for inspection testing require Template Haskell, so you cannot try it within CodeWorld.)
- Other representations for exact real arithmetic offer more advantages. David Lester, for example, has written a very concise exact real arithmetic library using Cauchy sequences with exponential convergence.

In the end, though, I hope you’ve found this part of the journey rewarding in its own right. We developed a non-trivial library in Haskell, derived the code using equational reasoning (even if it was a bit fuzzy), tested it with property testing, and optimized it with rewrite rules. All of these play very well with Haskell’s declarative programming style.

Here’s a fact that comes up in high school mathematics: you can *demote* multiplication into addition by using logarithms. That is:

log(xy) = log(x) + log(y)

That is, you can compute the log of a product, given only the logs of the factors.

To students today, this might seem like just another algebraic identity. But in the age before calculators, it was actually the main reason for a typical high school student to be interested in logarithms at all! Multiplication is more difficult than addition, so if you have a way to represent numbers that makes multiplication into addition, that helps. This is whole principle behind doing multiplication with a slide rule, for example: one just converts to logarithms, adds the resulting distances, and then converts back.
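The slide-rule trick is easy to check numerically. Here's a one-line sketch: computing 6 × 7 entirely through logarithms recovers 42, up to floating-point rounding.

```haskell
-- Demoting multiplication to addition: exp (log x + log y) == x * y,
-- up to floating-point rounding error.
main :: IO ()
main = print (abs (exp (log 6 + log 7) - 42) < 1e-9)
```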

Similarly, one can use logarithms to demote powers into multiplication:

log(x^y) = y · log(x)

But if we’re imagining a world where we work entirely with logarithms, it’s not entirely fair to just multiply by *y*, so I’m going to rewrite this (let’s agree that all logarithms are natural) as:

log(x^y) = exp(log(y)) · log(x)

There’s an additional exponential function there, but if we take that as given, we can now compute the log of a power using only multiplication, the exponential function, and the logs of the inputs.
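That identity is also easy to confirm numerically. A quick sketch, computing log(3⁴) from log 3 and log 4 alone:

```haskell
-- Computing the log of a power from the logs of the inputs:
-- log (x ** y) == exp (log y) * log x, up to floating-point error.
main :: IO ()
main = print (abs (exp (log 4) * log 3 - log (3 ** 4)) < 1e-9)
```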

An interesting question to ask is: what about addition? The following does *not* work, although math teachers will recognize it as a very common mistake!

log(x + y) ≠ log(x) + log(y)

So, can we complete this equation?

log(x + y) = ?

At first glance, thinking of the logarithm as translating operations down one order (multiplication into addition, and exponents into multiplication), this seems to call for an operation an order *lower* than addition. What could fit in such a place?

We can start to answer this question using simple algebra and our existing identities. Let’s assume *x* is not zero (since then it would have no logarithm anyway!), and then we can factor:

x + y = x · (1 + y/x)

So by applying the log rule for multiplication, we get this nifty little formula:

log(x + y) = log(x) + log(1 + y/x)

Notice that although the presentation here doesn’t look symmetric, it actually is. Swapping the *x* and *y* values doesn’t change the result.

Again, imagining that we have only the logarithms and not the actual values, that fraction at the end is sort of cheating. Just as I did with the multiplication formula, I’ll introduce an explicit exponential, and it simplifies nicely.

log(x + y) = log(x) + log(1 + exp(log(y) - log(x)))

In order to write this more clearly, I’ll name a new function, *h*, and define it in terms of that:

h(t) = log(1 + exp(t))

log(x + y) = log(x) + h(log(y) - log(x))

It’s true that we haven’t succeeded in getting rid of addition, but this is leading somewhere interesting. But what is this mysterious function *h*?
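Before exploring further, here's a quick numeric sanity check of the identity, with h defined as log(1 + exp t): computing log(3 + 5) from log 3 and log 5 alone.

```haskell
-- h t = log (1 + exp t), so that
-- log (x + y) == log x + h (log y - log x), up to floating-point error.
h :: Double -> Double
h t = log (1 + exp t)

main :: IO ()
main = print (abs ((log 3 + h (log 5 - log 3)) - log 8) < 1e-9)
```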

We can start to explore *h* by looking at a graph.

At first glance, it looks like *h*(*x*) is approximately zero for any input less than -2, and approximately *x* for any input greater than 2. This sounds like the so-called “rectified linear” function:

relu(x) = max(0, x)

Indeed, we can graph the two functions on the same axes, and see that they agree except near zero. (You can also verify this by reasoning about the formula. For inputs far less than zero, the exponential term becomes insignificant, while for inputs far greater than zero, the constant term becomes insignificant. This is the basis of a not-too-hard proof that these are asymptotes.)

We can, therefore, think of *h* as a *soft* rectified linear function; what you get by just rounding out the rectified linear function around its sharp corner.

(This rectified linear function, incidentally, has been popularized in machine learning, where for reasons that depend on who you ask, it has turned out to be wildly successful as an activation function for artificial neural networks. Part of the reason for that success is that it is so simple it can be computed quickly. But that’s not enough to explain all of its success! I suspect another part of the reason is that it’s closely related to sums exactly in the sense of the very investigation we’re doing now.)

So if *h* is so similar to the rectified linear function, what happens when you (inaccurately) use the rectified linear function itself in the sum formula above? Remarkably, you get this:

log(x + y) ≈ log(x) + max(0, log(y) - log(x)) = max(log(x), log(y))

In other words, in terms of logarithms, adding numbers is *approximately* the same as just taking the maximum! At least, it is when the difference between the numbers is large. That sort of makes sense, actually. If you add a very large number to a very small number, the result will indeed be approximately the same as the large number. (Remember that since we’re only thinking about numbers with logarithms, both inputs must be positive. We need not worry about the case where both numbers are large but with opposite signs.)

We can pull out of this pattern a sort of “soft” maximum function, which is *almost* like just giving the greater of its two arguments, but if the arguments are close then it rounds off the curve. Unfortunately, the phrase *softmax* already means something different and somewhat more complex to the machine learning community mentioned above, so perhaps we ought to call this something like *smoothmax* instead.

Then we have our answer:

log(x + y) = smoothmax(log(x), log(y)), where smoothmax(a, b) = a + h(b - a) = log(exp(a) + exp(b))

It’s not easily computable, really, in the sense that products and powers were, but this still gives some intuition for the function that *does* compute the log of a sum, given the logs of the summands. Anyway, I’m satisfied enough with that answer.
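Here's a short sketch of smoothmax as described above (the Haskell name is mine), checking both behaviors numerically: far-apart inputs give nearly the maximum, while equal inputs overshoot by log 2.

```haskell
-- smoothmax a b = log (exp a + exp b), written in the numerically
-- friendlier form a + h (b - a) with h t = log (1 + exp t).
smoothmax :: Double -> Double -> Double
smoothmax a b = a + log (1 + exp (b - a))

main :: IO ()
main = do
  print (abs (smoothmax 1 10 - 10) < 1e-3)          -- far apart: nearly max
  print (abs (smoothmax 5 5 - (5 + log 2)) < 1e-9)  -- equal inputs: off by log 2
```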

This tells us that the *smoothmax* function can play the role of addition in mathematical expressions. That implies that all of the algebraic properties of addition ought to hold for *smoothmax* as well. That’s interesting!

For example, *smoothmax* ought to be commutative. That is:

*smoothmax*(*x*, *y*) = *smoothmax*(*y*, *x*)

Indeed, this is true. I made that observation above when first introducing the formula. One can also expect that *smoothmax* is associative. That is:

*smoothmax*(*smoothmax*(*x*, *y*), *z*) = *smoothmax*(*x*, *smoothmax*(*y*, *z*))

And, indeed, although the algebra is a little more complex, this turns out to be true as well. In fact, we need not really show each of these with complicated algebra: we’ve already shown that *smoothmax* *is* addition, just using logarithms to represent the numbers.
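Both properties are easy to spot-check numerically. In this Python sketch, `smoothmax` is just the log(e^*x* + e^*y*) formula written out directly (illustrative code, not library code):

```python
import math

def smoothmax(x, y):
    return math.log(math.exp(x) + math.exp(y))

x, y, z = 1.0, 2.0, 3.0

print(smoothmax(x, y) == smoothmax(y, x))  # commutative: True
print(abs(smoothmax(smoothmax(x, y), z) - smoothmax(x, smoothmax(y, z))))  # ~0
```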

I think things get even more interesting when we consider the *distributive* property. Remember that when we work with logs, multiplication gets replaced with addition, so we have this:

*x* + *smoothmax*(*y*, *z*) = *smoothmax*(*x* + *y*, *x* + *z*)

Thinking of this as a softened maximum, this works out to be some kind of *translation invariance* property of the maximum: if you take the maximum of two numbers and then add *x*, that’s the same as adding *x* to each one and then taking the maximum! That intuitively checks out.
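Here is a quick check of that invariance (again an illustrative Python sketch, with `smoothmax` as the log(e^*x* + e^*y*) formula):

```python
import math

def smoothmax(x, y):
    return math.log(math.exp(x) + math.exp(y))

x, y, z = 0.5, 2.0, 3.0
lhs = x + smoothmax(y, z)      # "max" first, then translate by x
rhs = smoothmax(x + y, x + z)  # translate each argument, then "max"
print(abs(lhs - rhs))  # zero, up to floating-point rounding
```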

There are some things that don’t work, though.

You might also hope for something like an identity property, since for addition we have *x* + 0 = *x*. This one doesn’t turn out so well, because we cannot take the logarithm of zero! We end up wanting to write something like:

*smoothmax*(*x*, −∞) = *x*

This would make sense given the asymptotic behavior of the *smoothmax* function, but we’re playing sort of fast and loose with infinities there, so I wouldn’t call it a true identity. To say that correctly, you need limits.

You also need to be careful about expecting *smoothmax* to act like a maximum! For example:

*smoothmax*(*x*, *x*) = *x* + log 2

That’s weird… but not if you remember that smoothmax is at its least accurate when its two inputs are close together, so both inputs being the same is a worst case scenario. Indeed, that’s where the true *max* function has a non-differentiable sharp corner that needed to be smoothed out. And, indeed, the exact behavior is given by *addition*, rather than maximums, and addition is *not* idempotent (i.e., adding a number to itself doesn’t give the same number back).

In fact, speaking of smooth-maxing a number with itself:

*smoothmax*(*x*, *x*, …, *x*) = *x* + log *n*, where there are *n* copies of *x*

This resembles a sort of definition of addition of log-naturals as “repeated *smoothmax* of a number with itself,” in very much the same sense that multiplication by naturals can be defined as repeated addition of a number with itself. That strengthens the notion that this operation is sort of one order lower than addition.
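Folding *smoothmax* over *n* copies of a number shows the pattern directly (an illustrative Python sketch; `smoothmax` is again the log(e^*x* + e^*y*) formula):

```python
import math
from functools import reduce

def smoothmax(x, y):
    return math.log(math.exp(x) + math.exp(y))

x, n = 1.5, 8
repeated = reduce(smoothmax, [x] * n)  # smoothmax folded over n copies of x
print(repeated - (x + math.log(n)))    # ~0: n-fold smoothmax of x adds log n
```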

So there you have it. That’s as far as my flight of fancy goes. I found it interesting enough to share.
