Idle Speculation on Lenses and Zippers
This is the story of two related concepts that almost mesh together in an interesting way. Fair warning: it doesn’t work out cleanly in the end. But in the interest of sharing some fun exploration, here we go anyway.
(First of all, this story isn’t new. I wrote about it on Google+ about a year and a half ago. Since that platform is going away, I won’t bother linking to my original post and creating another broken link. I gave a lightning talk on this at the February meeting of the Atlambda Haskathon, as well. That’s when I realized I need to write about this again or risk losing it to defunct hosting.)
There are two concepts in Haskell that seem to have an interesting relationship to each other: lenses and zippers.
Intro to zippers
The zipper is one of the classical functional pearls: a data structure that was originally created for practical purposes, and then out of the blue, also exhibits some insane abstract relationships!
The original “zipper” is a way of working with linked lists.
Suppose that instead of traversing from the beginning, you’d like to navigate through the list, keeping track of a current position, and only making modifications around that current position. For example, the list might be a list of characters in a text editor, or a list of windows in a window manager. You want to be able to insert, delete, and modify the elements around the cursor efficiently.
The trick is to reverse the section of the list that comes before the cursor. You end up with two linked lists — one reversed list of values before the cursor, and one forward list of values after it — and one value at the cursor itself.
The green box represents a hole, containing a single element of the list. Surrounding this hole is the context, which is the surrounding data structure from the point of view of the cursor at that hole. In this representation, changes to the focused element are easy, since it’s separate from the context. Changes to the context near the cursor are also easy: to insert or remove an element adjacent to the cursor, you push or pop from one of the two lists. These are all constant time operations. Mission accomplished.
Zippers don’t just work for linked lists, though. In fact, most data structures containing elements also have associated zipper types, which separate the structure into a single focused element, and a context that describes the rest of the structure. For a linked list, the context consists of a left and right half, but for something like a binary tree, the context includes left and right sub-trees, and the parent nodes and its other children. It’s less obvious what this might look like. But there’s a trick! You can calculate these zipper types from the algebraic structure of the tree.
The semi-ring of types
For this to make sense, we need to think about data types as things that have standard algebraic operations on them. In particular, we want to know how to add types, multiply types, and raise them to powers.
The intuition here is to think about the number of values that a type has, and find operations on the type that have the same effect on the number of values. (Of course, types often have infinitely many possible values… but let’s agree to ignore that for now.)
Here are the basic operations on types:
- Addition corresponds to a tagged union. That is, A + B is a type that can contain either a value of type A, or a value of type B, and (unlike unions in C, for example) you can find out which one.
- Multiplication corresponds to tuples with fields. That is, A * B is a type that contains both a value of type A, and a value of type B.
- Exponents correspond to functions. That is, A^B is a type of functions (and I mean mathematical functions, so they must be referentially transparent with no side effects) that take values of type B as arguments, and return values of type A.
If this is a new point of view, it’s worth taking the time to convince yourself that the normal mathematical properties of these operations still hold! For example:
- Is it still true that X² = X * X?
Here, “2” can be seen as referring to a Boolean type, since it has two possible values. Choosing a pure function f from Boolean to X is the same thing as choosing a value of X to return for f(true), and another for f(false).
- Does the distributive property hold, so A * (B + C) = A * B + A * C?
Suppose you have an A, and you have either a B or a C. Then maybe you have both an A and a B, or maybe you have both an A and a C. That works, too.
Zippers via differentiation
With this in mind, the question of how to build a zipper is answered in an amazing way by Conor McBride in an unpublished but well-known paper. The idea is that the context we saw above is just the derivative of the data type!
Let’s return to the example of a binary tree. A binary tree containing values of type X can be:
- A leaf, which is just a single value that terminates the tree.
- A branch, which consists of an X value, a left sub-tree, and a right sub-tree.
Expressing this algebraically, if T is the type of binary trees, then we can write T = 1 + X * T². The 1 refers to the nil value used for the leaves, and the X * T² is the branch node, which consists of an X value, and two sub-trees.
We can compute the derivative by differentiating both sides of that equation with respect to X. Applying the product rule, power rule, and chain rule, we get this:
dT/dX = T² + 2 * X * T * dT/dX
Do you see why this makes sense as the zipper type for a tree? Remember, the zipper type is the type of binary trees with one missing X value. The missing X value can be either:
- At the root: In this case, since the X is missing, all that remains is to say what sub-trees live to the left and right. This is the T² term.
- In one of the sub-trees of the root: In this case, you need to know which sub-tree (a choice between two values), what X value is at the root, what the sub-tree looks like that’s not missing an X, and another context for the subtree that is missing a value. This is the 2 * X * T * dX/dT term.
Well, that’s interesting, if you haven’t seen it before. But let’s not stop there.
The information content of the zipper
One thing that changed here is that while the original zipper by Huet reversed the point of view so that values near the cursor are easier to access, this new zipper doesn’t do so. That is an important point for practical uses of the zipper type, but not so much for this observation, which is entirely about the information content of the zipper, and not its efficiency.
Thinking about what information is contained in a zipper, you might note that a zipper type, dT/dX, contains almost a complete value of type T, but it’s missing an X. In practice, like we saw with the list zippers above, you’ll often keep a zipper (for context) alongside a specific X value that’s currently being edited. But that’s just another product!
You may be tempted to write an equation like X * dT/dX = T. But that’s wrong. Having an X and a dT/dX doesn’t just give you a value of the type T. It also gives you a location of a specific X inside of that T. How would that be represented?
Lenses and zippers?
Many programming languages tackle the question of how to identify specific places where one value fits inside the other. The answer usually comes down to two operations: get, which retrieves the value in some location from the larger structure, and set, which puts a new value into that location.
Entirely separate from zippers, functional programming languages have developed an elaborate theory of what basically amounts to getter/setter pairs for values. The result is the lens, which for any structure type T and field type X, is a new type I’ll call L(T, X), which says:
- For any T, how to get an X out of the T.
- How to put a new X back into the T as well.
In some intuitive sense, you can think of a lens as a locator for the X inside of the T. So you may be tempted to write an equation like this:
X * dT/dX = T * L(T, X)
Let’s take a simple example, and think about it. Suppose you have the type T = X³, which is just tuples of three X values. Then dT/dX = 3X² (by the power rule), and you might expect that L(T, X) = 3, since a lens just tells you which of the three X values inside the T you are looking at. Then indeed, the above equation holds!
What’s the problem?
That’s exciting, but the excitement won’t last. Thinking further about this leads to a few sobering realizations.
First, if you try this for something like a list, it doesn’t work out. If the type T is a list of X, then dT/dX = T², as we saw above. But L(T,X) doesn’t even make sense, because it’s impossible to find an X inside an arbitrary list (which might be empty). For the equation to work out, you would need L(T,X) = X * T, and that seems clearly wrong. If you play around with this, you’ll discover that this problem comes up any time you’ve got a sum type. Types that are pure products, like X³, work just fine, but as soon as different values might have different shapes, everything stops working.
The reason is some subtlety in what we mean by identifying the location of an X inside a T. A lens knows how to locate an X inside any possible value of type T. But the extra information we have from X * dT/dX is the location of an X in one specific value of type T. Basically, we have a quantifier in the wrong place: trying to use a lens here consists of saying “for all…” when we meant “there exists…”.
There’s a second problem, too: in general, it’s not true that L(X³,X) = 3 as assumed in the previous section. There are three obvious ways to find an X inside a value of type X³, but depending on the type X, there might be all sorts of other ways! For example, if X is the type of 32-bit integers, you could hide one integer by taking the lower-order 16 bits of the first one, and the higher-order 16 bits of the other. This is a perfectly good lens, but not one of the three we anticipated. Parametricity sometimes solves this problem if we quantify over possible values of X (because you can’t construct an X out of pieces if you don’t know what type it is, so you’re forced to use one that occurs intact in the data structure!), but that’s another accomodation needed.
In fact, it’s even worse. We can’t even safely say that X * dT/dX contains a T at all! If T is a list of X, for example, the type X * dT/dX can only contain non-empty lists (plus the location info), since dT/dX only represents contexts that leave a hole for some X value in the first place.
As I warned in the beginning, there’s not really a satisfying wrap-up here. We looked at some ideas that initially appear to be related. In some limited situations — when the zipper types are constructed over non-empty product-only types, and the lenses are restricted to being natural enough — things seemed to work out. But we also saw why it’s important to be precise about what these kinds of abstractions mean.