# Solving the mystery behind Abstract Algorithm’s magical optimizations

Yesterday, I reported the bizarre observation that certain functions can behave as if they had negative complexity. You don’t need to read that article before this one. In short, the λ-term `f(bits) = copy(comp(inc,n,bits))`

is asymptotically faster than `g(bits) = comp(inc,n,bits)`

; i.e.,`copy`

(a `O(1)`

operation for a fixed size) behaves as if it had a `O(1/n)`

complexity, causing the program to run faster by doing more things (!?).

That’s not the only bizarre complexity result I had when experimenting with the Abstract Algorithm. Years ago, I noticed that it is capable of computing programs that wouldn’t fit in a computer the size of the universe. Recently, I posted how it can apply an `O(1)`

operation `N`

times in `O(log(N))`

time (!?). Despite insightful answers being provided, it was hard to make sense of such unintuitive behaviors.

After a lot of research and some recent insights, it finally hit me. Turns out the explanation for all of that is pretty simple, and, in this article, I’ll explain what is going on carefully (not jokingly this time). If you’re in a hurry, feel free to skip to the tl;dr at the end.

## 1. Optimizable functions are those that fuse when self-composed

First, let’s start with the simplest example on which something interesting happens: repeated `not`

. Let’s implement it in two different ways:

`True = λt. λf. t`

False = λt. λf. f

not_a = λb. (b False True)

not_b = λb. λt. λf. (b f t)

If you don’t understand those definitions, you can read about lambda encodings, but that’s not too important. Now, how would consecutive applications of `not_a`

and `not_b`

perform in Absal (an implementation of the **Abs**tract **Al**gorithm)? Here is a chart:

Here, complexity is measured as graph rewrites. Don’t worry about that, though, just assume the time it takes to run a λ-term is proportional to the number of graph rewrites. As you can see, **not_a**** performs linearly, while ****not_b**** performs logarithmically**. That’s a huge difference, for such a small change! And is also very unintuitive, because you shouldn’t be able to apply `N`

`not`

s in `O(log(N))`

. Then why that happens? Before answering, let’s first make an experiment. Let’s compose both functions with themselves and check their normal forms.

- norm(
`not_a.not_a`

):`λb. (b (λt.λf.f) (λt.λf.t) (λt.λf.f) (λt.λf.t))`

- norm(
`not_b.not_b`

):`λb. λt. λf. (b t f)`

Interesting: once self-composed, `not_a`

grows larger, but `not_b`

stays the same size. Why? Because the later function **fused**. That’s the same effect that allows Haskell to turn several passes over a list in a single pass, eliminating intermediate structures. This leads us to hypothesize that the magical speedup could have something to do with that property. Let’s check that hypothesis by testing the `map`

operation on lists:

`Nil = λc. λn. n`

Cons = λh. λt. λc. λn. c h (t c n)

map_a = λf. λl. l (λh. λt. Cons (f h) t) Nil

map_b = λf. λl. λc. λn. l (λh. λt. c (f h) t) n

list = Cons True (Cons False (Cons True (Cons False Nil)))

Here, we once again defined it in two different ways: `map_b`

fuses and `map_a`

doesn’t. Let’s measure the complexity of applying `map`

repeatedly to a list:

As you can see, the same behavior is observed here: the version that fuses performs logarithmically, while the version that doesn’t performs linearly, supporting our hypothesis. Woa! But why would that happen?

(Here is the code for reproducing that experiment and generate the charts.)

## 2. Exponentiation by squaring = composition by sharing

To understand that, let’s first talk about integer exponentiation. If you’re a developer, chances are you’ve heard about exponentiation by squaring. It is an ancient, simple algorithm capable of computing `A^B`

in much less steps than it’d take to multiply `A`

repeatedly, `B`

times. To refresh our minds, let’s visualize how it works:

# Computing 13^8 with repeated multiplication13^2 = 13 * 13 = 169

13^3 = 13 * 169 = 2197

13^4 = 13 * 2197 = 28561

13^5 = 13 * 28561 = 371293

13^6 = 13 * 371293 = 4826809

13^7 = 13 * 4826809 = 62748517

13^8 = 13 * 62748517 = 815730721# Computing 13^8 with exponentiation by squaring13^2 = 13 * 13 = 169

13^4 = 169 * 169 = 28561

13^8 = 28561 * 28561 = 815730721

As you can see, the first approach multiplies 13 one-by-one, needing `N`

multiplications to reach `13^N`

(linear time), while the second repeatedly squares the first number instead, allowing it to jump from `13^1`

to `13^2`

to `13^4`

to `13^8`

in a single multiplication for each jump (logarithmic time). To compute values that aren’t perfect squares, you just multiply sub-terms, i.e., `13^11 = 13^8 * 13^2 * 13^1`

.

Neat, right? But how is that relevant? Due to a less known fact, for any function that can be fused by composition, we can also compute its repeated application in `O(log(N))`

steps using that same trick! Let’s visualize it too:

# Computing not^8(t) with repeated applicationnot^2(t) = (λb.λf.λt. b f t) . (λb.λf.λt. b f t) = (λb.λf.λt. b t f)

not^3(t) = (λb.λf.λt. b f t) . (λb.λf.λt. b t f) = (λb.λf.λt. b f t)

not^4(t) = (λb.λf.λt. b f t) . (λb.λf.λt. b f t) = (λb.λf.λt. b t f)

not^5(t) = (λb.λf.λt. b f t) . (λb.λf.λt. b t f) = (λb.λf.λt. b f t)

not^6(t) = (λb.λf.λt. b f t) . (λb.λf.λt. b f t) = (λb.λf.λt. b t f)

not^7(t) = (λb.λf.λt. b f t) . (λb.λf.λt. b t f) = (λb.λf.λt. b f t)

not^8(t) = (λb.λf.λt. b f t) . (λb.λf.λt. b f t) = (λb.λf.λt. b t f)# Computing not^8(t) with composition by sharingnot^2(t) = (λb.λf.λt. b f t) . (λb.λf.λt. b f t) = (λb.λf.λt. b t f)

not^4(t) = (λb.λf.λt. b t f) . (λb.λf.λt. b t f) = (λb.λf.λt. b t f)

not^8(t) = (λb.λf.λt. b t f) . (λb.λf.λt. b t f) = (λb.λf.λt. b t f)

Ok, that is way too hard to read. Let’s replace lambdas by their names:

# Computing not^8(x) with repeated applicationnot^2(x) = not . not = id

not^3(x) = not . id = not

not^4(x) = not . not = id

not^5(x) = not . id = not

not^6(x) = not . not = id

not^7(x) = not . id = not

not^8(x) = not . not = id# Computing not^8(x) with composition by sharingnot^2(x) = not . not = id

not^4(x) = id . id = id

not^8(x) = id . id = id

Can you see it? The first approach applies `not`

s one by one, needing `N`

applications to reach the final result (linear time), while the second repeatedly self-composes the initial function, jumping from `not^2`

to `not^4`

to `not^8`

with just a single composition for each jump (logarithmic time). In other words, **as long as the self-composition of ****f**** fuses, we can implement repeated application in logarithmic time, using the same trick behind exponentiation by squaring**!

## 3. Absal performs runtime fusion

Sadly, knowing that trick isn’t sufficient to speed-up fusible algorithms: we also need a functional language capable of performing short-cut fusion during runtime. Haskell, for one, isn’t capable of doing that for native functions. To illustrate, let’s implement “composition by sharing” in it:

This function takes a number `n`

, a function `f`

, and returns a new function that applies `f`

repeatedly to `x`

, `n`

times, using the “composition by sharing” method. As an example, (`comp(not,10,x)`

) evaluates to the equivalent of:

Notice how this function applies `not`

10 times to its input, even though it only uses composition 3 times? That’s why it should be efficient. So, what is the complexity of `comp(not, N, x)`

in Haskell? Not logarithmic as we’d expect, but linear! You can test that fact by running this Gist. If you change `2^20`

to something higher such as `2^40`

, it will run out of memory. Why? Because it is too lazy and never evaluates inside lambdas, which would be necessary to fuse those intermediate expressions. And, even if it did, it’d still not work. This separate posts investigates why, examining Haskell and JavaScript’s evaluation model and why they’re not optimal.

Now we’re real close to solving our puzzle. But, first, we need to talk about parallel universes; I mean, parallel reductions. The main insight behind Absal is the use of interaction nets instead of simpler textual representations. Interaction nets are a model of computation based on graphs with labelled ports (one being the “main port”) and a very peculiar caveat: whenever two “main ports” met, the nodes connected by them are rewritten.

That causes a very interesting effect: no matter what order you rewrite your nodes, the total number of steps stays constant. If, thus, you compile λ-terms to interaction nets, then you can reduce expressions in any order: functions first, arguments first, **even in parallel**. In the end, the performed work will be the same!

The trick, of course, is how to properly translate λ-terms to interaction nets. Sadly, there is no solution to that covers all λ-terms efficiently. But, for a huge class of them (probably whatever you’d write in practice), there is a simple and elegant way to do it: that’s the Abstract Algorithm. To explain it shortly, each lambda and each application of a λ-term becomes a ternary node on the net, and each repeated occurrence of a variable becomes a “fan node”. Also, the main port of an application node points to the function port, which is why redexes can never be duplicated.

If you’re overwhelmed, don’t worry. The main insight is that, by being free to reduce terms in whatever order it wants, **Absal is capable of being as eager as possible without any performance penalty, which, in turn, allows it to fuse intermediate compositions. This is exactly what is needed to exploit the “composition by sharing” trick**. If you do want to have a better understanding of the Abstract Algorithm, I recommend reading this book, it is extremely pedagogical. Alternatively, you can just contemplate the image below:

It depicts the evaluation of the`(λg. g (g λx. x)) (λh. (λf. f (f λz. z)) (h λy. y))`

term on the abstract algorithm. Notice how sub-expressions can be shared by several terms, how we can evaluate inside lambdas without ever risking duplicating work (and while still being able to be lazy), and how a single constant-time beta-reduction in this format is equivalent to several beta-reductions on more traditional representations. Beautiful, no?

# But what about the negative complexity stuff?

Let’s recap all that we have learned so far. First, some λ-calculus functions fuse when they are self composed. By “fuse”, I mean the size of their normal forms stay constant. Second, those functions can be applied `N`

times with `O(log(N))`

complexity using the “composition by sharing” strategy, as long as the host language can eagerly normalize sub-expressions inside lambdas. Third, Absal is capable of evaluating inside lambdas, which allows it to do exactly that, fusing inner compositions during runtime. That leads us to one logical conclusion: `copy`

must somehow cause `inc`

to fuse! Wow! Could it be? Let’s test that hypothesis by looking normal forms!

-- Normal form of `inc`:

λa. λb. λc. λd. (((a c) λe. (b λf. λg. λh. (((e g) λi. (f λj.

λk. λl. (((i k) λm. (j m)) l))) h))) d)-- Normal form of `inc . inc`:

λa. λb. λc. λd. (((a λe. (b λf. λg. λh. (((e g) λi. (f λj. λk

. λl. (((i k) λm. (j m)) l))) h))) λe. (c λf. λg. λh. (((e g)

λi. (f λj. λk. λl. (((i k) λm. (j m)) l))) h))) d)-- Normal form of `inc . inc . inc`:

λa. λb. λc. λd. (((a λe. (c λf. λg. λh. (((e g) λi. (f λj. λk

. λl. (((i k) λm. (j m)) l))) h))) λe. (b λf. λg. λh. (((e λi

. (f λj. λk. λl. (((i k) λm. (j m)) l))) λi. (g λj. λk. λl. (

((i k) λm. (j m)) l))) h))) d)--------------------------------------------------------------- Normal form of `copy . inc`:

λa. (((a λb. λc. λd. λe. (d (((b λf. λg. λh. λi. (g (((f λj.

λk. λl. λm. (k j)) λj. λk. λl. λm. (l j)) λj. λk. λl. l))) λf

. λg. λh. λi. (h (((f λj. λk. λl. λm. (k j)) λj. λk. λl. λm.

(l j)) λj. λk. λl. l))) λf. λg. λh. h))) λb. λc. λd. λe. (c (

((b λf. λg. λh. λi. (h (((f λj. λk. λl. λm. (k j)) λj. λk. λl

. λm. (l j)) λj. λk. λl. l))) λf. λg. λh. λi. (g (((f λj. λk.

λl. λm. (l j)) λj. λk. λl. λm. (k j)) λj. λk. λl. l))) λf. λg

. λh. h))) λb. λc. λd. d)-- Normal form of `copy . inc . inc`:

λa. (((a λb. λc. λd. λe. (c (((b λf. λg. λh. λi. (h (((f λj.

λk. λl. λm. (k j)) λj. λk. λl. λm. (l j)) λj. λk. λl. l))) λf

. λg. λh. λi. (g (((f λj. λk. λl. λm. (l j)) λj. λk. λl. λm.

(k j)) λj. λk. λl. l))) λf. λg. λh. h))) λb. λc. λd. λe. (d (

((b λf. λg. λh. λi. (h (((f λj. λk. λl. λm. (k j)) λj. λk. λl

. λm. (l j)) λj. λk. λl. l))) λf. λg. λh. λi. (g (((f λj. λk.

λl. λm. (l j)) λj. λk. λl. λm. (k j)) λj. λk. λl. l))) λf. λg

. λh. h))) λb. λc. λd. d)-- Normal form of `copy . inc . inc . inc`:

λa. (((a λb. λc. λd. λe. (d (((b λf. λg. λh. λi. (h (((f λj.

λk. λl. λm. (k j)) λj. λk. λl. λm. (l j)) λj. λk. λl. l))) λf

. λg. λh. λi. (g (((f λj. λk. λl. λm. (l j)) λj. λk. λl. λm.

(k j)) λj. λk. λl. l))) λf. λg. λh. h))) λb. λc. λd. λe. (c (

((b λf. λg. λh. λi. (g (((f λj. λk. λl. λm. (l j)) λj. λk. λl

. λm. (k j)) λj. λk. λl. l))) λf. λg. λh. λi. (h (((f λj. λk.

λl. λm. (l j)) λj. λk. λl. λm. (k j)) λj. λk. λl. l))) λf. λg

. λh. h))) λb. λc. λd. d)-- Note: using a fixed depth of 3 for better visualization

There it is! That’s exactly what happens. Don’t worry about the contents of those terms, we’re just considering their sizes. Here, the composition of `inc`

with itself is small, but as we keep appending more `inc.`

s, the normal form grows bigger, linearly. In other words, `inc`

does not fuse with itself. If we just append `copy.`

to the sequence of `inc.`

s, then the normal form of the resulting function is larger, but stays constant, regardless of the number of compositions. In other words, its self-composition merely permutes variables and, thus, fuses. That, in turn, allows the “composition by sharing” effect to kick off, which causes the `N`

th composition of `inc`

to have logarithmic complexity. Feels good, `inc.`

!

And that’s it. Mystery solved. Here is the code for replicating this experiment with Absal (see my previous post for install instructions).

A last pending question would be: why `copy`

causes `inc`

to fuse? But that’s not hard to understand. First, let’s visualize the definitions of `copy`

and `inc`

:

Now, let’s fix a depth `d=2`

and visualize what `inc`

becomes when composed with `copy`

:

Turns out `copy`

sucked `inc`

into itself, feeding it a bunch of constructors, which transformed it into a function that exhaustively pattern-matches all branches of its input to the maximum depth, instead of prematurely returning when no more work is needed, as it did before. Notice how line 16 on the first snippet was replaced by lines 8-21 on the second one. In other words, `inc`

essentially became a big permutation / lookup table instead of a normal recursive function. That sounds like a bad idea, but turns out this complete coverage is exactly what was needed for the function to be able to fuse with itself. Without it, the symbolic variables that are prematurely returned in certain branches couldn’t be reduced further, accumulating in self-composed terms, causing them to grow larger and larger. Or, alternatively, the additional pattern matches self-destroy under self-composition, keeping the function’s body small.

## Conclusion

Well, that was quite a trip. Turns out that the Abstract Algorithm had simple explanations behind its magical results. I’m glad I finally figured those things out, and thankful to the Redditors who commented with insights and speculations; in special, /u/matt-noonan, who mentioned the composition by sharing trick which put all pieces together in my head.

## tl;dr

In Absal, an optimal implementation of functional languages, repeated application is performed with church-numbers, which are represented by functions with a high amount of internal sharing. For example, 21 is represented as:

`λf1. λx. `

let f2 = f1 . f1 in

let f4 = f2 . f2 in

let f8 = f4 . f4 in

let f16 = f8 . f8 in

f16 (f4 (f1 x))

That format is very similar to “exponentiation by squaring”, a classic algorithm capable of computing the exponentiation of `A^B`

in `O(log(N))`

instead of `O(N)`

. That alone isn’t sufficient to explain the phenomena; in Haskell, for example, using that function gives no speedups. Absal, though, is extremely eager (despite suffering no slowdowns for it), which makes it capable of computing the normal form of intermediate `f . f`

compositions, essentially performing fusion at runtime. That’s only useful when `f`

fuses when self-composed, otherwise the growth of its normal form would cause an exponential slowdown due to the cost of copying increasingly larger terms.

If `f`

fuses under self-composition, though, then Absal manifests a “composition by sharing” effect, which is similar to “exponentiation by squaring”, except with functions and composition instead of integers and multiplication. This is what allows it to compute `N`

repeated applications of `f`

in `O(log(N))`

time. Finally, turns out `copy . inc`

transforms `inc`

into a function that self-fuses, by replacing symbols that accumulate by lambdas that self-destroy under self-composition, which is why it is asymptotically faster than `inc`

alone.

## tl;dr tl;dr

Magic.