Optimizing Higher Order Functions with Hypothetical Inverses

4/29/2016 edit: Feel free to play with the code I’ve written to perform these transformations (in a very strange language called Maude): https://github.com/pschanely/wf-optimizer

Much of my independent-thinking life has been consumed by questions like this: How do we make compilers smart enough to know that “sum(reverse(x))” can be rewritten to “sum(x)”? Or that “reverse(reverse(x))” can be rewritten to “x”? Without special knowledge of what “sort” and “reverse” mean, these are challenging problems. I don’t know of any popular compilers that make these optimizations on their own. (please tell me about it if you know of one!)

Just recently, I noticed a pattern to perform these kinds of optimizations. I presume that some more general form of this has already been discovered and researched, but I couldn’t find such a discussion easily, so I’m going to quickly jot down my approach here.

Functional languages make use of a lot of higher order functions like map and reduce. In order to optimize these programs, you often want to move code between these functions. In general, doing so requires some sort of inductive proof, which is extremely difficult to do automatically. However, in more specialized cases, you can do something a little easier. I’ll discuss three such special-case transformations below.

#1 Map-Map Transformation

This is the trivial one and, unlike the two following cases, is well-known and always possible. We simply compose the two functions in a single map:

    map(f1, (map(f2, l))
rewrites to
map(λ(x): f1(f2(x)), l)

#2 Map-Reduce Transformation

When a function, f, is mapped over a list and then passed to reduce, we may be able to push f over the reduce. We do this by first supposing the existence of an inverse of f, f⁻¹. The inverse function need not be known, and in most cases, will not even exist. That’s okay, though, because the rewriting process will only succeed if the instances of the inverse function are canceled out by instances of f. The transformation pushes f outside the reduce and, inside the reduce function, we apply f to the two inputs and f⁻¹ to the output:

    reduce(r, map(f, l))
rewrites to
f(reduce(λ x,y: f⁻¹(r(f(x), f(y))), l))
if, after simplification, the call to f⁻¹ is eliminated

I find this a little easier to understand as a graph of operations:

In order to cancel the instance of f⁻¹, we will have to push f through r, possibly changing r along the way. This mixing of the code in f and r can produce some exciting optimization opportunities.

An Example

Consider this case:

reduce(max, map(incr, l))

Here, I presume “incr” to be a function that adds one to its argument, and “max” to be a function that returns the largest of its two arguments. In this example, the call to map() is wasteful; we don’t need to increment every element in the list. Instead, we could have just taken the maximum and incremented that value afterwards. Let’s try to make that optimization. Applying the Map-Reduce transformation above, we get:

incr(reduce(λ x,y: incr⁻¹(max(incr(x), incr(y))), l))

Which simplifies, though a presumed native rewrite rule, max(incr(x), incr(y)) = incr(max(x,y)):

incr(reduce(λ x,y: incr⁻¹(incr(max(x, y))), l))

And, via inverse elimination, to:

incr(reduce(max, l))

Much better.

#3 Reduce-Anything Transformation

Going in the opposite direction, we can take an arbitrary function, f, that is applied to the output of a reduce, and attempt to move it inside. We adopt a similar strategy as before, supposing an inverse of f, and attempting to cancel it in the body of the reduce function.

    f(reduce(r, l))
rewrites to
reduce(λ x,y: f( r( f⁻¹(x), f⁻¹(y) )), map(f, l))
if, after simplification, both calls to f⁻¹ are eliminated

And, visually:




The call to reverse here is unnecessary; we really just want to add all the numbers together. Let’s expand the definition of reverse (admittedly using a definition that is useful in this context, employing map/reduce):

sum(reduce(λ x,y: append(y,x), map(λ x:[x], l)))

We can apply the Reduce-Anything transformation to try and move the call to sum further down:

reduce(λ x,y: sum( append( sum⁻¹(y), sum⁻¹(x) )), 
map(sum, map(λ x:[x], l)))

With the double map now surrounding l, we apply the Map-Map transformation:

reduce(λ x,y: sum( append( sum⁻¹(y), sum⁻¹(x) )), 
map(λ x:sum([x]), l)))

The sum of one thing rewrites to itself:

reduce(λ x,y: sum( append( sum⁻¹(y), sum⁻¹(x) )), 
map(λ x:x, l)))

And mapping the identity function is rewritten to nothing:

reduce(λ x,y: sum(append( sum⁻¹(y), sum⁻¹(x) )), l))

Moving into the body of the reduce function, we use a (primitive) rewrite rule, sum(append(x, y)) = add(sum(x), sum(y)):

reduce(λ x,y: add(sum(sum⁻¹(y)), sum(sum⁻¹(x))), l)

Inverse elimination:

reduce(λ x,y: add(y, x), l)

And, add() is commutative:

reduce(add, l)

Future Work

I believe that this concept of “hypothetical inverses” could be applied to other higher order functions, including more iterative constructs like Haskell’s until().

Not every application of these transformations will yield gains. To implement this in a real compiler, we would need a facility to evaluate the relative efficiency of the transformation.

These transformations might only be useful in obvious optimization scenarios (cases that no programmer would intentionally create). More research needs to be done here. At worst, they may only be useful in whole-system optimization scenarios; imagine a unix shell that performed this in linear time:

$ cat foo | sort | head -1


(1) The derivation is significantly more complex, but using some of these transformations, it’s possible to perform some surprising optimizations, like a functional version of the unix shell example above:

sort(l)[0]  =>  min(l)

Exciting! See the code below.

(2) I’ve been implementing my optimizer experiments in a strange language called “Maude.” It’s rough around the edges, and I only understand about 30% of Maude, but it’s really nice as a rewrite engine. See the code here: https://github.com/pschanely/wf-optimizer

(3) In order to make all this work out easily, I am using a narrow definition of reduce, intended for associative operations with identity. It is undefined on lists of size zero, returns a value of type T, and takes a list of type T and a function over values of the same type, (T, T)→T. I think a similar pattern could potentially be applied to non-associative reductions (folds), but it requires a little more work.

(4) Hey, and follow me on medium/twitter to get more updates on my adventures in optimizing compilers. https://twitter.com/pschanely