λz. (λy. y (λx. x)) (λx. z x) written with de Bruijn indices

Lambda Binders are Global State

Part 1: Anaphora and Complexity

11 min read · May 27, 2018


Technically, I’m brain damaged. Reducing cognitive load is what originally brought me to functional programming. I find it’s a good conversation starter at meetups and the like. For almost exactly one decade now I’ve had chronic migraine disorder — essentially one continuous migraine that never went away. I’ll leave discussing that for another time. Besides, I’m not actually that concerned about effects on my intelligence these days. I do, however, know that my working memory is decidedly worse than it was when I was twenty and isn’t about to improve over the decades to come.

While programming in languages modeled on lambda calculus, my experience has been that cognitive overhead is dominated by tracking the relationships between local variables and their binding sites — often through multiple levels of lexical scope. I would like to put this hypothesis to a rigorous test: my original idea for this post was to quantify it through something similar to a CPS transform.

I’m deeming this “part one” in a series in the hope that I’ll get to that next time (I need to work with Clojure ASTs anyway for a numerical DSL I’ve been developing for a little over a year now), but for now I’ll settle for offering what I hope is a precise description of the phenomenon.

As a point of contrast, there are alternative syntactic conventions for working with lambda calculus — other than combinators, in other words still among those that preserve variables. I first thought of this idea after a talk by Gershom Bazerman at Papers We Love, when he mentioned offhand that lambda binders cannot be expressed in first-order term rewriting systems: because eta reduction is not bidirectional, i.e. η-reduction and η-expansion are not always equivalent, lambda binders are incompatible with first-order term rewriting. One method to bypass this is de Bruijn indices, where natural numbers are assigned to bound variables.
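
To make this concrete, here is the conversion of the term in the header image, under the standard convention where index 1 refers to the innermost enclosing binder, 2 to the next one out, and so on:

λz. (λy. y (λx. x)) (λx. z x)   ⇒   λ (λ 1 (λ 1)) (λ 2 1)

The z becomes 2 because its use sits one binder (the λx) below the λz that binds it; every variable name, and the bookkeeping of tracking it, disappears.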

A version of de Bruijn indices is implemented as syntactic sugar in Clojure as “anonymous function literals.” They look like this:

;; these two functions are semantically equivalent:
=> (fn [a b] (+ a b))
=> #(+ %1 %2)

I’m told there’s something similar in Scala as well. However, Clojure does not allow anonymous function literals to be nested. Collision of bound variables results in anaphora confusion, the abuse of which was infamously championed by Paul Graham as an example of the power of Common Lisp’s decidedly unhygienic macro system. Resolving this necessitates some form of scoping rules, so, in classic Clojure style when it comes to arbitrary design decisions, nested function literals are verboten.
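
For anyone who hasn’t run into it, the restriction is enforced at read time:

;; nesting function literals is a read-time error in stock Clojure:
=> #(map #(inc %) %)
IllegalStateException Nested #()s are not allowed...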

Still, I was surprised to learn from Nicola Mometto that I could just delete two lines in the compiler that enforced this error. I forked it here and the default behavior was the most obvious one: variables in inner scopes overrode those in outer ones. After all, function literals just desugar to gensymed lambdas with the binding sites hidden. They’re really more a style of writing lambda terms: similar to Henry Baker’s use of linear logic, but only at the level of lexical scope.
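
You can watch the desugaring happen at the reader (the gensymed parameter names will differ from run to run):

=> (read-string "#(+ %1 %2)")
(fn* [p1__1# p2__2#] (+ p1__1# p2__2#))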

Here, I explore four different styles:

  • Plain lambda calculus
  • Literal syntax with lambdas lifted in let bindings
  • Linear-style, i.e. where all function literals are univariate and each variable shadows the one above it at s-expression boundaries
  • de Bruijn indices with natural numbers representing nested scope

I also wrote an Emacs mode for highlighting local variables and their corresponding bindings, which I’ll paste screenshots from. I stayed up all night and clumsily hacked it together with regex, in what I told myself was an exercise in state-machine masochism as I began to view lambdas as suspect. The very next evening I attended a talk by Richard Eisenberg at the New York Haskell Users Group, only to find he had done almost exactly the same thing: except for the purpose of teaching simply-typed lambda calculus in an undergraduate programming language theory course.

The macros took a bit of work. I ended up with two versions, similar to Clojure’s “thread-first” and “thread-last” macros, which facilitate pipelining of collection transformations by replicating point-free style through a simple syntactic transformation, without either currying or partial application.
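
For reference, the threading macros are purely syntactic, inserting each intermediate result as the first or last argument of the next form:

;; thread-first: (* (- 5 2) 3)
=> (-> 5 (- 2) (* 3))
9
;; thread-last: (* 3 (- 2 5))
=> (->> 5 (- 2) (* 3))
-9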

The first version was designed for mapping over nested data structures using Nathan Marz’s Specter library, essentially Clojurian lenses, and is therefore a bit more complicated:

(defmacro fn->>
  "Like `->>` but converts bound variables from
  de Bruijn indices to univariate gensymed fns."
  [& x]
  ;; SEQ-ZIP, UP, NODE, NAME, transform and transformed are Specter
  ;; navigators (com.rpl.specter and its zipper namespace); TREE and
  ;; find-first are custom navigators.
  (loop [x (macroexpand-1 (cons `->> x))
         i 9]
    (if (zero? i)
      x
      (let [pattern   (str "%" i)
            fresh-var (gensym)]
        ;; for each index %9..%1: find the form containing it, rename
        ;; the matching symbols to a fresh gensym, then wrap that form
        ;; in a univariate fn binding the gensym
        (recur (transform [SEQ-ZIP
                           (find-first #(= % (symbol pattern)))
                           UP
                           NODE
                           (transformed [TREE
                                         symbol?
                                         NAME
                                         (re-pattern pattern)]
                                        (fn [_] (str fresh-var)))]
                          #(list `fn [fresh-var] %)
                          x)
               (dec i))))))
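
The intended usage looks roughly like this (a sketch; the expansion shown is schematic, since the real gensym names vary):

;; each form containing an index is wrapped in a gensymed unary fn,
;; so this expands to (map (fn [G__1] (* G__1 G__1)) (range 10)):
=> (fn->> (range 10)
          (map (* %1 %1)))
(0 1 4 9 16 25 36 49 64 81)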

Another version stacks bindings of gensyms only at the top of the AST, but allows for non-linearity inside one level of scope. This was necessary for my second example in order to maintain syntactic equivalence in operations on Church numerals by converting them to strings. I try my best to avoid converting symbols to strings: it’s the dirtiest form of metaprogramming. Before I finally threw my hands up in the air, this brought me face to face with the massive question of ordering in argument lists, which we all somehow ignore as a mass delusion. Multivariate functions are difficult both to engineer and to formalize. They’re necessary, but they don’t compose well. And currying doesn’t actually solve that problem.

Additionally, I’ll discuss the problem of composing multivariate lambda terms in a more theoretical context towards the end.

Here’s an example of the string equivalence problem in Clojure:

;; with both functions curried:
=> ((fn [f] ((f (fn [n] (format "f(%s)" n))) "n"))
    (fn [f] (fn [x] (f (f x)))))
"f(f(n))"
;; with the input uncurried using literal syntax:
=> ((fn [f] ((f (fn [n] (format "f(%s)" n))) "n"))
    #(%1 (%1 %2)))
ArityException Wrong number of args (1) passed to...

I would love to hear from anyone who can solve this, but for the purposes of this post I ended up writing this macro:

(defmacro fn->
  "Converts bound variables from de Bruijn
  indices to curried univariate gensymed fns."
  [x]
  ;; collect, selected?, setval, NAME and transform are Specter
  ;; navigators; TREE is a custom navigator.
  (loop [x x
         i 9]
    (if (zero? i)
      x
      (let [pattern   (re-pattern (str "%" i))
            fresh-var (gensym)]
        ;; collect every symbol whose name matches %i; if any exist,
        ;; rename them all to a fresh gensym and wrap the whole term
        ;; in a univariate fn binding that gensym
        (recur (transform
                (collect TREE symbol? (selected? NAME pattern))
                #(if (not-empty %1)
                   (list `fn [fresh-var]
                         (setval [TREE symbol? NAME pattern]
                                 (str fresh-var)
                                 %2))
                   %2)
                x)
               (dec i))))))
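
With this macro, the uncurried literal from the failing example above can be rewritten with indices, and the expansion curries it back into shape (a sketch of the intended usage; gensym names are schematic):

;; (fn-> (%1 (%1 %2))) expands to (fn [g1] (fn [g2] (g1 (g1 g2)))):
=> ((fn [f] ((f (fn [n] (format "f(%s)" n))) "n"))
    (fn-> (%1 (%1 %2))))
"f(f(n))"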

From here on, I’ll refer to both macros as indexed-literals.

When looking for examples I found many that were three levels deep, but never more than that. Interestingly, though, none of these used shadowing (an outer bound variable appearing inside an inner lambda), so they would all have been unaffected by deleting those two lines from the compiler.

For some practical code: I work a lot in computer algebra, so I chose a function that implements Faà di Bruno’s formula, a generalization of the chain rule to higher-order derivatives, using a combinatoric method from Michael Hardy’s Combinatorics of Partial Derivatives that allows for multiple variables in the inner function by “collapsing” partitions:
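
For reference, in its set-partition form (the univariate case; Hardy’s paper generalizes the inner function to several variables) the formula reads:

\frac{d^n}{dx^n} f(g(x)) = \sum_{\pi \in \Pi_n} f^{(|\pi|)}(g(x)) \prod_{B \in \pi} g^{(|B|)}(x)

where \Pi_n is the set of partitions of \{1, \ldots, n\} and |B| is the size of block B.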

In Clojure, with the aid of our Emacs mode:

Note that this is a naive implementation and there is plenty of low-hanging fruit for optimization, by pruning computation of redundant derivatives and scaling them instead, as described in the paper. I’m considering introducing this more generally: as a lazy version of the radix trie I use as an IR form.

Here’s what Faà di Bruno’s formula looks like with lambda lifting:

Using nested literals:

And with the indexed-literal macro:

It seems clear that the last two examples are much more legible, although the advantages of the indexed version are somewhat obscured by the presence of a binary lambda term in the innermost scope. For compatibility with the indexed-literal macro, I rewrote this as a univariate lambda that destructures its argument at the binding site. The next example, although unlikely to appear in practical code, should be clearer.

I also tried to think of an unusually complicated example in terms of binding sites. Stephen Cole Kleene’s infamous predecessor function for Church numerals immediately came to mind:
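
In its textbook form:

pred = λn. λf. λx. n (λg. λh. h (g f)) (λu. x) (λu. u)

Note how f, bound at the second lambda, is referenced three scopes below its binding site, inside λg. λh.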

If this is the first time you’re seeing this, yes, this is actually how one decrements a natural number in pure lambda calculus. In fact, it originally eluded Church and his students precisely because it depends on local variables crossing boundaries of scope. Before Kleene discovered it in a famous flash of insight while anesthetized with nitrous oxide to have his wisdom teeth removed, it had not yet been proven that the untyped lambda calculus could represent all μ-recursive functions: an equivalence that now underpins the Church-Turing thesis.

Using nested function literals to represent Church numerals proved much more difficult than I had imagined. This is because Church numerals in their traditional form are dependent on anaphora of bound variables! Sigh… I’ve always been more of a Gödel numbering kind of girl.
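
For concreteness, a minimal sketch of Church numerals in plain Clojure; the body of the innermost lambda must reach into the enclosing scope for f:

;; a Church numeral n applies f to x n times; note the
;; anaphoric reference to f inside the inner fn:
(def zero (fn [f] (fn [x] x)))
(def two  (fn [f] (fn [x] (f (f x)))))
=> ((two inc) 0)
2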

The predecessor function with nested literals:

And finally, with the indexed-literal macro:

Lambda calculus as coding theory!

Hopefully the syntax highlighting has clarified my point by now.

While I began by discussing the subjective effects of programming style, there also seems to be a formal correspondence with the inference rules of substructural logics. With the example of the predecessor function, we saw that anaphora between local variables and their binding sites is essential to the lambda calculus as a model of universal computation, i.e. Turing-completeness.

We then saw how the variable shadowing that naturally occurs from nesting anonymous function literals is similar to linear logic, where all bindings must be invoked exactly once. However, restricting bindings to only the scope at which they’re defined, with no restrictions on their invocation, more precisely corresponds to linear logic with the addition of weakening, the introduction of additional bindings while preserving semantic equivalence: what Jean-Yves Girard termed affine logic.

Finally, the examples that dispensed with lambda binders entirely in favor of de Bruijn indices correspond to contraction, the semantics-preserving unification of multiple variables. Since the macro expansion desugars indices into bound variables corresponding to the ordering of unary lambdas, bindings may still be invoked inside nested scopes.

That de Bruijn indices actually work depends on the exchange rule, which has numerous practical implications for programming language design: particularly in the area of normalization, which has traditionally been of great importance for logic programming. I contend the exchange rule also applies to an issue that is comparatively prosaic: the problem of formalizing the order and quantity of arguments in multivariate functions.

Many users of statically typed functional languages, which tend to implement currying by default, likely see this problem as solved. However, I would argue that currying only formalizes argument ordering for the specific purpose of typechecking. Attempting to implement Church numerals using either nested literals (affine logic) or indexed literals (de Bruijn indices), while maintaining syntactic equivalence under arithmetic operations, convinced me of this even more. Although outside the scope of this post, and still very much a work in progress, the full code can be found here.

For example, given two binary functions f(a b) and g(c d), how would one compose them into one binary function while preserving α-equivalence? Any technique that composed them in this manner would be making assumptions about the order (and, if used more generally, quantity) of bindings in order to reduce the body of the resulting function.
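
A sketch of the ambiguity in Clojure, assuming hypothetical binary functions f and g: both of the following are plausible “compositions,” yet they are not α-equivalent, and nothing in the lambda terms themselves privileges one over the other:

;; which argument feeds g, and what fills f's other slot?
(fn [a b] (f (g a b) b))
(fn [a b] (f a (g a b)))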

This example is actually the same definition posited by domain theory, where lambdas are represented as continuous functions over the powerset of integers. This makes them clones, functions closed under multivariate composition, and implies a weakening of eta reduction as many unary functions may correspond to the same lambda term. Interestingly, this also applies to the example we used from differential calculus: Faà di Bruno’s formula only allows for multiple variables in the inner function as no corresponding formula exists for calculating weak derivatives of the composition of generalized functions that are both multivariate.

Hasse diagram of Emil Post’s lattice of binary clones

Very clearly, this problem of composition is not solved by currying. On the contrary, currying has the odd irony of being unnecessary under the traditional lambda calculus notation of nested unary bindings with anaphoric invocation, and impossible without introducing additional structure under the nested and/or indexed syntaxes that eliminate anaphora through implicit η-reduction. This presents an interesting problem: perhaps one I may explore in future installments of this series.

Putting that aside, the indexed implementation of the predecessor function is a key example of the more immediate practical directions in which this work could go. It’s been known for some decades now that linearity constraints allow memory to be allocated in a manner that can eliminate the need for garbage collection in performance-intensive applications.

Going further, using methods of deterministic contraction and weakening to represent nested lambda terms as sequences of natural numbers allows for arbitrary programs, even quite sizable ones, to be encoded as integers at the level of machine words. Others have already demonstrated bit-level implementations of both lambda calculus and combinatory logic.

Combined with techniques from integer arithmetic and non-standard numeric types, this could immediately be applied to the sorts of arithmetic programs that stand as performance bottlenecks in a variety of applications: from modeling systems of partial differential equations, to deriving loop invariants from backward-propagated constraints, to constructing non-interactive proofs of knowledge.

In part two, I hope to introduce a source-transformation macro using Clojure’s tools.analyzer to automate mappings between local variables and their binding sites, then use this both to further explore the design-space of these structures and to scientifically test my hypothesis regarding cognitive load: like microbenchmarks for synaptic activity in the prefrontal cortex.

Thanks to Gershom Bazerman, Nicola Mometto, Richard Eisenberg, and Cliff Click for the discussion and inspiration.
