The Y Combinator (no, not that one)

A crash-course on lambda calculus

What do you think of when you hear “Y Combinator”? If you’re like most of us, you’d probably think of the venture capital company based in Mountain View, California.

Well, here is the original Y combinator:

λf. (λx. f (x x))(λx. f (x x))

Please, please don’t run away. There’s a reason why Silicon Valley’s Y Combinator is named after this. It’s actually kind of awesome. (Kudos to them for picking a cool name.)

To understand what λf. (λx. f(x x))(λx. f(x x)) means and what it can do, we’ll first need to learn the basics of lambda calculus.

If you’re interested in functional programming or are curious about it, I think you might enjoy it. Don’t worry if you don’t fully understand it after your first read-through; the article won’t disappear!

Lambda Calculus

Lambda calculus (or λ-calculus) was invented by Alonzo Church in 1930 as a formal system for expressing computation. Although it has the word “calculus” in it, it is far from related to the calculus that Newton and Leibniz invented. In fact, it is a lot closer to programming than mathematics as most of us know it.

Valid λ-calculus expressions can be defined inductively as follows:

  1. A variable x is a valid λ-term.
  2. If t is a valid λ-term and x is a variable, then λx. t is a valid λ-term.
  3. If t and s are both valid λ-terms, then t s is a valid λ-term.

Maybe that made sense, maybe it didn’t. If you’re anything like me, you might like learning through examples. So let’s build up a few λ-terms from scratch. We’ll talk about what these λ-terms mean after we have a few of them to work with.

Based on #1 (“A variable x is a valid λ-term.”), we can build these λ-terms:

x
y

Okay, that wasn’t that bad. Let’s see what we can do with #2 (“If t is a valid λ-term and x is a variable, then λx. t is a valid λ-term.”).

Having x and y as valid λ-terms in our pocket, we can now create:

λx. x
λy. y
λx. y
λy. x

And with #3 (“If t and s are both valid λ-terms, then t s is a valid λ-term.”), we can create:

x x
x y

x. x) y
y. y) x
(
λx. x) (λx. x)
(
λy. y) (λx. x)

As an exercise, try writing down some of your own λ-terms.

Let’s take a look at some of the λ-terms that we’ve created.

λx. x is known as the identity function. The x that is bolded in λx. x can be interpreted as the input, and the x that is bolded in λx. x can be interpreted as the output. That is, it takes in some input x, and outputs the same x.

If you’ve played around with languages that support lambdas, like Ruby, you might have seen it written like:

i = ->(x) { x }

(If you squint a little, you might notice that -> kind of looks like λ.)

What about λx. y? That is known as the constant function, since it ignores the input x and returns y no matter what.

c = ->(x) { y }

You might ask, “Isn’t λy. y also the identity function?” Indeed it is! λx. x and λy. y are α-equivalent (alpha-equivalent). In fact, all of the following are α-equivalent:

λx. x
λy. y
λz. z
λ☃.

This leads us to the discussion of bound vs. free variables.

Bound vs. free variables

A bound variable is a variable that occurs in the body of a λ with an argument of the same name. A free variable is a variable that is not a bound variable.

For example, x is a bound variable in λx. x, but a free variable in (λy. y) x.

What’s special about bound variables is that you can rename them, as long as you do so consistently. If two λ-terms are the same up to renaming of bound variables, they are α-equivalent.

Let’s take a look at an example in code.

m = ->(horrible_variable_name) { horrible_variable_name * y }

In the above example, it’s safe to rename horrible_variable_name (a bound variable) to another name like x.

m = ->(x) { x * y }

But you can’t just go and rename y, because it’s defined outside of the scope. Maybe y is defined as the number 2, and m is a function that multiplies its input by 2. If we went and renamed it to z (let’s say it’s defined to be 0), m would turn into a function that always returns 0.

Function application

Okay, awesome. We have functions. But how do you do stuff with them? That’s where rule #3 comes into play.

If t and s are both valid λ-terms, then t s is a valid λ-term.

x. x) y is an example of a function application. More concretely, it represents the act of calling the function λx. x with y as an input.

You can reduce a function application using β-reduction (beta-reduction). The rules of β-reduction say that a term (λx. t) s can be reduced to t [x := s], which reads “t where all bound occurrences of x in t are substituted for s.”

For example, you can β-reduce (λx. x) s to just s by replacing all bound occurrences of x in the body (which just happens to be x in this case) with s, which gets us s. Line by line, it looks like this:

x. x) s
x [x 
:= s]
s

And there, we see that λx. x does exactly what we would expect the identity function to do.

It’s also worth mentioning that you can define functions that take in more than one argument based on the inductive definition of λ-terms.

For example, take a look at the following term:

λy.(λx. x) y

… and feed it two inputs, a and b:

y.(λx. x) y) a b
((
λx. x) y) [y := a]) b
(
λx. x) a b
(x [x 
:= a]) b
a b

It’s important to note here that function application is left-associative. That is,

y.(λx. x) y) a b = (((λy.(λx. x) y) a) b)

Now take a look at the following λ-term:

x. x x)(λx. x x)

What happens when you apply β-reduction to it?

(λx. x x)(λx. x x)
(x x) [x := (λx. x x)]
(λx. x x
)(λx. x x)

Wait. Did we just end up where we started? You have just discovered Ω (omega), a divergent combinator. A λ-expression is divergent if it has no β-normal form. A λ-expression exhibits β-normal form if no β-reduction can be applied to it. A combinator is a λ-expression that contains no free variables.

Isn’t infinite recursion such a curiosity?

Speaking of recursion, how can we define something like the factorial function in λ-calculus? In simple Ruby, it might look something like:

def fact(n)
if n == 0
1
else
n * fact(n-1)
end
end

What’s special about this function? It refers to itself. You might say, “Doh, that’s just recursion.” Well, you might be shocked to hear that λ-calculus does not allow this kind of self-reference, at least not directly.


Self-reference is such a simple yet powerful concept that allows curiosities like the statement below come into existence:

This statement is false.

Is that statement true? If it is, it’s false. Then is that statement false? If it is, it’s true.

How about:

The statement below is true.
The statement above is false.

If this intrigues you at all, I recommend reading Douglas Hofstadter’s book, Gödel, Escher, Bach. I’m only about halfway through it so far, but it’s like wandering through a labyrinth of Escher’s drawings themselves. Trust me, it’s worth getting lost—but I digress.

Print Gallery (M. C. Escher)

So, how do we achieve self-reference without self-reference? I think we might be ready to revisit our mysterious friend, the Y combinator.

The Y combinator, at last

Here it is again:

λf. (λx. f (x x))(λx. f (x x))

Hopefully it looks a little more familiar to you now.

Now back to the factorial function. If functions were allowed to reference themselves, we could have something of the form:

f := λx.(if x == 0 then 1 else x * f (x–1))

But we already know that we’re not allowed to do this. Instead, let’s take a step back and try something different. Let’s define a well-behaved λ-expression F that takes in f as an argument:

:= λf. λx.(if x == 0 then 1 else x * f (x–1))

But what can we do with F on its own? We need something to kick-start F by providing a value for the the parameter f. Furthermore, we want p such that Fp is equivalent to p. This is both because p is the value that we are passing into the λ-expression as the value for f (i.e. the pseudo-recursive function), and because we want Fp to do the same thing that p would have done on its own had it been allowed to refer to itself, instead of being passed in as a parameter.

That part just now was the trickiest part for me to grasp, and it also happens to be the part where we very sneakily introduce self-reference without direct self-reference, so make sure you understand why we want Fp to be equivalent to p before we move on. Feel free to give the paragraph at least another read.

In other words, we want to find a fixed-point of F. A fixed-point of a function in mathematics is an input that is unchanged by that function. For example, for f(x) = x * x, there are two fixed-points: 0 and 1.

If we can find a fixed-point p of F such that Fp is equivalent to p, we can use Fp or p (they are the same thing) as the “recursive” function without direct self-reference.

It turns out that for any λ-expression f, (λx. f (x x))(λx. f (x x)) is a fixed-point of f.

Let’s see that in action:

X = (λx. f (x x))(λx. f (x x))
X = f (x x) [x := λx. f (x x)]
X = f ((λx. f (x x)) (λx. f (x x)))
X = f X

As you can see, for any arbitrary function f, the input X remains unchanged.

Given this, we can build a function that returns a fixed-point for any function f by taking the function in as an argument:

λf. (λx. f (x x))(λx. f (x x))

And that right there is the Y combinator. It’s a function that will return a fixed-point for any input function f. Let’s call it Y. For any function f, Yf is a fixed-point of f. That is, f(Yf) is equivalent to Yf. In fact, another name for the Y combinator is the fixed-point combinator for this reason.

We just found that we can set p = YF to satisfy our requirement that Fp is equivalent to p. To kick-start the function, we can now provide p as an input to F to get F(YF), which is equivalent to just YF, and we’re ready to roll.

Let’s see what happens if you plug in our λ-expression for factorial as F and feed it the input 3 to compute the value of 3 factorial.

YF 3
F(YF)
3
λf. λx.(if x == 0 then 1 else x * f (x–1)) (YF) 3
λx.(if x == 0 then 1 else x * (YF)(x–1)) 3
if 3 == 0 then 1 else 3 * (YF)(3–1)
3 * (YF) 2
3 * F(YF) 2
3 * (λf. λx.(if x == 0 then 1 else x * f (x–1)) (YF) 2)
3 * (λx.(if x == 0 then 1 else x * (YF)(x–1)) 2)
3 * (if 2 == 0 then 1 else 2 * (YF)(2–1))
3 * (2 * (YF) 1)
6 * (YF) 1
6 * F(YF) 1
6 * (λf. λx.(if x == 0 then 1 else x * f (x–1)) (YF) 1)
6 * (λx.(if x == 0 then 1 else x * (YF)(x–1)) 1)
6 * (if 1 == 0 then 1 else 1 * (YF)(1–1))
6 * (YF) 0
6 * F(YF) 0
6 * (λf. λx.(if x == 0 then 1 else x * f (x–1)) (YF) 0)
6 * (if 0 == 0 then 1 else 0* (YF)(0–1))
6 * 1
6

… and there you have it.

3! = 6

That’s all I have to say.


Waterfall (M. C. Escher)

If you’ve made it this far, I hope you’ve learned something. I’m an iOS engineer by trade, but I think it’s fun to think about these things for a change of pace and perspective. If you have any questions or comments, feel free to leave them here or find me @ayanonagon on Twitter.

Thanks for reading!

P. S. If you like it, recommend it. ❤


Notes

  • Special thanks to Brent Yorgey for teaching me λ-calculus in CIS 399, Fall 2012. Easily one of my favorite classes I’ve ever taken.
  • I took some shortcuts like using if-else statements for the sake of readability, but these can be defined using λ-calculus as shown here if you want to be more pedantic.
  • You might need to read this article more than once to fully grasp some of the concepts. It might help to read through the Wikipedia article too, although it’s a little dense.
  • If you enjoyed this, you should look into Haskell (the official Haskell website even has a section dedicated to λ-calculus) or some other functional language (Scheme, OCaml) or functional-flavored language (Ruby, Python, Swift).