Digging Deeper into Recursive Inner Functions in Python

Greyboi
6 min readApr 3, 2016

Ok, so in the previous article, I showed that you can serialise Python functions using pure Python, except for a specific class of functions; recursive inner functions.

Here’s an even simpler example than the one from that article.

>>> import dill
>>> def dosomething():
... def factorial(n):
... return factorial(n-1) * n if n > 0 else 1
...
... return factorial
...
>>> dill.dumps(dosomething())

And the failure we get is the same:

RuntimeError: maximum recursion depth exceeded

To really get into the problem here, we’re going to need to see what’s going on. To do that, I’m going to use the following diagnostic function, for seeing into a function and its closure graph.

import typesdef dumpf(f):
def getprinti(indent):
def printi(arg):
print "%s%s" % ("#" * indent, arg)
return printi
def dodumpitem(item, indent, foundf):
printi = getprinti(indent)

if isinstance(item, types.FunctionType):
printi(item)
dodumpclosure(item, indent+1, list(foundf) + [item])
else:
printi("%s" % item) # not a function
def dodumpclosure(f, indent, foundf):
printi = getprinti(indent)
if f.func_closure:
for item in [lcell.cell_contents for lcell in f.func_closure]:
if isinstance(item, types.FunctionType):
if f == item:
printi("%s - Function in its own closure, stop traverse" % item)
elif item in foundf:
printi("%s - Function in ancestor closure, stop traverse" % item)
else:
dodumpitem(item, indent, foundf)
else:
dodumpitem(item, indent, foundf)
dodumpitem(f, 0, [])

Let’s give it a shot.

>>> def identity(n):
... return n
...
>>> dumpf(identity)
<function identity at 0x7ff8c5f08938>

Ok, that function’s got no closure. Let’s try one that does:

>>> def makeadder(a, b):
... def adder():
... return a + b
... return adder
...
>>> dumpf(makeadder(5, 3))
<function adder at 0x7ff8c5f08e60>
#5
#3

Now that’s more interesting. We’re dumping the inner function adder, with a closure with the values 5 and 3; these correspond to the values of the parameters a and b in makeadder.

How about a function with another function in its closure?

>>> def logged(f):
... def callf(*args, **kwargs):
... print (args)
... print (kwargs)
... return f(*args, **kwargs)
... return callf
...
>>> loggedid = logged(identity)
>>> x = loggedid(6)
(6,)
{}
>>> x
6
>>> dumpf(loggedid)
<function callf at 0x7ff8c5f08f50>
#<function identity at 0x7ff8c5f08938>

The function logged takes a function, and wraps it in a function that logs its arguments, but otherwise leaves it unchanged. Experienced Pythonistas will recognise this as a standard decorator pattern (which I could have applied to identity with @logged at declaration time).

Look at the dump, you can see that loggedid is actually the function callf, with identity as the value of the one free variable in callf (ie: f).

Ok, now let’s try our problematic function, dosomething. Remember that dosomething returns the factorial function, which is recursive:

>>> dumpf(dosomething())
<function factorial at 0x7ff8c5f14488>
#<function factorial at 0x7ff8c5f14488> — Function in its own closure, stop traverse

We see here that the result of dosomething is indeed factorial, and that in factorial’s closure is just one item, factorial itself, with the same pointer value; ie: we’re looking at a bonefide closure loop.

>>> def outer():
... def fact1(n):
... return fact2(n-1) * n if n > 0 else 1
... def fact2(n):
... return fact1(n-1) * n if n > 0 else 1
... return fact1
...
>>> f = outer()
>>> dumpf(f)
<function fact1 at 0x7ff8c5f141b8>
#<function fact2 at 0x7ff8c3c28668>
##<function fact1 at 0x7ff8c5f141b8> — Function in ancestor closure, stop traverse

This is a fun one. This time, factorial is split into two functions which are mutually recursive; each one calls the other. And in the dump, you can see that f is fact1, which has fact2 in its closure, which in turn has fact1 in its closure. That’s another, more indirect loop.

So all that’s interesting, but can you serialise it?

Reframing the problem

So, we’ve seen that functions can have a closure, which is a sequence of things (actually a tuple), that this closure can contain more functions, and in the case of recursion, functions can appear in their own closure graph, or in the closure of functions in their closure and so on transitively, meaning we can get cycles.

The library dill’s straightforward approach to serialisation treats closures as a tree. As long as the closure is a tree, it works. But as soon as there are cycles, dill’s approach breaks.

You could straightforwardly modify dill’s save function to detect cycles and stop the recursive walk, in exactly the way the the dumpf function above stops when it detects recursion. And you could add placeholders at this point, reversing this during deserialisation.

But, it turns out that you can’t deserialise something like that, because the FunctionType method requires a tuple of closure items, ie: an immutable structure. If you had placeholders in the serialised closure structure, you’d need to replace them during deserialisation like this:

1 — Construct closure c

2 — Construct function f using closure c

3 — Replace placeholder(s) in c with f

However, you can’t do step 3, because both c and f are immutable.

You could try some sort of strategy with putting f in globals, and having the placeholder function lookup f in globals at runtime, and, well, look, if you can make that work, good on you. I have tried. It hurt. I failed.

But there’s another approach. What if we could somehow eliminate the recursion?

Combinators

A combinator is a function with no free variables.

def f1(a):
return a
def f2():
return b
def f3(a):
return f3(a-1) + 1 if a > 0 else 0

f1 is a combinator, because it only references things in its argument list; ie: a is bound to an argument. f2 is not a combinator, because it references something external, ie: b is a free variable.

f3, a recursive function, is not a combinator because its self reference, f3, is not in its own argument list; f3 is actually considered a free variable in its own body.

In a Python function, names can be bound to any of three places:

  • Globals
  • Arguments
  • Closure

The globals aren’t something I’m looking at here, and you really don’t have to worry about them too much; you can’t really serialise them, you just have to either avoid touching them, or ensure that you are deserialising in the context of the same globals as where you are serialising (which is normally the case).

The arguments are bound inside the __code__ object, and aren’t involved in serialisation of functions because we don’t actually supply values until call time (ie: after deserialisation).

The closure is interesting though. The closure contains bound values for all the names in your function (besides globals) which would otherwise be considered free variables. Because of this, these free variables don’t behave like free variables at all; they can be considered bound.

The point of lexical closure is really to turn any function into a combinator, by supplying an implicit set of arguments whose values are carried along with the first class function object.

This is excellent for serialisation, but the one problem we have is that we can’t serialise/deserialise if there are cycles, ie: we can’t cope with recursion in the closure.

It turns out that in the theory of combinators, recursive functions are not considered combinators because the recursion doesn’t refer to something in the argument list (as explained in f3 above).

Why care about this? Because there is a great body of theoretical work involved in trying to make recursion work in languages / formalisms that only work with combinators, and that theory was entirely successful.

If we consider functions with closures that are proper trees (no cycles) to be combinators, then we should be able to apply that same theory to transform our recursive inner functions into non-recursive equivalents, and thus serialise this class of functions.

I’ll show how to do this in the next very weird post.

--

--

Greyboi

I make things out of bits. Great and terrible things, tiny bits.