Serialising Functions in Python

Greyboi
6 min read · Apr 2, 2016

Serialising something means turning it from an in-memory structure to a simple string. This process can be reversed by deserialising the string back to the in-memory structure.

Functions are first class objects in Python. Sometimes it’d be very convenient to serialise/deserialise them. Can we do it?

Let’s look at a really simple function, the identity function:

def identity(n):
    return n

That’s a pretty boring function. Let’s call it:

identity(10)
=> 10

Ok. But can you serialise it?

The standard library for serialising things in Python is called pickle. In fact, it’s so standard that serialisation is called “pickling”.
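Before pickling a function, here is a minimal sketch of the round trip on ordinary data. It assumes Python 3, where pickle produces bytes rather than the str shown in the Python 2 sessions in this post; the record dict is just an illustrative value.

```python
import pickle

# An ordinary in-memory structure (illustrative data only).
record = {"name": "identity", "args": [1, 2, 3], "nested": {"ok": True}}

payload = pickle.dumps(record)     # in-memory structure -> bytes
restored = pickle.loads(payload)   # bytes -> a new, equal structure

print(type(payload).__name__)      # bytes
print(restored == record)          # True
print(restored is record)          # False: a copy, not the same object
```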

Let’s pickle & unpickle our identity function:

>>> import pickle
>>> def identity(n):
...     return n
...
>>> identity(4)
4
>>> ser = pickle.dumps(identity)
>>> ser
'c__main__\nidentity\np0\n.'
>>> id2 = pickle.loads(ser)
>>> id2(5)
5

That’s really cool! So I can define a function, pickle it, send it over the wire, unpickle it and run it. Security concerns notwithstanding, that’s awesome.

Have a look at that “serialised” string though…

>>> ser
'c__main__\nidentity\np0\n.'

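That looks like a reference to the module and the function name (ie: __main__ and identity), and nothing else. The leading c is pickle’s “look up a global by name” opcode, p0 just stores the result in pickle’s memo, and the final . ends the pickle. Where’s the function body?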
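One way to check what that string really contains is to decode it opcode by opcode with the standard library's pickletools module. This is a small sketch assuming Python 3, so the payload is written as a bytes literal; nothing is unpickled, the bytes are only parsed.

```python
import pickletools

# The exact pickle from the session above, as Python 3 bytes.
payload = b"c__main__\nidentity\np0\n."

# genops yields (opcode, argument, position) without loading anything.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]
for name, arg in ops:
    print(name, repr(arg))
```

The opcodes are GLOBAL (module and name), PUT (store in the memo) and STOP. There is no code object and no signature anywhere in the payload.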

It’s going to require identity to already exist during deserialisation. I did the above in an interactive python session. Now I’ll stop that session, start a new one, and try again below:

>>> import pickle
>>> id2 = pickle.loads('c__main__\nidentity\np0\n.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'identity'

Oh. Super disappointing.
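The same behaviour can be reproduced without restarting the interpreter. The sketch below assumes Python 3 and uses a throwaway module called demo_mod (a made-up name, standing in for __main__) so that the name can be removed on purpose.

```python
import pickle
import sys
import types

# Hypothetical throwaway module standing in for __main__.
demo = types.ModuleType("demo_mod")
demo.identity = lambda n: n
sys.modules["demo_mod"] = demo

# Same shape as the pickle above: module name, then function name.
payload = b"cdemo_mod\nidentity\np0\n."

found = pickle.loads(payload)
same_object = found is demo.identity
print(same_object)              # True: pickle only looked the name up

del demo.identity               # simulate a fresh session
try:
    pickle.loads(payload)
    failed = False
except AttributeError as exc:
    failed = True
    print("unpickling failed:", exc)
```

So the pickle is a reference by name, and unpickling is only as good as whatever that name points to at load time.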

Let’s try a replacement for pickle, called dill. We’ll serialise identity, and see what we get.

>>> import dill
>>> def identity(n):
...     return n
...
>>> identity(4)
4
>>> ser = dill.dumps(identity)
>>> ser
'\x80\x02cdill.dill\n_create_function\nq\x00(cdill.dill\n_load_type\nq\x01U\x08CodeTypeq\x02\x85q\x03Rq\x04(K\x01K\x01K\x01KCU\x04|\x00\x00Sq\x05N\x85q\x06)U\x01nq\x07\x85q\x08U\x07<stdin>q\tU\x08identityq\nK\x01U\x02\x00\x01q\x0b))tq\x0cRq\rc__builtin__\n__main__\nh\nNN}q\x0etq\x0fRq\x10.'

Whoo, that’s a bit more serious. Maybe it includes everything we need? Let’s try importing it in a context where identity isn’t already defined:

>>> import dill
>>> id = dill.loads('\x80\x02cdill.dill\n_create_function\nq\x00(cdill.dill\n_load_type\nq\x01U\x08CodeTypeq\x02\x85q\x03Rq\x04(K\x01K\x01K\x01KCU\x04|\x00\x00Sq\x05N\x85q\x06)U\x01nq\x07\x85q\x08U\x07<stdin>q\tU\x08identityq\nK\x01U\x02\x00\x01q\x0b))tq\x0cRq\rc__builtin__\n__main__\nh\nNN}q\x0etq\x0fRq\x10.')
>>> id(7)
7

Ok, that’s great. Now dill is a pure python library, so it must be possible to completely serialise this function, including the function body, and restore it again later, using pure python, even though pickle doesn’t know how to do this.

Dill is a pretty small library, and most of the interesting stuff happens in dill.py. The important bit for serialising functions is this function (boring awful bits elided):

def save_function(pickler, obj):
    ...
    if PY3:
        pickler.save_reduce(_create_function, (obj.__code__,
                            globs, obj.__name__,
                            obj.__defaults__, obj.__closure__,
                            obj.__dict__), obj=obj)
    else:
        pickler.save_reduce(_create_function, (obj.func_code,
                            globs, obj.func_name,
                            obj.func_defaults, obj.func_closure,
                            obj.__dict__), obj=obj)

and this one:

def _create_function(fcode, fglobals, fname=None, fdefaults=None, \
                     fclosure=None, fdict=None):
    if fdict is None: fdict = dict()
    func = FunctionType(fcode, fglobals, fname, fdefaults, fclosure)
    func.__dict__.update(fdict) #XXX: better copy? option to copy?
    return func

In save_function, obj is the function being pickled. It breaks the function into its parts and serialises them, along with a reference to _create_function; at load time that function reverses the process and returns a new function which should behave identically to the original one.
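The "callable plus arguments" idea behind save_reduce can be tried with the standard library alone. The sketch below is not dill's code: it assumes Python 3, registers a reducer for code objects with copyreg, and swaps in marshal to turn the code object into bytes (dill instead serialises the code object's fields itself). The name reduce_code is made up for this example.

```python
import copyreg
import marshal
import pickle
import types

def reduce_code(code):
    # Tell pickle: "to rebuild this object, call marshal.loads on these bytes".
    return marshal.loads, (marshal.dumps(code),)

# Register the reducer so pickle knows what to do with code objects.
# Note this changes pickle's behaviour for the whole process.
copyreg.pickle(types.CodeType, reduce_code)

def identity(n):
    return n

payload = pickle.dumps(identity.__code__)    # now succeeds
restored_code = pickle.loads(payload)

# Put a function back together around the restored code object.
identity_copy = types.FunctionType(restored_code, globals(), "identity")
print(identity_copy(7))                      # 7
```

marshal output is specific to the Python version that wrote it, so this is only a way to see the mechanism, not a portable format.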

When putting the function back together, the vital line is this one:

FunctionType(fcode, fglobals, fname, fdefaults, fclosure)

These parameters are the important ones. What are they?

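  • fcode is a code object: the compiled function body, plus the argument names and counts (but not the default values). Comes from obj.__code__
  • fglobals is the dict of global names the function resolves against, normally its module’s namespace. A function’s own globals are available as obj.__globals__
  • fname is the function’s name. Comes from obj.__name__
  • fdefaults is a tuple of the default argument values, or None if there are none. Comes from obj.__defaults__
  • fclosure is a tuple of cells, one for each free variable the function uses from an enclosing function’s scope, or None if there are none. Comes from obj.__closure__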
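To make the list above concrete, here is a small sketch (assuming Python 3) that prints each piece for a function that actually has a default value and a free variable. make_scaler and scale are made-up names for illustration.

```python
def make_scaler(factor):
    def scale(x, offset=0):
        return x * factor + offset
    return scale

double = make_scaler(2)

print(double.__name__)                       # scale
print(double.__code__.co_varnames)           # ('x', 'offset')  argument names
print(double.__code__.co_freevars)           # ('factor',)      free variables
print(double.__defaults__)                   # (0,)
print(double.__closure__[0].cell_contents)   # 2
print(type(double.__globals__).__name__)     # dict
```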

So basically, you can copy a function by pulling these pieces out, and then putting them back into FunctionType. Let’s try it!

>>> funcdetails = [
...     identity.__code__,
...     identity.__globals__,
...     identity.__name__,
...     identity.__defaults__,
...     identity.__closure__
... ]
>>> funcdetails
[<code object identity at 0x7f7eff2d8e30, file "<stdin>", line 1>, {'dill': <module 'dill' from '/usr/local/lib/python2.7/dist-packages/dill/__init__.pyc'>, '__builtins__': <module '__builtin__' (built-in)>, '__package__': None, '__name__': '__main__', 'funcdetails': [...], '__doc__': None, 'identity': <function identity at 0x7f7efcfb50c8>}, 'identity', None, None]
>>> from types import FunctionType
>>> id2 = FunctionType(*funcdetails)
>>> id2(6)
6

So we can see that the elements of the function object that really matter here are the code, the globals, and the name.
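Wrapped up as a helper, the same trick might look like the sketch below. It assumes Python 3; copy_function is a made-up name, not something dill provides, and it copies keyword-only defaults and the function's __dict__ as well.

```python
from types import FunctionType

def copy_function(func):
    """Rebuild func from its parts (a sketch, not dill's implementation)."""
    clone = FunctionType(
        func.__code__,
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )
    clone.__kwdefaults__ = func.__kwdefaults__   # keyword-only defaults
    clone.__dict__.update(func.__dict__)         # attributes set on the function
    return clone

def greet(name, punctuation="!"):
    return "hello " + name + punctuation

greet.calls = 0
greet2 = copy_function(greet)

print(greet2("world"))    # hello world!
print(greet2 is greet)    # False
print(greet2.calls)       # 0
```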

Ok, this is starting to make some sense.

Let’s try serialising some more complex functions. How about a recursive function?

>>> import dill
>>> def factorial(n):
...     return factorial(n-1) * n if n > 0 else 1
...
>>> ser = dill.dumps(factorial)
>>> ser
"cdill.dill\n_create_function\np0\n(cdill.dill\n_load_type\np1\n(S'CodeType'\np2\ntp3\nRp4\n(I1\nI1\nI3\nI67\nS'|\\x00\\x00d\\x01\\x00k\\x04\\x00r\\x1e\\x00t\\x00\\x00|\\x00\\x00d\\x02\\x00\\x18\\x83\\x01\\x00|\\x00\\x00\\x14Sd\\x02\\x00S'\np5\n(NI0\nI1\ntp6\n(S'factorial'\np7\ntp8\n(S'n'\np9\ntp10\nS'<stdin>'\np11\ng7\nI1\nS'\\x00\\x01'\np12\n(t(ttp13\nRp14\nc__main__\n__dict__\ng7\nNN(dp15\ntp16\nRp17\n."
>>> fact2 = dill.loads(ser)
>>> fact2(10)
3628800

Great! How’s the recursion working? Let’s look in globals:

>>> factorial.__globals__
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, 'factorial': <function factorial at 0x7f8f84c8d578>, '__package__': None}

Right, the recursive reference to factorial is happening via globals.
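A quick way to convince yourself of this is to rebuild the function over a different globals dict and watch the "recursive" call go somewhere else. This is a sketch assuming Python 3 and a factorial defined at module level; stub_globals and rewired are made-up names.

```python
from types import FunctionType

def factorial(n):
    return factorial(n - 1) * n if n > 0 else 1

# The only global name the body refers to is its own name.
print(factorial.__code__.co_names)   # ('factorial',)

# Same code object, but "factorial" now resolves to a stub that returns 100.
stub_globals = {"factorial": lambda n: 100}
rewired = FunctionType(factorial.__code__, stub_globals, "factorial")

print(factorial(3))   # 6
print(rewired(3))     # 300, i.e. stub(2) * 3
```

The function body never holds a direct reference to itself; it just asks its globals for whatever is called factorial each time it runs.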

You know what we haven’t tried? A lambda expression. Let’s give it a shot.

>>> import dill
>>> inc = lambda x: x + 1
>>> ser = dill.dumps(inc)
>>> ser
'\x80\x02cdill.dill\n_create_function\nq\x00(cdill.dill\n_load_type\nq\x01U\x08CodeTypeq\x02\x85q\x03Rq\x04(K\x01K\x01K\x02KCU\x08|\x00\x00d\x01\x00\x17Sq\x05NK\x01\x86q\x06)U\x01xq\x07\x85q\x08U\x07<stdin>q\tU\x08<lambda>q\nK\x01U\x00q\x0b))tq\x0cRq\rc__builtin__\n__main__\nh\nNN}q\x0etq\x0fRq\x10.'
>>> inc2 = dill.loads(ser)
>>> inc2(4)
5

Ok, great. How about higher order functions?

>>> def makeadder(a):
...     def adder(b):
...         return a + b
...     return adder
...
>>> add5 = makeadder(5)
>>> add5(9)
14
>>> ser = dill.dumps(makeadder)
>>> ma2 = dill.loads(ser)
>>> add7 = ma2(7)
>>> add7(3)
10

Cool, ok that’s pretty powerful. How about the inner function it returns?

>>> import dill
>>> add8 = makeadder(8)
>>> add8(4)
12
>>> add8.__globals__
{'__builtins__': <module '__builtin__' (built-in)>, 'add8': <function adder at 0x7f13f03875f0>, 'makeadder': <function makeadder at 0x7f13f0387578>, '__name__': '__main__', '__package__': None, '__doc__': None}
>>> add8.__closure__
(<cell at 0x7f13f038f280: int object at 0x15090b0>,)
>>> add8.__closure__[0].cell_contents
8
>>> ser = dill.dumps(add8)
>>> add8b = dill.loads(ser)
>>> add8b(14)
22

Look at that closure. Up to now they’ve all been None (trust me), but this one’s a tuple containing one cell, and that cell holds the value 8. That’s the value of the free variable a in adder.
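Since the closure is just a tuple of cells, a function can be rebuilt with a different cell to get a different captured value. The sketch below assumes Python 3; make_cell is a made-up helper that borrows a cell from a throwaway lambda.

```python
from types import FunctionType

def makeadder(a):
    def adder(b):
        return a + b
    return adder

def make_cell(value):
    # A portable way to obtain a closure cell holding `value`.
    return (lambda: value).__closure__[0]

add8 = makeadder(8)
print(add8.__code__.co_freevars)            # ('a',)
print(add8.__closure__[0].cell_contents)    # 8

# Same code and globals, but the free variable a now lives in a new cell.
add20 = FunctionType(add8.__code__, add8.__globals__, "adder",
                     None, (make_cell(20),))
print(add20(1))                             # 21
```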

Let’s try a higher order recursive function… oh noes!!!

>>> def makerangefact(a):
...     def factorial(n):
...         return factorial(n-1) * n if n > 0 else 1
...     def rangefact(b):
...         return factorial(b) / factorial(a) if b > a and a > 0 else None
...     return rangefact
...
>>> rf = makerangefact(3)
>>> rf(5)
20
>>> ser = dill.dumps(rf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 243, in dumps
    dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 236, in dump
    pik.dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 798, in save_function
    obj.__dict__), obj=obj)
  File "/usr/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 562, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
… snipped out a lot of repeated stuff …
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 1039, in save_cell
    pickler.save_reduce(_create_cell, (obj.cell_contents,), obj=obj)
  File "/usr/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/usr/lib/python2.7/pickle.py", line 284, in save
    f = self.dispatch.get(t)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 359, in get
    return self[key]
RuntimeError: maximum recursion depth exceeded

Why did it do that? Let’s take a peek into rf’s closure:

>>> rf.__closure__
(<cell at 0x7f13ee099600: int object at 0x1509128>, <cell at 0x7f13ee099638: function object at 0x7f13ee0a4320>)
>>> rf.__closure__[0].cell_contents
3
>>> rf.__closure__[1].cell_contents
<function factorial at 0x7f13ee0a4320>
>>> rf.__closure__[1].cell_contents.__closure__
(<cell at 0x7f13ee099638: function object at 0x7f13ee0a4320>,)
>>> rf.__closure__[1].cell_contents.__closure__[0].cell_contents
<function factorial at 0x7f13ee0a4320>

Can you see what’s going on? rf has factorial in its closure, and factorial has factorial in its closure. That is, there’s a cycle. Note the pointer values are the same; this is the same function object.

It turns out that the closures of inner functions form a graph which is a simple tree, except for recursive inner functions (including mutually recursive ones and more complex structures), whose graphs contain cycles. So any naive tree-walking algorithm will recurse forever in this case, as we saw above with dill.
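The cycle can be found mechanically by walking the closure cells while remembering which functions are already on the current path. This is only a detection sketch, assuming Python 3 (hence // for integer division); closure_has_cycle is a made-up helper and it does nothing to make such functions serialisable.

```python
import types

def closure_has_cycle(func):
    """Return True if following closure cells revisits a function on the current path."""
    def visit(f, path):
        if id(f) in path:
            return True                      # back where we started: a cycle
        path = path | {id(f)}
        for cell in f.__closure__ or ():
            try:
                value = cell.cell_contents
            except ValueError:               # cell not filled in yet
                continue
            if isinstance(value, types.FunctionType) and visit(value, path):
                return True
        return False
    return visit(func, frozenset())

def makeadder(a):
    def adder(b):
        return a + b
    return adder

def makerangefact(a):
    def factorial(n):
        return factorial(n - 1) * n if n > 0 else 1
    def rangefact(b):
        return factorial(b) // factorial(a) if b > a and a > 0 else None
    return rangefact

print(closure_has_cycle(makeadder(5)))       # False
print(closure_has_cycle(makerangefact(3)))   # True
```

A serialiser needs the same kind of bookkeeping: a walk that does not track what it has already seen will keep following factorial into itself.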

That seems bad, and it is. It turns out that this whole class of recursive inner functions causes serialisation to fail. These are a perfectly legitimate and very useful class of functions (eg: they turn up when writing functional mapping code).

So far I haven’t found any libraries that can handle this class of functions. They can be accommodated however. First we need to dig deeper into Python’s functions and closures, which I’ll do in the next post.
