Magic Python: mixing magics

Lars van Gemerden
The Startup
5 min read · Feb 12, 2020


This article gives a practical example of how to use some of Python's ‘magic’ features to create your own magic. We will use namedtuple, attrgetter and a descriptor to turn instances of slow classes into much faster objects, through a minimal, intuitive API.

This article is mainly intended to spark your creativity, but you can use the code as is. There are also some challenges near the end to test yourself.

Update: a somewhat more advanced version of the tool described in this article can be found on https://github.com/gemerden/snapshot.

Use Case

Sometimes the classes we create are relatively slow and heavy. For example, ORM model classes in SQLAlchemy or Django need to stay in sync with the database, so attribute access can take a while. If you use these classes in algorithms that access these attributes often, this can become a performance bottleneck.

In practice, I encountered this problem while writing a platform for planning package delivery vehicles. The objects (stops) were stored in a database, and the algorithm frequently needed to access the location, stop times and time restrictions of the deliveries.

Solution

So what would be a solution to this problem? The ideal solution would be fast on attribute access, require little extra code, be robust and easy to use, and be reusable in similar cases.

The choice of an alternative data structure was not hard: a namedtuple from the collections module in the standard library is fast and can have the same attributes as the original classes, so the algorithms can keep accessing these attributes without changes.

It would be easy to create a method on the classes that generates these tuples, but to make it as DRY as possible I wanted to generalize the tuple generation.

A mixin class would probably have worked, but a descriptor seemed like a more decoupled solution. From a user-developer point of view, the solution would ideally look like this (and it does ;-):

```python
import time
from tuplifier import tuplifier


class SlowObject(object):
    fast_tuple = tuplifier('some', 'thing', 'slow')  # <- this is all

    def __init__(self, some, thing):
        self.some = some
        self.thing = thing

    @property
    def slow(self):  # emulate slow attribute
        time.sleep(0.001)
        return self.some * self.thing
```

You create the namedtuple by accessing slow_object.fast_tuple. The only change to the original class is the addition of tuplifier, and its arguments determine which attributes are included in the namedtuple.

Before going into the implementation, here is a basic example:

```python
from tools import print_time

S = SlowObject(3, 4)
print("sample fast tuple:", S.fast_tuple)

slow_objects = [SlowObject(i, j) for i in range(20) for j in range(50)]

with print_time('summing slow objects'):
    slow_sum = sum(s.slow for s in slow_objects)

with print_time('creating fast tuples'):
    fast_tuples = [s.fast_tuple for s in slow_objects]

with print_time(' summing fast tuples'):
    fast_sum = sum(f.slow for f in fast_tuples)

assert slow_sum == fast_sum
```

would print:

```
sample fast tuple: fast_tuple(some=3, thing=4, slow=12)
summing slow objects took 1.001421 seconds
creating fast tuples took 1.002835 seconds
 summing fast tuples took 0.000072 seconds
```

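As you can see, in this (somewhat artificial) example the algorithm (sum()) became more than 10,000 times faster. Creating the tuples still reads every slow attribute once, and that took about as long as the original sum. The approach therefore pays off when an algorithm reads the same attributes many times.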

print_time() comes from the author's tools module (not shown). It is a simple context manager that times a code snippet.
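The tools module is not part of the article, so the code below is only a guess at what such a helper could look like. It is a minimal sketch that assumes a context manager built with contextlib.contextmanager and time.perf_counter(). The name and message format copy the printed lines above, but the body is hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def print_time(label):
    """Hypothetical helper: print how long the with-block took."""
    start = time.perf_counter()  # high-resolution clock for short intervals
    try:
        yield
    finally:
        # Report the elapsed time even if the block raised an exception.
        elapsed = time.perf_counter() - start
        print(f"{label} took {elapsed:.6f} seconds")


with print_time("summing squares"):
    total = sum(i * i for i in range(100_000))
```

Using try/finally means the timing is printed even when the timed code fails, which is handy when profiling code that may raise.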

Implementation

Essentially, the solution creates snapshots of instances of the original class. Writing it required surprisingly little code, thanks to how well the Python standard library fits together in practice:

```python
from operator import attrgetter
from collections import namedtuple


class tuplifier(object):

    def __init__(self, *names):
        self.names = names                # names of the attributes
        self.getter = attrgetter(*names)  # create a getter
        self.tuple_type = None

    def __set_name__(self, cls, name):    # create a namedtuple type
        self.tuple_type = namedtuple(name, self.names)

    def __get__(self, obj, cls=None):
        if obj is None:  # if called on class, return descriptor
            return self
        values = self.getter(obj)
        if len(self.names) == 1:  # attrgetter returns a bare value for one name
            values = (values,)
        return self.tuple_type(*values)
```

The tuplifier descriptor above works in the following steps:

  1. When an instance of the descriptor is created, the constructor takes the names of the attributes to gather and uses attrgetter to create a getter function that efficiently retrieves them from any object that has these attributes,
  2. When a class containing a tuplifier descriptor (e.g. SlowObject) is created, Python (3.6 and later) automatically calls __set_name__, which creates the namedtuple subclass and assigns it to tuple_type,
  3. When the attribute with the descriptor's name is accessed on a SlowObject instance, Python calls the __get__ method, which fetches the attribute values from the object and constructs the namedtuple from them.
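The sketch below shows the three steps in action. It repeats the tuplifier definition so that it runs on its own. That copy includes a guard for the single-name case, because attrgetter returns a bare value rather than a tuple when given one name. The Stop class and its attributes are hypothetical. The example also shows that every access builds a fresh snapshot, so changes to the original object do not affect tuples created earlier.

```python
from collections import namedtuple
from operator import attrgetter


class tuplifier(object):
    def __init__(self, *names):
        self.names = names                # attribute names to copy
        self.getter = attrgetter(*names)  # step 1: build the getter once
        self.tuple_type = None            # filled in by __set_name__

    def __set_name__(self, cls, name):
        # Step 2: runs once when the owning class is created (Python 3.6+).
        self.tuple_type = namedtuple(name, self.names)

    def __get__(self, obj, cls=None):
        if obj is None:  # accessed on the class: return the descriptor itself
            return self
        # Step 3: read the attributes and build a new namedtuple.
        values = self.getter(obj)
        if len(self.names) == 1:  # attrgetter returns a bare value for one name
            values = (values,)
        return self.tuple_type(*values)


class Stop(object):
    # Hypothetical stand-in for a slow, database-backed class.
    location = tuplifier('x', 'y')
    only_x = tuplifier('x')

    def __init__(self, x, y):
        self.x = x
        self.y = y


print(Stop.location.tuple_type._fields)  # ('x', 'y')

stop = Stop(1, 2)
snap = stop.location
print(snap)           # location(x=1, y=2)

stop.x = 10           # change the original object after taking the snapshot
print(snap.x)         # 1: the earlier snapshot is unchanged
print(stop.location)  # location(x=10, y=2): a new access sees the change
print(stop.only_x)    # only_x(x=10)
```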

Note that many functions in the Python standard library, like attrgetter, are implemented in C (in CPython). This makes them relatively fast compared to equivalent code you write yourself. Also note that creating these tuples increases the memory footprint of the program. However, (named-) tuples are small compared to normal class instances, because they do not carry a per-instance __dict__.
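The next sketch gives a rough illustration of the memory point. It compares a plain class instance with an equivalent namedtuple using sys.getsizeof. Keep in mind that getsizeof is shallow and that its numbers vary across Python versions and platforms, so the printed sizes are only indicative. Plain and FastTuple are hypothetical names.

```python
import sys
from collections import namedtuple


class Plain(object):
    # Ordinary class: attributes live in a separate per-instance __dict__.
    def __init__(self, some, thing, slow):
        self.some = some
        self.thing = thing
        self.slow = slow


FastTuple = namedtuple('FastTuple', ['some', 'thing', 'slow'])

plain = Plain(3, 4, 12)
fast = FastTuple(3, 4, 12)

# getsizeof does not follow references, so add the instance dict explicitly.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
fast_size = sys.getsizeof(fast)

print(f"instance + __dict__: {plain_size} bytes, namedtuple: {fast_size} bytes")
print("namedtuple has __dict__:", hasattr(fast, '__dict__'))
```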

At the end of the article, some pointers and links are included for the use of attrgetter, namedtuple and descriptors.

Benefits

In the end we have a small tool that gives us:

  • A way to transform instances of many classes into fast namedtuples, which can be used as read-only, drop-in replacements for instances of the original class (as long as only the included attributes are accessed),
  • A decoupled way to add this transformation functionality to a class using a descriptor; no other changes to the target class are required,
  • The chances of serious bugs are relatively low, due to the simplicity of the solution and the use of the well-tested standard library,
  • By turning complex class instances into (named-) tuples, it becomes easier to reason about what happens with the instances in complex scenarios,
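  • It can simplify the use of external libraries: serializers like json or csv handle (named-) tuples and dicts with fewer issues. Pickle needs more care. It looks up the generated namedtuple type by name in its module, where that type cannot be found, so convert the snapshots to plain tuples or dicts before pickling them,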
  • Using (named-) tuples can have other benefits, like immutability, handy methods like ._asdict() (for JSON serialization) and the ability to unpack, as in:
    for some, thing, slow in fast_tuples:
        do_something_with(some, thing, slow)
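The following sketch illustrates immutability, ._asdict() and unpacking. It builds the namedtuple type directly as a stand-in for what a tuplifier named fast_tuple would generate, and the sample values are made up. It also uses ._replace(), a standard namedtuple method that returns a modified copy.

```python
import json
from collections import namedtuple

# Stand-in for the type a tuplifier named fast_tuple would generate.
FastTuple = namedtuple('fast_tuple', ['some', 'thing', 'slow'])
fast_tuples = [FastTuple(3, 4, 12), FastTuple(1, 2, 2)]

# Immutability: fields cannot be reassigned...
try:
    fast_tuples[0].some = 99
except AttributeError as exc:
    print("immutable:", exc)

# ...but ._replace() returns a modified copy and leaves the original alone.
changed = fast_tuples[0]._replace(some=99)

# ._asdict() gives a dict that json can serialize.
as_json = json.dumps([f._asdict() for f in fast_tuples])
print(as_json)

# Unpacking works like with any tuple.
products = [some * thing for some, thing, slow in fast_tuples]
```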

You can use multiple tuplifier attributes in the same class to get different namedtuple versions of the class.
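In the delivery use case, a class could expose separate views for position and timing this way. The sketch below repeats the tuplifier definition (with a single-name guard) so it runs standalone. The Stop class, its attribute names and its values are hypothetical. Each descriptor's attribute name becomes the name of its namedtuple type, because __set_name__ passes it to namedtuple().

```python
from collections import namedtuple
from operator import attrgetter


class tuplifier(object):
    def __init__(self, *names):
        self.names = names
        self.getter = attrgetter(*names)
        self.tuple_type = None

    def __set_name__(self, cls, name):
        self.tuple_type = namedtuple(name, self.names)

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        values = self.getter(obj)
        if len(self.names) == 1:  # attrgetter returns a bare value for one name
            values = (values,)
        return self.tuple_type(*values)


class Stop(object):
    # Hypothetical delivery stop with two different snapshot views.
    position = tuplifier('lat', 'lon')
    timing = tuplifier('arrival', 'duration')

    def __init__(self, lat, lon, arrival, duration):
        self.lat = lat
        self.lon = lon
        self.arrival = arrival
        self.duration = duration


stop = Stop(52.37, 4.90, 540, 15)
print(stop.position)  # position(lat=52.37, lon=4.9)
print(stop.timing)    # timing(arrival=540, duration=15)
```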

Challenges

There are a couple of potential enhancements to tuplifier. Instead of just adding the code here (and probably making the story confusing), I thought it would be more interesting to present them as a challenge:

  1. Make tuplifier able to access nested values in the original objects. Hint: look at the Python docs for attrgetter(),
  2. Add a method to the tuplifier to create .csv files. Calling this method would look something like SlowObject.fast_tuple.to_csv(slow_objects, filename),
  3. Add a method to the tuplifier to apply the tuplifier to a list of slow objects, like: fast_tuples = SlowObject.fast_tuple(slow_objects). Hint: use a dunder (double underscore) method.

These challenges should not be more than a few extra lines of code each.

Pointers

Apart from the use of a descriptor, there are two standard library items that might need some explanation (with links to better explanations):

  • attrgetter() creates a function that returns the named attribute(s) of its argument: a single value for one name, or a tuple for several names. So if g = attrgetter('some', 'thing'), then g(SlowObject(3, 4)) returns (3, 4). More info can be found in the Python docs for the operator module.
  • namedtuple creates a new tuple subclass whose fields can also be accessed as attributes. So if N = namedtuple('N', ['a', 'b']) and n = N(3, 4), then n.b is 4. More info is in the Python docs for the collections module.
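The two bullet examples above can be run directly. The SlowObject below is a plain stand-in with no slow property. The sketch also shows the single-name case, where attrgetter returns a bare value instead of a 1-tuple.

```python
from collections import namedtuple
from operator import attrgetter


class SlowObject(object):
    # Plain stand-in for the article's SlowObject (no slow property needed).
    def __init__(self, some, thing):
        self.some = some
        self.thing = thing


g = attrgetter('some', 'thing')
print(g(SlowObject(3, 4)))       # (3, 4)

single = attrgetter('some')
print(single(SlowObject(3, 4)))  # 3: one name gives a bare value, not a tuple

N = namedtuple('N', ['a', 'b'])
n = N(3, 4)
print(n.b, n[1], n)              # 4 4 N(a=3, b=4)
print(n._fields)                 # ('a', 'b')
```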

Descriptors customize attribute access (get, set and delete) on objects. Of these, tuplifier implements only ‘get’ (__get__()). It also implements __set_name__(), which Python calls when the owning class is created. A good explanation of descriptors can be found here.
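Here is a minimal descriptor with the same shape as tuplifier: __set_name__ plus __get__, and no __set__. The Reporter and Demo names are hypothetical. The example shows that class access returns the descriptor itself. It also shows a consequence of defining only __get__ (a "non-data" descriptor): assigning an instance attribute with the same name shadows the descriptor.

```python
class Reporter(object):
    """Minimal non-data descriptor (defines only __get__) for illustration."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+).
        self.name = name
        self.owner_name = owner.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class itself
            return self
        return f"{self.name} read from a {type(obj).__name__} instance"


class Demo(object):
    info = Reporter()


print(Demo.info is Demo.__dict__['info'])  # True
print(Demo().info)                          # info read from a Demo instance

# With only __get__ defined, assignment stores a value in the instance
# __dict__, and that value then takes precedence over the descriptor.
d = Demo()
d.info = 'shadowed'
print(d.info)                               # shadowed
```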

This is my first article on Medium. Feedback is more than welcome!
