ADVANCED PYTHON PROGRAMMING

Metaphysics

This time, we’re going to talk about classes as objects, and their genesis: metaclasses—and see it’s not all that complicated.

Dan Gittik
15 min read · Apr 23, 2020


So far, we’ve been talking about objects: what they are, how they behave, methods and attributes, descriptors, context management and creation. All objects are defined in classes, so we’ve actually been talking about them, too, all along—but classes are interesting in and of themselves; and, at least in my experience, vastly misunderstood.

The Curious Case of the Class in Python

Other languages, like C++, don’t actually have a notion of a “class” at runtime—in the source code, a class is mainly a set of directives for the compiler as to how it should lay out and wire objects of that type. This way, all the attributes and methods are linked to the same values and code; all the checks and constraints are enforced; and there’s no need for a class object in the final executable. Even Java, which supports reflection, doesn’t really treat classes like objects—it simply stores their metadata and provides an interface to manipulate it.

Python, on the other hand, makes no such discrimination: classes can be passed as arguments, returned as return values, have attributes, and more— much like functions:

>>> class A:
...     pass
>>> A
<class 'A'>
>>> A.__name__
'A'
>>> def instantiate(cls):
...     return cls()
>>> a = instantiate(A)
>>> a
<A object at 0x...>
>>> def create_class(x):
...     class A:
...         def f(self):
...             return x
...     return A
>>> A = create_class(1)
>>> a = A()
>>> a.f()
1

Decorating Classes

An interesting implication of this is that, just like functions, classes can be decorated—you simply stick a decorator on top, and the moment the class is created, it’s piped through that decorator, with whatever’s returned bound to its name instead. While functions are usually replaced with a wrapper, classes are usually modified in-place—say, by iterating over their methods and replacing them. For example:

>>> def double(f):
...     def wrapper(*args, **kwargs):
...         return 2 * f(*args, **kwargs)
...     return wrapper
>>> def double_all(cls):
...     for key, value in cls.__dict__.items():
...         if key.startswith('_') or not callable(value):
...             continue
...         setattr(cls, key, double(value))
...     return cls
>>> @double_all
... class A:
...     def inc(self, x):
...         return x + 1
...     def add(self, x, y):
...         return x + y
>>> a = A()
>>> a.inc(1)
4 # (1 + 1) * 2
>>> a.add(1, 2)
6 # (1 + 2) * 2

It’s pretty simple: we go over the class’s __dict__, skip any private or magic methods and non-callable attributes, and replace the rest with a doubled version of themselves. A few notes:

  1. Unfortunately, class __dict__s don’t support assignment: a class’s __dict__ is actually a read-only mappingproxy, so A.__dict__['x'] = 1 raises a TypeError. Use setattr instead (see the snippet after these notes).
  2. Some people don’t like the __dict__ notation, because it looks a bit low-level, so you might see them using the built-in vars function instead (for both classes and objects). vars(o) simply returns o.__dict__, pretty much like type(o) simply returns o.__class__.
  3. Don’t forget to return cls from your decorator after you’re done—otherwise it’ll return None, which will bind to A, and you’ll get a weird 'NoneType' object is not callable exception when you instantiate it.
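To make notes 1 and 2 concrete, here’s a quick interpreter session (the exact error message may vary between Python versions):

>>> class A:
...     pass
>>> A.__dict__['x'] = 1
TypeError: 'mappingproxy' object does not support item assignment
>>> setattr(A, 'x', 1)
>>> A.x
1
>>> vars(A) is A.__dict__
True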

But let’s see an example of something that’s actually useful—like making a class thread-safe. If you were up for it, you could go over the class carefully and apply multiple locks in clever ways to achieve optimal performance; but if you just want to get it done as soon as possible, so that your app stops crashing, this decorator can save the day:

import threading

def threadsafe(cls):
    cls._lock = threading.Lock()
    for key, value in cls.__dict__.items():
        if key.startswith('_') or not callable(value):
            continue
        setattr(cls, key, synchronized(value, cls._lock))
    return cls

def synchronized(f, lock):
    def wrapper(*args, **kwargs):
        with lock:
            return f(*args, **kwargs)
    return wrapper

What this does is iterate over a class’s methods and replace them with proxy methods that synchronize on one coarse lock; none of these methods can run in parallel with another, thus rendering the class thread-safe. And all you have to do is apply it like this:

@threadsafe
class File:
    def __init__(self, path):
        self.path = path
    def read(self):
        with open(self.path) as fp:
            return fp.read()
    def write(self, data):
        with open(self.path, 'w') as fp:
            fp.write(data)

We definitely don’t want one thread reading a file while another thread is writing it; so you might as well make file access synchronized. Do note that this means all file accesses are synchronized, because the lock is defined on the class level—even though threads working on separate files could go about their business concurrently. We can remedy this with a slightly more sophisticated approach:

import threading

def threadsafe(cls):
    for key, value in cls.__dict__.items():
        if key.startswith('_') or not callable(value):
            continue
        setattr(cls, key, synchronized(value))
    init = getattr(cls, '__init__', None)
    if init:
        def __init__(self, *args, **kwargs):
            init(self, *args, **kwargs)
            self._lock = threading.Lock()
    else:
        def __init__(self, *args, **kwargs):
            super(cls, self).__init__(*args, **kwargs)
            self._lock = threading.Lock()
    cls.__init__ = __init__
    return cls

def synchronized(f):
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return f(self, *args, **kwargs)
    return wrapper

What this code does is replace all methods with versions of themselves that synchronize on a per-instance _lock; the lock is added to every instance by replacing the class’s __init__ with one that calls the original (or, if there isn’t one, the superclass’s) and then sets that _lock. Again, a couple of notes:

  1. Python 3 lets you call super without arguments: it automagically knows what class and instance you mean—but it does so by inspecting the context of the class in which it’s defined. Since we’re defining it outside this context, we have to provide them explicitly.
  2. We access the methods through the class, so we get them unbound; that means that when we call init in our alternative __init__, or f in our wrapper, it’s not enough to pass *args and **kwargs—we also need to pass the self we borrowed for our own use.

Similarly, you might remember that when we just started talking about objects, I mentioned functools.total_ordering, which hopefully makes more sense now. As an exercise, you can implement it yourself—a class decorator that extrapolates __eq__ and one more comparison operator into a total order, with all other comparison operators derived from that.
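For reference, here’s a minimal sketch of such a decorator; unlike the real functools.total_ordering, it assumes the class defines both __eq__ and __lt__, and it doesn’t bother with NotImplemented:

def total_ordering(cls):
    # Derive the remaining comparisons from __eq__ and __lt__
    cls.__ne__ = lambda self, other: not (self == other)
    cls.__le__ = lambda self, other: self < other or self == other
    cls.__gt__ = lambda self, other: not (self < other or self == other)
    cls.__ge__ = lambda self, other: not (self < other)
    return cls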

The Meta-Birds and Meta-Bees

Hopefully by now, I’ve convinced you that treating classes like objects has its merits—but wait, it gets better. We discussed how objects are created, and how at the end of the day, it’s object that does the heavy lifting. But how are classes created? And who’s pulling the strings?

Let’s start at the beginning: you might not know it, but you’ve actually been creating classes all this time. Just so:

>>> class A:
...     pass
# Class created!

Admittedly, that doesn’t look exactly like assignment—but neither does a def statement, and it creates a function nonetheless. And while objects can have constructors with any signature, depending on the definition of their __init__s and __new__s—classes have a fixed format (again, much like functions): a name, a list of superclasses, and a dictionary of attributes and methods. Here it is:

>>> A.__name__
'A'
>>> A.__bases__
(object,)
>>> A.__dict__
{}

And with a slightly more interesting example:

>>> class B(A):
...     x = 1
...     def f(self):
...         return 2
>>> B.__name__
'B'
>>> B.__bases__
(A,)
>>> B.__dict__
{'x': 1, 'f': <function B.f at 0x...>}

Frankly, it’s nothing more than syntactic sugar: the name is inferred from the identifier after the class keyword; the superclasses, from the identifiers between the parentheses; and the attributes dictionary is simply the local scope of the class’s body, after it’s been executed. Don’t believe me? See for yourself:

>>> class A:
...     for i in range(3):
...         print('Hello, world!')
Hello, world!
Hello, world!
Hello, world!
>>> A.i
2

Unlike C++ and Java, a class is not a bunch of instructions for the compiler—it’s a living organism, and its body executes just as any other code, even if it means printing stuff. Incidentally, after the loop is over, i remains in the scope with the value of its last iteration, 2—so Python adds it as a class attribute.

But who does the actual allocation? Well, similarly to how object has the final word in object creation, so does type for classes. Yeah—it’s the same function we used to figure out an object’s type:

>>> type(1)
<class 'int'>

But when you call it with three arguments, it actually doubles as a class factory. Just pass in a name, a tuple of superclasses, and an attributes dictionary—and it’ll bake you a class:

>>> A = type('A', (), {})
>>> A
<class 'A'>
>>> B = type('B', (A,), {'x': 1, 'f': lambda self: 2})
>>> B
<class 'B'>
>>> b = B()
>>> b.x
1
>>> b.f()
2
>>> isinstance(b, A)
True

Exhilarating, isn’t it?

Transcendence

type is actually more than just a class-producing function; it’s the classes’ class, also known as a metaclass. That’s why you get:

>>> class A:
...     pass
>>> A.__class__
<class 'type'>
>>> type(A)
<class 'type'>

And similarly to how a class defines its instances’ behavior—so does a metaclass define its classes’ behavior. It’s all very meta, but we’ll do it step by step: when you want to customize an object’s initialization, you simply do:

>>> class A:
...     def __init__(self):
...         print('Hello, world!')
>>> a = A()
Hello, world!

Similarly, when we want to customize a class’s initialization, we add an __init__ method to its metaclass. While it’s true that all classes are instances of type, which we can’t change—it’s also true that all objects are instances of object, even if in Python 3 you don’t have to write the inheritance explicitly. So to define a metaclass, we simply need to inject ourselves into the type family by inheriting from it, similarly to how we injected ourselves into the object family before:

>>> class M(type):
...     def __init__(cls, name, bases, attrs):
...         print('Hello, world!')

To communicate to a class that we want it to be created by a metaclass other than type, we add the metaclass keyword to its declaration:

>>> class A(metaclass=M):
...     pass
Hello, world!

The moment you press enter in your interpreter, a class object is created—its name and superclasses are collected, its body is executed, and everything is passed to its metaclass; which, in our case, is M. One word on conventions: just as we use self to denote the instance in regular methods (and cls to denote the class in class methods), we use cls to denote the class in metaclass methods (and mcs to denote the metaclass, in the metaclass equivalent of class methods).
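In other words, the class statement above is just syntactic sugar for calling the metaclass directly:

>>> A = M('A', (), {})
Hello, world!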

This relation applies to other behaviors, too; let’s say we don’t like this representation:

>>> class A:
...     pass
>>> A
<class 'A'>

We can override it in the class’s metaclass:

>>> class M(type):
...     def __repr__(cls):
...         return f'{cls.__name__}!'
>>> class A(metaclass=M):
...     pass
>>> A
A!

Or, we can override some other behaviors. How about…

>>> class M(type):
...     def __getitem__(cls, key):
...         return 1
>>> class A(metaclass=M):
...     pass
>>> A['x']
1

It’s freaky—but if you’ve ever worked with type annotations, you’ve probably seen syntax such as List[int]; now you know how it’s done!

And that’s it: I have no idea why people treat metaclasses as a super advanced topic. Just as objects are molded by a class that defines their behavior—so are classes, which are objects themselves, molded by a metaclass. It’s just a step in an already familiar evolution: yet another class, defining yet another behavior—only it inherits from type and has a few fixed signatures and some syntactic sugar; nada más.

The Bitter Truth about Metaclasses

As cool as they are—metaclasses aren’t very practical. I mean, they’re absolutely essential for Python’s implementation, but anything else you can do with a metaclass—you can usually do without it. Let’s go over a few use-cases, and see for ourselves.

Modification

Some metaclasses modify their classes—for example, systematically replace their attributes with typed properties that assert their types. This lets you have some nice syntax, like so:

>>> class A(metaclass=TypeSafe):
...     x: int = 1
...     y: int = 2

These are called class annotations, and they’re pretty similar to function annotations:

>>> A.__annotations__
{'x': <class 'int'>, 'y': <class 'int'>}
>>> A.__dict__
{'x': 1, 'y': 2}

So the metaclass would be able to iterate over these, and replace them with typed properties, like so:

class TypeSafe(type):
    def __init__(cls, name, bases, attrs):
        for key, type in cls.__annotations__.items():
            default = getattr(cls, key, None)
            setattr(cls, key, TypedProperty(key, type, default))

class TypedProperty:
    def __init__(self, key, type, default):
        self.key = key
        self.type = type
        self.default = default
    def __get__(self, instance, cls):
        if instance is None:
            return self
        # If the value is not defined, fall back to the default
        if self.key not in instance.__dict__:
            instance.__dict__[self.key] = self.default
        return instance.__dict__[self.key]
    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise AttributeError(f'{self.key} must be '
                                 f'{self.type.__name__}')
        instance.__dict__[self.key] = value
    # If the value is deleted, reset it to the default
    def __delete__(self, instance):
        instance.__dict__[self.key] = self.default

class A(metaclass=TypeSafe):
    ...

As you can see, most of the code is actually the TypedProperty; all the metaclass does is modify the class on creation—and frankly, this can be done with a class decorator more easily:

def type_safe(cls):
    for key, type in cls.__annotations__.items():
        default = getattr(cls, key, None)
        setattr(cls, key, TypedProperty(key, type, default))
    return cls

class TypedProperty:
    ... # Same as before

@type_safe
class A:
    ...
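Either way, the behavior is the same; with the annotated class A from before (x: int = 1):

>>> a = A()
>>> a.x
1
>>> a.x = 2
>>> a.x = 'two'
AttributeError: x must be int
>>> del a.x
>>> a.x
1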

Registration

Another use-case for metaclasses is registration—for example, in Django, you define classes that inherit from Model, whose metaclass collects them and creates a table in the database for each. We can do this with decorators…

>>> models = []
>>> def model(cls):
...     models.append(cls)
...     return cls

But it’s not ideal, because it doesn’t support inheritance:

>>> @model
... class A:
...     pass
>>> class B(A):
...     pass
>>> models
[<class 'A'>]

You’d expect B to be a model as well, but since a decorator only ever works on whatever’s right underneath it, it doesn’t get re-invoked on subclasses. Metaclasses, on the other hand, do propagate: once a metaclass creates a class, all its subclasses are also “instances” of it, so it gets called for each:

>>> models = []
>>> class ModelMetaclass(type):
...     def __init__(cls, name, bases, attrs):
...         models.append(cls)
>>> class Model(metaclass=ModelMetaclass):
...     pass
>>> class A(Model):
...     pass
>>> class B(A):
...     pass
>>> models
[<class 'Model'>, <class 'A'>, <class 'B'>]

This also registered Model itself, which we didn’t intend, but it’s easy to filter out. However, there’s a better solution still: classes can define the special __init_subclass__ method, which gets invoked whenever they’re subclassed:

>>> models = []
>>> class Model:
...     def __init_subclass__(subclass):
...         models.append(subclass)
>>> class A(Model):
...     pass
>>> class B(A):
...     pass
>>> models
[<class 'A'>, <class 'B'>]

So that’s perfect; and between modification and registration, there’s not much that happens on the class level—so like I said, metaclasses aren’t very practical.

Class Behavior

“But wait!” you might exclaim, “aren’t metaclasses the only way to customize class behavior? What if we want to support some funny syntax?”

In that case, you’re right indeed—there’s no way to override a class’s representation, for example, without writing a metaclass with a custom __repr__. But in practice, the only funny class syntax that’s common in the Python ecosystem is that List[int] I’ve mentioned, and even that can be done on the class level, using the special __class_getitem__ method:

>>> class A:
...     def __class_getitem__(cls, key):
...         return 1
>>> A['x']
1

Voodoo

The only time we’d really want to use a metaclass is when we’re doing something very, very sketchy—and in those cases, we’d probably end up using its unique __prepare__ method. This interesting mechanism is invoked before the class body is executed, and returns a dictionary (or a subclass thereof) which will be used as the body’s local scope—so you could do something like replace it with a collections.OrderedDict to retain the attribute definition order:

>>> import collections
>>> class Ordered(type):
...     @classmethod
...     def __prepare__(mcs, name, bases):
...         return collections.OrderedDict()
...     def __new__(mcs, name, bases, attrs):
...         cls = super().__new__(mcs, name, bases, attrs)
...         # Skip implicit keys like __module__ and __qualname__
...         cls._order = [key for key in attrs if not key.startswith('_')]
...         return cls
>>> class A(metaclass=Ordered):
...     x = 1
...     y = 2
>>> A._order
['x', 'y']

The reason I used __new__, by the way, is that this special dictionary only gets this far; after the class is created, and its __dict__ established, it’s converted to a standard dictionary (well, something called mappingproxy) and loses all special powers.
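You can see for yourself that the special dictionary is gone once the class exists:

>>> type(A.__dict__)
<class 'mappingproxy'>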

This used to be useful, but since Python 3.6, dictionaries are ordered by default—so again, there’s no need for a metaclass. Still, I can think of a few fun use-cases:

Enumerating Fruit

When I programmed in C, I used to define enums like so:

enum Fruit {
    APPLE,
    ORANGE,
    BANANA,
};

And APPLE would be automatically assigned 0, ORANGE 1, and BANANA 2. In Python, I have to do this:

class Fruit:
    APPLE = 0
    ORANGE = 1
    BANANA = 2

I actually think that it’s better to be explicit (even in C); but the idea that there’s something that’s easier to do in C than in Python drove me mad, so I set out on a quest to make this syntax work:

class Fruit:
    APPLE
    ORANGE
    BANANA

At first sight, it looks ridiculous—but in fact, it’s absolutely valid Python code. The reason it crashes isn’t the syntax; it’s that these silly statements, each containing an expression of a single name that gets evaluated and immediately tossed away, reference names that were never assigned—so when the body is executed, Python tries to resolve them and we get a NameError.
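Indeed:

>>> class Fruit:
...     APPLE
NameError: name 'APPLE' is not defined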

But what if we’d use the metaclass to pass in a more forgiving dictionary? Say, one that doesn’t complain about missing keys—but simply assigns to them the next value of its counter?

import itertools

class EnumDict(dict):
    def __init__(self):
        self.counter = itertools.count()
    def __getitem__(self, key):
        if key not in self:
            self[key] = next(self.counter)
        return super().__getitem__(key)

class EnumMetaclass(type):
    @classmethod
    def __prepare__(mcs, name, bases):
        return EnumDict()

class Enum(metaclass=EnumMetaclass):
    pass

Now, all we have to do is inherit from Enum, et voilà:

>>> class Fruit(Enum):
...     APPLE
...     ORANGE
...     BANANA
>>> Fruit.APPLE
0
>>> Fruit.ORANGE
1
>>> Fruit.BANANA
2

Magic! Of course, we didn’t have to define Enum; metaclass=EnumMetaclass in Fruit would be enough—but using inheritance looks more elegant and feels more familiar, so it doesn’t spook people.
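That is, this works just as well:

>>> class Fruit(metaclass=EnumMetaclass):
...     APPLE
>>> Fruit.APPLE
0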

Overloading Your Brain

Another cool use-case is supporting overloading in Python—letting the user define multiple functions with the same name but different signatures, and invoking the right one. Since Python functions don’t declare the types of their arguments, let’s start with arity—that is, how many arguments a function takes. Here’s the end result:

>>> class A(Overloaded):
...     def f(self, x):
...         print(1)
...     def f(self, x, y):
...         print(2)
>>> a = A()
>>> a.f(None)
1
>>> a.f(None, None)
2

To do this, we’re going to replace the class’s dictionary with one that keeps a list for every key, and appends redefinitions to it—so it’ll let us group all the methods with the same name, and later wrap them in a dispatcher based on their arity:

class MultiDict(dict):
    def __getitem__(self, key):
        return super().__getitem__(key)[-1]
    def __setitem__(self, key, value):
        if key not in self:
            super().__setitem__(key, [])
        super().__getitem__(key).append(value)

To get this to work, we had to support __getitem__, too: if the body’s code assigns some variable and then references it, like x = 1 and then y = x + 1, we can’t resolve x into a list—so we return the last value assigned to it, as any code would expect. Having overridden both __setitem__ and __getitem__, we have to call into dict whenever we want to actually get or set an item, so pardon the supers. Then:
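Here’s how MultiDict behaves on its own:

>>> d = MultiDict()
>>> d['x'] = 1
>>> d['x'] = 2
>>> d['x']
2
>>> dict.__getitem__(d, 'x')
[1, 2]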

import functools

class OverloadedMetaclass(type):
    @classmethod
    def __prepare__(mcs, name, bases):
        return MultiDict()
    def __new__(mcs, name, bases, attrs):
        real_attrs = {}
        for key, values in attrs.items():
            if callable(values[0]):
                real_attrs[key] = Overload(values)
            else:
                real_attrs[key] = values[-1]
        return super().__new__(mcs, name, bases, real_attrs)

class Overload:
    def __init__(self, fs):
        self.fs = {}
        for f in fs:
            arity = f.__code__.co_argcount
            self.fs[arity] = f
    # Overload is not a function, so it isn't bound to instances
    # automatically; this __get__ makes it a descriptor that does it
    def __get__(self, instance, cls):
        if instance is None:
            return self
        return functools.partial(self, instance)
    def __call__(self, *args, **kwargs):
        arity = len(args) + len(kwargs)
        f = self.fs[arity]
        return f(*args, **kwargs)

class Overloaded(metaclass=OverloadedMetaclass):
    pass

What this does is replace the regular dictionary with a MultiDict and, just before the class is created, go over all its attributes, grouping any callables into an Overload and resolving the rest to the last value assigned to them. Overload, in turn, keeps all the functions by their arity (kindly provided by the code object’s co_argcount), binds itself to instances via __get__ (being a regular object rather than a function, it doesn’t get bound automatically), and whenever it gets invoked—delegates to whichever function is registered for that number of arguments.
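By the way, co_argcount is easy to inspect; it counts the positional parameters, self included:

>>> def f(self, x): pass
>>> f.__code__.co_argcount
2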

If you feel adventurous, you can even support overloading based on the argument types, which you can specify using annotations; I’ll just leave this here:

>>> class A(Overloaded):
...     def f(self, x: int):
...         return x + 1
...     def f(self, x: str):
...         print('Hello, world!')
>>> a = A()
>>> a.f(1)
2
>>> a.f('Hello, world!')
Hello, world!
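If you want a head start, here’s a minimal sketch of an Overload variant that dispatches on the annotated types instead of arity; it reuses MultiDict and OverloadedMetaclass from before, and assumes every parameter except self is annotated and that only positional arguments are used:

class Overload:
    def __init__(self, fs):
        self.fs = fs
    def __get__(self, instance, cls):
        if instance is None:
            return self
        return functools.partial(self, instance)
    def __call__(self, self_, *args):
        for f in self.fs:
            # The first co_argcount co_varnames are the parameters; skip self
            names = f.__code__.co_varnames[1:f.__code__.co_argcount]
            types = [f.__annotations__[name] for name in names]
            if len(args) == len(types) and all(
                    isinstance(arg, type) for arg, type in zip(args, types)):
                return f(self_, *args)
        raise TypeError('no matching overload')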

Parameterized Classes

One last metaclass trick: some of its special methods—namely, __prepare__, __new__ and __init__—accept variadic keyword arguments, **kwargs. This dictionary is populated by whatever keywords you put in the class’s declaration, except for metaclass itself:

>>> class M(type):
...     def __new__(mcs, name, bases, attrs, **kwargs):
...         # Swallow the extra keywords; type.__new__ would forward
...         # them to __init_subclass__, which won't accept them
...         return super().__new__(mcs, name, bases, attrs)
...     def __init__(cls, name, bases, attrs, **kwargs):
...         print(kwargs)
>>> class A(metaclass=M, x=1, y=2):
...     pass
{'x': 1, 'y': 2}

This lets you define classes with keywords, which in turn lets you parameterize their definition, and can be handy for aspect-oriented programming—but we’ll talk about it when we get there. By the way: __init_subclass__ also accepts extra keywords in the subclass definition:

>>> class A:
...     def __init_subclass__(cls, **kwargs):
...         print(kwargs)
>>> class B(A, x=1, y=2):
...     pass
{'x': 1, 'y': 2}

So yeah. Python is sorcery.

Conclusion

In this here article, we took another step towards enlightenment—this time, talking about classes as objects, and seeing how their behavior can be defined in metaclasses, just like object behavior is defined in classes (well, almost). We then saw that it’s actually not that useful—there are solutions in place to address any issue you might have that requires a metaclass; except, of course, if you’re making horcruxes. Next time, we’ll cover all the miscellaneous object-related topics that didn’t fit anywhere else, and then move on to even bigger things: modules and packages!

The Advanced Python Programming series includes the following articles:

  1. A Value by Any Other Name
  2. To Be, or Not to Be
  3. Loopin’ Around
  4. Functions at Last
  5. To Functions, and Beyond!
  6. Function Internals 1
  7. Function Internals 2
  8. Next Generation
  9. Objects — Objects Everywhere
  10. Objects Incarnate
  11. Meddling with Primal Forces
  12. Descriptors Aplenty
  13. Death and Taxes
  14. Metaphysics
  15. The Ones that Got Away
  16. International Trade
