ADVANCED PYTHON PROGRAMMING
Meddling with Primal Forces
This time, we cover attribute access and method resolution, and see how methods work under the hood using descriptors.
Last time, we said that everything in Python is an object: integers, strings, functions, instances, and even classes. We said each object has three defining properties: a unique identifier, a type, and a value. The type is the important part: it defines that object’s structure and behavior under different circumstances, whether it’s testing for equality, doing arithmetic, or iterating over it. The value is just the object’s state, which parametrizes this behavior.
What’s Left to Say
Curiously, the most sophisticated behavior is also the most underrated one: attribute access. It looks deceptively simple, and all objects have it; but in fact, there’s a lot to be said, even if we start with just the simplest case: instance attributes. Take this class, for example:
class A:
    def __init__(self, x):
        self.x = x
Looks pretty nice, and works pretty well:
>>> a1 = A(1)
>>> a1.x
1
>>> a2 = A(2)
>>> a2.x
2
But where are these attributes actually stored? Knowing Python, it’s probably some namespace—and indeed:
>>> a1.__dict__
{'x': 1}
>>> a2.__dict__
{'x': 2}
What sets instances of the same type apart, given that they share the same structure and behavior, is their value: their state, which is nothing more than a dictionary:
>>> a1.__dict__['x'] = 3
>>> a1.x
3
>>> del a1.__dict__['x']
>>> a1.x
Traceback (most recent call last):
...
AttributeError: 'A' object has no attribute 'x'
Let’s make it more interesting: what happens if we define a class attribute?
class A:
    y = 2
    def __init__(self, x):
        self.x = x
This time…
>>> a1 = A(1)
>>> a1.y
2
>>> a2 = A(2)
>>> a2.y
2
>>> a1.__dict__
{'x': 1}
>>> a2.__dict__
{'x': 2}
It seems y isn’t part of either instance’s state; so, not surprisingly, it’s the same for all instances. But again: where is it actually stored?
>>> A.__dict__
{'y': 2, ...}
The class, being an object and all, also has a namespace which keeps all of its attributes. If you spend a minute reading through the default attributes every class has in its __dict__, you’ll find another custom one we defined: __init__, pointing to nothing more than a function that takes self and x, and assigns x to self.x. This function works well outside the class’s context, too; but in it (specifically, in the context of one of its instances) the self parameter is magically bound to the instance. We’ll understand how that happens soon enough; in the meantime, we’re just observing, once again, that everything is just a hierarchy of namespaces with a particular resolution order. In the case of scopes, we’ve had the local one, non-local ones, and eventually the global one. In the case of classes, we have the instance, the class, and, as you might suspect, any superclasses.
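To make the “hierarchy of namespaces” concrete, here is a minimal sketch (reusing the class A from above, with the built-in vars() to peek at each __dict__). It suggests that reading falls through from the instance to the class, while assignment and deletion only ever touch the instance’s own namespace:

```python
class A:
    y = 2  # class attribute, stored in A.__dict__

    def __init__(self, x):
        self.x = x  # instance attribute, stored in the instance's __dict__


a1 = A(1)
a2 = A(2)

# Reading falls through from the instance to the class...
print(a1.y)             # 2
print('y' in vars(a1))  # False

# ...but assignment only touches the instance's own namespace.
a1.y = 10
print(vars(a1))         # {'x': 1, 'y': 10}
print(a2.y, A.y)        # 2 2

# Deleting the instance attribute uncovers the class attribute again.
del a1.y
print(a1.y)             # 2
```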
Inheritance Disputes
>>> class B(A):
...     pass
...
>>> b = B(1)
>>> b.y
2
>>> b.__dict__
{'x': 1}
>>> B.__dict__
{...} # No y
In this case, neither the instance nor the class has the y attribute, yet somehow it’s resolved. This is because each class keeps a reference to its superclasses, like so:
>>> B.__bases__
(<class 'A'>,)
Which lets us traverse the family tree in search of our attribute. This, too, is easier said than done: what’s the right order to traverse this tree when it has the dreaded diamond diagram?
class A:
    x = 1

class B(A):
    x = 2

class C(A):
    x = 3

class D(B, C):
    pass
D inherits from both B and C, both of which inherit from A; and what’s worse, all of them define their own x! In this case, it turns out to be:
>>> d = D()
>>> d.x
2
So, B wins; but even the slightest change, like this…
class D(C, B): # Changed order of superclasses
    pass
Results in a different resolution order:
>>> d = D()
>>> d.x
3
The order Python comes up with when it “linearizes” the hierarchy is called the MRO, for method resolution order (even though it applies to attributes, too); the algorithm that computes it is known as C3 linearization. It’s a bit complicated, and not very interesting; the bottom line is that the result is available to us via the __mro__ class attribute:
>>> # In the first case:
>>> D.__mro__
(<class 'D'>, <class 'B'>, <class 'C'>, <class 'A'>, <class 'object'>)
>>> # In the second case:
>>> D.__mro__
(<class 'D'>, <class 'C'>, <class 'B'>, <class 'A'>, <class 'object'>)
So in short, Python comes up with some sensible way to flatten the family tree into a line, and then loops over it, looking for our attribute one class at a time.
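That loop can be sketched in a few lines. The helper below, lookup_in_mro, is a hypothetical name made up for illustration (it is not part of Python); it just walks __mro__ and returns the first match it finds in a class’s namespace:

```python
class A:
    x = 1

class B(A):
    x = 2

class C(A):
    x = 3

class D(B, C):
    pass


def lookup_in_mro(cls, name):
    """Hypothetical helper: return the first value found for `name` along cls.__mro__."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)


print([klass.__name__ for klass in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(lookup_in_mro(D, 'x'))                    # 2
```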
The Curious Case of Super
Other languages, like C++ and Java, provide a way to invoke a superclass’s method, even from within a method that overrides it. Python introduces the same feature:
>>> class A:
...     def f(self):
...         print('A')
...
>>> class B(A):
...     def f(self):
...         print('B')
...         super().f()
...
>>> b = B()
>>> b.f()
B
A
However, super is a pretty bad name for it, because it doesn’t actually go to the superclass, but rather to the next class in the MRO sequence. This doesn’t matter in Java, which doesn’t support multiple inheritance of classes at all, or in C++, where you call a base class’s method by naming that class explicitly; but in Python, this can cause some pretty surprising behavior. Let’s go back to our previous example:
class A:
    def f(self):
        print('A')

class B(A):
    def f(self):
        print('B')
        super().f()

class C(A):
    def f(self):
        print('C')
        super().f()

class D(B, C):
    def f(self):
        print('D')
        super().f()
What happens if I call d.f()? We’ll definitely get a D, and then what? Python clearly considers B “more of a superclass” than C, as we’ve seen from the way it resolved x before, so we’ll probably get a B. But then, B calls its superclass, so we ought to get A next. Right?
>>> d = D()
>>> d.f()
D
B
C
A
That’s really weird if you think about it in terms of regular supers: why the heck would C be B’s superclass? But I already told you the answer: it’s not. It’s just that super should’ve been called next_in_mro or something; and that’s exactly what it does.
If you’re interested in learning more about it, check out Raymond Hettinger’s excellent PyCon talk, Super Considered Super!; otherwise, on we move.
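The “next in MRO” idea is easy to check for yourself. In the sketch below, next_in_mro is a hypothetical helper written for illustration (the real super() also skips classes that don’t define the requested attribute); the methods return lists rather than print, so the call order is easy to compare:

```python
class A:
    def f(self):
        return ['A']

class B(A):
    def f(self):
        return ['B'] + super().f()

class C(A):
    def f(self):
        return ['C'] + super().f()

class D(B, C):
    def f(self):
        return ['D'] + super().f()


def next_in_mro(cls, instance):
    """Hypothetical helper: the class that follows `cls` in the instance's MRO."""
    mro = type(instance).__mro__
    return mro[mro.index(cls) + 1]


d = D()
b = B()
print(next_in_mro(B, d).__name__)  # C
print(next_in_mro(B, b).__name__)  # A
print(d.f())                       # ['D', 'B', 'C', 'A']
print(b.f())                       # ['B', 'A']
```

The same B.f ends up calling C.f or A.f depending on the type of the instance it runs on, which is exactly why “superclass” is a misleading way to think about it.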
The Favourite Child
If this isn’t enough, Python also supports dynamic attributes, like so:
>>> class A:
...     def __getattr__(self, key):
...         print(f'getting {key}')
...         return 42
...     def __setattr__(self, key, value):
...         print(f'setting {key} to {value!r} (not really)')
...     def __delattr__(self, key):
...         print(f'deleting {key} (not really)')
...
>>> a = A()
>>> a.x
getting x
42
>>> a.x = 1
setting x to 1 (not really)
>>> del a.x
deleting x (not really)
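The example above only pretends to set and delete. As a hedged sketch of the more common pattern, the hypothetical Logged class below logs each change and then delegates to the default behavior through super(), so the attribute really is stored and removed:

```python
class Logged:
    """Hypothetical class: log every attribute change, then actually perform it."""

    def __setattr__(self, key, value):
        print(f'setting {key} to {value!r}')
        super().__setattr__(key, value)  # object's default: store in __dict__

    def __delattr__(self, key):
        print(f'deleting {key}')
        super().__delattr__(key)  # object's default: remove from __dict__


obj = Logged()
obj.x = 1         # prints: setting x to 1
print(obj.x)      # 1
del obj.x         # prints: deleting x
print(vars(obj))  # {}
```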
What, then, is the exact priority of all these different ways to resolve an attribute? Let’s do an experiment, and then try to implement it ourselves:
>>> class A:
...     x = 'class'
...     def __init__(self):
...         self.x = 'instance'
...     def __getattr__(self, key):
...         return 'dynamic'
...
>>> a = A()
>>> a.x
'instance'
So it seems Python checks in the instance’s __dict__ first. But what if…
>>> a.__dict__
{'x': 'instance'}
>>> del a.x
>>> a.__dict__
{}
>>> a.x
'class'
Once we remove x from the instance’s __dict__, the attribute is still resolved; only now it takes a moment longer, since Python has to go and look for it further, in the class’s __dict__. Let’s delete that, too:
>>> del a.x
Traceback (most recent call last):
...
AttributeError: x
Oh, wait: it’s not a’s to delete anymore. This is really similar to scopes: the local scope is the instance’s __dict__, and the scopes above it are its class and superclasses’ scopes. When you resolve a name or an attribute, Python checks all these scopes and __dict__s; but when you set or delete it, Python only ever works on a specific scope. We’d have to do something like this:
>>> A.__dict__
{'x': 'class', ...}
>>> del A.x
>>> A.__dict__
{...}
>>> a.x
'dynamic'
That clears the way for Python to get all the way to __getattr__. This lookup algorithm is pretty simple to implement, and it kind of is, in another function I haven’t told you about yet: __getattribute__. This function does, more or less, this:
class A:
    def __getattribute__(self, key):
        if key in self.__dict__:
            return self.__dict__[key]
        for cls in self.__class__.__mro__:
            if key in cls.__dict__:
                return cls.__dict__[key]
        if hasattr(self, '__getattr__'):
            return self.__getattr__(key)
        raise AttributeError(key)
The truth is, when you access a.x, it actually invokes a.__getattribute__('x'). This, in turn, checks the instance’s __dict__, its class and superclasses’ ones, and its __getattr__ method, before raising an AttributeError.
The same is not true for __setattr__ and __delattr__, by the way: since set and delete only ever work on one “scope”, their algorithm is pretty straightforward, and you don’t need more than one magic method to tweak it. As an aside, __getattribute__ encapsulates some pretty basic core functionality, so you shouldn’t tamper with it either, unless you really know what you’re doing. In fact, even the code I’ve written above doesn’t really work: already in checking if key in self.__dict__, we recursively invoke __getattribute__('__dict__'), which spins into infinite recursion and raises a RecursionError (a subclass of RuntimeError). You’d have to do something like this:
class A:
    def __getattribute__(self, key):
        instance_dict = super().__getattribute__('__dict__')
        if key in instance_dict:
            return instance_dict[key]
        ...
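For reference, here is one way the whole thing might look once every internal lookup goes through super(). This is a sketch under two assumptions, not Python’s actual implementation: it ignores descriptors entirely (so methods, and even a.__dict__, would come back “raw”), and instead of calling __getattr__ itself it simply raises AttributeError, since Python falls back to __getattr__ on its own when __getattribute__ fails that way:

```python
class A:
    y = 'class'

    def __init__(self):
        self.x = 'instance'

    def __getattribute__(self, key):
        # Use the inherited implementation for our own lookups,
        # so that they don't recurse back into this method.
        instance_dict = super().__getattribute__('__dict__')
        if key in instance_dict:
            return instance_dict[key]
        cls = super().__getattribute__('__class__')
        for klass in cls.__mro__:
            if key in klass.__dict__:
                return klass.__dict__[key]
        # Python itself calls __getattr__ (if defined) when this is raised.
        raise AttributeError(key)

    def __getattr__(self, key):
        return 'dynamic'


a = A()
print(a.x, a.y, a.z)  # instance class dynamic
```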
So as you can see, it’s better not to anger the gods. Instead, let me confess to another inaccuracy in my algorithm:
How to Method
Methods are pretty interesting creatures, yet they’re pretty much taken for granted. Have you ever thought how weird it is that functions behave differently depending on whether or not they were defined inside a class, and accessed through an instance? I’ll show you what I mean:
class A:
    def __init__(self, x):
        self.x = x
    def f(self):
        return self.x
If I access f through A, I get a regular function:
>>> A.f
<function A.f at 0x...>
So I can’t call it without an argument:
>>> A.f()
Traceback (most recent call last):
...
TypeError: f() missing 1 required positional argument: 'self'
But instead, I’d have to create an instance and pass it in:
>>> a = A(1)
>>> A.f(a)
1
This is pretty cumbersome—which is why when you access the function through the instance, it happens automatically:
>>> a.f()
1
But how? Our first clue is taking a second look at the function, as it is exposed through the instance:
>>> a.f
<bound method A.f of <A object at 0x...>>
That’s… not a function; Python does some trick, and replaces it with a bound method: a callable whose first parameter is already fixed to a. To make things worse, this only works when it’s defined in the class’s context:
>>> def g(self):
...     return self.x
...
>>> a.g = g
>>> a.g
<function g at 0x...>
>>> a.g()
Traceback (most recent call last):
...
TypeError: g() missing 1 required positional argument: 'self'
g doesn’t know that it’s a method, so a is not bound to its first parameter automatically. However, if you add it to the class…
>>> del a.g
>>> A.g = g
>>> a.g
<bound method A.g of <A object at 0x...>>
>>> a.g()
1
… Then not only does it become available through a, it actually works! But how does it work?
Descriptors
Python actually has a mechanism just for that: it’s called the descriptor protocol, and it lets you define custom behavior for when your object is accessed within the context of a class or an instance. Some people confuse it with __getattr__ and __getattribute__, because the name of its main method, __get__, is so similar; but if you forget about attribute resolution for a moment, you’ll see __get__ is something else entirely:
>>> class D:
...     def __get__(self, instance, cls):
...         print(f'getting {self} from {instance} ({cls})')
...         return 42
...
>>> class A:
...     d = D()
...
>>> a = A()
>>> a.d
getting <D object at 0x...> from <A object at 0x...> (<class 'A'>)
42
So, __get__ has nothing to do with resolving d’s attributes; it’s all about how that d is resolved inside other classes or instances. I keep saying “classes”, because even though functions ignore this feature, we can “hijack” d’s resolution even from within a class:
>>> A.d
getting <D object at 0x...> from None (<class 'A'>)
42
As you can see, this simply calls __get__ with instance set to None. The descriptor protocol actually supports assignment and deletion, too (albeit, only through instances):
>>> class D:
...     def __get__(self, instance, cls):
...         print(f'getting {self} from {instance} ({cls})')
...         return 42
...     def __set__(self, instance, value):
...         print(f'setting {self} to {value!r} (not really)')
...     def __delete__(self, instance):
...         print(f'deleting {self} (not really)')
...
>>> class A:
...     d = D()
...
>>> a = A()
>>> a.d
getting <D object at 0x...> from <A object at 0x...> (<class 'A'>)
42
>>> a.d = 1
setting <D object at 0x...> to 1 (not really)
>>> del a.d
deleting <D object at 0x...> (not really)
>>> # A.d = 2 or del A.d would actually overwrite/delete d
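For a slightly more useful illustration of the same three hooks, here is a sketch of a validating descriptor. The names Positive and Account are made up for this example; it also leans on __set_name__, a hook Python calls when the owning class is created, so the descriptor can learn which attribute name it was assigned to. The actual value is kept in the instance’s own __dict__:

```python
class Positive:
    """Hypothetical descriptor that only accepts positive numbers."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; remember our attribute name.
        self.name = name

    def __get__(self, instance, cls):
        if instance is None:
            return self  # accessed through the class
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f'{self.name} must be positive, got {value!r}')
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        del instance.__dict__[self.name]


class Account:
    balance = Positive()

    def __init__(self, balance):
        self.balance = balance  # goes through Positive.__set__


acct = Account(100)
print(acct.balance)  # 100
try:
    acct.balance = -5
except ValueError as error:
    print(error)     # balance must be positive, got -5
```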
This means we need to rethink our previous __getattribute__ implementation; because if an object is a descriptor (that is, has a __get__ method), we shouldn’t just return it, but give it a chance to “describe itself” instead. This is true only for objects resolved within a class’s context, while looping through the __mro__:
class A:
    def __getattribute__(self, key):
        if key in self.__dict__:
            return self.__dict__[key]
        for cls in self.__class__.__mro__:
            if key in cls.__dict__:
                # This part is new:
                value = cls.__dict__[key]
                if hasattr(value, '__get__'):
                    return value.__get__(self, self.__class__)
                return value
        if hasattr(self, '__getattr__'):
            return self.__getattr__(key)
        raise AttributeError(key)
This also affects __setattr__ and __delattr__, because assignment and deletion also need to consider descriptors, and give them a chance to set or delete themselves; which is pretty confusing, since assignment and deletion only ever work on the “local scope” (that is, __dict__), and descriptors reside in higher scopes (that is, classes) by definition. If you still follow, it’d look like this:
class A:
    def __setattr__(self, key, value):
        # First, look for a descriptor that handles assignment:
        for cls in self.__class__.__mro__:
            if key in cls.__dict__:
                descriptor = cls.__dict__[key]
                if hasattr(descriptor, '__set__'):
                    descriptor.__set__(self, value)
                    return
                break  # A plain class attribute: fall through
        # And if none is found...
        self.__dict__[key] = value

    def __delattr__(self, key):
        # Same, but with __delete__:
        for cls in self.__class__.__mro__:
            if key in cls.__dict__:
                descriptor = cls.__dict__[key]
                if hasattr(descriptor, '__delete__'):
                    descriptor.__delete__(self)
                    return
                break
        if key not in self.__dict__:
            raise AttributeError(key)
        del self.__dict__[key]
To be completely honest, that’s still not exactly what happens: there are some edge cases depending on whether the descriptor only defines __get__, or defines __set__ and __delete__ as well; but I think we’ve dived deep enough. Let’s do something fun!
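Before moving on, a quick sketch of one such edge case, for the curious. The class names NonData and Data are made up for this example. A descriptor that defines only __get__ is shadowed by an entry in the instance’s __dict__, whereas one that also defines __set__ takes priority over it:

```python
class NonData:
    """Defines only __get__."""
    def __get__(self, instance, cls):
        return 'from non-data descriptor'


class Data:
    """Defines both __get__ and __set__."""
    def __get__(self, instance, cls):
        return 'from data descriptor'

    def __set__(self, instance, value):
        raise AttributeError('read-only')


class A:
    n = NonData()
    d = Data()


a = A()
# Write straight into the instance's __dict__, bypassing __setattr__.
a.__dict__['n'] = 'from instance'
a.__dict__['d'] = 'from instance'

print(a.n)  # from instance
print(a.d)  # from data descriptor
```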
The Miller-Urey Experiment
In one famous experiment, biochemists Stanley Miller and Harold Urey demonstrated that several organic compounds could be formed spontaneously by simulating the conditions of Earth’s early atmosphere. In other words, they put some mud, fire and lightning in a box, and effectively created life in a lab, simulating its natural habitat. Our goal is more modest: let’s try to create a method “in interpreto”, tickling a function just right so as to bind it to some instance:
>>> def f(self):
...     return self.x
...
>>> dir(f)
[..., '__get__', ...]
That explains some stuff: functions are descriptors right off the bat; it’s just that this is only ever activated when they’re defined in (or rather, accessed through) a class. Functions and methods aren’t really different, then; it’s how we access them that is.
>>> class A:
...     def __init__(self, x):
...         self.x = x
...
>>> a = A(1)
Now, of course we can call f on a explicitly:
>>> f(a)
1
But in order to bind them, we’d have to imagine f was accessed through a, which invoked A’s __getattribute__, which noticed f is a descriptor, and rather than returning it, returned f.__get__(a, A). That is:
>>> m = f.__get__(a, A)
>>> m
<bound method f of <A object at 0x...>>
Bingo! And if we call it without an argument:
>>> m()
1
It works! So now that we’ve created a method outside of its natural habitat, it’s time to implement methods and bound methods ourselves, just because we can; and next time, use descriptors for some more interesting features.
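If you want to poke at the specimen a little more, the sketch below looks inside the bound method produced by f.__get__(a, A). It relies on two things from the standard library and the language: bound methods expose the original function as __func__ and the instance as __self__, and types.MethodType can build the same kind of object directly:

```python
import types


class A:
    def __init__(self, x):
        self.x = x


def f(self):
    return self.x


a = A(1)
m = f.__get__(a, A)

print(m.__func__ is f)  # True
print(m.__self__ is a)  # True

# types.MethodType builds the same kind of object directly.
m2 = types.MethodType(f, a)
print(m2())             # 1
print(m == m2)          # True
```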
Our Very Own Method
First, let’s use a decorator, which would hijack a function and replace it with our own descriptor:
class A:
    def __init__(self, x):
        self.x = x
    @Method
    def f(self):
        return self.x
We haven’t defined the Method decorator yet, but it ought to keep a reference to the original function, and provide a __get__ method that’d let it create a bound method on demand.
class Method:
    def __init__(self, f):
        self.f = f
    def __get__(self, instance, cls):
        if instance is None:
            return self.f
        return BoundMethod(self.f, instance)
If its instance is None, it means it’s been accessed through a class, and there’s nothing to do, so it might as well return the original function, much like Python normally does. However, if it’s accessed through an instance, a BoundMethod is returned in its stead:
class BoundMethod:
    def __init__(self, f, instance):
        self.f = f
        self.instance = instance
    def __call__(self, *args, **kwargs):
        return self.f(self.instance, *args, **kwargs)
All this class does is collect the function and the instance, and when it’s called, forward the arguments to the function—but not before slipping the instance in as its first argument. Let’s see that it works:
>>> a = A(1)
>>> A.f
<function A.f at 0x...>
>>> a.f
<BoundMethod object at 0x...>
>>> a.f()
1
Hooray! So while our __repr__s are a bit less elegant than Python’s, it works; and this, my friend, is how methods work.
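Putting the pieces together in one place, and patching up that __repr__ while we’re at it, a self-contained version might look like the sketch below. The __repr__ format and the extra add method are my own additions for illustration; they aren’t needed for the mechanism to work:

```python
class BoundMethod:
    def __init__(self, f, instance):
        self.f = f
        self.instance = instance

    def __call__(self, *args, **kwargs):
        # Slip the instance in as the first argument.
        return self.f(self.instance, *args, **kwargs)

    def __repr__(self):
        # Illustrative addition: mimic the look of Python's own bound methods.
        return f'<bound method {self.f.__qualname__} of {self.instance!r}>'


class Method:
    def __init__(self, f):
        self.f = f

    def __get__(self, instance, cls):
        if instance is None:
            return self.f  # accessed through the class
        return BoundMethod(self.f, instance)


class A:
    def __init__(self, x):
        self.x = x

    @Method
    def f(self):
        return self.x

    @Method
    def add(self, y):  # extra arguments are forwarded by __call__
        return self.x + y


a = A(1)
print(a.f())     # 1
print(a.add(2))  # 3
print(A.f(a))    # 1
print(a.f)       # something like <bound method A.f of <...A object at 0x...>>
```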
Conclusion
Python has a unique resolution mechanism, both in its ingenuity and in its openness. That lets us not only understand it better, but tap into it, whether to reimplement core features ourselves, like we did in this article, or to create seemingly impossible extensions of our own, enriching the platform into the powerful, flexible and vibrant ecosystem it is today. See you next time!
The Advanced Python Programming series includes the following articles:
- A Value by Any Other Name
- To Be, or Not to Be
- Loopin’ Around
- Functions at Last
- To Functions, and Beyond!
- Function Internals 1
- Function Internals 2
- Next Generation
- Objects — Objects Everywhere
- Objects Incarnate
- Meddling with Primal Forces
- Descriptors Aplenty
- Death and Taxes
- Metaphysics
- The Ones that Got Away
- International Trade