How Does Attribute Access Work?
Have you ever wondered how the CPython interpreter handles attribute access on a class or an instance of a class? Or how the course of normal attribute lookup changes when you override the attribute access dunder methods __getattribute__ and __getattr__? Which underlying C functions get called, and in what order? Do the same C functions get invoked when you access an attribute on a class as opposed to an instance of a class? In this post we are going to take an in-depth look at how attribute access works at the CPython level. We are going to follow and analyze the chain of core C function calls which takes place when the CPython interpreter encounters the attribute access dot (.) operator.
As you read through this post, I encourage you to keep a copy of the CPython source code handy and refer to it from time to time. You can get a copy of the source code by following the instructions here. This post was written using Python 3.5.
With that said, let’s begin by analyzing how attribute access works on instances of classes by looking at a simple example:
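A minimal version of that example, reconstructed from the description that follows, might look like this. The try/except wrapper and the `message` variable are my additions so that the snippet runs to completion; they are not part of the example whose bytecode is shown further down.

```python
class Foo:
    pass


f = Foo()

try:
    f.x  # Foo defines no attribute named x, so this raises AttributeError
except AttributeError as exc:
    message = str(exc)
    print(message)
```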
We define an empty class Foo, create an instance f of that class and try to access an attribute x on this instance. This operation raises an AttributeError, no surprises there. So how does the CPython interpreter decide that it can't find the attribute x and needs to raise an AttributeError? Let's investigate by first looking at the instruction set (the disassembled bytecode) which the Python compiler generates for the interpreter from our very simple example above:
  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               0
              4 LOAD_CONST               1 ('Foo')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               1 ('Foo')
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Foo)

  5          14 LOAD_NAME                0 (Foo)
             16 CALL_FUNCTION            0
             18 STORE_NAME               1 (f)

  6          20 LOAD_NAME                1 (f)
             22 LOAD_ATTR                2 (x)
             26 LOAD_CONST               2 (None)
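A listing like this can be generated with the standard-library dis module. The sketch below compiles the example from a string (the `source` and `opnames` names are mine) and collects the opcode names. Exact opcodes and offsets differ between CPython versions, so expect the output to vary from the listing above.

```python
import dis

source = "class Foo:\n    pass\n\nf = Foo()\nf.x\n"

# Compile the module-level code without executing it.
code = compile(source, "<example>", "exec")

# Print the human-readable disassembly.
dis.dis(code)

# Collect just the opcode names for programmatic inspection.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
```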
The compiler emits quite a few opcodes (instructions) for executing our very simple example. We are particularly interested in LOAD_ATTR. This is the opcode which initiates the attribute lookup process on a class or an instance of a class. The C code which gets executed when the Python interpreter executes LOAD_ATTR can be found in Python/ceval.c:
The GETITEM macro (line 2) gets a pointer to the name of the attribute which needs to be looked up (x in our example). TOP (line 3) reads the item at the top of the stack, without popping it, to get a pointer to the object on which the attribute is being accessed (the instance f of the class Foo). The PyObject_GetAttr function (line 4) takes the pointer to the object (owner) and the pointer to the attribute name (name) and does the magic of attribute lookup.
If we look inside the PyObject_GetAttr function, we see a few interesting things happening:
First, a pointer to the type of the object is retrieved (line 4); in our case that type is Foo. (Type and class are practically the same thing and will be used interchangeably in this post.) This is important since it is the type of the object which determines how different operations like attribute access will be performed on the object. Then some basic sanity checking is performed on the attribute name being looked up. Finally, two somewhat cryptic-looking slots, tp_getattr and tp_getattro (lines 14 and 20), are looked up on the type object, and whichever of them is available on the type is invoked. Let's take a deeper look at tp_getattr and tp_getattro and figure out what these functions are actually doing.
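Two consequences of this are observable from plain Python, without reading any C. In the sketch below (variable names are mine), a function stored under the name __getattribute__ on the instance is never consulted, because the lookup machinery is taken from the type. The sanity check on the attribute name shows up as a TypeError when the name is not a string.

```python
class Foo:
    pass


f = Foo()
f.x = 1

# Store a callable named __getattribute__ in the instance dictionary.
# Attribute access is dispatched through type(f), so this has no effect.
f.__getattribute__ = lambda name: "hijacked"
still_normal = f.x

# The attribute name must be a string; anything else is rejected up front.
try:
    getattr(f, 42)
    name_check = "no error"
except TypeError:
    name_check = "TypeError"

print(still_normal, name_check)
```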
tp_getattr and tp_getattro are function pointers. tp_getattro points either to the PyObject_GenericGetAttr function in Objects/object.c or to the slot_tp_getattro or the slot_tp_getattr_hook functions in Objects/typeobject.c. In the following paragraphs we will be looking at the implementations of all three of these functions. Also note that tp_getattr, which takes the attribute name as a C string rather than a Python string object, is deprecated as per the Python docs, and tp_getattro should give us all we need in terms of understanding attribute lookup.
tp_getattro is what is known as a slot in CPython lingo. Slots are essentially attributes on a type which point to functions and structures (which may in turn point to other functions). These functions are what the CPython interpreter will actually invoke when operations are performed on an object instantiated from that type. Besides tp_getattro, the PyTypeObject (the main structure which defines a new type) has many other attributes, which are detailed here. I encourage you to take a look.
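Slots themselves are C-level fields and are not directly visible from Python, but in CPython the function sitting in the tp_getattro slot of object is exposed as the wrapper object.__getattribute__. The sketch below (variable names are mine, and the `wrapper_descriptor` type name is a CPython implementation detail) shows that a class which overrides nothing simply inherits that wrapper.

```python
class Foo:
    pass


f = Foo()
f.x = 5

# CPython exposes the C function behind object's tp_getattro slot
# as a wrapper around the slot.
wrapper_kind = type(object.__getattribute__).__name__

# Foo does not override __getattribute__, so it inherits object's.
inherited = Foo.__getattribute__ is object.__getattribute__

# Calling the wrapper directly performs the normal lookup.
via_wrapper = object.__getattribute__(f, "x")

print(wrapper_kind, inherited, via_wrapper)
```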
As mentioned above, tp_getattro generally points to one of three functions: the PyObject_GenericGetAttr function in Objects/object.c, or the slot_tp_getattro or slot_tp_getattr_hook functions in Objects/typeobject.c. The decision on which function tp_getattro will point to is made at the time of class definition. To better understand this, let's take a look at what happens when the CPython interpreter encounters a class definition:
Definition of a new class results in the creation of a new type. This type creation is managed by the function type_new (declared as static PyObject *) in Objects/typeobject.c. This is a beefy function which manages the creation and initialization of a new type. Towards the end of this function, there is a call to a function named fixup_slot_dispatchers. This function is responsible for wiring up appropriate functions to each of the slots defined on the type object, depending on which of the special dunder methods the user-defined class overrides. Which function fixup_slot_dispatchers installs for the tp_getattro slot depends on whether or not the __getattr__ or __getattribute__ dunder methods are overridden in the class. If either __getattr__ or __getattribute__ is overridden by the class, fixup_slot_dispatchers installs the function slot_tp_getattr_hook in the tp_getattro slot. Otherwise PyObject_GenericGetAttr is installed.
Let's begin by looking at how PyObject_GenericGetAttr operates. This function implements what could be called the normal attribute lookup procedure and will most likely be doing the job of managing attribute access in your day-to-day Python programming.
PyObject_GenericGetAttr has a simple signature, taking in a pointer to the object the attribute is being looked up on and a pointer to the name of the attribute being looked up. PyObject_GenericGetAttr delegates the lookup call to the function _PyObject_GenericGetAttrWithDict. This function implements Python's attribute lookup protocol, which essentially goes like this:
1. Look up the attribute in the dictionaries of the classes which make up the MRO of the object's type. (Simply put, the MRO is the order in which attribute lookup gets resolved in an inheritance hierarchy. More details can be found here.) This task is done by the function _PyType_Lookup (line 26).
2. If the attribute is found in the dictionary of one of the classes which make up the MRO, check whether it is a Data Descriptor (which is nothing more than an object whose class implements __get__ together with __set__ or __delete__). If it is, resolve the attribute lookup by calling the __get__ method of the Data Descriptor (lines 28–33).
3. If we don't have a Data Descriptor, try to resolve the attribute lookup by looking at the instance dictionary of the object on which the attribute lookup is being performed (lines 37–48).
4. If the instance dictionary does not contain the attribute being looked up, but the attribute was found in step 1 and is a Non-Data Descriptor (an object whose class implements only the __get__ method), resolve the attribute lookup by calling the __get__ method of the Non-Data Descriptor. Otherwise just return the attribute found in the class dictionaries (lines 52–59). This is how one is able to access class variables via an instance of a class.
5. Raise AttributeError if the lookup is not resolved by any of steps 2 through 4 (line 61).
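The lookup order above can be modelled in pure Python. This is only a rough sketch under simplifying assumptions: `generic_getattr` and the `Demo` class are hypothetical names of mine, the descriptor checks use hasattr on the descriptor's type instead of inspecting C slots, and the real C function handles more cases than this model does.

```python
_MISSING = object()


def generic_getattr(obj, name):
    """Rough Python model of the normal attribute lookup order."""
    tp = type(obj)

    # Step 1: search the dictionaries of the classes in the type's MRO.
    descr = _MISSING
    for klass in tp.__mro__:
        if name in vars(klass):
            descr = vars(klass)[name]
            break

    # Step 2: a data descriptor found on the type wins over the instance dict.
    if descr is not _MISSING:
        descr_tp = type(descr)
        is_data = hasattr(descr_tp, "__set__") or hasattr(descr_tp, "__delete__")
        if hasattr(descr_tp, "__get__") and is_data:
            return descr_tp.__get__(descr, obj, tp)

    # Step 3: look in the instance dictionary, if the object has one.
    try:
        inst_dict = object.__getattribute__(obj, "__dict__")
    except AttributeError:
        inst_dict = {}
    if name in inst_dict:
        return inst_dict[name]

    # Step 4: fall back to a non-data descriptor or a plain class attribute.
    if descr is not _MISSING:
        if hasattr(type(descr), "__get__"):
            return type(descr).__get__(descr, obj, tp)
        return descr

    # Step 5: nothing matched.
    raise AttributeError(
        "{!r} object has no attribute {!r}".format(tp.__name__, name)
    )


class Demo:
    kind = "class attribute"

    def method(self):
        return "bound"

    @property
    def prop(self):
        return "from property"


d = Demo()
d.__dict__["prop"] = "shadow attempt"      # loses to the data descriptor
d.__dict__["kind"] = "instance attribute"  # wins over the plain class attribute
```

With these definitions, `generic_getattr(d, name)` should agree with ordinary `d.<name>` access for `prop`, `kind` and `method`, and raise AttributeError for a name that exists nowhere.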
Now let’s see what happens in slot_tp_getattr_hook when you override the __getattribute__ or __getattr__ method:
The slot_tp_getattr_hook function first needs to determine which of the special dunder methods has been overridden by user code. It first checks whether the __getattr__ method has been overridden (lines 8–9). If not, then it must be the case that user code has overridden __getattribute__, since otherwise the fixup_slot_dispatchers function would not have installed the slot_tp_getattr_hook function in the tp_getattro slot of our type in the first place. So if user code has overridden only the __getattribute__ method, CPython delegates the call to this method right away by installing the slot_tp_getattro function in the type's tp_getattro slot and then calling it. The slot_tp_getattro function does nothing more than simply invoke the user-overridden __getattribute__ method.
Things get a little more interesting when the user has overridden the __getattr__ method. The CPython interpreter does not directly invoke the user-defined __getattr__ method. The interpreter first checks whether the __getattribute__ method is defined on the type. One thing to note about the __getattribute__ method is that it will always be defined on a type (unless you are doing some diabolical work with the Python C API) and will point to PyObject_GenericGetAttr even if you do not explicitly override it. This is because all types inherit from the PyBaseObject_Type (object) type, and this type has the __getattribute__ method defined and hooked up to call PyObject_GenericGetAttr. So all that lines 14–18 in the above code snippet are doing is checking whether there is a __getattribute__ method defined and whether this method is the same as the one latched onto the base object class. If so, the usual object lookup protocol is followed by invoking PyObject_GenericGetAttr. If not, the user-overridden __getattribute__ method is invoked. Only if PyObject_GenericGetAttr or the user-overridden __getattribute__ method raises an AttributeError does the user-overridden __getattr__ method get invoked (lines 24–26).
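The fallback behaviour is easy to confirm from Python. In this sketch (the class and variable names are mine), `Fallback` overrides only __getattr__, while `Both` overrides both methods and raises AttributeError from __getattribute__ for one specific name.

```python
class Fallback:
    present = "found normally"

    def __getattr__(self, name):
        # Reached only when the normal lookup raised AttributeError.
        return "fallback for " + name


class Both:
    def __getattribute__(self, name):
        if name == "secret":
            raise AttributeError(name)
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        return "rescued " + name


fb = Fallback()
found = fb.present      # resolved by the normal lookup, __getattr__ is not called
fallback = fb.missing   # normal lookup fails, so __getattr__ supplies the value

rescued = Both().secret  # the AttributeError from __getattribute__ triggers __getattr__
print(found, fallback, rescued)
```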
We have looked in detail at how attribute access works on instances of a class. What about attribute access on a class? Attribute access on a class is not very different from attribute access on instances of a class. Why is this? Because classes themselves are instances of the type type. Similar to attribute lookup on instances, attribute lookup on a class results in the generation of the LOAD_ATTR opcode. If the user does not override the __getattribute__ or the __getattr__ methods in the type (aka metaclass) of the class on which the attribute is being looked up, the interpreter calls type_getattro in Objects/typeobject.c to resolve the attribute lookup. This function is similar to the PyObject_GenericGetAttr function. If the __getattribute__ or the __getattr__ methods are overridden in the metaclass, then just like in the case of attribute lookup on instances, attribute lookup on classes gets resolved via the slot_tp_getattr_hook route.
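As a quick check of the metaclass case, the sketch below (with hypothetical names `Meta` and `Bar`) overrides __getattr__ on a metaclass. Lookups on the class itself then fall back to it, while lookups on instances of the class are unaffected, since the metaclass is not part of the instance's type MRO.

```python
class Meta(type):
    def __getattr__(cls, name):
        # Invoked only when normal lookup on the class raises AttributeError.
        return "meta fallback for " + name


class Bar(metaclass=Meta):
    y = 1


on_class_found = Bar.y          # resolved by the normal class lookup
on_class_missing = Bar.missing  # falls back to Meta.__getattr__

try:
    Bar().missing               # instances do not see the metaclass hook
    instance_result = "no error"
except AttributeError:
    instance_result = "AttributeError"

print(on_class_found, on_class_missing, instance_result)
```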