An in-depth look at objects and types in the C layer of the CPython interpreter

Mahdi Haghverdi
Jul 16, 2023

We have often heard that everything in Python is an object. That is true, but it raises a question: C is not Python, and Python is not C, so how can we write something in C and end up with a language where everything is an object? Guido van Rossum answered this question about 33 years ago.

If you have browsed the CPython source code, you will have come across lines like these:

As you may have guessed, these are functions for the datetime type. Every one of them returns a pointer to PyObject, and their parameters are pointers to PyObject too!

These are the C API functions for list in Python:

And this is part of the most important file (in our context), one of the oldest source files in CPython, which, as its authors put it, is inheritance built by hand :D

In this article we will talk about:

  • a C struct called PyObject
  • a C struct for Python Types
  • tuple object and type

A structure called PyObject

As I showed above, even a cursory look at the CPython source code shows how heavily the PyObject struct is used. In fact, when the interpreter executes our code in the interpreter loop (a very large loop that executes bytecode instructions one by one), it treats every value on the evaluation stack as a PyObject. In other words, PyObject is the superclass of all Python objects (just as, in Python, the object class is the superclass of all our classes and they all inherit from it). Of course, values in CPython are never declared as PyObject directly, but a pointer to any value *can* always be cast to a pointer to PyObject.

A word about C structs
When we say that values in Python are never declared as PyObjects, but a pointer to a value *can* always be cast to PyObject, we are referring to an implementation detail that depends on the C programming language and how it interprets data at memory locations. C structs which are used to represent Python objects are just groups of bytes which we can interpret in any manner we choose. For example, a struct, test, may be composed of 5 short values, each 2 bytes in size, summing up to 10 bytes. In C, given a reference to ten bytes, we can interpret those ten bytes as a test struct composed of 5 short values regardless of whether the 10 bytes were actually defined as a test struct; however, the output when you try to access the fields of the struct may be gibberish. This means that given n bytes of data that represent a Python object, where n is greater than the size of a PyObject, we can interpret the leading bytes, up to the size of a PyObject, as a PyObject. (Inside the Python Virtual Machine, 2018, p. 37)
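The same idea can be tried from Python with the ctypes module. This is only a sketch: the Test struct and its field names a to e are made up to match the example in the quote, and little-endian byte order is an assumption chosen to make the result predictable.

```python
import ctypes
import struct


class Test(ctypes.LittleEndianStructure):
    """Hypothetical struct from the text: five 2-byte shorts, 10 bytes in total."""
    _fields_ = [(name, ctypes.c_int16) for name in ("a", "b", "c", "d", "e")]


# Ten bytes that really were laid out as five little-endian shorts.
raw = struct.pack("<5h", 1, 2, 3, 4, 5)
as_test = Test.from_buffer_copy(raw)
fields = [as_test.a, as_test.b, as_test.c, as_test.d, as_test.e]
print(fields)

# Ten bytes that were never meant to be a Test: the reinterpretation still
# "works", but the field values carry no meaning.
gibberish = Test.from_buffer_copy(b"helloworld")
print(gibberish.a, gibberish.b)
```

Nothing in the bytes themselves says which struct they belong to; the interpretation is chosen by whoever reads them.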

Here is PyObject:

Where the macro is:

The preprocessor replaces _PyObject_HEAD_EXTRA with whatever follows it in its definition, which will be:

This defines two fields, _ob_next and _ob_prev, that point to the next and the previous object created. Together they form an implicit doubly linked list, which means all Python objects are connected to each other like a chain :D Note that these fields only exist in special debug builds (compiled with Py_TRACE_REFS); in a regular build the macro expands to nothing.

The ob_refcnt field, which is simply a number, is used for memory management. The important ob_type field is a pointer to another structure that specifies the type of the object. It is this type that determines what the data represents, what kind of data it contains, and what operations can be performed on the object.

We can look at an object's type from Python, like this:

This name is referring to a string object whose type is str.
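A small sketch of that check, together with a peek at the same information from the C side. The peek assumes CPython (where id() returns the object's memory address) and assumes that ob_type is the last pointer-sized field of PyObject, so that its offset can be derived from object.__basicsize__; it is a curiosity, not something to rely on.

```python
import ctypes

name = "obj"

# From Python: the type of the object that `name` refers to.
python_view = type(name)
print(python_view)

# From the C side (CPython only): read the ob_type pointer out of the object.
# Assumption: object.__basicsize__ equals sizeof(PyObject) and ob_type is the
# last pointer-sized field in that header.
ob_type_offset = object.__basicsize__ - ctypes.sizeof(ctypes.c_void_p)
ob_type = ctypes.c_void_p.from_address(id(name) + ob_type_offset).value

# The pointer stored in the object is the address of the str type object.
print(ob_type == id(str))
```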

The million dollar question

How is this possible?!

A word about reference counting
CPython uses reference counting for memory management. It is a simple scheme: when a new *name* is bound to an object (as in the example above with name and 'obj'), the reference count of that object increases, and when a reference to an object goes away (for example, when we use the del keyword to delete a reference), the reference count decreases. When the reference count of an object reaches zero, the VM deallocates the object.
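Reference counts can be observed with sys.getrefcount. The sketch below uses a throwaway class called Thing (a hypothetical name) so that a weak reference can be used to notice the deallocation; the absolute numbers vary between Python versions, so only the differences are meaningful.

```python
import sys
import weakref


class Thing:
    """Hypothetical class, used only because its instances support weak references."""


obj = Thing()
baseline = sys.getrefcount(obj)   # may include a temporary reference held by the call

alias = obj                       # a second name bound to the same object
after_bind = sys.getrefcount(obj)

del alias                         # one reference goes away
after_del = sys.getrefcount(obj)

print(after_bind - baseline, after_del - baseline)

# A weak reference does not keep the object alive, so it lets us see deallocation.
probe = weakref.ref(obj)
del obj                           # the last reference is gone
is_gone = probe() is None
print(is_gone)
```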

Very well. So far we have said that a pointer to any value can be cast to PyObject, and we have covered the implementation detail of C structs that makes this work. But can we see it in the source code as well? Here are some examples:

What do they all have in common? Yes, exactly this macro:

This macro places an ob_base field of type PyObject at the start of each of them, which makes it possible to cast them to PyObject (per the explanation of C structs above).
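The layout can be poked at from Python for one concrete case. In CPython the float object is declared as PyObject_HEAD followed by a single C double; the sketch below assumes CPython, where id() is the object's address and object.__basicsize__ is the size of the PyObject header.

```python
import ctypes

x = 3.25
header_size = object.__basicsize__          # size of the PyObject header on this build

# Read the C double that sits right after the header (CPython-specific peek).
ob_fval = ctypes.c_double.from_address(id(x) + header_size).value
print(ob_fval)

# How many bytes a float adds on top of the shared header.
extra = float.__basicsize__ - header_size
print(extra)
```

The first header_size bytes are laid out the same way for every object, which is exactly why a pointer to a float can also be read as a pointer to PyObject.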

PyVarObject, the sister of PyObject

First, let’s see the source code of the list object:

There is no PyObject_HEAD here, but there is another macro:

It places an ob_base field of type PyVarObject in the object. But what is this PyVarObject that we call PyObject's sister?

This is it:

This struct is used for objects such as lists that have some *notion* of length: ob_size keeps the number of items the object holds. This is why the len function is O(1) for these objects 😁: it only has to read and return a value.
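As a rough illustration, ob_size can be read straight out of a tuple or a list with ctypes. This assumes CPython and assumes that ob_size is stored immediately after the PyObject header, whose size is taken from object.__basicsize__.

```python
import ctypes


def read_ob_size(obj):
    """Read the Py_ssize_t stored right after the PyObject header (CPython only)."""
    return ctypes.c_ssize_t.from_address(id(obj) + object.__basicsize__).value


items = ("a", "b", "c")
numbers = [1, 2, 3, 4]

tuple_size = read_ob_size(items)
list_size = read_ob_size(numbers)
print(tuple_size, len(items))
print(list_size, len(numbers))
```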

A structure for types

Types in CPython are defined by the _typeobject struct. This C struct is the basis of every type in CPython, and most of its many fields are pointers to C functions that implement the behavior of the type.

which is:

All these fields are nicely documented here:

But let’s check this structure with an example.

tuple type

First of all the tuple object:

And this is the tuple type:

  • The macro at the beginning of this struct:

Do you remember we said that we can cast all objects to PyObject? And that we introduced PyVarObject, the sister of PyObject, which can itself be cast to PyObject? And do you remember which fields PyObject has?

One field was ob_refcnt and the other was ob_type, and this macro sets exactly the ob_type of this struct! This is where we answer the million dollar question: yes, we are setting its metaclass to Python's type type.

This will cause us to see the following output:

type is the default metaclass of all Python classes. If we look at its source code:

This is the answer: the type of type is type!
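The chain is easy to follow from Python by asking for the type of a tuple, then the type of that, and so on:

```python
point = (1, 2)

# Follow ob_type three times: instance -> its type -> the type's type -> ...
chain = [type(point), type(type(point)), type(type(type(point)))]
print(chain)

# type is an instance of itself, and it is also an object like everything else.
print(type(type) is type, isinstance(type, object))
```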

Later in the article, I will explain why PyVarObject_HEAD_INIT is used to bring the ob_size field in this type.

  • tp_name field:

This field is used to name this type, which is tuple in our example. Actually, __name__ reads this field and returns it to us:

When the type lives in a module or package, this field also carries the module or package name, separated by dots:
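A quick way to see the effect from Python is to compare a builtin with a type that lives in a module. collections.OrderedDict is used here purely as an illustration; it is not taken from the article.

```python
import collections

# A builtin: no dots in the name, so the module is reported as "builtins".
print(tuple.__name__, tuple.__module__)

# A type that lives in a module: the part before the last dot becomes
# __module__, and the part after it becomes __name__.
ordered = collections.OrderedDict
print(ordered.__name__, ordered.__module__)
```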

  • tp_basicsize and tp_itemsize fields:

From the documentation:

Python uses these fields to know how much memory to allocate when creating an instance of this type (say, an empty tuple).

There are two cases for tp_itemsize. According to the documentation, for variable-size objects (like our example) this field must store the size of each item they hold (here, pointers to PyObject, which are 4 bytes on 32-bit and 8 bytes on 64-bit systems), and the total size is calculated as follows:

Remember I said above that I would explain why the PyVarObject_HEAD_INIT macro is used and why the ob_size field should exist? The documentation tells us:

It says that variable-length objects must have an ob_size field. Why? Because calculating the size of the object requires N, the *length* of that object, and that length is stored in ob_size. Look here:

You can see that the size grows by 8 bytes.
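The formula can be checked from Python, since tp_basicsize and tp_itemsize are exposed as __basicsize__ and __itemsize__. This is a sketch for CPython; sys.getsizeof also counts the garbage collector's bookkeeping header, which is why the code compares differences instead of absolute values.

```python
import sys

basic, item = tuple.__basicsize__, tuple.__itemsize__
print(basic, item)


def expected_size(n):
    """tp_basicsize + N * tp_itemsize, with N the number of items."""
    return basic + n * item


# __sizeof__ reports the object's own size, without the GC header.
matches = [((0,) * n).__sizeof__() == expected_size(n) for n in range(5)]
print(matches)

# Each extra item costs exactly one pointer.
growth = sys.getsizeof((1, 2)) - sys.getsizeof((1,))
print(growth == item)
```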

The second case for tp_itemsize is a type whose instances all have the same fixed size. In that case this field must be 0, like our own type:

  • The tupledealloc function for the tp_dealloc field:

The function assigned here destroys the object and frees its memory. Some interesting points from the documentation:

1. This function does not have to be defined for types whose instances are never deallocated, as is the case for the singletons None and Ellipsis!

2. For the second point, first see this code snippet from tupledealloc:

You can see a loop that runs once per item the tuple holds (Py_SIZE returns the length of the object) and drops one reference to each of those items. This is only natural: the tuple is being removed from memory, so there is no longer a tuple to hold them.

You see, when we put s in a tuple, one was added to its references, and when we removed the only name that pointed to that tuple, that reference was also subtracted from s.
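The effect described above can be reproduced with sys.getrefcount. In this sketch s is a plain object() created just for the demonstration; absolute counts differ between Python versions, so the code only looks at how the count changes.

```python
import sys

s = object()
before = sys.getrefcount(s)

t = (s,)                      # the tuple now holds one reference to s
inside = sys.getrefcount(s)

del t                         # the tuple is deallocated and releases its items
after = sys.getrefcount(s)

print(inside - before, after - before)
```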
Here is what the documentation says:

3. But the documentation also gives instructions on *how to write* this function:

All of which are followed in order in tupledealloc:

  • The tuplerepr function for the tp_repr field:

The documentation:

The important part there is one of the best practices for writing __repr__: the function should return a str that, under the right conditions, recreates the instance when passed to eval. What is less often mentioned is that when this is not possible, it should return a str enclosed in < and > that says what type the object is and what it holds.
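Both conventions are easy to see in action. The values below are arbitrary examples:

```python
t = (1, "two", 3.0)

# Case 1: the repr is valid Python source that rebuilds an equal object.
round_trip = eval(repr(t))
print(repr(t), round_trip == t)

# Case 2: no meaningful source form exists, so the repr is wrapped in < and >.
opaque = repr(object())
print(opaque)
```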

  • tuple_as_sequence values:

This field specifies the Sequence Protocol for tuple, and has these sub-slots:

This means:

  • Tuples have a len value and we can pass them to the len function
  • We can concatenate them with +
  • We can use repetition on them
  • We can get their items with index
  • And use the in operator
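The five capabilities listed above, written out as Python operations. The slot names in the comments are the standard sequence-protocol fields; they are there for orientation only, since for something like t[1] the interpreter may go through the mapping protocol instead.

```python
t = (10, 20, 30)

length = len(t)            # sq_length
joined = t + (40,)         # sq_concat
repeated = t * 2           # sq_repeat
second = t[1]              # sq_item (integer index)
present = 20 in t          # sq_contains

print(length, joined, repeated, second, present)
```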

  • tuple_as_mapping values:

If this surprises you, remember that we can use slices on tuples. So you might say

I must say that the function that makes subscripting tuples possible is part of this protocol:

If you look closely, it supports slices.
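From the Python side, an integer index and a slice are both just the argument passed to the subscript operation:

```python
t = (0, 1, 2, 3, 4)

by_index = t[2]                # an int is passed as the subscript
by_slice = t[1:4]              # a slice object is passed as the subscript
explicit = t[slice(1, 4)]      # the same slice, built by hand
stepped = t[::2]               # slices can carry a step as well

print(by_index, by_slice, explicit, stepped)
```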

  • tuplehash function:

This function also does not need any additional explanation:

But what about lists? If we look at list's tp_hash field:

Which brings us to:

It’s a familiar error, isn’t it? 😁
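The difference between the two types shows up immediately when calling hash from Python:

```python
# Equal tuples hash to the same value, so they work as dict keys and set members.
same_hash = hash((1, 2, 3)) == hash((1, 2, 3))
print(same_hash)

# Lists refuse to be hashed.
try:
    hash([1, 2, 3])
except TypeError as exc:
    message = str(exc)
print(message)

# At the Python level, an unhashable type has __hash__ set to None.
print(list.__hash__)
```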

  • PyObject_GenericGetAttr function:

According to the documentation:

But in short, this function implements the normal routine of getting an attribute from an object.
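A small sketch of what that normal routine looks like from Python. The Demo class and its attribute names are hypothetical; the point is the lookup order: data descriptors found on the type win first, then the instance's own __dict__, then other attributes found on the type.

```python
class Demo:
    kind = "class attribute"            # plain attribute on the type

    @property
    def locked(self):                   # property is a data descriptor
        return "from property"


d = Demo()
# Write directly into the instance dict, bypassing normal attribute assignment.
d.__dict__["locked"] = "from instance dict"
d.__dict__["kind"] = "instance attribute"

print(d.locked)        # the data descriptor on the type takes priority
print(d.kind)          # the instance dict beats a plain class attribute
print(Demo().kind)     # with nothing in the instance dict, the type is consulted

try:
    d.missing
except AttributeError:
    lookup_failed = True
print(lookup_failed)
```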

  • tp_doc field:

This field sets the value of __doc__ and is actually the same thing that the help function shows us:

  • tp_richcompare field and tuplerichcompare function:

The function placed in this field supports comparison operations for that type:
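For tuples, the comparison is lexicographic: items are compared pairwise, and the first differing pair decides the result. A few examples:

```python
checks = {
    "first difference decides": (1, 2) < (1, 3),
    "a prefix is smaller": (1, 2) < (1, 2, 0),
    "later items are ignored once decided": (2,) > (1, 99),
    "equal items, equal tuples": (1, 2) == (1, 2),
    "a tuple never equals a list": (1, 2) != [1, 2],
}
for label, result in checks.items():
    print(label, result)
```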

  • tp_iter field and tuple_iter function:

Suppose we want to write a class whose instances can be used in a for loop. What should we do? We have to implement the iterable protocol:

Put simply, iterable objects are objects that give us an iterator when we pass them to the iter function! The Python documentation also says that containers that want to be iterable must implement the __iter__ method, and in the C layer the type must put the appropriate function in the tp_iter slot:

So, so far we have realized (and of course we knew) that tuple is an iterable container!

Now, what is the iterator object?

An object that implements these two methods is called an iterator.
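A minimal pure-Python sketch of both roles, using two hypothetical classes: Countdown is the iterable (it only knows how to hand out iterators), and CountdownIterator is the iterator (it implements __iter__ and __next__).

```python
class Countdown:
    """Iterable: each call to iter() returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: remembers where it is and produces one item per next() call."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self                     # iterators are iterable too

    def __next__(self):
        if self.current <= 0:
            raise StopIteration         # signals that there are no more items
        value = self.current
        self.current -= 1
        return value


for number in Countdown(3):
    print(number)
```

Because Countdown builds a new iterator every time, the same object can be looped over more than once.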

Let’s review: if we want to run, say, a for loop over an object, that object must have __iter__. What does __iter__ return? An iterator. What does an iterator do? Every time we call next on it (the for loop does this for us), it returns an item, and when there are no more items, it raises the StopIteration exception. Now let’s see how this happens in tuples.

First of all __iter__:

which is this function:

Exactly as the doc said, it’s just returning an iterator.

Now let’s see _PyTupleIterObject type:

A PyObject_HEAD, which we already talked about, an index, and a pointer to a PyTupleObject. That’s it.

But those of us who have read this far know that this is an object, so what is the one thing it *must* have? Yes, a type (so that its ob_type can be set to something).

Now what is the type of this PyTupleIterObject:

What is important is exactly this tupleiter_next function. Let’s see this function:

It is very simple. If you look closely, it takes a PyTupleIterObject, uses its index field to fetch the next item with PyTuple_GET_ITEM, and returns it.
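The same logic can be modeled in Python. This is only a model of the behavior described above, not the actual C source; the class name TupleIterModel and the attribute names it_index and it_seq are chosen for illustration.

```python
class TupleIterModel:
    """Python model of a tuple iterator: an index plus a reference to the tuple."""

    def __init__(self, seq):
        self.it_index = 0
        self.it_seq = seq

    def __iter__(self):
        return self

    def __next__(self):
        seq = self.it_seq
        if seq is None:                     # already exhausted earlier
            raise StopIteration
        if self.it_index < len(seq):
            item = seq[self.it_index]       # plays the role of PyTuple_GET_ITEM
            self.it_index += 1
            return item
        self.it_seq = None                  # drop the tuple once we run out
        raise StopIteration


for item in TupleIterModel(("a", "b", "c")):
    print(item)
```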

  • tp_methods field

This field stores the methods of the type. For example, tuples have the index method, which is defined here:

This macro is:

which finally reaches this function (the tuple_index function calls this function):

You can see a for loop that checks the items one by one and returns the index if found.
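That loop can be modeled in Python as a plain linear search. The function name tuple_index_model is hypothetical, and the handling of start and stop is an approximation of how the builtin treats them, so treat this as a sketch of the idea.

```python
def tuple_index_model(items, value, start=0, stop=None):
    """Linear search that mirrors what tuple.index does at the Python level."""
    n = len(items)
    if stop is None:
        stop = n
    # Negative bounds count from the end and are clamped at zero.
    if start < 0:
        start = max(start + n, 0)
    if stop < 0:
        stop = max(stop + n, 0)
    for i in range(start, min(stop, n)):
        item = items[i]
        # Identity is checked first, then equality.
        if item is value or item == value:
            return i
    raise ValueError("tuple.index(x): x not in tuple")


letters = ("a", "b", "c", "b")
print(tuple_index_model(letters, "b"), letters.index("b"))
print(tuple_index_model(letters, "b", 2), letters.index("b", 2))
```

The search is O(n): in the worst case every item has to be compared before the ValueError is raised.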

  • tp_new field:

And finally, the function that creates a new tuple object:

This function is responsible for creating a new tuple object; it is effectively the __new__ of that type.

Relationship between types and objects

Simply defining a C struct that says which fields an object is supposed to have does not create a new object or type in Python. One of the essential fields of PyObject is ob_type, which points to the most important struct of all: the one that defines the object, specifies how it is created and destroyed, tells which protocols it supports, and so on. So we can conclude: by taking an object struct (which has PyObject_HEAD or a variant like PyVarObject_HEAD) and setting its ob_type field to a PyTypeObject struct, we get a new object or type in Python that we can use.

😂😂😂 which becomes something like this:

If you want to write your own type and use it in Python, this tutorial can help you:

https://docs.python.org/3/extending/newtypes_tutorial.html

I hope you enjoyed the article.
