How list.append is implemented in C-layer of CPython interpreter?

Mahdi Haghverdi
13 min readJul 19, 2023

--

Usually, the list data type is one of the first data types taught in Python tutorials, and accordingly, the append method is one of the first method calls that are performed by us. But here is the CPython list and we want to deep dive to CPython and see what actually appends an object to a list in Python.

In this article we examine:

  • Theoretically, how Python lists are dynamic size?!
  • What was the algorithm for Python 3.8 and earlier? And how did it change from Python 3.9 and later?
  • How exactly are these algorithms implemented in the interpreter?

Theoretically, how Python lists are dynamic size?!

In this part of the article, we will examine a concept called dynamic arrays and how to implement them!

Concept of Dynamic Array

Wikipedia:
In
computer science, a dynamic array, growable array, resizable array, dynamic table, mutable array, or array list is a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard libraries in many modern mainstream programming languages. Dynamic arrays overcome a limit of static arrays, which have a fixed capacity that needs to be specified at allocation.

A dynamic array is not the same thing as a dynamically allocated array or variable-length array, either of which is an array whose size is fixed when the array is allocated, although a dynamic array may use such a fixed-size array as a back end.[1]

How is it done?

Very simply, if we want to look at the issue, this is how we occupy a certain amount of memory, supposedly in a C Array and supposedly the size of 4 houses, then we store our objects in it, and when the 4 houses are full, we come to a C Array We will build more, but for example, the size of 8 houses, and then we pour all the amounts into it, and so on until the end.

Wikipedia:
Bounded-size dynamic arrays and capacity
A simple dynamic array can be constructed by allocating an array of fixed-size, typically larger than the number of elements immediately required. The elements of the dynamic array are stored contiguously at the start of the underlying array, and the remaining positions towards the end of the underlying array are reserved, or unused. Elements can be added at the end of a dynamic array in
constant time by using the reserved space, until this space is completely consumed. When all space is consumed, and an additional element is to be added, then the underlying fixed-size array needs to be increased in size. Typically resizing is expensive because it involves allocating a new underlying array and copying each element from the original array. Elements can be removed from the end of a dynamic array in constant time, as no resizing is required. The number of elements used by the dynamic array contents is its logical size or size, while the size of the underlying array is called the dynamic array’s capacity or physical size, which is the maximum possible size without relocating data.[2]

A fixed-size array will suffice in applications where the maximum logical size is fixed (e.g. by specification), or can be calculated before the array is allocated. A dynamic array might be preferred if:

  • the maximum logical size is unknown, or difficult to calculate, before the array is allocated
  • it is considered that a maximum logical size given by a specification is likely to change
  • the amortized cost of resizing a dynamic array does not significantly affect performance or responsiveness

Like this picture

So, something like this must be implemented somewhere in Python’s list, which we will check in this article in two versions of Python (3.8 and 3.11) by looking at the interpreter C codes.

But there is another issue, which is the rate at which this increase is done?! According to the table in Wikipedia, the growth rate of this type of array in different languages ​​is as follows:

Python 3.8

First of all, let’s take a look at the list object itself.

If you want to learn about PyObject and get understanding about Python objects and type, read this article:

PyListObject:

its code:

PyListType:

In studying the code of the listobject.c file, the first sign and function that is found to perform the append operation is this:

Very simple, first it takes the list object and the object that should be appended to it, then it calls the app1 function, if the output of app1 is zero, it means that appending is done well and the value None is returned, if not, the value NULL indicates an error and will be returned.

Let’s look at the app1 function:

The most important part of which is line 346! Well, we know that we have to do a resize to add something to the underlying array of the PyListObject.

This is the whole list_resize function:

Mr. Raymond Hettinger optimized this function about 20 years ago and his method was used until Python 3.8:

Tracing CPython functions during an append.

Now that we have all the functions to append to a list, let’s run this code:

In order to fully understand what is happening, I wrote a minimal version of Python based on the C codes above, so that we can use a good debugger.

Here are the codes we are going to use:

I didn’t use Python tricks and methods to write exactly the same C codes, so let’s trace this:

At first, everything is according to the default things written in the PyListObject class:

Then the append method is called and this method calls the list_append function:

list_append function:

Then it calls the app1 function:

What happens:

The first line of the function gets the len value of the list. Then it checks whether it has reached the maximum size of a list, (which I think is the largest 32-bit number, but I am not sure) if it is, it will set and exception to be raised.

Then our execution process reaches the list_resize function:

What happens:

First, it takes the allocated value, then it checks to see if it can continue without resizing the list, that is, whether the list still has empty space or not! The condition checks if allocated is greater than or equal to new_size and new_size is greater than or equal to the half of allocated value. In our case if condition is false to we will continue.

Then the new_allocated value is calculated:

According to the debugger window, the value is 4. The calculation has a simple formula that says:

The new size of the list (len) + the integer part of one eighth of the new value + (3 if the new value is less than 9 and otherwise 6)

Then we come to the next condition that checks whether new_allocated has become too big or not, and if it does, it will throw an exception, which is not the case in our code! (Of course, I should mention that 1 in the C code was sizeof(PyObject *) , I just wanted to keep things simple but the actual values that are stored in the underlying array, are just PyObject pointers)

Then:

It checks again if the new size is small enough, and if it is, it does the important thing: copying the values ​​to a location with the new size. This is done by PyMem_Realloc function. This function uses realloc function to allocate new bytes of memory for the underlying array.

For that purpose I explained a new value must be calculated, which is the amount of bytes that the next function must take from the memory, the actual value of this in the CPython code is new_allocated * sizeof(PyObject *), which we set equal to one:

Inside the PyMem_Realloc function (in my Python version):

Well, what I did was to add the required number (new_allocated_byted — len(self), which means, for example, now my list has 4 members, and new_allocated says it should be 8, which means 4 empty places should be added to it) And I used ‘*’ to show the empty place for new values:

A series of simple checks are done and we arrive at:

You can see that the ob_item value of this list is updated and then:

Now the len value shown by the ob_size field has been updated and:

The allocated value is also updated and zero means the operation is done successfully!

But when is the object going to be added to the list? Right here:

which is this function:

It puts the value of “Py 3.8” in the first index of the list and returns 0 and list_append finally returns None .

Appending to a list was implemented so simply and beautifully :D But this was an empty list, obviously a resize should be done, what about list s that have values in them? Here’s what happens:

This is what happens and if we want to look at it in a much higher number, the grouth rate and the way it grows will be:

What do these numbers mean?

These numbers show that when the size of our list is 184 bytes, in the resize step it will add 8 new places to the underlying array, or for example, when the size of the list reaches 520 bytes, list_resize adds 12 empty cells to the ob_item ; or in 760 bytes, 14 empty places are added to ob_item and etc.

This graph is drawn from the data generated by my Python implementation of append using this function:

Now let’s see the results of our own implementation with the results of running this code:

Let’s see which is run with Python 3.8:

They seem really similar, so let’s put them together:

Yes, exactly overlapping :)) I couldn’t believe that we would get exactly the same results :D

Now let’s look at Python 3.11 list.append implementation.

Python 3.11

In the git blame listobject.c file of Python’s main branch, something like this can be seen:

Someone 3 years ago, for Python version:

Core developers discussion is here for more detail:

What is pull request brought to CPython source code:

You can see that both growth pattern and the new_allocated calculation formula have changed, the reasons are mentioned in the above bug tracker, and the results are significant.

First, let’s see what changes have been made to appending to a list in Python in version 3.11:

Several new functions to handle appending have been added to Python, the order of their calling and their code is as follows:

_PyList_AppendTakeRef :

Similarly, this function has a condition after checking several things and getting the len and allocated values! That is, if allocated was more than the len, increase len by one and put the value in the list and that’s it! But if the allocated value is not greater than len, we go to call the next function:

First, it gets the value of len, then it checks whether allocated is -1 (which was written in PyListObject, this happens when we call list.sort for example) or whether allocated is equal to len if so, it calls list_resize function:

So far, we haven’t had any changes, but according to the pull request given by that person, he changed the formula:

Again, let’s run the code to understand how this new formula works.

I again wrote the Python vresion of this new logic which is in the same repository.

The execution is as follows:

Now let’s see what happens if the allocation is big enough:

This was the execution flow of list.append in Python version 3.11. But one question is left: What was the effect of this change?

Let’s put the two graphs generated from Python numbers 3.11 and 3.8 on top of each other and see the difference clearly:

The orange line is Python 3.11 :)

Another small explanation: the orange line says: when the size of the list changes from 760 bytes to 904 bytes, for each resize, only 16 empty places are added to ob_item and so on.

But these numbers are only for 128 numbers, let’s increase the number to 460 objects

This is a comparison between the growth rates of these two algorithms.

In this article, I tried to show you what really happens in an append very simply, with simpler code and step-by-step results.

I hope you enjoyed it!

--

--

Mahdi Haghverdi

Young Python Enthusiast | Writes about Python, CPython and software engineering.