I often use Python, but I really don’t care about the way how Python works internally. So today I focus on the Python list and explain inside the implementation of it. Python’s list.append and list.pop change the list size dynamically, which make them run fast in O(1) time. Please note that list.pop for the last item only takes constant time. In this post, I’ll show the reason why it has become possible.
We call the data structure like Python list the dynamic array and call normal array the static array. This post is structured as follows.
- What is a dynamic array?
- The way how dynamic array appends or pops an item
- How to balance append and pop
I explain these based on MIT’ Table Doubling, Karp-Rabin lecture.
1. What is a dynamic array?
Dynamic array resizes its size when we append or pop an item. Of course, resizing isn’t always the case. Dynamic array adjusts the balance of append and expand, pop and shrink for efficient resizing.
Dynamic array generates a new array and copies items from an old array when an item fills up an old array. The figure below shows an example of expanding an array. I separate append and copy for easily understood. Both appending an item and copying an item have the same achieve the same effect, though.
When expanding array size, the efficiency gets worse if you increase only a little or a lot. As shown in the figure above, when you expand size by only one, it’s a waste of operations. This is because every time you append an item to an old full array, all items in that array has to be copied to a new array. On the other hand, if you expand size a lot, it’s a waste of space which is empty because no item is stored.
In the dynamic array, when you pop an item from the state below, the dynamic array shrinks its size.
From the figure above, you can see the space is wasted. When shrinking the size, the efficiency gets worse if you decrease only a little or a lot. If you decrease a little, wasted space will increase. On the other hand, if decreasing a lot, you need to expand the size when you append a new item.
In the next section, I’ll explain how dynamic array avoids this wastefulness.
2. The way how dynamic array appends or pops an item
2.1 Appending an item
First, let us consider the situation that appending an item or expanding the size in the dynamic array. The dynamic array expands its size when it lacks space to append a new item. The dynamic array uses the growth factor to decides the size of a new array. Here the growth factor means how many times the size of the new array compared to the size of the old array. Each programming language has a different growth factor. This article of Wikipedia says the growth factor is 2 in C or Golang and 1.125 in Python. I use 2 as the growth factor in this post for simplicity.
Take a look at the time complexity that you append n items to the dynamic array with the growth factor, 2. I will describe the time complexity appending an item later. The size of the array varies below every time you append an item, assuming the initial array size is 2. I show the time of appending and copying at the left of the array.
The time complexity of appending n items becomes as follow.
The first half of the equation above stands the cost of appending n items, and the other half stands the cost of copying. The cost of copying equals the cost of expanding the array size. The time complexity becomes the total number of operations because appending an item or copying runs in O(1) time.
The equation below stands the geometric sequence, so we can deform it and the time complexity will be as follow. I ignore constant terms during deforming.
Finally, we get the time complexity of appending n items to a dynamic array as O(n) time. This means appending items to dynamic array runs in linear time while expanding its size at the same time.
We can think appending an item runs in linear time on average because the time complexity of appending n items is O(n). This kind of time complexity is called amortized complexity time. Also, the way to analyze the whole operation rather than each operation is called amortized analysis. By amortized analysis, list.append in Python takes linear time on average.
In general, we only care about the time to complete a whole task rather than the time of several operations. This is the reason why we use amortized analysis. For instance, in the task to store each line in a CSV file to an array, it is more important to store all lines than one line.
2.2 Popping an item
Dynamic array shrinks the array size by 1/(growth factor) along with popping an item. The wasted space will increase if the number of items is much less against the array size. This operation corresponds to list.pop in Python. Following the growth factor in appending an item, we assume dynamic array shrink the array size by half when popping an item. Popping an item from a dynamic array behaves as bellow, supposing the initial array size as 16.
Thinking the same way as appending an item, the time complexity of popping n items becomes O(n) time. So amortized time complexity is O(1) time, and popping an item from a dynamic array run in linear time while shrinking the array size at the same time.
3. How to balance appending and popping
In section 2, I explain both behaviors of appending and popping separately, we should take the efficiency of executing both operations into account. Appending and popping an item repeatedly in the below case will increase the wasted operations.
We can handle this problem to adjust the timing of expanding and shrinking. For example, we shrink the array size by half when the number of items equals the quarter of the array size. The expanding stays the same.
Even if you append or pop an item repeatedly, it won’t increase the operation times. This is because the timing of expanding and shrinking are different.
That’s all for the explanation. In this post, I explained how a dynamic array works of the implementation in Python. By amortized analysis, Python’ s list.append and list.pop run in linear time. If you are interested in other operations in the list or in other data structures in Python, I recommend you to visit TimeComplexity in Python wiki. Thank you for reading.