How Python’s List works so dynamically and efficiently: Amortized Analysis

Published in

Techspace

5 min readAug 16, 2019

Suppose you want to create an array but you don’t know about its size beforehand, this could be a problem because in computer languages you need to specify the size of an array while initializing it.

But, Python allows you to create a List(Pythonic way of saying array) which gives you freedom from defining size at initialization.

Best Coding Languages to Learn in 2019 | Data Driven Investor

During my years as an undergrad, I skipped many night-outs to pick up Java hoping it would one day help me get ahead in…

www.datadriveninvestor.com

Those who know python, Do you ever wonder how Python does that in an efficient way because when you think about it you will realize that it is a very computationally expensive and impossible task to define an array of size infinite. Also, your system cannot give large chunks of extra storage to a single list just to make sure that list never gets out of storage.

And, also the system cannot allocate list more storage later as it might have already allocated the adjacent storage to other variables.

So, How does Python manages List to store an indefinite amount of data without asking to explicitly mention its size?

Answer:

Python declares a size by itself and when the list fills completely it makes a new list with more space and copies the entire list’s elements to the new list of more size and deletes the smaller sized list.

What you are thinking right now: That’s a very bad idea because this would take Ω(n²) to insert every new element. And you would show me some mathematical proof which looks something like this:

Well, you are not wrong but I haven’t told you the whole answer yet. I am not saying that increasing the size by 1 every time a new element enters is the correct answer because this would be a very bad optimization.

Correct answer:

Instead of increasing size by 1, we increase the size by some predefined equation. To explain this, let’s first write one small script to know the size of the list at each length.

The output of this script shows the current length of the list corresponding to its size in bytes.

I run this script on 64-bit machine which stores addresses in 8 bytes, if you are on 32-bit machine your size would be different but the logic will be same.

You can get length of the list in constant time as python stores length of the list explicitly.

As you have noticed that when the length of the list is 0, the size is 64 bytes this is because python stores reference to the list and length prior to adding elements to it.

First, after adding first 4 elements into the list, the size remains 96 bytes in which 64 bytes are reserved so we get 96–64=32 bytes(4 elements can be stored).

When the 5th element added, the size becomes 128 bytes with the capacity of 8 elements.

When 9th element gets added, the size becomes 192 bytes with the capacity of 16 elements.

Now, when we add the 17th element, the size becomes 264 bytes with the capacity of 25 elements.

Size changes only when the length of list becomes: 1, 5, 9, 17, 26…

But, what is the relation between these sizes and how python knows which size should be picked next?

The equation python follows is :

when n ≥ 1 and n < 9, where n = current length of the list.

size = ceil(1.125 * n + 3)

If n ≥ 9 the equation slightly changes as :

size = ceil(1.125 * n + 6)

This equation will only be applied when we need to resize the list to get the next larger list of size calculated by this equation.

From the above equation, we can see that the time complexity of each append operation will be O(n). But, this analysis is incorrect.

Now, we will do its correct analysis

By Amortization:

When we calculated the complexity, we assumed that every time an element is inserted the append function gonna take O(n) due to resizing and copying the elements in larger list but in reality the only time append costs O(n) is that when resizing is needed other than that append function takes O(1) time.

For example, when n = 4 and a new element is to be inserted the append will take O(n) time but now n = 5 and a new element will take O(1) time till n becomes n=8.

Therefore, the time complexity of each append call is O(1).

I can prove in practice that appends call is performed in constant amount of time by this script.

For running this script with different n, we are getting approximately the same average running time of append function thus verifying our theory.

How Python’s List works so dynamically and efficiently: Amortized Analysis

Best Coding Languages to Learn in 2019 | Data Driven Investor

During my years as an undergrad, I skipped many night-outs to pick up Java hoping it would one day help me get ahead in…

Written by Aakash Bindal