Underneath Numpy array & Python list

Charles Patel · Published in DataDrivenInvestor · Jul 16, 2020

I have been working with NumPy for a long time, and I have always noticed that NumPy arrays outperform Python lists in both speed and memory usage. I want to share my findings about the compilation and execution processes behind these two data structures. Let's dig in!

There are many tutorials and blog posts out there that compare the Python list and the NumPy array, and you can refer to those for the basics. In this tutorial, we are going to explore how the two work internally, their advantages and disadvantages, and run some benchmarks to get a better idea.

Before we dive into the performance of the two data structures, we first need to understand how they work internally (down at the C level), since both the list and the NumPy array are ultimately implemented in C.

When Python is installed on your machine, it ships with, at a minimum, an interpreter and a support library. The interpreter is nothing but software that runs our Python scripts. Interestingly, it can be implemented in any programming language! CPython is the default interpreter for Python, and it is written in C.

What's all this fuss about CPython, Cython, Jython, and IronPython? Hold on until the end and get ready for some slightly advanced Python terminology. Let's first understand each of these in a bit of detail.

CPython

  • CPython is the reference implementation of the Python programming language. It is written in C and Python, and was created by Guido van Rossum, the author of Python himself, in the early 1990s
  • CPython transparently compiles your Python code into bytecode and then interprets that bytecode, executing it as it goes (see the quick example below). When people say "Python", they usually mean CPython, since it is the base implementation
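As a quick illustration of what "compiles to bytecode" means, you can ask CPython to show the bytecode it generates for a small function using the standard dis module (a minimal sketch, nothing NumPy-specific):

```python
import dis

def add(a, b):
    return a + b

# Show the bytecode instructions CPython generated for this function;
# the interpreter loop executes these one at a time.
dis.dis(add)
```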

Cython

  • CPython does not translate your Python code to C by itself. Instead, it runs an interpreter loop, and this is where Cython comes into play: it translates Python code to C.
  • Cython adds a few extensions to the Python language and lets you compile your code into C extension modules, i.e. code that plugs into the CPython interpreter.

Jython, IronPython, and PyPy are the current "other" implementations of the Python programming language; they are implemented in Java, C#, and RPython (a subset of Python), respectively. Jython compiles your Python code to Java bytecode, so your Python code can run on the JVM. IronPython lets you run Python on the Microsoft CLR. And PyPy, being implemented in (a subset of) Python, can often run Python code faster than CPython.

Now let's dive into the internal implementation of Python lists and NumPy arrays.

List

A list is the most versatile datatype available in Python, and everyone knows what a list can do and why we use it. For instance, a list can be used as a stack or as a container for values of different data types, and it offers basic operations such as append, extend, pop, remove, and so on.
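A tiny example of those everyday list operations (plain Python, nothing NumPy-specific):

```python
stack = []
stack.append("a")        # push
stack.append("b")
print(stack.pop())       # "b" -- last in, first out, so a list works as a stack

mixed = [1, "two", 3.0]  # a single list can hold objects of different types
mixed.extend([4, 5])     # append several items at once
mixed.remove("two")      # remove by value
print(mixed)             # [1, 3.0, 4, 5]
```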

But the question is: how is it implemented?

CPython's lists are variable-length arrays, not Lisp-style linked lists. CPython's implementation of the list uses a contiguous array of references (pointers) to other objects and keeps a pointer to this array, together with the array's length, in a list head structure. This makes indexing a list, a[i], an operation whose cost is independent of the size of the list or the value of the index.

When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.
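You can actually watch this over-allocation happen from Python. The exact sizes below depend on your CPython version and platform; the stepwise growth is the point:

```python
import sys

lst = []
last_size = sys.getsizeof(lst)
print(f"len= 0  size={last_size} bytes")
for i in range(32):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:
        # The size only jumps occasionally: CPython allocated spare slots,
        # so most appends reuse existing space instead of resizing.
        print(f"len={len(lst):>2}  size={size} bytes")
        last_size = size
```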

In CPython, this is captured by the PyListObject structure in C: it holds the pointer to the array of object references (ob_item), the number of items currently in use (ob_size), and the number of slots actually allocated (allocated). You don't need to read the C source to use lists, but if someone asks how a list works behind the scenes, you can at least say something about its object structure in C, and details are always better to know.

Now let’s take a look at how numpy works.

Numpy

In this section, we will see what happens under the hood when using the NumPy array.

A NumPy array is a multidimensional array of elements that all share the same type; it is basically described by its metadata (notably the number of dimensions, the shape, and the data type) and by the actual data.

The data is stored in a homogeneous and contiguous block of memory, at a particular address in system memory (Random Access Memory, or RAM). This block of memory is called the data buffer.

This is the main difference between an array and a pure python structure, such as a list, where the items are scattered across the system memory. This aspect is the critical feature that makes NumPy arrays so efficient.

Here's a quick look at what a NumPy array object looks like in C. The structure PyArrayObject has four members (*data, nd, *dimensions, *strides) that are needed in order to access the array's data from C code (the other fields are ignored here for brevity): data points to the start of the data buffer, nd is the number of dimensions, dimensions gives the size along each axis, and strides gives the number of bytes to skip to reach the next element along each axis.

NumPy array structure (PyArrayObject) in C
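You don't need C to see this metadata; the same fields are exposed on every array from Python. A small sketch (the stride values shown assume a C-contiguous float64 array):

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

print(arr.ndim)                    # nd: number of dimensions -> 2
print(arr.shape)                   # dimensions: size along each axis -> (3, 4)
print(arr.strides)                 # strides: bytes to step per axis -> (32, 8)
print(arr.dtype)                   # element type -> float64
print(arr.ctypes.data)             # data: address of the first byte of the buffer
print(arr.flags["C_CONTIGUOUS"])   # True: the buffer is one contiguous block
```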

Why is NumPy efficient?

  • Computations on arrays are written very efficiently in a low-level language such as C (and a large part of NumPy is actually written in C). Knowing the address of the memory block and the data type, looping over all items is just simple pointer arithmetic; doing the same over a Python list would carry significant overhead.
  • Spatial locality in memory access patterns results in performance gains, notably thanks to the CPU cache. The cache loads data from RAM in chunks, so adjacent items are loaded very efficiently (locality of reference).
  • The fact that items are stored contiguously in memory also allows NumPy to take advantage of the vectorized instructions of modern CPUs, such as Intel's SSE and AVX, AMD's XOP, and so on. For example, multiple consecutive floating-point numbers can be loaded into 128-, 256-, or 512-bit registers for vectorized arithmetic implemented as CPU instructions.
  • Finally, NumPy can be linked to highly optimized linear algebra libraries such as BLAS and LAPACK through ATLAS or the Intel Math Kernel Library (MKL); a few specific matrix computations may also be multithreaded, taking advantage of modern multicore processors (see the short snippet after this list).
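As a quick way to check the last point on your own machine, NumPy can report which BLAS/LAPACK build it was linked against, and an operation like matrix multiplication is dispatched to that library (whether it actually runs multithreaded depends on the build):

```python
import numpy as np

# Print the BLAS/LAPACK configuration this NumPy build was linked against
np.show_config()

# A matrix product like this is handed off to the linked BLAS, which may
# use vectorized instructions and multiple threads depending on the build.
a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)
c = a @ b
```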

However, NumPy has some disadvantages too. It requires a contiguous block of memory, and insertion and deletion operations become costly because the data is stored in contiguous memory locations: growing or shrinking the array generally means allocating a new buffer and copying the data over.
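A small illustration of that cost: np.insert cannot modify the array in place, so it builds and returns a brand-new array, while list.insert works on the existing list:

```python
import numpy as np

arr = np.arange(5)
new_arr = np.insert(arr, 2, 99)   # allocates a new buffer and copies everything
print(arr)        # [0 1 2 3 4]   -- the original array is unchanged
print(new_arr)    # [ 0  1 99  2  3  4]

lst = [0, 1, 2, 3, 4]
lst.insert(2, 99)                 # modifies the existing list in place
print(lst)        # [0, 1, 99, 2, 3, 4]
```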

In conclusion, storing data in a contiguous block of memory ensures that the architecture of modern CPUs is used optimally in terms of memory access patterns, CPU cache, and vectorized instructions, which is not possible with a Python list.

Python's lists support efficient insertion, deletion, appending, and concatenation, and Python's list comprehensions make them easy to construct and manipulate.

However, Python's lists have certain limitations:

  • Python's lists do not support "vectorized" operations such as element-wise addition and multiplication.
  • The fact that a Python list can contain objects of differing types means that Python must store type information for every element and must execute type-dispatching code when operating on each element, which leads to a type check at every iteration (see the short example after this list).
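Here is the difference in behaviour that these two points describe: the list happily mixes types but cannot do element-wise math, while the array fixes one dtype and vectorizes:

```python
import numpy as np

mixed = [1, "two", 3.0]        # a list stores full Python objects; types can differ
arr = np.array([1, 2, 3])      # an array stores raw values sharing one dtype
print(arr.dtype)               # e.g. int64 (platform dependent)

print(arr * 2)                 # [2 4 6]  -- element-wise multiplication
print([1, 2, 3] * 2)           # [1, 2, 3, 1, 2, 3]  -- the list is just repeated
```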

So now we know why NumPy is so fast and efficient compared to Python's lists. Next, let's compare their performance at a very high level.

Performance Comparison

In this section, we are going to benchmark the Python list and the NumPy array in terms of memory consumption and execution time. We first import the necessary modules and then compare the two for different sizes. In each example, we take the ratio between the time/memory used by the Python list and by the NumPy array and see which performs better. The ratio is computed as python_list : numpy_array, so the result shows how many times better NumPy is than the Python list.

Memory Consumption

Here's a quick example that measures the memory needed to store lists and arrays of different sizes.
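A minimal sketch of such a measurement: it counts the list object plus every integer object it points to, and the raw data buffer (arr.nbytes) for the array. The exact numbers vary across Python versions and platforms:

```python
import sys
import numpy as np

sizes = [1, 10, 100, 1_000, 10_000, 100_000]
ratios = []
for n in sizes:
    lst = list(range(n))
    arr = np.arange(n)
    # The list itself holds only pointers; the int objects live elsewhere,
    # so their sizes have to be added as well.
    list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
    # For the array, count the contiguous data buffer (ignoring the small,
    # fixed-size header of the ndarray object).
    array_bytes = arr.nbytes
    ratios.append(list_bytes / array_bytes)
    print(f"n={n:>7}  list={list_bytes:>10} B  array={array_bytes:>10} B  "
          f"ratio={ratios[-1]:.1f}x")
```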

Let's plot the ratio between the two as a graph to visualize this better.
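A small plotting sketch, reusing the sizes and ratios computed above (matplotlib is assumed to be installed):

```python
import matplotlib.pyplot as plt

plt.plot(sizes, ratios, marker="o")
plt.xscale("log")
plt.xlabel("number of elements")
plt.ylabel("memory ratio (list / NumPy array)")
plt.title("Memory consumption: Python list vs NumPy array")
plt.show()
```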

The graph shows that NumPy outperforms the list for all sizes. This is because Python stores a full object, including type information, for each element, which requires much more space than NumPy, where all elements share a single type.

Time Comparison

The example below computes the square of every element of the list/array and measures the time taken.
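A sketch of that measurement using the standard timeit module; absolute times depend heavily on the machine, so it is again the ratio that matters:

```python
import timeit
import numpy as np

for n in [1, 10, 100, 1_000, 10_000, 100_000]:
    lst = list(range(n))
    arr = np.arange(n)
    # Square every element: a list comprehension vs. a vectorized ufunc call.
    t_list = timeit.timeit(lambda: [x ** 2 for x in lst], number=100)
    t_numpy = timeit.timeit(lambda: arr ** 2, number=100)
    print(f"n={n:>7}  list={t_list:.6f}s  numpy={t_numpy:.6f}s  "
          f"ratio={t_list / t_numpy:.1f}x")
```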

In the results above, the Python list performs better only for size = 1; in all other cases NumPy wins. The reason is that although the list operation is fast (both ultimately run C code), NumPy carries a fixed per-call overhead: besides checking the dtype, it also has to process the other header information we saw in its C structure (dimensions, strides, and so on) and set up the vectorized loop, which outweighs the actual work when there is only a single element.

The second example is a vectorized operation: it adds the elements of two arrays element-wise.
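Another sketch along the same lines, adding two sequences element by element:

```python
import timeit
import numpy as np

for n in [1, 10, 100, 1_000, 10_000, 100_000]:
    lst1, lst2 = list(range(n)), list(range(n))
    arr1, arr2 = np.arange(n), np.arange(n)
    # Element-wise addition: zip + comprehension vs. a single vectorized '+'.
    t_list = timeit.timeit(lambda: [a + b for a, b in zip(lst1, lst2)], number=100)
    t_numpy = timeit.timeit(lambda: arr1 + arr2, number=100)
    print(f"n={n:>7}  list={t_list:.6f}s  numpy={t_numpy:.6f}s  "
          f"ratio={t_list / t_numpy:.1f}x")
```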

In the results above, NumPy is the out-performer for all sizes because of its ability to operate element-wise over the whole buffer, and, thanks to highly optimized linear algebra libraries, it can also take advantage of multiple threads for some operations. There are a lot of factors at play here, though, including the underlying library used (BLAS/LAPACK/ATLAS).

Finally, I hope this shed some light on the inner workings of Python's dynamically allocated list and NumPy's array.

Thanks for reading. Peace.
