The Reasons Behind Numpy’s Speed are Often Misunderstood

Articles online give misleading, even incorrect, reasons. Part 1

Sep 19, 2022

Notebooks can be found on my GitHub

In my research (see References), I found that articles online give (a variation of) these 3 reasons for Numpy's superior performance. Let's go over them one by one.

1. Parallelism

Assumption: Numpy executes code in parallel, speeding up calculations. Found in various GeeksForGeeks and Medium articles.

Wrong.

We can test this by trying a long-running Numpy operation, like a matrix multiplication; the bread and butter of Numpy. Let’s open htop and watch the CPU load by core. If Numpy is running in parallel, we should expect all CPU cores to have high utilization.
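The experiment can be sketched roughly like this (the matrix size is an arbitrary choice, just big enough for the multiplication to run for a noticeable while):

```python
import numpy as np

# Arbitrary size: large enough that the multiply takes a moment.
n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# While this runs, watch per-core utilization in htop.
c = a @ b
print(c.shape)
```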

Let’s see what actually happens:

Numpy CPU usage

And now let’s look at Dask, an excellent parallel computation library and in many ways a straight-up improvement over Numpy.
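The same multiplication in Dask can be sketched like this (the chunk size here is an arbitrary choice; Dask splits the arrays into chunks and multiplies them in parallel):

```python
import dask.array as da

# Same experiment as the Numpy one, but chunked for parallel execution.
n = 2000
a = da.random.random((n, n), chunks=(500, 500))
b = da.random.random((n, n), chunks=(500, 500))

# .compute() executes the chunked matmul across all cores.
c = (a @ b).compute()
print(c.shape)
```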

Dask CPU usage

This is what parallel execution is supposed to look like: full utilization of your CPU. Therefore, parallelism is NOT why Numpy is fast.

2. Numpy arrays use far less memory than lists — from here

True. I was surprised the difference was this big: Python lists take up 5x more memory than Numpy arrays in my test below. Note: you cannot use sys.getsizeof() to get the correct size of container objects. See this Stackoverflow post.

Python lists vs. Numpy memory usage
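A minimal sketch of such a measurement (sizes are illustrative). For a list, sys.getsizeof only counts the array of pointers, so we also add the size of every int object the list references:

```python
import sys
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# sys.getsizeof(py_list) covers only the pointer array;
# each referenced int object costs ~28 bytes on top of that.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
arr_bytes = np_arr.nbytes  # one contiguous buffer of 8-byte int64s

print(list_bytes / arr_bytes)
```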

3. Contiguity, or Why Swapping Lists for Numpy Arrays Isn’t Going to Save You

Argument for contiguity: if all data are “neighbors” in memory (memory locality), computation is faster.

But in reality: if all data are “neighbors” in memory, computation CAN BE faster. You need to work for it. Unfortunately, most people would still use a for loop. Check this out: just summing up the elements of lists vs. Numpy arrays:

Numpy slower than lists!!

Numpy is actually 50% slower than Python lists at just iterating over indices like this! See this great Stackoverflow post for why.
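A rough version of that comparison, summing by index in a plain Python loop (sizes are arbitrary). The point is that indexing a Numpy array element by element boxes each value into a Python scalar object, which is slower than indexing a list:

```python
import time
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n)

def loop_sum(seq):
    total = 0
    for i in range(len(seq)):
        total += seq[i]  # per-element indexing defeats contiguity
    return total

t0 = time.perf_counter(); s1 = loop_sum(py_list); t_list = time.perf_counter() - t0
t0 = time.perf_counter(); s2 = loop_sum(np_arr); t_arr = time.perf_counter() - t0
print(f"list: {t_list:.3f}s, numpy: {t_arr:.3f}s")
```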

Bad Numpy usage example — from a Medium Article!

Tons of examples online show how NOT to use Numpy. Example from here. Let’s study what’s wrong with it.

Bad benchmark example. Found here

There’s a major flaw in this test: we want to quantify how fast Python lists vs. Numpy arrays sum 1B numbers. But crucially, in the list version we’re also timing Python’s built-in range function, and in the Numpy version we’re also timing Numpy’s arange function. These two are NOT equivalent!

For one, Python’s range only works with integer steps, while Numpy’s arange also accepts floats, among other differences. As shown below, arange is 10x slower! This destroys any comparative value of the test.

Numpy arange() is much slower than Python range()
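The two functions differ in more than speed; a small sketch of the behavioral differences:

```python
import numpy as np

# range is integer-only and lazy: it stores just start/stop/step.
r = range(0, 10, 2)
print(list(r))  # 0, 2, 4, 6, 8

# arange also accepts float steps, and it allocates the full array up front.
a = np.arange(0.0, 1.0, 0.25)
print(a)  # 0.0, 0.25, 0.5, 0.75
```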

The second mistake is running the benchmark only once. You cannot confirm a hypothesis from a single observation. It’s much better to use %%timeit in a Jupyter Notebook, which runs the code several times and reports the mean and standard deviation.
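Outside a notebook, the standard-library timeit module gives the same repeated-measurement behavior (the repetition counts here are arbitrary):

```python
import timeit
import numpy as np

arr = np.arange(1_000_000)

# timeit.repeat runs the statement many times per trial, over several
# trials, so you can look at the spread of timings instead of one sample.
times = timeit.repeat(lambda: arr.sum(), number=100, repeat=5)
print(min(times), max(times))
```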

Also, we need to use Numpy functions as much as possible to see the most benefit. “Let the Numpy do the work”, to misquote Gordon Ramsay.

Same example as above, but done right: Numpy isn’t just 50% faster, it’s nearly 100x faster!

Example above, but much improved
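A sketch of the fixed benchmark: construction happens outside the timed section, the measurement is repeated, and the summation itself is delegated to Numpy (sizes and repetition counts are arbitrary):

```python
import timeit
import numpy as np

n = 1_000_000
py_list = list(range(n))  # built OUTSIDE the timed code
np_arr = np.arange(n)

# Time only the summation, best of several repeats.
t_list = min(timeit.repeat(lambda: sum(py_list), number=10, repeat=5))
t_np = min(timeit.repeat(lambda: np_arr.sum(), number=10, repeat=5))
print(f"speedup: {t_list / t_np:.0f}x")
```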

Join me in Part 2, where I explain at a fundamental level why Numpy is so fast. Hint: my 2nd reference. I’m writing the article as we speak and it will be up very soon.

Notebook where I ran all my experiments can be found here.

References

Written by Ariel Lubonja

I am a PhD student in Computer Science at Johns Hopkins University. Area: High Performance Computing, Graph Machine Learning
