# Advantages of using NumPy over Python Lists

## Features and performance gains of using NumPy for numerical operations

In this article, I will show a few neat tricks that come with NumPy and are much faster than vanilla Python code.

# Memory usage

The most important gain is lower memory usage. This comes in handy when implementing complex algorithms and in research work.

```python
import numpy as np

array = list(range(10**7))
np_array = np.array(array)
```

I found a `get_size` helper on a blog and will use it to compute the size of the objects in this article.
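A helper of this kind can be sketched as follows. This is an assumed reconstruction built on `sys.getsizeof`, not necessarily the exact snippet behind the numbers in this article, so the byte counts it reports may differ. The names `small_list` and `small_array` are illustrative.

```python
import sys

import numpy as np


def get_size(obj, seen=None):
    """Recursively estimate the memory footprint of obj in bytes."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_size(k, seen) + get_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(get_size(item, seen) for item in obj)
    return size


# Smaller inputs than in the article, so the sketch runs quickly
small_list = list(range(10**5))
small_array = np.array(small_list)
print(get_size(small_list), "bytes for the list")
print(get_size(small_array), "bytes for the NumPy array")
```

For a list, the helper adds the size of every contained integer object to the size of the list itself. For a NumPy array that owns its data, `sys.getsizeof` already includes the data buffer.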

```
get_size(array) ====> 370000108 bytes ~ 352.85MB
get_size(np_array) =>  80000160 bytes ~  76.29MB
```

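This is because a NumPy array is a fixed-length array that stores its elements as fixed-size values of a single type in one contiguous buffer (8 bytes per `int64` here). A vanilla Python list is extensible and stores only pointers to separate integer objects, each of which carries its own object overhead.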
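The per-element cost can be inspected directly. The following is a minimal sketch on a ten-element array; `py_list` is an illustrative name, and the size of a Python integer object depends on the interpreter build, so it is printed rather than assumed.

```python
import sys

import numpy as np

np_array = np.arange(10, dtype=np.int64)

# Each element occupies a fixed number of bytes inside one contiguous buffer
print(np_array.dtype, np_array.itemsize, np_array.nbytes)

# A list slot holds only a pointer; the integer object itself lives elsewhere
py_list = list(range(10))
print(sys.getsizeof(py_list[1]))  # size of a single Python int object
```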

# Speed

Speed is another very important property of a data structure. Why do NumPy operations take so much less time than vanilla Python? Let’s have a look at a few examples.

## Matrix Multiplication

In this example, we will look at a scenario where we multiply two square matrices.

```python
from time import time

import numpy as np


def matmul(A, B):
    # Naive triple-loop multiplication of two N x N matrices
    N = len(A)
    product = [[0 for x in range(N)] for y in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                product[i][j] += A[i][k] * B[k][j]
    return product


matrix1 = np.random.rand(1000, 1000)
matrix2 = np.random.rand(1000, 1000)

t = time()
prod = matmul(matrix1, matrix2)
print("Normal", time() - t)

t = time()
np_prod = np.matmul(matrix1, matrix2)
print("Numpy", time() - t)
```

The observed times were as follows (yours will vary with hardware and NumPy build):

```
Normal 7.604596138000488
Numpy 0.0007512569427490234
```

We can see that the NumPy implementation is roughly 10,000 times faster. Why? Because NumPy relies on under-the-hood optimizations such as transposing and chunked (blocked) multiplication. Furthermore, the operations are vectorized: the loops run in compiled code rather than in the Python interpreter, so they complete much faster. NumPy uses a BLAS (Basic Linear Algebra Subprograms) library in its backend. Hence, it is important to install a NumPy build whose BLAS binaries suit your hardware architecture.
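Before trusting a speed comparison, it helps to check that both implementations compute the same thing. The sketch below does that on small matrices. The name `matmul_loops`, the 30 x 30 size, and the fixed seed are illustrative choices, not part of the benchmark above.

```python
import numpy as np


def matmul_loops(A, B):
    """Naive triple-loop product of two N x N matrices (lists of lists)."""
    N = len(A)
    product = [[0.0 for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                product[i][j] += A[i][k] * B[k][j]
    return product


rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
A = rng.random((30, 30))
B = rng.random((30, 30))

loop_result = np.array(matmul_loops(A.tolist(), B.tolist()))
np_result = np.matmul(A, B)

print(np.allclose(loop_result, np_result))  # do both methods agree?
print(np.allclose(np_result, A @ B))        # `@` is the operator form of matmul
```

`np.allclose` is used instead of `==` because the two methods may sum the terms in a different order, which can change the last few floating-point digits.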

## More Vectorized Operations

Vectorized operations are operations applied to an entire array at once rather than element by element in a Python loop. Examples include the dot product, the transpose, and other matrix operations. Let’s have a look at the following example, in which we compute an element-wise product.

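```python
from time import time

import numpy as np

vec_1 = np.random.rand(5000000)
vec_2 = np.random.rand(5000000)

# Element-wise product with a Python list comprehension
t = time()
dot = [float(x*y) for x, y in zip(vec_1, vec_2)]
print("Normal", time() - t)

# Element-wise product with a single vectorized expression
t = time()
np_dot = vec_1 * vec_2
print("Numpy", time() - t)
```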

The timings for each operation were:

```
Normal 2.0582966804504395
Numpy 0.02198004722595215
```

We can see that the NumPy implementation is much faster, by a factor of more than 90 in this run.
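The element-wise product should not be confused with the dot product, which sums those products into a single number. A small sketch on hand-picked vectors (the values are illustrative) shows the difference, along with a transpose:

```python
import numpy as np

vec_1 = np.array([1.0, 2.0, 3.0])
vec_2 = np.array([4.0, 5.0, 6.0])

elementwise = vec_1 * vec_2       # [4., 10., 18.]
dot = np.dot(vec_1, vec_2)        # 1*4 + 2*5 + 3*6 = 32.0

matrix = np.arange(6).reshape(2, 3)
transposed = matrix.T             # shape (3, 2)
print(elementwise, dot, transposed.shape)
```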

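NumPy also provides much faster operations between an array and a scalar, or between arrays of different but compatible shapes. These are called broadcast operations: the smaller operand is conceptually stretched over the entire array, and the loop runs in compiled code that can use SIMD instructions such as Intel AVX where the hardware supports them.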

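```python
from time import time

import numpy as np

vec = np.random.rand(5000000)

# Multiply every element by 5 in a Python loop
t = time()
mul = [float(x) * 5 for x in vec]
print("Normal", time() - t)

# Broadcast the scalar 5 over the whole array
t = time()
np_mul = 5 * vec
print("Numpy", time() - t)
```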

Let’s see how the running times look:

```
Normal 1.3156049251556396
Numpy 0.01950979232788086
```

Roughly 67 times faster!
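Broadcasting is not limited to scalars. As a minimal sketch with illustrative values, a one-dimensional array can be combined with each row of a matrix, as long as the trailing dimensions match:

```python
import numpy as np

matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])      # shape (2, 3)

scaled = 5 * matrix                        # scalar stretched over every element

row = np.array([10.0, 20.0, 30.0])         # shape (3,)
shifted = matrix + row                     # row is added to each of the 2 rows

col_means = matrix.mean(axis=0)            # shape (3,), mean of each column
centered = matrix - col_means              # subtract the column mean from each row
print(scaled, shifted, centered, sep="\n")
```

No copy of `row` or `col_means` is made for each row of `matrix`; NumPy only behaves as if the smaller operand had been repeated.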

# Filtering

Filtering covers scenarios where you pick only some items from an array, based on a condition. NumPy supports this directly in its indexing: you can index an array with a boolean condition. Let me show you a simple practical example.

```python
# DATA and LABELS are placeholders for your own data
X = np.array(DATA)
Y = np.array(LABELS)
Y_red = Y[Y == 'red']  # all Y values equal to 'red'
X_red = X[Y == 'red']  # use the Y == 'red' mask to filter X
```

Let’s compare this against the vanilla Python implementation.

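```python
from time import time

import numpy as np

X = np.random.rand(5000000)
Y = np.int64(10 * np.random.rand(5000000))  # random integers from 0 to 9

# Filter with list comprehensions
t = time()
Y_even = [int(y) for y in Y if y%2==0]
X_even = [float(X[i]) for i, y in enumerate(Y) if y%2==0]
print("Normal", time() - t)

# Filter with boolean indexing
t = time()
np_Y_even = Y[Y%2==0]
np_X_even = X[Y%2==0]
print("Numpy", time() - t)
```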

The running times are as follows:

```
Normal 6.341982841491699
Numpy 0.2538008689880371
```

This is a pretty handy trick when you want to separate data based on some condition or on the label, and in this run it is about 25 times faster than the list comprehensions. It is very useful in data analytics and machine learning.
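To make the mechanics concrete, the sketch below uses hypothetical toy arrays in place of `DATA` and `LABELS`. The comparison produces a boolean array, and indexing with it keeps only the positions marked `True`.

```python
import numpy as np

# Hypothetical toy data standing in for DATA and LABELS
X = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
Y = np.array(['red', 'blue', 'red', 'green', 'red'])

mask = Y == 'red'          # boolean array, True where the label is 'red'
X_red = X[mask]            # keeps the entries of X at the True positions
print(mask)
print(X_red)

# Conditions combine with & (and), | (or), ~ (not); parentheses are required
X_sel = X[(Y == 'red') & (X > 0.2)]
print(X_sel)
```

Storing the mask in a variable also avoids evaluating the same condition twice when several arrays are filtered with it.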

Finally, let’s have a look at `np.where` which enables you to transform a NumPy array with a condition.

```python
import numpy as np

X = np.int64(10 * np.random.rand(5000000))
X_even_or_zeros = np.where(X%2==0, 1, 0)
```

This returns an array of the same shape, in which slots holding even values are replaced with ones and all others with zeros.
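The two replacement values passed to `np.where` can also be arrays, and calling it with only a condition returns the matching indices. A minimal sketch with illustrative values:

```python
import numpy as np

X = np.array([3, 8, 5, 2, 7, 6])

flags = np.where(X % 2 == 0, 1, 0)        # 1 for even values, 0 for odd
halved = np.where(X % 2 == 0, X // 2, X)  # halve even values, keep odd ones
even_idx = np.where(X % 2 == 0)[0]        # condition only: indices of True
print(flags, halved, even_idx)
```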

These are a few vital operations, and I hope the read was worth your time. I always use NumPy with huge numeric datasets and find the performance very satisfying. NumPy has really helped the research community stick with Python without dropping down to C/C++ to gain numeric computation speed. Room for improvement still exists!

Cheers!

## The Startup

Get smarter at building your thing. Join The Startup’s +793K followers.


Written by

Blogger | Traveler | Programmer PhD Scholar
