# Advantages of using NumPy over Python Lists

## Features and performance gains of using NumPy for numerical operations

In this article, I will show a few neat tricks that come with **NumPy** and are much faster than vanilla Python code.

# Memory usage

The most important gain is memory usage, which comes in handy when we implement complex algorithms or work with large datasets in research.

```python
import numpy as np

array = list(range(10**7))  # ten million Python ints
np_array = np.array(array)  # the same values in a NumPy array
```

I found a `get_size` helper on a blog, and I will use it to compute the size of the objects in this article.
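The blog's helper is not reproduced in this article. As a stand-in, here is a minimal sketch of a recursive size function that behaves similarly. It assumes the helper sums `sys.getsizeof` over a container and everything it holds. This `get_size` is a hypothetical reconstruction, so its byte counts may differ slightly from the figures quoted in this article.

```python
import sys

import numpy as np


def get_size(obj, seen=None):
    """Hypothetical stand-in for the blog's helper: recursively sum sys.getsizeof."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # count objects shared between elements only once
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, np.ndarray):
        # For an array that owns its data, getsizeof already includes the buffer.
        return size
    if isinstance(obj, dict):
        size += sum(get_size(k, seen) + get_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(get_size(item, seen) for item in obj)
    return size
```

The `seen` set prevents double counting when one object appears several times, for example CPython's cached small integers.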

```
get_size(array)    => 370000108 bytes ~ 352.85 MB
get_size(np_array) => 80000160 bytes ~ 76.29 MB
```

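This is because NumPy arrays are fixed-length, homogeneous arrays. The values are stored as raw numbers in one contiguous buffer: 8 bytes per 64-bit integer, which matches the ~80 MB above. A Python list can grow, and it stores a pointer for every element. Each pointer refers to a separate integer object with its own overhead, about 28 bytes per small integer in 64-bit CPython.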
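To see where the bytes go, you can compare NumPy's own size attributes with the pointer array and integer objects behind a list. This sketch uses a smaller `n` than the article so it runs quickly. It also assumes a 64-bit CPython build, where list pointers are 8 bytes.

```python
import sys

import numpy as np

n = 10**6
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# NumPy: one contiguous buffer of raw 8-byte integers.
print(np_array.itemsize)          # 8
print(np_array.nbytes == 8 * n)   # True

# Python list: an array of 8-byte pointers (on a 64-bit build)...
pointer_bytes = sys.getsizeof(py_list)
# ...plus one separate int object per element.
int_object_bytes = sum(sys.getsizeof(x) for x in py_list)
print(pointer_bytes + int_object_bytes > np_array.nbytes)  # True
```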

# Speed

Speed is also a very important property of a data structure. Why do NumPy operations take so much less time than vanilla Python? Let's have a look at a few examples.

## Matrix Multiplication

In this example, we will look at a scenario where we multiply two square matrices.

```python
from time import time

import numpy as np


def matmul(A, B):
    """Multiply two square matrices with plain Python loops."""
    N = len(A)
    product = [[0 for x in range(N)] for y in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                product[i][j] += A[i][k] * B[k][j]
    return product


matrix1 = np.random.rand(1000, 1000)
matrix2 = np.random.rand(1000, 1000)

t = time()
prod = matmul(matrix1, matrix2)
print("Normal", time() - t)

t = time()
np_prod = np.matmul(matrix1, matrix2)
print("Numpy", time() - t)
```

The observed running times, in seconds, were:

```
Normal 7.604596138000488
Numpy 0.0007512569427490234
```

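We can see that the NumPy implementation is roughly 10,000 times faster. Why?

The pure Python version runs three nested interpreted loops and handles every element as a separate Python object. NumPy hands the whole multiplication to compiled code instead. For matrix products, NumPy uses a **BLAS (Basic Linear Algebra Subroutines)** library in its backend. Optimized BLAS implementations use techniques such as reordering (packing) the data and multiplying it in cache-sized chunks, and they rely on SIMD vector instructions.

As a result, the speed you get depends on which BLAS implementation your NumPy build is linked against, for example OpenBLAS or Intel MKL. It is therefore worth installing a NumPy build that is optimized for your hardware.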
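To make the idea of chunked multiplication concrete, here is a minimal sketch of a blocked (tiled) matrix product written with NumPy slices. `blocked_matmul` and its `block` parameter are hypothetical names chosen for illustration. Real BLAS kernels are far more elaborate, and this is not what `np.matmul` runs internally.

```python
import numpy as np


def blocked_matmul(A, B, block=64):
    """Hypothetical illustration: multiply A @ B one pair of tiles at a time."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, "inner dimensions must match"
    C = np.zeros((n, p), dtype=np.result_type(A, B))
    for i0 in range(0, n, block):
        for j0 in range(0, p, block):
            for k0 in range(0, m, block):
                # Multiply one tile of A by one tile of B and accumulate into C.
                # Slices past the array edge are simply truncated by NumPy.
                C[i0:i0 + block, j0:j0 + block] += (
                    A[i0:i0 + block, k0:k0 + block] @ B[k0:k0 + block, j0:j0 + block]
                )
    return C


A = np.random.rand(200, 150)
B = np.random.rand(150, 300)
print(np.allclose(blocked_matmul(A, B), A @ B))  # True
```

Working on small tiles keeps the data being reused in the CPU cache. That locality is the point of chunking, even though this Python version is still slower than a single `np.matmul` call.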

## More Vectorized Operations

Vectorized operations apply an operation to an entire array at once instead of looping over its elements in Python. Examples include an element-wise product, a dot product, a transpose, and other matrix operations. Let's look at the following example, which computes the element-wise product of two vectors.

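```python
from time import time

import numpy as np

vec_1 = np.random.rand(5000000)
vec_2 = np.random.rand(5000000)

# Element-wise product with a Python loop
t = time()
product = [float(x * y) for x, y in zip(vec_1, vec_2)]
print("Normal", time() - t)

# Element-wise product as a single vectorized NumPy operation
t = time()
np_product = vec_1 * vec_2
print("Numpy", time() - t)
```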

The timings, in seconds, were:

```
Normal 2.0582966804504395
Numpy 0.02198004722595215
```

The vectorized NumPy version is over 90 times faster.
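The same approach covers the dot product and transpose mentioned earlier. Here is a small sketch comparing a Python-loop dot product with `np.dot`, plus a transpose through the `.T` attribute. The vectors are random and small, chosen only for illustration.

```python
import numpy as np

vec_1 = np.random.rand(1000)
vec_2 = np.random.rand(1000)

# Dot product: a Python loop versus one vectorized call.
loop_dot = sum(float(x) * float(y) for x, y in zip(vec_1, vec_2))
np_dot = np.dot(vec_1, vec_2)
print(np.isclose(loop_dot, np_dot))  # True

# Transpose: .T swaps the axes without any Python loop.
M = np.arange(6).reshape(2, 3)
print(M.T.shape)  # (3, 2)
```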

## Broadcast Operations

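NumPy also makes operations between an array and a scalar, or between arrays of different shapes, much faster. These are called **broadcast operations**. The smaller operand is conceptually stretched (broadcast) across the larger array, so no Python loop is needed. The whole computation runs in compiled code, which on x86 CPUs can use SIMD instructions such as Intel AVX.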

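```python
from time import time

import numpy as np

vec = np.random.rand(5000000)

# Scale every element with a Python loop
t = time()
mul = [float(x) * 5 for x in vec]
print("Normal", time() - t)

# Scale every element by broadcasting the scalar 5 over the array
t = time()
np_mul = 5 * vec
print("Numpy", time() - t)
```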

Here are the running times, in seconds:

```
Normal 1.3156049251556396
Numpy 0.01950979232788086
```

Roughly 67 times faster!
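Multiplying by a scalar is the simplest case of broadcasting. The same rules let arrays of different shapes combine when, comparing their shapes from the last dimension backwards, each pair of sizes is equal or one of them is 1. Here is a small sketch; the arrays are made up for illustration.

```python
import numpy as np

# Center each column of a 4 x 3 matrix by subtracting its column means.
X = np.arange(12, dtype=float).reshape(4, 3)
col_means = X.mean(axis=0)   # shape (3,)
centered = X - col_means     # (4, 3) - (3,) broadcasts to (4, 3)
print(centered.shape)        # (4, 3)

# A row vector and a column vector broadcast to a full grid.
row = np.array([1, 2, 3])         # shape (3,)
col = np.array([[10], [20]])      # shape (2, 1)
grid = row + col                  # shape (2, 3)
print(grid)
# [[11 12 13]
#  [21 22 23]]
```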

# Filtering

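Filtering means picking only the items of an array that satisfy a condition. NumPy builds this into indexing. A comparison such as `Y == 'red'` produces a boolean mask, and indexing an array with that mask keeps only the elements where the mask is `True`. Let me show you a simple practical example.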

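```python
import numpy as np

# DATA holds your feature rows; LABELS holds one label per row.
X = np.array(DATA)
Y = np.array(LABELS)

Y_red = Y[Y == 'red']  # obtain all Y values equal to 'red'
X_red = X[Y == 'red']  # reuse the Y == 'red' mask to filter the rows of X
```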
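To make the snippet runnable, here is the same filtering with small hypothetical values for `DATA` and `LABELS`. These values are made up for illustration and are not from the article.

```python
import numpy as np

# Hypothetical sample data: five feature rows and one colour label per row.
DATA = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
LABELS = ["red", "blue", "red", "green", "red"]

X = np.array(DATA)     # shape (5, 2)
Y = np.array(LABELS)   # shape (5,), string dtype

mask = Y == "red"      # boolean array, True where the label is "red"
Y_red = Y[mask]        # the three "red" labels
X_red = X[mask]        # rows 0, 2 and 4 of X
print(X_red)
```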

Let's compare this against the vanilla Python implementation.

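```python
from time import time

import numpy as np

X = np.random.rand(5000000)
Y = (10 * np.random.rand(5000000)).astype(np.int64)  # random integers 0 to 9

# Filtering with Python list comprehensions
t = time()
Y_even = [int(y) for y in Y if y % 2 == 0]
X_even = [float(X[i]) for i, y in enumerate(Y) if y % 2 == 0]
print("Normal", time() - t)

# Filtering with boolean masks
t = time()
np_Y_even = Y[Y % 2 == 0]
np_X_even = X[Y % 2 == 0]
print("Numpy", time() - t)
```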

The running times, in seconds, were:

```
Normal 6.341982841491699
Numpy 0.2538008689880371
```

This is a pretty handy trick when you want to separate data based on a condition or a label, which is very common in data analytics and machine learning.
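Conditions can also be combined. Here is a short sketch using `&` (and), `|` (or) and `~` (not) on boolean masks, with made-up values. The parentheses are required because `&` and `|` bind more tightly than comparisons such as `==` and `>`.

```python
import numpy as np

X = np.array([0.2, 0.9, 0.6, 0.3, 0.1])
Y = np.array([1, 2, 3, 4, 5])

either = (Y % 2 == 0) | (X > 0.5)   # even label OR large X
both = (Y % 2 == 0) & (X > 0.5)     # even label AND large X
print(Y[either])   # [2 3 4]
print(Y[both])     # [2]
print(Y[~both])    # [1 3 4 5]
```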

Finally, let's have a look at `np.where`, which builds a new array by choosing, for each element, one of two values depending on a condition.

```python
import numpy as np

X = (10 * np.random.rand(5000000)).astype(np.int64)  # random integers 0 to 9
X_even_or_zeros = np.where(X % 2 == 0, 1, 0)  # 1 where X is even, else 0
```

This returns an array that holds 1 wherever the corresponding element of `X` is even and 0 everywhere else.
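The two branches of `np.where` can also be arrays rather than constants. Called with only a condition, `np.where` returns the indices where the condition holds. Here is a short sketch with a small made-up array:

```python
import numpy as np

X = np.array([3, 8, 5, 2, 7])

# Three-argument form: keep even values, replace odd ones with 0.
print(np.where(X % 2 == 0, X, 0))   # [0 8 0 2 0]

# One-argument form: a tuple of index arrays, one per dimension.
print(np.where(X % 2 == 0)[0])      # [1 3]
```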

These are a few vital operations, and I hope the read was worth your time. I always use NumPy with huge numeric datasets and find its performance very satisfying. NumPy has really helped the research community stay with Python without dropping down to C/C++ for fast numeric computation. There is still room for improvement!

Cheers!