Save your time with these NumPy techniques

5 min read · Apr 20, 2023

Lists and arrays are crucial data structures that are used very often in programming nowadays because of the large amounts of data we process. However, people who use lists in Python often don’t use them efficiently. In Python, there is a library called “NumPy”, which supports vectorization to speed up calculations and remove a lot of boilerplate code.

Initialization

First, you have to install the NumPy library; you can see how to install it here. Then we can import it as follows:

import numpy as np

You can also write just import numpy, but import numpy as np lets us abbreviate numpy to np, which makes it easier to reference.

Then we can create a numpy array as the following:

my_array = np.array([1,2,3,4,5])

This gives you a NumPy array [1,2,3,4,5]. In addition, we can also use NumPy to create an array from a sequence, like the following:

np.arange(1,10,1)

If we use a list instead, we have to write either of the following:

[i for i in range(1,10,1)]

list(range(1,10,1))

Now, when we use array data for plotting a graph and want to split an axis into equally spaced points, we can use the following function:

np.linspace(1,10,4)

We can use it with matplotlib like the following:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(1,10,4)
y = x**2
plt.plot(x,y)

You might be curious why we can use the power operation when x is a NumPy array. This is because of “broadcasting” in NumPy, which we will explore below.
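As a quick check of the values involved, the sketch below prints what np.arange and np.linspace return and confirms that x**2 squares each element. The variable names are only for illustration.

```python
import numpy as np

# arange takes start, stop (exclusive) and step
seq = np.arange(1, 10, 1)
print(seq)  # values: 1, 2, 3, 4, 5, 6, 7, 8, 9

# linspace takes start, stop (inclusive) and the number of points
x = np.linspace(1, 10, 4)
print(x)    # values: 1, 4, 7, 10 (as floats)

# The power operator is applied to every element
y = x ** 2
print(y)    # values: 1, 16, 49, 100 (as floats)
```

Note that np.arange leaves out the stop value, while np.linspace includes it by default.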

Broadcasting

There are many situations where every value in an array should be treated the same way, with each result kept at the corresponding position. Writing a loop for this takes more time to run and adds the boilerplate of the for loop itself. Here is the difference:

old_arr = [1,2,3,4,5]
new_arr = [x**2 for x in old_arr]

and

old_arr = np.array([1,2,3,4,5])
new_arr = old_arr**2

As you can see, the code becomes much shorter and more readable. NumPy’s own functions also work element-wise on arrays. Functions from the math module, such as math.sqrt, only accept single numbers, so use the NumPy version instead. For example:

import numpy as np
old_arr = np.array([1,2,3,4,5])
new_arr = np.sqrt(old_arr) # element-wise square root

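A custom function also works on an array as long as it only uses operations that NumPy supports element-wise. A plain if/else on the whole array raises a ValueError, so express the condition with np.where instead:

def some_process(num):
  # 10 where the element is even, 20 where it is odd
  return np.where(num % 2 == 0, 10, 20)
old_arr = np.array([1,2,3,4,5])
new_arr = some_process(old_arr) # values: 20, 10, 20, 10, 20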
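When a function relies on Python-level branching that only works on one number at a time, np.vectorize is one way to apply it to every element. The sketch below uses a hypothetical label_number function. Keep in mind that np.vectorize is a convenience wrapper that still loops internally, so it does not bring the speed of true vectorized operations.

```python
import numpy as np

def label_number(num):
    # Hypothetical scalar-only function: handles one number at a time
    if num % 2 == 0:
        return 10
    return 20

arr = np.array([1, 2, 3, 4, 5])

# Wrap the scalar function so it can be called on a whole array
label_all = np.vectorize(label_number)
result = label_all(arr)
print(result)  # values: 20, 10, 20, 10, 20
```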

Mutation Problem

Let’s take a look at this example

A = np.array([1,2,3,4,5])
B = A

B[1] = 10

print(A)
print(B)

As we can see, when we assign array A to B, B references the original array A. Therefore, any change to a value in B also affects A.

We can avoid this problem by using the copy() method, which copies the NumPy array into a separate instance so that changing a value doesn’t affect the original array.

A = np.array([1,2,3,4,5])
B = A.copy()

B[1] = 10

print(A)
print(B)
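One way to check whether two names refer to the same data is np.shares_memory. The sketch below, with illustrative variable names, compares plain assignment with copy().

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])
alias = A                # same array object, no data is copied
independent = A.copy()   # new array with its own data

print(alias is A)                        # True
print(np.shares_memory(A, independent))  # False

independent[1] = 10  # does not touch A
alias[0] = 99        # changes A as well
print(A)             # values: 99, 2, 3, 4, 5
```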

Note: this is different when the elements of the array are not primitive values, as in the following example. copy() copies the array itself, but both arrays still reference the same objects.

class Bag:
  def __init__(self, val):
    self.val = val
    self.capacity = 30

A = np.array([Bag(10), Bag(20), Bag(30)])
B = A.copy()

B[1].val = 100

print(A[1].val)
print(B[1].val)

In this case, you need to construct a new array with a list comprehension that creates a new object for each element, or use deepcopy() from the copy library. copy.deepcopy() can also be used if you are working with a torch.Tensor.
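A minimal sketch of both options follows, reusing the Bag class from the example above. The names B and C and the way each new Bag is rebuilt from val are only one possible choice.

```python
import copy
import numpy as np

class Bag:
    def __init__(self, val):
        self.val = val
        self.capacity = 30

A = np.array([Bag(10), Bag(20), Bag(30)])

# Option 1: deep copy the array together with the objects inside it
B = copy.deepcopy(A)

# Option 2: build a new array from newly created objects
C = np.array([Bag(bag.val) for bag in A])

B[1].val = 100
C[1].val = 200
print(A[1].val)  # 20, the original object is unchanged
```

Option 2 only copies the attributes you pass to the constructor, so copy.deepcopy is usually the safer choice when the objects carry more state.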

Indexing & Slicing

Now, what if we would like to access an element or some group of elements? This could be done by indexing and slicing.

Indexing :

As you might already know, we can access the element at a certain index with the [] operator, like the following:

A = np.array([10, 20, 30, 40,50])

print(A[0]) # print the 0-th index element (first element) => 10

But you can also use multiple indexes, like the following:

A = np.array([0,10,20,30,40,50,60])
print(A[[0,2,4,5]]) # This prints the elements at indexes 0, 2, 4 and 5 => [0, 20, 40, 50]

Put simply, this means that you can access multiple indexes in this format:

numpy_array[list_of_index]

Therefore, we can create a list of indexes that satisfy a specific condition, for example, only the indexes whose elements are divisible by 4.

divided_by_four = [i for i in range(7) if A[i]%4 == 0]
print(A[divided_by_four])

However, this is still long. We can also do the following:

print(A[A % 4 == 0])

To simplify, this is similar to applying a filter mask that holds a boolean value for each position, like the following:

Illustration when we apply some filter mask on numpy array

From now on, for any conditional indexing that applies to every item in a NumPy array, you can use this format:

numpy_array[condition]
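To make the mask visible, the sketch below stores the condition in a variable before using it as an index, and then combines two conditions. The variable names are illustrative.

```python
import numpy as np

A = np.array([0, 10, 20, 30, 40, 50, 60])

# The condition produces a boolean array with one entry per element
mask = A % 4 == 0
print(mask)     # True where the element is divisible by 4
print(A[mask])  # values: 0, 20, 40, 60

# Combine conditions with & (and) or | (or); the parentheses are required
filtered = A[(A > 10) & (A % 4 == 0)]
print(filtered)  # values: 20, 40, 60
```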

Slicing :

Instead of specifying individual indexes, we can “slice” the array to get a range of elements, for example:

A = np.array([10,20,30,40,50,60])
print(A[1:4])

This is similar to list slicing in Python, so I won’t go into detail here. With two or more dimensions, we can do this:

A = np.array([[10,20,30],[10,20,30],[10,20,30]])
print(A[1, :])
print(A[:, 0])
print(A[0:2, 0:2])

In the example above, A[1, :] selects the whole second row, A[:, 0] selects the first column, and A[0:2, 0:2] selects the top-left 2×2 block.
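Because every row in that example holds the same values, the result can be hard to read. The sketch below repeats the same three slices on an array with distinct values (an illustrative 3×3 array, not the one from the article).

```python
import numpy as np

# Illustrative 3x3 array:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
M = np.arange(1, 10).reshape(3, 3)

row = M[1, :]        # second row
col = M[:, 0]        # first column
block = M[0:2, 0:2]  # top-left 2x2 block

print(row)    # values: 4, 5, 6
print(col)    # values: 1, 4, 7
print(block)  # rows: (1, 2) and (4, 5)
```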

Concatenation

Sometimes we need to work with multiple vectors and combine them, so we need to know how to do concatenation. There are several ways to do it; here are some examples:

np.stack()

This stacks multiple vectors along a new dimension, for example:

A = np.array([1,3,5,7,9])
B = np.array([2,4,6,8,10])
C = np.stack([A,B], axis=0)

np.concatenate()

This concatenates multiple vectors along an existing dimension, for example:

A = np.array([1,3,5,7,9])
B = np.array([2,4,6,8,10])
C = np.concatenate([A,B], axis=0)
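The difference between the two functions is easiest to see in the shape of the result. The sketch below prints the shapes for the same A and B; the result names are illustrative.

```python
import numpy as np

A = np.array([1, 3, 5, 7, 9])
B = np.array([2, 4, 6, 8, 10])

# stack adds a new axis, so two vectors of length 5 become a 2D array
stacked_rows = np.stack([A, B], axis=0)
stacked_cols = np.stack([A, B], axis=1)
print(stacked_rows.shape)  # (2, 5)
print(stacked_cols.shape)  # (5, 2)

# concatenate joins along an existing axis, so the result stays 1D
joined = np.concatenate([A, B], axis=0)
print(joined.shape)        # (10,)
```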

np.block()

This function assembles an array from nested blocks of NumPy arrays, treating each one as a sub-block. Here is an example:

A = np.array([[1,2],[2,3]])  # 2x2 block; note the outer brackets
B = np.array([[-1,-2],[-2,-3]])
C = np.block([[A, B], [B,A]])

Here is an illustration of what happens above. (There are some restrictions on using this function that we won’t go into here; you can read more in the NumPy documentation if you are interested.)

Illustration example for using np.block() command
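In text form, the result can be checked with the short sketch below, which assumes A and B are the 2×2 arrays [[1,2],[2,3]] and [[-1,-2],[-2,-3]].

```python
import numpy as np

A = np.array([[1, 2], [2, 3]])
B = np.array([[-1, -2], [-2, -3]])

# The nested list describes the layout: A and B on top, B and A below
C = np.block([[A, B], [B, A]])
print(C.shape)  # (4, 4)
print(C)
# Rows of C:
#   1  2 -1 -2
#   2  3 -2 -3
#  -1 -2  1  2
#  -2 -3  2  3
```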

Conclusion & more

NumPy provides a lot of tools that make handling arrays much faster. However, there are many methods and tools that I couldn’t cover here. For NumPy, you can look further in the documentation here

In the machine learning world, matrix multiplication is very important, and it needs to be fast and efficient because of the large number of parameters and the amount of data. Therefore, both the PyTorch and TensorFlow libraries provide a data structure called a tensor, which lets you run matrix multiplication on a GPU to parallelize the process and make it much faster. Both of them also make it easier than NumPy to generate a random matrix or vector.

However, if you have learned how to use numpy.array, you will be able to adapt to torch.Tensor or tensorflow.Tensor really quickly, since most of the basic operations are quite similar.

I hope this article helps you write code more efficiently with NumPy. Let me know if you have any questions.