Data Science Basics: NumPy

Getting started with NumPy

Kantajit Shaw
Geek Culture
4 min read · Jun 11, 2021

Photo by Pawel Czerwinski on Unsplash

NumPy

NumPy is a Python library popular in the data science community. It is used to manipulate large multidimensional arrays and to perform mathematical operations on them.

NumPy is useful because it is easy to learn and efficient: its array operations run in compiled code rather than in Python loops.

To use NumPy, we first have to import it. By convention, it is imported under the shorter alias np.

import numpy as np

NumPy array creation

NumPy arrays are homogeneous data structures: all the data in an array has to be of the same data type.

NumPy arrays can be created from a list.

>>> a_list = [1,2,3,4,5,6,7,8,9]
>>> a_arr=np.array(a_list)
>>> print(a_arr)
[1 2 3 4 5 6 7 8 9]

If the list is not homogeneous, the items are upcast to a common data type that can represent all of them.

>>> a_list = [1,2,3.9]
>>> a_arr=np.array(a_list)
>>> print(a_arr)
[1.  2.  3.9]
>>> print(a_arr.dtype)
float64
>>> a_list = [1,'2',3.9]
>>> a_arr=np.array(a_list)
>>> print(a_arr)
['1' '2' '3.9']
>>> print(a_arr.dtype)
<U32
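
When the inferred type is not the one we want, we can request a type explicitly. The following is a minimal sketch; the variable names (mixed, as_int, as_float32) are only illustrative.

```python
import numpy as np

# Mixing ints and floats promotes every element to float64.
mixed = np.array([1, 2, 3.9])
print(mixed.dtype)        # float64

# Passing dtype overrides the inferred type.
# Converting 3.9 to an integer drops the fractional part.
as_int = np.array([1, 2, 3.9], dtype=np.int64)
print(as_int)             # [1 2 3]

# astype converts an existing array and returns a new array.
as_float32 = mixed.astype(np.float32)
print(as_float32.dtype)   # float32
```

Note that forcing an integer type silently truncates values, so it is worth checking the dtype before and after such a conversion.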

There are other ways to create NumPy arrays, such as np.ones, np.zeros, np.random.random, np.arange, and np.linspace.

>>> print(np.ones((4,3)))
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
>>> print(np.zeros((4,3)))
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
>>> print(np.ones((4,3), dtype=int))
[[1 1 1]
 [1 1 1]
 [1 1 1]
 [1 1 1]]
>>> print(np.random.random((2,3)))
[[0.7386583  0.02779194 0.97240368]
 [0.58384794 0.78786521 0.20495265]]
>>> print(np.arange(3))
[0 1 2]
>>> print(np.arange(3.0))
[0. 1. 2.]
>>> print(np.arange(3,7))
[3 4 5 6]
>>> print(np.linspace(2.0, 3.0, num=5))
[2.   2.25 2.5  2.75 3.  ]
>>> print(np.linspace(2.0, 3.0, num=5, endpoint=False))
[2.  2.2 2.4 2.6 2.8]

As we can see in the above examples, we can specify the array dimensions as a tuple when creating arrays. There are also functions such as ones_like and zeros_like, which take an array as input and return a new array with the same shape and data type.

>>> a=np.ones((4,3))
>>> b=np.zeros_like(a)
>>> print(b)
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
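
The *_like functions copy the data type as well as the shape. A small sketch, assuming an integer template array (the names template, ones, sevens, and floats are illustrative):

```python
import numpy as np

# A 2x3 integer array used only as a template.
template = np.arange(6, dtype=np.int32).reshape(2, 3)

ones = np.ones_like(template)                   # same shape, same dtype
sevens = np.full_like(template, 7)              # every element set to 7
floats = np.zeros_like(template, dtype=float)   # dtype can be overridden

print(ones.shape, ones.dtype)   # (2, 3) int32
print(sevens)
print(floats.dtype)             # float64
```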

One more useful function is tile. It takes an array as input and repeats it a given number of times.

>>> print(np.tile([0,1,2],3))
[0 1 2 0 1 2 0 1 2]
>>> print(np.tile(np.array([0,1,2]),3))
[0 1 2 0 1 2 0 1 2]

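It accepts any array-like input, such as an array, a list, or a tuple. A set is not a sequence, so NumPy treats it as a single object rather than repeating its elements.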
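
The repetition count can also be a tuple, which repeats the input along more than one axis. A short sketch (base, grid, and block are illustrative names):

```python
import numpy as np

base = np.array([0, 1, 2])

# (2, 2) means: 2 copies along the rows, 2 copies along the columns.
grid = np.tile(base, (2, 2))
print(grid)
# [[0 1 2 0 1 2]
#  [0 1 2 0 1 2]]

# With a 2-D input and a single count, copies are placed side by side.
block = np.tile(np.array([[1, 2], [3, 4]]), 2)
print(block)
# [[1 2 1 2]
#  [3 4 3 4]]
```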

NumPy array properties

Some attributes of a NumPy array help us understand its structure and content. Here are a few examples.

>>> a=np.ones((4,3))
>>> a.shape
(4, 3)
>>> a.size # total number of elements in the array
12
>>> a.ndim # number of dimensions
2
>>> a.dtype
dtype('float64')
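
Two related attributes, itemsize and nbytes, report memory use. The sketch below shows how they relate to size and dtype; the variable small is an illustrative name.

```python
import numpy as np

a = np.ones((4, 3))

print(a.itemsize)   # 8  bytes per float64 element
print(a.nbytes)     # 96 = 12 elements * 8 bytes

# Converting to a narrower type halves the memory footprint.
small = a.astype(np.float32)
print(small.itemsize)   # 4
print(small.nbytes)     # 48
```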

Operations on NumPy arrays

Element-wise operations can be done using arithmetic operators.

>>> print(np.array([[0,1],[2,3]])+np.array([[7,8],[5,6]]))
[[7 9]
 [7 9]]

For an element-wise operation between two arrays, their shapes must either match or be compatible under NumPy's broadcasting rules; otherwise we get an error.

>>> print(np.array([[0,1],[2,3]])+np.array([[7,8],[5,6],[3,4]]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (2,2) (3,2)
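
Shapes do not have to be identical as long as they can be broadcast: compared from the last axis backwards, each pair of dimensions must be equal or one of them must be 1. A minimal sketch of shapes that do work (m, row, and col are illustrative names):

```python
import numpy as np

m = np.array([[0, 1], [2, 3]])     # shape (2, 2)
row = np.array([10, 20])           # shape (2,)
col = np.array([[100], [200]])     # shape (2, 1)

# row is stretched across both rows of m.
print(m + row)
# [[10 21]
#  [12 23]]

# col is stretched across both columns of m.
print(m + col)
# [[100 101]
#  [202 203]]

# broadcast_shapes reports the result shape without doing the arithmetic.
print(np.broadcast_shapes((2, 2), (2, 1)))   # (2, 2)
```

The shapes (2,2) and (3,2) from the failing example cannot be broadcast because the first dimensions differ and neither is 1.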

We can also use operators to combine an array with a scalar constant, for example to add, multiply, or raise every element to a power.

>>> print(np.array([[0,1],[2,3]])**3)
[[ 0  1]
 [ 8 27]]
>>> print(np.array([[0,1],[2,3]])/3)
[[0.         0.33333333]
 [0.66666667 1.        ]]

Array manipulations

We can use reshape to change the dimensions of an array.

>>> print(np.arange(6))
[0 1 2 3 4 5]
>>> print(np.arange(6).reshape(3,2))
[[0 1]
[2 3]
[4 5]]

One more useful trick: if we do not know one of the dimensions, we can pass -1 and let NumPy calculate it from the array's size.

>>> print(np.arange(9).reshape(3,-1))
[[0 1 2]
 [3 4 5]
 [6 7 8]]
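
Only one dimension can be left as -1, and the total number of elements must divide evenly by the known dimensions. A short sketch of both the working and the failing case (flat and grid are illustrative names):

```python
import numpy as np

flat = np.arange(12)

print(flat.reshape(-1, 4).shape)   # (3, 4)
print(flat.reshape(2, -1).shape)   # (2, 6)

# reshape(-1) flattens any array back to one dimension.
grid = flat.reshape(3, 4)
print(grid.reshape(-1).shape)      # (12,)

# 12 elements cannot be split into 5 equal rows, so this raises ValueError.
try:
    flat.reshape(5, -1)
except ValueError as err:
    print("reshape failed:", err)
```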

Arrays can be stacked together horizontally or vertically. To stack arrays vertically, they must have the same number of columns; to stack them horizontally, they must have the same number of rows. Both functions take the arrays as a single tuple (or list).

>>> a1=np.arange(12,24).reshape(3,4)
>>> a1
array([[12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
>>> a2=np.arange(9).reshape(3,-1)
>>> a2
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a3=np.arange(14,28).reshape(-1,7)
>>> a3
array([[14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27]])
>>> a12=np.hstack((a1,a2))
>>> a123=np.vstack((a12,a3))
>>> a12
array([[12, 13, 14, 15,  0,  1,  2],
       [16, 17, 18, 19,  3,  4,  5],
       [20, 21, 22, 23,  6,  7,  8]])
>>> a123
array([[12, 13, 14, 15,  0,  1,  2],
       [16, 17, 18, 19,  3,  4,  5],
       [20, 21, 22, 23,  6,  7,  8],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27]])
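
For 2-D arrays, the same results can be obtained with np.concatenate by naming the axis explicitly. This sketch rebuilds the arrays from the example above; the names side_by_side and stacked are illustrative.

```python
import numpy as np

a1 = np.arange(12, 24).reshape(3, 4)   # 3 rows, 4 columns
a2 = np.arange(9).reshape(3, -1)       # 3 rows, 3 columns
a3 = np.arange(14, 28).reshape(-1, 7)  # 2 rows, 7 columns

# axis=1 joins along columns, like hstack for 2-D arrays.
side_by_side = np.concatenate((a1, a2), axis=1)
print(side_by_side.shape)   # (3, 7)

# axis=0 joins along rows, like vstack for 2-D arrays.
stacked = np.concatenate((side_by_side, a3), axis=0)
print(stacked.shape)        # (5, 7)
```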

Arrays can be transposed.

>>> x = np.array([[1,2],[3,4]]) 
>>> x
array([[1, 2],
[3, 4]])
>>> x.T
array([[1, 3],
[2, 4]])
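
Two details about .T are easy to miss: it returns a view rather than a copy, and it leaves a 1-D array unchanged. A small sketch (m, v, and col are illustrative names):

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
print(m.T.shape)                   # (3, 2)

# The transpose shares memory with the original array.
print(np.shares_memory(m, m.T))    # True

# A 1-D array has only one axis, so transposing does nothing.
v = np.arange(3)
print(v.T.shape)                   # (3,)

# To get a column, reshape explicitly.
col = v.reshape(-1, 1)
print(col.shape)                   # (3, 1)
```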

Mathematical Operations on array

We can apply mathematical operations to the whole array.

>>> a = np.array([[1,2],[3,4]])
>>> print(np.sin(a))
[[ 0.84147098  0.90929743]
 [ 0.14112001 -0.7568025 ]]
>>> print(np.cos(a))
[[ 0.54030231 -0.41614684]
 [-0.9899925  -0.65364362]]
>>> print(np.exp(a))
[[ 2.71828183  7.3890561 ]
 [20.08553692 54.59815003]]

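If we need to apply a user-defined function to an array, we can iterate through the array and apply the function to each element. np.vectorize wraps this pattern in a convenient, array-aware interface.

>>> f = np.vectorize(lambda x: x/(x+1))
>>> f(a)
array([[0.5       , 0.66666667],
       [0.75      , 0.8       ]])

We can write an ordinary scalar function and let np.vectorize handle the looping and broadcasting. Note that np.vectorize is provided mainly for convenience, not performance: internally it is essentially a Python loop, so expressions built from NumPy operators and functions are usually faster.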
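
When the function consists only of arithmetic, the same expression can be written directly on the array. The sketch below checks that both approaches agree; direct is an illustrative name.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Wrapped scalar function: called once per element.
f = np.vectorize(lambda x: x / (x + 1))

# The same formula written on the whole array: runs in compiled code.
direct = a / (a + 1)

print(direct)
print(np.allclose(f(a), direct))   # True
```

For small arrays the difference is negligible, but on large arrays the direct expression is typically much faster.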

Linear algebra operations on array

We can perform basic linear algebra operations on an array. NumPy's built-in np.linalg module provides them.

>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> np.linalg.det(a)
-2.0000000000000004
>>> np.linalg.eig(a) # eigenvalues and eigenvectors
(array([-0.37228132,  5.37228132]), array([[-0.82456484, -0.41597356],
       [ 0.56576746, -0.90937671]]))
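
The eigenvectors are returned as the columns of the second array. A quick sketch that checks this and also solves a small linear system with np.linalg.solve (the right-hand side b is an arbitrary value chosen for illustration):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

eigenvalues, eigenvectors = np.linalg.eig(a)

# Column i of eigenvectors pairs with eigenvalues[i]: a v = lambda v.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(np.dot(a, v), eigenvalues[i] * v))   # True

# Solve a x = b directly instead of computing the inverse first.
b = np.array([5, 6])
x = np.linalg.solve(a, b)
print(np.allclose(np.dot(a, x), b))   # True
```

Here x works out to approximately [-4, 4.5].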

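Matrix multiplication can be done using the np.dot function.

>>> np.dot(a, np.linalg.inv(a))
array([[1.00000000e+00, 1.11022302e-16],
       [0.00000000e+00, 1.00000000e+00]])

The tiny off-diagonal value is floating-point rounding error; up to that precision, the result is the identity matrix.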
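
For 2-D arrays, the @ operator gives the same matrix product as np.dot, and np.allclose is a convenient way to compare results while ignoring rounding error. A brief sketch (a_inv and product are illustrative names):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
a_inv = np.linalg.inv(a)

# @ is the matrix-multiplication operator.
product = a @ a_inv
print(np.allclose(product, np.eye(2)))            # True
print(np.allclose(product, np.dot(a, a_inv)))     # True

# Note the difference from *, which multiplies element by element.
print(a * a)
# [[ 1  4]
#  [ 9 16]]
print(a @ a)
# [[ 7 10]
#  [15 22]]
```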

NumPy is a useful tool that helps us perform data analysis. In a way, it is a pillar of practical data science.

To learn more about Numpy, click here.

Hope you like it. Stay tuned for more.

Kantajit Shaw
Geek Culture

Deep learning enthusiast, interested in Computer Vision and Natural Language Processing problems.