Understand the basics of Numpy
Numpy is one of the fundamental packages for numerical computing in Python. Before learning any other package for data science and machine learning tasks, it is necessary to understand and get used to numpy. In this article, we’ll discuss the importance of numpy, numpy arrays and the various operations and functions that numpy provides.
Are we ready…?
Before starting with the coding part, let’s discuss some theory.
What is Numpy?
It is a Python library which is used for performing numerical computation. It provides a multidimensional array object and fast operations on arrays like arithmetic and logical operations, sorting, shape manipulation and many more.
What is a Numpy array?
It is a multidimensional array which contains the same type of data (homogeneous). The size of the Numpy array is fixed during its creation.
What is the difference between a Numpy array and a Python list (sequence)?
- A Numpy array has a fixed size set at creation, unlike a Python list, which can grow dynamically.
- All elements of a Numpy array have the same type, whereas the elements of a Python list can be of different types.
- Numpy arrays support advanced mathematical and other operations on large amounts of data, and these take less time and code than the equivalent operations on Python lists.
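The first two differences can be checked with a small sketch. The variable names here are illustrative only, and np.append() is used just to show that "growing" an array really produces a new one.

```python
import numpy as np

mixed_list = [1, 2.5, "three"]    # a list can hold different types
mixed_list.append(4)              # and it grows in place

nums = np.array([1, 2.5, 3])      # the ints are upcast to one common float type
print(nums.dtype)                 # float64

bigger = np.append(nums, 4)       # returns a NEW array; nums itself is unchanged
print(nums.size, bigger.size)     # 3 4
```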
How are Numpy arrays more efficient at numerical computation on large amounts of data?
Numpy arrays store data in a contiguous block of memory. Internally, Numpy uses a C-style array, which is a static array rather than a referential one: it stores the values of its elements directly, not the memory addresses of the elements.
Also, Numpy operations work on the entire array at once, so we do not need to write explicit Python loops; the looping happens inside Numpy's compiled code.
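As a rough illustration of the "no explicit loops" point, the sketch below squares the same thousand numbers twice: once with a Python loop over a list, and once with a single array expression. The names are made up for this example, and no timings are shown because they depend on the machine.

```python
import numpy as np

values = list(range(1000))        # plain Python list
arr = np.arange(1000)             # equivalent Numpy array

# List version: an explicit Python loop visits every element.
squared_list = [v * v for v in values]

# Array version: one expression, no loop written by us.
squared_arr = arr * arr

print(squared_arr[:5])                        # [ 0  1  4  9 16]
print(squared_list == squared_arr.tolist())   # True
```

Both versions give the same numbers; the array version is simply shorter and, for large inputs, typically much faster.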
Now we know what Numpy is and why it matters. Next, we learn how to use it. Numpy is a vast library; to keep things simple, this article focuses on the basics. The following topics will be discussed in this article:
- Import Numpy
- Create a Numpy array
- Attributes of Numpy array
- Numpy Datatypes
- Different types of Numpy arrays
- Numpy operations
- Numpy functions
- Indexing and Slicing
- Iterating
- Transpose
- Stacking and Splitting
1. Import Numpy and create a Numpy array
import numpy as np
a = np.array([1,2,3,4,5])
In order to use the Numpy library we have to first import it. We generally use an alias “np” for Numpy.
We can create a Numpy array by passing a list of numbers to array(). This will create a 1D array.
We can also create 2D and 3D arrays.
# Create a 2D array
b = np.array([[1,2,3],[4,5,6]])
# create a 3D array
c = np.array([[[1,2],[3,4]], [[5,6],[7,8]]])
We can also create arrays using arange(). It is similar to Python's range(): it creates an array from the given starting point (included) up to the end point (excluded).
a = np.arange(1,10)
# [1 2 3 4 5 6 7 8 9]
a = np.arange(1,10,2)
# [1 3 5 7 9]
arange() also takes an optional step argument, which sets the spacing between consecutive values. By default, the step is 1.
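The step does not have to be a positive integer. As a small sketch (the variable names are illustrative), a negative step counts downwards and a fractional step produces floats:

```python
import numpy as np

down = np.arange(10, 0, -3)       # start at 10, stop before 0, step -3
print(down.tolist())              # [10, 7, 4, 1]

quarters = np.arange(0, 1, 0.25)  # fractional step gives a float array
print(quarters.tolist())          # [0.0, 0.25, 0.5, 0.75]
```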
We can change the shape of an array using reshape(), which arranges the elements into the given number of rows and columns.
a = np.arange(0,10).reshape(2,5)
# [[0 1 2 3 4]
# [5 6 7 8 9]]
We changed the shape of the 1D array to a 2D array. The number of elements in the array must equal the product of the number of rows and columns; otherwise, reshape() raises a ValueError.
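The element-count rule can be tried out with the sketch below. It also shows -1, which asks reshape() to work out one dimension for us; the array name flat is just an example.

```python
import numpy as np

flat = np.arange(12)

grid = flat.reshape(3, 4)         # 3 * 4 == 12, so this works
print(grid.shape)                 # (3, 4)

auto = flat.reshape(2, -1)        # -1 lets Numpy infer the missing dimension
print(auto.shape)                 # (2, 6)

try:
    flat.reshape(5, 3)            # 5 * 3 != 12
except ValueError as err:
    print("reshape failed:", err)
```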
2. Attributes of a Numpy array
The Numpy library provides a few attributes which give information about the array.
- ndim: the number of dimensions of the array
- shape: the number of elements along each dimension
- size: the total number of elements
- itemsize: the size of each element in bytes
- dtype: the data type of the elements in the array
print(a.ndim)
# 2
print(a.shape)
# (2, 5)
print(a.size)
# 10
print(a.itemsize)
# 8 (bytes per element for int64)
print(a.dtype)
# int64 (the default integer type; it can be int32 on some platforms)
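The same attributes work for arrays with more dimensions. The sketch below uses a 3D array with an explicit float32 dtype so that the numbers do not depend on the platform; nbytes is an extra attribute, not covered above, that equals size * itemsize.

```python
import numpy as np

cube = np.zeros((2, 3, 4), dtype=np.float32)

print(cube.ndim)       # 3
print(cube.shape)      # (2, 3, 4)
print(cube.size)       # 24
print(cube.itemsize)   # 4 (a float32 takes 4 bytes)
print(cube.nbytes)     # 96 (size * itemsize)
```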
3. Numpy Datatypes
int64 is only one of the available datatypes. Others include int8, int16, int32, float32 and float64.
We can specify the datatype during array creation using the dtype argument.
a = np.arange(0,10, dtype = np.int32)
print(a.dtype)
# int32
We can also change the datatype using astype().
a = a.astype(np.float64)
print(a)
# [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
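Going the other way, from float to int, is worth trying once, because astype() drops the fractional part instead of rounding. A minimal sketch with made-up values:

```python
import numpy as np

prices = np.array([1.9, -2.7, 3.5])

as_int = prices.astype(np.int32)      # fractional part is dropped, not rounded
print(as_int)                         # [ 1 -2  3]

# astype() returns a new array; the original keeps its dtype
print(prices.dtype, as_int.dtype)     # float64 int32
```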
4. Different Types of Numpy arrays
There are different functions that create different types of arrays. For example:
np.ones(): creates an array filled with the value 1.
np.zeros(): creates an array filled with the value 0.
np.random.random(): creates an array of random numbers drawn from the interval [0, 1).
np.identity(): creates a square matrix of the given size with 1 on the diagonal and 0 elsewhere.
np.linspace(): generates a given number of evenly spaced values over a given range.
print(np.ones(5))
# [1. 1. 1. 1. 1.]
print(np.ones(6).reshape(2,3))
# [[1. 1. 1.]
# [1. 1. 1.]]
print(np.zeros(5))
# [0. 0. 0. 0. 0.]
print(np.zeros(6).reshape(3,2))
# [[0. 0.]
# [0. 0.]
# [0. 0.]]
print(np.random.random(5))
# [0.78830219 0.74662065 0.99272408 0.98968297 0.06164713]
print(np.random.random(18).reshape(2,3,3))
# [[[0.31062747 0.24376 0.77747633]
# [0.55792925 0.5663601 0.12395478]
# [0.22748456 0.61284644 0.92787649]]
#
# [[0.78157401 0.56903228 0.97204262]
# [0.74739407 0.18317532 0.90954089]
# [0.07494408 0.665462 0.73489555]]]
print(np.identity(5))
# [[1. 0. 0. 0. 0.]
# [0. 1. 0. 0. 0.]
# [0. 0. 1. 0. 0.]
# [0. 0. 0. 1. 0.]
# [0. 0. 0. 0. 1.]]
print(np.linspace(0,10,5))
# [ 0. 2.5 5. 7.5 10. ]
Chaining reshape(), as in the examples above, gives these arrays the dimensions we need.
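As an alternative to chaining reshape(), these functions also accept the target shape directly as a tuple. The sketch below assumes nothing beyond the functions listed above; the variable names are illustrative.

```python
import numpy as np

ones_2d = np.ones((2, 3))                  # same result as np.ones(6).reshape(2, 3)
zeros_int = np.zeros((3, 2), dtype=int)    # integer zeros instead of the default floats
rand_3d = np.random.random((2, 3, 3))      # random values in [0, 1)
points = np.linspace(0, 1, 5)              # 5 values, both end points included

print(ones_2d.shape)       # (2, 3)
print(zeros_int.shape)     # (3, 2)
print(rand_3d.shape)       # (2, 3, 3)
print(points.tolist())     # [0.0, 0.25, 0.5, 0.75, 1.0]
```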
5. Array Operations
We can perform two different types of operations with arrays: scalar and vector.
Scalar operations: when we use an arithmetic or relational operator with a scalar value, the operation is applied to each element of the array.
a = np.arange(0,6).reshape(2,3)
print(a)
# [[0 1 2]
# [3 4 5]]
print(a - 4)
# [[-4 -3 -2]
# [-1 0 1]]
print(a % 2 == 0)
# [[ True False True]
# [False True False]]
Vector operations: when we perform an operation between arrays of equal size, the operation is applied element-wise.
a = np.arange(0,6).reshape(2,3)
print(a)
# [[0 1 2]
# [3 4 5]]
b = np.arange(10,16).reshape(2,3)
print(b)
# [[10 11 12]
# [13 14 15]]
print(a+b)
# [[10 12 14]
# [16 18 20]]
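Other operators behave the same way. In the sketch below (x and y are stand-ins for the arrays above), * multiplies matching elements rather than doing matrix multiplication, and two arrays whose shapes do not fit together raise a ValueError.

```python
import numpy as np

x = np.arange(0, 6).reshape(2, 3)
y = np.arange(10, 16).reshape(2, 3)

print(x * y)              # element-wise product
# [[ 0 11 24]
#  [39 56 75]]

print((x < y).all())      # True: every element of x is smaller than its partner in y

try:
    x + np.arange(0, 6).reshape(3, 2)    # shapes (2, 3) and (3, 2) do not match
except ValueError as err:
    print("shapes do not match:", err)
```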
We are halfway through. Let's take a minute to revise and then continue.
6. Array Functions
The Numpy package provides many built-in functions for common purposes. The most useful ones are:
Mathematical functions: max(), min(), sum(), prod(), square(), sqrt()
Statistical functions: mean(), median(), std(), var()
Trigonometric functions: sin(), cos(), tan()
Dot product: dot(m1, m2)
Log and exponential: log(), exp()
Rounding off: round(), floor(), ceil()
print(a)
# [[0 1 2]
# [3 4 5]]
print(np.max(a))
# 5
print(np.min(a))
# 0
print(np.sum(a))
# 15
print(np.prod(a))
# 0
print(np.square(a))
# [[ 0 1 4]
# [ 9 16 25]]
print(np.sqrt(a))
# [[0. 1. 1.41421356]
# [1.73205081 2. 2.23606798]]
print(np.mean(a))
# 2.5
print(np.median(a))
# 2.5
print(np.std(a))
# 1.707825127659933
print(np.var(a))
# 2.9166666666666665
print(np.sin(a))
# [[ 0. 0.84147098 0.90929743]
# [ 0.14112001 -0.7568025 -0.95892427]]
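# redefine b with shape (3,2) so that its rows match the columns of a
b = np.arange(10,16).reshape(3,2)
print(b)
# [[10 11]
# [12 13]
# [14 15]]
print(np.dot(a,b))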
# [[ 40 43]
# [148 160]]
# log(0) is -inf, so Numpy also emits a RuntimeWarning here
print(np.log(a))
# [[ -inf 0. 0.69314718]
# [1.09861229 1.38629436 1.60943791]]
print(np.exp(a))
# [[ 1. 2.71828183 7.3890561 ]
# [ 20.08553692 54.59815003 148.4131591 ]]
b = np.array([1.15, 2.55, 3.95])
print(np.round(b))
# [1. 3. 4.]
print(np.floor(b))
# [1. 2. 3.]
print(np.ceil(b))
# [2. 3. 4.]
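The examples above reduce the whole array to a single number. Many of these functions also accept an axis argument, which is not used above, to work column by column (axis=0) or row by row (axis=1). A short sketch, with m standing in for the 2 x 3 array used earlier:

```python
import numpy as np

m = np.arange(0, 6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

print(np.sum(m, axis=0))     # [3 5 7]        sum of each column
print(np.sum(m, axis=1))     # [ 3 12]        sum of each row
print(np.max(m, axis=1))     # [2 5]          largest value in each row
print(np.mean(m, axis=0))    # [1.5 2.5 3.5]  mean of each column
```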
7. Indexing and Slicing
To access an item we use its index. Array indexes start at 0. Numpy arrays also support negative indexing, which counts from the end.
print(a)
# [[0 1 2]
# [3 4 5]]
# access first element
print(a[0,0])
# 0
# access the last element
print(a[-1, -1])
# 5
# access multiple elements;
# print last two elements in the second row
print(a[1,1:])
# [4 5]
# access multiple elements with steps;
# print first and third elements in the first row
print(a[0,::2])
# [0 2]
# access second row
print(a[1])
# [3 4 5]
# access third column
print(a[:,2])
# [2 5]
b = np.arange(0,12).reshape(2,2,3)
print(b)
# [[[ 0 1 2]
# [ 3 4 5]]
#
# [[ 6 7 8]
# [ 9 10 11]]]
# access the second element in the first row of the first 2D array
print(b[0,0,1])
# 1
# fetches the first 2D array
print(b[0])
# [[0 1 2]
# [3 4 5]]
# access first row in both 2D arrays
print(b[:, 0,:])
# [[0 1 2]
# [6 7 8]]
# access first and third columns in second 2D array
print(b[1,:,::2])
# [[ 6 8]
# [ 9 11]]
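One detail worth knowing about basic indexing and slicing is that the result is a view onto the original data, not a copy, so writing through it changes the original array. The sketch below uses illustrative names and .copy() to get an independent array.

```python
import numpy as np

grid = np.arange(0, 6).reshape(2, 3)

row = grid[1]                 # a view onto the second row, not a copy
row[0] = 99                   # this also changes grid
print(grid)
# [[ 0  1  2]
#  [99  4  5]]

safe = grid[:, 2].copy()      # an independent copy of the third column
safe[0] = -1
print(grid[0, 2])             # 2 (grid is unchanged this time)
```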
8. Iterating
We can use a for loop to iterate over an array.
# 1D array : iterate through each item
a = np.array([1,2,3])
for i in a:
    print(i)
# 1
# 2
# 3
# 2D array : iterate through each row
b = np.array([[1,2],[2,3]])
for i in b:
    print(i)
# [1 2]
# [2 3]
# 3D array: iterate through each 2D array
c = np.array([[[1,2],[2,3]], [[3,4],[4,5]]])
for i in c:
    print(i)
# [[1 2]
# [2 3]]
# [[3 4]
# [4 5]]
We can also use nditer() to iterate through the array. It visits every element of an n-dimensional array one at a time, as if the array were flattened to 1D.
for i in np.nditer(b):
    print(i)
# 1
# 2
# 2
# 3
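If the position of each element is needed as well as its value, np.ndenumerate() is one option; it is not covered above and is shown here only as a sketch. The array is the same 2D example, under a different name.

```python
import numpy as np

arr2d = np.array([[1, 2], [2, 3]])

# ndenumerate yields (index, value) pairs in the same order as nditer
for index, value in np.ndenumerate(arr2d):
    print(index, value)
# (0, 0) 1
# (0, 1) 2
# (1, 0) 2
# (1, 1) 3

# flatten() gives the same element order as a 1D array
print(arr2d.flatten().tolist())    # [1, 2, 2, 3]
```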
9. Transpose
We can use transpose() to swap the rows with columns.
a = np.arange(0,6).reshape(2,3)
print(a)
# [[0 1 2]
# [3 4 5]]
print(np.transpose(a))
# [[0 3]
# [1 4]
# [2 5]]
# can also write it as a.T
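A quick way to see what a transpose does is to compare shapes before and after. The sketch below (names are illustrative) also shows two edge cases: a 1D array is left as it is, and an array with more than two dimensions has all of its axes reversed.

```python
import numpy as np

m = np.arange(0, 6).reshape(2, 3)
print(m.T.shape)                               # (3, 2)
print(np.array_equal(m.T, np.transpose(m)))    # True

v = np.array([1, 2, 3])
print(v.T.shape)                               # (3,) a 1D array is unchanged

t = np.arange(24).reshape(2, 3, 4)
print(t.T.shape)                               # (4, 3, 2) the axes are reversed
```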
10. Stacking and Splitting
We have hstack() and vstack() to join arrays horizontally and vertically, respectively.
To split the data horizontally and vertically we use hsplit() and vsplit().
a = np.arange(0,4).reshape(2,2)
print(a)
# [[0 1]
# [2 3]]
b = np.arange(11, 15).reshape(2,2)
print(b)
# [[11 12]
# [13 14]]
print(np.hstack((a,b)))
# [[ 0 1 11 12]
# [ 2 3 13 14]]
print(np.vstack((a,b)))
# [[ 0 1]
# [ 2 3]
# [11 12]
# [13 14]]
c = np.arange(0,16).reshape(4,4)
print(c)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]
# [12 13 14 15]]
np.hsplit(c,2)
# [array([[ 0, 1],
# [ 4, 5],
# [ 8, 9],
# [12, 13]]),
# array([[ 2, 3],
# [ 6, 7],
# [10, 11],
# [14, 15]])]
np.vsplit(c,2)
# [array([[0, 1, 2, 3],
# [4, 5, 6, 7]]),
# array([[ 8, 9, 10, 11],
# [12, 13, 14, 15]])]
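Stacking and splitting are inverse operations, which the sketch below checks. It also uses np.concatenate(), a more general function not covered above, to show what hstack() and vstack() correspond to for 2D arrays, and it shows that a split into unequal parts is rejected. The names p, q and wide are illustrative.

```python
import numpy as np

p = np.arange(0, 4).reshape(2, 2)
q = np.arange(11, 15).reshape(2, 2)

# For 2D arrays, hstack joins along axis 1 and vstack along axis 0
print(np.array_equal(np.hstack((p, q)), np.concatenate((p, q), axis=1)))   # True
print(np.array_equal(np.vstack((p, q)), np.concatenate((p, q), axis=0)))   # True

# Splitting the stacked array gives the original pieces back
left, right = np.hsplit(np.hstack((p, q)), 2)
print(np.array_equal(left, p), np.array_equal(right, q))                   # True True

wide = np.arange(0, 16).reshape(4, 4)
try:
    np.hsplit(wide, 3)        # 4 columns cannot be divided into 3 equal parts
except ValueError as err:
    print("cannot split:", err)
```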
That is all for the basics of Numpy. Try some practice problems to reinforce the concepts discussed. Happy learning.