A complete guide on NumPy for data science

A guide to learning and using NumPy, from basic to advanced level, for exploratory data analysis.

Jayashree domala
Nerd For Tech
7 min read · Nov 20, 2020


What is NumPy?

NumPy is a Python library for numerical computing and data analysis, widely used in data science. It provides a fast N-dimensional array type together with linear algebra routines, and much of it is implemented in C, which makes it really fast.

How to install NumPy?

To install NumPy using pip:

pip install numpy

To install NumPy using Anaconda:

conda install numpy

What are NumPy Arrays?

While working with NumPy for data science, we mostly deal with NumPy arrays. These arrays come in two forms:

1. Matrices

Matrices are two-dimensional, although a matrix can still have only one row or only one column.

2. Vectors

Vectors, on the other hand, are strictly one-dimensional.
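
The difference between a vector and a single-row or single-column matrix can be checked through the `ndim` and `shape` attributes. The following is a minimal sketch; the variable names are only illustrative, and `reshape` is covered later in this guide.

```python
import numpy as np

vector = np.array([1, 2, 3])        # one-dimensional
row_matrix = vector.reshape(1, 3)   # two-dimensional, a single row
col_matrix = vector.reshape(3, 1)   # two-dimensional, a single column

print(vector.ndim, vector.shape)          # 1 (3,)
print(row_matrix.ndim, row_matrix.shape)  # 2 (1, 3)
print(col_matrix.ndim, col_matrix.shape)  # 2 (3, 1)
```

All three hold the same values; only the number of dimensions differs.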

How to create NumPy Arrays using lists?

→ Importing the library

>>> import numpy as np

→ Creating a list and then converting it into an array of 1 dimension.

>>> list1 = [11,23,34,56]
>>> list1
[11, 23, 34, 56]
>>> np.array(list1)
array([11, 23, 34, 56])
>>> array1 = np.array(list1)
>>> array1
array([11, 23, 34, 56])

→ Creating a list of lists and converting it into an array of 2 dimensions.

>>> list2 = [[11,22,33],[55,66,77],[88,99,100]]
>>> np.array(list2)
array([[ 11,  22,  33],
       [ 55,  66,  77],
       [ 88,  99, 100]])

As seen above, there are two dimensions, i.e. rows and columns. The number of dimensions is also indicated by the number of opening square brackets at the start of the output: here there are two after the round bracket, so the array has 2 dimensions.

How to create NumPy Arrays using built-in methods?

→ Creating an array with the arange method, which is similar to Python's range. The arguments are the start, stop and step values. The sequence begins at ‘start’ and ends before ‘stop’, so with an integer step of 1 the last value is (stop-1), just like the range function.

>>> np.arange(0,10,1)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(0,10,2)
array([0, 2, 4, 6, 8])
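
As with range, the start and step arguments are optional, and the step can be negative. A short sketch with illustrative variable names:

```python
import numpy as np

first_five = np.arange(5)         # start defaults to 0 and step to 1
countdown = np.arange(10, 0, -2)  # negative step counts down; stop (0) is still excluded

print(first_five.tolist())  # [0, 1, 2, 3, 4]
print(countdown.tolist())   # [10, 8, 6, 4, 2]
```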

→ Creating an array of all zeros.

>>> np.zeros(5)
array([0., 0., 0., 0., 0.])
>>> np.zeros((3,2))
array([[0., 0.],
       [0., 0.],
       [0., 0.]])

→ Creating an array of all ones.

>>> np.ones(5)
array([1., 1., 1., 1., 1.])
>>> np.ones((3,2))
array([[1., 1.],
       [1., 1.],
       [1., 1.]])

→ Creating an array where the values are spaced equally in an interval. It takes the arguments: start, stop, number of values.

>>> np.linspace(1,20,5)
array([ 1. , 5.75, 10.5 , 15.25, 20. ])

As seen above, it returns 5 numbers in the interval 1 to 20 which are evenly spaced.
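
Two optional arguments of linspace are worth knowing: `retstep=True` also returns the spacing between values, and `endpoint=False` leaves the stop value out. A minimal sketch; the variable names are illustrative:

```python
import numpy as np

# retstep=True returns the array together with the spacing between values
values, step = np.linspace(1, 20, 5, retstep=True)
print(step)  # 4.75

# endpoint=False excludes the stop value from the result
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)  # [0.  0.2 0.4 0.6 0.8]
```

The step of 4.75 is (20 - 1) / (5 - 1), which matches the gaps in the output above.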

→ Creating an identity matrix.

>>> np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

→ Creating an array of random numbers drawn from a uniform distribution over [0, 1), i.e. 0 is included and 1 is excluded.

>>> np.random.rand(3)
array([0.13426 , 0.22672772, 0.98574852])
>>> np.random.rand(3,2)
array([[0.13636649, 0.3366877 ],
       [0.36993761, 0.02392286],
       [0.20869183, 0.59256244]])

→ Creating an array of random numbers drawn from the standard normal distribution (mean 0, standard deviation 1).

>>> np.random.randn(3)
array([ 0.71105797, -0.33395766, 0.67756835])
>>> np.random.randn(4,2)
array([[ 1.21447908,  0.6830743 ],
       [-0.28203856,  0.16459752],
       [-0.32451067, -0.1618622 ],
       [-0.9331776 ,  0.6281955 ]])

→ Creating an array with random integers using randint() where the arguments to be passed are low, high and size. Low is inclusive and high is exclusive.

>>> np.random.randint(1,50)
49
>>> np.random.randint(1,50,5)
array([27, 43, 44, 39, 16])
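
The random values above change on every run. When results need to be reproducible, one option is NumPy's newer Generator interface created with `np.random.default_rng`. This is a sketch of an alternative to the functions shown above, not a replacement the article prescribes; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # the seed value is an arbitrary choice

uniform = rng.random(3)                 # uniform on [0, 1)
normal = rng.standard_normal((4, 2))    # standard normal, shape (4, 2)
integers = rng.integers(1, 50, size=5)  # low inclusive, high exclusive

# A generator created with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random(3)))  # True
```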

What are the attributes and methods of NumPy Array?

>>> arr1 = np.arange(10,35)
>>> arr1
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

→ Reshape method to change the array into a new shape.

>>> arr1.reshape(5,5)
array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34]])

If the requested shape cannot hold exactly all the elements, reshape raises an error. Make sure that the number of rows multiplied by the number of columns equals the number of elements in the array.

>>> arr1.reshape(3,3)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-26-2c9beb517969> in <module>
----> 1 arr1.reshape(3,3)
ValueError: cannot reshape array of size 25 into shape (3,3)
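
One way to avoid this error is to pass -1 for one dimension and let NumPy infer it from the number of elements. A minimal sketch, using a fresh array with the same contents as arr1:

```python
import numpy as np

arr = np.arange(10, 35)        # 25 elements, same contents as arr1

inferred = arr.reshape(5, -1)  # NumPy infers that the second dimension is 5
print(inferred.shape)          # (5, 5)

# A simple check before reshaping: the shape must account for every element
rows, cols = 3, 3
print(arr.size == rows * cols)  # False, so reshape(3, 3) would fail
```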

→ Finding the maximum and minimum values in the array.

>>> arr1
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])
>>> arr1.max()
34
>>> arr1.min()
10

To know the index at which the max or min value is present, use argmax() or argmin().

>>> arr1.argmax()
24
>>> arr1.argmin()
0
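
On a 2-D array, argmax() without arguments returns an index into the flattened array. The sketch below shows how `np.unravel_index` converts it to a (row, column) pair and how the `axis` argument gives per-column or per-row results. The array `grid` is a made-up example.

```python
import numpy as np

grid = np.array([[3, 9, 1],
                 [7, 2, 8]])

flat_index = grid.argmax()                           # index into the flattened array
position = np.unravel_index(flat_index, grid.shape)  # convert to (row, column)

print(int(flat_index))                  # 1
print(tuple(int(i) for i in position))  # (0, 1)
print(grid.argmax(axis=0).tolist())     # row index of the max in each column: [1, 0, 1]
print(grid.argmax(axis=1).tolist())     # column index of the max in each row: [1, 2]
```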

→ shape attribute to find the shape of the array. Note that it is an attribute, not a method, so it is written without parentheses.

>>> arr1.shape
(25,)

This denotes that the array is 1-D with 25 elements.

>>> arr2 = arr1.reshape(5,5)
>>> arr2.shape
(5, 5)

This denotes that the array is 2-D and has 5 rows and 5 columns.

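→ Finding the datatype of the elements in the array. The default integer type depends on the platform and NumPy version, so this may show int64 instead of int32.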

>>> arr1.dtype
dtype('int32')
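
The datatype can also be set explicitly when creating an array, or converted afterwards with astype. A short sketch with illustrative variable names:

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int64)  # request a specific integer width
floats = ints.astype(np.float64)            # astype returns a converted copy

print(ints.dtype)    # int64
print(floats.dtype)  # float64
print(floats)        # [1. 2. 3.]
```

Setting the dtype explicitly keeps results consistent across platforms where the default integer type differs.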

How to perform indexing and selection of elements in 1-D NumPy Array?

>>> arr1
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

→ Using the slicing notation to pick elements from the array.

>>> arr1[3]
13
>>> arr1[1:5]
array([11, 12, 13, 14])

→ Using slicing to change values in an array, i.e. broadcasting a single value to every selected element.

>>> arr1[1:5] = 50
>>> arr1
array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

Now, let's take a slice of the array and broadcast a value into it. Notice that the original array changes as well, because a slice is a view of the original data rather than a copy.

>>> arr2 = arr1[10:15]
>>> arr2
array([20, 21, 22, 23, 24])
>>> arr2[:] = 25
>>> arr2
array([25, 25, 25, 25, 25])
>>> arr1
array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

To avoid this, make a copy of the array and then broadcast.

>>> arr_copy = arr1.copy()
>>> arr_copy
array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])
>>> arr_copy[:] = 1
>>> arr_copy
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
>>> arr1
array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])
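
Whether two arrays share the same underlying data can be checked with `np.shares_memory`. The following sketch uses a small made-up array rather than arr1:

```python
import numpy as np

original = np.arange(10)

view = original[2:5]                # basic slicing returns a view
independent = original[2:5].copy()  # copy() allocates separate data

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False

view[:] = 99         # writes through to the original
independent[:] = -1  # leaves the original untouched
print(original.tolist())  # [0, 1, 99, 99, 99, 5, 6, 7, 8, 9]
```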

How to perform indexing and selection of elements in 2-D NumPy Array?

>>> arr2 = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> arr2
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

→ Indexing using brackets

For indexing, pass row value and then the column value.

>>> arr2[0][1]
2

You can also use the below notation where the row and column values are written separated by a comma.

>>> arr2[0,1]
2

→ Getting the subpart of a matrix.

To get the submatrix, use slicing. In the below example, select rows 0 and 1 and columns 1 and 2.

>>> arr2[:2,1:]
array([[2, 3],
       [5, 6]])
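
The same notation selects whole rows or columns, and negative indices and steps work as they do for lists. A minimal sketch on a 3x3 array with the same values as above:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(matrix[1].tolist())            # second row: [4, 5, 6]
print(matrix[:, 1].tolist())         # second column: [2, 5, 8]
print(int(matrix[-1, -1]))           # last row, last column: 9
print(matrix[::2, ::2].tolist())     # every other row and column: [[1, 3], [7, 9]]
```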

How to perform conditional selection using a boolean array?

>>> arr1
array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])
>>> bool_arr = arr1 > 20
>>> bool_arr
array([False, True, True, True, True, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])

→ Getting the values where the boolean value is True.

>>> arr1[bool_arr]
array([50, 50, 50, 50, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])
>>> arr1[arr1<20]
array([10, 15, 16, 17, 18, 19])
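
Several conditions can be combined with `&` (and) and `|` (or), each wrapped in parentheses. Python's `and` and `or` keywords do not work element-wise on arrays. The sketch below uses a small made-up array and also shows `np.where` for choosing between two values per element.

```python
import numpy as np

values = np.array([10, 50, 15, 25, 30, 34])

# Parentheses are required because & and | bind tighter than comparisons
in_range = values[(values > 12) & (values < 32)]
print(in_range.tolist())  # [15, 25, 30]

extremes = values[(values < 12) | (values > 40)]
print(extremes.tolist())  # [10, 50]

# np.where(condition, x, y) takes x where the condition holds, otherwise y
capped = np.where(values > 30, 30, values)
print(capped.tolist())    # [10, 30, 15, 25, 30, 30]
```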

How to perform Array with Array operations?

>>> arr3 = np.arange(0,5)
>>> arr3
array([0, 1, 2, 3, 4])

→ Elementwise operations

>>> arr3 + arr3
array([0, 2, 4, 6, 8])
>>> arr3 - arr3
array([0, 0, 0, 0, 0])
>>> arr3 * arr3
array([ 0, 1, 4, 9, 16])

How to perform Array with Scalar operations?

Scalar means just a single number. So when dealing with scalar and array operations, NumPy broadcasts the scalar into an array and performs the elementwise operations.

>>> arr3
array([0, 1, 2, 3, 4])
>>> arr3 + 5
array([5, 6, 7, 8, 9])
>>> arr3 - 2
array([-2, -1, 0, 1, 2])
>>> arr3 * 3
array([ 0, 3, 6, 9, 12])
>>> arr3 / 6
array([0. , 0.16666667, 0.33333333, 0.5 , 0.66666667])
>>> arr3 ** 3
array([ 0, 1, 8, 27, 64], dtype=int32)
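
Broadcasting is not limited to scalars. A 1-D array can be stretched across the rows of a matrix, and a single-column array across its columns, as long as the shapes are compatible. A minimal sketch with made-up arrays:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])         # shape (2, 3)
row = np.array([10, 20, 30])           # shape (3,), added to every row
column = np.array([[100], [200]])      # shape (2, 1), added to every column

print((matrix + row).tolist())     # [[11, 22, 33], [14, 25, 36]]
print((matrix + column).tolist())  # [[101, 102, 103], [204, 205, 206]]
```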

In Python, 0/0 raises an error, but in NumPy, 0/0 gives a warning and returns nan (Not a Number). Similarly, dividing a non-zero number by 0 gives a warning and returns inf (infinity).

>>> arr3
array([0, 1, 2, 3, 4])
>>> arr3/arr3
D:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in true_divide
"""Entry point for launching an IPython kernel.
array([nan, 1., 1., 1., 1.])
>>> 1/arr3
D:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
"""Entry point for launching an IPython kernel.
array([ inf, 1. , 0.5 , 0.33333333, 0.25 ])
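
If these warnings are expected, they can be silenced locally with `np.errstate`, and the nan and inf entries can be located afterwards with `np.isnan`, `np.isinf` or `np.isfinite`. The sketch below is one possible way to handle them; replacing non-finite values with 0.0 is an assumption made for illustration, not a general recommendation.

```python
import numpy as np

arr = np.arange(0, 5)

# errstate silences the warnings only inside the with-block
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr     # 0/0 gives nan
    reciprocal = 1 / arr  # 1/0 gives inf

print(np.isnan(ratio).tolist())       # [True, False, False, False, False]
print(np.isinf(reciprocal).tolist())  # [True, False, False, False, False]

# Replace non-finite entries if later steps need plain numbers
cleaned = np.where(np.isfinite(ratio), ratio, 0.0)
print(cleaned.tolist())               # [0.0, 1.0, 1.0, 1.0, 1.0]
```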

How to use NumPy universal functions on arrays?

→ Finding the square root of each element in the array.

>>> np.sqrt(arr3)
array([0. , 1. , 1.41421356, 1.73205081, 2. ])

→ Finding the exponential of each element in the array.

>>> np.exp(arr3)
array([ 1. , 2.71828183, 7.3890561 , 20.08553692, 54.59815003])

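→ Finding the maximum of the array. Strictly speaking, max() is an array method that reduces the array to a single value rather than an element-wise universal function.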

>>> arr3.max()
4

→ Trigonometric functions

>>> np.sin(arr3)
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
>>> np.cos(arr3)
array([ 1. , 0.54030231, -0.41614684, -0.9899925 , -0.65364362])

→ Logarithmic function

>>> np.log(arr3)
D:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in log
"""Entry point for launching an IPython kernel.
array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436])
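
Universal functions can also take two arrays and work on them element by element. The sketch below contrasts `np.maximum`, which compares two arrays element-wise, with the max() method, which reduces one array to a single value. It also shows `np.log1p`, which computes log(1 + x) and therefore returns 0 rather than -inf for an input of 0. The arrays are made up for illustration.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([4, 3, 2, 1, 0])

print(np.maximum(a, b).tolist())  # element-wise maximum: [4, 3, 2, 3, 4]
print(int(a.max()))               # reduction to a single value: 4

# log1p(x) is log(1 + x), so log1p(0) is 0.0 and no warning is raised
print(float(np.log1p(a)[0]))      # 0.0
```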

For more detailed information on the various methods and functions of NumPy, check the official NumPy documentation.

Refer to the notebook for code here.

Reach out to me: LinkedIn

Check out my other work: GitHub
