Bodacious World of NumPy

Sanjay Singh
Oct 9, 2019 · 7 min read

Hello there!

If you aspire to become a data scientist or machine learning expert, you cannot ignore NumPy. In fact, it is a vital Python library that you will have to know by heart.

I won’t go through the academic details of NumPy. You can find those on the NumPy site: https://numpy.org/

As a data scientist, I am more interested in making you aware of the practical use of various aspects of NumPy.

I hope you are using Jupyter Notebook on Anaconda, because I am going to give you a link to my Jupyter notebook containing all the commands I have written for this article.

Let’s begin!

Before we take a deeper dive, let’s agree on the terms I am going to use in this article.

A single value is referred to as a scalar.

Several values arranged in a one-dimensional array, either horizontally like [1 2 3 4] or vertically like

[1
 2
 3]

are referred to as a vector.

A multidimensional (here two-dimensional) array like [[1 2 3], [4 5 6], [7 8 9]] is referred to as a matrix.
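
To make the three terms concrete, here is a minimal sketch that builds one of each and prints its number of dimensions and shape. The variable names are my own choices for illustration and do not appear elsewhere in this article.

```python
import numpy as np

scalar = np.array(7)                                   # a single value: 0 dimensions
row_vector = np.array([1, 2, 3, 4])                    # horizontal layout: shape (4,)
column_vector = np.array([[1], [2], [3]])              # vertical layout: shape (3, 1)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # shape (3, 3)

# ndim is the number of dimensions, shape is the size along each dimension
for name, arr in [("scalar", scalar), ("row_vector", row_vector),
                  ("column_vector", column_vector), ("matrix", matrix)]:
    print(name, arr.ndim, arr.shape)
```

Note that NumPy stores a vertical vector as a two-dimensional array with a single column.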

Let’s start with one-dimensional arrays.

First, import the NumPy library.

import numpy as np

Scalar

If you just want to generate a random integer using NumPy, you can do so by calling random.randint().

The code below prints a random integer from 0 to 16, that is, a non-negative integer less than 17.

dnpintsc=np.random.randint(17)
print(dnpintsc)

Likewise, random.randn() returns a random floating-point number drawn from the standard normal distribution (mean 0, standard deviation 1).

dnpintsc=np.random.randn()
print(dnpintsc)

The result changes with every run, so your output will differ from mine.
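
If you need the same "random" values on every run, for example to reproduce a result, one option is to seed the generator first. The sketch below assumes the legacy np.random.seed function and a seed value of 42 picked arbitrarily. The variable names are illustrative only.

```python
import numpy as np

np.random.seed(42)                  # fix the starting state of the generator
first_int = np.random.randint(17)   # integer from 0 to 16
first_float = np.random.randn()     # float from the standard normal distribution

np.random.seed(42)                  # same seed again, same sequence again
second_int = np.random.randint(17)
second_float = np.random.randn()

print(first_int == second_int, first_float == second_float)  # True True
```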

One Dimensional Array (Vector)

Let’s go through the better known and most used feature of NumPy: the one-dimensional array.

You can create a one-dimensional array by calling the array function and passing a list to it, like below.

dnpvector1=np.array([1,2,3,4,5,6,7])
print(dnpvector1)

Result:

[1 2 3 4 5 6 7]

If you already have a list defined like below, you can just pass that list to the array function to create a NumPy array.

list1=[1,2,4,7,8,9]
dnplistarr=np.array(list1)
dnplistarr

Result:

array([1, 2, 4, 7, 8, 9])

Likewise, if you want to convert a NumPy array to a Python list, you can do so by calling the tolist method.

dnplistarr.tolist()

Result:

[1, 2, 4, 7, 8, 9]

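The append function adds new values at the end of a NumPy array. It returns a new array rather than changing the original, which is why the result is assigned back to dnpvector1.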

dnpvector1=np.append(dnpvector1,10)
dnpvector1

Result:

array([ 1,  2,  3,  4,  5,  6,  7, 10])
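
The following sketch illustrates that np.append leaves its input untouched and can take several values at once. The array names original and extended are made up for this example.

```python
import numpy as np

original = np.array([1, 2, 3])
extended = np.append(original, [10, 11])   # append more than one value in a single call

print(original)   # [1 2 3]  (unchanged)
print(extended)   # [ 1  2  3 10 11]
```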

If you want to add a value at a specific position (called an index) in a NumPy array, use the insert function.

The code below adds 100 at index 1 in the array dnpvector1.

dnpvector1=np.insert(dnpvector1,1,100)
dnpvector1

Result:

array([  1, 100,   2,   3,   4,   5,   6,   7,  10])

Note: NumPy array indexes start at 0. In the above case the value at index 0 was 1 and the value at index 1 was 2. After the insert, 100 sits at index 1 and the value 2 has moved to index 2.

Array values can be sorted using the sort function. np.sort returns a sorted copy and leaves dnpvector1 itself unchanged.

np.sort(dnpvector1)

Result:

array([  1,   2,   3,   4,   5,   6,   7,  10, 100])
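
As a small sketch of the sorting options, the example below compares np.sort, which returns a copy, with the sort method of the array, which sorts in place. It also reverses the sorted copy to get descending order. The names values, sorted_copy and descending are illustrative.

```python
import numpy as np

values = np.array([1, 100, 2, 10, 7])

sorted_copy = np.sort(values)     # new sorted array; values is untouched
descending = sorted_copy[::-1]    # reverse the sorted copy for descending order

print(values)        # [  1 100   2  10   7]
print(sorted_copy)   # [  1   2   7  10 100]
print(descending)    # [100  10   7   2   1]

values.sort()        # sorts values itself, in place, and returns None
print(values)        # [  1   2   7  10 100]
```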

Values can be deleted from an array using the delete function.

The code below deletes the value at index 1 (the 100 inserted earlier) from the array dnpvector1.

dnpvector1=np.delete(dnpvector1,1)
dnpvector1

Result:

array([ 1,  2,  3,  4,  5,  6,  7, 10])

A new array can be created by concatenating values from more than one array using the concatenate function.

The code below concatenates the arrays dnpvector1 and dnpvector2, in that order, and creates the new array dnpvector3.

dnpvector2=np.array([101,102,103])
dnpvector3=np.concatenate((dnpvector1,dnpvector2), axis=0)
dnpvector3

Result:

array([  1,   2,   3,   4,   5,   6,   7,  10, 101, 102, 103])

Are you wondering how to find the index of a value? This feature comes in very handy when you have an array dataset with millions of records.

You can use the where function for this.

The code below returns the index of value 7 in the array dnpvector3.

np.where(dnpvector3==7)

Result:

(array([6], dtype=int32),)
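
The result of where is a tuple holding one index array per dimension, so for a one-dimensional array the indexes are in its first element. Here is a minimal sketch with a made-up array, data, in which the value 7 occurs twice.

```python
import numpy as np

data = np.array([1, 7, 3, 7, 5])

matches = np.where(data == 7)   # tuple with one index array per dimension
indexes = matches[0]            # for a 1-D array, take the first element

print(indexes)        # [1 3]
print(indexes.size)   # 2 matches found

# Guard against "no match" before taking the first index
first_index = int(indexes[0]) if indexes.size else -1
print(first_index)    # 1
```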

Do you want to create a one-dimensional array of numbers from 1 to 10 with an interval of 2? Use the arange function. It comes in very handy when plotting visualizations during data exploration and result analysis.

The code below creates the array dnp1 with values from 1 to 10 (10 not included) with an interval of 2.

dnp1=np.arange(1,10,2)
dnp1

Result:

array([1, 3, 5, 7, 9])
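
A few more arange calls, as a sketch, show how the start, stop and step arguments interact. In every case the stop value itself is left out. The variable names are illustrative.

```python
import numpy as np

evens = np.arange(0, 11, 2)        # stop at 11 so that 10 is included
countdown = np.arange(10, 0, -3)   # a negative step counts downwards
quarters = np.arange(0, 1, 0.25)   # the step can also be a float
first_five = np.arange(5)          # one argument: start at 0, step 1

print(evens)       # [ 0  2  4  6  8 10]
print(countdown)   # [10  7  4  1]
print(quarters)    # [0.   0.25 0.5  0.75]
print(first_five)  # [0 1 2 3 4]
```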

How about creating an array with all zero values? Use the zeros function for that.

The code below creates a one-dimensional array containing ten zeros.

dnpzero=np.zeros(10)
print(dnpzero)

Result:

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Likewise, you can create an array filled with the number 1 using the ones function. I don’t think it needs a demonstration.

A very useful function in NumPy is linspace. It creates a given number of evenly spaced values between a start and an end value. The third argument is the number of values to generate, not the spacing between them. Let’s go through the example below.

If you want to print 20 evenly spaced numbers between 1 and 5 (5 included), run the code below.

dnplinespace=np.linspace(1,5,20)
print(dnplinespace)

Result:

[1.         1.21052632 1.42105263 1.63157895 1.84210526 2.05263158
 2.26315789 2.47368421 2.68421053 2.89473684 3.10526316 3.31578947
 3.52631579 3.73684211 3.94736842 4.15789474 4.36842105 4.57894737
 4.78947368 5.        ]

This function comes in handy when drawing visualizations.
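
To see the spacing that linspace actually uses, you can ask for it with the retstep argument. The sketch below assumes the same range of 1 to 5 with 20 values. The names points, step and open_ended are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring values
points, step = np.linspace(1, 5, 20, retstep=True)
print(len(points))   # 20
print(step)          # (5 - 1) / (20 - 1), roughly 0.2105

# endpoint=False leaves out the end value, so the spacing becomes 4 / 20
open_ended = np.linspace(1, 5, 20, endpoint=False)
print(open_ended[1] - open_ended[0])   # roughly 0.2
```

With 20 values there are only 19 gaps between them, which is why the spacing is 4/19 and not 4/20.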

Let’s create a one dimensional array of sixteen random numbers using randn function.

dnprandomarray=np.random.randn(16)
print(dnprandomarray)

Result:

[-1.76865566  0.44877883  1.65279147  0.9642689   1.14989971 -0.44664962
 -1.29255905 -2.04026537  0.4573302   1.24653905  1.23037704  0.02799583
  1.49594113 -0.34639749 -1.41667387  1.31684675]

The maximum and minimum values of an array can be found with the max and min methods.

print(dnprandomarray.max())
print(dnprandomarray.min())

Result:

1.6527914692877606
-2.0402653731995333

If you want to know the index of maximum and minimum values, you can use argmax and argmin functions.

print(dnprandomarray.argmax())
print(dnprandomarray.argmin())

Result:

2
7
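
The link between max/min and argmax/argmin can be checked with a small fixed array. This is a sketch with made-up values. Note that when the largest value occurs more than once, argmax returns the index of its first occurrence.

```python
import numpy as np

scores = np.array([3.5, -1.2, 9.1, 0.0, 9.1])

i_max = scores.argmax()   # index of the first occurrence of the maximum
i_min = scores.argmin()   # index of the minimum

print(i_max, scores[i_max])   # 2 9.1
print(i_min, scores[i_min])   # 1 -1.2
print(scores[i_max] == scores.max())   # True
```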

How do you find the size of an array? Use the size attribute, which holds the number of elements.

The code below returns the size of the dnprandomarray array.

dnprandomarray.size

Result:

16

To read the value at a specific index, use array[index] as shown below.

print(dnprandomarray[5])

Result (the element at index 5 of the array printed above, shown to eight decimal places):

-0.44664962

To fetch the values between two indexes, use array[firstindexposition:lastindexposition+1]. The end of the slice is not included, which is why 1 is added to the last index.

The code below returns the values at indexes 1, 2, 3 and 4 from the array dnprandomarray.

print(dnprandomarray[1:5])

Result:

[0.44877883 1.65279147 0.9642689  1.14989971]
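
Slicing supports a few more forms than a plain start and end. The sketch below uses a made-up array, numbers, holding the values 10 to 19 so that each result is easy to check by eye.

```python
import numpy as np

numbers = np.arange(10, 20)   # [10 11 12 13 14 15 16 17 18 19]

print(numbers[1:5])    # indexes 1 to 4       -> [11 12 13 14]
print(numbers[:3])     # first three values   -> [10 11 12]
print(numbers[-2:])    # last two values      -> [18 19]
print(numbers[::3])    # every third value    -> [10 13 16 19]
```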

Multidimensional Array (Matrix)

Very good!

Now that you have had enough exposure to one-dimensional arrays, let’s extend that knowledge to multidimensional arrays.

While a one-dimensional array contains values in either one row (horizontal array) or one column (vertical array), a multidimensional array contains values in more than one row and more than one column. A two-dimensional array is sliced as array[rowindexfrom:rowindexto,columnindexfrom:columnindexto].

Let’s create a 5x4 multidimensional array of random numbers and explore the world of multidimensional arrays.

dnprandom=np.random.rand(5,4)
print(dnprandom)

Result:

[[0.21067598 0.07644456 0.51538545 0.81564459]
 [0.66354799 0.75554928 0.74104759 0.29199204]
 [0.54019744 0.96360781 0.62939973 0.07646806]
 [0.29973016 0.76815988 0.3176048  0.1235475 ]
 [0.39714328 0.95994687 0.43036685 0.08273214]]

Let’s find the shape of this matrix using the shape attribute.

print(dnprandom.shape)

Result:

(5, 4)

Let’s find the data type of the values in this matrix using dtype.

print(dnprandom.dtype)

Result:

float64

Let’s print the value at row index 4 and column index 2, that is, the fifth row and third column.

print(dnprandom[4][2])
print(dnprandom[4,2])

Result:

0.4303668539112995
0.4303668539112995

NumPy numbers rows and columns starting from 0. In this 5x4 matrix, row index 4 is the fifth (last) row and column index 2 is the third (second-to-last) column, which is why calling 4,2 returns the value from the last row and second-to-last column.

Also note that you can get this value by calling either array[4][2] or array[4,2]. Both return the same value.
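
Zero-based numbering is easier to follow on a matrix whose values are predictable. The sketch below uses a made-up 5x4 matrix, grid, filled with the numbers 0 to 19 instead of random values.

```python
import numpy as np

grid = np.arange(20).reshape(5, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]]

print(grid[4, 2])     # 18: row index 4 (last row), column index 2
print(grid[4][2])     # 18: same element, selected in two steps
print(grid[-1, -2])   # 18: negative indexes count from the end
```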

What if I want to create a subset of this matrix?

The code below creates a subset of the above matrix with the values from rows 1 and 2 and columns 2 and 3 (counting from 0).

print(dnprandom[1:3,2:4])

Result:

[[0.74104759 0.29199204]
 [0.62939973 0.07646806]]

Note that the 3 from row range 1:3 and the 4 from column range 2:4 are not included in the resulting subset.
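
The same slicing syntax can also pick out whole rows and columns. One thing worth knowing is that a slice is a view onto the original matrix, so writing to the subset changes the original unless you take a copy first. The sketch below uses a made-up 5x4 matrix, grid, to show this.

```python
import numpy as np

grid = np.arange(20).reshape(5, 4)

third_row = grid[2]          # a whole row:    [ 8  9 10 11]
second_col = grid[:, 1]      # a whole column: [ 1  5  9 13 17]

independent = grid[1:3, 2:4].copy()   # copy: safe to modify
block = grid[1:3, 2:4]                # view: shares data with grid

block[0, 0] = -1             # writes through to grid[1, 2]
print(grid[1, 2])            # -1
print(independent[0, 0])     # 6, the copy kept the old value
```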

Operations

Now that we understand NumPy arrays, let’s do some mathematical operations on them. These operations can be performed on one-dimensional as well as multidimensional arrays.

Let’s create a multidimensional array.

dnpintmat=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]])
print(dnpintmat)

Result:

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]]

What would be result of below code?

print(dnpintmat>7)

It will indicate True for all the values greater than 7 and False for other values.

Result:

[[False False False]
 [False False False]
 [False  True  True]
 [ True  True  True]
 [ True  True  True]]

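What if you want to create a subset of the above array with all the values greater than 7 and less than 14?

Run the code below. It builds the condition as a boolean mask and uses the mask directly as the index.

dnpintmat2=(dnpintmat>7)&(dnpintmat<14)
print(dnpintmat[dnpintmat2])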

Result:

[ 8  9 10 11 12 13]
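
A boolean mask can do more than filter. The sketch below rebuilds the same 5x3 matrix of 1 to 15 under the made-up name matrix, counts the matching values, and uses the three-argument form of np.where to replace values instead of selecting them.

```python
import numpy as np

matrix = np.arange(1, 16).reshape(5, 3)

# Each comparison needs its own parentheses because & binds tighter than > and <
mask = (matrix > 7) & (matrix < 14)

print(mask.sum())     # 6: True counts as 1, so this is the number of matches
print(matrix[mask])   # [ 8  9 10 11 12 13]

# np.where(condition, a, b) takes a where the condition is True, b elsewhere
zeroed = np.where(matrix > 7, 0, matrix)
print(zeroed[2])      # [7 0 0]
```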

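Broadcasting is useful when you need to assign one value to many positions. As an example, what if you want to replace every value in the fourth row (index 3) of the dnpintmat array with 100?

It can be done by assigning the value directly to the slice, like below. NumPy broadcasts the single value across the whole row.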

dnpintmat[3:4]=100
print(dnpintmat)

Result:

[[  1   2   3]
 [  4   5   6]
 [  7   8   9]
 [100 100 100]
 [ 13  14  15]]

Likewise, you can add a value to every element, as below.

This code adds 50 to each value of the matrix dnpintmat.

print(dnpintmat+50)

Result:

[[ 51  52  53]
 [ 54  55  56]
 [ 57  58  59]
 [150 150 150]
 [ 63  64  65]]
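
Broadcasting is not limited to a single value. As a sketch, the example below adds a different offset to each column and then to each row of a 5x3 matrix. The names matrix, column_offsets and row_offsets are made up for this illustration.

```python
import numpy as np

matrix = np.arange(1, 16).reshape(5, 3)                 # values 1 to 15

column_offsets = np.array([100, 200, 300])              # shape (3,): one per column
row_offsets = np.array([[10], [20], [30], [40], [50]])  # shape (5, 1): one per row

by_column = matrix + column_offsets   # the 3 offsets are repeated for every row
by_row = matrix + row_offsets         # the 5 offsets are repeated for every column

print(by_column[0])    # [101 202 303]
print(by_row[:, 0])    # [11 24 37 50 63]
```

The shapes have to be compatible: (5, 3) works with (3,) and with (5, 1), but adding an array of shape (5,) to this matrix would raise an error.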

Mathematical & Statistical Functions

You can directly call mathematical and statistical functions like sum, sqrt, log, exp and std on numpy arrays.

Below are some examples.

The sum function returns the total of all the array values.

print(np.sum(dnpintmat))

Result:

387

Likewise std will calculate standard deviation among the array values.

print(np.std(dnpintmat))

Result:

37.31880669760668

sqrt calculates the square root of each value in the matrix.

print(np.sqrt(dnpintmat))

Result:

[[ 1.          1.41421356  1.73205081]
 [ 2.          2.23606798  2.44948974]
 [ 2.64575131  2.82842712  3.        ]
 [10.         10.         10.        ]
 [ 3.60555128  3.74165739  3.87298335]]
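
Functions such as sum and mean also accept an axis argument when you want one result per column or per row rather than a single number. The sketch below rebuilds the matrix from this section, including the row of 100s, under the made-up name matrix.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                   [100, 100, 100], [13, 14, 15]])

total = np.sum(matrix)                  # one number for the whole matrix
column_totals = np.sum(matrix, axis=0)  # collapse the rows: one total per column
row_totals = np.sum(matrix, axis=1)     # collapse the columns: one total per row

print(total)            # 387
print(column_totals)    # [125 129 133]
print(row_totals)       # [  6  15  24 300  42]
print(np.mean(matrix))  # 25.8
```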

Congratulations!!

Now you know enough NumPy to start exploring it on data in your journey of machine learning and data science.

Link to my Jupyter notebook with the above commands: https://github.com/srssingh/Machine-Learning/blob/master/Numpy%20for%20ML.ipynb
