Hello there!

If you aspire to become a data scientist or machine learning expert, you cannot ignore NumPy. In fact, it is a vital Python library that you will need to know by heart.

I won’t go through the academic details of NumPy. You can find those on the NumPy site: https://numpy.org/

As a data scientist, I am more interested in making you aware of the practical uses of various aspects of NumPy.

I hope you are using Jupyter Notebook on Anaconda, because I am going to provide a link to my Jupyter notebook containing all the commands I have written for this article.

Let’s begin!

Before we take a deeper dive, let’s agree on the terms I am going to use in this article.

A single value is referred to as a scalar.

Several values arranged in a one-dimensional array, either horizontally like [1 2 3 4] or vertically like

[

1

2

3

]

are referred to as a vector.

A multidimensional array like [[1 2 3], [4 5 6], [7 8 9]] is referred to as a matrix.

Let’s go through the one-dimensional array.

First, import the numpy library:

`import numpy as np`

## Scalar

If you just want to generate a random integer using numpy, you can do so by calling random.randint().

The code below prints a random integer from 0 to 16 (17 itself is excluded).

`dnpintsc=np.random.randint(17)`

`print(dnpintsc)`

Likewise, random.randn() returns a random floating-point number drawn from the standard normal distribution.

`dnpintsc=np.random.randn()`

`print(dnpintsc)`

Of course, the result changes with every run.
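Because these values change on every run, a fixed seed helps when you want repeatable output. Here is a minimal sketch, assuming the legacy `np.random.seed` call; the seed value 42 and the variable names are arbitrary choices of mine, not part of the original notebook.

```python
import numpy as np

# Fix the seed so the "random" values repeat on every run (42 is arbitrary).
np.random.seed(42)
first_int = np.random.randint(17)   # integer from 0 to 16
first_float = np.random.randn()     # float from the standard normal distribution

# Re-seeding with the same value replays the same sequence.
np.random.seed(42)
second_int = np.random.randint(17)
second_float = np.random.randn()

print(first_int == second_int, first_float == second_float)
```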

## One Dimensional Array (Vector)

Let’s go through the best-known and most-used feature of numpy: the one-dimensional array.

You can create a one-dimensional array by calling the array function and passing a list to it, as below.

`dnpvector1=np.array([1,2,3,4,5,6,7])`

`print(dnpvector1)`

Result:

`[1 2 3 4 5 6 7]`

If you already have a list defined, as below, you can pass that list to the array function to create a numpy array.

`list1=[1,2,4,7,8,9]`

`dnplistarr=np.array(list1)`

`dnplistarr`

Result:

`array([1, 2, 4, 7, 8, 9])`

Likewise, if you want to convert a numpy array to a Python list, you can do so by calling the tolist function.

`dnplistarr.tolist()`

Result:

`[1, 2, 4, 7, 8, 9]`
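The round trip between a list and an array can be sketched as follows. The names `prices`, `prices_arr` and `back_to_list` are hypothetical and used only for illustration.

```python
import numpy as np

prices = [1, 2, 4, 7, 8, 9]          # plain Python list
prices_arr = np.array(prices)        # list -> numpy array
back_to_list = prices_arr.tolist()   # numpy array -> list

print(type(prices_arr).__name__)     # ndarray
print(back_to_list == prices)        # True
```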

The append function returns a new array with the given values added at the end.

`dnpvector1=np.append(dnpvector1,10)`

`dnpvector1`

Result:

`array([ 1,  2,  3,  4,  5,  6,  7, 10])`

If you want to add a value at a specific position (called an index) in a numpy array, use the insert function.

The code below adds 100 at index 1 in the array dnpvector1.

`dnpvector1=np.insert(dnpvector1,1,100)`

`dnpvector1`

Result:

`array([  1, 100,   2,   3,   4,   5,   6,   7,  10])`

Note: numpy array indexes start at 0. In the above case, the value at index 0 was 1 and the value at index 1 was 2. 100 is added at index 1, and the value 2 moves to index 2.
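One detail worth seeing in isolation: append and insert do not modify the array you pass in. They return a new array, which is why the examples above assign the result back. A small sketch with hypothetical names:

```python
import numpy as np

original = np.array([1, 2, 3])
appended = np.append(original, 10)      # new array with 10 at the end
inserted = np.insert(original, 1, 100)  # new array with 100 at index 1

print(original)   # [1 2 3]  (unchanged)
print(appended)   # [ 1  2  3 10]
print(inserted)   # [  1 100   2   3]
```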

Array values can be sorted using the sort function. np.sort returns a sorted copy and leaves dnpvector1 itself unchanged.

`np.sort(dnpvector1)`

Result:

`array([  1,   2,   3,   4,   5,   6,   7,  10, 100])`
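The difference between the `np.sort` function and the array's own `sort` method is easy to miss, so here is a short sketch. The array name `values` is made up for the example.

```python
import numpy as np

values = np.array([3, 1, 2])
sorted_copy = np.sort(values)   # returns a sorted copy
print(values)                   # [3 1 2]  (original order kept)
print(sorted_copy)              # [1 2 3]

values.sort()                   # sorts the array in place and returns None
print(values)                   # [1 2 3]
```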

Values can be dropped from an array using the delete function.

The code below deletes the value at index 1 in the array dnpvector1.

`dnpvector1=np.delete(dnpvector1,1)`

`dnpvector1`

Result:

`array([ 1,  2,  3,  4,  5,  6,  7, 10])`
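delete also accepts a list of indexes, which is handy when several values must go at once. A minimal sketch, with a hypothetical `scores` array:

```python
import numpy as np

scores = np.array([10, 20, 30, 40, 50])
without_second = np.delete(scores, 1)      # drop the value at index 1
without_ends = np.delete(scores, [0, 4])   # drop the values at indexes 0 and 4

print(without_second)   # [10 30 40 50]
print(without_ends)     # [20 30 40]
```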

A new array can be created by concatenating values from more than one array using the concatenate function.

The code below concatenates the arrays dnpvector1 and dnpvector2, in that order, to create the new array dnpvector3.

`dnpvector2=np.array([101,102,103])`

`dnpvector3=np.concatenate((dnpvector1,dnpvector2), axis=0)`

`dnpvector3`

Result:

`array([  1,   2,   3,   4,   5,   6,   7,  10, 101, 102, 103])`
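concatenate is not limited to two arrays. The sketch below joins three one-dimensional arrays; the names are hypothetical.

```python
import numpy as np

first = np.array([1, 2, 3])
second = np.array([101, 102])
third = np.array([7])

# The arrays are joined in the order they appear in the tuple.
joined = np.concatenate((first, second, third))
print(joined)   # [  1   2   3 101 102   7]
```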

Are you wondering how to find the index of a value? This feature comes in very handy when you have an array dataset with millions of records.

You can use the where function for this.

The code below returns the index of the value 7 in the array dnpvector3.

`np.where(dnpvector3==7)`

Result:

`(array([6], dtype=int32),)`
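where returns a tuple holding one index array per dimension, so for a one-dimensional array you usually take element 0 of the result. A sketch with a hypothetical `readings` array in which the value appears more than once:

```python
import numpy as np

readings = np.array([7, 3, 7, 9, 7])
matches = np.where(readings == 7)   # tuple with one index array per dimension
indexes = matches[0]                # the index array for our single dimension

print(indexes)        # [0 2 4]
print(len(indexes))   # 3
```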

Do you want to create a one-dimensional array of numbers from 1 to 10 with an interval of 2? Use the arange function. It comes in very handy when plotting visualizations during data exploration and result analysis.

The code below creates the array dnp1 with values from 1 to 10 (10 not included) with an interval of 2.

`dnp1=np.arange(1,10,2)`

`dnp1`

Result:

`array([1, 3, 5, 7, 9])`
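Since the stop value is excluded, it helps to see a few variations side by side. This is only a sketch; the variable names are mine.

```python
import numpy as np

odds = np.arange(1, 10, 2)     # start 1, stop before 10, step 2
evens = np.arange(0, 11, 2)    # stop is 11 so that 10 is included
first_five = np.arange(5)      # a single argument means 0 up to (not including) 5

print(odds)        # [1 3 5 7 9]
print(evens)       # [ 0  2  4  6  8 10]
print(first_five)  # [0 1 2 3 4]
```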

How about creating an array with all zero values? Use the zeros function for that.

The code below creates a one-dimensional array containing ten zeros.

`dnpzero=np.zeros(10)`

`print(dnpzero)`

Result:

`[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]`

Likewise, you can create an array filled with the number 1 using the ones function. I don’t think it needs a demonstration.

A very useful function in numpy is linspace. It creates the requested number of evenly spaced values within the range provided. Let’s go through an example.

If you want to print 20 numbers between 1 and 5 (5 included) with equal spacing, run the code below.

`dnplinespace=np.linspace(1,5,20)`

`print(dnplinespace)`
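Still, two options are worth a quick look: the `dtype` argument, and passing a tuple to get a multidimensional shape. A brief sketch with hypothetical names:

```python
import numpy as np

ones_float = np.ones(4)             # default dtype is float64
ones_int = np.ones(4, dtype=int)    # ask for integers instead
zeros_grid = np.zeros((2, 3))       # a tuple gives a multidimensional shape

print(ones_float)         # [1. 1. 1. 1.]
print(ones_int)           # [1 1 1 1]
print(zeros_grid.shape)   # (2, 3)
```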

Result:

`[1.         1.21052632 1.42105263 1.63157895 1.84210526 2.05263158 2.26315789 2.47368421 2.68421053 2.89473684 3.10526316 3.31578947 3.52631579 3.73684211 3.94736842 4.15789474 4.36842105 4.57894737 4.78947368 5.        ]`

This function comes in handy when drawing visualizations.

Let’s create a one-dimensional array of sixteen random numbers using the randn function.

`dnprandomarray=np.random.randn(16)`

`print(dnprandomarray)`
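Note that the third argument of linspace is the number of points, not the step. The sketch below checks the spacing that results, using `np.diff` to measure the gap between neighbouring values; the variable names are hypothetical.

```python
import numpy as np

points = np.linspace(1, 5, 20)   # 20 values from 1 to 5, both ends included
gaps = np.diff(points)           # difference between neighbouring values

# With 20 points there are 19 gaps, so each gap is (5 - 1) / 19.
expected_gap = (5 - 1) / (20 - 1)
print(np.allclose(gaps, expected_gap))   # True
print(points[0], points[-1])             # 1.0 5.0
```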

Result:

`[-1.76865566  0.44877883  1.65279147  0.9642689   1.14989971 -0.44664962 -1.29255905 -2.04026537  0.4573302   1.24653905  1.23037704  0.02799583  1.49594113 -0.34639749 -1.41667387  1.31684675]`

The maximum and minimum values of an array can be found with the max and min functions.

`print(dnprandomarray.max())`

`print(dnprandomarray.min())`

Result:

`1.6527914692877606`

`-2.0402653731995333`

If you want to know the indexes of the maximum and minimum values, you can use the argmax and argmin functions.

`print(dnprandomarray.argmax())`

`print(dnprandomarray.argmin())`

Result:

`2`

`7`
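The same four calls on a small, fixed array make the relationship between the values and their indexes easier to follow. The `temps` array is made up for this sketch.

```python
import numpy as np

temps = np.array([21.5, 30.1, 18.2, 25.0])

hottest = temps.max()          # largest value
coldest = temps.min()          # smallest value
hottest_at = temps.argmax()    # index of the largest value
coldest_at = temps.argmin()    # index of the smallest value

print(hottest, hottest_at)   # 30.1 1
print(coldest, coldest_at)   # 18.2 2
```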

How do you find the size of an array? Use the size attribute.

The code below returns the size of the dnprandomarray array.

`dnprandomarray.size`

Result:

`16`

Wondering how to find the value at a specific index? Just call array[index], as below.

`print(dnprandomarray)`

Result:

`[0.44877883 1.65279147 0.9642689  1.14989971]`

If you want to fetch the values between two indexes, run array[firstindexposition:lastindexposition+1].

The code below returns the values at indexes 1, 2, 3 and 4 from the array dnprandomarray.

`print(dnprandomarray[1:5])`

Result:

`[0.44877883 1.65279147 0.9642689  1.14989971]`
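Single-index access and slicing can be compared on a small array with known values. This is a sketch with a hypothetical `samples` array rather than the random array used above.

```python
import numpy as np

samples = np.array([10, 20, 30, 40, 50, 60])

print(samples[1])      # 20  (single value at index 1)
print(samples[1:5])    # [20 30 40 50]  (indexes 1 to 4; 5 is excluded)
print(samples[-1])     # 60  (negative indexes count from the end)
print(samples[:3])     # [10 20 30]  (start defaults to 0)
```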

## Multidimensional Array (Matrix)

Very good!

Now that you have had enough exposure to the single-dimensional array, let’s enhance that knowledge with the multidimensional array.

While a single-dimensional array contains values in either one row (horizontal array) or one column (vertical array), a multidimensional array contains values in more than one row and more than one column. Multidimensional arrays are indexed as array[rowindexfrom:rowindexto,columnindexfrom:columnindexto].

Let’s create a 5x4 multidimensional array of random numbers and explore the world of multidimensional arrays.

`dnprandom=np.random.rand(5,4)`

`print(dnprandom)`

Result:

`[[0.21067598 0.07644456 0.51538545 0.81564459] [0.66354799 0.75554928 0.74104759 0.29199204] [0.54019744 0.96360781 0.62939973 0.07646806] [0.29973016 0.76815988 0.3176048  0.1235475 ] [0.39714328 0.95994687 0.43036685 0.08273214]]`

Let’s find the shape of this matrix using the shape attribute.

`print(dnprandom.shape)`

Result:

`(5, 4)`

Let’s find the data type of this matrix’s values using dtype.

`print(dnprandom.dtype)`

Result:

`float64`
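shape and dtype are attributes rather than functions, so they are read without parentheses. The sketch below shows them next to two related attributes, `ndim` and `size`; the name `grid` is hypothetical.

```python
import numpy as np

grid = np.random.rand(5, 4)   # 5 rows, 4 columns of floats in [0, 1)

print(grid.shape)   # (5, 4)
print(grid.ndim)    # 2
print(grid.size)    # 20
print(grid.dtype)   # float64
```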

Let’s print the value at row index 4 and column index 2, that is, the fifth row and third column.

`print(dnprandom[4][2])`

`print(dnprandom[4,2])`

Result:

`0.4303668539112995`

`0.4303668539112995`

Numpy numbers rows and columns starting from 0, which is why index 4,2 returns the value in the last row and second-to-last column.

Also note that you can get the value at row index 4 and column index 2 by calling array[4][2] or array[4,2]. Both return the same value.

What if I want to create a subset of this matrix?

The code below creates a subset of the above matrix with the values from row indexes 1 and 2 and column indexes 2 and 3.

`print(dnprandom[1:3,2:4])`

Result:

`[[0.74104759 0.29199204] [0.62939973 0.07646806]]`

Note that 3 from the row range 1:3 and 4 from the column range 2:4 are not included in the resulting subset.
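Row and column indexing is easier to verify on a matrix whose values are predictable. The sketch below builds a hypothetical 5x4 `table` holding 0 to 19, so every result can be checked by eye.

```python
import numpy as np

table = np.arange(20).reshape(5, 4)   # rows 0-4, columns 0-3, values 0..19

print(table[4, 2])        # 18 (row index 4, column index 2)
print(table[4][2])        # 18 (same element, two-step indexing)
print(table[1:3, 2:4])    # rows 1-2, columns 2-3
print(table[:, 0])        # first column: [ 0  4  8 12 16]
```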

## Operations

Now that we understand numpy arrays, let’s do some mathematical operations on them. These operations can be performed on single-dimensional as well as multidimensional arrays.

Let’s create a multidimensional array.

`dnpintmat=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]])`

`print(dnpintmat)`

Result:

`[[ 1  2  3] [ 4  5  6] [ 7  8  9] [10 11 12] [13 14 15]]`

What would be the result of the code below?

`print(dnpintmat>7)`

It shows True for every value greater than 7 and False for the others.

Result:

`[[False False False] [False False False] [False  True  True] [ True  True  True] [ True  True  True]]`

What if you want to create a subset of the above array with all the values between 7 and 14 (both excluded)?

Run the code below.

`dnpintmat2=(dnpintmat>7)&(dnpintmat<14)`

`print(dnpintmat[dnpintmat2])`

Result:

`[ 8  9 10 11 12 13]`
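The boolean mask is itself an array, so it can be stored, combined and counted. A minimal sketch on a hypothetical 3x3 `matrix`:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
mask = (matrix > 2) & (matrix < 8)
selected = matrix[mask]      # flat array of the values where the mask is True
print(selected)              # [3 4 5 6 7]

# True counts as 1, so summing the mask counts the matches.
print(mask.sum())            # 5
```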

What if there is a need for broadcasting? As an example, what if one wants to replace every value in the row at index 3 (the fourth row) of the dnpintmat array with 100?

It can be done by directly assigning the value, as below.

`dnpintmat[3:4]=100`

`print(dnpintmat)`

Result:

`[[  1   2   3] [  4   5   6] [  7   8   9] [100 100 100] [ 13  14  15]]`

Likewise, you can add values as below.

This code adds 50 to each value of the matrix dnpintmat.

`print(dnpintmat+50)`

Result:

`[[ 51  52  53] [ 54  55  56] [ 57  58  59] [150 150 150] [ 63  64  65]]`
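Broadcasting also works between arrays of different shapes, not only with a single number. The sketch below, using a hypothetical 2x3 `matrix`, adds a scalar, adds a one-dimensional array to every row, and fills one row by assignment.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

shifted = matrix + 50                         # scalar is applied to every element
per_column = matrix + np.array([10, 20, 30])  # 1-D array is applied to every row

matrix[1:2] = 100                             # scalar fills the whole row at index 1

print(shifted.tolist())      # [[51, 52, 53], [54, 55, 56]]
print(per_column.tolist())   # [[11, 22, 33], [14, 25, 36]]
print(matrix.tolist())       # [[1, 2, 3], [100, 100, 100]]
```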

## Mathematical & Statistical Functions

You can directly call mathematical and statistical functions like sum, sqrt, log, exp and std on numpy arrays.

Below are some examples.

`print(np.sum(dnpintmat))`

Result:

`387`

Likewise, std calculates the standard deviation of the array values.

`print(np.std(dnpintmat))`

Result:

`37.31880669760668`

sqrt calculates the square root of each value in the matrix.

`print(np.sqrt(dnpintmat))`

Result:

`[[ 1.          1.41421356  1.73205081] [ 2.          2.23606798  2.44948974] [ 2.64575131  2.82842712  3.        ] [10.         10.         10.        ] [ 3.60555128  3.74165739  3.87298335]]`
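Functions such as sum also accept an `axis` argument, which gives per-column or per-row results instead of one number for the whole matrix. A short sketch on a hypothetical 2x3 `matrix`:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(matrix))           # 21 (all values)
print(np.sum(matrix, axis=0))   # [5 7 9] (one total per column)
print(np.sum(matrix, axis=1))   # [ 6 15] (one total per row)
print(np.mean(matrix))          # 3.5
print(np.sqrt(np.array([4, 9, 16])))   # [2. 3. 4.]
```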

Congratulations!

Now you know enough numpy to start exploring it on data in your journey through machine learning and data science.