# Bodacious World of NumPy

Hello there!

If you aspire to become a data scientist or machine learning expert, you cannot ignore NumPy. It is a vital Python library that you will need to know by heart.

I won’t go through the academic details of NumPy. You can find those on the NumPy site: https://numpy.org/

As a data scientist, I am more interested in making you aware of the practical uses of NumPy.

I hope you are using Jupyter Notebook on Anaconda, because at the end of this article I link to my Jupyter notebook containing all the commands I have written for it.

Let’s begin!

Before we take a deeper dive, let’s agree on the terms I am going to use in this article.

A single value is referred to as a **Scalar**.

More than one value arranged in one dimension, either horizontally like [1 2 3 4] or vertically as a column (1, 2 and 3 stacked on top of each other), is referred to as a **Vector**.

A multidimensional array like [[1 2 3], [4 5 6], [7 8 9]] is referred to as a **Matrix**.

Let’s start with one-dimensional arrays.

First, import the NumPy library:

`import numpy as np`
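To connect the three terms to NumPy itself, here is a minimal sketch. The variable names are mine, chosen only for illustration; `ndim` and `shape` are standard array attributes that report the number of dimensions and the length along each one.

```python
import numpy as np

scalar = np.array(5)                  # a single value: 0 dimensions
vector = np.array([1, 2, 3, 4])       # a row of values: 1 dimension
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])        # rows and columns: 2 dimensions

print(scalar.ndim, vector.ndim, matrix.ndim)     # 0 1 2
print(scalar.shape, vector.shape, matrix.shape)  # () (4,) (3, 3)
```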

## Scalar

If you just want to generate a random integer using NumPy, you can do so by calling **random.randint**().

The code below prints a random integer less than 17 (that is, from 0 to 16).

`dnpintsc=np.random.randint(17)`

`print(dnpintsc)`

Likewise, **random.randn**() returns a random float drawn from the standard normal distribution.

`dnpintsc=np.random.randn()`

`print(dnpintsc)`

The result changes with every run.
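If you need the same "random" values on every run, for example to reproduce an experiment, you can fix the seed first. This is a small sketch under that assumption; the seed value 42 and the variable names are arbitrary choices of mine.

```python
import numpy as np

np.random.seed(42)                  # fix the seed so the run is repeatable
first_int = np.random.randint(17)   # integer from 0 to 16
first_float = np.random.randn()     # float from the standard normal distribution

np.random.seed(42)                  # same seed again, same sequence again
second_int = np.random.randint(17)
second_float = np.random.randn()

print(first_int == second_int, first_float == second_float)  # True True
```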

## One Dimensional Array (Vector)

Let’s move on to the best-known and most-used feature of NumPy: the one-dimensional array.

You can create a one-dimensional array by calling the array function and passing a list to it, as below.

`dnpvector1=np.array([1,2,3,4,5,6,7])`

`print(dnpvector1)`

Result:

`[1 2 3 4 5 6 7]`

If you already have a list defined, as below, you can pass that **list** to the array function to create a NumPy **array**.

`list1=[1,2,4,7,8,9]`

`dnplistarr=np.array(list1)`

`dnplistarr`

Result:

`array([1, 2, 4, 7, 8, 9])`

Likewise, if you want to convert a NumPy array to a Python list, you can do so by calling its **tolist** method.

`dnplistarr.tolist()`

Result:

`[1, 2, 4, 7, 8, 9]`
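Why bother converting at all? A quick sketch of one practical difference: the same operator behaves differently on a list and on an array. The names `prices` and `prices_arr` are hypothetical.

```python
import numpy as np

prices = [1, 2, 4, 7, 8, 9]        # plain Python list
prices_arr = np.array(prices)      # NumPy array built from the list

print(prices * 2)                  # list repetition: [1, 2, 4, 7, 8, 9, 1, 2, 4, 7, 8, 9]
print((prices_arr * 2).tolist())   # element-wise multiplication: [2, 4, 8, 14, 16, 18]
print(prices_arr.tolist() == prices)  # True: the round trip gives back an equal list
```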

The **append** function returns a new array with the given values added at the end.

`dnpvector1=np.append(dnpvector1,10)`

`dnpvector1`

Result:

`array([ 1, 2, 3, 4, 5, 6, 7, 10])`

If you want to add a value at a specific position (called an index) in a NumPy array, use the **insert** function.

The code below adds 100 at index 1 in the array dnpvector1.

`dnpvector1=np.insert(dnpvector1,1,100)`

`dnpvector1`

Result:

`array([ 1, 100, 2, 3, 4, 5, 6, 7, 10])`

Note: NumPy array indexes start at 0. In the case above, the value at index 0 was 1 and the value at index 1 was 2. 100 is added at index 1, and the value 2 moves to index 2.
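One detail worth seeing for yourself: append and insert do not change the array you pass in. They return a new array, which is why the result is assigned back to the variable. A minimal sketch with made-up names:

```python
import numpy as np

original = np.array([1, 2, 3])
appended = np.append(original, 10)       # new array with 10 at the end
inserted = np.insert(original, 1, 100)   # new array with 100 at index 1

print(original.tolist())   # [1, 2, 3]  (unchanged)
print(appended.tolist())   # [1, 2, 3, 10]
print(inserted.tolist())   # [1, 100, 2, 3]
```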

Array values can be sorted using the **sort** function. Note that np.sort returns a sorted copy and leaves dnpvector1 itself unchanged.

`np.sort(dnpvector1)`

Result:

`array([ 1, 2, 3, 4, 5, 6, 7, 10, 100])`
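Here is a short sketch of the sorting options, assuming a small sample array of my own: a sorted copy, a descending copy, and the in-place `sort` method of the array.

```python
import numpy as np

values = np.array([1, 100, 2, 3, 10])
sorted_copy = np.sort(values)        # returns a sorted copy
descending = np.sort(values)[::-1]   # reverse the sorted copy for descending order

print(values.tolist())       # [1, 100, 2, 3, 10]  (original order kept)
print(sorted_copy.tolist())  # [1, 2, 3, 10, 100]
print(descending.tolist())   # [100, 10, 3, 2, 1]

values.sort()                # the array method sorts in place instead
print(values.tolist())       # [1, 2, 3, 10, 100]
```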

Values can be dropped from an array using the **delete** function.

The code below deletes the value at index 1 (the 100 inserted earlier) from array dnpvector1.

`dnpvector1=np.delete(dnpvector1,1)`

`dnpvector1`

Result:

`array([ 1, 2, 3, 4, 5, 6, 7, 10])`
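delete also accepts a list of indexes, and if you know the value rather than its position, a comparison can filter it out. A minimal sketch with a sample array of my own:

```python
import numpy as np

values = np.array([1, 100, 2, 3, 4])
without_two = np.delete(values, [0, 1])   # drop the values at indexes 0 and 1
without_100 = values[values != 100]       # drop by value using a comparison

print(without_two.tolist())   # [2, 3, 4]
print(without_100.tolist())   # [1, 2, 3, 4]
```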

A new array can be created by concatenating values from more than one array using the **concatenate** function.

The code below concatenates arrays dnpvector1 and dnpvector2 into a new array dnpvector3.

`dnpvector2=np.array([101,102,103])`

`dnpvector3=np.concatenate((dnpvector1,dnpvector2), axis=0)`

`dnpvector3`

Result:

`array([ 1, 2, 3, 4, 5, 6, 7, 10, 101, 102, 103])`
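concatenate is not limited to two arrays. As a small sketch (array names are hypothetical), any number of arrays can go into the tuple, and axis=0 is the default, so it can be left out for one-dimensional arrays.

```python
import numpy as np

first = np.array([1, 2, 3])
second = np.array([101, 102])
third = np.array([201])

combined = np.concatenate((first, second, third))  # axis=0 is the default
print(combined.tolist())   # [1, 2, 3, 101, 102, 201]
print(first.size + second.size + third.size == combined.size)  # True
```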

Are you wondering how to find the index of a value? This feature comes in very handy when you have an array dataset with millions of records.

You can use the **where** function for this.

The code below returns the index of value 7 in array dnpvector3.

`np.where(dnpvector3==7)`

Result:

`(array([6], dtype=int32),)`
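The result of where is a tuple holding an index array, which explains the extra brackets and trailing comma in the output. A minimal sketch, with sample data of my own, of how to get at the plain indexes and what happens when nothing matches:

```python
import numpy as np

readings = np.array([7, 3, 7, 10])
matches = np.where(readings == 7)[0]    # [0] picks the index array out of the tuple
print(matches.tolist())                 # [0, 2]: every matching position is returned

missing = np.where(readings == 99)[0]   # no match gives an empty array, not an error
print(missing.size)                     # 0
```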

Do you want to create a one-dimensional array of the numbers from 1 to 10 with an interval of 2? Use the **arange** function. This function comes in very handy when plotting visualizations during data exploration and result analysis.

The code below creates array dnp1 with values from 1 to 10 (10 not included) with an interval of 2.

`dnp1=np.arange(1,10,2)`

`dnp1`

Result:

`array([1, 3, 5, 7, 9])`
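A few more arange calls, as a sketch with variable names of my own: the start and step are optional, and a negative step counts downwards.

```python
import numpy as np

from_zero = np.arange(5)            # start defaults to 0, step to 1
odd_numbers = np.arange(1, 10, 2)   # the example above
countdown = np.arange(10, 0, -3)    # negative step; the stop value 0 is still excluded

print(from_zero.tolist())    # [0, 1, 2, 3, 4]
print(odd_numbers.tolist())  # [1, 3, 5, 7, 9]
print(countdown.tolist())    # [10, 7, 4, 1]
```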

How about creating an array with all zero values? Use the **zeros** function for that.

The code below creates a one-dimensional array containing ten zeros.

`dnpzero=np.zeros(10)`

`print(dnpzero)`

Result:

`[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]`
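The dots in the output show that zeros produces floats by default. A minimal sketch (names are mine) of asking for integers instead, and of passing a shape tuple to get more than one dimension:

```python
import numpy as np

zeros_float = np.zeros(4)             # floats by default
zeros_int = np.zeros(4, dtype=int)    # ask for integers explicitly
zeros_2d = np.zeros((2, 3))           # a shape tuple gives 2 rows x 3 columns

print(zeros_float.dtype)     # float64
print(zeros_int.tolist())    # [0, 0, 0, 0]
print(zeros_2d.shape)        # (2, 3)
```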

Likewise, you can create an array filled with the number 1 using the **ones** function. I don’t think it needs a demonstration.

A very useful function in NumPy is **linspace**. It returns a given number of evenly spaced values over the range provided. Let’s go through the example below.

If you want 20 evenly spaced numbers between 1 and 5 (5 included), run the code below.

`dnplinespace=np.linspace(1,5,20)`

`print(dnplinespace)`

Result:

`[1. 1.21052632 1.42105263 1.63157895 1.84210526 2.05263158`

`2.26315789 2.47368421 2.68421053 2.89473684 3.10526316 3.31578947`

`3.52631579 3.73684211 3.94736842 4.15789474 4.36842105 4.57894737`

`4.78947368 5. ]`

This function comes in handy when drawing visualizations.
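The third argument of linspace is the number of values, not the gap between them. A small sketch with round numbers makes that easier to see; `retstep` and `endpoint` are optional parameters of linspace, and the variable names are mine.

```python
import numpy as np

points, step = np.linspace(1, 5, 5, retstep=True)  # 5 values, both ends included
print(points.tolist())    # [1.0, 2.0, 3.0, 4.0, 5.0]
print(step)               # 1.0: the spacing NumPy worked out

open_end = np.linspace(1, 5, 4, endpoint=False)    # leave the stop value out
print(open_end.tolist())  # [1.0, 2.0, 3.0, 4.0]
```

Compare this with arange, where you give the step and NumPy works out how many values fit.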

Let’s create a one-dimensional array of sixteen random numbers using the **randn** function.

`dnprandomarray=np.random.randn(16)`

`print(dnprandomarray)`

Result:

`[-1.76865566 0.44877883 1.65279147 0.9642689 1.14989971 -0.44664962`

`-1.29255905 -2.04026537 0.4573302 1.24653905 1.23037704 0.02799583`

`1.49594113 -0.34639749 -1.41667387 1.31684675]`

The maximum and minimum values of an array can be found with the **max** and **min** methods.

`print(dnprandomarray.max())`

`print(dnprandomarray.min())`

Result:

`1.6527914692877606`

`-2.0402653731995333`

If you want to know the indexes of the maximum and minimum values, you can use the **argmax** and **argmin** methods.

`print(dnprandomarray.argmax())`

`print(dnprandomarray.argmin())`

Result:

`2`

`7`
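The two pairs of methods fit together: the value sitting at the argmax index is the max, and the same holds for argmin and min. A minimal sketch with a few hypothetical sample values:

```python
import numpy as np

scores = np.array([0.4, 1.6, -2.0, 0.9])   # hypothetical sample values
hi_idx = scores.argmax()    # index of the largest value
lo_idx = scores.argmin()    # index of the smallest value

print(hi_idx, scores[hi_idx] == scores.max())   # 1 True
print(lo_idx, scores[lo_idx] == scores.min())   # 2 True
```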

How do you find the size of an array? Use the **size** attribute.

The code below returns the number of elements in the dnprandomarray array.

`dnprandomarray.size`

Result:

`16`

Wondering how to find the value at a specific index? Just use array[index], as below.

`print(dnprandomarray[5])`

Result (the value at index 5, shown to the eight decimals of the array printout above):

`-0.44664962`

If you want to fetch the values between two indexes, use array[firstindexposition:lastindexposition+1].

The code below returns the values at indexes 1, 2, 3 and 4 from array dnprandomarray.

`print(dnprandomarray[1:5])`

Result:

`[0.44877883 1.65279147 0.9642689 1.14989971]`
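Slices have a few more forms that are worth knowing. This sketch uses an easy-to-read sample array of my own instead of random numbers, so the positions are obvious.

```python
import numpy as np

sample = np.arange(10, 20)      # the values 10 to 19, so index i holds 10 + i
print(sample[5])                # 15: single value at index 5
print(sample[1:5].tolist())     # [11, 12, 13, 14]: index 5 is excluded
print(sample[:3].tolist())      # [10, 11, 12]: the start defaults to 0
print(sample[-2:].tolist())     # [18, 19]: negative indexes count from the end
print(sample[::3].tolist())     # [10, 13, 16, 19]: every third value
```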

## Multidimensional Array (Matrix)

Very Good!

Now that you have had enough exposure to one-dimensional arrays, let’s build on that knowledge with multidimensional arrays.

While a one-dimensional array contains values in either one row (horizontal array) or one column (vertical array), a multidimensional array contains values in more than one row and more than one column. A range of a two-dimensional array is selected with array[rowindexfrom:rowindexto,columnindexfrom:columnindexto].

Let’s create a 5x4 multidimensional array of random numbers (**rand** draws them uniformly from 0 up to, but not including, 1) and explore the world of multidimensional arrays.

`dnprandom=np.random.rand(5,4)`

`print(dnprandom)`

Result:

`[[0.21067598 0.07644456 0.51538545 0.81564459]`

`[0.66354799 0.75554928 0.74104759 0.29199204]`

`[0.54019744 0.96360781 0.62939973 0.07646806]`

`[0.29973016 0.76815988 0.3176048 0.1235475 ]`

`[0.39714328 0.95994687 0.43036685 0.08273214]]`

Let’s find the shape of this matrix using the **shape** attribute.

`print(dnprandom.shape)`

Result:

`(5, 4)`

Let’s find the data type of this matrix’s values using **dtype**.

`print(dnprandom.dtype)`

Result:

`float64`

Let’s print the value at row index 4 and column index 2, that is, the 5th row and 3rd column.

`print(dnprandom[4][2])`

`print(dnprandom[4,2])`

Result:

`0.4303668539112995`

`0.4303668539112995`

NumPy numbers rows and columns starting from 0, which is why calling 4,2 on this 5x4 matrix returns the value in the last row and second-to-last column.

Also note that you can get this value by calling either array[4][2] or array[4,2]. Both return the same value.

How about creating a subset of this matrix?

The code below creates a subset of the above matrix with the values from row indexes 1 and 2 and column indexes 2 and 3.

`print(dnprandom[1:3,2:4])`

Result:

`[[0.74104759 0.29199204]`

`[0.62939973 0.07646806]]`

Note that the 3 from row range 1:3 and the 4 from column range 2:4 are not included in the resulting subset.
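Random numbers make it hard to check by eye which cell you got back. Here is the same kind of indexing as a sketch on a 5x4 matrix filled with 0 to 19; `reshape` is used only as a convenient way to build it, and the name `grid` is mine.

```python
import numpy as np

grid = np.arange(20).reshape(5, 4)   # 5 rows x 4 columns holding 0..19
print(grid[4, 2])                    # 18: row index 4, column index 2
print(grid[4][2] == grid[4, 2])      # True: both forms reach the same cell
print(grid[1:3, 2:4].tolist())       # [[6, 7], [10, 11]]
print(grid[:, 0].tolist())           # [0, 4, 8, 12, 16]: the whole first column
print(grid[-1].tolist())             # [16, 17, 18, 19]: the whole last row
```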

## Operations

Now that we understand NumPy arrays, let’s perform some mathematical operations on them. These operations work on one-dimensional as well as multidimensional arrays.

Let’s create a multidimensional array.

`dnpintmat=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]])`

`print(dnpintmat)`

Result:

`[[ 1 2 3]`

`[ 4 5 6]`

`[ 7 8 9]`

`[10 11 12]`

`[13 14 15]]`

What would be the result of the code below?

`print(dnpintmat>7)`

It shows True for every value greater than 7 and False for the others.

Result:

`[[False False False]`

`[False False False]`

`[False True True]`

`[ True True True]`

`[ True True True]]`

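What if you want to create a subset of the above array with all the values between 7 and 14 (both excluded)?

Run the code below. The combined condition is used directly as a boolean mask; wrapping it in a list is unnecessary and fails in recent NumPy versions.

`dnpintmat2=(dnpintmat>7)&(dnpintmat<14)`

`print(dnpintmat[dnpintmat2])`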

Result:

`[ 8 9 10 11 12 13]`
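A few more things can be done with such a condition. In this sketch the matrix holds the same values 1 to 15 as dnpintmat, built with `reshape` for brevity; the names `matrix`, `mask` and `either` are mine.

```python
import numpy as np

matrix = np.arange(1, 16).reshape(5, 3)        # 5x3 matrix with the values 1..15
mask = (matrix > 7) & (matrix < 14)            # each comparison needs its own parentheses
print(matrix[mask].tolist())                   # [8, 9, 10, 11, 12, 13]
print(int(mask.sum()))                         # 6: each True counts as 1

either = matrix[(matrix < 3) | (matrix > 13)]  # | means "or"
print(either.tolist())                         # [1, 2, 14, 15]
```

The parentheses matter because & and | bind more tightly than the comparison operators in Python.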

What if there is a need for **broadcasting**? As an example, what if you want to replace all the values in the 4th row (row index 3) of the dnpintmat array with 100?

It can be done by directly assigning the value to the slice, as below. The single value 100 is broadcast across the whole row.

`dnpintmat[3:4]=100`

`print(dnpintmat)`

Result:

`[[ 1 2 3]`

`[ 4 5 6]`

`[ 7 8 9]`

`[100 100 100]`

`[ 13 14 15]]`

Likewise, you can add values as below.

This code adds 50 to each value of matrix dnpintmat.

`print(dnpintmat+50)`

Result:

`[[ 51 52 53]`

`[ 54 55 56]`

`[ 57 58 59]`

`[150 150 150]`

`[ 63 64 65]]`
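Broadcasting is not limited to a single number. As a sketch on a small 2x3 matrix of my own, a one-dimensional array with one value per column, or a column with one value per row, is stretched to fit in the same way.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])        # shape (3,): one value per column
column = np.array([[100], [200]])   # shape (2, 1): one value per row

print((matrix + 50).tolist())       # [[51, 52, 53], [54, 55, 56]]
print((matrix + row).tolist())      # [[11, 22, 33], [14, 25, 36]]
print((matrix + column).tolist())   # [[101, 102, 103], [204, 205, 206]]
```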

## Mathematical & Statistical Functions

You can directly call mathematical and statistical functions like **sum**, **sqrt**, **log**, **exp** and **std** on NumPy arrays.

Below are some examples.

The **sum** function returns the total of all the array values.

`print(np.sum(dnpintmat))`

Result:

`387`

Likewise, **std** calculates the standard deviation of the array values.

`print(np.std(dnpintmat))`

Result:

`37.31880669760668`

**sqrt** calculates the square root of each value in the matrix.

`print(np.sqrt(dnpintmat))`

Result:

`[[ 1. 1.41421356 1.73205081]`

`[ 2. 2.23606798 2.44948974]`

`[ 2.64575131 2.82842712 3. ]`

`[10. 10. 10. ]`

`[ 3.60555128 3.74165739 3.87298335]]`
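Functions like sum and std summarise the whole matrix by default, but they also take an `axis` argument to work per column or per row. A minimal sketch on a small 2x3 matrix of my own:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
total = np.sum(matrix)                # all values together
per_column = np.sum(matrix, axis=0)   # collapse the rows: one total per column
per_row = np.sum(matrix, axis=1)      # collapse the columns: one total per row
row_std = np.std(matrix, axis=1)      # one standard deviation per row

print(total)                 # 21
print(per_column.tolist())   # [5, 7, 9]
print(per_row.tolist())      # [6, 15]
print(np.isclose(row_std[0], row_std[1]))  # True: both rows have the same spread
```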

Congratulations!!

Now you know enough NumPy to start exploring it on data in your journey through machine learning and data science.

Link to my Jupyter notebook with the above commands: https://github.com/srssingh/Machine-Learning/blob/master/Numpy%20for%20ML.ipynb
