Introduction to numpy for beginners

Published in

Analytics Vidhya

10 min read · Apr 4, 2020

Numpy Introduction

NumPy, short for Numerical Python, is the fundamental Python library for scientific computing. If you aim to master machine learning or data analysis with Python, NumPy is like the 102 course (Python basics being the 101).

Numpy library contains the following main features, among many other things.

A powerful N-dimensional array object — ndarray

Useful linear algebra, Fourier transform, and random number capabilities

And a handful of functions for carrying out those operations

In essence, NumPy lays the foundation for machine learning and data analysis in Python, so a good understanding of it is nearly mandatory for anyone working in those disciplines.

In this article I aim to cover a minimally sufficient set of the most basic and frequently used NumPy principles and operations to get you started.

Getting Started

I will keep this simple. You can install the library into your preferred environment using pip or conda.

pip install numpy     # with the pip of your active Python environment, or
pip3 install numpy    # explicitly with the pip of Python 3, or
conda install numpy   # to install with conda

Then import the library into your Python program with an easy-to-use alias, as follows.

import numpy as np

The name np is arbitrary; you may use any name you prefer, because it is just a nickname we give the library so we can refer to it in the rest of the program. But np is recommended, as it is the standard and the easiest alias to use. Now let's see what we can do with NumPy.

NumPy's main object is the homogeneous multidimensional array. Its class is called ndarray, and it is also known by the alias “array”.

(Homogeneous — composed of same type objects )

It is a table of elements all of the same type, indexed automatically by a tuple of non-negative integers.

In Numpy dimensions are called axes.

In case you are wondering whether this is the same as a regular Python list:

No. ndarrays are not the same as Python's list data structure. They come with many more built-in operations, they are faster to work with than a normal list, and they usually occupy less memory. That is mainly due to their homogeneous element type and fixed size.

Depending on the number of dimensions an ndarray has, we refer to it as follows.

1d array — vector

2d array — matrix

Numpy arrays / vectors

This data structure has two components in it. The data part and the index part. But indexing is done automatically.

myvector = np.array([1,2,3])

As you can see, the argument we passed to the np.array( ) function is a list: [1,2,3]

So we can do the same thing as follows as well.

mylist = [1,2,3]

myvector = np.array(mylist)

Also, like we generate lists using the range( ) function, we have arange( ) function to generate a range of values into an ndarray.

To get an idea of how the arange( ) function really works, look at its signature: it generates values from start up to, but not including, stop, moving forward by step each time.

np.arange (start , stop , step )
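Here is a small sketch of arange( ) in action. The variable names are just for illustration.

```python
import numpy as np

# start defaults to 0 and step to 1, just like range()
first_five = np.arange(5)

# start=2, stop=10 (excluded), step=2
evens = np.arange(2, 10, 2)

print(first_five)  # [0 1 2 3 4]
print(evens)       # [2 4 6 8]
```

Note that the stop value itself never appears in the result.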

Another important facility in NumPy is generating data structures of a desired size filled with random values. Let's see how we can generate a vector of length 3 using random number generation.
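One way to do this, assuming np.random.rand is the function we want (it draws uniformly from the interval [0, 1)), could look like this:

```python
import numpy as np

# Three random floats drawn uniformly from [0, 1)
random_vector = np.random.rand(3)

print(random_vector)        # values differ on every run
print(random_vector.shape)  # (3,)
```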

Accessing elements

We can easily access an element of an ndarray by its index, either to use the stored value in an operation or to change it, as follows.
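A minimal sketch, reusing the myvector example from earlier (the new value 10 is arbitrary):

```python
import numpy as np

myvector = np.array([1, 2, 3])

# Read the element at index 0 (indexing starts at 0)
first = myvector[0]

# Overwrite the element at index 2
myvector[2] = 10

print(first)     # 1
print(myvector)  # [ 1  2 10]
```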

Creating a Matrix

2-D arrays in NumPy are usually called matrices. Let's create a matrix of 3 columns and 2 rows.
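A sketch of such a matrix follows. The values 1 to 6 are my assumption, chosen so that they agree with the index examples in this section.

```python
import numpy as np

# Each inner list becomes one row: 2 rows, 3 columns
mymatrix = np.array([[1, 2, 3],
                     [4, 5, 6]])

print(mymatrix)
print(mymatrix.shape)  # (2, 3)
```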

Together with its indices, this matrix can be pictured as a grid of rows and columns.

The index of an element in a matrix is expressed by both its vertical and its horizontal position.

Index = (row number , column number)

Index of 3 = (0,2)

Index of 5 = (1,1)

Using this indexing method we can access the elements of a matrix just as we did with the vector.
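The two index examples above can be checked with a short sketch, assuming a 2 x 3 matrix holding the values 1 to 6:

```python
import numpy as np

mymatrix = np.array([[1, 2, 3],
                     [4, 5, 6]])

print(mymatrix[0, 2])  # 3
print(mymatrix[1, 1])  # 5

# Assigning through an index changes the stored value
mymatrix[1, 1] = 50
print(mymatrix[1])     # [ 4 50  6]
```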

ndarray attributes

NumPy ndarrays have many attributes (keep in mind that these are properties, not functions). The following are the most often used ones.

ndarray.ndim

The number of axes (dimensions) of the array.

ndarray.shape

Dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension.

For a matrix with n rows and m columns, shape will be (n,m).

The length of the shape tuple is therefore the number of axes (dimensions), which is given by ndim.

ndarray.size

The total number of elements of the array. This is equal to the product of the elements of shape.

ndarray.data

The buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

ndarray.itemsize

The size in bytes of each element of the array.

For example, an array of elements of data type float64 has itemsize 8 (= 64 bits / 8), and one of data type int32 has itemsize 4 (= 32 bits / 8).

It is equivalent to ndarray.dtype.itemsize

ndarray.dtype / ndarray.dtype.name

Type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
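To see these attributes side by side, here is a small sketch on an example matrix of floats (the matrix itself is made up for illustration):

```python
import numpy as np

mymatrix = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])

print(mymatrix.ndim)        # 2
print(mymatrix.shape)       # (2, 3)
print(mymatrix.size)        # 6
print(mymatrix.dtype.name)  # float64
print(mymatrix.itemsize)    # 8
```

Notice there are no parentheses after the attribute names, because they are properties and not functions.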

reshape ( ) function

shape is the property that shows the dimensions of the array.

With the reshape function we can change the shape of an array without changing its data. These are the important things to keep in mind about reshape.

When you give a new shape, make sure that it holds the same total number of elements as the current shape.

For example, a 2x3 matrix has 6 elements, so we can reshape it to 3x2 but not to 4x2, which would need 8 elements; NumPy raises a ValueError in that case.

Also, the change is not applied to the original array unless we assign the result back to it.
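A short sketch of both points, using a made-up 2x3 matrix:

```python
import numpy as np

matrix = np.arange(1, 7).reshape(2, 3)  # 6 elements as 2 rows x 3 columns

reshaped = matrix.reshape(3, 2)   # fine: 3 * 2 == 6
print(matrix.shape)               # (2, 3), the original is unchanged
print(reshaped.shape)             # (3, 2)

try:
    matrix.reshape(4, 2)          # needs 8 elements, but we only have 6
except ValueError as err:
    print("reshape failed:", err)

# Reassign the result to keep the new shape
matrix = matrix.reshape(3, 2)
print(matrix.shape)               # (3, 2)
```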

We can also use the flatten() function to convert a matrix into a one-dimensional array.
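For instance, with an example 2x3 matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

flat = matrix.flatten()  # returns a new 1-D copy, row by row

print(flat)          # [1 2 3 4 5 6]
print(matrix.shape)  # (2, 3), the original keeps its shape
```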

Manually adding new elements to arrays

append( ) function

The append function can be used to concatenate new elements to a vector. But if you use it with a matrix without specifying an axis, the matrix will be flattened. Always remember that to apply the changes to the original structure, you should use the syntax

old_structure = operation (old_structure , new_change)
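In practice this pattern could look like the sketch below (example values are arbitrary):

```python
import numpy as np

myvector = np.array([1, 2, 3])

# np.append returns a new array; reassign to keep the result
myvector = np.append(myvector, [4, 5])
print(myvector)  # [1 2 3 4 5]

matrix = np.array([[1, 2], [3, 4]])

# Without an axis argument, the matrix is flattened first
flattened = np.append(matrix, [5, 6])
print(flattened)  # [1 2 3 4 5 6]
```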

Adding rows and Columns to a matrix

Unlike the plain append( ) call, which flattens a matrix, there are ways to add rows and columns to a matrix while keeping its shape.
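One possible way to do this is np.insert with an axis argument. That choice is my assumption for this sketch; other functions can achieve the same result.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Insert a new row before row index 2 (at the bottom); axis=0 means rows
with_row = np.insert(matrix, 2, [7, 8, 9], axis=0)

# Insert a new column before column index 3 (on the right); axis=1 means columns
with_col = np.insert(matrix, 3, [10, 20], axis=1)

print(with_row)
print(with_col)
```

As with append, the original matrix is left untouched, so assign the result to a name if you want to keep it.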

Two other functions do the same thing: vstack( ) and hstack( ). They are useful to know as well, especially for columns, when your data comes in a different format.
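A quick sketch of both, with made-up data. Notice that the new column has to be shaped as a column (2 rows, 1 column) before stacking.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# vstack stacks vertically: the new row needs 3 columns
new_row = np.array([7, 8, 9])
taller = np.vstack((matrix, new_row))

# hstack stacks horizontally: the new column must be 2-D with 2 rows
new_col = np.array([[10], [20]])
wider = np.hstack((matrix, new_col))

print(taller.shape)  # (3, 3)
print(wider.shape)   # (2, 4)
```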

Selecting elements / Slicing

Besides selecting one element at a time, we can select elements of a NumPy ndarray in several other ways, using slice expressions of the form start:stop:step.
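A few common selections, sketched on an example 3x3 matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(matrix[0])        # first row: [1 2 3]
print(matrix[:, 1])     # second column: [2 5 8]
print(matrix[:2, 1:])   # first two rows, last two columns
print(matrix[-1, -1])   # last element: 9
print(matrix[::2, 0])   # every second row of the first column: [1 7]
```

A bare colon means "everything along this axis", and negative indices count from the end.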

Finding the max and min values , Calculating Average, Variance and Standard deviation

#Return the max element
np.max(matrix)          # or matrix.max()
#Return the min element
np.min(matrix)          # or matrix.min()
#To find the max element in each column
np.max(matrix, axis=0)  # or matrix.max(axis=0)
#To find the max element in each row
np.max(matrix, axis=1)  # or matrix.max(axis=1)

#Mean
np.mean(matrix)  # or matrix.mean()
#Standard Deviation
np.std(matrix)   # or matrix.std()
#Variance
np.var(matrix)   # or matrix.var()
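To make these concrete, here is a sketch on a small example matrix. By default np.var and np.std divide by N, the number of elements, so they give the population variance and standard deviation.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.max())        # 6
print(matrix.max(axis=0))  # per column: [4 5 6]
print(matrix.max(axis=1))  # per row: [3 6]
print(matrix.mean())       # 3.5
print(matrix.var())        # population variance (divides by N)
print(matrix.std())        # square root of the variance
```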

Matrix operations

Transposing a vector or a Matrix

Transposing a matrix interchanges its rows and columns. It is one of the basic matrix operations.

#Transpose the matrix
print( matrix.T )

Finding the Determinant

#Calculate the Determinant (the matrix must be square)
np.linalg.det(matrix)

Getting the Diagonal of a Matrix

We can use this function when we need to extract only the diagonal elements of a matrix

#Return the Principal diagonal
matrix.diagonal()
#Return the diagonal one above the Principal diagonal
matrix.diagonal(offset=1)
#Return the diagonal one below the Principal diagonal
matrix.diagonal(offset=-1)

Calculating the trace of a Matrix

Trace of a Matrix is the sum of elements on the Principal Diagonal of the Matrix.

#Return the Trace
matrix.trace()

Inverting a Matrix

This is used when you want to calculate the inverse of a Square Matrix

#Returns the inverse
np.linalg.inv(matrix)
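To tie the determinant, trace and inverse together, here is a sketch on a made-up 2x2 matrix. The results are floating-point numbers, so they can carry tiny rounding errors, which is why the check uses np.allclose.

```python
import numpy as np

matrix = np.array([[4.0, 7.0],
                   [2.0, 6.0]])

det = np.linalg.det(matrix)      # 4*6 - 7*2 = 10, up to floating-point rounding
inverse = np.linalg.inv(matrix)

print(det)
print(matrix.trace())            # 4 + 6 = 10.0
print(inverse)

# A matrix times its inverse gives the identity matrix (within rounding error)
print(np.allclose(matrix @ inverse, np.eye(2)))  # True
```

If the determinant is zero, the matrix has no inverse and np.linalg.inv raises a LinAlgError.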

Adding, Subtracting and Multiplying Matrices

matrix_1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
matrix_2 = np.array([[7,8,9],[4,5,6],[1,2,3]])
#Addition
np.add(matrix_1,matrix_2)
#Subtraction
np.subtract(matrix_1,matrix_2)
#Multiplication, element-wise
matrix_1*matrix_2
#Multiplication, dot product
#Here you have to make sure that the no. of columns in the 1st matrix = no. of rows in the 2nd matrix.
np.dot(matrix_1,matrix_2)

Creating a Sparse Matrix using scipy library

In real-world cases where we apply matrix operations, data sets very often contain many null or zero values. In machine learning especially, it is common to have a huge amount of data in which most of the elements are zeros.

In such cases we can convert our normal matrix into a sparse matrix, which stores only the non-zero elements and assumes all other values are zero. This can lead to significant savings in memory and computation. This is not a NumPy function; it is included in SciPy, but it is so useful that I will mention it here.

#Import the feature
from scipy import sparse
#Create a Matrix
matrix = np.array([[0,0],[0,1],[3,0]])
print(matrix)
#Create Compressed Sparse Row(CSR) matrix
matrix_sparse = sparse.csr_matrix(matrix)
print(matrix_sparse)

Generating Random values

NumPy offers a wide variety of ways to generate random numbers. We already talked about the most basic way of doing it, but that’s not all; let's see what more we can do with random generation. I will discuss 3 important functions of the random module, and then how to seed it.

np.random.rand( n ) simply gives you n random values drawn uniformly from the interval [0, 1).

#try
print( np.random.rand (5))

np.random.randint(lower limit, upper limit, n) gives you n random integers from the lower limit (included) up to the upper limit (excluded).

#try
print( np.random.randint(0,10,4) )

np.random.normal(mean,std, n )

Gives you n random values drawn from a normal distribution with the mean and standard deviation you input.

#try
print ( np.random.normal(2,3,5) )

np.random.seed()

Random numbers are generated by a (pseudo-)random generator. A simple generator of this kind works by starting with a number called the seed, multiplying it by a large number, adding an offset, then taking the modulo of that sum. NumPy's own generator uses a more elaborate algorithm, but the idea of starting from a seed is the same.

The resulting number is then used as the seed to generate the next “random” number.

If you set the same seed manually before each random generation, the generator does the same thing every time and gives you the same set of numbers. This is useful when you need predictable, repeatable results.

If you want seemingly random numbers each time, do not set the seed.
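A small sketch of seeding in action. The seed value 42 is an arbitrary choice; any integer works the same way.

```python
import numpy as np

np.random.seed(42)             # 42 is an arbitrary choice of seed
first_run = np.random.rand(3)

np.random.seed(42)             # reset to the same seed
second_run = np.random.rand(3)

print(first_run)
print(second_run)
print(np.array_equal(first_run, second_run))  # True
```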

Broadcasting

Broadcasting is the ability of NumPy to extend an operation to all the elements of an array, for example adding a single number to every element of a matrix, without having to write an explicit loop.
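Two simple cases, sketched with example values: a single number applied to a whole matrix, and a 1-D array applied to every row.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# The scalar 10 is added to every element
plus_ten = matrix + 10
print(plus_ten)

# A 1-D array of length 3 is added to each row
row = np.array([100, 200, 300])
plus_row = matrix + row
print(plus_row)
```

The second case works because the length of row matches the number of columns in the matrix.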

np.zeros ( (m,n) )

gives you an m x n matrix of zeros.

np.eye( n )

gives you an n x n identity matrix.
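A quick sketch of both (the sizes are arbitrary):

```python
import numpy as np

zeros = np.zeros((2, 3))  # note the shape is passed as one tuple
identity = np.eye(3)

print(zeros)
print(identity)
```

Both functions return arrays of floats by default.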

Well, I hope this will be a useful resource for anybody who wishes to refresh their NumPy knowledge or wants a head start in Python machine learning or analytics. See you soon with an article on pandas as well. Adios, then.