The Three Musketeers of ML: Introduction to NumPy, Pandas and Sklearn

Anushkad
Published in GoDataScience
6 min read · Aug 6, 2020

Who is the target audience?

This tutorial is for anyone who wants to learn the basics of NumPy, Pandas and Sklearn. It is especially useful for algorithm developers and for anyone curious about Machine Learning, whether you want in-depth knowledge of ML or just need to brush up on a few concepts. After completing it, you will have a basic working knowledge from which you can move on to more advanced material.

Why would this tutorial prove useful?

Since libraries are an integral part of data preprocessing, understanding them is of utmost importance. Knowing what these libraries offer can make your coding tasks a lot simpler and save you precious time and energy.

To explore any path, we need to brush up on the skills that lay the foundation and ease our journey to the destination.

In-depth knowledge of Python libraries helps us lay this strong foundation for mastering Machine Learning, which proves essential in the long run.

NumPy, Pandas and Scikit-learn are some of the important libraries that can make machine learning a whole lot easier and less time-consuming. They are the pillars on which a strong model can be built.

What are Python libraries?

A Python library is a reusable collection of code that you can include in your programs and projects. A library typically contains many useful modules that you can import for everyday programming.

With technology reaching astonishing heights, Data Science, Artificial Intelligence and Machine Learning are buzzwords we hear all the time. They have transformed the way we live. So what’s all the fuss?

What is Machine Learning?

Here’s all you need to know about beginning your journey to excel in machine learning.

Machine Learning (ML) is an application of Artificial Intelligence (AI) that gives a system the ability to learn and improve from experience without being explicitly programmed. A widely cited formal definition of ML is:

A computer program is said to learn from experience ‘E’ with respect to some task ‘T’ and some performance measure ‘P’ if its performance on T, as measured by P, improves with experience E.

Okay, so now that we are clear with what Machine learning is, let us understand why we should invest time in mastering it.

Goals of studying Machine Learning:

  • To make the computer smarter/more intelligent. The more direct objective in this aspect is to develop a system for specific practical learning tasks in the application domain.
  • To develop computational models of the human learning process and perform computer simulations.
  • To explore new learning methods and develop general learning algorithms independent of applications.

Now, let us dive right into ML and start our journey to master it. Let us first get acquainted with some Python libraries required for Machine Learning.

(Note: Python libraries are far too vast to cover in full, so this article covers only the basics you need to get going.)

If you are thinking about a career in Machine Learning or Data science, the very first thing you will need to do is study some libraries.

Why are Libraries important in Machine Learning?

Machine Learning is largely based upon mathematics. Designing an ML model involves complex mathematical calculations. Python libraries let us perform these calculations with far fewer lines of code than we would otherwise have to write ourselves.
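As a small illustration of this point, here is a minimal sketch that computes a dot product (a calculation that appears in many ML models) first with a plain Python loop and then with a single NumPy call. The vectors `a` and `b` and their values are made up for this example.

```python
import numpy as np

# Two small vectors (illustrative values, not from any dataset).
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Pure Python: an explicit loop over paired elements.
dot_loop = 0.0
for x, y in zip(a, b):
    dot_loop += x * y

# NumPy: the same calculation in one call.
dot_numpy = np.dot(np.array(a), np.array(b))

print(dot_loop, dot_numpy)  # 32.0 32.0
```

Both approaches give the same answer; the NumPy version simply hides the loop behind a library routine.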

Basic Study of NumPy Library:

NumPy forms the foundation for the machine learning stack. NumPy (Numerical Python) is a Python package consisting of multi-dimensional array objects and a collection of routines for processing them.

In this article, we will cover NumPy operations frequently used in ML.

First, we need to import the NumPy library using the following code:
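To give a feel for what "array objects and routines" means in practice, here is a minimal sketch. The array `arr` and its values are assumptions chosen only for illustration.

```python
import numpy as np

# A 2 x 3 array built from nested lists (illustrative values).
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim)          # number of dimensions: 2
print(arr.shape)         # size along each dimension: (2, 3)
print(arr.sum())         # sum of all elements: 21
print(arr.mean(axis=0))  # mean of each column: [2.5 3.5 4.5]
```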

import numpy as np

Once we import the NumPy library, we can use the various routines that come with it to perform array operations with ease. These include:

  1. Creating a Vector:

A 1-D array is known as a vector. A vector can be created using NumPy as follows:

# Load library
import numpy as np

# Create a vector as a row
vector_row = np.array([11, 21, 31])

# Create a vector as a column
vector_column = np.array([[15], [25], [35]])
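One way to see the difference between the row and column forms is to inspect their shapes. The sketch below reuses the two vectors from above; the `as_column` name is introduced here only for illustration. Note that NumPy stores the column form as a 2-D array with a single column.

```python
import numpy as np

vector_row = np.array([11, 21, 31])
vector_column = np.array([[15], [25], [35]])

print(vector_row.shape)     # (3,)   -> one dimension with 3 elements
print(vector_column.shape)  # (3, 1) -> 3 rows, 1 column

# reshape(-1, 1) turns the 1-D row vector into a 3 x 1 column.
as_column = vector_row.reshape(-1, 1)
print(as_column.shape)      # (3, 1)
```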

2. Creating a Matrix: A 2-D array is known as a matrix. It can be created using NumPy as follows:

# Load library
import numpy as np

# Create a matrix
matrix = np.array([[1, 2, 3], [41, 52, 63]])
print(matrix)
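Once a matrix exists, a few attributes describe its layout. The following is a minimal sketch using the same matrix as above; `.shape`, `.size` and `.T` are standard NumPy array attributes.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [41, 52, 63]])

print(matrix.shape)    # (2, 3): 2 rows, 3 columns
print(matrix.size)     # 6 elements in total
print(matrix.T.shape)  # (3, 2): the transpose swaps rows and columns
```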

3. Selecting Elements: Selection of one or more elements from the matrix can be done using the NumPy library as follows:

# Load library
import numpy as np

# Create a vector as a row
vector_row = np.array([1, 2, 3, 4, 5, 6])

# Create a matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)

# Select the 3rd element of the vector
print(vector_row[2])

# Select the element in the 2nd row, 2nd column
print(matrix[1, 1])

# Select all elements of the vector
print(vector_row[:])

# Select everything up to and including the 3rd element
print(vector_row[:3])

# Select everything after the 3rd element
print(vector_row[3:])

# Select the last element
print(vector_row[-1])

# Select the first 2 rows and all the columns of the matrix
print(matrix[:2, :])

# Select all rows and the 2nd column of the matrix (kept as a 2-D column)
print(matrix[:, 1:2])
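A detail worth noticing in the last selection: indexing a column with a slice (`1:2`) keeps the result two-dimensional, while indexing with a plain integer (`1`) returns a 1-D vector. Here is a minimal sketch of the difference; the names `col_1d` and `col_2d` are introduced only for this example.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_1d = matrix[:, 1]    # integer index: result is a 1-D vector
col_2d = matrix[:, 1:2]  # slice: result stays a 2-D column

print(col_1d)            # [2 5 8]
print(col_1d.shape)      # (3,)
print(col_2d.shape)      # (3, 1)
```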

Basic Study of Pandas Library:

Pandas, whose name is derived from ‘Panel Data’, has so many uses that it might be quicker to list the things it cannot do than the things it can! Just as humans have basic needs, Pandas is a basic need for your data. It helps in analyzing, cleaning, and transforming your data.

We will now look at some essential bits of information regarding Pandas and its use.

Pandas is usually imported under a shorter name (pd), a convention that is convenient and widely used.

import pandas as pd

The two primary components of Pandas are the Series and the DataFrame.

A Series is essentially a column, and a DataFrame is a two-dimensional table made up of a collection of Series.
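To make the relationship concrete, here is a minimal sketch that builds two Series and combines them into one DataFrame. The names `pears`, `oranges` and `fruit` and the values are assumptions for illustration; `pd.concat` is just one of several ways to combine Series.

```python
import pandas as pd

# Two Series, each acting as one named column (illustrative values).
pears = pd.Series([3, 7, 0, 11], name="pears")
oranges = pd.Series([0, 9, 5, 2], name="oranges")

# Placing the Series side by side (axis=1) yields a DataFrame.
fruit = pd.concat([pears, oranges], axis=1)

print(fruit.shape)           # (4, 2): 4 rows, 2 columns
print(type(fruit["pears"]))  # each column of the DataFrame is a Series
```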

There are many ways to create a DataFrame. The simplest is to build a dictionary and pass it to the DataFrame constructor.

1. Creating a DataFrame and locating values:

a. Create a dictionary

data = {'Pears': [3, 7, 0, 11], 'oranges': [0, 9, 5, 2]}

b. Pass it to DataFrame constructor

orders = pd.DataFrame(data)

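A dictionary in Python is a collection of key-value pairs. Here, each key becomes a column name and each list becomes that column's values.

By default, the rows are labelled 0 to 3. Let’s give each row a meaningful label by passing an index:

orders = pd.DataFrame(data, index=['Jonas', 'Dan', 'Serena', 'Emily'])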

c. Locate Values

orders.loc['Serena']
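Putting steps a to c together, the following minimal sketch runs end to end. It also shows `iloc`, the position-based counterpart of `loc`, which is not covered above and is included here only for comparison; the variable names `serena` and `third_row` are illustrative.

```python
import pandas as pd

data = {"Pears": [3, 7, 0, 11], "oranges": [0, 9, 5, 2]}
orders = pd.DataFrame(data, index=["Jonas", "Dan", "Serena", "Emily"])

# loc selects by label: the row whose index is 'Serena'.
serena = orders.loc["Serena"]
print(serena["Pears"], serena["oranges"])  # 0 5

# loc also accepts a row label and a column label together.
print(orders.loc["Dan", "oranges"])        # 9

# iloc selects by position: Serena is the third row, so position 2.
third_row = orders.iloc[2]
```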

2. Reading Values from a CSV File:

With CSV files, a single line is all you need to load the data:

df = pd.read_csv('Address where your csv is stored')
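So that the example can run without a file on disk, the sketch below feeds `read_csv` an in-memory string through `io.StringIO`. The CSV text, the `name` column and the choice of `index_col` are all assumptions made for illustration; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# A tiny CSV held in memory (hypothetical contents).
csv_text = "name,Pears,oranges\nJonas,3,0\nDan,7,9\n"

# read_csv accepts a file path or any file-like object.
# index_col tells Pandas to use the 'name' column as the row labels.
df = pd.read_csv(io.StringIO(csv_text), index_col="name")

print(df.shape)  # (2, 2): 2 rows, 2 data columns
print(df.head()) # head() shows the first few rows
```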

Basic Study of Scikit-Learn Library:

If you are looking for a robust library for building machine learning models and bringing them into production, Scikit-learn is a popular choice.

Scikit-learn supports many of the tasks involved in machine learning, such as classification, regression, clustering, and model selection.

You name it, and scikit-learn probably has a module for it.
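As a taste of what a classification task looks like, here is a minimal sketch using the iris dataset that ships with scikit-learn. The choice of a k-nearest-neighbours classifier, `n_neighbors=3`, the 25% test split and `random_state=0` are all assumptions made for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the features (X) and class labels (y) of the iris dataset.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the samples to test the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the classifier on the training data.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# score() returns the fraction of test samples classified correctly.
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow this same pattern of creating a model, calling `fit`, and then calling `predict` or `score`.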

These are the basic prerequisites to get you started with some basic ML models. The deeper you dive, the more libraries you will explore.



Anushkad
GoDataScience

•Machine Learning and AI enthusiast • Python Programming • Bibliophile • Currently pursuing UG in Information Technology.