Introduction to Machine Learning: Studying Linear and Logistic Regression

Hello World!

Dhairya Parikh
Published in Coinmonks
8 min read, Jun 27, 2018

Welcome to my first machine learning tutorial! In it, I am going to explain how to get started with one of the most trending fields of the 21st century:

“MACHINE LEARNING”

To understand it, we first need to know about the broader field on which it is based: Artificial Intelligence.

What is Artificial Intelligence?

Artificial intelligence (AI) is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.

These processes include learning, reasoning, and self-correction.

Virtual personal assistants like Amazon’s Alexa and Apple’s Siri are excellent examples of where AI is widely deployed.

As seen in the figure, machine learning is a sub-branch of the vast field of artificial intelligence. The concept of machine learning can be explained by the simple diagram shown below:

In short, machine learning is the ability of a computer to generate a program when it is given the inputs and outputs of a problem.

Machine learning is further subdivided into the following branches:

We will stick to supervised learning algorithms in this tutorial. The topics covered are:

1. Setting up the environment

2. Introduction to Supervised Learning

3. Linear Regression

4. Logistic Regression

5. Conclusion

Let's get started!

1. Setting up the environment

This is the first step to take when you want to start with machine learning. You need to set up a work environment so that you can begin your machine learning expedition. Please note that I have used Python 2 for this tutorial, so if you are a beginner, it would be best to stick with Python 2 while following along.

To do that, follow the steps below:

1. Download and install Anaconda from https://www.anaconda.com/download/. It is available for Windows, Linux, and Mac. Before downloading the installer, check whether you are on a 32-bit or 64-bit OS.

This is a very useful piece of software, packed with additional features that make our lives easier when we are working in the field of artificial intelligence.

I recommend using it over the base Python package because it comes with the conda environment manager and Jupyter Notebook preinstalled, both of which will be used very frequently in the coming tutorials.

2. Once you have downloaded the Anaconda installer, run it and install Anaconda on your PC. This might take some time, so be patient.

After it is done, go to the Start menu, scroll down to find the Anaconda folder, and open the Anaconda prompt.

When you open it, you will see the following window appear on your screen:

Now, you are ready to proceed to step 3.

3. Now that you have your prompt window up and running, it's time to create a virtual environment. This creates a workspace for a particular project, and whatever libraries we install within this environment are accessible only when that environment is activated in our Anaconda prompt window.

Type the following line in the prompt to create a new environment.

conda create -n yourenvname python=x.x anaconda

Choose a name for your environment and use it in place of yourenvname, then specify your Python version in place of x.x, which is 2.7 in my case. After you run this command, you will see something like this in your prompt.

It will take a few minutes to create the environment, so be patient! Once it's done, the prompt will display a message saying that it was successful. Now, to activate the environment, you just need to type:

activate yourenvname

The prompt will look something like this after the environment is activated.

Now install the following libraries in this environment, as they are the basic libraries required to start machine learning:

1. NumPy for data pre-processing and model creation

2. Pandas for data fetching

3. Matplotlib for plotting the results

Using the pip command, install these libraries:

pip install numpy
pip install pandas
pip install matplotlib
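A quick way to confirm that the three installs worked is to try importing each library from inside the activated environment. The snippet below is only a sketch: the helper name check_libraries is made up for this illustration and is not part of any library. It is written so that it should run under both Python 2.7 and Python 3.

```python
import importlib


def check_libraries(names=("numpy", "pandas", "matplotlib")):
    """Return a dict mapping each library name to its version, or None if missing."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The library is not installed in the current environment.
            versions[name] = None
        else:
            # Most libraries expose __version__; fall back to "unknown" otherwise.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


installed = check_libraries()
for name in sorted(installed):
    status = installed[name] if installed[name] is not None else "NOT INSTALLED"
    print("%s: %s" % (name, status))
```

If any line reports NOT INSTALLED, make sure the environment is activated and run the matching pip install command again.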

After this is done, you are all set and ready to dive into the world of machine learning, so let's continue!

2. Introduction to Supervised Learning

Supervised Learning

Supervised learning is learning that is carried out using labeled data points. In other words, the parameters are predefined, and the algorithm knows that it's looking for relationships between those predefined parameters. The training data that the algorithm uses already has enough details and labels to let the algorithm use the positions of the data points to infer a relationship between multiple variables. Let’s go through an example:

Supervised Learning — Regression

Suppose you have a data set of apartment rent pricing in New York City. Each apartment has two attributes: monthly rent and square footage. Graphed out, the data set would look something like this:

P = Monthly Rent, A = Square Footage

Based on the above, a machine learning algorithm would analyze the positions of the data points and generate a predictive function that estimates the rent of an apartment from its square footage. The function can be represented by the solid line below.

Line represents the relational function between Square Footage and Monthly Rent

Based on the predictive function, the algorithm can now estimate the price of an apartment from its square footage. So where does the learning come into play? As more apartments are rented out, the supervised learning platform adds more “real” data points to the graph, and the machine learning algorithm updates the function to reflect the positions of the new data points. Therefore, as more apartments are rented, the algorithm has more real data to analyze, which generally makes the predictive function more accurate. This example is a form of regression, one of the two main kinds of supervised learning. The other is classification, in which the output is one of a finite number of classes (for example, Yes or No).
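The rent example can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the apartment numbers are invented, and fit_line is a hypothetical helper that uses the standard least-squares formulas for a straight line. The second half shows the "learning" step, where new rentals are added and the line is fitted again.

```python
def fit_line(areas, rents):
    """Least-squares fit of rent = slope * area + intercept."""
    n = float(len(areas))
    mean_a = sum(areas) / n
    mean_p = sum(rents) / n
    # slope = covariance(area, rent) / variance(area)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(areas, rents))
    var = sum((a - mean_a) ** 2 for a in areas)
    slope = cov / var
    intercept = mean_p - slope * mean_a
    return slope, intercept


# Invented data for illustration only: A = square footage, P = monthly rent.
areas = [500, 650, 800, 950, 1100]
rents = [1500, 1900, 2250, 2700, 3050]

slope, intercept = fit_line(areas, rents)
print("Predicted rent for 700 sq ft: %.0f" % (slope * 700 + intercept))

# "Learning": as new rentals come in, add them to the data and refit the line.
areas.extend([1250, 1400])
rents.extend([3500, 3900])
slope2, intercept2 = fit_line(areas, rents)
print("Updated prediction for 700 sq ft: %.0f" % (slope2 * 700 + intercept2))
```

The two predictions differ slightly because the fitted line shifts when the new data points are included.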

Now that we know about regression, we will learn about two of the most basic supervised learning algorithms:

1. Linear Regression

2. Logistic Regression

So let's get started. :)

3. Linear Regression

This is one of the simplest algorithms used in machine learning.

Here, we assume that there is a linear relationship between the input and the output, which is the simplest non-trivial relationship.

Even when the data is non-linear overall, it is often approximately linear within a range of interest, and we can restrict our problem to that range.
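To see why a straight line can still be useful for non-linear data, consider the curve y = x². The sketch below is an illustration chosen for this tutorial, not something taken from the notebook. It fits a line to the curve on a narrow range and on a wide range, then compares the worst-case error. The helper name max_line_error is hypothetical; np.polyfit and np.linspace are standard NumPy functions.

```python
import numpy as np


def max_line_error(x):
    """Fit a straight line to y = x**2 on the given x values; return the worst-case error."""
    y = x ** 2
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 polynomial = straight line
    return np.max(np.abs(y - (slope * x + intercept)))


narrow = np.linspace(1.0, 1.2, 50)  # a small range of interest
wide = np.linspace(0.0, 10.0, 50)   # a much larger stretch of the curve

print("Max error on narrow range: %.4f" % max_line_error(narrow))
print("Max error on wide range:   %.4f" % max_line_error(wide))
```

The error on the narrow range should come out far smaller than on the wide range, which is the sense in which the data is "linear in a range of interest".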

To make this concrete, I have written a sample IPython notebook that illustrates linear regression. To access it, download the files from my git repo (the link is provided at the end of this tutorial). After that is done, extract the zip file and copy the path of the extracted folder.

Now, open the Anaconda prompt and activate your environment. Then change the directory to the downloaded folder using the following command:

cd yourdirectorypath

Once you are in the directory, type jupyter notebook in the prompt. After a few seconds, a window like this will open in your default browser.

Now, open the linear regression example file and you will have the notebook in front of you, so let's understand it!

In the first cell, we import all the libraries required for the task.

To run a cell in Jupyter Notebook, you just need to press Shift+Enter.

In the second cell, we create the data that we will be using: a 10-element array stored in the variable x (the input), and the output y = 2x + 3, which is a basic linear relationship. The inputs are called Features and the outputs are called Labels.
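For readers who do not have the notebook open, here is a minimal sketch of the same idea. It is not the notebook's actual code: the exact values in x and the use of np.polyfit for the fit are assumptions made for this illustration.

```python
import numpy as np

# Features: a 10-element input array (the exact values in the notebook may differ).
x = np.arange(10, dtype=float)
# Labels: the output follows the linear relationship y = 2x + 3.
y = 2 * x + 3

# Fit a straight line (degree-1 polynomial) to the data by least squares.
slope, intercept = np.polyfit(x, y, 1)
print("slope = %.2f, intercept = %.2f" % (slope, intercept))

# Use the fitted line to predict the label for a feature value not in the data.
x_new = 12.0
print("prediction for x = %.1f: %.2f" % (x_new, slope * x_new + intercept))
```

Because the labels were generated from an exact line, the fit should recover a slope of about 2 and an intercept of about 3.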

All the other details are provided in my notebook, with comments and explanations, so go through it and feel free to ask any questions you have.

4. Logistic Regression

Logistic regression is estimating the parameters of a logistic model. More formally, a logistic model is one where the log-odds of the probability of an event is a linear combination of independent or predictor variables. The two possible dependent variable values are often labelled as “0” and “1”, which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. The binary logistic regression model can be generalized to more than two levels of the dependent variable: categorical outputs with more than two values are modeled by multinomial logistic regression, and if the multiple categories are ordered, by ordinal logistic regression, for example the proportional odds ordinal logistic model.
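The definition above can be sketched for the binary case in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the code from the notebook: the pass/fail data is invented, fit_logistic is a hypothetical helper, and plain gradient descent on the log loss is just one of several ways to estimate the parameters.

```python
import numpy as np


def sigmoid(z):
    """Map log-odds z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(x, y, learning_rate=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = sigmoid(w * x + b)
        # Gradients of the average log loss with respect to w and b.
        grad_w = np.mean((p - y) * x)
        grad_b = np.mean(p - y)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Invented data: hours studied, and whether the student passed (1) or failed (0).
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

w, b = fit_logistic(hours, passed)
for h in (1.0, 2.75, 4.5):
    print("P(pass | %.2f hours) = %.2f" % (h, sigmoid(w * h + b)))
```

Here w * x + b is the log-odds, a linear combination of the predictor, and sigmoid turns it into a probability. Thresholding that probability at 0.5 gives the "0" or "1" label.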

Now, to understand the algorithm with an example, please open the logistic regression example notebook from my repo.

Everything you need to know about this algorithm is explained in my notebook, so please refer to it!

5. Conclusion

So, now you know what machine learning is, and you know about two of its most basic algorithms too.

This tutorial is for people who are enthusiastic about AI and ML but have difficulty learning from the complex tutorials available on the web, so this is a simple tutorial created for them.

I will post further tutorials on ML if I get a good response to this one, so please let me know whether this tutorial helped you and what improvements can be made. Thank you!!

The link to my git repository is:

https://github.com/Dhairya1007/Machine-Learning-Basics
