Your guide to Supervised Machine Learning — Regression

Harshit Yadav
Published in CodeX · 12 min read · Jan 21, 2022

Learning and developing an intuition for how Regression works

Photo by Liam Charmer on Unsplash

Note: Four pillars of Machine Learning #2 — Linear algebra and calculus is the previous blog in the “Complete Machine Learning and Deep Learning for Beginners” series. It is recommended to have the prior knowledge discussed in the previous blogs.

The little code and data used in this blog can be found ‘here’. Do not worry if you do not know how to code or do not understand the code in this blog; this is not a coding tutorial. This blog’s purpose is to explain the concepts, so you can ignore the code, but if you want to dig deeper, feel free to go through it.

For those unfamiliar with the code or the libraries used, a blog introducing them will be released as part of this series.

Blog’s Outline:

  1. Introduction
  2. Supervised Machine Learning
  • 2.1 Introduction to Regression
  • 2.2 Introduction to Classification
  • 2.3 Continuous and Discrete Data
  3. Regression
  • 3.1 Dependent and Independent Variables
  • 3.2 Types of Regression
  • 3.3 Simple Linear Regression
  • 3.4 Multiple Linear Regression

1. Introduction

We have already discussed supervised and unsupervised machine learning earlier in this series. In this blog, we will learn more about Regression in Supervised Machine Learning and how it works. You will need a bit of Mathematics, nothing too advanced; just a basic knowledge of Linear Algebra will be helpful.

2. Supervised Machine Learning

As discussed earlier in one of our previous blogs (Getting Familiar to The World of Machine Learning), when we train our model on a labelled dataset, it falls under Supervised Machine Learning. This type of Machine Learning is further divided into two parts:

2.1 Regression: In Regression, we try to predict “Continuous Variables” (discussed ahead) like the price of a house, the number of COVID cases, the price of a stock, the temperature on a particular day, etc.

  • Example: Suppose you were given data about how many hours a student studies, and you have to predict the student's marks.

2.2 Classification: In Classification, we try to predict something called a “Discrete Variable” (discussed ahead) like the category of a flower, human gender, whether an email is spam or not, etc., where we are sure that the output is going to come from a particular list of categories.

  • Example: Suppose your friend owns a pet, and she tells you its characteristics and that it is one among [Rabbit, Dog, Cat, Cow]. Now you guess whether it is a Dog, a Cat, etc.

2.3 Continuous and Discrete Data

We need you to do two things for us:

  1. Guess a number.
  2. Pick an animal from [Dog, Cat, Cow, Rabbit]

What you just did here is learn what discrete and continuous variables are. In the first task, we guessed some form of a number which could be anything; we are not bounded. This is continuous data: it could be 10, 100, 10.10, 57.80, etc. In the second task, we picked a value from a well-defined pool of values, which is discrete data.
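If it helps to see the same idea in code, here is a tiny sketch (the values are just for illustration):

```python
# Continuous data: can take any value, including fractions; we are not bounded.
guessed_numbers = [10, 100, 10.10, 57.80]

# Discrete data: restricted to a well-defined pool of categories.
animal_pool = ["Dog", "Cat", "Cow", "Rabbit"]
picked_animal = "Cat"  # must be one of the values in animal_pool
```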

3. Regression

Regression is the study and use of data to predict a continuous variable based on the findings from the dataset. To do this, we need to know about Dependent and Independent Variables.

3.1 Dependent and Independent Variables:

“Dependent Variable is a variable that is dependent upon different factors that serve as independent variables.”

Got confused? Don't worry, read ahead.

Suppose you have data as follows:

Hours VS Marks. By Author

This data is about how many hours a student studied versus how many marks they scored in the exam. In this scenario, your dependent variable is the Marks scored in the exam by a student, which is affected by the number of hours the student studied, the independent variable in this data.

So, Hours = Independent Variable, Marks = Dependent Variable and as we already know, the Marks will depend upon the number of hours studied.

Now reread the first statement; it should not be as confusing as before.
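For reference, a small made-up slice of such a dataset (illustrative numbers, not the actual data from the linked file) could be set up like this:

```python
import pandas as pd

# Hours studied (independent variable) vs marks scored (dependent variable)
data = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5],
    "Marks": [35, 48, 55, 67, 74],
})
print(data)
```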

3.2 Types of Regression:

Now Regression can be done in many ways. Some Regression methodologies are:

  1. Simple Linear Regression
  2. Multiple Linear Regression
  3. Logistic Regression
  4. Ridge Regression
  5. Lasso Regression
  6. Polynomial Regression
  7. Support Vector Regression
  8. Decision Tree Regression
  9. Random Forest Regression

We will cover the first two in this blog; the rest will be covered later in the series as they are a bit advanced.

3.3 Simple Linear Regression:

Have you ever heard of the slope of a graph, or just the equation y = mx + c? If you have, then you already know what Simple Linear Regression is. If you haven't, then we are here for you. Read ahead.

Suppose you buy two chocolates for $10. Can you guess the price of 4 chocolates? It will be $20. You can come to this result in two ways:

First:

Figure out the price of 1 chocolate, then calculate the price of 4 chocolates: if 2 chocolates cost $10, then 1 chocolate costs $5, which means four chocolates cost $20.

Second:

Make a graph for it (a bit tedious, but it does the work). Suppose you ask the shopkeeper the price of 2 chocolates and he tells you $10, then $20 for 4, $30 for 6 and so on. What you can do with this is sketch a graph, which will look something like this:

Price of Chocolates VS Number of Chocolates. By Author

Now if someone asks you the price of 50 chocolates, looking at the graph, you can say that it is $250.

What you did here in the second method was create a Simple Linear Regressor. Admittedly, it does not make sense to use Linear Regression in this example, because the answer is not an estimate but a sure thing; the point was to show how we can use a graph to relate two variables. Now we will see how we can use Linear Regression on real-world data.
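As a quick sketch, the chocolate “line” (price = $5 per chocolate, with slope 5 and intercept 0) can be written directly in code:

```python
# The "line" is price = 5 * n: slope 5 (dollars per chocolate), intercept 0
def price_of(n_chocolates):
    return 5 * n_chocolates

print(price_of(4))   # 20
print(price_of(50))  # 250, the value we read off the graph
```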

Let us learn how to apply Simple Linear Regression to real-world data:

Suppose you want to predict how many marks a student will score if they study for a certain number of hours. Have a look at the data below; you can find the whole dataset here.

Hours Studied VS Marks Scored. By Author

Now we want to predict the marks scored from the number of hours spent studying. To do this, we will create a Simple Linear Regression model. First, let us plot a 2-D graph of the students' Marks VS Hours Studied. It will look something like this:

By Author (Coded)
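A scatter plot like this can be produced with matplotlib; a rough sketch, assuming the illustrative `data` DataFrame from earlier:

```python
import matplotlib.pyplot as plt

plt.scatter(data["Hours"], data["Marks"])  # one dot per student
plt.xlabel("Hours Studied")
plt.ylabel("Marks Scored")
plt.title("Marks VS Hours Studied")
plt.show()
```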

Now we will try to create a Linear Regression line to predict the scores. But how do we draw the line? For this, we will have to come up with a line that closely fits the data, so that it predicts the marks accurately. Look at these graphs carefully:

By Author

Those blue dots represent the data points, and the red line represents our predictions. In which graph do you think we are close to the accurate answer? The correct answer will be the top-right graph because our prediction line fits the data well, and our prediction will be very close to the actual answer.

But why did our intuition suggest the top-right graph? The top-right chart minimises something called the Mean Squared Error (MSE); the smaller it is, the better our prediction line works. Let us understand what MSE is. Look at the graphs below:

By Author (Coded)

These vertical lines represent how much error there was in our prediction. If we take the average of the squares of these errors over the actual data, we get the Mean Squared Error. Whenever we try to predict something, we look to minimise the error between the actual answer and our prediction. This error can be called the “cost”; thus, we are trying to minimise this cost, and it is calculated using something called a “Cost Function”, which in this case is the Mean Squared Error. The lower the Cost Function, i.e. the Mean Squared Error, the better the model.

Mean Squared Error: MSE = (1/n) · Σ (yᵢ − ŷᵢ)², where yᵢ is the actual value, ŷᵢ is the predicted value and n is the number of data points. By Author
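In code, MSE is just the average of the squared differences between the actual and predicted values; a minimal sketch:

```python
import numpy as np

def mean_squared_error(y_actual, y_predicted):
    """Average of the squared errors between actual values and predictions."""
    y_actual = np.asarray(y_actual)
    y_predicted = np.asarray(y_predicted)
    return np.mean((y_actual - y_predicted) ** 2)
```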

So, now you know which line out of the above 4 is best for our model, but the question is: how did we get there? How do we get to those 4 lines? Let us learn that below:

Every line on a graph can be represented by the equation y = mx + c. We can manipulate the line by changing “m”, which is called the slope, and “c”, which is called the intercept. Changing these two properties changes the line. Have a look below:

By Author

Now, we can guess combinations of “m” (the slope) and “c” (the intercept) enough times. This is what our computer does: it checks different combinations of “m” and “c”, and after enough guesses, we pick the combination with the lowest “Cost Function” value, i.e. the combination of “m” and “c” that gives the minimum Mean Squared Error we discussed earlier.
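A naive version of this search, sketched in code (real libraries use the smarter Gradient Descent mentioned later, but brute force shows the idea; `data` is the illustrative DataFrame from earlier):

```python
import numpy as np

x = data["Hours"].to_numpy()
y = data["Marks"].to_numpy()

best_m, best_c, best_mse = None, None, float("inf")
# Try many (m, c) combinations and keep the one with the lowest MSE
for m in np.linspace(0, 20, 201):
    for c in np.linspace(0, 50, 201):
        mse = np.mean((y - (m * x + c)) ** 2)
        if mse < best_mse:
            best_m, best_c, best_mse = m, c, mse

print(best_m, best_c, best_mse)
```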

3.4 Multiple Linear Regression:

In Simple Linear Regression, we saw how we could relate one dependent variable and one independent variable to create a Linear Regression model, but what if we have one dependent variable and two independent variables?

Suppose you want to predict the price of a house; it will depend upon house area, water availability, electricity availability, etc. For now, let us consider that the price of the house depends upon only two features: electricity and water availability. Now how would we predict the price of a home using these two?

Suppose we have a dataset:

By Author

In Simple Linear Regression, we discussed how to approach creating a Linear Regression. If we had just two variables, i.e. Price VS Water or Price VS Electricity, the graphs would be plotted as:

By Author (Coded)

If we apply Linear Regression over the data, we will get:

By Author (Coded)

So now we have two independent variables, electricity availability and water availability, and we want to use them to predict the dependent variable, house price. With three variables, we can draw the data in a 3-D graph, with the z-axis as the price of a house, the x-axis as water availability and the y-axis as electricity availability.

If we plot the data on a 3-D graph:

Notice the positive quadrant, i.e. (+x, +y, +z); keep the +x, +y, +z axes as reference to notice the placement of the points.

By Author (Site used to create the graph: Link)
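A 3-D scatter like this can also be drawn with matplotlib; a sketch using a made-up `houses` DataFrame (illustrative numbers, not the author's dataset):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers purely for illustration
houses = pd.DataFrame({
    "Water":       [0.2, 0.4, 0.5, 0.7, 0.9],
    "Electricity": [0.3, 0.5, 0.4, 0.8, 0.9],
    "Price":       [0.7, 0.85, 0.9, 1.0, 1.1],
})

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(houses["Water"], houses["Electricity"], houses["Price"])
ax.set_xlabel("Water Availability")
ax.set_ylabel("Electricity Availability")
ax.set_zlabel("Price")
plt.show()
```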

Now how would we fit a Linear Regression Model over this? Notice that in Simple Linear Regression, since the data was 2-Dimensional, we got a Regression line; now that we have moved to 3-D, we will get a plane. But to define a plane in 3-D space, we need an equation, and you can specify any plane in 3-D space using the equation:

z = a·x + b·y + d. By Author

We saw that in the case of Simple Linear Regression, we needed an equation like y = mx + c, but the trick was finding “m”, the slope, and “c”, the intercept. We tried different values, which gave us different lines, and the line that minimised the Cost Function (MSE, the Mean Squared Error, in this case) was the required line. (Read that again if you got lost; we covered this in Simple Linear Regression.)

We are going to follow the same procedure here. We will try different values of a, b and d, and the plane that gives us the minimum MSE (Mean Squared Error) will be the required plane, which will be our Regression Model. (To find a, b and d fast, we use an optimisation technique called “Gradient Descent”, which will be explained in another blog.)

After different tries over a, b and d, our computer came up with the values that give us the minimised MSE, and for this model those values gave the plane Z = 0.19264305·X + 0.22561308·Y + 0.6171662125340589.

So this is the equation of the plane that gives us the minimised MSE. If we plot this plane in our 3-D space, we will get something like this:

By Author (Site used to create the graph: Link)
By Author (Coded)
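Finding a, b and d is exactly what a library like scikit-learn automates; a sketch using the made-up `houses` DataFrame from above (the coefficients printed for your data will differ from the author's):

```python
from sklearn.linear_model import LinearRegression

# Fit a plane z = a*x + b*y + d; coef_ holds [a, b], intercept_ holds d
model = LinearRegression()
model.fit(houses[["Water", "Electricity"]], houses["Price"])

print(model.coef_)       # [a, b] — the author's run gave [0.19264305, 0.22561308]
print(model.intercept_)  # d — the author's run gave 0.6171662125340589
```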

So this will be our Linear Regression plane, which tells us the price of a house based on the feature inputs. But the question that arises here is:

What if there are more than two features?

Suppose you have a dataset where your dependent variable is the price of a house and the independent variables are (House Area, No. of Rooms, No. of Floors, Water, Electricity, Locality). All of these cannot be represented in a 3-D graph; if we tried to plot this data, we would need a 7-D graph (six features plus the price), but that is impossible to visualise, as we cannot move beyond 3-D space. We would need a hyperplane, but instead of trying to understand it through the hyperplane concept, we will approach it this way:

In the 3-D space example, we saw that the Linear Regression equation was Z = 0.19264305*X + 0.22561308*Y + 0.6171662125340589, which resembles the equation:

z = a·x + b·y + d. By Author

In this equation, Z is the price of the house, ‘x’ is a feature variable (Water Availability in this case), ‘y’ is a feature variable (Electricity Availability in this case), and a and b are the weights of the features, which give us the required plane. So in the example we solved earlier, the weight of the feature x was 0.19264305, the weight of the feature y was 0.22561308, and d was the intercept, which was 0.6171662125340589. So if we want to deal with a dataset where the dependent variable is the price of a house and the independent variables are (House Area, No. of Rooms, No. of Floors, Water, Electricity, Locality), instead of going into hyperplane theory, for now, we can follow the pattern of the above equation, which results in:

Y = b1·X1 + b2·X2 + b3·X3 + b4·X4 + b5·X5 + b6·X6 + e. By Author

Where Y is the price of a house, X1 is the House Area feature and b1 is its weight, X2 is the No. of Rooms feature and b2 is its weight, and so on up to the Locality feature, which is X6 with weight b6; the last thing we are left with is e, the intercept.

This way, we can create a linear regression for a dataset with more features.
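In code, nothing changes except the number of feature columns; a sketch with six made-up features (`houses6` is a hypothetical dataset, not the author's):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up 6-feature dataset purely for illustration
houses6 = pd.DataFrame({
    "HouseArea":   [1000, 1500, 1200, 2000, 1800, 900, 2500],
    "Rooms":       [2, 3, 3, 4, 4, 2, 5],
    "Floors":      [1, 1, 2, 2, 3, 1, 2],
    "Water":       [0.2, 0.4, 0.5, 0.7, 0.9, 0.3, 1.0],
    "Electricity": [0.3, 0.5, 0.4, 0.8, 0.9, 0.2, 1.0],
    "Locality":    [3, 4, 2, 5, 4, 2, 5],
    "Price":       [50, 75, 60, 110, 100, 40, 140],
})

features = ["HouseArea", "Rooms", "Floors", "Water", "Electricity", "Locality"]
model = LinearRegression()
model.fit(houses6[features], houses6["Price"])

print(model.coef_)       # [b1, b2, b3, b4, b5, b6]
print(model.intercept_)  # e
```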

Conclusion

Linear Regression is widely used in the world of Machine Learning. It is a powerful yet easy algorithm for predicting continuous data. There are different modifications of Linear Regression, like Ridge and Lasso, which will be covered later in this series after you are made familiar with Gradient Descent, Overfitting, Underfitting, etc.

It can be a lot to digest if you are an absolute beginner to Regression. Take your time and go through Regression once more if things are not clear on the first attempt; it will get clearer.

If you found this blog helpful, kindly consider following us to receive new informative blogs from the series.

Previous Blog — Four pillars of Machine Learning #2 — Linear algebra and calculus

Next blog — Your guide to Supervised Learning — Classification


Harshit Yadav
Writer for CodeX

A student of IIT-Roorkee enthusiastic about Blockchain, DeFi, Machine Learning and Artificial Intelligence; learning, implementing and sharing the knowledge.