Supervised Machine Learning (Regression Algorithms)

Nupur Kapur
The Startup

--

In the last post, we discussed the different types of machine learning and how each one differs from the others. In this post, we will discuss the first type of machine learning in the series, supervised machine learning, in detail. So buckle up for this post and the series of posts coming up to fully learn about Supervised Learning 😃😃.

Supervised machine learning finds a mapping function that maps the input, i.e., the features, to the output, i.e., the labels.
Pairs of input-output examples are fed to the model, which analyzes the patterns in the data and forms a mapping function. That function is then used to predict the output for a new set of inputs, i.e., the test data, and you can later check the accuracy of your algorithm by comparing its predictions with the true outputs.

There are two types of Supervised machine learning:

  • Regression: In this type of supervised learning, the output is a continuous value. For example, a house price is a continuous value that depends on the features of the house. The goal of the model is to predict the prices of houses based on the features provided (number of bedrooms, area, etc.).
  • Classification: In this type of supervised learning, the output is in the form of labels (discrete values). In binary classification, the output has two labels, for example, 0/1 or yes/no. In multi-class classification, the output has more than two labels, for example, a dataset containing pictures of four different types of flowers, where the labels are the names of those flowers.

So, today we will be discussing Regression in detail.

Regression

Simple Linear Regression

Simple Linear Regression is a regression machine learning algorithm that focuses on finding the relationship between two continuous variables. The variable that helps in prediction is the independent variable, whereas the variable being predicted is called the dependent variable. The relationship between the two variables is statistical rather than deterministic, which means the independent variable cannot determine the dependent variable exactly. For example, consider the relationship between an employee's salary and years of experience: you cannot determine an employee's salary exactly just by considering their years of experience.

The goal of simple linear regression is to find the line that fits the data best, i.e., the line for which the overall error is lowest, meaning the distance between the line and the data points is as small as possible.

h(x) = Q0 + Q1*x

where Q0 and Q1 are the parameters of the regression equation, x is the independent variable, and h(x) is the predicted output. Our goal is to find values of Q0 and Q1 such that the difference between the predicted value h(x) and the true value y (the actual dependent variable) is as small as possible, giving us the best line, the one that passes as close as possible to all the data points.

The gradient descent algorithm is used to minimize the mean squared error, i.e., (h(x)-y)², and give the best possible result.

In the above diagram, we are trying to estimate the height of a person from their weight. 130.2 and 0.6 are the parameters Q0 and Q1 respectively, weight is the independent variable, and height is the dependent variable, so the model predicts height = 130.2 + 0.6 × weight. For example, a weight of 70 gives a predicted height of 130.2 + 0.6 × 70 = 172.2.
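Before moving to scikit-learn, here is a minimal NumPy sketch of how gradient descent can update Q0 and Q1 to minimize the mean squared error; the toy data, learning rate, and iteration count below are illustrative assumptions, not values from this post.

import numpy as np

# Toy data: x is the independent variable, y the dependent variable (illustrative)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

q0, q1 = 0.0, 0.0          # parameters Q0 and Q1, initialised to zero
lr, n_iters = 0.01, 5000   # learning rate and number of iterations (assumed values)
m = len(x)

for _ in range(n_iters):
    h = q0 + q1 * x                          # predictions h(x)
    error = h - y                            # (h(x) - y)
    q0 -= lr * (2 / m) * error.sum()         # gradient of the mean squared error w.r.t. Q0
    q1 -= lr * (2 / m) * (error * x).sum()   # gradient w.r.t. Q1

print(q0, q1)  # approaches the intercept and slope of the best fit line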

Python code:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Split the data into training and test sets (80% train / 20% test)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Fit simple linear regression; reshape(-1, 1) turns the 1-D feature array into a single-column matrix
regressor = LinearRegression()
regressor.fit(x_train.reshape(-1, 1), y_train)

# Predict on the test set
y_pred = regressor.predict(x_test.reshape(-1, 1))
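To see how close the predictions are to the true test values, scikit-learn's metrics module can be used; this is a small illustrative addition, not part of the original snippet.

from sklearn.metrics import mean_squared_error, r2_score

print(mean_squared_error(y_test, y_pred))  # average of (h(x) - y)² on the test set
print(r2_score(y_test, y_pred))            # 1.0 would mean a perfect fit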

Multivariate Linear Regression

Multivariate Linear Regression is a machine learning algorithm in which we use multiple variables to predict the outcome of the dependent variable. This algorithm is an extension of simple linear regression: more than one independent variable is used to predict the dependent variable. For example, while calculating the cost of a house, we take the area, number of rooms, balconies, etc. into account for a better and more accurate result.

h(x) = Q0 + Q1*x1 + Q2*x2 + Q3*x3 + … + Qn*xn

Independent and dependent variables

In the above equation, Q0, Q1, Q2, Q3, …, Qn are the parameters, x1, x2, x3, …, xn are the independent variables, and h(x) is the dependent variable to be predicted. The goal is to find values of these parameters such that the mean squared error, i.e., (h(x)-y)² (also called the cost function), is minimized and we achieve optimum results.

Gradient descent helps to find parameters such that the value of cost function is minimized and the predicted values obtained are as close to the real values as possible.
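As a rough sketch of how this looks in code, assume a feature matrix X with one column per independent variable (say, area, number of rooms, and balconies) and a target vector y of house prices; the numbers below are made up for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features: [area, number of rooms, balconies]
X = np.array([[1200, 3, 1],
              [1500, 4, 2],
              [ 800, 2, 1],
              [2000, 5, 3]])
y = np.array([200000, 260000, 140000, 340000])  # illustrative prices

model = LinearRegression()
model.fit(X, y)

print(model.intercept_, model.coef_)   # Q0 and [Q1, Q2, Q3]
print(model.predict([[1000, 3, 2]]))   # predicted price for a new house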

Polynomial Regression

Polynomial Regression is another type of regression algorithm in which the relationship between the independent and dependent variables is modeled as an nth-degree polynomial. Polynomial regression fits a non-linear relationship between the independent variable, x, and the dependent variable, y.

h(x) = Q0 + Q1*x + Q2*x² + … + Qn*x^n

Relationship between x and y

In the above diagram, the data points represent the relationship between x and y. Let's now apply simple linear regression to find a line that fits the data.

Linear Regression applied to the dataset

We can see that simple linear regression fails to provide a good fit for this data, since the data points lie quite far from the line and the error is quite high. Now, let's try fitting the same dataset with a polynomial regression model.

Polynomial Regression

The curve obtained using polynomial regression fits the data much better: it passes close to almost all the data points and the error is at its minimum. This is why, when two variables have a non-linear relationship, we use the polynomial regression algorithm.

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Expand the single feature X into polynomial terms (1, x, x², x³, x⁴)
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)

# Fit an ordinary linear regression on the expanded features
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
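To predict for a new input, the same polynomial transformation has to be applied before calling predict; the input value 6.5 below is purely illustrative.

# Transform the new input into polynomial features, then predict
lin_reg_2.predict(poly_reg.transform([[6.5]]))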

Support Vector Regression

Let us talk about this algorithm with the help of diagrams.

Linear Regression

In the above diagram, we can see the line fits a few points and it’s not the best line.

Now, in SVR, we form two lines on either side of the regression line, and the data points lying between them are disregarded from the error point of view. The lines around the best fit line are called decision boundaries, and the best fit line itself is called the hyperplane.

Support Vector Regression

The tube-like structure that we see above is called the insensitive tube, because the data points inside this tube are ignored from the error point of view. Let's say the two decision boundaries are at a distance of 'a' from the hyperplane: the upper decision boundary is at a distance of '+a' and the lower decision boundary is at a distance of '-a' from the hyperplane.

The equation of the hyperplane is:

Y = wx + b or Y = Q0 + Q1*x

So the equations of the decision boundaries become:

  • Y = wx + b + a (upper decision boundary)
  • Y = wx + b - a (lower decision boundary)

That means every data point lying inside the tube satisfies:

  • -a < Y - (wx + b) < +a

SVR comes from SVM (Support Vector Machines), a classification algorithm that we will discuss in a later blog.

# Fit Support Vector Regression with the RBF kernel
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)
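One practical caveat: an RBF-kernel SVR is sensitive to the scale of the features, so a common pattern is to standardize X and y before fitting; the sketch below is an illustrative addition, not part of the original snippet.

from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

sc_X, sc_y = StandardScaler(), StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel='rbf')
regressor.fit(X_scaled, y_scaled)

# Predictions come back in the scaled space and must be inverse-transformed
y_pred = sc_y.inverse_transform(regressor.predict(X_scaled).reshape(-1, 1))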

In the next blog, we will discuss Decision Trees and Random Forests in detail. Both algorithms are rich in concepts, so they deserve a post of their own.

Stay tuned and Happy Learning!
