Machine Learning — Univariate Linear Regression

Anar Abiyev
Sep 5, 2020 · 7 min read

Linear Regression (LR) is one of the main algorithms in Supervised Machine Learning. It solves many regression problems and is easy to implement. This paper is about Univariate Linear Regression (ULR), the simplest version of LR.

The paper covers the following topics:

  • The basics of datasets in Machine Learning;
  • What is Univariate Linear Regression?
  • How to represent the algorithm(hypothesis), Graphs of functions;
  • Cost function (Loss function);
  • Gradient Descent.

The basics of datasets in Machine Learning

In ML problems, some data is provided beforehand to build the model upon. Datasets consist of rows and columns: each row represents an example, while each column corresponds to a feature.


The data is then divided into two parts: a training set and a test set. Typically the training set contains approximately 75% of the data, while the test set has the remaining 25%. The training set is used to build the model. Once the model reaches an accuracy of about 90–95% on the training set, it is evaluated on the test set. The result on the test set is considered more valid, because the data in the test set is completely new to the model.
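The split described above can be sketched in plain Python. The 75/25 ratio and the toy dataset are illustrative choices; in practice a library helper such as scikit-learn’s train_test_split is usually used instead.

```python
import random

def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle the examples and split them into training and test sets."""
    rows = rows[:]                        # copy so the original list is untouched
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]   # (training set, test set)

# 100 toy (feature, label) rows, made up for illustration
data = [(x, 2 * x + 1) for x in range(100)]
train, test = train_test_split(data)
print(len(train), len(test))              # 75 25
```

Shuffling before splitting matters: if the rows are sorted in any way, a plain head/tail split would give the model a biased view of the data.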

What is Univariate Linear Regression?

In Machine Learning problems, the complexity of the algorithm depends on the provided data. When LR is used to build a model, the case with a single feature in the training set is called Univariate LR; with more than one feature, it is called Multivariate LR. To learn Linear Regression, it is a good idea to start with Univariate Linear Regression, as it is simpler and builds a first intuition about the algorithm.

Hypothesis, graphs

To build intuition about the algorithm, I will explain it with an example. The example is a set of data on Employee Satisfaction and Salary level.

Figure 1. Raw dataset

As seen from the picture, there is a linear dependence between the two variables. Here Employee Salary is the “X value”, and Employee Satisfaction Rating is the “Y value”. In this particular case there is only one feature, so Univariate Linear Regression can be used to solve the problem.

In the following picture you will see three different lines.

Figure 2. 3 lines on the dataset

This is an already implemented ULR example, but there are three solutions and we need to choose only one of them. Visually, Line 2 is the best among them, because it fits the data better than both Line 1 and Line 3. This is a rather easy decision to make, and most problems will be harder than that. The following paragraphs show how to make these decisions precisely with the help of mathematical equations.

Now let’s see how to represent the solution of Linear Regression models (lines) mathematically:

hθ(x) = θ0 + θ1 · x

Here,

  • hθ(x) — the prediction of the hypothesis;
  • θ0 and θ1 — the parameters we have to calculate to fit the line to the data;
  • x — a point from the dataset.

This is exactly the same as the equation of a line, y = mx + b. Since the solution of Univariate Linear Regression is a line, the equation of a line is used to represent the hypothesis (solution).

Let’s look at an example. Suppose the training set contains the point (x = 1.9; y = 1.9) and the hypothesis is h(x) = -1.3 + 2x. When the hypothesis is applied to the point, we get h(1.9) = -1.3 + 2 · 1.9 = 2.5.

After the prediction is obtained, it should be compared with the y value (1.9 in the example) to check how well the equation works. In this example there is a difference of 0.6 between the real value y and the prediction of the hypothesis. For this dataset 0.6 is a big difference, which means we need to improve the hypothesis to fit the data better.
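The calculation above can be reproduced in a few lines of Python; the hypothesis and the point are the ones from the example.

```python
# Hypothesis from the example: h(x) = -1.3 + 2x
theta0, theta1 = -1.3, 2.0

def h(x):
    """Prediction of the hypothesis for a single x value."""
    return theta0 + theta1 * x

x, y = 1.9, 1.9          # the training point from the example
prediction = h(x)        # approximately 2.5
error = prediction - y   # approximately 0.6, so the fit should be improved
print(prediction, error)
```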


But here comes the question: how can the value of h(x) be manipulated to make it as close as possible to y? To answer it, let’s analyze the equation. There are three parameters: θ0, θ1, and x. x comes from the dataset, so it cannot be changed (in the example the pair is (1.9; 1.9); if you get h(x) = 2.5, you cannot change the point to (1.9; 2.5)). So we are left with only two parameters (θ0 and θ1) to optimize. In this optimization two tools play important roles: the Cost function, to measure how well the hypothesis fits the data, and Gradient descent, to improve the solution.

Cost function (Loss function)

In the examples above, we made some comparisons to determine whether the line fits the data or not. In the first one, it was just a choice between three lines; in the second, a simple subtraction. But how will we evaluate models on complicated datasets? This is where the Cost function comes to our aid. In a simple definition, the Cost function evaluates how well the model (a line in the case of LR) fits the training set. There are various versions of the Cost function, but we will use the one below for ULR:

J(θ0, θ1) = 1/(2m) · Σ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²,  summed over i = 1, …, m

Here,

  • m — the number of examples in the training set;
  • h — the prediction of the hypothesis;
  • y — the y values of the points in the dataset.

How well the model is optimized is reflected in the value of the Cost function: the smaller the value, the better the model. Why? The answer is simple: the cost is proportional to the sum of the squared differences between the values of the hypothesis and y. If all the points were on the line, there would be no differences and the answer would be zero. Conversely, if the points were far away from the line, the answer would be a very large number. To sum up, the aim is to make the cost as small as possible.

So, from this point, we will try to minimize the value of the Cost function.
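A minimal Python sketch of this Cost function; the training points are made up for illustration.

```python
def cost(theta0, theta1, points):
    """J(θ0, θ1) = 1/(2m) · Σ (h(x) - y)² over the training set."""
    m = len(points)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in points) / (2 * m)

# Made-up training points that lie roughly on the line y = x
points = [(1.0, 1.1), (1.9, 1.9), (3.0, 2.9)]
print(cost(-1.3, 2.0, points))  # the example hypothesis: a fairly large cost
print(cost(0.0, 1.0, points))   # h(x) = x: a much smaller cost
```

Comparing the two printed values shows directly which of the two candidate lines fits this dataset better.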

Gradient Descent

In order to get proper intuition about Gradient Descent algorithm let’s first look at some graphs.


This is the graph of the Cost function as a function of theta. As mentioned above, the optimal solution is where the value of the Cost function is minimal. In Univariate Linear Regression the graph of the Cost function is always a parabola, and the solution is its minimum.

Gradient Descent is the algorithm that finds this minimum:

repeat until convergence:
    θj := θj − α · ∂/∂θj J(θ0, θ1)    (for j = 0 and j = 1, updated simultaneously)

Here,

  • α — learning rate;

The equation may seem a little confusing, so let’s go over it step by step.

1. What is this symbol — ‘:=’?

  • Firstly, it is not the same as ‘=’. ‘:=’ means “update the left side value”. Mathematically ‘=’ cannot be used here, because a number cannot be equal to itself minus something else (unless that something is zero).

2. What is ‘j’?

  • ‘j’ indexes the parameters of the hypothesis. In Univariate Linear Regression there are two parameters, θ0 and θ1, so j takes the values 0 and 1. In general, the number of parameters equals the number of features + 1.

3. What is ‘alpha’?

  • ‘alpha’ is the learning rate, a positive number usually between 0.001 and 0.1. If it is too high, the algorithm may ‘jump’ over the minimum and diverge from the solution. If it is too low, convergence will be slow. In most cases several values of ‘alpha’ are tried and the best one is picked.

4. The partial derivative term.

  • The Cost function mentioned above:
    J(θ0, θ1) = 1/(2m) · Σ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
  • The Cost function with the definition of h(x) substituted:
    J(θ0, θ1) = 1/(2m) · Σ (θ0 + θ1x⁽ⁱ⁾ − y⁽ⁱ⁾)²
  • The derivatives of the Cost function:
    ∂J/∂θ0 = 1/m · Σ (θ0 + θ1x⁽ⁱ⁾ − y⁽ⁱ⁾)
    ∂J/∂θ1 = 1/m · Σ (θ0 + θ1x⁽ⁱ⁾ − y⁽ⁱ⁾) · x⁽ⁱ⁾
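One way to sanity-check the derivative formulas is to compare them against a numerical approximation of the slope. This is a sketch; the training points are made up for illustration.

```python
def cost(t0, t1, pts):
    m = len(pts)
    return sum((t0 + t1 * x - y) ** 2 for x, y in pts) / (2 * m)

def analytic_grad(t0, t1, pts):
    """The derivative formulas, written out for θ0 and θ1."""
    m = len(pts)
    d0 = sum(t0 + t1 * x - y for x, y in pts) / m
    d1 = sum((t0 + t1 * x - y) * x for x, y in pts) / m
    return d0, d1

def numeric_grad(t0, t1, pts, eps=1e-6):
    """Central-difference approximation of the same derivatives."""
    d0 = (cost(t0 + eps, t1, pts) - cost(t0 - eps, t1, pts)) / (2 * eps)
    d1 = (cost(t0, t1 + eps, pts) - cost(t0, t1 - eps, pts)) / (2 * eps)
    return d0, d1

pts = [(1.0, 1.1), (1.9, 1.9), (3.0, 2.9)]   # made-up training points
print(analytic_grad(0.5, 0.5, pts))
print(numeric_grad(0.5, 0.5, pts))           # should agree closely
```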

5. Why is the derivative used, and why is the sign before alpha negative?

  • The derivative gives the slope of the Cost function at the current point. The example graphs below show why the derivative is so useful for finding the minimum.

In the first graph above, the slope (the derivative) is positive. As is seen, the current point on the parabola should move towards the left in order to reach the minimum; for that, the X value (theta) should decrease. Now recall the equation of Gradient descent: alpha is positive, the derivative is positive (in this example), and the sign in front is negative. Overall the update term is negative and theta will be decreased.


In the second graph, the slope (the derivative) is negative. As is seen, the current point on the parabola should move towards the right in order to reach the minimum; for that, the X value (theta) should increase. Now recall the equation of Gradient descent: alpha is positive, the derivative is negative (in this example), and the sign in front is negative. Overall the update term is positive and theta will be increased.
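Putting the update rule and the derivatives together gives a minimal Gradient Descent sketch in Python. The data here is made up, generated from the line y = -1.3 + 2x used earlier, so the algorithm should recover θ0 ≈ -1.3 and θ1 ≈ 2.

```python
def gradient_descent(points, alpha=0.1, iterations=1000):
    """Repeat the update θj := θj - α · ∂J/∂θj for both parameters."""
    t0, t1 = 0.0, 0.0
    m = len(points)
    for _ in range(iterations):
        d0 = sum(t0 + t1 * x - y for x, y in points) / m          # ∂J/∂θ0
        d1 = sum((t0 + t1 * x - y) * x for x, y in points) / m    # ∂J/∂θ1
        t0, t1 = t0 - alpha * d0, t1 - alpha * d1                 # simultaneous update
    return t0, t1

# Made-up points generated from y = -1.3 + 2x
points = [(x, -1.3 + 2 * x) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
t0, t1 = gradient_descent(points)
print(t0, t1)   # close to -1.3 and 2.0
```

Note the simultaneous update: both gradients are computed from the old (t0, t1) before either parameter is changed, which is exactly what the “repeat until convergence” rule requires.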

The coming section will be about Multivariate Linear Regression.

Thank you.
