How Does Linear Regression Work Mathematically?

--

The mathematics behind linear regression.

I made a simple YouTube video on how to develop a machine learning system that can be trained on a dataset and can then predict the output for unseen data without being explicitly programmed. If you are interested, please have a look; I covered the implementation in 3 steps:

URL : https://www.youtube.com/watch?v=4smaKYJm1Xw&t=312s

In this blog post, I am going to explain how this machine learning algorithm works mathematically, so that we can use machine learning libraries more effectively.

This is the dataset we used in the previous video to train our machine’s brain.

Dataset prepared for the video

We can easily visualize the dataset using matplotlib, a Python library for plotting and visualization.

Dataset Visualization using matplotlib
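A minimal plotting sketch is below. The table itself is only shown as an image, so the five (area, price) rows are an assumption: a small home-price table that, when fitted, gives the same m and b that scikit-learn reports later in this post. The marker style and axis labels are illustrative choices, not taken from the video.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line to get an interactive window
import matplotlib.pyplot as plt

# Assumed sample data (the original table is only shown as an image).
AREAS = [2600, 3000, 3200, 3600, 4000]
PRICES = [550000, 565000, 610000, 680000, 725000]


def plot_dataset(areas, prices):
    """Draw a scatter plot of home price against area and return the figure and axes."""
    fig, ax = plt.subplots()
    ax.scatter(areas, prices, color="red", marker="+")
    ax.set_xlabel("area")
    ax.set_ylabel("price")
    return fig, ax


if __name__ == "__main__":
    fig, ax = plot_dataset(AREAS, PRICES)
    fig.savefig("dataset.png")  # write the plot to an image file
```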

Now we have to draw a straight line through this graph that passes as close as possible to the data points, like this. This line is known as the best fit line. Once the best fit line is chosen, our machine’s brain can easily make predictions by reading values off this line.

Prediction using the dataset.

As we can see, it predicts the same output we got when we implemented linear regression in 3 steps in the video. This is the general idea behind the prediction system of our machine’s brain. Now, some questions may arise:

  • How do we determine the best fit line, given that there are many candidate lines we could draw?
Multiple candidate lines.
  • How does this algorithm work mathematically? If we understand the math, we can implement the algorithm in any programming language.

Let's understand the algorithm by answering these questions.

You might remember the equation of a straight line from high school math. Home prices can be represented by the following equation:

y = m * x + b

In our scenario, the equation becomes:

home price = m * (area) + b
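The scenario equation translates directly into a small Python function. The (m, b) pairs below are hypothetical values, chosen only to show that each setting of the two "sliders" describes a different line and therefore a different predicted price for the same area.

```python
def predict_price(area: float, m: float, b: float) -> float:
    """Home price predicted by the straight line: price = m * area + b."""
    return m * area + b


if __name__ == "__main__":
    area = 3300
    # Hypothetical (m, b) pairs: each pair describes a different line.
    for m, b in [(100.0, 200000.0), (150.0, 150000.0), (120.0, 250000.0)]:
        print(f"m={m}, b={b} -> predicted price {predict_price(area, m, b):.2f}")
```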

The generic form of the same equation is:

Linear regression algorithm.

In this example, x is the area and y is the home price that we want to predict, while m and b are the slope and intercept, respectively. In other words, m and b are tuning sliders.

m and b act like these sliders.

We can adjust them to move the line so that it passes as close as possible to the data points. The next question is how to decide which line is the best fit, because we can draw many lines like these:

Multiple candidate lines.

To answer this, we calculate the error of each line using a measure called the mean squared error (MSE). We start with a random line, that is, random values of m and b in the line equation. For each data point, we find the difference between the actual price and the price predicted by the line, square that difference, add up all the squared differences, and then divide by the number of data points to get the mean.

MEAN SQUARED ERROR
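In symbols, using the standard definition (the image above may use different notation): for n data points (x_i, y_i), where x_i is an area and y_i its actual price, the mean squared error of the line with slope m and intercept b is

```latex
\mathrm{MSE}(m, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
```

Squaring keeps positive and negative differences from cancelling each other out, and it penalizes large misses more heavily than small ones.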

Then we adjust the line, compute the mean squared error again, and repeat this process until we find the line with the lowest error. The whole procedure looks like this:

Demo: linear regression converging to the best fit line.
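Below is a minimal sketch of this trial-and-error procedure in plain Python. The five (area, price) rows are an assumption, since the dataset appears only as an image; they form a small home-price table that yields the same m and b that scikit-learn reports later in this post. The search strategy (nudge m or b by a fixed step, keep the change only if the MSE drops, halve the steps when nothing helps) is one simple illustrative choice, not necessarily the method behind the demo.

```python
# Assumed sample data (the original table is only shown as an image).
AREAS = [2600, 3000, 3200, 3600, 4000]
PRICES = [550000, 565000, 610000, 680000, 725000]


def mse(xs, ys, m, b):
    """Mean squared error of the line y = m * x + b over the data points."""
    n = len(xs)
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / n


def trial_and_error_fit(xs, ys, m=0.0, b=0.0, step_m=1.0, step_b=1000.0, rounds=5000):
    """Nudge m and b by small steps, keeping a change only when it lowers the MSE."""
    best = mse(xs, ys, m, b)
    for _ in range(rounds):
        improved = False
        # Try moving each "slider" up or down by its current step size.
        for dm, db in ((step_m, 0.0), (-step_m, 0.0), (0.0, step_b), (0.0, -step_b)):
            err = mse(xs, ys, m + dm, b + db)
            if err < best:
                m, b, best = m + dm, b + db, err
                improved = True
                break
        if not improved:
            # No neighbouring line is better: try finer adjustments.
            step_m /= 2
            step_b /= 2
    return m, b, best


if __name__ == "__main__":
    start_error = mse(AREAS, PRICES, 0.0, 0.0)
    m, b, error = trial_and_error_fit(AREAS, PRICES)
    print(f"start MSE: {start_error:.2f}")
    print(f"after search: m={m:.4f}, b={b:.2f}, MSE={error:.2f}")
```

Each round recomputes the error for up to four candidate lines, and with unscaled inputs like these the search can need many rounds and may stop short of the exact minimum. That slowness is the tedium described at the end of this post.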

Once the best fit line is reached, we simply note the values of m and b, which can then be used for prediction.

Predicting the price of a house with an area of 3300 using scikit-learn and manual calculation.

In this image, reg.coef_ and reg.intercept_ hold the values of m and b, respectively, as determined by scikit-learn:

m=135.78767123 and b=180616.4383561

Plugging these values into the line equation, we can predict the output:

3300 * 135.78767123 + 180616.4383561

Prediction: 628715.75
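Instead of searching step by step, for a single input feature the line that minimizes the MSE can also be computed directly with the ordinary least-squares formulas: m is the sum of (x − x̄)(y − ȳ) divided by the sum of (x − x̄)², and b = ȳ − m·x̄. scikit-learn's LinearRegression fits an ordinary least-squares line as well, so this minimal sketch should agree with its coef_ and intercept_. The five data rows are an assumption, because the original table appears only as an image.

```python
# Assumed sample data (the original table is only shown as an image).
AREAS = [2600, 3000, 3200, 3600, 4000]
PRICES = [550000, 565000, 610000, 680000, 725000]


def least_squares_fit(xs, ys):
    """Return (m, b) of the line that minimizes the mean squared error (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    m = numerator / denominator
    # Intercept: the least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


if __name__ == "__main__":
    m, b = least_squares_fit(AREAS, PRICES)
    print(f"m = {m:.8f}, b = {b:.7f}")
    print(f"prediction for area 3300: {m * 3300 + b:.2f}")
```

With these rows the fit gives m ≈ 135.78767123 and b ≈ 180616.43836, and the prediction for an area of 3300 rounds to 628715.75, the same numbers scikit-learn produced above.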

However, checking the error of every candidate line and adjusting it in small steps is a slow, tedious, iterative process. This is where gradient descent comes into the picture, which we will cover in upcoming content.

HOPE YOU ENJOYED THIS BLOG. HAVE A GOOD DAY!

--

Wakeupcoders - Digital Marketing & Web App Company
Analytics Vidhya

We make your business smarter and broader through the power of the internet. Researcher | Web developer | Internet of things | AI | www.wakeupcoders.com