Linear Regression

Alperen ÜGÜŞ
Published in MyTake
4 min read · Sep 7, 2019

Ninety percent of the data in the world today has been created in the last two years alone.

Growth of Data in the World Today

Most people have heard the statistic above, perhaps more than once. People have started to produce enormous amounts of data in the last few years. From my perspective, the 21st century will one day be called “The Data Explosion Century.” So, what do people do with all this data? How do they make sense of it? What is the benefit of having this much data?

Mathematicians and computer scientists have found very interesting ways of benefiting from data, and they are still working on new methodologies. In this article, I will focus on Linear Regression, one of the most powerful and useful approaches for making use of data. The fundamental purpose of Linear Regression is to fit a line or a hyperplane (depending on the dimension of the data) to scattered data points.

Fitting a Line to the Data Points

As known from calculus, a line has the equation y = b + xw. In general, we need to find the best parameters (b and w in this case) that fit the data points with the minimal error I mentioned in my previous ‘How to make an introduction to Machine Learning?’ article. The smaller the error, the better the line! Here, the concept of an error function shows up again. We need to define an error function for this problem. The most common approach is to sum the squares of the differences between the actual value of each data point and the corresponding value on the line.
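As a concrete sketch of that sum-of-squares idea, here it is in a few lines of Python with NumPy. The data values are made up purely for illustration:

```python
import numpy as np

def squared_error(b, w, x, y):
    """Sum of squared differences between the line b + x*w
    and the observed values y."""
    predictions = b + x * w            # value on the line for each x
    return np.sum((predictions - y) ** 2)

# Made-up data: points lying roughly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

print(squared_error(1.0, 2.0, x, y))   # small error: this line fits well
print(squared_error(0.0, 0.0, x, y))   # large error: a flat line at zero
```

The better the parameters, the smaller this number gets, which is exactly what the fitting procedure below exploits.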

In this formula, b + xw is the value on the line that corresponds to a specific data point, and y is the actual value of that data point. In the beginning, b and w take initial values according to preference: they could be zero or random. In order to minimize this error function g(b, w), we can use the gradient descent method, which relies on the first derivative of the function. The formulas below are generalized according to the dimension of the data, so they use matrix notation:

In compact matrix notation, a 1 is stacked on top of each input x so that the bias b folds into the parameter vector w, and the stacked inputs form a matrix X. Rearranging the error function then gives

g(w) = ‖Xw − y‖²

The gradient of the error function is

∇g(w) = 2Xᵀ(Xw − y)
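A minimal gradient descent loop for the two-parameter case might look like the following sketch. The data, learning rate, and step count are illustrative choices, not values from this article:

```python
import numpy as np

def fit_line_gd(x, y, steps=1000, lr=0.01):
    """Minimize g(b, w) = sum((b + x*w - y)^2) by gradient descent."""
    b, w = 0.0, 0.0                       # initial values: zero, as noted above
    for _ in range(steps):
        residual = b + x * w - y          # signed error of each point
        grad_b = 2 * np.sum(residual)     # dg/db
        grad_w = 2 * np.sum(residual * x) # dg/dw
        b -= lr * grad_b                  # step against the gradient
        w -= lr * grad_w
    return b, w

# Made-up data: points lying roughly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
b, w = fit_line_gd(x, y)
print(b, w)   # close to the true intercept 1 and slope 2
```

Each pass computes the two partial derivatives and nudges the parameters in the direction that reduces the error.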

When we set the gradient to zero, we find out that:

XᵀXw = Xᵀy

Solving the above equation, w = (XᵀX)⁻¹Xᵀy, gives the parameters that minimize the error, i.e. we find the parameters of our best-fitting line.

I know the formulas may seem hard to understand, but if you work through them knowing which variable stands for what, it becomes very easy to grasp the idea.

After finding the right parameters, the only thing that remains is to let our line make predictions for unseen data points.

For example, suppose we are dealing with the weights of people with respect to their ages. First we collect the data and define an error function, then we find the parameters of our line. Now we can test our model on a person whose data was not collected before: just feed their age into the function, let it return a prediction, and see how accurate that prediction is!
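The age/weight example could be sketched like this. The ages and weights below are entirely made-up numbers for illustration, fitted with the closed-form least-squares solution:

```python
import numpy as np

# Hypothetical (made-up) data: ages in years, weights in kg
ages = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
weights = np.array([12.5, 16.0, 21.0, 26.0, 32.0, 40.0])

# Fit the line by solving the normal equations
X = np.column_stack([np.ones_like(ages), ages])
b, w = np.linalg.solve(X.T @ X, X.T @ weights)

def predict_weight(age):
    """Predicted weight (kg) for an age not in the data set."""
    return b + w * age

print(predict_weight(7.0))   # prediction for a 7-year-old
```

A person of age 7 was never observed, yet the fitted line still returns a sensible estimate, which is exactly the point of the model.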

Linear Regression is a very useful and simple tool. From business to statistics, from medicine to education, it has a wide variety of application areas.

By using this link, you can play with a Live Linear Regression simulator.

For further information, you can see:

1. Machine Learning Refined: Foundations, Algorithms, and Applications by Jeremy Watt, Reza Borhani, and Aggelos K. Katsaggelos
2. https://developers.google.com/machine-learning/crash-course/descending-into-ml/linear-regression

Thank you for your time!
