STATS101: Linear Regression

Keerthana
3 min read · Aug 31, 2018


Linear regression is, of course, an extremely simple and limited learning algorithm, but it provides a clear example of how a learning algorithm works. As the name suggests, linear regression solves a regression problem.

The goal of a regression problem is to build a system that can take a vector x ∈ Rⁿ as input and predict the value of a scalar y ∈ R as its output. There are three main components in a regression equation:

1> Parameters

2> Intercept

3> Error

1> Parameters : In the case of linear regression, the output is a linear function of the input, i.e., any given change in an independent variable (input value x) produces a proportional change in the dependent variable (output value y). The model's prediction takes the form

ŷ = wᵀx

where:

ŷ — the value that our model predicts

w — a vector of parameters, w ∈ Rⁿ

NOTE : Parameters are values that control the behavior of the system. We can think of 𝑤 as a set of weights that determine how each feature affects the predictions.

Positive weight — increasing the value of that feature increases the prediction ŷ.

Negative weight — increasing the value of that feature decreases the prediction ŷ.

Large magnitude — the feature has a large effect on the prediction.

Zero weight — the feature has no effect on the prediction.
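To make this concrete, here is a minimal sketch of how each kind of weight shapes the prediction ŷ = wᵀx. The weight and input values are made up purely for illustration:

```python
import numpy as np

# Illustrative weights: one positive, one negative, one zero.
w = np.array([2.0, -1.5, 0.0])
x = np.array([1.0, 1.0, 1.0])    # input feature vector

y_hat = w @ x                    # prediction: y_hat = w^T x
print(y_hat)                     # 0.5

# Increasing a feature with a positive weight raises the prediction...
print(w @ (x + np.array([1.0, 0.0, 0.0])))    # 2.5

# ...increasing a feature with a negative weight lowers it...
print(w @ (x + np.array([0.0, 1.0, 0.0])))    # -1.0

# ...and a zero-weight feature has no effect, however much it changes.
print(w @ (x + np.array([0.0, 0.0, 100.0])))  # 0.5
```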

2> Intercept: Adding to the above equation is an additional parameter, an intercept term b:

ŷ = wᵀx + b

This extension means that the plot of the model’s predictions still looks like a line, but it need not pass through the origin. Instead of adding the bias parameter b, one can continue to use the model with only weights but augment x with an extra entry that is always set to 1. The weight corresponding to the extra 1 entry then plays the role of the bias parameter. The intercept b is often called the bias parameter, since the output of the transformation is biased toward b in the absence of any input.
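A quick sketch of this augmentation trick (the weight, bias, and input values are arbitrary, chosen only for the demo):

```python
import numpy as np

w = np.array([2.0, -1.5])   # weights (illustrative values)
b = 0.7                     # bias / intercept term
x = np.array([1.0, 3.0])

# Model with an explicit bias parameter.
y_hat = w @ x + b

# Equivalent model: append a constant 1 to x and fold b into the weights.
x_aug = np.append(x, 1.0)   # x becomes [x1, x2, 1]
w_aug = np.append(w, b)     # the last weight plays the role of b
y_hat_aug = w_aug @ x_aug

print(y_hat, y_hat_aug)     # both print -1.8
```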

3> Error: The final term that completes the above linear regression equation is the error term.

The performance of a linear regression model is measured by computing the error term, the mean squared error on the test set. The mean squared error is given by

MSE(test) = (1/m) Σᵢ (ŷ(test) − y(test))ᵢ²

where m is the number of examples in the test set.

Since MSE(test) = (1/m) ‖ŷ(test) − y(test)‖₂², the error increases whenever the Euclidean distance between the predictions and the targets increases.
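A minimal sketch of computing MSE(test), with toy targets and predictions invented for illustration; it also confirms the equivalence with the squared Euclidean distance divided by m:

```python
import numpy as np

# Toy test-set targets and predictions (made up for illustration).
y_test = np.array([1.0, 2.0, 3.0])
y_hat  = np.array([1.1, 1.9, 3.3])

m = len(y_test)
mse = np.mean((y_hat - y_test) ** 2)             # (1/m) * sum of squared errors

# Equivalently, via the squared Euclidean (L2) distance:
mse_l2 = np.linalg.norm(y_hat - y_test) ** 2 / m

print(mse, mse_l2)   # both ~0.0367
```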

To make an effective machine learning algorithm, we need to design an algorithm that improves the weights w in a way that reduces 𝑀𝑆𝐸(𝑡𝑒𝑠𝑡) when the algorithm is allowed to gain experience by observing a training set (𝑋(𝑡𝑟𝑎𝑖𝑛), 𝑦(𝑡𝑟𝑎𝑖𝑛)). One intuitive way of doing this is simply to minimize the mean squared error on the training set, 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛). Setting the gradient of 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛) with respect to w to zero yields a closed-form solution, the normal equations: w = (XᵀX)⁻¹Xᵀy.
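Below is a hedged sketch of minimizing 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛) in closed form on synthetic data. The data-generating weights (3 and −2) and bias (1) are invented for the demo; the least-squares solve is a numerically stable way of solving the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set (illustrative): y = 3*x1 - 2*x2 + 1 + noise.
X_train = rng.normal(size=(100, 2))
y_train = 3 * X_train[:, 0] - 2 * X_train[:, 1] + 1 \
          + 0.1 * rng.normal(size=100)

# Augment X with a column of ones so the last weight acts as the bias b.
X_aug = np.hstack([X_train, np.ones((100, 1))])

# Minimizing MSE(train): solve the normal equations (X^T X) w = X^T y,
# done here via least squares rather than an explicit matrix inverse.
w, *_ = np.linalg.lstsq(X_aug, y_train, rcond=None)
print(w)   # approximately [3.0, -2.0, 1.0]
```

The recovered weights land close to the ones used to generate the data, which is exactly the behavior we want from minimizing 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛): a model that generalizes to unseen test points drawn from the same distribution.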
