Linear Regression

Understanding Linear Regression

Gajendra
7 min read · Jun 8, 2022

As described by each word, Linear (arranged in or extending along a straight or nearly straight line) and Regression (a measure of the relation between variables), Linear Regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, or feature, and the other is considered to be a dependent variable, or target.

In the field of machine learning, Linear Regression is considered a supervised learning task.

Linear Regression Line

A Linear Regression line has an equation of the form Y = mX + C, where X is the explanatory variable or feature variable and Y is the dependent variable or a target variable. The slope of the line is m, and C is the intercept (the value of y when x = 0).

Linear Regression can be used for both univariate and multivariate datasets. We will cover both these types in the sections below.

Univariate

For a univariate dataset the linear regression estimate is a straight line of the form shown below.

Y = mX + C

Where, the slope of the line is

m = ΔY / ΔX (the change in Y divided by the change in X)

Example

Let’s say we have a dataset as shown below and we would like to predict the value of Y for the next value of X using the equation mentioned above.

Given dataset

First, we can see that our Y intercept, Y when X = 0, is 2. Next, we will calculate the slope (m) of the line using the equation of the slope mentioned above.

Slope

Now we can calculate the value of Y for the given X using our linear regression equation. The table below shows the predicted values of Y for the given X.

Prediction using Linear Regression

The Linear Regression line for the above dataset will look like the plot below.

Linear Regression Line Plot
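The dataset, prediction table, and plot above appear as images in the original post. As a rough stand-in, here is a minimal Python sketch that fits a line Y = mX + C to a small made-up dataset (not the article's) and predicts Y for the next value of X.

```python
import numpy as np

# Made-up dataset (the original post's table is an image).
X = np.array([0, 1, 2, 3, 4], dtype=float)
Y = np.array([2, 4, 5, 7, 9], dtype=float)

# Fit a degree-1 polynomial, i.e. a straight line Y = mX + C.
m, C = np.polyfit(X, Y, deg=1)
print(f"slope m = {m:.2f}, intercept C = {C:.2f}")

# Predict Y for the next value of X using the fitted line.
next_x = 5
print(f"predicted Y at X = {next_x}: {m * next_x + C:.2f}")
```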

In a generic form, the hypothesis of Linear Regression can be represented as below.

hθ(x) = θ0 + θ1x

Where,

hθ(x): Hypothesis under test
θ0, θ1: Parameters
x: Feature or Predictor

Multivariate

In the case of a univariate dataset, Linear Regression results in a line. The form changes as we add more features or dimensions to our model. In the case of a multivariate dataset, Linear Regression takes the form of a plane (or, more generally, a hyperplane).

To build the equation for multivariate linear regression we extend our original equation to include the additional features or dimensions. The equation of Linear Regression for n features is given below.

hθ(x) = θ0x0 + θ1x1 + θ2x2 + … + θnxn

The value of the first feature is always 1, i.e. x0 = 1.

So, the final equation will look like

hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn

In the machine learning field we represent a multivariate dataset in the form of matrices. We can generalize the equation above to represent it in matrix form.

hθ(x) = θᵀx

Where,

θ: Parameter Vector
θᵀ: Transposed Parameter Vector
x: Feature Vector
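As a small illustration of the matrix form, the sketch below builds a feature matrix whose first column is the constant feature x0 = 1 and computes hθ(x) = θᵀx for every row at once; the θ values and data are made up.

```python
import numpy as np

# theta = [theta0, theta1, theta2]; values are made up for illustration.
theta = np.array([2.0, 0.5, -1.0])

# Each row of X is one example; the first column is x0 = 1 (the bias term).
X = np.array([
    [1.0, 3.0, 2.0],
    [1.0, 5.0, 1.0],
])

# h_theta(x) = theta^T x for a single example; X @ theta does it for all rows.
predictions = X @ theta
print(predictions)   # [1.5 3.5]
```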

So far we have discussed how Linear Regression works for univariate and multivariate datasets. Understanding this behavior is very important, but it still doesn't explain how to build a model that makes accurate predictions.

When we plot our dataset in a 2D or higher dimensional space, it is nearly impossible to draw a straight line or a plane that passes through all the data points. We can force our model to pass through as many data points as possible, but then there is a risk of overfitting.

So how do we find the best line or a best plane?

One way is to use a method called Least Squares Regression.

Least Squares Regression

The most common method for fitting a regression line is the method of least-squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, then its vertical deviation is 0). Because the deviations are first squared, then summed, there are no cancellations between positive and negative values.

Least Squares Regression is a method which minimizes the error in such a way that the sum of all squared errors is as small as possible.

Steps

Let’s look at the steps to perform a Least Squares Regression. We will use the same sample dataset we used above.

Step 1: For each (x, y) point calculate x² and xy

Step 2: Sum all x, y, x² and xy, which gives us Σx, Σy, Σx² and Σxy (Σ means “sum up”)

Step 3: Calculate Slope m:

m = (N Σxy - Σx Σy) / (N Σx² - (Σx)²)

(N is the number of data points)

Step 4: Calculate Intercept C:

C = (Σy - m Σx) / N

Step 5: Assemble the equation of a line: Y = mX + C

Using the steps above we can predict the value of Y for the given X.
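Putting the five steps together, here is a short Python sketch of the least-squares calculation; the (x, y) values are made up since the article's table is an image.

```python
# Made-up (x, y) points; the article's own table is an image.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

N = len(xs)
sum_x = sum(xs)                                  # Σx
sum_y = sum(ys)                                  # Σy
sum_x2 = sum(x * x for x in xs)                  # Σx²  (Steps 1 and 2)
sum_xy = sum(x * y for x, y in zip(xs, ys))      # Σxy

# Step 3: slope
m = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
# Step 4: intercept
C = (sum_y - m * sum_x) / N

# Step 5: the fitted line Y = mX + C, used here to predict at X = 6
print(f"Y = {m:.2f}X + {C:.2f}")
print("prediction at X = 6:", m * 6 + C)
```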

Cost Function

The cost function (J) of Linear Regression measures the error between the predicted y value, hθ(x), and the true y value, y; here it is the (half) Mean Squared Error between the two.

So the objective of the learning algorithm is to find the best parameters to fit the dataset, i.e. choose θ such that hθ(x) is close to y for the training examples (x, y). This can be mathematically represented as,

minimize J(θ0, θ1) = (1/2N) Σ (hθ(x) - y)², summing over the N training examples (x, y)

One of the objectives of Linear Regression is to minimize the Cost Function (J).
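As a quick sketch, the cost above can be computed directly from its definition for any candidate (θ0, θ1); the data and parameter values below are made up.

```python
# Half Mean Squared Error cost J(theta0, theta1) for simple linear regression.
def cost(theta0, theta1, xs, ys):
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = theta0 + theta1 * x      # hypothesis h_theta(x)
        total += (h - y) ** 2        # squared error for this example
    return total / (2 * n)

# Illustrative values only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(cost(1.0, 2.0, xs, ys))   # 0.0  -- the line y = 1 + 2x fits exactly
print(cost(0.0, 2.0, xs, ys))   # 0.5  -- a worse fit gives a higher cost
```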

Types of Errors

Absolute Error (AE): |y - ŷ|, where ŷ is the predicted value

Squared Error (SE): (y - ŷ)²

Root Mean Squared Error (RMSE): √((1/N) Σ (y - ŷ)²)
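A minimal sketch of these three error measures for a handful of made-up true and predicted values:

```python
import math

# Made-up true and predicted values.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]

abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]    # Absolute Error per point
sq_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]   # Squared Error per point
rmse = math.sqrt(sum(sq_errors) / len(sq_errors))            # Root Mean Squared Error

print(abs_errors, sq_errors, round(rmse, 3))
```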

Gradient Descent

Gradient Descent is one of the most important and commonly used optimization techniques in machine learning.

Gradient Descent is a process to update the θ0 and θ1 values in order to reduce the Cost Function (J) and achieve the best fit line. The idea is to start with random θ0 and θ1 values and then iteratively update them until we reach the minimum cost.

At each step, every parameter is updated by subtracting the partial derivative of the Cost Function (J) with respect to that parameter, multiplied by the Learning Rate (alpha):

θj := θj - α ∂J/∂θj
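A minimal batch gradient descent sketch for the two-parameter case, assuming the half-MSE cost above; the learning rate, iteration count, and data are all illustrative choices.

```python
# Batch gradient descent for y = theta0 + theta1 * x with the half-MSE cost.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]     # generated by y = 1 + 2x, so we expect theta close to (1, 2)

theta0, theta1 = 0.0, 0.0     # start from arbitrary values
alpha = 0.05                  # learning rate
n = len(xs)

for _ in range(5000):
    # Partial derivatives of J with respect to theta0 and theta1.
    grad0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / n
    grad1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / n
    # Simultaneous update: step against the gradient, scaled by alpha.
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(round(theta0, 3), round(theta1, 3))   # close to (1, 2)
```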

Regularization

Regularization is a form of regression that constrains, regularizes, or shrinks the coefficient estimates towards zero. In other words, this technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting.

L1 Regularization (Lasso Regression)

The acronym “LASSO” stands for Least Absolute Shrinkage and Selection Operator.

The regularized Cost Function (J) adds an L1 penalty, the sum of the absolute values of the coefficients, to the original cost:

J(θ) = (1/2N) Σ (hθ(x) - y)² + λ Σ |θj|

L2 Regularization (Ridge Regression)

L2 regularization adds an L2 penalty to the cost, which equals the sum of the squares of the coefficients (λ Σ θj²). All coefficients are shrunk by the same factor (so none are eliminated). Unlike L1 regularization, L2 will not result in sparse models.
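In practice, both penalties are available, for example, in scikit-learn's Lasso and Ridge estimators. The sketch below (with made-up data and alpha values, where alpha plays the role of λ above) shows L1 driving some coefficients exactly to zero while L2 only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually matter; the other three are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can zero out coefficients
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks, never exactly zero

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
```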

Assumptions

Here are some of the assumptions made when applying Linear Regression; a simple visual check for the first and third is sketched after the list.

  1. Linearity: The relation between the independent variables and the dependent variable must be linear.
  2. Multicollinearity: Linear regression assumes that there is no multicollinearity between the independent variables.
  3. Homoscedasticity: The error associated with each data point should be equally spread (meaning “constant variance”) along the best fit line.
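One common, informal way to check the linearity and homoscedasticity assumptions is a residual-versus-fitted plot: the residuals should scatter randomly around zero with roughly constant spread. A minimal sketch with made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data that roughly satisfies the assumptions.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2 + 1.5 * x + rng.normal(scale=1.0, size=200)

# Fit a line and compute residuals (observed minus fitted).
m, C = np.polyfit(x, y, deg=1)
fitted = m * x + C
residuals = y - fitted

# If linearity and homoscedasticity hold, this plot is a structureless
# band of points around zero with roughly constant spread.
plt.scatter(fitted, residuals, s=10)
plt.axhline(0, color="red")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. fitted values")
plt.show()
```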

I hope this article provides you with a good understanding of Linear Regression.

If you have any questions or if you find anything misrepresented please let me know.

Thanks!

