Linear Regression
As its two words suggest, Linear (arranged in or extending along a straight or nearly straight line) Regression (a measure of the relation between variables) attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered the explanatory variable, or feature, and the other is considered the dependent variable, or target.
In the field of machine learning, Linear Regression is considered a supervised learning task.
Linear Regression Line
A Linear Regression line has an equation of the form Y = mX + C, where X is the explanatory (feature) variable and Y is the dependent (target) variable. The slope of the line is m, and C is the intercept (the value of Y when X = 0).
Linear Regression can be used for both univariate and multivariate datasets. We will cover both these types in the sections below.
Univariate
For a univariate dataset, the linear regression estimation function is a straight line of the form

Y = mX + C

where m is the slope of the line and C is the intercept.
Example
Let’s say we have a dataset like the one shown below, and we would like to predict the value of Y for the next value of X using the equation above.

First, we can see that our Y intercept (the value of Y when X = 0) is 2. Next, we calculate the slope (m) of the line.

Now we can calculate the value of Y for any given X using our linear regression equation. The table below shows the predicted value of Y for each given X.
A Linear Regression line for the above dataset will look like below.
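That prediction step can be sketched in plain Python. This is a minimal sketch: the intercept C = 2 comes from the example above, while the slope m = 1.5 is an assumed illustrative value, since the computed slope is not reproduced here.

```python
# Predicting Y from X with the line Y = mX + C.
# C = 2 is taken from the example above; m = 1.5 is an assumed value.

def predict(x, m=1.5, c=2.0):
    return m * x + c

for x in range(4):
    print(x, predict(x))  # Y for X = 0, 1, 2, 3
```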
In a generic form, the hypothesis for univariate linear regression can be written as

hθ(x) = θ₀ + θ₁x

where θ₀ is the intercept and θ₁ is the slope (the model parameters).
Multivariate
In the case of a univariate dataset, Linear Regression results in a line. The form changes as we add more features or dimensions to our model. In the case of a multivariate dataset, Linear Regression takes the form of a plane (or hyperplane).
To build the equation for multivariate linear regression, we extend our original equation to include additional features or dimensions. For n features, the hypothesis is

hθ(x) = θ₀x₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

The value of the first feature is always 1, i.e. x₀ = 1.

So the final equation looks like

hθ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
In machine learning we usually represent a multivariate dataset in the form of matrices. Generalizing the equation above to matrix form,

hθ(x) = θᵀx

where θ = [θ₀, θ₁, …, θₙ] is the parameter vector and x = [x₀, x₁, …, xₙ] is the feature vector (with x₀ = 1).
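As a minimal sketch (the θ and x values below are illustrative, not from the article), the matrix-form hypothesis is just a dot product between the parameter vector and the feature vector:

```python
# The multivariate hypothesis h_theta(x) as a dot product, using plain
# Python lists. x[0] is the bias feature and is always 1.

def hypothesis(theta, x):
    """Compute h_theta(x) = theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))

theta = [2.0, 0.5, 1.5]      # [intercept, weight for x1, weight for x2]
x = [1.0, 4.0, 3.0]          # x0 = 1 prepended to the raw features [4.0, 3.0]
print(hypothesis(theta, x))  # 2 + 0.5*4 + 1.5*3 = 8.5
```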
So far we have discussed how Linear Regression works for univariate and multivariate datasets. Understanding this behavior is very important, but it still doesn’t explain how to build a model that makes accurate predictions.

When we plot our dataset in a 2D or higher-dimensional space, it is nearly impossible to draw a straight line or plane that passes through all the data points. We could force our model to pass through as many data points as possible, but that risks overfitting.
So how do we find the best line or the best plane?

One way is to use a method called Least Squares Regression.
Least Squares Regression
The most common method for fitting a regression line is the method of least-squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, then its vertical deviation is 0). Because the deviations are first squared, then summed, there are no cancellations between positive and negative values.
In other words, Least Squares Regression chooses the line for which the sum of all squared errors is as small as possible.
Steps
Let’s look at the steps to perform a Least Squares Regression. We will use the same sample dataset we used above.
Step 1: For each (x, y) point calculate x² and xy

Step 2: Sum all x, y, x² and xy, which gives us Σx, Σy, Σx² and Σxy (Σ means “sum up”)

Step 3: Calculate the slope m:

m = (N Σxy − Σx Σy) / (N Σx² − (Σx)²)

where N is the number of data points.

Step 4: Calculate the intercept C:

C = (Σy − m Σx) / N

Step 5: Assemble the equation of the line: Y = mX + C
Using the steps above we can predict the value of Y for the given X.
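The steps above can be sketched in plain Python. The sample points below are illustrative (they lie exactly on y = 2x), not the article’s table:

```python
# Least Squares Regression: compute slope m and intercept C from (x, y) points.

def least_squares(points):
    """Return slope m and intercept C of the best-fit line through points."""
    n = len(points)
    sum_x = sum(x for x, _ in points)                 # Step 2: sums
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)            # Steps 1-2: x^2 terms
    sum_xy = sum(x * y for x, y in points)            # Steps 1-2: xy terms
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # Step 3
    c = (sum_y - m * sum_x) / n                        # Step 4
    return m, c

pts = [(1, 2), (2, 4), (3, 6)]   # lies exactly on y = 2x
m, c = least_squares(pts)
print(m, c)  # 2.0 0.0
```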
Cost Function
The cost function (J) of Linear Regression is the Root Mean Squared Error (RMSE) between the predicted y value (ŷ) and the true y value (y).
So the objective of the learning algorithm is to find the best parameters to fit the dataset, i.e. choose θ such that hθ(x) is close to y for the training examples (x, y). This can be represented mathematically as

minimize J(θ) over θ, where J(θ) = √( (1/N) Σᵢ (hθ(xᵢ) − yᵢ)² )

and N is the number of training examples.
One of the objectives of Linear Regression is to minimize the cost function (J).
Types of Errors
Absolute Error (AE)

AE = |ŷ − y|

Squared Error (SE)

SE = (ŷ − y)²

Root Mean Squared Error (RMSE)

RMSE = √( (1/N) Σᵢ (ŷᵢ − yᵢ)² )
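These three error measures can be sketched in plain Python (the example values are illustrative):

```python
import math

def absolute_error(y_pred, y_true):
    """Absolute Error: |y_pred - y_true|."""
    return abs(y_pred - y_true)

def squared_error(y_pred, y_true):
    """Squared Error: (y_pred - y_true)^2."""
    return (y_pred - y_true) ** 2

def rmse(preds, trues):
    """Root Mean Squared Error over paired predictions and true values."""
    n = len(preds)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, trues)) / n)

print(absolute_error(3.0, 5.0))      # 2.0
print(squared_error(3.0, 5.0))       # 4.0
print(rmse([3.0, 5.0], [5.0, 3.0]))  # sqrt((4 + 4) / 2) = 2.0
```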
Gradient Descent
Gradient Descent is one of the most important and most commonly used optimization techniques in machine learning.

Gradient Descent is a process that updates the parameter values θ₀ and θ₁ in order to reduce the cost function (J) and reach the best-fit line. The idea is to start with random θ₀ and θ₁ values and then iteratively update them until the cost reaches its minimum.

Each update subtracts the partial derivative of the cost function (J) with respect to the parameter, multiplied by the learning rate (α):

θⱼ := θⱼ − α ∂J/∂θⱼ
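A minimal sketch of this update rule, assuming the common half-mean-squared-error cost; the data, learning rate, and iteration count below are illustrative choices:

```python
# Batch gradient descent for the line h(x) = theta0 + theta1 * x,
# minimizing J = (1/2N) * sum((h(x) - y)^2).

def gradient_descent(xs, ys, alpha=0.05, iterations=2000):
    theta0, theta1 = 0.0, 0.0          # start from arbitrary (here zero) values
    n = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n                             # dJ/d(theta0)
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n  # dJ/d(theta1)
        theta0 -= alpha * grad0   # update: theta := theta - alpha * gradient
        theta1 -= alpha * grad1
    return theta0, theta1

t0, t1 = gradient_descent([0, 1, 2, 3], [2, 4, 6, 8])  # data on y = 2x + 2
print(round(t0, 2), round(t1, 2))  # approximately 2.0 and 2.0
```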
Regularization
Regularization is a form of regression that constrains (regularizes, or shrinks) the coefficient estimates towards zero. In other words, this technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting.
L1 Regularization (Lasso Regression)
The acronym “LASSO” stands for Least Absolute Shrinkage and Selection Operator. L1 regularization adds an L1 penalty equal to the sum of the absolute values of the coefficients. The regularized cost function (J) is

J(θ) = J₀(θ) + λ Σⱼ |θⱼ|

where J₀(θ) is the unregularized cost and λ controls the strength of the penalty.
L2 Regularization (Ridge Regression)
L2 regularization adds an L2 penalty, which equals the square of the magnitude of the coefficients, i.e. the penalty term is λ Σⱼ θⱼ². All coefficients are shrunk by the same factor (so none are eliminated). Unlike L1 regularization, L2 does not produce sparse models.
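Both penalties can be sketched on top of a mean-squared-error cost in plain Python. The λ and θ values below are illustrative, and the intercept θ₀ is left unpenalized, a common convention:

```python
# L1 (lasso) and L2 (ridge) penalties added to a mean-squared-error cost
# for the line h(x) = theta[0] + theta[1] * x.

def mse(theta, xs, ys):
    n = len(xs)
    return sum((theta[0] + theta[1] * x - y) ** 2 for x, y in zip(xs, ys)) / n

def l1_cost(theta, xs, ys, lam=0.1):
    """MSE plus lambda * sum of |theta_j|, skipping the intercept theta[0]."""
    return mse(theta, xs, ys) + lam * sum(abs(t) for t in theta[1:])

def l2_cost(theta, xs, ys, lam=0.1):
    """MSE plus lambda * sum of theta_j^2, skipping the intercept theta[0]."""
    return mse(theta, xs, ys) + lam * sum(t ** 2 for t in theta[1:])

theta = [2.0, 3.0]
xs, ys = [0.0, 1.0], [2.0, 5.0]   # fits exactly, so the MSE term is 0
print(round(l1_cost(theta, xs, ys), 6))  # 0.1 * |3| = 0.3
print(round(l2_cost(theta, xs, ys), 6))  # 0.1 * 3^2 = 0.9
```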
Assumptions
Here are some of the assumptions made when applying Linear Regression:
- Linearity: The relation between the independent variables and the dependent variable must be linear.
- Multicollinearity: Linear regression assumes that there is no multicollinearity between the independent variables.
- Homoscedasticity: The error associated with each data point should be equally spread (meaning “constant variance”) along the best fit line.
I hope this article provides you with a good understanding of Linear Regression.
If you have any questions or if you find anything misrepresented please let me know.
Thanks!