Regression with Least Squares Intuition
In this post, we look at the intuition behind regression.
Let's consider the most straightforward scenario, one we all saw in high school: finding the equation of the line that passes through two points. The following example shows how to do it with linear algebra.
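The two-point case can be sketched in NumPy as an exactly determined linear system (the points here are made up for illustration):

```python
import numpy as np

# Find the line y = m*x + b through the two points (1, 2) and (3, 8).
# Each point gives one equation: m*x_i + b = y_i.
A = np.array([[1.0, 1.0],
              [3.0, 1.0]])   # columns: x_i, 1
y = np.array([2.0, 8.0])

m, b = np.linalg.solve(A, y)
print(m, b)  # slope 3.0, intercept -1.0
```

With exactly two points and two unknowns, the system has a unique solution, so `np.linalg.solve` suffices and no least-squares machinery is needed yet.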
As the number of data points grows, though, they generally won't all lie on a single line or curve (by curve, we mean a polynomial describing the data). To keep things simple, suppose we want to find the parameters of a line for a set of more than two data points. In the following example, the blue dots are our data points, and the red line is the line we fitted to them. In regression, we want to find the parameters of that line (slope and offset).
We want to find the parameters of a linear model that fits our data. The best line in this scenario is the one that minimizes the sum of the squared vertical distances to all data points. The least-squares method is a well-known way to find those parameters, which we then use for our subsequent predictions. The next video is eye-opening on how it works from the numerical-calculation perspective.
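A minimal sketch of the overdetermined case, with made-up noisy data points (the x and y values below are assumptions for illustration):

```python
import numpy as np

# Five noisy points that roughly follow a line; no single line passes
# through all of them, so we ask for the least-squares best fit.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with one column for the slope and one for the offset.
A = np.column_stack([x, np.ones_like(x)])

(m, b), residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(m, b)  # slope ~1.99, offset ~1.04
```

`lstsq` returns the parameters that minimize the sum of squared residuals, which is exactly the criterion described above.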
From a linear algebra perspective, we have the following calculation.
The following figure shows it in more detail.
The reason for this calculation is that, in linear algebra, regression rests on the concept of projection: the fitted values are the projection of the observations onto the column space of the design matrix.
In this regression, polynomial functions are used because the data points are distributed in a way that a straight line cannot capture. Thus, instead of two parameters, we have to find more. The model looks like this:
The equation we solve to find the parameters:
The same calculation is done to find the parameters. Python's NumPy package provides a function for this:
x, res, rank, s = np.linalg.lstsq(design_matrix, y, rcond=None)
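A self-contained sketch of the polynomial case, assuming a quadratic model and made-up noisy data (the points below are not from the post's figures):

```python
import numpy as np

# Noisy samples of the parabola y = x^2 - 2x + 1 (noise values made up).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x**2 - 2*x + 1 + np.array([0.1, -0.1, 0.05, 0.0, -0.05, 0.1])

# Polynomial design matrix with columns x^2, x, 1 (a Vandermonde matrix).
design_matrix = np.vander(x, 3)

coeffs, res, rank, s = np.linalg.lstsq(design_matrix, y, rcond=None)
print(coeffs)  # approximately [1, -2, 1]
```

Note that the only thing that changed from the straight-line case is the design matrix: more columns mean more parameters, but the least-squares machinery is identical.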