Linear Regression With the Ordinary Least Squares Method and Gradient Descent, From Scratch in Python
Linear regression models the linear relationship between real-valued predictors and a target variable. We look for the function that best represents the data.
Linear regression is often the first model we meet in regression analysis, as early as high school. We look for the regression line that best fits samples drawn from a population, most commonly using ordinary least squares (OLS) regression to estimate the model parameters. However, the estimating equations grow complex as more independent variables are included in the model. Gradient descent (GD) is another option, widely applied to training a range of models, and it is simpler to implement for a linear regression model.
Here, for simplicity, we build a simple linear regression model from scratch.
The ordinary least squares method is a non-iterative method that fits a model by minimizing the sum of squared errors. A list of assumptions must be satisfied when applying OLS:
- The function must be linear in its parameters and the error term.
- Error terms are independent of each other and of all independent variables.
- Error terms are normally distributed with a mean of zero and constant variance.
- No independent variables are perfectly correlated.
The coefficient estimates for simple linear regression are derived as follows:
1. Compute the sum of squared errors, SSE.
2. Differentiate with respect to each parameter.
3. Set the derivatives equal to zero, then solve the simultaneous equations to obtain the parameter estimates.
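For simple linear regression, where the fitted line is $\hat{y} = b_0 + b_1 x$, carrying out these three steps yields the standard closed-form estimates:

```latex
% Step 1: sum of squared errors for the line y_hat = b0 + b1 x
SSE = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2

% Steps 2-3: setting \partial SSE / \partial b_0 = 0 and
% \partial SSE / \partial b_1 = 0 and solving gives
\hat{b}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                 {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}
```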
The process above grows lengthy and complicated as more independent variables are included, since more estimating equations must be derived.
OLS from Scratch
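A minimal sketch of the closed-form OLS estimates in NumPy (the helper name `ols_fit` and the toy data are my own, for illustration):

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS estimates for simple linear regression y = b0 + b1*x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    b1 = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    # Intercept: the regression line passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Example: data generated from y = 2 + 3x with no noise, so OLS recovers it exactly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x
b0, b1 = ols_fit(x, y)
print(b0, b1)  # → 2.0 3.0
```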
Objective Function / Loss function
The loss function is the cost associated with prediction error, where error is the difference between our predictions and the true values. We often square the error to make the derivatives easier to compute. The mean squared error loss will be used in the gradient descent method below.
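The mean squared error can be written in a few lines (the function name `mse` is my own):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return ((y_true - y_pred) ** 2).mean()

# Errors are [0, 0, 2], so the MSE is (0 + 0 + 4) / 3 = 4/3
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```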
Gradient Descent
Gradient descent is a popular optimization method. We work iteratively to find the coefficients of the regression line that give the minimum loss. The logic behind gradient descent is like a ball rolling down a curve: to reach the bottom, it should move in the direction opposite to the slope. The scenario is illustrated below.
What step size should we take to ensure we do not miss the bottom? This is the challenge in the GD method. When the step size is too large, we overshoot the destination; when it is too small, we take too long to reach it.
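This trade-off is easy to see on a one-parameter toy problem. The sketch below (names and learning rates are my own choices) minimizes f(w) = w² with the update w ← w − lr·f′(w):

```python
def descend(lr, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

print(descend(lr=0.1))  # small step: w shrinks steadily toward the minimum at 0
print(descend(lr=1.1))  # too large: each step overshoots and |w| grows without bound
```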
1. Compute the gradient of the loss function.
2. Select an appropriate learning rate, and randomly initialize the parameters of the linear regression function.
3. Update the parameters.
4. Repeat the process until the loss is within our acceptance level or the parameters converge.
Model from Scratch in Python
Looking at the gradient descent visualization above, the rate at which the fitted line rotates and shifts appears to slow as it approaches the final result. This mirrors the rolling-ball illustration: as the ball approaches the bottom, the gradient decreases, and hence the update size (delta in Fig 13) decreases. As discussed, OLS is a single run in which the data are substituted into the derived equations to obtain the parameter estimates directly, while GD runs iteratively until it arrives at a result satisfying the required condition. Clearly, OLS becomes tougher to apply as the feature dimension increases.