Linear Regression in Python
The shortest and clearest explanation from scratch line by line
Linear regression models the relationship between variables by fitting a straight line to the data.
This is the simplest linear relationship between two variables, “vector x” and “vector y” (the independent and the dependent variable, respectively).
“Vector x” and “vector y” are vectors, each consisting of n components (one per observation).
For a complete understanding, it helps to know the NumPy library, a library for working with multidimensional arrays and matrices.
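In symbols, with n observations (n = 8 in the arrays below):

$$\mathbf{x} = (x_1, x_2, \dots, x_n), \qquad \mathbf{y} = (y_1, y_2, \dots, y_n)$$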
import numpy as np
X = np.array([4.5, 5, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0])
The number of components in “vector x” must equal the number of components in “vector y”.
y = np.array([34, 44, 45, 53, 53, 60, 65, 70])
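“x” with index “i” and “y” with index “i” are the i-th components of “vector x” and “vector y”, where “i” = 1..n.
Time to find “β with index 0” and “β with index 1” (the intercept and the coefficient). First, prepend a column of ones to X, so that the intercept is multiplied by 1 in every equation and can be estimated like any other coefficient: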
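Written out, the simple linear regression model relates each pair of components as follows, where ε with index i is the error term, the part of y with index i that the line does not explain:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n$$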
X = np.vstack((np.ones(len(X)), X)).T
# [[1.  4.5]
#  [1.  5. ]
#  [1.  5.5]
#  [1.  6. ]
#  [1.  6.5]
#  [1.  7. ]
#  [1.  7.5]
#  [1.  8. ]]
With this design matrix, all n equations can be written at once in matrix form.
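Stacking the n equations y with index i = β with index 0 + β with index 1 · x with index i + ε with index i gives the matrix form below; the first column of ones is what multiplies the intercept:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \quad \text{i.e.} \quad \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$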
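Finding the values of “vector β” that best fit these equations is called model fitting. With ordinary least squares, β has a closed-form solution known as the normal equation.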
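The least-squares solution is given by the normal equation below. In the step-by-step code, Xt, Xt_mul_X_inv and Xt_mul_y correspond to the transpose of X, the inverse of X-transpose-times-X, and X-transpose-times-y:

$$\boldsymbol{\beta} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} \mathbf{y}$$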
Let's compute it step by step:
Xt = X.T  # transpose of the design matrix
Xt_mul_X = Xt.dot(X)  # X^T X
# [[  8.  50.]
#  [ 50. 323.]]
Xt_mul_X_inv = np.linalg.inv(Xt_mul_X)  # (X^T X)^-1
# [[ 3.8452381 -0.5952381]
#  [-0.5952381  0.0952381]]
Xt_mul_y = Xt.dot(y)  # X^T y
# [ 424.   2750.5]
res = Xt_mul_X_inv.dot(Xt_mul_y)  # beta = (X^T X)^-1 X^T y
# [-6.82142857  9.57142857]
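The same numbers can be checked by hand. For a 2×2 matrix, the inverse is the adjugate divided by the determinant, which here is 8 · 323 − 50² = 84:

$$(X^{\mathsf{T}}X)^{-1} = \frac{1}{84}\begin{pmatrix} 323 & -50 \\ -50 & 8 \end{pmatrix}, \qquad \boldsymbol{\beta} = \frac{1}{84}\begin{pmatrix} 323 \cdot 424 - 50 \cdot 2750.5 \\ -50 \cdot 424 + 8 \cdot 2750.5 \end{pmatrix} = \begin{pmatrix} -573/84 \\ 804/84 \end{pmatrix} \approx \begin{pmatrix} -6.8214 \\ 9.5714 \end{pmatrix}$$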
Let's put it all together.
Here “Intercept” is “β with index 0” (about -6.82) and “Coefficient” is “β with index 1” (about 9.57).
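A minimal sketch of how the steps above could be combined into one function, assuming the same x and y data as before. The function name fit_linear_regression is hypothetical, not taken from the original code; lowercase x is used for the raw values so it is not confused with the design matrix X.

```python
import numpy as np

def fit_linear_regression(x, y):
    """Fit y = b0 + b1 * x with the normal equation; return (intercept, coefficient)."""
    # Design matrix: a column of ones (for the intercept) next to the x values
    X = np.vstack((np.ones(len(x)), x)).T
    Xt = X.T
    # beta = (X^T X)^-1 X^T y
    beta = np.linalg.inv(Xt.dot(X)).dot(Xt.dot(y))
    return beta[0], beta[1]

x = np.array([4.5, 5, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0])
y = np.array([34, 44, 45, 53, 53, 60, 65, 70])

intercept, coefficient = fit_linear_regression(x, y)
print("Intercept:", intercept)      # about -6.8214
print("Coefficient:", coefficient)  # about 9.5714
```

Explicit inversion with np.linalg.inv mirrors the step-by-step derivation; np.linalg.solve(Xt.dot(X), Xt.dot(y)) computes the same β without forming the inverse and is usually the numerically safer choice.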
Let's compare the results with the scikit-learn (sklearn) library:
Same result. Let's move forward!
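For reference, in scikit-learn this comparison amounts to fitting sklearn.linear_model.LinearRegression on x.reshape(-1, 1) and y, then reading its intercept_ and coef_ attributes. If scikit-learn is not at hand, NumPy's own least-squares routines offer a similar cross-check; note this is a swapped-in check, not the one from the article:

```python
import numpy as np

x = np.array([4.5, 5, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0])
y = np.array([34, 44, 45, 53, 53, 60, 65, 70])

# Same design matrix as before: [1, x_i] in each row
X = np.vstack((np.ones(len(x)), x)).T

# np.linalg.lstsq minimizes ||X beta - y||^2 directly, without forming (X^T X)^-1
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # about [-6.8214, 9.5714]

# np.polyfit with deg=1 fits a straight line and returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(intercept, slope)
```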
The same method works when “X” is a matrix with three variables.
This time we get three coefficients, one for each of the three variables in “X”, plus the intercept.
Time to use these coefficients for prediction!
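The original three-variable data is not shown, so the sketch below uses hypothetical synthetic data generated from known coefficients (2.0 for the intercept, 1.5, -0.5 and 3.0 for the variables, plus small noise) and checks that the normal equation recovers them. It assumes, as the prediction line X.T.dot(coefficients) implies, that X stores one variable per row.

```python
import numpy as np

# Hypothetical synthetic data: 3 variables, 50 observations,
# stored one variable per row (shape (3, n)) to match X.T.dot(coefficients)
rng = np.random.default_rng(0)
n = 50
X = rng.uniform(0, 10, size=(3, n))
true_intercept = 2.0
true_coefficients = np.array([1.5, -0.5, 3.0])
y = X.T.dot(true_coefficients) + true_intercept + rng.normal(0, 0.1, size=n)

# Design matrix with one row per observation: [1, x1, x2, x3]
A = np.column_stack((np.ones(n), X.T))

# Normal equation, exactly as in the one-variable case
beta = np.linalg.inv(A.T.dot(A)).dot(A.T.dot(y))
intercept, coefficients = beta[0], beta[1:]
print(intercept, coefficients)  # close to 2.0 and [1.5, -0.5, 3.0]
```

Nothing in the formula changes with more variables: the design matrix simply gains one column per variable, and β gains one entry per column.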
X.T.dot(coefficients) + intercept
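Just multiply “X” by the coefficients and add the intercept. “X” is transposed first because here it stores one variable per row, so X.T has one row per observation and one column per coefficient.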
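Written out for a single observation, the prediction line computes the following, where β with indices 1 to 3 are the three coefficients and x with indices j, i is the value of variable j for observation i (this index notation is introduced here for illustration):

$$\hat{y}_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}, \qquad i = 1, \dots, n$$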
Now, let's put it all together.
As you can see, the predicted values are close to the real ones.
Let's compare with the prediction from the “sklearn” library.
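A minimal sketch of fitting and predicting in one place, assuming “X” keeps one variable per row as in the prediction line X.T.dot(coefficients) + intercept. The helper names fit and predict are hypothetical. It reuses the one-variable data from the beginning, stored as a 1×n matrix, so the results can be checked against the earlier numbers; the same functions accept a matrix with three variables.

```python
import numpy as np

def fit(X, y):
    """Least-squares fit; X holds one variable per row, shape (n_variables, n_observations)."""
    A = np.column_stack((np.ones(X.shape[1]), X.T))  # design matrix [1, x_1, ..., x_k]
    beta = np.linalg.inv(A.T.dot(A)).dot(A.T.dot(y))
    return beta[0], beta[1:]  # intercept, coefficients

def predict(X, intercept, coefficients):
    # Same formula as in the text: X.T.dot(coefficients) + intercept
    return X.T.dot(coefficients) + intercept

# The one-variable data from the start, stored as a (1, n) matrix
X = np.array([[4.5, 5, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]])
y = np.array([34, 44, 45, 53, 53, 60, 65, 70])

intercept, coefficients = fit(X, y)
predicted = predict(X, intercept, coefficients)
residuals = y - predicted  # how far each prediction is from the real value
print(np.round(predicted, 2))
print(np.round(residuals, 2))
```

Printing the residuals makes “close to the real ones” concrete: for this data every prediction is within 3 units of the observed value, and with an intercept in the model the residuals sum to zero.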
Same result.
Now that you understand how linear regression works, it's time to see how you can use it in real life:
Find full code here: