ML Model — Linear Regression
The rich structure of Data Science and its related fields is what makes them enjoyable. One of the central tasks in Data Science and Machine Learning is training ML models for a range of problems. In this blog, Linear Regression, one of the most popular and simplest training models, will be described.
Linear Regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. Two types of Linear Regression are available: Simple and Multiple. Simple Linear Regression is the model in which only one independent variable exists, while Multiple Linear Regression uses two or more.
Linear Regression is useful for finding relationships among variables, and at the same time is used for predictions in Machine Learning. Compared to other models, Linear Regression captures a statistical relationship rather than a deterministic one.
The mathematical definition of Linear Regression is as follows:

y(hat) = theta_0 + theta_1*x_1 + theta_2*x_2 + … + theta_n*x_n

where:
· y(hat) is the predicted value;
· n is the number of features;
· x_i is the ith feature;
· theta_j is the jth model parameter. These coefficients are also called weights;
· theta_0 is the bias term.
Rewriting the equation above in vectorized form,

y(hat) = theta · X

where:
· theta is the parameter vector, including theta_0;
· X is the feature vector, with x_0 = 1 for the bias term;
· the multiplication is the dot product.
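As a quick illustration of the vectorized form, the prediction is simply the dot product of the parameter vector and the feature vector (with a leading 1 for the bias term). A minimal NumPy sketch, using made-up weights:

```python
import numpy as np

# hypothetical parameters: bias theta_0 = 2, weights theta_1 = 3, theta_2 = 0.5
theta = np.array([2.0, 3.0, 0.5])

# a single instance with two features, prepended with 1 for the bias term
x = np.array([1.0, 4.0, 10.0])

# y(hat) = theta . x = 2 + 3*4 + 0.5*10 = 19
y_hat = np.dot(theta, x)
print(y_hat)  # 19.0
```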
First, it is worth mentioning that training a model means setting its parameters so that the model fits the data well. That is why the good (or poor) performance of the model should be measured. The Root Mean Square Error (RMSE) is a commonly used function for checking the performance of regression models.
According to the definition of RMSE, it is required to find the value of theta which minimizes the RMSE. However, for easier implementation the Mean Square Error (MSE) is an alternative measure of performance. MSE leads to the same solution as RMSE, because the value minimizing a function also minimizes its square root.
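To see this relationship concretely: RMSE is just the square root of MSE, and the square root is monotonically increasing, so both share the same minimizer. A small NumPy sketch with made-up targets and predictions:

```python
import numpy as np

# invented target values and predictions, for illustration only
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# MSE: the mean of squared residuals
mse = np.mean((y_true - y_pred) ** 2)

# RMSE is simply the square root of MSE
rmse = np.sqrt(mse)
print(mse, rmse)
```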
The MSE for Linear Regression is defined in the following way:

MSE(theta) = (1/m) * sum_{i=1..m} (theta · x^(i) − y^(i))^2

where m is the number of training instances.
Another method for finding the value of theta that minimizes the cost function is the normal equation, which is a closed-form solution.
The mathematical definition of the equation is:

theta(hat) = (X^T · X)^(−1) · X^T · y

where:
· theta(hat) is the value of theta that minimizes the cost function;
· X is the matrix of feature values, with a column of ones for the bias term;
· y is the vector of target values.
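The normal equation can be checked on synthetic data: generate points from a known line and recover its parameters. A minimal sketch, with invented coefficients (bias 4, slope 3) and a small amount of noise:

```python
import numpy as np

# synthetic data from y = 4 + 3x plus noise (coefficients invented for illustration)
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + 0.1 * rng.standard_normal(100)

# add the bias column of ones, then apply theta(hat) = (X^T X)^(-1) X^T y
X_b = np.c_[np.ones((100, 1)), X]
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta_hat)  # close to [4, 3]
```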
In this part, the implementation of Linear Regression in Python will be described. There are several ways of solving such problems, but here the Scikit-Learn library will be shown.
# defining the Linear Regression model and fitting it to the training set
from sklearn import linear_model, metrics

lr = linear_model.LinearRegression()
lr.fit(X_train, y_train)
predict = lr.predict(X_test)

# checking the coefficient values
print('The intercept : ', lr.intercept_)
print('The coefficient: ', lr.coef_)

# checking the MSE value
print('Mean Square Error', metrics.mean_squared_error(y_test, predict))
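The snippet above assumes X_train, y_train, X_test and y_test already exist. One self-contained way to produce them is sketched below, using synthetic data (invented for illustration) and Scikit-Learn's train_test_split:

```python
import numpy as np
from sklearn import linear_model, metrics
from sklearn.model_selection import train_test_split

# synthetic data: y = 4 + 3x plus a little noise (values invented for illustration)
rng = np.random.default_rng(0)
X = 2 * rng.random((200, 1))
y = 4 + 3 * X[:, 0] + 0.1 * rng.standard_normal(200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

lr = linear_model.LinearRegression()
lr.fit(X_train, y_train)
predict = lr.predict(X_test)
print('Mean Square Error', metrics.mean_squared_error(y_test, predict))
```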
However, in the case of the normal equation, the code is written manually.
# finding the value of theta(hat) with the normal equation
import numpy as np

def theta_calc(df1, df2):
    n_data = df1.shape[0]
    bias_term = np.ones((n_data, 1))
    df1_bias = np.append(bias_term, df1, axis=1)
    theta_1 = np.linalg.inv(np.dot(df1_bias.T, df1_bias))
    theta_2 = np.dot(theta_1, df1_bias.T)
    theta = np.dot(theta_2, df2)
    return theta
Only a small difference, due to numerical precision, is generally observed between these two methods.
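That agreement can be verified directly: fitting Scikit-Learn's LinearRegression and the manual normal-equation function on the same synthetic data should give nearly identical parameters. A sketch (theta_calc is repeated here so the snippet runs on its own; the data is invented for illustration):

```python
import numpy as np
from sklearn import linear_model

def theta_calc(df1, df2):
    # normal equation: theta(hat) = (X^T X)^(-1) X^T y, with a bias column added
    n_data = df1.shape[0]
    bias_term = np.ones((n_data, 1))
    df1_bias = np.append(bias_term, df1, axis=1)
    theta_1 = np.linalg.inv(np.dot(df1_bias.T, df1_bias))
    theta_2 = np.dot(theta_1, df1_bias.T)
    return np.dot(theta_2, df2)

# synthetic data: y = 1 + 2*x1 - 3*x2 plus a little noise
rng = np.random.default_rng(1)
X = rng.random((50, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + 0.01 * rng.standard_normal(50)

theta = theta_calc(X, y)

lr = linear_model.LinearRegression()
lr.fit(X, y)

# the two solutions agree up to numerical precision
print(theta[0], lr.intercept_)
print(theta[1:], lr.coef_)
```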
Linear Regression is one of the most popular Machine Learning models and one of the most important concepts in Statistics and Machine Learning. Two different approaches to this model were described in this article. However, there are other approaches as well, which will be described later.
The whole code is shared on my Github profile.