Evaluation Metrics for Regression Models
Most widely used Regression Metrics for evaluating model performance
In my previous article, I talked about the metrics used to evaluate classification models. In this article, I’ll discuss the metrics that we use for regression models.
In classification, the model predicts a class label, while in regression it predicts a numeric value. This means we can’t use classification accuracy to evaluate a regression model; instead, there are error metrics designed specifically for regression, which we will discuss in this article.
Let’s get started then….
Introduction
Regression is a problem where we try to predict a continuous dependent variable using a set of independent variables. For example, weather forecasting, market trends, etc. These problems are used to answer “How much?” or “How many?”
In regression problems, the prediction error is used to define the model performance. The prediction error is also referred to as residuals and it is defined as the difference between the actual and predicted values.
The regression model tries to fit a line that produces the smallest difference between predicted and actual(measured) values.
Residuals are important when determining the quality of a model. You can examine residuals in terms of their magnitude and/or whether they form a pattern.
- Where the residuals are all 0, the model predicts perfectly. The further residuals are from 0, the less accurate the model is.
- Where the average residual is not 0, it implies that the model is systematically biased (i.e., consistently over- or under-predicting).
- Where residuals contain patterns, it implies that the model is qualitatively wrong, as it is failing to explain some properties of the data.
Residual = actual value − predicted value
error (e) = y − ŷ
So the question is: if we already have residuals, why do we need separate metrics? Let’s find out...
We can calculate the residual for every point in our data set, and each of these residuals will be of use in assessment.
Residual = Actual − Predicted
We can technically inspect all residuals to judge the model’s accuracy, but this does not scale if we have thousands or millions of data points. That’s why we have summary measurements that take our collection of residuals and condense them into a single value representing our model's predictive ability.
Now we’ll turn our focus to metrics of our model.
Regression Evaluation Metrics:
In this section, we will take a closer look at the popular metrics for regression models.
Mean Absolute Error (MAE):
It is the average of the absolute differences between the actual value and the model’s predicted value.
MAE = (1/N) × Σ |Yi − Ŷi|
where,
N = total number of data points
Yi = actual value
Ŷi = predicted value
If we don’t take the absolute values, then the negative difference will cancel out the positive difference and we will be left with a zero upon summation.
A small MAE suggests the model is great at prediction, while a large MAE suggests that your model may have trouble in certain areas. MAE of 0 means that your model is a perfect predictor of the outputs.
Here’s a Scikit-learn implementation of MAE:
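A minimal sketch using scikit-learn’s `mean_absolute_error`; the actual and predicted values below are made-up illustration data:

```python
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted values for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Average of the absolute residuals: (0.5 + 0.5 + 0 + 1) / 4
mae = mean_absolute_error(y_true, y_pred)
print(mae)  # 0.5
```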
The mean absolute error (MAE) has the same unit as the original data, and it can only be compared between models whose errors are measured in the same units.
The bigger the MAE, the more severe the error. Because it uses absolute values rather than squares, MAE is relatively robust to outliers.
Here, a big error doesn’t overpower a lot of small errors, so the output gives us a relatively unbiased picture of how the model is performing. The flip side is that MAE fails to punish the bigger error terms more heavily.
MAE is also not differentiable at zero, so gradient-based optimizers need sub-gradients or a smooth approximation to minimize it directly.
Mean Squared Error (MSE):
It is the average of the squared differences between the actual and the predicted values.
Lower the value, the better the regression model.
MSE = (1/n) × Σ (yi − ŷi)²
where,
n = total number of data points
yi = actual value
ŷi = predicted value
Its unit is the square of the variable’s unit.
Here’s a Scikit-learn implementation of MSE:
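A minimal sketch using scikit-learn’s `mean_squared_error`, again on made-up illustration values:

```python
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted values for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Average of the squared residuals: (0.25 + 0.25 + 0 + 1) / 4
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.375
```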
If there are outliers in the dataset, MSE penalizes them the most and the calculated MSE becomes much larger. In short, it is not robust to outliers, which was an advantage of MAE.
MSE uses the square operation to remove the sign of each error value and to punish large errors.
Because the errors are squared, larger errors become more pronounced than smaller ones, so the model can focus more on the larger errors.
The downside is that a single very bad prediction is amplified by the squaring, which may skew the metric toward overstating how bad the model is.
Conversely, if all the errors are small, or rather smaller than 1, squaring shrinks them and we may understate how bad the model is.
Root Mean Squared Error (RMSE):
It is the square root of the average squared difference between the actual and the predicted values; that is, taking the square root of MSE gives us the Root Mean Square Error.
We want the value of RMSE to be as low as possible: the lower the RMSE, the better the model’s predictions. A higher RMSE indicates large deviations between the predicted and actual values.
RMSE = √((1/n) × Σ (yj − ŷj)²)
where,
n = total number of data points
yj = actual value
ŷj = predicted value
Here’s a Scikit-learn implementation of RMSE:
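A minimal sketch computing RMSE by taking the square root of scikit-learn’s `mean_squared_error`, on made-up illustration values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted values for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# RMSE is the square root of MSE: sqrt(0.375)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # ≈ 0.6124
```

Recent scikit-learn versions (1.4+) also provide a dedicated `root_mean_squared_error` function in `sklearn.metrics`.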
Max Error:
While RMSE is the most common metric, it averages errors together and can hide the worst cases. One alternative is to look at the extremes of the error distribution: the Max Error metric is the worst-case error between the predicted value and the true value.
Here’s a Scikit-learn implementation of Max Error:
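A minimal sketch using scikit-learn’s `max_error`, on made-up illustration values:

```python
from sklearn.metrics import max_error

# Made-up actual and predicted values for illustration
y_true = [3, 2, 7, 1]
y_pred = [9, 2, 7, 1]

# The largest absolute residual: |3 - 9| = 6
err = max_error(y_true, y_pred)
print(err)  # 6
```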
R² score, the coefficient of determination:
R-squared explains to what extent the variance of one variable explains the variance of the second variable. In other words, it measures the proportion of variance of the dependent variable explained by the independent variable.
R squared is a popular metric for assessing model accuracy. It tells us how close the data points are to the line fitted by the regression algorithm; a larger R squared value indicates a better fit. This helps us understand how well the independent variables explain the dependent variable.
An R² score can be at most 1. The closer R² is to 1, the better the regression model. If R² is equal to 0, the model performs no better than always predicting the mean of the target; if R² is negative, the model performs even worse than that baseline.
It is computed from the error sum of squares (SSE) and the total sum of squares (SST):
R² = 1 − (SSE / SST)
where SSE = Σ (yi − ŷi)² is the sum of the squared differences between the actual and the predicted values,
and SST = Σ (yi − ȳ)² is the total sum of the squared differences between the actual values and their mean.
Here, yi is the observed target value, ŷi is the predicted value, ȳ (y-bar) is the mean of the actual values, and the sums run over all m observations.
When we add new features to our data, the R² score increases or stays constant, but it never decreases, because the fit can only explain the same or more of the variance.
The problem is that R² can increase even when we add an irrelevant feature to the dataset, which misleadingly suggests a better model.
Here’s a Scikit-learn implementation of R2 Score:
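A minimal sketch using scikit-learn’s `r2_score`, on made-up illustration values:

```python
from sklearn.metrics import r2_score

# Made-up actual and predicted values for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# 1 - SSE/SST, where SSE = 1.5 and SST = 29.1875 here
r2 = r2_score(y_true, y_pred)
print(r2)  # ≈ 0.9486
```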
R2 describes the proportion of variance of the dependent variable explained by the regression model. If the regression model is “perfect”, SSE is zero, and R2 is 1. If the regression model is a total failure, SSE is equal to SST, no variance is explained by the regression, and R2 is zero.
Adjusted R-Square:
Adjusted R² is the same as standard R² except that it penalizes models when additional features are added.
To counter the problem faced by R-square, adjusted R-square penalizes the addition of independent variables that don’t increase the explanatory power of the regression model.
The value of adjusted r-square is always less than or equal to the value of r-square.
The closer its value is to 1, the better the model; unlike R², it can decrease when unhelpful features are added.
It measures the variation explained by only the independent variables that actually affect the dependent variable.
Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)
where
n is the number of data points
k is the number of independent variables in your model
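Scikit-learn has no built-in adjusted R², but it is easy to sketch a small helper on top of `r2_score`; the sample values below are made up for illustration:

```python
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, k):
    """Adjusted R², where k is the number of independent variables."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Made-up actual and predicted values for illustration
y_true = [3.0, -0.5, 2.0, 7.0, 4.2]
y_pred = [2.5, 0.0, 2.0, 8.0, 4.0]

adj = adjusted_r2(y_true, y_pred, k=2)
print(adj)  # always <= the plain R² score
```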
Conclusion:
In this article, we discussed several important regression evaluation metrics. Depending on the situation, some metrics might be more relevant than others.
In all cases, the goal is to estimate the distance between the model’s predictions and the actual values. The critical point is to have a solid understanding of these metrics so as to use them adequately.
I recommend Scikit-learn official documentation for more metrics: https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics