Measure it up: Regression metrics of fit

Micha Kastelein
Sogeti Data | Netherlands
6 min read · Dec 18, 2020


Machine learning metrics/measures of fit can be quite daunting. Do you recognize this situation? You have gathered a lot of data and you have built a regression model that looks amazing. It gives predictions for the variable you want to predict. Everything looks perfect. But how do you know if your model predicts correctly? Look no further, this blog is for you!

After fitting a regression model on a dataset, it is important to test how well the calculated model fits the data. Luckily, there is a way to compare multiple models with one another: measures of fit. A measure of fit quantifies the error of the calculated model: the mistake the model makes compared to the original data. Which measure of fit you use determines how that mistake is calculated. It is important to remember that a measure of fit doesn’t necessarily say anything about the accuracy of a model; it only tells you about the average error between your model’s predictions and the true values.

In this blog we will discuss 4 different measures of fit: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and R Squared (R²). Each measure will be explained, and we will talk about its best uses and its flaws.

Mean Absolute Error (MAE)

The MAE calculates the absolute error between the prediction and the true value for every datapoint and takes the sum of all these errors divided by the number of datapoints.
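As a minimal sketch of that calculation (a hand-rolled mae helper of my own, not from scikit-learn, whose mean_absolute_error does the same):

def mae(y_true, y_pred):
    # mean of the absolute differences between truth and prediction
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)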

The MAE is easy to interpret but doesn’t expose large outlier errors that well. MAE is commonly used in time series analysis, since the influence of occasional big differences in errors between data points is smaller.

Mean Squared Error (MSE)

The MSE calculates the squared error between the prediction and the true value for every datapoint, sums the errors and divides it by the number of datapoints.
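In the same hand-rolled style (an illustrative mse helper; scikit-learn’s mean_squared_error is the library equivalent):

def mse(y_true, y_pred):
    # mean of the squared differences between truth and prediction
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)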

The MSE is less easy to interpret: since it has an order of 2 while the data (normally) has an order of 1, the error can’t be directly related to the data. On the plus side, the MSE becomes a more stable estimate when more data is used, so it gives a better representation of the error of a model on larger datasets. It can be used for almost all regression models.

Root Mean Squared Error (RMSE)

RMSE is very similar to the MSE. To calculate the RMSE, we simply take the root of the MSE.
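Continuing the sketch, rmse simply builds on the mse helper above:

from math import sqrt

def rmse(y_true, y_pred):
    # the square root brings the error back to the order of the data
    return sqrt(mse(y_true, y_pred))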

This solves the MSE issue of the error not having the same order as the dataset. RMSE is still sensitive to outliers, but because of the root the number will be lower than the MSE (whenever the MSE is larger than one), making it a more readable number. Keep in mind that you can’t use the RMSE to compare different variables if they have a different scale of numbers.

R squared

R² is calculated by dividing the Sum of Squared Errors (SSE) by the Sum of Squares Total (SST) and subtracting the result from one: R² = 1 − SSE/SST.

The SSE is the sum of the squared differences between the predicted values and the true values. It is very similar to the calculation used in the MSE, the only difference being that the MSE is divided by the number of datapoints. The SST is the sum of the squared differences between the true values and the average of the true values.
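Putting the two sums together (again a hand-rolled r_squared helper of my own; r2_score is the scikit-learn equivalent):

def r_squared(y_true, y_pred):
    mean_true = sum(y_true) / len(y_true)  # average of the true values
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # squared errors vs. predictions
    sst = sum((t - mean_true) ** 2 for t in y_true)  # squared deviations from the mean
    return 1 - sse / sst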

R² compares the chosen model with a horizontal straight line at the mean of the true values. If that line is better at predicting than the chosen model, R² is negative. When the model fits the data perfectly, R² is 1 (the SSE is zero for a perfect prediction). Don’t use R² on its own: a low R² isn’t always bad and a high R² isn’t always good. Make sure to combine it with another measure of fit, since a model with a high R² might still have a large error and vice versa.

Some Python magic: Measures of Fit in action

I hear you think: “Micha, this is interesting, but how can I use this in Python?” Let me show you!

Let’s start by stating some true values and some predictions. I will be using deviations that are easy to see and calculate. This is a nice way to get a good feeling for the discussed measures of fit, so that next time you work with a model, you can use the measures yourself in a larger scenario.

y_true = [-10, -5, 0, 5, 10, 15]
y_always_five = [-15, -10, -5, 0, 5, 10]
y_small_large = [-7, -2, 3, 12, 17, 22]
y_outlier = [-10, -5, 0, 5, 10, 45]

Now that we have y_true and the predictions, we can calculate whether our predictions are any good. We’ll go over them one by one. The Python code will only show examples for y_always_five; Table 1 shows the measures for the other predictions as well.

print("MAE y_always_five: ", mean_absolute_error(y_true, y_always_five))MAE y_always five: 5.0print("MSE y_always_five: ", mean_squared_error(y_true, y_always_five))MSE y_always_five: 25.0print("RMSE y_always_five: ", sqrt(mean_squared_error(y_true, y_always_five)))RMSE y_always_five: 5.0print("R2 y_always_five: ", r2_score(y_true, y_always_five))R2 y_always_five: 0.6571428571428571

A bunch of different numbers. Time to interpret them and get our best fit. The table below displays the different predictions and their corresponding measures (Table 1). But first some code to plot these predictions. The result is shown in Figure 1.
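A minimal sketch of such plotting code, assuming matplotlib and the variables defined above:

import matplotlib.pyplot as plt

x = range(len(y_true))  # simple index on the horizontal axis
plt.plot(x, y_true, marker="o", label="y_true")
plt.plot(x, y_always_five, marker="o", label="y_always_five")
plt.plot(x, y_small_large, marker="o", label="y_small_large")
plt.plot(x, y_outlier, marker="o", label="y_outlier")
plt.legend()
plt.show()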

Figure 1: y_true and its predictions plotted
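Running the same metric calls for the other predictions, recomputed here from the data above, gives Table 1:

Table 1: Measures of fit for each prediction

Prediction       MAE    MSE      RMSE     R²
y_always_five    5.0    25.0     5.0      0.66
y_small_large    5.0    29.0     5.39     0.60
y_outlier        5.0    150.0    12.25    -1.06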

As previously stated, the MAE doesn’t handle outliers very well. y_outlier has a large outlier at x=5, but still shows an MAE value of 5. The other three measures are better at exposing this outlier: the MSE and RMSE give a much higher value than before, and R² even gives a negative number, showing that the predicted model is worse than a horizontal line at predicting the values. You can also notice that the MSE has a much higher value than the other measures; as discussed earlier, this has to do with the MSE having an order of two while the data has an order of one. Finally, notice the small difference between y_always_five and y_small_large when you look at the RMSE. This shows that the larger errors in y_small_large weigh heavier than the smaller errors.

Recommendation

MAE, MSE, RMSE and R² placed side by side show that the MAE is the most interpretable, but it lacks the ability to deal with outliers and cannot distinguish evenly distributed errors from a mix of small and large ones. MSE and RMSE overcome MAE’s shortcomings, since larger errors have a bigger impact on them. Lastly, R² is great at showing whether our model is better at predicting values than a horizontal line; its value tells us how close our model’s predictions come to the original values. I would suggest using RMSE and R² side by side: try to get the RMSE as low as possible, while also getting the R² as close to one as possible. When presenting a model, you can use the RMSE to show the average deviation the model makes compared to the true values. The R² can be presented like an accuracy, but only when you use it in combination with other measures. Together they show a model that comes close to the true values, while also having a small average error for each datapoint.
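As a final sketch (the report_fit helper and its layout are illustrative, not from the original post), reporting RMSE and R² side by side for every prediction could look like this:

from math import sqrt
from sklearn.metrics import mean_squared_error, r2_score

def report_fit(name, y_true, y_pred):
    # report RMSE and R² together, as recommended above
    rmse = sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    print(f"{name}: RMSE={rmse:.2f}, R2={r2:.2f}")

report_fit("y_always_five", y_true, y_always_five)
# y_always_five: RMSE=5.00, R2=0.66
report_fit("y_small_large", y_true, y_small_large)
# y_small_large: RMSE=5.39, R2=0.60
report_fit("y_outlier", y_true, y_outlier)
# y_outlier: RMSE=12.25, R2=-1.06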
