# Evaluation of regression models

## Introduction

In the previous article, we talked about how to use the confusion matrix to evaluate classification models (i.e. accuracy, precision, and recall). However, a regression model predicts **numbers**, not labels, so we have to apply different methods to evaluate its performance. In this article, we'll look at **mean absolute error**, **root mean squared error**, and **R-squared**.

## Plot

Humans are visual creatures. Whenever possible, we should plot the fitted model and the data on the same graph. Take simple linear regression as an example: if we want to predict a person's weight from their height, we can plot the fitted line on top of the original scatter plot. In this example, the simple linear regression model seems to fit the data reasonably well.
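As a sketch of this idea, we can fit a least-squares line by hand and draw it over the scatter plot with matplotlib. The height and weight numbers below are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical height (cm) / weight (kg) data for illustration
heights = [152, 160, 168, 176, 184]
weights = [44, 57, 60, 80, 85]

# Ordinary least-squares fit: slope = cov(x, y) / var(x)
mx = sum(heights) / len(heights)
my = sum(weights) / len(weights)
slope = sum((x - mx) * (y - my) for x, y in zip(heights, weights)) / \
    sum((x - mx) ** 2 for x in heights)
intercept = my - slope * mx

# Plot the data and the fitted line on the same graph
plt.scatter(heights, weights, label="data")
plt.plot(heights, [slope * x + intercept for x in heights], label="fitted line")
plt.xlabel("height (cm)")
plt.ylabel("weight (kg)")
plt.legend()
plt.savefig("fit.png")
```

A quick visual check like this often reveals problems (curvature, outliers) before any numeric score is computed.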

However, numerical measures are required for accurate model evaluation.

## Mean absolute error (MAE)

Mean absolute error is the **average of the absolute differences between true values and predicted values**. Thus, a low MAE is preferred.

In the example of the simple linear regression model, the MAE is (1/5)[|60−55| + |57−62| + |44−41| + |85−88| + |80−78|] = 18/5 = 3.6.

Essentially, we’re measuring the average of the length of blue lines (distances between predicted and actual values) in the graph.
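The arithmetic above can be checked with a few lines of plain Python (scikit-learn's `mean_absolute_error` would give the same result):

```python
# True and predicted weights from the worked example
y_true = [60, 57, 44, 85, 80]
y_pred = [55, 62, 41, 88, 78]

# MAE: mean of the absolute differences
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(mae)  # 3.6
```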

## Root mean squared error (RMSE)

Root mean squared error is the **square root of the average of the squared differences between true values and predicted values**.

For the previous example, the RMSE is sqrt((1/5)[(60−55)² + (57−62)² + (44−41)² + (85−88)² + (80−78)²]) = sqrt(72/5) = sqrt(14.4) ≈ 3.795.
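The same calculation in Python, using the standard-library `math` module:

```python
import math

# True and predicted weights from the worked example
y_true = [60, 57, 44, 85, 80]
y_pred = [55, 62, 41, 88, 78]

# RMSE: square root of the mean of the squared differences
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)
print(round(rmse, 3))  # 3.795
```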

You might be wondering why we have two similar measures. In fact, **RMSE penalizes large errors more** than MAE does, since the errors are squared before the square root is taken. Furthermore, from a mathematical point of view, squared errors are easier to work with algebraically than absolute values.

Having said that, you can still use MAE if you prefer. It is arguably more interpretable, as it is simply the average distance between predicted and actual values.

## R-squared

R-squared, or the coefficient of determination, measures **how close the data points are to the fitted line**. Alternatively, you can think of R-squared as the fraction of the total sample variance explained by the independent variables in the model. Therefore, a **high** R-squared is generally preferred.
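Using the same worked example, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares:

```python
# True and predicted weights from the worked example
y_true = [60, 57, 44, 85, 80]
y_pred = [55, 62, 41, 88, 78]

mean_y = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.938
```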

One thing to note is that **a bad model can have a high R-squared by chance**. For instance, the model might fit part of the data well while failing to capture the trend of the whole dataset (see this article for details).

In fact, there is no perfect measure. R-squared is not comprehensive, and neither is MAE or RMSE. It's important to look at multiple measures before passing judgement on any regression model, just as we should never use a single measure to evaluate a classification model!

## References