
Understanding Accuracy Evaluation for Regression Models

3 min read · Oct 22, 2024


Before evaluating a regression model, we first need to define some metrics.

Regression Metrics

Residuals

The difference between an actual data point and the model's prediction is called a residual.

[Figure: residuals shown as the vertical distance between actual data points and the prediction line]

In this figure, the green points are the actual data points and the orange line is the prediction line. As seen in the figure, the difference between an actual data point (yᵢ) and the corresponding prediction (ŷᵢ) is the residual.

We can treat the residual as an estimate of the model's error, where the error is the difference between the actual value and the predicted value.
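
As a minimal sketch in Python (using NumPy; the data values here are made up for illustration):

```python
import numpy as np

# Hypothetical actual values and model predictions (made-up numbers)
y_actual = np.array([3.0, 5.0, 7.5, 9.0])
y_predicted = np.array([2.5, 5.5, 7.0, 9.5])

# A residual is the actual value minus the predicted value
residuals = y_actual - y_predicted
print(residuals)  # [ 0.5 -0.5  0.5 -0.5]
```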

Average Residual

Calculating the average residual may look like a useful way to measure error, but it is not.

Data points below the prediction line produce negative residuals, while those above produce positive ones. As a result, when you average them, they can cancel each other out, which could lead to an average residual of 0 and give the false impression that the model is making perfect predictions.
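
A quick numeric sketch of this cancellation, using made-up residual values:

```python
import numpy as np

# Residuals of equal magnitude but opposite sign (made-up values)
residuals = np.array([0.5, -0.5, 0.5, -0.5])

# Every prediction is off by 0.5, yet the average residual is 0
print(residuals.mean())  # 0.0
```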

To make residuals useful for evaluation, we need metrics that ignore the sign of each residual and instead focus on its magnitude.

Mean Absolute Error

One way to account for the magnitude of residuals is to take the absolute value of each residual and then calculate the average. This is known as the Mean Absolute Error, and it represents the average difference to expect between the model’s predictions and the actual values.

MAE = (1/n) · Σᵢ₌₁ⁿ |yᵢ − ŷᵢ|

In this equation, n is the number of data points, yᵢ is the actual value, ŷᵢ is the predicted value, and |yᵢ − ŷᵢ| is the absolute difference between the actual and predicted values.
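
Below is a minimal NumPy sketch of this formula, reusing the made-up data from the residual example above (scikit-learn also ships a ready-made mean_absolute_error in sklearn.metrics):

```python
import numpy as np

def mean_absolute_error(y_actual, y_predicted):
    """Average of the absolute residuals: (1/n) * sum(|y - ŷ|)."""
    return np.mean(np.abs(y_actual - y_predicted))

# Same made-up data as in the residual sketch
y_actual = np.array([3.0, 5.0, 7.5, 9.0])
y_predicted = np.array([2.5, 5.5, 7.0, 9.5])
print(mean_absolute_error(y_actual, y_predicted))  # 0.5
```

Note that the average residual of this data is 0, while the Mean Absolute Error correctly reports an average error of 0.5.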

The closer the Mean Absolute Error is to zero, the better the predictions; higher values indicate lower prediction accuracy. So when comparing models, the one with the lowest Mean Absolute Error is preferred. However, Mean Absolute Error treats all errors equally, while in practice large errors are often worse than small ones. An effective way to account for this is to use the squared error rather than the absolute one, since squaring penalizes large errors more heavily.

Root Mean Squared Error

To eliminate the effect of positive and negative residuals and focus on their magnitudes, you can square their values.

RMSE = √( (1/n) · Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² )

In this equation, n is the number of data points, yᵢ is the actual value, ŷᵢ is the predicted value, and (yᵢ − ŷᵢ)² is the squared difference between the actual and predicted values.

The reason for taking the sum of squares is that squaring weights large errors more heavily, which helps us avoid models that occasionally produce drastically wrong predictions. However, squaring also makes the error harder to interpret, because it changes the units (an error in meters becomes an error in squared meters). This is why we take the square root of the Mean Squared Error to arrive at the Root Mean Squared Error (RMSE): it is expressed in the same units as the target and is easier to understand.
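
A minimal NumPy sketch of this formula, with one large made-up error added to show how squaring punishes it:

```python
import numpy as np

def root_mean_squared_error(y_actual, y_predicted):
    """Square root of the mean squared residual."""
    return np.sqrt(np.mean((y_actual - y_predicted) ** 2))

# Made-up data with one large error in the last prediction
y_actual = np.array([3.0, 5.0, 7.5, 9.0])
y_predicted = np.array([2.5, 5.5, 7.0, 12.0])  # last prediction is off by 3.0
print(root_mean_squared_error(y_actual, y_predicted))  # ≈ 1.56
```

On this data the Mean Absolute Error is 1.125, while the Root Mean Squared Error is about 1.56; the gap reflects how heavily the single large error is penalized by squaring.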
