Why Aren't MSE or RMSE Good Enough Metrics for Regression? All About R² and Adjusted R²

Neha Kushwaha · Published in Analytics Vidhya · 5 min read · Sep 1, 2020

In machine learning there is a subtle problem: even if you get a good score, how do you know it is a good score? In this article we will study the metrics used for evaluating regression problems.

This article will help you answer the following questions:

  • What are good metrics for regression evaluation?
  • Why is using MSE or RMSE not a good idea as a regression evaluation metric?
  • Why is Adjusted R² preferred over R²?

Why is using MSE or RMSE not a good idea?

Let’s see it through a dummy data analysis. The blue points represent the training data. We draw a hypothetical line through this data that might be a good fit, then choose a higher-order and a lower-order function and see their impact by calculating the MSE score of each.

[Figure: comparison of the fitted curves on the training data]

Increasing the function’s complexity did decrease the MSE score on the training set, and that will tempt you to choose this function for your model. But wait!! Did you validate this model on your test data?

It performed well on the training data but failed on the test data. This scenario illustrates over-fitting, where we try to find a function that passes through every point. If, instead, we make the function too simple, it leads to under-fitting.

In both scenarios the difference between the test score and the train score is quite noticeable, which makes it difficult to find a good-fit line from the training score alone. Now we know why our MSE or RMSE number lied to us!!
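The experiment above can be sketched in a few lines of NumPy. The data here is entirely made up for illustration: a noisy quadratic ground truth, fit with an under-complex, a reasonable, and an over-complex polynomial. The train MSE keeps falling as the degree grows, while the test MSE tells the real story.

```python
import numpy as np

# Made-up noisy quadratic data (illustrative only)
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-3, 3, 20))
x_test = np.sort(rng.uniform(-3, 3, 20))
y_train = 1 + 2 * x_train + 0.5 * x_train**2 + rng.normal(0, 2.0, 20)
y_test = 1 + 2 * x_test + 0.5 * x_test**2 + rng.normal(0, 2.0, 20)

results = {}
for degree in (1, 2, 15):  # under-fit, good fit, over-fit
    coeffs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (mse_train, mse_test)
    print(f"degree={degree:2d}  train MSE={mse_train:9.3f}  test MSE={mse_test:9.3f}")
```

The degree-15 fit wins on the training MSE but loses badly on the held-out points, exactly the lie described above.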

What is R-squared?

The main reason we could not judge the MSE or RMSE score was that there is no fixed range to help us make a better judgement.

On a known scale, you can instantly tell that a score of 10 is better than a score of 1. A scale indeed helps!!!

R-squared (R²) is also known as the coefficient of determination. It measures the degree to which the variance in the dependent variable (Y/target) can be explained by the independent variables (X/features).

The R² value typically varies from 0 to 1, a score of 1 being the ideal case where 100% of the variation is explained by the input features. It quantifies the goodness of fit of the best-fit line.

Arriving at R² Formula

R² is computed by taking the ratio of the sum of squared residuals from the regression model (SSE) to the total sum of squares of errors from the average model (TSS), and subtracting that ratio from 1: R² = 1 − SSE/TSS.

  • SSE: a measure of how far off our model’s predictions are from the observed values. In simple terms, it is the sum of the squares of the differences between the actual observed values (y) and the predicted values (ŷ): SSE = Σ(yᵢ − ŷᵢ)².
  • TSS: a measure of the variance in the target variable. It is simply the total sum of the squared differences between each actual observation (y) and the target mean (ȳ): TSS = Σ(yᵢ − ȳ)².
Image courtesy: https://twitter.com/KirkDBorne

When R² is 1, the model is a perfect fit with no error; when R² is 0, the model explains no more of the variance than simply predicting the mean, and is not a good model to go for.
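The SSE/TSS definitions above translate directly into code. A minimal sketch, using made-up observations and predictions:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])       # actual observed values (made up)
y_hat = np.array([2.8, 5.1, 7.3, 8.9])   # model predictions (made up)

sse = np.sum((y - y_hat) ** 2)           # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)        # total sum of squares around the mean
r2 = 1 - sse / tss
print(round(r2, 4))                      # → 0.9925
```

Because the predictions sit very close to the observations relative to the spread around the mean, R² is near 1.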

Where does R² fail?

  • R² assumes that every variable helps in explaining the variation in the target, which might not always be true.
  • For instance, if we add a new feature to the data (which may or may not be useful), the R² value for the model will either increase or remain the same, but it will never decrease.
  • Penalizing a newly added independent feature, which might or might not be correlated with the target, is the job of Adjusted R².

Adjusted R² explained

  • Like R², it measures how much of the variation in the dependent variable is explained by the independent variables.
  • But unlike R², it takes into account the positive or negative effect of a newly added feature by penalizing the feature count.

Let us understand mathematically how this feature is accommodated in Adjusted R²:

Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k − 1)

where:

  • n is the number of points in your data sample.
  • k is the number of independent features, i.e. the number of variables in your model, excluding the constant.

On addition of extra features, Adjusted R² compensates by penalizing you for those extra variables.
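The formula above is easy to apply by hand. In this sketch the helper name `adjusted_r2` is my own, not from any particular library; it shows how the same R² of 0.90 earns a lower adjusted score once more features are spent to achieve it.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n samples and k features (excluding the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R² of 0.90 from 30 samples: the adjusted score drops as k grows.
print(round(adjusted_r2(0.90, n=30, k=2), 4))   # → 0.8926
print(round(adjusted_r2(0.90, n=30, k=10), 4))  # → 0.8474
```

A model that needs 10 features to reach R² = 0.90 is penalized more heavily than one that gets there with 2.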

Note: While Adjusted R² values are usually positive, they can be negative as well. This can happen if your R² is close to zero; after the adjustment, the value can dip below zero, which usually indicates that your model is a poor fit for your data. Other problems with your model, such as not including a constant term, can also cause sub-zero values.

I hope this helped answer the questions we defined at the start of the article. Stay tuned!! Keep learning!! Be safe!!

Happy Learning !! :)
