R-Squared vs Adjusted R-Squared

Aishwarya Singh
Published in Analytics Vidhya · Aug 23, 2019 · 3 min read

After building a machine learning model, the next step is to evaluate its performance and understand how good it is against a benchmark model. The evaluation metric to use depends on the type of problem you are trying to solve: whether it is a supervised or unsupervised problem, and whether it is a classification or a regression task.

In this post I am going to talk about two important evaluation metrics used for regression problems and highlight the key difference between them.

R-squared

R-squared, also known as the coefficient of determination, measures the degree to which the variance in the dependent variable (or target) can be explained by the independent variables (features).

Let us understand this with an example: say the R-squared value for a particular model comes out to be 0.7. This means that 70% of the variation in the dependent variable is explained by the independent variables.

Ideally, we would want the independent variables to explain all the variation in the target variable. In that scenario, the r-squared value would be 1. Thus we can say that the higher the r-squared value, the better the model.

So, in simple terms, the higher the R-squared, the more variation is explained by your input variables, and hence the better your model is. For a linear model fit with an intercept, the r-squared ranges from 0 to 1. Here is the formula for calculating R-squared:

R^2 = 1 - \frac{SS_{res}}{SS_{tot}}

That is, R-squared is calculated by dividing the sum of squared residuals from the regression model (SS_res) by the total sum of squares of errors from the average model (SS_tot, the model that always predicts the mean of the target), and subtracting the result from 1.
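To make the calculation concrete, here is a minimal sketch in Python using NumPy. The toy arrays y and y_pred are made up purely for illustration:

```python
import numpy as np

# Toy data: actual target values and model predictions (illustrative only)
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

ss_res = np.sum((y - y_pred) ** 2)    # sum of squared residuals (SSres)
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares vs. the average model (SStot)

r_squared = 1 - ss_res / ss_tot
print(r_squared)  # ~0.99 here: the predictions track the target closely
```

This matches what sklearn.metrics.r2_score would report for the same two arrays.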

One drawback of r-squared is that it assumes every variable helps explain the variation in the target, which might not always be true. For instance, if we add a new feature to the data (which may or may not be useful), the r-squared value for the model will either increase or remain the same, but it will never decrease.

This is taken care of by a slightly modified version of r-squared, called the adjusted r-squared.

Adjusted R-squared

Similar to R-squared, Adjusted R-squared measures the variation in the dependent variable (or target) that is explained by only those features which are helpful in making predictions. Unlike R-squared, Adjusted R-squared penalizes you for adding features which are not useful for predicting the target.

Let us mathematically understand how this penalty is accommodated in Adjusted R-squared. Here is the formula for adjusted r-squared:

R^2_{adj} = 1 - \frac{(1 - R^2)(N - 1)}{N - M - 1}

Here R^2 is the r-squared calculated above, N is the number of rows (samples) and M is the number of features. As the number of features increases, the denominator (N - M - 1) decreases, which inflates the penalty term unless R^2 increases enough to compensate.

  • If R^2 increases by a significant amount when a feature is added, the adjusted r-squared increases.
  • If there is no significant change in R^2, the adjusted r-squared decreases, because the shrinking denominator outweighs the small gain.
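To see both behaviours in one place, here is a small sketch using scikit-learn's LinearRegression; the helper adjusted_r2 and the synthetic data are my own illustration, so the exact numbers will vary with the random seed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n_rows, n_features):
    """Adjusted R-squared for n_rows samples and n_features predictors."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_features - 1)

rng = np.random.default_rng(0)
N = 100
X = rng.normal(size=(N, 2))  # two informative features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=N)

# Baseline: R-squared from a linear fit on the informative features
r2_base = LinearRegression().fit(X, y).score(X, y)

# Append a pure-noise column: R-squared cannot decrease,
# but adjusted R-squared is penalized for the useless feature
X_noise = np.hstack([X, rng.normal(size=(N, 1))])
r2_noise = LinearRegression().fit(X_noise, y).score(X_noise, y)

print(r2_base, adjusted_r2(r2_base, N, 2))
print(r2_noise, adjusted_r2(r2_noise, N, 3))
```

With data like this, the raw r-squared ticks up slightly after the noise column is added, while the adjusted r-squared typically stays flat or drops, which is exactly the penalty described above.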
