R-Squared, Adjusted R-Squared and the Degree of Freedom

Metrics for determining the goodness of fit of a regression model

Md Junaid Alam
Geek Culture
8 min read · Aug 25, 2021


R-Squared (R²)

When we perform regression, how well the model fits the data depends on how well we pre-processed the data and which algorithm we used to fit the regression model. We therefore need some metric to determine how good the fit is.
R-Squared (also denoted by R²) is one of those metrics. Let us understand the concept of R².

Let us take an example where we have the temperature recordings of the previous 5 days in degrees Celsius: 32, 24, 26, 30 and 28.
With these values from the past 5 days, the simplest way to forecast the temperature of the sixth day is to take the mean of these temperatures, which is 28 degrees Celsius.

While forecasting by the mean may seem to work at first, as new temperature values arrive this approach may break down depending on how far they lie from the mean. We would have to recalculate the mean and repeat the same process to forecast.

Now let us plot a simple dataset with a single independent variable ‘x’ and dependent variable ‘y’. The mean of y is calculated and the line for the y mean (indicated by Y bar) is also plotted in the graph as shown below.

As we observe from the above graph, there are deviations between the actual values of y and the y mean. These are called mean deviations.
In the updated graph below, the red lines denote the mean deviation of each data point.

Note that the sum of the mean deviations will be zero. To avoid this cancellation to zero, we square each mean deviation and then take the sum.

This sum of the squares of the mean deviations is called TSS (Total Sum of Squares).
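As a quick sketch (in Python with NumPy, reusing the earlier temperature values as hypothetical y values), we can verify that the mean deviations sum to zero and compute TSS:

import numpy as np

y = np.array([32, 24, 26, 30, 28], dtype=float)   # hypothetical values of the dependent variable
y_mean = y.mean()

mean_deviations = y - y_mean
print(mean_deviations.sum())                       # 0.0: the deviations cancel out
tss = np.sum(mean_deviations ** 2)                 # TSS: sum of squared mean deviations
print(tss)                                         # 40.0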

Let us now fit a regression line (L) for the given data as shown below:

If we observe the above graph carefully, we notice that the predicted value (the value of y on the regression line ‘L’) is closer to the actual value than the value on the y mean line.
The distance between the mean and the predicted value on the regression line denotes how much of the variation the model has explained.

For simplicity, the explanation below refers to a single data point, with the understanding that the same reasoning applies to all the remaining data points.

Now if we take the sum of the squares of all the distances between the predicted values on the regression line and the y mean, this sum is called the ESS (Explained Sum of Squares).
The distances between the predicted and actual values are called residuals (errors), and if we square each residual and sum them, we get the RSS (Residual Sum of Squares).
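The following minimal sketch (again NumPy, with made-up x values and a least-squares line fitted via np.polyfit) computes ESS and RSS and checks the identity ESS + RSS = TSS, which holds for an ordinary least-squares fit with an intercept:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)     # hypothetical independent variable
y = np.array([32, 24, 26, 30, 28], dtype=float)

slope, intercept = np.polyfit(x, y, 1)          # ordinary least-squares regression line
y_pred = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)               # Total Sum of Squares
ess = np.sum((y_pred - y.mean()) ** 2)          # Explained Sum of Squares
rss = np.sum((y - y_pred) ** 2)                 # Residual Sum of Squares
print(ess + rss, tss)                           # ESS + RSS equals TSS for this fit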

The graph below shows the TSS (Total Sum of Squares), ESS (Explained Sum of Squares) and RSS (Residual Sum of Squares).

The goodness of the fit is denoted by R². It expresses what portion of the total variation in the data is explained by the model, in other words, what proportion of TSS is accounted for by ESS.

Hence, R² = ESS/TSS
=> R² = (TSS - RSS)/TSS
=> R² = 1 - (RSS/TSS)
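As a minimal sketch (NumPy plus scikit-learn, same made-up data as above), R² computed as 1 - RSS/TSS matches the value returned by sklearn's r2_score:

import numpy as np
from sklearn.metrics import r2_score

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([32, 24, 26, 30, 28], dtype=float)

slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)
rss = np.sum((y - y_pred) ** 2)

r2 = 1 - rss / tss                  # R² = 1 - RSS/TSS
print(r2, r2_score(y, y_pred))      # the two values should agree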

From the above graph we also observe that the larger the ESS, the better the goodness of fit, and the smaller the ESS, the poorer the fit. If ESS is at its maximum, i.e. ESS equals TSS, then R² will be 1.
In the worst-case scenario ESS will be 0, and hence R² will be 0. So the higher the value of R², the better the fit of the model.

Therefore, 0 ≤ R² ≤ 1

Please note that R² can also be negative in the rare situation where the model does not follow the trend of the data at all (so that RSS > TSS), but such a model is in any case useless.
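To illustrate, here is a small hypothetical example where the predictions are worse than simply predicting the mean, so RSS > TSS and R² comes out negative:

import numpy as np

y_true = np.array([32, 24, 26, 30, 28], dtype=float)
y_bad = np.array([20, 36, 34, 18, 22], dtype=float)   # predictions that go against the data

tss = np.sum((y_true - y_true.mean()) ** 2)
rss = np.sum((y_true - y_bad) ** 2)
print(1 - rss / tss)                                  # negative: worse than predicting the mean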

Adjusted R-Squared (Adjusted R²)

Before jumping into the concept of Adjusted R², let us first understand what the Degree of Freedom is.

Degree of Freedom
There are different approaches to determining the degree of freedom depending on the context. Let us understand the degree of freedom for the equation given below:

x1 + x2 + x3 = 500

In the above equation we have the freedom to choose any value of x1, so let us choose an arbitrary value x1 = 100. Similarly, we have the freedom to choose any value of x2; let us choose x2 = 350.
But when we come to the 3rd variable, we do not have the freedom to choose any value of x3: we must choose a value of x3 such that the previously chosen values of x1 and x2, together with x3, satisfy the equation.
This implies that we have to choose x3 = 50, as this is the only value that satisfies x1 + x2 + x3 = 500.
So out of the 3 variables in the equation, we had the freedom to choose values for 2 (i.e. 3 - 1) of them. Similarly, for an equation with ‘n’ variables we have the freedom to choose values for n - 1 of them. This value n - 1 is called the Degree of Freedom.

Degree of Freedom = n-1
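A tiny sketch of the same idea, assuming a fixed total of 500: the first n - 1 values can be chosen freely, and the last one is forced.

total = 500
x1, x2 = 100, 350        # chosen freely
x3 = total - x1 - x2     # no freedom left: x3 must be 50
print(x3)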

Now let us understand the degree of freedom in terms of regression.

Given a dataset with ‘n’ observations (samples) as shown below:

The above dataset can be represented by the following linear equation:

Y = β0 + β1*X1 + β2*X2 + β3*X3

where ‘β0’ is the intercept and β1, β2 and β3 are the slope coefficients that we need to estimate.

So let the number of observations be ‘n’, the number of slope coefficients be ‘k’, and the number of intercepts always be 1 (which is β0).

So the Degree of Freedom in this case would be:

Number of observations - number of slope coefficients to be determined - number of intercepts.

Which implies that:

Degree of Freedom for Linear Regression = n - k - 1

So for the above dataset having 3 features the degree of freedom (df) will be:

df = n - 3 - 1 = n - 4
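As a rough sketch (scikit-learn's LinearRegression on made-up data), the degrees of freedom follow directly from the shape of the feature matrix:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                                    # n = 20 observations, k = 3 features
y = 2 + X @ np.array([1.5, -0.5, 3.0]) + rng.normal(scale=0.5, size=20)

model = LinearRegression().fit(X, y)
n, k = X.shape
df = n - k - 1                                                  # degrees of freedom of the regression
print(model.intercept_, model.coef_, df)                        # 1 intercept, k slopes, df = 16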

We also infer that for the regression to be estimated, the number of observations should always be greater than the number of parameters (coefficients plus intercept); otherwise the system is underdetermined and the regression cannot be fitted properly.

Number of observations (n) > Number of parameters (k + 1)

So in the above case, since the number of parameters is 4 (3 slope coefficients and 1 intercept), we need at least 5 observations for the regression to work.

As we saw earlier, R² provides a metric of how good the model fit is, so why would there be a need for Adjusted R²?

To answer the above question, let us understand the problem faced when using R².
Let us add a new variable ‘X4’ to our previous equation. The new equation becomes:

Y = β0 + β1*X1 + β2*X2 + β3*X3 + β4*X4

We can clearly see that the new degree of freedom (df1) will decrease as given below:

df1= n - k - 1 = n - 4 - 1 = n - 5

If the degree of freedom decreases, then the explanatory power of the model also decreases. However, as the number of variables increases, the R² value tends to increase (or at worst stay the same), giving the impression that the explanatory power of the model is increasing or constant, which may not actually be true.

On the other hand, if the new variable is highly relevant, then the explanatory power of the model genuinely increases.
So depending on how relevant the newly added variable is, the goodness of fit or explanatory power of the model increases accordingly.

Hence there can be a situation where the newly added variable is highly relevant to the business domain: the model's explanatory power increases substantially while the degree of freedom decreases, and the net effect is still an overall increase in explanatory power.
There can also be the opposite scenario, where the newly added variable is not significantly relevant: the model's explanatory power barely increases, and the net effect is dominated by the decrease in the degree of freedom.

If we use R² as the metric in this case, it will never decrease, even when the degree of freedom decreases and the net effect is a decrease in the model's explanatory power. So in this case R² does not give a correct picture.

Adjusted R² comes to the rescue

Adjusted R² is a more suitable metric in this scenario: it captures the overall net effect on the model's goodness of fit when a new variable is added, weighing how relevant that variable is against the simultaneous decrease in the degree of freedom.

Now let us determine Adjusted R². First let us recall the following formula:

ESS + RSS = TSS
ESS/TSS + RSS/TSS = 1
R² = 1 - RSS/TSS

Adjusting the R² by adjusting RSS and TSS with their corresponding degrees of freedom

Since RSS is related to the regression line and TSS is related to the mean value of ‘Y’, we adjust RSS by the regression's degrees of freedom (n - k - 1) and TSS by its own degrees of freedom (n - 1).

The adjusted R² formula is written as:

Adjusted R² = 1 - [RSS/(n - k - 1)] / [TSS/(n - 1)]
=> Adjusted R² = 1 - [(n - 1)/(n - k - 1)] * (RSS/TSS)

Since RSS/TSS = 1 - R², the formula can also be written in the form:

Adjusted R² = 1 - [(1 - R²) * (n - 1)/(n - k - 1)]
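As a minimal sketch (NumPy and scikit-learn on made-up data, with a small adjusted_r2 helper written for this example), adding a purely random feature nudges R² up slightly while Adjusted R² typically falls:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def adjusted_r2(r2, n, k):
    # Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(42)
n = 30
X = rng.normal(size=(n, 3))                               # 3 genuinely relevant features
y = 2 + X @ np.array([1.5, -0.5, 3.0]) + rng.normal(scale=1.0, size=n)

X_plus_noise = np.hstack([X, rng.normal(size=(n, 1))])    # add one irrelevant feature

for features in (X, X_plus_noise):
    k = features.shape[1]
    pred = LinearRegression().fit(features, y).predict(features)
    r2 = r2_score(y, pred)
    print(k, round(r2, 4), round(adjusted_r2(r2, n, k), 4))
# R² never decreases when the noise feature is added; Adjusted R² usually does.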

Adjusted R² will increase or decrease on the addition of new variables depending on how the addition affects the overall explanatory power of the model, which R² may not be able to indicate in certain cases. We therefore conclude that Adjusted R² is a more reliable metric than R² whenever variables are added to or removed from the model during the regression process.
