R-Squared and Adjusted R-Squared

Abhigyan
Published in Analytics Vidhya
Mar 2, 2020 · 4 min read


When working with a linear regression model, calling the summary() function on the model in R, or using the OLS() function from the statsmodels.api package in Python, you will come across several statistical measures, including R-squared, Adjusted R-squared, the F-statistic, and p-values. Let’s talk about R-squared and Adjusted R-squared.
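As a quick illustration, here is a minimal sketch of fitting an OLS model with statsmodels and printing its summary; the data and coefficients are made up purely for demonstration.

```python
import numpy as np
import statsmodels.api as sm

# Toy data: y depends linearly on x, plus some noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

# statsmodels does not add an intercept by default
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

print(results.summary())     # includes R-squared, Adj. R-squared, F-statistic
print(results.rsquared)      # R-squared alone
print(results.rsquared_adj)  # Adjusted R-squared alone
```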

R-Squared:

R-squared is a measure of the extent to which your input variables explain the variance of the response variable.

Variance is a statistical measure of how far the points are spread out from each other. In other terms, it is defined as the average of the squared differences between each individual point and the expected value (the mean).
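To make that definition concrete, here is a tiny sketch (the numbers are arbitrary) computing variance by hand and with NumPy:

```python
import numpy as np

points = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Average of squared differences from the mean
manual_variance = np.mean((points - points.mean()) ** 2)

print(manual_variance)  # 4.0
print(np.var(points))   # 4.0, NumPy's population variance matches
```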

R-squared is a relative measure: it compares your model against the mean model, a baseline that always predicts the mean. Its value ranges from 0 to 1, and the closer it is to 1, the more the model explains the variability of the response data around its mean.

So does a greater R-squared value mean a better model? Not necessarily; a higher R-squared value does not by itself tell you whether the model is good or bad.


There can be cases where the value of R-squared is low for a good model, and cases where it is high for a model that does not fit the data well. R-squared only provides an estimate of the strength of the relationship between your model and the response variable; it does not provide a formal hypothesis test for this relationship. The F-test of overall significance determines whether this relationship is statistically significant. Refer to this article for why I make that statement.

Because R-squared is measured relative to the mean model, adding more variables tends to increase its value even when those variables contribute little to the model. This can also lead to overfitting, where the model becomes overly customized to the peculiarities and random noise in your sample rather than reflecting the entire population.
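Here is a small sketch of that effect on entirely synthetic data: appending columns of pure noise still nudges R-squared upward, while adjusted R-squared (covered below) falls.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)

X = sm.add_constant(x)
for extra in range(6):
    fit = sm.OLS(y, X).fit()
    print(f"{extra} noise columns: R2={fit.rsquared:.4f}, adj R2={fit.rsquared_adj:.4f}")
    # Append one more column of pure noise, unrelated to y
    X = np.column_stack([X, rng.normal(size=n)])
```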

Formula of R-Squared:

R² = 1 − (SSres / SStot)

where SSres = Σ(yi − ŷi)² and SStot = Σ(yi − ȳ)²

Here, SSres is the Sum of Squared Residuals, the variance left unexplained by the model.

SStot is the Total Sum of Squares, the total variance of the data around its mean.

yi is the actual observation.

ŷi is the predicted value.

ȳ is the mean of the actual values.
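Translating the formula directly into code, again on made-up data, and checking it against what statsmodels reports:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
y_hat = fit.predict(X)

ss_res = np.sum((y - y_hat) ** 2)     # variance left unexplained by the model
ss_tot = np.sum((y - y.mean()) ** 2)  # total variance around the mean
r_squared = 1 - ss_res / ss_tot

print(r_squared)
print(fit.rsquared)  # matches the manual computation
```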

Adjusted R-Squared:

Since R-squared can be increased simply by adding more variables, which may lead to overfitting, Adjusted R-squared comes into the picture.

Adjusted R-squared is a modified version of R-squared whose value increases only when a new variable actually adds value to the model. So the more useless variables the model contains, the lower its Adjusted R-squared will be, even as its R-squared keeps climbing. This is why Adjusted R-squared is always less than or equal to R-squared.

Both R-squared and Adjusted R-squared give an idea of how well the data points fall along the regression line. The difference between them is that Adjusted R-squared estimates the percentage of variation explained by only those independent variables that actually affect the dependent variable, whereas R-squared credits every variable with explaining variation in the dependent variable.

Formula of Adjusted R-Squared:

Adjusted R² = 1 − [(1 − R²)(n − 1) / (n − k − 1)]

where,

n is the number of data points.

k is the number of independent variables (predictors) in the model.
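A short sketch, again on synthetic data, computing Adjusted R-squared from the formula and comparing it to the value statsmodels reports:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 100, 3
X = rng.normal(size=(n, k))  # k independent variables
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()

r2 = fit.rsquared
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adj_r2)
print(fit.rsquared_adj)  # matches the formula above
```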

Like my article? Do give me a clap and share it, as that will boost my confidence. Also, I post new articles every Sunday, so stay connected for future articles in this basics of data science and machine learning series.

Also, if you want, connect with me on LinkedIn.

