Data Science (Python) :: R Squared

The intention of this post is to give a quick refresher on the concept of "R-Squared" (using Python); it's assumed that you are already familiar with the basics. You can treat this as an FAQ as well.

What is the residual sum of squares (sometimes simply called the residual)?
The residual sum of squares is the sum of the squared differences between the actual values and the values predicted by a model. The model that gives the smallest residual sum of squares is considered the best model.
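As a minimal sketch with NumPy (the actual and predicted values here are made-up example data, not from any specific model):

```python
import numpy as np

actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.8, 5.3, 6.9, 9.4])

# Residual sum of squares: sum of squared (actual - predicted)
rss = np.sum((actual - predicted) ** 2)  # ~0.30 for this data
```

When comparing candidate models on the same data, the one with the lower `rss` fits the observations more closely.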


What is the total sum of squares?
When we draw a horizontal line at the average of the dependent values and calculate the residual against this line, the result is called the total sum of squares. Basically, we apply the same formula as for the residual sum of squares, but against the line at the mean (which is parallel to the X-axis).
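Continuing the sketch, the total sum of squares uses the mean of the dependent values as the "prediction" (again with made-up example data):

```python
import numpy as np

actual = np.array([3.0, 5.0, 7.0, 9.0])

# The "model" here is just a horizontal line at the mean (6.0)
mean_line = np.full_like(actual, actual.mean())
tss = np.sum((actual - mean_line) ** 2)  # 20.0 for this data
```

The total sum of squares is what the residual sum of squares would be for the most naive model possible: always predicting the mean.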


What is R-Squared and what is its significance in modelling?
R-Squared = 1 - (residual sum of squares / total sum of squares)
A model with an R-Squared close to 1 is considered a well-fitting model; the farther R-Squared is from 1, the worse the model fits.
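Putting the two previous quantities together gives R-Squared directly (same made-up example values as above):

```python
import numpy as np

actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.8, 5.3, 6.9, 9.4])

rss = np.sum((actual - predicted) ** 2)      # residual sum of squares
tss = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
r_squared = 1 - rss / tss                    # ~0.985 here
```

`sklearn.metrics.r2_score(actual, predicted)` computes the same quantity if scikit-learn is available.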


If we keep increasing the number of variables (multiple linear regression), will R-Squared increase or decrease, and why?
As variables are added, R-Squared can never decrease: it will either stay the same or increase. This is because when we add a new variable, the model will find a coefficient that decreases (or at worst leaves unchanged) the residual sum of squares, and when the residual sum of squares decreases, R-Squared increases (look at the formula).
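This can be demonstrated with a small least-squares fit: adding a column of pure noise still cannot lower R-Squared. This is a sketch using NumPy's `lstsq` on synthetic data (all values generated here, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 1))
y = 3 * x[:, 0] + rng.normal(size=n)  # y depends only on the first variable

def r_squared(X, y):
    # Fit ordinary least squares with an intercept column
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = np.sum((y - X1 @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - rss / tss

r2_one = r_squared(x, y)
# Append a purely random, irrelevant column: R-Squared cannot go down
x_extra = np.column_stack([x, rng.normal(size=n)])
r2_two = r_squared(x_extra, y)
```

Here `r2_two >= r2_one` always holds, even though the added column carries no information about `y`.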


What is adjusted R-Squared and what is its significance?
Adjusted R-Squared = 1 - (1 - R²) * [(n - 1) / (n - p - 1)]
p = number of regressors/variables
n = sample size
Adjusted R-Squared counters the problem R-Squared faces when the number of variables increases: it penalizes extra variables, so it only goes up when a new variable improves the fit by more than chance would.
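The formula above translates directly into a one-line function; the sample values below are arbitrary, chosen just to show the penalty growing with the number of regressors:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-Squared for n samples and p regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same raw R-Squared, but more variables -> lower adjusted R-Squared
adjusted_r_squared(0.90, n=50, p=2)   # ~0.896
adjusted_r_squared(0.90, n=50, p=10)  # ~0.874
```

Note that unlike R-Squared, the adjusted value can decrease when a variable is added, which is exactly what makes it useful for comparing models with different numbers of regressors.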

Next :- Data Science :: Advantages & Disadvantages of Each Regression Model

Prev :- Data Science (Python) :: Random Forest Regression

If you liked this article, please hit the ❤ icon below