Interpreting results of OLS

Stuti Singh · Published in Analytics Vidhya · 5 min read · Jul 5, 2020

Whether you are new to data science or an experienced veteran, interpreting the results of a machine learning algorithm can be a challenge. Do the numbers in the output simply tell you how well the model fit the data you trained it on, or something more? Linear regression is one of the most commonly used methods for inference and prediction, yet people often ignore the OLS assumptions before interpreting its results. Understanding the statistics reported by OLS is therefore an important step. In this post I explore a house price prediction dataset, a small, simple dataset containing observations of house characteristics and prices.

Dependent (predicted) variable: price

Independent variable: size
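For context, here is a minimal sketch of how such a model could be fitted with Python's statsmodels; the file name real_estate.csv and the exact column names are assumptions for illustration, not the original author's code.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical dataset: one row per house, with 'price' and 'size' columns
df = pd.read_csv('real_estate.csv')

y = df['price']                   # dependent (predicted) variable
X = sm.add_constant(df['size'])   # independent variable plus an intercept (const) column

results = sm.OLS(y, X).fit()
print(results.summary())          # produces the three-part summary discussed below
```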

First Part (Model Summary) Interpretation

Dep. Variable: The dependent variable here is price, which the model is going to predict.

Model: OLS stands for Ordinary Least Squares. Ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS selects the parameters of a linear function of a set of explanatory variables by the principle of least squares.

Method: Least squares is a standard approach in regression analysis to approximate the solution by minimising the sum of the squares of the residuals.
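In symbols, for this one-predictor model, least squares picks the intercept $b_0$ and slope $b_1$ that minimise the sum of squared residuals:

$$\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \bigl(\text{price}_i - (b_0 + b_1 \cdot \text{size}_i)\bigr)^2$$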

No. of Observations: The total number of observations present in the dataset.

Df Residuals: The residual degrees of freedom is the sample size minus the number of parameters being estimated, so df(Residual) = n - (k + 1), where n is the number of observations and k is the number of independent variables.

In our case n = 100 and k = 1, so df(Residual) = 100 - (1 + 1) = 98.
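Assuming the results object from the sketch above, these counts can be read directly from the fitted model:

```python
print(results.nobs)      # n, the number of observations (100 here)
print(results.df_model)  # k, the number of independent variables (1 here)
print(results.df_resid)  # n - (k + 1) = 98
```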

R-Squared: R² is a statistic that gives some information about the goodness of fit of a model. It ranges from 0 to 1. In our case the value of R-squared is 0.745, meaning that about 74.5% of the variance in price is explained by the model.

“What's a good value for R-squared?” The answer is “it depends.” It depends on your goals and on how the dependent variable is defined. If the dependent variable is a nonstationary (e.g., trending) time series, an R-squared value very close to 1 may not be very impressive. In fact, if R-squared is very close to 1 and the data are a time series, this is usually a bad sign: there will often be significant time patterns left in the errors. On the other hand, an R-squared of 10% or even less can still carry information when you are looking for a weak signal in the presence of a lot of noise, in a setting where even a very weak signal is of general interest. Never let yourself fall into the trap of fitting a regression model that has a respectable-looking R-squared but is actually much inferior to a simple time series model.

R-squared is not the bottom line.

Adj. R-Squared: Each time you add an independent variable to the model, R-squared increases even if the new variable is not significant; it never decreases. Adjusted R-squared, by contrast, increases only when the added independent variable is significant and actually affects the dependent variable.

Adjusted R-squared should therefore be used while selecting important predictors (independent variables) for the regression model.
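For reference, adjusted R-squared follows the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1). A small sketch, again assuming the results object fitted earlier:

```python
n = int(results.nobs)        # number of observations
k = int(results.df_model)    # number of independent variables
r2 = results.rsquared

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adj_r2, results.rsquared_adj)   # the manual value matches the summary's Adj. R-squared
```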

F-statistic and Prob(F-statistic): The F value and Prob(F) statistics test the overall significance of the regression model. Specifically, they test the null hypothesis that all of the regression coefficients are equal to zero. The F value is the ratio of the mean regression sum of squares to the mean error sum of squares. Its value can range from zero to an arbitrarily large number.

The value of Prob(F) is the probability that the null hypothesis for the full model is true (i.e., that all of the regression coefficients are zero). For example, if Prob(F) is 8.13e-31, there is essentially no chance that all of the regression parameters are zero.
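Both statistics are exposed on the fitted results object; a quick sketch under the same assumptions:

```python
print(results.fvalue)     # F-statistic for the overall regression
print(results.f_pvalue)   # Prob(F-statistic), e.g. a value on the order of 1e-31
```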

AIC & BIC: AIC is an abbreviation for Akaike's Information Criterion and is used for model selection. It penalizes model complexity: every variable added to the regression equation increases the penalty. It is calculated from the number of estimated parameters and the maximized likelihood of the model (twice the number of parameters minus twice the log-likelihood). A lower AIC implies a better model. BIC stands for Bayesian Information Criterion and is a variant of AIC in which the complexity penalty is more severe.
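The underlying formulas are AIC = 2k - 2·ln(L̂) and BIC = k·ln(n) - 2·ln(L̂), where k is the number of estimated parameters and L̂ the maximised likelihood. A minimal sketch, assuming the results object from before (for a model with a constant, statsmodels counts the intercept in k):

```python
import numpy as np

llf = results.llf                 # maximised log-likelihood, ln(L)
k = results.df_model + 1          # estimated parameters: slope plus intercept
n = results.nobs

aic = 2 * k - 2 * llf
bic = k * np.log(n) - 2 * llf
print(aic, results.aic)           # both pairs should agree
print(bic, results.bic)
```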

Second Part (Coefficient Table) Interpretation

coef: Here the coefficients for const and size are 1.019e+05 and 223.17, so if we write Price = b0 + b1 * size,

it becomes Price = (1.019e+05) + 223.17 * size.
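To see the equation in action, here is a small sketch that predicts the price of a hypothetical house of size 800 (the size value is purely illustrative):

```python
b0, b1 = results.params             # intercept (const) and slope (size)
print(b0 + b1 * 800)                # manual: (1.019e+05) + 223.17 * 800

print(results.predict([[1, 800]]))  # the same value from the fitted model (const = 1, size = 800)
```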

std err: It shows the standard error of each coefficient estimate. The lower the standard error, the more precise the estimate.

t & P>|t|: These show the t-statistic and its p-value for each coefficient. They come from a hypothesis test that answers the question: is this a useful variable, does it help us explain the variability in the dependent variable? As a rule of thumb, a p-value < 0.05 marks the variable as significant. In our example, “size” is a significant predictor of “price”.
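Every column of the coefficient table can also be pulled out programmatically; a quick sketch with the same results object:

```python
print(results.params)      # coef column
print(results.bse)         # std err column
print(results.tvalues)     # t column
print(results.pvalues)     # P>|t| column; values below 0.05 flag significant predictors
print(results.conf_int())  # the [0.025, 0.975] confidence-interval columns
```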

Third Part Interpretation

Let’s look at each of the values listed:

Omnibus/Prob(Omnibus): The Omnibus test checks whether the errors are normally distributed (one of the assumptions of linear regression). Here the null hypothesis is that the errors are normally distributed, so a test statistic close to zero is preferred, as it indicates normality. Prob(Omnibus) is the corresponding p-value, i.e., the probability of seeing such residuals if they really are normally distributed; a value close to 1 is preferred here.

Skew: A value of skew close to zero is preferred, indicating that the residual distribution is symmetric, as a normal distribution would be. Note that this value also feeds into the Omnibus statistic.

Kurtosis: It is a measure of the peakedness and tail weight of the residual distribution; higher peaks and heavier tails lead to greater kurtosis. A value close to 3 (the kurtosis of a normal distribution) is preferred, whereas a much larger value suggests heavy tails, i.e., more extreme residuals (outliers) than a normal distribution would produce.

Durbin-Watson: It tests for autocorrelation, i.e., the independence of the errors (another assumption of linear regression). The statistic lies between 0 and 4; a value between 1 and 2, ideally close to 2, is preferred.

Jarque-Bera (JB)/Prob(JB): This test also checks the normality of the residuals (one of the assumptions of linear regression). The test is named after Carlos Jarque and Anil K. Bera. The test statistic is always non-negative; a large value of the Jarque-Bera statistic indicates that the errors are not normally distributed.

Condition Number: This measures the sensitivity of a function's output to changes in its input. In the presence of multicollinearity the condition number is large, and small changes in the data can produce much larger fluctuations in the estimates.
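All of these diagnostics can be reproduced from the residuals outside the summary table; a minimal sketch, assuming the fitted results object from earlier:

```python
import numpy as np
from statsmodels.stats.stattools import omni_normtest, jarque_bera, durbin_watson

resid = results.resid

omni, omni_p = omni_normtest(resid)             # Omnibus and Prob(Omnibus)
jb, jb_p, skew, kurt = jarque_bera(resid)       # Jarque-Bera, Prob(JB), Skew, Kurtosis
dw = durbin_watson(resid)                       # Durbin-Watson

print(omni, omni_p, jb, jb_p, skew, kurt, dw)
print(np.linalg.cond(results.model.exog))       # condition number of the design matrix
```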

References:

  1. https://people.duke.edu/~rnau/rsquared.htm
  2. https://medium.com/@jyotiyadav99111/statistics-how-should-i-interpret-results-of-ols-3bde1ebeec01
  3. https://en.wikipedia.org
