Interpreting an OLS Model Summary!

Lokesh Rathi
Published in Analytics Vidhya · 3 min read · Oct 12, 2020

Linear Regression is probably the first model you ever built, whether on the Boston Housing dataset or a salary-prediction problem.

Although the model itself doesn’t require much hyperparameter tuning, interpreting the model’s summary output can be a problem for many.

In this article, I cover the important concepts from the OLS model summary, as well as the related interview questions; this might just come in handy for you!

The dataset and code are in my GitHub repo, linked at the end of this article.

Assume we have a dataset whose target variable is ‘Sound_pressure_level’.

Since the target is a continuous variable, we use Multiple Linear Regression to model it and predict the level for any new data.

df.info()

— Gives an overview of the features present in the dataset and of any missing values.
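As a minimal sketch (the real data lives in the linked repo, so a tiny hypothetical frame with the article’s column names stands in here), the inspection step looks like this:

```python
import pandas as pd

# Hypothetical stand-in rows; column names follow the article's feature list.
df = pd.DataFrame({
    "Frequency(Hz)": [800, 1000, 1250],
    "Angle_of_attack": [0.0, 0.0, 0.0],
    "Chord_length": [0.3048, 0.3048, 0.3048],
    "Free_stream_velocity": [71.3, 71.3, 71.3],
    "Displacement": [0.00266, 0.00266, 0.00266],
    "Sound_pressure_level": [126.2, 125.2, 125.95],
})

df.info()               # dtypes and non-null counts per column
print(df.isna().sum())  # explicit per-column missing-value check
```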

Let us now take a peek into the OLS model.

X — the independent features, such as:

Frequency(Hz)

Angle of Attack

Chord_length

Free_stream_velocity

Displacement

Y — the ‘Target’ variable:

Sound_pressure_level

Import the statsmodels library and build your first model using the following code:

Output:

[Image: OLS summary report]

Let’s understand the key statistics present in the summary:

1. R-squared and Adjusted R-squared:

If the values of Adjusted R-squared and R-squared are very different, it is a sign that some feature/variable might not be relevant to your model.

From our OLS summary, the values are 0.516 and 0.514, so there is not much difference between them.
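The two values are linked by a fixed formula that penalises R-squared for the number of predictors. A small illustration, plugging in the article’s R² of 0.516 with 5 predictors (assuming the airfoil dataset’s roughly 1503 rows):

```python
# R²_adj = 1 - (1 - R²) * (n - 1) / (n - p - 1)
# where n = number of observations, p = number of predictors.
def adjusted_r2(r2: float, n: int, p: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With many rows and few predictors the penalty is tiny,
# which is why 0.516 only drops to about 0.514.
print(round(adjusted_r2(0.516, 1503, 5), 3))  # 0.514
```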

2. F-statistic or F-test:

It assesses the overall significance of the model. In multiple linear regression, it compares the fitted model against a model with no predictors, i.e. it effectively considers two models: one with the predictors and one with only the intercept.

The null hypothesis is that the two models fit equally well, i.e. all slope coefficients are zero.

The alternative hypothesis is that the intercept-only model fits worse than our OLS model.

We get back an F-statistic along with its p-value, which help us reject or fail to reject the null hypothesis.

From our OLS summary, the p-value is very small (0.00) and the F-statistic is high (318.8), so we reject the null hypothesis and conclude that there is a linear relationship between the independent variables and the target.

3. T-test:

Unlike the F-test, the t-test looks at each feature individually and tells us whether that feature has a relationship with the target variable.

The null hypothesis is that the feature’s coefficient is 0.

The alternative hypothesis is that the feature’s coefficient is not 0.

The higher the absolute t-statistic, the stronger the evidence for rejecting the null hypothesis.

From our OLS summary, the t-values are high and each p-value is below 0.05, so we reject the null hypothesis for every feature.

These are a few statistics concepts applied in machine learning that are quite popular with interviewers.

I hope you liked this article; do support my work and share it with friends who are trying to get into the data science field.

Find the GitHub repository link for the data cleaning, visualization, and model pipelining.

Follow me on LinkedIn for more such data science content.

