Interpret R Linear/Multiple Regression output (lm output point by point), also with Python

Vineet Jaiswal
Feb 17, 2018 · 5 min read

Know your data

library(alr3)      ## also loads the required package 'car'
library(corrplot)
data(water)        ## load the data
head(water)        ## view the first six rows

  Year APMAM APSAB APSLAKE OPBPC  OPRC OPSLAKE  BSAAM
1 1948  9.13  3.58    3.91  4.10  7.43    6.47  54235
2 1949  5.28  4.82    5.20  7.55 11.11   10.26  67567
3 1950  4.20  3.77    3.67  9.52 12.20   11.35  66161
4 1951  4.60  4.46    3.93 11.14 15.15   11.13  68094
5 1952  7.15  4.99    4.88 16.34 20.05   22.81 107080
6 1953  9.70  5.65    4.91  8.88  8.15    7.41  67594

filter.water <- water[,-1] ## Remove unwanted year

# Visualize the data
library(GGally)
ggpairs(filter.water) ## pairwise plots; this is a multivariable (multiple) regression problem

LM magic begins, thanks to R

mlr <- lm(BSAAM~., data = filter.water)
summary(mlr)

# Output

Call:
lm(formula = BSAAM ~ ., data = filter.water)

Residuals:
   Min     1Q Median     3Q    Max
-12690  -4936  -1424   4173  18542

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 15944.67    4099.80   3.889 0.000416 ***
APMAM         -12.77     708.89  -0.018 0.985725
APSAB        -664.41    1522.89  -0.436 0.665237
APSLAKE      2270.68    1341.29   1.693 0.099112 .
OPBPC          69.70     461.69   0.151 0.880839
OPRC         1916.45     641.36   2.988 0.005031 **
OPSLAKE      2211.58     752.69   2.938 0.005729 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7557 on 36 degrees of freedom
Multiple R-squared: 0.9248, Adjusted R-squared: 0.9123
F-statistic: 73.82 on 6 and 36 DF, p-value: < 2.2e-16

Output Explained

Residuals

Coefficients-Intercept

#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 15944.67    4099.80   3.889 0.000416 ***

Coefficient-Estimate
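
One way to read the Estimate column is to plug a row of data into the fitted equation. The sketch below does that for the 1948 row shown by head(water), using the rounded estimates printed in the summary, so the result is approximate. The variable names (coef, row_1948, fitted, residual) are my own and not part of R's output.

```python
# Coefficient estimates copied (rounded) from the summary(mlr) output.
coef = {
    "(Intercept)": 15944.67,
    "APMAM": -12.77,
    "APSAB": -664.41,
    "APSLAKE": 2270.68,
    "OPBPC": 69.70,
    "OPRC": 1916.45,
    "OPSLAKE": 2211.58,
}

# Predictor values for 1948, the first row printed by head(water).
row_1948 = {
    "APMAM": 9.13,
    "APSAB": 3.58,
    "APSLAKE": 3.91,
    "OPBPC": 4.10,
    "OPRC": 7.43,
    "OPSLAKE": 6.47,
}
observed = 54235  # BSAAM for 1948

# Fitted value = intercept + sum of (estimate * predictor value).
fitted = coef["(Intercept)"] + sum(coef[name] * value for name, value in row_1948.items())

# Residual = observed - fitted; the Residuals block summarizes these for all rows.
residual = observed - fitted

print(f"fitted   = {fitted:.1f}")
print(f"residual = {residual:.1f}")
```

The fitted value comes out near 51,162 and the residual near 3,073, which sits comfortably inside the Min/Max range reported in the Residuals block.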

Coefficient-Std. Error

Coefficient-t value
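
The t value is simply the estimate divided by its standard error. As a quick check, the sketch below recomputes that ratio for every row of the coefficient table; the numbers are copied from the output, while the list and variable names are mine.

```python
# (name, estimate, std. error, reported t value) copied from the summary output.
rows = [
    ("(Intercept)", 15944.67, 4099.80, 3.889),
    ("APMAM", -12.77, 708.89, -0.018),
    ("APSAB", -664.41, 1522.89, -0.436),
    ("APSLAKE", 2270.68, 1341.29, 1.693),
    ("OPBPC", 69.70, 461.69, 0.151),
    ("OPRC", 1916.45, 641.36, 2.988),
    ("OPSLAKE", 2211.58, 752.69, 2.938),
]

# t value = Estimate / Std. Error
t_values = {name: estimate / std_error for name, estimate, std_error, _ in rows}

for name, _, _, reported in rows:
    print(f"{name:<12} computed t = {t_values[name]:7.3f}   reported t = {reported:7.3f}")
```

A large absolute t value means the estimate is many standard errors away from zero; values near zero (APMAM, OPBPC) mean the data cannot distinguish that coefficient from zero.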

Coefficient Pr(>|t|)

# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
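
Pr(>|t|) is the two-sided tail probability of a Student's t distribution with the residual degrees of freedom (36 here), and the stars come from comparing that probability with the cutoffs in the legend. The sketch below reproduces both steps using only the standard library, integrating the t density with Simpson's rule; in practice `scipy.stats.t.sf` would be the usual route. The helper names (t_pdf, two_sided_p, signif_code) are hypothetical, and the inputs are the rounded t values from the table, so the last digits can differ slightly from R.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_value, df, steps=2000):
    """Pr(>|t|): 1 minus the area under the density between -|t| and |t|."""
    b = abs(t_value)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # area between 0 and |t|
    return 1 - 2 * half_area


def signif_code(p):
    """Map a p-value to the symbol from the 'Signif. codes' legend."""
    for cutoff, symbol in [(0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")]:
        if p <= cutoff:
            return symbol
    return " "


df_resid = 36  # residual degrees of freedom from the output
for name, t_value in [("(Intercept)", 3.889), ("APSLAKE", 1.693), ("OPRC", 2.988)]:
    p = two_sided_p(t_value, df_resid)
    print(f"{name:<12} p = {p:.6f} {signif_code(p)}")
```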

Residual standard error

Residual standard error: 7557 on 36 degrees of freedom
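
The residual standard error is the square root of the residual sum of squares (RSS) divided by the residual degrees of freedom, and those degrees of freedom are n - p - 1. The sketch below checks the "36 degrees of freedom" and backs out the RSS. Note that n = 43 is inferred from the output (36 + 6 predictors + 1 intercept), not read from the data, and the variable names are mine.

```python
import math

n_obs = 43        # inferred: 36 residual df + 6 predictors + 1 intercept
n_predictors = 6  # APMAM, APSAB, APSLAKE, OPBPC, OPRC, OPSLAKE
rse = 7557        # residual standard error from the output

# Residual degrees of freedom: observations minus estimated coefficients.
df_resid = n_obs - n_predictors - 1

# RSE = sqrt(RSS / df), so RSS = RSE^2 * df (approximate, since RSE is rounded).
rss = rse**2 * df_resid

print(f"residual df = {df_resid}")
print(f"RSS         = {rss:.4g}")
print(f"check RSE   = {math.sqrt(rss / df_resid):.0f}")
```

Because the RSE is in the units of the response, it reads as "a typical prediction misses BSAAM by roughly 7,557".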

Multiple R-squared and Adjusted R-squared

Multiple R-squared:  0.9248,	Adjusted R-squared:  0.9123
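
Adjusted R-squared penalizes Multiple R-squared for the number of predictors: adj = 1 - (1 - R²)(n - 1)/(n - p - 1). A small sketch of that arithmetic, assuming n = 43 observations (inferred from the 36 residual degrees of freedom plus 6 predictors and an intercept):

```python
r_squared = 0.9248  # Multiple R-squared from the output
n_obs = 43          # inferred from the degrees of freedom, not read from the data
n_predictors = 6

# Adjusted R-squared shrinks R-squared according to how many predictors were used.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"adjusted R-squared = {adj_r_squared:.4f}")
```

The result agrees with the reported 0.9123 to four decimals. The gap between the two values stays small here because six predictors is modest relative to 43 observations.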

F-statistic

F-statistic: 73.82 on 6 and 36 DF
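
The F-statistic compares the variance explained per predictor with the unexplained variance per residual degree of freedom, and it can be recovered from R-squared: F = (R²/p) / ((1 - R²)/(n - p - 1)). The sketch below uses the rounded R-squared from the output, so it lands near, not exactly on, 73.82.

```python
r_squared = 0.9248  # Multiple R-squared (rounded in the output)
df_model = 6        # numerator degrees of freedom: number of predictors
df_resid = 36       # denominator degrees of freedom: n - p - 1

# Explained variation per predictor over unexplained variation per residual df.
f_stat = (r_squared / df_model) / ((1 - r_squared) / df_resid)

print(f"F = {f_stat:.2f} on {df_model} and {df_resid} DF")
```

The small difference from the reported 73.82 comes from R-squared being rounded to four decimals before the division.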

p-value

p-value: < 2.2e-16
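
This p-value is the upper tail of an F distribution with 6 and 36 degrees of freedom, evaluated at 73.82. The usual route in Python is `scipy.stats.f.sf(73.82, 6, 36)`; the standard-library sketch below instead uses a shortcut that only works when both degrees of freedom are even, where the tail reduces to a short binomial sum. The function name f_upper_tail is hypothetical.

```python
from math import comb


def f_upper_tail(f_value, d1, d2):
    """Pr(F > f_value) for an F(d1, d2) distribution with even d1 and d2.

    The tail equals the regularized incomplete beta function I_x(d2/2, d1/2)
    at x = d2 / (d2 + d1 * f_value); for integer arguments that function is
    a binomial tail sum.
    """
    if d1 % 2 or d2 % 2:
        raise ValueError("this shortcut needs even degrees of freedom")
    a, b = d2 // 2, d1 // 2
    x = d2 / (d2 + d1 * f_value)
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


p_value = f_upper_tail(73.82, 6, 36)
print(f"p-value = {p_value:.3g}")
print("below 2.2e-16:", p_value < 2.2e-16)
```

The value is far below 2.2e-16, which is roughly double-precision machine epsilon. R prints "< 2.2e-16" rather than the tiny number itself, so the line means "effectively zero": at least one predictor is related to BSAAM.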

The same model in Python

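import pandas as pd
from statsmodels.formula.api import ols

# Assumption: the water data was exported from R to a CSV file,
# e.g. with write.csv(water, "water.csv", row.names = FALSE)
df = pd.read_csv("water.csv")

# Same model as lm(BSAAM ~ ., data = filter.water) in R
mlr = ols("BSAAM ~ OPSLAKE + OPRC + OPBPC + APSLAKE + APSAB + APMAM", df).fit()
print(mlr.summary())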
