MSE vs MLE for linear regression

Abhijeet Pokhriyal · Published in Analytics Vidhya · Nov 29, 2019

What is MSE?

The mean squared error (MSE) or mean squared deviation (MSD) of an estimator (a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values.
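For concreteness, here is a minimal R sketch of that definition (the actual and predicted vectors are made-up numbers for illustration):

## MSE: the average of the squared differences between
## estimated values and actual values
mse <- function(actual, predicted) {
  mean((actual - predicted)^2)
}

actual    <- c(3.0, 5.0, 2.5, 7.0)
predicted <- c(2.8, 5.3, 2.9, 6.8)
mse(actual, predicted)
## [1] 0.0825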

What is MLE?

(Source: StatQuest with Josh Starmer)

Maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable.

Basically, we:

  1. First assume that the data come from a certain distribution.
  2. Then randomly pick some parameters for that distribution.
  3. Then calculate the likelihood of the observed data under the assumed distribution.
  4. Then use an optimization algorithm, such as gradient descent, to find the parameters that maximize the likelihood (a toy version of these steps is sketched below).
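As a toy illustration of the four steps above (a minimal sketch, assuming the data really come from a Normal(mu, 1) distribution; a simple one-dimensional optimizer stands in for gradient descent):

## 1. Assume the data come from a Normal(mu, 1) distribution
data <- rnorm(50, mean = 5)

## 2. Pick a candidate parameter mu and
## 3. compute the likelihood of the observed data under it;
##    we work with the negative log-likelihood for numerical stability
negloglik <- function(mu) -sum(dnorm(data, mean = mu, log = TRUE))

## 4. Let an optimizer find the mu that maximizes the likelihood
##    (i.e. minimizes the negative log-likelihood)
optimize(negloglik, interval = c(0, 10))$minimum
## should land very close to mean(data)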


MSE vs MLE for linear regression

• The goal of this article is to see empirically whether the estimates found by minimizing MSE are the same as the estimates found by MLE.

Generating data

## Random X from a normal distribution
x <- rnorm(100, mean = 20)

## Let this be the "true" phenomenon
b0 <- 10
b1 <- 20
y <- b1*x + b0 + rnorm(100)

## Convert to a data frame
df <- data.frame(x = x, y = y)
head(df)
##          x        y
## 1 21.02084 430.5484
## 2 19.60804 401.8297
## 3 21.19639 434.3705
## 4 19.05572 392.9510
## 5 18.55657 381.0414
## 6 21.69841 445.1470

MSE Estimate

model <- lm(data = df , formula = y ~ x)
summary(model)

Model Summary

##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.4102 -0.7222 -0.1791  0.7138  2.7849
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   10.800      2.249   4.803 5.63e-06 ***
## x             19.964      0.112 178.206  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.095 on 98 degrees of freedom
## Multiple R-squared:  0.9969, Adjusted R-squared:  0.9969
## F-statistic: 3.176e+04 on 1 and 98 DF,  p-value: < 2.2e-16
  • We see that our model did a good job of recovering the true parameters by minimizing MSE: the intercept is estimated as 10.800 and b1 as 19.964.

## (Intercept)   10.800
## x             19.964
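lm() fits by ordinary least squares, i.e. by minimizing the mean squared error. As a sanity check (a sketch, not what lm() does internally, which uses a QR decomposition), we can minimize the MSE directly with a general-purpose optimizer and recover essentially the same coefficients:

## MSE as a function of the candidate coefficients (b0, b1)
mse_loss <- function(par) {
  mean((df$y - par[1] - par[2]*df$x)^2)
}

## Nelder-Mead search over (b0, b1), starting from (0, 0)
optim(par = c(0, 0), fn = mse_loss)$par
## should come out very close to c(10.800, 19.964)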

MLE Estimate

library(stats4)

## Negative log-likelihood of the residuals under a Normal(0, 1) error model
## (stats4::mle minimizes, so we return the negative log-likelihood)
loglikelihood <- function(b0, b1) {
  -sum(dnorm(df$y - df$x*b1 - b0, log = TRUE))
}

mle(loglikelihood, start = list(b0 = 1, b1 = 1))

Note that dnorm()'s default sd = 1 matches the unit-variance noise we used to generate the data.
##
## Call:
## mle(minuslogl = loglikelihood, start = list(b0 = 1, b1 = 1))
##
## Coefficients:
##       b0       b1
## 10.80034 19.96405
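One way to see the agreement directly (assuming model and the loglikelihood function above are still in scope) is to line the two sets of coefficients up:

## Refit by MLE and compare with the least-squares coefficients
fit_mle <- mle(loglikelihood, start = list(b0 = 1, b1 = 1))
rbind(MSE = unname(coef(model)), MLE = unname(coef(fit_mle)))
## the two rows agree to within numerical tolerance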

IS THIS MAGIC OR WHAT??!!!

• We see that the MLE estimates are equal to the MSE estimates!

• Why?

• Here is the mathematical proof, in outline:
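A sketch of the standard argument, assuming i.i.d. Gaussian noise with known variance \sigma^2. Under the model y_i = b_0 + b_1 x_i + \varepsilon_i with \varepsilon_i \sim N(0, \sigma^2), the likelihood of the data is

\[
L(b_0, b_1) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_i - b_0 - b_1 x_i)^2}{2\sigma^2} \right)
\]

so the log-likelihood is

\[
\log L(b_0, b_1) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 .
\]

The first term does not depend on b_0 or b_1, and the second is a negative constant times the sum of squared errors, so maximizing the log-likelihood over (b_0, b_1) is exactly minimizing \sum_i (y_i - b_0 - b_1 x_i)^2, i.e. minimizing the MSE. That is why the two sets of estimates coincide.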
