MSE vs MLE for linear regression
What is MSE ?
The mean squared error (MSE), or mean squared deviation (MSD), of an estimator (a procedure for estimating an unobserved quantity) measures the average of the squares of the errors: the average squared difference between the estimated values and the actual values.
What is MLE ?
Maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable.
Basically we:
- First assume that the data comes from a certain distribution
- Then randomly pick some parameters for that distribution
- Then calculate the likelihood of the observed data under the assumed distribution
- Then use an optimization algorithm, such as gradient descent, to find the parameters of the assumed distribution that maximize the likelihood
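The steps above can be sketched in R. This is a minimal illustration and not the article's code: we assume the data are draws from a normal distribution with unknown mean and standard deviation (our choice for this sketch), and we let the general-purpose optimizer `optim` maximize the log-likelihood by minimizing its negative.

```r
set.seed(1)

# Step 1: assume the data come from a normal distribution;
# here we simulate observations to play the role of "observed data"
obs <- rnorm(200, mean = 5, sd = 2)

# Step 3: (negative) log-likelihood of the data under N(mu, sigma)
neg_loglik <- function(par) {
  mu <- par[1]
  sigma <- par[2]
  if (sigma <= 0) return(Inf)  # keep sigma in its valid range
  -sum(dnorm(obs, mean = mu, sd = sigma, log = TRUE))
}

# Steps 2 and 4: start from an arbitrary guess and let the
# optimizer find the parameters that maximize the likelihood
fit <- optim(par = c(0, 1), fn = neg_loglik)
fit$par  # estimates of mu and sigma, close to 5 and 2
```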
MSE vs MLE for linear regression
• The goal of this article is to empirically check whether the estimates found by minimizing the MSE are the same as the estimates found by MLE
Generating data
## Random X from normal distribution
x <- rnorm(100, mean = 20)

## Let this be the "True" Phenomenon
b0 <- 10
b1 <- 20
y <- b1*x + b0 + rnorm(100)

## Convert to dataframe
df <- data.frame(x = x, y = y)
head(df)

##          x        y
## 1 21.02084 430.5484
## 2 19.60804 401.8297
## 3 21.19639 434.3705
## 4 19.05572 392.9510
## 5 18.55657 381.0414
## 6 21.69841 445.1470
MSE Estimate
model <- lm(data = df , formula = y ~ x)
summary(model)
Model Summary
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4102 -0.7222 -0.1791 0.7138 2.7849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.800 2.249 4.803 5.63e-06 ***
## x 19.964 0.112 178.206 < 2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 1.095 on 98 degrees of freedom
## Multiple R-squared: 0.9969, Adjusted R-squared: 0.9969
## F-statistic: 3.176e+04 on 1 and 98 DF, p-value: < 2.2e-16
- We see our model did a good job of recovering the true parameters (b0 = 10, b1 = 20) using MSE: the intercept is estimated as 10.800 and the slope b1 as 19.964
## (Intercept) 10.800
## x 19.964
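Under the hood, `lm()` minimizes the sum of squared residuals, which has a closed-form solution. As a sketch (regenerating data the same way as above, with our own seed, so the exact numbers differ from the article's), the normal-equations solution matches `lm()`'s coefficients:

```r
set.seed(42)

# Regenerate data following the article's recipe
x <- rnorm(100, mean = 20)
y <- 20*x + 10 + rnorm(100)

# Least squares has a closed form: beta_hat = (X'X)^{-1} X'y,
# where X carries a column of ones for the intercept
X <- cbind(1, x)
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
beta_hat  # first entry ~ intercept, second ~ slope

# lm() minimizes the same sum of squared residuals, so the
# coefficients agree up to floating-point noise
coef(lm(y ~ x))
```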
MLE Estimate
loglikelihood <- function(b0, b1){
  # Negative log-likelihood of the residuals under a standard normal;
  # mle() minimizes this, which is the same as maximizing the likelihood
  -sum(dnorm(df$y - df$x*b1 - b0, log = TRUE))
}

library(stats4)
mle(loglikelihood, start = list(b0 = 1, b1 = 1))

##
## Call:
## mle(minuslogl = loglikelihood, start = list(b0 = 1, b1 = 1))
##
## Coefficients:
## b0 b1
## 10.80034 19.96405
IS THIS MAGIC OR WHAT??!!!
• We see that the MLE estimates are equal to the MSE estimates!
• Why ?
• Here is the Mathematical Proof
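The standard derivation, sketched in LaTeX: assume the linear model with Gaussian noise,

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2).

% Log-likelihood of the observed data:
\log L(\beta_0, \beta_1)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2.
```

The first term does not depend on $\beta_0$ or $\beta_1$, and $\sigma^2 > 0$, so maximizing $\log L$ is exactly the same as minimizing $\sum_i (y_i - \beta_0 - \beta_1 x_i)^2$, i.e. the (mean) squared error. Hence the MLE and MSE estimates of the coefficients must coincide, which is what the experiment above showed.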