MSE vs MLE for linear regression
What is MSE ?
The mean squared error (MSE), or mean squared deviation (MSD), of an estimator (a procedure for estimating an unobserved quantity) measures the average of the squares of the errors: the average squared difference between the estimated values and the actual values.
What is MLE ?
Maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable.
Basically we:
- First assume that the data comes from a certain distribution
- Then randomly pick some parameters for that distribution
- Then calculate the likelihood of the observed data under the assumed distribution
- Then use an optimization algorithm, such as gradient descent, to find the parameters of the assumed distribution that maximize the likelihood
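The steps above can be sketched in R. This is a minimal illustration and not the article's code: we assume the data are draws from a normal distribution with unknown mean and standard deviation (our choice for this sketch), and we let the general-purpose optimizer `optim` maximize the log-likelihood by minimizing its negative.

```r
set.seed(1)

# Step 1: assume the data come from a normal distribution;
# here we simulate observations to play the role of "observed data"
obs <- rnorm(200, mean = 5, sd = 2)

# Step 3: (negative) log-likelihood of the data under N(mu, sigma)
neg_loglik <- function(par) {
  mu <- par[1]
  sigma <- par[2]
  if (sigma <= 0) return(Inf)  # keep sigma in its valid range
  -sum(dnorm(obs, mean = mu, sd = sigma, log = TRUE))
}

# Steps 2 and 4: start from an arbitrary guess and let the
# optimizer find the parameters that maximize the likelihood
fit <- optim(par = c(0, 1), fn = neg_loglik)
fit$par  # estimates of mu and sigma, close to 5 and 2
```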
MSE vs MLE for linear regression
• The goal of this article is to empirically check whether the estimates found by minimizing the MSE are the same as the estimates found by MLE
Generating data
## Random X from normal distribution
x <- rnorm(100, mean = 20)

## Let this be the "True" Phenomenon
b0 <- 10
b1 <- 20
y <- b1*x + b0 + rnorm(100)

## Convert to dataframe
df <- data.frame(x = x, y = y)
head(df)

##          x        y
## 1 21.02084 430.5484
## 2 19.60804 401.8297
## 3 21.19639 434.3705
## 4 19.05572 392.9510
## 5 18.55657 381.0414
## 6 21.69841 445.1470
MSE Estimate
model <- lm(data = df , formula = y ~ x)
summary(model)
Model Summary
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4102 -0.7222 -0.1791 0.7138 2.7849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.800 2.249 4.803 5.63e-06 ***
## x 19.964 0.112 178.206 < 2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 1.095 on 98 degrees of freedom
## Multiple R-squared: 0.9969, Adjusted R-squared: 0.9969
## F-statistic: 3.176e+04 on 1 and 98 DF, p-value: < 2.2e-16
- We see our model did a good job of recovering the true parameters (b0 = 10, b1 = 20) using MSE: the intercept is estimated as 10.800 and the slope b1 as 19.964
## (Intercept) 10.800
## x 19.964
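Under the hood, `lm()` minimizes the sum of squared residuals, which has a closed-form solution. As a sketch (regenerating data the same way as above, with our own seed, so the exact numbers differ from the article's), the normal-equations solution matches `lm()`'s coefficients:

```r
set.seed(42)

# Regenerate data following the article's recipe
x <- rnorm(100, mean = 20)
y <- 20*x + 10 + rnorm(100)

# Least squares has a closed form: beta_hat = (X'X)^{-1} X'y,
# where X carries a column of ones for the intercept
X <- cbind(1, x)
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
beta_hat  # first entry ~ intercept, second ~ slope

# lm() minimizes the same sum of squared residuals, so the
# coefficients agree up to floating-point noise
coef(lm(y ~ x))
```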
MLE Estimate
loglikelihood <- function(b0, b1){
  # Negative log-likelihood of the residuals under a standard normal;
  # mle() minimizes this, which is the same as maximizing the likelihood
  -sum(dnorm(df$y - df$x*b1 - b0, log = TRUE))
}

library(stats4)
mle(loglikelihood, start = list(b0 = 1, b1 = 1))

##
## Call:
## mle(minuslogl = loglikelihood, start = list(b0 = 1, b1 = 1))
##
## Coefficients:
## b0 b1
## 10.80034 19.96405
IS THIS MAGIC OR WHAT??!!!
• We see that the MLE estimates are equal to the MSE estimates!
• Why ?
• Here is the Mathematical Proof
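The standard derivation, sketched in LaTeX: assume the linear model with Gaussian noise,

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2).

% Log-likelihood of the observed data:
\log L(\beta_0, \beta_1)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2.
```

The first term does not depend on $\beta_0$ or $\beta_1$, and $\sigma^2 > 0$, so maximizing $\log L$ is exactly the same as minimizing $\sum_i (y_i - \beta_0 - \beta_1 x_i)^2$, i.e. the (mean) squared error. Hence the MLE and MSE estimates of the coefficients must coincide, which is what the experiment above showed.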