Maximum Likelihood for the Normal Distribution
Let’s start with the equation for the normal distribution, or normal curve:

f(x | μ, σ) = [1 / (σ√(2π))] e^(−(x − μ)² / (2σ²))
It has two parameters. The first parameter, the Greek character μ (mu), determines the location of the normal distribution’s mean.
a) A smaller value for μ moves the mean of the distribution to the left.
b) A larger value for μ moves the mean of the distribution to the right.
The second parameter, the Greek character σ (sigma), is the standard deviation, and it determines the normal distribution’s width.
a) A larger value for σ makes the normal curve shorter and wider.
b) A smaller value for σ makes the normal curve taller and narrower.
We’re going to use the likelihood of the normal distribution to find the optimal values for its parameters, μ (the mean) and σ (the standard deviation), given some data x.
Let’s start with the simplest data set of all: a single measurement.
The goal of this super simple example is to convey the basic concepts of how to find the maximum likelihood estimates for μ and σ.
Here we’ve measured a Light Bulb and it weighs 32 grams.
Now just to see what happens…
We can overlay a normal distribution with μ = 28 and σ = 2 onto the data, and then plug the numbers into this equation:

L(μ = 28, σ = 2 | x = 32) = [1 / (2√(2π))] e^(−(32 − 28)² / (2 · 2²)) ≈ 0.03

The likelihood of the curve with μ = 28 and σ = 2, given the data, is about 0.03.
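As a quick sanity check, here’s a minimal Python sketch of that calculation (the function name normal_pdf is just for illustration):

```python
import math

def normal_pdf(x, mu, sigma):
    # Density of the normal distribution at x, given mu and sigma
    return (1.0 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Likelihood of the curve with mu = 28 and sigma = 2, given the measurement x = 32
print(normal_pdf(32, 28, 2))  # about 0.027, i.e. roughly 0.03
```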
Now we can shift the distribution a little bit to the right by setting μ = 30 and then calculate the likelihood. Again, we just plug the numbers into the likelihood function:

L(μ = 30, σ = 2 | x = 32) = [1 / (2√(2π))] e^(−(32 − 30)² / (2 · 2²)) ≈ 0.12
If we decide to fix σ = 2, so that it is a given just like the data, then we can plug in a whole bunch of values for μ and see which one gives the maximum likelihood.
For example, if we start with the mean of the distribution over here on the left, at 20 grams, we get a very, very small likelihood, about 0.000000003.
If we keep sliding the distribution to the right and plot the likelihood for each value of μ, the maximum likelihood estimate for μ is at the peak of that curve, where the slope equals zero; in this case, the slope equals zero when μ = 32.
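Here’s a small sketch of that sweep in Python, assuming we try candidate values of μ between 20 and 44 grams (the grid is my arbitrary choice):

```python
import numpy as np

x = 32.0                          # the single measurement
sigma = 2.0                       # fixed, treated as a given just like the data
mus = np.linspace(20, 44, 241)    # candidate values for mu, in steps of 0.1

likelihoods = (1.0 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-(x - mus) ** 2 / (2 * sigma ** 2))
print(mus[np.argmax(likelihoods)])  # approximately 32 -- the likelihood peaks when mu equals the measurement
```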
Now we can fix μ = 32 and treat it like a given just like the data.
And we can plug in different values for σ to find the one that gives the maximum likelihood.
Note: you actually need more than one measurement to find the optimal value for σ.
If we had more data, then we could plot the likelihoods for different values of σ, and the maximum likelihood estimate for σ would be at the peak, where the slope of the curve equals zero.
To solve for the maximum likelihood estimate for μ we treat σ like it’s a constant and then find where the slope of its likelihood function is 0.
And to solve for the maximum likelihood estimate for σ we treat μ like it’s a constant and then find where the slope of its likelihood function is 0.
The example with one measurement kept the math simple, but now I think we’re ready to dive in a little deeper.
So let’s use a two-measurement data set to calculate the likelihood of a normal distribution.
To keep track of things, let’s call the first bulb, which weighs 32 grams, X_1,
and the second bulb, which weighs 34 grams, X_2.
We’ve already seen how to calculate the likelihood for this curve given X_1, the Light Bulb that weighs 32 grams, and we can calculate the likelihood for the curve given X_2 by plugging 34 into the likelihood function.
But what’s the likelihood of this normal curve given both X_1 and X_2?
These measurements are independent (i.e., weighing X_1 did not have an effect on weighing X_2), so the likelihood given both measurements is the product of the individual likelihoods:

L(μ, σ | X_1 and X_2) = L(μ, σ | X_1) × L(μ, σ | X_2)

So we just plug in the numbers and do the math, and that gives us a really small number.
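For a concrete number, here’s a sketch that assumes the overlaid curve has μ = 30 and σ = 2 (those particular values are just for illustration; any curve works the same way):

```python
import math

def normal_pdf(x, mu, sigma):
    return (1.0 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

x1, x2 = 32.0, 34.0      # the two independent measurements
mu, sigma = 30.0, 2.0    # illustrative parameter values

# Independence means the joint likelihood is the product of the individual likelihoods
print(normal_pdf(x1, mu, sigma) * normal_pdf(x2, mu, sigma))  # about 0.0033
```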
If we had a third data point, then we would just add it to the given side of the overall likelihood and multiply in its individual likelihood function. With n data points, we multiply together all n individual likelihood functions:

L(μ, σ | x_1, x_2, …, x_n) = L(μ, σ | x_1) × L(μ, σ | x_2) × … × L(μ, σ | x_n)
Now that we know how to calculate the likelihood of a normal distribution when we have more than one measurement (we just multiply together the individual likelihoods), let’s solve for the maximum likelihood estimates for μ and σ.
Here’s the likelihood function without any values specified for μ and σ. It equals the product of the likelihood functions for the n individual measurements, and here’s what the equation looks like:

L(μ, σ | x_1, …, x_n) = [1 / (σ√(2π))] e^(−(x_1 − μ)² / (2σ²)) × … × [1 / (σ√(2π))] e^(−(x_n − μ)² / (2σ²))
What we need to do is take two different derivatives of this equation:
One derivative will be with respect to μ, treating σ like it’s a constant; we can find the maximum likelihood estimate for μ by finding where this derivative equals zero.
The other derivative will be with respect to σ, treating μ like it’s a constant; we can find the maximum likelihood estimate for σ by finding where this derivative equals zero.
Before we try to take any derivatives, though, let’s take the log of the likelihood function. We do this because it makes taking the derivatives way, way easier.
The likelihood function and the log of the likelihood function both peak at the same values for μ and σ.
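Here’s a quick numeric sketch of that fact, using the two-measurement data set from above with σ fixed at 2 for illustration:

```python
import numpy as np

data = [32.0, 34.0]               # the two measurements
sigma = 2.0
mus = np.linspace(28, 38, 1001)   # candidate values for mu, in steps of 0.01

# Likelihood is the product over measurements; log-likelihood is the sum of the logs
lik = np.prod([(1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-(x - mus) ** 2 / (2 * sigma ** 2))
               for x in data], axis=0)
loglik = np.log(lik)

print(mus[np.argmax(lik)], mus[np.argmax(loglik)])  # both approximately 33.0 -- same peak
```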
Now we’re going to go, step by step, through all of the transformations that the log has on this function.

First, the log transforms the overall multiplication into addition:

ln L(μ, σ | x_1, …, x_n) = ln([1 / (σ√(2π))] e^(−(x_1 − μ)² / (2σ²))) + … + ln([1 / (σ√(2π))] e^(−(x_n − μ)² / (2σ²)))

Let’s focus on the first term. Convert its multiplication into addition:

ln(1 / (σ√(2π))) + ln(e^(−(x_1 − μ)² / (2σ²)))

Convert the 1 over the square root into the exponent −1/2:

ln((2πσ²)^(−1/2)) + ln(e^(−(x_1 − μ)² / (2σ²)))

On the right side, convert the exponent into multiplication, since ln(e^y) = y:

ln((2πσ²)^(−1/2)) − (x_1 − μ)² / (2σ²)

Back in the term on the left, the log also turns the −1/2 exponent into multiplication:

−(1/2) ln(2πσ²) − (x_1 − μ)² / (2σ²)

Putting everything together, and splitting ln(2πσ²) into ln(2π) + ln(σ²) = ln(2π) + 2 ln(σ), the first term summarizes to:

−(1/2) ln(2π) − ln(σ) − (x_1 − μ)² / (2σ²)
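To double-check the algebra numerically, this sketch compares the log of the density to the expanded form for some arbitrary values (x = 32, μ = 30, σ = 2, chosen just for illustration):

```python
import math

x, mu, sigma = 32.0, 30.0, 2.0   # arbitrary values, just to check the algebra

lhs = math.log((1 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)))
rhs = -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

print(lhs, rhs)  # identical, up to floating-point rounding
```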
And by following the same steps, we can transform the remaining parts of the sum into matching terms, one per data point:

−(1/2) ln(2π) − ln(σ) − (x_i − μ)² / (2σ²), for each measurement x_i

Just to be clear about how we simplify, keep in mind that since we have n data points, there is a term like this for the first data point, x_1, plus terms for the remaining n − 1 data points. All n of the −(1/2) ln(2π)’s can be combined into −(n/2) ln(2π), all n of the negative log of σ’s can be combined into −n ln(σ), and the last parts of each term stay the same.
This is the log of the likelihood function after simplification, and it is what we will take the derivatives of, so let’s keep it up top for reference:

ln L(μ, σ | x_1, …, x_n) = −(n/2) ln(2π) − n ln(σ) − (x_1 − μ)² / (2σ²) − … − (x_n − μ)² / (2σ²)
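Here’s the simplified log-likelihood as a small Python function (a sketch; the name log_likelihood is mine), checked against the log of the product of the individual densities:

```python
import numpy as np

def log_likelihood(mu, sigma, data):
    # Simplified form: -(n/2) ln(2 pi) - n ln(sigma) - sum of squared deviations / (2 sigma^2)
    data = np.asarray(data)
    n = len(data)
    return (-(n / 2) * np.log(2 * np.pi)
            - n * np.log(sigma)
            - np.sum((data - mu) ** 2) / (2 * sigma ** 2))

data = [32.0, 34.0]
direct = np.sum(np.log((1 / (2 * np.sqrt(2 * np.pi))) * np.exp(-(np.asarray(data) - 30.0) ** 2 / 8)))
print(log_likelihood(30.0, 2.0, data), direct)  # both about -5.724
```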
We’ll start by taking the derivative with respect to μ.
This derivative is the slope function for the log of the likelihood curve and we’ll use it to find the peak.
The first term doesn’t contain μ, so its derivative is 0. The second term doesn’t contain μ either, so its derivative is also 0.
The third term contains μ, so now we have to do some work; specifically, the numerator contains μ, so we have to apply the chain rule. Remember, the derivative is with respect to μ (σ is a constant and, thus, the denominator doesn’t change):

d/dμ [ −(x_1 − μ)² / (2σ²) ] = −2(x_1 − μ)(−1) / (2σ²) = (x_1 − μ) / σ²
We can apply the same logic to the remaining terms and get:

d/dμ ln L = (x_1 − μ)/σ² + … + (x_n − μ)/σ²

Then we can pull the σ² out, add the numerators together, and combine the measurements and the μ’s:

d/dμ ln L = (1/σ²) [(x_1 + … + x_n) − nμ]
Now, let’s take the derivative of the log-likelihood function with respect to σ.
This derivative is the slope function for the log of the likelihood curve, and we’ll use it to find the peak.
So, from here on out, because they peak at the same spot, I’ll show you the likelihood functions instead of the log-likelihood functions.
Recall the simplified log of the likelihood function:

ln L = −(n/2) ln(2π) − n ln(σ) − (x_1 − μ)² / (2σ²) − … − (x_n − μ)² / (2σ²)

The first term doesn’t contain σ, so its derivative is zero. The derivative of the second term, −n ln(σ), is just −n/σ.
The derivative of the third term isn’t tricky, but it’s easier to figure out when we rewrite 1/σ² as σ^(−2):

d/dσ [ −(x_1 − μ)² σ^(−2) / 2 ] = (x_1 − μ)² σ^(−3) = (x_1 − μ)² / σ³
We can apply the same logic to the remaining terms and get the derivative of the log-likelihood function with respect to σ. Simplifying:

d/dσ ln L = −n/σ + [(x_1 − μ)² + … + (x_n − μ)²] / σ³
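If you’d like to double-check both derivatives symbolically, here’s a sketch using sympy with n = 3 placeholder data points (the setup is mine, not part of the original walkthrough):

```python
import sympy as sp

mu = sp.Symbol('mu')
sigma = sp.Symbol('sigma', positive=True)
xs = sp.symbols('x1:4')   # three placeholder data points: x1, x2, x3

# Log-likelihood: sum of the per-measurement log densities
loglik = sum(sp.log(1 / (sigma * sp.sqrt(2 * sp.pi))) - (x - mu) ** 2 / (2 * sigma ** 2)
             for x in xs)

print(sp.simplify(sp.diff(loglik, mu)))     # (x1 + x2 + x3 - 3*mu)/sigma**2, up to rearrangement
print(sp.simplify(sp.diff(loglik, sigma)))  # -3/sigma + ((x1-mu)**2 + (x2-mu)**2 + (x3-mu)**2)/sigma**3
```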
To find the maximum likelihood estimate for μ, we need to solve for where the derivative with respect to μ equals 0, because the slope is zero at the peak of the curve. Likewise, to find the maximum likelihood estimate for σ, we need to solve for where the derivative with respect to σ equals 0.
We set the derivative with respect to μ to 0 and solve for μ:

0 = (1/σ²) [(x_1 + … + x_n) − nμ]

We start by multiplying both sides by σ², which makes the σ² go away:

0 = (x_1 + … + x_n) − nμ

Then we add nμ to both sides:

nμ = x_1 + … + x_n

Divide both sides by n and solve:

μ = (x_1 + … + x_n) / n

The maximum likelihood estimate for μ is the mean of the measurements.
Now we need to set the derivative with respect to σ to 0:

0 = −n/σ + [(x_1 − μ)² + … + (x_n − μ)²] / σ³

Multiply both sides by σ:

0 = −n + [(x_1 − μ)² + … + (x_n − μ)²] / σ²

Add n to both sides and multiply both sides by σ²:

nσ² = (x_1 − μ)² + … + (x_n − μ)²

Divide both sides by n:

σ² = [(x_1 − μ)² + … + (x_n − μ)²] / n

And take the square root of both sides; at long last:

σ = √([(x_1 − μ)² + … + (x_n − μ)²] / n)

We see that the maximum likelihood estimate for σ is the standard deviation of the measurements.
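As a final numeric check, here’s a sketch that maximizes the log-likelihood with scipy and compares the result to the mean and (population) standard deviation; the data values are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

data = np.array([32.0, 34.0, 29.0, 31.0])   # made-up measurements

def neg_loglik(params):
    # Negative log-likelihood: minimizing this maximizes the likelihood
    mu, sigma = params
    n = len(data)
    return (n / 2) * np.log(2 * np.pi) + n * np.log(sigma) + np.sum((data - mu) ** 2) / (2 * sigma ** 2)

res = minimize(neg_loglik, x0=[30.0, 1.0], method='L-BFGS-B',
               bounds=[(None, None), (1e-6, None)])   # keep sigma positive
print(res.x)                           # approximately [31.5, 1.803]
print(data.mean(), data.std(ddof=0))   # the mean and population standard deviation: 31.5, 1.803...
```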
In summary: the mean of the data is the maximum likelihood estimate for where the center of the normal distribution should go, and the standard deviation of the data is the maximum likelihood estimate for how wide the normal curve should be.