Intro to Maximum Likelihood Estimate

Paul Butler · Published in Analytics Vidhya · Jan 22, 2021

Cover photo: https://unsplash.com/photos/6tG_liBojOk

Note: This post will be a bit more math-heavy than my other posts.

Imagine flipping a penny 10 times and getting heads 4 times. Your friend flips the same coin 10 times and gets heads all 10 times. The two of you keep flipping, and after 1,000 flips combined, heads has come up 65% of the time. Your friend argues that the true probability of heads must still be 50%, but what if the coin is biased? Given these trials, how would we find the most likely estimate for the probability of heads, and how could we convince your friend that 65% is the answer the data supports? This is where the maximum likelihood estimate (MLE) comes in.

What is Maximum Likelihood Estimation?

Maximum likelihood estimation is a method for estimating the parameters of a probability distribution. There are two main steps to finding the MLE. The first step is to write down the likelihood function. In practice, the log-likelihood is used more often, for reasons I’ll explain after presenting both formulas. Here they are:

Likelihood: $L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$

Log-likelihood: $\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)$

So, why use the log-likelihood formula over the likelihood formula? The main issue with the likelihood is that we are multiplying many probabilities, each a number between 0 and 1. The product of many such numbers can become extremely small, small enough to underflow floating-point precision, and a long product is also awkward to differentiate. Taking the logarithm turns the product into a sum, which is numerically stable and easier to work with; and because the log is a monotonically increasing function, the parameter that maximizes the log-likelihood also maximizes the likelihood.
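To see the underflow problem concretely, here is a minimal sketch in Python (NumPy assumed; the probabilities are randomly generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 made-up probabilities between 0.1 and 0.9
probs = rng.uniform(0.1, 0.9, size=10_000)

# Naive likelihood: the product underflows to exactly 0.0 in double precision
likelihood = np.prod(probs)
print(likelihood)            # 0.0

# Log-likelihood: the sum of logs is a large negative but finite, usable number
log_likelihood = np.sum(np.log(probs))
print(log_likelihood)        # roughly -8,300
```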

The second step in MLE is to find the parameter value that maximizes the likelihood (or log-likelihood). We do this by taking the derivative of the likelihood or log-likelihood with respect to the parameter, setting it equal to 0, and solving for the parameter. It should look a little like this:

$\frac{d}{d\theta}\,\ell(\theta) = 0 \;\Rightarrow\; \hat{\theta}$

Now, with all this crazy math out of the way, let’s get into an example.

MLE Example

Let’s go back to the penny example mentioned earlier and find the MLE for P(Heads). So, how do we get started? Let’s start with what we know. Each flip is either heads or tails, so the equation must involve P(Heads) and P(Tails), and we can write P(Tails) as 1 − P(Heads). Let’s call P(Heads) theta and define a variable x_i that is 1 if flip i lands on heads and 0 if it lands on tails. Each flip is then a Bernoulli trial with probability mass function (pmf) $p(x_i \mid \theta) = \theta^{x_i}(1-\theta)^{1-x_i}$. If this is not making much sense, I would highly encourage reading up on the Bernoulli distribution.

So, for n flips we should have something looking a bit like this:

$L(\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum x_i}\,(1-\theta)^{\,n-\sum x_i}$
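As a quick aside, here is a minimal sketch of this log-likelihood in code (the function name and the example flips are my own, purely for illustration):

```python
import numpy as np

def coin_log_likelihood(theta, flips):
    """Log-likelihood of Bernoulli(theta) for an array of 0/1 flips."""
    flips = np.asarray(flips)
    heads = flips.sum()
    tails = flips.size - heads
    return heads * np.log(theta) + tails * np.log(1 - theta)

# Your 10 flips from the intro: 4 heads, 6 tails
flips = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
print(coin_log_likelihood(0.4, flips))   # higher (less negative)...
print(coin_log_likelihood(0.5, flips))   # ...than the value at theta = 0.5
```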

Now, how do we go about solving for theta hat ($\hat{\theta}$)? We are going to take the derivative of either the likelihood or the log-likelihood, set it to 0, and solve. I will do both, just to demonstrate how the calculation looks in each case; they yield the same answer.

MLE using the Likelihood Equation:

$\frac{dL}{d\theta} = \Big(\sum x_i\Big)\theta^{\sum x_i - 1}(1-\theta)^{\,n-\sum x_i} - \Big(n - \sum x_i\Big)\theta^{\sum x_i}(1-\theta)^{\,n-\sum x_i - 1} = 0$

Dividing through by $\theta^{\sum x_i - 1}(1-\theta)^{\,n-\sum x_i - 1}$ gives $\big(\sum x_i\big)(1-\theta) = \big(n - \sum x_i\big)\theta$, so

$\hat{\theta} = \frac{\sum_{i=1}^{n} x_i}{n}$

MLE using the Log-Likelihood Equation:

$\ell(\theta) = \Big(\sum x_i\Big)\log\theta + \Big(n - \sum x_i\Big)\log(1-\theta)$

$\frac{d\ell}{d\theta} = \frac{\sum x_i}{\theta} - \frac{n - \sum x_i}{1-\theta} = 0 \;\Rightarrow\; \Big(\sum x_i\Big)(1-\theta) = \Big(n - \sum x_i\Big)\theta \;\Rightarrow\; \hat{\theta} = \frac{\sum_{i=1}^{n} x_i}{n}$
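If you’d rather check the algebra numerically, here is a minimal sketch (NumPy and SciPy assumed; the simulated flips and the helper function are my own illustration, mirroring the earlier sketch):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coin_log_likelihood(theta, flips):
    flips = np.asarray(flips)
    heads = flips.sum()
    return heads * np.log(theta) + (flips.size - heads) * np.log(1 - theta)

# Simulate 1,000 flips of a coin biased 65% toward heads
rng = np.random.default_rng(42)
flips = rng.binomial(1, 0.65, size=1_000)

# Closed-form MLE: number of heads divided by total number of flips
closed_form = flips.mean()

# Numerical MLE: minimize the negative log-likelihood over (0, 1)
result = minimize_scalar(lambda t: -coin_log_likelihood(t, flips),
                         bounds=(1e-6, 1 - 1e-6), method="bounded")

print(closed_form, result.x)   # both land very close to 0.65
```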

Graph of Likelihood for 3 Heads and 1 Tail

As you can see, P(HHHT | theta) has its maximum at theta = 3/4, the number of heads divided by the total number of flips. This matches the proofs above, where we showed that the maximum likelihood estimate for the probability of heads is the number of heads divided by the total number of flips. Hopefully, this visualization gives more insight into what we were doing in the above equations.
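If you want to reproduce a plot like this yourself, here is a minimal sketch (Matplotlib and NumPy assumed; the styling is my own, not the original figure’s):

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0.001, 0.999, 500)

# Likelihood of the sequence HHHT as a function of theta:
# three heads and one tail -> theta^3 * (1 - theta)
likelihood = theta**3 * (1 - theta)

plt.plot(theta, likelihood)
plt.axvline(3 / 4, linestyle="--", label=r"$\hat{\theta} = 3/4$")
plt.xlabel(r"$\theta$ = P(Heads)")
plt.ylabel("Likelihood of HHHT")
plt.legend()
plt.show()
```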

Conclusion

Ok, I know that was a lot to digest. In this post, we learned what a maximum likelihood estimate is and what it’s used for. I also briefly discussed why the log-likelihood is used instead of the plain likelihood. We then finished up by applying MLE to P(Heads). Your friend’s 50% would be right if the coin were fair, but given 1,000 flips with heads coming up 65% of the time, the most likely estimate for P(Heads) is 65%. As always, if you enjoyed the post, make sure to smash the clap button! Until next time.
