[ML] 1. Maximum Likelihood (ML) and Maximum A Posteriori (MAP) Estimation


This chapter introduces the two most commonly used estimation methods: (1) Maximum Likelihood (ML) and (2) Maximum A Posteriori (MAP).

1. Bayes' Rule

Figure 1. Equation of Bayes' Rule

In this case, x and e both denote the given dataset, while H and 𝚯 both denote the parameter (hypothesis). In other words, x = e and H = 𝚯 in the figure above.
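For reference, Bayes' rule in this notation is

P(\Theta \mid x) = \frac{P(x \mid \Theta)\, P(\Theta)}{P(x)}

where P(\Theta \mid x) is the posterior, P(x \mid \Theta) is the likelihood, P(\Theta) is the prior, and P(x) is the evidence.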

The image below explains the difference between probability and likelihood. If further explanation is needed, see the video in [1].

Figure 2. Probability vs. likelihood (from [1])
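In short: probability treats the parameter \Theta as fixed and asks how plausible different observations x are, while likelihood fixes the observed data x and reads the same quantity as a function of \Theta:

L(\Theta \mid x) = p(x \mid \Theta) \quad (\text{as a function of } \Theta, \text{ with } x \text{ fixed})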

2. Maximum Likelihood Estimation

Maximum likelihood (ML) estimation is a method of estimating the parameters of a probability distribution by maximizing the likelihood function.

The ML estimator is therefore defined as below.

Figure 3. Derivation of the ML Estimation
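Assuming the dataset X = \{x_1, \dots, x_N\} consists of i.i.d. samples, the standard derivation (following Bishop and Goodfellow) is

\Theta_{ML} = \arg\max_{\Theta} p(X \mid \Theta) = \arg\max_{\Theta} \prod_{i=1}^{N} p(x_i \mid \Theta) = \arg\max_{\Theta} \sum_{i=1}^{N} \log p(x_i \mid \Theta)

Taking the log is allowed because the logarithm is monotonically increasing, so it does not change the argmax; it also turns the product into a numerically better-behaved sum.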

In the case of a supervised classification task, the dataset is composed of pairs of input data x and corresponding labels y. This means ML estimation must also deal with the conditional probability of the model (network) output y’ given the input data x.

Figure 4. ML Estimation for the conditional distribution
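Concretely, the conditional form of the estimator is

\Theta_{ML} = \arg\max_{\Theta} \sum_{i=1}^{N} \log p(y_i \mid x_i; \Theta)

As a minimal sketch (my own illustration, not taken from the figures), the NumPy snippet below computes the ML estimate for a Gaussian, where maximizing the log-likelihood has a well-known closed form: the sample mean and the (biased) sample variance.

```python
import numpy as np

# Toy "observed" dataset drawn from a known Gaussian, so the
# estimates can be checked against the true parameters.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)

# Setting the gradient of sum_i log p(x_i | mu, sigma^2) to zero
# gives the closed-form ML estimates for a Gaussian:
mu_ml = data.mean()                    # ML estimate of the mean
var_ml = ((data - mu_ml) ** 2).mean()  # ML variance estimate (divides by N, hence biased)

print(f"mu_ML  = {mu_ml:.3f}   (true 2.0)")
print(f"var_ML = {var_ml:.3f}  (true {1.5 ** 2})")
```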

3. Maximum A Posteriori (MAP)

An alternative estimator is the MAP estimator, which finds the parameter 𝚯 that maximizes the posterior.

According to Bayes' rule, the posterior is proportional to the product of the likelihood and the prior; the evidence in the denominator does not depend on the parameter, so it can be dropped when maximizing. The MAP estimator begins from this idea and is defined as below.

Figure 5. Derivation of the MAP Estimation
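The standard derivation applies Bayes' rule, drops the evidence p(X) because it does not depend on \Theta, and takes the log:

\Theta_{MAP} = \arg\max_{\Theta} p(\Theta \mid X) = \arg\max_{\Theta} p(X \mid \Theta)\, p(\Theta) = \arg\max_{\Theta} \left[ \sum_{i=1}^{N} \log p(x_i \mid \Theta) + \log p(\Theta) \right]

Compared with the ML estimator, the only difference is the additional log-prior term \log p(\Theta).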

Just as the ML estimator can be generalized to a conditional probability distribution, so can the MAP estimator.

Figure 6. MAP Estimation for the conditional probability distribution
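Concretely, the conditional MAP estimator is

\Theta_{MAP} = \arg\max_{\Theta} \left[ \sum_{i=1}^{N} \log p(y_i \mid x_i; \Theta) + \log p(\Theta) \right]

(In Goodfellow's Deep Learning book, for example, a Gaussian prior on the weights turns the log-prior term into an L2 penalty, i.e., weight decay.) As a minimal sketch of ML vs. MAP (my own illustration, with made-up prior values), the snippet below estimates a Gaussian mean with known variance under a conjugate Gaussian prior, for which the MAP estimate has a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                                      # known likelihood std
data = rng.normal(loc=2.0, scale=sigma, size=5)  # few samples, so the prior matters

# Illustrative Gaussian prior on the mean: theta ~ N(mu0, tau0^2)
mu0, tau0 = 0.0, 1.0
n = data.size

mu_ml = data.mean()  # the ML estimate ignores the prior entirely

# Maximizing sum_i log p(x_i | theta) + log p(theta) in the
# Gaussian-Gaussian (conjugate) case has a closed form: a
# precision-weighted average of the prior mean and the data.
mu_map = (mu0 / tau0**2 + data.sum() / sigma**2) / (1 / tau0**2 + n / sigma**2)

print(f"mu_ML  = {mu_ml:.3f}")
print(f"mu_MAP = {mu_map:.3f}  (pulled toward the prior mean {mu0})")
```

With only a few samples, the MAP estimate sits between the data average and the prior mean; as N grows, the likelihood term dominates and MAP converges to ML.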

4. Reference

[1] https://www.youtube.com/watch?v=pYxNSUDSFH4

Any corrections, suggestions, and comments are welcome.

The contents of this article are based on Bishop and Goodfellow.
