[ML] 1. Maximum Likelihood (ML) and Maximum A Posteriori (MAP) Estimation


This chapter introduces the two most commonly used estimation methods: (1) Maximum Likelihood (ML) and (2) Maximum A Posteriori (MAP).

1. Bayes' Rule

Figure 1. Equation of Bayes' Rule

In this case, x and e both denote the given dataset, while H and 𝚯 both denote the parameter (hypothesis). In other words, x = e and H = 𝚯 in the figure above.
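For reference, Bayes' rule in this notation is

P(\Theta \mid x) = \frac{P(x \mid \Theta)\, P(\Theta)}{P(x)}

where P(\Theta \mid x) is the posterior, P(x \mid \Theta) is the likelihood, P(\Theta) is the prior, and P(x) is the evidence.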

The image below explains the difference between probability and likelihood. If further explanation is needed, see the video in [1].

Figure 2. Probability vs. likelihood (from [1])
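In short: probability treats the parameter \Theta as fixed and asks how plausible different observations x are, while likelihood fixes the observed data x and reads the same quantity as a function of \Theta:

L(\Theta \mid x) = p(x \mid \Theta) \quad (\text{as a function of } \Theta, \text{ with } x \text{ fixed})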

2. Maximum Likelihood Estimation

Maximum likelihood (ML) estimation is a method of estimating the parameters of a probability distribution by maximizing the likelihood function.

The ML estimator is therefore defined as below.

Figure 3. Derivation of the ML Estimation
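Assuming the dataset X = \{x_1, \dots, x_N\} consists of i.i.d. samples, the standard derivation (following Bishop and Goodfellow) is

\Theta_{ML} = \arg\max_{\Theta} p(X \mid \Theta) = \arg\max_{\Theta} \prod_{i=1}^{N} p(x_i \mid \Theta) = \arg\max_{\Theta} \sum_{i=1}^{N} \log p(x_i \mid \Theta)

Taking the log is allowed because the logarithm is monotonically increasing, so it does not change the argmax; it also turns the product into a numerically better-behaved sum.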

In the case of a supervised classification task, the dataset is composed of pairs of input data x and corresponding labels y. This means ML estimation must also deal with the conditional probability of the model (network) output y’ given the input data x.

Figure 4. ML Estimation for the conditional distribution
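Concretely, the conditional form of the estimator is

\Theta_{ML} = \arg\max_{\Theta} \sum_{i=1}^{N} \log p(y_i \mid x_i; \Theta)

As a minimal sketch (my own illustration, not taken from the figures), the NumPy snippet below computes the ML estimate for a Gaussian, where maximizing the log-likelihood has a well-known closed form: the sample mean and the (biased) sample variance.

```python
import numpy as np

# Toy "observed" dataset drawn from a known Gaussian, so the
# estimates can be checked against the true parameters.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)

# Setting the gradient of sum_i log p(x_i | mu, sigma^2) to zero
# gives the closed-form ML estimates for a Gaussian:
mu_ml = data.mean()                    # ML estimate of the mean
var_ml = ((data - mu_ml) ** 2).mean()  # ML variance estimate (divides by N, hence biased)

print(f"mu_ML  = {mu_ml:.3f}   (true 2.0)")
print(f"var_ML = {var_ml:.3f}  (true {1.5 ** 2})")
```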

3. Maximum A Posteriori (MAP)

An alternative estimator is the MAP estimator, which finds the parameter 𝚯 that maximizes the posterior.

According to Bayes' rule, the posterior is proportional to the product of the likelihood and the prior; the evidence in the denominator does not depend on the parameter, so it can be dropped when maximizing. The MAP estimator begins from this idea and is defined as below.

Figure 5. Derivation of the MAP Estimation
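The standard derivation applies Bayes' rule, drops the evidence p(X) because it does not depend on \Theta, and takes the log:

\Theta_{MAP} = \arg\max_{\Theta} p(\Theta \mid X) = \arg\max_{\Theta} p(X \mid \Theta)\, p(\Theta) = \arg\max_{\Theta} \left[ \sum_{i=1}^{N} \log p(x_i \mid \Theta) + \log p(\Theta) \right]

Compared with the ML estimator, the only difference is the additional log-prior term \log p(\Theta).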

Just as the ML estimator can be generalized to a conditional probability distribution, so can the MAP estimator.

Figure 6. MAP Estimation for the conditional probability distribution
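Concretely, the conditional MAP estimator is

\Theta_{MAP} = \arg\max_{\Theta} \left[ \sum_{i=1}^{N} \log p(y_i \mid x_i; \Theta) + \log p(\Theta) \right]

(In Goodfellow's Deep Learning book, for example, a Gaussian prior on the weights turns the log-prior term into an L2 penalty, i.e., weight decay.) As a minimal sketch of ML vs. MAP (my own illustration, with made-up prior values), the snippet below estimates a Gaussian mean with known variance under a conjugate Gaussian prior, for which the MAP estimate has a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                                      # known likelihood std
data = rng.normal(loc=2.0, scale=sigma, size=5)  # few samples, so the prior matters

# Illustrative Gaussian prior on the mean: theta ~ N(mu0, tau0^2)
mu0, tau0 = 0.0, 1.0
n = data.size

mu_ml = data.mean()  # the ML estimate ignores the prior entirely

# Maximizing sum_i log p(x_i | theta) + log p(theta) in the
# Gaussian-Gaussian (conjugate) case has a closed form: a
# precision-weighted average of the prior mean and the data.
mu_map = (mu0 / tau0**2 + data.sum() / sigma**2) / (1 / tau0**2 + n / sigma**2)

print(f"mu_ML  = {mu_ml:.3f}")
print(f"mu_MAP = {mu_map:.3f}  (pulled toward the prior mean {mu0})")
```

With only a few samples, the MAP estimate sits between the data average and the prior mean; as N grows, the likelihood term dominates and MAP converges to ML.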

4. Reference

[1] https://www.youtube.com/watch?v=pYxNSUDSFH4

Any corrections, suggestions, and comments are welcome.

The contents of this article are based on Bishop and Goodfellow.
