Naive Bayes Classifier: Calculation of Prior, Likelihood, Evidence & Posterior

Abhishek Kumar
4 min read · Apr 10, 2019


Naive Bayes is a probabilistic classifier, a type of supervised learning, and is based on Bayes’ theorem. Basically, it’s “naive” because it makes assumptions that may or may not turn out to be correct. In other words, it is called naive Bayes or idiot Bayes because the calculation of the probabilities for each hypothesis is simplified to make it tractable. Rather than attempting to model the dependencies between attribute values, the attributes are assumed to be conditionally independent given the class.
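Concretely, the conditional independence assumption lets the joint likelihood of all the features factor into a product of per-feature likelihoods, one per attribute:

```latex
% Naive conditional independence assumption, for class C
% and feature values x_1, ..., x_n:
P(x_1, x_2, \ldots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
```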

Bayes’ theorem
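Written out for the example below, with Walks as the class and X as the features of a new observation, the theorem combines the three quantities we will compute into the posterior:

```latex
% posterior = (likelihood * prior) / evidence
P(\text{Walks} \mid X) = \frac{P(X \mid \text{Walks}) \, P(\text{Walks})}{P(X)}
```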

Let’s take an example to understand this theorem. Picture a scatter plot where the x-axis represents Age and the y-axis represents Salary. There are 10 red points, depicting people who walk to their office, and 20 green points, depicting people who drive to the office. Now we add one grey point as a new data point, and our objective is to use Bayes’ theorem to decide whether it belongs to the red or the green category, i.e., does this new person walk or drive to work?

We will proceed in four steps:

1. Calculate the Prior Probability, P(Walks).
2. Calculate the Marginal Likelihood (Evidence), P(X).
3. Calculate the Likelihood, P(X|Walks).
4. Combine them to get the Posterior Probability, P(Walks|X).

Calculating P(Walks) is easy: it is simply the number of people who walk to the office divided by the total number of observations. Notice that the grey point does not participate in this calculation. So, the first step is complete.
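With the counts from our example, 10 walkers out of 30 total points, the prior works out to:

```latex
P(\text{Walks}) = \frac{\text{number who walk}}{\text{total observations}} = \frac{10}{30} = \frac{1}{3}
```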

The next step involves the calculation of the Evidence, or Marginal Likelihood, which is quite interesting. The first thing we do is select a radius of our own choice and draw a circle around our point of observation, i.e., the new data point. Ignoring the new data point itself, we deem every other data point inside that circle to be similar in nature to it. P(X) then tells us the likelihood that any new random data point added to this dataset falls inside this circle. Ignoring the new data point, we have four data points in our circle. So, the second step is complete too.
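Plugging in the counts, 4 of the 30 observed points fall inside the circle:

```latex
P(X) = \frac{\text{points inside the circle}}{\text{total observations}} = \frac{4}{30} = \frac{2}{15}
```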

Now we’ll calculate the Likelihood. P(X|Walks) asks: what is the likelihood that somebody who walks exhibits feature X? Again, we draw a circle of the same radius around the new data point, ignore the new point itself, and deem anything that falls inside the circle to be similar to the point we are adding.

So, the question is: what is the probability that a randomly selected data point from our dataset is similar to the data point we are adding? Another way to think about this: we are now working only with the people who walk to work. Forget about the green dots; we are only concerned with the red dots here, and P(X|Walks) asks how likely it is that a randomly selected red point falls inside the circle. We have three red dots in the circle.
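Three of the 10 red (walking) points fall inside the circle, so:

```latex
P(X \mid \text{Walks}) = \frac{\text{red points inside the circle}}{\text{total red points}} = \frac{3}{10}
```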

Now it is time to calculate the Posterior Probability. Plugging everything into Bayes’ theorem gives 0.75, so there is a 75% probability that someone placed at X (the new data point) would be classified as a person who walks to the office.
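The arithmetic, using the three quantities computed above:

```latex
P(\text{Walks} \mid X)
  = \frac{P(X \mid \text{Walks}) \, P(\text{Walks})}{P(X)}
  = \frac{(3/10) \times (10/30)}{4/30}
  = \frac{3}{4} = 0.75
```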

Because this is binary classification, the probability that a new data point placed at X would be classified as a person who drives to the office is 25% (1 − 0.75). If this were not binary classification, we would need to repeat the calculation for the “drives” class, just as we did above for the “walks” class.
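As a sanity check, here is a minimal Python sketch of the full calculation for both classes. The counts follow the example above; the split of the 4 in-circle points into 3 red and 1 green is an assumption implied by the two circles, and the variable names are mine.

```python
# Toy Naive Bayes posterior calculation for the Age/Salary example.
# Counts follow the worked example: 10 people walk, 20 drive,
# and 4 points fall inside the circle around the new observation
# (3 red/walkers; the 1 green/driver is inferred, not stated).

total = 30          # total observations
walkers = 10        # red points
drivers = 20        # green points
in_circle = 4       # points inside the circle (evidence region)
red_in_circle = 3   # walkers inside the circle
green_in_circle = in_circle - red_in_circle  # drivers inside the circle

# Priors
p_walks = walkers / total    # 10/30
p_drives = drivers / total   # 20/30

# Evidence (marginal likelihood)
p_x = in_circle / total      # 4/30

# Class-conditional likelihoods
p_x_given_walks = red_in_circle / walkers     # 3/10
p_x_given_drives = green_in_circle / drivers  # 1/20

# Posteriors via Bayes' theorem
p_walks_given_x = p_x_given_walks * p_walks / p_x     # 0.75
p_drives_given_x = p_x_given_drives * p_drives / p_x  # 0.25

print(f"P(Walks|X)  = {p_walks_given_x:.2f}")
print(f"P(Drives|X) = {p_drives_given_x:.2f}")
```

Note that the two posteriors sum to 1, which is exactly why the binary case lets us shortcut the “drives” posterior as 1 − 0.75.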

Finally, we classify the new data point as red: a person who walks to the office.

I hope this article has helped you understand the Naive Bayes classifier a little better.

Happy Analysing! :-)

Reference: https://www.udemy.com/machinelearning/
