Naive Bayes Classifier

Smokers are less likely to die of age-related sickness

Gaurav Chandak
Learning Machine Learning
3 min read · Mar 13, 2018


Originally published on my blog: Naive Bayes Classifier

Let’s get back to the example from our Supervised Learning introduction where, given a basket of fruits, we had to identify each fruit. Say, for the learning process, I take you to an orchard and give you details about all the fruits in the orchard in the form of a table, with each row containing the following attributes:

  • the name of the fruit,
  • if it is yellow or not,
  • if it is long or not, and,
  • if it is sweet or not.
Sample data describing the fruits in the orchard

Now, for each fruit in the basket, you have to identify whether it is a mango, a banana, or something else.

The core of the Naive Bayes classifier is the assumption that the attributes are conditionally independent given the class. Even though this may not seem very realistic, the assumption works surprisingly well in practice and keeps the algorithm simple.

So let’s say, when you try to identify a fruit from the basket, you pick one up and note that it is yellow, long, and not sweet. How can you use the given table to identify this fruit? What if you could somehow compute, from the data, the probabilities of the fruit being a mango, a banana, or something else? You can compute the joint probability of the fruit being yellow, long, not sweet, and of type X (mango/banana/others) by expanding it with the chain rule of probability (the same rule that underlies Bayes’ theorem):

P(X, Y, L, S’) = P(Y, L, S’, X) = P(Y|L, S’, X) * P(L|S’, X) * P(S’|X) * P(X)

Here, P(Y|L, S’, X) means probability of color being yellow given that the fruit is long, not sweet and is X. P(L|S’, X), P(S’|X) have similar meanings.

But since we are using the Naive Bayes classifier, we assume that all attributes are conditionally independent given the class, so we can rewrite the above as:

P(X, Y, L, S’) = P(Y|X) * P(L|X) * P(S’|X) * P(X)

Here,

  • P(X) = the probability of the fruit being X,
  • P(Y|X) = the probability of the color being yellow given it is fruit X,
  • P(L|X) = the probability of the length being long given it is fruit X,
  • P(S’|X) = the probability of the fruit being not sweet given it is fruit X.

The difference between the two equations is that we are assuming P(Y|L, S’, X) = P(Y|X), since the attributes are conditionally independent given the class; the same simplification applies to the other terms in the equation.
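
To make the factorization concrete, here is a minimal Python sketch of the naive joint probability; the function name and argument names are my own, not from the original post.

def naive_joint(p_x, p_y_given_x, p_l_given_x, p_ns_given_x):
    # Naive assumption: yellow, long, and not-sweet are conditionally
    # independent given the fruit X, so the joint simply factorizes.
    return p_y_given_x * p_l_given_x * p_ns_given_x * p_x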

Based on the details I provided, you can create an aggregated table from the list, with counts of each attribute for mango, banana, and others (the counts below are reconstructed from the probabilities used later in the post):

Fruit  | Yellow | Long | Not Sweet | Total Pieces
Mango  |    200 |    0 |       250 |          300
Banana |    350 |  350 |       200 |          400
Others |    150 |  100 |       150 |          300
Total  |    700 |  450 |       600 |         1000

Here, the value at (Mango, Yellow) represents the number of yellow mangoes, and the value at (Mango, Total Pieces) represents the total number of mangoes.

Now from this table, you can create a likelihood table (a table that represents the likelihood of an event happening):

Fruit  | P(Yellow|Fruit) | P(Long|Fruit) | P(Not Sweet|Fruit) | P(Fruit)
Mango  |         200/300 |         0/300 |            250/300 | 300/1000
Banana |         350/400 |       350/400 |            200/400 | 400/1000
Others |         150/300 |       100/300 |            150/300 | 300/1000

Here, the value at (Mango, P(Yellow|Fruit)) represents P(Y|M), i.e., the probability of the color being yellow given that the fruit is a mango, and the value at (Mango, P(Fruit)) represents P(M), i.e., the probability of the fruit being a mango. The other cells are read similarly.
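
As a sketch, the two tables above can be encoded in plain Python like this (the dictionary layout and variable names are my own):

counts = {
    "mango":  {"yellow": 200, "long": 0,   "not_sweet": 250, "total": 300},
    "banana": {"yellow": 350, "long": 350, "not_sweet": 200, "total": 400},
    "others": {"yellow": 150, "long": 100, "not_sweet": 150, "total": 300},
}
total_pieces = 1000  # total number of fruits in the orchard

# Likelihood table: each count divided by the appropriate total.
likelihoods = {
    fruit: {
        "p_fruit":   row["total"] / total_pieces,      # P(X)
        "yellow":    row["yellow"] / row["total"],     # P(Y|X)
        "long":      row["long"] / row["total"],       # P(L|X)
        "not_sweet": row["not_sweet"] / row["total"],  # P(S'|X)
    }
    for fruit, row in counts.items()
}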

Now you have all the required probabilities for computing P(X, Y, L, S’).

P(M, Y, L, S’) = 200/300 * 0/300 * 250/300 * 300/1000 = 0
P(B, Y, L, S’) = 350/400 * 350/400 * 200/400 * 400/1000 = 0.153125
P(O, Y, L, S’) = 150/300 * 100/300 * 150/300 * 300/1000 = 0.025

Therefore, the posterior probabilities for each fruit, given that it is yellow, long, and not sweet, are:

P(M | Y, L, S’) = 0/(0+0.153125+0.025) = 0
P(B | Y, L, S’) = 0.153125/(0+0.153125+0.025) = 0.86
P(O | Y, L, S’) = 0.025/(0+0.153125+0.025) = 0.14
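
These numbers can be reproduced with the sketches above (assuming the naive_joint function and likelihoods dictionary defined earlier). Note that a single zero count (there are no long mangoes) drives the whole mango product to zero; in practice this is often mitigated with Laplace smoothing.

joints = {
    fruit: naive_joint(lk["p_fruit"], lk["yellow"], lk["long"], lk["not_sweet"])
    for fruit, lk in likelihoods.items()
}

# Normalize the joints to get the posterior for each fruit.
total = sum(joints.values())
posteriors = {fruit: joint / total for fruit, joint in joints.items()}
print(posteriors)  # approximately {'mango': 0.0, 'banana': 0.86, 'others': 0.14}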

Given that the probability of it being a banana is the highest, you can say that the fruit is a banana. You can repeat this for every fruit in the basket and identify most of them correctly.

One drawback of the Naive Bayes classifier is that it assumes the attributes are conditionally independent, which is not always true.

The Naive Bayes classifier is available in scikit-learn as GaussianNB, MultinomialNB, and BernoulliNB in sklearn.naive_bayes.
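
As a rough sketch of how this looks in practice, BernoulliNB fits binary yes/no attributes like ours; the tiny dataset below is purely illustrative, not the orchard data from this post.

from sklearn.naive_bayes import BernoulliNB

# Columns: [yellow, long, sweet]; labels are the fruit names.
X = [
    [1, 0, 0],  # yellow, not long, not sweet -> mango
    [1, 1, 1],  # yellow, long, sweet         -> banana
    [0, 0, 1],  # not yellow, not long, sweet -> others
    [1, 1, 0],  # yellow, long, not sweet     -> banana
]
y = ["mango", "banana", "others", "banana"]

# alpha applies Laplace smoothing, which also avoids zero
# probabilities like P(L|M) = 0 in the tables above.
clf = BernoulliNB(alpha=1.0)
clf.fit(X, y)
print(clf.predict([[1, 1, 0]]))  # yellow, long, not sweet -> likely 'banana'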

Bayes’ Theorem: P(A|B) = P(B|A) * P(A) / P(B) (Source: Wikipedia)
