An Approach to Understanding the Naive Bayes Algorithm

Tejan Gupta
5 min read · Jun 22, 2023


Welcome everyone to my first blog post! Today we are going to build intuition for Naive Bayes, work through a practical example, and see the mathematical calculations that happen behind the scenes to produce a prediction. It's gonna be quite magical. So without further ado, let's dive in.

Naive Bayes is a classification algorithm, not a regression one. For simplicity, I will restrict myself to a binary dataset in the example I undertake later in this article.

Prerequisites: a basic understanding of supervised ML and basic probability. (Don't worry if your probability is a little rusty; I will give a short primer.)

First, let us understand the basics of probability.

For the particular experiment of rolling a die, our sample space is as follows:

S = {1, 2, 3, 4, 5, 6}

Now, let us understand the probability or the likelihood of the occurrence of a few events, for instance, getting a 1, 2, or 3.

P(1) = P(2) = P(3) = (no. of ways the event can occur) / (no. of possible outcomes) = 1/6
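To make this concrete, here is a tiny Python sketch (the helper name `prob` is my own) that applies the same counting formula:

```python
from fractions import Fraction

# Sample space for one roll of a fair die
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) = no. of ways the event can occur / no. of possible outcomes."""
    return Fraction(len(event & S), len(S))

print(prob({1}), prob({2}), prob({3}))  # 1/6 1/6 1/6
```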

Here, the events being discussed are all independent: the outcome of one roll of the die (say, getting a 1) has no effect on the outcome of any other roll (getting a 2 or a 3).

With that in mind, let’s understand what dependent events are with an example.

Suppose I have a bag with 3 yellow balls and 3 green balls, as shown below. The probability of the first event, i.e., drawing a yellow ball, is:

Bag with the given 6 balls
P(Y) = 3/6 = 1/2

We are now left with a total of 5 balls (one yellow ball has been drawn). To calculate the probability of the second event, i.e., drawing a green ball, we need something important called conditional probability, which will also be used in today's ML algorithm: the probability of the second event given that the first event (drawing a yellow ball) has already occurred.

Remaining 5 balls
P(G|Y) = 3/5

Now suppose you are asked to calculate the probability of "drawing a yellow ball and then a green ball". For questions like these, with the conjunction "and" between the events, we use a rule called the Multiplication Rule.

P(Y and G) = P(Y) * P(G|Y) = 1/2 * 3/5 = 3/10
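Here is the same calculation in Python, using the `fractions` module so the arithmetic stays exact (the variable names are my own):

```python
from fractions import Fraction

yellow, green = 3, 3
total = yellow + green

p_yellow = Fraction(yellow, total)                 # P(Y) = 3/6 = 1/2
p_green_given_yellow = Fraction(green, total - 1)  # P(G|Y) = 3/5, one yellow already drawn

# Multiplication Rule: P(Y and G) = P(Y) * P(G|Y)
print(p_yellow * p_green_given_yellow)  # 3/10
```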

Next, let us understand Bayes' Theorem, which is the crux of the Naive Bayes algorithm. We can derive it as follows:

P(A and B) = P(B and A)

P(A) * P(B|A) = P(B) * P(A|B)

=> P(B|A) = P(B) * P(A|B) / P(A)

The equation above is called Bayes' Theorem. That is all we need to know about probability for the purposes of this algorithm.
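Before moving on, we can sanity-check the theorem with the numbers from the ball example (a minimal sketch; the variable names are mine, and P(G) = 1/2 by symmetry):

```python
from fractions import Fraction

p_y = Fraction(1, 2)          # P(Y): first draw is yellow
p_g_given_y = Fraction(3, 5)  # P(G|Y): green after one yellow is removed
p_g = Fraction(1, 2)          # P(G): first draw is green
p_y_given_g = Fraction(3, 5)  # P(Y|G): yellow after one green is removed

# Both orderings give the same joint probability, 3/10 ...
assert p_y * p_g_given_y == p_g * p_y_given_g

# ... so Bayes' Theorem recovers P(G|Y) from the reversed conditional
assert p_g * p_y_given_g / p_y == p_g_given_y
```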

Now we can talk about the Naive Bayes algorithm itself. Suppose my dataset contains a set of independent features x1, x2, x3, ..., xn and the output or target variable y. Using Bayes' Theorem (together with the "naive" assumption that the features are independent given y), we can calculate the probability of y given the features x1, x2, x3, ..., xn as follows:

P(y|x1, x2, ..., xn) = P(y) * P(x1|y) * P(x2|y) * ... * P(xn|y) / P(x1, x2, ..., xn)

By restricting the independent features to x1, x2, and x3 and the output to just Yes and No (binary), we get these two equations:

P(Yes|x1, x2, x3) = P(Yes) * P(x1|Yes) * P(x2|Yes) * P(x3|Yes) / P(x1, x2, x3)

P(No|x1, x2, x3) = P(No) * P(x1|No) * P(x2|No) * P(x3|No) / P(x1, x2, x3)

We can ignore the denominator completely, since it is the same constant in both equations and does not change which class scores higher.

A small example will make it clearer how the Yes-or-No prediction happens.

Let’s say we have

P(Yes|xi) = 0.13 and P(No|xi) = 0.05

The actual probabilities are obtained by normalizing:

P(Yes|xi) = 0.13 / (0.13 + 0.05) ≈ 72%

P(No|xi) = 0.05 / (0.13 + 0.05) ≈ 28%

Since P(Yes|xi) is greater than P(No|xi), the output will be Yes.
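In code, the prediction is just an argmax over the unnormalized scores, with normalization only needed when we want readable percentages (a small sketch reusing the scores above):

```python
# Unnormalized Naive Bayes scores from the example above
scores = {"Yes": 0.13, "No": 0.05}

# The prediction itself only needs the argmax ...
print(max(scores, key=scores.get))  # Yes

# ... but dividing by the total turns the scores into actual probabilities
total = sum(scores.values())
for label, score in scores.items():
    print(f"P({label}|xi) = {score / total:.0%}")  # Yes: 72%, No: 28%
```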

Now that we have an understanding of this, let's take a practical example to keep things interesting and drive the point home: the classic Play Tennis dataset.

For ease, we will take into consideration only the independent features Outlook and Temperature and the output Play Tennis.

First, we construct the frequency tables for Outlook and Temperature and for Play Tennis. With the standard Play Tennis data, the counts are:

Outlook     | Yes | No
Sunny       |  2  |  3
Overcast    |  4  |  0
Rain        |  3  |  2

Temperature | Yes | No
Hot         |  2  |  2
Mild        |  4  |  2
Cool        |  3  |  1

Play Tennis | Count
Yes         |  9
No          |  5

To predict for the new input (Sunny, Hot), we calculate P(Yes|(Sunny, Hot)) and P(No|(Sunny, Hot)):

P(Yes|(Sunny, Hot)) = P(Yes) * P(Sunny|Yes) * P(Hot|Yes)
= 9/14 * 2/9 * 2/9 ≈ 0.031

P(No|(Sunny, Hot)) = P(No) * P(Sunny|No) * P(Hot|No)
= 5/14 * 3/5 * 2/5 ≈ 0.0857

To get the real probabilities, we normalize:

P(Yes|(Sunny, Hot)) = 0.031 / (0.031 + 0.0857) ≈ 27%

P(No|(Sunny, Hot)) = 100% - 27% = 73%

The value of P(No|(Sunny, Hot)) is the greater of the two, so under these circumstances Naive Bayes says we are not permitted to play tennis :(
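To tie everything together, here is a minimal from-scratch sketch of the whole pipeline in Python, assuming rows consistent with the counts above (the helper names are my own; a library such as scikit-learn's CategoricalNB does the same job):

```python
from collections import Counter, defaultdict

# 14 (Outlook, Temperature, Play Tennis) rows matching the tables above
data = [
    ("Sunny", "Hot", "No"), ("Sunny", "Hot", "No"), ("Overcast", "Hot", "Yes"),
    ("Rain", "Mild", "Yes"), ("Rain", "Cool", "Yes"), ("Rain", "Cool", "No"),
    ("Overcast", "Cool", "Yes"), ("Sunny", "Mild", "No"), ("Sunny", "Cool", "Yes"),
    ("Rain", "Mild", "Yes"), ("Sunny", "Mild", "Yes"), ("Overcast", "Mild", "Yes"),
    ("Overcast", "Hot", "Yes"), ("Rain", "Mild", "No"),
]

# Class counts and per-class feature-value counts
class_counts = Counter(label for *_, label in data)
feature_counts = defaultdict(Counter)
for *features, label in data:
    for i, value in enumerate(features):
        feature_counts[label][(i, value)] += 1

def score(label, features):
    """Unnormalized P(label) * product of P(feature_i | label)."""
    s = class_counts[label] / len(data)
    for i, value in enumerate(features):
        s *= feature_counts[label][(i, value)] / class_counts[label]
    return s

new_point = ("Sunny", "Hot")
scores = {label: score(label, new_point) for label in class_counts}
total = sum(scores.values())
for label, s in scores.items():
    print(f"P({label}|(Sunny, Hot)) = {s / total:.0%}")
# No: 73%, Yes: 27% -> the model predicts No
```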
