Naïve Bayes Theorem

Understanding Naïve Bayes Theorem

Gajendra
6 min read · Dec 22, 2022

Naïve Bayes Theorem

Naïve Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why it is known as ‘Naïve’.

Let’s understand some of the key concepts related to Naïve Bayes Theorem.

Conditional Probability

Conditional Probability is defined as the likelihood of an event or outcome occurring, given that another event or outcome has already occurred. It is calculated by dividing the probability of both events occurring together by the probability of the event being conditioned on.

P(A | B) = P(A ∩ B) / P(B)

Similarly,

P(B | A) = P(B ∩ A) / P(A)
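
To make this concrete, here is a minimal sketch in Python with made-up counts; none of these numbers come from the article, they only illustrate the formula above.

```python
# A minimal sketch with made-up counts: estimating P(A | B) from observed data.
total = 100          # total observations
count_B = 40         # observations where B occurred
count_A_and_B = 10   # observations where both A and B occurred

p_B = count_B / total              # P(B)
p_A_and_B = count_A_and_B / total  # P(A ∩ B)

p_A_given_B = p_A_and_B / p_B      # P(A | B) = P(A ∩ B) / P(B)
print(p_A_given_B)                 # 0.25
```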

Bayes’ Theorem

Bayes’ Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. Bayes’ Theorem provides a way to revise existing predictions or theories (update probabilities) given new or additional evidence.

From the conditional probability we know,

P(A | B) = P(A ∩ B) / P(B) and P(B | A) = P(B ∩ A) / P(A)

Also, per the law of probability, the probability of events A and B both occurring is the same as the probability of B and A both occurring.

P(A ∩ B) = P(B ∩ A)

So,

P(A | B) · P(B) = P(B | A) · P(A)

Therefore,

P(A | B) = P(B | A) · P(A) / P(B)
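
As a quick illustration, here is a minimal sketch with made-up numbers showing how a prior belief is updated into a posterior using Bayes’ Theorem; the values are for illustration only.

```python
# A minimal sketch with made-up numbers: applying Bayes' Theorem.
p_A = 0.3            # prior P(A)
p_B_given_A = 0.8    # likelihood P(B | A)
p_B = 0.5            # evidence P(B)

p_A_given_B = p_B_given_A * p_A / p_B  # posterior P(A | B)
print(p_A_given_B)                     # 0.48
```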

Mathematics

Let’s assume we have a dataset as described here,

A feature vector X = (x₁, x₂, …, xₙ) and a target variable y (the class we want to predict).

Per Bayes’ Theorem we get,

P(y | x₁, …, xₙ) = P(x₁, …, xₙ | y) · P(y) / P(x₁, …, xₙ)

Assuming the features are independent of each other, the likelihood factorizes, so

P(y | x₁, …, xₙ) = P(y) · P(x₁ | y) · P(x₂ | y) · … · P(xₙ | y) / P(x₁, …, xₙ)

Now, the denominator below is constant for any outcome, i.e., it will always give the same value regardless of the class y.

P(x₁, x₂, …, xₙ)

So we can say,

P(y | x₁, …, xₙ) ∝ P(y) · P(x₁ | y) · P(x₂ | y) · … · P(xₙ | y)

Finally,

ŷ = argmax over y of P(y) · P(x₁ | y) · P(x₂ | y) · … · P(xₙ | y)

What this means is that the predicted output ŷ is the class y for which this expression has the highest value.
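
The decision rule can be written down directly. Below is a minimal sketch assuming the priors P(y) and the per-feature likelihoods P(xᵢ | y) have already been estimated; every number here is made up for illustration.

```python
# A minimal sketch of the Naïve Bayes decision rule. The priors and
# likelihoods below are made-up numbers, not estimates from a real dataset.
priors = {"Yes": 0.6, "No": 0.4}                   # P(y)
likelihoods = {                                    # P(x_i | y)
    "Yes": {"Sunny": 0.3, "Hot": 0.4},
    "No":  {"Sunny": 0.6, "Hot": 0.5},
}

def predict(features):
    scores = {}
    for y, prior in priors.items():
        score = prior
        for value in features:
            score *= likelihoods[y][value]         # P(y) * P(x1|y) * ... * P(xn|y)
        scores[y] = score
    return max(scores, key=scores.get), scores     # argmax over the classes

print(predict(["Sunny", "Hot"]))   # -> 'No' wins, since 0.4*0.6*0.5 > 0.6*0.3*0.4
```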

Example

Let’s understand the concept with an example.

Binary Classification

Here we have a training dataset with the features Outlook and Temperature and the corresponding target variable ‘Play’ (indicating the possibility of playing). Now, we need to classify whether players will play or not based on the weather conditions.

Dataset: a table of weather records with the columns Outlook, Temperature and Play.

Problem: Will the players play Today if the Outlook is Sunny and the Temperature is Hot?

We can solve this problem using the method of posterior probability.

Today

We can read Today as a record with the feature Outlook as Sunny and Temperature as Hot.

Today = (Outlook = Sunny, Temperature = Hot)

Playing

As per Bayes’ Theorem,

P(Yes | Today) = P(Today | Yes) · P(Yes) / P(Today)

Per the Naïve Bayes independence assumption,

P(Yes | Today) ∝ P(Sunny | Yes) · P(Hot | Yes) · P(Yes)

After plugging values we get,

P(Sunny | Yes) · P(Hot | Yes) · P(Yes) ≈ 0.031

Not Playing

As per Bayes’ Theorem,

P(No | Today) = P(Today | No) · P(No) / P(Today)

Per the Naïve Bayes independence assumption,

P(No | Today) ∝ P(Sunny | No) · P(Hot | No) · P(No)

After plugging values we get,

P(Sunny | No) · P(Hot | No) · P(No) ≈ 0.085

Now, if we notice, the sum of the probabilities above is not equal to 1.

P(Yes | Today) + P(No | Today) = 0.031 + 0.085 = 0.116 ≠ 1

We want the sum of our probabilities to equal 1. To do so, we can simply normalize them.

P(Yes) = 0.031 / (0.031 + 0.085) ≈ 0.27

And,

P(No) = 0.085 / (0.031 + 0.085) ≈ 0.73

Since the probability of No is higher than the probability of Yes, we can conclude that the players will not play today.

We don’t necessarily need to normalize the probabilities, as we can infer directly from the original scores whether the output is Yes (0.031) or No (0.085).
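
For completeness, here is a minimal sketch of this worked example in Python. Since the full weather table is not reproduced above, the records below are hypothetical; only the mechanics (class scores followed by normalization) match the steps above.

```python
# A minimal sketch, assuming a hypothetical weather dataset (the original
# table is not reproduced here): score Yes vs. No for Outlook=Sunny, Temperature=Hot.
from collections import Counter

data = [  # (Outlook, Temperature, Play) -- made-up records for illustration
    ("Sunny", "Hot", "No"), ("Sunny", "Mild", "No"), ("Overcast", "Hot", "Yes"),
    ("Rain", "Mild", "Yes"), ("Rain", "Cool", "Yes"), ("Sunny", "Cool", "No"),
    ("Overcast", "Mild", "Yes"), ("Rain", "Hot", "No"), ("Sunny", "Hot", "Yes"),
    ("Overcast", "Cool", "Yes"),
]
play_counts = Counter(play for _, _, play in data)

def score(outlook, temp, play):
    """P(outlook | play) * P(temp | play) * P(play)."""
    n = play_counts[play]
    p_outlook = sum(1 for o, _, p in data if o == outlook and p == play) / n
    p_temp = sum(1 for _, t, p in data if t == temp and p == play) / n
    return p_outlook * p_temp * (n / len(data))

yes_score, no_score = score("Sunny", "Hot", "Yes"), score("Sunny", "Hot", "No")
print(yes_score, no_score)                  # unnormalized scores
print(no_score / (yes_score + no_score))    # normalized P(No | Today)
```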

Text Classification

Naïve Bayes is also very useful in NLP, especially for text classification and sentiment analysis. We slightly modify the equation for conditional probability for the text classification problem.

P(wᵢ | class) = (count(wᵢ, class) + 1) / (n + |Vocabulary|)

where n is the total number of words in the class and |Vocabulary| is the number of distinct words in the training set (the +1 avoids zero probabilities for unseen words).

Let’s look at an example to understand how Naïve Bayes does text classification.

Here is our sample dataset of reviews and their sentiment, Positive or Negative.

Reviews

The first step is creating the vocabulary — the collection of all the different words that occur in the training set.

Count Vectorizer
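
If you are using scikit-learn, CountVectorizer builds this vocabulary and the word counts for you. Below is a minimal sketch on made-up reviews, since the actual review texts from the table above are not reproduced here.

```python
# A minimal sketch: building the vocabulary and word counts with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["great movie", "good acting", "poor movie", "boring plot"]  # made-up
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(reviews)      # review-by-word count matrix

print(vectorizer.get_feature_names_out())       # the vocabulary
print(counts.toarray())                         # word counts per review
```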

Now, we will calculate the conditional probabilities for each sentiment or class.

Positive

n = 8

|Vocabulary| = 8

Conditional Probabilities (the smoothed probability of each word in the vocabulary given the Positive class)

Negative

n = 8

|Vocabulary| = 6

Conditional Probabilities (the smoothed probability of each word in the vocabulary given the Negative class)
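
The same per-class word probabilities can be computed in a few lines of Python. The reviews below are made up, since the original table is not reproduced here; the point is the smoothed estimate (count + 1) / (n + |Vocabulary|).

```python
# A minimal sketch of P(w | class) = (count(w, class) + 1) / (n + |V|),
# using made-up reviews in place of the original training data.
from collections import Counter

positive_words = "great movie good movie".split()   # made-up positive reviews
negative_words = "poor movie boring plot".split()   # made-up negative reviews
vocabulary = set(positive_words) | set(negative_words)

def word_prob(word, class_words):
    counts = Counter(class_words)
    return (counts[word] + 1) / (len(class_words) + len(vocabulary))

print(word_prob("movie", positive_words))  # (2 + 1) / (4 + 6) = 0.3
print(word_prob("poor", positive_words))   # (0 + 1) / (4 + 6) = 0.1
```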

Now that we have trained our classifier, let’s classify a new sentence according to,

class = argmax over classes of P(class) · P(w₁ | class) · P(w₂ | class) · … · P(wₙ | class)

Our new sentence is “Poor movie”

Positive score: P(Positive) · P(poor | Positive) · P(movie | Positive)

Negative score: P(Negative) · P(poor | Negative) · P(movie | Negative)

Clearly, this review is negative (-), since the Negative class has the bigger score.
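
In practice, scikit-learn’s MultinomialNB does exactly this (word counts plus add-one smoothing). Here is a minimal end-to-end sketch trained on made-up reviews, since the original training data is not reproduced here.

```python
# A minimal end-to-end sketch with scikit-learn, trained on made-up reviews.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["great movie", "good acting", "poor movie", "boring plot"]  # made-up
labels = ["positive", "positive", "negative", "negative"]

# CountVectorizer builds the vocabulary; MultinomialNB applies add-one smoothing by default.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)

print(model.predict(["Poor movie"]))        # expected: ['negative']
print(model.predict_proba(["Poor movie"]))  # normalized class probabilities
```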

Credit: Andrew Ng, StatQuest and Krish Naik

I hope this article provides you with a good understanding of Naïve Bayes Theorem.

If you have any questions, or if you find anything misrepresented, please let me know.

Thanks!

