Basics of Probability for understanding Naive Bayes

Rahul Parasmani

9 min read · May 20, 2023

In this blog, I will cover all the basics that are needed to get started with the Naive Bayes algorithm.

Some important terms:

Experiment:- The task we are performing
Event:- The set of favorable outcomes
Non Event:- The set of unfavorable outcomes
Sample Space:- The set of all possible outcomes
Mutually Exclusive Events:- Mutually exclusive events are events that cannot occur at the same time. In other words, the occurrence of one event excludes the possibility of the other event occurring in the same trial.
Exhaustive Events:- Exhaustive events are a set of events that collectively cover all possible outcomes of an experiment. In other words, the union of all the events in the set covers the entire sample space (all the sample points).
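The last two definitions can be checked mechanically with sets. The sketch below assumes a single roll of a fair six-sided die as the experiment; the event names and the two helper functions are illustrative choices, not part of any library.

```python
# Sample space for one roll of a six-sided die (an assumed example experiment).
sample_space = {1, 2, 3, 4, 5, 6}

even = {2, 4, 6}
odd = {1, 3, 5}
at_most_four = {1, 2, 3, 4}


def mutually_exclusive(a, b):
    """Two events are mutually exclusive when they share no outcome."""
    return a.isdisjoint(b)


def exhaustive(events, space):
    """A collection of events is exhaustive when its union is the whole sample space."""
    return set().union(*events) == space


print(mutually_exclusive(even, odd))                   # True
print(exhaustive([even, odd], sample_space))           # True
print(mutually_exclusive(even, at_most_four))          # False: both contain 2 and 4
print(exhaustive([even, at_most_four], sample_space))  # False: 5 is not covered
```

Note that "even" and "odd" are both mutually exclusive and exhaustive, which is exactly the kind of partition the Total Probability Theorem relies on.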

Two approaches for calculating probability:

Classical Approach:- In the classical approach we treat all sample points as equally likely. This method is useful when we have no data about how often each outcome of the experiment occurs.
Frequency based approach:- In a frequency based approach we have enough data to estimate the probability of each outcome from how often it occurred. This is the approach mostly preferred in data science.

As the number of trials grows, the probabilities estimated by the frequency based approach tend towards those given by the classical approach (this is the law of large numbers). With only a few trials the observed frequencies can be far from the classical values, which is why tossing a fair coin two times does not always give exactly one head and one tail.
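A small simulation can illustrate how the observed frequency of heads relates to the classical value of 1/2. This is only a sketch: the function name, the seed, and the list of sample sizes are arbitrary choices made for the illustration.

```python
import random


def heads_frequency(num_tosses, rng):
    """Return the fraction of heads in num_tosses simulated fair-coin tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses


rng = random.Random(0)  # fixed, arbitrary seed so the run is repeatable
for n in (2, 10, 100, 10_000, 100_000):
    print(n, heads_frequency(n, rng))
```

With two tosses the frequency can only be 0, 0.5, or 1, while for the larger sample sizes it should land close to 0.5.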

Three different types of probabilities:

1. Marginal Probability:- The probability of a single event occurring without taking into account any other events. It focuses on the probability of a specific outcome in a situation, disregarding the other possibilities.

Imagine you have a bag filled with different colored marbles — red, blue, and green. The number of marbles of each color is as follows:

5 red marbles
3 blue marbles
4 green marbles

Now, let’s define two events:

Event A: Selecting a red marble

Event B: Selecting a blue marble

The marginal probability of Event A refers to the probability of selecting a red marble from the bag, without considering any other events.

To calculate the marginal probability of Event A, we need to find the probability of selecting a red marble out of all the marbles in the bag. In this case, we have a total of 12 marbles in the bag (5 red + 3 blue + 4 green).

So, the marginal probability of Event A (P(A)) is:

P(A) = Number of red marbles / Total number of marbles

P(A) = 5 / 12

Similarly, we can calculate the marginal probability of Event B, which is the probability of selecting a blue marble from the bag, without considering any other events.

P(B) = Number of blue marbles / Total number of marbles

P(B) = 3 / 12

Each marginal probability here depends only on the count of one color and the total number of marbles: P(A) is the probability of selecting a red marble and P(B) is the probability of selecting a blue marble, and neither calculation uses any information about the other event.
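The marble calculation can be reproduced with exact fractions. This is a minimal sketch using the counts from the example; the dictionary and the helper name `marginal` are illustrative.

```python
from fractions import Fraction

# Marble counts from the example above.
marbles = {"red": 5, "blue": 3, "green": 4}
total = sum(marbles.values())  # 12 marbles in the bag


def marginal(color):
    """Marginal probability of drawing one marble of the given color."""
    return Fraction(marbles[color], total)


print(marginal("red"))   # 5/12
print(marginal("blue"))  # 1/4, which is 3/12 in lowest terms
print(sum(marginal(c) for c in marbles))  # 1
```

The marginal probabilities of all colors add up to 1 because the three colors are mutually exclusive and exhaustive.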

2. Conditional Probability:- The conditional probability refers to the probability of an event occurring given that another event has already occurred or is known to have occurred.

Imagine you have a bag filled with different colored candies — red, blue, and green. The number of candies of each color is as follows:

6 red candies
4 blue candies
5 green candies

Now, let’s define two events:

Event A: Selecting a red candy on the second draw

Event B: Selecting a blue candy on the first draw

Let’s calculate the conditional probability of Event A given Event B, assuming we draw two candies one after the other without putting the first one back. We want to find the probability that the second candy is red, given that the first candy was blue.

To calculate the conditional probability, we divide the joint probability of both events by the probability of the given event.

The joint probability of selecting a blue candy first and a red candy second is the probability of the blue draw multiplied by the probability of the red draw from the 14 candies that remain. Note that we cannot simply multiply P(A) and P(B) here, because the events are not independent: removing a blue candy changes what is left in the bag.

P(A and B) = (4 / 15) * (6 / 14)

P(A and B) = 24 / 210

P(A and B) = 4 / 35

The probability of selecting a blue candy on the first draw is 4 out of 15, as there are 4 blue candies in the bag out of a total of 15 candies.

P(B) = Number of blue candies / Total number of candies

P(B) = 4 / 15

Now, we can calculate the conditional probability:

P(A|B) = P(A and B) / P(B)

P(A|B) = (4 / 35) / (4 / 15)

P(A|B) = (4 / 35) * (15 / 4)

P(A|B) = 3 / 7

So, the conditional probability of selecting a red candy given that a blue candy has already been selected is 3/7. This matches a direct count: after the blue candy is removed, 6 of the 14 remaining candies are red, and 6/14 = 3/7.
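A brute-force count is a useful check on a conditional probability. The sketch below assumes two candies are drawn one after the other without replacement, with B meaning "the first candy is blue" and A meaning "the second candy is red"; under that assumption the enumeration gives P(A|B) = 3/7.

```python
from fractions import Fraction
from itertools import permutations

# One label per candy: 6 red, 4 blue, 5 green (the bag from the example).
bag = ["red"] * 6 + ["blue"] * 4 + ["green"] * 5

# Every ordered pair of distinct candies is an equally likely (first, second) draw.
draws = list(permutations(range(len(bag)), 2))  # 15 * 14 = 210 pairs

b_count = sum(1 for first, _ in draws if bag[first] == "blue")
ab_count = sum(
    1 for first, second in draws
    if bag[first] == "blue" and bag[second] == "red"
)

p_b = Fraction(b_count, len(draws))
p_a_and_b = Fraction(ab_count, len(draws))
p_a_given_b = p_a_and_b / p_b  # definition: P(A|B) = P(A and B) / P(B)

print(p_b)          # 4/15
print(p_a_and_b)    # 4/35
print(p_a_given_b)  # 3/7
```

Multiplying P(A) by P(B) would only be valid for independent events; drawing without replacement makes the second draw depend on the first.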

Another concept that is closely associated with conditional probability is that of independent events.

Independent Event:- Independent events refer to events that have no influence or impact on each other. In other words, the occurrence or non-occurrence of one event does not affect the probability of the other event happening.

For example, let’s consider two events:

Event A: Flipping a fair coin and getting heads.

Event B: Rolling a fair six-sided die and getting a 4.

These events are independent because the outcome of flipping the coin does not affect the outcome of rolling the die, and vice versa. The probability of getting heads on the coin flip is always 1/2, regardless of the outcome of the die roll. Similarly, the probability of rolling a 4 on the die is always 1/6, regardless of the outcome of the coin flip.

Mathematically, two events A and B are considered independent if the following equation holds true:

P(A|B) = P(A) and P(B|A) = P(B). These relations further simplify to P(A and B) = P(A) * P(B)

Understanding independent events is important because it allows us to simplify probability calculations and make predictions. If two events are independent, knowledge about one event does not provide any information about the other event. Thus, the probability of both events occurring together is simply the product of their individual probabilities.

It is worth noting that independence is an assumption about the events, not a given. If there is an underlying dependency or relationship between them, the product rule above no longer holds, so the assumption should be checked with data or further statistical analysis.
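The coin and die example can be verified by listing every outcome of the combined experiment. This is a small sketch; the helper names `prob`, `is_heads`, and `is_four` are made up for the illustration.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely (coin, die) outcomes of flipping a coin and rolling a die.
outcomes = list(product(["H", "T"], range(1, 7)))


def prob(event):
    """Probability of an event, given as a predicate over one outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_heads(outcome):
    return outcome[0] == "H"


def is_four(outcome):
    return outcome[1] == 4


p_a = prob(is_heads)                                # 1/2
p_b = prob(is_four)                                 # 1/6
p_ab = prob(lambda o: is_heads(o) and is_four(o))   # 1/12

print(p_ab == p_a * p_b)  # True: the product rule holds
print(p_ab / p_b == p_a)  # True: P(A|B) equals P(A)
```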

3. Joint Probability:- Joint probability refers to the probability of two or more events occurring together or simultaneously. It quantifies the likelihood of the joint occurrence of multiple events.

Let’s consider a simple example with two events, A and B. The joint probability of events A and B, denoted as P(A and B) or P(A, B), represents the probability of both events A and B happening at the same time.

In set notation, the joint probability is written as:

P(A and B) = P(A ∩ B)

Here, A ∩ B denotes the intersection of events A and B, which is the set of outcomes that satisfy both A and B simultaneously. In general it can be calculated as P(A ∩ B) = P(A) * P(B|A).

If A and B are independent events, meaning that the occurrence of one event does not affect the probability of the other event, the joint probability simplifies to:

P(A and B) = P(A) * P(B)

Imagine you have a bag containing red and blue socks. Let’s say there are 4 red socks and 3 blue socks in the bag. We define two events:

Event A: Selecting a red sock

Event B: Selecting a blue sock

The joint probability refers to the probability of two events occurring together. In this case, it represents the probability of selecting a red sock on the first draw and a blue sock on the second draw, where the first sock is put back in the bag before the second draw.

To calculate the joint probability of Event A and Event B (P(A and B)), we multiply the individual probabilities of each event occurring. A and B are independent here because the first sock is returned to the bag, so the outcome of the first draw does not affect the second draw, and vice versa.

P(A and B) = P(A) * P(B)

The probability of selecting a red sock (Event A) is 4 out of 7 because there are 4 red socks in the bag and a total of 7 socks.

P(A) = Number of red socks / Total number of socks

P(A) = 4 / 7

Similarly, the probability of selecting a blue sock (Event B) is 3 out of 7 because there are 3 blue socks in the bag.

P(B) = Number of blue socks / Total number of socks

P(B) = 3 / 7

Now, we can calculate the joint probability:

P(A and B) = P(A) * P(B)

P(A and B) = (4 / 7) * (3 / 7)

P(A and B) = 12 / 49

So, the joint probability of selecting a red sock and then a blue sock from the bag, with the first sock put back before the second draw, is 12/49.
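The value 12/49 relies on the two draws being independent, which holds if the first sock is returned to the bag before the second draw. The sketch below makes that assumption explicit and, for contrast, also counts the case where the first sock is kept out.

```python
from fractions import Fraction
from itertools import permutations, product

socks = ["red"] * 4 + ["blue"] * 3

# With replacement: any sock can follow any sock, giving 7 * 7 = 49 ordered pairs.
with_repl = list(product(socks, repeat=2))
p_with = Fraction(
    sum(1 for first, second in with_repl if first == "red" and second == "blue"),
    len(with_repl),
)

# Without replacement: the same sock cannot be drawn twice, giving 7 * 6 = 42 pairs.
without_repl = list(permutations(range(len(socks)), 2))
p_without = Fraction(
    sum(1 for i, j in without_repl if socks[i] == "red" and socks[j] == "blue"),
    len(without_repl),
)

print(p_with)     # 12/49, equal to (4/7) * (3/7)
print(p_without)  # 2/7, equal to (4/7) * (3/6)
```

Without replacement the draws are no longer independent, so the joint probability has to be computed as P(A) * P(B|A) instead of P(A) * P(B).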

Total Probability Theorem

Let B1, B2 …Bn be n mutually exclusive and exhaustive events and A be defined in the same sample space.

Then P(A) = P(A∩B1) + P(A∩B2) + P(A∩B3) + … + P(A∩Bn)

= ∑ P(A∩Br) = ∑ P(Br) * P(A|Br)

If we consider B1, B2 … Bn as n different paths and A as the final destination, then

P(Br) denotes the probability of selecting the r-th path

And P(A|Br) denotes the probability of occurrence of A through that path.
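The path analogy translates directly into a weighted sum. In the sketch below the three path probabilities and the chances of reaching A through each path are hypothetical numbers chosen only for illustration, and `total_probability` is an illustrative helper.

```python
from fractions import Fraction


def total_probability(path_probs, reach_probs):
    """Return P(A) as the sum over r of P(Br) * P(A|Br).

    path_probs[r]  is P(Br), the probability of taking the r-th path.
    reach_probs[r] is P(A|Br), the probability of reaching A via that path.
    """
    if sum(path_probs) != 1:
        raise ValueError("paths must be mutually exclusive and exhaustive")
    return sum(p * q for p, q in zip(path_probs, reach_probs))


# Hypothetical values: three paths B1, B2, B3 and one destination A.
p_paths = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
p_a_given_path = [Fraction(9, 10), Fraction(1, 2), Fraction(1, 4)]

print(total_probability(p_paths, p_a_given_path))  # 13/20
```

Here 13/20 comes from 9/20 + 3/20 + 1/20, one term per path.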

Bayes theorem

Bayes’ theorem, named after the Reverend Thomas Bayes, is a fundamental concept in probability theory and statistics. It describes how to update or revise the probability of an event occurring based on new evidence or information. Bayes’ theorem is expressed mathematically as:

P(A|B) = (P(B|A) * P(A)) / P(B)

Where:

P(A|B) is the conditional probability of event A occurring given that event B has occurred.
P(B|A) is the conditional probability of event B occurring given that event A has occurred.
P(A) is the probability of event A occurring.
P(B) is the probability of event B occurring.

In simpler terms, Bayes’ theorem allows us to calculate the probability of an event A given that we have observed evidence B. It takes into account both the prior probability of A (i.e., the probability of A before considering any evidence) and the likelihood of observing evidence B if A were true. By combining these probabilities, we can update our belief in the likelihood of event A occurring.

Bayes’ theorem has applications in various fields, including statistics, machine learning, medical diagnosis, spam filtering, and more. It provides a framework for reasoning under uncertainty and helps make informed decisions based on available evidence.
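As a worked illustration of the formula, the sketch below applies it to a made-up spam filtering scenario. All the numbers are hypothetical, and the function `bayes` is an illustrative helper that expands P(B) using the total probability theorem over A and "not A".

```python
from fractions import Fraction


def bayes(prior_a, b_given_a, b_given_not_a):
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    P(B) is expanded as P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    evidence = b_given_a * prior_a + b_given_not_a * (1 - prior_a)
    return b_given_a * prior_a / evidence


# Hypothetical numbers: A = "email is spam", B = "email contains the word 'offer'".
p_spam = Fraction(1, 5)                  # prior P(A)
p_word_given_spam = Fraction(3, 5)       # likelihood P(B|A)
p_word_given_not_spam = Fraction(1, 20)  # P(B|not A)

print(bayes(p_spam, p_word_given_spam, p_word_given_not_spam))  # 3/4
```

Seeing the word raises the probability of spam from the prior of 1/5 to a posterior of 3/4 in this made-up scenario, which is the kind of update a Naive Bayes classifier performs for each feature.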

Monty Hall Problem

The Monty Hall problem is a famous probability puzzle named after the host of the American television game show “Let’s Make a Deal,” Monty Hall. The problem goes as follows:

You are a contestant on a game show, and there are three doors: Door 1, Door 2, and Door 3. Behind one of the doors is a valuable prize, such as a car, and behind the other two doors are less desirable prizes, such as goats.

The game proceeds as follows:

You are asked to choose one of the three doors, without opening it.
After you choose a door, the host, Monty Hall, who knows what’s behind each door, opens one of the other two doors to reveal a goat. He will always reveal a door that you did not select and that has a goat behind it.
At this point, you have a choice: stick with your original choice of door or switch to the remaining unopened door.
After you make your decision, the host opens the door you selected, revealing the prize behind it.

The question is: Should you stick with your original choice or switch to the other unopened door to maximize your chances of winning the valuable prize?

The surprising answer is that you should always switch doors! Switching gives you a higher probability of winning the valuable prize than sticking with your original choice.

To understand why switching is the better strategy, consider the probabilities at each stage:

When you first choose a door, the probability of choosing the door with the prize behind it is 1/3. The probability that the prize is behind one of the other two doors is 2/3.
After the host reveals a goat behind one of the other doors, the probability that your initial choice was correct remains 1/3, because the host can always reveal a goat and so his action tells you nothing new about your own door. The 2/3 probability that belonged to the other two doors is now concentrated on the single remaining unopened door. By switching, you effectively shift your probability of winning from 1/3 to 2/3.
Put another way, switching guarantees you the prize if you initially chose a door with a goat behind it, and loses the prize if you initially chose the door with the prize. Since the probability of initially choosing a goat is 2/3 and the probability of initially choosing the prize is only 1/3, switching gives you the higher probability of winning.

This counterintuitive result can be better understood by considering all the possible scenarios and their respective probabilities. By systematically analyzing the problem, it becomes clear that switching doors increases your chances of winning from 1/3 to 2/3.

The Monty Hall problem demonstrates how our intuition about probabilities can sometimes lead us astray. It highlights the importance of careful reasoning and mathematical analysis in understanding and solving probability puzzles.
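For readers who remain unconvinced, the game is easy to simulate. The sketch below follows the rules described above (the host always opens a goat door that the contestant did not pick); the function names, trial count, and seed are arbitrary choices for the illustration.

```python
import random


def play(switch, rng):
    """Play one round and return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability of a strategy over many rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("stick :", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Over many rounds the "stick" strategy should win about one third of the time and the "switch" strategy about two thirds of the time, in line with the analysis above.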

Additional Resource that you must watch

https://www.youtube.com/watch?v=r09xYWuumfE&t=1146s
