Pre-requisite: Knowledge of basic probability calculations.

Knowledge of probability is an important requirement for a data scientist: it underpins machine learning algorithms and describes the uncertainty in the results they produce. For beginners, the principles of Bayes Theorem can be quite confusing. This article will help beginners understand the concepts involved in using Bayes Theorem for probability analysis in data science.

Table of contents:

1. Conditional Probability with examples.

2. Bayes Theorem with examples.

3. Introduction to Naïve Bayes.

**Conditional Probability**

Conditional Probability is the probability of an event A occurring given that another event B has occurred. The formula:

P(A|B) = P(A∩B) / P(B)

where:

P(A|B) = probability of A occurring given that B has occurred.

P(A∩B) = probability of A and B occurring together.

P(B) = probability of B occurring.

In a Venn diagram of the two events, the intersection of A and B represents the probability of A and B occurring together, P(A∩B).

**Example**: Suppose 3% of adults are both female and smokers. What is the probability of being a smoker, given being female?

**Solution**:

A = smoker

B = female

P(AnB) = 0.03

P(B) = 0.5 (assuming a person is equally likely to be female or male, so ½)

Given the formula above:

P(A|B) = 0.03 / 0.5 = 0.06

This means the probability of being a smoker, given being female, is 6%.
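The calculation above can be sketched in a few lines of Python (the function name and values are just for illustration):

```python
# A minimal sketch of the conditional-probability formula
# P(A|B) = P(A ∩ B) / P(B), using the smoker/female numbers above.

def conditional_probability(p_a_and_b, p_b):
    """P(A|B) = P(A ∩ B) / P(B)."""
    return p_a_and_b / p_b

p_smoker_given_female = conditional_probability(0.03, 0.5)
print(p_smoker_given_female)  # 0.06
```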

**Bayes Theorem**

Bayes Theorem is an application of conditional probability and is derived from it. It calculates the probability of one event based on its relationship with another event. The formula can be derived as follows.

Equation 1 states the conditional probability for P(A|B):

P(A|B) = P(A∩B) / P(B) … (1)

Equation 2 states the conditional probability for P(B|A):

P(B|A) = P(B∩A) / P(A) … (2)

Since the probability of A and B occurring together is always the same regardless of order (P(A∩B) is the same as P(B∩A)), equations 1 and 2 can be rearranged and combined to give equation 3:

P(A|B) · P(B) = P(B|A) · P(A) … (3)

Making P(A|B) the subject of the formula gives Bayes Theorem:

P(A|B) = (P(B|A) · P(A)) / P(B)
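As a quick numeric sanity check of the derivation (the probabilities here are toy values chosen for illustration, not from the article's examples):

```python
# Toy numbers (assumed for illustration) to verify the identity
# P(A|B) * P(B) = P(B|A) * P(A) behind Bayes Theorem.

p_a = 0.4          # P(A)
p_b = 0.5          # P(B)
p_a_and_b = 0.2    # P(A ∩ B); must not exceed min(P(A), P(B))

p_a_given_b = p_a_and_b / p_b   # equation 1
p_b_given_a = p_a_and_b / p_a   # equation 2

# Both sides of equation 3 recover the joint probability P(A ∩ B).
print(p_a_given_b * p_b, p_b_given_a * p_a)  # 0.2 0.2

# Bayes Theorem recovers P(A|B) from the other three quantities.
print(p_b_given_a * p_a / p_b)  # 0.4, matching p_a_given_b
```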

**Example:** A particular study shows that 20% of women will likely develop breast cancer at some point in their lives. A woman with breast cancer has a 90% chance of a positive result from a medical screening exam. A woman without breast cancer has a 7% chance of getting a false positive result. What is the probability that a woman has cancer given she has a positive test result?

**Solution:** First break the question into its component probabilities:

P(cancer) = the probability of having cancer = 20% = 0.2.

P(positive|cancer) = 90% = 0.9.

P(positive|No cancer) = 7% = 0.07.

P(positive) = the probability of a positive result = P(cancer and positive) + P(No cancer and positive), where P(No cancer) = 1 − 0.2 = 0.8:

P(positive) = (0.2)(0.9) + (0.8)(0.07) = 0.236

The question asks for the probability of a woman having cancer given that the test is positive, which is P(cancer|positive).

Using the Bayes Theorem formula:

P(cancer|positive) = (P(positive|cancer) × P(cancer)) / P(positive)

= (0.9 × 0.2) / 0.236

P(cancer|positive) ≈ 0.763 = 76.3%

The result is interpreted as follows: when a breast cancer test is done on a woman and the result is positive, there is a 76.3% chance that she actually has breast cancer.
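The whole calculation can be sketched in Python (the function and variable names are illustrative):

```python
# A minimal sketch of the breast-cancer example above.

def bayes(p_b_given_a, p_a, p_b):
    """Bayes Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_cancer = 0.2                # P(cancer)
p_pos_given_cancer = 0.9      # P(positive | cancer)
p_pos_given_no_cancer = 0.07  # P(positive | no cancer)

# Total probability of a positive result, summed over both branches.
p_positive = (p_pos_given_cancer * p_cancer
              + p_pos_given_no_cancer * (1 - p_cancer))

p_cancer_given_positive = bayes(p_pos_given_cancer, p_cancer, p_positive)
print(round(p_cancer_given_positive, 3))  # 0.763
```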

**Naïve Bayes**

The Naïve Bayes classifier is an algorithm that uses Bayes Theorem to classify objects. It makes the strong (naïve) assumption that the features are independent of one another. It is popularly used in text analysis, sentiment analysis, spam filters and medical diagnosis.

There are three types of Naïve Bayes:

1. Gaussian Naïve Bayes: It is used when variables are continuous and assumes a normal distribution of variables.

2. Multinomial Naïve Bayes: It is used when the features represent frequency and works well with text classification problems.

3. Bernoulli Naïve Bayes: It is used when the features are binary (discrete features) and penalizes non-occurrence of a feature.

Naïve Bayes is a direct application of Bayes Theorem in which the target variable (y) is the class being checked given the feature variables (x). There is usually more than one feature, and under the independence assumption each one contributes a factor to the calculation:

P(y|x1, …, xn) = (P(y) · P(x1|y) · P(x2|y) · … · P(xn|y)) / P(x1, …, xn)

The above is calculated for each class of the target variable, and the class with the highest probability is chosen. (The denominator is the same for every class, so it can be ignored when comparing classes.)
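The per-class computation can be sketched with a toy Bernoulli-style classifier on made-up binary features (the probabilities below are assumed for illustration rather than learned from data):

```python
# A toy Naïve Bayes sketch: score each class as
# P(y) * product of P(x_i | y), then pick the highest.

# Assumed P(x_i = 1 | y), e.g. word presence in spam vs. ham emails.
p_feature_given_class = {
    "spam": [0.8, 0.6, 0.1],
    "ham":  [0.2, 0.3, 0.7],
}
p_class = {"spam": 0.4, "ham": 0.6}  # prior probabilities P(y)

def class_score(features, label):
    """Unnormalized posterior: P(y) * prod of P(x_i | y)."""
    score = p_class[label]
    for x, p in zip(features, p_feature_given_class[label]):
        score *= p if x == 1 else (1 - p)
    return score

sample = [1, 1, 0]  # binary features observed in a new example
scores = {label: class_score(sample, label) for label in p_class}
prediction = max(scores, key=scores.get)
print(prediction)  # spam
```

In practice the products of many small probabilities are computed as sums of logarithms to avoid numerical underflow, but the comparison between classes works the same way.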

I believe this is a good introduction to the concepts of conditional probability, Bayes Theorem and Naïve Bayes. In my next article, we will move on to applying Naïve Bayes to an external dataset.