Baby Steps into Data Science 03 — Math&Statistics: Bayes Theorem

Editors: Ishmael Njie & Sulayman Saleem

DataRegressed Team
3 min read · Jun 7, 2018


A key concept in statistics, and in Data Science more broadly, is Bayes' Theorem: a way of describing the probability of an event based on prior information that may be related to that event. For example, we may want to estimate the probability that a given person has the flu. To start off, we might simply use the probability of anyone in the population having the flu. However, if we learn that this person has just coughed, the probability of them having the flu should change. Given this new evidence, we can use Bayes' Theorem to update our probability.

Bayesian Inference is the statistical practice of updating the properties of a probability distribution as new data is observed, with Bayes' Theorem at its nucleus.

The Theorem:

P(A|B) = P(B|A) * P(A) / P(B)

This rule stems from the relationship between joint probabilities and conditional probabilities: since P(A and B) = P(A|B) * P(B) = P(B|A) * P(A), dividing both sides by P(B) gives the formula above.

Given the formula above, we aim to find the probability of event A, where B is a related event introduced as new evidence.

P(A|B) is known as the posterior probability of A; the probability of event A being true given event B has happened.

P(B|A) is known as the likelihood; the probability of B being true given A has happened.

P(A) is the prior probability.

P(B) is known as the evidence; it is the overall probability of observing the new event B, and it normalises the result so the posterior is a valid probability.

Both P(A) and P(B) are probabilities of A and B respectively, without any regard to each other.
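
To make the rule concrete, here is a minimal Python sketch of the update (the function name and argument names are our own, purely for illustration):

    def posterior(likelihood, prior, evidence):
        # Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
        # Each argument is a probability between 0 and 1.
        return likelihood * prior / evidence
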

Example:

Suppose we have received data from a clinic showing that 10% of patients are diagnosed with cancer. In addition, 40% of the patients are smokers, and 70% of the patients diagnosed with cancer are smokers.

If a new patient enters the clinic, what is the probability of this patient being diagnosed with cancer if they are a smoker?

Letting C denote a cancer diagnosis and S denote being a smoker: P(C) = 0.1, P(S) = 0.4, P(S|C) = 0.7

We are trying to find P(C|S); referring back to our rule:

P(C|S) = P(S|C) * P(C) / P(S)

P(C|S) = 0.7 * 0.1 / 0.4 = 0.175
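
We can check the arithmetic with a few lines of Python (the variable names are our own):

    p_c = 0.1          # P(C): prior probability of a cancer diagnosis
    p_s = 0.4          # P(S): probability that a patient is a smoker
    p_s_given_c = 0.7  # P(S|C): probability of being a smoker given cancer

    p_c_given_s = p_s_given_c * p_c / p_s  # Bayes' Theorem
    print(p_c_given_s)  # ~0.175, i.e. 17.5%
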

Here, we can see that the probability of being diagnosed with cancer given that the patient is a smoker, P(C|S), is greater than the unconditional probability of being diagnosed with cancer, P(C). With the new evidence of ‘being a smoker’, our probability increased from 10% to 17.5%, which better describes the scenario. The outcome is also intuitively rational: it implies that being a smoker increases one's chances of being diagnosed with cancer.

Based on this case study, Bayes' Theorem helps us obtain a more accurate probability for an event when additional relevant information is available.

Bayes' Theorem is a fundamental concept of probability theory in Machine Learning. Naive Bayes is a Machine Learning algorithm that computes class probabilities for binary and multi-class classification; with Bayes' rule as its foundation, the algorithm assumes strong ('naive') independence between the features in a data set, as the sketch below illustrates.
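
As a rough illustration only (this sketch uses scikit-learn's GaussianNB on invented toy data; neither appears in the original article):

    # Naive Bayes with scikit-learn; the data below is invented for illustration.
    from sklearn.naive_bayes import GaussianNB

    # Each sample has two numeric features; labels are 0 or 1.
    X = [[25, 0], [40, 1], [35, 1], [20, 0], [50, 1], [30, 0]]
    y = [0, 1, 1, 0, 1, 0]

    model = GaussianNB()
    model.fit(X, y)

    # Posterior class probabilities for a new sample, computed via Bayes' rule
    # under the naive feature-independence assumption.
    print(model.predict_proba([[33, 1]]))
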

Below is a great video on Bayes' Theorem!
