Bayes Theorem | NLP Part-2

Rahul Sood · Published in Batteries Included · Apr 15, 2020 · 4 min read

In this article we will discuss Bayes' theorem and how it is used for spam classification in machine learning.

First of all, imagine a situation: we have two colleagues, Max and Tina. They look alike and share the same lab. One day there was a fire in the lab. Both said they knew nothing about the fire and had nothing to do with it. But a CCTV camera in the corridor caught someone in a red sweater running out of the lab when the fire started. We cannot identify who this person was, because the two look alike.

For now, we can say that the chance of the fire having been started by either of them is 50%. But then a new fact comes in: Max works only 2 days a week, while Tina works 3 days a week. Now the probabilities change, and we can say that the probability of Max causing the fire is 40% and that of Tina is 60%.
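These numbers come straight from the work schedule:

P(Max) = 2 / (2 + 3) = 0.4
P(Tina) = 3 / (2 + 3) = 0.6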

Prior information

For now, we can say that Tina is the person most likely to have started the fire. This information is called the prior. Then a new piece of information arrives: Max wears a red sweater 3 times a week, and Tina has a red sweater which she wears once a week. Since the person in the footage was wearing a red sweater, this is very valuable information. Let's calculate the new probabilities independently.

Now, to calculate the total probability of each of these four scenarios (Max or Tina, wearing red or not), we must multiply the individual probabilities:

  1. Probability of Max coming to work wearing a red sweater = Probability of Max coming to work × Probability of Max wearing a red sweater
  2. Probability of Tina coming to work wearing a red sweater = Probability of Tina coming to work × Probability of Tina wearing a red sweater
  3. Since the person in the footage was wearing red, the probabilities of Max and Tina not wearing red don't matter, and we will not consider them at all.

We just used Bayes' theorem, but there is an honest mistake in it: the probabilities of all outcomes must add up to one, and in our case they sum to much less than 1. We must normalize them. How can we do that? Just divide each one by the sum of the probabilities of both outcomes.
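Here is a minimal sketch of the whole Max/Tina calculation in Python. Note one assumption: the story gives sweater frequencies per week, so I treat a week as 5 working days, turning "3 times a week" into 3/5 and "once a week" into 1/5.

```python
# Priors from the work schedule: Max works 2 of 5 days, Tina 3 of 5.
p_max, p_tina = 2 / 5, 3 / 5

# Likelihoods of wearing a red sweater (assuming a 5-day week):
# Max wears red 3 days out of 5, Tina 1 day out of 5.
p_red_given_max, p_red_given_tina = 3 / 5, 1 / 5

# Joint probabilities: "came to work AND wore red".
joint_max = p_max * p_red_given_max      # 0.24
joint_tina = p_tina * p_red_given_tina   # 0.12

# The joints sum to 0.36, not 1, so normalize to get the posteriors.
total = joint_max + joint_tina
print(f"P(Max  | red sweater) = {joint_max / total:.2f}")   # 0.67
print(f"P(Tina | red sweater) = {joint_tina / total:.2f}")  # 0.33
```

Notice that the red-sweater evidence flips our suspicion: the prior favored Tina, but the posterior favors Max.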

This same theorem powers a spam classifier! How? Suppose that in our dataset there are 5 emails which are spam and 5 which are not spam (ham). Let's say a person receives the message 'easy money'. The word 'easy' appears once in the spams and once in the hams. The probability of the word 'easy' in spam and in ham is:
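Estimating each probability as (emails in the class containing the word) / (emails in the class), and reading the counts above as one spam and one ham containing 'easy':

P('easy' | spam) = 1/5 = 0.2
P('easy' | ham) = 1/5 = 0.2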

Now let's calculate the probabilities for the word 'money'. It shows up in 2 of the spams and in 1 of the hams:
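Using the same estimate, and taking those counts against the 5 spams and 5 hams:

P('money' | spam) = 2/5 = 0.4
P('money' | ham) = 1/5 = 0.2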

Now we have calculated the probabilities of the words 'easy' and 'money' occurring in spams and in hams. What is the probability of both of these words occurring together in one message? Under the "naive" assumption that the words are independent, we simply multiply their probabilities, exactly as we multiplied in the sweater example, and then normalize. The sketch below puts it all together.
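A minimal Python sketch of this classifier, using the counts assumed above (5 spam and 5 ham emails; 'easy' in 1 email of each class; 'money' in 2 spams and 1 ham):

```python
# Priors: 5 of the 10 emails are spam, 5 are ham.
p_spam, p_ham = 5 / 10, 5 / 10

# Per-word likelihoods estimated from the (assumed) counts above.
p_word_given_spam = {"easy": 1 / 5, "money": 2 / 5}
p_word_given_ham = {"easy": 1 / 5, "money": 1 / 5}

message = ["easy", "money"]

# Naive independence assumption: multiply the per-word probabilities.
spam_score, ham_score = p_spam, p_ham
for word in message:
    spam_score *= p_word_given_spam[word]
    ham_score *= p_word_given_ham[word]

# Normalize, just like in the sweater example.
total = spam_score + ham_score
print(f"P(spam | 'easy money') = {spam_score / total:.2f}")  # 0.67
print(f"P(ham  | 'easy money') = {ham_score / total:.2f}")   # 0.33
```

With these numbers, the message 'easy money' comes out about twice as likely to be spam as ham.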

This is how the model learns the probability of each word belonging to ham or spam. I hope you understood most of it; if not, please let me know in the comments. This series will continue, so please don't forget to follow me or clap for this article. It gives me motivation and a small dose of dopamine :)
