Bayes Rule and Sentiment Analysis

Praful Mohanan
GDSC DYPCOE
Jun 8, 2020

While exploring Natural Language Processing, the thing that most caught my eye was Bayes' Rule.

Fun fact: the SS Central America, which sank in 1857 carrying 20 tons of gold, was found using Bayesian search theory.

In this article I first explore Bayes' Rule, then show how it is used to perform sentiment analysis, followed by a Python code example.

Bayes Rule

Notation:
P(A and B) : Probability that both Event A and Event B happen
P(A|B) : Probability that Event A happens given that Event B happens
P(B|A) : Probability that Event B happens given that Event A happens
P(A) : Probability that Event A happens
P(B) : Probability that Event B happens

Rule of Multiplication:
P(A and B) = P(A) P(B|A) = P(B)P(A|B)
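The rule is easy to verify with a concrete example. Take a fair six-sided die, and let A = "the roll is even" and B = "the roll is greater than 3":

```python
from fractions import Fraction as F

p_A = F(3, 6)          # A = {2, 4, 6}
p_B = F(3, 6)          # B = {4, 5, 6}
p_A_and_B = F(2, 6)    # A and B = {4, 6}
p_B_given_A = F(2, 3)  # of {2, 4, 6}, two are greater than 3
p_A_given_B = F(2, 3)  # of {4, 5, 6}, two are even

# Both forms of the rule of multiplication give the same joint probability
assert p_A * p_B_given_A == p_A_and_B  # P(A) P(B|A) = P(A and B)
assert p_B * p_A_given_B == p_A_and_B  # P(B) P(A|B) = P(A and B)
```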

So that's a lot of notation. Let me break it down bit by bit. The | symbol represents conditional probability: a conditional probability expresses how probable one event is given that some other event has occurred.

For example:
What is the probability that I fall given that it's slippery?
P(I fall | It's slippery) = P(I fall and it's slippery) / P(It's slippery)

What is the probability that it rains given that it's cloudy?
P(It rains | It's cloudy) = P(It rains and it's cloudy) / P(It's cloudy)

To think about this intuitively: if I want to know the probability that it rains given that it's cloudy, I consider all the ways that it rains and is cloudy, out of only the ways where it is already cloudy, because that is the situation we care about, right?

We know that
P(It rains and it's cloudy) = P(It's cloudy and it rains)
so we can similarly say that
P(It's cloudy | It rains) = P(It rains and it's cloudy) / P(It rains).
Equating the two expressions for P(It rains and it's cloudy), we get Bayes' Rule:

P(It rains | It's cloudy) P(It's cloudy) = P(It's cloudy | It rains) P(It rains)

or, after dividing both sides by P(It's cloudy),

P(It rains | It's cloudy) = P(It's cloudy | It rains) P(It rains) / P(It's cloudy)

Example:
Suppose I want to calculate the probability of rain in the afternoon. The information I have:
20% of days have rainy afternoons — P(It rains) = 0.20
30% of days have cloudy mornings — P(It's cloudy) = 0.30
70% of rainy afternoons follow cloudy mornings — P(It's cloudy | It rains) = 0.70
Plugging in, we get P(It rains | It's cloudy) = (0.70 × 0.20) / 0.30 ≈ 0.47

So what does this mean? There is roughly a 47% chance that it rains in the afternoon whenever the morning is cloudy. Interesting, right? Earlier the probability was 20%; now it has increased to about 47%. Why? From the evidence we had, we calculated a new probability that reflects how strong the evidence was!
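This plug-in step is a single line of arithmetic:

```python
# Bayes' rule: P(rain | cloudy) = P(cloudy | rain) * P(rain) / P(cloudy)
p_rain = 0.20
p_cloudy = 0.30
p_cloudy_given_rain = 0.70

p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
print(round(p_rain_given_cloudy, 4))  # 0.4667
```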

Now that you have a good grasp of Bayes' theorem, let's use it to perform sentiment analysis on product reviews.
As far as definitions go, sentiment analysis is a type of analysis that interprets and classifies the emotions within text data using statistical or linguistic techniques, or both. For example, "Didn't like it" is a negative review while "Very useful" is a positive one. We will be using a statistical technique called the bag-of-words model. The bag-of-words technique ignores the grammatical structure of a sentence and instead relies on the appearance of words such as 'nice', 'great', or 'bad' to classify the sentiment as positive or negative.

Notation:
S — Sentence
😀 — Positive Sentiment, e.g. S = "I liked this product."
😐 — Negative Sentiment, e.g. S = "Did not work as expected."
P(😀 | S) — Probability of Positive Sentiment given the Sentence
P(😐 | S) — Probability of Negative Sentiment given the Sentence

Consider a sentence for which we need to do sentiment analysis:
S = “Great product at affordable price”

We need to find P(😀 | "Great product at affordable price").
According to the bag-of-words model, we treat the sentence as a collection of words: P(😀 | "great", "product", "at", "affordable", "price").
This means: what is the probability that the sentence is positive, given that the word "great" is in the sentence, and the word "product" is in the sentence, and the word "at" is in the sentence, and so on? This is where Bayes' Rule comes in.

Notation — wᵢ denotes the iᵗʰ word, for simplicity.

Fig 1.1:
P(😀 | w₁, w₂, …, wₙ) ∝ P(w₁, w₂, …, wₙ | 😀) P(😀)
= P(😀 and w₁ and w₂ and … and wₙ)
= P(😀) P(w₁ | 😀) P(w₂ | 😀) … P(wₙ | 😀)

Let's walk through each of these steps:
1. According to Bayes' theorem, the two sides are proportional to each other if we ignore the denominator:
P(A|B) ∝ P(B|A) P(A)

2. From the rule of multiplication — P(A|B) P(B) = P(A and B)

3. This is where the "naive" in Naive Bayes comes in. By the chain rule, the expansion would normally be P(A) P(B|A) P(C|A, B) and so on, but under the bag-of-words model the presence of one word does not affect the likelihood of another; each word depends only on whether the sentiment is positive or negative. So the product simplifies to P(A) P(B|A) P(C|A), and so on. The probability of being positive or negative would not change if we knew that some word comes up more or less often than some other word.

Fig 1.2: P(😀 | w₁, w₂, …, wₙ) ∝ P(😀) P(w₁ | 😀) P(w₂ | 😀) … P(wₙ | 😀)

Now we have shown that these two expressions are proportional!
This is exactly what we wanted: given the sentence, we can calculate the probability that it carries a positive sentiment.

Now, can we find the terms in the second equation (Fig 1.2)? Yes.
We can calculate all of them using:

Fig 1.3: P(wᵢ | 😀) = (number of positive sentences containing wᵢ) / (total number of positive sentences)

Suppose we have a labelled dataset containing both positive and negative reviews, with 52% positive and 48% negative examples, and some arbitrary word distribution as follows for this particular example.

Fig 1.4:
word         P(word | 😀)    P(word | 😐)
great        0.80            0.05
product      0.30            0.70
at           0.10            0.50
affordable   0.70            0.40
price        0.20            0.10

For example, 0.30 here denotes the probability that a sentence contains the word "product" given that it is a positive example. How did we calculate that? Refer to Fig 1.3.

S = “Great product at affordable price”
Now we can calculate P(😀| S) using the equation we found in Fig 1.2

Calculating, we get:
P(😀 | S) = 0.52 × 0.8 × 0.3 × 0.1 × 0.7 × 0.2 = 0.0017472
This number does not mean much on its own, so let's calculate P(😐 | S) similarly:
P(😐 | S) = 0.48 × 0.05 × 0.7 × 0.5 × 0.4 × 0.1 = 0.000336
Normalizing (dividing each by their sum, 0.0020832):
P(😀 | S) ≈ 0.84
P(😐 | S) ≈ 0.16

So there is an 84% chance that the product review is positive. Pretty amazing, right?
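The normalization step is easy to check in a few lines of Python; the six factors in each product are the class prior followed by the five per-word probabilities:

```python
# Unnormalized scores: prior times the per-word likelihoods
p_pos = 0.52 * 0.8 * 0.3 * 0.1 * 0.7 * 0.2   # proportional to P(positive | S)
p_neg = 0.48 * 0.05 * 0.7 * 0.5 * 0.4 * 0.1  # proportional to P(negative | S)

# Normalize so the two probabilities sum to 1
total = p_pos + p_neg
print(round(p_pos / total, 4))  # 0.8387
print(round(p_neg / total, 4))  # 0.1613
```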

Now let’s look at some Code:

We will be using the Naive Bayes classifier from the nltk Python package.
Our dataset: a small set of positive sample sentences and negative sample sentences.

The classifier expects training data as a list of tuples, where each tuple pairs a feature set (built from a sentence) with a label; in this case there are two labels, Positive and Negative. Here's an example feature set for the sentence "Product is very good": it contains True for words that appear in the sentence and False for words that do not. So first we need to generate these features for both the positive and the negative examples.
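As a sketch, with a hypothetical six-word vocabulary, such a feature set looks like this:

```python
# Hypothetical vocabulary drawn from the whole dataset
vocabulary = ["product", "is", "very", "good", "bad", "great"]

sentence = "Product is very good"
words = set(sentence.lower().split())

# True for vocabulary words present in the sentence, False for the rest
feature = {w: (w in words) for w in vocabulary}
print(feature)
# {'product': True, 'is': True, 'very': True, 'good': True, 'bad': False, 'great': False}

# The classifier is trained on (feature, label) tuples:
training_example = (feature, "Positive")
```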

1. Import necessary packages and extract both samples.

2. Get all the unique words from the dataset

3. Create features required for training and train using the classifier.

4. Now that our classifier is trained, let's give it the input from earlier:
S = "Great product at affordable price"
Just like the training data, the input sentence needs to be converted into the same feature form. Then we use the prob_classify method, which returns the probability distribution over the labels.
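Putting the four steps together, here is a minimal, self-contained sketch. The training sentences below are made-up stand-ins for the samples in the original dataset, so the exact probabilities will differ from those shown:

```python
from nltk.classify import NaiveBayesClassifier

# 1. Made-up positive and negative samples standing in for the dataset
positive_samples = [
    "Product is very good",
    "Great quality and nice design",
    "I love this affordable product",
    "Works great and is very useful",
]
negative_samples = [
    "Did not work as expected",
    "Very bad quality",
    "Complete waste of money",
    "Poor product do not buy",
]

# 2. All unique words across the dataset
all_words = {
    w.lower()
    for sentence in positive_samples + negative_samples
    for w in sentence.split()
}

# Bag-of-words feature set: True if the word appears in the sentence
def make_features(sentence):
    words = {w.lower() for w in sentence.split()}
    return {w: (w in words) for w in all_words}

# 3. Build (features, label) tuples and train the classifier
train_data = [(make_features(s), "Positive") for s in positive_samples]
train_data += [(make_features(s), "Negative") for s in negative_samples]
classifier = NaiveBayesClassifier.train(train_data)

# 4. Classify the sentence from earlier
S = "Great product at affordable price"
dist = classifier.prob_classify(make_features(S))
print("Positive:", round(dist.prob("Positive"), 4))
print("Negative:", round(dist.prob("Negative"), 4))
```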

Positive: 0.9596
Negative: 0.0404

This shows us that it is a 😀 sentiment with roughly 96% confidence!
Great Work! Now we have trained our very own sentiment classifier.

Now you have an intuitive understanding of Bayes' theorem and also know how to use it to do sentiment analysis with the nltk package. You can apply the same idea to classify spam emails. Since we used a small sample dataset, here's an Amazon review dataset which you can use to do sentiment analysis on.

Thank you for reading

I like explaining things intuitively and answering “why” questions.
If you liked my article, do 👏🏼.
Your appreciation inspires me to ✍🏼write more.

~Praful Mohanan
Connect with me on
LinkedIn, Github.
