Natural Language Processing (Part 16): Naïve Bayes Introduction

Coursesteach
6 min read · Nov 5, 2023


📚Chapter 3: Sentiment Analysis (Naive Bayes)

Description

In this tutorial, we will solve the same problem using a method called Naive Bayes. It’s a very good quick-and-dirty baseline for many text classification tasks. The concepts you learn here will be used later throughout the specialization.

Naive Bayes is an example of supervised machine learning and shares many similarities with the logistic regression method you used in the previous assignments. It’s called naive because this method assumes that the features you’re using for classification are all independent, which in reality is rarely the case. As you will see, however, it still works nicely as a simple method for sentiment analysis.

Sections

Key points
Steps of Naive Bayes for Sentiment Analysis
Extract the vocabulary
Compute the conditional probabilities
Smooth your probability function
Naive Bayes inference condition rule for binary classification
Conclusion

Key points

  • It’s called naive because this method makes the assumption that the features you’re using for classification are all independent.
  • The interesting thing here is that words that are equally probable in both classes don’t add anything to the sentiment.
  • In contrast to these neutral words, look at some of these other words like happy, sad, and not. They have a significant difference between probabilities. These are your power words tending to express one sentiment or the other.
  • Now let’s take a look at the word because. As you can see, it only appears in the positive corpus. Its conditional probability for the negative class is 0. When this happens, you have no way of comparing the two corpora, which becomes a problem for your calculations.

Section 1: Steps of Naive Bayes for Sentiment Analysis

Step 1: Extract the vocabulary

As before, you will begin with two corpora: one for the positive tweets and one for the negative tweets. You need to extract the vocabulary, that is, all the different words that appear in your corpus, along with their counts. You get the word count for each occurrence of a word in the positive corpus, then do the same for the negative corpus, just like you did before. Then you get a total count of all the words in your positive corpus and do the same again for your negative corpus; that is, you’re just summing over the rows of the table. For the positive tweets, there are a total of 13 words, and for the negative tweets, a total of 12 words. This is the first new step for Naive Bayes, and it’s very important because it allows you to compute the conditional probability of each word given the class, as you’re about to see.
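Here is a minimal sketch of this step in Python. The two toy corpora are assumptions chosen for illustration; they reproduce the lecture’s totals (13 positive words, 12 negative words):

```python
from collections import Counter

# Toy corpora (assumed tweets, pre-tokenized and lowercased) that reproduce
# the lecture's totals: 13 words in the positive corpus, 12 in the negative.
pos_tweets = [["i", "am", "happy", "because", "i", "am", "learning", "nlp"],
              ["i", "am", "happy", "not", "sad"]]
neg_tweets = [["i", "am", "sad", "i", "am", "not", "learning", "nlp"],
              ["i", "am", "sad", "not", "happy"]]

# Count every occurrence of every word in each corpus.
pos_counts = Counter(w for tweet in pos_tweets for w in tweet)
neg_counts = Counter(w for tweet in neg_tweets for w in tweet)

# The vocabulary is the set of all distinct words across both corpora.
vocab = set(pos_counts) | set(neg_counts)

# Total word counts per class (summing over the rows of the table).
n_pos = sum(pos_counts.values())  # 13
n_neg = sum(neg_counts.values())  # 12
```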

Step 2: Compute the conditional probabilities

Now divide the frequency of each word in a class by its corresponding sum of words in the class. For the word I, the conditional probability for the positive class would be 3/13. You store that value in a new table with the corresponding value, 0.23. For the word I in the negative class, you get 3/12, so you store that in your new table with the corresponding value, 0.25. Now apply the same procedure for each word in your vocabulary to complete the table of conditional probabilities.

One key property of this table is that if you sum over all the probabilities for each class, you will get 1.
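Continuing the sketch from Step 1, the table of conditional probabilities is just each count divided by its class total, and you can check the sum-to-one property directly:

```python
# Conditional probability of each word given the class.
p_w_pos = {w: pos_counts[w] / n_pos for w in vocab}
p_w_neg = {w: neg_counts[w] / n_neg for w in vocab}

print(round(p_w_pos["i"], 2))  # 3/13 -> 0.23
print(round(p_w_neg["i"], 2))  # 3/12 -> 0.25

# Key property: the probabilities for each class sum to 1.
assert abs(sum(p_w_pos.values()) - 1.0) < 1e-9
assert abs(sum(p_w_neg.values()) - 1.0) < 1e-9
```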

Let’s investigate this table further to see what these numbers mean. First, note how many words have nearly identical conditional probabilities, like I, am, learning, and NLP. The interesting thing here is that words that are equally probable in both classes don’t add anything to the sentiment.

In contrast to these neutral words, look at some of the other words like happy, sad, and not. They have a significant difference between their probabilities. These are your power words, tending to express one sentiment or the other, and they carry a lot of weight in determining the sentiment of your tweets.

Step 3: Smooth your probability function

Now let’s take a look at the word because. As you can see, it only appears in the positive corpus, so its conditional probability for the negative class is 0. When this happens, you have no way of comparing the two corpora, which becomes a problem for your calculations. To avoid this, you will smooth your probability function.
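One standard way to do this is add-one (Laplacian) smoothing: add 1 to every count and add the vocabulary size V to every denominator. A sketch, continuing the toy example above:

```python
# Add-one (Laplacian) smoothing: no word gets probability 0 in either class.
V = len(vocab)  # vocabulary size (8 in the toy example)
p_w_pos = {w: (pos_counts[w] + 1) / (n_pos + V) for w in vocab}
p_w_neg = {w: (neg_counts[w] + 1) / (n_neg + V) for w in vocab}

# "because" appears only in the positive corpus, yet its smoothed
# negative-class probability is now small but nonzero.
print(p_w_neg["because"])  # 1 / (12 + 8) = 0.05
```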

Step 4: Naive Bayes inference condition rule for binary classification

Say you get a new tweet from one of your friends, and the tweet says, “I’m happy today, I’m learning.” You want to use the table of probabilities to predict the sentiment of the whole tweet. The expression you use is called the Naive Bayes inference condition rule for binary classification.
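Written out (a reconstruction from the description in the next paragraph, with w_i the i-th word of the tweet and m the number of words), the rule is:

```latex
\prod_{i=1}^{m} \frac{P(w_i \mid \mathrm{pos})}{P(w_i \mid \mathrm{neg})} > 1
\implies \text{positive}, \qquad
< 1 \implies \text{negative}
```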

This expression says that you take the product, across all of the words in your tweet, of each word’s probability in the positive class divided by its probability in the negative class. Let’s calculate this product for the tweet. For each word, look up its probabilities in the table. For I, you get a positive probability of 0.2 and a negative probability of 0.2, so the ratio that goes into the product is just 0.2/0.2. For am, you also get 0.2/0.2. For happy, you get 0.14/0.10. For today, you don’t find the word in the table, meaning it is not in your vocabulary, so it contributes no term to the score. For the second occurrences of I and am, you again get 0.2/0.2 each, and learning gets 0.10/0.10. Note that all the neutral words in the tweet, like I and am, just cancel out in the expression. What you end up with is 0.14/0.10, which is equal to 1.4. This value is higher than 1, which means that overall, the words in the tweet are more likely to correspond to a positive sentiment.
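The same calculation in code, a minimal sketch continuing the example above (it uses the smoothed tables from Step 3, so the exact score differs slightly from the lecture’s 1.4, but the decision is the same):

```python
def naive_bayes_ratio(tweet, p_pos, p_neg):
    """Product over the tweet's words of P(w|pos) / P(w|neg)."""
    score = 1.0
    for w in tweet:
        # Out-of-vocabulary words (like "today") contribute no term.
        if w in p_pos:
            score *= p_pos[w] / p_neg[w]
    return score

# "I'm happy today, I'm learning", tokenized as in the toy corpora above.
tweet = ["i", "am", "happy", "today", "i", "am", "learning"]
score = naive_bayes_ratio(tweet, p_w_pos, p_w_neg)
print(score > 1, score)  # True -> the tweet is classified as positive
```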

Conclusion

So you conclude that the tweet is positive. So far, you’ve created a table to store the conditional probabilities of the words in your vocabulary and applied the Naive Bayes inference condition rule for binary classification to a tweet. Great. You have now seen how Naive Bayes can be used to classify the sentiment of a tweet. In the next tutorial, we will look into some issues with this implementation and simplify the calculations before implementing it.

Please follow Coursesteach to see the latest updates on this story.

If you want to learn more about these topics: Python, Machine Learning, Data Science, Statistics for Machine Learning, Linear Algebra for Machine Learning, Computer Vision, and Research,

then log in and enroll in Coursesteach to get fantastic content in the data field.

Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note: If you are an NLP expert and have suggestions to improve this blog, please share them in the comments and contribute.

If you want more updates about NLP and want to contribute, then follow and enroll in the following:

👉Course: Natural Language Processing (NLP)

👉📚GitHub Repository

👉 📝Notebook

Do you want to get into data science and AI and need help figuring out how? I can offer you research supervision and long-term career mentoring.
Skype: themushtaq48, email:mushtaqmsit@gmail.com

Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to some courses, or if you have suggestions for improving any Coursesteach content, feel free to contact us and follow.

Together, let’s make this the best AI learning community! 🚀

👉WhatsApp

👉 Facebook

👉Github

👉LinkedIn

👉Youtube

👉Twitter

References

1. Natural Language Processing with Classification and Vector Spaces
