Natural Language Processing (Part 18) - Log Likelihood, Part 1

Coursesteach
6 min read · Nov 19, 2023


📚Chapter 3: Sentiment Analysis (Naive Bayes)

Description

In this blog, I will introduce you to log-likelihoods. These are just the logarithms of the probabilities we calculated in the previous blog. They are much more convenient to work with, and they appear throughout deep learning and NLP.

Sections

Ratio of Probabilities
Naive Bayes inference
Log-likelihood
Calculating Lambda
Summing the Lambdas
Summary

Section 1- Ratio of Probabilities

Let’s go back to the table you saw previously, which contains the conditional probabilities of each word for positive and negative sentiment. Words can have many shades of emotional meaning, but for the purpose of sentiment classification they are simplified into three categories: neutral, positive, and negative. All of them can be identified using their conditional probabilities, and these categories can be estimated numerically just by dividing the corresponding conditional probabilities from this table.

Now, let’s see how this ratio looks for the words in your vocabulary. The ratio for the word I is 0.2 divided by 0.2, or 1. The ratio for the word am is again 1. The ratio for the word happy is 0.14 divided by 0.1, or 1.4. Doing the same for because, learning, and NLP, the ratio is 1. For sad and not, the ratio is 0.1 divided by 0.15, or about 0.67.
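As a minimal sketch of this step, the ratios can be computed directly from the conditional probability table. The probability values below are the illustrative numbers quoted above, not the output of a real training run:

```python
# Minimal sketch: ratio of conditional probabilities per word.
# The values are the illustrative numbers quoted in this blog.
cond_probs = {
    #  word:      (P(word|pos), P(word|neg))
    "I":        (0.20, 0.20),
    "am":       (0.20, 0.20),
    "happy":    (0.14, 0.10),
    "because":  (0.10, 0.10),
    "learning": (0.10, 0.10),
    "NLP":      (0.10, 0.10),
    "sad":      (0.10, 0.15),
    "not":      (0.10, 0.15),
}

# Ratio P(word|pos) / P(word|neg) for every word in the vocabulary.
ratios = {w: p_pos / p_neg for w, (p_pos, p_neg) in cond_probs.items()}
for word, r in ratios.items():
    print(f"{word:10s} ratio = {r:.2f}")
# "I", "am", "because", "learning", "NLP" -> 1.00 (neutral)
# "happy" -> 1.40 (positive); "sad", "not" -> 0.67 (negative)
```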

Again, neutral words have a ratio near 1. Positive words have a ratio larger than 1; the larger the ratio, the more positive the word. On the other hand, negative words have a ratio smaller than 1; the smaller the value, the more negative the word. In this week’s assignment, you’ll implement a function that filters words depending on their positivity or negativity, and you will find the expression shown here very helpful for that.
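One way such a filter could look is sketched below. The function name and the threshold parameter are my own choices for illustration, not taken from the assignment:

```python
def filter_words_by_sentiment(ratios, threshold=1.4, positive=True):
    """Return words whose positive/negative probability ratio passes a threshold.

    ratios: dict mapping word -> P(word|pos) / P(word|neg)
    threshold: words with ratio >= threshold count as positive,
               words with ratio <= 1/threshold count as negative.
    """
    if positive:
        return [w for w, r in ratios.items() if r >= threshold]
    return [w for w, r in ratios.items() if r <= 1.0 / threshold]

# Using the `ratios` dict from the previous sketch:
# filter_words_by_sentiment(ratios, threshold=1.4, positive=True)   -> ['happy']
# filter_words_by_sentiment(ratios, threshold=1.4, positive=False)  -> ['sad', 'not']
```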

Section 2- Naive Bayes inference

These ratios are essential in Naive Bayes for binary classification. I’ll illustrate why using an example you’ve seen before. Recall the earlier formula, which categorizes a tweet as positive if the product of the corresponding ratios of every word appearing in the tweet is bigger than 1, and negative if it is less than 1. This product is called the likelihood. If you were to take the ratio between the number of positive and negative tweets, you’d have what’s called the prior ratio. I haven’t mentioned it until now because in this small example you had exactly the same number of positive and negative tweets, making the ratio 1. In this week’s assignment you’ll have a balanced dataset, so you’ll again be working with a ratio of 1. In the future, though, when you’re building your own applications, remember that this term becomes important for unbalanced datasets.

With the addition of the prior ratio, you now have the full Naive Bayes formula for binary classification: a simple, fast, and powerful method that you can use to quickly establish a baseline.
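Putting the two pieces together, a rough sketch of the inference condition (prior ratio times the product of per-word ratios) might look like this; the function name and the handling of unknown words are assumptions for illustration:

```python
def naive_bayes_score(tweet_words, cond_probs, n_pos_tweets, n_neg_tweets):
    """Sketch of the Naive Bayes inference condition for binary classification.

    score = (n_pos_tweets / n_neg_tweets) * product over words of P(w|pos)/P(w|neg)
    A score > 1 is read as positive sentiment, < 1 as negative.
    Words not in the table are simply skipped here for brevity.
    """
    score = n_pos_tweets / n_neg_tweets  # prior ratio (1.0 for a balanced corpus)
    for w in tweet_words:
        if w in cond_probs:
            p_pos, p_neg = cond_probs[w]
            score *= p_pos / p_neg
    return score

# With a balanced corpus the prior ratio is 1, so the score is just the
# product of the per-word ratios. Using the `cond_probs` table from the
# first sketch:
# naive_bayes_score(["I", "am", "happy"], cond_probs, 100, 100) -> 1.4
```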

Section 3- Log-likelihood

Now is a good time to mention some other important considerations for your implementation of Naive Bayes.

The sentiment probability calculation requires multiplying many numbers with values between 0 and 1. Carrying out such multiplications on a computer runs the risk of numerical underflow, where the returned number is so small that it can’t be stored on your device.

Luckily, there is a mathematical trick to solve this. It involves using a property of logarithms. Recall that the formula you’re using to calculate the score for Naive Bayes is the prior multiplied by the likelihood. The trick is to use the log of the score instead of the raw score. This allows you to write the previous expression as the sum of the log prior and the log likelihood, which is a sum of the logarithms of the conditional probability ratios of all the unique words in your corpus.
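A small demonstration of why this matters, assuming some arbitrary tiny probabilities just to trigger the effect: multiplying them underflows to zero in floating point, while summing their logarithms stays well within range.

```python
import math

# Multiplying many small probabilities underflows to 0.0 in floating point:
probs = [1e-5] * 100
product = 1.0
for p in probs:
    product *= p
print(product)       # 0.0  (underflow: the true value is 1e-500)

# Summing logs instead keeps the value representable:
log_score = sum(math.log(p) for p in probs)
print(log_score)     # about -1151.3  (= 100 * log(1e-5))
```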

Section 4- Calculating Lambda

Let’s use this method to classify the tweet: I am happy because I am learning. Remember how you used the Naive Bayes inference condition earlier to get a sentiment score for your tweet.

Now, you’re going to do something very similar to get the log of your score. What you need in order to calculate that log score is called lambda: the log of the ratio of the probability that your word is positive, divided by the probability that the word is negative. Let’s calculate lambda for every word in our vocabulary. For the word I, you get the logarithm of 0.05 divided by 0.05, or the logarithm of 1, which is equal to 0. Remember, a tweet is labeled positive if the product of ratios is larger than 1, which in log space corresponds to a score larger than 0. By this logic, I would be classified as neutral, with a lambda of 0. For am, you take the log of 0.04 over 0.04, which again is equal to 0, so you enter 0 in the table. For happy, you get a lambda of 2.2, which is greater than 0, indicating a positive sentiment. From here on out, you can calculate the log score of an entire tweet just by summing up the lambdas of its words.
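A minimal sketch of the lambda computation is shown below. The conditional probabilities for happy are not quoted in this blog, so 0.09 and 0.01 are assumed here purely to reproduce the 2.2 value mentioned above:

```python
import math

def lambda_of(p_pos, p_neg):
    """Lambda for a single word: log of the positive/negative probability ratio."""
    return math.log(p_pos / p_neg)

# Illustrative values (assumptions, not from a real training run):
print(round(lambda_of(0.05, 0.05), 2))  # 0.0 -> "I" is neutral
print(round(lambda_of(0.04, 0.04), 2))  # 0.0 -> "am" is neutral
print(round(lambda_of(0.09, 0.01), 2))  # 2.2 -> "happy" is positive
```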

Section 5- Summing the Lambdas
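As described at the end of the previous section, the log score of a tweet is the log prior (0 for a balanced corpus) plus the sum of the lambdas of its words; words that are not in the vocabulary simply contribute nothing. Below is a minimal sketch of that summing step, using the illustrative lambdas from the previous section (an assumption, since the full lambda table is not shown in this blog):

```python
def log_score(tweet_words, lambdas, log_prior=0.0):
    """Sum the lambdas of the words in a tweet, plus the log prior.

    A log score > 0 is read as positive sentiment, < 0 as negative.
    Words missing from the lambda table contribute 0.
    """
    return log_prior + sum(lambdas.get(w, 0.0) for w in tweet_words)

# Illustrative lambda table (assumed values from the previous sketch):
lambdas = {"I": 0.0, "am": 0.0, "happy": 2.2}
tweet = ["I", "am", "happy", "because", "I", "am", "learning"]
print(log_score(tweet, lambdas))  # 2.2 > 0, so the tweet is classified as positive
```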

Section 6- Summary

You’re almost done with the log-likelihood. Let’s stop here and take a quick look back at what you’ve done so far. Words are often emotionally ambiguous, but you can simplify them into three categories and then measure exactly where they fall within those categories for binary classification. You do so by dividing the conditional probabilities of the words in each category. The logarithm of this ratio is called lambda, and you can use it to reduce the risk of numerical underflow. In this blog, you learned about the ratio of positive to negative word probabilities: the higher the ratio, the more positive the word. As the number of words gets larger and larger, the product of their ratios is very likely to end up very close to 0. Hence, we take the log of that ratio instead.

Please follow and 👏 clap for the story Coursesteach to see the latest updates on this story.

If you want to learn more about these topics: Python, Machine Learning, Data Science, Statistics for Machine Learning, Linear Algebra for Machine Learning, Computer Vision, and Research,

then log in and enroll in Coursesteach to get fantastic content in the data field.

Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note: if you are an NLP expert and have good suggestions to improve this blog, please share them in the comments and contribute.

If you want more updates about NLP and want to contribute, then follow and enroll in the following:

👉Course: Natural Language Processing (NLP)

👉📚GitHub Repository

👉 📝Notebook

Do you want to get into data science and AI and need help figuring out how? I can offer you research supervision and long-term career mentoring.
Skype: themushtaq48, email: mushtaqmsit@gmail.com

Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to some courses, or if you have any suggestions for improvement in any Coursesteach content, feel free to contact us and follow.

Together, let’s make this the best AI learning community! 🚀

👉WhatsApp

👉 Facebook

👉Github

👉LinkedIn

👉Youtube

👉Twitter

References

1- Natural Language Processing with Classification and Vector Spaces
