Natural Language Processing (Part 23) - Naïve Bayes Assumptions

Coursesteach
4 min read · Dec 24, 2023

📚Chapter 3: Sentiment Analysis (Naive Bayes)

Description

Now let’s go over the assumptions underlying the naïve Bayes method. The main one is the independence of words in a sentence, and I’ll explain why this can be a big problem when the method is applied. Naïve Bayes is a very simple model because it doesn’t require setting any custom parameters. The method is called naïve because of the assumptions it makes about the data. The first assumption is independence between the predictors, or features, associated with each class. The second has to do with your validation set. Let’s explore each of these assumptions and how they could affect your results.

Sections

  • Independence
  • Naive Bayes Assumptions

Section 1- Independence

To illustrate what independence between features looks like, consider the following sentence: “It is sunny and hot in the Sahara desert.” Naïve Bayes assumes that the words in a piece of text are independent of one another. But as you can see, this typically isn’t the case: the words “sunny” and “hot” often appear together, as they do in this example. Taken together, they might also be related to the thing they’re describing, like a beach or a desert. So the words in a sentence are not always independent of one another, but naïve Bayes assumes that they are.

Section 2- Naive Bayes Assumptions

The independence assumption could lead you to under- or overestimate the conditional probabilities of individual words. For example, if your task were to complete the sentence “It’s always cold and snowy in ___”, naïve Bayes might assign equal probability to the words spring, summer, fall, and winter, even though from the context you can see that winter should be the most likely candidate. In the next courses of this specialization, you will be introduced to some more sophisticated methods that deal with this.
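To make the independence problem concrete, here is a minimal sketch that compares how often two words really appear together with what the independence assumption would predict. The four-sentence corpus and the helper `prob` are invented for illustration only; they are not part of the course material.

```python
# Toy corpus, invented purely for illustration.
sentences = [
    "it is sunny and hot in the desert",
    "sunny and hot at the beach today",
    "cold and snowy in the mountains",
    "it is cold and rainy today",
]

def prob(*words):
    """Fraction of sentences that contain every one of the given words."""
    hits = sum(all(w in s.split() for w in words) for s in sentences)
    return hits / len(sentences)

p_sunny = prob("sunny")                 # appears in 2 of 4 sentences
p_hot = prob("hot")                     # appears in 2 of 4 sentences
p_joint = prob("sunny", "hot")          # observed co-occurrence
p_if_independent = p_sunny * p_hot      # what independence would predict

print(f"observed P(sunny, hot)      = {p_joint}")
print(f"independent P(sunny)*P(hot) = {p_if_independent}")
```

In this toy corpus the two words always occur together, so the observed joint probability is twice what the product of the individual probabilities suggests. A gap like this is what causes the under- or overestimation described above.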

Another issue with naïve Bayes is that it relies on the distribution of the training data sets. A good data set will contain the same proportion of positive and negative tweets as a random sample. However, most of the available annotated corpora are artificially balanced, just like the data set you use for the assignment. In the real tweet stream, positive tweets tend to occur more often than their negative counterparts. One reason for this is that negative tweets might contain content that is banned by the platform or muted by the user, such as inappropriate or offensive vocabulary. Assuming that reality behaves like your training corpus could result in a very optimistic or very pessimistic model. There’s a lot more on this in the last video of this module, which analyzes the sources of errors in naïve Bayes.
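The effect of the class distribution can be sketched with a log prior, assuming the common formulation in which a tweet’s score is the log ratio of positive to negative training tweets plus the summed log-likelihood ratios of its words. The function names, the tweet counts, and the borderline score below are all hypothetical, chosen only to show the effect.

```python
import math

def log_prior(n_pos, n_neg):
    """Log ratio of positive to negative tweets: log(P(pos) / P(neg))."""
    return math.log(n_pos / n_neg)

def predict(log_likelihood_sum, prior):
    """Return 1 (positive) if the total score is above zero, else 0 (negative)."""
    return 1 if prior + log_likelihood_sum > 0 else 0

# Hypothetical counts: a balanced training set vs. a skewed real stream.
balanced_prior = log_prior(5000, 5000)   # log(1) = 0, so the prior has no effect
skewed_prior = log_prior(8000, 2000)     # log(4), which favors the positive class

# Hypothetical tweet whose words lean slightly negative.
borderline_tweet = -0.5

pred_balanced = predict(borderline_tweet, balanced_prior)
pred_skewed = predict(borderline_tweet, skewed_prior)

print("balanced prior:", pred_balanced)
print("skewed prior:  ", pred_skewed)
```

With the balanced prior the borderline tweet is labeled negative, while the skewed prior flips it to positive. A model trained on a distribution that differs from the real stream can therefore be systematically too optimistic or too pessimistic.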

Let’s do a quick recap of all this new information. The assumption of independence in naïve Bayes is very difficult to guarantee, but despite that, the model works pretty well in certain situations. For the assignments in this module, the relative frequency of positive and negative tweets in your training data sets needs to be balanced in order to deliver accurate results. Now you understand the assumptions that underlie the naïve Bayes method. What if it fails to perform well for some sentence? In the next video, I’ll show you what to do in such cases.

Source

1- Natural Language Processing with Classification and Vector Spaces
