Natural Language Processing (Part 24): Error Analysis

Coursesteach
Dec 31, 2023


📚Chapter 3: Sentiment Analysis (Naive Bayes)

Introduction

No matter what NLP method you use, you will one day find yourself faced with an error, for example a misclassified sentence. In this tutorial, I’ll show you how to analyze such errors. Let’s consider some possible errors and how they can affect the model’s predictions.

Sections

Possible errors
Processing as a source of Error: Punctuation
Processing as a source of Error: Removing words
Processing as a source of Error: Word order
Adversarial attacks

Section 1- Possible errors

  • Semantic meaning lost in the preprocessing step.
  • Word order that affects the meaning of a sentence.
  • Quirks of language that come naturally to humans but confuse Naive Bayes models.

Section 2- Processing as a source of Error: Punctuation

So let’s start. One of your main considerations when analyzing errors in NLP systems is what the processed version of the text actually looks like. Look at the tweet “My beloved grandmother :(”, where the punctuation forms a sad face. That sad face is very important to the sentiment of the tweet because it tells you what’s happening. But if you remove punctuation, the processed tweet leaves behind only “beloved grandmother”, which looks like a very positive tweet. “My beloved grandmother!” would carry a very different sentiment. So always check what the processed text actually looks like. And it’s not just about punctuation, as the next example shows.
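The punctuation problem is easy to reproduce. The sketch below uses a minimal, assumed pipeline (lowercase, split on whitespace, strip every character in Python’s `string.punctuation`, drop a tiny hypothetical stop-word list); it is not the course’s exact preprocessing, and real tweet tokenizers may treat emoticons differently.

```python
import string

# Tiny hypothetical stop-word list, just enough for this example.
STOPWORDS = {"my"}

def preprocess(tweet: str, strip_punctuation: bool = True) -> list[str]:
    """Lowercase, split on whitespace, optionally strip punctuation, drop stop words."""
    table = str.maketrans("", "", string.punctuation)  # maps every punctuation char to None
    tokens = []
    for token in tweet.lower().split():
        if strip_punctuation:
            token = token.translate(table)
        if token and token not in STOPWORDS:  # skip tokens that became empty
            tokens.append(token)
    return tokens

sad = "My beloved grandmother :("
happy = "My beloved grandmother!"

print(preprocess(sad))                           # ['beloved', 'grandmother']
print(preprocess(happy))                         # ['beloved', 'grandmother']
print(preprocess(sad, strip_punctuation=False))  # ['beloved', 'grandmother', ':(']
```

With punctuation stripped, the sad tweet and the happy tweet become identical token lists, so no classifier downstream can tell them apart.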

Section 3- Processing as a source of Error: Removing words

Take the tweet “This is not good because your attitude is not even close to being nice.” If you remove words that look neutral, such as “not” and “this”, you’re left with: good, attitude, close, nice. From this set of words, your classifier will infer that the tweet is very positive. We’ll talk later about handling negations and word order. For now, remember to double-check what your processed text looks like so that your model can get an accurate read. The input pipeline isn’t the only potential source of trouble, though.
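Here is the same effect for stop-word removal, as a minimal sketch. The stop-word set is a hypothetical one chosen to reproduce the example; it includes “not”, as general-purpose lists such as NLTK’s English list do. Keeping negation words out of the stop-word list is shown as one common mitigation, not as the method this course uses.

```python
import string

# Hypothetical stop-word list. It contains "not", as many general-purpose lists do.
STOPWORDS = {"this", "is", "not", "because", "your", "even", "to", "being"}

def preprocess(tweet: str, stopwords: set[str]) -> list[str]:
    """Lowercase, strip punctuation characters, and drop the given stop words."""
    table = str.maketrans("", "", string.punctuation)
    words = (w.translate(table) for w in tweet.lower().split())
    return [w for w in words if w and w not in stopwords]

tweet = "This is not good because your attitude is not even close to being nice."

print(preprocess(tweet, STOPWORDS))
# ['good', 'attitude', 'close', 'nice']

# One common mitigation (an assumption here, not the course's method):
# keep negation words out of the stop-word list.
NEGATIONS = {"not", "no", "never"}
print(preprocess(tweet, STOPWORDS - NEGATIONS))
# ['not', 'good', 'attitude', 'not', 'close', 'nice']
```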

Section 4- Processing as a source of Error: Word order

Look at these tweets. “I am happy because I did not go.” is a purely positive tweet, while “I am not happy because I did go.” has a negative sentiment. Here the position of “not” is crucial to the sentiment, yet your Naive Bayes classifier misses it, because it treats each tweet as an unordered bag of words. So word order can be as important as spelling. There are many other factors to consider as well, and in the weeks to come you will see more ways to build systems that handle them.
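A bag-of-words view makes the word-order problem concrete. In this minimal sketch, the two sentences differ only in where “not” appears, so their word counts are identical, and any score built as a sum of per-word terms (as the Naive Bayes score is) must be identical too. The `LAMBDA` values are made up for illustration rather than learned from data.

```python
from collections import Counter

# Made-up per-word log-likelihood ratios, lambda(w) = log(P(w|pos) / P(w|neg)).
# A trained model would estimate these from word frequency counts.
LAMBDA = {"happy": 2.2, "not": -0.9, "go": 0.1}

def tokens(sentence: str) -> list[str]:
    return sentence.lower().rstrip(".").split()

def bag_of_words(sentence: str) -> Counter:
    return Counter(tokens(sentence))

def nb_score(sentence: str, log_prior: float = 0.0) -> float:
    # Unknown words contribute 0. Sorting fixes the summation order so that any
    # permutation of the same words produces a bit-for-bit identical float.
    return log_prior + sum(LAMBDA.get(w, 0.0) for w in sorted(tokens(sentence)))

positive = "I am happy because I did not go."
negative = "I am not happy because I did go."

print(bag_of_words(positive) == bag_of_words(negative))  # True
print(nb_score(positive) == nb_score(negative))          # True
```

Whatever the learned values are, the two tweets get the same score, so at least one of them is always misclassified.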

Section 5- Adversarial attacks

Another problem for Naive Bayes is the so-called adversarial attack. Here the term describes some common language phenomena such as sarcasm, irony, and euphemism. Humans pick these up quickly, but machines are terrible at it. The tweet “This is a ridiculously powerful movie. The plot was gripping and I cried right through until the ending.” contains a somewhat positive movie review, but preprocessing might suggest otherwise. If you preprocess this tweet, you’ll get a list of mostly negative words, even though they were actually used to describe a movie that the author enjoyed. If you apply Naive Bayes to this list of words, it would end up giving a very negative score regardless.
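When a prediction looks wrong, a useful error-analysis habit is to break the Naive Bayes score into per-token contributions. The sketch below assumes a simplified preprocessing step, a hypothetical stop-word list, and made-up `LAMBDA` values chosen so that words like “ridiculously” and “cried” lean negative; a trained model would estimate these from word frequency counts, and a real pipeline would usually also stem the words.

```python
import string

# Hypothetical stop-word list and made-up lambda values, lambda(w) = log(P(w|pos) / P(w|neg)).
STOPWORDS = {"this", "is", "a", "the", "was", "and", "i", "through", "until"}
LAMBDA = {"ridiculously": -1.5, "cried": -1.2, "ending": -0.2, "powerful": 0.4, "gripping": 0.3}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation characters, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    words = (w.translate(table) for w in text.lower().split())
    return [w for w in words if w and w not in STOPWORDS]

def explain(text: str, log_prior: float = 0.0) -> tuple[float, list[tuple[str, float]]]:
    """Return the Naive Bayes score and each token's contribution, most negative first."""
    contributions = [(w, LAMBDA.get(w, 0.0)) for w in preprocess(text)]  # unknown words add 0
    score = log_prior + sum(c for _, c in contributions)
    return score, sorted(contributions, key=lambda pair: pair[1])

review = ("This is a ridiculously powerful movie. The plot was gripping "
          "and I cried right through until the ending.")
score, contributions = explain(review)
for word, value in contributions:
    print(f"{word:>13} {value:+.1f}")
print(f"score = {score:+.2f}")
```

With these illustrative values the total is negative, and the sorted breakdown shows at a glance which words drag the score down even though the author liked the movie.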

Now you know how to apply the Naive Bayes method to text classification. It makes an independence assumption, which can lead to errors, and now you know how to analyze them. It’s still a very powerful baseline and, as you know, relies on word frequency counts. Next week, you’ll learn how to use word vectors, which can give better results.
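To tie the pieces together, here is a minimal sketch of Naive Bayes trained from word frequency counts (log prior plus per-word log-likelihood ratios with Laplace smoothing), followed by an error-analysis loop that prints each misclassified example next to its processed tokens. The helper names `train_naive_bayes`, `predict`, and `error_analysis`, the stop-word list, and the six-tweet training set are all hypothetical, chosen only to illustrate the workflow.

```python
import math
import string
from collections import defaultdict

# Hypothetical stop-word list for illustration only.
STOPWORDS = {"i", "am", "is", "this", "the", "a", "my", "was", "and", "with"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation characters, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    words = (w.translate(table) for w in text.lower().split())
    return [w for w in words if w and w not in STOPWORDS]

def train_naive_bayes(texts: list[str], labels: list[int]) -> tuple[float, dict[str, float]]:
    """Return the log prior and per-word lambdas (labels: 1 = positive, 0 = negative)."""
    freqs = defaultdict(int)  # (word, label) -> count
    for text, label in zip(texts, labels):
        for word in preprocess(text):
            freqs[(word, label)] += 1
    vocab = {word for word, _ in freqs}
    n_pos = sum(c for (_, lab), c in freqs.items() if lab == 1)
    n_neg = sum(c for (_, lab), c in freqs.items() if lab == 0)
    lambdas = {}
    for word in vocab:
        # Laplace (add-one) smoothing keeps unseen (word, class) pairs from getting probability 0.
        p_pos = (freqs.get((word, 1), 0) + 1) / (n_pos + len(vocab))
        p_neg = (freqs.get((word, 0), 0) + 1) / (n_neg + len(vocab))
        lambdas[word] = math.log(p_pos / p_neg)
    n_docs_pos = sum(labels)
    log_prior = math.log(n_docs_pos / (len(labels) - n_docs_pos))
    return log_prior, lambdas

def predict(text: str, log_prior: float, lambdas: dict[str, float]) -> int:
    # Words never seen in training contribute 0 to the score.
    score = log_prior + sum(lambdas.get(w, 0.0) for w in preprocess(text))
    return 1 if score > 0 else 0

def error_analysis(texts, labels, log_prior, lambdas):
    """Collect (raw text, processed tokens, true label, prediction) for every mistake."""
    errors = []
    for text, label in zip(texts, labels):
        pred = predict(text, log_prior, lambdas)
        if pred != label:
            errors.append((text, preprocess(text), label, pred))
    return errors

# Made-up toy data, for illustration only.
train_texts = ["I love my beloved dog", "happy day with grandmother", "great happy movie",
               "sad bad day :(", "I hate this awful movie", "bad sad news"]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["My beloved grandmother :(", "what a great happy day"]
test_labels = [0, 1]

log_prior, lambdas = train_naive_bayes(train_texts, train_labels)
for raw, processed, label, pred in error_analysis(test_texts, test_labels, log_prior, lambdas):
    print(f"{raw!r} -> {processed} (true={label}, predicted={pred})")
```

On this toy data the loop prints a single line, `'My beloved grandmother :(' -> ['beloved', 'grandmother'] (true=0, predicted=1)`, and the processed tokens show why: the emoticon was stripped before the model ever saw it.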

Please follow Coursesteach and 👏 clap for this story to see the latest updates.

If you want to learn more about Python, Machine Learning, Data Science, Statistics for Machine Learning, Linear Algebra for Machine Learning, Computer Vision, and Research, log in and enroll in Coursesteach to get fantastic content in the data field.

Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note: If you are an NLP expert and have suggestions to improve this blog, please share them in the comments and contribute.

If you want more updates about NLP and would like to contribute, follow and enroll in the following:

👉Course: Natural Language Processing (NLP)

👉📚GitHub Repository

👉 📝Notebook

Do you want to get into data science and AI and need help figuring out how? I can offer you research supervision and long-term career mentoring.
Skype: themushtaq48, email:mushtaqmsit@gmail.com

Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to any course, or if you have suggestions for improving any Coursesteach content, feel free to contact and follow us.

Together, let’s make this the best AI learning Community! 🚀

👉WhatsApp

👉 Facebook

👉Github

👉LinkedIn

👉Youtube

👉Twitter

Source

1- Natural Language Processing with Classification and Vector Spaces
