Why NLP is hard for Reviews

Nitin Kumar Kain
Dec 28, 2018 · 2 min read

Nowadays, Natural Language Processing (NLP) is one of the hottest topics in AI. The reason is that NLP techniques help us understand customer behavior, demand, and feedback about a product.

The amount of text data on the internet is massive, but most of it is unstructured. So what else do we need? We already have the data and the computation power. What we need now are fast, scalable algorithms, and that is where NLP comes into the picture.

At Bewgle, we work on customer reviews to understand customer feedback and to find deeper insights in those reviews with the help of AI.

During this work, we have learned what makes NLP hard for reviews.

There are basically four major problems with reviews, especially reviews from Indian customers.

1. Combination of multiple languages:

A single review often contains words from different languages, or words from another language written in English letters (e.g., Faltu, Bakwass). Handling such cases is one of the major problems.
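To make the mixed-language problem concrete, here is a minimal sketch of one possible first step: looking up romanized words in a small lexicon and replacing them with an English gloss. The lexicon, its rough glosses, and the function names are hypothetical and only for illustration. A real system would need a far larger lexicon or a proper transliteration and language-identification model.

```python
import re

# Hypothetical, hand-made lexicon: romanized Hindi word -> rough English gloss.
# The entries and glosses are illustrative assumptions, not a real resource.
ROMANIZED_HINDI = {
    "faltu": "useless",
    "bakwas": "rubbish",
    "bakwass": "rubbish",
}


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def gloss_review(text, lexicon=ROMANIZED_HINDI):
    """Return (glossed_tokens, flagged_tokens) for a review.

    glossed_tokens: tokens with known romanized words replaced by a gloss.
    flagged_tokens: the original tokens that were found in the lexicon.
    """
    tokens = tokenize(text)
    glossed = [lexicon.get(tok, tok) for tok in tokens]
    flagged = [tok for tok in tokens if tok in lexicon]
    return glossed, flagged


glossed, flagged = gloss_review("Battery is faltu, totally Bakwass product")
print(" ".join(glossed))
print(flagged)
```

A dictionary lookup like this only covers spellings that someone has already listed, which is exactly why the problem is hard: the same word can be romanized in many different ways.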

2. No linguistic pattern:

Sentences written in standard language are relatively easy to handle with existing libraries or models. The problem is that most reviews do not follow a regular linguistic pattern, which makes them more difficult to process.

3. Lots of grammatical errors:

Grammatical and spelling errors are among the biggest challenges in reviews. We see hundreds of variants of a single word. For example, "awesome" shows up as awsmm, awesone, awesomee, ossm, awosme, ossom, awesum, owesome, awaysome, awesomeee, aswam, and many more. Handling such variants makes the problem harder.
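One simple way to attack the variants listed above is to collapse repeated letters and then fuzzy-match against a known vocabulary. The sketch below uses Python's standard `difflib` module for this. The vocabulary, the `normalize` function, and the 0.7 cutoff are assumptions chosen for illustration, not the method we use in production.

```python
import difflib
import re

# Hypothetical vocabulary. In practice this would come from a dictionary
# or from the most frequent words in the review corpus.
VOCAB = ["awesome", "good", "product", "battery"]


def squeeze(word):
    """Collapse runs of a repeated letter: 'awesomeee' -> 'awesome'."""
    return re.sub(r"(.)\1+", r"\1", word.lower())


# Squeeze the vocabulary too, so that 'good' (squeezed to 'god') still
# lines up with a squeezed token such as 'goooood'.
SQUEEZED_VOCAB = {squeeze(w): w for w in VOCAB}


def normalize(token, cutoff=0.7):
    """Map a noisy token to the closest vocabulary word, if one is close enough.

    The cutoff is an assumed similarity threshold. Tokens with no match
    at or above it are returned unchanged.
    """
    key = squeeze(token)
    matches = difflib.get_close_matches(key, list(SQUEEZED_VOCAB), n=1, cutoff=cutoff)
    return SQUEEZED_VOCAB[matches[0]] if matches else token


for variant in ["awesomeee", "awesum", "awsmm", "ossm"]:
    print(variant, "->", normalize(variant))
```

This catches variants that are close in spelling, such as "awesomeee", "awesum", and "awsmm", but not phonetic ones such as "ossm", which share too few letters with "awesome". Lowering the cutoff would catch more variants at the cost of more false matches, so string similarity alone does not solve the problem.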

4. Lack of domain-specific supervised data:

If you are already familiar with machine learning algorithms, you probably understand the value of labeled data and how much more powerful a model becomes when it is trained on it. But labeled data is very expensive and very hard to collect, and if you are working in a different domain, it is very hard to get domain-specific labeled data. Some labeled data is available on the internet, but only in limited amounts and for a limited number of domains.

There are many more problems in NLP, which is why it is such an interesting area of AI to work in and a hot topic of current research.

Written by Nitin Kumar Kain: biker, explorer, and passionate learner. Works at Bewgle.
