NLP in Layman's Terms

Published in RumPyDas · 6 min read · Jun 22, 2019

Hello, lovely readers! Thank you for reading my blogs. If you haven’t read my previous Data Science and Machine Learning blog on time series, go have a look at it: “Everything you need to know about Time Series!” by Rumana Shaikh https://link.medium.com/NRbGl5wXIX

Natural Language Processing

Formally, Natural Language Processing, or NLP, is defined as the application of computational techniques to the analysis and synthesis of text. The aim of NLP is to give computers the ability to perform tasks involving human language. In hands-on engineering terms, it can broadly be described as “cleaning” and “transforming” text into a form fit for machine learning. Of course, you can also derive insights from text, just as in any exploratory data analysis (EDA). But the inherent properties of text call for a fundamentally different approach. This leads us to an important question: why is text so hard to deal with that it requires a separate field of study?

Why is it difficult to work with text?

Comprehending language is hard for computers. Some of the unique challenges of working with text are as follows:

Synonymy — This refers to different words having the same meaning. The same intent can be conveyed in many ways, and this is one of the prime reasons why computers have a hard time deciphering the meaning of a statement. “The President of the United States has signed a new decree” and “POTUS has inked a new law” convey essentially the same message. However, because the two sentences share almost no words, a computer that compares them word by word has a hard time recognizing that they mean the same thing.
Ambiguity — “The bank deposit rate is quite high” and “He stood near the bank admiring the river”. In these statements, the word bank has completely different meanings. In the first case, it represents a financial institution, and in the second case, it refers to land near the river. Disambiguating the meaning in sentences is quite challenging.
Anaphora Resolution — “George is my friend. He likes football”. In the second sentence, “he” refers to George. It is difficult for computers to discern which person or entity a pronoun refers to.
Language related issues — Every language has its own peculiarities. In written English, spaces separate words and punctuation marks the end of a sentence. Written Thai, in contrast, does not put spaces between words and has no full stop to mark the end of a sentence, so even finding word and sentence boundaries is a problem in its own right. The grammar and morphology of languages also differ widely. This is why Google Translate, or any other translation service, struggles to convert a piece of text perfectly from one language to another.
Out of Vocabulary problem — Machines have a hard time adapting to new constructs that humans come up with. When we humans come across a word we haven’t seen before, we might not understand its meaning instantly. But this does not mean we cannot adapt. After seeing the word in several different sentences, we pick up its context and meaning. A model trained on a fixed vocabulary, on the other hand, typically has no representation for a word it has never seen, so it adapts poorly.
Language generation — While language understanding is hard, language generation too has its own set of challenges. For chatbots to work effectively, they need to communicate properly constructed sentences which are grammatically correct. This is quite a hard problem and a challenge that needs to be overcome.
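
To make the Out of Vocabulary problem from the list above a little more concrete, here is a minimal sketch in Python. It assumes a very naive setup that is not from any particular library: the "model" only knows the words in two training sentences, and the names `build_vocab`, `encode`, and the `<UNK>` placeholder are hypothetical choices made for this illustration.

```python
# Toy illustration of the out-of-vocabulary problem (all names are hypothetical).
UNK = "<UNK>"

def build_vocab(sentences):
    """Collect every lowercased word seen in the training sentences."""
    vocab = set()
    for sentence in sentences:
        vocab.update(sentence.lower().split())
    return vocab

def encode(sentence, vocab):
    """Replace any word that is not in the vocabulary with the UNK placeholder."""
    return [word if word in vocab else UNK for word in sentence.lower().split()]

training = ["the president signed a new law", "the bank raised the deposit rate"]
vocab = build_vocab(training)

# "POTUS" and "decree" never appeared in training, so both collapse to <UNK>.
encoded = encode("POTUS signed a new decree", vocab)
print(encoded)
```

Everything the machine has not seen ends up as the same placeholder, which is one simple way to picture why new words are hard for it.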

We now know that working with text is hard. But there are also exciting applications of working with text. We will now take a look at some of these use cases.

Use cases of NLP

The use cases of NLP encompass almost anything you can do with Language in relation to a problem.

1) Sentiment Analysis — Finding if the text is leaning towards a positive or negative sentiment.

Sentiment Analysis is the process of computationally identifying and categorizing the opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral. The information available on the Internet is constantly growing, resulting in a large number of texts expressing opinions on review sites, forums, blogs, and social media. Sentiment analysis is, therefore, a topic of great interest and development, since it has many practical applications. It is immensely useful in figuring out the overall sentiment towards products (Amazon), movies (Netflix), food (Yelp), etc. Its applications include Market Research, Social Monitoring, Customer Support, and Product Analytics.
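
As a rough sketch of the idea, the simplest possible approach is to count positive and negative words. The word lists and the function name `sentiment` below are assumptions made up for this example; real systems use far larger lexicons or trained models.

```python
import re

# Tiny hand-made word lists; purely illustrative, not a real sentiment lexicon.
POSITIVE = {"good", "great", "love", "excellent", "amazing"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}

def sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, the camera is great"))
print(sentiment("Terrible service and awful food"))
print(sentiment("The package arrived on Monday"))
```

Note that this sketch would label "not good" as positive, because it ignores word order and negation. That weakness is one reason sentiment analysis is harder than it first looks.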

2) Text Classification — Categorizing text into various categories

Text classifiers can be used to organize, structure, and categorize almost any text data we have. For example, news articles can be organized by topic, chat conversations by language, support tickets by urgency, etc. Other examples of text classification include:

Directing customer queries to the right vertical
Detection of spam and non-spam emails
Auto tagging of customer queries
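
For the spam example above, a minimal sketch could look like the following. It uses a small Naive Bayes classifier written with only the Python standard library; this is one common technique chosen for illustration, the four training messages are made up, and the names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest Naive Bayes log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the prior: how common this label is overall.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data for illustration only.
examples = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
word_counts, label_counts = train(examples)
print(classify("free prize inside", word_counts, label_counts))
print(classify("project meeting on monday", word_counts, label_counts))
```

With so little data this is only a toy, but the same shape (count words per category, then score new text against each category) carries over to larger datasets.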

3) Document Summarization — Compressing a paragraph/document into a few words or sentences

Text summarization is the method of compressing a text document in order to create a summary of its major points. The idea of summarization is to find a subset of the data that carries the essential information of the entire set. Its applications include news summaries (the Inshorts app), novel summaries, book summaries (Blinkist), etc. With attention spans declining overall, the need to convey information in as few words as possible has risen, and summarization helps solve this problem.
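
One simple way to picture "finding a subset that carries the information" is extractive summarization: score each sentence by how many frequent words it contains and keep the top ones. The sketch below is an assumption-laden toy, with a hand-picked stopword list and a hypothetical function name `summarize`; it is not how any particular app works.

```python
import re
from collections import Counter

# A tiny stopword list chosen for this example only.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "on", "for"}

def content_words(text):
    """Lowercased words with the stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        return sum(freq[w] for w in content_words(sentence))

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Return the chosen sentences in their original order.
    return [s for s in sentences if s in top]

text = (
    "NLP helps computers read text. "
    "Computers use NLP to translate text and to summarize text. "
    "The weather was nice."
)
print(summarize(text, max_sentences=1))
```

This scoring favours longer sentences, since they contain more words to add up, so practical systems usually normalize the score or use learned models instead.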

4) Parts of Speech Tagging — Figuring out the various nouns, adverbs, verbs, etc. in the text.

Identifying part-of-speech (POS) tags is much more complicated than it looks. This is because a single word can take different part-of-speech tags in different sentences, depending on the context. A fixed word-to-tag lookup is therefore not enough. A few of its applications include:

Text to speech conversion
Word Sense Disambiguation (teaching a machine the difference in meaning of the word ‘bears’ in “I saw a couple of bears” and “Hard work always bears fruit”)
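
The ‘bears’ example can be sketched in a few lines. The lookup table, the tag names, and the single hand-written context rule below are all assumptions invented for this illustration; real taggers learn such patterns from annotated data rather than from one rule.

```python
# One fixed tag per word: this is the "generic mapping" that breaks down.
LOOKUP = {
    "i": "PRON", "saw": "VERB", "a": "DET", "couple": "NOUN", "of": "ADP",
    "bears": "NOUN", "hard": "ADJ", "work": "NOUN", "always": "ADV", "fruit": "NOUN",
}

def tag_with_lookup(sentence):
    """Tag each word using only the fixed table ("X" means unknown)."""
    return [(word, LOOKUP.get(word.lower(), "X")) for word in sentence.split()]

def tag_with_context(sentence):
    """Same table, plus one toy rule that looks at the previous word's tag."""
    tagged = tag_with_lookup(sentence)
    for i, (word, tag) in enumerate(tagged):
        # Toy rule: "bears" right after an adverb is acting as a verb.
        if word.lower() == "bears" and i > 0 and tagged[i - 1][1] == "ADV":
            tagged[i] = (word, "VERB")
    return tagged

print(tag_with_lookup("Hard work always bears fruit"))   # tags "bears" as NOUN (wrong here)
print(tag_with_context("Hard work always bears fruit"))  # tags "bears" as VERB
print(tag_with_context("I saw a couple of bears"))       # "bears" stays NOUN
```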

5) Machine translation — Translate text from one language to another

Machine Translation is the task of automatically translating text from one natural language into another while retaining the meaning of the original. Translation is complex because a word in the source language can have multiple meanings, and each of those meanings can take a different form in the target language. Its most popular application is Google Translate, and it is employed in devices like Google Home as well. Machine translation allows business transactions between partners in different countries without the need for a human interpreter.

6) Named Entity Recognition — Identify the entities present in the text

Named Entity Recognition (NER) finds mentions of named entities in text and classifies them into categories such as person, organization, or date and time reference. It is used a lot in bioinformatics, molecular biology, and other medical NLP applications. It also plays an important role in the broader field of Information Extraction, where we try to extract knowledge from unstructured text.
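
A very rough sketch of the idea: look names up in a small list (often called a gazetteer) and match dates with a pattern. The name list, the date pattern, and the function `find_entities` are hypothetical and only cover this one example sentence; real NER systems are trained models that generalize far beyond fixed lists.

```python
import re

# Tiny hand-made name list; purely illustrative.
GAZETTEER = {"George": "PERSON", "Google": "ORGANIZATION", "Amazon": "ORGANIZATION"}

# Matches dates written like "June 22, 2019" or "Jun 22, 2019".
DATE_PATTERN = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2}, \d{4}\b"
)

def find_entities(text):
    """Return (mention, category) pairs found by the pattern and the name list."""
    entities = [(match.group(), "DATE") for match in DATE_PATTERN.finditer(text)]
    for name, label in GAZETTEER.items():
        if re.search(rf"\b{re.escape(name)}\b", text):
            entities.append((name, label))
    return entities

print(find_entities("George joined Google on June 22, 2019"))
```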

7) Conversational AI — Chat with a machine in natural language and get queries resolved

Conversational AI deals with creating an interface that lets humans and machines converse in natural language. Such interfaces are known as chatbots. A user can interact with the machine in natural language, the same way they would usually communicate with a human. For organizations to truly scale their customer support, chatbots are increasingly being adopted as the first point of contact for resolving customer queries.

So, to enable all of these NLP use cases, the first challenge is to convert the text into a form that the machine can understand. For that, we need to break text down into its fundamental components, known as tokens.

Hence, in our next blog, we will learn more about tokens and other important NLP terms, followed by working through a dataset.
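
As a small preview, here is a minimal sketch of splitting a sentence into tokens. It assumes the simplest possible rule, that a token is either a run of word characters or a single punctuation mark; the function name `tokenize` is a hypothetical choice, and real tokenizers handle many more cases.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = tokenize("George is my friend. He likes football!")
print(tokens)
```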

Written by Rumana Shaikh