A Beginner’s Guide to NLP

Karteek Menda
6 min read · Jul 10, 2021

This article is dedicated to the late Alan Turing.

Hello Aliens

Natural Language Processing (NLP)

Natural Language Processing, usually called NLP, is a branch of artificial intelligence that deals with the interaction between computers and humans through natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a way that is useful. Most modern NLP techniques rely on machine learning to derive meaning from human language, and NLP plays a critical role in supporting human-machine interaction.

In this article, I will walk you through some NLP tasks that I implemented, and later we will deploy them to the web to make it a complete package.

The tasks are listed below.

  1. Analyzing the text to get its tokens and lemmas.
  2. Extracting named entities (NER, Named Entity Recognition) from the entered text.
  3. Sentiment analysis.
  4. Text summarization (extractive summarization).
  5. Machine translation.

I will shed some light on each of these tasks as we proceed.

1. Tokens and Lemma

A token is the smallest unit that a corpus is broken into for processing, typically a word or a punctuation mark. Tokenization is the task of chopping the text up into these pieces.

For example:

Input: NLP and Machine learning go hand in hand.

After tokenization, the output is simply each of the words present in this sentence: “NLP” is one token, “and” is another, “Machine” is the next, and so on.
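
The splitting described above can be sketched with a regular expression. This is only a minimal illustration, not the tokenizer used in the app; the function name `tokenize` and the choice to treat each punctuation mark as its own token are assumptions made for the example.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize("NLP and Machine learning go hand in hand.")
print(tokens)
# ['NLP', 'and', 'Machine', 'learning', 'go', 'hand', 'in', 'hand', '.']
```

Library tokenizers handle many more cases (contractions, URLs, abbreviations), so treat this as a way to see the idea rather than a replacement for them.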

Lemmatization reduces a word to its root (dictionary) form, called the lemma. Some lemmatizers, such as NLTK’s WordNetLemmatizer, look words up in the WordNet corpus. Lemmatization is useful when we want human-readable words, because its output is always a proper word. An example makes this clearer.

Let’s take three words: “going”, “goes”, and “gone”. The lemma of each is the root word “go”.
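
To make the mapping concrete, here is a toy lemmatizer built on a hand-written lookup table. The table `LEMMA_TABLE` and the function `lemmatize` are hypothetical names invented for this sketch; a real lemmatizer consults a full dictionary instead of four entries.

```python
# Hypothetical, hand-written lemma table; real lemmatizers use a full dictionary
LEMMA_TABLE = {"going": "go", "goes": "go", "gone": "go", "went": "go"}


def lemmatize(word):
    """Return the dictionary form of a word, or the lowercased word if unknown."""
    lowered = word.lower()
    return LEMMA_TABLE.get(lowered, lowered)


lemmas = [lemmatize(w) for w in ["going", "goes", "gone"]]
print(lemmas)  # ['go', 'go', 'go']
```

With NLTK, the comparable call would be `WordNetLemmatizer().lemmatize("going", pos="v")`. Without `pos="v"` the word is treated as a noun and comes back unchanged.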

Tokens and Lemma of the text.

2. NER (Named Entity Recognition)

In any text document, there are particular terms that represent specific entities, which are more informative and have a unique context. These are known as named entities: real-world objects such as people, places, and organizations, which are usually denoted by proper names.

Named entity recognition (NER), also known as entity chunking or entity extraction, is a popular information extraction technique that identifies and segments the named entities in a text and classifies them into predefined classes.
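
The "identify, segment, classify" steps can be illustrated with a dictionary (gazetteer) lookup. This is a deliberately simple stand-in: NER systems in practice use trained statistical models rather than a fixed list. The `GAZETTEER` entries, the label names `PERSON`, `ORG`, and `GPE`, and the function `find_entities` are all assumptions made for the sketch.

```python
import re

# Hypothetical gazetteer: entity text -> class label. A trained NER model
# learns these decisions from annotated data instead of looking them up.
GAZETTEER = {
    "Alan Turing": "PERSON",
    "Arizona State University": "ORG",
    "London": "GPE",
}


def find_entities(text):
    """Return (entity, label, start_offset) tuples for gazetteer entries in text."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.group(), label, match.start()))
    # Report entities in the order they appear in the text
    return sorted(found, key=lambda item: item[2])


entities = find_entities("Alan Turing was born in London.")
print(entities)
# [('Alan Turing', 'PERSON', 0), ('London', 'GPE', 24)]
```

A lookup like this cannot recognise names it has never seen, which is exactly the gap that model-based NER fills.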

Named Entity Recognition

3. Sentiment Analysis

Sentiment analysis is the interpretation and classification of emotions (positive, negative, and neutral) within text data using text analysis techniques. Sentiment analysis tools allow businesses to identify customer sentiment toward products, brands, or services in online feedback.

A sentiment analysis system for text analysis combines natural language processing (NLP) and machine learning techniques to assign weighted sentiment scores to the entities, topics, themes and categories within a sentence or phrase.

For example, let’s take a movie review: “this movie was the worst of times”. Of course, this is a negative sentiment. We as humans can tell, but what about machines? Fortunately, there is a nice package named TextBlob that can tell us the sentiment of this text. In the background we are using its NaiveBayesAnalyzer; note that TextBlob’s default analyzer is the PatternAnalyzer, so the Naive Bayes one has to be selected explicitly.

The review is classified as negative, and the probability of it being negative is about 80.5%. Great job.
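
To show what a Naive Bayes analyzer does under the hood, here is a from-scratch sketch with add-one smoothing. It is not TextBlob’s implementation: the six training reviews in `TRAIN` are made up for illustration, and the function names are hypothetical, so the probabilities it prints will not match the 80.5% above.

```python
import math
from collections import Counter

# Hypothetical, made-up training reviews, only to make the sketch runnable
TRAIN = [
    ("a wonderful and moving film", "pos"),
    ("the best movie of the year", "pos"),
    ("great acting and a great story", "pos"),
    ("the worst movie i have seen", "neg"),
    ("a boring and terrible film", "neg"),
    ("awful plot and the worst acting", "neg"),
]


def train(examples):
    """Count words per class and documents per class."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, doc_counts


def predict_proba(text, word_counts, doc_counts):
    """Return P(label | text) for each label using add-one smoothing."""
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word in vocab:  # skip words never seen in training
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalise the log scores into probabilities that sum to 1
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


word_counts, doc_counts = train(TRAIN)
probs = predict_proba("this movie was the worst of times", word_counts, doc_counts)
print(max(probs, key=probs.get), probs)
```

In TextBlob itself, the equivalent would be `TextBlob(text, analyzer=NaiveBayesAnalyzer()).sentiment`, which returns a classification together with `p_pos` and `p_neg`.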

4. Text Summarization

Text summarization refers to the technique of shortening long pieces of text. The intention is to create a fluent summary while preserving key information content and overall meaning. Applying text summarization reduces reading time, accelerates the process of researching for information, and increases the amount of information that can fit in an area.

Automatic summarization can also improve the effectiveness of indexing, and automatic summarizers tend to be less biased than human summarizers.

There are broadly two different approaches that are used for text summarization:

  1. Extractive Summarization
  2. Abstractive Summarization

However, I will go with extractive summarization here, which builds the summary by selecting the most important sentences from the original text instead of generating new ones. I strongly recommend that the Aliens go through text summarization in more depth, as there are a number of algorithms available to summarize text. I will be posting a separate article on text summarization shortly.

I have used four algorithms to summarize the text.

Gensim.

  • Gensim’s summarization module (gensim.summarization, available in Gensim 3.x and removed in Gensim 4.0) provides functions for summarizing texts. Sentences are ranked using a variation of the TextRank algorithm.
  • The module automatically summarizes the given text by extracting one or more important sentences from it. In a similar way, it can also extract keywords.

LexRank.

  • Unsupervised approach to text summarization based on graph-based centrality scoring of sentences.
  • The main idea is that sentences “recommend” other similar sentences to the reader. Thus, if one sentence is very similar to many others, it will likely be a sentence of greater importance.
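
The “recommendation” idea can be sketched as a PageRank-style score over a sentence similarity graph. This is a simplified illustration in the spirit of LexRank, not the published algorithm: it uses raw word counts instead of TF-IDF weights and applies no similarity threshold. The function names and the sample sentences are assumptions made for the example.

```python
import math
import re
from collections import Counter


def sentence_vectors(sentences):
    """Bag-of-words count vector for each sentence."""
    return [Counter(re.findall(r"\w+", s.lower())) for s in sentences]


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centrality_scores(sentences, damping=0.85, iterations=50):
    """PageRank-style scores on the sentence similarity graph."""
    vectors = sentence_vectors(sentences)
    n = len(sentences)
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Avoid division by zero for sentences similar to nothing else
    row_sums = [sum(row) or 1.0 for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * sim[j][i] / row_sums[j] for j in range(n))
            for i in range(n)
        ]
    return scores


sentences = [
    "Text summarization shortens long text.",
    "Extractive summarization selects important sentences from the text.",
    "Yesterday was sunny and warm.",
    "Important sentences share words with many other sentences in the text.",
]
scores = centrality_scores(sentences)
best = sentences[scores.index(max(scores))]
print(best)
```

The off-topic weather sentence shares no words with the others, so no other sentence “recommends” it and it ends up with the lowest score.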

Luhn.

  • Based on the frequency of the most significant words: sentences that contain more of them score higher.
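
A frequency-based scorer along these lines fits in a few lines. Note that this is a simplification: Luhn’s original method also looks at how closely the significant words cluster within a sentence, which is left out here. The stop-word list, the function name `summarize`, and the sample text are hypothetical.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; real systems use a much longer one
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
              "that", "was", "for", "on", "with"}


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)

    def score(sentence):
        # Average document-wide frequency of the sentence's significant words
        tokens = [w for w in re.findall(r"\w+", sentence.lower())
                  if w not in STOP_WORDS]
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)


text = (
    "Summarization shortens text. "
    "Good summarization keeps the key information of the text. "
    "My cat sleeps all day."
)
summary = summarize(text, n_sentences=1)
print(summary)  # Summarization shortens text.
```

Averaging by sentence length keeps long sentences from winning just because they contain more words.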

LSA (Latent Semantic Analysis).

  • Based on term-frequency techniques combined with singular value decomposition (SVD) to summarize texts.

5. Machine Translation

Machine translation (MT) is the automatic translation of text from one language to another, carried out by fully automated software that converts source content into a target language. Humans may use MT to help them render text and speech into another language.

Here I used the TextBlob package for translation. The input text can be in any language, and the output will be in whichever language the end user specifies.

English to Hindi

The input sentence is English text, and we want it translated to Hindi. And we got it.

The input can also be in any other language. Let’s try one more language, Arabic.

Arabic to Hindi

Let’s cross-check once.

Arabic to English

Now let’s move on to the deployment.

As before, we import all the packages, create a function for each task, and call the matching function whenever the user selects a particular task. You can find the entire code for this here.
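
The “one function per task, called on selection” pattern can be sketched as a dispatch table. This is not the app’s actual code: the two task functions are simplified stand-ins, and the names `TASKS` and `run_task` are hypothetical. In the real app, the selected value would come from the web framework’s input widget.

```python
def tokenize_task(text):
    """Stand-in for the tokens-and-lemma task: split on whitespace."""
    return text.split()


def summarize_task(text):
    """Stand-in for the summarization task: keep the first sentence only."""
    return text.split(". ")[0]


# Map the label shown to the user to the function that handles it
TASKS = {
    "Tokens and Lemma": tokenize_task,
    "Text Summarization": summarize_task,
}


def run_task(selected, text):
    """Call the function that matches the user's selection."""
    if selected not in TASKS:
        raise ValueError(f"Unknown task: {selected}")
    return TASKS[selected](text)


result = run_task("Tokens and Lemma", "NLP and Machine learning go hand in hand.")
print(result)
```

Adding a new task then only means writing one function and registering it in the table.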

Once the deployment is done, we get a nice web app that is ready to serve the business.

Web App

A simple and beautiful web app is created. You can now use Heroku for cloud deployment.

I will try to take away your confusion about the “Confusion Matrix” in my upcoming article.

Please follow me on Medium to receive updates.

Happy Learning…

Follow me on LinkedIn: www.linkedin.com/in/karteek-menda

This is Karteek Menda.

Signing Off

Karteek Menda

Robotics GRAD Student at ASU, Project Engineer in Dynamic Systems and Control Lab at ASU, Ex - Machine Learning Engineer, Machine Learning Blogger.