Natural Language Processing (NLP): A beginner’s guide

Reshma J
IETE SF MEC
Aug 29, 2021

The year was 1950. The scientist Alan Turing described a test: if someone asked a series of questions to a human and a machine simultaneously, and could find no noticeable difference between the answers, then the machine could be said to possess the ability to think. The exchange had to happen over a teleprinter, without the questioner knowing which respondent was which. Turing named it the Imitation Game, now known as the ‘Turing Test’. It was a breakthrough idea. Can machines really be made to ‘think’?

The ability to think is not unique to human beings, but it is most advanced in us. So is the ability to communicate through language. The languages we use are complex and sophisticated, with their own rules and vocabulary; even our closest relatives among species do not have this potential. The parts of the human brain primarily responsible for producing speech and for understanding it are Broca’s area and Wernicke’s area respectively, both located in the left hemisphere.

In 1952, Alan Hodgkin and Andrew Huxley published a mathematical model of how neurons generate and conduct electrical signals, work that later earned them the Nobel Prize in Physiology or Medicine in 1963. Developments like these paved the way for the evolution of computers, Artificial Intelligence and Natural Language Processing (NLP).

Artificial Intelligence (AI) is based on mimicking human intelligence, while NLP takes it a step further: it tries to enable computer programs to understand human language. This is a tough task, given that roughly 7,000 languages are spoken around the world. Add to that the differences in slang, grammar and dialect. Whoa!

Now let us address the elephant in the room. What is the need, or rather the motivation, to invest resources in this herculean task?

  1. Data, data everywhere- Machines generate data. Humans generate data. It is impossible for humans to handle all of this data effectively on their own.
  2. Unbiased intent- Humans are biased; our childhood, surroundings, culture and experiences shape us. These biases are not necessarily bad, but they do colour our decisions, and bringing in a machine can reduce that effect.
  3. Inclusivity- The aim of technology is to reach everyone, so that all can reap its benefits. Communicating with machines in natural language accommodates even the most novice users.

How does it work?

NLP works in two main phases: data preprocessing and algorithm development.

Data preprocessing

The text data received will contain a lot of detail. Some of it may be necessary and some unnecessary, so it is important to ‘sieve’ the data first. The methods employed are:

Tokenization: The given text is broken down into small chunks known as tokens. Smaller units are easier to work with than larger ones, and the chances of error are lower.

Removal of stop words: Common words that add little meaning, such as ‘the’, are removed.

Lemmatization and stemming: Both reduce a word to a base form. Stemming simply chops off word endings (for example, ‘playing’ becomes ‘play’), while lemmatization uses context and vocabulary to return the word’s dictionary form, or lemma (for example, ‘children’ becomes ‘child’).

Lower casing: All words are converted to lower case; otherwise ‘CASE’ and ‘case’ would be treated as two different words.

Part-of-speech tagging: Each word is tagged with its part of speech, i.e. whether it is a noun, a verb, and so on.
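
Here is a minimal sketch of these preprocessing steps using NLTK (one of the tools listed later). The example sentence is made up, and the exact NLTK resource names can vary slightly between versions.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads; exact resource names may differ across NLTK versions.
for resource in ("punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"):
    nltk.download(resource)

text = "The children were playing cricket in the park."

# Tokenization: break the text into word-level tokens.
tokens = nltk.word_tokenize(text)

# Lower casing: so 'CASE' and 'case' become the same token.
tokens = [t.lower() for t in tokens]

# Removal of stop words: drop very common words such as 'the' and 'in'.
stop_words = set(stopwords.words("english"))
content = [t for t in tokens if t.isalpha() and t not in stop_words]

# Stemming chops endings ('playing' -> 'play'); lemmatization returns the
# dictionary form ('children' -> 'child').
print([PorterStemmer().stem(t) for t in content])
print([WordNetLemmatizer().lemmatize(t) for t in content])

# Part-of-speech tagging: label each remaining token as noun, verb, etc.
print(nltk.pos_tag(content))
```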

Algorithm development

Rule-based system: Traditionally used, these systems are built on rules derived from linguistic structures, and they try to mimic how humans form sentences from those structures.

Machine learning based system: The system learns statistical patterns from a labelled training set and, in effect, builds its own rules, as the sketch below illustrates.
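
As a rough illustration of the machine learning approach, here is a sketch that trains a tiny text classifier with scikit-learn. scikit-learn is not among the tools listed later in this article, and the toy training sentences and labels are invented purely for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A toy labelled training set: the model infers its own "rules" from these examples.
train_texts = [
    "I loved this movie, it was wonderful",
    "What a fantastic and enjoyable film",
    "This was a terrible, boring movie",
    "I hated the film, complete waste of time",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words features plus a Naive Bayes classifier, a classic statistical baseline.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["an enjoyable and wonderful movie"]))  # -> ['positive']
```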

Nowadays, deep learning is used as it is more flexible and attempts to mimic the way a child learns a language.

Techniques and tools for NLP

Many techniques are used for NLP. The two main terms that stand out here are syntax and semantics.

Syntax deals with the grammar of a sentence, while semantics refers to its meaning. For example: ‘Ram plays Guitar.’ This sentence is both syntactically and semantically correct. Meanwhile, ‘Guitar plays Ram’ is syntactically correct but semantically wrong, as it doesn’t make any sense. Each technique can be classified as syntactic or semantic.

Syntactical analysis

Word segmentation and sentence breaking

Splitting text into words by locating whitespace, and splitting it into sentences by looking for the period ‘.’, are also part of syntactic analysis, and both are trickier than they sound, as the sketch below shows.
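
A small sketch, again with NLTK, of why sentence breaking needs more than hunting for periods; the example text is made up.

```python
import nltk
nltk.download("punkt")  # the resource name may differ slightly across NLTK versions

text = "Dr. Smith moved to the U.S. in 2019. He now teaches NLP."

# Naively splitting on '.' wrongly breaks abbreviations like 'Dr.' and 'U.S.'.
print(text.split("."))

# NLTK's trained sentence tokenizer handles such cases far better.
print(nltk.sent_tokenize(text))
# typically: ['Dr. Smith moved to the U.S. in 2019.', 'He now teaches NLP.']
```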

Syntactic analysis also has different levels, such as:

POS tagging or part-of-speech tagging, which classifies words based on their part of speech.

Constituency parsing: Noun phrases, verb phrases and prepositional phrases are some of the constituents of English. Replacing one constituent with another of the same type does not affect the grammaticality of the sentence, but, as the Ram-Guitar example above shows, the result need not be semantically correct (a code sketch follows this list).

Dependency parsing: A more advanced approach that extends well to other languages. It concentrates on the subject-predicate structure: most sentences have a subject and a predicate (verb + object), so a sentence conveys who the subject is, what it does (the verb), and to whom or what it is done (the object).
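
Below is a rough sketch of both parsing styles: a toy constituency parse using NLTK’s chart parser with a hand-written grammar, and a dependency parse using the spaCy library. spaCy is not in the tools list below, and its small English model has to be downloaded separately.

```python
import nltk
import spacy

# Constituency parsing with a tiny hand-written grammar (illustrative only).
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    VP -> V NP
    NP -> 'Ram' | 'Guitar'
    V  -> 'plays'
""")
parser = nltk.ChartParser(grammar)
for sentence in (["Ram", "plays", "Guitar"], ["Guitar", "plays", "Ram"]):
    for tree in parser.parse(sentence):
        # Both word orders parse, because the grammar checks structure, not meaning.
        print(tree)  # e.g. (S (NP Ram) (VP (V plays) (NP Guitar)))

# Dependency parsing with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
for token in nlp("Ram plays the guitar."):
    # Each word points to its head with a label, e.g. 'Ram' is the nominal
    # subject (nsubj) of 'plays', and 'guitar' is its object.
    print(f"{token.text:10} {token.dep_:10} -> {token.head.text}")
```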

Semantic analysis

Semantics play a very important role in understanding natural language.

Some semantic analysis techniques include:

1. Named Entity Recognition

A very popular semantic technique, NER scans text and labels named entities such as people, organizations, places and dates. It is used extensively by search engines. (A code sketch illustrating this and some of the techniques below follows the list.)

2. Sentiment Analysis/ Opinion Mining

As the name indicates, sentiment analysis tries to understand the emotion behind a piece of text. It is widely used for product reviews, online course reviews and the like.

3. Natural Language Generation/ Data storytelling

The opposite of natural language understanding, it converts large amounts of structured data into natural language for ease of understanding.

4. Topic Modelling

It is an unsupervised technique used to identify the topics present in a text. Algorithms used for topic modelling include Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA).
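
To make these techniques more concrete, here is a rough sketch of named entity recognition and sentiment analysis with NLTK, and topic modelling with Gensim (one of the tools listed below). The sample sentence, review and documents are invented for illustration, and NLTK resource names can vary by version.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from gensim import corpora, models

# One-time downloads; exact resource names may differ across NLTK versions.
for resource in ("punkt", "averaged_perceptron_tagger",
                 "maxent_ne_chunker", "words", "vader_lexicon"):
    nltk.download(resource)

# Named Entity Recognition: tokenize, tag, then chunk named entities.
sentence = "Alan Turing worked at the University of Manchester."
print(nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence))))
# labels chunks such as PERSON and ORGANIZATION

# Sentiment analysis with NLTK's built-in VADER analyzer.
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("The course was excellent and very well paced."))

# Topic modelling with Latent Dirichlet Allocation (LDA) in Gensim.
docs = [
    ["guitar", "chords", "music", "song"],
    ["song", "album", "music", "band"],
    ["election", "vote", "policy", "government"],
    ["parliament", "policy", "election", "minister"],
]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
print(lda.print_topics())  # two word distributions, roughly 'music' vs 'politics'
```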

Information thus obtained from the text can be used in ML models or used directly.

Tools

Some of the most popular tools include:

  1. Natural Language Toolkit (NLTK) - a Python library
  2. Stanford CoreNLP - Stanford's NLP toolkit
  3. Google Cloud Natural Language API
  4. TextBlob - a Python library for processing textual data
  5. Amazon Comprehend - an NLP service that uses machine learning
  6. Gensim - an open-source Python library for topic modelling and related NLP tasks

Real-world applications of NLP

Some real world applications of NLP include:

  1. E-mail filters: classifying mails as primary, social and promotions, and separating spam from regular mail.
  2. Smart assistants: Apple's Siri and Amazon's Alexa recognize speech patterns and give appropriate responses. To learn more about how this works, check out our article on voice assistants.
  3. Text summarization, in applications like Inshorts.
  4. Text prediction and auto-correction while chatting or writing e-mails.
  5. Analyzing customer feedback; a few apps which use this include Uber and Zomato.
  6. Detecting plagiarism in written work.
  7. Academic and research purposes.

Conclusion

Natural Language Processing is a vast subject, advancing every day. As mentioned before, conquering the vast multitude of languages is a persistent, daunting task. At the same time, we need to acknowledge the extent to which NLP has already simplified our daily lives, which makes it a goal worth pursuing. After all, practice makes a ‘machine’ perfect!

Resources

https://www.analyticssteps.com/blogs/introduction-natural-language-processing-text-cleaning-preprocessing
