What is Chunking in NLP? — Dipansu Tech

Dipansu Tech
5 min read · Jun 9, 2023

Hey guys, are you also tired of searching “What is Chunking in NLP?” all over the internet and still not getting any accurate results?

Do not worry, we are here to solve your tech problems.

What is Chunking in NLP?

So, in this article, we are going to read about chunking in NLP in depth. So, let’s begin this wonderful session.

NLP: Short Note

Although we have covered NLP many times, for new viewers of this website, let’s study this topic once again!

NLP stands for Natural Language Processing. It is a domain of AI (Artificial Intelligence).

It simply means giving computers the ability to process natural human languages such as Hindi, English, German, French, Chinese, etc.

AI-powered conversational chatbots, which can interact with you in different forms, are real-life examples of NLP.

A robot has to follow many steps so that it can mimic human conversations.

Chunking in NLP

Chunking is a process that is based on POS (Part-of-Speech) tagging. It is also known as partial or shallow parsing.

Before understanding chunking, let’s understand chunks. A chunk is a group of neighbouring words that together form one syntactic unit.

Chunks are noun phrases, verb phrases, adjective phrases, etc.

Chunking refers to identifying and then extracting those meaningful chunks from a sentence that is given by the user.

For example: “Hi Mahesh, are you going to buy a new red car?” Here, “a new red car” is a noun phrase chunk.

In most robots, chunking is done early, as a pre-processing step. It helps the robot to identify or extract the main phrases or words from the sentence.

And thus, it becomes easier for the robot to communicate with the user.
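To make the idea concrete, here is a minimal sketch of noun phrase chunking on the example sentence. It is not NLTK code: the POS tags are assigned by hand (an assumption for illustration), and the helper name noun_phrase_chunks is hypothetical. It simply scans for an optional determiner, any adjectives, and one or more nouns.

```python
# Hand-tagged example sentence (the tags are an assumption, not tagger output)
tagged = [
    ("Hi", "UH"), ("Mahesh", "NNP"), (",", ","), ("are", "VBP"),
    ("you", "PRP"), ("going", "VBG"), ("to", "TO"), ("buy", "VB"),
    ("a", "DT"), ("new", "JJ"), ("red", "JJ"), ("car", "NN"), ("?", "."),
]


def noun_phrase_chunks(tagged_tokens):
    """Greedy scan for: optional DT, any number of JJ, one or more NN* tags."""
    chunks = []
    i, n = 0, len(tagged_tokens)
    while i < n:
        j = i
        if tagged_tokens[j][1] == "DT":          # optional determiner
            j += 1
        while j < n and tagged_tokens[j][1] == "JJ":   # adjectives
            j += 1
        k = j
        while k < n and tagged_tokens[k][1].startswith("NN"):  # nouns
            k += 1
        if k > j:                                 # at least one noun was found
            chunks.append(" ".join(word for word, _ in tagged_tokens[i:k]))
            i = k
        else:
            i += 1
    return chunks


np_chunks = noun_phrase_chunks(tagged)
print(np_chunks)
```

With these hand-assigned tags, the sketch groups “Mahesh” and “a new red car” as noun phrase chunks and ignores the remaining words.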

Types of Chunking

Till now we have understood NLP and chunking. Now, let’s move on to the types of Chunking.

There are mainly two types of chunking, which are -:

  • Chunking Up
  • Chunking Down

Chunking Up

Moving from exact details to more abstract and high-level notions is referred to as “chunking up”.

It is also known as abstraction or generalization.

This NLP procedure seeks to locate more general categories, subjects, or themes in a given text.

For example, let’s say you have to read your whole science, English, social science, and Information Technology books in a very short period of time.

It is impossible to do so. So, you chunk the material up and get the most important topics in a well-categorized manner. Thus, you are able to take in a lot of information very efficiently.
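As a rough illustration of chunking up, the sketch below replaces specific terms with a broader category and counts the categories. The lookup table CATEGORY_OF and the function chunk_up are hypothetical names made up for this example; a real system would need a much richer source of categories.

```python
from collections import Counter

# Hypothetical term-to-category table (an assumption for this sketch)
CATEGORY_OF = {
    "photosynthesis": "science",
    "gravity": "science",
    "grammar": "English",
    "poetry": "English",
    "database": "information technology",
}


def chunk_up(terms):
    """Map each specific term to its broader category and count the categories."""
    return Counter(CATEGORY_OF.get(term.lower(), "other") for term in terms)


topics = chunk_up(["Gravity", "Poetry", "Photosynthesis", "Database"])
print(topics)
```

The detailed terms disappear and only the high-level subjects remain, which is the generalization step described above.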

Chunking Down

It means breaking down a longer text or piece of information into shorter, more focused chunks.

The goal of this method is to extract specific, granular information from a text.

For example, we can also chunk down the books from the example above.

By chunking down, we get detailed information, facts, and specific elements. Thus, chunking down can also be used to consume a lot of data in a short time.
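A minimal sketch of chunking down, assuming a simple rule that sentences end with “.”, “!” or “?”: the text is first split into sentences and then each sentence into words. The function name chunk_down is hypothetical, and the regular expressions are deliberately simplistic.

```python
import re


def chunk_down(text):
    """Split text into sentences, then split each sentence into words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [re.findall(r"\w+", sentence) for sentence in sentences]


pieces = chunk_down("Plants make food. They need light and water.")
print(pieces)
```

Each inner list is one sentence broken into its words, so a long text becomes a set of small pieces that can be inspected one by one.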

Applications of Chunking

So, in this section, we are going to discuss some of the applications of chunking in NLP, which are -:

  • Named Entity Recognition (NER)
  • Information Extraction
  • Syntax Parsing
  • Sentiment Analysis

Named Entity Recognition

NER is one of the main applications of chunking in NLP.

As a pre-processing step for NER, chunking can be used to identify and group together named entities such as names of people, businesses, locations, etc.
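The sketch below shows one simple way this grouping could work: consecutive proper nouns (tags starting with NNP) are joined into one candidate entity. The sentence and its tags are hand-written assumptions, and proper_noun_chunks is a hypothetical helper. It only finds candidates and does not decide whether an entity is a person, a business, or a location.

```python
from itertools import groupby

# Hand-tagged sentence (the tags are an assumption, not tagger output)
tagged = [
    ("Mahesh", "NNP"), ("Kumar", "NNP"), ("visited", "VBD"),
    ("New", "NNP"), ("Delhi", "NNP"), ("in", "IN"), ("June", "NNP"),
]


def proper_noun_chunks(tagged_tokens):
    """Join runs of consecutive proper nouns into candidate named entities."""
    chunks = []
    for is_proper, group in groupby(tagged_tokens, key=lambda wt: wt[1].startswith("NNP")):
        if is_proper:
            chunks.append(" ".join(word for word, _ in group))
    return chunks


candidates = proper_noun_chunks(tagged)
print(candidates)
```

A later NER step could then take each candidate and assign it a type.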

Information Extraction

By detecting and extracting pertinent chunks, such as noun or verb phrases, chunking can help with the extraction of specific information from text.

To create structured information, additional processing can be applied to these extracted chunks.
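As a sketch of that additional processing, the code below turns a sequence of already extracted chunks into simple subject/relation/object records. The chunk list is written by hand for illustration, and extract_triples is a hypothetical helper that only looks for an NP, VP, NP pattern.

```python
# Chunks as (label, text) pairs; written by hand for this sketch
chunks = [
    ("NP", "the quick brown fox"),
    ("VP", "jumps over"),
    ("NP", "the lazy dog"),
]


def extract_triples(labelled_chunks):
    """Collect a record for every NP, VP, NP sequence of neighbouring chunks."""
    triples = []
    windows = zip(labelled_chunks, labelled_chunks[1:], labelled_chunks[2:])
    for (l1, t1), (l2, t2), (l3, t3) in windows:
        if (l1, l2, l3) == ("NP", "VP", "NP"):
            triples.append({"subject": t1, "relation": t2, "object": t3})
    return triples


records = extract_triples(chunks)
print(records)
```

The result is a list of dictionaries, which is easier to store or query than the raw sentence.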

Syntax Parsing

By dividing words into intelligible chunks according to their grammatical functions, chunking helps in the analysis of the grammatical structure of sentences.

Tasks like dependency parsing, parse tree creation, and syntactic parsing can all benefit from this.

Sentiment Analysis

Sentiment analysis aims to identify and categorize the sentiment expressed in a particular piece of text. Projects that involve sentiment analysis can benefit from chunking, which can be used to extract opinion statements or sentiment-bearing chunks from the text.
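Here is a small sketch of that idea, assuming hand-assigned tags and a tiny made-up sentiment lexicon (LEXICON is hypothetical, not a real resource). It extracts adjective + noun pairs as opinion chunks and looks up a score for the adjective.

```python
# Tiny hypothetical sentiment lexicon (an assumption for this sketch)
LEXICON = {"great": 1, "terrible": -1, "slow": -1}

# Hand-tagged review sentence (the tags are an assumption, not tagger output)
tagged = [
    ("The", "DT"), ("great", "JJ"), ("camera", "NN"), ("has", "VBZ"),
    ("a", "DT"), ("terrible", "JJ"), ("battery", "NN"),
]


def opinion_chunks(tagged_tokens):
    """Return (adjective + noun, score) for each adjective directly before a noun."""
    results = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1 == "JJ" and t2.startswith("NN"):
            results.append((f"{w1} {w2}", LEXICON.get(w1.lower(), 0)))
    return results


opinions = opinion_chunks(tagged)
print(opinions)
```

Because the sentiment is attached to a chunk, we can see which thing each opinion is about, not just the overall mood of the sentence.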

Chunking in Python

For chunking in Python, we can use an NLP library called NLTK (Natural Language Toolkit).

First, we tokenize the sentence and assign a POS tag to each word. Then, we can perform chunking using these tags.

For example, if we wish to pick out verbs, we can write a grammar whose pattern matches the verb tags.

Let’s look at the POS tagging code -:

import nltk
from nltk.tokenize import word_tokenize
text = word_tokenize("And now for something completely different")
print(nltk.pos_tag(text))

The output of the code:

[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

Tags in NLTK

Now, let’s understand some of the tags that NLTK’s default English tagger uses. They come from the Penn Treebank tag set -:

  • CC - Coordinating Conjunction
  • MD - Modal
  • CD - Cardinal Number
  • DT - Determiner
  • VBZ - Verb, 3rd person singular present
  • WP - Wh-pronoun
  • VBD - Verb, past tense
  • RP - Particle
  • RB - Adverb

These are only some of the tags in NLTK; there are many more tags like these.
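Once the tags are known, selecting words by tag is a simple filter. The sketch below reuses the tagged sentence from the earlier output and a small description table; TAG_NAMES covers only a few tags, and words_with_tag is a hypothetical helper name.

```python
# Descriptions for a few Penn Treebank tags (a small subset, not the full tag set)
TAG_NAMES = {
    "CC": "Coordinating conjunction",
    "RB": "Adverb",
    "IN": "Preposition or subordinating conjunction",
    "NN": "Noun, singular or mass",
    "JJ": "Adjective",
    "VBZ": "Verb, 3rd person singular present",
    "VBD": "Verb, past tense",
}

# The tagged sentence from the POS tagging example
tagged = [("And", "CC"), ("now", "RB"), ("for", "IN"),
          ("something", "NN"), ("completely", "RB"), ("different", "JJ")]


def words_with_tag(tagged_tokens, prefix):
    """Return words whose tag starts with the prefix (so 'VB' also matches VBZ, VBD)."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]


adverbs = words_with_tag(tagged, "RB")
verbs = words_with_tag(tagged, "VB")

for word, tag in tagged:
    print(f"{word:12} {tag:4} {TAG_NAMES.get(tag, 'unknown tag')}")
print("Adverbs:", adverbs)
print("Verbs:", verbs)
```

This sentence contains two adverbs and no verbs, so the verb list comes back empty.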

Code for Chunking

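Now let’s write the Python code that extracts the verb and noun chunks from a sentence.

import nltk

# One-time setup: the tokenizer and tagger models must be downloaded first, e.g.
# nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')
# (newer NLTK releases name these 'punkt_tab' and 'averaged_perceptron_tagger_eng').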

# Sample text
text = "The quick brown fox jumps over the lazy dog"

# Tokenize the text
tokens = nltk.word_tokenize(text)

# POS tagging
tagged_tokens = nltk.pos_tag(tokens)

# Verb Chunking
verb_grammar = r"""
VP: {<VB.*><.*>*}
"""
verb_chunk_parser = nltk.RegexpParser(verb_grammar)
verb_result = verb_chunk_parser.parse(tagged_tokens)

print("Verb chunks:")
for subtree in verb_result.subtrees():
    if subtree.label() == 'VP':
        print(subtree)

# Noun Chunking
noun_grammar = r"""
NP: {<DT>?<JJ>*<NN.*>+}
"""
noun_chunk_parser = nltk.RegexpParser(noun_grammar)
noun_result = noun_chunk_parser.parse(tagged_tokens)

print("\nNoun chunks:")
for subtree in noun_result.subtrees():
    if subtree.label() == 'NP':
        print(subtree)

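The output should look similar to the following. The exact tags depend on the tagger model, so some words (for example “brown”) may receive a different tag than shown. Note that the verb grammar matches greedily, so it produces a single VP chunk that runs from the verb to the end of the sentence:

Verb chunks:
(VP jumps/VBZ over/IN the/DT lazy/JJ dog/NN)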

Noun chunks:
(NP The/DT quick/JJ brown/JJ fox/NN)
(NP the/DT lazy/JJ dog/NN)

So, this is all the code you need to perform verb and noun chunking in Python. The code is taken from ChatGPT, YouTube, and different websites.
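To see why the noun grammar <DT>?<JJ>*<NN.*>+ picks out those phrases, it helps to treat the tag sequence as one string and run an ordinary regular expression over it. The sketch below does this with Python’s re module only. It is an illustration of the idea with hand-assigned tags, not NLTK’s actual implementation, and regex_chunk is a hypothetical helper.

```python
import re

# Hand-tagged sentence (the tags are an assumption, not tagger output)
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

# Same pattern as the NP grammar, written as a plain regular expression
NP_PATTERN = r"(?:<DT>)?(?:<JJ>)*(?:<NN[^>]*>)+"


def regex_chunk(tagged_tokens, tag_pattern):
    """Match a regex against the tag string and map matches back to words."""
    # Encode the tags as one string, e.g. "<DT><JJ><JJ><NN>..."
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    # Character offset at which each token's tag starts in tag_string
    starts, pos = [], 0
    for _, tag in tagged_tokens:
        starts.append(pos)
        pos += len(tag) + 2
    chunks = []
    for match in re.finditer(tag_pattern, tag_string):
        first = starts.index(match.start())
        last = sum(1 for s in starts if s < match.end())  # exclusive token index
        chunks.append(" ".join(word for word, _ in tagged_tokens[first:last]))
    return chunks


noun_chunks = regex_chunk(tagged, NP_PATTERN)
print(noun_chunks)
```

Each match covers an optional determiner, any adjectives, and at least one noun, which is why “The quick brown fox” and “the lazy dog” come out as chunks while “jumps over” does not.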

FAQ

Q1. What do you understand by Chunking in NLP?

A1. Chunking means identifying and extracting the important words and phrases from a sentence provided by the user.

Q2. How many types of chunking are there? Name those types.

A2. There are mainly two types of chunking, which are -:
1. Chunking Up
2. Chunking Down

Q3. Full form of NLTK?

A3. NLTK stands for Natural Language Toolkit.

Q4. Tell the full form of any two tags in NLTK.

A4. Two tags, with their full forms, are -:
1. CC — Coordinating Conjunction
2. RB — Adverb

Conclusion for “What is Chunking in NLP”

So, guys, with this we have completed our article, which was about “Chunking in NLP”.

I hope that you understood the article completely. If this is so, do not forget to like, share and follow us on different social media platforms.

At last, I want to say Thanks for Reading this Article!


Dipansu Tech

Hello, all my friends, I am Dipansu Joshi, and I am here to present all the Tech and Coding Knowledge to you at 0 cost.