Learning POS Tagging & Chunking in NLP

Jocelyn D'Souza
Apr 4, 2018 · 3 min read

This post will explain you on the Part of Speech (POS) tagging and chunking process in NLP using NLTK. In my previous post, I took you through the Bag-of-Words approach. Bag-of-words fails to capture the structure of the sentences and sometimes give its appropriate meaning.

POS and Chunking helps us overcome this weakness.

What is Part of Speech?

The part of speech explains how a word is used in a sentence. There are eight main parts of speech - nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions and interjections.

  • Noun (N)- Daniel, London, table, dog, teacher, pen, city, happiness, hope
  • Verb (V)- go, speak, run, eat, play, live, walk, have, like, are, is
  • Adjective(ADJ)- big, happy, green, young, fun, crazy, three
  • Adverb(ADV)- slowly, quietly, very, always, never, too, well, tomorrow
  • Preposition (P)- at, on, in, from, with, near, between, about, under
  • Conjunction (CON)- and, or, but, because, so, yet, unless, since, if
  • Pronoun(PRO)- I, you, we, they, he, she, it, me, us, them, him, her, this
  • Interjection (INT)- Ouch! Wow! Great! Help! Oh! Hey! Hi!

Most POS are divided into sub-classes. POS Tagging simply means labeling words with their appropriate Part-Of-Speech.

How does POS Tagging works?

POS tagging is a supervised learning solution that uses features like the previous word, next word, is first letter capitalized etc. NLTK has a function to get pos tags and it works after tokenization process.

The most popular tag set is Penn Treebank tagset. Most of the already trained taggers for English are trained on this tag set. To view the complete list, follow this link.

What is Chunking?

Chunking is a process of extracting phrases from unstructured text. Instead of just simple tokens which may not represent the actual meaning of the text, its advisable to use phrases such as “South Africa” as a single word instead of ‘South’ and ‘Africa’ separate words.

Chunking works on top of POS tagging, it uses pos-tags as input and provides chunks as output. Similar to POS tags, there are a standard set of Chunk tags like Noun Phrase(NP), Verb Phrase (VP), etc. Chunking is very important when you want to extract information from text such as Locations, Person Names etc. In NLP called Named Entity Extraction.

There are a lot of libraries which gives phrases out-of-box such as Spacy or TextBlob. NLTK just provides a mechanism using regular expressions to generate chunks.

Let’s dive deeper..

We will consider Noun Phrase Chunking and we search for chunks corresponding to an individual noun phrase. In order to create NP chunk, we define the chunk grammar using POS tags. We will define this using a single regular expression rule.

The rule states that whenever the chunk finds an optional determiner (DT) followed by any number of adjectives (JJ) and then a noun (NN) then the Noun Phrase(NP) chunk should be formed.

I hope you have got a gist of POS tagging and chunking in NLP. I have guided you through the basic idea of these concepts. There is much more depth to these concepts which is interesting and fun. To learn more: Part of Speech Tagging with NLTK Chunking with NLTK

Thanks for reading! ❤

Follow for more updates!



Jocelyn D'Souza

Written by

Data Scientist | Machine Learning | Artificial Intelligence