NLP Guide: Identifying Part of Speech Tags using Conditional Random Fields

Aiswarya Ramachandran
Analytics Vidhya
Published in
6 min readOct 5, 2018

--

In the world of Natural Language Processing (NLP), the most basic models are based on Bag of Words. But such models fail to capture the syntactic relations between words.

For example, suppose we build a sentiment analyser based on only Bag of Words. Such a model will not be able to capture the difference between “I like you”, where “like” is a verb with a positive sentiment, and “I am like you”, where “like” is a preposition with a neutral sentiment.

So this leaves us with a question — how do we improve on this Bag of Words technique?

Part of Speech (hereby referred to as POS) Tags are useful for building parse trees, which are used in building NERs (most named entities are Nouns) and extracting relations between words. POS Tagging is also essential for building lemmatizers which are used to reduce a word to its root form.

POS tagging is the process of marking up a word in a corpus to a corresponding part of a speech tag, based on its context and definition. This task is not straightforward, as a particular word may have a different part of speech based on the context in which the word is used.

For example: In the sentence “Give me your answer”, answer is a Noun, but in the sentence “Answer the question”, answer is a…

--

--