Parts of Speech (POS) Tagging

Matthew Kramer · Published in CodeX · 3 min read · Aug 31, 2021

What is it?

Parts of speech tagging is the process of tagging each word in a sentence with what part of speech that word is (e.g. Noun, Verb, Adjective, etc). A computer program that does this takes as input a field of text, parses the text into a list of words, and then returns a list of the same size where each element is a tuple containing the word and the part of speech that word is.
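The input/output shape described above can be sketched as follows. This is a minimal illustration, not a real tagger: the lookup table and tag names here are toy placeholders, where a real implementation would use a trained model.

```python
from typing import List, Tuple

def pos_tag(text: str) -> List[Tuple[str, str]]:
    # Parse the text into a list of words, then return a list of the
    # same size of (word, tag) tuples. The lexicon here is a toy stand-in
    # for a trained model.
    lexicon = {"i": "PRON", "squeeze": "VERB", "the": "DET", "lemon": "NOUN"}
    words = text.lower().split()
    return [(w, lexicon.get(w, "NOUN")) for w in words]

print(pos_tag("I squeeze the lemon"))
# [('i', 'PRON'), ('squeeze', 'VERB'), ('the', 'DET'), ('lemon', 'NOUN')]
```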

Why is it useful?

Parts of speech tagging allows both humans and computers to distinguish between different usages of the same word and gives more context about a word. For example, the word “squeeze” is a verb in the sentence “I squeeze the lemon” but a noun in the sentence “I am in a bit of a squeeze”.

How is it done?

All implementations of parts of speech tagging require a labelled corpus to train on. This corpus contains many sentences, where each word in every sentence is already tagged with a part of speech.

Naive Implementation

The naive implementation assigns a part of speech to a word based on what was the most common part of speech for that word in the training corpus. Based on a dataset collected from the Wall Street Journal, this naive approach can have an 89% accuracy. This suggests that most words are unambiguous, i.e. most words have only one part of speech in any usage. An example would be the word “beautiful”, which is always an adjective. In fact, only 14% of the words in the WSJ corpus are ambiguous, meaning they can have two or more parts of speech depending on their usage in a sentence.
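The counting step above can be sketched in a few lines. This is a toy example: the miniature corpus and tag names are made up for illustration, not drawn from the WSJ data.

```python
from collections import Counter, defaultdict

# Toy labelled corpus: each sentence is a list of (word, tag) pairs.
corpus = [
    [("the", "DET"), ("walk", "NOUN"), ("was", "VERB"), ("long", "ADJ")],
    [("i", "PRON"), ("walk", "VERB"), ("home", "NOUN")],
    [("i", "PRON"), ("walk", "VERB"), ("fast", "ADV")],
]

# Count how often each word appears with each tag in the training corpus.
tag_counts = defaultdict(Counter)
for sentence in corpus:
    for word, tag in sentence:
        tag_counts[word][tag] += 1

def naive_tag(words, default="NOUN"):
    # Assign each word its single most common tag from the corpus;
    # fall back to a default tag for words never seen in training.
    return [(w, tag_counts[w].most_common(1)[0][0] if w in tag_counts else default)
            for w in words]

print(naive_tag(["i", "walk", "home"]))
# "walk" is tagged VERB: it occurred as a verb twice and a noun once.
```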

The naive implementation can serve as a baseline to benchmark more sophisticated approaches.

Probabilistic Models

Hidden Markov Models + Dynamic Programming

To get better results than the naive approach, Hidden Markov Models look at the word immediately before a word that we wish to tag. Going back to our previous example of the sentences “I squeeze the lemon” and “I am in a bit of a squeeze”, we can perhaps deduce that ‘squeeze’ appearing after the word ‘I’ is more likely to be a verb, and ‘squeeze’ appearing after the word ‘a’ is more likely to be a noun. The implementation for this analyzes the training corpus, counting how many times each part of speech occurs after another — for example, how many times a noun comes after a verb. It then counts, for each word, the parts of speech it is labelled as in the training corpus — for example, ‘squeeze’ occurred as a noun 5 times and a verb 20 times. Once we have these probabilities calculated, we can classify new input sentences, using dynamic programming to look for the most probable chain of tags for the sentence. This approach yields about 95% accuracy on the WSJ dataset.
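The counting and dynamic programming steps above can be sketched as follows. This is a compact toy version: the two-sentence corpus, the tag set, and the add-one smoothing choice are all illustrative assumptions, and a real implementation would work in log space over a large corpus.

```python
from collections import Counter, defaultdict

# Toy labelled corpus reusing the article's two example sentences.
corpus = [
    [("i", "PRON"), ("squeeze", "VERB"), ("the", "DET"), ("lemon", "NOUN")],
    [("i", "PRON"), ("am", "VERB"), ("in", "ADP"), ("a", "DET"),
     ("bit", "NOUN"), ("of", "ADP"), ("a", "DET"), ("squeeze", "NOUN")],
]

# Count tag-after-tag transitions and tag-emits-word occurrences.
transitions = defaultdict(Counter)  # transitions[prev_tag][tag]
emissions = defaultdict(Counter)    # emissions[tag][word]
for sentence in corpus:
    prev = "<S>"  # sentence-start marker
    for word, tag in sentence:
        transitions[prev][tag] += 1
        emissions[tag][word] += 1
        prev = tag

tags = list(emissions)

def prob(counter, key):
    # Relative frequency with add-one smoothing so unseen events aren't zero.
    total = sum(counter.values())
    return (counter[key] + 1) / (total + len(tags) + 1)

def viterbi(words):
    # Dynamic programming: best[tag] = (score, best tag path ending in tag).
    best = {t: (prob(transitions["<S>"], t) * prob(emissions[t], words[0]), [t])
            for t in tags}
    for word in words[1:]:
        new = {}
        for t in tags:
            score, path = max(
                (best[p][0] * prob(transitions[p], t) * prob(emissions[t], word),
                 best[p][1]) for p in tags)
            new[t] = (score, path + [t])
        best = new
    return max(best.values())[1]

print(viterbi("i squeeze the lemon".split()))
# ['PRON', 'VERB', 'DET', 'NOUN'] — 'squeeze' after 'i' comes out a verb.
```

Note how the same word gets different tags depending on context: after ‘a’, the transition counts pull ‘squeeze’ toward noun instead.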

Rule-Based Models

Something a bit more sophisticated than the naive implementation can specifically look at word endings and other characteristics of the word to determine its tag. For example, words ending in ‘ed’ are usually verbs.
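A suffix heuristic like this can be sketched as a small ordered rule list. The specific suffixes and tags below are illustrative guesses, not a complete rule set; in practice such rules are often used as a fallback for words unseen in the training corpus.

```python
# Hypothetical suffix rules, checked in order; first match wins.
SUFFIX_RULES = [
    ("ed", "VERB"),   # walked, played
    ("ing", "VERB"),  # walking
    ("ly", "ADV"),    # quickly
    ("ous", "ADJ"),   # famous
]

def guess_tag(word, default="NOUN"):
    # Guess a tag from the word's ending, falling back to a default.
    for suffix, tag in SUFFIX_RULES:
        if word.endswith(suffix):
            return tag
    return default

print([guess_tag(w) for w in ["walked", "quickly", "famous", "lemon"]])
# ['VERB', 'ADV', 'ADJ', 'NOUN']
```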

Deep Learning Models

Recurrent Neural Networks have been reported to achieve better results than my Hidden Markov implementation on the WSJ dataset, reaching an accuracy of 97.64%.

An explanation for this is that neural networks are able to look at other words in the sentence, not just the previous word, to determine a word's part of speech. Neural networks can also take each character of a word as input.

Useful Resources

https://medium.com/analytics-vidhya/pos-tagging-using-conditional-random-fields-92077e5eaa31

https://en.wikipedia.org/wiki/Part-of-speech_tagging

https://www.aclweb.org/anthology/L18-1446.pdf
