Natural Language Processing in Python: Exploring Word Frequencies with NLTK

Sigli Mumuni
6 min read · Dec 20, 2021

A step-by-step guide to visualizing word occurrences in a text.

Image by Author (Word cloud generated from the Wikipedia NLP article)

Natural Language Processing (NLP) is broadly defined as the manipulation of human language by software. It has its roots in linguistics but has evolved to encompass computer science and artificial intelligence, with NLP research largely devoted to programming computers to understand and process large amounts of natural language data, including speech and text. Today, NLP has wide-ranging applications, including language translation, sentiment analysis, chatbots, and voice assistants.

In this first tutorial of the series, we will explore one of the most basic tasks in NLP: word frequency analysis. Although the subject runs deep, we will stick to a basic implementation using the Natural Language Toolkit (NLTK), a popular Python NLP library. The text we will be analyzing is The Great Gatsby, widely regarded as one of the greatest novels ever written.
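To make the idea concrete before NLTK enters the picture, here is a minimal sketch of word frequency analysis using only the Python standard library. The function name `word_frequencies` and the sample sentence are made up for illustration, and the regex-based tokenization is a simplifying assumption, not the NLTK approach this tutorial builds toward.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercased word occurs in `text`."""
    # Simplifying assumption: a "word" is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample sentence, used only to illustrate the counting step.
sample = "The green light. The light at the end of the dock."
freqs = word_frequencies(sample)

# Print the two most frequent words with their counts.
print(freqs.most_common(2))
```

Counting is the easy part. Most of the work in a real analysis goes into deciding what counts as a word and which words to ignore.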

Importing the relevant libraries

To get started, we’ll first need to import all the relevant libraries. If you haven’t installed these already, feel free to do so using the pip install command.
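If you are unsure which libraries are already installed, a small check like the one below can help. This is only a sketch: the package list (`nltk`, `matplotlib`, `wordcloud`) is an assumption based on the topic of this tutorial, and `missing_packages` is a hypothetical helper name, so adjust both to match what you actually import.

```python
import importlib.util

# Assumed package list for illustration; edit it to match your own imports.
REQUIRED = ["nltk", "matplotlib", "wordcloud"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    # Suggest a single pip command covering everything that is missing.
    print("Install with: pip install " + " ".join(missing))
else:
    print("All libraries are available.")
```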
