Advanced Visualisations for Text Data Analysis
Explore the n-gram word cloud, chord diagram, and packed bubble chart, and their implementation in Python
This article surveys three slightly more advanced graphics for text data visualization in Python. More precisely:
- N-gram word cloud: to display the frequency of higher-order n-grams
- Chord diagram: to show connections between several entities and their strength
- Packed bubble chart: visually engaging display of word frequencies
To illustrate their applications and Python implementation, I use the classic IMDb 50K Movie Reviews dataset (data license is here). A subset of the data was already pre-processed and cleaned of numbers, stopwords, and special characters.
#1: N-gram word cloud
The standard word cloud from Python’s wordcloud library displays unigrams (single words such as “cat”, “table”, or “flower”). We will explore a slightly more advanced version of the graph, which plots the frequency of bigrams and trigrams (i.e., sequences of two and three consecutive words).
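To make the idea of bigram and trigram frequencies concrete, here is a minimal sketch that counts n-grams with the standard library only. The helper name `ngram_frequencies` and the two toy reviews are illustrative assumptions, not part of the IMDb data or of any library API; the sketch also assumes the text is already cleaned and whitespace-tokenizable.

```python
from collections import Counter


def ngram_frequencies(texts, n=2):
    """Count n-grams (n consecutive tokens) across an iterable of cleaned strings.

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Zipping n shifted views of the token list yields consecutive n-tuples.
        for gram in zip(*(tokens[i:] for i in range(n))):
            counts[" ".join(gram)] += 1
    return counts


# Toy stand-in for the pre-processed reviews (made-up examples).
reviews = [
    "great movie great acting",
    "great movie terrible ending",
]

bigrams = ngram_frequencies(reviews, n=2)
trigrams = ngram_frequencies(reviews, n=3)

print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

A frequency mapping like this can then be passed to the wordcloud library's `WordCloud.generate_from_frequencies` method, which draws each key (here, a whole bigram or trigram) sized by its count.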