TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Advanced Visualisations for Text Data Analysis

5 min read · May 15, 2022

Image 1. Packed bubble chart, Image by author

This article surveys three slightly more advanced graphics for text data visualization in Python:

  • N-gram word cloud: to display the frequency of higher-order n-grams
  • Chord diagram: to show connections between several entities and their strength
  • Packed bubble chart: visually engaging display of word frequencies

To illustrate their applications and the Python code behind them, I use the classic IMDb 50K Movie Reviews dataset (data license is here). A subset of the data has already been pre-processed and cleaned of numbers, stopwords, and special characters.

#1: N-gram word cloud

The standard word cloud from Python’s wordcloud library displays unigrams (single words such as “cat”, “table”, or “flower”). We will explore a slightly more advanced version of the graph, which plots the frequency of bigrams and trigrams (i.e., sequences of two and three consecutive words).
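As a minimal sketch of the idea, and not necessarily the code used for the figures in this article, bigram and trigram frequencies can be counted with the standard library and then passed to wordcloud's `generate_from_frequencies`. The helper names `ngram_frequencies` and `plot_ngram_cloud` and the three toy reviews are assumptions made for illustration; the toy reviews stand in for the pre-processed IMDb subset.

```python
from collections import Counter


def ngram_frequencies(texts, n=2):
    """Count n-grams (n consecutive words) across an iterable of cleaned texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Zipping n staggered views of the token list yields consecutive n-grams;
        # texts shorter than n tokens contribute nothing.
        for gram in zip(*(tokens[i:] for i in range(n))):
            counts[" ".join(gram)] += 1
    return counts


# Hypothetical toy reviews standing in for the pre-processed IMDb subset
reviews = [
    "great movie great acting",
    "great movie poor script",
    "poor script great acting",
]

bigram_freq = ngram_frequencies(reviews, n=2)
trigram_freq = ngram_frequencies(reviews, n=3)


def plot_ngram_cloud(frequencies):
    """Draw a word cloud whose 'words' are whole n-grams (hypothetical helper)."""
    # Imported here so the counting above runs without the plotting libraries.
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(width=800, height=400, background_color="white")
    # generate_from_frequencies takes a mapping of string -> frequency,
    # so multi-word keys such as "great movie" are drawn as single items.
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```

Calling `plot_ngram_cloud(bigram_freq)` would then render the bigram cloud, and `plot_ngram_cloud(trigram_freq)` the trigram one. Counting the n-grams first and passing the frequencies in keeps each n-gram together as a single item in the cloud.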


Published in TDS Archive

Written by Petr Korab

Python engineer / NLP / data viz. Text Mining Stories founder, textminingstories.com
