TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Visualization Module in Arabica Speeds Up Text Data Exploration

Arabica now offers unigram, bigram, and trigram word clouds, heatmaps, and line charts to further accelerate time-series text data analysis

Petr Korab
Published in TDS Archive
6 min read · Jan 9, 2023

Figure 1. Bigram word cloud, image by author.

Introduction

Arabica is a Python library for exploratory text data analysis that treats text as a time series. It reflects the empirical reality that many text datasets are collected as repeated observations over time. Examples of time-series text data include newspaper headlines, research article abstracts and metadata, product reviews, and social network communication. Arabica simplifies exploratory data analysis (EDA) of these datasets with the following methods:

  • arabica_freq: descriptive and time-series n-gram analysis, for n-gram-based EDA of text datasets
  • cappuccino: visual exploration of the data.
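
To make the idea of time-series n-gram analysis concrete, here is a minimal standard-library sketch that counts the most frequent n-grams per time period. It is not Arabica's implementation and does not call its API; the function names (ngrams, top_ngrams_by_period), their parameters, and the toy headlines are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams_by_period(records, n=2, period_format="%Y-%m", max_words=2):
    """Count n-grams per time period and keep the most frequent ones.

    records: iterable of (date_string, text) pairs, dates as YYYY-MM-DD.
    Returns a dict mapping each period label to a list of (ngram, count).
    """
    counts = defaultdict(Counter)
    for date_string, text in records:
        # Bucket each record into a period (monthly by default).
        period = datetime.strptime(date_string, "%Y-%m-%d").strftime(period_format)
        # Naive tokenization: lowercase and split on whitespace.
        tokens = text.lower().split()
        counts[period].update(ngrams(tokens, n))
    return {
        period: counter.most_common(max_words)
        for period, counter in sorted(counts.items())
    }


# Hypothetical toy data: (date, headline) pairs.
headlines = [
    ("2023-01-03", "central bank raises interest rates"),
    ("2023-01-17", "interest rates weigh on housing market"),
    ("2023-02-02", "housing market cools as interest rates climb"),
]

result = top_ngrams_by_period(headlines, n=2, max_words=1)
print(result["2023-01"])  # [('interest rates', 2)]
```

A per-period frequency table like this is the kind of input that a word cloud, heatmap, or line chart of n-grams can be drawn from. A real analysis would also need stop-word removal and punctuation cleaning, which this sketch leaves out.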

This article introduces Cappuccino, Arabica’s visualization module for exploratory analysis of time-series text data. For a general introduction to Arabica, read the documentation and the tutorial.

EDIT Jan 2023: Arabica has been updated. Check the documentation for the full list of parameters.

Written by Petr Korab

Python engineer / NLP / data viz. Founder of Text Mining Stories (textminingstories.com)
