Visualization Module in Arabica Speeds Up Text Data Exploration
Arabica now offers unigram, bigram, and trigram word clouds, heatmaps, and line charts to further accelerate time-series text data analysis
Introduction
Arabica is a Python library for exploratory analysis of text data from a time-series perspective. It reflects the empirical reality that many text datasets are collected as repeated observations over time. Time-series text data include newspaper article headlines, research article abstracts and metadata, product reviews, social network communication, and many others. Arabica simplifies exploratory data analysis (EDA) of these datasets by providing the following methods:
- arabica_freq: descriptive n-gram analysis and time-series n-gram analysis, for n-gram-based EDA of text datasets.
- cappuccino: visual exploration of the data.
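To make the idea of time-series n-gram analysis concrete, here is a minimal sketch in plain Python. It is not Arabica's implementation and does not use its API: the helper names `ngrams` and `top_ngrams_by_month`, the sample records, and the monthly aggregation are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def ngrams(text, n):
    """Return word n-grams from a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams_by_month(records, n=2, max_words=2):
    """Count n-grams per calendar month and keep the most frequent ones.

    `records` is an iterable of (date_string, text) pairs, with dates in
    YYYY-MM-DD format (a hypothetical input layout for this sketch).
    """
    counts = defaultdict(Counter)
    for date_string, text in records:
        month = datetime.strptime(date_string, "%Y-%m-%d").strftime("%Y-%m")
        counts[month].update(ngrams(text, n))
    return {
        month: counter.most_common(max_words)
        for month, counter in sorted(counts.items())
    }


# Made-up sample data: three short reviews across two months.
records = [
    ("2013-01-05", "great coffee shop"),
    ("2013-01-20", "great coffee beans"),
    ("2013-02-11", "slow service today"),
]

result = top_ngrams_by_month(records, n=2, max_words=1)
print(result["2013-01"])
```

A table like this, with one row of top n-grams per period, is the kind of input that word clouds, heatmaps, and line charts can then display.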
This article introduces Cappuccino, Arabica’s visualization module for exploratory analysis of time-series text data. For a general introduction to Arabica, see the documentation and the tutorial.
EDIT Jan 2023: Arabica has been updated. Check the documentation for the full list of parameters.