Sentiment Analysis and Structural Breaks in Time-Series Text Data
Arabica now offers a structural break and sentiment analysis module to enrich time-series text mining
Introduction
Text data contains a wealth of qualitative information that can be quantified with various methods, including sentiment analysis. Sentiment models identify, extract, and quantify emotions in text and are widely used in business and academic research. Because text is often recorded over time, text datasets can display structural breaks: points where the quantified information shifts abruptly, for many possible reasons.
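The two ideas above, scoring sentiment per time period and then looking for a point where that score shifts, can be sketched in plain Python. This is not Arabica's implementation. The tiny word lexicon, the function names `sentiment`, `period_sentiment` and `single_break`, and the review data are all hypothetical. The break search is a simple least-squares single change-point method, used here only to illustrate what a structural break in a sentiment series looks like.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon; real analyses would use a trained model
# or an established sentiment lexicon instead.
LEXICON = {"great": 1.0, "love": 1.0, "good": 0.5,
           "bad": -0.5, "awful": -1.0, "hate": -1.0}

def sentiment(text: str) -> float:
    """Mean lexicon score of the words found in text (0.0 if none match)."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return mean(scores) if scores else 0.0

def period_sentiment(records):
    """Average sentiment per period from (period, text) pairs, sorted by period.

    Periods are 'YYYY-MM' strings, so lexicographic order is chronological.
    """
    by_period = defaultdict(list)
    for period, text in records:
        by_period[period].append(sentiment(text))
    return [(p, mean(v)) for p, v in sorted(by_period.items())]

def single_break(series):
    """Index k that best splits series into two constant-mean segments.

    Picks the k minimizing the summed squared deviations of each segment
    from its own mean (a least-squares single change-point search).
    """
    def sse(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs)

    best_k, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Hypothetical monthly customer reviews of a brand.
reviews = [
    ("2023-01", "great coffee love it"),
    ("2023-02", "good service great taste"),
    ("2023-03", "love the new blend"),
    ("2023-04", "bad packaging awful taste"),
    ("2023-05", "hate the new price"),
    ("2023-06", "awful and bad"),
]

scores = period_sentiment(reviews)
k = single_break([s for _, s in scores])
print("Break detected starting in", scores[k][0])  # prints "Break detected starting in 2023-04"
```

On this made-up data the monthly mean sentiment drops from positive to negative in April, and the change-point search places the break there. Real data would call for a proper statistical test of whether a break is significant, which this sketch does not perform.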
As a business analyst, one key task might be measuring changes in customers' perceptions of a particular brand. In a research role, one might be interested in shifts in Vladimir Putin's public statements over time. Arabica is a Python library designed specifically for questions like these. It provides the following methods for exploratory analysis of time-series text datasets:
- arabica_freq for descriptive, n-gram-based exploratory data analysis (EDA)
- cappuccino, a visualization module with heatmaps, word clouds, and line plots of unigram, bigram, and trigram frequencies