Words with Feeling — Sentiment Analysis in the Age of Trump

Andrew Pederson · Published in Impact Policy · Feb 1, 2017

Computational linguistics often depends on distributional semantics, the concept that texts with similar word counts (frequency distributions) are alike, an extension of Leibniz’s Law to language informally called the “bag of words” model. While computers remain far from comprehending human language, much less human feelings, distributional semantics does allow us to compare documents and infer topics from words and word combinations that occur frequently within document corpora.
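To make the bag-of-words idea concrete, here is a minimal sketch (not part of the original analysis) that reduces two toy sentences to word-frequency counts and compares them with cosine similarity; under this model, word order is invisible and only the counts matter:

```python
# Minimal bag-of-words comparison: documents become word-count vectors,
# and similarity is the cosine of the angle between those vectors.
from collections import Counter
import math

def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (no stemming or stop-word removal)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc1 = bag_of_words("the senate passed the budget bill")
doc2 = bag_of_words("the budget bill passed the senate")
print(cosine_similarity(doc1, doc2))  # 1.0: identical counts, different word order
```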

“Rule-based” sentiment analysis (versus supervised machine learning using human-labeled training sets) counts “positive” and “negative” words using human-compiled dictionaries to infer author sentiment from the resulting semantic frequency distributions. Before computers became powerful enough to process large text corpora efficiently, literary theorists used a similar approach to interpret tone — the feelings authors want to evoke in their readers — by examining diction, the conscious choices authors make about what words to use and how to use them.
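As a loose illustration of the rule-based approach (and emphatically not the actual VADER lexicon or scoring rules), a toy scorer might look like the following, with a tiny hand-made dictionary and no handling of negation, intensifiers, or punctuation:

```python
# Toy rule-based sentiment: the lexicon below is illustrative only, and the
# score is a bare sum of word valences with no negation or intensity handling.
TOY_LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def rule_based_score(text):
    """Sum the valence of every lexicon word found in a whitespace-tokenized text."""
    return sum(TOY_LEXICON.get(token, 0) for token in text.lower().split())

print(rule_based_score("i love this great design"))  #  4
print(rule_based_score("what an awful, bad idea"))   # -1 ("awful," keeps its comma and is missed)
```

Even this crude scorer shows why dictionary coverage and tokenization choices matter so much to the results a rule-based method reports.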

Since computer-assisted sentiment analysis can only tell us about the kind of words used and their frequency distributions, it’s a leap in logic to say that these measures unambiguously represent human emotions. However, it is reasonable to assume that authors select descriptive words to represent their opinions, which, to some extent, are influenced by their feelings.

Despite this early technology’s limitations, much has been made of using sentiment to predict everything from riots to suicide to stock market shifts.

So what can sentiment tell us about several thousand articles posted on a popular blogging platform (which shall, per NDA, remain nameless)?

This analysis examined a large sample (n > 8,000) of articles across five topic tags (Design, Technology, Education, Politics, and Life) published between August 2015 and April 2016. Sentiment scores were calculated using C.J. Hutto and Eric Gilbert’s VADER (Valence Aware Dictionary and sEntiment Reasoner) module, since it performs well across a variety of texts and is readily available as open source. Hutto and Gilbert built and validated VADER’s lexicon using:

“7,500 lexical features with validated valence scores that indicated both the sentiment polarity (positive/negative), and the sentiment intensity on a scale from –4 to +4. For example, the word “okay” has a positive valence of 0.9, “good” is 1.9, and “great” is 3.1, whereas “horrible” is –2.5, the frowning emoticon :( is –2.2, and “sucks” and its slang derivative “sux” are both –1.5.”

Since VADER combines negative and positive word scores into a normalized ‘compound’ score between -1 and 1, the score doesn’t measure a document’s absolute “negativity” or “positivity.” The compound score sums the valence of each sentiment-bearing word in a document and normalizes that sum onto the -1 to 1 scale, so an article with a -1 compound score isn’t the most negative, hateful thing ever written. Rather, a score at either end of the scale is a more extreme, more certain estimate of the document’s overall valence. For example, a document with a score of 0.2 is only slightly likely to be positive, while a document with a score of 1 is very likely to be positive.
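For reference, scoring text with VADER takes only a few lines. This sketch assumes the standalone vaderSentiment package (the analyzer is also bundled with NLTK), and the printed numbers aren’t claimed to match any figures in this analysis:

```python
# Score a few sentences with VADER's normalized compound score
# (pip install vaderSentiment); exact values depend on the lexicon version.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
examples = [
    "This design is great and I love using it.",
    "The rollout was a disaster and everyone hated it.",
    "Oh, you are just GREAT at the violin.",  # sarcasm still reads as literal praise
]
for sentence in examples:
    scores = analyzer.polarity_scores(sentence)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
    print(f"{scores['compound']:+.3f}  {sentence}")
```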

Articles tended to be mostly positive across all genres, and sentiment seems to pool at both the negative and positive ends of the scoring scale. Ribeiro et al. observed a possible bias toward detecting positivity across several sentiment analysis methods, and we may be seeing the same effect here. One possible explanation for this bias is that literary devices like sarcasm, which intentionally change the meaning of otherwise conventional words, can easily confuse natural language processing programs.

For example, if we hear a person play the violin badly and remark, “Oh, you are just GREAT at the violin” while looking down the length of our nose, cocking our head to the side, and smirking, our body language and vocal tone convey the sarcasm effectively. Read by a sentiment analysis program, though, the phrase states plainly that the struggling musician is “GREAT” at the violin, and the program outputs a false positive.

Since we can’t confirm or rule out this bias, we can only draw some broad conclusions about written sentiment’s effect on user behavior.

Articles with normalized sentiment scores closer to -1 or 1 receive more reading time. When we split the data into positive and negative groups and applied separate linear models controlling for article length, we found statistically significant relationships between sentiment and total time reading (TTR). For positive articles, intensifying positive sentiment toward 1 increases TTR by approximately 140% ± 7%, while for negative articles, intensifying negative sentiment toward -1 increases TTR by approximately 75% ± 11%.
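A rough sketch of that modeling step is below. The column names (compound, word_count, ttr_seconds), the synthetic data, and the log-linear form are assumptions made for illustration, not the original dataset or model specification:

```python
# Split articles by the sign of the compound score and fit separate OLS models
# of log reading time on sentiment, controlling for article length.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "compound": rng.uniform(-1, 1, n),
    "word_count": rng.integers(300, 3000, n),
})
# Synthetic reading time that grows with article length and sentiment intensity.
df["ttr_seconds"] = np.exp(
    4 + 0.0005 * df["word_count"] + 0.8 * df["compound"].abs() + rng.normal(0, 0.3, n)
)

positive = df[df["compound"] > 0]
negative = df[df["compound"] < 0]

pos_model = smf.ols("np.log(ttr_seconds) ~ compound + word_count", data=positive).fit()
neg_model = smf.ols("np.log(ttr_seconds) ~ compound + word_count", data=negative).fit()
print(pos_model.params["compound"], neg_model.params["compound"])
```

In a log-linear model like this, a sentiment coefficient b corresponds to roughly a (e^b − 1) × 100% change in reading time, which is one way a percentage effect like the 140% figure could be reported; the original post does not state its exact model form.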

We also observed significant seasonal variation within all five genres, especially politics.

While Seasonal Affective Disorder, which could affect as many as 10% of the population, likely plays a role, certain topics may also provoke stronger-than-average negative or positive reactions that would contribute to this seasonal trend. Beyond this, however, do specific topics influence positive and/or negative sentiment intensity among writers?

Unsupervised topic modeling using Latent Dirichlet Allocation (LDA) did produce an oddball summary of the possible topics that authors in the sample corpora may have been covering during this period, but the technique generates a different model on each run, and the results vary widely with the number of topics specified.

10 topic LDA — Articles tagged “Politics” published between October 2015 and January 2016
5 topic LDA — Articles tagged “Politics” published between October 2015 and January 2016
8 topic LDA — Articles tagged “Politics” published between October 2015 and January 2016
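A minimal sketch of this kind of LDA run is below, using scikit-learn and placeholder documents rather than the actual “Politics” articles; rerunning it with 5, 8, or 10 topics (and without a fixed random seed) illustrates how unstable the resulting topics can be:

```python
# Fit LDA topic models with different topic counts and print each topic's top terms.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "the primary debate focused on immigration and the economy",
    "voters in the caucus backed the senator on healthcare reform",
    "the campaign released a statement about the debate schedule",
    "polls show the primary race tightening before the caucus",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)
terms = vectorizer.get_feature_names_out()  # scikit-learn >= 1.0

for n_topics in (5, 8, 10):
    lda = LatentDirichletAllocation(n_components=n_topics)  # no fixed random_state
    lda.fit(counts)
    for i, weights in enumerate(lda.components_):
        top = [terms[j] for j in weights.argsort()[::-1][:5]]
        print(f"{n_topics} topics | topic {i}: {', '.join(top)}")
```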

Although LDA didn’t reveal any concrete information about which topics are more or less strongly associated with negative or positive sentiment, more basic n-gram analyses generated some interesting hints based on the four-word combinations (quadgrams) that appeared most frequently within ‘politics’:

Most commonly occurring quadgrams in articles tagged “Politics,” Oct 25, 2015 to Jan 1, 2016
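For reference, a quadgram count like the one behind the table above needs only standard-library Python; the two sample sentences here stand in for the articles tagged “Politics”:

```python
# Count the most frequent four-word combinations (quadgrams) in a small corpus.
from collections import Counter
import re

def quadgrams(text):
    """Yield consecutive four-token windows from a lowercased, letter-only tokenization."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return zip(tokens, tokens[1:], tokens[2:], tokens[3:])

corpus = [
    "the president of the united states spoke to the president of the senate",
    "supporters of the president of the united states rallied downtown",
]

counts = Counter(q for article in corpus for q in quadgrams(article))
for gram, freq in counts.most_common(5):
    print(freq, " ".join(gram))
```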

Difficult as it is for computers to parse language, computational linguistics, especially in subjective realms like sentiment analysis, still depends more heavily on human interpretation than anything else. As we see from the analyses above, even the latest methods sometimes don’t easily provide sufficient, observable evidence to draw firm conclusions. We can see, however, that it is possible to automatically identify topics articles cover and loosely associate these topics with potentially important predictive features like sentiment.
