NLP: Pre-trained Sentiment Analysis

Mohammed Terry-Jack
May 1, 2019


Let’s evaluate some pre-trained sentiment analysis tools provided by various Python NLP libraries.

NLTK

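import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-off download of the Vader lexicon
sentence = "I do not like you"  # example input; substitute your own text
sid = SentimentIntensityAnalyzer()
sid.polarity_scores(sentence)  # returns a dict of sentiment scores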
NLTK Vader’s predicted sentiment for the sentence and each individual word
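
`polarity_scores` returns a dictionary with the keys `neg`, `neu`, `pos` and `compound`. Here is a minimal sketch of turning the `compound` score into a coarse label. The helper name `label_from_compound` is hypothetical, and the ±0.05 cut-offs are just a commonly used convention for Vader that you may want to tune.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a Vader-style compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hand-written scores for illustration (not real Vader output)
print(label_from_compound(0.6))
print(label_from_compound(-0.3))
print(label_from_compound(0.0))
```

With a real prediction you would pass in `sid.polarity_scores(sentence)['compound']`.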

NLTK’s Vader sentiment analysis tool uses a bag-of-words approach (a lookup table of positive and negative words) with some simple heuristics (e.g. increasing the intensity of the sentiment when an intensifier such as “really” or “so” is present, and decreasing it for dampeners such as “slightly”).

Notice that the overall sentence sentiment is slightly less negative if we change the phrase from “so dumb” to simply “dumb”, even though the individual word sentiments stay the same.
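
To make the idea concrete, here is a toy sketch of a lexicon lookup with an intensifier heuristic. It is not Vader's actual implementation: the `LEXICON` and `BOOSTERS` values are made up, and `sentence_score` is a hypothetical helper.

```python
# Toy values for illustration only, not Vader's real lexicon
LEXICON = {"dumb": -2.0, "like": 1.5, "happy": 2.5}
BOOSTERS = {"so": 0.3, "really": 0.3}

def sentence_score(sentence):
    """Sum word valences; a booster strengthens the sentiment word after it."""
    tokens = sentence.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token, 0.0)  # the per-word lookup never changes
        if valence != 0.0 and i > 0 and tokens[i - 1] in BOOSTERS:
            boost = BOOSTERS[tokens[i - 1]]
            # Push the valence further from zero, keeping its sign
            valence += boost if valence > 0 else -boost
        total += valence
    return total

print(sentence_score("you can be dumb sometimes"))
print(sentence_score("you can be so dumb sometimes"))  # more negative
```

The lexicon entry for “dumb” is identical in both sentences. Only the sentence-level score changes.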

The advantage of this approach is that sentences containing negated positive words (e.g. “not happy”, “not good”) still receive a negative sentence sentiment, thanks to a heuristic that flips the sentiment of words following a negation. Simpler sentiment analysis tools that just average the sentiments of the individual words would miss subtle details like this.

Even though there are only neutral and positive words in the sentence, it is classified as negative overall due to the negation “not”.
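
A small sketch contrasting plain averaging with a negation-aware score. This is a simplification under my own assumptions (toy lexicon values, a full sign flip, and only the directly preceding word is checked), so the real tool is more involved.

```python
# Toy values for illustration only, not Vader's real lexicon
LEXICON = {"happy": 2.5, "good": 2.0, "dumb": -2.0}
NEGATIONS = {"not", "never", "no"}

def average_score(tokens):
    """Naive approach: average the lexicon score of every word."""
    return sum(LEXICON.get(t, 0.0) for t in tokens) / len(tokens)

def negation_aware_score(tokens):
    """Flip the sign of a sentiment word that directly follows a negation."""
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            valence = -valence
        total += valence
    return total / len(tokens)

tokens = "i am not happy".split()
print(average_score(tokens))         # positive: the negation is ignored
print(negation_aware_score(tokens))  # negative: "happy" is flipped
```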

The disadvantage of this approach is that out-of-vocabulary (OOV) words that the tool has not seen before (e.g. typos) will not be classified as positive or negative.

Notice how misspelling the word “dumb” results in a neutral sentiment prediction for that word, tipping the overall sentence sentiment into the positive.
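
The OOV problem follows directly from the lookup. Here is a toy sketch, where the lexicon values and the helper name `score_with_oov` are made up for illustration:

```python
LEXICON = {"dumb": -2.0, "like": 1.5}  # toy values, for illustration only

def score_with_oov(sentence):
    """Return the summed score and the words missing from the lexicon."""
    tokens = sentence.lower().split()
    missing = [t for t in tokens if t not in LEXICON]
    total = sum(LEXICON.get(t, 0.0) for t in tokens)  # missing words count as 0
    return total, missing

print(score_with_oov("so dumb"))  # "dumb" is found and scored
print(score_with_oov("so dum"))   # the typo "dum" is missing and counts as 0
```

Note that a plain lookup cannot tell a genuinely neutral word (“so”) from an unknown one (“dum”). Both simply contribute nothing.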

TextBlob

TextBlob’s sentiment analysis works in a similar way to NLTK’s, using a bag-of-words approach, but it has the advantage of including subjectivity analysis too (how factual or opinionated a piece of text is)!

from textblob import TextBlob
TextBlob(sentence).sentiment
TextBlob’s sentiment and subjectivity analysis
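
The `.sentiment` property returns a named tuple with a `polarity` (between -1 and 1) and a `subjectivity` (between 0 and 1). As a rough sketch of how a lexicon could produce both numbers at once, assume each word carries a polarity and a subjectivity value that are averaged over the sentence. The entries and the helper `toy_sentiment` below are hypothetical and not taken from TextBlob.

```python
from collections import namedtuple

Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# Hypothetical entries: word -> (polarity, subjectivity); values are made up
LEXICON = {
    "dumb": (-0.4, 0.5),
    "great": (0.8, 0.75),
    "wooden": (0.0, 0.0),
}

def toy_sentiment(sentence):
    """Average polarity and subjectivity over the words found in the lexicon."""
    hits = [LEXICON[t] for t in sentence.lower().split() if t in LEXICON]
    if not hits:
        return Sentiment(0.0, 0.0)
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return Sentiment(polarity, subjectivity)

print(toy_sentiment("the table is wooden"))  # factual: low subjectivity
print(toy_sentiment("the table is great"))   # opinionated: high subjectivity
```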

However, its heuristics appear to be more limited than NLTK’s: in this example, the intensifier “so” leaves the sentence’s sentiment unchanged.

Notice how the sentiment for the sentence “…but you can be dumb sometimes” is no more or less negative than before (“…but you can be so dumb sometimes”).

Flair

Flair’s sentiment classifier is based on a character-level LSTM neural network, which takes sequences of letters and words into account when making a prediction.

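!pip3 install flair
import flair
# Load the pre-trained English sentiment classifier
flair_sentiment = flair.models.TextClassifier.load('en-sentiment')
s = flair.data.Sentence(sentence)
flair_sentiment.predict(s)  # attaches the predicted label to the sentence in place
total_sentiment = s.labels  # list of labels, each with a class name and a confidence
total_sentiment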
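
Each predicted label carries a class name and a confidence. To compare Flair with the signed scores from the other tools, one option is to fold the two into a single number. This is a minimal sketch, assuming the class names are `POSITIVE` and `NEGATIVE` and that they are read from the label's `value` and `score` attributes; `signed_score` is a hypothetical helper.

```python
def signed_score(value, score):
    """Convert a (class name, confidence) pair into a score in [-1, 1]."""
    label = value.upper()
    if label == "POSITIVE":
        return score
    if label == "NEGATIVE":
        return -score
    raise ValueError(f"unexpected label: {value!r}")

# Hand-written pairs standing in for (label.value, label.score)
print(signed_score("POSITIVE", 0.93))
print(signed_score("NEGATIVE", 0.81))
```

With a real prediction this would be called as `signed_score(s.labels[0].value, s.labels[0].score)`.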

The network has learnt to take negations into account:

“I do like you” = positive
“I do not like you” = negative

It also takes intensifiers into account:

“…but you can be dumb sometimes” = positive
“…but you can be so dumb sometimes” = less positive

But probably one of its biggest advantages is that it can also predict a sentiment for OOV words that it has never seen before (such as typos).

Flair predicts a negative sentiment for the misspelt word “dum”.
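
For a rough intuition of why character-level input helps with typos, compare how many character trigrams a misspelt word shares with the original. To be clear, this is not how Flair works internally. It is a hand-rolled similarity measure (Jaccard overlap of character n-grams), with helper names of my own choosing.

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def overlap(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)

print(overlap("dum", "dumb"))   # shares "<du" and "dum"
print(overlap("dum", "happy"))  # no shared trigrams
```

A word-level lookup sees “dum” and “dumb” as unrelated keys, whereas at the character level they look alike.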

DeepMoji

This last one isn’t technically a sentiment analysis tool, because it predicts emojis for a sentence. However, I’ve included it here because this type of classification demonstrates an awareness of sentiment (and even emotion) from the model.

!git clone https://github.com/huggingface/torchMoji
import os
os.chdir('torchMoji')
!pip3 install -e .
!python3 scripts/download_weights.py
!python3 examples/text_emojize.py --text "{sentence}"
The predicted emojis reflect the sentiment (and emotion) of the sentence.

The classifier is also neural, but is slightly more sophisticated (a deep bi-LSTM with an attention mechanism).
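
To give a feel for what an attention mechanism on top of recurrent hidden states does, here is a simplified sketch in plain Python. It is not torchMoji's actual code: the hidden states and the weight vector are made up, and `attention_pool` is a hypothetical helper.

```python
import math

def attention_pool(hidden_states, weights):
    """Softmax-weighted average of per-token vectors.

    hidden_states: list of equal-length vectors, one per token
    weights: vector of the same length, standing in for learned parameters
    """
    # One relevance score per token: dot product with the weight vector
    scores = [sum(h * w for h, w in zip(state, weights)) for state in hidden_states]
    # Softmax turns the scores into attention weights that sum to 1
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    attention = [e / norm for e in exps]
    # The weighted sum of the token vectors gives a single sentence vector
    dim = len(hidden_states[0])
    pooled = [sum(a * state[d] for a, state in zip(attention, hidden_states))
              for d in range(dim)]
    return attention, pooled

# Made-up 2-d "hidden states" for three tokens
states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
attention, pooled = attention_pool(states, weights=[1.0, 1.0])
print(attention)  # the second token receives the largest weight
print(pooled)
```

The idea is that tokens carrying more signal contribute more to the sentence representation, rather than every token counting equally.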
