Sentiment Analysis: Introduction with a Simple Technical Example.

Dennis Alexander Morozov
Published in Analytics Vidhya
Jun 26, 2021

What is it

Sentiment analysis is the study of the general mood, feeling, or sentiment expressed in human language. It can also be applied to other fields (perhaps humpback whale mating songs), but we will only look at sentiment analysis of text.

The topic can be a bit spooky to an outsider and may seem to intrude on our humanity. Can we really quantify our emotions? Of course we can, and there are many useful applications for sentiment analysis. It’s not perfect, far from perfect actually, but it can be insightful and fun to work with.

Use Cases

User Experience: A company redesigned its website and wants to measure whether user comments are generally more positive or negative than the previous benchmark.

Product Development: An R&D team asks users to review a prototype and runs sentiment analysis on the review questionnaires to gauge the general reaction to the new product.

Customer Service: Help-desk services can use sentiment analysis along with ticket priority level to rank which issues to address first. For example, one approach is to start with the most upset customers on the highest-priority tickets and work down to the happiest customers on the lowest-priority tickets.
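The help-desk ranking rule can be sketched in a few lines of Python. This is only one possible ordering, under stated assumptions: each ticket carries a numeric priority (a higher number means more urgent) and a compound sentiment score between -1 and 1. The `Ticket` class, its field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """A hypothetical help-desk ticket; the field names are illustrative assumptions."""
    ticket_id: str
    priority: int    # higher number = more urgent (assumed convention)
    compound: float  # sentiment score in [-1, 1], e.g. from a sentiment analyzer


def rank_tickets(tickets: list[Ticket]) -> list[Ticket]:
    """Highest priority first; within one priority level, most upset (lowest score) first."""
    return sorted(tickets, key=lambda t: (-t.priority, t.compound))


tickets = [
    Ticket("A", priority=1, compound=0.8),
    Ticket("B", priority=3, compound=-0.9),
    Ticket("C", priority=3, compound=0.2),
    Ticket("D", priority=2, compound=-0.4),
]
print([t.ticket_id for t in rank_tickets(tickets)])  # ['B', 'C', 'D', 'A']
```

Sorting on a tuple key lets priority dominate and uses sentiment only to break ties within a priority level; blending the two signals into a single weighted score would be another reasonable design.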

Political Polls: A political organization may perform sentiment analysis on a Twitter hashtag to “poll” the general sentiment toward a new law.

Examples

Sentiment analysis generally breaks text down into positive, neutral, and negative scores. The examples below use VADER, the rule-based sentiment analyzer included in NLTK.

Each word, or collection of words, gets all three scores plus a compound score, which combines the individual scores into a single aggregate measure.
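To make the compound score less of a black box: in NLTK’s VADER implementation, the valences of the individual words (after VADER’s rule-based adjustments for negation, intensifiers, capitalization, and punctuation) are summed, and that raw sum is squashed into the range -1 to 1 with x / √(x² + α), where α = 15. The sketch below re-implements only that final normalization step; it is not a replacement for VADER itself.

```python
import math


def normalize(raw_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum of word valences into [-1, 1], mirroring VADER's normalization."""
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clamp defensively; mathematically the ratio already stays strictly inside (-1, 1).
    return max(-1.0, min(1.0, score))


for raw in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(raw, round(normalize(raw), 4))
```

For instance, a raw sum of 1.0 maps to 1 / √16 = 0.25. Because the denominator grows along with the sum, piling on more positive words pushes the score toward 1 without ever exceeding it.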

“I’m gonna make him an offer he can’t refuse.” The Godfather (1972)

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
text = "I'm gonna make him an offer he can't refuse."
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores(text))
# output: {'compound': 0.2235, 'neg': 0.0, 'neu': 0.809, 'pos': 0.191}

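neg: the proportion of the text that reads as negative

neu: the proportion of the text that reads as neutral

pos: the proportion of the text that reads as positive

Together, neg, neu, and pos sum to 1 (up to rounding).

compound is the overall score for the text, normalized to lie between -1 (most negative) and 1 (most positive). In this article, I interpret it as follows:

  • 0.5 to 1: generally positive sentiment
  • -0.5 to 0.5: neutral sentiment
  • -1 to -0.5: negative sentiment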
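These ranges translate directly into a small helper. A minimal sketch: the name `label_compound` is hypothetical, and because the ranges share their endpoints, I assume a score of exactly 0.5 counts as positive and exactly -0.5 as negative.

```python
def label_compound(compound: float) -> str:
    """Map a compound score in [-1, 1] to a label using this article's thresholds."""
    if compound >= 0.5:   # boundary value 0.5 treated as positive (assumption)
        return "positive"
    if compound <= -0.5:  # boundary value -0.5 treated as negative (assumption)
        return "negative"
    return "neutral"


print(label_compound(0.2235))  # neutral
print(label_compound(0.9744))  # positive
```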

Let’s look at a few other examples. Notice that short passages are harder to analyze, and most of these score as neutral.

“I’m king of the world!” (Titanic 1997)

text = "I'm king of the world!"
print(sia.polarity_scores(text))
# output: {'compound': 0.0, 'neg': 0.0, 'neu': 1.0, 'pos': 0.0}

“Carpe diem. Seize the day, boys. Make your lives extraordinary.” — Dead Poets Society, 1989.

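text = "Carpe diem. Seize the day, boys. Make your lives extraordinary."
print(sia.polarity_scores(text))
# output: {'compound': 0.0, 'neg': 0.0, 'neu': 1.0, 'pos': 0.0}
# "Seize the day" and "extraordinary" sound very positive, yet the passage scores as fully neutral. Again, sentiment analysis is not perfect.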

“Mama always said life was like a box of chocolates. You never know what you’re gonna get.” — Forrest Gump, 1994

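text = "Mama always said life was like a box of chocolates. You never know what you're gonna get."
print(sia.polarity_scores(text))
# output: {'compound': 0.3612, 'neg': 0.0, 'neu': 0.857, 'pos': 0.143}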

To recover from the emotional mess that Titanic, Dead Poets Society, and Forrest Gump left you in, let’s look at something even more exciting: the Preamble to the U.S. Constitution.

PREAMBLE = """
We the People of the United States,\
in Order to form a more perfect Union, establish Justice,\
insure domestic Tranquility, provide for the common defense, \
promote the general Welfare, and secure the Blessings of \
Liberty to ourselves and our Posterity, do ordain and establish \
this Constitution for the United States of America.\
"""
print(sia.polarity_scores(PREAMBLE))
# output: {'compound': 0.9744, 'neg': 0.0, 'neu': 0.608, 'pos': 0.392}

Ahaa, finally something overly positive!

A Simple but Technical Explanation of How It Works, in 4 Steps

  1. Build an indexed dictionary
  • Map each word to a numeric ID; in other words, assign a unique index to each word.
  • This can be done for the whole English language or for just the set of words you want to work with.
  • For example, let’s take a not-so-random collection of words:
+----+---+----+------+--------+
| 0  | 1 | 2  | 3    | 4      |
+----+---+----+------+--------+
| No | I | am | your | father |
+----+---+----+------+--------+

2. Vectorize your body of text

  • Since math is done with numbers, let’s transform the sentences we want to analyze into vectors. For each word in the dictionary, write a 1 at its index if the word is present in the sentence, or a 0 if it is missing.
  • Example sentences:
"No, I am your father." (Star Wars, the correct quote)
+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | <- index
+---+---+---+---+---+
| 1 | 1 | 1 | 1 | 1 | <- vector
+---+---+---+---+---+
"I am a father." (a simple sentence; ignore "a" for now, since it is not in the dictionary)
+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | <- index
+---+---+---+---+---+
| 0 | 1 | 1 | 0 | 1 | <- vector
+---+---+---+---+---+
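Step 2 can be sketched in Python as follows. This assumes a lowercase, letters-only tokenizer (a regular expression chosen for illustration) so that punctuation such as the comma in “No,” is dropped; words outside the dictionary, such as “a”, are simply ignored. The name `vectorize` is hypothetical.

```python
import re

VOCABULARY = ["no", "i", "am", "your", "father"]


def vectorize(sentence: str, vocabulary: list[str]) -> list[int]:
    """Binary presence vector: 1 if the dictionary word occurs in the sentence, else 0."""
    # Lowercase and keep only runs of letters, so "No," becomes "no".
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # Iterate over the dictionary, so words not in it (like "a") never appear in the vector.
    return [1 if word in tokens else 0 for word in vocabulary]


print(vectorize("No, I am your father.", VOCABULARY))  # [1, 1, 1, 1, 1]
print(vectorize("I am a father.", VOCABULARY))         # [0, 1, 1, 0, 1]
```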

3. Train the model

  • By “training the model,” I simply mean assigning a sentiment score to each word. This can be done manually or with some other machine learning approach.
  • For our example, let’s use only a single compound score as each word’s sentiment value.
  • I will assign the scores based on my own perception of sentiment, which is exactly why bias must be taken seriously in sentiment analysis.
indices
+----+---+----+------+--------+
| 0  | 1 | 2  | 3    | 4      |
+----+---+----+------+--------+
text
+----+---+----+------+--------+
| No | I | am | your | father |
+----+---+----+------+--------+
scores
+----+---+----+------+--------+
| -1 | 1 | 1  | 0    | 1      |
+----+---+----+------+--------+
1 = positive
-1 = negative
0 = neutral
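The hand-assigned scores from the table can be kept in a dictionary and lined up with the word indices to form a score vector. A sketch, with lowercase keys as an assumption; the names `word_scores` and `score_vector` are illustrative.

```python
# Step 3 sketch: hand-assigned sentiment scores (1 = positive, -1 = negative, 0 = neutral).
vocabulary = ["no", "i", "am", "your", "father"]
word_scores = {"no": -1, "i": 1, "am": 1, "your": 0, "father": 1}

# Order the scores by dictionary index, so position k holds the score of vocabulary[k].
score_vector = [word_scores[word] for word in vocabulary]
print(score_vector)  # [-1, 1, 1, 0, 1]
```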

4. Fit the text to the model

  • This is where you fit the text you want to analyze to the model. The end goal is a sentiment score.
  • The scores form another vector that maps to the dictionary. It does not need a score for every dictionary word, as ours happens to have; other machine learning models can fall back on averaged scores for words that have no exact sentiment value of their own.
index from the dictionary
+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 |
+---+---+---+---+---+
vector for "No, I am your father"
+---+---+---+---+---+
| 1 | 1 | 1 | 1 | 1 |
+---+---+---+---+---+
vector for "I am a father"
+---+---+---+---+---+
| 0 | 1 | 1 | 0 | 1 |
+---+---+---+---+---+
score vector
+---+---+---+---+---+
|-1 | 1 | 1 | 0 | 1 |
+---+---+---+---+---+
  • Let’s fit the two sentence vectors from step 2 to the model to get their sentiment scores. Take the dot product of each word vector with the score vector, then divide the result by the number of ones in the word vector. This gives an averaged (“weighted”) sentiment score between -1 and 1.
# "No, I am your father."
[1,1,1,1,1] · [-1,1,1,0,1]^T = 2, and 2/5 = 0.4
# "I am a father."
[0,1,1,0,1] · [-1,1,1,0,1]^T = 3, and 3/3 = 1
  • It was tempting to score the Star Wars quote “No, I am your father” as negative, since that was the worst news Luke Skywalker could have received. Its score of 0.4, however, falls in the neutral range. I think Luke was neutral about the news as well. By the way, sarcasm is very difficult to measure with machine learning.
  • The text “I am a father.” gets a positive score of 1.
  • We divided by the count of ones in the word vector to normalize the score by the number of dictionary words present in the sentence (the positions marked with a 1).
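Steps 2 through 4 can be combined into one small function. A minimal sketch in plain Python, assuming the same lowercase, letters-only tokenization and the hand-assigned scores above; `sentiment_score` is a hypothetical name, and returning 0.0 when no dictionary word is present is my own choice to avoid dividing by zero.

```python
import re

VOCABULARY = ["no", "i", "am", "your", "father"]
SCORES = [-1, 1, 1, 0, 1]  # hand-assigned score for each dictionary index


def sentiment_score(sentence: str) -> float:
    """Dot product of the binary word vector with the score vector, divided by the number of ones."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    word_vector = [1 if word in tokens else 0 for word in VOCABULARY]
    present = sum(word_vector)  # number of ones in the word vector
    if present == 0:
        return 0.0  # no dictionary words found: treat as neutral (assumption)
    dot = sum(w * s for w, s in zip(word_vector, SCORES))
    return dot / present


print(sentiment_score("No, I am your father."))  # 0.4
print(sentiment_score("I am a father."))         # 1.0
```

Because every score lies between -1 and 1, the result is simply the average score of the dictionary words found in the sentence, so it stays in that range too.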

I hope that was a helpful introduction to sentiment analysis, with a simple but technical example.

Dennis Alexander Morozov

Data Scientist, Machine Learning Engineer, Life Long Student