Sentiment Analysis Series 1

Xikai Zhao
Aug 11, 2017 · 5 min read

What is Sentiment Analysis?

Sentiment analysis (also known as opinion mining or emotion AI) is essentially the process of determining the emotional tone behind a series of words, used to gain an understanding of the attitudes, opinions and emotions expressed within an online mention.

Why should I care about it?

Sentiment analysis has a wide range of applications, from tracking how customers react to a product to monitoring brand mentions on social media.

What is the magic behind it?

There is no magic. It’s math and statistics. In this case, it’s a Naive Bayes classifier.

What is Naive Bayes Classification?

Naive Bayes Classification is a family of algorithms based on a common principle. All Naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable.
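
In symbols, the independence assumption is usually written as follows. This is the standard textbook form, not something specific to this post; C_k denotes a class and x_1, …, x_n the feature values.

```latex
P(C_k \mid x_1, \dots, x_n) \;\propto\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k),
\qquad
\hat{y} = \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```

The classifier picks the class with the highest score, so the normalizing constant can be ignored.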

Why is it called “Naive”?

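As explained above, all Naive Bayes classifiers rest on the assumption that features are independent of one another given the class. The assumption is “naive” because it rarely holds in real data; in text, for example, the words in a sentence clearly depend on each other. Despite this naive design and apparently oversimplified assumption, Naive Bayes classifiers have worked quite well in many complex real-world situations.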

Efficient and highly scalable

Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by expensive iterative approximation as used for many other types of classifiers.
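
To make the "closed-form" point concrete, here is a minimal sketch of training by counting. The function names and the toy documents are made up for illustration, and the additive smoothing parameter alpha is my own addition (alpha=0 gives the pure maximum-likelihood estimate).

```python
from collections import Counter, defaultdict

def train_counts(labeled_docs):
    """One pass over the data: count labels and words per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for words, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(words)
    return label_counts, word_counts

def estimate(label_counts, word_counts, alpha=1.0):
    """Turn counts into class priors and per-class word probabilities."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    priors = {label: n / total_docs for label, n in label_counts.items()}
    likelihoods = {}
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        likelihoods[label] = {w: (counts[w] + alpha) / denom for w in vocab}
    return priors, likelihoods

# Toy documents, invented for this example.
docs = [(["great", "fun"], "pos"), (["great"], "pos"), (["bad"], "neg")]
priors, likelihoods = estimate(*train_counts(docs))
```

There is no optimization loop here: every parameter is a ratio of counts, which is why training time grows linearly with the size of the data.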

Multinomial Naive Bayes

With a multinomial event model, samples (feature vectors) represent the frequencies with which certain events have been generated by a multinomial (p_1, …, p_n), where p_i is the probability that event i occurs (or K such multinomials in the multiclass case).
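
The likelihood under this event model is commonly written as below. This is the standard form, with x_i the number of times event i (here, a word) occurs in the sample and p_{ki} the probability of event i under class C_k.

```latex
p(\mathbf{x} \mid C_k) = \frac{\left(\sum_{i=1}^{n} x_i\right)!}{\prod_{i=1}^{n} x_i!} \prod_{i=1}^{n} p_{ki}^{\,x_i}
```

The factorial term does not depend on the class, so it drops out when comparing classes.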

Bernoulli Naive Bayes

In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence features are used rather than term frequencies.
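
In its standard form, the Bernoulli likelihood looks like this, where x_i is 1 if term i occurs in the document and 0 otherwise, and p_{ki} is the probability that class C_k generates term i.

```latex
p(\mathbf{x} \mid C_k) = \prod_{i=1}^{n} p_{ki}^{\,x_i} \, (1 - p_{ki})^{\,1 - x_i}
```

Unlike the multinomial model, this one explicitly accounts for terms that are absent from a document.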

Preparation

I used a simple model from this blog on sentiment analysis as a starting point. The data is the movie review corpus from NLTK, with reviews categorized into two classes: positive and negative. As a baseline, I started with three simple Naive Bayes classifiers that use boolean word features, then evaluated each model on accuracy, precision and recall.

Word feature extraction

I extracted word features from each review to build the training set. For this problem, I used a simplified bag-of-words model and mapped each individual word (feature name) to a boolean value (feature value).
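
A minimal sketch of that mapping. The function name word_feats is my own choice for illustration.

```python
def word_feats(words):
    """Map each word to True: a boolean bag-of-words feature dict."""
    return {word: True for word in words}

feats = word_feats(["not", "bad", "not"])
```

Repeated words collapse into a single key, so the features record only whether a word appears, not how often.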

Training

Three quarters of the records (1,500) were used as the training set, and the remaining 500 records were used as the test set to evaluate the models.
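
The split can be sketched as follows. I am assuming here that it is taken per class so both sets stay balanced, and the placeholder feature dicts stand in for the real reviews (the NLTK corpus has 1,000 positive and 1,000 negative reviews).

```python
def split_per_class(pos_docs, neg_docs, train_fraction=0.75):
    """Take the first train_fraction of each class for training, the rest for testing."""
    pos_cut = int(len(pos_docs) * train_fraction)
    neg_cut = int(len(neg_docs) * train_fraction)
    train = pos_docs[:pos_cut] + neg_docs[:neg_cut]
    test = pos_docs[pos_cut:] + neg_docs[neg_cut:]
    return train, test

# Placeholder (features, label) pairs, not real reviews.
pos_docs = [({"word%d" % i: True}, "pos") for i in range(1000)]
neg_docs = [({"word%d" % i: True}, "neg") for i in range(1000)]
train_feats, test_feats = split_per_class(pos_docs, neg_docs)
```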

Naive Bayes classifier
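
A minimal sketch of this step, assuming NLTK's own NaiveBayesClassifier is the classifier meant here. The four training examples are made up for illustration; the real model was trained on the movie review corpus.

```python
from nltk.classify import NaiveBayesClassifier

# Toy training data (invented, not taken from the corpus).
train_feats = [
    ({"great": True, "fun": True}, "pos"),
    ({"great": True, "moving": True}, "pos"),
    ({"boring": True, "bad": True}, "neg"),
    ({"bad": True, "dull": True}, "neg"),
]

classifier = NaiveBayesClassifier.train(train_feats)
prediction = classifier.classify({"great": True})
```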

Multinomial Naive Bayes
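
One way to get a multinomial model with the same feature dicts is to wrap scikit-learn's MultinomialNB in NLTK's SklearnClassifier. This is a sketch under that assumption, with made-up training examples.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB

# Toy training data (invented, not taken from the corpus).
train_feats = [
    ({"great": True, "fun": True}, "pos"),
    ({"great": True, "moving": True}, "pos"),
    ({"boring": True, "bad": True}, "neg"),
    ({"bad": True, "dull": True}, "neg"),
]

# The wrapper converts the feature dicts into a sparse matrix for scikit-learn.
classifier = SklearnClassifier(MultinomialNB()).train(train_feats)
prediction = classifier.classify_many([{"great": True}])[0]
```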

Bernoulli Naive Bayes classifier
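
The Bernoulli variant can be sketched the same way, assuming scikit-learn's BernoulliNB behind NLTK's SklearnClassifier wrapper. The training examples are again made up.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB

# Toy training data (invented, not taken from the corpus).
train_feats = [
    ({"great": True, "fun": True}, "pos"),
    ({"great": True, "moving": True}, "pos"),
    ({"boring": True, "bad": True}, "neg"),
    ({"bad": True, "dull": True}, "neg"),
]

classifier = SklearnClassifier(BernoulliNB()).train(train_feats)
prediction = classifier.classify_many([{"bad": True}])[0]
```

Since the word features are already boolean, they fit the Bernoulli model's binary inputs directly.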

Evaluation and conclusion

For evaluation, I used accuracy as the core metric. I also report precision and recall, since they reveal whether a model is biased toward one class.
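
For reference, the three metrics can be computed like this. The function name and the example labels are made up for illustration and are not results from the model.

```python
def evaluate(reference, predicted, positive_label="pos"):
    """Return (accuracy, precision, recall) for one class treated as positive."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == positive_label and p == positive_label)
    fp = sum(1 for r, p in pairs if r != positive_label and p == positive_label)
    fn = sum(1 for r, p in pairs if r == positive_label and p != positive_label)
    accuracy = sum(1 for r, p in pairs if r == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Invented labels: the predictions lean toward "neg".
reference = ["pos", "pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "neg", "neg", "neg"]
accuracy, precision, recall = evaluate(reference, predicted)
```

In this made-up case precision for "pos" is perfect while recall is lower, which is the kind of bias that accuracy alone would hide.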

Flaws and next steps

  1. I didn’t filter out stop words for this model. In general, it’s good practice to remove noise like stop words.
  2. This model uses bag-of-words, which treats each word as an individual object and ignores its context. It can fail badly in some specific cases. For example, “not bad” does not mean the same as “bad.” I will apply an N-gram model in the next version.
  3. The word features use boolean values. I am interested in using term frequencies along with TF-IDF to see how the model performs.
  4. In addition, I plan to create a hybrid model that combines all the models, has each model vote, and takes the majority vote as the result, which might improve the accuracy as well.
  5. Eventually, I plan to extract social media newsfeeds (Twitter for example) and apply this model to get an idea of people’s opinions.
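
As a rough sketch of the N-gram idea from item 2, bigrams can be added next to the single-word features. The function name is hypothetical and this is not the next version of the model, only an illustration.

```python
def bigram_word_feats(words):
    """Boolean features for single words plus adjacent word pairs."""
    feats = {word: True for word in words}
    # zip pairs each word with its successor, e.g. ("not", "bad").
    feats.update({(first, second): True for first, second in zip(words, words[1:])})
    return feats

feats = bigram_word_feats(["not", "bad"])
```

With the pair ("not", "bad") available as its own feature, a classifier can learn that it signals something different from "bad" alone.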

RetailMeNot Engineering

Saving The World Money Since ‘09
