What is Sentiment Analysis?
Sentiment analysis (also known as opinion mining or emotion AI) is the process of determining the emotional tone behind a series of words. It is used to understand the attitudes, opinions and emotions expressed in an online mention.
Why should I care about it?
Sentiment analysis has a wide range of applications.
The Obama administration used sentiment analysis to gauge public reaction to policy announcements and campaign messages ahead of the 2012 presidential election.
Companies monitor social media to track customer reviews, survey responses and competitors. The finance industry uses it to predict stock prices by understanding customers’ sentiment toward certain brands.
Sentiment analysis is in demand because of its efficiency. Thousands of text documents can be processed for sentiment (and other features, including named entities, topics and themes) in seconds, compared to the hours it would take a team of people to do the same work manually.
What is the magic behind it?
There is no magic. It’s math and statistics. In this case, it’s the Naive Bayes classifier.
What is Naive Bayes Classification?
Naive Bayes Classification is a family of algorithms based on a common principle. All Naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable.
Why is it called “Naive”?
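As explained above, all Naive Bayes classifiers rest on the assumption that features are independent of one another given the class. In real text this is rarely true, since words tend to occur together, which is why the assumption is called naive. Despite their naive design and apparently oversimplified assumptions, Naive Bayes classifiers have worked quite well in many complex real-world situations.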
Efficient and highly scalable
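Abstractly, Naive Bayes is a conditional probability model: given a problem instance to be classified, represented by a vector $x = (x_1, \dots, x_n)$ of n features (independent variables), it assigns to this instance a probability $p(C_k \mid x_1, \dots, x_n)$ for each of K possible outcomes or classes $C_k$. By Bayes' theorem, this probability can be written as:
$$p(C_k \mid x) = \frac{p(C_k)\, p(x \mid C_k)}{p(x)}$$
This can be translated into English: posterior = (prior × likelihood) / evidence.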
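To make the posterior, prior, likelihood and evidence terms concrete, here is a minimal worked sketch in Python. The priors and per-word probabilities are made-up numbers chosen only for illustration, and the function name `posterior` is hypothetical. It is not the code behind the results in this post.

```python
from math import prod

# Hypothetical numbers, chosen only for illustration.
priors = {"pos": 0.5, "neg": 0.5}
# Per-class probability of each word in a tiny made-up vocabulary.
likelihoods = {
    "pos": {"great": 0.6, "boring": 0.1},
    "neg": {"great": 0.2, "boring": 0.5},
}

def posterior(words):
    """Return p(class | words) under the naive independence assumption."""
    # Numerator of Bayes' theorem: prior times the product of per-word likelihoods.
    joint = {
        c: priors[c] * prod(likelihoods[c][w] for w in words)
        for c in priors
    }
    # Evidence p(words): the normalizing constant shared by all classes.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

result = posterior(["great"])
print(result)
```

Because the evidence term is the same for every class, a classifier only needs to compare the numerators to pick the most likely class.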
Multinomial Naive Bayes
With a multinomial event model, samples (feature vectors) represent the frequencies with which certain events have been generated by a multinomial $(p_1, \dots, p_n)$, where $p_i$ is the probability that event i occurs (or K such multinomials in the multiclass case).
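As a rough sketch of the multinomial event model, the following pure-Python code estimates per-class word probabilities from word counts and scores a document by summing count × log-probability. The function names, the toy documents and the add-one (Laplace) smoothing with `alpha=1.0` are all assumptions made for illustration; a library implementation would differ in detail.

```python
from collections import Counter
from math import log

def train_multinomial(docs_by_class, alpha=1.0):
    """Estimate smoothed per-class word probabilities from lists of tokens."""
    vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
    model = {}
    for label, docs in docs_by_class.items():
        counts = Counter(w for d in docs for w in d)
        # Add-alpha smoothing so unseen words never get probability zero.
        total = sum(counts.values()) + alpha * len(vocab)
        model[label] = {w: (counts[w] + alpha) / total for w in vocab}
    return model

def log_likelihood(tokens, word_probs):
    """Sum of count * log(p_i) over the words in the document.

    The multinomial coefficient is identical for every class, so it is dropped.
    Words outside the training vocabulary are ignored.
    """
    counts = Counter(tokens)
    return sum(n * log(word_probs[w]) for w, n in counts.items() if w in word_probs)

# Toy training data (hypothetical).
model = train_multinomial({
    "pos": [["good", "good", "fun"]],
    "neg": [["bad", "dull", "bad"]],
})
doc = ["good", "good"]
scores = {label: log_likelihood(doc, probs) for label, probs in model.items()}
print(scores)
```

Note that a repeated word contributes once per occurrence, which is what distinguishes this model from one based on boolean presence.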
Bernoulli Naive Bayes
In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence features are used rather than term frequencies.
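The key difference from the multinomial model is that the Bernoulli model also accounts for words that are absent from a document. A minimal sketch, using made-up probabilities and a hypothetical function name:

```python
from math import log

def bernoulli_log_likelihood(present_words, word_probs):
    """Log-likelihood of a document under a multivariate Bernoulli model.

    Every vocabulary word contributes: log(p) if it occurs in the
    document, log(1 - p) if it does not.
    """
    present = set(present_words)
    total = 0.0
    for word, p in word_probs.items():
        total += log(p) if word in present else log(1.0 - p)
    return total

# Hypothetical p(word occurs | class) values for a two-word vocabulary.
pos_probs = {"great": 0.8, "awful": 0.1}
neg_probs = {"great": 0.3, "awful": 0.7}

with_great = (
    bernoulli_log_likelihood(["great"], pos_probs),
    bernoulli_log_likelihood(["great"], neg_probs),
)
empty_doc = (
    bernoulli_log_likelihood([], pos_probs),
    bernoulli_log_likelihood([], neg_probs),
)
print(with_great, empty_doc)
```

With these numbers, a document containing "great" scores higher under the positive class, while a document containing neither word scores slightly higher under the negative class, because the absence of "great" is itself treated as evidence.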
I used a simple model from this blog on sentiment analysis as a starting point. The model uses the movie review corpus from NLTK, with reviews categorized into two categories: positive reviews and negative reviews. I started with three simple Naive Bayes classifiers as a baseline, using boolean word features, and then evaluated the models on their accuracy, recall and precision.
Word feature extraction
I extracted word features for the training set. For this problem, I used a simplified bag-of-words model and mapped each word (feature name) to a boolean value (feature value).
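A minimal sketch of this kind of boolean bag-of-words extraction is shown below. The function name `word_feats` is an assumption. The output format, a dict of feature names to values, is the one NLTK classifiers accept when paired with a label.

```python
def word_feats(words):
    """Map each word to True, giving a boolean bag-of-words feature dict."""
    return {word: True for word in words}

feats = word_feats(["a", "great", "great", "movie"])
print(feats)
```

Repeated words collapse into a single key, so the features record only whether a word occurs, not how often.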
Three quarters of the 2,000 records, or 1,500 records, were used as the training set, and the remaining 500 records were used as the testing set to evaluate the model.
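One way to produce such a split is sketched below. It assumes the split is done per class so that both sets stay balanced, which the text above does not state explicitly, and it uses placeholder records in place of real reviews.

```python
def split_three_quarters(items):
    """Return (first three quarters, last quarter) of a list."""
    cutoff = len(items) * 3 // 4
    return items[:cutoff], items[cutoff:]

# Placeholder records standing in for 1,000 negative and 1,000 positive reviews.
neg_records = [("neg", i) for i in range(1000)]
pos_records = [("pos", i) for i in range(1000)]

neg_train, neg_test = split_three_quarters(neg_records)
pos_train, pos_test = split_three_quarters(pos_records)

train_set = neg_train + pos_train
test_set = neg_test + pos_test
print(len(train_set), len(test_set))
```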
Naive Bayes classifier
Multinomial Naive Bayes
Bernoulli Naive Bayes classifier
Evaluation and conclusion
For evaluation, I used accuracy as the core metric. I also used precision and recall for each class, as they reveal whether a model is biased toward predicting one class over the other.
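For reference, these three metrics can be computed from the true and predicted labels as in the sketch below. The function names and the toy labels are assumptions for illustration, not the evaluation code or data behind the reported results.

```python
def accuracy(reference, predicted):
    """Fraction of predictions that match the true labels."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)

def precision_recall(reference, predicted, label):
    """Precision and recall for one class label."""
    true_pos = sum(1 for r, p in zip(reference, predicted) if r == label and p == label)
    n_predicted = sum(1 for p in predicted if p == label)  # items the model labeled `label`
    n_actual = sum(1 for r in reference if r == label)     # items that truly are `label`
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_actual if n_actual else 0.0
    return precision, recall

# Toy labels (hypothetical).
reference = ["pos", "pos", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg"]

acc = accuracy(reference, predicted)
pos_precision, pos_recall = precision_recall(reference, predicted, "pos")
neg_precision, neg_recall = precision_recall(reference, predicted, "neg")
print(acc, pos_precision, pos_recall, neg_precision, neg_recall)
```

In this toy case the model never mislabels a review as positive (precision 1.0) but finds only half of the positive reviews (recall 0.5), which is the kind of bias accuracy alone would hide.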
The results are shown below:
As seen, Multinomial Naive Bayes and Bernoulli Naive Bayes performed significantly better than the regular Naive Bayes, with an accuracy of about 80 percent. I am quite amazed by this result, considering that human accuracy on sentiment analysis is around 80 percent too. It’s also surprising that “avoids” is one of the most informative features.
According to the recall, 98% of positive reviews are identified by the model. On the other hand, the precision shows that 96% of the reviews the model labels as negative are in fact negative.
Flaws and next steps
- I didn’t filter out stop words for this model, and it’s a good practice, in general, to remove noise like stop words.
- This model uses bag-of-words, which treats each word as an individual object and ignores its context. It can fail badly in some specific cases. For example, “not bad” does not mean the same as “bad.” I will apply an N-gram model in the next version.
- For the word features, the model uses boolean feature values. I am interested in using term frequencies along with TF-IDF to see how the model performs.
- In addition, I plan to create a hybrid model that combines all the models, with each model voting and the majority vote taken as the result, which might improve the accuracy as well.
- Eventually, I plan to extract social media newsfeeds (Twitter for example) and apply this model to get an idea of people’s opinions.
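Two of these next steps, N-gram features and majority voting, can be sketched in a few lines. This is only an illustration of the ideas under assumed names (`bigram_feats`, `majority_vote`), not a planned implementation.

```python
from collections import Counter

def bigram_feats(words):
    """Boolean features for single words plus adjacent word pairs (bigrams)."""
    feats = {word: True for word in words}
    # Pair each word with its successor so that "not bad" becomes its own feature.
    feats.update({(a, b): True for a, b in zip(words, words[1:])})
    return feats

def majority_vote(labels):
    """Return the label predicted by the most classifiers."""
    return Counter(labels).most_common(1)[0][0]

feats = bigram_feats(["not", "bad"])
winner = majority_vote(["pos", "neg", "pos"])
print(feats, winner)
```

With three classifiers and two classes, a vote can never tie, which makes a simple majority rule sufficient here.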