Sentiment Analysis in Five Lines of Python

by Michael Fire (originally published on dato.com)

Machine learning makes it possible for data scientists to build intelligent applications, including fraud detection, recommendation systems, and sentiment analysis. Sentiment analysis is a fast-growing use case for machine learning in the enterprise. In my experience, it is also one of the most useful natural language processing tasks. In this notebook, I demonstrate how to create a Bag-of-Words sentiment classifier in just five lines of Python using GraphLab Create’s text analytics toolkit.

Sentiment analysis can often be framed as a text classification problem. There are many methods available: search for "text classification" on Google and you will be overwhelmed by the number of results. Aside from classifying product reviews as either positive or negative, text classification can also be used to decide whether an email is spam or ham, or to identify the language of a text.

One of the classic methods for text classification involves a Bag-of-Words model, which simply uses the frequency of words in the text as features. In this notebook I demonstrate how easy it is to create a Bag-of-Words model using GraphLab Create’s text analytics toolkit. In fact, it took me about five lines of code to create a classifier (with an AUC of ~0.88) that can decide whether an IMDB movie review is positive or negative in sentiment.

import graphlab as gl

# Load the labeled reviews; traindata_path must point to the tab-separated training file.
train_data = gl.SFrame.read_csv(traindata_path,
                                header=True,
                                delimiter='\t',
                                quote_char='"',
                                column_type_hints={'id': str,
                                                   'sentiment': int,
                                                   'review': str})
# Count single words (1-grams) and adjacent word pairs (2-grams) in each review.
train_data['1grams features'] = gl.text_analytics.count_ngrams(train_data['review'], 1)
train_data['2grams features'] = gl.text_analytics.count_ngrams(train_data['review'], 2)
# Train a classifier on both n-gram feature columns.
cls = gl.classifier.create(train_data,
                           target='sentiment',
                           features=['1grams features',
                                     '2grams features'])
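
To make the n-gram features more concrete, here is a minimal pure-Python sketch of what counting 1-grams and 2-grams in a single review amounts to. The helper name count_word_ngrams and the sample review are my own illustrations, not part of GraphLab Create, and the tokenization (lowercasing and splitting on whitespace) is a simplifying assumption; the toolkit's own handling of punctuation and case may differ.

```python
from collections import Counter


def count_word_ngrams(text, n):
    """Count contiguous n-word sequences in a text (hypothetical helper).

    Assumes a very simple tokenizer: lowercase, then split on whitespace.
    """
    words = text.lower().split()
    # Slide a window of n words across the text and join each window into one key.
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return dict(Counter(grams))


review = "a great movie with a great cast"  # made-up example text
unigrams = count_word_ngrams(review, 1)
bigrams = count_word_ngrams(review, 2)
print(unigrams)
print(bigrams)
```

Each review becomes a dictionary that maps a word (or word pair) to the number of times it occurs, and dictionaries of this kind are what the classifier above receives as features.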

Another common representation for unstructured text is TF-IDF (term frequency-inverse document frequency), which down-weights words that appear in many documents. With just one additional line of code, I can create a text classifier that uses the TF-IDF representation of text. I leave it to you to discover these capabilities on your own and to challenge yourself to create models for sentiment analysis using GraphLab Create.
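
If you have not met TF-IDF before, the following pure-Python sketch may help. It uses one common variant, the raw word count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the word. The function name tf_idf_weights and the three toy reviews are my own assumptions for illustration; this is not the GraphLab Create implementation, which may use a different weighting or tokenization.

```python
import math
from collections import Counter


def tf_idf_weights(documents):
    """Return one {word: weight} dict per document (hypothetical helper).

    weight = (count of word in document) * log(N / number of documents containing word)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = float(len(tokenized))
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for words in tokenized for word in set(words))
    return [
        {word: count * math.log(n_docs / doc_freq[word])
         for word, count in Counter(words).items()}
        for words in tokenized
    ]


reviews = ["a great movie", "a dull movie", "a great cast"]  # made-up examples
weights = tf_idf_weights(reviews)
for review, w in zip(reviews, weights):
    print(review, w)
```

A word that occurs in every review (here "a") gets a weight of zero, while a word that occurs in only one review (here "dull") gets the largest weight, which is the intuition behind preferring TF-IDF over raw counts.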

Our company was founded on a mission to create a more intelligent world. Sentiment analysis is an exciting new area of our work. If you have any questions, feel free to write me an email or post a comment at the end of this page.


Originally published at blog.dato.com.