Build your own sentiment analysis classifier with NLTK
Man: “Hey baby, I’m tired, is it ok if we just stay home tonight?”
Woman: “Sure…”
Man: “You are the best”
Woman: “I BET IF IT WAS WITH THAT B***H MICHELLE YOU WOULD WANT TO GO OUT”
Well, that wouldn’t happen if you could analyse her message using this sweet sentiment analysis classifier we’re about to build.
Just kidding, it would still happen, but it’s good fun, promise!
We’re going to use NLTK, and we’re going to train our own Naive Bayes classifier.
A key thing about a Naive Bayes classifier is that it looks at each feature individually and assumes the features are not connected to each other at all. So the first thing we need is a way to extract each word from the text, so that we can treat every word as its own feature.
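As a rough illustration of that word-by-word view, here is a minimal sketch of a feature extractor. The name word_feats and the plain whitespace split are assumptions made for this example; NLTK classifiers accept any dictionary that maps feature names to values.

```python
def word_feats(words):
    """Turn a list of tokens into a feature dict: one independent feature per word."""
    # Each word becomes its own key; word order and neighbouring words are ignored.
    return {word: True for word in words}


# Splitting on whitespace is the simplest possible tokeniser (an assumption here).
features = word_feats("this movie was great".split())
print(features)
```

Because every word ends up as a separate key, the classifier never sees that "not" came right before "good", which is exactly the "naive" part.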
Next, we need a large body of already-labelled text to teach our classifier about sentiment. This is where NLTK really comes in handy: besides letting you use the Naive Bayes algorithm without having to code it yourself, it provides you with a large corpus of movie reviews, divided into negative and positive reviews.
In the training code, we separate the positive reviews from the negative ones and apply a cutoff to split each group into training and testing data. We’re using 3/4 of the 2000 reviews (1500) to train and the remaining 1/4 (500) for testing and validating accuracy.
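A minimal sketch of what that training step could look like, assuming NLTK is installed and its movie_reviews corpus can be downloaded. The helper names split_at_cutoff, word_feats and train_and_evaluate are made up for this example, and this is not necessarily identical to the original listing.

```python
def split_at_cutoff(items, train_fraction=0.75):
    """Return (train, test): the first train_fraction of items, then the rest."""
    cutoff = int(len(items) * train_fraction)
    return items[:cutoff], items[cutoff:]


def word_feats(words):
    """One independent feature per word."""
    return {word: True for word in words}


def train_and_evaluate():
    """Train on 3/4 of each class and print accuracy on the remaining 1/4."""
    # Imported here so the helpers above stay usable without NLTK installed.
    import nltk
    from nltk.classify import NaiveBayesClassifier
    from nltk.classify.util import accuracy
    from nltk.corpus import movie_reviews

    nltk.download("movie_reviews", quiet=True)  # one-off corpus download

    # Build (features, label) pairs, one per review, for each category.
    neg = [(word_feats(movie_reviews.words(fileids=[f])), "neg")
           for f in movie_reviews.fileids("neg")]
    pos = [(word_feats(movie_reviews.words(fileids=[f])), "pos")
           for f in movie_reviews.fileids("pos")]

    neg_train, neg_test = split_at_cutoff(neg)
    pos_train, pos_test = split_at_cutoff(pos)

    classifier = NaiveBayesClassifier.train(neg_train + pos_train)
    print("accuracy:", accuracy(classifier, neg_test + pos_test))
    classifier.show_most_informative_features(10)
    return classifier
```

Calling train_and_evaluate() does the actual work. Splitting each class separately keeps the training and test sets balanced between positive and negative reviews.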
At the very bottom of the file, we’re saving the classifier to disk so that it can be used later.
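One common way to do this is Python's pickle module. The sketch below assumes that approach; the function name save_classifier and the file name classifier.pickle are placeholders for this example.

```python
import pickle


def save_classifier(classifier, path="classifier.pickle"):
    """Serialise a trained classifier to disk so it can be reused without retraining."""
    with open(path, "wb") as f:  # pickle needs the file opened in binary mode
        pickle.dump(classifier, f)
```

Only load pickle files you created yourself, since unpickling can execute arbitrary code.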
The last piece of code loads the classifier and expects a text input to analyse.
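A hedged sketch of such a script follows. It assumes the classifier was pickled to classifier.pickle (a placeholder name) and that the input is lower-cased and split on whitespace; the function names are made up for this example.

```python
import pickle
import sys

MODEL_PATH = "classifier.pickle"  # hypothetical file name


def word_feats(words):
    """Same feature format as the one used for training."""
    return {word: True for word in words}


def load_classifier(path=MODEL_PATH):
    """Load a previously pickled classifier (NLTK must be installed to unpickle it)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def classify_text(classifier, text):
    """Return the predicted label for a piece of text."""
    tokens = text.lower().split()  # simple tokenisation, an assumption of this sketch
    return classifier.classify(word_feats(tokens))


# Only runs when a phrase is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(classify_text(load_classifier(), sys.argv[1]))
```

The text has to go through the same feature extraction as the training data, otherwise the classifier will not recognise any of the words.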
To test it after training, run python classify "Machine Learning is fun!"
It prints the predicted label, and it correctly identified whether my phrase was positive or negative.
This has many applications. For instance, a customer service chatbot could use it to detect that a customer is angry or frustrated and change the tone of its messages accordingly.
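To make the chatbot idea a bit more concrete, here is a small sketch. The tone names, the 0.8 threshold and both function names are invented for illustration. The only NLTK call used is prob_classify, which returns a probability distribution over the labels.

```python
def negative_probability(classifier, feats):
    """Probability that the text is negative, according to an NLTK classifier."""
    return classifier.prob_classify(feats).prob("neg")


def pick_tone(neg_prob, threshold=0.8):
    """Choose a reply tone; the threshold and tone names are arbitrary assumptions."""
    return "apologetic" if neg_prob >= threshold else "neutral"


print(pick_tone(0.93))
print(pick_tone(0.20))
```

Using a probability instead of the bare label lets the bot react only when the classifier is fairly confident the customer is upset.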
There are several services out there that offer more accurate and more fine-grained sentiment analysis, such as Amazon Comprehend and Google NLP. Besides the extra cost, though, these are cloud services, so every request adds network latency.
So, what can you build with this? 😉