My computer feels me

Sentiment analysis is used to establish whether the author feels positively, neutrally, or negatively about a topic. It is used for analyzing things like movie reviews and social media posts.
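As a quick taste of what that looks like in code, here is a minimal sketch using the TextBlob library (my choice for illustration; it isn’t mentioned in the talk), which scores polarity on a scale from -1 (negative) to +1 (positive):

```python
from textblob import TextBlob  # pip install textblob

# TextBlob's .sentiment returns (polarity, subjectivity);
# polarity runs from -1.0 (most negative) to +1.0 (most positive).
for text in ("I loved this movie!", "The plot was dull and predictable."):
    print(text, "->", TextBlob(text).sentiment.polarity)
```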

At The Power of Neural Tensor Networks Meetup (full video here), Patrick Smith explained the techniques behind the best sentiment analysis available today.

Show me the sentiment analysis

The Stanford Sentiment Treebank live demo is an excellent working example of this in action (sadly, their site seems to go down frequently). It will do sentiment analysis on any text that you give it.

It seemed to work well for me, except for song lyrics, which seem to be more nuanced. For example, I was expecting this classic Beatles lyric to be neutral, but it came out negative:

[Their website has a feedback mechanism, so I updated the top node to be neutral and pressed “All labels are now correct” to submit my suggestion.]

Use the structure of the text

It isn’t good enough to treat a sentence as a ‘bag of words’, because word order matters, e.g.

white blood cells destroying an infection

is positive, but

an infection destroying white blood cells

is negative.
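A bag-of-words model literally cannot tell those two sentences apart, which a couple of lines of Python make obvious:

```python
from collections import Counter

# Both sentences contain exactly the same words, just in a different
# order, so their bag-of-words representations are identical.
positive = "white blood cells destroying an infection"
negative = "an infection destroying white blood cells"

print(Counter(positive.split()) == Counter(negative.split()))  # True
```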

Using a ‘bag of words’ model, sentiment analysis for single-sentence movie reviews never reached above 80% accuracy in seven years.

For the best results, you need to analyze the structure of the text by creating a parse tree. This is the basis of deep learning for natural language processing (NLP).
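For instance, here is the kind of binarized constituency tree a parser might produce for the earlier example, rendered with NLTK (the bracketing and part-of-speech tags are my own rough illustration, not the Stanford Parser’s actual output):

```python
from nltk import Tree  # pip install nltk

# A toy parse tree for the 'white blood cells' example; in a real
# pipeline this structure comes from a trained parser.
tree = Tree.fromstring(
    "(S"
    "  (NP (JJ white) (NP (NN blood) (NNS cells)))"
    "  (VP (VBG destroying) (NP (DT an) (NN infection))))"
)
tree.pretty_print()
```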

Recursive Neural Networks (RNNs)

RNN-based models (this part of the video) inspect words near each other and, instead of using a one-dimensional positive-to-negative scale, use a multidimensional distribution and build the best tree for the input.

Manning, Socher, et al.
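The core composition step is easy to sketch: every word is a vector, and one shared layer repeatedly merges two child vectors into a parent vector, which can then be scored against the five sentiment classes. Here is a toy, untrained version (the sizes and random weights are placeholders, not the real model):

```python
import numpy as np

d = 4                                   # toy embedding size
W = np.random.randn(d, 2 * d) * 0.01    # shared composition weights
b = np.zeros(d)
W_s = np.random.randn(5, d) * 0.01      # softmax weights: 5 sentiment classes

def compose(left, right):
    """Merge two child vectors into a parent vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def sentiment(vec):
    """Distribution over very negative .. very positive."""
    scores = W_s @ vec
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Bottom-up over a tiny tree: ((white blood) cells).
white, blood, cells = (np.random.randn(d) for _ in range(3))
parent = compose(compose(white, blood), cells)
print(sentiment(parent))  # roughly uniform, since nothing is trained
```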

RNNs are a special case of Recursive Neural Tensor Networks (RNTNs) (this part of the video). Apparently the “Tensor” part means that one vector can modify another as they get combined. I’m probably wrong, but I think RNTNs effectively allow bigger chunks of text to be evaluated together.
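As I understand the Socher et al. paper, the “Tensor” adds a bilinear term to the composition step, so the two child vectors interact multiplicatively instead of just being concatenated. Another toy sketch with placeholder weights:

```python
import numpy as np

d = 4
W = np.random.randn(d, 2 * d) * 0.01          # same matrix term as before
V = np.random.randn(d, 2 * d, 2 * d) * 0.01   # tensor: one slice per output dim

def compose_rntn(left, right):
    """RNTN composition: bilinear (tensor) term plus the usual linear term."""
    c = np.concatenate([left, right])               # stacked children, shape (2d,)
    bilinear = np.array([c @ V[k] @ c for k in range(d)])
    return np.tanh(bilinear + W @ c)

left, right = np.random.randn(d), np.random.randn(d)
print(compose_rntn(left, right))
```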

Behind the scenes

The implementation quickly gets very, very complicated, and is beyond my ability to explain — you should just watch the video. :-)

One interesting detail is that words get represented as dense vectors. The TensorFlow documentation has a really good explanation.

Effectively, words get turned into a vector of numbers, and those vectors end up being grouped roughly by their meaning, e.g. words like “good” and “great” end up close together, while unrelated words sit far apart.
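You can see the idea with a quick cosine-similarity check. The vectors below are hand-picked to make the point, not real trained embeddings:

```python
import numpy as np

# Hand-picked toy "embeddings": similar meanings get similar vectors.
embeddings = {
    "good":  np.array([0.90, 0.80, 0.10]),
    "great": np.array([0.85, 0.75, 0.20]),
    "awful": np.array([-0.80, -0.70, 0.10]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(embeddings["good"], embeddings["great"]))  # ~1.0: close in meaning
print(cosine(embeddings["good"], embeddings["awful"]))  # negative: far apart
```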

If you search for Manning and Socher you can find a bunch of their academic papers on this topic.

If you’ve read this far and you want a more interesting ending, go read the comments under the Stanford Treebank page. Here’s the first one: