In the next set of topics we will dive into different approaches to solve the "hello world" problem of NLP: sentiment analysis.
The code for this implementation is at https://github.com/iolucas/nlpython/blob/master/blog/sentiment-analysis-analysis/neural-networks.ipynb
We will use two machine learning libraries:
- scikit-learn to create one-hot vectors from our text and split the dataset into train, test and validation sets;
- TensorFlow to create the neural network and train it.
Our dataset is composed of movie reviews and labels telling whether the review is negative or positive. Let’s load the dataset:
The reviews file is a little big, so it is in zip format. Let's extract it:
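A minimal way to do the extraction is with Python's standard zipfile module. To keep this sketch runnable on its own, it first creates a tiny stand-in archive; in the real project you would already have the dataset's zip file next to the notebook (the file names here are placeholders):

```python
import zipfile

# Stand-in archive so the sketch runs on its own; in practice
# the dataset's zip file would already exist on disk.
with zipfile.ZipFile('reviews.zip', 'w') as zf:
    zf.writestr('reviews.txt', 'great movie\nterrible plot\n')

# Extract every member of the archive into the current directory
with zipfile.ZipFile('reviews.zip') as zf:
    zf.extractall()
```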
Now that we have the reviews.txt and labels.txt files, we load them to the memory:
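Loading the two files into memory can be as simple as reading each one and splitting on newlines, so that reviews[i] lines up with labels[i]. The tiny files written below are stand-ins so the sketch runs on its own:

```python
# Stand-in files; the real reviews.txt / labels.txt come from the dataset.
with open('reviews.txt', 'w') as f:
    f.write('great movie\nterrible plot\n')
with open('labels.txt', 'w') as f:
    f.write('positive\nnegative\n')

# One review per line, one label per line, aligned by index
with open('reviews.txt') as f:
    reviews = f.read().splitlines()
with open('labels.txt') as f:
    labels = f.read().splitlines()
```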
Next we transform our review inputs into binary vectors with the help of a scikit-learn vectorizer class:
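One class that produces this kind of binary word-presence vector is scikit-learn's CountVectorizer with binary=True (whether the notebook uses exactly this class is an assumption; the sample reviews below are stand-ins):

```python
from sklearn.feature_extraction.text import CountVectorizer

reviews = ['great great movie', 'terrible plot']  # stand-in reviews

# binary=True marks each word as present (1) or absent (0),
# instead of counting occurrences
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(reviews).toarray()
```

Each row of X is one review; each column corresponds to one word of the learned vocabulary.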
After that we split the data into training and test sets with the train_test_split function. We then split the test set in half to generate a validation set:
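A sketch of this double split (the 20% held-out fraction and the random seed are assumptions, and the arrays are stand-ins for the vectorized reviews and labels):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)   # stand-in feature vectors
y = np.array([0, 1] * 10)          # stand-in labels

# Hold out 20% of the data, then split that held-out part in half:
# one half becomes the test set, the other the validation set.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)
```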
We then define two functions: label2bool, to convert the string label into a binary vector of two elements, and get_batch, a generator that yields parts of the dataset on each iteration:
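These two helpers could look like the following (the exact encoding, [1, 0] for "positive" and [0, 1] for "negative", is an assumption):

```python
def label2bool(label):
    # 'positive' -> [1, 0], everything else -> [0, 1]  (assumed encoding)
    return [1, 0] if label == 'positive' else [0, 1]

def get_batch(data, labels, batch_size):
    # Yield successive (data, labels) slices of size batch_size;
    # the last batch may be smaller than batch_size.
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size], labels[start:start + batch_size]
```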
TensorFlow connects expressions in structures called graphs. We first clear any existing graph, then get the vocabulary length and declare placeholders that will be used to feed in our text data and labels:
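Using the TF1-style API (accessed through tf.compat.v1 on newer TensorFlow versions, which is an assumption; the original notebook targets TF1 directly), this step might look like:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

vocab_len = 1000  # stand-in; in practice, the size of the vectorizer's vocabulary

tf.reset_default_graph()  # clear any existing graph

# First dimension is None so we can feed batches of any size
inputs = tf.placeholder(tf.float32, [None, vocab_len], name='inputs')
targets = tf.placeholder(tf.float32, [None, 2], name='targets')
```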
This post does not intend to be a TensorFlow tutorial; for more details visit https://www.tensorflow.org/get_started/
We then create our neural network:
- h1 is the hidden layer that receives the text word vectors as input;
- logits is the final layer that receives h1 as input;
- output is the result of applying the sigmoid function to the logits;
- loss is the loss expression to calculate the current error of the neural network;
- optimizer is the expression to adjust the weights of the neural network in order to reduce the loss expression;
- correct_pred and accuracy are used to calculate the current accuracy of the neural network, ranging from 0 to 1.
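Putting those pieces together, a sketch of such a network in the TF1-style API (layer sizes, the ReLU activation, the initializers and the Adam optimizer are assumptions; the notebook's exact choices may differ):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

vocab_len, hidden_units = 1000, 10  # stand-in sizes

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, vocab_len])
targets = tf.placeholder(tf.float32, [None, 2])

# h1: hidden layer fed with the text word vectors
W1 = tf.Variable(tf.truncated_normal([vocab_len, hidden_units], stddev=0.1))
b1 = tf.Variable(tf.zeros([hidden_units]))
h1 = tf.nn.relu(tf.matmul(inputs, W1) + b1)

# logits: final layer fed with h1, one value per class
W2 = tf.Variable(tf.truncated_normal([hidden_units, 2], stddev=0.1))
b2 = tf.Variable(tf.zeros([2]))
logits = tf.matmul(h1, W2) + b2

# output: sigmoid applied to the logits
output = tf.nn.sigmoid(logits)

# loss: current error of the network
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=logits))

# optimizer: adjusts the weights to reduce the loss
optimizer = tf.train.AdamOptimizer().minimize(loss)

# correct_pred / accuracy: fraction of correct predictions, from 0 to 1
correct_pred = tf.equal(tf.argmax(output, 1), tf.argmax(targets, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
```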
We then train the network, periodically printing its current accuracy and loss:
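The training loop could be sketched as below. Everything here is a scaled-down stand-in so it runs end to end: random toy data, a single linear layer instead of the full network, an inlined minimal get_batch, and assumed values for the learning rate, batch size and number of epochs:

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Tiny stand-in problem so the loop runs end to end
vocab_len = 8
X = np.random.rand(32, vocab_len).astype(np.float32)
y = np.eye(2)[np.random.randint(0, 2, 32)].astype(np.float32)

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, vocab_len])
targets = tf.placeholder(tf.float32, [None, 2])

# Single linear layer as a stand-in for the full network above
W = tf.Variable(tf.zeros([vocab_len, 2]))
b = tf.Variable(tf.zeros([2]))
logits = tf.matmul(inputs, W) + b
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=logits))
optimizer = tf.train.AdamOptimizer(0.01).minimize(loss)
accuracy = tf.reduce_mean(tf.cast(
    tf.equal(tf.argmax(logits, 1), tf.argmax(targets, 1)), tf.float32))

def get_batch(data, labels, batch_size):
    # Minimal batch generator, same idea as the helper defined earlier
    for s in range(0, len(data), batch_size):
        yield data[s:s + batch_size], labels[s:s + batch_size]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(5):
        for batch_x, batch_y in get_batch(X, y, 8):
            sess.run(optimizer, {inputs: batch_x, targets: batch_y})
        # Periodically print the current loss and accuracy
        l, a = sess.run([loss, accuracy], {inputs: X, targets: y})
        print('epoch', epoch, 'loss', l, 'accuracy', a)
```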
With this network we got an accuracy of 90%! With more data and a bigger network we could improve this result even further!