Sentiment Analysis Analysis Part 3: Neural Networks

Lucas Oliveira
Aug 1, 2017 · 2 min read

In the next set of topics we will dive into different approaches to solving the hello world problem of the NLP world: sentiment analysis.

Check the other parts: Part1 Part2 Part3

The code for this implementation is at https://github.com/iolucas/nlpython/blob/master/blog/sentiment-analysis-analysis/neural-networks.ipynb

Sentiment analysis is an area of research that aims to tell if the sentiment of a portion of text is positive or negative.

The Code

We will use two machine learning libraries:

  • scikit-learn to create one-hot vectors from our text and split the dataset into train, test and validation sets;

  • TensorFlow to build and train the neural network.

Our dataset is composed of movie reviews and labels telling whether the review is negative or positive. Let’s load the dataset:

The reviews file is fairly large, so it is stored in zip format. Let’s extract it with the zipfile module:
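As a minimal sketch of this step, the extraction could look like the following. The function name extract_reviews and the default archive name reviews.zip are assumptions made for illustration, not names taken from the original notebook.

```python
import zipfile


def extract_reviews(zip_path="reviews.zip", dest_dir="."):
    """Extract every file in the archive into dest_dir.

    Both default values are assumptions for this sketch.
    Returns the list of file names found in the archive.
    """
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling extract_reviews() in the folder that holds the archive should leave the extracted text files next to it.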

Now that we have the reviews.txt and labels.txt files, we load them into memory:
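A rough sketch of the loading step is shown here. It assumes each file holds one entry per line (one review per line in reviews.txt and the matching label on the same line number in labels.txt); the helper name load_lines is hypothetical.

```python
def load_lines(path):
    """Read a text file and return its lines without trailing newlines."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read().splitlines()


def load_dataset(reviews_path="reviews.txt", labels_path="labels.txt"):
    """Load reviews and labels, checking that they line up one to one."""
    reviews = load_lines(reviews_path)
    labels = load_lines(labels_path)
    if len(reviews) != len(labels):
        raise ValueError("reviews and labels have different lengths")
    return reviews, labels
```

The length check is a small safety net: a mismatch would silently pair reviews with the wrong labels later on.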

Next we import the MultiLabelBinarizer class and use it to transform our review inputs into binary vectors:
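To make the idea concrete, here is a small sketch on two made-up reviews. Splitting on whitespace is an assumption for this example; the original notebook may tokenize differently.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Two toy reviews standing in for the real dataset.
reviews = ["great movie great cast", "boring movie"]

# Assumed tokenization: split each review on whitespace.
tokenized = [review.split() for review in reviews]

binarizer = MultiLabelBinarizer()
# Each row marks with 1 the vocabulary words present in that review.
vectors = binarizer.fit_transform(tokenized)

print(list(binarizer.classes_))  # the learned vocabulary, sorted
print(vectors)
```

Note that the vectors only record whether a word appears, not how often: "great" occurs twice in the first review but still maps to a single 1.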

After that we split the data into training and test sets with the train_test_split function. We then split the test set in half to generate a validation set:
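The two-stage split could be sketched as below. The 20% test fraction, the random_state value and the toy data are assumptions for illustration; the post does not state which values were used.

```python
from sklearn.model_selection import train_test_split

# Toy data standing in for the binary vectors and their labels.
features = [[i] for i in range(100)]
labels = ["positive" if i % 2 == 0 else "negative" for i in range(100)]

# First split: hold out 20% of the data (assumed fraction).
x_train, x_rest, y_train, y_rest = train_test_split(
    features, labels, test_size=0.2, random_state=42)

# Second split: cut the held-out part in half, giving test and validation sets.
x_test, x_val, y_test, y_val = train_test_split(
    x_rest, y_rest, test_size=0.5, random_state=42)

print(len(x_train), len(x_test), len(x_val))
```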

We then define two functions: label2bool, which converts the string labels to binary vectors of two elements, and get_batch, a generator that returns one part of the dataset on each iteration:
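One possible shape for these two helpers is sketched here. The signatures, the default batch size and the [negative, positive] ordering of the two-element vector are assumptions; only the function names come from the post.

```python
def label2bool(labels):
    """Map each string label to a two-element vector.

    Assumed encoding: "positive" -> [0, 1], anything else -> [1, 0].
    """
    return [[0, 1] if label == "positive" else [1, 0] for label in labels]


def get_batch(x, y, batch_size=50):
    """Yield consecutive (inputs, targets) slices of at most batch_size items."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]
```

Because get_batch is a generator, only one slice is handed to the network at a time, and the last batch is simply shorter when the dataset size is not a multiple of batch_size.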

TensorFlow connects expressions in structures called graphs. We first clear any existing graph, then get the vocabulary length and declare the placeholders that will be used to feed in our text data and labels:
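A sketch of this setup follows. It uses the tf.compat.v1 module so that the TensorFlow 1.x graph style described here also runs on TensorFlow 2.x; the vocabulary length of 5000 and the placeholder names are assumptions (in the post the length comes from the fitted binarizer).

```python
import tensorflow as tf

tf1 = tf.compat.v1             # TensorFlow 1.x style graph API
tf1.disable_eager_execution()  # placeholders only exist in graph mode

# Start from an empty graph so re-running the code does not pile up nodes.
tf1.reset_default_graph()

vocab_len = 5000  # assumed value for this sketch

# One row per review, one column per vocabulary word.
inputs_ = tf1.placeholder(tf.float32, shape=[None, vocab_len], name="inputs")
# One row per review, two columns for the two-element label vector.
targets_ = tf1.placeholder(tf.float32, shape=[None, 2], name="targets")
```

The None in each shape leaves the batch dimension open, so the same placeholders accept batches of any size.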

This post is not intended to be a TensorFlow tutorial; for more details visit https://www.tensorflow.org/get_started/

We then create our neural network:

  • h1 is the hidden layer that receives the text word vectors as input;

We then train the network, periodically printing its current accuracy and loss:
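To illustrate the network and the training loop together, here is a self-contained sketch on synthetic data. Everything beyond the name h1 is an assumption: the layer sizes, the ReLU activation, the Adam optimizer, the learning rate and the toy dataset are chosen for illustration and are not taken from the original notebook. It uses tf.compat.v1 so the graph-style code also runs on TensorFlow 2.x.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()
tf1.reset_default_graph()
tf1.set_random_seed(0)

vocab_len, hidden_units = 20, 8  # toy sizes, assumed for this sketch

inputs_ = tf1.placeholder(tf.float32, [None, vocab_len])
targets_ = tf1.placeholder(tf.float32, [None, 2])

# h1: hidden layer fed with the binary word vectors.
w1 = tf.Variable(tf1.truncated_normal([vocab_len, hidden_units], stddev=0.1))
b1 = tf.Variable(tf.zeros([hidden_units]))
h1 = tf.nn.relu(tf.matmul(inputs_, w1) + b1)

# Output layer: two logits, one per class.
w2 = tf.Variable(tf1.truncated_normal([hidden_units, 2], stddev=0.1))
b2 = tf.Variable(tf.zeros([2]))
logits = tf.matmul(h1, w2) + b2

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=targets_, logits=logits))
train_op = tf1.train.AdamOptimizer(0.05).minimize(loss)
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(targets_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

# Synthetic data: the class is decided by whether "word 0" is present.
rng = np.random.RandomState(0)
x = rng.randint(0, 2, size=(200, vocab_len)).astype(np.float32)
y = np.stack([1 - x[:, 0], x[:, 0]], axis=1)

history = []
with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    for step in range(200):
        _, batch_loss, batch_acc = sess.run(
            [train_op, loss, accuracy], {inputs_: x, targets_: y})
        history.append((batch_loss, batch_acc))
        if step % 50 == 0:  # periodically report progress
            print("step", step, "loss", batch_loss, "accuracy", batch_acc)
```

On the real dataset the loop would draw batches from the training split rather than feeding the whole toy array at every step, and the accuracy would be checked on the validation set.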

With this network we got an accuracy of 90%! With more data and a bigger network, this result could likely be improved even further!

Please recommend this post so we can spread the knowledge

Leave any questions and comments below
