Predicting Movie Review Sentiment with TensorFlow and TensorBoard

16 min read · Apr 11, 2017

After reading this article, I hope that you have a better understanding of how to use TensorFlow for natural language processing projects, and how to use TensorBoard in general. The article is aimed at being a step above the introductory level, so I will go into some detail about what’s happening in the code, but I’ll skip over most of the basic stuff.

For this analysis, we will be using the data from an old Kaggle competition “Bag of Words Meets Bags of Popcorn” (https://www.kaggle.com/c/word2vec-nlp-tutorial). This dataset contains 25,000 labeled training reviews, 50,000 unlabeled training reviews, and 25,000 testing reviews. Let’s get started!

Load the packages that we need:

import pandas as pd
import numpy as np
import tensorflow as tf
import nltk, re, time
from nltk.corpus import stopwords
from collections import defaultdict
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from collections import namedtuple

Some of these might be unfamiliar to you. I’ll explain most of them when the time comes; for the rest, Google will be able to help you.

The data is formatted as .tsv, so we will need to load the data with a delimiter.

train = pd.read_csv("labeledTrainData.tsv", delimiter="\t")
test = pd.read_csv("testData.tsv", delimiter="\t")

Inspect the data however you like, but to give you a preview, some of the reviews can be quite long:

# Here's the first review as an example
With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.<br /><br />Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.<br /><br />The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.<br /><br />Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.<br /><br />Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl!
Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter.

To improve the performance of your model, you will need to clean the data. Notice the <br /> tags: you do not want text like this during training, because it gives no indication of whether a review is positive or negative. This is an example of noise (useless data), and we only want the signal (useful data).

def clean_text(text, remove_stopwords=True):
    '''Clean the text, with the option to remove stopwords'''
    
    # Convert words to lower case and split them
    text = text.lower().split()

    # Optionally, remove stop words
    if remove_stopwords:
        stops = set(stopwords.words("english"))
        text = [w for w in text if not w in stops]
    
    text = " ".join(text)

    # Clean the text
    text = re.sub(r"<br />", " ", text)
    text = re.sub(r"[^a-z]", " ", text)
    text = re.sub(r"   ", " ", text) # Remove any extra spaces
    text = re.sub(r"  ", " ", text)
    
    # Return the cleaned text as a single string
    return text

Let’s break this down into two sections: stop words, and re.

# stop words
if remove_stopwords:
    stops = set(stopwords.words("english"))
    text = [w for w in text if not w in stops]

Stop words are words that provide little context to a sentence (a, the, just…). We remove them because they provide more noise than signal for this project. Here’s a link to the list of stop words that we are using: https://gist.github.com/sebleier/554280

Side note: for other projects, I have found it useful to make my own list of stop words. Some of the words included in our list are pronouns, which can be helpful for training other models.

# re
text = re.sub(r"<br />", " ", text)
text = re.sub(r"[^a-z]", " ", text)
text = re.sub(r"   ", " ", text) # Remove any extra spaces
text = re.sub(r"  ", " ", text)

“re” is Python’s regular expression module. It provides shortcuts for finding and replacing patterns in our text.

In the second line of the code block above, we are replacing <br /> with a space, thus removing it from our text.

In the third line, we are replacing any character that is not a lowercase letter with a space. [] defines a set of characters, and a ^ at the start of the set negates it, so [^a-z] matches everything except the letters a to z.
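To make these substitutions concrete, here is a small sketch that applies the same patterns to a made-up snippet of review text. The sample string and the strip_markup helper are illustrative assumptions, not part of the project code, and the single \s+ pattern is a variation on the two space-replacing lines above rather than what clean_text uses.

```python
import re

def strip_markup(text):
    """Hypothetical helper: lowercase the text, drop <br /> tags and non-letters."""
    text = text.lower()
    text = re.sub(r"<br />", " ", text)  # replace the HTML line break with a space
    text = re.sub(r"[^a-z]", " ", text)  # replace anything that is not a-z with a space
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace (a variation)
    return text.strip()

# Made-up sample text, loosely modelled on the review above
sample = "Visually impressive!<br /><br />It's 20 minutes long."
cleaned = strip_markup(sample)
print(cleaned)
```

Notice that the digits and the apostrophe are turned into spaces as well, which is why "It's" ends up as two separate tokens.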

After you clean all your training and testing reviews, the next step is to tokenize your words.

# Tokenize the reviews
all_reviews = train_clean + test_clean
tokenizer = Tokenizer()
tokenizer.fit_on_texts(all_reviews)
print("Fitting is complete.")

train_seq = tokenizer.texts_to_sequences(train_clean)
print("train_seq is complete.")

test_seq = tokenizer.texts_to_sequences(test_clean)
print("test_seq is complete")

Tokenizing is when you convert words into numbers. So if our sentence is: [“The”, “cat”, “went”, “to”, “the”, “zoo”, “.”] it would be tokenized to [1, 2, 3, 4, 1, 5, 6]. Notice that the number 1 is used for both instances of the word “the”. This means that every distinct word receives its own unique number, and repeated words share the same number.

There are a few different ways to tokenize text, but I prefer to use Keras’ method.
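As a rough illustration of the idea (not of how Keras implements it), the sketch below gives each distinct word the next free integer in order of first appearance. The build_word_index helper and the toy sentence are assumptions made for this example. As far as I know, Keras’ Tokenizer differs in the details: it ranks words by frequency, and by default lowercases the text and filters out punctuation.

```python
def build_word_index(tokens):
    """Hypothetical helper: give each distinct token the next free integer, starting at 1."""
    index = {}
    for token in tokens:
        if token not in index:
            index[token] = len(index) + 1
    return index

# The example sentence from above, already lowercased
sentence = ["the", "cat", "went", "to", "the", "zoo", "."]
toy_index = build_word_index(sentence)
encoded = [toy_index[token] for token in sentence]
print(encoded)
```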

The vocabulary for this project is not huge: 99,426 words, which we can find by typing:

word_index = tokenizer.word_index
print(len(word_index))

If you want, and it could be worthwhile to try, you could limit the vocabulary to the most common words — say 80,000. Or even better, you could find the number of words that are used at least 5 times, and limit your vocabulary to this number. Your model should benefit from limiting your vocabulary to more common words because it has seen each word in the text multiple times, thereby better understanding what that word means.

If “Goldfinger” appears in your text only once, your model will have no idea that this is one of the best James Bond films, and it won’t be able to adjust the sentiment of the review accordingly. With very common words, such as good or bad, your model will be able to understand that it is processing a positive or negative review, respectively.
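Here is a minimal sketch of how you might count the words that are used at least a given number of times. The frequent_vocabulary helper and the toy reviews are assumptions for illustration; on the real data you would pass in the cleaned reviews and a min_count of 5. The size of the resulting set could then guide the num_words argument of Keras’ Tokenizer.

```python
from collections import Counter

def frequent_vocabulary(texts, min_count=5):
    """Hypothetical helper: return the words that appear at least min_count times."""
    counts = Counter(word for text in texts for word in text.split())
    return {word for word, count in counts.items() if count >= min_count}

# Toy data: "goldfinger" appears only once, so it is dropped
toy_reviews = ["good film good cast", "bad film", "good goldfinger"]
vocab = frequent_vocabulary(toy_reviews, min_count=2)
print(sorted(vocab))
```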

To get back to the code, here’s what the first review looks like after it has been tokenized:

[445, 86, 489, 10939, 8, 61, 583, 2603, 120, 68, 957, 560, 53, 212, 24485, 212, 17247, 219, 193, 97, 20, 695, 2565, 124, 109, 15, 520, 3954, 193, 27, 246, 654, 2352, 1261, 17247, 90, 4782, 90, 712, 3, 305, 86, 16, 358, 1846, 542, 1219, 3592, 10939, 1, 485, 871, 3538, 23, 526, 673, 1414, 19, 63, 5305, 2089, 1118, 185, 413, 1523, 817, 2583, 7, 10939, 477, 86, 665, 85, 272, 114, 578, 10939, 34480, 29662, 148, 2, 10939, 381, 13, 59, 26, 381, 210, 15, 252, 178, 10, 751, 712, 3, 142, 341, 464, 145, 16427, 4121, 1718, 635, 876, 10547, 1018, 12089, 890, 1067, 1652, 416, 10939, 265, 19, 596, 141, 10939, 18336, 2302, 15821, 876, 10547, 1, 34, 38190, 388, 21, 49, 17539, 1414, 434, 9821, 193, 4238, 10939, 1, 120, 669, 520, 96, 7, 10939, 1555, 444, 2271, 138, 2137, 2383, 635, 23, 72, 117, 4750, 5364, 307, 1326, 31136, 19, 635, 556, 888, 665, 697, 6, 452, 195, 547, 138, 689, 3386, 1234, 790, 56, 1239, 268, 2, 21, 7, 10939, 6, 580, 78, 476, 32, 21, 245, 706, 158, 276, 113, 7674, 673, 3526, 10939, 1, 37925, 1690, 2, 159, 413, 1523, 294, 6, 956, 21, 51, 1500, 1226, 2352, 17, 612, 8, 61, 442, 724, 7184, 17, 25, 4, 49, 21, 199, 443, 3912, 3484, 49, 110, 270, 495, 252, 289, 124, 6, 19622, 19910, 363, 1502]

The next step is to make all of the reviews the same length.

max_review_length = 200

train_pad = pad_sequences(train_seq, maxlen = max_review_length)
print("train_pad is complete.")

test_pad = pad_sequences(test_seq, maxlen = max_review_length)
print("test_pad is complete.")

It’s fair to say that using more text is better, so a longer review length should improve the accuracy of your model. I limited mine to 200 to increase the training speed of my model. Remember to check the length of your text before setting a maximum length. I use numpy’s percentile function for this.

np.percentile(lengths.counts, 80)

With a max length of 200, slightly more than 80% of reviews will include all of their text. Reviews with more than 200 words will have the extra words removed. Reviews with fewer than 200 words will have padding tokens added until they reach a length of 200. By default, pad_sequences removes the extra words from, and adds the padding to, the start of a review.
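To show what padding and truncating do to a single review, here is a plain Python sketch that mirrors what I understand to be the default behaviour of pad_sequences (both padding and truncating happen at the start). The pad_or_truncate helper and the toy sequences are assumptions for illustration. The lengths list at the end is the kind of input you could give to np.percentile.

```python
def pad_or_truncate(sequence, max_length, pad_value=0):
    """Hypothetical helper: keep the last max_length tokens, or left-pad with pad_value."""
    if len(sequence) >= max_length:
        return sequence[-max_length:]
    return [pad_value] * (max_length - len(sequence)) + sequence

# Toy token sequences: one shorter and one longer than the maximum length
toy_sequences = [[5, 8, 2], [7, 1, 9, 4, 3, 6]]
padded = [pad_or_truncate(seq, max_length=4) for seq in toy_sequences]
lengths = [len(seq) for seq in toy_sequences]
print(padded)
print(lengths)
```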

It’s possible to be a bit more creative with padding and truncating your text. Rather than having an absolute maximum length, you can set a batch maximum length. I won’t show you how to do that in this article, but I will write about it in the near future. In the meantime, I encourage you to give it a try.

Now we can split the data into a training and a validation set.

x_train, x_valid, y_train, y_valid = train_test_split(train_pad, train.sentiment, test_size = 0.15, random_state = 2)

Normally I would split the data into training, validation, and testing sets, but because we are working with data from a Kaggle competition, we can use its testing data as our test set.

Before we get to building our model, let’s create some functions to make our batches.

def get_batches(x, y, batch_size):
    '''Create the batches for the training and validation data'''
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]

def get_test_batches(x, batch_size):
    '''Create the batches for the testing data'''
    n_batches = len(x)//batch_size
    x = x[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size]

These functions will divide the data into equal groups. The one caveat is that if the final batch is smaller than the batch_size, it won’t be used. Therefore the length of each dataset you are using should be a multiple of your batch size, otherwise you will run into an issue if you try to upload your test predictions to Kaggle, because some reviews will have no prediction.
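The dropped final batch is easy to see on a toy dataset. The sketch below reuses get_batches from above on ten made-up examples with a batch size of 4; the toy_x and toy_y lists are assumptions for illustration.

```python
def get_batches(x, y, batch_size):
    """Yield full batches only; a trailing partial batch is dropped."""
    n_batches = len(x) // batch_size
    x, y = x[:n_batches * batch_size], y[:n_batches * batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii + batch_size], y[ii:ii + batch_size]

# Ten toy examples do not divide evenly into batches of 4
toy_x = list(range(10))
toy_y = [value % 2 for value in toy_x]
batches = list(get_batches(toy_x, toy_y, batch_size=4))
covered = sum(len(batch_x) for batch_x, _ in batches)
print(len(batches), covered)
```

Only 8 of the 10 examples end up in a batch, so the last 2 would never be trained on or predicted.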

On to building our recurrent neural network! Here’s the function and we’ll break it down, step by step.

def build_rnn(n_words, embed_size, batch_size, lstm_size, num_layers, dropout, learning_rate, multiple_fc, fc_units):
    '''Build the Recurrent Neural Network'''

    tf.reset_default_graph()

    # Declare placeholders we'll feed into the graph
    with tf.name_scope('inputs'):
        inputs = tf.placeholder(tf.int32, [None, None], name='inputs')

    with tf.name_scope('labels'):
        labels = tf.placeholder(tf.int32, [None, None], name='labels')

    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

    # Create the embeddings
    with tf.name_scope("embeddings"):
        embedding = tf.Variable(tf.random_uniform((n_words, 
                                    embed_size), -1, 1))
        embed = tf.nn.embedding_lookup(embedding, inputs)

    # Build the RNN layers
    with tf.name_scope("RNN_layers"):
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        drop = tf.contrib.rnn.DropoutWrapper(lstm, 
                                         output_keep_prob=keep_prob)
        cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)
    
    # Set the initial state
    with tf.name_scope("RNN_init_state"):
        initial_state = cell.zero_state(batch_size, tf.float32)

    # Run the data through the RNN layers
    with tf.name_scope("RNN_forward"):
        outputs, final_state = tf.nn.dynamic_rnn(
                                        cell,         
                                        embed,
                                        initial_state=initial_state)    
    
    # Create the fully connected layers
    with tf.name_scope("fully_connected"):
        
        # Initialize the weights and biases
        weights = tf.truncated_normal_initializer(stddev=0.1)
        biases = tf.zeros_initializer()
        
        dense = tf.contrib.layers.fully_connected(outputs[:, -1],
                    num_outputs = fc_units,
                    activation_fn = tf.sigmoid,
                    weights_initializer = weights,
                    biases_initializer = biases)
        
        dense = tf.contrib.layers.dropout(dense, keep_prob)
        
        # Depending on the iteration, use a second fully connected layer
        if multiple_fc == True:
            dense = tf.contrib.layers.fully_connected(dense,
                        num_outputs = fc_units,
                        activation_fn = tf.sigmoid,
                        weights_initializer = weights,
                        biases_initializer = biases)
            
            dense = tf.contrib.layers.dropout(dense, keep_prob)
    
    # Make the predictions
    with tf.name_scope('predictions'):
        predictions = tf.contrib.layers.fully_connected(dense, 
                          num_outputs = 1, 
                          activation_fn=tf.sigmoid,
                          weights_initializer = weights,
                          biases_initializer = biases)
        
        tf.summary.histogram('predictions', predictions)
    
    # Calculate the cost
    with tf.name_scope('cost'):
        cost = tf.losses.mean_squared_error(labels, predictions)
        tf.summary.scalar('cost', cost)
    
    # Train the model
    with tf.name_scope('train'):    
        optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

    # Determine the accuracy
    with tf.name_scope("accuracy"):
        correct_pred = tf.equal(tf.cast(tf.round(predictions), 
                                        tf.int32), 
                                        labels)
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
        tf.summary.scalar('accuracy', accuracy)
    
    # Merge all of the summaries
    merged = tf.summary.merge_all()    

    # Export the nodes 
    export_nodes = ['inputs', 'labels', 'keep_prob','initial_state',        
                    'final_state','accuracy', 'predictions', 'cost', 
                    'optimizer', 'merged']
    Graph = namedtuple('Graph', export_nodes)
    local_dict = locals()
    graph = Graph(*[local_dict[each] for each in export_nodes])
    
    return graph

If you have never used TensorBoard before, I recommend that you watch Siraj Raval’s video: https://www.youtube.com/watch?v=fBVEXKp4DIc. It will help to further explain what is happening in this function. He has many other amazing videos that I encourage you to watch!

tf.reset_default_graph()

Reset your graph before you start to train your model. This ensures that it is ‘clean’ and ready to be trained anew.

# Declare placeholders we'll feed into the graph
    with tf.name_scope('inputs'):
        inputs = tf.placeholder(tf.int32, [None, None],  
                                    name='inputs')

    with tf.name_scope('labels'):
        labels = tf.placeholder(tf.int32, [None, None], 
                                    name='labels')

    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

These are the placeholders for our data. tf.name_scope() is used to label a specific part of your graph when you visualize it on TensorBoard.

# Create the embeddings
    with tf.name_scope("embeddings"):
        embedding = tf.Variable(tf.random_uniform((n_words, 
                                  embed_size), -1, 1))
        embed = tf.nn.embedding_lookup(embedding, inputs)

By creating our embeddings, we are converting our words to vectors that have embed_size number of dimensions. Here is a slightly longer, and more detailed answer: https://www.quora.com/What-does-the-word-embedding-mean-in-the-context-of-Machine-Learning
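Conceptually, an embedding lookup just swaps each word id for its row in a table of vectors. The plain Python sketch below shows this with toy sizes; the embedding_lookup helper, the sizes, and the review are assumptions for illustration, and in the real model the table is a trainable TensorFlow variable rather than a list.

```python
import random

random.seed(0)
toy_n_words, toy_embed_size = 6, 3  # toy sizes, far smaller than in the article

# One row of toy_embed_size random values between -1 and 1 per word id
embedding_table = [[random.uniform(-1, 1) for _ in range(toy_embed_size)]
                   for _ in range(toy_n_words)]

def embedding_lookup(table, token_ids):
    """Hypothetical helper: replace every token id with its row of the table."""
    return [table[token_id] for token_id in token_ids]

# A toy tokenized review; word 1 appears twice and gets the same vector both times
toy_review = [1, 2, 1, 5]
embedded = embedding_lookup(embedding_table, toy_review)
print(len(embedded), len(embedded[0]))
```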

Although I used a random_uniform distribution, there are other ways to create these embeddings. Using a truncated normal distribution with a small standard deviation can work very well. This would be written as:

embedding = tf.Variable(tf.truncated_normal((n_words, embed_size), stddev=0.1))

Feel free to give it a try and see what happens!

# Build the RNN layers
with tf.name_scope("RNN_layers"):
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, 
                                         output_keep_prob=keep_prob)
    cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)

This is the heart of our recurrent neural network. As you will see from the hyperparameters, we will be using a two layer network with 50% dropout.

# Set the initial state
with tf.name_scope("RNN_init_state"):
    initial_state = cell.zero_state(batch_size, tf.float32)

This creates the initial state of our LSTM cells, filled with zeros, for a batch of batch_size reviews. The RNN builds on this state as it reads each review.

# Run the data through the RNN layers
with tf.name_scope("RNN_forward"):
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed,
                                        initial_state=initial_state)

This is the forward pass of our model. As I mentioned earlier, we could use a different maximum review length for each batch. This is made possible by the use of tf.nn.dynamic_rnn. More info about that can be found on its doc page: https://www.tensorflow.org/versions/master/api_docs/python/tf/nn/dynamic_rnn

# Create the fully connected layers
with tf.name_scope("fully_connected"):
        
    # Initialize the weights and biases
    weights = tf.truncated_normal_initializer(stddev=0.1)
    biases = tf.zeros_initializer()
        
    dense = tf.contrib.layers.fully_connected(outputs[:, -1],
                num_outputs = fc_units,
                activation_fn = tf.sigmoid,
                weights_initializer = weights,
                biases_initializer = biases)
        
    dense = tf.contrib.layers.dropout(dense, keep_prob)
        
    # Depending on the iteration, use a second fully connected layer
    if multiple_fc == True:
        dense = tf.contrib.layers.fully_connected(dense,
                    num_outputs = fc_units,
                    activation_fn = tf.sigmoid,
                    weights_initializer = weights,
                    biases_initializer = biases)
            
        dense = tf.contrib.layers.dropout(dense, keep_prob)

This is where we add our first, and possibly second, fully connected layers. Their weights are initialized from a truncated normal distribution with a small standard deviation, the method that I mentioned earlier for the embeddings, and their biases are initialized to zero.

You might have seen that multiple_fc was one of our parameters for this function. This allows us to experiment with the architecture of our model. Using this method, you can test many different things about your model, such as how to initialize the weights and biases, whether to use LSTMs or GRUs, etc.

# Make the predictions
with tf.name_scope('predictions'):
    predictions = tf.contrib.layers.fully_connected(dense, 
                      num_outputs = 1, 
                      activation_fn = tf.sigmoid,
                      weights_initializer = weights,
                      biases_initializer = biases)
        
    tf.summary.histogram('predictions', predictions)

We only have 1 output because we are predicting the sentiment on a scale of 0 to 1. Sigmoid is our activation function because it maps the output of our final fully connected layer to this range.

tf.summary.histogram() records our predictions, which we can view as a histogram on TensorBoard. This allows us to see how the distribution of our predictions changes as we train the model, and how our validation predictions compare to our training predictions.

# Calculate the cost
with tf.name_scope('cost'):
    cost = tf.losses.mean_squared_error(labels, predictions)
    tf.summary.scalar('cost', cost)

This computes the cost during training. We are using tf.summary.scalar() because the cost is a scalar: a single number per batch, rather than a distribution of values like our predictions.
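For reference, mean squared error is simply the average of the squared differences between the labels and the predictions. The sketch below computes it by hand for one toy batch; the helper function and the numbers are made up for illustration.

```python
def mean_squared_error(labels, predictions):
    """Average of the squared differences between labels and predictions."""
    squared = [(label - pred) ** 2 for label, pred in zip(labels, predictions)]
    return sum(squared) / len(squared)

# Toy batch: four labels and the model's (made-up) sigmoid outputs
toy_labels = [1, 0, 1, 0]
toy_predictions = [0.9, 0.2, 0.4, 0.1]
toy_cost = mean_squared_error(toy_labels, toy_predictions)
print(round(toy_cost, 4))
```

The third prediction (0.4 for a positive review) contributes most of the cost, which is the behaviour we want: confident mistakes are penalized more than near misses.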

# Train the model
with tf.name_scope('train'):    
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

Adam is a common optimizer to use, and it trains the model rather efficiently. There are others that you can use, such as stochastic gradient descent or Adagrad, but I’ll leave it up to you to read more about these.

# Determine the accuracy
with tf.name_scope("accuracy"):
    correct_pred = tf.equal(tf.cast(tf.round(predictions), 
                                        tf.int32), 
                                        labels)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    tf.summary.scalar('accuracy', accuracy)

Our predictions are values between 0 and 1. This is why we need to round them before we can see if they are equal to our labels. tf.reduce_mean() then averages these matches, which gives us the fraction of correct predictions in the batch. Note that accuracy is only reported; the optimizer minimizes the cost.
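The same calculation can be sketched in plain Python. The batch_accuracy helper and the toy values are assumptions for illustration; the 0.5 threshold stands in for the rounding step.

```python
def batch_accuracy(labels, predictions):
    """Fraction of predictions that match the labels after thresholding at 0.5."""
    # A prediction of exactly 0.5 counts as 0 in this sketch
    rounded = [1 if pred > 0.5 else 0 for pred in predictions]
    correct = [int(r == label) for r, label in zip(rounded, labels)]
    return sum(correct) / len(correct)

# Toy batch: the third review is positive but predicted as negative
toy_labels = [1, 0, 1, 0]
toy_predictions = [0.9, 0.2, 0.4, 0.1]
toy_accuracy = batch_accuracy(toy_labels, toy_predictions)
print(toy_accuracy)
```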

# Merge all of the summaries
merged = tf.summary.merge_all()

We merge all of the summaries to simplify the process of saving this data. This should be clearer when we look at the train function.

# Export the nodes 
export_nodes = ['inputs', 'labels', 'keep_prob','initial_state',        
                'final_state','accuracy', 'predictions', 'cost', 
                'optimizer', 'merged']
Graph = namedtuple('Graph', export_nodes)
local_dict = locals()
graph = Graph(*[local_dict[each] for each in export_nodes])

Export all of our nodes, so that we can use them in the train function.
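If the namedtuple and locals() pattern is new to you, this stripped-down sketch shows the same trick outside of TensorFlow. The build_toy_graph function and its two values are hypothetical stand-ins for build_rnn and its nodes.

```python
from collections import namedtuple

def build_toy_graph():
    """Hypothetical stand-in for build_rnn that exports two local variables."""
    cost = 0.25
    accuracy = 0.8
    export_nodes = ['cost', 'accuracy']
    Graph = namedtuple('Graph', export_nodes)
    # locals() maps each local variable name to its value
    local_dict = locals()
    return Graph(*[local_dict[each] for each in export_nodes])

toy_graph = build_toy_graph()
print(toy_graph.cost, toy_graph.accuracy)
```

Each exported variable becomes an attribute, which is why the train function can refer to model.cost, model.accuracy, and so on.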

Done! We have built our recurrent neural network! Now it’s time to train!

def train(model, epochs, log_string):
    '''Train the RNN'''

    saver = tf.train.Saver()
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Used to determine when to stop the training early
        valid_loss_summary = []
        
        # Keep track of which batch iteration is being trained
        iteration = 0

        print()
        print("Training Model: {}".format(log_string))

        train_writer = tf.summary.FileWriter('./logs/3/train/{}'.format(log_string), sess.graph)
        valid_writer = tf.summary.FileWriter('./logs/3/valid/{}'.format(log_string))

        for e in range(epochs):
            state = sess.run(model.initial_state)
            
            # Record progress with each epoch
            train_loss = []
            train_acc = []
            val_acc = []
            val_loss = []

            with tqdm(total=len(x_train)) as pbar:
                for _, (x, y) in enumerate(get_batches(x_train,       
                                               y_train, 
                                               batch_size), 1):
                    feed = {model.inputs: x,
                            model.labels: y[:, None],
                            model.keep_prob: dropout,
                            model.initial_state: state}
                    summary, loss, acc, state, _ = sess.run(
                        [model.merged,
                         model.cost,
                         model.accuracy,
                         model.final_state,
                         model.optimizer],
                        feed_dict=feed)
                    
                    # Record the loss and accuracy of each training batch
                    
                    train_loss.append(loss)
                    train_acc.append(acc)
                    
                    # Record the progress of training
                    train_writer.add_summary(summary, iteration)
                    
                    iteration += 1
                    pbar.update(batch_size)
            
            # Average the training loss and accuracy of each epoch
            avg_train_loss = np.mean(train_loss)
            avg_train_acc = np.mean(train_acc) 

            val_state = sess.run(model.initial_state)
            with tqdm(total=len(x_valid)) as pbar:
                for x, y in get_batches(x_valid,y_valid,batch_size):
                    feed = {model.inputs: x,
                            model.labels: y[:, None],
                            model.keep_prob: 1,
                            model.initial_state: val_state}
                    summary, batch_loss, batch_acc, val_state = sess.run(
                        [model.merged,
                         model.cost,
                         model.accuracy,
                         model.final_state],
                        feed_dict=feed)
                    
                    # Record the validation loss and accuracy of each epoch
                    
                    val_loss.append(batch_loss)
                    val_acc.append(batch_acc)
                    pbar.update(batch_size)
            
            # Average the validation loss and accuracy of each epoch
            avg_valid_loss = np.mean(val_loss)    
            avg_valid_acc = np.mean(val_acc)
            valid_loss_summary.append(avg_valid_loss)
            
            # Record the validation data's progress
            valid_writer.add_summary(summary, iteration)

            # Print the progress of each epoch
            print("Epoch: {}/{}".format(e, epochs),
                  "Train Loss: {:.3f}".format(avg_train_loss),
                  "Train Acc: {:.3f}".format(avg_train_acc),
                  "Valid Loss: {:.3f}".format(avg_valid_loss),
                  "Valid Acc: {:.3f}".format(avg_valid_acc))

            # Stop training if the validation loss does not decrease
            # after 3 epochs
            
            if avg_valid_loss > min(valid_loss_summary):
                print("No Improvement.")
                stop_early += 1
                if stop_early == 3:
                    break   
            
            # Reset stop_early if the validation loss finds a new low
            # Save a checkpoint of the model
            else:
                print("New Record!")
                stop_early = 0
                checkpoint ="./sentiment_{}.ckpt".format(log_string)
                saver.save(sess, checkpoint)

Just like before, let’s break this down step by step:

saver = tf.train.Saver()

You’ll need to call tf.train.Saver() if you want to save the checkpoints of your model.

train_writer = tf.summary.FileWriter('./logs/3/train/{}'.format(log_string), sess.graph)
valid_writer = tf.summary.FileWriter('./logs/3/valid/{}'.format(log_string))

These will record the summaries as you train your model. I suggest that you keep these summaries in a folder, such as “logs”, and separate your training and validation summaries to make comparisons easier on TensorBoard. Using a unique and descriptive name for each summary will make it much easier to know which iteration you are looking at on TensorBoard.

tqdm() (https://pypi.python.org/pypi/tqdm) will allow you to track the progress of your model within each epoch. I like knowing how much time is left in each epoch, so tqdm is great for that.

# Record the validation data's progress
valid_writer.add_summary(summary, iteration)

This will record your summaries. You do not have to do it after each batch, but this will provide you with the most detail.

# Reset stop_early if the validation loss finds a new low
# Save a checkpoint of the model
else:
    print("New Record!")
    stop_early = 0
    checkpoint = "./sentiment_{}.ckpt".format(log_string)
    saver.save(sess, checkpoint)

I really recommend that you use early stopping. This can save you a lot of time when training! Depending on your patience, and the model that you are building, you may want to wait for more than 3 epochs without improvement before you stop training your model.

I chose to only save (checkpoint) the best iteration of a model. This saves space on my laptop, but for other projects it can be worthwhile to save multiple iterations. If you are doing a text generation project, for example, it can be interesting to see the improvements of your model with more training.
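Stripped of the TensorFlow details, the early stopping logic looks roughly like the sketch below. The epochs_until_stop helper and the list of validation losses are made up for illustration; patience plays the role of the hard-coded 3 in the train function.

```python
def epochs_until_stop(valid_losses, patience=3):
    """Hypothetical helper: return the epoch at which early stopping would trigger."""
    best_loss = float('inf')
    stop_early = 0
    for epoch, loss in enumerate(valid_losses, 1):
        if loss > best_loss:
            stop_early += 1   # no improvement this epoch
            if stop_early == patience:
                return epoch
        else:
            best_loss = loss  # new record: this is where a checkpoint would be saved
            stop_early = 0
    return len(valid_losses)

# Made-up validation losses: the best value (0.14) is reached at epoch 4
toy_losses = [0.20, 0.15, 0.16, 0.14, 0.15, 0.15, 0.16, 0.13]
print(epochs_until_stop(toy_losses))
```

With a patience of 3, training stops at epoch 7 and never reaches the better loss at epoch 8, which is the trade-off to keep in mind when choosing how long to wait.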

Here are the default hyperparameters that I used for my model.

n_words = len(word_index) + 1  # +1 because Keras word indices start at 1; 0 is the padding value
embed_size = 300
batch_size = 250
lstm_size = 128
num_layers = 2
dropout = 0.5
learning_rate = 0.001
epochs = 100
multiple_fc = False
fc_units = 256

Hyperparameters are a great thing to tune to improve the performance of your model. Below you will see how I tuned mine.

Some specifics to note for this project:

batch_size: 250 divides the data into equal batches. If I used 256, I wouldn’t have been able to upload my predictions to Kaggle because the final batch would have been smaller than the batch size, and would never have been predicted.
epochs: The most epochs that I used for any iteration of my model was 13. I wanted my model to be stopped by early stopping, rather than the number of epochs. Using a large number, like 100, ensured that my model would be fully trained before it stopped training.
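A quick way to check a batch size is to compute how many examples would be left over in each dataset. The sizes below are assumptions derived from the numbers above: 25,000 labeled reviews split 85/15 into training and validation sets, plus 25,000 testing reviews. The dropped_examples helper is hypothetical.

```python
# Assumed dataset sizes: 25,000 labeled reviews split 85/15, plus 25,000 test reviews
dataset_sizes = {'train': 21250, 'valid': 3750, 'test': 25000}

def dropped_examples(n_examples, batch_size):
    """Hypothetical helper: number of examples left over after forming full batches."""
    return n_examples % batch_size

for candidate in (250, 256):
    leftovers = {name: dropped_examples(size, candidate)
                 for name, size in dataset_sizes.items()}
    print(candidate, leftovers)
```

A batch size of 250 leaves nothing over in any of the three sets, while 256 would leave 168 testing reviews without a prediction.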

# Train the model with the desired tuning parameters
for lstm_size in [64,128]:
    for multiple_fc in [True, False]:
        for fc_units in [128, 256]:
            log_string = 'ru={},fcl={},fcu={}'.format(lstm_size,
                                                      multiple_fc,
                                                      fc_units)
            model = build_rnn(n_words = n_words, 
                              embed_size = embed_size,
                              batch_size = batch_size,
                              lstm_size = lstm_size,
                              num_layers = num_layers,
                              dropout = dropout,
                              learning_rate = learning_rate,
                              multiple_fc = multiple_fc,
                              fc_units = fc_units)            
            train(model, epochs, log_string)

You can see that I am experimenting with lstm_size, multiple_fc, and fc_units. Using this structure you can tune whatever you want. Just remember to write your log string to reflect what you are tuning.
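If the nested loops get deep, itertools.product can generate the same combinations in one loop. The sketch below only builds the log strings, so you can check that each run gets a unique name before you start training; the log_strings helper is a hypothetical addition, not part of the project code.

```python
from itertools import product

def log_strings(lstm_sizes, fc_options, fc_unit_options):
    """Hypothetical helper: build one descriptive log string per combination."""
    return ['ru={},fcl={},fcu={}'.format(lstm_size, multiple_fc, fc_units)
            for lstm_size, multiple_fc, fc_units
            in product(lstm_sizes, fc_options, fc_unit_options)]

# Same values as the nested loops above, in the same order
names = log_strings([64, 128], [True, False], [128, 256])
print(len(names), names[0])
```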

def make_predictions(lstm_size, multiple_fc, fc_units, checkpoint):
    '''Predict the sentiment of the testing data'''
    
    # Record all of the predictions
    all_preds = []

    model = build_rnn(n_words = n_words, 
                      embed_size = embed_size,
                      batch_size = batch_size,
                      lstm_size = lstm_size,
                      num_layers = num_layers,
                      dropout = dropout,
                      learning_rate = learning_rate,
                      multiple_fc = multiple_fc,
                      fc_units = fc_units) 
    
    with tf.Session() as sess:
        saver = tf.train.Saver()
        # Load the model
        saver.restore(sess, checkpoint)
        test_state = sess.run(model.initial_state)
        for _, x in enumerate(get_test_batches(x_test, 
                                               batch_size), 1):
            feed = {model.inputs: x,
                    model.keep_prob: 1,
                    model.initial_state: test_state}
            predictions = sess.run(model.predictions,feed_dict=feed)
            for pred in predictions:
                all_preds.append(float(pred))
                
    return all_preds

This function will create your predictions for the testing data. The main thing to note is that you should change the parameters to match the best values that you have just tuned. Otherwise, your predictions will be made with the default parameters, which are likely to be less optimal than some of the values that you tuned.

I think that should do for this article. I hope that you learned a thing or two, and congrats for sticking to the end! If you want to see my full code, check it out on my GitHub: https://github.com/Currie32/Movie-Reviews-Sentiment

If you have any questions, ideas to improve this model, or interesting links, please post them in the comment section below! Thanks for reading!

Written by Dave Currie