Sentiment Analysis in Action: A Case Study with Movie Reviews using NLP Techniques

Qasim Al-Ma'arif
7 min read · Jan 21, 2024


In the ever-evolving landscape of Natural Language Processing (NLP), sentiment analysis stands as a pivotal application, offering insights into the emotional tone of textual data. This case study embarks on a journey through the creation of a sentiment analysis model using a neural network. Leveraging the popular IMDB movie reviews dataset, we explore the intricacies of data preprocessing, model construction, training, evaluation, and visualization.

Sentiment analysis, also known as opinion mining, involves determining the sentiment expressed in a piece of text — whether it’s positive, negative, or neutral. With the exponential growth of user-generated content on the internet, sentiment analysis has become a valuable tool for understanding public opinion, customer feedback, and social media dynamics.

In this case study, we focus on sentiment analysis of movie reviews, a classic application where understanding audience sentiment is crucial for filmmakers, critics, and moviegoers alike. Our goal is to develop a neural network model capable of discerning the sentiment conveyed in reviews — whether a movie review exudes positivity or negativity.

The IMDB Movie Reviews Dataset

The chosen battleground for our sentiment analysis adventure is the IMDB dataset, a curated collection of movie reviews labeled with sentiment scores. Each review is associated with a sentiment label: 1 for positive and 0 for negative. The dataset’s richness lies not only in its sizable corpus but also in the diversity of language, expressions, and sentiments it encapsulates.

We import the essential libraries: TensorFlow for building and training the neural network, Matplotlib for plotting, and scikit-learn for metrics such as the ROC curve. imdb provides the movie reviews labeled with sentiment, and pad_sequences is used for text preprocessing.

# Import necessary libraries
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

We load the IMDB dataset, which consists of movie reviews labeled as positive (1) or negative (0). num_words=10000 limits the vocabulary to the top 10,000 most frequent words, keeping the dataset manageable.

# Load the IMDB dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

We check the shape of the training and testing data.

# Check the shape of the training and testing sets
print("Training data shape:", x_train.shape)
print("Testing data shape:", x_test.shape)
Training data shape: (25000,)
Testing data shape: (25000,)

We print the first review in the training set and its corresponding sentiment label. The review is represented as a sequence of word indices, and the sentiment label is either 0 or 1.

# Explore a sample review and its corresponding sentiment label
print("Sample Review:")
print(x_train[0])
print("Sentiment Label:")
print(y_train[0])
Sample Review:
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
Sentiment Label:
1

We decode the sample review back to words using the word index provided by the IMDB dataset. The reverse_word_index maps integer indices back to words; indices in the data are offset by 3 because 0, 1, and 2 are reserved for padding, start-of-sequence, and unknown tokens, which is why we look up i - 3.

# Decode the sample review back to words
word_index = imdb.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
decoded_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in x_train[0]])

print("Decoded Review:")
print(decoded_review)
Decoded Review:
? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all
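Going the other way, from raw text to the integer encoding the model expects, uses the same offset. The helper below is a minimal sketch (encode_review is our own illustrative function, not part of Keras): it lowercases and splits the text, applies the +3 shift, and maps out-of-vocabulary words, or words beyond the top-10,000 cutoff, to the unknown token.

# Minimal illustrative helper (not part of the Keras API) to encode raw
# text with the IMDB word index. Indices 0, 1, and 2 are reserved for
# padding, start-of-sequence, and unknown tokens, hence the +3 shift.
def encode_review(text, word_index, num_words=10000):
    encoded = [1]  # start-of-sequence marker
    for word in text.lower().split():
        index = word_index.get(word, -1) + 3  # missing words become 2 (unknown)
        encoded.append(index if index < num_words else 2)
    return encoded

print(encode_review("this film was just brilliant", word_index))
# [1, 14, 22, 16, 43, 530]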

We pad sequences so that all reviews have the same length (max_length). Padding is essential for feeding data into a neural network, which requires inputs of consistent dimensions. By default, Keras pads shorter reviews with leading zeros and truncates longer ones from the beginning.

# Perform data preprocessing
max_length = 100
x_train = pad_sequences(x_train, maxlen=max_length)
x_test = pad_sequences(x_test, maxlen=max_length)
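As a quick illustration (not part of the pipeline itself), here is how pad_sequences handles a short and a long toy sequence with maxlen=5:

# Toy example of the default pad_sequences behavior:
# pre-padding with zeros and pre-truncation
print(pad_sequences([[7, 8], [1, 2, 3, 4, 5, 6, 7]], maxlen=5))
# [[0 0 0 7 8]
#  [3 4 5 6 7]]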

We define a simple neural network for sentiment analysis. An embedding layer maps each of the 100 word indices to an 8-dimensional vector, a flattening layer unrolls the resulting 100 × 8 matrix into a single 800-dimensional vector, and a dense output layer with a sigmoid activation produces a probability for binary classification.

# Define the neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=8, input_length=max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

We compile the model, specifying the optimizer, loss function, and metrics. adam is a popular optimization algorithm, and binary_crossentropy is appropriate for binary classification.

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
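As a quick sanity check (an illustrative addition, not part of the original pipeline), binary cross-entropy for a single prediction is simply the negative log of the probability the model assigns to the true class:

# Binary cross-entropy for one prediction, computed with Keras and by hand.
# For y_true = 1 and y_pred = 0.8: -(1*log(0.8) + 0*log(0.2)) = -log(0.8)
import numpy as np

bce = tf.keras.losses.BinaryCrossentropy()
print(float(bce([[1.0]], [[0.8]])))  # ~0.2231
print(-np.log(0.8))                  # ~0.2231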

We print a summary of the model architecture, showing the layers and parameter counts. The embedding layer contributes 10,000 × 8 = 80,000 parameters, and the dense layer adds 800 weights plus one bias, for a total of 80,801.

# Display the model summary
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, 100, 8)            80000

 flatten (Flatten)           (None, 800)               0

 dense (Dense)               (None, 1)                 801

=================================================================
Total params: 80801 (315.63 KB)
Trainable params: 80801 (315.63 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

We train the model on the training data for 5 epochs with a batch size of 64. The validation_split=0.2 means 20% of the training data is used for validation during training.

# Train the model
history = model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2, verbose=1)
Epoch 1/5
313/313 [==============================] - 2s 4ms/step - loss: 0.6687 - accuracy: 0.6106 - val_loss: 0.5726 - val_accuracy: 0.7702
Epoch 2/5
313/313 [==============================] - 1s 3ms/step - loss: 0.4218 - accuracy: 0.8404 - val_loss: 0.3727 - val_accuracy: 0.8412
Epoch 3/5
313/313 [==============================] - 1s 3ms/step - loss: 0.2843 - accuracy: 0.8940 - val_loss: 0.3369 - val_accuracy: 0.8538
Epoch 4/5
313/313 [==============================] - 2s 5ms/step - loss: 0.2205 - accuracy: 0.9244 - val_loss: 0.3287 - val_accuracy: 0.8562
Epoch 5/5
313/313 [==============================] - 2s 5ms/step - loss: 0.1737 - accuracy: 0.9459 - val_loss: 0.3298 - val_accuracy: 0.8548

We define a function to plot the training and validation accuracy as well as loss over epochs. This helps visualize how well the model is learning from the training data.

# Visualize training history
def plot_history(history):
    plt.figure(figsize=(10, 4))

    # Plot training & validation accuracy values
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('Model accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend(['Train', 'Validation'], loc='upper left')

    # Plot training & validation loss values
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend(['Train', 'Validation'], loc='upper left')

    plt.tight_layout()
    plt.show()

# Visualize training history
plot_history(history)

We evaluate the trained model on the test set and print the test accuracy.

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print("\nTest Accuracy:", test_accuracy)
782/782 [==============================] - 1s 1ms/step - loss: 0.3325 - accuracy: 0.8548

Test Accuracy: 0.8547599911689758
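With a trained model in hand, we can also score a brand-new review. The snippet below is an illustrative sketch that reuses the encode_review helper defined earlier to turn raw text into padded indices before calling model.predict.

# Illustrative sketch: predict the sentiment of a hand-written review
# using the encode_review helper defined earlier
new_review = "the movie was wonderful and the acting was brilliant"
encoded = pad_sequences([encode_review(new_review, word_index)], maxlen=max_length)
probability = model.predict(encoded)[0][0]
print(f"Positive sentiment probability: {probability:.3f}")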

We define a function to visualize actual vs. predicted sentiment labels and to plot the ROC curve. The ROC curve shows the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity) as the classification threshold varies.

# Visualize model predictions
def visualize_predictions(model, x_test, y_test):
    predictions = model.predict(x_test).ravel()
    # Threshold the predicted probabilities at 0.5 to obtain class labels
    predicted_labels = (predictions > 0.5).astype(int)

    plt.figure(figsize=(10, 4))

    # Plot actual vs predicted sentiment labels
    plt.subplot(1, 2, 1)
    plt.hist(y_test, bins=[-0.5, 0.5, 1.5], alpha=0.7, rwidth=0.8, color='blue', label='Actual')
    plt.hist(predicted_labels, bins=[-0.5, 0.5, 1.5], alpha=0.7, rwidth=0.8, color='orange', label='Predicted')
    plt.title('Actual vs Predicted Sentiment Labels')
    plt.xlabel('Sentiment Label')
    plt.ylabel('Count')
    plt.xticks([0, 1], ['Negative', 'Positive'])
    plt.legend()

    # Plot ROC curve using the raw predicted probabilities
    plt.subplot(1, 2, 2)
    fpr, tpr, _ = roc_curve(y_test, predictions)
    roc_auc = auc(fpr, tpr)  # Calculate AUC
    plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'AUC = {roc_auc:.2f}')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.legend()

    plt.tight_layout()
    plt.show()

# Visualize model predictions
visualize_predictions(model, x_test, y_test)

As our journey through sentiment analysis with neural networks draws to a close, we reflect on the insights gained from the IMDB movie reviews dataset. Data exploration, model construction, training, and evaluation together yield a nuanced understanding of sentiment classification in the realm of movie reviews, and a testament to the synergy between language, emotion, and machine learning. As the landscape of NLP keeps expanding, the lessons from this case study give enthusiasts and practitioners a foundation for their own ventures in decoding the sentiments within textual data.



Qasim Al-Ma'arif

Mechanical engineer turned data scientist with a passion for revolutionizing the construction industry through advanced technologies.