Demystifying Neural Networks: Sentiment Analysis

Understanding the Emotions

Dagang Wei
Feb 8, 2024

This article is part of the series Demystifying Neural Networks.

Introduction

Understanding the opinions, emotions, and sentiments expressed in text has become crucial for businesses, policymakers, and researchers alike. Sentiment analysis, a subfield of natural language processing (NLP), offers a way to gauge public sentiment, analyze customer reviews, and monitor brand health in real time. This blog post explains how sentiment analysis works and walks through a hands-on example using the IMDB movie reviews dataset with Keras / TensorFlow.

What is Sentiment Analysis?

Sentiment analysis, sometimes referred to as opinion mining, is the computational process of identifying and categorizing opinions expressed in a piece of text, especially to determine whether the writer’s attitude towards a particular topic, product, or service is positive, negative, or neutral. At its core, sentiment analysis involves the application of machine learning techniques to text data to understand the underlying emotions.

How Does Sentiment Analysis Work?

In this article, we’ll build a deep learning model for sentiment analysis. Here’s a closer look at this model:

  • Embedding Layer: This initial layer maps each word index to a dense vector of fixed size. Unlike one-hot encoding, embeddings are compact and are learned during training, so words that play similar roles in reviews tend to end up with similar vectors.
  • GlobalAveragePooling1D Layer: Following the embedding layer, this layer averages the word vectors over the sequence dimension. This condenses each review into a single fixed-size vector, regardless of how many words the review contains.
  • Dense Layers: The model ends with two dense (fully connected) layers. The first uses ReLU (Rectified Linear Unit) activation to introduce non-linearity, allowing the model to learn complex relationships in the data. The final layer uses a sigmoid activation function, outputting a score between 0 and 1 that is read as the probability that the review is positive.

This structure works well as a baseline for sentiment analysis: the Embedding layer learns word representations suited to the task, and the Dense layers classify the averaged representation of each review.
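
To make the data flow through these layers concrete, here is a minimal NumPy sketch of the forward pass. It is an illustration only, not the Keras implementation: the sizes are toy values, the weights are random stand-ins for trained parameters, and the names (embedding_matrix, w1, forward, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen for readability; they are not the values used in the article.
vocab_size, embedding_dim, hidden_units = 10, 4, 3

# Random parameters stand in for weights that training would normally learn.
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))
w1 = rng.normal(size=(embedding_dim, hidden_units))
b1 = np.zeros(hidden_units)
w2 = rng.normal(size=(hidden_units, 1))
b2 = np.zeros(1)

def forward(token_ids):
    """Map a batch of integer sequences to scores in (0, 1)."""
    embedded = embedding_matrix[token_ids]      # (batch, seq_len, embedding_dim)
    pooled = embedded.mean(axis=1)              # average over the sequence axis
    hidden = np.maximum(0.0, pooled @ w1 + b1)  # dense layer with ReLU
    logits = hidden @ w2 + b2                   # dense layer with one output unit
    return 1.0 / (1.0 + np.exp(-logits))        # sigmoid

# Two padded sequences containing the same words in a different order.
batch = np.array([[1, 5, 7, 0, 0],
                  [7, 5, 1, 0, 0]])
probs = forward(batch)
print(probs.shape)  # (2, 1)
print(np.allclose(probs[0], probs[1]))  # True
```

Both rows receive the same score because averaging ignores word order, so a model of this shape behaves much like a bag-of-words classifier. Note also that the padding positions (index 0) are included in the average in this sketch.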

Example

To put theory into practice, let’s analyze sentiments of movie reviews using the IMDB dataset with TensorFlow, a popular open-source library for machine learning and deep learning applications.

The IMDB Movie Reviews Dataset

The IMDB movie reviews dataset is a labeled dataset of 50,000 movie reviews from the Internet Movie Database (IMDB), split into two sets: 25,000 reviews for training and 25,000 for testing. Each set contains an equal number of positive and negative reviews, making it a balanced dataset. Positive reviews are those with a rating of 7 or higher (out of 10), negative reviews have a rating of 4 or lower, and neutral reviews are excluded. This labeling makes the dataset well suited to binary sentiment classification, where the goal is to train a model to predict whether a given review expresses a positive or negative sentiment.
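
The labeling rule described above can be written down in a few lines. This is only a sketch of the rule as stated here: label_from_rating is a hypothetical helper, not part of Keras or of the dataset tooling, and the dataset loaded later already comes with labels attached.

```python
from typing import Optional

def label_from_rating(rating: int) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None for an excluded neutral rating."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    if rating >= 7:
        return 1
    if rating <= 4:
        return 0
    return None  # ratings of 5 and 6 are treated as neutral and left out

labels = {rating: label_from_rating(rating) for rating in range(1, 11)}
print(labels)
```

Leaving the middle ratings out keeps the two classes well separated, which is part of what makes the task a clean binary classification problem.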

To give you a taste of what the IMDB dataset looks like, here are a few anonymized examples of movie reviews:

  1. Positive Review: “An absolute masterpiece! The performances were top-notch, and the storyline was both engaging and thought-provoking. Definitely a must-watch.”
  2. Negative Review: “Unfortunately, this movie failed to deliver on its promising premise. The plot was predictable, and the acting was lackluster. A disappointing experience.”

These examples illustrate the variance in sentiment and the subjective nature of movie reviews, highlighting the challenges and opportunities in analyzing such data.

Building the Sentiment Analysis Model with Keras

We use Keras to preprocess the text data, build a neural network model, and train it for sentiment classification. The code, also available in a Colab notebook, is as follows:

from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, GlobalAveragePooling1D, Dense
from keras.datasets import imdb
import numpy as np

# Constants for data preprocessing
max_length = 256       # Maximum length of the sequences
padding_type = 'post'  # Pad sequences shorter than max_length at the end
vocab_size = 10000     # Size of the vocabulary used in the Embedding layer

# Load the IMDB dataset; each review is a list of integer word indices
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=vocab_size)

# Helper function: pad (or truncate) every sequence to max_length
def preprocess_data(data):
    return pad_sequences(data, maxlen=max_length, padding=padding_type)

# Preprocess the data
train_data = preprocess_data(train_data)
test_data = preprocess_data(test_data)

# Define the model architecture
def build_model(vocab_size, embedding_dim=16, hidden_units=16):
    model = Sequential([
        Embedding(vocab_size, embedding_dim, input_length=max_length),
        GlobalAveragePooling1D(),
        Dense(hidden_units, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    return model

# Build and compile the model
model = build_model(vocab_size)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

# Train and evaluate the model (the test set doubles as validation data here)
history = model.fit(train_data, train_labels, epochs=10, batch_size=32,
                    validation_data=(test_data, test_labels), verbose=2)
test_loss, test_acc = model.evaluate(test_data, test_labels, verbose=2)
print(f"Test Accuracy: {test_acc}, Test Loss: {test_loss}")

# Decode review function: map integer indices back to words
word_index = imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}

def decode_review(encoded_review):
    # Indices 0-2 are reserved (padding, start, unknown), so words are offset by 3
    return ' '.join([reverse_word_index.get(i - 3, '?') for i in encoded_review if i >= 3])

# Display incorrect predictions
def display_incorrect_predictions(test_data, test_labels, predictions, num_examples=3):
    predicted_classes = (predictions > 0.5).astype(int)
    incorrect_indices = np.where(predicted_classes.flatten() != test_labels)[0]
    for i, idx in enumerate(incorrect_indices[:num_examples]):
        print(f"Incorrect Prediction {i+1}:")
        print(f"Review: {decode_review(test_data[idx])}")
        print(f"Actual Sentiment: {'Positive' if test_labels[idx] == 1 else 'Negative'}")
        print(f"Predicted Sentiment: {'Positive' if predicted_classes[idx][0] == 1 else 'Negative'}")
        print("--------------------------------------------------------------------------------\n")

predictions = model.predict(test_data)
display_incorrect_predictions(test_data, test_labels, predictions)

# Predict sentiments for sample reviews and display them
def predict_and_display_reviews(reviews):
    # Caveat: words are split on whitespace (punctuation stays attached) and looked up
    # as raw word_index ranks, without the offset of 3 that imdb.load_data applies.
    # Words missing from word_index map to 2 (unknown).
    sequences = [[word_index.get(word, 2) for word in review.lower().split()] for review in reviews]
    padded_sequences = preprocess_data(sequences)
    sample_predictions = model.predict(padded_sequences)
    sample_predicted_classes = (sample_predictions > 0.5).astype(int)
    for i, review in enumerate(reviews):
        print(f"Review {i+1}: {review}")
        print(f"Predicted Score: {sample_predictions[i]}")
        print(f"Predicted Sentiment: {'Positive' if sample_predicted_classes[i][0] == 1 else 'Negative'}")
        print("--------------------------------------------------------------------------------\n")

# Sample movie reviews
reviews = [
    "This movie was an excellent portrayal of character development and had stellar acting.",
    "I found the movie to be predictable with a lackluster script.",
    "The cinematography was magnificent, and the pacing was perfect. Highly recommend watching.",
    "It was a terrible movie that wasted two hours of my life. The plot made no sense.",
    "An absolute masterpiece, with a gripping story and profound performances."
]

predict_and_display_reviews(reviews)

Output:

Test Accuracy: 0.8620399832725525, Test Loss: 0.45465412735939026
782/782 [==============================] - 1s 2ms/step
Incorrect Prediction 1:
Review: i generally love this type of movie however this time i found myself wanting to kick the screen since i can't do that i will just complain about it this was absolutely idiotic the things that happen with the dead kids are very cool but the alive people are absolute idiots i am a grown man pretty big and i can defend myself well however i would not do half the stuff the little girl does in this movie also the mother in this movie is reckless with her children to the point of neglect i wish i wasn't so angry about her and her actions because i would have otherwise enjoyed the flick what a number she was take my advise and fast forward through everything you see her do until the end also is anyone else getting sick of watching movies that are filmed so dark anymore one can hardly see what is being filmed as an audience we are involved with the actions on the screen so then why the hell can't we have night vision
Actual Sentiment: Negative
Predicted Sentiment: Positive
--------------------------------------------------------------------------------

Incorrect Prediction 2:
Review: hollywood had a long love affair with bogus nights tales but few of these products have stood the test of time the most memorable were the jon hall maria films which have long since become camp this one is filled with dubbed songs and slapstick it's a truly crop of corn and pretty near today it was nominated for its imaginative special effects which are almost in this day and age mainly of trick photography the only outstanding positive feature which survives is its beautiful color and clarity sad to say of the many films made in this genre few of them come up to alexander original thief of almost any other nights film is superior to this one though it's a loser
Actual Sentiment: Negative
Predicted Sentiment: Positive
--------------------------------------------------------------------------------

Incorrect Prediction 3:
Review: ed mitchell is a teenager who lives for his job at good a small but friendly neighborhood stand while his buddy thompson also works there but lack single minded devotion to his job he's there because he accidentally destroyed the car of his teacher mr and has to raise money to pay the when a fast foot chain opens across the street it looks like good is history until ed a secret that brings hundreds of new customers to their door however the manager of kurt jan is determined to get his hands on the and put good out of business meanwhile ed and must rescue the world's oldest fast food employee from the demented hills asylum and ed might just find love with jackson if he could take his mind off the long enough to pay attention to her good is a comedy directed for kids decent story acting and overall a pretty harmless kids movie
Actual Sentiment: Negative
Predicted Sentiment: Positive
--------------------------------------------------------------------------------

1/1 [==============================] - 0s 19ms/step
Review 1: This movie was an excellent portrayal of character development and had stellar acting.
Predicted Score: [0.79064614]
Predicted Sentiment: Positive
--------------------------------------------------------------------------------

Review 2: I found the movie to be predictable with a lackluster script.
Predicted Score: [0.6444884]
Predicted Sentiment: Positive
--------------------------------------------------------------------------------

Review 3: The cinematography was magnificent, and the pacing was perfect. Highly recommend watching.
Predicted Score: [0.2016313]
Predicted Sentiment: Negative

This code demonstrates how to preprocess data, define a neural network architecture, and use it for sentiment analysis.
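
Two of the sample reviews in the output above receive labels that look wrong to a human reader. One possible contributor is the way the sample text is encoded. By default, imdb.load_data prepends a start token (1), uses 2 for out-of-vocabulary words, and shifts every word rank by 3, whereas the sample reviews are looked up with raw ranks and with punctuation still attached. The sketch below shows an encoder that follows the load_data conventions. The function name encode_review, the regular-expression tokenizer, and the toy vocabulary are assumptions made for illustration, and the dataset's original tokenization may differ.

```python
import re

def encode_review(text, word_index, vocab_size, index_from=3, start_char=1, oov_char=2):
    """Encode raw text following the default conventions of imdb.load_data."""
    # Lowercase and keep only letters, digits, and apostrophes (assumed tokenization).
    words = re.findall(r"[a-z0-9']+", text.lower())
    encoded = [start_char]
    for word in words:
        rank = word_index.get(word)
        index = rank + index_from if rank is not None else oov_char
        # Indices outside the model's vocabulary fall back to the unknown token.
        encoded.append(index if index < vocab_size else oov_char)
    return encoded

# Toy vocabulary standing in for imdb.get_word_index(); the ranks are made up.
toy_word_index = {"the": 1, "movie": 2, "was": 3, "terrible": 4, "cinematography": 20000}

print(encode_review("The movie was terrible.", toy_word_index, vocab_size=10000))
# [1, 4, 5, 6, 7]
print(encode_review("The cinematography was superb!", toy_word_index, vocab_size=10000))
# [1, 4, 2, 6, 2]
```

In the article's code, passing word_index and vocab_size to a function like this and padding the result with preprocess_data would give inputs in the same format the model saw during training. Whether that changes the predictions for the sample reviews would need to be checked by rerunning the notebook.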

Conclusion

Sentiment analysis is a powerful tool in the arsenal of data scientists, offering insights into the public’s perceptions and feelings. By leveraging Keras/TensorFlow and the IMDB dataset, we’ve shown how to build and train a model capable of classifying sentiments in movie reviews. The field of sentiment analysis is vast and continuously evolving, with new techniques and models emerging regularly. This example serves as a starting point, and there’s much more to explore and experiment with in this exciting domain.
