Sentiment Analysis of Amazon Reviews Using Natural Language Processing

Feb 8, 2024 (11 min read)

Reviews serve as the lifeblood of every business, offering invaluable insights into customer satisfaction, preferences, and areas for improvement.

In today’s digital age, where consumers wield unprecedented influence through online platforms, understanding and interpreting these reviews is paramount for businesses seeking to thrive in competitive markets.

However, with the large volume of reviews generated daily, manually analyzing them becomes impractical. This is where sentiment analysis emerges as a crucial tool, enabling businesses to extract actionable intelligence from the vast sea of customer feedback.

By deciphering the underlying sentiments expressed within reviews, sentiment analysis empowers businesses to make informed decisions, enhance customer experiences, and ultimately, drive success.

Project Overview

This project performs sentiment analysis on Amazon reviews using two different approaches: VADER (Valence Aware Dictionary and sEntiment Reasoner) and a pre-trained RoBERTa (Robustly Optimized BERT Pretraining Approach) model.

Overview of Dataset

I downloaded the dataset from Kaggle: https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews

The dataset consists of Amazon product reviews, containing the following fields:

Review ID: A unique identifier for each review.
Product ID: A unique identifier for the product being reviewed.
User ID: A unique identifier for the user who wrote the review.
Profile Name: The name of the user who wrote the review.
Helpfulness Numerator: The number of users who found the review helpful.
Helpfulness Denominator: The total number of users who indicated whether the review was helpful or not.
Review Score: The star rating given by the user, ranging from 1 to 5.
Timestamp: The timestamp of when the review was posted.
Review Summary: A summary of the review content.
Review Text: The main body of the review, containing detailed feedback and opinions.

The dataset contains a total of 568,454 reviews but only 500 were selected for analysis in this project.

Technologies Used

Python, with the following libraries:

Pandas
Matplotlib
Seaborn
NLTK (Natural Language Toolkit)
Transformers
Torch
TensorFlow
Flax

Data Exploration and Processing

1. Import relevant libraries

pandas, numpy, matplotlib, and seaborn are commonly used libraries for data manipulation, numerical computation, and visualization.

The line plt.style.use('ggplot') sets the plotting style to emulate the visual aesthetic of the popular R package, ggplot2. This style choice results in plots with a distinctive appearance characterized by bold lines, a gray background, and a combination of colorful elements.

nltk is the Natural Language Toolkit, which provides tools for natural language processing tasks.

2. Read the data

Here is an overview of the DataFrame:

3. Data Pre-processing

Dataset size

The dataset has 568,454 rows and 10 columns.

To speed up processing, limit the dataset to a number of rows of your choice; I chose 500. If you have the time and resources, you can of course work with the whole dataset.
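A minimal sketch of the size check and the row limit. The toy DataFrame stands in for the Kaggle CSV, and the file name mentioned in the comment is an assumption. Taking the first 500 rows is one simple way to subsample; the article does not say how the 500 reviews were chosen.

```python
import pandas as pd

# Toy stand-in for the real data. In the project this would come from the
# Kaggle CSV, e.g. pd.read_csv("Reviews.csv") (file name is an assumption).
df = pd.DataFrame({
    "Id": range(1, 1001),
    "Score": [5, 4, 1, 3, 5] * 200,
    "Text": ["sample review text"] * 1000,
})
print(df.shape)  # (number of rows, number of columns)

# Keep only the first 500 rows for faster processing.
df = df.head(500)
print(df.shape)
```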

4. Exploratory Data Analysis

Generate a bar plot of the review scores to visualize how the reviews are distributed across star ratings.

Result

The plot shows that most of the reviews are positive and very few are negative.
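One way such a plot might be produced, sketched on a handful of made-up scores. The 'Score' column name, the title, and the axis labels are assumptions, not taken from the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

plt.style.use("ggplot")

# Toy star ratings standing in for the 500 sampled reviews.
df = pd.DataFrame({"Score": [5, 5, 4, 5, 1, 3, 5, 4, 2, 5]})

# Count the reviews per star rating and order the bars from 1 to 5 stars.
counts = df["Score"].value_counts().sort_index()

ax = counts.plot(kind="bar", title="Count of Reviews by Stars", figsize=(10, 5))
ax.set_xlabel("Review Stars")
ax.set_ylabel("Number of reviews")
```

Sorting by index matters here: value_counts() orders by frequency by default, which would scramble the star ratings along the x-axis.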

Sentiment Analysis

Basic NLTK Sentiment Analysis

Download necessary NLTK resources such as lexicons, tokenizers, and part-of-speech taggers.

How it works

First, extract one of the 500 reviews to use as an example.

This code prints out the text content of the review located at index 49 in the ‘Text’ column (which contains the review text) of the DataFrame df.

Tokenize the content and slice the result to preview the first few tokens

tokens = nltk.word_tokenize(example): This line tokenizes the text content of the review stored in the variable example.

Tokenization is the process of breaking down a text into individual words or tokens.

The word_tokenize function from NLTK is specifically used to tokenize text into words.

tokens[:10]: This line slices the list of tokens obtained from tokenization to display only the first 10 tokens.

The [:10] syntax is used to specify that we want to display elements from index 0 to index 9 (the first 10 elements) of the tokens list.

This code therefore essentially tokenizes the text content of a review and then displays the first 10 tokens obtained from the tokenization process.

The next step is Part-of-speech (POS) tagging, which is used to assign a part-of-speech tag to each tokenized word in the review.

POS tagging plays a crucial role in various NLP tasks by providing linguistic insights and facilitating the analysis and interpretation of textual data.

This code performs part-of-speech tagging on the tokenized words of the review and then displays the part-of-speech tags assigned to the first 10 words.

The last step is Named Entity Recognition (NER), the process of identifying and classifying the named entities in the text.

This is important because it helps in extracting specific entities such as persons, organizations, locations, dates, and more from text data.

These entities often carry significant meaning and context within the text, and extracting them can provide valuable insights for various natural language processing tasks, such as information extraction, question answering, and knowledge graph construction.

First Model: VADER

1. Initialize VADER Sentiment Analyzer

Breakdown

from nltk.sentiment import SentimentIntensityAnalyzer: This line imports the SentimentIntensityAnalyzer class from the nltk.sentiment module. SentimentIntensityAnalyzer is NLTK's implementation of VADER, a lexicon- and rule-based analyzer, so it scores text against a built-in sentiment lexicon rather than a model trained on this dataset.

from tqdm.notebook import tqdm: This line imports tqdm, a fast, extensible progress bar for Python. Wrapping an iterable in tqdm() displays a progress indicator while the loop runs.

The tqdm.notebook submodule is specific to Jupyter Notebook environments.

sia = SentimentIntensityAnalyzer(): This line creates an instance of the SentimentIntensityAnalyzer class and assigns it to the variable sia. This analyzer is capable of analyzing the sentiment of text data by assigning polarity scores, such as positive, negative, neutral, and compound scores, to each piece of text.

2. Apply VADER Sentiment Analysis on the dataset

This code snippet iterates over each row in the DataFrame df, retrieves the text content of each review, and performs sentiment analysis using the SentimentIntensityAnalyzer (sia). It then stores the sentiment scores for each review in a dictionary called res, with the review ID ('Id' column) as the key.

Remember: tqdm() wraps the iterator and displays a progress bar during the iteration, making it easier to track progress.
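The loop described above can be sketched as follows. To keep the example runnable without NLTK's lexicon download, the scorer is passed in as a function and a hypothetical toy_scorer stands in for VADER; it only mimics the shape of the output and is not a sentiment model.

```python
import pandas as pd

def score_reviews(df, scorer):
    """Return {review Id: scores dict} for every row of df."""
    res = {}
    # In a notebook, wrapping df.iterrows() in tqdm(..., total=len(df))
    # would add a progress bar; it is left out to keep dependencies minimal.
    for _, row in df.iterrows():
        res[row["Id"]] = scorer(row["Text"])
    return res

def toy_scorer(text):
    """Hypothetical stand-in returning the same keys as VADER's output."""
    good = text.lower().count("good")
    bad = text.lower().count("bad")
    return {"neg": float(bad), "neu": 0.0, "pos": float(good),
            "compound": float(good - bad)}

df = pd.DataFrame({"Id": [1, 2], "Text": ["Good, really good", "Bad taste"]})
res = score_reviews(df, toy_scorer)
print(res)
```

With NLTK installed and its vader_lexicon resource downloaded, the real call would pass sia.polarity_scores as the scorer.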

3. Plot VADER results

Build a DataFrame from the results and merge it with the original DataFrame.
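A minimal sketch of that step, assuming the results are stored as a dictionary of dictionaries keyed by review ID. The scores below are illustrative values, not real VADER output, and the variable name vaders is an assumption.

```python
import pandas as pd

# Original reviews (toy data) and per-review scores keyed by review Id.
df = pd.DataFrame({
    "Id": [1, 2],
    "Score": [5, 1],
    "Text": ["Great snack", "Stale and bland"],
})
res = {
    1: {"neg": 0.0, "neu": 0.3, "pos": 0.7, "compound": 0.6},
    2: {"neg": 0.6, "neu": 0.4, "pos": 0.0, "compound": -0.5},
}

# Dict of dicts -> one row per review: transpose so the Ids become the index,
# then turn that index into an 'Id' column.
vaders = pd.DataFrame(res).T.reset_index().rename(columns={"index": "Id"})

# Attach the original columns (Score, Text, ...) to each row of scores.
vaders = vaders.merge(df, how="left", on="Id")
print(vaders)
```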

Dataframe

In VADER (Valence Aware Dictionary and sEntiment Reasoner), the terms neg, pos, and neu represent different aspects of sentiment expressed in a piece of text. These aspects are determined based on the intensity of positive, negative, and neutral sentiments present in the text, respectively.

neg (Negative Score): The neg score indicates the proportion of negative sentiment present in the text. It represents the extent to which the text expresses negative emotions, such as anger, sadness, or frustration. The value neg ranges from 0 to 1, with higher values indicating a greater degree of negativity.
pos (Positive Score): The pos score indicates the proportion of positive sentiment present in the text. It represents the extent to which the text expresses positive emotions, such as happiness, joy, or satisfaction. The value pos also ranges from 0 to 1, with higher values indicating a stronger positivity.
neu (Neutral Score): The neu score indicates the proportion of neutral sentiment present in the text. It represents the extent to which the text is neutral or lacks a strong emotional polarity. A higher neu score suggests that the text contains more neutral language and less emotional content. Similar to neg and pos, the value neu ranges from 0 to 1.
compound (Compound Score): The compound score is a single value that represents the overall sentiment polarity of a piece of text. It takes into account both the positive and negative sentiment scores, along with their intensities, to provide a comprehensive assessment of the text's sentiment. It ranges from -1 (most negative) to 1 (most positive).

4. Visualize VADER results

Plot a bar graph to visualize the results

Bar Graph

This indicates that most of the reviews are positive, with a high positive sentiment polarity, while the negative reviews are few and have a low sentiment polarity.

Sub-plot for each category

Create a subplot with three bar plots showing the distribution of positive, neutral, and negative sentiment scores across different review scores.

Sub-plots
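A sketch of how the three panels could be drawn with Seaborn. The DataFrame name vaders, its column names, and the panel titles are assumptions, and the values are illustrative rather than real VADER output.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy merged results: one row per review with its star rating and scores.
vaders = pd.DataFrame({
    "Score": [1, 1, 3, 3, 5, 5],
    "pos":   [0.05, 0.10, 0.15, 0.20, 0.35, 0.45],
    "neu":   [0.70, 0.65, 0.75, 0.72, 0.60, 0.52],
    "neg":   [0.25, 0.25, 0.10, 0.08, 0.05, 0.03],
})

fig, axs = plt.subplots(1, 3, figsize=(12, 3))
panels = zip(axs, ["pos", "neu", "neg"], ["Positive", "Neutral", "Negative"])
for ax, column, title in panels:
    # Each bar is the mean of that score for one star rating.
    sns.barplot(data=vaders, x="Score", y=column, ax=ax)
    ax.set_title(title)
plt.tight_layout()
```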

Second Model: RoBERTa Pre-trained Model

1. Install the packages and dependencies required for using the pre-trained RoBERTa model.

Breakdown

The code begins by using the !pip install command to install several Python packages. These packages include:

torch: PyTorch, a popular open-source machine learning library.
tensorflow: TensorFlow, another widely-used machine learning library developed by Google.
flax: Flax, a neural network library that is tightly integrated with JAX, a high-performance numerical computing library.
tensorflow-intel: An optimized version of TensorFlow for Intel architectures.
ml-dtypes==0.2.0: A specific version of the ml-dtypes library.

After installing the required packages, the code imports specific modules from these packages. These modules are necessary for working with transformer-based models and performing sequence classification tasks.

AutoTokenizer and AutoModelForSequenceClassification are classes from the transformers library. These classes allow for easy loading of pre-trained transformer models and their corresponding tokenizers for various natural language processing (NLP) tasks.
softmax is a function from the scipy.special module. It calculates the softmax function, which is commonly used to convert raw scores into probabilities.
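As a small illustration of what softmax does, the sketch below converts three hypothetical raw scores into probabilities. The input values are made up for the example.

```python
import numpy as np
from scipy.special import softmax

# Hypothetical raw scores (logits) for three classes.
logits = np.array([-1.0, 0.5, 2.0])
probs = softmax(logits)

print(probs)        # three positive values, largest for the largest logit
print(probs.sum())  # sums to 1, up to floating-point rounding
```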

2. Initialize the RoBERTa Model

Define the Model: The variable MODEL specifies the name of the pre-trained model to be loaded. In this case, it's "cardiffnlp/twitter-roberta-base-sentiment", which refers to a RoBERTa model fine-tuned on Twitter data for sentiment analysis.
Load the Tokenizer: The AutoTokenizer.from_pretrained() function is used to load the tokenizer associated with the specified pre-trained model. The tokenizer is responsible for converting raw text input into a format suitable for the model to process. By using AutoTokenizer, the appropriate tokenizer for the specified model is automatically selected based on the model's name.
Load the Model: Similarly, the AutoModelForSequenceClassification.from_pretrained() function loads the pre-trained model for sequence classification. This model has been fine-tuned on sentiment analysis tasks and is capable of classifying the sentiment of a given text sequence into categories like positive, negative, or neutral.

By executing these lines of code, we have a pre-trained RoBERTa model and its tokenizer, ready for use in sentiment analysis. We will then input our data into the tokenizer, pass the tokenized input to the model for inference, and obtain predictions about the sentiment of the input text.
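The tokenize, infer, and convert steps could be wrapped in a helper along these lines. This is a sketch, not the article's actual function: the tokenizer and model are passed in explicitly to keep it self-contained, and the roberta_* key names are assumptions.

```python
from scipy.special import softmax

# Label order for cardiffnlp/twitter-roberta-base-sentiment:
# index 0 = negative, 1 = neutral, 2 = positive.
# The 'roberta_*' key names are an assumption made for this sketch.
LABELS = ("roberta_neg", "roberta_neu", "roberta_pos")

def polarity_scores_roberta(text, tokenizer, model):
    """Return class probabilities for one piece of text."""
    # Tokenize the text into PyTorch tensors.
    encoded = tokenizer(text, return_tensors="pt")
    # Forward pass; the first element of the output holds the logits
    # with shape (batch_size, num_labels).
    output = model(**encoded)
    logits = output[0][0].detach().numpy()
    # Convert raw scores into probabilities that sum to 1.
    probs = softmax(logits)
    return {label: float(p) for label, p in zip(LABELS, probs)}
```

Here tokenizer and model would be the objects returned by AutoTokenizer.from_pretrained(MODEL) and AutoModelForSequenceClassification.from_pretrained(MODEL).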

3. Run RoBERTa Model on Data

This code iterates over each row of the DataFrame df, where each row represents a review. For each review, it performs sentiment analysis using both VADER (Valence Aware Dictionary and sEntiment Reasoner) and the pre-trained RoBERTa model.

Here's a breakdown of what it does:

Initialization

res = {}: Initializes an empty dictionary res to store the results of sentiment analysis for each review.

Iterating Over DataFrame

for i, row in tqdm(df.iterrows(), total=len(df)):: Iterates over each row in the DataFrame df, where tqdm is used to create a progress bar to track the iteration progress.

Sentiment Analysis:

text = row['Text']: Retrieves the review text from the current row.

myid = row['Id']: Retrieves the unique identifier (ID) associated with the review.

vader_result = sia.polarity_scores(text): Performs sentiment analysis using VADER on the review text, which returns a dictionary containing sentiment scores.

vader_result_rename = {}: Initializes an empty dictionary to store the VADER scores with modified keys.

for key, value in vader_result.items():: Iterates over the items (key-value pairs) in the VADER result dictionary.

vader_result_rename[f"vader_{key}"] = value: Modifies the keys of the VADER result dictionary by prefixing them with "vader_" and stores them in vader_result_rename.

roberta_result = polarity_scores_roberta(text): Calls the polarity_scores_roberta function to perform sentiment analysis using the RoBERTa model on the review text, which returns a dictionary containing sentiment probabilities.

both = {**vader_result_rename, **roberta_result}: Combines the VADER and RoBERTa sentiment analysis results into a single dictionary named both.

res[myid] = both: Adds the combined sentiment analysis results to the res dictionary, using the review ID as the key.

Exception Handling:

except RuntimeError:: Catches any runtime errors that occur during sentiment analysis.

print(f'Broke for id {myid}'): Prints a message indicating which review ID caused the runtime error.

Overall, this code performs sentiment analysis on each review text in the DataFrame using both VADER and the RoBERTa model, and stores the results in a dictionary for further analysis.

It handles exceptions gracefully by printing an error message if any issues occur during sentiment analysis.
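Put together, the loop might look like the sketch below. Both scorers are passed in as functions, and the two fake_* functions are hypothetical stand-ins so the example runs without NLTK or a model download; the tqdm wrapper is omitted for the same reason.

```python
import pandas as pd

def score_with_both(df, vader_scorer, roberta_scorer):
    """Return {review Id: combined VADER + RoBERTa scores}."""
    res = {}
    for _, row in df.iterrows():
        myid = row["Id"]
        try:
            text = row["Text"]
            # Prefix the VADER keys so they cannot collide with RoBERTa keys.
            vader_result = {f"vader_{k}": v
                            for k, v in vader_scorer(text).items()}
            roberta_result = roberta_scorer(text)
            # Merge the two dicts; on a key clash the right-hand dict wins.
            res[myid] = {**vader_result, **roberta_result}
        except RuntimeError:
            # Skip this review but report which one failed.
            print(f"Broke for id {myid}")
    return res

def fake_vader(text):
    """Hypothetical stand-in for sia.polarity_scores."""
    return {"neg": 0.0, "neu": 0.5, "pos": 0.5, "compound": 0.4}

def fake_roberta(text):
    """Hypothetical stand-in that fails on long input."""
    if len(text) > 20:
        raise RuntimeError("input too long")
    return {"roberta_neg": 0.1, "roberta_neu": 0.2, "roberta_pos": 0.7}

df = pd.DataFrame({
    "Id": [1, 2],
    "Text": ["Tasty", "A very long review that the stub rejects"],
})
res = score_with_both(df, fake_vader, fake_roberta)
print(res)
```

Because the failing review is skipped, the result holds fewer entries than the DataFrame has rows, so it is worth comparing len(res) with len(df) after a run.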

4. Analysis Results

This shows that both models produce broadly similar results, with a high number of positive reviews and very few negative reviews.

Create a data frame to plot the results

Data frame

Comparison and Evaluation

Plot a pair plot to compare results

This code generates a pair plot using Seaborn’s pairplot function to visualize the relationship between sentiment scores predicted by VADER and the RoBERTa model, with respect to the review scores.

Pair Plot

Conclusion

Summary of Findings

Both VADER and RoBERTa models showed similar results, with a high number of positive reviews and very few negative reviews.
The sentiment analysis revealed that most of the reviews in the dataset were positive, indicating overall satisfaction with the products.

Limitations and future improvements

Limited Dataset Size: The analysis was performed on a subset of the dataset (500 reviews). Utilizing the entire dataset could provide more comprehensive insights.
Model Performance: While both VADER and RoBERTa models performed well, there is always room for improvement in model accuracy and generalization.
Additional Features: Incorporating additional features such as reviewer demographics or product categories could enhance the analysis and provide deeper insights into customer sentiment.

View my code on GitHub, here.