Analyzing E-commerce Reviews: A Practical Guide to Sentiment Analysis for Amazon and Shopee

A project that covers the whole data lifecycle: collecting data, transforming it, deriving insights, and then predicting sentiment.

Utkarsh Lal
Geek Culture
4 min read · Feb 22, 2022

Photo by Sunrise King on Unsplash

INTRODUCTION

This project is based on a very common NLP use case: Sentiment Analysis of product reviews. A web-app has been developed which scrapes product reviews from e-commerce websites like Amazon and Shopee and then performs Sentiment Analysis on them to determine whether each review is positive or negative. The app was designed as a prototype for analyzing the online reviews of a particular product, in order to find out the reasons behind negative reviews in the APAC market.

The whole code is available in my GitHub repo.

Methodology

1. Webapp

The workflow of the web-app is shown below:

The UI of the app is as follows:

Step 1: In the ‘Crawler’ tab, any URL of a product page can be pasted from Amazon or Shopee.

Step 2: Choose the retailer and click ‘Get Reviews’.

Step 3: Click ‘Refresh’. A new job will be queued in the table below. When its ‘Status’ says done, click the review csv file that was generated. This will download the file to the local machine. Note: This might take a few minutes.

Step 4: Now go to the ‘Sentiment Analysis’ tab. Click ‘Choose file’ and upload the downloaded csv file. Click ‘Submit’.

Step 5: Download the output csv file.

The final csv file will have a new column with Positive/Negative labels for each review.
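
As a rough illustration of what Steps 4 and 5 amount to, the sketch below reads a reviews csv, runs a classifier over one column, and writes a copy with an extra label column. The column names `review` and `sentiment`, the function `label_reviews`, and the keyword-based `toy_classify` are assumptions made for this example; they are not the names used in the repo.

```python
import csv
import io


def label_reviews(in_file, out_file, classify,
                  text_column="review", label_column="sentiment"):
    """Copy a reviews CSV, appending one Positive/Negative label per row.

    `classify` is any callable mapping a review string to a label.
    The column names are hypothetical; adjust them to the scraped file.
    """
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames or []) + [label_column]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        row[label_column] = classify(row[text_column])
        writer.writerow(row)
        count += 1
    return count


def toy_classify(text):
    # Stand-in for the trained model: a naive keyword check.
    return "Negative" if "bad" in text.lower() else "Positive"


# In-memory files keep the example self-contained; real code would use open().
source = io.StringIO("review\nGreat phone\nBad battery\n")
target = io.StringIO()
n_rows = label_reviews(source, target, toy_classify)
labelled = target.getvalue().splitlines()
```

Passing the classifier in as a callable keeps the file handling independent of whichever model is used.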

Alternatively, a textual review can be pasted directly into the Text Box provided. After clicking ‘Get Sentiment’, the app gives the output as Positive or Negative.

A few things to note

  1. Redis Queue management has been employed to queue the ‘get review’ requests, which appear in the table shown in the snapshots above.
  2. A new database needs to be created with the following table schema. The database credentials also need to be specified in the “mysql_connection.py” and “main.py” (inside the fetch function) files given in the GitHub repo.
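
To make the queueing in point 1 a little more concrete, here is a minimal sketch of how a ‘get review’ request could be handed to RQ. The names `scrape_reviews`, `make_queue`, `submit_review_job` and `SUPPORTED_RETAILERS` are hypothetical, and the repo's actual job function and wiring may look different.

```python
SUPPORTED_RETAILERS = {"amazon", "shopee"}


def scrape_reviews(url, retailer):
    """Placeholder for the scraping job that an `rq worker` process would run."""
    raise NotImplementedError("the real scraping logic lives in the repo")


def make_queue():
    # Imported lazily so the rest of this sketch has no hard dependency on Redis.
    from redis import Redis
    from rq import Queue
    # Assumes a Redis server on the default localhost:6379.
    return Queue(connection=Redis())


def submit_review_job(queue, url, retailer):
    """Queue one 'get reviews' request and return the job handle."""
    retailer = retailer.strip().lower()
    if retailer not in SUPPORTED_RETAILERS:
        raise ValueError(f"unsupported retailer: {retailer!r}")
    # RQ stores a reference to the function plus its arguments in Redis;
    # a separate worker process picks the job up and runs it later.
    return queue.enqueue(scrape_reviews, url, retailer)
```

With Redis and a worker running, `submit_review_job(make_queue(), product_url, "Shopee")` would return a job whose `get_status()` could feed the ‘Status’ column. Note that RQ workers need the job function to be importable from a module, not defined in an interactive session.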

2. Sentiment Analysis Model

The best part of this app is that anyone can incorporate their own trained Machine Learning model. The model is stored in a pickle file called ‘model.pkl’ in the GitHub repo mentioned above. It is a Naive Bayes classifier trained on the Kaggle Amazon product reviews data linked here.
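
Swapping in your own model could be as simple as the sketch below. It assumes the pickled object exposes a `predict` method that accepts a list of raw review strings (as a scikit-learn Pipeline of vectorizer plus classifier would); the helper names `load_model` and `get_sentiment` are made up for this example, and the repo's model may expect pre-vectorized input instead.

```python
import pickle


def load_model(path="model.pkl"):
    """Load a pickled classifier. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def get_sentiment(model, review_text):
    """Return the model's label for a single review.

    Assumes `model.predict` takes a list of raw strings; this is an
    assumption about the pickled object, not something the repo guarantees.
    """
    return model.predict([review_text])[0]
```

Usage would then be along the lines of `get_sentiment(load_model(), "Battery life is not good")`.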

The Sentiment Analysis model follows this methodology:

  1. Data Preprocessing: Filtering out the irrelevant data.
  2. Noise Removal: Removing stop-words from the reviews, except negation words like ‘not’, “n’t”, ‘un-’ etc. This is done in order to preserve the polarity of the sentence.
  3. Negation Handling: Handling negations increases the accuracy of Sentiment classification by a big margin, as highlighted in the following article, which is worth checking out:

4. Feature Extraction: In this project, a TF-IDF vectorizer with unigrams and bigrams has been used to extract features and represent them as a sparse matrix.

5. Training the model: The train_test_split function is used to split the dataset 70:30 into training and test sets. A Naive Bayes classifier is then trained on the training portion.

6. Validation: Validation can be done with the accuracy_score function from sklearn.metrics.
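
The six steps above can be sketched end to end with scikit-learn. This is a toy illustration and not the code from the repo. The stop-word and negation lists are deliberately tiny, the `NOT_` prefix is one common way of handling negation (the approach in the repo may differ, and the ‘un-’ prefix is not handled here), `MultinomialNB` is an assumed choice of Naive Bayes variant, and the ten reviews are invented.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative lists; a real project would use a fuller stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "was", "i", "and", "to"}
NEGATIONS = {"not", "no", "never"}
PUNCTUATION = set(".,!?;")


def preprocess(text):
    """Drop stop-words and prefix words that follow a negation with NOT_."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    kept, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False  # the negation scope ends at punctuation
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True   # the negation survives as a NOT_ prefix
        elif tok not in STOP_WORDS:
            kept.append("NOT_" + tok if negated else tok)
    return " ".join(kept)


# Invented toy data, only to make the sketch runnable.
reviews = [
    ("This phone is great, I love it.", "Positive"),
    ("Excellent quality and fast delivery.", "Positive"),
    ("Works perfectly, very happy.", "Positive"),
    ("Good value, I recommend it.", "Positive"),
    ("Amazing sound and great battery.", "Positive"),
    ("Not good, the battery died quickly.", "Negative"),
    ("Terrible quality, it broke in a week.", "Negative"),
    ("I don't like it, very slow.", "Negative"),
    ("Never buying this again, awful.", "Negative"),
    ("Bad screen and not worth the price.", "Negative"),
]
texts = [preprocess(text) for text, _ in reviews]
labels = [label for _, label in reviews]

# 70:30 split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels
)

# Unigram + bigram TF-IDF features feeding a Naive Bayes classifier.
# The default tokenizer treats "_" as a word character, so NOT_good
# survives as the single (lowercased) token "not_good".
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```

With only ten reviews the accuracy figure is meaningless; the point is the shape of the pipeline. Wrapping the vectorizer and classifier in one pipeline also means the fitted object can be pickled and reused on raw text.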

Instructions to run the web-app on LINUX

  1. Redis Queue Management command: Open a terminal and run ‘rq worker’
  2. Run Flask APP command: Open another terminal and run ‘export FLASK_APP=app.main:app && flask run --reload’
  3. Create a MySQL database with one table

The table Schema:

4. Specify the db credentials in the “mysql_connection.py” and “main.py” (inside the fetch function) files present in the GitHub repo mentioned above.

Future Scope

  1. Topic Modelling using Latent Dirichlet Allocation or other techniques can be incorporated to find the most frequently occurring negative features.
  2. Word Embedding can be used to enhance the Sentiment Classification as well as Negation Handling.

Any further modifications in the app are very welcome. Please ping me on my LinkedIn for any help or collaboration. Thanks for reading.

GitHub link.

Utkarsh Lal
Geek Culture

Product Development Analyst at American Express