Sentiment analysis on smartphone reviews (LSTM model)

Published in Analytics Vidhya · 5 min read · Dec 1, 2020

I recently learned NLP as part of my deep learning course.

So I decided to write a blog post on sentiment analysis of smartphone reviews scraped from amazon.in, which should be useful for beginners exploring the NLP domain.

In this post we will train an LSTM model on the review text. It is a many-to-one model: it takes a sequence of inputs (the words of a review) and produces a single output (the sentiment).

Contents-

Scraping reviews
Preprocessing and EDA
Training the NLP model

Technologies used- Python, TensorFlow, Seaborn, BeautifulSoup

Scraping Reviews

I used BeautifulSoup to scrape the reviews.

First, we have to get the ASINs of all the smartphones we want.

Amazon ASIN: What is An ASIN Number?

Amazon uses ASINs to manage their ever-growing product catalog. These unique product identifiers are an important part…

www.nchannel.com

I wrote some helper functions:

getAmazonSearch: takes a search query and a page number and returns the HTML of the search results page
Searchasin: takes an ASIN and returns the product page
SearchReviews: takes the “see all reviews” link and returns the product's reviews page
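As a rough illustration, the helpers above could build their request URLs along these lines. This is only a sketch using the standard library: the URL patterns (`/s?k=...&page=...`, `/dp/<ASIN>`, the `pageNumber` parameter), the header value, and all function names here are assumptions, not the code from the notebook.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

BASE_URL = "https://www.amazon.in"       # assumed base URL
HEADERS = {"User-Agent": "Mozilla/5.0"}  # assumed browser-like user agent


def search_url(query, page):
    """URL of one page of search results (assumed URL pattern)."""
    return BASE_URL + "/s?" + urlencode({"k": query, "page": page})


def product_url(asin):
    """URL of a product page for a given ASIN (assumed URL pattern)."""
    return BASE_URL + "/dp/" + asin


def reviews_url(review_link, page):
    """Absolute URL of one reviews page, given a relative 'see all reviews' link."""
    separator = "&" if "?" in review_link else "?"
    return urljoin(BASE_URL, review_link) + separator + urlencode({"pageNumber": page})


def fetch_html(url):
    """Download a page and return its HTML as text (performs a network request)."""
    request = Request(url, headers=HEADERS)
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Keeping URL construction separate from the download step makes it possible to check the URLs without touching the network.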

Function to extract ASIN numbers:

data_asin[:5]
output:['B07SDPJ4XJ', 'B089MQ622N', 'B07X4R63DF', 'B07WPVLKPW', 'B086KCCMCP']
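One way such an extraction could work is to collect every non-empty `data-asin` attribute from the search results HTML. The sketch below uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra packages; the class and function names are hypothetical. With BeautifulSoup, a similar lookup would be `soup.find_all(attrs={"data-asin": True})`.

```python
from html.parser import HTMLParser


class AsinCollector(HTMLParser):
    """Collects unique, non-empty data-asin attribute values in page order."""

    def __init__(self):
        super().__init__()
        self.asins = []

    def handle_starttag(self, tag, attrs):
        asin = dict(attrs).get("data-asin")
        # Skip tags without the attribute, empty values, and repeats.
        if asin and asin not in self.asins:
            self.asins.append(asin)


def extract_asins(html):
    """Return the list of ASINs found in a search results page."""
    collector = AsinCollector()
    collector.feed(html)
    return collector.asins


# Hand-written HTML fragment, not real Amazon markup.
sample_html = (
    '<div data-asin="B07SDPJ4XJ"></div>'
    '<div data-asin=""></div>'
    '<div data-asin="B089MQ622N"></div>'
    '<div data-asin="B07SDPJ4XJ"></div>'
)
sample_asins = extract_asins(sample_html)
```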

Then, by passing the data-asin numbers, we go to each product page and get its “see all reviews” link.

Using these “see all reviews” links and setting the page number, we scrape all the reviews (mobile name, review title, body, stars) and save them to a CSV file.

Now we have finished scraping the reviews. Next, we preprocess and visualize the data.

https://github.com/msiddhu/sentiment-analysis_on_phone-reviews/blob/main/reviews-scraping.ipynb

Preprocessing and EDA

Now we have to clean the data.

The data contains noise such as emojis, numbers, frequently used words (is, the, for), and blank spaces. We remove this noise and convert all sentences to lower case so that training is easier.

Example:
RAW DATA: Nice phone from Samsung in this price. Display is good . Camera is not awesome but average. Battery will last 1 day with normal usage. N it has all necessary features. I got this for 8999 . So good phone under 10 k.As samsung so last for years.
FILTERED: nice phone samsung price display good camera awesome average battery last 1 day normal usage necessary features got good phone 10 kas samsung last years

Using this filter_text() function, we can clean all the data.
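For readers who want to try the idea, here is a minimal sketch of a cleaning function of this kind. It is not the article's filter_text(): the name `filter_text_sketch` and the tiny stop-word list are assumptions made for illustration, so it will not reproduce the filtered example above exactly.

```python
import re

# Small, hypothetical stop-word list; a real project would likely use a fuller one.
STOP_WORDS = {
    "a", "an", "and", "but", "for", "from", "has", "i", "in",
    "is", "it", "not", "so", "the", "this", "will", "with",
}


def filter_text_sketch(text):
    """Lower-case the text, drop emojis and punctuation, and remove stop words."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() also collapses repeated blank spaces.
    tokens = [token for token in text.split() if token not in STOP_WORDS]
    return " ".join(tokens)


cleaned = filter_text_sketch("Display is GOOD 👍 for the price!")
```

Replacing unwanted characters with a space, rather than deleting them, avoids gluing neighbouring words together.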

Let's visualize which words are most commonly used in the text:

From this we can see that common bigrams such as “battery life”, “value money”, “camera quality”, and “don’t buy” are crucial for deciding the rating of a smartphone.
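Counting bigrams like these can be sketched with `collections.Counter`. The function name and the sample reviews below are made up for illustration, and the input is assumed to be already cleaned text.

```python
from collections import Counter


def top_bigrams(cleaned_reviews, n=3):
    """Return the n most frequent adjacent word pairs as (bigram, count) tuples."""
    counts = Counter()
    for review in cleaned_reviews:
        tokens = review.split()
        # zip(tokens, tokens[1:]) pairs each word with the word that follows it.
        counts.update(zip(tokens, tokens[1:]))
    return [(" ".join(pair), count) for pair, count in counts.most_common(n)]


# Hypothetical cleaned reviews.
sample_reviews = [
    "battery life good",
    "good battery life",
    "battery life average",
]
most_common = top_bigrams(sample_reviews, n=1)
```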

Next, plot the distribution of ratings. The ratings map to sentiment classes as follows:

Rating 1 or 2 — Negative

Rating 3 — Neutral

Rating 4 or 5 — Positive
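This mapping is simple enough to write as a small function. The sketch below is one possible version; the function name and the label strings are assumptions, and the ratings list is invented sample data.

```python
from collections import Counter


def rating_to_sentiment(stars):
    """Map a 1-5 star rating to a sentiment label."""
    if not 1 <= stars <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {stars}")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Hypothetical ratings, used only to show how a class distribution is counted.
sample_ratings = [5, 4, 1, 3, 5, 2]
distribution = Counter(rating_to_sentiment(stars) for stars in sample_ratings)
```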

https://github.com/msiddhu/sentiment-analysis_on_phone-reviews/blob/main/preprocess-and-eda.ipynb

Training the LSTM model for Sentiment analysis

Next, download the GloVe vectors, which are pre-trained on a large text corpus using word-word co-occurrence statistics. Put simply, words with the same meaning, or words that lead to similar conclusions, have similar GloVe vectors.

We are using the 50-dimensional GloVe vectors, downloaded from Kaggle.

glove.6B.50d.txt

Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data…

www.kaggle.com

Steps for preparing the data:

Read the CSV file.
Split the data into train and test sets.
Find the average sentence length.
Define the hyperparameters.
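The first three steps can be sketched with the standard library alone. The column names `body` and `stars`, the 80/20 split, and the function names are assumptions for illustration; the notebook may well use pandas and different settings.

```python
import csv
import io
import random


def load_reviews(csv_file, text_column="body", rating_column="stars"):
    """Read (review text, star rating) pairs from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return [(row[text_column], int(float(row[rating_column]))) for row in reader]


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def average_length(texts):
    """Average number of whitespace-separated words per text."""
    return sum(len(text.split()) for text in texts) / len(texts)


# Tiny in-memory stand-in for the scraped CSV file.
sample_csv = io.StringIO(
    "body,stars\n"
    "good phone,5\n"
    "bad battery life,1\n"
    "average camera,3\n"
    "value for money,4\n"
    "do not buy,2\n"
)
rows = load_reviews(sample_csv)
train_rows, test_rows = split_train_test(rows)
avg_len = average_length([text for text, _ in train_rows])
```

The average length is a reasonable starting point when choosing the maximum sequence length used for padding.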

Now we have to create an embedding layer to convert the sentences into numeric vectors.

The functions pretrained_embedding_layer, sentences_to_indices, and read_glove_vecs are taken from the Sequence Models course on Coursera.


Output:
the index of good in the vocabulary is 164328
the 50030th word in the vocabulary is al-gama

And for the sentences_to_indices function:

Input:
“the phone is good”,
“very bad”,
“no star rating”
Output:
[357266, 283483, 192973, 164328],
[377946, 65963],
[262350, 341678, 301038]
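To make the conversion concrete, here is a simplified, pure-Python sketch of the idea. It is not the Coursera function: the name `sentences_to_indices_sketch` is hypothetical, unknown words are silently skipped, and index 0 is assumed to be reserved for padding. The toy vocabulary reuses a few of the indices shown in the output above.

```python
def sentences_to_indices_sketch(sentences, word_to_index, max_len):
    """Convert sentences to fixed-length lists of word indices, zero-padded."""
    rows = []
    for sentence in sentences:
        words = sentence.lower().split()
        # Look up each known word; words missing from the vocabulary are skipped.
        indices = [word_to_index[word] for word in words if word in word_to_index]
        indices = indices[:max_len]                           # truncate long sentences
        rows.append(indices + [0] * (max_len - len(indices)))  # pad short ones
    return rows


# Toy vocabulary; the real one is built from the GloVe file.
toy_vocab = {
    "the": 357266, "phone": 283483, "is": 192973,
    "good": 164328, "very": 377946, "bad": 65963,
}
indexed = sentences_to_indices_sketch(["the phone is good", "very bad"], toy_vocab, max_len=4)
```

Padding every row to the same length is what allows the sentences to be stacked into one array and fed to the embedding layer in batches.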

Build the 2-layer LSTM model using Keras, which is comparatively easier to use than other frameworks.

Text-to-indices process:

1. Convert to NumPy arrays
2. Apply sentences_to_indices
3. Pad the sequences

Finally, compile and train the model:

model.fit(X_train_indices, Y_train_oh, epochs = epochs, batch_size = batch_size, shuffle=True)

Testing the model

Using the train and test data, get the predictions and print the accuracy score and Cohen's kappa score.

Output:
Test Cohen Kappa score: 0.993
Test Accuracy score : 0.986
Train Cohen Kappa score: 0.996
Train Accuracy score : 0.991
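For reference, both metrics can be computed from scratch in a few lines. Accuracy is the fraction of matching labels, and Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below is for illustration only, with made-up labels; scikit-learn provides the same metrics as `accuracy_score` and `cohen_kappa_score` in `sklearn.metrics`.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance agreement."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both pick the same class independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Note: undefined (division by zero) when expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)


# Made-up labels, not the article's data.
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```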

With training and testing done, we plot a word cloud of the positive reviews.

Common words: phone price, fast charging, good phone, battery life.

Note: Some terms that should not be there still appear, such as “galaxym21”, “samsung galaxy”, and “redmi note”.

Bigrams

Bigrams give a very clear view of which adjacent words lead to bad or good reviews.

Project Link:

msiddhu/sentiment-analysis_on_phone-reviews

Sentiment Analysis using LSTM model on the smartphone reviews. Which are are scarped from amazon.in . GitHub is home to…

github.com

I hope this explanation was clear.

Feel free to contact me if you have any questions about this article.

Siddhartha Malladi - CVR College of Engineering, Hyderabad - Hyderabad, Telangana, India | LinkedIn

View Siddhartha Malladi's profile on LinkedIn, the world's largest professional community. Siddhartha's education is…

www.linkedin.com

msiddhu - Overview

Undergraduate | Machine learning enthusiast | Android App Developer | Interested in Competitive Programming and…