Amazon Phone Review Text Analysis

Jun 13, 2020

Written by Huhao (Howard) Zhu, Shengze (Chris) Xiao, Quan (Serina) Qiao, Shiyao (Stephanie) Chen, Yongyuan (Isaac) Wan

Scope of the article

There are many models and techniques that could be applied to a review dataset. However, we decided to limit our approach and focus on text analysis of customer reviews. This article aims to identify the key features of products and help businesses understand their current standing in the product market.

Why is online review important?

Online reviews matter to a business because they are key to building credibility. In general, 88% of customers trust online reviews as much as personal recommendations. Reviews also affect buying decisions: over 90% of people rely on reviews when deciding what to buy. Moreover, customer reviews are a great source of the “voice of the customer” and offer tremendous insight into what customers like and dislike about a product or service. Most important, negative reviews can be devastating for a business: 86% of people will hesitate to purchase a product that has too many negative reviews.

About Dataset

The dataset we use for this post comes from the UCSD Amazon review collection and focuses on Amazon cell phone reviews written between 2007 and 2014. It covers 100 products across 7 brands (HTC, LG, Nokia, Samsung, Sony, Apple, BlackBerry), with 10,063 reviews and 31 feature variables. Each row corresponds to a customer review. Some of the notable variables are:

  • asin — ID of the product, e.g. 0000013714
  • helpful — helpfulness rating of the review, e.g. 2/3
  • reviewText — text of the review
  • overall — product rating
  • summary — summary of the review
  • reviewTime — review date
  • title — name of the product
  • price — price in US dollars
  • brand — name of the brand
  • categories — list of categories the product belongs to

Data Mining

This section covers how we obtained the data from the internet, how we split it, and how we combined the two datasets.
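A minimal sketch of this step is below, assuming the reviews and product metadata come as the UCSD line-delimited JSON files; the file names and the merged DataFrame name `nlp` are placeholders for illustration, not necessarily the exact code used.

```python
# Sketch: load the two UCSD files (reviews + product metadata) and merge them.
# File names and the strict JSON-per-line format are assumptions.
import gzip
import json
import pandas as pd

def load_json_gz(path):
    """Read a gzipped file with one JSON object per line into a DataFrame."""
    with gzip.open(path, "rt") as f:
        return pd.DataFrame(json.loads(line) for line in f)

reviews = load_json_gz("reviews_Cell_Phones_and_Accessories.json.gz")  # assumed file name
meta = load_json_gz("meta_Cell_Phones_and_Accessories.json.gz")        # assumed file name

# Attach product-level fields (title, price, brand, categories) to each review.
nlp = reviews.merge(meta[["asin", "title", "price", "brand", "categories"]],
                    on="asin", how="left")
print(nlp.shape)
```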

Dataset overview
Dataset Preview

Data Cleaning

We access the review text as nlp.reviewText. We clean up the reviews by dropping NA values and replacing \n and all special characters with spaces. The cleaned review text is stored in a column called ‘clean_up_review’.
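A minimal sketch of this cleaning step, assuming the merged DataFrame from the previous section is called `nlp`:

```python
# Sketch: drop missing reviews and replace newlines / special characters with spaces.
import re

nlp = nlp.dropna(subset=["reviewText"])
nlp["clean_up_review"] = (
    nlp["reviewText"]
    .str.replace("\n", " ", regex=False)                 # newlines -> spaces
    .apply(lambda t: re.sub(r"[^A-Za-z0-9 ]+", " ", t))  # special characters -> spaces
)
```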

Data Visualization

First is a preliminary look at the price distribution by each brand.

Figure 1

Figure 1 shows that the majority of products are priced below $300. In detail, most of Samsung’s phones have a relatively low price (around $100-$450). In contrast, Apple is generally more expensive, with the majority of its price distribution above $500.
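A plot like Figure 1 can be sketched with seaborn; the exact chart type used for the figure is an assumption.

```python
# Sketch: price distribution for each brand (one box per brand).
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 5))
sns.boxplot(data=nlp, x="brand", y="price")
plt.title("Price distribution by brand")
plt.show()
```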

Figure 2

Figure 2 shows the rating distribution at different prices. Since most reviewed products are priced below $500, the shape of this figure is broadly the same as Figure 1. We can see that almost every price point receives the full range of rating scores, but the most common scores are 1 and 5. This suggests that customers tend to leave comments when they are either highly satisfied or strongly dissatisfied.

Figure 3

To find out which brand receives the most ratings, we visualized the rating distribution by brand. From Figure 3, Samsung has the most rating reviews, and a score of 5 is the most frequent rating across all brands.

Figure 4

As we can see from Figure 4, a major chunk of reviewers wrote reviews of about 0–99 characters. As the number of characters (the length of the review) increases, the number of reviewers decreases consistently. This suggests that most users simply do not find the need, or the time, to give a detailed description of their experience.

Figure 5

Figure 5 shows the average price and overall rating for each brand. The graph indicates that Apple has the highest average price but the lowest rating, while Nokia has the lowest average price but the highest rating. In Apple’s case, customers appear to become more cost-sensitive as the price goes up, and their ratings do not increase proportionally.

Figure 6

In Figure 6, to see the difference between brands, we created two word clouds for Nokia and Apple. First, we create a separate data frame for each of Nokia and Apple. Then we fit the reviews into WordCloud and plot them. As you can see, most of the words for Nokia are positive, such as fantastic, awesome, and perfect. For Apple, most of the words are related to price, such as overpriced and price.
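A sketch of the two word clouds, assuming the cleaned text column from the earlier step:

```python
# Sketch: one word cloud per brand, built from that brand's cleaned review text.
import matplotlib.pyplot as plt
from wordcloud import WordCloud

plt.figure(figsize=(12, 5))
for i, brand in enumerate(["Nokia", "Apple"], start=1):
    text = " ".join(nlp.loc[nlp["brand"] == brand, "clean_up_review"])
    wc = WordCloud(width=800, height=400, background_color="white").generate(text)
    plt.subplot(1, 2, i)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.title(brand)
plt.show()
```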

Methodology

We first import all the packages necessary for text processing and the prediction algorithms. In this analysis, we use both logistic regression and random forest to classify the satisfaction level of customers with their product.

We first label reviews with a rating above 3 as positive and reviews below 3 as negative, then drop ratings of exactly 3, since neutral reviews may confuse the model. Furthermore, we drop unknown ratings and create a dataframe with only the review text and a target variable, where the target is binary and represents positive versus negative reviews.
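A minimal sketch of this labelling step (column names follow the earlier cleaning step; the DataFrame name `df` is illustrative):

```python
# Sketch: keep reviews with a known, non-neutral rating and build a binary target.
import pandas as pd

labelled = nlp.dropna(subset=["overall"])
labelled = labelled[labelled["overall"] != 3]            # drop neutral ratings
df = pd.DataFrame({
    "clean_up_review": labelled["clean_up_review"],
    "target": (labelled["overall"] > 3).astype(int),     # 1 = positive, 0 = negative
})
print(df["target"].value_counts())
```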

Result:

Then we split the data into training and testing sets, where the training set contains 70% of the observations and the testing set contains 30%.

We then transform the cleaned reviews with CountVectorizer and fit the logistic regression and random forest models. We chose trigrams (ngram_range = (3,3)), so the review texts become 3-word combination vectors, because we wanted more informative features. We tried unigrams and bigrams, but neither gave features as meaningful as trigrams. We also explored even higher-order n-grams, but they did not provide additional information beyond trigrams and reduced accuracy. For stop words, we used the English list but added some uninformative adjectives (love, great, best, etc.). For the logistic regression, we first use a for loop to find the optimal parameters (the C value for logistic regression and min_df for CountVectorizer).
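A sketch of this search, assuming the labelled DataFrame `df` from above; the extra stop words and the exact parameter grid are illustrative:

```python
# Sketch: 70/30 split, trigram CountVectorizer, and a for loop over C and min_df.
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df["clean_up_review"], df["target"], test_size=0.3, random_state=42)

stop_words = list(ENGLISH_STOP_WORDS.union({"love", "great", "best"}))  # illustrative additions

best = (0.0, None)
for min_df in range(1, 6):
    vec = CountVectorizer(ngram_range=(3, 3), stop_words=stop_words, min_df=min_df)
    Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)
    for C in range(1, 11):
        clf = LogisticRegression(C=C, max_iter=1000).fit(Xtr, y_train)
        score = f1_score(y_test, clf.predict(Xte))
        if score > best[0]:
            best = (score, {"C": C, "min_df": min_df})
print(best)
```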

From the for-loop search, the optimal parameters are C = 6 and min_df = 2. We rerun the model with these parameters using the same structure, then use eli5 to show the features.

A larger positive weight, or a more negative one, means the feature is more important for predicting positive or negative reviews. We select some of the features to identify which functions matter most to customers when judging a phone.

Here are the features shown by the logistic regression model:

We then apply the same method to the random forest model; the parameters we set are n_estimators = 300 and random_state = 50.
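A sketch of the random forest variant with the eli5 feature display, reusing the trigram vectorizer and split from above:

```python
# Sketch: trigram features, random forest with the parameters given above, then eli5.
import eli5
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer(ngram_range=(3, 3), stop_words=stop_words, min_df=2)
Xtr = vec.fit_transform(X_train)

rf = RandomForestClassifier(n_estimators=300, random_state=50).fit(Xtr, y_train)
eli5.show_weights(rf, vec=vec, top=20)   # most important trigram features
```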

Again we use eli5 to show its features:

Customer Satisfaction Prediction Model

The next step is to use these algorithms to classify satisfied and unsatisfied reviews. To predict customer satisfaction, we assume that a positive review means the customer is satisfied with the product and a negative review means they are not.

In both algorithms, we apply both CountVectorizer and TfidfVectorizer with ngram_range = (1,3), which includes all unigrams, bigrams, and trigrams. We also use stop_words = 'english' and lowercase = True.

To obtain the best parameters for each model, we run a for loop over the candidate values. For logistic regression, we test C values from 0.5 to 10 in steps of 0.5; for the random forest, we test n_estimators from 100 to 1000 in steps of 100. For both CountVectorizer and TfidfVectorizer, we test min_df from 1 to 10 in steps of 1.

The example below is the for loop using logistic regression with the TfidfVectorizer.
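Here is a minimal sketch of that loop, using the parameter ranges described above; the other three vectorizer/model combinations follow the same structure.

```python
# Sketch: grid of min_df (1..10) and C (0.5..10 by 0.5) for TF-IDF + logistic regression.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

results = []
for min_df in range(1, 11):
    vec = TfidfVectorizer(ngram_range=(1, 3), stop_words="english",
                          lowercase=True, min_df=min_df)
    Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)
    for C in np.arange(0.5, 10.5, 0.5):
        clf = LogisticRegression(C=C, max_iter=1000).fit(Xtr, y_train)
        results.append((f1_score(y_test, clf.predict(Xte)), C, min_df))

print(max(results))   # best test F1 score with its C and min_df
```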

Results:

One more example uses a for loop on the random forest with the CountVectorizer.

Results:

To find the optimal model, we print the maximum test F1 score from each of the 4 for loops and compare which model performs best.

The best model is logistic regression with the TfidfVectorizer. The parameters are C = 7 for logistic regression and min_df = 2 for the TfidfVectorizer.

Comparing

First, we tokenize the words. Then we normalize them by removing stop words and lowercasing, and we use pos_tag to keep only adjectives and adverbs. Once we have the list of words, we use word2vec to find words similar to the keywords we care about. Since there are too many words, we only keep similar words that appear more than 5 times, which gives more accurate results.

After fitting the model, we find the most_similar words for the features obtained from the previous step. We then pair those features with the word “good” to get the positive adjectives associated with them. The features we chose are battery, screen, camera, and signal (originally SIM card, but it did not provide much information, so we switched to signal; likewise SD card was changed to memory).
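A sketch of this step with gensim. Here min_count=5 reflects the “appeared more than 5 times” filter; training on all normalized tokens (rather than only the adjectives/adverbs kept for the earlier analysis) is an assumption made so that the queried nouns are in the vocabulary.

```python
# Sketch: tokenize/normalize the cleaned reviews, train Word2Vec, and query
# each feature together with "good" to surface the adjectives attached to it.
import nltk
from gensim.models import Word2Vec
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")

stops = set(stopwords.words("english"))
sentences = [
    [w.lower() for w in word_tokenize(text) if w.isalpha() and w.lower() not in stops]
    for text in df["clean_up_review"]
]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

for feature in ["battery", "screen", "camera", "signal", "memory"]:
    print(feature, model.wv.most_similar(positive=[feature, "good"], topn=5))
```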

Combining this with the earlier feature-selection analysis, we arrive at a list of basic features and the customer expectations attached to them.

Clustering

To measure the similarity between brands and help a company find its potential competition in the cell phone market, we decided to use a graph showing which similar adjectives customers use to describe each phone. First, we place our six brands on the graph to get their positions. Then we add 250 words to the graph. Finally, we identify the clusters by inspecting the graph.
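A sketch of how such a map can be drawn, reusing the Word2Vec model from the previous section; projecting the vectors with PCA and choosing the 250 most frequent words are assumptions, since the projection method is not named above.

```python
# Sketch: plot brand names and the 250 most frequent words in 2D so clusters
# of shared adjectives can be read off visually.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

brands = ["iphone", "lg", "samsung", "nokia", "htc", "sony", "blackberry"]
words = list(model.wv.index_to_key[:250])                 # 250 most frequent words
labels = [w for w in brands + words if w in model.wv]

coords = PCA(n_components=2).fit_transform(np.array([model.wv[w] for w in labels]))

plt.figure(figsize=(12, 8))
plt.scatter(coords[:, 0], coords[:, 1], s=5)
for (x, y), label in zip(coords, labels):
    plt.annotate(label, (x, y), fontsize=8)
plt.show()
```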

The resulting graph shows that customers use almost the same adjectives to describe the iPhone and LG, Samsung and Nokia, and HTC and Sony. The only outlier is BlackBerry, which makes sense since BlackBerry still used a physical keypad in its products while the other brands had already given it up.

Challenges

Quality of evaluation: since all of the reviews come from Amazon, they are not written by professionals, so reviewers may not comment on phones in a technical way.

Comparison: many customers compare the product with other brands and describe those brands’ good features, which makes it hard to classify whether a comment is positive or not.

Typos: there are a lot of typos in the review text; for example, a customer who means to type “weak” may type “week” instead.

Sarcasm: people like to use sarcasm in negative comments; for example, customers who dislike the product may write “I really ‘love’ the product.” Natural language processing has a hard time recognizing sarcasm.

Insights

Companies should develop their products continuously based on customer reviews and the changes made by competitors. For example, some reviews stated that the battery of Samsung’s phones is terrible compared to Huawei’s. In this case, Samsung needs to improve its battery before its customers choose Huawei for better battery life. Online reviews help companies better understand where they stand in the market. The reviews represent how customers feel about the products, and they are a good way for companies to see where they did well and where they did not.

When the basic features are all at the same level, a standout feature can help a product rise above the rest. For instance, when Siri first came out, customers were amazed and were crazy about the iPhone at that time.

Future steps

Our dataset only contains Amazon reviews from 2007–2014. Our next step will be to analyze reviews of more recent phone models (such as the iPhone 11, Galaxy Note 10, etc.) on platforms such as Best Buy and official websites, and run the same analysis on those items. In addition, technical features such as the processor and built-in memory will also be included in our future analysis.

More detailed code and related files can be found on our GitHub: https://github.com/isaacwan95/nlp

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html

https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

https://www.invespcro.com/blog/the-importance-of-online-customer-reviews-infographic/
