Understanding customer reviews with machine learning

Jul 28, 2019

In a recent episode of one of my favourite podcasts, the guest, Ellen Loeshelle, discussed software for analyzing customer reviews and classifying sentiment on companies’ products. Why is this important? As technologies advance and industries become increasingly competitive, focus on customer experience becomes critical to maintaining loyalty and driving long-term business strategy. Companies must quickly identify where they are succeeding with customers and where there are opportunities to improve.

Inspired by the podcast episode, I set out to build my own model for classifying customer reviews as (i) positive, (ii) negative, or (iii) neutral. However, rather than implementing a rule-based approach to classifying reviewer sentiment, as proposed by Loeshelle, I used machine learning (ML) techniques, namely word2vec embeddings and random forest classification. The model correctly identified 92% of negative reviews and achieved an overall accuracy of 71%.

Acquiring Data

Now, more than ever before, conversations between businesses and customers are open dialogues. Businesses use platforms such as Facebook, Twitter and Instagram to distribute company and product information to vast audiences. Customers, in turn, use these platforms to react, share and voice their opinions on companies and their products. For large organizations offering a wide range of products and services, reading every review from a myriad of sources, including online forums, emails, social media, and call-center transcripts, is not feasible. However, using Natural Language Processing (NLP) concepts and a large corpus of customer reviews, ML algorithms can be trained to classify customer feedback automatically.

For this project, I used an open dataset containing 15,000 labeled customer reviews of airline companies, collected from users on Twitter.

The sentiment of the customer reviews is shown in Figure 1 below. Apart from Virgin America, the reviews for every airline are imbalanced, with negative reviews over-represented. Imbalanced datasets are problematic for training ML algorithms, as there is insufficient data to model the under-represented labels (in this case, ‘neutral’ and ‘positive’ reviews). In this iteration, I do not implement any techniques to deal with the class distribution; doing so would likely improve the sensitivity of the model to the under-represented classes.

**Figure 1 |** Sentiment of 15,000 Twitter users toward six airline companies. Source: https://www.kaggle.com/crowdflower/twitter-airline-sentiment
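As a minimal sketch of one such technique (not applied in this project), inverse-frequency class weights give rarer labels more influence during training. The function name `balanced_class_weights` and the toy labels are assumptions for illustration; the formula is the same heuristic scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up label distribution, skewed toward 'negative' like the airline dataset.
toy_labels = ["negative"] * 6 + ["neutral"] * 2 + ["positive"] * 2
weights = balanced_class_weights(toy_labels)
for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.2f}")
```

The rarer classes receive weights above 1 and the dominant class a weight below 1, so misclassifying a ‘neutral’ or ‘positive’ review costs the model more than misclassifying a ‘negative’ one.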

Preprocessing

Characters that appeared in the Tweets but would be problematic for training the model were removed. These included HTML tags and non-letter characters such as emojis.

Stop-words and filler-words such as “the”, “and”, “I”, etc. were also removed. These words occur frequently but do not indicate positive or negative sentiment, so they are not useful for training the model. Figure 2 below shows the most frequently used words in the dataset after this cleaning process.

**Figure 2 |** Most frequently used words in customer reviews of airlines.
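The cleaning steps described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the project's actual code: the function name `clean_tweet`, the tiny `STOP_WORDS` set, and the decision to decode HTML entities are all hypothetical choices.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "and", "i", "a", "to", "is", "my", "for", "on", "you"}

TAG_RE = re.compile(r"<[^>]+>")            # HTML tags such as <b> or </b>
NON_LETTER_RE = re.compile(r"[^a-zA-Z]+")  # digits, punctuation, emojis, etc.


def clean_tweet(text):
    """Return the lower-cased, letters-only tokens of a tweet, minus stop-words."""
    text = TAG_RE.sub(" ", text)          # drop HTML tags
    text = html.unescape(text)            # decode entities such as &amp; (an assumption)
    text = NON_LETTER_RE.sub(" ", text)   # replace every run of non-letters with a space
    return [word for word in text.lower().split() if word not in STOP_WORDS]


print(clean_tweet("@united the flight was <b>terrible</b> &amp; I waited 3 hours 😡"))
```

Each review comes out as a list of lower-case tokens, which is the form a word-embedding model expects as training input.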

From words to vectors with Word2Vec

A major challenge of analyzing free-flowing text is its inherent lack of structure. To use ML algorithms, we must first convert the unstructured text into vectors. Word2vec is an algorithm developed by Google that learns vector representations of words from a large corpus. It first constructs a vocabulary from the training text and then learns a vector for each word. Words with similar meanings end up with vectors that lie close together, so the resulting word vectors can be used as features in many NLP and ML applications.

Visualizing

Word2vec is commonly implemented with 300-dimensional vectors. Unfortunately, that is too many dimensions for people to visualize. Fortunately, we can use the t-SNE algorithm to project the word vectors into 2-D.

**Figure 3 |** 2D t-SNE representation of word2vec model trained using airline customer reviews.

Examining Figure 3, we observe that similar words form clusters. The word2vec library provides the similar_by_word() method. When provided with a word, the method returns the words whose vectors have the shortest cosine distance to it. Some example results:

Terrible: 'horrible','poor','received','awful','sucks','desk','bad','cust','rude'

Americanair: 'southwestair','united','virginamerica','jetblue','amp','usairways'

Call: 'phone','someone','hold','hung','back','speak','answer','line','online'
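To make the idea of "shortest cosine distance" concrete, here is a small pure-Python sketch of a nearest-neighbour lookup over word vectors. It is not the library's implementation: the helper names `cosine_similarity` and `most_similar` are hypothetical, and the 3-D vectors are made up (a trained model would supply vectors with hundreds of dimensions).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Made-up vectors for illustration only.
toy_vectors = {
    "terrible": [0.9, 0.1, 0.0],
    "awful": [0.8, 0.2, 0.1],
    "phone": [0.0, 0.9, 0.3],
    "call": [0.1, 0.8, 0.4],
}

for neighbour, score in most_similar("terrible", toy_vectors, topn=2):
    print(f"{neighbour}: {score:.3f}")
```

Cosine distance is one minus cosine similarity, so sorting by highest similarity is equivalent to sorting by shortest distance.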

Averaging vectors and Random Forest Classification

The dataset is split so that 80% of the data is used for training the model and the remaining 20% for testing and validation. Because each word in a customer review has its own vector, we can calculate the average word vector of each review. Each review is thus represented by a single vector that can be paired with a sentiment label (‘positive’, ‘neutral’, or ‘negative’). With that, we are ready to train our model!
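The averaging step can be sketched as follows. This is a simplified illustration rather than the project's code: the function name `average_vector`, the 2-D toy vectors, and the choice to return a zero vector when no token is in the vocabulary are assumptions.

```python
def average_vector(tokens, vectors, dim):
    """Average the vectors of the tokens that exist in the vocabulary."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        # Assumption: fall back to a zero vector for reviews with no known words.
        return [0.0] * dim
    # zip(*known) groups the i-th component of every vector together.
    return [sum(component) / len(known) for component in zip(*known)]


# Made-up 2-D vectors; a trained word2vec model would supply the real ones.
toy_vectors = {
    "late": [1.0, 0.0],
    "rude": [0.0, 1.0],
}

review = ["late", "rude", "staff"]  # 'staff' is out of vocabulary and is skipped
print(average_vector(review, toy_vectors, dim=2))
```

Every review, whatever its length, is reduced to one fixed-length feature vector, which is what a classifier such as a random forest requires as input.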

**Figure 4 |** A) Normalized confusion matrix for validating the random forest model. B) Comparison of actual and predicted results.

Using a random forest classification model with 100 trees, we achieve accuracies of 92%, 38%, and 41% for predicting negative, neutral, and positive reviews, respectively. The recall for these classes was 92%, 38%, and 43%, and the precision was 77%, 56%, and 70%, respectively. The higher accuracy on negative reviews was likely caused by the imbalance of sentiment among the reviews. Furthermore, word2vec performs best when trained on datasets containing billions of words, so better results could be expected with a larger dataset of labelled customer reviews.
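For readers unfamiliar with these metrics, the sketch below shows how per-class precision and recall, and overall accuracy, are derived from a confusion matrix. The counts are made up for illustration and are not the project's results; the function names are hypothetical. Note that the diagonal of a row-normalized confusion matrix is the per-class recall.

```python
def per_class_metrics(matrix, labels):
    """matrix[i][j] = number of reviews with true label i predicted as label j."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        actual_total = sum(matrix[i])                    # row sum: all reviews truly in class i
        predicted_total = sum(row[i] for row in matrix)  # column sum: all reviews predicted as class i
        recall = true_positive / actual_total if actual_total else 0.0
        precision = true_positive / predicted_total if predicted_total else 0.0
        metrics[label] = {"precision": precision, "recall": recall}
    return metrics


def overall_accuracy(matrix):
    """Share of all reviews whose predicted label matches the true label."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["negative", "neutral", "positive"]
# Made-up counts with a dominant 'negative' class.
toy_matrix = [
    [90, 6, 4],
    [30, 15, 5],
    [20, 5, 25],
]

for label, scores in per_class_metrics(toy_matrix, labels).items():
    print(f"{label}: precision={scores['precision']:.2f} recall={scores['recall']:.2f}")
print(f"overall accuracy: {overall_accuracy(toy_matrix):.2f}")
```

With a dominant class, overall accuracy can look reasonable even when recall on the minority classes is low, which is why the per-class figures are worth reporting separately.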

Future Steps

I look forward to developing this project further and documenting my progress.

Next, I will evaluate the techniques presented here against pre-trained sentiment analysis models. Although pre-trained models are not trained on domain-specific data, the transfer-learning approach benefits from very large corpora and combinations of complex models and techniques. Pre-trained models are also readily available and simple to implement, making them attractive to corporations that do not have the resources to develop in-house solutions.

Additionally, I plan to implement named entity recognition to identify the subject of each review. This could provide insight into sentiment around specific products and services and generate suggestions to businesses on areas for improvement. I have not reviewed the literature on this topic yet and welcome any suggestions on how to approach this problem!

Thank you for reading. Please share any comments and thoughts below, or get in touch via LinkedIn.

Alex Ianovski - Automation Engineer - Rogers Communications | LinkedIn

www.linkedin.com

Resources

ianovski/customer-review-sentiment

github.com

References

Data Skeptic on Apple Podcasts. podcasts.apple.com

Twitter US Airline Sentiment. Analyze how travelers in February 2015 expressed their feelings on Twitter. www.kaggle.com

Python for NLP: Sentiment Analysis with Scikit-Learn. stackabuse.com

Code. code.google.com

Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings with t-SNE. towardsdatascience.com

Visualizing Word Vectors with t-SNE. Using data from Quora Question Pairs. www.kaggle.com

Sentiment analysis using word2vec. Using data from imdb Dataset. www.kaggle.com

Confusion matrix - scikit-learn 0.21.2 documentation. scikit-learn.org

Written by Alex Ianovski