Understanding customer reviews with machine learning
In a recent episode of one of my favourite podcasts, the guest, Ellen Loeshelle, discussed software for analyzing customer reviews and classifying sentiment toward companies’ products. Why is this important? As technologies advance and industries become increasingly competitive, a focus on customer experience becomes critical to maintaining loyalty and driving long-term business strategy. Companies must quickly identify where they are succeeding with customers and where there are opportunities to improve.
Inspired by the podcast episode, I set out to build my own model for analyzing customer reviews and classifying them as (i) positive, (ii) negative, or (iii) neutral. However, rather than implementing a rule-based approach to classify reviewer sentiment, as proposed by Loeshelle, I used machine learning (ML) algorithms, namely word2vec and random forest classification. The model correctly identified 92% of negative reviews and achieved an overall accuracy of 71%.
Acquiring Data
Now, more than ever before, conversations between businesses and customers are open dialogues. Businesses use platforms such as Facebook, Twitter and Instagram to distribute company and product information to vast audiences. Customers, in turn, use these platforms to react, share and voice their opinions on companies and their products. For large organizations offering a wide range of products and services, reading every review from a myriad of sources, including online forums, emails, social media, and call-center transcripts, is not feasible. However, using Natural Language Processing (NLP) concepts and a large corpus of customer reviews, ML algorithms can be trained to classify customer feedback automatically.
For this project, I used an open dataset containing 15,000 labeled customer reviews of airline companies, collected from users on Twitter.
The sentiment of the customer reviews is shown in Figure 1 below. Apart from Virgin America, customer sentiment on the airlines is imbalanced, with negative reviews over-represented. Imbalanced datasets are problematic for training ML algorithms, as there is insufficient data to model the under-represented labels (in this case, ‘neutral’ and ‘positive’ reviews). In this iteration, I did not apply any techniques to address the class imbalance; doing so would likely improve the model’s sensitivity to the under-represented classes.
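To make the imbalance concrete, the share of each sentiment label can be computed per airline. The snippet below is only a minimal sketch: the rows are made up for illustration (they are not taken from the dataset), and the function name `sentiment_shares` is my own.

```python
from collections import Counter, defaultdict

def sentiment_shares(rows):
    """Return {airline: {sentiment: fraction of that airline's reviews}}."""
    counts = defaultdict(Counter)
    for airline, sentiment in rows:
        counts[airline][sentiment] += 1
    return {
        airline: {label: n / sum(c.values()) for label, n in c.items()}
        for airline, c in counts.items()
    }

# Hypothetical (airline, sentiment) rows; not real data.
rows = [
    ("united", "negative"), ("united", "negative"),
    ("united", "negative"), ("united", "neutral"),
    ("virginamerica", "negative"), ("virginamerica", "positive"),
]
shares = sentiment_shares(rows)
print(shares["united"])
```

A table like this makes it easy to spot which labels would need re-sampling or class weighting in a later iteration.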
Preprocessing
Characters that appeared in the Tweets but would be problematic for training the model were removed. These included HTML tags and non-letter characters such as emojis.
Stop-words and filler-words such as “the”, “and”, and “I” were also removed. These words occur frequently but do not indicate positive or negative sentiment, so they are not useful for training the model. Figure 2 below shows the most frequently used words in the dataset after this cleaning process.
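The cleaning steps described above could look roughly like the following. This is a sketch under my own assumptions rather than the exact pipeline used for the project: the stop-word list is a tiny illustrative subset, the regular expressions are simplified, and `clean_tweet` is a hypothetical helper name.

```python
import html
import re

# A tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "and", "i", "a", "to", "is", "my", "was", "for"}

TAG_RE = re.compile(r"<[^>]+>")          # HTML tags such as <b> or </b>
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # anything that is not a-z or whitespace

def clean_tweet(text):
    """Lower-case a tweet, strip tags and non-letters, drop stop-words."""
    text = html.unescape(text)                   # turn &amp; into & first
    text = TAG_RE.sub(" ", text)                 # remove HTML tags
    text = NON_LETTER_RE.sub(" ", text.lower())  # remove digits, symbols, emojis
    return [word for word in text.split() if word not in STOP_WORDS]

# "\U0001F621" is an emoji, written as an escape to keep the source ASCII-only.
tweet = "@united the flight was <b>terrible</b> &amp; I missed my connection \U0001F621"
print(clean_tweet(tweet))
```

Returning a list of tokens rather than a string keeps the output in the shape that a word-embedding model typically expects as input.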
From words to vectors with Word2Vec
A major challenge of analyzing free-flowing text is its inherent lack of structure. In order to use ML algorithms, we must first convert the unstructured text into vectors. Word2vec is an algorithm developed by Google which converts the words in a large corpus into vectors. It first constructs a vocabulary from the training text and then learns a vector representation of each word. Words with similar meanings end up with vectors that lie close together, so the resulting word vectors can be used as features in many NLP and ML applications.
Visualizing
Word2vec is commonly implemented with 300-dimensional vectors. Unfortunately, that is far too many dimensions for people to visualize. Fortunately, we can use the t-SNE algorithm to project the word vectors into 2-D.
Examining Figure 3, we observe that similar words form clusters. The word2vec library provides the similar_by_word() method. When given a word, the method returns the words whose vectors have the smallest cosine distance to it. Here are the results for a few examples:
Terrible: 'horrible','poor','received','awful','sucks','desk','bad','cust','rude'
Americanair: 'southwestair','united','virginamerica','jetblue','amp','usairways'
Call: 'phone','someone','hold','hung','back','speak','answer','line','online'
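To show what ranking by cosine distance means, here is a hand-rolled sketch of the idea behind a lookup like similar_by_word(). It is not the library's implementation: the 3-D vectors are chosen by hand for illustration, and `cosine_similarity` and `most_similar` are hypothetical helpers. Cosine distance is one minus cosine similarity, so the smallest distance corresponds to the highest similarity.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical hand-picked vectors; real word2vec vectors are learned.
toy_vectors = {
    "terrible": [0.9, 0.1, 0.0],
    "awful":    [0.8, 0.2, 0.1],
    "phone":    [0.0, 0.9, 0.3],
    "call":     [0.1, 0.8, 0.4],
}
print(most_similar("terrible", toy_vectors, topn=2))
```

In this toy vocabulary, 'awful' ranks first for 'terrible' because the two vectors point in almost the same direction.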
Averaging vectors and Random Forest Classification
The dataset is split so that 80% of the data is used for training the model and the remaining 20% for testing and validation. Since each word in a customer review has its own vector, we can average the word vectors in a review to obtain a single vector for that review. Each review vector can then be paired with its sentiment label (‘positive’, ‘neutral’, or ‘negative’). We are now ready to train our model!
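The averaging and splitting steps can be sketched as follows. The vectors are made-up 2-D values, the helper names are hypothetical, and mapping a review with no known words to the zero vector is an assumption made for this sketch, not something taken from the project.

```python
import random

def average_vector(tokens, vectors, dim):
    """Mean of the vectors of the tokens that appear in `vectors`.

    Out-of-vocabulary tokens are skipped; a review with no known tokens
    maps to the zero vector (an assumption made for this sketch).
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(component) / len(known) for component in zip(*known)]

def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and split it 80% / 20%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

# Hypothetical 2-D word vectors; real ones are learned by word2vec.
toy_vectors = {
    "flight": [1.0, 0.0],
    "late":   [0.0, 1.0],
    "great":  [-1.0, 0.0],
}
review = ["flight", "late", "again"]  # "again" is out of vocabulary
print(average_vector(review, toy_vectors, dim=2))

train, test = split_80_20(range(10))
print(len(train), len(test))
```

Averaging discards word order, which keeps the features simple but means that "not good" and "good not" produce the same review vector.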
Using a random forest classification model with 100 trees, we achieve an accuracy of 92%, 38%, and 41% for predicting negative, neutral, and positive reviews, respectively. The recall for these classes was 92%, 38%, and 43%, respectively, and the precision was 77%, 56%, and 70%, respectively. The higher accuracy on negative reviews is likely due to the imbalance of sentiment among the reviews, since the model sees far more negative examples during training. Furthermore, word2vec performs best when trained on datasets containing billions of words, so better results could be expected from a larger dataset of labelled customer reviews.
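For readers less familiar with these metrics, the sketch below shows how per-class precision and recall can be computed from true and predicted labels. The labels are invented for illustration and do not reproduce the results above; `per_class_metrics` is a hypothetical helper.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall)} for every label in y_true."""
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    metrics = {}
    for label in true_counts:
        # Recall: share of reviews with this true label that were found.
        recall = correct[label] / true_counts[label]
        # Precision: share of predictions of this label that were right.
        predicted = pred_counts[label]
        precision = correct[label] / predicted if predicted else 0.0
        metrics[label] = (precision, recall)
    return metrics

# Hypothetical labels; the classifier over-predicts the majority class.
y_true = ["negative"] * 4 + ["neutral"] * 2 + ["positive"] * 2
y_pred = ["negative"] * 4 + ["negative", "neutral"] + ["negative", "positive"]

metrics = per_class_metrics(y_true, y_pred)
for label, (precision, recall) in metrics.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy example the majority class gets perfect recall but lower precision, because minority-class reviews are misclassified as negative.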
Future Steps
I look forward to developing this project further and documenting my progress.
Next, I will evaluate the techniques presented here against pre-trained sentiment analysis models. Although pre-trained models are not trained on domain-specific data, the transfer-learning approach benefits from very large corpora and combinations of complex models and techniques. Pre-trained models are also readily available and simple to implement, making them attractive to corporations that do not have the resources to develop in-house solutions.
Additionally, I plan to implement named entity recognition to identify the subject of each review. This could provide insight into sentiment around specific products and services and generate suggestions to businesses on areas for improvement. I have not reviewed the literature on this topic yet and welcome any suggestions on how to approach this problem!
Thank you for reading. Please share any comments and thoughts below, or get in touch via LinkedIn.