How topic modeling can help companies to mine customer reviews

Rolly John
IBM Data Science in Practice
5 min read · Feb 9, 2022

Today's social-media-savvy consumers love to leave feedback about products and services. They use many channels: Twitter, Facebook, the App Store, and product websites. Companies spend a lot of time and money monitoring these comments to protect their reputation, and at the same time, smart companies leverage them as a source of improvement ideas.

Last week, I was working on sentiment analysis for App Store reviews of a banking app and wanted to extract information in the most efficient way. I had two questions in mind:

  • Can I sort through the comments and find out the topics people are talking about?
  • Can I classify the reviews based on the topic?

Further research into this area led me to topic modelling.

Topic modelling is a Natural Language Processing (NLP) technique. It helps you sift through huge volumes of text and discover what the documents are talking about.

Topic modelling flow: online feedback (documents) is fed into a topic modelling algorithm (NMF, LDA, or LSA), which outputs distinct topics that are then routed to different teams

The documents are first broken down into a corpus, and then the topic model algorithm finds the probability of particular words occurring in particular topics. For example, "dog" and "bone" will appear more frequently in a document about dogs. The document may also mention cats, but in a document about dogs we can assume dog-related words will heavily outnumber cat-related ones. Topic modelling helps us capture this.

How to apply topic modelling to App Store reviews

Banks now deliver much of the customer banking experience through mobile apps, and for new-gen banking customers, a smooth mobile banking experience is a must-have. For this project, I used the Apple App Store reviews left for the net banking mobile app of a leading Indian bank. I extracted 1,000 reviews using the app-store-scraper package in Python and kept only reviews with ratings below 4, i.e. negative reviews, for analysis.
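As a sketch, the extraction and filtering step looks roughly like this. The app name and ID below are placeholders (not the real bank's), and the scraper call is shown commented out so the filtering logic can run on a small mocked sample:

```python
# The real extraction step, using app-store-scraper (placeholder app name/ID):
#
# from app_store_scraper import AppStore
# scraper = AppStore(country="in", app_name="example-bank", app_id=123456789)
# scraper.review(how_many=1000)
# reviews = scraper.reviews  # list of dicts with "rating" and "review" keys

# Mocked sample with the same shape as the scraper output:
reviews = [
    {"rating": 1, "review": "Credit card section never loads."},
    {"rating": 5, "review": "Works great for me."},
    {"rating": 3, "review": "Login asks for OTP every single time."},
]

# Keep only the negative reviews (rating below 4) for topic modelling.
negative_reviews = [r["review"] for r in reviews if r["rating"] < 4]
print(negative_reviews)
```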

Snapshot of data

A quick look at the data shows that the reviews are rich in content, both lengthy and insightful. That makes it impractical for a human to read through a thousand such reviews and pick out the topics manually.

Manually inspecting the first three comments, I find:

  • the first comment is about multiple topics — the user wants a user-friendly interface for credit card transactions, better UI for NPS and investments, and easier debit card blocking functionality.
  • the second user is also complaining about multiple topics — overall look and feel, spammy advertisements on login, and incessant customer care calls.
  • the third user is asking for a new feature — a direct credit card payment to any bank.

That took me 5–8 minutes for just three comments!

Latent Dirichlet allocation (LDA) algorithm to extract topics

LDA is the most commonly used algorithm in topic modelling, developed by David Blei, Andrew Ng, and Michael I. Jordan in 2003. It builds a topics-per-document model and a words-per-topic model, both modelled as Dirichlet distributions. As the word 'latent' suggests, LDA sets out to discover the hidden topic structure of the text.

Each document is a collection of topics with different weightage (e.g. three topics with weights 0.47, 0.22, and 0.09), and each topic is a collection of words with different weightage (e.g. one topic has dog (0.72), bone (0.1), and ball (0.07); another has cat (0.22), calico (0.1), and fish (0.07); yet another has pets (0.18), friends (0.09), and kids (0.02))

LDA assumes that each document is a collection of a few topics and each topic is a collection of a few words (or tokens). The algorithm finds the probabilistic distribution of words and topics.

Other popular algorithms include Latent Semantic Analysis or LSA and Non-negative Matrix Factorization or NMF.

Before applying the model, the review column is pre-processed through steps including:

  • Tokenization
  • Stop word removal
  • Lemmatization
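These three steps can be sketched with a minimal, dependency-free function. A real pipeline would use a proper stop-word list and a lemmatizer (e.g. NLTK's WordNetLemmatizer or spaCy); the tiny stop-word set and crude plural-stripping rule here are illustrative stand-ins:

```python
import re

# Small illustrative stop-word set; real pipelines use a full list.
STOP_WORDS = {"the", "is", "a", "an", "and", "to", "for", "of",
              "in", "my", "are", "not", "i", "it", "when"}

def preprocess(text):
    # 1. Tokenization: lower-case and split on non-letter characters
    tokens = re.findall(r"[a-z]+", text.lower())
    # 2. Stop word removal
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 3. Crude normalization: strip a plural "s" as a lemmatizer stand-in
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

print(preprocess("My credit card bills are not showing in the app."))
```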

I used the gensim package in Python for the LDA modelling.

The output of the model is 8 topics, with keyword probabilities for each topic. For example, the output for 'Topic 1' looks like this:

Topic 1: 0.062*"card" + 0.039*"credit" + 0.032*"work" + 0.032*"show" + 0.026*"use" + 0.025*"even" + 0.022*"get" + 0.021*"time" + 0.020*"check" + 0.019*"detail"

How do we interpret the output?

The model has given 8 topics. Each topic is a weighted combination of words.

For 'Topic 1', the maximum weightage (0.062) goes to the keyword 'card', followed by 'credit', indicating the topic covers issues related to credit cards.

Measuring the goodness of the model

For the above model, I got a Perplexity score of -7.31 and a Coherence score of 0.34.

Perplexity measures how well the trained model predicts data it has not seen during training; the lower the score, the better. The coherence score measures the semantic similarity between the top keywords within a topic; higher is better.

The coherence score can be improved in two ways:

  • the ‘elbow method’ to determine the optimal number of topics
  • parameter tuning

I will cover these potential improvements in future blogs.

Visual exploration: What are the dominant topics and keywords?

The first thing I am interested in is the most discussed topic across all reviews. To determine what key issues users are facing, I calculate:

  • dominant topic per review
  • weightage of dominant topic per review
Number of reviews by topic: Topic 1 leads with roughly 250 reviews; the next highest is Topic 6 with roughly 75

'Topic 1' has the maximum number of reviews and weightage, followed by 'Topic 6':

  • 'Topic 1' has top keywords 'card' and 'credit', pointing towards issues with credit (or other card) functionality
  • 'Topic 6' consists of words like 'login', 'time', and 'application', indicating issues with the account login experience

Classifying each feedback to a dominant topic

Another important step is to assign a topic to each feedback in our data set. This is done by calculating the % contribution of topics to reviews, like below:

Now I can compare the model's suggestions with the manual analysis done at the beginning:

  • the first and third reviews mention issues with the credit card features in the app; the model assigned 'Topic 1' to both.
  • the fourth review is about spending limits; the model assigned 'Topic 0', which has keywords like 'bill', 'statement', and 'update'.
  • the fifth review is about login issues, for which 'Topic 2' is an appropriate match.
  • the only mismatch I see is the second review, which is about the overall look and feel of the app, so 'Topic 1' looks wrong there.

By my judgment, overall the model worked well!

Conclusion

Topic modelling helps discover patterns and topics in huge amounts of text data efficiently. Such a tool has many practical applications for companies, such as:

  • Automatic routing of topics & reviews to relevant teams (strategy, product owner, application maintenance) for further action
  • Automation of data mining
  • Decision automation

Please check out my GitHub for complete analysis and leave your feedback.
