Blibli Future Program Batch 5 — Data Track Phase 2: Data Science

Topic modelling and deep learning classifier of e-commerce app reviews

Feb 22, 2022

Previously, we scraped app reviews from the App Store and Play Store, then analyzed and visualized that data. Now we will use topic modelling to find the topics of those reviews and build a deep learning classifier that assigns each review to a topic.

Finding Topics of App Reviews

We tried both topic modelling and clustering to find the topics of these app reviews. Topic modelling is a type of statistical modelling for discovering the abstract “topics” that occur in a collection of documents. In this article, we use LDA as our topic modelling method and KMeans as our clustering method.

Before creating models, we need to preprocess the review text. Below are the steps taken:

Removing punctuation
Removing stop words and lowercasing reviews
Removing emojis and non-ascii characters
Keeping only reviews with 3 words or more
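
The steps above can be sketched in plain Python. This is a minimal sketch, not the exact pipeline used in this article: the stop-word set `STOP_WORDS` is a tiny hypothetical sample (a real run would load a full Bahasa Indonesia list), the function name `preprocess` is ours, and both the order of the steps and the choice to count words after stop-word removal are assumptions.

```python
import string

# Hypothetical, tiny stop-word sample; a real pipeline would load a full
# Bahasa Indonesia stop-word list.
STOP_WORDS = {"yang", "dan", "di", "ini", "itu", "saya"}

# Map every punctuation character to a space so "bagus,cepat" stays two words.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(review, min_words=3):
    """Clean one review; return None if fewer than `min_words` words remain."""
    text = review.lower()
    # Drop emojis and any other non-ASCII characters.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    text = text.translate(PUNCT_TO_SPACE)
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words) if len(words) >= min_words else None


cleaned = preprocess("Aplikasi ini BAGUS, pengiriman cepat! 😀")
too_short = preprocess("Bagus!!")
```

Reviews for which `preprocess` returns `None` would simply be dropped before modelling.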

Clustering Model: KMeans

We experimented with 2 different text representations, TF-IDF and FastText (Bahasa) word embeddings, and with several numbers of clusters. Below is our workflow:

Text pre-processing
Apply TF-IDF or FastText embedding model to the text reviews
Create KMeans models with different numbers of clusters and inspect the elbow plot (sum of squared distances)
Find the words closest to each cluster’s centroid to identify the topic of that cluster. If the topic can’t be determined this way, count the most frequent words in each cluster instead.
Sample reviews from each cluster and check the resulting clusters manually
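
To make the elbow step concrete, here is a toy, standard-library-only sketch of TF-IDF vectors plus KMeans (Lloyd’s algorithm), reporting the sum of squared distances for several cluster counts. It is not the code used in this article; a real run would normally rely on a machine learning library. The function names, the TF-IDF weighting variant, and the four sample reviews are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised documents into dense TF-IDF vectors (toy version)."""
    vocab = sorted({w for doc in docs for w in doc})
    doc_freq = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log(len(docs) / doc_freq[w]) + 1.0 for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[w] / len(doc) * idf[w] for w in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(vectors, k, n_iter=20, seed=0):
    """Run Lloyd's algorithm; return the sum of squared distances to the nearest centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return sum(min(sq_dist(v, c) for c in centroids) for v in vectors)


# Made-up, already preprocessed reviews (illustrative only).
docs = [
    "gratis ongkir murah".split(),
    "promo diskon harga".split(),
    "aplikasi lancar buka".split(),
    "pelatihan prakerja skill".split(),
]
vocab, vectors = tfidf_vectors(docs)
inertias = {k: kmeans_inertia(vectors, k) for k in range(1, 5)}
```

Plotting `inertias` against `k` gives the elbow curve; a sharp bend would suggest a natural number of clusters.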

Looking at the graphs below, we decided to try 3–5 clusters: the sum of squared distances shows no significant drop at any particular number of clusters, and 3–5 is a manageable number of classes for the classifier later.

After checking the samples and top words of each cluster manually, we found that KMeans with 3, 4, or 5 clusters, using either TF-IDF or FastText embeddings, is not viable as a source of training data. The resulting clusters are not comprehensible to a human, and no cluster covers a specific topic as desired.

Topic Modelling with LDA

Another way to find topics besides clustering is topic modelling. In this article, we use Latent Dirichlet Allocation (LDA). Below are the steps taken:

Text pre-processing
Create LDA models with different numbers of topics and compare their coherence scores

Find the most important words in each topic

Example for 4 topics:

Topic: 0 Words: [‘suka’, ‘ilmu’, ‘aplikasinya’, ‘download’, ‘hadiah’, ‘lancar’, ‘muncul’, ‘jaya’, ‘buka’, ‘usaha’]
Topic: 1 Words: [‘membantu’, ‘mudah’, ‘terimakasih’, ‘online’, ‘membeli’, ‘sukses’, ‘kartu’, ‘pembelian’, ‘berbelanja’, ‘transaksi’]
Topic: 2 Words: [‘pelatihan’, ‘prakerja’, ‘kasih’, ‘terima’, ‘semoga’, ‘cepat’, ‘skill’, ‘beli’, ‘academy’, ‘sukses’]
Topic: 3 Words: [‘bermanfaat’, ‘gratis’, ‘ongkir’, ‘murah’, ‘promo’, ‘memudahkan’, ‘pengguna’, ‘harga’, ‘mengikuti’, ‘diskon’]

Label each review with a topic. We label a review with the topic number that has the highest percentage contribution in that review.
Sample reviews from each topic and check them manually.
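
The labelling step can be illustrated as follows. This is a minimal sketch under one assumption: that the LDA model gives, for each review, a list of `(topic_id, contribution)` pairs. The contribution values below are made up, and the function name `dominant_topic` is hypothetical.

```python
def dominant_topic(topic_contributions):
    """Return the id of the topic with the highest contribution in one review."""
    topic_id, _ = max(topic_contributions, key=lambda pair: pair[1])
    return topic_id


# Made-up per-review topic contributions (illustrative only).
reviews = {
    "gratis ongkir promo murah": [(0, 0.05), (1, 0.10), (2, 0.05), (3, 0.80)],
    "pelatihan prakerja bermanfaat": [(0, 0.10), (1, 0.15), (2, 0.60), (3, 0.15)],
}
labels = {text: dominant_topic(contribs) for text, contribs in reviews.items()}
```

The resulting `labels` mapping is what would serve as training data for the classifier.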

According to the coherence score, LDA with 3 and 5 topics seems promising. We will also try 4 topics, as the difference in coherence score is not that big (around 0.04).

After labelling and sampling each model, the LDA model with 4 topics produced clusters that are comprehensible to a human, each containing a somewhat specific topic. LDA with 5 topics produced a highly imbalanced class distribution, while LDA with 3 topics produced topics that are too general and may not be useful for further analysis. Therefore, we use LDA with 4 topics as our final model and use the labelled data to train a deep learning classifier.

Below are the resulting topics:

Topic 1: Aplikasi (app)
Words: suka, ilmu, aplikasinya, download, hadiah, lancar, muncul, jaya, buka, usaha
Topic 2: Transaksi dan Pembayaran (transactions and payment)
Words: membantu, mudah, terimakasih, online, membeli, sukses, kartu, pembelian, berbelanja, transaksi
Topic 3: Pelatihan Prakerja, Pengiriman, Pelayanan (Prakerja training, delivery, service)
Words: pelatihan, prakerja, kasih, terima, semoga, cepat, skill, beli, academy, sukses
Topic 4: Ongkir dan Promosi (shipping costs and promotions)
Words: bermanfaat, gratis, ongkir, murah, promo, memudahkan, pengguna, harga, mengikuti, diskon

Deep Learning Classifier

We tried 3 different models:

CNN
BiGRU with normal embedding layer
BiGRU with pre-trained embedding layer using FastText Bahasa

Before feeding the data into those models, we preprocess the text using the same steps as in our topic modelling. Then we tokenize the reviews into sequences of integers and pad those sequences to the same length.
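
Tokenizing and padding can be sketched without any deep learning library. This is a minimal sketch, not the code used in this article: the function names are hypothetical, and reserving id 0 for padding, id 1 for unknown words, and padding at the end of each sequence are our assumptions.

```python
from collections import Counter

PAD_ID = 0  # assumed id for padding positions
OOV_ID = 1  # assumed id for words not seen when building the vocabulary


def build_vocab(texts, max_words=None):
    """Map each word to an integer id, most frequent words first."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def texts_to_padded(texts, vocab, max_len):
    """Convert texts to integer sequences, truncated or zero-padded to `max_len`."""
    padded = []
    for text in texts:
        ids = [vocab.get(word, OOV_ID) for word in text.split()][:max_len]
        padded.append(ids + [PAD_ID] * (max_len - len(ids)))
    return padded


# Made-up, already preprocessed reviews (illustrative only).
train_texts = ["gratis ongkir murah", "promo gratis"]
vocab = build_vocab(train_texts)
sequences = texts_to_padded(train_texts, vocab, max_len=4)
```

Every row of `sequences` has the same length, which is what the embedding layer of each model expects as input.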

For the first model, we use a Convolutional Neural Network (CNN). Below is the model architecture:

After trying multiple activation functions and kernel sizes, the tanh activation function with a kernel size of 1 gave the model with the highest test accuracy, 92.60%.

Accuracy for each activation function and kernel size

The next model is a Bidirectional GRU (BiGRU) with a normal embedding layer from TensorFlow. Below is the model architecture:

This second model achieves a test accuracy of 94.80%!

Finally, we use a pre-trained FastText word embedding model on our text and transfer its weights into our model’s embedding layer. This model has the same architecture as the previous one; the only difference is that the embedding layer takes its weights from the FastText model.
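
Transferring pre-trained weights usually comes down to building a matrix whose row `i` holds the vector of the word with token id `i`. The sketch below shows that idea in plain Python. It is an illustration only: the 3-dimensional vectors are made up (real FastText vectors are much larger), the function name is hypothetical, and leaving missing words as all-zero rows is an assumption.

```python
def build_embedding_matrix(word_index, word_vectors, dim):
    """Return a matrix where row i is the pre-trained vector of the word with id i."""
    n_rows = max(word_index.values()) + 1  # ids start at 0, so add one row
    matrix = [[0.0] * dim for _ in range(n_rows)]
    for word, idx in word_index.items():
        vector = word_vectors.get(word)
        if vector is not None:  # words missing from the pre-trained model stay all-zero
            matrix[idx] = list(vector)
    return matrix


# Hypothetical token ids (0 and 1 reserved) and made-up 3-dimensional vectors.
word_index = {"gratis": 2, "ongkir": 3, "blibli": 4}
word_vectors = {"gratis": [0.1, 0.2, 0.3], "ongkir": [0.4, 0.5, 0.6]}
embedding_matrix = build_embedding_matrix(word_index, word_vectors, dim=3)
```

Such a matrix would then be used as the initial weights of the model’s embedding layer.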

This third model achieves a test accuracy of 94.26%!

After comparing these models, we chose the second model (BiGRU with a normal embedding layer) as our final model, as it has the highest test accuracy.

Next, we analyze and compare 6 different e-commerce apps and how the topics discussed in their reviews differ.

Analyzing App Reviews and Their Topics

We analyze review data from 6 different apps: Bukalapak, Tokopedia, Shopee, Blibli, Zalora, and Lazada. The reviews were taken from the App Store and Play Store, from November 2021 until January 2022.

1. % of reviews with topic: Transaksi dan Pembayaran

2. % of reviews with topic: Pengiriman dan Pelayanan

3. % of reviews with topic: Aplikasi

4. % of reviews with topic: Ongkir dan Promosi
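
The percentages in these four charts can be computed along the following lines. This is a minimal sketch with made-up records and hypothetical app names; each record is assumed to be an `(app, predicted_topic)` pair produced by the classifier.

```python
from collections import Counter, defaultdict


def topic_share_by_app(records):
    """Map app -> {topic: percentage of that app's reviews labelled with the topic}."""
    counts = defaultdict(Counter)
    for app, topic in records:
        counts[app][topic] += 1
    return {
        app: {
            topic: 100.0 * n / sum(topic_counts.values())
            for topic, n in topic_counts.items()
        }
        for app, topic_counts in counts.items()
    }


# Made-up (app, predicted topic) pairs; not real review data.
records = [
    ("App A", "Ongkir dan Promosi"),
    ("App A", "Ongkir dan Promosi"),
    ("App A", "Aplikasi"),
    ("App A", "Transaksi dan Pembayaran"),
    ("App B", "Aplikasi"),
]
shares = topic_share_by_app(records)
```

Comparing `shares[app][topic]` across apps for a fixed topic gives one bar chart per topic.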

Looking at those 4 graphs, we can see that Blibli has the highest percentage of reviews in the topic Ongkir dan Promosi compared to the other e-commerce apps. Next, we want to find out whether people talk a lot about ongkir and promosi because they are satisfied with them or because they dislike them.

By checking the score of each review, we can determine its sentiment: scores of 1 and 2 are negative, a score of 3 is neutral, and scores of 4 and 5 are positive. Below is the sentiment of Blibli’s reviews with the topic Ongkir dan Promosi:

We can see that the sentiment is split roughly in half, with only 51% of the reviews satisfied with Blibli’s ongkir dan promosi. Comparing this number to the other e-commerce apps, Blibli has the lowest positive sentiment in Ongkir dan Promosi reviews. With this, we can conclude that Blibli customers talk about ongkir dan promosi more than other e-commerce customers because many of them are not satisfied with Blibli’s ongkir dan promosi.
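
The score-to-sentiment mapping described above can be sketched as follows. The function names and the sample scores are hypothetical; only the 1&2 / 3 / 4&5 split comes from this article.

```python
from collections import Counter


def score_to_sentiment(score):
    """Map a 1-5 star rating to a sentiment label."""
    if score in (1, 2):
        return "negative"
    if score == 3:
        return "neutral"
    if score in (4, 5):
        return "positive"
    raise ValueError(f"unexpected score: {score!r}")


def sentiment_share(scores):
    """Return the percentage of reviews per sentiment label."""
    counts = Counter(score_to_sentiment(s) for s in scores)
    return {label: 100.0 * n / len(scores) for label, n in counts.items()}


# Made-up star ratings for one app and one topic (illustrative only).
sample_scores = [5, 4, 1, 3]
share = sentiment_share(sample_scores)
```

Running `sentiment_share` on the scores of each app’s Ongkir dan Promosi reviews allows the positive share to be compared across apps.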

Further analysis can be done by a marketing team to check whether these app store reviews correlate with Blibli’s marketing campaigns. And if ongkir dan promosi are part of Blibli’s brand identity, improving the customer experience around them can be vital to meeting customers’ expectations.