Stargazing with Machine Learning — Employing Structured Topic Modelling to Unveil Customer Satisfaction Drivers on Trustpilot

Published in TechLabs, Nov 28, 2020

This project was carried out as part of the TechLabs “Digital Shaper Program” in cooperation with the Marketing Center Münster (Term 2020/01).

Abstract: Traditionally, businesses had to invest in market research to find out exactly why their customers were buying or leaving. Fast forward to 2020: many social media platforms function as hubs where customers deliberately state and explain how satisfied they are with the companies they have interacted with. In this project we showed that companies can now employ machine learning to identify the drivers of customer satisfaction, as stated by the customers themselves. How did we do it? 1) We crawled over 250,000 reviews on Trustpilot. 2) We applied latent Dirichlet allocation-based topic modelling to the review texts of two particular companies. 3) We uncovered the relationship between topics and star ratings by means of a logistic regression, thereby estimating the effect of each satisfaction driver.

Data Collection

To access the data from Trustpilot, we programmed a crawler using the R package rvest. Because we wanted the data collection to be as efficient and reliable as possible, we built the crawler to have all of the following traits:

· Given a company’s Trustpilot page, automatically scrape all reviews across all available subpages (each company page has multiple subpages with 20 reviews each; the largest company in our sample had 80,000 reviews across 4,000 subpages!)

· For each review, crawl a specified set of relevant data points, such as the text, the star rating and the date

· Given a pre-specified date period, crawl only reviews that were posted within that period

· Given a list of company pages, automatically scrape all customer reviews available for each of those companies within the pre-specified time period

· If a problem occurs while crawling a particular review, print and save the error log and jump to the next review instead of aborting the whole crawling process

· If a particular subpage takes too long to load, jump to the next page instead of waiting indefinitely

· Pause (“sleep time”) between subpages to avoid putting excessive load on the domain

· Save all data frames as Excel sheets into a pre-specified folder, so that the complete crawling process runs autonomously and overnight

· Various convenience features, such as printing the estimated time to scrape all reviews, progress bars and saving errors into a log list
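The control flow behind these traits can be sketched in a few lines. The project itself used R and rvest; the Python sketch below only illustrates the loop structure (date filter, per-review and per-page error handling, sleep time) and is not the original crawler. The function `fetch_page` is a hypothetical stand-in for the download-and-parse step, and no real Trustpilot URL or page structure is assumed.

```python
import time
from datetime import date


def crawl_company(fetch_page, n_subpages, start, end, sleep_seconds=1.0):
    """Collect reviews from all subpages, keeping those dated within [start, end].

    fetch_page(page_no) is a hypothetical callable that returns a list of dicts
    with the keys "text", "stars" and "date"; it may raise, e.g. on a timeout.
    """
    reviews, error_log = [], []
    for page_no in range(1, n_subpages + 1):
        try:
            page_reviews = fetch_page(page_no)
        except Exception as exc:  # slow or broken subpage: log it and move on
            error_log.append((page_no, None, repr(exc)))
            continue
        for idx, review in enumerate(page_reviews):
            try:
                if start <= review["date"] <= end:
                    reviews.append({
                        "text": review["text"],
                        "stars": int(review["stars"]),
                        "date": review["date"],
                    })
            except Exception as exc:  # malformed review: log it, go to the next one
                error_log.append((page_no, idx, repr(exc)))
        time.sleep(sleep_seconds)  # pause between subpages to go easy on the site
    return reviews, error_log


def fake_fetch(page_no):
    """Toy replacement for a real page download, used only for this demo."""
    if page_no == 2:
        raise TimeoutError("subpage took too long")
    return [
        {"text": "ok", "stars": "5", "date": date(2019, 5, 1)},
        {"text": "too old", "stars": "1", "date": date(2017, 1, 1)},
        {"text": "broken", "stars": None, "date": date(2019, 6, 1)},
    ]


reviews, errors = crawl_company(
    fake_fetch, 3, date(2018, 6, 1), date(2020, 7, 14), sleep_seconds=0
)
print(len(reviews), "reviews kept,", len(errors), "log entries")
```

Passing the fetch step in as a function keeps the loop testable without network access; a real crawler would additionally need a request timeout inside that step.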

**One line to rule them all — our scraper with all input parameters needed to crawl a quarter million reviews on Trustpilot**

Finally, to have a sufficient number of observations available, we collected a quarter million reviews posted between June 1st, 2018, and July 14th, 2020. We then performed exploratory analyses on the complete data set, compared different companies and critically assessed their suitability as research objects for our project. As a result of this comparative analysis, otto.de as a multi-brand retail platform and vinos.de as a niche player in the wine shipping segment emerged as promising research objects.

**Descriptive statistics for both otto.de and vinos.de**

Topic Modelling

Initially, we ran a latent Dirichlet allocation-based topic modelling algorithm on all crawled texts. However, we saw that enriching the topic model with relevant covariates improved the quality of the topics. We therefore performed a sentiment analysis on all scraped reviews and integrated the sentiment polarity into the topic model, essentially following a so-called structured topic modelling approach (known in the literature as the structural topic model, STM). Beforehand, we validated three different dictionaries (GPC, SentiWS and NRC) on a random subsample of our text corpus and found the GPC lexicon to agree most closely with our own judgement. We then tuned the hyperparameters of the topic model systematically by scoring each parameter setting on five quality metrics (for the nerds: semantic coherence, held-out likelihood, exclusivity, residuals and lower bound).
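To make the dictionary validation step more concrete, here is a minimal Python sketch of lexicon-based polarity scoring and a simple agreement check against human labels. It rests on several assumptions: the two toy lexicons and the example texts are invented for illustration (they are not GPC, SentiWS or NRC, and the real reviews were not necessarily in English), and "congruence" is read here as the plain share of matching labels, which may differ from the measure used in the project.

```python
import re


def polarity(text, lexicon):
    """Sum the lexicon scores of all tokens; the sign gives the polarity label."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def agreement(texts, human_labels, lexicon):
    """Share of texts where the lexicon label matches the human label."""
    hits = sum(polarity(t, lexicon) == h for t, h in zip(texts, human_labels))
    return hits / len(texts)


# Toy lexicons invented for illustration only (not GPC, SentiWS or NRC).
lexicon_a = {"fast": 1.0, "great": 1.0, "late": -1.0, "broken": -1.0}
lexicon_b = {"great": 1.0, "broken": -1.0}

# Hypothetical hand-labelled subsample.
texts = ["Fast delivery, great service", "Parcel arrived late", "Item was broken"]
human = ["positive", "negative", "negative"]

scores = {
    name: agreement(texts, human, lex)
    for name, lex in [("A", lexicon_a), ("B", lexicon_b)]
}
best = max(scores, key=scores.get)  # lexicon with the highest agreement
print(scores, best)
```

The resulting polarity label (or the underlying score) is the kind of document-level covariate that can then be passed to the topic model.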

**Six of the fifteen identified topics for otto.de along with their most exclusive (FREX) and probable (PROB) words per topic**

All in all, we identified 15 topics for otto.de, in which we were able to recognize different overarching elements of OTTO’s service. The topics ranged from evaluations of the delivery management to critical assessments of the price levels at the online shop (see the above table for a subset of the topics and their most characteristic words). For vinos.de, we identified 11 different topics, most of which focused on different aspects of product delivery, highlighting that this aspect is of particular importance to VINOS’ clients. A t-SNE projection into a reduced-dimensional space shows that the reviews cluster cleanly into inherently different aspects. Hence, at this point, our analysis showed that it is possible to clearly identify overarching topics in an enormous set of unstructured textual data.

**All 23,820 reviews of the otto.de corpus visualized on a (t-SNE) reduced dimensional space, colored according to their topic assignment**

Analyzing the Impact of Topics on Overall Satisfaction Levels

After identifying several latent topics in the customer reviews, we estimated a logistic regression model to determine the effect of each topic on customer satisfaction. We treated a review’s star rating as a proxy for customer satisfaction and used it as the dependent variable. A high star rating (four or five stars) indicates that a customer is generally satisfied. Conversely, customers who gave the company a low star rating (one or two stars) were considered dissatisfied overall. Star ratings were therefore recoded into a binary variable.
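The recoding step can be sketched as follows (in Python, for illustration; the project worked in R). The text does not say how three-star reviews were treated, so returning `None` for them is purely an assumption of this sketch, as is the function name `recode_stars`.

```python
def recode_stars(stars):
    """Map a 1-5 star rating to 1 (satisfied), 0 (dissatisfied) or None."""
    if stars in (4, 5):
        return 1
    if stars in (1, 2):
        return 0
    if stars == 3:
        # Assumption: the article does not state how 3-star reviews were handled.
        return None
    raise ValueError(f"unexpected star rating: {stars!r}")


ratings = [5, 4, 3, 2, 1]
binary = [recode_stars(s) for s in ratings]
print(binary)  # [1, 1, None, 0, 0]
```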

For both companies we identified significant effects of the topics on overall satisfaction levels (see the figure below for a subset of OTTO’s topics with their respective effects on customer satisfaction ratings). Importantly, although both companies have excellent average ratings, it is possible to pinpoint issues that dissatisfy a subset of customers, such as sloppy communication of payment problems in the case of otto.de. Summarizing the results for OTTO and VINOS, we learned several things:

· Through advanced machine learning methods, companies are very much capable of identifying overarching topics within the chatter and noise on social media platforms

· There is a set of indicators that signal whether a company’s review corpus is suitable for such a text analysis

· Once interpretable topics have been identified, it is possible to determine their effect on overall star ratings

· With these effects, companies can unveil not only what their clients are talking about, but also how important the different aspects are to them

· Interestingly, even for companies with a near-perfect average rating, the methods can still accurately identify negative satisfaction drivers

**Effects (x-axis) of 7 of the 15 previously identified OTTO topics on customer satisfaction levels as measured with the star ratings. Note that the line on the 1.0 mark separates negative (left side) from positive effects (right side).**
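A reference line at 1.0 suggests that the effects are reported as odds ratios, i.e. exponentiated logistic regression coefficients; that reading is an assumption, as the text does not say so explicitly. Under that assumption, the short Python sketch below shows how a coefficient maps to an odds ratio and to a side of the 1.0 line. The coefficients and topic names are hypothetical and are not results from the project.

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient (log-odds) into an odds ratio."""
    return math.exp(coefficient)


def direction(ratio, tol=1e-9):
    """Classify an odds ratio relative to the 1.0 reference line."""
    if ratio > 1.0 + tol:
        return "positive"
    if ratio < 1.0 - tol:
        return "negative"
    return "none"


# Hypothetical coefficients, invented for illustration (not project results).
coefficients = {"topic_a": 0.7, "topic_b": -1.2, "topic_c": 0.0}
effects = {name: direction(odds_ratio(b)) for name, b in coefficients.items()}
print(effects)
```

An odds ratio above 1.0 means that a larger share of the topic in a review goes along with higher odds of a high star rating; a value below 1.0 means lower odds.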

The Team:

Michael Heilmann

Henrik Robert Kram

Friederike Ulmer

Marie Woltering

Written by Inside.TechLabs