Voice of Customer Is The New Currency: What Machine Learning & Twitter Reveal About the Clicks Debacle
Using Topic Modelling & Twitter scraping to extract Voice of Customer after the Clicks Group's racially offensive advertisement debacle.
On Friday, 4 September, Twitter reacted to the Clicks Group, one of South Africa’s largest healthcare retailers, after it released an online advert, part of a campaign with American hair care brand TRESemmé, that described a black woman’s hair as ‘frizzy and dull’ and a white woman’s hair as ‘normal’.
In this post I scrape Twitter data related to the debacle and use latent Dirichlet allocation (LDA), a Natural Language Processing algorithm for topic modelling, to uncover some of the major topics that have emerged from the Twitter discourse since the event.
Why Is Twitter a Good Source of Voice of Customer in South Africa?
Academics studying social media have a multitude of data sources to choose from, but Twitter is often selected because of its access-friendly infrastructure, its predisposition to social discourse and the near-total availability of its data, particularly in South Africa. Google Trends complements this by providing a regional signal of search interest across South Africa, which speaks to the heartbeat of social media.
The top related searches on Google were “Clicks Hair Advert” and “Clicks Racism Ad”, each showing growth of over 900%. These spikes coincided with the sharp fall in the Clicks share price following the advert’s release.
Twitter Voice of Customer Analysis
I used the scraping library GetOldTweets3 to extract around 15,000 tweets matching Google’s trending related searches for Clicks, limited to South Africa, into a pandas DataFrame recording the user, tweet text, date and hashtags. The tweets span 4 to 8 September 2020.
I extracted the tweet text, which is the most relevant data, and applied gensim’s standard pre-processing: removing stop words, tokenising to strip punctuation and unnecessary characters, and finally creating the dictionary and corpus needed for topic modelling. I go through pre-processing data for topic modelling in detail in my previous post.
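A minimal sketch of the collection step, assuming GetOldTweets3’s `TweetCriteria`/`TweetManager` interface. The query strings, the location radius and the `tweets_to_frame` helper are my assumptions for illustration, not the exact settings used for this post. GetOldTweets3 depended on a Twitter search endpoint that has since been retired, so the live call may no longer work; the example at the end therefore runs offline on hypothetical stand-in tweets. Retweet and favourite counts are kept because they feed the word-cloud weighting later on.

```python
from types import SimpleNamespace

import pandas as pd

# Search terms assumed from the related searches quoted earlier.
QUERIES = ["Clicks Hair Advert", "Clicks Racism Ad"]
COLUMNS = ["user", "text", "date", "hashtags", "retweets", "favorites"]


def scrape_tweets(query, since="2020-09-04", until="2020-09-09", max_tweets=5000):
    """Fetch tweets for one query with GetOldTweets3 (imported lazily).

    `until` is set one day past 8 September, assuming Twitter search treats it
    as exclusive. The location filter values are assumptions.
    """
    import GetOldTweets3 as got

    criteria = (
        got.manager.TweetCriteria()
        .setQuerySearch(query)
        .setSince(since)
        .setUntil(until)
        .setNear("South Africa")
        .setWithin("1000km")
        .setMaxTweets(max_tweets)
    )
    return got.manager.TweetManager.getTweets(criteria)


def tweets_to_frame(tweets):
    """Flatten tweet objects into one DataFrame row per tweet."""
    rows = [
        {
            "user": t.username,
            "text": t.text,
            "date": t.date,
            "hashtags": t.hashtags,
            "retweets": t.retweets,
            "favorites": t.favorites,
        }
        for t in tweets
    ]
    return pd.DataFrame(rows, columns=COLUMNS)


# Live run (network required, may fail today):
# df = tweets_to_frame([t for q in QUERIES for t in scrape_tweets(q)]).drop_duplicates("text")

# Offline check with hypothetical stand-in tweets (not real data):
sample = [
    SimpleNamespace(username="user_a", text="Clicks must fall", date="2020-09-05",
                    hashtags="#ClicksMustFall", retweets=10, favorites=25),
    SimpleNamespace(username="user_b", text="Protests at Clicks stores", date="2020-09-07",
                    hashtags="", retweets=2, favorites=3),
]
df = tweets_to_frame(sample)
print(df[["user", "retweets", "favorites"]])
```

Merging the per-query results and dropping duplicate texts matters here, because the same tweet can match more than one related search.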
Visualising the Tweets
I’ve found that word clouds are useful for getting a visual sense of some of the key themes before attempting any modelling. To make the insights more useful, I used the generate_from_frequencies function from Python’s wordcloud library, which draws the cloud from a dictionary of word weights rather than from raw text; I built that dictionary so that tweets with more reach, in terms of retweets and favourites, are prioritised.
Word clouds are great for extracting the essential insights: the 100 words drawn from 15,000 tweets reveal that “racism”, “racists” and “protests” were among the words most associated with the Clicks brand. However, word clouds alone cannot tell us which topics emerge from the 15,000 tweets, which is where the LDA model comes in.
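A minimal sketch of engagement-weighted word frequencies for `WordCloud.generate_from_frequencies`. The weighting formula (each tweet counts `1 + retweets + favourites` times) and the `engagement_weighted_frequencies` helper are hypothetical; the post only says that higher-reach tweets are prioritised. The rendering function imports wordcloud lazily, so the weighting itself runs without it.

```python
from collections import Counter


def engagement_weighted_frequencies(docs, retweets, favorites):
    """Sum token counts, weighting each tweet by 1 + retweets + favourites (assumed formula)."""
    freqs = Counter()
    for tokens, rt, fav in zip(docs, retweets, favorites):
        weight = 1 + rt + fav  # the 1 keeps tweets with no engagement in the cloud
        for token in tokens:
            freqs[token] += weight
    return dict(freqs)


def render_wordcloud(freqs, path="clicks_wordcloud.png", max_words=100):
    """Draw the cloud from precomputed weights (requires the wordcloud package)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, max_words=max_words, background_color="white")
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(path)
    return cloud


freqs = engagement_weighted_frequencies(
    [["racism", "clicks"], ["protest", "clicks"]],  # hypothetical token lists
    retweets=[10, 0],
    favorites=[5, 1],
)
print(freqs)  # {'racism': 16, 'clicks': 18, 'protest': 2}
```

Because `generate_from_frequencies` only sees the final weights, any reach measure could be swapped in without touching the rendering code.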
Topic Modelling with the LDA
To see why LDA is a good fit for this exercise, imagine the words from each Tweet broken down individually, as we did in pre-processing. LDA assumes that each Tweet is a mixture of a fixed number of topics and that each topic is a probability distribution over words. By examining how often words occur and co-occur across Tweets, it infers those topics and assigns every word in every Tweet to one of them. The result is a set of topics, each described by a weighted list of its most probable words.
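To make the mechanics above concrete, here is a toy collapsed Gibbs sampler for LDA in plain Python. It is for intuition only: gensim’s `LdaModel` uses online variational Bayes rather than Gibbs sampling, and the hyperparameters (`alpha`, `beta`, `iterations`) and toy documents are illustrative assumptions. Each token’s topic is repeatedly resampled in proportion to how much its Tweet already uses that topic and how much that topic already uses the word.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler; returns topic-word distributions and doc-topic counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                           # tokens per topic

    # Start from a random topic for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(topic t) is proportional to (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + v * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed word distribution of each topic.
    phi = [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + v * beta) for w in vocab}
        for t in range(num_topics)
    ]
    return phi, doc_topic


def top_words(phi, n=3):
    """Most probable words of each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in phi]


toy_docs = (  # hypothetical token lists, not the real tweets
    [["racism", "racist", "white", "people"] for _ in range(5)]
    + [["protest", "store", "close", "violence"] for _ in range(5)]
)
phi, doc_topic = lda_gibbs(toy_docs, num_topics=2)
print(top_words(phi))
```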
Key Topics & their Interpretations
Four topics emerged from the topic model, which speak to some of the key themes of the events that followed Clicks’ release of the offensive online ad.
Visualising the Twitter Topics with pyLDAvis
The raw LDA output is hard to interpret in this format, which is why I use pyLDAvis, an interactive LDA visualisation package, to plot all the generated topics and their keywords. pyLDAvis calculates the semantic distance between topics and projects the topics onto a 2D plane.
The size of each bubble represents the “importance” (prevalence) of its topic, and the distance between bubbles reflects how similar the topics are: the closer two circles are, the more similar their topics. A good topic model should have a few dominant, larger bubbles, with smaller ones scattered across the plane and no overlaps, since overlapping bubbles indicate topics bleeding into one another. I iterated to 4 topics, which meet these criteria.
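The sketch below shows how such a topic map can be produced. `save_pyldavis` wraps pyLDAvis’s gensim helper (the module is `pyLDAvis.gensim_models` in newer releases and `pyLDAvis.gensim` in older ones). The NumPy functions approximate what, as far as I understand pyLDAvis’s defaults, happens under the hood: the distance between two topics is the Jensen-Shannon divergence between their word distributions, principal coordinate analysis (classical MDS) projects those distances to 2D, and each bubble is sized by the topic’s share of all tokens. The helper names and the toy matrices are hypothetical.

```python
import numpy as np


def save_pyldavis(lda, corpus, dictionary, path="clicks_topics.html"):
    """Write the interactive pyLDAvis page for a fitted gensim model."""
    import pyLDAvis

    try:
        import pyLDAvis.gensim_models as gensimvis  # module name in newer releases
    except ImportError:
        import pyLDAvis.gensim as gensimvis  # module name in older (2020-era) releases
    vis = gensimvis.prepare(lda, corpus, dictionary)
    pyLDAvis.save_html(vis, path)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pcoa_2d(dist):
    """Principal coordinate analysis (classical MDS) of a distance matrix into 2D."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:2]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0, None))


def topic_map(topic_term, doc_topic, doc_lengths):
    """Bubble centres (from topic similarity) and sizes (share of tokens) per topic."""
    k = topic_term.shape[0]
    dist = np.array([[jensen_shannon(topic_term[i], topic_term[j]) for j in range(k)]
                     for i in range(k)])
    centres = pcoa_2d(dist)
    tokens_per_topic = (doc_topic * np.asarray(doc_lengths, dtype=float)[:, None]).sum(axis=0)
    return centres, tokens_per_topic / tokens_per_topic.sum()


# Hypothetical topic-word distributions: topics 0 and 1 are similar, topic 2 is not.
topic_term = np.array([[0.50, 0.50, 0.0, 0.0],
                       [0.45, 0.55, 0.0, 0.0],
                       [0.00, 0.00, 0.5, 0.5]])
doc_topic = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]])
centres, sizes = topic_map(topic_term, doc_topic, doc_lengths=[10, 20])
print(np.round(centres, 3), sizes)
```

In the toy example, topics 0 and 1 land almost on top of each other, which is exactly the kind of overlap that signals topics bleeding into one another.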
Topic 1: General Clicks & Hair Ad Discourse
With over 9,000 occurrences, the root word “click” is, as expected, the most common word in the entire corpus, reflecting the brand at hand. The key themes surrounding the brand relate to the hair advert, with keywords such as “hair”, “clicksmustfall”, “employee” and even the political party “eff” (the Economic Freedom Fighters), which has been the most vocal in this discussion.
Topic 2: Protests Violence & Closed Stores
This topic centres on the discussion around the protests that led to the closing of Clicks stores on Tuesday, 8 September. Words such as “close”, “store”, “protest” and “right” were dominant in the topic, while words such as “wrong”, “violence” and “damage” appeared in the same cluster. Judging by frequency and distribution, feelings about the protests appear mixed: the dominant sentiment deemed the “protest” “right”, or a “right”, while a smaller but significant share saw the protests as “wrong” and “violent”.
Topics 3 and 4: Racism, Racists & White People
These two topics sit close together for good reason: both deal with assertions around race and racism, which is at the core of this debacle.
Topic 3 has “racism” and “racist” as its most frequent words, followed by “take”, “job”, “lose” and “today”. This points to discourse around professional accountability for the racist advert, which may well have contributed to the firing of a senior Clicks executive responsible for it on 8 September. Interestingly, the word “apology” appears at the very bottom of this topic. Topic 4 echoes some of the themes in Topic 3 but focuses specifically on “white”, “people” and “clicksshutdown”.
The social media landscape has created a new social currency that users can leverage to get companies to respond to criticism. Topic modelling social media lets us parse thousands of data points within the Twittersphere, gauge the heartbeat of the populace and understand the themes that emerge.