DATA STORIES | BRAND REPUTATION | KNIME ANALYTICS PLATFORM

Brand Reputation Measurement using Social Media

Use a codeless approach to track, in real time, what your stakeholders say about you

Roberto Cadili
Low Code for Data Science
Feb 15, 2023

Author: Konstantin Pikal

Photo by Jonny Caspari on Unsplash.

“Your brand is what others say about you when you are not in the room”.

This famous quote, attributed to Jeff Bezos, Amazon’s founder, is one of our favorites. It is not without criticism, but it is an apt way to approach the measurement of branding.

Building a brand is part of every marketer’s job description. But how do you do it? And, above all, how do you measure what you have built (and, hopefully, are still building)?

How do you analyze what your stakeholders (e.g., customers, media, competitors) say about you? One natural place to look is social media, one of the channels where people “are talking about us”.

The Brand Reputation Tracker

A research team led by Prof. Roland Rust of the University of Maryland has developed a new way to analyze social media to understand a brand’s reputation. Unlike other well-known brand metrics, such as BAV or Interbrand, this tracker uses real-time social media data and measures the occurrence of three important drivers of brand reputation: Value, Brand, and Relationship. Inspired by the work of Rust et al., we will construct an interpretable tracker with a codeless approach using KNIME Analytics Platform. In this tutorial, we focus on the Brand driver and its sub-drivers: Cool, Exciting, Innovative, and Social Responsibility.

The original Real-Time Brand Reputation Tracker was explained in: Rust, R. T., Rand, W., Huang, M.-H., Stephen, A. T., Brooks, G., & Chabuk, T. (2021). Real-Time Brand Reputation Tracking Using Social Media. Journal of Marketing, 85(4), 21–43. https://doi.org/10.1177/0022242921995173.

Open KNIME and load the workflow: where to find it

You can find the workflow in the Machine Learning and Marketing Space on the KNIME Community Hub along with other helpful workflows. This article will focus mostly on the “Text Mining Replication of Brand Reputation Tracker” workflow.

Figure 1: The Brand Reputation Tracker workflow on the KNIME Community Hub.

What the Brand Reputation workflow does

  1. Get tweets via Twitter API
  2. Clean Twitter data
  3. Prepare the tweets for text-processing
  4. Tag the text based on Brand Reputation dictionaries
  5. Calculate Brand Reputation scores
  6. Visualize Brand Reputation over time

1. Get tweets via Twitter API

First, we need some data to run the brand reputation tracker on. For this, we suggest you get developer access to the Twitter API. It sounds complicated, but it really isn’t: it simply gives you access to the tweets that you will later text-mine.

How to get your Twitter API

To get Twitter data, we use the API (Application Programming Interface) that Twitter offers. At the time of writing it is free, but it might become a paid service soon. First, you will have to sign up for a Twitter account; if you already have one, you can skip this step. Twitter will ask you for some basic information, and you will need to verify your email address. Afterwards, you will be asked to configure a couple of things for your developer account: your country (in our case, Italy) and your use case (if you are following a course, select “Student”).

You will also need to verify your account with a phone number. Make sure you do that; otherwise, you will not have access to Twitter’s developer tools. Finally, you will be asked to agree to the “Developer agreement & policy”.

Set up the Twitter API

In your Twitter developer portal, you will have to get four things: the API key, the API secret, the Access Token and the Access Token Secret. Those are the credentials that you will need to be able to connect to Twitter and retrieve tweets.
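In KNIME, these four values go into the connector node’s dialog. If you ever script against the API outside KNIME, a common practice is to keep the credentials out of the code itself. Here is a minimal Python sketch of that idea, assuming the credentials live in environment variables; the variable names and the `load_twitter_credentials` helper are hypothetical and not part of KNIME or Twitter’s tooling.

```python
import os
from typing import Mapping

# Hypothetical environment-variable names; choose your own.
CREDENTIAL_VARS = {
    "api_key": "TWITTER_API_KEY",
    "api_secret": "TWITTER_API_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the four Twitter credentials, failing loudly if any is missing or empty."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing credentials: {', '.join(missing)}")
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

Passing a plain dict instead of `os.environ` makes it easy to try the function without touching your real environment.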

Twitter API Connector node

Right-click the node and select “Configure” to open its configuration window. Add your personal Twitter credentials (API key, API secret, Access Token and Access Token Secret), as listed in your Twitter developer account.

Figure 2: Insert your personal keys from Twitter (those displayed here are for demonstration purposes only).

Twitter Search node

Next, we need to retrieve tweets written about a certain brand. We do this by searching for the brand’s Twitter handle: in our example, we use “@amazon”, which returns tweets that mention Amazon. The number of rows you can set depends on the access level of your Twitter API. For example, the Twitter API v2 rate limit is 900 tweets per look-up, after which you have to wait 15 minutes. In practice, the maximum number of tweets you can retrieve in one go ranges between 15,000 and 16,000. Note that we excluded user profile images because they would slow down the execution. If you are interested in profile images, just add them to the field selection.

Figure 3: Configuring the Twitter Search node.
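The interplay between a per-window quota and the 15-minute wait can be illustrated with a small Python sketch. It assumes a hypothetical `fetch_page` function that returns one page of tweets plus a cursor for the next page (`None` when results run out). The quota and wait mirror the figures quoted above; the loop illustrates rate-limited paging in general and is not how the Twitter Search node is implemented.

```python
import time
from typing import Callable, Optional

# Hypothetical page fetcher: given a cursor, returns (tweets, next_cursor or None).
FetchPage = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def fetch_with_rate_limit(
    fetch_page: FetchPage,
    max_tweets: int,
    per_window: int = 900,             # tweets allowed per window (as quoted in the article)
    window_seconds: float = 15 * 60,   # wait 15 minutes once the quota is used up
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Collect up to max_tweets, pausing whenever the per-window quota is spent."""
    tweets: list[dict] = []
    used_in_window = 0
    cursor: Optional[str] = None
    while len(tweets) < max_tweets:
        if used_in_window >= per_window:
            sleep(window_seconds)      # wait for the rate-limit window to reset
            used_in_window = 0
        page, cursor = fetch_page(cursor)
        tweets.extend(page)
        used_in_window += len(page)
        if not page or cursor is None:  # stop when results run out
            break
    return tweets[:max_tweets]
```

Injecting `sleep` makes it easy to dry-run the logic: for example, `fetch_with_rate_limit(fetch_page, 2000, sleep=print)` prints each pause instead of actually waiting.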

2. Clean Twitter data

Before we start analyzing the data, we need to do some cleaning, or rather, the workflow does the cleaning for us. It excludes retweets and keeps only tweets in English (this is important because our text-mining dictionaries are only available in English).

Figure 4: Cleaning the tweets.
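The two cleaning rules amount to a simple row filter. Here is a minimal Python sketch, assuming each tweet is a dict with hypothetical `text`, `lang` and `is_retweet` keys; the `RT @` prefix check is an extra heuristic for spotting retweets, not necessarily what the workflow uses.

```python
def clean_tweets(tweets: list[dict]) -> list[dict]:
    """Drop retweets and keep only tweets whose language is English."""
    cleaned = []
    for tweet in tweets:
        # Hypothetical retweet flag, plus the classic "RT @" prefix as a fallback.
        is_retweet = tweet.get("is_retweet", False) or tweet["text"].startswith("RT @")
        if not is_retweet and tweet.get("lang") == "en":
            cleaned.append(tweet)
    return cleaned
```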

3. Prepare Tweets for Text Processing

Now that we have cleaned our data, the most important part begins. We are going to work with tweet texts and extract insights using the KNIME Textprocessing extension.

Preprocessing

We start with preprocessing. For this, we first convert Strings (e.g., tweet texts) into the Document data type.

We need the Document data type to perform text-mining operations in KNIME. First, we stem all words, reducing them to their root form so that different forms of the same word are counted as one term. For example, “exciting” becomes “excit” and “inspiring” becomes “inspir”. Once KNIME has done this for us, our documents are fed into the Dictionary Tagger node.
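To make stemming concrete, here is a deliberately simplified Python stemmer. It is a toy loosely modeled on the first step of the Porter algorithm (plural, “-ed”/“-ing” and final “-y” handling), not the stemmer used in the workflow, but it reproduces the examples above as well as “trendy” → “trendi”, the form that appears in the Cool dictionary.

```python
VOWELS = set("aeiou")


def _has_vowel(stem: str) -> bool:
    return any(ch in VOWELS for ch in stem)


def toy_stem(word: str) -> str:
    """Simplified suffix stripping, loosely based on step 1 of the Porter stemmer."""
    w = word.lower()
    # Plurals: "ponies" -> "poni", "cats" -> "cat", but keep "ss" as in "glass".
    if w.endswith("sses") or w.endswith("ies"):
        w = w[:-2]
    elif w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    # Verb endings: strip "ing"/"ed" only if a vowel remains ("sing" stays "sing").
    for suffix in ("ing", "ed"):
        if w.endswith(suffix) and _has_vowel(w[: -len(suffix)]):
            w = w[: -len(suffix)]
            break
    # Final "y" -> "i" when the rest of the word has a vowel: "trendy" -> "trendi".
    if w.endswith("y") and _has_vowel(w[:-1]):
        w = w[:-1] + "i"
    return w
```

A real stemmer applies several more rule steps with extra conditions; this sketch only shows why different surface forms end up as one term.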

4. Tag the text based on Brand Reputation dictionaries

The Dictionary Tagger node takes two inputs: the documents and a dictionary containing all the relevant words. In the node configuration, we specify which tag applies to the words in that dictionary. For example, “trendi” and “hip” are part of the positive “Cool” dictionary, whereas “ancient” and “lame” are part of the negative “Cool” dictionary. The tagger then uses the dictionaries to tag the documents: when it finds a word in a document that is also in the dictionary, e.g., “modern”, it tags that word with the corresponding tag (FTB-A).

You might have noticed that in the paragraph above we used a tag type called “FTB” and its values (e.g., A). This is because KNIME does not have a dedicated tag type for brand reputation drivers, so we repurpose the values of an existing tag type as placeholder labels. Next, we create a “bag of words”: a table listing every term that occurs in each document.

Figure 5: Top: The KNIME workflow for Tagging. Bottom: The “cool” Dictionary: Positive Words on the left, negative words on the right.
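Conceptually, dictionary tagging is a lookup of each stemmed token in a set of word lists. Here is a minimal Python sketch; the mapping from placeholder tag values (“A”, “ADV”) to dictionaries and the short word lists are hypothetical excerpts for illustration only.

```python
from typing import Optional

# Hypothetical excerpt: placeholder tag value -> stemmed dictionary words.
DICTIONARIES: dict[str, set[str]] = {
    "A": {"trendi", "hip", "modern"},  # e.g. Cool, positive
    "ADV": {"ancient", "lame"},         # e.g. Cool, negative
}


def tag_terms(
    tokens: list[str], dictionaries: dict[str, set[str]]
) -> list[tuple[str, Optional[str]]]:
    """Pair each token with the tag of the first dictionary containing it, else None."""
    tagged = []
    for token in tokens:
        tag = next((t for t, words in dictionaries.items() if token in words), None)
        tagged.append((token, tag))
    return tagged
```

For example, `tag_terms(["amazon", "is", "hip"], DICTIONARIES)` returns `[("amazon", None), ("is", None), ("hip", "A")]`.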

5. Calculate Brand Reputation scores

We now convert our tags back to strings and keep only the words in our documents that have been tagged by our dictionaries, which reduces the amount of data to process. After filtering out the untagged words with the Row Filter node, we use the TF node to count the occurrences of each term in each document. This gives us a table with the count of each term in each tweet. If you look closely, you will see that every tag/tweet combination has its own row: if a tweet contains two tags, it appears in two rows. We will later use a Pivoting node to sum up tweets and tags.
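The filter-and-count step can be sketched in Python as follows, assuming each tweet has already been turned into a list of `(term, tag)` pairs with `None` for untagged terms (a hypothetical representation). For brevity, this sketch counts tagged terms per tag directly instead of counting each term first, and returns one row per tweet/tag combination.

```python
from collections import Counter
from typing import Optional


def tag_counts_per_tweet(
    tagged_tweets: dict[str, list[tuple[str, Optional[str]]]]
) -> list[tuple[str, str, int]]:
    """Return one (tweet_id, tag, count) row per tag found in each tweet."""
    rows = []
    for tweet_id, tagged in tagged_tweets.items():
        # Keep only tagged terms (the Row Filter step), then count them per tag.
        counts = Counter(tag for _, tag in tagged if tag is not None)
        for tag, count in sorted(counts.items()):
            rows.append((tweet_id, tag, count))
    return rows
```

A tweet without any tagged term produces no rows at all, just as it would disappear after the Row Filter node.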

To group our data by time (in our case by month, but this depends on the data you have collected; the workflow on the Hub aggregates data by day and hour), we must extract date and time fields. We then reshape the data so that the tag frequencies are in the columns and the time periods are in the rows.

Figure 6: Data Manipulation. On top, we group by the time dimension we want, e.g. months, taking the mean of timestamp. On the bottom, we group by time, taking as the pivot the sum of the tag-counts. In the Joiner node, we join the tables by time.
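The group-and-pivot step can be sketched in Python, assuming hypothetical `(tweet_id, tag, count)` rows and a dict of tweet timestamps. The sketch sums tag counts per calendar month; tags that never occur in a month are simply absent, which corresponds to the missing values handled in the workflow. The mean-timestamp aggregation and the Joiner node from the figure are left out.

```python
from collections import defaultdict
from datetime import datetime


def pivot_by_month(
    rows: list[tuple[str, str, int]], timestamps: dict[str, datetime]
) -> dict[str, dict[str, int]]:
    """Sum tag counts per month: one entry per month, one key per tag."""
    table: defaultdict[str, dict[str, int]] = defaultdict(dict)
    for tweet_id, tag, count in rows:
        month = timestamps[tweet_id].strftime("%Y-%m")  # e.g. "2022-01"
        table[month][tag] = table[month].get(tag, 0) + count
    return dict(sorted(table.items()))  # chronological order of months
```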

After this, we replace any missing values with 0: a missing value simply means that no word with that tag appeared in that time period.

It is worth mentioning that the column names correspond to the tag values that we used during the dictionary tagging process (e.g., A, ADV, etc.). Therefore, we rename the columns with the names of the brand sub-drivers (cool, exciting, innovative, social responsibility).

Figure 7: Construct Operationalization. Term frequencies and missing values.
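Filling the gaps and renaming the columns can be combined in one small Python function. This sketch assumes the pivoted table is a dict of `{period: {tag: count}}` and uses a hypothetical tag-to-column mapping; only “Cool_Positive” and “Cool_Negative” are column names taken from the article.

```python
# Hypothetical mapping from placeholder tag values to sub-driver column names.
COLUMN_NAMES = {"A": "Cool_Positive", "ADV": "Cool_Negative"}


def finalize_table(
    pivot: dict[str, dict[str, int]], column_names: dict[str, str]
) -> dict[str, dict[str, int]]:
    """Fill tags absent in a period with 0 and rename tag columns to sub-driver names."""
    return {
        period: {name: counts.get(tag, 0) for tag, name in column_names.items()}
        for period, counts in pivot.items()
    }
```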

Net and average scores

When you look at the table, you will see that there is a positive and a negative column for each sub-driver. Using a series of Math Formula nodes, we subtract the negative column from the positive column for each sub-driver to obtain the net scores. We then average the net scores over the four brand sub-drivers to obtain the “Brand Driver” score. The output table now has five columns: Cool Net (obtained by subtracting Cool_Negative from Cool_Positive), Innovative Net, Exciting Net, Soc. Resp Net, and Brand Driver, which is the average of the four net scores.

Figure 8: Net scores of the Brand sub-drivers.
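The net and average scores are plain arithmetic per row. Here is a minimal Python sketch, assuming the renamed columns follow a `<SubDriver>_Positive` / `<SubDriver>_Negative` pattern; only the Cool columns are named that way in the article, and the other prefixes, including `Soc_Resp`, are hypothetical.

```python
# Sub-driver prefixes; only "Cool" is confirmed by the article's column names.
SUB_DRIVERS = ["Cool", "Exciting", "Innovative", "Soc_Resp"]


def brand_scores(row: dict[str, float]) -> dict[str, float]:
    """Net score (positive minus negative) per sub-driver, plus their mean as Brand Driver."""
    scores = {f"{d} Net": row[f"{d}_Positive"] - row[f"{d}_Negative"] for d in SUB_DRIVERS}
    scores["Brand Driver"] = sum(scores.values()) / len(SUB_DRIVERS)
    return scores
```

With net scores of 4, 0, 3 and -2, for instance, the Brand Driver is (4 + 0 + 3 - 2) / 4 = 1.25.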

6. Visualize Brand Reputation over time

Finally, we normalize all our values to make it easier to compare changes over time and across drivers (in case you add new drivers, such as the “relationship driver” and its sub-drivers). To visualize the evolution of the selected sub-drivers over time, we use the Line Plot node. Make sure to choose the time dimension for the x-axis and the drivers you want to inspect (depending on how detailed you want your analysis to be) for the y-axis. In our example, Amazon’s brand was perceived as less innovative and exciting over the course of the year, while the overall perception of its social responsibility seems to have improved throughout 2022.

Figure 9: Normalizer node. We normalize our values to make time-series analysis easier.
Figure 10: Example visualization of Amazon with the 4 Brand sub-drivers in 2022.
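The article does not state which normalization method the workflow uses. Min-max scaling to [0, 1], one of the options of the Normalizer node, is a natural choice for comparing drivers on one plot. Here is a minimal Python sketch of it, applied to one driver column at a time; mapping a constant series to zeros is a choice made for this sketch.

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale one driver's time series to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: no spread to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Applying it separately to each driver column puts every driver on the same 0 to 1 scale, so their trends can share one line plot.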

Automate brand reputation analysis with no-code

Measuring brand reputation is not a trivial task. While most well-established metrics get the job done, they usually cannot do so in real time. In this article, we introduced a brand reputation tracker that uses real-time social media data to measure the occurrence and trends of one major brand reputation driver, Brand, and its four sub-drivers: Cool, Exciting, Innovative, and Social Responsibility.

To ensure a fully automated and transparent process, we relied on KNIME capabilities to connect to Twitter and to search for and retrieve tweets about a chosen brand without writing a single line of code. Likewise, the no-code steps to process tweet texts, assign tags and visualize how brand reputation changes over time can be reused and extended conveniently beyond the scope of our example.

So now it’s your turn! Get your Twitter API and start your journey into real-world brand research. We are very curious to see your results!


Roberto Cadili
Low Code for Data Science

Data scientist at KNIME, NLP enthusiast, and history lover. Editor for Low Code for Data Science.