What makes an indicator an indicator?

Apr 11, 2019

What makes an indicator an indicator? Reflections on barking up the wrong tree with a cup of coffee. [My Starbucks Story]

The Inspiration behind this Data Visualization

I love coffee. As a university student who once lived on campus, I found the Starbucks outlet conveniently located in the heart of University Town to be a saviour on one-too-many late essay-writing nights. Of course, these cups of iced latte are a welcome-but-infrequent indulgence given the lack of wallet-friendly prices.

The largest coffee chain in the world has earned itself a reputation as coffee for ‘hipsters’, which is millennial terminology for coffee that is priced too high. Its hefty price tag of 6 to 7 dollars a cup is about 3 times the price of a cup of coffee from a neighbourhood coffee shop. Another random (but helpful) person on the internet calculated that a daily Starbucks habit would set an individual back by $2,300 a year. Even if one does not drink Starbucks every day, the most loyal customer segment (the top 20%) would still be spending a significant amount of their disposable income, as they were found to visit the store at least 16 times a month.

Hence, I wanted to investigate the validity of this impression of Starbucks as a drink for the upper-middle class. Since this class had defined indicators as measured concepts that monitor development and track progress, the quantitative data I would need would be a composite of two features: (1) Starbucks-related, and (2) income-related.

The research question that my data visualisation would then hope to answer is “Can the number of Starbucks stores in a city be associated with how well-off the city is?”

This train of thought was inspired by The Big Mac Index invented by The Economist in 1986. The Big Mac Index uses the theory of purchasing-power parity to form a lighthearted guide to whether currencies are at their “correct” level. It is based on the notion that long-run exchange rates should move toward the rate that would equalise the prices of an identical basket of goods and services (in this case, a burger) in any two countries. Hence, using burgernomics, we can estimate how much one currency is under- or overvalued relative to another. However, I had no such lofty ambitions of creating a composite indicator. Aware that correlation is not causation, I just wanted to understand whether there would be any relationship between Starbucks and the relatively well-off.
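For readers curious about the arithmetic behind burgernomics, here is a minimal sketch in Python. The function names and all the numbers are made up for illustration; they are not real prices or exchange rates, and this is not The Economist's own code.

```python
def implied_ppp_rate(local_price: float, us_price: float) -> float:
    """Exchange rate (local currency per US dollar) that would equalise the burger's price."""
    return local_price / us_price


def valuation_vs_dollar(local_price: float, us_price: float, market_rate: float) -> float:
    """Fraction by which the local currency is overvalued (positive) or undervalued (negative)."""
    implied = implied_ppp_rate(local_price, us_price)
    return (implied - market_rate) / market_rate


# Hypothetical numbers for illustration only.
local_price = 20.0   # burger price in the local currency
us_price = 5.0       # burger price in US dollars
market_rate = 8.0    # local currency units per US dollar on the market

implied = implied_ppp_rate(local_price, us_price)
valuation = valuation_vs_dollar(local_price, us_price, market_rate)

print(f"Implied PPP rate: {implied:.2f} per dollar")
print(f"Valuation against the dollar: {valuation:+.0%}")
```

In this toy case the burger is cheaper abroad than in the United States once converted at the market rate, so the local currency comes out as undervalued.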

Technical Difficulties — Finding Data Sources, Data Cleaning and Visualising the Data

Data Access

In order to scope the research question a little further, I decided to take a more pragmatic approach and base it on the data I had access to. A quick search on Kaggle brought me to a dataset on the locations of Starbucks stores worldwide, scraped from the Starbucks store locator webpage by GitHub user chrismeller. The dataset was intimidatingly massive and I knew that I had to scope down the locations even further. Since we were using Reddit and 53.9% of its user base comes from the United States, I decided to appeal to the audience profile and extract the data about the United States. By focusing on one country, I hoped to eliminate the differences in Starbucks drink prices between countries.

For ease of reading, I knew that I wanted to try visualising the data on a map. Hence, rather than cities, I decided to focus on states. In terms of understanding the incomes of the various states, I was playing around with Tableau and realised that we could add filters of income per capita by state.

Hence, I had scoped the research question to “Can the number of Starbucks stores in a state be associated with how well-off the state is?”

Data Cleaning

To my horror, I discovered that the latitudes and longitudes available in the dataset were not readable by the software, and I had to manually input them for the states if I wanted to map them on Tableau.

I extracted the relevant portions using a pivot table to get the number of Starbucks stores per state.
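The same per-state count could also be reproduced in code rather than with a pivot table. The sketch below uses only the Python standard library; the sample rows are made up, and the column names ("Country", "State/Province") are assumptions about how the store-locator export is laid out.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample standing in for the real export.
# The column names here are assumptions, not confirmed from the dataset.
SAMPLE_CSV = """Store Name,Country,State/Province
Store A,US,CA
Store B,US,CA
Store C,US,WA
Store D,CA,ON
Store E,US,NY
"""


def stores_per_state(csv_text: str, country: str = "US") -> Counter:
    """Count stores per state, keeping only rows for the given country code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row["State/Province"] for row in reader if row["Country"] == country
    )


counts = stores_per_state(SAMPLE_CSV)
for state, number_of_stores in counts.most_common():
    print(state, number_of_stores)
```

Filtering on the country column first matters here because the worldwide file also contains Canadian provinces, which would otherwise be counted alongside the states.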

Visualising the Data + Data Analysis

I discovered that my greatest roadblocks were (1) my lack of technical expertise in my chosen data visualisation software and (2) my lack of knowledge about the United States.

For (1), it took me 24 hours to come up with a simple visualisation, as I realised I had no idea where to begin or how to even get started. But no pain, no gain, and I decided to stick it out to learn at least some technical skills from this journey. Furthermore, I had initially wanted to remove Canada from the visualisation so that the end-product would magnify the figures for each state. Unfortunately, it was not within my skill set at that point in time, so I abandoned the idea. Reddit users later gave similar feedback, which showed me that my intuition was on the right track.

For (2), I only realised the difficulty once I began analysing the data. For example, with regard to outliers: why would certain states have more stores? What contextual factors are needed to understand outliers? Furthermore, when it came to basic data literacy, I realised that I could not interpret the state abbreviations (e.g. that AK stands for Alaska and AR for Arkansas) or pinpoint the states' locations on the map without data labels.

Thankfully, this was a job where Google and a little bit of extra effort could come to the rescue. I manually translated all the abbreviations so I could at least read the data better. This experience taught me the perseverance necessary when it comes to filling knowledge gaps.
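For anyone who would rather not translate the abbreviations by hand, a small lookup table does the same job. This is only a sketch: the table below is deliberately partial, and the helper name `label_state` is made up for illustration.

```python
# Partial lookup table; a full version would list all 50 states plus DC.
STATE_NAMES = {
    "AK": "Alaska",
    "AR": "Arkansas",
    "CA": "California",
    "NY": "New York",
    "WA": "Washington",
}


def label_state(abbreviation: str) -> str:
    """Return the full state name, or the input unchanged if it is not in the table."""
    return STATE_NAMES.get(abbreviation.upper(), abbreviation)


print(label_state("AK"))
print(label_state("ar"))
print(label_state("ZZ"))
```

Returning the input unchanged for unknown codes keeps unexpected values visible in the output instead of silently dropping them.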

Key Findings

I decided that the visual map, while allowing the reader to view the number of stores quickly, was unable to portray the relationship between personal income per capita of each state and the number of Starbucks stores in each state.

I put together a supplementary scatter plot to investigate the relationship between these two variables, and was promptly proven wrong. My initial hypothesis appeared to be false: there was no clear evidence of even a correlation between these two variables.
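A scatter plot shows the relationship by eye; a correlation coefficient puts a number on it. The sketch below computes Pearson's r with the standard library only. The two lists are toy placeholder values chosen to show what "no linear relationship" looks like numerically; they are not the actual income or store figures.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / sqrt(spread_x * spread_y)


# Toy placeholder values, NOT real data: one entry per hypothetical state.
income_per_capita = [1.0, 2.0, 3.0, 4.0]
store_counts = [3.0, 1.0, 1.0, 3.0]

r = pearson_r(income_per_capita, store_counts)
print(f"Pearson r = {r:.2f}")
```

A value near +1 or -1 would indicate a strong linear association, while a value near 0, as with these placeholder numbers, indicates none. Pearson's r only captures linear relationships, so it complements the scatter plot rather than replacing it.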

I offer a few preliminary reasons:

My chosen metrics were not reflective of the trend I wanted to identify. Personal income per capita may not be an accurate reflection of wealth of the state population.
My hypothesis was not aligned to the marketing intent of Starbucks. During my research on the business model of Starbucks, I found that the founders’ original intent for the shop was to remain a roaster and sell high-end coffee to discerning consumers. This was actually aligned to my initial hypothesis of the relationship between Starbucks and those-who-can-afford-it. However, I discovered that the event which enabled the ‘scaling’ of Starbucks’ operations was their move towards mass appeal — in particular, framing Starbucks as a third place — a cosy environment where people could gather and connect. What this means is that perhaps what motivates the selection of locations for Starbucks are figures like footfall and population numbers. It matters less whether an individual could afford Starbucks regularly. Rather, the availability of a sizeable segment of the population in the upper-middle class who will ‘indulge’ themselves in a cuppa was all that was needed to keep the Starbucks business flourishing.

“People stopping at a Starbucks on their way to work was almost like an indulgent act of self-love. They’d go in and buy themselves this relatively expensive coffee, and they’d feel like they’d treated themselves.” — Asst Prof Ohlsson-Corboz.

The chosen scope of states in the United States was too large a geographical area for analysis. Aggregating by state could have masked the confounding factor of more populous cities having a higher number of Starbucks stores.

So, is this data visualisation still meaningful?

Well, my answer would be yes and no.

No because…

The data layer on Personal Income per Capita arguably did not add value to the visualisation as it failed to show any particular relationship or association with the number of stores in each state.

Yes because…

It gave me a data-driven approach to challenge my own assumptions about the association between the location of Starbucks stores and how well-off the state is.
Removing the data layer, the data visualisation was still useful in geographically mapping the number of stores per state.

As a first data visualisation, I had picked out the following feedback for myself:

Due to the lack of technical knowledge, I could not figure how to remove Canada from the map on Tableau. Without Canada, the visualisation would make the states bigger and more visually appealing.
This would then enable data labels to be placed for every state’s figure on the number of stores, rather than the current constraints on space. I was worried that if I were to place data labels for every state — I would end up with a difficult-to-read visualisation due to the numbers overlapping one another.
A mix of data visualisation approaches would facilitate greater understanding for people who have different levels of familiarity with the various types of data visualisation. Through visual elements like charts, graphs, and maps, data visualisation tools provide different accessible ways to see and understand trends, outliers, and patterns in data. For example, I could have chosen to represent the number of stores each state has with a bar chart rather than a map, but the geographical context would be lost. However, a bar chart would likely be more accessible in terms of identifying outliers.

I may have barked up the wrong tree with my research question and choice of ‘indicator’, but all is not lost.

Reflections on what makes an indicator an indicator

Though I failed to support my initial hypothesis, the journey did spark thoughts on what other datasets I could use. For example, perhaps two variables that would have a greater correlation with one another would be the price of Starbucks drinks and the income per capita of each state.

To quote the Chinese philosopher Lao Tzu, a journey of a thousand miles begins with a single step. At the very least, I am glad that I had attempted to try out a data visualisation format I had never tried before. This was my small step of courage amidst the many fears of failure I had with this module.

Understanding the data… A foray into data journalism

Head on to my Part 2, which consolidates my research into the Californian outlier in the dataset on the number of Starbucks stores in each state, along with some data I gathered about American coffee culture.

Postscript: I had no time to do a Part 2, but here are some links instead! TL;DR: California is big, New York City (a city, not a state) has the most Starbucks, followed by Seattle (a city, not a state, and also the home of Starbucks). Seattle is still your most coffee-loving city, as the number of Starbucks stores is not the only indicator! Coffee culture is way more than Starbucks (rolls eyes).

Cheers! & maybe grab a cup of coffee :)

Written by Roxanne
