Coffee Quality by Continent

Applying ANOVA to Coffee Q-Scores

Aiden Bromaghin
CodeX
7 min read · Apr 9, 2022

Photo by Daniel Lincoln on Unsplash

Introduction

I once thought that there were two kinds of people in the world: those who drink coffee, and those who are sad. This general worldview guided me through most of undergrad and seemed to me a perfectly reasonable lens through which to view people. It wasn’t long, though, before I realized that my outlook required a little more nuance. There were sad people, those who drink coffee, and coffee snobs.

This revelation was the result of working in the specialty coffee industry during and after college. I may not have liked the term coffee snob, but I absolutely was one. I worked as a barista, trainer, and roaster at a coffee company that focused on light-roasted single origin coffees. I learned a lot about coffee during that time, from reading roasting profiles to dialing in a coffee’s brewing parameters. One of the most important aspects of my job, though, was learning how to objectively measure a coffee’s quality.

This is accomplished through a process called cupping. It involves the blind tasting of several coffees to assess different aspects, such as flavor, aroma, body, sweetness, and acidity. The purpose is both to evaluate coffees for their quality and to give coffee professionals a common language to talk about their product. It’s used by roasters to evaluate roast batches as well as the green coffee itself, but it also plays an important role before the coffees make it to the roaster. Prior to export from their country of origin, coffees are evaluated and assigned a quality score (Q-score) based on 10 attributes. The score ranges from 0–100 and anything above 80 is considered “specialty”.

The Problem

The tricky thing with all this is that taste is an inherently subjective experience. If I give you a cup of coffee and you hate it, you probably aren’t going to care if I tell you that it’s an extremely rare, expensive, high-scoring coffee. The value is in the subjective experience of drinking it (and the caffeine), and if that experience isn’t pleasant, you probably aren’t going to care about much else.

Like most people, I tend to have pretty strong feelings about what makes coffee “good.” One of the first things I look at is what region the coffee is grown in. Similar to wine, terroir plays a pretty big role in the coffee growing process, with different regions leading to different flavor profiles. There’s a lot of variation within the regions, but there’s a broad consensus that certain growing regions are more likely to produce coffees with certain traits. These regions can roughly be broken down into Central and South America, Africa, and Southeast Asia. Based on my experiences, I firmly believe that some countries produce objectively better coffees (on average) than others due to the conditions of the soil and the varieties grown there.

So we know that there are standardized methods of assessing coffee quality, that taste is subjective, and that personal preference will play a big role in determining if someone actually likes a coffee. I know that I tend to like coffees from some growing regions better than others. Now, if only there were some way to test whether the growing region actually affects quality. Turns out, there is!

The Data

I found this dataset on Kaggle. It was kindly scraped from the Coffee Quality Institute and contains a number of fun and interesting variables worth exploring. However, I’m only interested in the relationship between quality and growing region. If we take a look at a boxplot for the quality, we get some interesting results.

Our data is pretty tightly centered around 80. We’ve got a handful of values in the upper 80s and some in the 60–80 range. We’ve also got one observation right down around 0. Based on how low the value is, I’m guessing this was a data entry error, and I’ll remove this value before continuing. Normally I would also check for other outliers, but I’m not going to in this case for two reasons: the data is clustered around a small range of values, and I know from experience that the lower scores we can see in the boxplot aren’t unusually low if we’re looking at coffee production as a whole.
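
For anyone following along, here is a minimal sketch of that cleanup step in Python. The original analysis was done in R, so this is an illustration and not the code behind the article. The scores are made-up values, and the cutoff of 1.0 is an assumption standing in for “right down around 0.”

```python
# Hypothetical Q-scores for illustration; these are NOT values from the Kaggle dataset.
scores = [82.5, 83.17, 0.0, 79.92, 84.0, 68.33]

# Assumed cutoff: anything at or near 0 is treated as a data entry error.
MIN_PLAUSIBLE_SCORE = 1.0

# Keep only the scores that clear the cutoff.
cleaned = [s for s in scores if s >= MIN_PLAUSIBLE_SCORE]
removed = len(scores) - len(cleaned)

print(f"removed {removed} suspicious record(s); {len(cleaned)} remain")
```

Note that the 68.33 stays in: a low score is not the same thing as an impossible one.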

The dataset contains records on the country of origin for each coffee. However, I’m interested in getting a higher-level overview of the relationship between quality and growing regions as a whole. I wrote a quick function to modify the data by adding a loosely defined “continent” column containing the growing region for each entry. Based on where the coffees in the dataset came from, I ended up with five possible options: Central/South America, Africa, North America, Southeast Asia, and Oceania. Let’s look at the breakdown of quality according to growing region:
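
A function like that can be as simple as a dictionary lookup. The Python sketch below shows the idea; it is not the original R function. The names `REGION_BY_COUNTRY` and `add_region` are hypothetical, the record layout is assumed, and the country assignments are only examples (the one assignment this article confirms is Papua New Guinea under Oceania).

```python
# Illustrative mapping only: the article does not list its exact
# country-to-region assignments, so these entries are assumptions.
REGION_BY_COUNTRY = {
    "Ethiopia": "Africa",
    "Kenya": "Africa",
    "Colombia": "Central/South America",
    "Guatemala": "Central/South America",
    "Mexico": "North America",
    "Indonesia": "Southeast Asia",
    "Papua New Guinea": "Oceania",
}


def add_region(record):
    """Return a copy of the record with a 'continent' key added.

    Countries missing from the mapping are labeled 'Unknown' so they
    can be reviewed by hand instead of being silently dropped.
    """
    region = REGION_BY_COUNTRY.get(record["country"], "Unknown")
    return {**record, "continent": region}


# Hypothetical record layout: one dict per coffee.
sample = {"country": "Kenya", "quality": 84.0}
print(add_region(sample))
```

Returning a copy leaves the original records untouched, which makes it easy to rerun the mapping after adjusting the dictionary.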

Again, our data doesn’t seem to have a lot of range. The exception is Central/South America, which has a number of poorly performing coffees dragging down its average quality score in this dataset. Every region has outliers, but none of them have unrealistic or worrisome values. We can see that the medians for all five of our regions fall between 80 and 85. With such similar results, how can we tell if there is a significant difference?

Analysis of Variance (ANOVA)

ANOVA is a technique that exists for exactly this purpose. It compares the variation between group means to the variation within each group to determine whether the differences between groups are larger than we would expect from chance alone. It’s ideal in situations like this, where we want to explain a quantitative variable (quality) using a categorical feature (continent). It’s easy to implement, and it should be able to tell us whether average quality differs between regions; follow-up tests can then show which regions differ from one another.
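
To make the idea concrete, here is a small sketch of the F statistic at the heart of one-way ANOVA, written in plain Python. It is not the R code used for this article, the function name `one_way_anova_f` is my own, and the scores are toy values chosen so the arithmetic is easy to check by hand.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of name -> list of scores."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Within-group sum of squares: how far each score sits from its own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy scores, NOT taken from the dataset.
toy = {
    "Africa": [83.0, 84.0, 85.0],
    "Oceania": [82.0, 83.0, 84.0],
    "Southeast Asia": [81.0, 82.0, 83.0],
}
f_stat, df_b, df_w = one_way_anova_f(toy)
print(f"F = {f_stat:.2f} on {df_b} and {df_w} degrees of freedom")
```

A large F means the group means are spread out relative to the noise inside each group. Turning F into a p-value requires the F distribution, which a statistics package such as R or SciPy would supply.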

Once the ANOVA model is fit to the data, we can conduct some post-hoc tests to see what we have learned. There are a lot of options to choose from, but I like using a pairwise t-test and Duncan’s Multiple Range Test (MRT). I’ve included the output from the tests below.

Pairwise T-Test

The t-test shows the p-value for the difference in mean quality between each pair of growing regions. A p-value below 0.05 means that, if the two regions truly had the same average quality, we would see a difference at least this large less than 5% of the time. By convention, we call such a difference statistically significant. For example, we can see that there’s a significant difference between the quality of coffees from Asia and Africa, but coffees from Asia and North America are likely to be pretty similar in quality.
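
As a rough illustration of what “pairwise” means here, the Python sketch below computes a t statistic for every pair of regions. A few assumptions to flag: it uses Welch’s version of the t statistic, which may not match the settings behind the output above (the variance pooling and p-value adjustment used there aren’t stated); the function names are my own; and the scores are toy values.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def pairwise_t(groups):
    """Return {(name1, name2): t statistic} for every pair of groups."""
    return {
        (g1, g2): welch_t(groups[g1], groups[g2])
        for g1, g2 in combinations(groups, 2)
    }


# Toy scores, NOT taken from the dataset.
toy = {
    "Africa": [83.0, 84.0, 85.0],
    "Oceania": [82.0, 83.0, 84.0],
    "Southeast Asia": [81.0, 82.0, 83.0],
}
for pair, t_stat in pairwise_t(toy).items():
    print(pair, round(t_stat, 3))
```

With five regions there are 10 pairs to compare. Running that many tests raises the odds of a false positive, which is why statistical software typically adjusts the p-values when doing pairwise comparisons.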

Duncan MRT

The Duncan MRT provides some additional insight. We can see that Africa is in its own group with an average score of 84, followed by Oceania at 83, and then we have Asia and the Americas in a group with average scores in the 81–82 range. It looks like a coffee’s region of origin does affect quality after all!

Interpreting Results

Based on the data we have and the brief analysis we did, it does look like the growing region affects a coffee’s quality. I feel especially vindicated since my favorite growing region, Africa, came out on top. However, I don’t think that we can generalize these results, or even trust them to be concrete. There are a couple of reasons why.

We already saw that the majority of values for quality in our dataset hover around 80, but coffees from Central and South America had a much wider range of values. While actually going through the analysis in RStudio, I also noticed that there were only a handful of entries from Papua New Guinea, which is our only country in the Oceania category. Long story short, we don’t have a representative sample of data to draw conclusions from. This dataset wasn’t curated and released by the Coffee Quality Institute; it was scraped from the web and posted to Kaggle by a stranger on the internet, so I suppose I can’t let my expectations be too high.

Another factor that makes me hesitant to accept our findings is the outcome of Duncan’s MRT. Based on my experience at the cupping table, I would’ve expected Central and South American coffees to be in their own category with a higher average quality score, and coffees from Oceania and Asia to be lumped together and come in last. The results of the test don’t line up with my experience. This isn’t to say that my experience is right and the data is wrong, but combined with the data quality issues listed above, it’s another reason to not take the results at face value.

Conclusion

So what can we draw from this? Given that we did find statistically significant differences in quality based on growing region, I think we can tentatively say that growing region does affect quality. However, I don’t think we can make any definitive claims about which regions produce the best coffee due to our issues with data quality. To do that, we’d need more data with better sampling. Until then, we’re left to our own prejudices and preferences to determine which regions produce the “best” coffees.

Regardless of what kind of coffee you like, or if you’re one of the mystical few who function without it, thank you for taking the time to read this. By now you probably need a refill — you deserve it.

Photo by Isaac Benhesed on Unsplash

Aiden Bromaghin
CodeX

Data science graduate student with a background in consumer and mortgage lending.