At SRCCON 2016 in Portland, Oregon, I gave a lightning talk about writing a statistics paper analyzing your own beer intake. Here is the paper I wrote. (Slides are here if you’re curious.)
Introduction: As a frequent craft beer drinker, I have certain assumptions about what I expect to like in a beer. However, I would like a more rigorous investigation into what makes me like a beer. Thus, we can use an app that tracks both my personal ratings of a beer as well as information about that beer to look at the relationship between my preferences and three variables: a given beer’s alcohol by volume content (better known as its ABV), where the beer was brewed, and what style the beer is defined as.
Data and methods: The dataset is assembled using a mobile app called Untappd, which allows you to “check in” when you drink a beer — you rate the beer, and Untappd links up with its database to provide a range of technical and demographic information about the beer.
Here my dataset consists of 516 “checkins” I have logged using the app. The data dates back to spring 2010, though I did not use the app at all between spring 2011 and spring 2013 (July 2016 note: writing this made me sick of overthinking something as joyous as booze so it prompted another hiatus, but realizing I had no data from the first time I beercationed in Portland, I’ve once again come out of Untappd retirement for this trip).
Each observational unit includes a beer (categorical, nominal), the brewery that makes the beer (categorical, nominal), the state or other administrative division in which the brewery is located geographically (categorical, nominal), the beer style (categorical, nominal), the alcohol by volume content percentage of the beer (numerical, continuous), and the rating I have assigned the beer between one and five stars, with half-stars permitted (numerical, discrete). We can consider this a sample of the overall population of my beer drinking habits, since I don’t use the app every single time I drink a beer.
First we’ll examine whether the distribution of the numerical variables falls into a normal pattern, to see how fair a judge I am as well as get information about trends in ABV content. To examine relationships between the variables and the ratings, we’ll have to use different methods. Since ABV is a numerical, continuous variable, we’ll plot that data on a scatterplot and use linear regression to examine the relationship with my ratings. State of origin and style are both categorical variables, so we’ll use chi-squared two-way tests for independence to look at the relationships there.
Distribution: Assuming that I’m a fair and reasonable judge, we might expect a normal, bell-curve-shaped distribution of ratings, with few one- or five-star ratings and most of the ratings clustering around three stars. But this dataset decidedly fails tests for normality. We can see in the histogram below that the rating data is skewed left: most ratings sit at the high end of the scale, with a thin tail of low ones.
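One way to put a number on "skewed left" is the moment-based sample skewness, where a negative value indicates a longer left tail. The sketch below is only an illustration: the `ratings` list is made up, not my actual Untappd data, and `sample_skewness` is a helper name I'm introducing here.

```python
from statistics import fmean

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5; negative means a longer left tail."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical ratings (assumption): mostly 3.5 to 5 stars, with a couple of low ones.
ratings = [4.0, 4.5, 4.0, 3.5, 4.0, 5.0, 4.5, 4.0, 2.0, 3.5, 4.0, 1.5]
print(sample_skewness(ratings) < 0)  # True: the tail points left
```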
I’m also curious about the distribution of ABV, which is expressed as a percentage. High-alcohol beers with ABVs of 10% or higher have been popular in craft beer circles over the past five years or so, though “session” beers (with ABV below 5%) have more recently risen in popularity in response. ABV appears to be skewed right. Why? The IQR of 5% to 6.8% covers the middle half of my checkins and is typical of the craft beer available on the market today. Brewers are able to brew a beer with any ABV up to that high-alcohol range in the low teens, but it’s very difficult, scientifically speaking, to brew a beer with less than 3% ABV. The exception is the handful of 0% non-alcoholic beers, which appear as outliers here.
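To see why the 0% beers show up as outliers, here is a small sketch using the common 1.5 × IQR boxplot rule. That rule is my assumption; the quartiles of 5% and 6.8% come from the IQR reported above, and `tukey_fences` and `is_outlier` are illustrative names.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs of the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(abv, fences):
    """True if an ABV value falls outside the fences."""
    lower, upper = fences
    return abv < lower or abv > upper

fences = tukey_fences(5.0, 6.8)  # quartiles taken from the reported IQR
print(f"{fences[0]:.1f} {fences[1]:.1f}")  # 2.3 9.5
print(is_outlier(0.0, fences))             # True: non-alcoholic beers
```

Under the same assumed rule, anything above roughly 9.5% ABV would also be flagged, which is where the long right tail lives.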
Relationship between ABV and rating: Mainly I’m interested in what factors might influence the ratings I’m assigning to beers. In my dataset, the simple linear regression of rating on ABV can be expressed as the equation y = 0.07993x + 3.43089, where y is the expected mean rating given an ABV of x percent. For a hypothetical zero-ABV beer, we’d expect a mean rating of 3.43089. For every one-percentage-point increase in ABV, the expected mean rating increases by 0.07993 stars, as illustrated in the scatterplot below.
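As a quick worked example, the fitted line can be turned into a small function. The slope and intercept are the ones reported above; the function name and the ABV values plugged in are just for illustration.

```python
SLOPE = 0.07993      # stars per percentage point of ABV (from the fit above)
INTERCEPT = 3.43089  # expected mean rating at 0% ABV (from the fit above)

def expected_rating(abv_percent):
    """Expected mean rating for a beer with the given ABV, per the fitted line."""
    return SLOPE * abv_percent + INTERCEPT

for abv in (0.0, 5.0, 6.8, 10.0):
    print(f"{abv:4.1f}% ABV -> {expected_rating(abv):.2f} stars")
```

So the line puts a 5% beer at about 3.83 stars and a 10% beer at about 4.23 stars, a difference of less than half a star across most of the range I actually drink.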
That said, the correlation here is weak: with a coefficient of 0.1837568, ABV and rating are only slightly, though positively, correlated. (Perhaps that small correlation can be explained by having me give out more generous grades when I’m more buzzed?)
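To get a feel for how weak that is, recall that in a simple linear regression the squared correlation coefficient is the share of variance in the response explained by the predictor. A minimal sketch using the coefficient reported above:

```python
r = 0.1837568       # correlation between ABV and rating, as reported above
r_squared = r ** 2  # share of rating variance explained by ABV
print(f"{r_squared:.4f}")  # 0.0338
```

In other words, ABV accounts for only about 3.4% of the variation in my ratings.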
Ratings by state: Craft beer is primarily an American phenomenon at this point in time, but anecdotally, not every state’s beer industry is created equally. Some states, including my former home state of Vermont, have developed a reputation for world-class beer despite their small size.
As we’re working with inferences for categorical data here, we’d like to conduct a chi-squared test for independence to determine whether state of origin has any apparent influence on my beer ratings. Unfortunately there are many states where I’ve only logged a handful of checkins on Untappd, and for these the expected counts are simply too low for a proper chi-squared test. In the interest of conducting tests for this paper, we’ll filter the data so that we only count states with at least 15 checkins, which leaves us six states.
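The filtering step can be sketched in a few lines. This is not the code I used; the record layout (dictionaries with `state` and `rating` keys), the toy data and the `filter_by_min_count` helper are all assumptions for illustration, and the threshold is lowered to 2 so the toy example stays small.

```python
from collections import Counter

def filter_by_min_count(checkins, key, min_count):
    """Keep only checkins whose value for `key` appears at least `min_count` times."""
    counts = Counter(c[key] for c in checkins)
    return [c for c in checkins if counts[c[key]] >= min_count]

# Hypothetical checkins (assumption): three from VT, two from NY, one from UT.
checkins = (
    [{"state": "VT", "rating": 4.5}] * 3
    + [{"state": "NY", "rating": 4.0}] * 2
    + [{"state": "UT", "rating": 3.5}]
)
kept = filter_by_min_count(checkins, "state", min_count=2)
print(sorted({c["state"] for c in kept}))  # ['NY', 'VT']
```

With real data, the same helper would be called with `min_count=15` for states, or with a `style` key and `min_count=10` for styles.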
Our hypotheses are as follows:
H0: There is no relation between a beer’s state of origin and its rating.
HA: A beer’s state of origin and the rating I gave it are related.
We’ll assume an α of 0.05. Using statistical software, we compute a test statistic of X-squared = 71.191, with 40 degrees of freedom (six states by nine possible ratings, so (6 − 1) × (9 − 1) = 40). We’re left with a p-value of 0.001738, which is much lower than 0.05. Therefore we reject the null hypothesis: there is some degree of relationship between where a beer is brewed and the rating I give it on Untappd.
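For readers who want to see what the statistical software is doing, here is a minimal pure-Python sketch of the test statistic and degrees of freedom for a two-way table. The 2 × 2 table of counts is made up, `chi_squared_independence` is an illustrative name, and the sketch assumes no row or column of the table sums to zero.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts (assumption): rows are states, columns are rating levels.
table = [[10, 20], [20, 10]]
stat, dof = chi_squared_independence(table)
print(round(stat, 3), dof)  # 6.667 1
```

Turning the statistic into a p-value requires the chi-squared distribution; in Python, `scipy.stats.chi2_contingency` handles the whole test, though for a 2 × 2 table it applies a continuity correction by default, so its statistic would differ from this toy example.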
Ratings by style: This is similar to the above. Across the 516 checkins in my dataset, there are 81 different beer styles. Some of the more obscure ones have only been logged a handful of times, so the expected counts are once again too low. This time I will filter the dataset to only consider styles that have been logged at least 10 times, which leaves me 316 of my checkins spread across 12 styles.
Our hypotheses are as follows:
H0: There is no relation between the style of a beer and its rating.
HA: The style of a beer and the rating I gave it are related.
We’ll assume an α of 0.05, once again. Using statistical software, we compute a test statistic of X-squared = 204.01, with 77 degrees of freedom (12 styles by eight possible ratings, so (12 − 1) × (8 − 1) = 77). We’re left with a p-value of 1.958e-13, which is again much lower than 0.05. Therefore we reject the null hypothesis: there is some degree of relationship between a beer style and the rating I give it on Untappd.
Conclusion: The rating I assign a beer in Untappd has some relationship with both its style and its state of origin, as well as a slight relationship with its ABV content.
Because the p-value for our style test is even smaller than the p-value for our state test, we have even stronger evidence against the idea that style is unrelated to a beer’s rating. It is tempting to conclude that the relationship between a beer’s style and its rating is somewhat stronger than the relationship between a beer’s state of origin and its rating, but a p-value also depends on the sample size and on the dimensions of the table, so on its own it does not measure how strong a relationship is.
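A p-value speaks to the evidence against independence rather than to the size of an association. One common effect-size measure for a chi-squared test is Cramér’s V, which was not part of the original analysis; I’m sketching it here as an assumption about how one might compare strengths. The inputs for the style test (X-squared = 204.01, 316 checkins, a 12 by 8 table) come from the text above. The number of checkins left after filtering by state is not reported, so the same calculation can’t be shown for that test.

```python
from math import sqrt

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: 0 means no association, 1 means a perfect one."""
    return sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

v_style = cramers_v(204.01, 316, 12, 8)
print(f"{v_style:.2f}")  # 0.30
```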
Here are the average ratings by style, computed with a pivot table, among styles that have been logged at least 10 times.
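The same kind of summary can be produced outside a spreadsheet. A minimal sketch, assuming each checkin is a dictionary with `style` and `rating` keys; the sample records and the `mean_rating_by` helper are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def mean_rating_by(checkins, key):
    """Group checkins by `key` and return the mean rating of each group."""
    groups = defaultdict(list)
    for c in checkins:
        groups[c[key]].append(c["rating"])
    return {group: fmean(ratings) for group, ratings in groups.items()}

# Hypothetical checkins (assumption), not my real data.
checkins = [
    {"style": "IPA", "rating": 4.0},
    {"style": "IPA", "rating": 4.5},
    {"style": "Stout", "rating": 3.5},
]
print(mean_rating_by(checkins, "style"))  # {'IPA': 4.25, 'Stout': 3.5}
```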
We also found a very slight relationship between a beer’s ABV content and its rating. However, that relationship was measured with a correlation coefficient rather than a chi-squared test, so it’s difficult to judge directly whether it is stronger or weaker than the relationship with style or with state of origin. That said, the correlation was weak.
There are a few weak spots with this dataset. For one, I would like to see a more normal distribution of ratings if I were to draw solid conclusions from it. I think there are a couple reasons for the skewness of the ratings — I am perhaps too easy a judge when it comes to craft beer and have not refined my palate enough, and/or I am more likely to check in on the Untappd app if I’m drinking a good beer than a bad one. (For the latter reason, we can’t quite consider this sample of my entire drinking habits a truly random one, either.)
I also cannot conduct an ideally accurate two-way test for independence, because in my contingency tables many of the expected cell counts fall below 5. That comes from a lack of one- to two-star reviews, very few beers of obscure styles, and very few beers from faraway states like Wisconsin or Utah.
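Checking this condition is mechanical: compute each cell’s expected count under independence and see how many fall below 5. The sketch below uses a made-up table with sparse low-rating columns; the helper names are illustrative, and the often-quoted rule of thumb that no more than about 20% of cells should fall below 5 is a general guideline, not something from my original analysis.

```python
def expected_counts(table):
    """Expected cell counts under independence for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def share_below(table, threshold=5):
    """Fraction of cells whose expected count is below `threshold`."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)

# Hypothetical counts (assumption): two states by three rating levels,
# with very few checkins in the two lowest rating columns.
table = [[1, 2, 30], [0, 3, 40]]
print(f"{share_below(table):.0%}")  # 67%
```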
In the future, a stronger dataset would help draw stronger conclusions, as the current state of the dataset leaves a lot of room for error in the tests for relationships. And it’d be even better if the distribution of ratings were closer to normal. This is a personal problem: I should be more consistent in my use of Untappd, and have a clearer outline of when a given beer should get a given rating.