Finding Similar Beers to “Buzz”

Published in INST414: Data Science Techniques

Nov 29, 2023

Introduction and Insight:

In this paper, I undertake a comparative exploration of the characteristics and flavor profile of “Buzz”, one of my favorite beers, and its similarities to other popular beers on the market. With an ever-expanding universe of craft and commercial beers, understanding the nuanced qualities that set “Buzz” apart, yet also align it with well-known brews, offers insight into the art and science of beer making. The comparison also sheds light on the broader spectrum of beer flavors and styles, helping both connoisseurs and casual drinkers appreciate how alike certain popular beers are. My goal is to identify which popular beers are most similar to “Buzz”, so that people who enjoy it can branch out and try new but similar products on the market.

Data Source and Similarity Metrics:

To source the beer data, I wrote some code to access one of the free APIs listed on the course GitHub, which provides data on popular types of beer, and to save the response as a JSON file. The code snippet is provided here:
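As a rough illustration of this step, a request of this kind might look like the sketch below. It uses only the Python standard library. The URL is a placeholder assumption rather than the real endpoint, and the function names (`parse_beers`, `fetch_beers`, `save_beers`) are hypothetical names chosen for this example.

```python
import json
from urllib.request import urlopen

# Placeholder URL (an assumption): substitute the real endpoint of the beer API.
BEER_API_URL = "https://example.com/api/beers"


def parse_beers(raw_text):
    """Parse a JSON response body into a list of beer records (dicts)."""
    records = json.loads(raw_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of beer records")
    return records


def fetch_beers(url=BEER_API_URL, timeout=10):
    """Download the response body from the API and parse it."""
    with urlopen(url, timeout=timeout) as response:
        return parse_beers(response.read().decode("utf-8"))


def save_beers(records, path="beers.json"):
    """Write the records to a local JSON file for later analysis."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
```

Keeping the parsing separate from the download makes it possible to check the parsing logic against a saved response without calling the API again.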

I parsed the JSON file and converted it into a data frame in Python using the pandas library, which made the data easy to view and clean. The data includes the features I examine to determine similarity, including Name, ABV, IBU, Volume, Boil Volume, Method, and Ingredients, among others. I used three different techniques for this analysis: Jaccard similarity, cosine similarity, and dimensionality reduction.
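To make the first two metrics concrete, here is a minimal sketch of each in plain Python. Jaccard similarity compares two sets (for example, sets of ingredient names) by dividing the size of their intersection by the size of their union. Cosine similarity compares two numeric vectors (for example, ABV and IBU values) by the angle between them. Which features feed which metric is an assumption made for illustration, and the function names are hypothetical. In practice a library such as scikit-learn can compute the same quantities.

```python
import math


def jaccard_similarity(a, b):
    """Jaccard similarity of two collections: |A intersect B| / |A union B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here when a vector has zero length
    return dot / (norm_u * norm_v)
```

For example, two beers whose ingredient sets are `{"hop_a", "hop_b", "malt_a"}` and `{"hop_b", "malt_a", "malt_b"}` (made-up names) share two of four distinct ingredients, so their Jaccard similarity is 0.5.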

Top 10 Most Similar Items:

Through this analysis I was able to determine the top 10 most similar beers to “Buzz” according to the similarity metrics listed above. My findings are posted here:
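For readers who want to reproduce a ranking like this, the step of turning similarity scores into a top-10 list can be sketched as follows. The function name `top_k_similar` is hypothetical, and the sketch assumes the scores have already been computed and stored in a dictionary keyed by beer name. It illustrates the ranking step only and does not contain my results.

```python
def top_k_similar(target, scores, k=10):
    """Return the k (name, score) pairs with the highest scores, excluding the target itself."""
    candidates = [(name, score) for name, score in scores.items() if name != target]
    # Sort from most to least similar.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]
```

The target beer is excluded because any beer is trivially most similar to itself.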

Software Used and Data Cleaning:

The software I used for this analysis includes the pandas library, which I used to create the data frame containing the beer data and to keep only the columns relevant to determining similarity (ABV, IBU, Volume, Boil Volume, Method, Ingredients, etc.). I also used scikit-learn, NumPy, and NetworkX to perform the similarity analysis and eventually create a visual chart showcasing these similarities.

Cleaning this dataset involved filtering for the relevant columns and removing rows with null values. Much of the debugging process was fixing syntax errors in my code, along with figuring out how to create charts like the one provided above. To solve these issues, I referenced YouTube and Stack Overflow for examples of similar charts and figures.
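The two cleaning steps described above can be sketched with pandas as follows. The column names in `RELEVANT_COLUMNS` are assumptions about how the fields are spelled in the data, and `clean_beers` is a hypothetical helper name.

```python
import pandas as pd

# Assumed column names; the real dataset's field names may differ.
RELEVANT_COLUMNS = ["name", "abv", "ibu"]


def clean_beers(records, columns=RELEVANT_COLUMNS):
    """Build a DataFrame, keep only the relevant columns, and drop rows with nulls."""
    df = pd.DataFrame(records)
    df = df[columns]   # raises KeyError if an expected column is missing
    df = df.dropna()   # remove any row that has a null in the kept columns
    return df.reset_index(drop=True)
```

Selecting the columns before calling `dropna()` matters: rows are then dropped only for nulls in the features actually used, not for nulls in columns that were going to be discarded anyway.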

Limitations and Bias:

One limitation of this analysis is that the dataset is not a comprehensive list of every beer in the United States. There are many local breweries whose beers, if added, could alter my findings and conclusions. The analysis is also inherently biased, as I only look for similarities to my favorite beer, “Buzz”. I do not examine other types of beer that may be of interest to other consumers.

Link to my code in GitHub:

https://github.com/dav1dalvaro/INST414/blob/main/a333.ipynb