Analyzing Cocktail Recipes

Dominic Graziano
INST414: Data Science Techniques
Oct 23, 2023

Data and Collection

Overall, this project seeks to show the relationships between different ingredients and how they are used in cocktails. To gather the data, I used TheCocktailDB API to get the name of each cocktail and up to 7 of its ingredients. This analysis could help users by showing the links between ingredients and the drinks they appear in. Retailers could also use it to market staple cocktail ingredients based on their versatility and number of uses. All of my code is in a Jupyter Notebook, and the main libraries I used were Requests, Pandas, NetworkX, and Matplotlib. I used the Requests library to call the API in a loop, retrieving the ingredient list for each cocktail in a list of 50 names generated by ChatGPT. The results were then appended into a single dataframe to be cleaned.
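A minimal sketch of that collection loop might look like the following. It assumes TheCocktailDB's public search endpoint with the free test key `1` (`search.php?s=<name>`), which returns a `drinks` list whose entries carry fields such as `strDrink`, `strIngredient1` and `strMeasure1`, or `null` when nothing matches. The post used Requests; this sketch uses the standard library's `urllib.request` so it runs without extra installs (`requests.get(SEARCH_URL, params={"s": name}).json()` would be the Requests equivalent). The helper names `parse_search_result`, `fetch_cocktail` and `collect` are hypothetical, and keeping only the first match per name is an assumption.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: TheCocktailDB search-by-name, using the free test API key "1".
SEARCH_URL = "https://www.thecocktaildb.com/api/json/v1/1/search.php"
MAX_INGREDIENTS = 7  # the post keeps up to 7 ingredients per cocktail


def parse_search_result(payload, max_ingredients=MAX_INGREDIENTS):
    """Turn one search response into {"cocktail": name, "ingredients": [...]}.

    Returns None when the API found no drink for the name.
    """
    drinks = payload.get("drinks")
    if not isinstance(drinks, list) or not drinks:
        return None
    drink = drinks[0]  # assumption: keep only the first match
    ingredients = []
    for i in range(1, max_ingredients + 1):
        ingredient = (drink.get(f"strIngredient{i}") or "").strip()
        if not ingredient:
            continue  # unused ingredient slots are null or empty
        measure = (drink.get(f"strMeasure{i}") or "").strip()
        # Keep measure and ingredient in one string; cleaning strips the measure out.
        ingredients.append(f"{measure} {ingredient}".strip())
    return {"cocktail": drink["strDrink"], "ingredients": ingredients}


def fetch_cocktail(name):
    """Query the API for one cocktail name and parse the JSON reply."""
    url = f"{SEARCH_URL}?{urllib.parse.urlencode({'s': name})}"
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    return parse_search_result(payload)


def collect(names):
    """Loop over cocktail names, skipping any the API does not recognize."""
    records = []
    for name in names:
        record = fetch_cocktail(name)
        if record is not None:
            records.append(record)
    return records
```

Passing the output of `collect(names)` to `pandas.DataFrame` would give one row per cocktail found, with all of its ingredients held together in a single column. Names the API does not recognize are skipped, so the frame can end up with fewer than 50 rows.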

Data Cleaning

To get the data into the format I wanted, I had to break apart the column that held all of the recipe ingredients together with their measurements. By the end of cleaning, I had split this one column into 7 columns, matching the maximum number of ingredients. The next problem I encountered was removing the numerical quantities and their units from the ingredients. This was fairly easy to do with regex, but I had to remove not only all numbers but also measurement units such as oz, cups, drops, etc. I also used regex to simplify some of the ingredient names to reduce the number of nodes that would be produced. Overall, the dataframe ended up looking like this:
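The cleaning steps described above can be sketched with Python's `re` module. The sketch assumes each raw ingredient is a single string with the measurement in front, such as `"1 1/2 oz Tequila"`. The unit list and the simplification rules are illustrative assumptions, not the notebook's actual patterns. The rum rule shows the kind of extra merging discussed under Limitations. `clean_ingredient` and `to_row` are hypothetical helper names.

```python
import re

MAX_INGREDIENTS = 7

# Numbers such as "2", "1.5" or "1/2" (mixed numbers like "1 1/2" match piece by piece).
NUMBER_PATTERN = re.compile(r"\d+(?:[./]\d+)?")
# Assumed, non-exhaustive list of measurement units, matched as whole words.
UNIT_PATTERN = re.compile(
    r"\b(?:oz|ml|cl|cups?|tsp|tbsp|tblsp|dash(?:es)?|drops?|shots?|splash(?:es)?|parts?)\b",
    re.IGNORECASE,
)
# Illustrative simplification rules: (pattern, canonical name), applied after lowercasing.
SIMPLIFY_RULES = [
    (re.compile(r"\b(?:dark|light|white|gold|spiced)\s+rum\b"), "rum"),
    (re.compile(r"\bfresh\s+lime\s+juice\b"), "lime juice"),
]


def clean_ingredient(raw):
    """Strip quantities and units, normalize case and spacing, then simplify the name."""
    text = NUMBER_PATTERN.sub(" ", raw)
    text = UNIT_PATTERN.sub(" ", text)
    text = " ".join(text.split()).lower()
    for pattern, canonical in SIMPLIFY_RULES:
        text = pattern.sub(canonical, text)
    return text or None  # a string that was only a measurement becomes None


def to_row(record, n=MAX_INGREDIENTS):
    """Spread a record's ingredient list across fixed ingredient_1..ingredient_n columns."""
    cleaned = [clean_ingredient(item) for item in record["ingredients"]][:n]
    cleaned += [None] * (n - len(cleaned))  # pad shorter recipes with missing values
    row = {"cocktail": record["cocktail"]}
    for i, value in enumerate(cleaned, start=1):
        row[f"ingredient_{i}"] = value
    return row
```

Applying `to_row` to every record and passing the resulting list to `pandas.DataFrame` would produce a cocktail column plus `ingredient_1` through `ingredient_7`, with missing values filling the slots of shorter recipes. Keeping the rules in a list makes it easy to add further merges, such as other spirit varieties, without touching the cleaning function.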

Analysis and Visualization

Overall, I wanted to show the relationship between ingredient nodes and the cocktails they are used in. In the visualization, the nodes are colored blue for cocktail names and green for ingredients. In NetworkX, I used the spring layout and ordered the nodes from most edges to fewest. To identify the 3 most important nodes, I chose the nodes with the highest number of edges (the highest degree). This returned one ingredient and two cocktails, in order: Gin, Zombie, and Manhattan.
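The graph construction and the degree-based ranking can be sketched without third-party code, assuming each cocktail is a record with a name and a list of cleaned ingredient names. The adjacency dictionary below stands in for a NetworkX graph. In NetworkX, `G.add_edge(cocktail, ingredient)` builds the same structure and `G.degree` yields the same (node, edge count) pairs. `nx.spring_layout(G)` followed by `nx.draw(G, pos, node_color=...)` would produce the blue and green drawing. The names `build_graph`, `node_colors` and `top_nodes`, the alphabetical tie-breaking, and the toy records are all hypothetical.

```python
def build_graph(records):
    """Build an undirected cocktail-ingredient graph as {node: set of neighbors}.

    Each record is assumed to look like {"cocktail": str, "ingredients": [str, ...]}.
    Returns the adjacency dict and the set of names treated as cocktails.
    """
    adjacency = {}
    cocktails = set()
    for record in records:
        cocktail = record["cocktail"]
        cocktails.add(cocktail)
        adjacency.setdefault(cocktail, set())
        for ingredient in record["ingredients"]:
            if not ingredient:
                continue  # skip empty ingredient slots
            # Sets mean a repeated pairing still counts as one edge, as in nx.Graph.
            adjacency[cocktail].add(ingredient)
            adjacency.setdefault(ingredient, set()).add(cocktail)
    return adjacency, cocktails


def node_colors(adjacency, cocktails):
    """Blue for cocktail names, green for ingredients, as in the post's figure."""
    return {node: ("blue" if node in cocktails else "green") for node in adjacency}


def top_nodes(adjacency, k=3):
    """Return the k nodes with the most edges (highest degree), ties broken by name."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))[:k]


if __name__ == "__main__":
    # Hypothetical toy records, not the project's data.
    records = [
        {"cocktail": "Martini", "ingredients": ["gin", "dry vermouth"]},
        {"cocktail": "Negroni", "ingredients": ["gin", "campari", "sweet vermouth"]},
        {"cocktail": "Gimlet", "ingredients": ["gin", "lime juice"]},
        {"cocktail": "Manhattan", "ingredients": ["whiskey", "sweet vermouth", "bitters"]},
    ]
    graph, cocktail_names = build_graph(records)
    for node in top_nodes(graph):
        print(node, len(graph[node]))
```

Because cocktails connect only to ingredients, a cocktail's degree is capped by its ingredient count (at most 7 here). An ingredient's degree can grow with every recipe that uses it, which is why a staple spirit can top the ranking.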

Limitations

Overall, this analysis is limited because I created the list of 50 cocktails using ChatGPT, and it could have been expanded to include more data. Additionally, there are not exactly 50 cocktails in the graph, as not all of the names in the list were found by the API. I also could have simplified the ingredient types further: in the case of rum, for example, there are dark, light, and other varieties, which may add to the clutter of the graph. Finally, I believe some nodes may have been misclassified, listed as ingredients when they should be cocktail names.

Link to my repository: https://github.com/Domgraz/Cocktail-Ingredient-Network/blob/main/Cocktail_Network.ipynb
