A Market-Basket Thanksgiving

Eugene Olkhov
CompassRed Data Blog
Nov 14, 2019
Photo by Element5 Digital on Unsplash

Thanksgiving is almost upon us, and that means lots of cooking and looking up new recipes!

As someone who frequently visits food blogs, I sometimes get a bit overwhelmed by the sheer volume of recipes, and especially by the number of ingredients they require. A question I started thinking about recently: can I determine which ingredients are used in the most recipes, so that I can cook a variety of meals with as few ingredients as possible?

Market-Basket Analysis

One way to answer that question is with Market-Basket Analysis. This type of analysis is frequently used in marketing to find associations between products that customers purchase. For example, a grocery store can determine which item is typically purchased along with a group of other items and use that knowledge (expressed as “rules”) to place those items closer together in the store.

The same technique can be used to find ingredients that tend to appear alongside other ingredients in recipes, which is what I aim to find out with this analysis.

While this is not intended to be a fully-fledged tutorial on conducting Market-Basket Analysis, there are some terms that are important to know in order to understand the results, which I very briefly cover below.

Support

The output of market-basket analysis is a set of rules:

{Burger, Fries} => {Soda}

The support is the proportion of transactions (or recipes) in which all of the items in the rule appear together. If half of the transactions contain burger, fries, and soda, the support is 0.5.
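To make the definition concrete, here is a minimal Python sketch of support. The analysis in this post was done in R, so this is only an illustration: the `support` helper and the toy transactions are made up for the example.

```python
# Toy transactions, invented for illustration (not from the Yummly data).
transactions = [
    {"burger", "fries", "soda"},
    {"burger", "fries", "soda"},
    {"burger", "fries"},
    {"salad", "water"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# Support of the rule {Burger, Fries} => {Soda}: all three items together.
rule_support = support({"burger", "fries", "soda"}, transactions)
print(rule_support)  # 0.5 (2 of the 4 transactions)
```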

Confidence

The confidence is the probability that someone purchases Soda given that they have purchased Burger and Fries. If Soda is always purchased with Burger and Fries, the confidence is 1.
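As a rough sketch in Python (again, not the R code behind this post), confidence can be computed by counting how many transactions contain the left-hand side and how many of those also contain the right-hand side. The `confidence` helper and the toy transactions are hypothetical.

```python
# Toy transactions, invented for illustration (not from the Yummly data).
transactions = [
    {"burger", "fries", "soda"},
    {"burger", "fries", "soda"},
    {"burger", "fries"},
    {"salad", "water"},
]


def confidence(lhs, rhs, transactions):
    """Share of the transactions containing `lhs` that also contain `rhs`."""
    lhs, rhs = set(lhs), set(rhs)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    return n_both / n_lhs


# {Burger, Fries} => {Soda}: 3 baskets have burger and fries, 2 also have soda.
conf = confidence({"burger", "fries"}, {"soda"}, transactions)
print(round(conf, 3))  # 0.667
```

In this toy data Soda is not always bought with Burger and Fries, so the confidence is below 1.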

Lift

The lift is a little trickier, but essentially it is the ratio of the rule’s confidence to the support of the right-hand side on its own, so it tells us whether the rule occurs more often than we would expect by chance. A lift higher than 1 suggests that the items do occur together more frequently than chance, and the higher the number, the stronger the association.
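One common way to write lift is confidence(LHS => RHS) divided by support(RHS). The Python sketch below uses that definition on made-up transactions; the `lift` helper is a hypothetical name, and this is not the R code used for the analysis.

```python
# Toy transactions, invented for illustration (not from the Yummly data).
transactions = [
    {"burger", "fries", "soda"},
    {"burger", "fries", "soda"},
    {"burger", "fries"},
    {"salad", "water"},
]


def lift(lhs, rhs, transactions):
    """confidence(lhs => rhs) / support(rhs), computed from raw counts."""
    lhs, rhs = set(lhs), set(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    # (n_both / n_lhs) / (n_rhs / n), rearranged into a single division.
    return (n_both * n) / (n_lhs * n_rhs)


rule_lift = lift({"burger", "fries"}, {"soda"}, transactions)
print(round(rule_lift, 3))  # 1.333
```

Here soda shows up in half of all baskets but in two thirds of the burger-and-fries baskets, so the lift is above 1.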

Data

The data for this analysis can be found on Kaggle, and it was donated by Yummly.com. It contains 39,774 recipes (in the training set, which is what I used), as well as labels for the cuisine the recipe is associated with. For this analysis, I decided to get rid of the cuisine labels and focus solely on the ingredient list.
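As a small sketch of that step, the snippet below turns recipe records into plain ingredient sets and ignores the cuisine label. It assumes each record has `id`, `cuisine`, and `ingredients` fields, which is my reading of the Kaggle file and not something spelled out in this post; the two records are made up, and the code is Python for illustration only.

```python
import json

# Two made-up records in the assumed shape of the Kaggle training file.
raw = """
[
  {"id": 1, "cuisine": "southern_us", "ingredients": ["flour", "butter", "salt"]},
  {"id": 2, "cuisine": "indian", "ingredients": ["cumin seed", "ground turmeric", "salt"]}
]
"""

records = json.loads(raw)

# Keep only the ingredient list of each recipe; one set per recipe ("basket").
baskets = [set(record["ingredients"]) for record in records]

print(len(baskets))  # 2
```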

To get a sense of the data, we can take a look at the most frequent items:
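For readers who want to reproduce this kind of frequency count, here is a minimal Python sketch using `collections.Counter`. The recipes are invented for the example and the counts are not results from the Yummly data.

```python
from collections import Counter

# Made-up recipes for illustration (not from the Yummly data).
recipes = [
    ["salt", "flour", "butter"],
    ["salt", "onions"],
    ["salt", "flour"],
]

# Count each ingredient at most once per recipe.
counts = Counter(item for recipe in recipes for item in set(recipe))

top_two = counts.most_common(2)
print(top_two)  # [('salt', 3), ('flour', 2)]
```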

Analysis

This analysis was conducted using R and the arules package. This package implements the Apriori algorithm and makes conducting Market-Basket Analysis very easy.

After some work massaging the data into the shape that arules requires, I was able to get the first set of rules. For the package to return rules, it needs a minimum support and a minimum confidence as input. These thresholds matter because, with a large dataset such as the one used here, building every possible rule would be overwhelming (and computationally expensive). To start, I used support = 0.005 and confidence = 0.8. The minimum support in this case required itemsets to appear in the data at least 105 times to be included in a rule. This resulted in only 64 rules being generated. Here’s a sample of some of them when sorted by highest lift:
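To show how the two thresholds act as filters, here is a small brute-force sketch in Python. It is not the Apriori algorithm and not the arules code behind this post: it simply tries every itemset up to a fixed size, which only works on tiny data. The `mine_rules` function, the toy recipes, and the looser support threshold (0.3 instead of 0.005) are all assumptions made for the example.

```python
from itertools import combinations

# Made-up recipes for illustration (not from the Yummly data).
transactions = [
    {"flour", "butter", "salt"},
    {"flour", "butter", "salt"},
    {"flour", "salt"},
    {"onions", "salt"},
    {"cumin", "turmeric"},
    {"cumin", "turmeric", "salt"},
]


def mine_rules(transactions, min_support, min_confidence, max_len=3):
    """Brute-force rules with a single item on the right-hand side."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for size in range(2, max_len + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            n_both = count(itemset)
            # Drop itemsets that never occur or fall below minimum support.
            if n_both == 0 or n_both / n < min_support:
                continue
            for rhs_item in combo:
                lhs = itemset - {rhs_item}
                n_lhs = count(lhs)
                n_rhs = count(frozenset([rhs_item]))
                confidence = n_both / n_lhs
                if confidence >= min_confidence:
                    lift = (n_both * n) / (n_lhs * n_rhs)
                    rules.append(
                        (sorted(lhs), rhs_item, n_both / n, confidence, lift)
                    )
    # Highest lift first, mirroring the sort order used in the post.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


rules = mine_rules(transactions, min_support=0.3, min_confidence=0.8)
for lhs, rhs, supp, conf, lift in rules:
    print(f"{lhs} => {rhs}  support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Raising either threshold shrinks the rule list. Apriori reaches the same kind of result far more efficiently by pruning any itemset that has an infrequent subset.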

In hindsight, these results should not be a huge surprise. Ingredients such as flour and salt have very high support because they are included in so many recipes. The relatively high minimum support also means that the ingredients on the left-hand side of the rules will be common ingredients too. So all this shows us is common ingredients that are used with other common ingredients: butter, onions, pepper, salt, flour, etc.

What happens if we reduce the minimum support to allow for some less common ingredients?

Photo by Paolo Bendandi on Unsplash

More spices!

We start to see a breakdown of common ingredients by type of cuisine:

However, the rules still consist largely of spices and common pantry staples such as flour, eggs, and baking powder.

Further analysis could remove the most common ingredients and see what rules emerge, but the point still stands: if you are preparing to do a lot of cooking, stock your pantry with many types of spices! In particular, these are the spices and seasonings you should definitely keep in your pantry based on this analysis:

  1. Onion powder
  2. Cloves
  3. Cumin seed
  4. Ground turmeric
  5. Garlic powder
  6. Paprika
  7. Sesame oil
  8. Soy sauce
  9. Ground cumin
  10. And of course…salt and pepper!

These can be used in many recipes, and can make a lot of people thankful during Thanksgiving.

Happy Cooking!
