Day 28: 30 Days Machine Learning Project Challenge

Association Rule Learning Using FP-Growth🥛🍞🧈

Abbas Ali
4 min read · Mar 30, 2024
Photo by Hadi Yazdi Aznaveh on Unsplash

Hey there!👋

Today’s topic is the FP-Growth (Frequent Pattern Growth) algorithm.

In unsupervised learning, two important topics are clustering and association.

Most people know about clustering, but far fewer know about association. Let me first briefly explain what it is.

Imagine you own a supermarket and your database holds data about customers’ buying patterns. For example:

Customer-1: [Milk, Bread, Butter, Curd]

Customer-2: [Bread, Butter, Banana, Onion]

Customer-3: [Ice Cream, Chocolate, Biscuit]

and so on.

Now, what is the most useful thing you can do with this data?

If you know that people who buy “Bread” tend to buy “Butter” most of the time, you can place “Bread” and “Butter” together in your supermarket, which can potentially increase your sales.

(As an experiment, go to a nearby supermarket and look at how products are placed. What is beside the “Soap” counter? Is there “Shampoo”? What is beside the “Toothbrush” counter? Is there “Toothpaste”?)

These products are placed together purposefully.

To make these placements, you need to understand your customers’ buying patterns and then work out which products should be kept together.

This process is known as Market Basket Analysis.

This is what we can achieve using association rule mining algorithms like Apriori and FP-Growth.

I have already written an article on Apriori.

In this article, we are going to look at the FP-Growth algorithm.

Before moving on, you need to know what an “itemset” is.

Itemset: a group of items such as [“Milk”, “Butter”, “Banana”], or sometimes just a single item such as “Milk”.

As the name Frequent Pattern Growth suggests, this algorithm finds frequently occurring patterns (itemsets) in a dataset of customers’ purchases.

You will understand this more clearly once we get into the code.

We also need to set a threshold (min_support), which tells the algorithm to keep only patterns whose support, the fraction of transactions that contain them, is at or above the threshold. For example, if there are 10 transactions in the dataset and the threshold is 0.6 (60%), the pattern [“Milk”, “Butter”] is considered frequent if it occurs in at least 6 transactions; otherwise it is not.
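To make the threshold concrete, here is a minimal plain-Python sketch of how support can be computed and compared against min_support. The helper names support and is_frequent are hypothetical (they are not mlxtend functions), and the dataset is the five-transaction sample used in the rest of this article.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for basket in transactions if wanted.issubset(basket))
    return hits / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent when its support is at or above min_support."""
    return support(itemset, transactions) >= min_support


# The same five-transaction sample dataset used in the rest of this article.
dataset = [['Milk', 'Curd', 'Biscuit', 'Bread', 'Eggs', 'Banana'],
           ['Coffee', 'Curd', 'Biscuit', 'Bread', 'Eggs', 'Banana'],
           ['Milk', 'Salt', 'Bread', 'Eggs'],
           ['Milk', 'Unicorn', 'Salt', 'Bread', 'Banana'],
           ['Salt', 'Curd', 'Sugar', 'Bread', 'Ice cream', 'Eggs']]

print(support(['Bread', 'Eggs'], dataset))           # 0.8 (4 of 5 transactions)
print(is_frequent(['Bread', 'Eggs'], dataset, 0.6))  # True
print(is_frequent(['Milk', 'Eggs'], dataset, 0.6))   # False (2 of 5 = 0.4)
```

Note the comparison is `>=`, so a pattern that sits exactly on the threshold (6 of 10, or 3 of 5 at 0.6) still counts as frequent.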

Let’s get into the code.

First, create a sample dataset like the one below.

dataset = [['Milk', 'Curd', 'Biscuit', 'Bread', 'Eggs', 'Banana'],
['Coffee', 'Curd', 'Biscuit', 'Bread', 'Eggs', 'Banana'],
['Milk', 'Salt', 'Bread', 'Eggs'],
['Milk', 'Unicorn', 'Salt', 'Bread', 'Banana'],
['Salt', 'Curd', 'Sugar', 'Bread', 'Ice cream', 'Eggs']]

Before applying FP-Growth, we need to preprocess the data above. This can be done with TransactionEncoder from the mlxtend library.

TransactionEncoder works similarly to a one-hot encoder: each unique item becomes a column, and each transaction becomes a row of True/False values. The code below makes this clear.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_array = te.fit_transform(dataset)

te_array
"""
OUTPUT:
array([[ True, True, True, False, True, True, False, True, False,
False, False],
[ True, True, True, True, True, True, False, False, False,
False, False],
[False, False, True, False, False, True, False, True, True,
False, False],
[ True, False, True, False, False, False, False, True, True,
False, True],
[False, False, True, False, True, True, True, False, True,
True, False]])
"""

After applying TransactionEncoder, we get an array of Boolean values. If it does not make sense yet, don’t worry; it will become clear shortly.

te.columns_
"""
OUTPUT:
['Banana',
'Biscuit',
'Bread',
'Coffee',
'Curd',
'Eggs',
'Ice cream',
'Milk',
'Salt',
'Sugar',
'Unicorn']
"""
df = pd.DataFrame(te_array, columns=te.columns_)
df
"""
OUTPUT:
Banana Biscuit Bread Coffee Curd Eggs Ice cream Milk Salt Sugar Unicorn
0 True True True False True True False True False False False
1 True True True True True True False False False False False
2 False False True False False True False True True False False
3 True False True False False False False True True False True
4 False False True False True True True False True True False
"""

After converting the array into a DataFrame, the purpose of the Boolean values becomes clear.

Each row is one transaction and each column is one item. Take “Banana”, the first column: its value is True in row 0 because “Banana” appears in the first transaction (index 0) of the original dataset, and False in every row whose transaction does not contain it.
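The same encoding can be sketched in a few lines of plain Python, which may help if the TransactionEncoder output still feels opaque. This illustrates the idea only and is not mlxtend’s implementation; encode is a hypothetical helper name.

```python
def encode(transactions):
    """One-hot encode transactions.

    The sorted unique items become the columns, and each transaction becomes
    a row of True/False flags saying whether it contains that item.
    """
    columns = sorted({item for basket in transactions for item in basket})
    rows = [[col in basket for col in columns] for basket in map(set, transactions)]
    return columns, rows


dataset = [['Milk', 'Curd', 'Biscuit', 'Bread', 'Eggs', 'Banana'],
           ['Coffee', 'Curd', 'Biscuit', 'Bread', 'Eggs', 'Banana'],
           ['Milk', 'Salt', 'Bread', 'Eggs'],
           ['Milk', 'Unicorn', 'Salt', 'Bread', 'Banana'],
           ['Salt', 'Curd', 'Sugar', 'Bread', 'Ice cream', 'Eggs']]

columns, rows = encode(dataset)
print(columns)                   # same order as te.columns_ above
print(rows[0])                   # same flags as row 0 of the DataFrame
print([row[0] for row in rows])  # the "Banana" column: [True, True, False, True, False]
```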

Now, let’s apply the FP-Growth Algorithm.

from mlxtend.frequent_patterns import fpgrowth

fpgrowth(df, min_support=0.6) #Setting the threshold to 60%

"""
OUTPUT:
support itemsets
0 1.0 (2)
1 0.8 (5)
2 0.6 (7)
3 0.6 (4)
4 0.6 (0)
5 0.6 (8)
6 0.8 (2, 5)
7 0.6 (2, 7)
8 0.6 (4, 5)
9 0.6 (2, 4)
10 0.6 (2, 4, 5)
11 0.6 (0, 2)
12 0.6 (8, 2)
"""

The “itemsets” column above shows column indices (positions in te.columns_, so 2 is “Bread” and 5 is “Eggs”) instead of item names. Passing use_colnames=True replaces them with the original names.
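The mapping is just a positional lookup into the column list. A tiny sketch, assuming the te.columns_ order printed earlier (copied in so the snippet runs on its own) and that each itemset is a frozenset of integers; names is a hypothetical helper:

```python
# Column order as printed by te.columns_ earlier in this article.
columns = ['Banana', 'Biscuit', 'Bread', 'Coffee', 'Curd', 'Eggs',
           'Ice cream', 'Milk', 'Salt', 'Sugar', 'Unicorn']


def names(itemset):
    """Translate a set of column positions into sorted item names."""
    return sorted(columns[i] for i in itemset)


print(names(frozenset({2, 5})))     # ['Bread', 'Eggs']
print(names(frozenset({2, 4, 5})))  # ['Bread', 'Curd', 'Eggs']
```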

fpgrowth(df, min_support=0.6, use_colnames=True)
"""
OUTPUT:
support itemsets
0 1.0 (Bread)
1 0.8 (Eggs)
2 0.6 (Milk)
3 0.6 (Curd)
4 0.6 (Banana)
5 0.6 (Salt)
6 0.8 (Bread, Eggs)
7 0.6 (Bread, Milk)
8 0.6 (Eggs, Curd)
9 0.6 (Bread, Curd)
10 0.6 (Bread, Eggs, Curd)
11 0.6 (Bread, Banana)
12 0.6 (Bread, Salt)
"""

From the fpgrowth output above, we can see that “Bread” has a support of 1.0, meaning it is purchased by every customer.

Let’s focus on itemsets with two values.

Index 6 shows that “Bread” and “Eggs” appear together in 80% of all transactions (maybe for a bread omelet). This suggests we should place those two products together.

Similarly, the itemsets at indices 7 to 12 each appear in at least 60% of transactions, so those products are also good candidates for placing together.
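The article stops at frequent itemsets. A common next step in association rule learning is to turn them into rules such as “Eggs → Bread” and score each rule by its confidence, confidence(A → B) = support(A ∪ B) / support(A). Confidence is the share of baskets containing A that also contain B, which matches the earlier “people who buy Bread tend to buy Butter” intuition. The sketch below computes this in plain Python; support and confidence are hypothetical helper names. mlxtend also provides an association_rules function for this, but its parameters have changed between versions, so check the documentation for the version you have installed.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in transactions if wanted.issubset(basket)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


dataset = [['Milk', 'Curd', 'Biscuit', 'Bread', 'Eggs', 'Banana'],
           ['Coffee', 'Curd', 'Biscuit', 'Bread', 'Eggs', 'Banana'],
           ['Milk', 'Salt', 'Bread', 'Eggs'],
           ['Milk', 'Unicorn', 'Salt', 'Bread', 'Banana'],
           ['Salt', 'Curd', 'Sugar', 'Bread', 'Ice cream', 'Eggs']]

print(confidence(['Eggs'], ['Bread'], dataset))  # 1.0: every basket with Eggs has Bread
print(confidence(['Bread'], ['Eggs'], dataset))  # 0.8: 4 of 5 Bread baskets have Eggs
```

The two directions differ because confidence is not symmetric: Bread is in every basket, so knowing a customer bought Bread says less about Eggs than the reverse.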

That’s it.

I hope this article was helpful.

Two more days to go.

P.S. You can connect with me on X.

X: AbbasAli_X
