Association Rule (Apriori and FP-Growth Algorithms) with Practical Implementation

Amir Ali
The Art of Data Science
8 min read · Feb 3, 2019

In this chapter, we will discuss Association Rules (the Apriori and FP-Growth algorithms), an unsupervised machine learning technique used mostly in data mining.

This chapter spans 5 parts:

  1. What is the Association Rule?
  2. How does the Apriori algorithm work?
  3. How does the FP-Growth algorithm work?
  4. Practical implementation of the Apriori algorithm.
  5. Practical implementation of the FP-Growth algorithm.

1. What is the Association Rule?

Most machine learning algorithms in data science work with numeric data and tend to be quite mathematical. Association rule mining, by contrast, is well suited to categorical data and involves little more than simple counting.

· It’s a rule-based ML method for discovering interesting relations between variables in large databases.

· Identifies strong rules using some measures of interestingness.


On Friday afternoons, young American males who buy diapers (nappies) also have a predisposition to buy beer.

This anecdote became popular as an example of how unexpected association rules might be found from everyday data.

Such information can be used as the basis for decisions about marketing activities such as Promotional Pricing or Market Basket Analysis.

Mathematics Involved

The problem can be seen as:

· Let I = {i_1,i_2,….} be a set of binary attributes called items.

· Let D = {t_1,t_2,….} be a set of transactions called the database.

· Each transaction in D contains a subset of the items in I.

A rule takes the form:

X ⇒ Y (where X and Y are disjoint itemsets, each generally a single item or a set of items)

X is called the antecedent and Y the consequent.

Supermarket Example

I = {milk, bread, butter, beer, diapers}
D is as shown below:

1 indicates the presence of the item in that transaction, and 0 shows the absence.

Rule: {butter, bread} ⇒ {milk}, meaning that if butter and bread are bought, customers also buy milk.

Thresholds used for Relations

· Support — An indication of how frequently the itemset appears in the database. It is defined as the fraction of transactions that contain X∪Y out of the total number of transactions in the database. If the support of an itemset is 0.1%, only 0.1% of the transactions contain that itemset.

Support (X ⇒ Y) = Support count of (X ∪ Y) / Total number of transactions in D

· Confidence — The fraction of transactions that contain X∪Y out of the transactions that contain X.
It is a measure of the strength of an association rule.
If the confidence of the rule X ⇒ Y is 80%, then 80% of the transactions that contain X also contain Y.

Confidence (X ⇒ Y) = Support (X ∪ Y) / Support (X)

· Other measures include Lift, Conviction, All-Confidence, Collective strength and Leverage.
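These measures can be sketched in a few lines of Python; the toy database below is hypothetical and exists only to illustrate the formulas:

```python
# Toy transaction database D (hypothetical baskets).
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"bread"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in D) / len(D)

def confidence(X, Y):
    # Support of X ∪ Y over support of X.
    return support(X | Y) / support(X)

def lift(X, Y):
    # Confidence of the rule over the consequent's baseline support.
    return confidence(X, Y) / support(Y)

print(support({"milk", "bread"}))       # 0.4 (2 of 5 baskets)
print(confidence({"milk"}, {"bread"}))  # ≈ 0.667
print(lift({"milk"}, {"bread"}))        # ≈ 0.833
```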

2. How do the Association Rule algorithms work?

2.1: How does the Apriori algorithm work?

Dataset Description:

This dataset has two attributes and five transactions of different items, where M = mango, O = orange, N = nut, E = elderberry, Y = yam, D = date, A = apple, U = uglifruit, C = cherry, and K = kiwifruit. Our goal is to derive association rules from these five transactions and use them to recommend items.

Min support = 60%

Min confidence = 80%

Now find the support count of each itemset.

C1 =

Support count = the number of transactions in which the itemset appears.

Compare each itemset's support count with the minimum support. With min support = 60% of five transactions, the threshold is 3.

Skip every itemset in the C1 table above whose support count is less than 3.

L1 =

Now combine the frequent items into pairs and count each pair's support to generate C2.

C2=

Again compare each pair's support count with the minimum support of 3 and skip every pair whose count is less than 3, as shown in the table below.

L2=
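Candidate generation at each level is just a combination of the surviving frequent items. A sketch of the C2 step, where the frequent single items are assumed to be the M, O, K, E, Y that appear in the final rules:

```python
from itertools import combinations

# Assumed frequent 1-itemsets (L1) after the minimum-support cut.
L1 = ["M", "O", "K", "E", "Y"]

# C2: every unordered pair of frequent items.
C2 = [frozenset(pair) for pair in combinations(L1, 2)]
print(len(C2))  # 10 candidate pairs
```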

Now combine the frequent pairs into triples to generate C3.

C3=

Again compare each triple's support count with the minimum support of 3 and skip every triple whose count is less than 3, as shown in the table below.

L3=

Now create association rules, with their support and confidence, from the frequent itemset {O, K, E}.

Compare each rule's confidence with the minimum confidence mentioned earlier (80%) and skip every rule with confidence below 80%, as shown in the table below.

Hence the final association rules are:


O Ʌ K ⇒ E

Orange Ʌ Kiwi fruit ⇒ Elderberry

O Ʌ E ⇒ K

Orange Ʌ Elderberry ⇒ Kiwi fruit

So finally:

If you buy orange and kiwifruit together, the system also recommends elderberry.

And if you buy orange and elderberry together, the system also recommends kiwifruit.
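This worked example can be verified in a few lines of Python. The transaction table is an image in the original, so the five baskets below are an assumption: the well-known five-transaction teaching example whose item letters match the ones used above.

```python
# Assumed transactions (not shown in the original text).
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "K", "I", "E"},
]

def support_count(itemset):
    # Number of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions)

def confidence(antecedent, consequent):
    return support_count(antecedent | consequent) / support_count(antecedent)

print(support_count({"O", "K", "E"}))  # 3 -> support 3/5 = 60%, frequent
print(confidence({"O", "K"}, {"E"}))   # 1.0 -> kept (>= 80%)
print(confidence({"O", "E"}, {"K"}))   # 1.0 -> kept
print(confidence({"K", "E"}, {"O"}))   # 0.75 -> dropped (< 80%)
```

Under these assumed baskets, exactly the two rules above survive the 80% confidence cut.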

Note: If you want this article check out my academia.edu profile.

2.2: How does the FP-Growth algorithm work?

Dataset Description:

This dataset has two attributes and five instances: the first attribute is the transaction ID and the second is the itemset.

There are four steps to solve this:

1. Find the support count of each item.

2. Order the frequent items in descending order of frequency.

3. Draw the FP tree.

4. Mine frequent patterns from the FP tree.

So let’s solve step by step

Step 1: Find the support count of each item

Minimum support = 3

Skip every item in the table above whose count is less than 3, so:

Step 2: Order the frequent items in descending order

f = 4, c = 4, a = 3, b = 3, m = 3, p = 3
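Steps 1 and 2 can be sketched in Python. The transactions below are an assumption: the classic FP-Growth teaching example from Han et al., whose item counts match the f = 4, c = 4, a = 3, b = 3, m = 3, p = 3 figures above (ties between equal counts can be ordered either way).

```python
from collections import Counter

# Assumed transactions (classic FP-Growth teaching example).
transactions = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
min_support = 3

# Step 1: count each item's support.
counts = Counter(item for t in transactions for item in t)

# Keep only the frequent items, ordered by descending count.
frequent = {item: c for item, c in counts.items() if c >= min_support}
order = sorted(frequent, key=lambda item: -frequent[item])
print(order)  # ['f', 'c', 'a', 'm', 'p', 'b'] — tie order is arbitrary

# Step 2: rewrite each transaction with its frequent items in that order;
# these ordered lists are what get inserted into the FP tree.
ordered_tx = [[item for item in order if item in t] for t in transactions]
print(ordered_tx[0])  # ['f', 'c', 'a', 'm', 'p']
```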

Step 3: FP tree

1. Insert the first transaction.

2. Insert the second transaction.

3. Insert the third transaction.

4. Insert the fourth transaction.

5. Insert the fifth transaction.

Step 4: Mine frequent patterns from the FP tree

So finally, if you buy:

1. c, the system suggests p.

2. f, c, a, the system suggests m.

3. f, c, the system suggests a.

4. f, the system suggests c.


3: Practical Implementation of Association Rules

3.1: Implementation of the Apriori Algorithm

Dataset Description:
This dataset, on market basket optimization, contains 7,500 transactions of different items; a single transaction holds at most 20 items and at least 1. We use this dataset to build a recommendation system for market basket analysis, using the Apriori algorithm to generate the rules.

Part 1: Data Preprocessing:

1.1 Import the Libraries

In this step, we import three libraries for the data preprocessing part. A library is a collection of ready-made code you can use for a specific job. First we import the numpy library, used for multidimensional arrays; then the pandas library, used to import the dataset; and finally the matplotlib library, used for plotting graphs.

1.2 Import the dataset

After importing the libraries, we use pandas to import the dataset, which is in CSV format, and then build a new list of transactions, appending each row of the dataset so the data is in the format the algorithm expects.
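A minimal sketch of this preprocessing step, using a tiny in-memory stand-in for the CSV file (the real dataset has 7,500 such rows with up to 20 items each):

```python
import io
import pandas as pd

# A tiny hypothetical stand-in for the market-basket CSV file.
csv_text = "milk,bread,butter\nbeer,diapers\nmilk,bread\n"
dataset = pd.read_csv(io.StringIO(csv_text), header=None)

# Each row becomes a list of item strings; the NaN padding that pandas
# adds to baskets shorter than the widest row is dropped.
transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in dataset.values
]
print(transactions)
# [['milk', 'bread', 'butter'], ['beer', 'diapers'], ['milk', 'bread']]
```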

Part 2: Building the apriori model:

In this part we build the Apriori model. Unfortunately, scikit-learn has no library for association rules, so we use a developer-made library instead. After importing it, we initialize the Apriori algorithm and pass some parameters: first the transactions (our dataset), then min_support=0.003, min_confidence=0.2, min_lift=3, and min_length=2, which were explained in the mathematical portion above.

Part 3: Check the Rule:

In this part, we check the rules produced for market basket analysis.

These are the rules: each one says that if you buy these items together, then the listed item is recommended. So that's it.

If you want the dataset and code, you can also check my GitHub profile.

3.2: Implementation of the FP-Growth Algorithm

Unfortunately, there is no such library available to build an FP tree, so we implement it from scratch.

If you want the dataset and code, you can also check my GitHub profile.

End Notes:

If you liked this article, be sure to click ❤ below to recommend it and if you have any questions, leave a comment and I will do my best to answer.

To stay more aware of the world of machine learning, follow me. It's the best way to find out when I write more articles like this.

You can also follow me on GitHub for the code & dataset, follow me on Academia.edu for this article, reach me on Twitter, email me directly, or find me on LinkedIn. I'd love to hear from you.

That’s all folks, Have a nice day :)
