Apriori Algorithm in Association Rule Learning

Amit Ranjan · Published in Analytics Vidhya · 5 min read · Dec 4, 2020


Before we go into the Apriori Algorithm, I would suggest you visit this link to get a clear understanding of Association Rule Learning.

What is the Apriori Algorithm?
The Apriori Algorithm is one of the algorithms used on transaction data in Association Rule Learning. It allows us to mine frequent itemsets in order to generate association rules between them.
Examples: lists of items purchased by customers, pages of a website that are frequently visited, etc.

This algorithm was introduced by Agrawal and Srikant in 1994.

Principles behind Apriori Algorithm

  1. Every subset of a frequent itemset is itself frequent.
  2. Every superset of an infrequent itemset is itself infrequent.

I know you are thinking this is too technical, but don't worry: you will get it once we see how it works!

The Apriori Algorithm uses three measures:
1. Support
2. Confidence
3. Lift

Support( I )=
( Number of transactions containing item I ) / ( Total number of transactions )

Confidence( I1 -> I2 ) =
( Number of transactions containing I1 and I2 ) / ( Number of transactions containing I1 )

Lift( I1 -> I2 ) = Confidence( I1 -> I2 ) / Support( I2 )
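To make these three measures concrete, here is a minimal Python sketch, assuming transactions are represented as sets of item labels (the function names are my own, not from any particular library):

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, lhs, rhs):
    """Support of lhs and rhs together, divided by support of lhs."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)

def lift(transactions, lhs, rhs):
    """Confidence of lhs -> rhs divided by the support of rhs."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)

transactions = [{"I1", "I2"}, {"I2", "I3"}, {"I1", "I2", "I3"}]
print(confidence(transactions, {"I1"}, {"I2"}))  # 1.0
print(lift(transactions, {"I1"}, {"I2"}))        # 1.0 (I2 is in every transaction)
```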

Algorithm in a nutshell
1. Set a minimum support and confidence.
2. Take all the itemsets present in the transactions that have support higher than the minimum support.
3. Take all the rules over these itemsets that have confidence higher than the minimum confidence.
4. Sort the rules by decreasing lift.
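In practice you rarely code these four steps by hand. As a rough sketch of how the whole pipeline might look with the mlxtend library (assuming it is installed; the toy transactions here are made up for illustration):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["I1", "I2", "I5"], ["I2", "I3", "I4"],
                ["I1", "I2", "I3"], ["I2", "I3", "I5"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Steps 1-2: keep only the itemsets above the minimum support.
frequent = apriori(df, min_support=0.5, use_colnames=True)

# Step 3: keep only the rules above the minimum confidence.
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)

# Step 4: sort the rules by decreasing lift.
print(rules.sort_values("lift", ascending=False))
```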

Mathematical Approach to Apriori Algorithm

Consider the transaction dataset of a store where each transaction contains the list of items purchased by a customer. Our goal is to find the frequent itemsets purchased by the customers and to generate association rules for them.

We assume that the minimum support count is 2 and the minimum confidence is 50%.
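The original post showed the transaction table as an image, which is not reproduced here. For illustration, assume a small database consistent with the counts used in the steps below (this exact table is my assumption, not part of the original walkthrough):

```python
# Hypothetical transaction database, chosen to match the counts used
# below: I4 appears only once, I2 appears four times, and I1 appears
# twice, both times together with I2.
transactions = [
    {"I1", "I2", "I5"},  # T1
    {"I2", "I3", "I4"},  # T2
    {"I1", "I2", "I3"},  # T3
    {"I2", "I3", "I5"},  # T4
]
```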

Step 1: Create a table with the support count of every item present in the transaction database.

We compare each item's support count with the minimum support count we have set. If an item's support count is less than the minimum support count, we remove that item.

Here, the support count of I4 < minimum support count, so I4 is removed.
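With the hypothetical dataset above, step 1 could be sketched like this (variable names are mine):

```python
from collections import Counter

transactions = [{"I1", "I2", "I5"}, {"I2", "I3", "I4"},
                {"I1", "I2", "I3"}, {"I2", "I3", "I5"}]
MIN_SUPPORT_COUNT = 2

# Count each individual item across all transactions.
counts = Counter(item for t in transactions for item in t)
print(counts)  # I2: 4, I3: 3, I1: 2, I5: 2, I4: 1

# Keep only the items that meet the minimum support count.
frequent_items = {item for item, c in counts.items() if c >= MIN_SUPPORT_COUNT}
print(frequent_items)  # {'I1', 'I2', 'I3', 'I5'} -- I4 is pruned
```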

Step 2: Form all the 2-item candidate itemsets from the items that survived the last step.
Check whether all the subsets of each candidate itemset are frequent, and remove the candidates that have an infrequent subset. (For example, the subsets of { I2, I4 } are { I2 } and { I4 }, but since I4 was not found to be frequent in the previous step, we do not consider this candidate.)

Since I4 was discarded in the previous step, we do not generate any candidate containing I4.

Now, remove all the itemsets whose support count is less than the minimum support count. The surviving 2-itemsets form the new frequent set.
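Continuing the sketch under the same hypothetical data, step 2 might look like this:

```python
from itertools import combinations

transactions = [{"I1", "I2", "I5"}, {"I2", "I3", "I4"},
                {"I1", "I2", "I3"}, {"I2", "I3", "I5"}]
MIN_SUPPORT_COUNT = 2
frequent_items = {"I1", "I2", "I3", "I5"}  # survivors of step 1

# Build 2-item candidates only from frequent items (nothing containing
# I4 is ever generated), then count and prune them.
frequent_pairs = {}
for pair in combinations(sorted(frequent_items), 2):
    count = sum(1 for t in transactions if set(pair) <= t)
    if count >= MIN_SUPPORT_COUNT:
        frequent_pairs[pair] = count
print(frequent_pairs)  # {('I1','I2'): 2, ('I2','I3'): 3, ('I2','I5'): 2}
```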

Step 3: Form the 3-item candidate itemsets from the itemsets kept in the last step. Again, check whether all the subsets of each candidate are frequent, and remove the candidates that fail.

In this case, if we select { I1, I2, I3 }, all of its 2-item subsets, that is
{ I1, I2 }, { I2, I3 }, { I1, I3 }, must be frequent. But we don't have { I1, I3 } in our frequent set. The same is true for { I1, I3, I5 } and { I2, I3, I5 }.

So we stop here, as no frequent 3-itemsets remain.
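Under the same hypothetical data, the subset check of step 3 can be sketched as follows; every 3-item candidate fails, so the search stops:

```python
from itertools import combinations

frequent_pairs = {("I1", "I2"), ("I2", "I3"), ("I2", "I5")}
items = sorted({i for pair in frequent_pairs for i in pair})

# A 3-item candidate survives only if ALL of its 2-item subsets are
# frequent (the Apriori property).
for triple in combinations(items, 3):
    missing = [s for s in combinations(triple, 2) if s not in frequent_pairs]
    if missing:
        print("pruned:", triple, "infrequent subsets:", missing)
    else:
        print("candidate:", triple)
```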

Step 4: Now that we have discovered all the frequent itemsets, we generate the strong association rules. For that we have to calculate the confidence of each rule.

All the possible association rules can be,
1. I1 -> I2
2. I2 -> I3
3. I2 -> I5
4. I2 -> I1
5. I3 -> I2
6. I5 -> I2

So, Confidence( I1 -> I2 ) = SupportCount( I1 U I2 ) / SupportCount( I1 )
= 2 / 2 = 100%.

Similarly, we calculate the confidence of each rule.

Since all these association rules have confidence ≥ 50%, they can all be considered strong association rules.
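Under the hypothetical table from earlier, the confidences might come out like this (a sketch; with this made-up data, all six rules clear the 50% bar, as in the walkthrough):

```python
transactions = [{"I1", "I2", "I5"}, {"I2", "I3", "I4"},
                {"I1", "I2", "I3"}, {"I2", "I3", "I5"}]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

rules = [({"I1"}, {"I2"}), ({"I2"}, {"I3"}), ({"I2"}, {"I5"}),
         ({"I2"}, {"I1"}), ({"I3"}, {"I2"}), ({"I5"}, {"I2"})]

# Confidence(lhs -> rhs) = SupportCount(lhs U rhs) / SupportCount(lhs).
for lhs, rhs in rules:
    conf = support_count(lhs | rhs) / support_count(lhs)
    print(lhs, "->", rhs, f"confidence = {conf:.0%}")
```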

Step 5: We calculate the lift of all the strong association rules.

Lift( I1 -> I2 ) = Confidence( I1 -> I2 ) / Support( I2 ) = ( 2 / 2 ) / ( 4 / N ), where N is the total number of transactions and Support( I2 ) is the fraction of transactions containing I2, not its raw count. Note that lift is a plain ratio, not a percentage.

Now we sort the rules by decreasing lift.
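As a final sketch under the same hypothetical table (where I2 happens to appear in every transaction, so every lift works out to exactly 1.0):

```python
transactions = [{"I1", "I2", "I5"}, {"I2", "I3", "I4"},
                {"I1", "I2", "I3"}, {"I2", "I3", "I5"}]
N = len(transactions)

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

def lift(lhs, rhs):
    confidence = support_count(lhs | rhs) / support_count(lhs)
    return confidence / (support_count(rhs) / N)  # divide by a fraction, not a count

rules = [({"I1"}, {"I2"}), ({"I2"}, {"I3"}), ({"I2"}, {"I5"}),
         ({"I2"}, {"I1"}), ({"I3"}, {"I2"}), ({"I5"}, {"I2"})]

# Sort the strong rules by decreasing lift.
for lhs, rhs in sorted(rules, key=lambda r: -lift(*r)):
    print(lhs, "->", rhs, f"lift = {lift(lhs, rhs):.2f}")
```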

I know you are thinking: what did we do all these calculations for? Believe me, I didn't get it the first time either.

A lift greater than 1 means that customers who buy I1 are more likely than average to also buy I2, a lift less than 1 means they are less likely, and a lift of exactly 1 means the two items are bought independently.

There you go. That's the Apriori Algorithm, with which we find associations between different items.

References:
1. https://www.geeksforgeeks.org/apriori-algorithm/
2. https://www.udemy.com/course/machinelearning/learn/lecture/6455322#questions
