Association Rule Mining Including Apriori Algorithm

Anjana S
Published in Analytics Vidhya
5 min read · Jan 2, 2022

Supermarkets have a large number of customers checking into their shops, and as shop owners they need to decide which products go on which shelves. Say that next week the shop plans to put up the newly released iPhone; it needs to decide what goes with it, how to attract customers, and so on. To answer these questions, the company performs data mining and tries to find associations between items.

Itemset

An itemset is a set of two or more items that are bought together by customers.

E.g. bread and butter, paper and pen, laptop and charger.

Take bread and butter as items that customers frequently buy together. If the shop then introduces eggs into the offer, customers are likely to buy eggs too. These are associations between items.

Association Rule

Association rules are of the if→then format: if a user buys A, then he buys B. This is called single cardinality.

Cardinality is the number of items in a particular set. So if there is a set A = {1, 2, 3}, then its cardinality is 3.

A ⇒ B represents that, in the same transaction, people buy A and B together.

A is called the antecedent and B the consequent.

These associations can be extended, e.g. {Bread, Butter, Eggs} or {Pen, Paper, Pencil}. As the number of items increases, the cardinality becomes multiple.

Measures of Association Rules

These rules are measured using the following

  • Support
  • Confidence
  • Lift

Support

Support is the frequency with which an item, or a combination of items, is purchased across transactions.

Support is used to filter out and eliminate items that are bought infrequently.

Support(A) = freq(A)/N

Support(A, B) = freq(A, B)/N

where N is the number of transactions and freq is the number of transactions in which A (or A and B) appears.
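As a quick sketch, support can be computed directly from a list of transactions. The basket contents below are illustrative assumptions, not data from the article:

```python
# Hypothetical baskets (illustrative assumption, not from the article)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "eggs"},
    {"bread", "milk"},
    {"butter", "eggs"},
    {"bread", "butter", "milk"},
]
N = len(transactions)  # number of transactions

def support(*items):
    """Fraction of transactions that contain all of the given items."""
    required = set(items)
    return sum(1 for t in transactions if required <= t) / N

print(support("bread"))            # freq(bread)/N = 4/5 = 0.8
print(support("bread", "butter"))  # freq(bread, butter)/N = 3/5 = 0.6
```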

Confidence

Confidence denotes how often A and B occur together, relative to the number of times A occurs.

Confidence is used to eliminate weak rules: if users who frequently buy A and B rarely buy C as well, we can eliminate the association rule {A, B} → C.

Confidence(A → B) = freq(A, B)/freq(A)
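A minimal sketch of the confidence computation, using the same kind of made-up basket data (the item names are assumptions):

```python
# Hypothetical baskets (illustrative assumption, not from the article)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "eggs"},
    {"bread", "milk"},
    {"butter", "eggs"},
    {"bread", "butter", "milk"},
]

def freq(items):
    """Number of transactions containing all the given items."""
    return sum(1 for t in transactions if set(items) <= t)

def confidence(antecedent, consequent):
    """confidence(A -> B) = freq(A, B) / freq(A)"""
    return freq(antecedent + consequent) / freq(antecedent)

# Of the 4 baskets containing bread, 3 also contain butter
print(confidence(["bread"], ["butter"]))  # 3/4 = 0.75
```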

Although we frame these rules, for real transaction data we may get, say, 5,000 rules, and it is impossible to evaluate a rule for each and every item. So we introduce one more measure: lift.

Lift

Lift denotes the strength of a rule.

Lift is used to determine whether A and B occur together merely by chance rather than by association; if so, we can eliminate the rule.

Lift(A → B) = Support(A, B) / (Support(A) × Support(B))
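Lift can be sketched the same way; a value near 1 suggests A and B co-occur only by chance. The basket data below is again a made-up assumption:

```python
# Hypothetical baskets (illustrative assumption, not from the article)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "eggs"},
    {"bread", "milk"},
    {"butter", "eggs"},
    {"bread", "butter", "milk"},
]
N = len(transactions)

def support(items):
    return sum(1 for t in transactions if set(items) <= t) / N

def lift(a, b):
    """lift(A -> B) = support(A, B) / (support(A) * support(B)).
    > 1: positive association, ~1: independence, < 1: negative association."""
    return support(a + b) / (support(a) * support(b))

print(lift(["bread"], ["butter"]))  # 0.6 / (0.8 * 0.8) ~= 0.94
```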

Although we can find the strength of rules and the associations, each company's requirements differ. So the company fixes a minimum support and a minimum confidence, and only the association rules beyond those threshold values are selected.

Example for Support , Confidence and Lift

There are five items named A, B, C, D, E and five transactions given along with the rules. Calculating the support, confidence and lift gives the following table.

Frequent Itemset

Frequent itemsets are those itemsets whose support is greater than the threshold value, or user-specified minimum support. This means that if A and B form a frequent itemset together, then individually A and B must also be frequent itemsets.

Suppose there are two transactions: A = {1,2,3,4,5} and B = {2,3,7}. In these two transactions, 2 and 3 are the frequent items.

Apriori Algorithm

It is used for efficient level-wise generation of frequent itemsets, thus reducing the search space.

Rule: all non-empty subsets of a frequent itemset must also be frequent.

There are 2 steps in Apriori:

  • Join — generate (K+1)-itemset candidates from the frequent K-itemsets
  • Prune — eliminate candidates whose support count < minimum support
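The join and prune steps can be sketched as follows, with itemsets represented as frozensets; the frequent 2-itemsets fed in at the bottom are a made-up example:

```python
from itertools import combinations

def join(freq_k, k):
    """Join step: build (K+1)-itemset candidates by unioning frequent K-itemsets."""
    return {a | b for a in freq_k for b in freq_k if len(a | b) == k + 1}

def prune(candidates, freq_k):
    """Prune step: drop any candidate that has an infrequent K-subset
    (all non-empty subsets of a frequent itemset must be frequent)."""
    return {c for c in candidates
            if all(frozenset(s) in freq_k
                   for s in combinations(c, len(c) - 1))}

# Made-up frequent 2-itemsets
l2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I2", "I3"), ("I2", "I4")]}
c3 = prune(join(l2, 2), l2)
# Only {I1, I2, I3} survives: e.g. {I2, I3, I4} is pruned because {I3, I4} is not in L2
print(c3)
```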

Association Rule Mining

There are 2 steps.

  • Find all frequent itemsets using the Apriori algorithm or FP-Growth
  • Generate association rules from the frequent itemsets (applying the minimum support and confidence thresholds)

Example of Association Rule Mining using Apriori

Minimum Support = 2 and Minimum Confidence = 60%

The candidate items/products are I1, I2, I3, I4 and I5.

  • Start with single cardinality and find the support count of each item
  • Check whether support_count < minimum support and eliminate those items
  • Increase the cardinality and repeat the previous steps until the set becomes empty

The result of the first step is called C1, the candidate 1-itemsets.

If we compare that with the minimum support = 2, all the items satisfy the criterion, so we can call the result L1.

So from L1 we compute C2, and from C2 we derive L2.

Then increase the cardinality of the itemsets to 3 and find the itemsets satisfying the threshold, computing L3.

The solution is L3, computed using the join and prune steps described above.

The frequent 3-itemsets are {I1, I2, I3} and {I1, I2, I5}.
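The whole level-wise run can be sketched end to end. Since the article's transaction table is not reproduced here, the nine transactions below are an assumption, chosen to be consistent with the stated results (minimum support = 2; frequent 3-itemsets {I1, I2, I3} and {I1, I2, I5}):

```python
# Assumed nine-transaction dataset, consistent with the article's stated results
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset):
    """Number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(min_support=2):
    # L1: frequent 1-itemsets
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support_count(frozenset([i])) >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # Join: build k-itemset candidates from the frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: keep only the candidates meeting the minimum support count
        current = [c for c in candidates if support_count(c) >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

l3 = sorted(sorted(f) for f in apriori() if len(f) == 3)
print(l3)  # [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]
```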

Association Rules

  • For each frequent itemset l, generate all non-empty subsets of l
  • For every non-empty proper subset s of l, output the rule s ⇒ (l − s)

Let’s take the frequent itemset I1, I2, I5.

I1 → I2 ^ I5

I2 → I1 ^ I5

I5 → I1 ^ I2

I1 ^ I2 → I5

I1 ^ I5 → I2

I2 ^ I5 → I1

I1 ^ I2 ^ I5 → ∅ ⇒ so we can neglect this rule

Compute the confidence of each rule; if it is less than the minimum confidence, the rule is neglected.
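The rule generation and confidence check can be sketched as follows, using an assumed nine-transaction dataset consistent with the article's example (the original table is not reproduced):

```python
from itertools import combinations

# Assumed nine-transaction dataset, consistent with the article's example
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def freq(itemset):
    return sum(1 for t in transactions if itemset <= t)

def rules_from(itemset, min_conf=0.6):
    """For every non-empty proper subset s of l, form s => (l - s)
    and keep the rule only if its confidence meets min_conf."""
    l = frozenset(itemset)
    kept = []
    for r in range(1, len(l)):
        for combo in combinations(sorted(l), r):
            s = frozenset(combo)
            conf = freq(l) / freq(s)
            if conf >= min_conf:
                kept.append((sorted(s), sorted(l - s), conf))
    return kept

for ante, cons, conf in rules_from({"I1", "I2", "I5"}):
    print(ante, "=>", cons, f"confidence={conf:.0%}")
# Only I5 => I1^I2, I1^I5 => I2 and I2^I5 => I1 reach the 60% minimum confidence
```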

Applications of Apriori

  • Recommendation systems in E-Commerce
  • Data Mining
  • SuperMarkets to improve the sales performance
  • Medical Field with Patient history records
