Association Rule Mining Including Apriori Algorithm

Anjana S
Published in Analytics Vidhya
5 min read · Jan 2, 2022

Supermarkets have a large number of customers checking into their shops, and as shop owners they need to decide which products go on which shelves. Say that next week the shop plans to put up the newly released iPhone; it needs to decide what goes with it, how to attract customers, and so on. To answer these questions, the company performs data mining and tries to find associations between items.

Itemset

An itemset is a set of two or more items that are bought together by customers.

E.g. bread and butter, paper and pen, laptop and charger.

Take bread and butter as items that customers frequently buy together. If the shop then introduces eggs into the offer, customers are likely to buy eggs too. These are associations between items.

Association Rule

Association rules are of the if→then format: if a user buys A, then he buys B. This is called single cardinality.

Cardinality is the number of items in a particular set. So if there is a set A = {1, 2, 3}, then its cardinality is 3.

A ⇒ B represents that, in the same transaction, people buy A and B together.

A is called the antecedent and B the consequent.

These associations can be extended, e.g. {Bread, Butter, Eggs} or {Pen, Paper, Pencil}. As the number of items increases, the cardinality becomes multiple.

Measures of Association Rules

These rules are measured using the following

  • Support
  • Confidence
  • Lift

Support

Support is the frequency with which an item, or a combination of items, is purchased across transactions.

Support is used to filter out and eliminate items that are bought infrequently.

Support(A) = freq(A)/N

Support(A, B) = freq(A, B)/N

where N is the number of transactions and freq is the number of transactions in which A (or A and B) appears.
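As a quick sketch, support can be computed directly from a list of transactions. The basket contents below are illustrative assumptions, not data from the article:

```python
# Hypothetical baskets (illustrative assumption, not from the article)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "eggs"},
    {"bread", "milk"},
    {"butter", "eggs"},
    {"bread", "butter", "milk"},
]
N = len(transactions)  # number of transactions

def support(*items):
    """Fraction of transactions that contain all of the given items."""
    required = set(items)
    return sum(1 for t in transactions if required <= t) / N

print(support("bread"))            # freq(bread)/N = 4/5 = 0.8
print(support("bread", "butter"))  # freq(bread, butter)/N = 3/5 = 0.6
```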

Confidence

Confidence denotes how often A and B occur together, relative to the number of times A occurs.

Confidence is used to eliminate weak rules: if users who frequently buy A and B rarely buy C as well, we can eliminate the association rule {A, B} → C.

Confidence(A → B) = freq(A, B)/freq(A)
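A minimal sketch of the confidence computation, using the same kind of made-up basket data (the item names are assumptions):

```python
# Hypothetical baskets (illustrative assumption, not from the article)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "eggs"},
    {"bread", "milk"},
    {"butter", "eggs"},
    {"bread", "butter", "milk"},
]

def freq(items):
    """Number of transactions containing all the given items."""
    return sum(1 for t in transactions if set(items) <= t)

def confidence(antecedent, consequent):
    """confidence(A -> B) = freq(A, B) / freq(A)"""
    return freq(antecedent + consequent) / freq(antecedent)

# Of the 4 baskets containing bread, 3 also contain butter
print(confidence(["bread"], ["butter"]))  # 3/4 = 0.75
```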

Although we frame these rules, for real transaction data we may get, say, 5,000 rules, and it is impossible to evaluate a rule for each and every item. So we introduce one more measure: lift.

Lift

Lift denotes the strength of a rule.

Lift is used to determine whether A and B occur together merely by chance rather than by association; if so, we can eliminate the rule.

Lift(A → B) = Support(A, B) / (Support(A) × Support(B))
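Lift can be sketched the same way; a value near 1 suggests A and B co-occur only by chance. The basket data below is again a made-up assumption:

```python
# Hypothetical baskets (illustrative assumption, not from the article)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "eggs"},
    {"bread", "milk"},
    {"butter", "eggs"},
    {"bread", "butter", "milk"},
]
N = len(transactions)

def support(items):
    return sum(1 for t in transactions if set(items) <= t) / N

def lift(a, b):
    """lift(A -> B) = support(A, B) / (support(A) * support(B)).
    > 1: positive association, ~1: independence, < 1: negative association."""
    return support(a + b) / (support(a) * support(b))

print(lift(["bread"], ["butter"]))  # 0.6 / (0.8 * 0.8) ~= 0.94
```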

Although we can find the strength of rules and the associations, each company's requirements differ. So the company fixes a minimum support and a minimum confidence, and only the association rules beyond those threshold values are selected.

Example for Support , Confidence and Lift

There are five items named A, B, C, D, E and five transactions given along with the rules. Calculating the support, confidence and lift gives the following table.

Frequent Itemset

Frequent itemsets are those itemsets whose support is greater than the threshold value, or user-specified minimum support. This means that if A and B form a frequent itemset together, then individually A and B must also be frequent itemsets.

Suppose there are two transactions: A = {1,2,3,4,5} and B = {2,3,7}. In these two transactions, 2 and 3 are the frequent items.

Apriori Algorithm

It is used for efficient level-wise generation of frequent itemsets, thus reducing the search space.

Rule: all non-empty subsets of a frequent itemset must also be frequent.

There are 2 steps in Apriori:

  • Join — generate (K+1)-itemset candidates from the frequent K-itemsets
  • Prune — eliminate candidates whose support count < minimum support
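The join and prune steps can be sketched as follows, with itemsets represented as frozensets; the frequent 2-itemsets fed in at the bottom are a made-up example:

```python
from itertools import combinations

def join(freq_k, k):
    """Join step: build (K+1)-itemset candidates by unioning frequent K-itemsets."""
    return {a | b for a in freq_k for b in freq_k if len(a | b) == k + 1}

def prune(candidates, freq_k):
    """Prune step: drop any candidate that has an infrequent K-subset
    (all non-empty subsets of a frequent itemset must be frequent)."""
    return {c for c in candidates
            if all(frozenset(s) in freq_k
                   for s in combinations(c, len(c) - 1))}

# Made-up frequent 2-itemsets
l2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I2", "I3"), ("I2", "I4")]}
c3 = prune(join(l2, 2), l2)
# Only {I1, I2, I3} survives: e.g. {I2, I3, I4} is pruned because {I3, I4} is not in L2
print(c3)
```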

Association Rule Mining

There are 2 steps.

  • Find all frequent itemsets using the Apriori algorithm or FP-Growth
  • Generate association rules from the frequent itemsets (applying the minimum support and confidence thresholds)

Example of Association Rule Mining using Apriori

Minimum Support = 2 and Minimum Confidence = 60%

The candidate items/products are I1, I2, I3, I4 and I5.

  • Start with single cardinality and find the support count of each item
  • Check whether support_count < minimum support and eliminate those items
  • Increase the cardinality and repeat the previous steps until the set becomes empty

The result of the first step is called C1, the candidate 1-itemsets.

If we compare that with the minimum support = 2, all the items satisfy the criterion, so we can call the result L1.

So from L1 we compute C2, and from C2 we derive L2.

Then increase the cardinality of the itemsets to 3 and find the itemsets satisfying the threshold, computing L3.

The solution is L3, computed using the join and prune steps described above.

The frequent 3-itemsets are {I1, I2, I3} and {I1, I2, I5}.
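The whole level-wise run can be sketched end to end. Since the article's transaction table is not reproduced here, the nine transactions below are an assumption, chosen to be consistent with the stated results (minimum support = 2; frequent 3-itemsets {I1, I2, I3} and {I1, I2, I5}):

```python
# Assumed nine-transaction dataset, consistent with the article's stated results
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset):
    """Number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(min_support=2):
    # L1: frequent 1-itemsets
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support_count(frozenset([i])) >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # Join: build k-itemset candidates from the frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: keep only the candidates meeting the minimum support count
        current = [c for c in candidates if support_count(c) >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

l3 = sorted(sorted(f) for f in apriori() if len(f) == 3)
print(l3)  # [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]
```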

Association Rules

  • For each frequent itemset l, generate all non-empty subsets of l
  • For every non-empty proper subset s of l, output the rule s ⇒ (l − s)

Let’s take the frequent itemset I1, I2, I5.

I1 → I2 ^ I5

I2 → I1 ^ I5

I5 → I1 ^ I2

I1 ^ I2 → I5

I1 ^ I5 → I2

I2 ^ I5 → I1

I1 ^ I2 ^ I5 → ∅ ⇒ so we can neglect this rule

Compute the confidence of each rule; if it is less than the minimum confidence, the rule is neglected.
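The rule generation and confidence check can be sketched as follows, using an assumed nine-transaction dataset consistent with the article's example (the original table is not reproduced):

```python
from itertools import combinations

# Assumed nine-transaction dataset, consistent with the article's example
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def freq(itemset):
    return sum(1 for t in transactions if itemset <= t)

def rules_from(itemset, min_conf=0.6):
    """For every non-empty proper subset s of l, form s => (l - s)
    and keep the rule only if its confidence meets min_conf."""
    l = frozenset(itemset)
    kept = []
    for r in range(1, len(l)):
        for combo in combinations(sorted(l), r):
            s = frozenset(combo)
            conf = freq(l) / freq(s)
            if conf >= min_conf:
                kept.append((sorted(s), sorted(l - s), conf))
    return kept

for ante, cons, conf in rules_from({"I1", "I2", "I5"}):
    print(ante, "=>", cons, f"confidence={conf:.0%}")
# Only I5 => I1^I2, I1^I5 => I2 and I2^I5 => I1 reach the 60% minimum confidence
```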

Applications of Apriori

  • Recommendation systems in E-Commerce
  • Data Mining
  • SuperMarkets to improve the sales performance
  • Medical Field with Patient history records
