3.2) Association Rule Mining using APRIORI Algorithm

Hello! This article is part of a full tutorial on Recommendation Systems, which you can find here. I strongly recommend following the agenda of this tutorial if you want to master the topic.

In this article we will study the process of association rule discovery using the Apriori algorithm and discuss different techniques of association rule mining. If you have not read the previous article, I strongly recommend reading it first here.

Concept of Association Rules

Today, large amounts of customer purchase data are collected daily by companies. The picture below illustrates an example of such data, commonly known as market basket transactions. Each row in this table corresponds to a transaction, which contains a unique identifier, TID (transaction ID), and the set of items bought by a given customer.

Retailers are interested in analyzing market basket data to learn the purchasing behavior of their customers. One useful technique for analyzing market basket data is association rule mining.

The goal of association rule mining is to find rules that predict the occurrence of an item (Y) based on the occurrence of other items (X) in the same transaction. In other words, the goal is to find which items frequently appear together in the dataset.

Example of Market Basket Data

From the previous article we already know the form of an association rule, which looks like X -> Y [support, confidence], where X and Y are sets of items in the transaction data. For example: Bread => Milk [Support = 60%, Confidence = 75%], where the support shows that Milk and Bread appeared together in 60% of the transactions, and the confidence shows that 75% of the customers who purchased Bread also bought Milk. Let’s review some of the known terms again:

  • Item-set: A collection of one or more items. For example: {Bread, Milk}
  • k-itemset: An itemset that contains k items. For example: {Bread, Milk} is a 2-itemset
  • Support count: An indication of how frequently the itemset appears in the database. For example: {Bread, Milk} occurs 3 times in our data set
  • Support: The fraction of transactions that contain the itemset. Support = frequency of itemset / total number of transactions. For example: support for {Bread, Milk} = 3/5 = 60%, meaning that 60% of the transactions contain the itemset {Bread, Milk}
  • Frequent itemset: An itemset whose support is greater than or equal to a minimum_support threshold
  • Confidence: The number of transactions containing both X and Y divided by the number of transactions containing X. For a rule X => Y, confidence shows the percentage of transactions containing X in which Y is also bought. Confidence(X => Y) = P(X∩Y)/P(X) = frequency(X, Y) / frequency(X). For example: confidence for Bread => Milk = 3/4 = 75%, meaning that 75% of the transactions that contain X (Bread) also contain Y (Milk). (A short Python sketch after this list illustrates these calculations.)
  • Strong rules: If a rule X => Y [support, confidence] satisfies both min_support and min_confidence, it is a strong rule
  • The goal of association rule mining: given a set of transactions, find all rules with support ≥ minimum_support and confidence ≥ minimum_confidence
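
To make these definitions concrete, here is a minimal Python sketch. The five transactions below are a hypothetical example that I chose so that the numbers match the ones above ({Bread, Milk} appears in 3 of 5 transactions and Bread in 4 of 5); your own data set may of course differ.

    # A hypothetical 5-transaction market basket data set, chosen so that the
    # numbers match the example above: {Bread, Milk} appears in 3 of the 5
    # transactions and Bread appears in 4 of them.
    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diapers", "Beer"},
        {"Milk", "Diapers", "Beer", "Cola"},
        {"Bread", "Milk", "Diapers", "Beer"},
        {"Bread", "Milk", "Diapers", "Cola"},
    ]

    def support_count(itemset, transactions):
        """Number of transactions that contain every item of `itemset`."""
        itemset = set(itemset)
        return sum(1 for t in transactions if itemset <= t)

    def support(itemset, transactions):
        """Fraction of transactions that contain `itemset`."""
        return support_count(itemset, transactions) / len(transactions)

    def confidence(antecedent, consequent, transactions):
        """Confidence of the rule antecedent => consequent."""
        both = set(antecedent) | set(consequent)
        return support(both, transactions) / support(antecedent, transactions)

    print(support({"Bread", "Milk"}, transactions))       # 0.6  -> 60%
    print(confidence({"Bread"}, {"Milk"}, transactions))  # 0.75 -> 75%

The two helpers, support() and confidence(), will be reused in the later snippets.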

Association Rule Mining Process

In the real world, transaction data can reach gigabytes or even terabytes in size, and mining association rules is a computationally expensive task. Optimized algorithms are needed to prune out item-sets that will not help in later steps, which reduces the computation time. The Apriori and Eclat algorithms do this job, and we will discuss them later.

A brute-force approach for mining association rules is to compute the support and confidence for every possible rule. This approach is very expensive because there are exponentially many rules that can be extracted from the data set. The total number of possible rules that can be extracted from a data set containing d items is R = 3^d - 2^(d+1) + 1. Even for a small data set like ours, this approach requires computing many rules (a quick check after the list below reproduces these counts):

  • Data set containing 5 items: 3⁵ - 2⁶ + 1 = 180 rules
  • Data set containing 6 items: 602 rules, and so on
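
These counts follow directly from the formula above; a quick sanity check in Python (just a throwaway helper, not part of any library) reproduces them:

    def total_rules(d):
        """Total number of possible association rules over d items: R = 3^d - 2^(d+1) + 1."""
        return 3 ** d - 2 ** (d + 1) + 1

    print(total_rules(5))   # 180
    print(total_rules(6))   # 602
    print(total_rules(10))  # 57002 -- the number of rules explodes quickly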

More than 80% of the rules are discarded after applying minsup = 20% and minconf = 50% thresholds, so most of the computation is wasted. To avoid performing needless computations, it would be useful to prune the rules early, without having to compute their support and confidence values.

An initial step toward improving the performance of association rule mining algorithms is to decouple the support and confidence requirements. This means that we first calculate only the support of the candidate itemsets and, based on their support values, filter out the candidates that do not satisfy the min_support constraint. Then we calculate the confidence for the generated frequent itemsets. The support of the rule X -> Y depends only on the support of its corresponding itemset X ∪ Y. Rules originating from the same itemset therefore have identical support but can have different confidence. For example, the following rules all have identical support (the snippet after this list verifies it):

  • {Beer, Diapers} -> {Milk}
  • {Diapers, Milk} -> {Beer}
  • {Milk} -> {Beer, Diapers}
  • {Beer, Milk} -> {Diapers}
  • {Beer} -> {Diapers, Milk}
  • {Diapers} -> {Beer, Milk}
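
To see that all six rules indeed share a single support value, the snippet below enumerates every rule that can be formed from the 3-itemset {Beer, Diapers, Milk}, reusing the hypothetical transactions and the support()/confidence() helpers from the earlier sketch:

    from itertools import combinations

    itemset = {"Beer", "Diapers", "Milk"}

    # Every non-empty proper subset of the itemset can serve as the antecedent X;
    # the consequent Y is simply the rest of the itemset.
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            consequent = itemset - set(antecedent)
            sup = support(itemset, transactions)                      # identical for all six rules
            conf = confidence(antecedent, consequent, transactions)   # differs from rule to rule
            print(f"{set(antecedent)} -> {consequent}: support={sup:.2f}, confidence={conf:.2f}")

Every rule prints the same support, while the confidence values differ from rule to rule.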

If the itemset is infrequent, then all six candidate rules can be pruned immediately, without having to compute their confidence values. Therefore, a common strategy adopted by many association rule mining algorithms is to decompose the problem into two major subtasks:

Association Rule Mining Process
  1. Frequent itemset generation, whose objective is to find all the itemsets that satisfy the minimum support threshold. These itemsets are called frequent itemsets. In this step we find all item-sets whose support ≥ minimum_support (the value of minimum_support is defined by us); item-sets that cannot satisfy this constraint are filtered out. Generation of frequent item-sets is the most computationally expensive step.
  2. Rule generation, whose objective is to extract all the high-confidence rules from the frequent itemsets found in the previous step. These rules are called strong rules. We list all association rules that can be formed from the frequent item-sets generated in the first step, calculate the support and confidence for each of these rules, and prune the rules that fail the min_confidence threshold (a simplified sketch of this step follows the list).
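
The second subtask can be sketched in a few lines of Python. This is a simplified illustration, not an optimized implementation: it reuses the support()/confidence() helpers defined earlier and assumes the frequent itemsets have already been found.

    from itertools import combinations

    def generate_rules(frequent_itemsets, transactions, min_confidence):
        """Extract strong rules (X -> Y) from a collection of frequent itemsets."""
        rules = []
        for itemset in frequent_itemsets:
            itemset = set(itemset)
            if len(itemset) < 2:
                continue  # a rule needs a non-empty antecedent and a non-empty consequent
            for r in range(1, len(itemset)):
                for antecedent in combinations(sorted(itemset), r):
                    antecedent = set(antecedent)
                    consequent = itemset - antecedent
                    conf = confidence(antecedent, consequent, transactions)
                    if conf >= min_confidence:  # prune rules that fail min_confidence
                        rules.append((antecedent, consequent, support(itemset, transactions), conf))
        return rules

    # Example on the hypothetical data: both Bread -> Milk and Milk -> Bread survive at 75% confidence.
    print(generate_rules([{"Bread", "Milk"}], transactions, min_confidence=0.7))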

Generation of Frequent Itemsets

Here we have a data set of 5 transactions with 5 unique items: Bread, Milk, Diapers, Beer and Cola. If we create all possible item-sets from these 5 items we get 2⁵ - 1 = 31 item-sets.

Let’s see what kind of item-sets can be created from our data set in the picture below. You can imagine how time-consuming it becomes to calculate the support for each of the 2^k - 1 item-sets when you have many items.

With k items you will get 2^k - 1 possible item-sets

Here is a more general example of an itemset lattice for I = {a, b, c, d, e}, where we get 2^5 - 1 = 31 itemsets in total.

An itemset lattice

As we know, a brute-force approach for finding frequent itemsets is to determine the support count for every candidate itemset in the lattice. To do this, we need to compare each candidate against every transaction. This is a very expensive operation because it requires roughly N*M*w comparisons, where N is the number of transactions, M is the number of candidate itemsets (M = 2^k - 1) and w is the maximum transaction width.

A brute-force approach for counting the support for each candidate itemset by scanning the whole database
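
In code, the brute-force approach looks roughly like the sketch below: it materializes all 2^k - 1 candidate itemsets of the lattice and scans every transaction for each candidate, which is exactly the N*M*w cost described above (again, reusing the hypothetical transactions from the earlier sketch):

    from itertools import combinations

    def all_candidate_itemsets(items):
        """The full lattice: all 2^k - 1 non-empty itemsets over the given items."""
        items = sorted(items)
        return [frozenset(c) for r in range(1, len(items) + 1)
                for c in combinations(items, r)]

    def brute_force_frequent_itemsets(transactions, min_support):
        """Count the support of every candidate in the lattice by scanning all transactions."""
        items = set().union(*transactions)
        candidates = all_candidate_itemsets(items)             # M = 2^k - 1 candidates
        frequent = {}
        for cand in candidates:                                 # M iterations ...
            count = sum(1 for t in transactions if cand <= t)   # ... each scanning all N transactions
            if count / len(transactions) >= min_support:
                frequent[cand] = count
        return frequent

    print(len(all_candidate_itemsets({"Bread", "Milk", "Diapers", "Beer", "Cola"})))  # 31
    print(brute_force_frequent_itemsets(transactions, min_support=0.6))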

There are several strategies to reduce the complexity of generating frequent itemsets:

  • Reduce the number of candidates (M) — the Apriori algorithm is a good example of eliminating some of the candidates without counting their support values. We will discuss Apriori in detail later.
  • Reduce the number of comparisons (NM) — for this we can use efficient data structures to store the candidates or transactions, so there is no need to match every candidate against every transaction. We will discuss these strategies.

Basic Concepts of Apriori Algorithm

The Apriori principle states: “Any subset of a frequent itemset must also be frequent.” Conversely, if an itemset is infrequent, then all of its supersets must be infrequent too. In other words, no superset of an infrequent itemset needs to be generated or tested!

Look at the picture below. If the item-set {a,b} is infrequent (support < min_support), then we do not need to consider any of its super-sets (abc, abd, abcd). You can see how this principle reduces the total number of item-sets for which the support has to be calculated.

If item-set {a,b} is infrequent (support < min_support) then we don’t need to take into account any of its super-sets (abc, abd, abcd)

The Apriori principle holds due to the following property of the support measure: the support of an itemset Y never exceeds the support of its subsets X ⊆ Y, i.e. s(X) ≥ s(Y). This is known as the anti-monotone property of support. This strategy of reducing the number of generated candidates using the support measure is called support-based pruning. Apriori was the first association rule mining algorithm to pioneer the use of support-based pruning. Here is a high-level illustration of frequent itemset generation using Apriori:

An illustration of frequent itemset generation using Apriori (we assume that the minimum support count = 3)

Here the Apriori algorithm generates 13 candidates, whereas the brute-force approach would generate 41; this represents a 68% reduction in the number of candidate itemsets.
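
The anti-monotone property behind this pruning is easy to check on the hypothetical transactions from the earlier sketches: support can only shrink (or stay the same) as items are added, so once an itemset drops below min_support, none of its supersets needs to be counted.

    # Support can only shrink (or stay the same) as an itemset grows:
    print(support({"Cola"}, transactions))                  # 0.4
    print(support({"Cola", "Milk"}, transactions))          # 0.4
    print(support({"Cola", "Milk", "Beer"}, transactions))  # 0.2

    # With min_support = 0.6, {Cola} is already infrequent, so no superset of
    # {Cola} ever needs to be generated or counted.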

Now that we know the Apriori algorithm in a general way, we can go through the step-by-step approach. I recommend doing all of these calculations on your own and comparing the results. Follow the steps accordingly.

Step by Step Approach of Apriori Algorithm
1) Generate itemsets of k=1 and prune candidates whose support < min_support
2) Generate k=2 itemsets from the previous frequent itemsets and prune candidates whose support < min_support
3) Generate k=3 frequent itemsets from the k=2 frequent itemsets
4) No new frequent item-sets can be generated, so we create the rules. Finally, we get 4 strong rules: {I2}->I1; {I1}->I2; {I2}->I3; {I3}->I2

You can see that there are 4 strong rules in our example. For example, the rule I2 => I3, with a confidence of 75%, tells us that 75% of the people who bought I2 also bought I3.

To summarize the steps of Apriori:

  1. Candidate generation: generate new candidate k-itemsets based on the frequent (k-1)-itemsets found in the previous iteration.
  2. Candidate pruning: eliminate some of the candidate k-itemsets using the support-based pruning strategy (if any subset of a candidate itemset is infrequent, the candidate is immediately pruned, because all subsets of a frequent itemset must themselves be frequent).
  3. The Apriori algorithm traverses the itemset lattice in a breadth-first manner, as shown in the examples above. It first discovers all the frequent 1-itemsets, followed by the frequent 2-itemsets, and so on, until no new frequent itemsets are generated. A compact code sketch of this level-wise process follows.
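
Putting the pieces together, here is a compact sketch of Apriori-style frequent itemset generation: level-wise (breadth-first) candidate generation followed by support-based pruning and support counting. It reuses the hypothetical transactions and the support() helper from the earlier snippets and is meant to illustrate the idea rather than to replace an optimized library implementation.

    from itertools import combinations

    def apriori_frequent_itemsets(transactions, min_support):
        """Level-wise (breadth-first) generation of frequent itemsets."""
        items = sorted(set().union(*transactions))
        # Level 1: keep the single items that meet the support threshold
        current = {frozenset([i]) for i in items
                   if support(frozenset([i]), transactions) >= min_support}
        frequent = set(current)

        k = 2
        while current:
            # Candidate generation: join frequent (k-1)-itemsets into k-itemsets
            candidates = {a | b for a in current for b in current if len(a | b) == k}
            # Candidate pruning: every (k-1)-subset of a candidate must itself be frequent
            candidates = {c for c in candidates
                          if all(frozenset(s) in current for s in combinations(c, k - 1))}
            # Support counting: keep only the candidates that meet the threshold
            current = {c for c in candidates
                       if support(c, transactions) >= min_support}
            frequent |= current
            k += 1
        return frequent

    for itemset in sorted(apriori_frequent_itemsets(transactions, min_support=0.6), key=len):
        print(set(itemset))

With min_support = 0.6 on the hypothetical data this returns four frequent 1-itemsets and four frequent 2-itemsets; strong rules can then be extracted from them with the generate_rules() sketch shown earlier.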

Congratulations!

You have completed the theory of the Apriori algorithm. You can find the Python code for Apriori here. In the next article we will discuss the Eclat algorithm, which is the final part of non-personalized recommenders. Link
