Association Rule Learning & Apriori Algorithm

Ibrahim Yıldız
Published in Analytics Vidhya
Jul 20, 2020

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.

Association rule mining first finds all sets of items (itemsets) whose support is greater than the minimum support, and then uses these frequent itemsets to generate the rules whose confidence is greater than the minimum confidence. The lift of a rule X -> Y is the ratio of the observed support of X and Y together to the support expected if X and Y were independent. A typical and widely used application of association rules is market basket analysis.
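The three measures can be written compactly. This is a sketch using the usual textbook definitions; the symbols T (the set of all transactions) and t (a single transaction) are notation assumed here, not taken from the original text.

```latex
\[
\mathrm{supp}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
\mathrm{conf}(X \rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)},
\qquad
\mathrm{lift}(X \rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}
\]
```

A lift of 1 corresponds to supp(X ∪ Y) = supp(X) · supp(Y), which is exactly what independence of X and Y would give.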

Measure 1: Support. This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears. In Table 1 below, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.

If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold. You may then identify itemsets with support values above this threshold as significant itemsets.

Measure 2: Confidence. This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions containing item X in which item Y also appears. In Table 1, the confidence of {apple -> beer} is 3 out of 4, or 75%.

One drawback of the confidence measure is that it might misrepresent the importance of an association. This is because it only accounts for how popular apples are, but not beers. If beers are also very popular in general, there will be a higher chance that a transaction containing apples will also contain beers, thus inflating the confidence measure. To account for the base popularity of both constituent items, we use a third measure called lift.

Measure 3: Lift. This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. It is the confidence of {X -> Y} divided by the support of {Y}. In Table 1, the lift of {apple -> beer} is 1, which implies no association between the items. A lift value greater than 1 means that item Y is more likely to be bought if item X is bought, while a value less than 1 means that item Y is less likely to be bought if item X is bought.
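The three measures are easy to compute by hand in a few lines of Python. The sketch below is only an illustration: the `transactions` list is a hypothetical set of eight baskets constructed so that it reproduces the numbers quoted above (it is not a copy of Table 1), and the function names are our own.

```python
# Hypothetical baskets, chosen only to reproduce the figures quoted in the text.
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Fraction of transactions containing `x` that also contain `y`."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


def lift(transactions, x, y):
    """Confidence of x -> y divided by the support of y."""
    return confidence(transactions, x, y) / support(transactions, y)


print(support(transactions, {"apple"}))                  # 0.5
print(support(transactions, {"apple", "beer", "rice"}))  # 0.25
print(confidence(transactions, {"apple"}, {"beer"}))     # 0.75
print(lift(transactions, {"apple"}, {"beer"}))           # 1.0
```

In this hypothetical data beer appears in 6 of the 8 baskets, so a 75% confidence for {apple -> beer} is no higher than beer's base popularity, which is why the lift comes out as 1.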

Apriori Algorithm

Apriori is an algorithm developed to extract relationships between items in a dataset. It uses a bottom-up approach: it starts from single items and extends them one item at a time, looking for combinations that occur together often enough.

For example, suppose the above figure shows the shopping baskets of customers in a market. The first table lists the products in each basket (1 3 4, 2 3 4, etc.). The algorithm first finds the frequency of each product, i.e. the total number of times it was bought (product 1 was bought 2 times, product 3 was bought 3 times, etc.). It then derives the minimum support value from the highest frequency: with a 50% threshold and a highest frequency of 3, this is 3 * 50/100 = 1.5. Products with a frequency lower than this value are eliminated. The remaining products are combined into larger itemsets, the same process is repeated, and the table is reduced further. This continues until no larger combination reaches the minimum support; the itemsets that survive are the relationships found.
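The level-by-level procedure described above can be sketched in plain Python. This is a minimal illustration rather than the implementation used in the Kaggle notebook: the `baskets` list is a hypothetical example (not the figure's exact data), the function name `apriori_frequent_itemsets` is our own, and the threshold follows the rule in the text (50% of the highest single-item frequency).

```python
from collections import Counter
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least `min_count` baskets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Number of baskets that contain each candidate itemset.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: count single items and drop the infrequent ones.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {c: n for c, n in count(singles).items() if n >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical baskets; products are identified by number as in the text.
baskets = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

item_counts = Counter(item for basket in baskets for item in basket)
min_count = 0.5 * max(item_counts.values())  # 3 * 50/100 = 1.5

frequent = apriori_frequent_itemsets(baskets, min_count)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With these baskets, product 4 is eliminated in the first pass because it appears only once, and the largest surviving itemset is {2, 3, 5}, which appears in two baskets. The prune step is what makes the approach bottom-up: a larger itemset is only counted if all of its smaller subsets were already frequent.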

We applied the Apriori algorithm to real supermarket data. You can access this application from the Kaggle link below.

https://www.kaggle.com/ibrahimyildiz/association-analysis-with-apriori-association-rule
