Association Rule Mining. Deciphered.

Mahika Bansal
Data Science Group, IITR
Mar 29, 2017

In our 12A12D series, we are trying to be as diverse as possible. I can bet that 90% of you are unaware of ARM techniques, even though they are among the basics of Data Mining.

Now, let’s try to understand this:

Vicious Circle!

The concept used above is what ARM is all about — Associating things/objects by using some rules.

A man only learns in two ways, one by Reading, and the other by Association with smarter people.

What is Association Rule Mining?

Most ML algorithms in DS work with numeric data and tend to be quite mathematical.
But ARM is perfect for categorical data and involves little more than simple counting!

  • It’s a rule-based ML method for discovering interesting relations between variables in large databases.
  • Identifies strong rules using some measures of interestingness.
No love lost.

On Friday afternoons, young American males who buy diapers (nappies) also have a predisposition to buy beer.

This anecdote became popular as an example of how unexpected association rules might be found from everyday data.

Such information can be used as the basis for decisions about marketing activities such as Promotional Pricing or Market Basket Analysis.

Mathematics Involved

The problem can be framed as follows:

  • Let I = {i_1, i_2, …, i_n} be a set of binary attributes called items.
  • Let D = {t_1, t_2, …, t_m} be a set of transactions called the database.
  • Each transaction in D contains a subset of the items in I.

A simple rule looks like:

X ⇒ Y

Here X is the antecedent and Y is the consequent; each is generally a single item or a small set of items.

Supermarket Example

I = {milk, bread, butter, beer, diapers}
D consists of a set of transactions over these items, each recorded as a row of 1s and 0s: 1 indicates the presence of the item in that transaction, and 0 its absence.

Rule: {butter, bread} ⇒ {milk}, meaning that if butter and bread are bought, customers also buy milk.
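
To make the 0/1 representation concrete, here is a minimal Python sketch (using pandas) of one possible database D over these five items. The transactions are illustrative stand-ins, not the table from the original post.

    import pandas as pd

    # Illustrative transactions over I = {milk, bread, butter, beer, diapers}
    # (hypothetical data, not the table shown in the original post)
    transactions = [
        {"milk", "bread"},
        {"bread", "butter"},
        {"beer", "diapers"},
        {"milk", "bread", "butter"},
        {"bread"},
    ]
    items = ["milk", "bread", "butter", "beer", "diapers"]

    # One-hot encode: 1 if the item appears in the transaction, 0 otherwise
    D = pd.DataFrame([[int(i in t) for i in items] for t in transactions],
                     columns=items)
    print(D)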

Thresholds used for Relations

  • Support — An indication of how frequently the itemset appears in the database. It is defined as the fraction of records that contain X ∪ Y out of the total number of records in the database. If the support of an itemset is 0.1%, only 0.1% of the transactions contain that itemset.

Support (X ⇒ Y) = support count of (X ∪ Y) / total number of transactions in D

  • Confidence — The fraction of transactions that contain X ∪ Y out of the number of transactions that contain X.
    It is a measure of the strength of an association rule.
    If the confidence of the association rule X ⇒ Y is 80%, it means that 80% of the transactions that contain X also contain Y.

Confidence (X ⇒ Y) = Support (X ∪ Y) / Support (X)

  • Other measures include Lift, Conviction, All-Confidence, Collective Strength and Leverage. (Support, confidence and lift are computed directly in the short sketch below.)
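
Because these measures are just ratios of counts, they can be computed directly. A minimal sketch in plain Python, using the same kind of illustrative (hypothetical) transactions as above:

    # Hypothetical transactions (illustrative only)
    transactions = [
        {"milk", "bread"},
        {"bread", "butter"},
        {"beer", "diapers"},
        {"milk", "bread", "butter"},
        {"bread"},
    ]

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(X, Y):
        # Support of X ∪ Y divided by support of X
        return support(X | Y) / support(X)

    def lift(X, Y):
        # How much more often X and Y co-occur than if they were independent
        return confidence(X, Y) / support(Y)

    X, Y = {"butter", "bread"}, {"milk"}
    print(support(X | Y), confidence(X, Y), lift(X, Y))

On these made-up transactions, the rule {butter, bread} ⇒ {milk} gets a support of 0.2, a confidence of 0.5 and a lift of 1.25.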

Algorithms

Apriori:

Apriori uses a breadth-first search strategy to count the support of itemsets, together with a candidate-generation function that exploits the downward-closure property of support (every subset of a frequent itemset must itself be frequent).

Can it be explained more effectively?

Comparison of Apriori, AprioriTid, AprioriHybrid and Tertius:

Is it really that good?

Pros

  • Low memory consumption.
  • Easy to implement.
  • Uses the Apriori property for pruning, so fewer itemsets are left for further support checking.

Cons

  • Requires many scans of the database.
  • Allows only a single minimum support threshold.
  • Best suited to small databases.

Example can be found here.
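
In practice the whole Apriori pipeline is available off the shelf. A minimal sketch using the third-party mlxtend library (assuming it is installed; the transactions are again hypothetical):

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    # Illustrative transactions (hypothetical data)
    transactions = [
        ["milk", "bread"],
        ["bread", "butter"],
        ["beer", "diapers"],
        ["milk", "bread", "butter"],
        ["bread"],
    ]

    # One-hot encode the transactions into a boolean DataFrame
    te = TransactionEncoder()
    D = pd.DataFrame(te.fit(transactions).transform(transactions),
                     columns=te.columns_)

    # Frequent itemsets with support >= 0.4, then rules with confidence >= 0.5
    frequent = apriori(D, min_support=0.4, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
    print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])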

Frequent-Pattern Growth: (FP)

  • In the first pass, the algorithm counts the occurrences of items (attribute-value pairs) in the dataset and stores them in a header table.
  • In the second pass, it builds the FP-tree structure by inserting instances. Items in each instance are sorted by descending frequency in the dataset so that the tree can be processed quickly; items that do not meet the minimum support threshold are discarded. If many instances share their most frequent items, the FP-tree provides high compression close to the tree root. (A short sketch of these first two passes follows below.)
  • Recursive processing of this compressed version of the main dataset then grows large itemsets directly, instead of generating candidate itemsets and testing them against the entire database. Growth starts from the bottom of the header table (the items with the longest branches), by finding all instances matching a given condition. A new conditional tree is created, with counts projected from the original tree for the set of instances conditional on the attribute, and each node gets the sum of its children's counts. The recursion ends when no individual items conditional on the attribute meet the minimum support threshold, and processing continues on the remaining header items of the original FP-tree.

Once the recursive process has completed, all large item sets with minimum coverage have been found, and association rule creation begins.
Example can be found here.
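
The first two passes described above (building the header table, then filtering and reordering each instance) can be sketched in a few lines of plain Python; this stops before the tree construction itself and again uses hypothetical transactions:

    from collections import Counter

    # Hypothetical transactions (illustrative only)
    transactions = [
        ["milk", "bread"],
        ["bread", "butter"],
        ["beer", "diapers"],
        ["milk", "bread", "butter"],
        ["bread"],
    ]
    min_support_count = 2  # absolute minimum support threshold

    # Pass 1: count item occurrences; frequent items form the header table
    counts = Counter(item for t in transactions for item in t)
    header = {item: c for item, c in counts.items() if c >= min_support_count}

    # Pass 2: drop infrequent items and sort the rest by descending frequency,
    # so that shared prefixes compress well near the root of the FP-tree
    ordered = [
        sorted((i for i in t if i in header), key=lambda i: -header[i])
        for t in transactions
    ]
    print(header)
    print(ordered)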

Other algorithms include Eclat and Node-set based algorithms.

Choose wisely. Very.

Is it really that good?

Pros

  • Faster than other ARM algorithms.
  • Uses a compressed representation of the original database.
  • Eliminates repeated database scans.

Cons

  • Higher memory consumption.
  • Not suited to interactive or incremental mining.
  • The resulting FP-tree is not unique for the same logical database.

References

  1. A Review on Association Rule Mining Algorithms by Jyoti Arora, Nidhi Bhalla and Sanjeev Rao
  2. Blogs by Mapr, Aimotion and AV
  3. Association Rule Mining: A Survey by Gurneet Kaur

Footnotes

I hope that you appreciate the power of associative techniques. With this, we come to the end of the 8th article of our series. If you are still following, you are awesome and committed. :D

Thanks for reading. :)
And, ❤ if this was a good read. Enjoy!

Co-Author: Akhil Gupta
