A Conceptual Introduction to Association Rule Mining — Part 1

Annette Catherine Paul
Published in delvify
Jul 9, 2021

As humans, we inherently try to draw connections between various observations and concepts. These connections are built on associations we relate to. The strength of these associations is directly related to how analytical we are, with and without our biases. Sometimes an association is a direct representation of something we can relate to. Other times, if we put in the effort, we can try to comprehend things beyond our scope, like the time we all argued about whether the dress was blue and black or white and gold.

Image Source: GIPHY

In my family, my mother is not fond of owning pets. However, when we got our tabby cat, he associated with my mother the most, on the grounds that she petted him the least, but when she did, he enjoyed it the most.
Whenever guests who were not comfortable with cats came over, we would have to keep him inside my room with the door closed. When we returned, we were always shocked to find all the papers in the room destroyed, him probably cursing us in secret cat language.

At no point did we communicate to our cat that when guests come, we will lock him up for a brief while. But after observing and registering repetitive actions, he made associations and chose the appropriate rule (action) to apply based on them.

What is association rule learning?

Now that we understand associations from our own perspective, note that as humans we tend to jump to conclusions quickly. In other words, we tend to read associations as causality or correlation. With association rule learning, that is not the case. Let us first understand what association rule learning is and why it does not necessarily imply causality.

When we walk into a supermarket, there is a reason why dairy items are all placed together and, not far from them, breakfast cereals too. These product placements are the result of associations mined from previous customer transactions, with one motivation: a customer should not have to spend time searching for relevant items. Also, if the product placements are intuitive, we end up buying more than we need. We may come in intending to buy milk, but on seeing the breakfast cereals, we make the association and want to purchase those too.

For example, this happens frequently when shopping at Lulu Hypermarket. Their product placements are so intuitive that there is no way you step out without buying more than you came for. Not to mention that every other hypermarket you walk into, you unknowingly compare with Lulu, as in your mind that is the default arrangement of any hypermarket.

Association rule learning is a rule-based machine learning method for discovering relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness. — [1]

Does that mean my customer will always buy breakfast cereals because they bought milk?

Association rule learning tries to quantify the strength of co-occurrence. An example of co-occurrence: you may be motivated to join the gym if your friend joins too (who doesn't like to work out together, right?), but that does not mean you will join without thinking twice. Additionally, an itemset does not contain any order preference, as it is a set of elements, and it emphasizes capturing all the items within a transaction as a group. Hence, you cannot map it back to a customer level and arrive at preferential inferences.

When I think back to my cat's actions: did he always react in that manner whenever we closed him in my room, or did the frequency of his actions differ at different times? Certainly it would be unfair to assume that all of his actions would be the same. Some associations would be stronger than others.

How do we validate this?

Let us assume (if his reactions to being locked in a room could be described by multiple itemsets) that these are the possible itemsets:

Itemset1: {locked in room, anger}
Itemset2: {locked in room, calm}
Itemset3: {locked in room, anger, hiss}
Itemset4: {locked in room, anger, sleep}

Image Source: GIPHY

Support: From my experience with him, the co-occurrence of being locked in a room and anger is higher than the co-occurrence of being locked in a room and calm. The number of times he was angry most definitely outweighs the number of times he was calm. Additionally, anger would far more often result in {hiss} than be shrugged off with a nap ({sleep}). In other words, Itemset3 has more support than Itemset4.
Impact of support: If I know that beyond a certain proportion my cat tends to behave in a certain way, then I can consider the support significant. However, if the support is low, the strength of the association tanks too, as we cannot really arrive at any conclusion from that scenario.
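To make this concrete, here is a minimal sketch in Python. The "transactions" below are hypothetical recordings of the cat's reactions, invented purely to illustrate how support is computed as a fraction of transactions.

```python
# Hypothetical "transactions": each set records one occasion the cat was
# locked in the room, together with his observed reaction(s).
transactions = [
    {"locked in room", "anger", "hiss"},
    {"locked in room", "anger", "hiss"},
    {"locked in room", "anger", "sleep"},
    {"locked in room", "calm"},
    {"locked in room", "anger", "hiss"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Anger co-occurs with being locked in far more often than calm does...
print(support({"locked in room", "anger"}, transactions))          # 0.8
print(support({"locked in room", "calm"}, transactions))           # 0.2
# ...and Itemset3 ({..., hiss}) has more support than Itemset4 ({..., sleep}).
print(support({"locked in room", "anger", "hiss"}, transactions))  # 0.6
print(support({"locked in room", "anger", "sleep"}, transactions)) # 0.2
```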

Image Source: GIPHY

Confidence: We all want some level of confidence in anything we relate to in our life, right? Given that an itemset contains {hiss}, how many times does it also contain {locked in room}?
Impact of confidence: This conditional probability expresses the confidence with which you can state an association rule, one frequent itemset implying another. Sometimes, this value can be incoherent with intuitive knowledge. If the number of occurrences of the antecedent is itself limited but overlaps heavily with other items, then this number can be high, even close to 1, although intuitively we know the association is misleading, just as in the case of: given {sleep}, what is the confidence of {locked in room}? Hence we need another measure to work in tandem with this concept and make sure we are not comparing apples and oranges, and that measure is lift.
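Continuing the hypothetical cat transactions from before, confidence is just a conditional proportion. The sketch below also reproduces the misleading case the paragraph above describes: a rare antecedent can still score a confidence of 1.

```python
transactions = [  # same hypothetical recordings as before
    {"locked in room", "anger", "hiss"},
    {"locked in room", "anger", "hiss"},
    {"locked in room", "anger", "sleep"},
    {"locked in room", "calm"},
    {"locked in room", "anger", "hiss"},
]

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """supp(A ∪ C) / supp(A): the share of A-transactions that also contain C."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# {sleep} occurs only once, yet always alongside {locked in room}, so the
# confidence is a perfect 1.0 on very thin evidence -- the misleading case.
print(confidence({"sleep"}, {"locked in room"}, transactions))  # 1.0
# {anger} -> {hiss} holds in 3 of the 4 angry transactions (~0.75).
print(confidence({"anger"}, {"hiss"}, transactions))
```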

Image Source: GIPHY

Lift: Lift corrects the confidence of a rule for the support of the itemset we are trying to associate to. Or: what lift does the presence of {sleep} give to {locked in room}? We are trying to analyze the change in likelihood of having {locked in room} in the set knowing that {sleep} is also present, over the likelihood of having {locked in room} in a set without any prior knowledge of the existence of {sleep}.
Impact of lift: Whenever the presence of one item actually pushes the set toward containing another item, lift is greater than 1. The higher the lift, the stronger the association, the greater the co-occurrence, and the more reliable the confidence with which we can state {X} → {Y}. This is what will truly help in the product placement scenario we spoke of before.
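On the same hypothetical cat transactions, a sketch of lift shows how it deflates the misleading confidence from above: {locked in room} appears in every transaction, so knowing {sleep} adds nothing, and lift comes out as exactly 1.

```python
transactions = [  # same hypothetical recordings as before
    {"locked in room", "anger", "hiss"},
    {"locked in room", "anger", "hiss"},
    {"locked in room", "anger", "sleep"},
    {"locked in room", "calm"},
    {"locked in room", "anger", "hiss"},
]

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(antecedent, consequent, transactions):
    """confidence(A -> C) divided by the baseline support of C."""
    conf = (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))
    return conf / support(consequent, transactions)

# {locked in room} is in every transaction, so {sleep} raises nothing:
print(lift({"sleep"}, {"locked in room"}, transactions))  # 1.0
# {anger} genuinely raises the odds of {hiss}: lift above 1 (~1.25).
print(lift({"anger"}, {"hiss"}, transactions))
```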

Image Source: Patterns of User Involvement in Experiment-Driven Software Development [2]

What is the scalability of these rules with respect to time?

Optimizing scale and time is always a challenge within any technological firm. While finding answers is indeed important, the relevance and efficiency of those answers matter too, as we do not have all the time and resources in the world to scale unrealistic solutions. Generating these rules from a massive dataset can take a lot of time and effort. Are there ways we can leverage our data and trim what we feed into the model? By keeping only the itemsets that are frequent relative to a minimum support threshold, we can optimize significantly. Two related concepts that come up when we dive deeper are maximal frequent itemsets and closed itemsets.

A maximal frequent itemset is a frequent itemset for which none of its immediate supersets are frequent. — [3]
An itemset is closed in a dataset if there exists no superset that has the same support count as its original itemset — [4]

Impact: A maximal frequent itemset, as the name suggests, is the most compressed form of an itemset collection, holding the maximum information possible. This is ideally the most optimized representation to use. But one main disadvantage is that while every subset can be derived from it, their support values are lost. If your business goals and insights require those support values, we need to adopt the other optimized approach, closed itemsets. This approach prunes inessential itemsets without losing the support values.
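A brute-force sketch (fine for toy data, far too slow for real datasets) shows how the two definitions play out on the hypothetical cat transactions used earlier, with a minimum support count of 2:

```python
from itertools import combinations

transactions = [  # same hypothetical recordings as before
    {"locked in room", "anger", "hiss"},
    {"locked in room", "anger", "hiss"},
    {"locked in room", "anger", "sleep"},
    {"locked in room", "calm"},
    {"locked in room", "anger", "hiss"},
]

def count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_count):
    """Enumerate every itemset meeting the minimum support count (brute force)."""
    items = sorted(set().union(*transactions))
    return [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if count(frozenset(c), transactions) >= min_count]

freq = frequent_itemsets(transactions, min_count=2)

# Maximal: no frequent proper superset exists.
maximal = [s for s in freq if not any(s < t for t in freq)]
# Closed: no superset has the same support count.
closed = [s for s in freq
          if not any(s < t and count(s, transactions) == count(t, transactions)
                     for t in freq)]

print(maximal)      # only {locked in room, anger, hiss} survives
print(len(closed))  # 3 closed itemsets, each keeping its own support
```

Note how much smaller both summaries are than the seven frequent itemsets in `freq`, while the closed itemsets still let every frequent itemset's support be recovered.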

What are some use cases of association rules mining?

Image Source: Association Rules Apriori Algorithm [5]
  • Market Basket Analysis (Affinity Analysis): By looking for combinations of items that frequently occur together in transactions, we try to uncover associations between these items in order to improve product placement for offline shopping. The graph shown visualizes these associations as rules.
  • Improving recommendations by showcasing relevant products: Recommender systems are intuitive yet complex architectures that leverage many aspects of an ecosystem, including customer behavior, product behavior, content behavior, etc. With the help of collaborative filtering, we can attribute the relationship back to a customer and recommend better products at a user level.
  • Formal Concept Analysis: When we have a principled way of deriving a concept hierarchy from a collection of objects and their properties, we conduct a formal concept analysis. A snapshot of this is how we related concepts using my cat. — [7]
  • Text Analysis: A method that visualizes associations between words by capturing them in a directed graph. This can help uncover which words often go hand in hand, especially in text search, where the query heavily depends on a user's behavioral and cultural background. — [8]
  • Fraud Analysis in Insurance: In this setting, association rules are used to identify and flag fraud rings, mapping the claim identifier to the parties involved as items, with the goal of identifying frequently occurring associations between these items in claims handling. For example, a rule flagging fraud could be: if insured by A and police officer X, then a particular auto repair shop could imply fraud. At the very least, it gives us a list to validate. — [9]

This article is meant to be a gentle introduction to association rule learning. In Part 2 of this article we will implement this concept using various models such as the Apriori, Eclat, and FP-Growth algorithms.

How do you implement association rules within your firm? Feel free to leave your comments.
