Association mining — Support, Association rules, and Confidence

Little Dino
3 min read · May 1, 2022

Introduction

Association mining aims to learn patterns, substructures, or knowledge from a dataset through association rules. For example, we can use a supermarket’s transaction records to investigate which items are often bought together. We can then use this knowledge to analyze which product display will lead to the highest revenue.

In this article, we’ll introduce the concepts of support, association rules, and confidence, as well as how to generate association rules in Python.

Support

Support (relative support) is the proportion of transactions containing a certain item set. An item set can contain more than one item.

Say the item set Fruit contains apple and banana, and our database holds 4 transactions, of which 1 contains apple and 3 contain banana.

The support of Fruit is 1/4 (=0.25) since only 1 out of 4 transactions (the 1st transaction) contains apple and banana.
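The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the `transactions` list is a hypothetical database chosen to match the counts in this article (the "milk" item is a placeholder), and `support` is a helper name introduced here.

```python
from fractions import Fraction

# Hypothetical database: the apple/banana counts follow the article,
# "milk" is a placeholder item for the remaining transaction.
transactions = [
    {"apple", "banana"},
    {"banana"},
    {"banana"},
    {"milk"},
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

fruit = {"apple", "banana"}
print(support(fruit, transactions))       # 1/4
print(support({"banana"}, transactions))  # 3/4
```

Using `Fraction` keeps the results as exact ratios, so they can be compared directly with the hand calculation.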

Association rules

An association rule represents the pattern/co-occurrence of two item sets by using an if-then condition. For example, a rule (Apple) → (Banana) means “IF Apple is in a transaction, THEN Banana is also in that transaction”.

Support and Confidence

As you might imagine, we need some measures to evaluate the association rules. Support measures the frequency of item sets co-occurring in a rule, and Confidence is a reliability measure of a rule.

Specifically, the support of a rule A → B is the support of the union of A and B, which is the proportion of transactions that contain both A and B. Confidence is the proportion of transactions containing A that also contain B. If the confidence is high, the rule holds reliably in our database, so it is worth investigating further.

The formula for confidence is as follows.

confidence(A → B) = support(A ∪ B) / support(A)

Using the same example as above, what are the support and the confidence of the rule (Apple) → (Banana)?

The support of Apple is 1/4 since 1 out of 4 transactions contains apple, and the support of Apple and Banana is also 1/4 as we mentioned before. Thus, the support of this rule is 1/4, and the confidence is 1/4 divided by 1/4, which is 1!

Now let’s think about the support and the confidence of rule (Banana) → (Apple).

The support of Banana is 3/4 since 3 out of 4 transactions contain banana, and the support of Apple and Banana is 1/4. Therefore, the support of this rule is 1/4, and the confidence is 1/4 divided by 3/4, which is 1/3!

⚡ The support of rule A → B is the same as that of rule B → A, but their confidences often differ!
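To check both directions of the rule numerically, here is a small sketch. It assumes a hypothetical four-transaction database consistent with the counts above ("milk" is a placeholder item), and the helper names `support` and `confidence` are introduced here for illustration.

```python
from fractions import Fraction

# Hypothetical database matching the counts used in the article.
transactions = [
    {"apple", "banana"},
    {"banana"},
    {"banana"},
    {"milk"},  # placeholder item
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """support(A ∪ B) / support(A) for the rule A → B.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    a = set(antecedent)
    both = a | set(consequent)
    return support(both, transactions) / support(a, transactions)

print(confidence({"apple"}, {"banana"}, transactions))  # 1
print(confidence({"banana"}, {"apple"}, transactions))  # 1/3
```

Both rules share the numerator support(Apple ∪ Banana) = 1/4; only the denominator changes, which is why the two confidences differ.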

Strong association rules

Association mining is user-dependent, so whether a rule is strong is also defined by the user. To be more concrete, we can define a minimum support and a minimum confidence for the rules. If the support of a rule ≥ minimum support AND the confidence ≥ minimum confidence, then the rule is considered strong.

In the real world, we might have tens of thousands of association rules, and we’ll only consider the strong ones. These rules are worth investigating since their item sets co-occur frequently and the rules are more reliable in terms of confidence.

For instance, if we set the minimum support as 1/4 and the minimum confidence as 1/2, then the rule (Apple) → (Banana) is strong (its support is 1/4 and its confidence is 1), while the rule (Banana) → (Apple) is NOT strong (its confidence, 1/3, is less than 1/2).
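One way to generate strong rules in Python is a brute-force search over every item set. The sketch below is only an illustration under stated assumptions: the database is hypothetical ("milk" is a placeholder item), the thresholds of 1/4 for support and 1/2 for confidence are illustrative choices, and `strong_rules` is a name introduced here. Enumerating all item sets grows exponentially with the number of items, so larger datasets are usually handled with a dedicated algorithm such as Apriori.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical database matching the counts used in the article.
transactions = [
    {"apple", "banana"},
    {"banana"},
    {"banana"},
    {"milk"},  # placeholder item
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def strong_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for strong rules."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            # Skip item sets that never occur or are too rare.
            if s == 0 or s < min_support:
                continue
            # Split the item set into every antecedent/consequent pair.
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    c = s / support(antecedent, transactions)
                    if c >= min_confidence:
                        rules.append((antecedent, consequent, s, c))
    return rules

rules = strong_rules(transactions, Fraction(1, 4), Fraction(1, 2))
for a, b, s, c in rules:
    print(f"({', '.join(a)}) -> ({', '.join(b)}): support={s}, confidence={c}")
# (apple) -> (banana): support=1/4, confidence=1
```

With these thresholds only (apple) → (banana) survives; (banana) → (apple) meets the support threshold but its confidence of 1/3 falls below 1/2.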

