Further Exploration of Association Rules

Dan Isaza
Weekly Data Science
4 min read · Jul 19, 2018

Here, we’ll take a quick look at why an association rule between two itemsets whose appearances in a basket are independent events has an expected interest of zero. (As a shorthand, we can call these independent itemsets.)

I’ll assume that you’ve read Finding Meaningful Associations in Retail Data — which pairs well with The Intuition Behind the Apriori Algorithm.


Vocabulary Recap

If you’ve read the Apriori and Meaningful Associations posts, these terms will be familiar.

  • Our dataset consists of shopping carts at the time of checkout. We call these baskets. They consist of items. Groups of items are called itemsets.
  • The number of baskets that an itemset appears in is called the support of that itemset. An itemset is frequent if its support is above some predefined support threshold.
  • An association rule takes the form I → j, where I is an itemset and j is a single item. For example, {bread, eggs} → flour is an association rule. Describing this association rule allows us to say how purchasing bread and eggs influences purchasing flour, if at all.
  • Confidence, Conf(I → j) = Support(I ∪ {j}) / Support(I), is a useful intermediary metric when discussing association rules, though it doesn’t mean much by itself. Interest, Interest(I → j) = Conf(I → j) − (proportion of baskets containing j), is quite meaningful. If you haven’t encountered these terms before, you really should read Finding Meaningful Associations in Retail Data.
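To make these definitions concrete, here is a minimal sketch on a made-up dataset of five baskets. The helper names (`support`, `is_frequent`, `confidence`, `interest`) are hypothetical, and interest is assumed to be confidence minus the proportion of baskets containing j, as in the recap above.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]

# A made-up dataset of five baskets, for illustration only.
BASKETS: List[Basket] = [
    frozenset({"bread", "eggs", "flour"}),
    frozenset({"bread", "eggs"}),
    frozenset({"bread", "milk"}),
    frozenset({"eggs", "flour"}),
    frozenset({"bread", "eggs", "flour", "milk"}),
]


def support(itemset: FrozenSet[str], baskets: List[Basket]) -> int:
    """Number of baskets containing every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def is_frequent(itemset: FrozenSet[str], baskets: List[Basket], threshold: int) -> bool:
    """An itemset is frequent if its support is above the threshold."""
    return support(itemset, baskets) > threshold


def confidence(itemset_i: FrozenSet[str], item_j: str, baskets: List[Basket]) -> float:
    """Conf(I -> j) = Support(I ∪ {j}) / Support(I); fails if Support(I) == 0."""
    return support(itemset_i | {item_j}, baskets) / support(itemset_i, baskets)


def interest(itemset_i: FrozenSet[str], item_j: str, baskets: List[Basket]) -> float:
    """Interest(I -> j) = Conf(I -> j) minus the proportion of baskets containing j."""
    proportion_j = support(frozenset({item_j}), baskets) / len(baskets)
    return confidence(itemset_i, item_j, baskets) - proportion_j


lhs = frozenset({"bread", "eggs"})
print(support(lhs, BASKETS))                   # 3
print(is_frequent(lhs, BASKETS, threshold=2))  # True
print(confidence(lhs, "flour", BASKETS))       # 2/3, about 0.667
print(interest(lhs, "flour", BASKETS))         # 2/3 - 3/5, about 0.067
```

Here {bread, eggs} → flour has a small positive interest: flour shows up in 2 of the 3 bread-and-eggs baskets, slightly more often than its overall rate of 3 in 5.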

Quick Note on Notation

Let’s distinguish between Pr.(j) and P(j). Let Pr.(j) represent the proportion of baskets in which j appears, whereas P(j) is the true probability of observing j in a basket.

Pr.(j) = Support(j) / (Num. Baskets) = the proportion of baskets containing j

P(j) = True probability of observing j in a basket.

Note that P(j) = E[Pr.(j)], but this doesn’t mean that Pr.(j) and P(j) are always equal.
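To see the difference between Pr.(j) and P(j), the sketch below simulates many datasets under a made-up model in which every basket independently contains j with probability P(j) = 0.3. The values of `P_J`, `N`, and the number of simulated datasets are arbitrary choices. Individual values of Pr.(j) vary from dataset to dataset, but their average settles near P(j).

```python
import random
from statistics import mean


def sample_proportion(p_j: float, n_baskets: int, rng: random.Random) -> float:
    """Pr.(j) for one simulated dataset: each basket contains j with probability p_j."""
    hits = sum(1 for _ in range(n_baskets) if rng.random() < p_j)
    return hits / n_baskets


rng = random.Random(0)  # fixed seed so the run is reproducible
P_J = 0.3               # hypothetical true probability P(j)
N = 50                  # baskets per simulated dataset

proportions = [sample_proportion(P_J, N, rng) for _ in range(20_000)]
print(min(proportions), max(proportions))  # single datasets can land far from 0.3
print(mean(proportions))                   # close to P(j) = 0.3
```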

Let I be an itemset and j be an item, both of which appear in a retail dataset with N baskets, and let observing I and j in a basket be independent events.

Assume, for the sake of contradiction, that the association rule I → j has an expected interest that is not zero.

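Begin by taking the expectation of the definition of interest, then expand it by substituting the definition of confidence:

$$\mathrm{E}[\mathrm{Interest}(I \to j)] = \mathrm{E}\left[\mathrm{Conf}(I \to j) - \mathrm{Pr.}(j)\right] = \mathrm{E}\left[\frac{\mathrm{Support}(I \cup \{j\})}{\mathrm{Support}(I)} - \mathrm{Pr.}(j)\right]$$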

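Multiplying the confidence term by (1/N)/(1/N), which is just multiplying it by 1, yields:

$$\mathrm{E}[\mathrm{Interest}(I \to j)] = \mathrm{E}\left[\frac{\mathrm{Support}(I \cup \{j\}) / N}{\mathrm{Support}(I) / N} - \mathrm{Pr.}(j)\right]$$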

The terms in the numerator and denominator become Pr.(I and j) and Pr.(I), respectively:

$$\mathrm{E}[\mathrm{Interest}(I \to j)] = \mathrm{E}\left[\frac{\mathrm{Pr.}(I \text{ and } j)}{\mathrm{Pr.}(I)} - \mathrm{Pr.}(j)\right]$$

Keep in mind that we’re moving from set notation to probability notation here, so I’ve chosen to write “and” in words rather than as a symbol to avoid confusion.

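We then apply the linearity of expectation to get:

$$\mathrm{E}[\mathrm{Interest}(I \to j)] = \mathrm{E}\left[\frac{\mathrm{Pr.}(I \text{ and } j)}{\mathrm{Pr.}(I)}\right] - \mathrm{E}[\mathrm{Pr.}(j)]$$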

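Recall that the expected proportion of baskets containing j is the true probability of observing j in a basket, so E[Pr.(j)] = P(j). The first term needs a little more care, because the expectation of a ratio is not, in general, the ratio of the expectations. Confidence is only defined when at least one basket contains I, so fix the number of baskets containing I at some k ≥ 1. Treating the baskets as independent draws, each of those k baskets contains j with probability P(j | I) = P(I and j)/P(I), so the expected fraction of them that also contain j is P(I and j)/P(I) whatever k is, and therefore also on average over k. Now we’re using the true probabilities of observing these itemsets:

$$\mathrm{E}[\mathrm{Interest}(I \to j)] = \frac{P(I \text{ and } j)}{P(I)} - P(j)$$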
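The step from expected proportions to true probabilities can be checked exactly for the confidence term. The sketch below assumes, as a modeling choice, that the k baskets containing I each contain j independently with probability P(j | I), so Support(I ∪ {j}) follows a Binomial(k, P(j | I)) distribution. The function name and the value 1/3 are hypothetical. It also uses a made-up X and Y to show that E[X / Y] and E[X] / E[Y] can differ, which is why this step needs an argument of its own.

```python
from fractions import Fraction
from math import comb


def expected_confidence_given_k(p_j_given_i: Fraction, k: int) -> Fraction:
    """E[Conf(I -> j) | Support(I) = k] for k >= 1.

    Assumes each of the k baskets containing I independently contains j with
    probability p_j_given_i = P(j | I), so Support(I ∪ {j}) ~ Binomial(k, P(j | I)).
    """
    if k < 1:
        raise ValueError("confidence is undefined when no basket contains I")
    return sum(
        (comb(k, s) * p_j_given_i**s * (1 - p_j_given_i) ** (k - s) * Fraction(s, k)
         for s in range(k + 1)),
        Fraction(0),
    )


# With P(j | I) = 1/3, the expected confidence is exactly 1/3 for every k.
for k in range(1, 6):
    print(k, expected_confidence_given_k(Fraction(1, 3), k))  # prints 1/3 each time

# Contrast with a generic ratio: X = 1 always, Y is 1 or 2 with equal probability.
e_of_ratio = Fraction(1, 2) * Fraction(1, 1) + Fraction(1, 2) * Fraction(1, 2)  # E[X / Y]
ratio_of_e = Fraction(1) / (Fraction(1, 2) * 1 + Fraction(1, 2) * 2)           # E[X] / E[Y]
print(e_of_ratio, ratio_of_e)  # 3/4 2/3
```

Exact `Fraction` arithmetic is used so that the equality holds exactly rather than up to floating-point error.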

Since observing I and observing j are independent events, the probability of observing both is the product of their probabilities, P(I and j) = P(I)P(j), and the P(I) terms cancel:

$$\mathrm{E}[\mathrm{Interest}(I \to j)] = \frac{P(I)\,P(j)}{P(I)} - P(j) = P(j) - P(j) = 0$$

We’ve now shown that the expected interest of the association rule I → j is zero. This directly contradicts our assumption that observing I and j could be independent events while E[Interest(I → j)] ≠ 0. Thus, whenever observing I and observing j are independent events, E[Interest(I → j)] = 0. (Q.E.D.)
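As a sanity check on the whole argument, the sketch below computes the expected interest exactly for a tiny dataset by enumerating every possible set of N baskets. It assumes a simple model that the proof leaves implicit: baskets are independent, each contains I with probability P(I) and j with probability P(j), and those two events are independent. Datasets in which no basket contains I are skipped, since confidence is undefined there. The function name and the probabilities are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_interest(p_i: Fraction, p_j: Fraction, n_baskets: int) -> Fraction:
    """Exact E[Interest(I -> j) | Support(I) > 0] when I and j are independent."""
    if n_baskets < 1 or p_i == 0:
        raise ValueError("need at least one basket and P(I) > 0")

    # The four possible (contains I, contains j) states of a single basket.
    states = [(has_i, has_j) for has_i in (False, True) for has_j in (False, True)]

    def state_prob(has_i: bool, has_j: bool) -> Fraction:
        # Independence: the joint probability is the product of the marginals.
        return (p_i if has_i else 1 - p_i) * (p_j if has_j else 1 - p_j)

    total_weight = Fraction(0)
    weighted_interest = Fraction(0)
    for dataset in product(states, repeat=n_baskets):
        support_i = sum(1 for has_i, _ in dataset if has_i)
        if support_i == 0:
            continue  # confidence, and hence interest, is undefined
        support_ij = sum(1 for has_i, has_j in dataset if has_i and has_j)
        pr_j = Fraction(sum(1 for _, has_j in dataset if has_j), n_baskets)
        interest = Fraction(support_ij, support_i) - pr_j

        weight = Fraction(1)
        for has_i, has_j in dataset:
            weight *= state_prob(has_i, has_j)
        total_weight += weight
        weighted_interest += weight * interest
    return weighted_interest / total_weight


print(expected_interest(Fraction(1, 3), Fraction(2, 5), n_baskets=4))  # 0
```

Brute-force enumeration grows as 4^N, so this is only practical for a handful of baskets; its value is that the result comes out exactly zero rather than merely close to it.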

So there we have it: a quick proof of why the expected interest of an association rule I → j is zero when observing I and observing j are independent events.

Stanford Math & CS | VP of Engineering at Clever Real Estate | (he/him pronouns)