Apriori Algorithm explained!

Rahul Reddy
Published in the Data World
4 min read · May 5, 2020

Quick Facts:

40% of app installs on Google Play come from recommendations.
60% of watch time on YouTube comes from recommendations.

Recommendation systems are everywhere today, from YouTube to Amazon and Netflix to Google. Yet the fundamental mechanics behind such a system are often considered intricate.

We will now try to break down the architecture of a basic recommendation system by delving into the Apriori Algorithm.

The Apriori Algorithm works on the principles of Association Rule Mining. Association Rule Mining is used to identify the underlying relationships between different items.

Let us move forward by considering the example of a movie DVD shop, where customers can rent or buy DVDs of movies. If we observe carefully, there is always a pattern in what customers buy, perhaps around a theme such as thriller, action, or comedy.

We can help the shopkeeper to get more profit if we can identify the relationship between the movies. If movie A and movie B are frequently bought together, we can use this pattern to increase profit, maybe by placing them at the same location in the store or by offering a bundled discount. People who buy or rent these two movies can be driven into renting or buying the other one.

Components of Apriori Algorithm

There are three major components of the Apriori algorithm namely:

Support

Support indicates the popularity of an item, i.e. movie in our case. It can be calculated as the number of times a DVD of that movie is bought divided by the total number of DVDs bought.

For instance, if out of 100 DVD transactions, 20 transactions contain the movie ‘Avengers: End Game’, the support for that movie can be calculated as below:

Support(End Game) = (#Transactions containing End Game)/(Total Transactions)

Support(End Game) = 20/100 = 20%
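The calculation above can be sketched in a few lines of Python (a full implementation is planned for the next post); the transaction log below is made up purely for illustration:

```python
# Toy transaction log, invented for illustration: each entry is the
# set of DVD titles in one purchase.
transactions = [
    {"End Game", "Iron Man"},
    {"End Game"},
    {"Iron Man", "Thor"},
    {"Thor"},
    {"End Game", "Thor"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"End Game"}, transactions))  # 3 of 5 transactions -> 0.6
```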

Confidence

Confidence indicates the likelihood of movie A (say, Avengers: End Game) being bought or rented given that movie B (say, Iron Man) is bought or rented. It is calculated as the number of transactions in which both B and A were bought, divided by the total number of transactions in which B was bought or rented.

Confidence(Iron Man → End Game) = (Transactions containing both Iron Man and End Game)/(Transactions containing Iron Man)

Suppose 30 transactions contain Iron Man, and in 15 of those the customer also bought End Game. Then we can find the likelihood of buying End Game when Iron Man is bought as below.

Confidence(Iron Man → End Game) = 15/30 = 50%
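A minimal sketch of the same calculation in Python, again using an invented transaction log in which 4 purchases contain Iron Man and 2 of those also contain End Game:

```python
# Invented transaction log: each entry is the set of DVDs in one purchase.
transactions = [
    {"Iron Man", "End Game"},
    {"Iron Man", "End Game"},
    {"Iron Man"},
    {"Iron Man"},
    {"End Game"},
    {"Thor"},
]

def confidence(antecedent, consequent, transactions):
    """Confidence(antecedent -> consequent): of the transactions containing
    the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)

print(confidence({"Iron Man"}, {"End Game"}, transactions))  # 2/4 -> 0.5
```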

Lift

Lift(Iron Man → End Game) measures how much the sale of an Iron Man DVD boosts the sale of End Game DVDs. It is calculated by dividing Confidence(Iron Man → End Game) by Support(End Game).

Lift(Iron Man → End Game) = (Confidence (Iron Man → End Game))/(Support (End Game))

Lift(Iron Man → End Game) = 50% / 20 % = 2.5

So, here the lift value of 2.5 indicates that a customer who buys the Iron Man DVD is 2.5 times more likely to also buy the End Game DVD than a randomly chosen customer is to buy End Game.

A Lift equal to 1 indicates that there is no association between items. A lift of greater than 1 indicates that products are more likely to be bought together. Finally, a Lift of less than 1 refers to the case where the two products are unlikely to be bought together.
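Lift follows directly from the two quantities already computed. A small sketch using the article's numbers (Confidence = 50%, Support = 20%):

```python
def lift(confidence_a_to_b, support_b):
    """Lift(A -> B) = Confidence(A -> B) / Support(B).
    > 1: bought together more than chance; = 1: independent; < 1: less."""
    return confidence_a_to_b / support_b

print(lift(0.50, 0.20))  # 2.5
```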

Steps Involved in the Apriori Algorithm

In real-world scenarios, there can be a large number of items, which leads to an enormous number of possible item pairs. The Apriori algorithm would try to extract rules for every pair, but extracting a relation between every single pair is unnecessary: we want to concentrate only on those pairs that actually have some effect on each other.

  1. First, we set a minimum value for support and confidence. This eliminates unwanted pairs and keeps only rules for items that have a certain baseline popularity (support) and a minimum rate of co-occurrence with other items (confidence).
  2. Now, we extract all the item pairs/subsets having support greater than the minimum support value.
  3. Then, we retain all the rules from the subsets with a confidence value higher than the minimum confidence value.
  4. Finally, we order the rules by descending order of Lift values.
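The four steps above can be sketched end-to-end in Python. This is only a rough, brute-force illustration on an invented transaction log with made-up thresholds, not the full Apriori candidate-generation machinery:

```python
from itertools import combinations

# Invented transaction log: each purchase is a set of DVD titles.
transactions = [
    {"Iron Man", "End Game"},
    {"Iron Man", "End Game", "Thor"},
    {"Iron Man"},
    {"End Game", "Thor"},
    {"Thor"},
]

# Step 1: set minimum support and confidence (values chosen arbitrarily).
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.6

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 2: keep only item pairs whose support clears the threshold.
items = sorted({item for t in transactions for item in t})
frequent_pairs = [frozenset(p) for p in combinations(items, 2)
                  if support(frozenset(p)) >= MIN_SUPPORT]

# Step 3: from those pairs, keep rules whose confidence clears the threshold.
rules = []
for pair in frequent_pairs:
    for antecedent in pair:
        consequent = pair - {antecedent}
        conf = support(pair) / support(frozenset({antecedent}))
        if conf >= MIN_CONFIDENCE:
            rule_lift = conf / support(consequent)
            rules.append((antecedent, set(consequent), conf, rule_lift))

# Step 4: order the rules by descending lift.
rules.sort(key=lambda r: r[3], reverse=True)
for antecedent, consequent, conf, rule_lift in rules:
    print(f"{antecedent} -> {consequent}: "
          f"confidence={conf:.2f}, lift={rule_lift:.2f}")
```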

In the next post, we will try and implement the Apriori Algorithm using Python.

Originally published at http://thedataresearch.wordpress.com on May 5, 2020.
