Market Basket Analysis: Finding associated games using Apriori algorithm

Anand Singh
Capillary Data Science
3 min read · Jan 23, 2020
Image credit: Mashable

In a gaming store, if a customer plays one game, how likely is he/she to play another, and which sets of games are played together most often among the customers?

This is a common business problem, and it can be answered with market basket analysis, also referred to as affinity analysis. It is one of the fundamental techniques large retailers use to uncover associations between items. In other words, it lets retailers identify items that are frequently bought together.

Objective: To identify which games are frequently played together, for a leading business in the retail-based games and entertainment industry.

Challenges: The Apriori algorithm generates many uninteresting itemsets, which in turn produce many rules that are of no practical use. On large data sets, computation time can also be high.

Approach: The Apriori algorithm relies on three key measures: Support, Confidence and Lift.

Support(B) refers to the default popularity of an item and can be calculated by finding the number of transactions containing a particular item divided by the total number of transactions.

Confidence(A->B) refers to the likelihood that item B is also bought if item A is bought. It can be calculated by finding the number of transactions where A and B are bought together, divided by the total number of transactions where A is bought.

Lift(A -> B) refers to the increase in the ratio of sale of B when A is sold. It can be calculated by dividing Confidence(A -> B) by Support(B).

For instance, suppose that out of 1000 total transactions, 100 contain kids rides (KR), 200 contain attraction games (AG), and 75 contain both kids rides and attraction games. Then:

Support(AG) = 200 / 1000 = 0.2
Confidence(KR -> AG) = 75 / 100 = 0.75
Lift(KR -> AG) = Confidence(KR -> AG) / Support(AG) = 0.75 / 0.2 = 3.75

Mathematical representation of support, confidence, and lift

A Lift of 1 means there is no association between products A and B. Lift of greater than 1 means products A and B are more likely to be bought together. Lift of less than 1 refers to the case where two products are unlikely to be bought together.

In our instance, kids rides and attraction games are played together 3.75 times as often as would be expected if the two were independent, so we conclude that there is a positive association between them.
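The figures in this example can be checked with a few lines of Python. This is only a minimal sketch: the individual baskets are synthetic, constructed so that the totals match the counts quoted in the text (1000 transactions, 100 with KR, 200 with AG, 75 with both), and the function names are illustrative rather than taken from any library.

```python
# Synthetic transactions built to match the counts in the example
# (the individual baskets are made up; only the totals come from the text).
transactions = (
    [{"KR", "AG"}] * 75      # both kids rides and attraction games
    + [{"KR"}] * 25          # kids rides only, so 100 contain KR in total
    + [{"AG"}] * 125         # attraction games only, so 200 contain AG in total
    + [{"other"}] * 775      # neither, bringing the total to 1000
)


def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(A and B) divided by Support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence(A -> B) divided by Support(B)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


sup_ag = support({"AG"}, transactions)
conf_kr_ag = confidence({"KR"}, {"AG"}, transactions)
lift_kr_ag = lift({"KR"}, {"AG"}, transactions)

print(f"Support(AG)          = {sup_ag:.2f}")
print(f"Confidence(KR -> AG) = {conf_kr_ag:.2f}")
print(f"Lift(KR -> AG)       = {lift_kr_ag:.2f}")
```

Because lift divides the joint support by the product of the two individual supports, Lift(AG -> KR) comes out the same as Lift(KR -> AG), while the two confidences differ.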

Implementation in Python

The apriori function from the apyori library is used for the implementation in Python. It requires a few parameter values to work. The first parameter is the list of transactions that we want to extract rules from. The second is the min support parameter, which keeps only the itemsets whose support is at least the specified value.

Next, the min confidence parameter keeps only those rules whose confidence is at least the specified threshold. Similarly, the min lift parameter specifies the minimum lift value for the shortlisted rules.

Finally, the min length parameter specifies the minimum number of items that we want in our rules. The minimum confidence used for the rules was 20%, or 0.2, the minimum lift was 3, and the min length was 2, since we want at least two products in our rules.

The Python code used for market basket analysis is given below:

Market Basket Analysis using Python
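As a rough illustration of this kind of analysis, here is a minimal, standard-library-only sketch of Apriori-style rule mining. It is not the author's original code and does not use apyori; the function names, the toy sessions, the extra game names (Arcade, Bowling) and the min_support value of 0.2 are all assumptions made for the example. Only the 0.2 confidence threshold, the lift threshold of 3 and the minimum length of 2 come from the text.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those that meet the support threshold.
        level = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate that has an infrequent k-item subset.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                candidates.add(u)
    return frequent


def association_rules(frequent, min_confidence, min_lift, min_length=2):
    """Derive rules A -> B from frequent itemsets, filtered by the thresholds."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < min_length:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]
                lift = conf / frequent[consequent]
                if conf >= min_confidence and lift >= min_lift:
                    rules.append((set(antecedent), set(consequent), sup, conf, lift))
    return rules


# Toy play sessions (hypothetical data, one list of games per customer visit).
sessions = (
    [["Kids Rides", "Attraction Games"]] * 3
    + [["Arcade", "Bowling"]] * 4
    + [["Arcade"]] * 2
    + [["Bowling"]]
)

frequent = frequent_itemsets(sessions, min_support=0.2)  # assumed threshold
rules = association_rules(frequent, min_confidence=0.2, min_lift=3, min_length=2)

for antecedent, consequent, sup, conf, lift in rules:
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
          f"support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

With apyori installed, the comparable call would be along the lines of `apriori(sessions, min_support=0.2, min_confidence=0.2, min_lift=3)`, which yields records carrying the itemset, its support and the per-rule confidence and lift. As far as we know, apyori reads a `max_length` keyword rather than `min_length`, so a minimum rule size may need to be enforced by filtering the results; this is worth checking against the installed version.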

Insights

More than 1.5K associations between different games with support value > 0.2 and lift > 3 were obtained. Based on these associations, the best-associated games can be placed together in the store. We can also treat the players of one game as target prospects for another best-associated game or group of games.

What next?

The next step is to build a recommendation engine based on the findings of the Apriori algorithm. Recommendation engines are data filtering tools that use algorithms and data to recommend the most relevant items to a particular user, which in turn helps with cross-selling and up-selling.
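One simple way such an engine could start is as a lookup over the mined rules: find every rule whose antecedent the player has already played, then rank the unplayed consequents by lift. The sketch below assumes rules stored as (antecedent, consequent, lift) tuples; the `recommend` function, the rule list and its lift values are hypothetical and only illustrate the idea.

```python
# Hypothetical rules in the form (antecedent, consequent, lift); the game
# names and lift values are made up for illustration.
rules = [
    ({"Kids Rides"}, {"Attraction Games"}, 3.75),
    ({"Attraction Games"}, {"Kids Rides"}, 3.75),
    ({"Arcade", "Kids Rides"}, {"Bowling"}, 3.2),
]


def recommend(played, rules, top_n=3):
    """Suggest unplayed games from rules whose antecedent the player has fully played."""
    played = set(played)
    scores = {}
    for antecedent, consequent, lift in rules:
        if antecedent <= played:
            for game in consequent - played:
                # Keep the strongest (highest-lift) rule supporting each game.
                scores[game] = max(scores.get(game, 0.0), lift)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend({"Kids Rides"}, rules))
print(recommend({"Kids Rides", "Arcade"}, rules))
```

Ranking by lift favours strong associations over merely popular games; ranking by confidence instead would be an equally reasonable choice, depending on the business goal.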

