Madhura Joshi
Published in Fnplus Club · 5 min read · Apr 18, 2019



[Image: my study table]

Ap..Apri..Apriori Algo!

From learning how it is pronounced to remembering its name, I believe I have learnt it in the best way possible. *hahaa*

And now here I am, writing a blog to share what I have learnt. Let me tell you beforehand: this is my first technical blog, and I aim to explain whatever I have understood in the best and easiest way to my readers.

Yeah! yeah!

Big Data Analytics, a course I’m taking at my university, has introduced me to a variety of algorithms and their fascinating names!

Here’s one such Ap..Apri..Apriori Algorithm.

This algorithm is used for finding frequent itemsets in a dataset and the association rules derived from them. It got its odd name because it uses its ‘prior’ knowledge of frequent itemset properties. The algorithm operates on a database containing a large number of transactions.

Before I tell you more about this algorithm, there are some key terms to be understood.

1. Support

2. Confidence

3. Lift

4. Candidate pair

Let us learn these terms through an example.

Suppose 1000 customer transactions take place in a supermarket. Out of these, 200 transactions include bread, 300 include cheese, and 100 include both bread and cheese. I have taken bread and cheese as the example since we usually buy them together.

Support:

Support for this example is the number of transactions containing a particular item (say, bread) divided by the total number of transactions.

Support(Bread) = (Transactions involving Bread)/(Total number of transactions)

= 200/1000

= 20%

Confidence:

Confidence in our example measures how often a customer who buys bread also buys cheese. It is the number of transactions involving both bread and cheese divided by the number of transactions involving bread.

Confidence = (Transactions involving both Bread and Cheese)/(Transactions involving Bread)

= 100/200

= 50%

Lift:

Lift is the increase in the likelihood of a customer buying cheese given that they bought bread, compared with buying cheese in general. Using the standard definition, it can be calculated as:

Lift(Bread → Cheese) = (Confidence (Bread → Cheese))/Support (Cheese)

= 50%/30%

≈ 1.67

A person who buys bread is about 1.67 times more likely to also buy cheese than an average customer. The greater the value, the better the combination; a lift above 1 indicates a positive association.
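These three measures can be checked with a few lines of Python, using the example counts above and the standard definition lift = confidence / support(consequent):

```python
# Example counts from the supermarket scenario above.
total = 1000   # total transactions
bread = 200    # transactions containing bread
cheese = 300   # transactions containing cheese
both = 100     # transactions containing both bread and cheese

support_bread = bread / total        # 200/1000 = 20%
confidence = both / bread            # P(cheese | bread) = 100/200 = 50%
support_cheese = cheese / total      # 300/1000 = 30%
lift = confidence / support_cheese   # ≈ 1.67

print(f"Support(Bread)     = {support_bread:.0%}")
print(f"Confidence(B → C)  = {confidence:.0%}")
print(f"Lift(Bread → C)    = {lift:.2f}")
```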

Candidate pairs:

Candidate pairs are pairs of items that appear together in at least one basket (bucket); these are the only pairs the algorithm counts in a later pass to check whether they are frequent.

This Ap…Apri…Apriori Algorithm has a property that says:

“All subsets of a frequent itemset must be frequent.” Equivalently, if an itemset is infrequent, then all of its supersets will be infrequent.
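This downward-closure property is what lets the algorithm prune candidates cheaply: a size-k candidate can be discarded the moment any of its (k−1)-subsets is missing from the frequent table. A minimal sketch in Python (the item names here are just illustrative):

```python
from itertools import combinations

def prune_candidates(candidates, frequent_smaller):
    """Keep only candidates whose every (k-1)-subset is frequent
    (the Apriori / downward-closure property)."""
    kept = []
    for cand in candidates:
        k = len(cand)
        if all(frozenset(sub) in frequent_smaller
               for sub in combinations(cand, k - 1)):
            kept.append(cand)
    return kept

# Frequent pairs (size 2) found in an earlier pass:
L2 = {frozenset(p) for p in [("B", "C"), ("B", "E"), ("C", "E"), ("A", "C")]}
# Possible size-3 candidates:
C3 = [frozenset(s) for s in [("B", "C", "E"), ("A", "B", "C")]]

# {A,B,C} is pruned because its subset {A,B} is not frequent; {B,C,E} survives.
print(prune_candidates(C3, L2))
```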

Here’s a common picture that describes how the algorithm works, with a minimum support of 2.

Example image

Consider the database TDB with transaction IDs 10, 20, 30, and 40, each with its own itemset.

1. In the first scan, we count how many transactions each item appears in and thereby find the support of every item.

For example, item {A} appears in transactions 10 and 30, hence it appears twice (support of {A} = 2). The output of the first scan is the candidate table C1.

2. Check the support of every item against the minimum support of 2 (the threshold given in the example). Eliminate the itemsets with support < 2 (here, itemset {D}) and create table L1.

3. Making all possible pairs (itemsets of size 2) from the items in table L1, we get table C2. Scanning the database to count each pair and eliminating the candidate pairs with support below the minimum of 2 (marked blue in the example picture), table L2 is generated.

4. The candidate table C3, with itemsets of size 3, is generated next; after scanning it and checking against the minimum support, we get table L3 with itemset {B, C, E} and support 2.

We stop here since no frequent itemsets are found further.

The key to remembering this process: Pass 1 → finding frequent items, Pass 2 → finding frequent pairs, and so on for larger itemsets.

That’s how it works!
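The whole pass-by-pass procedure above can be sketched in Python. The transaction table below is my reconstruction of the example picture (it matches the details given: {A} appears in transactions 10 and 30, and {B, C, E} survives with support 2):

```python
def apriori(transactions, min_support=2):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    frequent = {}
    # Pass 1: count single items (table C1) and keep those meeting
    # min_support (table L1).
    counts = {}
    for items in transactions.values():
        for item in items:
            single = frozenset([item])
            counts[single] = counts.get(single, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    k = 2
    while current:
        frequent.update(current)
        # Generate size-k candidates (Ck) by unioning frequent (k-1)-sets.
        sets = list(current)
        candidates = {a | b for a in sets for b in sets if len(a | b) == k}
        # Scan the database to count each candidate's support, then filter (Lk).
        counts = {c: sum(1 for items in transactions.values()
                         if c <= set(items))
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        k += 1
    return frequent

# Reconstructed TDB from the example (transaction ID -> itemset).
tdb = {10: "ACD", 20: "BCE", 30: "ABCE", 40: "BE"}
result = apriori(tdb, min_support=2)
print(result[frozenset("BCE")])  # support of {B, C, E}
```

Running this, {D} is dropped in the first pass and {B, C, E} comes out as the largest frequent itemset, exactly as in the walkthrough above.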

This algorithm is easy to understand and is used on large datasets. However, it is computationally expensive, since every pass has to scan the entire database.

Uses of this Algorithm:

The Apriori Algorithm is used in the field of health care, where it can help detect Adverse Drug Reactions (ADRs) by producing association rules that indicate which combinations of medications and patient symptoms can cause adverse drug reactions.

The use of this algorithm is for finding association rules efficiently. For a rule to be accepted, the requirements are:

· The support value should be greater than the minimum support threshold.

· The confidence value should be greater than the minimum confidence threshold.
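This acceptance check can be sketched in a few lines of Python (the threshold values here are my own illustrative choices, not ones from the example):

```python
def accept_rule(support_xy, support_x, min_support, min_confidence):
    """Accept rule X -> Y when support(X ∪ Y) and
    confidence(X -> Y) = support(X ∪ Y) / support(X)
    both clear their thresholds."""
    confidence = support_xy / support_x
    return support_xy >= min_support and confidence >= min_confidence

# Bread -> Cheese from the supermarket example:
# support({bread, cheese}) = 100/1000 = 0.10, support(bread) = 0.20.
print(accept_rule(0.10, 0.20, min_support=0.05, min_confidence=0.4))
```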

Hoping to have given you some insight into the Apriori Algorithm in an easy way, I’ll take your leave now.

See you in the next blog!
