TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Orange Data Mining Tool and Association Rules

Caner Erden
Published in TDS Archive
4 min read · May 11, 2020

Photo by Oleg Magni via pexels

In this article, association analysis will be studied using the Orange Data Mining tool. The Apriori algorithm will be utilized for creating association rules. Algorithm steps will be shown on a small set of market shopping data.

Association Rules

Association analysis tries to uncover if-then rules hidden within a dataset. It usually yields good results with categorical data. The most common example of association analysis is basket analysis. It also has a wide range of other uses, such as bioinformatics, disease diagnosis, web mining and text mining.

Basket Analysis

In basket analysis, we keep a list of the products bought by each shopper and ask which products tend to be sold together.

The Data

Let’s say we have a dataset consisting of 5 market transactions:
1 Bread, Milk
2 Bread, Tea, Coffee, Eggs
3 Milk, Tea, Coffee, Coke
4 Bread, Milk, Tea, Coffee
5 Bread, Milk, Tea, Coke

We can see that most shoppers who buy Tea also buy Coffee in the dataset. Now, let’s show the dataset using one-hot encoding. The dataset can be downloaded from here.

One-hot encoding
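As a rough illustration of the one-hot table, the sketch below builds it from the five transactions in plain Python. The column order (order of first appearance) and the variable names are choices made for this example, not something taken from the downloadable dataset.

```python
# The five transactions from the example above.
transactions = [
    ["Bread", "Milk"],
    ["Bread", "Tea", "Coffee", "Eggs"],
    ["Milk", "Tea", "Coffee", "Coke"],
    ["Bread", "Milk", "Tea", "Coffee"],
    ["Bread", "Milk", "Tea", "Coke"],
]

# Collect the distinct products in order of first appearance
# (an arbitrary choice for the column order).
products = []
for basket in transactions:
    for item in basket:
        if item not in products:
            products.append(item)

# One row per transaction, one 0/1 column per product.
one_hot = [[1 if product in basket else 0 for product in products]
           for basket in transactions]

print(products)
for row in one_hot:
    print(row)
```

Each row is one transaction, and a 1 means the product in that column was part of the purchase.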

Some Definitions on Association Rules

Product list (itemset): A set of one or more products, e.g. {Bread, Milk, Eggs}.

Support count (σ): The number of transactions that contain every product in the product list, e.g. σ({Milk, Tea, Coffee}) = 2 (transactions 3 and 4).

Support rate (s): The proportion of all transactions that contain the product list, e.g. s({Milk, Tea, Coffee}) = 2/5.

Frequent product list: A product list whose support rate is at or above a specified threshold.
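These definitions can be checked against the example data with a few lines of Python. This is only a sketch: the function names `support_count`, `support_rate` and `is_frequent` are made up for this illustration, and a product list is treated as frequent when its support rate is at or above the threshold.

```python
# The five transactions from the example, stored as sets.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Tea", "Coffee", "Eggs"},
    {"Milk", "Tea", "Coffee", "Coke"},
    {"Bread", "Milk", "Tea", "Coffee"},
    {"Bread", "Milk", "Tea", "Coke"},
]

def support_count(itemset, baskets):
    """Number of baskets that contain every product in itemset."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)

def support_rate(itemset, baskets):
    """Share of all baskets that contain every product in itemset."""
    return support_count(itemset, baskets) / len(baskets)

def is_frequent(itemset, baskets, min_support):
    """True if the support rate reaches the chosen threshold (assumed >=)."""
    return support_rate(itemset, baskets) >= min_support

example = {"Milk", "Tea", "Coffee"}
print(support_count(example, transactions))     # 2
print(support_rate(example, transactions))      # 0.4
print(is_frequent(example, transactions, 0.6))  # False
```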

There is more information here on association rules. In this post, I will show how to find association rules using the Orange tool.

Apriori Algorithm

The Apriori algorithm is one of the most widely used algorithms in basket analysis. The algorithm starts by specifying a threshold value. For example, let’s set the minimum support threshold to 60%.

Step 1: List each product with its frequency (the number of transactions it appears in) and identify the product with the maximum frequency.

Step 2: Multiply the number of transactions by the threshold value (5 × 0.6 = 3) and remove the products whose frequency is below the value you find.

Step 2
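Steps 1 and 2 can be sketched in Python as follows. The sketch assumes that the 60% threshold is applied to the number of transactions (5), which gives a minimum count of 3; the variable names are illustrative only.

```python
from collections import Counter

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Tea", "Coffee", "Eggs"],
    ["Milk", "Tea", "Coffee", "Coke"],
    ["Bread", "Milk", "Tea", "Coffee"],
    ["Bread", "Milk", "Tea", "Coke"],
]
min_support_percent = 60  # threshold from the text

# Step 1: count in how many transactions each product appears.
item_counts = Counter(item for basket in transactions for item in set(basket))
max_count = max(item_counts.values())
top_products = sorted(p for p, c in item_counts.items() if c == max_count)

# Step 2: keep products whose count reaches 60% of the transactions.
# Integer arithmetic avoids floating-point rounding at the boundary.
n = len(transactions)
frequent_items = sorted(
    p for p, c in item_counts.items() if c * 100 >= min_support_percent * n
)

print(dict(item_counts))
print(top_products)    # ['Bread', 'Milk', 'Tea']
print(frequent_items)  # ['Bread', 'Coffee', 'Milk', 'Tea']
```

With this data, Coke (2 transactions) and Eggs (1 transaction) fall below the minimum count and are dropped.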

Step 3: Create a frequency table for pairs of the remaining products (2-product sets).

Step 3
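A minimal sketch of the pair table is shown below. It assumes that Bread, Coffee, Milk and Tea are the products that survive the previous step when the 60% threshold corresponds to a minimum count of 3 transactions.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Tea", "Coffee", "Eggs"},
    {"Milk", "Tea", "Coffee", "Coke"},
    {"Bread", "Milk", "Tea", "Coffee"},
    {"Bread", "Milk", "Tea", "Coke"},
]
# Assumed output of the previous step (single products with count >= 3).
frequent_items = ["Bread", "Coffee", "Milk", "Tea"]
min_count = 3  # 60% of 5 transactions

# Count the transactions that contain both products of each pair.
pair_counts = {
    pair: sum(1 for basket in transactions if set(pair) <= basket)
    for pair in combinations(frequent_items, 2)
}
frequent_pairs = [pair for pair, c in pair_counts.items() if c >= min_count]

for pair, count in pair_counts.items():
    print(pair, count)
print(frequent_pairs)
```

Only pairs built from the surviving products are counted, because a pair cannot reach the threshold if one of its products does not.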

Step 4: Create a frequency table for triples of products (3-product sets).
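The triple table can be sketched in the same way. The list of frequent pairs below is an assumption: it is what the pair step yields for this data with a minimum count of 3. A triple is only counted if all three of its pairs are frequent, which is the pruning idea behind Apriori.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Tea", "Coffee", "Eggs"},
    {"Milk", "Tea", "Coffee", "Coke"},
    {"Bread", "Milk", "Tea", "Coffee"},
    {"Bread", "Milk", "Tea", "Coke"},
]
# Assumed output of the pair step (pairs with count >= 3).
frequent_pairs = {
    frozenset(pair)
    for pair in [("Bread", "Milk"), ("Bread", "Tea"),
                 ("Coffee", "Tea"), ("Milk", "Tea")]
}
min_count = 3  # 60% of 5 transactions

# Candidate triples: every 2-product subset must itself be frequent.
items = sorted(set().union(*frequent_pairs))
candidates = [
    triple for triple in combinations(items, 3)
    if all(frozenset(pair) in frequent_pairs for pair in combinations(triple, 2))
]

triple_counts = {
    triple: sum(1 for basket in transactions if set(triple) <= basket)
    for triple in candidates
}
frequent_triples = [t for t, c in triple_counts.items() if c >= min_count]

print(triple_counts)     # {('Bread', 'Milk', 'Tea'): 2}
print(frequent_triples)  # []
```

Under these assumptions the only candidate is {Bread, Milk, Tea}. It appears in 2 transactions, so no triple reaches the 60% threshold on this small dataset.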

Orange Data Mining Tool

Picture from Orange

Orange is an open-source machine learning and data visualization tool.

Some important features:

  • Free, open source
  • Machine learning and data visualization tool
  • Simple graphical interface and add-on support
  • Python scripts can also be written.

Association Rules with the Orange Tool

After you install Orange, select Options > Add-ons and install the Associate add-on. You can download the dataset (.csv) we use in basket analysis at this address. After opening the dataset, we can list the product lists that meet the minimum support with the Frequent Itemsets widget. Finally, we can show the association rules with the Association Rules widget.
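Association rules are usually filtered by confidence as well as by support. Confidence is not defined in this article, so the sketch below uses the standard definition as an assumption: the confidence of the rule A → B is the support count of A and B together divided by the support count of A. The function names are illustrative and are not part of Orange.

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Tea", "Coffee", "Eggs"},
    {"Milk", "Tea", "Coffee", "Coke"},
    {"Bread", "Milk", "Tea", "Coffee"},
    {"Bread", "Milk", "Tea", "Coke"},
]

def support_count(itemset, baskets):
    """Number of baskets that contain every product in itemset."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also hold the consequent."""
    base = support_count(antecedent, baskets)
    if base == 0:
        raise ValueError("antecedent does not occur in any basket")
    both = support_count(set(antecedent) | set(consequent), baskets)
    return both / base

print(confidence({"Tea"}, {"Coffee"}, transactions))  # 0.75
print(confidence({"Coffee"}, {"Tea"}, transactions))  # 1.0
```

In this data, 3 of the 4 baskets that contain Tea also contain Coffee, which matches the earlier observation that most shoppers who buy Tea also buy Coffee.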

That’s it. You can now use your own dataset to find the if-then rules hidden in your data.

I have also shared this post in Turkish on my personal website.

Written by Caner Erden

Meta-heuristics, optimization algorithms, discrete event simulations, machine learning, statistics