Association Rules — The ECLAT algorithm

Gabriel Reversi
Feb 9, 2023


Using the ECLAT algorithm in Python to find frequent itemsets.

After my last article about Association Rules with Apriori, I decided to go deeper into these solutions, both technically and in their business applications. After some searching, I found other algorithms that solve the same business problem, such as ECLAT and FP Growth.

As the title says, in this article I will talk about the ECLAT algorithm, and in the next ones I’ll talk about others like FP Growth and Apriori.

ECLAT Algorithm

The first thing to say is that there are two data formats to which association rule algorithms can be applied: the horizontal format and the vertical format. The horizontal layout is the one we use most often: each transaction is represented by an ID with the list of its items beside it.

Example 1 of Horizontal data format.

Or, as in the second example below, which is more common in business data, the transaction ID is repeated on one row for each item.

Example 2 of Horizontal data format.

Apriori works very well with this data layout. ECLAT does not: it works with a vertical data format, where each item is stored with the list of IDs of the transactions that contain it.

Example of vertical data format.
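To make the two layouts concrete, here is a minimal sketch that turns a horizontal table into a vertical one. The toy transactions and the names `horizontal` and `to_vertical` are invented for this illustration; they do not come from a real dataset.

```python
from collections import defaultdict

# Horizontal format: one entry per transaction ID, with its list of items.
# (Toy data, invented for this example.)
horizontal = {
    1: ["bread", "butter", "milk"],
    2: ["bread", "milk"],
    3: ["butter", "coffee"],
    4: ["bread", "butter", "coffee"],
}


def to_vertical(transactions):
    """Return {item: set of IDs of the transactions that contain the item}."""
    vertical = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


vertical = to_vertical(horizontal)
for item, tids in sorted(vertical.items()):
    print(item, sorted(tids))
# bread [1, 2, 4]
# butter [1, 3, 4]
# coffee [3, 4]
# milk [1, 2]
```

Each item now carries its own list of transaction IDs, which is the shape ECLAT works on.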

ECLAT needs this format because it builds the associations with a depth-first search (DFS). What does that mean? It means that it explores all the frequent combinations that start from one item before moving on to the next one.

For example, take bread in a shopping basket. The algorithm looks up which transactions bread appears in and which items are associated with it before moving on to the next item. It does this in depth for each item in the table, going through the transactions and finding the patterns.

First, it finds all transactions that contain item A. After that, it finds all transactions that contain item B. Finally, it intersects the two lists to get all transactions where A and B appear together.
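The step for A and B can be sketched with plain Python sets. The transaction IDs below are toy values chosen for illustration, and support is taken here as the fraction of all transactions that contain the itemset.

```python
# Vertical format: item -> set of transaction IDs (toy data for illustration).
tidsets = {
    "A": {1, 2, 4, 5},
    "B": {2, 3, 4},
}
n_transactions = 5

# Transactions where A and B appear together: intersect the two ID sets.
both = tidsets["A"] & tidsets["B"]

support_a = len(tidsets["A"]) / n_transactions
support_b = len(tidsets["B"]) / n_transactions
support_ab = len(both) / n_transactions

print(sorted(both))  # [2, 4]
print(support_a, support_b, support_ab)  # 0.8 0.6 0.4
```

No pass over the original transactions is needed to count {A, B}: the size of the intersection is enough.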

When we call the function in Python, we need to pass the minimum and maximum number of items in the combinations that the algorithm will generate.
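To show how these limits interact with the depth-first search, here is a small from-scratch sketch. It is not the library call used in my tests: the function `eclat` and the parameter names `min_support`, `min_combination` and `max_combination` are hypothetical, and the data is a toy example.

```python
def eclat(tidsets, n_transactions, min_support,
          min_combination=1, max_combination=3):
    """Depth-first ECLAT sketch over vertical data.

    tidsets maps each item to the set of IDs of the transactions containing it.
    Returns {itemset (tuple): support} for itemsets with support >= min_support
    and a size between min_combination and max_combination.
    """
    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            # Support of prefix + item comes from intersecting the ID sets.
            new_tids = prefix_tids & tids if prefix else tids
            support = len(new_tids) / n_transactions
            if support < min_support:
                continue  # supersets of an infrequent itemset cannot be frequent
            itemset = prefix + (item,)
            if len(itemset) >= min_combination:
                frequent[itemset] = support
            if len(itemset) < max_combination:
                # Go deeper on this itemset before trying the next sibling.
                extend(itemset, new_tids, candidates[i + 1:])

    extend((), set(), sorted(tidsets.items()))
    return frequent


# Toy vertical data: item -> transaction IDs, 4 transactions in total.
tidsets = {
    "bread": {1, 2, 4},
    "butter": {1, 3, 4},
    "coffee": {3, 4},
    "milk": {1, 2},
}

result = eclat(tidsets, n_transactions=4, min_support=0.5,
               min_combination=2, max_combination=3)
for itemset, support in result.items():
    print(itemset, support)
# ('bread', 'butter') 0.5
# ('bread', 'milk') 0.5
# ('butter', 'coffee') 0.5
```

With a minimum of 2, single items are still used to build larger combinations but are left out of the result. The maximum stops the search from going deeper than three items.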

The end result is the frequent itemsets with their support. If you were expecting other measures like lift or confidence… sorry, ECLAT only gives us the support.
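If you do need those measures, both are defined in terms of support, so they can be computed afterwards from the supports themselves. The sketch below assumes the supports of the single items are available too (that is, a minimum combination size of 1). The `support` table and the helper names `confidence` and `lift` are made up for illustration.

```python
# Supports as fractions of all transactions (toy values for illustration).
support = {
    ("bread",): 0.75,
    ("butter",): 0.75,
    ("bread", "butter"): 0.5,
}


def confidence(antecedent, consequent):
    """confidence(A -> B) = support(A and B) / support(A)."""
    both = tuple(sorted(antecedent + consequent))
    return support[both] / support[antecedent]


def lift(antecedent, consequent):
    """lift(A -> B) = confidence(A -> B) / support(B)."""
    return confidence(antecedent, consequent) / support[consequent]


print(confidence(("bread",), ("butter",)))
print(lift(("bread",), ("butter",)))
```

A lift below 1, as with these toy numbers, means the two items appear together less often than they would if they were independent.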

Well, what are the advantages of this algorithm?

  1. The Eclat algorithm tends to have lower memory requirements than Apriori because its depth-first search over the vertical layout only needs the transaction lists of the branch it is currently exploring.
  2. The Eclat algorithm does not scan the data repeatedly to calculate the support, so it is faster than Apriori.
  3. At each stage, the Eclat algorithm uses the transaction lists generated in the previous stage to find frequent itemsets, unlike Apriori, which scans the original database repeatedly. Since Eclat scans the database only once, it is much faster than the Apriori algorithm.

Conclusion

ECLAT is faster than Apriori on small or medium datasets. However, on large datasets it is possible that Apriori performs better. This happens because, on a large dataset, the intermediate vertical item lists that ECLAT builds can become too large for memory, so it ends up consuming more memory than Apriori, which hurts its scalability.
So, we can conclude that ECLAT is a good choice for small or medium datasets and for situations where we don't need measures other than support, like lift or confidence.

I did some tests using this algorithm on a bakery’s dataset. You can check the code on my GitHub, here.

