Building rollup hierarchies in Python with Treelib and atoti

A product catalogue example

Anastasia V Polyakova
Atoti
5 min read · Oct 30, 2020


In this atoti tutorial, I will walk you through creating a hierarchy (also known as a parent-child data structure) that you can use to aggregate and drill down interactively, using the Python libraries Treelib and atoti.

The example I’ll be using is an e-commerce product catalog, but the same technique applies to any natural hierarchy with many levels, for example legal entity structures or regional hierarchies.

UPDATE: The GIFs and code snippets in this article are based on an older version of atoti. The latest version of atoti offers much smoother and more functional dashboards and widgets. See the documentation for the latest version of atoti.

Product catalog example

Users can then drag and drop the hierarchy in atoti to recompute metrics interactively. In my example, the “unique sessions count” measure is displayed next to the multi-level e-commerce product catalog.

Parent-child data and Treelib

A common way to model a tree structure is through parent-child relationships. In the following example, you can see a list of categories together with their parent categories. Together they form a multi-level tree: a product catalogue.

Source: educational course “Data analytics in R”

Although parent-child pairs are a very natural way to express hierarchies, we can’t use them in their raw form for slicing and dicing. Think about a table in Excel: with the different levels of a tree in separate columns, we can combine them in a pivot table to roll up and drill down through the levels of the catalogue. Let’s extract the levels of the tree into separate columns! But before doing that, let’s look at a library that helps us flatten the tree easily: Treelib.

The Treelib Python library makes it easy to manipulate hierarchical data, as it provides common tree operations such as traversing the tree and accessing its leaves, nodes and subtrees.

To create a tree object in Treelib, I loop through the categories twice: first to create the nodes of the tree, and then to move each node under its correct parent.

Populating a tree in Treelib

The tree.show() method in Treelib lets us visualize the tree we’ve just created:

Product catalogue in Treelib

Levels of the hierarchy

So far we have invoked the Treelib module to create a tree object from the input data. Now let’s create columns for aggregation as discussed in the previous section.

For the first three categories in the tree (see above), we’d need to create the following columns representing higher, lower and leaf level categories:

As you will see later in this post, the input data on views and purchases (the “facts”) is linked to the leaf level, in our case “Category_Lvl3”. The fields “Category_Lvl2” and “Category_Lvl1” are broader groups; in other words, they are “parent” levels.

The method tree.paths_to_leaves() implemented in Treelib makes it trivial to create those columns. Each row in the following list holds the identifiers of the nodes on the path from the root to one leaf, and these identifiers can be translated into human-readable labels:

We just need to loop through the leaves and save the nodes above each leaf as a row in, say, a CSV file. The only nuance is that the tree is unbalanced: some of the leaves do not reach the maximum depth of the tree.

As a workaround, I add extra children under the leaves of the shorter “branches”, each one a copy of the leaf itself.
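The padding step can be sketched in plain Python. The paths below are hypothetical and only mimic the shape of what tree.paths_to_leaves() returns once identifiers are translated into labels; the helper name pad_path is also an assumption. Instead of adding nodes to the tree, this sketch repeats the leaf label directly in the output row, which yields the same columns.

```python
import csv
import io

# Hypothetical root-to-leaf label paths (same shape as tree.paths_to_leaves()).
paths = [
    ["Catalog", "Electronics", "Phones", "Smartphones"],
    ["Catalog", "Electronics", "Phones", "Feature phones"],
    ["Catalog", "Apparel", "Shoes"],  # shorter branch
]

def pad_path(path, depth):
    """Drop the root and repeat the leaf until the row has `depth` levels."""
    levels = path[1:]
    return levels + [levels[-1]] * (depth - len(levels))

max_depth = max(len(path) for path in paths) - 1  # the root is not a level
header = [f"Category_Lvl{i}" for i in range(1, max_depth + 1)]
rows = [pad_path(path, max_depth) for path in paths]

# Write to an in-memory buffer; a real file handle would work the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
```

With this padding every row has the same number of columns, and the last column always holds the leaf that the facts are linked to.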

By now, we’ve created columns from the levels of a tree structure that was defined through parent-child relationships. Let’s move on and expose those attributes as a hierarchy for data analytics in the atoti Python library.

Expand and collapse data in atoti

The example in my previous post “Happy data scientist: How to build a business intelligence app with 10 lines of python code” was about loading an events log (views and purchases) for an online shop into an atoti cube for further analysis. Let’s see how to inject the product catalogue into the same app.

The base store, “Events”, provides product identifiers, so I’ll load the product-to-category mapping and the category levels we discussed above:

The following code snippet loads the data into atoti from CSV files:

loading and linking data in atoti

To organize the levels into a hierarchy in atoti, the following code can be used:

creating a multi-level hierarchy for analytics in atoti

After the above code is run, the new multi-level hierarchy “Catalog” can be selected from the content editor:

Now let me quickly define a measure that computes the unique sessions count (based on the data loaded in “Happy data scientist: How to build a business intelligence app with 10 lines of python code”) as follows:

Done. We can now visualize the measures and recompute the events count along the nodes of the hierarchy:

atoti UI
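To make the roll-up concrete, here is a plain-Python sketch (not atoti code) of what a unique sessions count means at different levels of the catalogue. All data, field names and the helper unique_sessions are hypothetical. The point it illustrates is that a distinct count is not additive: a parent's value has to be recomputed from the underlying sessions rather than summed from its children.

```python
from collections import defaultdict

# Hypothetical flattened catalogue: leaf category -> (Lvl1, Lvl2, Lvl3).
levels_by_leaf = {
    "Smartphones": ("Electronics", "Phones", "Smartphones"),
    "Feature phones": ("Electronics", "Phones", "Feature phones"),
    "Shoes": ("Apparel", "Shoes", "Shoes"),
}
# Hypothetical product -> leaf category mapping.
leaf_by_product = {"p1": "Smartphones", "p2": "Feature phones", "p3": "Shoes"}
# Hypothetical events log: (session_id, product_id).
events = [("s1", "p1"), ("s1", "p2"), ("s2", "p1"), ("s3", "p3"), ("s2", "p3")]

def unique_sessions(events, depth):
    """Distinct session count per catalogue node at the given depth (1-based)."""
    sessions = defaultdict(set)
    for session_id, product_id in events:
        node = levels_by_leaf[leaf_by_product[product_id]][:depth]
        sessions[node].add(session_id)
    return {node: len(ids) for node, ids in sessions.items()}

by_lvl1 = unique_sessions(events, 1)
by_lvl3 = unique_sessions(events, 3)
```

In this sample, session s1 viewed both a smartphone and a feature phone, so the two leaf counts under “Electronics” add up to 3 while the parent itself has only 2 unique sessions.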

Instead of a conclusion

Thank you for reading this tutorial. I hope it’s helpful. Please let me know if you have any questions.

Anastasia is a quantitative financial analyst and risk management practitioner experienced in modern data analysis tools and frameworks.