Hierarchical charts in Plotly Express

Clive Jude Ronald
5 min read · Aug 10, 2023

Plotly Express is a high-level interface to Plotly that makes it easy to create interactive visualizations. It supports a variety of data types, including column-oriented data that can be stored in a DataFrame. Plotly Express functions use Pandas internally to process the data, but can also accept other types of DataFrames as arguments.

Here are some key points about Plotly Express:

· It can work with different types of data structures, including Pandas DataFrames.

· It can handle data in both “wide” and “long” formats.

· It provides functions for creating a wide range of visualizations, such as scatter plots, bar charts, line plots, pie charts, box plots, and more.

· It offers built-in styling options to customize the appearance of your charts.

· It can generate interactive plots that allow users to explore data by hovering over points, zooming in, panning, and more.

· It is built on top of the core Plotly library, so you can still access and modify the lower-level properties of your visualizations if needed.

· It is a good choice for beginners who want to create interactive visualizations without having to learn the details of the Plotly library.

· It can also be used by experienced users who want to create quick and easy visualizations.

· It also has useful features such as hover name and hover data, which show exactly the information we want when we point to a specific area of the chart.

For the purpose of this blog, we will use three charts to explain the usability and versatility of this Python library.

I will be using data from the following link:

https://www.kaggle.com/datasets/akshaydattatraykhare/car-details-dataset

The data describes the used cars that were sold from the start of the organization until 2020.

Sunburst charts:

A sunburst chart is a type of circular visualization that is used to represent hierarchical data. It is often used to show the distribution of a total value across different categories. Sunburst charts are made up of concentric rings, with each ring representing a different level of the hierarchy. The size of each slice of a ring represents the value of the corresponding category.

It helps us establish a relationship between the variables and explain where one factor sits in the hierarchy relative to another. The following code was used for this.
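A first step of this kind might look like the sketch below, which inspects the columns with pandas. It is not the exact code used here: the file name in the comment is a placeholder, and the sample rows and column names (name, year, selling_price, seller_type, owner) are assumptions standing in for the downloaded file.

```python
import pandas as pd

# With the Kaggle file downloaded, loading it might look like this
# (the file name is a placeholder, not taken from the article):
# df = pd.read_csv("car_details.csv")

# Small stand-in sample so the sketch runs on its own; column names are assumed.
df = pd.DataFrame({
    "name": ["Maruti 800 AC", "Hyundai Verna 1.6 SX", "Honda Amaze VX"],
    "year": [2007, 2012, 2014],
    "selling_price": [60000, 600000, 450000],
    "seller_type": ["Individual", "Individual", "Dealer"],
    "owner": ["First Owner", "First Owner", "Second Owner"],
})

print(df.columns.tolist())          # names of the available columns
df.info()                           # dtypes and non-null counts
print(df["owner"].value_counts())   # how many rows fall under each owner type
```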

This helped us understand the columns in the dataset. For our purpose, we first look at the type of owner and then at the seller type, which can be an individual, a dealer or a Trustmark dealer. The chart shows where the owners prefer to sell their respective vehicles. The sunburst chart is as follows:

This suggests that most of the owners are the first owner of the vehicle. Further breakdowns are given below:

Most first owners sell their cars to individuals, followed by normal dealers and then Trustmark dealers.

Second owners, like the first owners, prefer to sell their cars to individuals followed by normal dealers and then Trustmark dealers.

Third owners also prefer to sell their cars to individuals, followed by dealerships.

We can conclude that the preferred sales channel is individuals: people prefer selling their cars to individuals whom they trust rather than to dealerships.
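A breakdown like this can also be cross-checked numerically instead of being read off the chart. The sketch below uses pd.crosstab to count rows for each combination of owner and seller type; the column names and sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample; column names and rows are assumed for illustration.
df = pd.DataFrame({
    "owner": ["First Owner"] * 4 + ["Second Owner"] * 3 + ["Third Owner"] * 2,
    "seller_type": ["Individual", "Individual", "Dealer", "Trustmark Dealer",
                    "Individual", "Individual", "Dealer",
                    "Individual", "Dealer"],
})

# Rows are owner types, columns are seller types, cells are row counts.
counts = pd.crosstab(df["owner"], df["seller_type"])
print(counts)
```

Each row of the resulting table corresponds to one inner slice of the sunburst, and each cell to one outer slice.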

Tree maps:

A tree map is a type of visualization that is used to represent hierarchical data. It is made up of a series of nested rectangles, where the size of each rectangle is proportional to the value of the data it represents. The rectangles are arranged in a tree-like structure, with the top-level rectangle representing the root of the hierarchy and the bottom-level rectangles representing the leaves.

Here are some of the benefits of using tree maps:

· They can be used to visualize large amounts of hierarchical data in a compact and easy-to-understand way.

· They can be used to show the size and relative importance of different parts of the hierarchy.

· They can be used to identify patterns and trends in the data.

The code used is as follows:

The first column used here is year range, an additional column that was added to the dataset to group the years into three broad categories: “1990–2000”, “2001–2010” and “2011–2020”. We then categorize the data by owner and, after that, by the brand of the car. The size of each box depends on the selling price.

The most recent period, 2011–2020, accounts for the largest share of sales, possibly because of the platform's rise in popularity. Within it, first owners account for the most sales, and in the next layer we can see that cars from the Maruti brand have been bought and sold the most.

The period 2001–2010 has the second-largest share of sales. Within it, second owners account for the most sales, and cars from the Maruti brand have again been bought and sold the most.

Lastly, the period 1990–2000 has the smallest share of sales. Within it, second owners account for the most sales, and cars from the Mercedes-Benz brand have been bought and sold the most.
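Because the box sizes in a tree map sized by selling price reflect total price rather than the number of cars, it can be worth checking both figures for each group. The sketch below does this with a pandas groupby; the column names and sample rows are assumptions chosen so that the two rankings differ.

```python
import pandas as pd

# Hypothetical sample; column names and rows are assumed for illustration.
OLD, MID, NEW = "1990–2000", "2001–2010", "2011–2020"
df = pd.DataFrame({
    "year_range": [NEW, NEW, NEW, MID, MID, OLD],
    "selling_price": [400000, 500000, 600000, 100000, 150000, 900000],
})

# Total selling price (what the box area shows) next to the number of listings.
summary = (
    df.groupby("year_range")["selling_price"]
      .agg(total_price="sum", listings="count")
      .sort_values("total_price", ascending=False)
)
print(summary)
```

In this made-up sample the oldest period has a single listing but ranks second by total price, which shows how box size and sales count can diverge.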
