Be A Data Maestro, Part IV

Data Visualization: Screaming Trees

Pie and Donut Analytics
Away From Towards Data Science
5 min read · Oct 11, 2021

Photo source: uncut.co.uk

Welcome to the last article in my Medium series about doing a data analysis project on music preferences. I know I said that back in Part III, but this time I mean it! In that last installment, I presented the results of my analysis in what *I* thought was a super fun and innovative way. That being said, my previous way of conveying the fruits of my labor was not visually oriented whatsoever, apart from some pics of certain music artists in my dataset. Now I can show you another way to display information, one that is more appealing to the aesthete who resides in all of us.

A very short primer on treemaps

I will not launch into a boring mini academic lecture since you can easily Wikipedia that stuff if you are really interested in learning all the ins and outs of this topic. Rather, I will explain why a treemap is an ideal way to pictorially represent a particular dimension of my data that I wanted some insights on.

Briefly, you process a treemap the same way you would read any other document — from top left to the bottom right corner. The bigger the component rectangle (i.e. the “leaf”), the larger the volume of that leaf. Within the rectangle, you can further encode information by having different color-coded categories, again sized in proportion to their volume. The nice thing about this type of chart is the efficient and elegant use of space it makes as opposed to other types of graphs. Say I wanted to do a bar graph instead since most people are more familiar with that type of chart. I would have to have the main bar for each ‘leaf,’ and it would have to be either stacked bars or individual bars for each category grouped together for each leaf. AND I would still need to add a legend and axis labels to the page as well. Yikes!
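For the curious, here is a minimal sketch of the sizing logic described above. It is not how MicroStrategy (or any other BI tool) builds its treemaps; it is the simplest "slice-and-dice" layout I could write in plain Python, and the artists, generations, and counts in FANS are made-up placeholders rather than my actual dataset. Each artist gets a leaf whose area is proportional to its total volume, and each leaf is then split into one strip per generation, again in proportion to volume.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)

# Hypothetical counts of fans per artist and generation (NOT the real data).
FANS: Dict[str, Dict[str, float]] = {
    "Artist A": {"Gen X": 4, "Millennial": 3, "Gen Z": 1},
    "Artist B": {"Baby Boomer": 3, "Gen X": 1},
    "Artist C": {"Gen Z": 2},
}


def slice_layout(values: Dict[str, float], rect: Rect, vertical: bool) -> Dict[str, Rect]:
    """Cut rect into strips whose areas are proportional to the values."""
    x, y, w, h = rect
    total = sum(values.values())
    strips: Dict[str, Rect] = {}
    offset = 0.0
    # Largest value first, so the biggest strip sits at the top or left.
    for name, value in sorted(values.items(), key=lambda kv: -kv[1]):
        share = value / total
        if vertical:  # side-by-side strips, left to right
            strips[name] = (x + offset, y, w * share, h)
            offset += w * share
        else:  # stacked strips, top to bottom
            strips[name] = (x, y + offset, w, h * share)
            offset += h * share
    return strips


def treemap(data: Dict[str, Dict[str, float]], width: float = 100.0, height: float = 60.0):
    """Return {artist: (leaf_rect, {generation: sub_rect})}."""
    leaf_totals = {artist: sum(gens.values()) for artist, gens in data.items()}
    leaves = slice_layout(leaf_totals, (0.0, 0.0, width, height), vertical=True)
    return {
        artist: (leaves[artist], slice_layout(data[artist], leaves[artist], vertical=False))
        for artist in data
    }


layout = treemap(FANS)
for artist, (leaf, parts) in layout.items():
    print(artist, [round(v, 1) for v in leaf])
    for generation, sub in parts.items():
        print("   ", generation, [round(v, 1) for v in sub])
```

Real tools generally use smarter layouts that keep the rectangles closer to squares, but the idea of "area equals volume" is the same.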

Does anything jump out at you?

I can IMMEDIATELY see that Taylor Swift has the widest generational fanbase just by seeing that her leaf is the most colorful. Other things that are made obvious to me: Baby Boomers and Generation X, by the sheer fact that they have been alive the longest out of the five generations represented in my dataset and thus have had tons more exposure to great music throughout the decades, are putting on the treemap a whole lotta magenta and teal leaves. I can clearly see that the Gen Z kids have only 3 blue rectangles, while the toddler and elementary school-aged Gen Alpha has a solitary pink rectangle.
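If you would rather double-check the "most colorful leaf" reading with numbers than with your eyeballs, a rough sketch like the one below would do it. The RESPONSES list is a hypothetical stand-in with placeholder artist names (my real dataset is not reproduced here), and the function names are my own inventions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical (artist, generation) survey rows; placeholders, not real results.
RESPONSES = [
    ("Artist A", "Baby Boomer"),
    ("Artist A", "Gen X"),
    ("Artist A", "Gen Z"),
    ("Artist A", "Gen X"),  # a repeat generation does not add a new color
    ("Artist B", "Gen X"),
    ("Artist B", "Millennial"),
    ("Artist C", "Gen Alpha"),
]


def generations_per_artist(responses: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count how many distinct generations picked each artist."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for artist, generation in responses:
        seen[artist].add(generation)
    return {artist: len(gens) for artist, gens in seen.items()}


def widest_fanbase(responses: Iterable[Tuple[str, str]]) -> str:
    """Return the artist with the most distinct generations (first one wins ties)."""
    counts = generations_per_artist(responses)
    return max(counts, key=counts.get)


counts = generations_per_artist(RESPONSES)
print(counts)
print(widest_fanbase(RESPONSES))
```

The number of distinct generations per artist is exactly the number of colors in that artist's leaf, so the artist with the highest count is the one with the most colorful rectangle.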

Some possible criticisms of this data viz approach and my rebuttals

  1. These conclusions are SO OBVIOUS AND INTUITIVE. Why did you even bother doing this? OK, here’s the thing with data science projects, whether self-imposed like this one or, more realistically, assigned to you by your boss. A lot of the time you will already have a priori thoughts and anecdotal observations from the get-go; you don’t need a data scientist for that. HOWEVER, what a data scientist IS needed for is to execute the scientific method of actually confirming or refuting your initial ideas with good old-fashioned, hardcore hypothesis testing. If your workplace is more chill and laid back, then fair enough; maybe don’t go through the trouble. But just be warned that many people will not take your company seriously if it makes business decisions mainly using gut feelings rather than careful evidence-based data analysis.
  2. I remember you had other demographic variables like race, geographical region, etc. Why didn’t you put those into your treemap? I want it ALL and I want it in a single dashboard, goshdarnit! That’s what I’m paying you for, bub! I can just see many of you out there who do data analytics work going “OMG, you must be reading my mind, sistah!” Obviously, I am satirizing how difficult and unrealistic some clients can be. But tbh, this is not so far from the cold hard truth.

Please, data requesters, we cannot be all things to all people. We cannot satisfy all theoretically possible stakeholders and anticipate each one’s needs all in one report. Even if we wanted to, the resulting dashboard would become so unwieldy as to be utterly useless. “But we have multiple SMEs, and they all want their own neat and tidy report,” says the project manager. Well, figure out who the key decision-makers are, and let’s just deal with THEM, rather than making everyone wait a whole month for 20 people’s calendars to all be clear at the same time so we can have a fun paralysis-by-analysis requirements gathering meeting taking up our lunch hour.

Final shout-out and thoughts

I used MicroStrategy to make the insights pop out of the data, but there is a plethora of other software, both commercially available and open-source, that you can use to make a treemap or any other type of data visualization. Tableau is widely regarded as the industry standard, but not all companies have it because the licenses are expensive. It also has quite a steep learning curve, especially if you have not worked with other business intelligence tools. Some other options out there are Alteryx, Cognos, Power BI, QlikView, and Looker. The great thing is that once you know how to use one tool, you can transfer your data viz skills to another fairly easily, trust me.

Two closing comments to wrap up my Be A Data Maestro series: (1) Collaboration and sharing of best practices are the keys to being a truly great data scientist! No lone wolf, academic ivory tower types need apply, IMHO. (2) It is inevitable though that you WILL run into such characters throughout your data analytics career. But don’t let these buggers get you down! Soldier through it and do like Tay-Tay says: Shake It Off!
