Design for Understanding — Data-Driven Graphs

XBC30EP450
7 min read · Nov 21, 2018


For this assignment in our Human-Computer Interaction class, our group was tasked with creating two kinds of graphs. We decided to split the group into two teams, each tackling one kind of graph: one would make a stylized graph, using whatever techniques or presentation they found most persuasive, while the other designed three data-driven graphs, whose only objective was to show the data as clearly and understandably as possible instead of pushing for a certain point of view. Since the two halves of the team operated mostly independently, this review will mainly focus on the data-driven half of the project that I worked on.

The Dataset

Our brainstorming process, and some other topics we considered.

The first order of business was to choose what data to analyze, so we were pointed to several databases with extensive data to work with. That said, some of their datasets were poorly documented: the labels were cryptic, and most datasets didn’t include the more detailed explanations that would have allowed us to understand what we were seeing, let alone confidently work with them. After spotting a few candidates and brainstorming with the team, we agreed to use the “Music” dataset from the CORGIS Datasets Project website.

Now that we had data to work with, we had to decide what to make of it: which labels were we going to plot against each other? Each entry of the dataset corresponded to a song and included plenty of information, such as the artist, genre, duration, tempo, and even time signature.

The second phase of the brainstorming process

For the data-driven half of the project, we first discussed potential label combinations and which kinds of graphs could best display each. But eventually we decided to experiment with the idea of showcasing a single set of labels with several different graphs. The labels in question for all of these graphs would be each song’s Tempo and Loudness. Hotness would also become part of most of our graphs, although this was a later decision.

We chose to use Vega-Lite to build each of the graphs. Vega-Lite was easy to pick up and provided us with just enough control to build effective data-driven graphs. After all, since our graphs were not meant to have any animations or effects, losing access to these and other advanced tools does little to no harm.

The Scatter Plot

First, we made a scatter plot of the tempo of songs compared to their loudness. This yielded an idea of how the songs in our database were distributed. However, we worried that the dataset might be skewed in some way and become misleading. Therefore, we took it one step further and colored each of the points based on the song’s hotness: the more popular the song, the stronger the shade of red its point received. We tweaked the range of colors to ensure that less popular songs, while less prominent in the graph, would still be visible.
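A first cut of that scatter plot can be sketched as a Vega-Lite spec along these lines. Note that the data URL and the field names `tempo`, `loudness`, and `hotness` are assumptions here; the CORGIS dataset may label its columns differently, and the exact color endpoints we used differed from the illustrative ones below:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "music.json"},
  "mark": "point",
  "encoding": {
    "x": {"field": "tempo", "type": "quantitative", "title": "Tempo (BPM)"},
    "y": {"field": "loudness", "type": "quantitative", "title": "Loudness (dB)"},
    "color": {
      "field": "hotness",
      "type": "quantitative",
      "scale": {"range": ["mistyrose", "darkred"]}
    }
  }
}
```

The pale low end of the color range is what keeps the less popular songs visible while still letting the popular ones stand out.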

After the first day of testing, comments were mixed. While several people commented that they liked the graph, some also mentioned that the graph still felt cluttered. They also mentioned that while it was easy to see the big picture, it was difficult to focus on particular points or spot subtle details.

We addressed these issues with a single change to the colors tied to each song’s hotness. Because we had not deemphasized the less relevant songs, their sheer number caused the graph to feel saturated and confusing. Thus, we colored the points with the lowest hotness white, causing them to blend into the background. This way, the more popular songs have an easier time popping out. As a finishing touch, we made the graph bigger, which spread out the points enough to make the graph cleaner and more readable.
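In Vega-Lite terms, that revision amounts to swapping the low end of the color range and enlarging the plot. A sketch of just the properties that changed (the field name and the exact dimensions are assumptions):

```json
{
  "width": 800,
  "height": 500,
  "encoding": {
    "color": {
      "field": "hotness",
      "type": "quantitative",
      "scale": {"range": ["white", "red"]}
    }
  }
}
```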

And these changes seem to have been good calls: comments on the second day of testing were positive all around, with some specifically mentioning that the color scale was very effective at indicating hotspots. This graph was a solid success.

The final scatter plot

The Bar Chart

We then tried to see how a similar set of information would look as a bar chart. Bars were placed along the x-axis based on each song’s loudness, and their height varied with tempo. Then, each of the bars was colored based on its song’s hotness, with popular songs colored red and less popular songs colored gray. It was similar to the scatter plot above in some regards, and we wanted to see how that worked out.
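A minimal sketch of that bar chart as a Vega-Lite spec, under the same assumed field names as before:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "music.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "loudness", "type": "quantitative", "title": "Loudness (dB)"},
    "y": {"field": "tempo", "type": "quantitative", "title": "Tempo (BPM)"},
    "color": {
      "field": "hotness",
      "type": "quantitative",
      "scale": {"range": ["gray", "red"]}
    }
  }
}
```

With a quantitative x-axis and one bar per song, bars land wherever their loudness values fall, so dense regions produce the thin, overlapping bars that testers would go on to complain about.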

Test day showed that it wouldn’t work as well as we hoped. People found the overlapping bars confusing, and the gray colors of the less popular bars were hard to spot and made the graph look ugly and less effective. A couple of comments made suggestions, such as making the graph 3D or basing the coloring on something other than song hotness.

We stuck to our guns and kept the labels the same, instead tweaking the color scale once again. Experimenting with several colors, we found that a sharp black for the lower end of the scale made the graph more visually pleasing.

The second day of testing rolled in, and testers praised the graph for looking cool and providing a gist of the overall data. On the other hand, some comments also mentioned that the graph was hard to understand. While some additional changes might yield further improvement, it is also possible that this graph is not quite the right tool for the job.

The final bar chart

The Binned Bar Chart

Even during the first draft of our graphs, we saw the number of bars in the previous graph as a potential problem, which inspired the creation of this alternative version. We took the original dataset and divided it into several bins based on each song’s loudness. Then, for all the songs in each of these bins, we calculated the average tempo and hotness and used those numbers to plot the graph.
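One way to express that binning and averaging is to let Vega-Lite do both itself, via `bin` and `aggregate` in the encoding. A sketch under the same assumed field names (the bin count is illustrative, not the one we actually used):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "music.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "loudness", "bin": {"maxbins": 20}, "type": "quantitative"},
    "y": {"field": "tempo", "aggregate": "mean", "type": "quantitative"},
    "color": {"field": "hotness", "aggregate": "mean", "type": "quantitative"}
  }
}
```

Because each bar now summarizes a whole bin rather than a single song, the chart sidesteps the overlap problem of the previous version entirely.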

During testing, feedback was positive in general, with a couple of comments saying that the graph was easy to read and to see what is popular. Because of the graph’s simplicity and the overall well reception, this graph underwent little to no changes.

The Box Plot

We also tried to display our original dataset using a box plot, following most of the conventions set by the other three graphs.

Our first version of the graph was harshly criticized for being very hard to read. This was mostly attributed to its overall gray tone, caused by our misuse of the graph’s color scale, and to the plentiful, tiny bars representing each of the songs, which were difficult to tell apart.

We decided to improve the graph by dropping the hotness axis completely, instead using color to make the graph easier to read. The graph was also made considerably bigger, in an effort to make the bars easier to tell apart.
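Vega-Lite offers a composite `boxplot` mark for exactly this kind of chart. The sketch below assumes we grouped songs into loudness bins and drew one box of tempo values per bin, with color redundantly encoding the bin; the field names and the grouping choice are assumptions, not a record of our exact spec:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "music.json"},
  "width": 700,
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "loudness", "bin": true, "type": "quantitative"},
    "y": {"field": "tempo", "type": "quantitative"},
    "color": {"field": "loudness", "bin": true, "type": "quantitative", "legend": null}
  }
}
```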

While these efforts certainly helped readability, especially the colors, testers still complained that the graph was hard to read or even get a gist of.

The final box plot

Conclusion

If there is one overall conclusion I can draw from the design process of these graphs, it is that less is often more. Both the original bar chart and the box plot were criticized because the overabundance of bars made it hard to focus on any single one. If, on top of that, the overall shape of the graph does not show an obvious pattern, as with the box plot, it becomes very hard to draw any conclusions at all.

But then we look over to the other two graphs. Unlike its cousin, the alternative bar chart was praised because it was easy to read and draw conclusions from, all thanks to using a handful of averages instead of every individual data point. And the scatter plot was vastly improved when we altered the color scheme to make less relevant data points essentially transparent, reducing the overall density.

People can only keep track of so many things. Graphs are meant to condense or simplify the information in some way. If your graph fails to do that in some meaningful way, it might wind up being harder to read than the spreadsheet it pulls its data from.
