Visualizing Categorical Data with Disproportionate Values Using Du Bois Wrapped Bar Charts

Figure 1: Two examples of bar charts drawn by Du Bois. In these charts, the datasets included values that were disproportionately larger than other values, and thus Du Bois applied this visualization technique to them. These two images (courtesy of Library of Congress Prints and Photographs Division) are from “The Georgia Negro: A social study,” part of the “African American Photographs Assembled for the 1900 Paris Exposition” exhibition.

This post is adapted from our ACM CHI 2020 article, which was written in collaboration with my co-authors Ryan Wesslen, Isaac Cho, and Wenwen Dou.

TLDR: Du Bois Wrapped Bar Charts limit the y-axis and wrap the largest bar’s values along the categorical axis. Compared to standard bar charts, they make finding and reading the smallest values in a dataset more accurate. They also make comparing the largest and smaller values more accurate but require more time on the user’s part to complete this task.

In the year 1900, the number of African American students taking various courses of study in Georgia schools was highly disproportionate, with 2252 students taking industrial courses and only 12 taking business courses. This discrepancy translated into the occupations of African Americans in Georgia, where the number of agricultural workers overwhelmed other occupations. In that same year, as part of an exhibition in Paris, W. E. B. Du Bois used photography and data visualization to highlight the life of African Americans in the United States at the turn of the century. Facing some extreme data characteristics, Du Bois created new visualization techniques, including a modified version of bar charts. By setting a threshold for the numerical axis of his bar charts, he would wrap the largest values along the categorical axis, making the smallest values more visible (Figure 1).

Looking at visualizations from sources such as The New York Times, The Wall Street Journal, and FiveThirtyEight, we can find many examples with large differences in numeric values across categories that might benefit from a technique we call Du Bois Wrapped Bar Charts. It is important to note that a similar term, “Wrapped Bar Graph,” was previously used to describe a technique that splits sorted bars in a horizontal bar chart into multiple columns to eliminate the need for scrolling and save space. For the rest of this article, we will use Wrapped Bar Charts to refer to the Du Bois Wrapped Bar Chart technique. So let’s see how we can create Wrapped Bar Charts and interpret them.

How to create and interpret Wrapped Bar Charts?

Creating a Wrapped Bar Chart (Figure 1) requires defining just one parameter: a threshold for the y-axis (numerical axis). Any bar with a value larger than the threshold will wrap along the categorical axis. For the wrapped bars, we can estimate the value by counting the number of wraps and multiplying the count by the assigned threshold. So, imagine a dataset with the largest value of 8500. If we set the threshold to 1000, we would create a bar that wraps 8.5 times. We can interpret the value of that bar by multiplying 8.5 by 1000. Any bar with a value smaller than 1000 would be rendered as in a standard bar chart.
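The wrapping arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the code behind our figures; the function names `wrap_segments` and `estimate_value` are hypothetical and chosen for this example.

```python
def wrap_segments(value, threshold):
    """Split a bar's value into segment lengths, each at most `threshold`.

    Hypothetical helper: a bar below the threshold yields a single segment,
    a larger bar yields several full segments plus a shorter tail.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if value < 0:
        raise ValueError("value must be non-negative")
    full, remainder = divmod(value, threshold)
    segments = [threshold] * int(full)
    # Keep the tail if there is one, or a single (possibly zero) segment
    # when the value never reaches the threshold.
    if remainder > 0 or not segments:
        segments.append(remainder)
    return segments


def estimate_value(wraps, threshold):
    """Read a wrapped bar back: number of wraps times the threshold."""
    return wraps * threshold


segments = wrap_segments(8500, 1000)
wraps = sum(segments) / 1000          # 8 full segments plus half a segment
print(wraps, estimate_value(wraps, 1000))
```

For the example in the text, the bar is drawn as eight full segments and one half-length tail, and reading it back gives 8.5 × 1000 = 8500.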

Do Du Bois Wrapped Bar Charts benefit users in any way? Let’s take a look at a timely example. The total number of COVID-19 cases varies widely across countries. A bar chart of countries by their total cases of COVID-19 would look like this:

Figure 2: A standard bar chart showing the total number of COVID-19 cases by country.

In this chart, we can easily find countries with the largest values: the USA, Italy, and Spain. And it is relatively easy to estimate their values. However, when it comes to the smallest values, it is pretty difficult to differentiate the values between Mexico, Egypt, and Argentina.

Now let’s use the same dataset to create a Du Bois Wrapped Bar Chart. We can do that by defining a threshold for the numerical axis. In this example, we set the threshold on the y-axis to 30,000. Any bar with a value larger than 30,000 will be wrapped:

Figure 3: A Wrapped Bar Chart showing the total number of COVID-19 cases by country.

Looking at the Wrapped Bar Chart (Figure 3), we can see that the bar with the most wraps is the USA, and we can estimate its value by multiplying the number of wraps (8 and a little bit) by the threshold of 30,000. It takes a bit more time to estimate the value, but we can tell that the US has around 241,000 total cases. Now let’s take a look at the smallest values. Because the y-axis covers a smaller range, the smaller values are rendered larger than in a standard bar chart. We can now see that, of the countries in this chart, the one with the smallest number of total COVID-19 cases is Egypt, with a value of around 1,000. Argentina and Mexico both have slightly larger counts.

The following animation shows a standard bar chart being transformed into a Wrapped Bar Chart by slowly reducing the threshold on the y-axis. It can help us see how differentiating the smallest values becomes easier as the y-axis threshold is lowered:

Figure 4: Animation showing how decreasing the threshold on the numerical axis wraps the bars larger than that value.

How does a wrapped bar chart benefit users?

We conducted two controlled experiments and a focus group to compare Wrapped and Standard bar charts and understand how users experience Du Bois’ visualizations. Here we summarize the findings. For more details, please feel free to refer to our upcoming ACM CHI 2020 article.

In the first experiment, we asked whether users perform specific tasks better with Wrapped Bar Charts than with standard bar charts. The tasks were to identify the largest value, the smallest value, and the ratio between the largest and smallest value in two real-world datasets. We randomly assigned users to two groups. Both groups saw the same two datasets, one group through Wrapped Bar Charts and the other through standard bar charts. From 98 users, we found that:

  1. When asked to identify the smallest values, users are more accurate using the Wrapped Bar Charts.
  2. When asked to calculate the ratio between the largest and smallest values, users are more accurate using Wrapped Bar Charts.
  3. But we also found that users take more time to calculate the ratio between the largest and smallest values when using Wrapped Bar Charts.

So it seems that Wrapped Bar Charts provide better accuracy in dealing with disproportionate values, but at the expense of more time. But are they equally useful with all datasets? Let’s find out!

What data characteristics make Wrapped Bar Charts more useful?

In the second experiment, our research question was: what data characteristics make Wrapped Bar Charts more useful for users? In the previous study, we established that users could benefit from Wrapped Bar Charts when the datasets being visualized have categories with disproportionately large values. Here we used two different metrics, Normalized-Entropy and H-Spread, to measure the extent of this characteristic in datasets:

Normalized-Entropy essentially shows us how much of all the values in a categorical dataset are concentrated in only a few categories. A dataset with all values equal to each other will have a Normalized-Entropy of 1, and a dataset with one value overwhelming every other value will get closer to 0.
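As a rough sketch of how such a score can be computed, the snippet below assumes Normalized-Entropy is the Shannon entropy of each category's share of the total, divided by the logarithm of the number of categories. This matches the behavior described above (1 for equal values, close to 0 when one value dominates), but the function name and this exact formulation are our assumptions for illustration.

```python
import math


def normalized_entropy(values):
    """Shannon entropy of the value shares, scaled to the range [0, 1].

    Assumed formulation: -sum(p * log(p)) / log(n), where p is each
    category's share of the total and n is the number of categories.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two categories")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("values must not all be zero")
    # Categories with a zero share contribute nothing to the entropy.
    shares = [v / total for v in values if v > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(n)


print(normalized_entropy([5, 5, 5, 5]))        # equal values: close to 1
print(normalized_entropy([1000, 1, 1, 1]))     # one dominant value: close to 0
```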

H-Spread is based on the same idea used to identify outliers in box plots and was originally described by John Tukey. We use it to measure how much larger a certain value in a dataset is than every other value. H-Spread is not a bounded measure, so it can be interpreted as the distance of the largest value from all other values. As Tukey would put it, a large H-Spread means that a value is “far out” from most of the values in the dataset.
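To make this more concrete, here is one possible way to compute such a score. Tukey's H-spread is the distance between the lower and upper hinges (roughly the interquartile range); this post does not spell out how that is turned into a score for the largest value, so the sketch below assumes the score is the distance of the maximum above the upper quartile, expressed in units of the H-spread. The function name `h_spread_score`, the use of quartiles in place of hinges, and the handling of a zero spread are all assumptions made for this example.

```python
import math
import statistics


def h_spread_score(values):
    """Distance of the largest value above the upper quartile, in H-spreads.

    Assumed formulation: (max - Q3) / (Q3 - Q1), with quartiles standing in
    for Tukey's hinges.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    largest = max(values)
    if spread == 0:
        # All middle values are identical: any larger maximum is infinitely
        # "far out" under this definition.
        return math.inf if largest > q3 else 0.0
    return (largest - q3) / spread


print(h_spread_score([1, 2, 3, 4, 5]))        # no value stands out
print(h_spread_score([1, 2, 3, 4, 5, 100]))   # 100 is far out
```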

Even though these two measures are related, they capture slightly different information, so using them together helps us find the characteristics of datasets that would benefit from Wrapped Bar Charts.

Figure 5: Examples of datasets with high Normalized-Entropy and low H-Spread (top) and low Normalized-Entropy and high H-Spread (bottom).

In the second study, each user went through all 13 datasets twice, once with Wrapped Bar Charts and once with standard bar charts. As in the previous study, we asked users to find the smallest and largest values, then calculate the ratio between them. We found that users’ accuracy with Wrapped Bar Charts, both in identifying the smallest bars and in calculating ratios between the largest and smallest values, increases with lower Normalized Entropy and higher H-Spread (datasets with characteristics like the bottom half of Figure 5).

Figure 6: 13 simulated datasets with different data characteristics as measured by H-Spread and Normalized Entropy. These datasets were shown to users using both Wrapped and Standard bar charts.

Insights From Users

How were users’ individual experiences with Wrapped Bar Charts? We conducted a focus group study and had a very interesting discussion with users after they all went through the questions in the previous study. We found that users generally had a clear understanding of when Wrapped bar charts would be useful. For example, one of the users said:

“[Wrapped bar charts are useful] when it is hard to read minimums and when the minimums are crucial.”

We also found that users had some frustrations with these types of charts. The majority of our users mentioned that when there are too many wraps, the largest values become hard and cumbersome to estimate:

“wrapped chart is easier if there are only a couple of wraps. but if the number of wraps increases, it gets harder.”

We also received an interesting design suggestion for improving Wrapped Bar Charts. Since the direction of the tail of the wrapped bars could move from top to bottom and from bottom to top, users suggested that it would be useful to have a reverse axis on the right side of the chart to help with reading the tail end values:

“… You need to do additional subtractions when the bar is coming downwards.”
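The subtraction this user describes can be illustrated with a small sketch. It assumes that the first segment of a wrapped bar is drawn from bottom to top and that each following segment alternates direction; the function names are hypothetical and only serve this example.

```python
def tail_end_position(value, threshold):
    """Return the tail's direction and where it ends on the regular axis.

    Assumption: segment 0 goes up, segment 1 goes down, and so on.
    """
    full, remainder = divmod(value, threshold)
    going_up = int(full) % 2 == 0
    # A downward tail starts at the threshold and ends `remainder` below it.
    position = remainder if going_up else threshold - remainder
    return ("up" if going_up else "down"), position


def value_from_reading(full_segments, position, threshold):
    """Recover a bar's value from the wrap count and the axis reading."""
    going_up = full_segments % 2 == 0
    # For a downward tail, the reader has to subtract from the threshold.
    tail = position if going_up else threshold - position
    return full_segments * threshold + tail


direction, position = tail_end_position(7300, 1000)
print(direction, position)                      # tail points down, ends at 700
print(value_from_reading(7, position, 1000))    # 7 * 1000 + (1000 - 700)
```

A reversed axis on the right side of the chart would let readers take the tail length (300 in this example) directly instead of computing 1000 - 700.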

This focus group study helped us get some insight into how users interpret Wrapped Bar Charts and identify ways that we could potentially improve them.

Questions we still need to ask

From our studies, we have a few future directions to further improve Wrapped Bar Charts:

  1. “Too many wraps” might make reading the largest values increasingly difficult. In follow-up studies, we need to test to what extent increasing the number of wraps (in other words, choosing smaller thresholds for wrapping) negatively affects users. How would we make Wrapped Bar Charts adaptable for these cases?
  2. What should we do when the tail end of a wrapped bar is facing the opposite direction of the numerical axis? We are thinking of creating an extra axis to help with reading the values of the tail ends. How would that affect users’ experience with these charts?
  3. How about considering Wrapped Bar Charts as an interaction technique? We could use a slider to allow users to change the threshold and wrap and unwrap bars.
  4. There is another parameter that could be added to Wrapped Bar Charts. In the charts we studied, bars always wrapped at the upper threshold and at 0. The new parameter defines the lower position at which bars wrap (Figure 7). We think it looks really interesting, but we haven’t tested it with users. How would introducing this parameter influence users’ performance?

Figure 7: Modifying the lower threshold in a Wrapped Bar Chart.

What do you think?

We are working to implement, test, and improve these changes to Du Bois’ Wrapped Bar Charts. We have made an interactive version of Du Bois Wrapped Bar Charts (thanks to Raihan for help with this), which already includes some of the changes discussed in this article. You can upload a CSV file and visualize your dataset. The app shows the Normalized Entropy and H-Spread scores of your dataset. As a rule of thumb, we think that datasets with a Normalized Entropy of less than 0.75 or an H-Spread of more than 4.5 might benefit from being visualized using Wrapped Bar Charts. Using sliders, you can modify the threshold for the bar chart and see how that changes the visualization outcome.
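The rule of thumb above can be written as a simple check. This is only a sketch: the function name is hypothetical, the two scores are assumed to have been computed elsewhere (for example, read from the app), and the cutoffs are the tentative values mentioned above rather than hard limits.

```python
def might_benefit_from_wrapping(normalized_entropy, h_spread,
                                entropy_cutoff=0.75, h_spread_cutoff=4.5):
    """Apply the rule of thumb: low Normalized Entropy or high H-Spread."""
    return normalized_entropy < entropy_cutoff or h_spread > h_spread_cutoff


print(might_benefit_from_wrapping(0.60, 2.0))   # low entropy
print(might_benefit_from_wrapping(0.90, 6.0))   # high H-Spread
print(might_benefit_from_wrapping(0.90, 2.0))   # neither condition holds
```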

Alireza Karduni
Multiple Views: Visualization Research Explained

I’m a stationary traveler. Also, a designer, computational social scientist and information visualization researcher. https://www.karduni.com