Data Visualization: Bubble Charts

Laura E Shummon Maass
4 min read · May 1, 2019

Frozen Bubble Photo by Pixabay

There is no single best graph for displaying information visually. If you want to display stock prices over time, a line graph may be the best option. If you want to see how hundreds of home prices relate to square footage, a scatter plot would likely be optimal. If instead you want to compare the horsepower of different car models, a bar chart would display this information appropriately. However, when we have four or five different variables that we want to display simultaneously, very few data visualizations can do this better than a bubble chart.

Imagine a situation where you need to represent five separate variables in one combined visualization. For example, let’s assume you want to show the relationships between GDP per capita (1), life expectancy (2), and population (3) for different countries (4) across multiple continents (5). A standard graph might manage this, but the end result would not be as effective as the graph below. The bubble chart below combines all five variables in one concise graph: each bubble is a country, its horizontal position shows GDP per capita, its vertical position shows life expectancy, its size shows population, and its color shows the continent. We can easily see that the Americas and Europe have a higher life expectancy and larger GDP per capita than most of the countries in Asia and Africa. Within just a few seconds of studying the bubble chart, the reader can get a sense of the story behind the numbers as well as the trends across the continents.

Bubble Chart from Plot.ly (https://plot.ly/python/bubble-charts/)
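To make the five encodings concrete, it can help to write out which column drives which visual channel. The sketch below builds a Plotly-style figure description as a plain Python dictionary, with one trace per continent so that each continent gets its own color. The rows hold placeholder values, and the `build_bubble_spec` helper and the 60-pixel target diameter are illustrative assumptions, not the data or code behind the chart above.

```python
from collections import defaultdict

# Hypothetical rows with placeholder values, for illustration only:
# (country, continent, GDP per capita, life expectancy, population)
ROWS = [
    ("Country A", "Continent 1", 1200.0, 55.0, 30_000_000),
    ("Country B", "Continent 1", 2500.0, 61.0, 12_000_000),
    ("Country C", "Continent 2", 28000.0, 79.0, 60_000_000),
    ("Country D", "Continent 2", 35000.0, 81.0, 8_000_000),
]


def build_bubble_spec(rows, max_diameter_px=60):
    """Map five variables onto x, y, bubble area, hover label, and color group."""
    largest = max(row[4] for row in rows)
    # Area-based scaling rule suggested in Plotly's bubble chart documentation:
    # the largest bubble ends up roughly max_diameter_px wide.
    sizeref = 2.0 * largest / (max_diameter_px ** 2)

    # One trace per continent, so the continent is encoded by trace color.
    by_continent = defaultdict(list)
    for row in rows:
        by_continent[row[1]].append(row)

    traces = []
    for continent, group in by_continent.items():
        traces.append({
            "type": "scatter",
            "mode": "markers",
            "name": continent,                # variable 5: continent (color)
            "text": [r[0] for r in group],    # variable 4: country (hover label)
            "x": [r[2] for r in group],       # variable 1: GDP per capita
            "y": [r[3] for r in group],       # variable 2: life expectancy
            "marker": {
                "size": [r[4] for r in group],  # variable 3: population
                "sizemode": "area",
                "sizeref": sizeref,
            },
        })

    layout = {
        "xaxis": {"title": {"text": "GDP per capita"}},
        "yaxis": {"title": {"text": "Life expectancy"}},
    }
    return {"data": traces, "layout": layout}


spec = build_bubble_spec(ROWS)
```

If Plotly is installed, a dictionary shaped like this can be handed to `plotly.graph_objects.Figure(spec)` for rendering. Scaling by area rather than diameter keeps a country with twice the population from looking four times as large.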

The Problem With Bubble Charts

Although the bubble chart is ideal for displaying four to five unique variables, it does have its shortcomings. The biggest is that it takes time for the reader to interpret all of the parts of the graph. When presenting a bubble chart, you need to spend some time walking your audience through how the graph is structured before going into the story it tells. Although the chart may seem intuitive to you, an audience may be overwhelmed by the amount of information contained in the visual. Walking through each of the five variables will help keep your audience engaged and leave them better prepared to interpret the findings correctly.

Another shortcoming is that you often don’t have an easy way to label the various points. In the example above, we cannot see which country is which. If you are presenting the bubble chart in an interactive format, you can identify each country by moving your mouse over its bubble. However, this is not the most efficient way to view or present the data. If your goal is to compare differences across countries rather than across continents, you’d be better off showing this data in another format.
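For a static chart, one possible compromise is to label only a handful of bubbles and leave the rest unlabeled. The sketch below keeps the names of the largest bubbles and blanks out the others. The `pick_labels` helper, the placeholder rows, and the cutoff of two labels are hypothetical choices for illustration.

```python
def pick_labels(names, sizes, top_n=2):
    """Return one label per point: the name for the top_n largest, '' otherwise."""
    # Rank point indices from largest to smallest bubble.
    ranked = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    keep = set(ranked[:top_n])
    return [name if i in keep else "" for i, name in enumerate(names)]


# Placeholder values, not real country data.
names = ["Country A", "Country B", "Country C", "Country D"]
populations = [30_000_000, 12_000_000, 60_000_000, 8_000_000]

labels = pick_labels(names, populations)
```

In Plotly, a list like this could be passed as a trace's `text` together with `mode="markers+text"`, so that only the selected bubbles carry a visible name.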

Code for Bubble Chart from Plot.ly (https://plot.ly/python/bubble-charts/)

The last shortcoming is unique to each dataset, the variables that are chosen, and the story that is being told. The second chart uses the same data as the first. In this version, however, the x-axis and bubble size have switched positions. By showing this change I am attempting to highlight two issues that can affect the ease of visualization. In the first bubble chart, the size of the circles is not very effective at providing a quick indicator of any trends in population size. However, in that first bubble chart the color is very effective at showing the differences across continents. In the second bubble chart, the size of the circles makes it very easy to see that there is a correlation between GDP and population size, as well as at least some correlation between GDP and life expectancy (it is important to note here that the GDP in the second chart is not per capita and population is per 100k). However, with the exception of the blue dots, the color is not as useful as it was in the first graph. In both cases (the size of the circles in graph one, and the color of the circles in graph two) there is no easily distinguishable pattern, and it may be best to leave the variable with minimal pattern out completely.
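A rough way to check, before plotting, whether a numeric variable is likely to show a visible pattern as bubble size is to compute its correlation with the axis variables. This is one possible heuristic rather than the method used for the charts above, and it does not apply to a categorical variable such as continent. The `pearson` and `worth_encoding` helpers, the 0.3 threshold, and the sample values are all assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable cannot show a trend
    return cov / sqrt(var_x * var_y)


def worth_encoding(candidate, axes, threshold=0.3):
    """True if the candidate correlates with at least one axis variable."""
    return any(abs(pearson(candidate, axis)) >= threshold for axis in axes)


# Hypothetical, unitless placeholder values.
x_axis = [1.0, 2.0, 3.0, 4.0]
y_axis = [50.0, 60.0, 70.0, 80.0]
patterned = [10.0, 20.0, 30.0, 40.0]  # rises with both axes
patternless = [2.0, 1.0, 1.0, 2.0]    # no linear relation to either axis

keep_patterned = worth_encoding(patterned, [x_axis, y_axis])
keep_patternless = worth_encoding(patternless, [x_axis, y_axis])
```

A low score only flags a missing linear trend, so it is a prompt to look at the chart more closely, not a replacement for doing so.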

Bubble charts can be ideal for visualizing many variables in a single graph. However, their usability relies heavily on the data and the question being answered. There may be some situations where the trends in every variable can be effectively portrayed in a single bubble chart. However, if the trend of any one variable is being muddied by the format of the graph, it may be best to leave it out and show that variable in a separate visualization. In the end, the choice of which graph to build comes down to which one most effectively conveys the story being told.

Laura E Shummon Maass

I have worked in healthcare for 6 years analyzing data and promoting change through meaningful insights. Data is my passion, and I hope my articles inspire others!