Creating a dumbbell plot with Plotly Python

4 min read · Oct 12, 2021

Show group differences without a bar graph

Dumbbell plots, like bar charts, can be used to compare two groups (e.g., the proportions of men and women). One advantage of a dumbbell plot is that it is more compact, which is useful when comparing the two groups across many categories (e.g., the proportions of men and women in different occupations). Unlike a bar chart, it also puts the emphasis on the difference between the two groups rather than on their absolute values, so it is a good choice when that difference is what you want readers to notice. In this article, I’ll show you how to make a dumbbell plot with a Python library called Plotly [1].

Dataset

We’ll be working with the results of the Philippines’ 2016 vice presidential election. The raw data consists of the total number of votes for the six candidates at the precinct level [2]. Robredo won this election with 14,418,817 votes, but it was a close battle with Marcos, who garnered 14,155,344. Using a dumbbell plot, we’ll examine how Filipinos in the different regions of the Philippines, and those who were abroad at the time (i.e., overseas voters), voted for these two candidates.

I won’t delve into the specifics of the preprocessing I performed, but basically, I calculated the proportion of voters who voted for Marcos and for Robredo in each region. See Figure 1 for an example of the table format that I created. Each row in the table holds the absolute number and the percentage of votes that one candidate (Robredo or Marcos) received in one region.
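As a rough sketch of what that preprocessing could look like with pandas: the output column names (region, candidate, n_voters, pct) match the plotting code used in this article, but the precinct-level column names and every number below are made-up placeholders, not values from the dataset. Dividing by the votes for all candidates in a region, rather than only these two, is also an assumption.

```python
import pandas as pd

# Toy precinct-level rows. Column names and numbers are placeholders,
# not taken from the real dataset.
precincts = pd.DataFrame({
    "region": ["A"] * 6 + ["B"] * 6,
    "precinct": [1, 1, 1, 2, 2, 2] * 2,
    "candidate": ["Marcos", "Robredo", "Other"] * 4,
    "votes": [100, 300, 100, 200, 200, 100,
              250, 150, 100, 200, 200, 100],
})

# Total votes per region and candidate (summed over precincts).
totals = (
    precincts.groupby(["region", "candidate"], as_index=False)["votes"]
    .sum()
    .rename(columns={"votes": "n_voters"})
)

# Percentage of each region's votes that went to each candidate
# (assumption: the denominator includes all candidates).
region_total = totals.groupby("region")["n_voters"].transform("sum")
totals["pct"] = 100 * totals["n_voters"] / region_total

# Keep only the two candidates being compared.
df_region = totals[totals["candidate"].isin(["Marcos", "Robredo"])]
df_region = df_region.reset_index(drop=True)
print(df_region)
```

The result is in long format, with one row per region and candidate, which is the shape that px.scatter expects when the candidate is mapped to color.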

Data Visualization

To create a dumbbell plot with Plotly, the first thing to do is to create a normal scatter or dot plot (see Figure 2). Here, the x-axis represents the percentage of individuals who voted for a particular candidate, the y-axis represents the region, and the marker colors represent the specific candidate.

import plotly.express as px

fig = px.scatter(df_region, x="pct", y="region", color="candidate")
fig.show()

The next step is to add a line connecting the two markers of each region. We will use Plotly’s add_shape method to do that (see Figure 3).

import plotly.express as px

fig = px.scatter(df_region, x="pct", y="region", color="candidate")

# iterate over each region
for i in df_region["region"].unique():
    # keep only this region's two rows (one per candidate)
    df_sub = df_region[df_region["region"] == i]

    fig.add_shape(
        type="line",
        layer="below",
        # connect the two markers; both ends sit on the same region (y)
        # first candidate's marker, e.g., Robredo at x0=43.53
        y0=df_sub.region.values[0], x0=df_sub.pct.values[0],
        # second candidate's marker, e.g., Marcos at x1=26.60
        y1=df_sub.region.values[1], x1=df_sub.pct.values[1],
    )

fig.show()

Essentially, Figure 3 is already a dumbbell plot. But to make the plot easier to understand, we can also sort the dumbbells of the regions from highest to lowest based on the percentage difference between Robredo and Marcos. By doing this, we can easily compare the relative differences between the regions. Furthermore, because the number of voters varies by region, we can have the marker sizes show the absolute number of voters (see Figure 4).

import plotly.express as px

# regions ordered by the percentage difference between the two candidates
region_order = ["I", "II", "V", "VI", "CAR", "NIR", "VII", "OAV", "IV-B",
                "XIII", "NCR", "III", "ARMM", "X", "IX", "XI", "XII",
                "VIII", "IV-A"]

fig = px.scatter(df_region, x="pct", y="region", color="candidate",
                 size="n_voters", category_orders={"region": region_order})

# iterate over each region
for i in df_region["region"].unique():
    # keep only this region's two rows (one per candidate)
    df_sub = df_region[df_region["region"] == i]

    fig.add_shape(
        type="line",
        layer="below",
        # connect the two markers; both ends sit on the same region (y)
        # first candidate's marker, e.g., Robredo at x0=43.53
        y0=df_sub.region.values[0], x0=df_sub.pct.values[0],
        # second candidate's marker, e.g., Marcos at x1=26.60
        y1=df_sub.region.values[1], x1=df_sub.pct.values[1],
    )

fig.show()

Figure 4. Dumbbell plot wherein the ends of the dumbbells are scaled according to the absolute number of voters.
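The region order passed to category_orders above is typed out by hand. A minimal sketch of how such an order could be computed instead is shown below. It assumes the sort key is the absolute gap between the two candidates’ percentages, and it uses a toy data frame with placeholder numbers in place of df_region.

```python
import pandas as pd

# Toy stand-in for df_region; the numbers are placeholders.
df_region = pd.DataFrame({
    "region": ["A", "A", "B", "B", "C", "C"],
    "candidate": ["Marcos", "Robredo"] * 3,
    "pct": [30.0, 50.0, 60.0, 15.0, 40.0, 38.0],
})

# One row per region, one column per candidate.
wide = df_region.pivot(index="region", columns="candidate", values="pct")

# Absolute gap between the two candidates, largest first
# (assumption: this is the intended sort key).
gap = (wide["Robredo"] - wide["Marcos"]).abs().sort_values(ascending=False)

region_order = gap.index.tolist()
print(region_order)
```

The resulting list can then be passed as category_orders={"region": region_order}, so the order stays in sync with the data if the numbers change.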

Now that we have a dumbbell plot that is sorted and scaled, we can easily identify the differences across the regions. In general, in areas where Marcos received a large number of votes, Robredo received a smaller number of votes, and vice versa. Both Marcos and Robredo also had their own strong supporters in various regions of the country. For Marcos, they were Regions I and II, while for Robredo they were Regions V and VI. We can also see which regions, such as Regions VIII and IV-A, had tight races during the 2016 election.

Finishing Touches

I changed some of the graph’s design aspects to make the plot more aesthetically appealing. For the marker colors, I used the colors that each candidate adopted in their more recent election campaigns. I also added annotations to the final graph to help convey the findings more effectively. The final graph is shown below.

Comparison of Marcos’ and Robredo’s regional vote percentages in the 2016 Philippine vice presidential election.

References:

[1] Python library: https://plotly.com/python/

[2] Dataset: https://figshare.com/articles/dataset/2016_Philippine_vice-presidential_elections_precinct-level_data/3380116/1

Written by Xavier Eugenio Asuncion