4 steps to make a dumbbell chart with Python Plotly

Tianmin
8 min read · Feb 14, 2023


What is a dumbbell chart?

Dumbbell charts use circles and lines to show changes over time. This type of chart is ideal for illustrating and comparing change between groups. Specifically, the circles represent the starting and ending time periods, with a line between them, resembling a dumbbell. source

From my perspective, a dumbbell chart is useful for more than comparing changes over time. Because it emphasizes the difference between two measures we are interested in, it can be applied quite broadly. For example, this Gender Pay Gap visualization uses a dumbbell chart to show the difference in pay by gender, and also shows what variations of the chart could look like.

The dumbbell chart is very useful because it allows you to compare two points in a series that are on the same axis. So, you can compare two points from different times, different currency values, different test scores … really any data that has two points that use the same scale on the axis. source

In this article, I’m going to show you how to build a dumbbell chart with Python Plotly in small steps.

Photo by Rubaitul Azad on Unsplash

First of all, let us prepare a dataset for building a dumbbell chart

Here we use the classic Kaggle Titanic survivor dataset to compare the survival rate between female and male passengers by ticket class. To build a dumbbell chart, we need a dataset in which each row represents one measure, i.e. the survival rate of either female or male passengers in a given ticket class.

Photo by Edwin Petrus on Unsplash
import pandas as pd

# load dataset
data = pd.read_csv("/kaggle/input/titanic/train.csv")

# first five rows
data.head()
First five rows of the Kaggle Titanic Survivor dataset
# number of passengers per sex, pclass and survived status
aggregated = data.groupby(['Sex','Pclass','Survived']).count().reset_index()[['Sex','Pclass','Survived','PassengerId']]

# to get the survival rate for each category
aggregated['sum'] = aggregated.groupby(by=['Sex', 'Pclass'])['PassengerId'].transform('sum')
aggregated['share'] = (aggregated['PassengerId'] * 100 / aggregated['sum']).round(0)

# We only care about the rows where passengers survived in this case
# .copy() avoids pandas' SettingWithCopyWarning when we modify df below
df = aggregated.loc[aggregated['Survived'] == 1].copy()

# Transform the Pclass value into a more readable format
def pclass_label(value):
    if value == 1:
        return '1st class'
    elif value == 2:
        return '2nd class'
    elif value == 3:
        return '3rd class'

df['Pclass'] = df['Pclass'].map(pclass_label)

# Okay, we have the aggregated dataset ready
df[['Sex','Pclass','share']]
Data is ready for building a dumbbell chart
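Because Survived is coded as 0 or 1, the mean of that column within each group is already the survival rate, so the same kind of table can probably be reached more directly. The sketch below is only an illustration: `toy` is a tiny made-up DataFrame (not the real Kaggle file) that reuses the three column names, so the numbers mean nothing. One difference to keep in mind is that a group with no survivors shows up here as 0, whereas the filter on Survived == 1 above would drop it.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic file
toy = pd.DataFrame({
    "Sex":      ["female", "female", "male", "male", "male", "female"],
    "Pclass":   [1, 1, 1, 1, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 1],
})

# The mean of a 0/1 column is the share of ones, i.e. the survival rate
rates = (
    toy.groupby(["Sex", "Pclass"])["Survived"]
    .mean()
    .mul(100)
    .round(0)
    .reset_index(name="share")
)

# Map the numeric class to a readable label
rates["Pclass"] = rates["Pclass"].map({1: "1st class", 2: "2nd class", 3: "3rd class"})
print(rates)
```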

Then, we draw the chart in small steps.

Here we take the example from economist.com as a reference.

On a high level,

  • On the y axis, we list the ticket classes from top to bottom. On the x axis, we display values from 0% to 100%, in steps of 20 percentage points.
  • Plotly can draw each data point as a vertical line marker, and between each pair of data points we draw a line, so that a dumbbell is made!
  • To make the chart more condensed, I replaced the legend entries, ‘Male’ and ‘Female’, with annotations inside the chart. Each annotation shares the color of the markers it labels, which makes it easy for readers to grasp.

Step 1 — Let us create the lines, x axis and y axis for the dumbbell chart.

  • To make the y axis more readable, I would like to put 1st class at the top, 2nd class in between, and 3rd class at the bottom.
pclass = list(df['Pclass'].unique())
# reverse the order of the python list elements
pclass.reverse()
  • Then we loop through the y axis values, and attach the share value by sex to each pclass, i.e. ticket class.
# note: this reuses the name `data`, replacing the DataFrame loaded earlier
data = {"line_x": [], "line_y": [], "male": [], "female": [], "colors": [], "Sex": [], "pclass": []}

for p in pclass:
    data["female"].extend([df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0]])
    data["male"].extend([df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0]])
    data["line_x"].extend(
        [
            df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0],
            df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0],
            None,  # What is the purpose of this None?
        ]
    )
    data["line_y"].extend([
        p,
        p,
        None,  # What is the purpose of this None?
    ])

# take a look at the variable data
data
The data variable that will be used to create the visualization object
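The loop above looks up the same share several times with df.loc. As a possible alternative, the table can be reshaped once so that each ticket class is a row and each sex is a column. This is only a sketch: `toy` is a hypothetical frame shaped like df, its values are placeholders, and `wide` is a name introduced here for illustration.

```python
import pandas as pd

# Placeholder values, shaped like the aggregated df above
toy = pd.DataFrame({
    "Sex":    ["female", "female", "female", "male", "male", "male"],
    "Pclass": ["1st class", "2nd class", "3rd class"] * 2,
    "share":  [90.0, 80.0, 50.0, 40.0, 20.0, 10.0],
})

# One row per ticket class, one column per sex
wide = toy.pivot(index="Pclass", columns="Sex", values="share")

# Reverse the order so that "1st class" ends up at the top of the y axis
wide = wide.loc[["3rd class", "2nd class", "1st class"]]

line_x, line_y = [], []
for label, row in wide.iterrows():
    line_x += [row["female"], row["male"], None]  # trailing None, as in the loop above
    line_y += [label, label, None]

print(line_x)
print(line_y)
```

The resulting lists have the same layout as data["line_x"] and data["line_y"], and wide["female"] and wide["male"] can stand in for data["female"] and data["male"].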

You may ask: what is the purpose of adding None to data["line_x"] and data["line_y"] above?

I will skip answering the question for now, and let us finish the first step.

I set showlegend to False because I want to replace the legend with annotations in the chart, so that it looks more condensed.

import plotly.graph_objects as go

fig = go.Figure(
    data=[
        go.Scatter(
            x=data["line_x"],
            y=data["line_y"],
            mode="lines",
            showlegend=False,
            marker=dict(
                color="#85868a"
            )
        ),
    ]
)

fig.show()
We have y axis, x axis and three horizontal lines

Okay, we have the x axis, y axis and three lines now.

Let us return to the question: remove the None from the code and compare how the chart looks afterwards.

data = {"line_x": [], "line_y": [], "male": [], "female": [], "colors": [], "Sex": [], "pclass": []}

for p in pclass:
    data["female"].extend([df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0]])
    data["male"].extend([df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0]])
    data["line_x"].extend(
        [
            df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0],
            df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0]
            # None - comment out the None
        ]
    )
    data["line_y"].extend([
        p,
        p
        # None - comment out the None
    ])

fig = go.Figure(
    data=[
        go.Scatter(
            x=data["line_x"],
            y=data["line_y"],
            mode="lines",
            showlegend=False,
            marker=dict(
                color="#85868a"
            )
        ),
    ]
)

fig.show()
Without the None, the three lines are mistakenly joined together.

Aha, the None works as a break that separates the three lines. Without it, the three lines are joined together, which of course is not what we want.
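Plotly leaves a gap wherever it meets a None in a line trace, as long as connectgaps is left at its default of False. The pattern of appending a None after every pair can be wrapped in a small helper. The function name `to_segments` and the numbers below are made up for illustration; this is a sketch of the idea rather than part of the original notebook.

```python
def to_segments(pairs):
    """Flatten (label, start, end) triples into x/y lists with None separators."""
    xs, ys = [], []
    for label, start, end in pairs:
        xs.extend([start, end, None])  # None ends the current segment
        ys.extend([label, label, None])
    return xs, ys


# Placeholder values for two ticket classes
xs, ys = to_segments([("3rd class", 50, 10), ("1st class", 90, 40)])
print(xs)  # [50, 10, None, 90, 40, None]
print(ys)  # ['3rd class', '3rd class', None, '1st class', '1st class', None]
```

The two lists can then be passed as x and y to go.Scatter with mode="lines" to get one separate segment per ticket class.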

Step 2 — add bells for the dumbbell chart

  • On top of the horizontal gray lines, we add data points as the bells at the start and the end of each line.
  • To make the bells contrast visually, I give the color #1e5f97 to the female data points and #abaa9c to the male data points.
  • We also add information that shows up when the reader hovers the mouse over a data point.
pclass = list(df['Pclass'].unique())
pclass.reverse()

data = {"line_x": [], "line_y": [], "male": [], "female": [], "colors": [], "Sex": [], "pclass": []}

for p in pclass:
    data["female"].extend([df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0]])
    data["male"].extend([df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0]])
    data["line_x"].extend(
        [
            df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0],
            df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0],
            None,
        ]
    )
    data["line_y"].extend([p, p, None])

fig = go.Figure(
    data=[
        go.Scatter(
            x=data["line_x"],
            y=data["line_y"],
            mode="lines",
            showlegend=False,
            marker=dict(
                color="#85868a"
            )
        ),
        # Added the code below to draw bells
        go.Scatter(
            x=data["female"],
            y=pclass,
            mode="markers",
            name="female",
            marker_symbol="line-ns",
            marker_line_color="#1e5f97",
            marker_color="#1e5f97",
            marker_line_width=6,
            marker_size=15,
            hovertemplate=
                "<b>%{x}</b> <br></br>% of female passengers in <b>%{y}</b> cabin survived" +
                "<extra></extra>"
        ),
        go.Scatter(
            x=data["male"],
            y=pclass,
            mode="markers",
            name="male",
            marker_symbol="line-ns",  # set markers as vertical lines. https://plotly.com/python/marker-style/
            marker_line_color="#abaa9c",
            marker_color="#abaa9c",
            marker_line_width=6,
            marker_size=15,
            hovertemplate=
                "<b>%{x}</b> <br></br>% of male passengers in <b>%{y}</b> cabin survived" +
                "<extra></extra>"
        ),
    ]
)

fig.show()
We have added the bells for the dumbbell chart

Step 3 — add annotations and make the chart condensed

  • To make the whole visualization more condensed, I removed the default legend, which sits at the top right corner of the chart, and replaced it with annotations placed close to the lines.
  • Then we style the font family, font size, position of the chart, background color, etc. of the visualization. You can find the parameters in the code below.
pclass = list(df['Pclass'].unique())
pclass.reverse()

data = {"line_x": [], "line_y": [], "male": [], "female": [], "colors": [], "Sex": [], "pclass": []}

for p in pclass:
    data["female"].extend([df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0]])
    data["male"].extend([df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0]])
    data["line_x"].extend(
        [
            df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0],
            df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0],
            None,
        ]
    )
    data["line_y"].extend([p, p, None])

fig = go.Figure(
    data=[
        go.Scatter(
            x=data["line_x"],
            y=data["line_y"],
            mode="lines",
            showlegend=False,
            marker=dict(
                color="#85868a"
            )
        ),
        go.Scatter(
            x=data["female"],
            y=pclass,
            mode="markers",
            name="female",
            marker_symbol="line-ns",
            marker_line_color="#1e5f97",
            marker_color="#1e5f97",
            marker_line_width=6,
            marker_size=15,
            hovertemplate=
                "<b>%{x}</b> <br></br>% of female passengers in <b>%{y}</b> cabin survived" +
                "<extra></extra>"
        ),
        go.Scatter(
            x=data["male"],
            y=pclass,
            mode="markers",
            name="male",
            marker_symbol="line-ns",  # https://plotly.com/python/marker-style/
            marker_line_color="#abaa9c",
            marker_color="#abaa9c",
            marker_line_width=6,
            marker_size=15,
            hovertemplate=
                "<b>%{x}</b> <br></br>% of male passengers in <b>%{y}</b> cabin survived" +
                "<extra></extra>"
        ),
    ]
)

# annotation for the male markers, drawn as a text-only trace
fig.add_trace(go.Scatter(
    x=[30],
    y=["1st class"],
    mode="text",
    name="Male",
    text=["<b>Male</b>"],
    textposition="middle left",
    hoverinfo='skip',
    textfont=dict(
        family="Open Sans",
        size=20,
        color="#abaa9c"
    )
))

# annotation for the female markers
fig.add_trace(go.Scatter(
    x=[57],
    y=["3rd class"],
    mode="text",
    name="Female",
    text=["<b>Female</b>"],
    textposition="middle right",
    hoverinfo='skip',
    textfont=dict(
        family="Open Sans",
        size=20,
        color="#1e5f97"
    )
))


# style the chart
fig.update_layout(
    title="<b>Percentage of passengers that survived at Titanic</b>",
    title_font_family="Open Sans",
    title_font_size=25,
    title_font_color="#343437",
    font_family="Open Sans",
    font_size=25,
    width=800,
    legend_itemclick=False,
    showlegend=False,
    paper_bgcolor='#f1f0ea',
    plot_bgcolor='#f1f0ea',
    xaxis_range=[0, 100],
    margin=dict(
        l=180,
        r=50,
        b=100,
        t=120,
        pad=4
    ),
)


fig.show()
A dumbbell chart is ready!

Step 4 — highlight the zeroline of the x axis and adjust the visuals properly (Last step)

  • A dumbbell chart is useful for comparisons, so highlighting the zero line of the x axis gives the percentages a clear baseline to be read against.
  • I also move the tick labels of the x axis to the top of the chart, so that they sit closer to the title.
pclass = list(df['Pclass'].unique())
pclass.reverse()

data = {"line_x": [], "line_y": [], "male": [], "female": [], "colors": [], "Sex": [], "pclass": []}

for p in pclass:
    data["female"].extend([df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0]])
    data["male"].extend([df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0]])
    data["line_x"].extend(
        [
            df.loc[(df.Sex == "female") & (df.Pclass == p)]["share"].values[0],
            df.loc[(df.Sex == "male") & (df.Pclass == p)]["share"].values[0],
            None,
        ]
    )
    data["line_y"].extend([p, p, None])

fig = go.Figure(
    data=[
        go.Scatter(
            x=data["line_x"],
            y=data["line_y"],
            mode="lines",
            showlegend=False,
            marker=dict(
                color="#85868a"
            )
        ),
        go.Scatter(
            x=data["female"],
            y=pclass,
            mode="markers",
            name="female",
            marker_symbol="line-ns",
            marker_line_color="#1e5f97",
            marker_color="#1e5f97",
            marker_line_width=6,
            marker_size=15,
            hovertemplate=
                "<b>%{x}</b> <br></br>% of female passengers in <b>%{y}</b> cabin survived" +
                "<extra></extra>"
        ),
        go.Scatter(
            x=data["male"],
            y=pclass,
            mode="markers",
            name="male",
            marker_symbol="line-ns",  # https://plotly.com/python/marker-style/
            marker_line_color="#abaa9c",
            marker_color="#abaa9c",
            marker_line_width=6,
            marker_size=15,
            hovertemplate=
                "<b>%{x}</b> <br></br>% of male passengers in <b>%{y}</b> cabin survived" +
                "<extra></extra>"
        ),
    ]
)

# annotation for the male markers, drawn as a text-only trace
fig.add_trace(go.Scatter(
    x=[30],
    y=["1st class"],
    mode="text",
    name="Male",
    text=["<b>Male</b>"],
    textposition="middle left",
    hoverinfo='skip',
    textfont=dict(
        family="Open Sans",
        size=20,
        color="#abaa9c"
    )
))

# annotation for the female markers
fig.add_trace(go.Scatter(
    x=[57],
    y=["3rd class"],
    mode="text",
    name="Female",
    text=["<b>Female</b>"],
    textposition="middle right",
    hoverinfo='skip',
    textfont=dict(
        family="Open Sans",
        size=20,
        color="#1e5f97"
    )
))


# style the chart
fig.update_layout(
    title="<b>Percentage of passengers that survived at Titanic</b>",
    title_font_family="Open Sans",
    title_font_size=25,
    title_font_color="#343437",
    font_family="Open Sans",
    font_size=25,
    width=800,
    legend_itemclick=False,
    showlegend=False,
    paper_bgcolor='#f1f0ea',
    plot_bgcolor='#f1f0ea',
    xaxis_range=[0, 100],
    margin=dict(
        l=180,
        r=50,
        b=100,
        t=120,
        pad=4
    ),
)

# highlight the zero line and move the x axis tick labels to the top
fig.update_xaxes(
    zeroline=True,
    zerolinewidth=1.5,
    zerolinecolor='#343437',
    side="top",  # https://plotly.com/python/reference/layout/xaxis/#layout-xaxis-rangeslider-thickness
    showgrid=True,
    gridwidth=1.2,
    gridcolor='#e3e1df'
)
fig.update_yaxes(
    showgrid=True,
    gridwidth=1.2,
    gridcolor='#e3e1df',
    #ticklabelposition = 'outside left'
    automargin="left"
)

fig.show()
Yay!

Thanks for reading!

This is the Kaggle notebook in which I created this visualization. Please feel free to read or comment.
