Dashing to application: a minutiae guide to creating a Dash app

28 min read · Aug 10, 2023

Introduction

Flash back two years. I was suffering from a gastrointestinal infection which, after several tests, was discovered to be H. pylori. After medication and several follow-up tests, the embarrassing gut-related effects of this pathogen still lingered on. And on.

Before you swipe past this blog because it seems to start on the somewhat wrong footing of someone else’s medical issues, hold on, it is far from that. This is a data visualization blog, and if you are one of those visual people you are darn well in the right place. It was my personal experience that inspired me to use data science to bring to the fore the harrowing afflictions that dysentery sufferers undergo in silence. Whoever is reading this might be in a position to expedite the necessary assistance or craft policies that will lead to a dysentery-free life for all.

If you’ve worked with Dash-Plotly, you will know it is an amazing data visualization tool. However, learning it requires patience. In this blog, we shall embark on a journey of creating a dashboard that shows the extent of diarrhoea-related deaths among children under 5 years up to the year 2019. It’s a long read, so don’t be in a rush to finish it. As proper orientation a priori, this is the app we shall be building.

Data Cleaning

As always in Python, you begin by loading the necessary packages. Bear in mind that the entire project was done in PyCharm but printed out in Jupyter. If you want to install a package into PyCharm, it is highly recommended you follow this advisory. The entire code is available here on GitHub.

# Import the required packages
import pandas as pd
import numpy as np
import plotly.express as px
from dash import Dash, dcc, html, Input, Output
import json
import time

After importing the above, add the following dictionary. It styles a small debugging panel that we shall add to the layout later.

styles = {
    'pre': {
        'border': 'thin lightgrey solid',
        'overflowX': 'scroll'
    }
}

Don’t let the above code scare you; you’ll soon be back in familiar territory. For now, let’s download the dataset we want. The source is Our World in Data by Oxford. Surely no phenomenon is outside the study realm of scientists. The image below shows how you can download the dataset, but yours truly has been generous in uploading it to GitHub, as can be seen here.

Alright, since you have been able to access the dataset, next in the pipeline is loading it into PyCharm or any other IDE of your choice.

# Download the dataset
df = pd.read_csv("https://raw.githubusercontent.com/sammigachuhi/dash_plotly_projects/main/data/diarrhoea_children_gdp.csv")
df.head()

  | Entity | Code | Year | Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate) | GDP per capita, PPP (constant 2017 international $) | Population (historical estimates) | Continent
0 | Abkhazia | OWID_ABK | 2015 | NaN | NaN | NaN | Asia
1 | Afghanistan | AFG | 1990 | 196.60 | NaN | 10694804.0 | NaN
2 | Afghanistan | AFG | 1991 | 198.09 | NaN | 10745168.0 | NaN
3 | Afghanistan | AFG | 1992 | 201.59 | NaN | 12057436.0 | NaN
4 | Afghanistan | AFG | 1993 | 234.98 | NaN | 14003764.0 | NaN

Great.

You may or may not be an experienced data scientist, but you can see there is an issue with this dataset: several NaN values stand out in plain black and white. Missing data can show up as either Null or NaN, where the latter stands for Not a Number. We shall not go into the nitty-gritty of the differences between the two, but both cause problems in data analysis, such as introducing bias and, unbelievable as it may sound, increasing memory usage if there are too many of them. Here is one of the simplest ways to handle missing data: elimination.

We will remove the rows that have NaN values in the columns "Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)" and "Population (historical estimates)". Both are crucial for our data visualization project, hence the extra effort of cleansing them of pesky NaN values.

# Now remove all rows with null values in the columns "Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)"
# and "Population (historical estimates)"
df = df.dropna(
    subset=["Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)", "Population (historical estimates)"])

# Let's check
df.head()

  | Entity | Code | Year | Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate) | GDP per capita, PPP (constant 2017 international $) | Population (historical estimates) | Continent
1 | Afghanistan | AFG | 1990 | 196.60 | NaN | 10694804.0 | NaN
2 | Afghanistan | AFG | 1991 | 198.09 | NaN | 10745168.0 | NaN
3 | Afghanistan | AFG | 1992 | 201.59 | NaN | 12057436.0 | NaN
4 | Afghanistan | AFG | 1993 | 234.98 | NaN | 14003764.0 | NaN
5 | Afghanistan | AFG | 1994 | 221.60 | NaN | 15455560.0 | NaN
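To see what dropna(subset=...) does on a small scale, here is a minimal sketch on a made-up frame. The column names (entity, deaths, gdp, population) and all the values are hypothetical stand-ins, not the real dataset. The point to notice is that a row is dropped only when one of the listed columns is missing; NaN values in the other columns survive, which is why GDP still shows NaN in the output above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; names and values are invented.
toy = pd.DataFrame({
    "entity": ["A", "B", "C", "D"],
    "deaths": [196.6, np.nan, 12.3, 45.0],
    "gdp": [np.nan, 1500.0, np.nan, 3200.0],
    "population": [1.0e7, 2.0e6, np.nan, 5.0e6],
})

# Count missing values per column before cleaning.
missing_before = toy.isna().sum()

# Drop a row only if `deaths` or `population` is missing; `gdp` is not checked.
cleaned = toy.dropna(subset=["deaths", "population"])

# `gdp` can still hold NaN afterwards because it was not listed in `subset`.
missing_after = cleaned.isna().sum()

print(missing_before, missing_after, sep="\n\n")
```

Calling isna().sum() before and after is a quick alternative to scanning df.head() by eye.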

Since at some point we would like to aggregate the countries in our dataset according to, say, their respective continents, it would be good if each country was associated with its continent. Our dataset has a Continent column, but it is only sparsely filled in, with a single continent name here and there among long runs of NaN! There’s no way aggregation can happen, since we can’t assign countries to their continents when the Continent column consists more of NaN values than of actual continent names. Luckily, there are various open source datasets that have the country name, the continent and other auxiliary data such as the country code. One just has to be smart about the common ID to use during the join, or otherwise create one if it doesn’t exist. We happen to fall in the latter camp.

Let’s load one such geospatial dataset.

# The below dataset contains country codes and their continents. We want to join the countries in our diarrhoea dataset
# to their sub-regions since the `continent` column in our diarrhoea dataset has missing values
df_code = pd.read_csv("https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv")
df_code.head()

  | name | alpha-2 | alpha-3 | country-code | iso_3166-2 | region | sub-region | intermediate-region | region-code | sub-region-code | intermediate-region-code
0 | Afghanistan | AF | AFG | 4 | ISO 3166-2:AF | Asia | Southern Asia | NaN | 142.0 | 34.0 | NaN
1 | Åland Islands | AX | ALA | 248 | ISO 3166-2:AX | Europe | Northern Europe | NaN | 150.0 | 154.0 | NaN
2 | Albania | AL | ALB | 8 | ISO 3166-2:AL | Europe | Southern Europe | NaN | 150.0 | 39.0 | NaN
3 | Algeria | DZ | DZA | 12 | ISO 3166-2:DZ | Africa | Northern Africa | NaN | 2.0 | 15.0 | NaN
4 | American Samoa | AS | ASM | 16 | ISO 3166-2:AS | Oceania | Polynesia | NaN | 9.0 | 61.0 | NaN

One thing to note: our initial df dataset had the country codes listed under the Code column. For our df_code dataset, they are listed under the alpha-3 column. We can create a common ID key between the df and the df_code datasets by creating a new column called Code in the df_code dataset. For a join operation to succeed, there has to be a common key between the two datasets to be joined.

# Create the common ID key for df_code
df_code["Code"] = df_code["alpha-3"]
df_code.head()

  | name | alpha-2 | alpha-3 | country-code | iso_3166-2 | region | sub-region | intermediate-region | region-code | sub-region-code | intermediate-region-code | Code
0 | Afghanistan | AF | AFG | 4 | ISO 3166-2:AF | Asia | Southern Asia | NaN | 142.0 | 34.0 | NaN | AFG
1 | Åland Islands | AX | ALA | 248 | ISO 3166-2:AX | Europe | Northern Europe | NaN | 150.0 | 154.0 | NaN | ALA
2 | Albania | AL | ALB | 8 | ISO 3166-2:AL | Europe | Southern Europe | NaN | 150.0 | 39.0 | NaN | ALB
3 | Algeria | DZ | DZA | 12 | ISO 3166-2:DZ | Africa | Northern Africa | NaN | 2.0 | 15.0 | NaN | DZA
4 | American Samoa | AS | ASM | 16 | ISO 3166-2:AS | Oceania | Polynesia | NaN | 9.0 | 61.0 | NaN | ASM
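As an aside, pandas can also join on differently named columns through the left_on and right_on parameters of pd.merge, which avoids adding a column first. The sketch below uses two tiny made-up frames (deaths_toy and codes_toy are hypothetical names) rather than the real data. Creating a shared Code column, as done above, works just as well.

```python
import pandas as pd

# Two tiny invented frames that mimic the shape of df and df_code.
deaths_toy = pd.DataFrame({
    "Entity": ["Afghanistan", "Albania"],
    "Code": ["AFG", "ALB"],
})
codes_toy = pd.DataFrame({
    "alpha-3": ["AFG", "ALB"],
    "sub-region": ["Southern Asia", "Southern Europe"],
})

# Join `Code` on the left to `alpha-3` on the right without renaming anything.
merged = pd.merge(deaths_toy, codes_toy, left_on="Code", right_on="alpha-3", how="left")

# Both key columns are kept after the merge, so drop the redundant one.
merged = merged.drop(columns="alpha-3")

print(merged)
```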

Now, the continents that we need are found in df_code's sub-region column. We want to add this column to our initial df dataset and thankfully, the join operation to do this is possible thanks to the common column Code existing in both datasets. This common column serves as the common key for the join operation.

df = pd.merge(df, df_code[["Code", "sub-region"]], on="Code", how="left") # Merge with country `Code` to their continents

# Print out the joined df
df.head()

  | Entity | Code | Year | Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate) | GDP per capita, PPP (constant 2017 international $) | Population (historical estimates) | Continent | sub-region
0 | Afghanistan | AFG | 1990 | 196.60 | NaN | 10694804.0 | NaN | Southern Asia
1 | Afghanistan | AFG | 1991 | 198.09 | NaN | 10745168.0 | NaN | Southern Asia
2 | Afghanistan | AFG | 1992 | 201.59 | NaN | 12057436.0 | NaN | Southern Asia
3 | Afghanistan | AFG | 1993 | 234.98 | NaN | 14003764.0 | NaN | Southern Asia
4 | Afghanistan | AFG | 1994 | 221.60 | NaN | 15455560.0 | NaN | Southern Asia

As you can see, each country is joined to its rightful continent in our new df dataset!

By checking the above few table rows, it is tempting to assume all is well, wind on the sails and all systems go to proceed to your data science exercise. However, if you open the dataset in Excel, you will find out that the sub-region column has several missing values as well! In Python we can check for missing values like so:

# Check for missing values in sub-region
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6150 entries, 0 to 6149
Data columns (total 8 columns):
 #   Column                                                         Non-Null Count  Dtype  
---  ------                                                         --------------  -----  
 0   Entity                                                         6150 non-null   object 
 1   Code                                                           6150 non-null   object 
 2   Year                                                           6150 non-null   int64  
 3   Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)  6150 non-null   float64
 4   GDP per capita, PPP (constant 2017 international $)            5365 non-null   float64
 5   Population (historical estimates)                              6150 non-null   float64
 6   Continent                                                      204 non-null    object 
 7   sub-region                                                     6120 non-null   object 
dtypes: float64(3), int64(1), object(4)
memory usage: 432.4+ KB

Ignoring all the other columns, there are exactly 30 missing values in the sub-region column (6150 entries against 6120 non-null). Let's remove them. As our modus operandi, we show no mercy to missing values.

# Remove all rows with a missing (NaN) value in column `sub-region`
df = df.dropna(subset=["sub-region"])
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6120 entries, 0 to 6149
Data columns (total 8 columns):
 #   Column                                                         Non-Null Count  Dtype  
---  ------                                                         --------------  -----  
 0   Entity                                                         6120 non-null   object 
 1   Code                                                           6120 non-null   object 
 2   Year                                                           6120 non-null   int64  
 3   Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)  6120 non-null   float64
 4   GDP per capita, PPP (constant 2017 international $)            5335 non-null   float64
 5   Population (historical estimates)                              6120 non-null   float64
 6   Continent                                                      204 non-null    object 
 7   sub-region                                                     6120 non-null   object 
dtypes: float64(3), int64(1), object(4)
memory usage: 430.3+ KB
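Before dropping rows like this, it can be useful to know which entities failed to match in the first place. One way, sketched here on made-up data, is the indicator=True option of pd.merge, which adds a _merge column marking rows that were found only in the left frame. The frames and the OWID_XYZ code below are hypothetical.

```python
import pandas as pd

# Invented data: one real-looking country and one aggregate with no ISO code.
left = pd.DataFrame({
    "Entity": ["Kenya", "Kenya", "Some aggregate"],
    "Code": ["KEN", "KEN", "OWID_XYZ"],
    "Year": [1990, 1991, 1990],
})
right = pd.DataFrame({
    "Code": ["KEN"],
    "sub-region": ["Sub-Saharan Africa"],
})

# indicator=True adds a `_merge` column: "both", "left_only" or "right_only".
checked = pd.merge(left, right, on="Code", how="left", indicator=True)

# Entities whose code had no partner in the lookup table.
unmatched = checked.loc[checked["_merge"] == "left_only", "Entity"].unique().tolist()

print(unmatched)
```

Running the same check on the real df would list the entities behind the 30 missing values.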

You can now save the cleaned dataset to your local directory.

You always save the best for last… and to last

# Save the cleaned dataframe
# df.to_csv("data/cleaned_df2.csv")

Creating the Dash App

To create a dash app, you create a Dash instance and assign it to the name of your app. For simplicity purposes, let’s name the app as app.

app = Dash(__name__)

We shall also expose the app’s underlying Flask server under the name server, which shall enable us to publish our Dash-Plotly dashboard to a platform called Render.

server = app.server

Now to something that can be described more intuitively. To create a layout for your dashboard, we use the layout attribute of the app. The layout is composed of a tree of "components" such as html.Div and dcc.Graph, which are the building blocks of our Dash app. You start by creating an overall html.Div, which is like the megastructure that will hold the other html.Divs and dcc.Graphs together.

To create a layout, you start with the following syntax:

app.layout = html.Div([
    ])

Pull up a handbrake and let me give some advice to you dear enthusiastic programmer…

If building a dashboard, start simple and scale up as you simultaneously change screens from your code to your dashboard. Don’t build everything at once and hit the run button when you think it is complete. There is nothing as frustrating as coding hard all hour long, only to be met with a straight-to-your-face kind of error when running your code.

One of my maxims is: “slow and steady is better than fast and shoddy”.

To run our dash script so far, we introduce an open sesame kind of code shown below.

if __name__ == "__main__":
    app.run(debug=True)

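Now, what do the above lines do? The first, if __name__ == "__main__":, makes sure the code beneath it runs only when the script is executed directly, not when it is imported by another module. The second, app.run(debug=True), starts Dash’s development server; with debug=True it also reloads the app and refreshes the browser automatically whenever you save changes to your code.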
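If the __name__ check still feels like magic, this small standard-library sketch may help. It writes a throwaway script (demo_app.py, a hypothetical stand-in and not the Dash app) to a temporary folder and runs it twice with runpy: once as if it were imported and once as if it were run directly.

```python
import os
import runpy
import tempfile

# A tiny stand-in script (hypothetical, not the Dash app itself).
SOURCE = (
    'events = ["module body ran"]\n'
    'if __name__ == "__main__":\n'
    '    events.append("main guard ran")\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_app.py")
    with open(path, "w") as handle:
        handle.write(SOURCE)

    # Simulate `import demo_app`: __name__ is set to the module name.
    as_import = runpy.run_path(path, run_name="demo_app")["events"]
    # Simulate running the file directly: __name__ is set to "__main__".
    as_script = runpy.run_path(path, run_name="__main__")["events"]

print(as_import)
print(as_script)
```

Only the second run reaches the guarded block, which is exactly why app.run sits behind the guard: a hosting platform that imports the file to get at server should not start the development server as a side effect.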

So far, our framework looks as follows:

# Import the required packages
import pandas as pd
import numpy as np
import plotly.express as px
from dash import Dash, dcc, html, Input, Output
import json
import time

styles = {
    'pre': {
        'border': 'thin lightgrey solid',
        'overflowX': 'scroll'
    }
}

# Source of the data used is: https://ourworldindata.org/childhood-diarrheal-diseases?utm_source=pocket_saves
# from the download section of compound line graph

# Clean the dataset for: "D:\gachuhi\dash-projects\dash-layout\data\diarrhoea_children_gdp.csv"
# Specifically the CSV file: "diarrhoea_children_gdp.csv"

# Remove all rows that have null values in the column: "Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)"
# and "Population (historical estimates)"

df = pd.read_csv("https://raw.githubusercontent.com/sammigachuhi/dash_plotly_projects/main/data/diarrhoea_children_gdp.csv")
# print(df)

# Now remove all null values in the column: "Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)" and
# "Population (historical estimates)"
# df = df.copy()
df = df.dropna(
    subset=["Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)", "Population (historical estimates)"])
# print(df)

# The below dataset contains country codes and their continents. We want to join the countries in our diarrhoea dataset
# to their sub-regions since the `continent` column in our diarrhoea dataset has missing values
df_code = pd.read_csv("https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv")
df_code["Code"] = df_code["alpha-3"]

df = pd.merge(df, df_code[["Code", "sub-region"]], on="Code", how="left") # Merge with country `Code` to their continents

# Remove all rows with a missing (NaN) value in column `sub-region`
df = df.dropna(subset=["sub-region"])

# Save the cleaned dataframe
# df.to_csv("data/cleaned_df2.csv")

# Now to create the plotly dashboard
app = Dash(__name__)
server = app.server

app.layout = html.Div([
    
])

# if __name__ == "__main__": # Uncomment this in your file
#     app.run(debug=True)    # Uncomment this in your file

Running the above code with py <name-of-your-dash-app>.py in the PyCharm terminal produces an empty canvas in the browser since we have not passed anything into app.layout.

A dash layout is composed of html.Div and dcc.Graph components. The former can be thought of as a wrapper that encapsulates any content that goes into it. The latter, on the other hand, is responsible for most plotly created data visualizations such as maps and graphs. It is mostly passed a figure argument.

The Layout

The first components we shall pass into our layout are a heading and some background information concerning our dataset.

Update the layout with the following code. Dash has a plethora of components. For anything new you come across, it is highly suggested that you refer to the API reference. Although as large as the ocean, it is simple enough for both a beginner and a well-versed programmer to understand.

app.layout = html.Div([
    # 0 The heading
    html.H2("Diarrhoea related deaths amongst children <5 years, World"),

    html.Br(),

    dcc.Markdown("""
    Source: [Our World in Data](https://ourworldindata.org/childhood-diarrheal-diseases?utm_source=pocket_reader)

    Our World In Data is a project of the Global Change Data Lab, a registered charity in England 
    and Wales (Charity Number 1186433).
    """,
                 link_target="_blank"),

])

Hoping that your if __name__ == "__main__": app.run(debug=True) is uncommented (unlike mine), running the script again fires up our dash app.

One must conquer small hills before they challenge Everest. Likewise, if our intro part of the dash app is well completed we can now go on to deal with the more complex parts.

The Graphs

A mountain can have several hills. The first hill we shall climb will be to create an interactive global map that is responsive to the year slider (you will see it later) and the country clicked on the map. When the slider is moved, say, from 2013 to 2015, the map, which is ideally a choropleth map, should reflect the changes. When we click on a country, say Kenya, it should update some other graphs we’ll come to later.

Remember when we said that the dcc.Graph component stands for figures? Well, not just graphs but maps as well. It will be passed an ID, which applies to all dash components so as to identify them in callbacks. If one will employ callbacks somewhere in their Dash application, it is necessary for their Dash components to have an ID attribute.

Enough said.

Let’s add it.

--snip--
#1 the map layout
    dcc.Graph(id="map-year"),

The Dash component for creating a slider is dcc.Slider.

--snip--
#1 the map layout
    dcc.Graph(id="map-year"),
    
    #2 The slider
    dcc.Slider(
        df["Year"].min(),
        df["Year"].max(),
        step=None,
        id="year-slider",
        value=df["Year"].max(),
        marks={str(year): str(year) for year in df["Year"].unique()}
    ),
--snip--

If you are curious, you can run your dash app again. You should see an empty graphic and a slider populated with all the years across our cleaned dataset df; from the earliest (1990) to the latest (2019) in our df dataset.
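With thirty years on one slider, the labels can get crowded. Below is a minimal sketch of one way to thin them out, assuming you only want a label every five years plus the final year. The helper name build_marks and the five-year spacing are my own assumptions, not part of the original app. Bear in mind that a slider with step=None can only stop on its marks, so thinned-out marks like these would need step=1 for every year to remain selectable.

```python
def build_marks(years, every=5):
    """Return a dict for a slider's `marks`, labelling every `every`-th year."""
    # Plain ints avoid surprises with NumPy integer types from a DataFrame column.
    years = sorted({int(year) for year in years})
    first, last = years[0], years[-1]
    return {
        str(year): str(year)
        for year in years
        if (year - first) % every == 0 or year == last
    }


# Stand-in for df["Year"].unique(); the real app would pass that instead.
marks = build_marks(range(1990, 2020))
print(marks)
```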

In Plotly, just like in other visualization packages, one can plot two figures on the same row. We can place two figures side by side by putting them inside the same html.Div element and styling them accordingly. Think of html.Div as creating a division to put content into, as explained earlier. In our case, we shall put the heatmap and the scatterplot within the same division but stack them one above the other. I did not place this particular pair side by side because both appeared too squeezed to be legible when plotted on the same line. Nevertheless, your attention so far shall not go unrewarded. We shall draw two charts on the same line later on.

Notice I have added a new component — html.Br() to introduce a linebreak. It's just another one of several Dash components.

--snip--
#3 The heatmap and scatterplot on the same column
    html.Div([
        dcc.Graph(id="heat-map-country-year"),
        dcc.Graph(id="scatterplot-death-gdp-year")
    ]),

    html.Br(),
    --snip--

If you refresh the browser, you will notice two empty figures added.

We had mentioned earlier that we want our map to update some graphs when a country is clicked. The code that shall be introduced involves a markdown that is generated upon a user clicking certain fields, in our case, the countries. In web programming parlance, these are known as ‘click’ events. How to create it has largely been borrowed from here.

--snip--
##### This is to help in debugging capturing the clicked country on the choropleth map
    html.Div([
        dcc.Markdown("""
                **Click Data**

                Click on points in the graph.
                """),
        html.Pre(id='click-data', style=styles['pre']),
        ], className='three columns'),
    #########
--snip--

In fact, the styles dictionary we created right at the very beginning was for this specific code chunk. It styles how the click output shall appear. With the help of html.Pre, the output is rendered as preformatted text, so its indentation and line breaks are preserved.

If you refresh your browser, you will notice a new kid on the block. If we had our map ready, any country we clicked on the map will generate some text in that portion. It shows nothing for now, but once we add the callbacks some text is generated based on a ‘click’ event.

I wanna quickly fulfill my promise so it’s out of the way.

Just how do we put two figures side by side?

Well, there are various ways, but the easiest is putting the two Dash components inside an html.Div and using the style attribute of each component to set its CSS placement. In the below code, we have set each figure to fill a width of 48% of the screen and both to sit on the same line. The latter is enabled by the "display": "inline-block" CSS property.

#4 Draw line graph of population of selected country dependent on country selected on map and likewise for
    # gdp per capita in one row
    html.Div([
        dcc.Graph(id="line-graph-population", style={"width": "48%", "display": "inline-block"}),
        dcc.Graph(id="line-graph-gdp-capita", style={"width": "48%", "display": "inline-block"})
    ]),
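If you later want three or four figures on one row, the same style dictionary can be generated rather than typed by hand. The helper below is a hypothetical convenience of my own (side_by_side_style is not a Dash function), and the 2% gap is simply the spacing implied by the 48% width above.

```python
def side_by_side_style(n_figures, gap_percent=2):
    """Build a CSS style dict that fits `n_figures` components on one row."""
    # Split the row evenly, then shave off a little breathing space.
    width = 100 / n_figures - gap_percent
    return {"width": f"{width:g}%", "display": "inline-block"}


print(side_by_side_style(2))  # {'width': '48%', 'display': 'inline-block'}
print(side_by_side_style(4))  # {'width': '23%', 'display': 'inline-block'}
```

The result can be passed straight to the style argument of each dcc.Graph.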

Finally, to close our long journey of creating our Dash app layout (app.layout) we shall add the last Dash component. Another graph.

--snip--
#5 Draw bar graph of diarrhoea related deaths across the years
    dcc.Graph(id="diarrhoea-bar-graph"),
--snip--

You should have two empty figures side by side on the same row and one large graph below them (the one with id="diarrhoea-bar-graph").

Wow. At least we have a skeleton in order. Your app layout should be spread out as follows:

app.layout = html.Div([
    # 0 The heading
    html.H2("Diarrhoea related deaths amongst children <5 years, World"),

    html.Br(),

    dcc.Markdown("""
    Source: [Our World in Data](https://ourworldindata.org/childhood-diarrheal-diseases?utm_source=pocket_reader)

    Our World In Data is a project of the Global Change Data Lab, a registered charity in England 
    and Wales (Charity Number 1186433).
    """,
                 link_target="_blank"),

    #1 the map layout
    dcc.Graph(id="map-year"),

    #2 The slider
    dcc.Slider(
        df["Year"].min(),
        df["Year"].max(),
        step=None,
        id="year-slider",
        value=df["Year"].max(),
        marks={str(year): str(year) for year in df["Year"].unique()}
    ),

    #3 The heatmap and scatterplot on the same column
    html.Div([
        dcc.Graph(id="heat-map-country-year"),
        dcc.Graph(id="scatterplot-death-gdp-year")
    ]),

    html.Br(), # This is to insert a line break.

##### This is to help in debugging capturing the clicked country on the choropleth map
    html.Div([
        dcc.Markdown("""
                **Click Data**

                Click on points in the graph.
                """),
        html.Pre(id='click-data', style=styles['pre']),
        ], className='three columns'),
    #########

#4 Draw line graph of population of selected country dependent on country selected on map and likewise for
    # gdp per capita in one row
    html.Div([
        dcc.Graph(id="line-graph-population", style={"width": "48%", "display": "inline-block"}),
        dcc.Graph(id="line-graph-gdp-capita", style={"width": "48%", "display": "inline-block"})
    ]),

#5 Draw bar graph of diarrhoea related deaths across the years
    dcc.Graph(id="diarrhoea-bar-graph"),


])

Our second hill to conquer shall involve introducing interactivity to the map.

Callbacks

During a hike to the Aberdares as a teenager, there was a certain hill called “Foot of Despair”. In Dash, creating callbacks can be hell if you don’t fully understand some subtle concepts, but don’t despair. Just as I conquered the “Foot of Despair” with zero hiking skills so you too can conquer this thing!

Firstly, what the heck is a callback?

A callback is simply a function that is passed as an argument to another function, which then calls it when needed. In other words, the function being passed serves as an argument of the master function that receives it.

In Dash, callbacks are useful in updating the app’s outputs such as a figure when some parts of the input components change.
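Stripped of any Dash specifics, the idea looks like this in plain Python. The function names apply_to_each and to_label are invented for the illustration.

```python
def apply_to_each(values, callback):
    """The 'master' function: it decides when to call `callback`."""
    return [callback(value) for value in values]


def to_label(year):
    """The callback: it only knows how to handle one value."""
    return f"Year {year}"


# `to_label` is passed without parentheses: we hand over the function itself.
labels = apply_to_each([2013, 2015], to_label)
print(labels)  # ['Year 2013', 'Year 2015']
```

Dash plays the role of apply_to_each: you hand it a function, and it calls that function whenever an input component changes.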

Without further ado, let’s create our first callback.

As a nota bene, your callback should come after the app.layout section. In brief, the callback decorator takes Output and Input arguments. One more thing: order matters, so list the Output first and the Input after it. The Output refers to what will be the result, while the Input refers to the input components that will influence the result (the Output).

# Callbacks section
## Callback for #1 The map layout
@app.callback(
    Output("map-year", "figure"),
    Input("year-slider", "value")
)

You will notice that the Output and Input arguments also contain their own arguments. The first argument, such as map-year in the case of the Output, refers to the ID of our Dash component. The keyword for this first argument is component_id, but it's not necessary to put the keyword before the argument. The second argument refers to the property to update, in this case figure. The keyword for this second argument is component_property but, like the first keyword, it is often omitted. Thus, written out in full, the Output is:

Output(component_id="map-year", component_property="figure")

The same rules apply for the Input argument.

Leaving the callback as above will result in an error, because a decorator must be followed by a function definition. The Dash documentation also advises placing the callback function directly below the decorator, with no blank lines as breathing space between the two. Therefore, because a map was the first graphic in our app, we have to create the function that displays it. As already stressed, it will be written immediately below our callback decorator.

--snip--
@app.callback(
    Output("map-year", "figure"),
    Input("year-slider", "value")
)
def update_map(year_slider):
    dff = df[df["Year"] == year_slider]

    fig = px.choropleth(dff, locations="Entity", locationmode="country names",
                        color="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)",
                        hover_name="Year",
                        color_continuous_scale=px.colors.sequential.Plasma,
                        title=f"Map showing deaths from diarrhoeal diseases for children <5 years in {year_slider}",
                        custom_data=["Entity"],
                        labels={
                            "Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)": "Deaths"})

    fig.update_layout(transition={"easing": "elastic-out"})

    return fig
--snip--

A decorator is more than just a fancy sounding name to make your code look nicer. It actually performs an advanced role of modifying the functionality of another function by wrapping it in another function. This site takes you through it in a fluid way.
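To make the idea concrete, here is a toy registry in plain Python. It is not how Dash is implemented internally; MiniApp and its callback method are hypothetical names used only to show how a decorator can take arguments and register the function written beneath it.

```python
class MiniApp:
    """A toy stand-in for an app object; not Dash's real implementation."""

    def __init__(self):
        self.registry = {}

    def callback(self, output_id, input_id):
        # Step 1: called with the decorator's own arguments.
        def decorator(func):
            # Step 2: called with the function defined right below the decorator.
            self.registry[output_id] = (input_id, func)
            return func  # hand the function back unchanged
        return decorator


mini = MiniApp()


@mini.callback("map-year", "year-slider")
def update_map(year):
    return f"map for {year}"


# The app now knows which function produces "map-year" and what feeds it.
input_id, func = mini.registry["map-year"]
print(input_id, func(2019))
```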

Just two things to be mentioned.

1. The year_slider argument inside the def update_map() function. Understanding the arguments inside Dash callback functions was one of the most confusing things in my early Plotly days. Only after understanding them did it all make sense. The argument inside def update_map is actually a reference to the year-slider Input inside the callback decorator: Dash passes the slider’s current value into it. Your callback function arguments should only contain references to the inputs, in the exact order they have been listed inside your callback decorator.

The name of the argument in the callback function, update_map above, doesn't have to match the component_id in your callback decorator. It can go by any name, such as slider or whatever (just try it out). Just know that the arguments follow the order you have used in your callback decorator. If there was a second Input argument of Input("another-slider", "value"), the signature would be update_map(year_slider, another_slider). Any other argument names would suffice, since order is all that matters.

2. The new dff in dff = df[df["Year"] == year_slider] - what this code line does is filter our dataset to only contain the rows of the year selected in our slider.
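The "order is all that matters" rule from point 1 can be mimicked in a few lines of plain Python. This is a toy model of the idea and not Dash's actual machinery; call_with_inputs and the component values are made up for the illustration.

```python
def call_with_inputs(func, input_ids, current_values):
    """Pass the current value of each input to `func`, in declaration order."""
    ordered_values = [current_values[input_id] for input_id in input_ids]
    return func(*ordered_values)


# The argument names are arbitrary; only their position matters.
def update_map(slider, whatever):
    return f"year={slider}, other={whatever}"


# Current component values, deliberately stored in a different order.
values = {"another-slider": 5, "year-slider": 2019}

result = call_with_inputs(update_map, ["year-slider", "another-slider"], values)
print(result)  # year=2019, other=5
```

Swap the two IDs in the list and the same function receives the values the other way round, which is the kind of silent mix-up to watch out for once a callback has several inputs.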

The rest such as px.choropleth and update_layout can be checked up in the Plotly API reference.

Now refresh your browser. You should see a plotly map appear. Play around with the slider and note that the choropleth map changes the colour for various countries based on the year, as well as updating the title based on the year selected.

For now, note the custom_data=["Entity"] which we shall revisit later.

For the other callbacks that shall follow, the format is more or less the same. Let’s start with a Treemap.

# Callbacks for #3 heatmap and scatterplot on the same page
# Heatmap callback
@app.callback(
    Output("heat-map-country-year", "figure"),
    Input("year-slider", "value")
)
def update_heatmap(year_slider):
    dff = df[df["Year"] == year_slider]

    fig = px.treemap(dff, names="Entity", path=["sub-region", "Entity"],
                     values="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)",
                     color="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)", hover_name="Year",
                     hover_data="GDP per capita, PPP (constant 2017 international $)",
                     color_continuous_scale=px.colors.sequential.Plasma,
                     color_continuous_midpoint=np.average(
                         dff["Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)"],
                         weights=dff["GDP per capita, PPP (constant 2017 international $)"]),
                     title=f"Treemap Chart showing deaths from diarrhoeal diseases" + "<br>" +
                           f"for children <5 years in {year_slider}",
                     labels={"Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)": "Deaths"})

    fig.update_layout(transition={"easing": "elastic-out",
                                  "duration": 50},
                      margin={"t":50, "l":25, "r":25, "b":25})

    return fig

The result is a beautiful treemap reactive to the year selected on the slider.
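One detail worth checking in the treemap code is the colour midpoint, a GDP-weighted average computed with np.average. NumPy returns NaN from that call if any weight is NaN, and our cleaned dataset still has missing GDP values (5335 non-null out of 6120 rows). Here is a hedged sketch with made-up numbers of filtering those rows out first. Whether a NaN midpoint visibly affects the chart is something I have not verified.

```python
import numpy as np

# Invented values: three countries, one of them without a GDP figure.
deaths = np.array([100.0, 50.0, 10.0])
gdp = np.array([1000.0, np.nan, 3000.0])

# A single NaN weight turns the whole weighted average into NaN.
naive = np.average(deaths, weights=gdp)

# Keep only the rows where the weight is known, then average.
has_gdp = ~np.isnan(gdp)
safe = np.average(deaths[has_gdp], weights=gdp[has_gdp])

print(naive, safe)
```

With a DataFrame, the equivalent would be to call dropna on the GDP column of dff before computing the midpoint.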

In keeping with the trail of our app’s layout, how about a scatterplot? The scatterplot shall show the diarrhoea deaths for each country against the Gross Domestic Product (GDP) in tandem with the year selected on the slider.

--snip--
# Scatterplot callback
@app.callback(
    Output("scatterplot-death-gdp-year", "figure"),
    Input("year-slider", "value")
)
def update_scatterplot(year_slider):

    dff = df[df["Year"] == year_slider]

    fig = px.scatter(dff, x="GDP per capita, PPP (constant 2017 international $)",
                     y="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)",
                     color="sub-region",
                     size="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)", hover_name="Entity",
                     hover_data="GDP per capita, PPP (constant 2017 international $)",
                     title=f"Scatterplot showing Deaths from Diarrhoea cases against GDP per capita," + "<br>" +
                           f"PPP (constant 2017 international $) for {year_slider}",
                     labels={"Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)": "Deaths"})

    fig.update_layout(transition={"easing": "elastic-out",
                                  "duration": 50})

    return fig

--snip--

We get a scatterplot as shown below.

Now, to the mysterious markdown that prints out what we’ve clicked on the map. Its callback anatomy is displayed below.

--snip--
@app.callback(
    Output('click-data', 'children'),
    Input('map-year', 'clickData'))
def display_click_data(clickData):
    return json.dumps(clickData, indent=2)
--snip--

The above function, aided by json.dumps, prints out a Python object in JSON format. JSON stands for JavaScript Object Notation.

“Why convert it to JSON format?” You may ask.

JSON is a data interchange format that stores information in the form of name/value pairs. Printing the click event in JSON format lets us see which keys it carries, so we know which one to pick to update our remaining three figures.
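As a rough sketch of what json.dumps does here, the snippet below pretty-prints a made-up dictionary shaped like a click event. The sample_click contents are an assumption for illustration only; a real clickData payload from the map carries more keys than these.

```python
import json

# Hypothetical, trimmed-down click event; the real payload has more keys.
sample_click = {
    "points": [
        {"location": "Australia", "customdata": ["Australia"]}
    ]
}

# indent=2 puts every name/value pair on its own line, which makes it readable.
pretty = json.dumps(sample_click, indent=2)
print(pretty)
```

Reading the indented text from the outside in shows the path to a value: "points", then the first list item, then the key of interest.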

Now click on any country on the map, say Australia due to its conspicuous nature, and you get a legible text output as follows:

Now if we wanted to pull out the value that would serve to update our remaining figures, we could go for either the location key or the customdata key. We would then store the value in a variable like so:

country_name = clickData["points"][0]["location"]

country_name = clickData["points"][0]["customdata"][0]

Note that Plotly spells the second key in lowercase, customdata, and that its value is a list, hence the trailing [0]. In my investigation I accessed it as "customData", which did not work out, and the spelling is the likely reason. It is still unclear to me why customdata did not disappear from the JSON text even after commenting out custom_data=["Entity"] in px.choropleth. Only the location key worked stress free, so that is the one we shall use.

Before demonstrating how to access the country names, just note that clickData is a property of the dcc.Graph component that updates when you click on a point. There are others whose names are a tell-all, such as hoverData, selectedData, and relayoutData.

We will proceed to creating callbacks for the figures we set side by side. By doing so, you shall see how to extract our country names.

#4.1 Callback for line graph for population against years
@app.callback(
    Output("line-graph-population", "figure"),
    Input("map-year", "clickData")
)
def line_population(clickData):
    if clickData is None:
        country_name = "Kenya"
    else:
        country_name = clickData["points"][0]["location"]

    dff = df[df["Entity"] == country_name]
    dff = dff.sort_values(by="Year")
    #
    fig = px.line(dff, x="Year", y="Population (historical estimates)", markers=True,
                  )
    #
    fig.update_layout(
        title={"text": f"Population (historical estimates) for {country_name}" + "<br>" +
                       f"({dff['Year'].min()} - {dff['Year'].max()})"}
    )
    #
    return fig

In short, the if/else statement sets “Kenya” as the default country name if none is clicked on the map. If the user clicks a country on the map, country_name is set to whatever country has been clicked. Omitting the if/else statement and going with a straightforward declaration of country_name = clickData["points"][0]["location"] leads to an error, because clickData is None until the first click and None cannot be subscripted. I welcome any suggestions for extracting the country name within one line without the use of an if/else statement.
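One possible answer to that question is a small helper built on a conditional expression, sketched below. The name clicked_country and its default parameter are hypothetical and not part of the original app; it is a suggestion under those assumptions, not the only way to do it.

```python
def clicked_country(click_data, default="Kenya"):
    """Return the clicked country's name, or `default` when nothing was clicked."""
    # click_data is None before the first click, so guard before subscripting it.
    return click_data["points"][0]["location"] if click_data else default


# Before any click the default is used; afterwards the clicked location wins.
print(clicked_country(None))
print(clicked_country({"points": [{"location": "Haiti"}]}))
```

Inside each callback the if/else block would then shrink to country_name = clicked_country(clickData), and the same helper could be shared by all three country-driven figures.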

Refresh your browser, and click any country. You may even time travel to the past by clicking a year on the slider. Below is the line graph of Population against Year values for Australia.

Let’s complete the other twin by creating a line graph of GDP against Year. Both will help us see any relationship between population growth and GDP.

#4.2 Callback for line graph for gdp-per-capita against years
@app.callback(
    Output("line-graph-gdp-capita", "figure"),
    Input("map-year", "clickData")
)
def line_capita(clickData):
    if clickData is None:
        country_name = "Kenya"
    else:
        country_name = clickData["points"][0]["location"]

    dff = df[df["Entity"] == country_name]
    dff = dff.sort_values(by="Year")

    fig = px.line(dff, x="Year", y="GDP per capita, PPP (constant 2017 international $)", markers=True,
                  labels={"GDP per capita, PPP (constant 2017 international $)": "GDP per Capita"})

    fig.update_layout(
        title={
            "text": f"GDP per capita, PPP (constant 2017 international $)" + "<br>" +
                    f"for {country_name} ({dff['Year'].min()} - {dff['Year'].max()})"}
    )

    return fig

Here are the much awaited twin figures in all their glory.

Finally, the final nail in the coffin. We shall create a bar graph that showcases the trend of diarrhoea deaths across all years. Having the population and GDP charts above it serves as good background information for a researcher trying to identify any nexus between demographics, economics and health.

#5 Callbacks to draw bar graph for diarrhoea related deaths
@app.callback(
    Output("diarrhoea-bar-graph", "figure"),
    Input("map-year", "clickData")
)
def update_bar_graph(clickData):
    if clickData is None:
        country_name = "Kenya"
    else:
        country_name = clickData["points"][0]["location"]

    dff = df[df["Entity"] == country_name]
    dff = dff.sort_values(by="Year")

    fig = px.bar(dff, x="Year", y="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)",
                 hover_data=["GDP per capita, PPP (constant 2017 international $)", "Population (historical estimates)",
                             "sub-region"],
                 labels={"Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)": "Deaths"},
                 color="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)")

    fig.update_layout(title={"text": f"Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)" + "<br>" +
                                     f"for {country_name} ({dff['Year'].min()} - {dff['Year'].max()})"})


    return fig

The bar chart for diarrhoea cases for children under 5 years in Australia is, wait for it… no I am not showing it.

“Why?”

Australia is not your typical country where you would hear of such cases. Heck, I am saying this when I ain’t even in Australia and have never set foot there; the university offer I got from there I couldn’t honour due to a reality that ran contrary to my wishes.

Australia has almost negligible diarrhoea deaths. Forget about it. Go for some other country.

On the year slider, select 2011 and zoom to Haiti, located off the Florida coast to the south east. You will notice there is a huge hump in child diarrhoea deaths for this year. Some basic research pointed to a hurricane as the cause of this sharp increase.

Now click on Botswana, and you will notice there was a bump in the year 2006. A Google search showed that this year received exceptional rain that resulted in flooding. It’s no surprise that floods can lead to a spike in diarrhoea cases, since contaminated water infiltrates the water infrastructure.
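Bumps spotted by eye on the bar chart can be double-checked in pandas. The sketch below assumes a dataframe with the same Entity, Year and death-rate columns as ours; the helper name peak_year is hypothetical, and the toy numbers are made up purely for illustration and are not taken from the dataset.

```python
import pandas as pd

RATE = "Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)"


def peak_year(frame, country):
    """Return the year with the highest death rate for one country."""
    dff = frame[frame["Entity"] == country]
    # idxmax gives the row label of the largest rate; look up its Year.
    return int(dff.loc[dff[RATE].idxmax(), "Year"])


# Made-up numbers for illustration only, not real figures.
toy = pd.DataFrame({
    "Entity": ["Country A", "Country A", "Country A", "Country B", "Country B"],
    "Year": [2005, 2006, 2007, 2005, 2006],
    RATE: [10.0, 25.0, 12.0, 3.0, 2.0],
})

print(peak_year(toy, "Country A"))
```

Calling peak_year(df, "Haiti") on the cleaned dataframe should point at the same year that stands out on the bar chart.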

Seeing that our app works the way we wish, you can comment out the html.Div containing the dcc.Markdown responsible for printing our click events. In our app.layout, it is located just before the line charts.

##### This is to help in debugging capturing the clicked country on the choropleth map
    # html.Div([
    #     dcc.Markdown("""
    #             **Click Data**
    #
    #             Click on points in the graph.
    #             """),
    #     html.Pre(id='click-data', style=styles['pre']),
    #     ], className='three columns'),
    #########

Also comment out its callback companion.

########
# @app.callback(
#     Output('click-data', 'children'),
#     Input('map-year', 'clickData'))
# def display_click_data(clickData):
#     return json.dumps(clickData, indent=2)

######

Its work is done, and it deserves an honourable exit by remaining as an inert code chunk to help in debugging should some ‘click’ event error arise.

As a finality, this is what our data visualization dashboard recipe looks like:

# Import the required packages
import pandas as pd
import numpy as np
import plotly.express as px
from dash import Dash, dcc, html, Input, Output
import json
import time

styles = {
    'pre': {
        'border': 'thin lightgrey solid',
        'overflowX': 'scroll'
    }
}

# Source of the data used is: https://ourworldindata.org/childhood-diarrheal-diseases?utm_source=pocket_saves
# from the download section of compound line graph

# Clean the dataset for: "D:\gachuhi\dash-projects\dash-layout\data\diarrhoea_children_gdp.csv"
# Specifically the CSV file: "diarrhoea_children_gdp.csv"

# Remove all rows that have null values in the column: "Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)"
# and "Population (historical estimates)"

df = pd.read_csv("https://raw.githubusercontent.com/sammigachuhi/dash_plotly_projects/main/data/diarrhoea_children_gdp.csv")
# print(df)

# Now remove all null values in the column: "Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)" and
# "Population (historical estimates)"
# df = df.copy()
df = df.dropna(
    subset=["Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)", "Population (historical estimates)"])
# print(df)

# The below dataset contains country codes and their continents. We want to join the countries in our diarrhoea dataset
# to their sub-regions since the `continent` column in our diarrhoea dataset has missing values
df_code = pd.read_csv("https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv")
df_code["Code"] = df_code["alpha-3"]

df = pd.merge(df, df_code[["Code", "sub-region"]], on="Code", how="left") # Merge with country `Code` to their continents

# Remove all rows with value `None` in column `sub-region`
df = df.dropna(subset=["sub-region"])

# Save the cleaned dataframe
# df.to_csv("data/cleaned_df2.csv")

# Now to create the plotly dashboard
app = Dash(__name__)
server = app.server

app.layout = html.Div([
    # 0 The heading
    html.H2("Diarrhoea related deaths among children <5 years, World"),

    html.Br(),

    dcc.Markdown("""
    Source: [Our World in Data](https://ourworldindata.org/childhood-diarrheal-diseases?utm_source=pocket_reader)

    Our World In Data is a project of the Global Change Data Lab, a registered charity in England 
    and Wales (Charity Number 1186433).
    """,
                 link_target="_blank"),

    #1 the map layout
    dcc.Graph(id="map-year"),

    #2 The slider
    dcc.Slider(
        df["Year"].min(),
        df["Year"].max(),
        step=None,
        id="year-slider",
        value=df["Year"].max(),
        marks={str(year): str(year) for year in df["Year"].unique()}
    ),

    #3 The heatmap and scatterplot on the same column
    html.Div([
        dcc.Graph(id="heat-map-country-year"),
        dcc.Graph(id="scatterplot-death-gdp-year")
    ]),

    html.Br(), # This is to insert a line break.

##### This is to help in debugging capturing the clicked country on the choropleth map
    # html.Div([
    #     dcc.Markdown("""
    #             **Click Data**
    #
    #             Click on points in the graph.
    #             """),
    #     html.Pre(id='click-data', style=styles['pre']),
    #     ], className='three columns'),
    #########

#4 Draw line graph of population of selected country dependent on country selected on map and likewise for
    # gdp per capita in one row
    html.Div([
        dcc.Graph(id="line-graph-population", style={"width": "48%", "display": "inline-block"}),
        dcc.Graph(id="line-graph-gdp-capita", style={"width": "48%", "display": "inline-block"})
    ]),

#5 Draw bar graph of diarrhoea related deaths across the years
    dcc.Graph(id="diarrhoea-bar-graph"),


])

# Callbacks section
## Callback for #1 The map layout
@app.callback(
    Output("map-year", "figure"),
    Input("year-slider", "value")
)
def update_map(year_slider):
    dff = df[df["Year"] == year_slider]

    fig = px.choropleth(dff, locations="Entity", locationmode="country names",
                        color="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)",
                        hover_name="Year",
                        color_continuous_scale=px.colors.sequential.Plasma,
                        title=f"Map showing deaths from diarrhoeal diseases for children <5 years in {year_slider}",
                        custom_data=["Entity"],
                        labels={
                            "Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)": "Deaths"})

    fig.update_layout(transition={"easing": "elastic-out"})

    return fig

# Callbacks for #3 heatmap and scatterplot on the same page
# Heatmap callback
@app.callback(
    Output("heat-map-country-year", "figure"),
    Input("year-slider", "value")
)
def update_heatmap(year_slider):
    dff = df[df["Year"] == year_slider]

    fig = px.treemap(dff, names="Entity", path=["sub-region", "Entity"],
                     values="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)",
                     color="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)", hover_name="Year",
                     hover_data="GDP per capita, PPP (constant 2017 international $)",
                     color_continuous_scale=px.colors.sequential.Plasma,
                     color_continuous_midpoint=np.average(
                         dff["Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)"],
                         weights=dff["GDP per capita, PPP (constant 2017 international $)"]),
                     title=f"Treemap Chart showing deaths from diarrhoeal diseases" + "<br>" +
                           f"for children <5 years in {year_slider}",
                     labels={"Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)": "Deaths"})

    fig.update_layout(transition={"easing": "elastic-out",
                                  "duration": 50},
                      margin={"t":50, "l":25, "r":25, "b":25})

    return fig

# Scatterplot callback
@app.callback(
    Output("scatterplot-death-gdp-year", "figure"),
    Input("year-slider", "value")
)
def update_scatterplot(year_slider):

    dff = df[df["Year"] == year_slider]

    fig = px.scatter(dff, x="GDP per capita, PPP (constant 2017 international $)",
                     y="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)",
                     color="sub-region",
                     size="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)", hover_name="Entity",
                     hover_data="GDP per capita, PPP (constant 2017 international $)",
                     title=f"Scatterplot showing Deaths from Diarrhoea cases against GDP per capita," + "<br>" +
                           f"PPP (constant 2017 international $) for {year_slider}",
                     labels={"Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)": "Deaths"})

    fig.update_layout(transition={"easing": "elastic-out",
                                  "duration": 50})

    return fig

########
# @app.callback(
#     Output('click-data', 'children'),
#     Input('map-year', 'clickData'))
# def display_click_data(clickData):
#     return json.dumps(clickData, indent=2)

######

#4.1 Callback for line graph for population against years
@app.callback(
    Output("line-graph-population", "figure"),
    Input("map-year", "clickData")
)
def line_population(clickData):
    if clickData is None:
        country_name = "Kenya"
    else:
        country_name = clickData["points"][0]["location"]

    dff = df[df["Entity"] == country_name]
    dff = dff.sort_values(by="Year")
    #
    fig = px.line(dff, x="Year", y="Population (historical estimates)", markers=True,
                  )
    #
    fig.update_layout(
        title={"text": f"Population (historical estimates) for {country_name}" + "<br>" +
                       f"({dff['Year'].min()} - {dff['Year'].max()})"}
    )
    #
    return fig

#4.2 Callback for line graph for gdp-per-capita against years
@app.callback(
    Output("line-graph-gdp-capita", "figure"),
    Input("map-year", "clickData")
)
def line_capita(clickData):
    if clickData is None:
        country_name = "Kenya"
    else:
        country_name = clickData["points"][0]["location"]

    dff = df[df["Entity"] == country_name]
    dff = dff.sort_values(by="Year")

    fig = px.line(dff, x="Year", y="GDP per capita, PPP (constant 2017 international $)", markers=True,
                  labels={"GDP per capita, PPP (constant 2017 international $)": "GDP per Capita"})

    fig.update_layout(
        title={
            "text": f"GDP per capita, PPP (constant 2017 international $)" + "<br>" +
                    f"for {country_name} ({dff['Year'].min()} - {dff['Year'].max()})"}
    )

    return fig

#5 Callbacks to draw bar graph for diarrhoea related deaths
@app.callback(
    Output("diarrhoea-bar-graph", "figure"),
    Input("map-year", "clickData")
)
def update_bar_graph(clickData):
    if clickData is None:
        country_name = "Kenya"
    else:
        country_name = clickData["points"][0]["location"]

    dff = df[df["Entity"] == country_name]
    dff = dff.sort_values(by="Year")

    fig = px.bar(dff, x="Year", y="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)",
                 hover_data=["GDP per capita, PPP (constant 2017 international $)", "Population (historical estimates)",
                             "sub-region"],
                 labels={"Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)": "Deaths"},
                 color="Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)")

    fig.update_layout(title={"text": f"Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Rate)" + "<br>" +
                                     f"for {country_name} ({dff['Year'].min()} - {dff['Year'].max()})"})


    return fig

if __name__ == "__main__":
    app.run(debug=True)

One last hill remains to be conquered: publishing our app on Render. This video shall serve as the tour guide for the remainder of the hike. From me, it is “Adieu”!

Conclusion

Our dashboard may look somewhat sophisticated and interactive, if the changing of graphics based on year and country is anything to go by, but there are other more complex, yet awe-inspiring dashboards out there. This Plotly site has a host of them, and one can stretch their imagination to create their own custom dashboards based on the ingenuity of others. One must confess some of their code is intimidating at the outset, but by practising little by little one edges closer to taking the champion’s belt off their waists.

Fast forward to two years later. At least some gastro-intestinal normalcy has resumed, but I feel that pathogen-induced gastro-intestinal problems are one thing that can be internationally eradicated, much like malnutrition. Returning to our dashboard, it is evident that the least developed countries have the highest under-five child deaths from diarrhoea. It is also highly likely that adolescents, adults and others are suffering from this condition despite it being very much treatable.