End-to-End Guide: Creating a Web Application using Dash
By the end of this guide you will know how to create and deploy your own dashboard on the web.
Why Dash?
Why use Dash to build an application instead of other great tools? I became interested in learning Dash because:
- it works with big data frameworks such as Spark, so it can serve as a front-end for large datasets
- it can create highly customizable interactive dashboards for unstructured datasets using R or Python
- it requires no knowledge of HTML or JavaScript
Why not just use BI tools?
I want to preface this by saying that Tableau and Power BI are great tools. If your goal is to build a good-looking interactive dashboard from a structured dataset, they are an excellent option.
However, Tableau and other BI tools don’t offer the same level of flexibility as Python and R when it comes to working with unstructured data. It can be more efficient to use Python or R if the data requires a lot of pre-processing and transformation before analytics.
What I find most appealing about Dash is how well it integrates with big data and parallel computing frameworks. Dash apps can serve as the front-end to Spark clusters. If your file is too large to fit in memory and you don’t want to work with cloud computing frameworks, Dash is also compatible with Vaex. Vaex is a Python library for lazy, out-of-core DataFrames (similar to pandas) that can be as large as your hard drive! The founders of Vaex wrote an excellent blog post, which I highly recommend if you’re interested in working with large datasets.
The Global Power Plant Dataset
The Global Power Plant data displayed on the dashboard comes from the World Resources Institute. The Global Power Plant dataset is comprehensive and contains information about approximately 30,000 power plants across 164 different countries. The dataset contains detailed features about power plants such as geolocation, plant owner, primary fuel and secondary fuel types, etc.
Getting to know the data
The first step towards building the dashboard was getting familiar with the dataset and figuring out what relevant information I wanted to convey. After narrowing down a list of features, I assessed the quality of the data.
I used the missingno library to visualize the distribution of missing values in the dataset.
This is a crucial step because it helped refine the list of important features. I decided it was not worth including fields such as other_fuel2 and other_fuel3, since most of their values are missing.
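The same check can be done numerically with pandas. The sketch below uses a tiny made-up DataFrame instead of the real CSV; apart from other_fuel2 and other_fuel3, the column names are assumed to follow the WRI file, and the 50% cut-off is a hypothetical threshold, not necessarily the one used for this dashboard.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the Global Power Plant CSV
# (normally loaded with pd.read_csv).
df = pd.DataFrame(
    {
        "name": ["Plant A", "Plant B", "Plant C", "Plant D"],
        "country_long": ["France", "India", "India", "Brazil"],
        "primary_fuel": ["Nuclear", "Coal", "Solar", "Hydro"],
        "other_fuel2": [np.nan, np.nan, np.nan, "Oil"],
        "other_fuel3": [np.nan, np.nan, np.nan, np.nan],
    }
)

# Share of missing values per column; missingno's msno.matrix(df)
# shows the same information as a chart.
missing_share = df.isna().mean()
print(missing_share.sort_values(ascending=False))

# Drop columns where at least half of the values are missing
# (hypothetical threshold).
keep = missing_share[missing_share < 0.5].index
df_clean = df[keep]
print(list(df_clean.columns))
```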
Designing the Dashboard
I decided that an interactive map would be the best way to capture the dataset. I started by plotting the latitude and longitude of every power plant with Plotly Express’s px.scatter_mapbox.
One of the principal quantities I was interested in comparing was the distribution of different categories of power plants worldwide. I used sunburst plots because they are an effective way to visualize hierarchical data distributions.
I used Dash’s multi-value dropdown (a dcc.Dropdown with multi=True) to filter the countries displayed on the sunburst plot and the map.
After importing the dataset as a pandas DataFrame, the names of all the countries in the dataset are extracted into a pandas Series. The Series is then iterated over to build the list of option dictionaries that the dropdown component requires.
Connecting the components
Up until now, we have successfully created individual components, but we still haven’t connected them.
To connect the different components, we have to use Dash callback functions.
What are Dash callback functions?
Dash callback functions are Python functions that Dash calls automatically whenever an input component’s property changes.
Whenever the user selects a different set of countries in the multi-value dropdown, a Python function, referred to as a Dash callback, is called. Its input parameter is the newly selected set of countries, and it returns an updated map and sunburst plot.
The code looks roughly like this:
The “Inputs” (a set of countries) and the “Outputs” (the updated map and sunburst plot) are described declaratively as the arguments of the @app.callback decorator.
If you don’t know what a decorator is: essentially, declaring the @app.callback decorator tells Dash to call the function below it whenever the “Input” value changes.
Therefore, whenever the user changes the selected countries, the update_figures function is called. Its input parameter receives the selected countries specified in the @app.callback decorator. For instance, if the decorator had two “Inputs”, update_figures would need two parameters. The decorator has two “Outputs”: the map and the sunburst graph. Hence, update_figures returns two objects: scatter_map and graph.
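To make the decorator idea concrete, here is a toy, pure-Python model of what a registration decorator like @app.callback does. ToyApp, callback and property_changed are hypothetical names invented for this illustration; Dash’s real implementation is far more involved (it also handles the browser side), so treat this only as a mental model.

```python
class ToyApp:
    """A toy stand-in for a Dash app that only keeps a callback registry."""

    def __init__(self):
        self.callbacks = {}  # maps a tuple of input ids to a function

    def callback(self, *input_ids):
        # Calling app.callback(...) returns the actual decorator
        def register(func):
            # The decorator records func but leaves it unchanged
            self.callbacks[input_ids] = func
            return func

        return register

    def property_changed(self, input_ids, *new_values):
        # The "framework" calls the registered function with one argument per input
        return self.callbacks[input_ids](*new_values)


app = ToyApp()


@app.callback("country-dropdown")
def update_figures(selected_countries):
    # Two outputs -> two return values
    return f"map of {selected_countries}", f"sunburst of {selected_countries}"


scatter_map, graph = app.property_changed(("country-dropdown",), ["France", "India"])
print(scatter_map)  # map of ['France', 'India']
```

Registering a function under two input ids would mean it is called with two arguments, which mirrors the one-parameter-per-Input rule.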
Similarly, I used another @app.callback decorator to update the summary when the user clicks on a particular power plant.
Note that in this decorator, the “Input” is now the dcc.Graph component (the scatter map), and the “Output” is a Dash Markdown component (dcc.Markdown).
The update_summary function takes a parameter called click_Data, which comes from the clickData property of the dcc.Graph component. clickData is one of four user-interaction properties of dcc.Graph components; the other three are hoverData, selectedData, and relayoutData. More information about them can be found in the Dash documentation on interactive graphing.
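The sketch below shows roughly what update_summary has to do with a clickData payload. The payload is a simplified, made-up example of what dcc.Graph reports for a clicked map point; it assumes the map was built with hover_name="name" (which fills hovertext) and custom_data=["country_long", "primary_fuel", "capacity_mw"] (which fills customdata). The component ids in the docstring are hypothetical.

```python
def update_summary(click_data):
    """Turn a dcc.Graph clickData payload into Markdown for the summary panel.

    In the app this function would be decorated with something like
    @app.callback(Output("summary", "children"), Input("scatter-map", "clickData")),
    where "summary" is a dcc.Markdown component (hypothetical ids).
    """
    # clickData is None until the user clicks a point
    if not click_data:
        return "Click on a power plant to see its details."
    # A click selects a single point, so take the first entry
    point = click_data["points"][0]
    country, fuel, capacity = point["customdata"]
    return (
        f"**{point['hovertext']}**\n\n"
        f"- Country: {country}\n"
        f"- Primary fuel: {fuel}\n"
        f"- Capacity: {capacity} MW"
    )


# Simplified, made-up clickData payload for a single clicked point
example_click = {
    "points": [
        {
            "curveNumber": 0,
            "pointNumber": 0,
            "lat": 47.2,
            "lon": 0.2,
            "hovertext": "Plant A",
            "customdata": ["France", "Nuclear", 2600],
        }
    ]
}
print(update_summary(example_click))
```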
I found clickData to be the best choice for the Global Power Plants dashboard. I recommend testing them out to see which one works best for your dashboard.
Structure of App Layout
There are a few different ways to configure the layout of a Dash application. I used the dash-bootstrap-components library for styling my app. If you’re just getting started, I highly recommend this video by Charming Data.
Using Bootstrap, I separated the app into two main components:
- the sidebar
- the graphics

The layout draws on three component libraries:
- The dash_html_components (html) library contains components for all HTML tags.
- The dash_core_components (dcc) library describes higher-level interactive components such as the map and the sunburst plot.
- The dash_bootstrap_components (dbc) library contains Bootstrap components for Dash.

In Dash 2 and later, html and dcc ship with Dash itself and are imported with from dash import html, dcc.
At a high level, the layout of a Dash app is structured like a tree. This hierarchical structure makes it easy to add or remove components. Bootstrap isn’t required for styling your application, but it provides a lot more flexibility. Many free Bootstrap stylesheets are available, and the theme of the application can be changed with a single line:
The dash_bootstrap_components.themes module contains the Content Delivery Network (CDN) links for Bootstrap and Bootswatch themes. It is also possible to modify or compile your own theme and serve it yourself by replacing dbc.themes.BOOTSTRAP with the URL of your stylesheet. For more information, see the dash-bootstrap-components documentation.
After tweaking and styling everything, I combined all of these pieces into a single Python script, app.py, that creates the dashboard.
Deploying the application on the Web
Once we are ready to share the dashboard, we can start the deployment process. There are many ways to deploy your application on the Web. The method shown below is the easiest method that I’ve come across for quickly deploying your web application. It doesn’t involve creating a virtual environment or using the command line (CLI).
Step 1: Initialize and create a GitHub repository. To deploy the application on the Web, the repository needs the following five files:
- app.py
- a CSV file containing your dataset
- README.md
- requirements.txt
- Procfile
The first three files are self-explanatory: app.py is the script we have been working on thus far, and README.md is a Markdown file that contains information about the project.
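The requirements.txt file (Heroku looks for this exact name) lists every Python package the app depends on, one per line, in the package==version format.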
It is important to pin the versions of the packages you are using in your environment. If you don’t know the versions of your packages, don’t worry! print(name_of_package.__version__) prints the version of a package, and you can copy the result directly into requirements.txt using the package==version format. The gunicorn package is essential for deployment: it is a Python WSGI HTTP server for UNIX, and it must be listed in requirements.txt as well. If you don’t have it, simply run $ pip install gunicorn in the terminal.
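As an alternative to printing each __version__ by hand, the standard library’s importlib.metadata can produce the pinned lines directly. The helper name pinned_requirements and the package list are hypothetical; note that importlib.metadata expects the distribution names used by pip (e.g. dash-bootstrap-components), not import names.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'package==version' lines for the installed packages in `packages`."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment, so there is no version to pin
            print(f"skipping {name}: not installed")
    return lines


requirements = pinned_requirements(
    ["dash", "dash-bootstrap-components", "pandas", "plotly", "gunicorn"]
)
print("\n".join(requirements))  # paste these lines into requirements.txt
```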
The Procfile (a file named exactly Procfile, with no extension) is also essential for deployment: it tells Heroku which command starts the application.
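For a Dash app, the Procfile typically holds a single line. This assumes the main script is named app.py and defines server = app.server at module level: app is the module name, server is the WSGI object, and web tells Heroku that this process receives HTTP traffic.

```text
web: gunicorn app:server
```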
When finished, the repository should contain the five files listed in Step 1.
Step 2: Create a new app on Heroku. After registering for Heroku, we create a new app and choose a name for it.
Step 3: Connect the GitHub repository to the Heroku app. Once the Heroku app has been created, several deployment methods are available. In my opinion, the simplest option is connecting with GitHub, since it requires no knowledge of the command-line interface (CLI) or virtual environments. After selecting GitHub as the deployment method, we enter the name of the GitHub repository created in Step 1.
Step 4: Manual Deployment. Finally, we deploy the app by clicking “Deploy Branch”. This step takes a few minutes.
Step 5: View App! Once the deployment step is complete we can view the webpage of our application in the browser! The Global Power Plants Dashboard can be viewed here!