A dashin’ plot: Beginner’s guide to Plotly & Dash

Data Knyts
SFU Professional Computer Science
14 min read · Feb 8, 2022


Authors: Tanmay Jain, Ayush Raina, Siddhartha Haldar, Anirban Banerjee

This blog is written and maintained by students in the Master of Science in Professional Computer Science Program at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/mpcs.

Are you a budding data scientist just wanting to transform your datasets into gorgeous, colorful graphs? Do you want to showcase your data on the web, but have no idea how to wrangle JavaScript?

Welcome to a 15-minute crash course on creating visualizations with Plotly and Dash. By the end of this article, you’ll be a pro in plotting and hosting picture-perfect charts for the entire world to see.

Overview

Visualization plays an important role in creating a mental image of data, as well as revealing the trends and patterns that may be hiding within. Unlike machines, which can cheerfully crunch through thousands of lines in a .csv file without complaining, human brains tend to see things like pictures. By depicting the same data in visual form, we can shift our method of analysis from cognition to perception, and provide a bird’s eye view of the dataset as a whole. Visualizations are therefore paramount when the size of our dataset runs into gigabytes — which it often tends to do in the case of big data.

Good visualizations help provide understanding and context, because they highlight trends and patterns that we might otherwise miss. Consider Anscombe’s famous quartet: four datasets that share nearly identical descriptive statistics (mean, variance, linear correlation, etc.) while being completely different when graphed. The four plots below allow you to literally see, at a glance, details that the usual statistical methods would have missed:

Anscombe’s quartet graphed (from Wikipedia)

For another example of why good visualizations are important, look at the two maps below - both of which show Skytrain stations in the Vancouver area. It is almost instantly noticeable that the stylized map on the right is easier to follow than the geographic map on the left, even though they depict the same thing.

Maps of Skytrain stations in the Vancouver area

To Plotly or not to Plotly?

Virtually every data science student takes their first steps into visualization using Matplotlib — it’s easy to understand and gets the job done when you want to pull up a quick graph. However, while useful for creating static plots or exploratory analysis, it is extremely cumbersome and frustrating to work with when more complex or interactive plotting is required.

Almost as tangled as the code for creating subplots

In contrast, Plotly provides the following:

  • Interactivity: Seamlessly zoom in on the important sections of a graph; customizable interactive tools like buttons, sliders, and dropdowns to display different perspectives of graphs
  • Complexity: Since Plotly works directly with Pandas DataFrames (another great big data tool!), it is well suited to intricate visualizations that require performing complex transformations on the data
  • Aesthetics: Plotly generates a wide variety of gorgeous graphs including, but not limited to, statistical charts, scientific charts, financial charts and geographical maps
  • Web integration: With Dash, it is possible to create amazing web applications without needing to know web development
  • Cost: Did we mention it’s completely FREE?

Getting The Data

FYI, the graphs we’ve created use article and comment data from the New York Times (NYT). The only datasets we need for this article are available at this github-link.

The New York Times also provides a comprehensive API for obtaining various NYT-related data. The datasets above were collected from the Archive API and the Community API.

Since data collection and cleaning are not the focus of this article, we will not go into the details of how we generated the final datasets used for the visualizations.

TL;DR you can grab the datasets here

Plotly Basics

Before we dive into building Plotly graphs, we need to understand a few Plotly building blocks that the next section relies on.

Let’s go.Figure() it out

go.Figure()

Calling go.Figure() creates an instance of the plotly.graph_objects.Figure class from Python dicts or graph object instances. The figure is serialized as text in JavaScript Object Notation (JSON), and this JSON is then passed on to plotly.js, which renders the visualization.

Next . . .“data=[]”

After go.Figure() we arrive at "data=[]"

go.Figure(data=[])

data is the list of dicts or trace instances that we pass to go.Figure(), which in turn passes it on to plotly.js.

WTD — WHAT THE DATA?

go.Figure() and data together build the graph object, and the data is stored inside it. This raises an obvious question: what, in fact, is the data?

The data is a list of traces, and each trace is created with one of Plotly’s many chart types.

go.Figure(data=[go.chart_type()])

These are some of the chart_type values available:

  • go.Scatter()
  • go.Bar()
  • go.Pie() . . . . . . . and so on.

Now the question is - how do we use chart_type?

In chart_type we pass the values that are to be plotted. These values can be a list, dict, Pandas Series or numpy.ndarray.

go.Figure(data=[go.chart_type(x=value_type, y=value_type)])

Note: The names of the data arguments (x and y here) change depending on the chart_type used.

These concepts are the foundation for the next section, where we use them to create a simple graphing framework.

Create our graphing framework

To get started with plotting graphs using Plotly’s graph objects (go), we first create a figure using go.Figure() and then add a trace to it. Regardless of how a graph object figure was constructed, it can be updated by adding additional traces to it and modifying its properties.

What is a trace you ask? From Plotly’s documentation:

“A trace is just the name we give a collection of data and the specifications of which we want that data plotted. Notice that a trace will also be an object itself, and these will be named according to how you want the data displayed on the plotting surface.”

Each trace has one of more than 40 possible types (e.g. scatter, bar, pie, surface, choropleth) and represents a set of related graphical marks in a figure.

1. Add a trace

Now let’s try out what we have learnt in the basics section.

Here we have used the add_trace() method to add a trace to our current figure. We have used go.Scatter() in this case, where the data parameters are x (values of the x-axis) and y (values of the y-axis).

import plotly.graph_objects as go

# df is assumed to be a Pandas DataFrame that has already been loaded
fig = go.Figure()
fig.add_trace(go.Scatter(
    name='Graph',
    x=df['column_x'],
    y=df['column_y'],
))
fig.show()

However, if you had added a trace for something else, say a pie chart, it would not have the same data parameters. Don’t worry if you don’t understand just yet! We’ll go over this later at the individual graph levels.

2. Adding multiple traces

We can add more traces to compare and contrast other data points, or even to draw another chart on the same axes.

fig = go.Figure(data=[
    go.Scatter(
        name='India',
        x=df_covid_country_line_2021['month'],
        y=df_covid_country_line_2021['india'],
    ),
    go.Scatter(
        name='China',
        x=df_covid_country_line_2021['month'],
        y=df_covid_country_line_2021['china'],
    ),

    ...
    ...
    ...
    go.Scatter(
        name='Canada',
        x=df_covid_country_line_2020['month'],
        y=df_covid_country_line_2020['canada'],
    ),
])

Don’t sweat it, programmers: we can also use loops to add traces in Plotly. We skipped them in the example above to illustrate the basics of traces.

for y in [y1, y2]:
    fig.add_trace(go.Scatter(
        x=x,
        y=y))

3. Creating buttons

Let’s start by answering: why do you need buttons?

When dealing with data that has many factors, you often need to filter it or look at specific time frames. This calls for buttons that manage particular views of the graph and introduce interaction, right inside Plotly!

3.1 Dropdowns

Dropdowns let us switch between views of different graphs so they can be compared against each other. By calling the update_layout() method we specify that we are updating the given chart with some additional information.

The updatemenus argument of update_layout() adds a dropdown menu with certain parameters to customize.

fig.update_layout(
    updatemenus=[
        dict(
            direction='down',
            showactive=True,
            active=0,
            x=0.1, y=1.13,
            buttons=list([
                dict(label="None",
                     method="restyle",
                     args=[{"visible": [True, False]}]),
                dict(label="High",
                     method="restyle",
                     args=[{"visible": [False, True]}]),
            ]))])

Here is a list of parameters to help you get started:

direction = direction in which the menu items open up
showactive = highlights the active dropdown item
active = index of the item that is active by default
x, y = position of the menu in normalized figure coordinates (not data values)
buttons.method = determines which plotly.js function will be used to modify the chart; restyle modifies data (trace) attributes
buttons.args.visible = visibility of each trace for that dropdown selection, i.e. which graphs you want to be visible when the item is selected

3.2 Range Selectors

Next up are range selectors: buttons that set the visible range of a date axis to a preset window, such as the last month, the last six months, or everything.

fig.update_layout(
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=1,
                     label="1m",
                     step="month",
                     stepmode="backward"),
                dict(count=6,
                     label="6m",
                     step="month",
                     stepmode="backward"),
                dict(step="all")
            ])
        )
    )
)

Here, we have parameters such as:

count = number of steps the button’s range spans
label = text to appear on the button
step = unit that count is measured in (e.g. month, year, all)
stepmode = how the start of the range is computed: backward counts back from the end of the range, todate goes back to the start of the current step period

4. Styling & Colors

Everything just works right out of the box, but the default look and the information displayed may eventually start to feel monotonous.

In this section, we look at how to personalize each graph and add some zest to it. Let’s start making graphs that look better aesthetically.

4.1 Templates

Plotly comes pre-loaded with several templates that you can get started using right away. This will set the chart appearance and select some colors for you. These are some examples from the template documentation.

example_templates = ["plotly", "plotly_white", "plotly_dark", "ggplot2", "seaborn", "simple_white", "none"]
for template in example_templates:
    fig.update_layout(template=template)
    fig.show()

4.2 Colors

Colors in Plotly are quite simple. We can go with the discrete colors provided, or supply our own values as:

A hex string: '#ff6b6b'
An rgb/rgba string: 'rgb(255,0,0)'
An hsl/hsla string: 'hsl(0,100%,50%)'

For continuous colors, you can find some color scales from the built-in colorscales in the documentation, or even create your own!

And there you go: you’ve got a simple framework to start plotting your new graphs!

Let’s get plottin’

1. Bar Plots

Start out with your ‘hello world’ code for plotting a bar chart.

import plotly.graph_objects as go
animals=['giraffes', 'orangutans', 'monkeys']

fig = go.Figure([go.Bar(x=animals, y=[20, 14, 23])])
fig.show()

We can customize this basic graph with parameters. These are the ones you will use most often:

barmode = how bars at the same position are combined: group, stack, overlay or relative (set through fig.update_layout())
text = adds a text value to each bar
hovertext = adds text shown on hover for each bar
width = sets the bar width (in position axis units)
orientation = v (default) or h; set it to 'h' for horizontal bars

Let’s try out what you’ve learned by implementing this graph. You can build it from the attached top-author-count-by-article-dataset, and use the above framework to make your task easier.

Top 10 Publishing Authors in New York Times from 2016–2021

2. Line Plots

Next up, we have the ‘Hello World’ for plotly’s line plot.

import plotly.graph_objects as go
import numpy as np
x = np.arange(10)
fig = go.Figure(data=go.Scatter(x=x, y=x**2))
fig.show()

Plotly Line Chart - ‘Hello World’

Now that we’ve got that out of the way, we can start making graphs that are a little more complex. Here are some parameters that can fast-track your work:

mode (lines / lines+markers / markers) = sets the drawing mode: just lines, lines with points highlighted as markers, or markers only
line.color = sets the color of the line
hovertext = adds text shown on hover for each point
orientation = v or h; for line plots this only matters when traces are stacked (stackgroup) or grouped

We recommend you try and get hands-on by making the graph below from the covid-mentions-of-countries-over-month-dataset!

Countries referenced by articles on Covid-19 in 2020

3. Choropleth Map

As a formal definition: a choropleth map is a map composed of colored polygons. It is used to represent spatial variations of a quantity.

Let’s go over the hello world here again.

import plotly.graph_objects as go
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv')

fig = go.Figure(data=go.Choropleth(
    locations=df['code'],  # Spatial coordinates required
    z=df['total exports'].astype(float),  # Data to be color-coded
    locationmode='USA-states',  # set of locations match entries in `locations`
    colorscale='Reds',
    colorbar_title="Millions USD",
))
fig.show()

Now, let’s go over some parameters that might help you customize graphs to your liking:

colorbar_title = adds a title to the colorbar
colorscale = sets the colorscale used to map magnitudes to colors
geo_scope = sets the scope of the map displayed, e.g. usa (a layout setting, passed to fig.update_layout())

Next up! Now try to visualize this using the location-nyt-commenters-dataset.

Location of NYT Commenters in the US

4. Bubble Chart

This one is a multi-variable graph that is a cross between a Scatterplot and a Proportional Area Chart. Size is used to compare the quantitative measures.

For the ‘hello world’ plot, this time we do:

import plotly.graph_objects as go

fig = go.Figure(data=[go.Scatter(
    x=[1, 2, 3, 4], y=[10, 11, 12, 13],
    mode='markers',
    marker_size=[40, 60, 80, 100])
])
fig.show()

Let’s continue with how we can customize the bubbles (markers):

hovertext = sets hover text elements associated with each (x,y) pair
marker.size = sets the marker size (in px)
marker.showscale = set to True to display the colorbar
marker.sizemode = whether the size values set the diameter or the area of each marker
marker.sizeref = sets the scale factor for the size of marker points
marker.color = sets the marker color
marker.colorscale = sets the colorscale from built-in or specified colors

And finally, try visualizing this bubble chart from the top-commenters-dataset!

Highest Comment Count based on Authors & Categories

And that’s about it for graphing in Plotly, folks. Next, we will learn how to share these graphs using an interesting tool called Dash!

Who dis Dash?

Okay, so by now we have the visualizations ready. But how do we showcase them to the world?
Of course, by hosting them on a website.
But this would require prior knowledge of HTML, CSS, JavaScript and server-side technologies, which entails a pretty steep learning curve.

Fret Not! This is where Dash comes to the rescue.

Dash is a one-stop solution for folks who are data and visualization ninjas but are new to the world of web technologies. Powered by Flask, React.JS and Plotly.js, Dash is a python framework presented by none other than Plotly, that enables users to rapidly create user-friendly web applications containing their visualizations with minimal hassle.

Layouts and Callbacks form the basis of any Dash app.

Layouts define the look and feel of the application and dictate the overall positioning and appearance of various elements throughout the page. Dash comes out of the box with dash_core_components, which provides ready-to-use interactive components such as graphs, dropdowns and sliders, thereby saving time on building them from scratch. dash_html_components, on the other hand, helps us create our own custom HTML tree of elements, completely in Python.

Callbacks are functions that are triggered whenever an input component’s property changes, and they update some property in another component (the output).

Let’s make a Dash-in’ app! ⚡

The first step obviously is to get hold of the packages using good ol’ pip :

pip install dash 
pip install dash-html-components
pip install dash-core-components
pip install dash_bootstrap_components
pip install gunicorn #Optional, more on this later!

The next step is to create an app.py file containing the necessary imports :

import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_bootstrap_components as dbc

Now, we need to start our Dash (Flask to be precise) server that is going to do the heavy lifting of handling incoming requests and rendering our visualizations :

app = dash.Dash()
server = app.server

if __name__ == '__main__':
    app.run_server(host="127.0.0.1", debug=True, port=8050)

The above snippet initializes the Dash app and spins up a Flask server at port 8050. The use of the server variable will be discussed later on. With a running server, it’s now time to add the meat of our web application, i.e. the visualizations and necessary HTML components. Essentially, all our graphs should be contained within separate divs. A basic HTML div in Dash looks like this:

html.Div(style={'color': <INSERT COLOR HERE>},
         className=<INSERT CLASSNAME HERE>,
         children=[<INSERT DASH HTML ELEMENTS>])

Dash supports custom CSS for its HTML elements through the style property, a dictionary of CSS attributes that in this case contains the color attribute. The className field corresponds to HTML classes, which help apply external CSS styles to groups of elements. The children field takes a list of Dash HTML elements such as html.A, html.P, html.H1, etc., thereby enabling us to build a completely custom HTML tree of elements. Our visualizations can similarly be nested within this HTML tree. The visualization elements are part of the dash_core_components library and can be created as follows:

dcc.Graph(id=<INSERT GRAPH ID HERE>,
          figure=<INSERT PLOTLY GRAPH OBJECT HERE>)

Complying with good software development practices, it is always best to create a function (for the sake of reusability) that returns the div containing the visualization(s). This function can behave like our personal template generator and it may look like the one below:

def build_graph_div(graph, graph_type,
                    header_text, descriptor_text,
                    graph_type_color):
    div = html.Div(
        className='graph_div',
        children=[
            html.P(children=graph_type,
                   className='graph_chart_type',
                   style={
                       'color': graph_type_color
                   }),
            html.H2(children=header_text,
                    className='graph_header'),
            html.Br(),
            html.P(children=descriptor_text,
                   className='graph_para'),
            html.Br(),
            html.Br(),
            dcc.Graph(
                id=<UNIQUE ID FOR THE GRAPH>,
                figure=graph,
            )
        ]
    )
    return div

The above function accepts a Plotly Graph Object and other additional variables as parameters and returns a div with a className attribute and its tree of children that hold both Dash html and Dash Graph elements. This code block produces a div that looks like this:

Web layout in action: http://dataknyts-nyt.herokuapp.com/

We can let our creative abilities flow when dealing with the above function to produce eye-catching elements. With the divs ready and styled there is only one final step to making our Dash app dashin’, attaching it to a parent div and listing the same under the layout attribute of app — the Dash(Flask under the hood) server we created some time back:

app.layout = html.Div(style={<INSERT CSS STYLES HERE>},
                      children=[<INSERT ALREADY CREATED DIVS HERE>])

Putting it all together, our nifty web application (with literally zero HTML, CSS or JS) will look like this:

import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_bootstrap_components as dbc

app = dash.Dash()
server = app.server

def build_graph_div(<PASS VARIABLES AS NEEDED>):
    div = html.Div(<BUILD YOUR DIV HERE>)
    return div

app.layout = html.Div(style={<WHATEVER CSS STYLES TO BE ADDED HERE>},
                      children=[
                          html.Div(
                              className=<PASS CLASSNAME>,
                              children=build_graph_div(<PASS REQUISITE PARAMETERS>)
                          )
                      ])

if __name__ == '__main__':
    app.run_server(host="127.0.0.1", debug=True, port=<PORT #>)

It’s Dashin’ but is it SuperCharged? 🦾 (optional)

Our Dash app is ready to host our visualizations.
But is it robust enough to handle multiple simultaneous requests?
Can it withstand heavy incoming traffic? NO

In comes Gunicorn.

Gunicorn is a Python WSGI HTTP Server that takes care of running multiple instances of your web application, ensuring they are healthy and restarting them as needed. It sits in front of the web app and distributes incoming requests across those instances and communicates with the webserver.

In our use case, Gunicorn will fork several worker processes of our Dash app that will help support multiple concurrent requests. The number of worker processes is configurable and it depends upon how robust we want our Dash app to be.

Incorporating it is easy. Remember the server variable we mentioned previously? It is this server object that Gunicorn forks, i.e. makes copies of. All we need to do is open up a terminal and type the following:

gunicorn -w 4 app:server

The above command points to the server object residing in the app.py file.
The -w option sets how many workers we want; in this case, Gunicorn will spin up 4 replicas of our server. The default is 1.
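If you would rather not hard-code the worker count on the command line, Gunicorn can also read its settings from a Python config file. The sketch below is one possible gunicorn.conf.py; the file location and the bind address are our assumptions, while the worker formula is the starting point suggested in Gunicorn’s own documentation.

```python
# gunicorn.conf.py (hypothetical file placed next to app.py)
import multiprocessing

# Gunicorn's documentation suggests (2 x cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Address and port to listen on; 8050 mirrors the port used earlier in this article.
bind = "127.0.0.1:8050"
```

With this file in place, the server can be started with gunicorn -c gunicorn.conf.py app:server.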

And BOOM, with that we have a Supercharged Dash-in’ Viz web-app running! ⚡

References:

https://plotly.com/python/
https://docs.gunicorn.org/en/stable/settings.html
https://dash.plotly.com/introduction
https://dataknyts-nyt.herokuapp.com
https://github.com/banerj10/CMPT732_NYT_User_Engagement
