Python and Bokeh — Part III (Tutorial)

Published in Yandex school of Data Science

22 min read, Aug 4, 2019

The beginner’s guide to creating interactive dashboards: real-time Bokeh application

In the first two parts of the series, we learned a lot about Bokeh. We already know how to create standalone documents with Bokeh glyphs, embed them into Jupyter notebooks, configure them and add some interactivity. More importantly, we learned how to develop basic Bokeh apps and launch them with the Bokeh server. The first two parts can be found here: Part I, Part II.

With all this knowledge, we can build something real. In this tutorial, we will develop an interactive dashboard that visualizes 911 calls in the city of Seattle. Moreover, this application will not just display static data: it will regularly fetch new data and update the dashboard accordingly.

Separation of concerns

Before diving into the development of our application, let’s agree on an important architectural principle: the data layer must be clearly separated from the view layer. Although this principle is well known and respected in software development in general, it is critically important for dashboard applications. Why?

Rich dashboard applications may perform a lot of data manipulation under the hood, like fetching, transforming and joining data from multiple sources. Without the separation between application layers, any changes to how data is collected or transformed, or even which data is used, may quickly leak into plotting, interaction and even styling parts of the application, cascading into massive and messy code changes.

Fortunately enough, Bokeh provides us with perfect tools to decouple the data from its representation: columnar data sources and views. We will create a simple and self-contained data provider, which will keep track of all data changes and provide plotting facilities with appropriate data sources and views.

We will create an application with a very generic layout and styling. Bokeh applications are not just Python scripts: they may contain templates, CSS files, custom themes and more. Templates and custom themes serve the same purpose as in the general MVC pattern: they cleanly separate how plots are prepared from how they are presented and styled in a web page.

These tools add flexibility and we will use them extensively in our application to modularize it even further.

Dashboard Preview

Now, let’s go and create it!

The final result of our Bokeh exploration

Application design

For this application, we will use an open data set provided by the city of Seattle. The data contains 911 dispatches and is updated every 5 minutes. Both the data structure and the data access are simple enough, while constant data updates will let us practice building all the infrastructure needed for dynamic dashboards.

We do not have any client requirements for the application we’re building, so we have to come up with a reasonable application design ourselves.

So, let’s break down the information we have about each 911 call:

dispatch location and address,
date and time,
type.

For each call, these three data elements answer three questions: “What happened? Where did it happen? When did it happen?” Now, let’s outline how we may present the data and which aggregated information we may want to present alongside it.

Dispatch location

The first and most obvious idea is to visualize dispatches on a map. Reasonable enough. This would let us quickly see which parts of the city 911 dispatches go to and give a visual sense of their geographical density. This covers the “Where?” aspect of the data.

Next, since we display dispatches on a map, we should add tooltips with detailed information, so that the user can look at a specific call and see its type, the exact time it was dispatched, and the address it was dispatched to. This complements the spatial aspect of the data.

Time dimension

Having a dynamically updated map is great, but we can and should display the same information in some other form. The map provides only a spatial representation of the data, but the data has other dimensions, most importantly — time.

Imagine that calls come in one after another within minutes: this would indicate quite a severe load on the system. But it’s a very different story if there are dozens of minutes between calls. That is unlikely for a city of Seattle’s size, of course, but we try to cover as many realistic scenarios as possible so that our application is usable in any of them.

Looking at, say, a table with all dispatches listed one after another, we can quickly assess the current load on the Seattle emergency response services. This view would provide the “When?” perspective and give a more general picture of the data as well.

Emergency Types

Although location and time are critical for a useful application, they are still not enough. The data has another important component: the dispatch type. The distribution of dispatch types may provide important information about current events, and a bar plot with dispatch type statistics is the easiest way to comprehend it.

By the way, if you want to gain some specialized domain knowledge and deeper understanding, you can go to the Seattle Fire Department radio exchange. The page also contains detailed descriptions of dispatch types.

Let’s summarize the design ideas we will implement:

we will use a map plot to display dispatch locations, and this plot will have meaningful tooltips to provide some additional information about the dispatch,
an additional table will display dispatches in a textual form, and this table should allow sorting by any column,
the third element of the application is a bar plot, displaying statistics of dispatch types.

Filtering mechanism

We do not yet have any filtering mechanism. Should we display dispatches for the last hour? Twelve hours? The entire day? Depending on the user needs, this number may be different, so the best way to address the requirement is to choose a sensible default value and let the user change it.

Thus, we need a slider to set the number of hours. For simplicity, let’s use 1 hour as the default value and 168 hours (i.e., one week) as the maximum, and let the user select any number of hours in this interval. On application start-up, we will display 911 calls for the most recent hour.

With this knowledge, we can already build a skeleton for our application.

Laying things out

Previously, we used Bokeh’s built-in functionality to create the application layout. This time we will go further and add full-fledged templating to the application. Bokeh makes custom templates easy: we only need to create valid templates, put them in the right location and embed Bokeh models into them.

A Bokeh application can be one of two kinds: it can be contained in a single file, or it can span several files as a directory app. Directory apps require a specific layout: the application file must be named main.py and placed in the root directory, and custom templates go in <app directory>/templates.

We will also add a separate configuration file for application-wide configuration options, just to make things a bit cleaner. Let’s create all the necessary files and directories:
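As a rough sketch, the skeleton can be created from Python with pathlib. Only main.py and templates/index.html follow Bokeh’s directory-app conventions; the app directory name seattle911 and the module names config.py and data_provider.py are our own hypothetical choices.

```python
from pathlib import Path

# Hypothetical app directory name; `bokeh serve` accepts any directory.
APP_DIR = Path("seattle911")

# main.py and templates/index.html are the names Bokeh looks for in a
# directory app; config.py and data_provider.py are our own modules.
APP_FILES = ["main.py", "config.py", "data_provider.py", "templates/index.html"]


def create_skeleton(root: Path) -> list:
    """Create empty app files under `root`; return their sorted relative paths."""
    for name in APP_FILES:
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)  # also creates templates/
        path.touch(exist_ok=True)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    for entry in create_skeleton(APP_DIR):
        print(entry)
```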

The resulting directory layout is the following:

Indeed, we have a valid Bokeh directory app and can serve it with bokeh serve <app directory>.

On launch, Bokeh prints the link to the development server, and you can follow it to look at our newly born (and empty) Bokeh application.

Handling the data

Socrata API

The data we will use is provided through the Socrata interface. Socrata is a database system widely adopted by government and municipal institutions for both internal and external use, and it is very often the main and easiest way to get open data. To use Socrata from Python, we first need to install the sodapy package (pip install sodapy), which provides Python bindings to Socrata.

Fetching data from the Socrata API is simple: we only need to know the source domain and the dataset identifier. Go to API > API Docs on the Seattle Real Time Fire 911 Calls page and look through the Socrata documentation.

Now we are set up to fetch data:
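A minimal sketch of the request, assuming the sodapy package is installed. The helper names make_client and fetch_records are hypothetical, and the sodapy import sits inside make_client only so that the fetch logic can also be exercised with any stand-in object that has a get method.

```python
DOMAIN = "data.seattle.gov"
DATASET_ID = "kzjm-xkqj"


def make_client(domain=DOMAIN):
    """Create a Socrata client without an application token (open, but throttled)."""
    from sodapy import Socrata  # imported here so fetch_records works without sodapy

    return Socrata(domain, None)


def fetch_records(client, dataset_id=DATASET_ID, limit=100):
    """Return up to `limit` raw records as a list of dicts, one per call."""
    return client.get(dataset_id, limit=limit)


# Usage (performs a network request):
# records = fetch_records(make_client())
```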

Let’s break down the request.

First, the command sodapy.Socrata(domain, app_token) creates a Socrata client. Note that we do not use an application token: you can get one on the corresponding page, but access is open even without it, albeit with some throttling limits.

Second, the line client.get(dataset_identifier) performs the actual data request. client.get supports SQL-like queries, and we will use this functionality later to filter requests by date and time. The result is a list of dictionaries, one per call, and each dictionary contains all the fields we need.

We will need the data as a pandas DataFrame, so let’s convert it right away:
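The conversion itself is essentially a one-liner with pandas. This sketch also pins the set of columns with reindex, which keeps even an empty fetch well-shaped; the field names are assumed from the ones used later in this tutorial, and the helper name is hypothetical.

```python
import pandas as pd

# Assumed Socrata field names for the Seattle dataset.
RAW_FIELDS = ["address", "type", "datetime", "latitude", "longitude"]


def records_to_frame(records, fields=RAW_FIELDS):
    """Build a DataFrame with exactly `fields` as columns, even for zero records."""
    # reindex drops unexpected fields and adds missing ones as all-NaN columns.
    return pd.DataFrame.from_records(records).reindex(columns=list(fields))
```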

That’s all we need to know about Socrata to create the application we want. However, the data itself needs a bit more work before it can be used in the application, and we will add the extra preprocessing steps later.

Data provider

We now know how to fetch data from Socrata, but what about all the containers and views our Bokeh application needs? Before diving into the code, let’s spend a moment designing the data provider:

it should contain data fetching facilities,
fetched data will be stored both for Bokeh to consume and for calculations such as dispatch type statistics. As you remember, we also want the data to be constantly updated, which means we will use Bokeh data sources and views extensively,
the data provider has to update views based on the selected number of recent hours to display,
we need the data provider to calculate dispatch type statistics.

For clarity, some of the functionality outlined above will be implemented in a slightly simplified way. The main caveat to remember is that a real-life application would have some type of persistent storage and non-trivial internal data dispatching; in this tutorial, we stick to a simple in-memory implementation.

So, let’s create a module for the data provider:

The API of the data provider reflects these requirements:
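The original listing is part of the downloadable code; below is only a hedged sketch of what such an interface could look like, using the method names that appear later in this post (fetch_data, update_filter, update_stats, set_hrs). The defaults n_types=10, hrs=1 and max_hrs=168 follow the numbers discussed in the text, and the method bodies are deliberately left unimplemented.

```python
class DataProvider:
    """In-memory provider of Seattle 911 dispatch data (interface sketch only)."""

    def __init__(self, source, dataset_id, n_types=10, hrs=1, max_hrs=168):
        self.source = source          # e.g. "data.seattle.gov"
        self.dataset_id = dataset_id  # e.g. "kzjm-xkqj"
        self.n_types = n_types        # how many most frequent types to count
        self.hrs = hrs                # current window, in hours
        self.max_hrs = max_hrs        # largest window, fetched up front

    def fetch_data(self):
        """Fetch dispatches newer than the last fetch and append them."""
        raise NotImplementedError

    def update_filter(self):
        """Restrict the view to dispatches from the last `hrs` hours."""
        raise NotImplementedError

    def update_stats(self):
        """Recount the `n_types` most frequent dispatch types in the window."""
        raise NotImplementedError

    def set_hrs(self, hrs):
        """Change the window size and refresh the filter and statistics."""
        raise NotImplementedError
```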

While the docstrings are self-explanatory, the parameters of the class initializer deserve a short explanation:

- source and dataset_id are the dataset source and dataset identifier, i.e. data.seattle.gov and kzjm-xkqj; we make this configurable so that the data provider is flexible to possible changes in either the dataset location or identifier,

- n_types is the number of dispatch types to calculate statistics for; we do not want our bar plot to be cluttered, so we will only display n_types most frequent types,

- hrs is the current number of recent hours and max_hrs is the maximum number of hours into the past (168 hours, as we discussed earlier); here we use a trick that simplifies the data provider dramatically: on initialization, the data provider fetches all the data up to max_hrs into the past. After that, we only need to fetch updates and filter the data as needed, without complicated or partial fetches.

Later on, you will see why we need update_filter for dispatch type statistics calculation and how it will be used. We’re ready to code the data provider, but there are still minor issues we need to deal with.

Handling coordinates

Coordinates in the dataset we use are provided as conventional latitude and longitude. Plain and simple? Not really. Map plots in Bokeh require Web Mercator projection coordinates, although Bokeh will still label axes in latitude and longitude units. Thus, we need to convert coordinates first.
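For intuition only, here is the math behind the conversion: spherical Web Mercator (EPSG:3857) maps longitude λ and latitude φ, in radians, to x = R·λ and y = R·ln(tan(π/4 + φ/2)), with R = 6378137 m. This hand-rolled numpy version is an illustration rather than the approach this tutorial takes, and it does not clip latitudes near the poles as Web Mercator tile sets do.

```python
import numpy as np

# Equatorial radius of the WGS84 ellipsoid, used by spherical Web Mercator.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project EPSG:4326 longitude/latitude (degrees) to EPSG:3857 metres."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4 + lat / 2))
    return x, y
```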

To implement the transformation, we will use GeoPandas. All we need to know is the coordinate reference system (CRS) of the original data and of the map plot in Bokeh: easy enough, it’s EPSG:4326 and EPSG:3857, respectively.

However, we will implement the transformation in a more generic way, allowing conversion from any source CRS to any target CRS. This costs almost nothing, and it may be helpful elsewhere or if the Bokeh API changes:
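A sketch of such a generic transformation with GeoPandas, assuming geopandas (with pyproj) is installed. The function name, the default column names longitude and latitude, and the output columns x and y are our assumptions, not necessarily those of the original code.

```python
import geopandas as gpd
import pandas as pd


def transform_coords(df, src_crs="EPSG:4326", dst_crs="EPSG:3857",
                     lon_col="longitude", lat_col="latitude"):
    """Return a copy of `df` with `x`/`y` columns projected into `dst_crs`."""
    # Socrata returns every value as a string, so convert to numbers first.
    lon = pd.to_numeric(df[lon_col])
    lat = pd.to_numeric(df[lat_col])
    # Attach point geometries expressed in the source CRS...
    gdf = gpd.GeoDataFrame(df.copy(), geometry=gpd.points_from_xy(lon, lat), crs=src_crs)
    # ...then reproject them into the target CRS.
    gdf = gdf.to_crs(dst_crs)
    out = pd.DataFrame(gdf.drop(columns="geometry"))
    out["x"] = gdf.geometry.x
    out["y"] = gdf.geometry.y
    return out
```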

Let’s break down this code a bit. First, we extract coordinates from the original DataFrame and transform them to numerical values:

We need pd.to_numeric because Socrata returns all values as strings. Next, we create the actual GeoPandas DataFrame:

The only thing left is the actual transformation, which is easily achieved with coords.to_crs.

Handling time zones

Another minor decision is related to time zones. Seattle 911 dispatch times are provided in the US/Pacific time zone. Depending on user needs, we may want to convert them to the user’s local time (or UTC) or leave them as they are.

To simplify things, we will keep the original, unaltered time zone. The only place in our application where we specify a time zone is the data filters; everywhere else, we use time-zone-naive datetime objects, as they work best with Bokeh.
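A sketch of this approach, assuming pandas: get the current time in Seattle, then drop the time-zone information so it compares cleanly with the naive datetime column. The helper names are hypothetical; America/Los_Angeles is the canonical IANA name for US/Pacific.

```python
from typing import Optional

import pandas as pd

# Canonical IANA name of the zone also known as US/Pacific.
SEATTLE_TZ = "America/Los_Angeles"


def seattle_now() -> pd.Timestamp:
    """Current Seattle wall-clock time as a time-zone-naive Timestamp."""
    # Get the aware time in Seattle, then drop the zone but keep the wall time.
    return pd.Timestamp.now(tz=SEATTLE_TZ).tz_localize(None)


def window_start(hrs: int, now: Optional[pd.Timestamp] = None) -> pd.Timestamp:
    """Naive Seattle time `hrs` hours before `now` (default: the current time)."""
    if now is None:
        now = seattle_now()
    return now - pd.Timedelta(hours=hrs)
```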

Data provider initialization

So far, we have taken care of coordinate transformation in our data provider. Let’s go on and fully implement the data provider given all the requirements outlined above.

First, let’s create an initializer for the data provider.

Note that the code blocks show only incremental changes; the complete code will be provided for download at the end of the post.

So, in the initializer we set all the necessary fields and create all the containers:

The code is self-explanatory, except for the start_time calculation. Note that on the first fetch we get all the data for the maximum number of hours. On each subsequent fetch, we only need complementary updates.

We could postpone the creation of containers, but creating them here lets us initialize the data provider with the fetch_data method alone and skip any custom initialization code.

Note also that we make things a bit cleaner by adding class attributes (COLS and RAW_COLS), so that we do not need to spell out the full list of columns every time.

Fetching data

Now, let’s go for data fetching:
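A sketch of the incremental fetch, written as a free function so it can be tried with a stand-in client. sodapy forwards the where and order keyword arguments as the SoQL $where and $order parameters; the function name and the limit value are our assumptions (Socrata returns at most 1,000 rows per request by default, so a week of data needs a larger limit).

```python
import pandas as pd

DATASET_ID = "kzjm-xkqj"


def fetch_since(client, start_time, dataset_id=DATASET_ID, limit=50000):
    """Fetch dispatches newer than `start_time` (naive Seattle time), oldest first."""
    # SoQL floating-timestamp literal, e.g. '2019-08-04T09:00:00'.
    where = "datetime > '{}'".format(start_time.strftime("%Y-%m-%dT%H:%M:%S"))
    records = client.get(dataset_id, where=where, order="datetime ASC", limit=limit)
    df = pd.DataFrame.from_records(records)
    if not df.empty:
        # Socrata returns strings; turn the column into real datetimes.
        df["datetime"] = pd.to_datetime(df["datetime"])
    return df
```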

The actual fetch from Socrata is done using client.get with a where clause, which works exactly as you would expect from an SQL-like language: you provide a condition or a list of conditions as a string. Note that we sort the result by datetime, which keeps our data container ordered as we append new data.

After getting the data from Socrata, we create a DataFrame from it and check whether we actually have any new data. If we do, we proceed to data preparation: first, we transform the spatial coordinates so that we can plot them on a map; second, we make sure datetime is an actual datetime and not a string; finally, we add the newly fetched data to the data source.

You may ask why we need an else clause. This is a tricky question that requires some knowledge of Bokeh. The problem is that we stream to our data source during initialization, when no models are subscribed to it yet. As a result, this first stream may not propagate to the models later on, and we may end up with unsynchronized updates to the models.

Tables are the first candidates to suffer from this. Thus, we add a trivial stream, which will be called even if there’s no data at all, which in turn will force everything to be updated and in sync after all the models are subscribed.
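The trivial-stream trick might look like this, assuming Bokeh 3.x. The helper name is hypothetical; the point is that the empty branch still calls ColumnDataSource.stream, with zero-length columns, so stream is called even when nothing new arrived, as described above.

```python
def stream_new_rows(data_ds, df, cols):
    """Append `df[cols]` to `data_ds`; stream empty columns if `df` has no rows."""
    if df.empty:
        # Trivial stream: appends nothing, but stream() is still called.
        data_ds.stream({col: [] for col in cols})
    else:
        data_ds.stream({col: df[col].tolist() for col in cols})
```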

Updating the filter

At this point, we still need to update the data source view so that all calls that happened more than hrs hours ago are filtered out. This is done with update_filter, which we will create shortly.

After that, we need to update the dispatch type statistics, since they also cover only the latest hrs hours.

You will see in a moment why we need time_filter. Long story short, it simplifies things: we have to calculate it for the data source view anyway, so why not reuse it?

So, the update_filter method works like this:
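The filtering logic boils down to a boolean mask over the datetime column. Below is a hedged pandas sketch with a hypothetical helper name; in the provider, now would be the naive current Seattle time. With Bokeh 3.x, the mask could then be applied with data_view.filter = BooleanFilter(booleans=mask.tolist()).

```python
import pandas as pd


def recent_mask(datetimes, now, hrs):
    """True for dispatches that happened within `hrs` hours before `now`."""
    times = pd.to_datetime(pd.Series(datetimes))
    cutoff = now - pd.Timedelta(hours=hrs)
    return (times > cutoff).to_numpy(dtype=bool)
```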

It first gets the current time in the Seattle time zone and then filters out all calls that happened more than self.hrs hours ago. We update the filters of the data source view data_view directly: Bokeh keeps track of this and performs all the steps needed on the client side.

Updating dispatch types statistics

Let’s move on and code update_stats method:
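A sketch of the statistics step with pandas, reusing the same boolean mask; the helper name and the type and count column names are our assumptions. Note that it naturally returns fewer than n_types entries when fewer types occurred in the window.

```python
import numpy as np
import pandas as pd


def top_types(types, mask, n_types=10):
    """Counts of the `n_types` most frequent types among rows where `mask` is True."""
    selected = pd.Series(types)[np.asarray(mask, dtype=bool)]
    counts = selected.value_counts().head(n_types)
    # Plain lists, ready to be assigned to a ColumnDataSource's .data.
    return {"type": counts.index.tolist(), "count": counts.tolist()}
```

In the provider, the result would refill the stats source wholesale (for example, type_stats_ds.data = top_types(...)).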

Now you see why we used the time_filter trick: it lets us calculate value counts with simple boolean indexing on the data DataFrame. After that, we just refill type_stats_ds with the new data.

You may ask: why don’t we use some advanced technique here to handle updates to type_stats_ds?
Patching may be complicated, since we may have fewer than n_types different types (imagine there were only 4 different dispatch types during the most recent hour). It is simpler to refill this data source directly and update the glyph elsewhere.

Implementing `set_hrs`

The last thing we need in the data provider is a way to update hrs:
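A sketch of set_hrs written as a free function over any provider-like object, so it can be tried in isolation. The lower bound of 1 hour matches the slider minimum discussed earlier but is our assumption.

```python
import numpy as np


def set_hrs(provider, hrs):
    """Clamp `hrs` to [1, provider.max_hrs], store it, and refresh derived state."""
    provider.hrs = int(np.clip(hrs, 1, provider.max_hrs))
    provider.update_filter()  # recompute which rows the view shows
    provider.update_stats()   # recount dispatch types for the new window
    return provider.hrs
```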

In this method, we enforce bounds with np.clip and then update filter and dispatch types statistics.

Congratulations, we have just completed the most complicated part of our application! Take a minute to play with the data provider and understand how it works.

Of course, we can still improve the data provider and make the code more robust, since there are several ways it can be misused. For example, it’s not thread-safe. Also, imagine that someone calls update_filter without update_stats: this would leave the internal containers out of sync.

Application code

Now that we have all the data managing code we need, we can start with the application itself. Let’s recap what we have so far in our configuration file:
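The configuration file might look like the following; the constant names are hypothetical, while the values come from earlier in this post.

```python
# config.py: application-wide settings (hypothetical names, values from the text)
SOURCE = "data.seattle.gov"  # Socrata domain
DATASET_ID = "kzjm-xkqj"     # Seattle Real Time Fire 911 Calls
N_TYPES = 10                 # bars in the dispatch type plot (Top-10)
HRS = 1                      # default window, in hours
MAX_HRS = 168                # maximum window: one week
```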

Some of these configuration options will be used to actually create the data provider inside the application.

Application HTML template

To lay out the application, we will use an HTML template rather than Bokeh’s built-in tools (like column or row). A custom HTML template provides more flexibility and allows us to use any styling we want.

The first tool we will use in our template is the Bootstrap CSS framework, which will help us place the elements of the app nicely on a grid. Bokeh templates use the Jinja templating engine, so it’s not a problem to add any custom elements to the HTML. If you have done any web development, the template below will look familiar:

Bokeh provides a base template, which user applications should extend. This base template has sections where custom content can be placed.

For example, the preamble section goes into the head element of the final HTML page, while the content section goes into the body.

On render, Jinja assembles the final HTML page for us behind the scenes. While Jinja is an extremely powerful templating engine, we will only use two of its tools to create a Bokeh application with custom templates: blocks and the embed macro.

Let’s break the template down. In the preamble section, we add Bootstrap CSS via a CDN link. Having done that, we can use all the Bootstrap functionality inside our template. So, we do exactly this with row and col- classes, placing elements on the grid.

Explaining the Bootstrap 12-column grid layout is out of scope for this series, but the general idea is very simple: we place elements in containers and specify their width using relative quantities (thus, col-6 takes half of the parent container). Various devices and screen resolutions are handled by the Bootstrap grid system. If you’re unfamiliar with Bootstrap, please check the documentation.

Another important element is embed macro. It allows us to embed a model into an HTML page. At render time {{ embed(roots.main_plot) }} will be replaced by a model named main_plot (or, to be more precise, by a client-side twin of a Python model). So, we need to assign names to models for them to be embeddable into the HTML template.

Application models

We now have the data management code and the application HTML template. Creating the application itself is now pretty straightforward. Let’s start with a table for the textual representation of the data:

We create the table following the approach outlined in Part II of the series; the only difference is the formatter for the datetime field. We also hide the row index, as we do not need it. Note that we add name="table" so that the template engine can find the table.

Next comes the map plot:

For the map plot, we add some specific configuration options. First, we specify the axis types with x_axis_type="mercator", y_axis_type="mercator". Second, we make the x and y axes use the same screen scale with match_aspect=True, since we do not want the map to be distorted.

To have actual tiles on the plot we need to select a tile provider. We will use CartoDB: it’s free and looks just fine visually. Calls are displayed with circles on the map and we provide both data source and data view so that both table and map display the same events for the selected number of hours.

The only model left is the bar plot for dispatch type statistics:
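A sketch of the bar plot, again assuming Bokeh 3.x; the name stats_plot and the type and count column names are assumptions. Passing a list of strings as x_range is what makes Bokeh build a FactorRange.

```python
from bokeh.plotting import figure


def make_stats_plot(stats_ds, dispatch_types):
    """Bar plot of dispatch type counts; factors come from `dispatch_types`."""
    # A list of strings as x_range makes Bokeh create a categorical FactorRange.
    p = figure(x_range=list(dispatch_types), name="stats_plot",
               height=350, toolbar_location=None, tools="")
    p.vbar(x="type", top="count", width=0.8, source=stats_ds)
    return p
```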

Note that for this kind of plot we need to specify x_range=data_provider.dispatch_types, so that Bokeh knows that we actually want a categorical X-axis. Under the hood, Bokeh creates what is called a FactorRange to accommodate the categorical data.

Application controls

We are now one step away from launching our application. The last thing we need to finalize the layout is to add a slider to select the number of hours. Let’s add one:
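A sketch of the slider; the title and name are hypothetical, and the bounds follow the defaults discussed earlier (1 to 168 hours).

```python
from bokeh.models import Slider

HRS, MAX_HRS = 1, 168  # default and maximum window, in hours

hours_slider = Slider(start=1, end=MAX_HRS, value=HRS, step=1,
                      title="Recent hours", name="slider")
```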

Now that we have all the elements, we can add each of them to the default document with curdoc().add_root(model).

Let’s launch our application in dev mode (so that Bokeh will reload application on any code change) and see what we have so far:

As you can see, our application actually displays the entire layout. It’s still static, because we have not connected any callbacks yet; we will do this soon. Another thing our app lacks is style. We will work on this at the end of this tutorial and make our dashboard slick and visually lightweight.

To make our application truly dynamic we need to create a periodic callback, which will trigger regular updates to the data we display. On the data provider side, updates are handled by fetch_data, so let’s create the callback function and connect it to the document.

First, we will add the update interval to the configuration file:

Next, we create a function and connect it to the document as a periodic callback. It may seem that we only need to call data_provider.fetch_data regularly, but we also need to update the dispatch type statistics plot, which has to be done manually (although the corresponding data source is created and maintained by the data provider).
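A hedged sketch of the periodic update, assuming Bokeh 3.x. The helper names and the 60-second interval are hypothetical; in main.py the registration would be done on curdoc().

```python
UPDATE_INTERVAL = 60_000  # milliseconds; hypothetical value for config.py


def make_update(data_provider, stats_plot):
    """Build the zero-argument callback that Bokeh will call periodically."""
    def update():
        data_provider.fetch_data()
        # The stats source is refilled by the provider; the axis factors
        # must be refreshed by hand to match the new top dispatch types.
        stats_plot.x_range.factors = data_provider.dispatch_types
    return update


def register_updates(doc, update, interval_ms=UPDATE_INTERVAL):
    """Schedule `update` on `doc` every `interval_ms` milliseconds."""
    return doc.add_periodic_callback(update, interval_ms)


# In main.py (hypothetical wiring):
# register_updates(curdoc(), make_update(data_provider, stats_plot))
```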

Remember, we did this to bypass any complicated patching, and now we pay a much smaller price for this.

Updating the dispatch type statistics plot is actually quite simple: we only need to change the X-axis factors (since they may change over time). Updates to the data source for dispatch type statistics are done inside the data provider and are propagated by Bokeh without our intervention.

If we do not update X-axis range, Bokeh will just propagate the data updates from the underlying data source, and glyph data will be updated, while X-axis factors will remain in their previous state, which may or may not reflect the updated Top-10 dispatch types.

Next is the slider callback. The slider changes the number of recent hours to display. Like any other widget, it accepts on_change callbacks, which are called with (attr, old, new) arguments. In this case, attr is "value" (the current value of the slider), and we need to call the set_hrs method of our data provider whenever the slider value changes:
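A sketch of the slider callback as a closure over the data provider and the bar plot; the factory name is hypothetical. In main.py it would be attached with slider.on_change("value", ...).

```python
def make_slider_callback(data_provider, stats_plot):
    """Build an on_change callback with Bokeh's (attr, old, new) signature."""
    def on_hours_change(attr, old, new):
        data_provider.set_hrs(new)
        # As with the periodic update, refresh the categorical axis by hand.
        stats_plot.x_range.factors = data_provider.dispatch_types
    return on_hours_change
```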

Note that we again force the dispatch type statistics plot to refresh.

So far, all the functionality of our application is in place. You can launch it or reload the browser tab if you launched the Bokeh server with --dev.

Play with the app for a moment to identify if anything is missing or doesn’t look as smooth as expected. The application is operational now and contains all the data management and plotting functionality. The only thing left behind is styling. We will now polish the application: create custom styles and theme and add minor tweaks to the layout.

Styling

There are several ways to customize the styling of Bokeh applications. We will use several of them because we need to customize general appearance, plots, and table all at once.

Custom theme and layout

Let’s start with the custom theme. Bokeh expects it in a file named theme.yaml in the application directory, so let’s create one:

Inside the theme file, Bokeh expects the attributes for various models, which you want to change application-wide. Of course, this is a matter of personal preferences and we encourage you to tweak it according to yours, but we want to style at least the following first:
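The same kind of per-model settings can be expressed as a Python dict and loaded with bokeh.themes.Theme(json=...), which is handy for experimenting before moving them into theme.yaml. The attribute values below are illustrative, not the article’s exact theme; keying by Plot rather than figure makes the settings apply to every plot, since themes are matched along the model’s class hierarchy.

```python
from bokeh.themes import Theme

# Illustrative values only; in the directory app these live in theme.yaml.
THEME_ATTRS = {
    "attrs": {
        "Plot": {"background_fill_color": "#fafafa", "outline_line_color": None},
        "Axis": {"axis_line_color": None, "major_tick_line_color": None},
        "Grid": {"grid_line_dash": [6, 4], "grid_line_alpha": 0.5},
        # Only the bar plot uses a FactorRange, so this adds room around its bars.
        "FactorRange": {"range_padding": 0.1},
    }
}

theme = Theme(json=THEME_ATTRS)
```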

The theme file is straightforward: it contains the per-model attributes, which you want to set, but do not want to do that in Python code.

Note that we add the FactorRange model separately, so that it applies only to the dispatch type statistics plot and makes more room around the bars.

For our table widget, we provide more specific constraints, so that it fits nicely, given the HTML grid size.

Unfortunately, there’s no way to specify attributes for the X and Y axes separately in the theme file. So, to rotate the X-axis labels of stats_plot, we have to resort to Python code:
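For example, with Bokeh 3.x the tick labels of a categorical axis can be tilted like this; the factor names are illustrative only.

```python
import math

from bokeh.plotting import figure

# Illustrative factors; in the app they come from the data provider.
stats_plot = figure(x_range=["Aid Response", "Medic Response"], name="stats_plot")
# Tilt the categorical tick labels so long dispatch type names do not overlap.
stats_plot.xaxis.major_label_orientation = math.pi / 4
```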

Effectively, we minimized the styling information in the Python code and moved it to the theme file, improving the modularity of the app and decoupling the style layer from the view layer.

It would be cool to set attributes by model name and not only model type, of course, but such functionality is not yet implemented, so we have to specify any per-plot attributes in code.

So far, our application looks like this:

Seattle 911 calls dashboard with basic styling

This is already not that bad, but there are still many things we can improve. Let’s start by adding a bit more spacing to the layout. To do this, we will add classes to HTML elements directly. Bootstrap provides the mx- and my- sets of classes, which set horizontal and vertical margins, respectively. Let’s add my-4 to the first two row elements of our HTML template:

Custom font

The next obvious step would be to change fonts. We will use Quicksand font via Google fonts, but you may prefer another custom font. The point is to demonstrate the approach in general. To customize font for the entire app, we need to do several things. First, we add Quicksand from Google Fonts as a <style> element inside preamble section of our HTML template:

Next, we set the new custom font in our theme.yaml for tick labels:

To force all other elements to use Quicksand, we will use a shortcut: we know that Bokeh embeds its elements in a <div> element with the class bk-root. Let’s leverage that and add another entry to the newly created <style> section in the preamble:

Now, finally, it’s all Quicksand. This is not the cleanest way, but definitely the simplest one. Our application is starting to take shape.

Seattle 911 calls dashboard with basic styling and custom fonts

Tooltips

New font adds some style to the application, but look at our tooltips on the map plot:

Not informative at all. Let’s refactor them to actually display something useful. For that, in our main.py we will create a custom tooltip template:
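A sketch of the custom tooltip and hover tool, assuming Bokeh 3.x and a data source with type, datetime and address columns. The CSS class tooltip-body and the helper name are hypothetical; the formatters entry tells Bokeh to format @datetime as a date rather than a raw number.

```python
from bokeh.models import HoverTool

# @field placeholders are filled from the hovered row; the {...} after @datetime
# is a strftime-style format applied by the "datetime" formatter declared below.
TOOLTIP = """
<div class="tooltip-body">
  <b>@type</b><br>
  @datetime{%Y-%m-%d %H:%M}<br>
  @address
</div>
"""


def add_hover(plot, renderer):
    """Attach a hover tool with the custom tooltip to `renderer` on `plot`."""
    hover = HoverTool(tooltips=TOOLTIP, formatters={"@datetime": "datetime"},
                      renderers=[renderer])
    plot.add_tools(hover)
    return hover
```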

Bokeh allows specifying data fields in templates like this. At render time (i.e., when you hover over a glyph to which this tooltip is attached), @type will be replaced with the actual dispatch type, and likewise for the other data fields. To attach this tooltip to the hover tool on the map plot, we do the following:

We did three simple things:

removed hover from tools list,
created one manually with custom tooltip template,
and attached the newly created tool to the plot.

Let’s also add some room around the text with custom style:

Now tooltips look much better:

Styling the table widget

To finalize styling, we may want to change how the table is rendered. Bokeh renders tables with a variant of SlickGrid, which has its own set of classes. To change how column headers look, we add new entries to our custom styles:

You may ask how we know which classes to modify. There is no rule, since Bokeh uses a lot of different classes: if you want to change something, you go to the Bokeh CSS files (for example, here) and look for the class responsible for the visual attribute you want to change.

As a final touch to the application, let’s add some explanatory text just after the application title:

Bonus: using Font Awesome icons

Although our application is fully functional and styled, we can add a final touch: icons from the Font Awesome collection that give a visual cue to dispatch types. We will add icons for some medical types, car incidents, and fires.

Before starting to use Font Awesome, we need to include the corresponding CSS:

So, what do we want to get? In the table column that shows the dispatch type, we want not only the text but also a corresponding icon. To achieve this, we first need to determine which icon to use. For this, we will add a custom JavaScript function to our template:

As you can see, this function is pretty straightforward: it checks the type first and then emits HTML that includes the corresponding icon (consult the Font Awesome gallery to see how they look).

Next, table cells allow custom HTML formatting. HTMLTemplateFormatter is responsible for this. Moreover, inside HTML template for a cell, we may provide a JavaScript template with Underscore.js.

Let’s look at how this works. Our cell HTML template will look like this:

At render time, if attached to the type column, this formatter will recognize <%= … %> as a template and execute whatever JavaScript is inside. To attach this formatter to the type column in our table, we simply specify it at column creation, i.e., we create the column like this

instead of
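A sketch of both variants, assuming the page template defines a global JavaScript helper (hypothetically named typeIcon here) that returns the icon HTML. Inside the Underscore.js template, value holds the cell’s value.

```python
from bokeh.models import HTMLTemplateFormatter, TableColumn

# `typeIcon` is a hypothetical global JavaScript helper from the page template.
# <%= ... %> is evaluated as an Underscore.js template; `value` is the cell value.
ICON_TEMPLATE = "<%= typeIcon(value) %>"

# With the icon formatter:
type_column = TableColumn(field="type", title="Type",
                          formatter=HTMLTemplateFormatter(template=ICON_TEMPLATE))

# Instead of the plain-text column:
plain_type_column = TableColumn(field="type", title="Type")
```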

And that’s it. Now everything will happen on its own. Unfortunately, there’s no simple way to add icons to the map plot tooltips, because custom JavaScript tooltips escape their contents and we would get the <i> element as plain text instead of an actual icon.

Conclusions

After all the development and improvements, our application looks great:

To see how it works once deployed, you can visit Seattle 911 calls; the code for the application is available on GitHub.

Our dashboard does the job, it’s stylish and it’s not that large at all. It even has linked selections between table and map plot, and it has them for free!

Note that most of the code goes into the data provider, not into plotting or layout.

Wrap-up

We spent a lot of time exploring Bokeh and building various plots and layouts. You learned how to create basic and not so basic plots in Bokeh, how to use them in Jupyter notebooks, and, most importantly, how to create standalone applications with Bokeh, from simplest to quite advanced.

It’s not the end of the story, however. Often, interactive dashboards are not the products on their own, but serve as building blocks for larger applications. Fortunately, Bokeh server is flexible enough to be embedded into Flask or Django applications, and with the combined power of Python and JavaScript you can achieve whatever complicated goal you have.

Hopefully, after completing this tutorial series you have a strong foundation to dive into advanced Bokeh functionality.