Hands-On Lab

Creating BI Dashboards inside a Databricks notebook

Plotly + PySpark + Databricks

Ali Abbas
Geek Culture


Databricks is becoming the big data analytics solution of choice for more and more enterprises. I have been working with Databricks for more than a year and have handled migrations from other data analytics platforms to Databricks.

Databricks provides a rich feature set for building interactive dashboards: it connects SQL data sources to its compute and provides a UI for running Spark SQL and fiddling with the charts. More information about it here.

This blogpost in a nutshell

In this post I will demonstrate how to create a dashboard with multiple interactive graphs and view it right inside your Databricks notebook. I will use Plotly to generate all the graphs. The trick is to use Plotly's subplot feature and add each graph as a trace to one parent plot. Finally, I generate HTML from the parent plot and render it with the displayHTML method available in Databricks.

Calculating the size of the dashboard

Our first step is to determine the size of the overall plot, that is, the dashboard. Think of it as a grid or 2D array in which each element is a subplot. We need to find the number of rows and columns of this grid. For simplicity, we assume that every graph has the same size on the dashboard.

import math

def get_dashboard_dimensions(no_of_plots):
    y = math.sqrt(no_of_plots)
    if y.is_integer():
        # Perfect square: use an n x n grid (cast to int, since sqrt returns a float)
        x_width = y_width = int(y)
    else:
        z = int(y)
        y_width = z
        x_width = z + 1
        # A z x (z+1) grid may still be too small; if so, widen it by one
        if (y_width * x_width) < no_of_plots:
            y_width += 1
    return (x_width, y_width)
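To get a feel for the grid sizes this produces, here is a minimal sketch of a more compact formulation of the same idea: take the ceiling of the square root as one dimension and fit the other dimension to the number of plots. The name grid_shape is hypothetical and not part of the original code; it is meant only as an illustration and assumes Python 3.8+ for math.isqrt.

```python
import math

def grid_shape(no_of_plots):
    """Return (rows, cols) of a near-square grid with room for every plot."""
    if no_of_plots < 1:
        return (0, 0)
    root = math.isqrt(no_of_plots)
    # ceil(sqrt(n)) using integer arithmetic only
    rows = root if root * root == no_of_plots else root + 1
    # ceil(n / rows) using integer arithmetic only
    cols = -(-no_of_plots // rows)
    return (rows, cols)

# Print the grid chosen for a few plot counts
for n in (1, 2, 3, 4, 5, 7, 10):
    print(n, grid_shape(n))
```

For example, three plots end up on a 2 x 2 grid, five plots on a 3 x 2 grid, and ten plots on a 4 x 3 grid.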

Populating the Dashboard

The next step is to use Plotly's subplots module and add all of our graphs as traces to the dashboard. We use the make_subplots method of the subplots module.

from plotly.subplots import make_subplots

def populate_dashboard(plot_array):
    # Grid large enough to hold every plot
    x_width, y_width = get_dashboard_dimensions(len(plot_array))
    plot_counter = 0
    dashboard = make_subplots(rows=x_width, cols=y_width, start_cell="top-left")
    # Fill the grid row by row until we run out of plots
    for x in range(1, x_width + 1):
        for y in range(1, y_width + 1):
            if plot_counter < len(plot_array):
                # add_trace with row/col replaces the deprecated append_trace
                dashboard.add_trace(plot_array[plot_counter], row=x, col=y)
                plot_counter += 1
    return dashboard
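The nested loops above fill the grid in row-major order: left to right, then top to bottom. As a small illustration, the same mapping from a plot's position in the list to its grid cell can be written with divmod. The helper name cell_for is hypothetical and is not part of the original code.

```python
def cell_for(index, cols):
    """Return the 1-based (row, col) of the index-th plot in a row-major grid."""
    row, col = divmod(index, cols)
    return (row + 1, col + 1)

# Three plots on a grid that is 2 columns wide
for index in range(3):
    print(index, cell_for(index, cols=2))
```

With three plots and two columns, the plots land in cells (1, 1), (1, 2) and (2, 1), which matches the order in which the loop assigns them.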

Displaying the Dashboard inside Databricks Notebooks

Once the dashboard is ready, our final step is to export it to HTML with the plot method of Plotly's offline module. With the HTML in hand, it's just a matter of calling the Databricks displayHTML method to render the dashboard right inside our notebook.

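import plotly.offline as pyo

def display_dashboard(dashboard):
    # output_type='div' returns the HTML as a string instead of writing a file
    inner_html = pyo.plot(dashboard, output_type='div')
    # displayHTML is provided by the Databricks notebook environment
    displayHTML(inner_html)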

Driver Program

To test this code yourself, you can use the following driver program. It builds several graphs from a small sample dataset and passes them as a list to the methods above to render the dashboard.

from plotly.graph_objs import Bar, Scatter

data_x = ["a","b","c","d","e","f","g","h","i","j"]
data_y = [2,3,6,3,4,6,7,8,2,1]
plot1 = Bar(x=data_x, y=data_y)
plot2 = Scatter(x=data_x, y=data_y)
# A plain dict is used for the marker; the legacy Marker class is deprecated
plot3 = Scatter(x=data_x, y=data_y, mode='markers', marker=dict(color='black'))
plot_array = [plot1, plot2, plot3]

generated_dashboard = populate_dashboard(plot_array)
display_dashboard(generated_dashboard)

In the above code we generate three different plots (bar, line, scatter) from the same data and pass them to our method, which builds a dashboard out of them. We then display the dashboard as HTML right inside the notebook.

The resulting dashboard places the three plots on a 2 x 2 grid, leaving the fourth cell without a trace.

Exporting the Dashboard

Because we have the entire HTML of the dashboard available, exporting it is straightforward. We just need to write the HTML contents to a .html file and place it in any of the external mounts. The file is self-sufficient and can also be embedded as an iframe or widget in existing web apps.

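import plotly.offline as pyo

def export_dashboard_html(dashboard, file_path):
    # Render the dashboard to an HTML string, then write it to file_path
    inner_html = pyo.plot(dashboard, output_type='div')
    with open(file_path, 'w') as f:
        f.write(inner_html)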
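The string returned with output_type='div' is an HTML fragment rather than a complete page. Browsers will generally render it as is, but if you prefer a full document with an explicit character set and title, a minimal sketch could wrap the fragment before writing it. The helper name wrap_div_as_page, the sample fragment and the temporary output path are all assumptions made for illustration.

```python
import os
import tempfile

def wrap_div_as_page(div_html, title="Dashboard"):
    """Wrap an HTML fragment in a minimal standalone HTML document."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><meta charset=\"utf-8\"><title>{title}</title></head>\n"
        f"<body>\n{div_html}\n</body>\n"
        "</html>\n"
    )

# Stand-in for the string returned by pyo.plot(dashboard, output_type='div')
sample_div = "<div>dashboard goes here</div>"
page = wrap_div_as_page(sample_div)

# Write to a temporary directory; on Databricks this would be a mount path
out_path = os.path.join(tempfile.mkdtemp(), "dashboard.html")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(page)
```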

Customising the Dashboard layout and adding more features

Once the dashboard renders inside our notebook, we can use any other feature that Plotly supports.

In our sample code we are rendering every graph with equal overall height and width. We can take this one step further and create a custom layout within our subplot. More information about it here.

We can also add dropdowns and other widgets to the graphs to filter them interactively based on selected criteria. More information about it here.

Alternative ways

There are other ways of doing this as well. Instead of using the subplots module, we could generate HTML for each graph separately, append all of it to one file and then render the entire file as HTML in the notebook. One drawback is that, by default, every graph's HTML carries its own copy of the plotly.js code, so the file becomes huge even with 4–5 graphs.

We could also have achieved the same outcome using Databricks' built-in visualisation and plotting capabilities. That would tie our code to Databricks to some extent, and it would need rewriting if we wanted to port it to a different analytics engine.

Ali Abbas
Geek Culture

Architect by role, developer by heart! I help organisations get best of Big Data on Cloud