Beginner’s Guide to Data Visualization with Bokeh
Building web-based visualization in Python from scratch
Bokeh is a data visualization library for Python that produces highly interactive graphs and plots. What sets it apart from other Python plotting libraries is that its output is a web page: when we run the code in a Python editor, the resulting plot opens in the browser. This makes it easy to embed a Bokeh plot in any website built with Django or Flask.
Most of us are familiar with the iris dataset, which holds morphological data for three flower species: Setosa, Virginica, and Versicolor. Let’s plot it from scratch, as a scatter plot of petal length against petal width, while learning the basics of Bokeh.
First things first
Installing the Bokeh library.
pip install bokeh
Importing necessary packages.
from bokeh.plotting import figure
from bokeh.io import output_file, show
from bokeh.sampledata.iris import flowers
from bokeh.models import HoverTool, ColumnDataSource
- bokeh.plotting is an interface for creating visual glyphs; from it we import figure, which acts as the container holding our plots.
- We need output_file and show from bokeh.io to render our graph. output_file specifies the path of the HTML file the graph is written to, and show renders it.
- Bokeh comes with sample data to work with. We import the iris dataset, which is a Pandas DataFrame, so it is the same as reading a CSV file with Pandas.
- HoverTool displays data when we hover the mouse pointer over points of the plot, and ColumnDataSource is Bokeh’s counterpart to a DataFrame. We will discuss it in more detail later.
Defining the output file path
Defining the output file with output_file is the first thing we have to do and show() is the last thing. Any kind of plotting and customizing has to be done between these two lines.
output_file('iris.html')
'''Plotting and customizing code'''
show()
We are going to specify the output path as ‘iris.html’ which will create an HTML file in the same directory we are working in. This HTML file will have the graph we are going to plot now and can be shared and used independently of the Python code.
Creating a figure object
We need to create a figure object which is the container that holds our graph. Any kind of plotting has to be done with reference to figure object ‘ f ’.
output_file('iris.html')
f = figure()
'''Plotting and customizing code'''
show(f)
Let’s plot
Now we will plot our first graph. It will serve as the base whose properties we tweak to customize the graph.
output_file("iris1.html")
f = figure()
f.circle(x=flowers['petal_length'], y=flowers['petal_width'])
show(f)
circle is a method of the figure object that draws a scatter plot; x and y are the parameters for the x-axis and y-axis values respectively. The result is a very basic plot.
Circle is just one of many plotting styles; Bokeh supports plenty of others, which are listed in the Bokeh documentation.
As you can see, the graph is an HTML file viewed in a browser. This is the advantage of Bokeh: we can embed any such graph in our websites, and it can also be viewed independently.
Bokeh tools
Bokeh provides a few tools for interacting with the plot. By default, they are arranged vertically just outside the plot border, at the top right corner.
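The toolbar can also be configured when the figure is created. Here is a minimal sketch, assuming we want only a handful of tools and the toolbar placed above the plot. The tool names and the toolbar_location option are standard Bokeh settings, but the particular selection is only an illustration:

```python
from bokeh.plotting import figure

# Pick the tools explicitly instead of relying on the defaults,
# and move the toolbar from the right-hand side to the top.
f = figure(
    tools="pan,wheel_zoom,box_zoom,reset,save",
    toolbar_location="above",
)

# Each entry in the toolbar is a tool object such as PanTool or SaveTool.
tool_names = [type(tool).__name__ for tool in f.toolbar.tools]
print(tool_names)
```

Setting toolbar_location=None hides the toolbar altogether, which can be handy for plots embedded in a tight page layout.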
Adding background-color
Let’s increase the height and width of the plot and add a background color. I will show only the code required in each customization section; you can find the full code at the end of this article. All the customization code should come after ‘f.circle()’.
f.plot_width=1100
f.plot_height=650
f.background_fill_color='olive'
f.background_fill_alpha=0.3
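These properties are pretty self-explanatory: plot_width and plot_height change the width and height respectively, background_fill_color fills the background with olive, and background_fill_alpha sets the opacity of that color (0 is fully transparent, 1 is fully opaque). In Bokeh 3.0 and later, plot_width and plot_height are named width and height. Irrespective of the plotting style, the customization properties remain the same apart from a few minor changes.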
Adding a title and axis labels
A plot is readable only when it has a title. It is the first thing any human eye will search for in a plot.
f.title.text='Iris Morphology'
f.title.text_color='Olive'
f.title.text_font='times'
f.title.text_font_size='25px'
f.title.align='center'
‘title’ is the property used here. It has plenty of attributes of its own to tweak, and I recommend experimenting with them.
Now we will add x-axis and y-axis labels to make it more readable.
f.xaxis.axis_label='Petal Length'
f.yaxis.axis_label='Petal Width'
f.axis.axis_label_text_color='blue'
f.axis.minor_tick_line_color='blue'
f.yaxis.major_label_orientation='vertical'
f.axis.major_label_text_color='orange'
ColumnDataSource
We can think of ColumnDataSource as Bokeh’s version of a DataFrame: just as Pandas operations work best with a DataFrame, Bokeh works best with a ColumnDataSource. We will create ColumnDataSources out of our flowers DataFrame to make our life easier with Bokeh.
We will create a separate ColumnDataSource for each flower species. This way we can plot the Bokeh graph easily.
setosa = ColumnDataSource(flowers[flowers["species"]=="setosa"])
versicolor = ColumnDataSource(flowers[flowers["species"]=="versicolor"])
virginica = ColumnDataSource(flowers[flowers["species"]=="virginica"])
A ColumnDataSource is essentially a wrapper around a Python dictionary, available as its data attribute, that maps each column name to its values. Let’s have a look at the setosa ColumnDataSource.
We will be using these ColumnDataSources from now on.
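To see what such a source holds, we can print its data attribute. This is a minimal sketch that assumes a tiny stand-in DataFrame with the same column names as flowers (the values are illustrative, not the real dataset), so that it runs without the sample data installed:

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Stand-in for bokeh.sampledata.iris.flowers (values are illustrative).
flowers = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

setosa = ColumnDataSource(flowers[flowers["species"] == "setosa"])

# .data maps each column name to its values; the DataFrame index
# is carried along as an extra column.
for name, values in setosa.data.items():
    print(name, list(values))
```

Each key printed here is a name that can later be passed as a string to the plotting calls or referenced in a hover tooltip.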
Adding color and legend
We will display the scatter points of each species in a different color: Setosa in red, Versicolor in green, and Virginica in blue. To do that, we have to plot the three species separately with three f.circle() calls on the same figure. This also gives us the flexibility to add a legend entry for each species.
The source is a ColumnDataSource, and the x and y values are the keys of that ColumnDataSource which we would like to plot. Notice that we have also given each scatter point a size to show the density of each point; the only reason we multiply each value by 4 is to make the points more visible on the graph. The line_dash parameter makes the border of each scatter point dashed rather than solid, and legend_label adds a legend entry.
Notice that the legend overlaps the scatter points and does not look good either. Let’s customize it.
f.legend.location = 'top_left'
f.legend.label_text_color = 'olive'
f.legend.label_text_font = 'times'
f.legend.border_line_color = 'black'
f.legend.margin = 10
f.legend.padding = 18
Adding hover effects
I would like to see more information when hovering over the scatter points, such as the petal length and width (which are basically the axis coordinates) and an image of the corresponding species.
Remember we imported HoverTool at the beginning? This is where it comes into play. We can do a lot with HoverTool. To achieve the above goal, we need to write some HTML for HoverTool and then add the HoverTool to the figure object.
HTML code to add species image
<div>
<img
src="@imgs" height="42" alt="@imgs" width="42"
style="float: left; margin: 0px 15px 15px 0px;"
border="2">
</div>
Wait, where does “@imgs” in the “src” attribute come from? We need to add the image URL of the respective species to our DataFrame before creating the ColumnDataSource.
#image url to species
urlmap = {'setosa':'https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg/800px-Kosaciec_szczecinkowaty_Iris_setosa.jpg',
'versicolor':'https://upload.wikimedia.org/wikipedia/commons/thumb/2/27/Blue_Flag%2C_Ottawa.jpg/800px-Blue_Flag%2C_Ottawa.jpg',
'virginica':'https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Iris_virginica.jpg/800px-Iris_virginica.jpg'}
#creating new img column in DataFrame with image url as values
flowers['imgs'] = [urlmap[x] for x in flowers['species']]
“urlmap” is a Python dictionary containing the image URL for each species; we then add the URL to each row of our DataFrame using a list comprehension. After this addition, the DataFrame has a new imgs column holding the image URL for every row.
After this we create the ColumnDataSources, and that is where “@imgs” in the “src” attribute comes from.
Then we add HTML to display the name of the species, the petal length, and the petal width. Wherever you find “@” in the HTML, it refers to a column of the ColumnDataSource, and Bokeh is smart enough to map the data to each scatter point on the plot. This is how Bokeh and ColumnDataSource work hand in hand, and it is why ColumnDataSource is preferred over DataFrame.
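Put together, the hover setup could look like the sketch below. It makes a few assumptions: the three-row DataFrame and the example.com image URLs are placeholders (the urlmap defined above would be used in practice), and the text rows of the tooltip are one possible layout rather than the original markup:

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Stand-in data; in the article this is the flowers DataFrame.
flowers = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
})
# Placeholder URLs (hypothetical), one per row.
flowers["imgs"] = ["https://example.com/" + s + ".jpg"
                   for s in flowers["species"]]
source = ColumnDataSource(flowers)

f = figure()
f.scatter(x="petal_length", y="petal_width", size=12, source=source)

# Every @name is replaced with that column's value for the hovered point.
hover = HoverTool(tooltips="""
<div>
  <img src="@imgs" alt="@imgs" height="42" width="42"
       style="float: left; margin: 0px 15px 15px 0px;" border="2">
  <div><b>Species:</b> @species</div>
  <div><b>Petal length:</b> @petal_length</div>
  <div><b>Petal width:</b> @petal_width</div>
</div>
""")
f.add_tools(hover)
```

The tooltip can only reference columns that exist in the source, which is why the imgs column has to be added before the ColumnDataSource is created.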
That’s it, our first Bokeh plot is ready, and if you have come this far, cheers.
Source Code