10 Heatmaps in 10 Python Libraries

I recently watched Jake VanderPlas’ amazing PyCon2017 talk on the landscape of Python Data Visualization. That presentation inspired this post. In programming, we often see the same ‘Hello World’ or Fibonacci style program implemented in multiple programming languages as a comparison. In Jake’s presentation, he shows the same scatter plot in several of the libraries he featured. Below, I am following the same formula. I am recreating a heatmap about airline flights, in ten different python visualization libraries.

I am also launching a public GitHub repo Python-Viz-Compared for these comparison notebooks. Each Jupyter notebook will contain one chart (bar, scatter etc) and then up to 10 different ways of implementing them. But for this post, we are going to start with HeatMaps.

Source code in this version of the article may be curtailed or cut.

The Challenge

How easy is it to take a matrix or 2d array and convert it to a heatmap with custom color scaling?

Jake gave us a good chart sorting out the various approaches of these libraries:

Taking Jake’s lead, let’s summarize the various quadrants shown here:

  • Upper Right Purple: Matplotlib Family. These libraries keep matplotlib as a versatile, battle-tested back end but provide streamlined, domain-specific APIs.
  • Upper Left Turquoise: Browser JS. These libraries build a new API that results in a JavaScript serialization of a plot for display in a browser.
  • Lower Right Red: Declarative Visualization. Following an open, cross-platform specification language, the data and visualization can be rendered in any back end by any language.
  • Everything Else: This includes a lot of other stuff, from platforms that support server-side rendering of streaming data to libraries that leverage GPU platforms specifically.

I’m going to refer to this system as the VanderPlas taxonomy. So with that overview, let’s dive in to our challenge.

Update: Just after posting I came across this amazingly written post A Dramatic Tour through Python’s Data Visualization Landscape and I am sorry I didn’t have the creativity to equal it.

Data for Our Heatmap

The heatmap we will be making is actually one of the examples in the Seaborn documentation. It comprises monthly totals for airline passengers from 1949 to 1960 on a specific route. It is adapted from the R package airpassengers.

For more detail on the data, see the original post where I go into the detail of the matrix versus array data form. For now, let us move on to the plots.
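As a rough illustration of the two data forms, here is a minimal pandas sketch. It assumes the long ("array") form has one row per observation with columns named year, month, and passengers; the function names are illustrative placeholders rather than anything from the original notebook.

```python
import pandas as pd


def to_matrix(long_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-form rows into a matrix with months as rows and years as columns."""
    return long_df.pivot(index="month", columns="year", values="passengers")


def to_long(matrix: pd.DataFrame) -> pd.DataFrame:
    """Melt the month-by-year matrix back into one row per (month, year) pair."""
    return matrix.reset_index().melt(
        id_vars="month", var_name="year", value_name="passengers"
    )
```

Note that pivot sorts plain string months alphabetically, so the month column would need to be an ordered categorical to keep calendar order. Libraries that draw from a grid (matplotlib, seaborn, plotly) tend to want the matrix form, while the grammar-style libraries generally want the long form.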

Heatmap 1: Matplotlib

First up is matplotlib, the most venerable Python visualization library, with support for exporting to many rendering formats (png, pdf, svg, etc.).

While matplotlib makes heat maps really easy with imshow, I find it tricky to have a mental model of the figure, subplot, and axes. The subplot is critical here in order to ensure you can rename the axes labels. Then a whole new module needs to be imported to set the ticker correctly.

But at the end, every element is here and it will be the standard we measure against. A clean color scale mapped to the passenger volume, a title, and axis labels are all consistently shown.
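A minimal sketch of the imshow approach, assuming the data is already a month-by-year pandas DataFrame. The function name, figure size, and title are illustrative choices, and this is not necessarily the exact code from the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import ticker


def matplotlib_heatmap(matrix: pd.DataFrame, title: str = "Airline passengers"):
    """Draw a month-by-year matrix as a heatmap with labelled ticks and a colorbar."""
    fig, ax = plt.subplots(figsize=(12, 6))
    image = ax.imshow(matrix.to_numpy(), cmap="BuPu", aspect="auto")

    # imshow only knows row and column positions, so the ticker module is
    # used to map each position back to its year or month label.
    ax.xaxis.set_major_locator(ticker.FixedLocator(list(range(matrix.shape[1]))))
    ax.xaxis.set_major_formatter(ticker.FixedFormatter([str(c) for c in matrix.columns]))
    ax.yaxis.set_major_locator(ticker.FixedLocator(list(range(matrix.shape[0]))))
    ax.yaxis.set_major_formatter(ticker.FixedFormatter([str(r) for r in matrix.index]))

    ax.set_xlabel("Year")
    ax.set_ylabel("Month")
    ax.set_title(title)
    fig.colorbar(image, ax=ax, label="passengers")
    return fig, ax
```

Returning both the figure and the axes keeps the two objects separate, which helps with the mental model: the figure owns the colorbar, while the axes own the image, ticks, and labels.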

credit: https://stackoverflow.com/questions/32236046/add-a-legend-to-my-heatmap-plot

Heatmap 2: Seaborn

Seaborn is a streamlining of matplotlib’s API to make it more applicable to statistical applications. Seaborn’s API makes you think about the best way to compare univariate or bivariate data sets and then has clear and concise syntax to get the charts needed to immediately compare your variables.

Remember, seaborn is implemented on top of matplotlib, so you can use the same matplotlib conventions to manipulate the chart. I used matplotlib’s figure to define the size of the chart I wanted and then Seaborn took care of the rest, giving us a perfect heatmap with the added color bar to show the scaling.

Heatmap 3: Plotnine (ggplot2)
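A minimal sketch of that pattern, assuming a month-by-year DataFrame as input. The function name and title are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def seaborn_heatmap(matrix: pd.DataFrame, title: str = "Airline passengers"):
    """Size the chart with matplotlib, then let seaborn draw the heatmap and colorbar."""
    plt.figure(figsize=(12, 6))
    # The DataFrame's index and column labels become the tick labels.
    ax = sns.heatmap(matrix, cmap="BuPu")
    ax.set_title(title)
    return ax
```

With the flights example data from the Seaborn documentation, the matrix can be built by pivoting the long-form frame so that months are rows and years are columns.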

plotnine is the python implementation of R’s most dominant visualization library ggplot2. Like matplotlib in python, ggplot2 is the default visualization for R with support for all types of outputs.

Plotnine is a bit of magic especially if you are really good at ggplot (I am not) and you are willing to import the entire namespace with import * so that you don’t have to prefix every element. Then it faithfully recreates the ggplot2 syntax.

As I said above, if you are coming from R to python, you will love this library which faithfully recreates the ggplot syntax. The team at Plotnine has said they are looking to implement the entire “Grammar of Graphics” using matplotlib as the backend to render the images.

credit: https://www.r-bloggers.com/how-to-make-a-simple-heatmap-in-ggplot2/

Heatmap 4: BqPlot

Based on the VanderPlas taxonomy, the next four libraries are from a different core set of assumptions. These all use a python API to customize a javascript client-side framework that renders the data and figure in the browser. The advantages to this approach are that the figures have a modern look and can include rich browser interactions such as zooming, selection, and filtering.

I had not used bqplot prior to this exercise and, like ggplot and plotnine, it has a syntax inspired by the grammar of graphics. Frankly, the heatmap example was so easy, I can’t claim to have learned very much.

Got a great comment from Sylvain Corlay, the author of bqplot, on how to add the tick and label names which were missing in the original version of this post. It was really obvious and clearly shown in the examples. Basically, there are column and row arguments that allow you to pass the label names so that they appear. Thanks to Sylvain and the bqplot team for helping me.

Heatmap 5: plotly

plotly is a fantastic plotting library that combines a free, open-source version with a paid version that offers some server-assisted features. Plotly has amazing cross-platform support for Python, R, and JavaScript. It also has great documentation and a great example library.

Plotly, as you can see, was very succinct, and they added interactivity automatically. The online support is also really great. By visiting the links at plotly, you can edit the chart in a sort of GUI on their website and even regenerate the code used to create the plot. This has built up a great community of chart visualizations.

Heatmap 6: Cufflinks

This is sort of a trick. Cufflinks is plotly, just with a different API designed to be run directly from a pandas dataframe. This makes the data inputs easier to set up and use in the charts.

Yeah, that was easy, perhaps too easy, since the plot was arranged vertically instead of left to right. This is the sixth heatmap of our group and the only one out of ten to derive the aspect of the chart in this way. Still, I love the combination of plotly and cufflinks. The embedded tools to save as PNG, zoom, and select are really powerful. If you can accept having a reliance on plotly’s servers, this library is extremely concise and easy.

Heatmap 7: Bokeh

Bokeh is another combination of a JavaScript client library and a Python API, developed and maintained by Anaconda (formerly Continuum Analytics). Bokeh has a rich grammar of elements, not only for the chart itself but also for interactions and dashboarding. Bokeh has two aspects that make it unique: First, Bokeh has shared data structures that sync with the server and can then update multiple linked plots in rich dashboards. Second, bokeh is being developed as a backend for newer libraries such as holoviews (coming up in plot 8).

As you can see, Bokeh is one of the more verbose implementations of this chart so far. But I will admit I am a homer for Bokeh. That verbosity reveals an underlying complexity where you can literally map everything. What makes it verbose is the manipulation of the properties (labels, colors, data ranges), but those properties follow a logical grammar and are easy to find. The label to identify the x-axis is simply the xaxis.axis_label. Likewise, the font size of the title is simply title.text_font_size.

The second major advantage of Bokeh is its unique way of linking the data between plots. Notice the ColumnDataSource above. Say we wanted to add a table or line graph from the same underlying dataframe: Bokeh would automatically recognize that the graphs shared the same data source and update them together. The charts could be updated from a widget like a dropdown box or radio button. In this way, Bokeh works like Shiny from the R ecosystem. Some simple jinja templates and Bokeh server functionality allow the construction of rich visualizations and web apps.
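A condensed sketch of the pattern described here, assuming long-form data with year, month, and passengers columns. The function name and title are illustrative, and the original notebook's version is likely more detailed.

```python
import pandas as pd
from bokeh.models import ColorBar, ColumnDataSource, LinearColorMapper
from bokeh.palettes import BuPu9
from bokeh.plotting import figure
from bokeh.transform import transform


def bokeh_heatmap(long_df: pd.DataFrame):
    """Draw one rectangle per (year, month), colored by passenger count."""
    # Categorical axes in Bokeh expect string factors.
    df = long_df.assign(
        year=long_df["year"].astype(str), month=long_df["month"].astype(str)
    )
    source = ColumnDataSource(df)  # shareable with other plots or tables

    # Bokeh lists its brewer palettes dark to light, so reverse the palette
    # to make low passenger counts pale and high counts dark.
    mapper = LinearColorMapper(
        palette=BuPu9[::-1],
        low=float(df["passengers"].min()),
        high=float(df["passengers"].max()),
    )

    plot = figure(
        title="Airline passengers",
        x_range=sorted(df["year"].unique()),
        y_range=list(df["month"].unique()),
    )
    plot.rect(
        x="year", y="month", width=1, height=1, source=source,
        fill_color=transform("passengers", mapper), line_color=None,
    )
    plot.add_layout(ColorBar(color_mapper=mapper), "right")

    # The property grammar: every element is reachable by a predictable path.
    plot.xaxis.axis_label = "Year"
    plot.yaxis.axis_label = "Month"
    plot.title.text_font_size = "16pt"
    return plot
```

Because the glyph reads from a ColumnDataSource, a second plot or table built from the same source object would stay in sync with this one.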

Heatmap 8: Holoviews

So if Bokeh is too verbose with too many attributes to manage, take heart that there is a new project from the same team at Anaconda that uses bokeh as a backend but gives the advantage of a cleaner API that infers the correct chart based on the input data. Holoviews has the very ambitious goal of “letting data visualize itself.” So for holoviews, the aim of the designers is to have you state what you want to learn and pass in the data and then the library will infer the various variables and create the chart. In fact, bokeh is not even the only backend supported. The team has a backend interface to matplotlib and also to plotly, though these do not have feature parity with the bokeh backend yet.

If you thought bokeh was too verbose, Holoviews is why. Anaconda is positioning bokeh as a backend for visualization while the ease of use for the analyst/data scientist gets developed in holoviews.

One line of code is all that it took. Very cool! Let me explain a little bit about what happened. Holoviews separates the style from the substance. When plotting with holoviews, the various plot types (HeatMap, Scatter, etc.) look for a combination of value dimensions (vdims) and key dimensions (kdims). Our data, contained in flight_rows, has two kdims (month, year) and one vdim (passengers). So when we invoke hv.HeatMap it can immediately infer the correct display. The style is all handled separately through declared options which can be invoked globally through cell or line magics in Jupyter Notebooks.

Heatmap 9: Altair

Altair is one of the newer libraries on the block. It was started by a team including Jake VanderPlas and is being supported by the Interactive Data Lab @ the University of Washington. Altair is part of an ecosystem surrounding Vega, a JSON specification format that renders using the D3 visualization library. Given the power and amazing influence of D3, Vega and Altair (both are stars in the Summer Triangle) promise a much easier interface and implementation of visualizations.

As I mentioned in Heatmap 4, sometimes the tutorials are so simple that it can be hard to judge the library. That is absolutely true of Altair, which was really easy to figure out.

It’s very important to know that the specification being written is converted immediately to JSON. You can view that JSON using the “view source” or “Open in Vega Editor” button, which automatically links to a nice online tool where you can interactively change the JSON and view the result.

Because this will end up as JSON, an attribute getting a single value can be set directly, such as column='year', but when more arguments are needed, you have to call it like a class method. Thus, color=Color('passengers', scale=Scale(type='linear', range=['#bfd3e6', '#6e016b'])) gets transformed into the following JSON.

"color": { "field": "passengers", "scale": { "range": [ "#bfd3e6", "#6e016b" ], "type": "linear" } }

So the basic Altair/Vega-Lite syntax is as follows:

  1. Create a chart that invokes the data
  2. Ask for “text” marks to be drawn on the chart. We then ask that text be drawn with background color.
  3. Encode the data (from #1) onto the text marks. Encodings accept expressions so rather than encoding ‘passengers’ onto the text property, we instead encode the value ‘ ’ to ensure the actual counts do not appear.

The only problem with this chart is that the color scale is not the full BuPu brewer scale; rather, it is simply a scale between the two colors as written. It should be possible to have a Scale written against vega’s continuous “bluepurple” color scheme, which matches the BuPu palette we have used so far. However, it appears that altair does not support passing those colormaps to scales yet. (I believe that will come when upgraded to vega-lite 2.0).

Heatmap 10: Lightning Viz

Referring back to the VanderPlas taxonomy, Lightning Viz is in the everything else category. Like plotly, it is cross-platform, with client APIs for R, Scala, and JavaScript. It also relies on communication with a server that handles much of the difficult rendering. However, it is released under the MIT license, so it’s completely free to use and improve. When you develop in a notebook environment like this post, you can choose to use a public lightning server maintained by the development team, or you can one-click deploy a server via heroku, docker, or another service.

And we have a new winner for the easiest implementation of our chart. Lightning really did perform this task in one line of code. We could nitpick that there is no way to add a title but really this was impressive. The server also adds a bunch of functionality that would otherwise not be available (unless you are using plotly). Each chart gets its own unique URL so they can be shared and passed through an organization. The server even includes a mini-wizard for users to build their own visualizations without code. The server also makes these visualizations really fast because it does some magic to customize the javascript and cache results. This is especially helpful if you are going to stream in data requiring constant updates.

Wrapping Up

Did you make it this far hoping I would recommend one of these libraries for all your use cases? I can’t do that unfortunately. But I can give you a little guide for how to pick.

  • Do you need publication-worthy images for a journal or PDF? Pick matplotlib, seaborn, or any of the libraries that use a matplotlib backend. This is because matplotlib has the most numerous outputs to render the chart as an image and is natively supported by nbconvert for jupyter notebooks. I should say that altair and bokeh are making strides in this area, but they rely on massive external libraries (often written in Node) to consistently generate the outputs.
  • Do you plan on building a large set of dashboards or a custom web application with these charts? Then I would consider any of bokeh, plot.ly, lightning, or altair. They are built for browsers and support adding interactivity through signaling and widgets. Lightning especially will help speed up the delivery of these charts, and the caching/sharing of the results will ease your client-side loads.
  • Are you a statistician or data scientist who just wants to explore data? I’d consider holoviews or seaborn, whose APIs focus on bivariate and univariate tests based on inbound data. The styling may be dense and tricky to navigate, but styling isn’t your chief concern.

Feel free to leave comments below or reach out to any of us at Algorex Health.

I have converted most of the charts to png images for this post. This allows for faster loading and better integration with our CMS. However, the working notebook that generates all the charts in the browser is at Python-Viz-Compared.