Published in MLearning.ai

An Alternative Introduction to Altair Plotting

Common Options and Best Practices

Photo by Stephen Phillips — Hostreviews.co.uk on Unsplash

Scope

This article is intended for readers who have some familiarity with at least one plotting library; within the Python ecosystem that is most likely matplotlib. When learning a new plotting library, most people want to know more than the basic syntax: how to enable the common options and how to apply basic best practice in setting titles, defining user-friendly tooltips and separating transformations from plotting elements.

Rather than introducing a new dataset, the article is based on altair's Example Gallery, but with these common options and best practices applied. Both the original and the modified plots are shown to produce a skim-friendly reference.

Welcome to Data Science Guides by Ashraf Miah | Image by the Author

Simple Bar Chart

The comparison shows the original (on the left) and the modified Chart (on the right). The plot is the simplest example; however, it lacks some basic best practice such as plot and axes titles. Colour coding of the bars was added to create greater contrast.

A Simple Bar Chart Comparison by Ashraf Miah | Image by the Author

The following code is directly from the Altair Developers:

The data in source DataFrame looks like the following:

The new modified chart was produced using the following:

Three significant changes:

  • The axes were given a more meaningful title using the alt.X class.
  • Tooltips were added, again with custom titles, using the alt.Tooltip class.
  • A title for the whole plot was added to reflect best practice using the properties method of the Chart.

Simple Heatmap

The comparison between the original and the modified version is provided below:

Simple Heatmap Comparison by Ashraf Miah | Image by the Author

The code from the Altair Developers generates the source data using a simple relationship between x, y and z:

A preview of the source data is presented below:

The modified Chart was produced with the following code:

Relative to the previous plot, the color channel has been customised using the alt.Color class. The rest of the Chart reflects best practice: the axes, legend and plot are given titles using a similar pattern: alt.X('x:O', title='X Coordinate'). The plot size was also changed; note that the width and height are given as Integers instead of Strings.

Simple Histogram

The comparison between the original and the new chart hides significant under-the-hood changes. Altair performs a transformation using a single line (bin=True) to create a histogram, whereas manipulating the data with pandas requires much more effort:

A Simple Histogram Comparison by Ashraf Miah | Image by the Author

The original code contains a single keyword argument that transforms the data with bin=True:

The data is from IMDb and includes a number of film facts:

The histogram shows the IMDB_Rating column, which altair makes easy with the bin=True option. In contrast, manually binning the data and presenting it as a histogram takes more than a single line!

This example helps illustrate how a big data set containing, say, 10 million rows, where altair would typically struggle, could easily be reduced using spark, dask, vaex, etc. into a pandas Series with only 10 rows. The following steps have been adapted from Issue #1691 from the altair team.

The data at the url is stored as a DataFrame (df) using the pd.read_json() function. The first step is to create the bin categories (or edges) and then divide the data accordingly:
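
A minimal sketch of this binning step is shown below. To stay self-contained it uses a tiny hypothetical DataFrame instead of the film data, and the unit-wide bin edges and the use of pd.cut are assumptions about how the categories could be built.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the film data (the article loads it with pd.read_json()).
df = pd.DataFrame({"IMDB_Rating": [1.5, 3.2, 5.9, 6.0, 6.4, 7.1, 7.8, 8.3, 9.0]})

# Bin edges 1, 2, ..., 10 (an assumed choice of unit-wide bins).
bins = np.arange(1, 11)

# right=False makes each bin closed on the left: [6, 7) holds 6.0 but not 7.0.
df["bin"] = pd.cut(df["IMDB_Rating"], bins=bins, right=False)

# Quick check that each rating fell into the expected interval.
print(df.head())
```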

The output of the check confirms the categorisation is correct:

The next steps are:

  • aggregate the data using groupby
  • create label columns for both the minimum and maximum rating per category
  • remove the index as it is no longer needed
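
The steps above can be sketched as follows, again on a small hypothetical DataFrame. The column names count and bin_min are assumptions (bin_max follows the name used later in the article), and the edges are read from the pandas Interval objects.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the film data.
df = pd.DataFrame({"IMDB_Rating": [1.5, 3.2, 5.9, 6.0, 6.4, 7.1, 7.8, 8.3, 9.0]})
bins = np.arange(1, 11)
categories = pd.cut(df["IMDB_Rating"], bins=bins, right=False)

# Count ratings per bin; observed=False keeps empty bins in the result.
binned = (
    df["IMDB_Rating"]
    .groupby(categories, observed=False)
    .count()
    .to_frame(name="count")
)

# Label columns holding the lower and upper edge of every bin.
binned["bin_min"] = [interval.left for interval in binned.index]
binned["bin_max"] = [interval.right for interval in binned.index]

# The interval index is no longer needed once the edges are plain columns.
binned = binned.reset_index(drop=True)
```

The result has one row per bin, which is small enough to pass to altair regardless of how large the original data was.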

Only the IMDB_Rating column of the DataFrame is required. It is grouped (groupby) by the binning categories, counted, then converted to a DataFrame and renamed. The range of each bin is then extracted into separate columns using list indexing, and the existing index is removed. The data is now of this format:

A pre-binned example is not found in the Gallery and the approach is not explicitly stated, but the following code is the minimum needed to create a Chart with pre-binned data:

Pre-binned Histogram by Ashraf Miah | Image by the Author

Note that the keyword argument bin='binned' is required; it enables the upper limit of the bin range to be defined using the x2 (or y2) encoding channel:

Image by the Author

The tooltip contains both the upper and lower limit of each bin; this could be improved using Vega Expressions and altair's transform_calculate method:

Image by the Author

The code shows a number of altair features at once. The data is transformed to create a new field called rating_range: the two label columns are converted to strings using Vega expressions and then combined using JavaScript syntax to create a label of the format 6–7, which is displayed as a tooltip. The Chart is also assigned a slightly larger size for better comparison.

However, the plot does not reflect best practice as illustrated by the side-by-side comparison below:

Comparison with Best Practice by Ashraf Miah | Image by the Author

The code of the modified chart (on the right) is presented below:

The number of altair lines has doubled for the benefit of using pre-binned data. The transform_calculate function to generate a new field has already been covered. To remove the second axis title, it must explicitly be set to None, i.e. alt.X2('bin_max', title=None). Tooltips are added with well-formatted labels using the title parameter, and a plot title is added.

Simple Line Chart

The comparison between the original and new plots is shown below with some minor changes to reflect best practice:

Simple Line Chart Comparison by Ashraf Miah | Image by the Author

The code used to generate the original plot is a very simple example of a sine plot:

The data is of the format:

The modifications to the original plot are minor using more explicit titles for the axes and a title for the overall plot:

Simple Scatter Plot with Tooltips

The static images of the original and the new scatter plot below hide some changes in the use of Tooltips as well as the application of best practice:

Simple Scatter Chart Comparison by Ashraf Miah | Image by the Author

The original code uses the cars dataset from Vega and shows a modified size for the circles:

The data is of the format:

The modified plot contains a number of best practices in terms of making it easier to read and understand:

A plot title has been added, along with an explicit title for the y channel, given that it reflects fuel efficiency. The Tooltip now uses an abbreviation for the Miles_per_Gallon encoding so that it obstructs less of the Chart when viewed interactively.

Simple Stacked Area Chart

A comparison between the original and the modified:

Simple Stacked Area Chart Comparison by Ashraf Miah | Image by the Author

The original code uses the Iowa, USA energy generation data set and the area mark:

The data is in the long form:

The modified code reflects the best practice discussed previously using explicit titles for all channels:

Simple Strip Plot

A comparison between the original and the modified:

Simple Strip Chart Comparison by Ashraf Miah | Image by the Author

The data is again based on the cars dataset from Vega; note the explicit use of encoding data types:

  • Q : Quantitative
  • O : Ordinal

The data is of the format:

The modified Chart adds the color channel for added contrast:

The plot reflects best practice with the addition of the color channel as a Nominal data type.

Summary

A first set of plots from the Altair example gallery has been reproduced with best practice applied wherever possible, and with transformations separated from the plotting components. The modified plots have illustrated the following additional features:

  • Adding a plot title
  • Explicitly labelling axes with alt.X(<column>, title='Meaningful Axis')
  • Explicitly labelling other channels and Tooltips
  • Generating new supplementary data using transform_calculate for display purposes.
  • Binning data to generate aggregations for subsequent plotting with altair.

This concludes an alternative introduction to Altair: several examples showcasing common options and best practices that are missing from the Example Gallery in the Altair user guide.

Attribution

All gists, notebooks and terminal casts are by the author. All of the artwork is based on assets that are explicitly CC0, Public Domain or SIL OFL licensed and is therefore non-infringing. The theme is inspired by and based on my favourite vim theme: Gruvbox.

Connect

Feel free to connect with me on LinkedIn.

Ashraf Miah

Data Scientist and Chartered Aeronautical Engineer (MEng CEng EUR ING MRAeS) with over 15 years' experience in the Aerospace, Defence and Rail industries.