Almost 10 Pie Charts in 10 Python Libraries
Here is a follow-up to our “10 Heatmaps 10 Libraries” post. For those of you who don’t remember, the goal is to create the same chart in 10 different Python visualization libraries and compare the effort involved.
All of the Jupyter notebooks to create these charts are stored in a public GitHub repo, Python-Viz-Compared. Each Jupyter notebook contains one chart type (bar, scatter, etc.) and up to 10 different ways of implementing it. We started with heatmaps; now we move on to pie charts.
Are Pie Charts the worst?
It’s a common visualization joke that pie charts are the worst. Or, in the words of famed data visualization guru Edward Tufte:
One of the prevailing orthodoxies of this forum — one to which I whole-heartedly subscribe — is that pie charts are bad and that the only thing worse than one pie chart is lots of them. source
Accepted, pie charts may be the worst. But they have one amazing purpose: pie charts are best for pie. That’s right, our data set is all about pie, and National Pi Day is coming up in just two weeks.
The data set for this exercise is taken from the National Health and Nutrition Examination Survey. Every two years over 10,000 people are interviewed about what they ate the day before. The foods they ate are carefully categorized allowing us to really understand how Americans eat pie. How many of us eat pie on a random day? Let’s see.
Data for our Pie Charts
The pie categories look something like this:
53304000 Pie, blueberry, two crust
53305720 Pie, lemon (not cream or meringue), individual size or tart
Very specific. I have taken the liberty of filtering the data set to include only foods that start with Pie, and then extracting the principal flavor. You can see that in the Prep NHANE Data.ipynb notebook.
We will read it into pandas and use it for all charts going forward.
To see more of the data, view the original post, which supports tables and more outputs.
Pie charts are often presented in sets. They show the relative proportion of different groupings. In our tests below, we will try to graph both where Americans get their pie (Source) and what flavors they prefer (FoodCode). This provides a new test for the ten libraries used to visualize them. Should a visualization library include layout tools to create a grid or rows of different charts? Or should charts be their own distinct elements, to be composed by a document such as Jupyter (or by the DOM for the web)?
All libraries demonstrated here answer this question differently.
So let’s get our data on pie sources and pie flavors together. We will use a standard pandas aggregation to create two data frames from which most of the visualizations will be built:
pie_sources = pie_raw.groupby('Source').agg('count')
pie_flavors = pie_raw.groupby('FoodCode').agg('count')
Pie Chart 1: MatplotLib
First up is matplotlib, the most venerable Python visualization library, with support for exporting to many output formats (PNG, PDF, SVG, etc.).
And now we can clearly see one of the major drawbacks of pie charts. Any dimension with more than five categories looks awful, and the wedges can’t be differentiated. Let’s fix this going forward with some pandas magic that groups the lesser values under “other”.
The formula below calculates the 75th percentile and groups the lesser values together:
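A minimal sketch of that grouping, assuming the counts live in a pandas Series. The helper name group_small_slices and the toy numbers are hypothetical and are not the notebook’s own code; only the 75th-percentile cutoff comes from the text above.

```python
import pandas as pd


def group_small_slices(counts: pd.Series, q: float = 0.75, label: str = "Other") -> pd.Series:
    """Keep values at or above the q-th quantile and sum the rest into one entry."""
    cutoff = counts.quantile(q)
    keep = counts[counts >= cutoff].sort_values(ascending=False)
    rest = counts[counts < cutoff]
    if rest.empty:
        return keep
    # Append a single combined slice for everything below the cutoff
    return pd.concat([keep, pd.Series({label: rest.sum()})])


# Made-up flavor counts standing in for the real survey data
flavor_counts = pd.Series(
    {"apple": 40, "pumpkin": 30, "pecan": 12, "cherry": 8,
     "lemon": 5, "blueberry": 3, "peach": 1, "custard": 1}
)
grouped = group_small_slices(flavor_counts)
print(grouped)
```

With a cutoff at the 75th percentile, roughly the top quarter of categories keep their own wedge and everything else collapses into a single slice, so the total count is unchanged.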
We can then remake our chart:
Overall, matplotlib/pyplot pie charts are pretty easy. Notice that we set up a one-row grid and placed two subplots within that grid. That allowed matplotlib to draw each plot in one overall figure. The next two libraries use matplotlib as a backend, so you will notice some of the same layout features used.
Almost Pie Chart 2: Seaborn
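As a rough illustration of that one-row, two-subplot layout, here is a sketch using made-up counts rather than the survey data. The variable names are hypothetical, and the Agg backend line is only there so the snippet runs outside a notebook; in Jupyter you can drop it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

# Made-up counts standing in for the grouped source and flavor data
sources = {"Store": 5, "Restaurant": 3, "Other": 2}
flavors = {"Apple": 4, "Pumpkin": 4, "Other": 2}

# One row, two columns: each pie gets its own Axes inside a single Figure
fig, (ax_src, ax_flv) = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))

for ax, data, title in [(ax_src, sources, "Pie by source"),
                        (ax_flv, flavors, "Pie by flavor")]:
    # matplotlib works out the percentages from the raw counts
    ax.pie(list(data.values()), labels=list(data.keys()), autopct="%1.0f%%")
    ax.set_title(title)
    ax.set_aspect("equal")  # keep the pie circular

fig.tight_layout()
fig.savefig("pies.png")
```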
When we did the post on heatmaps, I wrote about Seaborn’s special use case:
Seaborn is a streamlining of matplotlib’s API to make it more applicable to statistical applications. Seaborn’s API makes you think about the best way to compare univariate or bivariate data sets and then has clear and concise syntax to get the charts needed to immediately compare your variables.
Given this use case, there is actually NO way to do a pie chart using Seaborn. This makes sense. Pie charts are a difficult and deceiving way of comparing univariate data. A bar chart can always replace a pie chart, so the pie chart is simply not included and shouldn’t be included. Of course, this being an open source project, people have requested it. However, Seaborn is the ultimate Swiss Army knife for data science. Part of creating the perfect tool for peering into data means leaving out views that aren’t helpful or are frankly deceptive by design. Fear not, every pie chart can be a bar chart. So where we cannot see pie, we can still visualize pie.
Seaborn inherits axes, figures, and subplots from matplotlib. Above, we plotted the same data on a bar graph. This is instantly better because it lets the viewer see not just the relative comparison of the flavors or sources, but also that we only have about 164 entries for pie in the entire NHANES survey.
And that is how pie charts deceive you. In a pie chart, I can show you relative percentages but cover up the fact that pie isn’t eaten all that often. Of the more than 10,000 people interviewed, very few had eaten pie that day.
Almost Pie Chart 3: PlotNine (ggplot2)
Now, you can do pie charts in ggplot2 by using polar coordinates to draw specific sectors of a circle. That is interesting and forces the user to identify exactly how a pie chart works: a full circle, in radians, divided among the sectors according to each one’s relative percentage.
I am back to 7th grade math. Unfortunately, plotnine, the Python implementation of ggplot, has not yet ported over the coord_polar ggplot layouts, so alas we can’t use it to create a pie chart either. Once again, back to bar charts:
So alas, pie charts are not supported. Also, the ggplot features that would allow laying out multiple plots are not yet implemented, so we used Jupyter commands to display both outputs.
Still, ggplot2 and plotnine have a nice natural rhythm to them. I love the grammar of defining my plot and continuing to add elements that shift it. It feels like functional programming for visualization as opposed to just a big script of code.
Pie Chart 4: BqPlot
Pie Chart 5: plotly
And here is our chart. I have swapped the chart for static images, so some of the interactivity may be disabled.
Plotly, as you can see, was very succinct, and it added interactivity automatically. The online support is also really great. By visiting the links at plotly, you can edit the chart in a sort of GUI on their website and even regenerate the code used to create the plot. Notice the intelligent handling of the labels for each wedge. This is so hard to do and calculate reliably. Plotly really excels at these little details.
Pie Chart 6: Cufflinks
This is sort of a trick. Cufflinks is plotly, just with a different API designed to be run directly from a pandas dataframe. This makes the data inputs easier to set up and use in the charts.
Combining plotly and pandas is really powerful, but alas it means no composing of subplots into a single figure.
Pie Chart 7: Bokeh
Remember all the circle angle talk above with ggplot? Well, Bokeh is going to require us to manually calculate, in radians, the start and end angle of each wedge.
Did that hurt you as much as it hurt me? Bokeh has a great grammar of graphics, but you have to calculate everything about the wedges manually. So at the top, we calculated the start and end angle of each wedge in radians. To do that, we need the cumulative sum of each percentage. After all of that, we still have a gap in the pie chart because our cumulative sums don’t start at zero.
Don’t ever make a pie chart in Bokeh. A Stack Overflow answer was critical to figuring this out. If rule 1 of visualization is “don’t make a pie chart”, rule 2 is “don’t make it in Bokeh”.
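The angle arithmetic can be sketched in plain Python. This is not the notebook’s code: the function name wedge_angles and the sample counts are hypothetical. It shows one way to avoid the gap, which is to start the first wedge at zero and let every later wedge start where the previous one ended.

```python
import math
from itertools import accumulate


def wedge_angles(counts):
    """Return a (start, end) angle pair in radians for each count."""
    total = sum(counts)
    fractions = [c / total for c in counts]
    # Cumulative fractions mark the END of each wedge, e.g. 0.5, 0.8, 1.0
    ends = [2 * math.pi * f for f in accumulate(fractions)]
    # Each wedge starts where the previous one ended; the first starts at 0
    starts = [0.0] + ends[:-1]
    return list(zip(starts, ends))


# Made-up counts: 50%, 30% and 20% of the pie
angles = wedge_angles([50, 30, 20])
for start, end in angles:
    print(f"{start:.3f} -> {end:.3f}")
```

Pairs like these are the kind of values that Bokeh’s wedge glyph takes as its start_angle and end_angle arguments; the last end angle should land on 2π so the circle closes.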
Pie Chart 8: Holoviews
Holoviews uses bokeh as its underlying engine but reduces the verbosity by having the user declare attributes about their data and allowing the visualizations to infer themselves from the dependent and independent variables, referred to as value dimensions (vdims) and key dimensions (kdims). It’s really great but that also means it has no use for Pie Charts. We will use a bar chart just to show off.
One of the advantages of Holoviews is that before declaring what visualization you want, you actually identify the structure of your data and then how you want that data summarized and displayed. As stated in the Holoviews introduction:
HoloViews focuses on bundling your data together with the appropriate metadata to support both analysis and visualization, making your raw data and its visualization equally accessible at all times.
To demonstrate this, notice how we first created a Holoviews table from the raw data pie_raw. From there we were able to go all the way through our data preparation process and finally simply ask for a bar graph of the data. We lost the groupings, but with a bar chart we can see all the detail without a problem. Holoviews delivers one easy, nicely formatted chart in very few lines of code. Also, the layout is as easy as using the * operators to join charts together. This is by far the best layout syntax we have used so far. That means if you are blending multiple types of charts using the same underlying data, Holoviews might be the best option for you.
The coloring did not work though I am fairly sure that I got the Cycle syntax correct. Someone feel free to comment with how I should fix that.
Pie Chart 9: Altair
Altair is the Python implementation of the Vega-Lite specification. What’s the difference between Vega and Vega-Lite? Well, there are lots of differences, and one of them is that pie charts aren’t supported in Vega-Lite. We are lucky, because even in full Vega it would require us to calculate the angles manually once again, just like in Bokeh.
For now, here is a bar chart of our data.
Pie Chart 10: PyGal
New to our analysis is PyGal. It is described as “Sexy python charting”, which is great. PyGal uses a range of SVG and CSS frameworks to create rich in-browser visualizations with a syntax that is dramatically simpler. The syntax of PyGal is very different in that you compose each data set almost row by row. This seems like it could be a hassle, but it is very easy in practice. Because it generates raw SVG, you use Jupyter’s own display command to visualize it, making this very portable around the web. Now, Medium doesn’t support SVG, but you can view it on the original blog post. I have used a picture here.
PyGal wasn’t included in our previous edition of 10 for 10 (pour one out for Lightning-viz, which is retired). Considering it creates fully scalable vectors, it was fantastic and much simpler to use than Altair, which is the leading full SVG visualization library. PyGal also has a great set of default styles that look fantastic. PyGal’s SVG, web-first style also makes it much easier to embed with Flask, Django or another web framework.
What is the overall takeaway from this post? Well, you should clearly understand by now that you don’t want to use pie charts ever. The pie chart is routinely left out of visualization packages or buried so deep that it’s almost impossible to implement. Pie charts are also visually deceiving for your users, so it is best to avoid them.
The real lessons in this post involve how you take two charts and lay them out together in one seamless view. We can see that some of the libraries, such as Bokeh and Matplotlib, provide natural, built-in ways of combining two plots that don’t share data or axes so that they can play well together. Other libraries rely on you, the user, to lay out the final product, either using Jupyter HTML commands or within your website or publication.
Lastly there are some real differences in terms of data handling:
- Matplotlib and Plot.ly were able to use the counts under each grouping to easily create the pie chart and automatically calculate the percentages for each wedge.
- BqPlot and PyGal needed to have the percentages pre-calculated. Not terribly difficult, but just different.
- Bokeh required that you calculate the exact angles and percentages.
- Holoviews was the only one that could really just use the raw data from the survey itself.
Originally published at https://blog.algorexhealth.com/2018/03/almost-10-pie-charts-in-10-python-libraries/.