Break your bar chart habit

Deconstructing visualization design using visualization grammar

Nancy Organ
Dec 1, 2020 · 6 min read

Remember that truly stellar bar chart you saw a few months ago? Truthfully, no one else does either, and that’s a problem for those of us in the business of communicating data with impact.

We often describe chart design as a process of matching specific charts to specific purposes: lines describe change over time, bars compare quantities, histograms or box plots illustrate distributions, and so on. Groupings like these, which cluster charts according to characteristics or usage, are broadly referred to as visualization taxonomies.

Visualization taxonomies play an important role in data visualization theory and design. These taxonomies — like in any other domain — are powerful organizational tools. They can be a useful starting place when you begin conceptualizing a chart and are essential for creating rules-based chart generation tools. However, while they excel in predictability, they do little to inspire the kind of out-of-the-box thinking needed to create custom visualizations that wow and entertain your reader.

Fortunately, there’s another systematic way to approach chart design that provides more flexibility than a taxonomy while ensuring that the resulting figure is interpretable. In the academic visualization world, this is known as the Grammar of Graphics, a term coined by Leland Wilkinson in his 1999 book of the same name. Wilkinson’s framework teaches us how to think about graphs as a collection of components that snap together into a final “grammatically correct” visualization.

You are likely more familiar with this framework than you realize. There is a reason that we’re able to understand graphs, and that’s because they tend to adhere to the same basic recipe. Charting tools and visualization languages, too, often reflect the grammatical nature of visualizations in their user interfaces and syntaxes. When you create a visualization in PowerBI or Tableau, for example, the grammatical elements of the figure are assembled predominantly through GUI selections. Code-based visualization languages like D3.js, Vega, and ggplot2 make use of a layered approach that allows for even greater customization. The “gg” in Hadley Wickham’s R plotting library ggplot2 is even a direct reference to the Grammar of Graphics, albeit his own variation.

In this article I’ll walk through a high-level overview of visualization grammar so that you, too, can start thinking compositionally about your charts. This particular framing is based on the ggplot2 interpretation of visualization grammar, but I’ve also taken some liberties to explain it concisely.

Data visualizations invariably require some kind of data — this should be no surprise. For the purposes of visualization, however, we care more about the shape of the data than the actual values themselves. That’s how we are able to create dashboards that update when new data are supplied.

Data are certainly important in a data visualization, but the shape of the data is more grammatically relevant to the chart than the values themselves.
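
As a rough illustration of why shape matters more than values, the sketch below compares two hypothetical extracts of the same data set. The field names (`group`, `day`, `value`) and the helper `shape_of` are made up for this example and do not come from any charting library.

```python
# Two hypothetical extracts of the same data set, in "long" form:
# one record per observation, one key per variable.
last_week = [
    {"group": "a", "day": 1, "value": 3.0},
    {"group": "b", "day": 1, "value": 5.5},
]
this_week = [
    {"group": "a", "day": 8, "value": 4.2},
    {"group": "b", "day": 8, "value": 6.1},
    {"group": "c", "day": 8, "value": 1.9},
]


def shape_of(rows):
    """Return the set of (column name, value type) pairs found in the rows."""
    return {(name, type(value).__name__) for row in rows for name, value in row.items()}


# The values and even the row counts differ, but the shape is identical,
# so a chart defined against this shape can be redrawn with either extract.
same_shape = shape_of(last_week) == shape_of(this_week)
print(same_shape)
```

A dashboard that refreshes itself is relying on exactly this property: the chart definition refers to columns and types, never to particular numbers.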

In order to cut a data set that is the correct shape for your visualization, you’ll first need to consider what you would like to say with your data. I won’t cover that entire thought process in this piece, but the article I wrote about data visualization “therapy” has some tips on how to get at your main message.

The predominant shape or shapes you select will vary depending on the message you’re trying to convey. There are some safe rules of thumb, but bear in mind that exceptions abound. That said, lines tend to be good at emphasizing change within groups, and points tend to do well at demonstrating relationships between discrete entities. Box-and-whiskers or violins do a nice job of displaying distributions. Rectangles are an easy choice for comparing quantities.

The geometric object you choose to display your data will influence how your message is interpreted.

Once you’ve selected the shape that will hold your data, think about how those shapes will change based on the values. That could mean associating colors to categories or intensities, size to quantity, or any other number of visual traits that will encode the variation in your data.

Aesthetic mappings connect visual traits to variation in your data.
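
One way to picture an aesthetic mapping is as a small lookup from visual traits to column names. The sketch below assumes hypothetical columns (`team`, `month`, `sales`) and a made-up helper, `apply_mapping`. It is not how any particular library implements the idea.

```python
# Hypothetical records; the column names are assumptions for this sketch.
rows = [
    {"team": "north", "month": 1, "sales": 12.0},
    {"team": "south", "month": 1, "sales": 7.5},
]

# An aesthetic mapping pairs a visual trait with a column, not with a value.
mapping = {"x": "month", "y": "sales", "color": "team"}


def apply_mapping(rows, mapping):
    """Look up, for every record, the data value behind each visual trait."""
    return [{aes: row[column] for aes, column in mapping.items()} for row in rows]


marks = apply_mapping(rows, mapping)
print(marks[0])
```

Note that the result still holds data values such as `"north"`, not actual colors. Deciding which color `"north"` becomes is the job of a scale.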

You’ve already used aesthetic mappings to establish relationships between characteristics in the data and visual attributes like size and color. Scales take that a step further by defining the specifics of these relationships. For example, you might decide to map groups in your data to colors using a categorical scale of pink, beige, and brown. In this case the legend would represent the categorical color scale. Similarly, you could map quantity to vertical distance using a scale of 0 to 10 where the y-axis is analogous to a legend for quantity.

Scales are the exhaustive range of possibilities that define the aesthetic mappings between geometric objects and visible qualities like color, size, and distance.
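
To make the two examples above concrete, here is a minimal sketch of a continuous scale and a categorical scale. The function names, the group labels `a`, `b`, and `c`, and the 200-pixel output range are assumptions chosen for illustration.

```python
def linear_scale(domain, output_range):
    """Build a function mapping a number in `domain` to a position in `output_range`."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


def categorical_scale(categories, colors):
    """Build a function mapping each category to one color, in order."""
    lookup = dict(zip(categories, colors))

    def scale(category):
        return lookup[category]

    return scale


# Quantity 0 to 10 mapped to a vertical distance of 0 to 200 pixels (assumed).
y = linear_scale((0, 10), (0, 200))
# Three hypothetical groups mapped to the pink, beige, and brown palette.
color = categorical_scale(["a", "b", "c"], ["pink", "beige", "brown"])

print(y(5), color("b"))
```

The axis and the legend are simply the reader-facing documentation of these two functions.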

How will each element be positioned relative to the others? Position adjustment gives you control over the shared spaces in which your geometric objects reside. Stacking or dodging bars are common examples of position adjustment. You might also use “jittering,” which is the process of adding a small amount of random noise to points in a scatterplot to minimize overlap in dense areas.

Position adjustments prevent geometric elements from overlapping and can help emphasize different comparisons.
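
The three adjustments mentioned above can each be sketched in a few lines. The helper names and default widths below are assumptions for illustration. Charting libraries implement these with more options and edge-case handling.

```python
import random


def stack(heights):
    """Return (bottom, top) pairs so each bar sits on top of the previous one."""
    spans, bottom = [], 0.0
    for height in heights:
        spans.append((bottom, bottom + height))
        bottom += height
    return spans


def dodge(center, n_bars, total_width=0.8):
    """Return x centers for n_bars placed side by side around `center`."""
    bar_width = total_width / n_bars
    left_edge = center - total_width / 2
    return [left_edge + bar_width * (i + 0.5) for i in range(n_bars)]


def jitter(values, amount=0.2, seed=0):
    """Shift each value by uniform random noise in [-amount, amount]."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [value + rng.uniform(-amount, amount) for value in values]


print(stack([2, 3, 1]))
print(dodge(center=1.0, n_bars=2))
print(jitter([1, 1, 1]))
```

Stacking emphasizes the total, dodging emphasizes the comparison between neighbors, and jittering trades a little positional accuracy for visibility.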

Are you showing aggregated quantities across groups? What about means, or perhaps a smoothed line? These are examples of operations performed on data called statistical transformations. In essence, if it isn’t a scatterplot or an unadulterated line chart, some statistical transformation has come into play.

Statistical transformations are operations on the raw data that result in a different display format.
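
A group mean is about the simplest statistical transformation there is. The sketch below uses hypothetical records and a made-up helper, `mean_by_group`, to show raw rows collapsing into the summary that a bar chart of averages would actually draw.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: several observations per group.
rows = [
    {"group": "a", "value": 2.0},
    {"group": "a", "value": 4.0},
    {"group": "b", "value": 10.0},
]


def mean_by_group(rows, group_key, value_key):
    """Collapse raw records into one mean value per group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in buckets.items()}


summary = mean_by_group(rows, "group", "value")
print(summary)
```

Three input rows become two output values, which is the sense in which the display format changes: the chart draws the summary, not the raw data.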

Unlike statistical transformations, which perform operations on the data themselves, the coordinate system you choose changes how the relationships between your data points are mathematically defined. For example, moving a stacked bar chart from the Cartesian plane into polar coordinates results in a pie chart.

Most graphics are drawn in the Cartesian coordinate system, but other mathematical spaces are fair game, too!
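
One way to read the bar-to-pie example is that the segments of a single stacked bar become wedges once length along the axis is reinterpreted as angle around a circle. The helper below, `to_pie_angles`, is a hypothetical name for that reinterpretation.

```python
import math


def to_pie_angles(heights):
    """Convert stacked bar segments into (start, end) angles in radians."""
    total = sum(heights)
    angles, running = [], 0.0
    for height in heights:
        start = running / total * 2 * math.pi
        running += height
        end = running / total * 2 * math.pi
        angles.append((start, end))
    return angles


# Segment heights 1, 1, and 2 take up a quarter, a quarter, and half of the circle.
wedges = to_pie_angles([1, 1, 2])
print([(round(math.degrees(a), 1), round(math.degrees(b), 1)) for a, b in wedges])
```

The data and the stacking are untouched. Only the meaning of position has changed.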

Depending on the comparisons you are trying to make in the graph, you might want to use facets, also known as small multiples. Faceting comes in handy when you’d like to see patterns within subsets of the data relative to a common scale. Faceting side-by-side allows for comparisons with a shared y-axis, whereas displaying stacked facets allows for comparison across a shared x-axis. The latter is particularly useful if you are showing several groups’ behavior over the same time period.

Facets allow you to compare groups that share a common scale.
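
Faceting boils down to two steps: split the records into panels, then compute the scale from all of the data so that every panel shares it. The sketch below assumes hypothetical columns (`region`, `month`, `sales`) and made-up helper names.

```python
from collections import defaultdict

# Hypothetical records covering two regions over the same two months.
rows = [
    {"region": "east", "month": 1, "sales": 4.0},
    {"region": "east", "month": 2, "sales": 9.0},
    {"region": "west", "month": 1, "sales": 2.0},
    {"region": "west", "month": 2, "sales": 3.0},
]


def facet(rows, by):
    """Split records into one panel per distinct value of the `by` column."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)


def shared_domain(rows, column):
    """Compute one (min, max) domain from all rows so every panel uses it."""
    values = [row[column] for row in rows]
    return (min(values), max(values))


panels = facet(rows, by="region")
y_domain = shared_domain(rows, "sales")
print(sorted(panels), y_domain)
```

Computing the domain before splitting is the important design choice here. If each panel picked its own range, the panels would no longer be directly comparable.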

Moving beyond two dimensions in space is a matter of mapping a third aesthetic to an axis-based scale. That’s it! If your data have a variable that works for faceting, you’ve really already created a 3D graphic. Similarly, if you are using color, size, or another continuous aesthetic you can attribute it to an axial scale.

3D spaces are 2D spaces with an additional scale — and that scale could be a continuous axis or more akin to a facet.

Like 3D graphics, animation is a special case of using time as an aesthetic mapping. You can think of animation as iteratively viewing facets over time.

You can think of animation as employing time as an axis.
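
Read this way, an animation is a sequence of facets ordered by time and shown one after another. The generator below is a minimal sketch under that assumption. The `year` column and the `frames` helper are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records, deliberately out of time order.
rows = [
    {"year": 2019, "group": "a", "value": 1.0},
    {"year": 2018, "group": "a", "value": 0.5},
    {"year": 2019, "group": "b", "value": 2.0},
]


def frames(rows, time_key):
    """Yield (time, records) pairs in time order, one pair per animation frame."""
    ordered = sorted(rows, key=itemgetter(time_key))
    for time_value, group in groupby(ordered, key=itemgetter(time_key)):
        yield time_value, list(group)


for year, records in frames(rows, "year"):
    print(year, len(records))
```

Each yielded frame would be drawn with the same geometry, mappings, and scales. Only the subset of data changes from one frame to the next.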

There’s nothing wrong with making a graph from a template or a ready-made solution in a charting library. But if you are hoping to create a bespoke visual that illustrates your point just so, then you’ll need to think a bit more systematically. Frameworks like the Grammar of Graphics guide you through the design process toward an end result that stands out in the forest of bars. Because ultimately, data visualization should inspire communication, not constrain it.

Thinking about leveling up your team’s visualizations? Take a look at the article on the art of branding in data visualization to see how our team at Microsoft achieves a more consistent aesthetic.

Nancy Organ is on LinkedIn.

Data Science at Microsoft

Lessons learned in the practice of data science at…

Nancy Organ

Written by

Nancy is a Data Visualization Developer at Microsoft. She previously worked in medical research and leads the Data Viz Jam Sessions Meetup group.

Data Science at Microsoft

Lessons learned in the practice of data science at Microsoft.
