Remember that truly stellar bar chart you saw a few months ago? Truthfully, no one else does either, and that’s a problem for those of us in the business of communicating data with impact.
We often describe chart design as a process of matching specific charts to specific purposes: Lines describe change over time, bars compare quantities, histograms or box plots illustrate distributions, and so on. Groupings like these, which cluster charts according to characteristics or usage, are broadly referred to as visualization taxonomies.
Visualization taxonomies play an important role in data visualization theory and design. These taxonomies — like in any other domain — are powerful organizational tools. They can be a useful starting place when you begin conceptualizing a chart and are essential for creating rules-based chart generation tools. However, while they excel in predictability, they do little to inspire the kind of out-of-the-box thinking needed to create custom visualizations that wow and entertain your reader.
Fortunately, there’s another systematic way to approach chart design that provides more flexibility than a taxonomy while ensuring that the resulting figure is interpretable. In the academic visualization world, this approach is known as the Grammar of Graphics, introduced by Leland Wilkinson in his 1999 book, The Grammar of Graphics. Wilkinson’s framework teaches us how to think about graphs as a collection of components that snap together into a final “grammatically correct” visualization.
You are likely more familiar with this framework than you realize. There is a reason that we’re able to understand graphs, and that’s because they tend to adhere to the same basic recipe. Charting tools and visualization languages, too, often reflect the grammatical nature of visualizations in their user interfaces and syntaxes. When you create a visualization in Power BI or Tableau, for example, the grammatical elements of the figure are assembled predominantly through GUI selections. Code-based visualization languages like D3.js, Vega, and ggplot2 make use of a layered approach that allows for even greater customization. The “gg” in Hadley Wickham’s R plotting library ggplot2 is even a direct reference to the Grammar of Graphics, albeit his own variation.
In this article I’ll walk through a high-level overview of visualization grammar so that you, too, can start thinking compositionally about your charts. This particular framing is based on the ggplot2 interpretation of visualization grammar, but I’ve also taken some liberties to explain it concisely.
Data visualizations invariably require some kind of data — this should be no surprise. For the purposes of visualization, however, we care more about the shape of the data than the actual values themselves. That’s how we are able to create dashboards that update when new data are supplied.
To cut a data set into the correct shape for your visualization, you’ll first need to consider what you would like to say with your data. I won’t cover that entire thought process in this piece, but take a look at the article I wrote about data visualization “therapy” for some tips on how to get at your main message.
The predominant shape or shapes you select will vary depending on the message you’re trying to convey. There are some safe rules of thumb, but bear in mind that exceptions abound. That said, lines tend to be good at emphasizing change within groups and points tend to do well at demonstrating relationships between discrete entities. Box-and-whiskers or violins do a nice job of displaying distributions. Rectangles are an easy choice for comparing quantities.
Once you’ve selected the shape that will hold your data, think about how those shapes will change based on the values. That could mean associating colors to categories or intensities, size to quantity, or any other number of visual traits that will encode the variation in your data.
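To make the idea concrete, here is a minimal sketch of an aesthetic mapping in plain Python. It is not how any particular charting library stores its mappings; the column names (`species`, `age`, `weight`) and the `resolve` helper are made up for illustration. The point is that the mapping refers to columns, not to values, so it keeps working when new rows arrive.

```python
# Hypothetical records; the column names are invented for this example.
rows = [
    {"species": "cat", "age": 3, "weight": 4.2},
    {"species": "dog", "age": 5, "weight": 20.1},
]

# An aesthetic mapping pairs a visual channel with a data column, not a value.
mapping = {"x": "age", "y": "weight", "color": "species"}

def resolve(mapping, row):
    """Look up the data value that each visual channel should encode for one row."""
    return {channel: row[column] for channel, column in mapping.items()}

encoded = [resolve(mapping, row) for row in rows]
print(encoded[0])  # {'x': 3, 'y': 4.2, 'color': 'cat'}
```

Notice that nothing here says which color "cat" gets or where an age of 3 lands on the page. That is the job of scales.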
You’ve already used aesthetic mappings to establish relationships between characteristics in the data and visual attributes like size and color. Scales take that a step further by defining the specifics of these relationships. For example, you might decide to map groups in your data to colors using a categorical scale of pink, beige, and brown. In this case the legend would represent the categorical color scale. Similarly, you could map quantity to vertical distance using a scale of 0 to 10 where the y-axis is analogous to a legend for quantity.
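The two scales described above can be sketched in a few lines of plain Python. The function names `categorical_scale` and `linear_scale` are my own, and the 300-pixel plot height is an assumption chosen only to have something to map onto.

```python
def categorical_scale(categories, palette):
    """Map each distinct category to one palette entry, in order of first appearance."""
    seen = list(dict.fromkeys(categories))
    if len(seen) > len(palette):
        raise ValueError("palette has fewer colors than categories")
    return dict(zip(seen, palette))

def linear_scale(domain, range_):
    """Return a function that maps a value in `domain` linearly onto `range_`."""
    (d0, d1), (r0, r1) = domain, range_
    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)
    return scale

# Groups mapped to the pink, beige, and brown palette from the text.
color = categorical_scale(["a", "b", "a", "c"], ["pink", "beige", "brown"])

# Quantity mapped to vertical distance; the 300 px height is an assumed value.
y = linear_scale(domain=(0, 10), range_=(0, 300))

print(color)  # {'a': 'pink', 'b': 'beige', 'c': 'brown'}
print(y(5))   # 150.0
```

The dictionary returned by `categorical_scale` holds exactly the information a legend displays, and the `domain` passed to `linear_scale` is what the y-axis labels would show.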
How will each element be positioned relative to the others? Position adjustment gives you control over the shared spaces in which your geometric objects reside. Stacking or dodging bars are common examples of position adjustment. You might also use “jittering,” which is the process of adding a small amount of random noise to points in a scatterplot to minimize overlap in dense areas.
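Each of these adjustments boils down to a small calculation on positions. The sketch below shows one possible version of each in plain Python; the function names, the default slot width of 0.8, and the use of uniform noise for jittering are my own assumptions rather than the behavior of any specific library.

```python
import random

def stack(heights):
    """Return (bottom, top) for each bar so that bars sit on top of one another."""
    spans, bottom = [], 0
    for h in heights:
        spans.append((bottom, bottom + h))
        bottom += h
    return spans

def dodge(x, n_groups, width=0.8):
    """Return each group's bar center when the bars share a slot of `width` centered on x."""
    bar = width / n_groups
    return [x - width / 2 + bar * (i + 0.5) for i in range(n_groups)]

def jitter(values, amount=0.1, seed=0):
    """Add uniform noise in [-amount, amount] to each value; seeded for repeatability."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]

stacked = stack([2, 3, 1])
centers = dodge(x=1, n_groups=2)
nudged = jitter([1, 1, 1, 2, 2])

print(stacked)  # [(0, 2), (2, 5), (5, 6)]
```

Stacking changes where each bar starts, dodging changes where it sits horizontally, and jittering nudges points just enough to separate them.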
Are you showing aggregated quantities across groups? What about means, or perhaps a smoothed line? These are examples of operations performed on data called statistical transformations. In essence, if it isn’t a scatterplot or an unadulterated line chart, some statistical transformation has come into play.
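As an illustration, here are two common statistical transformations written as plain Python functions: a per-group mean and the binning step behind a histogram. The names `stat_mean` and `stat_bin` are only loosely inspired by ggplot2's naming and are not its API; the sample rows are invented.

```python
from collections import defaultdict
from statistics import mean

def stat_mean(rows, group, value):
    """Collapse raw rows to one mean per group; the geometry then draws these summaries."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group]].append(row[value])
    return {key: mean(vals) for key, vals in buckets.items()}

def stat_bin(values, lo, hi, bins):
    """Count values per equal-width bin on [lo, hi], as a histogram does."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            i = min(int((v - lo) / width), bins - 1)  # a value equal to hi goes in the last bin
            counts[i] += 1
    return counts

rows = [{"g": "a", "v": 1}, {"g": "a", "v": 3}, {"g": "b", "v": 10}]
means = stat_mean(rows, group="g", value="v")
counts = stat_bin([0, 1, 2, 3, 4, 5, 9, 10], lo=0, hi=10, bins=5)

print(means)   # {'a': 2, 'b': 10}
print(counts)  # [2, 2, 2, 0, 2]
```

In both cases the chart no longer draws the raw rows. It draws the output of the transformation.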
Unlike statistical transformations that perform operations on the data themselves, the coordinate system you choose changes how the relationships between your data points are mathematically defined. For example, moving a stacked bar chart from the Cartesian plane into polar coordinates results in a pie chart.
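Here is a small sketch of that bar-to-pie idea, assuming a single stacked bar whose segment heights are reinterpreted as angles. The function names are hypothetical, and the data values themselves are never modified; only the way positions are interpreted changes.

```python
import math

def to_polar_wedges(values):
    """Map the stacked segments of one bar onto angles, so each segment becomes a wedge."""
    total = sum(values)
    wedges, start = [], 0.0
    for v in values:
        end = start + v / total * 2 * math.pi
        wedges.append((start, end))
        start = end
    return wedges

def polar_to_cartesian(r, theta):
    """Convert a polar position back to x/y so it can be drawn on screen."""
    return r * math.cos(theta), r * math.sin(theta)

# Three stacked segments of heights 1, 1, and 2 become wedges covering
# a quarter, a quarter, and half of the circle.
wedges = to_polar_wedges([1, 1, 2])
point = polar_to_cartesian(1, 0)
```

The same three numbers produce either a stacked bar or a pie, depending only on which coordinate system receives them.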
Depending on the comparisons you are trying to make in the graph, you might want to use facets, also known as small multiples. Faceting comes in handy when you’d like to see patterns within subsets of the data relative to a common scale. Faceting side-by-side allows for comparisons with a shared y-axis, whereas displaying stacked facets allows for comparison across a shared x-axis. The latter is particularly useful if you are showing several groups’ behavior over the same time period.
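Under the hood, faceting amounts to splitting the data into subsets while computing the scale limits once, from all of the data. A rough sketch in plain Python follows; the `facet` function and the sample rows are hypothetical.

```python
from collections import defaultdict

def facet(rows, by, value):
    """Split rows into one panel per level of `by`, all sharing one value range."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    values = [row[value] for row in rows]
    shared_limits = (min(values), max(values))  # common scale across every panel
    return dict(panels), shared_limits

rows = [
    {"region": "north", "sales": 4},
    {"region": "north", "sales": 9},
    {"region": "south", "sales": 1},
]
panels, limits = facet(rows, by="region", value="sales")

print(sorted(panels))  # ['north', 'south']
print(limits)          # (1, 9)
```

Because `limits` comes from the full data set rather than from each panel, a bar of the same height means the same quantity in every facet.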
What about 3D?
Moving beyond two dimensions in space is a matter of mapping a third aesthetic to an axis-based scale. That’s it! If your data have a variable that works for faceting, you’ve really already created a 3D graphic. Similarly, if you are using color, size, or another continuous aesthetic you can attribute it to an axial scale.
Like 3D graphics, animation is a special case of using time as an aesthetic mapping. You can think of animation as iteratively viewing facets over time.
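The facets-over-time view of animation can be sketched as a generator that yields one subset of the data per time step. The `frames` function and the `year` column below are assumptions for illustration; a real animation would draw each yielded subset with the same scales and then move on to the next.

```python
from itertools import groupby

def frames(rows, time):
    """Yield (time value, rows) pairs in time order; each pair is one frame."""
    ordered = sorted(rows, key=lambda row: row[time])
    for t, group in groupby(ordered, key=lambda row: row[time]):
        yield t, list(group)

rows = [
    {"year": 2021, "value": 7},
    {"year": 2020, "value": 3},
    {"year": 2020, "value": 5},
]
sequence = list(frames(rows, time="year"))

print([t for t, _ in sequence])  # [2020, 2021]
```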
There’s nothing wrong with making a graph from a template or solution from a charting library. But if you are hoping to create a bespoke visual that illustrates your point just so, then you’ll need to think a bit more systematically. Frameworks like the Grammar of Graphics guide you through the design process toward an end result that stands out from the forest of bars. Because ultimately, data visualization should inspire communication, not constrain it.
Thinking about leveling up your team’s visualizations? Take a look at this article on the art of branding in data visualization to see how our team at Microsoft achieves a more consistent aesthetic:
It’s all yours: The why and how of visualization branding
Nancy Organ is on LinkedIn.