Behind the Build: Data Sketching in R

Why and how to sketch in code

Claire Santoro
Graphicacy
Aug 4, 2022


In this blog series, we go behind the scenes of Graphicacy’s data visualization projects to explore the design and engineering details that made them effective.

Recently, a client asked if Graphicacy could take a preliminary dataset from their research and generate maps — 32, to be exact — from which they would pick a few to include in a progress report to their funder. These maps would be the client’s first view of their data, so they weren’t sure yet which they wanted to share.

This presented a challenge: I needed to create a large number of similar visualizations in a short period of time, knowing that the client was ultimately going to disregard most of them.

Grid of 8 maps of the US, some showing county level data and some showing state level data, in a light-dark blue gradient
A small number of the maps created (data has been changed for this blog post)

Part of being an effective and efficient data visualization designer is knowing the right tools to use for the job. In this case, I knew immediately that I should approach this project by “sketching” in code. Typically, I make static data visualizations using a mix of tools: R (ggplot2), Excel, QGIS, or a data visualization tool like Flourish to create the basic chart view, and Adobe Illustrator to polish colors, fonts, and layout. But manually editing 32 maps in Illustrator was impractical. I needed a single map design that could be generated — and recreated 32 times — entirely in code.

Curious whether sketching in code is the right choice for a project? Here are my considerations.

Why Sketch in Code

There are some advantages to visualizing data in code:

  • The ability to quickly try out different chart views or design aesthetics.
  • The ability to change chart views and design aesthetics independently of each other (i.e., minimal loss of work when changing one aspect of the visualization).
  • The ease of updating underlying data.
  • The ease of swapping out variables to generate multiple, similar visualizations quickly.
  • The ability to export charts in SVG format, if you need to make manual edits or hand the image off to a developer.

This particular project was a good candidate for code: the client was working with preliminary data that would eventually change, they needed a large number of similar visualizations on a tight timeframe, and they still wanted the opportunity to provide feedback on the draft visualizations.

Of course, for other projects, sketching in code may not be a good choice. The main disadvantage is that I can’t make the tiny, pixel-perfect adjustments in R that I would normally make in Illustrator. That limits my design options and requires a slightly different design strategy.

How to Approach Sketching in Code

When data sketching in R, I take a different approach than when sketching by hand or in Illustrator — in particular, I try to think strategically and then simplify.

Think Strategically

“Sketching” generally means that I’ll be experimenting with chart views or design aesthetics to find the right fit. When doing this in code, I start by asking myself whether the data is likely to change between iterations. If it is, I pull out key information, like variable names and corresponding chart titles, as inputs. This keeps the design decisions separate from the data, where they’re easier to tweak.
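A minimal sketch of that idea in ggplot2, assuming the sf package is installed. The North Carolina county shapefile that ships with sf stands in for the client's data, and `specs` and `make_map()` are hypothetical names for illustration, not the code used on this project. Everything that varies between maps sits in one small table, and the plotting function only ever receives those values as arguments.

```r
library(ggplot2)
library(sf)

# Stand-in data: North Carolina county boundaries bundled with the sf package.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Inputs: everything that changes from one map to the next lives in this table.
specs <- data.frame(
  var     = c("BIR74", "BIR79"),
  title   = c("Births by county, 1974-78", "Births by county, 1979-84"),
  caption = "Source: NC SIDS data bundled with the sf package",
  stringsAsFactors = FALSE
)

# Builder: the variable, title, and caption arrive as arguments, so nothing
# dataset-specific is hard-coded in the plotting code.
make_map <- function(data, var, title, caption) {
  ggplot(data) +
    geom_sf(aes(fill = .data[[var]]), colour = "white") +
    labs(title = title, caption = caption, fill = NULL)
}

# One ggplot object per row of `specs`, named by variable.
maps <- lapply(seq_len(nrow(specs)), function(i) {
  make_map(nc, specs$var[i], specs$title[i], specs$caption[i])
})
names(maps) <- specs$var
```

Each element of `maps` is an ordinary ggplot object, so `print(maps$BIR79)` draws one map, and `ggsave("BIR79.svg", maps$BIR79)` would write it out as SVG (recent ggplot2 versions rely on the svglite package for SVG output, so it needs to be installed). Adding a map becomes a matter of adding a row to `specs`.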

For this project, I knew I would be creating 32 maps: one for each of four variables, at two levels of aggregation, at two geographic scales, from two different data sources. I wanted the code to do as much of the work as possible, so I specified the dataset, variable name, chart title, and chart caption upfront as inputs. That kept my visualization choices in their own code block, which focused on map styling.
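The arithmetic behind the 32 maps (4 × 2 × 2 × 2) is easy to let R enumerate. In this sketch, only the county and state levels come from the post; the other labels (`var_1`, `agg_1`, `source_a`, and so on) are hypothetical placeholders. `expand.grid()` builds one row per combination, and each row can then supply the inputs, such as a file name, for a single map-building function.

```r
# Placeholder labels for the four dimensions described in the post.
design <- expand.grid(
  variable    = c("var_1", "var_2", "var_3", "var_4"),
  aggregation = c("agg_1", "agg_2"),
  geography   = c("county", "state"),
  source      = c("source_a", "source_b"),
  stringsAsFactors = FALSE
)

nrow(design)  # 32, i.e. 4 * 2 * 2 * 2

# One output name per map, built from the inputs rather than typed by hand.
design$file <- with(design, paste(variable, aggregation, geography, source, sep = "_"))

head(design$file, 3)
# "var_1_agg_1_county_source_a" "var_2_agg_1_county_source_a" "var_3_agg_1_county_source_a"
```

`expand.grid()` varies its first argument fastest, which is why the variable changes from row to row while the other dimensions stay fixed.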

Minor differences between the datasets meant that I wasn’t able to use a single code block to make all 32 maps. Keeping the visualization code separate from the dataset-specific code meant that I could carry design edits through every version later on without worrying much about introducing errors.

Two US maps side by side. On the left, a dark, poorly formatted map with the legend overlapping the map. On the right, a lighter map with an easier to read color scale and improved formatting of the text and layout.
Before and after: ggplot’s default map view (L) and the map view presented to the client (R) after editing the styling in code

Simplify

R is not a graphics editor, which means that certain design choices are simply more difficult to implement in code. For this reason, when sketching in code, I like to keep things simple. This means:

  • Avoiding layering information. Layering information, such as combining multiple chart views or adding annotations on top of a chart, is one of the hardest things for me to implement in R. (Others with more advanced skills may disagree.) When sketching in code, I typically stick with a default layout that has a single chart in the middle and a few basic elements (title, subtitle, caption, legend) around it. It’s relatively easy to tweak the positioning of those elements, but it’s much more complicated if you want to layer them on top of each other.
  • Embracing a minimalist style. Selecting specific colors, using custom fonts, and adding icons or imagery are also more difficult in code than in Illustrator. To minimize the need for workarounds, I try to pick from ggplot’s built-in color schemes and fonts, and I keep other design elements to a minimum. I typically start from one of the more minimalist ggplot themes: theme_void removes plot borders, gridlines, axes, and background colors in one step, while theme_minimal drops the borders and background but keeps light gridlines and axis labels. If the visualization warrants additional embellishment, I can always add it in Illustrator later, after the client has signed off on the general approach; for data sketches, though, “simple” is usually good enough.
  • Minimizing text. Clients often request multiple rounds of text edits because it’s critical to get the details of their research right. To cut down on edits and custom text that might apply only to a single chart, I keep the text short.
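The three habits above might translate into ggplot2 code along these lines. This is a sketch, not the project's actual styling: `map_style()` is a hypothetical helper, the North Carolina shapefile bundled with the sf package stands in for real data, and the "Blues" palette is just one of the ColorBrewer schemes ggplot2 can use without extra setup. Returning the style as a list of ggplot components means it can be added to any map with `+`, so a design tweak made once carries through every map.

```r
library(ggplot2)
library(sf)

# Stand-in data: North Carolina county boundaries bundled with the sf package.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Shared styling, kept apart from the data: a list of ggplot components.
map_style <- function() {
  list(
    # Minimalist theme: no axes, gridlines, borders, or background.
    theme_void(),
    # Built-in ColorBrewer ramp; direction = 1 maps low values to light blue.
    scale_fill_distiller(palette = "Blues", direction = 1),
    # Default layout: one map, with title, caption, and legend around it.
    theme(
      legend.position = "bottom",
      plot.title = element_text(face = "bold"),
      plot.caption = element_text(hjust = 0)
    )
  )
}

# Short text: a brief title, subtitle, and caption, and no legend title.
p <- ggplot(nc) +
  geom_sf(aes(fill = SID74), colour = "white") +
  labs(
    title = "SIDS deaths by county, 1974-78",
    subtitle = "North Carolina",
    caption = "Source: NC SIDS data bundled with the sf package",
    fill = NULL
  ) +
  map_style()
```

Swapping `theme_void()` for `theme_minimal()` inside `map_style()` would bring back light gridlines and axis labels, which tend to add little to a choropleth map.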
Map of the US showing county-level data in a light-to-dark blue gradient. Title: “Per Capita Lifetime Spending on Avocado Toast by County: Median.” Most southern counties are on the lower end of the scale and a small number of counties in the northeast are at the high end.
Example map demonstrating a default layout, minimalist style, and short text (data changed for blog post)

Conclusion

Although sketching in code won’t be the right choice for every project, it was absolutely right for this one. When I sent the set of 32 maps, the client spent a few minutes looking through them and immediately identified the four that they wanted to include in their progress report. Even though 28 of the 32 maps I built will never see the light of day, sketching in code meant that I didn’t waste effort on them and could focus instead on the design decisions that mattered to the client.

Part of building design expertise is pausing to consider the best approach for a particular project. Sketching in code is just one tool in Graphicacy’s design toolbox, but it can save a tremendous amount of time and effort in the right situation.

Graphicacy partners with clients to tell engaging stories with data. Graphicacy’s team combines storytelling, thoughtful human-centered design, and deep technical capabilities to build and deploy strategic, data-rich digital projects. Graphicacy has created data visualizations and infographics for top-tier organizations and companies, domestically and internationally, including the Bill & Melinda Gates Foundation, the World Bank, Google, the Johns Hopkins Coronavirus Resource Center, the Center for American Progress, the Anti-Defamation League, and many others.

Environmental analyst, science communicator, data viz designer. www.cesantoro.com