How the BBC Visual and Data Journalism team works with graphics in R
Over the past year, data journalists on the BBC Visual and Data Journalism team have fundamentally changed how they produce graphics for publication on the BBC News website. In this post, we explain how and why we have used R’s ggplot2 package to create production-ready charts, document our process and code and share what we learned along the way.
Data journalists on the BBC News’ Visual and Data Journalism team have been using R for complex and reproducible data analysis and to build prototypes for some time.
For example, we used R to extract, wrangle, clean and explore data from hundreds of spreadsheets on whether NHS targets are being hit, for the award-winning NHS tracker project. R was our go-to when in 2017 we analysed more than eight million residential property transactions in England and Wales for a project looking at how house prices have changed in real terms, a project that received an award from the Royal Statistical Society last year.
But when it comes to making graphics, it’s been another story.
We used R and in particular R’s data visualisation package ggplot2 for data exploration, to visualise patterns and help us understand the data and find stories. But we stopped short of building charts in the BBC News graphics style ready for publication on the site.
To create graphics to accompany stories on the BBC News website, we had two main options. If there was enough time, we could commission graphics from our design team. If we needed a quick turnaround, we could opt for our in-house chart tool instead.
In the first months of 2018 some curious members of the data team started experimenting, diving deep into the internals of the ggplot2 package in a bid to figure out how close we could get to replicating the BBC’s in-house style.
In March last year, we published our first chart made from start to finish using ggplot2.
Since then, change has been quick.
ggplot2 gives you far more control and creativity than a chart tool and allows you to go beyond a limited number of chart types. Working with scripts saves a huge amount of time and effort, in particular when working with data that needs updating regularly, and it makes our work reproducible, which is a key requirement of our workflow.
In short, it was a game changer, so we quickly turned our attention to how best to manage this newly discovered power.
We needed to find a good way of collecting and sharing all the knowledge we amassed in a way that our whole team could make use of, and work out a simple and easily reproducible workflow for making graphics from start to finish that had a consistent feel to them.
We adopted a two-pronged approach to working with graphics in R: placing the solutions to recurring problems in a package we are calling ‘bbplot’, and collecting everything else in our team’s R cookbook, a ggplot2 reference manual. Now we’re open sourcing both.
What does the ‘bbplot’ package do?
The package was developed to deal with all recurring obstacles, simplifying the workflow of adding objects that need to be in all charts.
When we started working in R, every time we made a chart we had to tweak every individual element to get from the default ggplot2 style to the in-house BBC style.
Saving those tweaks as a function was the first obvious way to simplify our lives.
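To make the idea concrete, here is a minimal sketch of what wrapping style tweaks in a function can look like. The function name house_style() and every font size, colour and gridline choice in it are illustrative assumptions, not the BBC’s actual style specification or the code inside bbplot.

```r
library(ggplot2)

# Hypothetical house style: all values below are illustrative assumptions.
house_style <- function(base_size = 18) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold", size = base_size * 1.5),
      # Align the title with the edge of the whole plot, not the panel
      # (needs ggplot2 3.3.0 or later).
      plot.title.position = "plot",
      legend.position = "top",
      axis.title = element_blank(),
      panel.grid.minor = element_blank(),
      panel.grid.major.x = element_blank(),
      panel.grid.major.y = element_line(colour = "#cbcbcb")
    )
}

# One call now replaces a long list of per-chart theme tweaks.
example_chart <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Illustrative title") +
  house_style()
```

Because the function returns an ordinary ggplot2 theme, it can be added to any plot with `+`, and a change to the style only has to be made in one place.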
There were lots of similar puzzles to figure out: How could we add in the BBC logo, with the correct dimensions regardless of the aspect ratio of the chart you wanted to export? How could we align the chart title to the top left? You get the picture.
Working closely with the designers in the Visual and Data Journalism unit, we solved these puzzles one by one, putting the solutions in functions that would be easy to reproduce.
The next step was to collect these solutions in one place, for consistency and to make everything as easily reproducible as possible — enter bbplot.
Early on we discussed how much to add to the package. Should we create functions to make specific chart types? How about always including horizontal gridlines in our line charts? Is it a good idea to pre-select the colours of a stacked bar chart to match our design colour palette?
We resisted the temptation to be too prescriptive and to come up with one-size-fits-all solutions for every potential question that was likely to come up when creating graphics.
Our goal was for the package to contain only the functions for things you have to do for every graphic, to make the workflow simple, without making it inflexible, as the flexibility is the real benefit of using ggplot2.
The idea was that bbc_style(), the function we created for changing ggplot2’s default appearance to our in-house style, should get you 90% of the way. That leaves you in control to make any additional tweaks to your chart, rather than feeling like a chart tool that presents you with a finished graphic and little room for manoeuvre.
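As a rough illustration of that workflow, the sketch below applies a base style and then layers chart-specific tweaks on top. It assumes bbc_style() returns a ggplot2 theme that can be added with `+`. If the bbplot package is not installed, it falls back to a built-in ggplot2 theme purely as a stand-in. The data and the colour are made up for the example.

```r
library(ggplot2)

# Use bbc_style() when bbplot is installed; otherwise fall back to a built-in
# theme so the sketch still runs (the fallback is only a stand-in).
base_style <- if (requireNamespace("bbplot", quietly = TRUE)) {
  bbplot::bbc_style()
} else {
  theme_minimal(base_size = 18)
}

# Made-up data for illustration only.
sales <- data.frame(year = 2014:2018, value = c(12, 15, 14, 19, 23))

chart <- ggplot(sales, aes(x = year, y = value)) +
  geom_line(colour = "#1380A1", linewidth = 1) +
  base_style +
  # Chart-specific tweaks come after the style, so they override it.
  theme(
    legend.position = "none",
    panel.grid.major.x = element_blank()
  ) +
  labs(title = "Illustrative title", subtitle = "Made-up values, 2014-2018")
```

The order matters: ggplot2 applies theme settings in sequence, so anything set in the later theme() call wins over the same setting in the base style.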
Why the cookbook?
The cookbook, meanwhile, is a guidebook in which we gather the team’s collected knowledge of ggplot2. A reference manual, rather than a tutorial, it might not tell you how to make your very first chart in R, but is a useful collection of little tips and tricks.
The idea is that whenever a member of the data team solves a specific problem, like adding curved arrows to a plot or highlighting a single bar in a bar chart, the code is added to the cookbook to save you and colleagues time next time.
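A recipe of that kind might look something like the sketch below, which highlights one bar and points a curved arrow at it. This is not taken from the cookbook itself: the data, colours, coordinates and label are all made up for illustration.

```r
library(ggplot2)

# Made-up data for illustration only.
regions <- data.frame(
  region = c("North", "South", "East", "West"),
  value  = c(14, 22, 9, 17)
)

# Flag the single bar we want to stand out.
regions$highlight <- regions$region == "South"

chart <- ggplot(regions, aes(x = region, y = value, fill = highlight)) +
  geom_col() +
  # One strong colour for the highlighted bar, a muted grey for the rest.
  scale_fill_manual(
    values = c(`TRUE` = "#1380A1", `FALSE` = "#dddddd"),
    guide = "none"
  ) +
  # Bars are ordered alphabetically (East, North, South, West), so "South"
  # sits at x = 3; the arrow ends at the left edge of that bar.
  annotate(
    "curve",
    x = 2, y = 19.5, xend = 2.5, yend = 21,
    curvature = -0.3,
    arrow = arrow(length = unit(0.03, "npc"))
  ) +
  annotate("text", x = 2, y = 18.5, label = "Highest value")
```

Mapping fill to a TRUE/FALSE column keeps the highlight tied to the data, so the right bar stays highlighted even if the values are updated.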
Team members have been able to turn to the cookbook to find answers and solutions when building a graphic, such as how to make a particular type of chart (a dumbbell chart, say) or how to add a text annotation to a plot. The package, on the other hand, automatically takes care of the things that are needed for every graphic you make, like adding the BBC logo.
We wanted to make things simple, but it was also important to keep the freedom and control that comes with scripts.
What have we learnt?
Working with graphics in this way has provided many benefits.
The focus on creating a reproducible workflow means we can create as many charts as possible completely in R, without having to open them up in a different program to add finishing touches. Collecting all our knowledge in one place also made it easy to spread to team members less comfortable with using R.
Most important, perhaps, was the teamwork: we made rapid gains in knowledge by pooling our efforts and sharing our skills. Because developing our use of R was not one person’s sole responsibility, but rather shared among several people on the data team experimenting in parallel, our collected knowledge grew much faster than it would otherwise have done.
Teaching others — an unintended consequence
Another key benefit of creating production-ready graphics using ggplot2 was not something we necessarily planned for when we set off down this route.
Positive reactions from colleagues from other parts of the team led us to develop a six-week course internally, to get people up to speed with the very basics of using R and making graphics using bbplot and the cookbook.
The course is not a magic formula to teach people everything about R in six short lessons, but rather aims to get people completely new to R accustomed to what it is. Each week we introduce one concept, talk it through with them and point them to online tutorials they can complete over the course of the week. We have a Slack channel for course participants where they can discuss issues or ask for help.
Over the six weeks, participants go through how to load data into R, different data types, some very basic data manipulation and analysis in R using tidyverse packages as well as an introduction into ggplot2. At the end of the course, they are given a challenge to produce a basic chart from raw data using all the different skills, concepts and code they learnt.
The course concludes with a three-hour workshop on how the bbplot package works and how to use our R cookbook effectively. We have found that showing people the cookbook and the graphics they should be able to produce within six weeks doesn’t make it easier to learn R, but it does give them much more of an incentive to dive into it with a target in mind, knowing how it can help them in their day-to-day work.
One of the major successes of running the course for our colleagues has been how it has stimulated some participants to keep using R and improving their ggplot2 knowledge.
Many of them now have a better understanding of how and why certain things work, beyond copying code we prepared for them. They are now producing charts that were not part of the cookbook recipes, with progressively less hands-on help from the data team.
The next step? To see everyone on the team adding recipes to the cookbook and committing code to GitHub.