What Does “Visualization Literacy” Mean, Anyway?

tl;dr: We keep using the word “literacy” to guide how we design and evaluate visualizations. I don’t think that’s always a helpful word for us, since it divides our audience into the “literate” and the “illiterate.” I think we need to include more pedagogical elements in our charts, and be mindful of our audiences, rather than assume that there are lots of people who are permanently out of reach.

Literacy as a Measure

1,000 squares representing werewolves and innocent villagers, subdivided by whether each is allergic to silver
A unit visualization of a Bayesian reasoning problem. 1,000 people live in a village. A few of them are werewolves. Most werewolves are allergic to silver. However, some innocent villagers are allergic to silver too. Because there are so few werewolves, just testing whether somebody is allergic to silver is not a good test: the innocents who are allergic outnumber the detected werewolves by more than 12 to 1.

I remember having an “a-ha!” moment when I read Alvitta Ottley et al.’s paper from a few years ago, “Improving Bayesian Reasoning: The Effects of Phrasing, Visualization, and Spatial Ability.” You see, there’s been a long effort to get people to perform better on tasks involving Bayes’ rule. For instance, questions like this:

1% of villagers are werewolves.

The probability that a werewolf is allergic to silver is 80%.

The probability that an innocent villager is allergic to silver is 10%.

If a villager is allergic to silver, what is the probability that they are a werewolf?

The answer turns out to be about 7.5%, but people are just awful at this. We anchor on that “80%” and assume that a true positive (a detected werewolf) is much more likely than a false positive (an unlucky villager). But werewolves are so rare that the innocent but allergic villagers outnumber them by a significant amount. This problem comes up a lot in situations like medical diagnosis (if you test positive for a disease, what are the chances that you actually have it?), and patients get it wrong almost all of the time. Doctors are wrong a lot of the time too. They are still wrong a lot of the time even if you give them purportedly helpful visualizations, like the one at the top of this article.
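
If you want to double-check that 7.5% figure, the arithmetic is short enough to fit in a few lines of Python. This is just the textbook Bayes’ rule calculation using the numbers from the problem statement, not anything from the paper itself:

```python
# Bayes' rule on the werewolf problem, using the numbers given above.
p_wolf = 0.01                     # 1% of villagers are werewolves
p_allergic_given_wolf = 0.80      # P(allergic | werewolf)
p_allergic_given_innocent = 0.10  # P(allergic | innocent)

# Total probability that a random villager is allergic to silver.
p_allergic = (p_allergic_given_wolf * p_wolf
              + p_allergic_given_innocent * (1 - p_wolf))

# P(werewolf | allergic) = P(allergic | werewolf) * P(werewolf) / P(allergic)
p_wolf_given_allergic = p_allergic_given_wolf * p_wolf / p_allergic
print(f"P(werewolf | allergic) = {p_wolf_given_allergic:.3f}")  # ~0.075

# How badly do false positives swamp true positives?
ratio = (p_allergic_given_innocent * (1 - p_wolf)) / (p_allergic_given_wolf * p_wolf)
print(f"Allergic innocents per allergic werewolf: {ratio:.1f}")  # ~12.4
```

That last line is the 12-to-1 ratio from the figure caption: out of 1,000 villagers, roughly 99 allergic innocents crowd out the 8 allergic werewolves.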

But I mentioned the “a-ha!” moment. That was because, rather than just try out visualizations and see which ones helped, Ottley et al. looked at people’s spatial reasoning and, essentially, figured out how much spatial ability you had to have in order for the visualization to help you. In other words, some people were just so good at Bayesian reasoning and/or interpreting charts that any graphic, even a very complex one, would assist them. Others didn’t get it right even with the simplest or most obvious graphic or problem phrasing. The success of the different designs was measured not in raw accuracy improvement, but in how smart (well, a certain kind of smart) you had to be for the visualization design to “work.” I’d never thought of evaluating visualizations that way.

I bring up this anecdote because visualization is now mature enough as a field that we’re starting to think in a deep way about audience. Using batteries of tests from the psychology literature, or occasionally more complex measures, we try to quantify the limits of our audiences and what we can expect from them. We have started to pick out the shape of the beast and given it the name “visualization literacy” (or sometimes “graph fluency” or “chart IQ” or something in that vein).

However, I’m skeptical that there is some monolithic skillset picked out by “visualization literacy.” I think “literacy” is what AI pioneer Marvin Minsky called a “suitcase word”: a term where we just sort of threw a bunch of connotations in, closed the lid, and forgot about it. Is visualization literacy our ability to “decode” charts? Our exposure to novel forms of visual presentation? Is it a sub-type of numeracy, or just correlated with it? Visualization literacy is sometimes a useful metaphor to help us diagnose issues with our designs, but I think that it constrains our thinking about the many skillsets and backgrounds that can go into deriving meaning from visualizations. More dangerously, I feel that “literacy” sometimes functions as a word that shifts the blame onto our audiences, rather than ourselves, when we fail to provide the tools necessary to make use of the charts we design.

Chart Explainers

Minard’s map of Napoleon’s Russia campaign. The path of the army gets narrower as casualties increase.
Edward Tufte calls Minard’s map of Napoleon’s march to Moscow one of the best visualizations ever, but almost a full third of the chart space is devoted to explanatory text, some of which is redundant with the map. This is not a coincidence.

Good examples of being mindful of audience are the statistical graphics that show up in the New York Times. The New York Times has a mandate to inform the public about important issues. Sometimes these important issues have a statistical component, like the uncertainty in election polling, or the fact that global warming increases both the mean and the variance of our daily temperatures. This latter fact could be conveyed in a single chart, but the New York Times includes 500 or so words of additional information, including explanations of how to interpret the chart, and annotations as the chart changes form and content. They accompany a histogram of the calorie counts of Chipotle orders with 1,500 words that explain and contextualize the data. This is not because they are wasteful, but because they know that they need to convey this information across many audiences, and so must do their best to give as many people as possible the tools to properly interpret the charts using only the information within the article itself. The chart and the text work together to convey the story, functioning like a Rosetta stone that allows translation among different ways of knowing about the data.

Scott Klein, who studies (among other things) the history of graphics and charts in the media, has found examples of these textual chart explanations dating back to the 19th century. They accompanied not just what we would now consider alien or idiosyncratic or “advanced” chart types, but even charts we now consider basic or elementary, like line graphs. More recently, Jo Wood, Alexander Kachkaev, and Jason Dykes, borrowing from Donald Knuth’s concept of “Literate Programming,” have argued for “Literate Visualization,” where text and design interleave to document the creation and interpretation of visualizations. These textual explanations indicate that it’s very rare for a chart to “speak for itself” without an occasionally long and involved effort to provide context and explanation.

We sometimes forget that the visual language of charts is not innate or fundamental, but shaped by exposure and popularity. Scatterplots are one example: when Pew Research found that only 63% of people could correctly state the trend in a scatterplot, or when someone came up to Alberto Cairo after one of his “Visual Trumpery” lectures and thanked him for finally explaining how to interpret a scatterplot, we should treat those incidents as a wake-up call that our audiences are narrower than they appear at first glance, and that our jobs as educators extend past teaching people how to use ggplot or D3. It also suggests that it’s not enough to just propose and evaluate a novel design that we think will help with an important data communication problem: we have to circulate and popularize and evangelize our design, to add it to the visual vocabulary of our audience. The success of a design is a function not just of its theoretical or even empirical efficiency, but also of its use. It doesn’t matter how good the engine is on a sports car if you never take it out of the garage.

Is Literacy The Right Word?

Whenever I think about “digital literacy” or “visualization literacy” or even concepts with longer pedigrees like “numeracy,” I’m reminded of Anne Frances Wysocki and Johndan Johnson-Eilola’s “Blinded by the Letter: Why Are We Using Literacy as a Metaphor for Everything Else?” They mention that when we use terms like “visualization literacy,” we bring over meanings from the print world that may not hold in these new contexts.

For instance, we often consider literacy to be a single, binary skillset: one is either literate or illiterate. This elides the pedagogical structure of the sub-skills that make up learning to read, and also the many concepts (like engagement and the ability to contribute to the public sphere and so on) that are brought along for the ride with the notion of “literacy.” These conceptual hangers-on are even more inappropriate when they arise from haphazardly bundled suitcase concepts like “chart literacy.” Do I need to understand only basic charts in order to be chart literate, or do I need to be familiar with the odder exhibits in the visualization zoo? Do I also need to understand statistical concepts like confidence intervals and regression coefficients? If so, I might be in very select company, since even scientists don’t always interpret those sorts of things correctly. Do I need to be cognizant of all the many ways that charts and their underlying data can mislead, or is that skepticism distinct from interpretation?

I’m worried about the terminology of “literacy” for several reasons. The first is that it can appear to absolve us of our duty to democratize data. If people (and, by some definitions, many or even most people) are chart illiterates, then we may feel tempted to write those groups off. We may prioritize the design of visualizations that help the creators of, say, machine learning models, from whom we can presume a sufficient level of visual and statistical literacy, rather than the populations who may be impacted (sometimes unjustly) by those models. If what we mean by “visualization literacy” is narrow enough, or rare enough, then we’re already setting mental upper bounds on the number of people we’ll impact with our work.

Another worry I have is that a binary interpretation of chart literacy will discourage us from trying new designs. If, as I suspect, the process of “figuring out” a particular genre of chart is one that involves training or acclimatization or even repeated cultural exposure (as opposed to an all-or-nothing “decoding”), then it may be worth trying out designs that appear to have very little in the way of quantitative empirical benefit at first blush, but that could be extremely useful once they have been established within some broader visualization milieu. The benefits of novel or unfamiliar designs might be subtle, and might require longitudinal studies rather than short-term time-and-error evaluations. Steve Haroz et al.’s work on the relatively uncommon connected scatterplot has some of this flavor: connected scatterplots are difficult to interpret initially, but, after training and repeated exposure, they can highlight patterns in the data that would be difficult to see in other, more traditional charts. If we just keep repeating what works after short-term exposure or after a page or two of simple instructions, we could miss out on opportunities to advance the field.
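
For readers who haven’t run across the form: a connected scatterplot takes two time series, plots one against the other, and joins the points in time order, so time flows along the line rather than along an axis. Here is a minimal sketch in Python with matplotlib, using invented data (this is my own illustration, not the stimuli from Haroz et al.):

```python
# A minimal connected scatterplot: two (invented) time series plotted
# against each other, with points joined in time order.
import matplotlib.pyplot as plt

years = list(range(2000, 2011))
series_x = [100, 104, 109, 112, 114, 113, 110, 108, 105, 103, 104]  # invented
series_y = [1.5, 1.4, 1.6, 1.9, 2.3, 2.9, 3.1, 3.3, 2.4, 2.8, 3.0]  # invented

fig, ax = plt.subplots()
ax.plot(series_x, series_y, "-o")  # the line encodes time order, not a trend fit
for x, y, year in zip(series_x, series_y, years):
    ax.annotate(str(year), (x, y), textcoords="offset points", xytext=(4, 4))
ax.set_xlabel("Series A (invented units)")
ax.set_ylabel("Series B (invented units)")
ax.set_title("A connected scatterplot: time flows along the line")
plt.show()
```

Shown as two side-by-side line charts, the same data would be immediately legible; the connected form asks more of the reader up front, but makes joint patterns like loops and reversals directly visible.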

This is not to say that visualization literacy does not have its place as a useful metaphor when other terms fail us. As shown with the Bayesian example above, it might afford us new perspectives in thinking about how we design and deploy our visualizations. And we should work against the temptation to always demand more nuance, even when it is of limited utility. But here I am worried mostly about what we use literacy to do. We should use literacy to expand our borders, not contract them.

A Call To Action

The use (and mis-use) of data already has a profound impact on our daily lives. As visualization designers and researchers, we are well-positioned to control who does (and doesn’t) have the tools to interpret their data. To make sure that we aren’t only focused on the (chart) literati, we need to use every tool in our arsenal to communicate data, even to those who lack familiarity with visualization types whose interpretation we take for granted.

This means we might need to do more of the following:

  1. Integration of charts with non-graphical explanations like tutorials, directions, and descriptions.
  2. Long-term exposure and longitudinal evaluation of novel designs.
  3. Outreach and pedagogy to build familiarity with analytics in the general public.

There are lots of very smart people engaged in these directions: this is not to say that the visualization community is not already working on widening our audiences. But I’m worried that a focus on an overly constrained definition of visualization literacy, and the resulting winnowing of people who are or are not the kind of literate we’re looking for, narrows our impact as a field and closes off avenues of research that are otherwise worth investigating.

Thanks to Jessica Hullman and Marti Hearst for their feedback on this post.
