The Curious Absence of Uncertainty from Many (Most?) Visualizations

All I want for Christmas is some error with those estimates

Jessica Hullman
Dec 23, 2019

Consider the last few visualizations you encountered outside of scientific publications, whether in a news article backed by data, in researching a product, or in using an application to plan a trip. Did the visualizations depict uncertainty: the possibility that the observed data or model predictions could take on a set of possible values?

A chart from the U.S. Fed’s 2019 report implies that the public holds exactly 77.64% of the U.S. nominal GDP (which is itself an enormous number typically reported without error). When uncertainty is withheld, visualizations imply unrealistic precision that viewers may or may not question.

Everybody’s talking about it, nobody’s doing it

What’s behind this omission? I wondered. Uncertainty communication is an increasingly hot topic in research and practice. As a visualization researcher, I’m well aware of the many techniques that have been proposed for representing uncertainty. There are the standard, STATS 101 class representations of probability via intervals (confidence, standard deviation, standard error) or plotted density functions (summarized here). Then there are newer techniques that bring a frequency framing (e.g., 3 out of 10 rather than 30%) to visualizations; psychologists like Gerd Gigerenzer have shown that this framing can improve Bayesian reasoning. Some of these, like hypothetical outcome plots (HOPs), can be applied to pretty much any visualization so long as you can generate draws from the distribution you want to show, leaving little room for excuses for omitting uncertainty from already complex visualizations.
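To make the "generate draws" requirement concrete, here is a minimal sketch of the data side of a hypothetical outcome plot, assuming the estimate can be approximated by a normal distribution. The function names, the estimate of 30.0, and the standard error of 5.0 are made up for illustration; a real HOP would render each draw as one animation frame rather than a text bar.

```python
import random

def hypothetical_outcomes(mean, sd, n_frames=10, seed=0):
    """Draw n_frames plausible values for one estimate, one per frame."""
    rng = random.Random(seed)  # seeded so the frames are reproducible
    return [rng.gauss(mean, sd) for _ in range(n_frames)]

def frequency_framing(draws, threshold):
    """Describe how many draws exceed a threshold as 'k out of n'."""
    k = sum(d > threshold for d in draws)
    return f"{k} out of {len(draws)}"

# Hypothetical estimate of 30.0 with standard error 5.0 (illustrative only).
frames = hypothetical_outcomes(mean=30.0, sd=5.0, n_frames=10)

# Stand-in for animation: print one crude text bar per frame.
for i, value in enumerate(frames, start=1):
    bar = "#" * max(0, round(value))
    print(f"frame {i:2d}: {bar} {value:.1f}")

print(frequency_framing(frames, threshold=30.0), "draws exceed 30")
```

The same draws serve both techniques: shown one at a time they form the frames of a HOP, and counted against a threshold they give a frequency-framed statement.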

So I began talking to visualization authors to find out more. I wanted to know how they thought about the value of uncertainty or error information in presenting estimates, how often and how they presented it, where they struggled to calculate or convey it, and how they perceived convention or norms around uncertainty in data visualization. First I surveyed about 90 professionals — from journalism, industry, research and education, and the public sector — who create visualizations as a regular part of their job.

I also did more in-depth interviews with some really smart people: 13 visualization designers, graphic editors, and general influencers in the world of data visualization whose work I personally respect.

Contradictions between design aspirations and practice

What I want to discuss here is one particular tension that I observed again and again in responses. That tension consists of two facts:

  1. Most authors responded positively when asked about the value of conveying uncertainty in visualizations. They understood that uncertainty was beneficial to viewers for making decisions and calibrating their confidence. Some described conveying it as an author’s responsibility.
  2. Most authors confessed to NOT including uncertainty in the majority of the visualizations they create.

Omitting uncertainty is a norm. Here’s why that might be.

Premise: Uncertainty omission is a norm

Tenet 1: Visualizations convey signal

According to many of the authors I surveyed and interviewed, the function of a visualization intended for other people to consume is to convey some “message” or “signal”. As one graphics editor described, the goal of their team was to focus on “one key takeaway”, with a few sub-messages possibly folded in. Multiple others talked about using the design process to tune the visibility of signal to viewers.

The way that authors talked about the message or signal implied that it had a truth value that could be objectively defined. Where does this truth value come from, I wondered?

Tenet 2: The author’s analysis process ensures the signal in a visualization is valid, both from the author’s and the viewers’ point of view. Viewers don’t need to see much evidence of the analysis process, because their trust in the author’s credibility is a priori.

When asked how they knew a signal was valid, authors typically alluded to their process, saying things like “well, we vet things.” In a few cases they referred to hypothesis testing, but more often alluded to exploratory data analysis and discovery through graphing and inspecting the data.

And not only did authors trust this process, but many implied that their viewers did as well. “I would say that you want trust established before you show uncertainty,” explained one interviewee when I asked how audiences knew the message in a graph was valid. When asked if they had seen audiences asking for uncertainty communication in data scientists’ presentations, another interviewee responded, “They don’t care in the short term. Uncertainty is a long-term strategic problem.”

Tenet 3: Uncertainty obfuscates signal

Many interviewees and survey respondents suggested that uncertainty was separate from the message or signal of a visualization. Because it was separate, uncertainty was capable of challenging the signal, of obfuscating it. Uncertainty could compete with the signal for the viewer’s attention, or confuse them. If error was too large, it could threaten the validity of the signal statistically. Finally, uncertainty could signal a lack of credibility given the current norm of omission. A survey respondent described how their work “would not be accepted by politicians because other analysts never describe uncertainty, so my more robust visualizations appear less valid if I am transparent.” An interviewee described how adding uncertainty and other methodological information could be “off-putting” to a viewer: “sometimes it makes you think that people are trying too hard. There must be something even more that they’re not telling me about.”

Escaping a bad equilibrium

Solution: Formalizing how graphs communicate

Visualization as test statistic

The nice thing about this analogy is that it can guide how we think about the mechanics of visualization judgments, both under uncertainty visualization and without it. After all, a model check has a specific definition in statistics. For example, if the viewer is engaging in a model check akin to a visual hypothesis test, we need a null hypothesis specifying what the absence of a pattern looks like, a set of assumptions about how the data was generated, a test statistic summarizing how extreme the data are in light of the null hypothesis, and a criterion for judging that extremeness (like the conventional p-value less than 0.05 threshold in Frequentist statistics).
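The four ingredients listed above can be sketched in code. This is a minimal illustration that uses a permutation test as a stand-in, not a description of what viewers actually compute: the null hypothesis is that group labels are interchangeable, the test statistic is the absolute difference in group means, and the criterion is the conventional 0.05 threshold. The function name and both data sets are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_sims=5000, seed=0):
    """Estimate how often shuffled labels give a mean gap at least as
    large as the observed one (null: labels are interchangeable)."""
    rng = random.Random(seed)
    # Test statistic: absolute difference in group means.
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)  # simulate one data set under the null
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_sims

# Made-up measurements for two groups (illustrative only).
group_a = [2.1, 2.4, 1.9, 2.6, 2.2]
group_b = [3.0, 3.4, 2.9, 3.3, 3.1]

p_value = permutation_p_value(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
print("pattern judged present" if p_value < 0.05 else "no clear pattern")
```

The shuffled data sets play the role of the reference distribution: they show what "no pattern" looks like under the stated assumptions, so the observed gap can be judged against them.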

Imagine an author presents a chart called “Ireland’s debt grows quickly”, implying that the change in debt-to-GDP ratio for Ireland is different from that of the other countries shown. We can think of the visualized data as a test statistic, a summary from which we can judge the data’s extremeness in light of some model’s predictions.

Now think for a moment about the hidden (implicit) mental process the viewer might go through when judging the data in the chart. A model of statistical graphical inference proposes that the viewer mentally compares the visualized data to an imagined reference distribution representing the predictions of some model¹. But what model, fit to what data?

Here, we’re on our own, as no statistician can tell us exactly what magic happens when all of the prior experiences and chart reading habits of a person collide with a visual representation of data that they consume. And yet decades of research in visualization, not to mention cognitive psychology, have implied some commonalities in what people conclude from graphs. It seems safe to assume that at least among some viewers, graphical inference processes will be similar.

We might guess, for example, that a viewer imagines taking draws from some sort of linear model fit to all of the other lines besides Ireland in the chart, kind of like a simulated cross validation. In imagining these simulated trends, which you can picture as a kind of envelope surrounding the lines other than Ireland, the viewer compares them to the trend for Ireland, noting the discrepancy in slopes. Just as a conventional hypothesis testing scenario involves computing a p-value and applying a threshold to it to judge whether a difference qualifies as statistically significant or not, a visualization viewer might apply a mental discrepancy function with an associated internal threshold on how much “pattern” is enough to conclude a difference exists.
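One way to picture this guessed-at process is the sketch below. It is an assumption-laden illustration, not a claim about what viewers do: it fits a least-squares slope to each of the other lines, treats those slopes as draws from a normal distribution to simulate the envelope, and uses the share of simulated slopes at least as steep as the target's as the discrepancy function. All series are made-up numbers, not the data behind the chart, and the function names are hypothetical.

```python
import random
import statistics

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def reference_slopes(other_series, xs, n_draws=2000, seed=0):
    """Simulate slopes from a normal model fit to the other lines' slopes."""
    rng = random.Random(seed)
    slopes = [slope(xs, ys) for ys in other_series]
    mu, sigma = statistics.mean(slopes), statistics.stdev(slopes)
    return [rng.gauss(mu, sigma) for _ in range(n_draws)]

def discrepancy(target_slope, draws):
    """Share of simulated slopes at least as steep as the target's."""
    return sum(d >= target_slope for d in draws) / len(draws)

years = [0, 1, 2, 3, 4]
# Made-up debt-to-GDP ratios in percent (illustrative only).
others = [
    [60, 62, 63, 65, 66],
    [80, 81, 83, 84, 86],
    [45, 46, 48, 49, 50],
    [70, 72, 73, 75, 77],
]
target = [25, 35, 50, 70, 95]  # the line being singled out

draws = reference_slopes(others, years)
share = discrepancy(slope(years, target), draws)
print(f"target slope: {slope(years, target):.1f}")
print(f"share of reference slopes at least as steep: {share:.3f}")
```

Leaving the target line out of the fit mirrors the cross-validation flavor of the description: the envelope is built only from the other lines, and the target is then compared against it.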

Uncertainty visualization can help authors communicate a message

Well, we know that visualization authors care a lot about getting their message, or signal, across. That’s why uncertainty is perceived as so dangerous. And yet, according to the graphical inference model, a viewer recognizes a signal in a visualization by hypothesizing a reference distribution, a distribution constructed by imagining uncertainty. Whether or not this process is conscious, a viewer cannot imagine a model without acknowledging probability (or likelihood). Seeing what other values the visualized data might have taken, whether through a conventional or more novel visual representation of uncertainty, provides directly relevant information for guiding the viewer’s inference.

Why would an author be content to let the viewer mentally “fill in” the gaps, when they could exert more control over this process by representing uncertainty explicitly? My conversations with authors led me to believe that most are quite sensitive to the ways that small decisions in design impact the mental “filling in” or conclusions of their viewers. Often this occurs through considerations like how to scale a y-axis, or what data to include for context.

What may be less obvious, though, is how visualizing uncertainty can help one communicate an intended message. For example, even if the viewer imagines a slightly different model than the author, visualizing uncertainty can convey what other values are possible for the data that is shown, adding some consistency to the data that viewers mentally “fit” a model to. This should reduce degrees of freedom in the inference process, bringing the viewer’s inferences closer to the author’s goal. Even better, the author could visualize uncertainty that directly corresponds to the model they want the viewer to infer, reducing many degrees of freedom.

Thinking more deeply about the process a viewer engages in when they examine a visualization may be the key to recognizing the value of depicting uncertainty. One visualization designer I interviewed seemed to sense this, saying:

“What task is the reader going to go through when they look at this chart? The more we do think about that, the more likely it is we might start thinking about the importance of putting uncertainty into the charts.”

However, it can be challenging to identify an inference task in contexts where visualizations are intended simply to inform. That’s where visualization research might help. In addition to producing tools and techniques for calculating, visualizing, and explaining uncertainty, researchers could help authors specify goals for, and predict, viewers’ inferences, and reason about the implications of omitting uncertainty. As applied as the field of data visualization may seem, we need theory to guide the empirical data we collect and the tools that we build. A deeper understanding of how uncertainty can help visualization authors communicate could lead to a world in which presentations of data are a bit more honest. And who doesn’t want that?

  1. In a Bayesian framework, the reference distribution is a set of draws from the posterior predictive distribution, the distribution of the outcome variable implied by a model that used the observed data to update a prior distribution of beliefs about the unknown model parameters (Gelman 2003, 2004).
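Written out, the posterior predictive distribution described in this footnote takes the standard form below. The notation is an assumption chosen here for illustration rather than taken from the post: y is the observed data, \tilde{y} is a hypothetical replicated outcome, and \theta stands for the unknown model parameters.

```latex
p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta,
\qquad
p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)
```

The second expression is the update step the footnote mentions: the prior p(\theta) is combined with the likelihood of the observed data to give the posterior, which is then averaged over to predict new outcomes.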

Thanks to members of the MU Collective for feedback on this work.

Multiple Views: Visualization Research Explained

A blog about visualization research, for anyone, by the people who do it. Edited by Jessica Hullman, Danielle Szafir, Robert Kosara, and Enrico Bertini

Written by Jessica Hullman

Assistant Professor, Computer Science (and Journalism!) at Northwestern University. I research data visualization and the communication of uncertainty.
