The Curious Absence of Uncertainty from Many (Most?) Visualizations
All I want for Christmas is some error with those estimates
Consider the last few visualizations you encountered outside of scientific publications, whether in a news article backed by data, in researching a product, or in using an application to plan a trip. Did the visualizations depict uncertainty, that is, the possibility that the observed data or model predictions could take on a set of possible values?
Everybody’s talking about it, nobody’s doing it
Chances are, they did not. Out of curiosity, I gathered data visualizations from nearly all articles published online in February 2019 by leading purveyors of data journalism, social science surveys, and economic estimates. Of 612 visualizations, the majority (73%) presented data intended for inference, such as extrapolating to unseen or future data rather than simply summarizing exactly what happened in the past. Yet only 14 of those (3%!) portrayed uncertainty visually. And that number includes visualizing raw data to convey variance, in addition to more conventional representations like intervals (error bars) or probability densities.
What’s behind this omission? I wondered. Uncertainty communication is an increasingly hot topic in research and practice. As a visualization researcher, I’m well aware of the many techniques that have been proposed for representing uncertainty. There are the standard, STATS 101 class representations of probability via intervals (confidence, standard deviation, standard error) or plotted density functions (summarized here). Then there are newer techniques that apply a frequency framing (e.g., 3 out of 10 rather than 30%), which psychologists like Gerd Gigerenzer have shown can improve Bayesian reasoning, to visualizations. Some of these, like hypothetical outcome plots (HOPs), can be applied to pretty much any visualization so long as you can generate draws from the distribution you want to show, leaving little room for excuses for omitting uncertainty from already complex visualizations.
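To make the "generate draws" requirement concrete, here is a minimal sketch of how the frames of a hypothetical outcome plot might be produced, along with a frequency-framed summary of the same draws. The estimate, its standard error, the normal sampling distribution, and the helper names `hop_frames` and `frequency_framing` are all illustrative assumptions, not taken from the HOPs work itself.

```python
import random

def hop_frames(mean, std_error, n_frames=10, seed=0):
    """Return one simulated outcome per animation frame.

    Assumes, for illustration only, that the estimate's sampling
    distribution is normal with the given mean and standard error.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, std_error) for _ in range(n_frames)]

def frequency_framing(draws, threshold):
    """Describe how many draws exceed a threshold as 'k out of n'."""
    k = sum(1 for d in draws if d > threshold)
    return f"{k} out of {len(draws)}"

# Hypothetical estimate: 52 with a standard error of 3
frames = hop_frames(mean=52.0, std_error=3.0, n_frames=10)

# In a HOP, each draw would be rendered as its own frame, shown in sequence.
for i, value in enumerate(frames):
    print(f"frame {i}: {value:.1f}")

# The same draws support a frequency-framed statement.
print(frequency_framing(frames, threshold=50.0), "draws exceed 50")
```

Any distribution you can sample from could replace the normal draw here; the rest of the sketch would stay the same.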
So I began talking to visualization authors to find out more. I wanted to know how they thought about the value of uncertainty or error information in presenting estimates, how often and how they presented it, where they struggled to calculate or convey it, and how they perceived convention or norms around uncertainty in data visualization. First I surveyed about 90 professionals — from journalism, industry, research and education, and the public sector — who create visualizations as a regular part of their job.
I also did more in-depth interviews with some really smart people: 13 visualization designers, graphic editors, and general influencers in the world of data visualization whose work I personally respect.
Contradictions between design aspirations and practice
What did I find out? That authors face challenges in explaining, in addition to calculating and visualizing, uncertainty; that they sometimes choose visual encodings to convey “fuzziness” without quantifying uncertainty (circular area, for example, which is hard for people to accurately read); that they worry about confusing audiences with uncertainty or worse, getting it wrong themselves. For more details like these, check out the IEEE VIS 2019 paper.
What I want to discuss here is one particular tension that I observed again and again in responses. That tension consists of two facts:
- Most authors responded positively when asked about the value of conveying uncertainty in visualizations. They understood that uncertainty was beneficial to viewers for making decisions and calibrating their confidence. Some described conveying it as an author’s responsibility.
- Most authors confessed to NOT including uncertainty in the majority of the visualizations they create.
Omitting uncertainty is a norm. Here’s why that might be.
If most authors are positive about uncertainty, and agree it should be visualized more often with data, why aren’t they doing it? I listened closely to what each author said, as they described what gets in the way of conveying uncertainty, why they think others might not be doing it, and how they perceive their audiences and their own responsibilities as data mediators. Through their responses, I glimpsed a system of beliefs behind omission.
Premise: Uncertainty omission is a norm
Tenet 1: Visualizations convey signal
According to many of the authors I surveyed and interviewed, the function of a visualization intended for other people to consume is to convey some “message” or “signal”. As one graphics editor described, the goal of their team was to focus on “one key takeaway”, with a few sub-messages possibly folded in. Multiple others talked about using the design process to tune the visibility of signal to viewers.
The way that authors talked about the message or signal implied that it had a truth value that could be objectively defined. Where does this truth value come from, I wondered?
Tenet 2: The author’s analysis process ensures the signal in a visualization is valid, both from the author’s and the viewers’ point of view. Viewers don’t need to see much evidence of the analysis process, because their trust in the author’s credibility is a priori.
When asked how they knew a signal was valid, authors typically alluded to their process, saying things like “well, we vet things.” In a few cases they referred to hypothesis testing, but more often alluded to exploratory data analysis and discovery through graphing and inspecting the data.
And not only did authors trust this process, but many implied that their viewers did as well. “I would say that you want trust established before you show uncertainty,” explained one interviewee when I asked how audiences knew the message in a graph was valid. When asked if they had seen audiences asking for uncertainty communication in data scientists’ presentations, another interviewee responded “They don’t care in the short term. Uncertainty is a long-term strategic problem.”
Tenet 3: Uncertainty obfuscates signal
Many interviewees and survey respondents suggested that uncertainty was separate from the message or signal of a visualization. Because it was separate, uncertainty was capable of challenging the signal, of obfuscating it. Uncertainty could compete with the signal for the viewer’s attention, or confuse them. If error was too large, it could threaten the validity of the signal statistically. Finally, uncertainty could signal a lack of credibility given the current norm of omission. A survey respondent described how their work “would not be accepted by politicians because other analysts never describe uncertainty, so my more robust visualizations appear less valid if I am transparent.” An interviewee described how adding uncertainty and other methodological information could be “off-putting” to a viewer: “sometimes it makes you think that people are trying too hard. There must be something even more that they’re not telling me about.”
Escaping a bad equilibrium
Summarizing what I found as a small set of model tenets was somewhat satisfying: it provided a concise, if initial, attempt at describing a problem that I believe more researchers (in computer science, economics, psychology, …) should be thinking about. But my inner logician remained unsatisfied. I started down this rabbit hole because I sensed that visualization techniques alone would not solve the problem of “incredible certitude,” as economist Chuck Manski has called the absence of uncertainty from many government reports. Given the pervasiveness of beliefs that support a norm of uncertainty omission, what hope do we have of overcoming the “bad equilibrium” we’re in? I needed to explore avenues for escape, if only for my own peace of mind. Specifically, I began looking for reasons why visualizing uncertainty was still the right answer, even in light of a norm driven by authors’ pragmatism.
Solution: Formalizing how graphs communicate
The authors I talked to seemed clear on one thing — visualizations are meant to convey messages or “signal.” But can we formalize exactly what signal means, and how it is communicated in a visualization, so that we can consider uncertainty’s role in this process more rigorously?
Visualization as test statistic
Luckily, some statisticians have thought about this a bit. Andrew Gelman, Andreas Buja, Dianne Cook, and others have proposed that when a person examines a visualization looking for patterns, they are doing the visual equivalent of a hypothesis test, or more generally a “model check”.
The nice thing about this analogy is that it can guide how we think about the mechanics of visualization judgments, both under uncertainty visualization and without it. After all, a model check has a specific definition in statistics. For example, if the viewer is engaging in a model check akin to a visual hypothesis test, we need a null hypothesis specifying what the absence of a pattern looks like, a set of assumptions about how the data was generated, a test statistic summarizing how extreme the data are in light of the null hypothesis, and a criterion for judging that extremeness (like the conventional p-value less than 0.05 threshold in Frequentist statistics).
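The four ingredients listed above can be sketched with a small permutation test. This is only an illustration of the statistical side of the analogy: the two groups of numbers are made up, and the choice of a permutation test with a difference-in-means statistic is my assumption, not something prescribed by the authors cited.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Null hypothesis: group labels are exchangeable (no pattern).
    Test statistic: absolute difference in means.
    Returns the observed statistic and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # simulate data under the null
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero
    return observed, (extreme + 1) / (n_perm + 1)

# Made-up measurements for two groups
a = [2.1, 2.4, 1.9, 2.6, 2.2]
b = [3.0, 3.3, 2.9, 3.4, 3.1]

observed, p_value = permutation_test(a, b)
alpha = 0.05  # the conventional criterion for extremeness
print(f"statistic={observed:.2f}, p={p_value:.4f}, reject null: {p_value < alpha}")
```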
Imagine an author presents a chart called “Ireland’s debt grows quickly”, implying that the change in debt-to-GDP ratio for Ireland is different from that of the other countries shown. We can think of the visualized data as a test statistic, a summary from which we can judge the data’s extremeness in light of some model’s predictions.
Now think for a moment about the hidden (implicit) mental process the viewer might go through when judging the data in the chart. A model of statistical graphical inference proposes that the viewer mentally compares the visualized data to an imagined reference distribution representing the predictions of some model¹. But what model, fit to what data?
Here, we’re on our own, as no statistician can tell us exactly what magic happens when all of the prior experiences and chart reading habits of a person collide with a visual representation of data that they consume. And yet decades of research in visualization, not to mention cognitive psychology, have implied some commonalities in what people conclude from graphs. It seems safe to assume that at least among some viewers, graphical inference processes will be similar.
We might guess, for example, that a viewer imagines taking draws from some sort of linear model fit to all of the other lines besides Ireland in the chart, kind of like a simulated cross validation. In imagining these simulated trends, which you can picture as a kind of envelope surrounding the lines other than Ireland, the viewer compares them to the trend for Ireland, noting the discrepancy in slopes. Just as a conventional hypothesis testing scenario involves computing a p-value and applying a threshold to it to judge whether a difference qualifies as statistically significant or not, a visualization viewer might apply a mental discrepancy function with an associated internal threshold on how much “pattern” is enough to conclude a difference exists.
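As a rough sketch of that imagined process, the code below fits a line to each non-Ireland series, treats the resulting slopes as defining a reference distribution, and scores how extreme Ireland's slope is against it. Everything here is hypothetical: the debt-to-GDP numbers are invented, the country names are placeholders, and modeling the reference slopes as a normal distribution is a simplifying assumption of mine rather than a claim about what viewers actually do.

```python
import random
import statistics

def slope(ys):
    """Least-squares slope of ys against time steps 0, 1, 2, ..."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

# Invented debt-to-GDP series (percent), one value per year
others = {
    "Country A": [60, 62, 63, 65, 66],
    "Country B": [80, 80, 81, 81, 82],
    "Country C": [45, 48, 50, 53, 55],
    "Country D": [70, 72, 73, 75, 77],
}
ireland = [25, 35, 48, 62, 75]

# "Fit" the reference model to every line except Ireland
other_slopes = [slope(ys) for ys in others.values()]
mu = statistics.mean(other_slopes)
sigma = statistics.stdev(other_slopes)

# Imagined draws: the envelope of trends the model would predict
rng = random.Random(0)
reference = [rng.gauss(mu, sigma) for _ in range(5000)]

# Discrepancy: share of simulated slopes at least as steep as Ireland's
ireland_slope = slope(ireland)
discrepancy = sum(s >= ireland_slope for s in reference) / len(reference)

threshold = 0.05  # stand-in for the viewer's internal threshold
print(f"Ireland slope: {ireland_slope:.1f}")
print(f"share of reference slopes as steep: {discrepancy:.4f}")
print("viewer concludes a difference:", discrepancy < threshold)
```

A viewer who imagines a wider envelope (a larger `sigma`) or a stricter threshold could reach a different conclusion from the same chart, which is the degree of freedom the rest of this piece is concerned with.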
Uncertainty visualization can help authors communicate a message
Why is this analogy to model checks or hypothesis testing useful? A more principled notion of the form a visual judgment takes lets us consider more rigorously how changes to the way a visualization is presented (including whether or not uncertainty is visualized) lead to different interpretations. The question that, if answered, could hold the key to the problem of uncertainty omission is this: how is visualizing uncertainty better for communicating than not visualizing it?
Well, we know that visualization authors care a lot about getting their message, or signal, across. That’s why uncertainty is perceived as so dangerous. And yet, according to the graphical inference model, a viewer recognizes a signal in a visualization by hypothesizing a reference distribution, a distribution constructed by imagining uncertainty. Whether or not this process is conscious, a viewer cannot imagine a model without acknowledging probability (or likelihood). Seeing what other values the visualized data might have taken, whether through a conventional or more novel visual representation of uncertainty, provides directly relevant information for guiding the viewer’s inference. In theory, we might expect this to lead to more consistent inferences across viewers.
Why would an author be content to let the viewer mentally “fill in” the gaps, when they could exert more control over this process by representing uncertainty explicitly? My conversations with authors led me to believe that most are quite sensitive to the ways that small decisions in design impact the mental “filling in” or conclusions of their viewers. Often this occurs through considerations like how to scale a y-axis, or what data to include for context.
What may be less obvious, though, is how visualizing uncertainty can help one communicate an intended message. For example, even if the viewer imagines a slightly different model than the author, visualizing uncertainty can convey what other values are possible for the data that is shown, adding some consistency to the data that viewers mentally “fit” a model to. This should reduce degrees of freedom in the inference process, bringing the viewer’s inferences closer to the author’s goal. Even better, the author could visualize uncertainty that directly corresponds to the model they want the viewer to infer, reducing many degrees of freedom.
Thinking more deeply about the process a viewer engages in when they examine a visualization may be the key to recognizing the value of depicting uncertainty. One renowned data journalist I interviewed seemed to sense this, saying:
“What task is the reader going to go through when they look at this chart? The more we do think about that, the more likely it is we might start thinking about the importance of putting uncertainty into the charts.”
However, it can be challenging to identify an inference task in contexts where visualizations are intended simply to inform. That’s where visualization research might help. In addition to producing tools and techniques for calculating, visualizing, and explaining uncertainty, researchers could help authors specify goals for, and predict, viewers’ inferences, and reason about the implications of omitting uncertainty. As applied as the field of data visualization may seem, we need theory to guide the empirical data we collect and the tools that we build. A deeper understanding of how uncertainty can help visualization authors communicate could lead to a world in which presentations of data are a bit more honest. And who doesn’t want that?
¹ In a Bayesian framework, the reference distribution is a set of draws from the posterior predictive distribution, the distribution of the outcome variable implied by a model that used the observed data to update a prior distribution of beliefs about the unknown model parameters (Gelman 2003, 2004).
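For readers who prefer symbols, the posterior predictive distribution described in this note is usually written as follows. The notation is my own choice, not the author's: y is the observed data, θ the unknown model parameters, and y^rep a replicated (imagined) dataset.

```latex
p(y^{\mathrm{rep}} \mid y) = \int p(y^{\mathrm{rep}} \mid \theta)\, p(\theta \mid y)\, d\theta,
\qquad
p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)
```

The second expression is the update step the note mentions: the prior p(θ) is combined with the likelihood of the observed data to give the posterior p(θ | y), which is then averaged over to predict new data.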
Thanks to members of the MU Collective for feedback on this work.