Being Bayesian with Visualization

Jessica Hullman and Yea Seul Kim of the MU Collective.

TLDR: Most visualization design and evaluation methods don’t explicitly consider beliefs. Applying a Bayesian framework to visualization interaction provides a more powerful way to diagnose biases in people’s interactions with data, like discounting or overweighting data in judgments or decisions. We can also use Bayesian models of cognition to evaluate visualizations that present uncertainty, to personalize how we visualize or explain datasets, and to predict different individuals’ future responses to data.

Think about the last time you looked to a visualization to determine what to believe about the world, or how to act. For those of us who have been watching COVID-19 data accumulate, the answer might be “twenty minutes ago.” Even in the absence of a global pandemic, it’s common to look to visualizations in science, government, the media, and our personal lives to decide what is true about something in the world, whether that’s the state of the economy, what will happen in the next election, or how well you’re meeting personal finance or health goals. Visualizations can confirm, strengthen, or call into question things we already think.

Unfortunately, much of the rhetoric about why visualization is powerful implies that visualization is primarily for helping us perceive data and identify hypotheses or trends we had no idea existed. Consider often-cited examples like Anscombe’s quartet, the clever set of four datasets that appear identical when reduced to summary statistics, but are clearly distinct in structure once visualized. Clear your mind and visualize the data, one might believe, and they will speak. Perhaps as a result, many visualization evaluations focus on how well people can read the data from a visualization, not what they think afterward.

Anscombe’s quartet, like many examples used to demonstrate the power of visualization, assumes that the user has no prior knowledge about the data.

How often do you find yourself looking at data about which you really have no prior expectations? The ways we use visualization in the world to inform our beliefs call into question the latent assumption that people approach visualizations as “blank slates.” We’ve been developing an alternative approach to visualization design and evaluation that acknowledges people’s prior beliefs.

An Example of a Bayesian Approach to Visualization

Imagine I am going to show you some sample data estimating the proportion of people who will contract coronavirus given that they are exposed (or, if you’re overwhelmed with COVID-19 news, you can imagine I am talking about presidential polls instead!). But before I do that, think about what beliefs you might bring to the estimate even before I show it to you. If you’re like me, you’ve been watching updated case counts and deaths across the world, and harbor a rough guess of the proportion of people who seem to get the virus when exposed. Even if you have little background on COVID-19, the term virus may make you think of other diseases and lead you to a guess based on those. For example, I might describe my beliefs as a prior probability distribution over possible values, with my best guess of the rate at 30%, while believing that values between 11% and 60% are possible.

Setting prior beliefs on the percentage of people exposed to COVID-19 who will contract it.
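To make this concrete, here is a minimal sketch in Python of what such a prior could look like as a probability distribution. The Beta(4, 8) parameters are a hypothetical choice whose mode is 30% and whose 95% interval roughly matches the “11% to 60%” range above.

```python
# A hypothetical Beta(4, 8) prior: mode = (4 - 1) / (4 + 8 - 2) = 0.30, with a
# 95% interval roughly spanning the "11% to 60%" range described above.
from scipy.stats import beta

prior_alpha, prior_beta = 4, 8

prior_mode = (prior_alpha - 1) / (prior_alpha + prior_beta - 2)
lo, hi = beta.interval(0.95, prior_alpha, prior_beta)

print(f"prior mode: {prior_mode:.2f}")        # 0.30
print(f"95% interval: ({lo:.2f}, {hi:.2f})")  # roughly (0.1, 0.6)
```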

Imagine you now saw a visualization of a sample of people who got coronavirus within some population, say, the Diamond Princess cruise ship. At least 712 of the 3,711 passengers (19%) contracted the virus while quarantined together on the ship.

712 of the 3,711 passengers on the Diamond Princess were infected with COVID-19. Because we are trying to estimate the probability of contracting the disease upon exposure from a limited number of observed people, counts as low as 660 and as high as 754 should be considered plausible.

When using the Diamond Princess to estimate the percentage of people in the larger U.S. population who will contract COVID-19 if exposed, we must acknowledge some uncertainty around our estimate of the population value, because many different underlying “true” proportions could plausibly have produced the sample we observed. A likelihood function describes this uncertainty about the true proportion, given the limited sample size of the Diamond Princess passengers.
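As a rough sketch of what that likelihood looks like (assuming simple binomial sampling, which is a simplification), we can ask how probable the observed 712 out of 3,711 would be under each candidate “true” proportion:

```python
# Likelihood sketch for the Diamond Princess sample, assuming binomial sampling:
# how probable is observing 712 infections out of 3,711 exposed passengers
# under each candidate "true" proportion?
import numpy as np
from scipy.stats import binom

infected, exposed = 712, 3711
p_grid = np.linspace(0, 1, 1001)                  # candidate true proportions
likelihood = binom.pmf(infected, exposed, p_grid)

print(f"most likely proportion: {p_grid[np.argmax(likelihood)]:.3f}")  # ~0.192
```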

Now that you’ve seen the new information, what is your best guess about the proportion of people who will contract coronavirus given that they are exposed? Do you mostly overwrite your prior beliefs with what the data says? Or do you reject the data and stick with something pretty close to your prior? Or do you believe something in the middle? If so, are you more or less uncertain than before?

A few examples of what a person might believe after learning that 19% of the Diamond Princess passengers contracted COVID-19, assuming prior beliefs that 30% is the most likely proportion of exposed people who will get COVID-19, with some uncertainty. They might, for example, mostly reject the new data (a), update their beliefs to a value between their prior beliefs and the new data (b), or mostly throw out their prior beliefs (c).

By eliciting or inferring a user’s prior beliefs about a parameter, showing them observed data represented as a likelihood function, and eliciting their posterior beliefs, we frame visualization interaction as a belief update. And given this framing, we can do some powerful things.

First, in a Bayesian framework, we can use our prior and the likelihood for the Diamond Princess sample above to calculate a normative posterior distribution. This is what we should believe if we take the data at face value and use it to update our prior beliefs rationally. Rational in this case means that we arrive at our posterior beliefs using Bayes’ rule, a simple rule of conditional probability which says that our posterior beliefs should be proportional to our prior beliefs times the likelihood.
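Here is a minimal sketch of that calculation on a grid of candidate proportions, reusing the hypothetical Beta(4, 8) prior and the binomial likelihood from the sketches above:

```python
# Bayes' rule on a grid: the posterior is proportional to prior times likelihood.
# The Beta(4, 8) prior is the hypothetical one from earlier.
import numpy as np
from scipy.stats import beta, binom

p_grid = np.linspace(0.001, 0.999, 999)
prior = beta.pdf(p_grid, 4, 8)                 # hypothetical prior beliefs
likelihood = binom.pmf(712, 3711, p_grid)      # Diamond Princess sample

unnormalized = prior * likelihood
posterior = unnormalized / np.trapz(unnormalized, p_grid)   # normalize to a density

print(f"normative posterior mode: {p_grid[np.argmax(posterior)]:.3f}")  # ~0.19
```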

For our example above, we can represent both the prior distribution and the likelihood using Beta distributions, which are described by two parameters representing counts of “successes” (e.g., those who contracted COVID-19 after exposure) and “failures” (e.g., people who didn’t contract it). Under most circumstances, we can interpret the sum of these two parameters as the amount of data (or sample size) implied by those beliefs or data. To figure out what we should rationally believe now, we combine the two sets of counts: sum the successes and failures, then compute the new proportion:

For a Beta distribution, we can sum the two parameters (alpha and beta, approximating counts of “successes” and “failures”) from the prior and likelihood to arrive at the best posterior estimate of the proportion.
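With the conjugate Beta representation, the update in the figure reduces to addition. A minimal sketch, again assuming the hypothetical Beta(4, 8) prior:

```python
# Conjugate shortcut: add the sample's successes and failures to the prior's
# alpha and beta to get the normative posterior.
from scipy.stats import beta

prior_alpha, prior_beta = 4, 8               # hypothetical prior
successes, failures = 712, 3711 - 712        # Diamond Princess sample

post_alpha = prior_alpha + successes         # 716
post_beta = prior_beta + failures            # 3007

post_mean = post_alpha / (post_alpha + post_beta)
lo, hi = beta.interval(0.95, post_alpha, post_beta)

print(f"normative posterior mean: {post_mean:.3f}")   # ~0.19
print(f"95% interval: ({lo:.3f}, {hi:.3f})")          # a window of a few percent
```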

Let’s say our posterior distribution was (b) in the figure above, with a most likely value of 25%. We can compare it to the predictions of our Bayesian model, which suggests that we should believe that 19% of people who are exposed will contract COVID-19 with a small window (a few percent) around that value, based on our prior beliefs and the Diamond Princess data. It appears that our posterior beliefs overweight our prior beliefs, which said 30% was most likely, relative to normative Bayesian inference.

Comparison of the posterior beliefs predicted by a Bayesian model of updating for the COVID-19 example (top left) and a user’s hypothetical posterior beliefs (bottom left), which are more uncertain and shifted toward their prior beliefs. The right side represents the same two distributions as density plots.

From the difference, we learn something about how a person updates. We can break the deviation down in several ways: for example, by considering how well they update the location of their beliefs (how the mean of their posterior compares to that of the normative posterior; in this case, shifted toward their prior beliefs) versus the variance (how much more or less certain they are than the normative posterior; in this case, more uncertain than a Bayesian would be, suggesting they undervalued the informativeness of the sample). A minimal sketch of this decomposition appears after the list below. With this information, we can do a number of things, which we are currently exploring in our research:

  • Evaluate visualizations to see which brings updating closer to Bayesian. Even if people don’t appear Bayesian at an individual level, we can rely on the fact that people’s belief updates often tend to be Bayesian in aggregate to identify which of several visualizations is best for belief updating.
  • Use the user’s prior to personalize how we show them data. Their prior distribution describes how certain the user is about some quantity. We can use this subjective uncertainty to give them context on how much information new data contains (e.g., this data is twice as informative as your prior beliefs) or derive other analogies to guide their belief update.
  • Detect (and mitigate) cognitive bias as someone does visual analysis. If we can observe a few belief updates someone has already done, we can diagnose whether they tend to over- or under-update, and how. We can predict how they’ll respond given a new dataset, and adjust the visualization or other aspects of the interaction for detected biases.
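Here is the sketch referenced above: one simple way to decompose a user’s deviation into location and variance components, assuming both the elicited and normative posteriors are summarized as Beta distributions. The elicited Beta(79, 238) is hypothetical (most likely value near 25%, noticeably more uncertain than the normative posterior).

```python
# Decomposing deviation from the normative posterior into location and variance,
# assuming Beta summaries of both distributions. Beta(79, 238) is a hypothetical
# elicited posterior (most likely value near 25%).
from scipy.stats import beta

norm_alpha, norm_beta = 716, 3007     # normative posterior from the update above
user_alpha, user_beta = 79, 238       # hypothetical elicited posterior

location_shift = beta.mean(user_alpha, user_beta) - beta.mean(norm_alpha, norm_beta)
variance_ratio = beta.var(user_alpha, user_beta) / beta.var(norm_alpha, norm_beta)

print(f"location shift: {location_shift:+.3f}")   # positive: shifted toward the 30% prior
print(f"variance ratio: {variance_ratio:.1f}")    # > 1: more uncertain than a Bayesian
```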

But first, let’s unpack what might cause deviation from a Bayesian prediction.

What might cause deviation from Bayesian updating?

A few reasons seem especially worth considering.

Biases in Using Sample Size Correctly

One possibility is what is often called cognitive bias: a consistent tendency to do something different than a perfect statistical processor would. Some well-known biases describe the difficulties people face in relating sample size to uncertainty. People often overestimate the informativeness of a small sample (commonly called Belief in the Law of Small Numbers), and they may also discount the informativeness of a large sample (recently dubbed Non-belief in the Law of Large Numbers).

In a recent experiment, we had roughly 5,000 people on Mechanical Turk go through a procedure similar to the example above: giving us their prior beliefs, viewing visualized data, and then giving us their posterior beliefs. We showed them one of four datasets, which varied in sample size (a small-ish sample of 158 versus a very large sample of 750k) and topic (dementia rates among assisted living center residents in the U.S. versus the proportion of surveyed female U.S. tech company employees who reported that mental health affects their work often).

Icon arrays depicting a small and large sample version of a dataset estimating the proportion of women in tech who feel mental health affects their work often. The huge sample size of the dataset on the right means that each icon represents multiple women.

Topic didn’t have much of an effect on how people updated their beliefs, but sample size did. First, when we looked at the average individual-level deviation from Bayesian updating (i.e., how far a person’s posterior beliefs were from the normative posterior beliefs given their prior), it was much higher for the very large sample. Second, when we looked at the average aggregate deviation from Bayesian inference (the deviation between the average of all people’s posterior distributions and the normative posterior distribution you get by updating the average of all people’s prior distributions with the visualized sample), it was also much higher. These results align with recent findings from a behavioral economics study that uses more abstract “balls and bins” scenarios to show that people increasingly discount the value of information as sample size grows. Ironically, this suggests that we should not show people big data all at once.
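To illustrate the two measures (not the exact metric from our paper), here is a sketch that computes an average individual-level deviation and an aggregate deviation, assuming each participant’s prior and elicited posterior are Beta distributions and using the absolute difference in posterior means as a simple stand-in for a distance between distributions. The participant parameters and the 30-out-of-158 sample are hypothetical.

```python
# Individual vs. aggregate deviation from Bayesian updating, sketched on a grid.
# Participant Beta parameters and the observed sample are hypothetical, and the
# absolute difference in means stands in for a fuller distance between distributions.
import numpy as np
from scipy.stats import beta, binom

p = np.linspace(0.001, 0.999, 999)
likelihood = binom.pmf(30, 158, p)        # hypothetical small visualized sample

# (prior_alpha, prior_beta, elicited_posterior_alpha, elicited_posterior_beta)
participants = [(4, 8, 20, 70), (2, 6, 25, 110), (6, 10, 35, 140)]  # hypothetical

def grid_mean(density):
    density = density / np.trapz(density, p)
    return np.trapz(p * density, p)

# Individual-level: each person's posterior vs. their own normative posterior.
indiv = [abs(grid_mean(beta.pdf(p, qa, qb)) - grid_mean(beta.pdf(p, pa, pb) * likelihood))
         for pa, pb, qa, qb in participants]

# Aggregate-level: the average posterior vs. the update of the average prior.
avg_prior = np.mean([beta.pdf(p, pa, pb) for pa, pb, _, _ in participants], axis=0)
avg_post = np.mean([beta.pdf(p, qa, qb) for _, _, qa, qb in participants], axis=0)
aggregate_dev = abs(grid_mean(avg_post) - grid_mean(avg_prior * likelihood))

print(f"avg individual deviation: {np.mean(indiv):.3f}, aggregate deviation: {aggregate_dev:.3f}")
```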

How do people mentally represent uncertainty? Our results also suggested that people are much more Bayesian in aggregate than individually. An intriguing hypothesis put forth for this “noisy Bayesian” effect is that people’s priors may take the form of samples, rather than full distributions. To use an example tested by mathematical psychologists, if you asked me how long a cake had to remain in the oven before it was done, given that it has already been in for 10 minutes, I might come up with an answer by imagining a few specific cakes I know of, say, one which takes 25 minutes to bake, and one which takes 50 minutes. We explored the idea that people may find it easier to think about samples than full distributions over proportion values by testing how robust our results were to different interfaces for eliciting people’s prior beliefs.

Interfaces for eliciting a person’s prior beliefs, which vary in the degree to which they encourage thinking in terms of discrete samples (left) versus full distributions over parameters (right).

Misperception of Uncertainty

Related to biases in using sample size, a person may misperceive the uncertainty in the estimate they are shown. The “machinery” of Bayesian inference allows us to ask some interesting counterfactual questions to investigate this possibility. Let’s assume that misperceiving the Diamond Princess sample size, or equivalently the uncertainty in the estimated proportion we got from it, is what caused our deviation. Since we know the user’s prior beliefs and their posterior beliefs, we can ask: what sample size would a rational Bayesian agent (meaning, one who updates according to our model) have needed to perceive to arrive at this user’s posterior beliefs, given their prior beliefs? Assuming a posterior with a most likely value of 25% and an interval from 20.4% to 30.1%, the answer is 305. So we learn that it’s possible that we misperceived the Diamond Princess sample as being about 1/10 of its actual size.
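In the conjugate Beta model sketched earlier, this counterfactual has a simple form: the perceived sample size is the difference in total “counts” between the elicited posterior and the prior. The Beta(79, 238) elicited posterior below is a hypothetical distribution whose most likely value and interval are close to the ones described above.

```python
# Perceived sample size under the conjugate Beta model: the extra "counts" a
# rational agent would have needed to add to the prior to land on the elicited
# posterior. Beta(4, 8) and Beta(79, 238) are hypothetical.
prior_alpha, prior_beta = 4, 8       # hypothetical prior (mode 0.30)
post_alpha, post_beta = 79, 238      # hypothetical elicited posterior (mode ~0.25)

perceived_n = (post_alpha + post_beta) - (prior_alpha + prior_beta)
print(perceived_n)   # 305, versus the actual 3,711 passengers
```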

How can we get people to be more sensitive to how sample size translates to uncertainty? In our Mechanical Turk experiment, we varied how uncertainty in the data was visualized: a more conventional static icon array with the sample size mentioned in text, versus an animated hypothetical outcome plot. Using our perceived sample size measure, we found that with the conventional approach, people treated a sample of 750k on average as though it were only 400! (Compare this to the sample size of 200 they acted as if they perceived when shown a much smaller sample of 158.) When we visualized uncertainty using animated hypothetical outcomes, the average perceived sample size jumped to 67k. Still a big discount relative to the true size of 750k, but much, much better.

Discounting Data Based on the Source

Another possible reason for deviation stems from the way our Bayesian model assumes that the user will take data at face value, judging its informativeness by sample size alone. If we believed that the Diamond Princess sample isn’t fully representative of the population we care about, perhaps because its passengers skew older and more affluent than the general population, then it would make sense that we might adjust the weight we place on it in arriving at our posterior beliefs.

In recent experiments we’re finding that how much someone says they trust the source of a visualized dataset can help us predict how much less Bayesian their belief update will look. So, if we can also infer a user’s trust in the source (e.g., from their political attitudes, browsing history, etc.) we can use that information to make the predictions of our Bayesian model more accurate.
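As one simple illustration (not necessarily the model in our experiments), trust in the source could be folded in by discounting the sample’s effective size before the conjugate update:

```python
# Trust-weighted conjugate update (an illustrative choice, not the model from
# our experiments): scale the data's effective sample size by a trust weight.
def trust_weighted_posterior(prior_alpha, prior_beta, successes, failures, trust):
    """Conjugate Beta update with the data down-weighted by trust in [0, 1]."""
    return (prior_alpha + trust * successes,
            prior_beta + trust * failures)

# Full trust recovers the normative posterior; trust = 0.1 treats the
# Diamond Princess sample as if it had roughly a tenth as many passengers.
print(trust_weighted_posterior(4, 8, 712, 3711 - 712, trust=1.0))   # (716, 3007)
print(trust_weighted_posterior(4, 8, 712, 3711 - 712, trust=0.1))   # (75.2, 307.9)
```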

Applying Bayesian Modeling to Visualization

To apply Bayesian modeling to visualization, there are a few things we need.

First, we need to decide which parameter(s) we care about. The parameters capture the information we think is most important to the visualization’s message. Choosing a parameter may be straightforward for some datasets (e.g., the percentage support for a candidate given a poll). However, other datasets might be visualized to convey more complex multivariate relationships, requiring an author to choose what information seems most critical.

“Where Boys Outperform Girls in Math: Rich, White and Suburban Districts” by the New York Times.

Consider, for example, a New York Times visualization that depicts Math (blue) and English (orange) performance for U.S. third through eighth graders on standardized tests. Each circle represents a district. Family income is plotted along the x-axis, while the y-axis encodes how much better girls perform than boys. This visualization supports a number of parameters which entail different Bayesian models and elicitation interfaces, shown in the table below.

Possible parameters and elicitation approaches for the New York Times “Where Boys Outperform Girls in Math: Rich, White and Suburban Districts”

Next, we need a way to elicit or infer a user’s prior beliefs. The goal of elicitation is to capture the user’s sense of how plausible different values of the parameter are. The most direct way is to ask the user to say, in absolute or relative terms, how likely different values of the parameter seem. The example above demonstrates eliciting a prior in parameter space by asking users to think about population proportions directly. However, the math behind probability distributions makes it possible to derive a prior distribution (in parameter space) from a user’s estimates in data space. Consider the graphical sample interface above, which showed people grids of 100 dots. Imagine instead that those dots were people icons, and the number of people matched the sample size of the data. This approach may be more effective when parameters are hard for people to reason about outside of examples.
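As a minimal sketch of the data-space route (the elicited counts and the method-of-moments fit are illustrative, not our exact elicitation pipeline), imagine the user fills in several plausible samples of 100 icons and we fit a Beta distribution to the implied proportions:

```python
# Deriving a parameter-space prior from data-space responses: fit a Beta
# distribution to several hypothetical samples a user says seem plausible.
import numpy as np

elicited_counts = np.array([25, 30, 35, 40, 20])   # hypothetical "out of 100" answers
proportions = elicited_counts / 100

m, v = proportions.mean(), proportions.var(ddof=1)
common = m * (1 - m) / v - 1                       # method-of-moments fit
alpha, beta_param = m * common, (1 - m) * common

print(f"implied prior: Beta({alpha:.1f}, {beta_param:.1f})")
```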

The nice thing about eliciting graphical samples is that we no longer have to ask the user to think in terms of abstract representations that differ from the visualization they are looking at. In the graphical sample interface below, the user drags and positions point clouds for each subject from a left panel, providing their prior over the average score difference by subject. For each positioned point cloud, they then adjust the slant (slope) by right-clicking and sliding a slider. This represents their prior on the strength of the relationship between income and score difference.

Eliciting a user’s beliefs about the relationship between income, Math and English grades, and (binary) gender.

Third, we need some data to show them, and a Bayesian model in which it is represented as a likelihood function. Simple Bayesian models often have just a single level of structure where a data-generating process is defined for the parameter in question and priors are specified only for parameters of that process (e.g., priors representing distributions over parameters in a model like our proportion examples above, or the intercept α or slope β in the linear model y=α+β∗x). More sophisticated hierarchical models specify hyperpriors (distributions over parameters of the priors).
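To make the single-level case concrete, here is a sketch of how an author might specify such a model in PyMC (one possible tool, not a requirement); the income and score_gap arrays are simulated stand-ins for district data, not the NYT dataset:

```python
# A single-level Bayesian linear model, y = alpha + beta * x, with priors on the
# intercept, slope, and residual spread. Written with PyMC as one possible tool;
# the data below are simulated stand-ins, not the actual district data.
import numpy as np
import pymc as pm

income = np.random.normal(0, 1, size=200)                            # hypothetical, standardized
score_gap = 0.1 - 0.3 * income + np.random.normal(0, 0.5, size=200)  # hypothetical

with pm.Model():
    alpha = pm.Normal("alpha", mu=0, sigma=1)     # prior on the intercept
    beta = pm.Normal("beta", mu=0, sigma=1)       # prior on the slope
    sigma = pm.HalfNormal("sigma", sigma=1)       # prior on residual spread
    mu = alpha + beta * income                    # the assumed data-generating process
    pm.Normal("y", mu=mu, sigma=sigma, observed=score_gap)   # likelihood
    idata = pm.sample()                           # normative posterior samples
    # A hierarchical version would add hyperpriors, e.g., letting each state's
    # slope be drawn from a shared distribution whose parameters also get priors.
```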

Finally, for applications like visualization evaluation, bias detection, or personalizing based on update “type”, we need to elicit or infer people’s posterior beliefs. In most cases, we can do this the same way we elicited the prior, in parameter space or data space. Imagine a visual analysis system like Tableau Software periodically querying your changing beliefs as you visually analyze data, and responding by showing you data differently.

Want to read more? Check out our ACM CHI 2019 paper or EC Behavioral Economics Workshop paper on Bayesian cognition for visualization. Stay tuned for more research on the way, exploring Bayesian personalization and bias detection for visualization interaction.
