Hypothetical Outcome Plots (HOPs) Help Users Separate Signal from Noise

7 min readOct 17, 2018

In daily life, we often find ourselves trying to separate signal from noise. For example, does the monthly jobs report suggest a growth trend, or that the jobs rate is steady? In a pair of experiments, we found that hypothetical outcome plots (HOPs) — animated samples of possible outcomes — can help people to make this judgment with greater accuracy.

How do people use HOPs?

HOPs enable viewers to experience variation in outcomes over time, similar to the way we experience uncertain events in our daily lives. Research finds that “a 3 out of 5 chance of rain” is easier to interpret than “a 60% chance of rain”, suggesting that probabilities are easiest to understand when framed as frequencies of events. HOPs make this frequency framing more visceral by using animation to show frequencies over time. In recent years, researchers and data-journalists have used HOPs and other data visualizations with frequency framing to communicate uncertainty in bus arrival times, findings of scientific studies, hurricane locations, election models, and the jobs report.

One example of HOPs in the New York Times showed how sampling error can lead to confusion about economic growth. The article focused on monthly reports of jobs added to the economy. They use HOPs to show how sometimes a growing economy will produce jobs numbers that look flat, and other times a stagnant economy will produce promising jobs numbers.

From How Not to Be Misled by the Jobs Report by Irwin and Quealy at the New York Times.

What do we know about HOPs?

In 2015, Hullman and colleagues ran a set of experiments showing that users of HOPs made comparable or better judgments of univariate probabilities than users of error bars and violin plots. For probability judgments about multiple variables (i.e., common language effect size), users of HOPs were an estimated 35 to 41 percentage points more accurate than users of error bars and violin plots.

Uncertainty visualizations compared by Hullman and colleagues: error bars, violin plots, and hypothetical outcome plots.

Image from an article by Haberman and Whitney.

Research on the psychology of visual perception (a.k.a., vision science) suggests that people are able to quickly, accurately, and automatically perceive sets of visual objects (i.e., ensembles). In experiments, people were able to judge the average size, location, lifelikeness, or facial expression of a set of visual objects.

This automatic perceptual averaging occurs whether people see objects in a static view or in an animated sequence. Interestingly, people are able to accurately report the statistical properties of a set of objects without being able to remember the characteristics of individual objects. This leads us to believe that HOPs and other ensemble visualizations are processed automatically (without trying) and subconsciously (without being aware) by the visual system.

Why did we run a new study?

We wanted to know whether HOPs improve users’ ability to make everyday judgments about uncertainty. Specifically, we were interested in users’ ability to infer a trend from samples of noisy data, an important judgment when interpreting common applications of statistics. To test this, we created two experiments inspired by a NYT article about how to interpret the jobs report.

Evaluating the impact of HOPs on trend perception

Judging trends in ambiguous data

Do HOPs enable viewers to identify the trend in noisy data better than they could using other uncertainty visualizations? We showed Amazon Mechanical Turk workers a chart of the number of jobs added to the economy each month of a hypothetical year and asked them whether the jobs report shows a trend of no growth or growth. In order to get a sense of the task, take a look at the figure below.

An image of the task that participants completed in our experiments. Participants judge examples of jobs numbers in the chart on the left while using the uncertainty visualizations on the right as a reference. We varied both the example charts and whether static uncertainty visualizations (e.g., error bars or static ensembles) or HOPs were used to display the two trends.

We conducted two controlled experiments in which participants had to make this judgment repeatedly for many examples of jobs numbers. To ensure that participants were tested across a range of task difficulty including plenty of judgments at the boundaries of their ability, we used an algorithm called a staircase that presented a more difficult example after every third correct response and an easier example after every incorrect response.

Examples of charts judged by participants, grouped by the correct trend and difficulty of classifying the trend.

Uncertainty visualizations tested

Throughout the experiments, participants referenced a pair of uncertainty visualizations (e.g., the right side of the figure above) showing trends of no growth vs growth with uncertainty due to sampling error.

In our first experiment, we compared HOPs to error bars, a very common uncertainty visualization. We compared participants’ performance when they used the two uncertainty visualizations below as a reference for the task. These visualizations mirrored the visualizations from the NYT article.

Uncertainty visualizations compared in our first experiment.

In a second experiment, we set out to test whether or not there is something helpful about animating possible outcomes across frames rather than aggregating them into a static display. We asked participants to make the same judgment as before, but this time the examples of jobs report numbers were shown in line charts. We compared participants’ performance when using the three uncertainty visualizations below. These uncertainty visualizations show the same lines with in one static view without animation and with animation at different frame rates (400 ms per frame and 100 ms per frame, respectively).

Uncertainty visualizations compared in our second experiment.

Measuring user sensitivity to trends

Using the ground truth about the trends shown in the examples that participants saw, we labeled each judgment as correct or incorrect. What is the best way to analyze this kind of data?

A simple method is to compute accuracy across all the judgments a participant made when using a specific uncertainty visualization, and then compare the average accuracy people had for each type of uncertainty visualization. However, this approach does not account for the fact that the perceptual judgments people made varied in difficulty (i.e., some hypothetical jobs numbers were more ambiguous). Rather than their overall accuracy, we want to know how much evidence an observer needs before they can accurately recognize a trend.

We borrowed an approach from psychophysics (a methodology used by psychologists to study the relation between stimuli and sensations) and estimated just-noticeable-differences (JNDs) for each participant and uncertainty visualization type. JNDs measure the amount of physical evidence a person needs in order to detect a sensory signal. In our task, JNDs provided a measure of how pronounced the trend in ambiguous data needs to be for a participant to classify it correctly about 75% of the time.

Difference between the two squares is the average person’s JND for brightness.

To get a sense of what a JND looks like, try to tell which of the two squares to the left is lighter. If it feels difficult, that’s because the difference is close to the average JND for lightness. When a person’s JND is higher, they are less sensitive to the signal in a stimulus and require more evidence to make a correct interpretation. When a person’s JND is lower, they are more sensitive to the signal in a stimulus and require less evidence to make a correct interpretation.

Vision scientists commonly measure and compare JNDs under different display conditions in order to test the impact of specific conditions on perceptual sensitivity. In our experiments, we used JNDs to compare users’ sensitivity when using different uncertainty visualizations as a reference for the two possible underlying trends in the jobs report.

HOPs promote accurate perceptions of uncertainty

Examples showing the same amount of evidence in favor of growth and no growth, respectively. When using error bars as a reference, the average user could seldom detect the difference in trends. With HOPs as a reference, the difference is detectable.

In our first experiment, participants were able to correctly interpret the underlying trend for more ambiguous examples of jobs numbers when using HOPs than when using error bars (see ambiguous examples to left). Perhaps this is because error bars rely on summary statistics like standard error to represent uncertainty. These statistics are not the most readily interpretable representation for uncertainty, particularly for audiences without statistical training. This interpretation of our findings is consistent with the prior work in data visualization and judgement and decision-making which suggests that people more easily understand frequency framing of outcomes.

In our second experiment, participants were more consistently correct in their interpretation of ambiguous samples with HOPs at 400 ms per frame than with line ensembles. This suggests that animating outcomes improves user sensitivity to the underlying trend in ambiguous data beyond the impact of frequency framing alone. Perhaps this is because animation eliminates the visual clutter in analogous static ensembles. This interpretation is consistent with prior work on crowding, perceptual limitations on the ability to accurately read dense displays. However, HOPs are also subject to perceptual limitations. Compared to 400 ms HOPs and line ensembles, 100 ms HOPs showed an intermediate effect on participants’ sensitivity to the underlying trend. This suggests that the effectiveness of HOPs is diminished for frame rates faster than the blink of an eye.

When communicating uncertainty is a priority and animation is possible, visualization designers should consider using HOPs.

Authors: Alex Kale & Jessica Hullman

Paper: http://idl.cs.washington.edu/papers/hops-trends