Explaining the Gap: Visualizing One’s Predictions Improves Recall and Comprehension of Data

What if Visualizations Asked Users to Predict the Data First?

Imagine a data journalist writing a story about home prices in Denver. The journalist plans to add a visualization to help readers understand the trend in home prices in 2014, 2015 and 2016.

The visualization could just show you the data, as most visualizations do. But what if the interface instead prompted you to first draw what you think the trend in Denver’s median home price looks like?

Even if you don’t have specific knowledge about median home prices in Denver in 2014, 2015 and 2016, you might have some general prior knowledge that you could use to predict the data. For instance, in any healthy market, we might expect home prices to go up. Maybe you also suspect that Denver has many quality jobs.

Let’s assume that you drew the trend below after considering a few factors like these.

The interface now shows you the actual data. What do you think about when you see the gap between the actual data and your prediction? Is your prediction close to the actual trend? If not, how far off was it, and what might explain the error?

Why Prior Knowledge Matters in Visualization Interaction

As you interacted with the visualization above, you had to think about what you know about the data domain, and how to externalize that knowledge. We wondered:

“Are there benefits to reflecting on one’s expectations while using a data visualization?”

A few studies in cognitive psychology have shown that making predictions about something as you learn can help you better understand the information. A well-known technique in education called self-explanation might explain the process that you went through as you thought about the gap between your prediction and the data. This technique was originally used by prompting students to explain to themselves a phenomenon they were learning about, such as one described in a textbook. Multiple studies showed that students who generated a larger number of thoughtful self-explanations tended to learn the material more accurately, as measured by tests they took later. We guessed that predicting the data in a visualization, and examining the gap between the prediction and the actual data, would encourage a visualization user to notice and self-explain why they were off. We also guessed that this process would, like self-explanation from a text, lead to better understanding of the information.

We designed a controlled online experiment to test how predicting data and other active reflection techniques like self-explanation affect a user’s ability to recall the data in a visualization. Being able to remember what you saw is one proxy for understanding, which is often used in cognitive psychology.

Three Ways to Interact with Prior Knowledge

So, how might a visualization elicit users’ prior knowledge? We designed three interactive techniques we expected might help users reflect on their prior knowledge, as they interacted with a data visualization (Fig. 1). We tested how these interactive techniques affect a user’s ability to recall the data by running a study on Amazon Mechanical Turk.

Fig 1. Three interactive techniques to prompt reflection on prior knowledge

Prompting users to explain the data to themselves can be a good way to make them reflect on their prior knowledge. For instance, a user might think “Prices might have increased less between 2015 and 2016 because new jobs were down”. In our study, we gave participants in the explanation conditions a text box along with the visualized data, and asked them to write their explanation of the data.

Next, asking users to predict the data might prompt them to think about what they already know. If the visualization makes the gap between their prediction and the data clear, the user might then think about the gap (e.g., “I was mostly right about the trend, but I was off by 100K in 2016”). In our study, we prompted participants to draw their prediction directly on the visualization before seeing the data.

Lastly, providing feedback by annotating the gap between the user’s prior knowledge and the data can help users focus on adjusting their prior knowledge. Emphasizing the gap may make users more likely to perceive how much they should update their knowledge to match reality. In our study, we gave participants in the feedback conditions personalized feedback that described the accuracy of the overall trend (e.g., “you underestimated the median home price of Denver in 2014”) and the accuracy of their predictions for individual data points (e.g., “you are 30K off”).
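The feedback described above can be sketched in a few lines of Python. This is only an illustration, not the study’s implementation: the function name `gap_feedback`, the message wording, the use of the mean signed gap to decide between over- and underestimation, and the sample numbers are all assumptions made for the example.

```python
from typing import Dict, List


def gap_feedback(predicted: Dict[int, float],
                 actual: Dict[int, float],
                 unit: str = "K") -> List[str]:
    """Return one overall message followed by one message per data point.

    Hypothetical helper: the overall direction is decided by the mean
    signed gap (prediction minus actual), which is an assumption.
    """
    years = sorted(actual)
    # Signed gap per year: positive means the user guessed too high.
    gaps = {year: predicted[year] - actual[year] for year in years}
    mean_gap = sum(gaps.values()) / len(years)

    if mean_gap > 0:
        overall = "Overall, you overestimated the values."
    elif mean_gap < 0:
        overall = "Overall, you underestimated the values."
    else:
        overall = "Overall, your prediction was centered on the data."

    messages = [overall]
    for year in years:
        messages.append(f"{year}: you are {abs(gaps[year]):.0f}{unit} off")
    return messages


# Invented numbers, in thousands of dollars, purely for illustration.
predicted = {2014: 250, 2015: 270, 2016: 300}
actual = {2014: 280, 2015: 310, 2016: 340}
for line in gap_feedback(predicted, actual):
    print(line)
```

The same messages could be rendered as text next to the chart or as annotations drawn on the gap itself.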

Fig 2. Experimental conditions

We combined the three techniques to create four experimental conditions and one baseline condition (Fig. 2).

We expected that all three interactive techniques would improve the user’s ability to recall the data later. Generating self-explanations should help a user recall the data more accurately, because of the proven efficacy of self-explanation in learning environments. But no one had ever tested self-explanation in a visualization setting.

We also expected participants who were asked to predict the data would recall the data more accurately compared to participants who had not been prompted to do so: prediction prompts a user to construct and externalize the prior knowledge, and viewing the gap provides visual feedback on how much to update one’s knowledge.

We also wondered what happens when the data is presented as text. With text, the gap between the participant’s prediction and the actual data would need to be described in words. To compare the effect of our techniques with visualization and text, we added text conditions where participants saw the data summarized in a paragraph, though the information was otherwise identical to the visualization.

How to Measure Data Memorability?

Fig 3. Testing dataset

To evaluate our expectations, we needed a dataset to test on. Line graphs showing several data values for a few categorical variables (Fig. 3) are widely used in the social and physical sciences. As you’ll notice, this data format allows us to observe how well participants can recall high-level patterns and also individual data points. As a specific data set with this format, we chose data on the percentage of different ethnicities that voted Republican in the 2008 election in various. Crowdworkers rated this data as moderately familiar in an initial study we did to identify how familiar various data sets were. ‘Moderately familiar’ was our goal because we thought that, for our techniques to help, people would need some prior knowledge (enough to make a prediction) but not so much that they were experts. Our preliminary study on data set familiarity found that, of the data sets we showed them, participants were most familiar with the calories in various fast food items and least familiar with the results of a scientific study on rat activity levels.

Fig 4. Two measures of recall errors

To compare data memorability across the conditions, we devised two measures that capture participants’ ability to recall high-level patterns and individual data points (Fig. 4). In our study, participants were asked to recall the data after spending three minutes on a distractor task.
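To make the distinction between the two kinds of recall error concrete, here is a rough Python sketch. The post does not spell out the exact formulas, so both definitions below are assumptions: `point_error` is the mean absolute difference between recalled and actual values, and `trend_error` is the mean absolute difference between the changes from one point to the next. The function names and sample values are hypothetical.

```python
from typing import List, Sequence


def point_error(recalled: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error over individual data points (assumed definition)."""
    if len(recalled) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(r - a) for r, a in zip(recalled, actual)) / len(actual)


def _steps(values: Sequence[float]) -> List[float]:
    """Change between each pair of consecutive points."""
    return [b - a for a, b in zip(values, values[1:])]


def trend_error(recalled: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute difference in point-to-point change (assumed definition)."""
    if len(recalled) != len(actual) or len(actual) < 2:
        raise ValueError("series must have equal length of at least 2")
    diffs = [abs(r - a) for r, a in zip(_steps(recalled), _steps(actual))]
    return sum(diffs) / len(diffs)


# Invented values: the recalled line is shifted up but has the right shape.
actual = [30, 40, 50]
shifted = [40, 50, 60]
flat = [30, 30, 30]
print(point_error(shifted, actual), trend_error(shifted, actual))
print(point_error(flat, actual), trend_error(flat, actual))
```

Under these assumed definitions, a recalled line that is shifted but has the right shape scores poorly on individual points and well on trend, which is why it helps to track the two separately.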

Prediction and Explanation Improve Data Recall

We expected that participants who used our three interactive techniques while interacting with the visualization would remember the data better than those who just examined the visualization, as we usually do. It turned out that these techniques really helped! Participants who were prompted to predict and/or explain the data recalled the individual data points 24% more accurately than participants who just examined the data without explicitly predicting or explaining. More interestingly, when the techniques were prompted sequentially (e.g., predicting and then explaining, or predicting and then examining feedback), participants recalled the trend of the data 21% more accurately than participants who weren’t prompted. These effects were reliable as determined by statistical tests.

“As we expected, prediction and explanation helped participants to remember the data accurately.”

Then what about when the data was presented as text? We found that only the explanation technique improved recall. This result implies that self-explanation, which is usually applied in learning environments, also works in a data presentation setting. We also found that prompting users to predict the data didn’t necessarily improve recall when participants interacted with the data as text. This is in contrast to the result from the visual condition, where prediction helped people recall the data accurately. Imagine seeing your prediction as a line alongside the data in a visualization like the Denver example: it makes sense that the visual feedback would make it easier to focus on the difference. With text, a user has to spend greater effort to “see” the gap, and many users may not be motivated to do this.

Deeper Interaction Between a User and a Visualization

Our finding that predicting data helps users remember the data better paves the way for new, prediction-oriented visualization designs. How might users of visualizations beyond line charts make predictions? We have begun to identify important design choices and types of differences — in other words, to describe the design space — for prediction-oriented visualizations.

Fig 5. Possible variable types for prediction task

Designers can ask users to predict quantitative variables in different chart types, such as a bar chart (Fig. 5). As the choropleth above shows, prediction for categorical variables can be implemented as well, for instance by clicking to choose a category and then brushing over the region to indicate the prediction. As in the dendrogram example (far right), the designer can also prompt users to predict the data structure.

Fig 6. Contextualization cues

Designers can also implement contextualization cues to guide users as they form a guess. Based on how familiar they expect their audience to be with the data, designers can adjust how much data to reveal (Fig. 5, Partial Prediction). Designers can also bound the range of prediction to help users make reasonable predictions (Fig. 6, Bounded prediction). Designers should consider the scale range of quantitative x- and y-axes, since these ranges can have a big impact on users’ predictions. Providing a few other data points as reference can also guide users’ predictions.
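As a rough illustration of these two cues, the Python sketch below clamps a drawn value to a designer-chosen range (bounded prediction) and splits a series into a revealed part and a part left for the user to predict (partial prediction). The function names, the choice to reveal the earliest points, and the sample values are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def bound_prediction(value: float, lower: float, upper: float) -> float:
    """Clamp a user's drawn value to the range the designer allows."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return max(lower, min(upper, value))


def split_for_partial_prediction(
    series: Sequence[float], n_revealed: int
) -> Tuple[List[float], List[float]]:
    """Return (revealed, hidden): the first n_revealed points are shown,
    and the user is asked to predict the rest."""
    if not 0 <= n_revealed <= len(series):
        raise ValueError("n_revealed must be between 0 and len(series)")
    return list(series[:n_revealed]), list(series[n_revealed:])


# Invented values for illustration only.
print(bound_prediction(120, lower=0, upper=100))
revealed, hidden = split_for_partial_prediction([280, 310, 340], n_revealed=1)
print(revealed, hidden)
```

Revealing more points suits audiences the designer expects to be less familiar with the data, while a tighter bound keeps guesses on a sensible scale.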

Fig 7. Types of feedback

In our study, we found that feedback can further enhance participants’ ability to recall the data when added to the prediction technique. As in our study, feedback on how the user’s prediction compares to the data can prompt deeper reflection on prior knowledge. Personalized feedback can be based on the accuracy of a prediction, like “Overall, you were 80% right in guessing the amount of CO2 emission”, or can include a social comparison: “your prediction is more accurate than 50% of the people who predicted”. In our study, when feedback was provided after a prediction, participants tended to recall not only the individual data points but also the overall trend. For a similar effect, a designer could provide feedback based on the main trend, like “you overestimated the slope of the overall trend”, or based on individual values, like “you are wrong by 3 percent”. This feedback can be delivered as text or as a visual annotation. The act of predicting and receiving feedback may also help users who are not very familiar with visualizations build their graphical literacy, or basic understanding of how to read and use graphs.
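The social comparison message could be computed along the following lines. This is a sketch under stated assumptions, not a description of any deployed system: it supposes that each earlier user’s prediction error is already stored as a single number, and the function name `share_outperformed` and the sample errors are hypothetical.

```python
from typing import Sequence


def share_outperformed(user_error: float, other_errors: Sequence[float]) -> float:
    """Percentage of other users whose error is strictly larger than this user's."""
    if not other_errors:
        raise ValueError("need at least one other prediction to compare against")
    worse = sum(1 for error in other_errors if error > user_error)
    return 100.0 * worse / len(other_errors)


# Invented error values for illustration only.
share = share_outperformed(5, [2, 4, 8, 10])
print(f"Your prediction is more accurate than {share:.0f}% of the people who predicted")
```

Ties are counted as not outperformed here; a designer could treat them differently.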

Current visualization research and practice lack techniques for explicitly incorporating a user’s prior knowledge. We think there is a lot of work to do in envisioning possible interaction techniques, as well as in studying their effects. For example, can thinking about others’ visualized predictions have similar effects to predicting yourself, or influence your beliefs about the data? Can prediction and reflection on the gap help people understand difficult aspects of data, like uncertainty? Or help people understand Bayesian analysis, in which expectations play an important role? Stay tuned for future posts about our upcoming InfoVis 2017 papers on several of these topics!

Want to learn more about our work? Check out our paper.

This post was written by Yea-Seul Kim and Jessica Hullman. This work is in collaboration with Katharina Reinecke.