Drinking from the Hose

The pitfalls and possibilities of progressive visualization.

Kurt Dahl
VisUMD
4 min read · Oct 24, 2022


Image by shutterbean from Pixabay.

Today’s world generates data faster than traditional analysis can keep up with. In response, data scientists have developed progressive visualization, which enables analysis of partial, incomplete data while computation is still underway. But as with any new tool, practitioners need to know whether it is effective, and here effectiveness means three things: are the conclusions drawn from progressive visualization (1) fast, (2) accurate, and (3) trustworthy?
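To make the idea concrete, here is a minimal sketch (not from the study; the function name and parameters are illustrative) of the core mechanic behind progressive visualization: an estimate is recomputed as data streams in by chunks, along with an uncertainty margin that shrinks as more data arrives.

```python
import math
import random

def progressive_mean(stream, chunk_size=100):
    """Yield (n, estimate, margin) snapshots as more of the stream is seen.

    After each chunk, the running mean is reported with a ~95% confidence
    margin (1.96 * standard error), which narrows as data accumulates.
    """
    n, total, total_sq = 0, 0.0, 0.0
    chunk = []
    for x in stream:
        chunk.append(x)
        if len(chunk) == chunk_size:
            for v in chunk:
                n += 1
                total += v
                total_sq += v * v
            mean = total / n
            var = max(total_sq / n - mean * mean, 0.0)
            margin = 1.96 * math.sqrt(var / n)
            yield n, mean, margin
            chunk = []

# Simulated data stream: 5,000 values centered near 50.
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(5000)]

# Each snapshot is a partial result a user could already act on.
snapshots = list(progressive_mean(data, chunk_size=1000))
```

A progressive chart would redraw after every snapshot; the analyst sees an answer within the first chunk instead of waiting for the full pass, at the cost of a wider (but explicitly displayed) uncertainty margin early on.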

This is exactly what a group of researchers from Tufts University, the University of Arizona, and Columbia University set out to discover by testing four cognitive biases that this new approach may invite: uncertainty bias (misjudging the uncertainty of partial results), illusion bias (perceiving patterns in incomplete data), control bias (giving the user control before the data is complete), and anchoring bias (over-relying on prior beliefs).

The researchers ran five studies: an initial study to choose which type of visualization to use, followed by four studies covering each of the four cognitive biases. The initial study asked participants to complete six simple analysis tasks using either error bars or gradient plots, and to rate them on satisfaction, ease of use, productivity, and frustration.

The initial study revealed no difference in performance between the two, so the researchers used gradient plots, an industry standard, for the four bias studies.
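For readers unfamiliar with the distinction: an error bar marks a hard cutoff (e.g., plus or minus one standard error), while a gradient plot fades opacity continuously with distance from the estimate. A minimal sketch of that encoding, assuming a normal uncertainty model (the function name is illustrative, not from the study):

```python
import math

def gradient_alphas(estimate, stderr, ys):
    """Opacity at each y position: a normal density centered on the
    estimate, normalized so the estimate itself is fully opaque."""
    return [math.exp(-0.5 * ((y - estimate) / stderr) ** 2) for y in ys]

# Opacity fades smoothly with distance from the estimate, instead of
# cutting off sharply at +/- 1 stderr the way an error bar does.
alphas = gradient_alphas(estimate=50.0, stderr=5.0, ys=[50.0, 55.0, 60.0])
```

A plotting library would then draw a band whose alpha channel follows these values, so visual weight itself communicates how plausible each value is.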

In each of the four bias studies, participants completed three types of tasks. First, they read a single value from the visualization (e.g., “How many units were sold?”). Second, they derived a value by combining two individual values. Third, they found an extremum by comparing a minimum or maximum against the entire dataset to make a decision. Each bias was evaluated on three criteria: completion time, accuracy, and confidence.

  1. Uncertainty Bias. The researchers expected that judging the uncertainty of results under progressive visualization would suffer on all three fronts: lower accuracy, speed, and confidence. It turns out they were wrong. Uncertainty in the data affected neither the accuracy nor the confidence of participants’ answers, and only slowed them on the third and final task, finding an extremum.
  2. Illusion Bias. The researchers expected that introducing “false patterns” into the data at an intermediate stage would reduce participants’ accuracy without affecting their speed or confidence; in other words, it would create a false sense of security. Accuracy did drop, and speed was again only affected on the find-extremum task. Contrary to expectations, however, confidence dropped as well, especially when the false pattern was introduced earlier in the process.
  3. Control Bias. The researchers expected that a blind spot in the accuracy of the sampling process would lead participants to steer the visualization in an inaccurate direction, and that completion times would also be slower. Interestingly, these assumptions proved mostly false. Completion times did slow, but accuracy did not decrease, and confidence actually increased: participants took their time, and their accurate responses justified that confidence.
  4. Anchoring Bias. The researchers primed participants with a specific subset of the data to see whether they would over-rely on it. They expected this over-reliance to reduce accuracy, shorten completion times, and inflate confidence. In the end, primed participants were no different from unprimed participants in accuracy or completion time, and their confidence was unchanged rather than increased.

So, what does this all mean? The four bias studies revealed a few trends that the researchers believe offer a cautiously optimistic view of progressive visualization’s future in data analysis.

  1. It’s easy to use. Although 43% of participants considered themselves novices in the method, they still had low error rates.
  2. It saves time. Users completed tasks in about an eighth of the time it took to fully process the data.
  3. It’s not as inaccurate as one might expect. The accuracy sacrificed for faster processing is only 10 to 15% in most cases. The one caveat is that illusion bias caused by early emerging false patterns is a real concern that future use must address.
  4. It boosts user confidence. Under uncertainty or illusion bias, participants’ confidence correlated with the accuracy of their answers. Control and anchoring bias, however, raised real concerns about user confidence being misleading.

All in all, the results of this study tell us that progressive visualization is a worthwhile tool, as long as these few caveats are kept in mind.

References

  • Marianne Procopio, Ab Mosca, Carlos Scheidegger, Eugene Wu, and Remco Chang. Impact of Cognitive Biases on Progressive Visualization. IEEE Transactions on Visualization and Computer Graphics.
