Visualizing Trends on Mobile Phones: Animation or Small Multiples?

TL;DR. My colleagues and I conducted an experiment where we asked people to identify one to three target data points among a set of sixteen items in either an animated scatterplot or a non-animated small multiples scatterplot, and we asked them to do so on their mobile phones. What did we find? It appears that small multiples remain a viable design choice for small displays, though some trend trajectories may be better shown with an animated design. Try the experiment on your own phone at aka.ms/multiples, read the paper on arXiv, or watch the IEEE VIS talk.

The rise of the DataGIF.

A few years ago, I noticed that animated graphics were becoming more prevalent in online news and in social media. The ones that interested me the most were those that make use of position change over time, which you can see by browsing collections of “DataGIFs” curated by ProPublica’s Lena Groeger and Buzzfeed’s Jeremy Singer-Vine.

A Data GIF published by the Pew Research Center where the positions of data points change over time.

In 2015, the NPR News Graphics Team proposed a design strategy of displaying small multiples on desktop and looped animated GIFs on phones. My colleagues and I wanted to know if there was evidence behind this allocation of designs to different device profiles, so we decided to conduct an experiment. What we found seems to indicate that small multiples actually hold up pretty well, but the shape of the trend is also an important determining factor when choosing between animation or small multiples.

Data viz solutions: small multiples on desktop, GIFs on yer phone!
Brian Boyer (NPR News Graphics, 2015)

In Hans Rosling’s TED talks, he used animated scatterplots to tell a story about changes in global health and economic metrics.

But first, some history.

Over a decade ago, Microsoft Research’s George Robertson and his colleagues were also intrigued by animation, particularly after seeing the late public health expert Hans Rosling’s TED talks. Rosling would narrate a story about shifting global health and economic metrics set to animated scatterplots. So Robertson and his colleagues conducted an experiment comparing animated and static scatterplots. Their paper about the experiment was quite influential, earning a Test-of-Time award at the IEEE VIS conference in 2018.

Their experiment considered two settings, a data analysis setting and a presentation setting with a narrator, so as to emulate Rosling’s TED talks. They also had three design conditions (animation, a connected scatterplot of trajectory trails, and small multiples with individual trajectory trails), a large and a small dataset, and 24 trajectory comparison tasks.

The three design conditions (animation, a connected scatterplot of trajectory trails, and small multiples with individual trajectory trails) in Robertson and colleagues’ 2008 experiment. As in Rosling’s TED talks, the data consists of socioeconomic metrics for various countries, with points sized by population and colored by world region.

In their data analysis setting, the experiment’s participants had better overall performance with small multiples, though interestingly, they preferred animation, finding it to be fun and engaging. This was all of course done using a large display, and with experimenters observing the participants in a lab setting.

Our experiment.

An example animated scatterplot stimulus from our experiment.

We conducted a crowdsourced experiment that borrowed some experimental design elements from Robertson and colleagues’ 2008 laboratory experiment. However, our experiment differed in several ways. First, we did not consider a narrated presentation setting (though incorporating a narration audio track could be an interesting follow-up experiment!). Second, we only considered a small dataset of 16 items from the 2008 experiment; we attempted larger dataset sizes, but the interpretability of both animation and small multiples designs degrades even with a 5x5 grid of 25 items, especially on a smaller phone screen. We therefore adapted a subset of the 2008 experiment’s tasks, the ones that could be answered with a small dataset. Finally, we asked people to participate in our experiment using their mobile phone instead of a PC.

An example of the small multiples scatterplots from our experiment (with trajectories encoded as trails), showing the same data as the preceding animated scatterplot.

The trajectories of targets (red) and distractors (black) used as stimuli in our experiment. Note that these are simplified illustrations of the stimuli, not the stimuli themselves. Different participants in our experiment saw different shuffled orders of these tasks.

One of the things we had observed about the tasks used in the 2008 experiment was that they were quite different from one another. These tasks asked participants to compare and select one to three target data points exhibiting different trajectories, having markedly different starting and ending positions. In our new experiment, we wanted to examine the experimental tasks individually to see whether people could identify the different types of trajectories to a greater or lesser extent with either animation or small multiples.

The figure below shows what the stimuli actually looked like on a phone. Any individual participant saw either animation or small multiples. For each task, we first showed the instruction and the pair of axes. After that, the animation began looping in the animation condition. In both conditions, we asked participants to select responses from a multiple choice list corresponding to alphabetically labeled points in the plot.

What our experimental stimuli looked like on a phone, with the animation condition along the top row and the small multiples condition along the bottom row.

We recruited participants from the US-based population of Mechanical Turk crowd workers, and after excluding participants who responded incorrectly to an obvious quality control task, we had 45 participants in the small multiples condition and 51 in the animation condition, which gave us nearly 900 completed tasks to analyze.
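As a quick sanity check on the size of this dataset, the arithmetic can be sketched as follows. This assumes that every retained participant completed each of the nine tasks reported in the results; the variable names are illustrative only.

```python
# Participants retained after the quality control exclusion (from the text).
small_multiples_participants = 45
animation_participants = 51

# Assumption: every retained participant completed each of the 9 tasks.
tasks_per_participant = 9

total_participants = small_multiples_participants + animation_participants
completed_tasks = total_participants * tasks_per_participant

print(total_participants)  # 96
print(completed_tasks)     # 864, i.e. "nearly 900"
```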

What we found.

Speed and accuracy results for each task; the absence of an icon indicates no difference between the animation and small multiples conditions.

Despite the smaller screen size, those using the small multiples scatterplots were still faster in 7 of 9 tasks, with no difference in the remaining 2 tasks. With respect to accuracy, it was a draw: in 5 of 9 tasks, there was no evidence for a difference in accuracy. In the remaining 4 tasks, small multiples prevailed in 2 and animation had higher accuracy in the other 2. Interestingly, we noted that the animation group participants felt slightly more confident than their small multiples counterparts.

Beyond the overall results, the individual tasks are where things get interesting: in some, we found no difference in performance between the two groups, and in others, we found differences in accuracy that favored one design over the other.

We found no differences in participants’ speed or accuracy in the two tasks where the target data items (red) exhibit a trajectory reversal.

Trajectory reversals. In two tasks, we found no noticeable difference in completion time or accuracy between the two groups. In both of these tasks, the target data points initially move with the distractor points, but then they suddenly reverse course. These reversals may remain apparent using animation, while in a static small multiple design, the trajectory may be obscured since the trail might occlude itself. This potential obfuscation didn’t appear to impact those using small multiples to the extent that they performed any worse than those using animation, but the speed and accuracy advantage of small multiples that we saw in other tasks was absent in the case of these trajectory reversals.
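To make the notion of a trajectory reversal concrete, here is a minimal sketch of one way to flag a reversal in a sequence of positions. This is only an illustration and not how the experiment defined or selected its stimuli; the function name, the 90-degree criterion, and the data are all hypothetical.

```python
def has_reversal(trajectory):
    """Return True if any step points against the previous step.

    trajectory: list of (x, y) positions, one per time step.
    A negative dot product between consecutive displacement vectors
    means the point turned by more than 90 degrees.
    """
    steps = [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])
    ]
    return any(
        ax * bx + ay * by < 0
        for (ax, ay), (bx, by) in zip(steps, steps[1:])
    )


# Hypothetical data: one item keeps going, the other doubles back.
steady = [(0, 0), (1, 1), (2, 2), (3, 3)]
reversing = [(0, 0), (1, 1), (2, 2), (1, 1)]

print(has_reversal(steady))     # False
print(has_reversal(reversing))  # True
```

Note that the reversing item ends up retracing its own path, which is exactly the situation where a static trail can occlude itself.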

Those using small multiples were more accurate in tasks requiring an absolute comparison of ∆Y, such as the largest decrease along the Y axis.

Displacement along one axis. The tasks where small multiples exhibited higher accuracy involve absolute comparisons of ∆Y, such as identifying the largest decreases along the Y axis. This result is likely driven by the fact that in our small multiples design, you can make this visual comparison at a glance, while with animation, this comparison relies on memory.
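The comparison these tasks ask for can be written down in a few lines. The sketch below assumes that ∆Y means the net change in Y between an item's first and last positions; the function names and the data are made up for illustration and are not taken from the experiment.

```python
def delta_y(trajectory):
    """Net change in Y between the first and last (x, y) positions."""
    return trajectory[-1][1] - trajectory[0][1]


def largest_decreases(trajectories, count=1):
    """Return the labels of the `count` items with the most negative delta Y."""
    ranked = sorted(trajectories, key=lambda label: delta_y(trajectories[label]))
    return ranked[:count]


# Hypothetical trajectories keyed by the point labels shown in the plot.
trajectories = {
    "A": [(1, 9), (2, 7), (3, 2)],  # delta Y = -7
    "B": [(1, 5), (2, 6), (3, 4)],  # delta Y = -1
    "C": [(1, 8), (2, 5), (3, 3)],  # delta Y = -5
    "D": [(1, 2), (2, 4), (3, 6)],  # delta Y = +4
}

print(largest_decreases(trajectories, count=2))  # ['A', 'C']
```

A viewer of the small multiples design can read each trail's vertical extent directly, whereas a viewer of the animation has to hold each item's starting height in memory to perform the same subtraction.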

Those using animation were more accurate in tasks requiring them to identify direction-of-motion outliers.

Direction-of-motion outliers. Finally, there were two tasks where animation had higher accuracy and comparable completion times relative to small multiples. In these tasks, the targets appear to have trajectories that are outliers in terms of their direction of displacement. For example, in the adjacent figure, the targets move from left to right, while the distractors move from top to bottom.
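One way to think about a direction-of-motion outlier is as an item whose net displacement points away from the average direction of everything else. The following is a rough sketch under that assumption; the cosine threshold of 0.5, the function names, and the data are hypothetical and are not part of the experiment or its analysis.

```python
import math


def direction(trajectory):
    """Unit vector of the net displacement from first to last position."""
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        return (0.0, 0.0)
    return (dx / length, dy / length)


def direction_outliers(trajectories, max_cosine=0.5):
    """Labels whose direction disagrees with the mean direction of the others."""
    dirs = {label: direction(t) for label, t in trajectories.items()}
    outliers = []
    for label, (ux, uy) in dirs.items():
        others = [d for other, d in dirs.items() if other != label]
        if not others:
            continue
        mean_x = sum(d[0] for d in others) / len(others)
        mean_y = sum(d[1] for d in others) / len(others)
        norm = math.hypot(mean_x, mean_y)
        if norm == 0:
            continue  # the other items have no shared direction
        cosine = (ux * mean_x + uy * mean_y) / norm
        if cosine < max_cosine:
            outliers.append(label)
    return outliers


# Hypothetical data: A and B move left to right, the rest move top to bottom.
trajectories = {
    "A": [(0, 5), (2, 5), (4, 5)],
    "B": [(0, 3), (3, 3), (5, 3)],
    "C": [(1, 6), (1, 4), (1, 1)],
    "D": [(2, 6), (2, 3), (2, 0)],
    "E": [(3, 7), (3, 5), (3, 2)],
    "F": [(4, 6), (4, 4), (4, 1)],
}

print(direction_outliers(trajectories))  # ['A', 'B']
```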

What can we conclude?

Ultimately, we were surprised that small multiples remain a viable design option for mobile phone displays, and that they may be particularly well-suited if the viewer’s task requires comparisons of trajectory lengths or angles.

Meanwhile, animation also seems to have circumstances in which it is appropriate, such as when the viewer’s task involves identifying direction-of-motion outliers or reversals.

In both cases, further research needs to be done with more tasks and trials, possibly with systematically generated synthetic data to ensure that a large set of possible trajectory scenarios is considered (we used a subset of the real socioeconomic metric data from the United Nations that was previously used in Robertson and colleagues’ 2008 experiment).

Want more detail?

I invite you to try the experiment on your own phone at aka.ms/multiples. If you want more detail, read the arxiv pre-print of the IEEE TVCG paper, watch the IEEE VIS 2019 talk (video | slides), or check out the source code at https://github.com/microsoft/MobileTrendVis.

Talk at IEEE VIS 2019 about this paper.

Acknowledgments

This work was a collaboration between Bongshin Lee (@bongshin | Microsoft Research), Petra Isenberg (@dr_pi | Inria), Eun Kyoung Choe (@slowalpaca | University of Maryland), and myself (matt brehmer | formerly of Microsoft Research, now with Tableau Research).

Thanks go out to Pierre Dragicevic, Steve Haroz, and Ed Cutrell, who helped us with our study design and our statistical analysis, as well as to Roland Fernandez, who shared material from the 2008 study.
