Visualizing Trends on Mobile Phones: Animation or Small Multiples?
The rise of the DataGIF.
A few years ago, I noticed that animated graphics were becoming more prevalent in online news and on social media. The ones that interested me the most were those that made use of position change over time, which you can see by browsing collections of “DataGIFs” curated by ProPublica’s Lena Groeger and BuzzFeed’s Jeremy Singer-Vine.
In 2015, the NPR News Graphics Team proposed a design strategy of displaying small multiples on desktop and looped animated GIFs on phones. My colleagues and I wanted to know if there was evidence behind this allocation of designs to different device profiles, so we decided to conduct an experiment. What we found seems to indicate that small multiples actually hold up pretty well, but the shape of the trend is also an important determining factor when choosing between animation and small multiples.
But first, some history.
Over a decade ago, Microsoft Research’s George Robertson and his colleagues were also intrigued by animation, particularly after seeing the late public health expert Hans Rosling’s TED talks. Rosling would narrate a story about shifting global health and economic metrics set to animated scatterplots. So Robertson and his colleagues conducted an experiment comparing animated and static scatterplots. Their paper about the experiment was quite influential, earning a Test-of-Time award at the IEEE VIS conference in 2018.
Their experiment considered two settings, a data analysis setting and a presentation setting with a narrator, so as to emulate Rosling’s TED talks. They also had three design conditions (animation, a connected scatterplot of trajectory trails, and small multiples with individual trajectory trails), a large and a small dataset, and 24 trajectory comparison tasks.
In their data analysis setting, the experiment’s participants had better overall performance with small multiples, though interestingly, they preferred animation, finding it to be fun and engaging. This was all of course done using a large display, and with experimenters observing the participants in a lab setting.
We conducted a crowdsourced experiment that borrowed some experimental design elements from Robertson and colleagues’ 2008 laboratory experiment. However, our experiment differed in several ways. First, we did not consider a narrated presentation setting (though incorporating a narration audio track could be an interesting follow-up experiment!). Second, we only considered a small dataset of 16 items from the 2008 experiment; we attempted larger dataset sizes, but the interpretability of both animation and small multiple designs degrades even with a 5x5 grid of 25 items, especially on a smaller phone screen. We therefore adapted a subset of the tasks from the 2008 experiment, the ones that could be answered with a small dataset. Finally, we asked people to participate in our experiment using their mobile phone instead of a PC.
One of the things we had observed about the tasks used in the 2008 experiment was that they were quite different from one another. These tasks asked participants to compare and select one to three target data points exhibiting different trajectories, with markedly different starting and ending positions. In our new experiment, we wanted to examine the experimental tasks individually to see whether people could identify the different types of trajectories to a greater or lesser extent with either animation or small multiples.
The figure below shows what the stimuli actually looked like on a phone. Any individual participant saw either animation or small multiples. For each task, we first showed the instruction and the pair of axes. After that, the animation began looping in the animation condition. In both conditions, we asked participants to select responses from a multiple choice list corresponding to alphabetically labeled points in the plot.
We recruited participants from the US-based population of Mechanical Turk crowd workers, and after excluding participants who responded incorrectly to an obvious quality control task, we had 45 participants in the small multiples condition and 51 in the animation condition, which gave us nearly 900 completed tasks to analyze.
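The “nearly 900” figure follows directly from the group sizes, assuming each participant completed all nine tasks (the per-task results below describe 9 tasks). A quick back-of-the-envelope check:

```python
# Task-count arithmetic from the study description (assumption:
# every participant completed all 9 tasks).
small_multiples_n = 45
animation_n = 51
tasks_per_participant = 9

total_tasks = (small_multiples_n + animation_n) * tasks_per_participant
print(total_tasks)  # 864, i.e. "nearly 900"
```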
What we found.
Despite the smaller screen size, those using the small multiples scatterplots were still faster in 7 of 9 tasks, with no difference in the remaining 2 tasks. With respect to accuracy, it was a draw: in 5 of 9 tasks, there was no evidence for a difference in accuracy. In the remaining 4 tasks, small multiples prevailed in 2 and animation had higher accuracy in the other 2. Interestingly, we noted that the animation group participants felt slightly more confident than their small multiples counterparts.
Beyond the overall results, it is interesting to consider individual tasks: those where we found no difference in performance between the two groups, and those where we found accuracy differences favoring one design over the other.
Trajectory reversals. In two tasks, we found no noticeable difference in completion time or accuracy between the two groups. In both of these tasks, the target data points initially move with the distractor points, but then they suddenly reverse course. These reversals may remain apparent using animation, while in a static small multiple design, the trajectory may be obscured since the trail might occlude itself. This potential occlusion didn’t appear to hurt the small multiples group enough for them to perform any worse than the animation group, but the speed and accuracy advantage of small multiples that we saw in other tasks was absent for these trajectory reversals.
Displacement along one axis. When we consider the tasks where the small multiples exhibited higher accuracy, these tasks involve absolute comparisons of ∆Y, such as identifying the largest decreases spanning the Y axis. This result is likely driven by the fact that our small multiples design lets you make this visual comparison at a glance, while with animation, the comparison relies on memory.
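To make the ∆Y comparison concrete, here is a minimal Python sketch of what the task asks viewers to judge: ranking points by their total decrease along the Y axis. The labels and trajectory coordinates are made up for illustration, not taken from the study’s stimuli.

```python
# Each point's trajectory is a list of (x, y) positions over time.
# Illustrative data only, not the study's stimuli.
trajectories = {
    "A": [(0.1, 0.9), (0.3, 0.6), (0.5, 0.2)],  # large Y decrease
    "B": [(0.2, 0.5), (0.4, 0.5), (0.6, 0.6)],  # slight Y increase
    "C": [(0.7, 0.8), (0.6, 0.5), (0.5, 0.4)],  # moderate Y decrease
}

def y_displacement(traj):
    """Signed change along Y from the first to the last time step."""
    return traj[-1][1] - traj[0][1]

# Largest decreases come first (most negative delta-Y).
ranked = sorted(trajectories, key=lambda k: y_displacement(trajectories[k]))
print(ranked)  # ['A', 'C', 'B']
```

In a small multiples display this ranking can be read off the static trails at a glance; with animation, the viewer has to remember where each point started.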
Direction-of-motion outliers. Finally, there were two tasks where animation had higher accuracy and comparable completion times relative to small multiples; in these tasks, the targets have trajectories that are outliers in terms of their direction of displacement. For example, in the adjacent figure, the targets move from left to right, while the distractors move from top to bottom.
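A hedged sketch of what “direction-of-motion outlier” means computationally, using the angle of each point’s displacement vector. The labeled displacement values are illustrative and not the study’s data.

```python
import math
import statistics

# Illustrative displacement vectors (dx, dy) for four labeled points.
# Point "A" moves left to right; the others move top to bottom.
displacements = {
    "A": (0.6, 0.0),
    "B": (0.0, -0.5),
    "C": (0.05, -0.6),
    "D": (-0.05, -0.4),
}

# Direction of motion = angle of the displacement vector.
angles = {k: math.atan2(dy, dx) for k, (dx, dy) in displacements.items()}
median_angle = statistics.median(angles.values())

def angular_distance(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

# The outlier is the point whose heading deviates most from the
# group's median heading.
outlier = max(angles, key=lambda k: angular_distance(angles[k], median_angle))
print(outlier)  # 'A'
```

With animation, this kind of odd-one-out motion tends to pop out perceptually, which may explain the accuracy advantage we observed for these tasks.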
What can we conclude?
Ultimately, we were surprised that small multiples remain a viable design option for mobile phone displays, and that they may be particularly well-suited when the viewer’s task requires comparisons of trajectory lengths or angles.
Meanwhile, animation also seems to have circumstances in which it is appropriate, such as when the viewer’s task involves identifying direction-of-motion outliers or trajectory reversals.
In both cases, further research needs to be done with more tasks and trials, possibly with systematically generated synthetic data to ensure that a large set of possible trajectory scenarios is considered (we used a subset of the real socioeconomic metric data from the United Nations that was previously used in Robertson and colleagues’ 2008 experiment).
Want more detail?
I invite you to try the experiment on your own phone at aka.ms/multiples. If you want more detail, read the arXiv preprint of the IEEE TVCG paper, watch the IEEE VIS 2019 talk (video | slides), or check out the source code at https://github.com/microsoft/MobileTrendVis.
This work was a collaboration between Bongshin Lee (@bongshin | Microsoft Research), Petra Isenberg (@dr_pi | Inria), Eun Kyoung Choe (@slowalpaca | University of Maryland), and myself (matt brehmer | formerly of Microsoft Research, now with Tableau Research).
Thanks go out to Pierre Dragicevic, Steve Haroz, and Ed Cutrell, who helped us with our study design and our statistical analysis, as well as to Roland Fernandez, who shared material from the 2008 study.