Visualizing Ranges Over Time on Mobile Phones

Published in Multiple Views: Visualization Research Explained · Nov 3, 2018

**TL;DR**. We experimentally evaluated two layouts of ranges over time across three temporal granularities for use in mobile applications and mobile-first websites, such as those pertaining to weather, personal health, and finance. Our experiment included locating dates, reading values on indicated dates, locating extreme values, comparing individual values, and comparing spans of multiple ranges. Try the experiment on your own phone at aka.ms/ranges. Read the IEEE TVCG paper, check out the IEEE VIS slides, or see the source code at https://github.com/Microsoft/rangesonmobile.

The next time you pick up the print edition of The New York Times, turn to the weather page; here you’ll see temperature ranges over time. Highs and lows for the last five days and the next five days are superimposed over average and record temperature ranges for several American cities, and you’ll also get a full month of temperature ranges for New York City itself. Or the next time you ask the question “was this past year warmer or more erratic than usual?”, you might encounter a chart like one by Randy Olson, overlaying observed and average ranges for every day of an entire year. These charts are fairly conventional when it comes to weather reporting, but they certainly aren’t limited to the weather domain. Consider eric boam’s “7 Months of Sleep” project, in which every day has a range indicating the hours slept, with bedtimes at the top and waking times the next morning at the bottom. These examples work well in print and when shown on large displays, but that’s not how many of us now consume weather data or personal health and activity data, such as sleep duration. Instead, these and several other sources of range data are likely to be consumed from a mobile device.

Apps (L → R): Dark Sky · Weathertron · Weather Line · Azumio Sleep Time · Garmin Connect · Bedtime (iOS Clock) · Activity (iOS) · SleepTight (Choe et al., Proc. UbiComp 2015) · A blood pressure tracking app by Chittaro (Proc. AVI 2006)

There are weather apps like Dark Sky, Weathertron, and Weather Line that feature the familiar temperature range encoding for 7 or 10 days, or aggregated across 12 months. For sleep tracking, there are apps like Azumio Sleep Time, Garmin Connect, and the Bedtime feature in the iOS Clock app. Add the heart rate range charts shown in the iOS Activity app and the sleep and blood pressure ranges shown in these research prototypes, and it becomes worth finding out just how many ranges can feasibly be shown on a mobile phone.

Another question relates to the linear convention used when visualizing ranges over time. Time is also cyclical: we experience seasons, lunar cycles, and weekday/weekend routines. This cyclicality is exemplified in work like Timm Kekeritz’s Weather Radials, and in a number of radial range charts featured in Manuel Lima’s Book of Circles. But are these designs appropriate for mobile displays? The question matters because mobile app and website designers appear to be quite fond of radial layouts, as illustrated in Sebastian Sadowski’s survey of visualization on mobile devices.

A crowdsourced experiment on mobile phones

The answer to how many ranges can fit into a mobile display, and to whether you should use a linear or a radial layout, is, of course, “it depends on the data and the task.” We know this because we (Bongshin Lee, Petra Isenberg, Eun Kyoung Choe, and myself) conducted a crowdsourced experiment in which we asked crowd workers to perform a set of different tasks on their mobile phones.

Our experiment involved both linear and radial layouts, as well as three granularities of time: a week of 7 ranges, a month of up to 31 ranges, and a year of 365 ranges.

Given the prevalence of ranges in weather and sleep tracking applications, we opted to use a year of daily temperature range data from a temperate American city known for its seasonal fluctuations, as well as a year of real bedtime and waking time data from a diligent quantified selfer on the r/datasets subreddit.

For the temperature data, we had both observed temperatures and recorded average temperatures; we show the average range in grey, with the color-gradient observed range superimposed over it. With sleep data, however, we only had the observed sleep time ranges. To produce the analog of an ‘average sleep’ range, we drew on two observations: sleep apps such as iOS’s Bedtime or Garmin Connect indicate sleep time goals, and health professionals commonly advise adults to keep a consistent sleep schedule and to sleep for about 8 hours. This means that while the ‘average’ ranges for daily temperatures fluctuate throughout the year, the ‘average’ sleep range remains constant throughout the year.

An explanation of the visual encoding used in our experimental application for temperature ranges (left) and sleep duration ranges (right).

It’s also important to note that the direction of the quantitative scale in a linear layout differs between these two data sources: by weather-reporting convention, warmer temperatures appear higher in a chart, whereas by calendar convention, later times appear lower. The color encoding reinforces this difference. We also provided different semantic cues for the two data sources, including different wording in task instructions and different iconography. As a result, we divided our participants into two groups, one for temperature ranges and one for sleep ranges. Because of these confounds, we did not directly compare the results of the two groups; they should be seen as separate experiments.

Using an existing visualization task typology as a framework, we designed five experimental tasks of increasing difficulty, which participants completed in order. Within each task, we counterbalanced the presentation order of layout and granularity, with several trials for each combination.

A 30-second video includes examples of the five experimental tasks.

These tasks included locating dates, reading values on indicated dates, locating extreme range values, comparing observed and average range values on indicated dates, and comparing spans of observed and average ranges. For the two comparison tasks, we presented participants with a fixed set of 3 response options. For the other tasks, participants had to select a region on the chart. We asked participants to contain the correct response within a dashed region that followed their touch point, and this region spanned between 1/7th and 1/12th of the possible response domain, depending on the granularity; in other words, they didn’t need to exactly touch the correct value.
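To make this tolerance concrete, here is a minimal Python sketch of how such a hit test might work. It is our own illustration, not code from the experiment’s repository: the `ResponseRegion` class and its parameters are hypothetical, and we assume the region is centered on the touch point and measured in data units (days, degrees, or hours). The optional wraparound is also an assumption, reflecting that dates in a radial layout form a cycle.

```python
from dataclasses import dataclass


@dataclass
class ResponseRegion:
    """Hypothetical model of a dashed response region that follows the touch point."""

    center: float       # touch position, in data units (e.g. day of year)
    domain_size: float  # full extent of the response domain (e.g. 365 days)
    fraction: float     # share of the domain the region covers, e.g. 1/7 to 1/12

    @property
    def half_width(self) -> float:
        # Assumption: the region extends equally on both sides of the touch point.
        return self.domain_size * self.fraction / 2

    def contains(self, correct_value: float, circular: bool = False) -> bool:
        """Return True if the correct response lies inside the region.

        With circular=True, distances wrap around the domain, as dates do
        where December meets January in a radial layout (an assumption).
        """
        distance = abs(correct_value - self.center)
        if circular:
            distance = min(distance, self.domain_size - distance)
        return distance <= self.half_width
```

For a year of 365 days with a region covering 1/12 of the domain, the region extends about 15 days on either side of the touch point, so a touch on day 100 accepts day 110 but not day 120.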

We collected both completion time and response accuracy for each trial, and at the end of the experiment, we asked participants about their preference and their overall confidence in their responses for each combination of layout and granularity. We recruited 100 participants from Mechanical Turk’s U.S. crowd worker population, split evenly into a temperature range group and a sleep range group. They had only one opportunity to perform the experiment in its entirety, which took 20 to 25 minutes. They had to use a mobile phone running a recent version of iOS or Android and either the Chrome or Safari mobile browser. We excluded results from 13 participants due to non-completion, non-compliance, or failure to respond correctly to quality control trials distributed throughout the experiment, leaving 40 participants in the Temperature group and 47 in the Sleep group. With the remaining data, we calculated ratios in completion time and differences in error rate between the two layouts and among the three levels of granularity.
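The post does not detail how these effect sizes were estimated; the paper does. As a rough illustration only, the Python sketch below computes a completion-time ratio from geometric means, a common choice for right-skewed timing data, and an error-rate difference from per-trial correctness. The function names are hypothetical, the use of geometric means is our assumption rather than a description of the paper’s analysis, and the sketch omits the interval estimates a full analysis would report.

```python
import math
from statistics import mean


def time_ratio(times_a: list[float], times_b: list[float]) -> float:
    """Ratio of geometric-mean completion times (a / b).

    Values above 1 mean condition a (e.g. radial) was slower than
    condition b (e.g. linear). Averaging log times damps the influence
    of a few very slow trials.
    """
    log_mean_a = mean(math.log(t) for t in times_a)
    log_mean_b = mean(math.log(t) for t in times_b)
    return math.exp(log_mean_a - log_mean_b)


def error_rate_difference(errors_a: list[int], errors_b: list[int]) -> float:
    """Difference in error rate (a - b); each trial is 1 if incorrect, 0 if correct."""
    return mean(errors_a) - mean(errors_b)
```

For example, if every radial trial took exactly twice as long as its linear counterpart, `time_ratio` returns 2.0; an error-rate difference above zero means the first condition produced more errors.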

Should you use a Radial or a Linear Layout?

Let’s first consider ratios in task completion time between linear and radial layouts. Our results indicate that people tend to complete comparison tasks in about the same amount of time with either layout, but for tasks that require locating values, people tend to be slower with radial layouts regardless of the data source, especially when reading values for an indicated day. However, radial layouts don’t seem to incur accuracy costs, at least with temperature range data, which tends to exhibit a lot of seasonal variation. With sleep range data, which exhibits no seasonal variation, people are less accurate with radial layouts when reading values for indicated dates or when locating extreme values.

The quantitative domain in a radial layout is compressed to half of the chart area.

The difficulty in reading range values for indicated dates may be due to the fact that the quantitative domain in a radial layout is compressed to half of the chart’s height, spanning only from the center to the periphery. This puts it at a disadvantage relative to a linear layout, where the quantitative domain spans the entire height of the chart, so individual marks are twice as tall. We kept the chart size constant between the two layout conditions, but it would be interesting to repeat our experiment with mark size kept constant instead, meaning that linear range charts would be compressed to half of their current height.
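A quick back-of-the-envelope sketch makes this factor of two explicit. Assuming a square chart of side `chart_size` pixels (a simplification; phone charts need not be square), a linear layout gives the value axis the full side, while a radial layout gives it only the radius. The `value_axis_length` and `value_to_pixels` helpers and the optional `inner_radius` parameter are hypothetical names of our own, not taken from the experiment’s code.

```python
def value_axis_length(chart_size: float, layout: str, inner_radius: float = 0.0) -> float:
    """Pixels available to the quantitative axis of a square chart of side chart_size."""
    if layout == "linear":
        # Values span the full height of the chart.
        return chart_size
    if layout == "radial":
        # Values span the radius, from the center (or an inner hole) to the periphery.
        return chart_size / 2 - inner_radius
    raise ValueError(f"unknown layout: {layout}")


def value_to_pixels(value: float, vmin: float, vmax: float, axis_length: float) -> float:
    """Linear scale mapping a value in [vmin, vmax] to a distance along the value axis."""
    return (value - vmin) / (vmax - vmin) * axis_length
```

For a hypothetical 360-pixel chart and a temperature domain of −10 to 40 degrees, a range from 20 to 30 degrees is 72 pixels tall in the linear layout but only 36 pixels long in the radial layout.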

We had speculated about the resolution at the periphery of a radial layout.

Our findings also suggest that it makes no difference whether the task concerns the beginning or the end of the range. We had speculated that the increased chronological resolution around the periphery of a radial layout would lead to better performance when locating and reading values at the periphery, and conversely that the center of a radial layout would lead to worse performance. In practice, people tended to be slower with a radial layout regardless of whether the task asked about the start or the end of the range, and there were no pronounced differences in accuracy.
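The chronological resolution we speculated about follows from simple geometry: in a radial layout, each of N days occupies an arc of length 2πr/N at radius r, whereas in a linear layout every day gets the same width. The Python sketch below compares the two for a year of 365 days on a hypothetical 360-pixel square chart; the chart size and function names are illustrative assumptions, not details of our experimental application.

```python
import math


def day_to_angle(day_index: int, n_days: int) -> float:
    """Angular position (radians) of a day, measured from the start of the cycle."""
    return 2 * math.pi * day_index / n_days


def radial_day_arc_length(radius: float, n_days: int) -> float:
    """Arc length (px) one day occupies at a given radius in a radial layout."""
    return 2 * math.pi * radius / n_days


def linear_day_width(chart_width: float, n_days: int) -> float:
    """Horizontal space (px) one day occupies in a linear layout."""
    return chart_width / n_days


if __name__ == "__main__":
    # Hypothetical 360 px square chart showing a year of daily ranges.
    print(f"linear:             {linear_day_width(360, 365):.2f} px/day")
    print(f"radial periphery:   {radial_day_arc_length(180, 365):.2f} px/day")
    print(f"radial near center: {radial_day_arc_length(18, 365):.2f} px/day")
```

At the periphery (r = 180 px), each day gets roughly 3.1 px of arc, about three times the roughly 1 px a linear layout affords; near the center (r = 18 px), it shrinks to about 0.3 px.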

So you might be wondering whether our results are another nail in the coffin, so to speak, for radial layouts in general, despite their popularity in practice and in design communities. It’s tempting to say so, but our results are more nuanced. Yes, people tend to be slower with radial layouts, but they are less accurate only in value-reading tasks, and there seems to be no difference in accuracy for comparison tasks. That said, our participants universally preferred linear layouts, and they felt more confident using linear layouts than radial ones.

If the task is primarily about comparing range values, it doesn’t appear to matter whether you use a linear or radial layout, at least in terms of accuracy.

If the task is primarily about comparing range values or comparing spans of observed and average ranges, such as judging whether the observed temperatures in one month were more closely aligned with average temperatures than those in another month, it doesn’t appear to matter whether you use a linear or radial layout, at least in terms of accuracy.

It’s also entirely possible that there are tasks beyond those we considered where a radial layout has an advantage over, or performs on par with, a linear layout, such as year-over-year comparisons like those in Ed Hawkins’ Climate Spirals. Determining this may require revisiting our choice of encoding and considering paging, scrolling, or animation, at which point performance depends on memory as well as perception. Either way, there are opportunities for future research in this area.

How many ranges should you show in a mobile display?

Our other primary question in this work pertained to how many ranges you can fit in a mobile display while retaining reasonable task performance, as you move from a week of 7 ranges to a month of 31 ranges to a year of 365. You might expect that adding more marks to a chart would worsen performance, but again our findings are more nuanced.

As you might expect, people were slower with a month of ranges than with a week of ranges, and in some instances the completion-time ratio from a week to a month was greater than that from a month to a year. With regard to accuracy, the jump from a week to a month only incurred noticeably worse performance when locating extreme values. Interestingly, there were cases where people were as accurate or even more accurate with a year of ranges than with a month of ranges, particularly when looking for extreme temperatures. It is in this respect that we observed different results between the temperature and sleep range groups: the expected annual trend of temperature ranges appears to make the task easier relative to examining a month of sleep ranges.

A month may not be an appropriate granularity for temperature or sleep ranges.

One interpretation of these results is that a month may not be an appropriate granularity for either temperature ranges or sleep ranges, since temperatures follow an annual cycle, not a monthly one, and we tend to have a weekly sleep routine across weekdays and weekends, as opposed to a monthly sleep routine. It is possible that other sources of range data are more appropriate to display at a monthly granularity, such as lunar or tidal cycles.

Broader implications and open questions

Ultimately, the questions of which layout and which granularity to display are really questions about congruence with the data and task. You should ask whether a cycle is meaningful in the context of the data, whether the task is about locating values or comparing values, and whether task efficiency is the first priority.

It’s important to stress that our results, and the design implications we draw from them, should not be used to inform the visualization of ranges over time in non-mobile contexts. Our experiment was conducted exclusively on mobile phones, and we can only speak to that form factor.

Nor do our results allow us to comment on the experience of interacting with range charts with different layouts and granularities. We designed the response mechanism in each task to be fairly simple, without requiring precise selection or the entering of responses into text fields.

The design of experiments such as ours aims for a balance between external validity and control over potential confounds. In light of our results and the possible differences between participants who saw temperature data and those who saw sleep data, a potentially informative follow-up experiment would be to remove the semantic cues from the charts and the task wording, or to repeat the experiment with other sources of range data, such as heart rate or blood pressure.

Another direction to consider is the engagement of the participants with respect to their lived experience of the data being shown: what would our task performance results look like if participants were looking at their own sleep data? Or if they were looking at temperature data from where they lived? Perhaps what is needed here is a deeper engagement between visualization researchers and the quantified self community, with people who already have a keen interest in tracking and analyzing their personal data, particularly when that data is consumed from a mobile phone.

Beyond range data, our work reaffirms the need for more studies of visualization for mobile devices, particularly as more and more data from our lives becomes accessible on our phones. The presence of visualization in mobile apps and in mobile-first news outlets will continue to rise, and the visualization research community must continue to investigate mobile-first and mobile-only visual encodings and interactions. Our work offers one approach to carrying out this research via crowdsourced experimentation, though there are certainly other approaches worth considering.

If this work resonated with you, we invite you to engage with us and to share feedback and ideas regarding future research directions for visualization on mobile devices.

Experience the experiment yourself

Our experimental application is available under an MIT open source license, and you can still experience our experiment on your own phone at aka.ms/ranges. This website can only be viewed from a mobile phone held in portrait mode, and it is compatible with recent versions of mobile web browsers such as Chrome and Safari. The experiment takes about 20–25 minutes to complete.

Want more detail?

This post is adapted from a talk I gave at the IEEE VIS conference in Berlin on October 24, 2018. The talk and this post summarize an 11-page journal paper in Volume 25, Issue 1 of IEEE Transactions on Visualization and Computer Graphics. A preprint version of the paper is also available.

Acknowledgments

This work was a collaboration between Bongshin Lee (@bongshin | Microsoft Research), Petra Isenberg (@dr_pi | Inria), Eun Kyoung Choe (@slowalpaca | University of Maryland), and myself (matt brehmer | Microsoft Research).

We thank Pierre Dragicevic for his suggestions regarding result analyses, Ken Hinckley, Catherine Plaisant, and Lonni Besançon for their comments on the paper, as well as our pilot participants for their feedback on the experiment application and procedure.

Beyond the academic research cited in our paper, this work is inspired by and indebted to the work of practitioners: radial/circular visualization example curation by Manuel Lima (The Book of Circles); mobile data visualization example curation by Sebastian Sadowski (mobileinfovis.com) and Irene Ros (mobilev.is); SVG tutorials by Nadieh Bremer (SVG gradients | Boston weather radial); and range charts by Timm Kekeritz (weather-radials.com), STUDIO TERP (Eindhoven weather radial), eric boam (7 Months of Sleep), Randy Olson (weather chart for fivethirtyeight), and Susie Lu (@DataToViz | weather radial chart).