39 studies about human perception in 30 minutes
These are my speaker notes from a talk I gave at OpenVis in April 2016. Originally this talk was supposed to be called “Everything we know about how humans perceive graphics,” which is… at a minimum, pretty pretentious. I scaled back a bit.
Clearly I have learned everything there is to know about how humans perceive data graphics in preparation for this talk. Literally everything. But I couldn’t possibly fit it all into 30 minutes. So for your sake, let’s just refer to this talk as “39 studies on human perception in 30 minutes.”
I’m a graphics editor at The Washington Post, which means I’m part developer, part journalist. Graphics editors like me often rely on common wisdom and experiential knowledge to inform our decisions about design and visualization choices — which is incredibly valuable.
For the last several years, I’ve wondered about what we actually know from scientific studies about how humans perceive graphics. I’ve collected things here and there, but when I started to get into the thick of it, I realized how extensive this body of research really is. There is a lot that I’m leaving out.
This is an extreme distillation of these studies. In reality, these are robust studies, many with multiple experiments each, loaded with meaningful nuance that I do not have time to address. These are very simplistic representations of one or two core findings. I encourage that you read them in their entirety and consider their complete findings.
In future posts, I’d like to dive in deeper to each section and present a bit more context for these findings. Look out for more!
In Colin Ware’s book, Information Visualization, he wonders: Is visualization a science or a language?
It’s perhaps a science because it must represent data accurately, methodically and without flourish so that we can see the underlying trends and patterns. Because of this, selecting the right visualization for your content could be prescriptive based on what you want to show.
However, many argue that it is a language because it uses diagrams to convey meaning. Data is encoded into symbology and semiology. The syntax and conventions of these diagrams must be learned and are not inherent.
Throughout this presentation, think about which you think it is.
In 1984, William Cleveland and Robert McGill (1) published a study that can only be described as the archetypical seminal study for information visualization.
It gives us our first ranking of so-called “elementary perceptual tasks,” which are the most basic visual tasks we perform in our perception of graphs. It is referenced by many of the studies I’m about to talk about, and in fact, whenever that happens, I put William Cleveland’s little head next to it (I couldn’t find a picture of McGill).
At the top of the ranking, the easiest perceptual task, is “position along a common scale.” Cleveland and McGill say it is easiest for us to compare objects in terms of a common scale, like an axis. Scatterplots are a good example of this: circles in scatterplots are anchored on two common scales, the x- and the y-axis.
Bar charts also involve comparing objects on a common scale, typically the x-axis, but Cleveland and McGill admit they can also involve judgments of length and area. The list goes on. I encourage you to read and familiarize yourself with what they consider to be elementary perceptual tasks.
However, this study was conducted three decades ago. How relevant is it today?
Fortunately for us, we have some clue. Heer and Bostock (2) revisited some of Cleveland and McGill’s old experiments in a study they did as more of a proof of concept for using Mechanical Turk users in scientific studies.
Their results were very similar to those of Cleveland and McGill, at least with the perceptual tasks they tested for.
Three studies show that we have inherent biases related to the types of graphs we see and the objects we see in them. These biases may distort the information we retrieve from a graph.
We know that from Steven’s law, when an object is seen in context of other larger objects, it appears larger itself. When it is seen in context with smaller objects, it appears smaller. Jordan and Schiano (3) found that increasing spatial separation between lines produced the opposite effect. If lines were close enough, a line’s length was more similar to the length of the line around it (this is also known as assimilation). If the lines were far apart, long lines appeared longer, and short lines appeared shorter (also known as contrast).
In two other studies, Schiano and Tversky (4, 5) found that charts were remembered as being more symmetrical than they actually were. They gave participants diagrams that looked like these, and told them they were either charts or maps.
They found that when participants encountered the diagrams introduced as charts, they remembered the lines closer to the imaginary 45-degree line. Further, when the same line was presented to them as a map, this distortion did not occur.
When text appeared with the chart calling attention to its symmetry, participants recalled it being a more symmetrical chart even if it was not. This leads me to believe that annotations can be powerful in relaying information.
In a separate study, they confirmed a systematic bias toward an imaginary 45-degree line in line charts. When a diagonal line was presented to participants, they continually remembered this line as being closer to 45 degrees than it actually was, meaning they underestimated larger angles and overestimated smaller angles. Thus, the 45-degree line is a reference point for line charts, but not in other contexts.
Their work suggests that different visual systems promote different reference frames.
Croxton (6) found over eight decades ago that bars were more effective in communicating comparative values than either circles, squares or cubes. Circles and squares were about as effective as the other. Cubes were undoubtedly the worst. More on 3d later.
bars and pies for proportions
Much is said about the relative merits of bars and circles for showing proportions.
All five of these studies legitimize the use of pie charts when conveying proportions and some even show their superiority over bar charts.
I did not encounter any studies that said we should not use pie charts for showing proportions in all cases.
Eells (7) was among the first to publish a paper on this topic in 1926. In his time, pie charts were ridiculed much as they are today for their assumed perceptual inadequacies. For example, he was told that the human eye cannot judge arcs, angles or chords very efficiently.
He also wanted to know more about how circles were processed. So he handed out worksheets to a psychology class and asked them to estimate the proportions in these pie and bar charts.
Not only did he find that pie charts were read as easily, quickly and accurately as bar charts, but that as the number of components in the chart increased, bars become less efficient encoding the data. The opposite was true for pie charts
He found that 50 percent of people use the outer arc to make proportional judgments, while 25 percent use area, and the other 25 percent use the inner arc or angle. Furthermore, 71 people in the class preferred the pies and only 25 preferred the bars.
He concluded that we ought to use pie charts, not just for their appeal but because of their scientific accuracy.
He also concluded that men were superior to women in estimating these proportions. So, hats off to men.
A follow-up study in response to Eells’s work the following year by Croxton (8) did not find that pie charts were so conclusively better than bar charts, but they did pull ahead for some of the cases.
Six decades later, and in three more experiments, pie charts were hailed for their strength in conveying proportional data, in some way or another.
Simkin and Hastie (9) had participants make proportional judgments and segment-to-segment (comparison) judgments. And found that for segment-to-segment judgments, simple bar charts worked best, followed by divided bar charts and then pie charts.
For proportional judgments, pie charts and divided bar charts were tied, with simple bar charts the least effective.
Spence and Lewandowsky (10) found that comparisons among multiple segments take longer and have lower accuracy. Pie charts fared the worst except when multiple segments had to be compared. Tables were found to be inferior to everything except for communicating absolute values, despite what Tufte advises.
Hollands and Spence (11) found that as the number of components in bar charts increase, their effectiveness at communicating proportions decreases. In fact, for each new component in bar charts, a reader needs an additional 1.7 seconds for processing.
bars and lines
In two separate experiments in the same study, Zacks and Tversky (12) found that when participants were shown bar graphs and asked to describe the data, they continually referenced contrasts between the variables in the bars (e.g., “A is greater in X quantity than B”). Whereas with line charts, participants described trends (e.g., “As X increases, Y increases”).
Even when researchers (13) presented participants with a graph showing a third variable of data, the line chart descriptions remained focused on x-y relationships, whereas the bars branched out a bit more to include this new variable.
These studies show that people have a hard time seeing messages in line charts beyond trends.
bars and pies and lines
Hollands and Spence (14) evaluated if the performance of a graph depends on the type of judgment that needs to be done. They felt that line charts were superior to other graphs for showing change because they were “integrated” interfaces: viewers are able to perceive change directly from slope. Using pie charts to depict change requires sequencing several charts, and therefore are inferior, “separated” interfaces.
They tested participants’ perception of change and proportion among bar, pie and line charts.
Pie charts obviously failed at communicating change efficiently, but they found that bar charts had similar success to line charts, and they wondered why.
They hypothesized that it was because people draw imaginary lines between the bars. So they created a new, terrible graph called a tiered bar chart that would break up these imaginary lines, and tested participants again.
They found that yes, any chart allowing the reader to see a real or imaginary trend line was the best at communicating change. For proportions, if charts had no scales, pies were the best.
Line shape can be loaded with context that fascinates us, but also distorts our perception of data.
As we know, the independent variable (the cause) is usually plotted on the x-axis and the dependent variable (the effect) on the y-axis. But we also tend to perceive slope as being a metaphor for quickness, height or amount. These two conventions can be in conflict with each other.
Gattis and Holyoak (15) designed an experiment where slope could indicate height, or altitude but that meant that the independent and dependent variables were on the wrong axis. They presented both the right and wrong chart to participants and asked if the dotted line indicated a quicker or slower rate.
When altitude was on the y-axis, corresponding with the visual metaphor, participants were more accurate. In other words, we tend to see slope as representative of quickness, height, amount or rate above anything else. They concluded that there are certain pictorial properties of slopes that “facilitate reasoning above all others.”
They also revealed a universal association of “more” or “better” with the upward direction.
Carswell and coauthors (16) found that when line graphs showed trend reversals, people studied them longer. They found that this was not the case when they varied the number of data points, symmetry or linearity.
Here are four studies that suggest we evaluate 3d objects more accurately than we commonly think. Two of these studies individually reject Tufte’s popular “high data-to-ink ratio” philosophy.
Siegrist (17) finds that among bar charts, 2d is not superior to 3d, but 3d charts take slightly longer to process. With pies, 2d is better, and the perspective angle makes a big difference in how accurately the slices are evaluated, most likely because some of the slices are more obscured than others.
Levy and coauthors (18) acknowledge that 3d graphics, while “glitzy” and “sexy,” do not convey any additional information and force the reader to “deal with redundant and extraneous cues.”
Participants were given the option to select among 2d and 3d charts. When they were told to select a chart to present to other people, they tended to choose 3d charts. They also selected 3d charts when they were told the data had to be remembered. They selected 2d bar graphs more when they were told they needed to convey specific details, and selected line charts when the message had to be communicated quickly. The authors conclude that 3d charts can be useful in some cases
The final two experiments by Spence (19, 20) deal with Steven’s law, which again (very simplistically) says that an object’s size appears larger when presented with larger objects, or smaller when presented with smaller objects. Spence found that contrary to popular physics, this distortion does not happen when comparing two shapes of the same dimensionality. Only when you vary the dimensionality among shapes does this distortion occur.
Cleveland and coauthors (21) found people come to conclusions about the correlation in scatterplots partly based on the size of the point cloud. When the same correlation is represented in two graphs, but in one graph the scale is blown out so the point cloud becomes very small, people perceive it as having a higher correlation.
Experimenting with symbol type in scatterplots, Lewandowsky and Spence (22) find that altering color is most discernible to the eye. When varying color is not an option, varying fill or shape (or even non-confusable lettering) has “no great loss in accuracy.”
He suggests that using letters has one clear advantage: providing a semi-label for the data (M for males; F for females). I personally think this can be achieved with annotation without the risk of confusing the location of the symbol’s centroid.
In a crowd-sourced experiment, Demiralp and coauthors (23) reordered the native Tableau color and symbol palette so that it’d be ordered by the shapes and colors most discernible to the eye.
Ziemkiewicz and Kosara (24) found that directing participants to navigate treemaps with metaphors to complete certain tasks made them more accurate. For example, directing participants to find a data point “inside” a container-like treemap and telling them to look “below” in a cascading treemap worked best.
Kong, Heer and Argawala (25) found that people discern values in treemaps best when the components are rectangles with diverse aspect ratios. Somewhat counterintuitively, squares are not easy to compared to each other. Extreme ratios in rectangles are also ineffective for comparisons.
They additionally found that, surprisingly to them, small multiples of bar charts were better than treemaps at representing datasets with fewer than about 1,000 data points for leaf-to-leaf comparisons.
In 2001, Barlow and Neville (26) asked participants to compare four different hierarchal plots. They found that participants did not like the treemap, and preferred the icicle plot and org chart.
In 2007, Cawthon and Moere (27) showed that aesthetics can be linked to an individual’s engagement with a particular visualization. They found that the sunburst vis was the most popular, and participants found the 3d-looking beam tree to be the most hideous. Engagement was best with the sunburst, icicle and startree. Performance was worst with beamtree and treemap. They concluded that sunburst exemplifies that “beautiful can be useable.”
Harrison and coauthors (28) ranked the effectiveness of several visualization types for depicting correlation. They found that scatterplots and parallel coordinates were best at this. Among the stacked chart variants, the stacked bar significantly outperformed both the stacked area and stacked line.
A year later, Kay and Heer (29) reevaluated the data and split the visualizations into four groups by level of precision. The top ranking remained the same, however.
Heer, Hong and Argrawala (30) explored how dense graphics can be, and still be effective at conveying values. They mirrored the negative values in time series charts and overlayed the extreme values into 2, 3 and 4 color bands, effectively reducing the chart size by 75 percent. They also manipulated the height of the chart so that some appeared only 6px high.
They found that the more color bands present in the graphs, the more people made mistakes, suggesting that sometimes not all visual markers are helpful to viewers. Regular line charts performed the worst at small sizes, showing that the mirroring effect isn’t a detriment to comprehension.
pictographs and drawings
Haroz, Kosara and Franconeri (31) experimented with using pictographs instead of generic shapes to represent data in simple charts. They found that using discrete shapes, whether they were generic circles or pictographs, helped people remember the data better than a single bar. Using pictographs as replacement for text on the axis led to more errors.
They found that replacing generic shapes for pictographs was not a detriment to perceiving or remembering the data presented in stacked graphs. People were also more inclined to investigate visualizations that used pictographs rather than ones with generic shapes.
In one of my favorite studies, Cole and coauthors (32) investigated if participants could understand computer renderings of line drawings as well as a human artist. They found that many algorithms available today are indeed comparable to human renderings, but some do not come close.
The models that participants had the hardest time with were typically “smooth, blobby shapes.” Even when viewers interpreted the shapes inaccurately, however, their interpretations were similar to each other, and were concentrated in hotspots on the shapes.
Ziemkiewicz and coauthors (33) evaluated how personality traits affect performance with list-style visualizations and container-style visualizations.
Those with a high internal LOC (those who believe than can control external events) weren’t as good with the container-like visualization. Those who had high external LOC (believe that they are merely controlled by external events) performed faster and more accurately overall.
Similarly, more neurotic participants were best with the more structured (container-like) visualization, while other participants displayed the opposite trend. Introverts were more accurate overall than extroverts.
Lee and coauthors (34) observed “novice users” when they encountered unfamiliar visualizations and tried to make sense of them. Participants had difficulties in moving away from their initial frameworks, even if they were incorrect. Thus, first impressions are quite important.
Harrison and coauthors (35) provided participants with a New York Times article aimed at affecting their emotion, either positively or negatively, and tested how well they performed at simple perceptual tasks.
They found that negatively primed participants made more errors, and positive priming tended to improve performance. Only 1 in 5 were successfully primed, however, proving it is difficult to achieve.
Hullman, Adar and Shah (36) replicated Heer and Bostock’s earlier study with Mechanical Turk, but with one major change. They showed a social histogram of the last 50 responses as the participant answered questions. For some participants, the histogram was skewed upwards or downwards from the real values. Those shown the incorrect histogram got more questions wrong.
Liu and Heer (37) found that a latency of a half a second with interactive graphics has profound effects on the way a viewer engages with the graphic. They moved the mouse less, shifted the types of interactions they did and lessened some of their interactions altogether. They were also more likely to comment verbally on the interface.
This delay also affected subsequent sessions for that particular participant; he or she was less likely to engage in a graphic during a session that followed. Originally the researchers wanted to also include a 1-second delay option but in pilot studies, users found this unusable.
In 2007, Wigdor (38) and coauthors found that completing elementary perceptual tasks like detecting position or angle direction was a lot harder if a screen was flat on a table. Therefore, the orientation of the screen can distort the perception of a graph.
In 2008, Robertson and coauthors (39) gave participants a large dataset presented three different ways: in an animated fashion, in a static but traced version and a small multiples version — and asked them questions about the data.
Even though the animation version won time and time again in a preferences survey for helpfulness, ease of use, enjoyment, excitement, researchers said that the small multiples version was more effective for larger datasets, was faster and led to fewer errors.
Finally, in another one of my favorite studies, Lin and coauthors (40) created an algorithm to identify “semantically resonant colors.” For example, if I want to talk about oceans, I use the color blue. If I want to talk about love, I use pink or red.
The algorithm searched Google Images and assigned a particular color to a keyword. They crowd sourced how well this algorithm did with Mechanical Turk participants. Some of it turned out a little funny.
In a later experiment in the same study, the Tableau symbol artist design a semantically resonant color scheme, and researchers tested their algorithm against that and a non-semantic color scheme. They bet against their own algorithm, hypothesizing that the human-picked palette would be the most effective.
But it turned out that it did just as well as the human-generated palette with a subset of the data.
What can we conclude from these studies? What is still left to be uncovered?
How do these findings change as visualization becomes more commonplace and people become more visually literate?
More importantly, what do you think about Colin Ware’s question about data visualization being a science or a language?
I don’t think data visualization or data storytelling is prescriptive; I think there are many options and an infinite number of possible visualizations, many we haven’t even discovered yet.
But perhaps it’s not purely a language either. Obviously there are biases and distortions we bring with us when viewing graphing frameworks and objects, some of which is learned, but some of which seems inherent.
I don’t want to imply that the findings in these studies should be applied wholly and indiscriminately to graphics. Most of these studies test the accuracy of very specific tasks, mostly dealing with an individual’s ability to perceive differences in size of certain shapes and elements. That is often not the primary goal of a graphic.
The findings about treemaps illustrate this notion: participants were more accurate with small multiples of bar charts than treemaps for leaf-to-leaf comparisons. That is helpful to consider, but it is quite possible that representing the entire ecosystem of data is more important.
There is so much left to study in the realm of human perception of graphics, especially with dynamic and interactive graphics. We all benefit when practitioners share their experiential knowledge openly and widely, which can help give us insight into the large gaps of space where current studies do not cover.
I’m not able to format the bibliography as I had in my slides, but the following studies reference the Cleveland and McGill study and therefore should get the pink William Cleveland head: 1, 2, 4, 5, 9, 10, 13, 14, 17, 18, 19, 21, 22, 23, 24, 27, 33, 34, 37, 38 and 39.
- Graphical Perception: Theory, Experimentation, and the Application to the Development of Graphical Methods. William Cleveland, Robert McGill, 1984.
- Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design. Jeffrey Heer and Michael Bostock, 2010.
- Serial processing and the parallel-lines illusion: Length contrast through relative spatial separation of contours. Kevin Jordan and Diane Schiano, 1986.
- Perceptual and conceptual factors in distortion in memory for graphs and maps. Barbara Tversky and Diane Schiano, 1989.
- Structure and strategy in encoding simplified graphs. Diane Schiano and Barbara Tversky, 1992.
- Graphic comparisons by bars, squares, circles and cubes. Frederick Croxton, 1932.
- The relative merits of circles and bars for representing component parts. Walter Crosby Eells, 1926.
- Bar charts versus circle diagrams. Frederick Croxton and Roy Stryker, 1927.
- An Information-Processing Analysis of Graph Perception. David Simkin and Reid Hastie, 1987.
- Displaying proportions and percentages. Ian Spence and Stephan Lewandowsky, 1991.
- Judging Proportion in Graphs: The Summation Model. J.G. Hollands and Ian Spence, 1998.
- Bars and Lines: A Study of Graphic Communication. Jeff Zacks and Barbara Tversky, 1997.
- Bar and Line Graph Comprehension: An Interaction of Top-Down and Bottom-Up Processes. Priti Shah and Eric Freedman, 2009.
- Judgments of Change and Proportion in Graphical Perception. J.G. Hollands and Ian Spence, 1992.
- Mapping conceptual to spatial relations in visual reasoning. Merideth Gattis and Keith Holyoak, 1996.
- Stimulus complexity and information integration in the spontaneous interpretation of line graphs. C. Melody Carswell, Cathy Emery and Andrea Lonon, 1993.
- The use or misuse of three-dimensional graphs to represent lower-dimensional data. Michael Siegrist, 1996.
- Gratuitous graphics: Putting preferences in perspective. Ellen Levy, Jeff Zacks, Barbara Tversky and Diane Schiano, 1996.
- Visual Psychophysics of Simple Graphical Elements. Ian Spence, 1990.
- The apparent and effective dimensionality of representations of objects. Ian Spence, 2004.
- Variables on scatterplots look more highly correlated when the scales are increased. William Cleveland, Persi Diaconis and Robert McGill, 1982.
- Discriminating Strata in Scatterplots. Stephan Lewandowsky and Ian Spence, 1989.
- Learning Perceptual Kernels for Visualization Design. Cagatay Demiralp, Michael Bernstein and Jeffrey Heer, 2014.
- The Shaping of Information by Visual Metaphors. Caroline Ziemkiewicz and Robert Kosara, 2008.
- Perceptual Guidelines for Creating Rectangular Treemaps. Nicholas Kong, Jeffrey Heer, and Maneesh Agrawala, 2010.
- A Comparison of 2-D Visualizations of Hierarchies. Todd Barlow and Padraic Neville, 2001.
- The effect of aesthetic on the usability of data visualization. Nick Cawthon and Andrew Vande Moere, 2007.
- Ranking Visualizations of Correlation Using Weber’s Law. Lane Harrison, Fumeng Yang, Steven Franconeri and Remco Chang, 2014.
- Beyond Weber’s Law: A Second Look at Ranking Visualizations of Correlation. Matthew Kay and Jeffrey Heer, 2015.
- Sizing the Horizon: The Effects of Chart Size and Layering on the Graphical Perception of Time Series Visualizations. Jeffrey Heer, Nicholas Kong and Maneesh Agrawala, 2009.
- ISOTYPE Visualization — Working Memory, Performance, and Engagement with Pictographs. Steve Haroz, Robert Kosara and Steven Franconeri, 2015.
- How well do line drawings depict shape? Forrester Cole et. all, 2009.
- How Visualization Layout Relates to Locus of Control and Other Personality Factors. Caroline Ziemkiewicz et. all, 2012.
- How do People Make Sense of Unfamiliar Visualizations?: A Grounded Model of Novice’s Information Visualization Sensemaking. Sukwon Lee et. all, 2015.
- Influencing Visual Judgment through Affective Priming. Lane Harrison et. all, 2013.
- The Impact of Social Information on Visual Judgments. Jessica Hullman, Eytan Adar and Priti Shah, 2011.
- The Effects of Interactive Latency on Exploratory Visual Analysis. Zhicheng Liu and Jeffrey Heer, 2014.
- Perception of Elementary Graphical Elements in Tabletop and Multi-Surface Environments. Daniel Wigdor et. all, 2007.
- Effectiveness of Animation in Trend Visualization. George Robertson et. all, 2008.
- Selecting Semantically-Resonant Colors for Data Visualization. Sharon Lin et. all, 2013.