Data is Personal. What We Learned from 42 Interviews in Rural America.
This post is based on our ACM CHI 2019 paper (Best Paper Award!), “Data is Personal: Attitudes and Perceptions of Data Visualization in Rural Pennsylvania,” by Evan M. Peck, Sofia Ayuso, and Omar El-Etr. For our data, materials, and other summaries of this work, please visit our project website.
From Barns to Bar Graphs in Rural America
The landscape in central Pennsylvania, where I work, is beautiful. If you take the 2–3 hour drive from Philadelphia, New York, or Baltimore, you’ll enter a landscape molded by rolling green hills and “model train” farms. You’ll see Amish horse-and-buggies as you wind through small, close-knit communities, and along the way you’ll pass roadside farm stands, Confederate flags, and homemade religious signs (sometimes side by side). Nestled next to the anthracite coal region, central PA forged its identity alongside coal, an energy source that is now fading.
But the topography of the land impacts more than the kinds of jobs we work or how far we drive to reach an Indian restaurant. It can fundamentally impact the way we access data. Consider this quote from the wonderful work of Dr. Jenna Burrell in rural California:
I argue that poor Internet connectivity is not simply a ‘natural’ consequence of the demographics of rural areas where residents tend to be lower income, possess lesser educational attainment, and are older in age. It is a matter of exclusion. This exclusion is shaped by geography, remoteness, and population density which are consequential within a particular American political economy where the availability of connectivity is largely market-driven.
In 2019, our access to information can empower or disadvantage us to a remarkable degree. And while Dr. Burrell talks about inequality in the context of Internet connectivity, we want to consider exclusion in the context of data communication: does the way we present data through visualizations help some people reason about or understand data more than others?
While you might believe that there is nothing different about how rural populations reason with data, it takes only a glance at the 2016 presidential voting maps to see that, in some ways, we don’t all interpret the world the same way. And those same demographic groups mentioned in the quote above — lower income, lesser education, older in age — are hard to find in the data visualization literature. What don’t we know?
And so here, in beautiful, rural central PA, we began a research project looking at big questions: Who is paying attention to our data and who isn’t? Why aren’t they paying attention to data? What do they trust… and why?
We asked 40+ people from rural Pennsylvania to rank a set of 10 graphs. Then we talked about it.
At a farmers market, a construction site, and in university dining facilities, we interviewed 42 members of our community about graphs and charts to learn how they understand and engage with data.
- We showed people 10 data visualizations about drug use that varied in their visual encodings, their style, and their source.
- We asked them to rank the 10 graphs (without source information!) based on their usefulness.
- After revealing the sources of the graphs, people were given an opportunity to rerank their visualizations.
The people we talked to weren’t just young and weren’t just in college. They were diverse in their education (60% never completed college) and age (26% were 55+, 33% were between 35–44). Through many hours of conversations, here is what we found…
Aggregate data is messy, hiding individuals
To get a high-level view, let’s start by seeing how many people gave each graph each ranking:
It didn’t take long for us to see that the ranking data was messy, so messy that sharing aggregated means or medians was useless. We have a lot to say about this in our paper, but for the purposes of this post, the takeaway is that individual preference and attention are complicated. Infographics are divisive (Chart J received the most 1 rankings and the most 10 rankings!), some people like simplicity, some like color, and some just like finding where they live.
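To make the point about means and medians concrete, here is a minimal sketch in Python. The rankings below are invented for illustration and are not our study data; the chart names and the `rank_counts` helper are likewise hypothetical. It shows how a divisive chart and a middling chart can share the same mean and median while having very different distributions.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical rankings (1 = most useful, 10 = least useful) from ten
# imaginary participants. These are NOT the study's data.
rankings = {
    "polarizing chart": [1, 1, 1, 1, 1, 10, 10, 10, 10, 10],
    "middling chart": [5, 5, 5, 5, 5, 6, 6, 6, 6, 6],
}


def rank_counts(ranks, n_ranks=10):
    """Return how many participants gave each rank from 1 to n_ranks."""
    tally = Counter(ranks)
    return [tally.get(r, 0) for r in range(1, n_ranks + 1)]


for chart, ranks in rankings.items():
    # The summary statistics are identical for both charts...
    print(chart, "mean:", mean(ranks), "median:", median(ranks))
    # ...but the per-rank counts tell two very different stories.
    print("  counts for ranks 1-10:", rank_counts(ranks))
```

In this toy example, only the per-rank counts reveal that one chart splits people into two camps, which is why looking at the full distribution matters.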
But aside from these rough trends, if we’re serious about communicating data to all people, we need to understand more about these messy distributions. What are the stories behind the data?
Data is personal. Data is intimate.
As we analyzed and coded our interviews, we were reminded of something that we often forget — data can be intimate and personal. If someone found a personal connection to any graph, it didn’t matter the color, the style or the technique. For the people we talked to, charts with personal connections superseded all other design dimensions.
People who were impacted by alcohol were drawn to graphs with alcohol…
Information about alcohol [is the most important].
I’m dealing with a functioning alcoholic. The most important person in my life is an alcoholic.
Right now, that’s important to me.
— 65–74 year old, college graduate
People who were impacted by opioids were drawn to graphs with opioids…
As for some of the other [graphs], I happen to know quite a few people who unfortunately happen to have an issue with opioids… and it’s something you consider… are you going to see that person tomorrow or not?
— 25–34 year old with some high school, no diploma
Over and over (and over) again, people cited personal experience in order to rationalize their ranking decisions. And the stories they told us — researchers they had never met — were often intimate…
I have a few friends that died [from opioids], so [Graph F] made me put it that way.
— 25–34 year old, high school graduate
What we find striking about these conversations is not that they occurred at all, but the frequency with which they occurred in an interview design that wasn’t looking for them. It’s very possible that many other people we talked to held similar experiences that were left unspoken. And it leaves us with troubling questions… how can we possibly account for such powerful, personal factors in our designs?
Data is Personal: Relevant Geography
While these personal stories may be challenging to design for, others nudge us towards clearer design implications. Consider the response from one participant when asked why he chose to rank the line graph on the left higher than the line graph on the right.
I ranked it higher just for the simple fact that I live in America so I thought it was pretty relevant… more than the other one.
— 45–54 year old, associate’s degree
To be clear, both of these graphs are about the United States. But notice how only one of them has a clear title that makes the data’s connection to the USA explicit? This is a simple design choice, but for our participant, it was the piece that mattered.
Data is Personal: Where is Home?
If you’re like us, you may think that our findings would suggest that map visualizations are a clear and obvious winner. After all, overviews of the United States include Pennsylvania (PA).
And in fact, Pennsylvania did matter to our participants. But it manifested itself in a surprising way…
These two [US country] maps are [ranked low] because I like them less. It’s the whole country; it’s so huge. You naturally look at your state. It’s too busy. I’m not thrilled with those.
— 65–74 year old, high school graduate
This wasn’t an outlier. Our beautiful overview maps were routinely criticized, often referred to as “cluttered” or “busy”. While Pennsylvania is on the map, it is surrounded by dense data from areas of the country our participants were not interested in.
This is interesting because we tend to lean hard on the design pattern of Overview → Details on Demand. But what we’re seeing here is that some people find that the overview distracts from the key information they care about. If we have access to personal information (for example, if a browser has access to geolocation), we may be best served by designing with a new pattern: Personal Details → Overview.
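As a rough illustration of what Personal Details → Overview could look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `view_order` function, the view labels, and the idea that a state code is available from a location the reader has agreed to share. It is not something we built or tested in the study.

```python
from typing import Optional


def view_order(home_state: Optional[str]) -> list[str]:
    """Return the order in which map views are presented.

    home_state is a hypothetical input, e.g. a state code derived from a
    location the reader has agreed to share. None means it is unknown.
    """
    if home_state is None:
        # No personal context: fall back to Overview -> Details on Demand.
        return ["national overview", "details on demand"]
    # Personal context available: lead with the reader's state, then widen.
    return [f"detail view: {home_state}", "national overview"]


print(view_order("PA"))
print(view_order(None))
```

The design choice here is simply about ordering: the national overview is still available, but it no longer stands between readers and the place they care about.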
Many people see data as objective. That’s dangerous.
Up to this point, people judged visualizations without knowing the source. But once they provided their initial rankings, we revealed the sources of our 10 visualizations. They ranged from government sources (National Institute on Drug Abuse) to universities (Drexel University) to news outlets (The New York Times, The Economist, Breitbart).
But for most of the people we interviewed, the sources didn’t matter. In fact, 60% of our participants decided not to change their rankings regardless of where the visualization came from.
We found that many people suggested that information has an objective quality that is immutable regardless of where that data may be showcased…
I think the information is information no matter from where it comes from.
— 18–24 year old, some college credit (no degree)
In fact, for many people, the data and the visualization were synonymous. For these people, the pipeline from data to design is clean and clear, without bias or rhetoric.
We know this isn’t true… but people still believe it. How can we design our systems to counter these false perceptions of objectivity?
Who is making these decisions?
Digging into the demographic data, we saw that participants with more education were much more likely to change their rankings.
The pattern is interesting, but please be careful with these findings. The sample size is too small to start running around with generalizations.
But there’s one point here worth considering: A lot of the research and guidelines that shape our visualization designs were crafted through studies with people who had at least some college experience. Look carefully at how our findings would have changed without those people…
The story is very different. Which assumptions are we baking into our research papers, processes, and design guidelines that we may not be aware of? Which stories may we be missing?
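The kind of comparison described above can be sketched in a few lines of Python. The records below are invented for illustration and are not our study data; the field names and the `share_changed` helper are hypothetical. The sketch computes the share of people who changed their rankings for everyone, and then again for only those without college experience.

```python
# Hypothetical interview records; the values are invented for illustration
# and are NOT the study's data.
participants = [
    {"college_experience": True, "changed_ranking": True},
    {"college_experience": True, "changed_ranking": True},
    {"college_experience": True, "changed_ranking": False},
    {"college_experience": False, "changed_ranking": False},
    {"college_experience": False, "changed_ranking": False},
    {"college_experience": False, "changed_ranking": True},
    {"college_experience": False, "changed_ranking": False},
]


def share_changed(records):
    """Fraction of records whose ranking changed after sources were revealed."""
    if not records:
        return None  # No one in this group, so there is no share to report.
    return sum(r["changed_ranking"] for r in records) / len(records)


everyone = share_changed(participants)
no_college = share_changed(
    [p for p in participants if not p["college_experience"]]
)
print(f"all participants: {everyone:.2f}")
print(f"without college experience: {no_college:.2f}")
```

With groups this small, a single person moves the share a lot, which is exactly why we urge caution before generalizing.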
Trust Matters. For people who DO account for source, political identity may frame their trust.
Of the people who chose to change their rankings, it may not come as a surprise that some of their decisions aligned with their political identity. People who identified as more liberal reranked graphs and charts from The New York Times higher than conservatives did. Meanwhile, some conservatives reranked graphs and charts from Breitbart higher than liberals did.
Wrestling with these implications is important. While we celebrate the data stories told in The New York Times or the Washington Post (for good reason!), I think we also need to reflect on who actually invests attention into them. Are we looking at the same data? Are we trusting it the same way? Are we remembering it the same way?
Aside from political identity, one person even suggested that they would pay more attention to visualizations from local news sources than national ones:
I don’t read [The New York Times], but even if I did like this picture, I still won’t buy the newspaper because I don’t live in New York. The Sunbury paper, that’s close to here. Then I would read it… but I still won’t read that one
— 45–55 year old, associate’s degree
Again, the personal matters. And while we tend to analyze visualizations in isolated, well-controlled environments, our platforms matter too.
What is the story of data visualization?
When I teach data visualization to students, I often lead with what I believe is the compelling story of the field. It goes something like this…
- Data reasoning is a necessary skill for everyone in 2019. Whether it’s navigating loans or choosing a college or understanding climate change… we need to understand data to make informed decisions both for us and for our communities.
- Data visualization is a critical tool that amplifies understanding and reasoning with data. At the highest level, it has the potential to democratize data and make it more accessible to more people. This is exciting!
If you believe this story in the same way that I do, it also means that we need to ask hard questions about data visualization in the same way we are asking hard questions about other technology in 2019.
Tools that amplify us — including data visualization — also have the potential to deepen divides if they’re not designed for everyone.
We need a better understanding of exactly who visualization amplifies and who it leaves behind.
What’s the best way to do this?