Embracing Variety — Conveying Different Messages with Your Choice of Data Visualization

Re-visualizing the COVID-19 Impact Study

Coleman Harris
Published in Nightingale
Dec 18, 2020 · 9 min read


Photo by Emily Morter, Unsplash

Survey analysis is a valuable tool to gauge the public’s interest in some of the most important topics facing the world. This is evident in the coverage of political elections and voting trends, or even the familiar “we asked 100 people…” surveys behind the game show Family Feud. Rich survey data yields a bevy of useful results to guide prospective responses to changing attitudes, thoughts, and feelings.

In the case of COVID-19, you can use the results from the expansive COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation, to understand the impacts of the pandemic. Initial analysis performed by the survey researchers added much to the ongoing conversation about the pandemic, whether on the intersection of COVID-19 mortality and race or on disparities in economic conditions. Although these results reflect data from April through June, this data is still vital to the conversation as infection rates continue to rise and lasting economic impacts loom.

In this article, I revisit the COVID Impact Survey to demonstrate how different perspectives can change the messages a visualization conveys. This is the power of exploratory data analysis and data visualization in general. Adjusting proportions, drilling down to specific subsets of the data, comparing different populations or trends side by side, and understanding trends over time are all vital to communicating what is important in the data.

Data trends are geographic: there are regional variations in the impact of COVID-19

Even from the beginning of the COVID-19 pandemic, there has been regional variation in the spread and impact of the disease, as the April results of the COVID Impact Survey reported.

Expanding these results to include all of the geographic areas in the survey dataset, you discover that this regional variation is even more dramatic than the initial summaries suggest. This is an excellent method to use when visualizing data: start big and work down to the most interesting stories. In this case, analyzing according to the survey's regions provides useful insight into the regional variation of COVID-19's impact.

Percentage of each region’s respondents (normalized) who had a family member or close friend die due to COVID-19. Original image here.

As was reported in April, major cities in the United States were hit by coronavirus outbreaks before the rest of the country, and this is evident in the sorted bar chart above. Large cities like New York and Chicago are at the top of the list, with large proportions of respondents in these cities having lost a family member or close friend. Interestingly, you also see that some Southern states whose pandemic responses drew criticism, like Alabama and Louisiana, reported similarly large numbers.

I can also adjust this perspective. The visualization above and the corresponding analysis from the COVID Impact Survey rely on a normalized count for each region that incorporates survey weights. This method ensures that comparisons between groups, like geographic regions, are more representative of the underlying populations.

With this in mind, you can also visualize survey responses using a more straightforward, but statistically more limited, approach: computing percentages from counts of respondents instead of from the sum of respondents' weights within each group. In other words, you can count the "raw" responses instead of using the population weights assigned by the survey's designers.
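To make the distinction concrete, here is a minimal pandas sketch of the two calculations. The rows, file-free setup, and column names are made up for illustration and are not the survey's actual variables.

```python
import pandas as pd

# Hypothetical rows standing in for the survey microdata: one respondent per row,
# with a region, a survey weight, and a yes/no answer to the "family member or
# close friend died due to COVID-19" question.
df = pd.DataFrame({
    "region": ["Chicago", "Chicago", "Florida", "Florida", "New York"],
    "weight": [0.8, 1.3, 2.1, 0.6, 1.1],
    "lost_someone": [True, False, True, True, False],
})

# Weighted percentage: sum the weights of "yes" respondents and divide by the
# total weight in each region, so the estimate better reflects the population.
df["yes_weight"] = df["weight"].where(df["lost_someone"], 0)
by_region = df.groupby("region")
weighted_pct = 100 * by_region["yes_weight"].sum() / by_region["weight"].sum()

# Raw percentage: count "yes" respondents out of all respondents in each region,
# ignoring how representative the sample is.
raw_pct = 100 * by_region["lost_someone"].mean()

print(pd.DataFrame({"weighted": weighted_pct, "raw": raw_pct}).round(1))
```

With real survey weights, the two columns can rank regions quite differently, which is exactly the shift discussed below.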

Percentage of each region’s respondents (raw counts) who had a family member or close friend die due to COVID-19. Original image here.

With this approach, the trends and numbers change! Take Florida as an example. With simple counting, Florida is near the top (ranking 4th), but when adjusting for participation in the survey, it shifts to the middle of the list (ranking 9th). The take-away? Remember that the sampled respondents in a survey may not reflect the whole population, sometimes over-sampling or under-sampling certain demographics, areas, or opinions. Where population weights are available, using them can reveal more nuanced, and potentially more accurate, trends and numbers. This allows for a more representative look at the data, and the important attitudes they convey, whereas simply counting raw responses from a survey could give a biased result.

There are variations in age groups: younger Americans struggle with CDC recommendations

There has been much discussion about the public's adherence to health recommendations that can help slow a pandemic, like mask wearing and social distancing, and how factors like age and region play a part in following these guidelines, a question the COVID Impact Survey addressed in its May results.

Let’s dive deeper into this non-compliance with COVID-19 restrictions and compare the results across age groups. You can focus on answering some essential questions: What percent of each age group is following the CDC recommendations? Is this the same for mask wearing and social distancing? What about across regions?
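A short pandas sketch of how you might answer the first two questions outside an interactive tool. The file name and column names here are illustrative assumptions, and the yes/no answers are assumed to be coded as True/False.

```python
import pandas as pd

# Hypothetical extract of the survey microdata, one respondent per row.
df = pd.read_csv("covid_impact_survey.csv")

# Percent of each age group reporting that they follow each recommendation.
adherence = (
    df.groupby("age_group")[["social_distancing", "wears_mask"]]
      .mean()       # fraction answering "yes" within each age group
      .mul(100)     # convert fractions to percentages
      .round(1)
)
print(adherence)
```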

Comparing adherence to CDC guidelines by age group. Original image here.

For social distancing, you see that only about 3 in 4 respondents aged 18 to 24 claim to social distance from people outside their household, the lowest among all age groups. And as the age group increases, the proportion of respondents willing to social distance steadily increases.

The percent willing to wear face masks is much more consistent across all ages, hovering around 90%. This could suggest that restricting social activity is more difficult for younger demographics, or that masks have been accepted as the new normal.

The visualization above has double the power: it compares trends across age groups and across two different, but related, behaviors. The side-by-side comparison of behaviors, using different colors, reveals trends in each behavior on top of the separation of the population into four age-group categories.

Breakdown of social distancing adherence by age group, adjusting for population density. Original image here.

And returning to the list of important questions, how do people in more densely populated cities behave differently from those in more sparsely populated areas? You can filter, or drill down, with an interactive analysis to provide the answer. With a simple interaction on the survey's population density question, you see that people in more rural areas are less likely to wear a mask outside their households.
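The same drill-down can be sketched in pandas by filtering before recomputing the group percentages. As before, the file name, column names, and the "Rural area" label are assumptions for illustration, not the survey's actual coding.

```python
import pandas as pd

# Hypothetical extract of the survey microdata.
df = pd.read_csv("covid_impact_survey.csv")

# Drill down: keep only respondents who describe their area as rural.
rural = df[df["population_density"] == "Rural area"]

# Recompute mask-wearing adherence by age group within the filtered subset.
rural_masks = (
    rural.groupby("age_group")["wears_mask"]
         .mean()      # fraction answering "yes" among rural respondents
         .mul(100)
         .round(1)
)
print(rural_masks)
```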

While the differences between age groups were roughly within 15–20 percentage points when merely looking at the proportions wearing a face mask, visualizing this across geographic regions reveals differences of up to 40 percentage points. This suggests that younger age groups are in fact not adhering as well to the face mask recommendations, especially in rural areas.

This re-imagination of the initial COVID Impact Survey, using the side-by-side bar charts to compare behaviors and filter to compare across densities, provides greater insight into where younger age groups tend to be less likely to follow CDC recommendations.

Choose your slice: pies and bars tell different stories of work in the pandemic

The COVID Impact Survey initially identified that the pandemic affects employment differently across regions, namely that respondents from a city like Chicago held a slightly lower belief (56%) that they would have work in 30 days compared to the U.S. in general (60%).

In the pie chart below, these pie sections appear similar and, as such, the visualization makes it difficult to see this distinction. This motivates a more in-depth look at the data to understand the actual trends, presented in a practical, easy-to-understand way.

Let’s re-imagine these pie charts as bar charts instead. Pie charts show part-of-whole relationships better, while bar charts make it easier to compare different responses with each other. I also removed the initial coloring so that the answers can be assessed in groups or separately.
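To see the contrast concretely, here is a small matplotlib sketch drawing the same response counts as a pie chart and as a sorted horizontal bar chart. The counts are made up for illustration; only the answer options come from the survey question.

```python
import matplotlib.pyplot as plt

labels = ["Extremely likely", "Very likely", "Moderately likely",
          "Not too likely", "Not likely at all"]
counts = [300, 120, 110, 115, 270]   # hypothetical numbers of respondents

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart: good at part-of-whole, but similarly sized wedges are hard to rank.
ax_pie.pie(counts, labels=labels, startangle=90)
ax_pie.set_title("Pie chart")

# Bar chart: sorting by count makes comparisons between answers immediate.
order = sorted(zip(counts, labels), reverse=True)
ax_bar.barh([label for _, label in order], [count for count, _ in order], color="gray")
ax_bar.invert_yaxis()   # largest answer on top
ax_bar.set_title("Sorted bar chart")

plt.tight_layout()
plt.show()
```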

Comparing employment chances in Chicago. Original image here.

Ordering the bars by the likelihood of employment gives a better understanding of the Chicago area's beliefs about working in the near future. And to make the results even easier to understand, I removed responses from people who were unsure or skipped the question.

With the bar chart, it’s easier to discern that the two “extreme” options, Extremely likely and Not likely at all, dominate the responses. You also quickly see that the three middle options are answered roughly equally across respondents, which is a bit trickier to judge from the wedges in a pie chart. These include the middle option, “Moderately likely”, the least opinionated choice, which on a Likert scale is typically reserved for “No opinion”.

Focusing the story on Chicago, you can get a more detailed look at how hopeful these respondents are. Compared to the country as a whole, you see that Chicagoans are roughly on par in expecting work in the next 30 days.

One aspect of in-depth analysis is to look at the context of people's lives and current economic situation, and focus on the hopefulness of those facing the most hardship. What happens if you focus on the Chicago respondents from April through June who did not work in the last week? In essence, this captures those who lost work due to COVID-19 at the beginning of the pandemic. Let's use bar charts to compare these Chicagoans' attitudes about their chance of employment in 30 days versus in three months.

Comparing employment chances for affected Chicago respondents in 30 days and 3 months, with skipped and “don’t know” responses removed. Original image here.

In general, people are much more hopeful about their work chances in three months compared to the next 30 days: the proportion of respondents who expect work in 30 days nearly doubles when asking about three months. So, in Chicago, respondents were not confident about work chances in May, but they were more confident about work chances over May, June, and July. While the exploratory analysis tool I used to create these charts, Keshif, presents these trends across two charts and controls the ordering of the answer options, you may choose to combine them into a single bar chart with side-by-side bars for each question.
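One way to build that combined view is a grouped bar chart, with one bar per question for each answer option. The percentages below are placeholders for illustration; the real values would come from the filtered Chicago responses.

```python
import numpy as np
import matplotlib.pyplot as plt

options = ["Extremely likely", "Very likely", "Moderately likely",
           "Not too likely", "Not likely at all"]
pct_30_days = [15, 10, 18, 22, 35]    # hypothetical: expect work in 30 days
pct_3_months = [28, 18, 22, 17, 15]   # hypothetical: expect work in 3 months

x = np.arange(len(options))
width = 0.4

fig, ax = plt.subplots(figsize=(8, 4))
# Side-by-side bars: one bar per question at each answer option.
ax.bar(x - width / 2, pct_30_days, width, label="In 30 days")
ax.bar(x + width / 2, pct_3_months, width, label="In 3 months")
ax.set_xticks(x)
ax.set_xticklabels(options, rotation=20, ha="right")
ax.set_ylabel("Percent of respondents")
ax.legend()
plt.tight_layout()
plt.show()
```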

And as it turns out, this hope was perfectly reasonable. Unemployment rates, as reported by the Bureau of Labor Statistics, were incredibly high in April. Yet over the past six months they have steadily declined (even if they remain uncharacteristically high). The respondents' perception that economic conditions would improve beyond the weary first month of the pandemic was ultimately correct, as the American economy continues to improve. While this may reflect some degree of optimism bias, the gradual decline in unemployment rates is certainly welcome.

Unemployment trends via the U.S. Bureau of Labor Statistics

What we can learn from survey data

My goal in this article was to create new visualizations and views into an important data source, taking care to explain the visualization choices and what insights they provide. This re-imagining offers a different perspective on the survey data to better understand the impacts of the COVID-19 pandemic and how it has shaped attitudes across the country.

By using a rich survey with many interesting features, I showed that different ways of counting survey responses demonstrate regional differences in the impacts of COVID-19. I further showed that splitting responses by age group illustrates the difficulty younger Americans are having with CDC guidelines. I then illustrated how the pandemic has impacted work, and the utility of filtering the data to understand respondents' opinions.

From following health recommendations to economic attitudes in the job market, I hope data-savvy (or, data-curious) readers gained some tips to better understand attitudes and impacts of a deadly pandemic here in the United States.

Coleman Harris is a technical writer for Keshif, a data scientist and graduate student. You can find him on Twitter or via his website.

References

  • Abigail Wozniak, Joe Willey, Jennifer Benz, and Nick Hart. COVID Impact Survey: Version #1–3. Chicago, IL: National Opinion Research Center, 2020.
