Gathering and Sharing Facts in a Post-Truth World

My mom always told me to take a stats class. She said that knowing how people can manipulate data is the foundation of discerning truth from fiction.

Well, those words must have stuck, because I’ve spent the past few years as a student and professional in research design, practicing the art of data collection and presentation.

Many people assume that data is inherently neutral. That’s just not the case. From intent to collection to interpretation, data goes through a lot of filters before it gets to you.

I am increasingly concerned by the representation of “facts” in the world today, so I’ve created this post as a short lesson in the essential structure of a non-biased survey. Let’s explore the recent fiasco that was the “Mainstream Media Accountability Survey” as an example of what not to do for the three main phases of survey design:

  • Unbiased questions
  • Representative sampling
  • Honest data analysis

Unbiased Questions

It is essential that the questions in a survey be unbiased and nonjudgmental. Never has this been more apparent than when taking President Trump’s “Mainstream Media Accountability Survey.” The administration’s clear objective with this survey is to illustrate how the vast majority of Americans don’t trust American news media. I say this is clear because it is written into every question on the survey. Whether or not Americans distrust the news media will not be answered accurately by this survey, because every question is leading and drips with bias.

Effective survey questions let the respondent share their thoughts, behaviors, and opinions without judgment. Participants should feel free to answer honestly, no matter their affiliations or viewpoints. That is how you get the most truthful answers.

The “Mainstream Media Accountability Survey” is basically a study in how to write leading questions. Each question has a thinly veiled objective, which can manipulate respondents into answering how the survey designer intended, rather than allowing them to freely state their own opinions.

To illustrate my point, I’ve rewritten two of the “Mainstream Media Accountability Survey” questions.

The first question of the survey gets us off to a rocky start.

One easy way to remove bias in survey questions is by writing questions that are clear and free of jargon. There are many terms that litter today’s political landscape that are charged with meaning. In this case “mainstream media” has come to mean something beyond just the most popular news channels. It is jargon, a phrase that is now tightly packed with other meanings. I suggest we change this to “television news media outlets in the United States,” which is really what the survey is focusing on.

This question also assumes that the respondents — supposedly representative of all American adults — identify as being supporters of the Trump administration. The use of “our movement” in this case is at once leading and leaves out a huge section of possible respondents. I suggest “President Trump’s administration” as a replacement.

Here is how I would rewrite that question:

Q: On a scale from 1–5, how fairly do you think the American news media characterizes President Trump’s administration?

A: [1: Extremely unfairly, 2: Somewhat unfairly, 3: Neutral, 4: Somewhat fairly, 5: Extremely fairly]

Now for our second question rewrite…

Undoubtedly, this question assumes the respondent thinks the media is doing a poor job of representing Republicans in many areas. The question gives only the option to say where the media is getting it wrong, and not where they may be getting it right. The real problem here is that there is no clear scale for the answers to be measured against. I could see this question being rewritten with two different approaches in mind: 1) The question is asking if the media is representing Republican views correctly or 2) it is asking if Republicans are equally represented in the media compared to Democrats.

Let’s start with approach #1, and assume the question is seeking the respondent’s opinion about whether their political views are represented accurately in American television media. To remain unbiased, we cannot assume that the respondent identifies with the Republican party. By breaking the question into two parts, we can first gather with which political party the respondent most closely aligns, and then we can get their opinion on whether they are accurately represented in the media. In my opinion, this makes the answers to this question infinitely more interesting!

Rewrite Part 1:

Which political party most accurately represents your political views?

  • Democrat
  • Green Party
  • Independent
  • Republican
  • I do not typically affiliate with any one political party

Rewrite Part 2:

On a scale from 1–5, how accurately are your political party’s views represented in American television news media on each of the following topics?

Each topic is represented with a scale from [1-Never accurate, 2-Mostly not accurate, 3-Neutral, 4-Mostly accurate, 5-Always accurate, I have never seen my party’s views represented on this issue]

If we take approach #2, and assume the question is asking if Republicans and Democrats are equally represented in American television news media, then this takes quite a turn.

When designing a survey, it’s important to consider if the data you are seeking could be gathered any other way. Surveys are best for gathering self-reported data on opinions and behaviors, but in approach #2, this question is asking for a fact and not the respondent’s opinion. In this case, the information could be gathered more accurately by referring to the actual source — the media — and recording how much actual air time each party receives on each issue over a specific period of time.

But phrasing questions correctly is only part of the battle when it comes to survey design. The next step is gathering the data from a group that can accurately represent the thoughts, feelings, and behaviors of the whole target population.

Representative Sampling

Representative sampling for research simply means that the people who participated accurately model the target population on a smaller scale. For example, if I want to know where Americans in their 20s think the best burger is made, it would not be enough to just ask my female coworkers. What about men? What about people who live in different parts of the country or come from different economic backgrounds? My burger survey would not be representative. In a similar way, if a survey aims to represent all American voters, then respondents should be people who identify with each political party, in the same proportions as they appear in the general US population.
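One rough way to check this is to compare each group's share of the sample with its share of the target population. The sketch below is only an illustration: the group names, population shares, and respondent counts are made-up numbers, not real polling data, and `representation_gaps` is a hypothetical helper name.

```python
def representation_gaps(sample_counts, population_shares):
    """Return sample share minus population share per group, in percentage points."""
    total = sum(sample_counts.values())
    gaps = {}
    for group, pop_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        gaps[group] = round((sample_share - pop_share) * 100, 1)
    return gaps

# Made-up numbers for illustration only.
population_shares = {"Party A": 0.30, "Party B": 0.30, "Other / none": 0.40}
sample_counts = {"Party A": 80, "Party B": 10, "Other / none": 10}

gaps = representation_gaps(sample_counts, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.1f} percentage points versus the population")
```

A large positive gap means a group is overrepresented in the sample, and a large negative gap means it is underrepresented. Either one is a sign that the results should not be read as speaking for the whole population.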

Online surveys are a notoriously difficult tool for collecting a representative sample. In the case of a national sample, allowing participants to opt in to a survey is not an effective way to get the right mix of participants, especially when you throw in the viral nature of social media.

The participant snowball effect is the phenomenon where participants tend to refer like-minded people to take the survey once they have finished, which often leads to an unbalanced sample. Think of the social media echo chamber. Do you really want that type of environment determining the sample of people who are meant to represent all American adults? National sampling needs to be done through outreach in order to be effective.

Another problem with online surveys is that many lack safeguards to keep people from submitting multiple entries. The “Mainstream Media” survey falls prey to this issue. Anyone can go back and take it as many times as they want. It is difficult to keep results factual and representative when anyone could fill out a survey dozens of times, skewing the responses.
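A minimal sketch of one such safeguard is below. It assumes each submission carries some respondent identifier (here a hypothetical `respondent_id` field, such as a unique invitation code) and keeps only the first submission per identifier. Real surveys need more than this, since identifiers can be shared or faked.

```python
def keep_first_submission(submissions):
    """Keep only the first submission seen for each respondent_id."""
    seen = set()
    unique = []
    for entry in submissions:
        rid = entry["respondent_id"]
        if rid not in seen:
            seen.add(rid)
            unique.append(entry)
    return unique

# Hypothetical submissions: respondent "a1" filled out the survey three times.
submissions = [
    {"respondent_id": "a1", "answer": 1},
    {"respondent_id": "b2", "answer": 4},
    {"respondent_id": "a1", "answer": 1},
    {"respondent_id": "a1", "answer": 1},
]

unique = keep_first_submission(submissions)
print(len(submissions), "submissions,", len(unique), "unique respondents")
```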

Gathering data is a science, and should be done with extreme care. There are people who specialize in data collection and spend their lives taking measures to be sure it’s done well. In general, data gathered and presented by real data scientists should be the most trustworthy. The best way to make sure the data was gathered correctly is to do a quick background check on the organization that reported it, to be sure they have a record of representing data well. If the organization has something to gain by representing the information in a certain way, take a harder look at how they analyzed and visualized their data.

Honest Data Analysis

Assuming the questions were written correctly and the data gathered in a representative way, how data is analyzed and shared can still be misleading. Here is where my mom’s advice about taking a stats class comes back around! Understanding the basics of data framing and visualization will help uncover many dishonest sources.

Investigate data cleaning

For all I know, my answers to the “Mainstream Media” survey will be scrubbed out of the final report, because the data doesn’t tell the story they wanted when my responses are included. This is not OK. Be aware of data cleaning methods that can skew what is presented in the end.
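One habit that makes cleaning easier to audit is recording every dropped response along with the reason it was dropped. The sketch below is a hypothetical illustration: the `clean_responses` function, the field names, and the single rule (drop incomplete responses) are assumptions for the example, not a description of how any real survey was processed.

```python
def clean_responses(responses, required_fields):
    """Split responses into kept and dropped, recording why each one was dropped."""
    kept, dropped = [], []
    for response in responses:
        missing = [f for f in required_fields if response.get(f) is None]
        if missing:
            reason = "missing: " + ", ".join(missing)
            dropped.append({"response": response, "reason": reason})
        else:
            kept.append(response)
    return kept, dropped

# Made-up responses for illustration only.
responses = [
    {"id": 1, "q1": 5, "q2": 2},
    {"id": 2, "q1": None, "q2": 4},
    {"id": 3, "q1": 1, "q2": 1},
]

kept, dropped = clean_responses(responses, required_fields=["q1", "q2"])
print(f"kept {len(kept)} of {len(responses)} responses")
for item in dropped:
    print(f"dropped id {item['response']['id']}: {item['reason']}")
```

If a report publishes how many responses were removed and why, readers can judge whether the cleaning rules were about data quality or about the story.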

Pay attention to the y-axis

Poor data visualization can be extremely misleading, and inflated scales or falsely plotted graphs are a major culprit. One example of this is a graph shown at a Planned Parenthood hearing in Congress in 2015. The real numbers show that more cancer screenings were performed than abortions in 2013, but the more prominent colored lines seem to say that the number of procedures switched between 2006 and 2013. A more accurate graph of this same data would show cancer screenings dropping between 2006 and 2013, while still far above the nearly-flat rate of abortions.

This graph uses real data but plots it in a very misleading way. Is 327k larger than 935,573? No.

This revised graph of the same data shows the real relationship between these two services provided by Planned Parenthood between 2006 and 2013.
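To see why a shared scale matters, here is a small sketch that draws the two figures quoted above as text bars on one axis that starts at zero. The `scaled_bars` helper and the bar width are my own choices for illustration, and this is not how either graph was actually produced.

```python
def scaled_bars(values, width=40):
    """Draw text bars on one shared scale that starts at zero."""
    largest = max(values.values())
    bars = {}
    for label, value in values.items():
        bars[label] = "#" * round(value / largest * width)
    return bars

# The two figures quoted above.
services = {"Cancer screenings": 935_573, "Abortions": 327_000}

bars = scaled_bars(services)
for label, bar in bars.items():
    print(f"{label:<18}{bar} {services[label]:,}")
```

Because both bars are measured against the same axis, the larger number always gets the longer bar, which is exactly what the misleading graph fails to do.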

This can also happen with simple statistics, such as percentages. Percentages often only tell you how much something increased or decreased, without giving you exact numbers on either end. Consider a chair marked down 20%. As my late grandpa used to say, “20% off of what?” He was a furniture salesman and knew the tactic well. If the chair was originally $3,000, that’s still a very expensive chair.

You can also consider the flip side: percentages can make increases seem more significant than they really are. I could say, “The number of people who have read my article on survey design has increased 300% in the last 24 hours!” Sounds impressive, right? What if it was just that my reader count had gone from one to four people? That’s not as impressive. Maybe I’ll just stick with that flattering percent increase.
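A simple defense is to always report the absolute numbers next to the percentage. The sketch below uses a hypothetical helper, `describe_change`, to do that. Note that going from one reader to four is a gain of three readers, which works out to a 300% increase, and that 20% off a $3,000 chair still leaves a $2,400 chair.

```python
def describe_change(before, after):
    """Report a change as both an absolute difference and a percent change."""
    difference = after - before
    percent = difference / before * 100
    return f"{before} -> {after}: {difference:+} ({percent:+.0f}%)"

# Reader count going from one to four people.
readers = describe_change(1, 4)
# A $3,000 chair marked down 20%.
chair = describe_change(3000, 2400)

print(readers)  # 1 -> 4: +3 (+300%)
print(chair)    # 3000 -> 2400: -600 (-20%)
```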

The same can be done with survey data, inflating or deflating numbers without communicating the real changes.

Stay Vigilant

So remember, data lives a very full life before it makes its way to you. What you believe should be a conscious, well-informed choice.

Data is often thought of as neutral, but it usually goes through many filters before making it into your hands. When you encounter numbers or facts, be sure to consider the motivations behind the questions, how that data was gathered, and how it was presented to you.