
Do you know where your survey data come from?

Outsourcing data collection poses huge risks for public opinion


by Peter K. Enns and Jake Rothschild

“lego-people” by Scott Clark is marked with CC BY 2.0.

The online survey industry has created a complicated and opaque web of data outsourcing. This web can make it nearly impossible for researchers to know where their data come from, and it creates conditions that incentivize distracted and insincere survey responses.

We began wondering about survey data sourcing when we noticed that since 2018, the Cooperative Election Study (CES, formerly the Cooperative Congressional Election Study, CCES) has included data from Dynata, Critical Mix, and Prodege. This was a shock to us because, among academic researchers, the Cooperative Election Study is widely understood as a survey conducted by YouGov.¹ It turns out that although YouGov administers the survey, the data are not all sourced from YouGov’s panel. We suspect many researchers may not realize that YouGov relies on other data sources when administering the CES.

Text from CES survey documentation, “The sample drawn for the CCES were chosen from the YouGov Panel, along with the Dynata, Critical Mix, and Prodege panels using a six-way cross-classification (age x gender x race x education x region x sample source).”
Figure 1: Data sources for the Cooperative Election Study (Ansolabehere, Schaffner, and Luks 2019, p. 13)

We then noticed that the 2020 AP VoteCast, another prominent election survey (conducted by NORC at the University of Chicago), also relied on multiple external data sources.

Information on AP VoteCast Nonprobability sample, “Nonprobability participants will include panelists from Dynata or Lucid, including members of its third-party panels…”
Figure 2: AP VoteCast Methodology — 2020 General Election

The Tip of the Iceberg

It turns out that this type of data outsourcing is ubiquitous. We partnered with Verasight, a survey firm with which we are both affiliated, to learn more about the online survey industry. Verasight has built a verified community of members who take surveys in exchange for a variety of rewards and incentives, including direct compensation and charitable donations.² Recently, Verasight allowed its community members to take surveys through Lucid, a company that brands itself as a survey marketplace. This arrangement let us see which companies use Lucid to obtain samples. Lucid is a “marketplace” in the sense that it does not maintain its own pool of respondents. Rather, it partners with more than 350 data suppliers to find individuals to take surveys for the companies or researchers that pay Lucid.

List of 12 Lucid suppliers
Figure 3: Select Lucid Data Suppliers

Two details about the Verasight experiment with Lucid are important to note.

First, these were not surveys commissioned by Verasight’s customers. They were surveys commissioned by clients of Lucid.

Second, even if a Verasight community member did not qualify for the Lucid survey (which is often the case due to lengthy and complicated filter questions), Verasight still compensated the member for their time.

We were astonished by what we learned.

Many of the most prominent companies in the industry were using Lucid to get data. Lucid, in turn, gets data for these companies by reaching out to hundreds of different data providers, potentially unknown to the original client. Within seven days, Verasight community members were invited to take surveys by 96 different companies or organizations that had contracted with Lucid to get survey respondents. These included some of the biggest names in the industry, such as Civis Analytics, CloudResearch (which uses Amazon Mechanical Turk), Dynata, Ipsos, SurveyMonkey, Qualtrics, and Zogby Analytics. We cannot tell how often these companies obtain survey data through Lucid or why they do so. But we were extremely surprised to see so many prominent companies going to this platform for survey responses, particularly given the data quality concerns some academic researchers have expressed about Lucid.³

Figure 4: Yale Professor Joshua Kalla’s Concerns with Data from Lucid

If a researcher goes to a survey company that goes to Lucid, Lucid then goes to its 350+ data sources, making it incredibly difficult to identify the ultimate source of the data. Identifying the data source is further complicated because other survey marketplaces, such as PureSpectrum, also provide data to Lucid, potentially extending the chain of outsourcing. Additionally, some Lucid suppliers also go to Lucid for respondents: in the Verasight experiment, more than 15 of the companies that came to Lucid, and then Verasight, for data are also providers of respondents to Lucid.
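To make the tracing problem concrete, here is a minimal Python sketch. The graph is entirely hypothetical; only its shape (a vendor buying from a marketplace whose suppliers include other marketplaces and its own clients) reflects the structure described above. The point is how quickly the possible chains of custody multiply:

```python
# Hypothetical sourcing graph: each key lists where that company may obtain
# respondents. Names and edges are illustrative, not actual supplier lists.
SOURCING_GRAPH = {
    "Your vendor": ["Vendor's own panel", "Lucid"],
    "Lucid": ["Supplier A", "Supplier B", "PureSpectrum"],
    "PureSpectrum": ["Supplier C", "Lucid"],  # marketplaces can trade both ways
    "Supplier A": ["Lucid"],                  # a supplier that is also a Lucid client
}

def possible_chains(company, chain=()):
    """Yield every chain of custody a respondent could have followed."""
    chain = chain + (company,)
    suppliers = SOURCING_GRAPH.get(company)
    if not suppliers:                # a terminal panel: one complete origin story
        yield chain
        return
    for supplier in suppliers:
        if supplier in chain:        # outsourcing cycle, e.g. Lucid <-> PureSpectrum
            yield chain + (supplier + " (cycle)",)
        else:
            yield from possible_chains(supplier, chain)

for chain in possible_chains("Your vendor"):
    print(" -> ".join(chain))
```

Even this toy graph yields five possible origin stories for a single respondent, two of which loop back into the marketplace; with 350+ suppliers, the enumeration becomes practically meaningless.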

The difficulty of identifying where respondents come from means that details of the respondent experience that affect response quality, such as how many surveys respondents have taken in the past week or whether they were routed directly from another survey (or series of surveys), cannot be assessed. Data quality is also likely to be highly variable, because the source of data can include different Lucid data providers (or combinations of data providers) each time a survey is conducted.
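As a concrete illustration, the sketch below lists the kind of per-respondent metadata a researcher would need in order to assess these concerns. Every field name and the screening threshold are our own hypothetical choices, not an existing standard:

```python
from dataclasses import dataclass

@dataclass
class RespondentContext:
    source_panel: str                 # ultimate panel the respondent came from
    surveys_taken_past_week: int      # fatigue / "professional respondent" signal
    routed_from_prior_survey: bool    # arrived directly from another survey?
    prior_screener_topics: list[str]  # considerations earlier filters made salient

def flag_for_review(ctx: RespondentContext) -> bool:
    """Illustrative screening rule; the 20-survey threshold is an assumption."""
    return ctx.routed_from_prior_survey or ctx.surveys_taken_past_week > 20
```

The problem described above is that once respondents arrive through a marketplace, none of these fields can be reliably populated.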

Lucid was very transparent with Verasight about who provides data and who comes to Lucid for data, and this transparency made the analysis described above possible. If you work with a survey company that relies on other vendors or a survey marketplace for respondents, it is crucial that the survey company inform you that it relies on these sources and provide a comprehensive list of respondent sources.

It is possible that the week of the Verasight experiment happened to be the only time these companies used a survey marketplace. But since 96 different companies invited Verasight community members to take surveys within a single week, the practice appears to be widespread. YouGov’s use of Dynata, Critical Mix, and Prodege and NORC’s use of Dynata and Lucid reinforce this impression. Researchers can no longer assume that their survey data come directly from the company they contracted with. Given this concern, we recommend the following to researchers conducting surveys.

Strategies to Solve the Data Outsourcing Problem

When selecting an online survey vendor, always ask:

1.) Do they ever outsource their respondents? If yes:

a.) Ask for a full list of potential respondent sources the vendor or marketplace uses, and assess whether the data disclosure standards of each potential source (e.g., are they members of the AAPOR Transparency Initiative? Do they have a Roper Center Transparency Score?) meet your data needs. Only work with the vendor if you would work with the data sources the vendor uses.

b.) Ask if the vendor can ensure that none of the respondent sources further outsource for respondents. That is, if Qualtrics goes to Lucid and Lucid goes to PureSpectrum (another survey marketplace), the chain of outsourcing continues. If your vendor cannot ensure that the sources it goes to do not further outsource to obtain survey respondents, ask for the list of all the sources those companies use and assess the data disclosure standards for each (i.e., repeat step 1a). A code sketch after this list illustrates this recursion.

2.) Do they (or their data sources) ever route respondents directly from one survey to another?

The questions asked on prior surveys can influence how respondents answer questions on your survey by making certain considerations salient. Consecutive surveys can also introduce survey fatigue and increase satisficing.

3.) How many surveys can respondents take each week?

Regularly taking numerous surveys can increase survey fatigue and satisficing.

4.) What happens to those who do not qualify for the survey (for example, if the survey focuses only on likely voters, a certain demographic group, or respondents in a certain area)?

Many online surveys automatically route an individual who does not qualify to another survey. This creates two data quality concerns. First, respondents interested in survey rewards have an incentive to falsify responses in order to qualify. Second, if respondents give sincere responses and do not qualify for previous surveys, by the time they are routed to your survey they may experience survey fatigue, and certain considerations may have been made salient by previous filter questions.

5.) How much do they (or their data sources) compensate respondents?

If the compensation is low, respondents may have an incentive to speed through the survey or pay less attention while taking it.
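For readers who think in code, here is a minimal sketch of the step 1 recursion, under the simplifying assumption that vendors actually disclose their supplier lists. Everything in it, from the Vendor fields to the passing criteria, is hypothetical; the disclosure test is a stand-in for your own standards:

```python
from dataclasses import dataclass, field

@dataclass
class Vendor:
    name: str
    aapor_transparency_member: bool = False     # AAPOR Transparency Initiative?
    has_roper_transparency_score: bool = False  # scored by the Roper Center?
    suppliers: list["Vendor"] = field(default_factory=list)  # disclosed sources

def meets_disclosure_standards(v: Vendor) -> bool:
    # Stand-in test; adjust the bar to match your own data needs.
    return v.aapor_transparency_member or v.has_roper_transparency_score

def vet(vendor: Vendor, seen=None) -> bool:
    """Steps 1a-1b: a vendor passes only if it and every downstream source pass."""
    seen = seen if seen is not None else set()
    if vendor.name in seen:  # outsourcing cycle; already under evaluation upstream
        return True
    seen.add(vendor.name)
    if not meets_disclosure_standards(vendor):
        return False
    return all(vet(s, seen) for s in vendor.suppliers)
```

In practice, step 1b is the hard part: the recursion can only go as deep as vendors are willing to disclose, which is precisely the transparency gap this checklist is meant to surface.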

We hope that by asking these questions, researchers will be better able to ensure that the survey data they collect meet their data quality and research needs. We also recommend that researchers report the answers to these questions in their published research so that readers will better understand the data that led to the reported results.

Footnotes

[1] See, for example, Bauer, Kalmoe, and Russell (2022, p.29), Clifford, Simas, and Kirkland (2021, p.652), Royden and Hersh (2021, footnote 7).

[2] Verasight invites individuals to take surveys and join the community through mail invitations sent to randomly selected addresses, social media campaigns, online advertisements, and a variety of other outreach strategies. The reward and recruitment approaches ensure that community members are compensated fairly, their identities are verified, and they have a positive survey experience.

[3] See, for example, Aronow, Kalla, Orr, and Ternovski (2020) and Ternovski and Orr (2022), as well as https://twitter.com/ScottClif/status/1498850024666677255, https://twitter.com/NathanKalmoe/status/1314632769696301056, and https://twitter.com/j_kalla/status/1401935385459007493.

References

Ansolabehere, Stephen, Brian Schaffner, and Sam Luks. 2019. “Guide to the 2018 Cooperative Congressional Election Survey.” https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910/DVN/ZSBZ7K.

Aronow, Peter M., Josh Kalla, Lilla Orr, and John Ternovski. 2020. “Evidence of Rising Rates of Inattentiveness on Lucid in 2020.” SocArXiv Papers. https://osf.io/preprints/socarxiv/8sbe4/.

Bauer, Nichole M., Nathan P. Kalmoe, and Erica B. Russell. 2022. “Candidate Aggression and Gendered Evaluations.” Political Psychology 43(1): 23–43. https://onlinelibrary.wiley.com/doi/epdf/10.1111/pops.12737

Clifford, Scott, Elizabeth N. Simas, and Justin H. Kirkland. 2021. “Do Elections Keep the Compassionate out of the Candidate Pool?” Public Opinion Quarterly 85(2): 649–662. https://doi.org/10.1093/poq/nfab026

Royden, Laura and Eitan Hersh. 2022. “The Young American Left and Attitudes About Israel.” Contemporary Jewry (March). https://link.springer.com/content/pdf/10.1007/s12397-022-09417-2.pdf

Ternovski, John and Lilla Orr. 2022. “A Note on Increases in Inattentive Online Survey-Takers Since 2020.” Journal of Quantitative Description: Digital Media 2. https://journalqd.org/article/view/2985

Suggested Citation:

Enns, Peter K. and Jake Rothschild. 2022. “Do you know where your survey data come from?” 3Streams. May 2.

Peter K. Enns

Co-founder, Verasight; Professor of Government & Public Policy, Cornell University; Director, Cornell Center for Social Sciences. @pete_enns