2020 Election Research Project
Jul 27, 2023

US 2020 Facebook & Instagram Election Study: Frequently Asked Questions (FAQ)

1) How was the academic team selected?

In early 2020, researchers at Facebook (now Meta) approached Social Science One about the possibility of jointly organizing a research project around studying the impact of Facebook and Instagram on the November 2020 U.S. elections. Social Science One had been created to facilitate industry-academia collaboration to study social media platforms and their impact on society, and in particular to make data available to researchers who did not work for Meta. Social Science One at this point consisted of two directors/founders and a series of advisory committees, each of which had a chair. These included Natalie (Talia) Stroud, the Chair of the North America Advisory Committee, and Joshua Tucker, the Chair of the Disinformation and Electoral Integrity Committee. As the Chairs of the Social Science One committees most closely related to the proposed project, Stroud and Tucker agreed to co-chair the academic team that would collaborate with Meta on what has come to be called the U.S. 2020 Facebook & Instagram Election Study.

There were, of course, many qualified researchers who could have been involved in such a project, and selecting among them was the project's first challenge. To balance the competing needs of assembling a research team quickly and ensuring the necessary research expertise, Stroud and Tucker decided to recruit the remainder of the academic team from existing Social Science One advisory committee members based on their diverse research expertise. In March of 2020, Stroud and Tucker approached 16 of these advisory committee members, 15 of whom agreed to join the academic team for the project. Subsequently, one of the 15 withdrew in the first months of the project because of competing time demands. In addition, one advisory committee member requested that a current co-author on very closely related projects, who had statistical expertise relevant to the project, be allowed to join the team as well, resulting in an academic team of 17 members including Stroud and Tucker. Meta had no say in who was selected by Stroud and Tucker to be part of the academic team. Other possible selection methods that should be considered in future collaborations of this kind are detailed in FAQ 16 below.

2) Why was the decision made initially to limit recruitment to members of Social Science One advisory boards?

Planning for this project began in earnest in February 2020, which meant that there was a time constraint on finalizing study designs prior to the 2020 U.S. elections. In order to proceed promptly with the study designs, the academic team had to be put together as quickly as possible. The existence of the Social Science One advisory committees provided Stroud and Tucker with a pool of researchers with relevant expertise for conducting these studies, as well as a clearly delimited choice set to make the process of selecting the academic research team more manageable. Academic core authors (those who have control rights over each study) brought on additional academic researchers and research assistants for specific papers as they deemed necessary. There are strengths and weaknesses to this selection method, with one weakness being the many qualified academics who were not part of the team. Other strategies should be considered in any future such collaboration, as detailed in FAQ 16.

3) Did the academic researchers have prior connections to Meta? Does this impact the research?

Many, although not all, members of the core academic team had associations with Meta prior to this collaboration, all of which are disclosed in the project’s publications. In particular, of the 17 members of the core academic team on the research project, one owned individual stocks, one had previously served as a paid consultant on another project, nine had received funding for research for other projects, and three had received fees for attending or organizing an event or serving as an outside expert at an event. In total, 10 of the 17 researchers fit at least one of these categories prior to beginning work on the project. In several cases, this background knowledge proved beneficial to the project as it aided communication between the academics and the Meta researchers, as well as furthered the research designs.

One may question whether these connections impact the research. The scientific method is designed to minimize the influence of human biases in drawing conclusions, and this collaboration followed, to the best of our knowledge, social scientific best practices. We put into place a number of integrity provisions, detailed below in FAQ 4. These provisions, such as specifying our hypotheses and how we would do the analysis in writing before analyzing the data and then uploading these plans to a date-stamped pre-registration website, are mechanisms to minimize potential biases.

4) What steps were taken to ensure the integrity of the research?

This project faced the challenge of studying the impact of a for-profit company (formerly Facebook, now Meta) when the study could not be conducted without the cooperation of that company's employees. It is important to note that had the academic team not had confidence in the research integrity of our research partners within Meta, we would not have participated in the project. Still, from the very start of the project, we focused heavily on what steps we could take to bolster the integrity of the research.

We adopted the following five conventions to guide the research process.

First, although Meta covered the costs associated with running the study itself (e.g., paying the survey vendor), neither the members of the academic team nor their institutions received financial compensation (e.g., support for research assistants, course buyouts) from Meta for their participation in the project.

Second, the analyses for all the papers resulting from the project were pre-registered — that is, we specified research questions, hypotheses, methods, and planned analyses — at the Open Science Framework. The pre-registrations, which we refer to in the remainder of this FAQ as “PAPs” (Pre-Analysis Plans), were embargoed while the research was being carried out, but will be publicly released at the time of publication. Further, each paper will be accompanied by a list of deviations from and clarifications of the analysis plan as specified in the PAPs.

Third, for every paper, a set of core authors with control rights over the content of the paper were specified in the PAP. These core authors consist only of members of the academic team (i.e., not employees of Meta). While the process of designing research plans was collaborative, core authors were given final decision-making authority over all aspects of the projects, including PAPs, the decision to invite additional collaborators when needed, and paper content.

Fourth, Meta publicly agreed that there would be no pre-publication approval of papers on the basis of their findings. At the time the PAPs were proposed — but before any data analysis was conducted — Meta conducted legal, privacy, and feasibility reviews of the studies. Meta was entitled to review papers prior to submission and publication, but could only do so in order to ensure that 1) no confidential or personally identifiable information was released in the paper, and 2) what was released did not contradict any of Meta’s existing legal obligations.

Finally, we appointed a rapporteur for the project, Professor Michael Wagner of the University of Wisconsin-Madison, who was neither a paid employee of Meta nor a member of the academic research team. The rapporteur was given access to the project researchers, was allowed to join project-related meetings, and had access to project documents. The rapporteur is not a co-author on any of the papers resulting from the study, but the expectation was that the rapporteur would publish both academic and popular press articles assessing the research process itself.

We also decided that our primary approach to releasing results publicly would be upon completion of the peer review process. The peer review process involves having other scholars, who are not authors on the paper, review the work, provide criticism and feedback, and make a recommendation as to whether the scholarship is worthy of publication. The output from this project was never intended to be a “report” on Facebook’s and Instagram’s impact on the 2020 U.S. election, but rather a series of peer-reviewed academic publications addressing scientific questions related to the impact of various aspects of the Facebook and Instagram platforms on the 2020 U.S. election.

As relevant social science data is increasingly housed within private companies — as opposed to the public commons — it is imperative that the research community explore innovative models of academic-industry collaboration, all of which involve trade-offs. The model represented by this project is not the only one that could have been chosen, and like any model, has strengths and weaknesses (see FAQ 16). That being said, refusing to address pressing societal questions because there is not a perfect model is clearly suboptimal.

5) How were the research topics chosen and the studies designed, and did Meta have undue influence on what was studied?

As specified by Meta, and agreed to by the academics, the collaboration was to assess Facebook and Instagram in the context of the 2020 U.S. election. The academic team decided that the project would focus on four general areas: (1) political polarization; (2) political participation, both online and offline, including vote choice and turnout; (3) misinformation and disinformation, knowledge, and (mis)perceptions; and (4) attitudes and beliefs about democratic norms and the legitimacy of democratic institutions. The academic team selected these areas based on what was not known in the academic literature, topics widely researched across multiple fields, topics likely to be of interest to the public, and our best guesses about possible topics that could arise during the 2020 presidential election. Within the bounds of these four areas, the academics were free to propose specific research questions and study designs, with the understanding that Meta could reject those designs only for legal, privacy, or logistical (i.e., infeasibility) reasons.

Once the research questions and study designs were established, the academics and Meta researchers worked collaboratively on the details of the implementation with academic lead authors retaining final control rights. This included developing the experimental protocols, survey methods, survey questionnaires, measures of on-platform behavior, new classifiers and content coding in instances where they were needed, and sampling and weighting procedures for the on-platform recruited samples. The Meta team provided information about how internal Facebook and Instagram products and systems work and shared outputs of aggregated log data analyses with the academic team to inform the research designs. They also provided information about classifiers developed by Meta that could potentially be used for research purposes. Finally, Meta proposed the initial sampling design and weighting procedures based on internal insights about the distribution of Facebook and Instagram use. NORC at the University of Chicago also provided feedback on the survey methods, questionnaires, and sampling and weighting procedures.

6) Excluding Meta entirely from the research design process would seem to be another way to ensure the integrity of the project — why did you not take this route?

The simplest answer to this question is that excluding Meta entirely from the research design process would have been logistically impossible. There was so much about the internal workings of Facebook and Instagram that was unknown to the academic research team that it is practically impossible to imagine how we would have gone about designing the research without the active collaboration of Meta employees. While it is possible to imagine an infrastructure that would allow research to proceed in this manner in the future, no such infrastructure was in place in February of 2020. Further, the platform interventions would not have been possible without Meta’s cooperation.

Alternatives to the collaborative approach we took are that (a) Meta does this type of work internally and many of the results are not published, (b) academics conduct research externally, but are limited to observational data that can be collected via API or scraped in violation of terms of service, or (c) the research is not conducted. We believed that the arrangement in which we chose to participate — academic researchers collaborating with internal Meta researchers — would be valuable for producing high quality scientific research, as both groups of collaborators brought scientific value to the project.

It is also important to note that the academic research team was never presented with the option of designing research studies that would be implemented by Meta without the input of Meta’s own internal research team. The original initiative for this project came from Meta researchers, and the proposal was for a collaborative effort. While the academic research team carefully considered the conditions under which it would participate in the collaboration (see FAQ 4 above), the context of the opportunity was for a collaborative research effort.

7) How was the privacy of Facebook and Instagram users protected?

For consenting participants, the data collected by NORC were linked, using encrypted participant identifiers, to data about their on-platform behavior in an analytics environment isolated from Meta’s business servers. After the quality of all participant data was assessed, the encryption mechanisms allowing us to link the datasets were destroyed. Only de-identified data (i.e., data where identifiers such as the user’s name or other information that could reasonably be linked to the user have been removed) were made available to the academic team and are being made available to the broader academic community (see FAQ 15).
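To make the general approach concrete, the following is a minimal sketch of how two datasets might be joined via keyed pseudonyms that are later destroyed. It is an illustration under our own assumptions, not Meta's or NORC's actual pipeline; all table and field names are hypothetical.

```python
# Illustrative sketch only: linking datasets via keyed pseudonyms, then
# destroying the key. Field and table names are hypothetical.
import hashlib
import hmac
import secrets

import pandas as pd

# Secret key used to derive pseudonymous participant identifiers.
LINK_KEY = secrets.token_bytes(32)

def pseudonymize(user_id: str) -> str:
    """Derive a stable, non-reversible pseudonym from a raw identifier."""
    return hmac.new(LINK_KEY, user_id.encode(), hashlib.sha256).hexdigest()

# Hypothetical inputs: survey responses and platform usage records,
# each keyed by a raw participant identifier.
survey = pd.DataFrame({"user_id": ["u1", "u2"], "q1": [3, 5]})
platform = pd.DataFrame({"user_id": ["u1", "u2"], "minutes_on_feed": [42, 17]})

for df in (survey, platform):
    df["pid"] = df["user_id"].map(pseudonymize)
    df.drop(columns=["user_id"], inplace=True)  # remove direct identifiers

# Linked, de-identified analysis table.
linked = survey.merge(platform, on="pid")

# Once linkage and quality checks are complete, destroying the key removes
# the ability to re-derive pseudonyms from raw identifiers and thus to
# re-link the datasets.
del LINK_KEY
```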

For the broader U.S. Facebook and Instagram user population, only aggregated, de-identified outputs of analyses were shared with the academic team. Code to collect and process the data from Meta servers was written by Meta researchers and then reviewed by at least one member of the academic research team. There were a few cases where the academic researchers were unable to review the underlying code (see FAQ 11). The use of these data for social good research is permissible under Meta’s Data Policy, and care was taken to ensure that user privacy was protected, including a review by Meta’s Legal and Privacy team to check that no individual-level or identifying data were shared with the academic research team.

8) How were survey and experimental research participants informed of the opportunity to participate, as well as what participation would entail?

Randomly selected participants who were recruited by Meta saw a message at the top of their Facebook or Instagram feed asking whether they would like to share their opinion. Those clicking “Start Survey” were directed to a consent form that provided details about the study and their participation. For the deactivation experiment, the consent form specified that participants would be asked to deactivate their Facebook or Instagram account. For the platform experiments, the consent form noted that “Your [Facebook/Instagram] experience may be different than what you’re used to. For example, you might:

  • See more or fewer ads in specific categories such as retail, entertainment, or politics
  • See more or fewer posts in [News Feed / your feed] related to specific topics
  • See more content from some [friends/connections] and less content from other [friends/connections]
  • See more or less content about voting and elections”

All Meta-recruited participants were asked to consent to the collection of data on their on-platform activity as well as (where legally feasible) voting and political contribution data. Agreeing to share voting and political contribution data was optional for participation in the study. They were told, “Over the next four months, you’ll be asked to fill out a short survey each month. This monthly survey will take about 15 minutes, for a total of 60 minutes over four months. Our partner, NORC at the University of Chicago, will administer this research.” Participants were separately asked to participate in another wave of the survey following the events of January 6th, 2021.

A separate sample of survey panelists, some of whom were not Facebook or Instagram users, was recruited by NORC at the University of Chicago to take the same surveys as the Meta-recruited participants and given the option to provide data on their on-platform activity for comparison purposes. NORC-recruited participants also were asked if they wanted to share voting and political contribution data separately. They were told that if they consented, “NORC will join your survey responses to publicly available third-party data like if you’ve voted or made a political contribution, if this data is available.”

Some participants were asked to give their consent to share web browsing data. Agreeing to share web browsing data was optional for participation in the study. They were told, “NORC at the University of Chicago would like to understand more about how you’re using your device during this study. To participate, you’ll need to download software to your device. When installed, this software will automatically collect data about your device and the websites you visit and apps you use. The data will only be used for research purposes. Please note that passwords, and other information you might enter on websites, like your banking details, will not be collected.”

All survey participants who self-reported using Twitter were asked, as part of the fifth survey wave, whether they would be willing to share and verify their Twitter handle with NORC so that NORC could collect data on their public Twitter activity. All survey participants were also later asked to consent to participate in a sixth survey wave.

Participants were compensated for their participation in the study, and additional compensation was offered for sharing browsing data.

9) What steps were taken to ensure the project was conducted in an ethical manner that respected the rights of participants in the study?

The research was reviewed and approved by the NORC Institutional Review Board. An Institutional Review Board is set up to protect the “rights and welfare of human research subjects,” as described by the Food and Drug Administration (FDA), and is “a committee that performs ethical review of proposed research,” as described by the Office for Human Research Protections housed in the U.S. Department of Health and Human Services.

The academic team consulted with their respective university Institutional Review Boards about participation in this project and followed the guidance provided. For some universities, the study was approved as an exempt or expedited protocol. Others ruled that this was not human subjects research or that this was not university-affiliated research.

Meta also contracted with Ethical Resolve, an ethics consulting firm founded by two philosophy PhDs, that provided feedback on research designs, reviewed study materials, and provided suggestions for ensuring the integrity of the research while protecting users and providing information to the public.

10) How was this collaboration financed?

Meta funded the data collection efforts by NORC and the subcontractors NORC hired to execute the project, funded the review process with Ethical Resolve, and allocated employee time to this project.

Although Meta covered the costs associated with running the study itself, the academics did not take any funding from Meta for their participation in the project. Several academics had funding that allowed them to buy out of courses or to hire research assistants. The team also received public relations support in describing the study and sharing the findings. Some of the academic team’s work was supported by funds from the Democracy Fund, the Guggenheim Foundation, the John S. and James L. Knight Foundation, the Charles Koch Foundation, the Hewlett Foundation, the Hopewell Fund, the Alfred P. Sloan Foundation, the University of Texas at Austin, New York University, Stanford University, the Stanford Institute for Economic Policy Research, and the University of Wisconsin-Madison.

11) Are there aspects of the research process that were outside of the view of the academic team?

Yes. The entire setup of the project as a collaborative effort between Meta employees and academic researchers meant that by definition, the academic researchers were not employees of Meta. As such, the academic researchers were never able to access Meta’s internal data logs. Instead, separate tables were created for the purpose of carrying out all the research involving platform usage data that was specified in the pre-analysis plans. These research-specific tables were stored on Meta’s Researcher Platform to which the academic research team had access.

The project-specific code used to generate the results found in all of the papers from the project was produced in two stages. First, there is the code that generates the research-specific tables for platform usage data. As the academic team did not have access to the original Meta tables where the data are stored, this layer of code had to be written and executed by Meta employees. Examples include code used by Meta to log individual user actions on the platform as part of their business operations; code used by Meta to train the internal proprietary classifiers used in this research project; and code used by Meta to determine which Facebook Pages, Facebook Groups, Instagram accounts, and web domains meet Meta’s definition of a Misinformation Repeat Offender.

However, the code that generated the research-specific data tables was reviewed by members of the academic team as part of the code production pipeline.

The second layer of code analyzed (and pre-processed) the data in the research-specific tables on Meta’s Researcher Platform. As the academic team had access to the tables, the team could both write and execute this code. Academic researchers had access to all treatment identifiers and survey data in the raw form provided by NORC, with the exception of a small number of demographic variables such as income that were coarsened to protect privacy. They also had access to passive tracking data that included the website domains visited by at least 20 participants and a pre-specified list of apps. Academic researchers also had access to aggregate political news domain and URL data for all U.S. adult Facebook users. In practice, the analysis code was written by both academic and Meta researchers, with the exact distribution of who wrote what varying across different papers. Code written by Meta researchers was reviewed by members of the academic research team and vice versa. Crucially, members of the academic research team could run (and modify) any code at this analysis stage.
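As one illustration of the kind of disclosure safeguard described above, the sketch below filters passive tracking data to website domains visited by at least 20 distinct participants before exposing it for analysis. The data, column names, and exact threshold handling are hypothetical, not the project's actual code.

```python
# Illustrative sketch (hypothetical data and column names): keep only
# domains visited by at least 20 distinct participants.
import pandas as pd

MIN_PARTICIPANTS = 20

# Hypothetical raw tracking rows: one row per (participant, domain) visit.
visits = pd.DataFrame({
    "pid": ["p1", "p2", "p3", "p1", "p2"],
    "domain": ["news.example", "news.example", "news.example",
               "blog.example", "blog.example"],
})

# Count distinct participants per domain.
reach = visits.groupby("domain")["pid"].nunique().rename("n_participants")

# Keep only domains that clear the disclosure threshold (with real data,
# this retains widely visited domains and drops rarely visited ones).
allowed_domains = reach[reach >= MIN_PARTICIPANTS].index
filtered = visits[visits["domain"].isin(allowed_domains)]
```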

In contrast to the platform data, the survey data used in many of the papers was fully available to the academic team and was joined to platform data in Meta’s Researcher Platform for consenting participants.

12) Social Science One’s U.S. data was plagued by a very consequential coding error — what have you done to avoid the possibility of a similar problem?

From the outset of the project, we implemented peer code review by both Meta researchers and the academic team. Additionally, we conducted assessments of the descriptive statistics of the variables to look for anomalies that may indicate errors in the underlying data. Through this process, we discovered issues and fixed or reduced their impact where possible. Issues that were discovered but couldn’t be fixed have been/will be disclosed within the papers.

Additional quality assurance was applied for select variables used in the main analyses or hypotheses of the original pre-registered studies. The Meta researchers worked with a larger team of data scientists and data engineers within the company to conduct additional quality assurance measures. There were several steps in this process:

  • Reviewing the logic of the code used to create the tables, such as ensuring that joins and filters work as intended
  • Conducting checks to ensure that the data in the study table matched the data in the upstream table (e.g. total number of rows, total number of nulls, etc.)
  • Inspecting data for anomalies across time, such as days without data or outlier values in variables
  • Investigating any known quality issues about the tables from within the company
  • Comparing descriptive statistics from the study tables to external data shared by the company

Finally, lead authors tried to come up with “common sense” tests to identify potential anomalies in the final datasets (e.g., the number of posts classified as civic should be less than the number of total posts).
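To illustrate the kinds of checks listed above, here is a minimal sketch of a quality-assurance routine comparing a study table against its upstream source, scanning for gaps in the daily time series, and applying a "common sense" invariant. Table and column names are hypothetical, not the project's actual schema.

```python
# Illustrative QA sketch; column names (user_id, date, n_posts,
# n_civic_posts) are hypothetical.
import pandas as pd

def check_tables(study: pd.DataFrame, upstream: pd.DataFrame) -> list[str]:
    """Return a list of QA problems found (an empty list means all checks passed)."""
    problems = []

    # 1) Study table should match the upstream table it was derived from.
    if len(study) != len(upstream):
        problems.append(f"row count mismatch: {len(study)} vs {len(upstream)}")
    if study["user_id"].isna().sum() != upstream["user_id"].isna().sum():
        problems.append("null count mismatch in user_id")

    # 2) Look for gaps in the daily time series.
    days = pd.DatetimeIndex(pd.to_datetime(study["date"]).dt.normalize().unique())
    expected = pd.date_range(days.min(), days.max(), freq="D")
    missing = expected.difference(days)
    if len(missing):
        problems.append(f"{len(missing)} days with no data")

    # 3) "Common sense" invariant: civic posts cannot exceed total posts.
    if (study["n_civic_posts"] > study["n_posts"]).any():
        problems.append("civic post count exceeds total post count for some rows")

    return problems
```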

Although many quality assurance checks were implemented, the possibility for errors, as with all research, remains.

13) The project has missed multiple deadlines for submitting papers for peer review. Why?

The simplest explanation is that none of the lead researchers on this project had ever attempted anything of this nature or magnitude before, and we massively underestimated the amount of time it would take to carry out the project.

In hindsight, though, a decision made early on in the project to prioritize the breadth of the research — we ended up pre-registering over a dozen different papers — was consequential. Some aspects of the project would have taken the same amount of time to complete regardless of how many papers were planned, such as fielding the panel survey. For many other aspects, however, the sheer number of papers in the project inevitably meant that it took longer to move all of them through the research process.

In addition, there were two unforeseen events that both substantially pushed out the timeline. The first was the events of January 6th. Although we had foreseen that there might be controversy over the election results and had originally planned to include a fifth wave of our survey in early December 2020, we ultimately decided that it was important to add a sixth wave of the survey that was carried out in late February 2021. This delayed the processing of the survey data by NORC, which in turn delayed the linking of the survey data to the platform data.

The second event was the aforementioned discovery of the coding error in the original preparation of the Social Science One Condor dataset, which led us to once again revisit the issue of data quality, and, in turn, was one factor leading to the previously discussed Quality Assessment program in the Spring/Summer of 2022 (see FAQ 12).

Additionally, as with many other projects over the past few years, the COVID-19 pandemic presented some unexpected challenges in terms of the demands on the researchers’ time.

Further, when presented with the opportunity to carry out research that would have been impossible without the cooperation of Meta but would also be conducted with all the conditions outlined above in FAQ 4, the academic team made the decision to prioritize the number of different research questions that we could answer, and therefore the scientific contribution we could make. We sought to give members of the academic research team the opportunity to propose paper topics for pre-registration as opposed to starting with a pre-specified number of papers that the project would produce. When confronted with questions that forced us to balance speed versus the quality of the scientific research, we prioritized the latter.

To be clear, there were very different models that could have been adopted that would have prioritized speed and produced some sort of comprehensive, non-peer-reviewed report as a primary output. We do not in any way seek to claim that would have been any less important or valuable than our prioritization of peer-reviewed scientific research, but only to be clear that the production of peer-reviewed scientific research was our ultimate goal and to elucidate the reasons why we made these decisions.

14) What rights did Meta and the academics have with respect to the research findings and reporting?

As a precondition to participating in this project, Meta agreed that the findings would not be approved by Meta prior to publication. The only Meta reviews to which the research was subject were (a) a feasibility review, to ensure that the proposed research designs could be executed within the 2020 election timeframe with reasonable resourcing; (b) a legal review, to ensure that the information shared abided by the company’s legal obligations; and (c) a privacy review, to ensure that users’ information was being protected in line with the company’s terms of service. As we also described in FAQ 4, members of the academic research team served as “core authors” of each paper and were given control rights over the final versions of the pre-analysis plans and papers. By control rights, we mean that in the event of disagreements between members of the research team, the core authors would have the final say in resolving them.

15) What data sharing plans are in place once results are published?

Replication data and code for the studies published as part of this collaboration are archived at the University of Michigan’s Social Media Archive (SOMAR) as part of the Inter-university Consortium for Political and Social Research (ICPSR) (https://socialmediaarchive.org) and made available in the ICPSR virtual data enclave for university IRB-approved research on elections or to validate the findings of the studies. These data will include the survey data, on-platform data, and external data (e.g. passive tracking, Twitter, validated vote, and FEC data) for participants who consented to share it. It also will include platform-wide data for U.S. adults in aggregated form as shared with the academic team. Any instances in which replication data cannot be shared will be disclosed and explained in the academic articles, and evaluated as part of the peer review process.

16) This project has been described as a “new model” for industry-academic collaboration. In what ways is it? In what ways is it not?

What we have laid out in this FAQ represents one model for collaboration between academic social scientific researchers and industry researchers to analyze the impact of that industry actor on social (in this case political) outcomes. We highlight the following components of this approach:

  • Collaboration between a team of academic researchers and a team of industry researchers in executing the research project
  • Funding for executing the research supplied by the industry partner
  • Academic researchers serving as lead authors with control rights for all papers
  • Pre-registration of all research studies resulting from the project, with pre-analysis plans released at the time of publication
  • Vetting of final outputs by the industry partner for privacy and legal obligations, but no right of pre-publication review based on content
  • Replication data made available for follow-up analyses by academic researchers not involved in the original project

There are many other models that future industry-academic partnerships should consider, some of which we have mentioned in answers to earlier questions in this FAQ. Infrastructure could be set up to allow academic teams to apply to conduct projects with an industry partner; this model would require specifying an application process, determining who decides which teams get access, and detailing the selection criteria. The academic collaborators could be compensated by the industry partner in an effort to expand the pool of potential scholarly participants to those who require these resources. Findings could be released in the form of a report as opposed to peer-reviewed research. The research could be designed exclusively by the academic team, with industry researchers only providing input on feasibility. Funding could be provided not by the industry partner, but by government entities, foundations, or new institutions established for the purpose of funding such projects.

All told, many factors warrant consideration for any such research collaboration: ensuring that a team works well together, that the team has the varied expertise needed to conduct the research, and that the team can come up with executable research designs in a specified amount of time, all while considering issues of academic inequality. Questions remain about the optimal way to select academic researchers to participate in such projects and about how industry collaborators should be involved. We expect that this will be an area of innovation in future projects.

17) Have there been any corrections to the published papers?

Yes. You can find one correction here.