Presidential election polling
Revisiting the ‘gold standard’ of polling: new methods outperformed traditional ones in 2020
Even before the 2020 polling errors became evident, analysts wondered whether we could trust the polls. After the election, concerns spiked: overall errors were even larger than in 2016. Frank Luntz went so far as to declare, “the polling profession is done.” We disagree. We think polling has a strong future if the proper methods are used, though these methods may come as a surprise.
Surveys that use probability-based samples — for example, recruiting respondents through random digit dialing — have long been considered the “gold standard.” But given the high cost of these methods, many researchers and media outlets have turned to non-probability sampling, such as recruiting respondents who volunteer for internet surveys. The overwhelming majority of election surveys in 2020 included non-probability sampling methods, raising the question: Did the shift toward less expensive, opt-in samples hurt polling accuracy in 2020? Our research shows the answer is no. Non-probability surveys and surveys combining probability and non-probability methods outperformed probability-based surveys in the 2020 election!
Analyzing 355 Polls: To evaluate the accuracy of non-probability, probability, and mixed-sampling approaches (the last combining both methods), we analyzed all national polls used in the FiveThirtyEight election forecast with start dates between September 1 and November 1, 2020. We coded each poll’s sampling approach by consulting the survey organization’s methodology discussion. For each poll, we then calculated the “raw error” of the national vote: the poll’s margin (the percent supporting Biden minus the percent supporting Trump) minus the actual margin (the percent Biden received minus the percent Trump received). Biden won by 4.5 percentage points (51.3% minus 46.8%), so a poll with 53% for Biden and 46% for Trump shows Biden ahead by 7 points (53% minus 46%), for a raw error of 2.5 percentage points (7 minus 4.5). Positive numbers indicate a pro-Biden error, negative numbers indicate a pro-Trump error, and values closest to zero represent the most accurate polls.
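The raw-error arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration rather than the replication code: the function name `raw_error` and its parameter names are hypothetical, and the only figures used are the actual result and the example poll given in the text.

```python
# Actual 2020 national result, as stated in the text.
ACTUAL_BIDEN = 51.3
ACTUAL_TRUMP = 46.8


def raw_error(poll_biden, poll_trump,
              actual_biden=ACTUAL_BIDEN, actual_trump=ACTUAL_TRUMP):
    """Return the poll margin minus the actual margin, in percentage points.

    Positive values are pro-Biden errors, negative values are pro-Trump errors.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    poll_margin = poll_biden - poll_trump
    actual_margin = actual_biden - actual_trump
    # Round to one decimal place to avoid floating-point noise.
    return round(poll_margin - actual_margin, 1)


# The worked example from the text: a poll showing Biden 53%, Trump 46%.
example_error = raw_error(53.0, 46.0)
print(example_error)  # 2.5
```

A poll that understated Biden's margin would come out negative under the same function, which matches the sign convention used throughout.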
The figure below shows that among surveys of likely voters (LV), probability-sample polls produced an average raw error of 6%, while non-probability samples were, on average, 2.4 percentage points more accurate, with a mean raw error of 3.6%. Mixed samples were more accurate still, with an average raw error of just 1.3%. (Full disclosure: although not included in the FiveThirtyEight data, we work with Reality Check Insights, a data firm that used mixed-sampling methods in 2020 election polling. Reality Check Insights election data has been archived at the Roper Center for Public Opinion Research.)
We also analyzed absolute error (i.e., the average of the absolute values of each survey’s error), and mixed and non-probability samples were again more accurate, with average absolute errors of 3.1% and 3.7%, respectively, compared to 6% for probability-based surveys.
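To see why the two measures can differ, the short Python sketch below computes both on made-up numbers. The error values are hypothetical and chosen only for illustration (they are not taken from the poll data), and the helper names are assumptions. When errors fall on both sides of zero they partly cancel in the average raw error but not in the average absolute error; when every error has the same sign, the two measures coincide.

```python
from statistics import mean


def mean_raw_error(errors):
    """Average of signed errors; opposite-signed errors offset each other."""
    return mean(errors)


def mean_absolute_error(errors):
    """Average of error magnitudes; the direction of each error is ignored."""
    return mean([abs(e) for e in errors])


# HYPOTHETICAL raw errors in percentage points, not real poll results.
two_sided = [4.0, -2.0, 1.0, -3.0]  # mix of pro-Biden and pro-Trump errors
one_sided = [5.0, 6.0, 4.0]         # every error is pro-Biden

print(mean_raw_error(two_sided), mean_absolute_error(two_sided))  # 0.0 2.5
print(mean_raw_error(one_sided), mean_absolute_error(one_sided))  # 5.0 5.0
```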
We can also see in the above figure that the survey results were more spread out for non-probability and mixed samples. This flatter, wider error distribution makes sense given past research noting the multitude of approaches used and the variability in estimates across non-probability samples. However, a wider distribution of estimates is not necessarily a bad thing. First, not only were non-probability and mixed-sample polls more accurate on average, but the least accurate of them were more accurate than the least accurate probability polls. Second, every probability-based survey had a pro-Biden error. The tightly clustered results of probability-based surveys may have conveyed a false sense of confidence that was not borne out by other methods.
What about random digit dial surveys and registered voters? The probability-based polls analyzed above include those conducted via online surveys, polls using voter files, and those conducted via live interviewers who randomly dialed cell and landline telephones. To better understand the above findings, we separately analyzed the 16 live-interview random digit dial (RDD) surveys, because some analysts view live phone interviews as an important indicator of survey quality. Our findings did not change. The average raw error was 5.2%. Since all of the error was in the pro-Biden direction, the average absolute error was an identical 5.2%. The maximum error for the 16 RDD surveys was 11.5%, just one percentage point less than the maximum error for all 218 non-probability surveys and equal to the highest mixed-sampling survey error.
We also looked at survey results among registered voters (RV). Here we found a similar non-probability advantage. Probability RV polls had an average raw error of 5.6% (5.2% among RDD live phone polls) compared to 3.7% for non-probability polls. (There was only one mixed-sampling RV survey, which we omit from this analysis.) Again, we see that the least accurate non-probability polls were more accurate than the least accurate probability polls. The results are virtually identical when we examine average absolute error.
The future of election polling: These findings reinforce existing studies showing that non-probability samples often perform well in election polling and forecasting, but the fact that non-probability and mixed samples did better than probability-based surveys is especially noteworthy. The accuracy pollsters have come to expect from probability samples was conspicuously missing in the 2020 presidential election cycle. Our findings have emerged alongside recent work showing that live phone interviews may no longer outperform online surveys and other innovative modes. It is possible that plummeting response rates for telephone surveys (a primary mode of probability-based sampling) and the previous president’s efforts to undermine trust in mainstream surveys hurt traditional methods, whereas innovative sampling approaches were better able to reach the types of voters low in social trust that some have suggested were missing from the polls.
Of course, it will be important to make similar comparisons for state-level polls, for future contests, and for survey questions on other topics. Moreover, concerns about the variance of subgroup estimates in non-probability samples may continue to merit caution. But our research suggests that, at least for election polling, survey innovations yielded highly accurate results in 2020.
Update (4/12/21): After publishing our article, we identified a survey that was incorrectly coded in FiveThirtyEight’s data as live interviewer but should have been coded as online non-probability (which we confirmed by reaching out directly to the survey organization). We have notified FiveThirtyEight of the error, and our article and replication data have been updated to reflect this coding change, which did not alter any conclusions in our article (the number of RDD surveys is now 16 instead of 17, the RDD average error and average absolute error shifted from 5.4% to 5.2%, and the probability survey average error changed from 6.1% to 6.0%).