FiveThirtyEight And The Future of Polling Averages
A closer look at where polling went wrong and how the FiveThirtyEight forecast fared in the election.
There are still some votes to be counted. But, clearly, there has been another polling error of roughly the same magnitude as those of 2016 and 2018. And many Democratic supporters and commentators are disappointed and dissatisfied with the results, because they expected a blow-out.
There are still three important Senate races, which will decide who has won control. If the Democrats can win both Georgia run-offs and the Alaska Senate race (where the Democratic-leaning independent Al Gross still has a shot at winning, with under 60% of the vote counted), then they will control the Senate, blunting some of the attacks on the forecasters and the models.
Despite some of the premature and frankly wrong criticism, the FiveThirtyEight forecast was pretty accurate. It gave roughly equal chances to a slim victory and a landslide, so we shouldn't be surprised that one of those outcomes occurred rather than the other. It forecast a broad range of 47–53 Senate seats for the Democrats, and the Democrats look set to finish within that range.
If there was an error in the forecast, it was not the fault of the model itself; it was due to polling error. But the model does account for polling error, which is why a slim victory was judged just as likely as a large one.
But clearly, forecasters (and campaigns) need the polls to do better than this. Persistent polling errors of 3 points or more are simply not good enough. Despite correcting the weighting errors identified in 2016 and 2018, pollsters have once more got it wrong, although not by as huge a margin as some have suggested.
It’s popular to bring up the Post-ABC poll that suggested Biden +17 in Wisconsin, and then contrast it with the roughly 20,000-vote margin of victory. But this is reductive. The biggest driver of polling error is undoubtedly poll herding: pollsters adjusting their methodology post-fieldwork in order to fit within the perceived “acceptable range” of polls. What we definitely don’t want to do, therefore, is attack pollsters for releasing outlier polls. We like outliers: they are proof of a solid commitment to the scientific method and can help to break down perceived acceptable ranges, even if they are not massively useful in and of themselves.
The question now is how to bring polls closer to reality and put a stop to such consistent polling error.
There are, broadly speaking, three solutions, and only one will happen naturally: better weighting. Pollsters know they have to change their methodology to weight the responses of minority groups and less-educated people properly, or face repeated public anger and consternation at the inaccuracy of their polls. They also know they have to get better at finding people within these groups whose views differ from the majority of their group, and at attaching greater weight to such respondents when they know they are struggling to find them.
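The weighting adjustment described above is, at its core, simple arithmetic. Here is a minimal sketch, with entirely made-up respondents and population shares, of how an under-sampled group gets up-weighted (a toy illustration, not any pollster's actual method):

```python
# Toy post-stratification weighting: each respondent's weight is
# (assumed population share of their group) / (observed sample share),
# so under-represented groups count for more in the weighted estimate.

# Hypothetical sample: (demographic group, reported vote).
sample = [
    ("college", "D"), ("college", "D"), ("college", "R"),
    ("college", "D"), ("college", "R"), ("college", "D"),
    ("non_college", "R"), ("non_college", "R"), ("non_college", "D"),
    ("non_college", "R"),
]

# Assumed true population shares (e.g. from census data).
population_share = {"college": 0.40, "non_college": 0.60}

# Observed sample shares.
n = len(sample)
sample_share = {}
for group, _ in sample:
    sample_share[group] = sample_share.get(group, 0) + 1 / n

# Per-group weight = population share / sample share.
weight = {g: population_share[g] / sample_share[g] for g in population_share}

# Weighted vote share for "D".
d_weighted = sum(weight[g] for g, v in sample if v == "D") / sum(
    weight[g] for g, _ in sample
)
```

In this invented sample the raw "D" share is 50%, but because non-college respondents are under-represented relative to their assumed population share, the weighted estimate drops to roughly 41.7%. Real pollsters weight on many more dimensions at once, but the principle is the same.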
All that will come, because it is a matter of survival for the pollsters. But there are two less natural solutions that will have to be forced.
Firstly, pollsters need to agree to better practice on several fronts. Too often, pollsters release polls with low sample sizes, or meddle with the methodology and weighting after fieldwork if the fieldwork has produced an anomalous result. Both lead to lower-quality polling and a distortion of the scientific method: instead of reporting purely observed results, polling also reflects pollsters' wish not to release numbers that make them look bad.
You might observe that, as a result of poll herding, all polls are starting to look bad. The vast majority of people don't critically evaluate the performance of Monmouth, or YouGov, or Post-ABC; all they see is the 538 average and the final result, and they conclude "polls bad".
On the subject of the 538 average, it isn't blameless either. I'm not talking about 538's methodology, which is excellent. The problem is that, precisely because the methodology is so good, the average encourages herding towards it. It has become an industry gold standard that pollsters aim for, and that's not healthy. Short of pollsters collectively eradicating all post-fieldwork weighting changes, it is an unsolvable problem.
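For concreteness, the basic idea of a quality-weighted polling average fits in a few lines. 538's actual model is far more elaborate (house effects, recency decay, sample-size adjustments), so the polls and quality scores below are entirely invented:

```python
# Toy quality-weighted polling average. Pollster names, margins, and
# quality weights are all made up for illustration; this is NOT 538's
# actual methodology, only the core weighted-mean idea.

polls = [
    # (pollster, margin in points, assumed quality weight)
    ("Pollster A", 8.0, 1.0),
    ("Pollster B", 5.0, 0.8),
    ("Pollster C", 17.0, 0.9),  # the outlier the average smooths out
    ("Pollster D", 6.0, 0.6),
]

total_weight = sum(w for _, _, w in polls)
average_margin = sum(m * w for _, m, w in polls) / total_weight
```

Note how the outlier pulls the average up a little but doesn't dominate it; that is exactly why outliers are tolerable inside an average, and also why the average itself becomes a tempting target to herd towards.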
The only way to really tackle herding is for 538 to stop running averages. It’s an unhealthy luxury.
There are four important caveats when talking about polling error, however.
Firstly, we need to wait for final results and counts before judging the polls and the extent to which they were off the mark.
Secondly, we don’t know how large a role voter suppression and lost mail-in ballots may have played. There was a suggestion that 23% of ballots in Florida were stuck in USPS processing facilities, after DeJoy ignored a court order to sweep those facilities and send the ballots on. Pollsters can’t control voter suppression.
Thirdly, polling error seems to have varied widely across the country. Florida and much of the Rust Belt saw big errors, but the polls seem to have predicted Nevada, Georgia, California, and Arizona more accurately. We have to ask why, because the answer will reveal where the real challenges lie.
Fourthly, there’s a pandemic going on. That presented new challenges: were voters working from home more likely to answer the phone? Do people who work from home lean Democratic? Can this factor explain the geographical differences in polling error, perhaps?
Polling methodology and polling error are not easy to understand; a range of factors is at play. But agreeing collectively on better methodology and practice, and removing the averages that enable poll herding, is a good place to start.