Not only are the polls wrong, the pollsters don’t know how to fix them
The 2020 election had two clear losers: Donald Trump and polling.
Analysts had predicted a possible Biden landslide, but immediately after voting ended it became clear that the election was tight. Pundits and journalists scrambled to explain what was taking place, while some Democrats panicked at the possibility of a 2016 repeat. It felt awful. Over the coming days, Biden was able to pull ahead in Georgia, Pennsylvania, and Michigan thanks to absentee ballots. Democrats are happy to win, but there is an obvious question: what happened? Why were the polls so off?
Polling got slammed after the 2016 election. Clinton had been rated as the overwhelming favorite but lost. In the aftermath of 2016, pollsters and analysts rationalized their mistake, explaining that polling had missed enthusiasm for Trump among non-college-educated white voters. Pollsters said they fixed the mistake. It was OK to trust polls again.
For 2020, while Biden won the election as predicted, the margin was much thinner than anticipated. Polling in down-ballot races was a mess. If polling is so inaccurate, what do polling results even mean? Is polling broken? What would systematic polling failure mean for our discourse?
What Polling Got Wrong
Polling was really bad in this election. State-level polling was atrocious. Texas was off by 4.5%, Wisconsin was off by 6.0%, and Ohio was off by 7.2%. Congressional polling was just a disaster. It isn’t possible to dissect each of these cases in detail, but two examples stuck out as particularly inexcusable.
Democrat Sara Gideon challenged Republican Susan Collins in the Maine Senate election. Anyone with even a passing interest in American politics knows that Collins is a controversial figure, annoying Republicans and Democrats in almost equal measures.
No one was surprised when Collins’ reelection bid looked weak, coming in behind Gideon in poll after poll. According to the Real Clear Politics (RCP) polling average, Gideon had an estimated 4.4% lead, while 538 estimated a 2–4% lead. On election night, Collins shocked everyone when she brought in over 51% of the votes to Gideon’s 42%. Not only did Collins win, but it wasn’t even close.
It is challenging to rationalize how the polling was so wrong. Here is a list of reasons the polling in the Maine Senate race should have been accurate:
- The race was polled 30 times since July, of which 29 showed Gideon in the lead, and one showed a tie.
- A variety of firms conducted polling, many of which are considered high quality.
- Six polls were conducted since the middle of October, all showing Gideon with a lead. The average was about 3% in Gideon’s favor.
- Polls in the 2020 Maine Senate election showed very few undecided voters, reducing uncertainty.
- Maine is an ethnically homogeneous state, with 93% of the population being non-Hispanic White (U.S. Census). Difficult-to-poll ethnic minority groups cannot explain polling failure.
- Maine’s Covid situation is one of the best in the country. While the outbreak is getting worse, the overall levels are low.
- Many states engaged in a back-and-forth over election law in the days leading up to November 3rd. Maine’s rules were not substantially changed from previous years.
- State polling was accurate for the presidential election. 538 forecast 55–43% in favor of Biden, and the final result will be about 53–44% in favor of Biden. Recent Maine polling has also been reasonably accurate: 538 predicted a 49–42% victory for Clinton in 2016, and the final tally was 48–45%; in 2018, Senator Angus King was predicted to win reelection 56–36% and won 54–35%.
- While Collins is controversial in some ways, she is moderate and has not had any notable scandals. Some may say that the polls are off because of “shy Trump voters,” who don’t want to admit they support a candidate who is considered racist. “Shy Collins voters” seems far-fetched.
If there were a single race that should have had accurate polls, it would be this one. Not a single pollster was even close. 538 did predict Gideon had a chance of losing, but falling to 42% of the vote was exceptionally unlikely. This was a massive polling miss.
Florida defined the narrative on election night. RCP gave a polling average of +0.9 for Biden, and 538 had given Biden a 69% chance to win the state, with an anticipated 2.5% margin. On election night, the New York Times’ Florida “needle” swung hard in Trump’s favor the moment that vote counting began. This was the first sign of the evening that polling was off.
The final tally in Florida was about 3.4% in Trump’s favor, meaning RCP was off by 4.3%, and 538 was off by a whopping 5.8%. Poor results, but not as large as some of the others this cycle. But the problem goes beyond one unlucky year.
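The arithmetic behind these error figures is simple: polling error is the gap between the predicted margin and the actual margin, with signs tracked consistently. A minimal Python sketch using the rounded margins quoted above (positive means Biden leads, negative means Trump):

```python
def polling_error(predicted_margin, actual_margin):
    """Absolute gap between a predicted margin and the actual margin.
    Margins are signed: positive means Biden leads, negative means Trump."""
    return abs(predicted_margin - actual_margin)

# Florida 2020, rounded figures quoted in the text:
rcp_predicted = 0.9    # RCP average: Biden +0.9
fte_predicted = 2.5    # 538 forecast: Biden +2.5
actual = -3.4          # final tally: roughly Trump +3.4

print(round(polling_error(rcp_predicted, actual), 1))  # 4.3
print(round(polling_error(fte_predicted, actual), 1))  # 5.9 with these
# rounded inputs; the slightly lower 5.8 comes from less-rounded margins
```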
Since 2016, pollsters have been consistently off in Florida, with several big misses. Not only that, all the misses are in the same direction: every single time, pollsters and modelers have underestimated the Republican vote. These predictions also appear to be getting worse over time. Statistically speaking, misses are inevitable, but it is difficult to explain this pattern.
To pretend that these errors are acceptable is gaslighting. It is also becoming increasingly clear that not only are the polls wrong, but the pollsters do not know how to fix them.
Why Were the Polls Wrong?
First, dismiss any conspiracies about pollsters systematically handicapping Republican polling to help Democrats. Aside from the absurd idea of dozens of polling firms coordinating their efforts in secret, it is simply not true that the left benefits from Democratic-leaning poll results. Clinton may have lost the 2016 election due to people sitting out or voting third party because they thought Trump had no chance. Polling agencies also benefit from accuracy; these are companies that work with businesses to do market research. More accurate polling firms land more lucrative contracts.
Given that polling agencies rely on their legitimacy to market their services, they need to come up with an explanation of what they missed in the 2020 election polling. Did they undercount a key demographic? Did the pandemic mess up the turnout models? Did they incorrectly weight by education, religion, or income? Before long, pollsters will cook up a theory of what happened and reassure us that they will never make the same mistake again.
But this is the wrong approach. The question is not “What did the pollsters miss?” The real question is “Why did the pollsters miss it?”
Imagine you are in math class. You are taking a test on percentages. You feel confident during the exam, but your mark is bad: D. It turns out you were calculating percentages out of 1000 rather than out of 100. Many of your calculations were totally off, screwing up the whole test.
In this case, there is a simple solution: calculate percentages out of 100! You made a mistake, but now you have learned your lesson.
But there is another question: why did you think percentages were calculated out of 1000? Did you pay attention in class? Did you misread the textbook? Did you misunderstand a deeper concept? These questions are difficult because they ask about the learner rather than about the math.
This level of introspection would be unnecessary if it were one underwhelming test, but there is a pattern of poor results. In algebra, you misunderstood the order of operations and got a C-. In fractions, you confused the meaning of the numerator/denominator and failed the test. There is clearly something systematically wrong with your understanding of math.
It is easy to see your mistakes once you have the correct answers. You can work backward and rationalize how you almost got things right. But that is not enough. Pollsters need to develop methods that stop polling misses from happening.
Declining survey response rates are a major driving force behind polling’s failure. Most polling is conducted by calling people, often via landline, and asking the respondents how they plan to vote. Given the shift away from landlines, and the prevalence of caller ID, it is predictable that response rates have dropped, but the scale of the drop is shocking. According to Pew Research, the survey response rate fell from 36% in 1997 to only 6% in 2018.
Low response rates lead to several interconnected problems. First, pollsters need to perform more calls to collect data. If a pollster in 1997 needed to reach 1000 people they would have to make about 2800 calls. A pollster today would need to call over 16,500. Pollsters are forced to hire more staff, or make do with smaller sample sizes.
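The figures above follow from simple arithmetic: the number of dial attempts scales with the inverse of the response rate. A minimal sketch:

```python
def calls_needed(target_sample, response_rate):
    """Rough number of dial attempts required to complete a target
    number of interviews at a given response rate."""
    return target_sample / response_rate

# To reach 1000 respondents:
print(round(calls_needed(1000, 0.36)))  # 2778 calls at 1997's 36% rate
print(round(calls_needed(1000, 0.06)))  # 16667 calls at 2018's 6% rate
```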
While all groups are less likely to pick up the phone, the declines in response rate are unevenly distributed throughout the population. Young people, people below the poverty line, and non-native English speakers are much less likely to pick up phone calls from pollsters. These groups are then systematically underrepresented in survey responses.
Pollsters are aware of this problem, so they attempt to compensate. It is easy to tell who is under-represented in a sample: say 15% of the expected electorate is under 35, but only 5% of respondents are. In that case, pollsters will assign extra weight to the opinions of the under-35s who did answer.
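This reweighting step can be sketched directly. In the simplest scheme, each respondent in a group is weighted by the ratio of that group's expected share of the electorate to its share of the sample (the 15%/5% figures are the illustrative ones from the paragraph above):

```python
def demographic_weight(population_share, sample_share):
    """Post-stratification weight: how much each respondent in a group
    counts, so the weighted sample matches the expected electorate."""
    return population_share / sample_share

# Illustrative figures: under-35s are 15% of the expected electorate
# but only 5% of respondents; everyone else is 85% vs. 95%.
w_young = demographic_weight(0.15, 0.05)
w_older = demographic_weight(0.85, 0.95)

print(round(w_young, 2))  # 3.0 -- each under-35 counts three times
print(round(w_older, 3))  # 0.895 -- everyone else is down-weighted
```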
Political opinions may influence survey response rates. One theory is the “shy Trump voter”, which suggests that Trump supporters don’t want to admit they support him because they don’t want to be seen as racist. Another theory suggests that people suspicious of authority and science may both dislike answering pollsters and be attracted to Trump. The evidence for each of these theories is spotty, but if the effects do exist, they would be almost impossible to correct for.
As each of these effects progresses year after year, more weight is placed on pollsters’ likely voter models. When response rates are small, the turnout model must be properly calibrated to apply the proper weighting to each demographic. At a certain point, polls become a prediction of who the pollsters think will show up to the election. It is impossible to properly test these turnout models given the rapid shifts in the electorate from election to election.
What About Modeling?
Modeling is adjacent to polling. 538 founder Nate Silver has become the avatar for polling and modeling and has been catching a lot of flak for everything that went wrong. All things considered, 538’s model did not have a disastrous night. It predicted the popular vote would be 8% in Biden’s favor, and the final total is anticipated to be about 4.5% in Biden’s favor. It accurately predicted Biden would win the presidency. As things stand, Florida and North Carolina are the only states 538 missed in the presidential race.
While 538’s performance in 2020 wasn’t bad, it certainly wasn’t good. At its core, a polling model is just a weighted average of polls. By aggregating polls, grading them by quality, and adding corrective factors, the hope is to produce something more reliable than any individual poll. But none of these corrections can compensate if the polls themselves are wrong. In statistics, this is the distinction between precision and accuracy:
Precision: how close repeated measurements are to one another.
Accuracy: how close measurements are to the true value.
The diagram below demonstrates the distinction between these two concepts:
Models are premised on the idea that while polls may not be precise, they are accurate. Some polls overestimate one group, while other polls underestimate them. A well-developed model can provide a weighted average. Models cannot correct for inaccurate polls.
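This limitation is easy to demonstrate with a toy simulation (all numbers are assumed for illustration, not real polls): averaging many noisy polls cancels out random error and improves precision, but a bias shared by every poll passes straight through to the average.

```python
import random

random.seed(0)

true_margin = -3.4   # suppose the true result is Trump +3.4
shared_bias = 4.0    # suppose every poll leans Democratic by 4 points
noise_sd = 3.0       # each poll's own independent random error

# Simulate 30 polls: true value + shared bias + independent noise.
polls = [true_margin + shared_bias + random.gauss(0, noise_sd)
         for _ in range(30)]

average = sum(polls) / len(polls)
# The average is precise (the noise mostly cancels) but inaccurate:
# it lands near true_margin + shared_bias, not near true_margin.
print(round(average, 1))
```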
Modeling has some deeper epistemological problems. Going into election day, the 538 model gave Biden about a 90% chance of winning the presidency, and the Democrats had a 97% chance of winning Congress. What set of outcomes would disprove the model? If Trump won, the model clearly showed that Trump had a 10% chance of winning, and sometimes things with a 10% chance occur. Even if the Republicans won the presidency, the Senate, and the House, that doesn’t invalidate the model because events with a 3% chance sometimes take place. As long as the model gives a non-zero probability to every possible result it can never be wrong!
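There is a standard partial answer to this objection: a single race cannot falsify a probabilistic model, but a track record can be graded. Proper scoring rules such as the Brier score (a general forecasting tool, not something the source attributes to 538) reward models whose stated probabilities match observed frequencies. A minimal sketch with a hypothetical track record:

```python
def brier_score(forecasts):
    """Mean squared error between forecast probabilities and outcomes.
    forecasts: list of (probability, outcome) pairs, outcome 0 or 1.
    Lower is better; always guessing 50% scores 0.25."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical track record: three races with stated win probabilities
# and what actually happened (1 = the predicted favorite won).
confident_model = [(0.90, 1), (0.97, 1), (0.69, 0)]
coin_flipper = [(0.50, 1), (0.50, 1), (0.50, 0)]

print(round(brier_score(confident_model), 3))  # 0.162
print(round(brier_score(coin_flipper), 3))     # 0.25
```

Even with one miss, the confident model outscores the coin flipper here; the catch is that a meaningful score requires many forecasts, and presidential elections only arrive every four years.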
Falsifiability is an important concept in the philosophy of science. Good scientific theories are possible to prove wrong. Imagine you took a ball, tossed it in the air, and instead of falling, it just continued to fly into space. This experiment suggests there is something wrong with the theory of gravity. You have falsified it. Now, imagine someone told you that a lizard living in their bathroom controls the weather. The lizard is invisible, cannot be detected by any known device, nor is it possible to sense the brainwaves it uses to control the weather. It is impossible to prove this theory wrong, but that is a weakness! The theory is so weak it is untestable. Gravity is tested constantly and does not fail. This makes the theory of gravity stronger than an unfalsifiable story about weather-controlling lizards.
When taken together, it is difficult to know how to interpret models like 538’s. They are based on sketchy data, and it is impossible to falsify them. Nate Silver and the 538 team have done a great job building a forecasting model that is as strong as possible, but it still isn’t accurate enough to be useful.
Taking Polling Failures Seriously
It was maddening seeing all the Monday morning quarterbacking about the Democratic Party’s performance in the days after the election. Every single journalist, writer, politician, and blogger was explaining why Biden almost lost, each coming to wildly different conclusions. At a certain level, this is inevitable. There is a running joke that pundits can rationalize any outcome as supporting whatever they already believed. This behavior was loudly and proudly on display this past week.
Note: this section uses the example of leftists accusing centrists of nearly losing the election because Biden’s policies were too moderate. There are also plenty of centrists saying Biden was too “woke” and could have had better results if he had moderated on culture war issues, shut down discussions of abolishing the filibuster, and committed to not expanding the Supreme Court. Both leftists and centrists are guilty of the sloppy reasoning I describe below; I am simply using leftists as an example.
Where things go off the rails is when people use polls to justify why their preferred electoral strategy would have been better. Twitter leftists have been jumping online to claim that Bernie would have crushed Trump, and that the problem was not embracing a more progressive agenda. Really? Maybe Bernie could have won by a wider margin than Biden. It’s also possible that Pennsylvania, Michigan, and Georgia would have slipped away in exchange for losing Texas by a thinner margin.
Leftists online have been spouting off comments like “Medicare for All is popular! And the Green New Deal! Policies people love would have changed the election!” How do you know that? Polling? “Medicare for All” is narrow-to-moderately popular according to polls by the Kaiser Family Foundation, Gallup, and Morning Consult. How do you know that this polling is right when so much other polling is wrong? How would it break down in state-by-state results?
It may seem nihilistic, but do we know anything about public opinion? If pollsters got the Maine Senate race so wrong, how are we supposed to believe them about anything?
Election polling has a real advantage over other polling: it gets tested. The only reason we know the Maine Senate and Florida presidential polling were off was because we had an election. Voter models have been refined and sharpened over decades. Opinion polling on questions like Medicare for All, the Green New Deal, or police reform is impossible to verify. While these policies appear to be somewhat popular, it is possible that the polling is wrong. They could be somewhat unpopular, or very popular! That is a really large range…
Public opinion polling also struggles with framing and ambiguity. Questions like “Who do you plan to vote for in the presidential election?” are unambiguous. “Biden” and “Trump” are clearly defined people, meaning there is no room for interpretation. This is not true when asking about ideas like “police reform”. Take this question from Gallup:
Which of the following best describes your view about changes that may or may not need to be made to policing in the United States?
Major changes needed, minor changes needed, no changes needed.
What is a “major” change versus a “minor” change? Should the police be more or less strict? What is meant by “police”? Does it include federal law enforcement agencies like the FBI or ICE, or do people understand this as only applying to local police? And so on. If pollsters can’t get an accurate answer about concrete questions like “Who are you voting for?” there is no reason to take public opinion polling seriously.
Critiques of this type are not new, but there is an added urgency to them in a world where polling is failing by wide margins. What do polling results actually mean? How do we make sense of public opinion in a world where polling can fail so badly?
What is the Point of Polling?
It is worth taking a step back. Why do we poll? Market research and public opinion polling have brought in an estimated 20 billion dollars in revenue over the last three years. Despite this, the results are still laughably bad, as seen in Florida and Maine. So why even use polling?
Public opinion polling plays an essential role in counteracting vocal minorities. If you were following the news casually, you might believe gun control is a 50–50 issue. This is not true. Assault weapon bans, high capacity magazine bans, and universal background checks all have high approval according to the available polling. NRA-style gun enthusiasts are incredibly loud, which creates the appearance that gun control is unpopular. Polling acts as a counterweight to groups like the NRA.
Money is also speech. Large corporations or wealthy individuals can spend their way into creating the illusion of support. Advertising, sponsored groups, and events can create the appearance that people favor pro-business policies like relaxed regulations or lower corporate taxes. This phenomenon is known as “astroturfing,” where monied interests pay to create the appearance of grassroots interest. The Tea Party movement, which was heavily subsidized by the Koch brothers, is a famous example of astroturfing. Without polling, it is difficult for politicians to tell the difference between real energy and clever lobbying.
Polling also plays an essential role in pushing back against elected officials acting like royalty. Politicians often talk as if an electoral victory proves that everyone agrees with them on policy. Just because Biden won does not mean he has public support for all aspects of his agenda. People vote for and against a given candidate for a lot of complicated reasons, and it is impossible to disentangle each voter’s motivations. Polling checks to make sure politicians are acting in line with the general will.
Electoral strategies are also determined via polling. Does the public want “Medicare for All” or a “Public Option”? Do people want a ban on fracking? A big part of winning elections is identifying popular policies and running on them. Without polling it is impossible to know what voters actually want.
This goes beyond policy but is also about allocating resources geographically. Part of the reason Michigan’s Senate race attracted little attention was that the Democrat looked to be a heavy favorite. In reality, the Michigan election was close, meaning it would have been more strategically optimal to reallocate resources. One of the reasons Biden won Georgia was because surprisingly strong polling encouraged the campaign to invest in the state. Polling also identified Pennsylvania as the potential “tipping point” state, meaning it was most likely to control the outcome of the election. Biden visited Pennsylvania 10 times in the last two months of the election — far more than any other state.
For the public, polling can feel like watching the score in a football game. You cheer when your team scores and get upset when your opponents pull ahead. But our perception of the score doesn’t make a difference in the outcome of the game. For the players and the coaches inside the game, they will choose plays based on the score. If a team has a lead they will play conservatively to reduce the chances of a fumble, while a team that is losing will attempt high-risk plays. Without the score, it is difficult to calibrate your strategy.
Polling isn’t just about political commentary. It provides a lens into what people want and believe. The failure of political polling is a sign that all polling might be broken. In that case, we need to ask hard questions about what this means for our society.
A Post-Polling World
Of the various polling-backlash stories circulating in the wake of the 2020 election, few are acknowledging the role of polling outside of punditry. Yes, 538 is a public-facing product designed to feed political junkies a constant drip of pseudo-news. It is worth asking whether products like this are healthy for our political discourse. On the other hand, polling plays an essential role in politics, elections, and “discourse” more broadly. How are we supposed to reason and debate as a society without polling to guide our conversations?
If polling has failed, we need to figure out where we go from here. We must internalize that our view into the public’s consciousness is far weaker than we had thought. What does this mean for our politics? Where do we go from here?