Trumped Up Polls

Or, why the media needs to learn to stop worrying and love analytics.


People think that the problem with media polling in the Republican primary is that we’re deciding who participates in debates based on surveys with 200 to 300 respondents that carry an extremely high margin of error. That’s not totally right. It’s that the people being surveyed by media polls bear only a glancing resemblance to the people who will vote in next year’s Republican primaries and caucuses, making it more likely that the polls as a whole are systematically off.
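For a sense of scale, the conventional 95 percent margin of error on a simple random sample of 200 to 300 respondents works out to roughly plus or minus six or seven points, before accounting for design effects from weighting, which typically push the true error higher. A quick back-of-the-envelope calculation, assuming simple random sampling:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n,
    evaluated at proportion p (p=0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (200, 250, 300):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
# n=200: +/- 6.9 points
# n=250: +/- 6.2 points
# n=300: +/- 5.7 points
```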

Let’s take a look at today’s NBC/Marist survey of Iowa voters. Like nearly all public surveys, it’s a random digit dial (RDD) survey that gives everyone in the population with a phone the same chance of being included. Respondents are asked a handful of questions to qualify them for the survey and for the Republican primary question: are they a registered voter, and which political party do they identify with or lean towards?

NBC/Marist interviewed 919 registered voters in Iowa. Of these, 342 respondents — 37% — were asked the question about the Republican nomination after being qualified as “potential caucus-goers” by expressing a general preference for the Republican Party.

The problem is that the universe of likely caucus attendees is far smaller than the 37% of registered voters the survey screened into its caucus universe. In January 2012, there were 2,112,655 registered voters in Iowa. A total of 121,501 people showed up for the Republican precinct caucuses on the evening of January 3, 2012. That’s 5.75% of registered voters in the state at the time, a far cry from 37%.

This is not a problem if the 5.75% look like the broader 37% who generally identify with the Republican Party. But all the available evidence suggests that they don’t, especially in caucus states. Participation levels shape voter preferences. The people who will show up for hours on a chilly Iowa night have different preferences from the larger universe of people who will vote in a Republican primary, who in turn look different from those who will vote Republican in the fall but sit out the primary. Media polling right now doesn’t get more specific than the loosest, most casual definition of a Republican.

This year, the effect of including low-propensity voters in primary polls may be especially great because of a certain mega-celebrity who starts with near total name recognition: Donald Trump.

Without knowing anything about the polling, we might hypothesize that Trump would do better amongst the “low-information voters” who might lean Republican but generally don’t follow politics or participate in primaries, yet are nonetheless included in the polls used to determine eligibility for the August 6th debate.

And there is evidence for this in the polls released so far that show Trump surging to a lead in the Republican primary. In this week’s ABC News/Washington Post poll, Trump received his highest levels of support from people at the margins of the Republican primary process.

In the poll, Trump’s supporters leaned to the left of the typical Republican, belying the idea that his rise reflects a disturbing new far-right tilt in the GOP. In the ABC/Washington Post survey, Trump receives 27 percent support from moderate-to-liberal voters, 24 percent from somewhat conservative voters, and 17 percent from very conservative voters. Amongst the moderate-to-liberal group, Trump wins by 14 points. Amongst very conservative voters, he loses by 8.

Trump has more support amongst Independents than he does amongst Republicans (25 percent vs. 22 percent).

And Trump receives fully four times as much support amongst those with no college degree (32 percent) as he does amongst college graduates (8 percent).

In other words, Trump performs best amongst the groups least likely to vote in a Republican primary.

This doesn’t mean these are bad polls. They could very easily be a highly accurate snapshot of the population they sought to survey. The trouble is that this population doesn’t match up with the likely electorate (something that is easy to get wrong in lower-turnout primaries). As my colleague Kristen Soltis Anderson points out, Republican-leaning independents make up nearly half of the subsample in today’s CNN poll, but traditionally make up a quarter or less of the GOP electorate in early voting states. In many states, these independents are barred from voting in party primaries at all.

We would not need to infer anything about Trump’s support from casual Republicans if the media would do what most political pollsters do and use voter files to build their survey samples. A voter file carries with it a respondent’s full voting history, and the ability to generate an individualized probability that any respondent will turn out to vote in the next election. With a voter file tied to data from their surveys, media pollsters would be able to answer quickly whether Trump’s support comes from people who are actually likely to vote in Republican primaries.
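To make that concrete, here is a minimal sketch of the kind of turnout-propensity scoring a voter file enables. The data is simulated and the logistic-regression setup is an illustrative assumption, not a description of any particular pollster’s model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical voter file: one row per registrant, one column per recent
# party primary or caucus (1 = voted, 0 = stayed home).
history = rng.integers(0, 2, size=(5000, 4))

# Stand-in outcome: whether the registrant showed up for the most recent
# caucus. In practice this comes straight from the voter file, not simulation.
turned_out = (history.sum(axis=1) + rng.normal(0, 1, size=5000) > 2.5).astype(int)

# Simple propensity model: past participation -> probability of turning out.
model = LogisticRegression().fit(history, turned_out)
propensity = model.predict_proba(history)[:, 1]  # one score per registrant
```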

Just as the media is currently including too many casual voters in its primary surveys, there’s a risk that polls conducted with data from voter files could include too few. The ability to use data to dial in on any group of voters sometimes leads pollsters to underestimate the impact of unlikely voters. The answer is not to calibrate surveys so that we’re only talking to the 15% or so of Iowa voters who will vote in either party’s caucus, but to talk to a somewhat larger group and weight respondents according to our best estimate of the probability that each individual will turn out. A respondent with a perfect history of showing up on caucus night should be weighted more heavily than a Republican leaner who only votes in general elections. A “listed sample” survey technique must also account for people who aren’t on the list at all because they haven’t yet registered, as well as for movers and newly registered voters.
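A rough sketch of what that weighting looks like in practice; the respondents, candidate labels, and propensity scores here are invented purely to illustrate the mechanism:

```python
import pandas as pd

# Hypothetical respondents matched back to a voter file, so each carries an
# individualized turnout propensity alongside their ballot-test answer.
respondents = pd.DataFrame({
    "candidate":  ["A", "B", "A", "C", "A", "B"],
    "propensity": [0.10, 0.90, 0.15, 0.85, 0.05, 0.70],
})

# Unweighted share: every casual leaner counts as much as a sure caucus-goer.
unweighted = respondents["candidate"].value_counts(normalize=True)

# Propensity-weighted share: a perfect-history caucus-goer counts far more
# than a general-election-only leaner.
weighted = (respondents.groupby("candidate")["propensity"].sum()
            / respondents["propensity"].sum())

print(unweighted.round(2))  # Candidate A leads on the raw count of respondents
print(weighted.round(2))    # the lead shrinks once turnout propensity is applied
```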

The end result of this process still looks like a survey, but one that borrows heavily from the world of statistical modeling and analytics that proved more accurate than conventional polling in predicting the outcomes of both the 2012 and 2014 elections. If the media wants to see the world the way the campaigns do, and truly understand how they make decisions, it should use the same techniques serious campaigns use to inform their strategy.

Polls at this stage of the process are not hugely important or predictive. It’s telling that candidates who don’t need to worry about making the cutoff for the Fox News debate aren’t spending resources trying to boost their numbers either nationally or in early states. There’s little point, because polls now don’t equal victory next winter and spring.

But for those who are on the cusp of making the debate, the polls are make or break, and could determine who has a chance to advance or not. Whatever you think about the 10-candidate limit (I personally think there’s some merit to imposing a chilling effect on new entrants), the criteria for choosing the 10 candidates should be rooted in reality. Sound polling, analytics, and data should be used to gauge opinions amongst the people who are actually likely to vote, not casual passers-by.

Right now, we can’t be certain this is the case.