LaGov Polling: Build Me A Model

To read my piece on LAGov primary polling, click here. To read my piece on LAGov runoff polling, click here.

John Bel Edwards clobbered David Vitter in Louisiana’s Gubernatorial runoff. To many, this was a shocking result. Louisiana has turned increasingly Republican over the past decade. In Presidential races, Louisiana voted for Mitt Romney by 17+ points, and John McCain by a similar margin. All of Louisiana’s statewide elected officials are Republicans. Three-term Democratic US Senator Mary Landrieu was easily beaten by Republican Bill Cassidy just a year ago.

Yet on that Saturday night, John Bel Edwards claimed a massive victory over Vitter, 56% — 44%.

Did we know this was going to happen?

According to the polling averages, the answer is a definite “yes.”

During the runoff, I kept track of all of the publicly available polls and calculated a simple “average” of each candidate’s standing to aggregate the data. While this worked in this particular case, I would defer to other formulas for greater accuracy. Nate Silver, formerly of the New York Times, developed a weighted poll average that graded polls on their sample size, survey technology, recency, and more. Introducing this type of statistical weighting can be very effective, but it can also be extremely treacherous. Every factor you apply to a poll to “weight” it introduces potential error, and compounding error can rot the raw data, twisting it into conclusions that simply aren’t warranted.

My excuse for my more simplistic endeavor is easy: Nate Silver is a statistician. Despite an undergraduate course or two in statistical econometrics, I am not. So I kept it simple.
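For those curious what “simple” means in practice, here is a minimal sketch of the arithmetic in Python. The three polls below are placeholders, not my actual tracked dataset:

```python
# Minimal sketch of a simple (unweighted) poll average.
# The three polls below are placeholders, not the actual tracked polls.
polls = [
    {"edwards": 52, "vitter": 36},
    {"edwards": 50, "vitter": 40},
    {"edwards": 51, "vitter": 38},
]

def simple_average(poll_list, candidate):
    """Unweighted mean of a candidate's standing across all polls."""
    return sum(p[candidate] for p in poll_list) / len(poll_list)

print(simple_average(polls, "edwards"))  # 51.0
print(simple_average(polls, "vitter"))   # 38.0
```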

In the final Louisiana Governor’s race poll average, I found JBE receiving 51% of the vote and David Vitter receiving 38%. Undecideds accounted for the rest. This is a margin of 13 points.

The final margin was 12 points.

Pretty close. Right?

Not so fast.

One of the great challenges of polling is not the accumulation of data. Anyone can build a survey and begin asking questions. It is, however, what you do with that data once it is collected that makes or breaks a poll. This is what divides polling from anecdotal political coverage. As I explained in my previous piece, polls may be predictive of an outcome, but truly tell you only what voters feel at the time the surveys were collected. Even this more modest goal of properly contextualizing voter attitudes at a specific time is difficult.

This is where “modeling” comes to the fore. It is a little-discussed, but crucially important, aspect of polling that can make or break a survey.

In short, modeling is the set of statistical adjustments pollsters apply to raw survey data. These “adjustments” attempt to make the much smaller set of people surveyed match the characteristics of the much larger group they are meant to represent. This is especially difficult in elections, due mainly to uncertainty about who will actually cast a ballot.

Pollsters attempt to deal with this uncertainty in a number of ways. First, pollsters that want to test an electoral outcome pick their interview subjects from a list of registered voters. This ensures that those asked are at least eligible to cast a ballot.

Since turnout inevitably falls below 100%, pollsters must then weed out people who are unlikely to show up on election day to vote. This is a dicey problem, but it can be mitigated by “screening” voters during the survey. Interviewees are asked at the outset of the survey, no matter what the technology (robo IVR, live calls, online), to rate their likelihood of voting in the upcoming election. (For surveys that are conducted far from an election — say three to six months in advance — some pollsters will only survey “registered voters,” because most people cannot tell you if they are “likely” to vote in an election that is still months away.) If a voter says they are not likely to vote, the survey is terminated, thus weeding out people who will not affect the outcome of the election.
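Mechanically, the screen is nothing more than an early exit from the interview. A hypothetical sketch, with answer categories I invented for illustration rather than any pollster’s actual script:

```python
# Hypothetical likely-voter screen: end the interview early if the
# respondent rates themselves unlikely to vote. The answer categories
# below are illustrative, not any pollster's actual wording.
LIKELY_ANSWERS = {"definitely voting", "probably voting"}

def continue_interview(self_rated_likelihood: str) -> bool:
    """Return True if the respondent passes the likely-voter screen."""
    return self_rated_likelihood.strip().lower() in LIKELY_ANSWERS

for answer in ("Definitely voting", "Probably not voting"):
    status = "continue to ballot questions" if continue_interview(answer) else "terminate interview"
    print(f"{answer!r}: {status}")
```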

I will pause here to say that a “likely voter screen” is clearly fraught with potential error. People may lie. Others who are wise to the screen may say that they are “likely” to vote in order to hear the rest of the survey questions. And then there are those who intend to vote in the abstract, but fail to do so on election day. Despite the pitfalls, this method is among the best ways to model the electorate on election day.

Once interviewees have been culled based on their likelihood of voting, the data collected must be “weighted.” Unfortunately, this is a process rife with controversy and suspicion — and probably for good reason. A poll interpreted through different weighting processes can say radically different things.

In 2013, Nate Cohn (then of the New Republic, now of the New York Times) took popular Democratic pollster Public Policy Polling (PPP) to task over their weighting decisions. Among the charges Cohn lobbed against PPP were accusations of “weighting” results with an unpublished question (in this case, “Who did you vote for in the 2008 Presidential election?”). PPP used those numbers to adjust their political polls but did not reveal this practice until Cohn confronted them via email. Did this adjustment skew their results? It is still hard to say.

Weighting poll data can be done in a variety of ways, but it often relies on adjusting the sample along a single variable (as PPP did with the 2008 Presidential vote above). Some polls adjust for entire demographic datasets; while this is a more accurate procedure, it is also mathematically more complex.
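To make the single-variable case concrete, here is a minimal sketch of how such weights might be computed: each respondent’s weight is the target share of their group divided by that group’s share of the raw sample. The sample and target shares below are invented for illustration, not taken from PPP or anyone else:

```python
from collections import Counter

# Hypothetical raw sample, keyed by a single weighting variable
# (here, recalled 2008 presidential vote, the question PPP reportedly used).
# Both the sample and the target shares are invented for illustration.
respondents = ["obama", "mccain", "obama", "mccain", "mccain", "obama", "obama"]
target = {"obama": 0.40, "mccain": 0.60}

counts = Counter(respondents)
n = len(respondents)

# Weight = target share / observed share, applied to everyone in the group.
weights = {group: target[group] / (counts[group] / n) for group in counts}

for group, weight in weights.items():
    print(f"{group}: raw share {counts[group] / n:.2f}, weight {weight:.2f}")
```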

Now, with that prologue out of the way, we turn to Louisiana’s gubernatorial runoff.

As noted above, the polling average properly predicted the outcome on Saturday night. The eighteen tracked polls in my dataset had a total sample of 12,720 Louisianians. The polling dates ran over three weeks, with the earliest conducted just days after the primary election and the latest sampled on November 19th, two days before the runoff.

Yet, in this heap of polls, pollsters used a variety of models to predict the November 21st electorate. Almost all of those models wrongly estimated one particular variable — the racial composition of the electorate.

I’ll tell the story through three examples.

Red Racing Horses

On November 14th, one week prior to the election, Red Racing Horses (RRH, a Republican/enthusiast polling consortium) released a poll showing Edwards leading Vitter by six points, 48%-42% with 10% undecided. This was clearly far from the final result. Were voters much more torn between the candidates a week prior? Hard to say, since contemporaneous polls showed a much wider margin — this survey was definitely an outlier. Furthermore, around the time of its sampling, Early Vote indications showed an Edwards surge, not a Vitter rebound. Very little other evidence gave credence to RRH’s topline conclusion.

So what went wrong? We can speculate about a few things. The sample size was small — only 359 likely voters — which works out to a margin of error of more than 5% for the Louisiana electorate. That is a lot of statistical noise, raising the possibility that sampling error alone accounts for the clearly out-of-sync result.
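That figure is easy to verify with the standard margin-of-error formula, assuming a simple random sample at 95% confidence and maximum variance (p = 0.5):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(359):.1%}")  # about 5.2%
```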

There is another factor, however, that might be more instructive. Delving into the racial composition, we see a much more probable culprit. According to their final tally, their potential electorate was only 26% African-American. According to the latest estimates, the runoff electorate turned out to be about 30% African-American. Why does this matter so much in Louisiana?

Historically, and contemporaneously, African-Americans vote for Democratic candidates by near unanimous margins. For example, in the 2014 Senate primary, Senator Mary Landrieu won 94% of the African-American vote. According again to estimates, John Bel Edwards may have won up to 97% of the African-American vote.

So what? Well, Louisiana’s registered voters are approximately 30% African-American. If a candidate appeals to African-American voters, and those voters turn out on election day near their registration share of 30%, she makes significant headway towards 50%+1, otherwise known as the winning threshold.

A quick illustration. If candidate A receives 95%+ of the African-American vote (yielding ~28.5% of the total vote), she needs to win only ~30% of the other 70% of the electorate to get over 50% of the total vote. This has led to the common refrain of the “30/30” rule for Democrats in Louisiana: Win 30% of the non-African-American vote + ensure that the electorate is 30% African-American and (under normal voting preferences) the Democrat wins.
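The arithmetic behind the rule is simple enough to check directly. The support levels below are the stylized ones from the illustration above, not measured figures:

```python
def democratic_share(black_share, black_support, other_support):
    """Statewide vote share from the electorate's racial split and support levels."""
    return black_share * black_support + (1 - black_share) * other_support

# 30% African-American electorate, 95% support, 30% of everyone else:
print(f"{democratic_share(0.30, 0.95, 0.30):.1%}")  # 49.5% -- just shy of a majority
print(f"{democratic_share(0.30, 0.95, 0.31):.1%}")  # about 50.2% -- over the top
```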

[Image: RRH survey crosstabs, LAGov runoff, 11/14/15]

Back to RRH: Despite their too-close topline, their poll did provide an important insight. According to their crosstabs, Edwards was winning 39% of the white vote. Combined with an even 50%/50% split of “other races” (although Democrats tend to win Asians, Latinos, and other groups by margins of up to 65%/35%), this implied a much larger advantage for Edwards than the 48%–42% topline revealed. Remembering our rule from above, this meant that if the electorate had been modeled at 30% African-American, Edwards was actually ahead in this poll with approximately 55.8% of the vote — nearly the precise final margin.
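To see how that re-modeling works mechanically, here is a rough recomputation under assumptions I am supplying myself: a 30% African-American electorate voting roughly 95% for Edwards, the 39% white support from RRH’s crosstabs, and a small “other” bloc split evenly. RRH did not publish every subgroup share, so treat this as an approximation of the adjustment, not their math:

```python
# Rough re-weighting of the RRH crosstabs to a 30% African-American electorate.
# Shares and support levels marked "assumed" are my own stand-ins.
electorate = {
    "african_american": {"share": 0.30, "edwards": 0.95},  # assumed support level
    "white":            {"share": 0.65, "edwards": 0.39},  # RRH crosstab figure
    "other":            {"share": 0.05, "edwards": 0.50},  # assumed even split
}

edwards_total = sum(group["share"] * group["edwards"] for group in electorate.values())
print(f"{edwards_total:.1%}")  # roughly 56%, in the neighborhood of the 55.8% figure above
```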

RRH’s poll was actually more predictive than the toplines might indicate. Maybe it was partisan cheerleading that led them to model the electorate incorrectly. There was already clear precedent for thinking the electorate would be more than 26% African-American: the primary electorate was approximately 29% African-American, and all indications (including the early vote) showed more enthusiasm for the runoff. Yet their model produced an out-of-sync result that rendered their poll less accurate.

Marbleport/Hayride and MRI

The RRH example above most easily illustrates the problem modeling poses for “predictive” polling. But two other pollsters showed how a model can shape a result.

The Hayride, another group of conservative activists, also released polling in the runoff with their partner Marbleport. To review: in the primary, the “Marbleport/Hayride” survey was fairly anodyne in its disposition. It correctly predicted the candidates’ order of finish and, in a much more difficult-to-poll jungle primary, pointed in the direction of the actual results. Not perfect, but certainly respectable and useful information.

In the runoff, for whatever reason, Marbleport/Hayride released only one survey. That poll also showed Edwards leading Vitter by six points, 48%-42%. Released about ten days before election day, it presaged the RRH result, but it also made a similar mistake. The poll’s modeled electorate was only 27% African-American, upon which they confidently commented, “[t]hat’s about what we expect the electorate to be.” It was not. Again, absent pure partisanship (also known as “unskewing”), there was little evidence for releasing a poll with an electorate modeled as such.

Now, finally, I want to address how Verne Kennedy (of MRI Polling) modeled the electorate. Mr. Kennedy consistently delivered the most predictive results of this electoral cycle. Yet he trafficked in some amusing modeling “Hamlet” throughout the season.

[Image: Kennedy’s many models in his final poll]

What do I mean? Mr. Kennedy presented the results of his polls with up to three distinct electorates. This was quite helpful, and it truly illustrates my earlier points, but it can be quite confusing to the average observer. As the image above shows, Mr. Kennedy presented several options — and revealed the “raw,” unadjusted data from which he was working. His final “forecast,” which we will discuss later, was very accurate. But something strange is happening here.

Since Mr. Kennedy conducted his rolling tracking polls (meaning he sampled several days a week, each week, to illustrate trends over time) for a “private” group of businessmen, I want to be clear that he did not personally present all of his polling to the public. The final poll he discussed in the media was his Thursday night tracker. He allowed it to be posted on BayouBuzz, here.

In his final “forecast,” which I received multiple times via email, Kennedy made a final call of Edwards 55%, Vitter 45%. This included one extra night of polling beyond his Thursday release (which showed Edwards 52%, Vitter 40%). The strange thing about Mr. Kennedy’s prediction is that it purports to show an electorate that is 26% African-American giving Edwards a 10-point win. That is roughly the same model under which RRH showed only a six-point Edwards lead, and it is close to the 27% African-American electorate in the much earlier Marbleport/Hayride poll that showed Edwards with the same six-point lead.

So, was Kennedy just guessing in his final “forecast?” Maybe.

If you look closely at Mr. Kennedy’s long final write-up, you can see that he actually did model the electorate correctly. He just ignored it.

With an electorate that was 30% African-American, Mr. Kennedy’s final poll called the race at Edwards 55%, Vitter 38%. Mr. Kennedy’s process includes “assigning” undecided African-American voters along traditional patterns (voting 95%+ for Democrats). The undecided 7% (bringing the total to 100%) in the 55–38% result was therefore almost entirely white. If those undecided voters break in a similar pattern to the decided voters, Vitter wins most of those seven points. That puts the final “adjusted” result at roughly 56–44% with an electorate that is 30% African-American. Otherwise known as the final tally on election day.
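Here is a hedged sketch of that allocation step. Kennedy’s exact undecided split was not published, so the breaks toward Vitter below are assumptions for illustration:

```python
# Allocating Kennedy's 7-point undecided bloc in the 55 - 38 - 7 poll.
# The breaks toward Vitter below are assumptions for illustration;
# the write-up implies the mostly white undecideds lean heavily his way.
edwards, vitter, undecided = 55.0, 38.0, 7.0

for vitter_break in (0.60, 0.70, 0.85):
    e = edwards + undecided * (1 - vitter_break)
    v = vitter + undecided * vitter_break
    print(f"{vitter_break:.0%} of undecideds to Vitter -> Edwards {e:.1f}, Vitter {v:.1f}")
# The heavier the break toward Vitter, the closer the adjusted topline
# gets to the 56 - 44 election-night result.
```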

Sure, the guesstimates we just made could be wrong. Each successive transformation builds more error into the raw data.

Yet, the effects of modeling are laid bare by these transformations. Each attempt at casting polling data through a model relies on the pollster’s (hopefully) informed choices. Mr. Kennedy’s final forecast, colored by his polling but reliant on his gut, proved to be more lucky than good. If the electorate had been only 26% African-American overall, the election could have been a nail-biter, even with Edwards winning more than 37% of the non-African-American vote.
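A quick check of that counterfactual, reusing the stylized formula from the 30/30 sketch above (all support levels assumed), shows how much the racial composition alone moves the topline:

```python
def democratic_share(black_share, black_support, other_support):
    """Statewide vote share from the electorate's racial split and support levels."""
    return black_share * black_support + (1 - black_share) * other_support

# Assumed support: roughly 95% among African-American voters, 37% among everyone else.
for black_share in (0.26, 0.30):
    share = democratic_share(black_share, 0.95, 0.37)
    print(f"{black_share:.0%} African-American electorate -> Edwards {share:.1%}")
# About 52% at 26% versus about 54% at 30%: the composition alone moves the race several points.
```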

Better To Be Lucky Or Good?

Despite the issues noted above, I still think the increased emphasis on polling in politics is a good thing. As I’ve written before, it generally keeps the pundits honest.

But polling is not immune to “spin.” By modeling an electorate, a pollster places their own stamp on the data they collect. To be sure, this is essential to producing polls that properly contextualize voter attitudes. Nevertheless, the practice is rife with potential pitfalls. Hence the common refrain, “they can make polls say anything.”

For the uninitiated, deploying a fancy set of numbers can certainly bolster an argument or color any conversation. The key to picking the wheat from the chaff is checking the pollster’s model first. Who do they think will vote? In what proportion? Does this reflect history? Or does it foretell a unique electoral atmosphere? This is crucial to understanding any survey.

In sum, a model is a pair of glasses. Seeing the world through the wrong lens, like viewing data through the wrong model, can distort your view. Getting the right prescription is essential, and qualified, experienced professionals will give you the best fit. Without it, you might just be blind.

Addendum: Why Race?

Some may question why it makes sense to filter polling through the prism of race. It is a fair point — anyone can tell you an anecdotal story of a person of this ethnicity or that race who thinks this way or that. We are all individuals, of course. Yet (runoff) elections set up a binary choice. Without delving into the historical context of racial partisanship, it is clear that if you have to sort people by some demographic, some of the starkest differences in voting behavior can be observed through a racial lens. Clear differences help us make good guesses when we collect imperfect data. Other groupings might also work, but maybe not as well.