About Likely Voter Samples

Charles Franklin
6 min read · Sep 7, 2016

I feel the need for a tweetstorm on likely voter samples. Today I'm getting lots of questions about a poll with results many don't like. 1/

Tomorrow could be the other way. Let’s worry about methods and LV before knowing if we like a particular result. 2/

Live by the method, die by the method. But don't blame the method when you don't like the result, or love the method when you like the result. 3/

Identifying likely voters matters because only those who show up affect the election. Opinions of nonvoters matter not at all for the outcome. 4/

But likelihood of voting is not a fixed attribute. It responds to circumstances, enthusiasm, closeness of state. 5/

In fact, shifts in who is LV can drive changes in polls at least as much as changes in candidate preference. 6/

For this reason, I prefer the much more stable RV population as the basis of assessing change during the campaign. 7/

But as the election approaches, people become more certain of voting (or not), and it makes sense to use LV samples late in the campaign. 8/

So how to measure LV? There are four broad approaches. Most media polls use #1 or #2. Campaigns more apt to use #3 or #4. 9/

Many polls (including my @MULawPoll) use a very simple LV screen. In my case: Registered? Certain to vote in November? 10/

I also include unregistered who are sure they will register by November. But key is certainty of voting. 11/

My question offers "absolutely certain to vote", "very likely", "50–50", and "don't think will vote". 12/

I use those "absolutely certain" as my definition of LV. In the last two @MULawPoll samples, 77% say they are absolutely certain. 13/

The last 2 polls had 11% and 15% "very likely". In Wisconsin we'll see turnout around 85–88% of RVs in November. Late polls reach LV shares in the 80s. 14/
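For concreteness, here is a minimal sketch of that kind of two-part screen; the field names and response handling are illustrative only, not the actual @MULawPoll instrument.

```python
# Minimal sketch of a simple LV screen; field names are illustrative only.
CERTAINTY_RESPONSES = {"absolutely certain", "very likely", "50-50", "don't think will vote"}

def is_likely_voter(registered: bool, will_register_by_november: bool, certainty: str) -> bool:
    """LV = registered (or sure to register by November) AND 'absolutely certain' to vote."""
    if certainty not in CERTAINTY_RESPONSES:
        raise ValueError(f"unexpected response: {certainty!r}")
    return (registered or will_register_by_november) and certainty == "absolutely certain"
```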

Some LVs won't vote, and some "very likely" will, but the record shows this simple measure performs well. 15/

Aside from the vote itself, this LV method also matches exit poll results well, validating the measure. 16/

[Charts: 2016 Republican Presidential Primary · 2016 Democratic Presidential Primary · 2014 Wisconsin Governor Election]

BUT this doesn’t mean a simple measure is “best”. Pew and Gallup and others use a lengthy battery of questions… 17/

… such as: Did you vote last time? Where is your precinct? In the elections you've been eligible for, have you always voted? 18/

They have fine-tuned these batteries to improve LV identification beyond a simple screen. In principle this is a better measure. 19/

But in practice, it isn’t evident that their LV measure is significantly better than a simple measure. 20/

In fact, it may introduce more uncertainty. Variation in the several components induces variation in the LV classification. 21/

So #1 and #2 are basically: Just ask people if they will vote, either simply or with several items. 22/

The advantage is people’s responses will reflect their enthusiasm and how it waxes and wanes. That is important as LV can change. 23/

As one side waxes and the other wanes, the LV population will shift and so will the vote. That is a feature, not a bug. 24/

But it is a feature that recognizes elections depend both on turnout and on preference. That is a more complicated picture. 25/

And so, focusing on RV sample is more stable, but ignores turnout to focus on preference. LV brings turnout in, but is uncertain. 26/

How else might we measure LV? Method #3 uses a statistical model of some sort. 27/

Each election, the Current Population Survey, a massive federal survey used to measure the unemployment rate, includes turnout questions. 28/

Considered the best available measure, the CPS measures registration and turnout, but not vote choice. 29/

From the CPS we have a good array of demographics including age, education, sex, race, region. But not Party ID. 30/

We can estimate a model of who turns out from those demographics, and then apply that model to our own sample of voters. 31/

That can be in the form of predicted probability of vote based on CPS model and demographics measured in both CPS & our poll. 32/

Or it can be in the form of the estimated demographic makeup of voters, to which our sample of LVs is weighted. 33/
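As a rough sketch of what the predicted-probability version might look like (column names are hypothetical, and a real CPS extract needs far more preparation than this), one could fit a turnout model on CPS-style demographics and then score the poll sample with it:

```python
# Sketch of Method #3: fit a turnout model on CPS-style demographics, then
# apply it to a poll sample. Column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

DEMOGRAPHICS = ["age", "education", "sex", "race", "region"]  # measured in both CPS and poll

def fit_turnout_model(cps: pd.DataFrame):
    """Estimate P(voted) from demographics in a CPS-like file with a 0/1 'voted' column."""
    X = pd.get_dummies(cps[DEMOGRAPHICS], drop_first=True)
    model = LogisticRegression(max_iter=1000).fit(X, cps["voted"])
    return model, X.columns

def turnout_probability(model, columns, poll: pd.DataFrame) -> pd.Series:
    """Predicted probability of voting for each poll respondent."""
    X = pd.get_dummies(poll[DEMOGRAPHICS], drop_first=True).reindex(columns=columns, fill_value=0)
    return pd.Series(model.predict_proba(X)[:, 1], index=poll.index)
```

The weighting version is analogous: rather than scoring individuals, use the model (or simple CPS cross-tabs) to estimate the demographic makeup of voters and weight the poll's LVs to those targets.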

The advantage is that the CPS sample is over 100,000 cases with a very high response rate, and multiple years can be pooled. 34/

The disadvantage is the CPS is always the last election, not this one. So we have to estimate this year from past data. 35/

And the CPS doesn't have an enthusiasm or interest measure. Just registration, turnout, and demographics. 36/

So current polls have a better grasp of short-term enthusiasm, but the CPS may be a better measure of demographic propensity to vote. 37/

Method #4 uses a listed sample from the voter lists, a relatively new resource much used by campaigns. 38/

The voter list for most states provides not only names but also past voting history, that is, which elections each person voted in and which they skipped. 39/

The advantage of Registration Based Sampling (RBS) is that we KNOW these people are registered, and we have their vote history. 40/

The disadvantage is the voter list necessarily omits new voters who register for the first time this year. 41/

With the voter list, one can establish a cutoff for how many past elections the person must have voted in to be considered an LV. 42/

That cutoff can be based on a statistical model as well: age, sex, and past vote history combined to give a predicted probability of voting. 43/
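A minimal sketch of the cutoff version (hypothetical column names; real state voter files differ), with the model-based version simply feeding the same history flags, plus age and sex, into a turnout model like the CPS sketch above:

```python
# Sketch of Method #4 on an RBS (voter-file) sample. History flags are
# hypothetical 0/1 columns; real voter files vary by state.
import pandas as pd

HISTORY = ["voted_gen_2012", "voted_gen_2014", "voted_pri_2014", "voted_pri_2016"]

def lv_by_cutoff(voter_file: pd.DataFrame, min_past_votes: int = 2) -> pd.Series:
    """Treat as LV anyone who voted in at least `min_past_votes` of the listed elections."""
    return voter_file[HISTORY].sum(axis=1) >= min_past_votes
```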

Campaigns like RBS because it also indicates which district the voter lives in for Congress, state legislature, and municipality. 44/

There is no reason in principle not to mix an RBS sample with questions about enthusiasm or certainty of voting, exploiting vote history 45/

along with short-term enthusiasm. But at some point, a criterion must be specified for who to treat as an LV and include in the LV sample. 46/

How do we validate these LV measures? Past poll accuracy based on the method. Congruence with other data (such as the exit poll comparison). 47/

Or internal comparison, such as how well vote history predicts subsequently measured turnout in the voter file. In short, empirically. 48/

Each of these methods works "pretty well". I compare my simple LV with CPS-based models and weights and with exit results. 49/

I’ve seen no evidence in our polling that an alternative is unambiguously superior to my measure, but the others are not… 50/

…unambiguously worse either. So I opt for a simple-to-understand method that works pretty well. But if I were doing 51/

RBS sampling, I would certainly welcome having vote history available to improve the LV model. That said, I'd want short-term 52/

measures of enthusiasm and commitment to voting, and not rely on a demographic model or vote history alone. 53/

So what is to be done? All professional pollsters have done what I’ve done in one way or another. They’ve considered alternatives. 54/

They have compared their measures to others and have settled on an LV method they have confidence in. That is a professional judgment. 55/

From sample to sample, the LV results may vary a lot. Partly from short-term enthusiasm shifts, partly from random error. And… 56/

LV samples will always be smaller than an RV sample for the equivalent price. So random variation will be greater than for RV. 57/
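A back-of-envelope illustration with assumed numbers (not from any particular poll): the usual 95% margin of error for a 50/50 split scales with the square root of the sample size, so subsetting an RV sample down to its LVs widens it.

```python
# Margin of error for a 50/50 proportion at 95% confidence, in percentage points.
from math import sqrt

def moe(n: int) -> float:
    return 100 * 1.96 * sqrt(0.25 / n)

print(round(moe(800), 1))              # full RV sample of 800 -> about 3.5 points
print(round(moe(int(0.77 * 800)), 1))  # LV subset at 77% (n = 616) -> about 3.9 points
```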

Yet many close observers become obsessed with LV samples before, I think, they are fully justified empirically. 58/

This leads to many polls, including mine, reporting both RV and LV results, and providing LV estimates before their time. 59/

I advise waiting until September or even early October to give primacy to LV, but releasing both so everyone can see what is happening. 60/

The fundamental point is that the LV method is not to be blamed for results you don’t like … 61/

… unless it is also blamed for results you do like. 62/

It would be a lot better if we didn’t focus on method ONLY when we don’t like results. LV samples may produce outliers we like or not. 63/


Charles Franklin

Co-Dev. http://Pollster.com, Dev-http://PollsAndVotes.Com, Director Marquette Law School Poll, Prof Emeritus UW-Madison. R nerd.