What’s Up with YouGov’s UK Election Model?

Heading down the stretch of the UK election, there’s quite a bit of disagreement among pollsters on the exact magnitude of the Conservative Party’s lead, with leads ranging from 1 to 12 points in Saturday’s polls alone. Much of this has to do with how much pollsters try to adjust for the composition of the likely electorate, as opposed to taking respondents at their word that they will vote. There is much literature suggesting that self-reported likelihood to vote is a deeply flawed way of gauging actual likelihood to vote. Those polls that have tended to show Jeremy Corbyn’s Labour performing best tend to weight young and old respondents to their proportions of the adult population, without taking into account vast differences in turnout among these groups. In an electorate deeply polarized by age (with Corbyn leading by 40-something points with 18 to 24 year olds, and Theresa May’s Conservatives up by a similar margin with 65+ respondents), getting this wrong could end up being the U.K.’s version of U.S. state pollsters not weighting on education in the 2016 election.
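To see why the weighting choice matters so much, here is a minimal sketch with made-up numbers. The age shares, turnout rates, and party margins below are illustrative assumptions, not any pollster's actual figures. It compares a Conservative lead computed by weighting age groups to their share of the adult population against one that also accounts for an assumed turnout rate in each group.

```python
# Illustrative only: every number below is a hypothetical assumption.
# Each group: (share of adult population, assumed turnout rate, Con lead in points)
groups = {
    "18-24": (0.12, 0.45, -40.0),
    "25-64": (0.65, 0.65, 0.0),
    "65+": (0.23, 0.80, 40.0),
}


def weighted_lead(groups, use_turnout):
    """Average the Conservative lead across groups.

    Weights are population shares, optionally multiplied by turnout.
    """
    weights = {
        name: share * (turnout if use_turnout else 1.0)
        for name, (share, turnout, _lead) in groups.items()
    }
    total = sum(weights.values())
    return sum(weights[name] * groups[name][2] for name in groups) / total


population_lead = weighted_lead(groups, use_turnout=False)
turnout_lead = weighted_lead(groups, use_turnout=True)

print(f"Weighted to adult population:  Con {population_lead:+.1f}")
print(f"Weighted to likely electorate: Con {turnout_lead:+.1f}")
```

With these invented inputs, the lead widens once turnout is factored in, because the heavily Conservative older group makes up a larger share of those who actually vote.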

YouGov is one of the pollsters that has tended to show a narrower Conservative lead (4 points at this writing). They’ve also gone a step beyond other pollsters in projecting individual seat-by-seat results using a statistical technique known as multilevel regression with post-stratification (MRP). This model currently shows the Conservatives losing seats, resulting in a hung parliament. Because the U.K. is a parliamentary system, it’s not the popular vote that matters, but how this translates to individual Parliamentary seats. Sound familiar?
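For readers unfamiliar with the approach, the post-stratification half of it can be sketched in a few lines. This is a toy illustration, not YouGov's implementation: the voter types, the modelled support figures (which would come out of the multilevel regression step, skipped here), and the constituency counts are all hypothetical.

```python
# Hypothetical cell-level estimates, as if produced by a multilevel regression
# fitted to national survey data. Keys are (age band, 2016 referendum vote).
modelled_con_share = {
    ("18-34", "Remain"): 0.20,
    ("18-34", "Leave"): 0.40,
    ("35+", "Remain"): 0.35,
    ("35+", "Leave"): 0.60,
}

# Hypothetical number of voters of each type living in one constituency.
seat_cell_counts = {
    ("18-34", "Remain"): 15000,
    ("18-34", "Leave"): 5000,
    ("35+", "Remain"): 20000,
    ("35+", "Leave"): 30000,
}


def poststratify(cell_estimates, cell_counts):
    """Weight each cell's modelled support by how many such voters the seat has."""
    total = sum(cell_counts.values())
    return sum(cell_estimates[cell] * n for cell, n in cell_counts.items()) / total


seat_estimate = poststratify(modelled_con_share, seat_cell_counts)
print(f"Post-stratified Conservative share: {seat_estimate:.1%}")
```

The point of the design is that a national sample can say something about a seat with few or no respondents, as long as we know what kinds of voters live there.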

Accurately mapping national polls to Parliamentary seats is something of a last frontier for political analytics in a multi-party system like the U.K. with heavy tactical voting. For a long time, the uniform swing hypothesis dominated British election coverage, assuming that if the national vote swings 3 points in the Conservatives’ direction, they will win all seats that were closer than that the last time. But swings are anything but uniform in British elections, affected by demographic and political factors like the underlying competitiveness of a given seat. Even when the overall outcome is known, the exact seat breakdown is always something of a surprise. For example, in 1997, Tony Blair somewhat underperformed the national polls, but blew out everyone’s expectations in the number of seats because his gains came exactly where he needed them.
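The uniform swing logic is simple enough to write down directly. The sketch below uses three invented seats and a two-party simplification (real seats have more parties, and the seat names and shares here are hypothetical). It applies the same shift in the Conservative-minus-Labour margin everywhere and reads off the winner.

```python
# Hypothetical 2015 results: seat -> vote shares in points (two parties only).
results_2015 = {
    "Seat A": {"Con": 40.0, "Lab": 44.0},
    "Seat B": {"Con": 38.0, "Lab": 48.0},
    "Seat C": {"Con": 50.0, "Lab": 35.0},
}


def uniform_swing_winners(results, margin_shift_to_con):
    """Shift every seat's Con-minus-Lab margin by the same amount."""
    winners = {}
    for seat, shares in results.items():
        new_margin = shares["Con"] - shares["Lab"] + margin_shift_to_con
        winners[seat] = "Con" if new_margin > 0 else "Lab"
    return winners


# A 5-point national shift towards the Conservatives, applied to every seat.
projected = uniform_swing_winners(results_2015, margin_shift_to_con=5.0)
for seat, winner in projected.items():
    print(seat, "->", winner)
```

Under this assumption, only the seat Labour held by less than the national shift changes hands, which is exactly the rigidity that real elections tend to violate.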

Because YouGov has posted its raw seat-by-seat data, we can deconstruct what went into its model (the only one of its kind with such detail available). My intention is not to pick apart their bottom line projection (which can shift at any point), but to dissect the relative pecking order of seats they expect to shift in each direction and to isolate which factors seem to be driving different areas of the country to the right or left of where they were in 2015.

To do this, I’ve combined their model projections with past election results (from both 2015 and Chris Hanretty’s constituency-level Brexit estimates), as well as demographic data, running regression analyses of their estimates against all constituency-level data I could find. This can help us reverse-engineer their model to help explain what is going on.

1. YouGov’s Model Expects Mean-Reversion

A dominant factor in YouGov’s expected swing to the Conservatives or Labour is how the seat split between the two parties to begin with. Specifically, they expect the Tories to do well in Labour-held seats, but Labour to make inroads in Tory seats. This is true even when holding constant factors like the percentage voting to leave the EU or demographics. The Conservative vs. Labour margin from 2015 alone explains 34% of the variation in the expected Labour-Conservative swing. Throwing in the winner and second-place finisher from 2015, since top-two party dynamics can matter a great deal, explains 63% of the variation in the potential swing in YouGov’s model.
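For anyone who wants to try this kind of check themselves, "explains X% of the variation" refers to the R-squared of a regression. Below is a minimal sketch of the single-predictor case, using five invented seats rather than YouGov's actual data. A negative slope is what mean-reversion looks like: the better the Conservatives did in 2015, the worse their projected swing.

```python
def simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares; return (slope, R-squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, 1 - ss_res / ss_tot


# Hypothetical seats: 2015 Con-minus-Lab margin and projected swing to Con (points).
margin_2015 = [-30.0, -10.0, 0.0, 10.0, 30.0]
projected_swing_to_con = [4.0, 1.0, 1.0, -2.0, -3.0]

slope, r2 = simple_regression(margin_2015, projected_swing_to_con)
print(f"slope = {slope:.3f}, R-squared = {r2:.3f}")
```

These toy numbers are far tidier than the real thing, so the R-squared here is much higher than the 34% found in YouGov's estimates.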

This is a curious finding in that it implies that both sides are shedding traditional supporters (a few weeks ago, this was assumed to be a one-sided process to the benefit of May’s Tories). It could show that model-builders were too cautious in translating their national dataset into inferences about individual seats, rendering predictions more muted than the eventual result. Or it could be a sign of an anomaly in the data collection process. It reminds me of some similar pre-2016 rumblings of red states getting bluer and blue states getting redder. In the end, red states and blue states were more polarized than ever in 2016.

This tendency to underplay geographic and demographic divisions is an apparent feature of online panel surveys, many of which underestimated the gender gap or the college gap that emerged in the U.S. in 2016. Even when weighted to statewide characteristics of the 2016 electorate, the Cooperative Congressional Election Study, a massive survey with an n-size of more than 64,000 also conducted through YouGov’s panel, tended to underestimate the actual Trump or Clinton margin in counties each won by lopsided margins, especially in pro-Trump areas.

We’ll know Thursday whether a similar phenomenon bedevils polling in the U.K., or whether we’ll actually see the normal lines of British politics scrambled.

2. Tory Gains Come from Pro-Brexit Labour Areas — and Scotland

Support for Brexit is a factor driving voters to and from the Tories, with the Tories most apt to pick up voters in Labour areas that voted heavily to leave the EU. Adding a constituency’s percentage for Leave gets us from explaining 63% of the variation in YouGov’s model to 79%. The entirety of the Brexit effect is captured by the Leave vote; the vote for UKIP in the last election is a less powerful predictor — only bumping up the model’s predictive power to 75%.
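The comparison here is between nested regressions: fit the swing on the 2015 variables alone, then add the Leave percentage and see how much the R-squared rises. A rough sketch of that comparison follows, on six invented seats with pure-Python least squares. None of the numbers come from YouGov's data, and the helper names are my own.

```python
def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_r_squared(columns, y):
    """Regress y on an intercept plus the predictor columns; return R-squared."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical seats: 2015 Con-minus-Lab margin, Leave %, projected swing to Con.
margin_2015 = [-30.0, -20.0, -5.0, 5.0, 20.0, 30.0]
leave_pct = [65.0, 40.0, 60.0, 45.0, 55.0, 35.0]
swing_to_con = [6.5, -0.5, 2.0, -1.0, -1.5, -5.5]

r2_base = ols_r_squared([margin_2015], swing_to_con)
r2_full = ols_r_squared([margin_2015, leave_pct], swing_to_con)
print(f"2015 margin only: R-squared = {r2_base:.2f}")
print(f"plus Leave share: R-squared = {r2_full:.2f}")
```

Adding a predictor can never lower the in-sample R-squared, so the interesting question is always how big the jump is, not whether there is one.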

One way to visualize this is to multiply the percentage of Labour 2015 voters and Brexit voters in a constituency as a rough indicator of how many pro-Brexit Labour voters are up for grabs. The relationship between this figure and expected Conservative gains turns out to be quite strong — especially in Labour-held seats — with the only major outliers being Scottish SNP-held seats.
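Concretely, the index is just a product of two shares. The sketch below uses two invented constituencies. Treating the product as the share of pro-Brexit Labour voters assumes that Labour support and Leave support are independent within a seat, which is why it is only a rough indicator.

```python
def labour_leave_index(labour_share_2015, leave_share):
    """Rough proxy for the share of voters who are both Labour 2015 and Leave.

    Both inputs are fractions between 0 and 1. The product equals the true
    overlap only if the two are independent within the constituency.
    """
    return labour_share_2015 * leave_share


# Hypothetical seats: (Labour 2015 share, Leave share)
seats = {
    "Labour-held, heavily Leave": (0.55, 0.65),
    "Tory-held, leaning Remain": (0.25, 0.45),
}

index_by_seat = {
    name: labour_leave_index(lab, leave) for name, (lab, leave) in seats.items()
}
for name, value in index_by_seat.items():
    print(f"{name}: {value:.3f}")
```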

The same trend can be seen in plotting the overall percentage to Leave against the expected two-party swing — with a notable “hump” in the seats held by minor parties, especially in Scotland.

3. Region

Once existing partisan dynamics are accounted for, only a handful of regions demonstrate statistically significant variations from the national swing — Scotland and the North West in the direction of the Tories and the East of England and the South West against them.

Looking at region in isolation, as most commentators are wont to do, YouGov projects freakish gains for the Tory vote share in Scotland, strong gains in the North East and North West, and losses throughout the south and in London. This lines up pretty neatly with existing areas of partisan control, with the exception of London.

4. Age

With an electorate polarized heavily on age, we might expect the ratio of different age groups in each constituency to be a significant predictor of the swing in each seat.

In reality, the relationship between the percentage of voters aged 18–34 and the expected swing in YouGov’s model looks something like this:

Because YouGov expects a deterioration in the Tory position in the older seats they control, the plot exhibits a hump-like effect, with Tory losses in both very young and very old seats.

When factored into a model (along with the political, Brexit, and regional variables already discussed), only the percentage of younger voters turns out to be significant in driving Labour gains, with no apparent bonus to the Tories from older voters. This is puzzling given that we see swings in both directions along age lines, and given that 65+ voters carry quite a bit more weight in the electorate than 18–24s, accounting for around 26% of likely voters versus 8%. Perhaps this is already accounted for in other variables like the percentage voting to leave the EU? Actually, not really: there’s no significant bonus to the Conservatives from older voters when the other variables are removed, at least according to YouGov. We can conclude from this that YouGov doesn’t expect older voters to be a major factor in driving Tory gains.

All told, with the inclusion of these variables, we get to explaining 83% of the variation in YouGov’s model, which is quite high. Other factors tested that did not turn out to be significant include all manner of social and economic variables (only the percentage of a constituency in “good health” showed a significant and slight positive effect on the pro-Tory swing). Also not significant was whether a constituency was close in 2015. That’s an interesting finding given what we know about the large swings present with tactical voting, but YouGov is quick to disclaim that their model doesn’t account for local constituency-by-constituency factors.

Great credit goes to YouGov for releasing this data and for the use of modern statistical inference techniques. YouGov affirms our expectations in some ways and challenges them in others, predicting that:

  1. Blue-collar pro-Brexit voters in Labour areas still seem likely to move towards the Conservatives in some form, a trend anticipated when the Tories led by much more than they do today.
  2. This will be counter-balanced by Tory losses in their heartlands, the south, and pro-Remain areas like London.
  3. The Tories will gain in Scotland, but not in Wales.
  4. Age may be less of a factor than people think, especially when it comes to older voters shifting towards the Tories.