The final straight

A short note on why my election forecasts differ from current polls

Some polls suggest GE2017 will be close. Others, not so much.

My forecast continues to suggest that the Conservatives will finish with a healthy lead in vote share, and a large majority. I thought it helpful to explain why.

This post is based on the forecast as it stood on the Monday morning prior to the election.

Current polls

Let's start with the current polls. An average of the last six polls (from Survation, ICM, ComRes, YouGov, Opinium, and ICM) has the race like this:

  • Conservatives: 43.5%
  • Labour: 37.4%
  • LDem: 7.5%

or a Conservative lead of about six percentage points.

Current polls (2)

Because different polling companies have different approaches, and because those different approaches materially affect their top-line figures, a simple average can give very different results depending on which polling companies are included. I use a model to combine polls. It tries to model pollsters' different house effects, on the assumption that these on average cancel out. The model has the race like this:

  • Conservatives: 42.4%
  • Labour: 38.2%
  • LDem: 7.5%

or a lead of just over four percentage points. The lead is probably smaller than in the simple average because ICM feature twice in that average, and because ICM produce higher Conservative leads.
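To see why counting one pollster twice matters, here is a minimal sketch. It is not my model: it simply averages within each pollster before averaging across pollsters, which is a much cruder stand-in for estimating house effects. The leads in `polls` are made-up illustrative numbers, not the real figures from these six polls, and the function names are my own.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL Conservative leads in percentage points, for illustration only.
# These are NOT the published figures from the six polls mentioned above.
polls = [
    ("Survation", 1.0),
    ("ICM", 11.0),
    ("ComRes", 6.0),
    ("YouGov", 4.0),
    ("Opinium", 6.0),
    ("ICM", 12.0),
]


def simple_average(polls):
    """Average every poll equally, so a pollster with two polls counts twice."""
    return mean(lead for _, lead in polls)


def pollster_balanced_average(polls):
    """Average within each pollster first, then across pollsters."""
    by_pollster = defaultdict(list)
    for pollster, lead in polls:
        by_pollster[pollster].append(lead)
    return mean(mean(leads) for leads in by_pollster.values())


print(f"simple average lead:   {simple_average(polls):.1f}")
print(f"balanced average lead: {pollster_balanced_average(polls):.1f}")
```

With these invented numbers the balanced average is lower than the simple one, because the pollster with the largest leads no longer gets double weight.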

Election day polls

Why then is my forecast so bullish about the Conservatives' prospects? In part, it's because of the movement in the polls that I expect between now and election day. I described how I model this in a previous post, but generally Labour do badly in the run-in to elections, so I expect their share to go down. That means that the final-day polls ought to look like this:

  • Conservatives: 42.7% (95% forecast interval: 38, 47)
  • Labour: 35.2% (31, 39)
  • LDem: 7.9% (5, 11)

(You'll see that I've added forecast intervals to the list now. This means that although I think that the Labour vote share will go down in the polls, I'm not 100% confident that it will. I'm only 95% confident it'll end up in the range between 31 and 39. These intervals aren't wide because I'm hedging: they are the intervals that a particular regression model produces when fed with data from 1979 onwards. Want tighter forecast intervals? Go back in time and make opinion change more predictable.)

Polling errors

Those expected final-day polls give a Conservative lead of seven and a half percentage points. That would probably be sufficient for a majority, but perhaps not a very healthy one. The final step in reaching my forecast vote share is to make assumptions about errors in the polls. On the basis of general elections since 1979, I'm going to assume that the average of the polls (more specifically, my model output, which takes account of the house effects of different companies) is going to over-estimate Labour and under-estimate the Conservatives. (See the Polling Inquiry Report for details.) This, plus some other unrelated changes to do with candidacy, gives…

  • Conservatives: 43.6% (38, 49)
  • Labour: 33.3% (28, 38)
  • LDem: 8.5% (4, 13)

for a Conservative lead of just over ten percentage points.
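As a quick check on the arithmetic, the Conservative lead at each stage can be recomputed from the shares quoted in this post. The stage labels and variable names below are just my shorthand.

```python
# (Conservative share, Labour share) at each stage, as quoted in this post.
stages = {
    "simple average of last six polls": (43.5, 37.4),
    "model with house effects": (42.4, 38.2),
    "expected final-day polls": (42.7, 35.2),
    "forecast vote share": (43.6, 33.3),
}

# Lead = Conservative share minus Labour share, in percentage points.
leads = {name: round(con - lab, 1) for name, (con, lab) in stages.items()}

for name, lead in leads.items():
    print(f"{name}: Conservative lead of {lead} points")
```

The lead goes from about six points in the simple average, to just over four once house effects are modelled, to seven and a half after expected campaign movement, to just over ten after allowing for polling error.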

What if you're wrong?

I might well be wrong, in the sense that the actual outcome is quite far from what I forecast to be the most likely outcome.

You can tell that, because my forecast intervals are so wide. These intervals are a feature, not a bug.

Hopefully, I won't be wrong in the sense that the outcomes land outside these 95% forecast intervals (… or wrong in the sense that far more than 20% of outcomes fall outside the 80% forecast intervals, or wrong in the sense that far more than 50% of outcomes fall outside the 50% forecast intervals).
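Checking that kind of wrongness amounts to counting how often outcomes land inside the intervals. Here is a minimal sketch, where `coverage` is my own helper name, the intervals are the 95% forecast intervals given above, and the outcomes are invented purely for illustration.

```python
def coverage(outcomes, intervals):
    """Fraction of outcomes that fall inside their (low, high) interval."""
    hits = sum(low <= x <= high for x, (low, high) in zip(outcomes, intervals))
    return hits / len(outcomes)


# 95% forecast intervals from this post: Conservatives, Labour, LDem.
intervals_95 = [(38, 49), (28, 38), (4, 13)]

# HYPOTHETICAL outcomes, made up for illustration; not a prediction or a result.
hypothetical_outcomes = [44.0, 39.5, 7.0]

print(f"share of outcomes inside the 95% intervals: "
      f"{coverage(hypothetical_outcomes, intervals_95):.2f}")
```

Three party shares from a single election are far too few (and too correlated) to judge calibration. The check only becomes meaningful across many forecasts, where roughly 95% of outcomes should fall inside the 95% intervals, 80% inside the 80% intervals, and so on.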

Some of the ways in which I'm wrong will be self-correcting. If I'm wrong about campaign movements, we'll know over the next couple of days, and future forecasts will reflect this.

However, if I'm wrong about polling industry error, we'll only know on the night.

Isn’t that what makes it so much fun?