Election forecasts outside the forbidden zone
Or, how I learned to love the multivariate normal…
Suppose that we have information from the very last day of polling, and we want to predict the share of the vote that each party will win. How should we go about it?
We could predict that the result will be exactly what the polls say it will be. That wouldn’t be a very good way of predicting the result, because it doesn’t allow for the uncertainty present in opinion polls. Even when opinion polls work exactly as they’re supposed to, they will still be wrong by a percentage point or so.
It’s more useful to think of our prediction not as a single number (or “point estimate”), but as a range of outcomes drawn from a particular distribution. (If you don’t like talking about distributions, you can replace all talk of distributions with “machinery for generating lots of simulated election outcomes”).
Here’s a (possible) prediction of the Conservative party’s vote share. It’s drawn from a normal distribution. The mean of the distribution is 0.43 (or 43%), and the standard deviation is 0.04 (or four percentage points).
I’ve used two numbers to characterise this distribution: the mean, and the standard deviation. When we change the mean, we shift the predictions up or down. When we change the standard deviation, we choose how spread out our predictions are. A bigger standard deviation makes for predictions that are more spread out. Most of the time (about 95% of the time, for a normal distribution), our predictions should fall within two standard deviations of the mean.
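If you prefer the "machinery for generating simulated elections" view, here is a minimal sketch of that machinery in Python. It assumes NumPy; the seed and the number of draws are arbitrary choices of mine, and only the mean and standard deviation come from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, only for reproducibility

mean, sd = 0.43, 0.04                # mean and standard deviation quoted in the text
con = rng.normal(mean, sd, size=100_000)  # simulated Conservative vote shares

# Share of simulated elections within two standard deviations of the mean
within_two_sd = np.mean(np.abs(con - mean) <= 2 * sd)

print(f"mean of simulated shares: {con.mean():.3f}")
print(f"share within two SDs:     {within_two_sd:.3f}")
```

Changing `mean` shifts the whole cloud of simulations; changing `sd` widens or narrows it.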
If we wanted to forecast election outcomes based on polls, we could base our mean on the average polling figure. Or, we could adjust this poll-based average in some way.
Similarly, we could base the choice of standard deviation on polls’ published margins of error (we're not going to do this). Or we could find a number which spreads out the predictions enough that the predictions made for past elections encompass the actual results, even in elections when the polls were badly wrong (1992, 2015).
We could, if we wanted to, make predictions for each party, proceeding one by one. We could draw lots of simulated elections from a normal distribution, characterised by a particular mean and a particular standard deviation.
But unless our predictions are very precise, we’ll run into problems. The vote shares of the different parties have to add up to 100% — but our predictions might not.
Here’s a graph which shows this. For the purposes of exposition, I’ve set the mean of our Labour prediction to 0.33, and the mean of our Conservative prediction to 0.48. The standard deviation in both cases is now slightly greater, at 0.05.
Each dot represents a simulated election. The blue ellipses show where most of the points lie. The red line indicates the boundary of the “forbidden region” — the predictions where the sum of the Labour and Conservative vote share exceeds 100%. As you can see, some of our predictions lie in this forbidden zone. That’s not great.
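The party-by-party approach can be sketched in a few lines of Python. This is an illustration under my own assumptions (NumPy, an arbitrary seed, 100,000 draws), using the means and standard deviation given for the graph.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed
n = 100_000

# Draw each party separately, ignoring the other one
lab = rng.normal(0.33, 0.05, size=n)
con = rng.normal(0.48, 0.05, size=n)

# The forbidden zone: Labour plus Conservative exceeds 100%
forbidden = (lab + con) > 1.0

print(f"simulations in the forbidden zone: {forbidden.sum()} of {n}")
print(f"as a share: {forbidden.mean():.4f}")
```

The share is small, a fraction of a percent, but it isn't zero, and it grows quickly if the means rise or the standard deviation widens.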
There are different ways of dealing with this type of compositional data. One way is to model all of the parties’ vote shares at the same time, using a distribution that spits out multiple values (that is, a multivariate distribution).
Using a multivariate distribution doesn’t solve our problem straight away. Multivariate distributions can be used for many purposes, including modelling variables that don’t add up to 100% (or indeed any constant). We need to hack our multivariate distribution so that it spits out values which respect the nature of the data.
We can do that with a correlation matrix. (We’ll also need a different type of matrix, a covariance matrix, but I’m going to postpone that until later.) A correlation matrix shows us, for two or more variables, how each variable changes with changes in each of the other variables. Here’s the correlation matrix for my (simulated) Labour and Conservative vote shares.
You can see that the correlation of a variable with itself is one (how could it be otherwise?), but there’s almost no correlation between the Conservative and Labour vote shares.
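To reproduce a correlation matrix of this kind, one option is `np.corrcoef` on independently drawn vote shares. The sketch below assumes NumPy and an arbitrary seed; the exact off-diagonal value will differ from run to run, but it should sit very close to zero.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed
n = 100_000

# Independent draws, as in the party-by-party approach
lab = rng.normal(0.33, 0.05, size=n)
con = rng.normal(0.48, 0.05, size=n)

# 2x2 matrix: ones on the diagonal, Labour/Conservative correlation off it
corr = np.corrcoef(lab, con)
print(np.round(corr, 3))
```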
We need that correlation to be negative if we’re to spit out simulated elections whose vote shares add up to 100%. To see why the correlation needs to be negative, imagine that Labour and the Conservatives were level pegging on 50%, and that they were the only two parties. In this situation, if the Conservatives went up by one percentage point, Labour would have to go down by one percentage point. The correlation between the two parties would be -1, because there would be a perfect one-to-one correspondence between Conservative gains and Labour losses.
In our situation, we don’t need the correlation to be that negative. A Conservative gain needn’t come from Labour; it could come from one of the other parties (the ones we haven’t talked about yet).
Fortunately, we can use some tricks from another distribution to help us. If c is the Conservative vote share, and l is the Labour vote share, then the expected correlation between the two is:
$$\rho_{c,l} = -\sqrt{\frac{c \, l}{(1 - c)(1 - l)}}$$
which means that if (for example) the Conservatives were on 0.48, and Labour were on 0.33, the correlation between the two vote shares would be -0.67.
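As a check on the arithmetic, here is a small sketch that assumes the expected correlation takes the form that two categories have under a multinomial distribution, -sqrt(c·l / ((1-c)(1-l))). That form is my assumption, but it reproduces both the -0.67 quoted here and the -1 of the two-party example. The function name is mine.

```python
import math

def expected_correlation(p, q):
    """Multinomial-style correlation between two vote shares p and q (assumed form)."""
    return -math.sqrt((p * q) / ((1 - p) * (1 - q)))

# Conservatives on 0.48, Labour on 0.33
print(round(expected_correlation(0.48, 0.33), 2))  # -0.67

# Two parties level pegging on 50%: a perfect negative correlation
print(expected_correlation(0.5, 0.5))  # -1.0
```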
We can use this expected correlation to generate some new simulated elections which don’t fall into the forbidden zone.
Just like before, each dot represents a simulated election, and the blue contours indicate where most of the data lies. This time, no simulated elections land in the forbidden zone, and there's a nice negative correlation between Labour and Conservative vote shares. You can see this from the shape of the blue ellipses, which now look properly elliptical.
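Simulations like these can be sketched with NumPy's multivariate normal sampler. The means, the standard deviation of 0.05 and the correlation of roughly -0.67 come from the text; the seed, the number of draws and the multinomial-style correlation formula are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed

means = np.array([0.33, 0.48])       # Labour, Conservative
sd = 0.05

# Expected correlation (assumed multinomial form), roughly -0.67
rho = -np.sqrt((0.33 * 0.48) / ((1 - 0.33) * (1 - 0.48)))
corr = np.array([[1.0, rho],
                 [rho, 1.0]])

# With a common standard deviation, the covariance is just sd^2 times the correlation
cov = sd**2 * corr

sims = rng.multivariate_normal(means, cov, size=100_000)
forbidden = sims.sum(axis=1) > 1.0
sample_rho = np.corrcoef(sims, rowvar=False)[0, 1]

print(f"share in the forbidden zone: {forbidden.mean():.5f}")
print(f"sample correlation:          {sample_rho:.3f}")
```

Because the two shares now move against each other, their sum varies much less than it did with independent draws, which is what keeps the simulations away from the red line.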
This isn't the only correlation matrix that will generate vote shares that (mostly) stay outside the forbidden zone. But it's hard to know how we would find out about these other correlation matrices. We could look at how parties' vote shares correlate over time in opinion polls — but the patterns which result from voters switching between parties might be very different to the patterns amongst errors in how polls predict parties' vote shares.
The correlation matrix helps us more the closer we get to the boundary. In the animation below, the ellipse (which represents where most of the data lies) becomes more… well, elliptical, the closer we get to the forbidden zone, and more circular the further away we go.
It's a thing of elliptical beauty…
Spread those ellipses
It's at this point that we have to return to one of the questions we began with — if we're predicting an election as involving a range of possible outcomes, how spread out should that range be?
I said earlier that we'd need to move from a correlation matrix to a covariance matrix. This is where we can bring back in the idea of the standard deviation. You can think of the covariance matrix as the multidimensional analogue of a standard deviation (strictly, of its square, the variance), and as a correlation matrix multiplied by a certain scaling parameter. The bigger this scaling parameter, the more spread out the (multidimensional) estimates.
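Here is one way to picture that "multiplication", as a sketch rather than the actual forecasting code. It builds a four-party correlation matrix from the multinomial-style formula (my assumption), using the four shares from the worked example further down, and scales it by a single hypothetical standard deviation. The value 0.025 is a placeholder of mine, not a calibrated figure.

```python
import numpy as np

def expected_corr_matrix(shares):
    """Correlation matrix with off-diagonals -sqrt(p_i p_j / ((1-p_i)(1-p_j)))."""
    p = np.asarray(shares, dtype=float)
    root_odds = np.sqrt(p / (1 - p))
    corr = -np.outer(root_odds, root_odds)
    np.fill_diagonal(corr, 1.0)
    return corr

shares = [0.43, 0.28, 0.11, 0.18]  # Con, Lab, Lib Dem, others
corr = expected_corr_matrix(shares)

scale = 0.025                      # hypothetical scaling parameter (a standard deviation)
cov = scale**2 * corr              # covariance = variance times correlation

print(np.round(corr, 2))
print(np.round(np.sqrt(np.diag(cov)), 3))  # every party's standard deviation equals `scale`
```

Doubling `scale` doubles every party's standard deviation while leaving the shape of the ellipses, which comes from the correlations, unchanged.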
So how to choose this scaling parameter? I did the following:
1. For the 1979 election, take the average of the polls on the final day
2. Subtract the amount by which polls in all other elections save 1979 have over-estimated the three main parties (plus all others)
3. Create the expected correlation matrix using this adjusted figure
4. "Multiply" this correlation matrix by a scaling factor
5. Check whether 90% of the vote shares that actually happened were within the ninety percent range of the simulations, whether 80% of the vote shares that actually happened were within the eighty percent range of the simulations and so on
6. Repeat steps 4 and 5 with different scaling factors until calibration is achieved
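The calibration search above can be sketched roughly as follows. This is not the code behind the forecast: it runs on synthetic "elections" that I generate with a known scale of 0.03, it skips the bias adjustment, it checks only the 80% and 90% ranges, and every function name, the grid of candidate scales and the multinomial-style correlation formula are my assumptions.

```python
import numpy as np

def expected_corr_matrix(shares):
    """Assumed multinomial-style correlation matrix for a vector of vote shares."""
    p = np.asarray(shares, dtype=float)
    root_odds = np.sqrt(p / (1 - p))
    corr = -np.outer(root_odds, root_odds)
    np.fill_diagonal(corr, 1.0)
    return corr

def calibration_error(poll_means, results, scale, rng, levels=(0.8, 0.9), n_sims=1_000):
    """Sum over levels of |observed coverage - nominal coverage|."""
    hits = {level: [] for level in levels}
    for mean, actual in zip(poll_means, results):
        cov = scale**2 * expected_corr_matrix(mean)
        sims = rng.multivariate_normal(mean, cov, size=n_sims)
        for level in levels:
            tail = 50 * (1 - level)  # e.g. 5 and 95 for the 90% range
            lo, hi = np.percentile(sims, [tail, 100 - tail], axis=0)
            hits[level].append((actual >= lo) & (actual <= hi))
    return sum(abs(np.mean(h) - level) for level, h in hits.items())

rng = np.random.default_rng(seed=1)  # arbitrary seed

# Synthetic data only: 100 made-up elections with four parties each
true_scale = 0.03
poll_means = rng.dirichlet([8, 6, 3, 3], size=100)
results = np.array([
    rng.multivariate_normal(m, true_scale**2 * expected_corr_matrix(m))
    for m in poll_means
])

# Try a grid of scaling factors and keep the best-calibrated one
candidates = np.arange(0.01, 0.051, 0.005)
errors = [calibration_error(poll_means, results, s, rng) for s in candidates]
best = candidates[int(np.argmin(errors))]

print(f"best-calibrated scale: {best:.3f} (synthetic truth: {true_scale})")
```

With real data, `poll_means` would hold the adjusted final-day polling averages and `results` the actual vote shares for each past election.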
What does this mean in practice? It means that if we feed in the following election result:
- Conservatives 43%
- Labour 28%
- Liberal Democrats 11%
- all others 18%
then we get the following ninety-five percent forecast intervals:
- Conservatives: 39% to 48%
- Labour: 22% to 33%
- Liberal Democrats: 7% to 15%
- Others: 13% to 22%
If you want to re-express these as margins of error, you can think of them as margins of four or five percentage points either way.
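The "four or five points" figure is just half the width of each interval. A quick sketch of that arithmetic, with variable names of my own choosing:

```python
# Ninety-five percent forecast intervals from the list above, in percentage points
intervals = {
    "Conservatives": (39, 48),
    "Labour": (22, 33),
    "Liberal Democrats": (7, 15),
    "Others": (13, 22),
}

# Half the interval width, read as a "margin of error" either way
half_widths = {party: (hi - lo) / 2 for party, (lo, hi) in intervals.items()}

for party, margin in half_widths.items():
    print(f"{party}: +/- {margin} points")
```

The intervals aren't perfectly symmetric around the input shares, so these half-widths are a convenient summary, not an exact restatement.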
That means that after adjusting for historical over/under-estimation of certain parties, we need to operate on the basis of margins of error that are two-thirds greater than the margins of error most polling companies publish.
(Nerdy note: This is a very different conclusion to the one that Nate Silver reaches — Nate thinks the real margin of error on the gap between the top two parties is +/- 13 to 15 percentage points. In my view it's more helpful to talk about margins of error on individual party vote shares, and to disaggregate the two issues of systemic over/under-estimation and variance. You can check some of the code at GitHub).
This post has discussed how to move from polls to results. In a subsequent post, I'll explain how I move from polling thirty days before the election to polling on the day of the election. In the meantime, keep viewing electionforecast.co.uk and following on Twitter.