MRP Estimates and the 2019 General Election
What is multi-level regression with post-stratification?
MRP stands for ‘multi-level regression with post-stratification’. It is a statistical technique used to transform national opinion survey results into local estimates.
This article examines the technique and what can go wrong with it, with a particular focus on its use before the 2019 UK General Election.
What is MRP?
MRP is a technique for building a model of public opinion. It is not the model itself. The underlying assumption is that similar types of people have similar views, such as vote intention, across the country.
The model estimates a person’s political opinion, such as how they intend to vote, from their demographic characteristics and constituency. Using the census, we can count how many people of each kind live in each constituency and combine those counts with the modelled opinions, giving an estimate of vote intention for that constituency.
This technique is not perfect. However, MRP offers a good way of estimating local opinions where full, separate surveys would be impractical or too expensive, such as all constituencies in the House of Commons.
There is a step-by-step process for building such a model.
Step 1: Gather survey responses
The survey should record each respondent’s constituency and demographic information, such as age group and highest educational qualification. The demographic variables should be limited to those available in the national census, because the later steps rely on census counts for the same categories.
Step 2: Compile constituency-level predictors
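Vote intention can differ between constituencies in ways that demographics alone do not explain, so the model also needs constituency-level predictors. If you want to estimate a party’s support in the upcoming General Election, one good choice might be the party’s vote share in that constituency at the 2017 General Election.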
Step 3: Collect census data
We need to know how many people with different sets of demographic characteristics there are in each constituency.
If our model of vote intention used age group and education level, then we need to know, for example, how many eligible citizens aged 45 to 55 with a university degree live in each constituency.
Step 4: Build a model of individual vote intention
A respondent’s vote intention is treated as a function of their demographics and constituency. A regression analysis is used to find the best-fitting model.
This is the regression part of MRP. This is multi-level because the vote intention probability is a function across multiple levels: the respondent’s demographics and their constituency. As an example, an older person without a degree living in the shires has a higher probability of voting Conservative than a younger graduate living in a city.
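As a rough illustration of how the two levels combine, the sketch below turns demographic and constituency terms into a probability using a logistic function. Every number in it is invented for illustration: the coefficients, the age and education categories, the `seat_effect` term and the function name are assumptions, not values from any published model. A real analysis would estimate these terms from the survey data.

```python
import math

# Hypothetical coefficients on the log-odds scale. A fitted multi-level
# model would estimate these from survey responses; they are made up here.
INTERCEPT = -0.4
AGE_EFFECT = {"18-34": -0.6, "35-54": 0.0, "55+": 0.7}
EDUCATION_EFFECT = {"degree": -0.4, "no degree": 0.3}
PAST_VOTE_SLOPE = 2.0  # weight on the party's 2017 vote share in the seat


def conservative_probability(age, education, past_vote_share, seat_effect=0.0):
    """Return an illustrative probability of intending to vote Conservative."""
    log_odds = (
        INTERCEPT
        + AGE_EFFECT[age]                # individual level: age group
        + EDUCATION_EFFECT[education]    # individual level: education
        + PAST_VOTE_SLOPE * (past_vote_share - 0.5)  # constituency level
        + seat_effect                    # remaining constituency-specific shift
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# An older non-graduate in a seat where the party won 60% in 2017,
# against a younger graduate in a seat where it won 30%.
older_shire = conservative_probability("55+", "no degree", past_vote_share=0.60)
younger_city = conservative_probability("18-34", "degree", past_vote_share=0.30)
print(f"older, no degree, shire seat: {older_shire:.2f}")
print(f"younger, degree, city seat:   {younger_city:.2f}")
```

With these made-up coefficients, the first person receives a much higher modelled probability than the second, which mirrors the example in the text.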
Step 5: Calculate weighted averages in each constituency
Using our census data (step 3) and our individual model (step 4), we now have:
- Numbers of each demographic type in every constituency;
- Modelled probabilities of vote intention for each person based on their demography and constituency.
Through a weighted average, we calculate the estimated party vote intention in each constituency. This is called post-stratification.
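The weighted average can be sketched in a few lines. The example assumes a single hypothetical constituency with four demographic cells; the census counts and modelled probabilities are invented for illustration, and `poststratify` is a name chosen here rather than part of any library.

```python
# Census counts for one hypothetical constituency, by (age group, education).
cell_counts = {
    ("18-34", "degree"): 9000,
    ("18-34", "no degree"): 11000,
    ("55+", "degree"): 6000,
    ("55+", "no degree"): 14000,
}

# Modelled probability of voting for the party in each cell (illustrative).
cell_probabilities = {
    ("18-34", "degree"): 0.20,
    ("18-34", "no degree"): 0.30,
    ("55+", "degree"): 0.45,
    ("55+", "no degree"): 0.60,
}


def poststratify(counts, probabilities):
    """Average the cell probabilities, weighting each cell by its census count."""
    total_people = sum(counts.values())
    expected_voters = sum(counts[cell] * probabilities[cell] for cell in counts)
    return expected_voters / total_people


estimate = poststratify(cell_counts, cell_probabilities)
print(f"Estimated vote share: {estimate:.1%}")
```

Repeating the same calculation with each constituency’s own counts and probabilities gives one estimate per constituency.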
Analyses of this kind may use age groups, education level, social grade and ethnicity. Recalled votes from the 2017 General Election can also be used as demographic information.
An example is the recent Focaldata MRP estimates of vote intention in British constituencies for the upcoming General Election, commissioned by Best for Britain. Prof Hanretty (Royal Holloway) has reproduced their central estimates from each constituency.
What can go wrong with MRP?
A curious case of selective memory has surrounded the use of MRP. In the 2017 General Election, people seem to recall YouGov’s accurate central estimation of a hung parliament. Using the same technique, the Lord Ashcroft model (which estimated a Conservative majority over 60) is sometimes forgotten.
Just as surveys can have errors, models can too. Large sample sizes and MRP are not guarantors of reliability.
Here are some things that can go wrong with this technique:
Sampling bias in the survey data: Survey data collected by the social research company could contain systematic errors, placing parties above or below their real support in the country. Those errors are then carried through the model, overestimating or underestimating a party’s support across the country. In short: bias in, bias out.
Choice over demographic predictors: The individual model relies on a reasonable choice of demographic variables. Different choices will produce different estimates, with different errors.
Choice over the constituency-level predictors: A substantive constituency-level predictor is needed. Picking a poor one — or even forgoing the use of a constituency-level predictor — may increase error.
Errors in the constituency weights: The statistics for each constituency may be out-of-date: the census was last conducted eight years ago — in 2011. Recent estimates of each constituency should be available from the Office for National Statistics, but may be limited in terms of demographic dimensions.
In their evaluation of MRP estimates for various American political questions, Prof Lax and Prof Phillips (Columbia University) found the median absolute difference between the model estimate and the true value was roughly 2.7 points. This error is likely to range from 1.4 to 5.0 points.
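For readers unfamiliar with the measure, the median absolute difference can be computed as in the sketch below. The estimates and outcomes are invented numbers for five hypothetical areas, not figures from the Lax and Phillips evaluation.

```python
from statistics import median

# Invented model estimates and true values (in percentage points)
# for five hypothetical areas.
estimates = [41.0, 35.5, 52.0, 28.0, 47.5]
actuals = [38.0, 36.0, 49.0, 30.5, 47.0]

# Absolute difference between estimate and true value in each area.
absolute_errors = [abs(e - a) for e, a in zip(estimates, actuals)]

# The median is the middle value once the errors are sorted, so a few
# very large misses do not dominate the summary.
median_absolute_error = median(absolute_errors)
print(f"Median absolute error: {median_absolute_error} points")
```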
Claims of pin-point accuracy from MRP estimates should be read with extreme caution. As Lax and Phillips write:
One cannot blindly run MRP and expect it to work well. Users must take the time to make sure they have a reasonable model for predicting opinion.
What questions should you ask?
Here are some questions that journalists and researchers could ask about published MRP estimates of vote intentions in Great Britain.
- Which research company or companies conducted the surveys?
- How many people were interviewed, and when?
- What were the most important demographic variables used in the model?
- What constituency-level predictor was used?
- How was the overall model selected?
- What is the uncertainty of the estimates?
The methodology behind MRP estimates should be transparent, so effects of these choices can be scrutinised.