Weight, Weight, Weight
No, Kantar did not imply young people will have an historically low turnout in the upcoming General Election.
It was suggested, in a Novara Media newsletter and elsewhere, that a recent Kantar poll implied “the turnout for each demographic… would be the lowest in decades”, with the exception of people aged over 65.
This article looks at what survey weights are, corrects mistakes in the Novara Media newsletter, and shows the problem that Kantar are seeking to solve.
What are survey weights?
A polling company or research organisation conducts a survey. The goal of a survey is to be representative of its intended population, such as all adults in Great Britain or the United Kingdom. Surveys aim to take a perfect ‘slice’ of the general public, typically by asking one or two thousand people for their views. Using these surveys, we can get good estimates of important parameters, such as the proportion of people who intend to vote for each party.
These responses are then adjusted so the samples look more like the overall population. If we know that there are more women in the population than there are in the sample, we ‘weight’ the responses of women so they count for more. Companies will often use variables like age and gender for these weights.
‘Unweighted’ survey estimates are based on the responses before weights are applied. ‘Weighted’ survey estimates are calculated after the company has applied those weights.
The goal of using weights is simple: giving different responses different influence so the sample more closely resembles a perfect slice of the population, producing more representative sample statistics. There is a cost: weighting usually increases the standard errors of these estimates.
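As a minimal sketch of the simplest form of weighting (often called cell weighting or post-stratification), assume a hypothetical sample of 1,000 respondents that is 40% women, while the population is taken to be 51% women. Each respondent's weight is their group's population share divided by its sample share. The figures, the answers and the `cell_weights` helper are illustrative assumptions, not any polling company's actual numbers or code.

```python
from collections import Counter


def cell_weights(sample_groups, population_shares):
    """Return one weight per group: population share divided by sample share."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {
        group: population_shares[group] / (counts[group] / n)
        for group in population_shares
    }


# Hypothetical sample of 1,000: 400 women and 600 men (women under-represented).
sample = ["woman"] * 400 + ["man"] * 600
# Hypothetical answers: 60% of women and 40% of men say "yes".
answers = [True] * 240 + [False] * 160 + [True] * 240 + [False] * 360
# Hypothetical population profile: 51% women, 49% men.
population = {"woman": 0.51, "man": 0.49}

weights = cell_weights(sample, population)

total_weight = sum(weights[g] for g in sample)
unweighted_yes = sum(answers) / len(answers)
weighted_yes = sum(weights[g] for g, a in zip(sample, answers) if a) / total_weight

print(f"unweighted: {unweighted_yes:.3f}")  # 0.480
print(f"weighted:   {weighted_yes:.3f}")    # 0.502
```

Each woman's response counts as 1.275 responses and each man's as about 0.817, so the weighted sample is 51% women and the estimate moves towards the women's answer.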
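One standard way to quantify that cost is Kish's approximation: weighting a sample of n respondents leaves an effective sample size of (Σw)² / Σw², and the standard error of a proportion is computed as if only that many people had been surveyed. A minimal sketch, assuming hypothetical weights for 400 under-represented and 600 over-represented respondents and a hypothetical estimated proportion of 50%:

```python
import math


def effective_sample_size(weights):
    """Kish's effective sample size: (sum of weights)^2 / sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def standard_error(p, n):
    """Approximate standard error of a proportion p estimated from n respondents."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical weights: 400 respondents weighted up, 600 weighted down.
weights = [0.51 / 0.40] * 400 + [0.49 / 0.60] * 600

n = len(weights)
n_eff = effective_sample_size(weights)
print(n, round(n_eff))  # 1000 952

p = 0.5  # a hypothetical estimated proportion
print(round(standard_error(p, n), 4))      # standard error ignoring the weights
print(round(standard_error(p, n_eff), 4))  # larger standard error after weighting
```

The more uneven the weights, the smaller the effective sample size and the wider the margin of error.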
It has also been claimed that “determining how polling companies choose to weight their data is difficult. Why? Because it often isn’t published.”
This claim is false. Almost all polling companies which publish vote intention estimates are members of the British Polling Council — as well as the Market Research Society. The British Polling Council has rules of transparency, ensuring the publication of data tables and methods.
For example, the latest ICM Unlimited internet poll for Reuters states:
The data has been weighted to the profile of all adults aged 18+ in Great Britain and is weighted by age, gender, social grade, household tenure, work status, and region. The data is also weighted by 2017 general election vote and 2016 EU referendum vote.
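Statements like this list several weighting variables at once. They do not say exactly how the weights are computed; a common technique for matching several population margins simultaneously is raking (iterative proportional fitting), which repeatedly rescales the weights to match each variable's population shares in turn. The sketch below rakes a small hypothetical sample on two variables, age band and gender, with made-up targets. It illustrates the technique and is not ICM's procedure.

```python
def rake(respondents, targets, iterations=50):
    """Iterative proportional fitting on categorical margins.

    respondents: list of dicts mapping variable name -> category.
    targets: dict mapping variable name -> {category: population share}.
    Returns one weight per respondent (weights sum to the sample size).
    """
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(iterations):
        for variable, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = {category: 0.0 for category in shares}
            for person, w in zip(respondents, weights):
                totals[person[variable]] += w
            # Rescale each respondent so this margin matches its target.
            weights = [
                w * shares[person[variable]] * n / totals[person[variable]]
                for person, w in zip(respondents, weights)
            ]
    return weights


# Hypothetical sample with too many older respondents and too many men.
sample = (
    [{"age": "18-34", "gender": "woman"}] * 10
    + [{"age": "18-34", "gender": "man"}] * 15
    + [{"age": "35+", "gender": "woman"}] * 30
    + [{"age": "35+", "gender": "man"}] * 45
)
# Hypothetical population margins.
targets = {
    "age": {"18-34": 0.30, "35+": 0.70},
    "gender": {"woman": 0.51, "man": 0.49},
}

weights = rake(sample, targets)


def weighted_share(variable, category):
    """Weighted share of respondents in one category of a variable."""
    hit = sum(w for p, w in zip(sample, weights) if p[variable] == category)
    return hit / sum(weights)


print(round(weighted_share("age", "18-34"), 3))     # 0.3
print(round(weighted_share("gender", "woman"), 3))  # 0.51
```

Raking only matches each margin separately, not every age-by-gender cell, which is why it can handle many variables with a modest sample.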
On Reddit, a user took the ‘weighted total’ for each age group in one of Kantar’s data tables and divided it by the respective ‘unweighted total’. Misunderstanding what these proportions measure, they called them ‘turnout by age’. The figures were compared to YouGov’s early estimates of turnout in the 2017 General Election, and the resulting table was then shared on Twitter.
As Matt Singh (Number Cruncher Politics) and Anthony Wells (YouGov) have identified, this is incorrect. That ratio captures the combined effect of Kantar’s turnout weight and all of its other weights, including the weight on age.
It does not represent turnout by age group.
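A small hypothetical example shows why. Suppose, purely for illustration, that each respondent's final weight is a demographic weight multiplied by a turnout probability (a simplification; Kantar's exact procedure is not reproduced here, and none of the figures are Kantar's). Dividing an age group's weighted total by its unweighted total then gives the group's average final weight, which mixes both adjustments.

```python
# Hypothetical figures for two age groups; none of these are Kantar's numbers.
groups = {
    # age group: (unweighted respondents, demographic weight, turnout probability)
    "18-24": (60, 1.8, 0.55),  # under-sampled, so weighted up
    "65+": (300, 0.8, 0.85),   # over-sampled, so weighted down
}

ratios = {}
for age, (unweighted, demographic_weight, turnout) in groups.items():
    # Each respondent's final weight combines both adjustments.
    weighted_total = unweighted * demographic_weight * turnout
    ratios[age] = weighted_total / unweighted
    print(f"{age}: weighted/unweighted = {ratios[age]:.2f}, "
          f"assumed turnout = {turnout:.2f}")
```

Here the under-sampled group's ratio (0.99) sits far above its assumed turnout (0.55), while the over-sampled group's ratio (0.68) sits below its assumed turnout (0.85): the ratio reflects how each group was sampled as much as how likely it is to vote.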
This error is replicated in the Novara Media newsletter, where Bastani labels this turnout likelihood weighting — which was also applied in Kantar’s accurate 2017 polling — “as dodgy as a £2 note”.
What is the turnout by age? It can be difficult to tell directly from the Kantar data tables. Matt Singh has estimated the implied turnout among eligible voters, which fell somewhat compared with the 2017 British Election Study vote-verified estimates (from its post-election random probability survey).
These implied turnout rates are not “the lowest in decades” — the implied turnout was higher than the comparable 2015 rate in some age groups. Kantar’s turnout calculation can change in subsequent polls.
The newsletter then attempts some ‘unskewing’, using this incorrect turnout calculation to assert “simply applying turnout by age from 2017 means the Tory lead is cut to just 4%”. This is another calculation copied from Twitter, apparently without Novara Media verifying its accuracy.
The missing voter problem
Kantar is unusual in including turnout likelihood in its weightings.
Most polling companies weight on demographic variables (like age and gender) to match the overall profile of the general public, and then estimate vote intention using only respondents who say they are likely to vote.
Voters are more likely to respond to surveys than non-voters. There is a non-response bias between voters and non-voters: the resulting sample can end up with too many voters. Weights are then used to match the overall population. However, the population we are interested in is people who will vote in the election — not everyone of voting age.
When population weights are used, these missing non-voters are replaced by voters with similar demographics, artificially inflating support for some parties. The inappropriate use of population weights can therefore lead to error.
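The mechanism can be made concrete with a deterministic toy example in which every number is an assumption: two equally sized age groups, turnout of 40% among the young and 80% among the old, voters twice as likely as non-voters to respond, and a hypothetical Party A that is more popular with the young. Weighting the sample back to a 50/50 age profile and then keeping only voters still over-represents young voters, because the young non-voters who did not respond are effectively replaced by young voters.

```python
# Toy population: all figures are hypothetical assumptions.
population_share = {"young": 0.5, "old": 0.5}
turnout = {"young": 0.4, "old": 0.8}
party_a_support = {"young": 0.6, "old": 0.3}  # among voters in each group
voter_response_boost = 2.0  # voters respond twice as often as non-voters

# True Party A share among people who actually vote.
voters = {g: population_share[g] * turnout[g] for g in population_share}
true_share = sum(voters[g] * party_a_support[g] for g in voters) / sum(voters.values())

# Within each age group, differential non-response inflates the share of
# respondents who are voters.
voter_share_in_sample = {
    g: turnout[g] * voter_response_boost
    / (turnout[g] * voter_response_boost + (1 - turnout[g]))
    for g in turnout
}

# Population weights restore the 50/50 age profile; then keep only voters.
weighted_voters = {g: population_share[g] * voter_share_in_sample[g] for g in turnout}
estimated_share = sum(
    weighted_voters[g] * party_a_support[g] for g in weighted_voters
) / sum(weighted_voters.values())

print(f"true Party A share:      {true_share:.3f}")       # 0.400
print(f"population-weighted est: {estimated_share:.3f}")  # 0.417
```

No respondent lies about their intentions here; the bias comes entirely from who answers the survey and from weighting to the wrong population.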
This is the problem that Kantar are trying to solve by including turnout probabilities within their weights.
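A minimal sketch of the idea, under strong simplifying assumptions: each respondent's final weight is their demographic weight multiplied by a turnout probability, where that probability is hypothetically known exactly for their age group, and Party A support is assumed not to differ between likely and unlikely voters within a group. Kantar's actual turnout model is more involved and is not reproduced here; all figures are made up.

```python
# Hypothetical two-group example; every figure is an assumption.
sample_count = {"young": 300, "old": 700}  # young people under-sampled
population_share = {"young": 0.5, "old": 0.5}
turnout_probability = {"young": 0.4, "old": 0.8}  # assumed known for each group
# Assumed Party A support, taken to be the same for likely and unlikely voters.
party_a_support = {"young": 0.6, "old": 0.3}

n = sum(sample_count.values())
total_weight = {}
for g, count in sample_count.items():
    demographic_weight = population_share[g] / (count / n)
    # Each respondent's final weight = demographic weight x turnout probability;
    # summing over the group's respondents gives the group's total weight.
    total_weight[g] = count * demographic_weight * turnout_probability[g]

electorate = sum(total_weight.values())
young_share_of_voters = total_weight["young"] / electorate
party_a_share = sum(total_weight[g] * party_a_support[g] for g in total_weight) / electorate

print(f"young share of estimated electorate: {young_share_of_voters:.3f}")  # 0.333
print(f"Party A share: {party_a_share:.3f}")  # 0.400
```

Because every respondent contributes in proportion to how likely they are to vote, the estimated electorate is built from turnout rather than from whoever happened to respond. The result is only as good as the turnout probabilities, which in practice must themselves be estimated.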