What's a negligible effect in an election?

One possible rule-of-thumb for researchers

One of the quantitative methodology papers which have most shaped my thinking in recent years is Carlisle Rainey's "Arguing for a Negligible Effect".

Like all great papers, it makes a simple point well: statistical significance is not the same as substantive significance, and to assess substantive significance we need to form judgements about what kinds of effects we would generally consider to be substantively meaningful.

Specifically, we need to identify the smallest meaningful effect m. Once we have m, we can compare it with the confidence interval around our estimate and sort the result into one of three cases.

  • If the 90% confidence interval for some effect lies entirely within the range [-m, m], then our effect is negligible.
  • If the upper end of the 90% confidence interval is lower than -m, or (conversely) if the lower end of the interval is higher than m, then our effect is substantively significant.
  • By implication, if our 90% confidence interval contains m or -m, then our effect is neither negligible nor substantively significant: a meh interval, if you like.
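
The three cases above can be sketched in a few lines of Python. This is a minimal illustration rather than code from Rainey's paper: the function name `classify_effect` and the label strings are hypothetical, and the interval bounds and m are assumed to be measured on the same scale (here, percentage points).

```python
def classify_effect(ci_lower: float, ci_upper: float, m: float) -> str:
    """Compare a 90% confidence interval with the smallest meaningful effect m.

    The name and the returned labels are illustrative assumptions.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")

    # Case 1: the whole interval sits inside [-m, m].
    if -m <= ci_lower and ci_upper <= m:
        return "negligible"

    # Case 2: the whole interval sits below -m or above m.
    if ci_upper < -m or ci_lower > m:
        return "substantively significant"

    # Case 3: the interval contains m or -m (the "meh" interval).
    return "inconclusive"
```

For example, `classify_effect(-0.1, 0.2, m=0.5)` falls in the first case, while `classify_effect(0.3, 0.9, m=0.5)` falls in the third.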

What's great about this paper, and what has made it resonate so much with me, is that it forces researchers to think about questions of magnitude, and to ask (in effect): what is the smallest effect I could care about?

Unfortunately, exercising this kind of considered judgement is hard. In some current research on the effects of the Brexit referendum on the 2017 general election, I've had to think about what makes a meaningful difference to outcomes.

It's clear to me that any putative effect which increases a candidate's vote count by one vote is not substantively significant.

It's also clear to me that any putative effect which increases a candidate's vote share by, say, the average winning margin in elections (24 percentage points) is substantively significant.

But where between these two points do we find our smallest substantively meaningful effect?

One way of leveraging existing social judgements is to look at the rules which govern recounts of elections.

Put informally, if the margin of victory in an election is close enough to trigger or permit a recount, it's small enough to be affected by idiosyncratic features (tired counters, lost boxes) that we don't usually bother with — and so it's small enough that we would declare effects of the same magnitude negligible.

Unfortunately, there are no formal rules which govern how close an election result must be to trigger a recount in UK parliamentary elections. There are, however, formal rules which govern recounts in several US states. These rules are collected by Citizens for Electoral Integrity Minnesota.

On my very cursory examination of the margins which permit a recount in the states listed, the median margin is 0.5 percentage points.

If instead we impute a value of zero to the missing states, then the median margin slips to 0.25 percentage points.

On this basis, the smallest substantively meaningful effect in an election might be one which causes one candidate's vote share to increase (or decrease) by 0.25 or 0.5 percentage points.

How does this rule of thumb allow us to interpret findings from published research?

Let me take as an example "Ballot Order Positional Effects in British Local Elections, 1973–2011", a 2014 paper by Richard Webber, Colin Rallings, Galina Borisyuk, and Michael Thrasher.

It's a great paper in part because it's so comprehensive. The paper includes information on 164,333 distinct ward elections. The largest category of elections is the category of single member ward elections.

The authors find that the mean difference in vote share for last- and first-placed candidates is 0.6 percentage points, with a standard error of 0.2 percentage points. Let's interpret that as an effect of being first-placed. Is that substantively meaningful?

If there were no uncertainty around our estimate, we would class it as substantively meaningful, since 0.6 percentage points > 0.5 percentage points. But if we form a 90% confidence interval around the estimate, roughly (0.27, 0.93), we find that it spans 0.5 and so includes effects we'd regard as negligible.

It's possible to argue that the proper value of m in this case should be 0.25 percentage points. In that case, we would conclude that the effect was substantively significant, since the lower end of the interval exceeds 0.25. Arguing over m, though, would shift the debate on to questions of magnitude more directly, which in my view is a boon.
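
The arithmetic behind this example can be checked with a short sketch. It assumes a normal approximation for the 90% interval (estimate plus or minus about 1.645 standard errors), which is my assumption rather than something stated in the paper. The names `ESTIMATE`, `STD_ERROR` and `verdict` are hypothetical.

```python
from statistics import NormalDist

ESTIMATE = 0.6   # reported mean difference in vote share, percentage points
STD_ERROR = 0.2  # reported standard error, percentage points

# Two-sided 90% interval under a normal approximation (an assumption):
# the critical value is the 95th percentile of the standard normal.
z = NormalDist().inv_cdf(0.95)
lower = ESTIMATE - z * STD_ERROR
upper = ESTIMATE + z * STD_ERROR


def verdict(ci_lower: float, ci_upper: float, m: float) -> str:
    """Apply the three-way rule for a given smallest meaningful effect m."""
    if -m <= ci_lower and ci_upper <= m:
        return "negligible"
    if ci_upper < -m or ci_lower > m:
        return "substantively significant"
    return "inconclusive"


print(f"90% CI: ({lower:.2f}, {upper:.2f})")
for m in (0.5, 0.25):
    print(f"m = {m}: {verdict(lower, upper, m)}")
```

With m = 0.5 the interval straddles the threshold, so the verdict is inconclusive. With m = 0.25 the whole interval lies above the threshold, so the effect counts as substantively significant.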

The moral of the story is: always interpret the magnitude of your putative effect, and ask yourself some questions if the 90% confidence interval surrounding your effect on vote share includes effects of half a percentage point or less.