Time to Write About the Voting Science
I left this off the agenda last time to give it its due focus. Now to cover the biggest problem with the vote, aside from Cambridge Analytica’s involvement. Yet I know that it is a problem that almost nobody will understand.
Speaking as a mathematical and technical analyst, I think this portion of Mike’s article reeks of a huge logical fallacy! Such a fallacy is exactly why homeopathy, a discredited practice, exists and is allowed at all in UK society. Indeed, that con-trick was so good, the NHS was embarrassingly still paying £120 million a year for homeopathic treatments as late as last year.
There is a distinct difference between a test that diagnoses you with cancer and you actually having cancer. The EU referendum was the same. The outcome of the EU Referendum, 52:48, isn’t a significant difference from the control position of being in the EU, or from the chance level of 50:50. 48:52 in favour of Remain is! Read on to understand why.
It’s a mistake to think that Leave v Remain were two options or extremes of voting. They were not! It was a question of whether a statistically significant proportion of the population wanted to Leave the European Union. As well as already being in the Remain scenario, this is what makes the referendum fundamentally different from a regular election.
We have not dissolved Parliament to choose another one from a discrete list of possibilities (Labour, Conservative, Lib Dem, etc.). Not just that, but it was all or nothing. We don’t do that normally either, since that would be like voting in a dictatorship.
What this is like, or should have been like, is a scientific study.
Scientific research is carried out based on what is regarded as a “gold standard” of research. The:
- Double Blind
This is done by several steps:
Step 1 — Select a Null Hypothesis
Every scientific experiment, survey or piece of research, including advisory and mandatory referendums, must have the property of falsifiability. You must be able to prove something is false, so that you know what is true. A simple example is this sequence:
2, 4, 8, ?
What should the next number in the sequence be?
2, 4, 8, 16 <- Is this it? Many folk would go for this. It would continue 32, 64…
2, 4, 8, 64 <- WTF? But this is also true. Consider 2 to the power 3 = 8, 4 to the power 3 is 64, 8 to the power 3 is 512…
Both of these are true. As ludicrous as one of them is likely to be, it isn’t wrong.
However, what is definitely not true, is:
2, 4, 8, 1, 1, 1, 1, since that doesn’t continue any sequence (pedantically, it doesn’t unless the function is discontinuous).
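The sequence example can be sketched in a few lines of Python. This is only an illustration: the function names `doubling`, `cube_two_back` and `consistent` are made up here, and the two rules are simply the two readings of 2, 4, 8 described above.

```python
def doubling(n):
    """First n terms where each term is twice the previous one."""
    terms = [2]
    while len(terms) < n:
        terms.append(terms[-1] * 2)
    return terms[:n]


def cube_two_back(n):
    """First n terms where each term is the cube of the term two places back."""
    terms = [2, 4]
    while len(terms) < n:
        terms.append(terms[-2] ** 3)
    return terms[:n]


def consistent(observed, rule):
    """True if the rule reproduces every observed term."""
    return rule(len(observed)) == list(observed)


rules = {"doubling": doubling, "cube_two_back": cube_two_back}

# Two plausible continuations and one that neither rule can produce.
for candidate in ([2, 4, 8, 16], [2, 4, 8, 64], [2, 4, 8, 1]):
    matches = [name for name, rule in rules.items() if consistent(candidate, rule)]
    print(candidate, "->", matches or "falsified by both rules")
```

Both rules agree on 2, 4, 8, so those three terms alone cannot tell them apart. Only a term that a rule cannot produce rules it out.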
You have to know what false looks like to determine what is true.
Null hypotheses present that false condition.
“There is no statistically significant appetite to leave the European Union”
The alternate (or alternative) hypothesis is the
Step 2 — Separate Outcomes into Control & Experimental Groups
This one is easier to visualise. Use a control group to determine the “chance level” or the “known” position. We are already in the Remain scenario. We know what that is. That is our control group. Therefore the “Leave the EU” group is our experimental group. The researchers themselves were not in the room when people chose, and the researchers did not themselves know who was voting in each group.
Step 3 — Determine Significance Threshold & Sample Size
A significance threshold is essential to determine at what point you are as close to certain as you can be that any result is absolute and almost certainly irrefutable.
This starts us into the world of statistics. Basically, the question is: what is the threshold of votes above which a result can be deemed better than chance?
In a two-option referendum, chance is a coin-flip. If you flip a coin 100 times, you don’t get 50:50 every time.
- Quite often you will get 48:52
- Other times you’ll get 52:48
- More rarely, you’ll get 56:44
- Very rarely you’ll even get 60:40
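To put rough numbers on the list above, here is a small sketch that computes the exact chance of each split for 100 flips of a fair coin. The function name `prob_heads` is just an illustrative choice, and the coin is assumed to be perfectly fair.

```python
from math import comb


def prob_heads(k, n=100):
    """Exact probability of exactly k heads in n flips of a fair coin."""
    return comb(n, k) / 2 ** n


# Chance of each split mentioned in the list, for 100 flips.
for k in (50, 52, 56, 60):
    print(f"{k}:{100 - k} has probability {prob_heads(k):.4f}")
```

Because a fair coin is symmetric, 52:48 and 48:52 come out equally likely, and the chance falls away the further a split sits from 50:50.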
This creates a probability distribution that looks something like this (for 1,000 coins).
It isn’t a hard line at 50%! Indeed, not only is the result 50% but the ‘confidence’ is only 50%! This is commonly considered like being 50% sure that 50% of the population actually want to Leave, once they have voted Leave. That’s uncertainty on top of an uncertain question.
In order to reach a sufficient significance level, 99% of the ‘mass’ of the above graph must be behind that line. That sits at close to 55% of the vote. Anything below that and the likelihood of an error increases dramatically! So much so, it’s not possible to draw any conclusions from it.
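A rough sketch of how such a line can be located, assuming a fair coin, a one-sided test and a normal approximation to the coin-flip distribution. The function name `threshold_share` is hypothetical, the 99% level is the one used above, and the flip counts (100 and 1,000) are taken from the coin examples. The figure it gives depends on both the number of flips and the confidence level chosen.

```python
from math import sqrt
from statistics import NormalDist


def threshold_share(n_flips, confidence=0.99):
    """Share of heads that a fair coin stays below with the given
    one-sided confidence, using a normal approximation."""
    z = NormalDist().inv_cdf(confidence)  # z-score for the confidence level
    std_of_share = 0.5 / sqrt(n_flips)    # standard deviation of the share of heads
    return 0.5 + z * std_of_share


# Assumed flip counts, matching the coin examples in the text.
for n_flips in (100, 1_000):
    print(f"{n_flips} flips: threshold share {threshold_share(n_flips):.3f}")
```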
Step 4 — Guard Against Covariates
This is a nasty little gremlin! A Covariate is a second, often unknown factor in the reason for a result. Something that, together with the alternate hypothesis, can explain the reasons for a result. In the case of the vote, people voting Leave ‘for a laugh’ or as a protest vote were one such grouping. For example, this guy.
When you are working with 50:50 thresholds, it has the effect of magnifying any such ‘illegitimate’ or protest votes. These are covariates which, in combination with unsound thresholds, add too much gravity to errors or mistakes.
Another problematic position is combining too many groups into a single cluster, since this hides a substantial amount of the reasoning and legitimacy from the result. When people say “Leaving the Single Market wasn’t on the ballot”, this is really what they mean. There were 17 million different reasons cited for leaving the EU. Each one was to the detriment of the other 16,999,999. Since:
- People voted to stop immigration but didn’t want to stop the NHS getting staff
- People voted to give £350 million a week to the NHS but not kill British Farming
- People voted to counter tax avoidance [Haha] but not lose workers’ rights
This created a cluster that is naturally at odds with itself. There was a disconnect between the voting choices and the actual behaviour, wants and needs of the population. So selfishly, each Leave voter voted for their own thing, to the detriment of every other Leave voter. Meaning that not only are 16.8 million (or around 17.2 million now) Remainers disenfranchised, but so will almost all Leavers be. But I digress.
Step 5 — Run your Shizzle
Just run it. Have the vote.
Step 6 — Ensure Soundness of Result
Accounting for the Covariates in step 4, make sure the percentage of the result rises above the threshold of significance. If the significance level is 99%, then 55% would be it. Anything below that is an insignificant result that has failed to disprove the null hypothesis.
Why do we use that wording? Because remember, the null hypothesis is not diametrically opposite to the alternative hypothesis and, like our sequence example, it doesn’t automatically mean that you’ve proven the alternative hypothesis false or true. But what still is true is the null hypothesis. The experiment, or advisory referendum, has failed to disprove it.
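The check in this step can be written as a tiny decision rule. This is a minimal sketch only: `rejects_null` and `protest_share` are hypothetical names, the 55% default is the threshold quoted above, and any value passed for protest votes is a placeholder rather than a measured figure.

```python
def rejects_null(leave_share, threshold=0.55, protest_share=0.0):
    """True only if the Leave share, after removing an assumed share of
    protest votes, clears the significance threshold."""
    adjusted_share = leave_share - protest_share
    return adjusted_share > threshold


# 52% falls short of the 55% threshold, so the null hypothesis stands.
print(rejects_null(0.52))
# A hypothetical 56% result would clear it...
print(rejects_null(0.56))
# ...unless an assumed 2% of all votes were protest votes.
print(rejects_null(0.56, protest_share=0.02))
```

Note that a `False` here does not prove the alternative hypothesis wrong. It only says the result failed to disprove the null hypothesis.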
I follow Mike. Some of it is useful stuff. But here he demonstrates why he is a PR man and not a scientist or statistician. The assumption that a 48:52 to Remain is the same as a 52:48 to Leave is flawed at the most basic level, since the Leave vote remains short of the 55% necessary to achieve significance. Note, in the steps above, it has to be all or nothing. You can’t have a “half right” result by doing half of them. If you do half the work, you get an invalid result!
What Mike’s somewhat naive logic exhibits is akin to a study which compares placebos against a drug under test, where the placebo works 48% of the time and the drug works 52% of the time (against separate groups), yet declares the drug trial a success, with all its side-effects and costs, when the threshold for success is set at 55%. This is not too far off the historical and recent drug scandals of medicine testing. It isn’t sufficient to want what you can measure, as that leads to nothing but ruin. You have to know what you want to measure and be able to measure it.
Moreover, Cambridge Analytica is a statistically literate company. To them, this is the basics. The bread and butter. They can and did exploit that illiteracy in government and the population, to ensure that we Leave on an inconclusive result. They can, do and did run rings around the UK population using statistical and data-science tools alien to almost everyone who’s not in the know. Not least because most of the UK is analytically and scientifically illiterate, including the civil service and the Electoral Commission. They, as well as GCHQ, can’t pay enough to keep the competent on the books. So UK Gov is primed for this sort of exploitation.
When a government is that outclassed by a private rogue organisation, democracy cannot long last.