What do polls actually say?
How to interpret a poll like a statistician
1/ Suppose you ran a @Twitter poll asking
“Should @RW_UNP be PM?”
Suppose 151 people answered, and 85 (56.3%) said “Yes”.
How should you interpret the results?
Could we conclude that a *majority* of #SriLanka think “Yes”?
2/ Not everyone in #SriLanka is on the internet.
Of those who are, not everyone is on @twitter.
And of those, not everyone will see and answer your poll.
So, No.
Instead, can we say,
“A majority of my @twitter audience think ‘Yes’” ?
Possibly.
3/ Some people in your audience think “Yes”, others “No”.
The 151 who answered are a subset of this audience.
(For now, let’s assume this subset is a random sample.)
What we need to find out is what proportion of the audience thinks “Yes”, and whether that is >50%.
4/ Intuitively we know that the “Yes” proportion is around 56.3%.
But it could also be 56% (i.e. 56.0%). Or even 49%.
Theoretically, even 10% or 90% might be possible.
Intuitively we also know that 56% is more likely than 49%. And 49% more so than 10% or 90%.
5/ Suppose the actual proportion was 56%, and suppose we picked 151 random responses from my audience.
The responses follow a Binomial distribution.
We can use Microsoft Excel’s BINOM functions (e.g. BINOM.DIST and BINOM.INV) to make various Binomial distribution calculations.
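For readers without Excel, here is a minimal Python sketch of the same kind of calculation. The function names `binom_pmf` and `binom_cdf` are my own (hypothetical) choices; they are meant to mirror what BINOM.DIST does with its cumulative flag off and on, and are not part of the original thread.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k "Yes" answers out of n, if each answer is "Yes" with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """Probability of at most k "Yes" answers (the cumulative version)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Assumed scenario from the thread: 151 responses, actual proportion 56%.
n, p = 151, 0.56
print(f"P(exactly 85 Yes) = {binom_pmf(85, n, p):.4f}")
print(f"P(at most 85 Yes) = {binom_cdf(85, n, p):.4f}")
```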
6/ If we picked 151 responses, any number between 0 and 151 could be “Yes”. 0 and 151 are possible, but unlikely. What values are more likely?
95% of the time we would get between 73 and 96 “Yes” responses.
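One way to reproduce a range like this is to cut 2.5% off each tail of the Binomial distribution. The sketch below assumes that is how the range was obtained; `binom_inv` and `central_range` are hypothetical helper names, with `binom_inv` intended to behave like Excel's BINOM.INV. The endpoints should land close to the range quoted above, though they can shift by one depending on exactly how the tails are cut.

```python
from math import comb

def binom_inv(n, p, alpha):
    """Smallest k whose cumulative probability reaches alpha."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p**k * (1 - p)**(n - k)
        if cumulative >= alpha:
            return k
    return n

def central_range(n, p, coverage=0.95):
    """Range of "Yes" counts left after trimming an equal share from each tail."""
    tail = (1 - coverage) / 2
    return binom_inv(n, p, tail), binom_inv(n, p, 1 - tail)

low, high = central_range(151, 0.56)
print(f"With an actual proportion of 56%, expect roughly {low} to {high} Yes responses")
```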
7/ Similarly, if the actual proportion was 49%, 95% of the time we would get between 62 and 86 “Yes” responses.
Note that the observed 85 “Yes” responses is within this range, but only just.
8/ Now, if the actual proportion was 10%, then 95% of the time we would get between 8 and 23 “Yes” responses.
Hence, while 85 is theoretically possible, it is highly unlikely.
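To put rough numbers on “only just” and “highly unlikely”, the sketch below computes the chance of seeing 85 or more “Yes” answers under each assumed proportion. This tail probability is my own way of illustrating the point, not a calculation from the thread, and `prob_at_least` is a hypothetical helper name.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more "Yes" answers out of n when each is "Yes" with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, observed = 151, 85
for p in (0.56, 0.49, 0.10):
    chance = prob_at_least(observed, n, p)
    print(f"actual proportion {p:.0%}: P(at least {observed} Yes) = {chance:.3g}")
```

The chance should be close to a coin flip at 56%, a few percent at 49%, and vanishingly small at 10%.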
9/ Now, suppose we ask “What is the range of actual proportions for which our observation (i.e. 85) falls inside the 95% range?”
We know that 56% and 49% fall in this range. And 10% does not. But what is the actual range?
We can use Excel’s “Goal Seek” to find this.
10/ The Upper Bound of our range of actual proportions will have 85 at its lower end. The Lower Bound will have 85 at its upper end.
We get 48.5% to 64.3% as our range of possible “actual proportions”.
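Outside Excel, a simple bisection search can play the role of Goal Seek. The sketch below is one reading of the procedure: it looks for the proportion at which 85 sits exactly on the 2.5% tail, once from each side. This construction is commonly called the Clopper-Pearson (exact) interval. The helper names are hypothetical, and the result may differ from the figures above in the last decimal place.

```python
from math import comb

def binom_cdf(k, n, p):
    """Probability of at most k "Yes" answers out of n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, lo, hi, tol=1e-6):
    """Bisection: find x in [lo, hi] with f(x) = 0, assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root is in the upper half
        else:
            hi = mid                # root is in the lower half
    return (lo + hi) / 2

def confidence_interval(k, n, confidence=0.95):
    """Exact-style interval for the actual proportion; assumes 0 < k < n."""
    tail = (1 - confidence) / 2
    # Lower bound: the p at which "k or more Yes" has probability equal to the tail.
    lower = solve(lambda p: (1 - binom_cdf(k - 1, n, p)) - tail, 0.0, k / n)
    # Upper bound: the p at which "k or fewer Yes" has probability equal to the tail.
    upper = solve(lambda p: binom_cdf(k, n, p) - tail, k / n, 1.0)
    return lower, upper

lower, upper = confidence_interval(85, 151)
print(f"95% interval: {lower:.1%} to {upper:.1%}")
```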
11/ In statistics speak, [48.5% to 64.3%] is the 95% confidence interval of our 56.3% estimate.
Alternatively, we could write 56.3±7.8%.
7.8% is known as the margin of error.
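A common shortcut for the margin of error is the normal approximation, 1.96 × sqrt(p(1−p)/n). Note that this is a swapped-in textbook formula rather than the Binomial search described in the thread, so the answer will only be close (roughly 8 percentage points), not identical. `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt

def margin_of_error(yes, total, z=1.96):
    """Normal-approximation margin of error; z = 1.96 corresponds to 95% confidence."""
    share = yes / total
    return z * sqrt(share * (1 - share) / total)

share = 85 / 151
margin = margin_of_error(85, 151)
print(f"{share:.1%} ± {margin:.1%}")
```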
12/ So, can we say “A majority of my @twitter audience think ‘Yes’” ?
No, because the interval includes values below 50%. There is a small chance that the actual proportion is, say, 49%.
Let’s discuss some FAQs.
13/ FAQ: Why 95%?
95% means that if we ran the poll many times, about 95% of the intervals built this way would contain the actual proportion. Loosely, we can be 95% confident that it is within [48.5% to 64.3%]. The remaining 5% is known as the “significance level”.
By convention, statisticians treat 95% confidence as “sufficiently” large.
14/ FAQ: What happens if we have more samples?
If we had 10x as many responses (1,510), of which 850 said “Yes”, then our 95% confidence interval would be [53.7% to 58.8%] or 56.3±2.6%.
In this case, we could say “A majority of my @twitter audience think ‘Yes’”.
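The effect of sample size can be sketched with the normal approximation (an assumption on my part, so the bounds may differ slightly from the figures above). With 10x the responses, the margin shrinks by about sqrt(10) ≈ 3.2x, which is enough to push the lower bound above 50%. `approx_interval` is a hypothetical helper name.

```python
from math import sqrt

def approx_interval(yes, total, z=1.96):
    """Normal-approximation 95% interval (lower, upper) for the "Yes" share."""
    share = yes / total
    margin = z * sqrt(share * (1 - share) / total)
    return share - margin, share + margin

for yes, total in ((85, 151), (850, 1510)):
    lower, upper = approx_interval(yes, total)
    verdict = "majority supported" if lower > 0.5 else "cannot claim a majority"
    print(f"{yes}/{total}: [{lower:.1%} to {upper:.1%}] -> {verdict}")
```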
15/ FAQ: What about “let’s assume this subset is a random sample”?
This assumption could indeed be false, for example if only RW supporters answer the poll.
However, in many cases this is not an unreasonable assumption.
16/ FAQ: Could we conclude that a *majority* of #SriLanka think “Yes”?
No. Like I said, not everyone in #SriLanka is on the internet. Of those who are, not everyone is on @twitter. And of those, not everyone will see and answer your poll.
So, No.