# What do polls actually say?

## How to interpret a poll like a statistician

1/ Suppose you ran a @Twitter poll asking

“Should @RW_UNP be PM?”

Suppose 151 people answered, and 85 (56.3%) said “Yes”.

How should you interpret the results?

Could we conclude that a *majority* of #SriLanka think “Yes”?

2/ Not everyone in #SriLanka is on the internet.

Of those, not everyone is on @twitter.

Of those, not everyone will see and answer your poll.

So, no.

Instead, can we say,

“A majority of my @twitter audience think ‘Yes’”?

Possibly.

3/ Some people in your audience think “Yes”, others “No”.

The 151 who answered are a subset of this audience.

(For now, let’s assume this subset is a random sample.)

What we need to find out is what proportion of the audience think “Yes”, and whether that is >50%.

4/ Intuitively, we know that the “Yes” proportion is around 56.3%.

But it could also be 56% (i.e. 56.0%). Or even 49%.

Theoretically, even 10% or 90% might be possible.

Intuitively, we also know that 56% is more likely than 49%, and 49% more likely than 10% or 90%.

5/ Suppose the actual proportion was 56%, and suppose we picked 151 random responses from my audience.

The number of “Yes” responses then follows a Binomial distribution.

We can use Microsoft Excel’s BINOM functions (e.g. BINOM.DIST and BINOM.INV) to make various Binomial distribution calculations.
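For readers without Excel, here is a minimal Python sketch of the two things `BINOM.DIST` calculates: the probability of exactly k “Yes” responses, and of k or fewer. The names `binom_pmf` and `binom_cdf` are our own. They are not from any library. The example also backs up the intuition in step 4. It compares how likely the observed 85 “Yes” out of 151 would be under each candidate proportion.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), like Excel's BINOM.DIST(k, n, p, FALSE)."""
    if k < 0 or k > n:
        return 0.0
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    # Work in log space so large n (e.g. 1,510 responses) cannot overflow a float.
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coef + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k), like Excel's BINOM.DIST(k, n, p, TRUE)."""
    return min(1.0, sum(binom_pmf(i, n, p) for i in range(k + 1)))


if __name__ == "__main__":
    n, yes = 151, 85
    # How likely is the observed result (85 "Yes") under each candidate proportion?
    for p in (0.56, 0.49, 0.10, 0.90):
        print(f"p = {p:.0%}: P(exactly {yes} Yes) = {binom_pmf(yes, n, p):.2e}")
```

The printed probabilities are largest near 56% and vanishingly small at 10% and 90%, which is the ordering step 4 describes.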

6/ If we picked 151 responses, any number between 0 and 151 could be “Yes”. 0 and 151 are possible, but unlikely. Which values are more likely?

95% of the time we would get between 73 and 96 “Yes” responses.

7/ Similarly, if the actual proportion was 49%, 95% of the time we would get between 62 and 86 “Yes” responses.

Note that the observed 85 “Yes” responses fall within this range. But only just.

8/ Now, if the actual proportion was 10%, then 95% of the time we would get between 8 and 23 “Yes” responses.

Hence, while 85 is theoretically possible, it is highly unlikely.
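Steps 6 to 8 can be cross-checked with a short Python sketch. `binom_quantile` is a hypothetical helper that returns the smallest k with P(X ≤ k) ≥ q. That is the rule Excel documents for `BINOM.INV(trials, probability_s, alpha)`. The central 95% range takes the 2.5% and 97.5% points. Depending on how the tails are split and rounded, an endpoint may differ by one from the figures above. Treat the output as a sanity check, not as the thread's exact method.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space; assumes 0 < p < 1."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coef + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_quantile(q, n, p):
    """Smallest k with P(X <= k) >= q (the BINOM.INV rule)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if cumulative >= q:
            return k
    return n  # guard against floating-point shortfall when q is close to 1


def central_range(n, p, level=0.95):
    """Range [lo, hi] that holds the "Yes" count at least `level` of the time."""
    tail = (1 - level) / 2
    return binom_quantile(tail, n, p), binom_quantile(1 - tail, n, p)


if __name__ == "__main__":
    n, observed = 151, 85
    for p in (0.56, 0.49, 0.10):
        lo, hi = central_range(n, p)
        verdict = "inside" if lo <= observed <= hi else "outside"
        print(f"p = {p:.0%}: 95% of the time {lo} to {hi} Yes; {observed} is {verdict}")
```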

9/ Now, suppose we ask: “What is the range of actual proportions for which our observation (85) falls within the 95% range?”

We know that 56% and 49% fall in this range, and 10% does not. But what exactly is the range?

We can use Excel’s “Goal Seek” to find this.

10/ The Upper Bound of our range is the actual proportion whose 95% range has 85 at its lower end. The Lower Bound is the one whose 95% range has 85 at its upper end.

We get 48.5% to 64.3% as our range of possible “actual proportions”.
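Goal Seek is a numerical root-finder. The Python sketch below does the same job with simple bisection, using a hypothetical helper named `goal_seek` after the Excel feature. It implements the standard “exact” (Clopper-Pearson) form of the step 10 idea. The upper bound is where 85 or fewer “Yes” would happen only 2.5% of the time. The lower bound is where 85 or more would happen only 2.5% of the time. The thread does not show its exact Excel setup, so these bounds may differ slightly from 48.5% and 64.3%.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space; assumes 0 < p < 1."""
    total = 0.0
    for i in range(k + 1):
        log_coef = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_coef + i * math.log(p) + (n - i) * math.log1p(-p))
    return total


def goal_seek(f, lo=1e-9, hi=1 - 1e-9, iterations=60):
    """Find p in (lo, hi) where f(p) = 0 by bisection; assumes f changes sign once."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def exact_interval(yes, n, level=0.95):
    """Clopper-Pearson interval for the actual proportion, given `yes` out of `n`."""
    tail = (1 - level) / 2
    # Upper bound: P(X <= yes) falls to `tail`, so `yes` sits at the lower end.
    upper = goal_seek(lambda p: binom_cdf(yes, n, p) - tail)
    # Lower bound: P(X >= yes) falls to `tail`, so `yes` sits at the upper end.
    lower = goal_seek(lambda p: (1 - binom_cdf(yes - 1, n, p)) - tail)
    return lower, upper


if __name__ == "__main__":
    lower, upper = exact_interval(85, 151)
    print(f"95% interval: {lower:.1%} to {upper:.1%}")
```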

11/ In statistics speak, [48.5% to 64.3%] is the 95% confidence interval of our 56.3% estimate.

Alternatively, we could write this, approximately, as 56.3% ± 7.8%.

7.8% is known as the margin of error.

12/ So, can we say “A majority of my @twitter audience think ‘Yes’”?

No. The interval [48.5% to 64.3%] dips below 50%, so we cannot rule out that the actual proportion is, say, 49%.

Let’s discuss some FAQs.

13/ FAQ: Why 95%?

Strictly speaking, 95% is a property of the method: if we repeated the poll many times, about 95% of the intervals calculated this way would contain the actual proportion, and 5% would not. That 5% is known as the “significance level”.

By convention, statisticians usually treat 95% confidence as “sufficiently” high.
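One way to see what the 95% refers to is to simulate many repeats of the poll and count how often the interval contains the actual proportion. This Python sketch assumes a hypothetical actual proportion of 56%. It reruns a 151-response poll 2,000 times with a fixed random seed. For speed, it uses the normal-approximation (Wald) interval, p̂ ± 1.96·√(p̂(1−p̂)/n), instead of the exact interval from step 10. The resulting figure should therefore land near 95% rather than exactly on it.

```python
import math
import random


def wald_interval(yes, n, z=1.96):
    """Normal-approximation interval: p_hat +/- z * standard error."""
    p_hat = yes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(actual_p, n, trials=2000, seed=1):
    """Fraction of simulated polls whose interval contains actual_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # One simulated poll: each of n respondents says "Yes" with probability actual_p.
        yes = sum(rng.random() < actual_p for _ in range(n))
        lo, hi = wald_interval(yes, n)
        if lo <= actual_p <= hi:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Intervals containing the actual 56%: {coverage(0.56, 151):.1%}")
```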

14/ FAQ: What happens if we have more responses?

If we had 10x as many responses (1,510), of which 850 said “Yes”, then our 95% confidence interval would be [53.7% to 58.8%], or approximately 56.3% ± 2.6%.

In this case, since the whole interval is above 50%, we could say “A majority of my @twitter audience think ‘Yes’”.
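A quick way to see why the interval narrows: under the normal approximation, the margin of error is z·√(p̂(1−p̂)/n). Multiplying the number of responses by 10 therefore shrinks it by √10 ≈ 3.2. The Python sketch below uses that approximation rather than the exact method behind the figures above, so its margins come out near 7.8% and 2.6% but not exactly equal to them. The hypothetical helper `majority_supported` applies the reasoning of steps 12 and 14: claim a majority only when the whole interval lies above 50%.

```python
import math


def margin_of_error(yes, n, z=1.96):
    """Normal-approximation margin of error for the proportion yes/n."""
    p_hat = yes / n
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def majority_supported(yes, n, z=1.96):
    """True only if the lower end of the interval is above 50%."""
    return yes / n - margin_of_error(yes, n, z) > 0.5


if __name__ == "__main__":
    for yes, n in ((85, 151), (850, 1510)):
        moe = margin_of_error(yes, n)
        print(f"{yes}/{n}: {yes / n:.1%} ± {moe:.1%}, majority? {majority_supported(yes, n)}")
```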

15/ FAQ: What about “let’s assume this subset is a random sample”?

This assumption could indeed be false, for example, if only RW supporters answer the poll.

However, in many cases this is not an unreasonable assumption.
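Here is why the random-sample assumption matters, illustrated with made-up numbers. Suppose the actual “Yes” share in the audience is 45%, but “Yes” holders are twice as likely to answer the poll (20% versus 10%). The expected “Yes” share among respondents is then p·r_yes / (p·r_yes + (1−p)·r_no). The Python sketch below computes it. Under this hypothetical, extra responses would not fix the problem. The interval would simply narrow around the wrong value.

```python
def expected_poll_share(actual_yes, respond_if_yes, respond_if_no):
    """Expected "Yes" share among respondents when response rates differ by answer."""
    yes_responses = actual_yes * respond_if_yes
    no_responses = (1 - actual_yes) * respond_if_no
    return yes_responses / (yes_responses + no_responses)


if __name__ == "__main__":
    # Hypothetical numbers for illustration, not taken from any real poll.
    share = expected_poll_share(actual_yes=0.45, respond_if_yes=0.20, respond_if_no=0.10)
    print(f"Actual Yes: 45.0%, expected poll result: {share:.1%}")
```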

16/ FAQ: Could we conclude that a *majority* of #SriLanka think “Yes”?

No. Like I said, not everyone in #SriLanka is on the internet. Of those, not everyone is on @twitter. Of those, not everyone will see and answer your poll.

So, no.