What do polls actually say?

How to interpret a poll like a statistician

Nuwan I. Senaratna

Published in

On Economics

5 min read · May 13, 2022

1/ Suppose you ran a @Twitter poll asking

“Should @RW_UNP be PM?”

Suppose 151 people answered, and 85 (56.3%) said “Yes”.

How should you interpret the results?

Could we conclude that a *majority* of #SriLanka think “Yes”?

2/ Not everyone in #SriLanka is on the internet.

Of those who are, not everyone is on @twitter.

And of those, not everyone will see and answer your poll.

So, No.

Instead can we say,

“A majority of my @twitter audience think ‘Yes’” ?

Possibly.

3/ Some people in your audience think “Yes”, others “No”.

The 151 who answered are a subset of this audience.

(For now, let’s assume this subset is a random sample.)

What we need to find out is what proportion of the audience thinks “Yes”, and whether that is more than 50%.

Sampling (statistics) - Wikipedia

In statistics, quality assurance, and survey methodology, sampling is the selection of a subset (a statistical sample)…

en.wikipedia.org

4/ Intuitively we know that the “Yes” proportion is around 56.3%.

But it could also be 56% (i.e. 56.0%). Or even 49%.

Theoretically, even 10% or 90% might be possible.

Intuitively we also know that 56% is more likely than 49%. And 49% more so than 10% or 90%.

5/ Suppose the actual proportion was 56%, and suppose we picked 151 random responses from my audience.

The number of “Yes” responses follows a Binomial distribution.

We can use Microsoft Excel’s BINOM functions (such as BINOM.DIST and BINOM.INV) to make various Binomial distribution calculations.

Binomial distribution - Wikipedia

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability…

en.wikipedia.org
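
If you don't have Excel at hand, the same kind of calculation can be sketched in Python using only the standard library. The function names `binom_pmf` and `binom_cdf` below are illustrative (not from any library); they are meant to mirror the non-cumulative and cumulative modes of Excel's BINOM.DIST.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Chance of exactly k "Yes" answers out of n, if each answer is "Yes" with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """Chance of at most k "Yes" answers out of n."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Assume the actual proportion is 56% and 151 people answer.
n, p = 151, 0.56
print(f"P(exactly 85 Yes) = {binom_pmf(85, n, p):.4f}")
print(f"P(at most 85 Yes) = {binom_cdf(85, n, p):.4f}")
```

This direct formula is fine for a poll of this size. For much larger samples a statistics library would be a safer choice.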

6/ If we picked 151 responses, any number between 0 and 151 could be “Yes”. 0 and 151 are possible, but unlikely. What values are more likely?

95% of the time we would get between 73 and 96 “Yes” responses.

7/ Similarly, if the actual proportion was 49%, 95% of the time we would get between 62 and 86 “Yes” responses.

Note that the observed 85 “Yes” responses fall within this range, but only just.

8/ Now, if the actual proportion was 10%, then 95% of the time we would get between 8 and 23 “Yes” responses.

Hence, while 85 is theoretically possible, it is highly unlikely.
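
The three ranges above can be reproduced, at least approximately, with a short Python sketch. It assumes the range runs from the 2.5% quantile to the 97.5% quantile, where a quantile is the smallest count whose cumulative probability reaches the target (which is what Excel's BINOM.INV returns). The helper names `binom_quantile` and `central_range` are made up for this example.

```python
from math import comb

def binom_quantile(n: int, p: float, alpha: float) -> int:
    """Smallest k such that P(at most k "Yes" out of n) >= alpha."""
    total = 0.0
    for k in range(n + 1):
        total += comb(n, k) * p**k * (1 - p)**(n - k)
        if total >= alpha:
            return k
    return n

def central_range(n: int, p: float, coverage: float = 0.95) -> tuple[int, int]:
    """Range of "Yes" counts that leaves (1 - coverage) / 2 in each tail."""
    tail = (1 - coverage) / 2
    return binom_quantile(n, p, tail), binom_quantile(n, p, 1 - tail)

observed = 85
for p in (0.56, 0.49, 0.10):
    low, high = central_range(151, p)
    verdict = "inside" if low <= observed <= high else "outside"
    print(f"p = {p:.2f}: {low} to {high} Yes responses; {observed} is {verdict} the range")
```

The endpoints can shift by one response depending on how the tails are rounded, so small differences from a spreadsheet result are to be expected.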

9/ Now, suppose we ask “For what range of actual proportions would our observation (i.e. 85) fall within the 95% range?”

We know that 56% and 49% fall in this range. And 10% does not. But what is the actual range?

We can use Excel’s “Goal Seek” to find this.

10/ The Upper Bound of our range of actual proportions will have 85 at its lower end. And the Lower Bound will have 85 at its upper end.

We get 48.5% to 64.3% as our range of possible “actual proportions”.
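
Outside Excel, a simple bisection search can stand in for Goal Seek. The sketch below is one possible version, not the spreadsheet used here: it defines each bound through an exact 2.5% tail probability (often called a Clopper-Pearson interval), so its answer may differ from the figures above by a fraction of a percentage point. The helper `solve_decreasing` is a made-up name.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(at most k "Yes" out of n) when each answer is "Yes" with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_decreasing(f, target: float, lo: float = 0.0, hi: float = 1.0, steps: int = 60) -> float:
    """Bisection: find p in [lo, hi] where f(p) crosses target, for f decreasing in p."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still above the target, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

n, yes, tail = 151, 85, 0.025

# Upper bound: the proportion at which 85 or fewer "Yes" has only a 2.5% chance.
upper = solve_decreasing(lambda p: binom_cdf(yes, n, p), tail)
# Lower bound: the proportion at which 85 or more "Yes" has only a 2.5% chance,
# i.e. 84 or fewer "Yes" has a 97.5% chance.
lower = solve_decreasing(lambda p: binom_cdf(yes - 1, n, p), 1 - tail)

print(f"95% interval: {lower:.1%} to {upper:.1%}")
```

Bisection works here because the chance of seeing at most a given number of “Yes” responses only goes down as the actual proportion goes up.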

11/ In statistics speak, [48.5% to 64.3%] is the 95% confidence interval of our 56.3% estimate.

Alternatively, we could write 56.3% ± 7.8%.

7.8% is known as the margin of error.

Confidence interval - Wikipedia

In frequentist statistics, a confidence interval ( CI) is a range of estimates for an unknown parameter. A confidence…

en.wikipedia.org
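
A common shortcut for the margin of error is the normal approximation, z × sqrt(p(1 − p) / n), with z = 1.96 for 95% confidence. This is a different method from the Goal Seek approach, so treat the sketch below as a cross-check only; the function name `margin_of_error` is illustrative.

```python
from math import sqrt

def margin_of_error(yes: int, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a poll proportion."""
    p_hat = yes / n
    return z * sqrt(p_hat * (1 - p_hat) / n)

p_hat = 85 / 151
moe = margin_of_error(85, 151)
print(f"{p_hat:.1%} ± {moe:.1%}")
```

For this poll the shortcut gives roughly 7.9%, which is close to the 7.8% found above.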

12/ So, can we say “A majority of my @twitter audience think ‘Yes’” ?

No. The interval includes values below 50% (49%, for example), so we cannot rule out that the actual proportion is less than half.

Let’s discuss some FAQs.

13/ FAQ: Why 95%?

95% is the confidence level. It means that if we repeated the poll many times and built an interval this way each time, about 95% of those intervals would contain the actual proportion. The remaining 5% is known as the “Significance” level.

By convention, statisticians treat 95% Confidence as “sufficiently” large.

Statistical significance - Wikipedia

In statistical hypothesis testing,[1] [2] a result has statistical significance when it is very unlikely to have…

en.wikipedia.org
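
One way to get a feel for the 95% is to simulate many polls from an audience whose actual proportion we know, and count how often the interval contains it. The sketch below is a toy simulation under stated assumptions: the actual proportion is set to 56%, each poll has 151 random responses, and the interval uses the normal approximation rather than Goal Seek. All names are illustrative.

```python
import random
from math import sqrt

def poll_interval(yes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for one poll."""
    p_hat = yes / n
    moe = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def coverage(true_p: float, n: int, polls: int = 2000, seed: int = 1) -> float:
    """Share of simulated polls whose interval contains the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(polls):
        # Each respondent independently says "Yes" with probability true_p.
        yes = sum(rng.random() < true_p for _ in range(n))
        low, high = poll_interval(yes, n)
        hits += low <= true_p <= high
    return hits / polls

print(f"Intervals containing the actual proportion: {coverage(0.56, 151):.1%}")
```

The share should land near 95%, though not exactly on it, since the simulation is random and the interval is approximate.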

14/ FAQ: What happens if we have more samples?

If we had 10x as many responses (1,510), of which 850 said “Yes”, then our 95% confidence interval would be [53.7% to 58.8%] or 56.3% ± 2.6%.

In this case, we could say “A majority of my @twitter audience think ‘Yes’”.
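
The effect of sample size can be sketched with the normal approximation, where the margin of error shrinks with the square root of the number of responses. This shortcut is an assumption of the example, so its figures may differ slightly in the last decimal from the ones quoted above; `normal_interval` is a made-up helper name.

```python
from math import sqrt

def normal_interval(yes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for a poll proportion."""
    p_hat = yes / n
    moe = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

for yes, n in ((85, 151), (850, 1510)):
    low, high = normal_interval(yes, n)
    # A majority is only supported if the whole interval sits above 50%.
    majority = "supported" if low > 0.5 else "not established"
    print(f"n = {n}: {low:.1%} to {high:.1%}; majority {majority}")
```

With 10x the responses the interval is about 3.2x (the square root of 10) narrower, which is enough to push its lower end above 50%.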

15/ FAQ: What about “let’s assume this subset is a random sample”?

This assumption could indeed be false, for example, if only RW supporters answered the poll.

However, in many cases this is not an unreasonable assumption.

Randomness - Wikipedia

In common parlance, randomness is the apparent or actual lack of pattern or predictability in events. A random sequence…

en.wikipedia.org
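
To see how much a non-random sample can distort a poll, here is a small sketch with entirely hypothetical numbers: an audience where 45% think “Yes”, but “Yes” supporters are twice as likely to answer as everyone else. The function name and the response rates are assumptions made up for illustration.

```python
def observed_yes_share(true_yes: float, yes_response_rate: float, no_response_rate: float) -> float:
    """Expected "Yes" share among responses when the two groups answer at different rates."""
    yes_answers = true_yes * yes_response_rate
    no_answers = (1 - true_yes) * no_response_rate
    return yes_answers / (yes_answers + no_answers)

# Hypothetical: 45% of the audience think "Yes"; 10% of them answer, versus 5% of the rest.
share = observed_yes_share(true_yes=0.45, yes_response_rate=0.10, no_response_rate=0.05)
print(f"Expected Yes share among responses: {share:.1%}")
```

In this made-up case the poll would show a clear “Yes” majority even though the audience leans “No”, and no confidence interval would catch that.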

16/ FAQ: Could we conclude that a *majority* of #SriLanka think “Yes”?

No. Like I said, not everyone in #SriLanka is on the internet. Of those who are, not everyone is on @twitter. And of those, not everyone will see and answer your poll.

So, No.

