Election Opinion Polling For a Statistical Sleuth
Happy Election Day everyone! 'Tis the season when news anchors and political analysts discuss statistical inference and prior polling trends to predict the midterm elections. In this post, I would like to highlight some of the statistical practices used in polling to help predict which candidates will win.
The basic idea of an election opinion poll is to survey the US voting population to estimate what percentage of people hold particular views on issues and candidates. One approach would be to ask every individual in the population and tally who each person is voting for; however, with a US population of 330 million it would be impractical to ask everyone (and not everyone in the population is eligible to vote). A more practical approach is to sample the eligible population and extrapolate the results. Common methods for sampling a population include:
- Simple random sampling where every member of the population has the same probability of being a part of the sample (e.g., drawing names out of a hat).
- Stratified random sampling where we define population groups (or strata) such as a zip code or electoral district and randomly choose individuals from each of the groups.
- Cluster random sampling, where we again define population groups, but instead randomly select whole groups and survey every member of each selected group.
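The three sampling methods above can be sketched in a few lines of Python. This is a toy illustration with a made-up population of voters tagged by district (the population, sample sizes, and district labels are all invented for the example):

```python
import random

# Hypothetical toy population: (voter_id, district) pairs, 200 voters per district.
population = [(i, f"district_{i % 5}") for i in range(1000)]

# Simple random sampling: every voter has the same chance of selection,
# like drawing names out of a hat.
simple_sample = random.sample(population, 50)

# Stratified random sampling: group voters by district (the strata),
# then randomly choose a few voters from EVERY group.
strata = {}
for voter in population:
    strata.setdefault(voter[1], []).append(voter)
stratified_sample = [v for group in strata.values() for v in random.sample(group, 10)]

# Cluster random sampling: randomly pick whole districts,
# then keep every voter inside the chosen districts.
chosen_districts = random.sample(sorted(strata), 2)
cluster_sample = [v for d in chosen_districts for v in strata[d]]
```

Note the trade-off this makes visible: stratification guarantees every district is represented, while clustering is cheaper to field (you only visit two districts) at the cost of a less spread-out sample.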
In all of these sampling methods it is important that our selection is representative of the population. Non-representative practices cause sampling error; for example, a systematic error could occur if we poll participants who are not eligible to vote, or poll the same participants more than once. There is therefore a strong emphasis on choosing a solid sampling approach.
Example: Texas Statewide Poll
Here is an example of a Trafalgar Group poll conducted Nov 3–5 2018 for the great state of Texas: https://drive.google.com/file/d/18h31Oi90C0teXatPo1CQgpuKnee-MscF/view
— — Texas Statewide Survey: Statistical Summary — —
- 2,135 Respondents Likely 2018 General Election Voters
- Margin of Error: +/- 2.1
- Response Rate: 1.9%
- Confidence: 95%
- Response Distribution: 50%
- Trafalgar Group polling methodology: auto-dialing with computer generated questionnaires (1–2 mins) https://ballotpedia.org/Trafalgar_Group
- Target audience: those who would not typically participate in political polls
- Sampling methodology: unknown (not stated in the report)
Below is the Texas Senate Ballot Test:
- There were 2,135 likely voters polled, with a low response rate of 1.9%; that implies roughly 112,000 individuals (2,135 / 0.019) had to be contacted to obtain this sample.
- A margin of error of +/- 2.1 percentage points on each estimate, indicating the polling accuracy at the stated 95% confidence level.
- We can see that 49.8% of voters preferred Cruz and 41.1% preferred O’Rourke (+8.7 favor Cruz)
- At the same time, Trump's approval rating among those participants was 53% approve and 45% disapprove (+7.7 net approval).
- Over 52% of participants were over the age of 50.
- Taking the margin of error into consideration, the reported 49.8% for Cruz could be as low as 47.7% or as high as 51.9%. This sampling error tells those interpreting the result that, based on the sampling distribution, if the poll were conducted many times over we would expect slightly different results each time. The margin of error therefore provides a range of expected accuracy.
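We can check that the reported +/- 2.1 figure is consistent with the standard margin-of-error formula for a proportion, z * sqrt(p(1-p)/n). This sketch plugs in the numbers from the poll summary (n = 2,135, 95% confidence, and the worst-case response distribution of 50%):

```python
import math

n = 2135   # respondents, from the poll summary
p = 0.5    # response distribution of 50% (worst case, maximizes variance)
z = 1.96   # z-score for a 95% confidence level

# Margin of error for a sample proportion.
margin_of_error = z * math.sqrt(p * (1 - p) / n)
print(f"Margin of error: +/- {margin_of_error * 100:.1f} points")  # +/- 2.1

# Apply it to the Cruz estimate to recover the interval in the text.
cruz = 0.498
print(f"Cruz interval: {cruz - margin_of_error:.1%} to {cruz + margin_of_error:.1%}")
# 47.7% to 51.9%
```

The computed value matches the summary's +/- 2.1, which suggests the pollster used this standard formula with the conservative p = 0.5 assumption.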
In terms of sampling methodology, the report provided limited information on how individuals were selected from the population. With a quick Bing search (#microsoft), I was able to find a recent article on Trafalgar's polling methodology indicating the use of auto-dialing technology with computer-generated questionnaires. It is worth noting that this information was provided through a Breitbart interview, an organization known to have a conservative bias. Based on this information, it appears that the Trafalgar Group used a "convenience sampling" approach, where individuals are sampled from the population by whatever method is cheapest, easiest, and fastest. This points to a nonprobability sampling method, which is one of the weaker bases for inferential statistics about the views of a population.
A couple of other interesting thoughts on this election day:
- How does a pollster know when their sample size is "big enough" (n=500 vs. n=10,000)?
- Do polls have a causal relationship between voter behavior and election outcomes?
- What would be the effect on public opinion if polls were not released to the public but only used for election campaigns?
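On the first question, the margin-of-error formula can be inverted to give a rough answer: solve for the minimum n that achieves a desired margin of error. This is a sketch of that calculation (the function name and the worst-case p = 0.5 assumption are my own, not from the poll):

```python
import math

def sample_size(moe, z=1.96, p=0.5):
    """Minimum n for a desired margin of error, at confidence level z,
    assuming the worst-case proportion p = 0.5."""
    return math.ceil((z / moe) ** 2 * p * (1 - p))

# How n grows as the desired margin of error shrinks.
for moe in (0.05, 0.03, 0.01):
    print(f"+/- {moe:.0%} needs n >= {sample_size(moe)}")
```

Notice that halving the margin of error roughly quadruples the required sample, which is why polls rarely go far beyond a few thousand respondents. Plugging in the Trafalgar figures, a +/- 2.1-point target gives a required n of about 2,180, right in line with the poll's 2,135 respondents.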
Happy Election Day!!! Have you noticed anything statistically interesting during this midterm election? It sure will be interesting to see how right (and wrong) these polls were post-election!