November Data Challenge — Poll Believability

Jerry Li
Nov 2 · 2 min read

In an ensemble of polls collected by FiveThirtyEight, each of the top five Democratic candidates holds a 10-point lead over Trump. However, polls before the 2016 election showed an even stronger lean toward the Democratic party, and Trump still won. This raises the question: to what extent are current polls useful for predicting the outcome of an election?

The dataset used here is a collection of state and national polls on the 2016 presidential election, conducted from November 2015 to November 2016. It includes raw and weighted poll results along with each poll's state, date, pollster, and pollster rating. The weighted results apply an estimated poll weight, and the weighting method varies between sources.
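As a rough illustration of what weighting does, a weighted average gives more influence to polls with larger weights. The shares and weights below are hypothetical, and since the weighting scheme varies by source, this shows only the general idea, not any particular source's method.

```python
def weighted_average(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical Clinton shares (percent) from three polls and hypothetical weights
clinton_shares = [46.0, 44.0, 48.0]
poll_weights = [1.0, 0.5, 2.0]
print(round(weighted_average(clinton_shares, poll_weights), 2))
```

The heavily weighted 48.0 poll pulls the result above the unweighted mean of 46.0.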

As a first step, we plot the ratio of Hillary Clinton's weighted poll results to Donald Trump's. This gives a relatively straightforward comparison between the two candidates.
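For readers who want to reproduce this plot, here is a minimal sketch of the ratio computation using only Python's standard library. It assumes a CSV with hypothetical columns `startdate`, `weighted_clinton`, and `weighted_trump` and ISO-formatted dates. The actual dataset's headers and date format may differ. The sorted (date, ratio) pairs can then be handed to any plotting library as a scatter plot.

```python
import csv
import io
from datetime import date

# Hypothetical sample in the assumed shape; the real file's columns may differ.
SAMPLE_CSV = """startdate,weighted_clinton,weighted_trump
2016-03-01,48.0,40.0
2016-01-15,45.0,43.0
2016-06-10,46.5,41.0
"""

def clinton_trump_ratios(csv_text):
    """Return (start_date, clinton / trump) pairs sorted by poll start date."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        clinton = float(row["weighted_clinton"])
        trump = float(row["weighted_trump"])
        if trump == 0:
            continue  # the ratio is undefined, so skip this poll
        points.append((date.fromisoformat(row["startdate"]), clinton / trump))
    points.sort(key=lambda p: p[0])  # plot in order of when each poll began
    return points

for start, ratio in clinton_trump_ratios(SAMPLE_CSV):
    print(start, round(ratio, 3))
```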

Ratio of Clinton's to Trump's weighted poll results before the 2016 election, with each poll plotted at its start date.

However, the plot immediately shows some significant outliers. After removing them, we replot the graph:

Ratio of Clinton's to Trump's weighted poll results before the 2016 election, after removing outliers.
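The post does not say how the outliers were identified. One common choice, shown here purely as an assumption, is Tukey's fences: drop any ratio more than 1.5 interquartile ranges below the first quartile or above the third. The sample points are made up for illustration.

```python
import statistics

def remove_outliers_iqr(points, k=1.5):
    """Keep (date, ratio) pairs whose ratio lies within Tukey's fences.

    The fences are [Q1 - k*IQR, Q3 + k*IQR], where IQR = Q3 - Q1.
    """
    ratios = [ratio for _, ratio in points]
    q1, _, q3 = statistics.quantiles(ratios, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(start, ratio) for start, ratio in points if low <= ratio <= high]

# Made-up (start date, ratio) pairs; the 3.40 entry plays the role of an outlier.
sample = [
    ("2016-01-10", 1.05), ("2016-02-02", 1.10), ("2016-03-15", 1.12),
    ("2016-04-01", 1.08), ("2016-05-20", 1.15), ("2016-06-03", 0.98),
    ("2016-07-11", 3.40), ("2016-08-09", 1.09),
]
kept = remove_outliers_iqr(sample)
print(len(sample) - len(kept), "outlier(s) removed")
```

Raising `k` makes the filter more lenient. Whatever rule is used, it is worth reporting, since it directly shapes the spread discussed next.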

Although the data are highly scattered, two trends are clearly visible:
First, the ratio of Clinton's to Trump's weighted poll results is generally above 1.0, indicating a belief that Clinton was more likely to win the election.
Second, the spread (range) of ratios starts out very wide, narrows significantly in March, and then widens steadily. This suggests that people became less and less confident about the outcome of the election as time went on.
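To check the second trend numerically rather than by eye, one could group the ratios by the month each poll started and take the range within each month. This is a hypothetical sketch with made-up points, and the function name `monthly_range` is illustrative.

```python
from collections import defaultdict
from datetime import date

def monthly_range(points):
    """Map (year, month) -> max(ratio) - min(ratio) for polls starting that month."""
    by_month = defaultdict(list)
    for start, ratio in points:
        by_month[(start.year, start.month)].append(ratio)
    return {month: max(rs) - min(rs) for month, rs in sorted(by_month.items())}

# Made-up (start date, ratio) pairs for two months
points = [
    (date(2016, 2, 3), 0.95), (date(2016, 2, 20), 1.30),
    (date(2016, 3, 5), 1.08), (date(2016, 3, 28), 1.12),
]
for (year, month), spread in monthly_range(points).items():
    print(f"{year}-{month:02d}: range {spread:.2f}")
```

Because the range depends only on the two most extreme polls, it tends to grow with the number of polls in a month. The standard deviation (`statistics.stdev`) is a less sensitive alternative for confirming the trend.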

Why might this be the case? What factors influence people's responses to polls, and how do analysts construct weights? Many articles have been written about the 2016 election, which was indeed quite unconventional, but we want to see what you can come up with!

Here are a couple of questions to get you started:

  • How did polls change over time in response to specific events in the news?
  • Can we find a correlation between polls and the actual election results (perhaps by looking at specific states), and if so, how well do polls predict election results?

To submit, write a Medium post and publish it to this publication. At the end of the 2 hours, the best write-up will win a Google Home Mini! Best of luck!

CMU Data Science Club

A place for our club members to publish their findings from their data exploration, our data challenges, and also our event wrap-ups!
