Bias, Benchmarks, and Over-Simplification

Why Election Polling and Tracking Was An Example Of Poor Analytics

Published in Creative Analytics · Nov 9, 2016

Disclaimer — if you are looking for an article bashing media bias, racism, sexism, or rigged elections — don’t look here.

If you are an analyst and last night’s results surprised you, you just weren’t paying attention. Everything you needed to understand last night’s possibilities had been widely documented and available for weeks. Unfortunately, it required some data mining skill and a willingness to cast aside your cognitive bias. This is true no matter which candidate you support. And make no mistake, even in the unlikely event that recounts hand this election back to the Democrats, last night was still a surprise for too many.

The problem in a nutshell was a simple one: determine who was going to go to the voting booths and for whom they would vote. That is simple, not easy. But this two-step model is simple. Go to the polls. Make a choice.

Polling

Polling is a poor form of sampling. And the shared semantics of polling and voting at the polls reinforce the ignorance. Polling engages a representative sample of the population in a behavior that in no way reflects step #1 (go to the polls), but does at least look similar to step #2.

A better way to poll would be to open up selective voting booths around the country and ask people to voluntarily show up and tell you their future intentions. Only this would double down on the flaw in step #2. Voting is anonymous — polling is not. The very physical and time consuming nature of that process would also double down on another flaw. It would be even harder to find truly representative locations/populations.
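To make the sampling flaw concrete, here is a minimal sketch, with an entirely invented population and invented response rates, of how differential non-response alone can skew a poll’s topline even when the underlying population is evenly split:

```python
import random

random.seed(42)

# Hypothetical illustration: a population split 50/50 between two candidates,
# but supporters of candidate B answer the phone half as often as supporters
# of candidate A. The raw poll topline then systematically overstates A.
population = ["A"] * 50_000 + ["B"] * 50_000
response_rate = {"A": 0.10, "B": 0.05}  # assumed differential non-response

responders = [v for v in population if random.random() < response_rate[v]]
share_a = responders.count("A") / len(responders)
print(f"True share of A: 0.50, polled share of A: {share_a:.2f}")
```

Note that a bigger sample does not fix this. The skew is structural, not statistical, which is exactly why engaging people in a behavior unlike actual voting is so dangerous.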

Bias

The best way to approach the issue starts with a lack of bias — that is not easy either. Perhaps I had an advantage here — I am a Libertarian. My more subjective inclinations had me believing Gary Johnson might get 5%. That allowed me to be much more objective in analyzing the data coming from the real race.

Bias had numerous impacts on the election coverage this year. First, bias always influences how we define representative. Whether that comes through channel bias: believing online polls and phone calls reflect physical voting. Through observational bias: believing that subjective observation of things like turnout, enthusiasm, or intention is accurately reflected in the data. Or through confirmation bias: always keying on the evidence that reinforces what we want the outcome to be.

The most common way this bias arrives in analytic models is through weighting. This article is a wonderful explanation — if you read it analytically.

Here are the Cliff’s Notes. The LA Times poll was an outlier this election cycle. It showed Trump leading Clinton for most of 2016. In this article, an expert pulls the model apart, reweights it, and voila: Clinton will win. Thankfully for us, the article explains precisely what was biased with all the other polls. The author should have titled it: How I applied confirmation bias to the one poll that was providing us useful insight and broke that, too.
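To illustrate how much the weights themselves drive the topline, here is a hedged sketch. The groups, counts, and support shares are invented for illustration and do not reflect the actual LA Times model:

```python
# The same raw poll responses yield different toplines depending on the
# demographic weights the analyst chooses. All numbers are hypothetical.
responses = {
    # group: (respondents, share supporting candidate T)
    "college":    (600, 0.40),
    "no_college": (400, 0.55),
}

def topline(weights):
    """Weighted support share for T under assumed population weights."""
    return sum(weights[g] * share for g, (_, share) in responses.items())

# Analyst 1 assumes the electorate is 55% college / 45% non-college.
# Analyst 2 assumes 40% / 60%. Same data, different answer.
w1 = {"college": 0.55, "no_college": 0.45}
w2 = {"college": 0.40, "no_college": 0.60}
print(round(topline(w1), 3), round(topline(w2), 3))
```

The data never changed; only the analyst’s assumption about what “representative” means did. That is the door confirmation bias walks through.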

Benchmarks

I spent my evening, until I fell asleep on the sofa, bouncing between Google Elections and Fox Business. For all of you hissing at the mention of Fox, this is the network that provides me with Stossel and Kennedy… for me it creates the least cognitive dissonance. If you think it is like Fox News, you really aren’t paying attention.

The Election Intelligence tools available last night were sub-par. While I can’t speak to other networks, Fox Business was doing something especially useful for actually predicting where this election was headed. They were benchmarking… with extreme inefficiency, but kudos for trying.

An analyst, whose name I do not recall, stood in front of the network’s giant election visualization screen and flipped back and forth between 2012 and 2016 returns in what he deemed key counties. In this fashion, he benchmarked results in Florida, Ohio, and Michigan. It became clear quite early that Hillary was under-performing Obama’s results.

While this image is older, it provides a pretty good example at the state level of the benchmarks this analyst was providing. Only no one gave him split-screen or side-by-side functionality.
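The benchmarking he was doing by eye takes only a few lines to do systematically. The county names and vote shares below are hypothetical, purely to show the mechanic:

```python
# Compare a candidate's 2016 share in key counties against the party's 2012
# result as a benchmark, flagging under- or over-performance as returns
# come in. All figures are invented for illustration.
benchmark_2012 = {"County A": 0.62, "County B": 0.51, "County C": 0.44}
returns_2016 = {"County A": 0.55, "County B": 0.47, "County C": 0.45}

for county, share_2012 in benchmark_2012.items():
    delta = returns_2016[county] - share_2012
    flag = "under" if delta < 0 else "over"
    print(f"{county}: {delta:+.2%} vs 2012 ({flag}-performing)")
```

Run against live returns county by county, this is exactly the signal that was visible early on election night: consistent under-performance against the 2012 benchmark.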

Over-Simplification

We already spoke of how polling over-simplified the two-step process of voting. In this section, let’s address the insights that came from demographic data, early voting, and exit polls. They were all vastly over-simplified! Stop with the stereotyping already! Sorry, Libertarian…

We are all individuals. Even the guy in the back of the crowd yelling “I’m not”. Narratives built on the increase in Latino voters, African-American turnout, the War on Women, or the fact that more Democrats than Republicans voted early are analytic nonsense. News flash: people make choices based on their individual circumstances. In an election determined by a vote swing of 1–2%, mass stereotyping all Latinos as Hillary supporters is going to elevate your error and obscure any insights.

I do not dispute that certain groups favored one candidate over another, but when the margins are this thin, these ironclad segmentation schemes fail. They run too shallow. Wouldn’t it have been great if someone had segmented likely voters by employment condition? Of course, having a benchmark for new segments is always a challenge, but we need to start sometime.
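Here is a sketch of what that finer segmentation might look like. The voter rows and the “employment condition” field are entirely invented; the point is the crossing of dimensions, not the numbers:

```python
from collections import defaultdict

# Hypothetical voter records: cross the broad demographic label with an
# employment condition instead of stopping at the demographic alone.
voters = [
    # (demographic, employment condition, candidate choice)
    ("latino", "underemployed", "T"),
    ("latino", "salaried", "C"),
    ("latino", "underemployed", "T"),
    ("white", "salaried", "C"),
    ("white", "underemployed", "T"),
    ("white", "salaried", "T"),
]

cells = defaultdict(lambda: {"T": 0, "C": 0})
for demo, employment, choice in voters:
    cells[(demo, employment)][choice] += 1

for cell, counts in sorted(cells.items()):
    total = counts["T"] + counts["C"]
    print(cell, f"T share: {counts['T'] / total:.0%}")
```

Even in this toy example, the employment dimension splits each demographic into cells that behave very differently, which is precisely the insight a single stereotype-level segment destroys.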

Learn From This

Whether last night’s results left you in tears, buying a bus ticket to Canada, or doing two claps and a Ric Flair, learn from the mistakes! You have to fight cognitive bias. Don’t give much credence to over-simplified models. Use benchmarks. Develop smarter segmentation. Here is a hint: if your segmentation reinforces a stereotype, you don’t want it. Your cognitive bias will weight all that stereotype nonsense without you reinforcing it.

And so if you allow me a little humor — it is time to unite these lessons. And Make Analytics Great Again! Thanks for reading.


Creative Analytics

Decision-First AI is an investment company focused on the future of data. We maintain this medium publication to further analytic debate and discussion.