An Interview with David Shor — A Master of Political Data

In this episode of the Masters of Data podcast, I sit down with David Shor for a discussion about politics, data, and how data analytics drive today's political campaigns. David was in the trenches during one of the seminal elections of the last couple of decades, working on the 2012 Obama campaign in the legendary "cave" doing political data science. David is now Head of Political Data Science at Civis Analytics, a firm founded in the afterglow of Obama's win with the backing of Eric Schmidt, the former CEO of Google, by campaign alums building on the polling and data science successes of that campaign. We have a fascinating conversation about his entrance into the political data science arena, the events that led up to his tremendous success in that pivotal election, and the challenges the 2016 election posed from a data science perspective. During our conversation, David also gives an inside look at how the formulas and processes behind polling and political surveys are designed, the ways in which they can be incredibly accurate, and the cultural shifts taking place among voters that can inadvertently undermine their effectiveness.

As we kick off our conversation, David shares a little background on how he got the opportunity to step into the political campaign arena in the first place. After Obama's first primary-season win in early 2008, David (then an undergraduate) put together a toy data set, ran a simple linear regression, and tried to predict how Obama would do as a function of racial demographics and how Democratic an area was. He found the process surprisingly easy and predictable, which intrigued him. It was also an interesting time: there were only a handful of key forecasting hobbyists, all on the internet, and David became familiar with those key players. After a period overseas, David eventually came back stateside looking to get involved in something that was not pure math. He was introduced to Sam Wang, a Princeton professor actively working in election forecasting, who invited David to come look at his work. The rest was history. David helped build the House forecasting model for the 2010 midterms, which opened the door to additional election forecasting work before he ultimately applied to join the Obama campaign.

A highlight of his experience on the campaign was his development of a forecasting system called the Golden Report, which David shares more about. Up to that point, the campaign had been running polls but doing little to aggregate them, which meant the data wasn't really being turned into decisions. The Golden Report was a model that took in all of the information the campaign had available and synthesized it to estimate the probability of winning the election and the probability of winning each state, built on top of the largest polling program any political campaign had run up to that point. The result was a forecast that was off by less than a tenth of a percentage point from the overall national result. This pivotal success led to David's introduction to Eric Schmidt, then executive chairman of Google, who went on to back Civis Analytics, where David now works and focuses his efforts.
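David describes the Golden Report only at a high level here, but the core idea of aggregating many polls into win probabilities is easy to illustrate. Below is a minimal Monte Carlo sketch in Python, not the campaign's actual model: a few hypothetical battleground polls are combined with a shared national error term so that state errors move together, and simulated outcomes yield a probability of winning each state and the election. Every state, margin, and sample size in it is invented for illustration.

    # A minimal poll-aggregation sketch (illustrative only, not the actual
    # Golden Report). Hypothetical state polls are combined with a shared
    # national error term, then election outcomes are simulated.
    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical inputs: two-party polling margin (D minus R), effective
    # sample size, and electoral votes. All numbers are invented.
    states = {
        "OH": (0.02, 2400, 18),
        "FL": (0.01, 3100, 29),
        "VA": (0.03, 1800, 13),
        "CO": (0.04, 1500, 9),
    }
    safe_dem_ev, safe_rep_ev = 237, 232  # EVs assumed not in play (totals 538)

    n_sims = 100_000
    dem_wins = 0
    state_wins = {s: 0 for s in states}

    for _ in range(n_sims):
        # Shared national error: the component of polling error that is
        # correlated across every state.
        national_error = rng.normal(0.0, 0.02)
        ev = safe_dem_ev
        for state, (margin, n, votes) in states.items():
            # Sampling error on a margin is roughly 1/sqrt(n) near 50/50.
            simulated = margin + national_error + rng.normal(0.0, 1.0 / np.sqrt(n))
            if simulated > 0:
                ev += votes
                state_wins[state] += 1
        if ev >= 270:
            dem_wins += 1

    print(f"P(win election) = {dem_wins / n_sims:.3f}")
    for state, wins in state_wins.items():
        print(f"P(win {state})  = {wins / n_sims:.3f}")

The shared national_error term is what makes the state probabilities move together; dropping it would understate the chance that the polls are all wrong in the same direction, which is exactly the failure mode David describes for 2016.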

And while the 2012 forecasts were undeniably accurate and instrumental in projecting Obama's victory, the forecasting in the 2016 election was a different story, something we also discuss at length. While many blame the inaccuracy of the polls, David gives an insider's look at what his team experienced. As he shares, "The thing for us is that it wasn't just that the polls were wrong, it's that the polls were wrong in a way that led our campaigns to make bad decisions. So as a result of the polling, the Clinton campaign underinvested in states like Wisconsin or Michigan." So not only was the prediction wrong, but, as David explains, "All of the advice was either muddled or pushed Democratic campaigns to do basically the opposite of what they should have done." And if the results were challenging to review, the reasoning behind them was equally challenging. We discuss how polling is fundamentally built on the assumption that the people who answer your surveys are statistically exchangeable with the people who don't. What 2016 showed is that this assumption no longer held, a problem known as survey non-response bias. As David unpacks the idea, it simply means that "the people who were picking up the phone just weren't the same as the people who weren't." That shift in participation couldn't be predicted in advance, but once the results were processed it was clear there had been a transition in mentality between the 2012 and 2016 elections. David explains that 2012 was a referendum on whether people should have universal health care, while 2016 was a referendum on whether you trust society and the people around you. The results of the election were a telling sign that many people no longer trusted others with their information and were unwilling to share their views publicly, something that was not adequately accounted for in the polling.
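To make the exchangeability point concrete, here is a toy simulation, built on assumptions of my own choosing rather than anything from the episode: a latent "social trust" trait raises both the chance that someone answers a survey and the chance that they support one candidate. Because a trait like that isn't on any voter file, demographic weighting can't remove the resulting bias.

    # A toy simulation of survey non-response bias; all parameters are
    # invented for illustration. A latent "social trust" trait raises both
    # the chance of answering the phone and the chance of backing candidate
    # A, so respondents stop being exchangeable with non-respondents.
    import numpy as np

    rng = np.random.default_rng(0)
    N = 1_000_000

    trust = rng.normal(0.0, 1.0, N)              # latent social-trust trait
    # True support: higher-trust people lean slightly toward candidate A.
    supports_a = rng.random(N) < 1 / (1 + np.exp(-0.5 * trust))
    # Response propensity also rises with trust (the 2016 failure mode).
    responded = rng.random(N) < 1 / (1 + np.exp(-(-3.0 + 0.8 * trust)))

    true_share = supports_a.mean()
    polled_share = supports_a[responded].mean()
    print(f"true support for A:   {true_share:.3f}")
    print(f"polled support for A: {polled_share:.3f}")
    print(f"non-response bias:    {polled_share - true_share:+.3f}")

Running this shows the polled estimate overstating support by several points despite an enormous sample, which is the signature of non-response bias: more interviews don't help when respondents differ systematically from non-respondents.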

As the conversation concludes, David and I review strategies being used to increase polling response rates as a guard against a repeat of the same problem. Another topic of discussion is ethics in data collection and how it informs polling and surveying. No doubt, recent events have brought to light questions and concerns about how personal data is acquired, processed, and distributed. As someone in the data analytics realm, David shares his perspective on these strategies: "Most of the data that we work with is a matter of public record. In general, we've always taken the position that it's not worth it to push to try to be creepy, because at the end of the day…making your targeting 1% better isn't worth the news story." While many in the industry do not operate with this mentality, the mantra of Civis Analytics is clear: you don't need to be unethical or "creepy" to acquire the data needed to answer strategic questions about voting, a standard David and his team follow to protect and promote good data practices in all arenas.

Outbound Links & Resources Mentioned

Masters of Data episode:

https://www.sumologic.com/masters-of-data/political-data-science-david-shor/

Connect with David on LinkedIn:

https://www.linkedin.com/in/david-shor-96548620/

Follow David on Twitter @davidshor

Read more about The Golden Report:

https://www.bloomberg.com/news/articles/2013-06-06/obama-campaign-says-it-was-42-percent-more-accurate-than-nate-silver

Learn more about Civis Analytics:

https://www.civisanalytics.com/

Read from Civis Analytics on Medium:

https://medium.com/civis-analytics

Follow Civis Analytics on Twitter @CivisAnalytics

Connect with Civis Analytics on LinkedIn:

https://www.linkedin.com/company/civis-analytics/

Follow Civis Analytics on Facebook:

https://www.facebook.com/CivisAnalytics

Key Takeaways

  • Prior to 2012, political campaigns ran polls but did little to aggregate them or to turn the data into decisions.
  • The Golden Report was a multilevel Bayesian model that took in all of the information available (public polls, the campaign's private polling program, and the voter IDs gathered on the ground) and synthesized it to estimate the probability of winning the election and the probability of winning each state, while modeling the covariance between states.
  • The 2012 election was powered by the largest polling program any political campaign had ever run up to that point.
  • The Obama campaign's targeting was built on support scores, turnout scores, and persuasion scores.
  • In a campaign, you have a list of voters you have to turn out, a list of voters you have to persuade, and a list of voters you have to raise money from.
  • One of the most important things you can do as a data scientist at a political campaign is to rank-order those lists more efficiently (see the sketch after this list).
  • 2012 was the first campaign to build a persuasion score.
  • The 2012 polling showed that most people aren't that persuadable.
  • Reviewing the 2016 polls revealed that they weren't just wrong; they were wrong in a way that led campaigns to make bad decisions.
  • The polls were wrong because the people who take surveys differ systematically from those who don't, in ways that are hard to correct for.
  • Polling is fundamentally built on the assumption that the people who answer your surveys are statistically exchangeable with the people who don't.
  • The 2016 election showed that this assumption no longer held, an issue of non-response bias.
  • The non-response bias issue showed that the people who were picking up the phone just weren’t the same as the people who weren’t.
  • The 2012 election was a referendum on whether or not we should have universal healthcare.
  • The 2016 election was a referendum on how much voters trust the system, how much they trust society, and how much they think their neighbors can be trusted.
  • With data collection, the key is to collect as much data as possible and use machine learning to get a sense of which data is correlated both with the outcome you care about and with survey response.
  • Most of the data Civis Analytics works with is a matter of public record: whether or not you voted in the last election is public record, as is, in most states, whether you're a registered Democrat, which is a voluntary disclosure.
  • Civis has taken the position that it isn't worth pushing ethical boundaries in data acquisition, because at the end of the day, making the targeting 1% better isn't worth becoming the news story, and it's a distraction from the things in campaigns that are really important.
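As a footnote to the takeaway on rank-ordering lists, here is a minimal sketch of what that can look like in practice. The voters, the scores, and the simple turnout-times-persuasion impact formula are all hypothetical illustrations, not Civis's actual methodology; the point is only that combining modeled scores lets a campaign sort its contact list by expected votes gained per contact.

    # A minimal sketch of rank-ordering a contact list; voters, scores, and
    # the impact formula are hypothetical, not Civis's actual methodology.
    from dataclasses import dataclass

    @dataclass
    class Voter:
        name: str
        turnout_score: float     # modeled probability of voting
        persuasion_score: float  # modeled support shift from one contact

    def expected_impact(v: Voter) -> float:
        # A persuadable voter only yields a vote if they actually turn out.
        return v.turnout_score * v.persuasion_score

    voters = [
        Voter("A", turnout_score=0.90, persuasion_score=0.01),
        Voter("B", turnout_score=0.50, persuasion_score=0.08),
        Voter("C", turnout_score=0.75, persuasion_score=0.04),
    ]

    # Contact voters in order of expected votes gained per contact.
    for v in sorted(voters, key=expected_impact, reverse=True):
        print(f"{v.name}: expected impact {expected_impact(v):.3f}")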