Looking into the Data For Progress Survey Data and Exploring Democratic Party Divides over Candidates

Justin Herman
May 13, 2019

Abstract

Recently Data for Progress published a report here with an interesting new take on polling the primaries. Their intention is to provide some rank-based voting statistics alongside a “Not Considering” and a “Considering Candidate” statistic. They asked voters over forty questions dealing with voter demographics and voter choice. Specifically, they asked voters to check all the candidates they would consider voting for, to rank those candidates, and lastly to indicate which candidates they would not consider voting for. The headline statistics from the survey are that Biden has a 49% support level, Warren has a 40% support level, and Bernie has the highest “would not consider” percentage at 28%.

This is an interesting way to look at some of the lesser-known candidates. The “would not consider” rates stand out for Tulsi (24%), Bill de Blasio (25%), and several lesser-known candidates (Moulton, Bennet, Williamson, and Gravel are all over 20%). While these candidates are at less than 5% of overall consideration, the sentiment against them is rather high. This leads to an interesting question: does lack of name recognition influence negative sentiment towards a candidate?

This write-up attempts to validate the survey data that was published. I had difficulty filtering the data to produce the same results as the survey summary. The survey data indicates that 465 respondents were polled on their preferences for the Democratic Party. When I filter for these respondents (done below), my statistics don’t line up with the statistics presented in the report here. Perhaps I made a mistake. The survey still provides some interesting information, and below we will investigate some of the results.

Survey Demographic Data

Worries About Representativeness of Survey Data

  • According to several links including one here, 58% of the electorate voted in 2016

As seen below, only 11.7% of our survey respondents did not vote in 2016, which means 88.3% did:

Breakdown of Voting Percentage in 2016 Election of Survey Respondents

It’s not just the overall turnout that is off here, but turnout by age as well. According to Census data, voter turnout in 2016 by age group was:
65+ = 70.9%
45–64 = 66.6%
30–44 = 58.7%
18–29 = 46.1%

The Survey data vote breakdown by age was as follows:
65+ = 96.08%
50–64 = 91.18%
40–49 = 86.9%
30–39 = 84.43%
18–29 = 69.7%

The survey data is clearly skewed toward people who vote. The survey polls almost exclusively people who voted in 2016, which can add significant bias when the sample is used to represent the actual population of voters.
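To make the size of the gap concrete, here is a minimal Python sketch that uses only the turnout figures quoted above. The two sources use different age brackets, so it compares only the brackets that match exactly (65+ and 18–29), plus the overall figure. The function and variable names are my own for illustration; this is not the R code behind the article.

```python
# Turnout rates (percent) as quoted in this section.
census_turnout = {"65+": 70.9, "45-64": 66.6, "30-44": 58.7, "18-29": 46.1}
survey_turnout = {"65+": 96.08, "50-64": 91.18, "40-49": 86.9,
                  "30-39": 84.43, "18-29": 69.7}


def turnout_gaps(survey, census):
    """Survey-minus-Census turnout, in percentage points, for shared brackets."""
    shared = [age for age in survey if age in census]
    return {age: round(survey[age] - census[age], 2) for age in shared}


gaps = turnout_gaps(survey_turnout, census_turnout)
for age, gap in gaps.items():
    print(f"{age}: survey turnout is {gap} points above the Census figure")

# Overall: 11.7% of respondents did not vote, so 88.3% did, versus 58% nationally.
overall_gap = round((100 - 11.7) - 58, 1)
print(f"Overall: {overall_gap} points above national turnout")
```

The middle brackets (30–39, 40–49, and 50–64 in the survey versus 30–44 and 45–64 in the Census data) would need to be re-binned from the raw responses before they could be compared directly.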

Frequencies of Age demographics for the survey data

Age Breakdown by Survey Respondents
  • Add to this the fact that our survey data polls only 1/3 as many voters aged 18–29 as voters aged 65+, and there may be some bias toward older voters in the data that goes beyond typical demographics

Overall, our survey respondents are much more politically involved than a normal sample, and older voters are over-represented.

Data Sanity Check

To check my imputation methods, I lined up the statistics that Data for Progress published with my own imputations. It was easy enough to reproduce their sample by dropping all rows for respondents who weren’t polled on candidate preferences. Still, I don’t get the same results displayed in the Data for Progress write-up. To double-check, I created a pivot table in Excel directly on the survey data, and I got the same results as those printed below from my analysis in R. Mistakes in data crunching are common, though, so perhaps I am the one who made the mistake. Let us proceed while setting these issues aside.
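The filtering and tallying step described above can be sketched roughly as follows. This is a Python illustration, not the R code used for the article. The record layout (`asked_primary`, `consider`, `not_consider`) and the toy rows are hypothetical stand-ins for the real survey file. The tally is also unweighted, so it would not reproduce published figures that rely on survey weights.

```python
# Hypothetical layout: one dict per respondent. Field names and rows are
# illustrative assumptions, not the actual Data for Progress columns.
respondents = [
    {"asked_primary": True, "consider": {"Biden", "Warren"}, "not_consider": {"Bernie"}},
    {"asked_primary": True, "consider": {"Bernie", "Warren"}, "not_consider": {"Biden"}},
    {"asked_primary": True, "consider": {"Biden"}, "not_consider": set()},
    {"asked_primary": True, "consider": {"Warren", "Kamala"}, "not_consider": {"Bernie"}},
    {"asked_primary": False, "consider": set(), "not_consider": set()},
]


def support_levels(rows, candidates):
    """Percent considering / not considering each candidate among polled rows."""
    polled = [r for r in rows if r["asked_primary"]]  # drop rows never asked
    if not polled:
        raise ValueError("no respondents were asked the primary questions")
    n = len(polled)
    table = {}
    for name in candidates:
        consider = sum(name in r["consider"] for r in polled)
        reject = sum(name in r["not_consider"] for r in polled)
        table[name] = {
            "consider_pct": round(100 * consider / n, 1),
            "not_consider_pct": round(100 * reject / n, 1),
        }
    return table


levels = support_levels(respondents, ["Biden", "Bernie", "Warren"])
for name, row in levels.items():
    print(name, row)
```

Comparing a table like this against the published one, candidate by candidate, is a quick way to see whether a mismatch is uniform (which points at the filter) or specific to a few candidates (which points at how individual columns were coded).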

Support for Candidates

Support and Would Not Support Levels Based on My Data Analysis
Data For Progress Published Breakdown

Opposition to Other Candidates

  • Below, some tables are built for some of the front-runners (Biden, Bernie, Kamala, Booker, Warren, Beto, Klobuchar, Buttigieg)
  • Anti-Biden, anti-Bernie, anti-Kamala, and anti-Warren tables are built. Each row represents voters who said they would consider a specific candidate and also indicated they wouldn’t consider the table’s target candidate
  • Voters selected here merely indicated they would consider voting for the candidate, not that they would only vote for that one candidate or even rank that candidate as their first choice
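The conditional tables described in the bullets above can be sketched as below. This is a Python illustration with made-up rows and hypothetical field names (`consider`, `not_consider`). It shows the shape of the calculation only, not the author’s R code or the real survey numbers.

```python
# Toy respondents: which candidates each would consider / would not consider.
respondents = [
    {"consider": {"Bernie", "Warren"}, "not_consider": {"Biden"}},
    {"consider": {"Bernie"}, "not_consider": {"Biden", "Kamala"}},
    {"consider": {"Warren", "Kamala"}, "not_consider": set()},
    {"consider": {"Kamala", "Biden"}, "not_consider": {"Bernie"}},
    {"consider": {"Bernie", "Kamala"}, "not_consider": set()},
]


def cross_opposition(rows, target, candidates):
    """Among respondents who would consider each candidate, return the
    percent who also said they would not consider `target`."""
    result = {}
    for name in candidates:
        if name == target:
            continue  # a candidate's own row is not meaningful here
        supporters = [r for r in rows if name in r["consider"]]
        if not supporters:
            result[name] = None  # nobody considered this candidate
            continue
        opposed = sum(target in r["not_consider"] for r in supporters)
        result[name] = round(100 * opposed / len(supporters), 1)
    return result


anti_biden = cross_opposition(respondents, "Biden", ["Bernie", "Warren", "Kamala"])
print(anti_biden)
```

Because a respondent can consider several candidates at once, the same person can be counted in more than one row, so the rows of such a table do not sum to anything meaningful.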

Anti-Biden vote

Rows Represent Candidate and Column Represent Percent Vote For Trump Against Biden

Anti-Bernie vote

Rows Represent Candidate and Column Represent Percent Vote For Trump Against Bernie

Anti-Kamala vote

Rows Represent Candidate and Column Represent Percent Vote For Trump Against Harris

Anti-Warren vote

Rows Represent Candidate and Column Represent Percent Vote For Trump Against Warren
The tables are built on voters who chose Warren, Biden, or Bernie as their number 1 ranked choice. The results are then filtered for the percentage of those voters who wouldn’t consider the candidates in the rows. So in the table on the left, 5% of Warren voters, 9% of Bernie voters, and oddly enough 1 Biden voter (a polling mistake) would not support Biden.

Average Refuse to Vote for Any Other Candidate By First Ranked Choice

This table shows the average rate at which Bernie (6.6%), Biden (11.5%), and Warren (4.3%) voters said they refuse to vote for other Democratic candidates.

Rows Represent Candidate Choice and Column is Average Refuse to Vote Responses by Candidate Choice
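One plausible way to compute an average like this is sketched below in Python. It assumes the average is taken over the other candidates: for each other candidate, take the share of the first-choice group that would not consider them, then average those shares. That definition, the field names (`first_choice`, `not_consider`), and the toy rows are all assumptions made for illustration. The article’s own R code may define the average differently.

```python
from statistics import mean

CANDIDATES = ["Biden", "Bernie", "Warren", "Kamala"]

# Toy respondents with a first-ranked choice and a would-not-consider set.
respondents = [
    {"first_choice": "Biden", "not_consider": {"Bernie", "Warren"}},
    {"first_choice": "Biden", "not_consider": {"Bernie"}},
    {"first_choice": "Bernie", "not_consider": {"Biden"}},
    {"first_choice": "Bernie", "not_consider": set()},
    {"first_choice": "Warren", "not_consider": set()},
]


def average_refusal(rows, first_choice, candidates):
    """Mean would-not-consider rate (percent) across all other candidates,
    among respondents who ranked `first_choice` first."""
    group = [r for r in rows if r["first_choice"] == first_choice]
    if not group:
        return None
    others = [c for c in candidates if c != first_choice]
    rates = [
        100 * sum(c in r["not_consider"] for r in group) / len(group)
        for c in others
    ]
    return round(mean(rates), 1)


averages = {name: average_refusal(respondents, name, CANDIDATES)
            for name in ["Biden", "Bernie", "Warren"]}
print(averages)
```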

Interesting Results

If you look back at my last report on Emerson data, you will see that the Emerson data seemed to support the idea that Bernie voters were “anti-Warren”. Here we can see that Biden voters are nearly twice as likely as Bernie voters, and nearly three times as likely as Warren voters, to not support other candidates. In this survey, 8% of Warren voters would not consider Bernie and 10% of Bernie voters would not consider Warren. This is strangely high, given that it is above each group’s average would-not-consider rate (Bernie 6.6% and Warren 4.3%). Considering how close these two are in policy positions, and while I hate the idea of the left eating itself alive, it does appear that this trend continues to show in this survey data.
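The comparisons in this paragraph come down to a few ratios. The short Python sketch below computes them from the percentages quoted in this article. The variable names are mine, and nothing here touches the raw survey data.

```python
# Average would-not-consider rates (percent) by first-ranked choice, as quoted above.
avg_refusal = {"Biden": 11.5, "Bernie": 6.6, "Warren": 4.3}

# How much more often Biden voters rule out other candidates.
biden_ratio = {
    name: round(avg_refusal["Biden"] / rate, 2)
    for name, rate in avg_refusal.items()
    if name != "Biden"
}

# Cross rates quoted above: Warren voters ruling out Bernie, and the reverse.
warren_vs_bernie = round(8 / avg_refusal["Warren"], 2)   # relative to Warren voters' average
bernie_vs_warren = round(10 / avg_refusal["Bernie"], 2)  # relative to Bernie voters' average

print(biden_ratio)
print(warren_vs_bernie, bernie_vs_warren)
```

A ratio above 1 in the last two values means the group rules out that specific candidate more often than it rules out other candidates on average.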

Conclusion

Overall, this is an interesting way of conducting polling. However, I do fear that the sample in this survey is out of line with the general population. There is plenty more to be explored in this survey. I hope to look into some more demographic data and to find trends in the rank-based choices in further analysis.

To see the code for the above article, please visit my github repository

To see my other projects, please visit DataScience projects. If you would like to contact me, please reach out to me on LinkedIn.
