Predicting how you’ll vote: a follow-up
I recently wrote about how Brigade is able to help people decide how they should vote on Election Day. In that post, I talked about our social ballot guide, which contains thousands of state and federal races as well as every statewide ballot measure in the country. Our tool compares your views on a range of issues with those of users who've pledged to vote for particular politicians or ballot measures in order to arrive at a voting recommendation for you.
We calculate alignment by asking users to agree or disagree with short opinion statements we call positions, and then comparing their responses to those positions. Alignment is our main indicator of whether two people may be like-minded, and it has a major presence throughout the product.
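As a minimal sketch (not our exact formula), pairwise alignment can be thought of as the share of shared positions where two users answered the same way. The data shapes below are assumptions made for illustration:

```python
# Illustrative sketch only: not the production alignment formula.
# Each user's responses map a position ID to True (agree) or False (disagree).

def alignment(user_a, user_b):
    """Return the fraction of shared positions where both users answered the same way."""
    shared = user_a.keys() & user_b.keys()
    if not shared:
        return None  # no overlapping positions, so no basis for comparison
    matches = sum(user_a[p] == user_b[p] for p in shared)
    return matches / len(shared)

# Example: two users who answered three of the same positions.
voter = {1: True, 2: False, 3: True, 7: False}
pledger = {1: True, 2: True, 3: True}
print(alignment(voter, pledger))  # 0.666..., since 2 of 3 shared answers match
```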
Since that blog post was published, more than a quarter million people have pledged their vote for a presidential candidate. We've added more races and ballot measures, and dramatically improved the ability to share your pledges and try to sway others to vote with you. We've also been exploring whether incorporating more information could improve how we calculate alignment.
What we haven't changed is our calculation for alignment. In the screenshot above, the personalized recommendations for Hillary Clinton and Jill Stein are very close to each other, and the issues with the greatest alignment are the same. We'd identified this as a weakness: the recommendation gets harder to make as candidates become more similar to each other.
Results of experimentation
Over the course of a few weeks, our data team experimented with everything from logistic regression and random forests to deep learning. We focused largely on the presidential race, where we have the most data.
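The setup can be sketched roughly as follows; the synthetic response matrix, labels, and feature encoding are assumptions made for illustration, not our production pipeline:

```python
# Rough sketch of the model comparison, using scikit-learn on synthetic data.
# Rows are users, columns are positions: 1 = agree, -1 = disagree, 0 = unanswered.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.choice([-1, 0, 1], size=(5000, 40))            # synthetic response matrix
y = rng.choice(["clinton", "trump", "stein", "johnson"],
               size=5000, p=[0.45, 0.40, 0.08, 0.07])   # imbalanced pledge labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))  # overall accuracy, as a first look
```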
Most algorithms behaved similarly; random forest and deep learning in particular were very close. In general, for Hillary Clinton and Donald Trump we saw precision of ~0.85 (of the predictions we made for a candidate, the share that were for the right one) and recall of ~0.6 (of the users who had actually pledged for a candidate, the share we predicted correctly). As a rule of thumb, precision of 0.8 is roughly what one can expect from an unoptimized application of a model in most uses. Performing the same analysis against social alignment, we found precision of 0.6, implying that most machine learning models would be better than our current approach at predicting who someone will vote for.
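For reference, per-candidate precision and recall in this sense can be computed with scikit-learn. The toy labels below are made up for illustration; they are not our data:

```python
# Precision and recall per candidate, in the sense used above:
#   precision = of the users we predicted for a candidate, how many actually pledged that way
#   recall    = of the users who pledged for a candidate, how many we predicted correctly
from sklearn.metrics import precision_recall_fscore_support

y_true = ["clinton", "clinton", "trump", "stein", "trump", "clinton"]    # actual pledges (toy data)
y_pred = ["clinton", "trump",   "trump", "clinton", "trump", "clinton"]  # model predictions

precision, recall, _, support = precision_recall_fscore_support(
    y_true, y_pred, labels=["clinton", "trump", "stein"], zero_division=0
)
for label, p, r, n in zip(["clinton", "trump", "stein"], precision, recall, support):
    print(f"{label:>8}  precision={p:.2f}  recall={r:.2f}  pledges={n}")
# Note that "stein" is never predicted in this toy example, so its recall is 0 --
# the same failure mode discussed below for minority candidates.
```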
Looking at Jill Stein and Gary Johnson, we found that most models tended to optimize them out of the system: any vote pledges for them were mostly ignored, and recall for those candidates dropped sharply as a result. This isn't true for social alignment, where we don't take the number of pledges in each group into account. The more sophisticated models tended to have better recall for all candidates, especially when they took into account factors that tend to separate Stein supporters from Clinton supporters.
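One standard way to keep low-pledge candidates from being ignored is to reweight classes inversely to their frequency. This is a generic illustration of that technique in scikit-learn, not necessarily what our experiments used:

```python
# Class weighting is one common guard against minority labels being ignored.
# With class_weight="balanced", each class contributes equally to the training
# objective, so the model can no longer do well simply by ignoring the smaller groups.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

balanced_lr = LogisticRegression(max_iter=1000, class_weight="balanced")
balanced_rf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
# ...fit and evaluate as before; recall for the minority candidates typically rises,
# usually at the cost of some precision on the majority candidates.
```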
The objective
This leads us to the question we should have been asking from the beginning (and indeed was asked by one or two members of our team): what should we be solving for? It's useful to tell our users who we think they will vote for, which is what we get with our machine learning models. However, we built social alignment to tell users who they should vote for, based on how similar their views are to others who are pledging to vote. "Should" is a nebulous term, and we grappled with the difference for a while.
In the end, we decided that the distinction between the two is significant enough that keeping our recommendation grounded in alignment is important. Once we start applying machine learning to this problem, we start to lose our ability to explain "why" someone should pledge to vote for a candidate. One of the pillars of Brigade is that by acting together as a community, we can have a greater effect than we could alone. By removing the feeling of community from our recommendations, we risk losing a bit of that purpose.
Of course, that doesn't mean our analysis is done. In the previous post, we laid out a number of issues with alignment as it stands today, and those should be corrected. There is a lot we can do: weighing the significance of each agree or disagree action a user takes, identifying which positions unify groups of people, and which positions aren't particularly important to a group. One possible direction is sketched below.
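As a rough sketch of one such direction (the divisiveness weighting below is purely illustrative, not what ships in the product), positions could be weighted by how much they actually split respondents before they contribute to alignment:

```python
# Illustrative only: weight each position by how divisive it is among respondents,
# so near-unanimous positions contribute little to the alignment score.
from collections import defaultdict

def position_weights(responses):
    """Weight = 4 * p * (1 - p), where p is the share of 'agree' responses.

    The weight peaks at 1.0 for a 50/50 split and falls to 0.0 for unanimity.
    """
    agrees, totals = defaultdict(int), defaultdict(int)
    for user in responses:
        for position, answer in user.items():
            totals[position] += 1
            agrees[position] += answer
    return {pos: 4 * (agrees[pos] / totals[pos]) * (1 - agrees[pos] / totals[pos])
            for pos in totals}

def weighted_alignment(a, b, weights):
    """Alignment where each shared position counts in proportion to its weight."""
    shared = a.keys() & b.keys()
    total = sum(weights.get(p, 0.0) for p in shared)
    if total == 0:
        return None
    return sum(weights.get(p, 0.0) for p in shared if a[p] == b[p]) / total

# Example: position 1 is unanimous, so only the divisive position 2 really counts.
responses = [{1: True, 2: True}, {1: True, 2: False}, {1: True, 2: True}, {1: True, 2: False}]
w = position_weights(responses)
print(weighted_alignment({1: True, 2: True}, {1: True, 2: False}, w))  # 0.0
```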
Getting these right would improve alignment across the product, and we're eager to get to it.