Crowdsourced Voting: Modeling User Behavior with Math
The Nonprofit Technology Conference uses crowdsourced voting (plus a panel of experts and NTEN staff) to determine which panels to host at their annual conference. As I was perusing and voting on panels myself this afternoon, I started thinking about the value of a single up vote or down vote, and how we might use math to model this user behavior.
Like that other technology conference I just blogged about, NTC saw a record number of panel submissions this year (about 450 for roughly 100 total spots) and has changed up the voting process a bit from last year. Key points:
- anyone can vote without having to register on the site;
- voting is a simple “up vote” or “down vote”;
- votes are persistent throughout the two-week voting period (in other words, once you’ve voted on a panel, you can change your mind from up to down, but you can’t vote multiple times for the same panel);
- “vote stuffing” is minimized by limiting voting to one vote per panel per IP address (a real bummer for those of us who work in offices where 30 people might share the same IP address);
- panels are somewhat anonymous: submitters could choose to include their panelists’ names in their panel descriptions, but weren’t encouraged to do so.
I propose there are three likely scenarios that motivate people to vote:
- A friend asks them to vote for a panel: they surf over to the NTC website, give an up vote, maybe happen upon a few other panels, but by and large stick to the “vote & run” technique.
- They’re die-hard NTC fans: they mull over the pros and cons of most or all of the panels, and end up voting on 20? 50? 450? of them with a mix of up and down votes.
- They have proposed a panel: they vote up their own, promote it to all their friends, and (maybe) vote down a few of their “competitors.”
If our purpose in crowdsourced voting is to really get the opinion of the average member of the crowd, the “value” we might place on a single vote in each of those scenarios is quite different.
The simplest method, of course, is to just look at the difference between up votes and down votes. In the above picture, I took a sample of the first 200 panel submissions (which were themselves randomly ordered, so I have a random sample of just under half of the population) and plotted up votes versus down votes. The gray box covers exactly half of all panels (with some data points stacked on top of each other). Those are the panels “the crowd” essentially feels lukewarm about. The red box holds the clear “no” votes: lots of down votes, with very few up votes. The green box holds the clear “yes” votes, and the yellow box holds the “controversial” panels, the ones that polarize voters into either loving or hating them.
But I don’t think that tells the real story, or at least, it’s not the best use of our data.
So back to our scenarios. What if we (and by “we” I mean the NTC voting commission) gave less weight to those voters from scenario 1 who only voted once? They would seem to be the people LEAST likely to attend the actual NTC conference; instead, they’re just doing a favor for a friend. (And I say that as someone who proposed three different panels and has been encouraging all of my friends and colleagues to vote.) What if we also gave less weight to voters who are predisposed to vote negatively, as in scenario 3? Review sites with a stable set of critics tend to do this: it corrects for the difference between the critic who typically gives an “average” movie two stars and a film they really love three, and the critic who gives practically anything four stars.
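Both adjustments can be sketched in a few lines. Everything here is an assumption for illustration: the vote log, the 0.5 weight for single-vote “drive-by” voters, and the rule that a voter’s bias is only corrected once they’ve cast at least three votes (so a lone vote isn’t cancelled out entirely).

```python
from collections import defaultdict

# Hypothetical vote log: (voter, panel, vote), with vote = +1 or -1
votes = [
    ("alice", "p1", +1),                                           # scenario 1: vote & run
    ("bob", "p1", +1), ("bob", "p2", -1), ("bob", "p3", +1),       # engaged voter
    ("carol", "p2", -1), ("carol", "p3", -1), ("carol", "p1", +1), # negative-leaning voter
]

# Group each voter's votes so we can measure activity and bias
per_voter = defaultdict(list)
for voter, panel, v in votes:
    per_voter[voter].append(v)

def weight(voter):
    # Assumption: a single-vote voter counts half as much (scenario 1)
    return 0.5 if len(per_voter[voter]) == 1 else 1.0

def bias(voter):
    # A voter's average vote, the analogue of a critic's baseline star
    # rating; only correct for it once we have 3+ votes to estimate it
    vs = per_voter[voter]
    return sum(vs) / len(vs) if len(vs) >= 3 else 0.0

# Adjusted score: down-weight drive-bys, subtract each voter's baseline
scores = defaultdict(float)
for voter, panel, v in votes:
    scores[panel] += weight(voter) * (v - bias(voter))
```

With this toy data, carol’s down votes count for less than bob’s (she downvotes most things anyway), and alice’s lone up vote counts for half, so p1 comes out well ahead of the raw tally alone.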
It’s possible NTC is already planning for this bit of statistical manipulation of voting patterns. And I’m not saying that if they don’t, it was necessarily the wrong call. But in trying to figure out “what the crowd wants,” the simplest math doesn’t always produce the optimal outcome.