Response to an approval voting doubter
This was my response to a recent post on the Effective Altruism board.
> All things considered, I think electoral reform, while probably not a “top tier” intervention, should be part of the longtermist EA portfolio.
My view is that electoral reform, specifically alternative voting methods, is by far the biggest “bang for the buck” utility-increasing reform for humanity. I began studying alternative voting methods in 2006, largely focused on the findings of a Princeton math PhD named Warren Smith, who created RangeVoting.org to publicize his research. What was very surprising and counterintuitive were the Bayesian regret calculations he performed via computer simulation, showing that an upgrade from plurality voting to score voting (aka range voting) nearly doubled the human-welfare increasing impact of democracy. Approval voting is just score voting on a 0–1 scale and it performs almost as well as score voting with a range like 0–5; and of course it has the advantage of using ordinary ballots and not requiring voting machine upgrades. As Smith puts it:
> there are very few causes out there with this much “bang for the buck.” Examine the numbers yourself. I do not believe religious causes can compete. Disaster relief cannot compete (in the long term; for large disasters in the short term, it can). Curing diseases also cannot compete except for the biggest killers. E.g, ending malaria or halving illiteracy each would cause an amount of good comparable to range voting, but would probably be more difficult to accomplish.
More recently, a Harvard stats PhD named Jameson Quinn performed his own computer simulations using slightly different modeling assumptions, and presented the results inverted and normalized as voter satisfaction efficiency (VSE) rather than Bayesian regret. He found results similar to Smith’s.
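For readers who want the relationship between the two measures spelled out, here is how I understand them. The notation is my own, not Smith’s or Quinn’s: U_M is the total utility of the candidate elected by method M, U_best is that of the utility-maximizing candidate, and U_random is that of a candidate picked at random, with expectations taken over simulated elections.

```latex
\mathrm{BR}(M) = \mathbb{E}\left[ U_{\text{best}} - U_{M} \right]
\qquad
\mathrm{VSE}(M) = \frac{\mathbb{E}[U_{M}] - \mathbb{E}[U_{\text{random}}]}
                       {\mathbb{E}[U_{\text{best}}] - \mathbb{E}[U_{\text{random}}]}
               = 1 - \frac{\mathrm{BR}(M)}{\mathrm{BR}(\text{random})}
```

Under these definitions, lower regret is better while higher VSE is better: a VSE of 1 means the method always elects the best candidate, and a VSE of 0 means it does no better than picking at random.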
That last point about the cost (the second component in “bang for the buck”) is crucial. Aside from the one-time political campaigning cost, voting reform is essentially free, unlike, say, mosquito nets or medicines, which have to be manufactured and distributed.
Full disclosure: Smith, Quinn, and I are all former board members of the Center for Election Science, but I am no longer associated with it and am speaking only for myself.
> Approval voting is vulnerable to tactical voting.
All deterministic voting methods are vulnerable to tactical voting. But computer simulation and game theoretical analysis show that approval voting is especially resistant to it. For instance, as Warren Smith and Jameson Quinn showed in their computer simulations, approval voting worked very well across a wide range of assumptions about voter behavior and the prevalence of tactical voting. The margin was so large that in some of Smith’s models, approval voting performed better with 100% strategic voters than instant runoff voting (aka “IRV”, the most prominent ranked method) did with 100% honest voters. There is even a mathematical theorem that, given plausible models of voter strategy, approval voting always elects a Condorcet winner (a candidate who beats every rival by a head-to-head majority) whenever one exists. This is a very mild, some would say beneficial, reaction to strategy.
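To make the Condorcet winner definition concrete, here is a minimal Python sketch. The function name, the ballot format (a dict mapping a complete ranking to the number of voters who hold it), and the example profiles are all hypothetical, chosen only for illustration.

```python
from itertools import combinations


def condorcet_winner(profile):
    """Return the candidate who beats every rival head-to-head, or None.

    profile maps a complete ranking (tuple, best first) to a voter count.
    Assumes every ranking lists every candidate (an illustrative simplification).
    """
    candidates = sorted({c for ranking in profile for c in ranking})
    pairwise_wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        a_over_b = sum(w for r, w in profile.items() if r.index(a) < r.index(b))
        b_over_a = sum(w for r, w in profile.items() if r.index(b) < r.index(a))
        if a_over_b > b_over_a:
            pairwise_wins[a] += 1
        elif b_over_a > a_over_b:
            pairwise_wins[b] += 1
    for c in candidates:
        if pairwise_wins[c] == len(candidates) - 1:
            return c
    return None  # a cycle or a pairwise tie: no Condorcet winner exists


# Hypothetical profile: B is few voters' first choice but beats A and C head-to-head.
centrist = {("A", "B", "C"): 40, ("C", "B", "A"): 35, ("B", "A", "C"): 25}
# Hypothetical cyclic profile: A beats B, B beats C, C beats A.
cycle = {("A", "B", "C"): 1, ("B", "C", "A"): 1, ("C", "A", "B"): 1}

print(condorcet_winner(centrist))  # B
print(condorcet_winner(cycle))     # None
```

The second profile shows why the theorem is qualified with “whenever one exists”: with cyclic preferences there is no candidate who beats every rival.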
It’s also notable that approval voting was portrayed as generally preferable to ranked methods in William Poundstone’s book Gaming the Vote, and is specifically advocated by Steve Brams, an NYU professor of political science and game theory, who wrote such page turners as Mathematics and Democracy.
> It fails the later-no-harm criterion: approving a second candidate can hurt your favourite.
At the outset, I want to suggest eschewing properties as a means for evaluating voting methods, and instead focus on voter satisfaction efficiency as described above. Focusing on properties is kind of analogous to evaluating race cars based on characteristics such as horsepower and aerodynamics. You might intuitively think a car with 10% more horsepower will be better, but once you perform a statistically significant number of timed trials, you may find instead that other factors such as aerodynamics or tire quality (or even properties you never thought to consider) have a countervailing effect that causes the more powerful car to surprisingly lose. Voter satisfaction efficiency (or Bayesian regret) simultaneously measures the combined effects of all those properties, even ones you forgot to consider. And indeed, those figures I cited already account for later-no-harm (LNH); and yet of the five alternative voting methods compared, IRV did the worst, even though it was the only one of those five methods to satisfy LNH.
Having said that, let’s consider this failure of the later-no-harm criterion (LNH) more specifically. I actually contend that failing LNH is a benefit, not a flaw. Here’s a thorough analysis by Warren Smith (again, full warning, Smith’s writing is interspersed with venom for his enemies, but the content itself is high quality).
To summarize, later-no-harm means that given you’ve started by showing support for your favorite, X, it cannot hurt X to indicate support for a lesser liked candidate, Y. E.g. suppose you rank the Green in first place. Now it cannot hurt the Green to rank the Democrat (or Labour) in second place. IRV is the only commonly discussed method that satisfies LNH. By contrast, if you cast a vote for the Green with approval voting, casting a second vote for the Democrat (or Labour) could cause the Green (your actual favorite) to lose. IRV proponents often argue that this creates an incentive for “bullet voting” only for one’s favorite candidate. I admit this does seem problematic on its surface, but the problem goes away once you inspect a bit deeper.
See, all that assumed that you started by ranking your favorite candidate honestly in first place. E.g. you ranked the Green #1, and now you’re considering whether to rank your honest second favorite as #2. But your best strategy with IRV is, similar to FPTP, to rank your favorite frontrunner in first place, not necessarily your favorite overall. We see this often in USA primary elections, where people are afraid to vote for their favorite candidate because they fear she’ll lose the general election. For instance, right now my aunt in Iowa favors Democrat Elizabeth Warren over Joe Biden, but voted for Biden because she feels he’s more likely to win in the next round against Trump. To convert this to its IRV analog, with all three of those candidates running on a single ranked ballot together, she would insincerely/strategically rank Biden in first place, to help ensure Warren (her actual favorite) is eliminated, thus giving the presumably more competitive Biden the chance to square off against Donald Trump in the next round. IRV fails the favorite betrayal criterion.
And contrary to those concerned about bullet voting with approval voting, the best strategy is actually to approve everyone you prefer to the expected utility of the winner. Bullet voting is absolutely not the best general strategy. Case in point, imagine a Green Party supporter who normally casts a strategic vote for the Democrat. With approval voting, she still casts that strategic vote for the Democrat, but then she also casts a sincere vote for her true favorite, the Green, plus anyone else she prefers to the Democrat. This is the underlying game theory that makes approval voting so robust against strategy. There’s even empirical data showing that approval voting often has less bullet voting than comparable elections with IRV.
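The threshold rule above can be sketched in a few lines of Python. The utilities and win probabilities below are hypothetical numbers for the Green supporter in the example, and the function name is my own; this is an illustration of the rule, not anyone’s official model.

```python
def strategic_approval_ballot(utilities, win_probabilities):
    """Approve every candidate whose utility exceeds the expected utility of the winner.

    utilities: the voter's utility for each candidate (hypothetical values).
    win_probabilities: the voter's estimate of each candidate's chance of winning.
    """
    expected = sum(utilities[c] * win_probabilities[c] for c in utilities)
    return {c for c, u in utilities.items() if u > expected}


# A Green supporter who sees the race as a Democrat/Republican toss-up.
utilities = {"Green": 1.0, "Democrat": 0.6, "Republican": 0.0}
win_probabilities = {"Green": 0.02, "Democrat": 0.49, "Republican": 0.49}

ballot = strategic_approval_ballot(utilities, win_probabilities)
print(sorted(ballot))  # ['Democrat', 'Green']
```

With these numbers the expected utility of the winner is 0.314, so the voter approves both the Democrat and the Green. Bullet voting only becomes the best strategy under this rule when the voter’s favorite is already a frontrunner.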
Moving beyond strategy, to the fundamental social choice theory behind LNH, the problem with LNH is that it forces a voting method to ignore important preference data. E.g.
Imagine preferences like these:
C is eliminated first with 31%, and L wins with 51%.
But suppose just 2%, from the CLR faction, swap their LR preference, creating:
C is still eliminated, but now R wins with 51%.
A tiny change in preferences changes the winner from L to R.
But now suppose a comparatively massive change in preferences occurs. The LRC voters find out something terrible about R, causing them to lower R to 3rd place. And the RCL voters find out something super positive about L, causing them to elevate L to 2nd place.
An enormous shift in public opinion just took place, positive for L and harmful for R. But this huge change didn’t change the winner from R to L, even though a tiny change of preferences moved the winner from L to R in the first place. LNH causes this fundamental distortion in sensitivity to preference information.
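To make the three scenarios concrete, here is a minimal Python sketch of an IRV count. The faction sizes are hypothetical, chosen only to match the figures quoted above (C eliminated with 31%, the winner at 51%, a 2% swap within the C-first voters); the function name and ballot format are likewise my own.

```python
def irv_winner(profile):
    """Run instant runoff voting on a profile of ranking -> percent of voters.

    Returns (winner, final_tally). Ties are not handled (illustrative sketch).
    """
    remaining = {c for ranking in profile for c in ranking}
    while True:
        tallies = {c: 0 for c in remaining}
        for ranking, weight in profile.items():
            # Each ballot counts for its highest-ranked remaining candidate.
            for c in ranking:
                if c in remaining:
                    tallies[c] += weight
                    break
        leader = max(tallies, key=tallies.get)
        if tallies[leader] * 2 > sum(tallies.values()):
            return leader, tallies[leader]
        remaining.remove(min(tallies, key=tallies.get))


# Hypothetical starting preferences (percent of voters).
start = {("L", "R", "C"): 35, ("R", "C", "L"): 34,
         ("C", "L", "R"): 16, ("C", "R", "L"): 15}
# 2% of voters move from C>L>R to C>R>L.
small_shift = {("L", "R", "C"): 35, ("R", "C", "L"): 34,
               ("C", "L", "R"): 14, ("C", "R", "L"): 17}
# L>R>C voters drop R to last; R>C>L voters raise L to second.
big_shift = {("L", "C", "R"): 35, ("R", "L", "C"): 34,
             ("C", "L", "R"): 14, ("C", "R", "L"): 17}

print(irv_winner(start))        # ('L', 51)
print(irv_winner(small_shift))  # ('R', 51)
print(irv_winner(big_shift))    # ('R', 51)
```

In the third profile, 69% of voters changed their rankings in L’s favor, yet the count is identical to the second: those changes sit below each ballot’s first choice, and IRV never looks at them unless that first choice is eliminated.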
> The winner, then, may not be the candidate with the most support, but the one that’s best at manipulating the system.
Absolutely true. But also consider that:
A. This happens with IRV too, and in some ways more so. Since you still have to worry about electability with IRV, a candidate like Biden can convince Warren’s supporters to drop her for him just by running ads that make voters fear her to be unelectable. That cannot possibly happen with approval voting. The best they could do is convince Warren’s supporters to also vote for Biden.
B. Approval voting behaves so much better in the general case (again, see the voter satisfaction efficiency results above) that it gives substantial margin for error, such that even if this effect you describe happens, approval voting plausibly still comes out ahead.
Admittedly, we have to see approval voting in the real world to get a better handle on such campaign dynamics that are beyond the capacity for a computer to simulate.
> Approval voting radically re-interprets the common-sense notion of “having a majority”…For instance, approval voting sometimes selects a candidate even though a majority of voters would, in a head-to-head contest, prefer any other candidate. (This is the Condorcet loser criterion.)
Again, approval voting tends to elect Condorcet winners whenever they exist, in practice. And most ranked methods, including most Condorcet methods, are extremely vulnerable to tactics, meaning they may in practice be worse at electing Condorcet winners. Indeed, IRV can elect candidate X even though candidate Y is preferred to X by a huge majority and also has twice as many first place votes as X.
And to go more esoteric and technical, it is mathematically proven that a group may prefer candidate X even though a majority of voters in that group prefer candidate Y. See here and here. So it is not inherently wrong to avoid electing the Condorcet winner, or even the majority winner for that matter. Here are even some examples where it seems bad to elect the Condorcet winner. I contend the goal of social choice is to maximize expected utility, i.e. maximize the net utility of the group, not to elect “majority winners” per se. We can talk about the biological origins of that grey decision-making machine between our ears, and how that makes us effectively utility maximization machines, but I suspect the EA community is already fairly on board with this.
> Indicating support or opposition for each candidate is more expressive than just having a single vote, but it is still binary and does not allow voters to express more nuanced preferences between different candidates.
Yes, this is the expressiveness issue. Ranked ballots do provide more expressiveness, but this is counteracted by two other factors: 1. their increased vulnerability to strategy, and 2. their generally decreased tabulation efficiency (particularly in the case of IRV). This is why approval voting generally outperforms the ranked methods in computer models. Even in the circumstances where some ranked methods perform better, it’s only by a marginal amount, which I do not think justifies the complexity, especially given that ranked voting methods have been repealed in something like 60 U.S. cities, apparently largely on account of their complexity.
> There is almost no track record of approval voting being successfully used in competitive elections. Where it was used, approval voting was often repealed later on — e.g. in Dartmouth alumni elections and in internal IEEE (Institute of Electrical and Electronics Engineers) elections (search for IEEE here).
But all indications are that it worked well in those situations, and it was repealed for “political” reasons, plausibly because it worked. Granted, that’s little consolation if it manages to be repealed in American political contests. But there are good reasons to believe that’s less likely to happen in government elections. E.g. it’s harder to imagine a city like Fargo, where approval voting was adopted by a 64% majority, voting to repeal it by a majority via another ballot initiative. And given ranked voting has already been repealed so many times in the U.S., it’s not clear that approval voting is riskier in this regard. We’ll just have to find out empirically by seeing how it plays out in Fargo this June, and hopefully in other cities in the coming years.
My view is that the urgency of voting reform demands small experiments limited in their scope. If it turns out approval voting actually is better, and its simplicity allows it to scale faster, then I feel it will have been worth it. I think the urgency of issues like climate change and authoritarianism demands a massive scaling out of “electoral technology” that can increasingly democratize the USA in something like a decade. If approval voting fizzles out and/or IRV or other methods turn out to scale faster, it wasn’t that expensive of an experiment.
> the tendency to favour moderate candidates could also be considered a bias and is not universally viewed as a positive feature of a voting system.
Again, the key here is to look at it through the lens of utility efficiency, which already measures “bias” and any other distortionary factors in terms of human welfare, which is the ultimate metric we want to look at.
> It seems to me that effective altruism has not examined approval voting (or alternatives) in sufficient detail.
Speaking as someone who has spent countless hours studying this subject since 2006, whereupon I soon did my first exit poll, I think it has been studied voluminously. Organizational elections, exit polling, computer simulation, and all manner of years-long debates between math PhDs and various engineers have taken place. An NYU professor of game theory and political science has worked on it for four decades and written multiple books on the subject. I’ve personally visited Kenneth Arrow, who won the Nobel Prize in economics for his study of voting theory going back to the 1950s. We’ll still find out new things from what we see in municipal elections like those in Fargo, but I do believe the general arguments you raise have been analyzed inside and out.
> In general, my impression is that discussions of voting reform suffer from the problem that people tend to pick their favourite method and then cherry-pick one-sided arguments in favour of it.
Perhaps, but what’s relevant is the veracity of the arguments, not the personality traits of people who favor one system or another.
> The Center for Election Science often talks about no favourite betrayal (which approval voting satisfies) and not much about later-no-harm (which it fails). FairVote doesn’t talk much about no favourite betrayal and talks a lot about later-no-harm — because their favoured method (instant runoff voting) satisfies later-no-harm but fails no favourite betrayal.
I would put this the other way around. Favorite betrayal matters for important game theoretical reasons that I described in detail above, whereas LNH is effectively an “anti-criterion”. And from those observations, I form a relative assessment of approval voting and IRV. I do not start with the system and then try to make the facts fit it. Indeed, Warren Smith initially ran his computer simulations with no particular foreknowledge of how it would turn out. He thought the winning system would vary depending on the assumptions (the “knob settings” of the program), but it turned out that score voting won in all 720 different permutations of those settings. And that led him to become a supporter of score voting. When I came across his work, I was initially skeptical and I sent him an email berating him. Then over the course of some email correspondence, he refuted my arguments and made me a convert.
A few years back, an associate of mine added a top two runoff to score voting and “STAR voting” was born. At first I suspected it was a needless additional complexity. But then I saw that Jameson Quinn’s voter satisfaction efficiency calculations were quite favorable to it, so I’ve shifted my thinking substantially.
> Since there is (some degree of) consensus that plurality voting is bad, but no consensus on which alternative is best, we should focus on the reform proposals that are most viable. That’s arguably instant runoff voting (IRV, called ranked choice voting / RCV in the US), which is championed by FairVote.
I don’t know if there is “consensus”, but I do feel the facts objectively show approval voting to be superior to IRV (if not every ranked method) according to essentially every metric we have, from voter satisfaction to ballot spoilage rates to voting machine complexity. I also dispute that IRV is more viable. Approval voting was adopted by a 64% majority in Fargo, and is polling at 72% support in St Louis. I suspect we are about to see that approval voting is more politically viable, due to its simplicity and transparency.
> Unlike approval voting, IRV has a track record in competitive elections and is much more in line with conventional notions of “majority”.
But again, it is mathematically proven that majoritarianism is not the right metric. Maybe you’re talking more about public perception, but in that case I would again cite the 2–1 margin by which approval voting managed to pass in Fargo, and the 72% support it’s getting in St Louis. If this majority failure is a risk in terms of political viability, we’re not seeing it.
> My personal favourite voting system would be a Condorcet method such as Ranked Pairs, but there are no large organisations advocating this, and it’s unlikely that Condorcet methods will be adopted.
I know I’m repeating myself, but approval voting may in practice be a better Condorcet method than real Condorcet methods. It also performs better under models of significant voter strategy. And of course it is much simpler, and thus politically and logistically easier to implement and scale.
> IRV seems clearly superior to plurality voting and has stood the test of time
It was used in nearly two dozen U.S. cities and repealed in all but one of them, then adopted in several cities again, and then repealed in four of them. So it’s not clear what its long term staying power is yet.
> For parliamentary, I think it’s best to use a form of proportional representation rather than (or in addition to) first-past-the-post in single-seat constituencies.
Whether PR systems can outperform the best single-winner systems, such as STAR voting and approval voting, is very much an open question and highly speculative. But PR is also very difficult to adopt in the USA without first getting a system like score voting or approval voting which is capable of ending two-party domination. That’s because federal law makes multi-winner districts illegal, and the two-party system will make that impossible to change.
> Proportional representation tends to lead to multi-party systems that require cross-party collaboration and reduce the team sport mentality that drives US polarisation.
One could expect that a congress full of moderate/centrist candidates would be even less polarized than one that includes every faction from socialists to fascists.
> A steelman of plurality voting is that it grants power to the largest coherent political coalition (coherent in the sense of being able to coordinate on a single candidate).
Computer simulation shows that plurality voting is extremely bad. And as a 41-year-old American, I’ve seen first-hand how much its two-party lock-in fosters binary tribal thinking that makes it impossible to view issues like impeachment or climate change through an objective non-partisan lens. Combined with gerrymandering, it virtually nullifies democracy. As FairVote says, elections become so predictable that, “Under our current system, we can predict 379 of the 435 House seats — or more than 87 percent of the total — with high confidence.”