Electoral Reform Models

What Math Can Tell Us About Elections

Imagine a country — diverse lands, people with different interests, needs and principles. Imagine that it is a democracy. The diversity of the people is therefore translated into institutions that exercise power aligned with the expressed interests, needs and principles of the people. The process at the heart of it all is the election.

Viewed abstractly, the election is a method that transforms the people’s political opinions into a simple representation. This can be a distribution of seats in a parliament between political parties, each having a political position that is presented to the public in some form leading up to the election. The subject of this text is different election methods.

The inspiration for this work is the renewed debate on electoral reform in Canada. Although it seems no such reform is imminent, the debate has been illuminating. However, it has also been predictably coloured by the perceived short-term gains in parliamentary power of the pundit’s preferred team. In what follows I intend to apply a mathematical view to the election process. Through modelling and computer simulation, I will show how different ways of evaluating votes generate a representation in parliament. I hope to bring certain facts and processes to the fore, which should inform the Canadian, and related, debates.

There is plenty to say about what a good democracy ought to be. I only address one aspect. This is not a manifesto for democracy, nor a review of its place in history. The computer simulations, through models and explorations of “what-if” scenarios, are a means to discover latent properties of different election methods and put these in an abstract form for further consideration as democracy and governance are debated and enacted. What follows should be understood in that narrow sense. More math, less punditry.

Setup of Model — Landscape and Election

Before results and relations can be discussed, I will describe the assumptions and equations. I keep the technical details to a minimum, but some readers may still prefer to skip to the next section.

Since Canada is the inspiration, the model is formulated for the election to the 338 seats of the federal Canadian parliament in Ottawa. From the parliament the government is formed, typically by the party with the most seats. Each seat corresponds to an election district geographically located in one of the ten provinces or three territories. Each district has a population, typically around 100,000 persons with a few exceptions.

Histogram of population per election district in the 2015 Canadian federal election.

Each voter’s political opinion is set as a point in an abstract political opinion space. I use only two dimensions to describe this space, specifically the ones used in the Nolan chart: economic freedom and political freedom, set to range from zero to one. This is a simplification of the political opinions people can have. As a model of aggregate political opinion it is of adequate accuracy to uncover qualitative relations between election systems.

In order to explore different collective opinions, a number of voter archetypes are defined. Their political opinions are uncertain to a degree modeled by two-dimensional normal distributions. Three illustrative definitions are:

  • Centrist: the uncertainty distribution is centred on (0.5, 0.5) in the Nolan chart, with an isotropic covariance matrix with diagonal element 0.025.
  • Arch-Conservative: the distribution is centred on (0.8, 0.2) in the Nolan chart; covariance as the centrist.
  • Progressive leaning: the distribution is centred on (0.4, 0.6) in the Nolan chart; covariance as the centrist.

Other archetypes are used as well, defined in a similar manner. The probability density map of a centrist voter archetype is shown below.

Density plot of a sample of points from the centrist voter archetype. The darker the shade, the more points. The jagged boundaries are because the plot is created from a finite sample taken from the normal distribution.
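The archetype definitions can be sketched in a few lines of Python. This is a minimal sketch, not the code behind the figures: the names ARCHETYPES and sample_opinion are hypothetical, and I assume that sampled points are not clipped to the unit square, since the text does not say how points outside the zero-to-one range are handled.

```python
import math
import random

# Archetype centres in the Nolan chart, as listed in the text.
ARCHETYPES = {
    "centrist": (0.5, 0.5),
    "arch_conservative": (0.8, 0.2),
    "progressive_leaning": (0.4, 0.6),
}
# Diagonal element of the isotropic covariance matrix (a variance).
VARIANCE = 0.025


def sample_opinion(archetype, rng):
    """Draw one opinion point from an isotropic two-dimensional normal.

    Assumption: points are not clipped to [0, 1] x [0, 1].
    """
    cx, cy = ARCHETYPES[archetype]
    sigma = math.sqrt(VARIANCE)  # standard deviation from the variance
    return (rng.gauss(cx, sigma), rng.gauss(cy, sigma))


rng = random.Random(0)
sample = [sample_opinion("centrist", rng) for _ in range(20000)]
mean_x = sum(p[0] for p in sample) / len(sample)
mean_y = sum(p[1] for p in sample) / len(sample)
print(round(mean_x, 2), round(mean_y, 2))
```

With an isotropic covariance the two coordinates are independent, which is why each can be drawn separately.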

Each election district is defined to contain a mixture of voter archetypes. The number of voters is proportional to the population. Some districts can have more progressive voters than others, or large cohorts of arch-conservatives and arch-progressives, making the district highly polarized etc. In this manner a number of district archetypes are defined.

Finally, the country is comprised of 338 districts of some specified frequency of district archetypes. For example:

  • 50% of districts are centrist-heavy, composed on average of 40% centrist voters, 20% progressive-leaning, 20% conservative-leaning, 5% arch-progressive, 5% arch-conservative, 5% libertarian and 5% statists.
  • 25% of districts are left-leaning, composed on average of 20% centrist voters, 40% progressive-leaning, 10% conservative-leaning, 15% arch-progressive, 15% statists.
  • 25% of districts are right-leaning — the inversion of the left-leaning district.

This mixture of a very large number of probability distributions defined on the Nolan chart is the political opinion landscape, or landscape for short, that the election system is meant to represent in parliament. Both landscape and election system are varied below in order to learn what effect that has on the election outcome.
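As an illustration, the example landscape can be written down as two nested mixtures. The sketch below rests on several assumptions of mine: the right-leaning mix is spelled out as a mirror image of the left-leaning one (my reading of "the inversion"), the shares are treated as fixed rather than as averages, and district types are drawn at random instead of being assigned in exact proportions. All names are hypothetical.

```python
import random
from collections import Counter

# Voter-archetype shares per district archetype, from the example list.
DISTRICT_TYPES = {
    "centrist_heavy": {
        "centrist": 0.40, "progressive_leaning": 0.20,
        "conservative_leaning": 0.20, "arch_progressive": 0.05,
        "arch_conservative": 0.05, "libertarian": 0.05, "statist": 0.05,
    },
    "left_leaning": {
        "centrist": 0.20, "progressive_leaning": 0.40,
        "conservative_leaning": 0.10, "arch_progressive": 0.15,
        "statist": 0.15,
    },
    # Assumption: "inversion" read as swapping each left archetype
    # for its right-hand counterpart.
    "right_leaning": {
        "centrist": 0.20, "conservative_leaning": 0.40,
        "progressive_leaning": 0.10, "arch_conservative": 0.15,
        "libertarian": 0.15,
    },
}
# Frequency of each district archetype in the country.
LANDSCAPE = {"centrist_heavy": 0.50, "left_leaning": 0.25, "right_leaning": 0.25}


def build_country(n_districts, rng):
    """Assign a district archetype to every district."""
    return rng.choices(list(LANDSCAPE), weights=list(LANDSCAPE.values()), k=n_districts)


def sample_voter_archetypes(district_type, n_voters, rng):
    """Draw the archetype of each voter in one district."""
    mix = DISTRICT_TYPES[district_type]
    return rng.choices(list(mix), weights=list(mix.values()), k=n_voters)


rng = random.Random(0)
country = build_country(338, rng)
voters = sample_voter_archetypes(country[0], 1000, rng)
print(Counter(country))
```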

The political parties are described as points somewhere in the Nolan chart. Unlike the voter archetypes, the location is without uncertainty. Furthermore, the points do not change place between districts. These are simplifications, but in search of general relations and associations, simplifications are helpful rather than harmful, as long as the limitations to the precision are understood.

The process of voting for any given voter in the model is done as follows:

  1. The Euclidean distances from the sampled point representing the voter’s political opinion to all available parties are computed.
  2. If at least one party is within some threshold separation, the minimum-to-vote distance, the parties are ranked in order of increasing separation from the voter’s political opinion.
  3. If no party is within the minimum-to-vote distance, no vote is cast.
  4. All voter party rankings from step 2 in a given district are passed to the election method under study. The election method returns the elected party (or parties) for the district according to the logic of the election method.
  5. This is repeated for all districts of the country, and a distribution of seats in parliament per party is obtained.
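The first three steps, for a single voter, can be sketched as follows. The function name rank_parties and the party coordinates are hypothetical, and I assume "within" means a distance less than or equal to the threshold.

```python
import math


def rank_parties(opinion, parties, min_to_vote=math.inf):
    """Rank parties nearest first, or return None if the voter abstains.

    opinion: (x, y) point of the voter in the Nolan chart.
    parties: mapping from party name to its fixed (x, y) position.
    """
    distances = {name: math.dist(opinion, pos) for name, pos in parties.items()}
    if min(distances.values()) > min_to_vote:
        return None  # no party is close enough, so no vote is cast
    return sorted(distances, key=distances.get)


# Hypothetical party positions, for illustration only.
parties = {"A": (0.3, 0.7), "B": (0.5, 0.5), "C": (0.8, 0.2)}
ranking = rank_parties((0.45, 0.55), parties)
abstain = rank_parties((0.45, 0.55), parties, min_to_vote=0.05)
print(ranking, abstain)
```

Leaving the threshold at infinity makes every voter participate, so turnout can be switched on and off without touching the election methods.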

To be perfectly clear, the simplifications described above mean that the simulations are not replicas of past or predictions of future Canadian elections. Instead, the model will help find basic relations between election methods in an artificial, yet plausible, political landscape. The relations are general and hence one important part of how the election methods operate in reality.

Election Methods Under Study

Four election methods are considered.

First across the post: For all the voters in a district, the party ranked as the favourite is counted. The party that has been ranked at the top by the most voters is the winner, regardless of what percentage of voters that constitutes. This method, more commonly called first past the post, is the current election method for the federal parliament in Canada, as well as in India, the USA and the UK, among others.
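In code, this method only looks at the top of each ranking. A minimal sketch, with a hypothetical function name and made-up ballots; ties are not given any special treatment here.

```python
from collections import Counter


def first_across_the_post(ballots):
    """Return the party ranked first on the largest number of ballots."""
    tally = Counter(ballot[0] for ballot in ballots)
    return tally.most_common(1)[0][0]


# Nine illustrative ballots, each a full ranking of three parties.
ballots = (
    [["A", "B", "C"]] * 4
    + [["B", "A", "C"]] * 3
    + [["C", "B", "A"]] * 2
)
winner = first_across_the_post(ballots)
print(winner)
```

Party A wins here with four of nine first preferences, well short of a majority.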

Proportional: For all voters in a district, the party ranked as the favourite is counted. The seats allocated to the district are assigned such that the number given to each party is more or less proportional to the relative number of voters that considered that party their favourite. For this to be possible, election districts must be geographically larger such that they have multiple seats. In this model the provinces and territories are made into multi-seat districts. This is the election method used in most of continental Europe.
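The text above does not fix a particular apportionment formula, so the sketch below uses the largest-remainder rule with a Hare quota purely as an assumption; other rules, such as D'Hondt, would serve the same illustrative purpose. The function name and vote counts are hypothetical.

```python
def largest_remainder(tally, seats):
    """Allocate seats in proportion to first-preference counts.

    Integer arithmetic is used throughout to avoid rounding surprises.
    """
    total = sum(tally.values())
    # Whole seats: floor(votes * seats / total) for each party.
    allocation = {party: votes * seats // total for party, votes in tally.items()}
    leftover = seats - sum(allocation.values())
    # Remaining seats go to the parties with the largest remainders.
    by_remainder = sorted(
        tally, key=lambda party: (tally[party] * seats) % total, reverse=True
    )
    for party in by_remainder[:leftover]:
        allocation[party] += 1
    return allocation


# Hypothetical first-preference counts in a ten-seat district.
allocation = largest_remainder({"A": 47, "B": 33, "C": 20}, seats=10)
print(allocation)
```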

Instant run-off voting (IRV): For any district, the first step is to determine if any party is the favourite for a majority of voters. If so, this party wins. If not, the party that is the favourite of the fewest voters is eliminated. The voters who favoured the eliminated party then have their second favourite counted as their top choice instead. Again, if one party has been able to attain a majority as the top-ranked one, it wins. If not, the next least favourite party is eliminated and the process is iterated. This is a rare election method for federal and national elections, with only Australia currently using it. However, the method is common in referendums, in the election of party leaders, mayors or municipal offices around the world, and has also been adopted by the Academy Awards to elect the winner of the Oscar for best film.
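The elimination loop can be sketched as below. The function name and the ballots are hypothetical. I assume that every ballot ranks all parties and that ties for last place are broken alphabetically; the text does not specify a tie-break.

```python
from collections import Counter


def instant_runoff(ballots):
    """Eliminate the weakest party until one holds a majority."""
    remaining = {party for ballot in ballots for party in ballot}
    while True:
        # Count each ballot for its highest-ranked surviving party.
        tally = Counter()
        for ballot in ballots:
            for party in ballot:
                if party in remaining:
                    tally[party] += 1
                    break
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Assumption: ties for fewest votes are broken by name.
        loser = min(remaining, key=lambda party: (tally[party], party))
        remaining.remove(loser)


ballots = (
    [["A", "B", "C"]] * 4
    + [["B", "C", "A"]] * 3
    + [["C", "B", "A"]] * 2
)
irv_winner = instant_runoff(ballots)
print(irv_winner)
```

No party has a majority in the first round. Party C is eliminated, its two ballots transfer to B, and B wins five to four, even though A leads on first preferences.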

Schulze voting: The details of this method are complicated and only an outline is given. The method is related to IRV in that it uses information on second, third and higher preferences to arrive at an elected candidate. Unlike IRV, which often terminates before the entire preference ranking of all voters has been traversed, Schulze voting employs the entire ranking in order to create an aggregate preference ordering of all parties (expressed mathematically as a directed graph). The highest ordered party is the winner. This is a recent method of voting, which has not been used in national elections. It has, however, found a following in the free-software community as collective decisions are made on different proposals, and various instantiations of the Pirate Party in Europe have used Schulze voting for their internal elections.
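For readers who want more than the outline, here is a compact sketch of the standard formulation: count pairwise preferences, find the strongest path between every pair of parties, and pick the party that no other party beats on path strength. The function name and ballots are hypothetical. I assume every ballot ranks all parties, and I simply return the first party if several tie.

```python
def schulze_winner(ballots, parties):
    """Return a Schulze winner from complete preference rankings."""
    # d[a][b]: number of voters who rank a above b.
    d = {a: {b: 0 for b in parties} for a in parties}
    for ballot in ballots:
        for i, a in enumerate(ballot):
            for b in ballot[i + 1:]:
                d[a][b] += 1
    # p[a][b]: strength of the strongest path from a to b.
    p = {
        a: {b: d[a][b] if d[a][b] > d[b][a] else 0 for b in parties}
        for a in parties
    }
    for k in parties:  # k is the intermediate party on the path
        for i in parties:
            for j in parties:
                if len({i, j, k}) == 3:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    winners = [
        a for a in parties
        if all(p[a][b] >= p[b][a] for b in parties if b != a)
    ]
    return winners[0]  # assumption: ties resolved by list order


# Left (L), middle (M) and right (R) parties with nine voters.
ballots = (
    [["L", "M", "R"]] * 4
    + [["M", "L", "R"]]
    + [["M", "R", "L"]]
    + [["R", "M", "L"]] * 3
)
schulze = schulze_winner(ballots, ["L", "M", "R"])
print(schulze)
```

In this example M has the fewest first preferences and would be eliminated first under IRV, yet M is preferred head to head over both L and R, so the Schulze method elects it.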

The Left-Right District Model and 8 Parties

I will start simple. A landscape model is defined with all districts of the same archetype: the left-right polar. In qualitative terms that means the most common political opinion is the centrist one, with sizable progressive and conservative leaning opinions present as well. In the Nolan chart that creates a density distribution with its principal axis pointing towards the lower right, but where deviations above and below this axis are present in significant numbers (a figure is given in the next section).

Eight parties are defined and placed somewhat evenly along the principal axis. The parties are assigned a colour and name, where non-partisan names are chosen with an ironic view of the Canadian stereotype.

The elections are run by sampling about 2 million voters, where the minimum-to-vote threshold has been set to infinity, so every sampled voter casts a vote. Given these models, let us explore some results.

Results: Party Along a Line

A simple variation is to move one and only one party along a well-defined path and see what impact that has on the outcome. In this case the party called Winter Flip Floppers is moved along the diagonal of the Nolan chart, see the figure below.

Left-right polar landscape density, plus eight political parties, where one is moved along the dashed diagonal.

The four election methods are run for seven points taken along the diagonal. The election is simulated as described above, and the outcomes are shown in the diagram below.

Number of seats per party represented as stacked bar diagrams, as a function of election method and the location of the Winter Flip Floppers along the diagonal.

The proportional method is quite different from the other three methods. This illustrates that the proportional system does not attempt to reconcile the political differences of the voters into a most agreeable or least disagreeable political point at the time of election. Rather, the process to find common ground, which is needed to attain at least half of the votes in parliament, is left to after the election as the party representatives negotiate coalitions. I will discuss this more later.

The variations observed for the proportional system are predictable. The closer the Winter Flip Floppers are to the centre (that is 0.5 on the diagonal), the greater their share of the seats, mostly taken from the other two parties near the centre. The proportional system creates, at first glance, an easy to understand relation between the people’s political opinion and parliamentary power.

At the extreme values along the diagonal, the three other election methods all agree that the Polite Party is the optimal representation. That reflects the fact that this party is near the centre, where most voters in this landscape have their political opinion. As the Winter Flip Floppers approaches the centre, it becomes the preferred party for the same reason.

There is one very illustrative point, though, where the three methods differ considerably: at 0.45 along the diagonal. The first-across-the-post method makes the Ice Stormers the dominant party, the IRV method makes the Polite Party the dominant one, and the Schulze method makes the Winter Flip Floppers the dominant one. The political landscape is identical, the parties are identical, hence only differences between the election methods can explain the very different outcomes. Let us take a closer look.

The figure above is a close-up on the centre portion of the earlier figure. The reason the first across the post method favours the Ice Stormers (the amber point), even though it is farther from the centre where most voters are located (“most agreeable point”), is that Winter Flip Floppers and the Polite Party are “stealing” votes from each other. Because the two parties are similar, the voters near them divide evenly between the two, and as a consequence the more separated party becomes the single most favoured party.

The outcome of the Schulze method can be explained in this case by the simple rule: the party closest to the centre wins. At 0.45 along the diagonal, the Winter Flip Floppers is slightly closer to the centre than the Polite Party. The Schulze method considers all preferences by all voters of a district, and thus the centre is the optimal point in terms of formulating a compromise opinion.

The IRV method is designed to avoid the issue of “stealing” of votes as present in the first across the post method. But why does it differ from Schulze voting in this particular instance? In order to understand that, we must take a look at the points farther away from the centre, see close-up above.

The less popular parties are eliminated in the iterated process of the IRV method. That means that the Trillion Trees (pink point), Orca Folks (red point) and Subzero Party (brown point) are progressively eliminated and the secondary and higher party preferences of the voters in that vicinity of the Nolan chart become relevant to the outcome. The voters that prefer these parties will mostly prefer the Polite Party over the Winter Flip Floppers in the higher-order preferences. On the other side of the landscape, parties are eliminated and their votes redistributed, predominantly to the Ice Stormers. In the last step, three parties remain, where both the Polite Party and the Ice Stormers have been assigned progressively more votes during the elimination, but not enough to give either a majority. However, each of them now has more votes than the Winter Flip Floppers, and hence it is this party that is eliminated, and its secondary preferences tip the Polite Party over the majority threshold. In simple terms, the Polite Party wins by a combination of “vote stealing” of first preference voters from Winter Flip Floppers, and a positioning such that most of the smaller parties on the lower right political flank are eliminated in favour of the Polite Party, rather than its closest competitor, Winter Flip Floppers.

Position of Party — A Random Walk Simulation

The variations above are illustrative of features of the election methods given a landscape and party positions. However, the expected party positions may depend indirectly on the election method. For example, in one case above the first across the post method created an outcome where two similar parties were “stealing” votes from each other. These parties would in a dynamic setting either merge, or further differentiate in order to avoid this outcome. A real example of this is the Canadian Conservative Party, which was formed in 2003 by the merger of two separate parties, which in the 1990s arguably splintered the conservative voters and gave the Liberal Party of Canada a couple of easy electoral victories.

In order to model the indirect relation between party position and election method, I introduce an optimization of party positions given a landscape and election model. The steps are:

  1. One of the eight parties is randomly selected.
  2. The party position in the political landscape is randomly altered within some threshold from its current position.
  3. An election is run as defined above.
  4. If the party selected in the first step gains seats, or has the same number of seats as before, the altered position is accepted. Otherwise, the altered position is rejected and the old position is kept.
  5. The steps 1–4 are repeated. Only after a specified number of iterations is the optimization ended.
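The five steps amount to a greedy random walk, sketched below on a deliberately tiny toy country. Everything specific here is an assumption for illustration: three parties instead of eight, twenty districts, a fixed voter sample that is reused for every election, a uniform random step, and first across the post as the election method. The names run_election and optimize are hypothetical.

```python
import math
import random
from collections import Counter


def run_election(parties, districts):
    """Toy first-across-the-post election with one seat per district."""
    seats = Counter()
    for voters in districts:
        tally = Counter(
            min(parties, key=lambda name: math.dist(voter, parties[name]))
            for voter in voters
        )
        seats[tally.most_common(1)[0][0]] += 1
    return seats


def optimize(parties, districts, n_iter, max_step, rng):
    """Greedy random walk over party positions (steps 1 to 5)."""
    parties = dict(parties)
    seats = run_election(parties, districts)
    history = []
    for _ in range(n_iter):
        name = rng.choice(sorted(parties))                  # step 1
        x, y = parties[name]
        trial = dict(parties)
        trial[name] = (x + rng.uniform(-max_step, max_step),
                       y + rng.uniform(-max_step, max_step))  # step 2
        trial_seats = run_election(trial, districts)        # step 3
        if trial_seats[name] >= seats[name]:                # step 4
            parties, seats = trial, trial_seats
            history.append(dict(parties))
    return parties, seats, history                          # step 5


rng = random.Random(1)
districts = [
    [(rng.gauss(0.5, 0.16), rng.gauss(0.5, 0.16)) for _ in range(200)]
    for _ in range(20)
]
start = {"A": (0.1, 0.9), "B": (0.5, 0.5), "C": (0.9, 0.1)}
final, seats, history = optimize(start, districts, n_iter=50, max_step=0.05, rng=rng)
print(dict(seats), len(history))
```

Because a move is accepted when the seat count is unchanged, parties keep drifting even on a plateau, which is what lets the walk explore the landscape rather than freeze at the first local optimum.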

A snapshot of an optimization is shown below.

Snapshot of a handful of consecutive steps of accepted party positions in the optimization. Size of marker indicates number of seats won in the election.

The messy image below illustrates how seats per party can evolve during the optimization. The optimization starts with randomly generated party coordinates, which evidently produce results that are not representative of subsequent steps in the optimization. As the optimization progresses, all parties greedily converge towards a smaller set of relative positions where they all are more likely to win at least a few seats. In all analysis that follows, the first one hundred accepted steps in the simulation data are discarded.

This method of optimization is only meant to discover how a set of parties can be expected to distribute in a given landscape and given a particular election method. It assumes each party is driven by the objective of gaining as much parliamentary power as possible for itself. In the real world, there is inertia to shifting party positions, mergers can take place, governing coalitions can be agreed upon prior to the election, knowledge of where other parties are situated can be used to guide differentiation, and the political landscape is shifting concurrently. So to be clear, it is not the party dynamics in the real world that are discovered; rather, it is the statistical distribution of party positions, given a landscape and election method, that is discovered (within the accuracy and precision limits of the model and simulation). Thus, I expect the differences between election methods to be informative of the real world, while absolute quantities are irrelevant to an understanding or prediction of real world behaviour.

The calculation below excludes the proportional voting method. The reason is that it is quite predictable how parties will distribute under proportional conditions: more or less a discrete reflection of the landscape. It is included in the conclusions though.

A Diverse Political Landscape is Defined

The optimizations are run on a different landscape than when one party was moved along the diagonal. The landscape is:

  • 30% of the districts are centrist-heavy, in other words predominately comprised of voters with a political opinion probability distribution strongly in favour of positions near the (0.5, 0.5) point on the Nolan chart, but with other voter archetypes present as well. The exact composition was defined in a section above.
  • 30% of the districts are right-leaning, hence more conservative and libertarian voter archetypes, and fewer, but not zero, progressive voters.
  • 30% of the districts are left-leaning, which is the inverse of the right-leaning ones.
  • 10% of the districts are freedom-lovers, skewed strongly towards libertarian voter archetypes.

This landscape introduces diversity with respect to district types. Still, districts are overlapping considerably in terms of the set of political opinions the respective voters hold. That is qualitatively true for Canada, and most countries. But I make no claim that this is quantitatively how Canada is divided — that would require far more study of actual voting data and sentiment analysis.

Results: Position Optimization from Centre and Each Other

In order to characterize how the eight different parties position themselves relative to each other and to the landscape, two quantities are evaluated for all data:

  1. Distance to centre: Given a party position, determine the Euclidean distance to the centre of the landscape. This quantifies how agreeable the party is to the country as a whole, since the centre is the most common position, though that is not true for every district.
  2. Distance to nearest neighbour: Given a party position, determine the Euclidean distance to the nearest neighbour party. This quantifies in part how crowded the neighbourhood of the given party is with other parties.
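Both quantities are short computations. In the sketch below I assume the centre of the landscape is the point (0.5, 0.5); the function names and party positions are hypothetical.

```python
import math


def distance_to_centre(position, centre=(0.5, 0.5)):
    """Euclidean distance from a party to the assumed landscape centre."""
    return math.dist(position, centre)


def nearest_neighbour_distance(name, parties):
    """Euclidean distance from one party to its closest rival."""
    return min(
        math.dist(parties[name], position)
        for other, position in parties.items()
        if other != name
    )


# Hypothetical party positions, for illustration only.
parties = {"A": (0.5, 0.5), "B": (0.8, 0.9), "C": (0.5, 0.6)}
centre_b = distance_to_centre(parties["B"])
nearest_a = nearest_neighbour_distance("A", parties)
print(round(centre_b, 3), round(nearest_a, 3))
```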

For each election method, and each accepted step in the optimization, the relation between these two quantities is computed, conditional on the parties having gained more seats than some threshold. The animated image below illustrates the relation for different seat thresholds as two-dimensional histograms.

A two-dimensional histogram shown as a density map, where a deeper shade of green means a larger number of parties are found in that bin during the course of the simulation.

A clear observation is that the number of instances in which a party gains a large number of seats decreases (the shade of green becomes paler) as the threshold increases, which is hardly surprising. However, Schulze voting drops to zero for much lower thresholds than the other two methods. Under the conditions of the model, no single party is able to get more than 120 seats with Schulze voting, which is not the case for the other two election methods.

The cause of this difference is found upon closer inspection of how the parties position themselves. The figure below is a snapshot from the optimization, but in fact it is representative of most configurations obtained with the Schulze method.

There are three parties of similar size, centred very close to the maxima of the three dominant district types in the landscape model. There is also a fourth smaller party positioned closer to the upper right, where the fourth district type is predominately situated. Therefore, it is rare for a party to win many more than 30% of the districts (or 102 seats). As in the earlier landscape model, the Schulze voting method is extremely good at discovering parties that are positioned such that they are in the preference sweet spot for the district types.

The animated histogram further confirms that the first across the post method is extremely sensitive to parties being similar. The corresponding first and second columns in the histogram are very pale. Any two parties that are similar in the political landscape are rarely able to win in districts, regardless of where they otherwise are situated.

I will return to the results on the IRV method soon, but first I will adjust the model a bit.

No Vote

The analysis so far has modelled every voter as participating regardless of where the voter is situated relative to the political parties. In reality we know that voter participation is below 100%. In the 2015 election to the Canadian parliament the turnout was around 66% nationwide with younger voters less inclined to exercise their franchise. The reasons for not voting vary, but the most common self-reported reasons in Canada are health, a lack of interest in politics, or some general political reason not to vote.

I will put aside the larger discussion about what low turnout means for the health of a society and model only one aspect of it. As defined in a section above, a voter can be modelled to only participate in the voting if there is a party within some threshold distance of that voter’s political position. I repeat the optimizations as above with the threshold set to 0.25 and 0.15. A smaller value implies that voters require parties to be closer in order to vote.

Results: Less Inclined to Vote Changes Positioning

Another quantity to summarize how parties distribute relative to one another is the average party-to-party separation. For any configuration obtained during the optimization, these distances are computed for parties that have exceeded some threshold of seats in the election, and the average such distance is evaluated. A large value implies that the parties are well separated from each other. The box-plots below show the average party-to-party separations obtained during the different simulations, given the condition that the party has obtained at least 80 seats.
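One way to compute this quantity is sketched below. I assume the average is taken over all pairs of parties that meet the seat threshold, and that a configuration with fewer than two such parties yields no value. The function name, positions and seat counts are hypothetical.

```python
import math
from itertools import combinations


def average_separation(parties, seats, min_seats=80):
    """Mean pairwise distance between parties with at least min_seats seats."""
    large = [
        position for name, position in parties.items()
        if seats.get(name, 0) >= min_seats
    ]
    pairs = list(combinations(large, 2))
    if not pairs:
        return None  # fewer than two sizable parties
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical configuration: three sizable parties and one small one.
parties = {"A": (0.0, 0.0), "B": (0.3, 0.4), "C": (0.6, 0.8), "D": (0.9, 0.9)}
seats = {"A": 120, "B": 100, "C": 90, "D": 28}
separation = average_separation(parties, seats)
print(round(separation, 3))
```

Party D falls below the threshold and is ignored, so the result is the mean of the three distances between A, B and C.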

Two trends are evident:

  1. The Schulze method leads to a tight clustering of parties with a sizable representation in parliament. IRV is more spread out, and first across the post generates the most spread out parties. (The proportional method would be even more spread out.)
  2. As voters become less inclined to participate, unless there is a party similar to their opinions, all election methods lead to a tighter clustering of the parties.

Before I discuss these relations further, the same type of two-dimensional histogram as above is created for the simulation with the 0.15 minimum to vote threshold.

In the first across the post method there is a considerable cost to being similar to any other party, which pushes parties apart. Thus the average spread is relatively high. However, the parties cannot stray too far from the position with most voters, and thus there is a pull towards a smaller set of common points balancing the former mechanism. The Schulze method imposes no penalty for nearby parties, hence all parties compete to be maximally fit to the same small number of positions at or around the centre of the political landscape. IRV in a way falls between the two, as seen in the figures above. Earlier I explained why being close to some other parties is bad in IRV, while being close to, or more precisely on the same flank as, certain other parties is good. Hence, parties that are elected through the IRV method must seek to balance the two tendencies. The benefit of being the most centrist party for a specific political flank may explain why the IRV method produces sizable parties a bit more off-centre than the Schulze method (compare the corresponding first and second rows in the histograms).

The shrinking spread of party positions as voters become less inclined to vote is explained by the political fringes effectively disappearing. For a party to move towards the fringe comes at the cost of being non-competitive for the more common voter at the centre. The advantages of controlling a flank become smaller. It should be noted that this is only true in cases where the landscape has the most voters in a single centrist position. If the landscape were bimodal, other effects would be expected, although that is a subject for another study.

Lessons Learned

The simulations and models have discovered, confirmed or suggested properties of the election methods:

  • The first across the post method favours parties that are differentiated from other parties without straying too far from the position of most common political opinions by the people. The impact of “vote stealing” between similar parties can occasionally produce odd outcomes.
  • The IRV method favours parties that are both near the position of most common political opinion by the people, and positioned such that they dominate a political flank with only smaller fringe parties further out the political flank in question, which turn into more centrist votes during elimination.
  • The proportional method mirrors the political opinions of the people most accurately, but in turn creates a polarized parliament with a single party rarely being able to attain a majority. Hence, the effort of arriving at a compromise is entirely left to the parties after the election, a dynamic I have not studied.
  • The Schulze method is unaffected by parties being close, and instead favours very consistently the point the most voters disagree on the least, a kind of median position. That can in certain landscapes create a small spread of parties with sizable parliamentary power.

These rules or relations should be understood qualitatively. Determining the degree to which they apply requires a more sophisticated model, and in particular it depends on the topology of the landscape, an entity that is practically difficult to ascertain, but certainly possible through polling and other political sentiment analysis.

So What Does This Mean for the Bigger Picture?

So is there a best election method? Not without defining what a good democratic society ought to be. The election methods are different in how they arrive at the political compromise and the just division of power. Therefore, the answer to the above question must be preceded by an answer to the question of what a just and good division and use of power ought to be for our shared society as a whole. This study helps to connect one answer to the other, but the fundamental question is left unanswered.

One thing the study should make very clear is that predicting what effect a change of election method would have on the current parties is a flawed exercise. The current parties have evolved under the current election method. With a new election method, new parties and new relative positions will emerge. That may take time, so short-term predictions of how parliamentary power will change are feasible. However, changing something as profound to the health of a democratic society as the election method must never be conflated with the pursuit of parliamentary power for the next four years by one party or another. That tendency should be met with a loud and clear repudiation.

My wish is that the basic abstract modelling creates a view of the bigger picture of what is at the heart of the election and the pursuit of common interests it embodies, and that the results enable a consistent and honest connection of principles to practice as the topic of electoral reform is debated. Math helps with many things.