Can Machine Learning provide better classifications for political parties than traditional approaches?

It’s time to rethink our groups!

Carlos Ahumada
Sep 2, 2019
Photo by Maria Teneva on Unsplash

In my last article here at Data Social we saw that it is very tricky to cluster European political parties based on the classic (and outdated?) party families (greens, conservatives, liberals and others). Indeed, I found that although parties may belong to the same family, they can take very different positions on important policies. But fear not. Let’s invoke the power of machine learning together and develop a better classification. In this article, I use Principal Components Analysis (PCA), an unsupervised method, to find out where political parties really belong with respect to each other’s positions.

Note: Check out this very cool article to learn more about what supervised and unsupervised methods are in the context of machine learning!

What is PCA?

Without getting too technical, PCA is a useful statistical procedure when we have many variables and it is hard to tell which ones matter most or how they relate to one another. PCA transforms the original (often correlated) variables into a new set of uncorrelated variables, the principal components, ordered by how much of the variation in the data each one captures. Keeping only the first few components highlights the main sources of variation and makes patterns in the data easier to see.

Note: Check out this very cool article to learn more about PCA!
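To make the idea concrete, here is a minimal sketch on hypothetical toy data (random numbers, not real party positions; the variable names a, b and c are made up). It computes the principal components "by hand" as the eigenvectors of the correlation matrix and checks that they match what prcomp returns, up to the sign of each component, and that the resulting components are uncorrelated.

```r
# Hypothetical toy data: 20 made-up "parties" scored on three simulated variables,
# two of them strongly correlated (random numbers, not real party positions)
set.seed(1)
a <- rnorm(20)
toy <- data.frame(a = a,
                  b = a + rnorm(20, sd = 0.3),  # moves together with a
                  c = rnorm(20))                # unrelated to a and b

# PCA "by hand": eigen-decomposition of the correlation matrix,
# i.e. of the covariance matrix of the standardized variables
eig <- eigen(cor(toy))
manual_loadings  <- eig$vectors  # one column per principal component
manual_variances <- eig$values   # variance captured by each component, largest first

# The same analysis with prcomp()
pr <- prcomp(toy, scale. = TRUE)

# Loadings agree up to an arbitrary sign flip of each column
print(all.equal(abs(manual_loadings), abs(unname(pr$rotation))))
# Component variances agree (prcomp stores standard deviations in sdev)
print(all.equal(manual_variances, pr$sdev^2))
# The new variables (component scores) are uncorrelated with each other
print(round(cor(pr$x), 6))
```

Because a and b carry almost the same information, the first component absorbs most of their shared variation, which is exactly the "keep only the most relevant parts" effect described above.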

Setting up the PCA model

The dataset is the same one as last time: European political parties with a set of variables that help define their ideology. This time, however, I added an extra variable: people vs. elite. The final variables are: vote (percentage of votes obtained in the last election), galtan (the party’s views on democratic freedoms and rights), position (attitude towards the EU), immigrate_policy (position on migration policy), and people_vs_elite (position on direct vs. representative democracy).

Before fitting the PCA model we have to check whether the variables are on comparable scales. PCA looks for the directions of largest variance, so a variable with a much larger variance than the others will dominate the first components. prcomp centers each variable (subtracts its mean) by default; if the variances differ a lot, the variables should also be standardized (transformed to mean zero and standard deviation one). In our case, vote had the largest variance and needed standardization. That is why scaling is switched on in the prcomp call below. The function we’re using is prcomp from the stats package.

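library(stats)  # prcomp() and biplot() live in stats, which R attaches by default

# Build the PCA model; pca_df holds the five numeric variables described above.
# scale. = TRUE standardizes every variable to mean 0 and standard deviation 1.
pr.out <- prcomp(pca_df, scale. = TRUE)

# Plot the result: parties (scores) in sky blue, variables (loadings) in orange
col <- c("SkyBlue", "Orange")
cex <- c(.6, .9)
biplot(pr.out, col = col, scale = 0, cex = cex, xlim = c(-3, 2), ylim = c(-2.5, 2))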
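The decision to standardize described above can be illustrated with a minimal sketch on hypothetical toy data: the 15 "parties" are random numbers, and the ranges (0 to 40 for vote, 0 to 10 for the two scales) are invented for illustration rather than taken from the real dataset. It compares the column variances, shows that an unscaled PCA lets the high-variance column dominate PC1, and checks that scale. = TRUE is equivalent to running prcomp on scale(toy).

```r
# Hypothetical toy data: 15 made-up parties with a vote share in percent next to two
# expert-style scales (values are random; the ranges are invented for illustration)
set.seed(42)
toy <- data.frame(vote     = runif(15, 0, 40),
                  galtan   = runif(15, 0, 10),
                  position = runif(15, 0, 10))

# Step 1: compare means and variances column by column
print(round(sapply(toy, mean), 2))
print(round(sapply(toy, var), 2))

# Step 2: without scaling, the high-variance column dominates PC1
unscaled <- prcomp(toy)  # centers only (center = TRUE is the default)
print(round(unscaled$rotation[, "PC1"], 3))

# Step 3: scale. = TRUE standardizes each column to mean 0 and sd 1 first,
# which gives the same components as running prcomp() on scale(toy)
scaled <- prcomp(toy, scale. = TRUE)
print(round(scaled$rotation[, "PC1"], 3))
print(all.equal(scaled$sdev, prcomp(scale(toy))$sdev))
```

After standardization every variable contributes one unit of variance, so the component variances of the scaled model add up to the number of variables.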

Results

Out of my five original variables, I obtained five principal components (PC1, PC2, PC3, PC4 and PC5), each with its own loading vector. PCA yields as many components as there are variables, as long as there are more observations (here, parties) than variables. In this case, PC1 and PC2 seem to explain a large part of the variation in the parties’ positions. That is why I chose them to plot the parties.
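How much of the variation each component explains can be read off the standard deviations that prcomp stores. Here is a minimal sketch on hypothetical toy data: 30 simulated "parties" with random columns that merely borrow the article’s variable names (the values, and the built-in correlation between galtan, position and immigrate_policy, are made up). It computes the proportion of variance explained (PVE) per component and its cumulative share, the usual basis for deciding how many components to plot.

```r
# Hypothetical toy data: 30 made-up "parties" and five random columns named after the
# article's variables (simulated values, not the real party data)
set.seed(7)
toy <- as.data.frame(matrix(rnorm(30 * 5), ncol = 5,
                            dimnames = list(NULL, c("vote", "galtan", "position",
                                                    "immigrate_policy", "people_vs_elite"))))
toy$position         <- toy$galtan + rnorm(30, sd = 0.5)  # build in some correlation
toy$immigrate_policy <- toy$galtan + rnorm(30, sd = 0.5)

pr <- prcomp(toy, scale. = TRUE)

# Five variables (and more parties than variables) give five components
print(ncol(pr$rotation))

# Proportion of variance explained (PVE) by each component, and the cumulative share
pve <- pr$sdev^2 / sum(pr$sdev^2)
print(round(pve, 3))
print(round(cumsum(pve), 3))

# summary() reports the same figures (rounded)
print(summary(pr))
```

With fewer parties than variables, at most n − 1 components (for n parties) would carry any variance, which is why the "one component per variable" rule needs enough observations.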

The biplot of PC1 and PC2 shows how strongly each variable influences a component. For example, PC2 seems to measure how much a party supports direct rather than representative democracy (note the large loading of people vs. elite along the y-axis, which corresponds to PC2), while PC1 seems to capture positions on migration policy, attitude towards the EU and views on democratic freedoms and rights (note the large loadings of immigrate policy, galtan and position along the x-axis, which corresponds to PC1).
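The arrows in a biplot are the loadings stored in the rotation matrix of a prcomp fit, so the reading above can also be done numerically. Here is a minimal sketch on hypothetical toy data built to mimic, in simplified form, the pattern described above: galtan, position and immigrate_policy are simulated to share one underlying factor, while people_vs_elite is generated independently (vote is left out to keep the example small). The values are random, not the article’s data.

```r
# Hypothetical toy data: 40 made-up parties; three columns share one simulated factor,
# people_vs_elite is generated independently of them
set.seed(3)
n <- 40
f1 <- rnorm(n)
toy <- data.frame(galtan           = f1 + rnorm(n, sd = 0.4),
                  position         = f1 + rnorm(n, sd = 0.4),
                  immigrate_policy = f1 + rnorm(n, sd = 0.4),
                  people_vs_elite  = rnorm(n))

pr <- prcomp(toy, scale. = TRUE)

# Loadings: the weight of each variable in PC1 and PC2 (the arrows of the biplot)
print(round(pr$rotation[, c("PC1", "PC2")], 2))

# Variables ranked by the absolute size of their loading on each component
for (pc in c("PC1", "PC2")) {
  ranking <- names(sort(abs(pr$rotation[, pc]), decreasing = TRUE))
  cat(pc, ":", ranking, "\n")
}
```

The sign of each component is arbitrary: flipping all loadings and scores of a component describes the same fit, so the same data can come out mirrored left-right or top-bottom. That is why the ranking uses absolute values.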

To start with, we see a clearly separated group of right-wing conservative parties such as PiS, Fidesz, FvD, PVV, EKRE and others on the far left side of the biplot. These parties hold a traditional/authoritarian position on democratic freedoms and rights and, at the same time, a strongly anti-EU position. However, they differ a lot in how they think power should be exercised. For example, while Fidesz and PiS (top left of the biplot) favor representative democracy, others like PVV and FvD (bottom left) are much more in favor of direct democracy.

On the other extreme (right side), we see the green parties, Potami, PIRAT, TR, DK and others: parties that are more pro-EU, more libertarian and less restrictive on migration. However, they also differ a lot on direct democracy. Between these two extremes, in the middle of the plot, we find all the Christian-Democratic parties and, interestingly, some socialist/communist parties: CDS-PP, CU, CDU, CDU-PCP, CSSD and some others.
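Which parties sit at each extreme can be read directly from the component scores (pr$x) instead of from the plot. Here is a minimal sketch on hypothetical toy data: twelve simulated parties with placeholder names ("Party A" to "Party L") and random values under the article’s variable names. It orders the parties along PC1 and then splits the left-hand group by the sign of PC2, mirroring the top-left vs. bottom-left comparison above.

```r
# Hypothetical toy data: 12 made-up parties (placeholder names) on four simulated variables
set.seed(5)
n <- 12
f1 <- rnorm(n)
toy <- data.frame(galtan           = f1 + rnorm(n, sd = 0.4),
                  position         = f1 + rnorm(n, sd = 0.4),
                  immigrate_policy = f1 + rnorm(n, sd = 0.4),
                  people_vs_elite  = rnorm(n),
                  row.names = paste("Party", LETTERS[1:n]))

pr <- prcomp(toy, scale. = TRUE)
scores <- pr$x[, c("PC1", "PC2")]  # coordinates of each party in the biplot

# Parties ordered from the left to the right of the biplot
ranked <- scores[order(scores[, "PC1"]), ]
print(round(ranked, 2))

# The three parties at each extreme of PC1
print(rownames(head(ranked, 3)))
print(rownames(tail(ranked, 3)))

# Within the left-hand group, split by the sign of PC2 (top vs. bottom of the plot)
left <- head(ranked, 3)
print(split(rownames(left), ifelse(left[, "PC2"] > 0, "top", "bottom")))
```

Since the signs of the components are arbitrary, "left" and "top" only acquire a political meaning after checking which way the loadings point.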

In general, the plot shows that political views, and the degrees to which parties hold them, are relatively well distributed across Europe. However, there are more parties with a pro-EU position than with an anti-EU position, and the pro-EU parties also tend to receive more votes. There are also more parties supporting direct democracy than representative democracy, although the difference is not that large.
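Because the sign of PC1 is arbitrary, a count like "more pro-EU than anti-EU parties, and more votes for them" is easiest to check on the raw position variable. Here is a minimal sketch on hypothetical toy data: 25 simulated parties, an invented 1 to 7 EU scale where higher means more pro-EU, and a cut-off of 4 that is purely an assumption for illustration.

```r
# Hypothetical toy data: 25 made-up parties with simulated vote shares (in %) and EU
# positions on an invented 1-7 scale (higher = more pro-EU, by assumption)
set.seed(9)
toy <- data.frame(vote = runif(25, 1, 35), position = runif(25, 1, 7))

midpoint <- 4  # hypothetical cut-off between anti-EU (below) and pro-EU (above)
stance <- ifelse(toy$position > midpoint, "pro-EU", "anti-EU")

print(table(stance))                             # number of parties on each side
print(round(tapply(toy$vote, stance, mean), 1))  # average vote share on each side
```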

Zooming in

The plot below zooms in on the most libertarian, pro-EU, pro-representative-democracy parties in Europe, which are less restrictive on migration and have relatively high vote shares.

The plot below zooms in on the most authoritarian, anti-EU, pro-direct-democracy parties in Europe, which favor more restrictive migration laws and have relatively low vote shares.
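A zoomed view like these can be produced by selecting the parties in one region of the score space and redrawing the biplot with axis limits taken from them. Here is a minimal sketch on hypothetical toy data (30 simulated parties with placeholder names); the choice of quadrant (PC1 > 0 and PC2 > 0) is arbitrary and only illustrates the mechanics, and the colors and text sizes are reused from the article’s biplot call.

```r
# Hypothetical toy data: 30 made-up parties (placeholder names) on four simulated variables
set.seed(21)
n <- 30
f1 <- rnorm(n)
toy <- data.frame(galtan           = f1 + rnorm(n, sd = 0.4),
                  position         = f1 + rnorm(n, sd = 0.4),
                  immigrate_policy = f1 + rnorm(n, sd = 0.4),
                  people_vs_elite  = rnorm(n),
                  row.names = paste("Party", seq_len(n)))

pr <- prcomp(toy, scale. = TRUE)
scores <- pr$x[, c("PC1", "PC2")]

# Select the parties in one quadrant of the biplot: right of PC1 = 0 and above PC2 = 0
in_box <- scores[, "PC1"] > 0 & scores[, "PC2"] > 0
zoom <- scores[in_box, , drop = FALSE]
print(rownames(zoom))

# Redraw the biplot with axis limits taken from the selected parties (with a little margin)
biplot(pr, col = c("SkyBlue", "Orange"), scale = 0, cex = c(.6, .9),
       xlim = range(zoom[, "PC1"]) * c(0.9, 1.1),
       ylim = range(zoom[, "PC2"]) * c(0.9, 1.1))
```

Which quadrant holds the libertarian, pro-EU parties depends on the signs of the fitted loadings, so those should be checked before choosing the region.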

Conclusion

Machine learning techniques such as PCA can be useful to identify parties’ positions on important topics on the agenda. In particular, PCA provides an excellent (and fairly intuitive) visualization of where parties stand on several issues at the same time. Nowadays, with the amount of information and the speed at which it arrives, it is hard for citizens and voters to identify where a party stands on multiple topics at once. And if it is already hard for voters to identify one party’s positions across several topics, it’s even harder for them to compare different parties on those positions.

Providing voters with charts and information like these would empower them. With these tools, voters could have more clarity about who they are voting for and demand more accountability from their representatives once in power!

Carlos Ahumada

Political Data Scientist @cosmonautsandkings | #NLP #Data Science for Communications and Politics | Berlin-based