Using Deep Learning to Predict Voting Outcomes in Europe

Carlos Ahumada
Published in Analytics Vidhya · Sep 16, 2019

Deep learning. Everybody in the data science world has been talking about it. We are aware of the predictive and classifying capabilities of its algorithms. We know how the infrastructure works. However, despite all this knowledge and how often we use it, we still lack a DEEP understanding of what is happening inside the black box. Could this be one of the reasons why we’re not using it yet in politics?

In this article, I’m going to explore deep learning for the social sciences by applying one of its algorithms to predict voting outcomes for European parties. To do so, I use the same dataset as in my past two articles: the 2017 Chapel Hill Expert FLASH Survey (CHES), built by researchers from the University of North Carolina. I will also touch upon the concept of deep learning and discuss its limitations and ethical issues. So once again, hands on!

Deep, deep learning

The concept of deep learning is fairly easy to grasp. In a very basic sense, deep learning is a machine learning technique in which algorithms analyse the data, automatically extract features from it without human intervention, and identify patterns to classify or predict outcomes. The secret behind deep learning is the algorithm’s capacity to learn from its own mistakes. Kind of creepy, isn’t it? Especially if we consider that even we, as human beings, sometimes struggle a lot to learn from our own mistakes! Anyway, I cannot explain the concept better than this video. Investing 20 minutes in it might be life changing. I promise. If you want to check some of the applications of deep learning in real life, check out this other video.
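To make the idea a little more concrete, here is a toy sketch in base R of what a single “dense” layer computes: a weighted sum of the inputs plus a bias, passed through an activation function. Training a deep network amounts to stacking many such layers and repeatedly adjusting the weights to reduce prediction error. The names and numbers here are purely illustrative; this snippet is not part of the model we build below.

#Toy forward pass for one dense layer: output = activation(W %*% x + b)
relu <- function(x) pmax(0, x)       #rectified linear activation
set.seed(42)
x <- c(0.5, -1.2, 3.0)               #three illustrative input features
W <- matrix(rnorm(2 * 3), nrow = 2)  #random weights for a 2-node layer
b <- c(0.1, -0.3)                    #biases, one per node
hidden <- relu(W %*% x + b)          #the layer's output
hidden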

The Model

If you read my last two articles, you are already familiar with the dataset I will use. If not, I recommend taking a look here and here. From the dataset I will use only the numeric variables that represent the parties’ positions on different topics and ideological stands. The outcome variable is the share of votes obtained by each party in its respective election (2014, 2015, 2016 or 2017). The variables were normalised so that the model can interpret them on a common scale.

Note: To set up the model I will use the Keras package for R. Since this package relies on Python, you might have to install Anaconda, especially if you are working on a PC.
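If you have never used Keras in R before, the one-time setup looks roughly like this; install_keras() downloads TensorFlow and its Python dependencies for you, though the exact steps can vary by system:

#One-time setup for Keras in R
install.packages("keras")
library(keras)
install_keras()  #installs TensorFlow and its Python dependencies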

In the code below I select only the numeric variables from the dataset, randomly split the data into train and test sets, and normalise the variables. Note that the algorithm will ‘train’ on the train set, and then we will test its accuracy on ‘unseen’ data, i.e. the ‘test set’.

#Packages list
library(data.table)
library(pastecs)
library(dplyr)
library(magrittr)
library(Hmisc)
library(ggrepel)
library(keras)
#New data set - keeping only the numeric variables ('eu' is the CHES data frame from the previous articles)
dl_df <- eu[ , -c(1,2,3,4,5,8,9,10)]
#Splitting the dataset into train and test sets (80/20)
set.seed(123) #for a reproducible split
train_index <- sample(1:nrow(dl_df), 0.8 * nrow(dl_df))
test_index <- setdiff(1:nrow(dl_df), train_index)
#Build X_train, y_train, X_test, y_test (column 1 holds the vote share)
X_train <- as.matrix(dl_df[train_index, -1])
y_train <- as.matrix(dl_df[train_index, 1])
X_test <- as.matrix(dl_df[test_index, -1])
y_test <- as.matrix(dl_df[test_index, 1])
#Normalising with the train-set statistics only, to avoid leaking test information
mean <- apply(X_train, 2, mean)
std <- apply(X_train, 2, sd)
X_train <- scale(X_train, center = mean, scale = std)
X_test <- scale(X_test, center = mean, scale = std)
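Before training, a quick optional sanity check on the prepared matrices can save headaches later; after centring with the train-set means, the column means of X_train should sit close to zero:

#Optional sanity check on the prepared matrices
dim(X_train)                #rows = parties in the train set, columns = features
round(colMeans(X_train), 2) #should be approximately zero after centring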

For the construction of the model, I am going to use dense layers of 64 nodes each, plus a final one-node layer whose activation is set to linear. In this way, the final output will be a prediction of the share of votes. If you watched the video posted at the beginning of the article, these concepts should now be familiar to you.

#Construction of the model
model <- keras_model_sequential()
model %>%
  layer_dense(units = 64, activation = 'relu', kernel_initializer = 'RandomNormal', input_shape = dim(X_train)[2]) %>%
  layer_dense(units = 64, activation = 'relu') %>%
  layer_dense(units = 1, activation = 'linear') #one linear node: the predicted vote share
summary(model)
#Compiling the model (RMSprop, a gradient-descent variant; MSE loss, MAE as metric)
model %>% compile(
  loss = 'mse',
  optimizer = 'rmsprop',
  metrics = c('mae')
)
#Fitting the model, holding out 20% of the train set for validation
history <- model %>% fit(
  X_train, y_train,
  epochs = 150, batch_size = 25,
  validation_split = 0.2
)
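Keras stores the per-epoch training and validation metrics in the history object, which can be plotted directly:

#Visualising training vs. validation loss and MAE per epoch
plot(history)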

The plot shows that the model has a mean absolute error of around 5 on the validation set, which is a held-out part of the train set. It is worth noticing that after epoch 20 (20 rounds of training), the model starts to overfit. Now let’s evaluate the model on the test set to assess its real accuracy.

Validation

#Testing accuracy on the test set ('%<-%' is the multi-assignment operator re-exported by keras)
c(loss, mae) %<-% (model %>% evaluate(X_test, y_test, verbose = 0))

The model presents a mean absolute error on the test set of 5.58. However, given the small size of the sample, we end up with a small validation set. In this scenario, the validation scores tend to change depending on which observations are taken for training and which for validation. Therefore, it is important to do a K-fold cross-validation. This process consists of splitting the available data into K partitions. If you are not familiar with K-fold cross-validation, take a look at this gentle introduction.

###Setting up the k-fold cross-validation
k <- 4
set.seed(123)
indices <- sample(1:nrow(X_train)) #shuffle the rows before assigning folds
folds <- cut(1:length(indices), breaks = k, labels = FALSE)
num_epochs <- 100
all_scores <- c()
for (i in 1:k) {
  cat("processing fold #", i, "\n")
  #Rows of the shuffled index falling into fold i become the validation set
  val_indices <- indices[which(folds == i)]
  val_data <- X_train[val_indices, ]
  val_targets <- y_train[val_indices]

  partial_train_data <- X_train[-val_indices, ]
  partial_train_targets <- y_train[-val_indices]

  #Re-build the model from scratch so each fold starts from fresh weights
  model <- keras_model_sequential() %>%
    layer_dense(units = 64, activation = 'relu', kernel_initializer = 'RandomNormal', input_shape = dim(X_train)[2]) %>%
    layer_dense(units = 64, activation = 'relu') %>%
    layer_dense(units = 1, activation = 'linear') %>%
    compile(loss = 'mse', optimizer = 'rmsprop', metrics = c('mae'))

  model %>% fit(partial_train_data, partial_train_targets,
                epochs = num_epochs, batch_size = 1, verbose = 0)
  results <- model %>% evaluate(val_data, val_targets, verbose = 0)
  all_scores <- c(all_scores, results$mean_absolute_error)
}
mean(all_scores)

With K-fold cross-validation, the mean absolute error drops to 4.87. This means that with our model and the particular data we used to train it, we could predict the share of votes a party would obtain and be off, on average, by 4.87 percentage points. Although this number might seem low, a difference of almost 5 points at the ballot box might be decisive.

Conclusions

The application of deep learning techniques to the social sciences is still very new, which opens the door for exploration and testing. In this project, it was shown that deep learning can be useful for predicting voting outcomes for European parties based on their positions on sensitive topics such as migration, pro/anti-EU stances, budget, economic policy, and others. Even though the mean absolute error was relatively small, a 5% change in the voting share might define an election.

Do keep in mind that this model has important limitations. For example, it was trained on data from different years and different countries. Therefore, if we wanted to predict the share of votes in the next election in a particular country using only this model, the accuracy would drop, since political events and economic circumstances constantly evolve. In any case, it is a good example of how deep learning might be helpful for political science. I personally do not doubt that in the near future this kind of model will improve with more and better data.

Ethical considerations

If models like this evolve and get better, political parties might be able to calculate which political stands would yield more votes. Thus, parties might be tempted to modify their stands to perform better at the ballot box, regardless of what their core voters are asking of them. Although parties can be regarded as rational entities whose survival and power depend on the number of votes, this should not dictate their core values. Minorities could eventually become even less represented, and ideological opposition among parties might become diluted. This effect would be bad for democracies, especially at a moment in which their very foundations are already being constantly challenged.
