Understanding Stemwijzer

Unraveling the Dutch Voting Compass with Data Science.

Published in Ordina Data · 5 min read · Jan 7, 2024

On November 22nd, 2023, the elections for the Dutch House of Representatives were held. Like many voters, I used Stemwijzer as a starting point. This blog discusses how I used Data Science methods to improve my understanding of this tool. Before getting into that, however, some context:

Stemwijzer includes 25 political parties (new and incumbent).
Stemwijzer presents 30 statements. Users respond with agree, disagree or neither and each user gets the same statements in the same order.
Statements reflect important election themes (e.g. "The government should take action to reduce the cattle herd by 50%").
The user’s answers are compared with each political party’s position on each statement. The final output is an aggregated agreement score per party on a scale of 0–100. In the example shown, the VVD matches best. Hence, based on these statements, it would be in the user’s best interest to vote for this party.
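To make the matching step concrete, here is a minimal sketch of how such a score could be computed. Stemwijzer's exact scoring formula is not covered in this post, so the rule below (the percentage of statements on which user and party give the same answer) is an assumption, and the function name `agreement_score` is hypothetical.

```python
from typing import Sequence

AGREE, NEITHER, DISAGREE = 1, 0, -1


def agreement_score(user: Sequence[int], party: Sequence[int]) -> float:
    """Percentage (0-100) of statements on which user and party answer the same.

    Assumption: every statement counts equally and only exact matches score.
    """
    if len(user) != len(party):
        raise ValueError("user and party must answer the same statements")
    matches = sum(u == p for u, p in zip(user, party))
    return 100 * matches / len(user)


# Toy example with four statements instead of thirty.
user = [AGREE, DISAGREE, NEITHER, AGREE]
party = [AGREE, AGREE, NEITHER, DISAGREE]
print(agreement_score(user, party))  # 2 of 4 answers match -> 50.0
```

Under this assumed rule, the party shown at the top of the result list is simply the one with the most matching answers.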

With 205.9 trillion unique possibilities (3³⁰), collecting every possible way of filling in Stemwijzer is infeasible. Therefore I generated Stemwijzer scenarios at random:

import random


def _create_scenario():
    """Generates a random Stemwijzer scenario.

    Options are limited to agree (1) and disagree (-1);
    the "neither" answer is never used.
    """
    opts = [1, -1]
    # One random answer for each of the 30 statements.
    scenario = [random.choice(opts) for _ in range(30)]
    return scenario
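As a quick check of the numbers involved, the sketch below computes the size of the full answer space (three options per statement) and of the reduced space that the generator above samples from (two options per statement). The sample size of 9000 is the number of scenarios collected for this post; the variable names are my own.

```python
N_STATEMENTS = 30

full_space = 3 ** N_STATEMENTS    # agree / disagree / neither
binary_space = 2 ** N_STATEMENTS  # agree / disagree only
sample_size = 9000                # scenarios collected for this post

print(f"{full_space:,}")    # 205,891,132,094,649 (about 205.9 trillion)
print(f"{binary_space:,}")  # 1,073,741,824
# Share of the reduced space covered by the sample (well below 0.001%).
print(f"{sample_size / binary_space:.6%}")
```

Even after dropping the neither option, the sample covers only a tiny fraction of all possible scenarios.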

To interact with Stemwijzer programmatically I used Selenium, an open-source automation tool. The code below finds and clicks a button:

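import time

from selenium.webdriver.common.by import By

# `driver` is an active Selenium WebDriver; `opt` is one answer from the scenario.
# Buttons are located by their Dutch aria-labels: "Oneens" (disagree), "Eens" (agree).
if opt == -1:
    driver.find_elements(By.CSS_SELECTOR, "[aria-label=Oneens]")[0].click()
    time.sleep(1)  # short pause before the next action
elif opt == 1:
    driver.find_elements(By.CSS_SELECTOR, "[aria-label=Eens]")[0].click()
    time.sleep(1)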

Using the above, my data collection process was as follows:

1. Generate a random scenario.
2. Open a browser window, navigate to Stemwijzer and send the scenario options to the corresponding statements.
3. Store the final output and repeat the steps above.
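The loop described above could look roughly like the sketch below. It is not the original collection script: the browser step is passed in as `submit_scenario`, a hypothetical callable that would wrap the Selenium code and return a dictionary of party names and agreement scores. The CSV layout and column names (`s1` to `s30`) are likewise my own assumptions.

```python
import csv
import random
from typing import Callable, Dict, List


def create_scenario(n_statements: int = 30) -> List[int]:
    """Random scenario limited to agree (1) and disagree (-1)."""
    return [random.choice([1, -1]) for _ in range(n_statements)]


def collect(
    n_runs: int,
    submit_scenario: Callable[[List[int]], Dict[str, float]],
    out_path: str,
) -> None:
    """Generate scenarios, submit each one and write answers plus scores to CSV.

    `submit_scenario` is a hypothetical callable that drives the browser and
    returns {party name: agreement score} for the given scenario.
    """
    with open(out_path, "w", newline="") as f:
        writer = None
        for _ in range(n_runs):
            scenario = create_scenario()
            scores = submit_scenario(scenario)
            # One column per statement answer, followed by one column per party.
            row = {f"s{i + 1}": opt for i, opt in enumerate(scenario)}
            row.update(scores)
            if writer is None:
                # The header is derived from the first row.
                writer = csv.DictWriter(f, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
```

Keeping the browser interaction behind a single callable makes it possible to try out the loop with a stand-in function before pointing it at the real website.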

I ran the data collection script for a few days and collected 9000 scenarios. Being completely random, these scenarios do not represent realistic user behavior. They can, however, be used to analyze Stemwijzer’s mechanics.

The first thing I wanted to analyze is the agreement score distribution. Stemwijzer is politically neutral, so this should be similar across parties.

Distribution of agreement scores per party

The violin plot above shows similar distributions, with medians of 50 for most parties. The distributions for JA21, ChristenUnie, BBB and Volt are shifted lower. In Stemwijzer these parties hold a neither stance relatively often. Since my scenarios excluded this option, this difference stems from my data collection method and not from Stemwijzer itself.
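The effect of the neither stance can be worked out with a small calculation. The sketch below assumes that the score is the percentage of exactly matching answers, which is my simplification and not Stemwijzer's documented formula. A random agree/disagree answer matches a party's agree/disagree stance half of the time and never matches a neither stance. The value of 6 neutral stances is a made-up illustration, not a count taken from Stemwijzer.

```python
from fractions import Fraction


def expected_score(n_statements: int, n_neither: int) -> Fraction:
    """Expected agreement score (0-100) of a uniformly random agree/disagree
    scenario against a party that answers 'neither' on `n_neither` statements.

    Assumption: score = percentage of exactly matching answers.
    """
    committed = n_statements - n_neither       # statements with agree/disagree
    expected_matches = Fraction(committed, 2)  # each one matches with p = 1/2
    return 100 * expected_matches / n_statements


print(float(expected_score(30, 0)))  # 50.0
print(float(expected_score(30, 6)))  # 40.0
```

Under these assumptions every neither stance lowers a party's expected score, which is consistent with the lower medians seen for parties that answer neither relatively often.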

The density plot below confirms that most parties deviate only slightly from one another. PVV and LEF, for example, are shifted toward the lower end of the scale. I attribute this to my data collection process and to my sample, which covers only a fraction of all possibilities.

Secondly, I wanted to examine inter-party correlations. Logically, I expect positive correlations between parties with similar opinions and negative correlations between parties with opposing opinions. The correlation matrix below shows this to be true. For example, the PVV correlates positively with other right-wing parties (BVNL, FVD, 50Plus) and negatively with Volt (a progressive, left-wing party).
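The link between similar stances and correlated scores can be illustrated with synthetic data. In the sketch below, parties A, B and C are invented: B differs from A on 5 statements and C takes the opposite stance on every statement. The score is again assumed to be the percentage of matching answers, and the Pearson correlation is written out by hand to keep the example dependency-free.

```python
import math
import random
from typing import Sequence

N_STATEMENTS = 30


def agreement_score(user: Sequence[int], party: Sequence[int]) -> float:
    """Assumed scoring rule: percentage of exactly matching answers."""
    matches = sum(u == p for u, p in zip(user, party))
    return 100 * matches / len(user)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


rng = random.Random(42)
# Synthetic party stances, not real party positions.
party_a = [rng.choice([1, -1]) for _ in range(N_STATEMENTS)]
party_b = [-s if i < 5 else s for i, s in enumerate(party_a)]  # differs on 5
party_c = [-s for s in party_a]                                # opposite on all

scenarios = [
    [rng.choice([1, -1]) for _ in range(N_STATEMENTS)] for _ in range(1000)
]
scores_a = [agreement_score(s, party_a) for s in scenarios]
scores_b = [agreement_score(s, party_b) for s in scenarios]
scores_c = [agreement_score(s, party_c) for s in scenarios]

r_ab = pearson(scores_a, scores_b)  # positive: mostly shared stances
r_ac = pearson(scores_a, scores_c)  # close to -1: fully opposed stances
print(round(r_ab, 2), round(r_ac, 2))
```

The more stances two parties share, the more their scores move together across random scenarios, which is the pattern the correlation matrix reflects.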

Thirdly, I wanted to identify the most important statements for the PVV, since it won the elections. This analysis was structured as follows:

1. Generate a dataset with one independent variable per statement and one binary dependent variable (1 if the PVV has the highest score, 0 otherwise).
2. The dependent variable is imbalanced because each party wins roughly 4% of the scenarios. To counter this, I used the RandomOverSampler from imbalanced-learn. I then trained a Random Forest model, tuning its hyperparameters with Optuna.
3. These steps worked well, yielding a model with F1-scores of 0.48 (minority class) and 0.97 (majority class) on an unseen validation set.
4. Compute and display the feature importance plot.
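The first two steps can be sketched without any third-party library. The code below is not the original pipeline: `random_oversample` is a hand-written stand-in that mimics the basic idea behind imbalanced-learn's RandomOverSampler (duplicating randomly chosen minority rows until the classes are balanced), and `has_highest_score` is a hypothetical helper for building the target. Counting ties as a win is my assumption, and the rows are synthetic.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def has_highest_score(scores: Dict[str, float], party: str = "PVV") -> int:
    """Target variable: 1 if `party` has the highest score, else 0.

    Assumption: a tie for first place counts as 1.
    """
    return int(scores[party] == max(scores.values()))


def random_oversample(
    X: Sequence[List[int]], y: Sequence[int], seed: int = 0
) -> Tuple[List[List[int]], List[int]]:
    """Duplicate randomly chosen rows of smaller classes until every class
    is as large as the largest one (stand-in for RandomOverSampler)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        extra = [rng.choice(pool) for _ in range(target - count)]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Synthetic data: 100 scenarios, of which 4 are "wins" for the party.
rng = random.Random(1)
X = [[rng.choice([1, -1]) for _ in range(30)] for _ in range(100)]
y = [1] * 4 + [0] * 96

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 96 rows
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the validation set used for the F1-scores.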

Most important Statements for the PVV in Stemwijzer

Statements 18, 9, 23 and 8 are most important for an outcome in favor of the PVV:

Statement 18: If you are entitled to benefits from the government and you live together, you should receive the same amount as when you live alone.

Most parties (16) agree, PVV is one of 9 parties to disagree. In my data sample, 79% of the 452 scenarios in which PVV received the highest score disagreed with this statement.

Statement 9: The government should allocate more funds to schools for lessons in art and culture.

Most parties (14) agree, PVV is one of 11 parties to disagree.

Statement 23: The government should make it easier to build residential neighborhoods on agricultural land.

PVV is one of 9 parties to disagree.

Statement 8: There should be a law stating that the Netherlands always allocates 2% of its gross domestic product to defense.

Most parties (16) agree, PVV is one of 9 parties to disagree.

I could not find a pattern in these statements. It is clear, however, that for each of the 4 statements the PVV is part of the minority.

In conclusion, my findings suggest that Stemwijzer is fair and unbiased: the distribution of the agreement scores is similar across parties. Secondly, correlations between parties in Stemwijzer’s agreement scores align with their closeness on the political spectrum. Thirdly, I found that the statements on which a party holds a minority position are the most important ones for that party to be recommended by Stemwijzer.

Written by Maarten Majoor