Simple Regression Analysis: Portugal 2019 Election Results

Shraddha Anala
Published in Analytics Vidhya
3 min read · May 18, 2020

I’m expanding with more posts on ML concepts + tutorials over at my blog!

Finally, my random dataset generator (link) output a regression task after several classification problems in a row. This time the dataset is the Real-time Election Results, and the task is to predict how many MPs were elected at the district/national level in the 2019 Portuguese parliamentary elections.

Photo by Joakim Honkasalo on Unsplash

Acknowledgements for the Dataset:

Nuno Moniz (2019) Real-time 2019 Portuguese Parliament Election Results Dataset. arXiv

About the Dataset:

The dataset contains 28 attributes covering territory, vote and voter information, along with timestamps for when each record was captured.

The data describes how the election results evolved over time. The final column, ‘FinalMandates’, is the number of MPs elected and is the target variable we have to predict.

Tutorial:

As usual, the first steps involve cleaning the data. Two of the feature variables, Territory and Party (names), are categorical, so we will be encoding them. But first, some exploratory data analysis to gain insights.

Distribution of Votes according to Territories

As can be seen, the overwhelming majority of the votes are recorded under ‘Territorio Nacional’ (National Territory). When I tried googling the term to see whether it referred to a district or a prefecture, though, I came up short.

Anyway, here’s how you can plot this distribution in seaborn.
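A minimal sketch of such a plot, assuming the data sits in a pandas DataFrame. The column names `territoryName` and `Votes` and the handful of sample rows are made up for illustration; with the real CSV you would load the file with `pd.read_csv` and use its actual column names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample rows standing in for the real dataset (assumed column names).
df = pd.DataFrame({
    "territoryName": ["Territorio Nacional", "Lisboa", "Porto",
                      "Territorio Nacional", "Lisboa", "Porto"],
    "Votes": [5000, 1200, 900, 4000, 800, 700],
})

# Total votes recorded per territory.
votes_by_territory = df.groupby("territoryName", as_index=False)["Votes"].sum()

# One bar per territory.
fig, ax = plt.subplots(figsize=(8, 4))
sns.barplot(data=votes_by_territory, x="territoryName", y="Votes", ax=ax)
ax.set_xlabel("Territory")
ax.set_ylabel("Total votes")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
fig.savefig("votes_by_territory.png")
```

In a notebook you can drop the `matplotlib.use("Agg")` line and the `savefig` call and let the figure render inline.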

Now moving on to encoding:
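One possible way to encode the two categorical columns is scikit-learn's `OrdinalEncoder`, which maps each category to an integer code. This is a sketch under assumptions rather than the exact approach used here: the column names and sample rows are hypothetical, and one-hot encoding would be an equally reasonable choice.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up sample rows; column names are assumptions for illustration.
df = pd.DataFrame({
    "territoryName": ["Territorio Nacional", "Lisboa", "Porto", "Lisboa"],
    "Party": ["PS", "PSD", "BE", "PS"],
    "Votes": [5000, 1200, 900, 800],
})

categorical_cols = ["territoryName", "Party"]

# Each distinct category gets an integer code (categories are sorted first).
encoder = OrdinalEncoder()
codes = encoder.fit_transform(df[categorical_cols])

# Replace the text columns with their integer codes.
encoded = df.drop(columns=categorical_cols).assign(
    territory_code=codes[:, 0].astype(int),
    party_code=codes[:, 1].astype(int),
)
print(encoded)
```

Integer codes imply an ordering that does not really exist between parties or territories. Tree-based models usually tolerate this, but for linear models one-hot encoding tends to be the safer option.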

After preprocessing, pretty much all that’s left to do is split the data into training/testing subsets and build the model, as I’ve done below.

Then we evaluate the model’s performance. Since this is a regression model, we will be calculating the Mean Squared Error, R2 Score and the Explained Variance Score.

The Mean Squared Error and the R2 Score both describe how close the model’s predictions are to the observed values: the MSE is the average squared error (lower is better), and the R2 Score is the share of the target’s variance that the predictions account for (1.0 is a perfect fit). The Explained Variance Score is a similar measure of the model’s ability to explain the variability in the observations, except that it ignores any constant bias in the predictions.
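A minimal sketch of the split-and-fit step with scikit-learn. The feature matrix and target here are randomly generated stand-ins, not the election data, and the 80/20 split and `random_state` values are assumptions chosen for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 200 rows, 3 numeric features, made-up target.
rng = np.random.default_rng(0)
X = rng.integers(0, 1000, size=(200, 3)).astype(float)
y = (X[:, 0] // 100).astype(float)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit a decision tree regressor and predict on the held-out rows.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

With the real data, `X` would be the encoded feature columns and `y` the ‘FinalMandates’ column.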
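To make the three metrics concrete, the sketch below computes them with scikit-learn on a tiny made-up pair of true and predicted values, and then again by hand with NumPy. The numbers are purely illustrative and are not results from the election model.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score

# Made-up true and predicted values, for illustration only.
y_true = np.array([3.0, 0.0, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
evs = explained_variance_score(y_true, y_pred)

# The same quantities from their definitions.
residuals = y_true - y_pred
mse_manual = np.mean(residuals ** 2)
# R2: 1 minus (residual sum of squares / total sum of squares).
r2_manual = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)
# Explained variance: 1 minus (variance of residuals / variance of target).
evs_manual = 1 - np.var(residuals) / np.var(y_true)

print(f"MSE: {mse:.4f}  R2: {r2:.4f}  Explained variance: {evs:.4f}")
```

The R2 Score and the Explained Variance Score differ only when the residuals have a non-zero mean, that is, when the predictions are systematically too high or too low.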

Here are the metrics of the Decision Tree Regressor:

[Figures: Explained Variance Score, Mean Squared Error and R2 Score of the model]

That’s it. Having a regression task was a much-needed change of pace, and it’s pretty interesting to look at regression analysis and see how model performance is evaluated with different metrics.

Please leave any suggestions, questions or requests for further clarification down below, and I’ll see you next week.

Thank you for reading and Happy Machine Learning!
