Blog Post 3
This week we performed an additional statistical analysis, ran a first set of machine learning experiments, and refined our conception of populism to reflect particular electoral preferences. We ran two types of machine learning experiments on our data: regression and SVM classification. We used regression to learn a function that best fits the scores of our labelled data, and SVM to learn a binary classifier that determines whether a respondent is populist or non-populist.
We fit a linear regression to the data using the LinearRegression class from sklearn. From this basic model we obtained the sample variance, the coefficients, and the R² statistic. Satisfaction with democracy in one’s own country was the largest negative predictor of populist sentiment, and valuing good behavior in a child over consideration for others was the largest positive predictor throughout the population. The variance of our results was low, which we take as a sign that the model fits well. Next, we will run backward stepwise regression from our full model and compute the Cp statistic for each sub-model, selecting the sub-model that best balances the variance and the bias of the estimators. We then plan to compare this Cp-selected linear sub-model with the most effective model found by 10-fold cross-validation of an epsilon-Support Vector Regression (SVR), to ultimately identify the explanatory covariates that best predict populist electoral preferences.
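The selection procedure described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not our survey data: the helper names (sse, mallows_cp, backward_stepwise), the greedy drop-one-covariate rule, and the SVR settings are assumptions made for the example. Cp is computed as SSE_p / sigma² - n + 2p, with sigma² estimated from the full model and p counting the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR


def sse(X, y):
    """Residual sum of squares of an ordinary least squares fit with intercept."""
    model = LinearRegression().fit(X, y)
    resid = y - model.predict(X)
    return float(resid @ resid)


def mallows_cp(X_full, y, cols):
    """Mallows' Cp for the sub-model that keeps only the columns in `cols`."""
    n, k = X_full.shape
    sigma2 = sse(X_full, y) / (n - k - 1)  # error variance from the full model
    p = len(cols) + 1                      # parameters, including the intercept
    return sse(X_full[:, cols], y) / sigma2 - n + 2 * p


def backward_stepwise(X_full, y):
    """Greedy backward elimination; returns the visited sub-model with lowest Cp."""
    cols = list(range(X_full.shape[1]))
    best_cols, best_cp = cols[:], mallows_cp(X_full, y, cols)
    while len(cols) > 1:
        # Try removing each remaining covariate and keep the best removal.
        candidates = [[c for c in cols if c != d] for d in cols]
        cols = min(candidates, key=lambda cs: mallows_cp(X_full, y, cs))
        cp = mallows_cp(X_full, y, cols)
        if cp < best_cp:
            best_cols, best_cp = cols[:], cp
    return best_cols, best_cp


# Synthetic stand-in for the survey: only covariates 0 and 1 drive the score.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

full = LinearRegression().fit(X, y)
print("coefficients:", full.coef_, "R^2:", full.score(X, y))

cp_full = mallows_cp(X, y, [0, 1, 2, 3])
best_cols, best_cp = backward_stepwise(X, y)
print("selected covariates:", best_cols, "Cp:", best_cp)

# 10-fold cross-validated R^2: Cp-selected linear model vs. epsilon-SVR.
lin_r2 = cross_val_score(LinearRegression(), X[:, best_cols], y, cv=10, scoring="r2")
svr_r2 = cross_val_score(SVR(kernel="rbf", epsilon=0.1), X, y, cv=10, scoring="r2")
print("linear:", lin_r2.mean(), "SVR:", svr_r2.mean())
```

For the full model, Cp equals the number of parameters by construction, so the statistic is only informative when comparing sub-models. Choosing the covariates on all of the data and then cross-validating on the same data is slightly optimistic; a stricter comparison would repeat the selection inside each fold.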
For SVM, we preprocessed our data into the correct format and adapted the code from the Machine Learning Lab to train the classifier. Our initial results are unsatisfactory: the mean training accuracy is only about 61.3%. After inspecting the confusion matrix, we found that the classifier labels every respondent as 1 (“populist”) and none as 0 (“non-populist”), so the 61.3% figure simply reflects the share of populist labels in the training data. We believe this is because the default settings do not work well for our data. To address this, we downloaded Optunity, a Python library containing various optimizers for hyperparameter tuning. However, because the experiments are quite time-consuming, we have not yet found a satisfactory set of parameters. We hope to fix this by next week.
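One way to approach the tuning problem is sketched below. It is not our lab code and it does not use Optunity: it swaps in scikit-learn's GridSearchCV, standardizes the features, weights the classes, and scores by balanced accuracy so that predicting a single class is no longer rewarded. The synthetic data (generated with roughly 61% of labels equal to 1), the parameter grid, and the variable names are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the survey data: about 61% of labels are 1 ("populist").
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.39, 0.61], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Scale features, then fit an RBF-kernel SVM with class weights that are
# inversely proportional to class frequency.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))

# Small illustrative grid over the two main RBF hyperparameters.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="balanced_accuracy")
search.fit(X_train, y_train)

# Rows are true labels (0, 1); columns are predicted labels (0, 1).
cm = confusion_matrix(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print(cm)
```

A first column of zeros in the confusion matrix would signal the same failure we saw, with every respondent labelled 1. A grid this small runs quickly; a finer grid or a randomized search trades running time for coverage.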
A significant challenge posed by this project has been the classification of the abstract concept of “populism”. We have found two ways to quantify this philosophical idea. The first uses an individual’s personal preferences and general opinions towards the government. In this classification method we broke populism down into negative sentiments towards the government, a feeling that one’s interests are not being represented, liberal opinions regarding cultural conformity, and generally individualist sentiments. The second method looks at the political party affiliations and voting history of the respondent. This metric classifies parties in selected political groupings (Europe of Freedom and Direct Democracy, and Europe of Nations and Freedom) as populist due to their marked Euroscepticism, xenophobia, and nationalist sentiment, while parties in the grouping Alliance of Liberals and Democrats for Europe are considered anti-populist due to their strong pro-Europeanism and their moderate (in some cases, technocratic) political attitudes in line with aligned establishments. We have so far run our machine learning on the latter of these two scoring methods. Our hope, however, is that the two metrics will be reconcilable. So far our results point to an easy reconciliation: our linear regression using party data has identified the expected personal preference questions as major determinants of populism. Furthermore, we will compare our metric to one that measures populist social attitudes (similar to Altemeyer’s Right-Wing Authoritarianism Scale) to evaluate whether a marked discrepancy exists between stated preferences and actual electoral selections.
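The party-based scoring rule can be written down in a few lines. The sketch below is illustrative only: the grouping abbreviations, the field name party_group, and the choice to leave respondents from other groupings unlabelled are assumptions, not a description of our actual preprocessing.

```python
# Groupings treated as populist and anti-populist under the party-based metric.
POPULIST_GROUPS = {"EFDD", "ENF"}   # Europe of Freedom and Direct Democracy; Europe of Nations and Freedom
ANTI_POPULIST_GROUPS = {"ALDE"}     # Alliance of Liberals and Democrats for Europe


def label_party_group(group):
    """Return 1 (populist), 0 (anti-populist), or None if the grouping is unscored."""
    if group in POPULIST_GROUPS:
        return 1
    if group in ANTI_POPULIST_GROUPS:
        return 0
    return None  # assumption: respondents from other groupings stay unlabelled


# Hypothetical respondents, each tagged with the grouping of the party they voted for.
respondents = [
    {"id": 1, "party_group": "EFDD"},
    {"id": 2, "party_group": "ALDE"},
    {"id": 3, "party_group": "ENF"},
    {"id": 4, "party_group": "OTHER"},
]

labels = {r["id"]: label_party_group(r["party_group"]) for r in respondents}
labelled = {rid: lab for rid, lab in labels.items() if lab is not None}
print(labelled)
```

Keeping unscored respondents out of the training set, rather than forcing them into one class, avoids adding label noise to the classifier.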