The Data Science Behind Malta’s Election Social Activity Scoreboard

Learn how AI can find patterns within social data to extract political content from social media and classify posts by political party preference.

Mauro Pirrone
Minely
5 min read · May 27, 2017

Minely has worked with ICON to develop a Social Activity Scoreboard for Malta’s 2017 Election. In this project one can see a number of charts ranking the sentiment of the two parties — PN vs PL.

A large and diverse dataset is used to chart the fluctuating trends in this election. With the help of Artificial Intelligence, we have mined almost 2,000,000 Facebook likes and almost 30,000 posts from across 13 news publishers. This has allowed us to find patterns within social data and extract sentiment preference for Malta’s two largest political parties. The approach is coupled with a noise filter that excludes non-political stories from our sample.

We analysed the PN vs PL trend in terms of active users — users interacting with political posts through the like button. Data was analysed for 137,603 unique users.

A similar chart shows the same data by month. PN has clearly gained users over the past months, and the two parties are getting closer.

Monthly Unique Active Users

We also analysed the distribution of PN vs PL sympathies in user activity across the major local portals.

Finally, there is a list of posts published by different media, along with the relative success of each post based on user engagement.

But how does it work?

Minely has adopted an AI-powered approach to solve a number of challenges.

(1) How can we intelligently filter political posts from non-political posts?
(2) How can we determine if a post on independent media is in favor of PN or PL?

Step 1: Build a user list of PL / PN followers

Through Minely’s Social Media connectors we extract all the users that have liked posts from the following Facebook pages during the last 100 days.

Step 2: Not all users are the same

We assign each user a score. The score is equal to the number of likes made by that user during the last 100 days on PL-related or PN-related posts.

  • user_score = total_likes
User scoring with PL/PN score
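The user scoring rule can be sketched in a few lines of Python. This is only an illustration and not Minely's implementation: the record layout `(user_id, party, timestamp)`, the function name `user_scores`, and the choice to keep a separate score per party are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical like records: (user_id, party of the liked page, time of the like).
likes = [
    ("u1", "PL", datetime(2017, 5, 20)),
    ("u1", "PL", datetime(2017, 5, 21)),
    ("u2", "PN", datetime(2017, 5, 22)),
    ("u1", "PL", datetime(2017, 1, 1)),  # older than 100 days, ignored
    ("u3", "PN", datetime(2017, 4, 30)),
    ("u3", "PN", datetime(2017, 5, 1)),
    ("u3", "PN", datetime(2017, 5, 2)),
]


def user_scores(likes, now, window_days=100):
    """Count likes per (user, party) inside the time window: user_score = total_likes."""
    cutoff = now - timedelta(days=window_days)
    return Counter((user, party) for user, party, ts in likes if ts >= cutoff)


scores = user_scores(likes, now=datetime(2017, 5, 27))
for (user, party), score in sorted(scores.items()):
    print(user, party, score)
```

A user who likes many party posts ends up with a high score, so their later likes on other posts carry more weight than those of an occasional visitor.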

Step 3: Rank each post

At this step we rank each post:

  • PL_likes = count of all likes from PL-related users
  • PL_score = sum of all user scores from PL-related users
  • PN_likes = count of all likes from PN-related users
  • PN_score = sum of all user scores from PN-related users
Post scoring before normalisation
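As a rough sketch of the four quantities above, assume each party has a dictionary mapping its related users to their user score. The names `score_post`, `pl_user_scores` and `pn_user_scores` are hypothetical and chosen only for this example.

```python
# Hypothetical user scores from the previous step, one dictionary per party.
pl_user_scores = {"u1": 12, "u2": 3}
pn_user_scores = {"u3": 7, "u4": 1}


def score_post(likers, pl_user_scores, pn_user_scores):
    """Compute like counts and summed user scores for one post."""
    pl = [pl_user_scores[u] for u in likers if u in pl_user_scores]
    pn = [pn_user_scores[u] for u in likers if u in pn_user_scores]
    return {
        "PL_likes": len(pl),  # count of likes from PL-related users
        "PL_score": sum(pl),  # sum of those users' scores
        "PN_likes": len(pn),  # count of likes from PN-related users
        "PN_score": sum(pn),  # sum of those users' scores
    }


# "u9" liked the post but is not in either party list, so it is not counted.
post = score_post(["u1", "u2", "u3", "u9"], pl_user_scores, pn_user_scores)
print(post)
```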

All scores are normalised to account for the average PL_score and PN_score of each media channel. Furthermore, we use a ratio in which the normalised score is divided by the number of likes.

  • PL_ratio = normalised_PL_score / PL_likes
  • PN_ratio = normalised_PN_score / PN_likes
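The article does not spell out the normalisation formula, so the sketch below makes an explicit assumption: a post's score is divided by the average score of its media channel, and the result is then divided by the number of likes. The function name `add_ratios`, the field names, and the choice to return 0.0 when a divisor is zero are all illustrative.

```python
from statistics import mean

# Hypothetical scored posts from two media channels.
posts = [
    {"channel": "A", "PL_score": 30, "PL_likes": 3, "PN_score": 10, "PN_likes": 5},
    {"channel": "A", "PL_score": 10, "PL_likes": 2, "PN_score": 30, "PN_likes": 6},
    {"channel": "B", "PL_score": 8, "PL_likes": 4, "PN_score": 0, "PN_likes": 0},
]


def add_ratios(posts):
    """Return copies of the posts with PL_ratio and PN_ratio added."""
    parties = ("PL", "PN")
    channels = {p["channel"] for p in posts}
    # Average score per channel and party (assumed normalisation baseline).
    averages = {
        c: {
            party: mean(p[f"{party}_score"] for p in posts if p["channel"] == c)
            for party in parties
        }
        for c in channels
    }
    result = []
    for p in posts:
        row = dict(p)
        for party in parties:
            channel_avg = averages[p["channel"]][party]
            normalised = p[f"{party}_score"] / channel_avg if channel_avg else 0.0
            likes = p[f"{party}_likes"]
            # ratio = normalised score / number of likes, guarding against zero likes
            row[f"{party}_ratio"] = normalised / likes if likes else 0.0
        result.append(row)
    return result


rated = add_ratios(posts)
for row in rated:
    print(row["channel"], row["PL_ratio"], row["PN_ratio"])
```

Dividing by the channel average keeps a portal with a strongly partisan audience from dominating the comparison, which appears to be the intent of the normalisation described above.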

Step 4: k-means clustering

At this step we apply the k-means clustering algorithm. This machine learning algorithm aims to partition n observations (posts) into k clusters (here, k = 3) such that each observation belongs to the cluster with the nearest mean, which serves as the prototype of that cluster. The clusters in our case are the following:

  1. PL-related posts
  2. PN-related posts
  3. Other — non-political, or non-relevant neutral political posts
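To make the clustering step concrete, here is a minimal pure-Python k-means sketch. It assumes each post is represented by the pair (PL_ratio, PN_ratio), which the article does not state explicitly, and it uses made-up data points. The deterministic farthest-point seeding is a choice made here for reproducibility; it is not necessarily what the Minely platform does.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(points, k):
    """Farthest-point seeding: start at the first point, then keep adding
    the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k=3, max_iterations=100):
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical (PL_ratio, PN_ratio) pairs for nine posts.
points = [
    (0.9, 0.1), (0.8, 0.05), (0.95, 0.0),   # liked mostly by PL-related users
    (0.1, 0.9), (0.05, 0.85), (0.0, 0.95),  # liked mostly by PN-related users
    (0.1, 0.1), (0.05, 0.0), (0.0, 0.05),   # little party-related engagement
]
labels, centroids = kmeans(points, k=3)
print(labels)
```

k-means only returns numbered clusters. Naming them as PL, PN or other is a separate step, for example by checking which ratio dominates each centroid.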

The machine learning pipeline is built on the Minely platform, with no coding required. Minely’s visual flow designer enables users to build intelligent applications easily.

Minely’s Visual Flow Designer for building machine learning pipelines.

The following is a visualisation of the clusters, after eliminating posts with fewer than 25 reactions.

k-means results

Accuracy

To test accuracy, we manually analysed a sample dataset of 500 posts. The noise filter, which separates political posts from non-political posts using clustering, is 86.6% accurate. Once a post is classified as political, the PN/PL classification is 99.8% accurate. We exclude any posts with fewer than 25 reactions: when the number of reactions is too low, the ratios become unreliable, which causes poor performance.
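The evaluation can be sketched as two small helpers: one that drops posts below the reaction threshold, and one that compares predicted labels with manually assigned ones. The field name `reactions`, the label strings and the sample values are hypothetical; only the threshold of 25 comes from the article.

```python
MIN_REACTIONS = 25  # threshold stated in the article


def filter_posts(posts, min_reactions=MIN_REACTIONS):
    """Keep only posts with at least `min_reactions` reactions."""
    return [p for p in posts if p["reactions"] >= min_reactions]


def accuracy(predicted, actual):
    """Fraction of predictions that match the manual labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("cannot compute accuracy of an empty sample")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(predicted)


# Hypothetical posts with a model prediction and a manual label each.
posts = [
    {"reactions": 120, "predicted": "PL", "manual": "PL"},
    {"reactions": 24, "predicted": "PN", "manual": "other"},  # dropped: too few reactions
    {"reactions": 25, "predicted": "PN", "manual": "PN"},
    {"reactions": 40, "predicted": "other", "manual": "PL"},
    {"reactions": 300, "predicted": "PL", "manual": "PL"},
]

kept = filter_posts(posts)
score = accuracy([p["predicted"] for p in kept], [p["manual"] for p in kept])
print(len(kept), score)
```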

Alternative methods

Alternative methods could have been used for this project, such as natural language processing applied to the content of each post and the comments made on it. This would involve inferring sentiment by analysing the text. Because NLP tools for Maltese are limited, and analysing only English posts was not an option, this approach was excluded.
