The Data Science Behind Malta’s Election Social Activity Scoreboard
Learn how AI can find patterns within social data to intelligently extract political content from social media and to classify posts by political party preference.
We use a large and diverse dataset to chart the fluctuating trends in this election. With the help of Artificial Intelligence, we have mined almost 2,000,000 Facebook likes and almost 30,000 posts from 13 news publishers. This has allowed us to find patterns within social data and extract the preference for either of Malta’s two largest political parties. The approach is coupled with a noise filter that excludes non-political stories from our sample.
We analysed the PN vs PL trend in terms of active users — users interacting with political posts through the like button. Data was analysed for 137,603 unique users.
A similar chart shows the data by month. We can clearly see how PN has gained users over the past months and how the two parties are getting closer.
We also analysed the distribution of PN vs PL sympathies in user activity across the major local portals.
Finally, there’s a list of posts published by different media outlets, with the relative success of each post based on user engagement.
But how does it work?
Minely has adopted an AI-powered approach to solve a number of challenges.
(1) How can we intelligently filter political posts from non-political posts? (2) How can we determine if a post is in favor of PN or PL on independent media?
Step 1: Build a user list of PL / PN followers
Through Minely’s Social Media connectors we extract all the users that have liked posts from the following Facebook pages during the last 100 days.
- Joseph Muscat (PL)
- One News (PL)
- Partit Laburista (PL)
- Simon Busuttil (PN)
- Net News (PN)
- Partit Nazzjonalista (PN)
Step 2: Not all users are the same
We assign each user a score. The score is equal to the number of likes made by that user during the last 100 days on PL-related or PN-related posts.
- user_score = total_likes
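Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not Minely's implementation: the `(user_id, page_name)` input format, the `PAGE_PARTY` mapping, the `user_scores` helper, and the choice to keep a separate score per party for a user who liked pages on both sides are all assumptions.

```python
from collections import Counter

# Party affiliation of the six seed pages listed in Step 1.
PAGE_PARTY = {
    "Joseph Muscat": "PL",
    "One News": "PL",
    "Partit Laburista": "PL",
    "Simon Busuttil": "PN",
    "Net News": "PN",
    "Partit Nazzjonalista": "PN",
}

def user_scores(likes):
    """Return {party: Counter({user_id: total_likes})}.

    `likes` is an iterable of (user_id, page_name) pairs, one per like made
    in the last 100 days (hypothetical input format).
    """
    scores = {"PL": Counter(), "PN": Counter()}
    for user_id, page in likes:
        party = PAGE_PARTY.get(page)
        if party is not None:  # ignore likes on pages outside the seed list
            scores[party][user_id] += 1
    return scores

# Made-up like records, for illustration only.
sample_likes = [
    ("u1", "One News"),
    ("u1", "Joseph Muscat"),
    ("u2", "Net News"),
    ("u3", "Some Other Page"),  # not a seed page, so it is ignored
]
scores = user_scores(sample_likes)
print(scores["PL"]["u1"])  # 2
print(scores["PN"]["u2"])  # 1
```

A user who likes more party-related posts gets a higher score, so their later likes on independent media carry more weight in Step 3.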
Step 3: Rank each post
At this step we rank each post:
- PL_likes = count of all likes from PL-related users
- PL_score = sum of all user scores from PL-related users
- PN_likes = count of all likes from PN-related users
- PN_score = sum of all user scores from PN-related users
All scores are normalised to take into consideration the average PL_score and PN_score for each media channel. Furthermore, we use a ratio in which the normalised score is divided by the number of likes.
- PL_ratio = normalised_PL_score / PL_likes
- PN_ratio = normalised_PN_score / PN_likes
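The ranking above could look roughly like the following sketch. The article does not spell out the normalisation formula, so dividing by the channel's average score is an assumption here, as are the `rank_post` helper, its parameters, and the choice to return a ratio of 0.0 when a post has no likes from one side.

```python
def rank_post(likers, pl_users, pn_users, avg_pl_score, avg_pn_score):
    """Compute the Step 3 features for a single post.

    likers: user ids that liked the post
    pl_users / pn_users: {user_id: user_score} from Step 2
    avg_pl_score / avg_pn_score: average raw PL_score / PN_score of posts on
        the same media channel (assumed normalisation: divide by this average)
    """
    pl_likes = sum(1 for u in likers if u in pl_users)
    pn_likes = sum(1 for u in likers if u in pn_users)
    pl_score = sum(pl_users[u] for u in likers if u in pl_users)
    pn_score = sum(pn_users[u] for u in likers if u in pn_users)

    # Normalise against the channel average (assumption, see lead-in).
    norm_pl = pl_score / avg_pl_score if avg_pl_score else 0.0
    norm_pn = pn_score / avg_pn_score if avg_pn_score else 0.0

    return {
        "PL_likes": pl_likes,
        "PL_score": pl_score,
        "PN_likes": pn_likes,
        "PN_score": pn_score,
        "PL_ratio": norm_pl / pl_likes if pl_likes else 0.0,
        "PN_ratio": norm_pn / pn_likes if pn_likes else 0.0,
    }

# Made-up user scores and likers, for illustration only.
pl_users = {"u1": 4, "u2": 2}
pn_users = {"u3": 5}
features = rank_post(["u1", "u2", "u3", "u9"], pl_users, pn_users,
                     avg_pl_score=3.0, avg_pn_score=10.0)
print(features["PL_ratio"], features["PN_ratio"])  # 1.0 0.5
```

Dividing by the like count turns the score into an average weight per like, so a post is not rated as strongly partisan simply because it drew many reactions.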
Step 4: k-means clustering
At this step we apply the k-means clustering algorithm. This machine learning algorithm aims to partition n observations (posts) into k clusters (here, k = 3), where each observation belongs to the cluster with the nearest mean, which serves as the prototype of that cluster. The clusters in our case are the following:
- PL-related posts
- PN-related posts
- Other — non-political, or non-relevant neutral political posts
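For readers who want to see the mechanics, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is not the Minely flow itself: using `(PL_ratio, PN_ratio)` as the two features, seeding the three centroids by hand, and naming each cluster after its seed are all assumptions made for illustration.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm).

    points: list of (PL_ratio, PN_ratio) tuples, one per post
    centroids: initial cluster centres; their count sets k
    Returns (labels, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each post goes to the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Made-up feature values, for illustration only.
points = [(0.9, 0.1), (1.0, 0.2), (0.1, 0.9), (0.2, 1.0), (0.1, 0.1), (0.2, 0.15)]
seeds = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]  # PL-leaning, PN-leaning, other
names = ["PL-related", "PN-related", "Other"]

labels, centres = kmeans(points, seeds)
print([names[lab] for lab in labels])
```

With these seeds, posts with a high PL ratio and a low PN ratio fall into the first cluster, the mirror case into the second, and posts where both ratios are low into "Other".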
The machine learning pipeline was built using the Minely platform, with no coding required. Minely’s visual flow designer enables users to build intelligent applications easily.
The following is the visualisation of the clusters, after eliminating posts with fewer than 25 reactions.
To test accuracy, we manually analysed a sample dataset of 500 posts. The method for filtering the noise (that is, separating political posts from non-political posts using clustering) is 86.6% accurate. Once a post is classified as political, the classification of PN/PL is 99.8% accurate. We exclude any posts with fewer than 25 reactions: when the number of reactions is too low, the ratios are distorted, which causes poor performance.
Alternative methods could have been used for this project, such as natural language processing applied to the content of a post and the comments made on it. This involves reading sentiment by analysing text. Since Maltese NLP tools are limited and analysing English posts only was not an option, this approach was excluded.
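The reaction threshold and the accuracy check can be expressed as the short sketch below. The `filter_posts` and `accuracy` helpers, the `reactions` field name, and the sample labels are hypothetical; only the threshold of 25 comes from the article.

```python
MIN_REACTIONS = 25  # threshold stated in the article

def filter_posts(posts, min_reactions=MIN_REACTIONS):
    """Keep only posts with at least `min_reactions` reactions."""
    return [p for p in posts if p["reactions"] >= min_reactions]

def accuracy(predicted, actual):
    """Share of posts whose predicted label matches the manual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty sample")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Made-up posts and labels, for illustration only.
posts = [{"id": 1, "reactions": 24}, {"id": 2, "reactions": 25}, {"id": 3, "reactions": 310}]
kept = filter_posts(posts)
print([p["id"] for p in kept])  # [2, 3]

predicted = ["PL", "PN", "Other", "PL"]
manual = ["PL", "PN", "PL", "PL"]
print(accuracy(predicted, manual))  # 0.75
```

On the 500-post sample, an accuracy of 86.6% corresponds to 433 posts whose political or non-political label matched the manual review.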