Predicting Eurovision Ranks Through Social Media Analytics
How to use social media data and apply sentiment analysis using Minely
Minely used social media analytics to predict the songs that made it to the top 6 in Eurovision 2017. The following are the predictions made by Minely.
And the actual results…
Minely collected data from YouTube and Twitter to determine the sentiment towards each song. The difference between positive sentiment and negative sentiment is the score used for predicting the leaderboard.
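To make the scoring concrete, here is a minimal sketch in Python. It assumes each mention has already been labelled as positive, negative or neutral by a sentiment classifier, and that "positive minus negative" is a simple count difference; the function name and the input shape are illustrative and are not part of Minely.

```python
def rank_by_net_sentiment(labelled_mentions):
    """Rank entries by (positive mentions - negative mentions).

    labelled_mentions: iterable of (entry, label) pairs, where label is
    'positive', 'negative' or anything else (treated as neutral).
    Returns a list of (entry, score) sorted from highest to lowest score.
    """
    scores = {}
    for entry, label in labelled_mentions:
        scores.setdefault(entry, 0)
        if label == "positive":
            scores[entry] += 1
        elif label == "negative":
            scores[entry] -= 1
    # Sort by score descending, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical entries "A", "B" and "C" stand in for countries.
mentions = [
    ("A", "positive"), ("A", "positive"),
    ("B", "negative"),
    ("C", "positive"), ("C", "negative"),
]
leaderboard = rank_by_net_sentiment(mentions)
```

With this input, "A" ranks first with a score of 2, "C" follows with 0 and "B" is last with -1.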
Building this logic in Minely involves the following steps.
Step 1: Connect to the data sources
Minely has built-in connectors for several social networks. In this example, we used the Twitter and YouTube connectors. Creating a connection is straightforward: all it takes is to enter the necessary access credentials.
We also created a connection to MongoDB, the data source used for saving the results of the analysis.
Step 2: Upload data files
The following CSV/TSV files were uploaded.
- countries.csv — list of countries in the world containing country code, name, etc
- semi-final-1.tsv — list of participants for semi-final 1
- semi-final-2.tsv — list of participants for semi-final 2
- finals.tsv — list of finalists
Step 3: Define a stream to pull tweets by Eurovision hashtags
We created a Twitter stream that filters tweets by @Eurovision, #Eurovision and #ESC2017.
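Outside Minely, the same filter can be approximated with a few lines of Python. This is only a sketch of the idea, not how the Twitter connector works internally: it assumes case-insensitive matching on whole mentions and hashtags, and the names `TRACK_TERMS` and `matches_track_terms` are made up for illustration.

```python
import re

# Terms taken from the stream definition above, lower-cased for comparison.
TRACK_TERMS = ("@eurovision", "#eurovision", "#esc2017")

# A mention or hashtag: '@' or '#' followed by word characters.
_TAG_PATTERN = re.compile(r"[@#]\w+")


def matches_track_terms(text, terms=TRACK_TERMS):
    """Return True if the tweet text contains any tracked mention or hashtag."""
    tags = {tag.lower() for tag in _TAG_PATTERN.findall(text)}
    return any(term in tags for term in terms)
```

Because whole tags are compared, a tweet containing only `#Eurovision2017` would not match; whether that is desirable depends on how broad the stream should be.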
Step 4: Extract participants
Here we extract participants by country name, artist and song title. We downloaded the list of participants from Wikipedia as a CSV file and transformed the data into a form that is easy to blend with the streaming analytics.
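One way to picture this transformation is as a lookup table from every country, artist and song title to its country. The sketch below assumes a tab-separated file with `country`, `artist` and `song` columns; the actual column names in the uploaded files are not given here, and the helper names are hypothetical.

```python
import csv
import io


def build_participant_index(tsv_text):
    """Map lower-cased country, artist and song names to the country."""
    index = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        country = row["country"].strip()
        for field in ("country", "artist", "song"):
            index[row[field].strip().lower()] = country
    return index


def find_countries(text, index):
    """Return the sorted countries whose country, artist or song appears in text."""
    lowered = text.lower()
    return sorted({country for key, country in index.items() if key in lowered})


# Two rows in the assumed layout, for illustration only.
SAMPLE_TSV = (
    "country\tartist\tsong\n"
    "Portugal\tSalvador Sobral\tAmar pelos dois\n"
    "Netherlands\tO'G3NE\tLights and Shadows\n"
)
index = build_participant_index(SAMPLE_TSV)
```

Plain substring matching like this is deliberately naive; short song titles or artist names could match unrelated text, so a real pipeline would need stricter tokenisation.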
Step 5: Pull data from YouTube in a scheduled workflow
YouTube only supports reading data in a batch manner. We created a workflow that pulls data from the Eurovision YouTube channel and saves the results in MongoDB. After downloading the data, we created a second workflow to match each YouTube video with its respective country or artist.
Matching was done on single words as well as on 2-grams (pairs of adjacent words), so that phrases such as “San Marino” could be matched. We also used the Levenshtein distance to match phrases in a fuzzy manner and cater for situations where spellings differ slightly, such as OG3NE versus O’G3NE.
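The matching described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than Minely's implementation: the tokenisation rule, the function names and the maximum edit distance of 1 are all choices made for the example.

```python
import re


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def candidate_phrases(title):
    """Single words plus 2-grams (adjacent word pairs) from a title."""
    words = re.findall(r"[\w']+", title.lower())
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


def match_names(title, names, max_distance=1):
    """Return the names that fuzzily match a word or 2-gram of the title."""
    phrases = candidate_phrases(title)
    return [name for name in names
            if any(levenshtein(phrase, name.lower()) <= max_distance
                   for phrase in phrases)]


names = ["O'G3NE", "San Marino"]
video_matches = match_names("OG3NE - Lights and Shadows (Netherlands)", names)
```

Here `video_matches` contains only "O'G3NE", because "og3ne" is one edit away from "o'g3ne". A fixed threshold of 1 is too loose for very short names, so scaling the allowed distance with the name length may be a safer choice.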
Step 6: Blend data using streaming data from Twitter
Here we process the data in a streaming manner and apply sentiment analysis to English tweets only. The stream processes micro-batches at one-minute intervals.
You may notice how the tweet rate spikes as soon as Eurovision started at 21:00, reaching around 30 tweets per second.
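The one-minute micro-batching of English tweets can be pictured with the sketch below. It assumes each tweet is a dictionary with `text`, `lang` and `created_at` (a `datetime`) keys, which is an illustrative shape and not the Twitter connector's actual schema; the timestamps are sample values.

```python
from collections import defaultdict
from datetime import datetime


def batch_by_minute(tweets):
    """Group the text of English tweets into one-minute windows.

    Returns a dict mapping the start of each minute to the tweet texts
    that fall inside that minute, in arrival order.
    """
    batches = defaultdict(list)
    for tweet in tweets:
        if tweet["lang"] != "en":
            continue  # sentiment is applied to English tweets only
        window_start = tweet["created_at"].replace(second=0, microsecond=0)
        batches[window_start].append(tweet["text"])
    return dict(batches)


sample = [
    {"text": "a", "lang": "en", "created_at": datetime(2017, 5, 13, 21, 0, 5)},
    {"text": "x", "lang": "es", "created_at": datetime(2017, 5, 13, 21, 0, 10)},
    {"text": "b", "lang": "en", "created_at": datetime(2017, 5, 13, 21, 0, 40)},
    {"text": "c", "lang": "en", "created_at": datetime(2017, 5, 13, 21, 1, 2)},
]
batches = batch_by_minute(sample)
```

Each batch could then be passed to the sentiment step as a unit, which is the usual reason for micro-batching a stream instead of scoring tweets one at a time.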
Step 7: Visualisation of results
The results have been visualised using a dashboard.
The final dashboard was then made publicly accessible and embedded on our website at https://www.minely.com/eurovision-2017/