Predicting Eurovision Ranks Through Social Media Analytics

How to use social media data and apply sentiment analysis using Minely

Published in

Minely

May 15, 2017

Minely used social media analytics to predict the songs that made it to the top 6 in Eurovision 2017. These were Minely's predictions.

And the actual results…

Minely collected data from YouTube and Twitter to estimate the sentiment towards each song. Each song's score is the difference between its positive and negative sentiment, and these scores are used to predict the leaderboard.
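To make the scoring concrete, here is a minimal Python sketch of "positive minus negative" ranking. It is not Minely's implementation: the function names, the label strings and the placeholder songs are all assumptions made for illustration, and it assumes each mention has already been labelled by a sentiment classifier.

```python
def net_sentiment_scores(labelled_mentions):
    """Return {song: positive mentions - negative mentions}.

    `labelled_mentions` is an iterable of (song, sentiment) pairs, where
    sentiment is "positive", "negative" or anything else (treated as neutral).
    The label strings are an assumption of this sketch.
    """
    scores = {}
    for song, sentiment in labelled_mentions:
        scores.setdefault(song, 0)  # neutral mentions still register the song
        if sentiment == "positive":
            scores[song] += 1
        elif sentiment == "negative":
            scores[song] -= 1
    return scores


def leaderboard(scores):
    """Order songs by descending score; ties are broken alphabetically."""
    return sorted(scores, key=lambda song: (-scores[song], song))


# Placeholder data, not real Eurovision results.
mentions = [
    ("Song A", "positive"), ("Song A", "positive"),
    ("Song A", "positive"), ("Song A", "negative"),
    ("Song B", "positive"), ("Song B", "neutral"),
    ("Song C", "negative"),
]
scores = net_sentiment_scores(mentions)
ranking = leaderboard(scores)
```

Counting mentions is only one way to aggregate sentiment; a classifier that returns confidence values could sum those instead.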

Building this logic in Minely involves the following steps.

Step 1: Connect to the data sources

Minely has built-in connectors for several social networks. In this example, we used the Twitter and YouTube connectors. Creating a connection is easy: all it takes is entering the necessary access credentials.

We also created a connection to MongoDB, the data source used for saving the results of the analysis.

Step 2: Upload data files

The following CSV/TSV files were uploaded.

countries.csv: list of the world's countries, containing country code, name, etc.
semi-final-1.tsv: list of participants for semi-final 1
semi-final-2.tsv: list of participants for semi-final 2
finals.tsv: list of finalists

Sample data file containing list of participants.

Step 3: Define a stream to pull tweets by Eurovision hashtags

We created a Twitter stream that filters tweets by the @Eurovision mention and the #Eurovision and #ESC2017 hashtags.
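In Minely the filter is configured on the stream itself. As a rough illustration of what the filter does, the Python sketch below checks whether a tweet's text contains any of the three tracked terms. The names `TRACKED_TERMS` and `matches_eurovision` are hypothetical, and the whole-token, case-insensitive matching rule is an assumption of this sketch rather than a description of how Twitter or Minely matches terms.

```python
import re

# The three terms named in the article, lower-cased for comparison.
TRACKED_TERMS = frozenset({"@eurovision", "#eurovision", "#esc2017"})

# A mention or hashtag: "@" or "#" followed by word characters.
TOKEN_RE = re.compile(r"[@#]\w+")


def matches_eurovision(text):
    """Return True if the text contains a tracked mention or hashtag."""
    tokens = {token.lower() for token in TOKEN_RE.findall(text)}
    return not tokens.isdisjoint(TRACKED_TERMS)


sample = [
    "Loving the show tonight #ESC2017!",
    "Thanks @Eurovision for a great night",
    "Watching football instead",
    "Join our #EurovisionParty",
]
kept = [text for text in sample if matches_eurovision(text)]
```

Because whole tokens are compared, a longer hashtag such as #EurovisionParty is not treated as a match for #Eurovision.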

Step 4: Extract participants

Here we extract participants by country name, artist and name of song. We downloaded the list of participants from Wikipedia as a CSV file and transformed the data so that it is easy to blend with the streaming analytics.
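One way to picture the transformed participant data is as a lookup table from every country, artist and song name to the entry it belongs to. The Python sketch below builds such a table from a tab-separated file. The column names (`country`, `artist`, `song`) and the function name `build_lookup` are assumptions, since the article does not describe the file layout, and the two rows are only illustrative.

```python
import csv
import io

# Illustrative rows; the real files' columns are not described in the article.
SAMPLE_TSV = (
    "country\tartist\tsong\n"
    "Portugal\tSalvador Sobral\tAmar pelos dois\n"
    "Netherlands\tO'G3NE\tLights and Shadows\n"
)


def build_lookup(tsv_file):
    """Map each lower-cased country, artist and song name to its country."""
    lookup = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        for field in ("country", "artist", "song"):
            lookup[row[field].lower()] = row["country"]
    return lookup


lookup = build_lookup(io.StringIO(SAMPLE_TSV))
```

With this table, a term found in a tweet (for example `lookup["salvador sobral"]`) resolves to a single participating country, so mentions of the artist, the song and the country can be counted together.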

Step 5: Pull data from YouTube in a scheduled workflow

YouTube only supports reading data in batches. We created a workflow that pulls data from the Eurovision YouTube channel and saves the results in MongoDB. After downloading the data, we created another workflow to match each YouTube video with its respective country or artist.

Tweet filtering with Levenshtein distance.

Matching was done on single words as well as on 2-grams (pairs of adjacent words), which handles phrases such as “San Marino”. We also used the Levenshtein distance to match phrases in a fuzzy manner, for cases such as matching OG3NE with O’G3NE.
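The matching described above can be sketched in a few lines of Python. This is a hedged illustration, not Minely's code: the function names, the tokenisation rule and the threshold `max_distance=1` are all assumptions chosen for the example.

```python
import re


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def candidates(text):
    """Lower-cased single words plus 2-grams of adjacent words."""
    words = re.findall(r"[\w'’]+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


def match_participants(text, names, max_distance=1):
    """Return the names that lie within max_distance edits of a candidate."""
    found = set()
    for candidate in candidates(text):
        for name in names:
            if levenshtein(candidate, name.lower()) <= max_distance:
                found.add(name)
    return found


names = ["O'G3NE", "San Marino"]
fuzzy = match_participants("Go OG3NE tonight!", names)
phrase = match_participants("Voting for San Marino", names)
```

Here OG3NE is one edit away from O'G3NE (the missing apostrophe), so it matches, while "San Marino" is only found because the 2-gram "san marino" is among the candidates. A fixed threshold of one edit is too loose for very short names, so a real pipeline would likely scale it with name length.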