Analyzing EuroVision votes

for fun and no profit

David Montag

Follow

Published in

The Eurovision Song Contest

5 min readMay 28, 2015

--

In case you missed it, Sweden won this year’s EuroVision Song Contest, the annual pan-European music competition. The voting data happens to be freely available, and of course this data is a graph. Let’s use Neo4j, a graph database, for analyzing it! The pattern is simple:

(:Country {name})-[:VOTED_FOR {points}]->(:Country {name})

The voting data is available as an Excel spreadsheet. It needs some light massaging before being imported into Neo4j — mainly clearing out cells with whitespace. You’ll then end up with this file containing rows like so:

Albania,Serbia,7,5,7,4,3,6,14,9,2
Albania,Norway,13,24,14,18,14,17,21,19,
Albania,Sweden,5,8,5,3,1,5,5,4,7

By looking at the last column we can see that Albania gave Serbia 2 points, Norway 0 points, and Sweden 7 points. The other numbers are Albania’s jury voting results that eventually produced the final score, but we’ll ignore these here.

We can now import the data using two simple Cypher queries:

CREATE CONSTRAINT ON (c:Country) ASSERT c.name IS UNIQUE

Followed by:

LOAD CSV WITH HEADERS FROM "file:///Users/david/Downloads/eurovision.csv" AS line
WITH line, CASE line.Points WHEN null THEN [] ELSE [1] END AS voted
MERGE (from:Country {name:line.`From country`})
MERGE (to:Country {name:line.`To country`})
FOREACH (x in voted | 
  MERGE (from)-[:VOTED_FOR {points: toInt(line.Points)}]->(to) )
RETURN *

If the line has no points (votes), skip the relationship between the countries.

Sweden’s votes: MATCH (c:Country {name:”Sweden”})-[r:VOTED_FOR]->(x) RETURN c, r, x

We can now ask questions of this data using Cypher. On the left you can see a screenshot of Sweden’s votes for other countries. Each country gives 1, 2, 3, 4, 5, 6, 7, 8, 10, and 12 points to a total of 10 countries.

This dataset makes for some interesting statistical queries we can ask. Most of the queries will be graph-global, but it’s also possible to ask many of these questions locally about a specific node or pair of nodes.

What does the top list look like?

Sorting votes garnered per country by total points.

MATCH (c:Country)<-[r:VOTED_FOR]-()
RETURN c.name, sum(r.points) AS score
ORDER BY score DESC

Who did people feel the most conflicted about?

We’ll examine the standard deviation and average of each country’s received votes, and order them by decreasing standard deviation.

MATCH (c:Country)<-[r:VOTED_FOR]-()
RETURN c.name, stdev(r.points) AS stdev, avg(r.points) AS avg
ORDER BY stdev DESC

Indeed, conflicted feelings usually produce a “meh” average.

Conversely, who did people feel consistent about?

Let’s use the previous example again, but instead sort by increasing score standard deviation.

MATCH (c:Country)<-[r:VOTED_FOR]-()
RETURN c.name, stdev(r.points) AS stdev, avg(r.points) AS avg
ORDER BY stdev

In other words, people felt that the UK’s contribution was consistently shit this year. (But not so shit as to not receive a vote, so at least they’ve got that going. Looking at you, Germany…)

What country got the most 12's? (max points per country vote)

Instead of summing up the points, let’s just count the number of 12-point votes for each country.

MATCH (c:Country)<-[:VOTED_FOR {points: 12}]-()
RETURN c.name, count(c) AS twelves
ORDER BY twelves DESC

What country ranks high in getting 12's but not reciprocating at all?

Some countries garnered top votes (12), but didn’t reciprocate with any points to their biggest fans. Let’s look at the 12-point votes, and keep only the ones where there was no vote back. Then we’ll sum up who is the biggest diva.

MATCH (a)-[:VOTED_FOR {points: 12}]->(b)
WHERE NOT (b)-[:VOTED_FOR]->(a)
RETURN b.name, count(b) AS count
ORDER BY count DESC

What countries are like-minded in their overlapping votes?

For each country a and b we’ll examine their votes for countries they both voted for. We’ll sum up the difference between their votes for each such pair of countries a and b. If we get a small difference, that means they voted very similarly, and thus disagree very little. The inverse also applies.

MATCH (a)-[av:VOTED_FOR]->(x)<-[bv:VOTED_FOR]-(b)
RETURN a, b, sum(abs(av.points-bv.points)) AS disagreement
ORDER BY disagreement

Note the symmetrical results. We can add an ID comparison predicate [id(a) < id(b)] if we want to eliminate these.

Indeed, Australia and Belgium voted extremely similarly with regard to their overlapping votes.

What does the graph of 12's look like?

Let’s retrieve all 12-point votes and let the UI stitch them all together.

MATCH (a)-[v:VOTED_FOR {points: 12}]->(b)
RETURN a, v, b

Fun observation — three factions form: Eastern Europe, Southern Europe, and Western/Northern Europe.

Now let’s compute a crude centrality measure over this graph! Don’t be scared by the more advanced query — this is typically something done algorithmically with Neo4j’s Java API, but Cypher can do it too with some smart tricks.

MATCH p=(a)-[:VOTED_FOR* {points: 12}]-(b)
WITH a, b, p, length(p) AS len
ORDER BY len
WITH a, b, collect(p) AS ps
WITH ps[0] AS p
UNWIND nodes(p) AS x
RETURN x.name, count(x) AS weight
ORDER BY weight DESC

We can now clearly see the hubs in the graph.

Hope you enjoyed these examples! I encourage you to pick a dataset you’re familiar with (the same if you want), model it as a graph, and see what you can do with it!

Other pieces:

Analyzing EuroVision votes

for fun and no profit

Written by David Montag