What's behind the Eurovision voting

Analysis of Eurovision data prior to the 2019 final, using Python

Jaime Durán
yottabytes
7 min read · May 18, 2019

--

“Kate Miller-Heidke (Australia 2019)” by Martin Fjellanger, Eurovision Norway, EuroVisionary

This article is also available in Spanish.

You're watching the Eurovision Song Contest final tonight (or not). The performances end, and the most fun part arrives: the "douze points" moment. In Spain we had a commentator (José Luis Uribarri) who always tried to guess the highest scores given by every country, based on past years ("… and the 12 points will surely go to Russia!"). How is that possible? How predictable is it all? Is Eurovision a politicized circus? Do countries vote for their neighbours as much as they seem to? How much weight does emigration have in the televoting? We'll try to answer some of these questions throughout this article.

Preamble

The task won’t be easy for several reasons:

  • The participating countries have changed over the years due to conflicts, separations, invitations, membership of the European Broadcasting Union, money, etc.
  • The current scoring system (12–10–8–7–6–5–4–3–2–1) has been used since 1975.
  • At first, the scores given by each country were decided solely by a jury. Televoting was introduced in some countries in 1997, and was then tested and applied unevenly. Since 2016, the jury vote and the public vote have been counted separately in all countries (something very recent, so we have few samples to base predictions on).
  • In 2004, semifinals were introduced to decide who takes part in the grand final, which adds an extra complication to our analysis.
  • I only had 1 day to do the analysis :)
  • We have the data for all the votes cast between 1975 and 2018, compiled by Datagraver.com and available on data.world. We proceed with the analysis after a brief review and cleaning of the data (link to the notebook at the end of the article; a minimal loading sketch follows this list).
  • Everything has been done using Python.
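For reference, a minimal loading-and-cleaning sketch could look like the one below. The file name, the original column names and the value codings are assumptions (adjust them to the actual export you download from data.world); the idea is simply to normalize the columns used in the rest of the snippets.

```python
import pandas as pd

# Hypothetical file name; use the actual Datagraver export from data.world
df = pd.read_csv("eurovision_votes_1975_2018.csv")

# Normalize the column names (the originals are assumed, adjust if they differ)
df = df.rename(columns={
    "Year": "year",
    "(semi-) final": "round",        # assumed coding: 'f' = final, 'sf' = semifinal
    "Jury or Televoting": "source",  # assumed coding: 'J' = jury, 'T' = televote
    "From country": "from_country",
    "To country": "to_country",
    "Points": "points",
})

# Keep the era of the current scoring system and drop self-votes
df = df[(df["year"] >= 1975) & (df["from_country"] != df["to_country"])]

# Save the cleaned table so the later snippets can reuse it
df.to_csv("eurovision_votes_clean.csv", index=False)
```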

Analysis of a single country

Let's see how the other countries have voted for a single country (Spain, for example), taking into account every edition since 1975:

Yes, Morocco participated in Eurovision
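As a rough idea of how this kind of chart can be reproduced, here's a sketch (reusing the cleaned file and column names assumed in the loading snippet above) that totals the points Spain has received from every country in the finals since 1975:

```python
import pandas as pd

# Cleaned file produced by the loading sketch above (hypothetical name)
df = pd.read_csv("eurovision_votes_clean.csv")

# Total points each country has given to Spain in the finals since 1975
received = (
    df[(df["to_country"] == "Spain") & (df["round"] == "f")]
    .groupby("from_country")["points"]
    .sum()
    .sort_values(ascending=False)
)
print(received.head(10))
```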

Now let's look only at the televoting, from the point at which it started being counted separately (2016):

We might think that they hate us all over Europe, or that we have really, really bad songs! So how can we even think about winning?

And what about the scores issued by Spain?

So we can begin to theorize that some scores are predictable. Let's move on…

Can we predict the scores of some countries without listening to the songs? Scoring matrices

We'll present the data for different periods, based on the scoring system in force. NOTE: we'll only use the votes cast in the finals (a sketch for building one of these matrices follows the list below).

  1. 1975–2018 (see notebook)
  2. 2001–2018. Since 2001, all countries have been obliged to use televoting, except in the event of technical problems or force majeure. So it's from that point on that the votes don't depend solely on a jury. Even so, we must bear in mind that the system has undergone some modifications over the years.
  3. 2016–2018. Since 2016, the public vote is added to the jury vote separately. It's interesting to analyze it on its own, especially because the two are often very different.
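Here's how one of these matrices can be built (again assuming the cleaned file and column names from the loading sketch), in this case for the 2001–2018 period and the finals only; seaborn is used for the heatmap:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("eurovision_votes_clean.csv")  # cleaned file from the loading sketch

# Average points given from each country (rows) to each other country (columns),
# final votes only, 2001-2018 (period 2 in the list above)
period = df[(df["round"] == "f") & (df["year"].between(2001, 2018))]
matrix = period.pivot_table(index="from_country",
                            columns="to_country",
                            values="points",
                            aggfunc="mean")

# Quick visual check as a heatmap
plt.figure(figsize=(14, 12))
sns.heatmap(matrix, cmap="viridis")
plt.show()
```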

What we get here enables us to emulate Uribarri during the final and have fun with our friends :)

We can already see more than one case of favoritism here. Let's focus exclusively on the public vote for further analysis:

The downside of this part, as we can see above, is that there isn't enough data to draw solid conclusions from the televoting (just look at the column for Russia, or at other countries with no data at all). The most interesting thing for now might be to compare it with the equivalent matrix for the jury votes and see how they differ (although it's probably better to do that country by country). We could also analyze which of the two votes is more biased by proximity, language, or immigration.

Game or Meta-game?

It's always been said that there's a lot of meta-game in Eurovision: countries' votes don't only reflect how good a song is or how well it's performed. Other factors come into play:

  • The votes of certain countries tend to favor their neighbors. Spain keeps its fingers crossed for Andorra to return to the contest, and perhaps Russia is thinking of giving away some land to gain more votes from new ex-Soviet neighbors.
  • Televoting favors countries with many emigrants in other European countries. In Spain, for example, Romania always scores very well, even when its song is terrible.
  • There are political events that affect the voting. Just 3 years ago Ukraine won Eurovision with a song that stirred up a lot of controversy.
  • etc, etc

What if we could capture the relations between the countries based on their votes?

We’ll try to group the countries using a dendrogram to visualize the similarities in the way they distribute scores:
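One way to build such a dendrogram (not necessarily the exact procedure used in the notebook) is to describe each country by its "voting profile", i.e. the average points it gives to every other country, and cluster those profiles hierarchically with SciPy. A sketch, under the same assumptions about file and column names:

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

df = pd.read_csv("eurovision_votes_clean.csv")  # cleaned file from the loading sketch

# Each country's voting profile: average points given to every other country (finals)
finals = df[df["round"] == "f"]
profiles = finals.pivot_table(index="from_country", columns="to_country",
                              values="points", aggfunc="mean").fillna(0)

# Hierarchical clustering on those profiles
Z = linkage(profiles.values, method="ward")

plt.figure(figsize=(10, 14))
dendrogram(Z, labels=profiles.index.to_list(), orientation="left")
plt.title("Countries grouped by how they distribute their points")
plt.tight_layout()
plt.show()
```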

Some relationships could already be intuited, and others were perhaps unknown. We can clearly distinguish several blocks of similar countries. Bear in mind that the grouping is built from similarity in the votes cast, not from the votes exchanged between two countries in both directions (which is what we saw in the previous matrix).

Who has a neighbor has a treasure

Tell that to Australia, which in the last 2 editions received almost no points from the televoting (although it did pretty well on the juries' side):

In 2016, Australia did receive many votes from the public; we can assume the reason could simply be a good song. Here we get a possible hint: the meta-game effect becomes more visible when most countries judge you poorly (a low final score).

We'll now obtain the most voted countries (the "best friends") for each participant across the history of the competition:
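A minimal way to extract these "best friends" (the country each participant has given the most total points to in the finals), under the same assumptions as before:

```python
import pandas as pd

df = pd.read_csv("eurovision_votes_clean.csv")  # cleaned file from the loading sketch

# For each voting country, the country it has given the most total points to (finals)
finals = df[df["round"] == "f"]
totals = finals.groupby(["from_country", "to_country"])["points"].sum()
best_friends = totals.groupby(level="from_country").idxmax().apply(lambda pair: pair[1])
print(best_friends.sort_index())
```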

If you know your geography, you'll spot the neighbor effect right away!

The diaspora effect

It would be interesting to verify the effect of emigration on the televoting, but we only have that level of detail for the last 3 years, so we can't draw big conclusions yet. Later on we'll do another analysis where this effect can be observed.

Voting from the jury vs. Voting from the public

These comparisons are very interesting for assessing how differently the jury and the public think.

Received votes

We plot how the votes were distributed in the 2018 edition:

Taking into account the last 3 editions (the period for which this distinction can be made), and computing the average so that the number of finals reached doesn't distort the result:
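A sketch of the underlying computation: for 2016–2018 we sum the points each country received per final, split by jury and televote, and then average over the finals it actually reached. The 'J'/'T' codes for the source column (and the file name) are assumptions carried over from the loading sketch.

```python
import pandas as pd

df = pd.read_csv("eurovision_votes_clean.csv")  # cleaned file from the loading sketch

# Average points received per final, split by jury ('J') and televote ('T'), 2016-2018
recent = df[(df["round"] == "f") & (df["year"] >= 2016)]
received = (
    recent.groupby(["to_country", "source", "year"])["points"].sum()  # points per final
          .groupby(level=["to_country", "source"]).mean()             # average over finals reached
          .unstack("source")
)
received["gap"] = received["T"] - received["J"]  # positive = loved more by the public
print(received.sort_values("gap", ascending=False))
```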

On the one hand we can look at Russia, Poland or Romania, where the score received from the public is much higher than the score received from the juries. On the other hand we have the extreme case of Malta, where the opposite happens (remember that Malta is a small island, with no neighbors).

Given votes

Let's see how the votes given by the jury and by the public differ for a specific country, in our case Spain. We take the last 3 editions and compute the average, to make the comparison fairer (a sketch follows the list below):

The countries to look at here are the very unbalanced ones:

  • On the one hand, countries that receive far more points from the Spanish public than from the Spanish jury: Bulgaria, Czech Republic, Romania, Ukraine, Poland, Estonia or Russia.
  • On the other hand, countries that receive far more points from the jury: Australia, Austria, Greece, Hungary, Ireland or Malta.
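A sketch for this comparison, again assuming the cleaned file and the 'J'/'T' coding: we take the points Spain gave in the last three finals, split them by jury and televote, and average per edition.

```python
import pandas as pd

df = pd.read_csv("eurovision_votes_clean.csv")  # cleaned file from the loading sketch

# Points given by Spain in the last 3 finals, split by jury and televote,
# averaged per edition so the number of finals doesn't distort the comparison
spain = df[(df["from_country"] == "Spain") & (df["round"] == "f") & (df["year"] >= 2016)]
given = (
    spain.groupby(["to_country", "source", "year"])["points"].sum()
         .groupby(level=["to_country", "source"]).mean()
         .unstack("source")
         .fillna(0)
)
print(given.sort_values("T", ascending=False).head(10))
```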

Does immigration influence the public’s vote?

Let's compare this with the data. According to the INE (as of 2017), the countries with the most emigrants living in Spain are: Romania, United Kingdom, Italy, Bulgaria, Germany, France, Portugal and Poland.

And what if we compute the data since the televoting was mandatory for all countries?

The countries most voted by Spain are Romania, Italy, Bulgaria and Portugal. We have 3 countries in the top 4 matching the INE's immigration list! So it certainly seems to have an influence :)
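A quick way to check that overlap (same assumptions about file and column names; the INE list is the one quoted above):

```python
import pandas as pd

df = pd.read_csv("eurovision_votes_clean.csv")  # cleaned file from the loading sketch

# Spain's most voted countries in the finals since televoting became mandatory (2001)
since_2001 = df[(df["from_country"] == "Spain") & (df["round"] == "f") & (df["year"] >= 2001)]
top_voted = since_2001.groupby("to_country")["points"].sum().nlargest(4)

# Countries with the most emigrants living in Spain, according to the INE (2017)
ine_top = {"Romania", "United Kingdom", "Italy", "Bulgaria",
           "Germany", "France", "Portugal", "Poland"}

print(top_voted)
print("Overlap with the INE list:", set(top_voted.index) & ine_top)
```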

Future lines of work

This analysis could be expanded in several directions:

  • More detailed analysis of what we've seen in the last sections: the bias due to proximity, language, or immigration…
  • Attempt to quantify the voting bias.
  • Attempt to predict results taking into account other data (social networks, press, etc.).
  • Possibility of obtaining separate jury and public voting data for the editions prior to 2016.

And that’s it; I hope this article was useful and interesting for you.

You can see the notebook associated with this article from the following link: https://nbviewer.jupyter.org/github/pyjaime/eurovision-2019/blob/master/eurovision-2019.ipynb

You can also download the code to expand the scope of the analysis: https://github.com/pyjaime/eurovision-2019


Jaime Durán
yottabytes

Yet another data scientist with a blog. In fact I write two (one in Spanish).