How to calculate ELO score for international teams using Python

Daniel Guerrero
5 min read · Feb 13, 2022


An easy way to compare international teams' performance using historical match data in Python

Fifa ELO Ranking (source)

There have long been debates about which international team is the best at any given time. Some say that European teams have always been better because soccer was born there; others argue for the teams of the Americas, pointing to Brazil's 5 World Cups and the many talented players born in that country. The discussion goes on and on. Fortunately, we can rely on data to approximate which team is the best at a given time, and this is achieved with the ELO system (which, in fact, is the one currently used by FIFA in its official ranking). Using Python and the data set available on Kaggle, you can find out which have been the 5 best and worst national teams of all time.

What is the ELO system?¹

The ELO system owes its name to Arpad Elo, a chess player and mathematician who developed this scoring system to determine the skill level of a chess player. The formula to calculate the Elo rating after a match takes into account the player's current rating, the probability of winning the match, the result of the match, and an adjustment factor K. The formula is as follows:

$$P_t = P_{t-1} + K \, (S_a - E_a)$$

Where P_t is the Elo score after the match, P_{t-1} is the Elo score before the match, K is the adjustment factor, and S_a is the result of the match (1 for a win, 0 for a loss, and 0.5 for a tie).

E_a is the expected result of the match. It is computed with a logistic function that takes the Elo ratings of both players as parameters.
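As a sketch, the standard Elo form of this logistic function is shown below. The scale constant 400 is the conventional chess value and is an assumption here (FIFA's ranking uses 600 instead), and the superscripts a and b, for a player's own rating and the opponent's rating, are notation introduced only for this illustration.

```latex
E_a = \frac{1}{1 + 10^{\,(P^{b}_{t-1} - P^{a}_{t-1})/400}}
```

With equal ratings this gives E_a = 0.5, and a 400-point advantage gives E_a = 10/11, or about 0.91.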

In the case of soccer, FIFA has used this system in its official ranking since 2018. There, the adjustment factor K reflects the importance of the game: it takes the value of 5 for friendly matches and 60 for FIFA World Cup matches from the quarter-finals onward.

How can we calculate the ELO score of national teams using Python?

To calculate the ELO scores of the national teams over time, we need every match that has been played between them. Thanks to Mart Jürisoo, this information can be found here. The Jupyter notebook that I will show next can be seen on my GitHub.

After loading the data and calling it df, we will select only the columns that we will use. These are:

· date

· home_team

· away_team

· home_score

· away_score

· tournament
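The loading step could be sketched as follows. This is a minimal illustration, not the notebook's code: the inline sample rows stand in for the Kaggle file, and the file name `results.csv`, the extra columns (`city`, `country`, `neutral`), and the helper name `load_matches` are assumptions.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV (sample rows only). With the real file you
# would pass its path instead, e.g. load_matches("results.csv") (assumed name).
SAMPLE_CSV = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
"""

# The six columns the article keeps.
COLUMNS = ["date", "home_team", "away_team", "home_score", "away_score", "tournament"]


def load_matches(source):
    """Read the matches, keep the needed columns, and sort chronologically."""
    df = pd.read_csv(source, parse_dates=["date"])
    return df[COLUMNS].sort_values("date").reset_index(drop=True)


df = load_matches(io.StringIO(SAMPLE_CSV))
print(df)
```

Sorting by date matters because Elo ratings are updated match by match, so the rows must be processed in the order the games were played.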

For the variable K, we will use 10 for international friendlies, 25 for FIFA World Cup qualifiers, 40 for confederation competition matches, 55 for FIFA World Cup matches, and 5 for other tournaments.¹
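These weights could be looked up with a small helper like the one below. The K values are the ones listed above; the tournament labels and the function name `k_factor` are assumptions about how the data set spells each competition, so check them against the `tournament` column before relying on them.

```python
# Assumed labels for confederation competitions; the real data set may use
# different spellings or contain more tournaments than listed here.
CONFEDERATION_TOURNAMENTS = {
    "UEFA Euro",
    "Copa América",
    "African Cup of Nations",
    "AFC Asian Cup",
    "Gold Cup",
    "Oceania Nations Cup",
}


def k_factor(tournament):
    """Return the adjustment factor K for a tournament label."""
    if tournament == "Friendly":
        return 10
    if tournament == "FIFA World Cup qualification":
        return 25
    if tournament in CONFEDERATION_TOURNAMENTS:
        return 40
    if tournament == "FIFA World Cup":
        return 55
    return 5  # every other tournament


print(k_factor("Friendly"), k_factor("FIFA World Cup"), k_factor("Some other cup"))
```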

To calculate the expected result (E_a), we will create a function that takes the Elo of the home team and the Elo of the away team and returns an array where the first entry is the probability that the home team wins and the second entry is the probability that the away team wins.
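A minimal sketch of such a function, assuming the standard Elo logistic curve with a scale of 400 (the constant used in the original notebook is not stated here, so `scale` is left as a parameter; the function name is also an assumption):

```python
def expected_result(home_elo, away_elo, scale=400.0):
    """Return [expected score of the home team, expected score of the away team]."""
    home = 1.0 / (1.0 + 10 ** ((away_elo - home_elo) / scale))
    return [home, 1.0 - home]


print(expected_result(1500, 1500))  # equal ratings -> [0.5, 0.5]
print([round(p, 3) for p in expected_result(1900, 1500)])  # -> [0.909, 0.091]
```

The two entries always sum to 1, so a stronger home team takes expected score away from the away team in equal measure.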

To obtain the actual result (S_a), we create a function that takes the goals of both teams as parameters and returns an array with the result of the home team in the first position and the result of the away team in the second position.
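This step could look like the sketch below, using the 1 / 0.5 / 0 scoring from the formula section (the function name is an assumption):

```python
def actual_result(home_goals, away_goals):
    """Return [result of the home team, result of the away team]."""
    if home_goals > away_goals:
        return [1.0, 0.0]  # home win
    if home_goals < away_goals:
        return [0.0, 1.0]  # away win
    return [0.5, 0.5]  # tie


print(actual_result(4, 2))  # -> [1.0, 0.0]
print(actual_result(0, 0))  # -> [0.5, 0.5]
```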

Finally, we will use the functions created above to build our main function, the one that calculates the rating of both teams after a match. This function needs the goals scored in the match, the tournament being played, and both teams' previous ratings.
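Putting the pieces together, the main function might be sketched as follows. To keep the example self-contained, the helpers are repeated in compact form; the function names, the abbreviated tournament table, and the scale of 400 are assumptions rather than the notebook's actual code.

```python
# Abbreviated K table (labels assumed); unlisted tournaments fall back to 5.
K_BY_TOURNAMENT = {
    "Friendly": 10,
    "FIFA World Cup qualification": 25,
    "UEFA Euro": 40,
    "Copa América": 40,
    "FIFA World Cup": 55,
}


def expected_result(home_elo, away_elo, scale=400.0):
    home = 1.0 / (1.0 + 10 ** ((away_elo - home_elo) / scale))
    return [home, 1.0 - home]


def actual_result(home_goals, away_goals):
    if home_goals > away_goals:
        return [1.0, 0.0]
    if home_goals < away_goals:
        return [0.0, 1.0]
    return [0.5, 0.5]


def calculate_elo(home_elo, away_elo, home_goals, away_goals, tournament):
    """Apply P_t = P_{t-1} + K * (S_a - E_a) to both teams."""
    k = K_BY_TOURNAMENT.get(tournament, 5)
    e_home, e_away = expected_result(home_elo, away_elo)
    s_home, s_away = actual_result(home_goals, away_goals)
    return home_elo + k * (s_home - e_home), away_elo + k * (s_away - e_away)


# Two equally rated teams play a friendly and the home team wins 2-1.
print(calculate_elo(1500, 1500, 2, 1, "Friendly"))  # -> (1505.0, 1495.0)
```

Because both teams share the same K and their expected scores sum to 1, the points gained by one team are exactly the points lost by the other.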

A dictionary will be created that will store the last Elo score of each team. This dictionary will be updated every time the team plays a game. After creating this dictionary we can iterate over the dataset to get the ratings before and after each match.
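The loop could be sketched like this. It is written against plain dictionaries so it runs on its own; the starting rating of 1500, the output field names (`home_elo_before`, `home_elo_after`, and so on), the abbreviated K table, and the scale of 400 are all assumptions, since the article does not state them.

```python
INITIAL_ELO = 1500.0  # assumed starting rating for a team's first match
K_BY_TOURNAMENT = {"Friendly": 10, "FIFA World Cup qualification": 25, "FIFA World Cup": 55}


def update(home_elo, away_elo, home_goals, away_goals, k, scale=400.0):
    """Return both ratings after one match (standard Elo update)."""
    e_home = 1.0 / (1.0 + 10 ** ((away_elo - home_elo) / scale))
    s_home = 1.0 if home_goals > away_goals else 0.0 if home_goals < away_goals else 0.5
    delta = k * (s_home - e_home)
    return home_elo + delta, away_elo - delta


def rate_matches(matches):
    """Walk the matches in chronological order, tracking each team's latest Elo."""
    ratings = {}  # team name -> latest Elo score
    history = []  # one enriched record per match
    for match in matches:
        home, away = match["home_team"], match["away_team"]
        home_before = ratings.get(home, INITIAL_ELO)
        away_before = ratings.get(away, INITIAL_ELO)
        k = K_BY_TOURNAMENT.get(match["tournament"], 5)
        home_after, away_after = update(
            home_before, away_before, match["home_score"], match["away_score"], k
        )
        ratings[home], ratings[away] = home_after, away_after
        history.append({
            **match,
            "home_elo_before": home_before, "away_elo_before": away_before,
            "home_elo_after": home_after, "away_elo_after": away_after,
        })
    return ratings, history


matches = [
    {"date": "1872-11-30", "home_team": "Scotland", "away_team": "England",
     "home_score": 0, "away_score": 0, "tournament": "Friendly"},
    {"date": "1873-03-08", "home_team": "England", "away_team": "Scotland",
     "home_score": 4, "away_score": 2, "tournament": "Friendly"},
]
ratings, history = rate_matches(matches)
print(ratings)  # -> {'Scotland': 1495.0, 'England': 1505.0}
```

With pandas, `df.to_dict("records")` produces a list of dictionaries in this shape, and `pd.DataFrame(history)` turns the enriched records back into a dataframe.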

In this way we obtain a dataframe with all the historical matches of the national teams, along with their Elo score at each point in time. This gives us an objective measure for saying which national teams are the best and the worst at any given moment. Based on this score, the top 5 national teams of all time are:

  1. Germany (2014). Elo = 1795.7
  2. Spain (2010). Elo = 1795.5
  3. France (2021). Elo = 1775.6
  4. Brazil (2021). Elo = 1771.3
  5. Netherlands (2010). Elo = 1759.4

And the 5 worst:

  1. Luxembourg (2006). Elo = 879.4
  2. San Marino (2021). Elo = 895.5
  3. Liechtenstein (2021). Elo = 987.8
  4. Andorra (2016). Elo = 993.1
  5. Sri Lanka (2021). Elo = 1005
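Rankings like these could be derived from the per-match records by taking each team's highest (or lowest) rating and the year it was reached. The sketch below assumes that is how the lists were built, which the article does not confirm; the field names, the ISO date strings, and the sample values are hypothetical.

```python
def extreme_ratings(history, lowest=False):
    """Return (team, (elo, year)) pairs ranked by each team's peak or lowest Elo."""
    extremes = {}
    for row in history:
        year = row["date"][:4]  # assumes ISO date strings such as "2014-07-13"
        for side in ("home", "away"):
            team = row[f"{side}_team"]
            elo = row[f"{side}_elo_after"]
            current = extremes.get(team)
            if current is None or (elo < current[0] if lowest else elo > current[0]):
                extremes[team] = (elo, year)
    # Best teams first by default; worst teams first when lowest=True.
    return sorted(extremes.items(), key=lambda item: item[1][0], reverse=not lowest)


# Hypothetical records, for illustration only.
history = [
    {"date": "2001-05-01", "home_team": "A", "away_team": "B",
     "home_elo_after": 1510.0, "away_elo_after": 1490.0},
    {"date": "2002-06-01", "home_team": "B", "away_team": "C",
     "home_elo_after": 1480.0, "away_elo_after": 1512.0},
    {"date": "2003-07-01", "home_team": "C", "away_team": "A",
     "home_elo_after": 1500.0, "away_elo_after": 1520.0},
]

top = extreme_ratings(history)[:5]
bottom = extreme_ratings(history, lowest=True)[:5]
print(top)
print(bottom)
```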

Disclaimer: Note that these results depend on the chosen weights and on the formula used to calculate the expected result. The scores may vary if these parameters are changed. Feel free to change the weights and share your results!

Thanks for reading this post; feedback in the comments is welcome. In the coming weeks I will be uploading new posts on how to visualize this data in Streamlit and how the ELO score can be used to create a predictive model of international games with scikit-learn. To keep up with these new posts, consider following my blog on this page or following my profile on LinkedIn.
