Implementing the ELO ratings for soccer teams in Python.

Aug 8, 2023

In this article I will present a Pythonic implementation of several variations of the ELO ratings and examine their performance in a simple logistic regression model.

The basis for my code to implement the ELO ratings can be found here:
https://medium.com/mlearning-ai/how-to-calculate-elo-score-for-international-teams-using-python-66c136f01048

I have made changes to the code to speed it up.

First let’s take a look at our data. Our soccer data covers the top leagues in the four Scandinavian countries.

We have the team names, the country and league, the kick-off time, and the scores. The odds data is an average of bookmakers' odds. Before obtaining our ratings, we check the data for NaN values and any inconsistencies.

We have only 2 NaN values, and they are in the odds columns. We can fix this easily. Next, the total implied probability of our bookies' prices (the sum of the implied probabilities of the home win, the draw, and the away win) should lie between 1.04 and 1.10. The implied probability of an outcome is calculated by dividing 1 by its decimal odds. The excess over 100% is called the overround, and it is there to ensure the bookmakers always have an advantage. A total implied probability below 1.0 would indicate an error in our data.
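The calculation described above can be sketched in a few lines. This is a minimal illustration, not the code used for the article: the function name and the example odds are hypothetical, and it assumes a three-way (home/draw/away) market quoted in decimal odds.

```python
def total_implied_probability(home_odds, draw_odds, away_odds):
    """Sum of the implied probabilities (1 / decimal odds) of the three outcomes."""
    return 1 / home_odds + 1 / draw_odds + 1 / away_odds


# Hypothetical prices for a single match (not taken from the data set).
book = total_implied_probability(2.10, 3.40, 3.30)
overround = book - 1.0  # the bookmakers' margin, as a fraction

print(f"total implied probability: {book:.3f}, overround: {overround:.1%}")
```

For these example prices the total comes out at roughly 1.07, inside the 1.04 to 1.10 band expected above.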

The average is 106.9% with a small standard deviation. All values lie above the 100% level, which is marked by the red dotted line in the chart. We see only one outlier, at around index 2200.
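A check along these lines can be used to locate the NaN values and the outlier. It is only a sketch: the function name, the row layout of (home, draw, away) decimal odds, and the upper bound of 1.15 are my assumptions here, not values from the original analysis.

```python
import math


def flag_suspect_rows(odds_rows, low=1.0, high=1.15):
    """Return the indices of rows with missing odds or an implausible total.

    odds_rows: iterable of (home_odds, draw_odds, away_odds) in decimal format.
    low, high: assumed bounds on the total implied probability.
    """
    suspect = []
    for i, odds in enumerate(odds_rows):
        if any(math.isnan(o) for o in odds):
            suspect.append(i)  # missing price
            continue
        total = sum(1 / o for o in odds)
        if not (low <= total <= high):
            suspect.append(i)  # total implied probability out of range
    return suspect


# Hypothetical rows: one clean, one with a NaN, one with a mistyped away price.
rows = [(2.10, 3.40, 3.30), (1.50, float("nan"), 6.00), (2.10, 3.40, 33.0)]
print(flag_suspect_rows(rows))
```

The flagged rows can then be inspected and corrected by hand, which is practical when there are only a handful of them.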

After fixing this outlier and the NaN values we can proceed with the calculation of ratings. Here is the code:

https://gist.github.com/klynch999/378150158313fab7ec6896208b7e90e2

The code follows that of the article linked at the beginning, with some changes. You will find a detailed explanation of the code and the ratings there. The changes I’ve made are:

Introducing a timestamp variable to ensure our data is ordered chronologically.
Converting the dataframe to a list of lists to speed up computation.
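To make the two changes concrete, here is a minimal sketch of a basic ELO loop over a list of lists that is sorted by timestamp first. It is not the code from the gist: the row layout, the starting rating of 1500, the 400-point scale, and the absence of a home-advantage term are all assumptions made for illustration.

```python
def expected_home_score(home_rating, away_rating):
    """Standard ELO expectation for the home side on a 400-point scale."""
    return 1 / (1 + 10 ** ((away_rating - home_rating) / 400))


def run_elo(matches, k=22, start=1500.0):
    """Compute basic ELO ratings.

    matches: list of [timestamp, home, away, home_goals, away_goals] rows.
    Returns the final ratings and the pre-match ratings for every match.
    """
    ratings = {}
    history = []
    # Sort by timestamp so ratings are always updated in chronological order.
    for ts, home, away, hg, ag in sorted(matches, key=lambda m: m[0]):
        rh = ratings.get(home, start)
        ra = ratings.get(away, start)
        history.append([ts, home, away, rh, ra])  # ratings known before kick-off
        actual = 1.0 if hg > ag else 0.5 if hg == ag else 0.0
        change = k * (actual - expected_home_score(rh, ra))
        ratings[home] = rh + change
        ratings[away] = ra - change
    return ratings, history


# Hypothetical rows, deliberately out of order.
matches = [[2, "B", "A", 0, 0], [1, "A", "B", 2, 0]]
ratings, history = run_elo(matches)
print(ratings)
```

Iterating over plain Python lists avoids the per-row overhead of indexing into a dataframe, which is where the speed-up comes from. Storing the pre-match ratings in `history` keeps the features free of information from the match they are meant to predict.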

The parameter k = 22 was found following optimisation. In later articles I will demonstrate how to find the optimal values for these parameters.

The next piece of code implements the goals-based ELO ratings, as outlined in this paper.

The ELO-goals rating is a modification of the basic ELO that takes margin of victory into account. To prevent extreme score lines from having a significant impact on the outcome, we cap the number of home goals and away goals at 6 before performing the calculation.

The code is mostly similar to the first implementation, with the adjustment

k = k*(1+delta)**mu

So k is no longer a constant but depends on delta, the goal difference. The magnitude of the change in ratings now depends on the margin of victory in addition to the match outcome.
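As a small sketch of this adjustment, including the cap of 6 goals: the function name is hypothetical, no value of mu is given in this article so the ones below are placeholders, and I am assuming delta is the absolute goal difference after capping.

```python
def goals_k(k0, home_goals, away_goals, mu, cap=6):
    """Match-specific k for the goals-based rating: k0 * (1 + delta) ** mu."""
    hg = min(home_goals, cap)  # cap extreme score lines
    ag = min(away_goals, cap)
    delta = abs(hg - ag)       # assumed: absolute goal difference
    return k0 * (1 + delta) ** mu


print(goals_k(22, 1, 1, mu=1.0))  # draw: delta = 0, so k is unchanged
print(goals_k(22, 9, 0, mu=1.0))  # 9-0 is treated as 6-0, so delta = 6
```

Returning the adjusted value under a separate name, rather than overwriting k inside the loop, keeps the adjustment from compounding from one match to the next.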

Finally, I introduce my own variation on the ELO ratings. It is very similar to the goals-based system, but replaces the goal difference with the difference between the two teams' implied probabilities. This means that consistent favourites, who win when they are expected to, are given an additional reward, while inconsistent favourites receive an additional penalty.
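One way to read this variation is sketched below. The exact form is an assumption on my part: the function name is hypothetical, delta is taken as the absolute difference between the home and away implied probabilities, and the same k*(1+delta)**mu shape as the goals-based rating is reused.

```python
def odds_k(k0, home_odds, away_odds, mu):
    """Match-specific k based on the gap between the implied probabilities."""
    home_ip = 1 / home_odds
    away_ip = 1 / away_odds
    delta = abs(home_ip - away_ip)  # assumed: absolute difference
    return k0 * (1 + delta) ** mu


print(odds_k(22, 2.0, 2.0, mu=1.0))    # evenly priced match: k is unchanged
print(odds_k(22, 1.25, 10.0, mu=1.0))  # heavy favourite: larger k
```

Under this reading, matches with a clear favourite move the ratings more, in either direction, than evenly priced matches.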

Before comparing their performance in a predictive model, let’s examine our features more closely.

Surprisingly enough, the basic ELO ratings show the strongest correlation with our targets.

This chart shows the 250-period moving average of the correlations. The most stable is the goals-based ELO rating.

This chart shows the evolution of the ELO ratings for the Norwegian team Molde. Molde are strong performers in the Norwegian league. What is noticeable is the stability of the odds-based rating. The relative volatility of the basic ELO ratings can be attributed to the optimal k used for that rating system.

We now compare the predictive power of each of our features. Here is the code:

I compare the features using accuracy, precision, recall, Brier score loss, and ROC AUC. The inclusion of “home_ip” allows us to see how our features measure up against the bookies' predictions. Then we create a Lasso classifier that uses an ensemble of all features to see if this improves our results.
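To show why a probability-based metric is worth including alongside accuracy, here is a small sketch in plain Python. The numbers are made-up toy values, not results from this article, and the function names are my own.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def accuracy(probs, outcomes, threshold=0.5):
    """Share of matches where the thresholded prediction equals the outcome."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


# Toy predictions of a home win from two hypothetical sources.
home_wins = [1, 0, 1, 0]
model_probs = [0.60, 0.40, 0.70, 0.30]
bookie_probs = [0.80, 0.20, 0.90, 0.10]

for name, probs in [("model", model_probs), ("bookies", bookie_probs)]:
    print(name, accuracy(probs, home_wins), round(brier_score(probs, home_wins), 3))
```

Both toy sources pick the right side every time, so accuracy cannot separate them, while the Brier score rewards the more confident correct probabilities. In practice scikit-learn's `brier_score_loss` computes the same quantity.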

The ELO ratings are clearly inferior in predictive power to the bookmakers' predictions. The basic ELO rating difference is the best performing of our features. Combining all the features improves performance, but we still cannot match the bookies with these features. Most notable is the bookies' superior performance in Brier score loss and ROC AUC. The Brier score measures how far our predicted probabilities are from the correct label, while ROC AUC measures how well they rank the outcomes. We will have to find better ratings!

That is it for this article. I hope you enjoyed it. I will be looking further into new adaptations of the ELO ratings using the bookmakers' odds, and I will be publishing implementations of the ratings systems found here and here. Follow for more!

Written by ML Soccer Betting