The datafication of sport players to boost pleasure and results

Published in Decathlon Digital, Sep 24, 2020

For sport addicts like me, practicing a physical activity is not just about the exercise itself but, above all, about going beyond your limits and having fun doing it.

Optimizing an individual sport is relatively easy, but as soon as you want to practice with other people, the equation becomes harder.

In 2015, I joined my current company Decathlon in Lille, France, where more than 200 guys were playing five-a-side football several times per week during lunch breaks. I decided to join the league, where the teams were defined before each game based on the win/loss ratio of each available player. I really enjoyed the concept of generating teams automatically from player history, but this simple formula was far from sufficient to produce well-balanced games that everyone enjoys from the kick-off to the final whistle. I started to think about how to improve it, and here is what I built.

1. The Model

A player as a mathematical variable

To start slowly, let’s consider that each player can be modeled in a simple way as a mathematical variable called his “rating”. Let’s write cₚ for the rating of player p.
The higher cₚ is, the stronger the player is and the greater his impact on a game.

A team as a sum of players

As five-a-side football is a team sport, knowing the players is not enough: we need to know the strength of a whole team. The simplest model is to consider the strength of a team as the sum of the ratings of its players. We will write C_A for the rating of team A: C_A = Σ cₚ, summed over every player p in team A.

NB: to simplify this article, I am leaving out everything we could do with concepts of compatibility / incompatibility between players.

A game as a simple mathematical equation

Now that we know how to represent a player and a team, we need to model an opposition between two teams that has a final result. Here is what the equation looks like:
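One way to write it, consistent with the matrix described in the next section, is to set the rating gap between the two teams equal to the game result. The symbol r and its meaning (for instance the goal difference seen from team A) are assumptions of this sketch:

```latex
C_A - C_B \;=\; \sum_{p \in A} c_p \;-\; \sum_{p \in B} c_p \;=\; r
```

With this convention, a positive r means team A was expected to win (or did win) by that margin, and r = 0 describes a perfectly balanced game.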

Finally, the whole league as a matrix

Naturally, we can build a matrix representation with all the game equations (horizontally) and the players (vertically), where the value 1 means that the player was on team A, -1 on team B, and 0 absent, together with the ratings and the game results, as follows:
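As an illustration, here is a minimal Python sketch of that representation. The player names, ratings and results are made up, the layout (one row per game, one entry per player) is one possible reading of the description above, and the prediction assumes that a game result is compared with the rating of team A minus the rating of team B.

```python
# Hypothetical data: 5 players, 2 games of 2 vs 2 (one player absent each time).
players = ["ana", "ben", "cid", "dan", "eve"]
ratings = {"ana": 3.0, "ben": 1.0, "cid": 2.0, "dan": 1.5, "eve": 2.5}

games = [
    {"team_a": ["ana", "ben"], "team_b": ["cid", "dan"], "result": 1},
    {"team_a": ["ana", "eve"], "team_b": ["ben", "cid"], "result": 2},
]


def game_row(game, players):
    """Return one matrix row: 1 for team A, -1 for team B, 0 for absent players."""
    row = []
    for p in players:
        if p in game["team_a"]:
            row.append(1)
        elif p in game["team_b"]:
            row.append(-1)
        else:
            row.append(0)
    return row


def predict(row, players, ratings):
    """Predicted result of a game: rating of team A minus rating of team B."""
    return sum(x * ratings[p] for x, p in zip(row, players))


matrix = [game_row(g, players) for g in games]
predictions = [predict(row, players, ratings) for row in matrix]
# Prediction error per game: actual result minus predicted result.
errors = [g["result"] - pred for g, pred in zip(games, predictions)]
```

Each row is just a compact way of writing one game equation, so multiplying a row by the ratings gives the predicted result of that game.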

2. The rating update

Obviously, the rating of each player should evolve over time, for two reasons:

1- It’s almost impossible to get the right rating for a player immediately: the algorithms have to iterate several times to converge towards the closest possible value of his real level
2- Each player follows his own progression, so it doesn’t make sense to work with fixed values

At game level

Here is a full example :
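A minimal sketch of what a game-level update could look like. Everything specific here is an assumption rather than the article's actual formula: the prediction is taken as the rating of team A minus the rating of team B, `result` is the goal difference seen from team A, the hypothetical factor `k` damps the correction, and the correction is shared equally between the two teams.

```python
def team_adjustments(rating_a, rating_b, result, k=0.5):
    """Return (delta_a, delta_b), the changes to apply to the two team ratings.

    Assumptions of this sketch (not taken from the article):
    - the prediction is rating_a - rating_b,
    - `result` is the goal difference seen from team A,
    - `k` (0 < k <= 1) damps the correction so one game cannot swing ratings too far,
    - the correction is shared equally between the two teams.
    """
    predicted = rating_a - rating_b
    error = result - predicted
    delta_a = k * error / 2
    delta_b = -k * error / 2
    return delta_a, delta_b


# Team A (12.0) was expected to beat team B (10.0) by 2 but won by 4.
delta_a, delta_b = team_adjustments(rating_a=12.0, rating_b=10.0, result=4)
```

In this example team A gains 0.5 and team B loses 0.5, so the predicted gap moves from 2 to 3, part of the way towards the observed gap of 4.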

At player level

Once we know how each team has to be impacted, it’s time to go deeper and re-evaluate each player (remember, the team rating should be equal to the sum of its player ratings). Once again there are two philosophies: either we impact all the players equally, or we try to find a variable that differentiates them.
To bring a little complexity, let’s think about how each player should be impacted.

Which player’s rating is most likely to be wrong? I would say that of the player who has played the least. For instance, if we organize a game with 9 well-known players and 1 new player, and the prediction is far from the final result, it’s very likely that the newcomer is causing this difference. As a consequence, his rating should be impacted more than the others and absorb a greater share of the team modification, whether positive or negative.

Here is a full example to understand it better :
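The idea can be sketched in a few lines of Python. The weighting used here, 1 / (1 + games played), is a hypothetical choice made for this illustration and not necessarily the coefficient used in the real project; the player names and game counts are made up.

```python
def distribute(team_delta, games_played):
    """Split a team-level rating change between its players.

    `games_played` maps player name -> number of games already played.
    The weight 1 / (1 + games) is a hypothetical choice for this sketch: the
    fewer games a player has, the larger his share of the change.
    The shares always sum to `team_delta`, so the team rating stays equal to
    the sum of its player ratings.
    """
    weights = {p: 1 / (1 + n) for p, n in games_played.items()}
    total = sum(weights.values())
    return {p: team_delta * w / total for p, w in weights.items()}


# Four experienced players and one newcomer share a team change of +1.0.
shares = distribute(
    1.0,
    {"veteran_1": 49, "veteran_2": 49, "veteran_3": 49, "veteran_4": 49, "newcomer": 0},
)
```

With these numbers the newcomer absorbs roughly 93% of the change, while each experienced player barely moves. A softer weighting would spread the change more evenly.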

One method I haven’t mentioned yet, and which is worth exploring, is to mix a defined weighting coefficient (λ, which affects the distribution based on the number of games played) with feedback given by each player about the others. This solution is totally viable.

3. The initialisation

Everything we have seen so far was about the live usage of the solution, but you have to start somewhere before enjoying it. Basically two scenarios exist: either you don’t have any data and you’re starting from scratch, or you already have a long history of games and want to use it as a starting point.

From scratch

If you don’t have any data, there is no magic. The easiest way to start quickly, get good results and converge as fast as possible is to define ratings manually for all the players and let the engine improve them. As defining a precise value is pretty hard, you can just split your players into 4 or 5 categories (from weak to strong) and then associate each category with a rating value. From this starting point you will be able to generate compositions quickly, and the algorithms will learn faster, even if the categorization is not 100% correct, than if everybody started at the same level. You will probably be surprised by some players you thought were really strong but who turn out to have less impact than expected, and vice versa.
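In code, this initialisation can be as small as a lookup table. The category names and the rating values below are hypothetical; any monotonic scale would do as a starting point.

```python
# Hypothetical scale: five categories mapped to starting ratings.
CATEGORY_RATING = {
    "weak": 1.0,
    "below_average": 2.0,
    "average": 3.0,
    "above_average": 4.0,
    "strong": 5.0,
}


def initial_ratings(categories):
    """Turn a {player: category} mapping into {player: starting rating}."""
    return {player: CATEGORY_RATING[cat] for player, cat in categories.items()}


start = initial_ratings({"ana": "strong", "ben": "average", "cid": "weak"})
```

The exact values matter less than the ordering, since the rating updates are there to correct them game after game.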

From existing data

Things become more interesting if you already have some data. Defining the level of each player comes down to solving a system of equations. To be precise, we cannot really solve this system, which has no exact solution; instead we look for the rating vector that minimizes the distance between the predicted results (the game matrix applied to the ratings) and the actual results.
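Written in matrix form, this is a least-squares problem. The notation is an assumption of this sketch: M is the game matrix of 1, -1 and 0 values, c the vector of player ratings, and r the vector of game results.

```latex
\hat{c} \;=\; \arg\min_{c} \; \lVert M c - r \rVert_2^2
```

Any method that drives this error down, whether a classical least-squares solver or a heuristic search, yields a usable set of starting ratings.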

In my case, I tried to use genetic algorithms considering each player as a gene and the distance between matrices (in other words the global prediction error) as my selection criterion. It’s probably not the best way to solve this problem but it was really fun to implement.
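To make the idea concrete, here is a deliberately tiny genetic algorithm in pure Python. It is a sketch under assumptions, not the code from the project: the population size, the mutation scale, the elitist selection, the initial rating range and the sample games are all hypothetical choices.

```python
import random


def total_error(ratings, matrix, results):
    """Sum of squared gaps between predicted and actual game results."""
    return sum(
        (sum(x * c for x, c in zip(row, ratings)) - r) ** 2
        for row, r in zip(matrix, results)
    )


def evolve(matrix, results, n_players, pop_size=40, generations=200, seed=0):
    """Tiny genetic algorithm: one gene per player, lower error is fitter."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(0, 5) for _ in range(n_players)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda ind: total_error(ind, matrix, results))
        survivors = population[: pop_size // 2]  # elitist selection: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mum, dad)]
            # Mutation: nudge one randomly chosen gene.
            i = rng.randrange(n_players)
            child[i] += rng.gauss(0, 0.3)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda ind: total_error(ind, matrix, results))


# Hypothetical history: 4 games between 5 players (1 = team A, -1 = team B, 0 = absent).
matrix = [
    [1, 1, -1, -1, 0],
    [1, -1, -1, 0, 1],
    [-1, 1, 0, 1, -1],
    [0, -1, 1, -1, 1],
]
results = [1, 2, -2, 1]
best = evolve(matrix, results, n_players=5)
```

Because the best half of each generation is carried over unchanged, the best error can never get worse from one generation to the next. Note that only rating differences are constrained by the games, so the absolute level of the ratings depends on the initial population.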

4. API usage

To facilitate the daily usage of all these concepts, I uploaded my code on GitHub and created an API using Google Cloud Functions & Cloud Run. Check out this Swagger to try the two existing endpoints.

/compositions/generate

to generate one or several compositions from players and known ratings
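To give an idea of what generating a composition involves, here is a local brute-force sketch. It does not call the API and is not its implementation: the function name, the input shape and the tie-breaking rule are assumptions made for illustration.

```python
from itertools import combinations


def balanced_teams(ratings):
    """Return (team_a, team_b, gap) for the split with the smallest rating gap.

    `ratings` maps player name -> rating. Both teams have the same size, so
    the number of players must be even. Brute force is fine for 10 players
    (252 possible choices of team A).
    """
    players = sorted(ratings)
    half = len(players) // 2
    total = sum(ratings.values())
    best = None
    for team_a in combinations(players, half):
        sum_a = sum(ratings[p] for p in team_a)
        gap = abs(total - 2 * sum_a)  # |C_A - C_B|
        if best is None or gap < best[2]:
            team_b = tuple(p for p in players if p not in team_a)
            best = (team_a, team_b, gap)
    return best


team_a, team_b, gap = balanced_teams(
    {"ana": 5.0, "ben": 4.0, "cid": 3.0, "dan": 2.0, "eve": 1.0, "fay": 1.0}
)
```

Here the six hypothetical players can be split into two teams of equal total rating, so the gap is 0. Returning the few best splits instead of only the first would give several compositions to choose from.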

/games/evaluate

to evaluate, after a game, the rating modification that each player needs

Here is the documentation if you need more information.

5. Some use cases not supported by this API

If you think that you can use this model to predict real football game results and become rich betting all your money on them, you’re unfortunately wrong.

The strength of the project is its ability to learn quickly thanks to players being regularly mixed into different teams. If you want to apply it to real football, where players almost never change clubs, or to other friendly leagues where the squads are always the same, it won’t work well either, because the algorithms will never have the opportunity to learn more about the level of each individual player.

6. Conclusion

Finally, I can say that I am really proud to have solved a real sports problem using some mathematical principles combined with simple software development.

Here is the before/after result of applying this system in our league during one season:

I really hope that some of you will be able to use it in your own organizations and have more fun in your practice. If you have any feedback or questions, feel free to contact me.
