Rating Performances in Competitive Environments
First off, why?
Why would people want to rate performances in competitive disciplines? There are several use cases. The ones most familiar to me are the statistics presented to the public during games. In soccer, for example, the score is always shown (which is normal), but we also sometimes get ball possession, shots on target, number of passes, and so on.
From an external perspective, it is easy to read those metrics as indicators of which team is having the better game and which player is having the biggest impact.
In esports, this idea has been used even more heavily than in traditional sports. While extracting all this information from live footage (the position of the ball, who has it, and so on) has long been difficult in sports, it has never been a problem in esports. As a result, esports players are constantly surrounded by statistics, from the live broadcasts they watch (such as the League of Legends European Championship, LEC, or the Overwatch League) to their end-game screens, where they can access their game summaries.
In League of Legends, the Riot Games API is public, which means that anyone can access anyone's statistics at any time. Building on this, several third-party websites have been developed to provide players with statistics and indicators about their gameplay (op.gg, LeagueOfGraphs, or even Probuild).
One might therefore conclude that esports players would naturally use a lot of statistics to improve their gameplay and to explore and discover new strategies.
But in fact, we’ll see that this is not really the case in practice.
What do we do currently?
For a long time, people have used various indicators to rate themselves and others in League of Legends. These days, the most common way to tell whether you, a teammate, or an opponent is doing well during a soloQ game is to look at their KDA (kills/deaths/assists). Who has never heard “you’re 2/7, you’re bad”?
Deep down, people know this take is flawed, but they will still use it to their advantage. When they want to claim they are having a good game, they will point to their KDA if it is positive; if it is bad, they will say it is not their fault and look for external justifications: my jungler gave kills to my opponent, he baited me…
Depending on the situation, those justifications can be true, or just someone covering for their own mistakes. But what can we say about the KDA metric itself? Is it actually good? Should we keep using it? Change it? Is it relevant up to a certain level of play?
Soccer and other traditional sports face the same issue. If we look at the number of goals scored by each striker in a given league (say, Ligue 1), can we say that a PSG striker is better than an ASSE striker because he scores more?
Yes, probably. But would the PSG striker score as much if he played for ASSE? In a much weaker team he certainly wouldn't get as many chances, and since opponents would treat him as the ace, it would be much harder for him to create situations on his own. When he plays in Paris, there are ten other players on the field who can match him in skill.
So what should we do with the Goals Scored metric? Is it really relevant for judging a striker? Probably not entirely, although it can be used as a heuristic to sanity-check an impression. We can assume that a good striker must score goals at some point, so a striker with a low goal count probably isn't that good (this can be false, which is exactly why we treat it as a heuristic rather than a fact).
And the same goes for KDA: it's a heuristic. A player with a good KDA is more likely to be having a good game than one with a bad KDA. Even though this can be wrong, it helps in forming an opinion.
So why do we see those numbers all the time?
For anyone watching sports or esports competitions, these numbers are everywhere, and for a simple reason: they are convenient. They are easy to understand, and to the average viewer they feel relevant. It's also hard to do otherwise in that situation, because that same average viewer wouldn't really follow anything more complex.
For example, in 2021, Blizzard's Overwatch League partnered with IBM Watson to provide player ratings computed by an artificial intelligence. IBM did something similar for the Wimbledon tournament, but since I have no knowledge of that work, I won't go into more detail.
You can find details about how the Overwatch League power rankings work here.
Despite being described as a complex AI, it is in fact only a simple correlation computation between the outcome of the game and 360 datapoints. Then, using these correlations as coefficients, Watson can rate every player's performance and compare it to other players playing the same hero. The calculation is simply redone every week on the new data.
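To make the recipe concrete, here is a minimal sketch of what such a correlation-based rating could look like. To be clear, this is my reconstruction from the description above, not IBM's actual code: the DataFrame layout, the `win` column, and the per-champion standardization are all assumptions.

```python
import pandas as pd

def fit_weights(games: pd.DataFrame, stat_cols: list[str]) -> pd.Series:
    # Correlate each datapoint with the game outcome (1 = win, 0 = loss).
    return games[stat_cols].corrwith(games["win"])

def rate_players(games: pd.DataFrame, stat_cols: list[str]) -> pd.Series:
    weights = fit_weights(games, stat_cols)
    # Standardize stats within each champion so a performance is compared
    # against other players on the same pick.
    z = games.groupby("champion")[stat_cols].transform(
        lambda s: (s - s.mean()) / s.std()
    )
    # Rating = correlation-weighted sum of standardized stats.
    return z.fillna(0.0).mul(weights, axis=1).sum(axis=1)

# Redone every week on the new data, as described above:
# games["rating"] = rate_players(games, STAT_COLS)
```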
I developed a similar system on League of Legends competitive data (without knowing, at the time, that Watson worked this way). As a player-rating tool it is somewhat flawed, but it can be vastly improved and already represents one of the best systems currently available.
Current systems are flawed
First off, this algorithm is indexed on the champions played, which, to me, is an issue. When a player plays a weak champion, he will get high grades even if his impact is merely average. This can be a big problem for players with atypical champion pools like Lider (midlaner for Vitality in 2021). To most people, Lider was an average midlaner whose particularity was being a coinflip player (he could be very good or very bad). Yet my model ranked him as the second-best mid in Europe, which was really odd given that players like Humanoid, Caps, Nisqy, or even Vetheo were far more recognized for their performances.
The model ranked him high because he was playing assassins like Akali, which were really weak at the time. Since his impact was average while most Akali games went poorly, he scored well above the baseline Akali players would actually set. There are easy workarounds to imagine, though, such as adjusting ratings for the champion's overall performance, as sketched below, which would help in this kind of case.
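For instance, one hypothetical correction (the names and the scale of the correction are mine and would need tuning; this is not a tested method) would shift each rating by a champion-strength term derived from win rate:

```python
def champion_strength(games: pd.DataFrame) -> pd.Series:
    # Win-rate delta vs. 50% as a crude proxy for how strong a champion is.
    return games.groupby("champion")["win"].mean() - 0.5

def adjust_for_champion(games: pd.DataFrame, ratings: pd.Series) -> pd.Series:
    # A weak champion (negative strength) pulls the rating back down, so an
    # average game on a bad pick no longer tops the global ranking.
    return ratings + games["champion"].map(champion_strength(games))
```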
Another problem with this system is that it heavily favors players who win games over those who lose. Anyone who plays team games knows this: sometimes you will lose, and there is nothing you can do about it. You didn't really make any mistakes, you played your game correctly, you made plays when you could, but in the end it just doesn't work out. This can happen for multiple reasons: your teammates played badly, the team didn't work well together, the draft was heavily unfavored…
So within this system, you can’t be a good player in a bad team, and you can’t really be a bad player in a good team.
I'm not an Overwatch specialist by any means, and it is difficult for me to compare its players to one another. In League of Legends, however, I can cite plenty of examples that go against this idea (Patrik in XL, Teddy when he was in JAG, Broken Blade, and many more).
The thing about a ranking/rating system is that it is supposed to create some sort of consensus. Sometimes such a system will rate highly a player that people see as average, or rate poorly a player that people consider good. The issue is that when your system neither creates any form of consensus nor is considered approximately accurate, it becomes useless in the eyes of the spectators and actors of the league.
In League of Legends, the only attempt at a similar model I've seen is Reha's rating on the French broadcast. Other than that, most data providers covering competitions just show the raw-value heuristics we saw earlier, which isn't really useful.
To me, it feels really strange that with so much data available in esports, competitive organizations still use little to no data in their recruitment. These are investments of tens (if not hundreds) of thousands of euros or dollars, decided almost solely by managers and owners, and sometimes coaches.
But you might think that if no known model can properly rate players, then perhaps it simply isn't possible; the games might just be too complex to capture in data models.
I don't believe that take is true. I simply believe organizations lack the manpower to crack these problems. A recent example is Brentford FC, which climbed from the fourth division to the Premier League largely by using data analytics to make sound player investments.
What can be improved then?
In this section I will talk specifically about League of Legends, since that is where I specialize, but I believe some of these ideas can be applied to other games.
First off, the datapoints used can be improved. League of Legends is a game built on the differentials players create against their opponents (in gold, damage, sustain, CS, kills…). Those datapoints should therefore be relative (for example, a player's gold as a percentage of the whole enemy team's gold), and they should also cover every aspect of the game. One downside is that some elements of gameplay may end up overrated, but the results will be more accurate overall.
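As an illustration, here is a small sketch of how such relative datapoints could be derived. The `game_id`/`team` schema is a hypothetical layout, not the Riot API's:

```python
def add_relative_stats(games: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    # Turn raw per-player stats into shares of the enemy team's totals.
    out = games.copy()
    for col in cols:
        team_total = games.groupby(["game_id", "team"])[col].transform("sum")
        game_total = games.groupby("game_id")[col].transform("sum")
        # Everything in the game that isn't your team's belongs to the enemy.
        out[f"{col}_vs_enemy"] = games[col] / (game_total - team_total)
    return out

# e.g. add_relative_stats(games, ["gold", "damage", "cs"])
```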
Rating a single performance can also be put to very different uses: if you can rate a single game, you can rate the impact of a build, a rune, a matchup, or a combination of champions.
Finally, if you can rate a single game indexed on champion, you can rate the relative impact of all champions. This model can be really powerful, but it is also heavily tied to a specific metagame, which is why the whole dataset should be patch-dependent. I have already built a model like this, and it really helped predict what the metagame in the Worlds playoffs would be from the group stage alone (the rise of Aphelios/Ryze, for example). Applied to soloQ, with proper adjustments (such as correcting the value of champions that are historically good in soloQ but bad in competitive play), I believe these algorithms can save a lot of precious testing time at the beginning of each patch, letting teams focus directly on what matters.
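The aggregation step itself is simple once per-game ratings exist. A minimal sketch, assuming each row carries a per-game `rating` and a `patch` label:

```python
def champion_impact_by_patch(games: pd.DataFrame) -> pd.DataFrame:
    # Average per-game rating of each champion, computed patch by patch.
    return (
        games.groupby(["patch", "champion"])["rating"]
        .agg(["mean", "count"])
        .rename(columns={"mean": "impact", "count": "games"})
        .sort_values(["patch", "impact"], ascending=[True, False])
    )
```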
The next step in my work is to find a way to rate the actual level of a game. One issue with my model (and with IBM Watson's) is that it is based solely on the differentials created. So if a third-division team stomps another, the ratings sit on the same scale as a Worlds final. The goal, then, is to know what level each game corresponds to, but that is hard to establish because of the disparities between leagues (which is why competitions like Worlds, EU Masters, or the Champions League are thrilling to follow: the results are unpredictable).
And in franchised leagues like the LEC, how can one tell whether the last-place LEC team is better than the best ERL team? How do we know for sure the level difference between the last team in Ligue 1 and the first in Ligue 2? It's a difficult problem, but I believe deep learning can help here.
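Deep learning aside, a classical baseline for exactly this question is a Bradley-Terry-style strength model fit on cross-league matches (Worlds, EU Masters…), which plain logistic regression implements directly. This is not my model, just a hedged sketch of the idea:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_team_strengths(matches, teams):
    """matches: list of (team_a, team_b, a_won) tuples -> {team: strength}."""
    idx = {t: i for i, t in enumerate(teams)}
    X = np.zeros((len(matches), len(teams)))
    y = np.empty(len(matches))
    for row, (a, b, a_won) in enumerate(matches):
        # +1 for the first team, -1 for the second: the model learns one
        # strength per team such that strength differences predict wins.
        X[row, idx[a]], X[row, idx[b]] = 1.0, -1.0
        y[row] = a_won
    # Regularization keeps strengths identifiable when cross-league
    # matches (the only bridges between leagues) are sparse.
    model = LogisticRegression(C=1.0, fit_intercept=False).fit(X, y)
    return dict(zip(teams, model.coef_[0]))
```

Comparing the average strength of LEC teams with that of ERL teams would then give a rough level difference, with all the caveats sparse data implies.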
Conclusion
To me, data science models are what will drive player recruitment within a few years, especially for mid-tier organizations. When you don't win entire competitions like the Champions League or the LEC, it is difficult to get money outside of fundraising, and not every club can do that. One way to make money is to recruit players with potential, develop them to a higher level, and finally sell them on for more (Rogue, MAD Lions, SK Gaming…), even if it means the organization all but abandons the goal of winning it all. Yet today this is mostly done on decision-makers' gut feelings rather than on proper analytics, the way investors operate in finance.
With so much money at stake, I don't believe the current system will hold for long. If a team fails at its branding (which depends on winning), it becomes difficult to raise money in fundraising. And if an organization can't raise money, it can hardly justify spending large amounts on players based on a few people's gut feelings, especially now that player salaries and buyouts are reaching sky-high levels.
There is real potential in statistical modelling applied to performance, and I believe these methods aren't used as much as they should be. IBM Watson showed us only a glimpse of what is possible, and it was fairly accurate; and the Watson Power Rankings certainly weren't built by a team of Overwatch experts, which must have made it hard to tell whether they were heading in the right direction while working on the subject. I believe data scientists with deep knowledge of League of Legends could do much more than what has been done so far.
If you read this to the end, congratulations! If you want to discuss these topics with me (whether you agree or not), feel free to message me on Twitter.
I plan to release more blog posts on my work, so if you like technical content such as machine learning applied to esports, feel free to follow me (@lol_Xenesis on Twitter)!