How I Simulated the UEFA Europa League Playoffs

And predicted 12 out of 15 games

Juan Ramirez
ILLUMINATION
Aug 24, 2020

Source: UEFA official website

Every year, teams that are part of the Union of European Football Associations (UEFA) take part in one of the two most important tournaments in European football: the UEFA Champions League (UCL) or the UEFA Europa League (UEL).

The UEFA Champions League is contested by top-division European clubs that either won their domestic league or finished among its top teams. While each league has a different number of spots reserved in the UCL, most top leagues send their top 4 teams, and some other teams have to play qualification matches beforehand.

While most teams seek the glory and reputation that winning the UCL brings, there is another highly competitive tournament taking place at the same time, the UEFA Europa League. In order to qualify for the UEL, teams have to either:

  1. Finish near the top of their domestic league, with each league having a different number of reserved spots and its own qualification requirements.
  2. Win their UEL qualification matches to clinch a spot in the tournament’s group stage.

Moreover, teams that qualified for the UEFA Champions League but did not make it to its playoffs automatically secure a place in the UEFA Europa League playoffs.

While the UEFA Champions League seems to define the best team among the best leagues in the world, the UEFA Europa League is usually seen as a way to find the “Best of the rest”. However, while the UCL has been won by the same 5 teams in the last 10 years, with the same 4 teams usually making it to the semifinals, the UEL is usually contested by a far larger variety of teams.

Because the UEL features teams that are not usually in the European football spotlight, I wanted to see if I could use on-field analytics, along with other types of analytics, to accurately predict the outcome of the playoff series for the 2020 Europa League. The results? I was able to predict the winner of 12 out of the 15 playoff matches, including Sevilla winning the tournament as well as some other upsets. Here is how I did it.

Background

In a previous article, I went into detail on how I performed a similar simulation of the last match day of the Premier League and correctly predicted 70% of the games. One of the conclusions drawn from that prediction model was that we could use teams’ scoring ratios (Goals For/Goals Against) to accurately predict the outcome of a match 50% of the time.

While this conclusion matches the ones from different models that use scoring ratios to predict the outcomes of games, I wanted to take this a step further to see if the accuracy level of a scoring-ratio predictive model could be increased.

Since then, I took on the task of optimizing the old prediction model and applying it to the playoff series of the UEFA Europa League. The new approach takes into account external variables that the old model ignored, such as home and away advantage, match-ups between teams from different leagues, and playing games on the international stage. The result? I was able to tune the model to predict games correctly 80% of the time.

Data Collection

The first big change for this model was the type and quality of the data acquired. Previously, the model used a team’s scoring ratio from the whole season without keeping separate records for home and away games. While this worked, I hypothesized that if I broke this ratio down into a Home Scoring Ratio and an Away Scoring Ratio, the model could take into account a team’s performance under different sets of circumstances.
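The home/away split can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code behind the model: the `Match` record, the `scoring_ratios` helper, the team names, and the scores are all hypothetical, and counting zero goals conceded as one is an assumption made here to avoid dividing by zero.

```python
from dataclasses import dataclass


@dataclass
class Match:
    home: str
    away: str
    home_goals: int
    away_goals: int


def scoring_ratios(team, matches):
    """Return (home_ratio, away_ratio), each as goals for / goals against."""
    home_for = home_against = away_for = away_against = 0
    for m in matches:
        if m.home == team:
            home_for += m.home_goals
            home_against += m.away_goals
        elif m.away == team:
            away_for += m.away_goals
            away_against += m.home_goals
    # Assumption: count zero goals conceded as one to avoid dividing by zero.
    return home_for / max(home_against, 1), away_for / max(away_against, 1)


# Hypothetical results for an invented "Team A".
matches = [
    Match("Team A", "Team B", 3, 1),
    Match("Team A", "Team C", 2, 1),
    Match("Team B", "Team A", 2, 1),
    Match("Team C", "Team A", 2, 0),
]
home_ratio, away_ratio = scoring_ratios("Team A", matches)
print(home_ratio, away_ratio)
```

In this invented sample, the team scores 5 and concedes 2 at home but scores 1 and concedes 4 away, which is the kind of gap a single season-wide ratio would hide.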

This adjustment would allow me to see if there was any clear trend among teams and compare how each of them performed when playing with a home advantage or away. For example, Istanbul Başakşehir had one of the best Home Scoring Ratios, but when it came to away records, they had the 3rd worst ratio.

Another adjustment that needed to be made stemmed from the fact that these games were between teams from different leagues, so the model had to take into account the variables this new condition brought to the table. For instance, Rangers FC went undefeated in their last 5 games in the Scottish Premier League, while Wolfsburg had 2 losses in their last 5 games in the German Bundesliga. However, one could argue that the Bundesliga is a far more competitive league than the Scottish one, with some of the best teams and players in the world.

Therefore, an argument could be made that these teams are not on a level playing field, and that using things like Domestic Scoring Ratios or Domestic Winning Records would be an unfair comparison. To adjust the model for this difference in degree of difficulty, I had to change the source of the data I was collecting.

To take these new variables into account, I decided to use teams’ International Scoring Ratios and International Winning Records for the model. These figures were derived from the teams’ performances in any match against an international opponent in 2019 and 2020 in either the UEL or the UCL. These new data points were a more accurate representation of the situations teams were going to face in the UEL playoffs.
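As a rough sketch of that filtering step, the snippet below keeps only UEL and UCL matches from 2019 and 2020 and computes a winning record from them. The dictionary layout, the `international_winning_record` helper, and the sample matches are hypothetical and only illustrate the idea; the actual data source and format are not described here.

```python
INTERNATIONAL_COMPETITIONS = {"UEL", "UCL"}
YEARS = {2019, 2020}


def international_winning_record(team, matches):
    """Fraction of a team's UEL/UCL matches from 2019-2020 that it won."""
    played = wins = 0
    for m in matches:
        if m["competition"] not in INTERNATIONAL_COMPETITIONS:
            continue  # skip domestic league and cup games
        if m["year"] not in YEARS or team not in (m["home"], m["away"]):
            continue
        played += 1
        if m["home"] == team:
            wins += m["home_goals"] > m["away_goals"]
        else:
            wins += m["away_goals"] > m["home_goals"]
    return wins / played if played else 0.0


# Hypothetical matches: only the first two count for "Team A".
matches = [
    {"competition": "UEL", "year": 2019, "home": "Team A", "away": "Team B",
     "home_goals": 2, "away_goals": 0},
    {"competition": "UCL", "year": 2020, "home": "Team C", "away": "Team A",
     "home_goals": 1, "away_goals": 1},
    {"competition": "Domestic", "year": 2020, "home": "Team A", "away": "Team D",
     "home_goals": 5, "away_goals": 0},
    {"competition": "UEL", "year": 2018, "home": "Team A", "away": "Team B",
     "home_goals": 3, "away_goals": 0},
]
record = international_winning_record("Team A", matches)
print(record)
```

The lopsided domestic win and the 2018 game are both ignored, so a team cannot inflate its numbers against weaker league opposition.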

With all of these new variables taken into account, I simulated the remainder of the UEL after its restart. Here is how the simulation played out:

UEL Bracket simulated based on Scoring Ratios. Green numbers stand for correct predictions and Red numbers stand for wrong predictions

Round of 16

This round seemed to be the trickiest one. Not only was it the first international round of matches for most teams after the pandemic, but it also had a lot of moving pieces. Since the pandemic struck right in the middle of the round of 16’s first leg, some teams had already played their first games of the series and others had not.

Therefore, teams that had already played a game would finish their series in their respective countries, while series that had not started before the pandemic would be decided in a single-elimination match in Germany. For instance, Manchester United and LASK still played the second leg of their series in England while Inter Milan and Getafe played their only leg in Germany.

Out of the 8 matches, the new model correctly predicted the outcome of 7, while the old model predicted only 4 correctly. While this is already an improvement, I wanted to analyze the team that managed to beat the model more than once: Shakhtar Donetsk.

Shakhtar won their first leg against Wolfsburg and had to play their second game on their home turf in Ukraine. Shakhtar’s home scoring ratio against international teams was only 0.4, compared to Wolfsburg’s 2.3 scoring ratio in international away games. What should have been a walk in the park for Wolfsburg according to the model actually ended in a 3–0 win for Shakhtar.
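To make the comparison concrete, here is a minimal sketch that assumes the model simply favours the side with the higher venue-specific scoring ratio. The exact decision rule is not spelled out in this article, so the `predict_winner` helper and its tie-breaking choice are hypothetical; only the two ratios come from the text.

```python
def predict_winner(home_team, home_ratio, away_team, away_ratio):
    """Pick the side with the higher venue-specific scoring ratio."""
    # Assumption: ties go to the home side; the article does not cover ties.
    return home_team if home_ratio >= away_ratio else away_team


# Home ratio for Shakhtar and away ratio for Wolfsburg, as quoted in the text.
predicted = predict_winner("Shakhtar Donetsk", 0.4, "Wolfsburg", 2.3)
print(predicted)  # Wolfsburg
```

Under this rule the gap between 0.4 and 2.3 makes Wolfsburg the clear pick, which is why the actual result counts as an upset against the model.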

One thing that could explain this is the fact that Shakhtar came into the 2nd leg of the series with an advantage. Since they won the first leg 2–1, Wolfsburg would have to score at least 2 goals in order to make it to the next round. This scenario could mean that Wolfsburg played under a different kind of pressure than in previous games and therefore changed their whole tactics and play style. This was actually evident during the game, with Wolfsburg having a higher percentage of ball possession and attacking passes than their average. However, this was not enough to break Shakhtar’s defence, and Shakhtar’s ability to capitalize on mistakes and counter-attacks seemed to be the defining factor of the series.

While I thought that this specific scenario could explain the mistake in the simulation, Shakhtar would beat the model once again in the next round.

Quarter-Finals

Source: dailymail.co.uk

In this round, the new model accurately predicted 3 out of the 4 matches, with Shakhtar once again beating the model. All 4 games took place in Germany as single-elimination matches, meaning that no team carried an advantage into its match.

Shakhtar would face the Swiss side FC Basel in this round. The two teams had scoring ratios of 1.2 and 3.0 respectively. So, how did Shakhtar beat one of the teams with the highest international scoring ratio on away matches in the tournament?

At the end of the day, it came down to individual players’ performances as well as other external forces. Shakhtar were playing a really solid style of football coming into the series, and Junior Moraes was on a scoring streak of 7 goals in the last 8 games.

What this showed me was what I will refer to as “Bubbles” in the model. Shakhtar beating the model twice in a row meant that they were performing way better than expected. However, this also meant that there was a bubble forming around the team that could pop at any moment. Think of the bubble popping as “Bad luck catching up to them”; in this scenario, Inter Milan was a nail coming straight at them.

Semi-Finals

Source: Marca.com

Technically, the model only predicted one of the two games in the semi-finals correctly, with Shakhtar again being involved in the upset. However, this time it was not Shakhtar who beat the model, but Inter Milan.

Coming into the semis, Shakhtar had a 1.9 scoring ratio to Inter’s 1.7, yet they fell to the Italian side by a whopping 5–0. Was this a surprise? Based on scoring ratios alone, sure. However, the model had also shown that Shakhtar was in a bubble that could pop at any minute. Moreover, Inter Milan’s scoring ratio was increasing after every game by an average of 30% (2nd best in the playoffs). Putting these two together, one could see the inevitability of Shakhtar’s loss in the semifinals.
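One way to read "increasing after every game by an average of 30%" is as the mean game-over-game percentage change in a team's scoring ratio. The sketch below follows that reading; the `average_growth` helper is hypothetical and the sample ratios are invented for illustration, not Inter's real figures.

```python
def average_growth(ratios):
    """Mean game-over-game percentage change in a series of scoring ratios."""
    # Needs at least two ratios, and every ratio except the last must be non-zero.
    changes = [(curr - prev) / prev for prev, curr in zip(ratios, ratios[1:])]
    return 100 * sum(changes) / len(changes)


# Hypothetical ratios after three consecutive games: +20%, then +40%.
growth = average_growth([1.0, 1.2, 1.68])
print(round(growth, 1))
```

A steadily rising ratio like this suggests a team in improving form, which a single snapshot of the ratio before the match does not capture.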

There was a really important conclusion that came from the Shakhtar Bubble that I had not thought about before: While the model alone can help predict a good number of games, it can also find teams that are over-performing and are therefore bound to lose in the near future. This new conclusion can be used by sports gamblers to see which teams are the most likely to lose their winning streak sooner.
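The bubble idea can be sketched as a simple tally of wins the model did not predict. The three results below are the ones discussed in the rounds above; the `count_upset_wins` and `flag_bubbles` helpers and the `threshold` parameter are hypothetical and are not part of the original model.

```python
from collections import Counter

# (round, winner the model predicted, actual winner), from the rounds above.
results = [
    ("Round of 16", "Wolfsburg", "Shakhtar Donetsk"),
    ("Quarter-final", "FC Basel", "Shakhtar Donetsk"),
    ("Semi-final", "Shakhtar Donetsk", "Inter Milan"),
]


def count_upset_wins(results):
    """Count, per team, the wins that the model did not predict."""
    return Counter(
        actual for _, predicted, actual in results if predicted != actual
    )


def flag_bubbles(results, threshold=1):
    """Flag teams with at least `threshold` unpredicted wins as possible bubbles."""
    return {
        team for team, wins in count_upset_wins(results).items()
        if wins >= threshold
    }


print(count_upset_wins(results))
print(flag_bubbles(results))
```

With a threshold of 1, both Shakhtar and Inter are flagged; raising it to 2 would single out Shakhtar, the only team that beat the model twice.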

Final

Source: eluniversal.com.mx

The UEL final was between Italy’s Inter Milan and Sevilla from Spain. Coming into the game, most sports networks, analysts, and betting websites saw Inter Milan as a clear favourite to take the cup. However, the model said otherwise.

Sevilla came into the final with a 2.8 Scoring Ratio after beating teams like Wolverhampton Wanderers and Manchester United. On the other hand, Inter Milan came into the game with a 2.4 scoring ratio after beating Shakhtar 5–0. Not only did the model favour Sevilla over Inter on scoring ratio alone, but Inter also came into the final after beating a team with a higher scoring ratio. This meant that Inter had beaten the model before and were therefore playing in a bubble. And if the new model has taught us anything, it is that:

  1. Bad Luck is bound to catch up
  2. All bubbles are bound to pop

Not only did the model predict Sevilla’s win on paper, but it also displayed the circumstances under which Inter Milan was playing. The result? Sevilla pulled off a “Surprising Upset” over Inter with a 3–2 final score.

So, not only did my model go from 50% accuracy to 80% accuracy, but it also demonstrated that “wrong predictions” are not necessarily a bad thing. This new model can see which teams are over-performing and find bubbles in the tournament that are bound to pop at any minute. While there is still a lot to learn throughout my journey into sports analytics, this new model has taught me a lot of new things and is definitely a step in the right direction.
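The 80% figure can be checked by adding up the per-round results reported in each section. This is just a small arithmetic sketch; the `rounds` dictionary is a hypothetical way of organizing the numbers quoted in the text.

```python
# (correct predictions, matches played) per round, as reported above.
rounds = {
    "Round of 16": (7, 8),
    "Quarter-finals": (3, 4),
    "Semi-finals": (1, 2),
    "Final": (1, 1),
}

correct = sum(c for c, _ in rounds.values())
total = sum(n for _, n in rounds.values())
accuracy = correct / total
print(f"{correct}/{total} = {accuracy:.0%}")  # 12/15 = 80%
```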

Juan Ramirez

Business Student at Canada || Sports and Esports enthusiast || Writer for The Strangers Almanac