Forecasting the 2017–18 La Liga Season

Simulating the fates of Messi and Ronaldo

Alexander Powell
Kenyon College Sports Analytics
Feb 25, 2018

Soccer is typically an afterthought in the American sports landscape unless it’s World Cup season or the highly popular Champions League elimination stages each spring. But Americans miss the domestic league races that contain some of the best cultural rivalries. The Spanish premier league, La Liga, offers not only two of the game’s greatest players, Ronaldo and Messi, and two of the world’s most historic and dominant clubs, Real Madrid and Barcelona, but also an exquisite style of soccer.

Soccer, or football if you will, is a game of unique pace and incredible geometry. But scoring in soccer also gives the sport properties that make statistical modeling of a league like La Liga especially appealing. It has been well noted that the number of goals a team scores in a match roughly follows a Poisson distribution and that, even more so than in other sports, teams are more successful at home with the backing of their loyal fanáticos.
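To make the Poisson idea concrete, here is a minimal sketch of a single simulated match. The scoring rates, the `home_advantage` multiplier of 1.25, and the function names are illustrative assumptions, not values or code from the actual model.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_match(home_rate, away_rate, home_advantage, rng):
    """Return (home_goals, away_goals) for one simulated match.

    home_advantage is a hypothetical multiplier applied to the home side's
    expected goals; the rates are expected goals per match.
    """
    home_goals = poisson_sample(home_rate * home_advantage, rng)
    away_goals = poisson_sample(away_rate, rng)
    return home_goals, away_goals

rng = random.Random(2018)
# Two evenly matched sides (1.4 goals per match each), assumed values.
results = [simulate_match(1.4, 1.4, 1.25, rng) for _ in range(10_000)]
home_win_share = sum(h > a for h, a in results) / len(results)
away_win_share = sum(a > h for h, a in results) / len(results)
mean_home_goals = sum(h for h, _ in results) / len(results)
```

With two equal sides, the only thing separating `home_win_share` from `away_win_share` in this sketch is the assumed home multiplier.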

Thus, by simply analyzing how each team performs offensively and defensively, we can simulate the La Liga title race with a little more nuance. Because European soccer results are highly correlated with the results of the season prior, the Monte Carlo simulation uses a mixture model to estimate each team’s offensive and defensive prowess, weighting last season’s statistics less and less each match day. Teams are also rewarded in each simulation for a good “run of form”: a team’s statistics in the model are not fixed, so teams can go on a run, which allows for natural volatility.
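One simple way to picture this blending is a weighted average of last season’s and this season’s scoring rates, followed by a small adjustment for recent form. The sketch below is only an assumption about how such a mix could look; the function names, the five-match form window, and the 10% cap on the form bump are all hypothetical.

```python
def blended_rate(last_season_rate, this_season_rate, prior_weight):
    """Weighted mix of last season's and this season's goals per match.

    prior_weight is the share given to last season (between 0 and 1).
    """
    if not 0.0 <= prior_weight <= 1.0:
        raise ValueError("prior_weight must be between 0 and 1")
    return prior_weight * last_season_rate + (1.0 - prior_weight) * this_season_rate

def apply_form(rate, recent_points, max_bump=0.10):
    """Nudge a scoring rate up or down based on points from the last five matches.

    recent_points runs from 0 to 15; 7.5 is treated as neutral form (an assumption).
    """
    form = (recent_points - 7.5) / 7.5  # -1.0 (five losses) to +1.0 (five wins)
    return rate * (1.0 + max_bump * form)

# A side that scored 2.0 per match last season but only 1.0 so far this season.
early = blended_rate(2.0, 1.0, prior_weight=0.8)  # early season: lean on last year
late = blended_rate(2.0, 1.0, prior_weight=0.1)   # late season: lean on this year
hot = apply_form(early, recent_points=15)         # five straight wins
```

As the prior weight shrinks, the estimate drifts from last season’s level toward what the team has actually done this season, and the form term lets a hot streak temporarily lift it.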

But how can a model that uses only publicly available information hold up against forecasts that use expensive data from player tracking companies like Opta? For the 2016–17 season, in which the Galácticos of Real Madrid won the La Liga title, the correlation between our model’s preseason forecasts and the final results was just slightly below the 0.85 achieved by the seemingly more robust FiveThirtyEight model.

Our 2016–17 La Liga forecast (left) and FiveThirtyEight’s forecast (right) compared to the actual points accumulated by each team (x-axis). The line shown is y = x.
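The comparison above comes down to a correlation between forecast points and actual points. The sketch below assumes the Pearson correlation is the measure in question; the point totals are made-up numbers for illustration only, not real forecasts or results.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical numbers for illustration only.
forecast_points = [80, 72, 65, 50, 41]
actual_points = [88, 70, 60, 52, 35]
r = pearson_r(forecast_points, actual_points)
```

A value near 1 means the forecast ordered and spaced the teams much as the season did, even if every team’s total was off by a few points.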

While this Poisson simulation model does not incorporate information beyond goals scored, such as expected goals or chances created, its only major limitation appears to be that it is slightly more conservative: it clusters more teams toward the middle of the table rather than giving more weight to the typical league powers, Real and Barca. The strength of our model is its logical simplicity: it can achieve error similarly low to that of models that incorporate every intricacy of each game.

To predict which of the Spanish league superpowers would win the 2017–18 La Liga title, I increased the weight placed on the 2016–17 season’s statistics: instead of being a diminishing factor only through the 10th match day of the season, they remain a diminishing factor, albeit a small one, over the full La Liga schedule. This makes the model somewhat less conservative and, in turn, more realistic.
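The difference between the two weighting schedules can be sketched as follows. The linear decay, the starting weight of 1.0, and the `prior_weight` function itself are assumptions made for illustration; the article does not state the exact shape of the decay.

```python
def prior_weight(matchday, horizon, start=1.0, floor=0.0):
    """Linearly shrink the weight on last season's stats from `start` to `floor`.

    matchday is 1-based; from matchday `horizon` onward the weight stays at `floor`.
    """
    if matchday >= horizon:
        return floor
    progress = (matchday - 1) / (horizon - 1)
    return start + (floor - start) * progress

SEASON_LENGTH = 38  # La Liga: 20 clubs, each playing the other 19 home and away

matchdays = range(1, SEASON_LENGTH + 1)
# Earlier approach: last season's influence is gone by matchday 10.
short_schedule = [prior_weight(d, horizon=10) for d in matchdays]
# Revised approach: last season's influence fades over the whole schedule.
long_schedule = [prior_weight(d, horizon=SEASON_LENGTH) for d in matchdays]
```

Under the longer schedule, last season’s numbers still carry some weight at the midpoint of the season, which is what keeps the established powers from being pulled toward the middle of the table.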

The model has shown good predictive power since the early part of the season (see early model results here), when Barcelona and Valencia opened the season on a tear, and it continues to be in line with other models and expert opinion. The model predicts not only the point total for each team (in European soccer, teams receive one point for a draw and three points for a win) but also the probability that a team will win the league title, be relegated to the second tier of Spanish soccer (the bottom three teams are relegated each season), or qualify for the UEFA Champions League next season (in La Liga, the top four teams qualify). Here are the current forecasts for the 2017–18 La Liga season:

La Liga forecast — updated 2/23/18
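Probabilities like these can be read straight off a Monte Carlo simulation: play the season many times and count how often each club finishes first, in the top four, or in the bottom three. The sketch below is a simplified illustration under stated assumptions, not the author’s code: it uses eight made-up clubs with hypothetical scoring rates, ignores defensive strength, and breaks ties on points at random (La Liga itself uses head-to-head records).

```python
import math
import random
from collections import Counter

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_season(attack, home_advantage, rng):
    """Play a double round-robin and return {team: points}."""
    points = dict.fromkeys(attack, 0)
    for home in attack:
        for away in attack:
            if home == away:
                continue
            home_goals = poisson_sample(attack[home] * home_advantage, rng)
            away_goals = poisson_sample(attack[away], rng)
            if home_goals > away_goals:
                points[home] += 3          # three points for a win
            elif home_goals < away_goals:
                points[away] += 3
            else:
                points[home] += 1          # one point each for a draw
                points[away] += 1
    return points

def outcome_probabilities(attack, n_sims=2000, top_n=4, bottom_n=3,
                          home_advantage=1.25, seed=0):
    """Estimate title, top-n, and relegation probabilities by simulation."""
    rng = random.Random(seed)
    title, top, relegated = Counter(), Counter(), Counter()
    for _ in range(n_sims):
        points = simulate_season(attack, home_advantage, rng)
        # Random tie-break on equal points (a simplification).
        table = sorted(points, key=lambda t: (points[t], rng.random()), reverse=True)
        title[table[0]] += 1
        top.update(table[:top_n])
        relegated.update(table[-bottom_n:])
    return {
        team: {
            "title": title[team] / n_sims,
            "top": top[team] / n_sims,
            "relegated": relegated[team] / n_sims,
        }
        for team in attack
    }

# Hypothetical goals-per-match rates for eight made-up clubs.
rates = [2.4, 2.0, 1.7, 1.5, 1.3, 1.2, 1.0, 0.8]
attack = {f"Club {letter}": rate for letter, rate in zip("ABCDEFGH", rates)}
probs = outcome_probabilities(attack)
```

Each entry in `probs` is simply a share of simulated seasons, so the title probabilities across all clubs sum to one and the relegation probabilities sum to the number of relegation places.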

While Barcelona is expected to continue its dominance through the spring, the model also projects that the current top four will likely qualify for the Champions League, with a possible challenge from Sevilla and Villarreal. For Real, Barca, and Atletico this is par for the course, but a club like Valencia is ahead of schedule in its rebuild toward the run of success it had at the turn of the century. While a number of clubs are clustered in the middle of the table, it is the relegation battle that shows the most competition. With five clubs holding a greater than one-in-four chance of being relegated, clubs will be desperate not to drop points in their remaining league games.

As Americans turn on their televisions this spring, they shouldn’t forget the beauty of Spanish soccer. Stars like Lionel Messi will impress you, and the fans’ ability to wave flags for 90 straight minutes may exhaust you, but you might also find yourself enthralled by the beautiful game you see.

Alexander Powell is a senior Mathematics/Statistics major. You can email him at aepowell95@gmail.com and find his code for this project and others on his GitHub page: https://github.com/powellae.
