ANALYTICS MADE EASY

Expected Goals: Can They Predict Future Goals?

Better than the team average goals

octosport.io
Published in Geek Culture
6 min read, Apr 15, 2022


Expected goals were introduced in football by Sam Green in 2012 and rapidly became a standard metric in football analytics. Many articles have shown how you can use expected goals¹²³ to judge player or team quality over a season, but few explain how to use them for predictions.

The objective of this article is to fill that gap and show why expected goals can be considered a good predictive statistic. Using Poisson regression, we show that a team's past expected goals are a better predictor of its future goals than its past actual goals.

What are expected goals about?

When you read articles about expected goals, there is often confusion between the probability of scoring and the expected goal measure. Let us clarify. Each time a team has an opportunity to score, we can associate a goal probability with it. Note that a probability is just a number between 0 and 1. It also means that not all goals and shot attempts are equal: a goal can come from a shot with a probability near 0, while a shot that hits the crossbar can have a probability near 1.

A first example is Ricardo Nunes’s goal in 2021. The goalkeeper scored from his own penalty area. That shot had a probability of almost 0 of turning into a goal.

Example of a goal with a probability of almost 0. Video from Último Lance.

On the other hand, some missed chances have a probability of almost 1. A classic example is an open-goal miss that makes you scream “how can he miss that!”

Example of a non-goal with a probability of almost 1. Video from Guardian Football.

In expected goal theory, a goal scored is always worth less than 1 and a missed chance is always worth more than 0.

At the end of a football match, you can associate a probability with every scoring opportunity and goal the team had. Under the assumption that goal opportunities are independent⁴, the expected number of goals a team would score in a game is the sum of all these probabilities. This is the team's expected goals for that match, xG for short.
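As a minimal sketch of that sum, the snippet below adds up the goal probabilities of a team's shots in one match. The probabilities and the helper name `match_xg` are made up for illustration and do not come from any real xG model.

```python
# Hypothetical goal probabilities for each shot a team took in one match.
shot_probabilities = [0.05, 0.12, 0.76, 0.03, 0.31]


def match_xg(probabilities):
    """Return the match xG as the sum of the shot probabilities."""
    if any(p < 0 or p > 1 for p in probabilities):
        raise ValueError("each probability must lie between 0 and 1")
    return sum(probabilities)


xg = match_xg(shot_probabilities)
print(f"xG = {xg:.2f}")  # prints "xG = 1.27"
```

Five shots were taken here, yet the team "deserved" a little more than one goal, because most of the attempts were low-probability chances.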

When you watch football, you often think that a team should have won or did not deserve to lose. Expected goals tell you exactly that. A team can lose the game on the scoreboard but win it on expected goals, and vice versa.

Goals are the number of points actually scored by the team. Expected goals are the number of goals a team should have scored in that game according to the data.

How we calculate the probability is another story and is beyond the scope of this article. The web is full of technical articles on this topic. See for instance this article or this blog.

What these articles lack, however, is how to use expected goals to make predictions and what makes expected goals a valuable predictor of a team’s future matches. Let’s find out.

Showing that xG is a relevant predictor

Expected goals are descriptive statistics calculated at the end of the match. They just tell you the story, so how could they predict anything? They are also tied to a specific fixture, lineup, and strategy. The same problem applies to a single match result. But when you start looking at the team’s past form or the average number of goals scored in the season, you get a better idea of the team's strength.

For instance, you expect a team that always wins to also win its next matches. This is the idea of looking at past average results to predict the future. The good news is that we can do this with both expected goals and actual goals, and then use these averages to predict how many goals a team will score in the future.

First, let’s do some algebra and build our predictor. We want to predict the number of goals a team will score using its past results. The time of the game is denoted by t, and the previous match times by t-k. For instance, t-2 is two matches ago. Note that these statistics are computed before the match starts, so they are effectively a predictor of the match result at t. The predictor is simple: it is the average over the team's last K matches. We compute the statistics using goals (g) and expected goals (xG) as follows:

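The predictor of future goals using past goals (g):

$$\bar{g}_t = \frac{1}{K} \sum_{k=1}^{K} g_{t-k}$$

The predictor of future goals using past expected goals (xG):

$$\overline{xG}_t = \frac{1}{K} \sum_{k=1}^{K} xG_{t-k}$$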
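The past averages described above can be sketched with pandas. This is only an illustration on a made-up match history: the column names, the helper `past_average`, and the window K = 3 are assumptions, not the article's dataset (which uses the past 10 matches). The `shift(1)` keeps the current match out of its own predictor, so the value at t only uses matches t-1 to t-K.

```python
import pandas as pd

K = 3  # hypothetical window; the article uses the past 10 matches

# Toy match history for one team, in chronological order (made-up numbers).
matches = pd.DataFrame({
    "team": ["A"] * 5,
    "goals": [2, 0, 1, 3, 1],
    "xG": [1.4, 0.9, 1.1, 2.0, 1.5],
})


def past_average(df, column, k):
    """Average of the k previous matches per team, excluding the current one."""
    return df.groupby("team")[column].transform(
        lambda s: s.shift(1).rolling(k).mean()
    )


matches["goal_average"] = past_average(matches, "goals", K)
matches["xG_average"] = past_average(matches, "xG", K)
print(matches)
```

The first K rows have no predictor because fewer than K past matches are available. In practice such rows would be dropped before fitting a model.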

For each match, we can calculate both predictors and use them to predict the number of goals that will be scored. Since the number of goals to predict is a non-negative integer, we can assume it follows a Poisson distribution⁵. We then build a simple Poisson regression model to calibrate the prediction.
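For reference, a one-feature Poisson regression with the usual log link can be written as below. The symbols \lambda_t, \beta_0, \beta_1 and x_t are notation introduced here for illustration, with x_t standing for either the past goal average or the past xG average.

```latex
y_t \sim \mathrm{Poisson}(\lambda_t), \qquad
P(y_t = n) = \frac{\lambda_t^{n} e^{-\lambda_t}}{n!}, \quad n = 0, 1, 2, \dots

\log \lambda_t = \beta_0 + \beta_1 x_t
\quad \Longleftrightarrow \quad
\lambda_t = \exp(\beta_0 + \beta_1 x_t)
```

The predicted number of goals is the fitted mean \lambda_t, which is always positive thanks to the exponential.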

We prepared a dataset with both predictors for more than 10,000 matches across the globe over 5 years (it can be downloaded here). We will use the mean squared error loss to compare the predictions of two models:

  • The goal model fits the goals (y) using the team’s average goals over its past 10 matches. We name this variable goal_average.
  • The expected goal model fits the goals (y) using the team’s average expected goals over its past 10 matches. We name this variable xG_average.

In scikit-learn, we can easily do the regression using PoissonRegressor. As we do not want to use regularisation, do not forget to set alpha to 0.

from sklearn.linear_model import PoissonRegressor

# alpha=0.0 disables the L2 regularisation
regressor = PoissonRegressor(alpha=0.0)
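To see what the fitted regressor does with a past-average feature, here is a small sketch on synthetic data. The data-generating coefficients (-0.2 and 0.4) and the variable names are arbitrary assumptions and are not related to the article's dataset. It also checks that the prediction is the exponential of a linear function of the feature.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)

# Synthetic data (not the article's dataset): a past-average feature and
# goal counts drawn from a Poisson distribution whose mean depends on it.
x_average = rng.uniform(0.5, 2.5, size=2000)
y = rng.poisson(np.exp(-0.2 + 0.4 * x_average))

regressor = PoissonRegressor(alpha=0.0)
regressor.fit(x_average.reshape(-1, 1), y)  # features must be 2D

# With the log link, the prediction is exp(intercept + coef * feature).
new_average = np.array([[1.0], [2.0]])
predicted_goals = regressor.predict(new_average)
manual = np.exp(regressor.intercept_ + regressor.coef_[0] * new_average[:, 0])
print(predicted_goals, manual)
```

Both arrays should match, and a team with a higher past average gets a higher predicted number of goals.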

Each model's mean squared error is estimated using RepeatedKFold cross-validation. Again in scikit-learn, we do:

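import numpy as np
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Define the cross-validation scheme and the scoring method
cv_method = RepeatedKFold(n_splits=10, random_state=1256)
scoring = 'neg_mean_squared_error'
# scikit-learn expects a 2D feature matrix first, then the target
X_goal = np.asarray(goal_average).reshape(-1, 1)
X_xG = np.asarray(xG_average).reshape(-1, 1)
# Evaluate each model; scores are negated MSE, so flip the sign
goal_model_mse = -cross_val_score(regressor, X_goal, y, scoring=scoring, cv=cv_method)
xG_model_mse = -cross_val_score(regressor, X_xG, y, scoring=scoring, cv=cv_method)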

A notebook that reproduces these results is available here. Executing the code will show you the following box plot.

It shows that both the mean and the standard deviation of the cross-validated MSE are lower for the expected goal model. The average cross-validated MSE is 1.35 for the expected goal model against 1.38 for the goal model. This means the expected goal average is a better predictor of future goals than the actual goal average, and a more stable one.

The expected goal average is a better predictor of future goals than the goal average
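The comparison can be sketched end to end on synthetic data. This does not reproduce the 1.35 and 1.38 figures above; the hidden `strength` variable, the two noise levels, and the helper `cv_mse` are assumptions chosen only to show how the mean and standard deviation of the cross-validated MSE are obtained for two competing features.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(42)
n = 3000

# Synthetic stand-in for the dataset: a hidden team strength drives goals.
# feature_clean tracks the strength closely, feature_noisy tracks it loosely.
strength = rng.uniform(0.8, 2.2, size=n)
y = rng.poisson(strength)
feature_clean = strength + rng.normal(0.0, 0.1, size=n)
feature_noisy = strength + rng.normal(0.0, 0.6, size=n)


def cv_mse(feature, target):
    """Cross-validated MSE scores of a one-feature Poisson regression."""
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1256)
    scores = cross_val_score(
        PoissonRegressor(alpha=0.0), feature.reshape(-1, 1), target,
        scoring="neg_mean_squared_error", cv=cv,
    )
    return -scores  # scikit-learn returns negated MSE values


for name, feature in [("clean", feature_clean), ("noisy", feature_noisy)]:
    mse = cv_mse(feature, y)
    print(f"{name}: mean MSE = {mse.mean():.3f}, std = {mse.std():.3f}")
```

In this toy setup the less noisy feature gives the lower average MSE, which is the same kind of evidence the box plot provides for xG_average over goal_average.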

Conclusion

In this article, we tried to bridge the gap between expected goals and predictive modeling. Expected goals are indeed descriptive rather than predictive if you use them as they are delivered. They tell you what the average final score would have been if the match were played again and again under the same distribution of shots.
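The "played again and again" interpretation can be illustrated with a small simulation. The shot probabilities below are made up, and each shot is treated as an independent coin flip, which is the simplifying assumption discussed in note 4.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical shot probabilities for one team in one match (made-up numbers).
shot_probabilities = np.array([0.05, 0.12, 0.76, 0.03, 0.31])

# Replay the match many times: each shot is an independent Bernoulli trial.
n_replays = 100_000
draws = rng.random((n_replays, shot_probabilities.size))
goals_per_replay = (draws < shot_probabilities).sum(axis=1)

print("xG (sum of probabilities):", shot_probabilities.sum())
print("average goals over replays:", goals_per_replay.mean())
print("share of replays with 0 goals:", (goals_per_replay == 0).mean())
```

The average number of goals over the replays lands close to the xG, while individual replays still range from no goal at all to several goals.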

The word “expected” is misleading, though, because the xG of a single match is not itself a predictor. However, the average of past xG is a better predictor of future goals than the average of past actual goals.

References

[1] M. Brechot and R. Flepp (2018), Dealing With Randomness in Match Outcomes: How to Rethink Performance Evaluation in European Club Football Using Expected Goals, Working Papers 374, University of Zurich, Department of Business Administration (IBW).

[2] W. Spearman (2018), Beyond Expected Goals, MIT SLOAN Sports Analytics Conference.

[3] A. Rathke (2017), An Examination of Expected Goals and Shot Efficiency in Soccer, Journal of Human Sport and Exercise, 12(2), 514–529.

[4] This assumption is obviously wrong. Say a player shoots in front of the goal and hits the woodwork, and another player then scores into the open goal. The two xG values are dependent.

[5] M. J. Moroney (1956), Facts From Figures, 3rd edition, Penguin Books, London.


I am a data scientist writing about machine learning for football prediction at octosport.io.