Expected goals (xG) has very much led the way for data analytics into professional football and now also the mainstream media. Sky Sports and the BBC introduced the metric recently through the hugely popular television programmes Monday Night Football and Match of the Day. Stratagem Technologies have also made their data available, which has resulted in a huge spike in the sheer volume of analytics work in the public sphere. However, much of the metric’s use in the media and a lot of the public work has centred around single match xG scores and running xG plots. While it is great to see such a sizable increase in the popularity of analytics and metrics such as xG, now is a good time to review some of its limitations for those less familiar with the metric.
The interpretation of single match xG scores has interested me from the moment I read a great piece from Danny Page on this subject and variance. Recently Matt Rhein revisited the subject in another fantastic article, which was ultimately what rekindled my interest. These pieces only really touch on variance, however, so I will try to bring everything together by discussing some of the other considerations.
By adding together the scoring probability (xG) of each individual shot, you can form an xG score for a particular game. However, as discussed previously by Danny and Matt, simply adding these independent probabilities completely misses the variance.
Danny does a great job explaining this in his article:
Here’s a simple example to explain what’s missing: Let’s pretend we have Team Coin and Team Die. We are going to compare the “number of times the coin lands on heads” to the “number of times the die rolls six”, and compare them like we would a soccer or hockey match.
If you were to flip the coin four times, expected heads would equal 2. But we can observe anything from zero to four heads. Only stating xH = 2 would be leaving out a lot of information! Similarly, if we rolled a six-sided die 12 times, expected sixes would also equal 2. But we could have anywhere from zero to twelve rolls that would result in six.
So if we compared Team Coin vs Team Die, we could get any score from 0–0 to 4–12. Given 4 flips and 12 rolls, the expected scoreline of Team Coin 2–2 Team Die does not accurately portray this possibility space. So how do we better represent the result of Team Coin vs Team Die? We need to measure the variance of these results. Using an expected goals simulator, we can show that in addition to the expected score of 2–2:
In football terms, Team Coin essentially have four Opta “big chances”, which are converted at something around 50% and assigned an xG value of 0.5. This gives a cumulative xG total of 2. A “big chance” would be assigned in a situation such as when a player is put through on goal and has a shot one-on-one with the goalkeeper. Team Die, on the other hand, have twelve shots, each with a scoring probability of roughly 17% (one in six) or an xG value of about 0.17, again giving a total xG of 2. This is similar to your average attempt from around the penalty spot.
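To put rough numbers on the spread hiding behind those identical totals, here is a minimal sketch in Python. It assumes every shot is independent and treats each Team Die shot as exactly a one-in-six chance; the function name is my own and not part of any xG library.

```python
def xg_mean_and_variance(shot_probs):
    """Return (expected goals, variance of goals) for independent shots."""
    mean = sum(shot_probs)
    # Each shot is a yes/no event, so it contributes p * (1 - p) to the variance.
    variance = sum(p * (1 - p) for p in shot_probs)
    return mean, variance


team_coin = [0.5] * 4      # four "big chances"
team_die = [1 / 6] * 12    # twelve attempts from around the penalty spot

coin_mean, coin_var = xg_mean_and_variance(team_coin)
die_mean, die_var = xg_mean_and_variance(team_die)

print(f"Team Coin: xG = {coin_mean:.2f}, variance = {coin_var:.2f}")
print(f"Team Die:  xG = {die_mean:.2f}, variance = {die_var:.2f}")
```

Both sides come out with an xG of 2, but the variance is 1.0 for Team Coin and about 1.67 for Team Die, so the same headline number covers two quite different spreads of possible scorelines.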
By accounting for variance using Danny’s simulator, Team Coin (Team A) are actually the more likely winners as shown above, despite both sides having the same xG.
Although unlikely, it’s actually possible to be the more likely winners of a game despite having a lower cumulative xG score. With this in mind I encourage you to try Danny’s simulator and see this for yourself.
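If you would rather check the arithmetic yourself, the sketch below computes exact win, draw and loss probabilities from two lists of shot xG values. To be clear, this is not Danny’s simulator: it is my own minimal illustration that assumes every shot is independent, and the second match-up (one 0.8 chance against seventeen 0.05 chances) is a made-up example rather than real match data.

```python
def goal_distribution(shot_probs):
    """Exact probability of scoring 0, 1, 2, ... goals from independent shots."""
    dist = [1.0]  # before any shot is taken, zero goals is certain
    for p in shot_probs:
        nxt = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            nxt[goals] += prob * (1 - p)   # this shot is missed
            nxt[goals + 1] += prob * p     # this shot is scored
        dist = nxt
    return dist


def match_outcome(team_a_shots, team_b_shots):
    """Return (P(team A win), P(draw), P(team B win))."""
    dist_a = goal_distribution(team_a_shots)
    dist_b = goal_distribution(team_b_shots)
    a_win = draw = b_win = 0.0
    for goals_a, prob_a in enumerate(dist_a):
        for goals_b, prob_b in enumerate(dist_b):
            joint = prob_a * prob_b
            if goals_a > goals_b:
                a_win += joint
            elif goals_a == goals_b:
                draw += joint
            else:
                b_win += joint
    return a_win, draw, b_win


# Team Coin (four 0.5 chances) against Team Die (twelve one-in-six chances).
coin_vs_die = match_outcome([0.5] * 4, [1 / 6] * 12)

# Hypothetical match-up: the side with the LOWER total xG has one big chance.
one_big_chance = [0.8]            # total xG 0.80
many_small_chances = [0.05] * 17  # total xG 0.85
low_vs_high = match_outcome(one_big_chance, many_small_chances)

for label, (a_win, draw, b_win) in [("Coin v Die", coin_vs_die),
                                    ("0.80 xG v 0.85 xG", low_vs_high)]:
    print(f"{label}: A win {a_win:.1%}, draw {draw:.1%}, B win {b_win:.1%}")
```

Under these assumptions Team Coin win roughly 39.5% of the time against roughly 36.5% for Team Die, with the remainder drawn. In the made-up second example, the side with 0.80 xG wins about 33% of the time against about 28% for the side with 0.85 xG, which is exactly the “lower xG but more likely winners” case.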
Variability of Individual Shots
Football is a deeply complex game with numerous actions happening simultaneously, many of which aren’t picked up in the data used to create xG models. Variables such as the quality of the pass to the shooter, the pressure on the ball and the position of the defenders in front of goal, amongst others, all affect the actual probability of a shot going in. Over a big sample these variables tend to even out. For example, sometimes a forward will receive an easy pass and get a powerful shot away into the corner, while on other occasions the same player may struggle to hit the ball cleanly because the assisting pass is overhit or inaccurate. But over many attempts you would expect the striker to receive some excellent assisting passes and some poor ones, eventually averaging out somewhere in the middle. This makes xG a powerful measure of performance over a large sample. However, it also means that the xG of a shot can differ significantly from the actual chance of scoring over small samples like a single shot or even a single game.
In the example above, the cross is accurate and rolling nicely to Seferovic’s feet, the defender is stretching and struggling to get a block in, and the attempt is taken in close proximity to goal. You’d imagine this is a decent opportunity to score.
In the same game, Filipe Melo takes a shot from about twenty-five metres from goal. He’s also being closed down quite quickly and the shot is unlikely to be converted.
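As a toy illustration of that averaging effect, the sketch below imagines a model that gives every shot from one location an xG of 0.10, while the true chance is either 0.03 (poor assisting pass) or 0.17 (good assisting pass) depending on context the model cannot see. All three numbers are invented for the example, as is the assumption that good and poor passes are equally common.

```python
import random

MODEL_XG = 0.10              # hypothetical value the model assigns to every such shot
TRUE_PROBS = (0.03, 0.17)    # hypothetical true chances: poor pass, good pass


def average_true_probability(n_shots, rng):
    """Average true scoring chance across n_shots with randomly drawn context."""
    draws = [rng.choice(TRUE_PROBS) for _ in range(n_shots)]
    return sum(draws) / n_shots


rng = random.Random(42)  # fixed seed so the run is repeatable
single_shot = average_true_probability(1, rng)
large_sample = average_true_probability(5000, rng)

print(f"Model xG per shot:            {MODEL_XG:.2f}")
print(f"True chance, one shot:        {single_shot:.2f}")
print(f"Average true chance, 5000:    {large_sample:.3f}")
```

For a single shot the true chance is always 0.07 away from the model’s figure in this toy setup, whereas across thousands of shots the average sits very close to 0.10. That is the sense in which xG is reliable over a large sample but can mislead on one attempt.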
While I unfortunately don’t have the exact xG numbers, thanks to the excellent 11tegen11 I have attached his running xG plot from the same game.
Seferovic’s goal can be observed right in the top corner of the graph, while Filipe Melo’s long-range strike is directly below on the pink line. To the naked eye, both attempts look like very small chances with an xG of something around 0.03. Goals from tight angles assisted by crosses are probably quite rare, but this is likely down to crosses often coming into congested areas at speed, increasing the difficulty of the chance. Seferovic’s opportunity looks quite the opposite, as the ball is played nicely into his feet, and it is a far better opportunity than Filipe Melo’s long-range attempt.
In this scenario Marcano heads the ball back across goal, finding Soares unmarked and the goalkeeper left in a poor position to the side of the net. Soares gets a terrible header away, but under little pressure and with such a big target to hit as a result of the goalkeeper’s positioning, you would think this is a pretty good chance to score.
This attempt can be observed on 11tegen11’s plot just above the 60th minute mark and to the right in blue. Unfortunately again I don’t have the exact xG number but to the naked eye this looks like a shot with an xG of approximately 0.10, a fairly low figure under the circumstances.
Dangerous Phases of Play
Although non-shot xG models do exist, the most popular and proven models are shot based. This means dangerous pieces of play that don’t end in shots are not captured in xG models.
In this situation the goalkeeper is stranded outside his own box but makes a great block to prevent Fabricio going straight through on goal. However, if you imagine this situation being repeated, you would expect Fabricio to score some of the time under similar circumstances. This is a dangerous phase of play that wouldn’t be captured by xG models due to the lack of a shot.
Like above, this is another dangerous phase of play that doesn’t result in a shot. Paulinho makes the terrible decision to attempt a backheel for a teammate rather than take a touch and get a shot away. On another occasion it’s highly likely that he gets a good shot away. Again, this is a dangerous situation that is not captured by most xG models.
Stratagem have attempted to solve some of these issues with their subjective approach to data collection but that brings some other limitations which I won’t discuss here.
While xG explains what has happened during a game better than traditional statistics and goals scored, there are still some significant limitations. Variance is something that should be accounted for when interpreting single game xG scores, while the actual chance of scoring can differ significantly from the probabilities expressed by xG for single shots or small samples. Furthermore, xG models don’t capture dangerous phases of play that don’t end in shots.
With these limitations in mind I would encourage people new to analytics and wishing to produce content to move away from single match xG plots. These graphs have been done to death at this stage and more powerful applications of xG exist. Get creative with your content and reap the rewards.
Danny’s fantastic match simulator must be one of the most underutilised tools in sports analytics and can be found here.
Thanks also to 11tegen11 for publishing single match xG plots free of charge on a regular basis.