“The ball is floated in from the left wing.
The striker gets to it on the volley and that is a wonderful goal!! Straight into the top corner, the keeper didn’t stand a chance!”
Let’s ascribe an arbitrary expected goals (xG) value of 22% to the shot described above. In fact we can go further and even fill out the factors taken into account by the model according to the BBC website:
- Distance from goal — 10 yards
- Angle of the shot — 0 degrees (in line with the middle of the goal)
- Did the chance fall at the player’s feet or was it a header? — The ball arrived just below the player’s chest
- Was it a one on one? — Yes
- What was the assist like? — A cross
- In what passage of play did it happen? — Open play
- Has the player just beaten an opponent? — No
- Is it a rebound? — No
Now what if I told you that all of the above actually described two different goals?
In one the player is running in on goal and volleys home. In the second the player is not facing the goal, takes a couple of strides towards where the cross is delivered and has to perform a bicycle kick to execute the shot.
All of a sudden it seems remarkably unfair that both shots have an identical xG value. Running onto a cross and scoring is awesome, but there is a clear advantage in being able to actually see the goal when you strike the ball, and that is to say nothing of the athleticism involved in connecting with a bicycle kick.
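To make the point concrete, here is a minimal sketch in Python. It assumes the model sees only the eight factors listed above. The `ShotFeatures` class, the field names and the `toy_xg` function are all hypothetical stand-ins rather than anyone's real model, and the 0.22 is simply the arbitrary figure used earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotFeatures:
    """The eight factors from the list above (field names are illustrative)."""
    distance_yards: float
    angle_degrees: float
    body_part: str
    one_on_one: bool
    assist_type: str
    passage_of_play: str
    beaten_opponent: bool
    rebound: bool


# Hypothetical lookup standing in for a fitted model.
# 0.22 is the arbitrary value from the article, not a real estimate.
KNOWN_XG = {
    ShotFeatures(10.0, 0.0, "below chest", True, "cross", "open play", False, False): 0.22,
}


def toy_xg(shot: ShotFeatures) -> float:
    """Return an xG value that depends only on the recorded features."""
    return KNOWN_XG[shot]


# Goal one: the player runs in on goal and volleys home.
running_volley = ShotFeatures(10.0, 0.0, "below chest", True, "cross", "open play", False, False)
# Goal two: back to goal, bicycle kick. Nothing in the features records that.
bicycle_kick = ShotFeatures(10.0, 0.0, "below chest", True, "cross", "open play", False, False)

print(running_volley == bicycle_kick)               # the model cannot tell them apart
print(toy_xg(running_volley), toy_xg(bicycle_kick))  # so both get the same value
```

Whatever the real model looks like on the inside, if these are its only inputs then two shots with identical inputs must come out with identical xG.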
This is, of course, only one example, but it speaks to a wider problem with a statistic that seeks to predict how likely a goal is to be scored from a particular position on the pitch.
The use of analytics in sports can trace its origins back to the Moneyball revolution in baseball. As outlined by author Michael Lewis, the Oakland Athletics were able to succeed by utilising player statistics instead of relying solely on scouting reports. The book (and subsequent movie starring Brad Pitt) does well to show the backlash from the scouting community but in 2017 every baseball operation has a backroom staff dedicated to statistical analysis.
But baseball is very different to football.
Measuring a batter’s performance is no mean feat and still has to take into account a wide range of variables but the environment for each pitch is relatively controlled. The batter remains within a specific box next to home plate, he uses a regulation bat to try and hit a regulation ball that is thrown towards a (fairly) stable strike zone by a pitcher who is 60 feet 6 inches away. You get the idea.
In football there is much more noise going on. The 20 outfield players may all have a fairly stable position within a particular formation but their motivation within a passage of play can vary based on the circumstances and the position of the players around them. An attacker might only have one defender to beat before being afforded a clear chance on goal but if his team are 1–0 up in the 91st minute he won’t be blamed for heading towards the corner flag instead of potentially giving up possession in a bid for glory. In baseball a batter may take a walk, get a base hit or even hit a home run but the motivation is consistent: don’t make an out.
A more relevant example in relation to xG is defensive positioning. According to Patrick Lucey of STATS, a sports data firm, the proximity of a defender is taken into account but just how useful is this factor on its own? An opposition player may be 2 yards away when a shot is taken but there is a big difference between that player being a hulking defender bearing down on the striker compared to a tired midfielder desperately trying to get back to cover for his defence who have gone walkabout.
To broaden it out further think about this: what does it mean when a team or a player has a high xG but no actual goals to their name?
On the face of it this appears to show that they were especially wasteful, but again the context is hugely important. It may well be that the other team was down to nine men but already had a 1–0 lead and so was seeking to park the proverbial bus. This afforded the attacking team a greater number of chances (and perhaps even high xG chances), but the penalty area was more crowded than it usually would be in the full dataset from which the xG model is produced. One single event in a game can skew the results.
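For anyone unsure how a team ends up with "a high xG but no actual goals", a team's xG for a match is just the sum of the xG values of its individual shots. The short Python sketch below uses made-up shot values purely for illustration. They are not taken from any real game.

```python
# Hypothetical xG values for five shots by one team in one match.
# These numbers are invented for illustration only.
shot_xg_values = [0.22, 0.35, 0.10, 0.40, 0.08]
actual_goals = 0

# Team xG is the sum of the per-shot values.
total_xg = sum(shot_xg_values)

# A positive gap means fewer goals were scored than the model expected.
shortfall = total_xg - actual_goals

print(f"Total xG: {total_xg:.2f}, goals scored: {actual_goals}, shortfall: {shortfall:.2f}")
```

The arithmetic says the team "should" have scored about once. It says nothing about whether those five shots were taken into a penalty area packed with nine defenders.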
This is not to say that xG is entirely without merit. When Leicester City surprised everyone by winning the Premier League title it was interesting to note that they conceded around ten goals fewer than they were expected to. Indeed, when looking at defensive performances in general there is value. If your team has one of the higher ‘xG against’ rates in the league then it is likely that they need to look at the defensive positioning that is allowing these chances to be created. Overall, though, xG is little better than looking at the ‘total shots’ stat that has long been a feature of matchday broadcasts.
As someone who is a huge fan of statistics and the impact that they have had on sports from the baseball diamond to the tennis court, I have by no means enjoyed writing this article. I would love to see an equivalent analytics revolution in football and when that day comes I promise to fight the dinosaurs one at a time. That day, however, is not today, and there is a real concern that giving the current iteration of xG legitimacy by putting it on Match of the Day will actually prove to be a major setback for advanced statistics in football.