xG: Analytics in Football

Danny
4 min read · Oct 20, 2023

Chelsea against Arsenal is set to take place this weekend. A London derby. Two of the largest clubs in the country. Always a crowd pleaser. It is also the first time Havertz faces his former club, where his tenure ran from the glory days of the Champions League to periods when we genuinely had thoughts of relegation.

Reece James could potentially return from injury this weekend.

Chelsea’s journey this season has been odd, to say the least. After enduring what many consider to be one of the club’s worst seasons and a somewhat lacklustre start to the 23–24 campaign, things are beginning to turn around. The recent additions of Moisés Caicedo and Cole Palmer are bearing fruit, and young talents from the academy are stepping up. Currently sitting eleventh in the Premier League with three wins, two draws, and three losses, Chelsea can find solace in one metric: xG.

What is xG?

xG, short for Expected Goals, dives deeper into shot quality during a soccer match. Beyond merely counting shots, xG evaluates the likelihood of a shot resulting in a goal. The metric accounts for factors such as shot location, shooting method, assist type, and more.

For instance, in a game where one team had five shots on goal and the opposition had 20, it’s easy to deem the second team superior. Yet if the first team’s five shots were from point-blank range, each with a 0.75 likelihood of scoring, while the opposition’s were long-range efforts with far lower chances, then it is the quality of the shots that matters, not the quantity. xG captures this nuance, revealing that fewer shots can sometimes be more potent.
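To make the arithmetic concrete, here is a minimal sketch in Python. The 0.75 figure comes from the example above; the 0.03 assigned to each long-range effort is an assumed value chosen purely for illustration, not a measured one.

```python
# Per-shot scoring probabilities for the two hypothetical teams.
team_a_shots = [0.75] * 5    # five point-blank chances (value from the example above)
team_b_shots = [0.03] * 20   # twenty long-range efforts (0.03 is an assumption)


def total_xg(shot_probabilities):
    """A team's xG for a match is the sum of its per-shot probabilities."""
    return sum(shot_probabilities)


team_a_xg = total_xg(team_a_shots)
team_b_xg = total_xg(team_b_shots)

print(f"Team A: {len(team_a_shots)} shots, xG = {team_a_xg:.2f}")
print(f"Team B: {len(team_b_shots)} shots, xG = {team_b_xg:.2f}")
```

Under these assumptions the first team finishes on roughly 3.75 xG from five shots, against about 0.6 for the team that took twenty.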

One might wonder, “Why rely on xG when the actual scoreline is what matters?” While raw scores are essential, they don’t always reflect a team’s performance. A team might be creating numerous high-quality chances but failing to convert them due to sheer bad luck or a string of extraordinary saves by the opposition’s goalkeeper. Over time, these outliers balance out, and a team’s performance aligns more closely with its xG. This metric may also highlight Chelsea’s performance gaps compared to rivals.

Selecting your features for xG

To model xG, we first select features believed to predict the likelihood of a shot resulting in a goal. With data collection in football analytics becoming extremely advanced, the sky is the limit. For now, let’s take these four features as a starting point:

  1. Shot Angle
  2. Type of Play
  3. Type of Assist
  4. Defensive Pressure
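Before a model can use these features, they have to be turned into numbers. The sketch below shows one possible encoding in Python. The category names, the `Shot` class and the use of a nearby-defender count as a stand-in for defensive pressure are all assumptions made for illustration; a real data provider will define its own.

```python
from dataclasses import dataclass

# Hypothetical category lists; a real data provider will define its own.
PLAY_TYPES = ["open_play", "set_piece", "counter_attack"]
ASSIST_TYPES = ["through_ball", "cross", "cutback", "none"]


@dataclass
class Shot:
    angle_degrees: float    # angle of the goal mouth as seen from the shot location
    play_type: str          # one of PLAY_TYPES
    assist_type: str        # one of ASSIST_TYPES
    defenders_nearby: int   # simple stand-in for defensive pressure (assumption)


def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode(shot):
    """Flatten a Shot into a purely numeric feature vector."""
    return (
        [shot.angle_degrees]
        + one_hot(shot.play_type, PLAY_TYPES)
        + one_hot(shot.assist_type, ASSIST_TYPES)
        + [float(shot.defenders_nearby)]
    )


shot = Shot(angle_degrees=25.0, play_type="open_play", assist_type="cross", defenders_nearby=0)
print(encode(shot))
```

The two categorical features are one-hot encoded, so four features become a vector of nine numbers here. When feeding a regression, one category per feature is often dropped to serve as the baseline.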

Leveraging Logistic Regression

Logistic regression is more than just linear regression fine-tuned for binary outcomes. While it leverages a similar foundation, it uniquely scales the outcome to lie between 0 and 1. This translates to: a value close to 0 indicates a very low probability of scoring, while a value nearing 1 implies a high likelihood of a goal.

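What differentiates logistic regression is its logit transformation. Rather than modelling the probability p of a goal directly, it models the natural logarithm of the odds, p / (1 − p), as a linear function of the features. This quantity is termed the log-odds or the logit. Inverting it with the sigmoid function maps any input into an output that ranges between 0 and 1. The equation will look like:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

Here p is the probability that the shot results in a goal, x_1, …, x_n are the numerically encoded features, and the β are the coefficients learned from the data.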

Thus, the power of logistic regression doesn’t just lie in its predictive capability, but also in its interpretability. Each coefficient provides insight into the relationship between its corresponding predictor variable and the log-odds of the outcome, allowing for nuanced analysis in domains like football analytics, medical diagnostics, and more. Combining these variables, one can design a detailed logistic regression model to estimate whether a shot taken 20 metres from goal, at a fast pace, from a pass crossed in from the side and under no defensive pressure is likely to result in a goal.
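As a rough illustration of both points, prediction and interpretability, here is a small Python sketch. The intercept, the coefficients and the feature names are invented for this example and are not fitted to any real data; a real model would learn them from historical shots. Distance from goal is used here in place of shot angle simply because the example shot is described by its distance.

```python
import math

# Hypothetical coefficients, invented for illustration only.
# A real model would learn these from historical shot data.
INTERCEPT = -1.0
COEFFICIENTS = {
    "distance_m": -0.10,        # log-odds change per extra metre from goal
    "assist_is_cross": -0.40,   # log-odds change when the assist is a cross
    "defenders_nearby": -0.30,  # log-odds change per nearby defender
}


def sigmoid(z):
    """Map log-odds z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_xg(features):
    """Linear combination of the features, passed through the sigmoid."""
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return sigmoid(log_odds)


# A shot 20 metres out, from a cross, with no defensive pressure.
shot = {"distance_m": 20.0, "assist_is_cross": 1.0, "defenders_nearby": 0.0}
print(f"xG = {predict_xg(shot):.3f}")

# exp(coefficient) is the multiplicative change in the odds per unit increase.
for name, beta in COEFFICIENTS.items():
    print(f"{name}: odds multiplied by {math.exp(beta):.2f} per unit")
```

The final loop is where the interpretability shows. Exponentiating a coefficient gives the factor by which the odds of scoring change for a one-unit increase in that feature, with everything else held fixed.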

Chelsea have in fact won every game based on xG so far (UnderStat.com)

So, how does this fare in the case of Chelsea? Chelsea have in fact won every match this season when their xG is compared against their opponents’. That means that, based on the quality of chances they created versus their opponents in each game, Chelsea should have come out victorious in all of their matches. The consistently positive xG difference demonstrates that Chelsea’s underlying performance has been promising. This, coupled with their talent and determination, offers a glimmer of hope that the team could turn their fortunes around and make a strong push for the rest of the season.
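The comparison itself is simple to express in code. The Python sketch below classifies each match by which side had the higher xG and adds up the difference. The numbers are placeholders invented for the example and are not Chelsea’s actual figures, which would come from a source such as Understat.

```python
# Placeholder (xg_for, xg_against) pairs; these are NOT real match figures.
matches = [
    (1.8, 1.1),
    (2.3, 0.9),
    (1.4, 1.2),
]


def xg_result(xg_for, xg_against):
    """Classify a match by which side created the better chances."""
    if xg_for > xg_against:
        return "win"
    if xg_for < xg_against:
        return "loss"
    return "draw"


results = [xg_result(f, a) for f, a in matches]
xg_difference = sum(f - a for f, a in matches)

print(results)
print(f"Cumulative xG difference: {xg_difference:+.2f}")
```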

Conclusion

The world of soccer analytics isn’t limited to logistic regression. More advanced techniques such as Random Forests, Neural Networks, and Boosting Algorithms can offer even more precise predictions. The xG methodology has also ushered in metrics like Expected Assists (xA) and xBuildup, broadening the analytic horizon in soccer. Expected value, once a term embraced mainly by statisticians in finance or marketing, has made its way onto the football pitch in the form of xG, showing just how far data science can reach.

Finally, writing has always been a cherished activity for me. However, since high school, the opportunities to write freely have become increasingly rare. This “blog” offers me a bit of time to talk about things I’m passionate about, paired with a bit of data. So, I hope to see you next time on this little “blog” and, trust me, my next post won’t be about sport.

As for Sunday? My prediction:

Chelsea 2, Arsenal 1.

The heart of London, it seems, still beats in Blue.
