Explaining Expected Threat (xT) In Football Analytics Using Markov Models & Its History — Part I

Gaurav Krishnan
After The Full Time Whistle
6 min read · Mar 12, 2022


One of the most interesting emerging areas of football analytics is Expected Threat (xT). We’ve slowly become used to metrics like xG (Expected Goals) and xA (Expected Assists), which allow us to predict and evaluate player performance.

What Is Expected Threat?

Expected Threat, or xT, measures how likely a team is to score given where the ball is on the pitch and how likely the player in possession is to shoot, pass or dribble from there. We then evaluate an action, i.e. a shot, pass or dribble, by how much it changes the probability of scoring.

To do that, the football pitch is divided into zones, and each zone is assigned a value based on the probability of scoring from that position on the pitch.

Expected threat (xT) for different parts of the pitch. These show the probability of a goal being scored given that a team has possession at this point of the pitch. Created by Jernej Flisar at Twelve.

So of course areas inside the box have a higher probability of scoring than midfield or the wings.

As David Sumpter, author of the book Soccermatics writes:

If a player makes a pass which moves the ball from a place where it is unlikely for their team to score, to a place where they are more likely to score, then they have increased the xT in favour of their team. In general, the nearer you get the ball to the goal the more likely your team is to score (although if you look carefully passes back to the goalkeeper are also valuable).

The Sarah Rudd Model

Although it seems like a newer concept, it was in fact invented by Sarah Rudd in 2011.

Rudd built the earliest xT model using Markov chains (as shown in the figure below).

Sarah Rudd’s first xT model in 2011

Because football is so random, we can only make this prediction over a short horizon, ideally 1, 5 or 10 seconds. Beyond 10 seconds, say a minute or more, there are simply too many possibilities.

In Rudd’s model, we look 5 seconds into the future from the position in the center outside the box, marked M. From there, the probability of scoring a goal is 5%, of the ball going to the wing (marked W) is 10%, of it being played into the box (marked B) is 20%, of losing the ball (marked L) is 40%, and of it staying in midfield is 25%. These five outcomes add up to 100%.

This sets up what is called a Markov chain, which basically means that if the ball is moved out to the wing, at position W, you start the whole analysis again from position W; if it’s then moved into the box at B, you begin again from that point, and so on.

To take the model further, Rudd used a transition matrix.
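To make the "start again from the new position" idea concrete, here is a minimal Python sketch that follows a single possession through the chain. The M row uses the percentages quoted above; the W and B rows are not from Rudd's model, they are made-up numbers so that the example can run.

```python
import random

# Transition probabilities over one 5-second step.
# The "M" row uses the figures quoted above; the "W" and "B" rows are
# made-up numbers for illustration only.
TRANSITIONS = {
    "M": {"G": 0.05, "W": 0.10, "B": 0.20, "L": 0.40, "M": 0.25},
    "W": {"G": 0.02, "W": 0.30, "B": 0.18, "L": 0.40, "M": 0.10},  # hypothetical
    "B": {"G": 0.25, "W": 0.05, "B": 0.20, "L": 0.45, "M": 0.05},  # hypothetical
}
ABSORBING = {"G", "L"}  # goal scored, or possession lost


def simulate_possession(start, rng):
    """Follow the chain from `start` until a goal or a loss of possession."""
    path = [start]
    while path[-1] not in ABSORBING:
        row = TRANSITIONS[path[-1]]
        # The next state depends only on the current one (the Markov property).
        next_state = rng.choices(list(row), weights=list(row.values()))[0]
        path.append(next_state)
    return path


rng = random.Random(0)  # seeded so the run is repeatable
print(simulate_possession("M", rng))
```

Each step only looks at the row for the current position, which is exactly what makes this a Markov chain.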

Sarah Rudd’s Transition Matrix

Rudd divided the pitch into seven areas, as shown in the figure above: 0 is the area where not much is likely to happen, 2 is just outside the box, 1 and 3 are on either side of it, 4 and 6 are on the flanks, and 5 is inside the box.

She then used a transition matrix, which gives the probability of the ball moving between these areas (2 to 5, 2 to 6, 6 to 5, 4 to 5 and so on), with the final transition being either a goal or the end of possession (marked as 1 in the above matrix). She used Opta data to estimate these probabilities and parameterize the model.

She could then evaluate players by how much their actions changed the probability of the possession ending in a goal.
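As a rough illustration of what "parameterizing the model from data" can look like, the sketch below counts transitions in a tiny event log and turns the counts into probabilities. The event list is invented and the function name is my own; real Opta data is far richer, so treat this as an assumption-laden toy rather than Rudd's actual procedure.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (zone the ball was in, what happened next).
# Zones are numbered 0-6 as in the figure; "goal" and "lost" end the possession.
events = [
    (2, 5), (2, 5), (2, 6), (2, "lost"), (2, 2),
    (6, 5), (6, "lost"), (6, 6),
    (5, "goal"), (5, "lost"), (5, "lost"), (5, 2),
]


def estimate_transition_matrix(events):
    """Turn raw (from, to) counts into row-wise transition probabilities."""
    counts = defaultdict(Counter)
    for origin, destination in events:
        counts[origin][destination] += 1
    matrix = {}
    for origin, row in counts.items():
        total = sum(row.values())
        matrix[origin] = {dest: n / total for dest, n in row.items()}
    return matrix


matrix = estimate_transition_matrix(events)
print(matrix[2])  # probabilities of each outcome when the ball starts in zone 2
```

Each row of the result sums to 1, because something always happens next when the ball is in a given zone.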

Sarah Rudd’s player goal scoring probability value based on transitions

So if player 1 moves the ball to player 2 in another zone, as shown in the above figure, and the probability of scoring drops from 0.25 to 0.17, player 1 gets a negative value of -0.08. But if player 2 then makes a pass to player 3 that increases the probability of scoring from 0.17 to 0.28, player 2 gets a positive value of +0.11. And finally, if player 3 scores a goal, the probability goes from 0.28 to 1, so player 3 gets a value of +0.72.
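The arithmetic above is just the difference in scoring probability before and after each action. A small Python sketch, with a function name of my own choosing, reproduces the three values:

```python
def action_values(scoring_probabilities):
    """Credit each action with the change in scoring probability it caused."""
    pairs = zip(scoring_probabilities, scoring_probabilities[1:])
    return [round(after - before, 2) for before, after in pairs]


# Scoring probability before each action, ending with 1.0 for the goal itself.
chain = [0.25, 0.17, 0.28, 1.0]
print(action_values(chain))  # [-0.08, 0.11, 0.72]
```

Summing a player's values over many possessions would give one simple way of rating their contribution.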

This was the basic model invented by Rudd.

Rudd gave a talk about this model all the way back in 2011 and went on to work in analytics at Arsenal.

Karun Singh’s Model

Karun Singh, an analyst from the US, published another model on his blog in 2018, which he dubbed xT.

Singh’s model broke the pitch into a finer grid, and he proposed an equation to calculate the value of each zone.

There’s some math involved in this, which I will break down.

Karun Singh’s xT model

As shown in the figure above, V(x,y) is the value of each zone (x,y) on a 2D map of the pitch, where x runs along the length of the pitch from goal to goal and y runs across its breadth.

s(x,y) is the probability that a player in zone (x,y) takes a shot, and g(x,y) is the probability that a shot from that zone results in a goal.

The value of a zone has two parts. The first is the shooting part: the shot probability multiplied by the goal probability, i.e. s(x,y) × g(x,y). The second is the moving part: m(x,y), the probability of moving the ball by a pass or dribble, multiplied by the sum, over every new position (z,w), of the transition probability T from (x,y) to (z,w) times the value at that new position, V(z,w). Put together:

V(x,y) = s(x,y) × g(x,y) + m(x,y) × Σ over (z,w) of T[(x,y) → (z,w)] × V(z,w)

So V(x,y) gives the value of the area where the ball is, V(z,w) gives the value of the area where the ball goes next, and the same relationship then applies again from the new area.

The data needed to estimate these parameters is available from providers such as Opta or StatsBomb, and you then solve for the values V.

So to make it sound a bit cooler, Karun Singh dubbed V(x,y) as xT(x,y), or Expected Threat, in 2018.
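One way to "solve for V" is to start with every zone at zero and keep re-applying the relationship V = s × g + m × Σ T × V until the numbers stop changing. The Python sketch below does this on a three-zone strip of pitch. Every number in it is hypothetical, and I assume that a player either shoots or moves (so m = 1 - s) and that a failed move simply loses the ball and is worth zero.

```python
# All numbers below are hypothetical, chosen only to make the sketch run.
zones = [(0, 0), (1, 0), (2, 0)]                 # x grows toward the opponent's goal
s = {(0, 0): 0.00, (1, 0): 0.05, (2, 0): 0.40}   # P(shoot) from each zone
g = {(0, 0): 0.00, (1, 0): 0.03, (2, 0): 0.15}   # P(goal | shot) from each zone
m = {zone: 1.0 - s[zone] for zone in zones}      # P(move) = 1 - P(shoot)
# T[a][b] = P(ball ends up in b | a move is attempted from a).
# Rows sum to less than 1; the remainder is a failed move (possession lost).
T = {
    (0, 0): {(0, 0): 0.30, (1, 0): 0.50, (2, 0): 0.05},
    (1, 0): {(0, 0): 0.20, (1, 0): 0.35, (2, 0): 0.25},
    (2, 0): {(0, 0): 0.05, (1, 0): 0.25, (2, 0): 0.40},
}


def solve_xt(iterations=200):
    """Repeatedly apply V = s*g + m * sum(T * V), starting from V = 0."""
    V = {zone: 0.0 for zone in zones}
    for _ in range(iterations):
        V = {
            a: s[a] * g[a] + m[a] * sum(T[a][b] * V[b] for b in zones)
            for a in zones
        }
    return V


xT = solve_xt()
for zone in zones:
    print(zone, round(xT[zone], 4))
```

With these made-up inputs the zone nearest the goal ends up with the highest value, which matches the intuition that moving the ball forward usually increases the threat.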

You can also find interactive visual charts of xT for each Premier League team that season (although they were made in 2018).

Karun has done a fantastic job theorizing and coining the term xT. The only problem is that he’s an Arsenal fan ;)

You can check Singh’s blog post by clicking the following link: https://karun.in/blog/expected-threat.html

You can also view this video made by the YouTube channel ‘Friends Of Tracking’ for a more in-depth explanation of xT and its history.

I must pause to quote David Sumpter: as he says, we must acknowledge Sarah Rudd’s foresight and insight in creating this model in the male-dominated field of football analytics.

He writes:

It is extra important that when we have a clear example of an idea from a female scientist in a male-dominated area, which is now used everywhere, that we pause to make sure everyone knows where it came from. There is a history of womens’ contributions being forgotten in Science. It would be embarrassing if we made this same error in the so-called modern era, especially in football.

That being said, I hope my explanation has broken xT down in a way that’s easy to understand, even if you’re not the best at math.

There are far more advanced ways of calculating xT that have evolved since Rudd’s talk in 2011.

I will go into more depth in my next post about possession chains & how ball movement contributes to xT. That essentially means that xT is determined by how the ball is moved, rather than where the ball is.

But for now this is the basic explanation of Expected Threat or xT.

You can follow me on my Twitter accounts below:

  1. Personal: https://twitter.com/gaurav_krishnan
  2. Football Analytics: https://twitter.com/statocastgaurav
  3. Artist/Music: https://twitter.com/ghost_intent



Writer / Journalist | Musician | Composer | Music, Football, Film & Writing keep me going | Sapere Aude: “Dare To Know”| https://gauravkrishnan.space/