Understanding Position Based Expected Threat (xT)

Ishdeep Chadha
Jan 14, 2024

What are some of the popular metrics we see online to evaluate a player? Let’s say we want to summarize an attacking player’s performance over a season. What would we look at? Goals, assists, and then what?

Like most fans, when I started watching football I compared players by how many goals they scored, who was assisting their teammates more or, for defenders, who was making more goal-saving tackles and interceptions. But is that all we see in a football game? No. These are mostly just the end products of a long series of events happening in different parts of the field: line-breaking passes, dribbles, interceptions and blocks, for example.

So the next question arises: how do we value these actions? This is where Expected Threat comes in. Let’s take a look at the definition first:

Expected Threat (xT) is calculated by laying a ‘value surface’ over a football pitch to divide it into zones, where each zone has a value assigned to it based on how likely a goal is to be scored from that zone. Players can then be credited for moving the ball from zone to zone.

The zones referred to in the definition can be visualized in the picture below, provided by Twelve Football:

Expected threat (xT) for different parts of the pitch. The values show the probability of a goal being scored given that a team has possession at that point of the pitch, assuming the team is attacking from left to right.

To understand this picture more clearly, let’s try to calculate the value created by the pass shown in the video below, from KDB to Leroy Sane (btw this is my all-time favorite pass).

If we locate the point from which KDB makes the pass on the picture above, the probability of scoring from there (point A) is roughly between 0.012 and 0.014 (these values will not hold for every match, as they were calculated for a specific Premier League season). At the point on the pitch where Sane received the ball before scoring, the probability rises to between 0.07 and 0.08. The expected threat of the pass is basically the difference between the probabilities of scoring at the two locations. So we can see why this pass is so appreciated: it not only takes 4–5 defenders out of the picture but also raises the probability of scoring roughly fivefold.
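The arithmetic behind that pass can be written out as a tiny sketch. The two zone values are simply the midpoints of the ranges read off the picture, so they are illustrative, and the helper name `move_xt` is made up for this example.

```python
def move_xt(xt_start: float, xt_end: float) -> float:
    """Threat added by moving the ball: xT where it ends minus xT where it started."""
    return xt_end - xt_start


# Midpoints of the ranges quoted above (illustrative values, not exact model output).
kdb_zone = 0.013   # between 0.012 and 0.014
sane_zone = 0.075  # between 0.07 and 0.08

added = move_xt(kdb_zone, sane_zone)
print(f"xT added by the pass: {added:.3f}")
```

A pass that moves the ball backwards into a less dangerous zone would get a negative value under the same rule.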

This was just a novice explanation of how xT can be evaluated. If you want to understand how this metric is calculated from event-level data, I would highly recommend going through this article by Karun Singh. He introduced the metric in 2018, and it is arguably the best-known possession value model in the industry. However, it only values actions that move the ball from zone to zone, such as passes and carries, excluding defensive actions and shots. It is also typically implemented on limited event data, ignoring factors such as whether the player was under pressure when making the action.

Case Study: Finding the players with the highest xT

Now let’s play with some Python code to figure out how we can assign xT to different parts of the pitch (as shown in the figure above) and who the top 5 players were by accumulated xT in the 2017/18 EPL season (I would have loved to use more recent data, but as of now I couldn’t find any).

Actions moving the ball

To calculate the Expected Threat we need actions that move the ball. First we filter them from the database. Then we remove passes that ended outside the pitch. To make our calculations easier we create new columns, one for each coordinate. Finally, we plot the start locations of the ball-moving actions on a 2D histogram. Note that dribbling is also an action that moves the ball.

This shows the count of events that move the ball (passes, dribbles, etc.) in each bin. The pitch has been divided into 16 x 12 bins.
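The filtering and binning steps can be sketched in plain Python. This is a minimal illustration rather than the code behind the figure: the `moves` tuples, the 105 x 68 pitch dimensions and the helper names `on_pitch`, `bin_index` and `count_starts` are all assumptions made for the example.

```python
# Grid size from the article: 16 bins along the pitch, 12 across it.
N_X, N_Y = 16, 12
# Assumed pitch dimensions in metres (hypothetical; event data often uses 0-100 scales).
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0


def on_pitch(x, y):
    """True if the point lies inside the pitch boundaries."""
    return 0 <= x <= PITCH_LENGTH and 0 <= y <= PITCH_WIDTH


def bin_index(x, y):
    """Map a pitch coordinate to a (row, col) bin on the 12 x 16 grid."""
    col = min(int(x / PITCH_LENGTH * N_X), N_X - 1)  # clamp the far edge into the last bin
    row = min(int(y / PITCH_WIDTH * N_Y), N_Y - 1)
    return row, col


def count_starts(moves):
    """Count ball-moving actions per starting bin, skipping moves that end off the pitch."""
    grid = [[0] * N_X for _ in range(N_Y)]
    for x, y, end_x, end_y in moves:
        if not on_pitch(end_x, end_y):
            continue
        row, col = bin_index(x, y)
        grid[row][col] += 1
    return grid


# Hypothetical moves as (x, y, end_x, end_y): two passes and one that goes out of play.
moves = [(10.0, 30.0, 40.0, 35.0), (12.0, 31.0, 60.0, 10.0), (50.0, 5.0, 70.0, -2.0)]
move_count = count_starts(moves)
```

The same `bin_index` helper can be reused to count shots and goals per bin. With real event data a 2D histogram routine from a plotting or array library would do this in one call, but the loop makes the bookkeeping visible.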

Shots and Goals

To calculate the Expected Threat we also need shots and goals scored. First we filter them from the database. We also create new columns with the coordinates and plot their location. We store the number of shot and goal occurrences in each bin in a 2D array as well.

This shows the count of goals and shots in each bin. The pitch has been divided into 16 x 12 bins.

Move, Shot and Goal probability

We now need to calculate the probability of a moving action in each bin. To do so, we divide the number of moving actions in the bin by the sum of moving actions and shots in that bin. We also need the probability of a shot in each area. Again, we divide the number of shots in each bin by the sum of moving actions and shots in that bin. The next thing needed is the goal probability. It’s calculated here in a rather naive way: the number of goals in an area divided by the number of shots taken there. This is a simplified expected goals model.

The more intense the colour, the higher the chance of shooting, scoring or moving the ball from a particular bin.
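The three probabilities described above can be sketched as follows. The counts are toy numbers on a 1 x 3 strip of bins, not the real season data, and `safe_div` and `zone_probabilities` are hypothetical helper names. Returning 0 for an empty bin is also an assumption of this sketch.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 for empty bins instead of raising ZeroDivisionError."""
    return numerator / denominator if denominator else 0.0


def zone_probabilities(move_count, shot_count, goal_count):
    """Per-bin move, shot and goal probabilities from per-bin counts."""
    rows, cols = len(move_count), len(move_count[0])
    move_prob = [[0.0] * cols for _ in range(rows)]
    shot_prob = [[0.0] * cols for _ in range(rows)]
    goal_prob = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = move_count[r][c] + shot_count[r][c]  # every action taken in this bin
            move_prob[r][c] = safe_div(move_count[r][c], total)
            shot_prob[r][c] = safe_div(shot_count[r][c], total)
            # Naive expected-goals estimate: goals per shot from this bin.
            goal_prob[r][c] = safe_div(goal_count[r][c], shot_count[r][c])
    return move_prob, shot_prob, goal_prob


# Toy counts for three bins, from deep midfield (left) to the box (right).
move_count = [[100, 80, 30]]
shot_count = [[0, 20, 10]]
goal_count = [[0, 2, 4]]
move_prob, shot_prob, goal_prob = zone_probabilities(move_count, shot_count, goal_count)
```

In every bin that saw at least one action, the move and shot probabilities add up to 1, because a player on the ball in this model either moves it or shoots.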

Transition matrices

For each of the 192 sectors we need to calculate a transition matrix: a matrix of probabilities of going from that zone to every other zone, given that the ball was moved. First, we create additional columns with the histogram bin in which the event started and the bin in which it ended. Then we group the data by starting sector and count the starts from each of them. As the next step, for each sector we calculate the probability of transferring the ball from it to each of the 192 sectors on the pitch, given that the ball was moved. We compute it as the number of events that went to the end sector divided by the number of all events that started in the starting sector. As the last step, we visualize the transition matrix for one of the middle zones on the pitch.

This shows which zones the ball is most likely to be played into when it starts from one of the middle zones.
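The counting behind the transition matrix can be sketched like this. It assumes each move has already been reduced to a pair of flat zone ids. The `zone_id` numbering (row * 16 + col) and the function names are choices made for this example, and the four moves are invented.

```python
N_X, N_Y = 16, 12
N_ZONES = N_X * N_Y  # 192 sectors


def zone_id(row, col):
    """Flatten a (row, col) bin into a single zone id between 0 and 191."""
    return row * N_X + col


def transition_matrix(moves):
    """Build T where T[s][e] is the share of moves starting in zone s that ended in zone e."""
    counts = [[0] * N_ZONES for _ in range(N_ZONES)]
    starts = [0] * N_ZONES
    for start, end in moves:
        counts[start][end] += 1
        starts[start] += 1
    return [
        [counts[s][e] / starts[s] if starts[s] else 0.0 for e in range(N_ZONES)]
        for s in range(N_ZONES)
    ]


# Hypothetical moves out of one central zone: two go one bin forward,
# one goes a row across, one stays inside the same zone.
middle = zone_id(6, 8)
moves = [
    (middle, zone_id(6, 9)),
    (middle, zone_id(6, 9)),
    (middle, zone_id(7, 8)),
    (middle, middle),
]
T = transition_matrix(moves)
```

Row `T[middle]` is what the heat map for a single starting zone displays once it is reshaped back to 12 x 16.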

Calculating Expected Threat matrix

We are now ready to calculate the Expected Threat. We first calculate (probability of a shot)*(probability of a goal given a shot). This gives the probability of a goal being scored right away, and is the shoot_expected_payoff. We then add the move_expected_payoff, which is the payoff (probability of a goal) if the player moves the ball instead: the probability of a move multiplied by the xT of the zones the ball can end up in, weighted by the transition probabilities. The sum of the two is the xT.

By iterating this process 6 times, the xT gradually converges to its final value.

As we iterate over more moves (passes, dribbles), we can see the xT values converge.
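The iteration can be sketched on a toy pitch with only three zones, so that the numbers are easy to follow. Everything in the example data (the zone probabilities and the transition matrix) is invented, and `expected_threat` is a hypothetical function name. Only the update rule, shoot payoff plus move payoff, comes from the description above.

```python
def expected_threat(shot_prob, goal_prob, move_prob, transition, n_iter=6):
    """Iterate xT = shot_prob * goal_prob + move_prob * sum_j T[i][j] * xT[j], starting from zero."""
    n = len(shot_prob)
    xt = [0.0] * n
    for _ in range(n_iter):
        new_xt = []
        for i in range(n):
            shoot_expected_payoff = shot_prob[i] * goal_prob[i]
            # Expected xT of the zone the ball lands in, using the previous iteration's values.
            move_expected_payoff = move_prob[i] * sum(
                transition[i][j] * xt[j] for j in range(n)
            )
            new_xt.append(shoot_expected_payoff + move_expected_payoff)
        xt = new_xt
    return xt


# Toy pitch: zone 0 = own half, zone 1 = midfield, zone 2 = penalty area.
shot_prob = [0.0, 0.1, 0.5]
move_prob = [1.0, 0.9, 0.5]
goal_prob = [0.0, 0.05, 0.2]
transition = [
    [0.5, 0.5, 0.0],
    [0.2, 0.4, 0.4],
    [0.0, 0.5, 0.5],
]

for k in (1, 2, 6):
    values = expected_threat(shot_prob, goal_prob, move_prob, transition, n_iter=k)
    print(k, [round(v, 4) for v in values])
```

After one iteration only the zones you can score from directly have any value. Each further iteration looks one move further ahead, which is how zones far from goal pick up threat.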

5 players with highest xT

As the last step, we want to find out which players, among those who played more than 400 minutes, scored best in possession-adjusted xT per 90.

Almost every name on this list is a world-class, renowned midfielder playing for one of the top clubs in the world. But have you heard of the last name on the list, Jonjo Shelvey? Probably not. This is why I really like this metric: it brings out underrated players whom most of us would not notice even if they were having the season of their life.
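The ranking step itself can be sketched as below. The players and their numbers are placeholders, not the real 2017/18 figures, and the possession adjustment shown (dividing xT by the share of possession the team had while the player was on the pitch) is an assumption of this sketch, since the exact adjustment is not spelled out above.

```python
MIN_MINUTES = 400  # threshold from the article


def xt_per_90(total_xt, minutes, possession_share):
    """Possession-adjusted xT per 90 minutes (assumed adjustment: divide by possession share)."""
    return (total_xt / possession_share) * 90 / minutes


# Hypothetical season totals for three made-up players.
players = [
    {"name": "Player A", "xT": 6.0, "minutes": 2700, "possession": 0.60},
    {"name": "Player B", "xT": 4.5, "minutes": 1800, "possession": 0.45},
    {"name": "Player C", "xT": 1.0, "minutes": 300, "possession": 0.50},
]

# Keep only players above the minutes threshold, then rank by the adjusted rate.
eligible = [p for p in players if p["minutes"] > MIN_MINUTES]
ranked = sorted(
    eligible,
    key=lambda p: xt_per_90(p["xT"], p["minutes"], p["possession"]),
    reverse=True,
)
top5 = ranked[:5]
for p in top5:
    print(p["name"], round(xt_per_90(p["xT"], p["minutes"], p["possession"]), 3))
```

Player B has less total xT than Player A but ranks higher, because the rate accounts for fewer minutes and less of the ball. That is the kind of effect that lets a player from a smaller club surface in the list.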

Limitations

This method of calculating Expected Threat, known as Position based xT, rests on a big assumption which we have ignored so far: that all actions on the pitch are memoryless, i.e. one action does not depend on the actions before it. If we think about it, that is not entirely true in a football match. The model also ignores context, such as whether the player making the pass was under pressure from opponents. If I make a simple 5-yard pass to my striker, who is already in a good position, with no opponent pressing me, my xT value would be the same as when I thread that pass between two defenders, which does not seem right.

To counter this limitation, there is another method of calculating xT, known as Action Based xT. Stay tuned for Part 2 of this series to understand more about the action-based metric.

Till then, adios!

References —

  1. Soccermatics by David Sumpter — https://soccermatics.readthedocs.io/en/latest/lesson4/xTPos.html
  2. https://soccermatics.readthedocs.io/en/latest/gallery/lesson4/plot_ExpectedThreat.html
