How to Calculate and Plot Football Match Momentum Using Event Data?

Aleks Kapich
10 min read · Aug 3, 2024


Match momentum is one of the most essential visualizations for match analysis. It’s a great way to add context to the goals scored throughout the game.

In this article, I’ll walk through the steps necessary for calculating match momentum using event data. Our basis will be the Expected Threat (xT) metric. If you’re not familiar with xT, don’t worry, I’ll cover it. We’re going to derive our methodology from Opta Analyst; if you’re interested in football analytics, I’m sure you have come across their match momentum graphics at some point. As in my previous post, we will make use of free StatsBomb match event data, and at the end of this piece there is a Python function ready to copy and use in your projects.

First, let’s check out how Opta approaches match momentum. If you’d like to dive deeper, here’s their full article, but we will just take a peek at their methodology.

What Opta does is bin the match by minute and compare each team’s most dangerous action in that minute, measured by how much it increases the probability of scoring a goal. The more recent the action, the greater the weight it is assigned. As you have probably noticed, Opta uses the term possession value.

According to their definition: possession value (PV) framework measures the probability that the team in possession will score during the next 10 seconds, and can assign credit to individual players based on their positive and negative contributions on the ball.

Briefly speaking, PV is meant to assess whether the actions that players take with the ball, such as passes or carries, increase the team’s chances of scoring. The PV metric happens to be very similar (one could even say equivalent) to Expected Threat. With xT modeling, we divide the pitch into zones and assign a value to each zone. The greater the value, the more likely it is that a goal will be scored from that zone or that the ball will be moved into an even more dangerous zone. If a player carries or passes the ball between zones, their xT contribution is calculated as the difference between the xT values of these two zones. With that simple model, we give credit solely for actions that move the ball from one zone to another, but that’s enough to get an idea of which side dominates the match. The details of Expected Threat are not the main point of this article, so if you’d like to learn more, I recommend this great article from Karun Singh.

Example xT grid plotted on the pitch. This one was originally created by Jernej Flisar at Twelve. The values for each zone may differ between models, depending on the data on which the model was trained.

The file with the same grid as displayed above is available here.
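To make the "difference between two zones" idea concrete, here is a minimal sketch on a tiny grid. The 3 x 4 grid and its values are made up for illustration, and `xT_added` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Hypothetical 3x4 xT grid: rows follow the pitch width, columns follow the pitch length.
# The values are invented for illustration; a real grid comes from a trained model.
toy_xT = np.array([
    [0.01, 0.02, 0.05, 0.12],
    [0.01, 0.03, 0.08, 0.30],
    [0.01, 0.02, 0.05, 0.12],
])


def xT_added(grid, start_zone, end_zone):
    """Return the xT gained by moving the ball from start_zone to end_zone.

    Zones are (row, column) index pairs into the grid.
    """
    return grid[end_zone] - grid[start_zone]


# A pass from central midfield into the central zone in front of goal gains threat.
forward_pass = xT_added(toy_xT, start_zone=(1, 1), end_zone=(1, 3))
# A pass back from the final third into midfield loses threat.
back_pass = xT_added(toy_xT, start_zone=(0, 2), end_zone=(1, 1))

print(round(forward_pass, 2))  # 0.27
print(round(back_pass, 2))     # -0.02
```

Note that a backwards action gets a negative contribution, which matters later when we decide how to clip the values.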

Let’s ponder Opta’s methodology more. We’re not going to follow their approach to the letter, but there are some key points worth noting.

Why does it make sense to consider only the maximum xT value from each bin? Imagine a situation in which one team builds up their attack and exchanges multiple passes in front of the opponent’s penalty area. After a long sequence of passes, their rivals recover the ball and launch a dynamic counter-attack, carrying the ball from one box to the other. In total, the accumulated xT from the first team’s many passes may be greater than the xT from their opponents’ single carry. Nevertheless, that doesn’t mean the first team generated more threat.
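The scenario above can be sketched with a few made-up numbers. Team A strings together six low-value passes within one minute, while team B makes a single dangerous carry. All values here are invented for illustration.

```python
import pandas as pd

# Made-up xT values for a single minute of play (illustrative only).
toy_actions = pd.DataFrame({
    "team":   ["A"] * 6 + ["B"],
    "minute": [10] * 7,
    "xT":     [0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.06],
})

# Compare the two ways of summarizing a minute: total xT versus the single best action.
per_minute = toy_actions.groupby(["team", "minute"])["xT"].agg(["sum", "max"])
print(per_minute)
```

By sum, team A comes out ahead (0.07 against 0.06), even though team B produced the only really dangerous action. Taking the maximum reverses the picture, which is closer to what we want momentum to express.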

Setting a cap for xT values is a good idea as well. Let’s consider a goalkeeper taking a goal kick. If the keeper distributes the ball long, such a pass is likely to get a very high xT assigned despite not being much of a threat to the opponents. Since Expected Threat modeling pays for its simplicity with some limitations, we need tricks such as xT clipping to obtain a more reliable match momentum.
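Here is a small sketch of what clipping does to a handful of values. The numbers are made up, with 0.25 standing in for a long goal kick, and the bounds 0 and 0.1 are the ones used later in this article.

```python
import numpy as np

# Made-up xT values: a back pass, two ordinary passes and one long goal kick.
raw_xT = np.array([-0.03, 0.02, 0.08, 0.25])

# Everything above the cap is pulled down to 0.1, everything below 0 is raised to 0.
clipped = np.clip(raw_xT, 0, 0.1)
print(clipped)
```

The long kick ends up with the same value as any other capped action, and the negative back pass is treated as contributing nothing.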

Now that the theoretical background is more or less settled, let’s focus on the code.

Those familiar with fetching StatsBomb data in Python may skip this section. We are going to work on data from the 2018/2019 La Liga match between Espanyol and FC Barcelona.

from statsbombpy import sb
sb.competitions().query('competition_name == "La Liga" & season_name == "2018/2019"')
We obtain ID for competition and season
sb.matches(competition_id=11, season_id=4).query('home_team == "Espanyol"')
Now we search for match ID

Now, having the necessary ID values we can fetch the match event data.

MATCH_ID = 16086
HOME_TEAM, AWAY_TEAM = 'Espanyol', 'Barcelona'
df = sb.events(match_id=MATCH_ID)

The first step will be to import the values from our xT model along with the key libraries. The Expected Threat grid file is available in one of my GitHub repos here.

import pandas as pd
import numpy as np

xT = pd.read_csv("https://raw.githubusercontent.com/AKapich/WorldCup_App/main/app/xT_Grid.csv", header=None)
xT = np.array(xT)
xT_rows, xT_cols = xT.shape

Among all match events, we’re only interested in passes and carries, as they involve moving the ball between pitch zones. In our data frame, the columns location, pass_end_location, and carry_end_location will help us determine the xT for each action. Below is a general-purpose function for the calculation, which I then break down using passes as an example. Later, we will use the function to obtain xT from passes and carries and concatenate the results.

def get_xT(df, event_type):
    # Keep only the requested event type; copy so we don't modify a slice of the original frame
    df = df[df['type'] == event_type].copy()

    # Unpack the [x, y] coordinate lists into separate columns
    df['x'], df['y'] = zip(*df['location'])
    df['end_x'], df['end_y'] = zip(*df[f'{event_type.lower()}_end_location'])

    # Assign the start and end coordinates to grid zones
    df['start_x_bin'] = pd.cut(df['x'], bins=xT_cols, labels=False)
    df['start_y_bin'] = pd.cut(df['y'], bins=xT_rows, labels=False)
    df['end_x_bin'] = pd.cut(df['end_x'], bins=xT_cols, labels=False)
    df['end_y_bin'] = pd.cut(df['end_y'], bins=xT_rows, labels=False)

    # Look up the xT value of each zone (grid rows follow y, grid columns follow x)
    df['start_zone_value'] = df[['start_x_bin', 'start_y_bin']].apply(lambda z: xT[z['start_y_bin']][z['start_x_bin']], axis=1)
    df['end_zone_value'] = df[['end_x_bin', 'end_y_bin']].apply(lambda z: xT[z['end_y_bin']][z['end_x_bin']], axis=1)
    df['xT'] = df['end_zone_value'] - df['start_zone_value']

    return df[['xT', 'minute', 'second', 'team', 'type']]
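One thing worth knowing about the binning: when pd.cut receives an integer as bins, it splits the range between the smallest and largest value in the data into equal-width bins, so the zone edges depend on where the ball happened to be in that match. As a hedged alternative (not what the function in this article does), here is a minimal sketch that fixes the edges to the 120 x 80 StatsBomb pitch. The 12 x 8 grid shape and the sample coordinates are assumptions for illustration.

```python
import numpy as np
import pandas as pd

PITCH_LENGTH, PITCH_WIDTH = 120, 80  # StatsBomb coordinate system
n_cols, n_rows = 12, 8               # assumed grid shape, for illustration only

# Explicit bin edges that always span the whole pitch, whatever the data looks like.
x_edges = np.linspace(0, PITCH_LENGTH, n_cols + 1)
y_edges = np.linspace(0, PITCH_WIDTH, n_rows + 1)

# Made-up coordinates, including points on or near the pitch boundaries.
coords = pd.DataFrame({
    "x": [0.0, 35.0, 61.0, 119.9],
    "y": [0.0, 40.0, 79.9, 10.0],
})

# include_lowest=True keeps a coordinate of exactly 0 inside the first bin.
coords["x_bin"] = pd.cut(coords["x"], bins=x_edges, labels=False, include_lowest=True)
coords["y_bin"] = pd.cut(coords["y"], bins=y_edges, labels=False, include_lowest=True)
print(coords)
```

With fixed edges, the same coordinate always lands in the same zone across matches, which makes the results easier to compare.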

Let’s create an auxiliary data frame so that you can better analyze what the function does. It will contain only passes (as if we gave ‘Pass’ as an argument to the function).

from copy import deepcopy
aux_df = deepcopy(df)

aux_df = aux_df[aux_df['type']=='Pass']

Each row features coordinate data.

We’re going to unpack the values and determine zones in which the passes started and ended, dividing the pitch into bins corresponding to the zones derived from our model.

aux_df['x'], aux_df['y'] = zip(*aux_df['location'])
aux_df['end_x'], aux_df['end_y'] = zip(*aux_df['pass_end_location'])

aux_df['start_x_bin'] = pd.cut(aux_df['x'], bins=xT_cols, labels=False)
aux_df['start_y_bin'] = pd.cut(aux_df['y'], bins=xT_rows, labels=False)
aux_df['end_x_bin'] = pd.cut(aux_df['end_x'], bins=xT_cols, labels=False)
aux_df['end_y_bin'] = pd.cut(aux_df['end_y'], bins=xT_rows, labels=False)

Now instead of coordinates, we’re operating on pitch zones.

Knowing the zones, we can now assign xT to each zone and calculate the difference for xT values between the end zone and start zone.

aux_df['start_zone_value'] = aux_df[['start_x_bin', 'start_y_bin']].apply(lambda z: xT[z['start_y_bin']][z['start_x_bin']], axis=1)
aux_df['end_zone_value'] = aux_df[['end_x_bin', 'end_y_bin']].apply(lambda z: xT[z['end_y_bin']][z['end_x_bin']], axis=1)

aux_df['xT'] = aux_df['end_zone_value']-aux_df['start_zone_value']

Now that you know how xT is calculated for each action, we can move on.

The next step is to concatenate the xT from passes and carries and perform clipping. I’m going to use the same cap value of 0.1 that Opta did. Note that the lower bound of 0 also discards negative contributions.

xT_data = pd.concat([get_xT(df=df, event_type='Pass'), get_xT(df=df, event_type='Carry')], axis=0)
xT_data['xT_clipped'] = np.clip(xT_data['xT'], 0, 0.1)

Now, it’s time to find the maximum xT values each team obtained minute by minute.

max_xT_per_minute = xT_data.groupby(['team', 'minute'])['xT_clipped'].max().reset_index()

The next part is crucial for our analysis, as we’re finally going to calculate momentum. As you recall, the most recent actions are considered more relevant for the momentum at a given point in time, hence the need to weight the actions by time. For every minute and for each team, we are going to calculate the weighted sum of the xT obtained in a time window ending at the current minute. The size of the time window is something to experiment with. I opt for 3 to 5 minutes, since momentum should illustrate which side is dominant in a particular short period of the match. Another parameter is the decay rate, which controls the influence of the more distant actions within the time window. The greater the decay rate, the less influence preceding actions have. I highly recommend experimenting with these two values, as they may alter the plot significantly.

As you can see in the code, we iterate over every minute, keep only the minutes within the time window, calculate the weights using the exponential function and sum it all up for each team. Finally, the momentum in a given minute is equal to the difference between the two teams’ weighted xT values.

The exponential function suits the purpose of modeling the weights in our case. Notice that the decay rate is multiplied by -1. If you look at how the exponential function behaves for negative arguments, you’ll see that the smaller the argument, the smaller the value. That’s why a greater decay rate means focusing more on the most recent events: multiplying by -1 means we pass a negative argument to the function, so events from a few minutes ago get smaller weights.
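To get a feel for the decay rate before running the full loop, here is a quick sketch of the weights inside a 4-minute window. `decay_weights` is a hypothetical helper written for this illustration, and the three decay rates are just example values.

```python
import numpy as np


def decay_weights(ages, decay_rate):
    """Exponential weights for actions that happened `ages` minutes ago."""
    return np.exp(-decay_rate * np.asarray(ages))


window_size = 4
ages = np.arange(window_size)  # 0 = current minute, 3 = three minutes ago

for decay_rate in (0.1, 0.25, 1.0):
    print(decay_rate, np.round(decay_weights(ages, decay_rate), 3))
```

With a decay rate of 0.25, an action from three minutes ago keeps roughly 47% of its weight. With a decay rate of 1.0 it keeps only about 5%, so the momentum reacts almost exclusively to the current minute.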

minutes = sorted(xT_data['minute'].unique())
weighted_xT_sum = {
    HOME_TEAM: [],
    AWAY_TEAM: []
}
momentum = []

window_size = 4
decay_rate = 0.25


for current_minute in minutes:
    for team in weighted_xT_sum.keys():

        recent_xT_values = max_xT_per_minute[
            (max_xT_per_minute['team'] == team) &
            (max_xT_per_minute['minute'] <= current_minute) &
            (max_xT_per_minute['minute'] > current_minute - window_size)
        ]

        weights = np.exp(-decay_rate * (current_minute - recent_xT_values['minute'].values))
        weighted_sum = np.sum(weights * recent_xT_values['xT_clipped'].values)
        weighted_xT_sum[team].append(weighted_sum)

    momentum.append(weighted_xT_sum[HOME_TEAM][-1] - weighted_xT_sum[AWAY_TEAM][-1])

momentum_df = pd.DataFrame({
    'minute': minutes,
    'momentum': momentum
})

Finally, what we get is a data frame with the match momentum for each minute. Positive values indicate the dominance of the home team, while negative values indicate the dominance of the away team.

The most relevant part is already behind us — now what’s left is creating an aesthetic plot. At first, let’s just plot the momentum curve. As we’re not particularly interested in exact values but just the notion of which team is the dominant one, the curve requires some smoothing. We will pass our data through a Gaussian filter in order to obtain it — mathematically, a Gaussian filter modifies the input data by convolution with a Gaussian function. The sigma parameter allows us to control how smooth we wish to have our function — greater sigma values mean a smoother function.

In the code below I also take care of some aesthetic details: the frame of the plot is removed, ticks on the x-axis appear every 15 minutes, and there’s no unnecessary margin.

import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d

fig, ax = plt.subplots(figsize=(12, 6))
fig.set_facecolor('#0e1117')
ax.set_facecolor('#0e1117')

ax.tick_params(axis='x', colors='white')
ax.margins(x=0)
ax.set_xticks([0,15,30,45,60,75,90])

ax.tick_params(axis='y', which='both', left=False, right=False, labelleft=False)
ax.set_ylim(-0.08, 0.08)

for spine in ['top', 'right', 'bottom', 'left']:
    ax.spines[spine].set_visible(False)

momentum_df['smoothed_momentum'] = gaussian_filter1d(momentum_df['momentum'], sigma=1)
ax.plot(momentum_df['minute'], momentum_df['smoothed_momentum'], color='white')
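To see what the sigma parameter does in isolation, here is a minimal sketch that smooths a single one-minute spike with three different sigma values. The spike is made-up data, and the sigma values are only examples.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Made-up signal: one minute with a burst of momentum, zero everywhere else.
spike = np.zeros(11)
spike[5] = 1.0

# Record how tall the spike remains after smoothing with each sigma.
peaks = {}
for sigma in (0.5, 1, 2):
    smoothed = gaussian_filter1d(spike, sigma=sigma)
    peaks[sigma] = smoothed.max()
    print(sigma, round(peaks[sigma], 3))
```

The larger the sigma, the lower and wider the spike becomes, because its value is spread over more of the neighbouring minutes. For a momentum plot this means a trade-off between a readable curve and short bursts of dominance being flattened out.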

Next, we wish to apply the distinctive colors indicating which team currently controls the match.

ax.axhline(0, color='white', linestyle='--', linewidth=0.5)
ax.fill_between(momentum_df['minute'], momentum_df['smoothed_momentum'], where=(momentum_df['smoothed_momentum'] > 0), color='blue', alpha=0.5, interpolate=True)
ax.fill_between(momentum_df['minute'], momentum_df['smoothed_momentum'], where=(momentum_df['smoothed_momentum'] < 0), color='red', alpha=0.5, interpolate=True)

What comes next is labeling the teams and axes and setting a title. We also calculate the score automatically from our main data frame with event data.

scores = df[df['shot_outcome'] == 'Goal'].groupby('team')['shot_outcome'].count().reindex(set(df['team']), fill_value=0)
ax.set_xlabel('Minute', color='white', fontsize=15, fontweight='bold', fontfamily='Monospace')
ax.set_ylabel('Momentum', color='white', fontsize=15, fontweight='bold', fontfamily='Monospace')
ax.set_title(f'xT Momentum\n{HOME_TEAM} {scores[HOME_TEAM]}-{scores[AWAY_TEAM]} {AWAY_TEAM}', color='white', fontsize=20, fontweight='bold', fontfamily='Monospace', pad=-5)

home_team_text = ax.text(7, 0.064, HOME_TEAM, fontsize=12, ha='center', fontfamily="Monospace", fontweight='bold', color='white')
home_team_text.set_bbox(dict(facecolor='blue', alpha=0.5, edgecolor='white', boxstyle='round'))
away_team_text = ax.text(7, -0.064, AWAY_TEAM, fontsize=12, ha='center', fontfamily="Monospace", fontweight='bold', color='white')
away_team_text.set_bbox(dict(facecolor='red', alpha=0.5, edgecolor='white', boxstyle='round'))
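The reindex call in the score calculation deserves a short illustration. A team that didn’t score never shows up in the groupby result, so looking it up for the title would fail. Here is a minimal sketch with made-up events and hypothetical team names.

```python
import pandas as pd

# Made-up shot events: only the away team scores.
toy_events = pd.DataFrame({
    "team": ["Home FC", "Away FC", "Away FC", "Home FC", "Away FC"],
    "shot_outcome": ["Saved", "Goal", "Off T", "Blocked", "Goal"],
})
teams = ["Home FC", "Away FC"]

goals_only = toy_events[toy_events["shot_outcome"] == "Goal"]
raw_counts = goals_only.groupby("team")["shot_outcome"].count()

# Without reindex, "Home FC" is missing from the index entirely.
scores = raw_counts.reindex(teams, fill_value=0)

print(scores["Home FC"], scores["Away FC"])  # 0 2
```

Reindexing with fill_value=0 guarantees that both teams have an entry, so a clean sheet is reported as 0 rather than raising a KeyError.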

Let’s add one more detail — indicators of minutes in which the goals were scored.

goals = df[df['shot_outcome']=='Goal'][['minute', 'team']]
for _, row in goals.iterrows():
    ymin, ymax = (0.5, 0.8) if row['team'] == HOME_TEAM else (0.14, 0.5)
    ax.axvline(row['minute'], color='white', linestyle='--', linewidth=0.8, alpha=0.5, ymin=ymin, ymax=ymax)
    ax.scatter(row['minute'], (1 if row['team'] == HOME_TEAM else -1)*0.06, color='white', s=100, zorder=10, alpha=0.7)
    ax.text(row['minute']+0.1, (1 if row['team'] == HOME_TEAM else -1)*0.067, 'Goal', fontsize=10, ha='center', va='center', fontfamily="Monospace", color='white')

Below is the full code wrapped in one function:

from statsbombpy import sb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d


def momentum(match_id, window_size=4, decay_rate=0.25, sigma=1):
    df = sb.events(match_id=match_id)
    # Assumes the home team is the first team to appear in the event data
    HOME_TEAM, AWAY_TEAM = list(df['team'].unique())

    xT = pd.read_csv("https://raw.githubusercontent.com/AKapich/WorldCup_App/main/app/xT_Grid.csv", header=None)
    xT = np.array(xT)
    xT_rows, xT_cols = xT.shape


    def get_xT(df, event_type):
        # Keep only the requested event type; copy so we don't modify a slice of the original frame
        df = df[df['type'] == event_type].copy()

        # Unpack the [x, y] coordinate lists into separate columns
        df['x'], df['y'] = zip(*df['location'])
        df['end_x'], df['end_y'] = zip(*df[f'{event_type.lower()}_end_location'])

        # Assign the start and end coordinates to grid zones
        df['start_x_bin'] = pd.cut(df['x'], bins=xT_cols, labels=False)
        df['start_y_bin'] = pd.cut(df['y'], bins=xT_rows, labels=False)
        df['end_x_bin'] = pd.cut(df['end_x'], bins=xT_cols, labels=False)
        df['end_y_bin'] = pd.cut(df['end_y'], bins=xT_rows, labels=False)

        # Look up the xT value of each zone (grid rows follow y, grid columns follow x)
        df['start_zone_value'] = df[['start_x_bin', 'start_y_bin']].apply(lambda z: xT[z['start_y_bin']][z['start_x_bin']], axis=1)
        df['end_zone_value'] = df[['end_x_bin', 'end_y_bin']].apply(lambda z: xT[z['end_y_bin']][z['end_x_bin']], axis=1)
        df['xT'] = df['end_zone_value'] - df['start_zone_value']

        return df[['xT', 'minute', 'second', 'team', 'type']]


    # xT from passes and carries, clipped to the range [0, 0.1]
    xT_data = pd.concat([get_xT(df=df, event_type='Pass'), get_xT(df=df, event_type='Carry')], axis=0)
    xT_data['xT_clipped'] = np.clip(xT_data['xT'], 0, 0.1)

    # Most dangerous action of each team in each minute
    max_xT_per_minute = xT_data.groupby(['team', 'minute'])['xT_clipped'].max().reset_index()

    minutes = sorted(xT_data['minute'].unique())
    weighted_xT_sum = {team: [] for team in max_xT_per_minute['team'].unique()}
    momentum = []

    # Exponentially weighted sum of xT within the time window, minute by minute
    for current_minute in minutes:
        for team in weighted_xT_sum:
            recent_xT_values = max_xT_per_minute[(max_xT_per_minute['team'] == team) &
                                                 (max_xT_per_minute['minute'] <= current_minute) &
                                                 (max_xT_per_minute['minute'] > current_minute - window_size)]

            weights = np.exp(-decay_rate * (current_minute - recent_xT_values['minute'].values))
            weighted_sum = np.sum(weights * recent_xT_values['xT_clipped'].values)
            weighted_xT_sum[team].append(weighted_sum)

        momentum.append(weighted_xT_sum[HOME_TEAM][-1] - weighted_xT_sum[AWAY_TEAM][-1])

    momentum_df = pd.DataFrame({
        'minute': minutes,
        'momentum': momentum
    })

    fig, ax = plt.subplots(figsize=(12, 6))
    fig.set_facecolor('#0e1117')
    ax.set_facecolor('#0e1117')

    ax.tick_params(axis='x', colors='white')
    ax.tick_params(axis='y', which='both', left=False, right=False, labelleft=False)
    for spine in ['top', 'right', 'bottom', 'left']:
        ax.spines[spine].set_visible(False)
    ax.set_xticks([0,15,30,45,60,75,90])
    ax.margins(x=0)
    ax.set_ylim(-0.08, 0.08)

    # Smooth the momentum curve and plot it
    momentum_df['smoothed_momentum'] = gaussian_filter1d(momentum_df['momentum'], sigma=sigma)
    ax.plot(momentum_df['minute'], momentum_df['smoothed_momentum'], color='white')

    ax.axhline(0, color='white', linestyle='--', linewidth=0.5)
    ax.fill_between(momentum_df['minute'], momentum_df['smoothed_momentum'], where=(momentum_df['smoothed_momentum'] > 0), color='blue', alpha=0.5, interpolate=True)
    ax.fill_between(momentum_df['minute'], momentum_df['smoothed_momentum'], where=(momentum_df['smoothed_momentum'] < 0), color='red', alpha=0.5, interpolate=True)

    # Final score, title and team labels
    scores = df[df['shot_outcome'] == 'Goal'].groupby('team')['shot_outcome'].count().reindex(set(df['team']), fill_value=0)
    ax.set_xlabel('Minute', color='white', fontsize=15, fontweight='bold', fontfamily='Monospace')
    ax.set_ylabel('Momentum', color='white', fontsize=15, fontweight='bold', fontfamily='Monospace')
    ax.set_title(f'xT Momentum\n{HOME_TEAM} {scores[HOME_TEAM]}-{scores[AWAY_TEAM]} {AWAY_TEAM}', color='white', fontsize=20, fontweight='bold', fontfamily='Monospace', pad=-5)

    home_team_text = ax.text(7, 0.064, HOME_TEAM, fontsize=12, ha='center', fontfamily="Monospace", fontweight='bold', color='white')
    home_team_text.set_bbox(dict(facecolor='blue', alpha=0.5, edgecolor='white', boxstyle='round'))
    away_team_text = ax.text(7, -0.064, AWAY_TEAM, fontsize=12, ha='center', fontfamily="Monospace", fontweight='bold', color='white')
    away_team_text.set_bbox(dict(facecolor='red', alpha=0.5, edgecolor='white', boxstyle='round'))

    # Mark the minutes in which goals were scored
    goals = df[df['shot_outcome']=='Goal'][['minute', 'team']]
    for _, row in goals.iterrows():
        ymin, ymax = (0.5, 0.8) if row['team'] == HOME_TEAM else (0.14, 0.5)
        ax.axvline(row['minute'], color='white', linestyle='--', linewidth=0.8, alpha=0.5, ymin=ymin, ymax=ymax)
        ax.scatter(row['minute'], (1 if row['team'] == HOME_TEAM else -1)*0.06, color='white', s=100, zorder=10, alpha=0.7)
        ax.text(row['minute']+0.1, (1 if row['team'] == HOME_TEAM else -1)*0.067, 'Goal', fontsize=10, ha='center', va='center', fontfamily="Monospace", color='white')

Thank you for reading my article. I’d appreciate any feedback or suggestions. For more football content, visit my Twitter/X.
