Visualising shots using football match event data in Python

Roland Kovács
8 min read · Apr 25, 2023

This article describes how you can create your own shot map in Python using football match event data.

Argentina vs France 2022 World Cup Final shot map

StatsBomb released its 2022 World Cup data to the public completely free. It contains event and tracking data about every match played in the 2022 World Cup. You can download the data from their GitHub account. In this article, I use the final between Argentina and France to demonstrate how you can create a shot map for a football match, which you can quickly adapt to any other match as well.

Importing the necessary libraries and the data

First, we import the libraries needed for the visualisation.

import json
import pandas as pd
import numpy as np
from mplsoccer import Pitch
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import random
import highlight_text
from highlight_text import HighlightText

Then, we load the data for our match using its match id. The dataset contains several folders named by competition id, which can be found in the competitions.json file. The 2022 FIFA World Cup has the competition id 43 and the season id 106; therefore, in the matches folder there is a folder named 43, and in this folder a json file called 106. In this file you can find every match id. The final between Argentina and France has the id 3869685. With the code below, you can read the json file of the match from the events folder and load it into a dataframe.

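match_id = 3869685

# Explicit encoding, because player names in the data contain non-ASCII characters
with open(f"/statsbomb_wc/open-data-master/data/events/{match_id}.json", encoding="utf-8") as f:
    events = json.load(f)

# Flatten nested fields into columns such as type_name and shot_outcome_name
df = pd.json_normalize(events, sep='_')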
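
If you would rather look up the match id in code than by reading the file, here is a minimal sketch. It assumes each entry in the matches file carries a match_id plus nested home_team and away_team objects holding home_team_name and away_team_name, which is how the open-data matches files are laid out as far as I know. find_match_ids is a hypothetical helper, not part of any library, and the sample data below is a stand-in for the real file.

```python
import json


def find_match_ids(matches, team_a, team_b):
    """Return the ids of matches played between two teams, in either order."""
    wanted = {team_a, team_b}
    return [
        m["match_id"]
        for m in matches
        if {m["home_team"]["home_team_name"], m["away_team"]["away_team_name"]} == wanted
    ]


# Tiny stand-in for the contents of matches/43/106.json (second id is made up)
sample_matches = json.loads("""
[
  {"match_id": 3869685,
   "home_team": {"home_team_name": "Argentina"},
   "away_team": {"away_team_name": "France"}},
  {"match_id": 1,
   "home_team": {"home_team_name": "Croatia"},
   "away_team": {"away_team_name": "Morocco"}}
]
""")

final_ids = find_match_ids(sample_matches, "France", "Argentina")
print(final_ids)
```

With the real file, you would pass the list returned by json.load instead of sample_matches.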

Preparing the data for the visualisation

First, we create some variables that will be used often in our code: the names of the teams, the colours of the teams and the pitch, and the size of the pitch to be plotted.

home_team = "Argentina"
away_team = "France"
home_color = "#ADD8E6"
away_color = "#00008B"
line_color = '#c7d5cc'
pitch_color = '#444444'

pitch_length_x = 120
pitch_width_y = 80

Then, we want to filter our dataset for the shot events only. Therefore, we are only interested in the records with ‘Shot’ as the type_name. As there was a penalty shootout, we also filter for the events that happened before the 120th minute. We create two dataframes, one per team, which contain only the shot events.

def create_shot_df(df, home_team, away_team):
    # Keep shots only, and drop everything from the 120th minute onwards
    shots = df.loc[(df.type_name == 'Shot') & (df.minute < 120)]

    shots_home = shots[shots.team_name == home_team]
    shots_away = shots[shots.team_name == away_team]
    return (shots_home, shots_away)

shots_home, shots_away = create_shot_df(df, home_team, away_team)
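
One caveat with the minute filter: as far as I can tell, stoppage time at the end of extra time can carry minute values of 120 or more, so minute < 120 may also drop late shots that were not part of the shootout. A hedged alternative is to filter on the period instead, assuming the events carry a period field in which 5 marks the penalty shootout (my understanding of the StatsBomb format; check it against your data). The sketch below works on the raw events list with made-up events, and shots_before_shootout is a hypothetical helper.

```python
def shots_before_shootout(events, shootout_period=5):
    """Keep shot events from every period before the penalty shootout."""
    return [
        e for e in events
        if e["type"]["name"] == "Shot" and e["period"] < shootout_period
    ]


# Made-up events, only to illustrate the filter
sample_events = [
    {"type": {"name": "Shot"}, "minute": 22, "period": 1},
    {"type": {"name": "Pass"}, "minute": 50, "period": 2},
    {"type": {"name": "Shot"}, "minute": 122, "period": 4},  # extra-time stoppage
    {"type": {"name": "Shot"}, "minute": 120, "period": 5},  # shootout penalty
]

kept = shots_before_shootout(sample_events)
print([(e["minute"], e["period"]) for e in kept])
```

On the dataframe, the equivalent condition would be df.period < 5 in place of df.minute < 120, under the same assumption about the period column.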

The next step is to create the statistics of the match, which we can put in the middle of our chart. We only focus on information about the shots, like the number of goals, shots, and saves, or the expected goals (xGoal). To do this, we can use the shot_outcome_name column of our shot dataframes.

def create_statistics(shot_df):
    outcomes = shot_df.shot_outcome_name.value_counts()
    all_shots = outcomes.sum()
    # .get returns 0 when an outcome never occurred in the match
    goals = outcomes.get('Goal', 0)
    saves = outcomes.get('Saved', 0)
    post = outcomes.get('Post', 0)  # counted here but not shown in the stat table
    on_target = saves + goals
    off_target = all_shots - on_target  # every shot that was neither saved nor a goal
    blocked = outcomes.get('Blocked', 0)
    xg = round(shot_df.shot_statsbomb_xg.sum(), 2)

    team_stats = {'all_shots': all_shots, 'on_target': on_target, 'off_target': off_target,
                  'goals': goals, 'saves': saves, 'blocked': blocked, 'xg': xg}
    return team_stats

home_stats = create_statistics(shots_home)
away_stats = create_statistics(shots_away)
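
To see how the on-target and off-target figures are derived, here is the same arithmetic on made-up outcomes. It uses collections.Counter in place of value_counts so that it runs without any data files; the outcome list is invented for illustration.

```python
from collections import Counter

# Made-up shot outcomes for one team
outcomes = ["Goal", "Saved", "Saved", "Off T", "Blocked", "Post", "Goal"]
counts = Counter(outcomes)

all_shots = sum(counts.values())
goals = counts["Goal"]      # a Counter returns 0 for missing keys
saves = counts["Saved"]
blocked = counts["Blocked"]
on_target = goals + saves
off_target = all_shots - on_target  # includes blocked shots and shots off the post

print(all_shots, on_target, off_target, blocked)
```

Note that with this definition, blocked shots are counted within the off-target figure, so On target plus Off target always equals Shots.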

Creating a football pitch and plotting the shots

First, we want to create a football pitch, for which we can use the Pitch class from the mplsoccer package. We just have to specify the pitch_type, which is ‘statsbomb’, and the pitch and line colours. Then, we specify some parameters for the figure size, title, grid, and endnote. These are the values I frequently use, but feel free to play around with them to see how each changes the layout of your pitch.

pitch = Pitch(pitch_type='statsbomb', pitch_color=pitch_color, line_color=line_color)
fig, axs = pitch.grid(figheight=6, title_height=0.05, endnote_space=0, axis=False,
                      title_space=0, grid_height=0.82, endnote_height=0.01)

fig.set_facecolor(pitch_color)

Once you run the code, you should have a football pitch which looks like this:

Football pitch plotted with the mplsoccer package

Let’s plot the shots first with the function below. The plot_shots function expects an axis on which to plot the shots (this will be our pitch axis: axs[‘pitch’]), a shot_df, which is either the home or the away shot dataframe we created earlier, a colour, which can be the colour of the team, the pitch length used to calculate the coordinates of the shots, and a boolean that indicates whether the shots belong to the home or the away team.

def plot_shots(ax, shot_df, color, pitch_length_x, home=True):
    # Marker per shot outcome; any outcome missing from the map falls back to "x"
    marker_map = {'Goal': "*", 'Saved': "o", 'Off T': "x", 'Blocked': "s", 'Wayward': "x", 'Post': "x"}

    locations = shot_df.location.values
    outcomes = shot_df.shot_outcome_name
    shot_xgs = shot_df.shot_statsbomb_xg.values

    for loc, outcome, xg in zip(locations, outcomes, shot_xgs):
        x = loc[0]
        y = loc[1]

        size = xg * 1000  # the marker grows with the xG of the shot

        # Small random offset so that overlapping shots stay distinguishable
        jitter_x = random.randint(-9, 9) / 10
        jitter_y = random.randint(-9, 9) / 10

        # Home shots are mirrored onto the left half of the pitch
        x_cor = (pitch_length_x - x if home else x) + jitter_x
        y_cor = y + jitter_y
        pitch.scatter(x_cor, y_cor, s=size, color=color, alpha=.8,
                      marker=marker_map.get(outcome, "x"), ax=ax)

First, we create a mapping between the shot outcomes and the markers. Then, we extract the shot locations, outcomes, and expected goals. Finally, we plot them using the scatter function. The plot coordinates are calculated depending on which side of the pitch the team is drawn on. Notice that I added a bit of randomness to the coordinates, so that the two goals France scored from the penalty spot can both be seen. We scale the size of the markers by the xg, so shots with a higher xg have a bigger marker on the plot. Now we can create our pitch with the shots using the following code.

pitch = Pitch(pitch_type='statsbomb', pitch_color=pitch_color, line_color=line_color)
fig, axs = pitch.grid(figheight=6, title_height=0.05, endnote_space=0, axis=False,
                      title_space=0, grid_height=0.82, endnote_height=0.01)

fig.set_facecolor(pitch_color)

plot_shots(axs['pitch'], shots_home, home_color, pitch_length_x, home=True)
plot_shots(axs['pitch'], shots_away, away_color, pitch_length_x, home=False)

Notice that I placed the plot_shots calls after the pitch-creation code: if you run this in a notebook, they need to be in the same cell. From here on, whenever I share a new function and its call, the call should be added to the same cell as the previous calls (the function definitions can live elsewhere). This is what we have so far. Notice that using alpha and a little randomness makes every shot visible.

Shot map with the shots
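
The mirroring and jitter used inside plot_shots can also be checked in isolation. The sketch below is an illustration rather than part of the original code: to_plot_coords is a hypothetical helper, the seeded random.Random instance is my addition to make the jitter repeatable, and (108, 40) is used as an approximate penalty-spot location on a 120 x 80 pitch.

```python
import random


def to_plot_coords(x, y, pitch_length_x=120, home=True, rng=random):
    """Mirror home shots to the left half and add up to +/-0.9 units of jitter."""
    jitter_x = rng.randint(-9, 9) / 10
    jitter_y = rng.randint(-9, 9) / 10
    base_x = pitch_length_x - x if home else x
    return base_x + jitter_x, y + jitter_y


rng = random.Random(42)  # seeded, so the offsets are the same on every run
penalty_spot = (108, 40)  # approximate location, for illustration only

home_xy = to_plot_coords(*penalty_spot, home=True, rng=rng)
away_xy = to_plot_coords(*penalty_spot, home=False, rng=rng)
print(home_xy, away_xy)
```

The home shot lands near x = 12 and the away shot near x = 108, each shifted by less than one unit in both directions.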

To explain what the size of the markers means, we can add a simple description. This code plots the text XGoal, an arrow, and three circles of increasing size. The function expects a colour and an axis on which to plot the description. I like to use the same colour as the line colour. First, we plot the text in the bottom-left corner, then the arrow, and finally the three circles.

def add_xgoal_desc(line_color, ax):
    ax.text(10, 75, 'XGoal', va='center', ha='center', color=line_color, fontsize=12, fontweight='bold')
    ax.arrow(4, 78, 20, 0, head_width=2, head_length=1, ec=line_color, fc=line_color, width=0.2)

    # Three circles of increasing radius to hint at increasing xG
    ax.add_patch(plt.Circle((17, 75), 0.2, color=line_color))
    ax.add_patch(plt.Circle((19, 75), 0.4, color=line_color))
    ax.add_patch(plt.Circle((21.5, 75), 0.8, color=line_color))

add_xgoal_desc(line_color, axs['pitch'])

Shot map with XGoal description

Titles are essential for every chart. I like to use the names of the teams as the main title, and the season and competition as the subtitle. I often use the colours of the teams in the title, making clear which data points represent which team. With the highlight_text package, we can colour the words of the text separately. You wrap each part of the title that should be styled on its own in <> characters. Then you create the text properties for those parts and pass them as a parameter to HighlightText. The add_title function expects the names and colours of the teams, the line colour, the subtitle, and the axis, which is the ‘title’ axis.

def add_title(home_team, away_team, home_color, away_color, line_color, subtitle, ax):
    title = f"<{home_team}> <vs> <{away_team}>"
    title_font = 15
    subtitle_font = 13
    # One dictionary per <...> part of the title, in order
    highlight_textprops = [
        {"color": home_color, "fontsize": title_font, "fontweight": 'bold'},
        {"color": line_color, "fontsize": title_font, "fontweight": 'bold'},
        {"color": away_color, "fontsize": title_font, "fontweight": 'bold'}
    ]

    HighlightText(x=0.5, y=0.7, va='center', ha='center',
                  s=title,
                  highlight_textprops=highlight_textprops,
                  ax=ax)

    ax.text(0.5, 0.0, subtitle, color=line_color, va='center', ha='center', fontsize=subtitle_font)

subtitle = "2022 - World Cup Final - Shots"
add_title(home_team, away_team, home_color, away_color, line_color, subtitle, axs['title'])

Shot map with title
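
A common slip with this kind of title is a mismatch between the number of <> parts and the number of text property dictionaries; as far as I know, highlight_text expects one dictionary per part. The sketch below is a small sanity check you could run before plotting. build_title and count_highlights are hypothetical helpers, not part of highlight_text.

```python
import re


def build_title(home_team, away_team):
    """Wrap each separately styled part of the title in <> markers."""
    return f"<{home_team}> <vs> <{away_team}>"


def count_highlights(title):
    """Count the <...> parts in a title string."""
    return len(re.findall(r"<[^<>]+>", title))


title = build_title("Argentina", "France")
textprops = [{"color": "#ADD8E6"}, {"color": "#c7d5cc"}, {"color": "#00008B"}]

# One props dictionary per <...> part
assert count_highlights(title) == len(textprops)
print(title, count_highlights(title))
```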

To help viewers understand what the different markers mean, we can create a legend. I consistently use the line and pitch colours. First, we create the symbols with Line2D. Then we pass those and their labels to the plt.legend function, where we can also set the location, sizes, and colours.

def plot_legend(line_color, pitch_color):
    goal = Line2D([0], [0], marker='*', markersize=np.sqrt(30), color=line_color, linestyle='None')
    saved = Line2D([0], [0], marker='o', markersize=np.sqrt(30), color=line_color, linestyle='None')
    off_t = Line2D([0], [0], marker='x', markersize=np.sqrt(30), color=line_color, linestyle='None')
    blocked = Line2D([0], [0], marker='s', markersize=np.sqrt(30), color=line_color, linestyle='None')

    plt.legend([goal, saved, off_t, blocked], ['Goal', 'Saved', 'Off target', 'Blocked'], loc="lower right",
               markerscale=1.5, scatterpoints=1, fontsize=10, labelcolor=line_color,
               facecolor=pitch_color, edgecolor=line_color)

plot_legend(line_color, pitch_color)

Shot map with legend
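
A note on sizes: matplotlib's scatter takes s as an area in points squared, while Line2D takes markersize as a length in points, which is why the legend uses np.sqrt(30). The hypothetical helper below converts an xG value into the markersize that would match a scatter marker drawn by plot_shots, assuming the same xg * 1000 scaling. The xG values are arbitrary examples.

```python
import math


def xg_to_markersize(xg, scale=1000):
    """Line2D markersize (points) matching a scatter marker with s = xg * scale."""
    return math.sqrt(xg * scale)


for xg in (0.03, 0.3, 0.76):
    area = xg * 1000
    print(f"xG {xg:.2f}: s = {area:.0f}, markersize = {xg_to_markersize(xg):.1f}")
```

In other words, the legend symbols drawn with np.sqrt(30) correspond to a shot with an xG of about 0.03, before the markerscale factor is applied.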

To summarise the information in the shot map, we can create a stat table with the following code.

def plot_stats(home_stats, away_stats, line_color, pitch_color, ax):
    stat_font_size = 12
    ax.text(x=60, y=5, s='{} Goals {}'.format(home_stats['goals'], away_stats['goals']), size=stat_font_size,
            color=line_color, backgroundcolor=pitch_color, ha='center')
    ax.text(x=60, y=10, s='{} xGoal {}'.format(home_stats['xg'], away_stats['xg']), size=stat_font_size,
            color=line_color, backgroundcolor=pitch_color, ha='center')
    ax.text(x=60, y=15, s='{} Shots {}'.format(home_stats['all_shots'], away_stats['all_shots']), size=stat_font_size,
            color=line_color, backgroundcolor=pitch_color, ha='center')
    ax.text(x=60, y=20, s='{} On target {}'.format(home_stats['on_target'], away_stats['on_target']), size=stat_font_size,
            color=line_color, backgroundcolor=pitch_color, ha='center')
    ax.text(x=60, y=25, s='{} Off target {}'.format(home_stats['off_target'], away_stats['off_target']), size=stat_font_size,
            color=line_color, backgroundcolor=pitch_color, ha='center')
    ax.text(x=60, y=30, s='{} Saves {}'.format(home_stats['saves'], away_stats['saves']), size=stat_font_size,
            color=line_color, backgroundcolor=pitch_color, ha='center')
    ax.text(x=60, y=35, s='{} Blocked {}'.format(home_stats['blocked'], away_stats['blocked']), size=stat_font_size,
            color=line_color, backgroundcolor=pitch_color, ha='center')

plot_stats(home_stats, away_stats, line_color, pitch_color, axs['pitch'])

Shot map with statistics
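
Since the seven ax.text calls differ only in the label, the key, and the y position, the rows can also be generated in a loop. The sketch below builds the (y, text) pairs without plotting anything; stat_rows is a hypothetical helper and the numbers are made up, only to show the shape of the output.

```python
def stat_rows(home_stats, away_stats, first_y=5, step=5):
    """Build (y, text) pairs for the stat table, one row per statistic."""
    labels = [
        ("goals", "Goals"), ("xg", "xGoal"), ("all_shots", "Shots"),
        ("on_target", "On target"), ("off_target", "Off target"),
        ("saves", "Saves"), ("blocked", "Blocked"),
    ]
    return [
        (first_y + i * step, f"{home_stats[key]} {label} {away_stats[key]}")
        for i, (key, label) in enumerate(labels)
    ]


# Made-up numbers for two teams
home = {"goals": 1, "xg": 0.8, "all_shots": 5, "on_target": 2,
        "off_target": 3, "saves": 1, "blocked": 1}
away = {"goals": 0, "xg": 0.4, "all_shots": 3, "on_target": 1,
        "off_target": 2, "saves": 1, "blocked": 0}

rows = stat_rows(home, away)
for y, text in rows:
    print(y, text)
```

Each pair can then be passed to ax.text(x=60, y=y, s=text, ...) with the same styling arguments that plot_stats uses.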

Putting all of the functions together, we can create our final shot map. This is what the code looks like using the functions above.

shots_home, shots_away = create_shot_df(df, home_team, away_team)
home_stats = create_statistics(shots_home)
away_stats = create_statistics(shots_away)

pitch = Pitch(pitch_type='statsbomb', pitch_color=pitch_color, line_color=line_color)
fig, axs = pitch.grid(figheight=6, title_height=0.05, endnote_space=0, axis=False,
                      title_space=0, grid_height=0.82, endnote_height=0.01)

fig.set_facecolor(pitch_color)

# plot_shots takes the axis, the shots, one colour, the pitch length, and the home flag
plot_shots(axs['pitch'], shots_home, home_color, pitch_length_x, home=True)
plot_shots(axs['pitch'], shots_away, away_color, pitch_length_x, home=False)

add_xgoal_desc(line_color, axs['pitch'])

subtitle = "2022 - World Cup Final - Shots"
add_title(home_team, away_team, home_color, away_color, line_color, subtitle, axs['title'])

plot_legend(line_color, pitch_color)

plot_stats(home_stats, away_stats, line_color, pitch_color, axs['pitch'])

axs['pitch'].text(x=0, y=83, s='Created by Roland Kovacs - Data provided by StatsBomb',
                  size=8, color=line_color, backgroundcolor=pitch_color)

Note that we added some credit at the end of the code and also indicated our data source.

Final shot map

This is our final plot, which we can now easily reuse for any other match as well. The whole code is available in this notebook.
