A Step-by-Step Guide to Using Python to Create Football Passing Networks

Antonius Yoga Krisanto
Apr 19, 2023


Photo by Frantzou Fleurine on Unsplash

This Python tutorial covers, in detail, how to create passing networks for football matches. By following it, readers can build their own passing networks and use them to learn about team performance.

Passing Network

A passing network in football is a representation of how players pass to one another during a game. It serves as a tool for displaying the game’s passing habits, patterns, and flow.

From a data science perspective, building a passing network involves collecting, preparing, and analyzing passing data, and then visualizing it as a network graph that shows how players interact with one another. Each player is represented as a node, and an edge links two players who passed to each other; the more passes they exchanged, the heavier the edge. Passing networks help explain the performance of individual players and the dynamics of the team, and they can also support tactical analysis and player recruitment.
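To make the node-and-edge idea concrete, here is a minimal sketch in plain Python. The passes are hypothetical toy data, not events from the match. Direction is ignored, so passes from A to B and from B to A count toward the same edge; the pandas code later in this tutorial builds its edges the same way.

```python
from collections import Counter

# Hypothetical toy data: (passer, recipient) pairs, not real match events
passes = [
    ("Xavi", "Iniesta"),
    ("Iniesta", "Xavi"),
    ("Xavi", "Messi"),
    ("Busquets", "Xavi"),
    ("Xavi", "Iniesta"),
]

# Nodes: every player who made or received a pass
nodes = sorted({player for pair in passes for player in pair})

# Edges: undirected pairs (names sorted), weighted by the number of passes exchanged
edges = Counter(tuple(sorted(pair)) for pair in passes)

print(nodes)
print(edges.most_common())
```

Here the Iniesta–Xavi edge has weight 3, while the other two edges have weight 1.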

In this article, I’ll demonstrate how to create passing networks with Python. These are the main steps:

  1. Choose the attributes we need and prepare the data.
  2. Check each player’s playing time and keep the 11 players who played the most minutes.
  3. Calculate the location and size of each player’s vertex.
  4. Calculate the edges between players.
  5. Visualize the prepared data.

Dataset

The dataset we’ll use comes from the 2010–2011 Champions League final between Barcelona and Manchester United. The data was obtained from StatsBomb, which provides player and match data, statistical visualizations, and forecasting tools for studying football.

Data Preparation

We’ll import a number of modules and libraries:

  • pandas: A library for data manipulation and analysis.
  • numpy: A library for scientific computing with Python.
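  • mplsoccer: A Python library for visualizing football (soccer) data using Matplotlib. It also includes Sbopen, a parser for StatsBomb open data.
  • matplotlib: The plotting library that mplsoccer builds on; we use it to display the final figure.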
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from mplsoccer import Sbopen, VerticalPitch

We will use the Sbopen parser from the mplsoccer library to load the event data for the match with ID 18236 (the 2010/2011 Champions League final).

parser = Sbopen()
df, related, freeze, tactics = parser.event(18236)

Next, we choose the relevant attributes:

  • First, filter the original data frame so it only includes passing events.
  • From the passing events, keep only the relevant columns: the start and end coordinates of the pass, the names of the passer and the recipient, the team name, the minute of the pass, and the player ID.
  • Since we want to show each player’s jersey number in the visualization, select the player ID and jersey number columns from the tactics data frame.
  • Finally, merge the two data frames on the player ID.
df = df[df.type_name == 'Pass']
df = df[['x', 'y', 'end_x', 'end_y', 'player_name', 'pass_recipient_name', 'team_name', 'minute', 'player_id']]

# Keep one jersey number per player; a player listed more than once in the
# lineup data could otherwise duplicate his passes in the merge
tactics = tactics[['jersey_number', 'player_id']].drop_duplicates('player_id')

# Merge tactics and df on player_id
df_merged = pd.merge(df, tactics, on='player_id', how='left')
Merged Dataframe
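A left merge keeps every pass and attaches the matching jersey number. The sketch below uses hypothetical toy frames (made-up IDs and numbers) to show two behaviors worth knowing: a player with no lineup entry gets a missing jersey number, and a player listed twice in the lineup table duplicates his passes unless the table is deduplicated on player_id first.

```python
import pandas as pd

# Hypothetical toy passes (values are illustrative only)
passes = pd.DataFrame({
    "player_id": [1, 2, 1, 3],
    "player_name": ["A", "B", "A", "C"],
})

# Hypothetical lineup table in which player 1 appears twice and player 3 is missing
tactics = pd.DataFrame({
    "player_id": [1, 1, 2],
    "jersey_number": [10, 10, 6],
})

naive = pd.merge(passes, tactics, on="player_id", how="left")
safe = pd.merge(passes, tactics.drop_duplicates("player_id"), on="player_id", how="left")

print(len(passes), len(naive), len(safe))  # 4 6 4
print(safe)  # player 3 has NaN as jersey_number
```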

The merged data frame now has a jersey_number column, matched to each pass through the player ID. Since I also want to display the players’ names in the visualization, we will replace the full names of the players from both teams with shorter ones.

# Define the name replacement dictionary
name_replacements = {
    # Barcelona
    'Lionel Andrés Messi Cuccittini': 'Lionel Messi',
    'David Villa Sánchez': 'David Villa',
    'Pedro Eliezer Rodríguez Ledesma': 'Pedro Rodriguez',
    'Sergio Busquets i Burgos': 'Sergio Busquets',
    'Andrés Iniesta Luján': 'Andres Iniesta',
    'Xavier Hernández Creus': 'Xavi',
    'Eric-Sylvain Bilal Abidal': 'Eric Abidal',
    'Daniel Alves da Silva': 'Dani Alves',
    'Carles Puyol i Saforcada': 'Carles Puyol',
    'Gerard Piqué Bernabéu': 'Gerard Pique',
    'Javier Alejandro Mascherano': 'Javier Mascherano',
    'Víctor Valdés Arribas': 'Victor Valdes',

    # Manchester United
    'Edwin van der Sar': 'Van der Sar',
    'Patrice Latyr Evra': 'Patrice Evra',
    'Nemanja Vidić': 'Nemanja Vidic',
    'Rio Gavin Ferdinand': 'Rio Ferdinand',
    'John Michael O Shea': 'John O Shea',
    'Ryan Joseph Giggs': 'Ryan Giggs',
    'Michael Carrick': 'Carrick',
    'Luis Antonio Valencia Mosquera': 'Valencia',
    'Wayne Mark Rooney': 'Wayne Rooney',
    'Javier Hernández Balcázar': 'Javier Hernandez',
    'Anderson Luís de Abreu Oliveira': 'Anderson',
    'Luís Carlos Almeida da Cunha': 'Nani',
    'Fábio Pereira da Silva': 'Fabio',
}

# Replace the names in the dataframe columns using the dictionary
df_merged['player_name'] = df_merged['player_name'].replace(name_replacements)
df_merged['pass_recipient_name'] = df_merged['pass_recipient_name'].replace(name_replacements)
df_merged.sort_values(by='team_name', inplace=True)
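A quick check of how Series.replace behaves with a dictionary: it matches whole cell values exactly, so each key must be spelled exactly as in the StatsBomb data (accents included), and names without an entry are left unchanged. The values below are toy data; "Unknown Player" is a hypothetical name.

```python
import pandas as pd

# Toy column; only names that appear as dictionary keys are changed
names = pd.Series(["Xavier Hernández Creus", "Lionel Andrés Messi Cuccittini", "Unknown Player"])
name_replacements = {
    "Xavier Hernández Creus": "Xavi",
    "Lionel Andrés Messi Cuccittini": "Lionel Messi",
}

short = names.replace(name_replacements)
print(short.tolist())  # ['Xavi', 'Lionel Messi', 'Unknown Player']
```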

The next step is to extract the home and away teams from the data frame. Because df_merged was just sorted alphabetically by team_name, iloc[0] returns the first team name (Barcelona) and iloc[-1] returns the last (Manchester United); we treat them as the home team and the away team respectively.

hteam = df_merged['team_name'].iloc[0]
ateam = df_merged['team_name'].iloc[-1]
print('Home Team : ' + hteam)
print('Away Team : ' + ateam)

The output is:

Home Team : Barcelona
Away Team : Manchester United

Since this tutorial only builds Barcelona’s passing network, we filter the passes by the home team before the next step: removing players with less playing time.

df_home_pass = df_merged[df_merged.team_name == hteam]

# Approximate each player's playing time as the span between his first and last pass
home_player_df = df_home_pass.groupby('player_name')['minute'].agg(['min', 'max']).reset_index()
home_player_df['minutes_played'] = home_player_df['max'] - home_player_df['min']
home_player_df = home_player_df.sort_values('minutes_played', ascending=False)
Players’ playing time
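The playing time computed above is an approximation: the gap between a player’s first and last pass. This toy sketch (hypothetical players A, B and C) shows the calculation and one consequence of it: a player who made a single pass gets 0 minutes, regardless of how long he was actually on the pitch.

```python
import pandas as pd

# Hypothetical toy passes: the minute of each pass, per player
passes = pd.DataFrame({
    "player_name": ["A", "A", "B", "B", "C"],
    "minute": [2, 90, 5, 88, 87],
})

playing_time = passes.groupby("player_name")["minute"].agg(["min", "max"]).reset_index()
playing_time["minutes_played"] = playing_time["max"] - playing_time["min"]
playing_time = playing_time.sort_values("minutes_played", ascending=False)
print(playing_time)  # A: 88, B: 83, C: 0
```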

Because the data frame is already sorted by playing time, we only need to select its first eleven players. We then keep only the passes in which both the passer and the recipient are among those eleven players.

home_player_name = home_player_df.player_name.iloc[:11].tolist()
df_home_pass = df_home_pass[df_home_pass.player_name.isin(home_player_name)]
# .copy() avoids a SettingWithCopyWarning when we add the pair_key column later
df_home_pass = df_home_pass[df_home_pass.pass_recipient_name.isin(home_player_name)].copy()
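The two successive isin filters above are equivalent to a single combined mask. In this toy sketch (hypothetical players), note that isin returns False for missing values, so passes without a recorded recipient also drop out at this step.

```python
import pandas as pd

# Hypothetical toy data; the last pass has no recorded recipient
passes = pd.DataFrame({
    "player_name": ["A", "A", "B", "C", "B"],
    "pass_recipient_name": ["B", "C", "A", "A", None],
})
top_players = ["A", "B"]  # stands in for the players with the most minutes

# Keep a pass only if both the passer and the recipient are in the selected group
mask = passes["player_name"].isin(top_players) & passes["pass_recipient_name"].isin(top_players)
kept = passes[mask]
print(kept)  # rows 0 (A -> B) and 2 (B -> A)
```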

To draw the players (vertices) as a scatter plot, we set up a data frame called scatter_df that holds the location and size of each vertex.

  • We iterate over each player in the df_home_pass data frame and compute the average coordinates of the player’s passes (start points) and receptions (end points).
  • We also count how many passes each player made and record it in the scatter_df data frame.
  • Finally, the size of each player’s circle is scaled by the number of passes the player made, so the most active passer gets the largest circle.
scatter_df = pd.DataFrame()
for i, name in enumerate(df_home_pass["player_name"].unique()):
    # start points of the player's passes and end points of the passes he received
    passx = df_home_pass.loc[df_home_pass["player_name"] == name]["x"].to_numpy()
    recx = df_home_pass.loc[df_home_pass["pass_recipient_name"] == name]["end_x"].to_numpy()
    passy = df_home_pass.loc[df_home_pass["player_name"] == name]["y"].to_numpy()
    recy = df_home_pass.loc[df_home_pass["pass_recipient_name"] == name]["end_y"].to_numpy()
    scatter_df.at[i, "player_name"] = name

    # the x and y location of each player's circle is the average of his passes and receptions
    scatter_df.at[i, "x"] = np.mean(np.concatenate([passx, recx]))
    scatter_df.at[i, "y"] = np.mean(np.concatenate([passy, recy]))

    # number of passes made by the player, and his jersey number
    scatter_df.at[i, "no"] = df_home_pass.loc[df_home_pass["player_name"] == name].count().iloc[0]
    scatter_df.at[i, "jersey_number"] = df_home_pass.loc[df_home_pass["player_name"] == name]['jersey_number'].iloc[0]

scatter_df['jersey_number'] = scatter_df['jersey_number'].astype(int)

# scale circle size so the player who made the most passes gets the largest circle
scatter_df['marker_size'] = (scatter_df['no'] / scatter_df['no'].max() * 1500)
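The loop builds scatter_df one player at a time. As an alternative sketch, not the author’s method, the same averages can be computed without an explicit loop: stack the start points of each player’s passes with the end points of the passes he received, then group by player. The coordinates below are made-up toy values; column names follow the tutorial.

```python
import pandas as pd

# Hypothetical toy passes (StatsBomb-style coordinates, values made up)
passes = pd.DataFrame({
    "player_name":         ["A", "A", "B"],
    "pass_recipient_name": ["B", "B", "A"],
    "x":     [30.0, 40.0, 60.0],
    "y":     [20.0, 30.0, 50.0],
    "end_x": [50.0, 60.0, 35.0],
    "end_y": [40.0, 50.0, 25.0],
})

# Touches = start points of passes made + end points of passes received
made = passes[["player_name", "x", "y"]]
received = passes[["pass_recipient_name", "end_x", "end_y"]].rename(
    columns={"pass_recipient_name": "player_name", "end_x": "x", "end_y": "y"}
)
touches = pd.concat([made, received], ignore_index=True)

positions = touches.groupby("player_name")[["x", "y"]].mean()
pass_counts = passes.groupby("player_name").size().rename("no")

# Inner join keeps only players who made at least one pass
nodes = positions.join(pass_counts, how="inner")
nodes["marker_size"] = nodes["no"] / nodes["no"].max() * 1500
print(nodes)
```

The inner join mirrors the loop, which also iterates only over the unique passers.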

Next, we modify the df_home_pass data frame to prepare the edges of the network:

  • A new column, pair_key, is created by sorting the names of the passer and the recipient alphabetically and joining them with an underscore, so that passes from A to B and from B to A share the same key.
  • The DataFrame is then grouped by pair_key, and the resulting count of the x column is renamed to pass_count, giving the number of passes exchanged by each pair of players.
  • Finally, a threshold is applied so that only pairs with more than 5 passes between them are kept.
# Count passes between each pair of players (direction ignored)
df_home_pass["pair_key"] = df_home_pass.apply(lambda x: "_".join(sorted([x["player_name"], x["pass_recipient_name"]])), axis=1)
lines_df = df_home_pass.groupby(["pair_key"]).x.count().reset_index()
lines_df.rename({'x': 'pass_count'}, axis='columns', inplace=True)
# Setting a threshold: keep only pairs with more than 5 passes
lines_df = lines_df[lines_df['pass_count'] > 5]
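The pair_key string works as long as no player name contains an underscore, because the plotting code later splits the key on "_". A hedged alternative is to sort each passer/recipient pair with NumPy and group on two separate columns, which keeps the names intact. Toy data with hypothetical players:

```python
import numpy as np
import pandas as pd

# Hypothetical toy passes between three players
passes = pd.DataFrame({
    "player_name":         ["A"] * 4 + ["B"] * 3 + ["C"],
    "pass_recipient_name": ["B"] * 4 + ["A"] * 3 + ["A"],
})

# Sort each (passer, recipient) row so A->B and B->A fall into the same undirected edge
pairs = np.sort(passes[["player_name", "pass_recipient_name"]].to_numpy(), axis=1)
edges = (
    pd.DataFrame(pairs, columns=["player1", "player2"])
    .groupby(["player1", "player2"])
    .size()
    .reset_index(name="pass_count")
)

threshold = 5
edges = edges[edges["pass_count"] > threshold]  # strictly more than 5 passes
print(edges)  # only A-B (7 passes) survives
```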

Data Visualization

A typical approach for creating passing networks involves the following steps:

  1. Create a pitch object with specific attributes, such as pitch type, line color, and goal type.
  2. Plot the vertices on the pitch as a scatter plot, with each point representing a player’s average position, and annotate each point with the player’s name and jersey number.
  3. Plot the edges as lines between players. The width of each line scales with the number of passes between the two players, and so does its opacity, subject to a minimum value.
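The scaling rule in step 3 can be isolated in a small helper. edge_style is a hypothetical function written for this illustration, not part of mplsoccer; its defaults mirror the tutorial’s maximum line width of 8 and its effective minimum opacity of 0.5.

```python
def edge_style(num_passes, max_passes, max_width=8.0, min_alpha=0.5):
    """Return (line_width, alpha) for an edge, scaled by pass volume.

    Hypothetical helper: the defaults mirror the plotting code in this tutorial.
    """
    ratio = num_passes / max_passes
    line_width = ratio * max_width   # busiest pair gets max_width
    alpha = max(ratio, min_alpha)    # faint edges never drop below min_alpha
    return line_width, alpha


print(edge_style(20, 20))  # (8.0, 1.0)
print(edge_style(6, 20))   # (2.4, 0.5)
```

For example, if the busiest pair exchanged 20 passes, an edge with 6 passes is drawn 2.4 points wide at the minimum opacity.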
pitch = VerticalPitch(pitch_type='statsbomb', line_color='white', linewidth=1, goal_type='box')
fig, ax = pitch.grid(grid_height=0.9, title_height=0.06, axis=False,
                     endnote_height=0.04, title_space=0, endnote_space=0)

# Plot vertices
pitch.scatter(scatter_df.x, scatter_df.y, s=scatter_df.marker_size, color='#272822', edgecolors='#EDBB00', linewidth=3, alpha=1, ax=ax["pitch"], zorder = 3)

# Add player names and jersey numbers
for i, row in scatter_df.iterrows():
    pitch.annotate(row.player_name, xy=(row.x + 6, row.y), c='white', va='center',
                   ha='center', size=6, weight="bold", ax=ax["pitch"], zorder=4,
                   bbox=dict(facecolor='#272822', alpha=1, edgecolor='#272822', boxstyle='round,pad=0.4'))

for i, row in scatter_df.iterrows():
    pitch.annotate(str(row.jersey_number), xy=(row.x, row.y - 0.1), c='white', va='center',
                   ha='center', size=8, weight="bold", ax=ax["pitch"], zorder=4)

# Plot edges
for i, row in lines_df.iterrows():
    player1 = row["pair_key"].split("_")[0]
    player2 = row['pair_key'].split("_")[1]
    # take the average location of each player to draw a line between them
    player1_x = scatter_df.loc[scatter_df["player_name"] == player1]['x'].iloc[0]
    player1_y = scatter_df.loc[scatter_df["player_name"] == player1]['y'].iloc[0]
    player2_x = scatter_df.loc[scatter_df["player_name"] == player2]['x'].iloc[0]
    player2_y = scatter_df.loc[scatter_df["player_name"] == player2]['y'].iloc[0]
    num_passes = row["pass_count"]
    # the more passes, the wider the line
    line_width = (num_passes / lines_df['pass_count'].max() * 8)
    # the more passes, the more opaque the line, with a minimum alpha of 0.5 for pairs with fewer passes
    alpha = max(num_passes / lines_df['pass_count'].max(), 0.5)
    # plot lines on the pitch
    pitch.lines(player1_x, player1_y, player2_x, player2_y,
                alpha=alpha, lw=line_width, zorder=2, color="#EDBB00", ax=ax["pitch"])

fig.text(s=hteam + " "+ "Passing Networks", x= 0.06, y= 1, fontsize=18,fontweight="bold")
fig.text(s='Final Champions League 2010/2011', x = 0.06, y=0.97, fontsize=12)
plt.show()

The resulting visualization looks like this:

Barcelona Passing Networks

Insight

  • During the 2010/2011 Champions League final, Barcelona’s possession was greatly aided by the strong passing chemistry between Messi (10), Xavi (6), and Iniesta (8).
  • Barcelona played with outstanding positional balance and composure throughout the game, especially in the opponent’s half.
  • Barcelona’s fullbacks, Eric Abidal (22) and Dani Alves (2), pushed forward to provide width and create overloads in midfield.
  • Victor Valdes made significant contributions to the team’s build-up play during the final.

Conclusion

In this article, we learned step by step how to build a passing network with Python. Thank you for taking the time to read this tutorial. We hope that you found it informative and valuable. Please feel free to return for future tutorials.
