Football stats with Python: Pass Position, Direction and Distance

Published in Analytics Vidhya · 7 min read · Sep 3, 2020

Overview

Using open-source data from Wyscout, I extracted the data and formatted it as CSV. For more details, please see my article below:

Preprocess FIFA World Cup data with Python

A large amount of football play by play data was published by Wyscout in May 2019 on figshare.

medium.com

Now that the play-by-play data is prepared, I’m going to visualize it with Python. This time, I visualize each player’s average passing position, direction, and distance, a chart called a “Pass Sonar”.

I use Google Colab, so you don’t need to set up any environment on your laptop.

https://github.com/shotin93/fifa-world-cup-2018

1. Read CSV and look into the data

Firstly, read the CSV files.

import pandas as pd

pd.set_option("display.max_columns", 100)

matches = pd.read_csv("csv/matches.csv")
matches_member = pd.read_csv("csv/matches_member.csv")
events = pd.read_csv("csv/events.csv")
event_kinds = pd.read_csv("csv/eventKinds.csv")
sub_event_kinds = pd.read_csv("csv/subEventKinds.csv")
players = pd.read_csv("csv/players.csv")
teams = pd.read_csv("csv/teams.csv")

I want to visualize Spain v. Russia, the match in which Spain recorded the most passes in World Cup history, so I search for Spain and Russia in the teams dataframe.

pd.concat([
  teams[teams.officialName.str.contains("Spain")]
  ,teams[teams.officialName.str.contains("Russia")]
])

matches[matches.teamId == 1598]  # 1598 is Spain's teamId

It looks like one match has 2,000+ event records. This match went to a penalty shoot-out, but we need only Spain’s passing plays in regular time. The documentation explains the meaning of matchPeriod, so we narrow down the events with this parameter.

- matchPeriod: the period of the match. It can be “1H” (first half of the match), “2H” (second half of the match), “E1” (first extra time), “E2” (second extra time) or “P” (penalties time);

events = events[(events.matchId == 2058004) & (events.matchPeriod != "E1") & (events.matchPeriod != "E2") & (events.matchPeriod != "P") & (events.teamId == 1598)]
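Since the documentation lists only five possible values for matchPeriod, excluding "E1", "E2" and "P" should be the same as keeping "1H" and "2H". Here is a minimal sketch of that idea with `isin`. The toy rows, the team ID 9999 and the helper name `regular_time_events` are made up for illustration; only the column names and the Spain/match IDs come from the text above.

```python
import pandas as pd

# Toy rows for illustration only; the real data comes from csv/events.csv.
toy_events = pd.DataFrame({
    "matchId": [2058004, 2058004, 2058004, 1],
    "teamId": [1598, 1598, 9999, 1598],  # 9999 is a hypothetical opponent ID
    "matchPeriod": ["1H", "E1", "2H", "2H"],
})


def regular_time_events(df, match_id, team_id):
    """Keep one team's events from the two regular halves of one match."""
    mask = (
        (df["matchId"] == match_id)
        & (df["teamId"] == team_id)
        & df["matchPeriod"].isin(["1H", "2H"])
    )
    # .copy() so that new columns can be added later without warnings
    return df[mask].copy()


filtered = regular_time_events(toy_events, 2058004, 1598)
print(filtered)
```

Only the first toy row survives: the second is extra time, the third belongs to the other team, and the fourth is a different match.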

We cannot visualize more than 11 players on the pitch (technically we can, but it looks strange), so we also narrow the events down to the players who started the game.

member_spain = matches_member[(matches_member.matchId == 2058004) & (matches_member.teamId == 1598) & (matches_member.startingF == 1)]

events = events[events.playerId.isin(member_spain.playerId)].copy()  # copy so that new columns can be added later

2. Calculate the average position of each player

I found the position specification in the documentation, so we can get the average position of each player.

- positions: the origin and destination positions associated with the event. Each position is a pair of coordinates (x, y). The x and y coordinates are always in the range [0, 100] and indicate the percentage of the field from the perspective of the attacking team. In particular, the value of the x coordinate indicates the event’s nearness (in percentage) to the opponent’s goal, while the value of the y coordinates indicates the event’s nearness (in percentage) to the right side of the field;

However, there is a small catch: x and y are percentages, not distances. Let’s look at some corner kicks, whose subEventId is 30.

events.loc[events["subEventId"] == 30, ["fromX", "fromY"]]

We can see that x = 0 is Spain’s goal line and x = 100 is the opponent’s goal line. (According to the documentation, y = 100 is the right side for Spain.)

I want to draw the pitch vertically rather than horizontally, so I swap x and y. In addition, percentages are hard to work with, so I convert them into meters (assuming a pitch size of 105 x 68 m).

events["fromXm"] = round((events["fromY"]*68/100),1)
events["fromYm"] = round((events["fromX"]*105/100),1)
events["toXm"] = round((events["toY"]*68/100),1)
events["toYm"] = round((events["toX"]*105/100),1)

events[["fromX", "fromY", "fromXm", "fromYm"]]
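To check the swap and the scaling by hand, here is a small sketch of the same conversion for a single point. The function name `to_vertical_meters` and the constants are my own names for illustration, and the 105 x 68 m pitch is the same assumption as above.

```python
PITCH_LENGTH_M = 105.0  # assumed pitch length, as in the text
PITCH_WIDTH_M = 68.0    # assumed pitch width, as in the text


def to_vertical_meters(x_pct, y_pct):
    """Map Wyscout percentages (x, y) to meters on a vertically drawn pitch.

    The Wyscout y percentage becomes the horizontal axis (pitch width) and
    the Wyscout x percentage becomes the vertical axis (pitch length).
    """
    x_m = round(y_pct * PITCH_WIDTH_M / 100, 1)
    y_m = round(x_pct * PITCH_LENGTH_M / 100, 1)
    return x_m, y_m


print(to_vertical_meters(50, 50))  # the centre spot
print(to_vertical_meters(100, 0))  # on the opponent's goal line
```

The centre spot (50, 50) should land in the middle of the figure, at half the width and half the length.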

We are ready to calculate the average positions of each player. Narrow down by passing play, aggregate by each player, and calculate an average of x and y.

events.to_csv("csv/spain_passing_events.csv", index=False) #Save...

pass_events = events[events.eventId == 8]
pass_position = pass_events.groupby(["playerId"], as_index=False)
pass_position = pass_position.agg({"fromXm": "mean", "fromYm": "mean"})

In addition to this, we merge with player data to get player names.

pass_position = pd.merge(pass_position, players, on="playerId")

3. Summarize pass events for Pass Sonar

We have calculated each player’s average position when passing. For the Pass Sonar, we also need the distance and the direction of each accurate pass, so I’m going to calculate these next.

First, narrow the events down to accurate passes.

accurate_pass_events = events[(events.eventId == 8) & (events.accurateF == 1)].copy()  # copy so that new columns can be added

Calculate distance using Pythagoras’ theorem.

import numpy as np

accurate_pass_events["distance"] = np.sqrt(
  (
    (accurate_pass_events["toXm"] - accurate_pass_events["fromXm"]) ** 2
    + (accurate_pass_events["toYm"] - accurate_pass_events["fromYm"]) ** 2
  ).values
)

accurate_pass_events[["fromXm", "toXm", "fromYm", "toYm", "distance"]]
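As a quick sanity check of the Pythagoras step, here is the same calculation for a single pass using only the standard library. The function name `pass_distance` is mine, and the coordinates are made-up values chosen to form a 3-4-5 triangle.

```python
import math


def pass_distance(from_x, from_y, to_x, to_y):
    """Straight-line length of a pass, in the same unit as the inputs."""
    # math.hypot(dx, dy) computes sqrt(dx**2 + dy**2)
    return math.hypot(to_x - from_x, to_y - from_y)


# A pass that moves 3 m sideways and 4 m forward
print(pass_distance(10.0, 20.0, 13.0, 24.0))
```

If the pass moves 3 m sideways and 4 m forward, the distance should come out as 5 m.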

Also, calculate the angle. We define the angle as 0 degrees when the pass goes straight forward, and it increases clockwise.

import numpy as np
from numpy import linalg as LA

def calc_degree(fromX, fromY, toX, toY):
  u = np.array([0.0, 1.0])  # unit vector pointing straight forward (toward the opponent's goal)
  v = np.array([toX - fromX, toY - fromY])  # pass vector
  n = LA.norm(u) * LA.norm(v)
  if n == 0:
    return 0.0  # zero-length pass has no direction; treat it as forward
  c = np.inner(u, v) / n
  a = np.rad2deg(np.arccos(np.clip(c, -1.0, 1.0)))
  # arccos only returns 0-180, so mirror passes that go to the left
  if toX - fromX < 0:
    a = 360 - a
  return a

def calc_pass_theta(row):
  return round(
    calc_degree(
      row["fromXm"]
      ,row["fromYm"]
      ,row["toXm"]
      ,row["toYm"]
    )
  )

#Apply function each row
accurate_pass_events["angle"] = accurate_pass_events.apply(
  calc_pass_theta
  ,axis=1
)

accurate_pass_events[["fromXm", "toXm", "fromYm", "toYm", "angle"]]

You can find a 0-degree pass in the second row.
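For comparison, the same clockwise-from-forward angle can probably be obtained more directly with `atan2`. This is an alternative sketch, not the method used in this article; the function name `pass_angle` is mine, and I assume that a zero-length pass should simply be reported as 0 degrees.

```python
import math


def pass_angle(from_x, from_y, to_x, to_y):
    """Clockwise angle in degrees from straight forward (+y), in [0, 360)."""
    dx = to_x - from_x  # positive means the pass goes to the right
    dy = to_y - from_y  # positive means the pass goes forward
    # Swapping the usual (dy, dx) argument order measures the angle from
    # the +y axis, clockwise. atan2(0, 0) is 0, so a zero-length pass gives 0.
    return math.degrees(math.atan2(dx, dy)) % 360


for target in [(0, 10), (10, 0), (0, -10), (-10, 0)]:
    print(target, round(pass_angle(0, 0, *target)))
```

The four targets are straight forward, right, backward and left, so the rounded angles should step through the four quarter turns.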

Next, we need to divide the angles into 8 directions (any number is OK). For example, if the angle is between 0 and 22.5 or between 337.5 and 360 degrees, I define it as direction 1 (which means forward).

def divide(angle, divisions):
  degree = 360 / divisions
  division = ((angle + (degree / 2)) // degree) + 1
  if division > angle:
    division = 1
  return division

def divide_pass_direction(row):
  return divide(
    row["angle"]
    ,8
  )

accurate_pass_events["direction"] = accurate_pass_events.apply(
  divide_pass_direction
  ,axis=1
)

accurate_pass_events[["angle", "direction"]]

Oops. Can you see direction 9? This means 1, so we replace it.

accurate_pass_events = accurate_pass_events.replace({"direction": {9: 1}})
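The extra direction 9 appears because angles from 337.5 degrees upward fall into one more bin after the half-width shift. One possible way to avoid the replacement step is to wrap the shifted angle back into 0-360 with a modulo before binning. This is only a sketch of that idea; the function name `pass_direction` is mine and it is not the code used for the figures in this article.

```python
def pass_direction(angle, divisions=8):
    """Return a 1-based direction index; direction 1 is centred on 0 degrees."""
    width = 360 / divisions
    # Shift by half a bin so direction 1 spans -width/2 .. +width/2,
    # then wrap with % 360 so angles near 360 fall back into direction 1.
    shifted = (angle + width / 2) % 360
    return int(shifted // width) + 1


for angle in [0, 22, 23, 90, 180, 270, 337, 338]:
    print(angle, pass_direction(angle))
```

With 8 divisions, 338 degrees should land in direction 1 again instead of 9, while 337 degrees still belongs to direction 8.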

Finally, group the accurate pass events by player and direction, and calculate the average pass distance and the number of passes.

pass_sonar = accurate_pass_events.groupby(["playerId", "direction"], as_index=False)
pass_sonar = pass_sonar.agg({"distance": "mean", "eventId": "count"})
pass_sonar = pass_sonar.rename(columns={"eventId": "amount"})
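To see what this summary produces, here is a small sketch on made-up rows. It uses pandas named aggregation, which should give the same two columns without the separate rename step; the toy values and the variable names `toy_passes` and `toy_sonar` are hypothetical.

```python
import pandas as pd

# Made-up accurate passes: player 1 passes forward twice and right once,
# player 2 passes forward once.
toy_passes = pd.DataFrame({
    "playerId": [1, 1, 1, 2],
    "direction": [1, 1, 3, 1],
    "distance": [10.0, 20.0, 30.0, 8.0],
})

toy_sonar = toy_passes.groupby(["playerId", "direction"], as_index=False).agg(
    distance=("distance", "mean"),  # average pass length per wedge
    amount=("distance", "count"),   # number of passes per wedge
)
print(toy_sonar)
```

Each row of the result corresponds to one wedge of the sonar: one player, one direction, with an average distance (used for the color) and an amount (used for the radius).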

We have finally finished the data preprocessing. Let’s move on to visualization.

4. Visualization

I’m going to use matplotlib for visualization.

%matplotlib inline
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7,11), facecolor="white")
ax = fig.add_subplot(111, facecolor="white")
ax.set_xlim(0, 68) #Horizontal pitch size
ax.set_ylim(0, 105) #Vertical pitch size

We draw each player’s pass sonar at that player’s average position, so we loop over the average position data and, inside that loop, over the player’s rows in the pass sonar data.

import matplotlib.patches as pat

for _, player in pass_position.iterrows():
  ax.text(
    player.fromXm
    ,player.fromYm
    ,player.playerName.encode().decode("unicode-escape")
    ,ha="center"
    ,va="center"
    ,color="black"
  )

  for _, pass_detail in pass_sonar[pass_sonar.playerId == player.playerId].iterrows():
    #Start degree of direction 1
    theta_left_start = 112.5

    #Color coding by distance
    color = "darkred"
    if pass_detail.distance < 15:
      color = "gold"
    elif pass_detail.distance < 25:
      color = "darkorange"

    #Calculate degree in matplotlib figure
    theta_left = theta_left_start - (360 / 8) * (pass_detail.direction - 1)
    theta_right = theta_left - (360 / 8)

    pass_wedge = pat.Wedge(
      center=(player.fromXm, player.fromYm)
      ,r=int(pass_detail.amount)*0.15
      ,theta1=theta_right
      ,theta2=theta_left
      ,facecolor=color
      ,edgecolor="white"
    )

    ax.add_patch(pass_wedge)
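The only tricky part of the loop is the angle conversion: our directions run clockwise starting from forward, while matplotlib measures wedge angles counter-clockwise from the positive x axis. Here is a small sketch of just that step, generalized to any number of divisions. The helper name `wedge_angles` is mine, and the 112.5 start value from the loop appears here as 90 plus half a wedge.

```python
def wedge_angles(direction, divisions=8):
    """Return (theta1, theta2) in matplotlib degrees for a 1-based direction."""
    width = 360 / divisions
    # Direction 1 is centred on 90 degrees (straight up in the figure),
    # so its left edge sits half a wedge counter-clockwise from 90.
    theta_left = 90 + width / 2 - width * (direction - 1)
    theta_right = theta_left - width
    return theta_right, theta_left


for direction in [1, 3, 5, 7]:
    print(direction, wedge_angles(direction))
```

Direction 1 (forward) should be centred on 90 degrees and direction 3 (right) on 0 degrees. The backward and left wedges come out as negative angles, which describe the same positions as 270 and 180 degrees.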

This covers only 90 minutes of passing plays, but we can see that the Spanish players’ positions are very close to each other. We can also see that Jordi and Ramos play mostly short passes, while Ramos also plays long passes to the right side (maybe to Nacho?). Surprisingly, Silva and Busquets have fewer passes.

5. Make it look better

Do you want to make this look better? We can do this using PIL.

%pip install pillow

from PIL import Image

#convert matplotlib into PIL.Image
fig.canvas.draw()
pass_sonar_img = np.array(fig.canvas.renderer.buffer_rgba())
pass_sonar_img = Image.fromarray(pass_sonar_img)

field_image = Image.open("image/field.png")
field_image.paste(pass_sonar_img,(0,0),pass_sonar_img)

This looks better, and it is easier to understand on a football field.

That’s all!! What do you think of this? Thank you for reading.