Analyzing football game event data using mplsoccer in Python

Abhishek Mahajan
5 min read · Sep 21, 2022

Let’s take an example of a game from the newly released open ISL 2021/22 data from StatsBomb.

Introduction

Data science and analytics are making great strides in professional sports, including football.

Data analysis in football covers a wide range of topics, including game event analysis, player scouting, tactical decision-making, and visualizing tactics.

In this article, I discuss how to analyze a game using its event data. As an example, we will use the event data from the final of the Indian Super League (ISL) 2021/22 between Hyderabad and Kerala Blasters.

Implementation

StatsBomb has released the data for the ISL 2021/22 season for free. There are multiple ways to access it. One way is to download the data from the StatsBomb data repository on GitHub (here). This is a bit cumbersome because the data is large and parsing it is not straightforward. The second way is to use StatsBomb’s statsbombpy package in Python. This method is better; however, we will use a third way: accessing the data through the mplsoccer library (check the documentation) in Python, since it avoids the clutter of extra libraries.

Let’s import all the required libraries.

Now let’s check the ISL season available in the data.
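One way to look up the season, sketched under the assumption that the competition table comes from `Sbopen().competition()`. The helper `find_competitions` is a hypothetical name introduced here; it only filters the table by competition name.

```python
import pandas as pd


def find_competitions(competitions: pd.DataFrame, name_fragment: str) -> pd.DataFrame:
    """Return the competition/season rows whose name contains `name_fragment`."""
    mask = competitions["competition_name"].str.contains(name_fragment, case=False, na=False)
    columns = ["competition_id", "season_id", "competition_name", "season_name"]
    return competitions.loc[mask, columns].reset_index(drop=True)


# Usage (downloads the competition list, so it needs network access):
# from mplsoccer import Sbopen
# parser = Sbopen()
# print(find_competitions(parser.competition(), "Indian"))
```

The printed rows give the competition_id and season_id needed in the next step.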

Now we check the matches of that season using the competition_id and season_id.
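A sketch of how the match list could be narrowed down to the two finalists. `find_fixtures` is a hypothetical helper; the team-name strings are taken from this article and may need adjusting to the exact spelling used in the data.

```python
import pandas as pd


def find_fixtures(matches: pd.DataFrame, team_a: str, team_b: str) -> pd.DataFrame:
    """Return all matches between two teams, oldest first."""
    home, away = matches["home_team_name"], matches["away_team_name"]
    mask = ((home == team_a) & (away == team_b)) | ((home == team_b) & (away == team_a))
    columns = ["match_id", "match_date", "home_team_name", "away_team_name"]
    return matches.loc[mask, columns].sort_values("match_date").reset_index(drop=True)


# Usage (needs network access; the two ids come from the competition table):
# from mplsoccer import Sbopen
# matches = Sbopen().match(competition_id=..., season_id=...)
# fixtures = find_fixtures(matches, "Hyderabad", "Kerala Blasters")
# final_match_id = fixtures["match_id"].iloc[-1]  # assumption: the final is the latest meeting
```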

Now we grab the final game events using the match_id.

Plotting Pass maps

Now, to plot pass maps, we can either plot one for each player separately or plot them for all players of a team together. I will demonstrate both.

The process for plotting an individual player’s pass map is:

  1. Filter the data by ‘type_name’ equal to ‘Pass’ and by the player name. Save the result in another data frame.
  2. Create an instance of the ‘Pitch’ class. (You can read the mplsoccer documentation to learn how to change the look of the pitch.)
  3. Iterate over all rows of the filtered events. Get the x and y coordinates of the pass origin, then compute the distance travelled in each direction by subtracting the pass start coordinates from the pass end coordinates.
  4. Within that loop, check whether the pass was incomplete. Based on that, plot it using the ‘arrow’ function in red (incomplete) or green (complete). We also plot a scatter bubble at the pass start point.
  5. After the loop, add a title to the image.

The code for the function is as follows:

The output:

Now, to generate the passing maps of all players of a team, we create a grid of plots and iterate over all players of that team. The code is as follows:

The output:

Plotting Heatmaps

The process for plotting heatmaps is very similar to plotting pass maps. The only difference is that we plot KDE (kernel density estimate) plots instead of scatter plots and arrows.

The code to plot a heatmap of an individual player is as follows:

The output is:

The code to plot the heatmaps of all players of a team together is:

The output is:

Analyzing the xG in the Game

The term ‘xG’ stands for expected goals. It is the probability that a shot results in a goal, estimated from historical shots taken in similar situations, for example from a similar position on the pitch.

To plot the xG graph for a team:

  1. Filter the data by team name.
  2. Keep only the relevant columns.
  3. Use a line plot to plot the xG against the minute of the game.

The code for the process is:
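A minimal sketch of these steps using plain Matplotlib. It assumes the xG value sits in the `shot_statsbomb_xg` column of the events DataFrame; the function names are hypothetical, and the `cumulative_xg` column is an extra added here for convenience rather than one of the steps above.

```python
import matplotlib.pyplot as plt


def team_xg_timeline(events, team_name):
    """Return one row per shot of `team_name`, ordered by minute."""
    columns = ["minute", "player_name", "shot_statsbomb_xg"]
    mask = (events["team_name"] == team_name) & events["shot_statsbomb_xg"].notna()
    shots = events.loc[mask, columns].sort_values("minute").reset_index(drop=True)
    shots["cumulative_xg"] = shots["shot_statsbomb_xg"].cumsum()  # running total
    return shots


def plot_team_xg(events, team_name, ax=None):
    """Line plot of the xG of each shot against the minute it was taken."""
    shots = team_xg_timeline(events, team_name)
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 4))
    ax.plot(shots["minute"], shots["shot_statsbomb_xg"], marker="o")
    ax.set_xlabel("Minute")
    ax.set_ylabel("xG")
    ax.set_title(f"{team_name}: xG by minute")
    return ax
```

Plotting `cumulative_xg` instead of `shot_statsbomb_xg` shows how a team's chances accumulated over the match.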

The output is:

Plotting the Shot map

To create a shot map for both teams on a single pitch, the steps are:

  1. Filter the data by team name for both teams and store the results in separate data frames.
  2. Keep only the relevant columns.
  3. Filter by ‘type_name’ equal to ‘Shot’, since we are only concerned with the xG of shots.
  4. Create an instance of the ‘Pitch’ class.
  5. Use a scatter plot for both teams to plot the x and y coordinates of each shot. For one of the teams, use 120-x and 80-y as the coordinates (the StatsBomb pitch is 120 by 80) so that its shots show up on the other side of the pitch. Use different bubble colors for the two teams as well.
  6. Set the bubble size to the xG value multiplied by a big number like 4000 or 5000, so that a bigger chance shows up as a bigger bubble.
  7. Put the title text on the plot.

The code for this process is:

The output is:

If you want to plot the shot map for only one team on a half pitch, then the code is:

The output is:

Plotting the Passing map

We can also plot a passing map for the whole team (often called a pass network) by using each player’s average position when playing a pass. I will cover the detailed steps in a separate article. The code for that is as follows:

The output is:

Conclusion

Congratulations! If you have followed the steps above, you can now call yourself a junior football data analyst. Give yourself a pat on the back for that!

I hope I could inspire some of you to start your own journey in sports data science. For any questions, do not hesitate to reach out to me in the comments.

I have compiled all of this code in a Kaggle notebook here.

Abhishek Mahajan

A developer, part time educator, part time football enthusiast and a full time procrastinator.