Jump start your learning of StatsBomb football data (including 360 data) using R and Python
This article shows some examples using StatsBomb data, including 360 data, and shows how I set up R, R Studio, Anaconda, Python and the StatsBomb packages on my MacBook Pro M1 to interrogate the data.
This article doesn’t go into any particular depth on R or Python, or the structure of the StatsBomb packages and data. It just aims to build confidence by showing how to quickly generate some results, so it’s probably ideal for product managers like me who are curious, want a high-level overview, and like to mess around and experiment.
About 18 months ago I wrote this article about ‘How open is English Premier League football data?’, which led me to experiment with some of StatsBomb’s open data by setting up R Studio on Ubuntu. So, when I saw that StatsBomb had made a selection of their 360 data freely available, I thought I’d give that a go on a fairly freshly installed MacBook Pro M1.
Opening up data helps people learn
StatsBomb’s reasons for opening up their data are summarised in quite a few places:
StatsBomb have an ongoing commitment to developing football analysts of the future. We try our best to provide the education, tools, and resources needed to learn the trade for the many amateur analysts out there looking to forge a career in professional football. Part of this commitment has been to release industry-leading event data free of charge.
I’m not looking for a career as a football analyst myself, but as someone who’s built much of my career around open-source, I think it’s fantastic they do this. What I am actually interested in is how sports data and video get delivered, at low levels of latency, to those who want to make use of them. Working with StatsBomb data has supported that interest by helping me understand how football data is structured, so their philanthropy here serves more purposes than they perhaps realise. Certainly, the article I wrote back in Jan 2021 played a big part in me getting my last job; I was an experienced product manager already, but being able to work with these tools helped me move into what was a new sector for me. It also made me think that StatsBomb would be a cool company to work for.
You can’t just throw it out there
Opening the data up is one thing, but it also needs to be well documented and supported by those who publish it. Support is often where you see these initiatives fall down: as a user, you work through a tutorial, hit an issue (which may be a bug), and find a relevant forum to post on. Then you wait…
My experience here may just be anecdotal, but I was seriously impressed that, within little over an hour of my reporting this issue around a key function in the R library, not only was a fix tested and pushed, but the guide was also updated to reflect it.
Getting set up
I’m going to cover getting set up with both R and Python at the same time, although you only need to do one or the other. Follow the steps below (which I’ll also cover in the video).
1. Install Python
If you’ve never worked with Python before, the quickest way to get set up is to go to Anaconda, then download and install it. Be careful: if you’re using an M1 Mac (Apple silicon), the homepage may not give you the correct download link, in which case you can find the right installer here.
Once you’ve got Anaconda installed, open Anaconda Navigator and from there install Jupyter Notebook, which you can use to write and run your Python code. (If this is completely new to you, Notebook is a browser-based tool that does a similar job to an Integrated Development Environment, or IDE.)
If you have an Intel Mac or Linux, you’ll also see that Anaconda Navigator gives you an option to install R Studio Desktop, which you’ll need for R. If you want to keep everything in one place, you might want to try installing it from there.
2. Get the StatsBomb Python package
If you don’t already have a GitHub account, sign up for one, download the GitHub desktop client and learn how to clone a repository. (Code, like the StatsBomb Python package, called statsbombpy, lives in repositories, which you’ll want to clone so you can run it on your machine.)
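As well as cloning the repository, statsbombpy can be installed with pip install statsbombpy. The sketch below is one way to check whether Python can already see the package. The helper names is_installed and install_hint are made up for this example and are not part of statsbombpy.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if Python can find the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


def install_hint(package_name: str) -> str:
    """Build a short status message for the named package."""
    if is_installed(package_name):
        return f"{package_name} is installed"
    return f"{package_name} is missing; try: pip install {package_name}"


# is_installed and install_hint are hypothetical helpers for this article.
print(install_hint("statsbombpy"))
```

Once the package is there, from statsbombpy import sb followed by sb.competitions() should give you a table of the competitions in the open data. That call needs an internet connection, which is why it is left out of the check above.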
3. Install R and R Studio
In this article, scroll down to the Accessing the data section, where you’ll see a link to the guide to using StatsBomb data in R. In that guide, page 3 covers installing the coding language and page 4 covers the IDE, R Studio.
In most instances, going to this link from page 3 to install R should be best. Again, look carefully for the M1 download on the macOS page if you have an M1 Mac. For the install of R Studio on page 4, it’s R Studio Desktop that you need.
When you launch R Studio for the first time, it may give you the option to install the command line developer tools for git. Whether you say yes or no to that, I think the next best thing to do once you have R Studio open is to run the following commands, which are on page 8, in the console window.
install.packages("devtools")
install.packages("remotes")
remotes::install_version("SDMTools", "1.1-221")
devtools::install_github("statsbomb/StatsBombR")
Run them line-by-line, like I have here in the console, starting with the first line:
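If you get any errors like ERROR: lazy loading failed for package ‘StatsBombR’ or Error: Failed to install ‘StatsBombR’ from GitHub: Git does not seem to be installed on your system, then refer to point 2 above and get yourself set up with GitHub. The second error means R can’t find Git itself, so installing the command line developer tools that R Studio offers on first launch should also help.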
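If you want to confirm whether Git is actually usable before retrying the install, a quick check from Python (which you already have through Anaconda) is one option. This is a minimal sketch: tool_responds is a made-up helper name, and it runs the command instead of just looking for a file, since a placeholder git command can exist on macOS before the developer tools are installed.

```python
import subprocess


def tool_responds(command: str) -> bool:
    """Return True if `<command> --version` runs and exits with status 0."""
    try:
        result = subprocess.run(
            [command, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers the case where the command does not exist at all.
        return False
    return result.returncode == 0


# tool_responds is a hypothetical helper for this article.
print("git available:", tool_responds("git"))
```

If this prints False, installing the command line developer tools and then re-running the devtools::install_github line is a reasonable next step.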
Check out this video which walks through the steps above, as well as showing data examples that come next in the article.
You are now ready to experiment!
An example in R
Let’s start in R with this file, which you can save to your desktop so you can open it with File > Open File in R Studio. This will generate the pass map above. (Once you have the file open, you can run it a line at a time by pressing the Run button, or select all the lines and then click Run.)
You can find out step-by-step how to generate this passmap by using this tutorial by BiscuitChaserFC, or by following my original video. Working through the Accessing & Working With StatsBomb Data In R guide will help you get more up to speed with the basics and generate more examples, before digging into the 360 data.
An example in Python
Abhishek Sharma has put together this great tutorial to generate passmaps in Python. If you want to shortcut this and get straight to some results, do the following:
1. Clone Abhishek’s repo.
2. Open Anaconda Navigator and from there, open Jupyter Notebook. As Notebook opens, it’ll open a terminal window, which will then create a tab in your browser at http://localhost:8888/tree (you can’t click on this link here, as it’s local to your machine and will only open when Notebook is running).
3. Browse to where you have your repo cloned (see below).
4. From here, select the checkbox next to main.py so you can duplicate it.
5. Once you have your duplicated file, open it. You can do this by clicking on its name, which will open it in a new tab. You can now edit this without destroying the original. So, for example, at line 13, you can change match_id = "8658" to match_id = "2275150"
6. Open up a terminal window and navigate to where your repo is cloned. For me, as my terminal opens in my home directory I do that by typing cd Documents/GitHub/passmaps (then hit ENTER).
7. Type python3 main-Copy1.py and hit ENTER (assuming main-Copy1.py is the name of the file you created and edited). This will then open up a passmap, which you can save to your machine.
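If you’re curious what sits underneath a pass map, the rough idea is to pick out the pass events for one team and keep each pass’s start and end point. The sketch below is not Abhishek’s code. It uses a few hand-written events in the shape of StatsBomb’s open-data event JSON, with invented team names and coordinates, and extract_passes is a made-up helper name. It relies on one convention in the data: a completed pass carries no outcome field.

```python
# Hand-written events in the shape of StatsBomb's open-data event JSON.
# Team names and coordinates are invented for illustration only.
sample_events = [
    {
        "type": {"name": "Pass"},
        "team": {"name": "Team A"},
        "location": [60.0, 40.0],
        "pass": {"end_location": [75.0, 30.0]},
    },
    {
        "type": {"name": "Pass"},
        "team": {"name": "Team A"},
        "location": [75.0, 30.0],
        "pass": {"end_location": [90.0, 45.0], "outcome": {"name": "Incomplete"}},
    },
    {
        "type": {"name": "Shot"},
        "team": {"name": "Team B"},
        "location": [100.0, 40.0],
    },
]


def extract_passes(events, team_name):
    """Return start point, end point and completion flag for one team's passes."""
    passes = []
    for event in events:
        if event["type"]["name"] != "Pass":
            continue
        if event["team"]["name"] != team_name:
            continue
        passes.append(
            {
                "start": tuple(event["location"]),
                "end": tuple(event["pass"]["end_location"]),
                # A pass with no "outcome" key is treated as completed.
                "completed": "outcome" not in event["pass"],
            }
        )
    return passes


team_a_passes = extract_passes(sample_events, "Team A")
for p in team_a_passes:
    print(p["start"], "->", p["end"], "completed" if p["completed"] else "incomplete")
```

A plotting library can then draw one arrow per entry from start to end, which is essentially what a pass map is.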
An example with 360 Data in R Studio
1. Download this file. This includes all the code from StatsBomb’s 360 announcement.
2. Open R Studio and use File > Open File to open the file you’ve just downloaded. (Remember, once you have the file open, you can run it a line at a time by pressing the Run button, or select all the lines and then click Run.)
3. Running it will generate a table and an image from the 360 data.
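To get a feel for what the 360 data adds, it helps to look at a single frame. Each 360 record is tied to an event and lists the players visible at that moment, flagged as teammate, actor (the player on the ball) or keeper. The sketch below uses one hand-written record in that shape, trimmed to the fields it needs, with an invented ID and coordinates. summarise_frame is a made-up helper name, and I’ve used Python here to match the earlier example.

```python
# One hand-written record in the shape of a StatsBomb 360 frame.
# The ID and coordinates are invented; real records carry more fields.
sample_frame = {
    "event_uuid": "hypothetical-event-id",
    "freeze_frame": [
        {"teammate": True, "actor": True, "keeper": False, "location": [90.0, 40.0]},
        {"teammate": True, "actor": False, "keeper": False, "location": [85.0, 55.0]},
        {"teammate": False, "actor": False, "keeper": False, "location": [100.0, 42.0]},
        {"teammate": False, "actor": False, "keeper": False, "location": [104.0, 35.0]},
        {"teammate": False, "actor": False, "keeper": True, "location": [119.0, 40.0]},
    ],
}


def summarise_frame(frame):
    """Count the visible players around the actor in one 360 frame."""
    players = frame["freeze_frame"]
    actor = next((p for p in players if p["actor"]), None)
    teammates = [p for p in players if p["teammate"] and not p["actor"]]
    opponents = [p for p in players if not p["teammate"]]
    return {
        "actor_location": tuple(actor["location"]) if actor else None,
        "teammates_visible": len(teammates),
        "opponents_visible": len(opponents),
        "opposition_keeper_visible": any(p["keeper"] for p in opponents),
    }


summary = summarise_frame(sample_frame)
print(summary)
```

Summaries like this, repeated across every event in a match, are the kind of thing you can then tabulate or plot.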
I hope this has all been useful and gets you started. Feel free to add some of your own examples or questions to the comments. I’ll add any new examples I come up with in the comments as well.