Football Analytics 101: How To Scrape Data From Understat.com

Cem ÖZÇELİK
4 min read · Jun 11, 2022


Photograph via: Pexels, Pixabay

Hello from another data science project. How are you? I hope you are all doing well. Today was a very productive day for me: I finally got to start on a topic I had been postponing for a long time. I have wanted to work on football analytics for a while, but I kept putting it off. So I asked myself: why not start today?

Today we will do an introductory study on football analytics and collect our own data from understat.com. Many companies already record the data generated in football matches, such as StatsBomb, Wyscout, and Opta. But being free to scrape our own data lets us work faster and pull exactly the data our analysis needs, from the site we want, in the shape we want. The technique we use for this is web scraping! For this, we will use the BeautifulSoup and Requests libraries. Without further ado, let's move on to our study.

There is one thing I would like to point out before we start. In this study, we will collect the data of the Manchester City vs. Aston Villa match from understat.com, the match in which Manchester City sealed an epic championship this season. We will take the shots taken in this match, together with the expected goals (xG) value of each shot, put them into a table, and turn that table into a dataset. So let's start!

OK, now we assign the base URL of the website we will get our data from to a variable and define the match ID variable. I'll explain the match ID part next.
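A minimal sketch of this step might look like the following. The pattern `https://understat.com/match/<id>` follows how match pages are addressed on the site; the ID `12345` is a hypothetical placeholder rather than the real ID of the match, and `build_match_url` is a helper name introduced here only for illustration.

```python
BASE_URL = "https://understat.com/match/"


def build_match_url(match_id, base_url=BASE_URL):
    """Join the base URL with the match ID found at the end of a match page URL."""
    return f"{base_url}{match_id}"


# Hypothetical placeholder: replace it with the ID at the end of your match URL.
match_id = "12345"
url = build_match_url(match_id)
print(url)
```

Keeping the base URL and the ID separate means the same script can be pointed at another match by changing a single value.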

The match ID, shown in the image, is the number at the end of the URL when we open the page of the relevant match, in this case Manchester City vs. Aston Villa.

The Explanation of the MatchID

We got the match ID from the URL. Next, we request the full content of the match page with the requests.get method, and then we parse the HTML elements with BeautifulSoup.
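As a rough sketch, the request and the parsing can be kept in two small functions. The names `fetch_html` and `extract_scripts` are assumptions made for this example, and the built-in `html.parser` backend is used here so that no extra parser has to be installed; other parsers such as lxml should work the same way.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10.0):
    """Download the page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_scripts(html):
    """Return the text of every inline <script> tag on the page."""
    soup = BeautifulSoup(html, "html.parser")
    # Tags that only load an external file have no inline text, so skip them.
    return [str(tag.string) for tag in soup.find_all("script") if tag.string]
```

A typical call would be `scripts = extract_scripts(fetch_html(url))`, where `url` is the match page address.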

Above, I said that we only want the shots taken in the match. So now we pick out only the part of the page that holds the shot data.
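One way to pick out the shot data is to search the script texts for the variable that holds it. This sketch assumes the page stores the shots in a JavaScript variable named `shotsData`, which was the case on Understat match pages at the time of writing but may change; `find_shots_script` is a hypothetical helper name.

```python
def find_shots_script(scripts):
    """Return the first script text that mentions the shotsData variable."""
    for text in scripts:
        if "shotsData" in text:
            return text
    raise ValueError("No script containing 'shotsData' was found on the page.")
```

Searching by name is a little more forgiving than taking a script by its position in the list, since the order of the scripts on the page can change.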

Let’s see what’s inside the string:

It looks very chaotic. Now let’s bring this structure into a more regular form.
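The tidy-up can be sketched as follows, under the assumption that the script wraps the data as `shotsData = JSON.parse('...')` and writes characters such as braces and quotes as hex escapes like `\x7B`. The function name `decode_shots` is introduced here for illustration only.

```python
import json


def decode_shots(script_text):
    """Cut the escaped JSON out of the script text and load it as a dict."""
    marker = "JSON.parse('"
    # Start searching at the shotsData variable, then skip past the marker.
    start = script_text.index(marker, script_text.index("shotsData")) + len(marker)
    end = script_text.index("')", start)
    raw = script_text[start:end]
    # Turn escapes such as \x7B and \x22 back into the characters { and ".
    decoded = raw.encode("utf8").decode("unicode_escape")
    return json.loads(decoded)
```

The returned dictionary is expected to hold one list of shots per side, which is what the next step works with.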

Home Team
Away Team
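To collect the home and away shots into one flat list, something like the sketch below could be used. It assumes the decoded data stores home shots under the key `h` and away shots under `a`, and that each shot carries the fields `minute`, `player`, `X`, `Y`, `xG`, `result`, `h_team` and `a_team`; these names reflect the page at the time of writing and should be checked against the actual data. `flatten_shots` is a hypothetical helper.

```python
def flatten_shots(data):
    """Turn the home ('h') and away ('a') shot lists into one list of rows."""
    rows = []
    for side in ("h", "a"):
        for shot in data.get(side, []):
            team = shot["h_team"] if side == "h" else shot["a_team"]
            rows.append({
                "minute": int(shot["minute"]),
                "player": shot["player"],
                "team": team,
                # Numeric values may arrive as strings, so convert them.
                "x": float(shot["X"]),
                "y": float(shot["Y"]),
                "xg": float(shot["xG"]),
                "result": shot["result"],
            })
    return rows
```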

We transform the organized data into a DataFrame and define its column names.
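A small sketch of this step with pandas is shown below. The column names and the two sample rows are made up for the example and are not taken from the real match.

```python
import pandas as pd

COLUMNS = ["minute", "player", "team", "x", "y", "xg", "result"]


def shots_to_dataframe(rows):
    """Build a DataFrame with a fixed column order from a list of shot rows."""
    return pd.DataFrame(rows, columns=COLUMNS)


# Made-up sample rows, only to show the shape of the table.
sample_rows = [
    {"minute": 10, "player": "Player A", "team": "Home FC",
     "x": 0.9, "y": 0.5, "xg": 0.25, "result": "Goal"},
    {"minute": 20, "player": "Player B", "team": "Away FC",
     "x": 0.8, "y": 0.4, "xg": 0.5, "result": "SavedShot"},
]
df = shots_to_dataframe(sample_rows)
print(df.head())
```

Passing the column list explicitly keeps the column order stable, even when no shots are found and the table is empty.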

An Overview of the Final Dataset

We put the scattered shot data from the website into a table. The important variable here is the result variable, which takes several unique values. These values indicate whether the shot was scored (Goal), blocked by an opposing player (BlockedShot), saved by the goalkeeper (SavedShot), or went out of play (MissedShots).
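As an illustration of how the result variable can be used, the sketch below counts the shots per outcome and sums xG per team. The DataFrame here is filled with made-up rows, and the column names `team`, `xg` and `result` are assumptions that should match whatever names were chosen for the dataset.

```python
import pandas as pd


def summarize_shots(df):
    """Return the number of shots per result and the total xG per team."""
    result_counts = df["result"].value_counts()
    xg_by_team = df.groupby("team")["xg"].sum()
    return result_counts, xg_by_team


# Made-up rows, used only to demonstrate the two summaries.
demo = pd.DataFrame({
    "team": ["Home FC", "Home FC", "Away FC"],
    "xg": [0.5, 0.25, 0.125],
    "result": ["Goal", "MissedShots", "SavedShot"],
})
result_counts, xg_by_team = summarize_shots(demo)
print(result_counts)
print(xg_by_team)
```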

Web scraping is a very useful method when we can’t get a dataset from organizations such as StatsBomb that hold the data of sports competitions, or when we want to take a different analytical approach. In this study, we saw that we can obtain football data from a web page quite simply. In our next studies, we will focus on more comprehensive topics in football analytics. We have come to the end of our article; I hope it was a pleasant read for you.
