Photo from Unsplash by Michael Lee

2020-21 Transfer Window — Data scraping with Python

Gabriel Meireles
Data Science Soccer Club
3 min read · Sep 17, 2020


Hello there!
I’m really excited to share some of the things I’ve been learning over the past few days. I’m not an expert in Python, statistics, maths, or even in English, but my purpose is to share with you some information about soccer and data science with Python. So, without further ado, let’s start!

What we’ll do

At the beginning of every season the market heats up: many transfers are made, and some clubs invest heavily, aiming not only at local competitions but also at the coveted international titles. Our mission is to capture this information, process it as necessary, and present it through graphical representations.
If we put everything on a list, this is what we have to do:

  • Create a script to get the latest transfer data
  • Create a script to obtain information (goals, assists, cards, etc.) about the respective players
  • Create graphical representations from the scraped data

Scraping the data

The first step is to obtain information about the transfers, and for that we use Transfermarkt:

Latest transfers — Transfermarkt

All we need is to read the information from the table and save it to a csv file. For that we’ll use BeautifulSoup, a library responsible for pulling data out of HTML and XML files (in our case, HTML). With bs4 it’s possible to transform the rows and columns of a table into a Python list of dictionaries.

Let’s start by importing some libraries (the corresponding import lines are shown right after this list):

  • requests to make requests to the web address
  • BeautifulSoup to pull data out of the HTML
  • csv to write the data to a csv file
  • re to handle regular expressions
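
With those choices, the imports are simply:

import csv
import re

import requests
from bs4 import BeautifulSoup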

Now we’re going to create some functions that will help us throughout the application, starting with data_to_csv, which receives a list and saves it to a csv output file:
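
A minimal sketch of such a function, assuming the list holds dictionaries that all share the same keys; the file name transfers.csv is my own choice:

def data_to_csv(players_list, filename='transfers.csv'):
    # Use the keys of the first dictionary as the CSV header
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=players_list[0].keys())
        writer.writeheader()
        writer.writerows(players_list)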

We also have the format_text function, which takes a string and removes characters such as duplicated spaces and escape sequences:
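
A minimal sketch of what format_text could look like, using re to collapse escape sequences and runs of whitespace:

def format_text(text):
    # Replace escape sequences (\n, \t, ...) and repeated spaces
    # with a single space, then trim the ends
    return re.sub(r'\s+', ' ', text).strip()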

Now, a function to handle the currency values:
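
A sketch of a possible format_currency helper, assuming fees appear as strings like '€25.00m' or '€500Th.' (the suffix conventions are assumptions about the site’s formatting and may need adjusting):

def format_currency(value):
    # Fees look like '€25.00m' (millions) or '€500Th.' (thousands);
    # non-numeric values such as 'free transfer', 'Loan' or '-' become 0.0
    value = value.replace('€', '').strip()
    if value.endswith('m'):
        return float(value[:-1]) * 1_000_000
    if value.endswith('Th.'):
        return float(value[:-3]) * 1_000
    try:
        return float(value)
    except ValueError:
        return 0.0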

And finally, our function responsible for accessing the pages, transforming the HTML into a soup object, looking for the element with the responsive-table class, iterating over all the rows with the even and odd classes to get the ‘tds’ (the cells), creating a dictionary with the information we need, appending each player to players_list, and finally returning that list. Easy, right?
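
A minimal sketch of such a scraper; the paging URL pattern, the HEADERS constant, and the cell order are assumptions about the live site and should be checked before running:

BASE_URL = 'https://www.transfermarkt.com/transfers/neuestetransfers/statistik?page={}'
HEADERS = {'User-Agent': 'Mozilla/5.0'}  # the site rejects requests without a browser-like agent

def get_players(pages):
    players_list = []
    for page in range(1, pages + 1):
        response = requests.get(BASE_URL.format(page), headers=HEADERS)
        soup = BeautifulSoup(response.text, 'html.parser')
        table = soup.find('div', class_='responsive-table')
        # Each data row carries either the 'even' or the 'odd' class
        for row in table.find_all('tr', class_=['even', 'odd']):
            # recursive=False skips cells of tables nested inside the row
            tds = row.find_all('td', recursive=False)
            player = {
                'name': format_text(tds[0].get_text()),   # assumed cell order
                'position': format_text(tds[1].get_text()),
                'age': format_text(tds[2].get_text()),
                'fee': format_currency(format_text(tds[-1].get_text())),
            }
            players_list.append(player)
    return players_list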

Then we run the script, saying that we want to browse the first 10 pages. Remembering that each page displays 25 players, we’ll have 250 players in all. We then send that list to the data_to_csv function, responsible for saving the data to a csv file:
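
Putting it together, under the assumed names from the sketches above:

if __name__ == '__main__':
    # 10 pages x 25 players per page = 250 players in total
    players = get_players(10)
    data_to_csv(players)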

And the result is this:

Now that we have the necessary data, the time has come to manipulate and display it.
