Photo from Unsplash by Emerson Vieira

2020–21 Transfer Window — Plotting Maps

Gabriel Meireles
Data Science Soccer Club
5 min read · Sep 21, 2020


Hello everyone! This article is about plotting maps. After a few days of research, reading and many attempts, I arrived at this final version, which covers not only how to plot maps with the Plotly library but also some examples of data handling with Pandas.
In the previous post we covered how to get data from a website using the technique known as web scraping. Continuing that work, today we’ll build a way to visualize the data we obtained.

What will we do?

Using the Pandas library, we’ll explore and manipulate the data that the Plotly library will then plot. We’ll use the data sheet extracted in the last article, which is available at this link.

Data sample from the CSV file

It’s important to know the columns we’ll be working with:

  • name: The player’s name
  • position: The player’s position
  • age: The player’s age
  • market_value: The player’s market value
  • country_from: The country the player was playing in
  • league_from: The league the player was playing in
  • club_from: The club the player was playing for
  • country_to: The country the player will play in
  • league_to: The league the player will play in
  • club_to: The club the player will play for
  • fee: How much the club paid for the player

Let’s start by importing some libraries:

  • Pandas to handle the data (responsible for doing the magic with the data)
  • Nominatim to pull the geo data
  • Plotly to build the plots
  • Pycountry to handle the ISO database
  • os to handle some paths

Just as we created some functions to help us extract the data, we’ll create a few more to help us handle it. Let’s start with get_iso, which receives a country and returns its ISO 3166-1 alpha-3 code:

After that we have the function get_geo, which receives a country and a type and returns the latitude and longitude of the given country.
This function certainly needs to be refactored:
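One possible shape for such a function is sketched below, assuming geopy’s Nominatim geocoder. This is not the article’s original code: the type argument is left out because its exact meaning isn’t shown here, and the geocoder is passed in as a parameter so the function can be tried offline. `make_nominatim_geocode`, `fake_geocode` and the placeholder coordinates are hypothetical names and values introduced for illustration.

```python
from collections import namedtuple


def get_geo(country, geocode):
    """Return (latitude, longitude) for a country, or (None, None) if not found.

    `geocode` is any callable that takes a query string and returns an object
    with `latitude` and `longitude` attributes, or None.
    """
    location = geocode(country)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


def make_nominatim_geocode(user_agent="transfer-window-maps"):
    """Build a rate-limited Nominatim geocoder (needs geopy and network access)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's public service asks for at most one request per second
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


# Offline stand-in so the example runs without network access.
# The coordinates are rough placeholders, not Nominatim output.
Location = namedtuple("Location", ["latitude", "longitude"])
FAKE_LOCATIONS = {"Brazil": Location(-10.0, -55.0)}


def fake_geocode(query):
    return FAKE_LOCATIONS.get(query)


print(get_geo("Brazil", fake_geocode))    # (-10.0, -55.0)
print(get_geo("Atlantis", fake_geocode))  # (None, None)
```

With network access, `get_geo("Brazil", make_nominatim_geocode())` would query the real service instead of the stand-in.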

Now let’s start by reading our data:
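As a rough illustration, the loading step might look like the sketch below. The rows are made up and only mimic the column layout described above; with the real file, the `io.StringIO` object would be replaced by the path to the CSV.

```python
import io

import pandas as pd

# Made-up rows in the article's column layout; the real file comes from the
# scraper built in the previous article.
SAMPLE_CSV = """name,position,age,market_value,country_from,league_from,club_from,country_to,league_to,club_to,fee
Player A,Centre-Forward,24,20000000,Brazil,League X,Club X,Spain,League Y,Club Y,18000000
Player B,Goalkeeper,31,3000000,Spain,League Y,Club Y,Spain,League Y,Club Z,Free transfer
"""

# With the real data this would be pd.read_csv(<path to the csv file>)
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)   # (2, 11)
print(df.dtypes)  # fee is read as text because it mixes numbers and labels
```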

And this should be the result:

Result from loaded dataframe

Now let’s apply the get_geo function to get the latitude and longitude of each transfer’s departure and arrival countries:
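A sketch of how this step could be done, under a few assumptions: `lookup` stands in for a geocoding call that returns a (latitude, longitude) pair, and each distinct country is resolved only once so repeated names don’t trigger repeated requests. `add_geo_columns`, the sample rows and the placeholder coordinates are hypothetical.

```python
import pandas as pd


def add_geo_columns(df, lookup):
    """Add departure/arrival coordinates; `lookup` maps a country to (lat, lon)."""
    out = df.copy()
    # Resolve every distinct country once instead of once per row
    countries = set(out["country_from"]) | set(out["country_to"])
    coords = {country: lookup(country) for country in countries}
    out["lat_departure"] = out["country_from"].map(lambda c: coords[c][0])
    out["lon_departure"] = out["country_from"].map(lambda c: coords[c][1])
    out["lat_arrival"] = out["country_to"].map(lambda c: coords[c][0])
    out["lon_arrival"] = out["country_to"].map(lambda c: coords[c][1])
    return out


# Rough placeholder coordinates standing in for real geocoder results
PLACEHOLDER_COORDS = {"Brazil": (-10.0, -55.0), "Spain": (40.0, -4.0)}

df = pd.DataFrame(
    {
        "name": ["Player A", "Player B"],
        "country_from": ["Brazil", "Spain"],
        "country_to": ["Spain", "Spain"],
    }
)
df = add_geo_columns(df, lambda c: PLACEHOLDER_COORDS.get(c, (None, None)))
print(df)
```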

Since this process tends to be time-consuming, I chose to save the updated data to a new CSV file and load it again with pandas:
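The round trip can be sketched as follows. The file name is hypothetical and a temporary directory is used only to keep the example self-contained; in practice the file would live next to the original CSV so the geocoding doesn’t have to be repeated.

```python
import os
import tempfile

import pandas as pd

# Made-up row with placeholder coordinates
df = pd.DataFrame(
    {
        "country_from": ["Brazil"],
        "country_to": ["Spain"],
        "lat_departure": [-10.0],
        "lon_departure": [-55.0],
        "lat_arrival": [40.0],
        "lon_arrival": [-4.0],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "transfers_geo.csv")  # hypothetical file name
    # index=False avoids writing the row index as an extra unnamed column
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded)
```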

And this should be the new dataframe:

Result from loaded dataframe with geo data

After applying the get_geo function we’ll have 4 new columns:

  • lat_departure: Latitude of the country of departure
  • lon_departure: Longitude of the country of departure
  • lat_arrival: Latitude of the country of arrival
  • lon_arrival: Longitude of the country of arrival

And finally we apply the get_iso function to obtain the alpha-3 code of each transfer’s departure and arrival countries:

After that we’ll have two more new columns:

  • iso_from: The ISO 3166-1 alpha-3 code of the country of origin
  • iso_to: The ISO 3166-1 alpha-3 code of the country of arrival

Result from loaded dataframe with ISO data

In the fee column we have some values like “Free transfer”, “Loan” or “?” for when the transfer amount wasn’t disclosed. We need to remove the rows that contain these values, leaving only the rows that contain integers:
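One way to do this filtering, sketched with made-up values, is to let `pd.to_numeric` turn every non-numeric label into a missing value and then drop those rows. This is an assumption about the approach; other filters, such as a string test, would work as well.

```python
import pandas as pd

# Made-up fees mixing numbers and the labels mentioned above
df = pd.DataFrame(
    {
        "name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
        "fee": ["18000000", "Free transfer", "Loan", "?", "2500000"],
    }
)

rows_before = len(df)

# errors="coerce" replaces anything that is not a number with a missing value
df["fee"] = pd.to_numeric(df["fee"], errors="coerce")
df = df.dropna(subset=["fee"]).astype({"fee": "int64"}).reset_index(drop=True)

print(rows_before - len(df), "rows removed")  # 3 rows removed
print(df)
```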

The data structure is the same, but now we have fewer rows: before we had 250 rows, and now we have 186.
So 64 rows were removed because their fee value didn’t match our criteria.

Result from loaded dataframe without string data in fee column

Well, I think we have what we need, so let’s start plotting some charts! We’ll begin with the best sellers of this transfer window. For that, we group by the country_from and iso_from columns and then sum the values of the fee column:
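The grouping described above can be sketched like this, with made-up fees. The name `profits` follows the caption below, and sorting by fee is an optional extra added here.

```python
import pandas as pd

# Made-up transfers with numeric fees
df = pd.DataFrame(
    {
        "country_from": ["Brazil", "Brazil", "Spain"],
        "iso_from": ["BRA", "BRA", "ESP"],
        "fee": [18000000, 2000000, 5000000],
    }
)

profits = (
    df.groupby(["country_from", "iso_from"], as_index=False)["fee"]
    .sum()
    .sort_values("fee", ascending=False)
    .reset_index(drop=True)
)
print(profits)
```

Keeping iso_from in the group keys carries the alpha-3 code through to the result, which is what the map needs to locate each country.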

Result from loaded profits

Now that we know how much each country has sold in this transfer window, we can create our first chart:

2020–21 Best Sellers

Following the same logic, let’s find out which countries spent the most in this transfer window:
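Since the logic is the same in both directions, it can be wrapped in a small helper. `total_fee_by` is a hypothetical name introduced for this sketch, the data is made up, and `spendthrift` follows the caption below.

```python
import pandas as pd


def total_fee_by(df, country_col, iso_col):
    """Sum the fee column per country, highest total first."""
    return (
        df.groupby([country_col, iso_col], as_index=False)["fee"]
        .sum()
        .sort_values("fee", ascending=False)
        .reset_index(drop=True)
    )


# Made-up transfers
df = pd.DataFrame(
    {
        "country_to": ["Spain", "Italy", "Spain"],
        "iso_to": ["ESP", "ITA", "ESP"],
        "fee": [18000000, 7000000, 5000000],
    }
)

spendthrift = total_fee_by(df, "country_to", "iso_to")
print(spendthrift)
```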

Result from loaded spendthrift

2020–21 Top Buyers

The time has come to use the geo columns! To finish, we’ll render one last map showing the “flights” of departure and arrival of the transfers. Let’s start by grouping some data:
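The exact aggregation isn’t shown here, so the sketch below is only one plausible grouping: one row per route (origin country, destination country and their coordinates), with the number of transfers and the total fee. The column names `transfers` and `total_fee`, the sample rows and the coordinates are assumptions.

```python
import pandas as pd

# Made-up transfers with placeholder coordinates
df = pd.DataFrame(
    {
        "name": ["Player A", "Player B", "Player C"],
        "country_from": ["Brazil", "Brazil", "Spain"],
        "country_to": ["Spain", "Spain", "Spain"],
        "lat_departure": [-10.0, -10.0, 40.0],
        "lon_departure": [-55.0, -55.0, -4.0],
        "lat_arrival": [40.0, 40.0, 40.0],
        "lon_arrival": [-4.0, -4.0, -4.0],
        "fee": [18000000, 2000000, 5000000],
    }
)

route_keys = [
    "country_from", "country_to",
    "lat_departure", "lon_departure",
    "lat_arrival", "lon_arrival",
]

# One row per route, with how many players moved and for how much in total
routes = df.groupby(route_keys, as_index=False).agg(
    transfers=("name", "count"),
    total_fee=("fee", "sum"),
)
print(routes)
```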

This should be our result:

Now let’s build our map:

Note that the vast majority of transfers are internal, that is, the player only changes clubs or divisions within the same country. Another interesting point is the number of transfers between Western European countries, where most of the transactions are concentrated; that is also where the clubs of the Big Five leagues are located.

Connection map depicting the “flights” of all the transfers

To keep the reading from becoming tiresome, I decided to split the data visualization into several parts. In the next step we’ll continue working with data visualization and explore everything in detail, such as clubs, players and leagues, in addition to creating interactive plots. So stay tuned to the Data Science Soccer Club, and thank you very much for reading!
