2020–21 Transfer Window — Plotting Maps
Hello everyone! In this article we’ll talk about plotting maps. After a few days of research, reading, and many attempts, I arrived at this final version, which covers not only how to plot maps with the Plotly library but also some examples of data handling with Pandas.
In the previous post we covered how to get data from a website using the technique known as web scraping. Continuing that work, today we’ll build a way to visualize the data we collected.
What will we do?
Using the Pandas library, we’ll explore and manipulate the data that Plotly will then plot. We’ll use the data sheet extracted in the last article, which is available at this link.
It’s important to know the columns we’ll be working with:
- name: The player’s name
- position: The player’s position
- age: The player’s age
- market_value: The player’s market value
- country_from: The country the player was playing in
- league_from: The league the player was playing in
- club_from: The club the player was playing for
- country_to: The country the player will play in
- league_to: The league the player will play in
- club_to: The club the player will play for
- fee: How much the club paid for the player
Let’s start by importing some libraries:
- Pandas to handle the data (it does the magic with the data)
- Nominatim to pull the geo data
- Plotly to build the plots
- Pycountry to handle the ISO database
- os to handle some paths
Just as we created some functions to help us extract the data, we’ll create a few more to help us handle it. Let’s start with get_iso, which receives a country and returns its ISO 3166-1 alpha-3 code:
Next comes the get_geo function, which receives a country and a type and returns the latitude and longitude of that country.
This function certainly needs some refactoring:
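Since the original code isn't shown here, this is just one possible shape for such a helper, not the author's version. I'm assuming the `type` argument selects either "latitude" or "longitude", and that the lookup goes through geopy's Nominatim geocoder. The names `make_get_geo` and `nominatim_geocode` are hypothetical. Caching each country keeps the number of requests low, which matters because the public Nominatim service is rate limited.

```python
from functools import lru_cache


def make_get_geo(geocode):
    """Build a get_geo(country, kind) helper around any geocoding callable."""

    @lru_cache(maxsize=None)
    def locate(country):
        # One lookup per distinct country, however many rows mention it.
        return geocode(country)

    def get_geo(country, kind):
        # Assumption: `kind` is either "latitude" or "longitude".
        if kind not in ("latitude", "longitude"):
            raise ValueError(f"unknown kind: {kind!r}")
        location = locate(country)
        # The geocoder returns None when it finds nothing.
        return None if location is None else getattr(location, kind)

    return get_geo


def nominatim_geocode(user_agent="transfer-window-maps"):
    """Return geopy's Nominatim geocode method (needs geopy and network access)."""
    # Imported lazily so the rest of the sketch loads without geopy installed.
    from geopy.geocoders import Nominatim

    return Nominatim(user_agent=user_agent).geocode
```

With geopy installed, `get_geo = make_get_geo(nominatim_geocode())` would give a function that can be called as `get_geo("Brazil", "latitude")`.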
Now let’s start by reading our data:
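As a rough sketch of this step, the loader below reads the CSV with `pd.read_csv` and checks that the columns listed earlier are all present. The `load_transfers` name is mine, and the two rows are made-up placeholders standing in for the real file, which should be passed as a path instead.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = [
    "name", "position", "age", "market_value",
    "country_from", "league_from", "club_from",
    "country_to", "league_to", "club_to", "fee",
]


def load_transfers(source):
    """Read the transfers CSV and check that every expected column is present."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Two made-up rows standing in for the real file, just to exercise the loader.
sample_csv = io.StringIO(
    "name,position,age,market_value,country_from,league_from,club_from,"
    "country_to,league_to,club_to,fee\n"
    "Player A,Centre-Forward,24,1000000,Brazil,League X,Club X,"
    "Spain,League Y,Club Y,500000\n"
    "Player B,Goalkeeper,30,200000,Spain,League Y,Club Y,"
    "Spain,League Y,Club Z,Free transfer\n"
)
transfers = load_transfers(sample_csv)
print(transfers.shape)  # (2, 11)
```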
And this should be the result:
Now let’s apply the get_geo function to get the latitude and longitude of each transfer’s departure and arrival countries:
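One way this step might look, assuming get_geo takes a country and either "latitude" or "longitude". The `add_geo_columns` helper is a hypothetical name, and the example uses a stand-in lookup with made-up coordinates so that it runs without any network access.

```python
import pandas as pd


def add_geo_columns(df, get_geo):
    """Return a copy of df with lat/lon columns for departure and arrival."""
    out = df.copy()
    for source_col, suffix in (("country_from", "departure"), ("country_to", "arrival")):
        # Geocode each distinct country once instead of once per row.
        countries = out[source_col].dropna().unique()
        for kind, prefix in (("latitude", "lat"), ("longitude", "lon")):
            lookup = {country: get_geo(country, kind) for country in countries}
            out[f"{prefix}_{suffix}"] = out[source_col].map(lookup)
    return out


# Stand-in for the real get_geo, with made-up coordinates.
fake_coordinates = {"A-land": (10.0, 20.0), "B-land": (-30.0, 40.0)}


def fake_get_geo(country, kind):
    lat, lon = fake_coordinates[country]
    return lat if kind == "latitude" else lon


transfers = pd.DataFrame(
    {"country_from": ["A-land", "B-land"], "country_to": ["B-land", "B-land"]}
)
transfers = add_geo_columns(transfers, fake_get_geo)
print(transfers)
```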
Because this process tends to be time-consuming, I chose to save the updated data to a new CSV and load it again with pandas:
And this should be the new dataframe:
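A small sketch of that round trip is shown below. The file name `transfers_geo.csv` and the one-row frame are placeholders, and a temporary directory is used only so the example cleans up after itself. In practice the path would point at a folder you keep.

```python
import os
import tempfile

import pandas as pd

# Placeholder frame standing in for the geocoded transfers.
transfers = pd.DataFrame(
    {"name": ["Player A"], "lat_departure": [10.0], "lon_departure": [20.0]}
)

with tempfile.TemporaryDirectory() as data_dir:
    path = os.path.join(data_dir, "transfers_geo.csv")
    # index=False keeps the row index out of the file, so reloading
    # does not create an extra unnamed column.
    transfers.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded)
```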
After applying the get_geo function we’ll have 4 new columns:
- lat_departure: Latitude of the country of departure
- lon_departure: Longitude of the country of departure
- lat_arrival: Latitude of the country of arrival
- lon_arrival: Longitude of the country of arrival
Finally, we apply the get_iso function to obtain the alpha-3 code of the departure and arrival countries:
That gives us 2 more new columns:
- iso_from: The ISO 3166-1 alpha-3 code of the country of origin
- iso_to: The ISO 3166-1 alpha-3 code of the country of arrival
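In pandas terms this step could be as small as the sketch below, which maps whatever get_iso function is in use over the two country columns. The `add_iso_columns` name is hypothetical, and a plain dictionary lookup stands in for get_iso so the example runs on its own.

```python
import pandas as pd


def add_iso_columns(df, get_iso):
    """Return a copy of df with iso_from and iso_to columns."""
    out = df.copy()
    out["iso_from"] = out["country_from"].map(get_iso)
    out["iso_to"] = out["country_to"].map(get_iso)
    return out


# A dictionary lookup standing in for the real get_iso helper.
stand_in_get_iso = {"Brazil": "BRA", "Spain": "ESP"}.get

transfers = pd.DataFrame({"country_from": ["Brazil"], "country_to": ["Spain"]})
transfers = add_iso_columns(transfers, stand_in_get_iso)
print(transfers)
```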
In the fee column we have values such as “Free transfer”, “Loan”, or “?” (used when the transfer amount wasn’t disclosed). We need to remove the rows that contain these values, keeping only the rows whose fee is an integer:
The data structure is the same, but now we have less data: before we had 250 rows, and now we have 186.
So 64 rows were removed because their fee column didn’t match our criteria.
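A minimal sketch of that filter, assuming the fee column arrives as text after being read from the CSV: `pd.to_numeric` with `errors="coerce"` turns anything that isn't a number into NaN, and those rows are then dropped. The `keep_numeric_fees` name and the sample rows are illustrative only.

```python
import pandas as pd


def keep_numeric_fees(df):
    """Keep only the rows whose fee is a number, and store it as an integer."""
    fee = pd.to_numeric(df["fee"], errors="coerce")  # non-numeric text becomes NaN
    is_number = fee.notna()
    out = df.loc[is_number].copy()
    out["fee"] = fee[is_number].astype("int64")
    return out.reset_index(drop=True)


# Made-up rows covering the three kinds of non-numeric fee mentioned above.
transfers = pd.DataFrame(
    {
        "name": ["Player A", "Player B", "Player C", "Player D"],
        "fee": ["500000", "Free transfer", "Loan", "?"],
    }
)
paid = keep_numeric_fees(transfers)
print(len(transfers), len(paid))  # 4 1
```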
Well, I think we have what we need, so let’s start plotting! Let’s begin with the top sellers of this transfer window. For that we group by the country_from and iso_from columns and then sum the values of the fee column:
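The grouping described here can be sketched with `groupby` and `sum`. I've wrapped it in a hypothetical `total_fee_by` helper so the same code also works for the buying side (country_to and iso_to). The countries, codes, and fees below are made up.

```python
import pandas as pd


def total_fee_by(df, country_col, iso_col):
    """Sum the fee column per country, largest total first."""
    return (
        df.groupby([country_col, iso_col], as_index=False)["fee"]
        .sum()
        .sort_values("fee", ascending=False)
        .reset_index(drop=True)
    )


# Made-up transfers: two sales from A-land, one from B-land.
transfers = pd.DataFrame(
    {
        "country_from": ["A-land", "A-land", "B-land"],
        "iso_from": ["AAA", "AAA", "BBB"],
        "fee": [100, 50, 400],
    }
)
sellers = total_fee_by(transfers, "country_from", "iso_from")
print(sellers)
```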
Now that we know how much each country sold in this transfer window, we can create our first chart:
Following the same logic, let’s find out which countries spent the most in this transfer window:
The time has finally come to use the geo columns! To finish, we’ll render one last map showing the “flights” of departure and arrival for each transfer. Let’s start by grouping some data:
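The exact grouping used isn't shown here, so this is only a sketch under my own assumption: one row per route (departure country to arrival country), carrying the coordinates plus the number of transfers and the total fee. The `group_routes` name and the sample rows are hypothetical.

```python
import pandas as pd

ROUTE_COLUMNS = [
    "country_from", "country_to",
    "lat_departure", "lon_departure", "lat_arrival", "lon_arrival",
]


def group_routes(df):
    """Collapse individual transfers into one row per route."""
    # Rows with a missing coordinate are dropped by groupby's default dropna=True.
    return df.groupby(ROUTE_COLUMNS, as_index=False).agg(
        transfers=("name", "count"),
        total_fee=("fee", "sum"),
    )


# Made-up transfers: two on the same route and one domestic move.
transfers = pd.DataFrame(
    {
        "name": ["Player A", "Player B", "Player C"],
        "country_from": ["A-land", "A-land", "B-land"],
        "country_to": ["B-land", "B-land", "B-land"],
        "lat_departure": [10.0, 10.0, -30.0],
        "lon_departure": [20.0, 20.0, 40.0],
        "lat_arrival": [-30.0, -30.0, -30.0],
        "lon_arrival": [40.0, 40.0, 40.0],
        "fee": [100, 50, 400],
    }
)
routes = group_routes(transfers)
print(routes)
```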
This should be our result:
Now let’s build our map:
Note that the vast majority of transfers are domestic; that is, the player only changes clubs or divisions within the same country. Another interesting point is the volume of transfers between the countries of Western Europe, which account for the vast majority of transactions. That is also where the Big Five clubs are located.
So that the reading doesn’t become tiresome, I decided to split the data visualization into parts. In the next step we’ll keep working on data visualization, exploring everything in detail (clubs, players, and leagues) and creating interactive plots. So stay tuned to the Data Science Soccer Club, and thank you very much for reading!