Statistical Analysis with Python: Pierre-Emerick Aubameyang

Editor — Ishmael Njie

DataRegressed Team
DataRegressed
8 min read · Feb 3, 2018

Yo Pierre, you wanna come out here?

Following Henrikh Mkhitaryan’s arrival at Arsenal, we (Arsenal fans) were convinced that he had let slip the imminent arrival of his former team-mate, Pierre-Emerick Aubameyang. The pair were a lethal combination during their time together at Borussia Dortmund, and Arsenal fans were excited at the prospect of seeing them play together once again.

Pierre-Emerick Aubameyang signed for Arsenal for a club-record fee on Transfer Deadline Day (31st January). Having won last season’s Bundesliga top goalscorer award with 31 goals, he is touted as one of the best strikers in the world.

So as an excited Arsenal fan, I decided to do some analysis to find out how good Aubameyang really is (and to practice more Python).

ESPN provides great statistics of each player divided by year. I selected data dating back to 2012, during his time at French side Saint-Etienne. Bringing these statistics into Excel made it easier to read into Python.

To start off, we need to import the relevant libraries that will help us to clean and analyse the data.

Next, we will read the CSV file for each year into its own DataFrame (DF).
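As a rough sketch of these first two steps, the snippet below imports pandas and reads one CSV per season. The file contents, column names and rows are invented stand-ins (held in memory so the example runs on its own); with real files you would pass the file path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the per-season CSV files exported from Excel.
# Column names and rows are invented purely for illustration.
season_files = {
    "2012": io.StringIO("Date,Opponent,Goals\n12 Aug,Team A,1\n18 Aug,Team B,0\n"),
    "2013": io.StringIO("Date,Opponent,Goals\n10 Aug,Team C,2\n"),
}

# One DataFrame per season, keyed by year.
frames = {year: pd.read_csv(handle) for year, handle in season_files.items()}

print(frames["2012"].shape)  # (2, 3)
```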

Having the data spread across separate DataFrames is no good to us, so next we will bring them all together using pandas’ ‘concat’ function. Following this, I noticed that the date format had to be changed so that the rows could be sorted correctly. We will then create a new DF with the new columns and column names.
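A minimal sketch of this step, assuming day/month/year date strings and made-up raw column names (`DATE`, `G`); the real ESPN export may well look different.

```python
import pandas as pd

# Toy per-season frames; column names and values are invented for illustration.
s2012 = pd.DataFrame({"DATE": ["12/08/2012", "03/02/2013"], "G": [1, 0]})
s2013 = pd.DataFrame({"DATE": ["10/08/2013"], "G": [2]})

# Stack the seasons on top of each other; the original row labels are kept.
combined = pd.concat([s2012, s2013])

# Parse the strings into real datetimes so they sort chronologically.
combined["DATE"] = pd.to_datetime(combined["DATE"], format="%d/%m/%Y")

# Keep the columns of interest under friendlier names.
matches = combined.rename(columns={"DATE": "Date", "G": "Goals"})[["Date", "Goals"]]

print(matches.index.tolist())  # [0, 1, 0]
```

Note that the index restarts for each season, because `pd.concat` keeps the labels each frame came in with.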

Here, we will sort the DF by date.
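Sorting by date can be sketched with `sort_values`; the `Date` column name and the rows below are assumptions for illustration.

```python
import pandas as pd

# Toy data, deliberately out of chronological order.
matches = pd.DataFrame({
    "Date": pd.to_datetime(["2013-08-10", "2012-08-12", "2013-02-03"]),
    "Goals": [2, 1, 0],
})

# Oldest match first; each row keeps its original index label.
by_date = matches.sort_values("Date")

print(by_date.index.tolist())  # [1, 2, 0]
```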

What we will do next is save the DF as a CSV file; reading it back in will reassign the index of each observation. As you can see in the previous image, the first instance has an index of 38, because the indices are not renumbered by ‘concat’.
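For reference, the same renumbering can be obtained without going through a file, using `reset_index`. This is an alternative to the save-and-reload approach described here, shown on invented data.

```python
import pandas as pd

# Toy frame whose index labels are left over from earlier steps.
matches = pd.DataFrame({"Goals": [1, 0, 2]}, index=[38, 39, 0])

# drop=True discards the old labels instead of keeping them as a column.
renumbered = matches.reset_index(drop=True)

print(renumbered.index.tolist())  # [0, 1, 2]
```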

After reading the CSV back into Python, we can see that the first column holds the old index values. We can create a DF that drops this column, as it is of no use to us.
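The round trip can be sketched as follows. An in-memory buffer stands in for the CSV file so the example runs on its own, and the data is invented. When the index has no name, pandas labels the reloaded column `Unnamed: 0`.

```python
import io
import pandas as pd

# Toy frame with leftover index labels.
matches = pd.DataFrame({"Goals": [1, 0, 2]}, index=[38, 39, 0])

# Write to a buffer instead of a real file; the index is saved as the first column.
buffer = io.StringIO()
matches.to_csv(buffer)
buffer.seek(0)

reloaded = pd.read_csv(buffer)
print(reloaded.columns.tolist())  # ['Unnamed: 0', 'Goals']

# The old index values are of no use, so drop that column.
cleaned = reloaded.drop(columns=["Unnamed: 0"])
print(cleaned.index.tolist())  # [0, 1, 2]
```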

During data collection, one has to be mindful of null/NaN values in the dataset. Let us check for any missing values.
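One way to run this check, sketched on invented data: `isnull().values.any()` answers "is anything missing at all?", while `isnull().any().sum()` counts the columns that contain at least one missing value.

```python
import numpy as np
import pandas as pd

# Toy data with one incomplete row; column names are assumptions.
matches = pd.DataFrame({
    "Result": ["W", np.nan, "L"],
    "Goals": [1, np.nan, 0],
})

has_missing = bool(matches.isnull().values.any())
columns_with_missing = int(matches.isnull().any().sum())

print(has_missing)           # True
print(columns_with_missing)  # 2
```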

As indicated by the ‘True’ value, there are columns with null values, so we have to investigate. We will create a DF with all of the rows that contain NaN values.
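Pulling out the affected rows can be sketched with a row-wise mask. The column names and values here are made up for illustration.

```python
import numpy as np
import pandas as pd

matches = pd.DataFrame({
    "Appearance": ["Started", "Unused Substitute", "Started"],
    "Goals": [1, np.nan, 0],
})

# axis=1 checks across the columns, giving one True/False per row.
rows_with_nan = matches[matches.isnull().any(axis=1)]

print(rows_with_nan.index.tolist())  # [1]
```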

Through investigation, I found that the first instance needs to be amended. In actual fact, Saint-Etienne won that game, the Coupe de la Ligue Final against Stade Rennais, during Aubameyang’s time at the club.

Moving on to the next instances, you can see that under Appearance they are all noted as ‘Unused Substitute’. Being an unused sub implies that Pierre did not play, so these rows are irrelevant to our analysis.
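Both fixes can be sketched together: amending a single value by hand with `.loc`, then filtering out the unused-substitute rows. The `Result` column, the row label and the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

matches = pd.DataFrame({
    "Appearance": ["Started", "Unused Substitute", "Started"],
    "Result": [np.nan, np.nan, "W"],
    "Goals": [0, np.nan, 1],
})

# Fill in the one missing result by hand (row label 0 in this toy frame).
matches.loc[0, "Result"] = "W"

# Keep only the games he actually played in.
played = matches[matches["Appearance"] != "Unused Substitute"]

print(int(played.isnull().any().sum()))  # 0
```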

After the removal of the null values you can see that 0 columns have null values.

One thing I had noticed is that there was an instance at the last row of the DF that was not within our time frame.

The last instance, his match against Morocco, is outside our time frame: it is in the 2017/2018 season and we are only looking at 2011/12 to 2016/17. So we will proceed to drop that row.
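Dropping the final row can be sketched like this, again on invented rows. `matches.index[-1]` is the label of the last row, whatever it happens to be.

```python
import pandas as pd

matches = pd.DataFrame({
    "Date": pd.to_datetime(["2017-04-29", "2017-05-20", "2017-09-02"]),
    "Opponent": ["Team A", "Team B", "Team C"],
})

# Remove the last row, which falls outside the chosen time frame.
trimmed = matches.drop(matches.index[-1])

print(len(trimmed))  # 2
```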

Again, we will save the new clean data as a CSV file and read it back in (this is not a key part of the analysis; it is just for me).

All of the above was just cleaning the data; in my opinion, the preparation and cleaning of data is the most important aspect of data analysis. It ensures accurate results and easier data handling.

Next, we will look into the summary statistics of the data.

Summary Statistics

Count: The count illustrates the number of instances in the dataset. So from this, we can say that during this time frame, he made 284 appearances.

Mean: a measure of central tendency. Looking at the mean of the ‘Goals’ column, we can see that Pierre scores just over 0.5 goals a game; roughly speaking, a goal every 2 games. We can also see that he is not really known for his assists, averaging just over 0.15 assists a game.

Std: Standard deviation, a measure of how spread out the values are around the mean.

Percentiles: 25%, 50%, 75% being the first, second (median) and third quartile respectively.

Min, Max: Smallest and largest value respectively.
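All of the figures listed above come from a single call to `describe()`. A small sketch on made-up numbers (not the real data):

```python
import pandas as pd

# Five invented matches.
matches = pd.DataFrame({
    "Goals": [0, 1, 2, 0, 1],
    "Assists": [0, 0, 1, 0, 0],
})

summary = matches.describe()

print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc["count", "Goals"])  # 5.0
```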

During his Dortmund days

We can look at more statistics to have a greater understanding of his performance.

Instead of performing this summary with many rows and columns, we can narrow down our stat search. We can compute the ratio of goals to appearances. Since each row is one appearance, this should result in the same value as the mean of goals.
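A quick sketch of why the two numbers agree, on invented data: with one row per appearance, total goals divided by the number of rows is exactly the definition of the mean.

```python
import pandas as pd

# Five invented appearances.
matches = pd.DataFrame({"Goals": [0, 1, 2, 0, 1]})

goals = matches["Goals"].sum()   # total goals
appearances = len(matches)       # one row per appearance
ratio = goals / appearances

print(ratio)  # 0.8
```

With the figures quoted in this article (84 + 80 = 164 goals in 284 appearances), the same calculation gives roughly 0.58 goals per game.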

Ratio is the same as the mean shown above

We can also look at the efficiency of Aubameyang based on his Shots compared to his goals.

This tells us that when Pierre takes a shot, he scores about 19% of the time. However, if we look at goals relative to shots on target, this increases to around 40%.
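The two conversion rates can be sketched as simple ratios of column totals. The column names (`Shots`, `ShotsOnTarget`) and the numbers are assumptions for illustration, so the outputs will not match the 19% and 40% above.

```python
import pandas as pd

# Three invented matches.
matches = pd.DataFrame({
    "Goals": [1, 0, 2],
    "Shots": [4, 3, 5],
    "ShotsOnTarget": [2, 1, 3],
})

totals = matches[["Goals", "Shots", "ShotsOnTarget"]].sum()

# Share of all shots that end up as goals.
conversion = totals["Goals"] / totals["Shots"]
# Share of shots on target that end up as goals.
on_target_conversion = totals["Goals"] / totals["ShotsOnTarget"]

print(conversion)            # 0.25
print(on_target_conversion)  # 0.5
```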

Let us look at graphs to visualise the data.

First, let’s look at the most important thing for a striker to deliver: Goals. I created new columns to separate out both the month of each match and the ground on which it was played, whether home or away.
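A sketch of deriving the two columns. It assumes that away games are marked with a leading "@" in a hypothetical `Opponent` column; the real data may encode home and away differently.

```python
import pandas as pd

# Two invented matches.
matches = pd.DataFrame({
    "Date": pd.to_datetime(["2016-02-06", "2016-09-17"]),
    "Opponent": ["vs Team A", "@ Team B"],
})

# Three-letter month label taken from the parsed date.
matches["Month"] = matches["Date"].dt.month_name().str[:3]

# Assumed convention: a leading "@" means the game was played away.
is_away = matches["Opponent"].str.startswith("@")
matches["Ground"] = is_away.map({True: "Away", False: "Home"})

print(matches[["Month", "Ground"]].values.tolist())
# [['Feb', 'Home'], ['Sep', 'Away']]
```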

Now, let us look at how well Aubameyang performs each month based on his goal tally.
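The numbers behind such a chart can be sketched with a group-by, reordered to follow the season from August to July. The data is invented; calling `goals_by_month.plot(kind="bar")` would then draw the bar chart if matplotlib is installed.

```python
import pandas as pd

# Season runs August to July rather than January to December.
season_order = ["Aug", "Sep", "Oct", "Nov", "Dec", "Jan",
                "Feb", "Mar", "Apr", "May", "Jun", "Jul"]

# Four invented matches.
matches = pd.DataFrame({
    "Month": ["Feb", "Sep", "Feb", "Aug"],
    "Goals": [2, 1, 1, 0],
})

# Total goals per month; months with no matches are filled with 0.
goals_by_month = (
    matches.groupby("Month")["Goals"].sum().reindex(season_order, fill_value=0)
)

print(goals_by_month.idxmax())  # Feb
```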

Aug-Jul to represent season timeline

Looking at the graph, we can see that he is most prolific in the month of February, which is fitting as he scored on his debut for Arsenal on the 3rd February 2018. Following that, September seems to be a prolific month for him, implying that he starts the season fairly well. It is to be expected that June and July are not prolific months for Pierre, as they are considered the off-season: most European club competitions end in May.

In addition, we can look at his performance at Home and Away.

From this graph, we can conclude that Pierre scores most of his goals playing at his Home ground. However, the measures do not seem to be far from one another. If we take the sum of the goals at both home and away, we can have a clearer understanding.
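Summing goals by ground can be sketched with another group-by. The `Ground` column name and the rows are invented, so the totals here are not the real ones.

```python
import pandas as pd

# Four invented matches.
matches = pd.DataFrame({
    "Ground": ["Home", "Away", "Home", "Away"],
    "Goals": [2, 1, 1, 1],
})

# Total goals scored at each type of ground.
goals_by_ground = matches.groupby("Ground")["Goals"].sum()
difference = goals_by_ground["Home"] - goals_by_ground["Away"]

print(goals_by_ground.to_dict())  # {'Away': 2, 'Home': 3}
print(difference)                 # 1
```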

The sums for home and away were 84 and 80 goals respectively. So he has only scored 4 more goals at his club’s home ground than away at an opponent’s, which implies that while he performs slightly better at home, he does not shy away from goals at the opposition’s ground either.

To conclude, Arsenal fans are excited by the arrival of the prolific striker following the exit of ‘mercenary’ Alexis Sanchez. His goals-to-game ratio of 0.57 is very impressive, and Arsenal fans have been screaming for a lethal centre forward to recall the days of Thierry Henry. His debut goal and immediate involvement are likely to foreshadow a colourful future for Aubameyang and the Gunners. If I had one message for Arsene Wenger, it would be to play him in February!!

Thank you for reading!
