Exploratory Data Analysis - FIFA20

Bikash Acharaya
Oct 2, 2020


In this article, we will learn how to explore data using Python. This helps us get a better understanding of the data and identify which features are most useful for further analysis.

“Exploratory Data Analysis is the process of exploring data, generating insights, testing hypotheses, checking assumptions and revealing underlying hidden patterns in the data”

First, we import the necessary Python libraries into a Jupyter Notebook.

Second, we read the CSV dataset with the pandas library and run some basic inspections on it.
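Here is a minimal sketch of these first two steps, assuming the usual pandas stack. The article reads the full FIFA20 CSV file, but the file name isn't shown, so `players_20.csv` below is an assumption. To keep the example self-contained, it reads a tiny CSV with made-up values from a string instead of the real file.

```python
import io

import pandas as pd

# With the real file this would be something like:
#   df = pd.read_csv("players_20.csv")   # file name is an assumption
# Here a tiny CSV with invented values stands in for it.
csv_text = """short_name,age,nationality,club,overall,potential
Player A,32,Country X,Club X,90,90
Player B,34,Country Y,Club Y,88,88
Player C,20,Country Z,Club Z,84,92
"""

df = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts any file-like object
print(df)
```

Other libraries used later in the article (for example seaborn for the pairplot) would normally be imported in the same first cell.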

df.head(5): Returns the first 5 rows of the DataFrame, by position.
df.tail(5): Returns the last 5 rows of the DataFrame, by position.
df.shape: Returns a tuple with the dimensions of the DataFrame. Our dataset contains 18278 rows and 104 columns.
df.columns: The column labels of the DataFrame. Our dataset has 104 columns in total; to view all of them as a list, use df.columns.tolist().
df.info(): Prints a concise summary of the DataFrame. As we can see in the output, the summary is crisp and short, which is helpful when a DataFrame has many (even thousands of) columns.
df.describe(): Shows basic summary statistics such as count, mean, standard deviation and percentiles for the numeric columns of a DataFrame or Series. As shown in the output image, the statistical description is returned for the requested percentiles.
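The following sketch runs each of these calls on a toy DataFrame to show what they return. The values are made up and the column names are borrowed from the FIFA20 data. On the real dataset, `df.shape` would return `(18278, 104)`, as stated above.

```python
import pandas as pd

# Toy rows standing in for the FIFA20 data (values are invented).
df = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C", "Player D", "Player E", "Player F"],
    "age": [32, 34, 20, 27, 28, 22],
    "overall": [90, 88, 84, 86, 85, 80],
    "potential": [90, 88, 92, 86, 85, 83],
})

print(df.head(5))           # first 5 rows
print(df.tail(5))           # last 5 rows
print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # every column label as a plain Python list
df.info()                   # dtypes and non-null counts (prints directly)
print(df.describe())        # count, mean, std, min, quartiles, max of numeric columns
```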

Third, we check the dataset for missing (NaN) values.

df.isnull(): Detects missing values. As shown in the output image, it returns a grid of True and False values: False marks a value that is present and True marks a missing one. Several columns contain True entries, so they have missing data.
df.isnull().sum(): Counts the missing values in each column, which makes it easy to see where data is missing.
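This sketch shows the two calls on a toy frame with some deliberately missing entries (invented values). Summing the Boolean mask works because True counts as 1. The percentage line is an extra not shown in the article.

```python
import numpy as np
import pandas as pd

# Toy frame with a few missing (NaN/None) entries; values are invented.
df = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C", "Player D"],
    "club": ["Club X", None, "Club Z", "Club X"],
    "pace": [85.0, np.nan, 78.0, np.nan],
    "overall": [90, 88, 84, 86],
})

mask = df.isnull()                 # same shape as df: True where a value is missing
missing_per_column = mask.sum()    # True counts as 1, so this is NaNs per column
print(missing_per_column)

# Share of missing values per column (in %), handy on a 104-column dataset
print((df.isnull().mean() * 100).round(1))
```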

Filling NaN values with different techniques.
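The article doesn't say which filling techniques it uses. The sketch below therefore shows a few common choices on a toy frame: the mean, the median, the mode, and a constant placeholder. Which strategy goes with which column is an assumption made for illustration, and the values are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "club": ["Club X", None, "Club Z", "Club X"],
    "pace": [80.0, np.nan, 70.0, np.nan],
    "shooting": [90.0, 60.0, np.nan, 75.0],
    "loaned_from": [None, None, "Club Q", None],
})

filled = df.copy()
# Numeric column: replace NaN with the column mean
filled["pace"] = filled["pace"].fillna(filled["pace"].mean())
# Numeric column: replace NaN with the median (less sensitive to outliers)
filled["shooting"] = filled["shooting"].fillna(filled["shooting"].median())
# Categorical column: replace NaN with the most frequent value (the mode)
filled["club"] = filled["club"].fillna(filled["club"].mode()[0])
# Column where "missing" means "not applicable": use an explicit placeholder
filled["loaned_from"] = filled["loaned_from"].fillna("Not loaned")

print(filled)
```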

Dropping columns from the dataset.
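The article doesn't list which columns it drops. Here is a sketch of two common approaches on toy data: dropping identifier-style columns by name, and dropping columns that are mostly empty. The column names follow the FIFA20 CSV, but choosing them and the 50% threshold are assumptions.

```python
import pandas as pd

# Toy frame; column names follow the FIFA20 CSV, values are invented.
df = pd.DataFrame({
    "sofifa_id": [1, 2, 3],
    "player_url": ["https://example.com/1", "https://example.com/2", "https://example.com/3"],
    "short_name": ["Player A", "Player B", "Player C"],
    "loaned_from": [None, None, "Club Q"],
    "overall": [90, 88, 84],
})

# 1) Drop columns that are identifiers rather than features
df = df.drop(columns=["sofifa_id", "player_url"])

# 2) Drop any column where more than half of the values are missing
df = df.loc[:, df.isnull().mean() <= 0.5]

print(df.columns.tolist())
```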

Top Players with the Highest Overall Rating in FIFA20

We can observe that Messi is at the top of the overall rating, followed by Cristiano Ronaldo.
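The article's code for these rankings isn't shown. This minimal sketch uses pandas `nlargest` and `idxmax` on invented data to cover the same kinds of questions as this section and the potential and value rankings that follow. The `overall`, `potential` and `value_eur` column names follow the FIFA20 CSV and are an assumption here.

```python
import pandas as pd

df = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C", "Player D"],
    "overall": [90, 88, 84, 86],
    "potential": [90, 88, 92, 86],
    "value_eur": [90_000_000, 60_000_000, 80_000_000, 100_000_000],
})

# Top 3 by overall rating (nlargest returns rows in descending order)
top_overall = df.nlargest(3, "overall")[["short_name", "overall"]]
# Top 3 by potential rating
top_potential = df.nlargest(3, "potential")[["short_name", "potential"]]
# Single highest-valued player: idxmax gives the row label of the maximum
most_valuable = df.loc[df["value_eur"].idxmax(), "short_name"]

print(top_overall, top_potential, most_valuable, sep="\n\n")
```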

Top Players with the Highest Potential Rating in FIFA20

Here, K.Mbappe tops the list with a potential rating of 95.

The most expensive player in FIFA20

Messi is the most expensive player with 56500 euros.

The highest-valued player in FIFA20

Neymar is the highest-valued player, at 105,500,000 euros.

Top International Reputation in FIFA20

Messi has the highest international reputation in FIFA20.

The oldest player in FIFA20

H.Sulaimani from Saudi Arabia is 42 years old, and C.Munoz from Argentina is also 42.

The youngest player in FIFA20

Here, we can observe that the youngest players in the dataset are 16 years old.
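Because two players tie at the maximum age, a Boolean filter on the maximum (or minimum) is a safer way to answer "oldest" and "youngest" than `idxmax()`, which returns only the first match. The sketch below shows this on invented ages. How the article itself queried these players isn't shown.

```python
import pandas as pd

df = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "age": [42, 42, 16, 16, 25],
})

# Boolean masks keep every player tied at the extreme age;
# idxmax()/idxmin() would return only the first matching row.
oldest = df[df["age"] == df["age"].max()]
youngest = df[df["age"] == df["age"].min()]

print(oldest)
print(youngest)
```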

Data Visualization of the FIFA20 Dataset

This bar graph shows that most of the players come from England, Germany, Spain, France, Argentina and Brazil. Among Asian countries, Japan has the most players.
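A bar graph like this one is usually built from `value_counts()`. This sketch assumes matplotlib is installed and uses invented nationalities. The `Agg` backend line only matters when running outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "nationality": ["England", "England", "Germany", "Spain", "England", "Germany", "Japan"],
})

# Number of players per nationality, most frequent first
counts = df["nationality"].value_counts()
top = counts.head(2)

ax = top.plot(kind="bar", title="Players per nationality (toy data)")
ax.set_xlabel("Nationality")
ax.set_ylabel("Number of players")
plt.tight_layout()
```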

Top players by skill.

Extract the top 5 players along with their skill attributes.
Next, store the extracted data in a new variable, which makes the visualization easier.
In this bar graph, we can observe that Messi, Neymar and Hazard have good dribbling skills, while Ronaldo has a higher shooting rating than the other players. De Bruyne has good passing and defending skills.
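A sketch of the extraction step follows. It takes the top 5 by `overall` and keeps the six summary skill columns. Those column names follow the FIFA20 CSV, which is an assumption, and the numbers are invented. Indexing by player name makes the grouped bar chart a single `plot` call.

```python
import pandas as pd

df = pd.DataFrame({
    "short_name": [f"Player {c}" for c in "ABCDEF"],
    "overall":   [90, 89, 88, 87, 87, 80],
    "pace":      [81, 84, 88, 70, 72, 75],
    "shooting":  [86, 89, 79, 82, 55, 66],
    "passing":   [88, 77, 81, 90, 66, 74],
    "dribbling": [91, 84, 90, 83, 68, 79],
    "defending": [33, 31, 34, 58, 88, 42],
    "physic":    [62, 74, 63, 75, 84, 68],
})

skills = ["pace", "shooting", "passing", "dribbling", "defending", "physic"]
# Top 5 by overall, keeping only the name and the six summary skills
top5 = df.nlargest(5, "overall")[["short_name"] + skills]
# One row per player; top5_plot.plot(kind="bar") would draw the grouped bars
top5_plot = top5.set_index("short_name")
print(top5_plot)
```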
Most players in FIFA20 are right-footed.
Most players in FIFA20 are between 21 and 27 years old.
In the FIFA20 dataset, many players have an overall rating between 65 and 70.
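Claims like "most ratings fall between 65 and 70" can also be checked numerically with `pd.cut`, which bins a column into intervals. This sketch uses invented ratings and 5-point bands. The band width is an arbitrary choice, and `df["overall"].plot(kind="hist")` would draw the corresponding histogram.

```python
import pandas as pd

df = pd.DataFrame({"overall": [48, 62, 66, 67, 68, 69, 71, 74, 81, 93]})

# 5-point bands [45, 50), [50, 55), ...; right=False makes each band [low, high)
bands = pd.cut(df["overall"], bins=list(range(45, 100, 5)), right=False)
band_counts = bands.value_counts().sort_index()

# Show only the non-empty bands
print(band_counts[band_counts > 0])
```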
In terms of team position, most FIFA20 players are listed as substitutes or reserves. There are only a few right wingers and left wingers.
A heatmap is a two-dimensional graphical representation of data where the individual values that are contained in a matrix are represented as colors. Here it illustrates the relationship between physic, defending, dribbling & pace.
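A heatmap like this is typically drawn from a correlation matrix. The sketch below computes the Pearson correlations between the four attributes on invented values. The seaborn call that would render the heatmap is left as a comment, so the example runs with pandas alone.

```python
import pandas as pd

df = pd.DataFrame({
    "pace":      [90, 85, 70, 60, 75, 80],
    "dribbling": [92, 86, 72, 58, 70, 83],
    "defending": [35, 40, 70, 85, 60, 45],
    "physic":    [60, 65, 78, 88, 72, 66],
})

cols = ["physic", "defending", "dribbling", "pace"]
corr = df[cols].corr()  # Pearson correlation matrix, values in [-1, 1]
print(corr.round(2))

# With seaborn installed, the heatmap itself would be e.g.:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```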

Extracting the data for Messi and Ronaldo from the FIFA20 dataset

In this pie chart, Ronaldo has higher dribbling and ball control skills, while Messi has higher skills in curve, accuracy and long passing.
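Extracting two named players and reshaping their skills for a side-by-side comparison can be sketched as follows. The placeholder names, the invented numbers, and the `skill_*` column names (modeled on the FIFA20 CSV) are all assumptions. Transposing gives one row per skill and one column per player, which is the layout a per-player pie chart needs.

```python
import pandas as pd

df = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C"],
    "skill_dribbling": [90, 85, 70],
    "skill_curve": [88, 80, 65],
    "skill_fk_accuracy": [86, 76, 60],
    "skill_long_passing": [85, 77, 72],
    "skill_ball_control": [92, 88, 74],
    "overall": [90, 88, 80],
})

# Keep only the two players being compared, indexed by name
pair = df[df["short_name"].isin(["Player A", "Player B"])].set_index("short_name")
# filter(like=...) keeps the columns whose names contain "skill_"
comparison = pair.filter(like="skill_").T  # rows = skills, columns = players
print(comparison)
# comparison.plot(kind="pie", subplots=True) would draw one pie per player
```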

Best Forwards in FIFA20

Messi and Ronaldo have good finishing skills, while Messi and Bernardo Silva have higher crossing ratings than the other players. In heading accuracy, CR7 rates highest, and Dybala has a strong volley rating. Finally, Messi has the best passing.
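The article doesn't show how it chose its forwards. One plausible sketch keeps players whose `player_positions` string lists ST, CF, LW or RW, then takes the highest rated. Both that rule and the column names are assumptions modeled on the FIFA20 CSV, and the data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "player_positions": ["RW, CF, ST", "ST, LW", "CB", "ST", "GK"],
    "overall": [90, 88, 87, 85, 89],
    "attacking_crossing": [85, 80, 50, 60, 14],
    "attacking_finishing": [92, 93, 40, 88, 10],
    "attacking_heading_accuracy": [70, 89, 80, 82, 15],
    "attacking_short_passing": [91, 82, 75, 78, 40],
    "attacking_volleys": [88, 87, 40, 85, 12],
})

# Treat a player as a forward if any listed position is ST, CF, LW or RW
is_forward = df["player_positions"].str.contains(r"\b(?:ST|CF|LW|RW)\b")
top_forwards = df[is_forward].nlargest(3, "overall")
# Keep only the attacking_* ratings, one row per forward
attack = top_forwards.set_index("short_name").filter(like="attacking_")
print(attack)
```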

Best Goalkeeper in FIFA20

In this line plot, Oblak has good handling skills, while Ederson and Lloris have better speed than the other goalkeepers. Ederson also has excellent kicking skills. Finally, De Gea has good reflexes.

Best Defender in FIFA20

Among defenders, Chiellini has strong marking skills, Van Dijk has an excellent standing tackle, and Ramos has a strong sliding tackle.
The visualization above shows boxplots of different power skills by player age. For stamina, players aged 21 to 35 rate well, while players aged 32 to 35 mostly have higher shot power. For jumping, 26-year-old players rate slightly higher. Finally, in long shots, 32- and 34-year-old players perform best.
The visualization above shows scatterplots drawn with subplots. They plot mentality aggression against interceptions, and mentality positioning against penalties, with points distinguished by player age. In both plots, most players between 25 and 35 have higher ratings.
This visualization is a set of histograms drawn with subplots, showing how skill ratings are distributed across players. For dribbling, most players rate between 60 and 70. For curve, most rate between 50 and 70, while accuracy mostly falls between 30 and 40. Finally, passing ratings cluster between 60 and 70. Across all the histograms, only a few players are rated above 80.
This is a pairplot drawn with the seaborn library. A pairplot plots pairwise relationships in a dataset. Here it shows the relationships between the movement skills (acceleration, speed, agility, reactions and balance), with points distinguished by player age.
Correlation between all variables in the dataset.
Here we can observe that Chad Mozambique has the highest rating, while Brazil and Algeria share second place.
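The article doesn't show how this per-country rating is computed. One plausible sketch, assuming it is the mean `overall` per `nationality`, uses a `groupby`. Reporting the player count next to the mean helps, because a country with only one or two players in the dataset can top a plain average. The data and the minimum-count threshold below are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "nationality": ["Country A", "Country B", "Country B", "Country B", "Country C", "Country C"],
    "overall": [80, 85, 75, 70, 72, 68],
})

# Mean rating and number of players per nationality, best mean first
by_country = (
    df.groupby("nationality")["overall"]
      .agg(["mean", "count"])
      .sort_values("mean", ascending=False)
)
print(by_country)

# One strong player can put a small country on top of a plain mean,
# so require a minimum number of players before ranking.
MIN_PLAYERS = 2  # hypothetical threshold, chosen for illustration
fair = by_country[by_country["count"] >= MIN_PLAYERS]
print(fair)
```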

Asking and Answering Questions

Being able to ask and answer questions is an important part of teaching and learning. Asking questions stimulates curiosity about the topic and, at the same time, helps you assess your understanding of the material.

With the help of the sort_values() function, we can find the top 10 players in the world by overall rating.
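The `sort_values()` step can be sketched as follows on generated toy rows (invented values). Sorting in descending order and taking `head(10)` gives the top 10, and the column list keeps only what the answer needs.

```python
import pandas as pd

df = pd.DataFrame({
    "short_name": [f"Player {i}" for i in range(12)],
    "overall": [70 + i for i in range(12)],
    "nationality": ["Country X", "Country Y"] * 6,
    "club": ["Club X", "Club Y", "Club Z"] * 4,
})

top10 = (
    df.sort_values(by="overall", ascending=False)  # highest rating first
      .head(10)
      [["short_name", "overall", "nationality", "club"]]
)
print(top10)
```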
Messi is at the top of the dataset with an overall rating of 94. He is Argentinian and plays for FC Barcelona.
First, we group the dataset by club with groupby("club"). Then, using a for loop, we go through the data for each club.
Then, using get_group("Real Madrid"), we retrieve the data for all Real Madrid players.
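Both steps are sketched below on toy rows. Only the club name comes from the article; the players and values are invented. Iterating over a `GroupBy` object yields `(key, sub-DataFrame)` pairs, and `get_group` returns the rows for one key.

```python
import pandas as pd

df = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "club": ["Real Madrid", "Club X", "Real Madrid", "Real Madrid", "Club X"],
    "value_eur": [90_000_000, 50_000_000, 60_000_000, 45_000_000, 30_000_000],
})

grouped = df.groupby("club")

# Each iteration yields the group key and that group's rows
for club_name, club_df in grouped:
    print(club_name, len(club_df))

# All rows belonging to one club
real_madrid = grouped.get_group("Real Madrid")
# Most valuable players at the club, as in the bar plot
top_value = real_madrid.nlargest(3, "value_eur")[["short_name", "value_eur"]]
print(top_value)
```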
This bar plot shows that E. Hazard is Real Madrid's most valuable player in FIFA20, followed by Toni Kroos in second place and Isco in third.
With the loc indexer, we can easily retrieve Real Madrid's goalkeepers (GK). Courtois has the highest diving and handling ratings among them, while Navas rates well in kicking, reflexes and speed.
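`loc` accepts a Boolean row mask and a list of columns at the same time, which makes this a one-line query. The sketch uses invented rows. The `gk_*` column names and the exact `"GK"` position value are assumptions modeled on the FIFA20 CSV.

```python
import pandas as pd

df = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C", "Player D"],
    "club": ["Real Madrid", "Real Madrid", "Real Madrid", "Club X"],
    "player_positions": ["GK", "GK", "CB", "GK"],
    "gk_diving": [85, 80, 40, 78],
    "gk_handling": [84, 79, 35, 76],
    "gk_kicking": [72, 75, 50, 70],
    "gk_reflexes": [86, 85, 38, 80],
    "gk_speed": [46, 52, 60, 48],
})

gk_cols = ["short_name", "gk_diving", "gk_handling", "gk_kicking", "gk_reflexes", "gk_speed"]

# Rows: Real Madrid goalkeepers only; columns: the goalkeeping ratings
is_rm_keeper = (df["club"] == "Real Madrid") & (df["player_positions"] == "GK")
rm_keepers = df.loc[is_rm_keeper, gk_cols]
print(rm_keepers)
```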
This bar plot shows that Hazard has excellent shooting skills, while Casemiro has the best physic and defending ratings. Jovic also has high shooting skills. Finally, Modric has better passing skills than the other players.


Bikash Acharaya

A proactive B.E. Computer Science student at the Visvesvaraya Technological University (VTU) with strong academic achievement and volunteering experience.