FIFA Stars — a dive into 3 biggies.

Saiyam Bhatnagar
Published in Analytics Vidhya
3 min read · Jul 30, 2020

Hello readers! A few months ago I picked up a habit that made preparing for my mid-semester exams a lot harder. As sophomores at an engineering college, BITS Pilani, we were accustomed to starting our preparation only a week before the examinations. The story begins in my hostel dorm, where my wingies got me addicted to FIFA 19 a mere 10 days before our mid-semester exams. However, rather than dwelling on the anxious memories of the past, let's discuss what we are here for. Today, while browsing through the Kaggle datasets, I came across something interesting. So first, let's talk about the data.

The dataset consists of 18207 players and 651 clubs. Each player is represented by 40 features, some of which are ‘Name’, ‘Age’, ‘Club’, ‘Crossing’, ‘Finishing’, ‘Heading Accuracy’, ‘Short Passing’, ‘Volleys’, ‘Dribbling’, ‘Curve’, ‘FK Accuracy’, ‘Long Passing’, etc. After a couple of minutes of playing with the data (though not with the same interest with which I play FIFA :-p), I found that in such a large dataset the features were approximately normally distributed, though some skewness was present. Therefore, I decided to check how many outliers my 3 favorite teams (Barca, Juventus and Man City) had. These are the steps I took to arrive at a few results.

1. Feature Engineering

I decided to calculate 2 new values for each of the 18207 players: Attacking and Defense. Each one is a weighted combination of related features, using appropriate weights for each class of features. Attacking uses {Finishing, Crossing, Heading Accuracy … etc.} and Defense uses {Long Passing, Marking, Tackle … etc.}.
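For readers who want to see the idea in code, here is a minimal sketch of a weighted score. The weights are made-up placeholders (the actual values are not listed in this post), `weighted_score` is a hypothetical helper, and the player is invented rather than taken from the dataset.

```python
# Hypothetical weights: placeholders only, not the values used in the analysis.
ATTACK_WEIGHTS = {"Finishing": 0.5, "Crossing": 0.25, "Heading Accuracy": 0.25}
DEFENSE_WEIGHTS = {"Long Passing": 0.2, "Marking": 0.4, "Tackle": 0.4}


def weighted_score(player, weights):
    """Return the weighted average of the player's ratings (0-100 scale)."""
    total_weight = sum(weights.values())
    return sum(player[name] * w for name, w in weights.items()) / total_weight


# Invented player with illustrative ratings, not a row from the dataset.
player = {
    "Name": "Example Forward",
    "Finishing": 90, "Crossing": 80, "Heading Accuracy": 70,
    "Long Passing": 60, "Marking": 30, "Tackle": 40,
}

attacking = weighted_score(player, ATTACK_WEIGHTS)
defense = weighted_score(player, DEFENSE_WEIGHTS)
print(f"Attacking: {attacking:.1f}, Defense: {defense:.1f}")
```

Dividing by the sum of the weights keeps the result on the same 0 to 100 scale as the input ratings, even if the weights do not add up to 1.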

2. Plotting and Visualizing

I speculated that such a large dataset would form a slightly skewed normal distribution. I plotted histograms using various values for the number of bins, picked the ones that looked best, and got two histograms, one for Attacking and one for Defense.

Attacking score Histogram
Slightly skewed Defense Score Histogram
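The bin comparison and the skewness check can be sketched roughly as follows. This is only an illustration: the scores are synthetic random numbers standing in for the real engineered values, and `histogram_counts` and `sample_skewness` are hypothetical helper names.

```python
import random
import statistics


def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    if width == 0:  # all values identical: everything lands in the first bin
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


def sample_skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a right tail."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Synthetic stand-in for the 18207 engineered scores (not the real data).
random.seed(0)
scores = [random.gauss(55, 12) for _ in range(18207)]

for bins in (10, 20, 40):
    counts = histogram_counts(scores, bins)
    print(bins, "bins -> tallest bin holds", max(counts), "players")

print("skewness:", round(sample_skewness(scores), 3))
```

With matplotlib installed, `plt.hist(scores, bins=20)` draws the same kind of counts as a chart, which makes it easier to eyeball which bin count looks best.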

3. Inferring from the Distributions

I set the C.I. at 90% and 95%. Apologies to those who are not statisticians yet are great football fans and want to see the names of their stars in the list. For your reference, what I mean by a C.I. at 90% and 95% is that I’ll be bringing out the names of the top 5% and top 2.5% of players: a two-sided 90% interval leaves 5% of players above its upper end, and a 95% interval leaves 2.5%. These were the mean scores and the respective scores of the top attackers and defenders.

The scores are out of 100
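As a rough sketch of how those cutoffs can be computed under a normal approximation: the upper end of a two-sided interval is the mean plus z times the standard deviation, with z of about 1.645 for 90% and about 1.960 for 95%. The helper names and the tiny player list below are invented for illustration and are not the data behind the table.

```python
from statistics import NormalDist, fmean, pstdev


def upper_cutoff(scores, confidence):
    """Upper end of a two-sided interval, assuming roughly normal scores."""
    upper_tail = (1 - confidence) / 2          # 0.90 -> 0.05, 0.95 -> 0.025
    z = NormalDist().inv_cdf(1 - upper_tail)   # about 1.645 and 1.960
    return fmean(scores) + z * pstdev(scores)


def top_players(players, key, confidence):
    """Names of players whose `key` score lies above the upper cutoff."""
    cutoff = upper_cutoff([p[key] for p in players], confidence)
    return [p["Name"] for p in players if p[key] > cutoff]


# Invented players: 40 ordinary scores between 50 and 54, plus one standout.
players = [{"Name": f"Player {i}", "Attacking": 50 + (i % 5)} for i in range(40)]
players.append({"Name": "Star", "Attacking": 95})

for confidence in (0.90, 0.95):
    names = top_players(players, "Attacking", confidence)
    print(f"{confidence:.0%} interval -> above cutoff: {names}")
```

Because the real scores are slightly skewed, the share of players above such a cutoff will not be exactly 5% or 2.5%; taking empirical percentiles of the scores is an alternative that makes no normality assumption.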

4. Results

The results then inspired me to check the Release Clauses of the clubs as a whole. No wonder so many outliers lie in these clubs. Check the per-player Release Clause of these biggies.

The values are in millions of euros
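A per-player figure like this needs the release clauses as numbers first. The sketch below assumes the clauses are stored as strings such as "€226.5M" or "€565K" and that missing values are empty; both are assumptions about the raw data, and the helper names, clubs, and amounts are invented for illustration.

```python
from collections import defaultdict


def parse_release_clause(text):
    """Convert a string such as '€226.5M' or '€565K' to millions of euros."""
    text = text.strip().lstrip("€")
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("K"):
        return float(text[:-1]) / 1000
    return float(text) / 1_000_000  # plain euro amount without a suffix


def mean_release_clause_by_club(players):
    """Average release clause per player for each club, in millions of euros."""
    clauses = defaultdict(list)
    for p in players:
        if p["Release Clause"]:  # skip players with no clause recorded
            clauses[p["Club"]].append(parse_release_clause(p["Release Clause"]))
    return {club: sum(vals) / len(vals) for club, vals in clauses.items()}


# Invented rows, not taken from the dataset.
rows = [
    {"Club": "Example FC", "Release Clause": "€100M"},
    {"Club": "Example FC", "Release Clause": "€50M"},
    {"Club": "Sample United", "Release Clause": "€500K"},
    {"Club": "Sample United", "Release Clause": ""},
]

for club, value in mean_release_clause_by_club(rows).items():
    print(f"{club}: {value:.2f} million euros per player")
```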

The inferences did require a bit of data cleaning and pre-processing. Readers interested in going through the code can find it in my GitHub code file. Have a good day. :-)
