FIFA 22 Players Value Analysis

Emilio Girardin
Dec 19, 2022


A Data Analysis with R

Background Info

With the FIFA World Cup 2022 under way, football players have become the center of public attention. What are the key characteristics that make them world-class players?

*This is a data analytics study project based on the work done by @Fangya, “FIFA 2018 players Euro Value analysis”, updated with the FIFA22 database.

Prepare the Libraries and FIFA22 Dataset

install.packages("tidyverse")
install.packages("caTools")
install.packages("corrplot")
install.packages("devtools")
install.packages("psych")
install.packages("corrr")
install.packages("PerformanceAnalytics")
install.packages("e1071")
suppressWarnings(library(caTools))
suppressWarnings(library(ggplot2))
suppressWarnings(library(corrplot))
suppressWarnings(library(forcats))
suppressWarnings(library(devtools))
suppressWarnings(library(psych))
suppressWarnings(library(corrr))
suppressWarnings(library(PerformanceAnalytics))
suppressWarnings(library(e1071))
suppressWarnings(library(dplyr))
suppressWarnings(library(lubridate))
suppressWarnings(library(tidyverse))

fifa22 <- read.csv("players_22.csv")

In the following graph, we show the 10 most valuable players in the world according to the FIFA22 player stats. The graph makes us wonder: why are Mbappé, Haaland, Kane and Neymar the most expensive players?

player <- fifa22[, c(3,8,15)] # short_name, value_eur, club_name
player1 <- player[order(-player$value_eur),]
player2 <- player1[1:10,] # top 10 players
ggplot(data=player2, aes(x=reorder(short_name,value_eur), y=value_eur)) +
  geom_bar(stat="identity", aes(fill=club_name)) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ggtitle("10 Most Valuable Players in 2022") + xlab("Players") + ylab("Players Euro Value")

We know that E. Haaland now plays for Manchester City and that R. Lewandowski has moved to FC Barcelona, so we have to update their clubs.

We will also express the value of each player in € millions.

player <- fifa22[, c(3,8,15)]
player[30,3] <- "Manchester City" # E. Haaland
player[2,3] <- "FC Barcelona" # R. Lewandowski
player1 <- player[order(-player$value_eur),]
player2 <- player1[1:10,] # top 10 players
ggplot(data=player2, aes(x=reorder(short_name,-value_eur), y=value_eur/1000000)) +
  geom_bar(stat="identity", aes(fill=club_name)) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  guides(fill=guide_legend(title="Club")) +
  ggtitle("10 Most Valuable Players in 2022") + xlab("Player") + ylab("Value in € Millions")
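The column positions and row numbers used above depend on the exact layout of players_22.csv. As a minimal sketch, assuming the columns are named short_name, value_eur and club_name (as the plotting code suggests), the same selection and club update could be written by name. The toy data frame below is a hypothetical stand-in for fifa22, with made-up players and values.

```r
# Hypothetical stand-in for fifa22; names and values are made up.
toy <- data.frame(
  short_name = c("Player A", "Player B", "Player C"),
  value_eur  = c(50e6, 120e6, 80e6),
  club_name  = c("Club X", "Club Y", "Club Z"),
  stringsAsFactors = FALSE
)

# Select columns by name instead of by position.
player <- toy[, c("short_name", "value_eur", "club_name")]

# Update a club by matching on the player's name rather than a row number.
player$club_name[player$short_name == "Player B"] <- "Club W"

# Sort by value (descending) and keep the top rows.
top <- head(player[order(-player$value_eur), ], 2)
print(top)
```

Matching on names keeps the code working if the CSV is re-sorted or gains columns, at the cost of being a little more verbose.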

value <- fifa22[, c(6,7,8,9,31)]
# use pairwise.complete.obs so that missing values do not produce NA coefficients
cvalue <- cor(value, use = "pairwise.complete.obs")
corrplot(cvalue, title = "Correlation Plot of Value")
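To illustrate what `use = "pairwise.complete.obs"` changes, here is a small sketch on a toy data frame (the values are invented for the example and are not FIFA data). It compares the default behaviour of `cor()` with the pairwise option.

```r
# Toy data with missing values (made up for illustration).
df <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, 8, 10),
  c = c(5, NA, 3, 2, 1)
)

# Default: a missing value in either column yields NA for that coefficient.
cor_default <- cor(df)

# Pairwise: each coefficient uses the rows where both columns are present.
cor_pairwise <- cor(df, use = "pairwise.complete.obs")

print(round(cor_default, 2))
print(round(cor_pairwise, 2))
```

Note that with the pairwise option each coefficient may be computed from a different subset of rows, so the coefficients are not strictly comparable when many values are missing.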

After a simple correlation analysis, we notice that a player’s market value is highly associated with the player’s wage (salary in euros), international reputation, overall score and potential score. Although wage has an extremely high correlation coefficient with euro value, the two variables are too similar for the comparison to be informative. In addition, international reputation is too abstract to analyze.

Therefore, we will investigate the relationship between the euro value and a player’s overall ability.

fifa22_1 <- fifa22[, c(6, 7, 8, 9, 10, 12, 13, 31, 38:72)]
lmfifa22_1 <- svm(formula = value_eur ~ overall, data = fifa22_1, type = "eps-regression")

# points: actual values; line: SVM predictions (both in € millions)
ggplot() +
  geom_point(aes(x=fifa22_1$overall, y=fifa22_1$value_eur/1000000), color="green") +
  geom_line(aes(x=fifa22_1$overall, y=predict(lmfifa22_1, newdata = fifa22_1)/1000000), color="purple")

## Warning: Removed 74 rows containing missing values (`geom_point()`).

The graph shows that there is a strong relationship between a player’s overall score and euro value.

Analysis on players overall score

1. Correlation analysis

We will perform a correlation analysis to select the variables most highly correlated with a player’s overall score, and plot the correlation map and heat map.

fifa22_2 <- fifa22_1[, -c(6,7,9)]
cfifa22 <- cor(fifa22_2, use = "pairwise.complete.obs")

cor1 <- head(round(cfifa22,2))

# pick the variables with the highest correlation with overall
cfifa22_1 <- cfifa22[c(1,5,7,8,9,11,15,20,25,27,35,37), c(1,5,7,8,9,11,15,20,25,27,35,37)]

# correlation map
corrplot(cfifa22_1, title = "Correlation Plot for Overall", tl.col="black", tl.cex=0.7, tl.srt=60)

# heatmap
col <- colorRampPalette(c("darkblue", "white", "darkorange"))(20)
heatmap(x=cfifa22_1, col=col, symm=TRUE)

# correlation map details
test <- cfifa22_1[, -c(1)]
suppressWarnings(chart.Correlation(test, histogram = TRUE, pch=9, method = "pearson"))

2. Split training and testing dataset

# splitting the dataset with ratio 0.75, keeping the highly correlated variables
fifa22_3 <- fifa22_2[, c(1,5,7,8,9,11,15,20,25,27,35,37)]

value <- factor(fifa22_3$overall)
pairs(fifa22_3[, -2], col=value, upper.panel = NULL, pch=16, cex=0.5)

set.seed(124)
split = sample.split(fifa22_3$overall, SplitRatio = 0.75)
train = subset(fifa22_3, split == TRUE)
test = subset(fifa22_3, split == FALSE)
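The idea behind the 75/25 split can be sketched with base R alone. This is a plain random split on simulated data; it is an assumption made for illustration and is not necessarily identical to what `sample.split` from caTools does internally. The names `toy`, `train_toy` and `test_toy` are hypothetical.

```r
set.seed(124)

# Simulated data standing in for fifa22_3 (not real player stats).
toy <- data.frame(overall = sample(50:90, 200, replace = TRUE),
                  passing = sample(30:90, 200, replace = TRUE))

# Build a logical mask with 75% TRUE, then shuffle it.
n <- nrow(toy)
n_train <- round(0.75 * n)
mask <- sample(rep(c(TRUE, FALSE), times = c(n_train, n - n_train)))

# TRUE rows go to training, FALSE rows to testing.
train_toy <- toy[mask, ]
test_toy  <- toy[!mask, ]

cat("train rows:", nrow(train_toy), " test rows:", nrow(test_toy), "\n")
```

Fixing the seed makes the split reproducible, which matters when comparing models on the same held-out rows.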

From the correlation analysis we picked the 11 variables most correlated with overall. Now we will perform a PCA to see whether we can reduce the number of dimensions and, if so, which PCs are the most important.

pcafifa22 <- train [, -c(1)]
pca.value <- train [, 1]

3. Perform PCA
pca1 <- prcomp(na.omit(pcafifa22), center=TRUE, scale.=TRUE, cor=TRUE)

## Warning: In prcomp.default(na.omit(pcafifa22), center = TRUE, scale. = TRUE,
## cor = TRUE) :
## extra argument ‘cor’ will be disregarded

summary(pca1)

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.5796 1.2605 0.96119 0.75969 0.63662 0.49144 0.47694
## Proportion of Variance 0.6049 0.1444 0.08399 0.05247 0.03684 0.02196 0.02068
## Cumulative Proportion  0.6049 0.7494 0.83338 0.88584 0.92269 0.94464 0.96532
##                            PC8     PC9    PC10    PC11
## Standard deviation     0.39771 0.33205 0.28209 0.18289
## Proportion of Variance 0.01438 0.01002 0.00723 0.00304
## Cumulative Proportion  0.97970 0.98973 0.99696 1.00000

#plot PCA
plot(pca1, type="l")

biplot(pca1, choices=1:2, cex=.7, expand=3, xlim=c(-0.1,0), ylim=c(-0.7,0.03))

The plot shows the variance associated with each PC. We can see that the first 3 PCs explain most of the variability in the data (about 83%, according to the cumulative proportion in the summary). We will therefore build the model used to predict on the test dataset from just 3 variables: movement_reactions, passing and mentality_composure.
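The proportions reported by `summary(pca1)` follow directly from the standard deviations of the components. As a short worked sketch, the code below recomputes them from the standard deviations printed in the summary; the variable names are chosen here for illustration.

```r
# Standard deviations copied from the summary(pca1) output.
sdev <- c(2.5796, 1.2605, 0.96119, 0.75969, 0.63662, 0.49144,
          0.47694, 0.39771, 0.33205, 0.28209, 0.18289)

# Each PC's variance is its standard deviation squared.
variance <- sdev^2

# Share of the total variance per PC, and the running total.
prop_var <- variance / sum(variance)
cum_var  <- cumsum(prop_var)

print(round(data.frame(PC = seq_along(sdev), prop_var, cum_var), 4))
```

Because the inputs were scaled, the variances add up to roughly the number of variables (11 here), and the first three components together account for about 83% of the total.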

4. SVM Regression

Now we will perform an SVM regression and see how well it fits the data.

lmfifa22 <- svm(formula= overall ~ movement_reactions+ passing + mentality_composure, data=train)

1. Visualization of the training set

ggplot() + geom_point(aes(x=train$movement_reactions, y=train$overall), colour="pink") +
  ggtitle("FIFA22 players Overall ability vs Players Reactions in Training Set") + xlab("Player Reaction") + ylab("Players Overall")

ggplot() + geom_point(aes(x=test$movement_reactions, y=test$overall), colour="cyan") +
  ggtitle("FIFA players Overall ability vs Players Reactions in Test Set") + xlab("Players Reaction") + ylab("Players Overall")

As we can see from the test dataset graph, our SVM model made pretty good predictions of the players’ overall scores.
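One way to put a number on "pretty good" is to predict on the held-out rows and compute an error metric. The sketch below does this on simulated data, since the real dataset is not reproduced here; the data-generating formula, the names `toy`, `train_toy`, `test_toy` and `fit`, and the choice of RMSE and squared correlation as metrics are all assumptions made for illustration.

```r
library(e1071)

set.seed(124)

# Simulated data standing in for the FIFA22 columns (not real player stats).
n <- 300
toy <- data.frame(
  movement_reactions  = runif(n, 40, 90),
  passing             = runif(n, 40, 90),
  mentality_composure = runif(n, 40, 90)
)
toy$overall <- 0.5 * toy$movement_reactions + 0.3 * toy$passing +
  0.2 * toy$mentality_composure + rnorm(n, sd = 2)

# 75/25 random split.
idx <- sample(n, size = 0.75 * n)
train_toy <- toy[idx, ]
test_toy  <- toy[-idx, ]

# Fit an SVM regression with the same three predictors as in the article.
fit <- svm(overall ~ movement_reactions + passing + mentality_composure,
           data = train_toy)

# Predict on the held-out rows and summarise the error.
pred <- predict(fit, newdata = test_toy)
rmse <- sqrt(mean((test_toy$overall - pred)^2))
r2   <- cor(test_toy$overall, pred)^2

cat("Test RMSE:", round(rmse, 2), " R^2:", round(r2, 3), "\n")
```

Applied to the real data, the same two lines for `rmse` and `r2` could be run with `lmfifa22` and `test` in place of `fit` and `test_toy`, after dropping rows with missing predictors.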

Conclusion

For FIFA22 players who want to increase their market value, this data analysis suggests improving their overall score, first by practicing their reactions, passing and mental composure on the pitch.

Today is the World Cup 2022 final. The players of France and Argentina have shown the world that the three factors studied here were essential for reaching the final match.
