The Best in La Liga 2019/2020

Oscar Rojo
Published in The Startup
8 min read, Sep 17, 2020

The 2019/2020 season of “La Liga Santander” ended on July 19th. For obvious non-sporting reasons, it was the longest in the history of the competition. The first match was played on August 16, 2019, at San Mamés, where Athletic beat Barcelona 1–0 thanks to a bicycle kick by Aritz Aduriz. Between that match and the final matchday, 338 days passed.
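The 338-day figure can be checked with base R date arithmetic. This is a small sketch using only the two dates given above; note that February 2020 had 29 days.

```r
# Days between the opening match (16 Aug 2019) and the final matchday (19 Jul 2020)
season_start <- as.Date("2019-08-16")
season_end <- as.Date("2020-07-19")
season_days <- as.numeric(difftime(season_end, season_start, units = "days"))
print(season_days)
```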

Photo by Tevarak Phanduang on Unsplash

Taking advantage of the large amount of data published on the League’s official website, we will analyze it to rank the best players and teams.

These are the main variables:

'MIN_JUGADOS', 'PART_JUGADOS', 'PART_COMPLETOS', 'PART_TITULAR', 'PART_SUSTITUIDO', 'AMARILLA', 'ROJA', 'SEGUNDAMARILLA', 'GOLES', 'PENALRECIBIDO', 'GOLPROPIAPUERTA', 'GOL_ENCONTRA', 'BLOQUEOS', 'INTERVENCIONES', 'RECUPERACIONES', 'DESPEJES', 'ENTRADAS', 'ENTRADASFALLIDOS', 'TUATU', 'DUELOEXITO', 'DUELOFALLIDO', 'DUEROAEREOEXITO', 'DUELOAEREOFALLIDO', 'TIROS', 'TIROSAPUERTA', 'ASISTENCIA', 'REGATES', 'REGATESFALLIDOS', 'GOLESDENTROAREA', 'GOLESFUERAAREA', 'GOLESIZDA', 'GOLESDCHA', 'GOLESPENAL', 'GOLCABEZA', 'GOLBALONPARADO', 'FUERAJUEGO', 'FALTARECIBIDA', 'FALTACOMETIDA', 'PENALCONTRA', 'MANOS', 'FALTAXTARJETA', 'CORNETS_LANZADOS', 'DUELOS', 'DUELOS_CUERPO_CUERPO', 'DUELOSAEREOS', 'PASES', 'PASES_CORTOS', 'PASES_LARGOS', 'PASES_HUECO', 'GOLESXTIRO', 'GOLESXFUERAAREA', 'GOLESDONTROAREA', 'GOLPARADO'

Ref. https://www.laliga.com/estadisticas

Let’s get to work

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
[1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C
[3] LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8
[5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] funModeling_1.9.4 Hmisc_4.4-1 ggplot2_3.3.2 Formula_1.2-3
[5] survival_3.2-3 lattice_0.20-41 dplyr_1.0.2

loaded via a namespace (and not attached):
[1] pbdZMQ_0.3-3 tidyselect_1.1.0 xfun_0.15
[4] repr_1.1.0 reshape2_1.4.4 pander_0.6.3
[7] purrr_0.3.4 splines_4.0.2 colorspace_1.4-1
[10] vctrs_0.3.4 generics_0.0.2 htmltools_0.5.0
[13] base64enc_0.1-3 rlang_0.4.7 pillar_1.4.6
[16] foreign_0.8-80 glue_1.4.2 withr_2.2.0
[19] entropy_1.2.1 RColorBrewer_1.1-2 uuid_0.1-4
[22] plyr_1.8.6 jpeg_0.1-8.1 lifecycle_0.2.0
[25] stringr_1.4.0 munsell_0.5.0 gtable_0.3.0
[28] moments_0.14 htmlwidgets_1.5.1 evaluate_0.14
[31] latticeExtra_0.6-29 knitr_1.29 fansi_0.4.1
[34] htmlTable_2.0.1 Rcpp_1.0.5 IRdisplay_0.7.0
[37] ROCR_1.0-11 scales_1.1.1 backports_1.1.9
[40] checkmate_2.0.0 IRkernel_1.1.1 jsonlite_1.7.1
[43] gridExtra_2.3 png_0.1-7 digest_0.6.25
[46] stringi_1.4.6 grid_4.0.2 cli_2.0.2
[49] tools_4.0.2 magrittr_1.5 lazyeval_0.2.2
[52] tibble_3.0.3 cluster_2.1.0 crayon_1.3.4
[55] pkgconfig_2.0.3 ellipsis_0.3.1 Matrix_1.2-18
[58] data.table_1.13.0 assertthat_0.2.1 rstudioapi_0.11
[61] R6_2.4.1 rpart_4.1-15 nnet_7.3-14
[64] compiler_4.0.2
# First, we clean the working memory
rm(list = ls())
# Second, we set the working directory
setwd("~/Documentos/Medium/futbol/")
getwd()

‘/home/oscar/Documentos/Medium/futbol’

# Install the libraries we need (only those not yet installed) and load them
packages <- c("dplyr", "funModeling")
newpack <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(newpack)) install.packages(newpack)
a <- lapply(packages, library, character.only=TRUE)

Load dataset

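# Load the cleaned dataset
raw_data <- read.csv("datos_completos_limpios_1.csv")
head(raw_data)
# Keep name, position and team: the player scores will be added to this table
data_perc<-raw_data%>%
dplyr::select(NOMBRE,POSICION,EQUIPO)
# Drop descriptive and percentage columns: NOMBRE, POSICION and the 53 statistics remain
data <- raw_data%>%
select(-NACIMIENTO,-LUGAR_NACIMIENTO,-ALTURA,-PESO,-EQUIPO,-X._PART_JUGADOS,-X._PART_COMPL,-X._PART_TITU,-X.PART_SUSTI,-URL)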
dim(data)
[1] 490  55
summary(data)

NOMBRE            POSICION          MIN_JUGADOS    PART_JUGADOS  
Length:490 Length:490 Min. : 0 Min. : 0.00
Class :character Class :character 1st Qu.: 360 1st Qu.: 8.00
Mode :character Mode :character Median :1298 Median :22.00
Mean :1383 Mean :19.84
3rd Qu.:2289 3rd Qu.:32.00
Max. :3420 Max. :38.00
PART_COMPLETOS PART_TITULAR PART_SUSTITUIDO AMARILLA
Min. :-1.00 Min. : 0.00 Min. : 0.000 Min. : 0.000
1st Qu.: 1.00 1st Qu.: 3.00 1st Qu.: 0.000 1st Qu.: 1.000
Median : 7.00 Median :14.00 Median : 3.000 Median : 3.000
Mean :10.74 Mean :15.42 Mean : 4.684 Mean : 3.349
3rd Qu.:18.75 3rd Qu.:25.00 3rd Qu.: 7.000 3rd Qu.: 5.000
Max. :38.00 Max. :38.00 Max. :23.000 Max. :15.000
ROJA SEGUNDAMARILLA GOLES PENALRECIBIDO
Min. :0.0000 Min. :0.00000 Min. : 0.000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.:0.0000
Median :0.0000 Median :0.00000 Median : 0.000 Median :0.0000
Mean :0.1612 Mean :0.08163 Mean : 1.835 Mean :0.1837
3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.: 2.000 3rd Qu.:0.0000
Max. :2.0000 Max. :2.00000 Max. :25.000 Max. :5.0000
GOLPROPIAPUERTA GOL_ENCONTRA BLOQUEOS INTERVENCIONES
Min. :0.00000 Min. : 0.00 Min. : 0.000 Min. : 0.0
1st Qu.:0.00000 1st Qu.: 5.00 1st Qu.: 0.000 1st Qu.: 1.0
Median :0.00000 Median :17.00 Median : 1.000 Median : 8.0
Mean :0.03878 Mean :18.63 Mean : 3.588 Mean :13.7
3rd Qu.:0.00000 3rd Qu.:30.00 3rd Qu.: 5.000 3rd Qu.:23.0
Max. :1.00000 Max. :62.00 Max. :29.000 Max. :68.0
RECUPERACIONES DESPEJES ENTRADAS ENTRADASFALLIDOS
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.000
1st Qu.: 19.25 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 1.000
Median : 65.50 Median : 12.00 Median : 9.00 Median : 6.000
Mean : 80.19 Mean : 24.56 Mean :12.21 Mean : 8.457
3rd Qu.:121.75 3rd Qu.: 31.75 3rd Qu.:19.00 3rd Qu.:13.000
Max. :370.00 Max. :174.00 Max. :60.00 Max. :39.000
TUATU DUELOEXITO DUELOFALLIDO DUEROAEREOEXITO
Min. :0.00000 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.:0.00000 1st Qu.: 13.00 1st Qu.: 11.25 1st Qu.: 3.25
Median :0.00000 Median : 66.50 Median : 68.00 Median : 14.00
Mean :0.07143 Mean : 78.79 Mean : 78.80 Mean : 27.61
3rd Qu.:0.00000 3rd Qu.:129.75 3rd Qu.:121.75 3rd Qu.: 38.00
Max. :2.00000 Max. :404.00 Max. :320.00 Max. :312.00
DUELOAEREOFALLIDO TIROS TIROSAPUERTA ASISTENCIA
Min. : 0.00 Min. : 0.0 Min. : 0.000 Min. : 0.000
1st Qu.: 3.00 1st Qu.: 1.0 1st Qu.: 0.000 1st Qu.: 0.000
Median : 18.00 Median : 7.0 Median : 2.000 Median : 0.000
Mean : 27.54 Mean : 12.6 Mean : 5.735 Mean : 1.161
3rd Qu.: 40.00 3rd Qu.: 17.0 3rd Qu.: 7.750 3rd Qu.: 2.000
Max. :186.00 Max. :115.0 Max. :71.000 Max. :21.000
REGATES REGATESFALLIDOS GOLESDENTROAREA GOLESFUERAAREA
Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. :0.000
1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.000
Median : 7.50 Median : 4.000 Median : 0.000 Median :0.000
Mean : 13.37 Mean : 9.053 Mean : 1.584 Mean :0.251
3rd Qu.: 18.00 3rd Qu.:12.000 3rd Qu.: 2.000 3rd Qu.:0.000
Max. :182.00 Max. :85.000 Max. :20.000 Max. :9.000
GOLESIZDA GOLESDCHA GOLESPENAL GOLCABEZA
Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.00000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
Median :0.0000 Median :0.00000 Median :0.0000 Median :0.00000
Mean :0.0258 Mean :0.05067 Mean :0.2429 Mean :0.01739
3rd Qu.:0.0000 3rd Qu.:0.06000 3rd Qu.:0.0000 3rd Qu.:0.00000
Max. :1.0000 Max. :1.00000 Max. :7.0000 Max. :1.00000
GOLBALONPARADO FUERAJUEGO FALTARECIBIDA FALTACOMETIDA
Min. :0.00000 Min. : 0.000 Min. : 0.00 Min. : 0.00
1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.: 3.00 1st Qu.: 2.00
Median :0.00000 Median : 1.000 Median : 13.00 Median :16.00
Mean :0.07959 Mean : 3.112 Mean : 18.24 Mean :19.29
3rd Qu.:0.00000 3rd Qu.: 3.000 3rd Qu.: 28.00 3rd Qu.:30.00
Max. :5.00000 Max. :38.000 Max. :113.00 Max. :90.00
PENALCONTRA MANOS FALTAXTARJETA CORNETS_LANZADOS
Min. :0.0000 Min. : 0.0 Min. : 0.000 Min. : 0.000
1st Qu.:0.0000 1st Qu.: 0.0 1st Qu.: 0.000 1st Qu.: 0.000
Median :0.0000 Median : 1.0 Median : 4.415 Median : 0.000
Mean :0.2612 Mean : 1.4 Mean : 4.710 Mean : 6.431
3rd Qu.:0.0000 3rd Qu.: 2.0 3rd Qu.: 7.000 3rd Qu.: 1.000
Max. :5.0000 Max. :13.0 Max. :32.000 Max. :141.000
DUELOS DUELOS_CUERPO_CUERPO DUELOSAEREOS PASES
Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0.0
1st Qu.: 23.75 1st Qu.: 15.0 1st Qu.: 8.00 1st Qu.: 125.5
Median :137.50 Median : 87.5 Median : 35.00 Median : 491.0
Mean :157.59 Mean :102.4 Mean : 55.16 Mean : 590.8
3rd Qu.:252.00 3rd Qu.:163.0 3rd Qu.: 80.00 3rd Qu.: 919.8
Max. :697.00 Max. :469.0 Max. :471.00 Max. :2681.0
PASES_CORTOS PASES_LARGOS PASES_HUECO GOLESXTIRO
Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. :0.0000
1st Qu.: 78.0 1st Qu.: 5.00 1st Qu.: 0.00 1st Qu.:0.0000
Median : 315.5 Median : 22.00 Median : 0.00 Median :0.0000
Mean : 419.6 Mean : 42.51 Mean : 1.29 Mean :0.0942
3rd Qu.: 636.8 3rd Qu.: 58.00 3rd Qu.: 1.00 3rd Qu.:0.1500
Max. :2309.0 Max. :376.00 Max. :36.00 Max. :1.0000
GOLESXFUERAAREA GOLESDONTROAREA GOLPARADO
Min. :0.00000 Min. :0.00000 Min. :0.000000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000000
Median :0.00000 Median :0.00000 Median :0.000000
Mean :0.01067 Mean :0.08367 Mean :0.002408
3rd Qu.:0.00000 3rd Qu.:0.14000 3rd Qu.:0.000000
Max. :0.50000 Max. :1.00000 Max. :0.130000
cols <- colnames(data)
# Columns 3 to 55 are the 53 numeric statistics (columns 1 and 2 are NOMBRE and POSICION)
lis <- cols[3:55]
lis
length(lis)
53

To weight the variables, we use the correlations between the different variables that we obtained in a previous article.

#unzip
zipF<- "out.zip"
outDir<-"unzip"
unzip(zipF,exdir=outDir)
cor <- read.csv("unzip/out.csv")
dim(cor)
  1. 54
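# Transform: turn each row into a string and strip the wrapper text ("CODE = ", "list", parentheses), keeping only the number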
cor_list <- purrr::transpose(cor)
cor_list_a <- gsub("CODE = ", "", cor_list)
cor_list_b <- gsub("list", "", cor_list_a)
cor_list_c <- gsub(")", "", cor_list_b)
cor_list_d <- substring(cor_list_c, 2)
cor_list_d
# Keep the first 53 values, one per statistic in lis
list_cor <- cor_list_d[1:53]
length(list_cor)

53
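The `gsub()` chain above rebuilds each weight from the printed form of a row. Assuming, as those patterns suggest, that out.csv contains a single numeric column called CODE, the same vector can be taken straight from that column without any string handling. This is a sketch on a small made-up data frame, not on the real file.

```r
# Hypothetical stand-in for read.csv("unzip/out.csv"): one numeric column named CODE
cor_df <- data.frame(CODE = c(0.42, -0.10, 0.35, 0.07))
n_vars <- 3  # in the article this would be length(lis), i.e. 53

# Take the first n_vars values of the column directly
weights <- as.numeric(cor_df$CODE[seq_len(n_vars)])
print(weights)
```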

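# Convert to numeric
list_cor_num <- as.numeric(unlist(list_cor))
# Replace each statistic by its percentile: the share of players with a value less than or equal to it
for (i in lis){
data_perc[[i]] <- ecdf(data[[i]])(data[[i]])
}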
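To see what `ecdf(x)(x)` produces, here it is on a toy vector of goals for four hypothetical players. Tied players share the same percentile, and every value lands in (0, 1], which puts all 53 statistics on a common scale before weighting.

```r
# Goals for four hypothetical players
goals <- c(0, 5, 5, 10)
# ecdf() builds the empirical CDF; evaluating it at the data gives each player's percentile
pct <- ecdf(goals)(goals)
print(pct)
```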

Convert the values of some columns from positive to negative, since we treat these statistics as counting against the player

negative <- c("AMARILLA","ROJA","SEGUNDAMARILLA","ENTRADAS","ENTRADASFALLIDOS","FUERAJUEGO","FALTACOMETIDA","MANOS","FALTAXTARJETA")
for (j in negative){
data_perc[[j]] <- data_perc[[j]]*(-1)
}
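Because a data frame is a list of columns, the sign flip can also be written as one vectorized assignment instead of a loop. A sketch on a hypothetical two-player table of percentiles:

```r
# Hypothetical percentile data for two players
toy <- data.frame(NOMBRE = c("A", "B"),
                  GOLES = c(0.9, 0.4),
                  AMARILLA = c(0.2, 0.8),
                  FUERAJUEGO = c(0.5, 0.1))
penalizing <- c("AMARILLA", "FUERAJUEGO")

# Negate all penalizing columns at once; NOMBRE and GOLES are left untouched
toy[penalizing] <- -toy[penalizing]
print(toy)
```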

Weight the values with the correlations

# Show the list of statistics and the weights we are going to use
lis
list_cor_num
# Multiply each statistic by its own weight: the k-th column of lis by the k-th correlation
for (k in seq_along(lis)){
data_perc[[lis[k]]] <- data_perc[[lis[k]]]*list_cor_num[k]
}
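The weighting step pairs column k with weight k. Looping over every weight for every column would instead multiply each column by the product of all the weights, which rescales all columns equally and leaves the ranking unchanged apart from a possible sign flip. A sketch with hypothetical percentiles and weights, assuming the weights are in the same order as the columns:

```r
# Hypothetical percentiles for two players and two statistics
toy <- data.frame(GOLES = c(0.2, 0.8), ASISTENCIA = c(0.5, 1.0))
# Hypothetical weights, one per statistic, in column order
weights <- c(0.9, 0.4)

# Map() walks columns and weights in parallel, so each column gets only its own weight
toy[] <- Map(`*`, toy, weights)
print(toy)

# Row-wise total, as in the SUMA column
toy$SUMA <- rowSums(toy)
print(toy$SUMA)
```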

Sum the values row by row

# Total score per player: sum of the 53 weighted statistics (columns 4 to 56)
data_perc$SUMA <- rowSums(data_perc[,4:56])
head(data_perc)

Generate one data frame per position (POSICION)

portero <- data_perc%>%
filter(POSICION=="Portero") %>%
mutate(SUM = -SUMA)%>%
arrange(SUM)%>%
mutate(RANK = min_rank(SUM)) %>%
select(NOMBRE,EQUIPO,RANK)
defensa <- data_perc%>%
filter(POSICION=="Defensa")%>%
mutate(SUM = -SUMA)%>%
arrange(SUM)%>%
mutate(RANK = min_rank(SUM)) %>%
select(NOMBRE,EQUIPO,RANK)
centro <- data_perc%>%
filter(POSICION=="Centrocampista")%>%
mutate(SUM = -SUMA)%>%
arrange(SUM)%>%
mutate(RANK = min_rank(SUM)) %>%
select(NOMBRE,EQUIPO,RANK)
delantero <- data_perc%>%
filter(POSICION=="Delantero")%>%
mutate(SUM = -SUMA)%>%
arrange(SUM)%>%
mutate(RANK = min_rank(SUM)) %>%
select(NOMBRE,EQUIPO,RANK)
print("Portero :")
head(portero,6)
print("Defensa :")
head(defensa,6)
print("Centro :")
head(centro,6)
print("Delantero :")
head(delantero)
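The four position tables differ only in the value passed to `filter()`, so a small helper can build all of them and makes it harder to forget the filter in one copy. This sketch is written in base R so it runs without dplyr; `rank_position()` and the player data are hypothetical. `rank(-SUMA, ties.method = "min")` gives the same ranks as `min_rank(SUM)` with `SUM = -SUMA`, and calling the helper without a position reproduces the overall ranking.

```r
# Hypothetical scored players (SUMA as computed above)
players <- data.frame(
  NOMBRE = c("P1", "P2", "P3", "P4", "P5"),
  EQUIPO = c("T1", "T2", "T1", "T3", "T2"),
  POSICION = c("Portero", "Defensa", "Defensa", "Delantero", "Defensa"),
  SUMA = c(3.1, 4.0, 5.2, 6.3, 4.0)
)

# Rank players by SUMA (highest = 1), optionally within a single position
rank_position <- function(df, position = NULL) {
  if (!is.null(position)) df <- df[df$POSICION == position, , drop = FALSE]
  df <- df[order(-df$SUMA), , drop = FALSE]          # best first; ties keep input order
  df$RANK <- rank(-df$SUMA, ties.method = "min")     # tied players share the lower rank
  rownames(df) <- NULL
  df[, c("NOMBRE", "EQUIPO", "RANK")]
}

defensa <- rank_position(players, "Defensa")
print(defensa)
best <- rank_position(players)  # no filter: all positions
print(best)
```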

The Best Player

best <- data_perc%>%
mutate(SUM = -SUMA)%>%
arrange(SUM)%>%
mutate(RANK = min_rank(SUM)) %>%
select(NOMBRE,EQUIPO,RANK)
head(best,10)

The Best Team

best_team <- data_perc%>%
group_by(EQUIPO)%>%
summarise(TOTAL = sum(SUMA))%>%
mutate(TOT = -TOTAL)%>%
arrange(TOT)%>%
mutate(RANK = min_rank(TOT))%>%
select(EQUIPO,RANK)
print("The Best Team")
head(best_team,10)
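The team score is the sum of its players’ scores. The same aggregation can be sketched in base R with `tapply()` on hypothetical data. Because it is a sum, a team that used more players accumulates more terms; taking the mean instead would be one alternative.

```r
# Hypothetical player scores
players <- data.frame(
  EQUIPO = c("T1", "T1", "T2", "T2", "T3"),
  SUMA = c(2.0, 1.5, 4.0, 0.5, 3.0)
)

# Sum player scores per team (the TOTAL column above)
team_total <- tapply(players$SUMA, players$EQUIPO, sum)

# Highest total gets rank 1; tied teams would share the lower rank
best_team <- data.frame(
  EQUIPO = names(team_total),
  TOTAL = as.numeric(team_total),
  RANK = rank(-as.numeric(team_total), ties.method = "min")
)
best_team <- best_team[order(best_team$RANK), ]
print(best_team)
```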
`summarise()` ungrouping output (override with `.groups` argument)



[1] "The Best Team"

The Final Classification

Conclusion

Although our ranking does not match the official standings exactly, it does not differ much from them. Bear in mind that the final classification depends on the points earned from wins and draws and, where teams are level on points, on goal average.
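One way to make “does not differ much” concrete is Spearman’s rank correlation between our team ranking and the official standings, where 1 means an identical order. The ranks below are hypothetical placeholders for five teams, not the article’s results.

```r
# Hypothetical positions for five teams: our ranking vs. the official table
our_rank <- c(1, 2, 3, 4, 5)
official_rank <- c(2, 1, 3, 5, 4)

# Spearman correlation compares the two orderings
rho <- cor(our_rank, official_rank, method = "spearman")
print(rho)
```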

I hope this helps you with your own training.

No matter which books, blogs, courses or videos you learn from, when it comes to implementation everything can look “out of syllabus”.

The best way to learn is by doing! The best way to learn is by teaching what you have learned!

Never give up!

See you on LinkedIn!

Oscar Rojo

Master in Data Science. Passionate about learning new skills. Former branch risk analyst. https://www.linkedin.com/in/oscar-rojo-martin/. www.oscarrojo.es