On the relationship between performance and market values in football (with data for replication)

--

You don’t need to know much about football (i.e., soccer) to have noticed that performance and money are positively related: (1) Clubs with bigger budgets tend to perform better than clubs with not-so-big budgets. (2) The best players are invariably involved in the most high-priced transfers. (3) The best players generally command the highest salaries. These three features are also interrelated. Rich clubs have more money to spend on buying the best players, obviously. And by paying large salaries to keep the best players loyal, rich clubs also become the best-performing clubs, at least in general.

In this blog post, I’ll look at the relationship between performance and market values among top-tier Norwegian football players. I use the market values provided by the German website Transfermarkt (see here), as they are generally regarded as reliable proxies of transfer fees and player earnings. (Real transfer fees and earnings are often not disclosed.) Performances are harder to measure. Yet several studies have found that expert journalists’ player ratings throughout a season are a valid measure of performance. So, I’ll trust the Norwegian journalists on this one. That said, see here for a recent open-access paper of mine scrutinizing the relationship between “objective performance” and market values among footballers more thoroughly.

Download data and variables

The data refer to 310 outfield players for the 2022 season. Since I finally succeeded in uploading a subset of these data to GitHub in Stata format (.dta) and R/Excel format (.csv) (see here for both), you can download them from “within” Stata as shown right below. (I don’t know how to do something similar in R. If you do, please tell me!)

use https://raw.githubusercontent.com/christer-thrane/blogs/main/soccer.dta

Now you can run the two commands below to get some descriptive statistics:

tabstat mark_val_eu, stats(n mean p50 SD min max)
tabstat rating_22, stats(n mean p50 SD min max)

. tabstat mark_val_eu, stats(n mean p50 SD min max)

    Variable |         N      Mean       p50        SD       Min       Max
-------------+------------------------------------------------------------
 mark_val_eu |       310  60266.13     40000  79428.77      7500    900000
--------------------------------------------------------------------------

. tabstat rating_22, stats(n mean p50 SD min max)

    Variable |         N      Mean       p50        SD       Min       Max
-------------+------------------------------------------------------------
   rating_22 |       310  4.739194      4.71  .5484609         3      6.79
--------------------------------------------------------------------------

The output above suggests that the average market value of the players is roughly 60,000 euros, while the median is “only” 40,000 euros. We have the expected right-skewed distribution, where some players’ huge market values pull the mean above the median. In the player rating system, every player gets a grade between 1 (terrible performance) and 10 (perfect performance) for every match he plays. The player rating variable in the data is thus each player’s season-average grade. This performance variable ranges from 3.00 to 6.79, with a mean of 4.74.

Without further ado: the scatterplot

Figure 1 is a scatterplot of player performance and market values along with two regression lines summing up the general association between these variables. (I’ll provide the Stata code to create this figure and the next at the end of the blog post.) Note that the market values on the y-axis are expressed in logarithms, which is mainly a way of reducing the influence of a few players’ very high market values. (We’ll leave it at that for now.)

The slope of the linear (red) regression line (0.807) translates into the following in practice: a one-unit increase in the player rating variable (i.e., performance), say from 4 to 5, entails a 124 percent increase in market value (since exp(0.807) - 1 ≈ 1.24). That said, the linear regression line is by no means a perfect fit to the data. In fact, the non-linear (blue) regression line has a slightly better fit. This non-linear model suggests that the relationship between player rating and market values is stronger (as in steeper) for players with better ratings. In any event, better performances appear to go hand in hand with higher market values in general, much as expected. The regression coefficients yielding the lines in Figure 1 appear below for reference.

quietly reg mv_log rating_22
quietly etable
quietly reg mv_log c.rating_22##c.rating_22
etable, append column(index) showstars mstat(n) mstat(r2) showstarsnote ///
title(Regression results comparison: linear (1) and non-linear (2)) ///
note(Dependent variable = Market value (logged))

Regression results comparison: linear (1) and non-linear (2)
-------------------------------------------------------------------------------------
                                                                     1          2
-------------------------------------------------------------------------------------
Player rating (season average)                                   0.807 **  -2.393 **
                                                               (0.064)    (0.684)
Player rating (season average) # Player rating (season average)             0.335 **
                                                                          (0.071)
Intercept                                                        9.125 **  16.663 **
                                                               (0.307)    (1.632)
R-squared                                                         0.34       0.38
-------------------------------------------------------------------------------------
** p<.01, * p<.05
Dependent variable = Market value (logged)
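
To see what the non-linear model (column 2) implies in practice, its coefficients can be turned into predicted changes by hand. The worked example below is only a sketch based on the rounded coefficients in the table, so the figures are approximate. The symbols r (season-average player rating) and MV (market value) are notation introduced here for illustration.

```latex
\begin{align*}
\widehat{\ln(\mathrm{MV})} &= 16.663 - 2.393\,r + 0.335\,r^{2} \\[4pt]
\frac{\partial\,\widehat{\ln(\mathrm{MV})}}{\partial r} &= -2.393 + 0.670\,r
  \qquad (\text{zero at } r = 2.393/0.670 \approx 3.57) \\[4pt]
r = 3 \to 4:\quad \Delta\widehat{\ln(\mathrm{MV})} &= -2.393 + 0.335\,(16 - 9) = -0.048,
  & e^{-0.048} - 1 &\approx -0.05 \\
r = 4 \to 5:\quad \Delta\widehat{\ln(\mathrm{MV})} &= -2.393 + 0.335\,(25 - 16) = 0.622,
  & e^{0.622} - 1 &\approx 0.86 \\
r = 5 \to 6:\quad \Delta\widehat{\ln(\mathrm{MV})} &= -2.393 + 0.335\,(36 - 25) = 1.292,
  & e^{1.292} - 1 &\approx 2.64
\end{align*}
```

Read this way, the quadratic model predicts an almost flat relationship (about minus 5 percent) between ratings of 3 and 4, an increase of roughly 86 percent from 4 to 5, and of roughly 264 percent from 5 to 6. That is the sense in which the blue line is steeper for players with better ratings.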

A second look at the scatterplot

Regression analysis has many things going for it. But sometimes its quick-and-dirty summarizing of a variable relationship does not provide us with the most accurate picture we might prefer. In Figure 2, I have thus let the data “speak for themselves” by means of using a scatterplot smoother to identify any (non-linear) trend line.

The results of Figure 2 suggest a mix of the two regression lines in Figure 1. There appears to be no relationship, or perhaps even a slightly negative one, between player rating and market values up to average player ratings of about 4.50. For player ratings above 4.50, however, there is an approximately linear relationship between the variables, much in line with the general, positive relationship we expect. But we are not talking about a general, linear relationship across the whole range of ratings.
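
One way to put numbers on the pattern in Figure 2 is a linear spline regression with a single knot, which estimates one slope below the knot and another above it. The lines below are a minimal sketch and not part of the original analysis: the variable names rating_lo and rating_hi are hypothetical, and the knot at 4.5 is simply read off Figure 2 by eye rather than estimated from the data.

```stata
* Sketch only: rating_lo and rating_hi are hypothetical variable names,
* and the knot at 4.5 is an assumption taken from eyeballing Figure 2.
mkspline rating_lo 4.5 rating_hi = rating_22

* Each coefficient is the slope within its own segment:
* rating_lo for ratings up to 4.5, rating_hi for ratings above 4.5.
regress mv_log rating_lo rating_hi

* Test whether the two segment slopes differ
test rating_lo = rating_hi
```

If the smoother in Figure 2 is a fair summary, the slope for rating_lo should be close to zero and the slope for rating_hi clearly positive.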

Takeaways

Large amounts of money and big-time success are two features that often go together in sports. In this blog post, I have looked at the relationship between performance and market values among top-tier Norwegian football players. And although this relationship must be considered positive in the sense that better performances in the main imply higher market values, this relationship should not be described as strictly linear. Or, as they say, at least not in our data.

Code for figures

See below:

* Fig 1

* Install the schemepack graph schemes (needed only once) and pick one
ssc install schemepack, replace
set scheme black_tableau

* Jittered scatterplot with a linear (red) and a quadratic (blue) fit on top
twoway (scatter mv_log rating_22, msymbol(oh) mcolor(gs10) jitter(12)) ///
    (lfit mv_log rating_22, lcolor(red%80)) ///
    (qfit mv_log rating_22, lcolor(blue%80)), legend(off) ///
    ytitle("Market value (logged)") xtitle("Player rating (season average)") ///
    title("Figure 1" "Market value by player rating:" "Regression models", size(medsmall)) ///
    note("Note. R-squared for linear model (red line) = 0.34; R-squared for quadratic model (blue line) = 0.38.", span size(vsmall))

* Fig 2

* mrunning is a user-written command: type -search mrunning- and install it before continuing ...

mrunning mv_log rating_22, lcolor(blue%80) scatter(msymbol(oh) mcolor(gs10) jitter(12) ///
title("Figure 2" "Market value by player rating:" "Data-driven trend line", size(medsmall)) ///
note(Note. R-squared for data-driven trend (i.e., a scatterplot smoother) line = 0.41., span size(vsmall)))

About me

I’m Christer Thrane, a sociologist and professor at Inland University College, Norway. I have written two textbooks on applied regression modeling and applied statistical modeling. Both are published by Routledge, and you can find them here and here. I am on ResearchGate here, and you can also reach me at christer.thrane@inn.no

--

Christer Thrane (christer.thrane@inn.no)
The Stata Gallery
