Football Insights from FIFA Data: Player Valuation

Ofir Magdaci
18 min read · Apr 27, 2023

What moves transfer market value and how to model it

DALL-E 2: “A few golden falling coins with the shape of a soccer ball embedded in them. digital art.”

Welcome to the second episode of our EA FIFA dataset exploration series, where we uncover the untold stories of footballers using the FIFA dataset. In our previous post, we looked at how age affects a player’s on-pitch abilities, and now we’re turning our attention to the economic side of football.

Just like the stock market, the transfer market in football is a constantly fluctuating landscape, with players’ values rising and falling based on performance and a wide range of other factors. As fans, we often marvel at the talent of our favorite players, but have you ever considered their worth as an investment? What are the factors that move the needle in this multi-billion dollar industry? In this post, we’ll dive into the world of FIFA player valuation and explore how a data-driven approach can help us better understand the rise and fall of these sporting assets. Get ready to witness the intersection of football and finance like never before, uncovering insights that can inform decisions for clubs, agents, and fans alike.

Note: this is player valuation as defined and measured in FIFA. In practice, the actual transfer fee is also affected by additional factors, such as a move to a competing club in the same league or city, the identity of the buyer (some clubs, like Chelsea, tend to overpay), player hype, injury history, marketing and merchandising potential, public image, etc.

Data (and data bias)

A description of this data and its limitations is given in the first post. In addition to the biases mentioned there, it is vital to understand that FIFA player value does not necessarily reflect a real-life value. In fact, such an estimation doesn’t objectively exist, and each source has its biases. Moreover, FIFA has multiple valuations over the course of the year, depending on the product version. This versioning is not supplied within the data and introduces some noise to the system.

With that said, we can get a rough estimate of the gap between FIFA player valuations and real-world estimates by comparing these valuations to actual transactions that have taken place. In this way, we are exposed to changes that originate in current player performance, injuries, etc. Thus, an educated comparison can be made between FIFA valuations and similar ones made at the same point in time. For example, in the first post, we compared the valuation of Dele Alli and Ollie Watkins over time, observing that FIFA’s valuation may lag behind those made by transfermarkt.com.

Part 1: What influences value

We will kick off this episode with a simple question: what affects the FIFA player valuation? We will tackle this in a few steps, the first among them essentially a factor analysis of our objective, starting with age.

Age

In the first chapter, we discovered that age dictates a lot during players’ careers. Its last figure shows that, on average, the price peaks early. Accordingly, a young footballer will cost much more than an equally performing older player. From an economic point of view, hot prospects have more years to improve, may influence more games, and can be highly tradeable later on, so they offer a higher ROI.

Figure 1: Players’ FIFA value by age (left) and transformed age (right). The applied transformation measures the distance of a player from age 25 (|Age - 25|), achieving nearly monotonic behavior in the boxes’ medians. Images by author.

As the left chart illustrates, players’ values generally grow steadily until the age of 25. After that, the median value is noisier and much more volatile. The right figure shows how a simple transformation, such as the distance from the age of 25, can result in much more consistent behavior that we’ll later be able to fit into a linear equation.
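The transformation from Figure 1 is small enough to write down directly. Here is a minimal sketch; the function name and the sample ages are illustrative and not taken from the project code.

```python
PEAK_AGE = 25  # the reference age used in Figure 1


def transformed_age(age: int, peak_age: int = PEAK_AGE) -> int:
    """Distance, in years, between a player's age and the reference age."""
    return abs(age - peak_age)


ages = [17, 21, 25, 29, 34]
print([transformed_age(a) for a in ages])  # [8, 4, 0, 4, 9]
```

Note that the transformation treats a 21-year-old and a 29-year-old identically, which is exactly what lets a single linear term capture both the rise and the decline.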

Macro trends & market inflation

This is the collective growth of players’ prices over time. One reason for this inflation is the huge increase in clubs’ revenues over the last decade. For more information about market inflation, you may want to review the excellent Medium article by Sam Lyer Sequeira.

Figure 2: Average player FIFA value across years (15 represents 2015). It’s clearly visible that the average price grew consistently over this period, suggesting inflation in FIFA player value over the last decade. Images by author.

As demonstrated above, the entire market is susceptible to macro trends. The global average is our best guess for any given player without further information about him.
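To make the "global average" baseline concrete, here is a rough sketch of how a per-year average could be computed. The records below are made-up numbers for illustration only, not values from the FIFA dataset.

```python
from collections import defaultdict
from statistics import mean


def yearly_average(records):
    """Map each year to the mean value of all players listed that year."""
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Made-up (year, value in millions of euros) pairs, for illustration only.
records = [(2015, 1.0), (2015, 3.0), (2016, 2.0), (2016, 4.0), (2017, 6.0)]
averages = yearly_average(records)

# Ratio between the last and first yearly averages: a crude inflation measure.
growth = averages[2017] / averages[2015]
print(averages, growth)
```

With no other information about a player, the average for his year is the naive prediction that every other factor then adjusts.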

Global sports events

Global events such as the World Cup can also initiate shifts in players’ valuation. A nation that wins the World Cup may attract more interest from scouts or enhance the overall public perception of the country as producing high-quality talent. The Spanish golden decade, which started in 2008, is clearly reflected in the values of both Spanish players overall (Figure 3, left) and La Liga (Figure 3, right) since 2015. Spanish players were worth about double the global average, and La Liga surpassed the overall inflation growth rate, which almost tripled over that period.

However, Spain performed at the top consistently for multiple years, so the short-term effect of a single title is not visible in this dataset. To learn about the more immediate impact on FIFA valuation, we can inspect the dynamics of German and French players and leagues after winning the 2014 and 2018 World Cup titles. Interestingly, the increase in German players’ value reversed three years after the national achievement and has never returned to its peak. Considering recent transfers such as Enzo Fernandez, the above pattern may hold for Argentina, the latest winner, as well.

But it seems that benefiting from worldwide events isn’t limited to the winners. Russia, for example, experienced dramatic growth right after hosting the 2018 event. Another future example may be Morocco, judging by the latest World Cup tournament.

Figure 3: Average nationalities’ price over time, with global avg baseline (left); Average leagues’ price over time, with global average baseline (right). Images by author.

Unrelated to the current section, but hard not to mention, is the meteoric growth in the average value of the English Premier League over the last decade. What seemed like a tight competition with La Liga in 2015 became a complete knockout in 2020, when Covid hit and clubs started to struggle with the loss of income, together with La Liga’s strict fair play regulations.

Rating and potential

Here we shift our focus to the nature of the relationship between skills and value, starting with the overall rating. Naturally, overall rating and value do correlate, but the more important question is how they interact: How much does a player’s value increase per rating point? What does the distribution look like? Let’s find out.

Figure 4: Overall rating — FIFA value distribution (left); Potential rating — FIFA value distribution (right). Images by author.

As expected, rating and potential are highly correlated with value. However:

  • You don’t pay linearly for rating. As ratings get higher, price growth outpaces linear behavior.
  • While the entire price distribution shifts up as the score rises, so does the variance. That is a strong indication that rating is not everything. It’s likely that other confounding factors associated with these ratings, such as age (Figure 1), position, nationality, or league (soon to be covered), can explain the observed variance. That’s the moment our friend “correlation is not causation” springs to mind.

Skills

This is where we want to quantify how different attributes contribute to the market value. As we did in the previous chapter, we can distinguish between attackers and defenders. The simplest thing to start with is the correlation of these abilities with price. Since we don’t expect any linear behavior here (Figure 4), we will use Spearman’s correlation as our measure. The following table shows the top attributes in terms of the Spearman correlation to FIFA players’ valuation, divided into attackers and defenders.

Defenders                                  Attackers
Skill                         Spearman     Skill                     Spearman
defending_standing_tackle     0.792        skill_ball_control        0.841
defending_sliding_tackle      0.760        skill_dribbling           0.774
mentality_interceptions       0.746        movement_reactions        0.770
defending_marking_awareness   0.721        mentality_positioning     0.751
movement_reactions            0.721        attacking_short_passing   0.704
attacking_short_passing       0.628        attacking_finishing       0.703
mentality_composure           0.587        mentality_composure       0.685
skill_ball_control            0.565        power_long_shots          0.672
skill_long_passing            0.526        power_shot_power          0.668
attacking_heading_accuracy    0.486        attacking_volleys         0.635
mentality_aggression          0.480        mentality_vision          0.597

Judging solely from the above table, the eye-catching skills of dribbling and ball control have the highest correlation to attackers’ valuation. The shooting attributes collection on average comes in second, and passing is third.

Similarly, skills related to direct defending actions, namely tackling and intercepting, are the most correlated with defenders’ price. Off-ball defensive capabilities, such as marking and reactions, follow next. Unsurprisingly, peripheral skills such as passing and heading ability provide a weaker signal.
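For readers who want to see why Spearman suits a non-linear relationship, here is a standard-library sketch: Spearman's correlation is simply Pearson's correlation computed on ranks. The helper names and the sample numbers are illustrative assumptions; in practice a library routine such as scipy.stats.spearmanr would be the usual choice.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up attackers: value grows much faster than linearly with ball control.
ball_control = [60, 70, 75, 80, 90]
value_millions = [0.5, 1.2, 3.0, 9.0, 60.0]
rho = spearman(ball_control, value_millions)
```

On this toy sample the relationship is perfectly monotonic, so the Spearman coefficient reaches 1 while the Pearson coefficient on the raw numbers stays below it.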

It is only rational that the stronger the background a player comes from, the higher the belief in his likelihood to succeed. The random Argentinian youngster sub from River Plate is likely to be more successful, on average, than a similar rotation player from the top Estonian first-division club. This belief ultimately translates to market value. After all, it’s all about risk vs expected utility. Acknowledging all of this, we will use different factors to describe a player’s background.

Clubs

Strong clubs have the funds to acquire high-value players, directly affecting their average squad value. However, this is not necessarily a one-way street: players’ value usually increases when moving to top clubs, as such transfers are strong evidence for the market belief in the player.

Figure 5: Selected clubs’ value distribution (left) and their players’ average value by year (right). Images by author.

Leagues

The stronger the league, the higher its players’ value. Of course, once again there are confounding factors, as stronger leagues attract better, more expensive players, and the league’s clubs usually have more funds to invest in such players due to their success in international competitions, higher income from broadcasting rights, etc. Moreover, when footballers prove themselves in a top league, or in a league that has previously shown success in growing talent, they are likely to be of higher value. In light of the tremendous growth of the Premier League (Figure 3), TV rights may have played a significant role here. This source of income is likely to resonate with the league’s popularity around the world, encouraging even more income.

Figure 6: Players’ value distribution by league (left) and by league level (right; lower score means stronger league). Images by author.

Playing position

Some positions are traditionally valued more than others. The reasons for this may be supply and demand (e.g., left-wingers are scarcer) and a bias towards goal creation (CAM vs. CM): traditionally, forwards cost more than defenders.

Figure 7: Players’ value distribution by position, sorted by position — from defending to attacking positions. Image by author.

Nationalities

Here, a few major factors interact:

  1. History: the young talent from Brazil who become top players vastly outnumber the Swiss hot prospects who do. This statistic, combined with the belief that the ability to grow talent is fundamental and does not change easily over time, commonly leads clubs to value such players more highly. Uruguay is a great example of a poor country with a mediocre league that consistently produces world-class talent. In addition, the Brazilian “next promise” striker Endrick is just the most recent evidence of the power of nationality.
  2. League quality: top footballers from leagues like the Brazilian Serie A and Argentine Primera División are likely to be valued much more than, for example, corresponding players from Cyprus or Israel’s premier leagues, as the latter regularly face inferior opponents.
  3. National economy: players who come from poor countries are likely to cost less, both in fee and salary, which often aligns with scouts’ destination priorities, like Africa and South America.

Figure 8: Players’ value distribution by nationality. Image by author.

National team participation

Players who participated in international tournaments are likely to have a higher value, as their performance is at the top for their nation. In addition, such players have experienced matches at high levels. Once again, it is a chicken-and-egg situation, as player selection may also be affected by market worth.

Figure 9: Players’ value distribution by participating in the national team (1 means part of the squad). Image by author.

Years left in the contract

A very basic notion in economics is that when selling or buying an asset, the price should include any future compensation that may be relevant for the seller. For example, if a player has 3 years left in his contract, his team is entitled to compensation for breaking the contract and losing him for those 3 years, as well as for the cost of looking for a replacement.

Alternately, a player who is close to the end of their contract may be considered less valuable than one who has several years remaining, as the former may be more likely to leave the team in the near future.

Simply put, the more years left in a player’s contract, the more other clubs need to pay to break this obligation. It’s interesting to notice (Figure 10) that four-year contracts show a decrease in players’ value. An explanation for that requires further analysis, starting with confounding effects, such as age and number of first-team appearances.

Figure 10: Players’ value distribution by number of years left in the contract. Image by author.

Additional factors can be applied here. Generally speaking, if the differences between a factor’s levels are statistically significant under an ANOVA test (or any other suitable method), the factor can be considered a candidate for the formula. However, if you aren’t able to explain the results and the trends derived from it, keep it out.
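As a rough illustration of the ANOVA idea, the sketch below computes the one-way F statistic by hand: the ratio of between-group variance to within-group variance. The groups and their values are made up, and the function name is my own. Turning F into a p-value requires the F distribution, for which a library routine such as scipy.stats.f_oneway is the practical route.

```python
def one_way_anova_f(groups):
    """F statistic: between-group variance divided by within-group variance."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up values (millions of euros) for three levels of a hypothetical factor.
values_by_position = {
    "CB": [1.0, 2.0, 3.0],
    "CM": [2.0, 3.0, 4.0],
    "ST": [5.0, 6.0, 7.0],
}
f_stat = one_way_anova_f(list(values_by_position.values()))
print(f_stat)
```

A large F means the factor's levels differ by more than the spread within each level would suggest, which is the signal we look for before adding the factor to the formula.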

Now that we’ve acknowledged the various factors affecting the price, it’s time to get our hands a bit messy and face a real challenge. In the next section, we will try to model the way these different components translate into FIFA players’ valuation.

Part 2: FIFA market value modeling

Photo by Jeswin Thomas on Unsplash

Here we aim to model the way FIFA’s market value is set. Specifically, we will learn to decompose a given player’s expected market value based on the various factors we observed, measured entirely from the FIFA dataset.

Baseline design

I believe that every solution to a prediction problem should start with a simple, explainable baseline such as a linear equation. This phase is often neglected, but I think it’s highly beneficial, as we can learn how much of the variance we can explain, and how. Further, it’s during this process that we can determine the missing pieces and what there is to improve when building the primary model.

For this baseline, we will utilize the insights we have gathered so far to define the set of fundamental properties in player valuation. We will use this data to train a linear regression model, creating a simple linear equation for calculating the expected price of a given player. Since factors are not necessarily independent, we expect some of the coefficients to balance each other. These will also serve as explainers for each prediction.
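To show what "training a linear regression" boils down to, here is a dependency-free sketch of ordinary least squares solved through the normal equations. It is not the project's code: the two features, the numbers, and the function names are assumptions chosen for illustration, and a real pipeline would more likely use a library implementation.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    n_coef = len(design[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(n_coef)]
        + [sum(r[i] * t for r, t in zip(design, targets))]
        for i in range(n_coef)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n_coef):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n_coef] / aug[i][i] for i in range(n_coef)]


def predict(coefficients, row):
    """Intercept plus the weighted sum of the features."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], row))


# Made-up players: (rating points above 70, transformed age) -> value in millions.
rows = [(0, 5), (10, 0), (5, 3), (15, 2), (-2, 1)]
targets = [0.5, 6.0, 3.1, 7.4, 0.9]
coefficients = fit_linear(rows, targets)
estimate = predict(coefficients, (8, 1))
```

Because each prediction is a plain sum, every term can be read as that feature's contribution to the price, which is what makes the baseline explainable. Perfectly collinear features would make the system singular, so this sketch assumes they are absent.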

Overall, our linear regression formula will consider the player’s club (which incorporates the league), playing position, age (transformed), overall rating, national team participation, years left in the contract, potential rating, and macro inflation. Each dimension (or feature) will quantify the average deviation, in euros, from the global players’ average price that year.
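One possible reading of "average deviation from the global average that year" is sketched below: each category (a club, a position, and so on) is replaced by the mean gap between its players' values and the yearly global mean. This is my interpretation under stated assumptions, with made-up records and hypothetical field names; the actual encoding in the project may differ.

```python
from collections import defaultdict
from statistics import mean


def deviation_encoding(players, key):
    """Average deviation of each category from its year's global average."""
    yearly = defaultdict(list)
    for p in players:
        yearly[p["year"]].append(p["value"])
    global_avg = {year: mean(vals) for year, vals in yearly.items()}

    deviations = defaultdict(list)
    for p in players:
        deviations[p[key]].append(p["value"] - global_avg[p["year"]])
    return {category: mean(devs) for category, devs in deviations.items()}


# Made-up records; values are in millions of euros.
players = [
    {"year": 2020, "club": "A", "value": 10.0},
    {"year": 2020, "club": "B", "value": 2.0},
    {"year": 2021, "club": "A", "value": 14.0},
    {"year": 2021, "club": "B", "value": 4.0},
]
club_effect = deviation_encoding(players, "club")
print(club_effect)  # {'A': 4.5, 'B': -4.5}
```

Encoding categories this way keeps every feature in the same unit as the target, so the coefficients can be compared with each other directly.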

We will use a random split of 2015–2021 data for training and validation. The validation set aims to see if we can generalize to instances within our trained time frame.

Note: splitting the data randomly when time is a factor may not be the best practice in machine learning. Ideally, we would like to roll over the years and not mix values from different seasons. However, we aren’t using this baseline as a predictive measure, but rather as a tool to quantify how much of the total variance we can explain with the above information, including time. The next episode will deal with predicting future value, and then we’ll consider such data leakage. Moreover, we will also use the 2022 data as an unseen test set, inspecting what may not be generalized before going to a truly predictive model.
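The two splitting strategies discussed in this note can be sketched as follows. The validation share and the seed are assumptions of mine (the post does not state them), and the rows are placeholders.

```python
import random


def random_split(rows, validation_share=0.2, seed=0):
    """Shuffle rows and cut them into train and validation parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_share))
    return shuffled[:cut], shuffled[cut:]


def split_by_year(rows, test_year):
    """Keep earlier seasons for fitting and hold out one later season."""
    earlier = [r for r in rows if r["year"] < test_year]
    held_out = [r for r in rows if r["year"] == test_year]
    return earlier, held_out


# Placeholder rows: two players per season, 2015 through 2022.
rows = [{"year": year, "player": f"p{i}"} for year in range(2015, 2023) for i in range(2)]

in_sample, test_2022 = split_by_year(rows, test_year=2022)
train, validation = random_split(in_sample)
```

The random split mixes seasons inside the 2015 to 2021 window, which is acceptable for measuring explained variance, while the year-based hold-out keeps 2022 entirely unseen.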

Baseline results

Figure 11: Model prediction vs. true FIFA player value. Colors indicate quality and split to five ordinal categories by error rate: <10%, 10–20%, 20–30%, 30–50% and >50% (see legend). Image by author.

We’ve evaluated the model performance in three ways:

  • Fit quality (proportion of variance explained): R^2 score = 0.938
  • Prediction quality score: Mean Absolute Error = 0.54M euros
  • Prediction quality distribution: error rates binned into five ordinal categories (<10%, 10–20%, 20–30%, 30–50% and >50%), matching the color scheme of Figure 11, to give a bigger picture of prediction quality. Note: the denominator in the calculated error rate is the predicted value.
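The three measures above can be sketched in a few lines. The numbers are made up, the function names are my own, and how values exactly on a bin edge are assigned is an assumption, since the post only lists the bin ranges.

```python
def r2_score(actual, predicted):
    """Proportion of variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def error_bin(actual, predicted):
    """Bin the error rate; the denominator is the predicted value."""
    rate = abs(actual - predicted) / abs(predicted)
    if rate < 0.10:
        return "<10%"
    if rate < 0.20:
        return "10-20%"
    if rate < 0.30:
        return "20-30%"
    if rate < 0.50:
        return "30-50%"
    return ">50%"  # assumption: a rate of exactly 50% lands here


# Made-up FIFA values and predictions, in millions of euros.
actual = [10.0, 5.0, 2.0, 1.0]
predicted = [9.5, 6.0, 1.5, 2.5]

r2 = r2_score(actual, predicted)
mae = mean_absolute_error(actual, predicted)
bins = [error_bin(a, p) for a, p in zip(actual, predicted)]
```

Note how the cheapest player in this toy sample has a small absolute error but falls in the worst bin, which is why the binned error rates complement the MAE. A prediction of exactly zero would need special handling, which this sketch omits.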

Model coefficients

Model coefficients (each feature reflects the average deviation from the global average in millions of euros):

Feature name [millions of euros]      Coefficient
Global average                        0.997
Overall rating                        0.717
Potential rating                      0.375
Age (transformed)                     0.848
Playing position                      0.071
National team participation           0.031
Years left in contract                0.043
Club                                  0.005

As the coefficients suggest, the most important input for the model is the global market average, which, as mentioned, is the most naive guess we can make without further information. Then, overall and potential rating, naturally, have the highest impact. At first glance, the impact of club level is not particularly significant, even when nationality was abandoned (as the model ignored it completely). That said, club fluctuations were quite high, so overall, this was balanced. To benefit from all features, as well as from their interactions, we will need a stronger model.

Digging deeper

Figure 12: Model error value histograms (left) and error ratio (error value / predicted value) histograms (right). Both figures were trimmed due to long tails that distort the view of the distribution (fewer than 1% of examples). Image by author.

Going over the error distribution (Figure 12), we can see that their absolute values are quite low. However, when describing errors with percentages, the variance is much higher. This is a clue that our modeling capabilities are aligned with the price magnitude — the lower the player value, the noisier the results. To examine this thought further, we can have a mini drill down to see how errors and metrics look for players with low market value, and for those with at least 5M valuation. The following table presents the quality bins we defined before (as MAE will be biased here by price) under and over 5M euros threshold. The error rate presented in the table is simply the error itself divided by the prediction value.

             Overall            Under 5M           Over 5M
Error bin    Count    %         Count    %         Count    %
<10%         4,784    22.2%     3,581    19.0%     1,203    46.0%
10–20%       3,775    17.6%     2,784    14.8%     991      37.9%
20–30%       2,500    11.6%     2,251    11.9%     249      9.5%
30–50%       3,522    16.4%     3,433    18.2%     89       3.4%
>50%         6,907    32.1%     6,823    36.1%     84       3.2%

It’s clear that results for players valued over 5M are significantly better than for those valued at less than 5M. Figure 13 splits the findings of Figure 11 into two separate populations, as in the table above. The coloring makes it evident: this approach shines at higher values.
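A breakdown like the one in the table can be produced roughly as follows. The pairs are made up, and splitting on the FIFA value rather than on the prediction is my assumption about how the two populations were defined.

```python
from collections import Counter

BIN_EDGES = [(0.10, "<10%"), (0.20, "10-20%"), (0.30, "20-30%"), (0.50, "30-50%")]
THRESHOLD = 5.0  # millions of euros


def error_bin(actual, predicted):
    """Bin the error rate, using the predicted value as the denominator."""
    rate = abs(actual - predicted) / abs(predicted)
    for upper, label in BIN_EDGES:
        if rate < upper:
            return label
    return ">50%"


def bin_shares(pairs):
    """Share of (actual, predicted) pairs that fall into each error bin."""
    counts = Counter(error_bin(a, p) for a, p in pairs)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up (FIFA value, prediction) pairs, in millions of euros.
pairs = [(0.4, 1.0), (1.0, 1.05), (20.0, 21.0), (60.0, 52.0)]
under = [pair for pair in pairs if pair[0] < THRESHOLD]
over = [pair for pair in pairs if pair[0] >= THRESHOLD]

under_shares = bin_shares(under)
over_shares = bin_shares(over)
```

Reporting shares per population rather than a single MAE avoids the bias towards expensive players that an absolute error metric would introduce.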

Figure 13: Model prediction vs. true FIFA player value: for players valued under 5M Euros (left), and valued over 5M Euros (right). Image by author.

Testing on 2022 data

Now we’ll challenge the model with data from an unseen period — 2022. To this end, the model uses the factors from 2021 and the inputs from 2022. Only the global average of 2022 was used. Alternatively, we can try to predict it using historical data, but that’s too much for a baseline (and this post).

  • Fit quality: R^2 score = 0.91
  • Prediction quality score: MAE = 0.83M euros

Figure 14: Model prediction vs. true FIFA player value. Colors indicate prediction quality (see legend). Image by author.

Naturally, the numbers are worse: unless your model underfits the problem, this is expected behavior when dealing with data and timeframes the model has never seen before. Surprisingly, performance is still strong. In addition, the pattern of under/over the 5M threshold seems to hold (see table below). Interestingly, over 5M, over 50% of players’ values are predicted within a 10% error rate. That’s actually better than the performance on the validation set for the same sub-population.

             Overall            Under 5M           Over 5M
Error bin    Count    %         Count    %         Count    %
<10%         2,811    23.0%     3,581    19.0%     1,203    46.0%
10–20%       2,465    20.2%     2,784    14.8%     991      37.9%
20–30%       1,775    14.5%     2,251    11.9%     249      9.5%
30–50%       2,156    17.6%     3,433    18.2%     89       3.4%
>50%         2,968    24.7%     6,823    36.1%     84       3.2%

Interpreting the results (and disclaimers)

When reviewing results manually, we can speculate about different reasons for the model errors, which show the deviation from the expected value for the player input. What does this distance from the expected price stand for? Does a higher-than-expected result mean the player may be overvalued?

  • First, don’t jump to conclusions — saying a player was under-valued by the model does not mean he is “expensive.” That’s just one interpretation. Any model, at best, is a proxy for the label, which is, at best, correlated to the real market value of players. So as in the previous post, we don’t focus on the absolute values. We don’t pay attention to the coefficients themselves, but rather their relative magnitude, their signs, interactions, and proportion of impact.
  • FIFA player valuation is an estimate in itself. Therefore, we’re carrying its bias along in both our model training and prediction. One can totally disagree with FIFA’s way of valuation.
  • As with stocks, we may believe that some companies deserve higher prices than similar ones. It may be due to a lack of competition in the industry, a strategic advantage the company possesses, or any other reason that makes us believe in a higher ROI. After all, it’s a free market, driven by supply and demand. Whether this belief is true, only time will tell.
  • Sometimes, errors are just errors :)

What can errors teach us?

Here we will briefly examine the top cases by error value and error rate (on both the validation and test sets), exhibiting the most extreme cases of under- or over-valuation. Finally, we will examine some success stories, where our model did a good job, with small error values.

Biggest underestimates

Player name              Year    Prediction    FIFA value    Error     Error (%)
Fabinho (Liverpool)      2022    €63.6M        €73.5M        €9.9M     15.5%
Dwight McNeil            2021    €21.1M        €31.0M        €9.9M     46.9%
Eden Hazard              2017    €52.2M        €62.0M        €9.8M     18.8%
Mauro Emanuel Icardi     2021    €53.8M        €63.0M        €9.2M     17.1%
Gianluigi Donnarumma     2022    €110.3M       €119.5M       €9.2M     8.3%
Luka Jović               2021    €25.7M        €33.0M        €7.3M     28.4%

Some of the errors are quite interesting, such as Eden Hazard. This is a great error to learn from, as FIFA really did better. In 2017, Eden Hazard was two years away from his peak at Chelsea, when he won the Player of the Season Award for 2018/2019. However, in the model’s defense, in many cases it didn’t take FIFA long to reduce these valuations quite drastically. I don’t claim the model can see the future; it merely learned the expected behavior from the data and features it has. It does not have any personal knowledge, so it cannot be blinded by hype.

Biggest overestimates

Player name              Year    Prediction    FIFA value    Error      Error (%)
Hugo Lloris              2022    €70.2M        €13.5M        -€56.7M    80.8%
Manuel Peter Neuer       2021    €68.0M        €17.5M        -€50.5M    74.2%
Ezequiel Omar Barco      2019    €17.9M        €12.0M        -€5.9M     32.9%
Matthijs de Ligt         2018    €18.0M        €11.5M        -€6.5M     36.1%
Rodrygo (Real Madrid)    2021    €45.5M        €38.0M        -€7.5M     16.5%
Philip Foden             2020    €24.3M        €16.5M        -€7.8M     32%
Joško Gvardiol           2022    €20.3M        €12.5M        -€7.8M     38.4%
D. Emiliano Martínez     2022    €41.4M        €33.5M        -€7.9M     19.1%

Learning from the table, we can identify two main cases of over-estimates:

1. Veteran goalkeepers who were able to maintain their overall rating (Lloris, Neuer).

2. Young talent with high potential from strong clubs:

  • Matthijs de Ligt, who was rated 76 overall at the age of 18, with a highly promising potential of 89. With him playing for Ajax and having 3 years left on his contract, the model saw his promise.
  • Similarly, we can find Rodrygo, Foden (2020), Gvardiol and Emiliano Martínez (before the World Cup).
  • Ezequiel Omar Barco, who was predicted to be valued at €17.9M by the model (age 19), almost €6M over the FIFA price. In 2022, FIFA priced him at €15M, and at €12.5M in 2023.

These examples demonstrate that sometimes errors are a great way to get insights and find new research leads. Further, investigating these cases over time, we can really witness how the market can fluctuate, change and evolve.

Success stories

Now we’ll wrap up with some examples that show what very precise predictions look like, without anything smart to say about them:

Player name               Year    FIFA value    Prediction    Error     Error (%)
Marc-André ter Stegen     2022    €99.0M        €101.6M       -€2.6M    2.6%
Marquinhos (PSG)          2022    €82.1M        €90.5M        €8.4M     9.2%
Eden Hazard               2020    €89.2M        €90.0M        €0.8M     <1%
Patrick Van Aanholt       2021    €9.6M         €9.5M         -€0.1M    1%
Roque Mesa Quevedo        2017    €9.7M         €9.5M         -€0.2M    2.1%
Gaël Dimitri Clichy       2015    €8.9M         €9.5M         €0.6M     6.3%
Randal Kolo Muani         2022    €8.7M         €9.5M         €0.8M     8.4%

What’s next?

Sometimes, it’s easy to get swept up in the allure of building the most complex and sophisticated model imaginable. However, as we’ve learned in this post, even a simple baseline model can yield powerful insights. By beginning with a basic linear regression, we were able to pinpoint the critical factors that impact a player’s market value in FIFA, along with their relative importance. So, for now, we’ll stick with our humble approach and avoid any fancy modeling techniques.

That said, there’s no need to abandon our passion for advanced modeling altogether; we will simply save it for a more challenging task. In the next post, we’ll strive to predict players’ rating trends in the upcoming year based on their past performance. Achieving this goal will require us to employ more advanced modeling tools.

Summary

In this second part of the EA-FIFA dataset series, we’ve focused on players’ valuation. Specifically, we investigated how FIFA’s market value is affected by all sorts of factors, and then we tried to model it as an interpretable linear formula. The code for this post, as well as for the previous one, is available through the project GitHub repo.

As mentioned before, this is not a model that predicts the true player value, but rather the value associated with the player within the FIFA framework. The FIFA player price may be quite different from other valuations, such as transfermarkt.com. Further, actual transfer fees are not obligated to match any valuation proposal and are determined by the forces of the market, agents, circumstances, and more.


Written by Ofir Magdaci

ML engineer @Meta, M.Sc. in Industrial Engineering at Tel-Aviv University. Writing about whatever feels interesting, intriguing and fun. www.Magdaci.coe
