Truth or myth: October is the month of the aces of football

Gbfernandes
10 min read · Oct 21, 2022

In Brazil, there is a topic of discussion that comes and goes from time to time: October is the month of birth for all-time top footballers. TV sports shows bring this topic back occasionally and even the most popular football magazine in the country, Placar, has an article from October 2021 about this (in PT-BR). Placar editors showcase their evidence for this theory:

  1. Pelé (dob 23rd Oct 1940): no introduction required, he is the only footballer to have won 3 World Cups as a player.
  2. Maradona (dob 30th Oct 1960): known for his impressive skills, he scored some of the most beautiful and controversial goals in history.
  3. Garrincha (dob 28th Oct 1933): Brazil’s angel with the crooked legs was known for his legendary dribbling skills.
  4. Lev Yashin (dob 22nd Oct 1929): nicknamed the Black Spider for the colour of his USSR uniform, he is, to this day, the only goalkeeper to win the Ballon d’Or (1963). He is a certain pick in any all-time dream team.
  5. George Weah (dob 1st Oct 1966): the only African footballer to win the Ballon d’Or (1995) was known for his ability to be in the right spot at the right time and score the most unlikely goals.
  6. Bobby Charlton (dob 11th Oct 1937): the best English player of all time was an ambidextrous midfielder with great flair and precise passing.
  7. Marco van Basten (dob 31st Oct 1964): his bicycle-kick goals are works of art that feature in any list of the most beautiful goals ever.

If you click on any player’s name, you can watch short YouTube videos of these players’ many marvelous moments. Videos and concrete examples are great storytelling resources that give shape and colour to an argument, but could this be a case of confirmation bias?

The fact is that the 7 players listed above are among the best footballers in history. Pelé and Maradona are certain picks for all-time best, and one thing they all have in common is being born in October. The real question we want to answer is: is this an example of confirmation bias, or is there an actual correlation between month of birth and football skills?

Gathering data

In this post, we would like to bring some data to bear on this theory. Although there is no database with information on all professional footballers across time and comparable labels for their skills, EA Sports has created FIFA, a multiplatform video game that includes the main leagues from several countries. The data can be found on Kaggle’s website and provides the date of birth, nationality, position, overall rating, and much more.

This data might not include Dardo Urchevik, a player born in October who played at Argentinos Juniors in the same period as Maradona, but no one would put him on any “all-time best” list. In other words, historical databases might miss out on average and not-so-great players. FIFA 22, on the other hand, aggregates nearly all professional players in the selected leagues and divisions. This means that by analysing this data we are allowing evidence against our theory to arise.

The FIFA 22 DB has 19,238 players of 163 nationalities playing for 702 clubs in 56 different leagues. If you google the number of professional footballers in the world, you find about 129 thousand players, so this DB covers roughly 15% of them. Although we might not be considering Iceland’s 110-year-old first division, the most famous leagues in the world should be enough for our comparison.

Confirmation bias

This concept, in simple terms, describes the human tendency, once a theory is formed, to notice all the evidence that confirms it and neglect all that contradicts it. In our case, because we start off with a list of 7 top players born in October, we don’t pay enough attention to the fact that Messi, Cristiano Ronaldo, Franz Beckenbauer, Zidane, Johan Cruijff, Ronaldo, Zico, and many others were not born in this month and yet are among the greatest players ever.

What do professional players in FIFA look like?

First, if births were spread evenly across the year, one would expect about 8% of all players, and of all top players, to be born in October (1 out of 12). But are they? Youth teams divide children and teenagers into age groups composed of those born in a given year, so the Under-8 team for Marlow FC is formed by children born in 2015. In youth teams, being a few months older can make a lot of difference, so it is possible that children born toward the end of the year are less likely to keep playing until the professional level. This is known as the relative age effect. FIFA data corroborates this, as seen in the following chart.

Distribution of the month of birth for all FIFA 22 players
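As a rough illustration of how such a distribution can be computed, here is a minimal sketch using only the Python standard library. The function name `birth_month_shares` and the toy dates are hypothetical; the sketch assumes dates of birth are available as ISO strings (YYYY-MM-DD), which may not match the exact format of the Kaggle file.

```python
from collections import Counter
from datetime import date


def birth_month_shares(dobs):
    """Return {month: share of players born in that month} for ISO date strings."""
    months = [date.fromisoformat(d).month for d in dobs]
    counts = Counter(months)
    total = len(months)
    # Include months with no births so the result always has 12 entries.
    return {m: counts.get(m, 0) / total for m in range(1, 13)}


# Hypothetical toy sample; the real input would be the date-of-birth column
# of the FIFA 22 data (assumed here to be ISO-formatted).
sample_dobs = [
    "1987-06-24", "1985-02-05", "1998-12-20", "1992-02-05",
    "1991-06-28", "1988-08-21", "1992-10-02", "1999-01-15",
]

shares = birth_month_shares(sample_dobs)
for month, share in shares.items():
    # Compare each month's share with the uniform 1-in-12 baseline.
    print(f"{month:2d}: {share:6.1%}  (uniform baseline: {1 / 12:.1%})")
```

A relative age effect would show up as shares above the baseline in the early months and below it in the later ones.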

When we analyse data with the aim of explaining or estimating the effect of a factor (month of birth in our case), it is important to control for all other factors that might also correlate with the one we are measuring. The player’s position is an obvious one. Out of the 7 players mentioned first, we have 3 strikers (Pelé, Weah, and Van Basten), 2 attacking midfielders (Maradona and Charlton), 1 right winger (Garrincha), and 1 goalkeeper (Yashin). There is a noticeable tendency for attacking players to appear on these top-player lists. The same is true for nationality. If I google “best football players of all time”, the scrolling list at the top is the one below.

Google results for best football players of all time

Out of the 16 players on this list, there are 3 Argentinians, 3 Brazilians, 2 Portuguese, 2 French, and 1 player each from England, Germany, Netherlands, Hungary, and USSR. Although Google might be adjusting my results based on my search history, this effect can be observed in FIFA data as well. Here is the distribution of players for the top 50 nationalities.

Distribution of players on top 50 nationalities

England appears as the most frequent nationality because FIFA includes 4 full leagues with 20–24 teams each for this country, while other countries might have a single division. Given that, Argentina’s case is interesting: it ranks 5th in the number of players, but FIFA only includes one Argentinian league, with 26 teams. That means a large number of Argentinian players play abroad.

What is a top player?

Defining a top player based on some quantitative criteria is arbitrary, which sounds contradictory, but is the only way for us to move forward with the 19k+ players in FIFA data. EA has created an overall rating for each player in the game. This takes into consideration all the attributes such as stamina, skills, shooting, speed,… (do you see a pattern here as well? 👀).

It is reasonable enough to think that a top player is among the 1% with the highest FIFA overall rating. That leads us to say that a top player has an overall rating of 83 or higher. This includes Messi (93), Lewandowski (92), Cristiano Ronaldo, Neymar, Mbappé and De Bruyne (all 91), but also includes Fernandinho, Vela, Witsel, Azpilicueta (all 83). In 20 or 30 years’ time, the first 6 might be remembered among the all-time best players (Messi and C. Ronaldo certainly will), but the last 4 are unlikely to be, especially given they are already over 30 y.o.
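The cutoff of 83 can be read as a percentile threshold. The sketch below shows one way to derive such a cutoff from a list of overall ratings. The function name `top_share_cutoff` and the toy ratings are hypothetical, and the original analysis may have computed the threshold differently.

```python
import math


def top_share_cutoff(ratings, share=0.01):
    """Return the lowest rating that still belongs to the top `share` of players."""
    ranked = sorted(ratings, reverse=True)
    # Number of players that make the cut (at least one).
    k = max(1, math.ceil(len(ranked) * share))
    return ranked[k - 1]


# Hypothetical toy ratings: 200 players, so the top 1% is 2 players.
ratings = [60] * 150 + [70] * 40 + [80] * 8 + [83, 91]

cutoff = top_share_cutoff(ratings, share=0.01)
# Flag every player at or above the cutoff as a "top player".
is_top = [r >= cutoff for r in ratings]
print(cutoff, sum(is_top))
```

Because many players share the same rating, flagging everyone at or above the cutoff can select slightly more than 1% of the players.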

Back to our hypothesis, we can see that players born in October are marginally less likely to be among the top players: 0.985% vs 1.037%. However, will that still be true if we take into consideration other factors such as age, nationality, and position?
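A comparison like 0.985% vs 1.037% comes down to two simple rates. Here is a minimal sketch with a hypothetical helper `top_rate_by_group` and invented toy flags; it is not the original analysis code.

```python
def top_rate_by_group(is_october, is_top):
    """Return (top-player rate among October-born, rate among everyone else)."""
    totals = {True: 0, False: 0}
    tops = {True: 0, False: 0}
    for october, top in zip(is_october, is_top):
        totals[bool(october)] += 1
        tops[bool(october)] += int(top)
    return tops[True] / totals[True], tops[False] / totals[False]


# Hypothetical toy flags: 4 October-born players (1 top), 10 others (2 top).
is_october = [1] * 4 + [0] * 10
is_top = [1, 0, 0, 0] + [1, 1] + [0] * 8

october_rate, other_rate = top_rate_by_group(is_october, is_top)
print(f"October: {october_rate:.1%}  other months: {other_rate:.1%}")
```

On its own, a raw gap like this says nothing about age, nationality, or position, which is why those factors are considered next.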

What else can be a predictor for a top player?

The model we use here to estimate the joint effect of month of birth and other features (nationality, position, and age) is logistic regression. First, let’s take a look at the other three features of the model.

Other features in the model

The three features are clearly correlated with the target:

  • Age: One would expect a player’s peak to happen around 27 y.o. We can observe that in FIFA data, but there is also a sudden increase in the likelihood of being a top player in the 31–33 and 33+ age groups. This might be the effect of two things: a shrinking pool, as older players with lower overall ratings retire, or a memory effect, where what a player has done in recent years keeps the overall rating high even if the player is no longer at their peak. We can see a reduction of 10–15% in the number of players over 30 y.o., which corroborates the theory of early retirement of the average player.
  • Position: Clearly there is a strong correlation here. Being a CF increases the likelihood of being a top player by far.
  • Nationality: A curious effect can be observed on the third chart: countries with a low number of players (x-axis) have either a very high chance of having a top player or a zero chance. Armenia (the highest dot on the chart) is not a breeding ground for football stars. In fact, there are only 7 players from that country in FIFA and one of them, Mkhitaryan, who currently plays for Inter Milan, is among the top players. When EA Sports creates the list of players that will be part of FIFA, they create practically all players for a few selected leagues and teams. Countries that don’t have a complete league in FIFA are therefore biased toward players who were good enough to play abroad. To minimize the effect of this cherry-picking of players per country, we will create a one-hot encoding feature for each country with over 600 players. There is a significant drop between Brazil (897) and Japan (546), the 6th and 7th countries with the most players in FIFA 22, hence the threshold of 600.
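The nationality encoding described in the last bullet can be sketched as follows. Only the name `feat_nationality_other_countries` appears in this post; the helper `nationality_features` and the per-country column names are assumptions for illustration.

```python
from collections import Counter


def nationality_features(nationalities, min_players=600):
    """One-hot encode countries with more than `min_players` players.

    All remaining countries share a single 'other_countries' column.
    """
    counts = Counter(nationalities)
    kept = sorted(n for n, c in counts.items() if c > min_players)
    other = "feat_nationality_other_countries"
    columns = [f"feat_nationality_{n}" for n in kept] + [other]
    rows = []
    for n in nationalities:
        row = {col: 0 for col in columns}
        row[f"feat_nationality_{n}" if n in kept else other] = 1
        rows.append(row)
    return columns, rows


# Hypothetical toy sample with a threshold of 2 instead of 600.
players = ["England"] * 3 + ["Brazil"] * 3 + ["Armenia"]
columns, rows = nationality_features(players, min_players=2)
print(columns)
print(rows[-1])  # the Armenian player falls into the shared column
```

Grouping the small countries into one column keeps a single cherry-picked star from dominating the estimate for a country with only a handful of players.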

After all, is October a special month?

There is no evidence in the FIFA 22 data set that October is anything special. The chart below shows the coefficient estimates (or feature weights, if you prefer) for that model. Clearly, being a CF or an LW is a strong predictor of being a top player, and age also plays a role. Being Argentinian is associated with a lower chance of being a top player, even compared with countries with less tradition in football (feat_nationality_other_countries).

First model: October effect alone
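To make the idea of coefficient estimates concrete, here is a dependency-free sketch of logistic regression fitted by plain gradient descent. It is not the code behind the chart: the function `fit_logistic`, its learning rate and epoch count, and the toy data are all assumptions, and in practice one would normally use an established statistics or machine learning library.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit logistic regression by batch gradient descent; return (bias, weights)."""
    n, d = len(X), len(X[0])
    bias, weights = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for row, target in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, row))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "top"
            err = p - target
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        # Step against the average gradient of the log-loss.
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


# Hypothetical toy data: (is_october, is_forward, players, top players).
# Forwards are "top" 50% of the time, others 10%, regardless of birth month.
groups = [(0, 0, 40, 4), (1, 0, 10, 1), (0, 1, 40, 20), (1, 1, 10, 5)]
X, y = [], []
for is_october, is_forward, players, top_players in groups:
    X += [[is_october, is_forward]] * players
    y += [1] * top_players + [0] * (players - top_players)

bias, (w_october, w_forward) = fit_logistic(X, y)
print(f"october: {w_october:+.3f}  forward: {w_forward:+.3f}")
```

In this toy sample the October weight ends up close to zero while the forward weight is clearly positive, because the toy data was built so that only position matters.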

One theory, however, is suggestive: players born toward the end of the year have to endure tougher competition in youth teams. One way to capture this relative age effect is to add the month number to the model. The chart below shows that although there is a small effect of the month of birth, it is not strong in the presence of the other features.

Second model: month of birth effect

Conclusion

In this post, we have proposed a few hypotheses and used FIFA 22 data to back them or refute them. Here is a brief summary of our findings.

  1. October theory is a myth: or at least FIFA data does not provide evidence that there is something special about being born in this month. Therefore, the list compiled by Placar is more likely to be an example of confirmation bias.
  2. Relative age effect: the way youth teams divide children and teenagers biases their chances of surviving through youth teams until they become professional footballers. This was observed in the distribution of players per month of birth. However, there isn’t much evidence that this holds true for the top 1% of players.
  3. Early retirement of the average player: age was one of the most relevant features in predicting whether a player is among the top 1%. This is driven by the reduction in the number of players with a lower overall rating after the age of 30. So it doesn’t mean that there are more top 1% players aged 30+, but that there are fewer players with a lower overall rating.
  4. Cherry-picking players per country: some countries might initially look like a fertile breeding ground for top players, but this is an effect of the bias that FIFA does not include all leagues and teams in the world. Countries like Armenia have only a handful of players created, and Armenian players with a lower overall rating are more likely to be left out of the game.
  5. Forwards are the best: Perhaps because football is a low-scoring game (even the World Cup has had a final that ended 0–0 in regular time), the goal is so important. In Brazil, there has been a TV show since the 80s called “Goal: the greatest moment of the sport”, emphasizing this. Hence it is natural that the best player a team has, from youth to professional teams, will play in an attacking position.

The full data analysis can be found in this GitHub repository.
