European Soccer Data Analysis

Felipe Mahlmeister
fmeister23-en
Published in
6 min read · Dec 4, 2019

Is Messi’s Barcelona the greatest of all time? Does the team that scores the most goals win the championship?

We’ll try to answer these questions, among others, by analyzing a European Soccer Database with more than 25,000 matches from 2008 to 2016.

Besides the team names and goals, the match data contains all the players, their positions, their stronger foot, their height and weight, and much more.

In particular, we’re interested in finding trends among the successful teams and how they differ from the mid-table and unsuccessful teams.

We will treat this theme as an explanatory analysis. If you are interested in going deeper into the code, feel free to check out my GitHub page.

Data Structure

Like most complex datasets, this database has many separate tables, which can be joined to model a relational data structure.

Entity-Relationship Diagram (ERD) of the dataset

We’ll focus on the main features of this dataset to answer our questions about the teams. Even though the dataset has a lot of information, we only needed part of it, so we joined the relevant tables into one large dataframe.
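To make the idea of joining tables concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, column names, and rows are hypothetical and only illustrate the pattern; the real database has its own schema, and the actual code on GitHub may do this differently.

```python
import sqlite3

# Hypothetical miniature schema: table names, column names, and rows are
# assumptions for illustration, not the real dataset's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT);
    CREATE TABLE matches (
        match_id INTEGER PRIMARY KEY,
        season TEXT,
        home_team_id INTEGER,
        away_team_id INTEGER,
        home_goals INTEGER,
        away_goals INTEGER
    );
    INSERT INTO team VALUES (1, 'Team A'), (2, 'Team B');
    INSERT INTO matches VALUES
        (10, '2014/2015', 1, 2, 3, 1),
        (11, '2014/2015', 2, 1, 0, 0);
""")

# Join the team table twice, once for the home side and once for the away
# side, so each match row carries readable team names.
rows = conn.execute("""
    SELECT m.season, h.team_name, a.team_name, m.home_goals, m.away_goals
    FROM matches AS m
    JOIN team AS h ON h.team_id = m.home_team_id
    JOIN team AS a ON a.team_id = m.away_team_id
    ORDER BY m.match_id
""").fetchall()

for row in rows:
    print(row)
```

The result is one flat row per match, which is the shape we want before loading everything into a single dataframe.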

After some cleaning and deduplication, we have a consolidated dataframe. Let’s move on.

Explanatory Data Analysis

To begin our journey, why don’t we look at the big picture first and go deeper whenever a subject catches our interest? Let’s start with the following question:

Which season had the most matches?

The 2014/2015 season was the one with the most matches. Let’s check which team had the most wins in this season.
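Counting matches per season is a simple frequency count. Here is a minimal sketch, assuming we have one season label per match; the sample list is made up for illustration and is not the real data.

```python
from collections import Counter

# Hypothetical sample: one season label per match row (not the real data).
match_seasons = [
    "2013/2014",
    "2014/2015",
    "2014/2015",
    "2015/2016",
    "2014/2015",
]

# Count how many matches fall in each season, then take the largest count.
matches_per_season = Counter(match_seasons)
busiest_season, n_matches = matches_per_season.most_common(1)[0]

print(busiest_season, n_matches)
```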

What is the best European team of 2014/2015?

The easiest way to pick the best teams is to sum their championship points, which are calculated by assigning 3 points for a victory, 1 point for a tie, and 0 points for a loss. Let’s take a look at the top 10 European teams of the 2014/2015 season.

Top 10 European teams

Summing it all up, Barcelona was the best team of the 2014/2015 season with 30 victories, 4 ties, and 4 losses, finishing with an exceptional 94 championship points!
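The points rule is easy to check by hand. Here is a small sketch of it; the function name is just an illustration, and the only inputs are Barcelona’s record quoted above.

```python
def championship_points(wins: int, ties: int, losses: int) -> int:
    """Return league points under the 3/1/0 rule described in the text."""
    return 3 * wins + 1 * ties + 0 * losses


# Barcelona's 2014/2015 record as quoted in the article.
barcelona_points = championship_points(wins=30, ties=4, losses=4)
print(barcelona_points)  # 3*30 + 1*4 + 0*4 = 94
```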

So, we’ve already discovered the best teams of this season, but what really matters when it comes to winning a championship and being the best team? Is it the number of goals you score? Is it having a strategic balance between attack and defense?
Let’s first go deeper into the first question I pointed out:

Does the team that scores the most goals win the championship?

Every team aims to win the championship, but is it correct to say that there’s a direct relationship between being the champion and being the top-scoring team?

Looking at the graph, we can see a clear relationship between finishing first in the championship and scoring the most goals: in 9 out of 11 leagues, the team with the most goals ended the championship in first place.
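The comparison behind this graph can be sketched as follows: for each league, find the team with the most points and the team with the most goals, and check whether they are the same team. The leagues, teams, and numbers below are invented for illustration, and the sketch ignores tie-breaking rules.

```python
# Hypothetical season table: league -> {team: (points, goals_scored)}.
# All names and numbers are made up for illustration.
season_table = {
    "League X": {"Team A": (90, 100), "Team B": (85, 95)},
    "League Y": {"Team C": (80, 70), "Team D": (78, 75)},
}


def champion_is_top_scorer(teams: dict) -> bool:
    """True if the team with the most points also scored the most goals."""
    champion = max(teams, key=lambda name: teams[name][0])
    top_scorer = max(teams, key=lambda name: teams[name][1])
    return champion == top_scorer


result_by_league = {
    league: champion_is_top_scorer(teams)
    for league, teams in season_table.items()
}

# Share of leagues where the champion was also the top scorer.
share = sum(result_by_league.values()) / len(result_by_league)
print(result_by_league, share)
```

With the real data, the same count would give the 9 out of 11 leagues mentioned above.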

So, if you are a team owner, I highly recommend making your team as offensive as possible in order to win the championship!

But this graph doesn’t show which sector is the most important one to invest in (attackers, midfield, side players, etc.); we would need further analysis to find out.

Which league scored the most goals in 2014/2015?

Going deeper into the topic of goals, we already saw that having an offensive team is important for winning the championship, but it is easier to score in some leagues than in others, right? We can get a taste of the answer by looking at the mean number of goals in each league:
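Here is a minimal sketch of how a mean of goals per league can be computed. The match rows are invented, and since "mean of goals" can be read as goals per match or goals per team per match, the sketch shows both; which of the two the chart uses is an assumption left open here.

```python
from collections import defaultdict

# Hypothetical match rows: (league, home_goals, away_goals).
matches = [
    ("League X", 2, 1),
    ("League X", 0, 0),
    ("League Y", 3, 2),
    ("League Y", 1, 1),
]

# Accumulate total goals and number of matches per league.
totals = defaultdict(lambda: [0, 0])  # league -> [total goals, match count]
for league, home_goals, away_goals in matches:
    totals[league][0] += home_goals + away_goals
    totals[league][1] += 1

# Mean goals per match, and mean goals per team per match (two teams play).
mean_goals_per_match = {
    league: goals / count for league, (goals, count) in totals.items()
}
mean_goals_per_team = {
    league: mean / 2 for league, mean in mean_goals_per_match.items()
}

print(mean_goals_per_match)
print(mean_goals_per_team)
```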

Looking at the last two charts, we can draw some conclusions:

  • The Spanish league’s top 2 teams scored far more goals in total than the top teams of other leagues, yet this league has one of the lowest mean goals of all, so we can conclude that the Spanish league is the most unbalanced European league.
  • The Eredivisie has an astonishing mean of 1.54 goals per game, and the top 3 teams of this championship were among the highest scorers of the 2014/2015 season, making the Dutch league the most offensive of all!
  • It’s no coincidence that PSV appears as the 4th best team overall. This team has an extremely offensive scheme and probably has the most valuable players in Europe, with investments aimed not only at winning the national championship (which did not take much effort, judging by the huge gap between first and second place) but also at winning other competitions, such as the Champions League and the FIFA Club World Cup.

Conclusions

In this story we analyzed:

  • Which season had the most matches?
  • What is the best European team of 2014/2015?
  • Does the team that scores the most goals win the championship?
  • Which league scored the most goals in 2014/2015?

We could see that there is a relationship between a team’s total goals and its position in the championship, but we couldn’t categorize this relationship (strong, medium, or weak) or even measure it with a descriptive statistic, because this question was not among the main objectives of this study.

We could also see that the “best teams” of 2014/2015 (Barcelona and Real Madrid) were in the most unbalanced league of all, raising the question of whether these teams were really that good or were just playing in a limited league. This question can be put to the test by grouping all the best teams into one major European competition and seeing how well they perform, as happens, for example, in the Champions League.

Overall, this project focused on big-picture analysis and did not intend to dig into the details of every question. A deeper analysis could be done in another project and, as a suggestion, the main topics it could cover are:

  • Look for (and measure) a relationship between a team’s goals and its position in the championship
  • Try to predict the results of other games with these variables and (if possible) measure and analyze the reliability of the resulting model(s)
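As a starting point for the first topic, one possible way to measure the relationship is a rank correlation between goals scored and final position. The sketch below uses Spearman’s correlation, which is my choice for illustration and not something this analysis computed; the goals and positions are invented numbers.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical league table: goals scored and final position per team.
goals = [110, 95, 70, 60, 45]
position = [1, 2, 3, 4, 5]

rho = spearman(goals, position)
print(round(rho, 3))
```

A value close to -1 means that more goals go together with a better (lower) position, while a value near 0 would mean no monotonic relationship.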

If at any point you get confused about a topic or concept, feel free to ask for help and I will do my best.

Thank you!
