An exploratory statistical analysis of the 2014 World Cup Final
As a fan of both data and soccer (from here on out referred to as football), I find football fans’ attitude toward statistics and data analysis perplexing, although understandable: for years, simple counts have been the only numbers the media focuses on. Football is a complex team sport with deep interactions between players, so counting events (goals, assists, tackles, etc.) isn’t enough.
This post shows how play-by-play data can be used to analyse a football match, with custom measures and visualizations that help to better understand the sport.
Disclaimer: I’m a fan, not an expert. Germany’s National Team and Manchester City have whole teams dedicated to data analysis, and the state of the art is well beyond what is shown here. However, that analysis is rarely made public, so I hope this is useful (or at least entertaining). I hope to keep playing with the data and share useful insights in the future. Feel free to star the GitHub repository, or drop me an email at [email protected]
A note on the data used
This play-by-play data was gathered from a public website, and I have no guarantee that it is consistent or correct (the process used to gather it deserves a post of its own). On the other hand, all calculations based on the raw data are available on GitHub and should be questioned. I would love to get some feedback.
The first half
Let’s get a quick profile of the first half. The chart below shows where on the pitch most events took place (positive numbers correspond to Germany’s offensive half, negative numbers to its defensive half), with each team’s shots marked.
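A profile like this can be sketched by flipping every Argentine event into Germany’s frame of reference and counting events per band of the pitch. This is a minimal sketch, not the code in the repository: it assumes a hypothetical event schema (`Event` with `team`, `kind` and `x`), a 105 m pitch with the halfway line at x = 0, and coordinates recorded in the acting team’s frame (each team attacking towards +x).

```python
from collections import Counter
from dataclasses import dataclass

HALF_LENGTH = 52.5  # assumption: 105 m pitch, halfway line at x = 0


@dataclass
class Event:
    team: str   # "Germany" or "Argentina"
    kind: str   # hypothetical labels such as "pass" or "shot"
    x: float    # metres, in the acting team's frame (it attacks towards +x)


def germany_frame_x(event: Event) -> float:
    """Positive values = Germany's offensive half, negative = its defensive half."""
    return event.x if event.team == "Germany" else -event.x


def location_profile(events, n_bins=10):
    """Count events per band along the pitch, keyed by the band's lower edge."""
    width = 2 * HALF_LENGTH / n_bins
    counts = Counter()
    for e in events:
        i = int((germany_frame_x(e) + HALF_LENGTH) // width)
        i = min(max(i, 0), n_bins - 1)  # keep events on the goal lines inside the outer bands
        counts[-HALF_LENGTH + i * width] += 1
    return dict(sorted(counts.items()))


def shot_positions(events):
    """Each shot as (team, x in Germany's frame), for marking shots on the chart."""
    return [(e.team, germany_frame_x(e)) for e in events if e.kind == "shot"]
```

Flipping the sign of Argentina’s x is what makes a single axis work for both teams: an Argentine shot near Germany’s goal lands at a large negative value, in Germany’s defensive half.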
The first 45' of the final were incredibly interesting. Germany dominated possession and pressed high, forcing Argentina to play in its own half. That is obvious once we look at Argentina’s passes during the first half:
Only 28% of Argentina’s passes were made in its offensive half, versus 61% for Germany. Despite playing mostly in the offensive half, Germany achieved a much higher passing accuracy (84% vs. 69%), a testament to its amazing midfield.
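Both figures are simple ratios over a team’s passes. A minimal sketch, assuming a hypothetical `Pass` record whose `x` is where the pass was made (metres from the halfway line, in the passer’s frame) and whose `successful` flag says whether it reached a team-mate; the real dataset’s fields may differ.

```python
from dataclasses import dataclass


@dataclass
class Pass:
    team: str
    x: float          # where the pass was made, metres from the halfway line (passer's frame)
    successful: bool  # hypothetical flag: did the pass reach a team-mate?


def pass_summary(passes, team):
    """Share of a team's passes made in its offensive half, and its passing accuracy."""
    own = [p for p in passes if p.team == team]
    if not own:
        return {"offensive_share": 0.0, "accuracy": 0.0}
    offensive = sum(1 for p in own if p.x > 0)  # passes exactly on the halfway line count as defensive
    completed = sum(1 for p in own if p.successful)
    return {"offensive_share": offensive / len(own), "accuracy": completed / len(own)}
```

Running it once per team over the first-half passes gives the kind of comparison quoted above; how the halfway line and unsuccessful passes are defined will shift the percentages slightly.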
However, that superiority didn’t translate into chances and shots. In fact, Germany had a difficult time getting inside Argentina’s penalty box.
Out of 16 German attempts to get into the box, only one resulted in a shot: a late corner, with Höwedes hitting the post.
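One way to count attempts to get into the box is to take every pass whose end point lies in the opponent’s penalty area, and to call it successful in creating a shot if the very next event is a shot by the same team. This is a sketch under stated assumptions, not necessarily the definition used for the numbers above: the `Event` schema is hypothetical, coordinates are in metres in the acting team’s frame (y measured from the line joining the two goals), and the penalty area uses the standard 16.5 m by 40.32 m dimensions. Corners count as long as the data records them as passes.

```python
from dataclasses import dataclass

HALF_LENGTH = 52.5      # assumption: 105 m pitch, halfway line at x = 0
BOX_DEPTH = 16.5        # penalty area depth
BOX_HALF_WIDTH = 20.16  # half of the 40.32 m penalty area width


@dataclass
class Event:
    team: str
    kind: str            # hypothetical labels such as "pass", "shot", "clearance"
    end_x: float = 0.0   # where the action ended, in the acting team's frame
    end_y: float = 0.0   # metres from the line joining the two goals


def ends_in_box(e: Event) -> bool:
    return e.end_x >= HALF_LENGTH - BOX_DEPTH and abs(e.end_y) <= BOX_HALF_WIDTH


def box_entries(events, team):
    """Return (attempts, shots) for a chronologically ordered list of events:
    passes aimed into the opponent's box, and how many were immediately
    followed by a shot from the same team."""
    attempts = shots = 0
    followers = events[1:] + [None]
    for current, nxt in zip(events, followers):
        if current.team != team or current.kind != "pass" or not ends_in_box(current):
            continue
        attempts += 1
        if nxt is not None and nxt.team == team and nxt.kind == "shot":
            shots += 1
    return attempts, shots
```

Requiring the shot to be the immediately following event is a strict choice; a looser version could allow a few touches or seconds between the pass and the shot.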
The curious case of Christoph Kramer
Kramer suffered an injury in the 19th minute, but was only substituted 12 minutes later. That stretch included Germany’s worst period of the first half, as the first-half profile chart above shows.
Reports say that he appeared confused, and the data shows that Kramer was largely “absent” between the injury and the substitution: his only actions were one successful reception and pass, and one loss of possession.
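Checking a single player’s involvement comes down to filtering events by player and time window, then tallying them. A minimal sketch with a hypothetical `Event` schema (`player`, `minute`, `kind`, `successful`); the event labels and the spelling of player names are assumptions about the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    player: str
    minute: int
    kind: str                # hypothetical labels, e.g. "reception", "pass", "loss"
    successful: bool = True


def actions_in_window(events, player, start, end):
    """Tally a player's actions with start <= minute < end, keyed by (kind, successful)."""
    return Counter(
        (e.kind, e.successful)
        for e in events
        if e.player == player and start <= e.minute < end
    )
```

For the period described above, the call would look like `actions_in_window(events, "Kramer", 19, 31)`, covering the 12 minutes between the injury and the substitution.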
The second half
The second half was much more balanced. Reproducing the first-half charts for the second half confirms this impression.
Even though Germany had much more success getting inside the box in the second half, only one pass resulted in a German shot from inside the box.
Extra time
Germany’s dominance continued up until the goal. After scoring, Germany completely gave up trying to add another goal, with only one attempted pass into the box.
However, its defensive strategy was successful, with Argentina barely entering Germany’s penalty box. Argentina’s only two shots came from outside the box, both by Messi, who at this point was probably feeling somewhat desperate.
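The before/after comparison can be sketched by splitting shots at the goal minute and classifying each one as inside or outside the penalty area. This assumes a hypothetical `Shot` record with coordinates in metres in the shooting team’s frame and the standard penalty area dimensions; `goal_minute` is a parameter rather than a value read from the data. With minute-level timestamps, a shot in the goal minute itself is counted as "before", which keeps the goal on the right side of the split but may misplace other shots in that minute.

```python
from collections import Counter
from dataclasses import dataclass

HALF_LENGTH = 52.5      # assumption: 105 m pitch, halfway line at x = 0
BOX_DEPTH = 16.5        # penalty area depth
BOX_HALF_WIDTH = 20.16  # half of the 40.32 m penalty area width


@dataclass
class Shot:
    team: str
    minute: int
    x: float  # metres, shooting team's frame (attacking towards +x)
    y: float  # metres from the line joining the two goals


def in_box(x: float, y: float) -> bool:
    return x >= HALF_LENGTH - BOX_DEPTH and abs(y) <= BOX_HALF_WIDTH


def shots_around_goal(shots, goal_minute):
    """Tally shots by (team, 'before'/'after' the goal, 'inside'/'outside' the box)."""
    tally = Counter()
    for s in shots:
        period = "before" if s.minute <= goal_minute else "after"
        where = "inside" if in_box(s.x, s.y) else "outside"
        tally[(s.team, period, where)] += 1
    return tally
```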
Football is a game of space. That’s why parking the bus can actually win a match. The dataset used only includes player positions for active offensive and defensive actions. Defensive positioning is therefore completely ignored, and even attacking play lacks important information: a player’s run without the ball can contribute more to a goal than a pass does.
Originally published at Football Crunching’s GitHub repository.
You can find the code and extra stats there.