The Art of Soccer Strategy: Exploring Team Dynamics through Clustering

Eitan Zavorin
INST414: Data Science Techniques
4 min read · May 2, 2024

In soccer, managers and coaches need to understand the nuances of team dynamics and play styles in order to strategize effectively against opposing teams. Professional clubs, especially the biggest ones, devote large teams of analysts, substantial budgets, and countless hours of work to planning tactics for upcoming games. Each club has its own approach to dissecting the vast data on other teams’ play styles and statistics to inform its coaching strategies and game tactics. In this analysis, I aim to give managers and coaches a new way to look at data when preparing for upcoming matches. Using a dataset of roughly 300 teams in the world’s top soccer leagues, I investigate the following research question: By uncovering patterns of similarity among soccer teams based on their play styles, can I help soccer team managers and coaches make more accurate, informed decisions in their preparations for upcoming games?

The data I use to answer this question comes from Kaggle’s public European Soccer Database, which includes play style attributes for about 300 teams, such as build-up play speed, chance creation positioning, and defense aggression. These attributes give us a quantifiable and comprehensive overview of a team’s playing style. By capturing how teams approach various facets of the game, we can gather insights into their strategic tendencies.

I first cleaned the data by removing irrelevant columns, non-quantitative attributes, and columns with mostly null values, which left a final dataset ready to be clustered. For the clustering itself, I used traditional K-means, which assigns each team to the cluster whose centroid is nearest in Euclidean distance and then repeatedly recomputes the centroids until the assignments stop changing. After testing many different k values, it became clear to me that k = 5 best splits the data while avoiding redundancy in the top attributes and play styles across clusters. The choice of k also drew on my own knowledge of the sport: I could tell when clusters were too similar based on the teams in them and the top attributes in each. Once I had a clustering model that seemed accurate based on my high-level understanding of the sport, I arrived at these clusters:

These five clusters depict five different types of teams that differ from one another in their tactical approaches. Cluster 1 encompasses teams that emphasize defensive solidity and moderate tempo in build-up play, with some examples being Valencia CF and Club Brugge KV. Cluster 2 exhibits proactive attacking tactics, prioritizing chance creation and shooting, as seen in Eintracht Frankfurt and Fiorentina. Cluster 3 focuses on a high-tempo, expansive strategy, as seen in Manchester United and West Ham United. Cluster 4 excels in defensive structure, positional discipline, and wide attacking play, as demonstrated by Olympique Lyonnais and Crystal Palace. Lastly, Cluster 5 represents more possession-oriented teams that focus on controlling the game and moving the ball forward slowly, typified by Milan and Feyenoord. Judging by these sample teams, teams that share a cluster do, in fact, have many things in common that the average fan or team coach might not be aware of.
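
To make the clustering step described earlier more concrete, here is a minimal sketch of K-means using only the Python standard library. It is not the code from my repository: the six teams and their attribute values are made up for illustration, the attributes are standardized first (an assumption on my part, so that no single attribute dominates the distance), and the deterministic farthest-point initialization is a simplification chosen so the result is reproducible. Library implementations usually initialize differently.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, n_iter=100):
    """Assign points to the nearest centroid, update centroids, repeat."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    # Inertia: total squared distance from each point to its centroid
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

# Made-up values for (build-up speed, chance creation shooting, defense aggression)
teams = ["Team A", "Team B", "Team C", "Team D", "Team E", "Team F"]
raw = [
    [30, 35, 40], [32, 30, 42], [28, 33, 38],  # slower, more cautious sides
    [70, 65, 60], [68, 70, 62], [72, 66, 58],  # faster, more aggressive sides
]

scaled = standardize(raw)
labels, centroids, inertia = kmeans(scaled, k=2)
for team, label in zip(teams, labels):
    print(team, "-> cluster", label)

# Comparing inertia across a few k values is one input when choosing k
inertia_by_k = {k: kmeans(scaled, k)[2] for k in (1, 2, 3)}
print(inertia_by_k)
```

Inertia alone will not pick k for you. In my analysis, the final choice also depended on whether the teams and top attributes in each cluster made sense from a soccer point of view.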

With these clusters, built from all of the teams’ play style attributes, we can not only discern distinct tactical approaches but also learn a great deal about any given team from its attributes and from the teams that play in a similar style. Here’s a possible scenario in which a manager or coach could use the clusters I have found:

Suppose I am the manager of the English league team Newcastle United. Every year, my team plays several games against Manchester United, but we rarely play against Athletic Club de Bilbao, which plays in the Spanish league. Suppose that in a European tournament, Newcastle United is scheduled to play against Athletic Club de Bilbao. It may be intimidating to face a team that we rarely play, since we know less about what to expect from their play style. However, if we take a look at Cluster 3 above, we can see that Athletic Club de Bilbao has a very similar play style to Manchester United, whom we know how to play against. Knowing this similarity, and the tactics it suggests Athletic Club de Bilbao tends to use, we could gain valuable insights on how to prepare tactically for a game against a team we otherwise wouldn’t be familiar with. This is just one hypothetical scenario of how my clustering model could guide managerial decision-making for team success on short-term and long-term scales. So, to answer my initial research question: Uncovering patterns of similarity among soccer teams based on their play styles can help soccer team managers and coaches make more accurate, informed decisions in their preparations for upcoming games.
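
The lookup in the Newcastle United scenario can be sketched in a few lines of Python. The cluster memberships below are only the example teams named in this article, not the full model, and the list of Newcastle United's familiar opponents and the `familiar_proxies` helper are hypothetical names introduced for illustration.

```python
# Cluster memberships mentioned in this article (the full model has ~300 teams)
team_cluster = {
    "Valencia CF": 1, "Club Brugge KV": 1,
    "Eintracht Frankfurt": 2, "Fiorentina": 2,
    "Manchester United": 3, "West Ham United": 3, "Athletic Club de Bilbao": 3,
    "Olympique Lyonnais": 4, "Crystal Palace": 4,
    "Milan": 5, "Feyenoord": 5,
}

def familiar_proxies(opponent, known_opponents, team_cluster):
    """Return known opponents that share the unfamiliar opponent's cluster."""
    cluster = team_cluster[opponent]
    return sorted(
        team for team in known_opponents
        if team != opponent and team_cluster.get(team) == cluster
    )

# Hypothetical list of teams Newcastle United faces regularly in the English league
newcastle_known = ["Manchester United", "West Ham United", "Crystal Palace"]

proxies = familiar_proxies("Athletic Club de Bilbao", newcastle_known, team_cluster)
print(proxies)  # ['Manchester United', 'West Ham United']
```

Sharing a cluster only says that two teams are closer to the same centroid than to the others, so a proxy like this is a starting point for scouting rather than a replacement for it.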

Before I finish, it’s essential to acknowledge the limitations of my research. First, the available play style attributes do not capture the full complexity of team dynamics; the model inevitably overlooks nuances that aren’t present in the data. Additionally, the dataset was last updated eight years ago, and teams’ attributes and strategies could be somewhat different now. With a more detailed and up-to-date dataset, we could produce more accurate clusters and gather more useful insights on the teams. With that said, this is a solid start toward giving coaches a way to strategize and steer their teams to short-term and long-term success.

Link to GitHub Repository: https://github.com/eitanzav/INST414Module4Assignment
