The Two Types of Soccer Players: Shaping vs. Revolutionizing the Game

Eitan Zavorin
INST414: Data Science Techniques
6 min read · Apr 30, 2024

In the realm of sports analytics, understanding the complex relationships among players can reveal invaluable insights for coaches, team managers, and scouts. Leveraging network analysis techniques, I delve into the structure of a web-based network centered around soccer players who have bolstered the ranks of current and previous generations. Using my analysis, we embark on a journey to answer a fundamental question: Who are the key players shaping the dynamics of the soccer field?

The burning question in the minds of team managers, coaches, scouts, and fans revolves around identifying the most influential players whose play style and performance drive their team’s overall success. By deciphering a large network of soccer players, including vast data on players’ attributes, play styles, and skill ratings, I aim to pinpoint the key influencers whose strategic contributions steer their team toward victory.

To get the most value from this investigation, it’s crucial to consider the stakeholders invested in its outcomes. The primary stakeholders in the soccer world are team managers, coaches, scouts, club executives, and fans. For managers and coaches, insights into player centrality and influence within the network can inform critical decisions related to transfer targets, tactical strategies, starting lineups, and substitutions during matches. Scouts and talent evaluators can use these analyses to identify prospects for transfers and recruitment. Club executives can use the same insights to guide decisions on player contracts, financial investment, recruitment strategies, and the long-term development of the team. Last but certainly not least, the huge fanbases in the soccer community could benefit from a deeper understanding of player dynamics and performance, enhancing their viewing experience, fostering engaging discussions about the sport, and even inspiring them to learn from the greats.

To determine which key players are shaping the dynamics of the game, I use the public European Soccer Database dataset on Kaggle. The full dataset includes seven tables (Country, League, Match, Player, Player_Attributes, Team, and Team_Attributes) containing a combined total of over 200,000 records, capturing several generations of soccer from numerous angles. For my investigation, I use cleaned, trimmed, and merged versions of the Player and Player_Attributes tables. After cleaning, the final dataset contains roughly 600 rows. Each row represents a current or former player with an overall skill rating of at least 82 (out of 100) and includes 32 columns of information about their play style and skill levels, such as attacking and defensive work rates, dribbling, passing, shooting, and more.

Before cleaning the data, I first had to convert it from the SQLite format provided on Kaggle to CSV using DB Browser for SQLite. Once that was done, I could import it into my JupyterLab workspace to begin cleaning and organizing. First, I executed an inner join on the Player and Player_Attributes tables using each player’s unique player_ID, so that player names would sit next to their attributes in one table. To keep the analysis relevant and reduce the dataset to a manageable size, I removed all players with overall ratings under 82 (stakeholders would be less interested in lower-rated players anyway). Then, I dropped duplicate players and null values from the table. The inner join created some duplicate columns, so I removed those along with several other columns that were irrelevant to the investigation, meaning anything unrelated to play style or skill level. After all of these steps, my final data frame had just over 600 rows of players, including only the attributes needed to move forward with my investigation.
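The cleaning steps described above can be sketched in pandas. This is only an illustration under stated assumptions, not the exact notebook code: the function name `build_player_table` is hypothetical, the join key is written as `player_ID` as in the text (the real Kaggle tables may name it differently), the rating column is assumed to be called `overall_rating`, and the rows are invented toy values used only to exercise the function.

```python
import pandas as pd

MIN_RATING = 82  # rating threshold stated in the article


def build_player_table(players, attributes, key="player_ID", irrelevant=()):
    """Join, filter, and de-duplicate, following the steps described above."""
    # Inner join: keep only players that have matching attribute rows.
    merged = players.merge(attributes, on=key, how="inner")
    # Keep only players rated at or above the threshold.
    merged = merged[merged["overall_rating"] >= MIN_RATING]
    # One row per player, then drop rows with missing values.
    merged = merged.drop_duplicates(subset=key).dropna()
    # Remove columns unrelated to play style or skill level.
    merged = merged.drop(columns=list(irrelevant), errors="ignore")
    return merged.reset_index(drop=True)


# Toy rows, invented purely for illustration (not real ratings).
players = pd.DataFrame({
    "player_ID": [1, 2, 3],
    "player_name": ["Player A", "Player B", "Player C"],
    "birthday": ["1990-01-01", "1991-02-02", "1992-03-03"],
})
attributes = pd.DataFrame({
    "player_ID": [1, 1, 2, 3],
    "overall_rating": [85, 84, 79, 90],
    "dribbling": [80, 81, 70, None],
})

clean = build_player_table(players, attributes, irrelevant=["birthday"])
print(clean)
```

In this toy run, Player B falls below the threshold, Player C is dropped for a missing value, and Player A's second attribute row is removed as a duplicate, so a single row remains. The order of the duplicate and null steps is a judgment call; swapping them can change which rows survive.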

In my network analysis, each player is a node (vertex) in the graph, characterized by a unique set of attributes describing their skills and play style. An edge is created between two players when they share a certain number of equal attributes, which signifies similarities in their play style, skill ratings, or other relevant characteristics. As for defining “importance” within the network, nodes with higher degree centrality are similar to more of the other players. It’s crucial to understand that both ends of the degree centrality spectrum are valuable in the context of this investigation. Players with high degree centrality, and therefore high “importance” in the network, can show which characteristics are common to most of the successful, high-rated players in the world. On the other end of the spectrum, outliers with low degree centrality might not be “important” from a network perspective, but they show which types of players, play styles, and skills stand out from the crowd. By identifying and analyzing both high-centrality and low-centrality nodes, stakeholders can gain valuable insights into the key players shaping the dynamics of the soccer field and driving team success. As examples of the most “important” nodes in my graph, here are the top 10 players by degree centrality:

  1. Iker Casillas 0.02463054187192118
  2. Mario Gomez 0.02134646962233169
  3. Mickael Landreau 0.02134646962233169
  4. Carlos Kameni 0.019704433497536946
  5. Sebastian Frey 0.019704433497536946
  6. Florent Malouda 0.0180623973727422
  7. Gabi 0.0180623973727422
  8. Heiko Westermann 0.0180623973727422
  9. Helton 0.0180623973727422
  10. Jakub Blaszczykowski 0.0180623973727422

These players demonstrate the highest number of similarities among the top-rated players. Looking into their attributes would give insight into what play styles could make so many players successful. Now, for more graphs, observations, and insights:
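To make the node, edge, and centrality definitions concrete, here is a minimal pure-Python sketch. It rests on assumptions: the article does not say how many equal attributes are required for an edge, so `MIN_SHARED` is a hypothetical threshold, the helper names are illustrative, and the four toy players carry invented values. Degree centrality is computed as degree divided by (n - 1), the usual normalization.

```python
from itertools import combinations

# Hypothetical threshold: the article does not state how many equal
# attributes are needed before two players are linked.
MIN_SHARED = 2


def shared_attributes(a, b):
    """Count attributes on which two players have exactly equal values."""
    return sum(1 for key in a if key in b and a[key] == b[key])


def build_edges(players, min_shared=MIN_SHARED):
    """Return undirected edges between sufficiently similar players."""
    edges = set()
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(players.items(), 2):
        if shared_attributes(attrs_a, attrs_b) >= min_shared:
            edges.add(frozenset((name_a, name_b)))
    return edges


def degree_centrality(players, edges):
    """Degree divided by (n - 1); zero for graphs with fewer than two nodes."""
    n = len(players)
    degree = {name: 0 for name in players}
    for edge in edges:
        for name in edge:
            degree[name] += 1
    if n < 2:
        return {name: 0.0 for name in players}
    return {name: d / (n - 1) for name, d in degree.items()}


# Toy players with invented values, only to show the mechanics.
players = {
    "A": {"dribbling": 80, "passing": 85, "shooting": 70},
    "B": {"dribbling": 80, "passing": 85, "shooting": 90},
    "C": {"dribbling": 80, "passing": 60, "shooting": 90},
    "D": {"dribbling": 50, "passing": 55, "shooting": 65},
}
edges = build_edges(players)
centrality = degree_centrality(players, edges)
ranked = sorted(centrality.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

If the scores in the list above were normalized this way, the top value of about 0.0246 would equal 15/609, that is, a player linked to 15 others in a graph of 610 nodes, which fits a dataset of just over 600 players.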

Here is one rendering of the network in its entirety. At first glance, this visual conveys little, but if you hover over one of the nodes (shown below), it shows all the nodes connected to it through similarities in play style and skill. This makes it possible to discover, for example, who plays similarly to a particular player or transfer target.
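The same lookup can be done without the interactive graph. The sketch below assumes the edges are available as pairs of player names; the helper names and the placeholder players are hypothetical and only illustrate the idea of listing a node's neighbors.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each player to the set of players they share an edge with."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def similar_players(adjacency, name):
    """Return the neighbors of a player, sorted for stable output."""
    return sorted(adjacency.get(name, set()))


# Placeholder names, not taken from the dataset.
edges = [("Player A", "Player B"), ("Player B", "Player C")]
adjacency = build_adjacency(edges)
print(similar_players(adjacency, "Player B"))
```

A scout could run a lookup like this on a transfer target to get a shortlist of stylistically similar players.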

Here is another visual of the entire network, displaying both ends of the centrality spectrum mentioned earlier. At the center of the circle, we can see the players who are most similar to each other, while the outside shows the players who stand out in their skills and play styles. In truth, there is no simple answer to our research question of who the key players shaping the dynamics of soccer are. On the one hand, it can be argued that the players in the center, who all have high skill ratings and influence on their team, outline the common skills and play styles found in most successful and influential soccer players. By exploring this subset, we have found that some of the most influential players include Iker Casillas, Mario Gomez, and Samuel Eto’o. Conversely, it can also be argued that the players on the outside are revolutionizing the game, steering their teams to success with a unique set of skills. By exploring this subset, we have found that some of the most revolutionary and successful players include Andrea Pirlo, Paul Scholes, and Zlatan Ibrahimovic. At the end of the day, both types of players shape the dynamics of the game in their own ways, and neither side can take all the credit.

But make no mistake: these findings still offer valuable insights. Exploring and interacting with the graphs above could guide managers and coaches in strategy, help scouts identify talent for recruitment, enable club executives to drive long-term developmental and organizational decisions, and give fans around the world a deeper understanding of the leaders in the game they love.

Finally, it’s important to note that this is far from the end of the conversation. This analysis could be expanded in several ways to address its limitations, the biggest being the lack of player performance statistics. Although ratings across various characteristics are a good measure of a player’s play style and skills, the analysis could be enhanced by taking seasonal or all-time performance stats into account. Knowing the number of goal contributions, shot blocks, saves, and so on would allow us to gather even more insight from the data. Second, having data on each player’s position would allow us to dive deeper into each position, comparing players of the same position to each other. These future steps could expand our knowledge of the topic and provide even more value to interested stakeholders.

https://github.com/eitanzav/INST414Module2Assignment
