Predicting a Soccer Player’s Overall Skill Rating through Cosine Similarity Analysis

Eitan Zavorin
INST414: Data Science Techniques
May 2, 2024

In soccer analytics, understanding player performance and similarity is crucial for stakeholders such as coaches, team managers, and scouts. The question I chose to research is: how can we identify players whose playing styles and skills resemble those of a target player, and predict the target player’s overall rating from their attributes? Answering it could give team managers and scouts useful evidence when they consider potential transfers or assess the value of existing players, and so could inform decisions about player recruitment, team composition, and strategic planning.

The dataset used in this analysis consists of the Player and Player_Attributes tables from Kaggle’s public European Soccer Database. Together, these tables hold detailed attribute information on soccer players, such as finishing, dribbling, ball control, sprint speed, and passing. Player holds each player’s name and basic details, while each row of Player_Attributes records one player’s quantitative ratings on a given date, so the same player can appear several times. Using these attributes lets us measure similarity between players across many dimensions at once.
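A minimal sketch of how the two tables can be combined, assuming the Kaggle download is the SQLite file `database.sqlite` and using Python’s standard `sqlite3` module. Because Player_Attributes stores several dated rows per player, this sketch keeps only each player’s most recent row. The attribute subset, the helper `build_demo_db`, and its toy rows are hypothetical stand-ins, not the exact columns or data used in the analysis.

```python
import sqlite3

# Illustrative subset of Player_Attributes columns; an assumption, not
# necessarily the feature set used in the analysis.
ATTRIBUTES = ["finishing", "dribbling", "ball_control", "sprint_speed", "short_passing"]
COLUMNS = ["overall_rating"] + ATTRIBUTES


def load_latest_attributes(conn):
    """Join Player with Player_Attributes and keep each player's most recent
    snapshot. Returns a list of dicts with player_name, overall_rating and
    the ATTRIBUTES columns."""
    cols = ", ".join(f"pa.{c}" for c in COLUMNS)
    query = f"""
        SELECT p.player_name, {cols}
        FROM Player AS p
        JOIN Player_Attributes AS pa ON pa.player_api_id = p.player_api_id
        WHERE pa.date = (
            SELECT MAX(pa2.date) FROM Player_Attributes AS pa2
            WHERE pa2.player_api_id = p.player_api_id
        )
    """
    records = []
    for name, *values in conn.execute(query):
        # Skip snapshots with missing ratings (SQL NULL arrives as None).
        if any(v is None for v in values):
            continue
        records.append({"player_name": name, **dict(zip(COLUMNS, values))})
    return records


def build_demo_db():
    """Hypothetical stand-in for sqlite3.connect("database.sqlite"):
    an in-memory database with two made-up players."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Player (player_api_id INTEGER, player_name TEXT)")
    conn.execute(
        "CREATE TABLE Player_Attributes (player_api_id INTEGER, date TEXT, "
        + ", ".join(f"{c} INTEGER" for c in COLUMNS) + ")"
    )
    conn.executemany("INSERT INTO Player VALUES (?, ?)", [(1, "Player A"), (2, "Player B")])
    conn.executemany(
        "INSERT INTO Player_Attributes VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "2015-09-21 00:00:00", 70, 65, 72, 74, 80, 68),
            (1, "2016-02-18 00:00:00", 72, 67, 73, 75, 81, 70),
            (2, "2016-04-21 00:00:00", 64, 40, 55, 60, 58, 66),
        ],
    )
    return conn


for record in load_latest_attributes(build_demo_db()):
    print(record)
```

Keeping one snapshot per player avoids counting the same player several times when similarities are computed later; averaging each player’s snapshots would be a reasonable alternative.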

For the predictive model, I chose regression, because the value I want to predict, a player’s overall rating, is a continuous quantitative variable. Regression models are designed to predict numerical values, which makes them a natural fit for this task.
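To make the regression framing concrete, here is a minimal ordinary least squares sketch in plain Python that fits overall rating as a linear function of attribute values by solving the normal equations. The post does not say which regression model the repository uses, so treat this as one illustrative option; the function names and the toy rows are hypothetical.

```python
def fit_linear_regression(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y with Gauss-Jordan
    elimination, where A is X with a leading column of ones.
    Returns [b0, b1, ..., bk]."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    n = len(A[0])
    # Build the augmented matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(n)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Normalize the pivot row, then eliminate this column from all other rows.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def predict(coefs, x):
    """Predict a rating for one attribute vector x."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))


# Made-up rows generated from rating = 5 + 0.4*x1 + 0.5*x2, so the fit
# should recover those coefficients up to floating-point error.
X_demo = [[60, 70], [75, 65], [80, 85], [55, 50], [90, 72]]
y_demo = [64, 67.5, 79.5, 52, 77]
coefs = fit_linear_regression(X_demo, y_demo)
print(coefs, predict(coefs, [70, 60]))
```

On real data with dozens of correlated attributes, a library solver (for example scikit-learn’s `LinearRegression`) is more robust than hand-rolled normal equations.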

The supervised model uses player attributes such as sprint speed, passing accuracy, dribbling, and shooting as its features. These attributes are key indicators of a player’s technical ability and play style, which makes them useful both for identifying similar players and for predicting overall rating.
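Cosine similarity, the measure named in the title, compares two players by the angle between their attribute vectors, cos(a, b) = (a · b) / (‖a‖ ‖b‖), rather than by raw magnitude. A minimal sketch with plain Python lists follows; the player labels and attribute values are made up, and the post does not specify whether attributes were scaled first.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(target, players, k=3):
    """Return the k (name, similarity) pairs closest to the target vector,
    highest similarity first."""
    scored = [(name, cosine_similarity(target, vec)) for name, vec in players.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical vectors: [finishing, dribbling, ball_control, sprint_speed, short_passing]
players = {
    "Winger": [70, 85, 82, 90, 75],
    "Target man": [84, 60, 70, 55, 62],
    "Playmaker": [68, 80, 86, 66, 90],
}
target = [72, 84, 80, 88, 74]
print(most_similar(target, players, k=2))
```

Because every rating is a positive number, raw cosine scores tend to bunch close to 1 for all players; standardizing each attribute (subtracting its mean and dividing by its standard deviation) before comparing usually spreads them out and makes the ranking more informative.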

By applying cosine similarity and the regression model to these attributes, we can identify players whose playing styles and skills resemble a target player’s, and predict that target player’s overall rating.
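One way to combine the two steps, sketched here as an assumption rather than the repository’s exact code, is k-nearest-neighbour regression: rank other players by cosine similarity to the target and predict the target’s overall rating as the similarity-weighted mean of the top k neighbours’ ratings. The function name, the choice of `k`, and the toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """(a . b) / (|a| * |b|); assumes non-zero vectors, which holds for positive ratings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def predict_overall_rating(target_attrs, others, k=3):
    """others: list of (attribute_vector, overall_rating) pairs that must NOT
    include the target itself. Returns the similarity-weighted mean rating
    of the k most similar players."""
    if k < 1 or not others:
        raise ValueError("need k >= 1 and at least one comparison player")
    scored = sorted(
        ((cosine_similarity(target_attrs, attrs), rating) for attrs, rating in others),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total_weight = sum(sim for sim, _ in scored)
    return sum(sim * rating for sim, rating in scored) / total_weight


# Made-up comparison players: [finishing, dribbling, ball_control, sprint_speed, short_passing]
others_demo = [
    ([70, 85, 82, 90, 75], 78),
    ([84, 60, 70, 55, 62], 71),
    ([68, 80, 86, 66, 90], 80),
]
print(predict_overall_rating([72, 84, 80, 88, 74], others_demo, k=2))
```

When checking accuracy on players whose rating is known, leave the target out of `others`; otherwise it matches itself with similarity 1 and the prediction simply echoes its true rating.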

Github Repository: https://github.com/eitanzav/INST414Module6Assignment
