Squad Selection and Match Analysis in Soccer Using Big Data

Naeer Amin
Trends in Data Science
9 min read · Jan 25, 2022

Soccer (or football) is one of the most-watched sports in the world, and data analytics now plays a growing role in squad selection and match analysis. Professional teams use advanced data modelling techniques to tackle challenges such as team selection, personalized training, tactical analysis, and player recruitment. Although these advanced modelling techniques are recent, statistical analysis has been in use since the 1950s. Charles Reep was the first analyst in soccer to use a notational system to record every event in a football match (Pollard, 2002). He eventually discovered that a series of three or four successive passes significantly increased the chances of scoring a goal (Larsen, 2001). Although data analysis in soccer dates back to the 1950s, image and video data have only been used for analytics since the 1990s. Today, most football clubs in the major domestic leagues have a set of 8–10 digital cameras fitted in their stadiums, which collect tracking data (ten data points per second) on every player on the pitch (Marr, 2015). As a result, each game generates an enormous amount of data that clubs can use to address these challenges.

The main objective of a football club is to win matches and, eventually, tournaments or leagues. The key factors driving data-analysis innovation in soccer follow from this objective: squad selection, players’ training programs, and match preparation and analysis. Coaches and managers can use positional, tracking, and event data to evaluate their own players and their opponents, and to select the team and tactics that give them the best chance of winning (Khaustov & Mozgovoy, 2020; Memmert & Rein, 2018; Rein & Memmert, 2016). Recently, FIFA, the governing body of soccer, also sanctioned wearable technologies during competitive matches, allowing managers to make in-game tactical switches and substitutions based on positional and tracking data (Memmert & Rein, 2018).

Optimal Squad Selection

Football clubs mainly select players through try-outs and recruitment. The more established clubs usually already have a pool of players, and during each transfer window they try to recruit players who would improve the team. Traditionally, managers depended on information from their scouts to decide which player to bring in. More recently, event data captured during matches has allowed scouts and managers to evaluate spatio-temporal events such as tackles, passes, fouls, shots, and dribbles when deciding which player to pick (Pappalardo et al., 2019). Even though event data allows analysts to calculate important evaluation metrics such as Expected Goals, these metrics have limitations. Metrics like Expected Goals focus only on the shot itself and do not include the phase of play that preceded it (Liu et al., 2020). As a result, they miss contextual information when evaluating players.
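To make the limitation concrete, here is a minimal sketch of what an Expected Goals style model can look like. It is an assumption for illustration only: a logistic function of shot distance and goal-mouth angle, with placeholder coefficients that were not fitted to any data and are not taken from the cited papers. The function names and parameters are hypothetical.

```python
import math

def shot_angle(x, y, goal_width=7.32):
    """Angle (radians) that the goal mouth subtends from the shot location.

    x: distance from the goal line in metres.
    y: lateral offset from the centre of the goal in metres.
    """
    left_post = math.atan2(y + goal_width / 2, x)
    right_post = math.atan2(y - goal_width / 2, x)
    return left_post - right_post

def expected_goal(x, y, intercept=-1.0, w_distance=-0.1, w_angle=1.5):
    """Toy scoring probability for a shot taken at (x, y).

    The coefficients are illustrative placeholders, not fitted values.
    """
    distance = math.hypot(x, y)
    z = intercept + w_distance * distance + w_angle * shot_angle(x, y)
    return 1 / (1 + math.exp(-z))

# A close, central shot is rated higher than a long-range one.
close_shot = expected_goal(6, 0)
long_shot = expected_goal(25, 0)
```

Note that the only inputs are the coordinates of the shot. Nothing about the passes, dribbles, or runs that created the chance enters the calculation, which is exactly the missing context described above.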

Similar to Expected Goals, Brooks et al. (2016) proposed a metric that estimates the probability of a goal-scoring chance after each pass. Bransen & Van Haaren (2018) also suggested a metric that compares the chance of a shot before and after a pass. However, both metrics share the limitation of Expected Goals: they do not consider a player’s overall performance. To address these limitations, Liu et al. (2020) proposed the Goal Impact Metric (GIM), which evaluates players according to their actions and predicts how many goals they will score or assist in a season. A metric like GIM gives managers an opportunity to quantify a player’s impact before deciding on a transfer. One example of this kind of predictive modelling is how Liverpool FC recruits players. The club used a Markov chain model to identify players such as Gini Wijnaldum, Andrew Robertson, and Xherdan Shaqiri, who had all been relegated from the Premier League with their previous clubs but were outstanding performers in those teams (Powell, 2021). These players went on to win the Champions League and Premier League with Liverpool, which suggests that their recruitment was justified.
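The idea of valuing a pass by the chance of scoring before and after it can be sketched with a small Markov chain. This is not Liverpool’s model or the model from any of the cited papers. The pitch zones and transition probabilities below are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical transition probabilities between coarse pitch zones.
# "goal" and "lost" are absorbing states; each row sums to 1.
TRANSITIONS = {
    "defence":  {"defence": 0.50, "midfield": 0.35, "attack": 0.05, "goal": 0.00, "lost": 0.10},
    "midfield": {"defence": 0.20, "midfield": 0.40, "attack": 0.25, "goal": 0.00, "lost": 0.15},
    "attack":   {"defence": 0.05, "midfield": 0.25, "attack": 0.30, "goal": 0.10, "lost": 0.30},
}

def scoring_probability(transitions, sweeps=500):
    """Probability that a possession currently in each zone ends in a goal.

    Repeatedly applies value = sum(p * value[next_state]) until the
    estimates settle; 500 sweeps is far more than this toy chain needs.
    """
    value = {zone: 0.0 for zone in transitions}
    value.update({"goal": 1.0, "lost": 0.0})
    for _ in range(sweeps):
        for zone, row in transitions.items():
            value[zone] = sum(p * value[nxt] for nxt, p in row.items())
    return value

def pass_value(value, origin, destination):
    """Change in scoring probability produced by moving the ball between zones."""
    return value[destination] - value[origin]

values = scoring_probability(TRANSITIONS)
forward_pass = pass_value(values, "midfield", "attack")   # positive
back_pass = pass_value(values, "attack", "midfield")      # negative
```

Summing these pass values per player over a season gives a rough contribution score. A real system would use far finer pitch zones and estimate the transition probabilities from event data.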

The Goal Impact Metric (GIM) proposed by Liu et al. (2020) also has limitations. It is computed only from data records that contain the position and actions of the player currently in possession of the ball, so players’ off-the-ball movements are not part of the metric. The model could be improved by adding positional and tracking data from training and competitive matches, which would capture off-the-ball movements alongside the event data (Memmert & Rein, 2018). Coaches could then make a more informed decision when evaluating a player for transfer.
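As a small illustration of what tracking data adds, the sketch below measures how far a player moves while not in possession. The frame format, a list of (x, y, has_ball) tuples sampled at a fixed rate, is an assumption made for this example and not the format of any particular tracking provider.

```python
import math

def off_ball_distance(frames):
    """Distance (metres) a player covers in frames where they do not have the ball.

    frames: list of (x, y, has_ball) tuples for one player, in time order.
    A step counts as off the ball when the later frame has has_ball False.
    """
    total = 0.0
    for (x0, y0, _), (x1, y1, has_ball) in zip(frames, frames[1:]):
        if not has_ball:
            total += math.hypot(x1 - x0, y1 - y0)
    return total

# Hypothetical frames: a 5 m run without the ball, then a 5 m carry with it.
frames = [(0, 0, False), (3, 4, False), (3, 4, True), (6, 8, True), (6, 8, False)]
distance = off_ball_distance(frames)
```

An event-only record of this sequence would contain the carry but not the run that preceded it.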

Potential Issues

Despite their advantages, event, positional, and tracking data pose some potential issues. Fitting a model to a data set that contains event, positional, and tracking data for players from all over the world raises significant scalability problems, and the challenge grows further once players’ historical data are included. Online learning methods for neural networks are one way to overcome this problem (Liu et al., 2020).
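Online learning means updating the model one example at a time and then discarding the example, so the full data set never has to fit in memory. The sketch below shows the principle with a single logistic unit trained by stochastic gradient descent. It is a deliberately simplified stand-in, not the network used by Liu et al. (2020), and the class name and toy data stream are hypothetical.

```python
import math

class OnlineLogisticModel:
    """A single logistic unit that learns from one example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1 / (1 + math.exp(-z))

    def update(self, features, label):
        """Take one gradient step on a single example; nothing is stored."""
        error = self.predict(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error

# Hypothetical stream: feature value 1.0 precedes a goal, -1.0 does not.
model = OnlineLogisticModel(n_features=1)
for _ in range(200):
    model.update([1.0], 1)
    model.update([-1.0], 0)
```

Because each update touches only the current example, memory use stays constant however many matches are streamed through the model.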

Collecting data on players from other clubs or leagues may be a significant issue as well. Because data are usually collected by individual clubs, leagues, and commercial organizations, access may be difficult, and the owners may be reluctant to share. There may also be data privacy issues: players might not want their data shared with other clubs, and teams would not want to share it either, since rival clubs could gain an advantage by studying their players’ data (Rein & Memmert, 2016).

Match Preparation and Tactical Analysis

To prepare for a match, coaches and managers must select the right players and set up their team with the right formation and tactics. This is not easy, because many factors influence a team’s setup, including individual players’ abilities, team chemistry, and the opponent’s tactical setup. Traditionally, tactical analysis relied only on observational and annotated data. However, analysis conducted in this way did not take into account contextual information or players’ off-the-ball movements (Goes, Meerhoff, et al., 2021; Rein & Memmert, 2016). With the recent availability of wearable technologies and tracking data, researchers and analysts can now combine observational, event, video, and tracking data to find valuable insights (Rein & Memmert, 2016).

The availability of vast amounts of event, video, and positional tracking data has prompted researchers to develop data-driven solutions for tactical analysis. Goes, Brink, et al. (2021) used positional tracking data to identify sub-groups in a team and investigated how the sub-groups interacted with each other during a match. According to Goes, Brink, et al. (2021), an attack can be successful when spaces between the attacking players decrease in the final third. Coaches might use this information and tell their players to widen the pitch while trying to score a goal. In addition, Lee et al. (n.d.) looked at how positional data can be used to select formations or players for a match. They used weighted association rule mining to identify associations between positions and players and how these would help improve the team. This could be very helpful for managers selecting the first eleven for a match.
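The core quantity in weighted association rule mining is a weighted support: how often an itemset appears, with each transaction counted according to its weight. The sketch below computes it for pairs of players, treating each match lineup as a transaction. The weighting scheme (league points earned in the match) and the player labels are assumptions for illustration and are not taken from Lee et al.

```python
from itertools import combinations

def weighted_pair_support(matches):
    """Weighted support for every pair of players who started together.

    matches: list of (lineup, weight) tuples, where lineup is a set of
    player names and weight is a non-negative number. The total weight
    must be greater than zero.
    """
    total = sum(weight for _, weight in matches)
    support = {}
    for lineup, weight in matches:
        for pair in combinations(sorted(lineup), 2):
            support[pair] = support.get(pair, 0.0) + weight / total
    return support

# Hypothetical lineups, weighted by points earned (win = 3, draw = 1, loss = 0).
matches = [
    ({"A", "B", "C"}, 3),
    ({"A", "B"}, 1),
    ({"B", "C"}, 0),
]
support = weighted_pair_support(matches)
```

Here the pair ("A", "B") has the highest weighted support because every point was earned with both players on the pitch, while the match that ("B", "C") started without "A" contributes nothing.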

Researchers and analysts have also been using big data technologies to find tactical solutions. Kröckel & Bodendorf (2020) applied process mining to positional and event data from Opta Sports to determine which players are most important to a team and whether a player is being overwhelmed by the tasks delegated to them. This might be crucial for coaches trying to find the weak points of their opponents. Coaches can also improve their tactical setup by raising their team’s pass completion rate and tightening its defensive shape. Fang et al. (2021) proposed a long and short-term neural network model to predict pass completion rate and effective defensive positioning. Coaches can use this model to develop training plans that improve both. Similarly, Liverpool FC has been using event and tracking data to determine which areas of the pitch are dangerous and which they need to control. Based on this analysis, Liverpool has been defending in central blocks and attacking with diagonal balls that bypass the center of the pitch, having identified the center as the most dangerous area in which to lose the ball (Williams, 2020).
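A common first step in process mining is to build a directly-follows graph: a count of how often one activity is immediately followed by another. Applied to passing sequences, it shows which players the ball flows through most. The sketch below is a generic illustration under that assumption, not the pipeline used by Kröckel & Bodendorf, and the position labels are hypothetical.

```python
from collections import Counter

def directly_follows(pass_sequences):
    """Count how often the ball moves directly from one player to another."""
    edges = Counter()
    for sequence in pass_sequences:
        edges.update(zip(sequence, sequence[1:]))
    return edges

def involvement(edges):
    """Number of passes each player makes or receives."""
    load = Counter()
    for (passer, receiver), count in edges.items():
        load[passer] += count
        load[receiver] += count
    return load

# Hypothetical possessions, each listed as the order in which players touched the ball.
sequences = [
    ["GK", "CB", "CM", "ST"],
    ["CB", "CM", "RW"],
    ["CM", "ST"],
]
edges = directly_follows(sequences)
load = involvement(edges)
busiest_player, touches = load.most_common(1)[0]
```

A player whose involvement is far above the rest is both a key player and a possible point of overload, which is the kind of signal an opposing coach would look for.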

Potential Issues

Despite the recent research into positional and tracking data for tactical analysis, it has not yet had a significant impact in the real world. Sports scientists keep using group-centric features such as the team centroid, which cannot represent information about the whole team, instead of using computer science techniques like data mining and machine learning for feature selection. Computer scientists, for their part, keep developing complex models for larger data sets without collaborating with sports scientists. As a result, the two fields make largely disconnected contributions to tactical analysis. A collaboration between them would likely have a much more significant impact on real-world tactical analysis (Goes, Meerhoff, et al., 2021).
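A short example shows why a centroid alone says little about a team. The two hypothetical shapes below, with coordinates in metres, share exactly the same centroid even though one is compact and the other is spread across the pitch. The mean distance to the centroid, sometimes called a stretch index, tells them apart. The positions are invented for illustration.

```python
import math

def team_centroid(positions):
    """Mean (x, y) position of a group of players."""
    xs, ys = zip(*positions)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def stretch_index(positions):
    """Mean distance of the players from their own centroid."""
    cx, cy = team_centroid(positions)
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / len(positions)

# Two hypothetical four-player groups with an identical centroid.
compact = [(49, 34), (51, 34), (50, 33), (50, 35)]
wide = [(30, 34), (70, 34), (50, 4), (50, 64)]

same_centroid = team_centroid(compact) == team_centroid(wide)
spread_compact = stretch_index(compact)
spread_wide = stretch_index(wide)
```

Even this second feature is still a single number per frame, which is part of the argument for learning features from the full positional data instead.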

Even though collecting positional and tracking data of players is necessary for clubs to find tactical insights, it leads to data privacy issues. Players might be uncomfortable with their clubs having detailed information about them as it might have a significant impact on their careers. Football clubs also might be reluctant to publicly share their players’ sensitive data as they do not want to concede the competitive advantage to their competitors. However, even when these data are made available, clubs that are new to data analysis will face significant problems in data processing and will ultimately have to build new pipelines to manage and analyze the data (Bai & Bai, 2021; Rein & Memmert, 2016).

Conclusion

In conclusion, the current availability of enormous amounts of observational, events, positional, and tracking data has changed the face of data analysis in soccer. Researchers, analysts, and coaches can now use this vast data to completely change how they prepare for a match and analyze their competitors. As the collaboration between performance analysts, sports scientists, and computer scientists increases, more and more tactical solutions will be discovered using big data technologies.

References

Bai, Z., & Bai, X. (2021). Sports Big Data: Management, Analysis, Applications, and Challenges. Complexity, 2021, e6676297. https://doi.org/10.1155/2021/6676297

Bransen, L., & Van Haaren, J. (2018). Measuring Football Players’ On-the-ball Contributions From Passes During Games. ArXiv:1810.02252 [Cs, Stat]. http://arxiv.org/abs/1810.02252

Brooks, J., Kerr, M., & Guttag, J. (2016). Developing a Data-Driven Player Ranking in Soccer Using Predictive Model Weights. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 49–55. https://doi.org/10.1145/2939672.2939695

Fang, L., Wei, Q., & Xu, C. J. (2021). Technical and Tactical Command Decision Algorithm of Football Matches Based on Big Data and Neural Network. Scientific Programming, 2021, e5544071. https://doi.org/10.1155/2021/5544071

Goes, F. R., Brink, M. S., Elferink-Gemser, M. T., Kempe, M., & Lemmink, K. A. P. M. (2021). The tactics of successful attacks in professional association football: Large-scale spatiotemporal analysis of dynamic subgroups using position tracking data. Journal of Sports Sciences, 39(5), 523–532. https://doi.org/10.1080/02640414.2020.1834689

Goes, F. R., Meerhoff, L. A., Bueno, M. J. O., Rodrigues, D. M., Moura, F. A., Brink, M. S., Elferink-Gemser, M. T., Knobbe, A. J., Cunha, S. A., Torres, R. S., & Lemmink, K. A. P. M. (2021). Unlocking the potential of big data to support tactical performance analysis in professional soccer: A systematic review. European Journal of Sport Science, 21(4), 481–496. https://doi.org/10.1080/17461391.2020.1747552

Khaustov, V., & Mozgovoy, M. (2020). Recognizing Events in Spatiotemporal Soccer Data. Applied Sciences, 10(22), 8046. https://doi.org/10.3390/app10228046

Kröckel, P., & Bodendorf, F. (2020). Process Mining of Football Event Data: A Novel Approach for Tactical Insights Into the Game. Frontiers in Artificial Intelligence, 3, 47. https://doi.org/10.3389/frai.2020.00047

Larsen, O. (2001). Charles Reep: A Major Influence on British and Norwegian Football. Soccer & Society, 2(3), 58. https://doi.org/10.1080/714004854

Lee, G. J., Jung, J. J., & Camacho, D. (n.d.). Exploiting weighted association rule mining for indicating synergic formation tactics in soccer teams. Concurrency and Computation: Practice and Experience, n/a(n/a), e6221. https://doi.org/10.1002/cpe.6221

Liu, G., Luo, Y., Schulte, O., & Kharrat, T. (2020). Deep soccer analytics: Learning an action-value function for evaluating soccer players. Data Mining and Knowledge Discovery, 34(5), 1531–1559. https://doi.org/10.1007/s10618-020-00705-9

Marr, B. (2015). How Big Data and Analytics are Changing Soccer. https://www.linkedin.com/pulse/how-big-data-analytics-changing-soccer-bernard-marr

Memmert, D., & Rein, R. (2018). Match analysis, Big Data and tactics: Current trends in elite soccer. Deutsche Zeitschrift Für Sportmedizin, 2018(03), 65–72. https://doi.org/10.5960/dzsm.2018.322

Pappalardo, L., Cintia, P., Ferragina, P., Massucco, E., Pedreschi, D., & Giannotti, F. (2019). PlayeRank: Data-driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach. ACM Transactions on Intelligent Systems and Technology, 10(5), 1–27. https://doi.org/10.1145/3343172

Pollard, R. (2002). Charles Reep (1904–2002): Pioneer of notational and performance analysis in football. Journal of Sports Sciences, 20(10), 853–855.

Powell, D. (2021, February 28). FSG transfer model under scrutiny as rival clubs take different approach. Liverpool Echo. https://www.liverpoolecho.co.uk/sport/football/football-news/the-data-can-only-take-19916562

Rein, R., & Memmert, D. (2016). Big data and tactical analysis in elite soccer: Future challenges and opportunities for sports science. SpringerPlus, 5(1), 1410. https://doi.org/10.1186/s40064-016-3108-2

Williams, J. (2020, January 15). Liverpool are using incredible data science, and match effects are extraordinary. Liverpool.Com. https://www.liverpool.com/liverpool-fc-news/features/liverpool-transfer-news-jurgen-klopp-17569689
