From Our Lab to the Pitch: Advancements in Soccer Analytics through AI

Keisuke Fujii
6 min read · Feb 7, 2024

In this article, I present the latest research findings from our lab related to soccer and machine learning. As an associate professor at Nagoya University (Japan), my research involves analyzing group behaviors in sports, animals, and vehicles using machine learning techniques. For more information on research related to soccer or other sports, please refer to our website. I will also briefly discuss the ongoing research projects towards the end.

To automatically evaluate plays in team sports like soccer, we require “event data”, such as the passes and shots recorded in top European leagues, as well as “tracking data” that gives the location of all players at every moment (the latter is generally available only for a fee). While event data can evaluate the player in possession of the ball to some extent, without tracking data it becomes extremely difficult to assess defensive plays or off-ball movements by attackers. Current soccer analytics often relies on Expected Goals (xG) or related concepts, which are calculated from the ball’s position and event data rather than from the full positional data of all players. Such metrics evaluate shots leading up to a goal but do not account for specific labeled actions by the player in possession. Our research has primarily focused on two themes regarding soccer analytics:

Trajectory Prediction of Players

The first theme is the use of machine learning for predicting player trajectories. By employing deep learning models that take player positions and velocities as input, we have shown that it is possible to accurately predict the future positions and velocities of soccer and basketball players (Fujii et al. 2023, Neural Networks). We have also used space evaluation techniques based on mathematical models, which quantify aspects like the potential scoring opportunities from a pass (Spearman et al. 2018, MIT SSAC). In particular, by using machine learning-based trajectory predictions as a benchmark, we have quantified how much a player’s movement contributes to increasing scoring opportunities for their team, thus successfully evaluating actions that create space for teammates (Teranishi et al. 2022, MLSA). This approach has led to the development of defensive evaluation methods, such as assessing team defense positioning through counterfactual predictions about player positions (Umemoto & Fujii 2023, Statsbomb Conf.).
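As a rough illustration of the "prediction as benchmark" idea, the sketch below compares a team value computed with a player's actual position against the same value computed with a reference position. Everything in it is an assumption made for illustration: `constant_velocity_reference` is a trivial stand-in for a learned trajectory predictor, `toy_scoring_value` is a hypothetical placeholder for a space evaluation model, and the pitch coordinates are invented. It is not the implementation from Teranishi et al. 2022.

```python
import math

# Assumed goal centre on a 105 x 68 m pitch with the origin at the centre spot.
GOAL = (52.5, 0.0)

def constant_velocity_reference(pos, vel, horizon_s):
    """Stand-in for a learned predictor: straight-line extrapolation."""
    return (pos[0] + vel[0] * horizon_s, pos[1] + vel[1] * horizon_s)

def toy_scoring_value(attackers, defenders):
    """Hypothetical team value: each attacker's distance to the nearest
    defender, discounted by that attacker's distance to goal."""
    total = 0.0
    for a in attackers:
        space = min(math.dist(a, d) for d in defenders)
        total += space / (1.0 + math.dist(a, GOAL))
    return total

def movement_contribution(attackers, idx, pos_prev, vel_prev, horizon_s, defenders):
    """Team value with the actual position of player `idx` minus the value
    with that player placed at the reference (predicted) position."""
    reference = list(attackers)
    reference[idx] = constant_velocity_reference(pos_prev, vel_prev, horizon_s)
    return toy_scoring_value(attackers, defenders) - toy_scoring_value(reference, defenders)

attackers = [(30.0, 5.0), (35.0, -10.0)]
defenders = [(40.0, 0.0), (38.0, -8.0)]
# Player 0 was at (28, 5) one second ago; compare two assumed velocities.
same = movement_contribution(attackers, 0, (28.0, 5.0), (2.0, 0.0), 1.0, defenders)
diff = movement_contribution(attackers, 0, (28.0, 5.0), (0.0, 3.0), 1.0, defenders)
print(same, diff)
```

When the actual position coincides with the reference, the contribution is zero; any deviation from the reference is credited or penalised according to how the toy value changes.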

Off-ball player evaluation metrics based on player trajectory prediction (Teranishi et al. 2022, MLSA). The trajectory generated by the prediction model serves as a reference, and the movement that an off-ball player sacrifices for a teammate is evaluated from the difference between the actual and reference movements.
A method for evaluating team defense positioning by predicting counterfactuals (Umemoto & Fujii 2023, Statsbomb Conf.). The method (a) chooses the location with the highest OBSO (off-ball scoring opportunity) value, i.e. the most dangerous location, (b) selects the closest defender and the grid around him, and (c) searches for the grid cell that reduces the OBSO value the most. It can thus evaluate the positioning that “reduced the threat the most”.
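The three steps (a) to (c) in the caption above can be sketched as a small grid search. This is only a sketch under stated assumptions: `toy_threat` is a hypothetical placeholder for OBSO (the real model is far richer), the 3 x 3 grid, the 2 m step, and the choice to measure the reduction at the most dangerous location are all illustrative, and none of the names come from the paper.

```python
import math

# Assumed goal centre on a 105 x 68 m pitch with the origin at the centre spot.
GOAL = (52.5, 0.0)

def toy_threat(loc, defenders):
    """Hypothetical stand-in for OBSO: higher near goal and far from defenders."""
    nearest = min(math.dist(loc, d) for d in defenders)
    return min(1.0, nearest / 10.0) / (1.0 + math.dist(loc, GOAL) / 10.0)

def best_counterfactual_position(candidates, defenders, step=2.0):
    # (a) pick the most dangerous candidate location
    danger = max(candidates, key=lambda c: toy_threat(c, defenders))
    # (b) pick the closest defender and a 3 x 3 grid of cells around him
    idx = min(range(len(defenders)), key=lambda i: math.dist(danger, defenders[i]))
    x, y = defenders[idx]
    grid = [(x + dx * step, y + dy * step) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

    # (c) find the cell that lowers the threat at that location the most
    def threat_if_moved(cell):
        moved = defenders[:idx] + [cell] + defenders[idx + 1:]
        return toy_threat(danger, moved)

    best = min(grid, key=threat_if_moved)
    reduction = toy_threat(danger, defenders) - threat_if_moved(best)
    return danger, idx, best, reduction

candidates = [(40.0, 0.0), (20.0, 10.0)]
defenders = [(36.0, 0.0), (25.0, 10.0)]
danger, idx, best, reduction = best_counterfactual_position(candidates, defenders)
print(danger, idx, best, round(reduction, 3))
```

Comparing the defender's actual position with `best` gives a simple notion of how far the real positioning was from the counterfactual that would have reduced the threat the most.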

In particular, research on trajectory prediction and research on space evaluation using mathematical models have developed separately: the former alone cannot evaluate players, and the latter alone can evaluate only the player receiving the ball. In this respect, the work of Teranishi et al. 2022 can evaluate sacrificed movements for teammates (for example, movements to create space), and the method of Umemoto & Fujii 2023 can, in principle, evaluate each defender. By combining both, we have proposed a framework that can, in principle, evaluate the movements of almost all players. However, the method of Teranishi et al. 2022 in particular required a huge amount of computational time to evaluate all 22 players, because it needed a separate trajectory prediction for each player to be evaluated. Furthermore, because it only learns the input-output relationship of the agent (player) model, that is, the mapping from state to action, it could not express tactical movements in pursuit of objectives such as scoring goals.

Reinforcement Learning

The second theme involves modeling players as agents who act to achieve rewards, using reinforcement learning as a framework capable of learning strategic actions. In exploring the landscape of sports analytics, studies such as Liu et al. (2018, IJCAI) and Rahimian et al. (2022, MIT SSAC) have made significant contributions to our understanding of player movements in team sports. These works frequently adopt a holistic approach by considering the team as a unified entity. This perspective has been invaluable for broad strategic analyses. However, this approach naturally focuses more on collective behavior than on the actions of off-ball players throughout the game. Recognizing the strengths and intended scopes of these studies, our research seeks to complement them by offering additional insights into the individual movements of off-ball players. To address these challenges, we proposed a deep reinforcement learning model based on a simplified reward structure, drawing inspiration from the Google Research Football environment (GFootball: Kurach et al. 2020, AAAI) and defining discrete actions like movement in eight directions, shooting, and passing, with each action’s value (Q-value) calculated by a deep learning model (Nakahara et al. 2023, IEEE Access). You can see my previous post on Medium.

Results of reinforcement learning from soccer data. (Left) Scene of Player A passing to Player B. (Right) Calculated Q-values for each action. See my previous post on Medium for details.
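To make the idea of per-action Q-values concrete, here is a minimal sketch of a discrete action set with eight movement directions, a shot, and a pass, together with a few ways of reading the values. The action names, the example numbers, and the helper `decision_gap` are hypothetical and chosen for illustration; in the actual work the Q-values come from a deep learning model, which is not reproduced here.

```python
import math

# Assumed action labels: eight movement directions plus shot and pass.
ACTIONS = [
    "move_N", "move_NE", "move_E", "move_SE",
    "move_S", "move_SW", "move_W", "move_NW",
    "shot", "pass",
]

def greedy_action(q_values):
    """Action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: q_values[a])

def softmax_policy(q_values, temperature=1.0):
    """Turn Q-values into action probabilities (numerically stable softmax)."""
    m = max(q_values.values())
    exps = {a: math.exp((q - m) / temperature) for a, q in q_values.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def decision_gap(q_values, chosen):
    """Illustrative score: how far the chosen action falls below the best one."""
    return max(q_values.values()) - q_values[chosen]

# Made-up Q-values for a single state, for illustration only.
q = {a: 0.0 for a in ACTIONS}
q["pass"] = 0.3
q["shot"] = 0.1

print(greedy_action(q))
print(round(decision_gap(q, "shot"), 3))
print(round(sum(softmax_policy(q).values()), 6))
```

A gap of zero means the player chose the action the model values most; larger gaps flag decisions the model considers less valuable, which is one simple way such Q-values could be turned into a per-action evaluation.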

As a next step, we are currently developing methods to evaluate the quality and accuracy of player decision-making using “Inverse Reinforcement Learning” and “Game Theory”. Inverse reinforcement learning estimates what players are aiming for (their reward function) from the data itself rather than from pre-defined rewards. Although inverse reinforcement learning has already been applied to soccer (Luo et al. 2020, IJCAI; Rahimian et al. 2021, MLSA), game theory has not been fully integrated. We have also explored applying game theory to soccer, presenting an analysis of scenarios like one-on-one shooting situations (Yeung & Fujii 2023, arXiv), but directly applying game theory to player reinforcement learning models such as that of Nakahara et al. 2023 remains challenging.

The current global trend, represented by AI technologies like large language models, involves using large-scale machine learning models for accurate sequence prediction (here, virtual simulation in soccer). Our lab has proposed a Transformer-based Neural Marked (Event) Spatio-Temporal Point Process model for predicting the type, location, and timing of soccer events from event data (Yeung et al. 2023, arXiv). We have also developed methods to create simulators based on mathematical models for reinforcement learning with real-world player data as demonstrations (Fujii et al. 2024, ICAART).

Challenges in Data Acquisition

Another significant problem is acquiring enough data. Machine learning models require large datasets, which must be either independently collected or purchased. While self-collected data can be used for small-scale models or for evaluation only, it is insufficient for training large-scale models. The volume of data is also crucial for enriching the insights derived from analysis. Our lab has been working with tracking and event data from a limited number of matches, which restricts evaluations to a small set of teams or players. Access to full-season data would enable comprehensive evaluations of key players across all teams. However, purchased data typically comes with usage restrictions that limit broader dissemination of the research.

To address this, our lab collaborates closely with the University of Tsukuba’s soccer team to work on publicly available soccer tracking datasets and algorithm research (Scott et al. 2022, CVSports). At present, the use of large-scale data is limited to top professional teams: not all categories have equal access to it, nor do all players receive high-quality coaching. Through our research, we aspire to create a world where such knowledge is accessible to everyone.

Research on public soccer tracking datasets and algorithms (Scott et al. 2022, CVSports). Toolkit available at https://github.com/AtomScott/SportsLabKit

In conclusion, our research in soccer analytics, leveraging machine learning, aims to deepen the understanding of the game’s intricacies. By addressing challenges related to data acquisition and exploring innovative methodologies, we hope to contribute significantly to the field of sports analytics, making sophisticated analyses accessible to a broader audience and improving the quality of coaching and performance analysis across all levels of soccer.

Keisuke Fujii

Associate Professor at Nagoya University, Graduate School of Informatics