Identify Game Tactics in Soccer by Clustering Positional Data

Robert Marzilger
7 min read · Jun 20, 2023


This article presents a data scientist's perspective on identifying game tactics in soccer, using a neural compression technique for fast clustering.

Motivation

In 2021 we showed how Siamese Neural Networks can be applied for dimensionality reduction of the high-dimensional positional data in soccer (Reeb, 2022), thereby permitting a fast search in the positional data. In this follow-up article we show, from a data scientist's perspective, how clustering these condensed data can be used to identify game tactics in soccer.

Introduction

Player tracking with camera-based systems is quite common in professional soccer. The camera images are converted to positional data and often manually annotated to identify game events, such as shots at goal or corner kicks. In recent years, one research direction on these positional data has been to find similar scenes (e.g., Sha et al., 2016; Löffler et al., 2022); another has been to identify team tactics (Memmert et al., 2017). The latter seems to be even more important, especially for coaches during post-game analysis or when preparing for the next opponent's tactics. One approach in that direction, presented by Narizuka et al. (2019), uses clustering to identify formations and possible formation changes during soccer games. In line with this approach, we show that clustering soccer scenes that contain standard situations can be used to identify differences in team tactics.

Methods

Dataset and preparation

From 317 games of the German Bundesliga season 2015/16 we extracted scenes of five seconds duration, resulting in approximately 1.2 million scenes. Each scene is natively represented by 5750 data points (23 trajectories in x and y for 125 time points, i.e., 5 seconds at 25 Hz), which we reduced to 64 features using Siamese Neural Networks. We have previously shown that this dimensionality reduction can significantly reduce the time needed to find similar scenes compared with the original (not reduced) dataset (Reeb et al., 2021). Consequently, we expect a significant speed-up for the cluster calculation as well.
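The size of a raw scene and the resulting compression factor follow directly from the numbers above. The short sketch below only illustrates that arithmetic; the array layout (trajectory, coordinate, time) and the random placeholder values are assumptions, not the format of the original dataset.

```python
import numpy as np

N_TRAJECTORIES = 23   # as stated in the article (assumed here: 22 players plus the ball)
N_COORDS = 2          # x and y
SCENE_SECONDS = 5
SAMPLE_RATE_HZ = 25
EMBEDDING_DIM = 64    # size of the reduced representation

n_time_points = SCENE_SECONDS * SAMPLE_RATE_HZ        # 125 time points
raw_dim = N_TRAJECTORIES * N_COORDS * n_time_points   # 5750 data points

# One placeholder scene with an assumed (trajectory, coordinate, time) layout.
rng = np.random.default_rng(0)
scene = rng.random((N_TRAJECTORIES, N_COORDS, n_time_points))
flat_scene = scene.reshape(-1)  # flattened raw feature vector

compression_ratio = raw_dim / EMBEDDING_DIM
print(raw_dim, flat_scene.shape, round(compression_ratio, 1))
```

Under these numbers, each embedded scene is roughly 90 times smaller than its raw representation, which is why distance computations in the embedding space are expected to be much cheaper.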

A note on reading the figures: in the graphical representation of a scene, the attacking team is shown in blue and plays from left to right. The defending team is shown in green and the ball in red.

Clustering approaches

The scenes in the embedding space were grouped according to the main event present in the scene. For the events goal kick and corner kick, we applied k-means and agglomerative clustering to identify different plays (e.g., the build-up phase after a goal kick) between teams in these standard situations. The number of clusters for k-means ranged from 2 to 10. For agglomerative clustering, we calculated the whole hierarchical tree, which represents the distances between the formed clusters, using Ward linkage. For further analysis the tree was cut at the first, second, and third level (i.e., split), resulting in 4, 8, and 16 clusters for the goal kick and corner kick events. For agglomerative clustering we also implemented the option to cut the tree at a specific distance in order to identify the different clusters manually. However, that approach was not evaluated, as it is subjective.
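As a rough illustration of the two clustering approaches, the following sketch uses scikit-learn for k-means and SciPy for the Ward hierarchy. It is not our production code: the random `embeddings` array stands in for the 64-dimensional scene embeddings of one event type, and the distance threshold is an arbitrary placeholder.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

# Placeholder for the 64-dimensional embeddings of one event type (assumption).
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(300, 64))

# k-means for every cluster number between 2 and 10.
kmeans_labels = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans_labels[k] = model.fit_predict(embeddings)

# Agglomerative clustering: build the full hierarchy once with Ward linkage.
tree = linkage(embeddings, method="ward")

# Cut the same tree into a fixed number of flat clusters (labels start at 1).
agglo_labels = {
    k: fcluster(tree, t=k, criterion="maxclust") for k in (4, 8, 16)
}

# Alternative: cut at a manually chosen merge distance (arbitrary value here).
distance_cut = float(np.median(tree[:, 2]))
labels_by_distance = fcluster(tree, t=distance_cut, criterion="distance")
```

Building the hierarchy once and cutting it several times is cheap, whereas k-means has to be refitted for every value of k.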

Results

Goal kick

For the goal kick event, k-means with 5 clusters showed 4 different ways to start the build-up play (i.e., short passes to the left and right and long passes to the left and right). The fifth cluster serves as a kind of garbage cluster and contains scenes that did not fit into the other clusters. Figure 1 shows the cluster distribution for k-means (k=5) and the medoid scene for each cluster.

When k-means was allowed more clusters, the scenes were distributed more distinctly across the clusters, i.e., the different plays were separated more finely.

Figure 1: The upper part shows the cluster distribution for k-means (k=5), with four different ways of build-up play after a goal kick (cluster ID 0–3) and a “garbage” group (cluster ID 4). The colors indicate each team's placement in the league at the end of the season, from the upper part of the table (green) to the lower part (red). The lower part of the figure shows the reference scenes for each cluster: long goal kicks to the right and left in the first two images and short goal kicks in the third and fourth images. The last image in the second row represents the “garbage” cluster.
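The reference scene of a cluster is its medoid, i.e., the member with the smallest summed distance to all other members of the same cluster. A minimal sketch of that selection, assuming Euclidean distance in the embedding space and using a tiny hand-made example rather than real scenes:

```python
import numpy as np


def medoid_index(points: np.ndarray) -> int:
    """Row index of the point with the smallest summed distance to all others."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # pairwise Euclidean distances
    return int(dists.sum(axis=1).argmin())


def cluster_medoids(embeddings: np.ndarray, labels: np.ndarray) -> dict:
    """Map each cluster ID to the global index of its medoid scene."""
    medoids = {}
    for cluster_id in np.unique(labels):
        members = np.flatnonzero(labels == cluster_id)
        local = medoid_index(embeddings[members])
        medoids[int(cluster_id)] = int(members[local])
    return medoids


# Toy data (not real scenes): two clusters of three points each.
embeddings = np.array(
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
     [10.0, 10.0], [10.0, 11.0], [10.0, 12.0]]
)
labels = np.array([0, 0, 0, 1, 1, 1])
print(cluster_medoids(embeddings, labels))  # {0: 1, 1: 4}
```

The pairwise distance matrix grows quadratically with cluster size, so for very large clusters a chunked computation or a subsample would be needed.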

The results for agglomerative clustering were similar to those of k-means clustering and are hence not shown.

In a second step, we reviewed the cluster distribution for the best and the worst team separately (according to the ranking at the end of the season). Compared with the general scene distribution (Figure 1, top), the worst team follows the average cluster distribution for build-up plays (Figure 2, red part of the bars), while the best team has a different playing style (Figure 2, green part of the bars). From this finding it can be argued that the best team tends to start the build-up phase with short passes from the goal (cluster ID 2, 3), while the worst team tends to start with a pass towards the halfway line (cluster ID 0, 1). The latter might be motivated by the wish not to lose the ball in the team's own half during the build-up phase. The identified differences in cluster distribution indicate that clustering of game scenes can be used to identify different build-up styles.

Figure 2: Comparison of different build-up plays for the best and worst team of the season. See Figure 1 for the differences in player and ball trajectories between the plays.
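A comparison like the one in Figure 2 boils down to computing, per team, the share of scenes that fall into each cluster. The sketch below shows one way to do that with NumPy; the team names and cluster assignments are invented placeholders, not values from our dataset.

```python
import numpy as np


def cluster_shares(labels: np.ndarray, n_clusters: int) -> np.ndarray:
    """Fraction of scenes in each cluster (assumes at least one scene)."""
    counts = np.bincount(labels, minlength=n_clusters)
    return counts / counts.sum()


# Invented example data: one team and one cluster ID per goal kick scene.
team_of_scene = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
cluster_of_scene = np.array([2, 3, 2, 0, 0, 1, 1, 4])
N_CLUSTERS = 5  # k-means with k=5, as in Figure 1

overall = cluster_shares(cluster_of_scene, N_CLUSTERS)
per_team = {
    str(team): cluster_shares(cluster_of_scene[team_of_scene == team], N_CLUSTERS)
    for team in np.unique(team_of_scene)
}

for team, shares in per_team.items():
    print(team, np.round(shares, 2), "overall:", np.round(overall, 2))
```

Normalizing to shares rather than raw counts keeps teams with different numbers of goal kicks comparable.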

Corner kick

In the following, we present the results for the corner kick event for agglomerative clustering only, as k-means provided no reasonable results. Cutting the hierarchy tree after the first or second level (i.e., 4 and 8 clusters) produced no usable clustering results. Cutting after the third split, however, revealed six types of successful corner kick. Figure 3 shows the successful (i.e., the ball reaches a player from the same team) short and long corner kicks from the left and right side. Based on the graphical representation of the cluster scenes, it is also possible to identify unsuccessful corner kicks (Figure 4): a) by the change of the attacking team and b) by the player who took the corner rushing into the field after the kick.

Figure 3: Successful corner kicks, identified by agglomerative clustering. Top row, left to right: reference scenes for cluster ID 0 to 2. Bottom row, left to right: reference scenes for cluster ID 3 to 5.

Figure 4: Unsuccessful corner kicks, identified by agglomerative clustering. Top row, left to right: reference scenes for cluster ID 12 and 13. Bottom row, left to right: reference scenes for cluster ID 14 and 15.

Knowing which clusters show successful and unsuccessful corner kicks, we can again compare the best and the worst team of the season. Figure 5 shows that the best team (green) had more corner kicks than the worst team (red) (119 vs. 89). The distribution of corner kicks further shows that the best team had more successful corner kicks (i.e., more corner kicks in cluster ID 0 to 5, Figure 5) than the worst team. However, the number of unsuccessful corner kicks did not differ much between the best and the worst team.

The distribution of successful corner kicks suggests that kicks from the left vary more, as agglomerative clustering found 4 clusters for corner kicks from the left but only 2 clusters for corner kicks from the right.

Figure 5: Distribution of successful and unsuccessful corner kicks between the best (green) and the worst (red) team of the season.
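Once the clusters are labeled as successful or unsuccessful, the per-team comparison is a simple count over cluster IDs. The following sketch assumes the 16-cluster cut with cluster ID 0 to 5 as successful and 12 to 15 as unsuccessful, as reported above; the two lists of cluster assignments are made up for illustration and are not our data.

```python
import numpy as np

# Cluster IDs as reported for the 16-cluster cut of the corner kick event.
SUCCESSFUL_IDS = np.arange(0, 6)      # cluster ID 0 to 5
UNSUCCESSFUL_IDS = np.arange(12, 16)  # cluster ID 12 to 15


def summarize_corner_kicks(cluster_ids) -> dict:
    """Count successful, unsuccessful, and remaining corner kick scenes."""
    ids = np.asarray(cluster_ids)
    successful = int(np.isin(ids, SUCCESSFUL_IDS).sum())
    unsuccessful = int(np.isin(ids, UNSUCCESSFUL_IDS).sum())
    return {
        "total": int(ids.size),
        "successful": successful,
        "unsuccessful": unsuccessful,
        "unassigned": int(ids.size) - successful - unsuccessful,
    }


# Made-up cluster assignments for two hypothetical teams.
team_a = [0, 1, 1, 3, 5, 7, 12, 14]
team_b = [2, 6, 9, 13, 15]
summary_a = summarize_corner_kicks(team_a)
summary_b = summarize_corner_kicks(team_b)
print(summary_a)
print(summary_b)
```

Scenes in the remaining clusters are reported separately so that they do not distort the success counts.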

Discussion

In this article, we showed that different clustering methods can be used to identify team tactics based on positional data from standard game events.

We applied k-means and agglomerative clustering to goal kick and corner kick events. However, k-means performed poorly on corner kicks. This indicates that the clusters in the data are not evenly separated and spherical, which is the setting in which k-means works best.

The distance-based approach of agglomerative clustering, on the other hand, works well for both event types, which indicates an uneven distribution of the different types of the same event.

In addition, both clustering approaches indicate that a certain number of scenes cannot be assigned to a useful cluster: the fifth cluster (cluster ID 4) for k-means clustering of the goal kick event, and cluster ID 6 to 11 for agglomerative clustering of the corner kick event. This might be a general issue, but the often insufficient synchronization between event and positional data might also be a reason. Moreover, our definition of a scene (i.e., 5 seconds long and 80% ball possession for one team) might have influenced the clustering outcome.

In conclusion, we showed that, for a very limited number of standard situations and clustering approaches, differences in game tactics between soccer teams can be identified. However, we are aware that our approach has some limitations (e.g., the 5-second scene length and data from only one soccer season) and needs further research before it can be accepted and possibly applied for game analysis in soccer.

We would like to encourage others, especially experts in the field of game analysis, to pick up our approach and consider it in their own research or get in contact with us to dive deeper into the presented approach together.

Acknowledgements

This work was supported by the Bavarian Ministry of Economic Affairs, Infrastructure, Energy and Technology as part of the Bavarian project Leistungszentrum Elektroniksysteme (LZE) and through the Center for Analytics-Data-Applications (ADA-Center) within the framework of “BAYERN DIGITAL II”.

Furthermore, I’d like to thank my colleague Nicolas Witt for his contribution to this article.

References

Löffler, C., Reeb, L., Dzibela, D., Marzilger, R., Witt, N., Eskofier, B., & Mutschler, C. (2022). Deep Siamese Metric Learning: A Highly Scalable Approach to Searching Unordered Sets of Trajectories. ACM Transactions on Intelligent Systems and Technology, 13(1).

Memmert, D., Lemmink, K. A. P. M., & Sampaio, J. (2017). Current Approaches to Tactical Performance Analyses in Soccer Using Position Data. Sports Medicine, 47(1).

Narizuka, T., & Yamazaki, Y. (2019). Clustering algorithm for formations in football games. Scientific Reports, 9(1).

Reeb, L. (2022). Searching for Soccer Scenes using Siamese Neural Networks.

Sha, L., Lucey, P., Yue, Y., Carr, P., Rohlf, C., & Matthews, I. (2016). Chalkboarding: A New Spatiotemporal Query Paradigm for Sports Play Retrieval. 21st International Conference on Intelligent User Interfaces.
