Using Markov Models To Analyse Defensive Strategies In Modern Football: Simplified & Explained

Gaurav Krishnan
After The Full Time Whistle
May 18, 2023


With the advent of technology, and of course AI, and the growing application of analytical tools and mathematical models to football, the spectrum of football analysis has widened, giving coaches, managers and staff added insights with which to make more informed decisions on the pitch.

This generation of football fans has access to a wide range of data and analytics to deepen their knowledge and take a deep dive into the game, which is starkly different from when I was growing up.

But that bit of reflection aside: in an older article, which you can use for reference, I explained how Markov models can be used to calculate expected threat (xT).

However, in this article, we will explore how Markov Models can be used to analyse defensive strategies in the modern game.

In this article, I try to explain in simple terms how a team can work out where and how to deploy its defensive tactics in order to prevent its opponents from scoring.

This is done in a bit of a complicated way using something known as a Markov decision process or MDP.

What Is a Markov Decision Process (MDP)?

Markov models are one of the most commonly used modelling formalisms in AI. Due to their ability to model dynamic environments and decision making, Markov models have been used to tackle a variety of different real-world problems.

The example below uses the analogy of a remote-control car to explain, in as simple terms as possible, what a Markov decision process (MDP) is.

A Markov Decision Process (MDP) is like a game where we have to make decisions, and each decision leads to a different outcome. But we can’t be sure what outcome we’ll get because there’s some randomness involved.

Imagine you have a toy car that you can control with a remote control. You want to make the car go to a specific place, but there are some obstacles in the way. You have to make a series of decisions to get the car to the destination, but each decision has a chance of succeeding or failing.

To help you make the best decisions, you can use an MDP. In an MDP, you have a set of states, which are like the different positions the car can be in. You also have a set of actions, which are like the things you can do with the remote control, such as turning left or right. Each action has a chance of moving you to a new state.

For example, if the car is in a state where it’s facing a wall, and you turn left, the car might hit the wall and stay in the same state. But if you turn right, the car might be able to move to a new state where it’s facing a clear path.

Each state also has a reward, which is like a score that tells you how good that state is. For example, if the car is in a state where it’s close to the destination, the reward might be high. But if the car is in a state where it’s stuck behind a big obstacle, the reward might be low.

The goal of the game is to find the best sequence of actions to take to get the highest total reward. You can use math to calculate the probability of each action succeeding and the reward for each state. Then you can figure out the best sequence of actions to take to get the highest total reward.

So, in summary, an MDP is like a game where you have to make decisions to get to a goal, but there’s some randomness involved. You can use math to figure out the best sequence of decisions to take to get the highest total reward.
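
To make the analogy concrete, here is a minimal sketch of that toy-car game in Python. Everything in it (the states, the actions, the probabilities and the rewards) is invented purely for illustration; the point is only to show how value iteration turns the "math" described above into a best action for each state.

```python
# A tiny, made-up MDP for the toy-car analogy (all numbers are illustrative).
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "facing_wall": {
        "turn_left":  [(0.8, "facing_wall", 0.0), (0.2, "clear_path", 0.0)],
        "turn_right": [(0.9, "clear_path", 0.0), (0.1, "facing_wall", 0.0)],
    },
    "clear_path": {
        "drive":     [(0.7, "destination", 1.0), (0.3, "facing_wall", 0.0)],
        "turn_left": [(1.0, "facing_wall", 0.0)],
    },
    "destination": {},  # terminal state: the game ends once the car arrives
}

gamma = 0.9  # discount factor: how much a future reward is worth today

# Value iteration: repeatedly back up the best expected return from each state.
values = {s: 0.0 for s in transitions}
for _ in range(100):
    for state, actions in transitions.items():
        if not actions:  # terminal state keeps value 0 (its reward was earned on entry)
            continue
        values[state] = max(
            sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)
            for outcomes in actions.values()
        )

# Greedy policy: in every state, pick the action with the highest expected return.
policy = {
    state: max(actions, key=lambda a: sum(p * (r + gamma * values[nxt])
                                          for p, nxt, r in actions[a]))
    for state, actions in transitions.items() if actions
}
print(values)
print(policy)
```

Running it prints each state's value (the best total reward the car can expect from there) and the greedy policy built from those values, which is exactly the "best sequence of decisions" the analogy talks about.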

The Steps Involved In Determining The Objective

In general, determining an objective using Markov models involves the following steps:

  1. Markov Decision Processes (MDPs): MDPs are mathematical models used to study decision-making in situations with uncertainty. In the context of soccer, an MDP can represent the game as a sequence of states, actions, and rewards. Each state represents a particular situation, each action represents a decision a player or team can make, and rewards indicate the desirability of a state or action.
  2. Model Checking: Model checking is a technique that involves analyzing a model to verify its properties or find optimal solutions. In the paper discussed below, the authors use model checking to analyze the MDP model of football and provide tactical advice. They explore different sequences of actions and evaluate their outcomes to find optimal strategies.
  3. Value Iteration: Value iteration is an algorithm used to compute the optimal values (or scores) of states in an MDP. It starts with an initial estimate of the values and repeatedly updates them until they converge to their optimal values. This process helps determine which states are more favorable or valuable in achieving specific goals.
  4. Policy Improvement: In the context of MDPs, a policy is a strategy that specifies which action to take in each state. The authors use policy improvement techniques to refine the strategies derived from the MDP model. By evaluating and adjusting policies based on the calculated values of states, they aim to improve the team’s performance.

The Paper

Now that that's out of the way, we come to the paper, which applies MDPs to football with a specific objective in mind: defense, i.e. how a team can minimize its risk of conceding a goal.

The paper is called: Analyzing Learned Markov Decision Processes using Model Checking for Providing Tactical Advice in Professional Soccer by Maaike Van Roy, Wen-Chi Yang, Luc De Raedt and Jesse Davis.

Abstract

The paper's abstract lays out the end objective, which is to prevent the opposition from scoring.

Markov models are commonly used to model professional sports matches as they enable modelling the various actions players may take in a particular game state. In this paper, our objective is to reason about the goal-directed policies these players follow. Concretely, we focus on soccer and propose a novel Markov decision process (MDP) that models the behavior of the team possessing the ball.

To reason about these learned policies, we employ techniques from probabilistic model checking. Our analysis focuses on defense, where a team aims to minimize its risk of conceding a goal (i.e., its opponent scores).

Specifically, we analyze the MDP in order to gain insight into various ways an opponent may generate dangerous situations, that is, ones where the opponent may score a goal, during a match.

Then, we use probabilistic model checking to assess how much a team can lower its chance of conceding by employing different ways to prevent these dangerous situations from arising.

Finally, we consider how effective the defensive strategies remain once the offensive team adapts to them. We provide multiple illustrative use cases by analyzing real-world event stream data from professional soccer matches in the English Premier League.

How The Authors Construct Their MDP Model With The Above Objective

The MDP’s state space consists of locations on the pitch, and actions involve moving between these states or shooting on goal. The probabilistic model checker PRISM is used to reason about the MDP in different ways, such as understanding how an opponent may generate a scoring opportunity, evaluating the effectiveness of defensive strategies, and estimating their effectiveness even if the opponent adapts to them.

The objective is to gain insight into various ways an opponent may generate dangerous situations where they can score a goal, and to assess how much a team can lower its chance of conceding by employing different ways to prevent these situations from arising.

Players continue to perform actions until one of two absorbing states is reached: a goal is scored and they receive a reward of 1

or

the possession ends (e.g., a turnover occurs, a shot is taken and missed, etc.) and they receive a reward of 0.

The value of a non-absorbing state is then the probability of eventually scoring from that state, which can be obtained using the standard dynamic programming approach.
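
Written out, that recursion is the standard one (my own rendering, not an equation copied from the paper): with a reward of 1 only on the scoring absorbing state and no discounting, the value of a field state f under the team's policy π is exactly the probability of eventually scoring from f, and it satisfies

```latex
V^{\pi}(f) \;=\; \sum_{a} \pi(a \mid f)\,\Big[\, P(\mathit{goal} \mid f,a)\cdot 1 \;+\; \sum_{f' \in \mathit{field}} P(f' \mid f,a)\, V^{\pi}(f') \Big]
```

with V^π fixed to 0 at the two non-scoring absorbing states. Iterating this update (or solving the corresponding linear system) is the dynamic programming step the authors refer to.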

An example of a possession sequence for Manchester City is provided in Figure 1 below.

Figure 1: Four actions in event stream format recorded from Manchester City versus Liverpool on 14/1/2018

The MDP for this paper is defined as consisting of states (S), actions (A), transition probabilities (P), a reward function (R), and a discount factor (γ).

The set of states (S) consists of 89 field states and three absorbing states: loss of possession, failed shot, and successful shot.

The partitioning of the field states is fine-grained where chances of scoring are higher and more coarse-grained where goal-scoring chances are lower. This ensures sufficient data in each state while remaining fine-grained enough to capture important differences between locations.

The set of actions (A) includes moving to any field state and shooting.

The transition probabilities (P) are defined for both the absorbing and field states. For the field states, the transition probabilities are defined for successfully moving to a new field state, losing possession, or scoring a goal.

The reward function (R) is 1 only when a goal is scored and 0 otherwise.

To explain the above further, in a bit of a simplified manner:

  1. States (S): The MDP consists of different states that represent specific situations on the football field. In this case, there are 89 field states, which are different locations on the field, and three absorbing states: loss of possession, failed shot, and successful shot. These states capture different scenarios that can occur during a football match.
  2. Actions (A): Actions refer to the decisions that a player can make in a given state. The set of actions includes moving to any field state on the football field and shooting.
  3. Transition Probabilities (P): Transition probabilities describe the likelihood of moving from one state to another after taking a particular action. In this paper, the transition probabilities are defined for both the absorbing states (such as loss of possession, failed shot, and successful shot) and the field states. They specify the probabilities of successfully moving to a new field state, losing possession, or scoring a goal.
  4. Reward Function (R): The reward function assigns a value to each state-action pair in the MDP. In this case, the reward function is simple: it assigns a reward of 1 when a goal is scored and 0 otherwise. So, scoring a goal is considered a desirable outcome and receives a reward of 1, while other actions or states receive a reward of 0.
  5. Discount Factor (γ): The discount factor is a number between 0 and 1 that determines the importance of future rewards in the decision-making process. It reflects how much we value immediate rewards compared to rewards in the future. However, the specific value of the discount factor is not mentioned in the text.

The long-term value of a state is given by the value function, which considers the policy to be followed and calculates the sum of the rewards discounted over time.
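
For reference, this is just the textbook definition of a state's value under a policy π; with the reward structure above it effectively reduces to the probability of eventually scoring.

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \;\middle|\; S_0 = s \right]
```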

The policy and transition model for a team can be learned from historical data by estimating all probabilities with simple counts. However, the chosen action space complicates this process, requiring the identification of intended end locations of failed movement actions.

To solve this problem, the authors of the paper use a gradient boosted trees ensemble to predict the intended end location of actions based on several characteristics of the actions and what happened before those actions.
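
To give a feel for the "simple counts" part, here is a rough Python sketch (not the authors' code). The event format, team name and state labels are invented for illustration, and the step of inferring intended end locations for failed actions is left out entirely.

```python
from collections import Counter, defaultdict

# Hypothetical event records: (team, start_state, action, end_state), where
# end_state is either a field state or one of the absorbing states
# "goal", "failed_shot", "loss_of_possession".
events = [
    ("Manchester City", "f42", "move_f43", "f43"),
    ("Manchester City", "f43", "shoot", "goal"),
    ("Manchester City", "f43", "move_f44", "loss_of_possession"),
    # ... thousands more events from the event stream data
]

policy_counts = defaultdict(Counter)      # how often each action is chosen in a state
transition_counts = defaultdict(Counter)  # where each (state, action) pair ends up

for team, start, action, end in events:
    if team != "Manchester City":
        continue
    policy_counts[start][action] += 1
    transition_counts[(start, action)][end] += 1

# Normalise the counts into probabilities: pi(a | s) and P(s' | s, a).
policy = {s: {a: n / sum(c.values()) for a, n in c.items()}
          for s, c in policy_counts.items()}
transitions = {sa: {s2: n / sum(c.values()) for s2, n in c.items()}
               for sa, c in transition_counts.items()}
```

The complication mentioned above is that a failed pass still has to be counted as some "move to state X" action even though it ended in a turnover, and predicting that intended X is exactly what the gradient boosted trees ensemble is for.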

Probabilistic Model Checking

In this section, the authors of the paper introduce probabilistic model checking, which is a technique used to verify whether a probabilistic system satisfies a specific property. Probabilistic model checkers such as PRISM and STORM are commonly used to provide quantitative guarantees for systems with probabilistic behavior.

The authors focus on reachability related properties in PRISM to reason about how an opponent reaches a dangerous situation. They use the probabilistic reachability property in PCTL∗ to evaluate whether a property holds in a state, which returns a true or false value.

The authors specifically look at quantitative properties of the form P=?[α], which query the probability that α holds, returning a real number in the range [0,1].

In football, possession sequences can start anywhere on the field, and a property can be evaluated for any initial location or state. The authors use Pf=?[prop] to indicate that a property prop is evaluated for a state f, which returns the probability that prop holds in f.
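
As a concrete, purely illustrative instance of that notation: the probability that a possession starting in field state f eventually reaches one of the shot states can be phrased as a quantitative reachability query, where F is the temporal "eventually" operator and shot is a label for the set of shot states,

```latex
P_{f=?}\big[\, \mathrm{F}\ \mathit{shot} \,\big] \;\in\; [0,1].
```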

Reasoning About MDPs

This section of the research paper focuses on the application of the methodology mentioned earlier to learn team-specific policies and transition models of the underlying MDPs in the context of soccer. The authors use real-world event stream data from the 2017/18 and 2018/19 English Premier League (EPL) seasons, provided by StatsBomb, to create a team-specific MDP for each of the 17 teams that played in both seasons.

The team-specific policy, together with the MDP, is transformed into an MRP (Markov Reward Process) that probabilistic model checkers such as PRISM can analyze.

By checking different properties against the MRP, the authors gain insight into how the team reacts to different situations. The objective of a football team is to score goals, which can be specified as a probabilistic reachability property. By checking this property against the team’s MRP, the authors produce a value function that assigns a probability of scoring to each location on the field, representing the team’s threat level.
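
Here is a rough numpy sketch of that step (not the authors' code): folding the learned policy into the MDP gives the MRP's one-step transition matrix, and the probability of eventually reaching the "goal" absorbing state from each field state, i.e. the threat level, then falls out of a linear solve, which is essentially what the model checker computes for this reachability query. All the numbers below are random placeholders standing in for the learned probabilities.

```python
import numpy as np

n_field = 89                   # field states 0..88
GOAL, MISS, LOSS = 89, 90, 91  # absorbing states: successful shot, failed shot, turnover
n_states = 92
n_actions = 90                 # 89 "move to field state j" actions + 1 "shoot" action

# Placeholder inputs, e.g. produced by the count-based estimation sketched earlier:
# P[a, s, s'] = transition probability per action, pi[s, a] = team policy.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_field))  # dummy numbers
pi = rng.dirichlet(np.ones(n_actions), size=n_field)             # dummy policy

# Fold the policy into the MDP: one-step distribution of the MRP from each field state.
T = np.einsum("sa,asj->sj", pi, P)  # shape (89, 92)
Q = T[:, :n_field]                  # field-state -> field-state part
g = T[:, GOAL]                      # field-state -> goal part

# Probability of eventually scoring: v = Q v + g  =>  (I - Q) v = g
v = np.linalg.solve(np.eye(n_field) - Q, g)
print(v[:5])  # scoring probability ("threat") of the first few field states
```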

The threat level can be used by the opponent to reason about the effect of possible defensive tactics, which is the focus of this paper.

The authors demonstrate how one can reason about possible strategies for reducing the opponent’s scoring opportunities by analyzing the opponent’s MRP (i.e., MDP with a fixed policy) and forcing them to avoid certain critical locations on the field.

Specifically, they look at the crucial locations for generating shots and buildup play, and assess how effective these defensive strategies remain after the offensive team adapts to them. This approach provides valuable insights into the decision-making process of football teams that can help coaches and analysts optimize their team’s performance by analyzing and adapting to different strategies.

Shot Suppression

Note: The mathematics might be slightly complicated; however, I've tried to explain it using whatever mathematical knowledge I have from my undergrad B.E. engineering degree days, especially when it comes to probabilistic computing.

Coming to the meat of the paper, in the Shot Suppression section:

The paper presents two approaches for suppressing shots in football games: indirect shot suppression and direct shot suppression.

The yellow shaded region in Figure 2, which includes shot locations, accounts for approximately 91% of all shots taken during a game.

And of course, as you can see, this region sits close to the goal: inside the box and just outside it.

Figure 2: The gray lines denote the field states used in the MDP. A team attacks from left to right. The partitioning is more fine-grained near the opponent's goal and more coarse-grained in the defensive half of the pitch. The colored regions denote zones used in the verification queries: yellow denotes the area where most shots are taken from, gray denotes the final third entry region, and dark blue denotes the middle third of the pitch.

Indirect shot suppression aims to limit the number of times the opponent reaches this region, thereby indirectly suppressing shots.

To determine how likely an opponent is to reach shot locations from a given location, the probability of reaching shot locations from a location f is computed using a query:

Formula 2

To assess the effect of a counterfactual policy where the opposing team avoids entering a specific location f0, the probability of reaching shot locations from f is computed using:

Formula 3

By computing the percentage decrease in the probability of reaching shot locations when forced to avoid f0 for all locations in non-shot, Formula 4 can be used to measure the importance of f0 for indirectly suppressing shots.

Formula 4
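
The formulas aren't reproduced here, but based on the surrounding description they plausibly take a shape like the following (my reconstruction, so treat the notation as indicative rather than the paper's exact formulation): the baseline probability of reaching the shot zone from f, the same probability when the attacking team must avoid f0 on the way there (an "avoid-until" query), and an aggregate, for example the average, of the relative decrease over all non-shot states as the importance of f0.

```latex
p(f) = P_{f=?}\big[\, \mathrm{F}\ \mathit{shot} \,\big], \qquad
p_{\lnot f_0}(f) = P_{f=?}\big[\, \lnot f_0 \ \mathrm{U}\ \mathit{shot} \,\big],

\mathrm{importance}(f_0) \;=\; \frac{100}{|\mathit{nonshot}|}
\sum_{f \in \mathit{nonshot}} \frac{p(f) - p_{\lnot f_0}(f)}{p(f)}.
```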

Manchester City's and Burnley's most important states for indirect shot suppression lie centrally just outside the shooting locations and on the right side of the penalty box, respectively, as shown in Figures 3(a) and 3(b).

Preventing Manchester City and Burnley’s entry into their most important states decreases their chance of reaching the shooting locations by almost 20% and just over 9%, respectively.

Figure 3: The percent decrease in reaching ((a) and (b)) and shooting from ((c) and (d)) the common shot locations (in gray) for Manchester City (MC) and Burnley (B). Yellow shading indicates states with a larger decrease, whereas dark blue shading indicates a smaller decrease. The three states with the largest decrease are labeled in each figure. Manchester City experiences the biggest decreases in the (deep) central areas with less impact on the flanks. In contrast, Burnley experiences large decreases in the central areas near the box and on the right flank.

As explained earlier, the field states are fine-grained where chances of scoring are higher and more coarse-grained where goal scoring chances are lower.

Direct shot suppression, on the other hand, aims to limit the number of shots an opponent takes, reducing the chances of conceding. To assess a team's likelihood of generating shots, the probability of a sequence starting in location f and eventually resulting in a shot from the shot locations is computed using:

Formula 5

The probability of shooting is computed using the below equation and is used to assess the effect of forcing the opposing team to avoid entering a specific location f0.

Formula 6

Formula 7 can then be used to measure each state’s importance for directly suppressing shots by computing the percentage decrease in the probability of shooting when forced to avoid f0 for all locations in non-shot.

Formula 7

Manchester City's and Burnley's most important states for direct shot suppression lie centrally and on the right side of the penalty box, respectively, as shown in Figures 3(c) and 3(d).


Preventing entry into these locations decreases Manchester City and Burnley’s chances of shooting by almost 10% and 4%, respectively.

So positioning players or pressing in these regions can decrease both teams' chances of scoring.

Movement Suppression

In this section of the paper, the authors introduce a method for suppressing the movement of the opponent team in order to reduce their chance of scoring during the build-up phase of an attack.

The method focuses on two regions of the field: the final third entry region and the middle third region. The authors define a set of states in these regions that a defending team should prevent their opponent from entering in order to decrease the opponent's chance of scoring by a certain percentage. Formulas 8 and 9 are used to formulate this query.

Formula 8 computes the difference in the probability of scoring between a given state and a set of states that the opponent should avoid.

Formula 9, meanwhile, finds a cluster of states around a given state where the probability of reaching that state from any of the other states in the cluster is greater than a threshold value.

Formula 8
Formula 9

The authors then proceed to demonstrate the effectiveness of this method by applying it to Manchester City and Liverpool.

By avoiding certain areas of the field, the authors were able to reduce the chance of scoring from each of the final third entry states by at least 10%.

For Manchester City, the crucial areas to avoid were located around the center and left side of the field, where their creative players operate.

For Liverpool, the crucial areas to avoid were a mirrored version of Manchester City's, located on the middle and right side of the field, where their attacking wing-back and player of the season Trent Alexander-Arnold operates.

The authors also show that decreasing the chance of scoring by at least 1% in each state in the middle third of the field can be achieved by forcing the opponent team to avoid the center of their defensive third.

Evaluating the Effect of Adapting the Policy

Here, the authors explore the effect of a team adapting their old policy (π) towards a new policy (π0) in response to an opponent’s strategy of forcing them to avoid certain areas.

The new policy (π0) simply stops trying to reach locations in the area, and the lost probability mass is redistributed over all other states. By fixing the new policy (π0) in the team’s MDP, the authors reason about the effect on the chances of scoring while adapting to being forced to avoid the area. This is quantified using Formula 11, where Vπ(f) and Vπ0(f) represent the expected value of state f under policies π and π0, respectively.

Formula 10
Formula 11
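
Here is a small illustrative sketch of the adaptation step (my own reading of "the lost probability mass is redistributed over all other states", done proportionally here): strip out the actions that move into the forbidden area, renormalise what is left, and recompute the scoring probabilities under the new policy. The action-naming scheme and helper names are assumptions made for the example.

```python
def adapt_policy(policy, forbidden_states):
    """Drop actions that move into forbidden states and redistribute the lost
    probability mass proportionally over the remaining actions.

    policy: {state: {action: probability}}, with actions named e.g. "move_f17"
    or "shoot" (this naming scheme is assumed purely for illustration).
    """
    new_policy = {}
    for state, action_probs in policy.items():
        kept = {
            a: p for a, p in action_probs.items()
            if not (a.startswith("move_") and a[len("move_"):] in forbidden_states)
        }
        total = sum(kept.values())
        if not kept or total == 0.0:
            # Degenerate case: this state only ever moved into the forbidden area,
            # so there is nothing sensible to renormalise; keep the old behaviour.
            new_policy[state] = dict(action_probs)
            continue
        new_policy[state] = {a: p / total for a, p in kept.items()}
    return new_policy


def relative_decrease(v_old, v_new):
    """Per-state percentage decrease in scoring probability (Formula 11 in spirit).
    Assumes v_old[f] > 0 for every state f."""
    return {f: 100.0 * (v_old[f] - v_new[f]) / v_old[f] for f in v_old}
```

Recomputing the values under the adapted policy and comparing them with the original ones, state by state, is what produces the 16.9% versus 3.9% and 12.3% versus 4% numbers discussed below.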

The authors illustrate the effect of Manchester City and Liverpool adapting their policies to an opponent’s strategy of forcing them to avoid certain areas.

Forcing Manchester City to avoid the blue area in Figure 4a will decrease their chance of scoring by 16.9%. However, if they adjust their policy, the decrease is reduced to 3.9%.

For Liverpool, forcing them to avoid the blue area in Figure 4e will decrease their chance of scoring by 12.3%. When they adjust their policy, this decrease is reduced to 4%.

Figure 4: Illustrates for Manchester City (top row) and Liverpool (bottom row) four areas (blue) to prevent them from reaching in order to decrease their chances of scoring in each final third entry state by at least 10% and in each middle third state by at least 1%.

To put that into context with the example of players: if the defensive strategy against Man City positions players in the blue area of Figure 4a, which is where their creative midfielders De Bruyne and Bernardo Silva often drop in and operate, City's chances of scoring decrease, as mentioned, by 16.9%.

Whereas for Liverpool, the blue area in Figure 4e is where Trent Alexander-Arnold and Mo Salah operate, arguably Liverpool's most potent and creative players, who offer a significant goal threat. By pressing or positioning defenders in that area of the pitch, Liverpool's chances of scoring decrease by 12.3%.

While the decrease is less impressive in the latter case for both teams, this still represents a reasonable reduction, certainly given that adapting one's strategy is hard.

Conclusion

As the authors write in the paper’s conclusion:

We have shown how machine learning techniques can learn a model that can be used to reason about goal-directed policies in the complex dynamic environment of professional soccer. We believe that our approach is also applicable to other environments. While there are no strong guarantees about the model's correctness as would be required in a verification context, it clearly supports reasoning about strategies and policies with respect to safety (i.e. reducing the chance of conceding).

Furthermore, visualizing the results of the queries can help human soccer experts better understand the effects of potential strategies, which in turn contributes to trustworthy AI. From an application perspective, the proposed approaches can form a basis for future tactical analysis in sports.

Resources

Van Roy, M., Yang, W.-C., De Raedt, L., & Davis, J. (2021). Analyzing Learned Markov Decision Processes using Model Checking for Providing Tactical Advice in Professional Soccer.
