Evaluating Player Impact Judiciously — T20

Evidence weighted normalized evaluation of players in terms of contribution to victory based on situations & conditions

Amol Desai
Boundary Line
12 min read · Aug 30, 2020

In a previous piece, I talked about how context plays an important role when evaluating players and their performances. I discussed in detail a methodology I developed that gives more importance to stronger evidence and uses match, competition and venue level contextual information to produce a “judicious” evaluation of players. I developed metrics capturing the rate of scoring and the duration of survival for batsmen, and the rate of conceding runs and the rate of taking wickets for bowlers.

Here, I will discuss a complementary model producing match impact metrics to evaluate players based on their influence on the outcomes of matches. These metrics are intended to provide additional color and granularity to the performance based metrics, but they are heavily influenced by match, opposition and team context, and I feel that they are less useful when used in isolation.

Although available in the link above, as a recap, here are the base metrics we have so far:

  1. Evidence Weighted Relative RPO (EWR-RPO)
  2. Evidence Weighted Survival Factor (EWSF) (for batsmen)
  3. Evidence Weighted Relative WicketsPerOver (EWR-WPO) (for bowlers)

To these, we will add:

  4. Evidence Weighted Relative Win Probability Added (EWR-WPA)
  5. Evidence Weighted Normalized Relative Win Probability Added (EW-NRWPA)
  6. Evidence Weighted Inn/Over Win Probability Added (EW-Inn/Over-WPA)

Why do we need this?

Simply put, match outcome based metrics evaluate the impact of a player’s performance on the match, whereas performance based metrics evaluate the player’s performance at the ball level. The two should be correlated, but looking at both perspectives lets us differentiate a player’s contributions in matches.

Here is a sneak peek showing how the match impact based metric directly helps disambiguate player contribution:

Avg. batting metrics & match results

This disambiguation especially comes into play in situations where a player may have delivered a performance way above or below average when the result was a foregone conclusion. In such cases, the match outcome based metric assigns the appropriate impact to the player.

Additionally, this allows combining batting and bowling contributions in the same metric. Performance based metrics, especially the dismissal related ones, don’t combine as elegantly.

Context for Match Outcome

In order to develop a metric for win probability added, we first need to build a model to assess win probability. I used a similar approach to the one in my previous piece, to get contextual information for the model. Here, I did not use the batsman and bowler specific information and instead let context be defined at the game level and not at the individual player level.

This is because win probability is defined at the game level, not at the matchup level and so individual state shouldn’t be a part of “context” here.

A Dual Model Approach

In the second innings, the required run rate, which in turn is determined by the total being chased and the current state of the chase, is a key determinant of victory.

In order to use a consistent model to evaluate win probability across both innings, rather than having two completely different models, I defined win probability as the probability of a successful chase.

In the first innings, the probability of the batting team winning is still based on whether their posted total will be successfully chased down in the second innings. So, for the first innings, I built a model to predict the score of the batting team at the end of the innings, and for the second innings, I built a model to assess the probability of a successful chase. I then chained these models together to get win probability for the first innings using the model that I built for the second innings.
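The chaining step can be sketched roughly as follows. This is only an illustration of the plumbing: `predict_final_total` and `chase_win_probability` are hypothetical toy stand-ins with made-up coefficients, not the actual models, and the feature set (runs, wickets, balls bowled) is an assumption.

```python
import math

BALLS_PER_INNINGS = 120  # T20


def predict_final_total(runs, wickets, balls_bowled):
    """Toy stand-in for the first-innings score model (hypothetical)."""
    balls_left = BALLS_PER_INNINGS - balls_bowled
    # Runs per ball so far; fall back to an assumed 1.25 before any ball is bowled.
    current_rate = runs / balls_bowled if balls_bowled else 1.25
    # Dampen the projected rate as wickets fall; purely illustrative.
    resources = (10 - wickets) / 10
    return runs + current_rate * balls_left * (0.6 + 0.4 * resources)


def chase_win_probability(target, runs, wickets, balls_bowled):
    """Toy stand-in for the second-innings model: P(successful chase)."""
    balls_left = BALLS_PER_INNINGS - balls_bowled
    runs_needed = target - runs
    if runs_needed <= 0:
        return 1.0
    if balls_left == 0 or wickets >= 10:
        return 0.0
    required_rate = runs_needed / balls_left
    achievable_rate = 1.3 * (0.6 + 0.4 * (10 - wickets) / 10)
    # Logistic curve: the chase gets less likely as the required rate climbs.
    return 1 / (1 + math.exp(8 * (required_rate - achievable_rate)))


def first_innings_chase_probability(runs, wickets, balls_bowled):
    """Chain the two models: project the total, then evaluate the chase
    from the first ball of the second innings."""
    projected_total = predict_final_total(runs, wickets, balls_bowled)
    target = round(projected_total) + 1
    return chase_win_probability(target, runs=0, wickets=0, balls_bowled=0)
```

Since win probability is defined here as the probability of a successful chase, the win probability of the team batting first would be `1 - first_innings_chase_probability(...)`.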

In line with not using batter or bowler specific information for the matchup on each delivery, unlike the previously discussed model for predicted ball level outcomes, I did not use the batting position of the striker in this model. Adding something contextual to differentiate between scenarios like a) 2 wkts down with a partnership of 0 between batters 3 and 4 or b) 2 wkts down with a partnership of 0 between batters 1 and 4 could be useful and I may add position of the more senior batsman as a feature to the model in a later revision.

Let’s take a brief look at how this model performs. Here, I used all deliveries from 1000 matches to train the model and then ran the model to predict win probability on all deliveries of 483 new matches that the model had not previously seen. I then divided the respective deliveries into several bins based on their predicted win probability. The below plots show the actual win probability in these bins against the average win probability of predictions in the respective bins.

We see that the model does fairly well in both innings across the board. For the first innings, it is a bit overconfident — about 10% at the extremes, on average.
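The binning check described above can be sketched like this. It is a minimal sketch assuming equal-width bins and plain Python lists; the function name and the choice of 10 bins are illustrative assumptions, not details from the original analysis.

```python
def calibration_table(predicted, won, n_bins=10):
    """Bin deliveries by predicted win probability and compare the mean
    prediction in each bin with the actual win rate in that bin.

    predicted: predicted win probabilities in [0, 1], one per delivery
    won:       1 if that prediction's team went on to win, else 0
    Returns a list of (mean_predicted, actual_rate, count) for non-empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, w in zip(predicted, won):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, w))

    table = []
    for rows in bins:
        if not rows:
            continue
        mean_predicted = sum(p for p, _ in rows) / len(rows)
        actual_rate = sum(w for _, w in rows) / len(rows)
        table.append((mean_predicted, actual_rate, len(rows)))
    return table
```

A well calibrated model has `mean_predicted` close to `actual_rate` in every bin; an overconfident one shows predictions more extreme than the actual rates in the outer bins.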

There is obviously a lot more of the match still to be played when the first innings is in progress, making it harder to predict the winner accurately, so predictions are bound to be less accurate (this is actually not a big problem for binary prediction, but it does show up to some extent in our application because we use the probability estimate, not a binary estimate of victory). Below, we can see how, as the innings progresses, the error in the estimate of the first innings total goes down.

This estimate is what we use, plugged into the first ball of the second innings to estimate win probability. A similar improvement in win probability estimations also happens as the second innings progresses, which we are obviously only able to take advantage of, for the second innings predictions.

Win Probability Added

Now that we have win probability at each stage of the match, WPA (win probability added) is simply the change in win probability between balls. This tells us how much of an impact the events of each delivery had on match outcome.
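As a small illustration of this definition, here is a sketch that turns a sequence of win probabilities into per-ball WPA. The function name and input layout are assumptions made for the example.

```python
def win_probability_added(win_prob_before_each_ball, final_win_prob):
    """Per-ball WPA: win probability after the ball minus win probability before it.

    win_prob_before_each_ball: win probability at the start of each delivery
    final_win_prob:            win probability after the last delivery in the list
    """
    probs = list(win_prob_before_each_ball) + [final_win_prob]
    return [after - before for before, after in zip(probs, probs[1:])]


# Three deliveries: a good ball for the batting side, a bad one, then a recovery.
wpa = win_probability_added([0.50, 0.55, 0.40], final_win_prob=0.45)
```

One handy property of this definition is that the per-ball values telescope: they sum to the overall change in win probability across the deliveries considered.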

Evidence Weighting

In order to account for the number of balls that a player has been involved with, I used the same methodology as I did in the piece on performance based metrics.

The Metrics

Evidence Weighted Relative Win Probability Added (EWR-WPA): This metric gives the difference between the actual WPA and the WPA in the expected case for a player, with the amount of evidence accounted for. We talked about win probability added above and about evidence weighting. So let’s talk a little bit about the “Relative” part of this metric. For this, I took the delta between the win probability added and the expected win probability added, in favor of the batting team.

In order to come up with the expected win probability added, I computed the win probabilities for two hypothetical situations: i) the case where an additional wicket falls ii) the case where no wicket falls and the batsman scores the expected number of runs from the delivery. I converted these into WPA and weighted these using the outcome probabilities from the earlier work to get expected win probability added.
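A minimal sketch of this weighting, assuming the two hypothetical win probabilities and the wicket probability for the delivery are already available (the argument names are illustrative, and the numbers in the example are made up):

```python
def expected_wpa(wp_now, wp_if_wicket, wp_if_expected_runs, p_wicket):
    """Expected WPA for a delivery, from the batting team's perspective.

    wp_now:              win probability before the delivery
    wp_if_wicket:        win probability if a wicket falls on this delivery
    wp_if_expected_runs: win probability if no wicket falls and the expected
                         number of runs is scored
    p_wicket:            probability of a wicket on this delivery
    """
    wpa_wicket = wp_if_wicket - wp_now
    wpa_no_wicket = wp_if_expected_runs - wp_now
    return p_wicket * wpa_wicket + (1 - p_wicket) * wpa_no_wicket


def relative_wpa(actual_wpa, exp_wpa):
    """Relative WPA: how far the actual outcome moved things beyond expectation."""
    return actual_wpa - exp_wpa


# Made-up numbers: a wicket would cost 8 points of win probability,
# the expected scoring outcome would add 1 point, and P(wicket) is 5%.
e_wpa = expected_wpa(0.50, 0.42, 0.51, p_wicket=0.05)
r_wpa = relative_wpa(0.04, e_wpa)
```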

Evidence Weighted Normalized Relative Win Probability Added (EW-NRWPA): EWR-WPA gives us an indication of the impact a player has relative to the average player in that situation. However, this doesn’t account for the opportunity of impact available to the player. A player who hits a 4 on a given ball when the average player takes 2 is still leaving 2 runs on the table that they could get by hitting a six. These may not be as critical when the required run rate is 7 an over; in that case, the WPA difference between a 4 & a 6 might be small. But, this would be very critical in the last over with 20 to win. Normalized relative WPA tries to account for this. It is a measure of the extent of possible WPA range between the expected outcome and the extreme outcome that the batsman was able to grab. It is expressed in terms of R-WPA as a % of achievable (in either direction, depending on the direction of actual WPA) WPA.

If actual WPA is lower than expected WPA, the normalized R-WPA is negative and the denominator uses the absolute delta between min WPA & expected WPA.
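The normalization described above can be sketched per delivery as follows. This assumes the minimum and maximum achievable WPA for the delivery (for example, a wicket and a six) have already been computed; the function and argument names are illustrative.

```python
def normalized_relative_wpa(actual_wpa, exp_wpa, min_wpa, max_wpa):
    """Relative WPA as a fraction of the achievable WPA in the same direction.

    Above expectation: divide by the headroom up to the best possible outcome.
    Below expectation: divide by the absolute gap down to the worst possible
    outcome, which keeps the result negative.
    """
    delta = actual_wpa - exp_wpa
    if delta >= 0:
        headroom = max_wpa - exp_wpa
    else:
        headroom = abs(min_wpa - exp_wpa)
    if headroom == 0:
        return 0.0  # no room to move in that direction; assumed convention
    return delta / headroom
```

For instance, with an expected WPA of 0.01 and a best case of 0.05, an actual WPA of 0.03 captures half of the available upside, so the normalized value is 0.5.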

Evidence Weighted Innings/Over Win Probability Added (EW-Inn/Over-WPA): This is the evidence weighted metric that measures stint level contribution. For batsmen, this is WPA per innings, while for bowlers, this is WPA per over bowled, weighted by evidence i.e. number of innings played and overs bowled. This is one metric that is not relative to expectations.
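Here is a rough sketch of the stint level aggregation for batsmen, before any evidence weighting. The weighting itself follows the methodology from the earlier piece and is not reproduced here; the input layout is an assumption for the example.

```python
from collections import defaultdict


def batting_wpa_per_innings(deliveries):
    """Average WPA per innings for each batsman (unweighted).

    deliveries: iterable of (batsman, match_id, wpa) tuples, with WPA taken
                from the batting team's perspective
    Returns {batsman: (mean_wpa_per_innings, innings_count)}.
    """
    # Sum ball-level WPA into one value per (batsman, innings).
    per_innings = defaultdict(float)
    for batsman, match_id, wpa in deliveries:
        per_innings[(batsman, match_id)] += wpa

    # Collect each batsman's innings totals, then average them.
    innings_totals = defaultdict(list)
    for (batsman, _), innings_wpa in per_innings.items():
        innings_totals[batsman].append(innings_wpa)

    return {
        batsman: (sum(totals) / len(totals), len(totals))
        for batsman, totals in innings_totals.items()
    }
```

The innings count is returned alongside the average because it is the amount of evidence the weighting step would need. The bowling version would group by over instead of by innings.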

As with the other metrics we had developed before, one can use this methodology to get splits of innings, bowler/batsman type etc. depending on what one is trying to evaluate.

Putting the metrics in perspective

We have already looked at EWR-RPO (how fast a batsman scores, relatively) and EWSF (how long a batsman bats, relatively, and hence plays more balls with their “eye in”) before, so let’s add EWR-WPA (the relative impact a batsman has per delivery) to that mix.

As one would expect, players who score faster and play longer generally are more impactful per delivery faced. However, we see that there is an interesting separation between someone like a Narine and a Perera in terms of impact even though they are close in the other two dimensions.

This has to do with the situations in which they bat. Perera, batting down the order, has more opportunities to bat in situations where different per ball outcomes can have significantly different impact on match result.

This range in possible WPA provides a larger opportunity for impact in either direction. A six in a chase of 180 is less critical in the first over than it is in the 18th over with 30 to go and 2 wickets in hand. Moreover, overall WPAs in situations where Perera bats are lower — fewer batsmen actually exceed expectations in crunch situations — giving him a higher relative WPA since he is one of those few.

The EW-NRWPA metric tries to measure how much of the available opportunity was taken up by the player, relative to the expectation, and thus it tries to “correct” for the above scenario. Note, that this just means that the two metrics complement each other, not that one is more useful or better than the other. Perera is about 9x higher than Narine on EWR-WPA, but only 1.37x on EW-NRWPA.

These metrics tell us how impactful a player is per opportunity provided (per ball) and per unit leverage that the opportunity provides (range of possible WPA). Short cameos, especially in crunch situations, can be more highly rewarded by these metrics than innings that may be (but are not necessarily) slightly longer but more balanced between risk and reward. The latter are better captured by the EW-Inn-WPA metric, which looks at the impact that a batsman has per stint. Perera is not higher than Narine on this metric, at 0.087x.

Longer innings are more likely to be rewarded by this metric, but not always. Babar Azam, Shoaib Malik & Pollard, who lead the pack on survival factor, produce much less impactful innings than DJM Short & Munro, who are 4th and 5th on EWSF. (This is notwithstanding Pollard’s recent match winning brilliance with the bat for TKR, which would have garnered an innings WPA of ~6–7%. In the data we were using here, when the chasing team needed between 2 and 2.5 a ball in the last over with 3 wkts in hand chasing between 140 and 160, they still lost 15 out of 19 times.)

Note that the stint level metric here (innings, for batsmen) isn’t relative to other batsmen coming in to bat in the same situation. It measures the change in the likelihood of victory while the batsman was at the crease.

Proper use of WPA metrics

As I mentioned earlier in the piece, and as we saw in the previous section, WPA metrics are influenced by the situational opportunities that a player is subjected to, in a way that performance based measures like EWR-RPO are not.

When the possible WPA range is larger, a player has a better chance of achieving a more extreme WPA, thus favoring players who are involved in these high leverage situations. While the normalized version of this metric (EW-NRWPA) accounts for this, in isolation, it favors players who are involved with low leverage situations. The narrower the range of possible WPA, the less skill one needs to exercise to have a better score on this metric.

For example, if we look at the top bowlers who have bowled at least 300 balls in the last 3 years using just the WPA metrics in isolation, we get the following:

Apart from Rashid Khan, one would expect a lot of other names to make the top 5 cut before these names do.

Instead, if we look at top 5 bowlers by EWR-RPO and use the WPA measures to look at them, we get the following:

Note that RWPA & NRWPA can have opposite signs when aggregated per bowler (but not when computed at the delivery level).
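A tiny made-up example shows how the signs can diverge under aggregation. The numbers are invented purely for illustration, and simple averaging is used here in place of evidence weighting.

```python
# Each tuple: (relative WPA, achievable WPA in the direction of that outcome).
# Delivery 1: a small gain in a high leverage spot with lots of headroom.
# Delivery 2: a smaller loss in a low leverage spot with very little headroom.
deliveries = [(0.02, 0.20), (-0.01, 0.0125)]

rwpa = [delta for delta, _ in deliveries]
nrwpa = [delta / headroom for delta, headroom in deliveries]

mean_rwpa = sum(rwpa) / len(rwpa)     # positive: the gain outweighs the loss
mean_nrwpa = sum(nrwpa) / len(nrwpa)  # negative: the loss used up most of its headroom
```

On each individual delivery the two values share a sign, since the normalized value is just the relative value divided by a positive headroom. The averages can still disagree because the headroom differs from ball to ball.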

Rashid Khan is far better than the others on both axes, pulling back win probability by ~0.1% more than average per delivery and giving away only 11–12% of the available WPA headroom above the average outcome per delivery. Narine-Mujeeb & Bumrah-Wahab are interesting here. These pairs are almost equal on one axis, but not on the other.

For the situations that they bowl in, Bumrah is just about as much more impactful than the average bowler as Wahab is. But Wahab gives less of the available headroom away to the batsman, so there is less additional improvement opportunity available to Wahab in terms of the impact that he can have. Bumrah bowls a bit more than Wahab in the death, however, so while there is more opportunity for him to pull things back, it may also be harder to do so.

Mujeeb & Narine are similar on wicket taking ability and economy with Narine being the slightly more economical one. They give up about the same relative leverage to the batsman, but since Mujeeb opens the bowling more often than Narine, when boundaries impact match outcomes a bit less than later in the innings, his relative impact is a bit lower on saving those boundaries.

When they bowl

It would be interesting to look into what could happen if Narine opened the bowling more or if Wahab bowled more in the death. To keep a tab on the length and scope of this piece, I’ll punt these investigations to another time. The work here should pave the way for several such interesting analyses to help understand player contribution and situational differences between players, as well as to look into more abstract concepts like middle order batsmen and finishers, pressure and its impact etc.

p.s. I may tweak some of the methods, definitions and nomenclature discussed here in the future and I will try to come back and edit these pieces to keep them updated.

If you enjoyed this piece, check out more of my work at Boundary Line and follow along here & on twitter @amol_desai

I can be reached on twitter or via email or Linkedin


Amol Desai
Boundary Line

Cricket Analytics Consultant, Cricket Platform @ZelusAnalytics (working with Rajasthan Royals), Freelance @CricViz linkedin.com/in/amoldesai-ds