Cricket Science — Average vs Strike Rate (Part I: Introducing BEREX)

Apr 15, 2022

What is the hallmark of a good batsman? In simple numerical terms, it is the ability to score many runs and to do it fast. The same applies to bowlers, except it is their capacity to take wickets and limit run scoring that matters. This idea makes the Average and Strike Rate useful metrics:
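
In symbols, using a batsman's career totals (Runs, Outs and Balls are labels assumed here), the standard definitions are:

$$\text{Avg} = \frac{\text{Runs}}{\text{Outs}}, \qquad \text{SR} = 100 \times \frac{\text{Runs}}{\text{Balls}}$$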

Equivalently for bowlers, we have Average and Economy Rate:
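
With a bowler's career totals of runs conceded, wickets taken and balls bowled (labels assumed here), the standard definitions are:

$$\text{Avg} = \frac{\text{Runs conceded}}{\text{Wickets}}, \qquad \text{Econ} = 6 \times \frac{\text{Runs conceded}}{\text{Balls}}$$

The factor of 6 expresses the Economy Rate in runs per six-ball over.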

The Question

I was in sixth or seventh grade, talking to a friend about batsmen's averages and strike rates. We debated the relative importance of the two metrics. Of course, strike rate is not considered particularly relevant in Test cricket, where teams have substantially more time to bat. But in limited-overs cricket, an increase in either number indicates better batting. My friend proposed a method to combine the two:
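
Judging by the qualms listed further down, which speak of a product and of multiplication, the proposal appears to have been the plain product of the two metrics (the name Q is a label assumed here):

$$Q = \text{Avg} \times \text{SR}$$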

From which we can easily determine that
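
Substituting the standard definitions Avg = Runs/Outs and SR = 100 × Runs/Balls into the product Avg × SR (written Q here, an assumed label) gives:

$$Q = \frac{\text{Runs}}{\text{Outs}} \times \frac{100 \times \text{Runs}}{\text{Balls}} = \frac{100 \times \text{Runs}^2}{\text{Outs} \times \text{Balls}}$$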

This seems reasonable on paper but I had my qualms.

One, is an (Avg, SR) pair of (30, 80), with a product of 2400, as valuable as (40, 60) or (20, 120)? Perhaps in 50-over cricket, but certainly not in the then-newfangled Twenty20 format.
Two, the expression is rather arbitrary, because the very choice of multiplication is arbitrary. Why should the exponent of the Runs term be 2 rather than, say, 1.8 or 2.5?
Three, the resulting units and scale do not have a natural interpretation for a cricket enthusiast (I was admittedly bordering on nitpicky by now).

I was thus left with the question — how can average and strike rate be combined into a single metric in a statistically sound way?

The same question can be posed for bowlers. For the sake of simplicity, I will discuss the batting metrics in this article and later generalize the results to bowling metrics as well.

The Idea

I didn’t have an ‘aha’ moment when I came up with a satisfying answer; it was a gradual process of noodling with ideas during my undergrad. With the advantage of hindsight, we’ll take a much more direct path.

Consider a player with an average of 30 and a strike rate of 75. Their ‘average inning’ will be 30 off 40 balls (since 75 = 100 * 30/40). We can imagine a simple ball-by-ball simulation of an inning: for each ball faced, they have a 1 in 40 chance of being out. Otherwise, they score 0.75 runs. Keep repeating this until they get out. This is an example of a Bernoulli process in statistics and probability.
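
The ball-by-ball idea can be sketched in a few lines of Python. This is a minimal illustration under the simple assumptions of the paragraph above (a fixed chance of dismissal on every ball, otherwise SR/100 runs); the function name `simulate_innings`, the seed and the number of repetitions are all hypothetical choices.

```python
import random


def simulate_innings(avg, strike_rate, rng):
    """Simulate one innings ball by ball under the simple model in the text."""
    balls_per_dismissal = 100.0 * avg / strike_rate  # 40 balls for (30, 75)
    p_out = 1.0 / balls_per_dismissal                # 1 in 40 chance of being out
    runs_per_ball = strike_rate / 100.0              # 0.75 runs on any other ball
    runs = 0.0
    while True:
        if rng.random() < p_out:
            return runs  # dismissed: the innings ends and this ball scores nothing
        runs += runs_per_ball


# Assumed seed and sample size, chosen only to make the run repeatable.
rng = random.Random(2022)
scores = [simulate_innings(30, 75, rng) for _ in range(20_000)]
mean_score = sum(scores) / len(scores)
print(f"mean simulated score: {mean_score:.2f}")
```

The long-run mean of this simple version should settle a little below 30, since the ball on which the player is dismissed scores nothing.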

Now imagine a team consisting of eleven copies of that exact player. We can use the above idea to simulate a full inning of this hypothetical team, stopping when 10 wickets have fallen or all overs have been bowled. The resulting team score could serve as a quality measure that addresses all three of my qualms. It measures the ability to contribute to the team score, which directly affects the primary goal of winning games. The relative importance of the two metrics automatically ‘adapts’ when we change the number of overs in the inning. We don’t rely on a conjured-up ‘seems reasonable’ equation. And the resulting unit is runs, with values comparable to the inning scores that cricket fans can judge intuitively. Three for three!
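
As a rough sketch of the team-of-clones idea, the loop below plays out an inning until 10 wickets fall or the balls run out, then averages many such innings. It keeps the simple assumption of SR/100 runs on every ball that is not a dismissal; the name `simulate_team_innings`, the seed and the 2,000 repetitions are hypothetical.

```python
import random


def simulate_team_innings(avg, strike_rate, n_balls, rng, wickets=10):
    """Simulate one team inning of identical players; returns the team score."""
    p_out = strike_rate / (100.0 * avg)   # 1 / (balls per dismissal)
    runs_per_ball = strike_rate / 100.0   # simple assumption: SR/100 per ball
    runs, outs = 0.0, 0
    for _ in range(n_balls):
        if rng.random() < p_out:
            outs += 1
            if outs == wickets:
                break                     # all out before the overs are used up
        else:
            runs += runs_per_ball
    return runs


rng = random.Random(7)  # assumed seed, for repeatability only
team_means = {}
for n_balls in (300, 120):  # 50-over and Twenty20 inning lengths
    sims = [simulate_team_innings(30, 80, n_balls, rng) for _ in range(2000)]
    team_means[n_balls] = sum(sims) / len(sims)
    print(n_balls, "balls:", round(team_means[n_balls], 1), "runs on average")
```

Changing `n_balls` is all it takes to see how the same (Avg, SR) pair fares in a different format.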

A good idea is almost as satisfying as the sweet sound of leather on willow

Constructing the Mathematical Model

Let’s begin by formalizing the idea.
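
A sketch of the notation, assuming bpd stands for balls per dismissal (the 40 balls of the ‘average inning’ of 30 off 40):

$$\text{bpd} = \frac{\text{Balls}}{\text{Outs}} = \frac{100 \times \text{Avg}}{\text{SR}}$$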

For a single ball, this gives
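
Writing p for the chance of dismissal on any one ball (the symbol is a label assumed here), and taking bpd to be the balls faced per dismissal:

$$P(\text{out}) = p = \frac{1}{\text{bpd}}, \qquad P(\text{not out}) = 1 - p$$

For the player averaging 30 at a strike rate of 75, bpd = 40 and p = 1/40 = 0.025.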

If the player isn’t out on a given ball, we assumed they score a constant number of runs instead. However, this isn’t exactly one-hundredth of their strike rate. The ‘average inning’ is Avg runs scored off bpd balls. The latter includes the one ball on which they were dismissed, so for the actual scoring balls we define:
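
Writing r for the runs scored on each ball that is not a dismissal (the symbol is a label assumed here), the Avg runs are spread over the bpd − 1 scoring balls:

$$r = \frac{\text{Avg}}{\text{bpd} - 1}$$

For an average of 30 and a strike rate of 75, this gives r = 30/39 ≈ 0.769 runs per scoring ball rather than 0.75.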

We now have the parameters of each Bernoulli trial — the process will be a finite sequence of such independent trials. The expected value of the runs scored by the hypothetical team-of-clones is, by definition:
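
In standard notation, with X denoting the team's total score (a label assumed here):

$$\mathbb{E}[X] = \sum_{x} x \, P(X = x)$$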

We multiply every possible score x by its corresponding probability, then sum all such terms.

I call this value BEREX, an acronym for BErnoulli Run EXpectation. We split the calculation into two cases: Case 1, where the hypothetical team-of-clones is all out; and Case 2, where 9 or fewer wickets fall but all the scheduled overs are bowled. In Case 1, for the example of a 50-over inning, the team faces anywhere from 10 to 300 balls, of which exactly 10 are outs. If they faced exactly b balls:
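
With r denoting the runs scored on each ball that is not a dismissal (an assumed label), the 10 dismissal balls score nothing and the remaining b − 10 balls score r each, so the team total is:

$$x = (b - 10)\, r$$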

This would happen if there were 9 outs in b-1 balls and the tenth out on the last ball. The probability of this is calculated using the probability mass function of the Binomial distribution:
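
Written out, with p as the per-ball dismissal probability (an assumed label), the binomial term for 9 outs in b − 1 balls is multiplied by p for the final dismissal:

$$P(\text{all out on ball } b) = \binom{b-1}{9} p^{9} (1-p)^{b-10} \times p = \binom{b-1}{9} p^{10} (1-p)^{b-10}$$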

Generalizing to a maximum inning length of N balls (N=300 for ODIs, N=120 for T20s and so on):
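
A sketch of the Case 1 contribution, with p the per-ball dismissal probability and r the runs per non-dismissal ball (both assumed labels); each term is the probability of being all out on ball b times the score (b − 10) r:

$$\text{Case 1} = \sum_{b=10}^{N} \binom{b-1}{9} p^{10} (1-p)^{b-10} \, (b-10)\, r$$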

Now for Case 2, the number of balls is fixed at N, of which anywhere from 0 to 9 were outs. Here, the summation is over the number of wickets w:
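
A sketch of the Case 2 contribution, with p the per-ball dismissal probability and r the runs per non-dismissal ball (both assumed labels); each term is the binomial probability of exactly w outs in N balls times the score (N − w) r:

$$\text{Case 2} = \sum_{w=0}^{9} \binom{N}{w} p^{w} (1-p)^{N-w} \, (N-w)\, r$$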

Finally,
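
Adding the all-out case and the overs-completed case gives the full expectation. In the sketch below, p = 1/bpd is the per-ball dismissal probability and r = Avg/(bpd − 1) the runs per non-dismissal ball (both symbols are assumed labels):

$$\text{BEREX} = \sum_{b=10}^{N} \binom{b-1}{9} p^{10} (1-p)^{b-10} \, (b-10)\, r \;+\; \sum_{w=0}^{9} \binom{N}{w} p^{w} (1-p)^{N-w} \, (N-w)\, r$$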

Some of you may be questioning a couple of key features. Is the probability of dismissal really constant throughout? No. Does a player score a constant, say, 0.83 runs on every other ball? Obviously not. These are called simplifying assumptions, and they are an essential part of practically any mathematical model. One of my favorite quotes is about this:

“All models are wrong, but some are useful.”
-George E.P. Box

In future articles, I hope to share my work on the dynamics of dismissal probability and on estimating the likelihood of specific outcomes for a ball-by-ball simulator. For now, let’s test BEREX!

Example Results

We can apply the model to any pair of average and scoring (or conceding) rate, whether for a player’s career, a season/tournament’s performance, or something hypothetical.

How does BEREX judge the trio of (Avg, SR) pairs I’d considered in middle school? For a 50-over game, (30, 80) wins with a BEREX of 227.6 runs. In other words, a team consisting of players who all average 30 with a strike rate of 80 will score an average of 227.6 runs per inning, excluding extras. Next is (20, 120) with 199.5 runs and (40, 60) comes last at 179.6 runs. The high batting average doesn’t help all that much when the scoring rate is a measly 3.6 runs per over.

For a Twenty20 game, however, the order goes (20, 120): 139.7 runs > (30, 80): 96.0 runs > (40, 60): 72.0 runs. Here, the chances of an all-out inning are much lower, so the strike rate becomes even more important.

Let’s now consider some career figures in 50-over ODIs. Virat Kohli (58.07, 92.92) as of this writing gets 277.9 runs; Virender Sehwag (35.05, 104.33) gets 287.5 runs; Shahid Afridi (23.57, 117.0) is at 232.6 runs. For bowlers, BEREX uses (Avg, Econ) and is interpreted as the runs conceded, so a smaller value is better. Muralidaran (23.89, 3.93) is at 185.1 runs; Glenn McGrath (22.02, 3.88) at 179.0 runs; Zaheer Khan (29.43, 4.93) at 231.0 runs.
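
For readers who want to check figures like these, here is a minimal Python sketch of the two-case sum described earlier. It is an illustration rather than the Part II implementation. The function names `berex` and `berex_bowling` are hypothetical, and converting Economy Rate to a per-100-balls rate (Econ × 100/6) is an assumption of this sketch about how (Avg, Econ) plugs into the same formula.

```python
from math import comb


def berex(avg, rate_per_100, n_balls, wickets=10):
    """Expected runs for a team of clones with the given average and scoring rate."""
    bpd = 100.0 * avg / rate_per_100   # balls per dismissal
    p_out = 1.0 / bpd                  # chance of dismissal on any ball
    q = 1.0 - p_out
    rpb = avg / (bpd - 1.0)            # runs on each ball that is not a dismissal

    # Case 1: all out on ball b (9 outs in the first b - 1 balls, then the 10th).
    all_out = sum(
        comb(b - 1, wickets - 1) * p_out**wickets * q**(b - wickets)
        * (b - wickets) * rpb
        for b in range(wickets, n_balls + 1)
    )
    # Case 2: all n_balls are bowled with w < 10 wickets down.
    overs_done = sum(
        comb(n_balls, w) * p_out**w * q**(n_balls - w) * (n_balls - w) * rpb
        for w in range(wickets)
    )
    return all_out + overs_done


def berex_bowling(avg, econ, n_balls):
    """Assumed conversion: runs per over -> runs per 100 balls, then reuse berex."""
    return berex(avg, econ * 100.0 / 6.0, n_balls)


for pair in [(30, 80), (20, 120), (40, 60)]:
    print(pair, "ODI:", round(berex(*pair, 300), 1), "T20:", round(berex(*pair, 120), 1))
print("McGrath ODI:", round(berex_bowling(22.02, 3.88, 300), 1))
```

With N = 300 for ODIs and N = 120 for Twenty20s, the printed values should land close to the figures quoted in this section, for instance roughly 227.6 for (30, 80) over 50 overs.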

Summary and What’s Next

BEREX, short for Bernoulli Run Expectation, is the expected runs scored (or conceded) by a hypothetical team of eleven clones of a player facing an ‘average’ bowling (or batting) attack. It is a statistically sound model for assigning a run value to a pair of Batting Average and Strike Rate, or Bowling Average and Economy Rate. Since the typical values of these metrics have evolved over time, BEREX too shouldn’t be used for comparison across eras. I find it most interesting for answering questions of hypothetical comparisons or for comparing season/tournament performances.

Read Part II to see BEREX implemented in code and view some pretty visualizations of its predictions. In Part III, I will apply the model to real-world player data. Thanks for reading!