Cricket Science — Average vs Strike Rate (Part II: Implementing BEREX)

Apr 15, 2022

BEREX, short for BErnoulli Run EXpectation, is a mathematical model created to answer one question: how can average and strike rate be combined into a single metric in a statistically sound way? Read Part I to learn about the idea, what motivated it, and its mathematical derivation.

In Part II, we will implement the model in code and explore the model’s results.

Code

As derived in Part I, BEREX is the sum of these two expressions:

We can use for-loops for both sums. The parameters are either easily calculated from the inputs (average and strike rate or economy rate) or set independently, such as the number of overs N. The probability mass function of the binomial distribution is included in SciPy for Python (scipy.stats.binom.pmf) and in the Statistics and Machine Learning Toolbox for MATLAB (binopdf). Should you prefer to define it yourself, it is a simple combinatorics expression: P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), where C(n, k) = n! / (k! (n − k)!) is the number of ways to choose k items from n.
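Here is a minimal Python sketch of how the two for-loops could look. Treat it as a reconstruction under stated assumptions rather than the exact code behind the plots: every ball is an independent trial that is a wicket with probability p_out and otherwise yields a constant runs_per_ball; both parameters are chosen so that runs per dismissal and runs per 100 balls match the inputs; and the innings ends at 10 wickets or after 6N balls, whichever comes first. The function names berex and binom_pmf and the parameter mapping are my assumptions for illustration.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """Binomial PMF: probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def berex(average, strike_rate, overs=50, wickets=10):
    """Return (mean, standard deviation) of the innings score.

    ASSUMPTION: the mapping from (average, strike rate) to the per-ball
    parameters below is an illustrative choice, not taken from Part I.
    """
    balls = 6 * overs
    run_rate = strike_rate / 100           # expected runs per ball
    p_out = run_rate / average             # wicket probability per ball
    if not 0 < p_out < 1:
        raise ValueError("need 0 < strike_rate / 100 < average")
    runs_per_ball = run_rate / (1 - p_out)  # runs on a ball that is not out

    # Sum 1: all out on ball k (the last wicket falls exactly on ball k).
    Eout = Eoutsq = 0.0
    for k in range(wickets, balls + 1):
        prob = binom_pmf(wickets - 1, k - 1, p_out) * p_out
        score = (k - wickets) * runs_per_ball
        Eout += prob * score
        Eoutsq += prob * score**2

    # Sum 2: overs completed with w wickets down (w < wickets).
    Eover = Eoversq = 0.0
    for w in range(wickets):
        prob = binom_pmf(w, balls, p_out)
        score = (balls - w) * runs_per_ball
        Eover += prob * score
        Eoversq += prob * score**2

    mean = Eout + Eover
    variance = Eoutsq + Eoversq - mean**2
    return mean, sqrt(max(variance, 0.0))  # guard against tiny negative rounding


if __name__ == "__main__":
    print(berex(40, 80, overs=50))
```

The two loops cover mutually exclusive ways an innings can end (all out, or overs completed), so their probabilities add up to one and their contributions can simply be summed.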

Contour Plots!

I created the plots below six or seven years ago in MATLAB. I’ll save myself the effort of recreating them, but the process is simple: call the BEREX function for each pair of Average and Strike Rate in a grid of choices. These plots also work for bowlers since the math is the same for both. In that case, the ‘Strike Rate’ on the y-axis should be interpreted as the average runs conceded per 100 balls, i.e. Economy Rate multiplied by 100/6. Note that in conventional cricket statistics, the term ‘Strike Rate’ has a different definition for bowlers (balls per dismissal).
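The grid evaluation and the bowler conversion can be sketched in a few lines of Python. The helper names economy_to_strike_rate and evaluate_grid are hypothetical, and the metric passed in the demo is a trivial stand-in, not BEREX; any function taking (average, strike_rate) and returning a number could be used in its place.

```python
def economy_to_strike_rate(economy_rate):
    """Convert runs conceded per over into runs conceded per 100 balls."""
    return economy_rate * 100 / 6


def evaluate_grid(metric, averages, strike_rates):
    """Evaluate metric(average, strike_rate) on every grid point.

    Rows follow strike_rates (y-axis), columns follow averages (x-axis).
    """
    return [[metric(avg, sr) for avg in averages] for sr in strike_rates]


if __name__ == "__main__":
    averages = list(range(10, 61, 10))
    strike_rates = list(range(60, 161, 20))

    # Stand-in metric for demonstration only (NOT the BEREX model).
    def placeholder(avg, sr):
        return min(10 * avg, 3 * sr)

    grid = evaluate_grid(placeholder, averages, strike_rates)
    print(len(grid), "rows x", len(grid[0]), "columns")
    print(economy_to_strike_rate(4.8))
```

With rows on the y-axis and columns on the x-axis, the resulting grid has the layout that contour-plotting routines such as matplotlib's contourf(x, y, z) expect.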

Pick an average on the x-axis and a strike rate on the y-axis. The BEREX mean is given by the color of their intersection point, as labeled in the colorbar legend. ‘Walking’ along any contour line corresponds to a constant BEREX mean. If comparing against actual cricket scores for context, note that this plot doesn’t include extras.

Notice how much flatter the contour lines are for T20s. This means that once the batting average crosses around 30, any additional increase in average hardly affects the BEREX, since the team is very unlikely to be all out within 20 overs. The score is thus bounded by the scoring rate in those overs. The opposite effect is shown by the vertical lines in the first plot for 50-over innings: for low averages, an increasing strike rate is not helpful because the team will likely be all out before 50 overs anyway. I encourage you to spend time with the plots and draw your own insights or questions.

A Note on Standard Deviations

Notice that the function returns two numbers. The first is the expected score we’ve discussed so far. The second is the standard deviation of these hypothetical simulated scores.

Why is the standard deviation important? It is a measure of the relative certainty in the BEREX scores; a lower value suggests the imaginary team will more consistently score around the mean. Let’s use an example to clarify, using the calculated standard deviation as an error estimate (N=50 overs):

While the means are about the same, the scores from the player with the higher average will tend to cluster closer to 237 runs. The lower-average player’s BEREX scores are less consistent: there is a greater chance they will score substantially less (or more) than 237. Is one better than the other? That depends on the strategy favored by a team and its existing composition.

How is the standard deviation calculated? I’ve used a well-known formula for the variance of a random variable X: Var(X) = E[X²] − (E[X])².

Thus, the for-loops also calculate the expected values of the squares of the scores in the variables Eoutsq and Eoversq. The subtracted term is simply the square of the calculated mean. Standard Deviation is the square root of the variance.
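To see the variance formula at work in isolation, here is a small Python sketch on a toy distribution. The helper name mean_and_std and the two-outcome example are made up for illustration; they are not BEREX outputs.

```python
from math import sqrt


def mean_and_std(outcomes):
    """outcomes: list of (probability, score) pairs whose probabilities sum to 1."""
    e_x = sum(p * x for p, x in outcomes)        # E[X]
    e_xsq = sum(p * x * x for p, x in outcomes)  # E[X^2]
    variance = e_xsq - e_x**2                    # Var(X) = E[X^2] - (E[X])^2
    return e_x, sqrt(max(variance, 0.0))


if __name__ == "__main__":
    # Toy example: a score of 200 or 260, each with probability 0.5.
    print(mean_and_std([(0.5, 200), (0.5, 260)]))
```

Accumulating E[X] and E[X²] side by side means the variance comes out of the same pass over the outcomes as the mean, with no second loop.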

Will the standard deviation be useful in other ways? Yes! Because an innings score is the sum of many ball-by-ball outcomes, the Central Limit Theorem suggests that the scores resulting from BEREX are approximately normally distributed. This fact can be applied to create a win probability calculator between two teams or hypothetical player-clone-teams. That’s a story for another day.

More Contour Plots!

Knowing the BEREX means and standard deviations, we can use z-scores to calculate the BEREX score at a chosen percentile.

For example, the plots below are for the 20th percentile, which means that BEREX expects the innings score to exceed the plotted value 80% of the time. This corresponds to a z-score of about -0.84.
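As a rough illustration, the percentile calculation takes only a couple of lines with Python's standard library. The function name score_at_percentile is hypothetical, and the standard deviation of 30 in the demo is a made-up number, not a BEREX result; the sketch also assumes the scores are close enough to normal for a z-score to be meaningful.

```python
from statistics import NormalDist


def score_at_percentile(mean, std, percentile):
    """Score at the given percentile (0 < percentile < 100) of a normal distribution."""
    z = NormalDist().inv_cdf(percentile / 100)  # z-score of the standard normal
    return mean + z * std


if __name__ == "__main__":
    print(NormalDist().inv_cdf(0.20))        # z-score for the 20th percentile
    print(score_at_percentile(237, 30, 20))  # hypothetical mean and std
```

A lower standard deviation pulls the 20th-percentile score closer to the mean, which is how this view rewards consistency.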

This is useful since it tells us the score that BEREX expects to be reached or exceeded in 4 out of 5 matches. It incorporates a sense of reliability. Predictably, these plots favor Average more than the previous plots did.

A very curious behavior also shows up here — at low averages and high strike rates, the contour lines are not vertical but lean towards the right! This implies that with a low average, increasing the strike rate beyond a point actually decreases the BEREX score at this percentile, albeit slightly. Why does this happen? Offer your thoughts in the comments. I’ll share my answer in a later post.

Numerical vs Analytical Approaches

We have used an analytical solution in our code. This means we derived the underlying mathematical expressions and used computation to calculate the exact values predicted by our model.

Contrast this with a numerical solution, in this case a Monte Carlo simulation. This involves running many random trials that approximate the true solution. For example, the BEREX model has an underlying ball-by-ball simulation rule: out with probability p_out and a constant number of runs otherwise. We can simply generate a random number for each delivery and thereby simulate thousands of innings. The law of large numbers states that the mean of the resulting scores will converge towards the true (analytical) expected value.
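A minimal Python sketch of such a simulation follows, assuming the ball-by-ball rule described above and an innings that ends at 10 wickets or after 6N balls. The names simulate_innings, simulated_mean and runs_per_ball are illustrative, and the parameter values in the demo are arbitrary.

```python
import random


def simulate_innings(p_out, runs_per_ball, overs=50, wickets=10, rng=random):
    """Simulate one innings ball by ball and return the total score."""
    score = 0.0
    wickets_left = wickets
    for _ in range(6 * overs):
        if rng.random() < p_out:    # wicket falls on this ball
            wickets_left -= 1
            if wickets_left == 0:   # all out: the innings ends early
                break
        else:                       # otherwise a constant number of runs
            score += runs_per_ball
    return score


def simulated_mean(p_out, runs_per_ball, trials=10_000, seed=0, **kwargs):
    """Average score over many simulated innings (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(
        simulate_innings(p_out, runs_per_ball, rng=rng, **kwargs)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # Arbitrary illustrative parameters: a wicket roughly every 30 balls.
    print(simulated_mean(1 / 30, 30 / 29, trials=10_000, overs=50))
```

Increasing trials narrows the gap between the simulated mean and the analytical value, at the cost of run time.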

Numerical solutions are very useful in cases where an analytical solution cannot be easily derived, or where evaluating the exact mathematical expression would be much more computationally expensive. Indeed, we’ll put the approach to good use in future articles on Cricket Science.

Summary and What’s Next

The intuition of a sports fan is a powerful tool. The results generated by BEREX passed my intuitive sense for comparing (Avg, SR) pairs. It is also mathematically sound, since it carefully defines a model and calculates its predictions. It therefore passes the criteria of my motivating question, but with one caveat: the relevant outputs of the model are not a single metric. If you’re familiar with the idea of degrees of freedom in statistics, you will know that was unlikely to happen. The BEREX mean and standard deviation can (and arguably should) be combined to get a better overall assessment.

In Part I, we were introduced to BEREX and derived its mathematics. Here, we implemented it in code and created contour plots to draw theoretical insights. In Part III, let us apply BEREX to real-world player data.