Incorporating and Analysing Advanced Metrics into Cricket: Player Value & VORP 🏏

Published in

Football Applied

Jul 3, 2024

Understanding and analysing Player Value and VORP (Value Over Replacement Player) in cricket can be significantly enriched by borrowing concepts from game theory, particularly game entrance points and matchups. Game theory offers a framework for strategic decision-making, which suits cricket, where decisions on player deployment and match strategy are crucial. By examining entrance points (the moments when players enter the game) and matchups (specific player-versus-player contests), we can gain deeper insight into a player’s true value and impact. These elements show how players perform under varying conditions and against different opponents; integrated with traditional statistical metrics, they give a more complete and dynamic picture of Player Value and VORP, one that captures not only the raw numbers but also each player’s situational prowess.

Entrance Points

Cricket statistics are more than a record of events; they provide crucial insight into player performance and team dynamics under varying circumstances. Questions such as “What additional value did a bowler bring to the team?” and “How much did that increase the likelihood of winning?” are pivotal to grasping the strategic nuances of the game. Answering them depends on factors like the strategic use of entrance points for batsmen, tactical matchups for bowlers, and the prevailing pitch conditions.

Entrance points in T20 cricket play a pivotal role in shaping game momentum. For instance, deploying a pinch-hitter like Sunil Narine early in the innings exploits powerplay fielding restrictions, swiftly boosting the team’s scoring rate. Conversely, strategic entries of experienced batsmen like MS Dhoni during middle overs stabilize innings under pressure, capitalizing on their ability to control run chases and unleash late-inning firepower. These calculated placements ensure teams adapt dynamically, whether accelerating the run rate or consolidating their position.

An illustrative example came in India’s match against Pakistan in New York, where the loss of two early wickets prompted Axar Patel’s promotion to number four. His innings steadied India and set up the power-hitters Suryakumar Yadav, Shivam Dube, and Hardik Pandya to follow. By contrast, in the Super 8s against Australia, Rohit Sharma’s aggressive approach at the top allowed India to keep its usual batting order intact despite facing a formidable bowling attack.

Despite scrutiny of India’s middle-order strategy in the World Cup, flexibility in adjusting batting orders proved advantageous. In the final against South Africa, early wickets necessitated Axar’s entry at number five, forming a crucial partnership with Virat Kohli. This foundation enabled Hardik Pandya and Shivam Dube to capitalize, highlighting the strategic importance of adapting entry points to match dynamics. Such decisions are crucial in T20 cricket, renowned for its unpredictability and strategic depth.

Matchups

Understanding the intricacies of matchups and strategic decisions in T20 cricket is crucial for teams aiming to excel in high-stakes encounters. These decisions, often rooted in game theory principles like dominant strategy and maximizing payoffs, can significantly sway game outcomes by exploiting opponents’ weaknesses and optimizing player performances.
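To make “dominant strategy” concrete: an option is strictly dominant if it gives a better payoff than every alternative whatever the opponent does. The sketch below checks for one in a small payoff table. The options, responses, payoff numbers and the function name are hypothetical placeholders for illustration, not match data.

```python
# Hypothetical payoffs to the bowling side (higher is better): rows are the
# captain's options, columns are the batting side's possible responses.
# Placeholder numbers for illustration only, not real match data.
PAYOFFS = {
    "pace": {"attack": 0.6, "defend": 0.7},
    "spin": {"attack": 0.4, "defend": 0.5},
}


def strictly_dominant(payoffs):
    """Return the option that beats every other option against every response, or None."""
    for option, row in payoffs.items():
        others = [other for name, other in payoffs.items() if name != option]
        if all(row[resp] > other[resp] for other in others for resp in row):
            return option
    return None


print(strictly_dominant(PAYOFFS))  # pace
```

When no option is strictly dominant, which is common with real matchup data, the captain has to weigh the payoffs against how likely each batting response is.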

In the context of the 2024 T20 World Cup, strategic matchups played a pivotal role in determining match dynamics. For instance, deploying a leg-spinner like Rashid Khan against left-handed batsmen or utilizing a left-arm pacer such as Trent Boult against right-handed batsmen weak against swing highlighted the tactical acumen behind these decisions. These moves are meticulously planned, leveraging player statistics and past performances to outmaneuver opponents and gain crucial advantages on the field.

In the semi-final against England, India showcased effective matchup strategies when Axar Patel’s effectiveness against Jonny Bairstow led to a critical dismissal, halting England’s scoring momentum. This tactical advantage was pivotal, as India’s left-arm spinners subsequently capitalized, leading to England’s collapse and a low total.

However, matchups can also pose challenges, as seen in the final against South Africa. There, India’s spinners struggled against Heinrich Klaasen’s onslaught, exposing the limits of tactical adjustments when some bowling options are weaker than others. Captain Rohit Sharma’s decision to bring Hardik Pandya on for the 17th over aimed to exploit Klaasen’s vulnerability against pace, an example of the nuanced decision-making required in crunch moments and of how teams deploy their best resources, guided by statistical analysis and past performances, to influence the outcome.

Backward induction, a fundamental concept in game theory, solves a sequence of decisions by starting with the last one and reasoning back to the present; in a run chase, that means planning the final overs first and then deciding who should bowl before them. It was integral to the decisions that shaped the climax of the match. As the game approached its decisive phase, Rohit Sharma faced a critical choice between deploying Jasprit Bumrah or Hardik Pandya, weighing each player’s historical matchups and current form. Bumrah’s subsequent crucial over, complemented by Pandya’s dismissal of Klaasen, decisively swung momentum in India’s favor.

The strategic calculus extended into practical matchup analysis. With David Miller at the crease, Sharma judged Bumrah to be the ideal matchup. At this juncture, it was apparent that Klaasen and Miller would rarely rotate the strike against Bumrah. Miller’s record against Bumrah showed a 42% dot-ball percentage, meaning he scored off the other 58% of deliveries, with 38% of those scoring balls going for boundaries. So when South Africa needed 30 runs from 30 balls and faced Bumrah’s penultimate over, Miller took a single off the first ball, while Klaasen focused on surviving Bumrah’s challenging bowling.
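The matchup figures above can be turned into per-ball rates with a little arithmetic. This sketch assumes, as the sentence suggests, that the 38% is the share of Miller’s scoring balls (not of all balls) that went for boundaries; the function name is illustrative only.

```python
def per_ball_rates(dot_pct, boundary_share_of_scoring):
    """Convert a dot-ball rate and the share of scoring balls that were boundaries
    into per-ball scoring and boundary rates."""
    scoring = 1.0 - dot_pct                          # balls the batter scored off
    boundary = scoring * boundary_share_of_scoring   # boundaries per ball faced
    return scoring, boundary


# Miller vs Bumrah, using the percentages quoted above
scoring, boundary = per_ball_rates(0.42, 0.38)
print(f"scoring balls: {scoring:.0%}, boundaries per ball: {boundary:.1%}")
# scoring balls: 58%, boundaries per ball: 22.0%
```

On this reading, Miller found the boundary roughly once every four to five balls he faced from Bumrah, which is why keeping him off strike mattered.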

India’s fortunes turned significantly when Hardik Pandya dismissed Heinrich Klaasen with four overs remaining. With Miller and Marco Jansen at the crease, and the prospect of tailenders looming, India’s strategic advantage grew clearer.

Bowler deployment and matchups together shaped the game’s dynamics. A payoff matrix built from matchup statistics highlighted how much depended on getting the matchups right. Facing Bumrah proved formidable for South Africa because of his ability to stifle runs and take crucial wickets, while Pandya’s dismissals of key players like Klaasen and his knack for containing runs under pressure made him indispensable to India’s bowling plan.
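A payoff matrix like the one described can be sketched from the matchup numbers quoted in this section. Scoring each cell as the dot-ball rate minus the per-ball boundary rate is an assumption made here for illustration, not the author’s stated formula, and only the two matchups quoted in the text are filled in; every other cell stays empty (NaN) until real matchup data is added.

```python
import pandas as pd

# Matchups quoted in the text: (batter, bowler, dot-ball rate, boundaries per ball).
# Miller's boundary rate is 58% scoring balls x 38% boundaries; Jansen never
# hit Pandya for a boundary.
MATCHUPS = [
    ("David Miller", "Jasprit Bumrah", 0.42, 0.58 * 0.38),
    ("Marco Jansen", "Hardik Pandya", 0.50, 0.0),
]


def payoff_matrix(matchups):
    """Bowler-side payoff = dot-ball rate minus boundary rate (an assumed scoring rule)."""
    df = pd.DataFrame(matchups, columns=["batter", "bowler", "dot", "boundary"])
    df["payoff"] = df["dot"] - df["boundary"]
    # Rows are batters, columns are bowlers; NaN marks matchups with no data here
    return df.pivot(index="batter", columns="bowler", values="payoff")


print(payoff_matrix(MATCHUPS).round(3))
```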


In the crucial moments of the match, Pandya’s dismissal of Klaasen turned the tide in India’s favor, and the incoming Marco Jansen had struggled against Pandya previously, playing out dot balls 50% of the time and never hitting him for a boundary. With David Miller and Jansen at the crease and Jasprit Bumrah set to bowl his final over, South Africa needed 22 runs from 18 balls, a required rate of about 7.3 runs an over. South Africa still appeared to hold a slight edge and seemed poised to score enough off Bumrah’s over to ease the pressure.

However, Miller’s thinking likely revolved around seeing off Bumrah and Pandya to face his preferred matchup, Arshdeep Singh. Notably, Miller had higher boundary and dot-ball percentages against Arshdeep than against any other Indian bowler. Jansen managed a single in Bumrah’s final over but then fell to a brilliant delivery that cleaned up his stumps, bringing South Africa’s first tailender, Keshav Maharaj, to the crease.

The bowling choices can be understood through backward induction. Working back from the 20th over: with Pandya held for two of the closing overs (the 17th and the 20th) and Bumrah’s final over used in the 18th to apply pressure and take key wickets, Arshdeep Singh was left to bowl the 19th. This allowed India to turn matchup statistics into a payoff matrix in which dot balls and boundaries carried the most weight.
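Backward induction over the closing overs can be sketched as a small dynamic program: the value of each state is defined by the best continuation from the following over, so the last over is resolved first and each earlier choice is made knowing what comes after it. This is a single-decision-maker simplification in which the batting side’s responses are folded into an expected-runs figure, and the bowlers, quotas and expected runs below are hypothetical placeholders, not figures from the final.

```python
from functools import lru_cache

# Hypothetical expected runs conceded by each bowler in each of the last four
# overs (index 0 = 17th over, 3 = 20th). Placeholder numbers, not real data.
EXPECTED_RUNS = {
    "Bowler A": [6.0, 7.0, 8.0, 9.0],
    "Bowler B": [7.0, 8.0, 9.0, 11.0],
    "Bowler C": [8.5, 9.0, 12.0, 15.0],
}
QUOTAS = {"Bowler A": 2, "Bowler B": 1, "Bowler C": 1}  # overs each bowler has left


def plan_overs(quotas, expected_runs):
    """Return (minimum expected runs, bowling order) via backward induction."""
    names = sorted(quotas)
    n_overs = len(next(iter(expected_runs.values())))

    @lru_cache(maxsize=None)
    def best(overs_left, remaining, last):
        if overs_left == 0:
            return 0.0, ()  # terminal state: innings over
        over_idx = n_overs - overs_left
        options = []
        for i, name in enumerate(names):
            if remaining[i] == 0 or name == last:
                continue  # quota used up, or would bowl consecutive overs
            after = remaining[:i] + (remaining[i] - 1,) + remaining[i + 1:]
            runs, plan = best(overs_left - 1, after, name)
            options.append((expected_runs[name][over_idx] + runs, (name,) + plan))
        return min(options) if options else (float("inf"), ())

    return best(n_overs, tuple(quotas[n] for n in names), None)


runs, order = plan_overs(QUOTAS, EXPECTED_RUNS)
print(runs, order)  # 33.0 ('Bowler A', 'Bowler C', 'Bowler B', 'Bowler A')
```

With these made-up numbers the cheapest bowler is split across the first and last overs because no bowler may bowl consecutive overs. With real data, the expected-runs table would come from matchup statistics against the batters likely to be on strike.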

Miller was markedly more effective against Arshdeep than against Pandya, who had a strong record against him, so Rohit’s gamble was to make sure Miller faced Pandya and Bumrah in the crucial overs. It paid off: Miller’s dismissal, made possible by Suryakumar Yadav’s outstanding fielding, left South Africa with Keshav Maharaj and Kagiso Rabada at the crease, needing an improbable 16 runs from the final 5 balls.

These strategic matchups and decisions embody game theory principles such as dominant strategy and maximizing payoffs, highlighting the strategic acumen needed to succeed in T20 cricket’s dynamic and high-pressure environment.

Impact/Game

Having examined entrance points and matchups in T20 cricket, we can delve into the quantitative metrics that capture a player’s contribution to the game and extend them to the other formats. This leads us to the Cricinfo Impact per Game metric and the concept of Value Over Replacement Player (VORP). These advanced metrics give a more nuanced view of a player’s performance than traditional statistics such as runs scored or wickets taken: they aim to quantify a player’s overall impact on the game and how much value they add compared with a hypothetical replacement-level player.

The Cricinfo Impact per Game metric evaluates players based on various aspects such as batting, bowling, and fielding, integrating factors like match context and pressure situations. This comprehensive approach allows us to measure the true influence of a player on the outcome of a match, providing a holistic view of their performance.

Similarly, VORP in cricket, adapted from its origins in baseball, measures the additional value a player brings to the team compared with a typical replacement player. It helps show how critical a player is to their team’s success, taking into account not just raw performance but also their contribution relative to the other players available for selection.

By exploring these metrics, we can gain deeper insights into player performance and strategic decision-making, enhancing our appreciation of the tactical elements that define T20 cricket.

Player Value/VORP

Data Retrieval and Preparation

First, the code defines URLs for fetching batting and bowling statistics for various cricket teams across different formats (Test, ODI, T20). Using pd.read_html, it reads the tables from these URLs into Pandas DataFrames, adds columns indicating the country and nationality, and combines all the data into a single DataFrame (all_data). This aggregated data is then saved to a CSV file for further processing.

import pandas as pd

*For test batting data*
urls_dict = {
    "India": "https://www.espncricinfo.com/records/team/averages-batting/india-6/test-matches-1?current=2",
    "Pakistan": "https://www.espncricinfo.com/records/team/averages-batting/pakistan-7/test-matches-1?current=2",
    "Australia": "https://www.espncricinfo.com/records/team/averages-batting/australia-2/test-matches-1?current=2",
    "England": "https://www.espncricinfo.com/records/team/averages-batting/england-1/test-matches-1?current=2",
    "Afghanistan": "https://www.espncricinfo.com/records/team/averages-batting/afghanistan-40/test-matches-1?current=2",
    "West Indies": "https://www.espncricinfo.com/records/team/averages-batting/west-indies-4/test-matches-1?current=2",
    "New Zealand": "https://www.espncricinfo.com/records/team/averages-batting/new-zealand-5/test-matches-1?current=2",
    "Sri Lanka": "https://www.espncricinfo.com/records/team/averages-batting/sri-lanka-8/test-matches-1?current=2",
    "South Africa": "https://www.espncricinfo.com/records/team/averages-batting/south-africa-3/test-matches-1?current=2",
    "Bangladesh": "https://www.espncricinfo.com/records/team/averages-batting/bangladesh-25/test-matches-1?current=2"
}

*For ODI batting data*
urls_dict = {
    "India": "https://www.espncricinfo.com/records/team/averages-batting/india-6/one-day-internationals-2?current=2",
    "Pakistan": "https://www.espncricinfo.com/records/team/averages-batting/pakistan-7/one-day-internationals-2?current=2",
    "Australia": "https://www.espncricinfo.com/records/team/averages-batting/australia-2/one-day-internationals-2?current=2",
    "England": "https://www.espncricinfo.com/records/team/averages-batting/england-1/one-day-internationals-2?current=2",
    "Afghanistan": "https://www.espncricinfo.com/records/team/averages-batting/afghanistan-40/one-day-internationals-2?current=2",
    "West Indies": "https://www.espncricinfo.com/records/team/averages-batting/west-indies-4/one-day-internationals-2?current=2",
    "New Zealand": "https://www.espncricinfo.com/records/team/averages-batting/new-zealand-5/one-day-internationals-2?current=2",
    "Sri Lanka": "https://www.espncricinfo.com/records/team/averages-batting/sri-lanka-8/one-day-internationals-2?current=2",
    "South Africa": "https://www.espncricinfo.com/records/team/averages-batting/south-africa-3/one-day-internationals-2?current=2",
    "Bangladesh": "https://www.espncricinfo.com/records/team/averages-batting/bangladesh-25/one-day-internationals-2?current=2"
}

*For T20 batting data*

urls_dict = {
    "India": "https://www.espncricinfo.com/records/team/averages-batting/india-6/twenty20-internationals-3?current=2",
    "Pakistan": "https://www.espncricinfo.com/records/team/averages-batting/pakistan-7/twenty20-internationals-3?current=2",
    "Australia": "https://www.espncricinfo.com/records/team/averages-batting/australia-2/twenty20-internationals-3?current=2",
    "England": "https://www.espncricinfo.com/records/team/averages-batting/england-1/twenty20-internationals-3?current=2",
    "Afghanistan": "https://www.espncricinfo.com/records/team/averages-batting/afghanistan-40/twenty20-internationals-3?current=2",
    "West Indies": "https://www.espncricinfo.com/records/team/averages-batting/west-indies-4/twenty20-internationals-3?current=2",
    "New Zealand": "https://www.espncricinfo.com/records/team/averages-batting/new-zealand-5/twenty20-internationals-3?current=2",
    "Sri Lanka": "https://www.espncricinfo.com/records/team/averages-batting/sri-lanka-8/twenty20-internationals-3?current=2",
    "South Africa": "https://www.espncricinfo.com/records/team/averages-batting/south-africa-3/twenty20-internationals-3?current=2",
    "Bangladesh": "https://www.espncricinfo.com/records/team/averages-batting/bangladesh-25/twenty20-internationals-3?current=2"
}

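# Each urls_dict above overwrites the previous one, so this loop uses whichever
# dictionary was defined last (T20 here); run it once per format.
frames = []

for country, url in urls_dict.items():
    # Read every table on the page; the averages table is assumed to be the first
    # (pd.read_html needs lxml, or BeautifulSoup4 + html5lib, installed)
    df = pd.read_html(url)[0]

    # url.split('/')[6] is the team slug, e.g. 'india-6' ([5] would be 'averages-batting')
    df['Country'] = url.split('/')[6]
    df['Nationality'] = country

    frames.append(df)

# Concatenate once instead of growing a DataFrame inside the loop
all_data = pd.concat(frames, ignore_index=True)

# Save the combined data to a CSV file
all_data.to_csv('filename.csv', index=False)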

print(all_data)
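The three near-identical dictionaries can also be generated from the URL pattern they share. This is a sketch assuming that pattern (averages-{batting or bowling}/{team slug}/{format slug}?current=2, copied from the URLs above) holds for every team and format; build_urls and fetch are hypothetical helper names, and only build_urls runs without network access.

```python
import pandas as pd

BASE = "https://www.espncricinfo.com/records/team/averages-{stat}/{team}/{fmt}?current=2"

# Team and format slugs copied from the URLs above
TEAMS = {
    "India": "india-6", "Pakistan": "pakistan-7", "Australia": "australia-2",
    "England": "england-1", "Afghanistan": "afghanistan-40",
    "West Indies": "west-indies-4", "New Zealand": "new-zealand-5",
    "Sri Lanka": "sri-lanka-8", "South Africa": "south-africa-3",
    "Bangladesh": "bangladesh-25",
}
FORMATS = {
    "test": "test-matches-1",
    "odi": "one-day-internationals-2",
    "t20i": "twenty20-internationals-3",
}


def build_urls(stat, fmt):
    """Return {country: url} for stat in {'batting', 'bowling'} and a FORMATS key."""
    return {c: BASE.format(stat=stat, team=slug, fmt=FORMATS[fmt]) for c, slug in TEAMS.items()}


def fetch(stat, fmt):
    """Download and combine one stat/format table per team (requires network access)."""
    frames = []
    for country, url in build_urls(stat, fmt).items():
        df = pd.read_html(url)[0]          # assumes the first table is the averages table
        df["Country"] = url.split("/")[6]  # team slug, e.g. 'india-6'
        df["Nationality"] = country
        df["Format"] = fmt
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


print(build_urls("bowling", "odi")["India"])
```

Tagging each frame with its format means all six stat/format combinations could be stored in one file rather than overwriting a single urls_dict.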

Scaling Ratings for Bowlers

Once the data is aggregated and saved, the code scales ratings for bowlers (Test matches in this example). It min-max normalizes bowling average (Ave), economy rate (Econ3), wickets (Wkts) and innings bowled (Inns, used as a proxy for matches played), inverting average and economy because lower is better for both. Each metric is weighted by its assumed importance: average (50%), economy rate (40%), matches played (10%), and wickets taken (5%). These weights sum to 1.05 rather than 1, but because the combined score is min-max rescaled afterwards, only their relative sizes matter. The weighted score is then rescaled so that the lowest-rated bowler scores 60 and the highest 99 (scaled_rating), and bowlers are sorted to identify the top performers.

*Generating scaled ratings*

df = pd.read_csv("/Users/name/Downloads/filename.csv", index_col=0)

df.columns

*For bowling ratings*

# Normalize the metrics
df['normalized_avg'] = (df['Ave'] - df['Ave'].min()) / (df['Ave'].max() - df['Ave'].min())
df['normalized_wickets'] = (df['Wkts'] - df['Wkts'].min()) / (df['Wkts'].max() - df['Wkts'].min())
df['normalized_economy'] = (df['Econ3'] - df['Econ3'].min()) / (df['Econ3'].max() - df['Econ3'].min())
df['normalized_matches'] = (df['Inns'] - df['Inns'].min()) / (df['Inns'].max() - df['Inns'].min())

# Define weights for each metric
weight_avg = 0.50
weight_economy = 0.40
weight_matches = 0.10
weight_wickets = 0.05

# Calculate the overall rating
df['rating'] = (
    weight_avg * (1 - df['normalized_avg']) +  # lower average is better
    weight_economy * (1 - df['normalized_economy']) +  # lower economy rate is better
    weight_matches * df['normalized_matches'] +
    weight_wickets * df['normalized_wickets']
)

# Scale ratings to 0-100
df['rating'] = df['rating'] * 100

min_rating = df['rating'].min()
max_rating = df['rating'].max()
df['scaled_rating'] = 60 + ((df['rating'] - min_rating) / (max_rating - min_rating)) * 39  # rescale to 60-99 (99 - 60 = 39)

df = df.sort_values(by='scaled_rating', ascending=False)

print(df[['Player.1', 'scaled_rating']].head(10))

df.to_csv('cricket_bowler_scaled_testratings.csv', index=False)

*For batting ratings*

# Normalize the metrics (df should hold the batting averages here, e.g. reloaded with pd.read_csv)
df['normalized_avg'] = (df['Ave'] - df['Ave'].min()) / (df['Ave'].max() - df['Ave'].min())
df['normalized_runs'] = (df['Runs'] - df['Runs'].min()) / (df['Runs'].max() - df['Runs'].min())
df['normalized_sr'] = (df['SR'] - df['SR'].min()) / (df['SR'].max() - df['SR'].min())

# Define weights for each metric
weight_avg = 0.45
weight_runs = 0.15
weight_sr = 0.40

# Calculate the overall rating
df['rating'] = (
    weight_avg * (df['normalized_avg']) + 
    weight_runs * df['normalized_runs'] +
    weight_sr * df['normalized_sr'] 
)

# Scale ratings to 0-100
df['rating'] = df['rating'] * 100

min_rating = df['rating'].min()
max_rating = df['rating'].max()
df['scaled_rating'] = 60 + ((df['rating'] - min_rating) / (max_rating - min_rating)) * 39  # rescale to 60-99 (99 - 60 = 39)

# Sort by scaled rating
df = df.sort_values(by='scaled_rating', ascending=False)

# Display the top 10 batsmen
print(df[['Player.2', 'scaled_rating']].head(10))

# Save the scaled ratings to a new CSV file
df.to_csv('filename.csv', index=False)
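Both rating blocks follow the same recipe, so it can be sketched as one reusable function. This version also coerces non-numeric entries (such as a '-' placeholder for a missing value, which the scraped tables may contain) to NaN before normalizing; the function name and the small DataFrame in the example are made up purely to show the behaviour.

```python
import pandas as pd


def scaled_rating(df, weights, lower_is_better=(), lo=60.0, hi=99.0):
    """Weighted sum of min-max normalized columns, rescaled to [lo, hi]."""
    rating = pd.Series(0.0, index=df.index)
    for col, weight in weights.items():
        # Coerce text such as '-' (missing values) to NaN before normalizing
        x = pd.to_numeric(df[col], errors="coerce")
        norm = (x - x.min()) / (x.max() - x.min())
        if col in lower_is_better:
            norm = 1 - norm  # e.g. bowling average and economy: lower is better
        rating += weight * norm
    return lo + (rating - rating.min()) / (rating.max() - rating.min()) * (hi - lo)


# Made-up example bowlers, not real statistics
bowlers = pd.DataFrame({
    "Ave": [22.0, 30.0, 26.0],
    "Econ3": [7.0, 8.5, 6.5],
    "Inns": [40, 10, 25],
    "Wkts": [60, 12, 35],
})
weights = {"Ave": 0.50, "Econ3": 0.40, "Inns": 0.10, "Wkts": 0.05}
ratings = scaled_rating(bowlers, weights, lower_is_better=("Ave", "Econ3"))
print(ratings.round(1))
```

Because of the final min-max step, multiplying every weight by the same constant (for example, dividing the bowling weights by their 1.05 total) leaves the scaled ratings unchanged; only the ratios between weights matter.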

Scaling Ratings for Batsmen

Similarly, the code scales ratings for batsmen in Test matches. Batting average (Ave), total runs (Runs), and strike rate (SR) are normalized and weighted by their assumed significance in batting performance: average (45%), runs scored (15%), and strike rate (40%). The combined batting rating is rescaled to a range of 60 to 99 (scaled_rating) and sorted to identify the top batsmen. The bowling averages that feed the bowler ratings are fetched in the same way as the batting data, swapping averages-batting for averages-bowling in the URLs:

*For test bowling data*
urls_dict = {
    "India": "https://www.espncricinfo.com/records/team/averages-bowling/india-6/test-matches-1?current=2",
    "Pakistan": "https://www.espncricinfo.com/records/team/averages-bowling/pakistan-7/test-matches-1?current=2",
    "Australia": "https://www.espncricinfo.com/records/team/averages-bowling/australia-2/test-matches-1?current=2",
    "England": "https://www.espncricinfo.com/records/team/averages-bowling/england-1/test-matches-1?current=2",
    "Afghanistan": "https://www.espncricinfo.com/records/team/averages-bowling/afghanistan-40/test-matches-1?current=2",
    "West Indies": "https://www.espncricinfo.com/records/team/averages-bowling/west-indies-4/test-matches-1?current=2",
    "New Zealand": "https://www.espncricinfo.com/records/team/averages-bowling/new-zealand-5/test-matches-1?current=2",
    "Sri Lanka": "https://www.espncricinfo.com/records/team/averages-bowling/sri-lanka-8/test-matches-1?current=2",
    "South Africa": "https://www.espncricinfo.com/records/team/averages-bowling/south-africa-3/test-matches-1?current=2",
    "Bangladesh": "https://www.espncricinfo.com/records/team/averages-bowling/bangladesh-25/test-matches-1?current=2"
}

*For ODI bowling data*
urls_dict = {
    "India": "https://www.espncricinfo.com/records/team/averages-bowling/india-6/one-day-internationals-2?current=2",
    "Pakistan": "https://www.espncricinfo.com/records/team/averages-bowling/pakistan-7/one-day-internationals-2?current=2",
    "Australia": "https://www.espncricinfo.com/records/team/averages-bowling/australia-2/one-day-internationals-2?current=2",
    "England": "https://www.espncricinfo.com/records/team/averages-bowling/england-1/one-day-internationals-2?current=2",
    "Afghanistan": "https://www.espncricinfo.com/records/team/averages-bowling/afghanistan-40/one-day-internationals-2?current=2",
    "West Indies": "https://www.espncricinfo.com/records/team/averages-bowling/west-indies-4/one-day-internationals-2?current=2",
    "New Zealand": "https://www.espncricinfo.com/records/team/averages-bowling/new-zealand-5/one-day-internationals-2?current=2",
    "Sri Lanka": "https://www.espncricinfo.com/records/team/averages-bowling/sri-lanka-8/one-day-internationals-2?current=2",
    "South Africa": "https://www.espncricinfo.com/records/team/averages-bowling/south-africa-3/one-day-internationals-2?current=2",
    "Bangladesh": "https://www.espncricinfo.com/records/team/averages-bowling/bangladesh-25/one-day-internationals-2?current=2"
}

*For T20 bowling data*

urls_dict = {
    "India": "https://www.espncricinfo.com/records/team/averages-bowling/india-6/twenty20-internationals-3?current=2",
    "Pakistan": "https://www.espncricinfo.com/records/team/averages-bowling/pakistan-7/twenty20-internationals-3?current=2",
    "Australia": "https://www.espncricinfo.com/records/team/averages-bowling/australia-2/twenty20-internationals-3?current=2",
    "England": "https://www.espncricinfo.com/records/team/averages-bowling/england-1/twenty20-internationals-3?current=2",
    "Afghanistan": "https://www.espncricinfo.com/records/team/averages-bowling/afghanistan-40/twenty20-internationals-3?current=2",
    "West Indies": "https://www.espncricinfo.com/records/team/averages-bowling/west-indies-4/twenty20-internationals-3?current=2",
    "New Zealand": "https://www.espncricinfo.com/records/team/averages-bowling/new-zealand-5/twenty20-internationals-3?current=2",
    "Sri Lanka": "https://www.espncricinfo.com/records/team/averages-bowling/sri-lanka-8/twenty20-internationals-3?current=2",
    "South Africa": "https://www.espncricinfo.com/records/team/averages-bowling/south-africa-3/twenty20-internationals-3?current=2",
    "Bangladesh": "https://www.espncricinfo.com/records/team/averages-bowling/bangladesh-25/twenty20-internationals-3?current=2"
}

# Each urls_dict above overwrites the previous one, so this loop uses whichever
# dictionary was defined last (T20 here); run it once per format.
frames = []

for country, url in urls_dict.items():
    # Read every table on the page; the averages table is assumed to be the first
    df = pd.read_html(url)[0]

    # url.split('/')[6] is the team slug, e.g. 'india-6' ([5] would be 'averages-bowling')
    df['Country'] = url.split('/')[6]
    df['Nationality'] = country

    frames.append(df)

# Concatenate once instead of growing a DataFrame inside the loop
all_data = pd.concat(frames, ignore_index=True)

# Save the combined data to a CSV file
all_data.to_csv('filename.csv', index=False)

print(all_data)

Calculation of VORP

Finally, the code calculates the Value Over Replacement Player (VORP) for each player. VORP is a sports-analytics metric that quantifies a player’s value compared with a hypothetical replacement player, usually defined as the performance level of an average or bottom-percentile player in the league. Here, the comparison is made within each country and is intended to be made by type of bowler: for example, the VORP of a seamer from Country X relative to Country X’s next-best seam options. Note that the calculate_vorp function below groups by country only; comparing seamers with seamers and spinners with spinners would require an extra column recording each bowler’s type.

Here’s how VORP is computed:

For each country’s team, the code sets replacement-level performance at the 80th percentile of economy rate (Econ3) and bowling average (Ave). Because lower values are better for both, this is the threshold of the bottom 20% of that country’s bowlers.
It then calculates Player Value from economy rate and bowling average relative to that replacement level. The weights (0.48 for economy rate and 0.52 for bowling average) set their relative importance, and since they sum to 1, a bowler exactly at replacement level has a Player Value of 1.
The Replacement Value is set to 1 as a benchmark.
VORP is computed as the difference between Player Value and Replacement Value (VORP = Player_Value - Replacement_Value).
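Written out, the steps above amount to the following, where E_i and A_i are bowler i’s economy rate and bowling average, and E^rep and A^rep are the country’s 80th-percentile values (these symbols are introduced here for illustration):

```latex
\mathrm{PV}_i = 0.48\,\frac{E^{\mathrm{rep}}}{E_i} + 0.52\,\frac{A^{\mathrm{rep}}}{A_i},
\qquad
\mathrm{VORP}_i = \mathrm{PV}_i - 1
```

As a purely hypothetical worked example, a bowler with an economy of 6.0 and an average of 20.0, in a country whose replacement level is an economy of 7.5 and an average of 30.0, scores PV = 0.48 × 1.25 + 0.52 × 1.5 = 0.60 + 0.78 = 1.38, a VORP of 0.38. A bowler cheaper or more penetrative than replacement level gets a positive VORP; one worse than it gets a negative VORP.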


*Calculating player value and VORP*

import pandas as pd

df = pd.read_csv("/Users/name/Downloads/filename.csv", index_col=0)
df.columns

def calculate_vorp(df):
    vorp_list = []
    countries = df['Country.1'].unique()

    for country in countries:
        # .copy() avoids pandas' SettingWithCopyWarning when adding columns below
        country_df = df[df['Country.1'] == country].copy()

        # Replacement level: 80th percentile of economy and average,
        # i.e. the threshold of the worst 20% (higher is worse for both)
        replacement_econ = country_df['Econ3'].quantile(0.80)
        replacement_avg = country_df['Ave'].quantile(0.80)

        # Player value relative to replacement level (a replacement-level bowler scores 0.48 + 0.52 = 1)
        country_df['Player_Value'] = 0.48 * (replacement_econ / country_df['Econ3']) + 0.52 * (replacement_avg / country_df['Ave'])
        country_df['Replacement_Value'] = 1  # the benchmark
        country_df['VORP'] = country_df['Player_Value'] - country_df['Replacement_Value']

        vorp_list.append(country_df)

    return pd.concat(vorp_list)

# Calculate VORP for each player
df_vorp = calculate_vorp(df)

# Display the results
print(df_vorp[['Player2', 'Country.1', 'Econ3', 'Ave', 'Player_Value', 'VORP']])

# Save to CSV
df_vorp.to_csv('filenamevorp.csv', index=False)
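To compare seamers with seamers and spinners with spinners, as the description of this step intends, the replacement level can be computed per country and bowler type. This sketch assumes a hypothetical Bowler_Type column (for example 'seam' or 'spin') that you would have to add yourself; the function name and toy data are made up for illustration.

```python
import pandas as pd


def calculate_vorp_by_type(df, group_cols=("Country.1", "Bowler_Type"), q=0.80):
    """VORP with the replacement level set separately for each country and bowler type."""
    out = df.copy()
    groups = out.groupby(list(group_cols))
    # q-quantile of economy and average within each group (higher = worse)
    rep_econ = groups["Econ3"].transform(lambda s: s.quantile(q))
    rep_avg = groups["Ave"].transform(lambda s: s.quantile(q))
    out["Player_Value"] = 0.48 * rep_econ / out["Econ3"] + 0.52 * rep_avg / out["Ave"]
    out["VORP"] = out["Player_Value"] - 1.0  # replacement level scores exactly 1
    return out


# Made-up example: four seamers and two spinners from one country
toy = pd.DataFrame({
    "Country.1": ["IND"] * 6,
    "Bowler_Type": ["seam"] * 4 + ["spin"] * 2,
    "Econ3": [6.0, 7.0, 8.0, 9.0, 7.0, 7.0],
    "Ave": [20.0, 25.0, 30.0, 35.0, 24.0, 24.0],
})
print(calculate_vorp_by_type(toy)[["Bowler_Type", "Econ3", "Ave", "VORP"]].round(3))
```

Here the two identical spinners both sit exactly at their group’s replacement level (VORP 0), while the best seamer is judged only against other seamers.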

Bowling

ICC T20 World Cup 2024

In the context of the 2024 ICC T20 World Cup, the calculation of Scaled Ratings for players was enriched by incorporating advanced metrics that reflect their impact per game. While these ratings provide an objective view based on statistical performance, the inclusion of impact metrics factors in the game’s importance and the player’s performance under pressure. This holistic approach sometimes reveals outliers, as seen with Fazalhaq Farooqi, who initially dominated by becoming the tournament’s leading wicket-taker with a stellar performance in the early matches, including 9 wickets in the first two games. However, his performance tapered off during the Super 8s, causing his scaled rating to dip slightly below that of players like Jasprit Bumrah.

Jasprit Bumrah, despite not being the tournament’s leading wicket-taker, emerged as a standout player due to his exceptional consistency in taking wickets and maintaining a low economy rate throughout the competition. His ability to deliver crucial breakthroughs and change the course of matches, particularly in challenging situations where India faced deficits, underscored his impact. These critical contributions, often not fully captured by basic descriptive statistics alone, highlight the significance of incorporating contextual factors like match importance and performance under pressure when evaluating player ratings and impact in major tournaments.

A practical example of how these concepts filter into the analysis of Player Value and VORP can be seen in the 2024 ICC T20 World Cup. During the tournament, the interplay of game entrance points, matchups, and game theory principles became apparent. For instance, Jasprit Bumrah’s performance was analyzed not just through basic statistics but through his effectiveness in high-pressure situations, his strategic deployment in crucial overs, and his ability to exploit matchups against key opponents.

Bumrah’s success was partly due to when he was brought into the attack: his entrance points were chosen for high-pressure phases, where his ability to deliver was critical. Similarly, Fazalhaq Farooqi’s early dominance and subsequent decline illustrated how performance in key moments can affect a player’s overall rating. Although Farooqi led the wicket-takers early on, his reduced impact in the later stages showed how game theory and strategic decisions can influence perceptions of value.

In essence, while statistical ratings provide a foundation for assessing player performance, integrating advanced metrics that account for game importance and performance context enriches our understanding of player contributions, ensuring a more nuanced and comprehensive evaluation of their overall impact during events like the ICC T20 World Cup.

T20 Internationals

When we expand this to T20 internationals, Rashid Khan unsurprisingly has the best current rating of any bowler in the format, and also one of the highest VORP values. Furthermore, Kuldeep Yadav’s considerably high VORP value suggests that, although he is not India’s best-rated bowler, the drop-off from him to India’s next-best T20I spin option, based on current T20I stats, is larger than for any other country.

A bowler in the top-right quadrant is one of the best bowlers in the world and also provides tremendous value to their team relative to the next-best seam or spin option, depending on whether they are a seamer or a spinner. A bowler in the bottom-right quadrant, on the other hand, is extremely skilled but not indispensable: there is a suitable replacement for them.
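The quadrant reading can be reproduced in code. This sketch assumes the plot has rating on the x-axis and VORP on the y-axis and splits the quadrants at the mean of each (the text’s “above average VORP” suggests averages, but the plot’s exact thresholds are not stated); the function name, column names and sample values are hypothetical.

```python
import pandas as pd


def quadrant(df, x="scaled_rating", y="VORP"):
    """Label each bowler by which side of the mean rating (x) and mean VORP (y) they sit."""
    right = df[x] > df[x].mean()
    top = df[y] > df[y].mean()
    labels = pd.Series("bottom left", index=df.index)
    labels.loc[right & top] = "top right"      # elite and hard to replace
    labels.loc[right & ~top] = "bottom right"  # elite but replaceable
    labels.loc[~right & top] = "top left"      # high VORP despite a lower rating
    return labels


# Made-up sample values, not real ratings
sample = pd.DataFrame(
    {"scaled_rating": [95, 90, 70, 65], "VORP": [0.40, 0.05, 0.30, -0.10]},
    index=["Bowler 1", "Bowler 2", "Bowler 3", "Bowler 4"],
)
print(quadrant(sample))
```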

However, there are a couple of discrepancies in the dataset worth noting. Whilst most of the bowlers in this plot played in the recent World Cup, some did not, and their stats may be skewed. Ravi Bishnoi, for example, has an above-average VORP value and a solid rating, yet was not considered for World Cup selection and did not even make the squad. His VORP is better read as saying that, on current T20I numbers, the drop-off from him to India’s next spin option is large, even though some bowlers with a lower VORP value are better than him. At the other extreme, Avesh Khan made India’s World Cup squad only as a reserve, and his significantly below-average VORP value suggests he is considerably further down the pecking order of India’s seam options.

ODIs

In terms of ODIs, Rashid Khan once again shows that, beyond his wicket-taking ability in the white-ball formats, his indispensable value to the Afghanistan cricket team continues to make him one of the greats of the sport.

However, it’s harder to draw as many conclusions from the ODI data. Many countries nowadays rarely play ODIs outside of World Cups, with most bilateral fixtures being T20 matches and Test series. Jofra Archer, for example, barely played any ODI cricket after the 2019 World Cup, so although the stock of England’s other bowlers hasn’t necessarily risen, it remains to be seen whether his skill level has held up.

Including a trend line shows which players have more skill than their value to the team would suggest, and which have less. In an ideal team, you’d want players in the bottom-right quadrant: performing well for the team, but without so large a drop-off to the next option that the team becomes over-reliant on them. One of the key reasons Pakistan missed out on the semi-finals at the 2023 ODI World Cup was the absence of Naseem Shah, which left the team short of a quality seamer. With Shaheen Afridi and especially Haris Rauf getting hit for runs, Naseem Shah’s inclusion would likely have improved Pakistan’s chances of at least finishing in the top 4.
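A trend line of this kind can be sketched as an ordinary least-squares fit of VORP against rating. Bowlers above the line offer more value to their team than their rating alone would predict; bowlers below it are more skilled than their VORP suggests. The least-squares choice, the function name and the sample values are assumptions for illustration.

```python
import numpy as np


def trend_residuals(rating, vorp):
    """Fit VORP = slope * rating + intercept by least squares; return the residuals."""
    rating = np.asarray(rating, dtype=float)
    vorp = np.asarray(vorp, dtype=float)
    slope, intercept = np.polyfit(rating, vorp, 1)  # coefficients, highest power first
    return vorp - (slope * rating + intercept)


# Made-up sample: four bowlers' ratings and VORP values
residuals = trend_residuals([60, 70, 80, 90], [0.0, 0.2, 0.1, 0.3])
labels = ["above trend" if r > 0 else "below trend" for r in residuals]
print(np.round(residuals, 3), labels)
```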

To understand just how poorly Pakistan performed at the last World Cup, here is an excerpt from one of my previous articles analysing the ODI World Cup as a whole:

From November 2023:
Pakistan’s future

Pakistan’s bowlers underperformed massively during this World Cup
There are a lot of questions for Pakistan to figure out from this tournament and going forward:
Why wasn’t Imam Ul-Haq dropped earlier, especially given Fakhar Zaman’s form?
Why have Pakistan’s spinners been so poor at taking wickets?
Why have Pakistan’s bowlers been so expensive?
Probably the question for many Pakistani fans was: why wasn’t Fakhar Zaman brought in earlier? Perhaps he would have scored more runs than Imam Ul-Haq against South Africa and helped Pakistan win that game. The same can be said about the game against Australia. Zaman clearly showed that he was comfortable chasing big targets at the Chinnaswamy, so perhaps Pakistan could’ve shoe-horned their way into 3rd place had those decisions been made earlier.
However, what is simply inexcusable is how poor Pakistan’s bowling has been during this World Cup. Let’s talk about their best bowlers: Shaheen Shah Afridi, Haris Rauf, and Shadab Khan. Obviously the injury to Naseem Shah is a big miss, but it doesn’t explain why all the bowlers are underperforming. It took until the Bangladesh game for Shaheen to arguably find his feet, and for a bowler who’s consistently shown his talent in white-ball cricket, it’s been a disappointing World Cup. The same criticism can be extended to Haris Rauf, who was simply too expensive to keep bowling. Despite beating New Zealand, Pakistan’s bowling performance in that game was a stark reminder of just how underwhelming they’ve been.
Obviously, the outside noise from the PCB has been far from helpful to the team’s preparation and performances. Still, this will probably be the last ODI World Cup for some of these players as they invest more and more of their talents into the T20 format. The difference between these bowlers bowling 4 overs and bowling 10 overs is staggering.

ave1 = bowling average at the 2022 T20 World Cup; var6 = economy rate at the T20 World Cup; difference = bowling average at the 2023 ODI World Cup minus ave1

In regards to the spinners, the data is rather brutal. I generated a variable that calculates the difference between each bowler’s bowling average at this year’s ODI World Cup and at last year’s T20 World Cup; whilst the difference isn’t too big for some bowlers, the drop-off is abnormally large for the spinners. It’s important to account for the two World Cups being held in different countries and at different venues, but Indian pitches are usually more favourable to spinners, which makes these differences even more surprising.
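The `difference` variable from the caption above can be reproduced along these lines. The names `ave1` and `difference` follow the caption; `odi_ave` and the player rows are hypothetical placeholders, so the numbers are illustrative only. A positive difference means the bowler conceded more runs per wicket at the 2023 ODI World Cup than at the 2022 T20 World Cup.

```python
import pandas as pd

# Hypothetical bowling averages (runs conceded per wicket); not real tournament figures.
bowling = pd.DataFrame({
    "player":  ["Spinner A", "Spinner B", "Seamer C"],
    "ave1":    [18.0, 22.5, 25.0],   # average at the 2022 T20 World Cup
    "odi_ave": [60.0, 45.0, 30.0],   # average at the 2023 ODI World Cup
})

# difference = 2023 ODI World Cup average - 2022 T20 World Cup average (ave1)
bowling["difference"] = bowling["odi_ave"] - bowling["ave1"]

# Largest drop-off first: the bowlers whose returns worsened the most.
print(bowling.sort_values("difference", ascending=False))
```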
As mentioned, harsh decisions certainly need to be made, especially pertaining to the bowling lineup.

Key Takeaways from the ICC World Cup 2023 Group Stage

As the group stage of this year’s 2023 ICC ODI World Cup draws to a close, with nothing but a few dead rubber group…

medium.com

Test Cricket

Test cricket is the toughest format in terms of physical and mental endurance. With fewer fielding restrictions and no limit on the number of overs a bowler can bowl, you often see more attacking bowling lines, and thus better bowling averages, which helps explain why Scott Boland’s bowling record for Australia is so impressive despite him not playing many Test matches.

Given that experience matters in Test cricket and economy rate isn’t necessarily an issue, I added a small weighting for innings played when calculating the scaled ratings, and omitted economy rate from the calculation. As you can see from the plot, there are more bowlers in the bottom-right quadrant than in the other formats. Curiously, the Australian seamers in the top-right quadrant suggest that there’s a huge drop-off between those three and the next-best seam option, which could be due to the considerably smaller sample size for Australia’s other seam options, such as Michael Neser and Jhye Richardson.
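A minimal sketch of that kind of Test rating, assuming the inputs are innings, wickets, bowling average and strike rate. The 0.1 innings weight, the equal weights on the other terms and the player rows are hypothetical choices for illustration, not the exact weights behind the plots. Average and strike rate are negated after standardising because lower is better for both, and economy rate is simply left out.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical Test bowling records; economy rate deliberately not included.
tests = pd.DataFrame({
    "player":      ["Seamer A", "Seamer B", "Spinner C", "Seamer D"],
    "innings":     [120, 30, 80, 15],
    "wickets":     [300, 60, 200, 25],
    "average":     [24.0, 21.0, 28.0, 32.0],
    "strike_rate": [52.0, 48.0, 60.0, 65.0],
})

features = ["wickets", "average", "strike_rate", "innings"]
# Standardise each feature to mean 0, standard deviation 1.
z = pd.DataFrame(StandardScaler().fit_transform(tests[features]), columns=features)

# Lower average and strike rate are better, so flip their sign.
z["average"] *= -1
z["strike_rate"] *= -1

# Hypothetical weights: core bowling terms count equally, innings (experience) only slightly.
weights = {"wickets": 1.0, "average": 1.0, "strike_rate": 1.0, "innings": 0.1}
tests["rating"] = sum(z[col] * w for col, w in weights.items())

print(tests.sort_values("rating", ascending=False)[["player", "rating"]])
```

Keeping the innings weight small means a long career nudges the rating up without swamping a short but prolific record.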

In summary, here is each country’s most valuable bowler in each format:

Batting

T20 Internationals

When looking at the T20 format, no team quite dominates power hitting, in terms of both value and skill, like the Indian cricket team. Not only do a chunk of their batters sit in the top-left quadrant of the plot, but Rinku Singh, who has a batting average of 89, had to be omitted from the dataset because he’s such a large outlier. With the way we estimated VORP/Player Value, a lower player value typically indicates a better player, hence why the best T20 batters are in the top-left quadrant rather than the top-right quadrant.

Evidently, while batters like Virat Kohli, Rohit Sharma, David Warner, and Jos Buttler possess plenty of batting skill, none of them affects the game quite as much as Suryakumar Yadav. The value he provides to the Indian team means that, relative to all the other batters, there’s no real replacement for him. Compared to the other formats, however, T20 is the most discriminating in terms of impact. For example, despite Rohit Sharma having a much higher rating than Yashasvi Jaiswal, Jaiswal provides greater player value according to the plot. Whilst much of this is skewed by Jaiswal being selected for bilateral series in between major tournaments, it suggests that the manner in which a batter scores runs is arguably more important than the amount of runs they score, since T20 is the shortest format of the sport.
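To illustrate why the manner of scoring can outweigh the volume, here is a simple runs-above-replacement calculation. It is a generic illustration, not the VORP formula used for the plots: the function name, the replacement strike rate of 130 and both innings are hypothetical. Each batter is credited with the runs they scored minus the runs a replacement-level batter would have made from the same number of balls.

```python
def runs_above_replacement(runs: int, balls: int, replacement_sr: float = 130.0) -> float:
    """Runs scored minus what a replacement batter would score off the same balls.

    replacement_sr is a hypothetical replacement-level strike rate (runs per 100 balls).
    """
    return runs - balls * replacement_sr / 100

# Hypothetical innings: a quick 30 versus a slower, bigger 45.
quick_cameo = runs_above_replacement(runs=30, balls=15)  # 30 - 19.5 = 10.5
slow_knock = runs_above_replacement(runs=45, balls=40)   # 45 - 52.0 = -7.0
print(quick_cameo, slow_knock)
```

The bigger score ends up below replacement because it used more balls than a replacement batter would have needed, which is a real cost in a 120-ball innings.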

For instance, a finisher like MS Dhoni, renowned for his ability to handle high-pressure situations, is often deployed in the final overs of a T20 match. His effectiveness in these crucial moments, where the outcome of the game can hinge on a few balls, underscores his high Player Value. This strategic deployment maximizes his impact, as his ability to chase down targets or set defendable scores is unparalleled.

ODIs

In ODIs, the timing of a batsman’s entry is crucial. A player like Virat Kohli, who often bats at number three, can anchor the innings and accelerate as required. His ability to adapt to different phases of the game — whether consolidating after early wickets or capitalizing on a strong start — demonstrates his high Player Value. Kohli’s consistent performance in these varied scenarios elevates his VORP.

It’s no surprise that Virat Kohli is once again the best, both in terms of the value he adds to a side and the skill set he possesses. In fact, this metric is once again dominated by Indian batters, with the bulk of them in the bottom-left corner. This is largely because the data depends strongly on performances at the 2023 ODI World Cup, given that most countries don’t play much ODI cricket outside of the World Cups.

India were an extremely dominant side throughout that World Cup, but unfortunately for them fell short on the most important occasion, against Australia at the Narendra Modi Stadium. Throughout the tournament, as well as the bowling attack firing at a different level, the batters were in extremely fine nick.

From November 2023:
The batting hasn’t been too shabby either. The team has shown a certain maturity in this edition of the World Cup, with big partnerships between batters once early wickets fall. Throughout all of India’s matches, at least one of the top-order batters has scored 50+, ensuring that even as the innings progresses, there’s usually a set top-order batter to help rebuild partnerships as wickets fall.
Here are some of the notable key partnerships in India’s matches:

Whereas in 2019, the issue was that the middle order didn’t have much batting to do before the semi-finals, everyone’s been able to have a go this time, and pretty much all of India’s top order + middle order batters have scored important runs.

Test Cricket

In the toughest format, it’s once again no surprise that the players who are strong across all formats are the best in Test cricket, with the “Fab 4” playing at an extremely high level over the past decade. This is a format where impact is measured more by batting average and time spent at the crease, which is why Jaiswal and Harry Brook sit in the bottom left: both have had relatively short Test careers so far, and both have come in during this more attacking era of Test cricket.

In Tests, the timing of a bowler’s spell can be pivotal. Consider a bowler like James Anderson, who is often brought into the attack in the first hour of an innings to exploit the conditions with the new ball. His ability to swing the ball and take early wickets sets the tone for the innings, significantly impacting the game’s outcome. Anderson’s effectiveness in these critical moments underlines his high Player Value and substantial VORP.

Extending the plot to the number of innings played against batting average shows just how good the “Fab 4” have been, and for how long:

Matchups, on the other hand, are strategic battles that can last over multiple sessions. For instance, the duel between a bowler like Ravichandran Ashwin and a top-order batsman such as Steve Smith involves intricate game theory. Ashwin’s variations and his ability to exploit a batsman’s weaknesses over extended periods make him invaluable, and his strategic use against specific batsmen elevates his VORP, as his contributions are crucial in turning matches in his team’s favour.
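Matchup data of this kind is usually built from ball-by-ball records. A minimal sketch, assuming a hypothetical table with `bowler`, `batter`, `runs` and `is_wicket` columns (the rows below are invented): grouping by each bowler-batter pair gives balls, runs and dismissals, and from those a head-to-head strike rate and average.

```python
import numpy as np
import pandas as pd

# Hypothetical ball-by-ball deliveries; names are placeholders.
balls = pd.DataFrame({
    "bowler":    ["Spinner X"] * 6 + ["Seamer Y"] * 4,
    "batter":    ["Batter P"] * 10,
    "runs":      [0, 1, 0, 4, 0, 0, 4, 4, 1, 0],
    "is_wicket": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
})

# One row per bowler-batter pair.
h2h = balls.groupby(["bowler", "batter"]).agg(
    balls=("runs", "size"),
    runs=("runs", "sum"),
    dismissals=("is_wicket", "sum"),
)
h2h["strike_rate"] = 100 * h2h["runs"] / h2h["balls"]
# Average is left as NaN when the bowler has never dismissed the batter.
h2h["average"] = h2h["runs"] / h2h["dismissals"].replace(0, np.nan)
print(h2h)
```

In practice, small head-to-head samples like these are noisy, so a minimum-balls filter is usually applied before reading much into them.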

In summary, here is each country’s most valuable batter in each format:

In conclusion, the integration of game entrance points, matchups, and game theory principles significantly enhances our understanding and analysis of Player Value and VORP across T20s, ODIs, and Test cricket. By considering the strategic elements of when a player enters the game and their historical matchups against opponents, we can better appreciate their true impact and value beyond mere statistics. Game theory concepts, such as backward induction and payoff matrices, provide a framework for making optimal decisions and understanding the complex dynamics of cricket matches. These advanced approaches allow us to evaluate players more comprehensively, recognizing their contributions in various formats and under different game conditions. This holistic perspective not only refines player ratings and VORP calculations but also offers deeper insights into the strategic intricacies that define the modern game of cricket.

Bonus

Augmented T20 k-factor model

Using match data with a stage-dependent K-factor, the script defines functions for Elo rating calculation (`calculate_elo_rating`) and for determining the K-factor (`determine_k_factor`) from each match’s `playoff` stage code, which distinguishes group-stage matches from knockout matches. The K-factor adjustment ensures that knockout matches, where the stakes are higher, contribute more significantly to Elo rating changes.

import pandas as pd

# Load the T20 World Cup match data (local path from the original analysis)
matches = pd.read_csv("/Users/samyukth/Downloads/t2020.csv", index_col=0)
print(matches.columns)
# Index(['Date', 'playoff', 'Team1', 'Team2', 'Venue', 'Winner', 'Year',
#        'Country', 'ELO Rating team 1', 'ELO Rating Team 2', 'EA', 'EB',
#        'New ELO Team 1', 'New ELO Team 2', 'id.1', 'team1_rating',
#        'team2_rating', 'team1_newrating', 'team2_newrating', 'team1_expected',
#        'team2_expected', 'K-factor', 'outcometeam1', 'outcometeam2'],
#       dtype='object')

# Strip stray whitespace so "India " and "India" count as the same team
matches['Team1'] = matches['Team1'].str.strip()
matches['Team2'] = matches['Team2'].str.strip()
unique_teams = pd.concat([matches['Team1'], matches['Team2']]).unique()
print(unique_teams)

# Count how many matches each team played (used to drop small-sample teams)
matches_count = {
    team: ((matches['Team1'] == team) | (matches['Team2'] == team)).sum()
    for team in unique_teams
}
matches_count_df = pd.DataFrame(list(matches_count.items()), columns=['Team', 'MatchesPlayed'])
matches_count_df = matches_count_df.sort_values(by='MatchesPlayed', ascending=False)
print(matches_count_df)

def calculate_elo_rating(team1_rating, team2_rating, outcome, k_factor):
    """Return team1's expected score and its Elo rating change for one match."""
    expected1 = 1 / (1 + 10 ** ((team2_rating - team1_rating) / 400))
    elo_change = k_factor * (outcome - expected1)
    return expected1, elo_change

def determine_k_factor(playoff):
    """Map the 'playoff' stage code (1-4) to a K-factor; later stages move ratings more."""
    k_factors = {1: 10, 2: 15, 3: 25, 4: 35}
    return k_factors.get(playoff, 10)  # default K-factor for missing or unknown codes

def update_elo_ratings(matches, elo_ratings):
    """Update ratings match by match (in file order) and record them in the DataFrame."""
    for index, match in matches.iterrows():
        team1 = str(match['Team1']).strip()
        team2 = str(match['Team2']).strip()
        winner = str(match['Winner']).strip() if pd.notna(match['Winner']) else ''

        if winner == team1:
            outcome_team1, outcome_team2 = 1, 0  # team1 won
        elif winner == team2:
            outcome_team1, outcome_team2 = 0, 1  # team2 won
        else:
            # No result / missing winner: scored 0 for BOTH teams, so both lose rating.
            # (Conventional Elo would score this 0.5 for each team.)
            outcome_team1, outcome_team2 = 0, 0

        k_factor = determine_k_factor(match['playoff'])
        team1_rating = elo_ratings.get(team1, 1000)
        team2_rating = elo_ratings.get(team2, 1000)

        expected1, elo_change1 = calculate_elo_rating(team1_rating, team2_rating, outcome_team1, k_factor)
        expected2, elo_change2 = calculate_elo_rating(team2_rating, team1_rating, outcome_team2, k_factor)

        elo_ratings[team1] = team1_rating + elo_change1
        elo_ratings[team2] = team2_rating + elo_change2

        # Store pre-match ratings, post-match ratings, expectations and outcomes
        matches.at[index, 'team1_rating'] = team1_rating
        matches.at[index, 'team2_rating'] = team2_rating
        matches.at[index, 'team1_newrating'] = elo_ratings[team1]
        matches.at[index, 'team2_newrating'] = elo_ratings[team2]
        matches.at[index, 'team1_expected'] = expected1
        matches.at[index, 'team2_expected'] = expected2
        matches.at[index, 'outcometeam1'] = outcome_team1
        matches.at[index, 'outcometeam2'] = outcome_team2

    return elo_ratings

# Every team starts at 1000; missing team names become the string 'nan'
unique_teams = pd.concat([matches['Team1'], matches['Team2']]).astype(str).str.strip().unique()
elo_ratings = {team: 1000 for team in unique_teams}

# Initialise the rating columns (all pre-match ratings start at 1000)
matches['team1_rating'] = 1000.0
matches['team2_rating'] = 1000.0
for col in ['team1_newrating', 'team2_newrating', 'team1_expected',
            'team2_expected', 'outcometeam1', 'outcometeam2']:
    matches[col] = None

elo_ratings = update_elo_ratings(matches, elo_ratings)

# Display the final Elo ratings
for team, rating in elo_ratings.items():
    print(f"{team}: {rating}")

print(matches[['id.1', 'Team1', 'Team2', 'team1_rating', 'team2_rating',
               'team1_newrating', 'team2_newrating', 'team1_expected',
               'team2_expected', 'outcometeam1', 'outcometeam2']])

# Save the updated DataFrame for further analysis
output_file = 't20eloo5.csv'
matches.to_csv(output_file, index=False)
Oman: 976.3812587527545
Bangladesh: 952.8634998712928
Ireland: 965.3651239277015
Sri Lanka: 981.8646206069699
Scotland: 975.9860437140121
Namibia: 975.0882001158549
Australia: 1057.992357608624
England: 1036.8987806777936
India: 1079.2698628776238
Afghanistan: 975.3412193059294
South Africa: 1034.0110224194202
Pakistan: 1016.4490209766757
West Indies: 980.8495698293474
New Zealand: 1004.055909537729
nan: 850.0
Zimbabwe: 981.6985115884502
Netherlands: 966.2426172039565
Papua New Guinea: 970.2566136715835
Note: Teams that played fewer than 7 games in total across the three T20 World Cups from 2021 to 2024 were omitted from the data due to the small sample size.
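The sample-size filter in the note can be applied with a match-count table like the `matches_count_df` the script builds. A minimal sketch, assuming the 7-game threshold from the note; the team names, counts and ratings below are hypothetical placeholders rather than the real T20 World Cup figures.

```python
import pandas as pd

MIN_GAMES = 7  # threshold from the note above

# Hypothetical counts; in the script this table is built from the match data.
matches_count_df = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C"],
    "MatchesPlayed": [20, 8, 3],
})

# Teams with at least MIN_GAMES matches are kept.
eligible = set(matches_count_df.loc[matches_count_df["MatchesPlayed"] >= MIN_GAMES, "Team"])

# Hypothetical final ratings, filtered down to the eligible teams.
elo_ratings = {"Team A": 1050.0, "Team B": 980.0, "Team C": 990.0}
filtered = {team: rating for team, rating in elo_ratings.items() if team in eligible}
print(filtered)
```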

The main function (`update_elo_ratings`) iterates through each match in the T20 DataFrame, determines the outcome (win, loss, or draw) for each team, calculates Elo rating changes using the previously defined functions, and updates the Elo ratings accordingly. It also updates the DataFrame with columns for current and updated Elo ratings, expected outcomes, and match results.
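To make a single update concrete, here is a minimal worked example using the same expected-score formula and the stage-based K-factors from `determine_k_factor` (10 for playoff code 1, up to 35 for code 4). The helper names `expected_score` and `elo_update` are hypothetical, and the two evenly rated sides are illustrative. Unlike the script, which scores a match without a winner as 0 for both teams, this sketch uses the conventional Elo score of 0.5 each for a tie or no result.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for team A under the Elo logistic curve (400-point scale)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float):
    """Return new ratings for A and B; score_a is 1 (win), 0.5 (tie/no result) or 0 (loss)."""
    change = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + change, rating_b - change  # zero-sum: B loses what A gains

# Knockout match (playoff code 4 -> K = 35) between two evenly rated sides.
print(elo_update(1000, 1000, score_a=1, k=35))    # (1017.5, 982.5)
print(elo_update(1000, 1000, score_a=0.5, k=35))  # (1000.0, 1000.0)
# The same result in the group stage (playoff code 1 -> K = 10) moves ratings far less.
print(elo_update(1000, 1000, score_a=1, k=10))    # (1005.0, 995.0)
```

With evenly matched sides the expected score is 0.5, so a knockout win is worth half of K (17.5 points), while a group-stage win is worth only 5.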

After updating the Elo ratings for all matches, the script prints out the final Elo ratings for each team and displays a subset of the updated DataFrame containing Elo-related columns. Finally, it saves the entire updated DataFrame to a new CSV file for further analysis or reporting purposes.

This script analyses T20 match data using Elo ratings, adjusting the ratings dynamically based on match outcomes and playoff significance so that they track each team’s performance over time.

Comparing the new Elo ratings to the old ratings for international teams reveals intriguing insights into their recent performances and overall competitiveness. Only India have notably increased their ratings, suggesting strong recent performances that have bolstered their standing, despite England being the only side to have made the semi-finals in all 3 editions of the World Cup. Conversely, teams like Bangladesh and the West Indies have shown a sharp drop, reflecting their poor performances in the recent T20 editions. This volatility suggests that these teams’ ability fluctuates greatly with confounding factors such as weather and conditions, further illustrating the unpredictability of the T20 format.

Want to read more?

The Future of Cricket 🏏

As the game has changed, so have the priorities of cricketers, with many sacrificing the opportunity to represent their…

medium.com

What wins you a T20 World Cup: The Underlying Numbers 🏏 :

In cricket and many sports, surface-level statistics often paint an incomplete picture of a team’s performance. Merely…

medium.com

Key Takeaways from the ICC World Cup 2023 Group Stage

As the group stage of this year’s 2023 ICC ODI World Cup draws to a close, with nothing but a few dead rubber group…

medium.com

Predictive Modelling NBA Games using Excel and Python 🏀

In my attempt to create an accurate prediction model of NBA games and playoff results, I used a player-based stats…

medium.com

Written by Sam Iyer-Sequeira