IDENTIFYING BEST BATSMAN TO BAT WITH IN IPL (INDIAN PREMIER LEAGUE)

Rahul Araveti
Web Mining [IS688, Spring 2021]
Apr 25, 2021

INTRODUCTION

Statistics have always had a significant role in sports. Sports analytics is on the rise and will continue to shape how teams operate, select their players, and play the game.

Cricket is no different. The runs scored by a batsman, the wickets taken by a bowler, or the matches won by a cricket team — these are all examples of the most important numbers in the game of cricket.

Maintaining a record of all such statistics has multiple benefits. The teams and the individual players can dig deep into this data and find areas of improvement. It can also be used to assess an opponent’s strengths and weaknesses.

Cricket has always been my passion, which is why I chose it as the topic for this network analysis.

Below you can see a snapshot of a few key statistics of Virat Kohli, one of the best batsmen in cricket:

This record gives us quite a lot of useful information, such as:

  • The batting average in different formats
  • Number of times the batsman scored >= 100 runs
  • Highest scores of the batsman
  • Number of matches played, etc.

An extensive exploratory analysis of this well-structured data can be helpful in comparing multiple players.

Think about it: we can try to answer age-old questions such as who is the all-time best batsman, or whether Courtney Walsh was better than Curtly Ambrose. Such an analysis would be based on their individual records.

However, there are many crucial insights that are difficult and cumbersome to obtain by using only traditional data analysis techniques. Since cricket is a team game, it involves interaction among the players within a team as well as with those in the opposition team.

It is quite difficult to win matches if individuals focus on their respective performances only. They must support their fellow players as well.

This aspect of the game is most obvious during batting, since there are always two batsmen at the crease.

Significance of Batting Partnerships:

Who’s your favorite opening batting partnership? I used to love watching Sachin Tendulkar and Sourav Ganguly open the innings; that left-right hand combination was deadly. Healthy batting partnerships play a strong role in building high scores for the team. Good partnerships early in an innings give the subsequent batsmen a solid foundation to play freely and score runs according to the situation, regardless of the format of the game.

Let’s have a look at a partnership chart taken from a T20 match between India and Sri Lanka that took place on 7th January 2020:

Sri Lanka batted first and scored 142. India easily overcame this target and won the match.

The chart above compares the batting partnerships of the two teams. As you can see, Sri Lanka had only one decent partnership (Mathews and de Silva). India, on the other hand, had three good partnerships of 97, 42, and 37 runs.

NETWORK ANALYSIS OF BATTING PARTNERSHIPS

I performed a network analysis of the batting partnerships in IPL 2019. For the uninitiated, the IPL (Indian Premier League) is a yearly T20 cricket tournament that takes place in India. There are eight teams in the IPL, each made up of local and international players:

An Overview of the Batting-Partnership Network:

I manually collected the batting partnership data from ESPN Cricinfo for the semi-finalists: Mumbai Indians (MI), Chennai Super Kings (CSK), Sunrisers Hyderabad (SRH), and Delhi Capitals (DC).

The idea is to build a network for a single team, where the nodes are the batsmen that batted for the team during the entire tournament (IPL 2019). If any two batsmen batted together even once, then they would have an edge between them.

The edges are directed, i.e., each edge points from one node to another. The edge points toward the batsman who contributed more to the partnership.

For example, in the figure above, the overall contribution of Batsman B is more than that of Batsman A, considering all the partnerships they have had together. What exactly is this “overall contribution”? Well, this is essentially a performance metric.

Let’s say batsman A and batsman B have had 6 partnerships in a cricket tournament. Given below are the bar charts of the individual scores in every partnership they put up together:

We can clearly see that Batsman B has been more consistent in scoring runs as compared to Batsman A. However, we can’t rely on our visual inspection to find the better batsman in a pair. We should come up with some metric to determine who was the better batsman.

We will follow the steps below to compute this performance metric:

  1. Compute the median of the runs scored by each of the two batsmen across their partnerships. Here, the median score of Batsman A is 17.5 and that of Batsman B is 35.5
  2. Find the ratio of the larger median to the sum of the two medians. In this case, the ratio is 35.5/(17.5 + 35.5) ≈ 0.67
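The metric can be sketched in a few lines of Python. The run lists below are hypothetical; I picked them only so that the medians match the 17.5 and 35.5 quoted above, and the variable names are my own.

```python
from statistics import median

# Hypothetical runs for six partnerships, chosen only so that the medians
# match the values quoted above (17.5 and 35.5); this is not real match data.
runs_a = [5, 10, 15, 20, 30, 40]
runs_b = [20, 30, 35, 36, 40, 50]

median_a = median(runs_a)  # 17.5
median_b = median(runs_b)  # 35.5

# Ratio of the larger median to the sum of both medians.
contribution = max(median_a, median_b) / (median_a + median_b)

# Assumption: on a tie this sketch simply names Batsman A.
leader = "Batsman B" if median_b > median_a else "Batsman A"

print(leader, round(contribution, 2))  # Batsman B 0.67
```

The ratio always lies between 0.5 (both batsmen equal) and 1.0 (one batsman scored all the runs), so it also tells us how lopsided a pair was.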

Below is the batting-partnership graph of one of the most successful teams in the IPL era, Chennai Super Kings (CSK):

Chennai Super Kings Batting Partnership Network

Let’s see how we can interpret this graph and what sort of insights we can get from it.

Network Interpretation and Inferences:

In the graph above, there are 17 nodes and 40 edges with directions. This means that during IPL 2019, 17 players batted for CSK and there were 40 unique pairs of players who batted together.

The insights we can fetch from this network:

  • AT Rayudu built partnerships with the maximum number of players (9) followed by MS Dhoni (8)
  • KM Jadhav had the maximum number of incoming edges (5) which means he was the more effective batsman in his partnerships with 5 different players
  • AT Rayudu and SR Watson opened the innings for CSK. Rayudu did slightly better than Watson in building partnerships
  • We can also infer that a batsman with a high number of edges can adapt to different situations and bat with players of different styles

This network is more useful if we use it for inter-team comparison of players. For example, compare the middle order batsmen across all the teams.

Let’s compare the opening batsmen of the four IPL 2019 semi-finalists (MI, CSK, SRH, and DC). I have listed below the openers of these four teams:

By constructing similar partnership networks for these teams, we can extract the edge count and incoming edge count for each of the above players.

The column ‘Partners’ contains the edge count (number of unique partnerships), and the column ‘Leading’ contains the incoming edge count (number of unique partnerships where the batsman has performed better than his partner).

I have ranked these players based on the edge count and the incoming edge count, giving more preference to the edge count. However, in the case of a tie, I have used the total runs scored by the batsmen.
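A minimal sketch of this ranking rule is given below. The opener names and all the numbers are placeholders that I made up for illustration; they are not the values from the table.

```python
# Hypothetical openers with made-up numbers: (name, partners, leading, runs).
openers = [
    ("Opener A", 8, 4, 450),
    ("Opener B", 10, 5, 520),
    ("Opener C", 8, 4, 390),
    ("Opener D", 8, 6, 300),
]

# Sort by edge count first, then incoming edge count, then total runs as the
# tie-breaker, all in descending order.
ranked = sorted(openers, key=lambda row: (row[1], row[2], row[3]), reverse=True)

for rank, (name, partners, leading, runs) in enumerate(ranked, start=1):
    print(rank, name, partners, leading, runs)
```

Here Opener D ranks above Opener A despite scoring fewer runs, because runs only matter when both the edge count and the incoming edge count are tied.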

Now, I am going to add the players’ price to our existing table:

We can list down a few inferences from this table:

  • As per this ranking, Q de Kock is on top as he partnered with 10 different batsmen
  • DA Warner has also done a great job. However, he was way more expensive than Q de Kock
  • Surprisingly, AT Rayudu didn’t score a bag full of runs. And yet he managed to support the other batsmen and build partnerships
  • RG Sharma just couldn’t do justice to his price tag

SOFTWARE USED

  • Python
  • Google Colab
  • NetworkX

IMPLEMENTATION

I used Python with NetworkX for the network analysis and ran the code in Google Colab.

IMPORT LIBRARIES AND DATA

Let’s first import the required libraries:
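A likely set of imports for this kind of analysis is shown below. NetworkX is the library named above; pandas and matplotlib are my assumptions for handling the dataframe and drawing the graph.

```python
import pandas as pd              # tabular partnership data (assumed)
import networkx as nx            # directed partnership network
import matplotlib.pyplot as plt  # plotting the network (assumed)
```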

We then proceed with loading the dataset. The screenshot below gives us a glimpse of how the dataset looks.

Every row in the dataset represents a batting partnership. The columns player_1 and player_2 contain the pair of batsmen, and the columns score_1 and score_2 contain the runs each of them scored in that partnership.
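To make the layout concrete, here is a small dataframe with the same four columns. The rows are hypothetical placeholders; the real data was collected by hand and is not reproduced here.

```python
import pandas as pd

# Hypothetical rows in the layout described above (not real IPL data).
partnerships = pd.DataFrame(
    {
        "player_1": ["Batsman A", "Batsman A", "Batsman C"],
        "player_2": ["Batsman B", "Batsman B", "Batsman A"],
        "score_1": [12, 30, 8],
        "score_2": [41, 25, 19],
    }
)

print(partnerships.head())
print(partnerships.shape)  # (3, 4)
```

With the actual file, the same dataframe would typically come from `pd.read_csv` pointed at wherever the collected data is stored.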

DATA PREPARATION

Let’s first prepare the dataset for one of the four IPL teams. We will use that to create the batting-partnership network.

Right now, we have records of all the partnerships for Delhi Capitals during IPL 2019. To construct the network, we need to aggregate this data.

For example, if two players put together 5 partnerships, then we would aggregate it by taking the median values of scores for both the batsmen separately:
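One way to do this aggregation with pandas is sketched below, using hypothetical rows. Putting each pair into a fixed order before grouping is my own assumption, so that a row listed as B-A is counted with the A-B rows.

```python
import pandas as pd

# Hypothetical partnership records for one team (not real IPL data).
df = pd.DataFrame(
    {
        "player_1": ["Batsman A", "Batsman B", "Batsman A", "Batsman C"],
        "player_2": ["Batsman B", "Batsman A", "Batsman B", "Batsman A"],
        "score_1": [10, 40, 20, 8],
        "score_2": [30, 14, 50, 19],
    }
)

# Put each pair in alphabetical order so that A-B and B-A rows are treated
# as the same partnership, swapping the scores along with the names.
swap = df["player_1"] > df["player_2"]
ordered = pd.DataFrame(
    {
        "player_1": df["player_1"].where(~swap, df["player_2"]),
        "player_2": df["player_2"].where(~swap, df["player_1"]),
        "score_1": df["score_1"].where(~swap, df["score_2"]),
        "score_2": df["score_2"].where(~swap, df["score_1"]),
    }
)

# One row per pair, holding each batsman's median score within that pair.
medians = ordered.groupby(["player_1", "player_2"], as_index=False)[
    ["score_1", "score_2"]
].median()
print(medians)
```

The median is less sensitive than the mean to a single big innings, which fits the goal of rewarding consistency.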

Let’s put the aggregated data in a dataframe:

Now that we have the median runs scored by each and every batsman, we can compute the performance metric (overall contribution):
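A sketch of this step on an aggregated dataframe follows. The rows are hypothetical, and the column names `performance`, `leader`, and `partner` are ones I introduced for illustration.

```python
import pandas as pd

# Hypothetical aggregated medians, one row per pair (not real IPL data).
medians = pd.DataFrame(
    {
        "player_1": ["Batsman A", "Batsman A"],
        "player_2": ["Batsman B", "Batsman C"],
        "score_1": [17.5, 19.0],
        "score_2": [35.5, 8.0],
    }
)

total = medians["score_1"] + medians["score_2"]
larger = medians[["score_1", "score_2"]].max(axis=1)

# Share of the combined medians held by the stronger batsman (0.5 to 1.0).
# Note: a pair whose medians are both zero would give NaN here.
medians["performance"] = larger / total

# The batsman with the larger median leads the pair.
# Assumption: ties go to player_1 in this sketch.
first_leads = medians["score_1"] >= medians["score_2"]
medians["leader"] = medians["player_1"].where(first_leads, medians["player_2"])
medians["partner"] = medians["player_2"].where(first_leads, medians["player_1"])

print(medians[["partner", "leader", "performance"]])
```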

CONSTRUCT NETWORK

Finally, it’s time to build our partnership network. It is going to be a directed network as the edges in the network will have directions based on the performance metric:
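A minimal sketch of the construction with NetworkX is shown below. The triples are hypothetical, and storing the metric under an edge attribute called `performance` is my own choice.

```python
import networkx as nx

# Hypothetical (partner, leader, performance) triples, not real IPL data.
pairs = [
    ("Batsman A", "Batsman B", 0.67),
    ("Batsman C", "Batsman A", 0.70),
    ("Batsman C", "Batsman B", 0.55),
]

G = nx.DiGraph()

# Each edge points from the weaker partner to the better-performing batsman,
# so a node's incoming edges count the pairs that batsman led.
G.add_weighted_edges_from(pairs, weight="performance")

print(G.number_of_nodes(), G.number_of_edges())  # 3 3
```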

Here’s the code to plot the network:
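One way to draw such a network is sketched below on a small hypothetical graph. The layout, colors, and sizes are arbitrary choices of mine and may differ from the figure that follows.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical directed partnership graph (edges point to the leader).
G = nx.DiGraph()
G.add_edges_from(
    [
        ("Batsman A", "Batsman B"),
        ("Batsman C", "Batsman A"),
        ("Batsman C", "Batsman B"),
        ("Batsman D", "Batsman A"),
    ]
)

# A fixed seed keeps the spring layout the same from run to run.
pos = nx.spring_layout(G, seed=42)

fig, ax = plt.subplots(figsize=(6, 6))
nx.draw_networkx(
    G,
    pos=pos,
    ax=ax,
    arrows=True,
    node_color="lightblue",
    node_size=1500,
    font_size=9,
)
ax.set_axis_off()
# In a Colab notebook the figure renders at the end of the cell;
# in a plain script, call plt.show() here.
```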

Delhi Capitals Batting-Partnership Network

We can easily extract the count of edges and incoming edges using the networkx library.

Let’s get the count of all the edges, node-wise:
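On a small hypothetical graph, the per-node edge count can be read from `degree()`:

```python
import networkx as nx

# Hypothetical directed partnership graph (edges point to the leader).
G = nx.DiGraph()
G.add_edges_from(
    [
        ("Batsman A", "Batsman B"),
        ("Batsman C", "Batsman A"),
        ("Batsman C", "Batsman B"),
        ("Batsman D", "Batsman A"),
    ]
)

# degree() on a DiGraph adds incoming and outgoing edges, which here equals
# the number of distinct partners because each pair has exactly one edge.
partners = dict(G.degree())
print(partners)
```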

Get the count of incoming edges, node-wise:
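Likewise, `in_degree()` gives the number of pairs each batsman led. The graph below is again a hypothetical one.

```python
import networkx as nx

# Hypothetical directed partnership graph (edges point to the leader).
G = nx.DiGraph()
G.add_edges_from(
    [
        ("Batsman A", "Batsman B"),
        ("Batsman C", "Batsman A"),
        ("Batsman C", "Batsman B"),
        ("Batsman D", "Batsman A"),
    ]
)

# Incoming edges: partnerships in which this batsman had the larger median.
leading = dict(G.in_degree())
print(leading)
```

These two dictionaries correspond to the ‘Partners’ and ‘Leading’ columns used in the comparison of openers.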

CONCLUSION

In this assignment, we used NetworkX to build a graph of the batting partnerships for Delhi Capitals. This helps us find out which pairs were consistent and more likely to score runs for the team. Players like Dhawan, P Shaw, Shreyas, and Pant have the most edges, as they were top-order batsmen.

References:

  1. Complex Network Analysis in Cricket by Satyam Mukherjee.
  2. Source: sports.ndtv.com
