Unleashing the Power of Numbers: Analyzing IPL Ball-by-Ball Data

6 min read · Jun 21, 2023

Cricket, often referred to as a gentleman’s game, has witnessed a remarkable transformation in the 21st century. The advent of data analytics has revolutionized the way teams strategize, players train, and fans perceive the sport. In this era of information-driven decision-making, one of the most exciting and widely followed cricket tournaments worldwide, the Indian Premier League (IPL), provides a treasure trove of data for analysis.

The IPL, a heady concoction of thrilling cricketing action, star-studded line-ups, and fierce competition, captivates millions of fans across the globe. Behind the scenes, an abundance of data is generated during every match, capturing crucial details about each delivery bowled and every run scored. This ball-by-ball data has become a goldmine for cricket analysts, statisticians, and enthusiasts who seek to uncover hidden patterns, unlock strategic insights, and gain a deeper understanding of the game.

In this article, we delve into the fascinating world of IPL ball-by-ball data analysis. By examining the wealth of information at our disposal, we aim to unravel the intricate dynamics that shape the outcome of matches and influence players’ performances. Whether it’s identifying a batsman’s strengths and weaknesses against specific bowlers, assessing the impact of different fielding positions, or evaluating a bowler’s success rate across different stages of the match, ball-by-ball data analysis has the potential to unlock strategic advantages for teams and arm fans with a deeper appreciation of the sport.

As we embark on this analytical journey, I invite you to join me in deciphering the intricacies of IPL cricket and uncovering the stories hidden within the numbers. From statistical anomalies to game-changing insights, the analysis of IPL ball-by-ball data promises to enrich our understanding of the game and elevate the way we perceive and appreciate one of the world’s most beloved sports.

Let us explore data from the IPL’s first season in 2008 up to 2023 and dig out some interesting analytics to see which bowlers and batsmen have had the upper hand over each other.

1. Getting the data

Firstly, we need data, and where better to find it than Kaggle? We found a ball-by-ball dataset covering all IPL seasons, consolidated in a single CSV and sourced from cricsheet.org. You can access the same data here.

2. Getting the libraries

For now we just need four Python libraries. We can import them as:

3. Import the data and explore

We can import the csv as follows:
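As a minimal sketch of this step, the snippet below loads a CSV with pandas and runs a few first-look checks. The column names (`striker`, `bowler`, `runs_off_bat`, `wides`, `wicket_type`, `player_dismissed`) are assumed to follow Cricsheet's ball-by-ball CSV layout, the file name in the comment is hypothetical, and the four inline rows are made-up placeholder data so that the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny inline stand-in for the Kaggle CSV (placeholder names, made-up values).
# The column names are an assumption based on Cricsheet's ball-by-ball layout.
sample_csv = io.StringIO(
    "match_id,season,innings,ball,striker,bowler,runs_off_bat,extras,wides,wicket_type,player_dismissed\n"
    "1,2008,1,0.1,Batter A,Bowler X,0,1,,,\n"
    "1,2008,1,0.2,Batter A,Bowler X,4,0,,,\n"
    "1,2008,1,0.3,Batter A,Bowler X,0,1,1,,\n"
    "1,2008,1,0.4,Batter A,Bowler X,0,0,,bowled,Batter A\n"
)

# With the real download this would be something like:
#   balls = pd.read_csv("all_matches.csv", low_memory=False)  # hypothetical file name
balls = pd.read_csv(sample_csv)

# A few quick looks at the data before any analysis.
print(balls.shape)          # number of deliveries and columns
print(balls.dtypes)         # which columns are numeric and which are text
print(balls.isna().sum())   # wicket columns are empty on most deliveries
print(balls.head())
```

Most deliveries carry no wicket, so `wicket_type` and `player_dismissed` are expected to be mostly empty in the real file too.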

Then we can play around and explore the data as we like, which helps us understand it before we start analysing it. I will attach a few screenshots of how I explored it, though this is entirely subjective and you may explore it differently depending on the kind of analysis you want to do.

4. Getting player dismissed count

Now let us extract how many times each bowler has dismissed each batsman in the IPL since 2008. We can do that with the following code:
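One way to sketch this count is a pandas `groupby` over the bowler and the dismissed player. The column names are assumed to follow the Cricsheet layout, the data is a made-up placeholder, and leaving out run outs and similar dismissals (which are not credited to the bowler) is a choice of this sketch rather than something stated in the article.

```python
import pandas as pd

# Placeholder ball-by-ball rows (made-up data, assumed Cricsheet column names).
balls = pd.DataFrame({
    "bowler": ["Bowler X", "Bowler X", "Bowler X", "Bowler Y", "Bowler Y"],
    "striker": ["Batter A", "Batter A", "Batter B", "Batter A", "Batter B"],
    "wicket_type": ["bowled", "caught", "run out", None, "lbw"],
    "player_dismissed": ["Batter A", "Batter A", "Batter B", None, "Batter B"],
})

# Dismissals that are not credited to the bowler. Excluding them is an
# assumption of this sketch, not necessarily what the original code did.
NOT_BOWLER_WICKETS = {"run out", "retired hurt", "retired out", "obstructing the field"}

# Keep only deliveries on which the bowler took a wicket.
wickets = balls[
    balls["player_dismissed"].notna()
    & ~balls["wicket_type"].isin(NOT_BOWLER_WICKETS)
]

# One row per (bowler, batsman) pair with the number of dismissals.
df = (
    wickets.groupby(["bowler", "player_dismissed"])
    .size()
    .reset_index(name="dismissed_count")
)
print(df)
```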

Here is how the df dataframe looks after the operations:

5. Getting balls bowled count

Let's extract the number of balls each bowler has bowled to each batsman they have dismissed at least once. If a bowler has never dismissed a batsman, the dismissal count is zero and the averages we calculate later would divide by zero, so we remove every pair in which the bowler has never dismissed the batsman.

The code for the same is as follows:
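A minimal sketch of this step: count deliveries per bowler and striker, then inner-join onto the dismissal counts so that only pairs with at least one dismissal survive. The column names are assumed, the data is made up, and skipping wides (which do not count as a ball faced) is this sketch's own refinement.

```python
import pandas as pd

# Placeholder deliveries (made-up data, assumed Cricsheet column names).
balls = pd.DataFrame({
    "bowler": ["Bowler X"] * 4 + ["Bowler Y"] * 2,
    "striker": ["Batter A", "Batter A", "Batter A", "Batter B", "Batter A", "Batter A"],
    "wides": [None, 1.0, None, None, None, None],
})

# Dismissal counts as produced in the previous step (placeholder values).
df = pd.DataFrame({
    "bowler": ["Bowler X"],
    "player_dismissed": ["Batter A"],
    "dismissed_count": [1],
})

# Drop wides before counting; this is an assumption of the sketch.
legal = balls[balls["wides"].fillna(0) == 0]

ball_count = (
    legal.groupby(["bowler", "striker"])
    .size()
    .reset_index(name="ball_count")
)

# The inner join drops every pair without a dismissal, which avoids
# dividing by zero when averages are computed later.
merged = df.merge(
    ball_count,
    how="inner",
    left_on=["bowler", "player_dismissed"],
    right_on=["bowler", "striker"],
)
print(merged)
```

After the join, `player_dismissed` and `striker` hold the same names, which is why one of them can be dropped later.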

The output of the merged data frame will look like:

6. Getting runs scored count

Now we use the same technique to find the runs scored by each batsman (off the bat only, ignoring byes and leg byes) against each bowler who has dismissed them at least once, for the same reason explained in the previous step.

The code for runs scored count is:
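As a sketch under the same assumptions (Cricsheet-style column names, made-up placeholder data), summing `runs_off_bat` per bowler and striker leaves byes and leg byes out automatically, because those are recorded under extras rather than as runs off the bat.

```python
import pandas as pd

# Placeholder deliveries: the second ball is a leg bye, so it adds to
# extras but not to runs_off_bat.
balls = pd.DataFrame({
    "bowler": ["Bowler X", "Bowler X", "Bowler X", "Bowler X"],
    "striker": ["Batter A", "Batter A", "Batter A", "Batter B"],
    "runs_off_bat": [4, 0, 1, 6],
    "extras": [0, 1, 0, 0],
})

# Result of the previous two steps (placeholder values).
merged = pd.DataFrame({
    "bowler": ["Bowler X"],
    "player_dismissed": ["Batter A"],
    "dismissed_count": [1],
    "striker": ["Batter A"],
    "ball_count": [3],
})

# Runs scored with the bat by each striker against each bowler.
runs = balls.groupby(["bowler", "striker"], as_index=False)["runs_off_bat"].sum()

# Inner join again, so only pairs that already have a dismissal remain.
merged = merged.merge(runs, how="inner", on=["bowler", "striker"])
print(merged)
```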

The output of the merged data frame will look like:

7. Filtering the data frame for players who have faced each other for at least 1 over

Now we are going to remove the rows where the players have faced each other for less than 1 over (6 balls), to get rid of outliers. We will also remove the column “player dismissed”, since its values duplicate the column “striker”. The code is as follows:
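The filter can be sketched with a boolean mask followed by `drop`. The frame below is a made-up stand-in for the merged data, and the underscore column names are assumptions.

```python
import pandas as pd

# Placeholder merged frame (made-up values, assumed column names).
merged = pd.DataFrame({
    "bowler": ["Bowler X", "Bowler X", "Bowler Y"],
    "player_dismissed": ["Batter A", "Batter B", "Batter A"],
    "dismissed_count": [1, 1, 2],
    "striker": ["Batter A", "Batter B", "Batter A"],
    "ball_count": [3, 6, 24],
    "runs_off_bat": [2, 9, 30],
})

MIN_BALLS = 6  # one over

final = (
    merged[merged["ball_count"] >= MIN_BALLS]   # keep pairs with at least one over
    .drop(columns=["player_dismissed"])         # same values as "striker"
    .reset_index(drop=True)
)
print(final)
```

Raising `MIN_BALLS` trades coverage for stability: fewer pairs remain, but each one rests on more deliveries.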

Finally the data frame looks like this after the operations:

Note: I have also created a few extra columns, namely striker average (runs off bat divided by dismissed count), run rate (runs off bat divided by ball count) and dismissal index (balls played divided by dismissed count). You can use these columns, or any others you wish to create, for your own analysis. I will share the code for these three columns:
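The three definitions above translate directly into column arithmetic. This is a sketch with made-up numbers, and the underscore column names are assumptions.

```python
import pandas as pd

# Placeholder filtered frame (made-up values, assumed column names).
final = pd.DataFrame({
    "bowler": ["Bowler X", "Bowler Y"],
    "striker": ["Batter A", "Batter A"],
    "dismissed_count": [2, 1],
    "ball_count": [24, 12],
    "runs_off_bat": [30, 18],
})

# Runs scored per dismissal.
final["striker_average"] = final["runs_off_bat"] / final["dismissed_count"]
# Runs scored per ball faced.
final["run_rate"] = final["runs_off_bat"] / final["ball_count"]
# Balls faced per dismissal.
final["dismissal_index"] = final["ball_count"] / final["dismissed_count"]

print(final)
```

Because every remaining pair has at least one dismissal and at least six balls, none of the divisions can hit zero. Note that run rate here is runs per ball, so multiplying it by 100 gives the conventional strike rate.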

8. Let the Viz Fun Begin

Now that we have extracted all the columns needed for the couple of visualizations I want to create for my analysis, let us get started.

Let's try to see which batsmen have had the upper hand over R Ashwin.
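The chart type used in the article is not stated, so the following is only one plausible sketch: a labelled scatter of runs scored against times dismissed for every batsman facing a given bowler. The helper `plot_batters_vs_bowler`, the column names and the data are all hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder matchup table (made-up values, assumed column names).
final = pd.DataFrame({
    "bowler": ["Bowler X", "Bowler X", "Bowler X", "Bowler Y"],
    "striker": ["Batter A", "Batter B", "Batter C", "Batter A"],
    "dismissed_count": [1, 4, 3, 2],
    "runs_off_bat": [150, 40, 120, 30],
})


def plot_batters_vs_bowler(matchups, bowler):
    """Scatter runs scored against dismissals for each batsman facing `bowler`."""
    subset = matchups[matchups["bowler"] == bowler]
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(subset["dismissed_count"], subset["runs_off_bat"])
    # Label every point with the batsman's name.
    for row in subset.itertuples(index=False):
        ax.annotate(
            row.striker,
            (row.dismissed_count, row.runs_off_bat),
            textcoords="offset points",
            xytext=(5, 5),
        )
    ax.set_xlabel("Times dismissed by " + bowler)
    ax.set_ylabel("Runs off the bat against " + bowler)
    ax.set_title("Batsmen against " + bowler)
    return ax


ax = plot_batters_vs_bowler(final, "Bowler X")
```

In a chart like this, points towards the top left (many runs, few dismissals) are batsmen who dominate the bowler, while points towards the bottom right are batsmen the bowler dominates. With the real data the second argument would be the bowler's name as it is spelled in the dataset.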

The plot looks like this:

From the plot we can see that Kohli has scored a lot of runs against Ashwin while getting dismissed by him only once. Similarly, we can see that Rahane has struggled to score runs against Ashwin and gets out to him frequently. Uthappa, on the other hand, does score runs but also gets out frequently against Ashwin (frenemies?).

Now let us try to see which bowlers Kohli enjoys playing against the most and the least.
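Alongside a plot, the same question can be answered with a simple ranking. The sketch below sorts a batsman's matchups by striker average, so the top rows are the bowlers he scores most freely against and the bottom rows are the ones who trouble him. The helper `rank_bowlers_for`, the column names and the data are hypothetical.

```python
import pandas as pd

# Placeholder matchup table (made-up values, assumed column names).
final = pd.DataFrame({
    "bowler": ["Bowler X", "Bowler Y", "Bowler Z", "Bowler X"],
    "striker": ["Batter A", "Batter A", "Batter A", "Batter B"],
    "dismissed_count": [1, 4, 2, 3],
    "runs_off_bat": [120, 40, 50, 30],
})
final["striker_average"] = final["runs_off_bat"] / final["dismissed_count"]


def rank_bowlers_for(matchups, batter):
    """Return the batter's matchups sorted from highest to lowest average."""
    subset = matchups[matchups["striker"] == batter]
    return subset.sort_values("striker_average", ascending=False).reset_index(drop=True)


ranking = rank_bowlers_for(final, "Batter A")
print(ranking.head(1))  # bowler the batter has the best average against
print(ranking.tail(1))  # bowler the batter has the worst average against
```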

The plot looks like:

From the plot we can see that Kohli enjoys playing against Ashwin, Bravo, Mishra and Yadav, while he gets dismissed often by bowlers like Sandeep Sharma and Nehra.

This is the end of this article, but it is definitely not the only analysis you can do with this data. I invite you all to be creative, try to find different inferences from the data set, and share them in the comments!

I have also compiled all this code in a Kaggle notebook, which you can access here.

Please put any feedback, issues or doubts you have in the comments. I would love to interact with anyone interested in sports data science!