IPL matches — An interesting Data Analysis

5 min read · Apr 14, 2019

A very basic analysis of the IPL match data available till 2017, to dig out some interesting insights.

Introduction

We generate a lot of data in our daily lives through data-hungry smart devices, be it a smartphone, car, smartwatch or even a home fitted with smart devices.

With the IPL already in full swing and every cricket fan following their favourite team on their smart devices, readily available insights from IPL match data are like gold, especially for those busy picking players for their Dream11 team to improve their chances of winning cash.

So digging out the useful information hidden in those rows and columns makes a lot of sense, and it sounds like fun too, doesn’t it? 😎

Data Set being used for analysis

Before getting into it: the data set used for this analysis is available at https://www.kaggle.com/nowke9/ipldata

1. Problem Statement

In this analysis, IPL matches from 2008 to 2017 are explored using Python packages such as pandas, matplotlib and seaborn. This exploratory data analysis will help us find patterns and relationships in the data. We will try to identify the team with the best chance of winning upcoming seasons through observations such as each team's success rate, the team that has won the most seasons, the best defending and chasing teams, and the toss decisions.

2. Data Loading and Description

The dataset contains information about IPL matches held from 2008 to 2017.
It comprises 696 observations across 18 columns. Below is a table showing the names of all the columns and their descriptions.

3. Importing the Packages and Dataset

Now, let’s read the data and create a dataframe:
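A minimal sketch of the loading step, assuming the Kaggle download is a CSV file. The name `load_matches`, the sample rows and the column names shown here are assumptions made for illustration, not records from the actual file:

```python
import io

import pandas as pd

# Made-up sample standing in for the downloaded file. With the real data,
# pass the file path instead, e.g. load_matches("matches.csv") (assumed name).
SAMPLE_CSV = """id,season,city,team1,team2,toss_winner,toss_decision,winner
1,2017,City X,Team A,Team B,Team B,field,Team A
2,2017,City Y,Team C,Team D,Team D,bat,Team D
"""


def load_matches(source):
    """Read match records from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


matches = load_matches(io.StringIO(SAMPLE_CSV))
print(matches.head())
```

Wrapping `pd.read_csv` in a small function keeps the rest of the analysis independent of where the file lives.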

So, we have the following dataframe (the screenshot couldn’t capture the complete data):

a. Understanding the Dataset

Looking at a few rows and columns from both the start and the end of the data, the IPL matches data has 696 rows and 18 columns.

To understand the dataset better, pandas provides built-in methods that report its basic attributes.
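For reference, these are the pandas attributes and methods typically used for a first look at a dataframe. The rows below are made up for illustration, and `win_by_runs` is an assumed column name:

```python
import pandas as pd

# Made-up rows for illustration; with the real file use pd.read_csv(...) instead.
matches = pd.DataFrame({
    "season": [2016, 2017, 2017],
    "city": ["Mumbai", "Pune", None],
    "winner": ["Team A", "Team B", "Team A"],
    "win_by_runs": [10, 0, 35],
})

print(matches.shape)             # (number of rows, number of columns)
print(matches.columns.tolist())  # column names
print(matches.dtypes)            # data type of each column
matches.info()                   # non-null counts and memory usage
print(matches.describe())        # summary statistics for numeric columns
print(matches.head(2))           # first rows
print(matches.tail(2))           # last rows
```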

Now we check for null values in each column:

We can see that umpire3, umpire2, umpire1, player_of_the_match, winner and city have null values, which suggests the data needs pre-processing before any analysis.
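One common way to do this in pandas is `isnull().sum()`, which counts the missing entries per column. A small sketch on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration only.
matches = pd.DataFrame({
    "city": ["Mumbai", None, "Delhi"],
    "winner": ["Team A", "Team B", None],
    "umpire3": [None, None, None],
})

# Number of missing values in each column.
null_counts = matches.isnull().sum()
print(null_counts)

# Share of missing values per column, as a percentage.
null_share = (matches.isnull().mean() * 100).round(1)
print(null_share)
```

The percentage view is handy when deciding whether a column is worth keeping at all.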

b. Pre-Processing

We applied the following treatment to the data set:

Filled missing entries of city from the venue column.
Replaced Rising Pune Supergiant with Rising Pune Supergiants.
Dropped the column ‘umpire3’ as it has 91% null values.
Replaced the city Bengaluru with Bangalore.
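The four steps above could look roughly like this in pandas. The rows are made up, and the `venue_to_city` lookup is a hypothetical mapping I am assuming for illustration; the real one depends on which venues have a missing city:

```python
import pandas as pd

# Made-up rows for illustration only.
matches = pd.DataFrame({
    "city": ["Pune", None, "Bengaluru"],
    "venue": ["Venue P", "Dubai International Cricket Stadium", "Venue B"],
    "team1": ["Rising Pune Supergiant", "Team A", "Team B"],
    "team2": ["Team C", "Team B", "Team C"],
    "winner": ["Rising Pune Supergiant", "Team A", "Team C"],
    "umpire3": [None, None, None],
})

# Hypothetical lookup from venue to city for rows where city is missing.
venue_to_city = {"Dubai International Cricket Stadium": "Dubai"}

cleaned = matches.copy()

# 1. Fill missing city values using the venue column.
cleaned["city"] = cleaned["city"].fillna(cleaned["venue"].map(venue_to_city))

# 2. Use one spelling of the team name in every column where it appears.
cleaned = cleaned.replace("Rising Pune Supergiant", "Rising Pune Supergiants")

# 3. Drop the mostly empty umpire3 column.
cleaned = cleaned.drop(columns=["umpire3"])

# 4. Use one spelling of the city name.
cleaned["city"] = cleaned["city"].replace("Bengaluru", "Bangalore")

print(cleaned)
```

Working on a copy keeps the raw dataframe available for comparison after cleaning.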

Now that the data is pre-processed and ready for analysis, we need to decide which questions to ask of it. Each of us might interpret the data differently and arrive at a completely different perspective from the one I am sharing. Every perspective is valid as long as it is derived from the data. So, the following are some questions that I tried to answer from the data processed above:

Q1: Which team has the highest number of wins across all IPL seasons?

Q2: Who are the players with the most ‘player of the match’ awards in an IPL season?

Q3: What are the chances of a team winning the match if they win the toss?

Q4: What are the chances of winning if a team bats second?

For now, I’ll try to answer the above basic questions.

Q1: Which team has the highest number of wins across all IPL seasons?

Answer:
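A minimal sketch of how the wins can be counted, assuming the winning team is stored in the `winner` column. The rows are made up for illustration:

```python
import pandas as pd

# Made-up results for illustration; None marks a match with no result.
matches = pd.DataFrame({
    "winner": ["Team A", "Team B", "Team A", "Team C", "Team A", "Team B", None],
})

# value_counts() ignores missing values and sorts from most to fewest wins.
wins_per_team = matches["winner"].value_counts()
print(wins_per_team)

top_team = wins_per_team.idxmax()
print(top_team)
```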

So we can see that Mumbai Indians have won the most matches across IPL seasons till 2017, followed by Chennai Super Kings and Kolkata Knight Riders.

Q2: Who are the players with the most ‘player of the match’ awards in an IPL season?

Answer:
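One way to get this is to count awards per season and player, then keep everyone tied for the highest count in each season. The column names `season` and `player_of_match` are assumptions about the file, and the rows are made up:

```python
import pandas as pd

# Made-up award records for illustration only.
matches = pd.DataFrame({
    "season": [2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017],
    "player_of_match": ["P1", "P1", "P2", "P3", "P3", "P4", "P4", "P5"],
})

# Count awards for every (season, player) pair.
awards = (
    matches.groupby(["season", "player_of_match"])
    .size()
    .rename("awards")
    .reset_index()
)

# Keep every player whose count equals the season maximum, so ties survive.
season_max = awards.groupby("season")["awards"].transform("max")
top_players = awards[awards["awards"] == season_max]
print(top_players)
```

Comparing against the season maximum, instead of taking only the first row per season, is what lets two players share the top spot.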

From the above, we can see that Ben Stokes and NM Coulter-Nile won the most ‘player of the match’ awards in the 2017 edition of the IPL.

Q3: What are the chances of a team winning the match if they win the toss?

Answer:
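A rough sketch of the calculation, assuming the columns `toss_winner` and `winner` hold team names and that matches without a result have an empty `winner`. The rows are made up:

```python
import pandas as pd

# Made-up rows for illustration; the last match has no result.
matches = pd.DataFrame({
    "toss_winner": ["Team A", "Team B", "Team C", "Team A", "Team B"],
    "winner": ["Team A", "Team C", "Team C", "Team B", None],
})

# Only matches with a result can be won by the toss winner.
decided = matches.dropna(subset=["winner"])

# True where the team that won the toss also won the match.
toss_winner_won = decided["toss_winner"] == decided["winner"]

toss_win_pct = toss_winner_won.mean() * 100
print(f"Toss winner won {toss_win_pct:.1f}% of decided matches")
```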

Plotting the results as a pie chart, we have:
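A pie chart like this can be drawn with matplotlib. The helper name `plot_outcome_pie` and the flags below are assumptions made for illustration:

```python
import pandas as pd


def plot_outcome_pie(counts, title):
    """Draw a pie chart for a Series of outcome counts and return the Axes."""
    # Imported here so the counting code below runs even without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(counts.values, labels=counts.index.tolist(),
           autopct="%1.1f%%", startangle=90)
    ax.set_title(title)
    return ax


# Made-up flags: True where the toss winner also won the match.
toss_winner_won = pd.Series([True, False, True, True, False])

labels = {True: "Toss winner won", False: "Toss winner lost"}
counts = toss_winner_won.map(labels).value_counts()
print(counts)
```

Calling `ax = plot_outcome_pie(counts, "Match result for the toss winner")` and then `ax.figure.savefig("toss_pie.png")` would write the chart to a file.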

Q4: What are the chances of winning if a team bats second?

Answer:
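The team batting second is not stored directly, so one way is to derive it from the toss: a toss winner who chooses to field bats second, otherwise the other team does. The column names (`team1`, `team2`, `toss_winner`, `toss_decision`, `winner`) and the rows below are assumptions for illustration:

```python
import pandas as pd

# Made-up rows for illustration only.
matches = pd.DataFrame({
    "team1": ["A", "A", "B", "C", "B"],
    "team2": ["B", "C", "C", "A", "A"],
    "toss_winner": ["A", "C", "B", "A", "B"],
    "toss_decision": ["field", "bat", "bat", "field", "field"],
    "winner": ["A", "C", "C", "C", "B"],
})

# The team that lost the toss: team2 if team1 won it, otherwise team1.
toss_loser = matches["team2"].where(
    matches["toss_winner"] == matches["team1"], matches["team1"]
)

# Toss winner bats second when it chose to field, otherwise the toss loser does.
batting_second = matches["toss_winner"].where(
    matches["toss_decision"] == "field", toss_loser
)

chasing_team_won = matches["winner"] == batting_second
chase_win_pct = round(float(chasing_team_won.mean() * 100), 1)
print(f"Team batting second won {chase_win_pct}% of matches")
```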

Showing the result as a pie chart for a better visual:

YAY! We finally have some basic insights from IPL data and can now use this information to predict the result of the ongoing IPL matches.

Enjoyyy!!

Note: The inferences drawn above may or may not lead to correct predictions, since we are using historical data. In real-life scenarios there are many external factors that historical data cannot capture or define.

PS: I attached screenshots of the Python code just for reference, and they might not be the full-fledged code. This post focuses more on the inferences we are drawing, so please feel free to reach out to me in case you need any help.

Written by Nishank Arora