IPL data analysis with Pandas and Matplotlib

Siddharth Murugan
Published in Analytics Vidhya
3 min read · Dec 7, 2020

Hey everyone, I hope you enjoyed watching the IPL (Indian Premier League). Are you interested in analyzing the IPL data yourself? Then this article is for you!

Prerequisites for this exercise:

  • Python
  • Pandas
  • Matplotlib
  • Downloaded data set

Downloading the data set:

Let us download the data set to begin the data analysis. The link below has the entire IPL data set, from 2008 to 2020,

You may need to log in to get the data set.

Let’s Code:

Once you have downloaded the data set, unzip it so it is ready to use. Here I am going to analyze the file “IPL Matches 2008–2020.csv”. It has the details of every match, such as which teams played, who won the match, the player of the match, etc. In this exercise, I want to know the total matches played by each team. Now let us import the necessary libraries:

import pandas as pd
import matplotlib.pyplot as plt

Now we need to read the CSV file we unzipped and store it in a variable as a dataframe:

df = pd.read_csv('/home/siddharth/Downloads/IPL Matches 2008-2020.csv',index_col='id')

To check whether the data has been loaded into the variable, use the print function:

print(df)
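Printing the whole dataframe can be noisy for a file with hundreds of rows. A quicker check, sketched here on a tiny made-up dataframe (the rows and the name sample are illustrative, not taken from the data set), is to look at the shape, the column names, and the first few rows:

```python
import pandas as pd

# Tiny illustrative frame standing in for the real CSV (values are made up)
sample = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "team1": ["Team A", "Team B", "Team A"],
        "team2": ["Team B", "Team C", "Team C"],
    }
).set_index("id")

n_rows, n_cols = sample.shape        # (rows, columns); the index is not counted
column_names = list(sample.columns)  # column labels as a plain list
first_rows = sample.head(2)          # only the first two rows

print(n_rows, n_cols)
print(column_names)
print(first_rows)
```

The same three calls work unchanged on the df variable loaded above.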

A dataframe is a two-dimensional data structure, similar to a table, with rows and columns. Since we need to find the total matches played by each team, we only need to look at two columns, “team1” and “team2”. So let us count how often each unique value occurs in the two columns:

team1 = df['team1'].value_counts(dropna=False)
team2 = df['team2'].value_counts(dropna=False)
print(team1)

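In the above snippet, value_counts() gives us each unique value of the team1 and team2 columns together with the number of times it occurs. Try printing the team1 series (a Series is the one-dimensional structure pandas uses for a single column of a dataframe).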

Misspelled team name

You may notice that one team appears under two names, “Rising Pune Supergiant” and “Rising Pune Supergiants”. The missing “s” causes the shorter name to be counted as a separate team.
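To see the effect in isolation, here is a minimal sketch on a made-up column (team_col and its four rows are assumptions for illustration, not rows from the data set). value_counts() compares whole strings, so the two spellings end up as two separate entries:

```python
import pandas as pd

# Made-up column in which one team is spelled two ways
team_col = pd.Series(
    [
        "Rising Pune Supergiant",
        "Rising Pune Supergiants",
        "Rising Pune Supergiants",
        "Mumbai Indians",
    ]
)

counts = team_col.value_counts(dropna=False)
print(counts)

# Two real teams, but three distinct labels because of the spelling variant
n_distinct = len(counts)
print(n_distinct)
```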

Let’s clean the data:

To clean the data, we just need to replace the shorter string with the team name that has the “s” at the end:

df["team1"]=df["team1"].replace("Rising Pune Supergiant","Rising Pune Supergiants")
df["team2"]=df["team2"].replace("Rising Pune Supergiant","Rising Pune Supergiants")

The above code replaces every occurrence of “Rising Pune Supergiant” with “Rising Pune Supergiants” in the team1 and team2 columns. Now, if you recompute the team1 and team2 series with value_counts() and print them, you will see the correct count of teams,

Corrected team count
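One detail worth spelling out: value_counts() returns a snapshot, so counts taken before the replacement do not update on their own. A small sketch on made-up data (df_demo is a hypothetical stand-in for the real dataframe) that counts before and after cleaning:

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe
df_demo = pd.DataFrame(
    {
        "team1": [
            "Rising Pune Supergiant",
            "Rising Pune Supergiants",
            "Mumbai Indians",
        ]
    }
)

before = df_demo["team1"].value_counts()

# Series.replace with two strings matches the whole cell value,
# so the already-correct name is left untouched
df_demo["team1"] = df_demo["team1"].replace(
    "Rising Pune Supergiant", "Rising Pune Supergiants"
)

# Count again: the earlier result is not updated automatically
after = df_demo["team1"].value_counts()

print(before)
print(after)
```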

Calculating the total matches played by the team:

In order to calculate the total matches played by each team, you need to count the occurrences of team names in both columns, “team1” and “team2”. The logic below computes the total matches played by every team:

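# Recount after cleaning so both spellings are merged
team1 = df['team1'].value_counts(dropna=False)
team2 = df['team2'].value_counts(dropna=False)

teams = {}
for i in team1.index:
    t1 = team1[i]
    t2 = team2.get(i, 0)  # 0 if the team never appears in team2
    teams[i] = int(t1 + t2)
print(teams)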

In the above piece of code, teams is a dictionary with the team name as key and the number of matches played by that team as value. team1.index holds the index labels of the team1 series, which are the team names. For each name, we add the number of times it occurs in the two columns (“team1” and “team2”).
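The loop above walks only over the teams found in team1, so a team that appeared only in team2 would be left out. As an alternative sketch (not what the code above does), pandas can add two series label by label with Series.add, using fill_value=0 for a team missing on one side. The counts and variable names here are made up:

```python
import pandas as pd

# Made-up per-column counts; "Team A" and "Team C" each appear on one side only
team1_counts = pd.Series({"Team A": 3, "Team B": 2})
team2_counts = pd.Series({"Team B": 4, "Team C": 1})

# Align on team name; a missing label counts as 0 instead of producing NaN
total = team1_counts.add(team2_counts, fill_value=0).astype(int)

teams_dict = total.to_dict()
print(teams_dict)
```

The astype(int) call is there because aligning labels that are missing on one side turns the result into floats.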

Now let us plot the keys and values of this dictionary as a graph using matplotlib.

# Convert the dict views to lists so matplotlib gets plain sequences
plt.bar(range(len(teams)), list(teams.values()), align='edge')
plt.xticks(range(len(teams)), list(teams.keys()), rotation=20)
plt.show()

We draw the bar graph with the bar function, which takes the bar positions and the bar heights (the match counts), and label the x-axis with the team names using the xticks function. The show() function then displays the graph.

The graph you get should look like the one below:

Complete graph
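With many long team names, the x-axis labels can overlap even when rotated. One possible variant, sketched with made-up counts, sorts the teams and draws horizontal bars with axis labels. The helper name plot_match_counts and the sample_counts dictionary are my own, not part of matplotlib or the data set:

```python
import matplotlib.pyplot as plt


def plot_match_counts(teams):
    """Draw a sorted horizontal bar chart from a {team: matches} dict."""
    # Sort ascending so the team with the most matches ends up at the top
    ordered = sorted(teams.items(), key=lambda kv: kv[1])
    names = [name for name, _ in ordered]
    counts = [count for _, count in ordered]

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.barh(names, counts)           # team names go on the y-axis
    ax.set_xlabel("Matches played")
    ax.set_ylabel("Team")
    fig.tight_layout()
    return ax


# Made-up counts, for illustration only
sample_counts = {"Team A": 12, "Team B": 30, "Team C": 21}
ax = plot_match_counts(sample_counts)
# Call plt.show() after this to display the chart
```

Passing the teams dictionary built earlier instead of sample_counts should give the same kind of chart for the real data.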

Complete code:
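Putting the steps together, a minimal sketch of the whole script might look like this. The function names count_matches and main and the csv_path parameter are my own choices for illustration; the counting loops over the teams of both columns so that no team is missed:

```python
import pandas as pd
import matplotlib.pyplot as plt


def count_matches(df):
    """Return {team: matches played}, counting both team columns."""
    # Merge the spelling variant before counting
    cleaned = df[["team1", "team2"]].replace(
        "Rising Pune Supergiant", "Rising Pune Supergiants"
    )
    team1 = cleaned["team1"].value_counts()
    team2 = cleaned["team2"].value_counts()

    teams = {}
    # Union of both indexes, so a team seen in only one column is still counted
    for name in team1.index.union(team2.index):
        teams[name] = int(team1.get(name, 0) + team2.get(name, 0))
    return teams


def main(csv_path):
    df = pd.read_csv(csv_path, index_col="id")
    teams = count_matches(df)
    print(teams)

    plt.bar(range(len(teams)), list(teams.values()))
    plt.xticks(range(len(teams)), list(teams.keys()), rotation=20)
    plt.show()


# Uncomment and point this at your unzipped file:
# main("IPL Matches 2008-2020.csv")
```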

Happy analyzing!! :-)

