SQL Case Study 2 — IPL

Saloni Chaubal
7 min readMay 23, 2023

--

Through this case study, we delve into the step-by-step implementation of SQL in the context of IPL team management. We explore the database design, query optimization, and data integration techniques utilized by the franchise to consolidate and analyze player statistics, match results, and other relevant data sources. By showcasing real-world examples and practical applications, the impact of SQL on decision-making and overall team performance is demonstrated.

This dataset contains 2 main tables :

1]Deliveries — id, inning, over, ball, batsman, non striker, bowler, batsman runs, extra runs ,total runs , is wicket, dismissal kind, player dismissed, fielder, extras type, batting team, bowling team, venue, match date.

2]Matches — id, city, date, player of match, venue, neutral venue, team1, team2, toss winner, toss decision, winner, result, result margin, eliminator, method, umpire1, umpire2.

The deliveries table contains 193468 rows of data, while matches table has 816 rows of data. Since there are large number of rows in the dataset, we will import the tables via the command prompt into MySQL Workbench.

After creating a database named ‘IPL’ , it is time to create the tables in the workbench before importing them.

Matches Table:-

Deliveries Table:-

Business questions to query using MySQL :

  1. a)Select the top 20 rows of the deliveries/matches table.
    b)Fetch data of all the matches played on 2nd May 2013.
    c)Fetch data of all the matches where the margin of victory is more than
    100 runs.
    d)Fetch data of all the matches where the final scores of both teams tied
    and order it in descending order of the date.
  2. Create table deliveries_v02 with all the columns of deliveries and an
    additional column ball result containing value boundary, dot or other
    depending on the total run (boundary for >= 4, dot for 0 and other for any other number).
    a) Write a query to fetch the total number of boundaries and dot balls.
    b)Write a query to fetch the total number of boundaries/dot balls scored by each team.
    c)Write a query to fetch the total number of dismissals by dismissal kinds.
  3. Write a query to create a table named deliveries_v03 with all the
    columns of deliveries_v02 table and two additional column (named venue and match date) of venue and date from table matches.
    a)Write a query to fetch the total runs scored for each venue and order it
    in the descending order of total runs scored.
    b)Write a query to fetch the year-wise total runs scored at Eden Gardens
    and order it in the descending order of total runs scored.
  4. Create a new table deliveries_v04 with the first column as ball id
    containing information of match id, inning, over and ball separated by’(For ex. 335982–1–0–1 match_idinning-over-ball) and rest of the columns same as deliveries_v03).
    a)Compare the total count of rows and total count of distinct ball id in
    deliveries_v04.
  5. Create table deliveries_v05 with all columns of deliveries_v04 and an
    additional column for row number partition over ball id.
    a)Use the r num created in deliveries_v05 to identify instances where
    ball id is repeating.
    b) Use subqueries to fetch data of all the ball id which are repeating.

Now that we have a clean dataset, created a database, created and imported tables and a general idea of the requirements from the business questions, let’s dive right in!

1.a) Select the top 20 rows of the deliveries/matches table.

1.a

Output:

Output 1.a
Output 1.a

1.b) Fetch data of all the matches played on 2nd May 2013.

Add a new date column in the matches table to update the existing date to a desired format . Set the safe sql updates to 0 before the update statement. Here, the str_to_date function has been used.

1.b

Output:

Output 1.b

1.c) Fetch data of all the matches where the margin of victory is more than 100 runs.

1.c

Output:

Output 1.c

1.d) Fetch data of all the matches where the final scores of both teams tied and order it in descending order of the date.

1.d

Output:

Output 1.d

2.Create table deliveries_v02 with all the columns of deliveries and an additional column ball result containing value boundary, dot or other depending on the total run (boundary for >= 4, dot for 0 and other for any other number).

In this query, the create table ‘table name’ as , along with the case when statements will give us the desired result.

2

Output:

Output 2

2.a) Write a query to fetch the total number of boundaries and dot balls.

2.a

Output:

Output 2.a

2.b) Write a query to fetch the total number of boundaries/dot balls scored by each team.

2.b

Output:

2.b
2.b

2.c) Write a query to fetch the total number of dismissals by dismissal kinds.

2.c

Output:

Output 2.c

3. Write a query to create a table named deliveries_v03 with all the
columns of deliveries_v02 table and two additional column (named venue and match date) of venue and date from table matches.

3

Output:

Output 3

3.a) Write a query to fetch the total runs scored for each venue and order it in the descending order of total runs scored.

3.a

Output:

Output 3.a

3.b) Write a query to fetch the year-wise total runs scored at Eden Gardens and order it in the descending order of total runs scored.

In order to find year wise total runs we need use an if statement in an update statement to deal with null values.

3.b

Output:

Output 3.b

4. Create a new table deliveries_v04 with the first column as ball id
containing information of match id, inning, over and ball separated by’(For ex. 335982–1–0–1 match id-inning-over-ball) and rest of the columns same as deliveries_v03).

The concat function has been used to get the ball id column. Since we want the first column as ball id, the key word ‘first’ is used while altering the table after the add column command.

4

Output:

Output 4

4.a) Compare the total count of rows and total count of distinct ball id in deliveries_v04.

4.a

Output:

Output 4.a

5.Create table deliveries_v05 with all columns of deliveries_v04 and an additional column for row number partition over ball id.

The function row_number will assign numbers serially for all unique values of the rows of ball id.

5

Output:

Output 5

5.a)Use the r num created in deliveries_v05 to identify instances where ball id is repeating.

5.a

Output:

Output 5.a

5.b) Use subqueries to fetch data of all the ball id which are repeating.

Nesting two queries together using in operator in the where statement gives us the desired result.

5.b

Output:

Output 5.b

In conclusion, through the implementation of SQL, IPL teams can make data-driven decisions, unearth hidden talents, and fine-tune their strategies for each match scenario. IPL franchises can optimize player selection, create strategic team compositions, and analyze player performance to drive overall team success.

Thank you for sticking till the end. I hope you found this case study helpful!
Feel free to leave any suggestions!!

--

--