BUSINESS ANALYSIS IN SQL: PIZZA RUNNER SQL CASE STUDY

Samueloyedele
9 min readNov 24, 2023

--

Introduction

This is a SQL project (case study #2) from Danny’s 8weeksqlchallenge.

Pizza Runner is a Pizza business that recruit ‘runners’ to deliver fresh pizzas from Pizza Runner headquarters to its customers.

Problem Statement

Danny wants to use the data to answer a few simple questions about his customers, especially about the pizzas orders, how much money they’ve spent and also which pizza menu items are their favorite.

The data provided requires further data cleaning in other to perform some basic calculations so he can better direct his runners and optimize Pizza Runner’s operation.

Datasets

Six (6) tables was provided for this case study:

  1. Runners: The runners table shows the registration_date for each new runner

2. Customer_orders: Customer pizza orders are captured in the customer_orders table with 1 row for each individual pizza that is part of the order.

The pizza_id relates to the type of pizza which was ordered whilst the exclusions are the ingredient_id values which should be removed from the pizza and the extras are the ingredient_id values which need to be added to the pizza.

The exclusions and extras columns will need to be cleaned up before using them in the queries.

Note that customers can order multiple pizzas in a single order with varying exclusions and extras values even if the pizza is the same type!

3. Runner orders: The runner_order table captures the pickup_time, distance and duration of each orders assigned to a runner. However not all orders are fully completed and can be cancelled by the restaurant or the customer.

Note: There are some known data issues with this table that need to be cleaned up using them in the queries.

4. Pizza names: The pizza_names table captures the pizzas available in Pizza Runner.

5. Pizza recipes: Each pizza_id has a standard set of toppings which are used as part of the pizza recipe.

6. Pizza toppings: This table contains all of the topping_name values with their corresponding topping_id value.

Entity Relationship Diagram

This show relationship between the tables based on their common column id.

Tool Used

PostgreSQL v13

Data Cleaning

In this section, some of the tables are cleaned before performing queries on them for better analysis.

Customer orders table

· Replace ‘null’ or empty string values with 0 in exclusions and extras columns using UPDATE

· Create a new table ‘customer_orders_ new’.

· Insert data from the old table into the new table with type conversion.

· Split the comma-separated values in exclusions and extras columns into separate rows using UNNEST() and STRING_TO_ARRAY() functions then convert their data types to INTEGER.

· Replace NULL with 0 in exclusions and extras columns

· Drop the old table ‘running order’.

· Rename the new table ‘customer_orders_ new’ to the original table name ‘customer_orders’.

Customer orders clean table

Runner orders table

· Create a new table ‘running_order_new’

· Insert data from the old table into the new table with type conversions

· UseNULLIF() function to replace ‘null’ with NULL and convert the datatype to TIMESTAMP in pickup time column.

· Use REGEXP_REPLACE() function to replace ‘km’ from the duration column with empty string ‘ ’ and NULLIF() function to replace ‘ ’ with NULL then CAST the datatype to DECIMAL.

· Replace ‘null’ values with NULL and CAST() the datatype to INTERVAL in duration column using CASE WHEN.

· Replace ‘null’ or empty string ‘ ’ values with NULL and CAST the datatype to TEXT in cancellation column using CASE WHEN.

· Replace NULL with ‘No’ values using UPDATE.

· Drop the old table ‘running order’

· Rename the new table ‘running_order_new’ to the original table name ‘running order’

Runner orders clean table

Pizza recipes table

· Create a new table ‘pizza_recipes_ new’.

· Insert data from the old table into the new table with type conversion.

· Split the comma-separated values in toppings column into separate rows using UNNEST() and STRING_TO_ARRAY() functions then CAST datatype to INTEGER.

· Drop the old table ‘running order’.

· Rename the new table ‘pizza_recipes_ new’ to the original table name ‘pizza_recipes’.

Pizza recipes clean table

Case Study Questions

A. Pizza Metrics

  1. How many pizzas were ordered?

Explanation: The query uses COUNT() to count no of pizzas orders FROM customer orders table.

2. How many unique customer orders were made?

Explanation: The query count the distinct customer orders using COUNT(DISTINCT customer id) FROM customer orders table.

3. How many successful orders were delivered by each runner?

Explanation: The query SELECT runner_id and count the total orders for each runner using COUNT(*) FROM runner orders table WHERE orders is successful then GROUP BY runner id.

4. How many of each type of pizza was delivered?

Explanation: The query SELECT pizza_id, pizza_name, count the total delivered pizzas for each pizza type using COUNT(*)

LEFT JOIN to combine two tables (pizza_names and customer_orders) based on the pizza_id, GROUP BY pizza id and pizza name.

NOTE: The LEFT JOIN returns all rows in pizza names table.

5. How many Vegetarian and Meatlovers were ordered by each customer?

Explanation: The query SELECT customer_id from customer_orders table, pizza_name from pizza_names table, count the total order for each customer based on pizza type using COUNT()

JOIN to combine two tables (customer_orders and pizza_names) based on the pizza_id, GROUP BY customer id, pizza name and ORDER BY customer id.

6. What was the maximum number of pizzas delivered in a single order?

Explanation: The sub query SELECT order_id, count the total pizza orders for each order id using COUNT(*) FROM customer orders table then GROUP BY order id.

The main query calculate the maximum no of pizzas in a single order using MAX(pizza_count) from the sub query “pizza_counts”.

7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?

Explanation: The query SELECT customer_id, use CASE WHEN to perform iterations on exclusions and extras columns for pizza_with_change and pizza_with_no_change then use COUNT to count their occurrences FROM customer orders table.

GROUP BY customer_id, use HAVING function to aggregate the COUNT() where pizza_with_change exceed 1 change and pizza_with_no_change is equal to 0 change for each customer id.

8. How many pizzas were delivered that had both exclusions and extras?

Explanation: The query count pizza orders delivered using COUNT(order_id) FROM customer orders table WHERE exclusions and extras columns is not equal to 0.

9. What was the total volume of pizzas ordered for each hour of the day?

Explanation: The query uses Common Table Expression (CTE) using WITH clause to get the order hour from order time column using DATE_TRUNC() , count total pizza orders for each order hour FROM customer orders table then GROUP BY order hour.

The main query SELECT order hour, sum the total pizza orders for each order hour using SUM() from pizza_count_hour sub query then GROUP BY and ORDER BY order hour of the day.

10. What was the volume of orders for each day of the week?

Explanation: The query usesEXTRACT() to get week and day columns from order time column for each order then count the total orders using COUNT(*) from customer orders table. GROUP BY and ORDER BY week and day to return the total orders for each day of the week.

B. Runner and Customer Experience

1. How many runners signed up for each 1 week period? (i.e. week starts 2021–01–01)

Explanation: The inner query uses GENERATE_SERIES() to general date with 1 week interval starting from ‘2021–01–01’ to ‘2021–01–20’ .

The main query SELECT week_start_date generated from inner query, count the runners signed up for each week using COUNT(*) . LEFT JOIN the inner query and runners table, ON to filter the rows then GROUP BY and ORDER BY week_start_date.

2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?

Explanation: The query SELECT runner id, use EXTRACT() to get minute from pickup_time column and calculate the average time in minutes for each runner using AVG() from runner_orders table. GROUP BY and ORDER BY runner id.

3. What was the average distance travelled for each runner?

Explanation: The query SELECT runner id, calculate the average distance travelled for each runner using AVG(distance) ROUND to one (1) decimal place FROM runner orders table. GROUP BY AND ORDER BY runner_id.

The areas of SQL covered in this case study:

Basic Aggregations, Windows Function, Common Table Expression (CTE), Joins, String Transformations, Regular expressions, Dealing with null values etc.

Data Insights

  • Total pizzas order — 16
  • They are five unique customers’ orders.
  • Total delivered (successful) orders — 8.
  • Runner id (1) delivered the most successful pizza orders — 4.
  • Meatlovers pizza is the customer’s pizzas favourite.
  • Customer id (103) has the highest pizza orders — 5.
  • Pizzas delivered with both exclusions and extras — 3.

Conclusion

I was able to clean and analyze business data that gives data insights to the stakeholders and helped in business decision-making. It was a great case study to practice and advance my SQL skills. I will be doing other case studies from Danny’s 8weeksqlchallenge.

Special thanks to the author ‘Danny Ma’ for providing the platform for people to practice their SQL skills.

Github code: https://github.com/Samizic29/Pizza-Runner-Analysis-using-SQL

Here is the link to the case study for anyone interested: Danny’s Diner

You can also check out projects on my Github repository and website

Thanks for taking your time to read it, kindly comment your suggestion on this project. I will also appreciate a like/follow from you.

--

--