8 Week SQL Challenge — Case Study #2 — Pizza Runner (B)

Kshama Singhal
9 min readMay 3, 2023

--

Runner and Customer Experience

We will use MYSQL Workbench for solving this case study. You can see this case study here: https://8weeksqlchallenge.com/case-study-2/

This blog is the continuation of the 8 Week SQL Challenge- Case Study #2- Pizza Runner.

Let’s have an overview of the tables:

Table 1: runners

The runners table shows the registration_date for each new runner

Table 2: customer_orders

Customer pizza orders are captured in the customer_orders table with 1 row for each individual pizza that is part of the order.

The pizza_id relates to the type of pizza which was ordered whilst the exclusions are the ingredient_id values which should be removed from the pizza and the extras are the ingredient_id values which need to be added to the pizza.

Note that customers can order multiple pizzas in a single order with varying exclusions and extras values even if the pizza is the same type!

The exclusions and extras columns will need to be cleaned up before using them in your queries.

Table 3: runner_orders

After each orders are received through the system — they are assigned to a runner — however not all orders are fully completed and can be cancelled by the restaurant or the customer.

The pickup_time is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. The distance and duration fields are related to how far and long the runner had to travel to deliver the order to the respective customer.

Table 4: pizza_names

At the moment — Pizza Runner only has 2 pizzas available the Meat Lovers or Vegetarian!

Table 5: pizza_recipes

Each pizza_id has a standard set of toppings which are used as part of the pizza recipe.

Table 6: pizza_toppings

This table contains all of the topping_name values with their corresponding topping_id value

B. Runner and Customer Experience

1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)

Ans: In this question, we have to find the number of runners who signed up( registered) in each 1 week period. To find out this, we will use CASE WHEN THEN ELSE condition to find the table having runner_id & week_start for the starting date of the week.

SELECT runner_id, (
CASE
WHEN registration_date BETWEEN '2021-01-01' AND '2021-01-07' THEN '2021-01-01'
WHEN registration_date BETWEEN '2021-01-08' AND '2021-01-14' THEN '2021-01-08'
WHEN registration_date BETWEEN '2021-01-15' AND '2021-01-22' THEN '2021-01-15'
WHEN registration_date BETWEEN '2021-01-23' AND '2021-01-30' THEN '2021-01-23'
END) AS week_start
FROM runners;

Now, let’s use CTE as we can’t directly apply COUNT function, then count number of runners in each week using COUNT function using GROUP BY week_start .

WITH week_details AS (
SELECT runner_id, (
CASE
WHEN registration_date BETWEEN '2021-01-01' AND '2021-01-07' THEN '2021-01-01'
WHEN registration_date BETWEEN '2021-01-08' AND '2021-01-14' THEN '2021-01-08'
WHEN registration_date BETWEEN '2021-01-15' AND '2021-01-22' THEN '2021-01-15'
WHEN registration_date BETWEEN '2021-01-23' AND '2021-01-30' THEN '2021-01-23'
END) AS week_start
FROM runners)
SELECT week_start,COUNT(*) AS runners_signed_up FROM week_details
GROUP BY week_start;

Hence, in 1st week 2 runners signed up, in 2nd week only 1 runner signed up & in 3rd week also only 1 runner signed up.

2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?

Ans: First we have to join runner_orders table with customer_orders table. then calculate the time difference of order_time & pickup_time using TIMESTAMPDIFF function & filter the data using WHERE pickup_time is not equal to ‘null’ (as those orders were cancelled).

SELECT runner_id,ro.order_id,TIMESTAMPDIFF(MINUTE,order_time,pickup_time) AS time_in_minute FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE pickup_time<>'null';
Output for the above query

As we need to find the average time of each runner but we apply the AVG function directly on time_in_minute , using GROUP BY runner_id .

SELECT runner_id,AVG(TIMESTAMPDIFF(MINUTE,order_time,pickup_time)) AS avg_time_minute FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE pickup_time<>'null'
GROUP BY runner_id;
Output for the above query

Ans: Hence, the average time it took for runner_1 is 15.33 minutes, runner_2 took 23.40 minutes while runner_3 took 10 minutes to arrive at the Pizza Runner HQ to pick up the order.

3. Is there any relationship between the number of pizzas and how long the order takes to prepare?

Ans: First, JOIN customer_orders & runner_orders table and filter the data WHERE pickup_time is not null.

SELECT * FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null';

Now, let’s count number of pizzas for each order using COUNT (*) on GROUP BY order_id .

SELECT co.order_id,COUNT(*) AS num_pizza FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null'
GROUP BY order_id;

Now, we will use the raw table of the 2nd query,

SELECT DISTINCT co.order_id, TIMESTAMPDIFF(MINUTE,order_time,pickup_time) AS time_in_minute FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null';

Let’s combine both the above tables using JOIN function on pizza.order_id=prep_time.order_time where pizza is the name of the first table while prep_time is the second table. You can follow this to have deep understanding of combining these two tables.

SELECT pizza.order_id,num_pizza,time_in_minute AS preparation_time_min FROM
(SELECT co.order_id,COUNT(*) AS num_pizza FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null'
GROUP BY order_id) AS pizza
JOIN
(SELECT DISTINCT co.order_id, TIMESTAMPDIFF(MINUTE,order_time,pickup_time) AS time_in_minute FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null') AS prep_time
ON pizza.order_id=prep_time.order_id
ORDER BY preparation_time_min DESC;

As, we can see for preparing 3 pizzas the time taken is highest i.e. 29 minutes, while for preparing 2 pizzas for order 3 time taken is 21 minutes while for order 10 it is 15 minutes. And for preparing 1 pizza the time taken is almost 10 minutes except for order 8.

Hence, we can say more number of pizzas will take higher time to prepare than preparing 1 pizza.

4. What was the average distance travelled for each customer?

Ans: First join customer_orders& runner_orderstable. Use the customer_id,order_id,runner_id,distancecolumns.

SELECT customer_id, co.order_id, runner_id, distance FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id;

As there are duplicate rows, first we need to choose distinct tuples/rows by using DISTINCT function on customer_id, as this function won’t work for the in-between columns.

SELECT DISTINCT customer_id, co.order_id,runner_id,distance AS distance FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE distance<>'null';
After removing the duplicated rows

Then, to find out the average distance travelled for each customer, we have to use CTE (distance_travelled) (without using it, we won’t be able to retain the previous table, as then it will work after applying the GROUP BY function), we will use the AVG function by using GROUP BY customer_id, WHERE distance is not equal to ‘null’.

WITH distance_travelled AS (
SELECT DISTINCT customer_id, co.order_id,runner_id,distance AS distance FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE distance<>'null'
)
SELECT customer_id, AVG(distance) AS average_distance_km FROM distance_travelled
GROUP BY customer_id;

Hence, average distance travelled for customer 101 is 20 km, for customer 102 is 18.4 km, customer 103 is 23.4 km, customer 104 is 10 km while for customer 105 is 25 km.

5. What was the difference between the longest and shortest delivery times for all orders?

Ans: We will use the raw table of 2nd question with some changes, in the time_in_minutecolumn, add duration to get the delivery_time , & choose DISTINCT order_idto remove duplicates.

SELECT DISTINCT ro.order_id,TIMESTAMPDIFF(MINUTE,order_time,pickup_time)+duration AS delivery_time FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE pickup_time<>'null';

Now, we will use CTE ( delivery_details) as without this we can’t directly use MAX — MIN, then difference of MAX delivery_time & MIN delivery_time .

WITH delivery_details AS(
SELECT DISTINCT ro.order_id,TIMESTAMPDIFF(MINUTE,order_time,pickup_time)+duration AS delivery_time FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE pickup_time<>'null'
)
SELECT MAX(delivery_time)-MIN(delivery_time) AS diff_max_min_delivery_time FROM delivery_details;

Hence, the difference between the longest and shortest delivery times for all orders is 44 minutes.

6. What was the average speed for each runner for each delivery and do you notice any trend for these values?

Ans: We will use runner_orders table WHERE distance is not null, use (distance/duration)*60to find out the average speed for each runner in km/h.

SELECT runner_id,order_id,distance,ROUND((distance/duration)*60,2) AS avg_speed_kmph FROM runner_orders
WHERE distance<>'null'
ORDER BY runner_id,distance DESC,avg_speed_kmph;

Hence, we can say that the runner’s speed is higher in case of shorter distance than the longer distance for runner 1. While in case of runner 2 this pattern doesn’t follow for all the orders as in case of same distance, there is huge different in his speed.

7. What is the successful delivery percentage for each runner?

Ans: First, let’s create the table for calculating total number of orders for each runner using COUNT function, GROUP BY runner_id .

SELECT runner_id, COUNT(*) AS total_orders FROM runner_orders
GROUP BY runner_id;

Now, Let’s create the table for calculating delivered number of orders for each runner using COUNT function WHERE distancenot equal to null then GROUP BY runner_id .

SELECT runner_id, COUNT(*) AS delivered_orders FROM runner_orders
WHERE distance<>'null'
GROUP BY runner_id;

Let’s combine both the above tables using JOIN, then use delivered_orders*100/total_ordersto find the successful_delivery_percentage& use ROUND function to round the value.

SELECT o.runner_id,ROUND(delivered_orders*100/total_orders,0) AS successful_delivery_percentage FROM (
SELECT runner_id, COUNT(*) AS total_orders FROM runner_orders
GROUP BY runner_id) AS o
JOIN
( SELECT runner_id, COUNT(*) AS delivered_orders FROM runner_orders
WHERE distance<>'null'
GROUP BY runner_id) AS od
ON o.runner_id=od.runner_id;

Hence, the successful delivery percentage for runner 1 is 100%, runner 2 is 75% while runner 3 is 50%.

Insights:

  • In 1st week 2 runners signed up, in 2nd week only 1 runner signed up & in 3rd week also only 1 runner signed up.
  • The average time it took for runner_1 is 15.33 minutes, runner_2 took 23.40 minutes while runner_3 took 10 minutes to arrive at the Pizza Runner HQ to pick up the order.
  • We can say more number of pizzas will take higher time to prepare than preparing 1 pizza.
  • Average distance travelled for customer 101 is 20 km, for customer 102 is 18.4 km, customer 103 is 23.4 km, customer 104 is 10 km while for customer 105 is 25 km.
  • The difference between the longest and shortest delivery times for all orders is 44 minutes.
  • Runner’s speed is higher in case of shorter distance than the longer distance for runner 1. While in case of runner 2 this pattern doesn’t follow for all the orders as in case of same distance, there is huge different in his speed.
  • The successful delivery percentage for runner 1 is 100%, runner 2 is 75% while runner 3 is 50%.

Future Work Ideas:

  • Ingredient Optimisation
  • Pricing and Ratings
  • Bonus DML Challenges (DML = Data Manipulation Language)

--

--

Kshama Singhal

Data Analysis | Data Visualization | SQL | EDA | Python | Web Scraping with Beautiful soup | Tableau | Advanced Excel | Mathematics | A/B Testing