8 Week SQL Challenge — Case Study #2 — Pizza Runner (B)
Runner and Customer Experience
We will use MYSQL Workbench for solving this case study. You can see this case study here: https://8weeksqlchallenge.com/case-study-2/
This blog is the continuation of the 8 Week SQL Challenge- Case Study #2- Pizza Runner.
Let’s have an overview of the tables:
Table 1: runners
The runners
table shows the registration_date
for each new runner
Table 2: customer_orders
Customer pizza orders are captured in the customer_orders
table with 1 row for each individual pizza that is part of the order.
The pizza_id
relates to the type of pizza which was ordered whilst the exclusions
are the ingredient_id
values which should be removed from the pizza and the extras
are the ingredient_id
values which need to be added to the pizza.
Note that customers can order multiple pizzas in a single order with varying exclusions
and extras
values even if the pizza is the same type!
The exclusions
and extras
columns will need to be cleaned up before using them in your queries.
Table 3: runner_orders
After each orders are received through the system — they are assigned to a runner — however not all orders are fully completed and can be cancelled by the restaurant or the customer.
The pickup_time
is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. The distance
and duration
fields are related to how far and long the runner had to travel to deliver the order to the respective customer.
Table 4: pizza_names
At the moment — Pizza Runner only has 2 pizzas available the Meat Lovers or Vegetarian!
Table 5: pizza_recipes
Each pizza_id
has a standard set of toppings
which are used as part of the pizza recipe.
Table 6: pizza_toppings
This table contains all of the topping_name
values with their corresponding topping_id
value
B. Runner and Customer Experience
1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01
)
Ans: In this question, we have to find the number of runners who signed up( registered
) in each 1 week period. To find out this, we will use CASE WHEN THEN ELSE condition to find the table having runner_id
& week_start
for the starting date of the week.
SELECT runner_id, (
CASE
WHEN registration_date BETWEEN '2021-01-01' AND '2021-01-07' THEN '2021-01-01'
WHEN registration_date BETWEEN '2021-01-08' AND '2021-01-14' THEN '2021-01-08'
WHEN registration_date BETWEEN '2021-01-15' AND '2021-01-22' THEN '2021-01-15'
WHEN registration_date BETWEEN '2021-01-23' AND '2021-01-30' THEN '2021-01-23'
END) AS week_start
FROM runners;
Now, let’s use CTE as we can’t directly apply COUNT function, then count number of runners in each week using COUNT function using GROUP BY week_start
.
WITH week_details AS (
SELECT runner_id, (
CASE
WHEN registration_date BETWEEN '2021-01-01' AND '2021-01-07' THEN '2021-01-01'
WHEN registration_date BETWEEN '2021-01-08' AND '2021-01-14' THEN '2021-01-08'
WHEN registration_date BETWEEN '2021-01-15' AND '2021-01-22' THEN '2021-01-15'
WHEN registration_date BETWEEN '2021-01-23' AND '2021-01-30' THEN '2021-01-23'
END) AS week_start
FROM runners)
SELECT week_start,COUNT(*) AS runners_signed_up FROM week_details
GROUP BY week_start;
Hence, in 1st week 2 runners signed up, in 2nd week only 1 runner signed up & in 3rd week also only 1 runner signed up.
2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
Ans: First we have to join runner_orders
table with customer_orders
table. then calculate the time difference of order_time
& pickup_time
using TIMESTAMPDIFF function & filter the data using WHERE pickup_time
is not equal to ‘null’ (as those orders were cancelled).
SELECT runner_id,ro.order_id,TIMESTAMPDIFF(MINUTE,order_time,pickup_time) AS time_in_minute FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE pickup_time<>'null';
As we need to find the average time of each runner but we apply the AVG function directly on time_in_minute
, using GROUP BY runner_id
.
SELECT runner_id,AVG(TIMESTAMPDIFF(MINUTE,order_time,pickup_time)) AS avg_time_minute FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE pickup_time<>'null'
GROUP BY runner_id;
Ans: Hence, the average time it took for runner_1 is 15.33 minutes, runner_2 took 23.40 minutes while runner_3 took 10 minutes to arrive at the Pizza Runner HQ to pick up the order.
3. Is there any relationship between the number of pizzas and how long the order takes to prepare?
Ans: First, JOIN customer_orders
& runner_orders
table and filter the data WHERE pickup_time
is not null.
SELECT * FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null';
Now, let’s count number of pizzas for each order using COUNT (*) on GROUP BY order_id
.
SELECT co.order_id,COUNT(*) AS num_pizza FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null'
GROUP BY order_id;
Now, we will use the raw table of the 2nd query,
SELECT DISTINCT co.order_id, TIMESTAMPDIFF(MINUTE,order_time,pickup_time) AS time_in_minute FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null';
Let’s combine both the above tables using JOIN function on pizza.order_id=prep_time.order_time
where pizza is the name of the first table while prep_time is the second table. You can follow this to have deep understanding of combining these two tables.
SELECT pizza.order_id,num_pizza,time_in_minute AS preparation_time_min FROM
(SELECT co.order_id,COUNT(*) AS num_pizza FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null'
GROUP BY order_id) AS pizza
JOIN
(SELECT DISTINCT co.order_id, TIMESTAMPDIFF(MINUTE,order_time,pickup_time) AS time_in_minute FROM customer_orders co
JOIN runner_orders ro
ON co.order_id=ro.order_id
WHERE pickup_time<>'null') AS prep_time
ON pizza.order_id=prep_time.order_id
ORDER BY preparation_time_min DESC;
As, we can see for preparing 3 pizzas the time taken is highest i.e. 29 minutes, while for preparing 2 pizzas for order 3 time taken is 21 minutes while for order 10 it is 15 minutes. And for preparing 1 pizza the time taken is almost 10 minutes except for order 8.
Hence, we can say more number of pizzas will take higher time to prepare than preparing 1 pizza.
4. What was the average distance travelled for each customer?
Ans: First join customer_orders
& runner_orders
table. Use the customer_id,order_id,runner_id,distance
columns.
SELECT customer_id, co.order_id, runner_id, distance FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id;
As there are duplicate rows, first we need to choose distinct tuples/rows by using DISTINCT function on customer_id
, as this function won’t work for the in-between columns.
SELECT DISTINCT customer_id, co.order_id,runner_id,distance AS distance FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE distance<>'null';
Then, to find out the average distance travelled for each customer, we have to use CTE (distance_travelled
) (without using it, we won’t be able to retain the previous table, as then it will work after applying the GROUP BY function), we will use the AVG function by using GROUP BY customer_id
, WHERE distance is not equal to ‘null’.
WITH distance_travelled AS (
SELECT DISTINCT customer_id, co.order_id,runner_id,distance AS distance FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE distance<>'null'
)
SELECT customer_id, AVG(distance) AS average_distance_km FROM distance_travelled
GROUP BY customer_id;
Hence, average distance travelled for customer 101 is 20 km, for customer 102 is 18.4 km, customer 103 is 23.4 km, customer 104 is 10 km while for customer 105 is 25 km.
5. What was the difference between the longest and shortest delivery times for all orders?
Ans: We will use the raw table of 2nd question with some changes, in the time_in_minute
column, add duration to get the delivery_time
, & choose DISTINCT order_id
to remove duplicates.
SELECT DISTINCT ro.order_id,TIMESTAMPDIFF(MINUTE,order_time,pickup_time)+duration AS delivery_time FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE pickup_time<>'null';
Now, we will use CTE ( delivery_details
) as without this we can’t directly use MAX — MIN, then difference of MAX delivery_time
& MIN delivery_time
.
WITH delivery_details AS(
SELECT DISTINCT ro.order_id,TIMESTAMPDIFF(MINUTE,order_time,pickup_time)+duration AS delivery_time FROM runner_orders ro JOIN customer_orders co
ON ro.order_id=co.order_id WHERE pickup_time<>'null'
)
SELECT MAX(delivery_time)-MIN(delivery_time) AS diff_max_min_delivery_time FROM delivery_details;
Hence, the difference between the longest and shortest delivery times for all orders is 44 minutes.
6. What was the average speed for each runner for each delivery and do you notice any trend for these values?
Ans: We will use runner_orders
table WHERE distance
is not null, use (distance/duration)*60
to find out the average speed for each runner in km/h.
SELECT runner_id,order_id,distance,ROUND((distance/duration)*60,2) AS avg_speed_kmph FROM runner_orders
WHERE distance<>'null'
ORDER BY runner_id,distance DESC,avg_speed_kmph;
Hence, we can say that the runner’s speed is higher in case of shorter distance than the longer distance for runner 1. While in case of runner 2 this pattern doesn’t follow for all the orders as in case of same distance, there is huge different in his speed.
7. What is the successful delivery percentage for each runner?
Ans: First, let’s create the table for calculating total number of orders for each runner using COUNT function, GROUP BY runner_id
.
SELECT runner_id, COUNT(*) AS total_orders FROM runner_orders
GROUP BY runner_id;
Now, Let’s create the table for calculating delivered number of orders for each runner using COUNT function WHERE distance
not equal to null then GROUP BY runner_id
.
SELECT runner_id, COUNT(*) AS delivered_orders FROM runner_orders
WHERE distance<>'null'
GROUP BY runner_id;
Let’s combine both the above tables using JOIN, then use delivered_orders*100/total_orders
to find the successful_delivery_percentage
& use ROUND function to round the value.
SELECT o.runner_id,ROUND(delivered_orders*100/total_orders,0) AS successful_delivery_percentage FROM (
SELECT runner_id, COUNT(*) AS total_orders FROM runner_orders
GROUP BY runner_id) AS o
JOIN
( SELECT runner_id, COUNT(*) AS delivered_orders FROM runner_orders
WHERE distance<>'null'
GROUP BY runner_id) AS od
ON o.runner_id=od.runner_id;
Hence, the successful delivery percentage for runner 1 is 100%, runner 2 is 75% while runner 3 is 50%.
Insights:
- In 1st week 2 runners signed up, in 2nd week only 1 runner signed up & in 3rd week also only 1 runner signed up.
- The average time it took for runner_1 is 15.33 minutes, runner_2 took 23.40 minutes while runner_3 took 10 minutes to arrive at the Pizza Runner HQ to pick up the order.
- We can say more number of pizzas will take higher time to prepare than preparing 1 pizza.
- Average distance travelled for customer 101 is 20 km, for customer 102 is 18.4 km, customer 103 is 23.4 km, customer 104 is 10 km while for customer 105 is 25 km.
- The difference between the longest and shortest delivery times for all orders is 44 minutes.
- Runner’s speed is higher in case of shorter distance than the longer distance for runner 1. While in case of runner 2 this pattern doesn’t follow for all the orders as in case of same distance, there is huge different in his speed.
- The successful delivery percentage for runner 1 is 100%, runner 2 is 75% while runner 3 is 50%.
Future Work Ideas:
- Ingredient Optimisation
- Pricing and Ratings
- Bonus DML Challenges (DML = Data Manipulation Language)
References:
- https://8weeksqlchallenge.com/case-study-2/
- https://medium.com/jovianml/8-week-sql-challenge-case-study-2-pizza-runner-5899386d9
- https://stackoverflow.com/questions/1629201/sql-replace-old-values-with-new-ones
- https://stackoverflow.com/questions/11448068/mysql-error-code-1175-during-update-in-mysql-workbench
- https://learnsql.com/blog/what-is-common-table-expression/
- https://www.w3schools.com/sql/func_mysql_weekday.asp
- https://stackoverflow.com/questions/17407481/check-if-a-time-is-between-two-times-time-datatype
I hope you enjoyed reading the article. Do follow for the continuation of this case study.