SQL Challenge 2: Pizza Runner (Exploratory Data Analysis)

Okonkwo Chukwuebuka Malcom
7 min readMar 28, 2023

--

By Okonkwo Chukwuebuka Malcom

Introduction

This is the challenge 2 — Pizza Runner of the 8 Weeks SQL Challenge by Danny Ma. I am playing the role of a data analyst at Pizza Runner.

The Week 2 task encompasses both Data Cleaning and Exploratory Data Analysis.

The Data cleaning phase of the work has been done and well documented in this article, so make sure you read that before reading this.

The Exploratory Data Analysis (EDA) would be documented in the post. The Full Documentation of this process can be found in my GitHub Repository

Exploratory Data Analysis

To begin this stage, I would like the share the Entity relationship Diagram of the tables in the pizza_runner database

For the runners table

The runners table shows the registration_date for each new runner

For the customer_orders:

Customer pizza orders are captured in the customer_orders table with 1 row for each individual pizza that is part of the order.

The pizza_id relates to the type of pizza which was ordered whilst the exclusions are the ingredient_id values which should be removed from the pizza and the extras are the ingredient_id values which need to be added to the pizza.

Note that customers can order multiple pizzas in a single order with varying exclusions and extras values even if the pizza is the same type!

The exclusions and extras columns will need to be cleaned up before using them in your queries.

For the runner_orders table:

After each orders are received through the system — they are assigned to a runner — however not all orders are fully completed and can be cancelled by the restaurant or the customer.

The pickup_time is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. The distance and duration fields are related to how far and long the runner had to travel to deliver the order to the respective customer.

For the pizza_names table:

At the moment — Pizza Runner only has 2 pizzas available the Meat Lovers or Vegetarian!

For the pizza_recipes table:

Each pizza_id has a standard set of toppings which are used as part of the pizza recipe.

For the pizza_toppings table:

This table contains all of the topping_name values with their corresponding topping_id value.

Case Study Questions

The following Case Study Questions were asked of the database and solutions were provided using SQL

  1. How many pizzas were ordered?

Solution:

-- How many pizzas were ordered?

SELECT COUNT(*) AS number_of_ordered_pizzas
FROM customer_orders;

The question aims to find the number of Pizza ordered by the customers. The COUNT function was use to count the number of rows of the customer_orders table to find out the number of pizza ordered.

Output:

2. How many unique customer orders were made?

SELECT COUNT(DISTINCT order_id) AS distinct_orders_count
FROM customer_orders;

This problems requires knowledge of counting without including duplicate values. The knowledge can be used to count the unique customer orders

Output:

3. How many successful orders were delivered by each runner?

SELECT runner_id, 
COUNT(*) AS number_of_successful_orders
FROM runner_orders
WHERE cancellation IS NULL
GROUP BY runner_id;

In this case, successful orders can be described as orders that have null in the cancellation column of the runner_orders table.

Output:

4. How many of each type of pizza was delivered?

SELECT c.pizza_id,
COUNT(*) AS number_of_pizza
FROM customer_orders AS c
JOIN runner_orders AS r
USING(order_id)
WHERE r.cancellation IS NULL
GROUP BY c.pizza_id;

This question aims to answer the number of each pizza ordered. This is where the GROUP BY function works best. It groups by the pizza type and then allows for aggregation with another column, in this case, a count function, to count the number of pizza

Output:

5. How many Vegetarian and Meatlovers were ordered by each customer?

SELECT c.customer_id,
COUNT(CASE WHEN p.pizza_name = 'Meatlovers'
THEN pizza_name END) AS count_of_meatlover_pizza_orders,
COUNT(CASE WHEN p.pizza_name = 'Vegetarian'
THEN pizza_name END) AS count_of_vegetarian_pizza_orders
FROM customer_orders AS c
JOIN pizza_names AS p
USING(pizza_id)
GROUP BY c.customer_id
ORDER BY c.customer_id;

In this question, the COUNT CASE function is used to answer it. Other methods could be employed but I chose to work with this. The first CASE COUNT statement aims to count the number of “Meatlovers” pizza that were successfully delivered while the second aims to count the “Vegetarian” pizza that were delivered successfully

Output:

6. What was the maximum number of pizzas delivered in a single order?What was the maximum number of pizzas delivered in a single order?

SELECT MAX(Number_of_pizzas) AS maximum_number_of_pizzas
FROM(
SELECT c.order_id,
COUNT(c.pizza_id) as Number_of_pizzas
FROM customer_orders AS c
JOIN runner_orders AS r
USING(order_id)
WHERE r.cancellation IS NULL
GROUP BY c.order_id
ORDER BY Number_of_pizzas DESC
) AS number_of_orders;

For this problem, I used a subquery to first count the number of pizza for each order then the outer query to find the Maximum number of pizza for an order.

Output:

7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?

SELECT c.customer_id, 
COUNT(CASE WHEN exclusions IS NOT NULL OR extras IS NOT NULL THEN 1 END)
AS delivered_pizza_with_least_one_change,
COUNT(CASE WHEN exclusions IS NULL AND extras IS NULL THEN 1 END)
AS delivered_pizza_with_no_change
FROM customer_orders AS c
JOIN runner_orders AS r
USING(order_id)
WHERE r.cancellation IS NULL
GROUP BY c.customer_id
ORDER BY c.customer_id;

In this question, the changes made are referred to as extras and/or exclusions. Once they included an exclusion or an extras in their order then changes have been made to it. This was the explanation I used to write the codes

Output:

8. How many pizzas were delivered that had both exclusions and extras?

SELECT COUNT(c.pizza_id) AS pizza_count
FROM customer_orders AS c
JOIN runner_orders AS r
USING(order_id)
WHERE r.cancellation IS NULL AND c.exclusions IS NOT NULL
AND c.extras IS NOT NULL;

This Questions aims to find out the pizza order with both exclusions and extras added to the order. This can be implemented in the WHERE clause

Output:

9. What was the total volume of pizzas ordered for each hour of the day?

SELECT HOUR(order_time) AS hour_of_the_day,
COUNT(pizza_id) AS volume_of_pizza
FROM customer_orders
GROUP BY HOUR(order_time)
ORDER BY HOUR(order_time) ASC;

The HOUR function extracts the hour of the day from the time stamp using a 24 hours format. The volume in this case represents the number of pizza ordered. Once the hour has been extracted, I then used the GROUP BY function to group by the hour then count the number of pizza ordered in each hour

Output:

10. What was the volume of orders for each day of the week?

SELECT DAYOFWEEK(order_time) AS day_of_week,
DAYNAME(order_time) AS day_name,
COUNT(pizza_id) AS volume_of_pizza
FROM customer_orders
GROUP BY DAYOFWEEK(order_time), DAYNAME(order_time)
ORDER BY DAYOFWEEK(order_time) ASC;

The DAYOFWEEK function is used to to extract the day of the week from a date function. MySQL counts Sunday as Day 1 which could be confusing, so I used the DAYNAME function to also show the day name as well for easy interpretation

Output:

There are a lot more Question in this Case study and you can find them properly documented on my GitHub repository.

Thanks For reading and Feel free to comment, share and correct the codes in case of an error. I would also love feedbacks. Thank you… Time to continue from where I stopped and keep learning on this Journey.

Feel free to reach out to me on LinkedIn and Twitter

--

--

Okonkwo Chukwuebuka Malcom

A Data Analyst, Problem solver and Graduate Chemical Engineer. Check out my Articles if you like to see how data can be used to solve problems & make decisions