SQL Challenge 2: Pizza Runner (Exploratory Data Analysis)
By Okonkwo Chukwuebuka Malcom
Introduction
This is the challenge 2 — Pizza Runner of the 8 Weeks SQL Challenge by Danny Ma. I am playing the role of a data analyst at Pizza Runner.
The Week 2 task encompasses both Data Cleaning and Exploratory Data Analysis.
The Data cleaning phase of the work has been done and well documented in this article, so make sure you read that before reading this.
The Exploratory Data Analysis (EDA) would be documented in the post. The Full Documentation of this process can be found in my GitHub Repository
Exploratory Data Analysis
To begin this stage, I would like the share the Entity relationship Diagram of the tables in the pizza_runner
database
For the runners
table
The runners
table shows the registration_date
for each new runner
For the customer_orders:
Customer pizza orders are captured in the customer_orders
table with 1 row for each individual pizza that is part of the order.
The pizza_id
relates to the type of pizza which was ordered whilst the exclusions
are the ingredient_id
values which should be removed from the pizza and the extras
are the ingredient_id
values which need to be added to the pizza.
Note that customers can order multiple pizzas in a single order with varying exclusions
and extras
values even if the pizza is the same type!
The exclusions
and extras
columns will need to be cleaned up before using them in your queries.
For the runner_orders
table:
After each orders are received through the system — they are assigned to a runner — however not all orders are fully completed and can be cancelled by the restaurant or the customer.
The pickup_time
is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. The distance
and duration
fields are related to how far and long the runner had to travel to deliver the order to the respective customer.
For the pizza_names
table:
At the moment — Pizza Runner only has 2 pizzas available the Meat Lovers or Vegetarian!
For the pizza_recipes
table:
Each pizza_id
has a standard set of toppings
which are used as part of the pizza recipe.
For the pizza_toppings
table:
This table contains all of the topping_name
values with their corresponding topping_id
value.
Case Study Questions
The following Case Study Questions were asked of the database and solutions were provided using SQL
- How many pizzas were ordered?
Solution:
-- How many pizzas were ordered?
SELECT COUNT(*) AS number_of_ordered_pizzas
FROM customer_orders;
The question aims to find the number of Pizza ordered by the customers. The COUNT function was use to count the number of rows of the customer_orders
table to find out the number of pizza ordered.
Output:
2. How many unique customer orders were made?
SELECT COUNT(DISTINCT order_id) AS distinct_orders_count
FROM customer_orders;
This problems requires knowledge of counting without including duplicate values. The knowledge can be used to count the unique customer orders
Output:
3. How many successful orders were delivered by each runner?
SELECT runner_id,
COUNT(*) AS number_of_successful_orders
FROM runner_orders
WHERE cancellation IS NULL
GROUP BY runner_id;
In this case, successful orders can be described as orders that have null
in the cancellation
column of the runner_orders
table.
Output:
4. How many of each type of pizza was delivered?
SELECT c.pizza_id,
COUNT(*) AS number_of_pizza
FROM customer_orders AS c
JOIN runner_orders AS r
USING(order_id)
WHERE r.cancellation IS NULL
GROUP BY c.pizza_id;
This question aims to answer the number of each pizza ordered. This is where the GROUP BY function works best. It groups by the pizza type and then allows for aggregation with another column, in this case, a count function, to count the number of pizza
Output:
5. How many Vegetarian and Meatlovers were ordered by each customer?
SELECT c.customer_id,
COUNT(CASE WHEN p.pizza_name = 'Meatlovers'
THEN pizza_name END) AS count_of_meatlover_pizza_orders,
COUNT(CASE WHEN p.pizza_name = 'Vegetarian'
THEN pizza_name END) AS count_of_vegetarian_pizza_orders
FROM customer_orders AS c
JOIN pizza_names AS p
USING(pizza_id)
GROUP BY c.customer_id
ORDER BY c.customer_id;
In this question, the COUNT CASE function is used to answer it. Other methods could be employed but I chose to work with this. The first CASE COUNT statement aims to count the number of “Meatlovers” pizza that were successfully delivered while the second aims to count the “Vegetarian” pizza that were delivered successfully
Output:
6. What was the maximum number of pizzas delivered in a single order?What was the maximum number of pizzas delivered in a single order?
SELECT MAX(Number_of_pizzas) AS maximum_number_of_pizzas
FROM(
SELECT c.order_id,
COUNT(c.pizza_id) as Number_of_pizzas
FROM customer_orders AS c
JOIN runner_orders AS r
USING(order_id)
WHERE r.cancellation IS NULL
GROUP BY c.order_id
ORDER BY Number_of_pizzas DESC
) AS number_of_orders;
For this problem, I used a subquery to first count the number of pizza for each order then the outer query to find the Maximum number of pizza for an order.
Output:
7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
SELECT c.customer_id,
COUNT(CASE WHEN exclusions IS NOT NULL OR extras IS NOT NULL THEN 1 END)
AS delivered_pizza_with_least_one_change,
COUNT(CASE WHEN exclusions IS NULL AND extras IS NULL THEN 1 END)
AS delivered_pizza_with_no_change
FROM customer_orders AS c
JOIN runner_orders AS r
USING(order_id)
WHERE r.cancellation IS NULL
GROUP BY c.customer_id
ORDER BY c.customer_id;
In this question, the changes made are referred to as extras and/or exclusions. Once they included an exclusion or an extras in their order then changes have been made to it. This was the explanation I used to write the codes
Output:
8. How many pizzas were delivered that had both exclusions and extras?
SELECT COUNT(c.pizza_id) AS pizza_count
FROM customer_orders AS c
JOIN runner_orders AS r
USING(order_id)
WHERE r.cancellation IS NULL AND c.exclusions IS NOT NULL
AND c.extras IS NOT NULL;
This Questions aims to find out the pizza order with both exclusions and extras added to the order. This can be implemented in the WHERE clause
Output:
9. What was the total volume of pizzas ordered for each hour of the day?
SELECT HOUR(order_time) AS hour_of_the_day,
COUNT(pizza_id) AS volume_of_pizza
FROM customer_orders
GROUP BY HOUR(order_time)
ORDER BY HOUR(order_time) ASC;
The HOUR function extracts the hour of the day from the time stamp using a 24 hours format. The volume in this case represents the number of pizza ordered. Once the hour has been extracted, I then used the GROUP BY function to group by the hour then count the number of pizza ordered in each hour
Output:
10. What was the volume of orders for each day of the week?
SELECT DAYOFWEEK(order_time) AS day_of_week,
DAYNAME(order_time) AS day_name,
COUNT(pizza_id) AS volume_of_pizza
FROM customer_orders
GROUP BY DAYOFWEEK(order_time), DAYNAME(order_time)
ORDER BY DAYOFWEEK(order_time) ASC;
The DAYOFWEEK function is used to to extract the day of the week from a date function. MySQL counts Sunday as Day 1 which could be confusing, so I used the DAYNAME function to also show the day name as well for easy interpretation
Output:
There are a lot more Question in this Case study and you can find them properly documented on my GitHub repository.
Thanks For reading and Feel free to comment, share and correct the codes in case of an error. I would also love feedbacks. Thank you… Time to continue from where I stopped and keep learning on this Journey.