SQL Challenge : Pizza Runner (Exploratory Data Analysis) by Mariam Salvador

Salvadormariam
9 min readAug 4, 2023

--

This is the challenge 2 — Pizza Runner of the 8 Weeks SQL Challenge by Danny Ma.

Note: this is the exploratory anlysis of this case study and this data has been cleaned before analysis. Data cleaning process has been documented here pizza runner data cleaning . PostgreSQL will be used for analysis

Problem Statement

Danny was scrolling through his Instagram feed when something really caught his eye — “80s Retro Styling and Pizza Is The Future!”

Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire — so he had one more genius idea to combine with it — he was going to Uberize it — and so Pizza Runner was launched!

Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.

Danny has created an entity relationship diagram (ERD) for his database design in the Pizza Runner challenge. However, he needs additional support to clean and refine his data, as well as perform basic calculations. This will enable him to make more informed decisions regarding the direction of his runners and optimize the operations of Pizza Runner.

The Entity relationship Diagram can be seen below

ERD

Data Cleaning

Data has to be cleaned before carrying out analysis. Data cleaning process has been documented here pizza runner data cleaning . kindly read through the data cleaning process before this exploratory analysis.

Solution to Case Study Questions

A. Pizza Metrics

1. How many pizzas were ordered?

14 pizzas were ordered

Questio 1

2. How many unique customer orders were made?

Danny Ma has 10 customers.

question 2

3. How many successful orders were delivered by each runner?

Danny Ma has 3 runners and runner_id 1 had the highest number of successful delivery

question 3

4. How many of each type of pizza was delivered?

I filtered the cancellation column to get the delivered orders. Meatlovers pizza had the highest number of successful deliveries

question 4

5. How many Vegetarian and Meatlovers were ordered by each customer?

Meatlovers pizza was mostly ordered by each customer

question 5

6. What was the maximum number of pizzas delivered in a single order?

question 6

7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?

In this question, the changes made are referred to as extras and/or exclusions. Once they included an exclusion or an extras in their order then changes have been made to it.

question 7

8. How many pizzas were delivered that had both exclusions and extras?

Only 1 pizza order had both extras and exclusions

question 8

9. What was the total volume of pizzas ordered for each hour of the day?

question 9

10. What was the volume of orders for each day of the week?

I used the extract function to get the day of week number and to_char function to derive the name of the day of the week. Saturday and Wednesday had the highest volume of orders.

question 10

B. Runner and Customer Experience

  1. How many runners signed up for each 1 week period? (i.e. week starts 2021–01–01)
calender

The week number for 2021–01–01 is week 53. This is because this date falls in the last week of December 2020. Since Danny Ma’s start week is 2021–01–01 , then we have to add 3 days to make this date fall in the first week of January. From the output, week 1 had the highest number of runners sign up.

question 1

2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?

I used the age function to get the difference between the pickup time and order time then extracted minute from the difference. Runner_id 2 had the highest average time to arrive at ruuner HQ.

question 2

3. Is there any relationship between the number of pizzas and how long the order takes to prepare?

The more the pizzas ordered, the longer it takes for that order to be ready.

question 3

output

output

4. What was the average distance travelled for each customer?

question 4

5. What was the difference between the longest and shortest delivery times for all orders?

question 5

6. What was the average speed for each runner for each delivery and do you notice any trend for these values?

To calculate speed, we’ll use the formula ‘Distance/Time’. This can be calculated as kilometer per hour(km/h) or meter per seconds(mps). The duration column in customer orders table is less than 60 mins and it can’t be converted to hour, therefore I’ll use the mps formula. To convert km to meter, we’ll multiply km by 1000 and multiply 60 by minutes to convert to seconds.

question 6

output

output

7. What is the successful delivery percentage for each runner?

I filtered the cancellation column to get only the non-cancelled orders

question 7

output

Runner id 1 has 100% successful delivery.

output

C. Ingredient Optimization

1. What are the standard ingredients for each pizza?

In this question, I’ll make use of the temp table (cleaned_pizza_recipes) I created in the data cleaning process. The string_agg function helps to combine multiple rows into a single row with the commas as a separator.

query

output

output

2. What was the most commonly added extra?

In this question, I’ll make use of the temp table (cleaned_customer_orders) I created in the data cleaning process.

query

Output

Bacon is the most common added extras

output

3. What was the most common exclusion?

query

output:

Cheese was commonly excluded from pizza orders

output

4. Generate an order item for each record in the customers_orders table in the format of one of the following:

Meat Lovers

Meat Lovers — Exclude Beef

Meat Lovers — Extra Bacon

Meat Lovers — Exclude Cheese, Bacon — Extra Mushroom, Peppers

solution

The toppings are:

4-Exclude Cheese

1-Exra Bacon

2,6-Exclude BBQ Sauce, Mushrooms

1,4-Extra Bacon, Cheese

1,5-Extra Bacon, Chicken

I manually hardcoded the case statement (not the best option though).

query

output

output

D. Pricing and Ratings

1. If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes — how much money has Pizza Runner made so far if there are no delivery fees?

I used the case statement to determine the pricing for pizzas. Pizza Runner made $138.

query & output

2. What if there was an additional $1 charge for any pizza extras?

query

output:

output

3. The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset — generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.

query

Here’s a breakdown of how the code works:

· RANDOM(): The RANDOM() function returns a random value between 0 (inclusive) and 1 (exclusive). It generates a random floating-point number.

· RANDOM() * (5 - 1): This part of the code multiplies the random value generated by RAND() with the range of numbers you want (5 - 1), which results in a random value between 0 (inclusive) and 4 (exclusive). The range (5 - 1) is used to ensure that the upper bound (4) is exclusive.

· 1 + RANDOM() * (5 - 1): Adding 1 to the result of the previous step shifts the range to be between 1 (inclusive) and 5 (exclusive).

· FLOOR(1 + RANDOM() * (5 - 1)): The FLOOR() function is then used to round down the result to the nearest integer. This ensures that the final result will be an integer between 1 and 4, both inclusive.

In summary, the SQL code will produce a random integer between 1 and 5 with equal probability for each value. The value 1 has a 20% chance of being generated, the value 2 has a 20% chance, the value 3 has a 20% chance, the value 4 has a 20% chance, and the value 5 has a 20% chance.

Output:

output

4. Using your newly generated table — can you join all of the information together to form a table which has the following information for successful deliveries?

· customer_id

· order_id

· runner_id

· rating

· order_time

· pickup_time

· Time between order and pickup

· Delivery duration

· Average speed

· Total number of pizzas

query

output:

output

5. If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled — how much money does Pizza Runner have left over after these deliveries?

query

output

output

Thank you so much for reading.

you can reach out to me via twitter on linkedln

Twitter handle: @SalvadorMariam6 . Link: https://twitter.com/salvadormariam6?s=21&t=KlR9Wysi0L7CAaTJZl72FQ

linkedln: http://linkedin.com/in/salvador-mariam-7784061a4

--

--