Danny’s Diner: Exploring SQL

Sai Praneeth
Geek Culture
Published in
13 min readNov 15, 2021

This is part of Danny ma’s 8 week SQL challenge.

If, you are curious and want to solve it yourself .You can solve it yourself here.

This has been really challenging and i really enjoyed solving these questions. Lets’ get it cracking right away.

Information

Danny is trying to open a little restaurant with his 3 favourite foods: sushi, curry, ramen (My mouth is watering!). The restaurant has collected some very basic data and have no idea how to use their data to help them run their business.

Problem Statement

Danny is interested in answering some basic questions of his customers like:

  • How many times the customers visited?
  • How much money they have spent in the restaurant altogether?
  • What are the favorite items of menu to the customers?

Danny wants to create a personalised experience for his loyal customers so that he can decide whether he should expand his existing members in customer loyality program.

He also wants to generate some basic datasets so that his team can easily inspect the data without using SQL.

Entity Relationship Diagram (ERD)

It includes 3 datasets: sales, menu, memebers.

Image from Danny ma challenge
  1. Sales dataset : It contains each customer (customer_id) and when they have ordered (order_date) and what they have ordered (product_id)
  2. Menu: It contains product details such as it’s name (product_name), unit price (price) and unique identification(product_id)
  3. Members: It contains the details of the customer(customer_id) who are in the loyality program and when they have joined (join_date).

All the details given above and the following questions have been explained in detail here. I have written in way i understood it.

Lets Start Slurpping! Image from alamy

Exploration of datasets

  1. Sales Dataset

This has general sales of the restaurant.

SELECT * FROM dannys_diner.sales;
Snnipet of the result

We can see that sales dataset consits of customers who visited, what they ordered and when they ordered.

2. Menu Dataset

As there are only 3 items in the menu. This data set contains the items price and its details.

3. Members dataset

We will see the members/customers who are in the loyalty Program.

The challenge already includes the questions that need to be solved so lets look at each question in detail.

Case-Study Questions

  1. What is the total amount each customer spent at the restaurant?
  2. How many days has each customer visited the restaurant?
  3. What was the first item from the menu purchased by each customer?
  4. What is the most purchased item on the menu and how many times was it purchased by all customers?
  5. Which item was the most popular for each customer?
  6. Which item was purchased first by the customer after they became a member?
  7. Which item was purchased just before the customer became a member?
  8. What is the total items and amount spent for each member before they became a member?
  9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier — how many points would each customer have?
  10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi — how many points do customer A and B have at the end of January?

Let’s help danny to know his customers well. This is gonna be exciting.

Let’s get coding!

The code is written in Postgre SQL.

  1. What is the total amount each customer spent?

From the question, it’s obvious for each customer, how much did he spend altogether. From ERD diagram, it’s shown that the price of the product is given in menu base table and the customers are in sales table. We need to join these 2 tables so that we can group by each customer and give their sum of their purchased items.

Now, before jumping into the solution, it may be tiring to see that for every question i need to join the tables again and again. What if i have a table already joined? Then i can just aggregate the required data from the joined tables.

So, before answering each question, lets join the required tables in order to extract our data outputs for this case study.

Joining the tables (Implementation)

We have 3 base tables and we will be creating 2 seperate tables. Why? because observing the questions we can find 2 categories/set of questions.

  1. One set concentrates on the items purchased and who purchased it. This would require to join sales and menu table to obtain the following columns:
  • customer_id
  • order_date
  • product_id
  • product_name
  • price

2. Similarly, the questions that involve to check their loyal customers regarding the items they bought, frequent items that are bought, etc. We need to join all 3 tables. This would lead to get the following columns:

  • customer_id
  • order_date
  • join_date
  • product_id
  • product_name
  • price

Now once we have decided which tables to be joined, i have learn’t that deciding which type of table join plays a crucial part! In order to, not be repetative i will explain in detail how to join 2 tables (first category to join) and the same procedure is used for the second category too.

Joining these 2 tables.

We need to answer 3 vital questions before deciding on the type of table join to be used.

  1. What is the purpose of joining these 2 tables?

To know what product each customer bought, we need to match product_id from sales to product_id from menu.

2. What is the distribution of foreign keys within each table?

To understand what is foreign key and other keys look here.

We need to understand the distribution of foreign keys in each table as it helps to decide on the type of table join to be used.

a. Are there multiple records of each product_id in sales table?

WITH product_base AS(
SELECT
product_id,
COUNT(product_id) AS product_counts
FROM dannys_diner.sales
GROUP BY product_id
)
SELECT
product_counts,
COUNT(DISTINCT product_id) AS unique_product_id
FROM product_base
GROUP BY product_counts
ORDER BY product_counts;

The first query, group up each product_id and number of times it has been ordered in sales table. Then see, how many product_counts are there for each product_id.

There are multiple rows per product_id value in sales table.

b. It’s obvious that there will be unique row for each product_id has it doen’t make any sense if there are any duplicate valus (1–to-1 relationship)

To verify we follow the same procedure as above in menu’s table.

WITH product_base AS(
SELECT
product_id,
COUNT(product_id) AS product_counts
FROM dannys_diner.menu
GROUP BY product_id
)
SELECT
product_counts,
COUNT(DISTINCT product_id) AS unique_product_id
FROM product_base
GROUP BY product_counts
ORDER BY product_counts;

We find that, there are 3 unique product_id values that have single row or else there is 1-to-1 relationships of product_id in menu table.

3. How many overlapping and missing unique foreign key values are there between the two tables?

We need to find if there are any unique product_id values in sales that are not present in menu table. We will use anti-join to check it.

SELECT
COUNT(product_id)
FROM dannys_diner.sales
WHERE NOT EXISTS (SELECT product_id
FROM dannys_diner.menu
WHERE menu.product_id = sales.product_id);

There are no unique values.

Now the other way, to check if there any unique product_id values in menu that are not present in sales table.

SELECT
COUNT(product_id)
FROM dannys_diner.menu
WHERE NOT EXISTS (SELECT product_id
FROM dannys_diner.sales
WHERE menu.product_id = sales.product_id);

There are no unique values or missing values.

From this it may be obvious that we can either use inner or left join, but it’s better to conclude with a proof.

Left Join or Inner Join?

DROP TABLE IF EXISTS inner_join_table;
CREATE TEMP TABLE inner_join_table AS (
SELECT
sales.customer_id,
sales.order_date,
sales.product_id,
menu.product_name,
menu.price
FROM dannys_diner.sales
JOIN dannys_diner.menu
ON sales.product_id =menu.product_id
);
DROP TABLE IF EXISTS left_join_table;
CREATE TEMP TABLE left_join_table AS (
SELECT
sales.customer_id,
sales.order_date,
sales.product_id,
menu.product_name,
menu.price
FROM dannys_diner.sales
LEFT JOIN dannys_diner.menu
ON sales.product_id =menu.product_id
);
(
SELECT
'left_join' AS join_type,
COUNT(*) as row_count,
COUNT(DISTINCT product_id) AS unique_product_id
FROM left_join_table
)
UNION
(
SELECT
'inner_join' AS join_type,
COUNT(*) as row_count,
COUNT(DISTINCT product_id) AS unique_product_id
FROM inner_join_table
);

Here we first created individually joined tables and then counted the rows in each table to compare.

Thus, we can see that it doesn’t matter we use either type of join for these 2 table.

To summarise,

  • We first wanted to understand the purpose of joining these 2 tables.
  • Then we understood the distribution of the foreign key (product_id) in each table.
  • Then we checked if there are any unique values of foreign key in both these tables.
  • Finally we compared both the type of joins and decided that we can use either of join type.

The process may be tedious but i think, it will be useful working with big data.

So the result of joing these 2 tables:

DROP TABLE IF EXISTS customer_menu_table;
CREATE TEMP TABLE customer_menu_table AS (
SELECT
sales.customer_id,
sales.order_date,
sales.product_id,
menu.product_name,
menu.price
FROM dannys_diner.sales
INNER JOIN dannys_diner.menu
ON sales.product_id =menu.product_id
);
SELECT * FROM customer_menu_table;

Similarly, for joining all 3 tables, following the above procedure. It can be divided into 2 parts.

The first part is already explained above and the secon part is similar.

Joining 3 tables we obtain

DROP TABLE IF EXISTS member_menu_table;
CREATE TEMP TABLE member_menu_table AS (
SELECT
sales.customer_id,
sales.order_date,
sales.product_id,
menu.product_name,
menu.price,
members.join_date
FROM dannys_diner.sales
INNER JOIN dannys_diner.menu
ON sales.product_id =menu.product_id
LEFT JOIN dannys_diner.members
ON sales.customer_id = members.customer_id
);
SELECT * FROM member_menu_table;

Now, back to the case-study questions. With the above acquired tables, we can start helping danny out.

Let’s Start Serving!
  1. What is the total amount each customer spent?

From customer_menu table we can group by customer_id to get how much each customer spent altogether in the diner.

SELECT 
customer_id,
SUM(price) AS Total_amount
FROM customer_menu_table
GROUP BY customer_id
ORDER BY customer_id;

A has spent 76£, B has spent 74£ and C has spent 36£ in total.

2. How many days has each customer visited the restaurant?

We can get it from counting the number of times he/she ordered by counting order_date.

SELECT
customer_id,
COUNT(DISTINCT order_date) as Visited_total
FROM dannys_diner.sales
GROUP BY customer_id;

A visited 4 times, B visited 6 times and C visited 2 times to danny’s diner.

3. What was the first item from the menu purchased by each customer?

It’s clear that we have to use customer_menu_table.

  • First we have to find the first item of each customer that can be found with respect to order_date.
  • We need to rank them based on the order_date and then extract the data whose rank is 1.
WITH ranking As (
SELECT
customer_id,
product_name,
order_date,
DENSE_RANK() OVER (PARTITION BY customer_id
ORDER BY order_date)
AS dense_ranking
FROM customer_menu_table
)
SELECT
customer_id,
product_name
FROM ranking
WHERE dense_ranking = 1
GROUP BY customer_id, product_name;

A ordered curry and sushi, B ordered curry and C ordered ramen as their first dishes.

4. What is the most purchased item on the menu and how many times was it purchased by all customers?

This has 2 questions i.e.

  • Most purchased item by all customers
  • Number of times that this famous item by all customer

This can be obtained from customer_menu table. By counting each product_id in the table, we can find the product_name as well as the number of times it was bought in total.

SELECT
product_name,
COUNT(product_id) as total
FROM customer_menu_table
GROUP BY product_id, product_name
ORDER BY total DESC
LIMIT 1;

Looks like ramen is pretty popular in danny’s diner. (Curious, how ramen tastes?)

5. Which item was the most popular for each customer?

This can be obtained by each window function.

  • First count the no. of times each product was bought by each customer
  • Then rank them based on each customer
  • Extract the ones we desire.
WITH ranking AS(
SELECT
customer_id,
product_name,
COUNT(product_id) AS times,
DENSE_RANK() OVER (PARTITION BY customer_id
ORDER BY COUNT(product_id) DESC)
AS ranks
FROM customer_menu_table
GROUP BY customer_id, product_name
)
SELECT
customer_id,
times,
product_name
FROM ranking
WHERE ranks = 1;

A & C ordered ramen frequently where as B ordered all the items in the menu frequently. Looks like B really loves danny’s diner.

6. Which item was purchased first by the customer after they became a member?

This insight can be obtained from member_menu table.

  • First we have to compare joining date with order dates and extract all the dates that fall after the joining date of each member of loyalty program.
  • Then rank them according to their ordering date and extract the first item each customer purchased.
WITH ranking AS (
SELECT
customer_id,
order_date,
join_date,
product_name,
DENSE_RANK() OVER(PARTITION BY customer_id
ORDER BY order_date) AS ranks
FROM member_menu_table
WHERE order_date >= join_date
)
SELECT
customer_id,
product_name
FROM ranking
WHERE ranks = 1;

A ordered curry and B ordered sushi before becoming a member of danny’s diner.

7. Which item was purchased just before the customer became a member?

It’s same as above with few alterations.

WITH ranking AS (
SELECT
customer_id,
order_date,
join_date,
product_name,
DENSE_RANK() OVER(PARTITION BY customer_id
ORDER BY order_date DESC) AS ranks
FROM member_menu_table
WHERE order_date < join_date
)
SELECT
customer_id,
product_name
FROM ranking
WHERE ranks = 1;

A ordered sushi and curry where as B ordered sushi before becoming members.

8. What is the total items and amount spent for each member before they became a member?

From members_menu_table, we can group the customers who ordered before they became members.

  • Filter the data who have order_date < join_date
  • Group By customers
  • Count unique items
  • Total the price.
SELECT
customer_id,
COUNT( DISTINCT product_name) AS total_items,
SUM(price) AS total_amount
FROM member_menu_table
WHERE order_date < join_date
GROUP BY customer_id;

A has spent 25 £ on 2 unique items and B has spent 40£ on 2 unique items on the menu from danny’s diner.

9. If each 1£ spent equates to 10 points and sushi has a 2*points mutiplier- how many points would each customer have?

The question is quite clear that

  • For each customer, the money they spend they 10*their money spent as their points
  • If they have ordered shushi then their points are doubled by sushi’s price.
SELECT 
customer_id,
SUM(CASE when product_name = 'sushi' THEN 2*10*price ELSE 10*price END) AS total_points
FROM customer_menu_table
GROUP BY customer_id
ORDER BY customer_id;

Customer A will have 860 points, B will have 940 points and C will have 360 points.

10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi — how many points do customer A and B have at the end of January?

  • Group and order by customer
  • Filter the data to only have the dates that are not after 01–02–2021.
  • Select the sum according to the conditions as explained in the question
SELECT
customer_id,
SUM (
CASE
WHEN order_date BETWEEN join_date AND ((join_date + INTERVAL '6 DAYS')::DATE)
THEN 2*10*price
WHEN product_name = 'sushi'
THEN 2*10*price
ELSE 10*price
END
) AS total_points
FROM member_menu_table
WHERE order_date < '2021-02-01'
GROUP BY customer_id
ORDER BY customer_id;

A has 1370 points and B has 820 points by ens of january.

Now comes the final question to generate some basuc datasets so that team can derive insights.

11. Recreate the following table using the available data:

SELECT 
customer_id,
order_date,
product_name,
price,
(CASE WHEN order_date >= join_date THEN 'Y'
ELSE 'N' END) AS member
FROM member_menu_table
ORDER BY customer_id,order_date;
  • So, when we group the data based on when order_date is after the join_date and that is when they become a member.

12. Danny also requires further information about the ranking of customer products, but he purposely does not need the ranking for non-member purchases so he expects null ranking values for the records when customers are not yet part of the loyalty program

WITH members_list AS (
SELECT
customer_id,
order_date,
product_name,
price,
CASE WHEN order_date >= join_date THEN 'Y'
ELSE 'N' END AS member
FROM member_menu_table
ORDER BY customer_id,order_date;
)
SELECT *,
CASE
WHEN member = 'Y' THEN DENSE_RANK () OVER(PARTITION BY customer_id, member ORDER BY order_date)
ELSE NULL
END AS ranking
FROM members_list;

So it is kind of continuation from previous where danny wants to rank the members, so we can rank the members according to the order_date.

It was really an interesting project for me and enjoyed throughout the journey. If you have any feedback do let me know as it will be vey helpful.

Learning and exploring new everyday.

--

--

Sai Praneeth
Geek Culture

Data Analyst and data enthusiast and love to work on data.