SQL Insider- Case 2: Danny’s Diner Survival Guide
In this case of SQL insider, we’re going to help the beloved Danny’s Diner ride the wave of analytics using SQL and get to know some detailed concepts of the language. Let’s get going!
After the intriguing sleuthing of SQL City’s Murder Mystery, this time let’s take a more business approach, and help a small business owner ride the wave, armed with data. We’re going to dig in Danny’s Diner Dataset to uncover some of its mysteries.
What’s inside:
A Table for CTEs & Partitioning
Common Table Expressions are essentially temporary outputs, defined within a SQL query and can be referenced subsequently within the same query.
Let’s take a simple example to illustrate the process: You have two lists — one with the names of people and another with their favorite activities. You are planning a fun event and want to quickly review this information to prepare an enjoyable day for everyone.
As you prepare for a winter event, you may want to explore all ‘out’ activities and the names of individuals who enjoy them.
With the help of CTEs, a single readable query allows you to join the tables and present the desired information seamlessly.
with event_win as
(select a.person as 'Name',
b.activity as 'Hobby',
b.category as 'Category'
from people a
left join activities b on a.id = b.person_id)
select Name, Hobby, Category
From event_win
where Category = 'out'
This query does not create a table/view in the database, rather allows temporary usage of the join-query as a table which can be referenced further within the same query.
What makes CTE’s so extensively used is because of:
- Code Readability: CTEs allow breaking down complex queries into smaller, logical, and self-contained pieces, making the SQL code easier to understand and maintain.
- Code Reusability: CTEs can be referenced multiple times within the same query, enabling code reuse and reducing redundancy. This is particularly valuable for repetitive subqueries.
- Recursive Queries: CTEs are particularly useful for handling recursive queries, where a query references its own output. This helps solve problems like hierarchical data representations.
- Optimization: SQL query optimizers can often optimize CTEs better than nested subqueries or derived tables, leading to improved query performance.
That brings us to the concept of ‘Partition by’. In SQL, the PARTITION BY
function is used in combination with window functions to divide the result set into partitions or groups based on one or more specified columns. It is a crucial feature in SQL for performing calculations and analysis within each partition separately, providing more granular control over the application of aggregate functions and analytical functions.
Adding upon our earlier example, suppose you have a specific budget requirement for organizing the event. Rather than pre-determining an outdoor event, you wish to compare the total costs for both indoor and outdoor categories to make a well-informed decision.
SELECT
Category,
Name,
Cost,
SUM(Cost) OVER (PARTITION BY Category) AS TotalCostPerCategory,
RANK() OVER (PARTITION BY Category ORDER BY Cost DESC) AS Ranks
FROM
Activities;
These are a couple of advanced SQL concepts, that come in handy while helping our friends at Danny’s Diner.
Now that we have a fair amount of knowledge of our fundamentals, let’s get started, shall we?
Danny’s Diner Menu: Problem
Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen.
Danny’s Diner is in need of your assistance to help the restaurant stay afloat — the restaurant has captured some very basic data from their few months of operation but have no idea how to use their data to help them run the business.
Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.
He plans on using these insights to help him decide whether he should expand the existing customer loyalty program — additionally he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.
Danny has provided you with a sample of his overall customer data due to privacy issues — but he hopes that these examples are enough for you to write fully functioning SQL queries to help him answer his questions!
SQL Specials: Queries
Danny is a wise business ownwer who understands the value of asking the right questions, defining clear end goals and setting up the foundation of communication within his team. He has very carefully devised a set of questions, to which we will be looking out for patterns in their rich collection of data.
We’ll follow a structure as below to keep the findings and learnings uniform:
→ Q: Question/Insight
→ SQLQuery (Postgre SQL v13)
→ Findings
Q1: What is the total amount each customer spent at the restaurant?
-- Get the sum of price of each product as they occur in the sales records
select a.customer_id as Customer, sum(b.price) as Total_Spent
from dannys_diner.sales a
left join dannys_diner.menu b on a.product_id = b.product_id
group by a.customer_id
order by Total_Spent desc;
Q2: How many days has each customer visited the restaurant?
-- Count unique dates from sales record, per customer
select a.customer_id as Customer, count(distinct(a.order_date)) as Days
from dannys_diner.sales a
group by a.customer_id;
It’s important to remember, in this case, we’re looking to get unique or distinct number of days each of the customer visited, and since our base table is ‘sales’ here, we ought to remind ourselves that a person can place multiple orders on the same day of the visit.
Q3: What was the first item from the menu purchased by each customer?
-- CTE to rank items ordered by customers as per the date of order
WITH firsts AS
(
SELECT a.customer_id, a.order_date, a.product_id, b.product_name,
DENSE_RANK() OVER(PARTITION BY a.customer_id
ORDER BY a.order_date) AS rank
FROM dannys_diner.sales AS a
JOIN dannys_diner.menu AS b
ON a.product_id = b.product_id
)
-- Display results
SELECT customer_id as Customer, product_id as Product_Id, product_name as Name
FROM firsts
WHERE rank = 1
GROUP BY customer_id, product_id, product_name;
Here comes our first CTE. Since we want to see which items had a lasting first impression on our customers, we make use of the DENSE_RANK () function to give the orders a hierarchy and group them (partition by) by customer id, sorted by order date (ascending by default).
Q4: What is the most purchased item on the menu and how many times was it purchased by all customers?
-- Calculate number of times each item was ordered and sort them in descending order
select a.product_id as Prd_id, b.product_name as Prd_Name,
count(a.product_id) as purchases
from dannys_diner.sales a
join dannys_diner.menu b
on a.product_id = b.product_id
group by Prd_id, Prd_Name
order by purchases desc
-- To display only one record, use limit in Postgre and top(1) in MySQL
limit 1;
When using aggregate functions like count (), sum (), min (), max (), avg () etc., ‘group by’ clause is the most essential to include within the queries. This ensures that the aggregation calculations are based on a particular categorical feature of our dataset.
Q5: Which item was the most popular for each customer?
-- CTE to combine customer details with sales records and rank the most preferred items, customer wise
WITH favorites AS
(
SELECT
a.customer_id as Customer,
b.product_name as Product,
count(a.product_id) as purchases,
DENSE_RANK() OVER(PARTITION by a.customer_id
order by count(a.product_id) desc) as rank
FROM dannys_diner.sales a
JOIN dannys_diner.menu b on a.product_id = b.product_id
GROUP BY Customer, Product
)
-- Display results
SELECT * FROM favorites
WHERE rank = 1
Q6: Which item was purchased first by the customer after they became a member?
-- CTE that partitions and ranks customer's orders placed after they became members of the diner
WITH firsts_after_mem AS
(
SELECT
a.customer_id as customer,
b.product_id as Prd_id,
c.product_name as Prd_Name,
ROW_NUMBER() OVER(
PARTITION BY a.customer_id
ORDER BY b.order_date) AS row_num
FROM dannys_diner.members a
JOIN dannys_diner.sales b
ON a.customer_id = b.customer_id
AND b.order_date > a.join_date
JOIN dannys_diner.menu c
ON b.product_id = c.product_id
)
-- Display the records with first rank i.e first orders after becoming a member
SELECT customer, Prd_Name
FROM firsts_after_mem
WHERE row_num = 1;
Q7: Which item was purchased just before the customer became a member?
-- CTE with reverse date logic from the previous query to determine last orders as non-memebers
WITH firsts_before_mem AS
(
SELECT
a.customer_id as customer,
b.product_id as Prd_id,
c.product_name as Prd_Name,
ROW_NUMBER() OVER(
PARTITION BY a.customer_id
ORDER BY b.order_date desc) AS row_num
FROM dannys_diner.members a
JOIN dannys_diner.sales b
ON a.customer_id = b.customer_id
AND b.order_date < a.join_date
JOIN dannys_diner.menu c
ON b.product_id = c.product_id
)
-- Display the results with rank 1
SELECT customer, Prd_name
FROM firsts_before_mem
WHERE row_num = 1
Q8: What are the total items and amount spent for each member before they became a member?
-- Count the items ordered by each customer and sum their prices, by comparing the order dates and their membership starting date
SELECT
a.customer_id as customer,
count(a.product_id) as purchases,
sum(b.price) as Spent
FROM dannys_diner.sales a
JOIN dannys_diner.menu b ON a.product_id = b.product_id
JOIN dannys_diner.members c ON a.customer_id = c.customer_id
WHERE a.order_date < c.join_date
GROUP BY customer
ORDER BY customer;
Q9: If each $1 spent equates to 10 points and sushi has a 2x points multiplier — how many points would each customer have?
WITH points_earned As
(
SELECT
a.product_id as Prd_Id,
a.product_name as Prd_Name,
a.price as Price,
(
CASE when a.product_name = 'sushi' then price*20
Else price * 10
end
) as points
FROM dannys_diner.menu a
)
SELECT
a.customer_id as Customer,
sum(b.points) as Points
FROM dannys_diner.sales a
JOIN points_earned b ON a.product_id = b.Prd_Id
GROUP by customer
ORDER by customer
Q10: In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi — how many points do customer A and B have at the end of January?
WITH jan_points AS (
SELECT
DISTINCT(a.customer_id) as customer,
b.join_date as Member_Date,
b.join_date + interval '6 day'as Bonus_Pt
FROM dannys_diner.sales a
JOIN dannys_diner.members b
ON a.customer_id = b.customer_id)
SELECT
a.customer_id as Customers,
SUM(
CASE
WHEN a.order_date BETWEEN jp.Member_Date AND jp.Bonus_Pt THEN 20 * m.price
WHEN m.product_name = 'sushi' THEN 20 * m.price
ELSE 10 * m.price
END
) as Points
FROM dannys_diner.sales a
JOIN jan_points jp on a.customer_id = jp.customer
JOIN dannys_diner.menu m on a.product_id = m.product_id
WHERE a.order_date < '2021-02-01'
GROUP BY Customers;
Remember, the question says including the day of becoming a member, so when considering a week, it would be 6 days from the day of becoming a member, not 7.
Serving SQL Lessons
- Although ‘sushi’ offered attractive points and probable benefits, ‘ramen’ still remains at the top among the most popular offerings of Danny’s Diner.
- Customer B seems to be a loyal customer of the diner, whose favorites include all the items in the menu.
- Offering discounts and coupons upon the points earned by the members could be an attractive option to allure more members, thereby attracting more potential customers.
Check Please: Conclusion!
- Before diving into crafting intricate SQL queries, it's always advisable to spend time understanding the datasets, their relationships and start building small.
- Advanced methodologies like CTEs & Partition By functions make the queries structured.
- Digging up just numbers from the dataset often do not drive decisions and it is best to utilize SQL methodologies to derive patterns and trends from the dataset.
Although our dataset was hypothetical and relatively smaller in size than a real-life scenario, the learnings it offered while uncovering its mysteries was quite a lesson.