66DaysOfData challenge -DataScience Interview questions-Day29
Greetings everyone!👋
I’ve recently come across this video about how to build up a habit of learning data science. I was inspired by the author, Ken Jee, and the author of 5 Tips to Make Data Engineering a Marathon, Not a Sprint — Tim Webster, and have decided to take on this challenge. I aim to post three to four random data science interview questions from Stratascratch every week. All the questions will be coming from big tech companies like FAANG. I will be utilizing AI tools to enhance my learning speed on specific topics.
Right! Before diving into the Day 29 question, make sure you’ve done Day 28.
LET’S DIVE IN!
Company: Amazon
Question type: System Design
Question level: Medium
Job Title: Data Scientist / ML Engineer
Question:
Write a SQL query to compute a frequency table of a certain attribute involving two joins. What if you want to GROUP or ORDER BY some attribute? What changes would you need to make? How would you account for NULLs?
Suggested answer:
let’s consider a hypothetical scenario in which you have three tables: Customers
, Orders
, and Products
. Suppose you're interested in computing a frequency table of product categories based on completed orders.
The schema for these tables might look like this:
Customers
:
CustomerID
: unique identifier for customersCustomerName
: name of the customer
Orders
:
OrderID
: unique identifier for ordersCustomerID
: identifier for customer who placed the orderProductID
: identifier for the ordered product
Products
:
ProductID
: unique identifier for productsCategory
: category to which a product belongs
The SQL query to compute a frequency table for product categories could be written as follows:
SELECT P.Category, COUNT(*) as Frequency
FROM Customers C
JOIN Orders O ON C.CustomerID = O.CustomerID
JOIN Products P ON O.ProductID = P.ProductID
GROUP BY P.Category
ORDER BY Frequency DESC;
In this query, the joins connect Customers
to Orders
and Orders
to Products
. The GROUP BY
clause groups the results by the Category
attribute of the Products
table. The COUNT(*)
function then counts the number of rows in each group, effectively giving us the frequency of each product category. Finally, ORDER BY Frequency DESC
sorts the table in descending order based on the frequency count.
Adjustments for GROUP BY
or ORDER BY
:
If you wish to group by a different attribute, replace P.Category
in the GROUP BY
clause. For example, if you want to group by CustomerName
:
GROUP BY C.CustomerName
If you want to order by a different attribute, change the ORDER BY
clause. For example, to sort alphabetically by category:
ORDER BY P.Category ASC
Handling NULLs:
NULLs can be tricky to deal with in SQL queries. If any of the joined attributes contain NULL values, those rows will not be included in the output. If you’d like to include them, you can use a LEFT JOIN
:
SELECT P.Category, COUNT(*) as Frequency
FROM Customers C
LEFT JOIN Orders O ON C.CustomerID = O.CustomerID
LEFT JOIN Products P ON O.ProductID = P.ProductID
GROUP BY P.Category
ORDER BY Frequency DESC;
In this version, even if an attribute in Orders
or Products
is NULL, the join will still produce a row with NULLs in all of the columns from the table that does not have a matching row.
You can also explicitly account for NULLs in the GROUP BY
clause, utilizing the ISNULL()
or COALESCE()
function depending on your RDBMS:
SELECT COALESCE(P.Category, 'Unknown') as Category, COUNT(*) as Frequency
...
GROUP BY COALESCE(P.Category, 'Unknown')
...
Here, COALESCE()
will replace any NULL values in P.Category
with the string 'Unknown', thereby including them in the frequency count.
These are just a few of the ways to adjust the query based on your specific needs, data distribution, and acceptable levels of computational complexity.
Feel free to drop me a question or comment below.
Cheers, happy learning. I will see you tomorrow.
The data journey is not a sprint but a marathon.
Medium: MattYuChang
LinkedIn: matt-chang
Facebook: Taichung English Meetup
(I created this group four years ago for people who want to hone their English skills. Events are held regularly by our awesome hosts every week. Follow the FB group link for more information!)