66DaysOfData challenge -DataScience Interview questions-Day29

Matt Chang
3 min readOct 3, 2023

--

Greetings everyone!👋

I’ve recently come across this video about how to build up a habit of learning data science. I was inspired by the author, Ken Jee, and the author of 5 Tips to Make Data Engineering a Marathon, Not a SprintTim Webster, and have decided to take on this challenge. I aim to post three to four random data science interview questions from Stratascratch every week. All the questions will be coming from big tech companies like FAANG. I will be utilizing AI tools to enhance my learning speed on specific topics.

Right! Before diving into the Day 29 question, make sure you’ve done Day 28.

LET’S DIVE IN!

Photo by Brian Erickson on Unsplash

Company: Amazon

Question type: System Design

Question level: Medium

Job Title: Data Scientist / ML Engineer

Question:

Write a SQL query to compute a frequency table of a certain attribute involving two joins. What if you want to GROUP or ORDER BY some attribute? What changes would you need to make? How would you account for NULLs?

Suggested answer:

let’s consider a hypothetical scenario in which you have three tables: Customers, Orders, and Products. Suppose you're interested in computing a frequency table of product categories based on completed orders.

The schema for these tables might look like this:

Customers:

  • CustomerID: unique identifier for customers
  • CustomerName: name of the customer

Orders:

  • OrderID: unique identifier for orders
  • CustomerID: identifier for customer who placed the order
  • ProductID: identifier for the ordered product

Products:

  • ProductID: unique identifier for products
  • Category: category to which a product belongs

The SQL query to compute a frequency table for product categories could be written as follows:

SELECT P.Category, COUNT(*) as Frequency
FROM Customers C
JOIN Orders O ON C.CustomerID = O.CustomerID
JOIN Products P ON O.ProductID = P.ProductID
GROUP BY P.Category
ORDER BY Frequency DESC;

In this query, the joins connect Customers to Orders and Orders to Products. The GROUP BY clause groups the results by the Category attribute of the Products table. The COUNT(*) function then counts the number of rows in each group, effectively giving us the frequency of each product category. Finally, ORDER BY Frequency DESC sorts the table in descending order based on the frequency count.

Adjustments for GROUP BY or ORDER BY:

If you wish to group by a different attribute, replace P.Category in the GROUP BY clause. For example, if you want to group by CustomerName:

GROUP BY C.CustomerName

If you want to order by a different attribute, change the ORDER BY clause. For example, to sort alphabetically by category:

ORDER BY P.Category ASC

Handling NULLs:

NULLs can be tricky to deal with in SQL queries. If any of the joined attributes contain NULL values, those rows will not be included in the output. If you’d like to include them, you can use a LEFT JOIN:

SELECT P.Category, COUNT(*) as Frequency
FROM Customers C
LEFT JOIN Orders O ON C.CustomerID = O.CustomerID
LEFT JOIN Products P ON O.ProductID = P.ProductID
GROUP BY P.Category
ORDER BY Frequency DESC;

In this version, even if an attribute in Orders or Products is NULL, the join will still produce a row with NULLs in all of the columns from the table that does not have a matching row.

You can also explicitly account for NULLs in the GROUP BY clause, utilizing the ISNULL() or COALESCE() function depending on your RDBMS:

SELECT COALESCE(P.Category, 'Unknown') as Category, COUNT(*) as Frequency
...
GROUP BY COALESCE(P.Category, 'Unknown')
...

Here, COALESCE() will replace any NULL values in P.Category with the string 'Unknown', thereby including them in the frequency count.

These are just a few of the ways to adjust the query based on your specific needs, data distribution, and acceptable levels of computational complexity.

Feel free to drop me a question or comment below.

Cheers, happy learning. I will see you tomorrow.

The data journey is not a sprint but a marathon.

Medium: MattYuChang

LinkedIn: matt-chang

Facebook: Taichung English Meetup

(I created this group four years ago for people who want to hone their English skills. Events are held regularly by our awesome hosts every week. Follow the FB group link for more information!)

--

--