SQL Simplified — 10 advanced practice problems and solutions covering most SQL concepts

Duncan Anderson
Published in Nerd For Tech
7 min read · Jul 8, 2021

Using SQL effectively is a critical skill for any data scientist and is one of the most frequently tested in interviews. While working to improve this skill ourselves, we found an incredibly useful article by Mrinal Gupta (original article linked here) that covered 10 high-quality questions spanning most of the SQL concepts tested in interviews. After working through that article, however difficult a practice question we tried, it paled in comparison to those 10, and it was surprising to see the sudden jump in our abilities! To get the most out of the aforementioned article, though, we had to do 3 preparatory steps:

  1. Prepare practice data (in the original article, the training data wasn’t readily available; we had to manually create SQL tables & enter the data to try our test queries).
  2. Write down our step-by-step thought process (some answer queries were very intimidating at first glance; it took a long while to work out the step-by-step reasoning behind them).
  3. Replace outdated keywords (some SQL keywords used in the original article were outdated, possibly due to version changes; we had to find replacements for the current version).

In this article, we share our preparatory steps (with the 10 key questions) for a smoother learning experience. We hope you find it as beneficial as we did!

Q1. Given two tables below, write a query to display the comparison result (higher/lower/same) of the average salary of employees in a department to the company’s average salary.

First, let’s create the necessary tables.

Now that we have our tables set up, let’s do the initial JOIN.

Step #1. JOIN

Now the two tables are joined together using the shared ‘employee_id’ column:

Step #2. Now that we have the two tables joined, consider the intermediary columns needed for our final table. Select and engineer those.

With this step, we achieve a few things. By using the DATE_FORMAT keyword, we can partition our window averages by month. Then we alias these statements for clarity. Here is the output of the above statement. If that seems a bit unclear, we suggest quickly Googling what window functions are, because they will show up a lot in the following questions.

Step #3. Let’s convert our existing code into a sub-query & pipe into a blank final query.

We are creating the temporary tables using the WITH keyword, as this allows us to query the aliased columns. Here’s the output from the above code:

Step #4. With that, all that’s left is to do the relevant selections & feature engineering for our final display. We’ll use the DISTINCT keyword to remove the duplicate columns from the result set and set our conditions using the CASE WHEN clause.

Now we can display the comparison result (higher/lower/same) of the average salary of employees in a department to the company’s average salary.
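Putting steps 1-4 together, the full query might look something like this sketch (the schema here, `salary(employee_id, amount, pay_date)` and `employee(employee_id, department_id)`, is our assumption; MySQL 8 syntax):

```sql
-- Sketch of the full Q1 query; table/column names are assumptions
WITH t1 AS (
    SELECT
        DATE_FORMAT(s.pay_date, '%Y-%m') AS pay_month,
        e.department_id,
        -- average salary per department per month
        AVG(s.amount) OVER (PARTITION BY DATE_FORMAT(s.pay_date, '%Y-%m'),
                            e.department_id) AS dept_avg,
        -- company-wide average salary per month
        AVG(s.amount) OVER (PARTITION BY DATE_FORMAT(s.pay_date, '%Y-%m')) AS comp_avg
    FROM salary s
    JOIN employee e ON s.employee_id = e.employee_id
)
SELECT DISTINCT
    pay_month,
    department_id,
    CASE
        WHEN dept_avg > comp_avg THEN 'higher'
        WHEN dept_avg < comp_avg THEN 'lower'
        ELSE 'same'
    END AS comparison
FROM t1;
```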

Q2. Write an SQL query to report the students (student_id, student_name) being “quiet” in ALL exams. A “quiet” student is one who took at least one exam and scored neither the highest nor the lowest score in any of them.

Solution 2:

First, let’s create the necessary tables.

Step #1. Join (we do an inner join since we have no interest in students with no exams).

The tables are now joined on student_id.

Step #2. Next we create all the intermediary columns needed.

Step#3. Convert the previous query into a sub-query so we can access the intermediary columns.

Step#4. Consider the final output & use the relevant ‘selecting’ & ‘filtering’ to produce it.
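Assembled into one statement, the query might look like this sketch (assuming tables `student(student_id, student_name)` and `exam(exam_id, student_id, score)`, which are our guesses at the schema):

```sql
-- Sketch of the full Q2 query; schema is assumed
WITH t1 AS (
    SELECT
        s.student_id,
        s.student_name,
        e.score,
        -- per-exam high and low scores as window columns
        MAX(e.score) OVER (PARTITION BY e.exam_id) AS high_score,
        MIN(e.score) OVER (PARTITION BY e.exam_id) AS low_score
    FROM student s
    JOIN exam e ON s.student_id = e.student_id   -- inner join drops students with no exams
)
SELECT DISTINCT student_id, student_name
FROM t1
WHERE student_id NOT IN (
    -- anyone who ever hit a high or low score is not "quiet"
    SELECT student_id
    FROM t1
    WHERE score = high_score OR score = low_score
)
ORDER BY student_id;
```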

Q3. Write a query to display the records which have 3 or more consecutive rows where the number of people is 100 or more (inclusive).

Solution 3:

First, let’s create the necessary tables.

Step#1. Make a column that identifies 100+ streaks.

Step#2. Select other relevant intermediary columns & make into a sub-query so we can access them.

Step#3. We then make a table that counts the streak size from t1 for each streak.

Step#4. We filter out only streaks with 3+ streak_size.

Step#5. Let’s inner join this to t1 so we get all the entries of the shortlisted streak_num.

Step#6. Then make the relevant selections as needed for the final output.
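Combining steps 1-6, a sketch of the full query (assuming a table `stadium(id, visit_date, people)`, which is our guess at the schema):

```sql
-- Sketch of the full Q3 query; schema is assumed
WITH t1 AS (
    SELECT
        id, visit_date, people,
        -- consecutive ids in a 100+ run share the same (id - row_number) value
        id - ROW_NUMBER() OVER (ORDER BY id) AS streak_num
    FROM stadium
    WHERE people >= 100
),
t2 AS (
    SELECT streak_num
    FROM t1
    GROUP BY streak_num
    HAVING COUNT(*) >= 3          -- keep only streaks with 3+ rows
)
SELECT t1.id, t1.visit_date, t1.people
FROM t1
JOIN t2 ON t1.streak_num = t2.streak_num
ORDER BY t1.visit_date;
```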

Q4. Write an SQL query to find how many users visited the bank and didn’t do any transactions, how many visited the bank and did one transaction and so on.

First, let’s create the necessary tables.

Step#1. Let us start off by finding the # of visits by day by user_id.

Step#2. We then calculate the number of transactions by day by user_id.

Step#3. Let us do a left join so we can maintain total visits & wrangle data together.

Step#4. We select only the date column and the num_visits column, & replace null num_trans values with 0.

Step#5. Let’s convert our previous result into a sub-query, then make the appropriate bins using another query, t2. We’ll have to use the RECURSIVE clause for t2 to make it function correctly.

Step#6. Join t1 & t2 appropriately, and we have our desired output!
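Altogether, the query might look something like this sketch (assuming tables `visits(user_id, visit_date)` and `transactions(user_id, transaction_date, amount)`; MySQL 8 syntax):

```sql
-- Sketch of the full Q4 query; schema is assumed
WITH RECURSIVE t1 AS (
    -- one row per bank visit, with that day's transaction count (0 if none)
    SELECT v.user_id, v.visit_date,
           COALESCE(t.num_trans, 0) AS num_trans
    FROM visits v
    LEFT JOIN (
        SELECT user_id, transaction_date, COUNT(*) AS num_trans
        FROM transactions
        GROUP BY user_id, transaction_date
    ) t
      ON v.user_id = t.user_id AND v.visit_date = t.transaction_date
),
t2 AS (
    -- recursively generate every bin from 0 up to the maximum transaction count
    SELECT 0 AS transactions_count
    UNION ALL
    SELECT transactions_count + 1
    FROM t2
    WHERE transactions_count < (SELECT MAX(num_trans) FROM t1)
)
SELECT t2.transactions_count,
       COUNT(t1.num_trans) AS visits_count
FROM t2
LEFT JOIN t1 ON t2.transactions_count = t1.num_trans
GROUP BY t2.transactions_count
ORDER BY t2.transactions_count;
```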

Q5. Write an SQL query to generate a report of period_state for each continuous interval of days in the period from 2019-01-01 to 2019-12-31.

Let’s start off by making our tables.

Step#1. Let’s make a column that records each streak in the year 2019. We add a state of 1 simply to signify the success state.

Step#2. Let’s find the start & end date of each streak.

Step#3. Let’s repeat the process for failed state.

Step#4. Let’s combine t1 & t2 using UNION ALL & order by date.

Step#5. Let’s use CASE WHEN to replace 1 & 0 with ‘succeeded’ & ‘failed’, and we have our desired output!
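Combined, the query might look like this sketch (assuming tables `succeeded(success_date)` and `failed(fail_date)`, which are our guesses at the schema; MySQL syntax):

```sql
-- Sketch of the full Q5 query; schema is assumed
WITH t1 AS (
    -- success days: consecutive dates share the same (date - row_number) streak key
    SELECT success_date AS dt, 1 AS state,
           DATE_SUB(success_date,
                    INTERVAL ROW_NUMBER() OVER (ORDER BY success_date) DAY) AS streak
    FROM succeeded
    WHERE success_date BETWEEN '2019-01-01' AND '2019-12-31'
),
t2 AS (
    -- same trick for failure days
    SELECT fail_date AS dt, 0 AS state,
           DATE_SUB(fail_date,
                    INTERVAL ROW_NUMBER() OVER (ORDER BY fail_date) DAY) AS streak
    FROM failed
    WHERE fail_date BETWEEN '2019-01-01' AND '2019-12-31'
)
SELECT CASE WHEN state = 1 THEN 'succeeded' ELSE 'failed' END AS period_state,
       MIN(dt) AS start_date,
       MAX(dt) AS end_date
FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) u
GROUP BY state, streak
ORDER BY start_date;
```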

Q6. Write an SQL query to report how many units in each category have been ordered on each day of the week.

First, create the schema and tables:

Step #1. Join the tables with a RIGHT JOIN, since not every item will have an order.

If you look at the inputted values in the result table, we see that ‘T-Shirt’ has no orders, yet every item in the items table still appears.

Step #2.1. Let’s create a temporary table and join the necessary columns from the orders and items tables and test with the 1st day of the week, Monday.

For the output, we should expect to see the sum of all items just for the Monday.

Step #2.2. Great, now that we know that works, let’s fetch all the days of the week.

Output:

Step #3. What’s wrong with the above output? We don’t want to see multiple item_categories appearing, so let’s use DISTINCT on the item_category column.

Now we can see how many units in each category have been ordered on each day of the week.
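Condensed into a single statement, the query might look like this sketch (assuming tables `orders(order_id, item_id, order_date, quantity)` and `items(item_id, item_name, item_category)`, our guess at the schema; window sums plus DISTINCT, as in Step #3, in MySQL 8 syntax; only two of the seven day columns are shown here for brevity, the rest follow the same pattern):

```sql
-- Sketch of the full Q6 query; schema is assumed
SELECT DISTINCT
    i.item_category AS category,
    SUM(CASE WHEN DAYNAME(o.order_date) = 'Monday' THEN o.quantity ELSE 0 END)
        OVER (PARTITION BY i.item_category) AS monday,
    SUM(CASE WHEN DAYNAME(o.order_date) = 'Tuesday' THEN o.quantity ELSE 0 END)
        OVER (PARTITION BY i.item_category) AS tuesday
    -- ... repeat for Wednesday through Sunday
FROM orders o
RIGHT JOIN items i ON o.item_id = i.item_id   -- right join keeps categories with no orders
ORDER BY category;
```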

Q7. Write an SQL query to find the employees who earn the top three salaries in each department. For the above tables, your SQL query should return the following rows (order of rows does not matter).

First let’s create the schema and tables:

Step #1. We’ll need to JOIN the two tables on the respective id’s.

We have the following results to work with. What data do we care about from our joined table?

Step #2. Well, we’ll need the employee name, department name, and salary.

Step #3. Great, so we have the info we’ll need in the end. Let’s turn our existing code into a subquery, and set a condition to return only top 3 values.

In the IT department, Max earns the highest salary, Randy and Joe tie for second, and Will earns the third highest. There are only two employees in the Sales department: Henry earns the highest salary while Sam earns the second highest.
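The full query might look like this sketch (assuming tables `employee(id, name, salary, department_id)` and `department(id, name)`, our guess at the schema):

```sql
-- Sketch of the full Q7 query; schema is assumed
WITH t1 AS (
    SELECT d.name AS department,
           e.name AS employee,
           e.salary,
           -- dense_rank handles ties, e.g. Randy and Joe sharing second place
           DENSE_RANK() OVER (PARTITION BY d.id ORDER BY e.salary DESC) AS rnk
    FROM employee e
    JOIN department d ON e.department_id = d.id
)
SELECT department, employee, salary
FROM t1
WHERE rnk <= 3;   -- keep only the top three salary ranks per department
```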

Q8. Write an SQL query to compute the moving average of how much customers paid in a 7-day window (current day + 6 days before).

First let’s create the schema and ‘customers’ table we need.

Step #1. Let’s start by aggregating the amount with SUM and grouping by visited_on, so we have one row per day to build the window over.

So now we can see the total amount customers paid on each day.

Step #2. We need to find the rolling average. We want to know the total amount over each 7-day window, as well as the average amount. With 10 days of data, there are only 4 complete 7-day windows.

Step #3. In order to SELECT visited_on in this statement, we’ll need to make our first query a temporary table, as well as this one.

Now we can see the moving average and the total amount customers paid in a 7-day window (current day + 6 days before).
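Put together, the query might look like this sketch (assuming a table `customer(customer_id, name, visited_on, amount)`, our guess at the schema; MySQL 8 syntax):

```sql
-- Sketch of the full Q8 query; schema is assumed
WITH daily AS (
    SELECT visited_on, SUM(amount) AS day_total
    FROM customer
    GROUP BY visited_on
),
rolling AS (
    SELECT visited_on,
           SUM(day_total) OVER w AS amount,
           ROUND(AVG(day_total) OVER w, 2) AS average_amount,
           ROW_NUMBER() OVER (ORDER BY visited_on) AS rn
    FROM daily
    WINDOW w AS (ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
)
SELECT visited_on, amount, average_amount
FROM rolling
WHERE rn >= 7;   -- only days with a full 7-day window behind them
```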

Q9. Write a query to find the shortest distance between any two of these points, rounded to 2 decimals.

First let’s create the table:

Step#1. First let’s use a CROSS JOIN, which is a type of join that generates a paired combination of each row of the first table with each row of the second table.

Step#2. Now that we have the cross join taken care of, let’s compute the distance between each point and every other point.

Step#3. We can clearly see from the output that the shortest distance is 1, but to keep things clean, we’ll finish this question off by sorting in ascending order and limiting the result to 1 row so only the shortest distance is displayed.

The above query will output 1, which is what we expected.
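As a single statement, the query might look like this sketch (assuming a table `point_2d(x, y)`, our guess at the schema):

```sql
-- Sketch of the full Q9 query; schema is assumed
SELECT ROUND(SQRT(POW(p1.x - p2.x, 2) + POW(p1.y - p2.y, 2)), 2) AS shortest
FROM point_2d p1
CROSS JOIN point_2d p2
WHERE NOT (p1.x = p2.x AND p1.y = p2.y)   -- don't pair a point with itself
ORDER BY shortest ASC
LIMIT 1;
```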

Q10. Write an SQL query to find all numbers that appear at least three times consecutively.

Step#1. Since we’re interested in getting the consecutive numbers to see which appear 3x in a row, we’ll use the LAG function to fetch data from the previous row, and LEAD for the subsequent row.

Notice that the first row’s ‘prev’ is null, since the previous row it would read from doesn’t exist. Likewise, ‘next’ is null for the last row.

Step#2. Let’s turn our SELECT statement into a subquery so we can filter on the window function columns (window functions can’t be referenced directly in a WHERE clause).

Step#3. Let’s finish this off by using DISTINCT to return the numbers that meet our condition (as we see, it’s simply 1 from the previous query).
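The full query might look like this sketch (assuming a table `logs(id, num)`, our guess at the schema; MySQL 8 syntax):

```sql
-- Sketch of the full Q10 query; schema is assumed
WITH t1 AS (
    SELECT num,
           LAG(num)  OVER (ORDER BY id) AS prev,   -- value from the previous row
           LEAD(num) OVER (ORDER BY id) AS next_num -- value from the next row
    FROM logs
)
SELECT DISTINCT num AS ConsecutiveNums
FROM t1
WHERE num = prev AND num = next_num;   -- same value three rows in a row
```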

We hope you enjoyed our article and are feeling much more confident about SQL. For next steps, we recommend free-styling some medium & high difficulty data science SQL interview questions. Stratascratch has several such questions available in their free tier (at least at the time of writing this article). After that, consider projects where you can apply your newfound SQL skills. Thanks again for engaging with our work. It makes us happy to know it helped someone out, like our past selves!

Sincerely,
Duncan Anderson & Ammar Khan.
Ammar’s LinkedIn
Duncan’s LinkedIn
