SQL Data Interview Series: Subqueries in SQL

Maria-Goretti Anike
4 min readDec 6, 2023

--

Image from Stratascratch

It’s the beginning of the month, and you’ve been tasked by the CMO of your company to present the details of your stores whose sales were better than the average sales across all stores nationwide in the last month. How would you go about this?

Hello everyone. Welcome to Day 4 of the SQL Data Analyst Interview Questions series. Following our previous discussion of ROW_NUMBER, RANK, and DENSE_RANK yesterday (you can read that up here), today, we’ll be talking on Subqueries in SQL, their uses, types, and how they differ from joins.

What is a Subquery?

A subquery is a SELECT statement that returns a single value and can be nested inside a SELECT, UPDATE, INSERT, or DELETE statement. Simply put, a subquery is a query placed inside another query; it is querying from the results of another query. It is used to return data that will be used in the main query. Subqueries are always enclosed in parentheses and are also called inner/nested queries, while the query containing the subquery is called the main/outer query.
Note: We can have a subquery inside another subquery. Also, the columns from a table that are present only inside a subquery cannot be used in the SELECT list of the outer query.

Going back to our question, we have a table named ‘Sales’, which contains the Store names and the product prices. First off, we’ll get the total sales for each store, then use this to find the average sales for all the stores, then compare the two results.
Breaking this out, after (i) we sum up all prices to get the total sales:

SELECT store_name, SUM(price) total_sales
FROM sales
GROUP BY store_name

(ii) we then put this query into another query to get the average:

SELECT AVG(total_sales)
FROM (SELECT store_name, SUM(price) total_sales
FROM sales
GROUP BY store_name) avs

(iii) then compare the results of the first query to that of the second. Easy!

Another example: You’ve been asked by your manager to get the details of employees whose salaries are more than the average salary earned by all employees.
(i) find the average salary:

SELECT AVG(salary) FROM employee

(ii) select details of employees whose salaries are greater than the above:

SELECT *
FROM employee
WHERE salary > (SELECT AVG(salary) FROM employee)

Again, easy.

One more example: From our table ‘Customer Income’, let’s get the details of customers whose income are less than the average income.

Types of Subqueries

We have different types of Subqueries:
• Scalar Subquery • Multiple row Subquery • Correlated Subquery
Let’s discuss these.
(i) A scalar subquery is a subquery which will always return 1 row and 1 column.
(ii) Multiple row subquery is further divided into: (a) Subquery which returns multiple columns and multiple rows (b) Subquery which returns only 1 column and multiple rows.
(iii) Correlated subquery is a subquery which is dependent on the outer query for its values, which means that, unlike other subqueries, it cannot be executed independently of the outer query. In this, for every single record that is processed in the outer query, it will execute the inner query. This implies that if there are 50 records in the outer query, the subquery will be executed 50 times.

Uses of Subqueries:
• To add a new column to the main query result
• To create a filter
• To return data used by the main query as a condition to restrict the data retrieval further
• To form flexible SQL statements

SQL Clauses where Subqueries are Allowed

  • SELECT • FROM • WHERE • HAVING

SQL Commands which allow Subqueries

  • SELECT • INSERT • UPDATE • DELETE

Subquery or Join?

When do you know to use either a subquery or a join? Which should you choose for performance purposes?

According to MSDN (Microsoft Developer Network), in most cases, there is usually no performance difference between queries that use subqueries and equivalent queries using joins. However, in some cases where existence must be checked (using the EXISTS keyword), a join produces better performance. Otherwise, the nested query must be processed for each result of the outer query. In such cases, a join approach would yield better results.

A join is more efficient in most cases, but there are cases in which constructs other than a subquery is not possible. It is a good practice to avoid multiple levels of nested subqueries, since they are not easily readable and do not have good performance. In general, it is better to write a query with joins rather than with subqueries if possible, especially if the subqueries are correlated.

I hope you understand Subqueries better. If you enjoyed reading this, leave lots of claps and encouraging comments. Tomorrow, we’ll be discussing Stored Procedures in SQL. 🤗

GIF from GIPHY

--

--

Maria-Goretti Anike

Hey yo there 😄! I'm Maria, your favourite Data Explorer and ardent SQL devotee. I write all about Healthcare, Marketing, and Product Analytics.