Subqueries in SQL Server

Advanced SQL for data Analysis - Learning Day 1

Gianpiero Andrenacci
Data Bistrot
14 min readAug 29, 2024

--

Sql Server Advanced Data Analysis

Welcome to the Advanced SQL for Data Analysis series. This educational content is designed for individuals who have a basic understanding of T-SQL and SQL Server and are looking to deepen their knowledge in data analysis and data science using SQL. In the first day of the series we’ll explore Subqueries in SQL Server for data analysis.

Prerequisites

Before diving into this series, it is expected that you have a foundational understanding of the following:

  • Basic T-SQL syntax and operations
  • Fundamental concepts of SQL Server
  • Experience with querying databases using SELECT statements
  • Understanding of JOINs, GROUP BY, and aggregate functions

Goals of This Series

The primary objectives of this series are to:

  • Enhance your SQL skills with advanced querying techniques.
  • Demonstrate best practices for writing efficient SQL code.
  • Provide practical examples that you can apply to real-world data analysis projects.
  • Explain complex concepts in a clear and understandable manner.

Structure of the Content

Each section of this series will include:

  • Detailed explanations of advanced SQL concepts.
  • Code examples using one of the AdventureWorks databases to illustrate these concepts.
  • Practical advice on implementing these techniques in your projects.

By the end of this series, you should have a solid understanding of advanced SQL techniques and be able to apply them effectively in your data analysis and data science tasks.

Let’s get started and take your SQL skills to the next level!

Databases Used

All examples and exercises in this series will utilize the following AdventureWorks databases:

  • AdventureWorks2022: An OLTP (Online Transaction Processing) database designed to handle transactional data.
  • AdventureWorksDW2022: A data warehouse version of the AdventureWorks database, optimized for analytical queries and reporting.
  • AdventureWorksLT2022: A lighter version of the AdventureWorks database, suitable for less complex queries and smaller datasets.

To install and use the AdventureWorks databases, please refer to the official Microsoft documentation at AdventureWorks Install and Configure.

This guide provides detailed instructions on how to set up the AdventureWorks databases in SQL Server, including steps for downloading, restoring, and configuring the databases using SQL Server Management Studio (SSMS).

Additionally, you may find it helpful to watch this YouTube guide: Installing AdventureWorks Databases in SQL Server, which provides a step-by-step visual walkthrough.

What is a Subquery?

In SQL Server, subqueries are a powerful tool for simplifying complex queries, performing advanced data manipulation, and enhancing the overall efficiency of your SQL code.

A subquery, often referred to as an inner query or inner select, is a query embedded within another SQL statement, which is termed the outer query or outer select. Subqueries play a crucial role in SQL, allowing more complex and dynamic data retrieval operations.

Subqueries vs. Joins

In many cases, Transact-SQL statements containing subqueries can be restructured using joins. However, there are scenarios where only subqueries can fulfill the required query logic. Generally, in Transact-SQL, there’s no significant performance difference between a statement utilizing a subquery and a semantically similar statement using a different structure.

Performance Considerations

There are situations, particularly when checking for existence, where joins can outperform subqueries. This is because a nested query has to be processed for each result of the outer query to ensure there are no duplicates, whereas a join can handle this more efficiently. Therefore, choosing the appropriate method based on the specific use case is essential for optimizing performance.

Components of a Subquery

A subquery included in an outer SELECT statement comprises the following components:

  • A standard SELECT query with the usual select list components.
  • A regular FROM clause with one or more table or view names.
  • Optional clauses such as WHERE, GROUP BY, and HAVING.

Subqueries are always enclosed in parentheses. They cannot include a COMPUTE or FOR BROWSE clause and can only have an ORDER BY clause if a TOP clause is also specified.

Placement and Nesting

Subqueries can be nested inside the WHERE or HAVING clause of outer SELECT, INSERT, UPDATE, or DELETE statements, or even within another subquery. SQL Server allows up to 32 levels of subquery nesting, although practical limits depend on memory availability and query complexity. Subqueries that return a single value can appear anywhere an expression is valid.

Types of Subqueries

Subqueries in SQL generally fall into three categories:

  1. List Operations: Introduced with IN or modified by comparison operators with ANY or ALL.
  2. Single Value Returns: Introduced with an unmodified comparison operator.
  3. Existence Tests: Introduced with EXISTS.
  4. Many other usages.

Subquery Usage Formats

Statements incorporating subqueries typically follow one of these formats:

  • WHERE expression [NOT] IN (subquery)
  • WHERE expression comparison_operator [ANY | LL] (subquery)
  • WHERE [NOT] EXISTS (subquery)

In some Transact-SQL scenarios, subqueries are evaluated as independent queries, with their results conceptually substituted into the outer query. However, this may not always represent the actual processing method used by SQL Server.

Understanding these fundamental aspects of subqueries enables more effective and efficient query construction, enhancing both the functionality and performance of SQL operations.

Subquery Examples Using AdventureWorksLT2022

Let’s explore each type of subquery with examples from the AdventureWorksLT2022 database.

Scalar Subquery

A scalar subquery filter based on a single value. It can be used in places where a single value is expected, such as in a SELECT list or a WHERE clause.

Example: Finding the average ListPrice of all products and then listing products that have a ListPrice greater than this average.

SELECT ProductID, Name, ListPrice
FROM SalesLT.Product
WHERE ListPrice > (SELECT AVG(ListPrice) FROM SalesLT.Product);

This query first calculates the average ListPrice using a subquery and then selects products whose ListPrice is greater than this average.

More Complex Scalar Subquery

We’ll find products that have a StandardCost greater than the average StandardCost, and include the average StandardCost as well as the difference between each product's StandardCost and the average StandardCost in the results.

SELECT 
ProductID,
Name,
StandardCost,
ListPrice,
AvgStandardCost = (SELECT AVG(StandardCost) FROM SalesLT.Product),
StandardCostDiff = StandardCost - (SELECT AVG(StandardCost) FROM SalesLT.Product)
FROM SalesLT.Product
WHERE StandardCost > (SELECT AVG(StandardCost) FROM SalesLT.Product)
ORDER BY StandardCost ASC;
  • AvgStandardCost: A scalar subquery calculates the average StandardCost from the SalesLT.Product table.
  • StandardCostDiff: The difference between the product’s StandardCost and the average StandardCost is calculated using a scalar subquery.
  • A scalar subquery is used to filter products with a StandardCost greater than the average StandardCost from the SalesLT.Product table.

This approach uses subqueries to both filter the products and calculate additional columns for each product in the result set, ensuring that the query is both functional and informative.

Working with Multiple Levels of Nesting in SQL Subqueries

In SQL Server, a subquery can be nested within another subquery, allowing for complex querying and data retrieval. This section will demonstrate how to implement and understand multiple levels of nested subqueries using the AdventureWorksLT2022 database.

SELECT TOP 5 *    
FROM SalesLT.Customer
WHERE CustomerID IN (
SELECT CustomerID
FROM SalesLT.SalesOrderHeader
WHERE SalesOrderID IN (
SELECT SalesOrderID
FROM SalesLT.SalesOrderHeader
WHERE TotalDue > 1000
)
)
CustomerID  NameStyle Title    FirstName                                          MiddleName                                         LastName                                           Suffix     CompanyName                                                                                                                      SalesPerson                                                                                                                                                                                                                                                      EmailAddress                                       Phone                     PasswordHash                                                                                                                     PasswordSalt rowguid                              ModifiedDate
----------- --------- -------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- ---------- -------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------- ------------------------- -------------------------------------------------------------------------------------------------------------------------------- ------------ ------------------------------------ -----------------------
29485 0 Ms. Catherine R. Abel NULL Professional Sales and Service adventure-works\linda3 catherine0@adventure-works.com 747-555-0171 zh3goJUbYsPv92k4bVZuJtlLHwuvpQtu6uNcjkKSdF8= rpyd5Tw= 392AE773-D7EC-48AC-B8D0-6E65B770285C 2009-05-16 16:33:33.077
29531 0 Mr. Cory K. Booth NULL Remarkable Bike Store adventure-works\linda3 cory0@adventure-works.com 121-555-0157 OfvFj11NFlpVFhEacTp7Nw+8EqwxHKIv9aJ+2ZgcUMI= XaJdIYM= 75D00AA7-752F-41C9-9C5D-899E8E3061BD 2005-12-01 00:00:00.000
29546 0 Mr. Christopher R. Beck Jr. Bulk Discount Store adventure-works\jae0 christopher1@adventure-works.com 1 (11) 500 555-0132 sKt9daCzEEKWAzivEGPOp8tmaM1R3I+aJfcBjzJRFLo= 8KfYx/4= 228A34A5-66EF-4A18-BEF2-BE74BEB84191 2006-09-01 00:00:00.000
29568 0 Mr. Donald L. Blanton NULL Coalition Bike Company adventure-works\shu0 donald0@adventure-works.com 357-555-0161 pKYDelLBOZMO98GBzhMxBSzzE0gUYKx9dXzYTYNuBgw= jKtOaOw= D8B39163-B4B4-428D-ACD7-1DF795D801DC 2006-09-01 00:00:00.000
29638 0 Ms. Rosmarie J. Carroll NULL Aerobic Exercise Company adventure-works\linda3 rosmarie0@adventure-works.com 244-555-0112 OKT0scizCdIzymHHOtyJKQiC/fCILSooSZ8dQ2Y34VM= ihWf50M= 3611A221-A20A-463A-AC19-265DB991CCC0 2007-09-01 00:00:00.000

Explanation

  1. Innermost Subquery:
  • This subquery retrieves SalesOrderID from the SalesOrderHeader table where TotalDue exceeds 1000.

2. Middle Subquery:

  • This subquery uses the result of the innermost subquery to get the CustomerID of customers who have made orders with TotalDue greater than 1000.

3. Outermost Query:

  • Finally, the outermost query fetches all details of customers whose CustomerID matches those found in the middle subquery.

Practical Considerations

  • Performance: Nested subqueries can impact performance, especially if the innermost subqueries return large datasets. Ensure that the subqueries are optimized and indexed appropriately.
  • Readability: While nested subqueries can be powerful, they can also reduce the readability of the query. Consider breaking complex queries into simpler parts or using Common Table Expressions (CTEs) for clarity (we’ll see CTE later in this series).

Correlated Subqueries in SQL

A correlated subquery (also known as a repeating subquery) is a subquery that uses values from the outer query. Unlike a regular subquery, which is executed once, a correlated subquery is executed repeatedly, once for each row selected by the outer query. This type of subquery can be particularly useful for row-by-row processing, allowing you to perform more complex filtering and calculations.

Example: Retrieving Employee Names with Specific Bonuses

This example retrieves the first and last names of employees who have a bonus of 5000, ensuring that the employee identification numbers match in the Employee and SalesPerson tables.

USE AdventureWorks2022;
GO

SELECT DISTINCT c.LastName, c.FirstName, e.BusinessEntityID
FROM Person.Person AS c
JOIN HumanResources.Employee AS e
ON e.BusinessEntityID = c.BusinessEntityID
WHERE 5000.00 IN (
SELECT sp.Bonus
FROM Sales.SalesPerson sp
WHERE e.BusinessEntityID = sp.BusinessEntityID
);
GO
LastName                                           FirstName                                          BusinessEntityID
-------------------------------------------------- -------------------------------------------------- ----------------
Ansman-Wolfe Pamela 280
Saraiva José 282

The outer query selects distinct last names (LastName), first names (FirstName), and business entity IDs (BusinessEntityID) from the Person.Person table (c) joined with the HumanResources.Employee table (e).

The WHERE clause uses a correlated subquery to filter results. The subquery checks if the Bonus in the Sales.SalesPerson table (sp) is 5000 for the matching BusinessEntityID from the outer query.

Practical Example with AdventureWorksLT2022

Let’s adapt this to the AdventureWorksLT2022 database to find customers who have placed an order with a total due greater than 1000.

USE AdventureWorksLT2022;
GO

SELECT DISTINCT c.LastName, c.FirstName, c.CustomerID
FROM SalesLT.Customer AS c
WHERE EXISTS (
SELECT 1
FROM SalesLT.SalesOrderHeader AS soh
WHERE c.CustomerID = soh.CustomerID
AND soh.TotalDue > 1000
);
GO

WHERE EXISTS (SELECT 1 FROM SalesLT.SalesOrderHeader AS soh WHERE c.CustomerID = soh.CustomerID AND soh.TotalDue > 1000): Checks if there are any orders in the SalesOrderHeader table (soh) where the TotalDue is greater than 1000 for the matching CustomerID.

Understanding Subqueries with IN in SQL Server

Subqueries introduced with IN (or NOT IN) are a powerful feature in SQL Server, allowing you to filter results based on a list of values returned by a subquery. This technique is particularly useful for filtering data based on related tables without requiring a join operation. In this guide, we will explore how to use subqueries with IN using the AdventureWorks databases.

A subquery with IN works as follows:

  1. The subquery executes first, returning a list of values.
  2. The outer query then uses this list to filter its results.

The general syntax for a subquery with IN is:

SELECT column1, column2, ...
FROM table_name
WHERE column_name IN (SELECT column_name FROM another_table WHERE condition);

In this structure:

  • The outer query selects columns from table1.
  • The subquery inside the parentheses retrieves a list of values from table2 that meet the specified condition.
  • The IN operator checks if column1 in table1 matches any of the values returned by the subquery.

“IN” Example Using AdventureWorksDW2022

Now, let’s consider scenario using the AdventureWorksDW2022 database. Suppose we want to find internet sales orders for customers located in a specific region.

USE AdventureWorksDW2022;

-- Subquery to get CustomerKeys for a specific state/province (e.g., 'New South Wales')
SELECT top 5 SalesOrderNumber, OrderDate, SalesAmount
FROM FactInternetSales
WHERE CustomerKey IN (
SELECT CustomerKey
FROM DimCustomer
WHERE GeographyKey IN (
SELECT GeographyKey
FROM DimGeography
WHERE StateProvinceName = 'New South Wales'
)
);
SalesOrderNumber     OrderDate               SalesAmount
-------------------- ----------------------- ---------------------
SO43701 2010-12-29 00:00:00.000 3399.99
SO43710 2010-12-31 00:00:00.000 3578.27
SO43716 2011-01-02 00:00:00.000 3578.27
SO43717 2011-01-02 00:00:00.000 699.0982
SO43731 2011-01-06 00:00:00.000 3578.27

Explanation

The innermost subquery retrieves the GeographyKey for the region named 'New South Wales'. The middle subquery uses the GeographyKey to find all CustomerKeys in the 'Northwest' region. The outer query filters the FactInternetSales table to include only those sales whose CustomerKey matches the list returned by the subquery.

“NOT IN” Example Using AdventureWorks2022

To find products that have never been sold, you can use the NOT IN operator:

USE AdventureWorks2022;

SELECT top 5
p.ProductID,
p.Name,
p.ProductNumber,
p.Color
FROM
Production.Product p
WHERE
p.ProductID NOT IN (
SELECT DISTINCT sod.ProductID
FROM Sales.SalesOrderDetail sod
);
ProductID   Name                                               ProductNumber             Color
----------- -------------------------------------------------- ------------------------- ---------------
1 Adjustable Race AR-5381 NULL
2 Bearing Ball BA-8327 NULL
3 BB Ball Bearing BE-2349 NULL
4 Headset Ball Bearings BE-2908 NULL
316 Blade BL-2036 NULL

This query will return products whose ProductID is not found in any sales records.

By understanding and utilizing subqueries with IN and NOT IN, you can create more flexible and powerful SQL queries for data analysis and reporting. Remember to always consider the performance implications and test your queries to ensure they run efficiently.

Using Comparison Operators with ANY, SOME, and ALL

In SQL Server, comparison operators can be enhanced with the keywords ALL, ANY, and SOME to extend their functionality in subqueries. These keywords allow for more nuanced comparisons involving sets of values.

Understanding ANY, SOME, and ALL

  • ANY: The condition is true if it is true for any value returned by the subquery. It’s equivalent to SOME.
  • SOME: The condition is true if it is true for some (at least one) of the values returned by the subquery. It’s an ISO standard equivalent to ANY.
  • ALL: The condition is true if it is true for all values returned by the subquery.

Using the > comparison operator, > ALL means greater than every value. In other words, it means greater than the maximum value.

For example, > ALL (1, 2, 3) means greater than 3. > ANY means greater than at least one value, that is, greater than the minimum. So > ANY (1, 2, 3) means greater than 1.

For a row in a subquery with > ALL to satisfy the condition specified in the outer query, the value in the column introducing the subquery must be greater than each value in the list of values returned by the subquery.

Differently, > ANY means that for a row to satisfy the condition specified in the outer query, the value in the column that introduces the subquery must be greater than at least one of the values in the list of values returned by the subquery.

Syntax and Examples

1. Using ANY

The ANY operator checks if any of the values in the subquery satisfy the condition.

Example: Find Products Priced Lower Than Any Product in a Specific Category

Suppose we want to find all products in the Production.Product table that have a list price lower than any product in the 'Bikes' category.

SELECT top 10 Name, ListPrice
FROM Production.Product
WHERE ListPrice < ANY (
SELECT ListPrice
FROM Production.Product
JOIN Production.ProductSubcategory ON Production.Product.ProductSubcategoryID = Production.ProductSubcategory.ProductSubcategoryID
JOIN Production.ProductCategory ON Production.ProductSubcategory.ProductCategoryID = Production.ProductCategory.ProductCategoryID
WHERE Production.ProductCategory.Name = 'Bikes'

)
and ListPrice > 0;
Name                                               ListPrice
-------------------------------------------------- ---------------------
LL Mountain Seat Assembly 133.34
ML Mountain Seat Assembly 147.14
HL Mountain Seat Assembly 196.92
LL Road Seat Assembly 133.34
ML Road Seat Assembly 147.14
HL Road Seat Assembly 196.92
LL Touring Seat Assembly 133.34
ML Touring Seat Assembly 147.14
HL Touring Seat Assembly 196.92
HL Road Frame - Black, 58 1431.50

2. Using ALL

The ALL operator checks if all the values in the subquery satisfy the condition.

Example: Find Products Priced Higher Than All Products in a Specific Category

Suppose we want to find all products in the Production.Product table that have a list price higher than all products in the 'Accessories' category.

SELECT top 10 Name, ListPrice
FROM Production.Product
WHERE ListPrice > ALL (
SELECT ListPrice
FROM Production.Product
JOIN Production.ProductSubcategory ON Production.Product.ProductSubcategoryID = Production.ProductSubcategory.ProductSubcategoryID
JOIN Production.ProductCategory ON Production.ProductSubcategory.ProductCategoryID = Production.ProductCategory.ProductCategoryID
WHERE Production.ProductCategory.Name = 'Accessories'
);
Name                                               ListPrice
-------------------------------------------------- ---------------------
HL Mountain Seat Assembly 196.92
HL Road Seat Assembly 196.92
HL Touring Seat Assembly 196.92
HL Road Frame - Black, 58 1431.50
HL Road Frame - Red, 58 1431.50
HL Road Frame - Red, 62 1431.50
HL Road Frame - Red, 44 1431.50
HL Road Frame - Red, 48 1431.50
HL Road Frame - Red, 52 1431.50
HL Road Frame - Red, 56 1431.50

3. Using SOME

The SOME operator functions identically to ANY and checks if any of the values in the subquery satisfy the condition.

Example: Find Products Priced Lower Than Any Product in a Specific Category

As you can see the results are identical to the Any example.

SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = SOME (
SELECT ListPrice
FROM Production.Product
JOIN Production.ProductSubcategory ON Production.Product.ProductSubcategoryID = Production.ProductSubcategory.ProductSubcategoryID
JOIN Production.ProductCategory ON Production.ProductSubcategory.ProductCategoryID = Production.ProductCategory.ProductCategoryID
WHERE Production.ProductCategory.Name = 'Clothing'
);
Name                                               ListPrice
-------------------------------------------------- ---------------------
LL Mountain Seat Assembly 133.34
ML Mountain Seat Assembly 147.14
HL Mountain Seat Assembly 196.92
LL Road Seat Assembly 133.34
ML Road Seat Assembly 147.14
HL Road Seat Assembly 196.92
LL Touring Seat Assembly 133.34
ML Touring Seat Assembly 147.14
HL Touring Seat Assembly 196.92
HL Road Frame - Black, 58 1431.50

Practical Considerations

  1. Performance: Using ANY, SOME, and ALL with subqueries can impact performance, especially with large datasets. Ensure subqueries are optimized with indexes.
  2. Data Validation: Validate the data in subqueries to ensure expected results. Misinterpreted results can occur if subqueries return unexpected values.
  3. Use Cases: These operators are particularly useful for comparing values across different categories or groups within the same table or across multiple tables.

Using ANY, SOME, and ALL with comparison operators in subqueries allows for more flexible and powerful SQL queries. These keywords enable you to perform complex comparisons and derive meaningful insights from your data.

Using Subqueries in UPDATE, DELETE, and INSERT Statements

Subqueries are often nested in UPDATE, DELETE, INSERT, and SELECT data manipulation (DML) statements. While this technique is versatile and powerful, it is particularly common in Online Transaction Processing (OLTP) applications, where transactional integrity and complex data relationships need to be managed efficiently.

Example: Doubling the ListPrice for Products Supplied by a Specific Vendor

In this example, we will use a subquery in an UPDATE statement to double the ListPrice for products supplied by a specific vendor (BusinessEntityID = 1540). The subquery in the WHERE clause ensures that only the relevant rows in the Production.Product table are updated.

-- Double the ListPrice for products supplied by BusinessEntityID 1540
UPDATE Production.Product
SET ListPrice = ListPrice * 2
WHERE ProductID IN (
SELECT pv.ProductID
FROM Purchasing.ProductVendor pv
WHERE pv.BusinessEntityID = 1540
);

The subquery returns a list of ProductIDs for products supplied by the vendor with BusinessEntityID = 1540. The main query then updates the ListPrice for only those products.

Practical Tips for Using Subqueries

  1. Performance: Subqueries can sometimes impact performance. Use them judiciously, especially in the WHERE clause. Consider using joins or other techniques if subqueries slow down your queries.
  2. Readability: Subqueries can make SQL queries more readable by breaking down complex logic into simpler parts. However, nested subqueries can become difficult to read. Always strive for a balance between simplicity and readability.
  3. Testing: Test subqueries separately before integrating them into the main query. This helps in debugging and ensures each part of the query works as expected.

Subqueries are a versatile feature in SQL Server that can help simplify complex queries and enhance data manipulation capabilities. By understanding and utilizing the different types of subqueries, you can write more efficient and readable SQL code. Practice using subqueries in your SQL projects to become proficient in leveraging their power.

Bonus: Subquery Rules

Restrictions on Subqueries

When working with subqueries in SQL, it’s essential to adhere to several restrictions to ensure proper functionality and avoid errors. Here are the key rules governing subqueries:

  1. Single Expression or Column:
  • For subqueries introduced with a comparison operator, the select list can only include one expression or column name.
  • Exceptions are made for subqueries used with EXISTS and IN, where SELECT * or a list is permissible.

2. Join Compatibility:

  • When the WHERE clause of an outer query includes a column name, it must be compatible with the column in the subquery’s select list for a valid join.

3. Unsupported Data Types:

  • The ntext, text, and image data types cannot be used in the select list of subqueries.

4. Single Value Return Requirement:

  • Subqueries introduced by an unmodified comparison operator (not followed by ANY or ALL) must return a single value.
  • These subqueries cannot include GROUP BY and HAVING clauses.

5. DISTINCT Keyword Usage:

  • The DISTINCT keyword cannot be used with subqueries that include GROUP BY.

6. COMPUTE and INTO Clauses:

  • The COMPUTE and INTO clauses are not allowed in subqueries.

7. ORDER BY Clause:

  • The ORDER BY clause can only be specified in a subquery when the TOP clause is also present.

8. View Limitations:

  • A view created using a subquery cannot be updated.

9. EXISTS Subqueries:

  • By convention, subqueries introduced with EXISTS use an asterisk (*) in the select list.
  • The rules for an EXISTS subquery are the same as those for a standard select list because an EXISTS subquery performs an existence test, returning TRUE or FALSE instead of data.

--

--

Gianpiero Andrenacci
Data Bistrot

AI & Data Science Solution Manager. Avid reader. Passionate about ML, philosophy, and writing. Ex-BJJ master competitor, national & international titleholder.