CTEs in SQL Server

Advanced SQL for data Analysis — Learning Day 2

Gianpiero Andrenacci
Data Bistrot
Published in
18 min readSep 5, 2024

--

Advanced SQL for data Analysis

A Common Table Expression (CTE) is a temporary result set defined within the execution scope of a single SQL statement, usually a SELECT, INSERT, UPDATE, or DELETE statement. Unlike traditional subqueries, CTEs are easier to read and maintain, making them a preferred choice for complex queries. CTEs are particularly useful for improving the readability and manageability of SQL queries, especially in data analysis and reporting.

How a CTE Works

A CTE is defined using the WITH keyword followed by a query that generates a temporary result set. This result set can be referenced by the main query that follows, treating it as if it were a regular table. However,

the CTE itself is not stored in the database and is available only during the execution of the query in which it is defined.

Advantages of Using CTEs

Improves Code Readability

CTEs make complex SQL queries easier to read and understand. By breaking down complex queries into smaller, manageable parts, each represented by a CTE, the overall structure becomes clearer. This modular approach helps SQL developers and analysts to focus on one logical part of the query at a time, reducing cognitive overload.

Simplifies Recursive Queries

Recursive queries, which are used to perform hierarchical or iterative operations, are much simpler and more readable with CTEs. A recursive CTE is one that references itself within its definition, making it ideal for navigating hierarchical data such as organizational charts or directory structures. Unlike traditional methods using loops or cursors, recursive CTEs provide a straightforward way to perform these operations.

Allows Repeated Reference of Results

A significant advantage of using CTEs is the ability to reference the same result set multiple times within a single query. This eliminates the need to repeat the same complex subquery multiple times, which not only enhances readability but also can improve performance by avoiding redundant calculations.

Facilitates Multi-Level Aggregations

CTEs are particularly useful in scenarios where multi-level aggregations are required. For example, if you need to perform several layers of aggregation, such as calculating total sales by product and then further aggregating those totals by region, CTEs allow you to stage each aggregation level in a logical order. This makes the query easier to build, test, and modify.

Supports Ranking Functions

CTEs help simplify the use of ranking functions like ROW_NUMBER(), RANK(), DENSE_RANK(). When using these functions, you often need to partition data into subsets and perform ranking operations over these partitions. CTEs provide a clean way to define these subsets and apply ranking functions, resulting in more straightforward and maintainable code.

Enhances Data Modification Queries

CTEs can be particularly useful in cases where you need to select rows or columns based on specific criteria and then perform actions like updating or deleting data. For instance, when deleting duplicates or updating a table based on certain conditions, CTEs provide a clear and concise way to define the subset of data to be modified.

Ideal for Intermediate Result Processing

In general, CTEs are beneficial for any case where you need to select a few rows or columns based on some criteria and then perform further operations with these. This flexibility makes CTEs a powerful tool for intermediate result processing, enabling you to stage data transformations or calculations before using them in the final query output.

How to Use Common Table Expressions (CTEs) in SQL Server

To illustrate how to use Common Table Expressions (CTEs), let’s look at a simple example. A CTE is defined using the WITH keyword, followed by a temporary name for the CTE and a query that generates the result set. The CTE is then referenced like a regular table or subquery in the main query.

Syntax for Creating a CTE

Here’s the basic syntax for defining a CTE:

WITH CTE_Name (Column1, Column2, ...)
AS
(
-- Your CTE query
SELECT ...
FROM ...
WHERE ...
)
-- Main query using the CTE
SELECT *
FROM CTE_Name;

When using the "USE Database" syntax in SQL, you need to prefix a Common Table Expression (CTE) with a semicolon (;) because of how SQL interprets statements.

A CTE must be the first statement in a query batch, and it begins with the WITH keyword. If the preceding statement ("USE Database") isn't explicitly terminated with a semicolon, SQL won't recognize the start of the CTE properly. The semicolon before the CTE ensures that SQL knows where the previous statement ends and that the CTE is a new, separate statement.

USE DB01;
WITH CTE_Name (Column1, Column2, ...)
AS
(
-- Your CTE query
SELECT ...
FROM ...
WHERE ...
)
-- Main query using the CTE
SELECT *
FROM CTE_Name;

CTE Common Usage Examples

All examples and exercises in this series will utilize the following AdventureWorks databases:

  • AdventureWorks2022: An OLTP (Online Transaction Processing) database designed to handle transactional data.
  • AdventureWorksDW2022: A data warehouse version of the AdventureWorks database, optimized for analytical queries and reporting.
  • AdventureWorksLT2022: A lighter version of the AdventureWorks database, suitable for less complex queries and smaller datasets.

Refer to the first article of the series to install and configure the databases:

Example: Selecting Top Performing Employees

Suppose we want to find employees from the AdventureWorksDW2022database who have made sales above a certain threshold and then perform some analysis on them. Let's define a simple CTE to select employees with total sales greater than $100,000.

USE AdventureWorksDW2022 
;WITH EmployeeSales AS (
SELECT
e.EmployeeKey,
e.FirstName,
e.LastName,
SUM(frs.SalesAmount) AS TotalSales
FROM
dbo.DimEmployee e
INNER JOIN
dbo.FactResellerSales frs ON e.EmployeeKey = frs.EmployeeKey
GROUP BY
e.EmployeeKey, e.FirstName, e.LastName
HAVING
SUM(frs.SalesAmount) > 100000 -- Filtering employees with sales over $100,000
)
SELECT
EmployeeKey,
FirstName,
LastName,
TotalSales
FROM
EmployeeSales
ORDER BY
TotalSales DESC; -- Order by total sales in descending order
EmployeeKey FirstName                                          LastName                                           TotalSales
----------- -------------------------------------------------- -------------------------------------------------- ---------------------
282 Linda Mitchell 10367007.4286
283 Jillian Carson 10065803.5429
281 Michael Blythe 9293903.0055
291 Jae Pak 8503338.6472
285 Tsvi Reiter 7171012.7514
287 Shu Ito 6427005.5556
288 José Saraiva 5926418.3574
292 Ranjit Varkey Chudukatil 4509888.933
289 David Campbell 3729945.3501
284 Garrett Vargas 3609447.2163
286 Pamela Ansman-Wolfe 3325102.5952
293 Tete Mensa-Annan 2312545.6905
295 Rachel Valdez 1790640.2311
296 Lynn Tsoflias 1421810.9252
272 Stephen Jiang 1092123.8562
290 Amy Alberts 732078.4446
294 Syed Abbas 172524.4515

Explanation of the Example

Define the CTE: The WITH clause defines a CTE named HighPerformingEmployees. This CTE selects employees who have made total sales above $100,000.

Query Inside the CTE: The inner query computes the total sales (SUM(frs.SalesAmount)) for each employee, grouped by their EmployKey, FirstName, and LastName. The HAVING clause filters employees whose total sales exceed $100,000.

Main Query: The main query simply selects all records from the HighPerformingEmployees CTE, effectively returning a list of top-performing employees.

The example demonstrates several key benefits of using a Common Table Expression (CTE).

First, it enhances readability by providing a clear and descriptive name, like “HighPerformingEmployees,” which immediately indicates the CTE’s purpose. This reduces the effort needed to understand the query’s intent.

Second, it promotes reusability; the CTE can be easily referenced multiple times in subsequent queries, ensuring consistency and reducing redundancy across the code.

Lastly, it simplifies complex operations by handling aggregation (SUM(frs.SalesAmount)) and filtering (HAVING clause) in a single, coherent step. This makes the query more straightforward and easier to maintain.

Overall, the use of a CTE in this example makes the query clearer, more efficient, and easier to work with.

Simplifying Recursive Queries with Common Table Expressions (CTEs)

Recursive queries are used to perform operations over hierarchical or self-referential data, such as organizational charts, folder structures, or graph data. With Recursive CTEs, you can express these operations more clearly and succinctly compared to using procedural code like loops or cursors. A recursive CTE is a CTE that references itself within its own definition, allowing it to process data iteratively.

Understanding Recursive CTEs

Recursive CTEs are defined in two parts:

  1. Anchor Member: The initial query that provides the starting point of the recursion.
  2. Recursive Member: The part that repeatedly executes and references the CTE itself, processing data iteratively until a termination condition is met.

Example: Traversing an Employee Hierarchy

Let’s consider an example where we need to navigate an organizational hierarchy to find all subordinates of a given employee.

In the AdventureWorksDW2022 database, the DimEmployee table represents employees, and it includes a hierarchical structure with fields like EmployeeKey (a unique identifier for each employee) and ParentEmployeeKey (which references the EmployeeKey of the employee's manager).

Here’s how we can use a Recursive CTE to retrieve all employees under a specific manager.

WITH EmployeeHierarchy AS (
-- Anchor Member: Select the top-level employees (those without a manager)
SELECT
EmployeeKey,
ParentEmployeeKey,
FirstName,
LastName,
0 AS HierarchyLevel -- Starting level
FROM
dbo.DimEmployee
WHERE
ParentEmployeeKey IS NULL -- Top-level employees

UNION ALL

-- Recursive Member: Select employees and their managers
SELECT
e.EmployeeKey,
e.ParentEmployeeKey,
e.FirstName,
e.LastName,
eh.HierarchyLevel + 1 AS HierarchyLevel
FROM
dbo.DimEmployee e
INNER JOIN
EmployeeHierarchy eh ON e.ParentEmployeeKey = eh.EmployeeKey
)
-- Final SELECT: Display the full hierarchy
SELECT
EmployeeKey,
ParentEmployeeKey,
FirstName,
LastName,
HierarchyLevel
FROM
EmployeeHierarchy
ORDER BY
HierarchyLevel, LastName;
EmployeeKey ParentEmployeeKey FirstName                                          LastName                                           HierarchyLevel
----------- ----------------- -------------------------------------------------- -------------------------------------------------- --------------
112 NULL Ken Sánchez 0
8 112 David Bradley 1
7 112 David Bradley 1
14 112 Terri Duffy 1
152 112 James Hamilton 1
23 112 Peter Krebs 1
144 112 Laura Norman 1
143 112 Laura Norman 1
44 112 Jean Trenary 1
277 112 Brian Welcker 1
294 277 Syed Abbas 2
204 23 Hazem Abolrous 2
87 23 Pilar Ackerman 2
120 44 François Ajenstat 2
290 277 Amy Alberts 2
222 152 Gary Altman 2
105 44 Dan Bacon 2
106 143 David Barber 2
32 143 Paula Barreto de Mattos 2
273 7 Wanida Benshoof 2

Explanation of the Query

Anchor Member:

  • Retrieves the top-level employees from DimEmployee where ParentEmployeeKey is NULL.
  • These employees are assigned a HierarchyLevel of 0.

Recursive Member:

  • Joins DimEmployee with the EmployeeHierarchy CTE itself.
  • This step selects employees whose ParentEmployeeKey matches the EmployeeKey of the employees already selected by the CTE, incrementing the HierarchyLevel by 1 with each iteration.
  • This recursion continues until no more employees match the condition.

Final Output:

  • Displays all employees in a hierarchical order, starting with top-level employees (HierarchyLevel = 0) and continuing to lower levels.

Benefits of Using Recursive CTEs

Recursive CTEs provide a clearer and more concise way to express recursive queries compared to procedural code like loops or cursors. This improves readability and maintainability by aligning with SQL’s declarative nature.

Recursive CTEs can leverage SQL Server’s query optimization features, such as execution plan reuse and parallel processing, which may lead to better performance compared to traditional methods.

See also: Query processing architecture guide

Overall, recursive CTEs make complex queries easier to write and understand while potentially delivering performance improvements.

Example: Repeated Reference of Results with CTE

Let’s say we want to calculate the total sales and the average sales amount for each salesperson from the AdventureWorksDW2022 database. We'll use the FactResellerSales table, which contains sales data, and the DimEmployee table, which contains employee details.

We will first calculate the total sales amount per salesperson and then use this result set to compute the average sales across all salespeople. The CTE will allow us to reference the total sales result set twice — once for displaying individual totals and once for calculating the overall average.

SQL Query Using CTE for Repeated Reference

WITH SalesPerEmployee AS (
SELECT
e.EmployeeKey,
e.FirstName,
e.LastName,
SUM(frs.SalesAmount) AS TotalSales
FROM
dbo.DimEmployee e
INNER JOIN
dbo.FactResellerSales frs ON e.EmployeeKey = frs.EmployeeKey
GROUP BY
e.EmployeeKey, e.FirstName, e.LastName
)
SELECT TOP 5
EmployeeKey,
FirstName,
LastName,
TotalSales,
(SELECT AVG(TotalSales) FROM SalesPerEmployee) AS AverageSales -- Repeated reference
FROM
SalesPerEmployee
ORDER BY
TotalSales DESC;
EmployeeKey FirstName                                          LastName                                           TotalSales            AverageSales
----------- -------------------------------------------------- -------------------------------------------------- --------------------- ---------------------
282 Linda Mitchell 10367007.4286 4732388.0577
283 Jillian Carson 10065803.5429 4732388.0577
281 Michael Blythe 9293903.0055 4732388.0577
291 Jae Pak 8503338.6472 4732388.0577
285 Tsvi Reiter 7171012.7514 4732388.0577

Explanation of the Query

CTE (SalesPerEmployee):

  • Computes the total sales (SUM(frs.SalesAmount)) for each employee from the FactResellerSales table.
  • Joins with the DimEmployee table to get employee names and other details.

Main Query:

  • Selects from the SalesPerEmployee CTE to display each employee's TotalSales.
  • Uses a subquery (SELECT AVG(TotalSales) FROM SalesPerEmployee) to calculate the average sales amount across all employees by referencing the same CTE result set.

By using a CTE, we efficiently reuse the result set for different purposes within the same query, making the SQL script cleaner and potentially improving performance. This example demonstrates how to leverage CTEs to avoid repeating the same complex logic multiple times.

Example: Multi-Level Aggregation of Sales by Product and by Region

We will use the Common Table Expressions (CTEs) to first compute the total sales by product and then the total sales by region. Finally, we’ll select both results in one comprehensive query.

SQL Query for Multi-Level Aggregation

WITH ProductSales AS (
-- Level 1: Calculate total sales by product
SELECT
p.ProductKey,
p.EnglishProductName,
SUM(frs.SalesAmount) AS TotalSalesByProduct
FROM
dbo.FactResellerSales frs
INNER JOIN
dbo.DimProduct p ON frs.ProductKey = p.ProductKey
GROUP BY
p.ProductKey, p.EnglishProductName
),
RegionSales AS (
-- Level 2: Aggregate the product sales totals by region
SELECT
st.SalesTerritoryKey,
st.SalesTerritoryRegion,
p.ProductKey,
SUM(frs.SalesAmount) AS TotalSalesByRegionAndProduct
FROM
dbo.FactResellerSales frs
INNER JOIN
dbo.DimProduct p ON frs.ProductKey = p.ProductKey
INNER JOIN
dbo.DimSalesTerritory st ON frs.SalesTerritoryKey = st.SalesTerritoryKey
GROUP BY
st.SalesTerritoryKey, st.SalesTerritoryRegion, p.ProductKey
)
-- Final Output: Display total sales by product and by region
SELECT top 10
ps.ProductKey,
ps.EnglishProductName,
ps.TotalSalesByProduct,
rs.SalesTerritoryKey,
rs.SalesTerritoryRegion,
rs.TotalSalesByRegionAndProduct
FROM
ProductSales ps
LEFT JOIN
RegionSales rs ON ps.ProductKey = rs.ProductKey
ORDER BY
ps.TotalSalesByProduct DESC, rs.TotalSalesByRegionAndProduct DESC;
ProductKey  EnglishProductName                                 TotalSalesByProduct   SalesTerritoryKey SalesTerritoryRegion                               TotalSalesByRegionAndProduct
----------- -------------------------------------------------- --------------------- ----------------- -------------------------------------------------- ----------------------------
359 Mountain-200 Black, 38 1634647.9374 6 Canada 347581.7435
359 Mountain-200 Black, 38 1634647.9374 1 Northwest 344405.2478
359 Mountain-200 Black, 38 1634647.9374 4 Southwest 264382.848
359 Mountain-200 Black, 38 1634647.9374 7 France 168480.7239
359 Mountain-200 Black, 38 1634647.9374 5 Southeast 146540.6195
359 Mountain-200 Black, 38 1634647.9374 3 Central 123929.46
359 Mountain-200 Black, 38 1634647.9374 10 United Kingdom 118151.8227
359 Mountain-200 Black, 38 1634647.9374 2 Northeast 53702.766
359 Mountain-200 Black, 38 1634647.9374 9 Australia 34424.85
359 Mountain-200 Black, 38 1634647.9374 8 Germany 33047.856

Explanation of the Query

CTE: ProductSales:

  • Computes the total sales for each product by summing up SalesAmount from the FactResellerSales table.
  • Joins with DimProduct to get product details.

CTE: RegionSales:

  • Computes total sales by both region and product.
  • Joins FactResellerSales with DimProduct and DimSalesTerritory to get product and region details.
  • Groups by both SalesTerritoryKey, SalesTerritoryRegion, and ProductKey.

Final SELECT Statement:

  • Selects from both CTEs to provide total sales by product and by region-product combination.
  • Uses a LEFT JOIN to combine the ProductSales and RegionSales based on ProductKey.

Benefits of This Approach

This query showcases how to use CTEs for multi-level aggregations to compute sales totals both by product and by region-product combination, providing a complete and detailed view of sales performance across multiple dimensions.

  • Comprehensive Output: Provides a detailed view of sales both by product and by region in one query.
  • Clear and Maintainable: Separates the logic for each aggregation level using CTEs, making the query easier to read and modify.
  • Optimized Execution: The use of CTEs helps SQL Server optimize the query by avoiding redundant calculations.

Example: Ranking Sales by Employee Using CTEs

Common Table Expressions (CTEs) can be used to simplify the application of ranking functions such as ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE(). These functions are useful when you need to assign ranks to rows based on specific criteria, often within subsets of data (partitions). Using a CTE to define these subsets makes the query easier to write, read, and maintain.

Let’s use the AdventureWorksDW2022 database to rank employees based on their total sales amount. We will use the FactResellerSales table, which contains sales data, and the DimEmployee table, which contains employee details.

Scenario

We want to assign a rank to each employee based on their total sales amount within each sales territory. The ranking should:

  • Use the ROW_NUMBER() function to assign a unique rank to each employee within their sales territory.
  • Use the RANK() function to rank employees while allowing for gaps in the ranking sequence when there are ties.
  • Use the DENSE_RANK() function to rank employees without gaps in the ranking sequence in case of ties.

SQL Query Using CTE for Ranking

WITH EmployeeSales AS (
-- Calculate total sales per employee per sales territory
SELECT
e.EmployeeKey,
e.FirstName,
e.LastName,
st.SalesTerritoryKey,
st.SalesTerritoryRegion,
SUM(frs.SalesAmount) AS TotalSales
FROM
dbo.DimEmployee e
INNER JOIN
dbo.FactResellerSales frs ON e.EmployeeKey = frs.EmployeeKey
INNER JOIN
dbo.DimSalesTerritory st ON frs.SalesTerritoryKey = st.SalesTerritoryKey
GROUP BY
e.EmployeeKey, e.FirstName, e.LastName, st.SalesTerritoryKey, st.SalesTerritoryRegion
)
-- Apply ranking functions using the CTE
SELECT top 10
EmployeeKey,
FirstName,
LastName,
SalesTerritoryKey,
SalesTerritoryRegion,
TotalSales,
ROW_NUMBER() OVER (PARTITION BY SalesTerritoryKey ORDER BY TotalSales DESC) AS RowNumRank,
RANK() OVER (PARTITION BY SalesTerritoryKey ORDER BY TotalSales DESC) AS Rank,
DENSE_RANK() OVER (PARTITION BY SalesTerritoryKey ORDER BY TotalSales DESC) AS DenseRank
FROM
EmployeeSales
ORDER BY
SalesTerritoryKey, RowNumRank;
EmployeeKey FirstName                                          LastName                                           SalesTerritoryKey SalesTerritoryRegion                               TotalSales            RowNumRank           Rank                 DenseRank
----------- -------------------------------------------------- -------------------------------------------------- ----------------- -------------------------------------------------- --------------------- -------------------- -------------------- --------------------
289 David Campbell 1 Northwest 3729945.3501 1 1 1
286 Pamela Ansman-Wolfe 1 Northwest 3325102.5952 2 2 2
293 Tete Mensa-Annan 1 Northwest 2312545.6905 3 3 3
282 Linda Mitchell 1 Northwest 2084593.1621 4 4 4
287 Shu Ito 1 Northwest 749822.897 5 5 5
272 Stephen Jiang 1 Northwest 233066.3052 6 6 6
283 Jillian Carson 2 Northeast 3920394.7721 1 1 1
281 Michael Blythe 2 Northeast 2929231.1092 2 2 2
272 Stephen Jiang 2 Northeast 83216.1335 3 3 3
283 Jillian Carson 3 Central 2939417.4488 1 1 1

(10 rows affected)

Explanation of the Query

CTE: EmployeeSales:

  • Calculates the total sales per employee for each sales territory.
  • Joins DimEmployee and FactResellerSales to get the sales data per employee.
  • Joins DimSalesTerritory to associate sales with the corresponding territory.
  • Groups by employee details and sales territory to compute the total sales.

Ranking Functions:

  • ROW_NUMBER(): Assigns a unique rank to each employee within each SalesTerritoryKey based on their TotalSales in descending order.
  • RANK(): Assigns ranks to employees within each SalesTerritoryKey, allowing gaps in the ranking sequence in case of ties.
  • DENSE_RANK(): Similar to RANK(), but does not leave gaps in the ranking sequence when there are ties.

FunctionPurposeBehavior with TiesRANK()Assigns ranks, skips ranks after tiesTied rows receive the same rank; gaps occurDENSE_RANK()Assigns ranks without skipping ranks after tiesTied rows receive the same rank; no gapsROW_NUMBER()Assigns unique, consecutive numbers to each rowNo ties considered; every row gets a unique number

Final SELECT Statement:

  • Selects from the EmployeeSales CTE and applies the ranking functions to generate the desired ranks.

Common Table Expressions (CTEs) can simplify data modification tasks, such as deleting duplicates or updating rows based on specific conditions. Using CTEs allows you to define the subset of data that needs to be modified, making the query more readable and maintainable.

Example: Deleting Duplicate Records Using CTEs

Suppose you have a table dbo.Customers in the AdventureWorks database that contains duplicate records, and you want to delete all but one of each duplicate based on a combination of FirstName, LastName, and EmailAddress.

SQL Query Using CTE to Delete Duplicates

Here’s how you can use a CTE to identify and delete duplicate records:

WITH DuplicateRecords AS (
SELECT
FirstName,
LastName,
EmailAddress,
ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, EmailAddress ORDER BY CustomerID) AS RowNum
FROM
dbo.Customers
)
-- Delete the duplicate records, keeping only the first occurrence
DELETE FROM DuplicateRecords
WHERE RowNum > 1;

Explanation of the Query

CTE: DuplicateRecords:

  • The CTE calculates a row number (RowNum) for each row within each group of duplicates (same FirstName, LastName, and EmailAddress).
  • The ROW_NUMBER() function is used with the PARTITION BY clause to group the rows and assign a unique number to each row within these groups, ordering by CustomerID to keep the first occurrence.

DELETE Statement:

  • The DELETE operation removes rows from the DuplicateRecords CTE where RowNum > 1, meaning it deletes all but the first occurrence of each duplicate record.

Benefits of Using CTEs for Data Modification

Using CTEs for data modification queries, like deleting duplicates or updating rows, enhances the clarity and maintainability of your SQL scripts, making it easier to perform targeted updates or deletions.

  • Improves Readability: Clearly separates the logic to identify the rows to modify.
  • Eases Maintenance: Simplifies complex modifications by breaking them into manageable parts.
  • Provides Flexibility: Allows for easy adjustments to the criteria for selecting rows.

Example: Processing Intermediate Results Using CTEs

Suppose we want to analyze sales data to find the top-selling products in each sales territory, along with their average sales amount, using the AdventureWorksDW2022 database. We'll perform multiple steps of data processing:

  1. Step 1: Calculate the total and average sales for each product.
  2. Step 2: Determine the top-selling product for each sales territory.

We’ll use the FactResellerSales table for sales data and the DimProduct and DimSalesTerritory tables for product and territory details, respectively.

SQL Query for Intermediate Result Processing

WITH ProductSales AS (
-- Step 1: Calculate total and average sales by product
SELECT
p.ProductKey,
p.EnglishProductName,
st.SalesTerritoryKey,
st.SalesTerritoryRegion,
SUM(frs.SalesAmount) AS TotalSales,
AVG(frs.SalesAmount) AS AverageSales
FROM
dbo.FactResellerSales frs
INNER JOIN
dbo.DimProduct p ON frs.ProductKey = p.ProductKey
INNER JOIN
dbo.DimSalesTerritory st ON frs.SalesTerritoryKey = st.SalesTerritoryKey
GROUP BY
p.ProductKey, p.EnglishProductName, st.SalesTerritoryKey, st.SalesTerritoryRegion
),
TopSellingProducts AS (
-- Step 2: Determine the top-selling products for each territory, including AverageSales
SELECT
SalesTerritoryKey,
SalesTerritoryRegion,
ProductKey,
EnglishProductName,
TotalSales,
AverageSales,
RANK() OVER (PARTITION BY SalesTerritoryKey ORDER BY TotalSales DESC) AS SalesRank
FROM
ProductSales
)
-- Final Output: Display top-selling products for each region
SELECT
SalesTerritoryRegion,
EnglishProductName,
TotalSales,
AverageSales
FROM
TopSellingProducts
WHERE
SalesRank = 1 -- Filter to get only the top-selling product for each region
ORDER BY
SalesTerritoryRegion;
SalesTerritoryRegion                               EnglishProductName                                 TotalSales            AverageSales
-------------------------------------------------- -------------------------------------------------- --------------------- ---------------------
Australia Touring-1000 Yellow, 60 139229.688 2900.6185
Canada Mountain-200 Black, 38 347581.7435 5037.4165
Central Mountain-100 Silver, 42 157232.8775 8275.4146
France Mountain-200 Black, 38 168480.7239 5105.4764
Germany Touring-1000 Blue, 60 154636.502 6185.46
Northeast Road-350-W Yellow, 48 168621.6902 4437.4129
Northwest Mountain-100 Black, 44 359233.5981 6652.474
Southeast Mountain-100 Black, 38 304128.449 7417.767
Southwest Mountain-200 Black, 38 347030.3485 4887.7513
United Kingdom Mountain-200 Black, 38 118151.8227 4544.3008

Explanation of the Query

CTE: ProductSales

  • Calculates Total and Average Sales: Aggregates total and average sales (SUM and AVG) for each product by SalesTerritoryKey and SalesTerritoryRegion.
  • Joins FactResellerSales with DimProduct and DimSalesTerritory to include product and sales territory details.

CTE: TopSellingProducts

  • Determines the Top-Selling Products: Uses the RANK() function to rank products within each sales territory based on their TotalSales.
  • Assigns a rank to each product within its respective territory (PARTITION BY SalesTerritoryKey).

Final SELECT Statement:

  • Filters the results to show only the top-selling product for each region (WHERE SalesRank = 1).

IN this example we have seen that by using CTEs for intermediate result processing, you can effectively stage data transformations or calculations, making it easier to perform complex queries in a structured and manageable way. This example demonstrates how CTEs can simplify multi-step data processing to achieve a desired final output.

Conclusion: The Versatility and Power of Common Table Expressions (CTEs) in SQL Server

As we have explored throughout this article, Common Table Expressions (CTEs) in SQL Server are not just a syntactic convenience but a powerful feature that brings significant benefits to SQL developers and data analysts. CTEs improve the readability of complex SQL queries

by breaking them down into smaller, more manageable parts, allowing for a modular approach that simplifies both understanding and maintenance.

This alone can lead to more efficient and effective data analysis, but the advantages of CTEs extend far beyond just clarity.

When it comes to recursive queries, CTEs provide a natural and intuitive way to navigate hierarchical data structures, such as organizational charts or directory paths. Unlike traditional procedural methods, which might rely on loops or cursors, recursive CTEs allow for elegant solutions that are both simpler to write and easier to understand.

The ability to repeatedly reference a result set within a single query is another compelling reason to use CTEs. This feature eliminates the need to duplicate complex subqueries, reducing redundancy and potentially enhancing query performance. This repeated referencing capability becomes particularly useful in scenarios requiring multi-level aggregations, where CTEs help stage different levels of calculation in a logical and orderly fashion. By doing so, they enable more efficient data processing and make the SQL queries easier to test and modify.

Furthermore, CTEs simplify the use of ranking functions such as ROW_NUMBER(), RANK(), and DENSE_RANK(), providing a clean way to partition data into subsets and apply these functions in a straightforward manner. This leads to clearer and more maintainable code, crucial for complex analytical tasks.

For data modification, CTEs shine by offering a concise and readable way to handle tasks such as updating rows based on certain criteria or deleting duplicate records. They make it easy to define subsets of data that need modification, thereby streamlining the overall process.

Finally, CTEs are ideal for intermediate result processing. Whenever there is a need to perform multiple transformations or calculations in stages, CTEs provide the flexibility to handle these intermediate steps efficiently, laying the groundwork for more complex data manipulations and analyses.

Guidelines for Creating and Using Common Table Expressions (CTEs)

When working with non-recursive CTEs, consider the following guidelines to ensure proper usage:

  1. Single Statement Requirement: A CTE must be immediately followed by a single SELECT, INSERT, UPDATE, or DELETE statement that utilizes one or more of the CTE's columns. Additionally, a CTE can be included in a CREATE VIEW statement, where it serves as part of the view's defining SELECT statement.
  2. Multiple CTEs in a Single Clause: You can define multiple CTEs in the same WITH clause by combining them using set operators such as UNION ALL, UNION, INTERSECT, or EXCEPT. However, the CTEs must be defined sequentially; a CTE can reference itself or any previously defined CTEs within the same WITH clause, but forward referencing is not permitted.
  3. Restrictions on Nested WITH Clauses: A CTE definition cannot contain more than one WITH clause. This means that if a CTE includes a subquery, that subquery cannot have its own nested WITH clause to define another CTE.
  4. Prohibited Clauses: Certain clauses are not allowed within a CTE:
  • ORDER BY (except when used with a TOP clause).
  • INTO.
  • The OPTION clause with query hints.
  • FOR BROWSE.

6. Semicolon Requirement: If a CTE is used in a batch of SQL statements, ensure the preceding statement ends with a semicolon (;). This is necessary to signal the end of the prior statement and correctly interpret the start of the CTE.

CTEs and Cursors: A query that references a CTE can be used to define a cursor.

Remote Server Tables: Tables located on remote servers can be referenced within a CTE.

Handling Conflicting Hints: If a query uses hints that reference a CTE, and those hints conflict with other hints related to the underlying tables accessed by the CTE (similar to how hints work with views), an error will occur during query execution.

These guidelines ensure that your use of CTEs is both correct and effective, allowing you to take full advantage of their capabilities while avoiding common pitfalls. For guidelines specific to recursive CTEs, refer to the next section on “Guidelines for Defining and Using Recursive Common Table Expressions.”

If you enjoyed this piece, please clap, follow for more, and share with those who might benefit — your support helps keep me writing!

--

--

Data Bistrot
Data Bistrot

Published in Data Bistrot

Data Bistrot is a cozy corner on Medium where I share my personal journey through the world of data science, data analysis, Python programming for data science, artificial intelligence (AI), and machine learning (ML).

Gianpiero Andrenacci
Gianpiero Andrenacci

Written by Gianpiero Andrenacci

AI & Data Science Solution Manager. Avid reader. Passionate about ML, philosophy, and writing. Ex-BJJ master competitor, national & international titleholder.

No responses yet