How I used multiple SQL Server CTEs to parameterise SELECT TOP X and make my query more scalable

David Garces
5 min readNov 16, 2014

This is my first blog post of the (hopefully) some many more to come! I would like to share my experiences while I do my job as a Software/Database developer as I feel that there are some things that the web or your mentors don’t teach you and you have to find out about yourself!

In this post I will write about CTEs (Common Table Expressions) which is a very nice tool from SQL Server to store temporary results set which you can use to manipulate data. I will use an example that will hopefully be closer to a real life situation than just writing something technical which would rather stay in the book and not be used. I will use AdventureWorks to implement my example (you can download it from here or here).

I once was approched by a friend with a basic SQL Server problem to retrieve the top x records, but with a particular requirement: he needed to be able to use x as a parameter so that the users could send a custom value.

Not having known that from SQL Server 2005 it is possible to just add the parameter using a specific syntax like this:

https://gist.github.com/gbdavid2/01c92bd8c3b7708aadc8#file-basictopsqlparam

… I decided to challenge myself and use CTEs (Common Table Expressions) to solve the problem, and as a result I actually found an interesting query that could be extended for more complex reporting of the required data.

The Problem

Imagine you are required to create an interface for your staff/clients that connects a database with customer information, orders, and some specific information about them. Your customer has asked you to create an sql query with the following requirements:

  • It will be used for a quick view link in the customer’s website that shows the top 10 customers that have placed an order recently,
  • It should show their most recent order date,
  • It should not to show duplicate value and,
  • The quick view should only show orders for the logged in member of staff.
  • The logged in member of staff will be able to increase or decrease the number of clients they want to see in their list (top x)

Let’s go step by step through a few queries until we narrow down to our final query. For illustration purposes I’m going to use AdvendureWorks2012 which you can download freely from the Microsoft website (see links below)

We should first understand our table structure:

AdvendureWorks ER diagram

Our main table with the core data is Sales.SalesOrderHeader, we will not need Sales.SalesOrderDetails as all the information we need is in the header. We will need to connect to Sales.Customer to link to the Person table so that we can retrieve the Customer’s Full Name, and Sales.SalesPerson to filter by the ID of the person that is logged in to the system and created each order.

As a first step we could try to retrieve all the data we need for say the first 10 customer orders ordered by date:

https://gist.github.com/gbdavid2/26a495d6a63e1a03fbf3#file-adventureworkssql-soh1

This query hasn’t quite solved our problem yet, we have the list of customers that recently placed an order, but because we haven’t grouped our results yet, we may show a few orders per customer (if the most recent customer placed two or more orders on the same day), So we can try this:

https://gist.github.com/gbdavid2/cf7961814899d453415a#file-adventureworkssql-soh2

Now that we have all the data we have all the data we want to filter by logged in employee, so we’ll use SalesPersonID = 289 for this example:

https://gist.github.com/gbdavid2/aa67355176bd5ac586dc#file-adventureworkssql-soh3

having finished the basic part of our problem we want to focus on transforming the top 10 rows into a parameter, so how would you do this if say you didn’t have the “(@top)” syntax?

My colleague initially suggested that an easy way to solve the problem would be to just write the query dynamically from you application, so he suggested you could just create a string and replace the parameter with a value provided by the user, something like this (c#):

https://gist.github.com/gbdavid2/1e6b4e931894b5cdcfb2#file-adventureworkscs-soh5

But this is actually not a very good idea, especially with issues relating to sql code injection and other stuff that you could imagine as a risk, so I challenged myself and thought of a solution using SQL CTEs (Common Table Expressions):

https://gist.github.com/gbdavid2/068e67b711b5f0f72670#file-adventureworkssql-soh4

My foundation of this solution is the use of RANK() and ROW_NUMBER(). The first CTE: CTE_OrderDates retrieves the data in a very similar way than our old SQL query, it gets the customer name and order date for the latest orders for the provided sales person. Additional to this, it ranks the orders by first splitting them in groups of Customer Name + Date and second ordering them by Date and Customer Name (do notice that the Partition by part is replacing the use of group by in the query).

The second CTE: CTE_CustomerDates reads the information from the first CTE and only retrieves the orders where the sales order achieved a rank of 1. This is so that we don’t have duplicate values for the same customer and the same date, it then creates a new column (PriorityRow) to have an ordered list of combinations (Client Name + Date) so that we can later select the top x from them.

Finally, we simply select the results from CTE_CustomerDates where PriorityRow is less or equal than an input parameter (@ProvidedTop), thus simulating the use of TOP for the query.

After executing this query we can achieve exactly the same results as the old query. Not surprisingly, after running an execution plan, it can be seen that SQL Server is smart enough to execute the CTE query with a slightly more efficient speed to the old query by having only 24% of the the weight of the performance for computation and sorting compared to 28% on the old query.

Conclusion

Although this example is a bit of a long-winded approach to solve a problem that could be solved with a simpler query (using the right syntax), it shows a different perspective to writing your queries and handling your logic. As mentioned above, using CTEs can lead to some benefits such as better performance and/or room for expansion (adding extra features to your query).

Many developers out there opt to handle their logic mostly from the application side, or by using the not very efficient old fashioned cursors, but in some situations it’s easier, safer, more efficient or more secure to write your logic on the database server side.

CTEs are very powerful and can both be very efficient and handle your set logic from the database side very efficiently to help you expand your queries and solve complex problems when manipulating your data.

If you have liked my story or would like to comment, provide some feedback or just talk about it, follow me on twitter

https://twitter.com/gbdavid2/status/534088385300668417

… or press Recommend ❤ if you found it useful! ☺

David.

--

--