Let’s know more about CTE

Common Table Expressions or CTE is commonly used in SQL to make the query cleaner, makes it easier to read

Jonathan Shinray Fang
3 min readMay 27, 2024

Jonathan Shinray Fang | LinkedIn

Hello, I’m Jonathan a mere data enthusiast, I’m learning while writing article that I find interesting in medium, and this is the let’s know more about series.

A thumbnail for let’s know more about: CTE
image source

Introduction

For you guys that are just getting into the world of structured query language, or SQL, I know your sins as they were also mine. You frequently make the query operate as is without considering how to make it more understandable. And believe me, it will be a living hell if you continue this type of behavior. It’s fine for the first few weeks after you write the query, but I’m talking about what happens after that. Even with a comment, you won’t fully grasp it unless you read the entire query. And I hope that the knowledge I’m about to provide will improve your habit and allow you to spare more of your sanity for something else.

What is CTE?

For starters, what is CTE? After reading the subtitle, you might assume it’s a query cleaner. Well, it’s not wrong, Common Table Expressions, or CTE, is a mechanism for creating a temporary named result set that may be used in conjunction with a larger query. This strategy allows you to break down large queries into more clear and manageable parts, enhancing readability and maintenance.

CTE act as a virtual table or I prefer to call it ‘shadow’ or ‘temporary’ table, not to confuse with VIEW (I know what it is, it’s a temporary table when you want to let others see some variable in your table), created during query execution, it’s kinda similar to subquery. What I found interesting about them is they make everything simpler, by breaking the long query down into the smaller, more manageable parts.

CTE Use Case

To explore much further I’ll use DataLemur IBM SQL Interview Question practice. So, from this practice, we are assigned to check unique queries run by employees during the third quarter of 2023 and to display it as histogram categories.

Here is my query without CTE:

SELECT 
unique_queries,
COUNT(employee_id) AS employee_count
FROM
(
SELECT
e.employee_id,
CASE WHEN COUNT(DISTINCT q.query_id) IS NOT NULL THEN COUNT(DISTINCT q.query_id) ELSE 0 END AS unique_queries
FROM
employees AS e
LEFT JOIN queries AS q ON e.employee_id = q.employee_id
AND EXTRACT(YEAR FROM q.query_starttime)= 2023
AND EXTRACT(MONTH FROM q.query_starttime) BETWEEN 7 AND 9
GROUP BY
e.employee_id
) AS employee_subquery
GROUP BY
unique_queries
ORDER BY
unique_queries;

*PS: I purposely using CASE on the null handling in this query, but this can be improvised with COALESCE.

The Output without CTE

And now my query with CTE:

WITH eq AS (
SELECT
e.employee_id,
--this is the coalesce I mentioned before
COALESCE(COUNT(DISTINCT q.query_id), 0) AS unique_queries
FROM
employees AS e
LEFT JOIN queries AS q ON e.employee_id = q.employee_id
AND EXTRACT(YEAR FROM q.query_starttime)= 2023
AND EXTRACT(MONTH FROM q.query_starttime) BETWEEN 7 AND 9
GROUP BY e.employee_id
)

SELECT
unique_queries,
COUNT(employee_id) AS employee_count
FROM eq
GROUP BY unique_queries
ORDER BY unique_queries;
The Output using CTE

Conclusion

See the differences? Don’t you think it’s easier to read? The output stays the same, I don’t change anything to the output, but I can make it easier to read and maintain the query. So, let’s normalize using CTE in our query!

--

--

Jonathan Shinray Fang

Anything but boring! I'm a Data Enthusiast, that have a newbie mindset and want to learn everything, so I can implement it!