Let’s know more about CTE

Common Table Expressions or CTE is commonly used in SQL to make the query cleaner, makes it easier to read

3 min readMay 27, 2024

Hello, I’m Jonathan a mere data enthusiast, I’m learning while writing article that I find interesting in medium, and this is the let’s know more about series.

A thumbnail for let’s know more about: CTE — image source

Introduction

For you guys that are just getting into the world of structured query language, or SQL, I know your sins as they were also mine. You frequently make the query operate as is without considering how to make it more understandable. And believe me, it will be a living hell if you continue this type of behavior. It’s fine for the first few weeks after you write the query, but I’m talking about what happens after that. Even with a comment, you won’t fully grasp it unless you read the entire query. And I hope that the knowledge I’m about to provide will improve your habit and allow you to spare more of your sanity for something else.

What is CTE?

For starters, what is CTE? After reading the subtitle, you might assume it’s a query cleaner. Well, it’s not wrong, Common Table Expressions, or CTE, is a mechanism for creating a temporary named result set that may be used in conjunction with a larger query. This strategy allows you to break down large queries into more clear and manageable parts, enhancing readability and maintenance.

CTE act as a virtual table or I prefer to call it ‘shadow’ or ‘temporary’ table, not to confuse with VIEW (I know what it is, it’s a temporary table when you want to let others see some variable in your table), created during query execution, it’s kinda similar to subquery. What I found interesting about them is they make everything simpler, by breaking the long query down into the smaller, more manageable parts.

CTE Use Case

To explore much further I’ll use DataLemur IBM SQL Interview Question practice. So, from this practice, we are assigned to check unique queries run by employees during the third quarter of 2023 and to display it as histogram categories.

Here is my query without CTE:

SELECT 
  unique_queries, 
  COUNT(employee_id) AS employee_count 
FROM 
  (
    SELECT 
      e.employee_id, 
      CASE WHEN COUNT(DISTINCT q.query_id) IS NOT NULL THEN COUNT(DISTINCT q.query_id) ELSE 0 END AS unique_queries 
    FROM 
      employees AS e 
      LEFT JOIN queries AS q ON e.employee_id = q.employee_id 
      AND EXTRACT(YEAR FROM q.query_starttime)= 2023 
      AND EXTRACT(MONTH FROM q.query_starttime) BETWEEN 7 AND 9 
    GROUP BY 
      e.employee_id
  ) AS employee_subquery 
GROUP BY 
  unique_queries 
ORDER BY 
  unique_queries;

*PS: I purposely using CASE on the null handling in this query, but this can be improvised with COALESCE.

And now my query with CTE:

WITH eq AS (
  SELECT 
    e.employee_id,
    --this is the coalesce I mentioned before
    COALESCE(COUNT(DISTINCT q.query_id), 0) AS unique_queries 
  FROM 
    employees AS e
    LEFT JOIN queries AS q ON e.employee_id = q.employee_id 
    AND EXTRACT(YEAR FROM q.query_starttime)= 2023 
    AND EXTRACT(MONTH FROM q.query_starttime) BETWEEN 7 AND 9
  GROUP BY e.employee_id
)

SELECT
  unique_queries,
  COUNT(employee_id) AS employee_count
FROM eq
GROUP BY unique_queries
ORDER BY unique_queries;

Conclusion

See the differences? Don’t you think it’s easier to read? The output stays the same, I don’t change anything to the output, but I can make it easier to read and maintain the query. So, let’s normalize using CTE in our query!