Three SQL Techniques to add to your toolkit

Dan Larson
4 min readDec 29, 2022

--

DALL-E Generated Spray Paint Art

IMHO, SQL is one of the most important technical skills for data engineers, scientists and analysts to develop. I have spent countless hours perfecting SQL queries and then later interpreting them. I have found that while SQL is easy to learn, there is a steep curve with and often times there are many ways to do a single task. Here are three of my go-to techniques when writing readable SQL queries.

With Statement

As a data engineer, I am surprised to learn that many younger data professionals working in SQL aren’t familiar with the WITH statement. The WITH statement is a method of organizing complex queries with subqueries, making them easier to read. It provides a way to store intermediate query results, which can be useful in many situations. Practically speaking, you define each subquery and give it a name. You then call all of those named subqueries in a final simplified select statement. In many ways, using a WITH statement allows you to create temporary tables that you query against. In the basic example below you can see how a WITH statement is structured.

WITH 
t1 AS (SELECT * FROM table1),
t2 AS (SELECT * FROM table2)
SELECT *
FROM t1 LEFT JOIN t2
ON t1.id = t2.id;

By using the WITH statement, you can break down the query into more manageable pieces, making it easier to read and understand. An added bonus of this technique is that you can easily test each component of your final query for errors.

Windowed Functions

Another one of my go-to SQL functions is the windowed function. All too often I want to compare the value to the mean of a subgroup. One way to do this is to have two subqueries and join them together or pull the data in R/Python and do the comparison. However, you can accomplish this using a windowed function.

Windowed functions are functions that perform calculations over a set of records, called a “window.” They allow you to partition a query result into multiple categories and perform calculations on each partition.

For example, you want to compare the average test score for a particular subset to individual data points. Using a windowed function you can easily compare a single event to the mean by subgroup (or partition). The windowed functions include aggregate functions, ranking functions, and offset functions. These functions can help you quickly analyze and summarize data within a query. Below is the syntax for the average function as a windowed function.

AVG() OVER (PARTITION BY [column_name] ORDER BY [column_name])

This function is used to calculate the average value of a column over a set of records. Another example is the rank function which is used to return the rank of a value in a set of values.



Syntax: RANK () OVER (PARTITION BY [column_name] ORDER BY [column_name])

A windowed function is different from the GROUP BY clause in that it operates on a partition of the table and is not limited to just the columns in the GROUP BY clause. Additionally, windowed functions can be used to perform calculations on the entire partition, not just the columns within the GROUP BY clause. Finally, a windowed function can perform calculations over multiple partitions, whereas a GROUP BY clause is limited to a single partition.

Try Cast

Finally, TRY_CAST is a function that I use heavily when first exploring a database or new table. TRY_CAST is a function that I use to convert an expression of one data type to another data type. Unlike CAST, it does not generate an error if the conversion fails. Instead, it returns a NULL value. TRY_CAST is incredibly useful for handling data that may not be valid for a particular data type (I am talking dates in particular). It is often used to ensure data integrity and to prevent unexpected errors when dealing with data of unknown origin. I find that using TRY_CAST is especially helpful when I am in the data exploration step of a project.

SELECT TRY_CAST(column_name AS data_type)
FROM table_name;

TRY_CAST is used instead of CAST because it does not generate an error if the conversion fails. TRY_CAST is typically more efficient than CAST, as it does not require additional logic to determine whether or not the conversion succeeded. While TRY_CAST is a useful function, you should be careful using it. It is possible that it could lead to inaccurate results.

These three SQL techniques can come in handy when you are working with data in a database. Like all tools or techniques, you need to understand when they are the right tools for the job.

--

--

Dan Larson

Data Engineer, father, and sixers fan in Philadelphia.