Advanced SQL Techniques for Data Engineers

Dr. Murugavel KN
2 min readSep 8, 2023

--

  1. Common Table Expressions (CTEs) and Window Functions

Using CTEs can make complex SQL queries more readable. Combine it with window functions to analyze data within a specific “window” or subset of your dataset.

Example: Find the running total of orders for each product.


WITH CTE_Orders AS (
SELECT product_id,
SUM(quantity) OVER(PARTITION BY product_id ORDER BY date) AS running_total
FROM order_table
)
SELECT p.name, c.running_total
FROM CTE_Orders c
JOIN product_table p ON c.product_id = p.id;

2. Recursive CTEs

For hierarchical data or when iterative solutions are required.

Example: In a different context, recursive CTEs could be used for finding hierarchies or parent-child relationships.

3. Pivot Tables

Transform your data from long to wide format.

Example: Show each product’s sales by date.


SELECT product_id,
SUM(CASE WHEN date = '2022–03–28' THEN quantity ELSE 0 END) AS "2022–03–28",
SUM(CASE WHEN date = '2022–03–29' THEN quantity ELSE 0 END) AS "2022–03–29",

FROM order_table
GROUP BY product_id;

4. Using `CASE` for Binning

Create bins/categories based on specific criteria.

Example: Categorize products by price range.


SELECT name, price,
CASE
WHEN price <= 500 THEN 'Low'
WHEN price BETWEEN 501 AND 1000 THEN 'Medium'
ELSE 'High'
END AS price_category
FROM product_table;

5. Complex Joins and Subqueries

Combining data from multiple tables.

Example: Find the total sales and category for each product.


SELECT p.name,
c.name AS category,
SUM(o.quantity) AS total_sales
FROM product_table p
JOIN category_table c ON p.category_id = c.id
LEFT JOIN order_table o ON p.id = o.product_id
GROUP BY p.name, c.name;

6. Date Functions

For dealing with date-time data.

Example: Find products sold in the last 7 days.


SELECT product_id, SUM(quantity) as total_sold
FROM order_table
WHERE date BETWEEN CURRENT_DATE - INTERVAL 7 DAY AND CURRENT_DATE
GROUP BY product_id;

7. String Functions and Pattern Matching

Working with textual data.

Example: Find all products with the word “Pro” in the name.


SELECT name
FROM product_table
WHERE name LIKE '%Pro%';

Remember, while SQL is powerful, always ensure that your database is indexed appropriately, especially when working with large datasets. Proper indexing can significantly speed up query execution times. As a data engineer, your goal is not just to write SQL but to write efficient and optimized SQL!

--

--

Dr. Murugavel KN

Professor | Data Steward | Data Analytics | Data Scientist | Full Stack web Developer | SQL Developer | DBA | PMP | Business Intelligence