Advanced SQL Techniques for Data Engineers
- Common Table Expressions (CTEs) and Window Functions
Using CTEs can make complex SQL queries more readable. Combine it with window functions to analyze data within a specific “window” or subset of your dataset.
Example: Find the running total of orders for each product.
WITH CTE_Orders AS (
SELECT product_id,
SUM(quantity) OVER(PARTITION BY product_id ORDER BY date) AS running_total
FROM order_table
)
SELECT p.name, c.running_total
FROM CTE_Orders c
JOIN product_table p ON c.product_id = p.id;
2. Recursive CTEs
For hierarchical data or when iterative solutions are required.
Example: In a different context, recursive CTEs could be used for finding hierarchies or parent-child relationships.
3. Pivot Tables
Transform your data from long to wide format.
Example: Show each product’s sales by date.
SELECT product_id,
SUM(CASE WHEN date = '2022–03–28' THEN quantity ELSE 0 END) AS "2022–03–28",
SUM(CASE WHEN date = '2022–03–29' THEN quantity ELSE 0 END) AS "2022–03–29",
…
FROM order_table
GROUP BY product_id;
4. Using `CASE` for Binning
Create bins/categories based on specific criteria.
Example: Categorize products by price range.
SELECT name, price,
CASE
WHEN price <= 500 THEN 'Low'
WHEN price BETWEEN 501 AND 1000 THEN 'Medium'
ELSE 'High'
END AS price_category
FROM product_table;
5. Complex Joins and Subqueries
Combining data from multiple tables.
Example: Find the total sales and category for each product.
SELECT p.name,
c.name AS category,
SUM(o.quantity) AS total_sales
FROM product_table p
JOIN category_table c ON p.category_id = c.id
LEFT JOIN order_table o ON p.id = o.product_id
GROUP BY p.name, c.name;
6. Date Functions
For dealing with date-time data.
Example: Find products sold in the last 7 days.
SELECT product_id, SUM(quantity) as total_sold
FROM order_table
WHERE date BETWEEN CURRENT_DATE - INTERVAL 7 DAY AND CURRENT_DATE
GROUP BY product_id;
7. String Functions and Pattern Matching
Working with textual data.
Example: Find all products with the word “Pro” in the name.
SELECT name
FROM product_table
WHERE name LIKE '%Pro%';
Remember, while SQL is powerful, always ensure that your database is indexed appropriately, especially when working with large datasets. Proper indexing can significantly speed up query execution times. As a data engineer, your goal is not just to write SQL but to write efficient and optimized SQL!