Lessons Learned from my own SQL coding pitfalls
Writing clear, logical, easy-to-read, and high-performance ETL pipelines in SQL is an important part of a data scientist’s day-to-day job. Recently, after reviewing multiple SQL style guides (see references), I reflected on my practices and identified the following 10 SQL best practices, more accurately, common mistakes I’ve made. I hope this can be helpful to you.
The benefits of following these SQL style best practices include:
- Improved code readability
- Reduced ambiguity
- Enhanced debugging efficiency
- Optimized code performance.
- Keywords and function names should all be lowercase.
Lower case are much easier for human to read
/* Good */
select count(*) as customers_count
from customers
/* Bad */
SELECT *
FROM customers
2. Use select distinct
instead of grouping by all columns. This makes the intention clear.
/* Good */
select distinct
customer_id
, date_trunc('day', created_at) as purchase_date
from orders
/* Bad */
select
customer_id
, date_trunc('day', created_at) as purchase_date
from orders
group by 1, 2