When I recently joined a project that used Airflow, I found myself copying the same DAG configuration whenever I needed to create a new DAG.

The following configuration existed in some form in every DAG:

# my_pipeline/dags/create_crm_model.py
dag = DAG(
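One way to avoid that repetition is to pull the shared settings into a small helper that each DAG file imports. A minimal sketch in plain Python (the helper name and the specific defaults are assumptions for illustration, not the project's actual configuration):

```python
# Hypothetical shared configuration module, e.g. my_pipeline/dag_defaults.py.
# Keys mirror what airflow.DAG accepts; default_args is Airflow's own
# mechanism for task-level defaults such as owner and retries.
DEFAULT_DAG_CONFIG = {
    "schedule_interval": "@daily",
    "catchup": False,
    "default_args": {"owner": "data-engineering", "retries": 2},
}


def dag_config(**overrides):
    """Return the shared DAG configuration with per-DAG overrides applied.

    Note: this is a shallow merge, so overriding ``default_args``
    replaces the whole nested dict rather than merging into it.
    """
    config = dict(DEFAULT_DAG_CONFIG)
    config.update(overrides)
    return config
```

Each DAG file then only states what differs, e.g. `dag = DAG("create_crm_model", **dag_config(schedule_interval="@hourly"))`.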

I recently switched jobs to a company that had never employed a full-time data engineer before. As soon as I got AWS credentials, I wanted to get a read on the state of the Redshift cluster.

These are some of the things I looked for straight away (in no particular order):

Database Backups


We are currently migrating our data warehouse to a new cluster which is GDPR compliant. Historically there hasn't been a migration strategy in place, so we don't have a source of truth for how our schemas are structured.

Here is a little query that generates the DDL (Data Definition Language) statements for a whole schema.

Before you run the query, create the admin.v_generate_tbl_ddl view from the AWS Labs amazon-redshift-utils repository.

-- User generated schemas are not in the pg_table_def search path by default
SET search_path TO '$user', public, android;

SELECT ddl
FROM admin.v_generate_tbl_ddl
JOIN pg_table_def ON (
    admin.v_generate_tbl_ddl.schemaname = pg_table_def.schemaname AND
    admin.v_generate_tbl_ddl.tablename = pg_table_def.tablename
)
WHERE admin.v_generate_tbl_ddl.schemaname = 'android'
GROUP BY admin.v_generate_tbl_ddl.tablename, ddl, "seq"
ORDER BY admin.v_generate_tbl_ddl.tablename ASC, "seq" ASC;
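If you need the DDL for more than one schema, the query is easy to parameterise from a script. A minimal sketch in Python (the function name is an assumption; it only builds the query and bind parameters, ready to pass to something like psycopg2's cursor.execute against the cluster):

```python
# Hypothetical helper: build the DDL-generation query for a given schema.
# Assumes the admin.v_generate_tbl_ddl view already exists, and that the
# target schema has been added to search_path so pg_table_def can see it.
QUERY_TEMPLATE = """
SELECT ddl
FROM admin.v_generate_tbl_ddl
JOIN pg_table_def ON (
    admin.v_generate_tbl_ddl.schemaname = pg_table_def.schemaname AND
    admin.v_generate_tbl_ddl.tablename = pg_table_def.tablename
)
WHERE admin.v_generate_tbl_ddl.schemaname = %(schema)s
GROUP BY admin.v_generate_tbl_ddl.tablename, ddl, "seq"
ORDER BY admin.v_generate_tbl_ddl.tablename ASC, "seq" ASC;
"""


def ddl_query(schema):
    """Return (query, params) using bind parameters rather than string
    interpolation, so schema names are passed safely to the driver."""
    return QUERY_TEMPLATE, {"schema": schema}
```

Run it once per schema and write each result set out to version control, and you have a starting point for a source of truth.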

Hope this helps someone out :)

When I made the switch to data engineering I quickly missed the flow and fast feedback of test-driven development. This was especially apparent in the first codebase I worked on, which was written as pure SQL tasks. SQL doesn't natively have a testing framework, and this made testing…

Alex Handley
