I recently switched jobs to a company that had never employed a full-time data engineer before. As soon as I got AWS credentials, I wanted to get a read on the Redshift cluster.
These are some of the things I looked for straight away (in no particular order):
We are currently migrating our data warehouse to a new, GDPR-compliant cluster. Historically there hasn’t been a migration strategy in place, so we don’t have a source of truth for how our schemas are structured.
Here is a little query that generates the DDL (Data Definition Language) statements for a whole schema.
Before you run the query, create the admin.v_generate_tbl_ddl view from the AWS Labs amazon-redshift-utils repository.
-- User-generated schemas are not in the pg_table_def search path by default
SET SEARCH_PATH TO android;

SELECT admin.v_generate_tbl_ddl.tablename, ddl
FROM admin.v_generate_tbl_ddl
JOIN pg_table_def ON (
    admin.v_generate_tbl_ddl.schemaname = pg_table_def.schemaname AND
    admin.v_generate_tbl_ddl.tablename = pg_table_def.tablename
)
WHERE admin.v_generate_tbl_ddl.schemaname = 'android'
GROUP BY admin.v_generate_tbl_ddl.tablename, ddl, "seq"
ORDER BY admin.v_generate_tbl_ddl.tablename ASC, "seq" ASC;
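If you need to dump DDL for several schemas during a migration, it's handy to parameterise the query from a script. Here is a minimal sketch of that idea; the `build_ddl_query` helper and the `android` schema name are my own illustrative choices, not part of the original post, and you'd pass the resulting string to your Redshift client of choice (e.g. psycopg2).

```python
# Sketch: build the per-schema DDL-extraction query shown above.
# Assumes the admin.v_generate_tbl_ddl view already exists on the cluster.

def build_ddl_query(schema: str) -> str:
    """Return the SQL that extracts per-table DDL for `schema`.

    Note: the schema name is interpolated directly, so only use this
    with trusted, hard-coded schema names (not user input).
    """
    return f"""\
SET SEARCH_PATH TO {schema};

SELECT admin.v_generate_tbl_ddl.tablename, ddl
FROM admin.v_generate_tbl_ddl
JOIN pg_table_def ON (
    admin.v_generate_tbl_ddl.schemaname = pg_table_def.schemaname AND
    admin.v_generate_tbl_ddl.tablename = pg_table_def.tablename
)
WHERE admin.v_generate_tbl_ddl.schemaname = '{schema}'
GROUP BY admin.v_generate_tbl_ddl.tablename, ddl, "seq"
ORDER BY admin.v_generate_tbl_ddl.tablename ASC, "seq" ASC;
"""

# Example: generate the query for the android schema.
query = build_ddl_query("android")
print(query.splitlines()[0])  # SET SEARCH_PATH TO android;
```

Looping this over every schema and writing each result to a versioned `.sql` file gives you a cheap source of truth for the old cluster before cutting over.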
Hope this helps someone out :)