This article shares some tips I learned from setting up a dbt project for my team, which I think can be valuable for other data engineering teams trying out this great new tool.
For those who are new to dbt, it's basically a data transformation tool that follows software engineering best practices. dbt handles the T in ELT (extract, load, transform) processes.
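In practice, a dbt model is just a SQL SELECT statement that dbt materializes in the warehouse. A minimal sketch of one (the model path, source name, and column names are hypothetical):

```sql
-- models/staging/stg_orders.sql
-- A dbt model is a plain SELECT; dbt builds it as a view or table.
-- The source ('shop', 'orders') and columns here are hypothetical.
{{ config(materialized='view') }}

select
    order_id,
    customer_id,
    order_date,
    amount
from {{ source('shop', 'orders') }}
```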
Of course, dbt is an easy-to-use data transformation tool that requires minimal effort to set up, especially for those who use the cloud version (dbt Cloud)…
Thanks for the feedback, Priyank!
First, it's a good idea to keep dbt_project.yml lean, so I only use it for project-level configuration (e.g., persist_docs) or schema-level configuration (e.g., per-schema table grants); model-specific config I usually set in the corresponding model files, as in the sketch below.
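A minimal sketch of that split in dbt_project.yml — the project, folder, and role names here are hypothetical, and the grants config assumes dbt ≥ 1.2:

```yaml
# dbt_project.yml — keep only project- and schema-level settings here.
name: my_project
version: "1.0.0"
profile: my_profile

models:
  my_project:
    # Project-level configuration: persist model and column
    # descriptions to the warehouse for every model.
    +persist_docs:
      relation: true
      columns: true
    marts:
      # Schema-level configuration: grant read access on everything
      # built under models/marts. Role name is hypothetical.
      +grants:
        select: ["reporting_role"]
```

Model-specific settings, such as the materialization, then live in a `{{ config(materialized='table') }}` block at the top of each model's .sql file instead of bloating the project file.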
For number 2, I use both dbt Cloud and dbt-core (the CLI); I just separate them by environment: dbt Cloud for model development and dbt-core for production pipelines.
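On the dbt-core side, the production run only needs a profiles.yml target. A minimal sketch, assuming a BigQuery warehouse (the profile name, GCP project, dataset, and keyfile path are all hypothetical):

```yaml
# profiles.yml — used by dbt-core in the production environment.
# Warehouse type, project, dataset, and keyfile are assumptions.
my_profile:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-account
      project: my-gcp-project
      dataset: analytics
      keyfile: /secrets/dbt-service-account.json
      threads: 4
```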
And to answer the last question: yes, I use dbt's lineage for model dependencies. For model execution by Airflow, I simply use model tags, so Airflow just needs to trigger a dbt run command on a specific tag, and the model dependencies are handled by dbt through its data lineage.
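Concretely, a model picks up a tag via `{{ config(tags=['daily']) }}`, and dbt's selection syntax (`dbt run --select tag:daily`) runs every model with that tag in dependency order. The orchestration side can then stay as thin as a single BashOperator; a minimal Airflow 2.x sketch, where the DAG id, schedule, and project path are hypothetical:

```python
# A minimal sketch of an Airflow task that triggers dbt on a tag.
# DAG id, schedule, and the dbt project path are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily_models",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # dbt resolves the run order of the selected models from its
    # lineage graph, so Airflow only fires one command per tag.
    run_daily_models = BashOperator(
        task_id="dbt_run_daily",
        bash_command="cd /opt/dbt/my_project && dbt run --select tag:daily",
    )
```

This keeps the DAG free of per-model tasks: adding a new model to the daily run is just a matter of tagging it, with no Airflow change required.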