Unlocking Data Team Potential: The Advanced dbt Journey

Weber Matias
Published in Building Inventa
4 min read · Aug 8, 2023

How our evolution as a Data Team helped create the CoRise Advanced dbt course

Scaling your dbt project is much like scaling your data team. Both are challenges that every growing team must face, and as the project and the team mature, their best practices need to mature too.

We started using dbt at Inventa in May 2022. As our project quickly grew, so did the need for a common ground of best practices. Initially, we selected our linter, set up CI, and embraced code reviews. As the project continued to expand, we faced new challenges. Running the full pipeline became time-consuming and the local development workflow became resource-intensive. Ensuring documentation consistency was non-trivial. The DAG grew to the point where we needed to assign ownership of different portions to different people. We could not afford to have the whole team watching the entire DAG, because that made it impossible for anyone to focus. We realized we needed an on-call rotation to decentralize this responsibility.

By this point we had around 1,400 models and over 3,000 tests with 14 contributors.

This is when I had the opportunity to join the Advanced dbt Team at CoRise to build this course. Motivated by what we had learned during the previous year, I decided to join the team. My goal was to organize our knowledge in a way that could be useful for other data teams. The dbt community is growing at such a pace that many teams are now struggling with the same things that we did at our early stage.

I was excited to join Lindsay Murphy (Head of Data at Secoda) and Steve Haraguchi (Learning Design & Operations at CoRise). As a development partner, my role was to contribute to what Lindsay was building (kudos to Lindsay! 🥳). Not an easy task. But I was confident that the experience of our Data Team could provide a valuable case study for the course material.

The first line of the course outline was: “Learn: Best practices for scaling dbt projects, workflows, and documentation”. I would recommend that any dbt practitioner enroll based on this statement alone! And the similarities with our own journey kept coming in Weeks 2, 3 and 4.

Soon we realized that our teams operate in very similar ways. We seek out feedback and improve upon it. These iterations result in many lessons and insightful projects.

The first step that we took at Inventa with our modern data stack was to embrace best practices. We defined our coding conventions, enforced them within the project and doubled down on documentation. This is what Week 1 of the course is all about: how do you enforce best practices and coding conventions within the project, and how do you scale documentation?

Week 2 of Advanced dbt brings more lessons we learned the hard way at Inventa. As our codebase grew, we applied the DRY (Don’t Repeat Yourself) principle through the use of Jinja and macros. We also looked for existing packages before writing custom code. For example, dbt_expectations helped us increase test coverage quickly with minimal lines of code.

Managing data quality at scale was our next challenge. With expectations of continued growth, we wrote tests often and early. But our test suite started to become too expensive and slow. We needed to optimize. We also complemented our testing stack with data quality tools like Datafold. These challenges are now the focus of Week 3.

Testing was not the only part of the project that needed optimization as we matured. Our runs started taking longer, slowing down the team. To move fast in the beginning, we had materialized our models as tables or views. But many were taking too long to build and our Snowflake bill kept climbing. We decided it was time to standardize our use of incremental models. At the same time, we optimized our Slim CI (learn more about how we did this in “How we slimmed down Slim CI for dbt Cloud”). In Week 4 Lindsay gets into the weeds about the importance of these practices and more!
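To make the idea behind incremental models concrete, here is a minimal sketch in plain Python. It is not dbt code and not taken from the Inventa project. In dbt itself the same pattern is written in SQL, using the `materialized='incremental'` config and the `is_incremental()` macro. The row shape, the `updated_at` column and the function names below are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical row shape: {"id": int, "updated_at": date}.


def full_refresh(source_rows):
    """Rebuild the target from scratch: every source row is processed on every run."""
    return list(source_rows)


def incremental_run(target_rows, source_rows):
    """Process only source rows newer than the newest row already in the target.

    Returns the updated target and the number of rows processed in this run.
    """
    if not target_rows:
        # First run: nothing has been loaded yet, so fall back to a full build.
        rebuilt = full_refresh(source_rows)
        return rebuilt, len(rebuilt)
    high_water_mark = max(row["updated_at"] for row in target_rows)
    new_rows = [row for row in source_rows if row["updated_at"] > high_water_mark]
    return target_rows + new_rows, len(new_rows)


source = [
    {"id": 1, "updated_at": date(2023, 8, 1)},
    {"id": 2, "updated_at": date(2023, 8, 2)},
    {"id": 3, "updated_at": date(2023, 8, 3)},
]
target = source[:2]  # state of the target after the previous run

updated_target, rows_processed = incremental_run(target, source)
```

In this toy run only one of the three source rows is processed, yet the target ends up identical to a full rebuild. The strict `>` comparison is a simplification: rows that arrive late with an older or equal timestamp would be missed, which is why real incremental models often add a lookback window or a unique key for merging.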

Advanced dbt was born out of the desire to create a game-changing course on scaling data teams effectively. Leveraging our experiences at Inventa and collaborating with Lindsay and CoRise, we crafted a course that is about more than dbt. It offers insights into building and scaling advanced data teams and their best practices. It was so much fun to work on with Lindsay and Steve. We hope you enjoy learning from the course half as much as we enjoyed making it!

Link to the course: Advanced dbt Course at CoRise

