Use this updated pull request comment template for your dbt data projects

Dave Flynn
Published in In the Pipeline
6 min read · Jan 26, 2024

tl;dr dbt provides an official pull request comment template for data projects, but we can make it even better.

Once your data project has multiple contributors, or you just want a better way to track changes and impacts to your project, you’ll need to use a pull request comment template to help organize your review process.

Example pull request comment sections in a dbt project

A PR comment template gives the submitter (a data/analytics engineer in our case) a structured way to validate and review changes to the project — a series of points that should be considered for each PR. It’s important for all software, but particularly tricky for data projects, which also need to validate the data generated from the modeling code.

The PR comment helps the reviewer understand the modeling changes and decide whether it’s safe to merge the PR into prod.

Official dbt PR template

dbt provides an official GitHub PR template that includes the following sections:

  • Description & motivation
  • To-do before merge (optional)
  • Screenshots
  • Validation of models
  • Changes to existing models
  • Checklist

It’s a great template, but I think it can be even better. Here are a few modifications we can make to it to create the perfect PR comment for your data project.

Let’s make the dbt template even better

There are a few new sections we can add to clarify the purpose of the PR and make it easier to review:

Type of change

Classifying the type of change helps frame the work: the reviewer knows what to look out for, and what they should and shouldn’t see in the PR.

A sensible set of change types could be as follows:

  • [ ] New model
  • [ ] Bugfix
  • [ ] Refactoring
  • [ ] Breaking change
  • [ ] Documentation
  • [ ] Other project-specific item
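If you want to act on the checklist automatically (for example, to label the PR), the ticked boxes can be read straight from the PR body. The following is only a sketch: the `CHANGE_TYPES` list is copied from the checklist above, and the function name and regular expression are hypothetical, not part of dbt or GitHub.

```python
import re

# Change types copied from the checklist above; adjust for your project.
CHANGE_TYPES = ["New model", "Bugfix", "Refactoring", "Breaking change", "Documentation"]

# Matches task-list lines such as "- [x] Bugfix" or "* [ ] New model".
CHECKBOX = re.compile(r"^\s*[-*•]\s*\[(?P<mark>[ xX])\]\s*(?P<label>.+?)\s*$")


def checked_change_types(pr_body: str) -> list[str]:
    """Return the known change types that are ticked in a PR comment body."""
    checked = []
    for line in pr_body.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue
        is_ticked = match.group("mark").lower() == "x"
        if is_ticked and match.group("label") in CHANGE_TYPES:
            checked.append(match.group("label"))
    return checked


example_body = "- [ ] New model\n- [x] Bugfix\n- [X] Breaking change\n"
print(checked_change_types(example_body))
```

A reviewer bot could use the returned list to warn when nothing is ticked, or when a "Breaking change" arrives without any impact notes.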

Related Issues

Link to any related GitHub issues or tickets that will help clarify the background of this PR and add more context to your work. dbt covers this in the ‘description’ section, but I prefer to have this in its own section.

Impact considerations

Your modeling changes have two kinds of impact: on the model you are modifying, and on the downstream models that use its data.

In this section, include validation on how downstream models have/have not been impacted and what considerations are required, such as notifying stakeholders of potential impact to critical models/exposures. As with validation of models, use screenshots and queries to illustrate the impact.
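To know which downstream models to look at, it helps to list everything that depends on the changed model. Here is a minimal sketch of that traversal. The `child_map` dictionary is hand-written and hypothetical (model names are made up for illustration); in a real project you would build it from your own dependency information.

```python
from collections import deque


def downstream_models(child_map: dict[str, list[str]], changed: str) -> list[str]:
    """Breadth-first walk over a model -> children mapping.

    Returns every model that depends, directly or indirectly, on `changed`.
    """
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage, for illustration only.
child_map = {
    "stg_orders": ["orders"],
    "orders": ["customers", "finance_report"],
    "customers": ["customer_segments"],
}

print(downstream_models(child_map, "orders"))
```

Each model in the returned list is a candidate for an impact check, and any exposure or critical report among them is a reason to notify stakeholders.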

Updated Sections

Some of the existing sections could also be better defined, and their intended use clarified:

Screenshots → Lineage DAG/Diff

dbt calls this section ‘screenshots’ but then says it’s specifically for the DAG screenshot. It’s common to use screenshots throughout the whole PR comment as part of validating changes (see validation of models below), so let’s call this section ‘Lineage DAG’ to avoid the impression that all screenshots should be put here.

As dbt says, make sure to only post the “relevant sections from our DAG”, and if possible use a lineage DAG diff that highlights the changed areas and the types of change.

Validation of models

The validation of models section should be your main focus, as it’s the way you prove that the intention of the PR was realized and your work is complete.

The dbt blog briefly mentions the use of this section, but focuses mainly on dbt test results, brushing over the “ad-hoc query that you wrote to validate your data”. In practice, this section would likely consist mainly of ad-hoc queries and screenshot evidence of query results related to your modeling changes.

This section should include:

  • Ad-hoc queries: Spot-check queries to confirm the results are as expected
  • Profiling stats: Do you see the expected impact in the overall profile of the data?
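As an illustration of what profiling stats can mean in practice, here is a small sketch that summarizes one column of a query result. It assumes the rows have already been fetched as a list of dictionaries; the function name and the sample values are made up for the example.

```python
from statistics import mean


def profile_column(rows: list[dict], column: str) -> dict:
    """Summarize one numeric column: row count, nulls, distinct values, min, max, mean."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


# Made-up sample rows, standing in for a query result.
sample_rows = [
    {"customer_id": 1, "customer_lifetime_value": 10},
    {"customer_id": 2, "customer_lifetime_value": 30},
    {"customer_id": 3, "customer_lifetime_value": None},
    {"customer_id": 4, "customer_lifetime_value": 10},
]

print(profile_column(sample_rows, "customer_lifetime_value"))
```

Running the same profile before and after your change shows whether the null count, range, or average moved in the direction you expected.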

A pro-tip for performing these kinds of tests is to compare, or diff, the results of production against your development branch. Often the results of your modeling changes look correct in isolation, and issues only become evident when you compare them to production. You could run:

  • Query diff: Compare the results of ad-hoc queries between dev and prod
  • Value diff: Compare the percentage of matched rows between dev and prod
  • Profile diff: A statistical comparison of dev and prod
  • Schema diff: Check whether the schema has changed and whether the change is intended
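To make two of these diffs concrete, here is a rough sketch of a value diff and a schema diff in plain Python. It assumes the prod and dev rows have already been queried into lists of dictionaries and that each table has a unique key column. The function names, column names, and sample data are all hypothetical, and this is not how any particular tool implements the comparison.

```python
def value_diff(prod_rows, dev_rows, key, columns):
    """Per column, the percentage of shared keys whose values match in prod and dev."""
    prod = {row[key]: row for row in prod_rows}
    dev = {row[key]: row for row in dev_rows}
    common = prod.keys() & dev.keys()
    match_pct = {}
    for col in columns:
        matched = sum(1 for k in common if prod[k][col] == dev[k][col])
        match_pct[col] = 100.0 * matched / len(common) if common else None
    return {
        "added": sorted(dev.keys() - prod.keys()),    # keys only in dev
        "removed": sorted(prod.keys() - dev.keys()),  # keys only in prod
        "match_pct": match_pct,
    }


def schema_diff(prod_schema, dev_schema):
    """Compare two column -> type mappings."""
    shared = sorted(prod_schema.keys() & dev_schema.keys())
    return {
        "added": sorted(dev_schema.keys() - prod_schema.keys()),
        "removed": sorted(prod_schema.keys() - dev_schema.keys()),
        "retyped": {
            col: (prod_schema[col], dev_schema[col])
            for col in shared
            if prod_schema[col] != dev_schema[col]
        },
    }


# Made-up sample data for illustration.
prod_rows = [
    {"customer_id": 1, "clv": 100, "name": "a"},
    {"customer_id": 2, "clv": 50, "name": "b"},
    {"customer_id": 3, "clv": 20, "name": "c"},
]
dev_rows = [
    {"customer_id": 1, "clv": 100, "name": "a"},
    {"customer_id": 2, "clv": 45, "name": "b"},
    {"customer_id": 4, "clv": 10, "name": "d"},
]

report = value_diff(prod_rows, dev_rows, key="customer_id", columns=["clv", "name"])
schema_report = schema_diff(
    {"customer_id": "integer", "clv": "integer"},
    {"customer_id": "integer", "clv": "numeric", "first_order": "date"},
)
print(report)
print(schema_report)
```

A column you did not intend to touch should match at 100%. Anything lower, or any unexpected added or removed keys, is worth explaining in the PR comment.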

Take screenshots of the queries you ran and the results and post them in this section.

Depending on your project, you may move the dbt test results to another section (or a checklist item) because automated testing can result in a lot of noise. It’s something you might scan over to make sure they passed, but definitely not ‘signal’ in the context of your changes.

The updated PR comment boilerplate for data projects

Here’s the updated PR comment template that you can adapt to your needs. As mentioned, it is based directly on the official dbt example, with the changes described above applied.
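As a rough illustration of how the pieces fit together, the sketch below assembles the section headings discussed in this article into a Markdown skeleton. The section titles come from the text above, but their order, the placeholder comments, and the checklist items are assumptions made for this example, so treat it as a starting point rather than the exact template.

```python
# Change types from the "Type of change" section above.
CHANGE_TYPES = ["New model", "Bugfix", "Refactoring", "Breaking change", "Documentation"]

# Section titles come from this article; the order and the placeholder
# bodies are assumptions for illustration.
SECTIONS = [
    ("Description & motivation", "<!-- What changed, and why? -->"),
    ("Type of change", "\n".join(f"- [ ] {t}" for t in CHANGE_TYPES)),
    ("Related issues", "<!-- Links to related issues or tickets -->"),
    ("To-do before merge (optional)", "- [ ] "),
    ("Lineage DAG", "<!-- Relevant sections of the DAG, ideally a lineage diff -->"),
    ("Validation of models", "<!-- Ad-hoc queries, profiling stats, dev vs prod diffs -->"),
    ("Impact considerations", "<!-- Downstream models and stakeholders affected -->"),
    ("Changes to existing models", "<!-- Anything that needs special handling after merge -->"),
    ("Checklist", "- [ ] dbt tests pass"),
]


def render_template(sections):
    """Render (title, body) pairs as a Markdown PR comment skeleton."""
    blocks = [f"## {title}\n\n{body}\n" for title, body in sections]
    return "\n".join(blocks)


template = render_template(SECTIONS)
print(template)
```

On GitHub, saving text like this as `.github/pull_request_template.md` in the repository pre-fills the description of each new pull request.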

What it looks like in action

A picture is worth a thousand words, so what does this template look like when used in an actual PR? Here’s an example PR on one version of dbt’s Jaffle Shop project that fixes the calculation for customer_lifetime_value in the customers table:

Here’s a screenshot with all of the sections expanded so you can see how they would be used in practice with screenshots of validation tests and ad-hoc queries.

The validations in this screenshot were created with Recce.

Get data modeling validations for your PR comment with Recce

You can get modeling validation checks like those in the screenshot above with Recce, the data reconnaissance tool for validating data-modeling changes in dbt projects. Here’s a Loom I recorded about it (skip to around 2:15 to see Recce in action).

Recce is open-source and available on GitHub now:

What does your PR comment template look like?

The updated boilerplate PR comment template represents some possible improvements to help you make better PR comments. What suits you best will depend on your project, team, and the type of work you’re doing, so of course YMMV: adjust it as you see fit!

Do you use any other sections, or different methods to validate your changes?

Share a link to your PR comment template in the comments, or join the chat on LinkedIn.

Written by Dave Flynn

Dave is a developer advocate for DataRecce.io — the data modeling validation and PR review toolkit for dbt data projects
