How to customise dbt’s model naming for easy developing on production

Mark MacArdle
Published in Data@ManyPets · Dec 20, 2021 · 4 min read

At ManyPets we use dbt to manage the SQL queries (which dbt calls models) that clean and transform the data in our warehouse. One of dbt’s many great features is that it lets you change how models and schemas are named while testing. This enables things like testing on production, as you don’t have to overwrite your existing tables to see the output of your modified queries. In this post we’d like to share how we handle model and schema naming when we’re testing and deploying, as well as the macro code we use to do it.

Why develop on production?

Production, or prod, is the environment end users interact with. If you’re from a software development background it may sound like terrible practice to test things out there instead of in a separate development (dev) environment. We do actually have and use a dev warehouse environment, which imports data from the dev databases and systems used by the software development teams.

However, there are issues with this. The dev environment only contains a low volume of dummy data and, because it’s used for testing, has low data quality. We don’t want to spend time handling quality issues that don’t occur on production, and we want to be sure our queries work at full data volume. Also, our data warehouse, Google BigQuery, is fully managed and on-demand, so we don’t have to worry about overloading it.

dbt’s standard schema naming options

In dbt you use a profiles.yml file to specify the environments you have and a default schema (called a dataset in BigQuery) for each. dbt calls this the target schema. Typically you’ll include your name so it’s different for every developer; I use dbt_markm. You also normally specify a custom schema for each model (eg stripe, zendesk, etc) to keep them organised in your warehouse. dbt’s default behaviour is to concatenate the target and custom schema names, producing schema names like dbt_markm_stripe and dbt_markm_zendesk.
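For illustration, here’s roughly what that configuration looks like (the profile name, GCP project and folder names are placeholders rather than our real setup). Each developer gets their own target schema in profiles.yml:

```yaml
# profiles.yml — one per developer, the target schema is unique to them
my_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project   # placeholder project ID
      dataset: dbt_markm        # the "target schema" (a dataset in BigQuery)
      threads: 4
```

and custom schemas are assigned per folder of models in dbt_project.yml:

```yaml
# dbt_project.yml — custom schema per data source
models:
  my_project:
    stripe:
      +schema: stripe
    zendesk:
      +schema: zendesk
```

With dbt’s default behaviour, a model in the stripe folder would land in dbt_markm_stripe when I run it locally.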

This is a great start: if you start modifying queries and running them, they’ll be materialised in schemas just for you, as they’re all prefixed with your name. However, when you’re finished testing you’ll probably want to remove your name. Thankfully dbt makes it easy to change the naming behaviour. The schema naming is controlled by an inbuilt macro called generate_schema_name(), which you can override by creating a macro with the same name.

dbt provides a macro called generate_schema_name_for_env() that implements a common naming approach. It checks the name of the environment (specified in profiles.yml) the run is happening on: if it’s prod it uses only the custom schema name (eg stripe); for all other environments it materialises everything in the target schema. For example, if I was working on a Stripe “charges” query it’d be named:

  • stripe.charges on prod
  • dbt_markm.charges for all other environments
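Adopting this behaviour is just a matter of overriding the inbuilt macro so it delegates to the provided one, which is the pattern from dbt’s documentation:

```sql
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {{ generate_schema_name_for_env(custom_schema_name, node) }}
{%- endmacro %}
```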

We tried this out for a while but found we made a lot of mistakes, with models ending up in the wrong schemas. We sometimes even forgot to assign a schema at all and didn’t realise until we’d deployed on prod. The problem is that there isn’t an easy way to see the final schema while testing.

Customising the schema and model naming macros

In our first attempt at remedying this we modified generate_schema_name() to keep the prod behaviour, but everywhere else revert to the default behaviour of concatenating the target and custom schema names. So that Stripe charges query would be:

  • stripe.charges on prod
  • dbt_markm_stripe.charges for all other environments
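A sketch of that first attempt looks something like the following (illustrative rather than our exact code; it assumes prod is the environment name used for deployed runs):

```sql
-- macros/generate_schema_name.sql — first attempt (illustrative sketch)
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- elif target.name == 'prod' -%}
        {# deployed runs: just the custom schema, eg "stripe" #}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {# everywhere else: dbt's default concatenation, eg "dbt_markm_stripe" #}
        {{ target.schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```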

This did make it easy to see what the final schema name would be, but it ended up creating a lot of clutter in the schema list. We have a lot of data sources, so even with just three initial dbt users it became a real hassle to scroll through dozens of dbt_markm_xyz schemas when looking for something.

So our final solution for testing environments is to modify not only the schema name but also the model name. dbt allows this through a generate_alias_name() macro, which you create to override the inbuilt one. We use it to prefix the custom schema name onto the table name and achieve naming like:

  • stripe.charges on prod
  • dbt_markm.stripe__charges on testing environments
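Here’s a hedged sketch of what such an alias macro can look like (illustrative, not our exact code; it assumes the configured custom schema is available as node.config.schema and that testing environment names start with “testing”, as described below):

```sql
-- macros/generate_alias_name.sql — illustrative sketch
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
    {%- set model_name = custom_alias_name | trim if custom_alias_name is not none else node.name -%}
    {%- if target.name.startswith('testing') and node.config.schema is not none -%}
        {# testing runs: prefix the custom schema, eg "stripe__charges" #}
        {{ node.config.schema | trim }}__{{ model_name }}
    {%- else -%}
        {{ model_name }}
    {%- endif -%}
{%- endmacro %}
```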

So while we’re testing, everything gets built in our target schemas (dbt_markm for me), which avoids cluttering the schema list.

The code for the macros we use is below. As we may test on our production as well as our development environments, we prefix “testing_” to the environment names (eg we have prod, testing_prod, dev and testing_dev environments). You’ll see in the macro code that the start of the environment name is checked to tell which type of environment it is.
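As a rough guide, the sketch below shows how a schema macro can pair with the alias macro above, under the assumption that any environment whose name starts with “testing” gets the testing naming and everything else gets the deployed naming (this is illustrative of the approach rather than our exact code):

```sql
-- macros/generate_schema_name.sql — final approach (illustrative sketch)
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if target.name.startswith('testing') or custom_schema_name is none -%}
        {# testing runs: everything lands in the target schema, eg "dbt_markm" #}
        {{ target.schema }}
    {%- else -%}
        {# deployed runs (prod, dev, etc): just the custom schema, eg "stripe" #}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```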

We’re growing fast here at ManyPets 🚀. This means there are a lot more data challenges to solve, and we’d love your help to do it 😄. See our careers page to come join us!
