Using environments with SQL pipelines

Josie Hall
Dataform
Published in
3 min readJul 30, 2020

A recommended practice for any software development process is the use of multiple deployment environments. Using multiple environments ensures that your code is well tested before it is deployed to production and made available to the end user.

An example setup would have three main environments:

  • Development: The development environment is your local computer. Here, is where you deploy code and test any newly implemented features. Any bugs found are dealt with before re-deploying for further testing.
  • Staging: The staging environment is as similar to the production environment as it can be. You’ll have all of the code on a server this time instead of a local machine. Any configuration changes or major version updates can be tested here.
  • Production: Every time you talk about making your project live, this is the environment you are talking about. The production environment is where users access the final code after all of the updates and testing.

At Dataform, we believe that analytics should follow software engineering best practices and therefore SQL based pipelines should be subject to the same rigorous testing process.

Environments in Dataform

By default, Dataform runs all of your project code from your project’s master Git branch. Configuring environments allows you to control this behaviour, enabling you to run multiple different versions of your project code and run schedules against them.

An environment is effectively a wrapper around a version of your project code. Once you have defined an environment, and added some schedules to that environment, Dataform runs all of those schedules using the environment’s version of the project code.

Using separate databases for development and production data

Some teams may not be at the stage where they require a staging environment, but still would like to keep development and production data separated. A clean and simple way to do this is to use a different database for each environment. This can be done using a <configOverride> in the production environment.

In the example below:

  • any code deployed during development will use <defaultDatabase> from the dataform.json file
  • schedules are defined in the production environment, and so use the <defaultDatabase>from that environment's <configOverride>
  • the production environment specifies the master Git branch, so all of its schedules will run using the latest version of the code

dataform.json:

environments.json:

Note that the <defaultDatabase> setting is only supported for BigQuery and Snowflake. For other warehouses, we recommend overriding schema suffixes (read the docs here).

Managing a production release process using environments

For more advanced Dataform projects a common use-case for environments is to run a staged release process. All changes to project code go into a staging environment which is intentionally kept separate from production. Once the code in staging has been verified to be sufficiently stable, that version of the code is then promoted to the production environment.

Read more about how to configure environments in our docs.

As always if you have any feedback on features please message us on Slack or email us at team@dataform.co

Originally published at https://dataform.co.

--

--