One of Dataform’s key motivations has been to bring software engineering best practices to teams building ETL/ELT pipelines. To further that goal, we recently launched support for you to run Continuous Integration (CI) checks against your Dataform projects.
What is CI/CD?
CI/CD is a set of processes which aim to help teams ship software quickly and reliably.
Continuous integration (CI) checks automatically verify that each change to your code works as expected. They typically run before that change is merged into your Git master branch, which helps keep the code on the master branch working correctly at all times.
Continuous deployment (CD) tools automatically (and frequently) deploy the latest version of your code to production. This is intended to minimize the time it takes for new features or bugfixes to be available in production.
CI/CD for Dataform projects
Dataform already does most of the CD gruntwork for you. By default, all code committed to the master branch is automatically deployed. For more advanced use cases, you can use environments to configure exactly what is deployed, and when.
CI checks, however, are typically configured in your Git repository itself. That repository is often hosted on GitHub, although Dataform also supports other Git hosting providers.
How to configure CI checks
Dataform distributes a Docker image which can be used to run the equivalent of Dataform CLI commands. For most CI tools, this Docker image is what you’ll use to run your automated checks.
If you host your Dataform Git repository on GitHub, you can use GitHub Actions to run CI workflows. This post assumes you’re using GitHub Actions, but other CI tools are configured in a similar way.
Here’s a simple example of a GitHub Actions workflow for a Dataform project. Once you put this in a `.github/workflows/<some filename>.yaml` file, GitHub will run the workflow on every pull request targeting your master branch and on every push to master.
```yaml
name: CI

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  compile:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Install project dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      - name: Run dataform compile
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: compile
```
This workflow runs `dataform compile`, so if the project fails to compile, the workflow fails and the failure is reflected in the GitHub UI (for example, as a failed check on the pull request).
Note that you can run any `dataform` CLI command in a CI workflow. However, some commands need credentials in order to run queries against your data warehouse. In these cases, encrypt those credentials and commit only the encrypted file to your Git repository. Then, in your CI workflow, decrypt the credentials so that the Dataform CLI can use them.
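A minimal sketch of that encrypt/decrypt pattern follows, under a few assumptions. It assumes your credentials live in `.df-credentials.json` (the file `dataform init-creds` writes by default), that you encrypted it locally with something like `gpg --symmetric --cipher-algo AES256 .df-credentials.json`, and that you committed only the resulting `.df-credentials.json.gpg`. The passphrase is stored as a GitHub Actions secret. The secret name `CREDENTIALS_GPG_PASSPHRASE` and the workflow and job names are hypothetical. The last step runs `dataform test`, which needs warehouse access, but any other credentialed CLI command could be substituted.

```yaml
name: CI with credentials

on:
  pull_request:
    branches:
      - master

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      # Decrypt the committed .df-credentials.json.gpg back into .df-credentials.json.
      # CREDENTIALS_GPG_PASSPHRASE is a hypothetical secret name: create it in your
      # repository settings, holding the passphrase you used when encrypting.
      - name: Decrypt data warehouse credentials
        run: gpg --quiet --batch --yes --decrypt --passphrase="$CREDENTIALS_GPG_PASSPHRASE" --output .df-credentials.json .df-credentials.json.gpg
        env:
          CREDENTIALS_GPG_PASSPHRASE: ${{ secrets.CREDENTIALS_GPG_PASSPHRASE }}
      - name: Install project dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      # Assumes the CLI reads .df-credentials.json from the checked-out workspace,
      # which Docker-based steps see as their working directory.
      - name: Run dataform test
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: test
```

The decrypted file exists only on the CI runner for the duration of the job. The plaintext credentials therefore never need to be committed. Make sure `.df-credentials.json` itself is listed in your `.gitignore`.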
For further details on configuring CI/CD for your Dataform projects, please see our docs. As always, if you have any questions, or would like to get in touch with us, please send us a message on Slack!
Originally published at https://dataform.co.