Centralize and Serve your dbt Documentation in Google Cloud

A comprehensive guide to securely deploy and update your dbt documentation within Google Cloud using Cloud Build, App Engine, and Identity Aware Proxy (IAP).

Axel Thevenot 🐣
Google Cloud - Community

--

Let’s be honest, documentation often takes a back seat to other immediate tasks. However, overlooking documentation can result in confusion and inefficiency down the road. Prioritizing the documentation deployment of your dbt projects within Google Cloud is an investment that yields long-term benefits.

Whether you are managing a mono-repository or multiple repositories for dbt, it is a best practice to centralize your documentation. This is the idea which will be our starting point in this tutorial.

This guide will assist you in setting up your documentation within Google Cloud following industry best practices, saving you hours of research. So regardless of whether you have adopted a centralized or decentralized approach in your dbt architecture, this guide is for you!

Note: I am not an “evil” advocate of manual deployment, but some parts of this guide are deployed manually because they are one-time setups that do not need full automation. Feel free to use Terraform or other deployment tools according to your needs (though it is not really necessary here).

Summary

Step 1: Identify your need — mono or multi repository
Step 2: Set up your dbt project for documentation (multi-repo only)
Step 3: Create an App Engine application
Step 4: Enhance Security with Identity Aware Proxy
Step 5: Automate Deployment with Cloud Build
Step 6: Schedule and Trigger your Documentation updates (multi-repo only)
Step 7: (bonus) Improve your documentation

Step 1: Identify your need — mono or multi repository

As mentioned earlier, it is optimal to centralize your dbt documentation, even if your dbt projects are decentralized across domains, teams, or layers in multiple repositories.

If you have a single dbt project (mono-project), this tutorial will guide you in building the following architecture to deploy, serve, and secure your dbt documentation.

dbt documentation architecture for mono-repository (image from author)

This architecture simplifies deploying, serving, and securing the documentation, ensuring consistency and ease of management. With App Engine, the documentation is deployed as a scalable web application, and Identity-Aware Proxy adds an extra layer of security.

If your use case is more complex, consider the following architecture.

dbt documentation architecture for multi-repository (image from author)

For the multi-repository architecture, documentation from multiple independent dbt repositories is centralized into a single project. This setup uses Google Cloud Pub/Sub and Cloud Build services for efficient integration and automation of documentation updates. With Pub/Sub, custom triggers can be implemented for scheduled or event-based documentation updates, while Cloud Build automates the deployment process.

This architecture is recommended if you meet any of the following criteria:

  • You manage multiple independent dbt repositories.
  • You want to build documentation based on custom triggers (scheduled or perhaps triggered at the end of your DAG runs to update metadata).
  • You currently have a mono-repository but anticipate a future split and want to be prepared.

Take the time to determine which architecture suits your needs before following this tutorial. I will refer to the first architecture as the “mono-repo” and the second architecture as the “multi-repo.”

Note: Certain parts of this tutorial are specific to the multi-repo architecture.

Note: You can organize your projects as you want. As long as one project centralizes all projects as packages, you will be able to complete this tutorial.

Step 2: Set up your dbt project for documentation (multi-repo only)

For a multi-repo setup, create an additional dbt project to centralize documentation from your other projects/repositories.

I will assume you use GitHub, but the process is the same for other repository managers. Start by creating a new empty repository and cloning it locally.

git clone <your-repository-url>
cd <your-repository>

# Create empty files to complete during this part.
touch dbt_project.yml
touch profiles.yml
touch packages.yml

# For the next parts
touch app.yaml
touch cloudbuild.yaml
mkdir documentation && touch documentation/overview.md

In your dbt_project.yml file, copy the following.

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_documentation'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_documentation'

docs-paths: ['documentation'] # For bonus part.

clean-targets: # directories to be removed by `dbt clean`
- 'target'
- 'dbt_packages'

Then you can fill in your profiles.yml file, either by reusing an existing connection profile (run cat ~/.dbt/profiles.yml in your terminal to see it) or by creating one for your current data platform.

Here is a simple example for a BigQuery connection.

dbt_documentation:
  outputs:
    dev:
      dataset: dbt_temp
      job_execution_timeout_seconds: 300
      job_retries: 0
      location: EU
      method: oauth
      priority: interactive
      project: <project-id>
      threads: 1
      type: bigquery
  target: dev

And to finish, just copy and paste the following in your packages.yml file.

# Set here the dependencies containing all your repositories
# To configure your packages: https://docs.getdbt.com/docs/build/packages
packages:
  - package: AxelThevenot/dbt_assertions
    version: 1.2.0
  - package: AxelThevenot/dbt_star
    version: 0.1.1

Note: No need to fill with your packages yet. You can do it later when the documentation is set and fully automated.

Note: Those two are my packages (auto-promo hehe!). See dbt_assertions and dbt_star.
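When you later replace these placeholders with your own repositories, dbt also accepts git packages (a `git` URL plus a `revision`). The helper below is a minimal sketch that prints such a packages.yml. The function name `render_packages_yml` and the repository URLs in the example are hypothetical, and private repositories would additionally need credentials available at build time, which is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a packages.yml that pulls each dbt repository
# as a git package pinned to a revision (branch, tag, or commit).
render_packages_yml() {
  local revision="$1"
  shift
  echo "packages:"
  local url
  for url in "$@"; do
    echo "  - git: \"${url}\""
    echo "    revision: ${revision}"
  done
}

# Example (placeholder URLs; replace with your own repositories):
# render_packages_yml main \
#   "https://github.com/my-org/dbt-merchandise.git" \
#   "https://github.com/my-org/dbt-supply.git" > packages.yml
```

Pinning every repository to the same revision keeps the sketch short; in practice you may prefer a tag per repository.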

Before committing your changes, check that the following runs without any error.

dbt deps && dbt compile && dbt clean

Step 3: Create an App Engine application

Now comes the exciting part — deploying your dbt documentation using Google Cloud App Engine. App Engine provides a scalable platform for hosting web applications and services, making it an ideal choice for serving dbt documentation.

First, create an app.yaml file at the root of your repository with the following content.

service: default
runtime: python311
handlers:
  # Maps the root URL (/) to the static file target/index.html.
  # It specifies that the file should be both served statically and uploaded.
  - url: /
    static_files: target/index.html
    upload: target/index.html
  # Specifies that any other matching URL should look for files in the target directory.
  - url: /
    static_dir: target
  # Fallback handler.
  # Matches all other URLs (/.*) to your application.
  # It enforces secure connections and issues a permanent redirect (redirect_http_response_code: 301).
  # `script: auto` asks App Engine to automatically determine how to handle the request.
  - url: /.*
    secure: always
    redirect_http_response_code: 301
    script: auto

Then it is time to create your application. It is a one-time setup, so you can do it manually from the Console or by running the gcloud app create command.

To create your App Engine application from the console:

When the application is up and running, you can run the following.

# Generate the documentation.
dbt clean && dbt deps && dbt docs generate

# Deploy the documentation.
gcloud app deploy

Note: To deploy on App Engine, you must have the App Engine Deployer and Storage Object User roles. (IAM on the Console)
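Since App Engine only serves whatever is in the target directory, it can be worth checking that the generation step actually produced the site before deploying. The sketch below assumes the three files `dbt docs generate` writes (index.html, manifest.json, catalog.json). The function name `check_docs_artifacts` is illustrative only.

```shell
#!/usr/bin/env bash
# Return 0 only if the files the static docs site needs are present
# and non-empty in the target directory (defaults to ./target).
check_docs_artifacts() {
  local target_dir="${1:-target}"
  local missing=0
  local file
  for file in index.html manifest.json catalog.json; do
    if [ ! -s "${target_dir}/${file}" ]; then
      echo "missing or empty: ${target_dir}/${file}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example: only deploy when the check passes.
# check_docs_artifacts target && gcloud app deploy
```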

You can check that everything is running as expected with the following command, which opens the App Engine service you just created in your browser.

gcloud app browse

Note: At this point, your documentation is publicly accessible!

Step 4: Enhance Security with Identity Aware Proxy

Security should always be a top priority. Integrate Google Cloud Identity-Aware Proxy (IAP) to add an extra layer of security to your dbt documentation. With IAP, you can configure a consent screen, and bind roles to authorized users/groups in the IAM Console to ensure only authorized users can view sensitive information.

  • Go to App Engine Settings.
  • Under the Identity-Aware Proxy section, click on CONFIGURE NOW.
  • Click on ENABLE API if Identity-Aware Proxy API is not yet enabled.
  • Click on GO TO IDENTITY-AWARE PROXY. It will redirect you to the IAP Console.
  • Click on CONFIGURE CONSENT SCREEN so you can use IAP with the OAuth consent screen.
  • Click on “external” or “internal” according to your needs.
  • Give a name (dbt_documentation for instance) to your consent screen and fill in the required email fields. Leave everything else as default.

When IAP is enabled and configured, add this security layer to your App Engine application.

  • Go back to the IAP page on the Google Cloud Console.
  • Locate the toggle button on the App Engine app row and switch it on, then click TURN ON.

At this point nobody is authorized to access the documentation served by App Engine (including you).

  • Click on the App Engine app row to open the IAM side panel.
  • Click on ADD PRINCIPAL.
  • Bind the IAP-secured Web App User role to the groups, domains or members of your choice.
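The same binding can be done from the terminal. This is a sketch, assuming the role ID `roles/iap.httpsResourceAccessor` (the ID behind IAP-secured Web App User) and a placeholder group address. The wrapper name `grant_docs_access` and the `GCLOUD` override are conveniences of this example, not part of gcloud.

```shell
#!/usr/bin/env bash
# Grant the "IAP-secured Web App User" role on the App Engine app.
# Set GCLOUD=echo to print the command instead of running it.
grant_docs_access() {
  local member="$1"
  "${GCLOUD:-gcloud}" iap web add-iam-policy-binding \
    --resource-type=app-engine \
    --member="${member}" \
    --role="roles/iap.httpsResourceAccessor"
}

# Example (placeholder group):
# grant_docs_access "group:data-team@my-domain.com"
```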

Now everything is running and is secured through IAP. You can verify your application is secured with an Incognito Window.
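The Incognito check can also be scripted. The sketch below rests on an assumption about IAP behavior: an unauthenticated browser-style request is answered with a 302 redirect to Google sign-in, whereas an unprotected app answers 200 with the docs. The function names are hypothetical, and the URL in the example is a placeholder for your own App Engine URL.

```shell
#!/usr/bin/env bash
# Interpret the response to an unauthenticated request.
# Assumption: IAP redirects (302) to accounts.google.com; 401/403 are also
# treated as "not publicly readable"; 200 means the docs are public.
classify_response() {
  local status="$1" location="$2"
  case "${status}" in
    302)
      case "${location}" in
        https://accounts.google.com/*) echo "protected" ;;
        *) echo "unknown" ;;
      esac
      ;;
    401|403) echo "protected" ;;
    200) echo "public" ;;
    *) echo "unknown" ;;
  esac
}

# Fetch the status code and redirect target without following the redirect.
check_url() {
  local url="$1" result
  result="$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' "${url}")"
  classify_response "${result%% *}" "${result#* }"
}

# Example (placeholder URL):
# check_url "https://<your-app-url>"
```

Anything other than "protected" is worth a manual look in the IAP page of the Console.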

Step 5: Automate Deployment with Cloud Build

It is time to automate your deployment process using Cloud Build, saving time and effort. With Cloud Build, you can automatically build, test, and deploy your dbt documentation whenever changes are made, or whenever you want with your custom triggers and events.

It is as simple as creating a cloudbuild.yaml file with the following content, which reproduces what we ran from the terminal just above.

steps:
  - id: '[dbt] Generate documentation'
    name: 'ghcr.io/dbt-labs/dbt-bigquery:1.7.2'
    entrypoint: /bin/bash
    args:
      - -ceux
      - |
        # cd custom/dbt/documentation/folder/
        dbt clean
        dbt deps
        dbt docs generate  # add "--empty-catalog" flag if you don't need metadata.

  - id: '[App Engine] Deploy documentation'
    name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: /bin/bash
    args:
      - -ceux
      - |
        # cd custom/appengine/configuration/folder/
        gcloud app deploy

All that remains is to create the Cloud Build trigger.

  • Go to Cloud Build Triggers.
  • Click on + CREATE TRIGGER.
  • Give it a name (dbt-documentation-trigger for instance) and select your preferred region.
  • (mono-repo architecture) Under the Event section, select Push to a branch.
  • (multi-repo architecture) Under the Event section, select Pub/Sub message. Under the Subscription section, click on the topic selection and CREATE TOPIC. Give it a name (dbt-documentation-topic for instance).
  • Under the Source section, connect to your dbt project repository by clicking on CONNECT NEW REPOSITORY and following the side panel instructions for your source code management provider (GitHub, Bitbucket, …).
  • (mono-repo architecture) Under the Source > Branch section, set ^main$ according to your production branch name.
  • (multi-repo architecture) Under the Source > Revision section, select branch revision and set main according to your production branch name.
  • You can leave everything else as default.
  • Click on CREATE.

Note: You can have one trigger on push events to your branch and one trigger based on Pub/Sub events at the same time. If both are needed, simply repeat the process above.

Now everything is configured except for the role binding. Go to the IAM Console and grant your Cloud Build service account the following roles:
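For the mono-repo case, the push trigger can also be created from the terminal. This is only a sketch: it assumes a GitHub repository that is already connected to Cloud Build, reuses the trigger name suggested above, and takes the owner and repository names as arguments. The wrapper name `create_push_trigger` and the `GCLOUD` override are specific to this example.

```shell
#!/usr/bin/env bash
# Create a push-to-branch trigger for the mono-repo case.
# Set GCLOUD=echo to print the command instead of running it.
create_push_trigger() {
  local owner="$1" repo="$2" branch_pattern="${3:-}"
  # Default to the production branch pattern used in this guide.
  [ -n "${branch_pattern}" ] || branch_pattern='^main$'
  "${GCLOUD:-gcloud}" builds triggers create github \
    --name="dbt-documentation-trigger" \
    --repo-owner="${owner}" \
    --repo-name="${repo}" \
    --branch-pattern="${branch_pattern}" \
    --build-config="cloudbuild.yaml"
}

# Example (placeholder owner and repository):
# create_push_trigger "my-org" "dbt-documentation"
```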

  • App Engine Deployer
  • App Engine Service Admin
  • Storage Object User
  • BigQuery Data Viewer (depending on your underlying database)

Note: The default service account for Cloud Build is <project-number>@cloudbuild.gserviceaccount.com.
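If you prefer the terminal to the IAM Console, the four roles above can be bound in a loop. This is a sketch under assumptions: the role IDs are the ones I understand to match the role names listed above, and the project ID and number are passed in by you. The function name `grant_cloud_build_roles` and the `GCLOUD` override are illustrative.

```shell
#!/usr/bin/env bash
# Bind the roles listed above to the default Cloud Build service account.
# Set GCLOUD=echo to preview the commands without running them.
grant_cloud_build_roles() {
  local project_id="$1" project_number="$2"
  local member="serviceAccount:${project_number}@cloudbuild.gserviceaccount.com"
  local role
  for role in \
    roles/appengine.deployer \
    roles/appengine.serviceAdmin \
    roles/storage.objectUser \
    roles/bigquery.dataViewer; do
    "${GCLOUD:-gcloud}" projects add-iam-policy-binding "${project_id}" \
      --member="${member}" \
      --role="${role}"
  done
}

# Example (placeholders):
# grant_cloud_build_roles "<project-id>" "<project-number>"
```

Drop or swap the BigQuery role if your underlying database is different.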

You can trigger your Cloud Build Trigger manually from the Console to see if everything is working as expected.

Step 6: Schedule and Trigger your Documentation updates (multi-repo only)

Now everything is in place. You are free to trigger your documentation build in many ways, including from:

  • other Cloud Build deployments on your domains when the production is updated.
  • your daily job running in Cloud Composer or Cloud Workflows.
  • Cloud Scheduler with a custom schedule.
  • any custom API POST to your Pub/Sub topic.

dbt documentation architecture for multi-repository (image from author)

Of course, I cannot provide code for every use case, but here are two examples:

  • a daily Cloud Scheduler job.

gcloud scheduler jobs create pubsub dbt-documentation-daily-job \
  --schedule="0 9 * * *" \
  --topic="dbt-documentation-topic" \
  --message-body="{}"

  • a programmatic Pub/Sub message publication, which you can add at the end of the Cloud Build triggers of your domains, for instance.

gcloud pubsub topics publish dbt-documentation-topic --message="{}"
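When several domains publish to the same topic, it may help to tag each message with its origin. The sketch below adds a message attribute for that purpose. The attribute key `source`, the function name `request_docs_rebuild`, and the `GCLOUD` override are arbitrary choices for this example, and nothing in the setup above depends on the attribute.

```shell
#!/usr/bin/env bash
# Publish a rebuild request tagged with the domain that asked for it.
# Set GCLOUD=echo to print the command instead of running it.
request_docs_rebuild() {
  local source_domain="$1"
  "${GCLOUD:-gcloud}" pubsub topics publish dbt-documentation-topic \
    --message="{}" \
    --attribute="source=${source_domain}"
}

# Example (placeholder domain name):
# request_docs_rebuild "supply"
```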

Step 7: (bonus) Improve your documentation

To improve your documentation, nothing can beat a good landing page! The default dbt landing page you can see in your dbt documentation is not really helpful for your users.

Think about a better landing page where you describe your context, how your domains are organized, maybe some best practices, or even a contact email. It really depends on what is relevant in your context.

In your dbt_project.yml file, you can add the following to tell dbt where to find the Markdown documentation.

# dbt_project.yml file

# [...]

docs-paths: ["documentation"]

# [...]

Then, in documentation/overview.md, write everything your users will love!

{% docs __overview__ %}
# This is the first title of what appears on my home page
Here I can write anything in Markdown to help my users understand
the context and how to navigate the different domains.

| Domain | Contact Email |
| ----------- | ------------------------------- |
| merchandise | merchandise-group@my-domain.com |
| supply | supply-group@my-domain.com |
| product | product-group@my-domain.com |

{% enddocs %}

{% docs __dbt_assertions__ %}
# This is the overridden home page of the dbt_assertions package

[dbt-assertions](https://hub.getdbt.com/AxelThevenot/dbt_assertions)
offers robust row-level data quality checks, enhancing downstream model reliability.
It provides efficient exception detection,
flagging specific rows failing assertions for easy resolution.


{% enddocs %}

{% docs __dbt_star__ %}
# This is the overridden home page of the dbt_star package

[dbt-star](https://hub.getdbt.com/AxelThevenot/dbt_star)
The perfect package to `SELECT *` without `SELECT *`.
{% enddocs %}

Note: You can set your custom project-level overviews either in the documentation project or in the original project. More on custom overviews.

Conclusion

Follow these steps to efficiently serve your dbt documentation within Google Cloud. Prioritize documentation, set up the environment, deploy with App Engine, enhance security with IAP, automate with Cloud Build, and optionally improve documentation. In approximately an hour, your dbt documentation will be up and running, ready to support your data engineering workflows.

In the meantime, feel free to share this article if you found it helpful, give it a clap, leave a comment, subscribe, or follow me on LinkedIn.

And thank you for sticking with me through this lengthy read!
