dbt-metalog: Your Metadata's Catalog for dbt

Bruno Souza de Lima
Indicium Engineering

--

Have you ever found yourself jotting down essential business rules, questions, technical owners, or ToDos in a separate Excel sheet — far from your code —, only to forget about them later?

Do you struggle to keep track of all the metadata that’s critical for effective data management and analysis?

Worry no more! We have a solution — dbt-metalog.

https://github.com/techindicium/dbt-metalog

dbt-metalog is the perfect solution for all your metadata documentation needs. This package enables you to document all the important information right within your models, ensuring that you never miss a beat.

With dbt-metalog, you can create highly organized, light, customizable models, making it easier to utilize and manage your data. Say goodbye to the mess of disorganized Excel sheets and hello to a streamlined and efficient data management process.

You can easily create models for

  • ✅ Business questions.
  • ✅ Tech owners.
  • ✅ Requesting areas/persons.
  • ✅ Date the model was created.
  • ✅ ToDo’s.
  • ✅ Any metadata you want…

You have total flexibility to choose the metadata you want in your model. Choose it by:

  • ✅ name.
  • ✅ resource type.
  • ✅ file.

Content

What do I need to use it?

To use dbt-metalog, you need dbt version 1.0.0 or higher

dbt-version ≥ 1.0.0

and one of the supported adapters:

dbt-bigquery
dbt-databricks
dbt-postgres
dbt-redshift
dbt-snowflake

How do I install it?

1. Include dbt-metalog in your packages.yml file

packages:
- package: techindicium/metalog
version: 1.0.0

2. And run dbt deps to install it.

As simple as that!

How do I use it?

Using dbt-metalog is easy too.

This {{ create_metadata_model() }} generates SQL for creating customizable tables or views from the metadata of your nodes and sources. You have the flexibility to select the specific metadata you want to include in your model. If a node does not contain the specified metadata, it will be displayed as “Undefined”, but you can alter this default text to your preference.

Every time you see “node” consider also “sources”

The macro will check the metadata defined in your nodes. Which can be

  • Models
  • Sources
  • Seeds
  • Snapshots
  • Tests
  • Analyses
  • Macros

1. Include metadata in your resource's configuration. Wait… Don't you know how to do that? Check the documentation here!

Let's suppose we have these two models and this seed:

--dummy_model_1.sql
{{ config(
meta={
'main_subject': 'sales'
, 'owner': 'Alice'
, 'business_questions': [
'How many stores of type ...?'
, 'How many stores in ...?'
]
, 'business_rules': [
'Stores of type A receive code B ...'
, 'Consider only stores open after ...'
]
, 'todos': 'change to incremental'
}
)}}

select 1 as dummy
-- dummy_model_2.sql
{{ config(
meta={
'main_subject': 'people'
, 'owner': 'Bob'
, 'business_questions': 'How many employees ...?'
}
)}}

select *
from {{ source('raw', 'dummy_raw') }}
-- dummy_seed.yml
version: 2

seeds:
- name: 'dummy_seed'
description: "stores"
config:
meta: {
'main_subject': 'sales'
, 'owner': 'Carl'
}

columns:
- name: "seed_column"
description: "the description of the dummy seed column of dummy_seed"

⚠️ Warning: Currently this package does not supports dicts in the meta config, just single values or lists.

2. Now you just have to create a model for your project using the create_metadata_model() macro, selecting the specific metadata you want to include in your table, such as

{{ metalog.create_metadata_model(
metadata = [
"main_subject"
, "owner"
, "business_questions"
, "business_rules"
, "todos"
]
)}}

The metadata you want to include must be passed inside a list as the value of the metadata argument.

3. dbt run, and you will get this result:

More customization

granularity

If you want to separate some specific metadata into different rows, you can use the optional argument granularity

{{ metalog.create_metadata_model(
metadata = [
"main_subject"
, "owner"
, "business_questions"
, "business_rules"
, "todos"
]
, granularity = [
"business_questions"
]
)}}

You can pass more than one metadata inside the list

{{ metalog.create_metadata_model(
metadata = [
"main_subject"
, "owner"
, "business_questions"
, "business_rules"
, "todos"
]
, granularity = [
"business_questions"
, "business_rules"
, "todos"
]
)}}

resource_type

You can ask the macro to include metadata of more resource types with the resource_typeargument. Let's include seeds along with models.

{{ metalog.create_metadata_model(
metadata = [
"main_subject"
, "owner"
, "business_questions"
, "business_rules"
, "todos"
]
, granularity = [
"business_questions"
, "business_rules"
, "todos"
]
, resource_type = [
"model"
, "seed"
]
)}}

If you do not specify any resource_type, the default is model. The available types are:

  • 'model'
  • 'source'
  • 'seed'
  • 'snapshot'
  • 'test'
  • 'analysis'
  • 'macro'

show_resource_type

If you want to, you can hide the resource_type column from your model setting the show_resource_type to False

{{ metalog.create_metadata_model(
metadata = [
"main_subject"
, "owner"
, "business_questions"
, "business_rules"
, "todos"
]
, granularity = [
"business_questions"
, "business_rules"
, "todos"
]
, resource_type = [
"model"
, "seed"
]
, show_resource_type = False
)}}

undefined

You can override the default ‘Undefined’ string with undefined.

{{ metalog.create_metadata_model(
metadata = [
"main_subject"
, "owner"
, "business_questions"
, "business_rules"
, "todos"
]
, granularity = [
"business_questions"
, "business_rules"
, "todos"
]
, resource_type = [
"model"
, "seed"
]
, show_resource_type = True
, undefined = "Not defined"
)}}

undefined_as_null

You can also set the undefined metadata to appear as null values with undefined_as_null.

{{ metalog.create_metadata_model(
metadata = [
"main_subject"
, "owner"
, "business_questions"
, "business_rules"
, "todos"
]
, granularity = [
"business_questions"
, "business_rules"
, "todos"
]
, resource_type = [
"model"
, "seed"
]
, show_resource_type = True
, undefined = "Not defined"
, undefined_as_null = True
)}}

files

You can select the files you want to include (files) or to exclude (exclude_files). They are defined by regex patterns.

{{ metalog.create_metadata_model(
metadata = [
"main_subject"
, "owner"
, "business_questions"
, "business_rules"
, "todos"
]
, granularity = [
"business_questions"
, "business_rules"
, "todos"
]
, resource_type = [
"model"
, "seed"
, "source"
]
, show_resource_type = True
, undefined = "Not defined"
, undefined_as_null = True
, files = [
"models/.*"
, "seeds/.*"
, ".*/source.yml"
]
, exclude_files = [
'models/metadata_tests/.*'
'models/description_tests/.*'
]
)}}

Not only metadata

With dbt-metalog you don't have to be limited to metadata, we are improving it to new capabilities.

{{ create_description_model() }}

Currently, you can also use the {{ create_description_model() }} macro.

The usage is very similar, and you have the following arguments, which have the same behavior as in the {{ create_metadata_model() }} macro.

  • resource_type (optional) (default = ['model']): A list of the resource types you want to read the metadata from.
  • show_resource_type(optional) (default = True): A boolean to show or hide the resource_type column in your resulting model.
  • files(optional) (default = []): A list of regex specifying the files to include, e.g. If you want to include all files in models, then files=['models/.*'].
  • exclude_files(optional) (default = []): A list of regex specifying the files to exclude, e.g. If you want to exclude all staging files, then files=['.*stg_.*'].

Example:

Given the following .yml files:

version: 2

models:
- name: 'dummy_model_1'

columns:
- name: "dummy"
description: "the description of the dummy column of dummy_model_1"y
version: 2

models:
- name: 'dummy_model_2'
description: "description of dummy_model_2"

columns:
- name: "dummy"
description: "the description of the dummy column of dummy_model_2"

Create a model with the {{ create_description_model() }} macro.

{{ metalog.create_description_model(
resource_type = [
"model"
, "seed"
, "source"
]
, show_resource_type = True
, files = [
'.*.sql'
]
, exclude_files = [
'models/metadata.*'
]
)}}

When you run it, the result will be:

Conclusion

dbt-metalog is a must-have package for anyone who wants to say goodbye to the days of searching through cluttered Excel sheets for important metadata and hello to a world of organized data management.

If you’re feeling adventurous, try it out and see just how simple managing your metadata can be.

And don’t forget, we’re always here to help. If you need a hand or have a suggestion, feel free to reach out.

You can also raise any issues you encounter on the Github repository.

Let’s make dbt-metalog the best it can be, together!

--

--

Bruno Souza de Lima
Indicium Engineering

https://www.linkedin.com/in/brunoszdl/ #dbt #sql #snowflake #bigquery #databricks #analytics #analyticsengineer #data #elt