Elementary data for observability within a dbt project

Tsebek Badmaev
Published in Rock Your Data
5 min read · Mar 26, 2024

One of dbt's greatest strengths is its large ecosystem of external packages and how easily you can integrate them into your project. Elementary is one of those packages; it helps you implement data observability and alerting.

Let’s start with a brief overview of observability. Here is a great YouTube video that explains the definition and key concepts. Observability is crucial for understanding the health and performance of data systems, and it involves:

  • Monitoring
  • Tracking
  • Analyzing data

dbt project initialization

Since Elementary is a dbt package, we start by initializing a dbt project; if you already have one, you can skip this step. Project initialization begins with setting up a Python virtual environment. There are many guides on how to set up a dbt project, so we won’t dwell on it here, as it is not the focus of this article. I just want to point out that you need some models and tests in the project, otherwise the Elementary report will have no data to show.

Elementary installation

Elementary comes in two versions: Cloud and open source (OSS). We chose the OSS version. The official website has a great step-by-step guide with text descriptions and videos.

Screenshot from the Elementary official website

The first part is the Elementary dbt package installation.

1. Add Elementary to packages.yml. I already had some packages listed, so I just appended the Elementary entry to the list.
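
Here is a minimal sketch of what that entry might look like. The package name is the one published on the dbt package hub; the version number is an assumption, so check the hub for the latest release compatible with your dbt version.

```yaml
packages:
  # ...any packages you already use stay here...
  - package: elementary-data/elementary
    version: 0.14.1  # assumption: replace with the latest release on the dbt package hub
```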

2. Add the Elementary models configuration to your dbt_project.yml. This gives the Elementary models their own schema.
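
A sketch of that configuration, following the Elementary quickstart. If your dbt_project.yml already has a models key, the elementary block goes under it. With dbt's default schema naming, the models end up in a schema called <your_schema>_elementary.

```yaml
models:
  # ...your existing model configuration stays here...
  elementary:
    # build the Elementary models in their own schema: <your_schema>_elementary
    +schema: "elementary"
```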

3. Import the package and build Elementary models using the following commands:

# install the dependencies listed in packages.yml
dbt deps
# run dbt models (dbt run --select elementary builds only the Elementary models)
dbt run

4. Validate the installation by running some tests:

# run dbt tests
dbt test

After the tests finish, we can check the results in the Elementary tables, for example elementary_test_results.

The second part is the Elementary CLI installation.

To connect to the data warehouse, the Elementary CLI needs a connection profile in a file named profiles.yml. You can generate the profile by running the following command:

dbt run-operation elementary.generate_elementary_cli_profile

Copy the output and add the profile to your profiles.yml, but use environment variables for sensitive information such as your account, username, and password.
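
As a rough sketch, a Snowflake profile that reads its credentials from environment variables could look like this. The variable names are the ones used in the export commands in this article; the role, warehouse, database, and schema values are hypothetical placeholders, and the output generated for your own project is the source of truth. env_var is dbt's built-in Jinja function for reading environment variables.

```yaml
elementary:
  outputs:
    default:
      type: snowflake
      # credentials are read from the environment instead of being hard-coded
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: my_role                   # hypothetical placeholder
      warehouse: my_warehouse         # hypothetical placeholder
      database: my_database           # hypothetical placeholder
      schema: my_schema_elementary    # hypothetical: the schema holding the Elementary models
      threads: 4
```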

You can set the environment variables once, for the duration of your CLI session:

# set environment variables for the current shell session (macOS)
export SNOWFLAKE_ACCOUNT=your_account
export SNOWFLAKE_USER=your_user
export SNOWFLAKE_PASSWORD=your_password

Or you can automate the environment variable setup. On macOS, follow these steps:

  • Navigate to your virtual environment’s directory. Inside, you’ll find a bin directory that contains scripts that are run when the virtual environment is activated or deactivated.
  • Find the activate script within the bin directory. This script is executed whenever you activate your virtual environment.
  • Edit the activate script to include the export commands above for your environment variables at the end of the file.

To install the monitor module run:

# install monitor module
pip install elementary-data
# based on your platform run one of the following commands (no need to run all)
pip install 'elementary-data[snowflake]'
pip install 'elementary-data[bigquery]'
pip install 'elementary-data[redshift]'
pip install 'elementary-data[databricks]'
pip install 'elementary-data[athena]'
## Postgres doesn't require this step

Run edr --help in the terminal to make sure the installation was successful. You should see the CLI usage message listing the available commands.

Elementary observability report generation

Now we can generate the report, which lets us visualize and explore the data in the Elementary dbt package tables: dbt test results, Elementary anomaly detection results, dbt artifacts, test runs, and so on.

Run the following command to generate the report as an HTML file:

edr report

Observability report overview

Let’s look at the main features of the report:

  • The main dashboard page: a data health overview with charts showing the number of tests, their executions, failures, and anomalies
  • Lineage of the dbt project
  • Test execution history, where we can see every test execution in detail and find frequently failing tests
  • Model duration, which shows the execution time of each model in the dbt project and lets us see failures and detect bottlenecks

Comparison of Open source version with Cloud version

The Cloud version setup is very similar to the OSS one. You still install the Elementary dbt package, but instead of installing the Elementary CLI, you create an account and connect your data warehouse.

Additional features in the cloud version:

  • Catalog, a data catalog that shows each table's description, column information, dependencies, and SQL query (typically stored in the target path of the dbt project)
  • Test configuration, which lets you create different types of tests
  • Alerts and rules, which let you create alerts in the UI

There are 3 types of tests in the test configuration:

  • Table tests: Freshness, Volume, Schema changes
  • Column tests: Unique, Not null, Accepted values, Relationship
  • Custom query test
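
For comparison, roughly the same kinds of tests can be declared by hand in a dbt YAML file. The sketch below assumes a hypothetical orders model (with order_id, status, customer_id, and updated_at columns) and a hypothetical customers model. The table-level tests come from the Elementary dbt package and the column-level ones are dbt built-ins; whether the Cloud UI generates exactly these tests is an assumption on my part.

```yaml
version: 2

models:
  - name: orders                          # hypothetical model
    tests:
      # table tests from the Elementary package
      - elementary.volume_anomalies
      - elementary.freshness_anomalies:
          timestamp_column: updated_at    # hypothetical column
      - elementary.schema_changes
    columns:
      # column tests built into dbt
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ["placed", "shipped", "returned"]   # hypothetical values
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')        # hypothetical parent model
              field: customer_id
```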

Conclusion

During our weekly Surfalytics session, we discussed Elementary (OSS and Cloud) and dbt Cloud.

The Elementary Cloud version has some additional features, but they can be covered with open source products:

  • The Catalog can be replaced with dbt docs. The Catalog in Elementary doesn’t let you edit descriptions, which is its biggest drawback
  • Tests are very similar to dbt built-in tests (especially column tests), and you can also use Soda or Great Expectations
  • Alerts and rules can be set up in the OSS version too, just not in the UI
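
As an illustration of the last point, in the OSS version alert routing is typically declared in the model's YAML, and the alerts are sent from the command line with edr monitor. A minimal sketch, assuming Slack alerts; the model name, owner, subscribers, and channel are hypothetical, and the supported meta keys should be checked against the Elementary documentation for your version.

```yaml
models:
  - name: orders                   # hypothetical model
    meta:
      owner: "@data.team"          # hypothetical owner shown in the alert
      subscribers: ["@analyst"]    # hypothetical users to notify
      channel: data-alerts         # hypothetical Slack channel for this model's alerts
```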

The Elementary Cloud version costs more than dbt Cloud, so it’s better to use dbt Cloud, where you get a catalog, lineage, test monitoring, and alerts. In addition, dbt Cloud can be used as an orchestration tool.

In terms of time and cost, the best choice is dbt Cloud. If you want to reduce costs or can’t use the cloud (or maybe you are a big fan of open source solutions), you can build the project entirely from open source products.
