How To Create Data Pipeline And Data Quality SLA Alerts In Databand

Eitan Chazbani @ Databand
Databand, an IBM Company
Oct 19, 2022

This post was written by Helen Soloveichik, Product Manager @ Databand, an IBM Company

Data engineers are often inundated with alerts about data issues.

The last thing an engineer wants to do is get woken up at night for a minor issue, or worse, miss a critical one that requires immediate attention.

Databand helps fix this problem by cutting through noisy alerts with focused alerting and routing when data pipeline and data quality issues occur.

This post walks through how you can set alert notifications for data pipelines and datasets so the team can detect and quickly resolve incidents like missed data SLAs, column changes, null records, and much more.

Data pipeline alerts

Data pipeline alerts include errors such as failed runs, longer than expected durations, or missing data operations.


To create a data pipeline alert, select the “Create Alert” button and then select “Pipeline.”

Now, all you have to do is select which pipeline you want to assign an alert to and pick the type of alert. We’ll walk through each of the alerts below.

Run state alert

A run state alert notifies you when a pipeline reaches a certain status, such as running, success, failed, shutdown, or canceled.

Then you select the severity value corresponding to the pipeline’s status and criticality.

Under recent values, you can see the statuses this pipeline has produced in its recent runs. This helps you understand how often you or your team might get this alert if it’s activated.

Run duration alert

Next is a run duration alert based on a metric value measured in seconds.

For example, if you’re expecting the pipeline to complete within a certain time frame, you can set an alert to fire if it’s outside that time window.

Anomaly detection might be the coolest run duration option because you might not know how long this pipeline normally takes to run.

By selecting anomaly detection, Databand creates a baseline of run durations and tells you when the run deviates from what’s expected.

You can adjust the sensitivity level to control how far a run must deviate before the alert fires in production. The lookback range gives you finer control by telling Databand how many past runs to use when building the anomaly detection baseline.
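
To make the idea concrete, here is a minimal sketch of baseline-based anomaly detection on run durations. It is not Databand’s actual algorithm, which this post does not describe. The function name `is_duration_anomalous`, the standard-deviation rule, and the way `sensitivity` and `lookback` are expressed as parameters are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_duration_anomalous(history, current, lookback=10, sensitivity=3.0):
    """Return True if `current` deviates from the recent baseline.

    history     -- past run durations in seconds, oldest first
    lookback    -- how many of the most recent runs form the baseline
    sensitivity -- allowed deviation, in standard deviations
                   (a lower value makes the alert more sensitive)
    Hypothetical helper for illustration only.
    """
    baseline = history[-lookback:]
    if len(baseline) < 2:
        return False  # not enough runs to build a baseline
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return current != mu  # every past run took exactly the same time
    return abs(current - mu) / sigma > sensitivity


durations = [300, 310, 295, 305, 298, 302, 307, 299]
print(is_duration_anomalous(durations, 304))  # close to the baseline
print(is_duration_anomalous(durations, 900))  # far outside the baseline
```

In this sketch, a shorter lookback makes the baseline react faster to recent changes, while a longer one smooths out one-off slow runs.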

Missing data operations alerts

Missing data operations alerts tell you when an operation your pipeline depends on, such as a dataset read or write, doesn’t happen.

In this example, four datasets relate to the pipeline, meaning that the tasks in this pipeline depend on reading from and writing to these datasets. The alert will show you which operations didn’t read or write.
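
Conceptually, this check compares the operations a run was expected to perform with the ones it actually performed. The sketch below illustrates that comparison with a set difference. The dataset names and the `find_missing_operations` helper are hypothetical and are not part of Databand.

```python
def find_missing_operations(expected_ops, observed_ops):
    """Return expected (dataset, operation) pairs the run did not perform."""
    return sorted(set(expected_ops) - set(observed_ops))


# Hypothetical dataset names: a pipeline expected to touch four datasets.
expected = [
    ("raw.orders", "read"),
    ("raw.customers", "read"),
    ("staging.orders_clean", "write"),
    ("analytics.daily_sales", "write"),
]
# Operations actually seen during the latest run.
observed = [
    ("raw.orders", "read"),
    ("raw.customers", "read"),
    ("staging.orders_clean", "write"),
]

missing = find_missing_operations(expected, observed)
for dataset, operation in missing:
    print(f"Missing {operation} on {dataset}")
```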

Schema change alerts

Schema change alerts notify you about any changes in the schema (column type change, new or removed columns, etc.).
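
As a rough illustration of what such a check compares, the sketch below diffs two schema snapshots and reports new columns, removed columns, and type changes. The `diff_schema` helper, the column names, and the type labels are assumptions for this example, not Databand’s implementation.

```python
def diff_schema(previous, current):
    """Compare two {column_name: type} mappings and report the changes."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    type_changed = sorted(
        col for col in set(previous) & set(current)
        if previous[col] != current[col]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}


# Hypothetical schema snapshots from two consecutive runs.
previous_schema = {"id": "int", "amount": "float", "region": "string"}
current_schema = {"id": "string", "amount": "float", "country": "string"}

changes = diff_schema(previous_schema, current_schema)
if any(changes.values()):
    print("Schema change detected:", changes)
```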

Dataset alerts

Starting off, you have two options for dataset alerts.

  • Data delay: alerts that let you know if the data arrived on time and as expected.
  • Data quality check: alerts to check the quality of each dataset.

Don’t worry. We’re going to go through both options so you can see which ones you would likely use with Databand.


Data delay alerts

Let’s look at data delay alerts first.

This type of SLA alert triggers when a dataset isn’t updated when expected.

For example, if you expect this dataset to be updated each day at 3 PM Eastern, you can create an alert that will tell you if the dataset wasn’t updated.

And you can apply this alert to one or multiple datasets.

You can further customize the alert by limiting it to a certain pipeline. If you want alerts regardless of the pipeline, just leave this field blank. Finally, select your alert severity and you’re done.
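
The logic behind a data delay check can be sketched as follows, assuming a daily deadline and a record of when the dataset was last updated. The `is_data_delayed` helper, the 24-hour window, and the fixed UTC-4 offset standing in for Eastern time are illustrative assumptions only.

```python
from datetime import datetime, timedelta, timezone


def is_data_delayed(last_updated, deadline, now):
    """Return True if the deadline has passed without an update in time.

    All arguments are timezone-aware datetimes (last_updated may be None).
    The dataset counts as on time if it was updated during the 24 hours
    leading up to the deadline.
    """
    if now < deadline:
        return False  # too early to decide
    if last_updated is None:
        return True  # never updated
    window_start = deadline - timedelta(days=1)
    return not (window_start < last_updated <= deadline)


eastern = timezone(timedelta(hours=-4))  # fixed offset, for illustration only
deadline = datetime(2022, 10, 19, 15, 0, tzinfo=eastern)  # 3 PM Eastern
on_time = datetime(2022, 10, 19, 14, 30, tzinfo=eastern)
stale = datetime(2022, 10, 18, 14, 30, tzinfo=eastern)
now = datetime(2022, 10, 19, 15, 5, tzinfo=eastern)

print(is_data_delayed(on_time, deadline, now))  # updated before the deadline
print(is_data_delayed(stale, deadline, now))    # last update was a day earlier
```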

Data quality alerts

Here you’re alerting on the quality of the data columns within the dataset. You first define the point at which Databand runs its checks while data is processed.

Then apply the validations for the dataset columns. For example, a popular validation would be to select one or multiple columns that you want Databand to check for null percentages or counts.

This way, you’ll know immediately if null records are about to be sent downstream to a data consumer.
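
A null percentage validation of this kind can be sketched in a few lines. The example below is not Databand code: the `null_percentage` and `failing_columns` helpers, the sample rows, and the 25% threshold are assumptions chosen to show the idea.

```python
def null_percentage(rows, column):
    """Percentage of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    nulls = sum(1 for row in rows if row.get(column) is None)
    return 100.0 * nulls / len(rows)


def failing_columns(rows, columns, max_null_pct):
    """Return {column: null %} for columns that exceed the threshold."""
    results = {col: null_percentage(rows, col) for col in columns}
    return {col: pct for col, pct in results.items() if pct > max_null_pct}


# Hypothetical sample of records about to be sent downstream.
rows = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 3, "email": None},
    {"order_id": 4, "email": "d@example.com"},
]

violations = failing_columns(rows, ["order_id", "email"], max_null_pct=25.0)
print(violations)
```

The same structure works for null counts: return `nulls` instead of a percentage and compare it against a count threshold.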

Connect to receiver

The last step for both alert types is to connect them to a receiver such as Slack, PagerDuty, or email. This helps data engineers focus only on the alerts that pertain to them.

In this example, the alert details the things data engineers care about, such as:

  • Missing operations
  • Schema changes
  • Alert trigger time
  • Run name
  • Source
  • Affected datasets
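
To show how these fields might come together in a notification, here is a small sketch that renders them as a plain-text message body. The `AlertDetails` class, its field names, and the sample values are hypothetical. Databand’s real payload format is not described in this post.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AlertDetails:
    """Hypothetical container for the alert fields listed above."""
    run_name: str
    source: str
    triggered_at: datetime
    missing_operations: list = field(default_factory=list)
    schema_changes: list = field(default_factory=list)
    affected_datasets: list = field(default_factory=list)


def format_alert(alert):
    """Render the alert as plain text for a chat message or email body."""
    def joined(items):
        return ", ".join(items) if items else "none"

    return "\n".join([
        f"Run name: {alert.run_name}",
        f"Source: {alert.source}",
        f"Alert trigger time: {alert.triggered_at.isoformat()}",
        f"Missing operations: {joined(alert.missing_operations)}",
        f"Schema changes: {joined(alert.schema_changes)}",
        f"Affected datasets: {joined(alert.affected_datasets)}",
    ])


alert = AlertDetails(
    run_name="daily_sales_load",
    source="airflow",
    triggered_at=datetime(2022, 10, 19, 15, 5, tzinfo=timezone.utc),
    missing_operations=["write analytics.daily_sales"],
    affected_datasets=["analytics.daily_sales"],
)
message = format_alert(alert)
print(message)
```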

Wrapping it up

For more information on how Databand can help you set data incident management alerts to keep your data SLAs on track, check out our demo center or book a demo.
