The Ultimate Guide to Evaluating Data Observability Tools

Mona Rakibe
Telmai
Dec 12, 2023

Over the past few years, I have spoken to hundreds of data practitioners about the needs and challenges a data observability tool could address. Some customers and prospects even contacted us with well-defined use cases and requirements for data observability.

These raw notes, learnings, and RFPs slowly became a spreadsheet, which we then validated against market reports and G2 reviews to understand what people deeply care about.

While this was meant for internal use, we expanded it to help users think through the evaluation of data observability tools. Our idea was to give them a solid starting point, and recently our marketing team added their magic touch to turn this into an excellent little evaluation guide.

It’s not a massive book, but it’s 17 pages of me putting down learnings around how data practitioners think about their data reliability needs and data observability.

Download the full guide here: https://www.telm.ai/gated-assets/the-ultimate-guide-to-data-observability-tools/

A few things:
✅We will email you a PDF and CSV version; the CSV version will make it easier for you to edit and adapt to your needs.
✅This guide might still need improvement, but with your feedback it can get closer to being complete. So if you’d like more criteria covered, please let us know, and we’ll include them.
❌ This is not Telmai’s product feature list :) There are features in this guide that we don’t support, but we have seen their importance and felt the need to add them here.

The guide will walk you through key areas of consideration in detail. Still, at a high level, these are the critical criteria to think through when evaluating your data observability tool:

✅Integrations with data lakes, data warehouses, BI tools, etc.

One of the first and most important considerations when evaluating a data observability tool is how well it adapts to and supports your current and future architecture and data pipelines.

As the tools and applications across data pipelines continue to proliferate rapidly, it is crucial to identify a data observability tool that can be integrated into your ecosystem.

Consider which source systems need to be supported in your ecosystem (databases, streams, data lakes, and data warehouses) and how easily the tool integrates with them. Specifically, can the tool monitor your systems without first pushing the data through a transformation process? Next come data formats like CSV, JSON, Parquet, and Avro: does the tool support all the formats your data needs to be read from?
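
To make the format question concrete, here is a minimal sketch (using pandas, with illustrative file names) of reading the same dataset from CSV, JSON, or Parquet so one profiling routine can run over any of them:

```python
# A minimal sketch: load a dataset regardless of its storage format so the
# same profiling logic can run on it. File names here are illustrative.
import pandas as pd

READERS = {
    ".csv": pd.read_csv,
    ".json": lambda p: pd.read_json(p, lines=True),  # JSON Lines
    ".parquet": pd.read_parquet,
}

def load(path: str) -> pd.DataFrame:
    suffix = path[path.rfind("."):]
    return READERS[suffix](path)

df = load("orders.parquet")   # or orders.csv / orders.json
print(df.dtypes)              # inferred schema, a first observability signal
```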

Also, consider the need to support the composability of your existing and new data pipelines.

Data observability tools also need to handle integration with downstream systems like BI and analytics tools for impact analysis. They also need to integrate with your existing data catalog, so the catalog can give a 360-degree view of your data assets, including their health metrics.

✅ Data Quality Reporting

Data quality refers to the accuracy, reliability, completeness, and consistency of data. It measures how well data aligns with the desired state and meets the expectations defined in business metadata. Data quality answers the fundamental question, “What do you get compared to what you expect?”

The output of data observability can be automatically classified into data quality indicators and health reports. Since data observability tools constantly scan new and updated data to compute metrics, those metrics can be used to generate and refresh data quality reports automatically.

A data observability tool can help automate the computation and reporting of data quality KPIs.

When evaluating a tool, first and foremost, understand which DQ metrics are computed automatically and the degree to which the tool supports customization. More often than not, each company and team defines individual DQ KPIs differently based on business needs and data values.
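
As a rough illustration, here is a minimal sketch (in pandas) of a few DQ KPIs computed over a table. The exact definitions, for example what counts as “complete”, are assumptions that each team would customize:

```python
# A minimal sketch of a few data quality KPIs over a pandas DataFrame.
# The definitions below are illustrative and would normally be customized per team.
import pandas as pd

def dq_report(df: pd.DataFrame, key_column: str) -> dict:
    total = len(df)
    return {
        "completeness": 1 - df.isna().mean().mean(),     # share of non-null cells
        "uniqueness": df[key_column].nunique() / total,   # duplicate keys lower this
        "row_count": total,                               # simple volume metric
    }

df = pd.DataFrame({"order_id": [1, 2, 2, 4], "amount": [10.0, None, 5.0, 7.5]})
print(dq_report(df, key_column="order_id"))
```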

As part of a data observability evaluation, users should also document their own definitions for these KPIs and map them to what each vendor can support.

✅Data Quality Rules

Call it data contracts, SLAs, expectations, or rules; every system in a data pipeline has specific rules and validations that ensure the integrity and reliability of the data being stored, processed, and transferred. ML-based tools often reduce the dependency on such rules, as rules are rigid and hard to maintain and manage. However, even with the most advanced ML models, some validations of complex business rules are best expressed as predefined policies.

A data observability system should not only monitor the data for anomalies but also help users interactively define such checks and then continuously validate the data flowing through the pipeline against these rules.

These rules can be simple policies like acceptable values for a picklist based on industry code or multi-attribute policies based on business constraints.
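
As an illustration, here is a minimal sketch of what such rules can look like when expressed in plain Python over a pandas DataFrame; the column names, picklist values, and thresholds are hypothetical, not any specific vendor’s syntax:

```python
# A minimal sketch of rule-style validations. Column names, picklist values,
# and the rules themselves are hypothetical examples.
import pandas as pd

VALID_INDUSTRY_CODES = {"FIN", "HLTH", "TECH"}  # picklist-style rule

def check_rules(df: pd.DataFrame) -> dict:
    return {
        # single-attribute rule: values must come from an accepted picklist
        "industry_code_in_picklist": bool(df["industry_code"].isin(VALID_INDUSTRY_CODES).all()),
        # multi-attribute rule: discounted orders must still have a positive total
        "discounted_total_positive": bool(((df["discount"] == 0) | (df["total"] > 0)).all()),
    }

df = pd.DataFrame({"industry_code": ["FIN", "TECH"], "discount": [0, 5], "total": [100, 20]})
print(check_rules(df))
```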

While evaluating a data observability tool, consider its support for building and managing such DQ rules interactively.

✅Data Monitoring

Data monitoring and data observability are related concepts in managing and maintaining data. While the terms are often used interchangeably, they refer to slightly different things.

Monitoring involves collecting and analyzing data points to understand the health, performance, and behavior of your data. It means setting up metrics, alerts, and dashboards to track key performance indicators (KPIs) and raise alerts when there are issues.

Observability, on the other hand, is a broader and more proactive approach to understanding data systems, and most data observability platforms include monitoring capabilities as a subset.

✅Anomaly Detection

Enterprise data is constantly transformed; it goes through multiple changes on its journey from ingestion to consumption.

Additionally, depending on the type of data (first-party, second-party, or third-party), there are often anomalies such as drift in values or outlier and out-of-range values. These could be true or false positives, but unless they are detected and investigated, they are hard to address and can have a tremendous impact downstream. Based on your needs, look for tools that detect drift in metadata, in data values, and in business or calculated metrics.
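
To give a feel for the simplest case, here is a minimal sketch of drift detection on a single daily metric (such as row count) using a z-score against recent history; the threshold and window are assumptions, and real tools learn these adaptively:

```python
# A minimal sketch of drift/outlier detection on a daily metric using a z-score.
# The threshold and the history window are illustrative assumptions.
import statistics

def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    mean = statistics.mean(history)
    stdev = statistics.stdev(history) or 1e-9   # guard against a perfectly flat history
    return abs(latest - mean) / stdev > z_threshold

daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_900]
print(is_anomalous(daily_row_counts, latest=4_300))   # True: a likely volume drop
```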

✅Notification/Alerting and API access

The most common outcome of data observability is notifications. A notification can be a soft notification in the UI or an alert via channels like PagerDuty, Slack, SMS, etc.

Beyond integrations with alerting channels, when considering data observability tools, identify your needs around API access and automating pipeline workflows based on alerts.

For example, to implement a circuit breaker pattern, users must be able to programmatically access alerts and stop or continue the pipeline flow.
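
Here is a minimal sketch of such a circuit breaker: a pipeline step polls an alerts API and halts downstream processing when an open alert exists. The endpoint and response shape are hypothetical, not a specific vendor’s API:

```python
# A minimal sketch of a circuit-breaker step. The alerts endpoint and response
# shape are hypothetical; substitute your observability tool's actual API.
import requests

ALERTS_URL = "https://observability.example.com/api/v1/alerts"  # hypothetical endpoint

def circuit_breaker(dataset: str) -> None:
    resp = requests.get(ALERTS_URL, params={"dataset": dataset, "status": "open"}, timeout=10)
    resp.raise_for_status()
    open_alerts = resp.json().get("alerts", [])
    if open_alerts:
        # raising here stops the orchestrated pipeline before bad data propagates
        raise RuntimeError(f"{len(open_alerts)} open alert(s) on {dataset}; halting pipeline")

circuit_breaker("orders")   # call this before loading downstream tables
```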

✅Actions and Remediation

Once the monitoring system detects errors and anomalies, there are multiple options for next steps. The obvious ones are alerting on Slack, PagerDuty, etc.

A key consideration in this area: does the tool give field-level insights and help you quickly get to the source of the issue using column-level and system-level lineage?

This workflow works particularly well for new and unknown issues; however, once data engineers start seeing repeatable patterns in issues, they may want to automate the subsequent actions, e.g., creating service tickets in JIRA.

Companies also often leverage the output of data observability to continuously separate good and bad data into different bins (buckets) and, depending on the use case, send bad data for remediation or store it separately. This not only reduces the cost of storing, processing, and transferring data but also lets users train ML models on quality-controlled data.
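
A minimal sketch of that good/bad split, assuming a simple validity check and illustrative file paths, might look like this:

```python
# A minimal sketch of routing records that fail validation into a quarantine
# location while clean records continue downstream. The validity check and
# output paths are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({"order_id": [1, 2, None], "amount": [10.0, -5.0, 7.5]})

valid_mask = df["order_id"].notna() & (df["amount"] > 0)
good, bad = df[valid_mask], df[~valid_mask]

good.to_parquet("orders_clean.parquet")        # continues to downstream consumers
bad.to_parquet("orders_quarantine.parquet")    # routed for remediation / review
```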

✅Scale/Performance and Deployment Considerations

When looking into a data observability tool, there are two critical considerations around scale and performance. The first is whether the tool is designed to scale quickly to your current and future data needs; its elasticity should be considered across all dimensions of data, like volume, velocity, and variety. The second is the performance impact of data observability on the underlying data infrastructure. Data observability tools are metric calculation systems that require a high degree of computation at scale, and this processing often gets pushed to the underlying infrastructure, causing latency and performance degradation.

Techniques like change data capture (CDC), delta processing, and sampling can be adopted to reduce cost and improve performance. Support for CDC and delta processing is very important, whereas sampling can become a limitation depending on business needs. For example, if users need the exact records (IDs) of anomalous data for remediation, sampling techniques will not work.
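
For intuition, here is a minimal sketch of delta processing: only the rows added since the last run are profiled, tracked with a simple high-water mark. The column name and metrics are assumptions:

```python
# A minimal sketch of delta processing with a high-water mark: only rows newer
# than the last profiled timestamp are scanned. Column names are illustrative.
import pandas as pd

def profile_new_rows(df: pd.DataFrame, last_watermark: pd.Timestamp):
    delta = df[df["updated_at"] > last_watermark]              # skip already-profiled rows
    metrics = {
        "rows": len(delta),
        "null_rate": float(delta.isna().mean().mean()) if len(delta) else 0.0,
    }
    new_watermark = delta["updated_at"].max() if len(delta) else last_watermark
    return metrics, new_watermark

df = pd.DataFrame({
    "id": [1, 2, 3],
    "updated_at": pd.to_datetime(["2023-12-10", "2023-12-11", "2023-12-12"]),
})
metrics, wm = profile_new_rows(df, last_watermark=pd.Timestamp("2023-12-10"))
print(metrics, wm)   # only the two newer rows are profiled
```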

✅Total Cost of Ownership (TCO) and Time to Value

The first and foremost consideration when evaluating a tool is time to value. Time to value depends on multiple factors: ease of use and setup, the time an ML-based tool takes to train its models and start alerting, and the training and onboarding your team needs.

✅Deployment and Hosting

Consider whether the tool supports SaaS, on-prem, and hybrid deployment models.

✅ UI and UX

While the feature functionality is paramount for your data observability solution, usability shouldn’t have to be a tradeoff!

This is important because data reliability is a highly distributed function across source owners, data engineers, product owners, and business users. Having a tool that is easy to adopt and collaborate on across these teams will significantly impact the overall ROI.

Identify the tool’s key user personas in your organization and evaluate the user experience with them. For example, if the primary users are not SQL users, ensure that the tool does not implicitly assume SQL skills.

I hope this provides an excellent framework to start. You can download the complete guide here: https://www.telm.ai/gated-assets/the-ultimate-guide-to-data-observability-tools/

Mona Rakibe is the CEO, product head, and data leader at Telmai (www.telm.ai). She loves solving complex problems using data and works to democratize data quality across business and technology using ML/AI.