Action-Position data quality assessment framework

Where do you place data quality validations? What are the actions you take when they fail?

Yerachmiel Feltzman
Israeli Tech Radar


“What is the business impact of an error in production for this pipeline?” I asked our senior manager.

“Well,” he said, “it’s ugly.”

“So we would be better served by downtime than by an error,” I concluded.

We were talking about a data pipeline that triggers deletions on a production database based on a TTL (time to live). Deleting the wrong items could therefore directly impact client-facing features. The technical implementation of the pipeline itself was straightforward, but the business impact of an error was huge.

At the same time, we had a highly complex streaming pipeline running change data capture (CDC) that powered several analytical workloads. Analysts could handle temporary errors by themselves, but freshness was a key KPI.
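One way to picture the contrast between these two pipelines is as a choice of what to do when a check fails. The following is only a minimal sketch in Python, not the implementation we used: `OnFailure`, `DataQualityError`, `run_step`, `all_expired`, and `collect_ids` are hypothetical names invented for illustration, and the rows are made-up sample data.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


class OnFailure(Enum):
    HALT = "halt"  # prefer downtime over a wrong result
    WARN = "warn"  # prefer freshness: flag the problem and keep going


class DataQualityError(Exception):
    """Raised when a check fails under the HALT policy."""


def run_step(
    rows: List[Row],
    check: Callable[[List[Row]], bool],
    on_failure: OnFailure,
    step: Callable[[List[Row]], List[object]],
) -> Tuple[List[object], List[str]]:
    """Validate rows, then apply the step, honoring the failure policy."""
    warnings: List[str] = []
    if not check(rows):
        if on_failure is OnFailure.HALT:
            # The step never runs, so nothing wrong reaches production.
            raise DataQualityError("check failed; step not executed")
        warnings.append("check failed; continuing anyway")
    return step(rows), warnings


def all_expired(rows: List[Row]) -> bool:
    """Hypothetical check: every candidate row is really past its TTL."""
    return all(bool(row["expired"]) for row in rows)


def collect_ids(rows: List[Row]) -> List[object]:
    """Hypothetical step: stands in for 'delete' or 'publish' downstream."""
    return [row["id"] for row in rows]


# Made-up sample: the second row should not be deleted.
candidates: List[Row] = [
    {"id": 1, "expired": True},
    {"id": 2, "expired": False},
]
```

With these assumptions, the deletion pipeline would call `run_step(candidates, all_expired, OnFailure.HALT, collect_ids)` and stop with a `DataQualityError` before touching anything, while an analytics-style pipeline could pass `OnFailure.WARN` and still deliver its output together with a warning.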

How should we approach data quality checks when designing those two pipelines?

I am sure you care about the quality of the outputs of your data pipelines. You also care about the end user and do your best to ensure they can rely on the data your pipelines release downstream. Chances are you have done that using one tool, or a mix of tools, to validate your job outputs. As a matter of fact, validations can be done on both inputs and outputs and you…
