Written by Suresh Appavu, Thiliban Varadharajan, and Soyel Alam


One major challenge in implementing data pipelines is testing the pipeline code reliably. The code's output is tightly coupled to the data and the environment it runs in. This prevents developers from following test-driven development, catching bugs early through good unit tests, and releasing code through CI/CD with confidence.

One way to overcome the reliability challenge is to use immutable data to run and test the pipeline so that the result of ETL functions can be matched against known outputs.
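As a rough illustration of this idea, the sketch below uses plain Python rather than a Spark or other pipeline framework. The record type `Order`, the transform `clean_orders`, and the fixtures `INPUT_ORDERS` and `EXPECTED_OUTPUT` are hypothetical names made up for this example. The input fixture is a tuple of frozen dataclasses, so no test can modify it, and the ETL step's result is compared against a fixed, known output.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Order:
    """Hypothetical record type; frozen=True makes instances immutable."""
    order_id: int
    amount: float
    currency: str


# Immutable input fixture: a tuple of frozen records that tests cannot mutate.
INPUT_ORDERS: Tuple[Order, ...] = (
    Order(1, 100.0, "usd"),
    Order(2, 250.0, "USD"),
    Order(3, -5.0, "eur"),
)

# Known output that the ETL step is expected to produce for INPUT_ORDERS.
EXPECTED_OUTPUT: Tuple[Order, ...] = (
    Order(1, 100.0, "USD"),
    Order(2, 250.0, "USD"),
)


def clean_orders(orders: Tuple[Order, ...]) -> Tuple[Order, ...]:
    """Hypothetical ETL step: drop non-positive amounts and upper-case currency codes."""
    return tuple(
        Order(o.order_id, o.amount, o.currency.upper())
        for o in orders
        if o.amount > 0
    )


def test_clean_orders_matches_known_output() -> None:
    # Same immutable input every run, so the expected result never drifts.
    assert clean_orders(INPUT_ORDERS) == EXPECTED_OUTPUT


if __name__ == "__main__":
    test_clean_orders_matches_known_output()
    print("clean_orders matches the known output")
```

Because the fixture cannot change between runs, a failing assertion points at a change in the transform logic rather than at drifting data, which is what makes a test like this suitable for TDD and for gating a CI/CD pipeline.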

Obviously, this requires a good knowledge of the…
