This is the start of a series of articles about building test harnesses for big data workflows in the cloud. In this first part I’m going to explain why automated testing is one of the most useful tools for building maintainable data workflows.

The series is laid out as below:

Part 1: Why Great Data Engineering Needs Automated Testing
Part 2: The Keys To Unlock TDD For Data Engineering
Part 3: The Test Pyramid in Data Engineering

Tests are great

Most people who have spent some time as a software developer are familiar with the joy of a freshly passing build. You…


In Part 1 I hope I convinced you that automated testing is worth investing in for your long-term sanity. If you agree that putting testing in the front seat is a good idea, and that our jobs are easier when we write the tests first (which is Test Driven Development, or TDD), we now have to give some thought to how to actually do it, which, to be honest, can be quite daunting.

Testing can be evil.



Writing your first test can be a daunting moment. Every test comes with a cost, some more than others. You will find yourself asking questions such as:

  • What do I test?
  • Do I test the whole system?
  • Do I test every function?
  • How many tests do I need?
  • How many failure scenarios, if any, am I satisfied with?

Unfortunately, I can’t tell you what is right or wrong for your particular scenario, but hopefully this post can help you out. I’m going to walk you through the process of creating tests for a simple data…

David O'Keeffe

Senior Consultant at Servian. Specialist in Data Infrastructure & Automation.
