How Data Analytics Professionals Can Sleep Better

DataKitchen
Mar 21, 2017 · 4 min read

Seven Steps to Implementing DataOps: Step 1 — Add Data and Logic Tests

In a previous blog we introduced DataOps, a new approach to data analytics that can put analytics professionals at the center of the company’s strategy, advancing its most important objectives. DataOps doesn’t require you to throw away your existing tools and start from scratch. With DataOps, you can keep the tools you currently use and love. You may be surprised to learn that an analytics team can migrate to DataOps in seven simple steps. Step 1 of 7 is below.

If you make a change to an analytic pipeline, how do you know that you did not break anything? Between the anxiety and the phone calls at odd hours, a data analytics professional might not be sleeping well when changes are being made to a business-critical system. Below we’ll discuss how to prevent IT emergencies and hopefully improve your sleep quality.

Tests applied throughout the data-analytics pipeline

Automated testing ensures that a feature release is of high quality without requiring time-consuming manual testing. The idea in DataOps is that every time a data-analytics team member makes a change, he or she adds a test for that change. There are two categories of tests: Logic Tests, which cover the code in a data pipeline, and Data Tests, which cover the data as it flows through the pipeline in production.
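To make the distinction concrete, here is a minimal sketch in Python using pytest-style assertions. The function and field names are hypothetical, not from this post: the first test is a Logic Test that exercises pipeline code before deployment, while the second is a Data Test that checks live data in production.

```python
# A hypothetical transformation step in an analytic pipeline.
def deduplicate_customers(rows):
    """Keep exactly one record per customer_id."""
    seen = {}
    for row in rows:
        seen[row["customer_id"]] = row
    return list(seen.values())

# Logic Test: runs when the code changes, before deployment.
def test_deduplicate_customers():
    rows = [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}]
    assert len(deduplicate_customers(rows)) == 2

# Data Test: runs in production against the data flowing by.
def check_no_null_customer_ids(rows):
    bad = [r for r in rows if r["customer_id"] is None]
    assert not bad, f"{len(bad)} rows with null customer_id in production feed"
```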

Testing expands incrementally with the addition of each feature, so coverage gradually improves and quality is built in. In a mature pipeline, there could be hundreds of tests at each stage. Every time a release is deployed to users, the tests are run to validate the functionality of the release. Each of those tests is an insurance policy against critical failures. This is bound to improve the mental well-being of the data analytics professional.

Adding tests in data analytics is analogous to the statistical process controls implemented in a manufacturing operations flow. Tests ensure the integrity of the final output by verifying that work-in-progress (the result of intermediate steps in the pipeline) matches expectations. Testing can be applied to data, models and logic. The table below shows examples of tests in the data-analytics pipeline.

Tests may be applied to inputs, business logic and outputs

For every step in the pipeline, there should be at least one test. The philosophy is to start with simple tests and grow over time. Even a simple test will eventually catch an error before it reaches users. For example, just making sure that row counts stay consistent throughout the process can be a very powerful test. One could easily make a mistake on a join and produce an unintended cross product that silently multiplies the row count. A simple row-count test would quickly catch that.
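As an illustration, here is a minimal row-count test sketched in Python with pandas. The tables and column names are hypothetical assumptions: a duplicated key in the lookup table would fan out the join, and the assertion catches it.

```python
import pandas as pd

def enrich_orders(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Attach customer attributes to each order; expects one row per order."""
    return orders.merge(customers, on="customer_id", how="left")

def test_join_preserves_row_count():
    orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 11, 10]})
    # A duplicated customer_id here would silently multiply the output rows.
    customers = pd.DataFrame({"customer_id": [10, 11], "region": ["east", "west"]})
    enriched = enrich_orders(orders, customers)
    assert len(enriched) == len(orders), (
        f"Join fanned out: {len(orders)} orders became {len(enriched)} rows"
    )
```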

Tests can detect warnings in addition to errors. A warning might be triggered if data exceeds certain boundaries. For example, the number of customer transactions in a week may be considered normal if it stays within 10% of its historical average. If the transaction level falls outside that band, a warning could be flagged. This might not be an error; it could be a seasonal occurrence, for example, but the reason would require investigation. Once recognized and understood, the users of the data can be alerted. Warnings can be a powerful business tool that helps the company understand its business better.
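A sketch of such a boundary check in Python, assuming a 10% tolerance band around the historical average; the function name and threshold are illustrative, not from the post:

```python
def check_weekly_transactions(current_count, historical_avg, tolerance=0.10):
    """Return 'ok' or 'warning' depending on whether the weekly count
    stays within the expected band around its historical average.

    A warning is not a failure: it flags the data for investigation
    (a seasonal spike, say) without stopping the pipeline.
    """
    lower = historical_avg * (1 - tolerance)
    upper = historical_avg * (1 + tolerance)
    return "ok" if lower <= current_count <= upper else "warning"

# Example: 12,500 transactions against a historical average of 10,000.
if check_weekly_transactions(12_500, 10_000) == "warning":
    print("Transaction volume outside expected band; investigate before alerting users")
```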

DataOps is not about being perfect. In fact, it acknowledges that code is imperfect. It’s natural that a data-analytics team will make a best effort yet still miss something. If so, they can determine the cause of the issue and add a test so that it never happens again. In a rapid-release environment, a fix can quickly propagate out to the users.

With a suite of tests in place, DataOps allows you to move fast because you can make changes and quickly rerun the test suite. If the changes pass the tests, the data-analytics team member can release with confidence. The knowledge is built into the system and the process stays under control. Tests catch potential errors and warnings before they are released, so quality remains high. When quality is ensured, the data analytics team can sleep like babies.

In our next blog we will cover step 2 in implementing DataOps.


Like this story? Download the 140-page DataOps Cookbook!
