DataOps in Data Lifecycle Management

Kiran Mainali
Big Data Processing
Nov 6, 2021
Figure: DataOps in data lifecycle management.

DataOps aims to minimize analytics cycle time by covering the whole data analytics process, from requirements collection to distribution of results [1]. Data lifecycle management relies on people and tools [2], and DataOps coordinates both to manage the data lifecycle more effectively.

A data analytics pipeline alters data through a series of tasks. Whether it is an ETL/ELT pipeline or an analysis pipeline, the output always differs from the input. One of the most challenging tasks in data pipelines is tracking data as it passes through a series of transformations on its way from one stage to the next.
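
To make that concrete, one lightweight way to track data through a pipeline is to fingerprint the input and output of every task. The sketch below is a minimal illustration in Python, not anything prescribed by this article or the underlying thesis; `LineageRecord`, `tracked_step`, and the toy ETL steps are all hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class LineageRecord:
    """One entry per transformation: which step ran, on what, producing what."""
    step: str
    input_hash: str
    output_hash: str

def fingerprint(data) -> str:
    """Content hash so the data at any stage can be identified later."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()[:12]

lineage: list[LineageRecord] = []

def tracked_step(name: str, fn: Callable, data):
    """Run one pipeline task and record its input/output fingerprints."""
    record = LineageRecord(step=name, input_hash=fingerprint(data), output_hash="")
    result = fn(data)
    record.output_hash = fingerprint(result)
    lineage.append(record)
    return result

# A toy ETL flow whose every transformation leaves a lineage record.
raw = [{"city": "Oslo", "temp_f": 41}, {"city": "Tromso", "temp_f": 28}]
cleaned = tracked_step("clean", lambda rows: [r for r in rows if r["temp_f"] is not None], raw)
shaped = tracked_step(
    "to_celsius",
    lambda rows: [{**r, "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)} for r in rows],
    cleaned,
)
```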

In DataOps, data lifecycle management is unavoidable because the quality of both processes and products must be monitored. Data governance and data lineage are part of DataOps precisely to assure that quality. Quality assurance, and the DataOps principle of reproducible reuse, depend on managing and maintaining the change events of the data lifecycle. Data governance and data lineage are not easy to address: they start with managerial-level planning and flourish with the tools that implement those plans. In DataOps, transparency in data lifecycle management is always a priority.
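
As a small illustration of what monitoring process and product quality can look like in practice, a DataOps pipeline typically encodes its expectations as automated checks that run on every batch. The sketch below is a hypothetical example; the check names and rules are illustrative assumptions, not taken from the article.

```python
def check_quality(rows: list[dict]) -> list[str]:
    """Return a list of failed expectations; an empty list means the batch passes."""
    failures = []
    if not rows:
        failures.append("batch is empty")                        # volume check
    if any(r.get("city") in (None, "") for r in rows):
        failures.append("null or empty 'city' values")           # completeness check
    if any(not isinstance(r.get("temp_c"), (int, float)) for r in rows):
        failures.append("'temp_c' is not numeric")               # validity check
    return failures

# Gate the pipeline: refuse to publish a batch that fails any expectation.
batch = [{"city": "Oslo", "temp_c": 5.0}, {"city": "Tromso", "temp_c": -2.2}]
problems = check_quality(batch)
if problems:
    raise ValueError(f"quality gate failed: {problems}")
```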

DataOps applies to the entire data lifecycle [3]: from data collection to publishing the results, every data preparation and analysis stage can adopt the DataOps methodology. This yields a significant advantage, easier data lifecycle management, because data is handled in a consistent way throughout the analytics lifecycle. A data pipeline transports data from one stage of the lifecycle to the next. DataOps restructures traditional data pipelines, takes them out of the black box, and makes them measurable and maintainable through collaboration, communication, integration, and automation. As a result of this restructuring, data lifecycle management becomes more straightforward.
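
One reading of "out of the black box" is that every pipeline step emits measurements as it runs. Below is a minimal sketch of such step-level instrumentation using only the Python standard library; the decorator name and log fields are illustrative assumptions.

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO, format="%(message)s")

def measured(fn):
    """Log each step's duration and row counts, so the pipeline's behavior
    is observable instead of hidden inside a black box."""
    @wraps(fn)
    def wrapper(rows):
        start = time.perf_counter()
        result = fn(rows)
        logging.info("step=%s rows_in=%d rows_out=%d seconds=%.3f",
                     fn.__name__, len(rows), len(result),
                     time.perf_counter() - start)
        return result
    return wrapper

@measured
def drop_empty(rows):
    return [r for r in rows if r]

drop_empty([{"city": "Oslo"}, {}])   # logs: step=drop_empty rows_in=2 rows_out=1 ...
```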

DataOps supports all stages of the data lifecycle; with the right people and technology in place, data flows seamlessly from one stage to another. With DataOps, a published analysis result can be traced back to its raw data source, decomposing each transformation performed along the way. DataOps acknowledges the interconnected nature of data engineering, data integration, data quality, and data security [38], and combines these aspects of data analytics into a coherent flow of data between lifecycle stages.
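
Given lineage records like those sketched earlier, tracing a published result back to its raw source amounts to walking the records in reverse. Again a hypothetical sketch, assuming the `lineage` list and `fingerprint` function from the first example:

```python
def trace_back(lineage, result_hash):
    """Walk the lineage records from a published result back to the raw
    input, returning the chain of transformation steps, newest first."""
    chain = []
    current = result_hash
    for record in reversed(lineage):
        if record.output_hash == current:
            chain.append(record.step)
            current = record.input_hash
    return chain

# Continuing the earlier toy flow:
# trace_back(lineage, fingerprint(shaped)) -> ["to_celsius", "clean"]
```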

The figure above is a simple visual representation of the DataOps pipeline's coverage of the data lifecycle. The dotted line represents the DataOps pipeline, which covers the entire data lifecycle, including planning; data governance and data quality management are therefore implemented throughout the lifecycle. Moreover, DataOps does not require separate pipelines for different stages, unlike traditional data pipelines. Instead, it exploits the technical modularity of orchestration, workflow management, and automation tools to provide a flexible, customized transformation process when needed.
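
To illustrate that modularity, lifecycle stages can be composed into a single end-to-end pipeline instead of being stitched together as separate pipelines. The sketch below uses plain function composition; in a real deployment this role is played by an orchestration or workflow tool, and every name here is illustrative.

```python
from typing import Callable

# Each lifecycle stage is a plain, swappable function: rows in, rows out.
Stage = Callable[[list], list]

def build_pipeline(*stages: Stage) -> Stage:
    """Compose independent stages into one end-to-end pipeline."""
    def run(rows):
        for stage in stages:
            rows = stage(rows)   # every hand-off is a visible, testable boundary
        return rows
    return run

# One pipeline spans ingest -> clean -> transform -> publish; any stage
# can be swapped or reordered without rebuilding the others.
pipeline = build_pipeline(
    lambda rows: rows,                                      # ingest (stub)
    lambda rows: [r for r in rows if r],                    # clean
    lambda rows: [{**r, "validated": True} for r in rows],  # transform
    lambda rows: rows,                                      # publish (stub)
)
result = pipeline([{"city": "Oslo"}, {}])                   # -> one validated row
```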

References

  1. DataKitchen, DataOps is NOT just DevOps for data, 2018. https://medium.com/data-ops/dataops-is-not-just-devops-for-data-6e03083157b7
  2. C. Bergh, G. Benghiat, E. Strod, The DataOps Cookbook, 2nd ed., 2019.
  3. Margaret Rouse, What is DataOps (data operations)?, WhatIs.com, TechTarget, 2019. https://searchdatamanagement.techtarget.com/definition/DataOps

This article is extracted from the thesis “DataOps: Towards Understanding and Defining Data Analytics Approach”.
