DataOps Implementation Guidelines

Kiran Mainali
Big Data Processing
Nov 15, 2021
DataOps has evolved from lean manufacturing and SDLC (Source: DataKitchen)

When it comes to implementing DataOps, there is no single well-defined approach. However, different organizations and DataOps platform providers [1, 2] have proposed their own implementation methods. Therefore, in an effort to create generic guidelines, I have summarized common implementation practices based on a careful and detailed evaluation of the approaches used by some early adopters of DataOps.

According to the DataKitchen whitepaper, a data analytics team can implement DataOps in seven simple steps [3]. The iCEDQ whitepaper divides the implementation process into three sections [4]: people and culture, process and practice, and tools. DataKitchen focuses more on technical implementation, whereas the iCEDQ proposal is a holistic approach that shifts organizational culture to smooth the technical side of DataOps implementation. Below, I have listed nine generic DataOps implementation guidelines for organizations. How to implement DataOps is still a topic of research, and I welcome readers' feedback on my effort to outline these guidelines.

1. Set DataOps culture

Start DataOps by identifying the people and culture in your organization. Then establish management controls, communication processes, and project management practices that align with the chosen processes and tools.

2. Automate and orchestrate

Use automation and orchestration tools to reduce manual work. Teams collaborating on data analytics projects should automate as many tasks as possible. With orchestration, it becomes easier to integrate tools and technologies and to automate data analytics pipelines end to end, as the sketch below illustrates.
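
As one illustration, here is a minimal sketch of an orchestrated pipeline using Apache Airflow. The task bodies, dag_id, and schedule are hypothetical placeholders, not a prescription from the whitepapers discussed above:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical task bodies: in a real project these would call your
# ingestion, transformation, and loading logic.
def extract():
    print("pulling raw data from the source system")

def transform():
    print("cleaning and reshaping the raw data")

def load():
    print("writing the result to the analytics store")

# The DAG declares the task order once; the scheduler then runs the
# pipeline daily without manual intervention.
with DAG(
    dag_id="example_analytics_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```

The same idea carries over to any orchestrator: declare dependencies once, and let the scheduler handle execution, retries, and timing.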

3. Use version control

Versioning is essential for tracking data, documents, and code. Data governance, data provenance, and data lineage can all depend on version control tools to some extent. With version control, different team members can also create their own versions of the work, which are then merged for implementation. Code is well served by Git; the sketch below shows one lightweight way to extend versioning to data.
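
A minimal sketch of data versioning, assuming a hypothetical dataset named sales.csv: record a content hash of each data file in a manifest that is committed to Git, so the code version and the exact data it ran against stay linked. Dedicated tools exist for this, but the core idea fits in a few lines:

```python
import hashlib
import json
from pathlib import Path

def dataset_fingerprint(path: str) -> str:
    """Return a SHA-256 hash of a data file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Commit this manifest alongside the pipeline code; a changed hash
# signals a new data version, supporting provenance and lineage.
manifest = {"sales.csv": dataset_fingerprint("sales.csv")}
Path("data_manifest.json").write_text(json.dumps(manifest, indent=2))
```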

4. Reuse and containerize

Do not waste time redoing work that can be reused. Furthermore, containerizing applications and pipelines reduces the risk of failure due to external circumstances, because every run happens in the same packaged environment.
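
On the reuse side, a simple pattern is to write parameterized building blocks instead of copy-pasted scripts per dataset. A minimal sketch, with hypothetical file names (writing Parquet requires an engine such as pyarrow):

```python
import pandas as pd

def csv_to_parquet(source: str, destination: str, drop_nulls: bool = True) -> None:
    """Reusable ingestion step: one parameterized function instead of a
    copy-pasted script per dataset."""
    df = pd.read_csv(source)
    if drop_nulls:
        df = df.dropna()
    df.to_parquet(destination)

# The same building block serves several pipelines instead of being rewritten.
csv_to_parquet("orders.csv", "orders.parquet")
csv_to_parquet("customers.csv", "customers.parquet", drop_nulls=False)
```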

5. Set up multiple environments

Setting up separate production and development environments gives you the flexibility of innovation and change management without risking ongoing pipeline executions. Within the development environment, each data worker should have their own workspace so they can work independently without affecting the performance of others.
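
One lightweight way to keep the same pipeline code running against separate environments is environment-driven configuration. A minimal sketch, with hypothetical warehouse names and a hypothetical DATAOPS_ENV variable:

```python
import os

# Hypothetical per-environment settings: development sandboxes and the
# shared production environment point at different resources.
CONFIGS = {
    "dev": {"warehouse": "analytics_dev", "alerting": False},
    "prod": {"warehouse": "analytics_prod", "alerting": True},
}

def get_config() -> dict:
    """Select settings via the DATAOPS_ENV variable, defaulting to dev
    so nothing accidentally touches production."""
    env = os.environ.get("DATAOPS_ENV", "dev")
    return CONFIGS[env]

print(get_config())
```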

6. Test and test

Without testing, we cannot assure data pipeline quality. Create test cases to cover every corner of the pipeline (data, code, system, and output). Test extensively before releasing a data pipeline, or any change to it, into the production environment.
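
As a sketch of what pipeline tests can look like, here are a few pytest-style checks against a hypothetical transform step, covering schema, data quality, and output size (pandas is assumed):

```python
import pandas as pd

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pipeline step under test: drops rows with a
    missing amount."""
    return raw.dropna(subset=["amount"])

def _sample() -> pd.DataFrame:
    return pd.DataFrame({"order_id": [1, 2], "amount": [10.0, None]})

def test_output_schema():
    # The code corner: the step must preserve the expected columns.
    assert list(transform(_sample()).columns) == ["order_id", "amount"]

def test_no_nulls_in_output():
    # The data corner: no null amounts may reach downstream consumers.
    assert transform(_sample())["amount"].notna().all()

def test_row_count_never_grows():
    # The output corner: a filter step must never invent rows.
    assert len(transform(_sample())) <= len(_sample())
```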

7. Continuous integration and deployment

Use continuous integration to assemble the work of various data workers and put it into a test environment. After all test cases pass, use continuous deployment to release the work into the production environment.
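
In practice this gating logic lives in a CI server's configuration, but as a language-neutral sketch it amounts to the following; the deploy_pipeline.py script and its flags are hypothetical:

```python
import subprocess
import sys

def main() -> int:
    """CI/CD gate: deploy to production only if the full test suite passes."""
    tests = subprocess.run(["pytest", "tests/"])
    if tests.returncode != 0:
        print("tests failed; deployment blocked")
        return tests.returncode

    # Hypothetical continuous-deployment step, e.g. releasing the
    # pipeline into the production environment.
    deploy = subprocess.run(["python", "deploy_pipeline.py", "--env", "prod"])
    return deploy.returncode

if __name__ == "__main__":
    sys.exit(main())
```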

8. Monitor continuously

Regularly monitor the production and development environments to track overall data pipeline performance, the quality of the pipeline's inputs and outputs, and the performance of the tools and technologies in use. Always cross-check the results of the two environments. Continuous monitoring records system performance and quality statistics, and analyzing those results always reveals scope for further improvement.
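
As a minimal sketch of such record-keeping, each pipeline run could append its statistics to a metrics log that can later be analyzed per environment. The metric names and file are hypothetical:

```python
import json
import time
from datetime import datetime, timezone

def record_run_metrics(env: str, rows_in: int, rows_out: int, started: float) -> None:
    """Append one pipeline run's statistics to a metrics log so that
    performance and quality trends can be compared across environments."""
    metrics = {
        "environment": env,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rows_in": rows_in,
        "rows_out": rows_out,
        "rows_dropped": rows_in - rows_out,
        "duration_seconds": round(time.time() - started, 2),
    }
    with open("pipeline_metrics.jsonl", "a") as f:
        f.write(json.dumps(metrics) + "\n")

# Hypothetical usage after a pipeline run:
start = time.time()
# ... pipeline executes here ...
record_run_metrics("prod", rows_in=10_000, rows_out=9_874, started=start)
```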

9. Communicate and collaborate

Continuously communicate with customers, stakeholders, and team members. Keep communication loops as short as possible so that messages travel faster. If required, create collaborative workspaces that connect people and tools so that tasks produce better results.

Discussion

Since the term was first coined, DataOps has seen significant contributions to both its definition and its practice. DataOps enthusiasts collaborate on common principles that offer uniformity in applying the methodology across heterogeneous data operation environments. Despite these efforts, certain ambiguities about applicability remain, owing to the diversity of those environments. Data analytics itself is a broad field in which numerous tools, approaches, and technologies can lead to the same result. Nevertheless, DataOps advocates collaboration, quality control, and faster delivery of data analytics pipelines by extending the proven DevOps methodology from the SDLC and combining it with Agile and lean manufacturing's statistical process control (SPC).

With these three methodologies as its reference point, DataOps has been continuously evolving into an efficient and reliable methodology for data lifecycle management. Companies like DataKitchen and iCEDQ are actively developing data analytics tools that support DataOps principles and are contributing to DataOps research and development by publishing their work. The implementation guidelines they present, for instance, derive from their own project experience. Even though each developed its guidelines with its own tool in mind, the guidelines hold true for implementing DataOps with other tools and technologies. Both implementation approaches (iCEDQ's and DataKitchen's) satisfy the DataOps principles, which shows the uniformity of DataOps practice. The nine implementation guidelines listed above were drawn from studying these two approaches alongside the DataOps principles, and they fulfill the DataOps manifesto.

References:

  1. W. Eckerson, Diving into DataOps: The Underbelly of Modern Data Pipelines, (2018). https://www.eckerson.com/articles/diving-into-dataops-the-underbelly-of-modern-data-pipelines.
  2. J. Zaino, Get Ready for DataOps — DATAVERSITY, (2019). https://www.dataversity.net/get-ready-for-dataops/.
  3. DataKitchen, The Seven Steps to Implement DataOps, (2017). www.datakitchen.io.
  4. S. Gawande, Complete DataOps Implementation Guide, iCEDQ, (2019). https://icedq.com/dataops/dataops-implementation-guide.
