Executive summary: DataOps and DevOps for Big Data

2 min read · Jan 29, 2019

Over the past decade, flexibility, agility, and automation have been key advances in software development. Agile and DevOps have made the development experience rapid, iterative, and collaborative.

Teams have sought to achieve the same in data analytics, and DataOps has been evolving to meet that need. However, it is frequently confused with DevOps.

DevOps, short for development and operations, is a combination of tools and practices for seamless development, deployment, infrastructure management, configuration, and application monitoring. While Agile established an iterative approach built on customer-oriented rapid releases, DevOps brings development and operations together to manage end-to-end processes through tooling, automation, and closer collaboration.

DataOps provides a controlled, integrated, quality-focused process to capture, store, manage, compute, analyze, visualize, and consume data. ETL tools replicate data to hydrate data lakes, sandboxes, and data marts. Many of these tools have built-in automation, while schedulers such as Oozie, AutoSys, and Control-M schedule processes and script execution. Numerous data layers, including semantic layers and data views, are built from business logic and shaped by consumption needs.
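To make the scheduling idea concrete, here is a minimal sketch in Python of running pipeline steps in dependency order. It is not how Oozie, AutoSys, or Control-M work internally. The step names and the `PIPELINE`, `execution_order`, and `run` names are assumptions made up for illustration, and the sketch relies only on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "capture": set(),
    "store": {"capture"},
    "compute": {"store"},
    "analyze": {"compute"},
    "visualize": {"analyze"},
}


def execution_order(pipeline):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


def run(pipeline, actions):
    """Call each step's action in dependency order and return the run log.

    `actions` maps a step name to a zero-argument callable; steps without
    an entry are treated as no-ops.
    """
    log = []
    for step in execution_order(pipeline):
        actions.get(step, lambda: None)()
        log.append(step)
    return log
```

A real scheduler adds triggers, retries, and alerting on top of this ordering, but the core contract is the same: a step runs only after everything it depends on has finished.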

Data comes in many forms, generally categorized as structured, semi-structured, or unstructured. There are many types of data sources, serving a variety of use cases, scenarios, and patterns. Other key drivers, such as size, security, latency, and cost, also affect the product management decision tree. Metadata, which is simply information about data, plays a key role in data curation, establishing data lineage, and ensuring data quality and data governance.
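As a rough illustration of how metadata supports lineage and quality checks, the Python sketch below keeps a small in-memory catalog. Everything in it (the `DatasetMetadata` class, the `lineage` and `missing_columns` helpers, and the dataset names in the usage note) is hypothetical and not taken from any particular catalog or governance tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Information about a dataset, kept separately from the data itself."""
    name: str
    kind: str  # "structured", "semi-structured", or "unstructured"
    derived_from: list = field(default_factory=list)  # upstream dataset names
    required_columns: tuple = ()  # columns every record must populate


def lineage(catalog, name):
    """Return the names of all upstream datasets, nearest first."""
    seen = []
    queue = list(catalog[name].derived_from)
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(catalog[parent].derived_from)
    return seen


def missing_columns(meta, record):
    """Return the required columns that are absent or None in a record."""
    return [c for c in meta.required_columns if record.get(c) is None]
```

For example, if a hypothetical `sales_mart` is derived from `clean_events`, which is in turn derived from `raw_events`, then `lineage(catalog, "sales_mart")` returns `["clean_events", "raw_events"]`, and `missing_columns` flags any record that lacks a required field.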

Data is valuable when it delivers business outcomes, and a data platform facilitates this by managing the volume, variety, velocity, and veracity of big data. To meet these demands, many tools and open source systems have emerged to provide speed, stability, and functional interoperability. Most tools are available both on-premises and in the cloud, but some are part of exclusive ecosystems.

As one size doesn’t fit all, it takes some effort and careful investment to identify the right tools for one’s needs.

Exponential data growth, data democratization, monetization, IoT, AI, and many more trends continue to open new frontiers in the data and analytics space.

The references below are good reads that offer insight and a deeper dive into DataOps.

DataOps is NOT Just DevOps for Data

DataOps: Building A Next Generation Data Engineering Organization

Written by Ram Prakash