From DevOps to DataOps

Over the past 10 years, many of us in technology companies have experienced the emergence of “DevOps.” This new set of practices and tools has improved the velocity, quality, predictability and scale of software engineering and deployment. Starting at the large internet companies, the trend towards DevOps is now transforming (albeit slowly) the way that systems are developed and managed inside the enterprise — often dovetailing with enterprise cloud adoption initiatives.

Regardless of your opinion about on-prem vs. multi-tenant cloud infrastructure, the adoption of DevOps is undeniably improving how quickly new features and functions are delivered at scale for end users.

I think there is much to be learned from the evolution of DevOps — across the modern internet as well as within the modern enterprise — most notably for those of us who work with data every day.

At its core, DevOps is about the combination of software engineering, quality assurance and technology operations. DevOps emerged because traditional systems management wasn’t remotely adequate to meet the needs of modern, web-based application development and deployment.

I believe that it’s time for data engineers and data scientists to embrace a similar new discipline — let’s call it “DataOps” — that at its core addresses the needs of data professionals on the modern internet and inside the modern enterprise.

Two trends are creating the need for DataOps:

1. The democratization of analytics, which is giving more individuals access to cutting-edge visualization, data modeling, machine learning and statistics. Tableau CEO Christian Chabot has frequently championed democratization as “a tremendous opportunity to help people answer questions, solve problems and generate meaning from data in a way that has never before been possible. And we believe there’s an opportunity to put that power in the hands of a much broader population of people.”

2. The implementation of “built-for-purpose” database engines, which radically improve the performance and accessibility of large quantities of data at unprecedented velocities. My partner Mike Stonebraker has been arguing convincingly for years, as he does in this KDnuggets interview, that “one size does not fit all — i.e. in every vertical market I can think of, there is a way to beat legacy relational DBMSs by 1–2 orders of magnitude. The techniques used vary from market to market. Hence, StreamBase, Vertica, VoltDB and SciDB are all specialized to different markets.”
