Eight Top DataOps Trends for 2022
DataOps adoption continues to expand as a perfect storm of social, economic, and technological factors drives enterprises to invest in process-driven innovation. From our unique vantage point in the evolution toward DataOps automation, we publish an annual prediction of the trends that most deeply affect the DataOps enterprise software industry as a whole. Keep an eye on the eight trends below, which we believe will be significant in 2022.
Equity as Code
The global AI market is projected to grow at a compound annual growth rate (CAGR) of 33% through 2027, drawing on strength in cloud-computing applications and the rise of connected smart devices. The problem is that algorithms can absorb racial, gender, ethnic and other social inequalities, then perpetuate them at scale. Many in the data industry recognize the serious impact of AI bias and are taking active steps to mitigate it. The data industry realizes that AI bias is, at its core, a quality problem, and that AI systems should be subject to the same level of process control as an automobile rolling off an assembly line. In 2022, data organizations will institute robust automated processes around their AI systems to make them more accountable to stakeholders.
Model developers will test for AI bias as part of their pre-deployment testing. Quality test suites will enforce "equity" like any other performance metric. Continuous testing, monitoring and observability will prevent biased models from being deployed or from continuing to operate. We call this application of DataOps methods to the problem of AI bias "equity as code" because the tests that enforce equity are built into the automated software that tests, deploys and monitors the model 24/7.
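To make "equity as code" concrete, here is a minimal sketch of an equity test that could run as a pre-deployment gate. It assumes binary (0/1) predictions, uses demographic parity difference (the gap in positive-prediction rates between groups) as the equity metric, and picks an arbitrary limit of 0.1; the metric, threshold, function names and group labels are all illustrative assumptions, not a prescribed standard.

```python
from collections import defaultdict

# Hypothetical policy: maximum allowed gap in positive-prediction rate
# between any two groups. Choosing the metric and limit is a policy decision.
MAX_PARITY_GAP = 0.1


def positive_rates(predictions, groups):
    """Return the share of positive (1) predictions for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate across groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def equity_gate(predictions, groups, max_gap=MAX_PARITY_GAP):
    """Raise if the parity gap exceeds the limit; otherwise return the gap.

    In a CI/CD pipeline, the raised error would stop the deployment step.
    """
    gap = demographic_parity_gap(predictions, groups)
    if gap > max_gap:
        raise AssertionError(f"equity test failed: parity gap {gap:.2f} > {max_gap:.2f}")
    return gap


if __name__ == "__main__":
    preds = [1, 0, 1, 1, 0, 1, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    try:
        equity_gate(preds, groups)
    except AssertionError as err:
        print(err)  # equity test failed: parity gap 0.50 > 0.10
```

Because the gate simply raises, the same function can run in pre-deployment tests and again on production predictions as a monitoring check.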
Companies Commit to Remote
With data and tools increasingly in the cloud, data organizations are finding ways to accommodate remote work. Web conferencing helps, but chance encounters by the water cooler are non-existent. The processes and workflows that depend on individuals with tribal knowledge huddling to solve problems are nearly impossible to execute through video conferences.
As a result, enterprises will examine their end-to-end data operations and analytics creation workflows. Are those workflows building up or tearing down the communication and relationships that are critical to their mission? Instead of allowing technology to be a barrier to teamwork, leading data organizations in 2022 will further expand workflow automation to improve communication and coordination between groups. In other words, they will use DataOps principles to build a robust, transparent, efficient and repeatable analytics process hub that unifies all workflows.
Data Gets Meshier
2022 will bring further momentum behind modular enterprise architectures like data mesh. The data mesh addresses the problems characteristic of large, complex, monolithic data architectures by dividing the system into discrete domains managed by smaller, cross-functional teams. Each domain is an independently deployable cluster of related microservices that communicate with users or other domains through modular interfaces. A domain has an important job and a dedicated team of five to nine members who develop an intimate knowledge of data sources, data consumers and functional nuances. An array of interdependent domains may sound great in theory, but we can say from experience that a decentralized organizational and architectural structure raises a host of issues that must be addressed, such as ordered data dependencies, inter-domain communication, shared infrastructure and incoherent workflows. Decentralized teams also tend to duplicate effort, for example by each building horizontal infrastructure that cuts across multiple domains.
DataOps addresses the challenges of a decentralized organizational structure or architecture such as a data mesh. A DataOps Platform spans toolchains, teams and data centers to incorporate all of an enterprise's domains into a single superstructure. Data mesh and DataOps make a great team that enables innovation through decentralization while harmonizing domain activities in a coherent end-to-end pipeline of workflows. Data mesh encourages autonomy, while DataOps handles global orchestration, shared infrastructure and inter-domain dependencies, and enables policy enforcement. The infrastructure ingredients that domains require can be unified into a self-service infrastructure-as-a-platform managed by a DataOps superstructure. DataOps is the perfect partner to data mesh.
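One piece of that global orchestration, running domains in the order their data dependencies require, can be sketched with Python's standard-library `graphlib` (Python 3.9+). The domain names and the `run_domain` placeholder below are hypothetical; a real DataOps platform would trigger each domain team's own pipeline and add scheduling, retries and monitoring on top.

```python
from graphlib import TopologicalSorter

# Hypothetical domains: each key maps to the set of upstream domains
# whose data products it consumes.
DOMAIN_DEPENDENCIES = {
    "customers": set(),
    "inventory": set(),
    "orders": {"customers"},
    "finance": {"orders", "inventory"},
}


def run_domain(name):
    """Placeholder for triggering one domain team's own pipeline."""
    print(f"running {name} pipeline")
    return name


def orchestrate(dependencies):
    """Run each domain only after all of its upstream domains have finished.

    TopologicalSorter raises graphlib.CycleError if domains depend on each
    other in a loop, surfacing the design problem before anything runs.
    """
    sorter = TopologicalSorter(dependencies)
    return [run_domain(domain) for domain in sorter.static_order()]


if __name__ == "__main__":
    orchestrate(DOMAIN_DEPENDENCIES)
```

This sketch runs domains one at a time for clarity; `TopologicalSorter` also offers `get_ready()` and `done()` for dispatching independent domains in parallel.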
The Great Resignation Hits Data & Analytics
Since April 2021, 24 million workers have quit their jobs. The percentage of resignations relative to total employment was 3%, the highest such figure on record. Like workers in many other sectors of the economy, data professionals are feeling the pull.
We recently surveyed 600 data engineers, including 100 managers, to understand how they are faring and how they feel about the work they are doing. The top-line result was that 97% of data engineers are feeling burnout. Over 70% of the data engineers surveyed indicated that they would likely leave their current company in the next twelve months. Even more surprising, 79% of those surveyed have considered abandoning the field of data engineering entirely. There has been a growing shortage of data professionals over the last five years, and companies already have a hard time hiring them. Hiring managers will need to find ways to address the lack of work-life balance experienced by members of the data team. The shortage will get worse in 2022 unless enterprises institute process improvements, like DataOps, that address the causes of unplanned work and lack of work-life balance.
Rise of the DataOps Engineer
If data analytics is like a factory, the DataOps Engineer owns the assembly line used to build a data and analytics product. Most organizations run the data factory using manual labor. We know from surveys that data scientists and other data professionals spend over 50% of their time executing procedures that support data operations. DataOps Engineers manage the process hub that automates data production and analytics development workflows so that the data team is more efficient, more innovative and less prone to error. A DataOps Engineer can have a significant impact on the productivity of the data organization. A recent LinkedIn job search showed over 950 positions advertised for job candidates with DataOps experience. A similar role, DevOps Engineer, was recently named by LinkedIn as the number one "in-demand" tech job for 2022. Following in the footsteps of DevOps Engineers in the software industry, we predict that the DataOps Engineer will become the most sought-after and highly paid member of the data analytics team.
Hub-Spoke Enterprise Architectures
The shortage of data scientists is driving companies to find ways to put data in the hands of business users who perform their own self-service analytics. Distributed data architectures like data mesh will combine with DataOps automation to produce Hub-Spoke architectures that deftly blend the benefits of centralization and decentralization. For example, a Hub-Spoke architecture could integrate data from a multitude of sources into a data lake. Orchestrated pipelines that span teams, toolchains, data centers and organizational boundaries would then emanate from the data lake to create analytics platforms that data scientists and business users use to generate on-demand insights.
The Hub-Spoke architecture is part of a data enablement trend in IT. Increasingly, IT teams build the data lakes at the hub of the enterprise data architecture and provide outside groups with access to raw data and tools. Line-of-business self-service teams serve as the spokes encircling the hub.
Data that flows through the Hub-Spoke data architecture will be controlled and managed by workflows located in a centralized process hub. In 2022, modularized enterprise architectures powered by DataOps process hubs will enable the next generation of self-serve analytics by non-technical business users who need to derive insights from data.
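The division of labor between hub and spokes can be illustrated with a toy model. In this sketch, which is an assumption rather than any product's API, a hypothetical `ProcessHub` owns the lake's datasets and registers each spoke's self-service workflow, enforcing centrally which datasets that spoke may read. Lists of numbers stand in for real tables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Spoke:
    """A line-of-business team's self-service workflow (hypothetical model)."""
    name: str
    allowed_datasets: List[str]
    transform: Callable[[Dict[str, list]], object]


@dataclass
class ProcessHub:
    """Central hub that owns the lake and governs how spokes consume it."""
    lake: Dict[str, list]
    spokes: Dict[str, Spoke] = field(default_factory=dict)

    def register(self, spoke: Spoke) -> None:
        # Policy enforcement happens once, centrally, at registration time.
        unknown = set(spoke.allowed_datasets) - set(self.lake)
        if unknown:
            raise ValueError(f"{spoke.name} requests unknown datasets: {sorted(unknown)}")
        self.spokes[spoke.name] = spoke

    def run(self, name: str):
        spoke = self.spokes[name]
        # The spoke receives copies of only the datasets it was granted.
        view = {ds: list(self.lake[ds]) for ds in spoke.allowed_datasets}
        return spoke.transform(view)


if __name__ == "__main__":
    hub = ProcessHub(lake={"sales": [120, 80, 200], "hr": [5, 7]})
    hub.register(Spoke("sales_dashboard", ["sales"], lambda d: sum(d["sales"])))
    print(hub.run("sales_dashboard"))  # 400
```

The spoke team writes only its `transform`; access control and execution stay in the hub, which is the centralized-control, decentralized-analysis split described above.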
Data Observability
When analytics and dashboards are inaccurate, business leaders may be unable to solve problems and pursue opportunities. When there's an outage in critical data, every second counts. Data observability enables you to ascertain the state of a system by observing its external outputs, and a more observable system makes it easier to pinpoint the source of an issue. Observability requires granular DataOps instrumentation hooked into high-level dashboards and alerts. It encompasses monitoring data pipelines through tests, metrics, logs and other artifacts. In 2022, enterprises will add DataOps observability to their data factories to reduce errors, eliminate unplanned work and minimize the cycle time of error resolution.
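As a rough illustration of granular instrumentation, the sketch below implements three common data checks (row count, null fraction and freshness) and routes failures to an alert. The check names, thresholds and the `print` standing in for a dashboard or paging hook are all assumptions; production tools add scheduling, history and lineage on top of checks like these.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_row_count(rows, minimum):
    """Catch truncated or missing loads."""
    n = len(rows)
    return CheckResult("row_count", n >= minimum, f"{n} rows (minimum {minimum})")


def check_nulls(rows, column, max_null_fraction):
    """Catch a column that has silently started arriving empty."""
    nulls = sum(1 for row in rows if row.get(column) is None)
    # An empty table counts as fully null so this check also fails loudly.
    fraction = nulls / len(rows) if rows else 1.0
    return CheckResult(f"nulls:{column}", fraction <= max_null_fraction,
                       f"{fraction:.0%} null (limit {max_null_fraction:.0%})")


def check_freshness(last_loaded, max_age, now=None):
    """Catch stale data from a pipeline that stopped running."""
    now = now or datetime.now(timezone.utc)
    age = now - last_loaded
    return CheckResult("freshness", age <= max_age, f"last load {age} ago")


def report(results):
    """Return failed checks; the print is a stand-in for a real alerting hook."""
    failures = [r for r in results if not r.passed]
    for r in failures:
        print(f"ALERT {r.name}: {r.detail}")
    return failures


if __name__ == "__main__":
    rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
    now = datetime.now(timezone.utc)
    report([
        check_row_count(rows, minimum=1),
        check_nulls(rows, "amount", max_null_fraction=0.1),
        check_freshness(now - timedelta(hours=2), max_age=timedelta(hours=1), now=now),
    ])
```

Each result names the failing check and carries a human-readable detail, which is what lets an alert point directly at the source of an issue instead of at a broken dashboard.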
XOps is Cool
We group all of these "Ops" methodologies, such as DevOps and DataOps, under the umbrella of "Lean Manufacturing." Lean seeks to identify waste in manufacturing processes by focusing on eliminating errors, reducing cycle time, improving collaboration and measuring results. Lean is about self-reflection and seeking smarter, less wasteful, dynamic solutions together. Lean principles have transformed software development and are now poised to do the same in the data analytics space. The terminology is less important than staying focused on the goals of lean manufacturing. Anything that eliminates errors, streamlines workflow processes, improves collaboration and enhances transparency aligns with DevOps, DataOps and all the other Ops that are out there. All of this Ops momentum (or XOps) will continue in 2022, as enterprises use DataOps automation to bring efficiency, robustness and repeatability to their AI and machine learning processes.
Originally published at https://datakitchen.io on November 29, 2021.