DataOps: The Role of the Data Architect
Much has been written about the role of data engineers and data scientists in building today’s data pipelines. For DataOps to scale in a consistent and cohesive manner, a data architect is essential, but the approach to data architecture needs to change. It can’t be about what Agile practitioners refer to as Big Up Front Design (BUFD).
Since the 90’s, data acquisition and ingestion, ETL processing, data warehouse models, reporting and analytics have been well understood and established practices. Innovation was not a significant part of a data warehouse solution. Instead, it was about the optimization and application of hardened practices. Data warehouses were delivered, often one subject area at a time, over a number of years of development. The approach to data architecture was largely what Agile practitioners today refer to as BUFD.
So what has changed? Not only have the architectures around data pipelines become more complex, but innovation and adaptation are now a major part of the design and development of modern enterprise-scale data pipelines. This is not likely to change for some time.
Further to this, business expectations have changed: the notion of waiting months, if not years, for cleansed and well-integrated data covering large subject areas has given way to more specialized and focused data preparation, delivered in a near-continuous manner.
The alternative to BUFD is Agile architecture, which seeks a balance between the practices of Intentional Architecture and Emergent Design, allowing for a kind of architectural runway that provides enough of a cohesive foundation for the project to move forward. This is harder to do than it sounds. It requires the architect to adopt a far more collaborative and adaptive approach from start to finish.
Since an Agile architecture is developed in a continuous and incremental manner over time, it allows for essential innovation and adaptation: new ideas can emerge and be integrated into the solution.
As described in the Scaled Agile Framework (SAFe) documentation:
“Agile Architecture is a set of values, practices, and collaborations that support the active, evolutionary design and architecture of a system. This approach embraces the DevOps mindset, allowing the architecture of a system to evolve continuously over time, while simultaneously supporting the needs of current users. It avoids the overhead and delays associated with the start-stop-start nature and large-scale redesign inherent with phase-gate processes and Big Up Front Design (BUFD)”
© Scaled Agile, Inc.
It took the Agile movement a long time to realize that architecture, in some form, needed to be part of the solution. It is only when the scale and complexity of what is to be built increases that the need for architecture becomes apparent and critical to a successful outcome.
For enterprise-scale data pipelines, there is a tendency for large teams to lose sight of the whole, as they focus on the individual parts of what is a large and complex undertaking. Teams that specialize in and optimize for one part of the pipeline can lose sight of the consequences their design decisions have for teams further along. A singular vision of what is being built ensures that each team understands its part of the whole solution.
“The architecture of a system can either accelerate or impede the ability to provide frequent, independent releases for the business to meet its objectives”
© Scaled Agile, Inc.
A BUFD architecture can, and often does, stifle the innovation and adaptation to new learnings that need to take place for a data pipeline to be successful. A BUFD architecture does not foster collaboration, and it is not easily adaptable.
An Agile architecture, even if emergent, is key to all stakeholders understanding the scope and capabilities of the data pipeline. As the scale and complexity of what is being built increases, so does the need for stakeholders to understand what is to be built.
In an enterprise-scale data pipeline, the role of the architect becomes essential, as the architecture needs to address a number of aspects of the solution, including but not limited to:
Regulatory Compliance: What capabilities are needed to ensure regulatory compliance of the data moving through the pipeline? From this perspective, financial data or data that identifies an individual must be managed differently than other kinds of data.
Data Quality: In moving the data through a pipeline, what is needed to ensure an appropriate level of data quality? This is about building trust in the data being used for decision making. It includes all manner of validation, reconciliation and automated testing, applied end to end (a minimal sketch of such checks follows this list).
Data Governance: While data quality is the degree to which data is accurate, complete, timely and consistent, data governance is about the exercise of authority and control over the data. In large-scale data pipelines, the need for data governance goes from being a nice-to-have to being a critical success factor.
Stakeholder Communications: As the scope and complexity of the solution increases, so does the need for an architectural representation of it, so that those who invest, plan and build each understand the whole of what is being built.
Architectural Patterns and Best Practices: The architect sees the solution as a set of capabilities delivered by components, whether custom-built or customized off-the-shelf technology, and a set of patterns for executing those components. These patterns expand over time to include practices that result from innovation and ever-changing requirements.
Alignment with Enterprise Strategies: It is not uncommon for the architect of a data pipeline to work with an Enterprise Architect. This is about aligning the technologies and best practices with those of the enterprise as a whole. This collaboration often becomes an exercise in compromise, and even exceptions, based on an understanding of the trade-offs involved.
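To make the Data Quality item above a little more concrete, here is a minimal sketch of the kind of automated validation and reconciliation a pipeline stage might run. It uses pandas purely for illustration; the column names (order_id, amount), the tolerance, and the validate_stage helper are hypothetical and would differ in any real pipeline.

```python
# Minimal sketch of automated data quality checks between two pipeline stages.
# Column names ("order_id", "amount") and the tolerance are hypothetical examples.
import pandas as pd


def validate_stage(source: pd.DataFrame, target: pd.DataFrame) -> list[str]:
    """Return a list of data quality issues found when reconciling two pipeline stages."""
    issues = []

    # Completeness: every source row should arrive at the target (reconciliation).
    if len(target) != len(source):
        issues.append(f"row count mismatch: source={len(source)}, target={len(target)}")

    # Uniqueness: the business key must not be duplicated downstream.
    duplicates = target.duplicated(subset=["order_id"]).sum()
    if duplicates:
        issues.append(f"{duplicates} duplicate order_id values")

    # Validity: no nulls or negative values in a mandatory numeric column.
    if target["amount"].isna().any() or (target["amount"] < 0).any():
        issues.append("invalid values in 'amount' (nulls or negatives)")

    # Consistency: aggregate totals should reconcile within a small tolerance.
    if abs(source["amount"].sum() - target["amount"].sum()) > 0.01:
        issues.append("amount totals do not reconcile between stages")

    return issues


if __name__ == "__main__":
    src = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
    tgt = pd.DataFrame({"order_id": [1, 2, 2], "amount": [10.0, 20.0, -5.0]})
    for issue in validate_stage(src, tgt):
        print("DQ check failed:", issue)
```

The point is not the specific checks, but that they are defined once, run automatically at every stage, and surface issues before the data reaches decision makers.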
What these aspects of the solution have in common is that they are the result of the application of people, process and technology, not technology alone. They are achieved through collaboration with enterprise-level stakeholders. The architecture, emergent or otherwise, serves as a document of understanding: a singular and comprehensible “big picture” view of what is being built, without which there is no basis on which to build consensus and agreement.
Without an architecture, the cohesiveness and efficiencies that come from prescribed patterns and strategies for dealing with the complexity of the whole, as opposed to its individual parts, begin to unravel over time. The business value that comes from early wins in prototyping and the first stages of development will diminish quickly as what is being built becomes less comprehensible, and less maintainable, as a whole.
This article was originally published on LinkedIn https://www.linkedin.com/pulse/data-ops-role-architect-dennis-layton/