Team Topologies for Data Engineering Teams
Data Teams have been the outliers of IT departments for quite a long time. Most of these teams follow practices that lack the efficiency, structure and flow of software development, which raises questions about excessive time-to-market and poor release quality.
In recent years, and especially with the introduction of new roles and areas such as Data Science, Machine Learning or Big Data Engineering, Data Teams have been forced to adopt not only a set of new practices and tools but also a different mindset, one that is starting to shape the term DataOps and to bridge the gap between data and software development.
As part of my search for a logical and agile organization for the Data Engineering Teams I’m currently leading, I recently came across the book Team Topologies by Matthew Skelton and Manuel Pais, which offers relevant and interesting analysis and definitions of topics such as Team First Approach, Team API, Cognitive Load, Fundamental Team Topologies and Team Interaction Modes.
Starting from the idea that a team is the core element of an organization and that objectives should be assigned to teams and not to individuals, I’ve used the Team Topologies proposed in the book to set up an Ecosystem for Data Engineering Teams. This Ecosystem addresses the existing roles and responsibilities, enables communication, transparency and feedback, and in time should build the capabilities needed to shorten time-to-data and improve quality.
Platform Team — Data Platform Engineering Team
A Platform Team works on the underlying platform that supports all other teams. Its main goal is to simplify access to services and resources by governing and maintaining the platform, thereby reducing the cognitive load on other teams.
I’ve assigned this Topology to the Data Platform Engineering Team, which is responsible for the different components of the Data Platform used by other Data Teams. Within the scope of this team or group of teams one can find all kinds of databases, container-based technologies, scheduling and orchestration tools, monitoring and reporting platforms and any other underlying system that can be shared by one or more data teams.
Given the nature of this team, its main Interaction Mode is the X-as-a-Service Mode, which needs to be as nimble as possible and to anticipate the needs of the consuming teams in order to provide the best possible service.
Enabling Team — Data Operations Team
An Enabling Team supports other teams to adopt new practices, processes and tools over a transition or learning period.
This Topology fits a Data Operations Team perfectly, since its objective is to give other teams the best capabilities to deliver quick, stable and valuable solutions around data.
Besides being in charge of all the applications and processes in production, the Data Operations Team, through its DataOps Engineers, enables and helps other Data Teams by providing CI/CD pipeline structures, automatically created development environments and faster test and release phases.
The Facilitating Mode is usually the best Interaction Mode because it allows this Team to be in constant movement and help multiple Data Teams at once and over time.
Complicated Sub-system Team — Data Frameworks Teams
In the words of the authors of the book, this Topology is for “teams with a special remit for a subsystem that is too complicated to be dealt with by a normal stream-aligned team or platform”, and it’s the only Topology considered optional.
I found it quite useful to define the Data Frameworks Teams, which are responsible for developing, maintaining and improving the Data Frameworks used by all data development roles.
A Data Framework is any piece of software, designed and managed as a product, that can be used to accelerate or standardize the way stages of the data lifecycle such as Data Ingestion, Data Quality, Data Preparation, Model Training or Data Egress are performed.
This Team Topology uses all the different modes of interaction in its relations with other teams. However, the most common in our case are the X-as-a-Service Mode, which is used to request new features or to fix Framework bugs, and the Collaboration Mode, which allows other Teams to learn from the Data Frameworks Teams how to use the Frameworks and their features correctly.
Stream-Aligned Team — Data Product/Service Teams
A Stream-Aligned Team is the closest Topology to the cross-functional agile team model that organizations are largely embracing. This Topology is used for teams that are focused on a business stream, a product or a customer journey. This type of team emerges naturally in organizations: sometimes it is created as part of a project or, more recently, formed around a framework like Scrum or Kanban to deliver value independently and incrementally.
In our current approach the Stream-Aligned Teams are all the Data Product/Service Teams that deliver value to a certain line of business through analytical models, reports and dashboards or ad-hoc queries. These Teams bring together many data-related roles such as Data Scientists, Data Engineers, Data Architects and Data Analysts, joined by more functional profiles like Business Analysts, Business Partners and other business SMEs.
In this scenario data-driven Stream-Aligned Teams benefit from all the Interaction Modes, since they act as clients of the other Topologies, which work to make sure that the platforms, tools and assistance are always available for the Stream-Aligned Teams to deliver the best data products and services on time and to a high standard.
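To make the mapping above easier to scan, here is a minimal Python sketch of the Ecosystem as a small data structure. It is only an illustration: the names Topology, InteractionMode, Team, ECOSYSTEM and modes_offered_to_stream_aligned are hypothetical and are not taken from the book or from any tool we use. It lists only the main Interaction Modes mentioned for each team in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Topology(Enum):
    PLATFORM = "Platform"
    ENABLING = "Enabling"
    COMPLICATED_SUBSYSTEM = "Complicated Sub-system"
    STREAM_ALIGNED = "Stream-Aligned"


class InteractionMode(Enum):
    X_AS_A_SERVICE = "X-as-a-Service"
    FACILITATING = "Facilitating"
    COLLABORATION = "Collaboration"


@dataclass(frozen=True)
class Team:
    """Hypothetical record: one team, its Topology and its main Interaction Modes."""
    name: str
    topology: Topology
    main_modes: frozenset


# The assignments described in this article (main modes only).
ECOSYSTEM = [
    Team("Data Platform Engineering Team", Topology.PLATFORM,
         frozenset({InteractionMode.X_AS_A_SERVICE})),
    Team("Data Operations Team", Topology.ENABLING,
         frozenset({InteractionMode.FACILITATING})),
    Team("Data Frameworks Teams", Topology.COMPLICATED_SUBSYSTEM,
         frozenset({InteractionMode.X_AS_A_SERVICE,
                    InteractionMode.COLLABORATION})),
    # Stream-Aligned Teams act as clients and touch every mode.
    Team("Data Product/Service Teams", Topology.STREAM_ALIGNED,
         frozenset(InteractionMode)),
]


def modes_offered_to_stream_aligned(ecosystem):
    """Union of the main modes used by the teams that support Stream-Aligned Teams."""
    modes = set()
    for team in ecosystem:
        if team.topology is not Topology.STREAM_ALIGNED:
            modes |= team.main_modes
    return modes


if __name__ == "__main__":
    for team in ECOSYSTEM:
        names = ", ".join(sorted(mode.value for mode in team.main_modes))
        print(f"{team.name} [{team.topology.value}]: {names}")
```

Taking the union of the supporting teams' modes gives all three Interaction Modes, which matches the point that the Stream-Aligned Teams are served through every mode.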
Data Engineering is the core and silent enabler of a successful Data Strategy, as this practice supports all the other data-related roles and areas from the moment data is collected to the moment a decision is made by a human or a machine.
There is still a long way to go when it comes to making data-driven solutions reach their full potential. To get there, companies should commit to a planned and structured investment roadmap to raise data literacy, to create the right capabilities and mindset, to define an enterprise data vision and architecture (see my article on this topic, A new Enterprise Data Architecture for the new Data World), to source, hire and train data profiles, to explore and implement new lines of thought like DataOps and MLOps, and to adapt other more generic ones like Team Topologies.
Disclaimer: The opinions expressed or implied in this text are exclusively mine and do not reflect those of third parties, who will not be responsible for any action that may result from my views and opinions.
I’d like to acknowledge Matthew Skelton and Manuel Pais for a useful and well written book that helped me steer my vision and strategy around managing Data Engineering Teams.
Matthew Skelton and Manuel Pais, Team Topologies (2019)