DevOps embedding as aiding and abetting

Aiding and abetting is a legal doctrine related to the guilt of someone who aids or abets in the commission of a crime (or in another’s suicide). It exists in a number of different countries and generally allows a court to pronounce someone guilty for aiding and abetting in a crime even if they are not the principal offender.

The two most talented operations engineers I have ever had the privilege of working with shared an important desire not to be embedded in product teams. They wanted to be part of a team that delivered core platform functionality that could be used across products. In addition to that organizational placement preference, they excelled at system engineering, scalability, and distributed systems. Their programming language of choice was Golang, not because it looked good on a resume, but because it provided operational simplicity through statically linked native binaries without any external dependencies. It was a software choice informed by operational experience.

As their manager, I expended political capital to shield them from the swiftly changing winds of product-team embedding. It paid dividends for the company. They built operational software that helped us scale, instrument, and monitor multiple products better. Their impact was far greater than it would have been had they focused on a single service.

Sadly, the lessons learned from their engagement could not be replicated across all the operations teams I managed. Organizational changes outside of our group and other contributing factors meant that only one of the teams, a fraction of the entire group, could focus on platform engineering initiatives. Regular skip-level one-on-ones with team members who were product focused versus those who were platform focused demonstrated a stark difference in morale and engagement. Those operations engineers exercising their expertise in building out a shared platform from an internal backlog tended to be happier and, counterintuitively, more understanding of how their talents contributed to the overall vision of the company.

The anecdotal evidence from my management career and a reflection on the nature of operations in an evolving software landscape both contributed to several axioms I find hold true in this space:

  1. Organizational structure should be such that product teams are formed that have all of the necessary skills to deliver business functionality with minimum dependencies on other teams and maximum accountability for the product they are delivering.
  2. Operations engineering is a craft, distinct from but related to software engineering.
  3. Their common ground is the delivery of business value through automation with programming.
  4. Their difference is in the type of business value being delivered. Software engineering focuses on customer-facing functionality, while operations engineering focuses on non-functional requirements (NFRs) such as scalability, visibility, security, and governance.
  5. In the “physics” of product delivery organizations, customer-facing features are prioritized, to the detriment of the business, over NFRs.
  6. The remedy to number five above is the deliberate establishment of platform (operations) engineering focused teams that deliver a self-service platform where good scalability, visibility, security, and governance choices are paths of least resistance and consistent between products.

On the surface, this sounds like functional orientation of teams; something the DevOps movement has tried hard to eliminate. However, many mature, high-performing engineering organizations utilize this approach and find it consistent with DevOps practices. Dedicated platform engineering teams are not a throwback to legacy IT models of operation. Even pure functionally oriented teams can be compatible with a DevOps model:

We can also achieve our desired DevOps outcomes through functional orientation, as long as everyone in the value stream views customer and organizational outcomes as a shared goal, regardless of where they reside in the organization. — The DevOps Handbook

Organizational size and industry vertical play a role in how the six axioms impact team structure. Products or services that are in the proof-of-concept phase of life, are of lower complexity, or have inconsequential security concerns obviously don’t need dedicated platform engineering teams. So when are dedicated teams needed? There are two main drivers.

The first driver is when an organization has a portfolio of products that have to integrate in order to provide a competitive advantage. In other words, they are trying to convince customers that their products together are greater than the sum of their parts (Platform Revolution). This maturity point impacts engineering teams in a unique way that likely hasn’t presented itself in their previous software development history. Niche functionality is no longer the primary goal. Integrations that include both technology and teams are the emerging problem space.

Integration points are the number-one killer of systems. Every single one of those feeds presents a stability risk. — Release It!

Getting stability right now becomes harder because you are trying to bring together independently developed code that wasn’t communicating before. Abstractions that “cushion” the integration blow need to be utilized. Abstractions such as circuit breakers, back-offs, and automated throttling should not be reinvented for each integration. Instead, they should be developed once by a common team and applied to each project.
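To make the idea of a shared stability abstraction concrete, here is a minimal sketch of a circuit breaker of the kind a platform team might publish once for every product team to reuse. The names (Breaker, NewBreaker, Call, ErrCircuitOpen) and the policy (open after a number of consecutive failures, allow a trial call after a cooldown) are assumptions made for illustration, not a description of what the teams in this story actually built.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrCircuitOpen is returned when the breaker rejects a call without trying it.
var ErrCircuitOpen = errors.New("circuit open: call rejected")

// Breaker is a hypothetical, deliberately small circuit breaker.
// After maxFailures consecutive failures it rejects calls until the
// cooldown has elapsed, then lets calls through again as a trial.
type Breaker struct {
	mu          sync.Mutex
	maxFailures int
	cooldown    time.Duration
	failures    int       // consecutive failures seen so far
	openedAt    time.Time // when the failure threshold was last reached
}

// NewBreaker returns a closed breaker with the given threshold and cooldown.
func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is open, and records the outcome.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	open := b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown
	b.mu.Unlock()
	if open {
		// Fail fast instead of piling more load onto a struggling dependency.
		return ErrCircuitOpen
	}

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			// Threshold reached (or a trial call failed): restart the cooldown.
			b.openedAt = time.Now()
		}
		return err
	}
	// Any success closes the breaker again.
	b.failures = 0
	return nil
}
```

A product team would wrap each outbound integration call in Call and treat ErrCircuitOpen as a signal to degrade gracefully. This sketch lets every caller through once the cooldown expires; a production version would likely limit that to a single trial call and add back-off and metrics.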

Getting security right also increases in difficulty. Your organization now has problems in authentication and authorization of service-to-service communication. Engineering teams that were designing for confidentiality and integrity requirements using pre-shared keys and network access control lists find these solutions no longer scale. The operational overhead of symmetric key management and maintaining lists of IP addresses sinks the historical ways of doing security. (For the inherent headaches of managing IP addresses in an ephemeral, cloud-native world and the emergent patterns for network design see: Trust No One: A Gap Analysis of Moving IP-Based Network Perimeters to A Zero Trust Network Architecture)

Platform teams should now be creating abstractions that functional teams can consume which provide for service identity built on a public key infrastructure and internal certificate authorities. These abstractions will provide for mutually authenticated communication over untrusted networks, such as the public Internet.
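As a rough illustration of what such an abstraction could look like, the sketch below uses Go's standard crypto/tls and crypto/x509 packages to build server and client configurations that only trust certificates issued by an internal certificate authority. The function names (ServerTLSConfig, ClientTLSConfig, loadIdentity) and the choice of passing PEM bytes are assumptions for the example; how certificates are issued and rotated is out of scope here.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
)

// loadIdentity parses the internal CA bundle and this service's own
// certificate and private key. All inputs are PEM encoded.
func loadIdentity(caPEM, certPEM, keyPEM []byte) (*x509.CertPool, tls.Certificate, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, tls.Certificate{}, errors.New("no usable CA certificate in PEM input")
	}
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, tls.Certificate{}, err
	}
	return pool, cert, nil
}

// ServerTLSConfig (hypothetical helper) requires every client to present
// a certificate signed by the internal CA.
func ServerTLSConfig(caPEM, certPEM, keyPEM []byte) (*tls.Config, error) {
	pool, cert, err := loadIdentity(caPEM, certPEM, keyPEM)
	if err != nil {
		return nil, err
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    pool,
		ClientAuth:   tls.RequireAndVerifyClientCert,
		MinVersion:   tls.VersionTLS12,
	}, nil
}

// ClientTLSConfig (hypothetical helper) presents this service's certificate
// and only trusts servers whose certificates chain to the internal CA.
func ClientTLSConfig(caPEM, certPEM, keyPEM []byte) (*tls.Config, error) {
	pool, cert, err := loadIdentity(caPEM, certPEM, keyPEM)
	if err != nil {
		return nil, err
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool,
		MinVersion:   tls.VersionTLS12,
	}, nil
}
```

The design point is that product teams call one helper and get mutual authentication by default, rather than each team deciding on its own which certificates to trust and whether to verify the peer.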

The second driver for dedicated platform engineering teams can generally be grouped under governance: compliance, regulatory, or cost-efficiency concerns. Consistency is the key here. It can be a balancing act to implement when trying to give development teams the flexibility to choose their own tools and experiment. However, there are some areas for investment that are good starting places.

Many modern organizations are subject to more than one regulatory or compliance framework (SOC 2, GDPR, ISO 27001, etc.), so some upfront work is needed to avoid implementing the requirements of each framework independently. This is where control frameworks and mapping exercises kick in. The controls that surface as part of the output of these exercises should be consistent between services. Software supply-chain controls, such as static code analysis and dependency vulnerability checking, are a good example. Platform engineering teams that are focusing on release or application infrastructure can partner with their organization’s audit and security teams to define common patterns for implementation across services.

Consistency in meta-tagging of resources is probably the most obvious and least impactful to development teams, yet one of the hardest areas to get right. At my last gig, we had an extended debate about what to build first when kicking off our platform team. “Visibility is the most primitive of primitives” was a quote that emerged from that discussion. If you don’t know what you have, how can you even begin to manage (govern) it? It is not surprising that the first two controls in the CIS Top 20 list are inventory of hardware and inventory of software assets. (For more on this see my article on asset management.) Therefore, building visibility into the entire business value pipeline should be one of the highest priorities for a platform engineering team.
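One low-cost way to start on tagging consistency is an automated audit that flags resources missing agreed-upon tags. The sketch below assumes a hypothetical inventory record (Resource) and an illustrative set of required tag keys; neither comes from a real cloud provider API or from the platform described here.

```go
package main

import (
	"sort"
)

// Resource is a hypothetical inventory record: an identifier plus its tags.
type Resource struct {
	ID   string
	Tags map[string]string
}

// RequiredTags is an assumed tagging policy, chosen only for illustration.
var RequiredTags = []string{"owner", "product", "environment", "cost-center"}

// MissingTags returns, in sorted order, the required keys that are absent
// or empty on the resource.
func MissingTags(r Resource, required []string) []string {
	var missing []string
	for _, key := range required {
		if r.Tags[key] == "" {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing
}

// Audit maps each non-compliant resource ID to its missing tag keys.
// Fully tagged resources do not appear in the report.
func Audit(resources []Resource, required []string) map[string][]string {
	report := make(map[string][]string)
	for _, r := range resources {
		if missing := MissingTags(r, required); len(missing) > 0 {
			report[r.ID] = missing
		}
	}
	return report
}
```

Run on a schedule against whatever inventory source the organization already has, a report like this gives the platform team a measurable starting point for visibility without changing how development teams work.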

There comes a time when the common concerns of a maturing engineering organization hit a tipping point. Deliberate focus on these areas from a team with a separate backlog is necessary. Idealistic implementations of Agile or naive product management that treat embedding operations engineers on functional teams as the only valid team structure end up harming the organization’s future. These mindsets are aiding and abetting a Feature Factory software development shop where the realities of growing platform concerns are ignored and will eventually paralyze future business value delivery.