A few days ago, Mark Schwartz (of AWS) published an article about the critical missing piece of DevOps, discussing the scale of work in and by IT Operations, and how the way organizations approach DevOps often includes some, but not all, of those responsibilities. With that, I could not agree more.
I believe Mark touches on an important concept in his article: efficiency. There are many responsibilities that fall on the Ops side of the IT organization, and a significant part of that work is not product- or feature-specific. I will cover some of the topics where I believe we are aligned with Mark and some topics where I believe our experience differs.
There is Ops and then there is Ops
Software developers, as members of product or project teams, used to interact with Ops voluntarily, if not necessarily happily, mostly when they needed a new version of software deployed to production. Significantly less happy interactions took place when Dev were informed by Ops that their newest version had indeed failed in production, and that either it needed to be fixed immediately or a rollback would commence in one hour.
This limited exposure to what Ops really does (beyond scheduling strange release windows and forwarding tickets, always about issues) did, in many ways, lead to the situation where the DevOps movement was essentially inevitable. Both sides had perfected their art, but that had led them down different paths of optimization and improvement. Agile practices required Ops to become more agile too, to realize their potential value. Release and deployment work was the common touch-point here, highlighting many of the pains, and overshadowing a lot of other work done by Ops.
Without the environments Ops was maintaining, there would not have been anywhere to deploy the new versions. In building these environments, Ops was concerned with questions like ‘How to measure demand, understand the patterns, and plan for the required capacity in a cost-effective manner?’, ‘What level of interruption, likely involving some downtime, is acceptable for the customer?’, and ‘How to ensure that a change to this system here does not bring down that other system there?’. This short list is just a taster of a much larger set of concerns that all fall within Ops’ remit.
In addition to technical aspects of Ops’ role, there’s also the customer support function (usually called Helpdesk or Service desk) that has the skills, knowledge, processes and tools to deal with incidents and customer requests. In some organizations, this is bundled together with the NOC, and in some others, it is a separate function. While it is important for developers to have close contact with the users of their software, running a decentralized support model with multiple contact points for customers is unfeasible. How would the customer know which one feature team out of the one hundred and seven to contact when there’s a problem they cannot solve by reading the FAQ?
Ops have their hands full
Roughly speaking, we can split the functions in Ops into three categories: planning, maintenance, and support. This is a rather crude, simplistic, and hugely generalizing division, but as I don’t have the luxury of referring to the previous four chapters of my not-yet-written book on this matter, generalization is what we get.
In planning, we are looking at how to better manage the vendors (which then helps to better utilize the capabilities you’re paying for and/or save costs), how to translate fluctuating demand into capacity requirements, and how to include various regulatory requirements in the design of production environments.
In maintenance, we are monitoring how systems are performing and are introducing changes into these systems — hopefully fully tested, hopefully mostly automated. This is the ‘situation normal’ context, where what can be done is constrained by how the systems were designed. Just to be clear — this is not the second step of a waterfall model, but a group of activities where people performing the tasks have to live with decisions that were made earlier. A continual feedback loop is crucial here to improve systems’ resilience and capabilities, with improvements taking anywhere from a few seconds to several months, based on the nature of the change.
In support, we are dealing with system failures that don’t just go away when an automated response is triggered. This is also the group of activities that is at least as close to the customer as product strategy and design are — an outage usually has an impact on the customer, and good communication is crucial here. Again, this is not a third step in a waterfall model — the tasks in all three categories are intertwined, co-dependent, and require proper feedback loops to be in place for the service not to degrade.
Mark notes that there is a significant amount of Ops work that sits outside of DevOps teams. While I agree with him in principle, I would call for caution when using phrases like ‘DevOps teams’. What I believe Mark is referring to is product teams, or perhaps feature teams. A DevOps team, unless it includes all roles from the organization involved in the value chain — which is rather unrealistic — is at a high risk of becoming yet another silo, which I think is confirmed by the story Mark mentions in the article.
I consider DevOps to be about collaboration between all teams involved in supporting customer outcomes, and not about newly created (agile) teams trying to do other teams’ work. Even in the limited and literal interpretation of DevOps, the attempts at creating DevOps teams are almost always Dev-led, covering everything about Dev work but ignoring a large part of Ops work (due to limited exposure to it), not to mention QA or Security.
While we have recently seen many — way too many! — discussions about things like DevSecOps and DevQAOps and BizDevOps, they all miss the point. I will come back to BizDevOps in one of my next articles, but for now, let’s just say that repeating the same mistakes in more places does not usually deliver fantastic results for the organization or their customers.
I have seen teams struggle with positioning DevOps in the full picture of all the work done by all the (technical) teams in the organization. And in this context, I would say that if you feel you have Ops outside of what you consider DevOps, you do not, in fact, have DevOps.
No-snowflake product teams
Efficiency. There are indeed common solutions, processes, and procedures each individual product team does not have to design or maintain themselves. It would make sense for the organization to try to centralize at least some of these activities to avoid (further) duplication of effort and unhelpful waste.
Operations-as-a-Platform is an approach where Ops build and support automated solutions for other internal teams. The model is not new, but its scope has changed. In the past, Dev (that is, product or feature teams) were usually not seen as customers for Ops’ services; only non-IT teams had that luxury.
The scope of services that can be provided by Ops is not limited to e.g. the automated deployment pipeline, although this might be one of the highest priorities in the context of DevOps practices. Developers are interested in better understanding how their code performs in real life, and while some insight can be achieved through instrumentation, it does not provide a full picture. The role of Ops here is not to create (yet another) ticket queue for specific Dev requests, but to deploy or build a tool allowing developers to build their own reports and dashboards. This is significantly easier to achieve if all product teams are happy to use the same centrally supported tools.
Mark mentions ‘business results of the code when it is running in production’. I personally do not believe something like this exists. The code running in production is only one, and potentially a very small, part of what is delivered to the customer, packaged into a holistic offering that helps customers achieve their desired outcomes.
Of course, the more feature-focused the organization and their customers are, the bigger the role new functionality plays in customer satisfaction, but all teams involved with the customer in one way or another contribute to those business results. Positioning code as what delivers business results risks alienating back-of-the-house teams, as has indeed happened in many organizations.
Understanding which components contribute to the value the customer receives, and exactly how they do so, is crucial for any organization that has set out on a digital transformation journey. Without the big-picture view, we are very likely to fall into the trap of local optimizations, which are often coupled with meaningless metrics that justify (alleged) improvement activities rather than show their impact on the organization and its customers.
While mapping the components and clarifying the flow, it is also important to not go overboard and forget what the customer cares about. The mapping is for the provider to better understand how they work, and not for printing out on glossy paper, telling customers how transparent and/or lean the provider now is. Most customers value actions and results over words in brochures.