Two weeks ago I attended an all-day session with a group of Chief Technical Officers at the Collision Conference in New Orleans. The CTOs came from a variety of companies: startups, rapidly growing businesses, and older, established corporations. The companies had small and large audiences and users; some were B2B, some consumer; some government vendors; some mass, some niche. They had a variety of business challenges, but cutting across all their concerns was the impact of DevOps and the Cloud on their IT departments and development processes. Here are some thoughts on those issues coming out of that conference.
1. DevOps and the Cloud are the new Agile. We didn’t directly talk about DevOps and the Cloud. We talked about innovation, release velocity, scaling, costs, effective organization, data analytics and mobile. But the cases and examples that everyone used in these discussions made it pretty clear that DevOps and the Cloud are blowing huge holes into patterns, workflows, and organizations. DevOps is kind of where Agile was ten years ago: a definite phenomenon, everyone is contending with it, and no one really knows what it means. For startups, it’s a nonissue. They are already doing it. The Developer is the test lead, the operations team, the product owner, and the business manager. The Cloud and managed services are a Godsend for a startup. They drive the time from idea to reality to zero. They free the developer from getting wrapped around the axle of system maintenance and allow them to concentrate on the business opportunity. As I walked around the floor of Collision and saw literally hundreds of startups pitching great, crazy ideas, I kept thinking that the conference itself was impossible without the Cloud.
But for everyone else, from the company trying to scale to large organizations, DevOps and the Cloud are one more problem to contend with, and the larger (and older) the company, the more difficult the problem. Which kind of makes sense if you think of DevOps as a movement to empower the individual developer. Agile had the advantage of starting with teams. DevOps doesn’t, and so the more developers you have, the more important it becomes to set up your services and workflows so they provide fast, reliable releases. And when you have a seemingly infinite assortment of systems and configurations to choose from, the problem of that setup only gets deeper. The companies that are graduating from startups to viable businesses feel this immediately, because scaling an organization is a different problem than starting one. At a certain point the interdependencies between different components and services in a system become more important than the individual components themselves, and teams get overwhelmed by trying to make different services and components work together. Several of the CTOs from these types of businesses said the same thing: I used to get more done with fewer people.
The easy answer here is to go back to fewer people, but that, of course, doesn’t scale. So the real issue is that, even as you release new product, you need to find engineers willing to continuously refactor your processes and systems in ways analogous to devs refactoring code. This is easier than it sounds. Even in small companies there is always some dev more interested in running your Chef knives (or Docker orchestrations) than in writing a new data form widget. Chances are, though, they’re going to be coming to Chef cold and you’ll need to give them ramp-up time and patience as they work it out. DevOps and the Cloud usually utilize a lot of new, undefined technologies, and it is rare that you will find developers in your organization with experience across so much new tech. The pace of change in the Cloud is truly astonishing. So you have to give your devs time to learn that tech. Which doesn’t help if you’ve overcommitted a series of features to your users (and investors) and you need a release yesterday. At that point process reengineering takes a back seat to getting the release out, and you’re back to that classic situation where you hack your way through legacy systems and code in order to meet the needs of the business. And then that legacy code becomes a millstone for your business moving forward.
Established organizations have a similar, but slightly different problem. They usually have layer upon layer of legacy code that inhibits their releases. Almost all of their time is spent maintaining that code, and working through the incompatibilities built across systems that weren’t designed together. There’s little to no DevOps, and so DevOps at these companies is more of an organizational and cultural challenge than a problem of architecture and design. Their developers are opting out of their IT systems out of frustration with the overhead involved in doing simple things. Why make a long, bureaucratic request to spin up a dev environment with IT when you can do it in AWS instantly? “Shadow IT” is a problem everyone in the industry is discussing. But they’re talking about it because the Cloud and do-it-yourself development are the new model for software releases, just like Agile was ten years ago, and whether the CTO and other C-level execs at large corporations want it or not, their devs and product managers do, and the shift is inexorable.
The real problem for any large, established company is that managing this shift properly takes a fairly large upfront investment, one lacking a direct return. If you’re a company like this, your established IT-centric infrastructure probably underlies the delivery of your current revenue. To replace it and deliver the same margins to the business makes no sense, because you’d be bearing the expense of the cloud migration, and it can be high. So even if you’re aware that your systems are antiquated and will inevitably be replaced, it won’t make sense to do it until the business is materially jeopardized and you have to do it. Or you undertake that investment because you believe it will expand the market or grow the business. But often that belief is countered by the fact that in your established lines of business, your old, analog margins will often exceed your new digital ones, and no one wants to upgrade their IT systems and software products for less revenue. Most of the large retailers in the US, such as Nordstrom, Sears, and Best Buy, have faced something like this in the last five years. And the mixed success they have had with their Innovation Labs and Tech Development Centers is testimony to just how difficult this problem can be.
2. The Cloud is Not Turnkey. One reason companies like Puppet spend so much time quantifying the benefits of DevOps is that they are very aware of the economic disadvantages DevOps and Cloud can pose. Claims that “High-performing IT organizations deploy 30x more frequently with 200x shorter lead time and 60x fewer failures” are impressive until you consider what goes into creating a “high-performing” organization. The fact of the matter is that Cloud and DevOps are not turnkey solutions, even if many of the managed services themselves are black box and autonomous. How you put these services together is key. It’s extremely challenging to optimize the performance of your web mobile app if the various cloud service calls you make introduce latency between the different services. And the only way you will minimize these challenges is to test and monitor all key components of your production system, and to architect that system so that you can deploy products and components separately from the systems they rely on.
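As a rough illustration of the kind of per-service monitoring described above, here is a minimal Python sketch that times each call in a request path and flags the ones that blow a latency budget. Everything specific in it is an assumption made for the example: the service names, the `fake_call` stand-in, and the 50 ms budget are hypothetical, and a real system would wrap its actual client calls and send the timings to whatever monitoring service it already uses.

```python
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock duration of fn() in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def measure_services(calls: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Time every named service call once and return latency per service (ms)."""
    return {name: time_call(fn) for name, fn in calls.items()}


def over_budget(latencies_ms: Dict[str, float], budget_ms: float) -> List[str]:
    """List the services whose latency exceeds the budget, slowest first."""
    slow = [name for name, ms in latencies_ms.items() if ms > budget_ms]
    return sorted(slow, key=lambda name: latencies_ms[name], reverse=True)


def fake_call(delay_s: float) -> Callable[[], None]:
    """Hypothetical stand-in for a cloud service call lasting delay_s seconds."""
    return lambda: time.sleep(delay_s)


if __name__ == "__main__":
    # Hypothetical request path: names and delays are invented for illustration.
    calls = {
        "auth": fake_call(0.005),
        "catalog": fake_call(0.120),
        "recommendations": fake_call(0.080),
    }
    latencies = measure_services(calls)
    for name in over_budget(latencies, budget_ms=50.0):
        print(f"{name}: {latencies[name]:.1f} ms exceeds the 50 ms budget")
```

A single timing per call is noisy, so in practice you would collect many samples and look at percentiles, but even this crude version makes the point: the slow spots tend to sit between services rather than inside any one of them.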
In startups, this problem is solved by self-forming teams who select services together and deploy them end-to-end; they manage and resolve the risks of deployment communally. But the larger the organization, the more difficult it becomes to drive consensus across the various services and technologies available to any team, and at some point the delta between services like Puppet and Chef is more a matter of preference than definitive outcome. At which point you either mandate a development process for an organization, or you allow the deployment of software across a multitude of different systems, which isn’t inherently harder or less effective than prescribing an approach, but does present maintenance and support challenges.
3. Key Positions in DevOps. My own experience is that DevOps works best when you get the business owners to agree to an investment in continuous refactoring of systems, and to identify and empower a few key architects who are chartered to resolve interoperability issues and push systems towards more modern implementations. They can’t do this on a whim, or in response to the latest and most touted technology, especially in the Cloud. Good developers have an inherent bias towards adopting new tech, but it isn’t always what the business needs. Instead, product and systems upgrades have to be built in response to data, to performance monitoring, and to user feedback. These identify gaps and shortcomings in current products and systems, and become the guide for updating them. This is, essentially, how DevOps has developed; it is really a data-driven method for continually addressing the shortcomings of the software release cycle.
At both scaling companies and established companies, you’ll usually be setting up these new systems independently of, or on separate branches from, the current system. If you don’t, you’ll risk release failures in your current systems or products. Integrating the new system into the old then becomes crucial. The architect role in the organization should be akin to that of a release manager, and the spirit of DevOps should be paramount: the architect should be close enough to the release to be responsible for its success or failure, or, more likely, for the tweaks that follow. Once the new software or product is deployed, the DevOps architect monitors the deployment and identifies areas for improvement. The role is more hands-on than that of a traditional architect, and more concerned with follow-through and maintenance than that of a traditional release manager.
The DevOps role is also similar to that of Product Management, in that it is focused on the evolution of the software or product on an ongoing basis, and on ensuring that the measures of success in the data are clearly defined and weighed appropriately. Every dev team has had the experience of receiving ambiguous and conflicting data from its user base and its monitoring systems. More than anything else, product management in DevOps is interpreting that data and translating it into requirements. That, and ensuring that deliverables are built around the next three releases, not just the immediate one.
Given its similarity to traditional IT, DevOps should be less challenging than it is to traditional IT organizations. And yet, from what I heard from the CTOs in New Orleans, it is presenting enormous challenges. Some of the challenges are technical. The array of new tech coming into the cloud is astonishing, and presents such a wealth of opportunity for revising and upgrading systems that it is difficult to know where to start (with metrics), and very difficult to implement. The practical reality is that if you are moving out of your DC and into the Cloud it will take time and money, and if your organization has neither, then making that move will be painful. And CTOs at large companies have to contend with the marketing pressure from Cloud companies. They are, frankly, overselling the Cloud, which is to be expected. Business owners are embracing the revolutionary potential of the Cloud and DevOps and are afraid they will be outcompeted if it isn’t implemented yesterday. One of the best ways to respond to that pressure is to implement the Cloud and DevOps on discrete projects that demonstrate its potential and minimize its risk (a consumer-facing web mobile or native app is a great place to start, but that’s another article).
Some articles on the web and some of the comments from the CTOs in New Orleans emphasize the cultural challenge that DevOps presents to organizations. The cultural challenge is usually framed as one where sclerotic IT personnel resist the coming wave of DevOps efficiency, just as antediluvian Waterfall managers fought the promise of Agile. I haven’t seen a lot of this. The key cultural challenge — and it is real — is getting devs to use and understand metrics in an objective fashion, as opposed to a way that validates their biases. But it isn’t one where IT personnel refuse to take responsibility for their releases. That’s more a matter of structuring the organization such that DevOps or IT can take responsibility for a release. The real promise of DevOps is precisely in such empowerment, and if you give it to your developers or your IT personnel, they will take it: You own the code. You own the product. Now go fix it.