DevOps explained using ToC Logical Thinking Process

The “Goal Tree” for DevOps

In an earlier post on DevOps Transformation using Theory of Constraints, I explained how adopting “DevOps” as a buzzword is likely to improve very little in an organization. What is crucial for great results is adopting and understanding the required new policies while forcibly moving away from previous ways of doing things.

What exactly are the new policies and practices that need to be adopted? This post attempts to cover some of the more essential items.

The graphical representation above shows the Critical Success Factors and their Necessary conditions to make Excellent IT happen — which we assume leads to excellent competitiveness in the marketplace. The image in the header is a “Goal Tree”, an example of the Logical Thinking Processes, which are also part of the Theory of Constraints body of knowledge.

To learn more about the Theory of Constraints Logical Thinking Process, I recommend reading books by Goldratt, Dettmer, and Scheinkopf that show how this is a powerful tool to uncover the logical connections behind our reality. Those of you who read The Goal can recognize the main character in It’s Not Luck as he continues to climb the ranks of company management.

First, I’ll add yet another attempt to explain where DevOps originated. The market today expects software companies to release ever more frequently and to cover more features and capabilities. A company that cannot deliver software fast enough is bound to be outpaced by competitors who can, and who will take its customers with them. Also, as you can guess, these days all companies are IT companies: they rely more and more on software, which makes getting better at IT a goal for every company.

Once the software is developed and delivered, it needs to operate so that customers can use it reliably. Here comes the conflict: on the one hand, it is necessary to make frequent changes and improvements; on the other hand, stability, reliability and security are also requirements. Traditionally, organizations solved this by assigning two (or more) different managers to deliver on these promises: the IT VP of R&D, who manages developers who frequently need changes, and the IT VP of Operations, who manages system administrators who make sure no changes happen that could destabilize the systems.

Two different empowered people, each with a team to manage, have been given conflicting goals. No wonder that in many cases software organizations are very busy all the time but can show very little progress for it.

Adopting “DevOps” by hiring a third “DevOps Manager” can often make the problem much worse and bring the company no closer to the goal of better and faster IT delivery. Now the company has three managers in conflict with one another.

The organization must achieve both goals simultaneously: fast delivery and reliable, secure systems are both important. Moving slowly means missing the market. Unreliable systems infuriate customers, who expect systems that work, and there are plenty of examples of what can happen when security is lacking.

Fast Delivery

How can a company achieve fast delivery? Several factors commonly cause delays when delivering software, and unfortunately, the people doing the work rarely recognize them.

First, the delivery process itself — starting from ideas that become developer tasks and ending in that idea turned into a working feature in the software that is used by a customer.

The “delivery process” often includes much waste: manual work; delays caused by handoffs and by batching of work, which keep completed work from flowing toward the end of the process; rework of subpar work; and more. The book Lean Enterprise by Jez Humble, Joanne Molesky, and Barry O’Reilly goes into detail on all the Muda (無駄 waste) commonly found in a software delivery process. Fixing the delivery process means actively investing in eliminating waste and streamlining this process every day. Architecting this process correctly from the start matters a great deal.

Next on the list is the frequency of delivery. Many software companies have adopted Scrum and Agile in various forms, and have moved into bi-weekly delivery cycles. There is a problem with this as well. The bi-weekly cycles were meant to prevent over-planning and too frequent changes of requirements. When the delivery process is manual and requires people to spend hands-on time processing tasks, then it is natural to make it as infrequent as possible as well. However, once the first stage, streamlining the flow of the delivery process, is achieved — then doing bi-weekly releases of work makes no sense. It is far better to release more frequently and get much faster feedback from quality assurance and customers regarding the work.

The cost of late feedback grows quickly. Any improvement that speeds up feedback to developers about their code, or to product managers about their ideas, is a good improvement.

From http://www.agilemodeling.com/essays/costOfChange.htm

Last but not least, having an automated delivery and release process is not enough. It also needs to be fast! A feature finished by a developer has to be tested and validated, and then potentially released to customers so they can have a look at it and provide feedback. When releasing in bi-weekly cycles, objections and improvements regarding such a feature can come, at best, every two weeks. Thus, getting a feature to “done” can be an ordeal of more than a month. Just changing this policy to daily deliveries can have the same feature “done” in less than two weeks. That is what I mean when I say “fast!”. Stop delivering once every two weeks, start delivering every day, and that alone can make you roughly three times faster. According to the 2017 State of DevOps Report, high-performing organizations have lead times from commit to deploy that are 440 times faster than those of their low-performing peers.
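To make the arithmetic concrete, here is a minimal sketch of how release cadence stretches the time to “done”. The numbers are assumptions chosen for illustration (three feedback rounds, four working days per round), and the function name is hypothetical; the only rule modelled is that finished work waits for the next release slot before anyone can give feedback on it.

```python
import math

def days_until_done(feedback_rounds, work_days_per_round, release_interval_days):
    """Days until a feature is "done" when work can only ship on release days.

    Illustrative model only: each round is some work followed by a wait
    for the next scheduled release, where feedback becomes possible.
    """
    elapsed = 0
    for _ in range(feedback_rounds):
        elapsed += work_days_per_round
        # Round up to the next release slot on the calendar.
        elapsed = math.ceil(elapsed / release_interval_days) * release_interval_days
    return elapsed

# Assumed values, not measurements.
biweekly = days_until_done(feedback_rounds=3, work_days_per_round=4, release_interval_days=14)
daily = days_until_done(feedback_rounds=3, work_days_per_round=4, release_interval_days=1)
print(biweekly, daily)  # 42 12
```

With these assumed inputs the same feature takes 42 days on a bi-weekly cadence and 12 days with daily releases, which matches the “more than a month” versus “less than two weeks” comparison. The work itself did not change, only the waiting.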

An automated and fast pipeline coupled with frequent deliveries is a goal to achieve. Whatever you have today can most probably be improved. You only need to put your mind to it and work out what the common causes of delay are.
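One rough way to start looking for those causes is to write down, per stage, how long work is actively handled and how long it sits waiting. The sketch below uses made-up stage names and hours (all of them assumptions, not data from any real pipeline) to find the stage with the longest wait and the share of total time spent on hands-on work.

```python
# Hypothetical hours for one feature: stage -> (hands_on_hours, waiting_hours).
stages = {
    "development": (16, 4),
    "code review": (1, 20),
    "manual testing": (6, 40),
    "release": (2, 72),
}

def biggest_delay(stages):
    """Return the stage name where work waits the longest."""
    return max(stages, key=lambda name: stages[name][1])

def flow_efficiency(stages):
    """Fraction of total elapsed time that is hands-on work."""
    hands_on = sum(h for h, _ in stages.values())
    total = sum(h + w for h, w in stages.values())
    return hands_on / total

print(biggest_delay(stages))
print(round(flow_efficiency(stages), 2))
```

In this invented example most of the elapsed time is waiting, and the longest wait is for the release itself, which is the kind of delay that more frequent, automated deliveries are meant to remove.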

Reliable and Secure systems

“Move fast and break things” is no longer a good option. Even Facebook has moved away from its original mantra and adopted a very important new suffix: “Move fast with stable infra.”

“What we realized over time is that it wasn’t helping us to move faster because we had to slow down to fix these bugs and it wasn’t improving our speed.” (Mark Zuckerberg at the 2014 F8 conference)

Achieving reliability of IT systems has some necessary conditions to follow:

  • Fast(er) response to failures that occur in your systems
  • Enable systems to scale automatically to fit the required load, and to scale down to save cost when there is less load on the system.
  • Avoid failures in the first place, especially when deploying new changes
  • Continuously monitor and see what can develop into a problem; or what is already a problem of stability, cost or some other factor — see it in the data of a live system and act accordingly.
  • Have just enough access to systems to investigate, and potentially fix problems — without increasing risk of exposing these systems to security threats.
  • Continuously check that the security measures in place are sufficient and that no low-hanging fruit is left that could improve security, for example adding multi-factor authentication to your login processes, or closing public network ports that should not have been open in the first place.
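As an illustration of the automatic scaling condition in the list above, here is a minimal sketch of a proportional scaling rule. It is loosely modelled on the formula documented for the Kubernetes Horizontal Pod Autoscaler, but the function name, parameters, and limits are all assumptions made for this example, not the API of any real autoscaler.

```python
import math

def desired_replicas(current_replicas, current_load, target_load,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count in proportion to load per replica.

    current_load and target_load are per-replica figures in the same
    unit (for example, percent CPU). Bounds are assumed defaults.
    """
    raw = math.ceil(current_replicas * current_load / target_load)
    # Clamp so that a spike or a quiet period cannot exceed set limits.
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(4, current_load=90, target_load=60))   # 6: scale up
print(desired_replicas(4, current_load=15, target_load=60))   # 1: scale down, save cost
print(desired_replicas(4, current_load=300, target_load=60))  # 10: capped at the maximum
```

The same rule covers both directions named in the list: it adds capacity when load per replica exceeds the target, and removes it when load drops, within limits that someone has to choose deliberately.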

The responsibility for reliable systems cannot fall on a segregated team within the organization. A customer doesn’t care that in your organization, you as a developer cannot fix a production problem. Any problem with software used by customers is a problem of the whole company, not of a single individual or team.

Conclusion

A software company can claim that it has a “DevOps Culture” if and only if that organization continuously strives to move towards the direction explained above. There are probably many additional items that I missed, but these cover the core of what it means to be more DevOps or less DevOps.

I created a hypothetical example of this on the following scale —

The scale of DevOps

Your software company might be far on the left, but likely it is somewhere in the middle. The trick is that you can always improve and move more and more to the right on this scale.

Is your organization already on the very right side of the scale?
Would you care to share what additional practical steps you have in your day-to-day culture and practice? Leave your ideas in the comments!

Good luck with your DevOps journey!


Originally published at www.prodops.io.