Thanks, Mathias! Glad to know that you found the thoughts provocative.
I’ve tried to make some of these points in my Velocity talks:
I’m now trying to understand how you and others make web systems, get them working, and — most of all — keep them working despite the complex failures that continue to dog everyone from AWS to Southwest Airlines to Facebook.
While the work is at an early stage, one thing is clear: the price of continuous deployment is continuous vigilance. Every group is now making huge investments in building human and machine infrastructure to keep track of, and intervene in, these big, unwieldy systems. No one imagines that this stuff is going to work on its own, and everyone knows that failure is just around the corner. Compared to the automated toolchain, this area is in its infancy and is changing rapidly. It's obvious in Nagios and the like, but the whole devops world is getting attention as groups seek to aid collaboration at the ops end of devops: Slack, Cog, PagerDuty and others are, at least partly, acknowledgements of the reality of How Complex Systems Fail and of the need to anticipate and head off the cascade of complex systems failure.
I look forward to reading your next installment! Thank you very much for making the effort to add to the understanding of how complex systems fail!