Glasnostic … or How to Operate a Service Landscape
My, how the world has changed. As a former “PaaS guy” (I was the technical founder of a company that became Red Hat OpenShift), I watched applications evolve from self-contained, simple architectures you could code up along a single thread, throw on a PaaS, flick the New Relic switch and be done with, into today’s mesh of services that is continually changing and relentlessly growing. Today’s applications consist of lean, highly connected microservices with vastly differing life cycles and diverse maturities, running in cloudy environments that range from VMs to containers to serverless. And because businesses have an insatiable need for new digital capabilities, because coding services has become so easy thanks to their single-minded focus, and because running them requires little more than a credit card and some orchestration, more services are added to the landscape every day.
But how do you operate such a thing? How do you ensure Quality-of-Service if your architecture changes by the hour? How can you create bulkheads to protect critical parts of the topology? Where should you insert circuit breakers? And most importantly: if something goes wrong, how do you even know where to look?
As always in operations, we’d be flying blind without monitoring. But the visibility it provides must also be actionable, or we’ll be left wondering how to interpret the data. And visibility is of little value unless we can also turn insights into actions, and unless those actions have predictable outcomes. In other words, operating a service landscape requires both actionable visibility and predictable control.
This is where approaching service landscape operations with traditional notions of monitoring and remediation falls short. If you are facing the risk of cascading failure and other systemic issues, traditional, node-centric monitoring is nothing but a dangerous, self-defeating distraction. Don’t drown in irrelevant, overly specific data that exist only because they are easy to collect. If you catch yourself worrying about how many sockets on a particular machine are in a particular state, you are fighting the wrong battle. You need to focus ruthlessly on the big picture, which you’ll see only by looking at the communication between nodes. If a node misbehaves, it gets replaced and the dev team notified. It is not your job to care about why.
Likewise, with failure a constant in such systems, you have to let go of the fetish of total control. Service landscapes are not deterministic, well-defined, fully understood and predictable machines. They exhibit failure as an intrinsic quality that comes in many shades. Just as individual services must accommodate compensating strategies, you have to be able to compensate at the systemic level: in real time, and with the confidence that comes from applying tried and tested, predictable operational patterns. If you catch yourself thinking everything would be fine “if our engineers would just code a little better”, you are operating a number of services, not a service landscape.
These are the convictions that led us to create Glasnostic. As every business is becoming a software business, every software portfolio is becoming a service landscape. And these continually growing and changing service landscapes require a fundamentally different approach to operations — an approach that provides actionable visibility and predictable control.
Please join me in welcoming Glasnostic to the world.
Sign up for early access or play with the demo at https://glasnostic.com. And be sure to let us know what you think!