To Boldly Go: Mission Control For Microservices
When I co-founded a PaaS company many years ago, we focused single-mindedly on how best to support the building of applications. When that company became Red Hat OpenShift and subsequently gained popularity, I learned three interesting things. First, that important, meaningful applications are never just applications: they are always systems of applications. Second, that the scale and flexibility we increasingly expect of such systems make building them prohibitively expensive. And third, that successful systems don’t become successful because they are well engineered. They become successful when they are operated well. In other words, what I learned was that successful missions are, above all, run.
Successful missions are run, not built.
Innovation Dies in Ops
This has never been truer or more important than today, when enterprises all over the world are pushing towards a new, fully agile operating model: one where we work in small, self-managing teams with rapid decision and learning cycles, and where we execute on microservice strategies, with cloud-native productivity and development teams all deploying in parallel.
The result of this push towards agility, on both the business and the technology side, is a continually evolving landscape of sprawling, increasingly hyperconnected services. Unfortunately, such service landscapes have two fundamentally troubling characteristics. First, they tend to contain ephemeral actors, that is, actors that move around, which makes them difficult to secure. Second, they exhibit complex and disruptive interaction behaviors that make them inherently unstable.
Worst of all, these characteristics can’t be engineered away. In a fast-moving service landscape, we can’t simply define operational structure and create policies, then stash them in configuration and forget about them. Every threshold and every configuration value is bound to create a new ripple effect as the landscape continues to change and thus becomes the source of additional instability or vulnerability.
In other words, the agility that the enterprise craves leaves us with an architecture we can’t control. And of course, if we can’t control, we can’t operate, and if we can’t operate, innovation dies.
This operational crisis is the defining problem in software architecture today.
It’s the Environment, Stupid!
Because the business wants a service landscape, yet we fundamentally cannot engineer its stability and security issues away, we need the ability to define structure and to govern in real time. We need to be able to detect and react so we can provide our service landscape with the systemic resilience that is essential to running it successfully. We need to be able to control disruptive behaviors, prevent cascading failures and avert security breaches.
In other words, like an air traffic controller, we need to operate with a mission control mindset. We need to care about the stability and security of the airspace, not individual ground operations. It is the global environment that matters, not the technical details of local processes. And because environmental behaviors are inherently unpredictable, this requires real-time control of the key metrics of all flights so we can structure and govern the airspace and detect and react to any issues that arise.
What matters is the ability to control the global environment in real time, not to trace individual processes.
This is what Glasnostic does.
A Control Plane for Operations, SRE and Security Teams
We are a control plane that helps operations, SRE and security teams to detect what happens at the systemic level, to structure interactions and to govern, in real time. In short, we let mission-control Ops and SecOps teams remediate today’s rapidly evolving architectures.
What sets us apart from service meshes and observability tools is that we are a touchless solution that works without agents or sidecars and is independent of any technology stack. We are about remediating landscapes of services, not monitoring individual nodes or threads of execution. We are about solving today’s critical operations problems, not scratching the newest developer itch.
The value Glasnostic provides is that we remove the operational roadblock that keeps organizations from deploying innovative services rapidly. We do this by enabling operators to repair disruptive behaviors in real time, while saving on operational tooling.
Benefits
Our users are the mission-control operators of cloud architectures across enterprises, telcos and governments.
For instance, the reduction in deployment risk that operational patterns such as bulkheads or quarantines provide has enabled an e-commerce company to eliminate staging and deploy directly to production instead. The real-time visibility and control based on “golden signals” that our console provides have allowed a connected car company to avoid expensive up-front design and expose their infrastructure through a rapidly evolving service layer instead. Finally, our ability to define structure via channels and to govern in real time at the application layer via policies has enabled a cloud provider to abandon a long-running attempt to shoehorn that level of control into their SDN layer.
Originally published at https://glasnostic.com on September 24, 2019.