Apollo 13 mission control (source: spacecenter.org)

To Boldly Go: Mission Control For Microservices

Tobias Kunze
Glasnostic
Sep 24, 2019

When I co-founded a PaaS company many years ago, we single-mindedly focused on how to optimally support the building of applications. When that company then became Red Hat OpenShift and subsequently gained in popularity, I learned three interesting things: first, that important, meaningful applications are never just applications; they are always systems of applications. Second, that the scale and flexibility that we increasingly expect of such systems make building them prohibitively expensive. And third, that successful systems don’t become successful because they are well engineered. They become successful when they are operated well. In other words, what I learned was that successful missions are, above all, run.

Successful missions are run, not built.

Innovation Dies in Ops

This has never been truer or more important than today, when enterprises all over the world are pushing towards a new operating model that is fully agile: one where we work in small, self-managing teams with rapid decision and learning cycles, and where we execute on microservice strategies with cloud-native productivity and development teams all deploying in parallel.

Enterprise agility creates a continually evolving landscape of sprawling, increasingly hyperconnected services that are difficult to control.

The result of this push towards agility, both on the business and the technology side, is a continually evolving landscape of sprawling, increasingly hyperconnected services. Unfortunately, such service landscapes have two fundamentally troubling characteristics: first, they tend to contain ephemeral actors, that is, actors that are moving around, which makes them difficult to secure. And second, they exhibit complex and disruptive interaction behaviors that make them inherently unstable.

Worst of all, these characteristics can’t be engineered away. In a fast-moving service landscape, we can’t simply define operational structure and create policies, then stash them in configuration and forget about them. Every threshold and every configuration value is bound to create a new ripple effect as the landscape continues to change and thus becomes the source of additional instability or vulnerability.

In other words, the agility that the enterprise craves leaves us with an architecture we can’t control. And of course, if we can’t control, we can’t operate, and if we can’t operate, innovation dies.

This operational crisis is the defining problem in software architecture today.

It’s the Environment, Stupid!

Because the business wants a service landscape, yet we fundamentally cannot engineer its stability and security issues away, we need the ability to define structure and to govern in real time. We need to be able to detect and react so we can provide our service landscape with the systemic resilience that is essential to running it successfully. We need to be able to control disruptive behaviors, prevent cascading failures and avert security breaches.

Service landscapes need to be run with a mission control mindset. Like air traffic controllers, operations, SRE and security teams need to manage the stability and security of the entire architecture, not just individual applications and processes. And because complex, emergent behaviors are unpredictable, they need to do this in real time.

In other words, like an air traffic controller, we need to operate with a mission control mindset. We need to care about the stability and security of the airspace, not individual ground operations. It is the global environment that matters, not the technical details of local processes. And because environmental behaviors are inherently unpredictable, this requires real-time control of the key metrics of all flights so we can structure and govern the airspace and detect and react to any issues that arise.

What matters is the ability to control the global environment in real time, not to trace individual processes.

This is what Glasnostic does.

Glasnostic’s console lets mission-control operations, SRE and security teams detect what happens at the systemic level, structure interactions into channels and govern with policies, in real time.

A Control Plane for Operations, SRE and Security Teams

We are a control plane that helps operations, SRE and security teams to detect what happens at the systemic level, to structure interactions and to govern, in real time. In short, we let mission-control Ops and SecOps teams remediate today’s rapidly evolving architectures.

What sets us apart from service meshes and observability tools is that we are a touchless solution that works without agents or sidecars and is independent of any technology stack. We are about remediating landscapes of services, not monitoring individual nodes or threads of execution. We are about solving today’s critical operations problems, not scratching the newest developer itch.

The value Glasnostic provides is that we remove the operational roadblock that keeps organizations from deploying innovative services rapidly. We do so by enabling operators to repair disruptive behaviors in real time, while saving on operational tooling.

Benefits

Our users are the mission-control operators of cloud architectures across enterprises, telcos and governments.

Mission-control operations enable companies to deploy to production, architect in real time and define fine-grained cloud structure, among other benefits.

For instance, the reduction of deployment risk that operational patterns such as bulkheads or quarantines provide has enabled an e-commerce company to eliminate staging and deploy to production instead. The real-time visibility and control based on “golden signals” that our console provides have allowed a connected car company to avoid expensive up-front design and expose their infrastructure through a rapidly evolving service layer instead. Finally, our ability to define structure via channels and to govern in real time at the application layer via policies has enabled a cloud provider to abandon a long-running attempt to shoehorn their SDN layer into that level of control.
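For readers unfamiliar with the bulkhead pattern mentioned above, here is a minimal sketch of the general idea in Python: cap the number of concurrent calls to one dependency so that a slow or misbehaving service cannot consume all available capacity. This is not Glasnostic's API and says nothing about how Glasnostic implements bulkheads, which are applied between services in the network rather than in application code. The names Bulkhead, BulkheadFull and max_concurrent are hypothetical and exist only for this illustration.

```python
import threading


class BulkheadFull(Exception):
    """Raised when every slot of a bulkhead is already in use."""


class Bulkhead:
    """Illustrative in-process bulkhead (hypothetical, not a Glasnostic API).

    Caps the number of concurrent calls to one downstream dependency.
    """

    def __init__(self, name, max_concurrent):
        self.name = name
        self.rejected = 0
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: shed excess load immediately instead of
        # queueing it, so pressure does not build up behind a slow service.
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.rejected += 1
            raise BulkheadFull(f"bulkhead '{self.name}' is full")
        try:
            return fn(*args, **kwargs)
        finally:
            # Always free the slot, even if the wrapped call raised.
            self._slots.release()


if __name__ == "__main__":
    # Assumed example: allow at most two concurrent calls to one service.
    recommendations = Bulkhead("recommendations", max_concurrent=2)
    print(recommendations.call(lambda: "ok"))
```

The sketch rejects excess calls rather than queueing them. That is a deliberate choice for this illustration: failing fast keeps the disruption contained to the one dependency behind the bulkhead.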

Originally published at https://glasnostic.com on September 24, 2019.

Co-founder & CEO of Glasnostic, bringing traffic control to microservices. Co-founded Makara, the company that became Red Hat OpenShift.