Apollo 11 crew in quarantine after their successful return (Image: NASA)

Mitigating Deployment Risk in Microservice Architectures: The Quarantine Operational Pattern

Pei-Ming Wu
Published in Glasnostic
May 28, 2019


Enterprises are increasingly organizing themselves around self-managing teams that develop in parallel and embrace rapid decision-making and learning cycles. With parallel development come independent release cycles, which push software architecture away from individual applications and toward a style that thrives on the organic composition of individual services. Adopting this style of development and architecture results in an assembly of federated business capabilities.

While such architectures are great for business agility, they are also challenging to keep stable and secure because each independent deployment introduces changes with unknown consequences. Operations teams therefore need to be able to quarantine new deployments effectively and flexibly.

What is a Quarantine?

“Quarantine” is a medical term that refers to the practice of separating potentially contagious agents from the public at large until their infectiousness has been determined. As such, quarantining is mostly about protecting the surrounding environment from a potential threat.

In the days of the Plague, quarantining meant confining potential carriers for 40 days (quaranta giorni). In the context of microservice operations, it refers to ring-fencing new deployments for a variable but generally short amount of time, until their potential impact on the rest of the architecture is understood and the operations team is reasonably confident that the introduction won’t lead to issues in the service landscape at large. To that effect, deployments are first brought up on “air-gapped” infrastructure and then slowly allowed to interact with upstream services.

The medical origin of the concept of quarantining sometimes causes it to be confused with isolation or infection prevention. However, the operational pattern that implements isolation is segmentation, and the most common pattern for preventing failures is backpressure.

Why Quarantine Deployments?

In any software architecture, deployments are the most significant drivers of change. The risk associated with deployments is traditionally managed by testing and staging each new deployment before releasing it into production. This process is costly but works reasonably well as long as changes don’t happen too frequently, production environments can be staged and the architecture is static enough to make staging a deployment a meaningful exercise in the first place.

None of these conditions hold when dealing with service landscapes. Development teams releasing services continuously and independently of each other introduce changes at a fast rate, leading to organic, federated growth, which makes creating a realistic staging environment impossible and staging itself less meaningful.

As a result, teams operating a service landscape need to “stage in production.” This is where the power of quarantining comes in. Keeping new deployments away from the rest of the architecture at first and only allowing them to engage “at closer range” as they prove unproblematic is the single most valuable pattern that lets operations teams manage the risks associated with change.

The quarantine pattern is different from but related to canary deployments. While canaries primarily test the suitability of a new deployment by controlling the amount of ingress traffic it is allowed to receive, quarantines aim to control the potential impact a deployment may have on the rest of the landscape and thus focus on a deployment’s egress traffic. As we discussed in “How Canary Deployments Work, Part 2: Developer vs. Operator Concerns,” operations teams sometimes choose to combine aspects of both patterns for more complete coverage.

Example Scenario

Let’s assume a company launches an initiative that includes a new mobile app. The app connects to an existing API Gateway, which then routes mobile requests to a new mobile-backend service that acts as a facade to existing production services. Of course, cleanly extracting and provisioning a separate set of production services just for this new mobile-backend service is impractical. To protect the existing service landscape, requests from this service to its underlying production services are therefore rate-limited.

As a first approximation, rate limits are applied across the board. Later, as the team learns more about the mobile-backend service, limits are adjusted individually over time to accommodate its specific fan-out balance.

Quarantining in Kubernetes

Kubernetes doesn’t support rate limits directly, but we can use network policies to limit the egress from one service to specific upstream services. While setting network policies for ingress traffic has been stable since Kubernetes 1.7, the corresponding egress rules, which are needed to implement this limited quarantine pattern, were only added in Kubernetes 1.8. However, network policies don’t work out of the box in Kubernetes: they need to be supported by the network provider in use. (A list of providers who support them can be found here.)

To quarantine the mobile-backend service, the goal is to allow egress traffic only to the products, orders, and users services and to deny all other traffic. This can be achieved by applying the network policy shown in figure 1.

Figure 1: Kubernetes network policy limiting “mobile-backend” egress traffic to the “products,” “orders” and “users” services.
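
The original figure is an image and is not reproduced here. As a minimal sketch, assuming all four services run in the default namespace and are selected by an app label (namespace and label names are illustrative), such a policy could look like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine-mobile-backend
  namespace: default              # assumed namespace
spec:
  # Select the quarantined deployment by its (assumed) "app" label.
  podSelector:
    matchLabels:
      app: mobile-backend
  policyTypes:
  - Egress
  egress:
  # Allow egress only to the three upstream services.
  - to:
    - podSelector:
        matchLabels:
          app: products
    - podSelector:
        matchLabels:
          app: orders
    - podSelector:
        matchLabels:
          app: users
  # Allow DNS lookups so the upstream services can still be resolved.
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Once such a policy is applied, all other egress from mobile-backend pods is denied; note that without the final rule, DNS resolution from the quarantined pods would be blocked as well.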

As you can see, network policies are enforced at the connection level. In our example, this means the mobile-backend is permitted to initiate a connection to each of the upstream services, but not the other way around.

It should also be noted that, although we have allowed traffic coming from the new mobile-backend service, Kubernetes network policies do not support rate limiting. As a result, although we have locked down the directional flow of traffic, mobile-backend still has unlimited access to upstream services. These services are critical to the enterprise and are therefore exposed to potential degradation and possibly even failure caused by the new mobile-backend service. We will return to this need for rate limiting below when we discuss quarantining with Istio and Glasnostic.

(For an excellent discussion of how network policies work in Kubernetes, see Ahmet Alp Balkan’s “Securing Kubernetes Cluster Networking — The Unofficial Guide to Kubernetes Network Policies.”)

Quarantining with Linkerd

Linkerd is an open source service mesh created by Buoyant, who also coined the term “service mesh.” Initially written in Scala like Twitter’s Finagle library, on which it was based, it has since merged with the newer, lightweight Conduit project and was relaunched as Linkerd 2.0.

As we mentioned in “Preventing Systemic Failure: Bulkheads in Microservice Architectures,” both Linkerd v1 and Linkerd v2 are focused more on routing than on security or enforcing additional policies and, as such, do not support quarantining in any meaningful way. In fact, rate limiting is not supported in either version, although there is a plugin available for Linkerd v1 and a request to have this feature mainlined in a future Linkerd v2 release.

Quarantining with Istio

Istio has experienced a meteoric rise since it became generally available in 2018 and is now the most popular service mesh. It was created by Google, a platinum member of the Cloud Native Computing Foundation, with the help of IBM and Lyft. Unsurprisingly, given its popularity, developers often wonder how they might use it to implement operational patterns such as quarantining.

Istio supports the same network policies as Kubernetes, with the additional ability to specify rate limiting. Configuring Istio to provide rate limiting, however, is a multi-step process. First, policy enforcement needs to be enabled. This, in turn, requires Redis and an adapter so that quotas can be stored. (To merely test configurations, the memquota adapter can be used instead.) Then, each participating service requires a VirtualService definition to which the rate limit can be attached. With all these pieces in place, rate limiting can finally be applied.

Unfortunately, Istio only supports the application of rate limits to VirtualService definitions. There is no corresponding ability to apply rate limits via a "VirtualClient" definition to egress traffic from a set of services. As a result, we'll have to work around this limitation by applying rate limits to each upstream service individually.

Figure 2 lists an Istio configuration to limit the mobile-backend service to 50 requests per second against the products service. It is important to note that if the allowed budget of requests is exceeded, Istio’s data plane doesn’t queue or otherwise delay excess requests, but instead simply returns an HTTP status code of 429 (“Too Many Requests”).

Figure 2: Configuration stanzas required to implement the rate limit for traffic from the “mobile-backend” service to the “products” service.
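
Again, the original figure is not reproduced here. As a rough sketch of the Mixer-based quota configuration Istio offered at the time (1.1-era field names; the memquota adapter, object names and label values are illustrative and may differ between Istio versions), the required stanzas look roughly like this:

apiVersion: config.istio.io/v1alpha2
kind: instance
metadata:
  name: requestcountquota
  namespace: istio-system
spec:
  compiledTemplate: quota
  params:
    # Dimension the quota by the calling and the called service
    # ("app" labels are assumed for illustration).
    dimensions:
      source: source.labels["app"] | "unknown"
      destination: destination.labels["app"] | destination.service.name | "unknown"
---
apiVersion: config.istio.io/v1alpha2
kind: handler
metadata:
  name: quotahandler
  namespace: istio-system
spec:
  # memquota is only suitable for testing; a production setup would use
  # the redisquota adapter backed by Redis.
  compiledAdapter: memquota
  params:
    quotas:
    - name: requestcountquota.instance.istio-system
      # Default budget for traffic not matched by an override.
      maxAmount: 500
      validDuration: 1s
      overrides:
      # 50 requests per second from mobile-backend to products.
      - dimensions:
          source: mobile-backend
          destination: products
        maxAmount: 50
        validDuration: 1s
---
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: quota
  namespace: istio-system
spec:
  actions:
  - handler: quotahandler
    instances:
    - requestcountquota
---
apiVersion: config.istio.io/v1alpha2
kind: QuotaSpec
metadata:
  name: request-count
  namespace: istio-system
spec:
  rules:
  - quotas:
    - charge: 1
      quota: requestcountquota
---
apiVersion: config.istio.io/v1alpha2
kind: QuotaSpecBinding
metadata:
  name: request-count
  namespace: istio-system
spec:
  quotaSpecs:
  - name: request-count
    namespace: istio-system
  services:
  # Attach the quota to the upstream service being protected.
  - name: products
    namespace: default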

As mentioned before, a similar configuration must be applied for each of the other two upstream services. Also, operations teams will have to rewrite and reapply these configuration specifications repeatedly as a quarantine is gradually lifted.

Istio does not support quarantining new deployments by default. Operations teams can work around this limitation by writing a script that generates and applies the relevant Istio configuration snippets and by calling it at deployment time.

Quarantining with Glasnostic

Glasnostic captures and controls service interactions using channels. Channels are defined by specifying source (client) and destination (service) endpoints of the interactions a channel should apply to.

To implement our example quarantine, we would simply create a channel that covers all mobile-backend instances on the client side and all destinations on the service side. This channel captures all traffic originating from mobile-backend. Because channel definitions in Glasnostic don’t have to refer to actual running endpoints, we can define this channel before mobile-backend is deployed to production and set up an air-gapped environment by suspending it.

Once the mobile-backend service is deployed, we can then apply a general rate limit of 200 requests per second (figure 3). Using this channel, operators can then proceed to relax the limit gradually to support more users of the mobile app. Unlike with Istio, where lengthy and tedious YAML configuration objects must be applied for every possible service destination, all of these adjustments are made with the click of a button.

Figure 3: Glasnostic channel applying a general rate limit of 200 requests per second to all traffic originating from the “mobile-backend” service.

Controlling interactions based on channels allows for a great deal of flexibility in operating a service landscape. To refine this quarantine, for instance, operators can layer additional channels alongside and over the quarantine channel. A segmentation channel could be added to ensure mobile-backend instances interact only with their specific dependencies, or a critical upstream master data management service could be given extra protection from the additional load the mobile app puts on it by a separate bulkhead channel.

For operations teams looking to quarantine new deployments by default, Glasnostic provides this critical ability via an API that tools such as continuous deployment pipelines, DevOps scripts or security monitors can call to create and update channels. Used that way, Glasnostic becomes a control plane for the entire ecosystem of tools that an operations group relies on.

Summary

Quarantining deployments is an essential operational pattern in the quest to reduce the risks associated with the constant change in dynamic service landscapes. While canary deployments are designed to shift load towards a new service slowly, the quarantine pattern is designed to release new services into production gradually. In that regard, quarantines are complementary to canary deployments.

Due to the relatively short-term nature of the pattern, it is vital that quarantines be controlled independently from other, slower-moving patterns and policies. This requires the ability to layer policies. Layering policies also allows quarantines to be refined and extended with other patterns.

Although Kubernetes allows operators to define access permissions for services if a suitable network provider is used, it does not support rate limiting for those connections. While Linkerd currently provides no way to quarantine services, Istio does support quarantines at least indirectly, albeit with a high level of configuration overhead. A critical ability in defining quarantines is to specify a blanket rate limit across all destinations. This is difficult to achieve with Istio but straightforward in Glasnostic.

Originally published at https://glasnostic.com on May 28, 2019.
